【Question title】: dplyr filter columns with value 0 for all rows with unique combinations of other columns
【Posted】: 2020-11-28 22:15:37
【Question】:

I have a data frame that looks like this:

library(tidyverse)

df <- tibble(date = as.Date(rep("2020-01-01", 8)),
             site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
             treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
             species = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
             frequency = c(0, 1, 1, 1, 1, 0, 1, 0))

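The real data has many dates, sites and treatments. I want to filter out the observations of a species at a site when all of its frequencies at that site (across all treatments and dates) are 0. So in the example above, I want to remove clover at site "Z", because it does not occur in any treatment or on any date at that site, but I want to keep clover at site "X", because it does occur in one of the treatments there. So I want: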

tibble(date = as.Date(rep("2020-01-01", 6)),
       site = c("X", "X", "X", "X", "Z", "Z"),
       treatment = c("a", "a", "b", "b", "a", "b"),
       species = c("vetch", "clover", "vetch", "clover", "vetch", "vetch"),
       frequency = c(0, 1, 1, 1, 1, 1))

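My first idea was to pivot_wider, select the columns and then pivot_longer again, but that doesn't work: select_if() tests each whole column and ignores the grouping, so the clover column is kept because clover occurs at site "X", and the site "Z" clover rows come back: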

  df %>%
    pivot_wider(names_from = species, names_prefix = "spp.", values_from = frequency, values_fill = 0) %>%
    group_by(site) %>%
    select_if(~ !is.numeric(.) || sum(.) != 0) %>%
    pivot_longer(starts_with("spp."), names_to = "species", names_prefix = "spp.", values_to = "frequency") -> df

So I think I need to filter, but I don't know how to do it.
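To see why the wide approach keeps clover, here is a minimal base-R sketch, assuming the example data from the question (rebuilt with data.frame() and the date stored as a Date). A column-wise predicate such as the one in select_if() effectively sees the first total below, not the per-site totals:

```r
# Example data from the question (date stored as a Date)
df <- data.frame(
  date      = as.Date(rep("2020-01-01", 8)),
  site      = rep(c("X", "Z"), each = 4),
  treatment = rep(c("a", "a", "b", "b"), times = 2),
  species   = rep(c("vetch", "clover"), times = 4),
  frequency = c(0, 1, 1, 1, 1, 0, 1, 0)
)

# Total per species over the whole table: what a column-wise predicate sees
whole <- tapply(df$frequency, df$species, sum)
print(whole)    # clover totals 2, so the clover column survives

# Total per site and species: the information the filter actually needs
by_site <- tapply(df$frequency, list(df$site, df$species), sum)
print(by_site)  # clover totals 0 at site Z
```

The decision therefore has to be made per site and species on the long data, which is what a grouped filter() does.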

【Comments】:

    Tags: r dataframe dplyr filtering


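    【Solution 1】:

    A simple solution is to create another column that holds the total frequency of each species, grouped by date, site and species (ignoring treatment). You can then filter on this new column and drop it afterwards.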

    library(tidyverse)
    df %>%
        # Group by date, site and species
        group_by(date, site, species) %>%
        # New column: total frequency within each group
        mutate(appears = sum(frequency)) %>%
        # Drop rows whose group total is 0
        filter(appears != 0) %>%
        # Remove the helper column
        select(-appears)
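    One thing to watch: grouping by date applies the rule separately for each date, whereas the question asks about species that are absent across all treatments and dates at a site. A minimal base-R sketch of the difference, using a small hypothetical two-date dataset (df2, not from the question) and the same sum-per-group idea as the helper column above:

```r
# Hypothetical data: clover at site Z is absent on the first date
# but present on the second date (treatment a)
df2 <- data.frame(
  date      = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01")),
  site      = "Z",
  treatment = c("a", "b", "a", "b"),
  species   = "clover",
  frequency = c(0, 0, 1, 0)
)

# Group totals per date, site and species (as in the answer above)
per_date <- df2[ave(df2$frequency, df2$date, df2$site, df2$species, FUN = sum) != 0, ]

# Group totals per site and species only, i.e. across all dates
across_dates <- df2[ave(df2$frequency, df2$site, df2$species, FUN = sum) != 0, ]

nrow(per_date)      # 2: the first-date rows are dropped
nrow(across_dates)  # 4: clover is kept on every date at site Z
```

    In dplyr, the across-dates version corresponds to group_by(site, species) instead of group_by(date, site, species). With the single date in the question's example, both give the same result.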
    

    【Comments】:

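      【Solution 2】:

      This may not matter for this dataset, but in general sum() may not be the right approach: if there are negative numbers they can cancel each other out, and you would drop the wrong groups. You can use all() or any() instead.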

      dplyr

      library(dplyr)
      df %>% group_by(date, site, species) %>% filter(any(frequency != 0))
      # Or, equivalently:
      #df %>% group_by(date, site, species) %>% filter(!all(frequency == 0))
      
      #  date       site  treatment species frequency
      #  <date>     <chr> <chr>     <chr>       <dbl>
      #1 2020-01-01 X     a         vetch           0
      #2 2020-01-01 X     a         clover          1
      #3 2020-01-01 X     b         vetch           1
      #4 2020-01-01 X     b         clover          1
      #5 2020-01-01 Z     a         vetch           1
      #6 2020-01-01 Z     b         vetch           1
      

      data.table can do the same:

      library(data.table)
      setDT(df)[, .SD[any(frequency != 0)], .(date, site, species)]
      

      Or in base R:

      subset(df, ave(frequency != 0, date, site, species, FUN = any))
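      The caveat about sum() can be checked with a small hypothetical dataset (d, not from the question) in which the values can be negative, for example if they were differences between surveys. A base-R sketch comparing a sum-based test with the any()-based test used in the one-liner above:

```r
# Hypothetical data: at site X the values cancel out (-1 + 1 = 0)
d <- data.frame(
  site      = c("X", "X", "Z", "Z"),
  species   = "clover",
  frequency = c(-1, 1, 0, 0)
)

# Sum-based rule: keeps a group only if its total is non-zero
keep_sum <- ave(d$frequency, d$site, d$species, FUN = sum) != 0

# any()-based rule: keeps a group if at least one value is non-zero
keep_any <- ave(d$frequency != 0, d$site, d$species, FUN = any)

keep_sum    # FALSE FALSE FALSE FALSE: site X is wrongly dropped
keep_any    # TRUE TRUE FALSE FALSE: only the all-zero group at site Z is dropped
d[keep_any, ]
```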
      

      【Comments】:
