Logic for filtering dependent on two columns [duplicate]: answers

【Question title】: Logic for filtering dependent on two columns [duplicate]
【Posted】: 2020-12-15 00:07:30
【Question description】:

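I'm struggling to write the right logic to filter on two columns based on a condition in only one of them. I have multiple ids, and if an id appears in 2020, I want all of that id's data from the other years in which it was measured to be kept as well.

For example, if a group contains the number 3, I want all the values in that group. We should end up with a data frame containing all the rows for groups b and d.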

df4 <- data.frame(group = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", 
                        "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"),
                  pop = c(1, 2, 2, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 4, 5, 1, 2, 3, 4, 5),
                  value  = c(1,2,3,2.5,2,2,3,4,3.5,3,3,2,1,2,2.5,0.5,1.5,6,2,1.5)) 

threes <- df4 %>%
   filter(pop == 3 |&ifelse????

【Question comments】:

Tags: r dplyr

【Solution 1】:

A bit slower than the other answers here (it involves more steps), but a little clearer to me:

df4 %>% 
  filter(pop == 3) %>% 
  distinct(group) %>% 
  pull(group) -> groups

df4 %>% 
  filter(group %in% groups)

Or, if you want to combine the two steps:

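# The inner pipeline must be parenthesised: %in% and %>% have the same
# precedence and associate left to right, so without the parentheses
# `group %in% df4` would be evaluated first.
df4 %>% 
  filter(group %in% (df4 %>% 
                       filter(pop == 3) %>% 
                       distinct(group) %>% 
                       pull(group)))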
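
For what it's worth, the two-step idea above (find the matching groups, then keep every row in those groups) can also be written as a filtering join. This is only a sketch of an alternative, not what the answer itself uses; semi_join() is a real dplyr function, and the name groups_with_3 is made up here for illustration.

```r
library(dplyr)

df4 <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 5),
  pop   = c(1, 2, 2, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 4, 5, 1, 2, 3, 4, 5),
  value = c(1, 2, 3, 2.5, 2, 2, 3, 4, 3.5, 3, 3, 2, 1, 2, 2.5, 0.5, 1.5, 6, 2, 1.5)
)

# Step 1: one row per group that has at least one pop == 3
# (groups_with_3 is a hypothetical helper name)
groups_with_3 <- df4 %>%
  filter(pop == 3) %>%
  distinct(group)

# Step 2: keep every row of df4 whose group appears in groups_with_3
res <- df4 %>%
  semi_join(groups_with_3, by = "group")

res
```

semi_join() only filters rows of the left table, so no columns are added and no rows are duplicated.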

【Comments】:

【Solution 2】:

You can do this:

df4[df4$group %in% df4$group[df4$pop == 3],]
#>    group pop value
#> 6      b   1   2.0
#> 7      b   2   3.0
#> 8      b   3   4.0
#> 9      b   4   3.5
#> 10     b   5   3.0
#> 16     d   1   0.5
#> 17     d   2   1.5
#> 18     d   3   6.0
#> 19     d   4   2.0
#> 20     d   5   1.5

【Comments】:

Base R FTW! Personally I find it easier to read using with() (although it doesn't save any characters): df4[with(df4, group %in% group[pop == 3]), ].
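
A quick way to convince yourself that the `$` version and the with() version select the same rows is to run both and compare. This is just a sanity-check sketch; the names hit_groups, res_dollar and res_with are made up for illustration.

```r
df4 <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 5),
  pop   = c(1, 2, 2, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 4, 5, 1, 2, 3, 4, 5),
  value = c(1, 2, 3, 2.5, 2, 2, 3, 4, 3.5, 3, 3, 2, 1, 2, 2.5, 0.5, 1.5, 6, 2, 1.5)
)

# Groups that contain at least one row with pop == 3
hit_groups <- unique(df4$group[df4$pop == 3])

# Same selection written with `$` and with with()
res_dollar <- df4[df4$group %in% hit_groups, ]
res_with   <- df4[with(df4, group %in% group[pop == 3]), ]

# Compare the two results
identical(res_dollar, res_with)
```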

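【Solution 3】:

You can achieve this by combining dplyr's group_by(), filter() and any() functions. any() returns TRUE if at least one element satisfies the condition. group_by() makes the following operations run separately on each subgroup of the grouping variable. Follow these steps:

First, pipe the data into group_by() to group by your group variable.
Then pipe the result into filter() with any() to keep every group in which any pop equals 3.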

df4 <- data.frame(group = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", 
                            "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"),
                  pop = c(1, 2, 2, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 4, 5, 1, 2, 3, 4, 5),
                  value  = c(1,2,3,2.5,2,2,3,4,3.5,3,3,2,1,2,2.5,0.5,1.5,6,2,1.5)) 
# load the library
library(dplyr)

threes <- df4 %>% 
  group_by(group) %>%  
  filter(any(pop == 3))
# print the result
threes

Output:

threes
# A tibble: 10 x 3
# Groups:   group [2]
   group   pop value
   <chr> <dbl> <dbl>
 1 b         1   2  
 2 b         2   3  
 3 b         3   4  
 4 b         4   3.5
 5 b         5   3  
 6 d         1   0.5
 7 d         2   1.5
 8 d         3   6  
 9 d         4   2  
10 d         5   1.5
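
To see what any() is doing per group, it can help to compute the flag on its own first. The sketch below is an illustration rather than part of the answer: the names flags and has_three are made up, and the ungroup() at the end is an optional extra that drops the "Groups: group [2]" attribute shown in the output above.

```r
library(dplyr)

df4 <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 5),
  pop   = c(1, 2, 2, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 4, 5, 1, 2, 3, 4, 5),
  value = c(1, 2, 3, 2.5, 2, 2, 3, 4, 3.5, 3, 3, 2, 1, 2, 2.5, 0.5, 1.5, 6, 2, 1.5)
)

# One row per group: does the group contain any pop == 3?
flags <- df4 %>%
  group_by(group) %>%
  summarise(has_three = any(pop == 3))

# Same filter as in the answer, with the grouping removed afterwards
threes <- df4 %>%
  group_by(group) %>%
  filter(any(pop == 3)) %>%
  ungroup()

flags
threes
```

Inside filter(), the single TRUE or FALSE from any() is recycled across the whole group, which is why entire groups are kept or dropped.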

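【Comments】:

How is this different from this answer?
I didn't notice that someone else was writing an answer until I had worked out the logic myself. Coincidentally, our logic is the same. I tried to explain in more detail how to use the code, and I tested my code to make sure it works the way the question asks.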

【Solution 4】:

A simple base R option is to use subset + ave

subset(
  df4,
  ave(pop == 3, group, FUN = any)
)

which gives

   group pop value
6      b   1   2.0
7      b   2   3.0
8      b   3   4.0
9      b   4   3.5
10     b   5   3.0
16     d   1   0.5
17     d   2   1.5
18     d   3   6.0
19     d   4   2.0
20     d   5   1.5
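
If the ave() call looks opaque, it may help to look at the intermediate vector it produces. A small sketch, using a made-up variable name keep: ave() applies any() within each group and writes the result back to every row of that group, so the output has one logical value per row of df4.

```r
df4 <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 5),
  pop   = c(1, 2, 2, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 4, 5, 1, 2, 3, 4, 5),
  value = c(1, 2, 3, 2.5, 2, 2, 3, 4, 3.5, 3, 3, 2, 1, 2, 2.5, 0.5, 1.5, 6, 2, 1.5)
)

# Per-row flag: TRUE for every row whose group has at least one pop == 3
keep <- ave(df4$pop == 3, df4$group, FUN = any)

# Cross-tabulate group against the flag to see which groups are kept
table(df4$group, keep)

# Plain logical indexing gives the same rows as subset()
res <- df4[keep, ]
res
```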

【Comments】:

【Solution 5】:

Using dplyr:

df4 %>% group_by(group) %>% filter(any(pop == 3))
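
The same pattern should carry over to the id/year case described in the question. The question doesn't show that data, so everything below is an assumption made up for illustration: the data frame measurements, its columns id, year and value, and the result name in_2020.

```r
library(dplyr)

# Hypothetical data: the real column names and values are not given
# in the question
measurements <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  year  = c(2018, 2019, 2020, 2018, 2019, 2019, 2020, 2021),
  value = c(10, 11, 12, 20, 21, 30, 31, 32)
)

# Keep all years for every id that was measured in 2020
in_2020 <- measurements %>%
  group_by(id) %>%
  filter(any(year == 2020)) %>%
  ungroup()

in_2020
```

Here ids 1 and 3 have a 2020 row, so all of their rows are kept, and id 2 is dropped entirely.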

【Comments】: