Multi-conditional if/else filter using tidyverse in R: answers

[Question title]: Multi-conditional if/else filter using tidyverse in R
[Posted]: 2021-02-25 05:49:54
[Question]:

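I want to filter conditionally based on multiple conditions. I've seen plenty of posts on this site that use an if/else condition as a filter, but never one with multiple conditions in a single if statement.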

Take the following sample data:

data <- structure(list(names = c("Mike", "Mike", "Sam", "Sam", "Sam", 
"Emma", "Jessica", "Jessica"), tries = c(1, 2, 3, 2, 2, 3, 1, 
3)), class = "data.frame", row.names = c(NA, -8L))

data
    names tries
1    Mike     1
2    Mike     2
3     Sam     3
4     Sam     2
5     Sam     2
6    Emma     3
7 Jessica     1
8 Jessica     3

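In this specific example, I want to keep the first row for each grouped name, and possibly the second. If tries is greater than 1, I want to keep only the first row for that name. If tries equals 1, I want to keep the first two rows.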

I have a workaround that technically does the job, but it seems like it shouldn't work. Here it is:

library(tidyverse)

data %>% 
  group_by(names) %>% 
  filter(
    if (row_number() == 1 & tries > 1) {
      row_number() == 1
    } else {
      row_number() == 1 | row_number() == 2
    }
  )
# A tibble: 6 x 2
# Groups:   names [4]
  names   tries
  <chr>   <dbl>
1 Mike        1
2 Mike        2
3 Sam         3
4 Emma        3
5 Jessica     1
6 Jessica     3
Warning messages:
1: In if (row_number() == 1 & tries > 1) { :
  the condition has length > 1 and only the first element will be used
2: In if (row_number() == 1 & tries > 1) { :
  the condition has length > 1 and only the first element will be used
3: In if (row_number() == 1 & tries > 1) { :
  the condition has length > 1 and only the first element will be used

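You can see that it does return the correct dataset, but it gives me a pile of warnings saying that, because there are multiple conditions in the if statement, only the first element will be used. Yet it appears to use both of them. Am I missing something here? Is there a tidyverse solution to this problem that avoids the warning messages?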

Thanks for your help!

[Comments on the question]:

Have you looked at case_when()?

Tags: r if-statement filter tidyverse

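[Solution 1]:

Well, it's a bit unclear what you mean by "tries equals 1" when a name has multiple rows. Does it mean that the first try equals 1, or that any of their tries equals 1? You could do this: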

data %>% 
  group_by(names) %>% 
  filter(row_number()==1 | (row_number()==2 & first(tries)==1))

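Note that the if is implicit in the filter() expression. It keeps a row if it is the first row in its group, or (|) if it is the second row and (&) the first value of tries is 1.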
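To make the role of | and & more concrete, here is a minimal base R sketch that evaluates the same condition by hand for one group at a time. The helper keep_rows() is hypothetical and not part of the answer; it assumes seq_along(tries) can stand in for row_number() and tries[1] for first(tries) within a single group.

```r
# Hypothetical helper (not from the answer): evaluates the filter condition
# for one group's `tries` vector using base R only.
keep_rows <- function(tries) {
  row <- seq_along(tries)          # stands in for row_number() within the group
  first_is_one <- tries[1] == 1    # stands in for first(tries) == 1; length 1, recycled
  # `|` and `&` work element by element, giving one TRUE/FALSE per row
  row == 1 | (row == 2 & first_is_one)
}

keep_rows(c(1, 2))     # Mike: TRUE TRUE
keep_rows(c(3, 2, 2))  # Sam:  TRUE FALSE FALSE
keep_rows(3)           # Emma: TRUE
```

Because the result has one logical value per row, filter() can use it directly and no if statement is needed.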

[Comments]:

To improve the answer, could you elaborate on what | and & do, for the OP's benefit?

[Solution 2]:

Does this work for you:

library(dplyr)
data %>% group_by(names) %>% 
  filter(if (tries[1] > 1) row_number() == 1 else row_number() %in% 1:2)  # %in% returns one value per row for any group size
# A tibble: 6 x 2
# Groups:   names [4]
  names   tries
  <chr>   <dbl>
1 Mike        1
2 Mike        2
3 Sam         3
4 Emma        3
5 Jessica     1
6 Jessica     3
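As for why the workaround in the question warns yet still returns the expected rows, the base R sketch below may help. It is an illustration only, with seq_along(tries) assumed to stand in for row_number() inside a single group. The condition passed to if in the question has one element per row, while if expects a single TRUE or FALSE. Older versions of R used only the first element and warned; as far as I know, R 4.2.0 and later raise an error instead.

```r
tries <- c(3, 2, 2)        # Sam's group from the example data
row <- seq_along(tries)    # stands in for row_number() within the group

# The question's condition: one value per row, so `if` cannot use it as-is.
cond_vector <- row == 1 & tries > 1
length(cond_vector)        # 3

# Its first element is TRUE & (tries[1] > 1), so using only that element
# happens to match the intended test on the group's first try.
identical(cond_vector[1], tries[1] > 1)   # TRUE

# A length-one condition is safe to pass to `if`.
keep <- if (tries[1] > 1) row == 1 else row %in% 1:2
keep                       # TRUE FALSE FALSE
```

Testing tries[1] directly, as this solution does, keeps the condition at length one and so avoids the warning.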

[Comments]: