【Question title】: Only filter values in a column based on a condition
【Posted】: 2021-02-05 10:54:00
【Question】:

Suppose I have the following data frame:

my_basket = data.frame(ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"), 
                   ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
                   Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,NA),
                   Tax = c(2,4,5,6,2,3,5,1,3,4,5,6,4,NA))

which produces:

    > my_basket
   ITEM_GROUP ITEM_NAME Price Tax
1       Fruit     Apple   100   2
2       Fruit    Banana    80   4
3       Fruit    Orange    80   5
4       Fruit     Mango    90   6
5       Fruit    Papaya    65   2
6   Vegetable    Carrot    70   3
7   Vegetable    Potato    60   5
8   Vegetable   Brinjal    70   1
9   Vegetable   Raddish    25   3
10      Dairy      Milk    60   4
11      Dairy      Curd    40   5
12      Dairy    Cheese    35   6
13      Dairy      Milk    50   4
14      Dairy    Paneer    NA  NA

What I want to do now is list the fruits I want to keep and then filter on that list:

fruitlist = c("Apple", "Banana")

How would I use the tidyverse to filter my data.frame so that it keeps only the fruits in my fruit list, plus all of my vegetables and dairy items? Normally I would do:

my_basket %<>% filter(ITEM_NAME %in% fruitlist)

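But then I also lose all the vegetables and dairy items, which is not what I want. I have been trying to do something with case_when, but can't get it to work. There must be something obvious I'm missing here.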
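For reference, the case_when() idea can work if case_when() is used to build the logical vector that filter() needs: one condition for fruit rows and a catch-all that keeps everything else. This is a minimal sketch (assuming dplyr is installed); the data frame is rebuilt from the question's values so the block runs on its own.

```r
library(dplyr)

# Same data as in the question
my_basket <- data.frame(
  ITEM_GROUP = c(rep("Fruit", 5), rep("Vegetable", 4), rep("Dairy", 5)),
  ITEM_NAME  = c("Apple", "Banana", "Orange", "Mango", "Papaya",
                 "Carrot", "Potato", "Brinjal", "Raddish",
                 "Milk", "Curd", "Cheese", "Milk", "Paneer"),
  Price = c(100, 80, 80, 90, 65, 70, 60, 70, 25, 60, 40, 35, 50, NA),
  Tax   = c(2, 4, 5, 6, 2, 3, 5, 1, 3, 4, 5, 6, 4, NA)
)
fruitlist <- c("Apple", "Banana")

# case_when() yields one TRUE/FALSE per row:
# fruit rows survive only if listed in fruitlist, all other rows survive.
result <- my_basket %>%
  filter(case_when(
    ITEM_GROUP == "Fruit" ~ ITEM_NAME %in% fruitlist,
    TRUE ~ TRUE
  ))
```

This gives the same rows as the OR condition in the edit below; the OR form is shorter, while case_when() scales more readably if each group needs its own rule.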

Edit: a few seconds after posting my question, it finally dawned on me:

my_basket %<>% filter(ITEM_NAME %in% fruitlist | ITEM_GROUP != "Fruit")

That solved it. I suppose that if I had to filter several groups like this, repeating the piped filter command would also work.
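A minimal sketch of that "repeat the filter" idea, assuming dplyr is installed. Each filter() touches only one group and passes every other row through unchanged. The dairylist vector is a hypothetical second whitelist added purely for illustration.

```r
library(dplyr)

# Same data as in the question
my_basket <- data.frame(
  ITEM_GROUP = c(rep("Fruit", 5), rep("Vegetable", 4), rep("Dairy", 5)),
  ITEM_NAME  = c("Apple", "Banana", "Orange", "Mango", "Papaya",
                 "Carrot", "Potato", "Brinjal", "Raddish",
                 "Milk", "Curd", "Cheese", "Milk", "Paneer"),
  Price = c(100, 80, 80, 90, 65, 70, 60, 70, 25, 60, 40, 35, 50, NA),
  Tax   = c(2, 4, 5, 6, 2, 3, 5, 1, 3, 4, 5, 6, 4, NA)
)
fruitlist  <- c("Apple", "Banana")
dairylist  <- c("Milk", "Cheese")   # hypothetical whitelist for the Dairy group

result <- my_basket %>%
  # Restrict Fruit rows; leave every non-Fruit row alone
  filter(ITEM_GROUP != "Fruit" | ITEM_NAME %in% fruitlist) %>%
  # Restrict Dairy rows; leave every non-Dairy row alone
  filter(ITEM_GROUP != "Dairy" | ITEM_NAME %in% dairylist)
```

Vegetables pass through both steps untouched, so the result holds Apple, Banana, all four vegetables, and the two Milk rows plus Cheese.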

【Discussion】:

    Tags: r dplyr


    【Answer 1】:

    You could use grepl with a regex alternation:

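    library(dplyr); library(magrittr)  # magrittr provides the %<>% assignment pipe
    fruitlist <- c("Apple", "Banana")
    regex <- paste0("^(?:", paste0(fruitlist, collapse = "|"), ")$")  # "^(?:Apple|Banana)$"
    # perl = TRUE for the (?:...) group; the OR keeps every non-fruit row
    my_basket %<>% filter(grepl(regex, ITEM_NAME, perl = TRUE) | ITEM_GROUP != "Fruit")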
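The ^ and $ anchors are what make the regex behave like an exact-match %in% test. A small base-R check on a hypothetical vector of item names (not from the question) shows the difference:

```r
fruitlist <- c("Apple", "Banana")
items <- c("Apple", "Green Apple", "Banana", "Bananas", "Mango")  # hypothetical names

# Anchored alternation: whole-name matches only
regex <- paste0("^(?:", paste0(fruitlist, collapse = "|"), ")$")
anchored <- grepl(regex, items, perl = TRUE)
# TRUE FALSE TRUE FALSE FALSE

# Unanchored alternation: any name that merely contains a listed fruit matches
unanchored <- grepl(paste(fruitlist, collapse = "|"), items)
# TRUE TRUE TRUE TRUE FALSE

identical(anchored, items %in% fruitlist)
# TRUE
```

For plain names, %in% gives the same result without building a pattern, and it does not need escaping if a name ever contains regex metacharacters such as "." or "(".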
    
