Filter a column based on a list of words in R

【Question title】: Filter a column based on a list of words in R
【Posted】: 2016-08-14 09:25:35
【Question】:

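I want to filter a column in a data set that has more than 2 million rows. Any row whose value in that column contains a word from a list of 70 words should be picked out by the filter.

I used fruits$type[grepl(c("apple","orange","grapes"),fruits$type)], but I got the following error:

argument 'pattern' has length > 1 and only the first element will be used

Filtering on a single word works fine, but I have about 70 words, so writing 70 separate lines is impractical.

I tried the suggestion mentioned here, but it did not work. Can anyone help?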

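【Comments】:

I think you need to use grepl("apple|orange|grapes", fruits$type)
I tried that earlier as well, and it gave this error: "operations are possible only for numeric, logical or complex types"
Not sure, take a look here: stackoverflow.com/questions/5680819/…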
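
The alternation pattern suggested in the comments does not have to be typed by hand; it can be built from the word vector. The following is a minimal sketch, not the poster's code: the `words` vector and the small `fruits` data frame are made-up stand-ins for the real 70 words and the 2-million-row data set. As a side note, "operations are possible only for numeric, logical or complex types" is the error R reports when `|` is applied to character strings directly (for example "apple" | "orange"), so one possible cause of the second error is that the | ended up outside the quotes. That is a guess, since the exact call was not posted.

```r
# Hypothetical stand-in data; the real data set has more than 2 million rows.
fruits <- data.frame(
  type = c("apple pie", "banana", "orange juice", "water melon", "grapes"),
  stringsAsFactors = FALSE
)

# The words to search for (three here instead of about 70).
words <- c("apple", "orange", "grapes")

# Collapse the vector into a single regular expression: "apple|orange|grapes".
pattern <- paste(words, collapse = "|")

# grepl() now receives one pattern, so the "length > 1" message does not appear.
keep <- grepl(pattern, fruits$type)

# Values of the column that contain at least one of the words.
matched <- fruits$type[keep]
print(matched)
```

This assumes the words contain no regular-expression metacharacters such as "." or "+"; if they do, they would need escaping before being pasted together.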

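Tags: r string filter dplyr

【Solution 1】:

If there are many keywords, we can loop over the words with grepl and then combine the results with Reduce and | to get a logical vector for subsetting the data set.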

res <- fruits$type[Reduce(`|`, lapply(v1, grepl, x = fruits$type))]
length(res)
#[1] 11
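
To make the Reduce step less opaque, here is a small sketch that unpacks it on a tiny made-up vector `x` (a hypothetical stand-in for fruits$type). It shows that lapply produces one logical vector per word and that Reduce with `|` combines them element by element.

```r
v1 <- c("apple", "orange", "grapes")

# Hypothetical values standing in for fruits$type.
x <- c("apple", "banana", "grapes", "water melon")

# One logical vector per word: each element of v1 is passed to grepl() as the pattern.
hits <- lapply(v1, grepl, x = x)

# Fold the list with element-wise OR: TRUE where any word matched.
any_hit <- Reduce(`|`, hits)

# For three words this is the same as writing the ORs out by hand.
manual <- hits[[1]] | hits[[2]] | hits[[3]]

print(any_hit)
print(identical(any_hit, manual))
```

Because each word is matched separately, this approach never builds one long pattern, at the cost of one pass over the column per word.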

Data

v1 <- c("apple", "orange", "grapes")
set.seed(24)
fruits <- data.frame(type = sample(c("apple", "orange", "grapes", 
    "banana", "water melon"), 20, replace=TRUE), val = rnorm(20), stringsAsFactors=FALSE)

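【Discussion】:

Thanks, it extracted those rows. Maybe my question was not clear. How do I select all the other columns associated with this result? Perhaps using a pipe in dplyr?
@Tony In that case you do not need to select the column, i.e. fruits[Reduce(`|`, lapply(v1, grepl, x = fruits$type)), ]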
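
Since the question asks about a dplyr pipe, a possible equivalent is sketched below. This is not from the answer itself: it assumes the dplyr package is installed, uses a small made-up `fruits` data frame, and swaps the Reduce approach for a single pasted pattern. filter() keeps whole rows, so the other columns come along automatically.

```r
library(dplyr)

v1 <- c("apple", "orange", "grapes")

# Hypothetical stand-in for the real data set.
fruits <- data.frame(
  type = c("apple", "banana", "grapes", "water melon", "orange"),
  val  = c(1.5, 2.0, 0.3, 4.1, 2.7),
  stringsAsFactors = FALSE
)

# Single regular expression that matches any of the words.
pattern <- paste(v1, collapse = "|")

# Keep the rows whose type contains one of the words; all columns are retained.
res <- fruits %>%
  filter(grepl(pattern, type))

print(res)
```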