Filtering on a Stringr Match (str_detect) EXCEPT for a Particular Similar Value in R? Answer

Question title: Filtering on a Stringr Match (str_detect) EXCEPT For a Particular Similar Value in R?
Posted: 2019-06-07 11:24:07
Question:

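I'm trying to build a dplyr pipeline to filter a data frame.

Imagine a data frame jobs from which I want to filter out the most senior positions in the title column:

title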

Chief Executive Officer
Chief Financial Officer
Chief Technical Officer
Manager
Product Manager
Programmer
Scientist
Marketer
Lawyer
Secretary
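To make the examples reproducible, the listing above can be rebuilt as a data frame. This is a hypothetical reconstruction: the column is named title to match the code used in the question and the answer.

```r
# Hypothetical reconstruction of the `jobs` data frame from the listing above.
# The column is named `title` to match the filtering code used in this thread.
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE  # keep plain character strings on R < 4.0
)

nrow(jobs)  # 10
```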

The R code to filter them out (down to "Manager") would be...

library(dplyr)
library(stringr)

jobs %>%
  filter(!str_detect(title, 'Chief')) %>%
  filter(!str_detect(title, 'Manager'))  # ... and so on for other senior titles
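A minimal sketch of why this pipeline is not enough, assuming the sample data from the listing above: str_detect() matches substrings, so "Product Manager" is removed by the "Manager" filter along with "Manager" itself.

```r
library(dplyr)
library(stringr)

# Sample data mirroring the listing in the question (assumed column name: title)
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

low_level <- jobs %>%
  filter(!str_detect(title, "Chief")) %>%
  filter(!str_detect(title, "Manager"))

# "Product Manager" contains "Manager", so it is dropped too
"Product Manager" %in% low_level$title  # FALSE
```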

But I still want to keep "Product Manager" in the final result, producing a new data frame that contains all the "lower-level jobs", e.g.

Product Manager
Programmer
Scientist
Marketer
Lawyer
Secretary

Is there a way to apply a str_detect() filter for a given value, except for one particular string?
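One way to express "match Manager, except in Product Manager" inside a single pattern is a negative lookbehind, which stringr's ICU regex engine supports for fixed-length lookbehinds. This is a sketch, not taken from the thread; the name senior_pattern is hypothetical and the data is the sample from the question.

```r
library(dplyr)
library(stringr)

jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# Hypothetical pattern: "Chief" anywhere, or "Manager" only when it is
# NOT immediately preceded by "Product " (negative lookbehind)
senior_pattern <- "Chief|(?<!Product )Manager"

low_level <- jobs %>%
  filter(!str_detect(title, senior_pattern))

low_level$title
```

Lookbehinds get hard to maintain once there are many exceptions; an explicit list of exact titles is usually easier to read.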

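Assume the column holds 1,000+ roles with all sorts of string combinations that include "Manager", but there will always be particular exceptions to the filter.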
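For a large column with a known set of exceptions, a sketch under those assumptions is to keep the broad pattern and rescue exact titles with %in%. The names senior_pattern and keep_anyway are hypothetical; adding an exception then means adding one string to a vector rather than editing a regex.

```r
library(dplyr)
library(stringr)

jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# Hypothetical configuration: what counts as senior, and exact titles to keep anyway
senior_pattern <- "Chief|Manager"
keep_anyway    <- c("Product Manager")

# Keep a row if it is not senior, OR if its title is an exact-match exception
low_level <- jobs %>%
  filter(!str_detect(title, senior_pattern) | title %in% keep_anyway)

low_level$title
```

Because %in% compares whole strings, an exception such as "Product Manager" does not accidentally rescue a longer title that merely contains it.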

Comments:

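Why not simply anchor "Manager", e.g. ... %>% filter(!stringr::str_detect(title, "^Chief|^Manager$"))? The ^ anchor requires the match to start at the beginning of the string, and the $ anchor requires it to end at the end of the string, so ^Manager$ matches only the exact title "Manager".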
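The anchored pattern from the comment can be checked against the sample data like this (a sketch assuming the data from the question):

```r
library(dplyr)
library(stringr)

jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# "^Chief": titles starting with "Chief"; "^Manager$": exactly "Manager"
low_level <- jobs %>%
  filter(!str_detect(title, "^Chief|^Manager$"))

low_level$title
```

The trade-off is that only the bare title "Manager" is excluded; any other title that merely contains "Manager" (say, a hypothetical "Senior Manager") would be kept.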

Tags: r dplyr stringr

Solution 1:

Or you can add a separate condition for "Product Manager" to the filter:

library(tidyverse)

jobs %>% 
filter((!str_detect(title, "Chief|Manager")) | str_detect(title, "Product Manager"))


#            title
#1 Product Manager
#2      Programmer
#3       Scientist
#4        Marketer
#5          Lawyer
#6       Secretary

The same can be done in base R with grepl/grep:

jobs[c(grep("Product Manager",jobs$title), 
       grep("Chief|Manager", jobs$title, invert = TRUE)),, drop = FALSE]
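A grepl() variant builds a single logical index instead of concatenating two index vectors with c(). This is a sketch on the sample data; it keeps the original row order, whereas the grep() version lists the exceptions first and would repeat a row if the exception pattern ever matched a title the main pattern does not exclude.

```r
# Base R only; sample data mirroring the question
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# TRUE for rows without a senior keyword, or for the "Product Manager" exception
keep <- !grepl("Chief|Manager", jobs$title) | grepl("Product Manager", jobs$title)

low_level <- jobs[keep, , drop = FALSE]
low_level$title
```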

Comments: