R AND Operator in Regex [duplicate]: Answers

【Question Title】: R AND Operator in Regex [duplicate]
【Posted】: 2017-09-17 02:24:28
【Question】:

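I'm trying to search text that contains many paragraphs and find the lines that contain two specific words, so I'm looking for an AND operator. Is there any way to do this?

For example: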

c <- ("She sold seashells by the seashore, and she had a great time while doing so.")

I want an expression that finds the lines containing both "sold" and "great".

I tried something like:

grep("sold", "great", c, value = TRUE)

Any ideas?

Thanks a lot!

【Question Discussion】:

Tags: r regex operator-keyword operations

【Answer 1】:

You can use two capturing groups, assuming the order of the words doesn't matter:

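# perl = TRUE for the (?!\\1) lookahead, which stops the second group repeating the first word
grep("(sold|great).+(?!\\1)(sold|great)", c, value = TRUE, perl = TRUE)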
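A quick check of why the two groups need care, as a sketch on made-up test lines: with a plain "(sold|great).+(sold|great)" the second group can match the same word again, so a line with "sold" twice (and no "great") is also returned. Adding the Perl lookahead (?!\\1) rejects a second word identical to the first, which turns the pattern into a real AND for two words.

```r
# Hypothetical test lines: only the first contains both words
lines <- c(
  "She sold seashells and had a great time.",  # sold + great
  "She sold shells and sold more shells.",     # sold twice, no great
  "A great day, a great time."                 # great twice, no sold
)

# Either word, anything, either word: also accepts the same word twice
loose <- "(sold|great).+(sold|great)"
# (?!\\1) rejects a second word identical to the first (needs perl = TRUE)
strict <- "(sold|great).+(?!\\1)(sold|great)"

grepl(loose, lines, perl = TRUE)   # TRUE TRUE TRUE
grepl(strict, lines, perl = TRUE)  # TRUE FALSE FALSE
```

This trick only scales to two words; for more, listing every combination in one pattern quickly gets unwieldy.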

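【Discussion】:

Thanks, but I'm actually looking for lines that contain both words, not either one. If a line contains "sold" but not "great", I don't want it returned.
@intern14, sorry, I misunderstood. See the edit above.

【Answer 2】:

In most cases I would use the stringr package, as already suggested in CPak's answer, but here is a grep solution as well: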

# create the sample string
c <- ("She sold seashells by the seashore, and she had a great time while doing so.")

# match "sold" followed later by "great", or "great" followed later by "sold"
# ignore.case = TRUE so that "Sold" and "Great" are matched too
grep("(sold.*great|great.*sold)", c, value = TRUE, ignore.case = TRUE)
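The alternation above needs one branch per word order, so it grows quickly as more words are required. A common alternative, swapped in here and not part of the original answer, is one positive lookahead per required word: each (?=.*word) scans the line from the start independently, and the match succeeds only if all of them find their word. The helper variable names below are illustrative; lookaheads are Perl syntax, so perl = TRUE is needed.

```r
# Sample lines (the second has "great" but not "sold")
lines <- c(
  "She sold seashells by the seashore, and she had a great time while doing so.",
  "She bought seashells by the seashore, and she had a great time while doing so."
)
words <- c("sold", "great")

# Build "^(?=.*sold)(?=.*great)": one lookahead per required word.
# Each lookahead starts at ^ and scans independently, so order does not matter.
pattern <- paste0("^", paste0("(?=.*", words, ")", collapse = ""))

# Lookaheads are Perl syntax, so perl = TRUE is required
grepl(pattern, lines, perl = TRUE, ignore.case = TRUE)
# [1]  TRUE FALSE
```

The words are pasted in as regex fragments, so any that contain special characters such as . or ( would need escaping first.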

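Not bad, right? But what if the text contains a word that merely includes sold or great, such as "soldier"?

# alternative string: contains "soldier" but not the word "sold"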
d <- ("She saw soldier eating seashells by the seashore, and she had a great time while doing so.")
# even soldier is matched here:
grep("(sold.*great|great.*sold)", d, value = TRUE, ignore.case = TRUE)

So you may want to use word boundaries, i.e. match whole words only:

# \\b matches a word boundary, i.e. the start or end of a word
grep("(\\bsold\\b.*\\bgreat\\b|\\bgreat\\b.*\\bsold\\b)", d, value = TRUE, ignore.case = TRUE)

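\b (written "\\b" inside an R string) matches a position, not a character: before the first character of the string if that character is a word character, after the last character if that is a word character, and between two characters where one is a word character and the other is not.

More on the \b metacharacter here: http://www.regular-expressions.info/wordboundaries.html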
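A small check of how the boundary handles the "soldier" case, on made-up words: "sold" is found when it sits next to punctuation or the string edge, but not when it is glued to other word characters on either side.

```r
# Hypothetical inputs covering each side of the boundary rule
words <- c("sold", "soldier", "re-sold", "sold!", "unsold")

# Default (extended) regex in R supports \b, written "\\b" in a string
grepl("\\bsold\\b", words)
# [1]  TRUE FALSE  TRUE  TRUE FALSE
```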

【Discussion】:

【Answer 3】:

The duplicate post may get you started, but I don't think it answers your question directly.

You can combine stringr::str_detect with all():

pos <- ("She sold seashells by the seashore, and she had a great time while doing so.") # contains sold and great
neg <- ("She bought seashells by the seashore, and she had a great time while doing so.") # contains great

pattern <- c("sold", "great")

library(stringr)
all(str_detect(pos,pattern))
# [1] TRUE

all(str_detect(neg,pattern))
# [1] FALSE

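The advantage of stringr::str_detect over grepl is that it accepts a character vector of patterns (grepl uses only the first element of pattern).

【Discussion】: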
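all(str_detect(...)) tests one string at a time. To filter a whole vector of lines, as the question asks, one option is to compute one logical vector per pattern and combine them with &. The sketch below is a base R variant (grepl plus Reduce) so it runs without stringr; lines and patterns are illustrative names.

```r
lines <- c(
  "She sold seashells by the seashore, and she had a great time while doing so.",
  "She bought seashells by the seashore, and she had a great time while doing so.",
  "He sold nothing at all."
)
patterns <- c("sold", "great")

# One logical vector per pattern: TRUE where the line contains it
hits <- lapply(patterns, grepl, x = lines)
# Combine element-wise with &, so a line survives only if every pattern hit
keep <- Reduce(`&`, hits)

lines[keep]
```

With stringr loaded, lapply(patterns, str_detect, string = lines) should give the same list, since str_detect() takes string first and pattern second.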