Although in most cases I would use the stringr package, as already suggested in CPak's answer, there is also a grep solution:
# create the sample string
c <- ("She sold seashells by the seashore, and she had a great time while doing so.")
# match text that contains both "sold" and "great", in either order
# ignore.case = TRUE so that "Sold" and "Great" are matched as well
grep("(sold.*great|great.*sold)", c, value = TRUE, ignore.case = TRUE)
Well, not bad, right? But what if there is a word that merely contains sold or great as a substring?
# set up alternative string
d <- ("She saw soldier eating seashells by the seashore, and she had a great time while doing so.")
# even soldier is matched here:
grep("(sold.*great|great.*sold)", d, value = TRUE, ignore.case = TRUE)
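So you probably want to use word boundaries, i.e. match whole words only:
# \\b is a metacharacter that matches a word boundary (the start or end of a word);
# "soldier" is no longer accepted as "sold", so this returns character(0) for d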
grep("(\\bsold\\b.*\\bgreat\\b|\\bgreat\\b.*\\bsold\\b)", d, value = TRUE, ignore.case = TRUE)
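\b (written "\\b" inside an R string) matches before the first character of the string if that character is a word character, after the last character of the string if that character is a word character, or between two characters where one is a word character and the other is not.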
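To make the boundary rule concrete, here is a small sketch that compares a plain substring match with a whole-word match. The vector `words` and the result names are made up for illustration and are not part of the original question.

```r
# Hypothetical test strings, chosen to hit each boundary case
words <- c("sold", "She sold shells", "soldier", "unsold", "Sold.")

# TRUE only where "sold" stands on its own as a whole word
whole_word <- grepl("\\bsold\\b", words, ignore.case = TRUE)

# Without boundaries every element matches, because each contains "sold"
substring_match <- grepl("sold", words, ignore.case = TRUE)

# Side-by-side view of both results
print(data.frame(words, whole_word, substring_match))
```

"soldier" and "unsold" fail the whole-word test because the character next to "sold" is itself a word character, so there is no boundary at that position.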
More information on the \b metacharacter is available here:
http://www.regular-expressions.info/wordboundaries.html
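As a possible variation on the pattern above, the two orderings can be avoided by using one lookahead per word. This is only a sketch of a different technique, not what the grep call above does: it needs perl = TRUE, and the names `sentences`, `both_words`, and `matched` are made up for illustration.

```r
# The two sample sentences used in this answer
sentences <- c(
  "She sold seashells by the seashore, and she had a great time while doing so.",
  "She saw soldier eating seashells by the seashore, and she had a great time while doing so."
)

# Each (?=...) lookahead checks for one whole word anywhere in the string,
# so the order of the two words does not matter.
# Lookaheads are a PCRE feature, hence perl = TRUE.
both_words <- "^(?=.*\\bsold\\b)(?=.*\\bgreat\\b)"

# One logical value per sentence
matched <- grepl(both_words, sentences, perl = TRUE, ignore.case = TRUE)
print(matched)
```

Adding a third required word would then mean adding one more lookahead rather than listing every possible ordering.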