【Question title】: R AND Operator in Regex [duplicate]
【Posted】: 2017-09-17 02:24:28
【Question】:

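I have a text containing many paragraphs, and I want to find the lines that contain two specific words, so I am looking for something like an AND operator. Is there a way to do this?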

For example:

c <- ("She sold seashells by the seashore, and she had a great time while doing so.")

I want an expression that finds the lines containing both "sold" and "great".

I tried something like this:

grep("sold", "great", c, value = TRUE) 

Any ideas?

Thanks a lot!

【Discussion】:

    Tags: r regex operator-keyword operations


    【Solution 1】:

    You can create two capture groups, assuming the order of the words doesn't matter:

    grep("(sold|great)(?:.+)(sold|great)", c, value = TRUE)
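One caveat with an alternation-based pattern like the one above is that it can also match a line in which the same word appears twice (for example "sold ... sold"). A possible alternative, sketched here on the assumption that the PCRE engine (`perl = TRUE`) is acceptable, is to put two lookaheads at the start of the pattern so that both words are required, in any order. The `lines` vector and the names `and_pattern`, `has_both`, and `matched` are made up for illustration.

```r
# Hypothetical sample lines for illustration
lines <- c(
  "She sold seashells by the seashore, and she had a great time while doing so.",
  "She sold seashells, and later she sold some more.",
  "She had a great time at the seashore."
)

# Each lookahead (?=.*word) asserts that the word occurs somewhere in the line.
# Two lookaheads anchored at the start require both words, in any order.
# Lookaheads are a PCRE feature, hence perl = TRUE.
and_pattern <- "^(?=.*sold)(?=.*great)"

has_both <- grepl(and_pattern, lines, perl = TRUE)
matched <- lines[has_both]
print(matched)
```

Only the first line should be kept here, since it is the only one that contains both words.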
    

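    【Discussion】:

    • Thanks, but I am actually looking for lines that contain both words, not either one. If a line has "sold" but not "great", I don't want it returned.
    • @intern14, sorry, I misunderstood. Please see the edit above.
    【Solution 2】:

    Although in most cases I would use the stringr package, as already suggested in CPak's answer, here is a grep solution as well: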

    # create the sample string
    c <- ("She sold seashells by the seashore, and she had a great time while doing so.")
    
    # match any sold and great string within the text
    # ignore case so that Sold and Great are also matched
    grep("(sold.*great|great.*sold)", c, value = TRUE, ignore.case = TRUE)
    

    Well, not bad, right? But what if a word merely contains the string sold or great?

    # set up alternative string
    d <- ("She saw soldier eating seashells by the seashore, and she had a great time while doing so.")
    # even soldier is matched here:
    grep("(sold.*great|great.*sold)", d, value = TRUE, ignore.case = TRUE)
    

    So you may want to use word boundaries, i.e. match whole words only:

    # \\b is a metacharacter that matches a word boundary (the start or end of a word)
    grep("(\\bsold\\b.*\\bgreat\\b|\\bgreat\\b.*\\bsold\\b)", d, value = TRUE, ignore.case = TRUE)
    

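    \\b matches at the start of the string (if the first character is a word character), at the end of the string (if the last character is a word character), or between two characters where one is a word character and the other is not.

    More information on the \b metacharacter can be found here: http://www.regular-expressions.info/wordboundaries.html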
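To make the boundary rule concrete, here is a small sketch comparing a plain substring match with a whole-word match. The test strings in `words` and the variable names are hypothetical and chosen only for illustration.

```r
# Hypothetical test strings for illustration
words <- c("sold", "soldier", "unsold", "Sold!", "he sold it")

# Plain substring match: every string contains "sold" somewhere
substring_match <- grepl("sold", words, ignore.case = TRUE)

# Whole-word match: \\b requires a word/non-word transition on both sides
whole_word <- grepl("\\bsold\\b", words, ignore.case = TRUE)

print(data.frame(words, substring_match, whole_word))
```

"soldier" and "unsold" drop out in the whole-word column because the characters next to "sold" are themselves word characters, so there is no boundary at that position.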

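    【Discussion】:

      【Solution 3】:

      The duplicate post might get you started, but I don't think it directly answers your question.

      You can use stringr::str_detect in combination with all: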

      pos <- ("She sold seashells by the seashore, and she had a great time while doing so.") # contains sold and great
      neg <- ("She bought seashells by the seashore, and she had a great time while doing so.") # contains great
      
      pattern <- c("sold", "great")
      
      library(stringr)
      all(str_detect(pos,pattern))
      # [1] TRUE
      
      all(str_detect(neg,pattern))
      # [1] FALSE
      

      stringr::str_detect has the advantage (over grepl) of accepting a character vector of patterns to search for.

      【Discussion】:
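Note that `all(...)` collapses its input to a single TRUE or FALSE, so the approach above suits one string at a time. For a vector of several lines, one possible sketch in base R is to run `grepl` once per pattern and combine the results element-wise with `&`. The sample `lines` and the names `hits` and `has_all` are hypothetical, and `fixed = TRUE` assumes the patterns are literal strings rather than regular expressions.

```r
# Hypothetical sample lines for illustration
lines <- c(
  "She sold seashells by the seashore, and she had a great time.",
  "She bought seashells by the seashore, and she had a great time.",
  "She sold seashells by the seashore."
)
pattern <- c("sold", "great")

# One logical vector per pattern, each with one entry per line
hits <- lapply(pattern, function(p) grepl(p, lines, fixed = TRUE))

# Element-wise AND across patterns: TRUE only where every pattern matched
has_all <- Reduce(`&`, hits)

print(lines[has_all])
```

Because the patterns live in an ordinary character vector, adding a third required word only means extending `pattern`.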
