【Question title】: Regex: negative look-ahead between two matches
【Posted】: 2012-04-08 06:10:32
【Question】:

I'm trying to build a regex that looks roughly like this:

[match-word] ... [exclude-specific-word] ... [match-word]

A negative lookahead seemed like the right tool, but I ran into trouble with a case like this:

[match-word] ... [exclude-specific-word] ... [match-word] ... [excluded word appears again]

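I want the sentence above to match, but the negative lookahead between the first and second match words "spills over" past the second one, so the second word never matches.

Let's look at a practical example.

I want to match every sentence that contains the word "i" and the word "pie", but not the word "hate" between those two words. I have these three sentences: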

i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this

I have this regex:

^i(?!.*hate).*pie          - have removed the word boundaries for clarity, original is: ^i\b(?!.*\bhate\b).*\bpie\b 

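This matches the first sentence but not the second, because the negative lookahead scans the entire rest of the string.

Is there a way to bound the negative lookahead so that it is satisfied if it encounters "pie" before it encounters "hate"?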
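To make the problem concrete, here is a minimal Python sketch of the behaviour described above. The post itself targets JRegex; Python's `re` is used here only as an assumption for illustration, since the lookahead semantics in question are the same.

```python
import re

# The pattern from the question (word boundaries omitted, as in the post).
pattern = re.compile(r"^i(?!.*hate).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
]

# The lookahead is evaluated once, right after the leading "i", and its
# ".*hate" may run to the end of the line. A "hate" that only appears
# after "pie" therefore blocks the match as well.
results = [bool(pattern.match(s)) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Only the first sentence matches; the second is rejected even though its "hate" comes after "pie".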

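Note: in my implementation there may be further terms after this part of the regex (it is built dynamically from a grammar search engine), for example: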

^i(?!.*hate).*pie.*donuts

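I'm currently using JRegex, but could switch to JDK Regex if necessary.

Update: I forgot to mention something in my original question:

The "negated" construct may also occur further along in the sentence, and I still want the sentence to match in that case.

To clarify, look at these sentences: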

i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
i sure like eating pie, but i like donuts and i hate making pie <- Do want to match this

rob's answer fits this extra constraint perfectly, so I accepted it.

【Comments】:

    Tags: regex lookahead negative-lookahead


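    【Solution 1】:

    At every character between the start word and the stop word, you have to make sure that neither your negated word nor the stop word begins there. Like this (I've added some whitespace for readability):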

    ^i ( (?!hate|pie) . )* pie
    

    Here is a Python program to test it.

    import re
    
    # (sentence, expected result) pairs from the question
    test = [ ('i sure like eating pie, but i love donuts', True),
             ('i sure like eating pie, but i hate donuts', True),
             ('i sure hate eating pie, but i like donuts', False) ]
    
    # re.X (verbose) makes the engine ignore the whitespace in the pattern
    rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)
    
    for t,v in test:
        m = rx.match(t)
        print(t, "pass" if bool(m) == v else "fail")
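As a rough sketch of how this could be carried over to the original pattern with word boundaries, to the fourth sentence from the question's update, and to a trailing term such as `.*donuts`: the helper `between_without` below is a hypothetical name made up for this illustration, not something from the answer or from any library.

```python
import re

def between_without(start, stop, banned):
    """Hypothetical helper: `start` ... `stop` with `banned` excluded
    only in the stretch between the two words."""
    s, e, b = (re.escape(w) for w in (start, stop, banned))
    # Consume one character at a time, but never start consuming at a
    # position where the banned word or the stop word begins.
    return rf"^{s}\b(?:(?!\b{b}\b|\b{e}\b).)*\b{e}\b"

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

core = between_without("i", "pie", "hate")
pattern = re.compile(core)
# Further terms can simply be appended after the bounded part.
with_tail = re.compile(core + r".*\bdonuts\b")

results = [bool(pattern.match(s)) for s in sentences]
tail_results = [bool(with_tail.match(s)) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Because the exclusion stops at the first "pie", a "hate" later in the sentence no longer affects the result; only the third sentence is rejected.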
    

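    【Comments】:

    • Whitespace in a regex doesn't help readability, it just confuses people.
    • @death Whitespace is valid in Python regexes with the "verbose" flag. It confuses you, it helps me... we have different opinions. (It's also easy to edit.)
    • Then why don't you use the whitespace in your Python example?
    • Thanks rob, I would never have figured this out on my own, but it makes perfect sense!
    【Solution 2】:

    This regex should work for you: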

    ^(?!i.*hate.*pie)i.*pie.*donuts
    

    Explanation

    "^" +          // Assert position at the beginning of a line (at beginning of the string or after a line break character)
    "(?!" +        // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
       "i" +          // Match the character “i” literally
       "." +          // Match any single character that is not a line break character
          "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
       "hate" +       // Match the characters “hate” literally
       "." +          // Match any single character that is not a line break character
          "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
       "pie" +        // Match the characters “pie” literally
    ")" +
    "i" +          // Match the character “i” literally
    "." +          // Match any single character that is not a line break character
       "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    "pie" +        // Match the characters “pie” literally
    "." +          // Match any single character that is not a line break character
       "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    "donuts"       // Match the characters “donuts” literally
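For comparison, here is a small Python check of this pattern against the four sentences from the question's update. Python's `re` is an assumption here, since the breakdown above is written as Java-style string concatenation.

```python
import re

pattern = re.compile(r"^(?!i.*hate.*pie)i.*pie.*donuts")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

# The lookahead rejects any line in which "hate" is followed by "pie"
# anywhere later on, not only between the first "i" and the first "pie".
results = [bool(pattern.match(s)) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

The first three sentences behave as the question asks, but the fourth one (which the update says should match) is rejected, because its trailing "i hate making pie" satisfies `i.*hate.*pie`.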
    

    【Comments】:

      【Solution 3】:

      Match ...A...B... with no C in between

      Tested in Python:

      $ python
      >>> import re
      >>> re.match(r'.*A(?!.*C.*B).*B', 'C A x B C')
      <_sre.SRE_Match object at 0x94ab7c8>
      

      So I arrived at this regex:

      .*\bi\b(?!.*hate.*pie).*pie
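A quick way to probe both the abstract A/B/C pattern and the final regex is sketched below under Python 3 (where a successful match prints as `re.Match` rather than `_sre.SRE_Match`). The extra test strings `'A C B'` and `'A x B C B'` are additions made up for this illustration.

```python
import re

abc = re.compile(r".*A(?!.*C.*B).*B")
abc_inputs = ["C A x B C", "A C B", "A x B C B"]
# The third input has no C between A and the first B, but a later
# "C ... B" still trips the lookahead.
abc_results = [bool(abc.match(s)) for s in abc_inputs]

pie = re.compile(r".*\bi\b(?!.*hate.*pie).*pie")
sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]
pie_results = [bool(pie.match(s)) for s in sentences]

for text, matched in zip(abc_inputs + sentences, abc_results + pie_results):
    print(matched, text)
```

So this form handles a C before A or after the last B, but it does not match the fourth sentence from the question's update, where "hate ... pie" occurs again after the first "pie".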
      

      【Comments】:
