Negative lookahead in Ruby with preceding and following matches: answers

【Question title】: Negative lookahead in Ruby with preceding and following matches
【Posted】: 2018-02-02 22:40:15
【Question】:

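I am trying to parse an XML document (specifically a Sublime color theme), and I am using a negative lookahead to block matches I don't want, but it doesn't seem to work properly.

The pattern is as follows: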

/
<key>name<\/key>
.*?                     # find as little as possible including new lines
<string>(.*?)<\/string> # Match the name of this color Rule
.*?
<dict>
((?!<\/dict>).)*?       # After the second opening <dict>, do not allow a closing </dict>
<key>foreground<\/key>  
.*?
<string>(.*?)<\/string> # Match the hex code for the name found in Match 1.
/mx                     # Treat a newline as a character matched by .
                        # Ignore Whitespace, comments.

The string being matched is:

<dict>
        <key>name</key>
        <string>**Variable**</string>
        <key>scope</key>
        <string>variable</string>
        <key>settings</key>
        <dict>
            <key>fontStyle</key>
            <string></string>
        </dict>
    </dict>

    <dict>
        <key>name</key>
        <string>Keyword</string>
        <key>scope</key>
        <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
        <key>settings</key>
        <dict>
            <key>foreground</key>
            <string>**#F92672**</string>

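The pattern matches the entire string, with **Variable** as the first captured group and **#F92672** as the second. Ideally, I want the first captured group to be Keyword, from the second section. I assumed the negative lookahead meant the first section could not be part of the match, because the regex would see the </dict> and fail to match.

Does anyone know whether I am doing something wrong, and what I can do to fix it? Thanks!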

【Comments】:

Tags: ruby regex regex-lookarounds negative-lookahead

【Answer 1】:

Here is one approach using Nokogiri:

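require 'nokogiri'

# xml is assumed to hold the theme snippet from the question as a String.
theme = Nokogiri::XML.fragment(xml)

# Text of the <string> that follows <key>name</key> in the first <dict>.
puts theme.xpath('./dict[1]/key[text()="name"]/following-sibling::string[1]').text
#=> "**Variable**"

# Text of the <string> inside any <dict> that directly follows <key>settings</key>.
puts theme.xpath('.//dict[preceding-sibling::key[1][text()="settings"]]/string').text
#=> "**#F92672**"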

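The first XPath takes the first dict, finds the key containing "name", and then gets the text of the string element that follows it.

The second XPath looks for a dict immediately after a key containing "settings" and retrieves the text of its string element.

Note that if you are parsing a full document rather than the fragment given here, you will need a few changes, such as changing the call to theme = Nokogiri::XML.parse(xml) and removing the leading . from the XPath expressions.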

【Comments】:

Thanks! I'm not very comfortable with XPath and I had trouble using Nokogiri, but I'll give it another try.

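【Answer 2】:

The first dict (the one with the string **Variable**) and the second (the one with Keyword) have the same structure. You want to tell them apart with a negative lookahead, but that is not possible.

Change ((?!<\/dict>).)*? to (((?!<\/dict>).)*?) for debugging, and you can see that the new group contains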

result="
        <key>name</key>
        <string>Keyword</string>
        <key>scope</key>
        <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
        <key>settings</key>
        <dict>
            "

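This satisfies your negative lookahead. The lookahead only guards the text between the <dict> that was matched and <key>foreground</key>; the unguarded .*? before that <dict> is free to run past the first section's closing </dict> tags.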
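The debugging step above can be reproduced with a short Ruby sketch. It is only an illustration: SAMPLE is an abbreviated copy of the question's input (bold markers dropped, scope string shortened), and the constant names and the named groups name, guarded and color are introduced here, not taken from the question.

```ruby
# Abbreviated copy of the question's input; like the original, it stops
# right after the hex colour. (Assumption: bold markers and the long scope
# string are left out for brevity.)
SAMPLE = <<~XML
  <dict>
    <key>name</key>
    <string>Variable</string>
    <key>scope</key>
    <string>variable</string>
    <key>settings</key>
    <dict>
      <key>fontStyle</key>
      <string></string>
    </dict>
  </dict>
  <dict>
    <key>name</key>
    <string>Keyword</string>
    <key>scope</key>
    <string>keyword</string>
    <key>settings</key>
    <dict>
      <key>foreground</key>
      <string>#F92672</string>
XML

# The question's pattern, with the lookahead-guarded stretch captured in a
# named group so its contents can be inspected.
PATTERN = %r{
  <key>name</key>
  .*?
  <string>(?<name>.*?)</string>
  .*?                               # unguarded: may cross any </dict>
  <dict>
  (?<guarded>(?:(?!</dict>).)*?)    # guarded: may not cross </dict>
  <key>foreground</key>
  .*?
  <string>(?<color>.*?)</string>
}mx

m = PATTERN.match(SAMPLE)
p m[:name]                        # => "Variable"
p m[:color]                       # => "#F92672"
p m[:guarded].include?('</dict>') # => false
puts m[:guarded]                  # the whole Keyword rule up to <key>foreground</key>
```

The guarded group never contains </dict>, so the lookahead is satisfied, yet the match still begins at the first <key>name</key> because the regex engine reports the leftmost match it can complete.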

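Even if you add more conditions (ones that test only the structure rather than the content), **Variable** will always come before **#F92672**, because the two sections have the same structure.

So using an XML parser is probably the better choice.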
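As a rough sketch of the parser route, the example below uses REXML rather than the Nokogiri approach from Answer 1, purely so that it depends on nothing beyond what ships with Ruby. THEME_XML is a hypothetical, well-formed version of the question's snippet (closing tags and an <array> wrapper added, values shortened), and plist_pairs and foreground_colors are helper names made up for this illustration.

```ruby
require 'rexml/document'

# Hypothetical sample: the two rules from the question, closed off so the
# XML is well-formed (the question's snippet is cut off mid-rule).
THEME_XML = <<~XML
  <array>
    <dict>
      <key>name</key>
      <string>Variable</string>
      <key>scope</key>
      <string>variable</string>
      <key>settings</key>
      <dict>
        <key>fontStyle</key>
        <string></string>
      </dict>
    </dict>
    <dict>
      <key>name</key>
      <string>Keyword</string>
      <key>scope</key>
      <string>keyword</string>
      <key>settings</key>
      <dict>
        <key>foreground</key>
        <string>#F92672</string>
      </dict>
    </dict>
  </array>
XML

# Turn a plist-style <dict> (alternating <key>/value children) into a Hash
# that maps each key's text to its value element.
def plist_pairs(dict)
  dict.elements.to_a.each_slice(2).to_h { |key, value| [key.text, value] }
end

# Return [rule name, foreground colour] for every rule whose settings
# dict defines a foreground; rules without one are skipped.
def foreground_colors(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements.to_a('dict').filter_map do |rule|
    pairs = plist_pairs(rule)
    name = pairs['name']
    settings = pairs['settings']
    next unless name && settings && settings.name == 'dict'

    foreground = plist_pairs(settings)['foreground']
    [name.text, foreground.text] if foreground
  end
end

p foreground_colors(THEME_XML) # => [["Keyword", "#F92672"]]
```

Because each rule is handled as its own element, a name can only be paired with a foreground from the same rule, which is the pairing the regex could not guarantee.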

【Comments】: