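Capture the opening but not the closing tag

【Question title】: Capture opening but not the closing tag
【Posted】: 2020-05-31 01:55:01
【Question】:

I want to split a parent block into segments, capturing any nested tag along with each segment's text: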

(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)<

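It works, but I would like `tag` to be empty for segments that are not wrapped in a tag. With the current regex, those segments capture the closing tag of the previous segment instead... :(

Live example: https://regex101.com/r/UEZAaw/3/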
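
To make the problem concrete, here is a minimal sketch that runs the pattern above with `String.prototype.matchAll`. The input string is a shortened, hypothetical stand-in for the sample markup (the `example.com` URL is made up), and the `g` flag is added only so that `matchAll` can be used.

```javascript
// Hypothetical, shortened input (not the full sample from the question).
const html =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a>' +
  ' received Royal Assent. <b>rest is history</b></p>';

// The pattern from the question, with the global flag added for matchAll.
const original = /(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)</g;

// Unmatched named groups are undefined in JavaScript; map them to null
// so the objects resemble the desired result set.
const segments = [...html.matchAll(original)].map((m) => ({
  match: m[0],
  tag: m.groups.tag ?? null,
  url: m.groups.url ?? null,
  text: m.groups.text,
}));

// Logs the tags p, a, a, b: the third segment picks up the "a" of "</a>".
console.log(segments.map((s) => s.tag));
```

The third segment is plain text that follows `</a>`, yet its `tag` is `"a"`, because `(?<tag>.)` simply takes whichever character precedes `>`.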

This is the result set I want to get. Note that `tag` should be null for items 2 and 4:

{
   "0":{
      match: "p>The <",
      tag: "p",
      url: null,
      text: "The "
   },
   "1":{
      match: "a href=\"https://www.legislation.gov.uk/ukpga/2010/23/contents\">UK Bribery Act<",
      tag: "a",
      url: "https://www.legislation.gov.uk/ukpga/2010/23/contents",
      text: "UK Bribery Act"
   },
   "2":{
      match: "/a> (“the Act”) received Royal Assent in April 2010 and came into ... <",
      tag: null,
      url: null,
      text: " (“the Act”) received Royal Assent in April 2010 and came into ... "
   },
   "3":{
      match: "a href=\"http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf\">OECD anti-bribery Convention<",
      tag: "a",
      url: "http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf",
      text: "OECD anti-bribery Convention"
   },
   "4":{
      match: "/a>. The Act outlined four prime offences, including the introduction ... <",
      tag: null,
      url: null,
      text: ". The Act outlined four prime offences, including the introduction ... "
   },
   "5":{
      match: "b>rest is history<",
      tag: "b",
      url: null,
      text: "rest is history"
   }
   ...
}

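I have spent hours on this and still cannot figure it out, so I would really appreciate any advice.

【Discussion】:

Can you give an example of what should be captured and what should not?
You have given your attempt as well as the example text. However, you have not given an example of what you want to obtain (which I think is in line with what @Addis has already commented). Also, I personally find the part "but I would like [...] :(" a bit cryptic. Can you use more words to express your intent? Finally, why should we care about the example markup? How is it relevant to the question/answer?
Does it have to be a regex? Regex is generally not a great tool for parsing XML/HTML, whereas browsers ship good JavaScript tools for exactly this purpose, including the DOM and XMLDocument.
@EnricoMariaDeAngelis I have updated the question to include the result I want to get. I also wanted to include a source markup example, which really only makes sense alongside the result example.
@David784 It has to be; I am using it in my HTML to PDF parser.

Tags: javascript html regex regex-lookarounds capturing-group

【Solution 1】:

Based on what I see in the match information box on regex101, I think this works: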

/(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm
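
As a quick check, the sketch below runs this pattern against a shortened, hypothetical input (the `example.com` URL and the text are made up for illustration). It assumes a JavaScript engine that supports lookbehind assertions. Note that JavaScript reports an unmatched named group as `undefined`, so the sketch maps it to `null` to match the result set in the question.

```javascript
// Hypothetical, shortened input (not the full sample from the question).
const html =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a>' +
  ' received Royal Assent. <b>rest is history</b></p>';

// The pattern from the answer: the tag character is captured only when it
// is not preceded by "/"; a closing tag such as "/a" is consumed instead
// by the second, non-capturing alternative.
const fixed =
  /(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm;

const segments = [...html.matchAll(fixed)].map((m) => ({
  match: m[0],
  tag: m.groups.tag ?? null, // undefined when the closing-tag branch matched
  url: m.groups.url ?? null,
  text: m.groups.text,
}));

// Logs the tags p, a, null, b.
console.log(segments.map((s) => s.tag));
```

The segment after `</a>` now has a null `tag`, while its `match` still starts with `/a>`, as in the desired output.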

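【Discussion】:

Thanks mate, this works. Just one caveat: it uses a negative lookbehind, which Safari does not support :( Can you suggest a way to do it without one?
@EdmondTamas, I can't think of a way, and the reason is this: if you don't want to look behind at what precedes the tag (i.e. match something without consuming it), then you have to consume it; but if you consume the < (which is what the lookbehind matches without consuming), it will no longer be available for the next match.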
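
For readers who need to avoid lookbehind, one possible variant (an editorial sketch, not part of the answer) is to try the closing-tag branch first, so that a leading `/` is consumed before the `tag` group gets a chance to capture the following character. It relies on the scan reaching the `/` before the tag name, which holds here because each match ends on a `<`. It has only been checked against the shortened, hypothetical input below and may behave differently on other markup (for example self-closing tags), so treat it as a starting point.

```javascript
// Hypothetical, shortened input (not the full sample from the question).
const html =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a>' +
  ' received Royal Assent. <b>rest is history</b></p>';

// Assumption: no lookbehind available. The "/x" branch comes first, so a
// closing tag is consumed without populating the tag group.
const noLookbehind = /(?:\/.|(?<tag>.))(?: href="(?<url>.+?)")?>(?<text>.+?)</g;

const segments = [...html.matchAll(noLookbehind)].map((m) => ({
  match: m[0],
  tag: m.groups.tag ?? null, // undefined when the "/x" branch matched
  url: m.groups.url ?? null,
  text: m.groups.text,
}));

// Logs the tags p, a, null, b for this input.
console.log(segments.map((s) => s.tag));
```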