Using regex to extract holding company

【Question title】: Using regex to extract holding company
【Posted】: 2020-11-25 07:18:50
【Question】:

Given a string with the following structure:

" (subsidiary of <holding_company>) <post_>"

where

holding_company may contain letters and some special characters, including parentheses
post_ may contain any characters

Example string: " google (subsidiary of alphabet (inc.)) xyz"

How can I extract the holding company name using a regex?

【Comments】:

Tags: python regex python-2.7

【Solution 1】:

The regex to extract it is:

"subsidiary of\s+(.*)\)\s+\S+"

In Python 2 code, you would do the following:

import re
regex = r"subsidiary of\s+(.*)\)\s+\S+"
test_str = "\" (subsidiary of <holding_company>) <post_>\""

m = re.search(regex, test_str)

if m:
  # if the pattern was found, the company name is in group(1)
  # (the parenthesized print also runs unchanged on Python 3)
  print(m.group(1))

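See it in action here: https://repl.it/repls/ShyFocusedInstructions#main.py

【Discussion】:

Works perfectly! Thanks... but could you add some explanation of how group(1) captures the holding company?
That group method is great! The more I see of Python's re module, the more I like it.
@schwillr Regular expressions have capture groups, which let you extract parts of the matched text. A group is enclosed in parentheses, in this case (.*). There can be many groups; group(0) is always the whole match, and the numbered groups start at group(1). I suggest reading a better, more in-depth explanation online, e.g. the Python documentation (docs.python.org/2.7/library/re.html#module-re). You can also experiment online with the excellent https://regex101.com/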
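To make the group numbering concrete, here is a minimal sketch that runs the Solution 1 regex against the example string from the question. The variable names are only illustrative; the pattern and the input are taken from the thread.

```python
import re

# Example string from the question
text = " google (subsidiary of alphabet (inc.)) xyz"
# Regex from Solution 1: one capture group, (.*)
pattern = r"subsidiary of\s+(.*)\)\s+\S+"

m = re.search(pattern, text)

# group(0) is everything the whole pattern matched
whole_match = m.group(0)
# group(1) is only the text matched inside the first pair of parentheses
holding_company = m.group(1)

print(repr(whole_match))      # 'subsidiary of alphabet (inc.)) xyz'
print(repr(holding_company))  # 'alphabet (inc.)'
```

Because (.*) is greedy, it first grabs the rest of the string and then backtracks to the last ")" that is followed by whitespace and a non-space token. That is why the inner "(inc.)" stays inside the group. It also means a post_ that itself contains ") " could pull the match further right than intended.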

【Solution 2】:

This will get you there:

(?<=\(subsidiary of)(.*)(?=\) )
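As a rough illustration of how this lookaround pattern could be used from Python, the sketch below applies it to the example string from the question. The strip() call is an addition that is not part of the answer: the lookbehind ends right after "of", so the captured text starts with the space that follows it.

```python
import re

# Example string from the question
text = " google (subsidiary of alphabet (inc.)) xyz"
# Pattern from Solution 2: lookbehind and lookahead frame the capture
pattern = r"(?<=\(subsidiary of)(.*)(?=\) )"

m = re.search(pattern, text)

# The raw capture keeps the space that follows "of"
raw = m.group(1)
# Assumption: surrounding whitespace is not wanted in the result
holding_company = raw.strip()

print(repr(raw))              # ' alphabet (inc.)'
print(repr(holding_company))  # 'alphabet (inc.)'
```

The lookbehind and lookahead only assert their text without consuming it, so "(subsidiary of" and the closing ") " are not part of the match.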

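【Discussion】:

【Solution 3】:

This creates capture groups for your holding company and for the post. You may need to extend the regex to include other special characters. In case you need to extend it, here is the regex on regex101: https://regex101.com/r/xpVfqU/1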

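#!/usr/bin/python3

import re

# Sample input; named "text" to avoid shadowing the built-in str
text = " (subsidiary of <holding_company>) <post_>"

# Group 1 is the holding company, group 2 is the post
pattern = r'\s\(subsidiary\ of\ ([\w<>]*)\)\s*(.*)'

# Replace the whole match with just the group of interest
holding_company = re.sub(pattern, r'\1', text)
post = re.sub(pattern, r'\2', text)

print(holding_company)
print(post)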
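One possible way to extend the character class so it also covers the question's example, where the company name contains spaces, dots and parentheses, is sketched below. This is a variation and not the answer's exact code: the function name split_holding_and_post is hypothetical, the widened class [\w<>(). ] is an assumption about which characters are allowed, and re.search replaces re.sub so that text before "(subsidiary of" (such as " google") does not leak into the result.

```python
import re

# Assumed character set for the company name: word characters, <, >,
# parentheses, dots and spaces. Widen it further if names need more.
PATTERN = re.compile(r"\(subsidiary of ([\w<>(). ]*)\)\s*(.*)")


def split_holding_and_post(text):
    """Return (holding_company, post), or None when the pattern is absent."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2)


result = split_holding_and_post(" google (subsidiary of alphabet (inc.)) xyz")
print(result)  # ('alphabet (inc.)', 'xyz')
```

The greedy character class backtracks to the last ")" in the string, so this sketch assumes the post itself contains no ")" characters.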

【Discussion】: