【Question title】: How to create capturing groups with regex re.compile?
【Posted】: 2019-10-05 04:26:51
【Question description】:

I can find the string successfully, but I cannot split the match object into the correct groups.

The full string is as follows:

 Technology libraries: Techlibhellohellohello

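(all on one line). What I want to do is find this line in the file (this works), but when I add to the dictionary I only want to add the "Technology libraries" part and nothing else. I want to use .group() and specify which group, but only Techlibhellohellohello shows up, as group(1); no other groups appear. Also, there is leading whitespace before "Technology libraries".

The compiled pattern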

is_startline_1 = re.compile(r" Technology libraries: (.*)$")

Matching the line

startline1_match = is_startline_1.match(line)

Adding to the dictionary

bookmark_dict['context']        = startline1_match.group(1)

The desired output is for .group(1) or .group(2) to contain "Technology libraries".

【Comments】:

  • I have revised it!
  • Sorry, I am very new to Python and do not know what you mean.

Tags: python regex parsing regex-group regex-greedy


【Solution 1】:

Here, we might just want to wrap the first part in a capturing group:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(Technology libraries: )(.*)$"

test_str = "Technology libraries: Techlibhellohellohello"

subst = "\\1\\n\\2"

# Limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
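
To connect this back to the dictionary in the question, the same two-group pattern can be compiled once and read with .group(). This is only a minimal sketch: `line` is a hard-coded sample rather than a line read from a file, and using search() instead of match() is my assumption for dealing with the leading space, since this pattern does not include it.

```python
import re

# Pattern from the answer above: group 1 is the label, group 2 is the rest.
is_startline_1 = re.compile(r"(Technology libraries: )(.*)$")

# Sample line with the leading space described in the question.
line = " Technology libraries: Techlibhellohellohello"

# search() scans the whole line, so the leading space does not block the match;
# match() would return None here because it only matches at the start of the string.
startline1_match = is_startline_1.search(line)

bookmark_dict = {}
if startline1_match:  # None when the line does not fit the pattern
    bookmark_dict['context'] = startline1_match.group(1)
    all_groups = startline1_match.groups()
    print(repr(bookmark_dict['context']))
    print(all_groups)
```

Note that group(1) still ends with the colon and the space here, because both sit inside the first pair of parentheses.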

This JavaScript demo shows how the capturing groups work:

const regex = /(Technology libraries: )(.*)$/gm;
const str = `Technology libraries: Techlibhellohellohello`;
const subst = `\n$1\n$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

RegEx

If this is not the expression you want, you can modify or change it at regex101.com.

 (Technology libraries: )(.*)

RegEx Circuit

You can also visualize your expression at jex.im.

If you wish to drop the : and the space, you can simply add a middle capturing group:

(Technology libraries)(:\s+)(.*)

Python code

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(Technology libraries)(:\s+)(.*)"

test_str = ("Technology libraries: Techlibhellohellohello\n"
    "Technology libraries:     Techlibhellohellohello")

subst = "\\1\\n\\3"

# Limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
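
With three groups, the label and the value can be read directly from the match object instead of going through re.sub. The sketch below assumes a hard-coded sample line. The second pattern is a variation that is not part of the answer: it uses a non-capturing group, (?:...), so the separator is matched without taking up a group number.

```python
import re

# Sample line with a leading space and several spaces after the colon.
line = " Technology libraries:     Techlibhellohellohello"

# Three capturing groups, as in the answer: label, separator, value.
three_groups = re.compile(r"(Technology libraries)(:\s+)(.*)")
m = three_groups.search(line)
label, separator, value = m.groups()

# Variation (an assumption, not from the answer): (?:...) matches the
# separator but does not create a numbered group for it.
two_groups = re.compile(r"(Technology libraries)(?::\s+)(.*)")
m2 = two_groups.search(line)

print(label)
print(value)
print(m2.groups())
```

Whether to capture the separator is a matter of taste; the non-capturing form keeps the value at group(2), which may be easier to remember than group(3).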

JavaScript demo

const regex = /(Technology libraries)(:\s+)(.*)/gm;
const str = `Technology libraries: Techlibhellohellohello
Technology libraries:     Techlibhellohellohello`;
const subst = `\n$1\n$3`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

If you want to capture the whitespace before "Technology libraries", you can simply add it to a capturing group:

^(\s+)(Technology libraries)(:\s+)(.*)$

Python test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^(\s+)(Technology libraries)(:\s+)(.*)$"

test_str = ("    Technology libraries: Techlibhellohellohello\n"
    "       Technology libraries:     Techlibhellohellohello")

subst = "\\2\\n\\4"

# Limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
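
When scanning a file line by line, the four-group pattern can be used with match(), since the leading whitespace is now part of the pattern. The following is a rough sketch under stated assumptions: `sample_lines` stands in for the real file contents, and the unrelated middle line is there only to show that non-matching lines return None and have to be skipped.

```python
import re

# Pattern from the answer: leading whitespace, label, separator, value.
pattern = re.compile(r"^(\s+)(Technology libraries)(:\s+)(.*)$")

# Hypothetical file contents; the second entry is a deliberate non-match.
sample_lines = [
    "    Technology libraries: Techlibhellohellohello",
    "Some unrelated line",
    "       Technology libraries:     Techlibhellohellohello",
]

results = []
for line in sample_lines:
    m = pattern.match(line)
    if m is None:  # lines that do not fit the pattern are skipped
        continue
    indent, label, _separator, value = m.groups()
    results.append((len(indent), label, value))

for row in results:
    print(row)
```

Because the first group is `\s+`, a line with no indentation at all would not match; `\s*` would accept that case as well.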

JavaScript demo

const regex = /^(\s+)(Technology libraries)(:\s+)(.*)$/gm;
const str = `    Technology libraries: Techlibhellohellohello
       Technology libraries:     Techlibhellohellohello`;
const subst = `$2\n$4`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

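【Comments】:

  • Hi Emma, will this handle the leading whitespace before Technology libraries?
  • Hello, yes, I do! I will try your solution later and see whether it works for me :)
  • Hey Emma, I believe you misunderstood the whitespace before Technology: the file has a line with 5 "spaces" followed by the words Technology libraries. Your solution is very detailed and I thank you for it; I am trying to apply it now.
  • Emma, you went above and beyond, and I thank you. I am looking at what you did and, basically, all I have to do is take the line containing my re.compile and change it as follows: is_startline_1 = re.compile(r" (Technology libraries): (.*)$")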