Answers: Regex for matching any occurrence of ABC following XYZ anywhere in the string

【Question Title】: Regex for matching any occurrence of ABC following XYZ anywhere in the string
【Posted】: 2017-12-03 17:51:18
【Question Description】:

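I'm trying to write a regular expression that matches every occurrence of ABC that comes after XYZ, wherever XYZ appears in the string.

E.g. text: "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

That is, the regex should match the three ABCs after XYZ (but not the one before it).

Any clues?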

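【Question Discussion】:

What should it capture in the sample text? "followed by multiple ABC, more ABC, more ABC", or each individual ABC?
It should capture all the ABCs.
Is the point to count the occurrences?

Tags: python regex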

【Solution 1】:

You can take an iterative approach:

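import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

# (?<=XYZ) requires XYZ immediately before the match; (.*?) lazily captures the
# text up to the first ABC after XYZ, and \1 puts that text back unchanged.
pattern = re.compile(r'(?<=XYZ)(.*?)ABC')
while pattern.search(s):
    # Each pass replaces one more ABC; the loop stops when none follows XYZ.
    s = pattern.sub(r'\1REPLACED', s)

print(s)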

Output:

Some ABC text followed by XYZ followed by multiple REPLACED, more REPLACED, more REPLACED
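A minimal sketch of the same iterative lookbehind idea wrapped in a reusable function. The name replace_after is hypothetical, and the sketch assumes the replacement text does not itself contain the target; if it did, every pass of the while loop would create a new match and the loop would never end, so the function rejects that obvious case up front.

```python
import re


def replace_after(marker, target, replacement, text):
    """Hypothetical helper: replace each `target` that follows `marker`,
    using the iterative lookbehind approach from the answer above."""
    if target in replacement:
        # Guards the obvious infinite-loop case: each pass would re-create a match.
        raise ValueError("replacement must not contain target")
    # re.escape keeps marker/target literal even if they contain regex metacharacters.
    pattern = re.compile(r'(?<=%s)(.*?)%s' % (re.escape(marker), re.escape(target)))
    while pattern.search(text):
        # A function replacement avoids backslash escapes in `replacement`.
        text = pattern.sub(lambda m: m.group(1) + replacement, text)
    return text


s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"
print(replace_after("XYZ", "ABC", "REPLACED", s))
# Some ABC text followed by XYZ followed by multiple REPLACED, more REPLACED, more REPLACED
```

Note that "." does not match newlines by default, so for multi-line input the pattern would need re.DOTALL.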

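【Discussion】:

Thanks a lot for your help. This is exactly what I was looking for.
@roopesh Please don't forget to accept or upvote helpful answers.
@Hans Done. (Although I don't have enough reputation to upvote.)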

【Solution 2】:

Just match the literal XYZ and put the repeated ABC in a group:

r'XYZ((?:ABC)+)'

The (?:ABC)+ pattern matches the literal characters ABC one or more times in a row, and the whole group must be preceded by a literal XYZ.
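A quick check of what r'XYZ((?:ABC)+)' actually captures. The first test string is made up for illustration; the second is the question's sample text. The + only repeats ABCs that sit directly next to each other and directly after XYZ.

```python
import re

# Solution 2's pattern: a literal XYZ followed by one or more *adjacent* ABCs.
pattern = re.compile(r'XYZ((?:ABC)+)')

m = pattern.search("XYZABCABC and later ABC")
print(m.group(0))  # XYZABCABC  (the whole match includes XYZ)
print(m.group(1))  # ABCABC     (only the contiguous run right after XYZ)

# In the question's text the ABCs are not attached to XYZ,
# so this pattern finds no match at all.
s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"
print(pattern.search(s))  # None
```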

This is very basic regex 101 material; you should read a good tutorial on regular expression matching to get started.

【Discussion】:

【Solution 3】:

Something like this? r"(?<=XYZ)((?:ABC)+)". This matches only ABC that appears right after XYZ, without including XYZ itself in the match.
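A small demonstration of the lookbehind, using an input string invented for illustration. Like Solution 2's pattern, it still only matches ABCs that directly follow XYZ; the difference is that XYZ is checked but not consumed.

```python
import re

# The lookbehind (?<=XYZ) asserts that XYZ precedes the match
# without making XYZ part of the match.
pattern = re.compile(r"(?<=XYZ)((?:ABC)+)")

print(pattern.findall("XYZABCABC ABC"))  # ['ABCABC'] (the detached " ABC" is ignored)
m = pattern.search("XYZABCABC ABC")
print(m.group(0), m.start())  # ABCABC 3
```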

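Edit

It seems I misunderstood the OP's original question. The simplest approach is to first find the string XYZ and save its start position. Then pass that start position as the extra argument to p.finditer(string, startpos) (the parameter is documented as pos). Note that this only works with compiled regular expressions, because the module-level re.finditer() takes flags in that position instead, so you need to compile your pattern first.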

The pattern you need is then just r"(ABC)".
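A sketch of the steps described in the edit, using the question's sample string. The variable names start and matches are illustrative; the pos argument of a compiled pattern's finditer() is real, which is why the pattern is compiled first.

```python
import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

# Step 1: locate the marker; str.find returns -1 if it is absent.
start = s.find("XYZ")

# Step 2: compile the simple pattern (only compiled patterns accept pos).
p = re.compile(r"(ABC)")

matches = []
if start != -1:
    # Step 3: start scanning at the marker, so ABCs before it are skipped.
    matches = [m.span() for m in p.finditer(s, start)]

print(len(matches))  # 3
```

Unlike slicing the string first, pos keeps the original indices, so each m.span() points directly into s.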

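Alternatively, you can use p.sub(), which also performs the replacement, but to make it act on only part of the string you need to create a substring first, because p.sub() has no startpos parameter.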
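A minimal sketch of the substring approach, again with the question's sample text. The names idx, head and tail are illustrative: the string is split at XYZ, only the tail is substituted, and the untouched head is glued back on.

```python
import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"
p = re.compile(r"ABC")

idx = s.find("XYZ")
if idx != -1:
    # Pattern.sub has no start-position argument, so substitute
    # only in the part of the string that begins at XYZ.
    head, tail = s[:idx], s[idx:]
    s = head + p.sub("REPLACED", tail)

print(s)
# Some ABC text followed by XYZ followed by multiple REPLACED, more REPLACED, more REPLACED
```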

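【Discussion】:

Tried it with Python's re module, but it doesn't seem to work. Code: {for i in re.finditer(r"(?
Do you mean that a string like "Some ABC ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC" should return only the last three ABC instances?
Yes, only the ABCs after XYZ: the last three.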

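【Solution 4】:

There's a nifty Counter object in collections that might help. A Counter is a dictionary whose keys are the individual items and whose values are their counts. Example: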

Counter('hello there hello'.split())  # Counter({'hello': 2, 'there': 1})

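Since we want to count words, we have to split the phrase wherever there is whitespace, which is the default behavior of the split method. Here is an example script that uses Counter; the second half can be turned into a function if needed.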

from collections import Counter

def count_frequency(phrase):
    """ Return a Counter mapping {word: num_of_occurrences} """
    counts = Counter(phrase.split())
    return counts

def replace_word(target_word, replacement, phrase):
    """ Replace every whole word *target_word* with *replacement* in string *phrase* """
    words = phrase.split()

    for index, word in enumerate(words):
        if word == target_word:
            words[index] = replacement

    # Rejoin with single spaces (''.join would glue the words together)
    return ' '.join(words)

phrase = "hello there hello hello"
word_counts = count_frequency(phrase)
replacement = 'replaced'

# Replace every word that occurs more than twice
for word in word_counts:
    if word_counts[word] > 2:
        phrase = replace_word(word, replacement, phrase)

print(phrase)  # replaced there replaced replaced
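Applied to the question's sample string, one caveat of split() shows up: punctuation stays attached, so "ABC," and "ABC" are counted as different words. The sketch below (an illustrative variation, not part of the answer) keeps only the text after the first XYZ and swaps split() for re.findall(r"\w+", ...) so that only word characters form tokens.

```python
import re
from collections import Counter

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

# Keep only the text after the first XYZ (empty if XYZ is absent).
idx = s.find("XYZ")
after = s[idx + len("XYZ"):] if idx != -1 else ""

# split() keeps punctuation, so "ABC," and "ABC" end up as different keys.
by_split = Counter(after.split())
print(by_split["ABC"], by_split["ABC,"])  # 1 2

# Tokenizing on word characters counts all three ABCs together.
counts = Counter(re.findall(r"\w+", after))
print(counts["ABC"])  # 3
```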

【Discussion】: