Getting the match number when passing a function in re.sub [duplicate]: answers

【Question title】: Getting the match number when passing a function in re.sub [duplicate]
【Posted】: 2020-04-07 18:06:11
【Question】:

When passing a function to re.sub:

import re
def custom_replace(match):
    # how to get the match number here? i.e. 0, 1, 2
    return 'a'
print(re.sub(r'o', custom_replace, "oh hello wow"))

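How can I get the match number inside custom_replace?

That is, 0, 1, 2 for the three "o"s in the example input string.

Note: I don't want to use a global variable for this, because several such operations may run at the same time in different threads, etc.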

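【Question comments】:

What do you mean by match number?
@Jan I mean 0, 1, 2 for the three "o"s in the example input string.
Does it have to be re.sub only?
@Ch3steR I'm open to other solutions as well.

Tags: python regex python-re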

【Solution 1】:

Based on @Barmar's answer, I tried this:

import re

def custom_replace(match, matchcount):
    result = 'a' + str(matchcount.i)
    matchcount.i += 1
    return result

def any_request():
    matchcount = lambda: None  # an empty "object", see https://stackoverflow.com/questions/19476816/creating-an-empty-object-in-python/37540574#37540574
    matchcount.i = 0           # benefit: it's a local variable that we pass to custom_replace "by reference"
    print(re.sub(r'o', lambda match: custom_replace(match, matchcount), "oh hello wow"))
    # a0h hella1 wa2w

any_request()

It seems to work.

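Reason: I'm a bit reluctant to use a global variable for this, because I'm using it inside a route function in a web framework (called any_request() here).
Assuming there are many parallel requests (in threads), I don't want a global variable to get "mixed" between different calls (since the operations are probably not atomic?)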
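
The same per-call state can also be kept in a closure instead of attributes attached to a lambda. The following is only a minimal sketch of that idea: make_numbering_replacer and its prefix and start parameters are hypothetical names introduced for illustration, not part of the original answer.

```python
import itertools
import re


def make_numbering_replacer(prefix="a", start=0):
    """Return a fresh replacement callback that owns its own counter (hypothetical helper)."""
    counter = itertools.count(start)  # state lives in this closure only

    def replace(match):
        # next(counter) yields start, start + 1, ... once per match
        return prefix + str(next(counter))

    return replace


def any_request():
    # A new callback is built on every call, so concurrent requests never share a counter.
    return re.sub(r"o", make_numbering_replacer(), "oh hello wow")


print(any_request())  # a0h hella1 wa2w
```

Because the counter is created inside the factory, nothing has to be reset between calls, and no module-level name is touched.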

【Comments】:

【Solution 2】:

There doesn't seem to be a built-in way. You can use a global variable as the counter.

import re

def custom_replace(match):
    global match_num  # module-level counter shared by every call
    result = 'a' + str(match_num)
    match_num += 1
    return result

match_num = 0  # reset before each re.sub() call
print(re.sub(r'o', custom_replace, "oh hello wow"))

The output is

a0h hella1 wa2w

Don't forget to reset match_num to 0 before each call to re.sub() that uses this function.
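
If resetting a global before every call feels fragile, one possible variation is to keep the counter on a callable object and create a new instance per substitution. This is a sketch under that assumption; the CountingReplacer class is a hypothetical name, not something provided by the re module.

```python
import re


class CountingReplacer:
    """Replacement callback that keeps its match counter on the instance (hypothetical class)."""

    def __init__(self, prefix="a"):
        self.prefix = prefix
        self.match_num = 0

    def __call__(self, match):
        # re.sub calls the instance once per match, left to right
        result = self.prefix + str(self.match_num)
        self.match_num += 1
        return result


replacer = CountingReplacer()
print(re.sub(r"o", replacer, "oh hello wow"))  # a0h hella1 wa2w
print(replacer.match_num)                      # 3
```

A side benefit is that the total number of matches can be read from the instance after the substitution.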

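【Comments】:

Or set it to 0 right before the call, as shown above.
Thanks for your answer @Barmar. I'm a bit reluctant to use a global variable for this, because I'm using it in a route request inside a web framework. Assuming there are many parallel requests (in threads?), I don't want this global variable to get "mixed" between different calls (not atomic?).
What do you think about this: stackoverflow.com/questions/61086537/…? Too bad there is no built-in attribute on match that gives the match number :)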

【Solution 3】:

You can use re.search together with re.sub.

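import re

def count_sub(pattern, text, repl=''):
    count = 1
    # Caution: the modified text is re-scanned on every pass, so the loop
    # only terminates if repl + str(count) never matches the pattern itself.
    while re.search(pattern, text):
        text = re.sub(pattern, repl + str(count), text, count=1)  # replace the first remaining match
        count += 1
    return text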

Output:

count_sub(r'o', 'oh hello world')
# '1h hell2 w3rld'

count_sub(r'o', 'oh hello world','a')
# 'a1h hella2 wa3rld'

Alternative:

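def count_sub1(pattern, text, repl=''):
    # Number the matches found in the original text, starting at 1.
    it = enumerate(re.finditer(pattern, text), 1)
    count = 1
    while count:
        # Once the iterator is exhausted, count becomes 0 and the loop ends;
        # that last re.sub is a no-op as long as no match remains in text.
        count, _ = next(it, (0, 0))
        text = re.sub(pattern, repl + str(count), text, count=1)
    return text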

Output:

count_sub1(r'o','oh hello world')
# '1h hell2 w3rld'

count_sub1(r'o','oh hello world','a')
# 'a1h hella2 wa3rld'

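【Comments】:

Thanks for your answer. Isn't it a problem to modify text while looping on re.search(pattern, text)? Isn't that dangerous?
@Basj I don't know, TBH. But I added another alternative. I'm not a regex expert. How could it be dangerous? If there are any links or resources, please point me to them.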
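
On the question of danger: a loop that re-scans the modified text can run forever when the replacement itself matches the pattern (for example, pattern r'\d' with numeric replacements). One way to sidestep this, sketched below under the assumption that the same 1-based numbering is wanted, is to walk the matches of the original text once with re.finditer and assemble the result from slices. The name count_sub_single_pass is hypothetical.

```python
import re


def count_sub_single_pass(pattern, text, repl=""):
    """Number every match in one left-to-right pass over the original text."""
    pieces = []
    last_end = 0
    for number, match in enumerate(re.finditer(pattern, text), 1):
        pieces.append(text[last_end:match.start()])  # unchanged text before the match
        pieces.append(repl + str(number))            # numbered replacement
        last_end = match.end()
    pieces.append(text[last_end:])                   # tail after the last match
    return "".join(pieces)


print(count_sub_single_pass(r"o", "oh hello world", "a"))  # a1h hella2 wa3rld
print(count_sub_single_pass(r"\d", "x7y8"))                # x1y2
```

Since the text being scanned is never modified, replacements cannot be matched again, and the string is traversed only once.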