String substitutions based on the matching object (Python): answers

【Question title】: String substitutions based on the matching object (Python)
【Posted】: 2016-11-24 14:55:56
【Question】:

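I am having a hard time understanding the group method in Python's regular expression library. In this case, I am trying to do substitutions on a string based on the matched object.

That is, I want to replace the matched objects (+ and \n in this example) with a particular string from the my_dict dictionary (rep1 and rep2 respectively).

Going by this question and answer, I tried this: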

content = '''
Blah - blah \n blah * blah + blah.
'''

regex = r'[+\-*/]'

for mobj in re.finditer(regex, content):
    t = mobj.lastgroup
    v = mobj.group(t)

    new_content = re.sub(regex, repl_func(mobj), content)

def repl_func(mobj):
    my_dict = { '+': 'rep1', '\n': 'rep2'}
    try:
        match = mobj.group(0)
    except AttributeError:
        match = ''
    else:
        return my_dict.get(match, '')

print(new_content)

However, I get None for t, followed by an IndexError when v is computed.

Any explanation and example code would be much appreciated.

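【Comments】:

It's hard to guess what your code is supposed to do (lots of syntax errors, broken indentation, unclear logic). It would be better to provide an example that describes what you want to achieve.
@TomR8 Sorry! I have fixed all the syntax problems and typos (hopefully).

Tags: python regex python-3.x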

【Solution 1】:

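Although Wiktor has given the truly Pythonic answer, the question of why the OP's original algorithm does not work remains. There are basically 2 problems:

1. The call new_content = re.sub(regex, repl_func(mobj), content) evaluates repl_func(mobj) first, which yields a plain string, so every match of regex is replaced with the replacement value of that single match.

The correct call is new_content = re.sub(regex, repl_func, content). As documented for re.sub, repl_func is then called dynamically with the current match object for each match!

2. repl_func(mobj) does some unnecessary exception handling and can be simplified: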

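# Replacement table: matched character -> replacement text
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}

def repl_func(mobj):
    # Reading a module-level name needs no 'global' declaration
    return my_dict.get(mobj.group(0), '')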

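This is equivalent to Wiktor's solution; he simply got rid of the function definition itself by using a lambda expression.

With this change, the for mobj in re.finditer(regex, content): loop becomes redundant, because it repeats the same computation on every iteration.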
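To make the difference between the two calls concrete, here is a minimal sketch. The sample string in content and the reduced my_dict are illustrative assumptions, not the OP's data.

```python
import re

regex = r'[+\-*/]'
my_dict = {'+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
content = "A*B+C"  # illustrative sample input (assumption)

def repl_func(mobj):
    # Look up the replacement for the text of this particular match
    return my_dict.get(mobj.group(0), '')

first = next(re.finditer(regex, content))  # match object for '*'

# repl_func(first) is evaluated up front and yields the string 'rep2',
# so re.sub uses that one string for every match.
as_string = re.sub(regex, repl_func(first), content)

# Passing the function itself lets re.sub call it once per match.
as_callable = re.sub(regex, repl_func, content)

print(as_string)    # Arep2Brep2C
print(as_callable)  # Arep2Brep1C
```

Only the second call gives each operator its own replacement, which is why no surrounding loop is needed.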

For the sake of completeness, here is a working solution that uses re.finditer(). It builds the result string from the matched slices of content:

import re

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
content = "A*B+C-D/E"
res = ""
cbeg = 0
for mobj in re.finditer(my_regx, content):
    # get matched string and its slice indexes
    mstr = mobj.group(0)
    mbeg = mobj.start()
    mend = mobj.end()

    # replace matched string
    mrep = my_dict.get(mstr, '')

    # append non-matched part of content plus replacement
    res += content[cbeg:mbeg] + mrep

    # set new start index of remaining slice
    cbeg = mend

# finally add remaining non-matched slice
res += content[cbeg:]
print(res)
# => Arep2Brep1Crep4Drep3E
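As a quick sanity check, the slice-based loop can be wrapped in a function and compared with the re.sub version. This is only a sketch: the helper names replace_with_finditer and replace_with_sub are hypothetical, and the pieces are collected in a list and joined instead of using += on a string.

```python
import re

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}

def replace_with_finditer(text):
    # Hypothetical helper: same slice-based algorithm as the loop above
    parts = []
    cbeg = 0
    for mobj in re.finditer(my_regx, text):
        parts.append(text[cbeg:mobj.start()])          # non-matched slice
        parts.append(my_dict.get(mobj.group(0), ''))   # replacement
        cbeg = mobj.end()
    parts.append(text[cbeg:])                          # remaining tail
    return ''.join(parts)

def replace_with_sub(text):
    # Hypothetical helper: let re.sub call the lookup once per match
    return re.sub(my_regx, lambda m: my_dict.get(m.group(0), ''), text)

sample = "A*B+C-D/E"
print(replace_with_finditer(sample))  # Arep2Brep1Crep4Drep3E
print(replace_with_finditer(sample) == replace_with_sub(sample))  # True
```

Both functions should agree on any input, so the re.sub form is usually the one to prefer; the explicit loop is mainly useful when the match positions are needed for something else.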

【Comments】:

【Solution 2】:

The r'[+\-*/]' regex does not match a newline, so your '\n': 'rep2' entry is never used. To match newlines as well, add \n to the regex: r'[\n+*/-]'.

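Next, you get None because your regex does not contain any named capturing groups; see the re docs:

match.lastgroup
The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.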
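The behaviour of lastgroup can be seen in a small sketch. The sample string and the group names plus and minus are assumptions chosen for illustration; they do not appear in the question.

```python
import re

content = "a + b"  # illustrative sample input (assumption)

# No capturing groups at all: lastgroup is None
plain = re.search(r'[+\-*/]', content)
print(plain.lastgroup)  # None
print(plain.group(0))   # +

# Passing None on to group() is what triggers the IndexError
try:
    plain.group(plain.lastgroup)
except IndexError as exc:
    print("IndexError:", exc)

# One named group per alternative: lastgroup names the one that matched
named = re.search(r'(?P<plus>\+)|(?P<minus>-)', content)
print(named.lastgroup)               # plus
print(named.group(named.lastgroup))  # +
```

So mobj.group(mobj.lastgroup) only makes sense when every alternative of the pattern is wrapped in a named group; for a plain character class, mobj.group(0) is the call to use.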

To replace based on the match, you don't even need re.finditer; use re.sub with a lambda as the replacement:

import re
content = '''
Blah - blah \n blah * blah + blah.
'''

regex = r'[\n+*/-]'
my_dict = { '+': 'rep1', '\n': 'rep2'}
new_content = re.sub(regex, lambda m: my_dict.get(m.group(),""), content)
print(new_content)
# => rep2Blah  blah rep2 blah  blah rep1 blah.rep2

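See the Python demo.

m.group() returns the whole match (the whole match is stored in match.group(0)). If the pattern contains a pair of unescaped parentheses, it creates a capturing group, and you can access the first one with m.group(1), and so on.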
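A short sketch of the numbering, using a pattern and input made up for illustration:

```python
import re

# Three pairs of unescaped parentheses create groups 1, 2 and 3
m = re.search(r'(\w+)\s*([+\-*/])\s*(\w+)', "Blah + blah")

print(m.group())   # Blah + blah   (same as m.group(0))
print(m.group(1))  # Blah
print(m.group(2))  # +
print(m.groups())  # ('Blah', '+', 'blah')

# Escaped parentheses match literal characters and create no group
lit = re.search(r'\((\d+)\)', "(42)")
print(lit.group(0))  # (42)
print(lit.group(1))  # 42
```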

【Comments】: