【Question title】: Splitting on every character except for a preserved substring
【Posted】: 2017-03-31 03:38:02
【Question】:

Given a string

word = "These"

and a tuple

pair = ("h", "e")

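the goal is to split word on every character except where the two characters of the pair tuple occur next to each other, i.e. the output should be: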

('T', 'he', 's', 'e')

I've tried:

import re

word = 'These'
pair = ('h', 'e')
first, second = pair
pair_str = ''.join(pair)
pair_str = pair_str.replace('\\','\\\\')
pattern = re.compile(r'(?<!\S)' + re.escape(first + ' ' + second) + r'(?!\S)')
new_word = ' '.join(word)
new_word = pattern.sub(pair_str, new_word)
result = tuple(new_word.split())

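Note that the pair tuple can sometimes contain slashes, backslashes or other characters that need escaping, hence the replace and escape calls around the regex above.

Is there a simpler way to achieve the same string replacement?

EDITED

Details from the comments:

Is there any difference between the two characters in the pair being unique or not?

No, they should be treated the same way.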

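【Question discussion】:

You don't need to do pair_str.replace('\\','\\\\'), because re.escape does that.
What are first and second?
Note that I am using from __future__ import unicode_literals.
If the code works and you are looking for advice on how to redesign it, CodeReview.stackexchange.com is the right place. SO is for help fixing code that doesn't work at all.
Why would it be ['T', 'he', 's', 'e', 'aa']? The pair is only ('h', 'e'), so it should be ['T', 'he', 's', 'e', 'a', 'a']

Tags: python regex string preserve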

【Solution 1】:

Match instead of splitting:

import re

pattern = re.escape(''.join(pair)) + '|.'
result = tuple(re.findall(pattern, word))

The pattern is <pair>|., which matches the pair where possible and a single character otherwise*.
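
As a minimal sketch of the alternation described above, the pattern can be wrapped in a small helper. The name split_keep_pair is hypothetical and not part of the original answer. Regex alternation tries the left branch first, so the pair wins wherever it occurs, and re.escape keeps a backslash or metacharacter in the pair literal.

```python
import re

def split_keep_pair(word, pair):
    """Split word into single characters, keeping the joined pair intact."""
    # re.escape makes the pair literal, so '.', '\\', '/' and so on are safe.
    pattern = re.escape(''.join(pair)) + '|.'
    return tuple(re.findall(pattern, word))

print(split_keep_pair('These', ('h', 'e')))    # ('T', 'he', 's', 'e')
print(split_keep_pair('a\\.b', ('\\', '.')))   # ('a', '\\.', 'b')
```

Characters that are not part of the pair are always returned one at a time, so 'Theseaa' ends in 'a', 'a' rather than 'aa'.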

You can also do it without regular expressions:

import itertools

non_pairs = word.split(''.join(pair))
result = [(''.join(pair),)] * (2 * len(non_pairs) - 1)
result[::2] = non_pairs
result = tuple(itertools.chain(*result))

* However, . does not match newlines; if your input contains them, pass re.DOTALL as the third argument to re.findall.
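
To illustrate the newline caveat, here is a small sketch using a made-up input that contains a newline. Without the flag the newline silently disappears from the result, because neither branch of the pattern can match it. With re.DOTALL it comes back as its own element.

```python
import re

word = 'The\nse'   # hypothetical input containing a newline
pair = ('h', 'e')
pattern = re.escape(''.join(pair)) + '|.'

# Default behaviour: '.' skips the newline, so it is lost.
without_flag = tuple(re.findall(pattern, word))
# With re.DOTALL, '.' also matches '\n'.
with_flag = tuple(re.findall(pattern, word, re.DOTALL))

print(without_flag)  # ('T', 'he', 's', 'e')
print(with_flag)     # ('T', 'he', '\n', 's', 'e')
```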

【Discussion】:

【Solution 2】:

You can do this without regular expressions:

import functools

word = 'These here when she'
pair = ('h', 'e')
digram = ''.join(pair)
parts = map(list, word.split(digram))
lex = lambda pre,post: post if pre is None else pre+[digram]+post

print(functools.reduce(lex, parts, None))
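
The reduce call can be hard to follow, so here is a hedged, unrolled sketch of the same idea as a plain loop: split on the joined pair, then put the pair back between neighbouring pieces. The helper name split_keep_digram is hypothetical, and it returns a tuple only to match the output format asked for in the question.

```python
def split_keep_digram(word, pair):
    """Split word into characters, keeping each occurrence of the pair whole."""
    digram = ''.join(pair)
    result = []
    for i, piece in enumerate(word.split(digram)):
        if i > 0:
            # Every boundary between two pieces was an occurrence of the digram.
            result.append(digram)
        # Extending a list with a string adds it one character at a time.
        result.extend(piece)
    return tuple(result)

print(split_keep_digram('These', ('h', 'e')))  # ('T', 'he', 's', 'e')
```

This assumes the pair is non-empty, since str.split raises ValueError on an empty separator.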

【Discussion】: