How can I reverse parts of a sentence in Python? Answers

【Question title】: How can I reverse parts of sentence in python?
【Posted】: 2016-09-07 00:21:28
【Question description】:

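I have a sentence, let's say:

The quick brown fox jumps over the lazy dog

I want to create a function that takes 2 arguments: a sentence and a list of things to ignore. It should return the sentence with every word reversed, except for whatever I pass in the second argument. This is what I have at the moment: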

def main(sentence, ignores):
    return ' '.join(word[::-1] if word not in ignores else word for word in sentence.split())

But this only works if I pass the second list like this:

print(main('The quick brown fox jumps over the lazy dog', ['quick', 'lazy']))

However, I want to pass a list like this:

print(main('The quick brown fox jumps over the lazy dog', ['quick brown', 'lazy dog']))

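Expected result: ehT quick brown xof spmuj revo eht lazy dog

So basically the second argument (the list) will contain parts of the sentence that should be ignored, not just single words.

Do I have to use regular expressions for this? I was trying to avoid them...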
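
A small check, not part of the original post, that may make the failure concrete: `split()` only ever yields single words, so no item it produces can equal a two-word phrase in `ignores`. The variable names are purely illustrative.

```python
sentence = 'The quick brown fox jumps over the lazy dog'
ignores = ['quick brown', 'lazy dog']

# split() breaks on whitespace, so every item is a single word.
words = sentence.split()

# No single word can equal a phrase that itself contains a space,
# so the membership test in the original one-liner never succeeds.
matches = [word for word in words if word in ignores]
print(matches)

# The phrases are present in the sentence, just not as split() items.
present = [phrase for phrase in ignores if phrase in sentence]
print(present)
```

This suggests the matching has to happen on the whole string (or on spans within it) before the sentence is cut into words.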

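【Question discussion】:

Are you ignoring words or whole phrases?
I'm trying to ignore whole phrases.
So if brown appeared later, and not right after quick, it should be reversed?
If it is only 'brown', it should be reversed. 'quick brown' should not be reversed.
So it's almost like creating temporary placeholders for those ignores. Maybe I should do that? Just use replace to put {} placeholders in the text, then call format(ignores) afterwards.

Tags: python python-3.x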

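【Solution 1】:

I'm usually the first to suggest avoiding regular expressions, but in this case the complexity of doing without them is greater than the complexity they add: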

import re

def main(sentence, ignores):
    # Dedup and allow fast lookup for determining whether to reverse a component
    ignores = frozenset(ignores)

    # Make a pattern that will prefer matching the ignore phrases, but
    # otherwise matches each space and non-space run (so nothing is dropped)
    # Alternations match the first pattern by preference, so you'll match
    # the ignores phrases if possible, and general space/non-space patterns
    # otherwise
    pat = r'|'.join(map(re.escape, ignores)) + r'|\S+|\s+'

    # Returns the chopped-up pieces (space and non-space runs, but ignore phrases stay together)
    parts = re.findall(pat, sentence)

    # Reverse everything not found in ignores and then put it all back together
    return ''.join(p if p in ignores else p[::-1] for p in parts)

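【Discussion】:

Why frozenset and not just set?
@StefanPochmann: No reason at all, other than that we won't modify it later. In practice I don't think it makes a difference (current versions of Python have no special optimizations for frozenset; set and frozenset share the same internals, so if you never mutate it, the only difference is that a frozenset can be used as a dict key or set member), but I use frozenset whenever I don't intend to mutate. It costs nothing to do so and can always be changed if needed. If the phrases might be sub-phrases of one another, you'd want a collections.OrderedDict to preserve order.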
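
The sub-phrase remark matters because regex alternation tries its branches left to right, and a set does not guarantee any order. Below is a minimal sketch of an order-aware variant. It is not the answerer's code: the name `reverse_except` is hypothetical, and sorting the phrases longest-first (rather than keeping the caller's order in an `OrderedDict`) is an assumption about what "prefer the longer phrase" should mean. It also skips empty phrases, which would otherwise add an empty alternative to the pattern.

```python
import re

def reverse_except(sentence, ignores):
    # dict.fromkeys dedups while keeping the caller's order (Python 3.7+);
    # the stable sort then puts longer phrases first, so 'quick brown fox'
    # is tried before its sub-phrase 'quick brown'.
    phrases = sorted((p for p in dict.fromkeys(ignores) if p), key=len, reverse=True)
    lookup = set(phrases)

    # Ignore phrases come first in the alternation; the fallbacks match
    # any run of non-space or space characters so nothing is dropped.
    pat = '|'.join([re.escape(p) for p in phrases] + [r'\S+', r'\s+'])

    parts = re.findall(pat, sentence)
    return ''.join(p if p in lookup else p[::-1] for p in parts)

print(reverse_except('The quick brown fox jumps over the lazy dog',
                     ['quick brown', 'quick brown fox']))
```

With both phrases supplied, the longer one is kept intact regardless of the order in which the list was written.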

【Solution 2】:

Just another idea: reverse every word, then reverse the ignored phrases back again:

>>> from functools import reduce
>>> def main(sentence, ignores):
        def r(s):
            return ' '.join(w[::-1] for w in s.split())
        return reduce(lambda s, i: s.replace(r(i), i), ignores, r(sentence))

>>> main('The quick brown fox jumps over the lazy dog', ['quick brown', 'lazy dog'])
'ehT quick brown xof spmuj revo eht lazy dog'

【Discussion】:

@PadraicCunningham You underestimate me. For example, how about renaming r and all of its variables to _? That is, def _(_): return ' '.join(_[::-1] for _ in _.split())?

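【Solution 3】:

Rather than using placeholders, why not first reverse any phrase that you want to come out the right way round, and then reverse every word in the whole string: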

def main(sentence, ignores):
    for phrase in ignores:
        reversed_phrase = ' '.join([word[::-1] for word in phrase.split()])
        sentence = sentence.replace(phrase, reversed_phrase)

    return ' '.join(word[::-1] for word in sentence.split())

print(main('The quick brown fox jumps over the lazy dog', ['quick', 'lazy']))
print(main('The quick brown fox jumps over the lazy dog', ['quick brown', 'lazy dog']))

ehT quick nworb xof spmuj revo eht lazy god
ehT quick brown xof spmuj revo eht lazy dog

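【Discussion】:

What if you have ['lazy dog', 'quick brown']?
This isn't a bad idea, but it does require the list of phrases to match the sentence structure exactly, both in number of repetitions (you can't ignore all instances of a phrase, only one at a time) and in order (ignores must exactly match the order seen in the sentence).
I think you could first find the index at which each phrase occurs in the sentence, sort ignores by that index, and then do the replacements.
@SuperShoot: That still assumes there are no repeated phrases.
Yes, you are right. Maybe there is a solution using keyword placeholders, so that .format() replaces all instances and the order doesn't matter. It's a bit convoluted though.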
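
One way the placeholder idea from these comments could look is sketched below. It is only an illustration, not code from the thread: the function name `reverse_with_placeholders` is hypothetical, and it swaps `.format()` for control-character tokens so that literal braces in the sentence need no escaping. It assumes neither the sentence nor the phrases contain the characters `\x00` or `\x01`.

```python
import re

def reverse_with_placeholders(sentence, ignores):
    # Token for phrase i: \x00, then (i + 1) copies of \x01, then \x00.
    # Assumption: these control characters never occur in the input.
    for i, phrase in enumerate(ignores):
        if phrase:  # replacing '' would insert a token between every character
            sentence = sentence.replace(phrase, '\x00' + '\x01' * (i + 1) + '\x00')

    # The capturing group makes re.split keep the tokens in the result.
    pieces = re.split(r'(\x00\x01+\x00)', sentence)

    out = []
    for piece in pieces:
        if piece.startswith('\x00'):
            # Token length is i + 3, which recovers the phrase index.
            out.append(ignores[len(piece) - 3])
        else:
            # Reverse each non-space run in place, keeping the whitespace.
            out.append(re.sub(r'\S+', lambda m: m.group()[::-1], piece))
    return ''.join(out)

print(reverse_with_placeholders('The quick brown fox jumps over the lazy dog',
                                ['lazy dog', 'quick brown']))
```

Because `str.replace` substitutes every occurrence and each token carries its own index, neither repeated phrases nor the order of `ignores` should matter here. Overlapping phrases are still not handled: whichever phrase is replaced first wins.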

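【Solution 4】:

I have tried to address the problem of overlapping ignore phrases, e.g. ['brown fox', 'quick brown'], raised by @PadraicCunningham.

There is obviously a lot more looping, and the code feels less Pythonic, so I'd be interested in feedback on how to improve this.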

import re

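def _span_combiner(spans):
    """Replace overlapping spans with a single encompassing span.

    Expects spans sorted by start index; yields one (start, end) pair per input span.
    """
    for i, s in enumerate(spans):
        start, end = s
        for x in spans[i:]:
            if x[0] < end:
                # max() keeps a nested span from shrinking the merged range
                end = max(end, x[1])
        yield (start, end)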

def main(sentence, ignores):
    # spans holds the (start, end) indices of the first occurrence of each ignore phrase, in order of occurrence
    spans = sorted(
            # re.escape so that phrases containing regex metacharacters are matched literally
            re.search(re.escape(p), sentence).span() for p in ignores if p in sentence
    )
    # replace overlapping indices with a single set of indices encompassing the overlapped range
    spans = list(_span_combiner(spans))
    # recreate the ignore list by slicing the sentence with the combined spans
    ignores = [sentence[s[0]:s[1]] for s in spans]
    for phrase in ignores:
        reversed_phrase = ' '.join([word[::-1] for word in phrase.split()])
        sentence = sentence.replace(phrase, reversed_phrase)

    return ' '.join(word[::-1] for word in sentence.split())

if __name__ == "__main__":
    print(main('The quick brown fox jumps over the lazy dog', ['quick', 'lazy']))
    print(main('The quick brown fox jumps over the lazy dog', ['brown fox', 'lazy dog']))
    print(main('The quick brown fox jumps over the lazy dog', ['nonexistent' ,'brown fox', 'quick brown']))
    print(main('The quick brown fox jumps over the brown fox', ['brown fox', 'quick brown']))

Results:

ehT quick nworb xof spmuj revo eht lazy god
ehT kciuq brown fox spmuj revo eht lazy dog
ehT quick brown fox spmuj revo eht yzal god
ehT quick brown fox spmuj revo eht brown fox
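
Since the answer asks for feedback, here is one possible simplification, offered as a sketch rather than a definitive rewrite. It merges overlapping spans in a single pass and then rebuilds the sentence directly from the spans instead of going back through `str.replace`. The names `merge_spans`, `reverse_outside_spans` and `_reverse_words` are hypothetical. Unlike the code above, it collects every occurrence of each phrase rather than only the first, and it assumes phrases should be matched as plain substrings.

```python
import re

def merge_spans(spans):
    """Merge overlapping (start, end) spans; the input need not be sorted."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def _reverse_words(text):
    # Reverse each run of non-space characters, leaving whitespace untouched.
    return re.sub(r'\S+', lambda m: m.group()[::-1], text)

def reverse_outside_spans(sentence, ignores):
    # Record every occurrence of every phrase with str.find (no regex escaping needed).
    spans = []
    for phrase in ignores:
        if not phrase:
            continue
        pos = sentence.find(phrase)
        while pos != -1:
            spans.append((pos, pos + len(phrase)))
            pos = sentence.find(phrase, pos + 1)

    # Copy protected spans verbatim and reverse the words in between.
    out, cursor = [], 0
    for start, end in merge_spans(spans):
        out.append(_reverse_words(sentence[cursor:start]))
        out.append(sentence[start:end])
        cursor = end
    out.append(_reverse_words(sentence[cursor:]))
    return ''.join(out)

print(reverse_outside_spans('The quick brown fox jumps over the brown fox',
                            ['brown fox', 'quick brown']))
```

One caveat of substring matching: a phrase that ends in the middle of a word splits that word, so the leftover fragment is reversed on its own.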

【Discussion】: