Python search and replace

【Question title】: Python search and replace
【Posted】: 2016-07-18 21:15:05
【Question】:

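I've written two functions in Python. When I run replace(), it looks at the data structure named replacements. It takes each key, goes through the document, and whenever a key matches a word in the document, it replaces that word with the key's value.

What seems to be happening is this: because I also have the reverse mappings ('stopped' changes to 'suspended', and 'suspended' changes to 'stopped', depending on what is in the text file), it goes through the file, changes some words, and then changes them back (i.e. no changes are made).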

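When I run replace2(), I take each word from the text document and check whether it is a key in replacements. If it is, I replace it. What I've noticed, though, is that when I run it, 'suspended' (which contains the substring 'ended') comes out mangled, because the embedded 'ended' gets replaced.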

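Is there an easier way to go through a text file and change each word only once, if it is found? I think replace2() does what I want, although I lose the multi-word phrases. It also seems to pick up substrings, which it shouldn't, since I did use the split() function.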

def replace():
        fileinput = open('tennis.txt').read()
        out = open('tennis.txt', 'w')
        for i in replacements.keys():
            fileinput = fileinput.replace(i, replacements[i])
            print(i, " : ", replacements[i])
        out.write(fileinput)
        out.close


def replace2():
        fileinput = open('tennis.txt').read()
        out = open('tennis.txt', 'w')
        #for line in fileinput:
        for word in fileinput.split():
            for i in replacements.keys():
                print(i)
                if word == i:
                    fileinput = fileinput.replace(word, replacements[i])
        out.write(fileinput)
        out.close

replacements = {
    'suspended'    : 'stopped',
    'stopped'      : 'suspended',
    'due to'       : 'because of',
    'ended'        : 'finished',
    'finished'     : 'ended',
    '40'           : 'forty',
    'forty'        : '40',
    'because of'   : 'due to' }

the match ended due to rain a mere 40 minutes after it started. it was suspended because of rain.
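
For reference, both effects described above can be reproduced without any file. This is only a minimal sketch: the helper name chained_replace and the two-entry mapping are made up for illustration and are not part of the original code.

```python
# Hypothetical two-entry mapping, reduced from the question's dict.
pairs = {'suspended': 'stopped', 'stopped': 'suspended'}

def chained_replace(text, mapping):
    # Apply str.replace once per key, in dict order, as replace() does.
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

# 'suspended' becomes 'stopped', then the next key turns it back.
print(chained_replace("it was suspended", pairs))  # it was suspended
# 'stopped' is only touched by the second key, so it does change.
print(chained_replace("it was stopped", pairs))    # it was suspended

# str.replace works on substrings, not words, so split() does not protect
# 'suspended' once replace() is called on the whole text.
print("suspended".replace("ended", "finished"))    # suspfinished
```

The common cause is that str.replace is run on the whole text once per key, so later keys see the output of earlier ones.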

【Question comments】:

Tags: python string replace substring

【Solution 1】:

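An improved version of rawbeans' answer. That one didn't work as expected because some of your replacement keys contain more than one word.

Tested with your sample line, it outputs: the match finished because of rain a mere forty minutes after it started. it was stopped due to rain.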

import re

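def replace2():
    with open('tennis.txt') as f:
        fileinput = f.read()

    # Escape the keys and try longer ones first so phrases win over single words.
    keys = sorted(replacements, key=len, reverse=True)
    wordpats = '|'.join(re.escape(k) for k in keys)
    # Match a whole-word key first, then any other word, then any single
    # non-word character (spaces, punctuation), so nothing is lost on rejoin.
    pattern = r'\b(?:{0})\b|\w+|\W'.format(wordpats)
    words = re.findall(pattern, fileinput)
    # Every token is looked up once, so a replacement is never replaced again.
    output = "".join(replacements.get(x, x) for x in words)
    with open('tennisout.txt', 'w') as out:
        out.write(output)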


replacements = {
    'suspended'    : 'stopped',
    'stopped'      : 'suspended',
    'due to'       : 'because of',
    'ended'        : 'finished',
    'finished'     : 'ended',
    '40'           : 'forty',
    'forty'        : '40',
    'because of'   : 'due to' }


if __name__ == '__main__':
    replace2()
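
The same single-pass idea can also be sketched with re.sub and a replacement function, which skips the tokenize-and-rejoin step. Treat this as an illustration under assumptions rather than the answer's own method: the function name swap_words is made up, and the sample sentence is an assumption reconstructed from the output quoted in this answer.

```python
import re

replacements = {
    'suspended': 'stopped', 'stopped': 'suspended',
    'due to': 'because of', 'because of': 'due to',
    'ended': 'finished', 'finished': 'ended',
    '40': 'forty', 'forty': '40',
}

def swap_words(text, mapping):
    # Longest keys first, so multi-word phrases are tried before single words.
    keys = sorted(mapping, key=len, reverse=True)
    # \b on both sides keeps 'ended' from matching inside 'suspended'.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # re.sub scans the text once and never rescans what it has inserted,
    # so swapped pairs cannot undo each other.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Assumed sample line, not taken verbatim from tennis.txt.
sample = ("the match ended due to rain a mere 40 minutes after it started. "
          "it was suspended because of rain.")
print(swap_words(sample, replacements))
```

Because the lookup happens inside the callback, adding or removing pairs only requires editing the dict.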

【Comments】:

【Solution 2】:

Is there an easier way to go through a text file and change each word only once, if it is found?

There is a simpler way:

output = " ".join(replacements.get(x, x) for x in fileinput.split())
out.write(output)
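
To see what the one-liner does and does not handle, here is a small self-contained sketch. The helper name swap_split, the reduced mapping, and the input strings are all made up for illustration.

```python
# Hypothetical subset of the question's replacements dict.
replacements = {'suspended': 'stopped', 'due to': 'because of', '40': 'forty'}

def swap_split(text, mapping):
    # Each whitespace-separated word is looked up exactly once.
    return " ".join(mapping.get(w, w) for w in text.split())

# Plain single words are swapped as expected.
print(swap_split("play was suspended after 40 minutes", replacements))
# play was stopped after forty minutes

# 'suspended,' (with the comma attached) is not a key, and 'due to' is
# split into two words, so neither is replaced.
print(swap_split("suspended, due to rain", replacements))
# suspended, due to rain
```

Note that split() followed by " ".join also collapses line breaks and repeated spaces into single spaces.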

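【Comments】:

In this case, if a word (say '40') appears 4 times in the text, I want every occurrence of '40' changed to 'forty'. My remark about changing a word only once was about the replace() function: it would replace '40' with 'forty', and then it seemed to go back and restore the original '40'.
The main problem here is that punctuation is not taken into account. If a word is followed by a period or a comma, it won't be replaced.
@rawbeans: True, and while I agree it is part of the broader problem the OP is trying to solve, it is not part of the question.
@SuperSaiyan I edited your answer to include a more comprehensive splitting function.
@rawbeans: Please post that as a separate answer.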

【Solution 3】:

To account for punctuation, use a regular expression instead of split():

import re  # needed for re.findall

output = " ".join(replacements.get(x, x) for x in re.findall(r"[\w']+|[.,!?;]", fileinput))
out.write(output)

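This way, each punctuation mark becomes its own token, so it is ignored during replacement but still appears in the final string. For an explanation and potential caveats, see this post.

【Comments】: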
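
A short sketch of what that tokenization produces may help. The input sentence, the one-entry mapping, and the final cleanup step are assumptions added for illustration, not part of the answer.

```python
import re

mapping = {'suspended': 'stopped'}  # hypothetical one-entry mapping
text = "it was suspended, due to rain."

# Words and punctuation marks come back as separate tokens; spaces are dropped.
tokens = re.findall(r"[\w']+|[.,!?;]", text)
print(tokens)
# ['it', 'was', 'suspended', ',', 'due', 'to', 'rain', '.']

# Joining with " " puts a space in front of every punctuation token too.
joined = " ".join(mapping.get(t, t) for t in tokens)
print(joined)
# it was stopped , due to rain .

# One possible cleanup (an assumption, not from the answer): remove the
# whitespace that the join inserted before punctuation.
cleaned = re.sub(r"\s+([.,!?;])", r"\1", joined)
print(cleaned)
# it was stopped, due to rain.
```

Multi-word keys such as 'due to' are still split into separate tokens here, so they are not matched by this approach.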