Replacing words with a dictionary in a text file encoded in, for example, UTF-8: answers

【Question title】: Replacing words with a dictionary in a text file encoded in, for example, UTF-8
【Posted】: 2018-08-08 04:05:27
【Question】:

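I'm trying to open a text file and read through it, replacing certain strings with strings stored in a dictionary. My attempt is based on the answers to "Replacing words in text file using a dictionary" and "How to search and replace text in a file using Python?"

For example: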

# edit print line to print (line) 
import fileinput

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

for line in fileinput.input(text, inplace=True):
    line = line.rstrip()
    for field in fields:
        if field in line:
            line = line.replace(field, fields[field])

    print (line)

My file is encoded in UTF-8.

When I run it, the console shows this error:

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
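
A note on where this error probably comes from: without an explicit encoding, Python on Windows typically opens text files with the locale code page, and the 'charmap' codec name is what single-byte code pages such as cp1252 report. The sketch below assumes cp1252 and uses a right double quotation mark as a sample character; neither detail is taken from the original file.

```python
# A right double quotation mark (U+201D) is three bytes in UTF-8: E2 80 9D.
utf8_bytes = "\u201d".encode("utf-8")

try:
    # Assumption: the locale code page is cp1252, where byte 0x9D is unassigned.
    utf8_bytes.decode("cp1252")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Shows the same kind of 'charmap' message as the traceback in the question.
print(error_message)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = utf8_bytes.decode("utf-8")
```

This is why the fix has to be an explicit encoding on the file object rather than an error handler such as 'ignore'.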

Adding encoding = "utf8" to fileinput.FileInput() raises:

TypeError: __init__() got an unexpected keyword argument 'encoding'

Adding openhook=fileinput.hook_encoded("utf8") to fileinput.FileInput() raises:

ValueError: FileInput cannot use an opening hook in inplace mode

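I don't want to work around this by passing 'ignore' to suppress the errors.

I have a file and a dictionary, and I want the replacements from the dictionary written into the file, as if it were stdout.

Source file, in UTF-8: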

Plain text on the line in the file.
This is a greeting to the world.
Hello world!
Here's another plain text.
And here too!

I want to replace the word world with the word earth.

In the dictionary: {"world": "earth"}

Modified file, in UTF-8:

Plain text on the line in the file.
This is a greeting to the earth.
Hello earth!
Here's another plain text.
And here too!
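
The transformation asked for above can be checked independently of any file handling. The following is a minimal sketch, not the approach used in the question: it swaps the loop of str.replace calls for a single regular-expression pass, so that text inserted by one replacement is never rewritten by another. The helper name replace_words is hypothetical.

```python
import re


def replace_words(text, fields):
    """Replace every key of `fields` found in `text` with its value, in one pass."""
    if not fields:
        return text
    # Try longer keys first so overlapping keys prefer the longer match.
    keys = sorted(fields, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: fields[match.group(0)], text)


source = (
    "Plain text on the line in the file.\n"
    "This is a greeting to the world.\n"
    "Hello world!\n"
    "Here's another plain text.\n"
    "And here too!\n"
)
expected = (
    "Plain text on the line in the file.\n"
    "This is a greeting to the earth.\n"
    "Hello earth!\n"
    "Here's another plain text.\n"
    "And here too!\n"
)
result = replace_words(source, {"world": "earth"})
print(result == expected)
```

Like str.replace, this matches substrings, so "worldwide" would also be changed; wrapping the pattern in \b word boundaries is one way to restrict it to whole words.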

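【Question comments】:

You are using Python 2, not Python 3.
I'm using Python 3. At the top of the .py file I have # -*- coding: utf-8 -*-. How else would it be written for Python 3? The main problem is that some older questions and answers were written for Python 2.
Ah, indeed, I was wrong: the Python 3 version has no encoding option. My mistake.
I guess you're on Windows?
Yes. Anaconda Spyder.

Tags: python dictionary replace file-io runtime-error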

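【Solution 1】:

The fileinput library has several problems that I addressed in the past in a blog post; one of them is that you cannot set an encoding and use in-place file rewriting at the same time.

The following code can do this, but you must replace your print() calls with writes to the outgoing file object: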

from contextlib import contextmanager
import io
import os


@contextmanager
def inplace(filename, mode='r', buffering=-1, encoding=None, errors=None,
            newline=None, backup_extension=None):
    """Allow for a file to be replaced with new content.

    yields a tuple of (readable, writable) file objects, where writable
    replaces readable.

    If an exception occurs, the old file is restored, removing the
    written data.

    mode should *not* use 'w', 'a' or '+'; only read-only-modes are supported.

    """

    # move existing file to backup, create new file with same permissions
    # borrowed extensively from the fileinput module
    if set(mode).intersection('wa+'):
        raise ValueError('Only read-only file modes can be used')

    backupfilename = filename + (backup_extension or os.extsep + 'bak')
    try:
        os.unlink(backupfilename)
    except os.error:
        pass
    os.rename(filename, backupfilename)
    readable = io.open(backupfilename, mode, buffering=buffering,
                       encoding=encoding, errors=errors, newline=newline)
    try:
        perm = os.fstat(readable.fileno()).st_mode
    except OSError:
        writable = open(filename, 'w' + mode.replace('r', ''),
                        buffering=buffering, encoding=encoding, errors=errors,
                        newline=newline)
    else:
        os_mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
        if hasattr(os, 'O_BINARY'):
            os_mode |= os.O_BINARY
        fd = os.open(filename, os_mode, perm)
        writable = io.open(fd, "w" + mode.replace('r', ''), buffering=buffering,
                           encoding=encoding, errors=errors, newline=newline)
        try:
            if hasattr(os, 'chmod'):
                os.chmod(filename, perm)
        except OSError:
            pass
    try:
        yield readable, writable
    except Exception:
        # move backup back
        try:
            os.unlink(filename)
        except os.error:
            pass
        os.rename(backupfilename, filename)
        raise
    finally:
        readable.close()
        writable.close()
        try:
            os.unlink(backupfilename)
        except os.error:
            pass

So your code would look like:

# fileinput is no longer needed; inplace() is the context manager defined above

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

with inplace(text, encoding='utf8') as (infh, outfh):
    for line in infh:
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])

        outfh.write(line)

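Note that you no longer have to strip the newline: each line is written back with its original line ending.

【Comments】:

【Solution 2】: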

I tried using this:

with open(fileName1, "r+", encoding = "utf8", newline='') as fileIn, open(fileName1, "r+", encoding = "utf8", newline='') as fileOut:
    for line in fileIn:             
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])
        fileOut.write(line)

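Note: when a single file is used for both reading and writing, garbage ends up at the end of the file. So far I haven't figured out why. The amount of garbage does not reflect the number of replacements. (The number of replacements is greater than the number of garbage lines.)

Pseudo-math: oriA

I'm open to suggestions for fixing it.

Edit: when I use two files, everything works. Change fileName1 in the second open() to fileName2, and change the mode argument to "w+".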
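
One plausible explanation, which the author does not confirm: a file opened with "r+" is overwritten from offset 0 but never truncated, so whenever the rewritten text is shorter than the original, the tail of the old content stays behind. The sketch below reproduces that effect with a single handle and made-up strings; it is a simplification of the two-handle snippet above, not a copy of it.

```python
import os
import tempfile

original = "Hello wonderful world!\n"
shorter = "Hello earth!\n"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf8", newline="") as fh:
    fh.write(original)

# "r+" starts writing at offset 0 but keeps every byte beyond what is written.
with open(path, "r+", encoding="utf8", newline="") as fh:
    fh.write(shorter)
with open(path, encoding="utf8", newline="") as fh:
    without_truncate = fh.read()

# Restore the original, then repeat with an explicit truncate() after writing.
with open(path, "w", encoding="utf8", newline="") as fh:
    fh.write(original)
with open(path, "r+", encoding="utf8", newline="") as fh:
    fh.write(shorter)
    fh.truncate()  # cut the file off at the current write position
with open(path, encoding="utf8", newline="") as fh:
    with_truncate = fh.read()
os.remove(path)

print(repr(without_truncate))
print(repr(with_truncate))
```

Under this explanation the leftover grows with the number of characters saved by the replacements, not with their count, which would fit the observation that the two numbers differ.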
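
If the result should still end up under the original file name, the two-file idea can be combined with a final rename. This is a minimal sketch under that assumption; the function name replace_in_file and the sample file are hypothetical, and unlike the context manager in Solution 1 it keeps no backup and does not copy the original file permissions.

```python
import os
import tempfile


def replace_in_file(path, fields, encoding="utf-8"):
    """Rewrite `path` with the replacements applied, via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" leaves the original line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as file_in, \
            tempfile.NamedTemporaryFile("w", encoding=encoding, newline="",
                                        dir=directory, delete=False) as file_out:
        for line in file_in:
            for field, replacement in fields.items():
                line = line.replace(field, replacement)
            file_out.write(line)
    # Both files are closed at this point; swap the new content into place.
    os.replace(file_out.name, path)


# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(sample_path, "w", encoding="utf-8", newline="") as fh:
    fh.write("This is a greeting to the world.\nHello world!\n")

replace_in_file(sample_path, {"world": "earth"})

with open(sample_path, encoding="utf-8", newline="") as fh:
    rewritten = fh.read()
os.remove(sample_path)
```

Creating the temporary file in the same directory keeps os.replace on one filesystem, and the original is only replaced after the new content has been written completely.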

【Comments】: