【Question title】: Replacing words with a dictionary in a text file encoded in, for example, UTF-8
【Posted】: 2018-08-08 04:05:27
【Question】:

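I am trying to open a text file, read through it, and replace certain strings with strings stored in a dictionary. This is based on the answers to "Replacing words in text file using a dictionary" and "How to search and replace text in a file using Python?".

For example: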

# edit print line to print (line) 
import fileinput

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

for line in fileinput.input(text, inplace=True):
    line = line.rstrip()
    for field in fields:
        if field in line:
            line = line.replace(field, fields[field])

    print (line)

My file is encoded in UTF-8.

When I run it, the console shows this error:

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

When I add encoding = "utf8" to fileinput.FileInput(), I get this error:

TypeError: __init__() got an unexpected keyword argument 'encoding'

When I add openhook=fileinput.hook_encoded("utf8") to fileinput.FileInput(), I get this error:

ValueError: FileInput cannot use an opening hook in inplace mode

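I do not want to pass the 'ignore' error handler just to suppress the errors.

I have a file and a dictionary, and I want the dictionary's replacement values written into the file, the way printing to stdout does in in-place mode.

Source file, in UTF-8: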

Plain text on the line in the file.
This is a greeting to the world.
Hello world!
Here's another plain text.
And here too!

I want to replace the word world with the word earth.

In the dictionary: {"world": "earth"}

Modified file, in UTF-8:

Plain text on the line in the file.
This is a greeting to the earth.
Hello earth!
Here's another plain text.
And here too!

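【Comments】:

  • You are using Python 2, not Python 3.
  • I am using Python 3. At the top of the .py file I have # -*- coding: utf-8 -*-. How else would it be written in Python 3? The main problem is that some older questions and answers were written for Python 2.
  • Ah, indeed, I was wrong; the Python 3 version has no encoding option. My mistake.
  • I guess you are on Windows?
  • Yes. Anaconda Spyder.

Tags: python dictionary replace file-io runtime-error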
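
The Windows guess in the comments matters because, when no encoding is passed, Python has traditionally opened text files with the locale's preferred encoding, which on many Windows installations is a legacy code page rather than UTF-8. The sketch below reproduces the 'charmap' error without touching any file. The sample string and the choice of cp1252 are assumptions for illustration only; the question states neither the actual code page nor the offending byte.

```python
import locale

# Assumption: the file contains a character such as "Á" (U+00C1),
# whose UTF-8 encoding is the two bytes 0xC3 0x81.
utf8_bytes = "Á world".encode("utf-8")

# The encoding open() has traditionally used when none is passed
# (platform dependent).
default_encoding = locale.getpreferredencoding(False)


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


# cp1252 assigns no character to byte 0x81, so decoding fails.
cp1252_result = try_decode(utf8_bytes, "cp1252")
# Decoding with the encoding the bytes were written in succeeds.
utf8_result = try_decode(utf8_bytes, "utf-8")

print("default encoding:", default_encoding)
print("cp1252:", cp1252_result)
print("utf-8: ", ascii(utf8_result))  # ascii() keeps the console output safe
```

This is why any approach that opens the file with an explicit encoding="utf8" avoids the problem, while fileinput in in-place mode (as used in the question) does not offer a way to do so.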


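【Answer 1】:

The fileinput library has several problems that I addressed in the past in a blog post; one of them is that you cannot set an encoding while using in-place file rewriting.

The following code can do this, but you must replace your print() calls with writes to the outgoing file object: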

from contextlib import contextmanager
import io
import os


@contextmanager
def inplace(filename, mode='r', buffering=-1, encoding=None, errors=None,
            newline=None, backup_extension=None):
    """Allow for a file to be replaced with new content.

    yields a tuple of (readable, writable) file objects, where writable
    replaces readable.

    If an exception occurs, the old file is restored, removing the
    written data.

    mode should *not* use 'w', 'a' or '+'; only read-only-modes are supported.

    """

    # move existing file to backup, create new file with same permissions
    # borrowed extensively from the fileinput module
    if set(mode).intersection('wa+'):
        raise ValueError('Only read-only file modes can be used')

    backupfilename = filename + (backup_extension or os.extsep + 'bak')
    try:
        os.unlink(backupfilename)
    except os.error:
        pass
    os.rename(filename, backupfilename)
    readable = io.open(backupfilename, mode, buffering=buffering,
                       encoding=encoding, errors=errors, newline=newline)
    try:
        perm = os.fstat(readable.fileno()).st_mode
    except OSError:
        writable = open(filename, 'w' + mode.replace('r', ''),
                        buffering=buffering, encoding=encoding, errors=errors,
                        newline=newline)
    else:
        os_mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
        if hasattr(os, 'O_BINARY'):
            os_mode |= os.O_BINARY
        fd = os.open(filename, os_mode, perm)
        writable = io.open(fd, "w" + mode.replace('r', ''), buffering=buffering,
                           encoding=encoding, errors=errors, newline=newline)
        try:
            if hasattr(os, 'chmod'):
                os.chmod(filename, perm)
        except OSError:
            pass
    try:
        yield readable, writable
    except Exception:
        # move backup back
        try:
            os.unlink(filename)
        except os.error:
            pass
        os.rename(backupfilename, filename)
        raise
    finally:
        readable.close()
        writable.close()
        try:
            os.unlink(backupfilename)
        except os.error:
            pass

So your code would look like:

# fileinput is no longer needed; inplace() is the context manager defined above

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

with inplace(text, encoding='utf8') as (infh, outfh):
    for line in infh:
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])

        outfh.write(line)

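Note that you no longer have to strip the newline, because each line is written back with its original line ending instead of being passed to print().

【Comments】: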
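
For files small enough to fit in memory, a shorter alternative to the context manager above is to read the whole file with an explicit encoding, apply the replacements, and swap in a new file. This is a minimal sketch and not the answer's method: `replace_in_file` is a hypothetical helper name, and the temporary-file-plus-`os.replace` pattern is an assumption about what is acceptable here (it keeps no backup and does not copy the original file's permissions).

```python
import os
import tempfile


def replace_in_file(path, fields, encoding="utf-8"):
    """Rewrite `path`, replacing each key of `fields` with its value.

    Hypothetical helper: it reads the whole file into memory, so it
    suits small files only.
    """
    # newline="" leaves the original line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as src:
        text = src.read()

    for old, new in fields.items():
        text = text.replace(old, new)

    # Write to a temporary file in the same directory, then swap it in,
    # so a failure while writing leaves the original file intact.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with open(fd, "w", encoding=encoding, newline="") as dst:
            dst.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Demo on a throwaway file holding two of the question's sample lines.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "sample file.txt")
with open(demo_path, "w", encoding="utf-8", newline="") as fh:
    fh.write("This is a greeting to the world.\nHello world!\n")

replace_in_file(demo_path, {"world": "earth"})

with open(demo_path, "r", encoding="utf-8", newline="") as fh:
    result = fh.read()
print(result)
```

The line-by-line approach of `inplace()` remains the better fit for large files, since it never holds more than one line in memory.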

    【Answer 2】:

    I tried using this:

    with open(fileName1, "r+", encoding = "utf8", newline='') as fileIn, open(fileName1, "r+", encoding = "utf8", newline='') as fileOut:
        for line in fileIn:             
            for field in fields:
                if field in line:
                    line = line.replace(field, fields[field])
            fileOut.write(line)
    

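    Note: when a single file is used for both reading and writing, garbage ends up at the end of the file. So far I have not figured out why. The amount of garbage does not match the number of replacements. (The number of replacements is greater than the number of garbage lines.)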

    Pseudo-math: oriA

    I am prepared to fix it.

    Edit: when I use two files, everything works. Change fileName1 in the second open() to fileName2, and change the mode argument to "w+".
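
One plausible explanation for the trailing garbage, offered as an assumption rather than something this answer confirms: a file opened with "r+" is not truncated, so when the rewritten text is shorter than the original, the end of the old content is left behind after the last write. That would also explain why the amount of garbage tracks the total number of characters saved rather than the number of replacements. The sketch below reproduces the effect with a single handle and made-up sample text, and shows `truncate()` removing the leftover; it does not model the buffering interactions of two handles open on the same file.

```python
import os
import tempfile

# Made-up sample text; the replacement is shorter than the pattern.
original = "Hello wonderful world!\nGoodbye wonderful world!\n"
fields = {"wonderful ": ""}

replaced = original
for old, new in fields.items():
    replaced = replaced.replace(old, new)

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, "demo.txt")


def write_file(text):
    """Create or overwrite the demo file with `text`."""
    with open(path, "w", encoding="utf8", newline="") as fh:
        fh.write(text)


def read_file():
    """Return the full content of the demo file."""
    with open(path, "r", encoding="utf8", newline="") as fh:
        return fh.read()


# Case 1: overwrite from the start in "r+" mode without truncating.
write_file(original)
with open(path, "r+", encoding="utf8", newline="") as fh:
    fh.write(replaced)
without_truncate = read_file()

# Case 2: the same write, then cut the file off at the current position.
write_file(original)
with open(path, "r+", encoding="utf8", newline="") as fh:
    fh.write(replaced)
    fh.truncate()
with_truncate = read_file()

# Whatever follows the new text is the tail of the old content.
leftover = without_truncate[len(replaced):]
print("leftover without truncate:", repr(leftover))
print("clean with truncate:", with_truncate == replaced)
```

Writing to a second file opened in a write mode, as in the edit above, sidesteps the issue because that file starts out empty.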

    【Comments】:
