Searching for a string in a file is not working in Python: answers

【Question title】: Searching for a string in a file is not working in Python
【Posted】: 2017-08-17 19:15:18
【Question】:

I am using this code to find a string in Python:

buildSucceeded = "Build succeeded."
datafile = r'C:\PowerBuild\logs\Release\BuildAllPart2.log'

with open(datafile, 'r') as f:
    for line in f:
        if buildSucceeded in line:
            print(line)

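I am pretty sure the string is in the file, yet the code returns nothing.

If I just print the file line by line, the output has a "NUL" character between every pair of "valid" characters.

Edit 1: The problem was the Windows encoding. I changed the encoding after following this post and it worked: Why doesn't Python recognize my utf-8 encoded source file?

Anyway, the file looks like this: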
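The NUL characters are what you would expect from a UTF-16 file read with a single-byte codec: every ASCII character is stored as two bytes, one of which is zero. The sketch below is not from the original post; it uses latin-1 as a stand-in for whatever default codec Windows picked, which is an assumption.

```python
needle = "Build succeeded."

# Bytes as they would sit in a UTF-16 little-endian file (BOM omitted here).
raw = needle.encode("utf-16-le")

# Decoding with a single-byte codec (latin-1 is an assumed stand-in for the
# Windows default) keeps every zero byte as a NUL character.
misread = raw.decode("latin-1")

print(repr(misread[:6]))                   # 'B\x00u\x00i\x00'
print(needle in misread)                   # False: the NULs break the match
print(needle in raw.decode("utf-16-le"))   # True once the right codec is used
```

This is why the `in` test in the question never matches even though the text is visibly in the file.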

Line 1.
Line 2.
...
Build succeeded.
    0 Warning(s)
    0 Error(s)
...

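I am currently testing with the Sublime for Windows editor - it shows a "NUL" character between every pair of "real" characters, which is strange.

Running the script from the command line gives this output: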

C:\Dev>python readFile.py
Traceback (most recent call last):
  File "readFile.py", line 7, in <module>
    print(line)
  File "C:\Program Files\Python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 1: character maps to <undefined>
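The '\xfe' in this traceback is consistent with a UTF-16 byte order mark (bytes FF FE) having been decoded as two ordinary characters, which the cp437 console then cannot print. A minimal sketch for checking this, assuming the file starts with a BOM at all; `sniff_bom` is a hypothetical helper name, not something from the original post.

```python
import codecs


def sniff_bom(path):
    """Guess a codec name from the file's byte order mark, or return None."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None  # no BOM found; the encoding has to be determined another way
```

A result of "utf-16" for the log file would mean it should be opened with encoding="utf-16". A file without a BOM returns None, so this check can confirm a suspicion but cannot rule one out.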

Thanks for your help...

【Comments】:

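1. I'm afraid "pretty sure" isn't enough. 2. Try using strip, as in if buildSucceeded in line.strip(), to remove the trailing '\n'.
Try for line in f: instead of splitting the whole file. Then you can strip the NUL characters before printing.
Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. Minimal, complete, verifiable example applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. Read the file line by line, print each line as you read it, and see what you actually have. If that fails, cut the data file down to a few lines, reproduce the problem, and post the output here.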

Tags: python

【Answer 1】:

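If your file is not very large, you can do a simple find. Otherwise I would check whether the string is really in the file, look for typos in the path, and try to narrow the problem down.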

with open(datafile, 'r') as f:
    lines = f.read()
answer = lines.find(buildSucceeded)

Also note that if the string is not found, answer will be -1.
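Because str.find signals "not found" with -1 rather than a false value, the result has to be compared explicitly. A small in-memory sketch; the sample text is made up to mirror the log excerpt in the question.

```python
# Sample text modelled on the log excerpt in the question (illustrative only).
log_text = "Line 1.\nLine 2.\nBuild succeeded.\n    0 Warning(s)\n"

pos = log_text.find("Build succeeded.")
missing = log_text.find("Build failed.")

print(pos)      # 16: index of the first character of the match
print(missing)  # -1: the substring is absent

# Compare against -1 explicitly. A bare `if pos:` would be wrong, because
# a match at index 0 is falsy while the "not found" value -1 is truthy.
if pos != -1:
    print("found at", pos)
```

When only a yes/no answer is needed, `"Build succeeded." in log_text` avoids the -1 convention entirely.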

【Discussion】:

【Answer 2】:

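As mentioned above, the problem is related to encoding. The site linked below has a good explanation of how to convert a file from one encoding to another.

I used the last example (Python 3 in my case), and it works as expected: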

buildSucceeded = "Build succeeded."
datafile = 'C:\\PowerBuild\\logs\\Release\\BuildAllPart2.log'

# Open both input and output streams.
#input = open(datafile, "rt", encoding="utf-16")
input = open(datafile, "r", encoding="utf-16")
output = open("output.txt", "w", encoding="utf-8")

# Stream chunks of unicode data.
with input, output:
    while True:
        # Read a chunk of data.
        chunk = input.read(4096)
        if not chunk:
            break
        # Remove vertical tabs.
        chunk = chunk.replace("\u000B", "")
        # Write the chunk of data.
        output.write(chunk)

with open('output.txt', 'r', encoding='utf-8') as f:
    for line in f:
        if buildSucceeded in line:
            print(line)

Source: http://blog.etianen.com/blog/2013/10/05/python-unicode-streams/
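If the converted copy is not needed for anything else, the search can probably be done in one pass by opening the log with the right encoding directly. This is a sketch under the assumption that the file is UTF-16 with a BOM, as the answer above suggests; `lines_containing` is a hypothetical helper, not part of the answer.

```python
def lines_containing(path, needle, encoding="utf-16"):
    """Yield each line of the file at `path` that contains `needle`.

    The default encoding is an assumption based on this thread; pass a
    different one if the file is stored differently.
    """
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            if needle in line:
                yield line.rstrip("\n")
```

Calling `list(lines_containing(datafile, buildSucceeded))` with the names from the answer would return the matching lines without writing output.txt.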

【Discussion】: