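【Posted】: 2015-02-06 20:40:43
【Question】:
Please help. I have been struggling with this for a while and keep running into one problem after another. I am just trying to write a loop that opens every csv file in a folder. Here is the loop: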
folder = '/Users/jolijttamanaha/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'
for file in os.listdir(folder):
    with codecs.open(file, mode='rU', encoding='utf-8') as f:
        m=min(int(line[1]) for line in csv.reader(f))
        f.seek(0)
        for line in csv.reader(f):
            if int(line[1])==m:
                print line
This is the error:
Traceback (most recent call last):
  File "findfirsttrigram.py", line 11, in <module>
    m=min(int(line[1]) for line in csv.reader(f))
  File "findfirsttrigram.py", line 11, in <genexpr>
    m=min(int(line[1]) for line in csv.reader(f))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 684, in next
    return self.reader.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 615, in next
    line = self.readline()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x87 in position 0: invalid start byte
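I ended up here because first I hit a "null byte" error, which I fixed with this post: "Line contains NULL byte" in CSV reader (Python)
Then I got an integer error, which I fixed with this post: "an integer is required" when open()'ing a file as utf-8?
Then I got the error message 'UnicodeException: UTF-16 stream doesn't start with BOM', which I fixed with this post: utf-16 file seeking in python. how?
Then I realized that the csv module needs utf-8, so here I am.
But I have finally run out of existing questions to lean on. I have no idea what is going on. Please help.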
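A diagnostic sketch, not a confirmed fix: the traceback alone does not say which file failed or what encoding it really uses. Two things may be worth checking. First, os.listdir returns bare names and every entry in the folder, so the loop may be opening something other than the intended csv file. Second, a byte order mark at the start of a file hints at its encoding. The helper names csv_paths and guess_encoding below are hypothetical, and the "utf-8" fallback is an assumption.

```python
import codecs
import os


def csv_paths(folder):
    """Return full paths of the .csv entries directly inside folder, sorted."""
    return sorted(
        os.path.join(folder, name)   # listdir gives bare names, so join them
        for name in os.listdir(folder)
        if name.lower().endswith(".csv")
    )


def guess_encoding(path, default="utf-8"):
    """Guess a codec name from the file's byte order mark, if it has one."""
    with open(path, "rb") as f:
        head = f.read(4)
    # UTF-32 is checked first because its little-endian BOM begins with
    # the same two bytes as the UTF-16 little-endian BOM.
    if head.startswith(codecs.BOM_UTF32_LE) or head.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM found: this fallback is only a guess, not a detection.
    return default
```

Printing each path together with guess_encoding(path) before parsing would at least show which file triggers the UnicodeDecodeError. A file without a BOM may still be in some other encoding, so the fallback value proves nothing on its own.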
【Comments】:
- Have you considered using one of the encoding error handlers via the errors parameter? See docs.python.org/2.7/library/codecs.html#codecs.replace_errors
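A minimal sketch of what this comment suggests, assuming that losing the undecodable bytes is acceptable. The helper name read_text_lossy is hypothetical. It uses io.open rather than the codecs.open call from the question, because io.open takes the same encoding and errors arguments and behaves the same way on Python 2.7 and Python 3.

```python
import io


def read_text_lossy(path, encoding="utf-8"):
    """Read a whole file as text, replacing undecodable bytes with U+FFFD."""
    # errors="replace" substitutes the Unicode replacement character
    # instead of raising UnicodeDecodeError on a bad byte.
    with io.open(path, mode="r", encoding=encoding, errors="replace") as f:
        return f.read()
```

With this handler a stray 0x87 byte becomes U+FFFD instead of stopping the script. The original byte is lost, though, so this hides an encoding mismatch rather than fixing it. If the file is really in another encoding, most of its text will come back as replacement characters.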