【Posted】: 2020-11-29 21:48:58
【Question】:
Code:
import pandas as pd
data = pd.read_csv('db.csv')
data.head()
data.drop(['Company Rate', 'Metascore', 'Minutes Release Budget', 'Opening Weekend USA', 'Gross USA'], axis=0)
data.to_csv('db2.csv', encoding='utf-8')
Error message:
Traceback (most recent call last):
File "/Users/christine/Documents/Christine-CS/ALT 2/ALT2 Project/clean db2.py", line 3, in <module>
data = pd.read_csv('db.csv')
File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 948, in __init__
self._make_engine(self.engine)
File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 2010, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 537, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 740, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 5: invalid start byte
【Comments】:
-
It looks like db.csv is not UTF-8 encoded. Do you know what program wrote the file and which encoding it used?
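One way to check what the file actually contains is to read its raw bytes and try decoding them with a few candidate encodings. The sketch below is illustrative only: the helper name `sniff_encoding` and the candidate list are assumptions, not something taken from the question. A successful decode also does not prove that the encoding is the right one, only that the bytes are valid in it.

```python
def sniff_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error.

    `raw` is a bytes object, e.g. the full contents of the CSV file.
    Returns None if no candidate works. The order matters: latin-1 accepts
    any byte sequence, so it is only useful as a last resort.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # invalid in this encoding, try the next one
        return name
    return None
```

For the file in the question this could be called as `sniff_encoding(open('db.csv', 'rb').read())`. Passing the whole file rather than a fixed-size slice avoids cutting a multi-byte UTF-8 sequence in half, which would make valid UTF-8 look invalid.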
-
I'm fairly sure it is UTF-8, because I exported it to CSV in that format.
-
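Can you post the first part of the file so we can experiment?
print(open('db.csv', 'rb').read(32)) would be plenty, since the error is hit at position 5. You could also try experiments of your own, such as open('db.csv', encoding="utf-16-le").read(32), to see whether you get the correct text.
-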
Original Title Company Rate Metascore Minutes Release Budget Opening Weekend USA Gross USA Gross Worldwide 1 Iron Man Marvel 7.9 79 126 2008 140000000 98618668 318604126 5853666247 Marvel 2 Marvel 7 57 124 2010 200000000 128122480 312433331 623933331 4 Thor Marvel 7 57 115 2011 150000000 65723338 181030624 449326618
-
It doesn't come out right, because it is a table of Marvel and DC films for the ALT I'm working on.
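Putting the thread together, one possible fix could be sketched as follows. It assumes the file is really Windows-1252 (cp1252), where byte 0xa0 is a non-breaking space. That is a guess based on the error message, not something confirmed in the thread. The function name `clean_csv` and its parameters are hypothetical. The sketch also passes the names through `columns=` and keeps the returned frame, because `drop(..., axis=0)` looks for row labels and the code in the question discards the result of `drop`.

```python
import pandas as pd


def clean_csv(src, dst, drop_cols, encoding="cp1252"):
    """Read `src` with an explicit encoding, drop columns, write `dst` as UTF-8."""
    # Assumption: the source file is cp1252, not UTF-8.
    data = pd.read_csv(src, encoding=encoding)
    # Headers from such a file may contain non-breaking spaces (U+00A0);
    # replace them so the names match what is typed in the script.
    data.columns = data.columns.str.replace("\xa0", " ", regex=False).str.strip()
    # columns= drops columns; errors="ignore" skips names that are not present.
    data = data.drop(columns=drop_cols, errors="ignore")
    # index=False keeps the row index out of the output file.
    data.to_csv(dst, encoding="utf-8", index=False)
    return data
```

A call such as `clean_csv('db.csv', 'db2.csv', ['Metascore', 'Gross USA'])` would then write the reduced table to db2.csv. If the guessed encoding is wrong, the text will still load but may contain incorrect characters, so it is worth inspecting `data.head()` before relying on the output.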