UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 5: invalid start byte

【Question title】: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 5: invalid start byte
【Posted】: 2020-11-29 21:48:58
【Question】:

Code:

import pandas as pd

data = pd.read_csv('db.csv')
data.head()
data.drop(['Company Rate', 'Metascore', 'Minutes Release Budget', 'Opening Weekend USA', 'Gross USA'], axis=0)

data.to_csv('db2.csv', encoding='utf-8')

Error message:

Traceback (most recent call last):
  File "/Users/christine/Documents/Christine-CS/ALT 2/ALT2 Project/clean db2.py", line 3, in <module>
    data = pd.read_csv('db.csv')
  File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/christine/Library/Python/3.7/lib/python/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 537, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 740, in pandas._libs.parsers.TextReader._get_header

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 5: invalid start byte

【Comments】:

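It looks like db.csv is not UTF-8 encoded. Do you know what program wrote the file and which encoding it used?
I'm pretty sure it's utf-8, because I exported it as a csv in that format.
Could you post the first part of the file for us to experiment with? print(open('db.csv', 'rb').read(32)) would be plenty, because the error is hit at position 5. You could also try your own experiments, such as open('db.csv', encoding="utf-16-le").read(32), to see whether you get the right text.
Original Title Company Rate Metascore Minutes Release Budget Opening Weekend USA Gross USA Gross Worldwide 1 Iron Man Marvel 7.9 79 126 2008 140000000 98618668 318604126 5853666247 Marvel 2 Marvel 7 57 124 2010 200000000 128122480 312433331 623933331 4 Thor Marvel 7 57 115 2011 150000000 65723338 181030624 449326618
It didn't come out right, because it's a table of Marvel and DC movies for an ALT I'm working on.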
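The experiment suggested in the comment above could be sketched roughly as follows. The helper name `sniff_encoding`, the candidate list, and the sample bytes are all assumptions for illustration; the real contents of db.csv are not known from this thread.

```python
def sniff_encoding(raw, candidates=("utf-8", "utf-16-le", "cp1252", "latin1")):
    """Return the candidate encodings that decode `raw` without raising."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent these bytes
        ok.append(name)
    return ok


# Hypothetical stand-in for open('db.csv', 'rb').read(32).
sample = b"Title\xa0Company,Rate\r\n"

print(sample)                  # raw bytes, so stray bytes such as \xa0 are visible
print(sniff_encoding(sample))  # encodings that decode cleanly
```

A clean decode only shows that the bytes are legal in that encoding, not that the encoding is the right one, so the decoded text still has to be read and checked by eye.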

Tags: python pandas

【Solution 1】:

Do you get the same error if you use:

data.to_csv('db2.csv', encoding='latin1')

【Comments】:

latin1 can read any data file without error, but if the file is not actually encoded in latin1, the result will contain incorrect code points.
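A small sketch of the point made in this comment, using made-up bytes (the `raw` value is an assumption, chosen so that 0xa0 sits at position 5 as in the traceback):

```python
raw = b"Title\xa0"  # hypothetical bytes, not taken from the real db.csv

# In UTF-8, 0xa0 is a continuation byte, so it cannot begin a character.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)

# latin1 maps every byte 0x00-0xff to the code point with the same number,
# so decoding never fails; here 0xa0 becomes U+00A0 (no-break space).
text = raw.decode("latin1")
print(repr(text))

# The flip side: genuine UTF-8 data decoded as latin1 yields wrong code points.
wrong = "\u00e9".encode("utf-8").decode("latin1")
print(repr(wrong))
```

The last step is why latin1 works as a diagnostic or fallback rather than a guaranteed fix: the read succeeds, but non-ASCII characters may come out wrong.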
Traceback (most recent call last): File "/Users/christine/Documents/Christine-CS/ALT 2/ALT2 Project/clean db2.py", line 7 data.to_csv('db2.csv' encoding='latin1') ^ SyntaxError: invalid syntax
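The SyntaxError in this comment comes from the missing comma between the two arguments. Separately, the original UnicodeDecodeError is raised by pd.read_csv, so the encoding argument probably belongs on the read rather than on to_csv. The following is a minimal sketch under stated assumptions: the sample file contents, the `cp1252` guess, and the dropped column name are hypothetical and not confirmed by the thread.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "db.csv")
dst = os.path.join(workdir, "db2.csv")

# Hypothetical stand-in for db.csv: byte 0xa0 at position 5 of the header.
with open(src, "wb") as f:
    f.write(b"Title\xa0,Company,Rate\nIron Man,Marvel,7.9\n")

# The encoding matters when reading; cp1252 is only a guess about the real file.
data = pd.read_csv(src, encoding="cp1252")

# Replace no-break spaces in column names and trim edge whitespace.
data.columns = data.columns.str.replace("\xa0", " ").str.strip()

# drop returns a new frame, and columns= removes columns rather than rows.
data = data.drop(columns=["Rate"])

# Note the comma between the arguments.
data.to_csv(dst, index=False, encoding="utf-8")
```

In the question's script, drop is called with axis=0 and its result is discarded, so even after the read succeeds no columns would be removed; assigning the result and using columns= (or axis=1) addresses that.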