Python, UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128) (answers)

【Question title】: Python, UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)
【Posted】: 2016-04-26 22:56:05
【Question description】:

I am trying to do a simple parse of a file, but I get an error because of special characters:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

infile = 'finance.txt'
input = open(infile)
for line in input:
  if line.startswith(u'▼'):

I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)

Any solutions?

【Discussion】:

You can also use the codecs module to open files encoded as utf-8 or utf-16.
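As a rough sketch of the codecs suggestion above: the helper name matching_lines and the temporary demo file are made up for illustration and are not part of the original question. codecs.open() decodes each line to a Unicode string while reading, so the comparison with u'▼' no longer triggers an implicit ASCII decode.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def matching_lines(path, marker=u'▼', encoding='utf-8'):
    """Return the lines of `path` that start with `marker` (hypothetical helper)."""
    # codecs.open() decodes the bytes on the fly, so every line is Unicode text.
    with codecs.open(path, 'r', encoding=encoding) as fobj:
        return [line for line in fobj if line.startswith(marker)]


# Build a small throwaway file so the sketch is self-contained.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with codecs.open(demo_path, 'w', encoding='utf-8') as out:
    out.write(u'▼ falling\n▲ rising\n')

demo_result = matching_lines(demo_path)
os.remove(demo_path)
print(demo_result)
```

For a utf-16 file, passing encoding='utf-16' should be the only change needed. On Python 3, the built-in open() with an encoding argument is the more usual choice.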

Tags: python string python-2.7 encoding

【Solution 1】:

You need to specify the encoding. For example, if the file is utf-8:

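import io

infile = 'finance.txt'

with io.open(infile, encoding='utf-8') as fobj:
    for line in fobj:
        # Each line is a Unicode string here, so comparing with u'▼' is safe.
        if line.startswith(u'▼'):
            pass  # handle the matching line here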

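This works in both Python 2 and 3. By default, Python 2 opens files without assuming any encoding, so reading returns byte strings. Comparing such a byte string with a Unicode literal like u'▼' makes Python 2 decode the bytes implicitly as ASCII, which fails on any byte above 127, such as 0xc2. In Python 3, the default encoding is whatever locale.getpreferredencoding(False) returns, which in many cases is utf-8. The standard open() in Python 2 does not accept an encoding argument. Using io.open() keeps the code future-proof, because you won't need to change it when switching to Python 3.

In Python 3: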

>>> io.open is open
True
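To make the failure itself concrete, here is a small sketch that reproduces the error on a made-up byte string. The sample text is an assumption, since the original file contents are not shown. The byte 0xc2 is the first byte of the UTF-8 encoding of characters such as the pound sign (U+00A3), and it cannot be decoded as ASCII.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: UTF-8 bytes as they would be read from a file in Python 2.
raw = u'cost: \u00a3100'.encode('utf-8')  # the pound sign becomes 0xc2 0xa3

try:
    raw.decode('ascii')  # the decode Python 2 attempts implicitly
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # 'ascii' codec can't decode byte 0xc2 ...

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode('utf-8')

print(ascii_error)
print(text == u'cost: \u00a3100')
```

The position reported in the exception is the byte offset of the first non-ASCII byte, which is why the original traceback mentions position 1718.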

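【Discussion】:

Technically, Python 2 opens files assuming no encoding and returns byte strings. Python 3 opens files with locale.getpreferredencoding() (which may not be utf8) and returns Unicode strings, but it lets you specify the encoding.
Thanks for the hint. I updated my answer.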
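A quick way to check the point made in this discussion is to ask Python which encoding it would fall back to. This is only a sketch; the variable names are illustrative, and the result depends on the machine it runs on.

```python
import codecs
import locale

# Encoding that Python 3's open() falls back to when encoding= is omitted.
default_encoding = locale.getpreferredencoding(False)

# Normalise spellings such as 'UTF-8' or 'utf8' to a single codec name.
is_utf8_default = codecs.lookup(default_encoding).name == 'utf-8'

print(default_encoding, is_utf8_default)
```

If is_utf8_default is False, reading a UTF-8 file without an explicit encoding argument may fail or silently produce wrong characters, which is the reason for passing encoding explicitly.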

【Solution 2】:

Open your file with the correct encoding. For example, if your file is UTF-8 encoded, in Python 3:

with open('finance.txt', encoding='utf8') as f:
    for line in f:  # iterate over the opened file f, not the built-in input
        if line.startswith(u'▼'):
            pass  # whatever

In Python 2, you can use io.open() (which also works in Python 3):

import io

with io.open('finance.txt', encoding='utf8') as f:
    for line in f:  # iterate over the opened file f, not the built-in input
        if line.startswith(u'▼'):
            pass  # whatever

【Discussion】: