SyntaxError: Non-ASCII character. Python

【Question title】: SyntaxError: Non-ASCII character. Python
【Posted】: 2015-01-06 14:30:20
【Question description】:

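Can anyone tell me which of the characters in the following text is a non-ASCII character:

Columns(str) – comma-separated list of values. Works only if format is tab or xls. For UnitprotKB, some possible columns are: id, entry name, length, organism. Some column names must be followed by a database name (i.e. ‘database(PDB)’). Again see uniprot website for more details. See also _valid_columns for the full list of column keyword.

Essentially, I am defining a class and trying to give it a docstring that describes how it works: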

def test(self,uniprot_id):
    '''
    Same as the UniProt.search() method arguments:
    search(query, frmt='tab', columns=None, include=False, sort='score', compress=False, limit=None, offset=None, maxTrials=10)


    query (str) -- query must be a valid uniprot query. See http://www.uniprot.org/help/text-search, http://www.uniprot.org/help/query-fields See also example below
    frmt (str) -- a valid format amongst html, tab, xls, fasta, gff, txt, xml, rdf, list, rss. If tab or xls, you can also provide the columns argument. (default is tab)
    include (bool) -- include isoform sequences when the frmt parameter is fasta. Include description when frmt is rdf.
    sort (str) -- by score by default. Set to None to bypass this behaviour
    compress (bool) -- gzip the results
    limit (int) -- Maximum number of results to retrieve.
    offset (int) -- Offset of the first result, typically used together with the limit parameter.
    maxTrials (int) -- this request is unstable, so we may want to try several time.
    Columns(str) -- comma-seperated list of values. Works only if format is tab or xls. For UnitprotKB, some possible columns are: id, entry name, length, organism. Some column names must be followed by a database name (i.e. ‘database(PDB)’). Again see uniprot website for more details. See also _valid_columns for the full list of column keyword. '

    '''        
    u = UniProt()
    uniprot_entry = u.search(uniprot_id)
    return uniprot_entry

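Without line 52, i.e. the line in the quoted comment block that begins with "Columns", this works as expected. As soon as I describe what "Columns" is, I get the following error: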

SyntaxError: Non-ASCII character '\xe2' in file /home/cw00137/Documents/Python/Identify_gene.py on line 52, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Does anyone know what is going on?

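【Question discussion】:

Did you try reading the error message?
Yes. As I said above, line 52 is the line in the quoted comment block that begins with "Columns". Deleting the whole line makes the code work.

Tags: python unicode syntax syntax-error quoting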

【Solution 1】:

You are using "fancy" curly quotes in that line:

>>> u'‘database(PDB)’'
u'\u2018database(PDB)\u2019'

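The opening one is U+2018 LEFT SINGLE QUOTATION MARK and the closing one is U+2019 RIGHT SINGLE QUOTATION MARK.

Either use ASCII quotes (U+0027 APOSTROPHE or U+0022 QUOTATION MARK) or declare an encoding other than ASCII for your source file.

You are also using a U+2013 EN DASH: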

>>> u'Columns(str) –'
u'Columns(str) \u2013'

Replace it with U+002D HYPHEN-MINUS.
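The replacements suggested above can also be applied mechanically. The following is a minimal sketch in Python 3, not part of the original answer: the table name `ASCII_REPLACEMENTS` and the helper `to_ascii_punctuation` are hypothetical, and including the double curly quotes (U+201C, U+201D) is an assumption beyond the three characters found in the question.

```python
# Map typographic punctuation to the ASCII characters named in the answer.
# The double curly quotes are an extra assumption; the question only
# contained U+2018, U+2019 and U+2013.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK  -> APOSTROPHE
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK  -> QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK -> QUOTATION MARK
    "\u2013": "-",   # EN DASH                     -> HYPHEN-MINUS
})


def to_ascii_punctuation(text):
    """Return text with the typographic characters above replaced."""
    return text.translate(ASCII_REPLACEMENTS)


if __name__ == "__main__":
    line = "Columns(str) \u2013 see \u2018database(PDB)\u2019"
    print(to_ascii_punctuation(line))
```

Any other non-ASCII character is left untouched by this table, so it is worth checking the result (for example with `text.encode("ascii")`) before assuming the file is clean.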

All three characters encode to UTF-8 sequences with a leading E2 byte:

>>> u'\u2013 \u2018 \u2019'.encode('utf8')
'\xe2\x80\x93 \xe2\x80\x98 \xe2\x80\x99'

That leading byte is what you see reported as '\xe2' in the SyntaxError exception message.
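To locate such characters without reading the file by eye, a short scan can report each one. This is a sketch in Python 3 rather than the Python 2 used in the question, and the function name `find_non_ascii` is hypothetical. It lists the position, code point, Unicode name and UTF-8 bytes of every character above U+007F.

```python
import unicodedata


def find_non_ascii(source):
    """Yield (line, column, char, name, utf8_bytes) for each non-ASCII character.

    Line and column numbers are 1-based, matching how editors usually count.
    """
    for line_number, line in enumerate(source.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield (
                    line_number,
                    column,
                    char,
                    unicodedata.name(char, "UNKNOWN"),
                    char.encode("utf-8"),
                )


if __name__ == "__main__":
    # Escapes keep this example itself pure ASCII.
    sample = "Columns(str) \u2013 see \u2018database(PDB)\u2019\n"
    for line_number, column, char, name, raw in find_non_ascii(sample):
        print(f"line {line_number}, col {column}: U+{ord(char):04X} {name} -> {raw!r}")
```

To check a real file, read it with an explicit encoding first, for example `open(path, encoding="utf-8").read()`, assuming the file is actually saved as UTF-8.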

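You may want to avoid these characters in the first place. Either your operating system is substituting them as you type, or you are writing code in a word processor rather than a plain-text editor and it is substituting them for you. You probably want to switch that feature off.

【Discussion】:

【Solution 2】:

I ran into the same problem with the same error before. Python 2 assumes ASCII as the source encoding by default. You can try declaring the following comment on the first or second line of the .py file: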

# -*- coding: utf-8 -*-
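To confirm that a declaration like this is actually picked up, the standard library's `tokenize.detect_encoding` can be asked what it reads from the first two lines of a file. The sketch below is for Python 3, where the default without a declaration is UTF-8 rather than the ASCII default of Python 2 discussed here. The wrapper name `declared_encoding` is hypothetical.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python 3 detects for the given file contents.

    Only the first two lines are inspected, as PEP 263 specifies.
    """
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


if __name__ == "__main__":
    first_line = b"# -*- coding: utf-8 -*-\nx = 1\n"
    second_line = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"
    too_late = b"#!/usr/bin/env python\n\n# -*- coding: latin-1 -*-\nx = 1\n"

    print(declared_encoding(first_line))   # declaration on line 1 is honoured
    print(declared_encoding(second_line))  # line 2 works too, e.g. after a shebang
    print(declared_encoding(too_late))     # line 3 is ignored; the default is used
```

The third case shows why the position matters: a declaration on line 3 or later is treated as an ordinary comment.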

【Discussion】: