【Question title】: Why am I getting SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte
【Posted】: 2015-04-17 23:37:40
【Question】:

I got some JSON data from an API. I ran it through json.loads and then printed it to the REPL, as shown below.

  {'warnings': {'query': {'*': "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}}, 'query-continue': {'links': {'plcontinue': '25618423|10|R_from_other_capitalisation', 'gplcontinue': "15095968|0|1991_US_Open_-_Women's_Doubles"}}, 'query': {'pages': {'32203010': {'pageid': 32203010, 'title': "1988 Australian Open - Women's Doubles", 'ns': 0}, '25618558': {'pageid': 25618558, 'title': "1984 Wimbledon Championships - Women's Singles", 'ns': 0}, '29486043': {'pageid': 29486043, 'title': "1984 Wimbledon Championships - Women's Doubles", 'ns': 0}, '25618819': {'pageid': 25618819, 'title': "1986 US Open - Women's Singles", 'ns': 0}, '25619314': {'pageid': 25619314, 'title': "1989 US Open - Women's Singles", 'ns': 0}, '25618668': {'pageid': 25618668, 'title': "1985 US Open - Women's Singles", 'ns': 0}, '25618857': {'pageid': 25618857, 'title': "1987 Australian Open - Women's Singles", 'ns': 0}, '25618423': {'links': [{'title': "1983 Wimbledon Championships – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}], 'pageid': 25618423, 'title': "1983 Wimbledon Championships - Women's Singles", 'ns': 0}, '23826062': {'links': [{'title': "1984 French Open – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}, {'title': 'Template:R from other capitalisation', 'ns': 10}, {'title': 'Template:R from plural', 'ns': 10}, {'title': 'Template:R from short name', 'ns': 10}, {'title': 'Category:Redirects from modifications', 'ns': 14}], 'pageid': 23826062, 'title': "1984 French Open - Women's Singles", 'ns': 0}, '25619177': {'pageid': 25619177, 'title': "1989 Australian Open - Women's Singles", 'ns': 0}}}}

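I then copied that data from the REPL into a .py module and assigned it to a variable so that I could run some unit tests against it. But I keep getting this error: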

SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte

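What is going on?

Update: exactly how I get the error. Using Visual Studio, I ran a script that fetches the data with Requests and reads the content through .text. I then applied json.loads and printed the result to the Visual Studio Python 3.4 Interactive window (a.k.a. the REPL). Then I used the mouse to copy from this REPL and pasted into a .py file in Visual Studio.

Update 2: So when I fetch the data, I use Requests and then the text attribute. When I print that without json.loads, it is fine. However, if I copy this "rawer" output from the REPL and paste it, it is no longer a string but an object, so json.loads will not work on it. Does the Python 3 print function print an object even when it is supposed to be JSON?

Here is the raw output from the API using Requests' .text, without json.loads: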

{"warnings":{"query":{"*":"Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}},"query-continue":{"links":{"plcontinue":"25618423|10|R_from_other_capitalisation","gplcontinue":"15095968|0|1991_US_Open_-_Women's_Doubles"}},"query":{"pages":{"25618423":{"pageid":25618423,"ns":0,"title":"1983 Wimbledon Championships - Women's Singles","links":[{"ns":0,"title":"1983 Wimbledon Championships \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"}]},"23826062":{"pageid":23826062,"ns":0,"title":"1984 French Open - Women's Singles","links":[{"ns":0,"title":"1984 French Open \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"},{"ns":10,"title":"Template:R from other capitalisation"},{"ns":10,"title":"Template:R from plural"},{"ns":10,"title":"Template:R from short name"},{"ns":14,"title":"Category:Redirects from modifications"}]},"29486043":{"pageid":29486043,"ns":0,"title":"1984 Wimbledon Championships - Women's Doubles"},"25618558":{"pageid":25618558,"ns":0,"title":"1984 Wimbledon Championships - Women's Singles"},"25618668":{"pageid":25618668,"ns":0,"title":"1985 US Open - Women's Singles"},"25618819":{"pageid":25618819,"ns":0,"title":"1986 US Open - Women's Singles"},"25618857":{"pageid":25618857,"ns":0,"title":"1987 Australian Open - Women's Singles"},"32203010":{"pageid":32203010,"ns":0,"title":"1988 Australian Open - Women's Doubles"},"25619177":{"pageid":25619177,"ns":0,"title":"1989 Australian Open - Women's Singles"},"25619314":{"pageid":25619314,"ns":0,"title":"1989 US Open - Women's Singles"}}}}

【Comments】:

  • Have you tried replacing the tab at the start of the line with spaces?

Tags: python json unicode utf-8


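【Answer 1】:

Your text contains EN DASH (U+2013) characters. In the Windows-1252 codec they map to the byte \x96. You have an encoding problem, but the exact cause depends on the steps you took to copy the text into the .py file. I cut and pasted the text from your question into Notepad++ with the encoding set to ANSI, assigned it to a variable, and got: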

  File "C:\temp.py", line 1
SyntaxError: unknown decode error

But with UTF-8 or UTF-8 without BOM selected as the encoding, it works correctly. Without a #coding: comment declaring the source encoding, Python 3 assumes UTF-8.
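To see where the 0x96 byte comes from, here is a minimal sketch that encodes an en dash both ways and then decodes the Windows-1252 byte as UTF-8. The variable names are only illustrative, and the exact route by which the byte ended up in your file may differ.

```python
# The en dash (U+2013), as found in the "links" titles of the API data.
EN_DASH = "\u2013"

# What an "ANSI" (Windows-1252) save writes versus what a UTF-8 save writes.
cp1252_bytes = EN_DASH.encode("cp1252")
utf8_bytes = EN_DASH.encode("utf-8")

print(cp1252_bytes)  # b'\x96'
print(utf8_bytes)    # b'\xe2\x80\x93'

# Decoding the Windows-1252 byte as UTF-8 fails, because 0x96 can only be a
# continuation byte in UTF-8 and never the first byte of a character.
try:
    cp1252_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)
    print(decode_error)
```

The message printed by the last step has the same text as the one embedded in the SyntaxError from the question.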

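Note that ANSI on my US Windows system is actually Windows-1252. Using ANSI and adding #coding:windows-1252 also works correctly. Python needs to be told the source encoding whenever it differs from the default (ascii on Python 2, utf-8 on Python 3).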
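The effect of the declaration can be reproduced without an editor by compiling source bytes directly. This is only a sketch under assumptions: the one-line module and the file name temp.py are made up for illustration, and the exact wording of the SyntaxError varies between Python versions.

```python
# Hypothetical one-line module saved as Windows-1252 ("ANSI"):
# the en dash is stored as the single byte 0x96.
source = 'title = "1983 Wimbledon Championships \u2013 Women\'s Singles"\n'
ansi_bytes = source.encode("cp1252")

# Without a declaration, Python 3 treats the bytes as UTF-8 and rejects them.
try:
    compile(ansi_bytes, "temp.py", "exec")
    failed = False
except SyntaxError as exc:
    failed = True
    print("SyntaxError:", exc)

# With a coding declaration on the first line, the same bytes compile.
declared = b"# coding: windows-1252\n" + ansi_bytes
namespace = {}
exec(compile(declared, "temp.py", "exec"), namespace)

# ascii() shows the en dash as an escape, so the output is console-safe.
print(ascii(namespace["title"]))
```

Re-saving the file as UTF-8 is usually the simpler fix; the declaration is mainly useful when the file has to stay in its legacy encoding.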

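【Comments】:

  • I think it happened when I printed to the REPL, because I only get the error when I use json.loads, which defaults to UTF-8, and then print the result, which uses Windows-1252. Ah, what a nightmare. Thanks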