Python 2to3 not working: answers

【Question title】: Python 2to3 not working
【Posted】: 2012-02-26 12:49:40
【Question】:

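I'm currently working through the Python Challenge and have reached level 4 (see here). I've only been learning Python for a few months, and I'm trying to learn Python 3 rather than 2.x. So far so good, except when I use this piece of code. Here is the Python 2.x version: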

import urllib, re
prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search
nothing = '12345'
while True:
    text = urllib.urlopen(prefix + nothing).read()
    print text
    match = findnothing(text)
    if match:
        nothing = match.group(1)
        print "   going to", nothing
    else:
        break

To convert it to Python 3, I changed it to:

import urllib.request, urllib.parse, urllib.error, re
prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search
nothing = '12345'
while True:
    text = urllib.request.urlopen(prefix + nothing).read()
    print(text)
    match = findnothing(text)
    if match:
        nothing = match.group(1)
        print("   going to", nothing)
    else:
        break

If I run the 2.x version, it works fine: it goes through the loop, fetches each URL, and runs to the end. I get the following output:

and the next nothing is 72198
   going to 72198
and the next nothing is 80992
   going to 80992
and the next nothing is 8880
   going to 8880 etc

If I run the 3.x version, I get the following output:

b'and the next nothing is 44827'
Traceback (most recent call last):
  File "C:\Python32\lvl4.py", line 26, in <module>
    match = findnothing(b"text")
TypeError: can't use a string pattern on a bytes-like object

So if I change the r to a b in this line:

findnothing = re.compile(b"nothing is (\d+)").search

I get:

b'and the next nothing is 44827'
   going to b'44827'
Traceback (most recent call last):
  File "C:\Python32\lvl4.py", line 24, in <module>
    text = urllib.request.urlopen(prefix + nothing).read()
TypeError: Can't convert 'bytes' object to str implicitly

Any ideas?

I'm pretty new to programming, so please don't bite my head off.

_bk201

【Comments】:

Tags: python python-3.x python-2to3

【Solution 1】:

You can't implicitly mix bytes and str objects.

The simplest approach is to decode the bytes returned by urlopen().read() and use str objects everywhere:

text = urllib.request.urlopen(prefix + nothing).read().decode() #note: utf-8
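For reference, here is a minimal sketch of how the whole loop might look with that change applied. The names fetch_text, follow_chain and max_steps are made up for illustration and are not part of the original code; the fetch parameter exists only so the loop can be tried without hitting the network, and decode() assumes the page is UTF-8 compatible.

```python
import re
import urllib.request

PREFIX = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
# str pattern: it will be applied to decoded text, not to raw bytes
find_nothing = re.compile(r"nothing is (\d+)").search


def fetch_text(nothing):
    """Download one page and decode it to str (hypothetical helper)."""
    with urllib.request.urlopen(PREFIX + nothing) as response:
        # read() returns bytes; decode() defaults to UTF-8 (an assumption here)
        return response.read().decode()


def follow_chain(nothing, fetch=fetch_text, max_steps=500):
    """Follow the 'next nothing' links and return the values visited.

    max_steps is an illustrative safety cap, not something the challenge defines.
    """
    visited = []
    for _ in range(max_steps):
        text = fetch(nothing)
        match = find_nothing(text)
        if not match:
            break
        nothing = match.group(1)  # a str, so PREFIX + nothing keeps working
        visited.append(nothing)
    return visited
```

Calling follow_chain('12345') would start from the same value as the question; passing a different fetch function lets you check the loop against canned pages first.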

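The page doesn't specify a preferred character encoding via the Content-Type header or a <meta> element. I don't know what the default encoding for text/html should be, but RFC 2068 says:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.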
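If you would rather not hard-code the encoding, one possible approach is to read the charset from the Content-Type header and fall back to ISO-8859-1 when none is given. This is only a sketch: decode_body is a hypothetical helper, and it relies on the headers object being an email.message.Message, which is the case for the headers attribute of the response returned by urllib.request.urlopen().

```python
from email.message import Message


def decode_body(raw, headers, default="iso-8859-1"):
    """Decode raw bytes using the charset declared in Content-Type.

    Falls back to `default` when the header carries no charset parameter.
    An unknown charset name would still raise LookupError.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset)


# Stand-in for response.headers, built by hand so no network is needed
headers = Message()
headers["Content-Type"] = "text/html"  # no charset parameter
text = decode_body(b"and the next nothing is 44827", headers)
```

With a real response you would call decode_body(response.read(), response.headers).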

【Discussion】:

【Solution 2】:

Regular expressions only make sense for text, not for binary data. So keep findnothing = re.compile(r"nothing is (\d+)").search and convert text to a string.

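【Discussion】:

Thank you too! That does make a lot of sense.
You can apply a regular expression to bytes, but in that case the pattern must be bytes as well.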
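To illustrate the bytes route from the comment above: the pattern is compiled from a bytes literal, and the captured group then has to be decoded before it can be concatenated with the str URL prefix, which is the step missing from the second traceback in the question. This is a sketch with a hard-coded sample page instead of a real download.

```python
import re

prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="

# bytes pattern (rb"...") so it can be applied to the raw response body
find_nothing_bytes = re.compile(rb"nothing is (\d+)").search

page = b"and the next nothing is 44827"  # sample body, not fetched live
match = find_nothing_bytes(page)

# group(1) is bytes here; decode it before joining it to the str prefix
next_nothing = match.group(1).decode("ascii")
next_url = prefix + next_nothing
```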

【Solution 3】:

We use requests rather than urllib. It offers two options (perhaps you can find something similar in urllib).

The response object:

>>> import requests
>>> response = requests.get('https://api.github.com')

Using response.content, you get the body as bytes:

>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_us...."}'

Using response.text, you get the body decoded to str:

>>> response.text
'{"current_user_url":"https://api.github.com/user","current_us...."}'

requests picks the encoding based on the response headers (often utf-8), but you can override it after the request like this:

>>> import requests
>>> response = requests.get('https://api.github.com')
>>> response.encoding = 'SOME_ENCODING'

response.text will then hold the content decoded with the encoding you set.

【Discussion】: