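【Posted】: 2015-02-28 07:20:35
【Question】:
I'm new to Python, and I've learned that there is no difference between single-quoted and double-quoted strings. However, I've run into some behavior that differs.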
from bs4 import BeautifulSoup
import urllib.request
url1 = "http://www.backpackers.com.tw/forum/forumdisplay.php?f=310"
url2 = 'http://www.backpackers.com.tw/forum/forumdisplay.php?f=310'
If I run:
response = urllib.request.urlopen(url1)
Result: the script finishes with no errors.
If I run:
response = urllib.request.urlopen(url2)
Result: an error.
C:\Users\user1\Desktop\scrape>python backpacker_tw.py
Traceback (most recent call last):
  File "C:\Python34\lib\urllib\request.py", line 1189, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python34\lib\http\client.py", line 1090, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1128, in _send_request
    self.endheaders(body)
  File "C:\Python34\lib\http\client.py", line 1086, in endheaders
    self._send_output(message_body)
  File "C:\Python34\lib\http\client.py", line 924, in _send_output
    self.send(msg)
  File "C:\Python34\lib\http\client.py", line 859, in send
    self.connect()
  File "C:\Python34\lib\http\client.py", line 836, in connect
    self.timeout, self.source_address)
  File "C:\Python34\lib\socket.py", line 509, in create_connection
    raise err
  File "C:\Python34\lib\socket.py", line 500, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "backpacker_tw.py", line 7, in <module>
    response = urllib.request.urlopen(url2)
  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 455, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 473, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1215, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python34\lib\urllib\request.py", line 1192, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it>
Is this a bug, or am I missing something?
C:\Users\user1\Desktop\scrape>python -V
Python 3.4.1
【Comments】:
-
Both work for me. Did you retry? The second call may simply have failed because of a connection or server problem.
-
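If I trust en.wikipedia.org/wiki/Percent-encoding, the apostrophe is a valid URI character, but the latest RFC says it is a reserved character and should be encoded as %27 (and when encoded that way, it does work in my browser).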
-
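@MohitBhasi: Sure, an apostrophe in a URL should be percent-encoded, but there is no apostrophe in that URL. FWIW, when you pass a string literal to a function, the function receives only the string's characters, not the delimiters, regardless of whether you used ', ", or triple quotes, or whether the string was created dynamically by some str method, etc.
-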
Maybe the site doesn't like being accessed by a script. Perhaps try changing the User-Agent.
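The suggestions in the comments can be sketched roughly as follows. This is only an illustration, not a confirmed fix: the helper names build_request and fetch, the User-Agent value, the retry count, and the timeout are all assumptions made for the example. It first checks that the two literals from the question are the same string, then sends the request with an explicit User-Agent and retries a few times in case the refusal was transient.

```python
import urllib.error
import urllib.request

url1 = "http://www.backpackers.com.tw/forum/forumdisplay.php?f=310"
url2 = 'http://www.backpackers.com.tw/forum/forumdisplay.php?f=310'

# The quote characters only delimit the literal; both produce equal str objects.
assert url1 == url2


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-script)"):
    """Return a Request with an explicit User-Agent (the value is illustrative)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, attempts=3, timeout=10):
    """Try the request up to `attempts` times (assumed >= 1) and return the body."""
    last_error = None
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
                return response.read()
        except urllib.error.URLError as err:
            # Remember the failure and try again; the refusal may be temporary.
            last_error = err
    raise last_error
```

Calling fetch(url1) and fetch(url2) should behave identically, since both names refer to equal strings; if one call fails and the other succeeds, the cause is more likely the network or the server than the quoting.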
Tags: python beautifulsoup urllib python-3.4