Using an HTTP proxy - Python [duplicate]: answers

【Question title】: Using an HTTP PROXY - Python [duplicate]
【Posted】: 2011-08-02 23:29:09
【Question description】:

I know that I should set the HTTP_PROXY environment variable to the proxy address.

urllib generally works fine; the problem is with urllib2.

>>> urllib2.urlopen("http://www.google.com").read()

returns

urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

or

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

Additional info:

urllib.urlopen(....) works fine! It is only urllib2 that is playing tricks...

I tried @Fenikso's answer, but I am now getting this error:

URLError: <urlopen error [Errno 10060] A connection attempt failed because the 
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>

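Any ideas?

【Comments】:

Can you post the actual, complete example code?
@Fenikso: Just this: urllib2.urlopen("http://www.google.com").read()
So you have the proxy server set in the HTTP_PROXY environment variable? Are you sure the server accepts connections?

Tags: python http proxy urllib2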

【Solution 1】:

You can do this even without the HTTP_PROXY environment variable. Try this sample:

import urllib2

proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)

html = urllib2.urlopen("http://www.google.com").read()
print html

In your case, it really does look as if the proxy server is refusing the connection.
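To tell the three errors from the question apart before touching urllib2 at all, it can help to open a plain TCP connection to the proxy. The following is only a minimal sketch, written for Python 3 and using nothing but the standard `socket` module; the function name `probe` and the returned labels are made up for illustration. Errno 10061, 10060 and 11004 are the Windows socket codes for a refused connection, a timed-out connection and a failed name lookup respectively.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a plain TCP connection attempt to host:port.

    The returned labels are illustrative, not part of any library API.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # Host name could not be resolved (compare Errno 11004).
        return "dns failure"
    except ConnectionRefusedError:
        # Nothing is listening on that port (compare Errno 10061).
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout (compare Errno 10060).
        return "timeout"
    except OSError:
        # Any other socket-level failure.
        return "other error"
```

Calling `probe("YOUR_PROXY_HOST", 80)` with the host and port of your own proxy shows whether the proxy is reachable at the TCP level; if it is not, no urllib2 configuration will help.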

Something more to try:

import urllib2

#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"

proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers={'User-agent' : 'Mozilla/5.0'}

proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)

req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html

Edit 2014: This seems to be a popular question/answer. Today, however, I would use the third-party requests module instead.

For a single request, just do:

import requests

r = requests.get("http://www.google.com", 
                 proxies={"http": "http://61.233.25.166:80"})
print(r.text)

For multiple requests, use a Session object so you do not have to add the proxies parameter to every request:

import requests

s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}

r = s.get("http://www.google.com")
print(r.text)

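【Comments】:

Thanks for the reply! :) Now I am getting URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>... urllib works just fine.
@RadiantHex - Works fine on my system. Is there a proxy you have to use for internet access?
@RadiantHex - What type of proxy are you using?
@Fenikso: I have to use an HTTP proxy to access the internet, the same one I use for all my software. It is the same proxy I set in the HTTP_PROXY variable.
@RadiantHex - Could the proxy be refusing the connection because of the user agent, then?

【Solution 2】:

I suggest you just use the requests module.

It is much easier than the built-in HTTP clients: http://docs.python-requests.org/en/latest/index.html

Example usage: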

import requests

r = requests.get('http://www.thepage.com', proxies={"http":"http://myproxy:3129"})
thedata = r.content

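【Comments】:

How do you set a timeout?
Awesome. This works for both https and http, whereas urllib only worked for http for me on Python 3.
I thought this was working for me, but I tried putting in random proxy info and it still retrieved the data every time (as long as https was used).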
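Two of the comments above can be addressed with one short sketch. requests selects a proxy by the scheme of the requested URL, so a proxies dict that only has an "http" key leaves https URLs unproxied, which would explain data still arriving with bogus proxy info. A timeout is passed with the `timeout` argument of `requests.get`. The helper names `proxies_for` and `fetch`, and the timeout values, are assumptions made for illustration and are not part of requests.

```python
def proxies_for(proxy_url):
    """Route both http and https URLs through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxy_url, timeout=(3.05, 10)):
    """GET url through proxy_url.

    timeout is (connect seconds, read seconds); the values are arbitrary.
    """
    # Third-party package; imported here so that proxies_for() can be
    # used on its own even where requests is not installed.
    import requests

    response = requests.get(url, proxies=proxies_for(proxy_url), timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return response.content
```

With this in place, `fetch('https://www.thepage.com', 'http://myproxy:3129')` should fail when the proxy address is bogus instead of silently going direct.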

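【Solution 3】:

Just wanted to mention that you may also have to set the https_proxy OS environment variable if you need to access https URLs. In my case this was not obvious, and it took me hours of trying before I found it.

My use case: Win 7, jython-standalone-2.5.3.jar, installing setuptools via ez_setup.py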
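The effect of the two variables can be checked without any network traffic. This is a small sketch for Python 3, where `urllib.request.getproxies_environment()` reports the proxies picked up from the environment; the proxy address is a placeholder, not a real server.

```python
import os
import urllib.request

# Hypothetical proxy address, used purely for illustration.
PROXY = "http://proxy.example.com:3128"

# Start from a clean slate so the effect of each variable is visible.
for name in ("http_proxy", "HTTP_PROXY", "https_proxy", "HTTPS_PROXY"):
    os.environ.pop(name, None)

# With only http_proxy set, https URLs have no proxy configured.
os.environ["http_proxy"] = PROXY
http_only = urllib.request.getproxies_environment()

# Setting https_proxy as well covers https URLs too.
os.environ["https_proxy"] = PROXY
http_and_https = urllib.request.getproxies_environment()

print("https" in http_only)
print(http_and_https.get("https"))
```

The first print shows that http_proxy alone does not register an https proxy; the second shows the https entry once https_proxy is set.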

【Comments】:

【Solution 4】:

Python 3:

import urllib.request

url = "http://www.google.com"  # example target; substitute your own URL

htmlsource = urllib.request.FancyURLopener({"http":"http://127.0.0.1:8080"}).open(url).read().decode("utf-8")

【Comments】:

From the traceback: DeprecationWarning: FancyURLopener style of invoking requests is deprecated. Use newer urlopen functions/methods.
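Following that warning, a non-deprecated Python 3 version can be built from `urllib.request.ProxyHandler` and `build_opener`, the direct successors of the urllib2 calls in Solution 1. This is a minimal sketch; the helper names `make_proxy_opener` and `fetch_text` are invented for illustration, and the timeout and encoding defaults are assumptions.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests via proxy_url."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


def fetch_text(url, proxy_url, timeout=10, encoding="utf-8"):
    """Download url through the proxy and decode the body as text."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode(encoding)
```

For example, `fetch_text("http://www.google.com", "http://127.0.0.1:8080")` corresponds to the one-liner above, assuming a proxy is listening on that local port.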

【Solution 5】:

I ran into this problem with a Jython client. The server only accepted TLS, while the client was using an SSL context.

javax.net.ssl.SSLContext.getInstance("SSL")

Once the client used TLS, everything started working.

【Comments】: