How can I open a website with urllib via proxy in Python?

【Question title】: How can I open a website with urllib via proxy in Python?
【Posted】: 2011-03-11 05:41:20
【Question】:

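I have this program that checks a website, and I would like to know how to make it check through a proxy in Python...

Here is the code, just as an example: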

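import time
import urllib  # Python 2: urllib.urlopen no longer exists in Python 3

website = 'http://www.example.com/'  # placeholder; the original post does not show this value

while True:
    try:
        h = urllib.urlopen(website)
        break
    except IOError:  # Python 2's urllib.urlopen reports connection failures as IOError
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)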

【Comments】:

urllib2 stackoverflow.com/questions/1450132/proxy-with-urllib2

Tags: python proxy

【Solution 1】:

By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
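
To check what urllib would actually pick up from the environment, here is a minimal sketch. It assumes Python 3, where the lookup is exposed as urllib.request.getproxies(); the proxy address is the same placeholder as in the shell example and is set inside the script only for illustration.

```python
import os
import urllib.request

# Assumption: same placeholder proxy as in the shell example, set here for illustration.
os.environ['http_proxy'] = 'http://myproxy.example.com:1234'

# getproxies() returns a dict mapping URL scheme to proxy URL, built from the
# *_proxy environment variables (on some platforms it falls back to system
# settings when no such variables are set).
detected = urllib.request.getproxies()
print("HTTP proxy picked up from the environment: %s" % detected.get('http'))
```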

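If you instead want to specify the proxy inside your application, you can pass a proxies argument to urlopen (this argument exists in Python 2's urllib.urlopen; Python 3's urllib.request.urlopen does not accept it):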

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)

Edit: If I understand your comments correctly, you want to try several proxies and print each one as you try it. How about something like this?

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except:
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
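
The loop above relies on the proxies argument of Python 2's urllib.urlopen. A rough Python 3 counterpart could look like the sketch below, which builds one opener per candidate with ProxyHandler. The names make_opener, fetch_with_fallback and opener_factory are made up for this illustration, and the example.com hosts are placeholders.

```python
import time
import urllib.request


def make_opener(proxy_url):
    """Build an opener that sends plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    return urllib.request.build_opener(handler)


def fetch_with_fallback(url, candidate_proxies, delay=5, timeout=10,
                        opener_factory=make_opener):
    """Try each proxy in order; return (proxy, body) for the first that works.

    Returns (None, None) when every candidate fails.
    """
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        try:
            with opener_factory(proxy).open(url, timeout=timeout) as response:
                body = response.read()
        except OSError:  # URLError and socket timeouts are OSError subclasses
            print("Proxy %s failed, trying the next one in %s seconds" % (proxy, delay))
            time.sleep(delay)
        else:
            print("Got URL using proxy %s" % proxy)
            return proxy, body
    return None, None
```

It could be called as fetch_with_fallback("http://www.google.com", candidate_proxies). Because the proxy is printed inside the loop, the output always shows which proxy is in use for the current attempt.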

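【Discussion】:

Using your example, how can I print the proxy that is being used at the moment urlopen is called?
@Shady: Just add a print statement that prints the value of proxies['http']. See my updated example for how it is done.
OK, thanks, but what if I want more proxies, say a lot of them, for example 10 proxies, opening one before moving to the next?
@Shady: Do you mean you want to try a new proxy for each call until you find one that works? Change the proxies argument of each call to urlopen, passing in a new proxy each time.
Actually, I want to check the website with a number of proxies, say 10, and then repeat the process with the same proxies. The problem is how to print which proxy urlopen is using for the check at that moment.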

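【Solution 2】:

Python 3 is slightly different here. It tries to auto-detect the proxy settings, but if you need specific or manual proxy settings, consider code like this: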

#!/usr/bin/env python3
import urllib.request

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port', 
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

with urllib.request.urlopen(url) as response:  # url: the address you want to fetch
    html = response.read()  # response body as bytes

See also the relevant section in the Python 3 docs.

【Discussion】:

【Solution 3】:

The example code here shows how to connect through a proxy with urllib:

import urllib.request

# Basic-auth handler; as written it holds no credentials until add_password() is called on it
authinfo = urllib.request.HTTPBasicAuthHandler()

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.google.com/')

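In the snippet above, authinfo is an HTTPBasicAuthHandler created without any credentials, so on its own it has nothing to send. A hedged sketch of how it could be populated, using an explicit HTTPPasswordMgr: the realm, URI, user name and password below are made-up placeholders, not values from the answer.

```python
import urllib.request

# Hypothetical credentials for a site that asks for HTTP Basic authentication.
password_mgr = urllib.request.HTTPPasswordMgr()
password_mgr.add_password(realm='Example Realm',
                          uri='http://www.example.com/private/',
                          user='alice',
                          passwd='s3cret')

# The handler answers 401 challenges with credentials looked up in password_mgr.
authinfo = urllib.request.HTTPBasicAuthHandler(password_mgr)

proxy_support = urllib.request.ProxyHandler({"http": "http://ahad-haam:3128"})
opener = urllib.request.build_opener(proxy_support, authinfo)

# Look up which credentials would be offered for a URL under the registered URI.
stored = password_mgr.find_user_password('Example Realm',
                                         'http://www.example.com/private/page.html')
print(stored)
```

Note that this handler authenticates against the target site. Credentials for the proxy itself are a separate matter, for example embedded in the proxy URL as user:pass@host:port.
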
【Discussion】:

Can you explain what authinfo is and give an example? Thanks.

【Solution 4】:

For http and https, use:

proxies = {'http':'http://proxy-source-ip:proxy-port',
           'https':'https://proxy-source-ip:proxy-port'}

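Entries for other URL schemes can be added the same way. Each key can appear only once in a dict (a repeated 'http' key would silently overwrite the earlier one), so the mapping holds one proxy per scheme:

proxies = {'http':'http://proxy1-source-ip:proxy-port',
           'ftp':'http://proxy2-source-ip:proxy-port',
           # ...
          }

Usage: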

filehandle = urllib.urlopen( external_url , proxies=proxies)

To bypass all proxies (for example, for a link inside the network):

filehandle = urllib.urlopen(external_url, proxies={})
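
The proxies argument used here belongs to Python 2's urllib.urlopen. In Python 3, a rough equivalent is to pass an empty dict to ProxyHandler, which turns off auto-detected proxies. The sketch below assumes Python 3; direct_opener and fetch_directly are names made up for this illustration.

```python
import urllib.request

# An empty mapping tells ProxyHandler to use no proxy at all,
# overriding anything that would be auto-detected from the environment.
no_proxy_handler = urllib.request.ProxyHandler({})
direct_opener = urllib.request.build_opener(no_proxy_handler)


def fetch_directly(url, timeout=10):
    """Fetch url without going through any proxy and return the body as bytes."""
    with direct_opener.open(url, timeout=timeout) as response:
        return response.read()
```

Using a dedicated opener instead of install_opener keeps the no-proxy behavior local to these calls, so other code in the same process can still use the default proxy settings.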

To use proxy authentication with a username and password:

proxies = {'http':'http://username:password@proxy-source-ip:proxy-port',
           'https':'https://username:password@proxy-source-ip:proxy-port'}

Note: avoid special characters such as : and @ in the username and password.
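
If such characters cannot be avoided, one common workaround is to percent-encode the credentials before placing them in the proxy URL. This is a sketch under the assumption that the client decodes the escapes again (Python 3's urllib appears to do so for proxy credentials, but verify this against your Python version and proxy). The helper name, host, port and credentials are placeholders invented for the example.

```python
from urllib.parse import quote


def proxy_url_with_credentials(username, password, host, port, scheme='http'):
    """Build a proxy URL, percent-encoding the user name and password."""
    # safe='' makes quote() escape every reserved character, including ':', '@' and '/'.
    return '%s://%s:%s@%s:%d' % (scheme,
                                 quote(username, safe=''),
                                 quote(password, safe=''),
                                 host, port)


proxies = {'http': proxy_url_with_credentials('user', 'p@ss:word',
                                              'proxy.example.com', 3128)}
print(proxies['http'])
```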

【Discussion】: