What should I use to open a URL instead of urlopen in urllib3? (Answers)

【Question title】: What should I use to open a url instead of urlopen in urllib3
【Posted】: 2016-07-30 16:42:21
【Question description】:

I want to write a piece of code like the following:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)

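But I found that I now have to install the urllib3 package.

Moreover, I couldn't find any tutorial or example showing how to rewrite the code above; for example, urllib3 has no urlopen.

Is there any explanation or example?

P.S. I am using Python 3.4.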

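【Question discussion】:

Why do you need to install urllib3 to run the example?
Because it didn't work for me: urllib2 was not found.
@niloofar In Python 3.4, the functionality of urllib2 lives in urllib. from urllib.request import urlopen should work in this case.
Don't use urllib3. Do this: import urllib.request; urllib.request.urlopen('https://...')

Tags: python web-scraping beautifulsoup urllib3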

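【Solution 1】:

urllib3 is a different library from urllib and urllib2. It offers many features beyond what the standard library's urllib modules provide, such as connection reuse, should you need them. The documentation is here: https://urllib3.readthedocs.org/

If you want to use urllib3, you need to pip install urllib3. A basic example looks like this: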

from bs4 import BeautifulSoup
import urllib3

http = urllib3.PoolManager()

url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')  # explicit parser avoids a bs4 warning

【Discussion】:

response.read() doesn't work, at least in Python 2.7. According to the documentation (urllib3.readthedocs.io/en/latest/user-guide.html), it should be html = response.data.
This example gives me an exception (urllib3.exceptions.MaxRetryError) on Python 3.
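
The MaxRetryError mentioned above is what urllib3 raises once it has used up its retry budget, for example when the host is unreachable or the connection is refused. The following is a minimal sketch, assuming urllib3 is installed, of how the basic example could set explicit retry and timeout limits and handle that exception. The helper name fetch_bytes and the default values are illustrative assumptions, not part of the original answer.

```python
import urllib3


def fetch_bytes(url, retries=2, timeout=5.0):
    """Return the response body as bytes, or None if all attempts fail.

    fetch_bytes is a hypothetical helper; the defaults are arbitrary.
    """
    http = urllib3.PoolManager(
        # Retry a limited number of times, waiting a little between attempts.
        retries=urllib3.Retry(total=retries, backoff_factor=0.5),
        # Apply the same limit to connecting and to reading.
        timeout=urllib3.Timeout(connect=timeout, read=timeout),
    )
    try:
        response = http.request('GET', url)
    except urllib3.exceptions.MaxRetryError as exc:
        # exc.reason holds the underlying error (DNS failure, refused connection, ...).
        print('Giving up on {}: {}'.format(url, exc.reason))
        return None
    return response.data
```

The returned bytes can then be passed to BeautifulSoup as in the example above. Printing exc.reason is usually the quickest way to see why the request failed.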

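【Solution 2】:

You don't have to install urllib3. You can pick any HTTP library that fits your needs and feed the response to BeautifulSoup. requests is the usual choice, though, because of its rich feature set and convenient API. You can install requests by running pip install requests on the command line. Here is a basic example: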

from bs4 import BeautifulSoup
import requests

url = "url"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
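
As a slightly fuller sketch of the same idea, assuming both requests and beautifulsoup4 are installed, the fetch can be wrapped in a function that sets a timeout and fails loudly on HTTP errors, with a second function showing one thing to do with the resulting soup. The names fetch_soup and link_targets are hypothetical helpers introduced only for illustration.

```python
from bs4 import BeautifulSoup
import requests


def fetch_soup(url, timeout=10):
    """Download a page and parse it; raises on network or HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


def link_targets(soup):
    """Return the href value of every <a> tag that has one."""
    return [a["href"] for a in soup.find_all("a", href=True)]
```

With these in place, link_targets(fetch_soup(url)) would list the links on the page at url.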

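【Discussion】:

FWIW, if you want to use requests, you still need to install requests. Neither of them ships with Python.
requests depends on urllib3.
@CeesTimmerman I tried requests without urllib3 and it worked; why would it depend on urllib3?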
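
One way to settle the dependency question is to ask the installed package metadata. pip installs declared dependencies automatically, which may be why requests seems to work "without" urllib3. The sketch below assumes Python 3.8 or newer (for importlib.metadata, so it does not apply to the Python 3.4 in the question); declared_dependencies and depends_on are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package):
    """Return the requirement strings an installed package declares ([] if unknown)."""
    try:
        # requires() returns None when the package declares no dependencies.
        return list(requires(package) or [])
    except PackageNotFoundError:
        return []


def depends_on(package, dependency):
    """Check whether `dependency` is among the declared requirements of `package`."""
    for requirement in declared_dependencies(package):
        # The distribution name is everything before the first version/extras marker.
        name = re.split(r"[^A-Za-z0-9._-]", requirement, maxsplit=1)[0]
        if name.lower() == dependency.lower():
            return True
    return False
```

On a machine with a current requests release installed, depends_on("requests", "urllib3") should report the dependency.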

【Solution 3】:

The new urllib3 library has good documentation here
To get the result you want, you should do the following:

import urllib3
from bs4 import BeautifulSoup

url = 'http://www.thefamouspeople.com/singers.php'

http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')

The "decode utf-8" part is optional. It worked without it when I tried, but I posted that option anyway.
Source: User Guide

【Discussion】:

Does requests simply use urllib3 under the hood?
@PirateApp Yes.

【Solution 4】:

With gazpacho, you can pipe the page directly into a parseable soup object:

from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)

Then run finds on it:

soup.find("div")

【Discussion】:

【Solution 5】:

urllib3 has no top-level urlopen function; try this:

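import requests
url = 'http://www.thefamouspeople.com/singers.php'  # URL from the question
html = requests.get(url)  # a Response object; the page text is in html.text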

【Discussion】:

【Solution 6】:

You should use urllib.request, not urllib3.

import urllib.request   # not urllib - important!
urllib.request.urlopen('https://...')
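
urlopen returns a response object rather than text, so the body still has to be read and decoded. A minimal sketch, assuming Python 3 and only the standard library, might look like this; fetch_html is a hypothetical helper name, and falling back to UTF-8 when no charset is declared is an assumption.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Fetch a URL with the standard library and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The question's code could then become soup = BeautifulSoup(fetch_html(url), "html.parser"), with no need for urllib2 or urllib3.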

【Discussion】: