Using urllib and BeautifulSoup to retrieve info from web with Python

【Question title】: Using urllib and BeautifulSoup to retrieve info from web with Python
【Posted】: 2011-02-08 11:37:01
【Question】:

I can fetch an HTML page with urllib and parse HTML with BeautifulSoup, but it looks like I have to write the page to a file for BeautifulSoup to read.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to call BeautifulSoup on the urllib output without writing it to a file first?

【Comments】:

Tags: python web-scraping beautifulsoup urllib2

【Answer 1】:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

There is no need to write a file: just pass in the HTML string. You can also pass the object returned by urlopen directly:

# urlopen returns a file-like object; BeautifulSoup reads from it itself
f = urllib.urlopen("http://SOMEWHERE")
soup = BeautifulSoup(f)
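
The snippets above use the Python 2 urllib module and the BeautifulSoup 3 import. For readers on Python 3, here is a rough sketch of the same idea, assuming the beautifulsoup4 package is installed. The helper names soup_from_markup and soup_from_url and the sample HTML are made up for illustration, and "html.parser" is just one possible parser choice. The demonstration at the bottom runs offline, using io.BytesIO as a stand-in for the file-like object urlopen returns.

```python
import io
from urllib.request import urlopen  # Python 3 location of urlopen

from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)


def soup_from_markup(markup):
    """Parse a str, bytes, or file-like object entirely in memory.

    Hypothetical helper: BeautifulSoup calls .read() on file-like input,
    so no temporary file is needed.
    """
    return BeautifulSoup(markup, "html.parser")


def soup_from_url(url):
    """Fetch url and hand the response object straight to BeautifulSoup."""
    with urlopen(url) as response:
        return soup_from_markup(response)


# Offline demonstration with made-up HTML: a string and a file-like
# object are both accepted and parse to the same document.
SAMPLE_HTML = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"
soup_from_string = soup_from_markup(SAMPLE_HTML)
soup_from_file_like = soup_from_markup(io.BytesIO(SAMPLE_HTML.encode("utf-8")))

print(soup_from_string.title.string)
print(soup_from_file_like.p.get_text())
```

With network access, soup_from_url("http://SOMEWHERE") would play the role of the two-line urlopen example above.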

【Comments】:

【Answer 2】:

You can use gazpacho to open the URL, download the HTML, and parse it in one step:

from gazpacho import Soup
soup = Soup.get("https://www.example.com/")

【Comments】: