How do I collect elements with Beautiful Soup? (Answers)

【Question title】: How do I collect elements with Beautiful Soup?
【Posted】: 2019-11-14 13:30:50
【Question description】:

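I'm trying to build a web scraper with Beautiful Soup, but every time I try to scrape a website I get nothing back. In the code below I use requests to fetch the site and then pass it to a BeautifulSoup object. After that I try to grab all the tags.

I've watched YouTube tutorials and read the framework's documentation, but I just can't figure out how to use it.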

from bs4 import BeautifulSoup
import bs4
import urllib

url = requests.get("https://www.rt.com/")

print(url.status_code)

soup = BeautifulSoup(url.content, 'html.parser')

soup.find_all('div')

【Comments on the question】:

Tags: python web beautifulsoup

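【Solution 1】:

You never import the requests package, and you aren't doing anything with the result: find_all returns the matching elements, but your code neither stores nor prints them.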

from bs4 import BeautifulSoup
import requests

url = requests.get("https://www.rt.com/")

print(url.status_code)

soup = BeautifulSoup(url.content, 'html.parser')

divs = soup.find_all('div') # save results to a variable

# Print the text inside each div (example of how to use the results)
for div in divs:
    print(div.text)
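
To see the same pattern without depending on a live website, here is a minimal sketch that parses a small inline HTML string. The HTML, the `card` class, and the variable names `div_texts` and `cards` are made up for illustration; only `find_all`, `get_text`, and the `class_` filter come from Beautiful Soup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML so the example runs without a network request.
html = """
<html><body>
  <div id="header">Top stories</div>
  <div class="card">First headline</div>
  <div class="card">Second headline</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet; store it so it can be used later.
divs = soup.find_all("div")

# get_text(strip=True) returns each element's text without surrounding whitespace.
div_texts = [div.get_text(strip=True) for div in divs]

# Narrow the search to divs with a given CSS class (class_ avoids the keyword).
cards = soup.find_all("div", class_="card")

print(len(divs))
print(div_texts)
print(len(cards))
```

In a script, a bare `soup.find_all('div')` line produces no output, which is likely why the original code appeared to return nothing; the matches only become visible once they are stored and printed as above.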

【Comments】:

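【Solution 2】:

First, your code cannot work as it stands because you forgot to import the requests package. Once you import it, the request will work.

Second, I recommend reading the BeautifulSoup docs thoroughly; they have all the answers you need. For example, if you need every anchor on the page, just assign them to a variable like this: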

elems = soup.find_all('a')

After that you can work with it as a result set, so if you need to extract the links from the anchor elements you can do something like this:

for link in elems:
    print(link.get('href'))

# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
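
The loop above can be tried end to end with a small self-contained sketch. The HTML here is a hypothetical stand-in built around the three example.com URLs shown in the output, plus one anchor without an `href` (an assumption added to show what `get` does in that case). The names `hrefs` and `links_only` are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: three linked anchors and one anchor with no href attribute.
html = """
<p class="story">
  <a href="http://example.com/elsie" id="link1">Elsie</a>,
  <a href="http://example.com/lacie" id="link2">Lacie</a> and
  <a href="http://example.com/tillie" id="link3">Tillie</a>
  <a id="bare">anchor without a link</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# All anchors on the page, in document order.
elems = soup.find_all("a")

# Tag.get returns None when the attribute is missing, so the last entry is None.
hrefs = [link.get("href") for link in elems]

# href=True keeps only anchors that actually carry an href attribute.
links_only = [a["href"] for a in soup.find_all("a", href=True)]

for url in links_only:
    print(url)
```

Filtering with `href=True` is one way to avoid printing `None` for anchors that are not links; checking the value inside the loop would work as well.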

【Comments】: