[Question title]: How to extract text from a nested tag in Beautiful Soup?
[Posted]: 2019-11-06 15:48:04
[Question description]:

I want to scrape Google search results and get the first piece of information that appears. How do I specify a particular HTML path to extract text from?

import requests
import lxml
from bs4 import BeautifulSoup

city = "Potomac"
suffix = "Weather"
query = city + " " + suffix

url = "https://www.google.com/search?q=" + query

# Now have the best URL for a city
results = requests.get(url)

# Extract all content
src = results.content

# Get HTML soup of all content on that page
soup = BeautifulSoup(src, "lxml")
# print(soup.prettify())

# Try to find and print specific places
precip = soup.find_all("span", attrs={"id": "wob_pp"})

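I expected to find all the span tags (span is the tag that holds the data I am trying to extract), but many of the nested span tags do not appear in the result.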

[Comments]:

    Tags: python html google-chrome beautifulsoup


    [Solution 1]:

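    Send the request with HTTP headers. HTTP headers let the client and the server pass additional information along with a request or a response. Here, set a browser-like User-Agent: without one, Google likely serves a simplified page in which elements such as the wob_pp span are missing.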

    results = requests.get(url, headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'})
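
Putting the pieces together, the following is a minimal sketch rather than a definitive implementation. It assumes Google still marks the precipitation value with the id `wob_pp` (taken from the question) and that the User-Agent from the answer is still accepted. The helper names `extract_text_by_id` and `fetch_precipitation` are hypothetical, and the built-in `html.parser` is used in place of `lxml` only to avoid an extra dependency.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, copied from the answer above.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
    )
}


def extract_text_by_id(html, element_id, parser="html.parser"):
    """Return the stripped text of the first tag with this id, or None.

    soup.find searches the whole tree, so the tag is found no matter
    how deeply it is nested.
    """
    soup = BeautifulSoup(html, parser)
    tag = soup.find(id=element_id)
    return tag.get_text(strip=True) if tag is not None else None


def fetch_precipitation(city):
    """Hypothetical helper: fetch the search page and read the wob_pp span."""
    response = requests.get(
        "https://www.google.com/search",
        params={"q": f"{city} Weather"},  # requests URL-encodes the query
        headers=HEADERS,
        timeout=10,
    )
    response.raise_for_status()
    # Assumption: the id wob_pp comes from the question and may change.
    return extract_text_by_id(response.content, "wob_pp")
```

Calling `fetch_precipitation("Potomac")` would return the text of the span, or `None` if the page served to the script does not contain that id, which is worth checking before relying on the value.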
    

    [Discussion]:
