Scraping an ID with Beautiful Soup: Answers

【Question Title】: Scraping an ID with Beautiful Soup
【Posted】: 2023-03-22 09:11:01
【Question】:

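I am trying to extract data from this site:

https://www.ultimatetennisstatistics.com/tournamentEvent?tournamentEventId=4073

I want the statistics for a single match, for example the final, which you can see by clicking the blue icon.

So I wrote this code: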

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
URL = "https://www.ultimatetennisstatistics.com/tournamentEvent?tournamentEventId=4073"

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

stats = soup.find(id="matchStats-171140Overview")

print(stats)

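But the result is None. I don't understand why, because that id does exist. I would like to get, for example, the 3.1% value.

Can anyone help me?

Thanks.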

【Comments】:

There is a typo in the code. Try this: stats = soup.find(id="matchStats-171140")

Tags: python beautifulsoup

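【Solution 1】:

When I inspect the page, I don't see an element with the ID matchStats-171140Overview, but I do see one with the ID matchStats-171140. So I think Sushanth is right.

However, when I run your code and print soup, I can't find an element with the ID matchStats-171140.

This works: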

soup = BeautifulSoup(page.text, 'html.parser') # page.text instead of page.content

stats = soup.find(id="matchStats-171140") # Took out 'Overview' from the id

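page is the requests response, and .content gives the response body as bytes. That may somehow cause the id attribute to be lost.

See the Requests documentation on .content vs. .text.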
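
To make the .content vs. .text comparison concrete, here is a minimal sketch that parses the same markup once as str and once as bytes. The HTML string is a hypothetical stand-in, not the real markup of the site, and the bytes are produced by encoding the string rather than by a real request. For a simple document like this one, Beautiful Soup finds the element either way.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; not the real site markup.
html_text = '<div id="matchStats-171140"><span>3.1%</span></div>'

# Roughly what response.content would hold: the same document as bytes.
html_bytes = html_text.encode("utf-8")

# Beautiful Soup accepts both str and bytes as input.
soup_from_text = BeautifulSoup(html_text, "html.parser")
soup_from_bytes = BeautifulSoup(html_bytes, "html.parser")

found_in_text = soup_from_text.find(id="matchStats-171140")
found_in_bytes = soup_from_bytes.find(id="matchStats-171140")

print(found_in_text.get_text())
print(found_in_bytes.get_text())
```

If the two results differ on a real page, the usual suspect is the character encoding of the response rather than a lost attribute.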

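Edit: I see the problem now. The element you want is not in the DOM until the user clicks the blue icon to open the popup. requests only downloads the HTML the server sends and does not run the page's JavaScript, which is why Beautiful Soup can't find it. This post may be what you're looking for: Is it possible to scrape a "dynamical webpage" with beautifulsoup?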
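
One way to check whether an element is part of the static HTML at all is a small helper like the sketch below. The function name id_in_static_html and the sample markup are hypothetical and only imitate a page whose popup is filled in later by JavaScript.

```python
from bs4 import BeautifulSoup


def id_in_static_html(html, element_id):
    """Return True if an element with this id exists in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(id=element_id) is not None


# Hypothetical markup: the icon exists up front, the popup content does not.
static_html = """
<html><body>
  <a id="matchStats-171140" href="#">stats</a>
  <script>/* popup content is built here after a click */</script>
</body></html>
"""

has_icon = id_in_static_html(static_html, "matchStats-171140")
has_popup = id_in_static_html(static_html, "matchStats-171140Overview")

print(has_icon)   # True
print(has_popup)  # False
```

With the real page you would pass page.text instead of static_html. If the helper returns False for an id that is visible in the browser's inspector, the element is most likely created by JavaScript, and you would need a tool that runs the page in a browser, or the request the page itself makes when the icon is clicked.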

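【Discussion】:

Hi, thanks for your answer! That ID returns something, but not what I want. I have just updated the post, so it may be clearer now. I tried both .text and .content, but in this case I get the same result with my ID and with yours. So the problem remains.
Got it. I edited my answer, and I hope it points you in the right direction. Basically, this may not be possible with Beautiful Soup alone, because this is a dynamic web page.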