BeautifulSoup only identifying 2 of 5 tables

【Question title】: BeautifulSoup only identifying 2 of 5 tables
【Posted】: 2020-11-02 21:28:58
【Question】:

I'm working on my first Python project and have hit a roadblock. I'm trying to use BeautifulSoup to scrape data from some of the tables on this site: https://www.basketball-reference.com/awards/awards_2020.html

With the code below I can get data from the first two tables, but the other three are not recognized (i.e. len(tables) is 2 when it should be 5):

import requests
from bs4 import BeautifulSoup

awardyear = 2020  # season to fetch; matches the URL quoted above
url = 'https://www.basketball-reference.com/awards/awards_{}.html'.format(awardyear)
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# Only <table> tags in the live markup are matched here
tables = soup.find_all('table')
len(tables)

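When I print the soup, all of the tables are there in the HTML, so I'm not sure why the last three aren't being picked up. I've spent some time trying to work out the difference between the tables that are recognized and the ones that aren't, but so far I've come up empty.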

【Question comments】:

Tags: python beautifulsoup

【Answer 1】:

This happens because the other 3 tables are inside HTML comments, so the parser treats them as comment text rather than as tags.
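To see why commented-out markup never shows up as tags, here is a minimal sketch that uses only the standard library's html.parser module, which is what the "html.parser" option in BeautifulSoup builds on. The sample markup and the TableCounter class are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: one live table, one table wrapped in a comment.
SAMPLE_HTML = """
<div><table id="visible"><tr><td>1</td></tr></table></div>
<div><!--
<table id="hidden"><tr><td>2</td></tr></table>
--></div>
"""


class TableCounter(HTMLParser):
    """Counts <table> start tags and collects raw comment bodies."""

    def __init__(self):
        super().__init__()
        self.table_tags = 0
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # Called only for tags in the live markup, never for text inside a comment.
        if tag == "table":
            self.table_tags += 1

    def handle_comment(self, data):
        # The whole comment body arrives here as one unparsed string.
        self.comments.append(data)


counter = TableCounter()
counter.feed(SAMPLE_HTML)
counter.close()

print("table start tags seen:", counter.table_tags)
print("comments seen:", len(counter.comments))
print("comment contains a table:", "<table" in counter.comments[0])
```

The commented-out table is delivered as a plain string, so it has to be parsed a second time before find_all('table') can match it.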

You can extract those tables by finding the nodes of type Comment and parsing their contents:

import requests
from bs4 import BeautifulSoup, Comment


URL = "https://www.basketball-reference.com/awards/awards_2020.html"

soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Find all comment nodes (string= replaces the older text= argument since bs4 4.4.0)
comments = soup.find_all(string=lambda t: isinstance(t, Comment))
# Re-parse the comment bodies as HTML so the tables inside them become real tags
comment_soup = BeautifulSoup("".join(comments), "html.parser")

print("The length of tables:", len(soup.find_all("table")))

# Count <table> tags; len() on a single find() result would count that tag's children instead
print("The length of tables within comments:", len(comment_soup.find_all("table")))

Output:

The length of tables: 2
The length of tables within comments: 3
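The same idea can be wrapped in a small helper that returns the live tables and the commented-out ones together. This is a sketch under stated assumptions, not a drop-in for the real page: the function name find_all_tables, the sample markup, and the ids in it are hypothetical, and it runs offline against an inline string instead of fetching the URL.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup: one live table and one inside an HTML comment.
SAMPLE_HTML = """
<div id="all_mvp"><table id="mvp"><tr><th>Player</th></tr><tr><td>A</td></tr></table></div>
<div id="all_leading"><!--
<div class="table_outer_container">
<table id="leading"><tr><th>Player</th></tr><tr><td>B</td></tr></table>
</div>
--></div>
"""


def find_all_tables(html):
    """Return <table> tags from the live markup plus those found inside comments."""
    soup = BeautifulSoup(html, "html.parser")
    tables = list(soup.find_all("table"))
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Parse each comment on its own so unrelated comments are not merged.
        tables.extend(BeautifulSoup(comment, "html.parser").find_all("table"))
    return tables


tables = find_all_tables(SAMPLE_HTML)
print([t.get("id") for t in tables])
```

Parsing each comment separately, rather than joining them all into one string, keeps the markup from different comments from running together; for a page like the one in the question either approach should find the same tables.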

【Comments】:

Thanks for your help!