Two classes have the same name in HTML and BeautifulSoup only selects the first one

【Question title】: Two classes have the same name in HTML and BeautifulSoup is only selecting the first class
【Posted】: 2019-08-19 00:53:02
【Question description】:

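The page linked below has two elements with the same class name, both of which contain data. I am trying to scrape the player names from it and assign each player their position in the tournament. BeautifulSoup's find function only lets me grab the first instance of that class.

I have tried several different ways of iterating past the first instance of the class, but nothing has worked. The two instances of Table2__tbody seem to be the problem. How can I get past the first instance and scrape the data from the second one?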

    import requests
    from bs4 import BeautifulSoup

    url_page = "https://www.espn.com/golf/leaderboard/_/tournamentId/401056502"
    page = requests.get(url_page)
    soup = BeautifulSoup(page.text, 'html.parser')

    # find() returns only the first element with this class
    name_list = soup.find(class_='Table2__tbody')

    name_list_items = name_list.find_all('a')

name_list only captures data from the first instance of Table2__tbody. All I need is the data from the second instance.

【Comments】:

You already seem to know about find_all; why not use it?
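One way to act on this comment is sketched below. The inline HTML is a made-up, simplified stand-in for the leaderboard page (the real ESPN markup may differ), and the player names are placeholders. It only illustrates that find_all returns every match in document order, so the second instance can be picked by index.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the leaderboard page: two <tbody>
# elements share the class "Table2__tbody". The real ESPN markup may differ.
html = """
<table><tbody class="Table2__tbody">
  <tr><td><a href="#">Playoff Player</a></td></tr>
</tbody></table>
<table><tbody class="Table2__tbody">
  <tr><td><a href="#">Player One</a></td></tr>
  <tr><td><a href="#">Player Two</a></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match; find_all() returns all matches in document order.
bodies = soup.find_all(class_="Table2__tbody")

# Index 1 is the second instance of the class.
second_body = bodies[1]
names = [a.get_text() for a in second_body.find_all("a")]
print(names)
```

On a live page it is probably worth checking len(bodies) before indexing, since the number of tables can change from one tournament to another.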

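Tags: python html css beautifulsoup

【Solution 1】:

I don't think you are targeting quite the right attribute. 'Table2__tbody' only points to the first table, which holds the hole_playoff results. The attribute you are looking for is actually 'tl Table2__td'.

So you can run the following code (in Python 3, with BS4):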

from bs4 import BeautifulSoup
from urllib import request

url_page = "https://www.espn.com/golf/leaderboard/_/tournamentId/401056502"
page = request.urlopen(url_page)
soup = BeautifulSoup(page, 'html.parser')

# Collect every cell whose class attribute is 'tl Table2__td'
name_list = soup.find_all(class_='tl Table2__td')
# Keep only the text of each cell
name_list_items = [cell.get_text() for cell in name_list]

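You actually get a list with the players' positions at the even indices and their names at the odd indices. Some simple data manipulation can then arrange it into whatever shape you need.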
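The "simple data manipulation" mentioned above could look something like the following sketch. The sample list is hypothetical: it only mimics the shape described (positions at even indices, names at odd indices) and is not real output from the page.

```python
# Hypothetical sample in the shape described above:
# positions at even indices, player names at odd indices.
name_list_items = ["1", "Player One", "T2", "Player Two", "T2", "Player Three"]

positions = name_list_items[0::2]  # every second item, starting at index 0
names = name_list_items[1::2]      # every second item, starting at index 1

# Pair each position with the name that follows it.
leaderboard = list(zip(positions, names))

for position, name in leaderboard:
    print(position, name)
```

A list of tuples is used here rather than a dict keyed by position, because tied positions such as "T2" can repeat.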

【Comments】:

【Solution 2】:

One way to select the right table is to use a CSS selector.

table:has(a.leaderboard_player_name) selects the <table> that contains an <a> with the class leaderboard_player_name, which is our player list:

import requests
from bs4 import BeautifulSoup

url_page = "https://www.espn.com/golf/leaderboard/_/tournamentId/401056502"
page = requests.get(url_page)
soup = BeautifulSoup(page.text, 'html.parser')

table_with_namelist = soup.select_one('table:has(a.leaderboard_player_name)')

for a in table_with_namelist.select('.leaderboard_player_name'):
    print(a.text)

Prints:

Xander Schauffele
Tony Finau
Justin Rose
Andrew Putnam
Kiradech Aphibarnrat
Keegan Bradley

...etc.

【Comments】: