[Question title]: Extracting Specific Links with Beautiful Soup
[Posted]: 2018-04-19 00:51:21
[Question]:

I am trying to extract specific links from the following HTML.

 <div class="RadAjaxPanel" id="LiveBoard1_LiveBoard1_litGamesPanel">
<br /><b><a href="winss.aspx?team=White Sox&pos=all&stats=bat&qual=0&type=8&season=2018&month=0&season1=2018">White Sox</a></b> @ <b><a href="winss.aspx?team=Athletics&pos=all&stats=bat&qual=0&type=8&season=2018&month=0&season1=2018">Athletics</a></b>&nbsp;&nbsp;15:35 ET<br /><center><table style="width:360px;"><tr><td align="center" width="120.07295665741px" style="border:1px solid black;">33.4 %</td><td align="center" width="239.92704334259px" style="border:1px solid black;">66.6 %</td></tr><table></center><br /><center><table style="width:360px;" class="lineup"><tr><td align="left">SP: <a href="statss.aspx?playerid=18311&position=P">Carson Fulmer</a></td><td align="left">SP: <a href="statss.aspx?playerid=13533&position=P">Andrew Triggs</a></td></tr><tr><td align="left">1. <a href="statss.aspx?playerid=17232&position=2B">Yoan Moncada</a> (2B)<br />2. <a href="statss.aspx?playerid=11602&position=2B">Yolmer Sanchez</a> (3B)<br />3. <a href="statss.aspx?playerid=15676&position=1B">Jose Abreu</a> (DH)<br />4. <a href="statss.aspx?playerid=13157&position=OF">Nick Delmonico</a> (LF)<br />5. <a href="statss.aspx?playerid=7226&position=3B/DH">Matt Davidson</a> (1B)<br />6. <a href="statss.aspx?playerid=5913&position=OF">Leury Garcia</a> (RF)<br />7. <a href="statss.aspx?playerid=3256&position=C">Welington Castillo</a> (C)<br />8. <a href="statss.aspx?playerid=15172&position=SS">Tim Anderson</a> (SS)<br />9. <a href="statss.aspx?playerid=15082&position=OF">Adam Engel</a> (CF)<br /></td>

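Ultimately I want to extract the team names, in this case Athletics and White Sox, along with the corresponding win probabilities (33.4% and 66.6%). I can extract all of these links with Beautiful Soup, but I can't get rid of the lineup links. I noticed that all of the lineup links start with "statss". When extracting all the links on the page, is there a way to tell Beautiful Soup to decompose the "statss" links? My current code is below. As you can see, I tried calling decompose on the element with class=lineup, but the output still returns the entire lineup. Thanks in advance for your help!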

import requests
from bs4 import BeautifulSoup

page=requests.get('https://www.fangraphs.com/livescoreboard.aspx?date=2018-04-18')
soup=BeautifulSoup(page.text, 'html.parser')

#Remove Lineup Links
lineup_links=soup.find(class_='lineup')
lineup_links.decompose()

team_name_list=soup.find(class_='RadAjaxPanel')
team_name_list_items=team_name_list.find_all('a')


for team_name in team_name_list_items:
 print(team_name.prettify())


odds_list=soup.find(class_='RadAjaxPanel')
odds_list_items=odds_list.find_all('td',attrs={'style':'border:1px solid black;'})

for odds in odds_list_items:
 print(odds.prettify())

[Comments]:

    Tags: python beautifulsoup


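    [Answer 1]:

    It looks like you are removing only the first instance rather than every instance: find() returns a single match, while find_all() returns all of them. Try looping over the matches and decomposing them one by one, for example: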

    # Remove lineup links
    for link in soup.find_all(class_='lineup'):
        link.decompose()
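
For reference, here is a minimal, self-contained sketch of that fix next to the alternative the question asks about, which is skipping links whose href starts with "statss". It runs against a trimmed-down sample string rather than the live page. The sample HTML, the second lineup table, and the three helper function names are assumptions made for illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample modelled on the HTML in the question. The second
# "lineup" table is an assumption, added to mimic a page with several matches.
SAMPLE_HTML = """
<div class="RadAjaxPanel" id="LiveBoard1_LiveBoard1_litGamesPanel">
<b><a href="winss.aspx?team=White Sox">White Sox</a></b> @
<b><a href="winss.aspx?team=Athletics">Athletics</a></b>
<table><tr>
<td style="border:1px solid black;">33.4 %</td>
<td style="border:1px solid black;">66.6 %</td>
</tr></table>
<table class="lineup"><tr>
<td>SP: <a href="statss.aspx?playerid=18311&amp;position=P">Carson Fulmer</a></td>
</tr></table>
<table class="lineup"><tr>
<td>1. <a href="statss.aspx?playerid=17232&amp;position=2B">Yoan Moncada</a></td>
</tr></table>
</div>
"""


def team_names_after_decompose(html):
    """Remove every lineup table, then collect the remaining link texts."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns all matches, so each lineup table is removed.
    for table in soup.find_all(class_="lineup"):
        table.decompose()
    panel = soup.find(class_="RadAjaxPanel")
    return [a.get_text(strip=True) for a in panel.find_all("a")]


def team_names_by_href(html):
    """Leave the tree untouched and skip links whose href starts with 'statss'."""
    soup = BeautifulSoup(html, "html.parser")
    panel = soup.find(class_="RadAjaxPanel")
    # The filter receives each tag's href value (None when the attribute is missing).
    links = panel.find_all("a", href=lambda h: h and not h.startswith("statss"))
    return [a.get_text(strip=True) for a in links]


def win_probabilities(html):
    """Collect the text of the bordered cells that hold the win odds."""
    soup = BeautifulSoup(html, "html.parser")
    panel = soup.find(class_="RadAjaxPanel")
    cells = panel.find_all("td", attrs={"style": "border:1px solid black;"})
    return [td.get_text(strip=True) for td in cells]


if __name__ == "__main__":
    print(team_names_after_decompose(SAMPLE_HTML))
    print(team_names_by_href(SAMPLE_HTML))
    print(win_probabilities(SAMPLE_HTML))
```

Filtering on href leaves the parsed tree intact, which may be preferable if the lineup data is needed later. Decomposing is simpler when the lineups are never used.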
    

    [Comments]:
