【Question title】:Scraping with requests and BS4
【Posted】:2019-03-12 08:26:20
【Question】:

I want to pull the contents of the table on the following site into a pandas DataFrame: https://projects.fivethirtyeight.com/soccer-predictions/premier-league/

I'm very new to BS, but I believe what I want is something like this:

import requests
from bs4 import BeautifulSoup
r = requests.get(url = "https://projects.fivethirtyeight.com/soccer-predictions/ligue-1/")
soup = BeautifulSoup(r.text, "html.parser")
#print(soup.prettify())
print(soup.find("div", {"class":"forecast-table"}))

Unfortunately, this returns None. Any help and guidance would be great! I believe what I need is somewhere in here (though I'm not quite sure):

<div id="forecast-table-wrapper">
      <table class="forecast-table" id="forecast-table">
       <thead>
        <tr class="desktop">
         <th class="top nosort">
         </th>
         <th class="top bordered-right rating nosort drop-6" colspan="3">
          Team rating
         </th>
         <th class="top nosort rating2" colspan="1">
         </th>
         <th class="top bordered-right nosort drop-1" colspan="5">
          avg. simulated season
         </th>
         <th class="top bordered-right nosort show-1 drop-3" colspan="2">
          avg. simulated season
         </th>
         <th class="top bordered nosort" colspan="4">
          end-of-season probabilities
         </th>
        </tr>
        <tr class="sep">
         <th colspan="11">
         </th>
        </tr>

【Comments】:

  • So, is there actually a div whose class is forecast-table?

Tags: python beautifulsoup


【Solution 1】:

Since you are using pandas anyway, you can use its built-in table handling, like this:

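import pandas

# read_html returns a list of DataFrames, one per matching <table>
tables = pandas.read_html('https://projects.fivethirtyeight.com/soccer-predictions/premier-league/',
  attrs = {
    'class': 'forecast-table'
  }, header = 1)
df = tables[0]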

【Comments】:

    【Solution 2】:
    import requests
    from bs4 import BeautifulSoup
    r = requests.get('https://projects.fivethirtyeight.com/soccer-predictions/ligue-1/')
    soup = BeautifulSoup(r.content, 'html.parser')
    # The forecast is a <table>, not a <div>, so search for table elements
    tables = soup.find_all('table', attrs={'class':'forecast-table'})
    for table in tables:
        # Print the concatenated text of every row (header rows included)
        for row in table.find_all('tr'):
            print(row.text)
    

    Output:

    Team ratingavg. simulated seasonavg. simulated seasonend-of-season probabilities
    
    teamspioff.def.WDLgoal diff.proj. pts.pts.relegatedrel.qualify for UCLmake UCLwin Ligue 1win league
    PSG24 pts90.03.00.530.74.52.9+7897<1%>99%97%
    Lyon14 pts76.32.10.719.69.19.3+2768<1%60%2%
    Marseille13 pts71.12.00.918.38.311.4+1663<1%40%<1%
    Lille19 pts63.71.70.916.78.612.6+9591%24%<1%
    St Étienne15 pts62.71.60.914.710.912.4-1553%14%<1%
    Montpellier16 pts64.01.50.713.912.411.7+2543%12%<1%
    Nice11 pts62.01.60.913.510.014.5-7507%7%<1%
    Monaco6 pts65.91.80.913.010.714.2+0508%7%<1%
    Rennes8 pts63.41.60.813.010.514.5-3499%6%<1%
    Bordeaux14 pts59.21.50.913.09.915.0-6498%5%<1%
    Strasbourg12 pts59.21.51.012.610.814.6-2499%5%<1%
    Angers11 pts60.41.50.912.610.215.2-54810%4%<1%
    Toulouse13 pts58.21.50.911.912.014.1-104811%4%<1%
    Dijon FCO10 pts57.71.61.112.28.517.3-124517%2%<1%
    Caen10 pts55.61.41.010.812.414.8-104518%3%<1%
    Nîmes10 pts54.91.51.110.711.615.6-134420%2%<1%
    Reims10 pts55.31.30.910.312.315.4-144321%2%<1%
    Nantes6 pts59.01.50.910.410.916.7-144225%1%<1%
    Guingamp5 pts57.31.51.010.39.817.9-194130%<1%<1%
    Amiens10 pts53.01.31.010.49.018.6-164031%<1%<1%
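
In the output above, the cell values of each row run together because `.text` is read from the whole row. A minimal sketch of reading each cell separately is shown below. `SAMPLE_HTML` is a hypothetical, heavily trimmed stand-in for the real page, and `table_to_rows` is an illustrative helper name, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for the page's forecast table.
SAMPLE_HTML = """
<table class="forecast-table" id="forecast-table">
  <thead>
    <tr><th>team</th><th>spi</th><th>proj. pts.</th></tr>
  </thead>
  <tbody>
    <tr><td>PSG</td><td>90.0</td><td>97</td></tr>
    <tr><td>Lyon</td><td>76.3</td><td>68</td></tr>
  </tbody>
</table>
"""


def table_to_rows(html):
    """Return the forecast table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "forecast-table"})
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Read header and data cells one by one instead of the whole row's text.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_to_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting list of rows can be passed to `pandas.DataFrame`, for example with the first row used as column names.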
    

    【Comments】:

    • Any suggestions for separating the team name from the points? In the page source, the team markup looks like this: <div class="name">Man. City<span class="record">19 pts</span></div>
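
One possible way to separate the team name from the points, sketched against the markup quoted in the comment above. It assumes the name is the text sitting directly inside the `div` and the points are inside the `span` with class `record`; `split_name_and_record` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Markup quoted in the comment above.
html = '<div class="name">Man. City<span class="record">19 pts</span></div>'


def split_name_and_record(div):
    """Return (team name, record text) for one div of class "name"."""
    record_tag = div.find("span", class_="record")
    record = record_tag.get_text(strip=True) if record_tag else None
    # Keep only the strings that are direct children of the div,
    # which skips the text nested inside the span.
    name = "".join(div.find_all(string=True, recursive=False)).strip()
    return name, record


soup = BeautifulSoup(html, "html.parser")
name, record = split_name_and_record(soup.find("div", class_="name"))
print(name)
print(record)
```

Because the function only reads from the tree, the parsed document is left unchanged and can still be used for other lookups.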
    【Solution 3】:

    That's because you are searching for a div, but the element is a table, so it should be:

    print(soup.find("table", {"class":"forecast-table"}))
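
The difference can be checked offline with a small sketch. The HTML string below is a shortened, hypothetical version of the snippet from the question, so the network request is not needed.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup shown in the question.
html = (
    '<div id="forecast-table-wrapper">'
    '<table class="forecast-table" id="forecast-table">'
    "<thead><tr><th>team</th></tr></thead>"
    "</table></div>"
)
soup = BeautifulSoup(html, "html.parser")

# The class sits on the <table>, so a <div> search finds nothing.
as_div = soup.find("div", {"class": "forecast-table"})
# Searching for a <table> with that class finds the element.
as_table = soup.find("table", {"class": "forecast-table"})
# Omitting the tag name matches any element with that class or id.
by_class = soup.find(class_="forecast-table")
by_id = soup.find(id="forecast-table")

print(as_div)
print(as_table.name, by_class.name, by_id.name)
```

Searching by class or id alone avoids having to guess the tag name, at the cost of matching any element that happens to carry that class.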
    

    【Comments】:
