【Question title】: IndexError: list index out of range (Python web scraping)
【Posted】: 2019-07-25 20:22:28
【Question】:

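This is my first time web scraping. I followed a tutorial, but I am trying to scrape a different page, and I get the following:

gamesplayed = data[1].getText()

IndexError: list index out of range

Here is the code so far: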

from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage =  'https://www.espn.com/soccer/standings/_/league/FIFA.WORLD/fifa-world-cup'
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

table = soup.find('table', attrs={'class': 'Table2__table__wrapper'})
results = table.find_all('tr')
#print('Number of results:', len(results))


rows = []
rows.append(['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue

    # write columns to variables
    groupa = data[0].getText()
    gamesplayed = data[1].getText()
    wins = data[2].getText()
    draws = data[3].getText()
    losses = data[4].getText()
    goalsfor = data[5].getText()
    goalsagainst = data[6].getText()
    goaldifference = data[7].getText()
    point = data[8].getText()

【Comments】:

  • When you inspect data in a debugger, what does it say it contains?
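
One way to answer that without a debugger is to count how many cells each row actually has. The sketch below is only illustrative: `count_row_lengths` and `sample_rows` are hypothetical names, and the sample data stands in for what `[td.getText() for td in result.find_all('td')]` might return for each row of the real page.

```python
from collections import Counter


def count_row_lengths(rows):
    """Map 'number of cells in a row' to 'how many rows have that many cells'."""
    return Counter(len(cells) for cells in rows)


# Hypothetical sample data; on the real page each inner list would come from
# [td.getText() for td in result.find_all('td')] inside the loop.
sample_rows = [
    [],                                                   # no <td> cells at all
    ["Team A"],                                           # a single cell
    ["Team B", "3", "2", "1", "0", "5", "2", "+3", "7"],  # a full 9-cell row
]

lengths = count_row_lengths(sample_rows)
for n_cells, n_rows in sorted(lengths.items()):
    print(f"{n_rows} row(s) with {n_cells} cell(s)")
```

If any row reports fewer than 9 cells, accessing data[1] through data[8] on that row is what raises the IndexError.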

Tags: python python-3.x web web-scraping beautifulsoup


【Solution 1】:

Take a look at this check:

if len(data) == 0:
        continue

in the block below:

from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage =  'https://www.espn.com/soccer/standings/_/league/FIFA.WORLD/fifa-world-cup'
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

table = soup.find('table', attrs={'class': 'Table2__table__wrapper'})
results = table.find_all('tr')
#print('Number of results:', len(results))


rows = []
rows.append(['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue
    print(len(data))
    # Here is what you missed: per this answer, the scraped result is a list of lists
    print(data)
    #[['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points']]
    data = data[0]
    # write columns to variables
    groupa = data[0].getText()
    gamesplayed = data[1].getText()
    wins = data[2].getText()
    draws = data[3].getText()
    losses = data[4].getText()
    goalsfor = data[5].getText()
    goalsagainst = data[6].getText()
    goaldifference = data[7].getText()
    point = data[8].getText()

【Comments】:

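    【Solution 2】:

    The error message is quite descriptive: you are trying to access an index that does not exist in the list.

    If data must contain at least 9 elements (you are accessing indices 0 through 8), then you should probably change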

    if len(data) == 0:
        continue

    to

    if len(data) < 9:
        continue
    

    so that you can safely skip data whenever a row has too few cells.
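
The same length check can be folded into a small helper, shown here as a minimal sketch rather than a drop-in fix. `FIELDS`, `parse_row`, and `sample_rows` are hypothetical names introduced for illustration, and the helper assumes each row has already been reduced to a list of cell texts, for example with `[td.getText() for td in data]`.

```python
# Hypothetical field names, one per column accessed in the question (indices 0 to 8).
FIELDS = ['team', 'games_played', 'wins', 'draws', 'losses',
          'goals_for', 'goals_against', 'goal_difference', 'points']


def parse_row(cells, fields=FIELDS):
    """Return a dict of field name -> cell text, or None if the row is too short."""
    if len(cells) < len(fields):
        return None  # same idea as `if len(data) < 9: continue`
    return dict(zip(fields, cells))


# Hypothetical sample data standing in for the scraped rows.
sample_rows = [
    [],
    ["Team A"],
    ["Team B", "3", "2", "1", "0", "5", "2", "+3", "7"],
]

parsed = [row for row in map(parse_row, sample_rows) if row is not None]
print(parsed)
```

Deriving the threshold from `len(fields)` keeps the check and the number of columns in sync if a column is added or removed later.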

    【Comments】: