【Question title】:Bypassing Loop AttributeError: 'NoneType' object has no attribute 'findAll'
【Posted】:2015-06-07 08:02:13
【Question】:
import requests
from bs4 import BeautifulSoup
import csv
from urlparse import urljoin
import urllib2


base_url = 'http://www.baseball-reference.com'
data = requests.get("http://www.baseball-reference.com/players/")
soup = BeautifulSoup(data.content)
player_url = 'http://www.baseball-reference.com/players/'
game_logs = 'http://www.baseball-reference.com/players/gl.cgi?id='
years = ['2000','2001','2002','2003','2004','2005','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']
url = []
for link in soup.find_all('a'):
    if link.has_attr('href'):
        base_url + link['href']
        url.append(base_url + link['href'])
sink = []
for l in url:
    if l[0:42] in player_url:
        sink.append(l)
abc = []
for aa in sink:
    if len(aa) > 48:
        abc.append(aa)
urlz = []
for ab in abc:
    data = requests.get(ab)
    soup = BeautifulSoup(data.content)
    for link in soup.find_all('a'):
        if link.has_attr('href'):
            urlz.append(base_url + link['href'])
abc = []
for aa in urlz:
    if game_logs in aa:
        abc.append(aa)
urlll = []
for ab in years:
    for ac in abc:
        if ab in ac:
            urlll.append(ac)

for j in urlll:
    response = requests.get(j)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'id': 'batting_gamelogs'})
    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace(' ', '').encode("utf-8")
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    print list_of_rows

When I loop through the URLs to fetch the tables, some of the URLs have no table. I get an error that looks like this:

Traceback (most recent call last):
  File "py5.py", line 55, in <module>
    list_of_cells.append(text)
AttributeError: 'NoneType' object has no attribute 'findAll'

Is there a way to keep the loop going when a page has no table?

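【Comments】:

  • Use try and except
  • Use exception handling
  • if whatever is None: continue?
  • Damn, the moment I finished writing my answer, I saw that everyone had already commented with the same idea.
  • I don't understand how this line: list_of_cells.append(text) can raise an error about the attribute findAll.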

Tags: python loops error-handling web-scraping


【Solution 1】:

Use try and except to handle the error

for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('&nbsp;', '').encode("utf-8")
        try:
            list_of_cells.append(text)
        except Exception as e:
            # handle the exception here
            pass
    list_of_rows.append(list_of_cells)

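【Comments】:

  • That did not work. When the loop reaches a URL where the table is missing, the error seems to terminate the loop before it ever gets to the try/except.
  • Can you give an example of a page that fails?
  • baseball-reference.com/players/… is one page that fails. When I loop through, I want the loop to move past the error when it occurs. How can I do that?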
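
Following the "if whatever is None: continue" suggestion from the comments on the question: soup.find() returns None when no element matches, so the AttributeError is raised by table.findAll('tr') itself, before execution ever reaches a try block placed around list_of_cells.append(text). Checking for None first is probably the simplest way to skip such pages. Below is a minimal sketch, written for Python 3 and using inline HTML strings instead of live requests so it runs offline. The helper name extract_rows and the pages dictionary are hypothetical, made up for illustration; only the table id batting_gamelogs comes from the question.

```python
from bs4 import BeautifulSoup


def extract_rows(html, table_id="batting_gamelogs"):
    """Return the table rows as lists of cell text, or None if the table is absent.

    extract_rows is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        # find() returns None when nothing matches, so there is nothing to iterate.
        return None
    rows = []
    for row in table.find_all("tr"):
        # Drop non-breaking spaces and surrounding whitespace from each cell.
        cells = [cell.get_text().replace("\xa0", "").strip()
                 for cell in row.find_all("td")]
        rows.append(cells)
    return rows


# Hypothetical stand-ins for the downloaded pages (assumption: real pages
# would come from requests.get(url).content as in the question).
pages = {
    "with_table": '<table id="batting_gamelogs"><tr><td>1</td><td>NYY</td></tr></table>',
    "without_table": "<p>No game logs for this season.</p>",
}

results = {}
for name, html in pages.items():
    rows = extract_rows(html)
    if rows is None:
        continue  # no table on this page: skip it and keep looping
    results[name] = rows

print(results)
```

In the question's loop, the same check would go right after the soup.find('table', ...) call: if table is None, continue to the next URL. Wrapping the whole loop body in try/except AttributeError would also get past the failing pages, but the explicit None check avoids hiding unrelated attribute errors.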