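【Posted】: 2013-12-29 15:26:30
【Question】:
I am trying to scrape the year and winners (first and second columns) from the "List of finals matches" table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I am using the code below: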
import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column
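With the code above I can get the first and third columns without any problem. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element of the table, even though I can see tbody when I inspect the element in the browser.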
url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())
print soup.findAll('table')[2]
soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column
This is the error I get:
'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents
AttributeError: 'NoneType' object has no attribute 'findAll'
'
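A note on what the traceback suggests: `.tbody` evaluated to None, so the `findAll` call failed on it. One likely cause (an assumption, since the raw page source is not shown here) is that the downloaded HTML has no `<tbody>` tag at all; browsers add an implied one when building the DOM, which is why the inspector shows it. With BeautifulSoup the analogous change would be to call `findAll('tr')` on the table itself instead of going through `.tbody`. The sketch below illustrates the same idea using only the standard library's `html.parser`, swapped in for BeautifulSoup so it runs without third-party packages. `TableRowParser`, `table_rows`, and `SAMPLE_HTML` are hypothetical names and data made up for illustration, not taken from the Wikipedia page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr>, grouped by <table>.

    <tbody> is never consulted, so rows are found whether or not the
    markup contains it. Nested tables are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_rows(html, index):
    """Return the rows of the index-th table as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.tables[index]


# Hypothetical markup with no <tbody>, standing in for the raw page source.
SAMPLE_HTML = """
<table>
<tr><th>Year</th><th>Winners</th><th>Score</th></tr>
<tr><th>A</th><td>Team X</td><td>1-0</td></tr>
<tr><th>B</th><td>Team Y</td><td>2-1</td></tr>
</table>
"""

rows = table_rows(SAMPLE_HTML, 0)
for row in rows[1:]:          # skip the header row
    print(row[0], row[1])     # first and second columns
```

For the real page, the table index and column positions would still need to be checked against the downloaded HTML rather than the browser's inspector view, since the two can differ.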
【Discussion】:
Tags: python web-scraping beautifulsoup