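【Posted】: 2017-01-31 22:20:57
【Question】:
I am trying to scrape tables and convert them into data tables in Python, but I am having no luck with the US election data. Here is the HTML of the data I want to scrape.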
<tr class="type-republican">
<th class="results-name" scope="row"><a href="xxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Republican">R</abbr></span> <span class="token token-winner"><b aria-hidden="true" class="icon icon-check"></b> <span class="icon-text">Winner</span></span> D. Trump</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">62.9%</span><span class="graph"><span class="bar"><span class="index" style="width:62.9%;"></span></span></span></span></td>
<td class="results-popular">1,306,925</td>
<td class="delegates-cell">9</td>
</tr>
<tr class="type-democrat">
<th class="results-name" scope="row"><a href="xxxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Democratic">D</abbr></span> H. Clinton</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">34.6%</span><span class="graph"><span class="bar"><span class="index" style="width:34.6%;"></span></span></span></span></td>
<td class="results-popular">718,084</td>
<td class="delegates-cell"></td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> G. Johnson</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">2.1%</span><span class="graph"><span class="bar"><span class="index" style="width:2.1%;"></span></span></span></span></td>
<td class="results-popular">43,869</td>
<td class="delegates-cell"></td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> J. Stein</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">0.4%</span><span class="graph"><span class="bar"><span class="index" style="width:0.4%;"></span></span></span></span></td>
<td class="results-popular">9,287</td>
<td class="delegates-cell"></td>
</tr>
</tbody>
</table>, <table class="results-table">
<tbody>
<tr class="type-republican">
<th class="results-name" scope="row"><a href="xxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Republican">R</abbr></span> D. Trump</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">73.4%</span><span class="graph"><span class="bar"><span class="index" style="width:73.4%;"></span></span></span></span></td>
<td class="results-popular">18,110</td>
</tr>
<tr class="type-democrat">
<th class="results-name" scope="row"><a href="xxxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Democratic">D</abbr></span> H. Clinton</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">24.0%</span><span class="graph"><span class="bar"><span class="index" style="width:24.0%;"></span></span></span></span></td>
<td class="results-popular">5,908</td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> G. Johnson</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">2.2%</span><span class="graph"><span class="bar"><span class="index" style="width:2.2%;"></span></span></span></span></td>
<td class="results-popular">538</td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> J. Stein</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">0.4%</span><span class="graph"><span class="bar"><span class="index" style="width:0.4%;"></span></span></span></span></td>
<td class="results-popular">105</td>
</tr>
</tbody>
And so on. My code looks like this:
import requests
from bs4 import BeautifulSoup

Percentage = []
Count = []
page = requests.get('xxxx')
soup = BeautifulSoup(page.text, "lxml")
# find() returns only the first <div class="content-alpha"> on the page
table = soup.find('div', class_='content-alpha')
for row in table.find_all('tr'):
    col = row.find_all('td')
    Percentage = col[0].find(text=True)
    Count = col[1].find(text=True)
    print(Count)
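But this only gives me the information from a few of the tables, not all of them. How can I get the information from every table? And why do I only get it from a few?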
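One possible cause, assuming the page contains several containers, is that `soup.find(...)` returns only the first matching element, so only the tables inside that one `div` are visited. A minimal sketch of iterating over every `table.results-table` instead is shown here. `SAMPLE_HTML` is a shortened stand-in modeled on the markup above, and `scrape_results` is a hypothetical helper name; neither comes from the real page, whose full structure is not shown in the question.

```python
from bs4 import BeautifulSoup

# Shortened sample modeled on the markup in the question (assumption:
# the real page repeats this structure once per results table).
SAMPLE_HTML = """
<table class="results-table"><tbody>
<tr class="type-republican">
<th class="results-name" scope="row"><span class="name-combo"><abbr title="Republican">R</abbr> D. Trump</span></th>
<td class="results-percentage"><span class="number">62.9%</span></td>
<td class="results-popular">1,306,925</td>
<td class="delegates-cell">9</td>
</tr>
</tbody></table>
<table class="results-table"><tbody>
<tr class="type-democrat">
<th class="results-name" scope="row"><span class="name-combo"><abbr title="Democratic">D</abbr> H. Clinton</span></th>
<td class="results-percentage"><span class="number">24.0%</span></td>
<td class="results-popular">5,908</td>
</tr>
</tbody></table>
"""


def scrape_results(html):
    """Collect one dict per result row from every results table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all() returns every match, unlike find(), which stops at the first.
    for table_index, table in enumerate(soup.find_all("table", class_="results-table")):
        for tr in table.find_all("tr"):
            pct = tr.find("td", class_="results-percentage")
            votes = tr.find("td", class_="results-popular")
            if pct is None or votes is None:
                continue  # skip rows that lack the expected cells
            abbr = tr.find("abbr")
            rows.append({
                "table": table_index,
                "party": abbr["title"] if abbr is not None else None,
                "percentage": pct.get_text(strip=True),
                "votes": votes.get_text(strip=True),
            })
    return rows


results = scrape_results(SAMPLE_HTML)
for result in results:
    print(result)
```

Selecting cells by class rather than by position (`col[0]`, `col[1]`) also avoids an `IndexError` on rows that have fewer `td` cells than expected.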
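I hope the question is clear.
The HTML is really large, so I have added a link to the site: http://www.politico.com/2016-election/results/map/president/alabama/ . I want to scrape the 2016 US election data for every county in Alabama.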
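Because the goal is a data table, the scraped cell text (for example `62.9%` or `1,306,925`) would still need converting to numbers. The following is a small sketch of that step using only the standard library. The helper names `parse_percentage` and `parse_count` are hypothetical, and the input tuples simply copy values from the HTML excerpt above, including the empty delegates cell.

```python
def parse_percentage(text):
    """Convert a string such as '62.9%' to the float 62.9."""
    return float(text.strip().rstrip("%"))


def parse_count(text):
    """Convert a string such as '1,306,925' to an int; empty cells become None."""
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned else None


# (percentage, popular vote, delegates) as they appear in the excerpt
raw_rows = [
    ("62.9%", "1,306,925", "9"),
    ("34.6%", "718,084", ""),
]

clean_rows = [
    (parse_percentage(pct), parse_count(votes), parse_count(delegates))
    for pct, votes, delegates in raw_rows
]
print(clean_rows)
```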
【Comments】:
-
The class "content-alpha" does not exist in your data. Can you update the question with the data you want to scrape and the expected result?
-
It would be easier to help you if you provided the URL you want to scrape.
-
I have added a link to the website.
Tags: python web-scraping beautifulsoup html-table