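[Posted]: 2016-04-22 06:50:37
[Question]:
I'm trying to scrape some data from a website and I'm having a hard time getting it. I can get the players' names, but so far that's all. I've been trying different things. Below is a sample of the markup I'm trying to parse. Note that there are two tables (one per team), and the class of each player row alternates between "odd" and "even". The sample HTML comes first, followed by my Python script. I've marked the parts I want. I'm using Python 2.7.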
`<table id="nbaGITeamStats" cellpadding="0" cellspacing="0">
<thead class="nbaGIClippers">
<tr>
<th colspan="17">Los Angeles Clippers (1-0)</th> <!-- I want team name -->
</tr>
</thead>
<tbody><tr colspan="17">
<td colspan="17" class="nbaGIBoxCat"><span>field goals</span><span>rebounds</span></td>
</tr>
<tr>
<td class="nbaGITeamHdrStatsNoBord" colspan="1"> </td>
<td class="nbaGITeamHdrStats">pos</td>
<td class="nbaGITeamHdrStats">min</td>
<td class="nbaGITeamHdrStats">fgm-a</td>
<td class="nbaGITeamHdrStats">3pm-a</td>
<td class="nbaGITeamHdrStats">ftm-a</td>
<td class="nbaGITeamHdrStats">+/-</td>
<td class="nbaGITeamHdrStats">off</td>
<td class="nbaGITeamHdrStats">def</td>
<td class="nbaGITeamHdrStats">tot</td>
<td class="nbaGITeamHdrStats">ast</td>
<td class="nbaGITeamHdrStats">pf</td>
<td class="nbaGITeamHdrStats">st</td>
<td class="nbaGITeamHdrStats">to</td>
<td class="nbaGITeamHdrStats">bs</td>
<td class="nbaGITeamHdrStats">ba</td>
<td class="nbaGITeamHdrStats">pts</td>
</tr>
<tr class="odd">
<td id="nbaGIBoxNme" class="b"><a href="/playerfile/paul_pierce/index.html">P. Pierce</a></td> <!-- I want player name -->
<td class="nbaGIPosition">F</td> <!-- I want position name -->
<td>14:16</td> <!-- I want this -->
<td>1-4</td> <!-- I want this -->
<td>1-2</td> <!-- I want this -->
<td>2-2</td> <!-- I want this -->
<td>+12</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>3</td> <!-- I want this -->
<td>2</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>5</td> <!-- I want this -->
</tr>
<tr class="even">
<td id="nbaGIBoxNme" class="b"><a href="/playerfile/blake_griffin/index.html">B. Griffin</a></td> <!-- I want this -->
<td class="nbaGIPosition">F</td> <!-- I want this -->
<td>26:19</td> <!-- I want this -->
<td>5-14</td> <!-- I want this -->
<td>0-1</td> <!-- I want this -->
<td>1-1</td> <!-- I want this -->
<td>+14</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>5</td> <!-- I want this -->
<td>5</td> <!-- I want this -->
<td>2</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>11</td> <!-- I want this -->
</tr>
<tr class="odd">
<td id="nbaGIBoxNme" class="b"><a href="/playerfile/deandre_jordan/index.html">D. Jordan</a></td> <!-- I want this -->
<td class="nbaGIPosition">C</td> <!-- I want this -->
<td>26:27</td> <!-- I want this -->
<td>6-7</td> <!-- I want this -->
<td>0-0</td> <!-- I want this -->
<td>3-5</td> <!-- I want this -->
<td>+19</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>11</td> <!-- I want this -->
<td>12</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>1</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>2</td> <!-- I want this -->
<td>3</td> <!-- I want this -->
<td>0</td> <!-- I want this -->
<td>15</td> <!-- I want this -->
</tr>
<!-- And so on; the row class keeps alternating from odd to even, even to odd -->
<!-- Also note there are two tables, one for each team -->
<!-- This is the table id>>> <table id="nbaGITeamStats" cellpadding="0" cellspacing="0"> -->`
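That's long, but I wanted to show an example of the alternating classes. Here is my Python script. Once I can actually scrape the data, I plan to store it in a dictionary.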
import urllib
import urllib2
from bs4 import BeautifulSoup
import re

gamesForDay = ['/games/20151002/DENLAC/gameinfo.html']

for game in gamesForDay:
    url = "http://www.nba.com/" + game
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page)
    for tr in soup.find_all('table id="nbaGITeamStats'):
        tds = tr.find_all('td')
        print tds
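One likely reason the loop prints nothing: `find_all` takes the tag name and the attributes as separate arguments, so `find_all('table id="nbaGITeamStats')` searches for a tag literally named `table id="nbaGITeamStats` and matches nothing. Below is a minimal sketch of one way to pull out the marked cells and store them in a dictionary. The names `parse_team_tables` and `SAMPLE_HTML` are hypothetical, and the sample is a trimmed copy of the markup above (one team, two players, three stat columns). It assumes every player row has one name cell plus one cell per header column, and it skips rows that do not fit that shape.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the real page
# has the same structure, just with more columns, rows, and a second table).
SAMPLE_HTML = """
<table id="nbaGITeamStats" cellpadding="0" cellspacing="0">
<thead class="nbaGIClippers">
<tr><th colspan="17">Los Angeles Clippers (1-0)</th></tr>
</thead>
<tbody>
<tr>
<td class="nbaGITeamHdrStatsNoBord" colspan="1"> </td>
<td class="nbaGITeamHdrStats">pos</td>
<td class="nbaGITeamHdrStats">min</td>
<td class="nbaGITeamHdrStats">pts</td>
</tr>
<tr class="odd">
<td id="nbaGIBoxNme" class="b"><a href="/playerfile/paul_pierce/index.html">P. Pierce</a></td>
<td class="nbaGIPosition">F</td>
<td>14:16</td>
<td>5</td>
</tr>
<tr class="even">
<td id="nbaGIBoxNme" class="b"><a href="/playerfile/blake_griffin/index.html">B. Griffin</a></td>
<td class="nbaGIPosition">F</td>
<td>26:19</td>
<td>11</td>
</tr>
</tbody>
</table>
"""


def parse_team_tables(html):
    """Return {team name: {player name: {column header: cell text}}}."""
    soup = BeautifulSoup(html, "html.parser")
    teams = {}
    # Tag name and attribute are passed separately. The page reuses the
    # same id for both team tables, so find_all is used rather than find.
    for table in soup.find_all("table", id="nbaGITeamStats"):
        heading = table.find("th")
        if heading is None:
            continue  # no team heading, not a table we understand
        team_name = heading.get_text(strip=True)

        # Column labels (pos, min, ...). The blank first header cell has a
        # different class, so it is not picked up here.
        headers = [td.get_text(strip=True)
                   for td in table.find_all("td", class_="nbaGITeamHdrStats")]

        players = {}
        # A list for class_ matches rows whose class is "odd" or "even".
        for row in table.find_all("tr", class_=["odd", "even"]):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if len(cells) != len(headers) + 1:
                continue  # skip rows that are not name + one cell per header
            players[cells[0]] = dict(zip(headers, cells[1:]))
        teams[team_name] = players
    return teams
```

Calling `parse_team_tables(SAMPLE_HTML)` gives a nested dictionary keyed by the team heading text (which still includes the record, e.g. `(1-0)`) and then by player name. For the live page, the string returned by `urllib2.urlopen(url).read()` could be passed in place of `SAMPLE_HTML`. The sketch avoids `print` and other version-specific syntax, so it should behave the same under Python 2.7 and Python 3.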
[Discussion]:
Tags: python html web-scraping