【发布时间】:2016-09-20 06:04:26
【问题描述】:
我去年写了一个数据库播种器,用于抓取一个统计网站。重新访问我的代码后,它似乎不再起作用,我对原因感到有些困惑。 $html->find() 应该返回找到的元素数组,但它似乎只在使用时才找到第一个表。
根据文档,我尝试使用 find() 并指定每个表的 ID,但这似乎也失败了。
$table_passing = $html->find('table[id=passing]');
谁能帮我弄清楚这里出了什么问题?我不知道为什么这些方法都不起作用,页面源清楚地显示了多个表和 ID,这两种方法都应该起作用。
private function getTeamStats()
{
$url = 'http://www.pro-football-reference.com/years/2016/opp.htm';
$html = file_get_html($url);
$tables = $html->find('table');
$table_defense = $tables[0];
$table_passing = $tables[1];
$table_rushing = $tables[2];
//$table_passing = $html->find('table[id=passing]');
$teams = array();
# OVERALL DEFENSIVE STATISTICS #
foreach ($table_defense->find('tr') as $row)
{
$stats = $row->find('td');
// Ignore the lines that don't have ranks, these aren't teams
if (isset($stats[0]) && !empty($stats[0]->plaintext))
{
$name = $stats[1]->plaintext;
$rank = $stats[0]->plaintext;
$games = $stats[2]->plaintext;
$yards = $stats[4]->plaintext;
// Calculate the Yards Allowed per Game by dividing Total / Games
$tydag = $yards / $games;
$teams[$name]['rank'] = $rank;
$teams[$name]['games'] = $games;
$teams[$name]['tydag'] = $tydag;
}
}
# PASSING DEFENSIVE STATISTICS #
foreach ($table_passing->find('tr') as $row)
{
$stats = $row->find('td');
// Ignore the lines that don't have ranks, these aren't teams
if (isset($stats[0]) && !empty($stats[0]->plaintext))
{
$name = $stats[1]->plaintext;
$pass_rank = $stats[0]->plaintext;
$pass_yards = $stats[14]->plaintext;
$teams[$name]['pass_rank'] = $pass_rank;
$teams[$name]['paydag'] = $pass_yards;
}
}
# RUSHING DEFENSIVE STATISTICS #
foreach ($table_rushing->find('tr') as $row)
{
$stats = $row->find('td');
// Ignore the lines that don't have ranks, these aren't teams
if (isset($stats[0]) && !empty($stats[0]->plaintext))
{
$name = $stats[1]->plaintext;
$rush_rank = $stats[0]->plaintext;
$rush_yards = $stats[7]->plaintext;
$teams[$name]['rush_rank'] = $rush_rank;
$teams[$name]['ruydag'] = $rush_yards;
}
}
【问题讨论】:
标签: php html parsing web-scraping simple-html-dom