[Posted]: 2014-08-05 23:16:51
[Question]:
I'm trying to search Craigslist for apartments.
Code:
```php
$city = 'saltlakecity';
$rooms = '';
$query = '';
$sdate = '';
$url = 'http://' . $city . '.craigslist.org/search/apa?bedrooms=' . $rooms . '&query=' . $query . '&sale_date=' . $sdate;
$base_url = parse_url($url, PHP_URL_HOST);
$resultspage = file_get_contents($url);

// use DOMDocument and DOMXpath
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($resultspage);
libxml_clear_errors();
$xpath = new DOMXpath($dom);

$data = array();
$rows = $xpath->query('//p[@class="row"]'); // get all rows
foreach ($rows as $entries) { // loop over each row
    $entry = array();
    $entry['title'] = $xpath->query('./span[@class="txt"]/span[@class="pl"]/a', $entries)->item(0)->nodeValue;
    $entry['link'] = 'http://' . $base_url . $xpath->query('./a[@class="i"]', $entries)->item(0)->getAttribute('href');
    $entry['price'] = $xpath->query('./span[@class="txt"]/span[@class="l2"]/span[1]', $entries)->item(0)->nodeValue;
    $location = $xpath->query('./span[@class="txt"]/span[@class="l2"]/span[2]', $entries)->item(0)->nodeValue;
    $loc = str_replace(array('(', ')'), '', $location);
    $entry['location'] = $loc;
    $entry['seller'] = $xpath->query('./span[@class="txt"]/span[@class="l2"]/a', $entries)->item(0)->nodeValue;

    // fetch the individual listing page to get the address
    $url2 = $entry['link'];
    $listingpage = file_get_contents($url2);
    $dom2 = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom2->loadHTML($listingpage);
    libxml_clear_errors();
    $xpath2 = new DOMXpath($dom2);
    $entry['address'] = $xpath2->query('./div[@class="mapAndAttrs"]/div[3]')->item(0)->nodeValue;

    $text_node = $xpath->query('./span[@class="txt"]/span[@class="l2"]/span[1]/following-sibling::text()[1]', $entries)->item(0)->nodeValue;
    // remove "/" and "-" | explode by space | filter out empty values (leaves 2 values: bedrooms and size)
    $text_node = array_filter(explode(' ', str_replace(array('/', '-'), '', $text_node)));
    $entry['bedrooms'] = array_shift($text_node); // bedrooms
    $entry['dimensions'] = array_shift($text_node); // dimensions
    $data[] = $entry; // append the finished entry
}

echo '<pre>';
print_r($data);
```
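The search URL above is built by plain string concatenation, which does not URL-encode the parameter values (a query such as `near downtown` would put a raw space into the URL). A minimal sketch using PHP's built-in `http_build_query()` instead; the parameter names `bedrooms`, `query` and `sale_date` are copied from the question and not checked against Craigslist's current search, and the helper name `buildSearchUrl` is hypothetical.

```php
<?php
// Hypothetical helper: builds the apartment-search URL from the same
// parameters used in the question, URL-encoding every value.
function buildSearchUrl(string $city, string $rooms, string $query, string $sdate): string
{
    $params = [
        'bedrooms'  => $rooms,
        'query'     => $query,
        'sale_date' => $sdate,
    ];
    // http_build_query() encodes spaces as "+" and escapes reserved characters;
    // empty strings are kept as "key=".
    return 'http://' . $city . '.craigslist.org/search/apa?' . http_build_query($params);
}

echo buildSearchUrl('saltlakecity', '2', 'near downtown', ''), "\n";
// http://saltlakecity.craigslist.org/search/apa?bedrooms=2&query=near+downtown&sale_date=
```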
**Update: I'm now trying to follow the scraped links to get each property's address.**
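One thing worth checking in the address lookup: the listing-page query is `./div[@class="mapAndAttrs"]/div[3]` with no context node. When no context node is passed, `DOMXPath::query()` evaluates the path relative to the document node, whose only element child is `<html>`, so `./div` cannot match anything; `//div` searches the whole document. A small self-contained sketch on made-up markup (the `mapAndAttrs` structure below is invented for illustration and is not copied from Craigslist):

```php
<?php
// Made-up stand-in for a listing page; the real page structure may differ.
$listingHtml = '<html><body>'
    . '<div class="mapAndAttrs"><div>map</div><div>attrs</div><div>123 Main St</div></div>'
    . '</body></html>';

$dom2 = new DOMDocument();
libxml_use_internal_errors(true);
$dom2->loadHTML($listingHtml);
libxml_clear_errors();
$xpath2 = new DOMXPath($dom2);

// No context node: "./div" is relative to the document node, so nothing matches.
$relative = $xpath2->query('./div[@class="mapAndAttrs"]/div[3]');

// "//div" looks for the div anywhere in the document.
$absolute = $xpath2->query('//div[@class="mapAndAttrs"]/div[3]');

// Check the result length before reading nodeValue.
$address = $absolute->length > 0 ? trim($absolute->item(0)->nodeValue) : null;

echo $relative->length, "\n"; // 0
echo $address, "\n";          // 123 Main St
```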
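What I'm trying to accomplish is to do a `preg_match` to find the title, URL, number of bedrooms, city, and price, and then print them out. However, if I simply print `$matches`, the page just dumps the array. If I use the code above, the page loads blank (white).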
Can someone check my code and tell me what I might be doing wrong here? Thanks!
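A common reason a PHP page renders completely blank is a fatal error while `display_errors` is off. In the loop above, `->item(0)` returns `null` whenever a row lacks the element being queried, and calling a method such as `getAttribute()` on that `null` is fatal. A minimal sketch of null-safe lookups, assuming the same row XPath expressions as the question; the helpers `firstText`/`firstAttr`, the sample markup, and the hard-coded host are all hypothetical:

```php
<?php
// Hypothetical helper: text of the first match, or null if nothing matches.
function firstText(DOMXPath $xpath, string $expr, ?DOMNode $context = null): ?string
{
    $nodes = $xpath->query($expr, $context);
    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->nodeValue) : null;
}

// Hypothetical helper: attribute of the first matching element, or null.
function firstAttr(DOMXPath $xpath, string $expr, string $attr, ?DOMNode $context = null): ?string
{
    $nodes = $xpath->query($expr, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node->getAttribute($attr) : null;
}

// Made-up markup mirroring the XPath used in the question; the second row
// is deliberately missing the link and price elements.
$html = '<html><body>'
    . '<p class="row"><a class="i" href="/apa/1.html"></a>'
    . '<span class="txt"><span class="pl"><a href="/apa/1.html">Cozy 2BR</a></span>'
    . '<span class="l2"><span>$900</span> / 2br - 850ft - <span>(Sugar House)</span></span></span></p>'
    . '<p class="row"><span class="txt"><span class="pl"><a>No link row</a></span></span></p>'
    . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$data = [];
foreach ($xpath->query('//p[@class="row"]') as $row) {
    $href = firstAttr($xpath, './a[@class="i"]', 'href', $row);
    $data[] = [
        'title' => firstText($xpath, './span[@class="txt"]/span[@class="pl"]/a', $row),
        'link'  => $href !== null ? 'http://saltlakecity.craigslist.org' . $href : null,
        'price' => firstText($xpath, './span[@class="txt"]/span[@class="l2"]/span[1]', $row),
    ];
}

print_r($data);
```

Missing fields come back as `null` instead of stopping the script, so incomplete rows can be skipped or logged.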
[Discussion]:
Tags: php html xpath web-scraping domdocument