【Posted】: 2018-06-29 16:45:24
【Question】:
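I am trying to build a simple crawler. The crawler works fine, but I would like to output some messages from within the recursive function so I can see how many pages in the $crawling array are still left to crawl and which page is currently being crawled.
The relevant code is below. The function contains two echo statements, but nothing is output until the script finishes. Is it possible to output messages along the way from inside a recursive function?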
$alreadyCrawled = array();
$crawling = array();

function followLinks($url) {
    global $alreadyCrawled;
    global $crawling;

    echo "Now crawling: $url";

    $parser = new DomDocumentParser($url);
    $linkList = $parser->getLinks();

    // Get the links
    for($i = 0; $linkList->length > $i; $i++) {
        $href = $linkList->item($i)->getAttribute("href");

        // Convert relative links to absolute links
        if(strpos($href, "#") !== false) {
            continue;
        } else if(substr($href, 0, 11) === "javascript:") {
            continue;
        } else if(substr($href, 0, 6) === "mailto") {
            continue;
        }

        $href = createLink($href, $url);

        // Crawl page
        if(!in_array($href, $alreadyCrawled)) {
            $alreadyCrawled[] = $href;
            $crawling[] = $href;
            getDetails($href);
        }
    }

    array_shift($crawling); // Remove page just crawled

    echo "Finished crawling: $url, Pages left to crawl: " . count($crawling);

    // Crawl until array is empty
    foreach ($crawling as $site) {
        followLinks($site);
    }
}
【Discussion】:
- Consider flushing the output; see php.net/manual/en/function.flush.php
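As a rough illustration of the flushing suggestion, the sketch below wraps echo in a small helper that flushes after every message. The name emitProgress is hypothetical and not part of the original crawler. It assumes the delay comes from PHP's own output buffering; a web server or proxy that buffers or compresses responses may still hold the output back even after flush() is called.

```php
<?php
// Hypothetical helper (not in the original code): prints one progress line
// and asks PHP to send it right away instead of waiting for script end.
function emitProgress(string $message): string
{
    $line = $message . PHP_EOL;
    echo $line;

    // If a user-level output buffer is active, flush the innermost one.
    if (ob_get_level() > 0) {
        ob_flush();
    }

    // Flush PHP's system write buffer toward the client or terminal.
    flush();

    // Return the line so callers (or tests) can inspect what was printed.
    return $line;
}

// Example call; the URL is a placeholder.
emitProgress("Now crawling: https://example.com");
```

Under these assumptions, the two echo statements in followLinks could be replaced with calls such as emitProgress("Now crawling: $url"), so each message is pushed out as the recursion proceeds.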
Tags: php