【Posted】: 2021-09-07 21:58:19
【Problem description】:
I want this function to scrape the links from a web page:
function filter($url)
{
    $content = file_get_contents($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($content);
    $outcomes = $dom->getElementsByTagName('a');
    foreach ($outcomes as $outcome) {
        $seeds = $outcome->getAttribute('href');
    }
}
$index = "scrap.html";
$fn = filter($index);
I want this second function to crawl the metadata of the URLs returned by the function above:
function meta_crawl($site) {
    $get_meta = get_meta_tags($site);
    $meta_list = array();
    $meta_list[] = $get_meta['keywords'];
    $meta_list[] = $get_meta['description'];
    $keywords = explode(',', $meta_list[0]);
    foreach ($keywords as $keyword) {
        $keyword;
        $a[] = $keyword;
    }
    $keywordList = [];
    array_push($keywordList, $a);
    print_r($keywordList);
}
I want to use the $seeds variable from the filter function. I thought this might work:
meta_crawl($fn($seeds));
【Discussion】:
-
filter() needs to return something.
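
A minimal sketch of what that could look like, assuming the goal is to collect every href and then read the keywords of each linked page. In the posted code, $seeds is a local variable that is overwritten on each loop pass and never returned, so $fn ends up null and $fn($seeds) cannot work. The type hints, the empty-array fallbacks, and the trimming of keywords below are additions for illustration, not part of the original post.

```php
<?php
// Sketch: filter() returns the collected hrefs instead of discarding them.
function filter(string $url): array
{
    $content = @file_get_contents($url);
    if ($content === false || $content === '') {
        return [];                      // unreadable or empty page: no links
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($content);          // @ hides warnings about malformed HTML
    $seeds = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $seeds[] = $href;           // append each link, do not overwrite
        }
    }
    return $seeds;
}

// Sketch: meta_crawl() returns the keyword list for one page.
function meta_crawl(string $site): array
{
    $meta = @get_meta_tags($site);
    if ($meta === false || !isset($meta['keywords'])) {
        return [];                      // page unreadable or has no keywords tag
    }
    return array_map('trim', explode(',', $meta['keywords']));
}

// Call meta_crawl() once per link returned by filter().
foreach (filter('scrap.html') as $seed) {
    print_r(meta_crawl($seed));
}
```

Relative hrefs (for example "about.html") are passed through unchanged here, so they would still have to be resolved against the page's base URL before get_meta_tags() can fetch them.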
Tags: javascript php function web-crawler