Perl Mechanize: find all links within a div (answers)

【Question title】: Perl Mechanize find all links within Div
【Posted】: 2011-09-20 19:40:27
【Question】:

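Is there a way to use Mechanize to find all the links inside a particular div?

I tried find_all_links but could not work out how to restrict it to that div. For example: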

<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li> 
</ul>
</div>

【Comments】:

What are the arguments to find_all_links?

Tags: html perl mechanize

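【Answer 1】:

A handy tool for pulling information out of HTML documents is HTML::Grabber. It uses jQuery-style selectors to refer to elements in the HTML, so you could do something like this: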

use HTML::Grabber;

# Your mechanize stuff here ...

my $dom = HTML::Grabber->new( html => $mech->content );

my @links;
$dom->find('div.sometag a')->each(sub {
    push @links, $_->attr('href');
});
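
The hrefs collected this way are whatever the page contains, so for the markup in the question they would be relative paths such as /a.html. A minimal sketch of resolving them with URI->new_abs follows; the base URL is a made-up placeholder (in a real script it would presumably come from $mech->base), and the @links values are hard-coded only to keep the example self-contained.

```perl
use strict;
use warnings;
use URI;

# Hypothetical base URL; with Mechanize this would be $mech->base.
my $base = 'http://example.com/tags/index.html';

# Hrefs as they appear inside div.sometag in the question.
my @links = ('/a.html', '/b.html');

# Resolve each relative href against the base URL.
my @absolute = map { URI->new_abs($_, $base)->as_string } @links;

print "$_\n" for @absolute;
# prints:
# http://example.com/a.html
# http://example.com/b.html
```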

【Comments】:

【Answer 2】:

Web::Scraper is useful for scraping.

use strict;
use warnings;
use URI;
use WWW::Mechanize;
use Web::Scraper;

my $mech = WWW::Mechanize->new;
$mech->env_proxy;
# If you need to log in, do it with Mechanize here.

# Web::Scraper exports scraper { ... }; scrape is the method called on the result.
# In the question's markup the "tags" class is on the <ul>, not the <li>.
my $scraper = scraper { process 'div.sometag ul.tags a', 'links[]' => '@href' };
# Pass Mechanize to the scraper as its user agent.
$scraper->user_agent($mech);

my $res = $scraper->scrape( URI->new("http://example.com/") );
for my $link (@{ $res->{links} || [] }) {
    warn $link;
}

Sorry, I haven't tested this code.
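
One way to check a Web::Scraper selector without a network connection is to run it against an HTML string. The sketch below does that with the markup from the question, with the class attribute on the ul written out as class="tags". The extra link outside the div and the base URL are assumptions added for illustration; the base is only there so the relative hrefs can be resolved.

```perl
use strict;
use warnings;
use Web::Scraper;

# The question's markup plus one link outside the div (added for illustration).
my $html = <<'HTML';
<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li>
</ul>
</div>
<p><a href="/outside.html">not inside the div</a></p>
HTML

# Collect the href of every <a> under ul.tags inside div.sometag.
my $scraper = scraper {
    process 'div.sometag ul.tags a', 'links[]' => '@href';
};

# Hypothetical base URL, used only to resolve the relative hrefs.
my $res = $scraper->scrape($html, 'http://example.com/');

print "$_\n" for @{ $res->{links} || [] };
```

Only the two links inside the div should be collected; the link in the trailing paragraph falls outside the selector.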

【Comments】: