【Question title】: Laravel Goutte cannot get meta tags
【Posted】: 2016-08-27 22:29:30
【Question】:

I'm using Goutte\Client in Laravel 5.2. I can't seem to get the content of meta tags, although I can get the title, links, and so on.

This returns an empty string.

$parse = $htmlParser->request('GET', 'http://www.sample.com');
$parse->filter('meta');

Output:

  private 'nodes' => 
    array (size=20)
      0 => 
        object(DOMElement)[289]
          public 'tagName' => string 'meta' (length=4)
          public 'schemaTypeInfo' => null
          public 'nodeName' => string 'meta' (length=4)
          public 'nodeValue' => string '' (length=0)
          public 'nodeType' => int 1
          public 'parentNode' => string '(object value omitted)' (length=22)
          public 'childNodes' => string '(object value omitted)' (length=22)
          public 'firstChild' => null
          public 'lastChild' => null
          public 'previousSibling' => null
          public 'nextSibling' => string '(object value omitted)' (length=22)
          public 'attributes' => string '(object value omitted)' (length=22)
          public 'ownerDocument' => string '(object value omitted)' (length=22)
          public 'namespaceURI' => null
          public 'prefix' => string '' (length=0)
          public 'localName' => string 'meta' (length=4)
          public 'baseURI' => null
          public 'textContent' => string '' (length=0)

This returns the title.

$parse = $htmlParser->request('GET', 'http://www.sample.com');
$parse->filter('title');

Output:

  private 'nodes' => 
    array (size=1)
      0 => 
        object(DOMElement)[289]
          public 'tagName' => string 'title' (length=5)
          public 'schemaTypeInfo' => null
          public 'nodeName' => string 'title' (length=5)
          public 'nodeValue' => string 'Test title' (length=36)
          public 'nodeType' => int 1
          public 'parentNode' => string '(object value omitted)' (length=22)
          public 'childNodes' => string '(object value omitted)' (length=22)
          public 'firstChild' => string '(object value omitted)' (length=22)
          public 'lastChild' => string '(object value omitted)' (length=22)
          public 'previousSibling' => string '(object value omitted)' (length=22)
          public 'nextSibling' => string '(object value omitted)' (length=22)
          public 'attributes' => string '(object value omitted)' (length=22)
          public 'ownerDocument' => string '(object value omitted)' (length=22)
          public 'namespaceURI' => null
          public 'prefix' => string '' (length=0)
          public 'localName' => string 'title' (length=5)
          public 'baseURI' => null
          public 'textContent' => string 'Test title' (length=36)

【Comments】:

Tags: php laravel web-scraping laravel-5.2 goutte


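【Solution 1】:

@moisesgallego's post on this answered my question, although while playing around with it I also found another way. Basically, it loops over all the meta tags and returns each tag's name and content as an array.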

$client = new \Goutte\Client();
$crawler = $client->request('GET', 'https://stackoverflow.com/');

// Collect the name and content attributes of every <meta> tag.
$meta = $crawler->filter('meta')->each(function ($node) {
    return [
        'name' => $node->attr('name'),
        'content' => $node->attr('content'),
    ];
});
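
To illustrate why the dump in the question shows an empty nodeValue: a meta tag has no child text, so its data lives only in its attributes. The sketch below uses PHP's built-in DOMDocument rather than Goutte so that it runs without Composer; the crawler nodes in the dump are the same DOMElement objects. The sample HTML is made up, and the fallback from name to property (for Open Graph style tags) is an assumption about what the page contains, not part of the original answer.

```php
<?php
// Hypothetical sample page standing in for the fetched response body.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
  <title>Test title</title>
  <meta charset="utf-8">
  <meta name="description" content="A sample page">
  <meta property="og:title" content="Sample OG title">
</head>
<body></body>
</html>
HTML;

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$meta = [];
foreach ($doc->getElementsByTagName('meta') as $node) {
    // <meta> is a void element: nodeValue and textContent are always ''.
    // Key by the name attribute, falling back to property.
    $key = $node->getAttribute('name') ?: $node->getAttribute('property');
    if ($key !== '' && $node->hasAttribute('content')) {
        $meta[$key] = $node->getAttribute('content');
    }
}

print_r($meta);
```

Keying the result by name gives direct lookups such as $meta['description'], at the cost of dropping tags that have neither a name nor a property attribute (the charset tag here).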

【Comments】:
