Goutte: extract text with tags (answers)

【Question title】: Goutte extract text with tags
【Posted】: 2018-06-13 22:11:51
【Question】:

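While learning to use Goutte to scrape a description from a website, I found that it retrieves the text but strips out all the tags (e.g. <br>, <b>). Is there a way to retrieve everything inside a div, HTML tags included? Or is there a simpler alternative that can do this?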

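    <?php
    require_once "vendor/autoload.php";

    use Goutte\Client;

    // Init. new client
    $client = new Client();
    $crawler = $client->request('GET', "examplesite.com/example");

    // Crawl response: '_text' returns the text only, so the tags are lost.
    // extract() is given an array here, which recent DomCrawler versions require.
    $description = $crawler->filter('element.class')->extract(['_text']);
    ?>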

【Comments】:

Tags: php web-scraping goutte domcrawler

【Solution 1】:

You can use the Crawler's html() method, which returns the inner HTML of a node with its tags intact.

http://api.symfony.com/4.0/Symfony/Component/DomCrawler/Crawler.html#method_html

Like this:

    // Collect the inner HTML (tags included) of every matching element
    $descriptions = $crawler->filter('element.class')->each(function ($node) {
        return $node->html();
    });

After that, you can clean up the result with PHP's strip_tags function, which accepts an optional list of tags to keep.

http://php.net/manual/fr/function.strip-tags.php
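
As a rough sketch of that clean-up step: the HTML string below is made up and only stands in for whatever html() returns on the real page. Which tags to keep (<br> and <b> here, taken from the question) is an assumption to adjust.

```php
<?php
// Hypothetical fragment standing in for the string returned by $node->html().
$html = 'First line<br><b>Bold</b> text with a <a href="/more">link</a>.';

// Keep only <br> and <b>; other tags are removed but their inner text stays.
$kept = strip_tags($html, '<br><b>');

// With no second argument, every tag is removed.
$plain = strip_tags($html);

echo $kept, PHP_EOL;
echo $plain, PHP_EOL;
```

The same call could be applied to each entry of $descriptions with array_map if several elements match.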

【Discussion】:

I didn't know that function existed. It solved everything, thanks.