【Question title】: How do I get multiple values from an XPath query?
【Posted】: 2015-03-14 20:50:18
【Question】:

Here is the HTML page (test.html):

<div id = 'mainid'>
    <div id = 'subid'>
        Name: ABC
    </div>
    <div id = 'subid'>
        Country: USA
    </div>
    <div id = 'subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
<div id = 'mainid'>
    <div id = 'subid'>
        Name: Jisan
    </div>
    <div id = 'subid'>
        Country: Japan
    </div>
    <div id = 'subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
<div id = 'mainid'>
    <div id = 'subid'>
        Name: Mr Barman
    </div>
    <div id = 'subid'>
        Country: Canada
    </div>
    <div id = 'subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>

Here is the PHP code:

$file = $DOCUMENT_ROOT. "test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXPath($doc);

$Querys = $xpath->query("*//div[@id='mainid']");
foreach ($Querys as $Query) {
    // Please help me with this part: what goes on the right-hand side?
    // echo $Name    = ???;
    // echo $Country = ???;
    // echo $DOB     = ???;
}

Note: I want to get a result like this:

Name: ABC, Country: USA, Date of birth: 15 Feb 1985.
Name: Jisan, Country: Japan, Date of birth: 15 Feb 1985.
Name: Mr Barman, Country: Canada, Date of birth: 15 Feb 1985.

【Comments】:

    Tags: php xpath web-scraping scraper


    【Solution 1】:

    One approach is to use the contextnode parameter of DOMXPath::query to run a sub-query for the subid children of each mainid element. Something like this:

    $mainElements = $xpath->query("*//div[@id='mainid']");
    foreach ($mainElements as $mainElement) {
        $subElements = $xpath->query("div[@id='subid']", $mainElement);
    
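        // Expect exactly three subid children: name, country, date of birth.
        if ($subElements && $subElements->length == 3) {
            // item() is the documented accessor for DOMNodeList entries.
            $Name = trim($subElements->item(0)->nodeValue);
            $Country = trim($subElements->item(1)->nodeValue);
            $DOB = trim($subElements->item(2)->nodeValue);
            echo "$Name, $Country, $DOB\n";
        } else {
            echo "Invalid number of sub-elements.\n";
        }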
    }
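
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. It is not the exact code above: it inlines two of the three records from test.html as a string, collects the lines in a `$lines` array (a name made up for this sketch), and appends the trailing period shown in the desired output. It also assumes libxml may complain about the repeated id values, so it collects those warnings internally instead of printing them.

```php
<?php
// Inline copy of part of test.html so the sketch runs without any files.
$html = <<<'HTML'
<div id='mainid'>
    <div id='subid'>
        Name: ABC
    </div>
    <div id='subid'>
        Country: USA
    </div>
    <div id='subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
<div id='mainid'>
    <div id='subid'>
        Name: Jisan
    </div>
    <div id='subid'>
        Country: Japan
    </div>
    <div id='subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
HTML;

// The sample markup reuses id values, so libxml may report duplicate-ID
// warnings; keep them internal rather than letting them reach the output.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query("//div[@id='mainid']") as $mainElement) {
    $fields = [];
    // Relative path: evaluated against $mainElement, not the whole document.
    foreach ($xpath->query("div[@id='subid']", $mainElement) as $subElement) {
        $fields[] = trim($subElement->nodeValue);
    }
    // Join the trimmed fields and add the trailing period from the question.
    $lines[] = implode(', ', $fields) . '.';
}

echo implode("\n", $lines), "\n";
```

The inner query uses a relative path (`div[...]`) on purpose. A path starting with `//` is absolute, so it would search the whole document even when a context node is passed, and every record would list all the subid values.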
    

    Note that the trim calls are necessary; otherwise the output will contain all of the whitespace from the original document.
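
As a possible alternative to calling trim in PHP, the whitespace can be stripped inside XPath itself. The sketch below is an assumption-laden illustration rather than part of the answer: it parses a single hypothetical subid div from a string and compares the raw nodeValue, the trimmed value, and the result of the XPath 1.0 `normalize-space()` function called through DOMXPath::evaluate.

```php
<?php
// One subid div with the same leading/trailing whitespace as test.html.
$doc = new DOMDocument();
$doc->loadHTML("<div id='subid'>\n        Name: ABC\n    </div>");

$xpath = new DOMXPath($doc);
$div = $xpath->query("//div[@id='subid']")->item(0);

// Raw text still carries the newlines and indentation from the markup.
$raw = $div->nodeValue;

// trim() strips whitespace at both ends only.
$trimmed = trim($raw);

// normalize-space() strips both ends and collapses inner whitespace runs;
// evaluate() returns a plain string for string-valued expressions.
$normalized = $xpath->evaluate('normalize-space(.)', $div);

var_dump($raw, $trimmed, $normalized);
```

For this input the trimmed and normalized values are identical. They differ only when the text contains runs of whitespace in the middle, which `normalize-space()` collapses to single spaces and trim leaves alone.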

    【Discussion】:
