【Question title】: How do I get specific values from a DOMXPath query?
【Posted】: 2014-05-17 00:19:59
【Question】:

I'm new to DOMXPath, but I'm trying to learn more about it. At the moment I have an HTML structure like this:

    <span class="1">
        <div class="headerClass">
            Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span>
        </div>
        <table class="tableClass" id="tableID">
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td>some text</td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website1.com" target="_blank">My Link</a></td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website2.com" target="_blank">My Link</a></td>
            </tr>
        </table>
    </span>

    <span class="2">
        <div class="headerClass">
            Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span>
        </div>
        <table class="tableClass" id="tableID">
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td>some text</td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website1.com" target="_blank">My Link</a></td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website2.com" target="_blank">My Link</a></td>
            </tr>
        </table>
    </span>

... and the spans continue: 3, 4, 5 ... etc

To retrieve this HTML from the source file, I use this:

    $oDomXpath = new DOMXpath($oDom);
    $query = "//span[number(@class)=number(@class)]";
    $oDomObject = $oDomXpath->query($query);

    foreach ($oDomObject as $oObject) {
        // WHAT GOES HERE????
    }

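I need to store the following values in an array:

  1. The plain text of every <div class="headerClass">, without HTML tags.
  2. The text of every <span class="spanClass2">.
  3. Every URL inside the table. The table can have any number of rows, from 0 to many.

How can I do this? What do I have to put inside the foreach loop? Do I need to run another query?

Thank you very much for your help!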

【Comments on the question】:

    Tags: php html dom xpath domxpath


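    【Solution 1】:

    You have two options: run several XPath queries and fetch the values one at a time, or build a single XPath query that joins several paths with the union operator: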

    <?php
    $dom = new DOMDocument();
    // "@" silences the warnings libxml emits for imperfect real-world HTML.
    @$dom->loadHTMLFile('yourfile.html');

    $xpath = new DOMXPath($dom);

    // One query made of four paths joined with "|" (union).
    // The predicate number(@class)=@class keeps only spans whose class is numeric.
    $xquery = <<<'EOD'
    //span[number(@class)=@class]/@class |
    //span[number(@class)=@class]/div[@class='headerClass'] |
    //span[number(@class)=@class]/div[@class='headerClass']/span[@class='spanClass2'] |
    //span[number(@class)=@class]/table[@class='tableClass']/tr/td/a
    EOD;

    $nodes = $xpath->query($xquery);

    foreach ($nodes as $node) {
        if ($node->nodeType == XML_ELEMENT_NODE) {
            // Element nodes: the header div, the inner span, or a link.
            switch ($node->nodeName) {
                case 'div' : echo '<br/>div content: ' . $node->nodeValue; break;
                case 'span': echo '<br/>span content: ' . $node->nodeValue; break;
                default    : echo '<br/>url: ' . $node->getAttribute('href');
            }
        } else {
            // Attribute node: the numeric class of the enclosing span.
            echo '<br/><br/>number: ' . $node->value;
        }
    }
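
The other option mentioned above, several small queries, could look roughly like the sketch below. It is not part of the original answer: the function name `collectSpanData` and the array keys `header`, `span2` and `urls` are made up for illustration, and it assumes the markup shown in the question. Each inner query is evaluated relative to the current numbered span by passing that span as the context node, and the results go into an array instead of being echoed.

```php
<?php
// Hypothetical helper (not from the original answer): gathers, for every
// <span> with a numeric class, the header text, the spanClass2 text and
// the link URLs found in its table.
function collectSpanData(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($dom);
    $result = [];

    foreach ($xpath->query('//span[number(@class)=@class]') as $span) {
        $entry = ['header' => [], 'span2' => [], 'urls' => []];

        // 1. Plain text of the header div (textContent drops the tags).
        foreach ($xpath->query("div[@class='headerClass']", $span) as $div) {
            $entry['header'][] = trim(preg_replace('/\s+/', ' ', $div->textContent));
        }

        // 2. Text of the spanClass2 element inside the header.
        foreach ($xpath->query("div[@class='headerClass']/span[@class='spanClass2']", $span) as $inner) {
            $entry['span2'][] = trim($inner->textContent);
        }

        // 3. Every href inside the table; an empty table simply yields no URLs.
        foreach ($xpath->query("table[@class='tableClass']//a/@href", $span) as $href) {
            $entry['urls'][] = $href->value;
        }

        // Keyed by the span's class ("1", "2", ...).
        $result[$span->getAttribute('class')] = $entry;
    }

    return $result;
}
```

Calling `collectSpanData(file_get_contents('yourfile.html'))` would return one entry per numbered span. Compared with the single union query, this keeps the values grouped by span at the cost of running three extra queries per span.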
    

    【Comments】:
