【Question Title】: Pull node attribute out as well as node value from td element
【Posted】: 2015-11-10 14:44:20
【Question】:

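I have the PHP code below. It loads an HTML file, pulls the table out of it, then parses the table and returns the cell data as shown under Current output. I am trying to also output the href attribute whenever a cell contains one, as shown in the Desired output snippet. I can't see how to target just the href inside a cell; I only seem to be able to get the node value. Any help is much appreciated.

Current output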

Array
(
    [0] => Array
        (
            [id] => 213
            [url] => Website
        )
)

Desired output

Array
(
    [0] => Array
        (
            [id] => 213
            [url] => Website
            [link] => example.com/page/1/
        )
)

HTML

<table>
    <tr>
        <td>213</td>
        <td><a href="example.com/page/1/">Website</a></td>
    </tr>
</table>

PHP

$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);

$dom->preserveWhiteSpace = false;

$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = null;

foreach($cols AS $node) {
    $row_headers[] = $node->nodeValue;
}

$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach($rows AS $row) {
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach($cols AS $node) {
        if ($row_headers != null) {
            $row[$row_headers[$i]] = $node->nodeValue;
        }
        $i++;
    }
    if (!empty($row)) {
        $table[] = $row;
    }
}

I tried $row['link'] = $node->getAttribute('href'); inside the nested foreach($cols AS $node) loop, but that did not seem to work either.

【Question Comments】:

    Tags: php html


    【Solution 1】:

    See the code below and the inline comments.

    $html = '<table>
        <tr>
            <td>213</td>
            <td><a href="example.com/page/1/">Website</a></td>
        </tr>
        <tr>
            <td>444</td>
            <td><a href="example.org/page/1/">not a website</a></td>
        </tr>
    </table>';
    
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // set before loading so it can take effect
    $dom->loadHTML($html); // returns a bool, so don't assign it back to $html
    
    $rows = $dom->getElementsByTagName("tr");
    $result = array();
    
    foreach ($rows as $row) {
    
        $cols = $row->getElementsByTagName('td');
        if ($cols->length < 2) {
            continue; // skip rows (e.g. header rows) that lack both cells
        }
    
        $id     = $cols->item(0)->nodeValue; // get the id, the first td element, index=0
        $anchor = $cols->item(1)->nodeValue; // get the anchor text, the second td element, index=1
        // the href lives on the <a> inside the td, not on the td itself
        $link   = $cols->item(1)->getElementsByTagName('a')->item(0);
        $url    = $link ? $link->getAttribute('href') : null; // null when the cell has no <a>
    
        $result[] = array(
            'id'     => $id,
            'anchor' => $anchor,
            'url'    => $url
        );
    }
    
    print_r($result);
    

    This should output:

    Array
    (
        [0] => Array
            (
                [id] => 213
                [anchor] => Website
                [url] => example.com/page/1/
            )
    
        [1] => Array
            (
                [id] => 444
                [anchor] => not a website
                [url] => example.org/page/1/
            )
    
    )
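
The same idea can be folded back into the header-driven loop from the question. The sketch below is one possible way to do it, not the answerer's code: the function name tableToRows and the $headers parameter are hypothetical, and the headers are passed in by hand because the sample HTML has no th cells. It assumes PHP 7 or later. The attempt in the question most likely failed because $node there is the td, which has no href of its own; the attribute sits on the a element nested inside it.

```php
<?php
// Hypothetical helper (not from the original post): maps each row's cells to
// the given header names and adds a 'link' key when a cell contains an <a href>.
function tableToRows(string $html, array $headers): array
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $table = array();

    foreach ($xpath->query('//tr') as $tr) {
        $row = array();
        $i = 0;
        foreach ($xpath->query('./td', $tr) as $td) {
            if (isset($headers[$i])) {
                $row[$headers[$i]] = trim($td->textContent);
            }
            // Search for a link inside this cell only (note the leading dot).
            $a = $xpath->query('.//a[@href]', $td)->item(0);
            if ($a !== null) {
                // If several cells in a row hold links, the last one wins.
                $row['link'] = $a->getAttribute('href');
            }
            $i++;
        }
        if (!empty($row)) {
            $table[] = $row;
        }
    }

    return $table;
}
```

Calling tableToRows($html, array('id', 'url')) on the table from the question should yield one row with the id, url and link keys from the Desired output. Rows that contain only th cells produce no entry, since only td cells are read.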
    

    【Comments】:
