【Question title】: Convert scraping result into an array
【Posted】: 2019-07-01 14:10:45
【Question】:

I am scraping a website with Simple HTML DOM, and the output looks like this:

<tr>
    <th>Satuan</th>
    <th>Harga Barang 1</th>
    <th>Harga Barang 2</th>
    <th>Harga Barang 3</th>
    <th>Harga Barang 4</th>
</tr>
<tr>
    <td>0.5</td>
    <td>Rp 388.000</td>
    <td>Rp 342.000</td>
    <td>Rp 456.000</td>
    <td>Rp 377.000</td>
</tr>
<tr>
    <td>1.0</td>
    <td>Rp 725.000</td>
    <td>Rp 676.000</td>
    <td>Rp 855.000</td>
    <td>Rp 684.000</td>
</tr>

This is my code:

<?php
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file("mylink.com/blabla");

foreach($html->find('tr') as $e) {
    echo $e;
}
?>

How can I convert the output into an array?

【Comments】:

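  • Access the text content of the nodes inside the loop (or whatever you are actually interested in) and simply assign it as a new array element. What is the actual problem? (FYI, you are scraping, not scrapping.)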
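
The comment's suggestion can be sketched as follows. This is a minimal, self-contained illustration rather than the asker's setup: it swaps Simple HTML DOM for PHP's built-in `DOMDocument` so it runs without the third-party file, and it parses a trimmed copy of the question's table (two columns only) instead of fetching a page. The variable names `$markup`, `$rows`, and `$cells` are chosen for this example.

```php
<?php
// Sample markup copied from the question, trimmed to two columns.
// Assumption: a real script would fetch the page instead of using a string.
$markup = <<<HTML
<table>
<tr><th>Satuan</th><th>Harga Barang 1</th></tr>
<tr><td>0.5</td><td>Rp 388.000</td></tr>
<tr><td>1.0</td><td>Rp 725.000</td></tr>
</table>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->childNodes as $node) {
        // Keep only <th>/<td> elements; skip whitespace text nodes.
        if ($node instanceof DOMElement && in_array($node->tagName, ['th', 'td'], true)) {
            // The text content of each cell becomes a new array element.
            $cells[] = trim($node->textContent);
        }
    }
    $rows[] = $cells;
}

print_r($rows);
```

Each entry of `$rows` is a plain list of cell texts, with the header row first. With Simple HTML DOM the same idea would use `$e->find('td')` and `$cell->plaintext` inside the question's loop.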

Tags: php html arrays web-scraping


【Solution 1】:

Here is a snippet:

$rows       = $html->find('tr');
$isFirstRow = true;
$headers    = [];
$result     = [];
foreach ($rows as $row) {
    if ($isFirstRow) {
        // First row: collect the column headers from the <th> cells
        foreach ($row->find('th') as $cell) {
            $headers[] = trim($cell->plaintext);
        }
    } else {
        // Remaining rows: collect the values from the <td> cells
        $values = [];
        foreach ($row->find('td') as $cell) {
            $values[] = trim($cell->plaintext);
        }
        // Key the values by header; skip rows whose cell count does not
        // match, because array_combine() fails on arrays of unequal length
        if (count($values) === count($headers)) {
            $result[] = array_combine($headers, $values);
        }
    }
    $isFirstRow = false;
}
print_r($result);

Output

Array
(
    [0] => Array
        (
            [Satuan] => 0.5
            [Harga Barang 1] => Rp 388.000
            [Harga Barang 2] => Rp 342.000
            [Harga Barang 3] => Rp 456.000
            [Harga Barang 4] => Rp 377.000
        )

    [1] => Array
        (
            [Satuan] => 1.0
            [Harga Barang 1] => Rp 725.000
            [Harga Barang 2] => Rp 676.000
            [Harga Barang 3] => Rp 855.000
            [Harga Barang 4] => Rp 684.000
        )

)
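
The values in this array are still display strings such as `Rp 388.000`. If numbers are needed, one possible follow-up step is sketched below. The helper `parseRupiah` is hypothetical (it is not part of Simple HTML DOM or of the answer), and it assumes that `.` is a thousands separator and that the prices have no decimal part.

```php
<?php
// Hypothetical helper: turn "Rp 388.000" into the integer 388000.
// Assumes "." is a thousands separator and there is no decimal part.
function parseRupiah(string $price): int
{
    // Drop everything that is not a digit (currency label, spaces, dots).
    return (int) preg_replace('/\D+/', '', $price);
}

// One row in the shape produced by the snippet above (two price columns shown).
$row = [
    'Satuan'         => '0.5',
    'Harga Barang 1' => 'Rp 388.000',
    'Harga Barang 2' => 'Rp 342.000',
];

$clean = [];
foreach ($row as $header => $value) {
    // "Satuan" is a quantity, not a price, so it is cast to float instead.
    $clean[$header] = $header === 'Satuan' ? (float) $value : parseRupiah($value);
}

var_dump($clean);
```

Stripping all non-digits is deliberately blunt; a site that mixes decimal commas into its prices would need a stricter parser.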
