No loop in my scraping PHP script (answers)

【Question title】: No loop in my scraping PHP script
【Posted】: 2018-04-01 10:28:53
【Question description】:

I wrote some code to scrape titles and their links from a website. The script looks like this:

<?php

$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://dunia21.tv/?s=fast+anf+furious');
curl_setopt($ch, CURLOPT_URL, 'http://dunia21.tv/?s=fast+anf+furious');

// html
$data = curl_exec($ch);
curl_close($ch);

//
$output = array();

// include simple html dom
require('./lib/simple_html_dom.php');

// parse the HTML string into a DOM object
$html = str_get_html($data);

// pick the container to process: the div with class "row content"
$bahan = $html->find('div[class=row content]', 0);

// take the section that holds the posts: class "primary col-md-11"
$kotak = $bahan->find('section[class=primary col-md-11]', 0);

// extract each post box
foreach($kotak as $key => $val) {
    // the title is in the header with class "col-xs-12 entry-header"
    $header = $kotak->find('header[class=col-xs-12 entry-header]', 0);
    $title = $header->find('a[rel=bookmark]', 0)->innertext;
    $url = $header->find('a[rel=bookmark]', 0)->href;   

        $output[] = array(
        'title' => $title,
        'link' => $url
        );
}

print  '<pre>';
print_r($output);
print  '</pre>';
?>

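I expected this script to do what I wanted, and it does produce a response, but it only picks up the first title and none of the others. The output looks like this: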

Array
(
    [0] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [1] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [2] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [3] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [4] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [5] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [6] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

    [7] => Array
        (
            [title] => 2 Fast 2 Furious (2003)
            [link] => http://dunia21.tv/2-fast-2-furious-2003/
        )

)

Any suggestions for fixing this? Thanks.

Tags: php json curl scrape

【Solution 1】:

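The code inside the foreach loop always queries the variable that holds the whole container, not the current item, so every iteration finds the same first match.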

Change

$header = $kotak->find('header[class=col-xs-12 entry-header]', 0);

to

$header = $val->find('header[class=col-xs-12 entry-header]', 0);
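
For the loop to visit every result, it also needs a list of nodes to iterate over; a lookup that returns only the first match hands it a single node. The sketch below illustrates the idea with PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so that it runs without the library. The sample markup, the example.com URLs, and the variable names are assumptions made for illustration and are not taken from the real page.

```php
<?php
// Hypothetical sample markup standing in for the real search-results page.
$sampleHtml = <<<HTML
<section class="primary col-md-11">
  <header class="col-xs-12 entry-header"><a rel="bookmark" href="http://example.com/a/">Title A</a></header>
  <header class="col-xs-12 entry-header"><a rel="bookmark" href="http://example.com/b/">Title B</a></header>
</section>
HTML;

// Collect parser warnings internally instead of printing them
// (older libxml versions complain about HTML5 tags such as <section>).
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath  = new DOMXPath($doc);
$output = array();

// Select every matching header, not just the first one.
$headers = $xpath->query('//section//header[contains(@class, "entry-header")]');

foreach ($headers as $header) {
    // Query relative to the current header: the second argument is the context node.
    $link = $xpath->query('.//a[@rel="bookmark"]', $header)->item(0);
    if ($link === null) {
        continue; // skip headers that have no bookmark link
    }
    $output[] = array(
        'title' => trim($link->textContent),
        'link'  => $link->getAttribute('href'),
    );
}

print_r($output);
```

With simple_html_dom the same idea would be to call find() without the index argument, which returns an array of all matching nodes, and then run the inner find() on each element of that array instead of on the container.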
