【Question title】: Specific Array to CSV file
【Posted】: 2016-07-21 15:53:57
【Question】:

I'm having trouble writing data from an array to a CSV file.

Array

[link1] => HTTP Code
[link2] => HTTP Code
[link3] => HTTP Code
[link4] => HTTP Code

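I need to write the data to a CSV file so that no link appears more than once. Unfortunately, I don't know how to write the links one after another (I'm working inside a foreach loop), extracting each link and sending it to the CSV while checking that it has not already been written.

Here is my code: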

    require('simple/simple_html_dom.php');
    $xml = simplexml_load_file('https://www.gutscheinpony.de/sitemap.xml');
    $fp = fopen('Links2.csv', 'w');
    set_time_limit(0);

    $links = [];

    foreach ($xml->url as $link_url) {
        $url = $link_url->loc;

        $data = file_get_html($url);
        $data = strip_tags($data, "<a>");
        $d = preg_split("/<\/a>/", $data);

        foreach ($d as $k => $u) {
            if (strpos($u, "<a href=") !== FALSE) {
                $u = preg_replace("/.*<a\s+href=\"/sm", "", $u);
                $u = preg_replace("/\".*/", "", $u);

                if (strpos($u, "http") !== FALSE) {
                    $ch = curl_init($u);
                    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
                    $output = curl_exec($ch);
                    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

                    if (strpos($u, "https://www.gutscheinpony.de/") !== FALSE)
                        $u = substr($u, 28);

                    if ($u == "/")
                        $u = $url;
                }

                $links[$u] = $http_code;

                $wynik = array(array($u, $url, $http_code));

                foreach ($wynik as $fields) {
                    fputcsv($fp, $fields);
                }
            }
        }
    }

    curl_close($ch);
    fclose($fp);

    echo 'Send to CSV file successfully completed ... ';

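I need to take every link from the .xml, fetch the links found on each of those pages, and determine their HTTP status. That part is already done. I just can't get the data into the CSV file in the right way.

I'm counting on your help.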

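【Comments】:

  • Try removing one array() from $wynik = array( array()) before calling fputcsv, so that $wynik = array() is given only once.
  • Can't you put the fputcsv() foreach loop after the link-processing loop, i.e. outside it? $wynik will of course grow larger, so this assumes you have enough memory. Also make it an associative array keyed by $url. That way each $url value is written only once.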
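
The suggestion in the last comment (collect first, write once after the loop) can be sketched roughly as follows. This is a minimal sketch, not the asker's final code: `uniqueRowsByLink()` and `writeRowsAsCsv()` are hypothetical helper names, the sample rows are made up, and keying by the link rather than by `$url` is an assumption based on the question asking for each link to appear only once. It writes to `php://memory` so it runs without touching disk.

```php
<?php
// Hypothetical helper: collapse [link, source page, HTTP code] rows
// so that each link appears once. The first occurrence wins.
function uniqueRowsByLink(array $rows): array
{
    $unique = [];
    foreach ($rows as [$link, $page, $code]) {
        $key = (string) $link; // cast in case a SimpleXMLElement slipped in
        if (!isset($unique[$key])) {
            $unique[$key] = [$key, (string) $page, (int) $code];
        }
    }
    return array_values($unique);
}

// Hypothetical helper: write rows to an already opened stream,
// returning how many rows were written successfully.
function writeRowsAsCsv($fp, array $rows): int
{
    $written = 0;
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are spelled out explicitly.
        if (fputcsv($fp, $row, ',', '"', '\\') !== false) {
            $written++;
        }
    }
    return $written;
}

// Made-up sample data standing in for what the crawl loop would collect.
$rows = [
    ['/gutscheine', 'https://www.gutscheinpony.de/', 200],
    ['https://example.com/', 'https://www.gutscheinpony.de/', 404],
    ['/gutscheine', 'https://www.gutscheinpony.de/shops', 200], // duplicate link
];

$fp = fopen('php://memory', 'w+');
$count = writeRowsAsCsv($fp, uniqueRowsByLink($rows));
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);
```

In the crawl itself, the inner loop would append to `$rows` instead of calling `fputcsv()` directly, at the cost of holding all rows in memory until the end.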

Tags: php arrays csv http-status-codes


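【Solution 1】:

The code below is essentially your code with a few modifications. One observation: using :// as part of a PHP array key did not seem to work here. PHP string keys may contain ://, so a more likely cause is that $url is a SimpleXMLElement object, which cannot serve as an array key until it is cast to a string.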

    <?php

        require __DIR__ . '/simple/simple_html_dom.php';
        $xml        = simplexml_load_file('https://www.gutscheinpony.de/sitemap.xml');
        $fp         = fopen(__DIR__ . '/Links2.csv', 'w');
        set_time_limit(0);
        $links      = [];
        $status     = false;

        foreach ($xml->url as $link_url){

            $url    = (string) $link_url->loc; // cast: a SimpleXMLElement cannot be used as an array key
            $data   = file_get_html($url);
            $data   = strip_tags($data,"<a>");
            $d      = preg_split("/<\/a>/",$data);

            foreach ( $d as $k=>$u ){
                $http_code = 404;
                if( strpos($u, "<a href=") !== FALSE ){
                    $u = preg_replace("/.*<a\s+href=\"/sm","",$u);
                    $u = preg_replace("/\".*/","",$u);

                    if ( strpos($u, "http") !== FALSE) {
                        // JUST GET THE CODE ON EACH ITERATION,
                        // OPENING THE STREAM & CLOSING IT AGAIN ON EACH ITERATION...
                        $http_code  = getHttpCodeStatus($u);

                        if(strpos($u, "https://www.gutscheinpony.de/") !== FALSE ){
                            $u = substr($u, 28);
                        }

                        if($u == "/") {
                            $u = $url;
                        }
                        // THIS COULD BE A BUG... USING :// AS PART OF AN ARRAY KEY SEEMS NOT TO WORK
                        $links[str_replace("://", "_", $u)] = $http_code;

                        // RUN THE var_dump(), TO VIEW THE PROCESS AS IT PROGRESSES IF YOU WISH TO
                        var_dump($links);
                        $status = fputcsv($fp, array($u, $url , $http_code));
                    }

                }
            }
        }


        fclose($fp);
        if($status) {
            echo count($links) . ' entries were successfully processed and written to disk as a CSV File... ';
        }else{
            echo  'It seems like some entries were not successfully written to disk  - at least the last entry... ';                
        }

        function getHttpCodeStatus($u){
            $ch         = curl_init($u);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            $output     = curl_exec($ch);
            $http_code  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            curl_close($ch);
            return $http_code;
        }
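
On the array-key remark attached to this answer: the following is a small sketch, under the assumption that the trouble came from SimpleXML returning objects rather than from the `://` characters themselves. The inline XML is a made-up stand-in for the real sitemap, and the status value 200 is a placeholder.

```php
<?php
// A string key containing "://" is an ordinary array key.
$links = [];
$links['https://www.gutscheinpony.de/'] = 200;

// SimpleXML hands back objects, not strings. Cast before using a value as a key.
$xml = simplexml_load_string(
    '<urlset><url><loc>https://www.gutscheinpony.de/shops</loc></url></urlset>'
);
foreach ($xml->url as $entry) {
    $url = (string) $entry->loc; // $links[$entry->loc] would be an illegal offset
    $links[$url] = 200;
}
```

With the cast in place, the `str_replace("://", "_", $u)` workaround in the listing should no longer be needed, since it only succeeded because `str_replace()` converts its subject to a string.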

【Comments】:

  • Thanks for your help. I've finished the task. First, I create a new array and save the result of each loop iteration in it.
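
Keeping a separate array of results, as this comment describes, also makes it possible to skip repeated HTTP requests for a link that has already been checked. The sketch below is one possible reading of that idea, not the asker's actual code: `cachedStatus()` is a hypothetical helper, and the fetcher is a stand-in so the example runs without network access. In the real script the callable could be the `getHttpCodeStatus()` function from the answer.

```php
<?php
// Hypothetical helper: look up a link's status once and reuse it afterwards.
// $fetch is any callable that takes a URL and returns an HTTP code.
function cachedStatus(string $link, array &$cache, callable $fetch): int
{
    if (!array_key_exists($link, $cache)) {
        $cache[$link] = (int) $fetch($link);
    }
    return $cache[$link];
}

// Stand-in fetcher that counts how often it is called.
$calls = 0;
$fakeFetch = function (string $link) use (&$calls): int {
    $calls++;
    return 200;
};

$cache = [];
$codes = [];
$found = ['https://example.com/a', 'https://example.com/b', 'https://example.com/a'];
foreach ($found as $link) {
    $codes[] = cachedStatus($link, $cache, $fakeFetch);
}
```

After the loop, `$cache` holds one entry per distinct link, so it can double as the de-duplicated data set to write to the CSV.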