【Question title】: Use Simple HTML DOM Parser to JSON?
【Posted】: 2016-02-10 09:27:43
【Question description】:

I'm trying to group each element of a scraped website and convert it into JSON elements, but it doesn't seem to work.

<?php

// Include the php dom parser    
include_once 'simple_html_dom.php';

header('Content-type: application/json');

// Create DOM from URL or file

$html = file_get_html('urlhere');

foreach($html->find('hr ul') as $ul)
{
    foreach($ul->find('div.product') as $li) 
    $data[$count]['products'][]['li']= $li->innertext;
    $count++;
}
echo json_encode($data);

?> 

This returns:

{"":{"products":[{"li":"   <a class=\"th\" href=\"\/products\/56942-haters-crewneck-sweatshirt\">            <div style=\"background-image:url('http:\/\/s0.merchdirect.com\/images\/15814\/v600_B_AltApparel_Crew.png');\">         <img src=\"http:\/\/s0.com\/images\/6398\/product-image-placeholder-600.png\">       <\/div>                  <\/a>   <div class=\"panel panel-info\" style=\"display: none;\">     <div class=\"name\">       <a href=\"\/products\/56942-haters-crewneck-sweatshirt\">                    Haters Crewneck Sweatshirt                <\/a>     <\/div>     <div class=\"subtitle\">                                                                                                                                                                                                                                                                                                                                  $60.00                                 <\/div>   <\/div> "}

What I actually want is:

{"products":[{
"link":"/products/56942-haters-crewneck-sweatshirt",
"image":"http://s0.com/images/15814/v600_B_AltApparel_Crew.png",
"name":"Haters Crewneck Sweatshirt",
 "subtitle":"60.00"}
]}

How can I strip out all the redundant information and, ideally, name each element in the reformatted JSON?

Thanks!

【Discussion】:

    Tags: php json html-parsing


    【Solution 1】:

    You just need to extend the logic in your inner loop:

    foreach($html->find('hr ul') as $ul)
    {
        foreach($ul->find('div.product') as $li) {
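            // Collect the fields of a single product
            $product = array();
    
            $product['link'] = $li->find('a.th')[0]->href;
            $product['name'] = trim($li->find('div.name a')[0]->innertext);
            $product['subtitle'] = trim($li->find('div.subtitle')[0]->innertext);
            // The image URL sits between the single quotes of the inline style
            $product['image'] = explode("'", $li->find('div')[0]->style)[1];
    
            // $count is carried over from the question's code
            $data[$count]['products'][] = $product;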
        }
    
    }
    echo json_encode($data);
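
    The snippet above takes the image URL with explode("'", ...)[1], which only works when the inline style uses single quotes, and trim() leaves the "$" in the subtitle, whereas the desired output in the question shows "60.00". A minimal sketch of two helpers that could tidy these values follows. The names extractBackgroundUrl and cleanPrice are hypothetical, are not part of Simple HTML DOM, and assume the markup looks like the sample in the question.

```php
<?php
/**
 * Hypothetical helper: pull the URL out of a CSS declaration such as
 * background-image:url('http://example.com/a.png');
 * Returns null when the style contains no url(...).
 */
function extractBackgroundUrl($style)
{
    // Accept single quotes, double quotes, or no quotes around the URL.
    if (preg_match('/url\(\s*[\'"]?([^\'")]+?)[\'"]?\s*\)/i', $style, $m)) {
        return $m[1];
    }
    return null;
}

/**
 * Hypothetical helper: collapse runs of whitespace and drop a leading
 * dollar sign, so that "   $60.00   " becomes "60.00".
 */
function cleanPrice($text)
{
    $text = trim(preg_replace('/\s+/', ' ', $text));
    return ltrim($text, '$');
}
```

    Inside the inner loop these could be applied as extractBackgroundUrl($li->find('div', 0)->style) and cleanPrice($li->find('div.subtitle', 0)->innertext). Passing the index as the second argument of find() returns the first match directly.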
    

    【Comments】:

    • How do I remove the "{"":{" in front of "products":[{"link":..."?
    • Just change $data[$count]['products'][] = $product; to $data['products'][] = $product;
    • You're the best. Thanks for your help! :)
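
    To illustrate why the change suggested in the comments works: $count is never assigned in the original code, so PHP treats it as null, and a null array key is cast to the empty string. That is where the leading {"":{ comes from. The sketch below reproduces both shapes with made-up product data. The variable names $nested and $flat are only for illustration, and newer PHP versions may print a notice or deprecation message for the null key.

```php
<?php
// Made-up sample data standing in for one scraped product.
$product = array(
    'link' => '/products/56942-haters-crewneck-sweatshirt',
    'name' => 'Haters Crewneck Sweatshirt',
);

// Keyed by an unset counter: null is cast to the empty string "" as the key.
$count = null; // stands in for the never-initialized $count
$nested = array();
$nested[$count]['products'][] = $product;

// Keyed directly by 'products', as suggested in the comments.
$flat = array();
$flat['products'][] = $product;

$nestedJson = json_encode($nested);
$flatJson   = json_encode($flat);

echo $nestedJson, "\n"; // starts with {"":{"products":[
echo $flatJson, "\n";   // starts with {"products":[
```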