【Question title】:HTML Parsing by PHP
【Posted】:2017-10-30 13:40:27
【Question description】:

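I am trying to parse tender information (department, location, reference number, closing date, description, and the URL of the tender document, i.e. the attached/downloadable PDF) into an array from the following site: http://www.biman-airlines.com/corporate/tender

I'm new to this and don't know how to go about it. I tried the following, which does not work: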

<?php
$url = "http://www.biman-airlines.com/corporate/tender";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);

$dom = new DOMDocument();

@$dom->loadHTML($html);

$xpath = new DomXPath($dom);


$headerNames = array();
foreach ($xpath->query('//table[@id=""]//th') as $node) {
$headerNames[] = $node->nodeValue;
}


$data = array();
foreach ($xpath->query('//tbody[@id=""]//tr') as $node) {
$rowData = array();
foreach ($xpath->query('td', $node) as $cell) {
    $rowData[] = $cell->nodeValue;
}

$data[] = array_combine($headerNames, $rowData);
}

print_r($data);
?>

【Discussion】:

  • Please share more information: what exactly does not work? Which errors did you get, and which fixes have you tried?
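
One thing that can be checked without the live page is the `[@id=""]` predicate in the question's XPath queries. It matches only elements that literally carry an empty `id` attribute, so on a table with no `id` the header list stays empty. The sketch below uses a small made-up table (the markup is an assumption, not the real page) to show the difference:

```php
<?php
// Hypothetical table with no id attribute, standing in for the real page.
$html = '<table><tr><th>Department</th><th>Location</th></tr>'
      . '<tr><td>Corporate Planning</td><td>Dhaka</td></tr></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Matches only tables that literally carry id="", so nothing is found here.
$withEmptyId = $xpath->query('//table[@id=""]//th')->length;
// Dropping the predicate matches every header cell inside any table.
$withoutPredicate = $xpath->query('//table//th')->length;

echo "with [@id=\"\"]: $withEmptyId, without: $withoutPredicate\n";
```

Whether the real page needs a different selector (a class, a specific table) cannot be told from the thread, so the broader `//table//th` query is only a starting point.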

Tags: php html parsing xpath


【Solution 1】:

Try this and see if it helps.

    $url = "http://www.biman-airlines.com/corporate/tender";

    /* Use internal libxml errors -- turn on in production, off for debugging */
    libxml_use_internal_errors(true);
    /* Create a new DOMDocument object */
    $dom = new DOMDocument;
    /* Load the HTML directly from the URL */
    $dom->loadHTMLFile($url);
    /* Create a new XPath object */
    $xpath = new DOMXPath($dom);
    /* Query all <th> nodes in the document */
    $nodes = $xpath->query("//th");
    /* Set HTTP response header to plain text for debugging output */
    // header("Content-type: text/plain");

    /* Traverse the DOMNodeList and output the nodeValue of every tenth node */
    foreach ($nodes as $i => $node) {
        if (($i % 10) == 0) {
            echo "Node($i): ", $node->nodeValue, "\n" . "<br>";
        }
    }

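【Discussion】:

  • Works great. Thank you very much. How can I split the node values into an array? For example, node 0 gives: Node(0): 1 Department: Corporate Planning Location: Dhaka Ref.No: DACPM/Q400/173/2017/529 Closing Date: 21 November 2017. From this I want to save the department, location, reference number, and closing date in array variables so I can insert them into a database.
  • Use explode(). Alternatively, look at the other elements in the array, the ones skipped by the % 10 check; they may give you the individual details. If you like the answer, please upvote it.
  • How can I capture the description part and the URL under $nodes? Thanks a lot.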
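
The two follow-up questions above (splitting the header text into fields, and picking up the description and the document URL) can be sketched together. This is only an illustration under stated assumptions: the sample markup, the label list, the helper name `splitLabelledText`, and the idea that the description and link sit in the row right after the header row are all hypothetical, since the thread never shows the real page structure. It also uses `preg_match_all` rather than the `explode()` suggested above, because the sample string has no single delimiter between fields.

```php
<?php
// Hypothetical markup: the thread never shows the real page structure.
$html = <<<HTML
<table>
  <tr><th>1 Department: Corporate Planning Location: Dhaka Ref.No: DACPM/Q400/173/2017/529 Closing Date: 21 November 2017</th></tr>
  <tr><td>Supply of spare parts</td><td><a href="/uploads/tender-1.pdf">Download</a></td></tr>
</table>
HTML;

// Assumed labels, taken from the sample string quoted in the comment.
$labels = ['Department', 'Location', 'Ref.No', 'Closing Date'];

/** Split "Label: value Label: value ..." into an associative array. */
function splitLabelledText(string $text, array $labels): array
{
    $alt = implode('|', array_map(function ($label) {
        return preg_quote($label, '/');
    }, $labels));
    // Each value runs lazily up to the next known label or the end of the text.
    $pattern = '/(' . $alt . ')\s*:\s*(.*?)(?=\s*(?:' . $alt . ')\s*:|$)/su';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $fields = [];
    foreach ($matches as $match) {
        $fields[$match[1]] = trim($match[2]);
    }
    return $fields;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$tenders = [];
foreach ($xpath->query('//tr[th]') as $headerRow) {
    $fields = splitLabelledText($headerRow->textContent, $labels);

    // Assumption: description and link sit in the row right after the header row.
    $detailRow = $xpath->query('following-sibling::tr[1]', $headerRow)->item(0);
    if ($detailRow !== null) {
        $cell = $xpath->query('td[1]', $detailRow)->item(0);
        $link = $xpath->query('.//a[@href]', $detailRow)->item(0);
        $fields['Description'] = $cell !== null ? trim($cell->textContent) : '';
        $fields['Url'] = $link !== null ? $link->getAttribute('href') : null;
    }
    $tenders[] = $fields;
}

print_r($tenders);
```

Each entry of `$tenders` is then a flat associative array that could be bound to a prepared INSERT statement. A value that itself contains one of the labels followed by a colon would be split in the wrong place, so the label list has to be adapted to the actual page.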
【Solution 2】:
    <!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="Showcases a horizontal menu that hides at
     small window widths, and which scrolls when revealed.">
    <title>Responsive Horizontal-to-Scrollable Menu &ndash; Layout Examples &ndash; Pure</title>

    <link rel="stylesheet" href="https://unpkg.com/purecss@1.0.0/build/pure-min.css" integrity="sha384-nn4HPE8lTHyVtfCBi5yW9d20FjT8BJwUXyWZT9InLYax14RDjBj46LmSztkmNP9w" crossorigin="anonymous">


    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"> <script src="https://ajax.googleapis.com/ajax/lib
    <!--[if lt IE 9]>
        <script src="http://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7/html5shiv.js"></script>
    <![endif]-->
    <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-41480445-1', 'purecss.io');
    ga('send', 'pageview');
    </script>
</head>
<body>





<style>
.custom-menu-wrapper {
    background-color: white; /* #808080 */
    margin-bottom: 2.5em;
    white-space: nowrap;
    position: fixed;
}

.custom-menu {
background-color: white;
    display: inline-block;
    width: auto;
    vertical-align: middle;
    -webkit-font-smoothing: antialiased;
}

.custom-menu .pure-menu-link,
.custom-menu .pure-menu-heading {
    color: black;
}

.custom-menu .pure-menu-link:hover,
.custom-menu .pure-menu-heading:hover {
    background-color: transparent;
}

.custom-menu-top {
    position: relative;
    padding-top: .5em;
    padding-bottom: .5em;
}

.custom-menu-brand {
    display: block;
    text-align: center;
    position: relative;
}

.custom-menu-toggle {
    width: 44px;
    height: 44px;
    display: block;
    position: absolute;
    top: 3px;
    right: 0;
    display: none;
}

.custom-menu-toggle .bar {
    background-color: black;
    display: block;
    width: 20px;
    height: 2px;
    border-radius: 100px;
    position: absolute;
    top: 22px;
    right: 12px;
    -webkit-transition: all 0.5s;
    -moz-transition: all 0.5s;
    -ms-transition: all 0.5s;
    transition: all 0.5s;
}

.custom-menu-toggle .bar:first-child {
    -webkit-transform: translateY(-6px);
    -moz-transform: translateY(-6px);
    -ms-transform: translateY(-6px);
    transform: translateY(-9px);
}

.custom-menu-toggle.x .bar {
    -webkit-transform: rotate(45deg);
    -moz-transform: rotate(45deg);
    -ms-transform: rotate(45deg);
    transform: rotate(45deg);
}

.custom-menu-toggle.x .bar:first-child {
    -webkit-transform: rotate(-45deg);
    -moz-transform: rotate(-45deg);
    -ms-transform: rotate(-45deg);
    transform: rotate(-45deg);
}

.custom-menu-screen {
    background-color:white;
    -webkit-transition: all 0.5s;
    -moz-transition: all 0.5s;
    -ms-transition: all 0.5s;
    transition: all 0.5s;
    height: 3em;
    width: 100%;
    position: absolute;
    top: 0;
    z-index: -1;
}

.custom-menu-tucked .custom-menu-screen {
    -webkit-transform: translateY(-44px);
    -moz-transform: translateY(-44px);
    -ms-transform: translateY(-44px);
    transform: translateY(-44px);
}
.sizee{
    width: 10%;
    margin-top: -1%;
    margin-bottom: -1%;
    }
@media (max-width: 90em) {

    .custom-menu {
        display: block;

    }

    .custom-menu-toggle {
        display: block;
        display: none\9;
    }

    .custom-menu-bottom {
        position: absolute;
        width: 100%;
        border-top: 1px solid #eee;
        background-color: white\9;
        z-index: 100;
    }

    .custom-menu-bottom .pure-menu-link {
        opacity: 1;
        -webkit-transform: translateX(0);
        -moz-transform: translateX(0);
        -ms-transform: translateX(0);
        transform: translateX(0);
        -webkit-transition: all 0.5s;
        -moz-transition: all 0.5s;
        -ms-transition: all 0.5s;
        transition: all 0.5s;
    }

    .custom-menu-bottom.custom-menu-tucked .pure-menu-link {
        -webkit-transform: translateX(-140px);
        -moz-transform: translateX(-140px);
        -ms-transform: translateX(-140px);
        transform: translateX(-140px);
        opacity: 0;
        opacity: 1\9;
    }

    .pure-menu-horizontal.custom-menu-tucked {
        z-index: -1;
        top: 45px;
        position: absolute;
        overflow: hidden;
    }
}
    @media (max-width: 60em) {
    .sizee{
    width: 25%;
    margin-top: -1%;
    margin-bottom: -1%;
    }
    }

</style>
<div>
<div class="custom-menu-wrapper navbar-fixed-top">
    <div class="pure-menu custom-menu custom-menu-top">
        <a href="" class="pure-menu-heading custom-menu-brand"><img class="sizee" src='bhagat.jpg'></a>
        <a href="#" class="custom-menu-toggle" id="toggle"><s class="bar"></s><s class="bar"></s></a>
    </div>
    <div class="pure-menu pure-menu-horizontal pure-menu-scrollable custom-menu custom-menu-bottom custom-menu-tucked" id="tuckedMenu" style='border: inherit;'>
        <div class="custom-menu-screen"></div>
        <center>
        <ul class="pure-menu-list">
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Home</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">About</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Contact</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Blog</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">GitHub</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Twitter</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Apple</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Google</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Wang</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">Yahoo</a></li>
            <li class="pure-menu-item"><a href="#" class="pure-menu-link">W3C</a></li>
        </ul>
        </center>
    </div>
</div>
</div>
<script>
(function (window, document) {
document.getElementById('toggle').addEventListener('click', function (e) {
    document.getElementById('tuckedMenu').classList.toggle('custom-menu-tucked');
    document.getElementById('toggle').classList.toggle('x');
});
})(this, this.document);
</script>


<style>
.main {
    padding: 2em;
    color: black;
}
</style>

<div class="main">
</div>


</body>
</html>

【Discussion】:
