【问题标题】:How to get meta tags in php?如何在 php 中获取元标记?
【发布时间】:2018-02-14 11:36:31
【问题描述】:

我正在尝试导出以下 url 元标记,但它无法正常工作,给出以下结果 警告:get_meta_tags(https://www.washingtonpost.com/politics/white-house-reels-as-fbi-director-contradicts-official-claims-about-alleged-abuser/2018/02/13/f010f256-10d9-11e8-9570-29c9830535e5_story.html?tid=pm_pop):无法打开流:已达到重定向限制,正在中止。 对此有何想法?

【问题讨论】:

  • 呃,可能是网址坏了?
  • 不是。你试过了吗?
  • 没有 url 工作正常,但它的元标记没有被导出。
  • 3 分钟前浏览器提示无法建立安全连接,访问 http:// 版本时出现 403 错误,但现在可以正常使用

标签: php curl


【解决方案1】:

首先,您需要调用第一页来设置 cookie,否则它将不起作用

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,FALSE);
curl_setopt($ch,CURLOPT_URL,"https://www.washingtonpost.com");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
$cookieName = "";
if(isset($_COOKIE['PHPSESSID'])){
    $cookieName = $_COOKIE['PHPSESSID'];
}
curl_setopt( $ch, CURLOPT_COOKIEJAR, $_SERVER['DOCUMENT_ROOT'].'/logs/'.$cookieName.'.txt'); 
curl_setopt( $ch, CURLOPT_COOKIEFILE, $_SERVER['DOCUMENT_ROOT'].'/logs/'.$cookieName.'.txt');
curl_exec($ch);
curl_close($ch);

然后第二次调用以获取实际页面

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,FALSE);
curl_setopt($ch,CURLOPT_URL,"https://www.washingtonpost.com/politics/white-house-reels-as-fbi-director-contradicts-official-claims-about-alleged-abuser/2018/02/13/f010f256-10d9-11e8-9570-29c9830535e5_story.html?tid=pm_pop");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
$cookieName = "";
if(isset($_COOKIE['PHPSESSID'])){
    $cookieName = $_COOKIE['PHPSESSID'];
}
curl_setopt( $ch, CURLOPT_COOKIEJAR, LOG_DIR.'/'.$cookieName.'.txt');
curl_setopt( $ch, CURLOPT_COOKIEFILE, LOG_DIR.'/'.$cookieName.'.txt');
$page = curl_exec($ch);
curl_close($ch);

最后我们用 DOMDocument 解析 dom 树

libxml_use_internal_errors(true);
$siteData = new DOMDocument();
$siteData->loadHTML($page);

$metaElements = $siteData->getElementsByTagName("meta");
if($metaElements->item(0)==null){
    echo "ERROR";
}

$meta = array();
for($i=0;$i<$metaElements->length;$i++){
    $meta[$i] = array();
    for($j=0;$j<$metaElements->item($i)->attributes->length;$j++){
        $meta[$i][$j] = array($metaElements->item($i)->attributes->item($j)->name,$metaElements->item($i)->attributes->item($j)->value);
    }
}
print_r($meta);

元数据存储在 $meta 数组中

你可以通过组织 curl 来美化这段代码。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2013-11-10
    • 2012-08-14
    • 1970-01-01
    • 2015-04-13
    • 2020-12-27
    • 2020-10-17
    • 1970-01-01
    相关资源
    最近更新 更多