【Question title】: How to open facebook, twitter page using PHP Curl
【Posted】: 2018-07-28 18:39:37
【Question】:

When I try to open url1 (https://www.google.co.in), url2 (https://www.amazon.com), or url5 (https://www.instagram.com), everything works and the pages load. But when I try to open url3 (https://www.facebook.com) or url4 (https://www.twitter.com), the script prints my own error message, "Error, Unable to open.", because it cannot open the Facebook and Twitter pages. I don't want to use the API. Thanks in advance.

 <?php

    $curl = curl_init();

    //url1 = https://www.google.co.in
    //url2 = https://www.amazon.com
    //url3 = https://www.facebook.com
    //url4 = https://www.twitter.com
    //url5 = https://www.instagram.com

    $url ="https://www.facebook.com";

    curl_setopt($curl, CURLOPT_URL, $url);

    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

    //curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);

    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($curl);
    if($output)
    {
        echo $output;       
    }
    else
    {
        echo "Error, Unable to open.";
    }
?> 

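【Comments】:

  • echo curl_error($curl) if you are curious about the error.
  • What exactly are you trying to achieve, and why not use the API? You are probably trying to do something you are not supposed to do, which is why it does not work.
  • Facebook does not allow you to scrape them, so they do things to make it harder.
  • @luschn Actually, it does not work because he is not following HTTP redirects and he is not sending any User-Agent string.

Tags: php facebook curl instagram php-curl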


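【Solution 1】:

When debugging problems like this, enable CURLOPT_VERBOSE. Also, when debugging, don't use echo; use var_dump. echo prints nothing for both false and an empty string, while var_dump tells the two apart. If you had done that, you would have seen something like this: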

* Rebuilt URL to: https://www.facebook.com/
*   Trying 157.240.20.35...
* TCP_NODELAY set
*   Trying 2a03:2880:f10a:83:face:b00c:0:25de...
* TCP_NODELAY set
* Immediate connect fail for 2a03:2880:f10a:83:face:b00c:0:25de: Network is unreachable
* Connected to www.facebook.com (157.240.20.35) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=California; L=Menlo Park; O=Facebook, Inc.; CN=*.facebook.com
*  start date: Dec 15 00:00:00 2017 GMT
*  expire date: Mar 22 12:00:00 2019 GMT
*  subjectAltName: host "www.facebook.com" matched cert's "*.facebook.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.facebook.com
Accept: */*

< HTTP/1.1 302 Found
< Strict-Transport-Security: max-age=15552000; preload
< Location: https://www.facebook.com/unsupportedbrowser
< Content-Type: text/html; charset=UTF-8
< X-FB-Debug: x3NeeaaJHxPQkX5Z9H7yMX3evzYJocXmZpzMV6GoWtacO8bXLL3O58vidPHZUvXTuP9iE9pHPEnbr/RvNsT23Q==
< Date: Mon, 19 Feb 2018 09:12:51 GMT
< Connection: keep-alive
< Content-Length: 0
< 
* Connection #0 to host www.facebook.com left intact
string(0) ""
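
The debugging setup described at the top of this answer (CURLOPT_VERBOSE plus var_dump) could be sketched roughly as follows. The helper name fetch_verbose and the use of a php://temp stream with CURLOPT_STDERR are assumptions made for illustration, not part of the original answer. They just make the verbose log available as a string instead of going to the process's stderr.

```php
<?php
// Hypothetical helper, not part of the original answer: runs one request with
// CURLOPT_VERBOSE enabled and returns the body, the error text and the log.
function fetch_verbose(string $url): array
{
    // Verbose output goes to STDERR by default; redirect it to a temp stream
    // so it can be read back as a string.
    $log = fopen('php://temp', 'w+');

    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_VERBOSE        => true,
        CURLOPT_STDERR         => $log,
    ]);

    $body  = curl_exec($curl);   // string on success, false on failure
    $error = curl_error($curl);  // empty string when no error occurred
    unset($curl);                // release the handle before reading the log

    rewind($log);
    $verbose = stream_get_contents($log);
    fclose($log);

    return ['body' => $body, 'error' => $error, 'verbose' => $verbose];
}
```

Calling `var_dump(fetch_verbose('https://www.facebook.com'))` makes the difference visible: a failed transfer shows up as `bool(false)` with a non-empty error, whereas an empty redirect response like the one in the log above shows up as `string(0) ""` with no error.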

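The problem is that Facebook tries to issue an HTTP redirect (to https://www.facebook.com/unsupportedbrowser) and you are not following it. Enable CURLOPT_FOLLOWLOCATION to make curl follow redirects automatically. Why does Facebook redirect you? Because you are not sending any User-Agent header. Use CURLOPT_USERAGENT to set one that Facebook recognizes as a supported browser, for example Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 (that is, Firefox 52 ESR running on Windows 7 x64).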
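
A minimal sketch of the question's script with those two options applied might look like this. The function names browser_like_options and fetch_page, the CURLOPT_MAXREDIRS cap, and the choice to throw an exception are assumptions for illustration. Whether Facebook still accepts this particular User-Agent string is not something the sketch can guarantee.

```php
<?php
// Hypothetical helpers, not part of the original answer.

// Builds the option set discussed above for a given URL.
function browser_like_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Follow "Location:" headers instead of returning the empty 30x body.
        CURLOPT_FOLLOWLOCATION => true,
        // Assumption: cap the redirect chain to avoid loops.
        CURLOPT_MAXREDIRS      => 10,
        // User-Agent string taken from the answer (Firefox 52 ESR, Windows 7 x64).
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0',
    ];
}

// Fetches a page and distinguishes a real failure (false) from an empty body.
function fetch_page(string $url): string
{
    $curl = curl_init();
    curl_setopt_array($curl, browser_like_options($url));

    $output = curl_exec($curl);
    if ($output === false) {
        throw new RuntimeException('curl error: ' . curl_error($curl));
    }
    return $output;
}
```

Note the strict `=== false` check: the original `if($output)` also treats an empty string as a failure, which is why a 302 with `Content-Length: 0` ended up in the "Error, Unable to open." branch.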

As for twitter.com:

* Rebuilt URL to: https://www.twitter.com/
*   Trying 104.244.42.193...
* TCP_NODELAY set
* Connected to www.twitter.com (104.244.42.193) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: businessCategory=Private Organization; jurisdictionC=US; jurisdictionST=Delaware; serialNumber=4337446; C=US; ST=California; L=San Francisco; O=Twitter, Inc.; OU=tsa_o Point of Presence; CN=twitter.com
*  start date: Jul 25 00:00:00 2017 GMT
*  expire date: Jul 30 12:00:00 2018 GMT
*  subjectAltName: host "www.twitter.com" matched cert's "www.twitter.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
*  SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.twitter.com
Accept: */*

< HTTP/1.1 301 Moved Permanently
< content-length: 0
< date: Mon, 19 Feb 2018 09:17:51 GMT
< location: https://twitter.com/
< server: tsa_o
< set-cookie: personalization_id="v1_ersTgWQIOjuJkjk6VFUlXw=="; Expires=Wed, 19 Feb 2020 09:17:51 UTC; Path=/; Domain=.twitter.com
< set-cookie: guest_id=v1%3A151903187127250514; Expires=Wed, 19 Feb 2020 09:17:51 UTC; Path=/; Domain=.twitter.com
< strict-transport-security: max-age=631138519
< x-connection-hash: aae827a6347e88db5f417a0c31bba366
< x-response-time: 101
< 
* Connection #0 to host www.twitter.com left intact
string(0) ""
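It tries to redirect you to the non-www version of the site's URL, but again, you are not following the redirect. Enable CURLOPT_FOLLOWLOCATION to make curl follow HTTP redirects automatically.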
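
To confirm that a redirect was actually followed, one option is to ask curl where the request ended up. The sketch below is an illustration only. The helper name fetch_following_redirects and the keys of the returned array are assumptions, while curl_getinfo and the CURLINFO_* constants are standard parts of PHP's curl extension.

```php
<?php
// Hypothetical helper, not part of the original answer: follows redirects and
// reports the final URL, the number of redirects and the last status code.
function fetch_following_redirects(string $url): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ]);

    $body = curl_exec($curl);  // string on success, false on failure

    return [
        'body'      => $body,
        'final_url' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($curl, CURLINFO_REDIRECT_COUNT),
        'status'    => curl_getinfo($curl, CURLINFO_RESPONSE_CODE),
    ];
}
```

Given the 301 in the log above, a call with https://www.twitter.com would be expected to report at least one redirect and a final URL without the www prefix. The exact result depends on how the site responds at the time of the request.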
