【Posted】: 2019-04-06 18:30:29
【Question】:
I am rotating proxies with cURL:
$url = 'https://www.stubhub.com/';
// Proxies to rotate through (left empty here).
$proxiesArray = array();
$curl = curl_init();
for ($i = 0; $i < count($proxiesArray); $i++) {
    // cURL options.
    // TLS host and peer verification are switched off.
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    // Tunnel through the current HTTP proxy.
    curl_setopt($curl, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
    curl_setopt($curl, CURLOPT_HTTPPROXYTUNNEL, TRUE);
    curl_setopt($curl, CURLOPT_PROXY, $proxiesArray[$i]);
    curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($curl, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($curl, CURLOPT_HEADER, FALSE);
    // A value of 0 sets no explicit timeout.
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($curl, CURLOPT_TIMEOUT, 0);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_URL, trim($url));
    curl_setopt($curl, CURLOPT_REFERER, trim($url));
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
    // cURL info.
    $data = curl_exec($curl);
    $info = curl_getinfo($curl);
    $error = curl_error($curl);
    $all = array($data, $info, $error);
    // On success (no cURL error), print the result and stop rotating.
    if (empty($error)) {
        echo '<pre>';
        print_r($all);
        echo '</pre>';
        break;
    }
    // Wait for 2 seconds before trying the next proxy.
    sleep(2);
}
curl_close($curl);
But I am redirected to a reCAPTCHA page that contains this message:
Due to high volume of activity from your computer, our anti-robot software has blocked your access to stubhub.com. Please solve the puzzle below and you will immediately regain access.
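Note that the loop above stops as soon as curl_error() is empty, so the block page itself counts as a success and ends the rotation. A minimal sketch of a stricter check follows, assuming the phrase quoted above appears in the response body. The names BLOCK_MARKER, looksBlocked(), firstUsableResponse() and the $fetch callable are hypothetical and not part of the original code; $fetch stands in for whatever wraps curl_exec() and curl_error(). The 2-second pause from the original loop is left out for brevity.

```php
<?php
// Phrase taken from the block message quoted above (assumed to be in the body).
const BLOCK_MARKER = 'anti-robot software has blocked your access';

/** Returns true when the response body looks like the block page. */
function looksBlocked(string $body): bool
{
    return stripos($body, BLOCK_MARKER) !== false;
}

/**
 * Tries each proxy in turn and returns the first usable result, or null.
 *
 * $fetch is a hypothetical callable taking a proxy string and returning
 * an array with keys 'data' (string|false) and 'error' (string).
 */
function firstUsableResponse(array $proxies, callable $fetch): ?array
{
    foreach ($proxies as $proxy) {
        $result = $fetch($proxy);
        // Usable means: no transport error, a string body, and no block message.
        $ok = $result['error'] === ''
            && is_string($result['data'])
            && !looksBlocked($result['data']);
        if ($ok) {
            return ['proxy' => $proxy] + $result;
        }
    }
    return null;
}
```

With a check like this, a proxy that returns the block page is skipped just like one that fails to connect.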
To slow the requests down, I tried:
curl_setopt($curl, CURLOPT_MAX_RECV_SPEED_LARGE, 10);
And also:
curl_setopt($curl, CURLOPT_PROGRESSFUNCTION, function () {
    sleep(2);
    return 0;
});
But I get the same message. How can I slow the process down so that it looks like a real request coming from a browser?
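For context, both options tried above act inside a single transfer: CURLOPT_MAX_RECV_SPEED_LARGE caps the download rate in bytes per second, and a progress callback runs during a transfer (in PHP only when CURLOPT_NOPROGRESS is set to false). Neither changes how often requests are sent. A minimal sketch of pacing between requests with a randomized pause could look like the following. The helper name randomPauseMicroseconds() and the 2 to 6 second bounds are illustrative assumptions, and whether such pacing is enough to avoid the block is not established here.

```php
<?php
/**
 * Picks a random pause, in microseconds, between $minSeconds and $maxSeconds
 * (both inclusive). Hypothetical helper; the bounds are up to the caller.
 */
function randomPauseMicroseconds(float $minSeconds, float $maxSeconds): int
{
    $min = (int) round($minSeconds * 1000000);
    $max = (int) round($maxSeconds * 1000000);
    // random_int() throws if $min is greater than $max.
    return random_int($min, $max);
}

// Usage: call between two consecutive requests, for example
// usleep(randomPauseMicroseconds(2.0, 6.0));
```

The pause goes between calls to curl_exec(), replacing the fixed sleep(2) that the original loop only reaches after a failed attempt.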
【Discussion】:
Tags: php curl web-scraping https proxy