【发布时间】:2017-11-17 05:32:36
【问题描述】:
我正在尝试从页面上刮下一些食谱以用作学校项目的示例,但该页面一直在加载一个空白页面。
我正在关注本教程 - here
这是我的代码:
<?php
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
function scrape_between($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
$continue = true;
$url = curl("https://www.justapinch.com/recipes/main-course/");
while ($continue == true) {
$results_page = curl($url);
$results_page = scrape_between($results_page,"<div id=\"grid-normal\">","<div id=\"rightside-content\"");
$separate_results = explode("<h3 class=\"tight-margin\"",$results_page);
foreach ($separate_results as $separate_result) {
if ($separate_result != "") {
$results_urls[] = "https://www.justapinch.com" . scrape_between($separate_result,"href=\"","\" class=\"");
}
}
// Commented out to test code above
// if (strpos($results_page,"Next Page")) {
// $continue = true;
// $url = scrape_between($results_page,"<nav><div class=\"col-xs-7\">","</div><nav>");
// if (strpos($url,"Back</a>")) {
// $url = scrape_between($url,"Back</a>",">Next Page");
// }
// $url = "https://www.justapinch.com" . scrape_between($url, "href=\"", "\"");
// } else {
// $continue = false;
// }
// sleep(rand(3,5));
print_r($results_urls);
}
?>
我正在使用cloud9,并且我已经安装了php5 cURL,并且正在运行apache2。我将不胜感激。
【问题讨论】:
标签: php html curl web-scraping