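[Posted]: 2020-05-10 16:16:17
[Question]:
I am trying to scrape this website with Axios in Node.js to get university names. I noticed that the site uses a paginated API, so to collect every university name I have to send multiple requests.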
const url = 'https://www.usnews.com/best-colleges/search?_sort=rank&_sortDirection=asc&study=Engineering&_mode=table&_page=1';
const url = 'https://www.usnews.com/best-colleges/search?_sort=rank&_sortDirection=asc&study=Engineering&_mode=table&_page=2';
const url = 'https://www.usnews.com/best-colleges/search?_sort=rank&_sortDirection=asc&study=Engineering&_mode=table&_page=3';
...
const url = 'https://www.usnews.com/best-colleges/search?_sort=rank&_sortDirection=asc&study=Engineering&_mode=table&_page=55';
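Since only the `_page` value differs between these URLs, they can be generated rather than written out by hand. The following is a minimal sketch, not a definitive approach: the helper names `buildPageUrl` and `buildAllPageUrls` are hypothetical, the query parameters are copied from the URLs above, and the page count of 55 is the figure reported in this question.

```javascript
// Base search URL copied from the question; only the _page parameter varies.
const BASE_URL = 'https://www.usnews.com/best-colleges/search';

// Build the URL for one page number (hypothetical helper name).
function buildPageUrl(page) {
  const params = new URLSearchParams({
    _sort: 'rank',
    _sortDirection: 'asc',
    study: 'Engineering',
    _mode: 'table',
    _page: String(page),
  });
  return `${BASE_URL}?${params.toString()}`;
}

// Build the URLs for pages 1..lastPage, inclusive (hypothetical helper name).
function buildAllPageUrls(lastPage) {
  return Array.from({ length: lastPage }, (_, i) => buildPageUrl(i + 1));
}

// 55 is the page count stated in the question, not something the code discovers.
const urls = buildAllPageUrls(55);
console.log(urls.length);
console.log(urls[0]);
```

Using `URLSearchParams` keeps the query string correctly encoded if a parameter value ever contains spaces or other reserved characters.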
I wrote code that scrapes only one page, but I don't know how to scrape more than one. Here is my code:
const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');

// Table view; only the _page parameter changes between requests.
const page = 1;
const url = 'https://www.usnews.com/best-colleges/search?_sort=rank&_sortDirection=asc&study=Engineering&_mode=table&_page=' + page;

fetchData(url).then((res) => {
  if (!res) return; // the request failed; fetchData has already logged why
  const html = res.data;
  const $ = cheerio.load(html);
  const unilist = $('.TableTabular__TableContainer-febmbj-0.guaRKP > tbody > tr > td');
  unilist.each(function () {
    const title = $(this).find('div').attr('name');
    if (typeof title === 'string') {
      console.log(title);
      // appendFileSync is synchronous and takes no callback; it throws on failure.
      fs.appendFileSync('universityRanking.txt', title + '\n');
    }
  });
});

// Fetch one page; resolves to the Axios response, or undefined on failure.
async function fetchData(url) {
  console.log('Crawling data...');
  try {
    // make http call to url
    const response = await axios(url);
    if (response.status !== 200) {
      console.log('Error occurred while fetching data');
      return undefined;
    }
    return response;
  } catch (err) {
    // Network errors and non-2xx statuses (Axios rejects on those by default) end up here.
    console.log('Error occurred while fetching data:', err.message);
    return undefined;
  }
}
How can I make the 55 Axios requests? I checked, and the site has 55 pages. I need to append all the university names from every page to a text file.
[Comments]:
Tags: node.js axios http-headers httprequest
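One possible way to cover all pages is to loop over the page numbers and await each request before sending the next. The sketch below is an illustration under stated assumptions, not a definitive implementation: the names `fetchHtml`, `extractNames`, `crawlAllPages`, and `saveNames` are hypothetical, the CSS selector is the one from the code in this question and may no longer match the live site, and `axios` and `cheerio` are loaded lazily inside the helpers so the loop itself can be tried with stub functions.

```javascript
const fs = require('fs');

// Fetch the HTML of one page. axios is required lazily so that
// crawlAllPages can also be exercised with a stub fetcher.
async function fetchHtml(url) {
  const axios = require('axios');
  const response = await axios.get(url);
  return response.data;
}

// Extract university names from one page of HTML, using the selector
// from the question (assumed to still match the page markup).
function extractNames(html) {
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  const names = [];
  $('.TableTabular__TableContainer-febmbj-0.guaRKP > tbody > tr > td').each(function () {
    const title = $(this).find('div').attr('name');
    if (typeof title === 'string') names.push(title);
  });
  return names;
}

// Visit pages firstPage..lastPage one at a time and collect every name.
// A page that fails is logged and skipped instead of aborting the whole run.
async function crawlAllPages({
  baseUrl,
  firstPage = 1,
  lastPage,
  fetchPage = fetchHtml,
  extract = extractNames,
}) {
  const allNames = [];
  for (let page = firstPage; page <= lastPage; page++) {
    try {
      const html = await fetchPage(baseUrl + page);
      allNames.push(...extract(html));
    } catch (err) {
      console.error(`Page ${page} failed: ${err.message}`);
    }
  }
  return allNames;
}

// Write all collected names to a text file, one per line.
function saveNames(names, filePath) {
  fs.writeFileSync(filePath, names.map((name) => name + '\n').join(''));
}

// Example call (commented out so that loading this file sends no requests):
// crawlAllPages({
//   baseUrl: 'https://www.usnews.com/best-colleges/search?_sort=rank&_sortDirection=asc&study=Engineering&_mode=table&_page=',
//   lastPage: 55,
// }).then((names) => saveNames(names, 'universityRanking.txt'));
```

Requesting the pages sequentially is slower than firing all 55 requests at once with `Promise.all`, but it keeps the names in page order and avoids sending a burst of simultaneous requests to the server.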