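【Posted】: 2017-03-04 09:34:39
【Question】:
I'm trying to scrape some content from a website. Everything works, but the scraped text is only available to me in the console, and I want to print the scraped data in the browser. I think something is wrong with the way I'm handling the callback. Can anyone help?
My code is below: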
app.get('/test', function(req, res) {
    // All the web scraping magic will happen here
    var url = 'https://www.mywebsite.com/path/to/abc';
    var allText;
    var getTheText = function() {
        request(url, function getText(error, response, html) {
            // First we'll check to make sure no errors occurred when making the request
            if (!error) {
                // Next, we'll utilize the cheerio library on the returned html which will essentially give us jQuery functionality
                var $ = cheerio.load(html);
                // Finally, we'll define the variables we're going to capture
                var allText = $('body').children().find('p').text()
                console.log('allText');
                console.log(allText);
                return allText;
            }
            else {
            }
            //return result;
        });
        console.log(allText);
    }
    getTheText();
    console.log('gettheText is ' + getTheText());
    res.send(allText);
})
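A likely cause is that `request` finishes later than the rest of the handler: `res.send(allText)` runs before the callback fires, and the `return allText` inside the callback does not reach the caller of `getTheText`. The inner `var allText` also shadows the outer one, so the outer variable stays `undefined`. The sketch below shows one possible shape of a fix, passing the text to a callback and sending the response from inside it. To keep it runnable on its own, `fakeRequest` and `extractParagraphText` are hypothetical stand-ins for `request` and the cheerio lookup, and `testHandler` is an assumed name for the route handler.

```javascript
// Hypothetical stand-in for request(url, callback): it invokes the callback
// on a later turn of the event loop, as a real HTTP call would.
function fakeRequest(url, callback) {
  setImmediate(function () {
    var html = '<body><div><p>Hello </p><p>world</p></div></body>';
    callback(null, { statusCode: 200 }, html);
  });
}

// Hypothetical stand-in for the cheerio lookup in the question; a real
// handler would keep $('body').children().find('p').text().
function extractParagraphText(html) {
  var paragraphs = html.match(/<p>[\s\S]*?<\/p>/g) || [];
  return paragraphs.map(function (p) {
    return p.replace(/<\/?p>/g, '');
  }).join('');
}

// Deliver the text through a callback instead of a return value.
function getTheText(url, done) {
  fakeRequest(url, function (error, response, html) {
    if (error) {
      return done(error);
    }
    done(null, extractParagraphText(html));
  });
}

// Route handler: respond only once the callback has fired.
function testHandler(req, res) {
  getTheText('https://www.mywebsite.com/path/to/abc', function (error, allText) {
    if (error) {
      res.status(500).send('Scraping failed');
      return;
    }
    res.send(allText);
  });
}
```

With Express this would be registered as `app.get('/test', testHandler)`, with `fakeRequest` swapped back for `request` and `extractParagraphText` for the cheerio call.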
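【Discussion】:
- Just a tip: don't run cheerio while handling the request. Use redis or kue to push the work to a background job. Once the scraping is done, push the result to a websocket, or send an event over ws so the client can fetch the result.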
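The flow described in that comment might look roughly like the sketch below. It is only an in-memory illustration: the `jobs` array stands in for a redis/kue queue, the `results` emitter stands in for a websocket connection, and `scrape`, `enqueueScrape`, and `processNextJob` are hypothetical names, none of them taken from kue or any websocket library.

```javascript
var EventEmitter = require('events');

// Hypothetical in-memory stand-ins: `jobs` plays the role of a redis/kue
// queue and `results` plays the role of a websocket connection.
var jobs = [];
var results = new EventEmitter();
var nextJobId = 1;

// Hypothetical scraper; a real worker would call request and cheerio here.
function scrape(url, done) {
  setImmediate(function () {
    done(null, 'text scraped from ' + url);
  });
}

// Request-handler side: enqueue the job and return its id straight away.
function enqueueScrape(url) {
  var id = nextJobId;
  nextJobId += 1;
  jobs.push({ id: id, url: url });
  return id;
}

// Worker side: take one job, scrape, then publish the result as an event.
function processNextJob() {
  var job = jobs.shift();
  if (!job) {
    return;
  }
  scrape(job.url, function (error, text) {
    if (error) {
      results.emit('failed', { id: job.id, error: error });
      return;
    }
    results.emit('done', { id: job.id, text: text });
  });
}
```

The route handler would then only call `enqueueScrape` and reply with the job id, while the client listens for the matching `done` event.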
Tags: javascript node.js express callback cheerio