【Question title】: Node.js web scraping and callback issues
【Posted】: 2017-03-04 09:34:39
【Question】:

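I'm trying to scrape some content from a website. The scraping itself works, but the scraped text is only available to me in the console, and I want to show it in the browser. I think something is wrong with the way I handle callbacks. Can anyone help?

My code is as follows: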

app.get('/test', function(req, res) {

  //All the web scraping magic will happen here
  var url = 'https://www.mywebsite.com/path/to/abc';
  var allText;
  var getTheText = function() {
    request(url, function getText(error, response, html) {

      // First we'll check to make sure no errors occurred when making the request

      if (!error) {
        // Next, we'll utilize the cheerio library on the returned html which will essentially give us jQuery functionality

        var $ = cheerio.load(html);

        // Finally, we'll define the variables we're going to capture

        var allText = $('body').children().find('p').text();

        console.log('allText');
        console.log(allText);
        return allText;
      } else {
      }

      //return result;
    });
    console.log(allText);

  }

  getTheText();
  console.log('gettheText is ' + getTheText());
  res.send(allText);
})
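Why does the browser receive nothing? `request` delivers the HTML to its callback later, after `getTheText()` and `res.send(allText)` have already run. The minimal sketch below reproduces that ordering without any network access. It assumes a hypothetical `fakeRequest` that stands in for `request(url, callback)` and hands back a canned HTML string via `setImmediate`, and a hypothetical `handleTest` that plays the role of the route handler; neither is part of the original code.

```javascript
// Hypothetical stand-in for request(url, callback): it calls back on a later
// turn of the event loop, just as a real HTTP response arrives later.
function fakeRequest(url, callback) {
  setImmediate(function () {
    callback(null, { statusCode: 200 }, '<p>Hello</p>');
  });
}

var log = []; // records the order in which things happen

// Hypothetical: plays the role of the app.get('/test') handler.
function handleTest() {
  var allText; // route-level variable, as in the question
  function getTheText() {
    fakeRequest('https://example.com/page', function (error, response, html) {
      var allText = html; // declares a NEW variable that shadows the route-level one
      log.push('callback sees ' + allText);
      return allText; // returned to fakeRequest, which ignores it
    });
    // no return statement here, so getTheText() evaluates to undefined
  }
  log.push('getTheText() gives ' + getTheText());
  log.push('res.send would get ' + allText);
}

handleTest();
```

Both `undefined` entries are recorded before the callback entry: the inner `var allText` never assigns the route-level variable, and the callback's `return` value goes to `fakeRequest`, not to whoever called `getTheText`.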

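【Discussion】:

  • Just a tip: don't run cheerio while handling the request. Push the work to a background job using Redis or kue. When the scraping is finished, push the result over a WebSocket, or send an event over the WebSocket so the client can fetch the result.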
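The design suggested in that comment can be sketched in-process to show its shape: the route enqueues a job and replies immediately with a job id, a worker does the scraping off the request path, and the result is pushed back as an event. In this sketch Node's built-in `EventEmitter` stands in for both the Redis/kue queue and the WebSocket channel; `queue`, `notifications`, `scrape`, and `enqueueScrape` are hypothetical names, and `scrape` only pretends to fetch a page. A real deployment would use an actual job queue and WebSocket library.

```javascript
const { EventEmitter } = require('events');

// In-process stand-in for a job queue such as kue/Redis (hypothetical).
const queue = new EventEmitter();
// In-process stand-in for a WebSocket channel back to the browser (hypothetical).
const notifications = new EventEmitter();

let nextJobId = 1;

// Hypothetical scraper: pretends to fetch and parse a page asynchronously.
function scrape(url, done) {
  setImmediate(() => done(null, 'text scraped from ' + url));
}

// Worker: consumes jobs outside the HTTP request and announces each result.
queue.on('job', (job) => {
  scrape(job.url, (error, text) => {
    notifications.emit('result', { id: job.id, error: error || null, text: text });
  });
});

// What a route handler would call: enqueue the job and return its id at once.
function enqueueScrape(url) {
  const id = nextJobId++;
  // Emit asynchronously so the caller gets the id before the worker starts.
  setImmediate(() => queue.emit('job', { id: id, url: url }));
  return id;
}
```

A handler would respond with the returned id right away, and the client would later match the `result` event carrying the same id.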

Tags: javascript node.js express callback cheerio


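【Solution 1】:

Send the response from inside the callback, where the data is available. Code that runs after the `request(...)` call executes before that callback fires, so `allText` is still `undefined` there. See the code below: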

var express = require('express');
var request = require('request');
var cheerio = require('cheerio');

var app = express();

app.get('/test', function(req, res) {

  // All the web scraping magic will happen here
  var url = 'https://www.mywebsite.com/path/to/abc';

  var getTheText = function() {
    request(url, function getText(error, response, html) {

      // First, make sure no error occurred while making the request
      if (error) {
        // Respond on the error path too; otherwise the browser request hangs
        return res.status(500).send('Scraping failed');
      }

      // Load the returned HTML into cheerio, which gives us jQuery-like selectors
      var $ = cheerio.load(html);

      // Text of every <p> nested inside the child elements of <body>
      var allText = $('body')
        .children()
        .find('p')
        .text();

      console.log(allText);
      res.send(allText); // Send the response from here, where the data exists
    });
    // Code here runs BEFORE the callback above, so allText is not available yet
  };

  getTheText(); // Call it once: each call sends another request and another response
  // res.send(allText); // Removed: at this point allText is still undefined
});

app.listen(3000); // Port chosen arbitrarily for this example
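The same rule ("respond only once the data exists") can also be written so that `getTheText` stays reusable: it hands its result to a callback, or to a Promise, instead of returning it. The sketch below is self-contained and therefore rests on assumptions: `fakeRequest` is a hypothetical stand-in for `request(url, callback)` with the same `(error, response, body)` signature, `extractParagraphText` is a toy regex replacement for cheerio (not a real HTML parser), and `scrapeAndSend` expects a `res`-like object with `status()` and `send()`, as Express provides. The `request` package has since been deprecated, which is one more reason to prefer a Promise-based style in newer code.

```javascript
// Hypothetical stand-in for request(url, callback): same (error, response, body)
// callback signature, but it answers asynchronously without touching the network.
function fakeRequest(url, callback) {
  setImmediate(function () {
    if (url.includes('bad')) {
      callback(new Error('request failed'));
    } else {
      callback(null, { statusCode: 200 }, '<body><div><p>Hello</p><p>world</p></div></body>');
    }
  });
}

// Toy replacement for cheerio's .text() on <p> elements: concatenates the
// contents of every <p>. A regex is not an HTML parser; keep cheerio for real pages.
function extractParagraphText(html) {
  let text = '';
  for (const match of html.matchAll(/<p>([\s\S]*?)<\/p>/g)) {
    text += match[1];
  }
  return text;
}

// 1) Callback style: the result goes to `done` instead of being returned.
function getTheText(url, done) {
  fakeRequest(url, function (error, response, html) {
    if (error) return done(error);
    done(null, extractParagraphText(html));
  });
}

// 2) Promise style: wrap the callback API once, then use async/await.
function getTheTextAsync(url) {
  return new Promise(function (resolve, reject) {
    getTheText(url, function (error, text) {
      if (error) reject(error);
      else resolve(text);
    });
  });
}

// Body of a route handler, e.g. app.get('/test', (req, res) => scrapeAndSend(url, res)).
async function scrapeAndSend(url, res) {
  try {
    const allText = await getTheTextAsync(url);
    res.send(allText); // runs only after the data exists
  } catch (error) {
    res.status(500).send('Scraping failed: ' + error.message);
  }
}
```

With either style, nothing is sent until the data (or the error) has arrived, and exactly one response is sent per request.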
