【Question title】: How to return data from a loop in order in Node
【Posted】: 2020-01-13 20:35:22
【Question】:

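I'm building a web scraper that scrapes all the movies coming out over the next year from this site (https://www.imdb.com/movies-coming-soon/). It loops over an array of links, one per month, each listing that month's movies. It works, but because of Node.js's asynchronous behavior the results don't come back in order. How can I loop over the array and get the data back in order?

I tried writing a callback function, but I don't know where it would go.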

const request = require('request')
const cheerio = require('cheerio')

const movieArray = [ '/movies-coming-soon/2019-09/',
'/movies-coming-soon/2019-10/',
'/movies-coming-soon/2019-11/',
'/movies-coming-soon/2019-12/',
'/movies-coming-soon/2020-01/',
'/movies-coming-soon/2020-02/',
'/movies-coming-soon/2020-03/',
'/movies-coming-soon/2020-04/',
'/movies-coming-soon/2020-05/',
'/movies-coming-soon/2020-06/',
'/movies-coming-soon/2020-07/',
'/movies-coming-soon/2020-08/' ]
for (let i = 0; i < movieArray.length; i++) {
    request.get('https://www.imdb.com' + movieArray[i] , (err, res, body) => {
        if (!err && res.statusCode == 200) {
            console.log(res.request.href)
            const $ = cheerio.load(body)
            //console.log(next)
            $('h4').each((i, v) => {
                const date = $(v).text()
                console.log(date)
            })               
        }
    })
}

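I want the data returned in array order, not in whatever order the responses happen to arrive because of Node's asynchronous behavior.

【Comments】:

  • What results are you trying to collect in order? Each date?
  • @jfriend00 The dates and the movies released on those dates. It returns them, but in the wrong order when looping over the array of links.
  • Where do the movies show up in the results? I only see you getting dates.
  • @jfriend00 The date const contains both the date and the movie.

Tags: node.js asynchronous cheerio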


【Answer 1】:

As explained at https://lavrton.com/javascript-loops-how-to-handle-async-await-6252dd3c795/, this is the classic async problem in a for loop. Here is the solution:

// const request = require('request')
const request = require('request-promise');
const cheerio = require('cheerio');

const movieArray = [
  '/movies-coming-soon/2019-09/',
  '/movies-coming-soon/2019-10/',
  '/movies-coming-soon/2019-11/',
  '/movies-coming-soon/2019-12/',
  '/movies-coming-soon/2020-01/',
  '/movies-coming-soon/2020-02/',
  '/movies-coming-soon/2020-03/',
  '/movies-coming-soon/2020-04/',
  '/movies-coming-soon/2020-05/',
  '/movies-coming-soon/2020-06/',
  '/movies-coming-soon/2020-07/',
  '/movies-coming-soon/2020-08/',
];

async function processMovieArray(array) {
  for (const item of array) {
    await getMovie(item);
  }
  console.log('Done');
}

async function getMovie(item) {
  const options = {
    method: 'GET',
    uri: 'https://www.imdb.com' + item,
  };
  // request-promise resolves with the response body by default
  // (not the full response object), so load it directly.
  const body = await request(options);
  const $ = cheerio.load(body);
  $('h4').each((i, v) => {
    const date = $(v).text();
    console.log(date);
  });
}

processMovieArray(movieArray);
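
To see why awaiting inside a for...of or for loop keeps the output in array order, here is a minimal sketch that needs no network access. `fakeFetch` and `fetchInOrder` are hypothetical names made up for this illustration, and the delays simply simulate responses that would otherwise finish out of order.

```javascript
// Hypothetical stand-in for a network request: resolves after delayMs.
function fakeFetch(path, delayMs) {
  return new Promise(resolve => {
    setTimeout(() => resolve('page for ' + path), delayMs);
  });
}

async function fetchInOrder(paths, delays) {
  const pages = [];
  for (let i = 0; i < paths.length; i++) {
    // await pauses this function until the current request finishes,
    // so the next request does not start before then.
    pages.push(await fakeFetch(paths[i], delays[i]));
  }
  return pages;
}

// The first path is the slowest, yet it still comes out first.
fetchInOrder(['/a/', '/b/', '/c/'], [30, 10, 20]).then(pages => {
  console.log(pages);
});
```

The trade-off is that the requests run one after another, so the total time is the sum of all the delays rather than the longest single delay.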

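【Comments】:

  • I get an error saying ``` const $ = await cheerio.load(body) ^^^^^ SyntaxError: await is only valid in async function```
  • My bad, cheerio.load() is synchronous... and request isn't async either. The code has been updated, or you can follow @jfriend00's solution.
  • I thought the regular request library was asynchronous. How would you do it with that library instead of request-promise? I'm not very familiar with promises.
  • Here is a related discussion with a similar use case involving promises; you may find the code there useful: stackoverflow.com/questions/47341603/…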
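
On the question of staying with a callback-based library: one common approach is to wrap the callback call in a Promise by hand and then await it in a loop. The sketch below assumes a hypothetical `fakeGet` function that only imitates the `(err, res, body)` callback shape used in the question; it is not the real request library, and `getBody` is a name invented for this example.

```javascript
// Hypothetical callback-style function imitating the (err, res, body) shape.
function fakeGet(url, callback) {
  setTimeout(() => callback(null, { statusCode: 200 }, 'body of ' + url), 10);
}

// Wrap the callback API in a Promise so it can be awaited.
function getBody(url) {
  return new Promise((resolve, reject) => {
    fakeGet(url, (err, res, body) => {
      if (err) return reject(err);
      if (res.statusCode !== 200) {
        return reject(new Error('Unexpected status ' + res.statusCode));
      }
      resolve(body);
    });
  });
}

async function main() {
  for (const url of ['/a/', '/b/']) {
    // Each iteration waits for the previous one to finish.
    console.log(await getBody(url));
  }
}

main();
```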
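【Answer 2】:

The low-tech approach that deviates least from your current code is to use the index of the for loop to populate an array. Because let in a for loop creates a separate variable i for each iteration, we can use that index inside the asynchronous callback to refer to the desired slot in the results array. You can then also use cntr to know when all the results are done: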

const request = require('request');
const cheerio = require('cheerio');

if (!Array.prototype.flat) {
    Array.prototype.flat = function() {
        return this.reduce((acc, val) => acc.concat(val), []);
    }
}


const movieArray = [ '/movies-coming-soon/2019-09/',
'/movies-coming-soon/2019-10/',
'/movies-coming-soon/2019-11/',
'/movies-coming-soon/2019-12/',
'/movies-coming-soon/2020-01/',
'/movies-coming-soon/2020-02/',
'/movies-coming-soon/2020-03/',
'/movies-coming-soon/2020-04/',
'/movies-coming-soon/2020-05/',
'/movies-coming-soon/2020-06/',
'/movies-coming-soon/2020-07/',
'/movies-coming-soon/2020-08/' ];

let results = [];
let cntr = 0;
for (let i = 0; i < movieArray.length; i++) {
    request.get('https://www.imdb.com' + movieArray[i] , (err, res, body) => {
        ++cntr;
        if (!err && res.statusCode == 200) {
            console.log(res.request.href)
            const $ = cheerio.load(body)
            //console.log(next)
            let textArray = [];
            $('h4').each((j, v) => {
                const text = $(v).text();
                console.log(text);
                textArray.push(text);
            });
            results[i] = textArray;
        }
        if (cntr === movieArray.length) {
            // all results are done now
            let allResults = results.flat();
        }
    })
}
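
The index-and-counter pattern can be tried in isolation with a simulated request. In this sketch, `fakeGet` and `collectInOrder` are hypothetical names invented for the illustration, and the delays are chosen so that the callbacks fire out of order.

```javascript
// Hypothetical callback-style stand-in for an HTTP GET.
function fakeGet(path, delayMs, callback) {
  setTimeout(() => callback(null, 'body of ' + path), delayMs);
}

function collectInOrder(paths, delays, done) {
  const results = new Array(paths.length);
  let finished = 0;
  for (let i = 0; i < paths.length; i++) {
    fakeGet(paths[i], delays[i], (err, body) => {
      finished++;
      // Each callback closes over its own i, so the result lands
      // in the slot that matches the input position.
      if (!err) results[i] = body;
      // The counter tells us when the last callback has fired.
      if (finished === paths.length) done(results);
    });
  }
}

// Callbacks fire in the order /b/, /c/, /a/, but results stay in input order.
collectInOrder(['/a/', '/b/', '/c/'], [30, 10, 20], results => {
  console.log(results);
});
```

Unlike awaiting each request in turn, all requests here are started right away, so they run concurrently and only the placement of the results is ordered.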

A slightly more elegant approach is to switch to promises and let the promise infrastructure keep everything in order for you:

const rp = require('request-promise');
const cheerio = require('cheerio');

if (!Array.prototype.flat) {
    Array.prototype.flat = function() {
        return this.reduce((acc, val) => acc.concat(val), []);
    }
}

const movieArray = [ '/movies-coming-soon/2019-09/',
'/movies-coming-soon/2019-10/',
'/movies-coming-soon/2019-11/',
'/movies-coming-soon/2019-12/',
'/movies-coming-soon/2020-01/',
'/movies-coming-soon/2020-02/',
'/movies-coming-soon/2020-03/',
'/movies-coming-soon/2020-04/',
'/movies-coming-soon/2020-05/',
'/movies-coming-soon/2020-06/',
'/movies-coming-soon/2020-07/',
'/movies-coming-soon/2020-08/' ];

Promise.all(movieArray.map(path => {
    return rp('https://www.imdb.com' + path).then(body => {
        const $ = cheerio.load(body);
        let textArray = [];
        $('h4').each((i, v) => {
            // console.log($(v).text());
            textArray.push($(v).text());
        });
        return textArray;

    }).catch(err => {
        // ignore errors on urls that didn't work
        // so we can get the rest of the results without aborting
        console.log(err);
        return undefined;
    });
})).then(results => {
    // flatten the two level array and remove empty items
    let allResults = results.flat().filter(item => !!item);
    console.log(allResults);
}).catch(err => {
    console.log(err);
});

FYI, I tested the second version in nodejs version 10.16.0 and it works.
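
The reason the Promise.all version stays in order is that Promise.all resolves with an array whose positions match the input promises, regardless of which one settles first. The sketch below shows that, along with the catch-per-request trick, using a simulated request. `fakeRequest`, `fetchAll`, and the job objects are hypothetical and exist only for this illustration.

```javascript
// Hypothetical promise-returning stand-in for an HTTP request.
function fakeRequest(path, delayMs, shouldFail) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) reject(new Error('failed: ' + path));
      else resolve(['title from ' + path]);
    }, delayMs);
  });
}

function fetchAll(jobs) {
  return Promise.all(jobs.map(job =>
    // Swallow individual failures so one bad URL does not reject the whole batch.
    fakeRequest(job.path, job.delayMs, job.shouldFail).catch(() => undefined)
  )).then(results =>
    results
      .filter(item => item !== undefined)          // drop failed requests
      .reduce((acc, val) => acc.concat(val), [])   // flatten one level
  );
}

fetchAll([
  { path: '/a/', delayMs: 30, shouldFail: false },
  { path: '/b/', delayMs: 10, shouldFail: true },
  { path: '/c/', delayMs: 20, shouldFail: false },
]).then(all => console.log(all));
```

Here the slowest request (/a/) still appears first in the output, and the failed one (/b/) is simply left out.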

【Comments】:

  • @kendallkelly - Did this work for you? Did you get a chance to try it?