【Posted】: 2022-02-15 00:14:49
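【Question】:
I recently converted a script from Puppeteer to Puppeteer Cluster, and during testing I have been seeing inconsistent results when several pages are processed at the same time.
In essence, I load a single page, then iterate over the product options on that page and collect the price of each product variant.
One particular product has about 9 variants. Sometimes I capture all 9 correctly, but on the next test run it may return only 2 or 3 of them.
Any help would be greatly appreciated!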
const puppeteer = require('puppeteer');
const { Cluster } = require('puppeteer-cluster');
const Product = require('../utils/product');
const config = require('../config/config.json');
const selectors = config.productData;

(async () => {
    const urls = [
        {link: ...},
        {link: ...},
        {link: ...}
    ];

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 5,
        puppeteerOptions: {
            headless: false
        },
    });

    await cluster.task(async ({ page, data: url }) => {
        // instantiate a new product object
        const product = new Product();
        await page.goto(url, { waitUntil: 'load' });
        const skuprice = await page.$eval(selectors.price, element => element.innerText);
        console.log('Sku Price:' + skuprice);

        // deal with variants
        const options = await page.$$eval(selectors.variant, elements => elements.map(element => element.id));
        if (options.length > 0) {
            // set up a variants array
            for (let index = 0; index < options.length; index++) {
                const element = options[index];
                await page.waitForSelector(`#${element}`);
                await page.$eval(`#${element}`, radio => radio.click());
                await page.waitForTimeout(500);
                const variantprice = await page.$eval(selectors.price, element => element.innerText);
                console.log('Variant Price:' + variantprice);
            }
        }
    });

    urls.forEach(url => {
        cluster.queue(url.link);
    });
    // many more pages

    await cluster.idle();
    await cluster.close();
})();
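One thing that may be worth ruling out, offered as an assumption because the target site is not known: with up to 5 pages loading at once, the fixed 500 ms wait after each click might sometimes elapse before the price element has updated. A minimal sketch of an alternative is to poll the price until it differs from the value read before the click. `waitForChange` below is a hypothetical helper, not part of Puppeteer or puppeteer-cluster, and its `timeoutMs` and `intervalMs` defaults are arbitrary choices.

```javascript
// Hypothetical helper, not part of Puppeteer or puppeteer-cluster.
// Polls `read()` until it returns something other than `previous`,
// or until `timeoutMs` has elapsed, whichever comes first.
async function waitForChange(read, previous, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let current = await read();
  while (current === previous && Date.now() < deadline) {
    await new Promise(resolve => setTimeout(resolve, intervalMs));
    current = await read();
  }
  // `changed` is false when the timeout was hit with the value unchanged,
  // e.g. when two variants legitimately share the same price.
  return { value: current, changed: current !== previous };
}

// Assumed usage inside the variant loop from the question:
//   const readPrice = () => page.$eval(selectors.price, el => el.innerText);
//   const before = await readPrice();
//   await page.$eval(`#${element}`, radio => radio.click());
//   const { value, changed } = await waitForChange(readPrice, before);
//   console.log('Variant Price:' + value, changed ? '' : '(unchanged)');
```

Because the helper returns instead of throwing on timeout, a variant whose price matches the previous one is still reported. It may also help to attach a `taskerror` listener to the cluster (`cluster.on('taskerror', ...)`), so that a task which throws partway through the loop does not simply look like a page with fewer variants.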
【Discussion】:
- Please provide more details, for example which sites/pages are you trying to scrape?