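【Posted】: 2020-08-19 02:24:57
【Question】:
I have an array of button elements. I want to click them one by one and, for each new tab that opens, do the following:
- scrape some information and store it in an array called "providers"
- close that tab
Although I am able to do this, I keep getting a timeout error because of the navigation wait (page.waitForNavigation) that I use before browser.pages(). If I remove it, I get a different timeout error. In addition, each time I run the program, the timeout error occurs after a different number of iterations over the buttons array. Here is my code: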
const puppeteer = require("puppeteer");
(async () => {
  try {
    const browser = await puppeteer.launch({
      headless: false,
    });
    const page = await browser.newPage();
    //google.com
    await page.setExtraHTTPHeaders({ "Accept-Language": "en-US" });
    await page.goto("https://google.com");
    await page.type("input.gLFyf.gsfi", "hotels in london");
    await page.keyboard.press("Enter");
    //search results
    await page.waitForXPath('//span[contains(text(),"View ")]');
    const btn1 = await page.$x('//span[contains(text(),"View ")]');
    await btn1[0].click();
    //list of hotels
    await page.waitForXPath('//span[contains(text(),"Learn more")]');
    let hotels = [];
    //buttons array that contains a list of buttons
    let buttons = await page.$x("//button[contains(., 'View prices')]");
    //prints a different value each time the program is run
    console.log(buttons.length);
    //looping through buttons array
    for (var i = 0; i < buttons.length; i++) {
      //i = 1 or 0 when program hangs
      console.log("got here " + i);
      //*******************************cause of timeout error******************************************
      await page.setDefaultNavigationTimeout(0);
      await Promise.all([
        page.waitForNavigation({ waitUntil: "load", timeout: 0 }),
        buttons[i].click(),
      ]);
      //***********************************************************************************************
      //getting all open tabs in an array
      const pages = await browser.pages();
      const page2 = pages[pages.length - 1];
      console.log(pages.length);
      //newly opened tab, sometimes program hangs before opening a new tab
      await page2
        .waitForSelector(
          "#prices > c-wiz > div > div.G86l0b > div > div > div > div > div > section > div.Hkwcrd.q9W60.A5WLXb.fLClSe > c-wiz > div > div > span > div > div > div > div > div > a > div > div.cFdfnb > div > span.mK0tQb > span",
          { timeout: 30000 }
        )
        .catch(() => console.log("Class doesn't exist!"));
      /*-----------------scraping information on new tab ----------------------------------*/
      console.log("going to start collecting providers");
      let providers = await page2.evaluate(() => {
        let data = [];
        let elements = document.querySelectorAll(
          "#prices > c-wiz > div > div.G86l0b > div > div > div > div > div > section > div.Hkwcrd.q9W60.A5WLXb.fLClSe > c-wiz > div > div > span > div > div > div > div > div > a > div > div.cFdfnb > div > span.mK0tQb > span"
        );
        for (var element of elements) data.push(element.textContent);
        return data;
      });
      console.log(providers.length);
      console.log("all done");
      console.log(providers);
      hotels.push(providers);
      //closing the new tab (awaited so the tab is gone before the next iteration)
      await page2.close();
    }
    await browser.close();
    return hotels;
  } catch (err) {
    console.error(err);
  }
})()
  .then((resolvedValue) => {
    console.log(resolvedValue);
  })
  .catch((rejectedValue) => {
    console.log(rejectedValue);
  });
To get rid of the error, I set timeout: 0 and called setDefaultNavigationTimeout(0), but now the program just freezes. This is the error I was getting before I disabled the timeouts:
TimeoutError: Navigation timeout of 30000 ms exceeded
at C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\LifecycleWatcher.js:100:111
at async FrameManager.waitForFrameNavigation (C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\FrameManager.js:107:23)
at async Frame.waitForNavigation (C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\FrameManager.js:298:16)
at async Page.waitForNavigation (C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\Page.js:560:16)
at async Promise.all (index 0)
at async C:\Users\Me\Desktop\web_scraping_practice\backend.js:41:7
-- ASYNC --
at Frame.<anonymous> (C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\helper.js:116:19)
at Page.waitForNavigation (C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\Page.js:560:53)
at Page.<anonymous> (C:\Users\Me\Desktop\web_scraping_practice\node_modules\puppeteer\lib\helper.js:117:27)
at C:\Users\Me\Desktop\web_scraping_practice\backend.js:42:14
at processTicksAndRejections (internal/process/task_queues.js:97:5) {
name: 'TimeoutError'
}
undefined
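One possible explanation, offered as an assumption rather than a confirmed diagnosis: each click opens the hotel in a new tab, so the original page never navigates and page.waitForNavigation() on it has nothing to wait for. A minimal sketch of waiting for the new tab instead is shown here. waitForNewPage is a hypothetical helper, not part of Puppeteer; it relies only on the browser's "targetcreated" event and on target.type() and target.page(), and it takes the click as a callback so the listener is attached before the click happens.

```javascript
// Hypothetical helper (not a Puppeteer API): resolves with the Page of the
// first new tab that opens after `trigger` runs, or rejects after `timeoutMs`.
// `browser` is assumed to expose on/off for "targetcreated", as Puppeteer's Browser does.
function waitForNewPage(browser, trigger, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    // Fail loudly instead of hanging forever if no tab ever opens.
    const timer = setTimeout(() => {
      browser.off("targetcreated", onTarget);
      reject(new Error(`No new tab opened within ${timeoutMs} ms`));
    }, timeoutMs);

    async function onTarget(target) {
      // Ignore non-page targets such as service workers.
      if (target.type() !== "page") return;
      clearTimeout(timer);
      browser.off("targetcreated", onTarget);
      try {
        resolve(await target.page());
      } catch (err) {
        reject(err);
      }
    }

    // Attach the listener first, then perform the action that opens the tab.
    browser.on("targetcreated", onTarget);
    Promise.resolve()
      .then(trigger)
      .catch((err) => {
        clearTimeout(timer);
        browser.off("targetcreated", onTarget);
        reject(err);
      });
  });
}
```

Inside the loop this might be used as `const page2 = await waitForNewPage(browser, () => buttons[i].click());` in place of the Promise.all block and the browser.pages() lookup. Keeping a finite timeout means a click that opens no tab shows up as an error rather than a freeze.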
Thanks.
【Comments】:
Tags: navigation timeout puppeteer