【Question title】: Puppeteer losing context before retrieving lis
【Posted】: 2021-09-26 04:08:01
【Question】:

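I'm trying to retrieve the items of an unordered list using node/puppeteer. I can navigate to the page and perform a search, but when I try to build an array from the lis, it breaks with the following error:

UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Execution context was destroyed.

In response to a comment, here is the full code: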

require('dotenv').config();
const puppeteer = require('puppeteer');
const ac = require("@antiadmin/anticaptchaofficial");

(async () => {

    ac.setAPIKey(process.env.ANTICAPTCHA_KEY);
    ac.getBalance()
        .then(balance => console.log('my balance is $'+balance))
        .catch(error => console.log('received error '+error))

    console.log('solve recaptcha first');
    let token = await ac.solveRecaptchaV2Proxyless('https://secure.meetup.com/login','6LcA8EUUAAAAAG17qfEfNaX6H8ozmI-IvmokZUnZ');
    if (!token) {
        console.log('something went wrong with captcha solving');
        return;
    } else {
        console.log('token is: ', token);
    }
    console.log ('opening browser');
    const browser = await puppeteer.launch({
        headless: false
    });

    console.log('creating new tab');
    const page = await browser.newPage();

    console.log('setting page size');
    await page.setViewport({width: 1368, height: 1080})

    console.log('opening target page');
    await page.goto('https://secure.thesite.com/login', {waitUntil: 'networkidle2'});

    await page.type('#email', process.env.MY_EMAIL)
    await page.type('#password', process.env.MY_PASSWORD)

    console.log('click login button');
    await page.evaluate((token) => {
        const textarea = document.querySelector("textarea#g-recaptcha-response.g-recaptcha-response")
        if (textarea) {
            textarea.innerText=token
        }
        const button = document.querySelector("#loginFormSubmit")
        button.disabled = false
        button.click()
    }, token)

    console.log('Entering keywords');
    await page.waitForSelector("input#mainKeywords.dropdown-toggle.ellipsize")
    await page.type("input#mainKeywords.dropdown-toggle.ellipsize","write")

    console.log('Click to search and filter fr groups')
    await page.evaluate(async () => {
        const searchForm = document.querySelector("#searchForm")
        searchForm.submit()
        const groupButton = document.querySelector("#simple-view-selector-group")
        await groupButton.click()
        function sleep(seconds) {
            return new Promise((resolve) => {
                setTimeout(() => resolve(true), seconds*1000)
            });
        }
        await sleep(2);
    })
    console.log('start grroups check')
    const groups = page.evaluate(async () => {
        console.log('setting grroups aray');
        const groups=[];
        console.log('about to awaait selector');
        if (await page.waitForSelector('a.groupCard--photo.loading.nametag-photo') !== null) {
            const groupList = document.querySelectorAll("a.groupCard--photo.loading.nametag-photo")
            console.log("groupis is: ", groupList);
            groupList.forEach(group => {
                groups.push(group.href);
            })
            return groups;
        } else {
            console.log('selector is null');
        }

    })
    console.log(groups);

})();

【Question comments】:

    Tags: node.js web-scraping puppeteer chromium


    【Solution 1】:

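    Your puppeteer script is a bit overcomplicated. I could not reproduce the "Execution context was destroyed" scenario (using a different page, of course, since I don't have a Meetup account), but the groups function expression is definitely not valid code, and it causes most of your problems: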

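    1. You need to await the page.evaluate method, because it returns a promise; otherwise it will misbehave (for example, logging only [object Promise]).
    2. You cannot use puppeteer API methods inside the page context (that is, inside page.evaluate):
    if (await page.waitForSelector('a.groupCard--photo.loading.nametag-photo') !== null)
    

      You should move this condition outside the evaluate call.

    3. Returning an array from page.evaluate is more complicated than you might think (see the existing questions and answers on this).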
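
The first two points can be sketched as follows. This is a minimal illustration, not the answerer's code: `collectHrefs` is a hypothetical helper name, and `page` is assumed to be a Puppeteer Page object. The Puppeteer call (`waitForSelector`) stays in Node, only DOM code runs inside `page.evaluate`, and the result is awaited before use.

```javascript
// Hypothetical helper (name is an assumption): wait in Node, read hrefs in the page.
async function collectHrefs(page, selector) {
  // Puppeteer API call: runs in the Node process, NOT inside page.evaluate.
  // It rejects with a timeout error if the selector never appears,
  // so an explicit `!== null` check is not needed here.
  await page.waitForSelector(selector);

  // Only DOM code runs in the page context. The selector is passed in as an
  // argument because variables from the Node scope are not visible there.
  // Without `await`, `hrefs` would be a pending Promise, not an array.
  const hrefs = await page.evaluate((sel) => {
    // Return plain strings so the result can be serialized back to Node.
    return Array.from(document.querySelectorAll(sel), (a) => a.href);
  }, selector);

  return hrefs;
}
```

Called as `const groups = await collectHrefs(page, 'a.groupCard--photo.loading.nametag-photo');`, this would replace the `groups` block from the question.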

    Suggestion

    I would refactor this part to avoid the problems above and to sidestep page context issues. You can use page.$$eval:

    const groups = await page.$$eval('a.groupCard--photo.loading.nametag-photo', (groupList) => 
      groupList.map((group) => group.href)
    )
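
On the error message itself: Puppeteer usually raises "Execution context was destroyed" when the page navigates while an evaluate call is still in flight. The question's code submits #searchForm inside page.evaluate and then keeps awaiting in the same call, which may be the trigger, although the answer above could not reproduce it. The sketch below shows one common way to avoid that race, under those assumptions: start waiting for the navigation before triggering the submit. `submitAndWait` is a hypothetical helper name, and the `networkidle2` option is copied from the question's `page.goto` call.

```javascript
// Hypothetical helper (name is an assumption): submit a form and wait for the
// navigation it causes, so later page.* calls run against the new document.
async function submitAndWait(page, formSelector) {
  await Promise.all([
    // Start listening for the navigation first, so it cannot be missed.
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    // Trigger the submit in the page; nothing else is awaited in the old context.
    page.$eval(formSelector, (form) => form.submit()),
  ]);
}
```

With this in place, the group filter click and the `page.$$eval` call would run as separate steps after `await submitAndWait(page, '#searchForm')`, rather than inside the same `page.evaluate` as the submit.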
    
