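【Question title】: puppeteer identifying element content after click event
【Posted】: 2018-06-07 00:44:24
【Question】:

After entering a query and clicking a button, I am trying to extract specific elements from the page. The page does not navigate to a new URL: it simply returns new HTML content, which is what I need to extract.

This shows how far I have got: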

const puppeteer = require('puppeteer');

function timeout(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
};

const input_val = 'some query text';

(async() => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto('http://target.com', { waitUntil: 'networkidle2' })
    await page.waitFor('input[name=query]')

    await page.evaluate((input_val) => {
      document.querySelector('input[name=query]').value = input_val;
      document.querySelector('.Button').click();
    }, input_val)

    // Now I want to console.log the <strong> tag fields 
    // innerText (will be 0-3 matching elements).
    // The lines below describe in non-puppeteer what 
    // I need to do. But this has no effect.

    const strongs = await page.$$('strong')
    for(var i=0; i<strongs.length; i++) {
      console.log(strongs[i].innerText);
    }

    await timeout(2000)
    await page.screenshot({path: 'example.png'}) // this renders results page ok

    browser.close();
})();

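So the query is entered correctly, the button click is triggered, and the screenshot shows that the page has responded as expected. I just can't work out how to extract and report the relevant bits.

I have been trying to get my head around the whole async/await paradigm, but I am still very new to it. Any help is much appreciated.


Edit: error from Vaviloff's approach: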

(node:67405) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Cannot find context with specified id undefined
    at Promise (/Users/user/node_modules/puppeteer/lib/Connection.js:200:56)
    at new Promise (<anonymous>)
    at CDPSession.send (/Users/user/node_modules/puppeteer/lib/Connection.js:199:12)
    at ExecutionContext.evaluateHandle (/Users/user/node_modules/puppeteer/lib/ExecutionContext.js:79:75)
    at ExecutionContext.evaluate (/Users/user/node_modules/puppeteer/lib/ExecutionContext.js:46:31)
    at Frame.evaluate (/Users/user/node_modules/puppeteer/lib/FrameManager.js:326:20)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:160:7)
(node:67405) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:67405) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

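【Comments】:

  • Please update the question with your current script so we can debug it. It may be a typo or a missing await. Also: what is your Puppeteer version? Some older versions have a bug producing this error.
  • I have added a working script for testing the technique.
  • For me, changing await page.goto(url); to await page.goto(url, { waitUntil: 'networkidle2' }); fixed the error.

Tags: selenium-chromedriver puppeteer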


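【Solution 1】:

There is a useful helper for this: page.$$eval

This method runs Array.from(document.querySelectorAll(selector)) within the page and passes the result as the first argument to pageFunction.

Since it passes an array to the evaluated function, we can use .map() on it to extract the property we need: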

const strongs = await page.$$eval('strong', items => items.map( item => item.innerText));
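
For context on why the loop in the question printed nothing useful: page.$$ resolves to ElementHandle objects that live in Node, not to DOM nodes, so reading .innerText on them directly gives undefined. The sketch below contrasts the two routes. The helper names textsViaHandles and textsViaEval are hypothetical (they are not part of Puppeteer), and page is assumed to be an already opened Puppeteer Page.

```javascript
// Hypothetical helpers (not part of Puppeteer); `page` is assumed to be a Puppeteer Page.

// Route 1: keep the ElementHandles from page.$$ and read each one inside the browser.
async function textsViaHandles(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // The handle is passed as an argument and arrives in the page as the real DOM element.
    texts.push(await page.evaluate(el => el.innerText, handle));
  }
  return texts;
}

// Route 2: let $$eval run the query and the mapping in a single call.
async function textsViaEval(page, selector) {
  return page.$$eval(selector, items => items.map(item => item.innerText));
}
```

Both helpers should return the same array of strings; the $$eval route is shorter and needs only one round trip to the browser, which is probably why it is the one recommended here.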

Update: here is a complete working script for testing:

const puppeteer = require('puppeteer');

const input_val = '[puppeteer]';
const items_selector = '.question-hyperlink';

(async() => {

    const browser = await puppeteer.launch({
        headless: false,
    })
    const page = await browser.newPage()

    await page.goto('https://stackoverflow.com/', { waitUntil: 'networkidle2' })
    await page.waitFor('input[name=q]')
    await page.type('input[name=q]', input_val + '\r');
    await page.waitForNavigation();

    const items = await page.$$eval(items_selector, items => items.map( item => item.innerText));

    console.log(items);

    await browser.close();
})();
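
One caveat with the script above: page.waitForNavigation() is only called after the keystrokes (including the Enter) have been sent, so a fast navigation may already be over by then, and a page that never navigates will run into the 30-second navigation timeout. A common way to avoid the race is to start waiting before triggering the action. A minimal sketch, where submitAndWait is a hypothetical helper name and page is assumed to be a Puppeteer Page:

```javascript
// Hypothetical helper (not part of Puppeteer); `page` is assumed to be a Puppeteer Page.
async function submitAndWait(page, inputSelector, text) {
  // Type the query without the trailing '\r' so nothing is submitted yet.
  await page.type(inputSelector, text);
  // Start listening for the navigation first, then press Enter, and await both.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.keyboard.press('Enter'),
  ]);
  return response;
}
```

This only applies when submitting really loads a new page, as the Stack Overflow search does; for a page that updates its content in place there is no navigation to wait for.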

Update 2
A modified version of the script for the sandbox at https://diplodata.shinyapps.io/puppeteer-test/

const puppeteer = require('puppeteer');
const input_val = 'puppeteer';

(async() => {

    const browser = await puppeteer.launch({
        headless: false,
    })
    const page = await browser.newPage()

    await page.goto('https://diplodata.shinyapps.io/puppeteer-test/', { waitUntil: 'networkidle2' })
    await page.waitFor('#query')
    await page.type('#query', input_val);
    await page.click('#go');
    await page.waitFor(500);
    const items = await page.$$eval('strong', items => items.map( item => item.innerText));

    console.log(items);

    await browser.close();
})();

This produces the following result:

[ 'Click below should be:', '', 'puppeteer' ]
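
The fixed 500 ms pause works for this sandbox but depends on how quickly the page responds. A possible alternative is to wait for a condition with page.waitForFunction and then drop empty strings such as the '' in the result above. This is only a sketch: it assumes the result element ends up containing the text you expect (here, the query itself), and waitAndCollect is a hypothetical helper name, not a Puppeteer API.

```javascript
// Hypothetical helper (not part of Puppeteer); `page` is assumed to be a Puppeteer Page.
// Assumption: some element matching `selector` eventually contains `expectedText`.
async function waitAndCollect(page, selector, expectedText, timeoutMs = 5000) {
  // Poll inside the page until a matching element contains the expected text.
  await page.waitForFunction(
    (sel, expected) =>
      Array.from(document.querySelectorAll(sel))
        .some(el => el.innerText.includes(expected)),
    { timeout: timeoutMs },
    selector,
    expectedText
  );
  const texts = await page.$$eval(selector, items => items.map(item => item.innerText));
  // Drop empty or whitespace-only entries.
  return texts.map(t => t.trim()).filter(t => t.length > 0);
}
```

In the script above this could stand in for the page.waitFor(500) and page.$$eval lines, for example const items = await waitAndCollect(page, 'strong', input_val);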

【Comments】:

  • Thanks Vaviloff. This gives me an error - see above.
  • That's useful. However, nothing is printed to the console. It runs for 30 seconds and then returns (node:93096) UnhandledPromiseRejectionWarning: Error: Navigation Timeout Exceeded: 30000ms exceeded at Promise.then (/Users/robinedwards/node_modules/puppeteer/lib/NavigatorWatcher.js:73:21) at <anonymous> (node:93096) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1) (node:93096) [DEP0018] DeprecationWarning...
  • I have modified the script specifically for your sandbox.
  • It works! Amazing. So (at least part of) my script's failure seems to have been down to the way I was reporting the results. For example, for(var i=0; i<items.length; i++) { console.log(items[i]); } prints everything except "puppeteer".
  • You can use await page.waitFor(500) instead of the custom await timeout(500); function.