【Question title】: Puppeteer doesn't extract all elements
【Posted】: 2021-05-12 20:56:59
【Question】:

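I'm writing a NodeJS script to extract the most-traded cryptocurrencies of the past 24 hours. I want to extract the name, ticker, and 24h percentage columns into an array like this: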

[{ name: 'Bitcoin', ticker: 'BTC', percentage: '20.62%' },
{ name: 'Ethereum', ticker: 'ETH', percentage: '10.19%' },
...
]

My script looks like this, but when you run it, it skips some rows. Does anyone know why it randomly skips rows? Is there a better way to do this?

let cryptoData = []

const browser = await puppeteer.launch({ args: ['--no-sandbox'], headless: true })
const page = await browser.newPage()

await page.setViewport({ width: 1536, height: 850 })

await page.goto('https://coinmarketcap.com/', { waitUntil: 'networkidle2' })

// Wait for the tickers table to load
await page.waitForSelector('tr:nth-child(1) > td > .cmc-link > .sc-16r8icm-0 > .sc-16r8icm-0 > .sc-1eb5slv-0')

// Sort the list by 24h change, descending
await page.waitForSelector('.stickyTop:nth-child(5) > div > .sc-9dqrx-0 > .sc-9dqrx-1 > .sc-1eb5slv-0')
await page.click('.stickyTop:nth-child(5) > div > .sc-9dqrx-0 > .sc-9dqrx-1 > .sc-1eb5slv-0')

// Wait for the first row of the table to be present again after sorting
await page.waitForSelector('tr:nth-child(1) > td > .cmc-link > .sc-16r8icm-0 > .sc-16r8icm-0 > .sc-1eb5slv-0')


let data = await page.evaluate(() => {
  let tempData = []

  for (let index = 1; index <= 100; index++) {
    let name = document.querySelector(`tr:nth-child(${index}) > td > .cmc-link > .sc-16r8icm-0 > .sc-16r8icm-0 > .sc-1eb5slv-0`)
    let ticker = document.querySelector(`tr:nth-child(${index}) > td > .cmc-link > .sc-16r8icm-0 > .sc-16r8icm-0 > .sc-1teo54s-2 > .sc-1eb5slv-0`)
    let percentage = document.querySelector(`.cmc-table > tbody > tr:nth-child(${index}) > td > .iqsl6q-0`)

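    if (name && ticker && percentage) {
      // Push plain strings: DOM elements cannot be serialized out of page.evaluate
      tempData.push({
        id: index,
        name: name.innerText,
        ticker: ticker.innerText,
        percentage: percentage.innerText,
      })
    }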
  }

  return tempData
})

console.log(data)

await browser.close()

【Question discussion】:

    Tags: node.js puppeteer


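    【Solution 1】:

    The problem is that Puppeteer doesn't load the whole page. The rest of the data is lazy-loaded, so you have to scroll to the bottom of the page before it appears.

    To do this, I used another user's answer: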

    https://stackoverflow.com/a/53527984

    const puppeteer = require('puppeteer');
    
    (async () => {
        const browser = await puppeteer.launch({
            headless: false
        });
        const page = await browser.newPage();
        await page.goto('https://www.yoursite.com');
        await page.setViewport({
            width: 1200,
            height: 800
        });
    
        await autoScroll(page);
    
        await page.screenshot({
            path: 'yoursite.png',
            fullPage: true
        });
    
        await browser.close();
    })();
    
    async function autoScroll(page){
        await page.evaluate(async () => {
            await new Promise((resolve, reject) => {
                var totalHeight = 0;
                var distance = 100;
                var timer = setInterval(() => {
                    var scrollHeight = document.body.scrollHeight;
                    window.scrollBy(0, distance);
                    totalHeight += distance;
    
                    if(totalHeight >= scrollHeight){
                        clearInterval(timer);
                        resolve();
                    }
                }, 100);
            });
        });
    }
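
To check whether the scrolling actually fixed the skipped rows, something like the following might help. It is only a sketch: `findMissingRows` is a hypothetical helper, not part of Puppeteer, and it assumes each scraped object carries the `id: index` field that the question's loop already adds. The sample data is made up.

```javascript
// Hypothetical helper (not a Puppeteer API): list the row numbers in
// 1..expectedCount that are absent from the scraped array.
function findMissingRows(data, expectedCount) {
  // Collect the ids that were actually scraped.
  const seen = new Set(data.map((row) => row.id));
  const missing = [];
  for (let id = 1; id <= expectedCount; id++) {
    if (!seen.has(id)) missing.push(id);
  }
  return missing;
}

// Made-up sample: rows 2 and 4 were skipped by the scraper.
const sample = [
  { id: 1, name: 'Bitcoin', ticker: 'BTC', percentage: '20.62%' },
  { id: 3, name: 'Ethereum', ticker: 'ETH', percentage: '10.19%' },
];

console.log(findMissingRows(sample, 4)); // [ 2, 4 ]
```

In the question's script this would presumably mean calling `await autoScroll(page)` before `page.evaluate(...)` and then `findMissingRows(data, 100)` on the result. An empty array suggests every row was rendered before extraction. A non-empty one suggests the scroll finished before the lazy-loaded rows were filled in.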
    
    

    【Discussion】:
