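[Question title]: Iterate over table using puppeteer
[Posted]: 2021-12-12 19:10:57
[Question description]:

I want to scrape data from a table on a website. I first try to get the whole table and then the tr and td elements inside it. My current code just returns an empty array.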

const puppeteer = require("puppeteer");

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(
    "https://www.basketball-reference.com/leagues/NBA_2021_standings.html" //Eastern Conference
  );

  var temp = [];
  const data = await page.evaluate(() => {
    const tableBody = document.querySelector(
      'table[id="confs_standings_E"] tbody'
    );

    for (var i = 0; i < tableBody.length; i++) {
      const tr = tableBody[i].querySelectorAll("tr");
      for (var j = 0; j < tr.length; j++) {
        const td = tr[j].querySelectorAll("td").innerText;
        temp.push(td);
      }
    }
  });

  console.log(temp);
  //await browser.close();
}

run();

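Update

I tried the posted solution and it works, thank you very much, but I would also like to solve this another way. The code below returns the right number of elements, but they are all undefined. This is what I get in the console: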

0: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
1: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
2: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
3: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
4: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
5: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
6: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
7: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
8: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
9: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
10: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
11: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
12: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
13: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]
14: (7) [undefined, undefined, undefined, undefined, undefined, undefined, undefined]

Here is the new code:

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(
    "https://www.basketball-reference.com/leagues/NBA_2021_standings.html" //Eastern Conference
  );

  seznamEkip = [];
  const ekipe = await page.$$("#confs_standings_E tbody tr");

  for (const ekipa of ekipe) {
    const podatki = await ekipa.$$("td");
    const spread = [...podatki].map((element) => element.innerText);
    seznamEkip.push(spread);
  }

  console.log(seznamEkip);
}

run();

Thanks for your help.

[Question discussion]:

    Tags: javascript node.js puppeteer


    [Solution 1]:

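    Recall that document.querySelector returns a single element, not an array. Looping over its length with a for loop therefore does nothing: an element has no length property, so tableBody.length is undefined and the loop body never runs. You may find it useful to iterate over the element's .children instead.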
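
A minimal sketch of the .children approach, assuming `page` is a Puppeteer Page that has already navigated to the standings URL. The function name `scrapeRowsViaChildren` is hypothetical. The callback passed to page.evaluate runs in the browser, so it cannot see a variable such as `temp` declared in the Node script; the rows are returned from the callback instead.

```javascript
// Hypothetical helper: `page` is assumed to be an already-loaded Puppeteer Page.
async function scrapeRowsViaChildren(page) {
  // Everything inside this callback executes in the browser context.
  return page.evaluate(() => {
    const tableBody = document.querySelector("#confs_standings_E tbody");
    if (!tableBody) return []; // selector matched nothing

    const rows = [];
    // tableBody.children holds the <tr> elements of the <tbody>.
    for (const tr of tableBody.children) {
      const cells = [];
      // tr.children holds every cell of the row, <th> as well as <td>.
      for (const cell of tr.children) {
        cells.push(cell.innerText);
      }
      rows.push(cells);
    }
    return rows; // must be serializable to travel back to Node
  });
}
```

Called as `const rows = await scrapeRowsViaChildren(page);` inside run(), this should give one array of cell texts per table row.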

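    As an alternative, consider using querySelectorAll with a slightly different selector. For example, this selects every td inside the tbody and returns an array of their innerText values: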

    const data = await page.evaluate(() => {
        const tableBody = document.querySelectorAll('#confs_standings_E tbody td');
        return Array.from(tableBody).map(element => element.innerText);
    });
    console.log(data);
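
The snippet above returns one flat array, so row boundaries are lost. As a rough sketch, the flat list can be regrouped afterwards, assuming every row contributes the same number of td cells (7 in the console output shown in the question's update). `chunkCells` is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: split a flat array of cell texts into rows of `width` cells.
function chunkCells(cells, width) {
  if (!Number.isInteger(width) || width <= 0) {
    throw new RangeError("width must be a positive integer");
  }
  const rows = [];
  for (let i = 0; i < cells.length; i += width) {
    // slice() stops at the end of the array, so a short last row is kept as-is.
    rows.push(cells.slice(i, i + width));
  }
  return rows;
}
```

With the `data` array from the answer, `chunkCells(data, 7)` would produce one array per team. If the column count might change, collecting the cells row by row in the page is the more robust choice.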
    

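    [Comments]:

    • Thanks for your help. Your solution works, but I was also trying to find another way to solve this. I have updated my question with new code and a new problem, so if you know how to fix it, I would appreciate it.
    • Regarding your update: note that the page.$$ method returns a Promise that resolves to an array of Puppeteer ElementHandles, not actual DOM elements. That is why innerText is undefined. You should use the Puppeteer API to read the text by evaluating JS in the page. The ElementHandle.$$eval method is a good place to start.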
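
A sketch of how the loop from the update could look with ElementHandle.$$eval, keeping the question's row-by-row structure. It assumes `page` is a Puppeteer Page that has already loaded the standings URL; the function name `collectRows` is hypothetical.

```javascript
// Hypothetical helper: `page` is assumed to be an already-loaded Puppeteer Page.
async function collectRows(page) {
  // Each entry is an ElementHandle pointing at a <tr>, not the DOM node itself.
  const rowHandles = await page.$$("#confs_standings_E tbody tr");

  const rows = [];
  for (const rowHandle of rowHandles) {
    // $$eval runs the callback in the browser with the matching <td> elements,
    // where innerText is available, and sends the resulting strings back to Node.
    const texts = await rowHandle.$$eval("td", (cells) =>
      cells.map((cell) => cell.innerText)
    );
    rows.push(texts);
  }
  return rows;
}
```

Inside run(), `console.log(await collectRows(page));` would then print the text of each td instead of undefined. Each $$eval call is a separate round trip to the browser, so for large tables a single page.evaluate or page.$$eval over all rows is likely cheaper.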