【Posted】:2021-01-12 20:54:45
【Problem description】:
I am trying to capture all the <a> elements on a page.
console.log prints undefined, and I don't understand why.
Is const anchors = Array.from(document.querySelectorAll(sel)); correct?
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    userDataDir: "C:\\Users\\johndoe\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
  });
  const page = await browser.newPage();
  await page.setViewport({
    width: 1920,
    height: 1080,
    deviceScaleFactor: 1,
  });
  await page.goto('https://www.facebook.com/groups/632312010245152/members');
  //https://github.com/puppeteer/puppeteer/blob/main/examples/search.js
  let membri = await page.evaluate((sel) => {
    const anchors = Array.from(document.querySelectorAll(sel));
    return anchors;
  }, 'a');
  console.log(membri);
})();
【Discussion】:
Thanks, I got it working by returning an attribute of each element (href), so that the result is a serializable array:
const serializableLinks = anchors.map(x => x.getAttribute("href")); //<-- convert to string
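The reason this works: page.evaluate can only hand back values that can be serialized, and DOM elements cannot, so the original call resolved to undefined. A minimal sketch of the same idea follows. It assumes a Puppeteer page that has already navigated, as in the question; the names toHrefs and collectLinks are hypothetical and only for illustration.

```javascript
// Runs inside the browser page: turn each <a> element into a plain string.
// It must not reference outer variables, because Puppeteer sends the
// function's source text to the page.
const toHrefs = (anchors) => anchors.map((a) => a.getAttribute('href'));

// Hypothetical helper. `page` is assumed to be a Puppeteer Page object
// that has already finished page.goto(...).
async function collectLinks(page, selector = 'a') {
  // page.$$eval runs querySelectorAll(selector) in the page and passes the
  // matched elements, as an array, to the callback. Only the callback's
  // return value (an array of strings or nulls) travels back to Node.
  return page.$$eval(selector, toHrefs);
}
```

With the page from the question, this could be called as const membri = await collectLinks(page); before console.log(membri). Elements without an href attribute show up as null in the result.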
Keep in mind that x.getAttribute("href") may return a relative URL. If you need an absolute URL, use x.href instead.
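To illustrate the difference, here is a small sketch using Node's built-in URL class to resolve a raw attribute value against the page URL, which is roughly what the browser does for the href property (assuming the page has no <base> element). The helper name toAbsolute and the sample paths are made up for the example.

```javascript
// Hypothetical helper: resolve a raw href attribute value against the
// URL of the page it was found on.
function toAbsolute(rawHref, pageUrl) {
  return new URL(rawHref, pageUrl).href;
}

const pageUrl = 'https://www.facebook.com/groups/632312010245152/members';

// A relative value, as getAttribute("href") might return it:
console.log(toAbsolute('/groups/632312010245152/about', pageUrl));
// prints "https://www.facebook.com/groups/632312010245152/about"

// A value that is already absolute is left unchanged:
console.log(toAbsolute('https://example.com/x', pageUrl));
// prints "https://example.com/x"
```

Inside page.evaluate, reading x.href directly is simpler; a helper like this is only useful when the raw attribute values have already been collected.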
Tags: javascript node.js web-scraping puppeteer