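[Posted]: 2021-10-13 08:05:41
[Question]:
I'm trying to scrape images from Pinterest with the code below, using the puppeteer module. The src attribute returns the smallest size of each image, and I know the actual size is much larger. This is evident in the srcset attribute, where the last string holds the original image at its original size. I just don't know how to select that last string, which is the one I want. How do I select it?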
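const puppeteer = require("puppeteer");

// Print the src of every <img> element found on the page at `url`.
async function scrapePage(url) {
  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);
    // Runs in the page context and returns one src string per <img>
    const images = await page.$$eval("img", imgs => {
      return imgs.map(x => x.src);
    });
    for (const photo of images) {
      console.log(photo);
    }
  } catch (err) {
    console.log("Error Found: " + err);
  } finally {
    // Close the browser even if navigation or evaluation failed
    if (browser) await browser.close();
  }
}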
The element for one image:
<img alt="This contains an image of: {{ pinTitle }}" class="hCL kVc L4E MIw" importance="auto"
loading="auto" src="https://i.pinimg.com/236x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg"
srcset="https://i.pinimg.com/236x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 1x,
https://i.pinimg.com/474x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 2x,
https://i.pinimg.com/736x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 3x,
https://i.pinimg.com/originals/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 4x">
Output:
https://i.pinimg.com/236x/fa/84/ac/fa84acd127ecdbe42fa6d15b33f3336f.jpg
https://i.pinimg.com/236x/ab/2d/43/ab2d43d73cd57d0112768257f81058e7.jpg
https://i.pinimg.com/236x/39/9e/23/399e23b9c5bc9ba0dbece7538ed114f1.jpg
https://i.pinimg.com/236x/d3/37/bd/d337bd8466e3946bad14118b37403831.jpg
https://i.pinimg.com/236x/fb/19/ba/fb19bac40a682a8dd942ea90ea188a2a.jpg
...
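For reference, picking the last entry of a srcset string like the one in the element above could be sketched as follows. This is only a minimal sketch: `lastSrcsetUrl` is a hypothetical helper, not part of Puppeteer, and it assumes the URLs themselves contain no commas (true for the i.pinimg.com URLs shown here, but not guaranteed in general).

```javascript
// Hypothetical helper (not a Puppeteer API): returns the URL of the last
// candidate in a srcset string, or null if the string has no candidates.
// Assumption: URLs contain no commas, so splitting on "," is safe.
function lastSrcsetUrl(srcset) {
  const candidates = (srcset || "")
    .split(",")
    .map(part => part.trim())
    .filter(part => part.length > 0);
  if (candidates.length === 0) return null;
  // Each candidate looks like "<url> <descriptor>", e.g. ".../originals/x.jpg 4x"
  return candidates[candidates.length - 1].split(/\s+/)[0];
}

// The srcset value copied from the element in the question
const sampleSrcset = `https://i.pinimg.com/236x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 1x,
https://i.pinimg.com/474x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 2x,
https://i.pinimg.com/736x/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 3x,
https://i.pinimg.com/originals/2c/9c/e7/2c9ce7fb090051e25a4983474ede2b86.jpg 4x`;

console.log(lastSrcsetUrl(sampleSrcset));
```

Because the callback passed to `page.$$eval` runs inside the browser page, a helper defined in Node is not visible there. One option is to return the raw `srcset` strings from the page and apply the helper afterwards in Node.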
Edit: I tried return imgs.map(x => x.srcset)
Output: nothing is printed; it is empty.
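An `img` element without a srcset attribute reports an empty string for its `srcset` property. Whether that explains the empty output here is an assumption the question does not confirm (the images may also not have finished loading when the page was evaluated). Under that assumption, a sketch that falls back to `src` when `srcset` is empty might look like this. The `records` array and `bestUrl` are hypothetical, and the example.com URLs are made up for illustration.

```javascript
// Made-up records shaped like what
// imgs.map(x => ({ src: x.src, srcset: x.srcset })) could return from the page
const records = [
  { src: "https://example.com/236x/a.jpg", srcset: "" },
  {
    src: "https://example.com/236x/b.jpg",
    srcset: "https://example.com/236x/b.jpg 1x, https://example.com/originals/b.jpg 4x",
  },
];

// Hypothetical helper: last srcset candidate if any, otherwise the plain src.
// Assumption: URLs contain no commas.
function bestUrl({ src, srcset }) {
  const entries = (srcset || "").split(",").map(s => s.trim()).filter(Boolean);
  if (entries.length === 0) return src || null;
  return entries[entries.length - 1].split(/\s+/)[0];
}

const urls = records.map(bestUrl).filter(Boolean);
console.log(urls);
```

Logging how many records have a non-empty `srcset` is a quick way to check whether the attribute is missing on the scraped elements or simply not populated yet.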
[Discussion]:
Tags: javascript html node.js puppeteer