【Question title】: Unable to get URL inside existing object using puppeteer
【Posted】: 2020-10-05 16:27:31
【Question】:

Below is my HTML:

<div class="product-thumbShim"></div><a target="_blank" href="tshirts/herenow/herenow-men-black-printed-round-neck-t-shirt/4318138/buy" style="display: block;"><div class="product-imageSliderContainer"><div class="product-sliderContainer" style="display: block;"><div style="background: rgb(244, 255, 249);"><div style="height: 280px; width: 100%;"><picture class="img-responsive" style="width: 100%; height: 100%; display: block;"><source srcset="
    https://assets.myntassets.com/f_webp,dpr_1.0,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg ,
    https://assets.myntassets.com/f_webp,dpr_1.5,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 1.5x,
    https://assets.myntassets.com/f_webp,dpr_1.8,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 1.8x,
    https://assets.myntassets.com/f_webp,dpr_2.0,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.0x,
    https://assets.myntassets.com/f_webp,dpr_2.2,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.2x,
    https://assets.myntassets.com/f_webp,dpr_2.4,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.4x,
    https://assets.myntassets.com/f_webp,dpr_2.6,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.6x,
    https://assets.myntassets.com/f_webp,dpr_2.8,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.8x" type="image/webp"><img src="https://assets.myntassets.com/dpr_2,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg" class="img-responsive" alt="HERE&amp;NOW Men Black Printed Round Neck T-shirt" title="HERE&amp;NOW Men Black Printed Round Neck T-shirt" style="width: 100%; display: block;"></picture></div></div></div></div><div class="product-productMetaInfo"><h3 class="product-brand">HERE&amp;NOW</h3><h4 class="product-product">Men Printed Round Neck T-shirt</h4><h4 class="product-sizes"><!-- react-text: 396 -->Sizes: <!-- /react-text --><span class="product-sizeInventoryPresent">S, </span><span class="product-sizeInventoryPresent">M, </span><span class="product-sizeInventoryPresent">L, </span><span class="product-sizeInventoryPresent">XL, </span><span class="product-sizeInventoryPresent">XXL</span></h4><div class="product-price"><span><span class="product-discountedPrice"><!-- react-text: 405 -->Rs. <!-- /react-text --><!-- react-text: 406 -->374<!-- /react-text --></span><span class="product-strike"><!-- react-text: 408 -->Rs. 
<!-- /react-text --><!-- react-text: 409 -->749<!-- /react-text --></span></span><span class="product-discountPercentage">(50% OFF)</span></div></div></a><div class="image-grid-similarColorsCta product-similarItemCta"><span class="myntraweb-sprite image-grid-similarColorsIcon sprites-similarProductsIcon"></span><span class="image-grid-iconText">VIEW SIMILAR</span></div><div class="product-actions "><span class="product-actionsButton product-wishlist " style="width: 100%; text-align: center;"><!-- react-text: 416 -->wishlist<!-- /react-text --></span></div><div class="product-sizeDisplayDiv"><div class="product-sizeDisplayHeader"><span>Select a size</span><span class="myntraweb-sprite product-sizeDisplayRemoveMark sprites-remove"></span></div><div class="product-sizeButtonsContaier"><button class="product-sizeButton">S</button><button class="product-sizeButton">M</button><button class="product-sizeButton">L</button><button class="product-sizeButton">XL</button><button class="product-sizeButton">XXL</button></div></div>"

Current code:

const res = await page.evaluate(() => {
        const productArry = [...document.querySelectorAll(".product-base")];

        return productArry.map((product) => {
            let productSizeText = product.querySelector(".product-sizes").innerText;
            let productSizeArr = productSizeText
                .replace("Sizes:", "")
                .trim()
                .split(",");

            return {
                imageurl: product.querySelector("div > picture > .img-responsive")
                    .src,
                brandName: product.querySelector(".product-brand").innerText,
                productName: product.querySelector(".product-product").innerText,
                productSizes: productSizeArr,
            };
        });
    });

Is lazy loading the reason I get a null error when reading src from the tag above?

【Comments】:

    Tags: node.js puppeteer


    【Solution 1】:

    Please try the following code snippets:

    // Get the srcset of a <source> tag
    let imageURLArr = await page.evaluate(() => {
        // This takes the first <source> element in the DOM; change the index 0 if the one you need is not the first <source> on the page you are scraping
        let sourceTag = document.getElementsByTagName('source')[0];
        // Check that the element exists
        if (sourceTag) {
            // srcset is one string that lists every candidate image URL
            let imageURLs = sourceTag.getAttribute('srcset');
            return imageURLs;
        }
        // No <source> element found
        return null;
    });
    
    console.log(imageURLArr);
    
    // To get the product brand name you can do this
    await page.waitForSelector('.product-brand');
    // querySelector returns a single element; getElementsByClassName returns a collection, which has no textContent
    const brandName = await page.evaluate(() => document.querySelector('.product-brand').textContent);
    console.log('Brand Name = ' + brandName);
    
    // To get the product Sizes you can do this
    let productSizes = await page.$$eval('.product-sizeInventoryPresent', elements => {
        let sizes = elements.map((element) => element.textContent);
        return sizes;
    });
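
    The srcset value read above is a single string, and the URLs in this HTML contain commas themselves (f_webp,dpr_1.0,...), so splitting on "," would cut them apart. Below is a minimal sketch of one way to split it into candidates. parseSrcset is a hypothetical helper name, not part of puppeteer, and it assumes simple descriptors such as 1.5x (no parentheses).

```javascript
// Hypothetical helper: split a srcset string into { url, descriptor } entries.
// A URL is treated as a run of non-whitespace characters, so commas inside
// the URL are preserved.
function parseSrcset(srcset) {
    const candidates = [];
    let rest = srcset;
    while (true) {
        // Skip whitespace and separator commas before the next candidate
        rest = rest.replace(/^[\s,]+/, '');
        if (rest === '') break;

        // The URL runs up to the next whitespace character
        let url = rest.match(/^\S+/)[0];
        rest = rest.slice(url.length);

        let descriptor = '';
        if (url.endsWith(',')) {
            // "url," with no descriptor: drop the trailing separator comma(s)
            url = url.replace(/,+$/, '');
        } else {
            // Everything up to the next comma is the descriptor (may be empty)
            const rawDescriptor = rest.match(/^[^,]*/)[0];
            descriptor = rawDescriptor.trim();
            rest = rest.slice(rawDescriptor.length);
        }
        candidates.push({ url, descriptor });
    }
    return candidates;
}
```

    Since this is plain string handling, it can run in Node on the value returned from page.evaluate, for example parseSrcset(imageURLArr || '').map((c) => c.url).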
    

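    【Discussion】:

    • I need the URL in an object of this form: { imageurl: null, brandName: 'Rigo', productName: 'Round Neck T-shirt', productPrice: 'Rs. 419Rs. 999(Rs. 580 OFF)', productSizes: ['S','M','L','XL','XXL'] },
    • with the actual imageURL instead of that null
    • @SagarChavan With the help of the code above, you can format it accordingly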
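
    One possible way to do that formatting is sketched below. extractProduct is a hypothetical helper that builds the requested object for a single .product-base element and falls back to the first srcset entry when the img has no src. The 'picture img' and 'picture source' selectors are assumptions based on the HTML in the question, and whether lazy loading is really the cause of the null has not been confirmed in this thread.

```javascript
// Hypothetical helper: build one product object from a ".product-base" element.
// Every lookup is null-safe, so a missing node gives null instead of throwing.
function extractProduct(product) {
    const text = (selector) => {
        const el = product.querySelector(selector);
        return el ? el.textContent.trim() : null;
    };

    const img = product.querySelector('picture img');
    const source = product.querySelector('picture source');

    // Prefer the <img> src; otherwise take the first URL listed in srcset
    let imageurl = img ? img.getAttribute('src') : null;
    if (!imageurl && source) {
        const srcset = source.getAttribute('srcset') || '';
        const first = srcset.trim().split(/\s+/)[0].replace(/,+$/, '');
        imageurl = first || null;
    }

    // Each size span holds text such as "S, ", so strip the separator
    const productSizes = [...product.querySelectorAll('.product-sizeInventoryPresent')]
        .map((el) => el.textContent.replace(',', '').trim());

    return {
        imageurl,
        brandName: text('.product-brand'),
        productName: text('.product-product'),
        productSizes,
    };
}
```

    Because page.evaluate and page.$$eval run their callbacks in the browser, a function defined in Node is not visible there. Either paste the body into the callback, or inject it first with page.addScriptTag({ content: extractProduct.toString() }) and then call page.$$eval('.product-base', (nodes) => nodes.map((n) => extractProduct(n))). If the picture element is only rendered after scrolling, both lookups can still return null until the product has been scrolled into view.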