Scraping Target Prices in Scrapy

【Question title】: Scraping Target Prices in Scrapy
【Posted】: 2020-12-17 02:40:22
【Question】:

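I'm trying to write a web scraper with Scrapy that collects product prices from Target, but the price appears to be loaded by JavaScript, so it is missing from the HTML that Scrapy downloads. I'm considering Selenium, but I'm not sure how to go about it. Do you have any suggestions? My code is below.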

import scrapy


class TargetSpider(scrapy.Spider):
    name = 'target'
    allowed_domains = ['target.com']
    start_urls = ['https://www.target.com/p/red-blend-wine-750ml-bottle-california-roots-8482/-/A-52525405#lnk=sametab']

    def parse(self, response):
        price = response.xpath('/html/body/div[1]/div/div[5]/div/div[2]/div[2]/div[1]/div[1]/div[1]')
        print(price)

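【Comments】:

Your alternatives are to use Selenium (or any other browser automation tool), or to reverse-engineer the page's network requests to find out where the data you need comes from.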
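Since the question asks how Selenium would be used here, the following is a minimal sketch of that route, not a tested solution for Target's site. It assumes Selenium 4 with a local Chrome install, and it reuses the `div[data-test="product-price"]` selector from the Puppeteer answer in this thread, which may stop matching if the markup changes. The names `PRICE_SELECTOR`, `parse_price`, and `fetch_price` are illustrative, not part of any library.

```python
import re
from decimal import Decimal

# Selector borrowed from the Puppeteer answer in this thread (an assumption:
# Target may change its markup at any time).
PRICE_SELECTOR = 'div[data-test="product-price"]'


def parse_price(text):
    """Return the first dollar amount found in `text` as a Decimal, or None."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def fetch_price(url, timeout=15):
    """Render `url` in headless Chrome and return the parsed price (or None)."""
    # Imported lazily so parse_price stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-rendered price element is in the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, PRICE_SELECTOR))
        )
        return parse_price(element.text)
    finally:
        # Always release the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_price(...)` with the product URL from the question would return the price as a `Decimal` if the selector still matches. The explicit wait is the key design choice: reading the element immediately after `driver.get` can run before the price has been rendered.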

Tags: python selenium scrapy

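【Solution 1】:

There are several options for rendering JavaScript pages. If you want to stay with Python, you can use Scrapy-Splash, Selenium, or Playwright.

If you are fine with Node.js, you can use Puppeteer or Playwright.

When I need to render a page, I prefer Puppeteer.

Here is a solution that works for your page.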

npm i puppeteer

File name: price.js

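const puppeteer = require('puppeteer');

(async () => {
  // Launch headless Chromium and open the product page.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.target.com/p/red-blend-wine-750ml-bottle-california-roots-8482/-/A-52525405#lnk=sametab');
  // Wait for the JavaScript-rendered price element before reading it.
  await page.waitForSelector('div[data-test="product-price"]');
  const price = await page.evaluate(() => {
    return document.querySelector('div[data-test="product-price"]').innerText;
  });
  // Print the result; the original script computed it but never output it.
  console.log(price);

  await browser.close();
})();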

【Comments】:

All of the technologies mentioned have Scrapy plugins, not just Splash.
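As one illustration of that point, here is a rough sketch of the settings the scrapy-playwright plugin expects, so the existing spider can stay in Scrapy. It assumes the plugin and a Playwright browser are installed (`pip install scrapy-playwright`, then `playwright install chromium`). `PLAYWRIGHT_META` is a hypothetical convenience name, not a setting the plugin reads.

```python
# settings.py fragment for a Scrapy project using the scrapy-playwright plugin.

# Route HTTP(S) downloads through Playwright's headless browser.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Hypothetical helper (not read by the plugin): pass this dict as `meta=` to
# scrapy.Request so only pages that need JavaScript are rendered in a browser.
PLAYWRIGHT_META = {"playwright": True}
```

With these settings, a spider would presumably yield `scrapy.Request(url, meta={"playwright": True})` from `start_requests` instead of relying on `start_urls`, and `parse` would then receive the rendered HTML, where a selector such as `div[data-test="product-price"]` (taken from the answer above) has a chance of matching. If the price loads late, the plugin's page-method mechanism for waiting on a selector may also be needed.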