【Question title】: Scrapy exports everything in the same row
【Posted】: 2022-01-17 23:09:45
【Question description】:

I am trying to scrape an e-commerce site with this Python file:

import scrapy
from scrapy.item import Field
from scrapy.loader import ItemLoader


class RipleyscraperItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    marca = scrapy.Field()
    descripcion = scrapy.Field()
    precio_normal = scrapy.Field()
    precio_internet = scrapy.Field()
    precio_tarjeta = scrapy.Field()
    vinculo = scrapy.Field()


class RipleySpider(scrapy.Spider):
    name = 'ripley'

    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.80 Chrome/71.0.3578.80 Safari/537.36',
        'FEED_EXPORT_FIELDS': ['marca', 'descripcion', 'precio_normal', 'precio_internet', 'precio_tarjeta', 'vinculo'],
        'CLOSESPIDER_PAGECOUNT': 50
    }


    allowed_domains = ['simple.ripley.cl']
    start_urls = ['https://simple.ripley.cl/otras-categorias/instrumentos-musicales/pianos-y-teclados?source=menu&s=mdco']

    def parse(self, response):
        for products in response.xpath('.//div[@class="catalog-product-item catalog-product-item__container col-xs-6 col-sm-6 col-md-4 col-lg-4"]'):
            item = ItemLoader(RipleyscraperItem(), selector = products)

            item.add_xpath('marca', '//div[@class="catalog-product-details__logo-container"]/div/span/text()')
            item.add_xpath('descripcion', '//div[@class="catalog-product-details__name"]/text()' ) 
            item.add_xpath('precio_normal', '//ul[@class="catalog-prices__list"]/li[@class="catalog-prices__list-price catalog-prices__lowest catalog-prices__line_thru"]/text()')
            item.add_xpath('precio_internet', '//ul[@class="catalog-prices__list"]/li[@class="catalog-prices__offer-price"]/text()')
            item.add_xpath('precio_tarjeta', '//ul[@class="catalog-prices__list"]/li[@class="catalog-prices__card-price"]/text()')
            item.add_xpath('vinculo', '//div[@class="catalog-product-item catalog-product-item__container col-xs-6 col-sm-6 col-md-4 col-lg-4"]/a/@href')

            yield item.load_item()

            next_page = response.xpath('//*[@id="catalog-page"]/div/div[2]/div[4]/nav/ul/li[6]/a/@href')
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

  

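Then I export to CSV:

scrapy runspider ripley_end.py -o tablaripley.csv -t csv

But my CSV output is the following (image: CSV export).

This is not a Scrapy project. It is a single Python file.

If you need more details, I can send them.

Thanks!!!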

【Comments】:

    Tags: python csv scrapy yield


    【Solution 1】:

    Your custom_settings is missing the FEEDS setting. Define custom_settings as shown below, then simply run the script as scrapy runspider ripley_end.py, without the -o argument.

    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.80 Chrome/71.0.3578.80 Safari/537.36',
        # Write the items to tablaripley.csv with the columns in this order.
        'FEEDS': {
            'tablaripley.csv': {
                'format': 'csv',
                'fields': ['marca', 'descripcion', 'precio_normal', 'precio_internet', 'precio_tarjeta', 'vinculo'],
            },
        },
        'CLOSESPIDER_PAGECOUNT': 50,
    }
    

    【Discussion】:
