Exporting to CSV format incorrect in scrapy: answer

【Question title】: Exporting to CSV format incorrect in scrapy
【Posted】: 2015-10-06 18:54:09
【Question】:

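I'm trying to write out a CSV file from a pipeline after scraping, but the formatting is a bit odd: instead of printing top to bottom, it dumps each column all at once after scraping everything on pages 1 and 2. I've attached pipelines.py and one row of the CSV output (it is quite large). How do I get one entry per row instead of a whole page's worth of values in a single cell?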

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy import signals
from scrapy.exporters import CsvItemExporter  # was scrapy.contrib.exporter before Scrapy 1.0

class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline


    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names','stars','subjects','reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()


    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
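The template comment at the top of pipelines.py is the key configuration step: the pipeline only runs once it is listed in ITEM_PIPELINES. A minimal settings.py fragment, assuming a project package named `myproject` (hypothetical; substitute your own module path):

```python
# settings.py (sketch): "myproject" is a placeholder for your project's package.
ITEM_PIPELINES = {
    # The value sets the run order (customarily 0-1000); lower numbers run first.
    'myproject.pipelines.CSVPipeline': 300,
}
```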

And output.csv:

names   stars   subjects
Row 1, column 1 (names): Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel
Row 1, column 2 (stars): 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars
Row 1, column 3 (subjects): Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food

It should look like this:

Vivek0388      5 of 5
NikhilVashisth 5 of 5
DocSharad      5 of 5
...and so on
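Why everything lands in one row: each scraped item carries whole lists (every name on the page, every star rating, ...), and the exporter writes one CSV row per item. For list values, CsvItemExporter joins the elements with a comma (its `join_multivalued` parameter, `','` by default), which produces the giant cells above. The standard-library sketch below imitates that behaviour with a made-up, shortened item; it is an illustration, not Scrapy's own code.

```python
import csv
import io

# A shortened, hypothetical item shaped like the one in the question:
# every field holds the whole page's worth of values.
item = {
    'names': ['Vivek0388', 'NikhilVashisth', 'DocSharad'],
    'stars': ['5 of 5 stars', '4 of 5 stars', '1 of 5 stars'],
    'subjects': ['Best Stay', 'Awesome View... Nice Experience!',
                 'Highly mismanaged and dishonest.'],
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(item.keys())
# One item -> one row; each list is flattened into a single comma-joined cell.
writer.writerow(','.join(values) for values in item.values())
print(buf.getvalue())
```

The second row holds three quoted cells, each containing a whole list, which matches the single oversized row shown in the question.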

Edit:

items = [{'reviews': "", 'subjects': "", 'names': "", 'stars': ""} for k in range(1000)]
if(sites and len(sites) > 0):
    for site in sites:
        i+=1
        items[i]['names'] = item['names']
        items[i]['stars'] = item['stars']
        items[i]['subjects'] = item['subjects']
        items[i]['reviews'] = item['reviews']
        yield Request(url="http://tripadvisor.in" + site, callback=self.parse)
    for k in  range(1000):
        yield items[k]

【Question discussion】:

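Forgot to mention that I changed the settings.
You do know that your scraper stores all the names as one list in your item? (I remember yesterday's question.) Try splitting each entry into its own item to get the result you want. The same applies to all your fields: each of your items is a list of entries.
I tried that, but it didn't help; all I got was a blank document, because whatever I define in my spider gets called again. I think I'll export to JSON and then convert that to CSV, since I'm more used to that. Thanks for your help!
No problem, but as I said, you should handle these results in the spider itself, and then it will work like a charm.
I tried that, but I keep getting an error saying I need to return an Item/Field(). I tried returning a dict, but got another error. It also doesn't work because the call is recursive, so it redefines the dict and wipes it. But I'll try again and do as you said.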
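The commenter's suggestion, emitting one item per review instead of one item holding every list, can be sketched as a plain generator. `split_item` is a hypothetical helper, not a Scrapy API; in a real spider its results would be yielded from the parse callback (Scrapy 1.0 and later accept plain dicts as items). The field names are the four used in the question, and the review texts are made-up placeholders.

```python
FIELDS = ('names', 'stars', 'subjects', 'reviews')


def split_item(page_item, fields=FIELDS):
    """Hypothetical helper: turn one item of parallel lists into one dict per review.

    zip() pairs the i-th value of every field and stops at the shortest list.
    """
    columns = [page_item[field] for field in fields]
    for values in zip(*columns):
        yield dict(zip(fields, values))


page_item = {
    'names': ['Vivek0388', 'NikhilVashisth'],
    'stars': ['5 of 5 stars', '4 of 5 stars'],
    'subjects': ['Best Stay', 'Awesome View... Nice Experience!'],
    'reviews': ['review text 1', 'review text 2'],  # placeholder data
}

for row in split_item(page_item):
    print(row['names'], row['stars'])
# prints:
# Vivek0388 5 of 5 stars
# NikhilVashisth 4 of 5 stars
```

With items shaped like this, the CsvItemExporter pipeline from the question should write one row per review without further changes.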

Tags: python csv web-scraping scrapy export-to-csv

【Solution 1】:

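Figured it out: zip the lists together, then loop over the result and write each row with the csv module. After reading the docs, this turned out to be much simpler.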

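import csv


class CSVPipeline(object):

    def open_spider(self, spider):
        # Python 3: open in text mode with newline='' for the csv module
        # (the original 'wb' mode was correct for Python 2).
        self.file = open('items.csv', 'w', newline='')
        self.csvwriter = csv.writer(self.file, delimiter=',')
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # Each field holds a list; zip() pairs the i-th name with the i-th
        # star rating, subject and review, giving one CSV row per review.
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item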
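One caveat with zip(): it silently stops at the shortest list, so if a page yields, say, three names but only two subjects, the last review is dropped. If padding is preferred, itertools.zip_longest (izip_longest in Python 2) fills the gaps. Padding only fixes the row count; it cannot realign values when an entry is missing from the middle of a list. A small sketch with made-up data:

```python
import csv
import io
from itertools import zip_longest

names = ['Vivek0388', 'NikhilVashisth', 'DocSharad']
stars = ['5 of 5 stars', '4 of 5 stars', '1 of 5 stars']
subjects = ['Best Stay', 'Awesome View... Nice Experience!']  # one subject missing

# Plain zip() drops the third review entirely.
print(len(list(zip(names, stars, subjects))))  # 2

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['names', 'stars', 'subjects'])
# zip_longest keeps every review and pads missing cells with ''.
writer.writerows(zip_longest(names, stars, subjects, fillvalue=''))
print(buf.getvalue())
```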
