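【Posted】: 2015-10-06 18:54:09
【Question】:
I am trying to write a CSV file after scraping, using pipelines, but the formatting is odd: instead of writing the data top to bottom, one row per review, it writes everything scraped from page 1 (and then everything from page 2) at once, with each column's values packed into a single cell. I have attached pipelines.py and one row of the CSV output (it is quite large). How can I write the columns row by row instead of a whole page at once?
pipelines.py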
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names','stars','subjects','reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
and output.csv
names stars subjects
Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel(COLUMN 2) 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars(COLUMN 3) Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food
It should look like this
Vivek0388 5 of 5
NikhilVashisth 5 of 5
DocSharad 5 of 5
...so on
Edit:
items = [{'reviews:':"",'subjects:':"",'names:':"",'stars:':""} for k in range(1000)]
if(sites and len(sites) > 0):
    for site in sites:
        i+=1
        items[i]['names'] = item['names']
        items[i]['stars'] = item['stars']
        items[i]['subjects'] = item['subjects']
        items[i]['reviews'] = item['reviews']
        yield Request(url="http://tripadvisor.in" + site, callback=self.parse)
for k in range(1000):
    yield items[k]
【Comments】:
-
I forgot to mention that I changed the settings.
-
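You do realize that your scraper stores all the names as a single list in your item? (I remember your question from yesterday.) Try splitting each entry into its own item to get the result you want. The same goes for all your other fields: each one holds a list of entries.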
-
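I tried that, but to no avail; all I got was a blank document, because whatever I define in my spider gets called again. I think I will export to JSON and then convert that to CSV, since I am more used to it. Thanks for the help!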
-
No problem, but as I said, you should handle these results in the spider itself, and then it will work like a charm.
-
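I tried that, but I kept getting an error saying I need to return an Item/Field(). I tried returning a dict, but got another error. It also did not work because of the recursive call: each call redefines the dict and wipes it. But I will try again and do as you say.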
Tags: python csv web-scraping scrapy export-to-csv
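The suggestion in the comments (split each list-valued item into one item per review) can be sketched in plain Python, without Scrapy, to show the shape of the data before and after. This is only a minimal sketch under assumptions: the helper names `split_into_rows` and `rows_to_csv` are hypothetical, the four lists are assumed to be aligned by position, and the `reviews` values are placeholders because the review text is not shown in the question.

```python
import csv
import io

FIELDS = ['names', 'stars', 'subjects', 'reviews']


def split_into_rows(page_item, fields=FIELDS):
    """Turn one item holding parallel lists into one dict per review.

    Hypothetical helper: assumes the i-th entry of every list belongs
    to the same review.
    """
    columns = [page_item.get(field, []) for field in fields]
    # zip() pairs the i-th name with the i-th rating, subject and review.
    # It stops at the shortest list, so a missing value shortens the output.
    for values in zip(*columns):
        yield dict(zip(fields, values))


def rows_to_csv(rows, fields=FIELDS):
    """Write the per-review dicts as CSV text, one review per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Shape of the item the spider currently produces: one list per field.
page_item = {
    'names': ['Vivek0388', 'NikhilVashisth', 'DocSharad'],
    'stars': ['5 of 5 stars', '4 of 5 stars', '1 of 5 stars'],
    'subjects': ['Best Stay', 'Awesome View... Nice Experience!',
                 'Highly mismanaged and dishonest.'],
    # Placeholder review text; the real values are not in the question.
    'reviews': ['review 1', 'review 2', 'review 3'],
}

rows = list(split_into_rows(page_item))
csv_text = rows_to_csv(rows)
print(csv_text)
```

In a spider, the same idea would mean yielding one item per row from `parse` rather than one item per page, so that the existing `CSVPipeline` receives a single review on each `process_item` call and writes it as its own line.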