【Posted】: 2013-04-16 18:39:28
【Question】:
I am trying to download images with Scrapy. Here are my files:
items.py
from scrapy.item import Item, Field


class DmozItem(Item):
    title = Field()
    image_urls = Field()
    images = Field()
settings.py
BOT_NAME = 'tutorial'
SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
IMAGES= '/home/mayank/Desktop/sc/tutorial/tutorial'
Spider
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from tutorial.items import DmozItem


class DmozSpider(BaseSpider):
    name = "wikipedia"
    allowed_domains = ["wikipedia.org"]
    start_urls = [
        "http://en.wikipedia.org/wiki/Pune"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        images = hxs.select('//a[@class="image"]')
        for image in images:
            item = DmozItem()
            link = image.select('@href').extract()[0]
            link = 'http://en.wikipedia.com' + link
            item['image_urls'] = link
            items.append(item)
Despite all these settings, my pipeline is not being activated. Please help; I am new to this framework.
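Independent of whether the pipeline is enabled, two details in the spider above would likely keep images from being downloaded: `image_urls` is assigned a single string, while ImagesPipeline expects a list of URLs, and `parse` never returns `items`. The link is also prefixed with wikipedia.com rather than the wikipedia.org host the page came from. On Wikipedia, the `href` of these anchors appears to lead to a file description page rather than the image file itself, so the nested `img` `src` may be the better target; that is an assumption worth checking. Below is a minimal sketch of the URL handling only. `build_image_urls` is a hypothetical helper, not part of Scrapy, and it uses Python 3's `urllib.parse` (on the Python 2 of 2013 the same function lived in `urlparse`).

```python
from urllib.parse import urljoin


def build_image_urls(page_url, hrefs):
    """Resolve each href against the page URL and return a list.

    A list is the shape ImagesPipeline expects in item['image_urls'],
    even when there is only one image.
    """
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical usage inside parse(), with names taken from the spider above:
#     hrefs = image.select('@href').extract()
#     item['image_urls'] = build_image_urls(response.url, hrefs)
#     items.append(item)
# ...and after the loop:
#     return items
```

Resolving against `response.url` avoids hard-coding the host, and it also handles protocol-relative links such as those starting with `//`.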
【Comments】:
-
Do you have PIL (Python Imaging Library) installed? It is a prerequisite for downloading images: doc.scrapy.org/en/latest/topics/images.html
-
How do you know the pipeline is not being activated? Could you include some log output, for example:
2013-04-16 16:40:31-0500 [scrapy] DEBUG: Enabled item pipelines: ImagesPipeline
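One thing worth checking against that log line: the settings in the question define `IMAGES`, but the images pipeline reads its storage directory from `IMAGES_STORE`. As far as I know, the pipeline disables itself when that setting is empty, which would explain it never appearing under "Enabled item pipelines". A sketch of the settings fragment, reusing the path from the question:

```python
# settings.py (fragment): only the setting name changes from the question
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']

# ImagesPipeline looks up IMAGES_STORE; a setting named IMAGES is not
# one it reads, as far as I know
IMAGES_STORE = '/home/mayank/Desktop/sc/tutorial/tutorial'
```

Newer Scrapy releases moved the pipeline to `scrapy.pipelines.images.ImagesPipeline` and expect `ITEM_PIPELINES` to be a dict mapping each path to an order number, so the exact form depends on the installed version.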
Tags: image download scrapy imagedownload