【Question title】: Downloading multiple images through scrapy in a single page
【Posted】: 2014-11-11 06:11:02
【Question】:

Hello, this is my scrapy code for scraping multiple images, but it only scrapes one image.

Can you tell me where I went wrong?

def parse(self, response):
    item = DmozItem()
    image_urls = response.xpath('//div[@class="overhid"]//img/@src').extract()
    item['image_urls'] = [ x for x in image_urls]
    return item

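【Comments】:

  • Could you share the URL you are scraping? Or at least show the contents of the div with class="overhid"? Thanks.
  • Here is the link. With the XPath I can see 6 image links. Is my code correct?

Tags: python web-scraping scrapy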


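【Answer 1】:

The problem is that the other images have a lazysrc attribute instead of src, so an XPath that selects only @src matches just the first image. Select both attributes: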

$ scrapy shell http://www.snapdeal.com/product/xolo-win-q900s-black/770747222
>>> for image in response.xpath('//div[@class="overhid"]//img'):
...     print(image.xpath('@src | @lazysrc').extract()[0])
...
http://n4.sdlcdn.com/imgs/a/k/9/small/Xolo-WIN-Q900s-Black-SDL051074306-1-9dbe9.jpg
http://n1.sdlcdn.com/imgs/a/k/9/small/Xolo-WIN-Q900s-Black-SDL051074306-2-1c8f7.jpg
http://n3.sdlcdn.com/imgs/a/k/9/small/Xolo-WIN-Q900s-Black-SDL051074306-3-09694.jpg
http://n4.sdlcdn.com/imgs/a/k/9/small/Xolo-WIN-Q900s-Black-SDL051074306-4-af867.jpg
http://n4.sdlcdn.com/imgs/a/k/9/small/Xolo-WIN-Q900s-Black-SDL051074306-5-73467.jpg
http://n2.sdlcdn.com/imgs/a/k/9/small/Xolo-WIN-Q900s-Black-SDL051074306-6-5c97f.jpg

Here is how you should change the parse() callback:

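def parse(self, response):
    item = DmozItem()
    # Select every <img> under the div, whichever attribute holds its URL.
    images = response.xpath('//div[@class="overhid"]//img')
    # Take src or lazysrc, whichever the element actually has.
    item['image_urls'] = [image.xpath('@src | @lazysrc').extract()[0]
                          for image in images]
    return item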
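
To see the fallback logic without running a crawl, here is a minimal standard-library sketch. It is not the Scrapy code above: it swaps the XPath for `html.parser` so that it runs with no dependencies. The class name `OverhidImageParser`, the `SAMPLE_HTML` markup, and the example.com URLs are all made up for illustration. One difference to keep in mind: this sketch prefers src when an image has both attributes, whereas the XPath union returns whichever attribute node comes first.

```python
from html.parser import HTMLParser


class OverhidImageParser(HTMLParser):
    """Collect one URL per <img> found inside <div class="overhid">."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # div nesting depth inside the target div (0 = outside)
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter the target div, or track a nested div while already inside it.
            if self.depth or attrs.get("class") == "overhid":
                self.depth += 1
        elif tag == "img" and self.depth:
            # Fall back to lazysrc when src is missing.
            url = attrs.get("src") or attrs.get("lazysrc")
            if url:
                self.image_urls.append(url)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


# Hypothetical markup: only the first image carries a real src attribute.
SAMPLE_HTML = """
<div class="overhid">
  <ul>
    <li><img src="http://example.com/imgs/small/photo-1.jpg"></li>
    <li><img lazysrc="http://example.com/imgs/small/photo-2.jpg"></li>
    <li><img lazysrc="http://example.com/imgs/small/photo-3.jpg"></li>
  </ul>
</div>
<div class="other"><img src="http://example.com/imgs/unrelated.jpg"></div>
"""

parser = OverhidImageParser()
parser.feed(SAMPLE_HTML)
print(parser.image_urls)
```

Selecting only src would return just photo-1 here, which matches the symptom in the question. Inside a spider, the XPath version in parse() is the more natural choice.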

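【Comments】:

  • Hey, what changes do I have to make in my code?
  • Thanks bro, it works great. Could you also tell me one more thing: my image URLs look like "n4.sdlcdn.com/imgs/a/k/9/small/…" and I want to remove the "small" part, because with it I only get the small images. Can you tell me a way to do that?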
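
The thread does not answer the last comment. One possible approach, sketched here with the standard library only, is to drop the `small` path segment from each URL before storing it in `image_urls`. The helper name `drop_path_segment` is hypothetical, and whether the rewritten URL actually serves a larger image on that CDN is an assumption that this thread does not confirm.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_path_segment(url, segment="small"):
    """Return url with every path segment equal to `segment` removed."""
    parts = urlsplit(url)
    # Compare whole segments so that file names containing "small" are left alone.
    kept = [p for p in parts.path.split("/") if p != segment]
    return urlunsplit(parts._replace(path="/".join(kept)))


url = ("http://n4.sdlcdn.com/imgs/a/k/9/small/"
       "Xolo-WIN-Q900s-Black-SDL051074306-1-9dbe9.jpg")
print(drop_path_segment(url))
```

In the parse() callback this could be applied to each extracted URL, for example `drop_path_segment(image.xpath('@src | @lazysrc').extract()[0])`. It is worth opening one rewritten URL in a browser first to check that it resolves.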