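【Posted】: 2015-06-09 03:23:54
【Question】:
I am trying to scrape some attributes from all (#123) of the detail pages listed on this category page - http://stinkybklyn.com/shop/cheese/ - but Scrapy does not follow the link pattern I set. I have also checked the Scrapy documentation and some tutorials, but no luck!
Here is the code: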
import scrapy
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

class Stinkybklyn(CrawlSpider):
    name = "Stinkybklyn"
    allowed_domains = ["stinkybklyn.com"]
    start_urls = [
        "http://stinkybklyn.com/shop/cheese/chandoka",
    ]
    Rule(LinkExtractor(allow=r'\/shop\/cheese\/.*'),
         callback='parse_items', follow=True)

    def parse_items(self, response):
        print "response", response
        hxs = HtmlXPathSelector(response)
        title = hxs.select("//*[@id='content']/div/h4").extract()
        title = "".join(title)
        title = title.strip().replace("\n", "").lstrip()
        print "title is:", title
Can someone tell me what I am doing wrong here?
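One quick way to narrow this down is to test the `allow` pattern on its own. The sketch below uses only the standard `re` module and assumes that `LinkExtractor` applies `allow` as a regex search against each absolute URL. The third URL is a hypothetical non-cheese page added purely for contrast.

```python
import re

# Pattern copied verbatim from the spider in the question.
ALLOW = r'\/shop\/cheese\/.*'

# Sample URLs: the first two come from the question, the third is
# a hypothetical page used only to show a non-matching case.
urls = [
    "http://stinkybklyn.com/shop/cheese/chandoka",
    "http://stinkybklyn.com/shop/cheese/",
    "http://stinkybklyn.com/shop/meat/",
]

# Record whether the pattern is found anywhere in each URL.
matches = {url: bool(re.search(ALLOW, url)) for url in urls}

for url, ok in matches.items():
    print(url, ok)
```

If the cheese URLs match here, the pattern itself is probably not what stops the crawl, and the way the rule is registered on the spider deserves a closer look.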
【Discussion】:
Tags: python web-scraping web-crawler scrapy scrapy-spider