[Posted]: 2015-01-20 19:34:41
[Question]:
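This is my scraping code. I want to scrape data from mouthshut.com that sits inside strong tags. The spider runs and returns title items, but they are blank. Why isn't it extracting any data?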
import scrapy
from scrapy.selector import Selector
from shut.items import ShutItem

class criticspider(scrapy.Spider):
    name = "shut"
    allowed_domains = ["mouthshut.com"]
    start_urls = ["http://www.mouthshut.com/mobile-operators/vodafone-mobile-operator-reviews-925020930"]

    def parse(self, response):
        hxs = Selector(response)
        sites = hxs.select('//li[@class="profile"]')
        items = []
        for site in sites:
            item = ShutItem()
            item['title'] = site.select('//strong[@style=" font-size: 15px;font-weight: 700;"]//a/text()').extract()
            #item['date'] = site.select('div[@class="review_stats"]//div[@class="date"]/text()').extract()
            #item['desc'] = site.select('div[@class="review_body"]//span[@class="blurb blurb_expanded"]/text()').extract()
            items.append(item)
        return items
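A likely culprit is the title XPath itself. Two things stand out. First, inside the loop, site.select('//strong...') begins with //, and in XPath an expression starting with // searches from the document root, not from the current <li>. Second, @style="..." requires the attribute to match character for character, so any difference in spacing on the live page produces an empty list, which matches the "blank titles" symptom. The sketch below uses lxml directly, because Scrapy's selectors evaluate XPath through lxml and the semantics are therefore the same. It runs on a small hypothetical page. The markup, including the slightly different style value, is invented for illustration because the real mouthshut.com HTML is not shown, and this does not rule out other causes such as content rendered by JavaScript.

```python
from lxml import html

# Hypothetical markup for illustration only: the live mouthshut.com HTML is not
# shown in the question. The style value deliberately omits the spaces that the
# question's XPath expects, to show how an exact @style match can silently fail.
SAMPLE_PAGE = """
<html><body><ul>
  <li class="profile">
    <strong style="font-size:15px;font-weight:700;"><a href="#r1">Great network</a></strong>
  </li>
  <li class="profile">
    <strong style="font-size:15px;font-weight:700;"><a href="#r2">Poor coverage</a></strong>
  </li>
</ul></body></html>
"""

# The question's XPath: absolute ('//') and tied to an exact style string.
QUESTION_QUERY = '//strong[@style=" font-size: 15px;font-weight: 700;"]//a/text()'
# Absolute but looser: it matches, yet every item sees every title on the page.
ABSOLUTE_QUERY = '//strong//a/text()'
# Relative ('.//'): searches only inside the current <li>.
RELATIVE_QUERY = './/strong//a/text()'


def extract(page, item_query):
    """Run item_query once per <li class="profile">, mirroring the spider's loop."""
    root = html.fromstring(page)
    return [li.xpath(item_query) for li in root.xpath('//li[@class="profile"]')]


for name, query in [("question", QUESTION_QUERY),
                    ("absolute", ABSOLUTE_QUERY),
                    ("relative", RELATIVE_QUERY)]:
    print(name, extract(SAMPLE_PAGE, query))
```

Inside parse, the corresponding change would be something like site.xpath('.//strong//a/text()').extract(). Here .xpath() replaces the deprecated .select(). If the relative query still returns nothing on the real page, the next step would be to compare the XPath against the HTML that Scrapy actually downloaded, for example in scrapy shell.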
[Discussion]:
Tags: python xpath scrapy web-crawler selector