【Posted】: 2014-12-03 13:08:34
【Question】:
This is my code. I am trying to scrape the reviews from this site, but it fails with an error.
class DomainCrawlSpider(BaseSpider):
    name = "Spider"
    allowed_domains = ["www.smahavarkar.wordpress.com"]
    start_urls = "http://smahavarkar.wordpress.com/"

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//p")
        items = []
        for titles in titles:
            item = DItem()
            item["address"] = titles.select("a/text()").extract()
            item["review1"] = titles.select("p/text()").extract()
            item.append(item)
        return item
【Comments】:
- ValueError: Missing scheme in request url: h
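The trailing `h` in that message suggests the URL was consumed one character at a time. In the code above `start_urls` is a plain string rather than a list, so a loop over it would yield `"h"`, `"t"`, `"t"`, `"p"`, and so on, and `"h"` on its own has no URL scheme. The following is a minimal standard-library sketch of that mechanism, not Scrapy's actual code; the helper name `schemes_seen` is hypothetical and exists only for this illustration.

```python
from urllib.parse import urlparse


def schemes_seen(start_urls):
    """Return (url, scheme) pairs in the order a loop over start_urls sees them.

    Hypothetical helper for illustration only; it is not part of Scrapy.
    """
    return [(url, urlparse(url).scheme) for url in start_urls]


# As written in the question: a bare string, so iteration yields characters.
as_string = "http://smahavarkar.wordpress.com/"

# Assumed fix: wrap the URL in a list so iteration yields the whole URL.
as_list = ["http://smahavarkar.wordpress.com/"]

print(schemes_seen(as_string)[0])  # ('h', '')
print(schemes_seen(as_list)[0])    # ('http://smahavarkar.wordpress.com/', 'http')
```

If this is indeed the cause, declaring `start_urls = ["http://smahavarkar.wordpress.com/"]` in the spider should make the error go away. The code also appends to and returns `item` where `items` was probably intended, which would likely be the next failure.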
Tags: python xpath error-handling web-scraping web-crawler