Python: Scrapy XPath returning an empty array

【Question title】: Python. Scrapy Xpath returning empty array
【Posted】: 2015-10-27 19:25:51
【Question】:

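I'm using Scrapy in Python to scrape information from websites, and I'm only just getting used to using XPath to locate it.

I want to return a list of all the average ratings for this artist's albums from this page: https://rateyourmusic.com/artist/kanye_west

To find the node for the albums I used //div[@id="disco_type_s"], and then tried div[@class="disco_avg_rating"]/text() to search its children for the div with the disco_avg_rating class.

Here is my function: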

def parse_dir_contents(self, response):
    item = rateyourmusicalbums()  # ignore this

    for i in response.xpath('//div[@id="disco_type_s"]'):
        item['average rating']=i.xpath('div[@class="disco_avg_rating"]/text()').extract()
        yield item

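Everything I try to get this list gives me trouble. It's usually more straightforward, but this time I have to distinguish between albums, singles and so on, so I'm struggling.

Thanks for any help, I'm pretty new to web scraping.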

【Comments】:

Tags: python python-2.7 xpath web-scraping scrapy

【Solution 1】:

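response.xpath('//div[@id="disco_type_s"]') matches only one tag (which is usually the case when matching by id in XPath, since ids are meant to be unique), so the loop runs once over the whole container. To get a list of selectors, use something like the following instead:

response.xpath('//div[@id="disco_type_s"]/div[@class="disco_release"]') will match multiple tags, so you can iterate over them.

Then, inside the loop, use './div[@class="disco_avg_rating"]/text()' on each selector to get the average rating.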
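The difference between the two queries can be checked offline. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors (so text is read with .text rather than text()), and the sample markup is a hypothetical simplification of the page, assuming each rating div sits inside a disco_release div as this answer implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure is an assumption.
SAMPLE = """
<html><body>
  <div id="disco_type_s">
    <div class="disco_release">
      <div class="disco_title">Album A</div>
      <div class="disco_avg_rating">3.85</div>
    </div>
    <div class="disco_release">
      <div class="disco_title">Album B</div>
      <div class="disco_avg_rating">4.02</div>
    </div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# The id-based query matches a single container, so a loop over it runs once.
containers = root.findall(".//div[@id='disco_type_s']")

# Looking for the rating among the container's direct children finds nothing,
# because in this markup the rating divs are grandchildren.
direct_children = containers[0].findall("div[@class='disco_avg_rating']")

# Adding the per-release step yields one node per album.
releases = root.findall(".//div[@id='disco_type_s']/div[@class='disco_release']")

# Query relative to each release node to read its own rating.
ratings = [r.find("./div[@class='disco_avg_rating']").text for r in releases]

print(len(containers), direct_children, len(releases), ratings)
```

With Scrapy the same two-step shape applies: iterate over the disco_release selectors, then run the relative './div[@class="disco_avg_rating"]/text()' query on each one.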

【Comments】:

【Solution 2】:

The following should work.

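def parse_dir_contents(self, response):
    # Select the third child div of every release block (presumably the rating).
    for i in response.xpath('//*[@class="disco_release"]/div[3]'):
        item = rateyourmusicalbums()  # item class from the question
        item['average rating'] = i.xpath('text()').extract()
        yield item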

【Comments】:

Use extract_first() instead of extract() to get a single value rather than a list.