How to scrape songs using Scrapy: answers

【Question title】: How to scrape the songs using scrapy
【Posted】: 2016-07-06 22:03:30
【Question description】:

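I am scraping the link http://gaana.com/. I want to get the list of Editor's Pick albums, but I am unable to scrape this link and I don't know what is wrong with my code. My spider code: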

import scrapy
from tutorial.items import GannaItem


class GannaSpider(scrapy.Spider):
    name = 'gannaspider'
    start_urls = ["http://www.songspk.link/"]

    def parse(self, response):
        for sel in response.xpath('/html/body'):
            item = GannaItem()
            item['Albumname'] = sel.xpath('div[4]/div[4]/div[2]/div[1]/div[5]/div/ul/li[1]/div/div[2]/a[1]/span/text()').extract()
            item['link'] = sel.xpath('div[4]/div[4]/div[2]/div[1]/div[3]/div/div[2]/div/ul/li[1]/div/div[2]/a/@href').extract()
        yield item

The output I get is:

{'Albumname': [], 'link': []}

【Question discussion】:

Tags: python-2.7 scrapy

【Solution 1】:

There are a few problems in your code.

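Your XPath expressions are quite complex. You probably generated them with a tool such as Portia. I would rather use class names. Positional indexes (such as div[4]) should be avoided, because an expression that depends on element positions breaks as soon as the page layout shifts; avoiding them makes your XPath expressions more robust. Using class names drastically reduces the complexity, which also makes the expressions easier to debug.
If you use nested selectors (as you do with the for loop), you must then use relative paths (starting with ./). Otherwise the inner expression is evaluated against the whole document instead of the element selected by the loop.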

This code does what you want:

import scrapy
from tutorial.items import GannaItem


class GannaSpider(scrapy.Spider):
    name = 'gannaspider'
    start_urls = ["http://www.songspk.link/"]

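    def parse(self, response):
        # Select every list entry by class name, skipping the header row.
        for sel in response.xpath('//ul[@class="songs-list1"]/li[not(@class="title violett")]'):
            item = GannaItem()
            # Relative paths (.//) keep each lookup inside the current <li>.
            item['Albumname'] = sel.xpath('.//a[@class="link"]//text()').extract()
            item['link'] = sel.xpath('.//a[@class="link"]/@href').extract()
            # Yield inside the loop so that every entry produces an item.
            yield item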

【Discussion】: