Scrapy error: exceptions.AttributeError: 'HtmlResponse' object has no attribute 'urljoin'

【Question title】: Scrapy error: exceptions.AttributeError: 'HtmlResponse' object has no attribute 'urljoin'
【Posted】: 2015-09-17 16:30:27
【Question】:

I installed Scrapy with pip and tried the example from the Scrapy documentation.

I got the error cannot import name xmlrpc_client.

After reading a Stack Overflow question about that error, I fixed it with:

sudo pip uninstall scrapy

sudo pip install scrapy==0.24.2

But now it shows me exceptions.AttributeError: 'HtmlResponse' object has no attribute 'urljoin'

Here is my code:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['https://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract()[0],
            'votes': response.css('.question .vote-count-post::text').extract()[0],
            'body': response.css('.question .post-text').extract()[0],
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }

Can anyone help me?

【Question discussion】:

Tags: python web-scraping scrapy

【Solution 1】:

In Scrapy 0.24.2, the HtmlResponse class does not have the urljoin() method yet. Call urlparse.urljoin() directly, passing the response URL as the base:

full_url = urlparse.urljoin(response.url, href.extract())

Don't forget to import it:

import urlparse
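
As a minimal sketch of what that call does, the snippet below joins a base URL with a relative link outside of Scrapy. The href value is a made-up example, and the try/except import is an assumption added so the snippet also runs on Python 3, where urljoin lives in urllib.parse rather than urlparse.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as in the answer above

# The page the spider starts from (taken from the question).
base = 'https://stackoverflow.com/questions?sort=votes'

# Hypothetical root-relative href, similar to what the CSS selector extracts.
href = '/questions/123/example-title'

# The base URL comes first, the (possibly relative) link second.
full_url = urljoin(base, href)
print(full_url)  # https://stackoverflow.com/questions/123/example-title

# A link that is already absolute is returned unchanged.
print(urljoin(base, 'https://example.com/page'))  # https://example.com/page
```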

Note that the urljoin() alias/helper was added in Scrapy 1.0. This is the related issue:

Add Response.urljoin() helper

And this is what it actually is:

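from six.moves.urllib.parse import urljoin

# Excerpt: in Scrapy this is a method on the Response class, so the
# urljoin() called in its body is the module-level function imported above.
def urljoin(self, url):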
    """Join this Response's url with a possible relative url to form an
    absolute interpretation of the latter."""
    return urljoin(self.url, url)
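
If the same spider has to run on both 0.24 and 1.0, one option is a small wrapper that uses the response's own urljoin() when it exists and falls back to the standard library otherwise. This is only a sketch: absolute_url, OldStyleResponse and NewStyleResponse are hypothetical names invented here, and the two classes are stand-ins for Scrapy responses so the example runs without Scrapy installed.

```python
try:
    from urllib.parse import urljoin as std_urljoin  # Python 3
except ImportError:
    from urlparse import urljoin as std_urljoin  # Python 2


def absolute_url(response, href):
    """Hypothetical helper: resolve href against the response URL."""
    join = getattr(response, 'urljoin', None)
    if callable(join):
        # Scrapy 1.0 style: the response knows how to join URLs itself.
        return join(href)
    # Scrapy 0.24 style: only response.url is available.
    return std_urljoin(response.url, href)


class OldStyleResponse(object):
    """Stand-in for a 0.24 response: it has .url but no urljoin()."""
    def __init__(self, url):
        self.url = url


class NewStyleResponse(OldStyleResponse):
    """Stand-in for a 1.0 response: adds a urljoin() method."""
    def urljoin(self, url):
        return std_urljoin(self.url, url)


start = 'https://stackoverflow.com/questions?sort=votes'
old_result = absolute_url(OldStyleResponse(start), '/questions/1')
new_result = absolute_url(NewStyleResponse(start), '/questions/1')
print(old_result == new_result)  # True
```

In the spider from the question, the call would then read full_url = absolute_url(response, href.extract()).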

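【Discussion】:

But how can the example shown on their website be wrong? When I tried it with urlparse, it gave me urljoin() takes at least 2 arguments (1 given)
@user3437315 That example is for Scrapy 1.0, and you are using 0.24.2
Thanks, but where can I get the documentation for Scrapy 0.24.2? It doesn't seem to be on their website
@user3437315 Try uninstalling Scrapy and installing it again.
I did. I got the cannot import name xmlrpc_client error! I also tried reinstalling six, but that didn't work!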
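
The "takes at least 2 arguments" message quoted in the comments is what the standard-library urljoin raises when it is given only the link and no base URL. The sketch below reproduces that situation under the assumption that this is how the call was written; the href value is a made-up example, and the exact wording of the TypeError differs between Python 2 and Python 3.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

base = 'https://stackoverflow.com/questions?sort=votes'
href = '/questions/123/example-title'  # hypothetical extracted link

try:
    urljoin(href)  # wrong: the base URL is missing
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Correct: unlike response.urljoin(href), the module-level function
# needs the base URL as its first argument.
full_url = urljoin(base, href)
print(full_url)
```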

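【Solution 2】:

The example code you are using is for Scrapy 1.0. Since you have downgraded to 0.24, you need to use urljoin from the urlparse module (from urlparse import urljoin):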

full_url = urljoin(response.url, href.extract())

If you click the "Scrapy 0.24 (old stable)" button above the example, you will get example code for the version of Scrapy you are using.

【Discussion】: