【Question title】: TypeError: cannot create weak reference to 'str' object in Scrapy in Python
【Posted】: 2016-01-21 08:34:26
【Question】:

I wrote the following spider in Python using Scrapy:

#!/usr/bin/python 
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.selector import Selector

class GivenSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        select = Selector(response.body)
        title = select.xpath("//a[@class=listinglink]/@href").extract()
        print title
#       for t in title:
#           title4 = MyItem()
#           title4['content'] = t
#           yield title4

#       filename = response.url.split("/")[-2] + '.html'
#       with open(filename, 'wb') as f:
#           f.write(response.body)

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

d = runner.crawl(GivenSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()

I run it like this:

$ python runTimeSpider.py

This is the output I get:

INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
INFO: Enabled item pipelines: 
INFO: Spider opened
INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG: Telnet console listening on 127.0.0.1:6023
DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "runTimeSpider.py", line 17, in parse
    select = Selector(str(response.body))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/unified.py", line 80, in __init__
    _root = LxmlDocument(response, self._parser)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 24, in __new__
    cache = cls.cache.setdefault(response, {})
  File "/usr/lib/python2.7/weakref.py", line 433, in setdefault
    return self.data.setdefault(ref(key, self._remove),default)
TypeError: cannot create weak reference to 'str' object
ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "runTimeSpider.py", line 17, in parse
    select = Selector(str(response.body))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/unified.py", line 80, in __init__
    _root = LxmlDocument(response, self._parser)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 24, in __new__
    cache = cls.cache.setdefault(response, {})
  File "/usr/lib/python2.7/weakref.py", line 433, in setdefault
    return self.data.setdefault(ref(key, self._remove),default)
TypeError: cannot create weak reference to 'str' object
INFO: Closing spider (finished)
INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 514,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 16284,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 1, 21, 8, 28, 26, 17960),
 'log_count/DEBUG': 3,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'spider_exceptions/TypeError': 2,
 'start_time': datetime.datetime(2016, 1, 21, 8, 28, 24, 986319)}
INFO: Spider closed (finished)

How can I print the titles? The spider fails with this error:

TypeError: cannot create weak reference to 'str' object

【Comments】:

    Tags: python xpath scrapy typeerror


    【Solution 1】:

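    The cause is that you are passing response.body to Selector. response.body is a plain string, not a response object, and you cannot run an XPath query on a bare string. As the traceback shows, Scrapy caches the parsed document in a weak-reference dictionary keyed by the object you pass in, and a str cannot be weakly referenced, hence the TypeError.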
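The mechanism behind the error can be reproduced without Scrapy. The sketch below is only an illustration using the standard library: `FakeResponse` is a hypothetical stand-in for a response object, and the cache is assumed to behave like a `weakref.WeakKeyDictionary`, which is what the `weakref.py` frame in the traceback suggests. It is written for Python 3, while the question uses Python 2.7.

```python
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for a response; not a Scrapy class."""

    def __init__(self, body):
        self.body = body


# A cache keyed by weak references, similar in spirit to the one in the traceback.
cache = weakref.WeakKeyDictionary()

response = FakeResponse("<html></html>")

# Instances of ordinary classes can be weakly referenced, so this succeeds.
cache.setdefault(response, {})

# A str cannot be weakly referenced, so using the body as the key raises TypeError.
try:
    cache.setdefault(response.body, {})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Passing the whole response object keeps the key weak-referenceable, which is why the fix is to hand Scrapy the response rather than its body.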

    So either use

    select = Selector(response)
    

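    or call the XPath query directly on the response object, which provides xpath as a method. Note that the attribute value is quoted below; unquoted, listinglink would be read as an element name rather than a string:

    title = response.xpath("//a[@class='listinglink']/@href").extract()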
    

    【Comments】:
