【Question title】: TypeError: 'Selector' object is not iterable
【Posted】: 2021-10-07 11:30:32
【Question】:

I am trying to run this practice Scrapy code, but it keeps failing with the error TypeError: 'Selector' object is not iterable.

Here is the code:

from scrapy import Spider


class WikiSpider(Spider):
    name = 'wiki'
    allowed_domains = ['wikipedia.com']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States']

    def parse(self, response):
        Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]')[0]
        for tabel in Tabel:        
          state=tabel.xpath('.//tbody/tr/th/a/text()')[1:].extract()
          yield{
                state
            }
         


Here is the error message:

2021-10-07 04:23:39 [scrapy.core.engine] INFO: Spider opened
2021-10-07 04:23:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-10-07 04:23:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2021-10-07 04:23:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://en.wikipedia.org/robots.txt> (referer: None)
2021-10-07 04:23:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States> (referer: None)
2021-10-07 04:23:41 [scrapy.core.scraper] ERROR: Spider error processing <GET https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States> (referer: None)
Traceback (most recent call last):
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 342, in <genexpr>     
    return (_set_referer(r) for r in result or ())
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 40, in <genexpr>    
    return (r for r in result or () if _filter(r))
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>        
    return (r for r in result or () if _filter(r))
  File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "D:\tutorials\WEB scrapping\web scraping practice projects\wikipedia\wikipedia\spiders\wiki.py", line 17, in parse
    for tabel in Tabel:
TypeError: 'Selector' object is not iterable

Thanks in advance for your help.

【Question comments】:

    Tags: python-3.x web-scraping scrapy


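    【Solution 1】:

    When you run Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]'), you get a SelectorList (a list of Selector objects). The [0] at the end of your line then picks the first element of that list,
    which is a single Selector. A single Selector is not iterable, so the for loop raises that exception.

    Change
    Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]')[0]
    to
    Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]')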

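    【Comments】:

    • Thanks for the reply. But after that change I get the following error: unhashable type: 'list'
    • In yield{ state } you are putting a list inside a set. You have to yield a dict, and a dict needs a key, like this: yield{ "data": state }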
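
The set-versus-dict point in the comment above can be reproduced with plain Python, without Scrapy. This is a small sketch; the list contents are placeholder values standing in for whatever .extract() returns.

```python
# Placeholder values standing in for the list returned by .extract().
state = ["Alabama", "Alaska"]

# {state} is a set literal; set members must be hashable, and lists are not.
try:
    item = {state}
    error = None
except TypeError as exc:
    error = str(exc)

# {"state": state} is a dict; its values may be any object, including a list.
item = {"state": state}

print(error)
print(item)
```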
    【Solution 2】:

    Because the item you yield is missing a key:

    from scrapy import Spider
    
    
    class WikiSpider(Spider):
        name = 'wiki'
        allowed_domains = ['wikipedia.org']  # the start URL is on wikipedia.org, not wikipedia.com
        start_urls = [
            'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States']
    
        def parse(self, response):
            Tabel = response.xpath(
                '//table[contains(@class,"wikitable sortable")]')
            for tabel in Tabel:
                state = tabel.xpath('.//tbody/tr/th/a/text()')[1:].extract()
                yield {
                    'state': state
                    }
    

    【Comments】:
