【Posted】: 2018-11-14 11:05:00
【Question】:
I am new to Scrapy, and I get the error below when I run my code.
My code:
import urlparse
from scrapy.http import Request
from scrapy.spiders import BaseSpider

class legco(BaseSpider):
    name = "sec_gov"
    allowed_domains = ["www.sec.gov", "search.usa.gov", "secsearch.sec.gov"]
    start_urls = ["https://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=&filenum=&State=&Country=&SIC=2834&owner=exclude&Find=Find+Companies&action=getcompany"]

#extract home page search results
def parse(self, response):
    for link in response.xpath('//div[@id="seriesDiv"]//table[@class="tableFile2"]/a/@href').extract():
        req = Request(url = link, callback = self.parse_page)
        print link
        yield req

#extract second link search results
def parse_second(self, response):
    for link in response.xpath('//div[@id="seriesDiv"]//table[@class="tableFile2"]//*[@id="documentsbutton"]/a/@href').extract():
        req = Request(url = link, callback = self.parse_page)
        print link
        yield req
As soon as I run this code with scrapy crawl sec_gov, I get this error:
2018-11-14 15:37:26 [scrapy.core.engine] INFO: Spider opened
2018-11-14 15:37:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-14 15:37:26 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-11-14 15:37:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=&filenum=&State=&Country=&SIC=2834&owner=exclude&Find=Find+Companies&action=getcompany> (referer: None)
2018-11-14 15:37:27 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=&filenum=&State=&Country=&SIC=2834&owner=exclude&Find=Find+Companies&action=getcompany> (referer: None)
Traceback (most recent call last):
  File "/home/surukam/.local/lib/python2.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/surukam/.local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 90, in parse
    raise NotImplementedError('{}.parse callback is not defined'.format(self.__class__.__name__))
NotImplementedError: legco.parse callback is not defined
2018-11-14 15:37:27 [scrapy.core.engine] INFO: Closing spider (finished)
Can anyone help me solve this? Thanks in advance.
【Comments】:
-
Is this Python 2 code?
-
Thanks for the reply, dejan. Yes, it is Python 2 code.
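-
The traceback shows Scrapy falling back to the base-class parse, which usually means def parse is not indented inside the class body, so the spider never gets its own parse method. The sketch below reproduces that mechanism without Scrapy: SpiderStandIn, Broken, Fixed and try_parse are hypothetical names made up for illustration, and the stand-in class only copies the error message seen in the traceback, not Scrapy's real implementation.

```python
# Hypothetical stand-in for the Scrapy base spider, reduced to the one
# behaviour visible in the traceback: the default parse raises.
class SpiderStandIn(object):
    def parse(self, response):
        raise NotImplementedError(
            '{}.parse callback is not defined'.format(self.__class__.__name__))


class Broken(SpiderStandIn):
    name = "sec_gov"

# Dedented: this is a plain module-level function, NOT Broken.parse,
# so Broken still inherits the base-class parse that raises.
def parse(self, response):
    return ["handled " + response]


class Fixed(SpiderStandIn):
    name = "sec_gov"

    # Indented under the class: this overrides the base-class parse.
    def parse(self, response):
        return ["handled " + response]


def try_parse(spider, response):
    """Call spider.parse and return its result, or the error text."""
    try:
        return spider.parse(response)
    except NotImplementedError as exc:
        return str(exc)
```

Calling try_parse(Broken(), "page") returns the same kind of message as in the log, while try_parse(Fixed(), "page") returns the handled result. If this is indeed the cause, indenting both def blocks one level under class legco should remove the error. Two further things worth checking afterwards: the callbacks point at self.parse_page, which is not defined anywhere in the posted code (only parse_second is), and the extracted href values may be relative, in which case response.urljoin(link) would be needed before building the Request.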
Tags: python web-scraping scrapy web-crawler scrapy-spider