【Posted】: 2018-09-29 07:27:07
【Question】:
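When a Scrapy downloader middleware does not find what it is looking for in a response, should process_response construct a new Response object and return it, or return the response variable that was passed in?
I tried the latter, but when the middleware is used together with FilesPipeline I keep getting "response has no attribute selector".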
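import os
from PIL import Image  # assumed source of Image

# parse_xpath, download_file and CAPTCHA_PATTERN are helpers not defined in the question.
class CaptchaMiddleware(object):
    def process_response(self, request, response, spider):
        download_path = spider.settings['CAPTCHA_STORE']
        # 1
        captcha_images = parse_xpath(response, CAPTCHA_PATTERN, 'image')
        if captcha_images:
            for url in captcha_images:
                url = response.urljoin(url)
                print("Downloading %s" % url)
                download_file(url, os.path.join(download_path, url.split('/')[-1]))
            for image in os.listdir(download_path):
                # os.listdir() returns bare file names, so join them with the directory
                Image.open(os.path.join(download_path, image))
        # 2
        return response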
If I return at #1, FilesPipeline runs normally and downloads the files, but if I return at #2, I get the error "response has no attribute selector".
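On the first part of the question: process_response is allowed to return the response it was given, so building a new Response is not required. One likely cause of the error, offered here as an assumption rather than a confirmed diagnosis, is that the file requests issued by FilesPipeline also pass through downloader middlewares, and their binary responses are plain Response objects that cannot be queried with XPath. The sketch below shows a guard for that case. To keep it runnable without Scrapy installed, Response and TextResponse are tiny stand-ins for the classes of the same name in scrapy.http, and image_urls is a hypothetical helper standing in for the XPath query.

```python
import re


class Response:
    """Stand-in for scrapy.http.Response: binary body, no selector."""

    def __init__(self, url, body=b""):
        self.url = url
        self.body = body


class TextResponse(Response):
    """Stand-in for scrapy.http.TextResponse: text body that can be queried."""

    def image_urls(self):
        # Hypothetical helper; a real middleware would run an XPath query here.
        return re.findall(r'<img[^>]+src="([^"]+)"', self.body.decode("utf-8"))


class CaptchaMiddleware:
    def __init__(self):
        self.seen = []  # captcha URLs found so far (for illustration only)

    def process_response(self, request, response, spider):
        # Binary responses (for example files fetched for FilesPipeline)
        # cannot be parsed as HTML, so hand them back untouched.
        if not isinstance(response, TextResponse):
            return response
        self.seen.extend(response.image_urls())
        # Returning the response that was passed in is sufficient.
        return response
```

In a real project the two stand-in classes would be replaced by an import from scrapy.http, and the isinstance check would stay as written.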
【Discussion】: