【Posted】:2019-09-05 07:48:16
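【Question】:
Description
My code runs on Python 2, but Scrapy will soon drop Python 2 support. I am trying to migrate to Python 3, but Scrapy seems to have a compatibility issue with binary data in POST requests.
Steps to Reproduce
I am trying to send the following request, using a response.body that contains the binary data of an image.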
yield scrapy.Request(
    u"{}/formrecognizer/v1.0-preview/prebuilt/receipt/asyncBatchAnalyze".format(self.endpoint),
    method='POST',
    body=response.body,
    headers=self.binary_headers,
    callback=self.parse_result_url)
Then I get this error:
Traceback (most recent call last):
  File "c:\python374\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\python374\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\python374\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\python374\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\python374\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\python374\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\python374\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\python374\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\python374\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\Kerja\HIT\Python Projects\<my_project>\receipts\receipts\receipts\spiders\receipt_recognizer.py", line 63, in parse_result_url
    yield scrapy.Request(response.headers['Operation-Location'], headers=self.receipt_headers, callback=self.parse_result)
  File "c:\python374\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
    self._set_url(url)
  File "c:\python374\lib\site-packages\scrapy\http\request\__init__.py", line 63, in _set_url
    raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)
TypeError: Request url must be str or unicode, got bytes:
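Note that the last frame in my own code is line 63 of parse_result_url, where response.headers['Operation-Location'] is passed as the URL of the follow-up request, not the POST itself. On Python 3, Scrapy header values come back as bytes, so one likely fix is to decode the value before building the request. A minimal sketch of that idea, assuming the header is UTF-8 encoded; the helper name header_to_str and the example URL are hypothetical and not part of Scrapy:

```python
def header_to_str(value, encoding="utf-8"):
    """Return a header value as str, decoding it if it arrived as bytes.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for response.headers['Operation-Location'] on Python 3.
raw_location = b"https://example.invalid/operations/123"
location = header_to_str(raw_location)
```

In the spider this would presumably read scrapy.Request(header_to_str(response.headers['Operation-Location']), headers=self.receipt_headers, callback=self.parse_result). I have not confirmed this against the real service.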
Versions
Scrapy : 1.7.3
lxml : 4.4.1.0
libxml2 : 2.9.5
cssselect : 1.1.0
parsel : 1.5.2
w3lib : 1.21.0
Twisted : 19.7.0
Python : 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)]
pyOpenSSL : 19.0.0 (OpenSSL 1.1.1c 28 May 2019)
cryptography : 2.7
Platform : Windows-10-10.0.17134-SP0
【Discussion】:
Tags: python-3.x web-scraping scrapy screen-scraping