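【Question title】: Getting a JSON response from a POST request with Scrapy (Python)
【Posted】: 2021-10-20 07:19:41
【Question】:

I am trying to fetch data from this website with a POST request. I found the POST URL the site uses, but when I send the same request with Scrapy I do not get the same response.

Here is my code: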

import scrapy
from scrapy.http import request
from scrapy.http.request.form import FormRequest
from scrapy.http import FormRequest
import json

class CodeSpider(scrapy.Spider):
    name = 'code'
    allowed_domains = ['code.comcom']
    start_urls = ['https://technet.rapaport.com/HTTP/JSON/RetailFeed/GetDiamonds.aspx']

    def start_requests(self): 
        form_data = {"request":{"header":{"raplink_access_key":"e7d7d61946804c579d02dab565371113","domain":"www.sarvadajewels.com"},"body":{"search_type":"white","shapes":["round"],"size_from":0.1,"size_to":100,"color_from":"D","color_to":"M","clarity_from":"IF","clarity_to":"I1","cut_from":"Excellent","cut_to":"Poor","polish_from":"Excellent","polish_to":"Poor","symmetry_from":"Excellent","symmetry_to":"Poor","labs":[],"fancy_colors":[],"price_total_from":0,"price_total_to":7428404930,"page_number":2,"page_size":"60","sort_by":"price","sort_direction":"asc","currency_code":"INR"}}}
        request_body = json.dumps(form_data)
        yield scrapy.Request('https://technet.rapaport.com/HTTP/JSON/RetailFeed/GetDiamonds.aspx',
                            method="POST",
                            body=request_body,
                            headers={'Content-Type': 'application/json; charset=UTF-8'},callback=self.parse )

    def parse(self, response):
        with open('test.json', 'w') as file:
            file.write(str(response.body))

This is the error I get back:

{'response': {'header': {'error_code': 1001, 'error_message': 'Invalid format'},
              'body': {}}}



Is there any way to get the correct response?

【Comments】:

    Tags: python json post scrapy request


    【Answer 1】:

    The Content-Type header you are sending is not the one this endpoint accepts:

    {'Content-Type': 'application/json; charset=UTF-8'}
    

    It should be:

    {'Content-Type': 'application/x-www-form-urlencoded'}
    

    Full code:

    import json

    import scrapy


    class CodeSpider(scrapy.Spider):
        name = 'code'
        # Should match the host being requested; 'code.comcom' did not.
        allowed_domains = ['technet.rapaport.com']
        start_urls = ['https://technet.rapaport.com/HTTP/JSON/RetailFeed/GetDiamonds.aspx']

        def start_requests(self):
            form_data = {"request":{"header":{"raplink_access_key":"e7d7d61946804c579d02dab565371113","domain":"www.sarvadajewels.com"},"body":{"search_type":"white","shapes":["round"],"size_from":0.1,"size_to":100,"color_from":"D","color_to":"M","clarity_from":"IF","clarity_to":"I1","cut_from":"Excellent","cut_to":"Poor","polish_from":"Excellent","polish_to":"Poor","symmetry_from":"Excellent","symmetry_to":"Poor","labs":[],"fancy_colors":[],"price_total_from":0,"price_total_to":7428404930,"page_number":2,"page_size":"60","sort_by":"price","sort_direction":"asc","currency_code":"INR"}}}
            # The body is still a JSON string; only the Content-Type header changes.
            yield scrapy.Request(
                self.start_urls[0],
                method="POST",
                body=json.dumps(form_data),
                headers={'Content-Type': 'application/x-www-form-urlencoded'},
                callback=self.parse,
            )

        def parse(self, response):
            # Yield the decoded JSON as an item so Scrapy's feed exports can write it.
            yield json.loads(response.text)
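
The reply in the question wraps everything in a `response` object with a `header` and a `body`, so it can help to check the header before yielding anything. The helper below is only a sketch: `extract_body` is a hypothetical name, and treating any non-zero `error_code` as a failure is an assumption based on the single `1001` example above, not something documented in this thread.

```python
import json


def extract_body(raw_text):
    """Return the 'body' part of the reply, or raise if the API reported an error."""
    payload = json.loads(raw_text)
    envelope = payload.get("response", {})
    header = envelope.get("header", {})
    # Assumption: a missing or zero error_code means success.
    error_code = header.get("error_code", 0)
    if error_code:
        raise ValueError(f"API error {error_code}: {header.get('error_message')}")
    return envelope.get("body", {})
```

Inside the spider, `parse` could then `yield extract_body(response.text)` so that an "Invalid format" reply fails loudly instead of being written out as if it were data.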
    
    

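    Also, Scrapy can write the items a spider yields to a file in several formats via the -o flag, so you can use that instead of writing the file yourself in Python. For a standalone spider file, try:

    scrapy runspider <spider_file.py> -o test.json
    

    or, inside a Scrapy project:

    scrapy crawl code -o test.json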
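
If you would rather not pass the output file on the command line each time, the same export can be configured through Scrapy's `FEEDS` setting. This is a sketch, not part of the original answer; it assumes a Scrapy version recent enough to support `FEEDS` and its `overwrite` option, and it reuses the `test.json` name from the commands above.

```python
# settings.py of the project, or the spider's `custom_settings` dict.
# Maps an output path to the options of the feed written there.
FEEDS = {
    "test.json": {
        "format": "json",    # same exporter the -o flag picks for a .json file
        "encoding": "utf8",
        "overwrite": True,   # replace the file on each run instead of appending
    },
}
```

With this in place, running `scrapy crawl code` with no extra flags should write the yielded items to `test.json`.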
    

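    【Comments】:

    • The data is coming through now, but I also want to extract the data from the JSON.
    • What do you mean by that?
    • I have got it working. Thanks for the correction.
    • If this answer is correct, please consider upvoting it and marking it as accepted.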