【Question Title】: Scrapy FormRequest returning 400 error while Python Requests works
【Posted】: 2021-06-26 20:55:30
【Question】:

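Sending a POST request with Scrapy FormRequest results in a 400 error, while the same request made with Python Requests succeeds.

The request headers and params should not be the problem, since they work with Requests. What in Scrapy could be breaking this?

The following code was run in the scrapy shell: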

url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}
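The two calls below do not actually send the same body. A minimal stdlib sketch, assuming FormRequest(formdata=...) form-encodes the dict the way urllib.parse.urlencode does and requests.post(json=...) serializes it with default json.dumps settings, builds both bodies and compares their lengths with the content-length of 102 copied into headers. Note also that Requests only fills in Content-Type for json= when the caller has not set one, so the Requests call sends a JSON body under the copied form-urlencoded content type.

```python
import json
from urllib.parse import urlencode

# Same params as in the question
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST',
}

# Assumed equivalent of the body FormRequest(formdata=params) builds
form_body = urlencode(params)
# Assumed equivalent of the body requests.post(json=params) sends
json_body = json.dumps(params)

COPIED_CONTENT_LENGTH = 102  # value hard-coded in the copied headers

for label, body in (("form", form_body), ("json", json_body)):
    size = len(body.encode("utf-8"))
    print(f"{label}: {size} bytes, matches copied content-length: {size == COPIED_CONTENT_LENGTH}")
```

Run as-is, this reports 56 bytes for the form body and 73 bytes for the JSON body, so the copied value of 102 fits neither.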

Scrapy FormRequest returns a 400 error:

In [10]: req = scrapy.http.FormRequest(
    ...:             url,
    ...:             method='POST',
    ...:             formdata=params,
    ...:             headers=headers)

In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html> (referer: None)
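A plausible explanation, not confirmed in the thread: the copied headers hard-code 'content-length': '102', which does not match the length of the form-encoded params body. Requests recomputes Content-Length from the body it actually sends and overwrites the copied value, whereas Scrapy passes request headers through to Twisted, which likely adds its own Content-Length as well, so the server may receive two conflicting values. RFC 7230 §3.3.3 requires a server to reject such a request with 400 Bad Request. The 'authority', 'method' and 'scheme' entries are HTTP/2 pseudo-headers copied from the browser devtools rather than ordinary request headers. The sketch below uses a hypothetical helper, clean_copied_headers, to drop both kinds before building the request; the FormRequest call is shown in comments only.

```python
# Hypothetical helper: clean a header dict copied from browser devtools
# before passing it to scrapy.FormRequest or scrapy.Request.

# Headers the HTTP client computes itself; a stale copied value
# (e.g. 'content-length': '102') can conflict with the real body.
COMPUTED_HEADERS = {"content-length", "host"}

# HTTP/2 pseudo-headers (":authority", ":method", ":scheme", ":path");
# some copy tools drop the leading colon, so match both forms.
PSEUDO_HEADERS = {"authority", "method", "scheme", "path"}


def clean_copied_headers(headers: dict) -> dict:
    """Return a new dict without pseudo-headers or client-computed
    headers, matching names case-insensitively."""
    drop = COMPUTED_HEADERS | PSEUDO_HEADERS
    return {
        name: value
        for name, value in headers.items()
        if name.lower().lstrip(":") not in drop
    }


# In the scrapy shell (not executed here):
# req = scrapy.FormRequest(url, formdata=params,
#                          headers=clean_copied_headers(headers))
# fetch(req)
```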

Python Requests returns 200, and I can access the content:

In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360

In [18]: r.status_code
Out[18]: 200

【Question Discussion】:

    Tags: python web-scraping python-requests scrapy


    【Solution 1】:

    Since I can't access the URL from here, try whether the following code works. You will also need to add a user agent.

    import scrapy
    
    class ReviewsSpider(scrapy.Spider):
        name = 'reviews'
        # Raw form-encoded POST body, sent as-is
        body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"
    
        def start_requests(self):
            yield scrapy.Request(
                url='https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
                method="POST",
                body=self.body,
                callback=self.parse,
                # No content-length header: it is computed from the body
                headers={
                    'content-type': 'application/x-www-form-urlencoded',
                    'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                    'x-requested-with': 'XMLHttpRequest',
                    # Add a browser 'user-agent' entry here too, as noted above
                },
            )
    
        def parse(self, response):
            # Handle the returned review HTML here
            pass
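The hand-written body string above is ordinary form encoding. A minimal sketch, using a hypothetical dict name poll_fields, rebuilds it from a dict so the fields are easier to read and change; the values are kept as strings so the same dict could also be passed as formdata= to scrapy.FormRequest.

```python
from urllib.parse import urlencode

# Hypothetical dict holding the answer's form fields (values copied verbatim)
poll_fields = {
    "reqNum": "1",
    "isLastPoll": "false",
    "paramSeqId": "0",
    "waitTime": "41",
    "changeSet": "REVIEW_LIST",
    "puid": "YNgN2QokGScAA0-MH9MAAAIQ",
}

# Form-encode the fields; this reproduces the answer's hand-written body
body = urlencode(poll_fields)
print(body)
```

Passing poll_fields as formdata= to scrapy.FormRequest would let Scrapy form-encode the dict, default the method to POST and set the application/x-www-form-urlencoded content type itself, leaving only x-puid, x-requested-with and a user agent to supply by hand.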
    

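    【Discussion】:

    • Thanks, that worked. I still don't know why my previous attempt failed, since I was supplying the headers (including the user agent). Thanks!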