A Python program that fetches response data from an AJAX website? Answer

[Question title]: A Python program that fetches response data from an AJAX website?
[Posted]: 2021-12-30 18:28:26
[Question]:

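Please note that I'm new to programming. These are problems I ran into while learning web scraping with Python. The site I'm using is https://www.mobikwik.com/ (an online recharge and payment site for mobile, DTH and electricity bills), but all I get when scraping is a 403 response. I then figured this might be because the site uses AJAX. My goal is a program that takes a mobile number as user input, passes it to the site's mobile-operator search, and displays the current operator and circle that the page loads. The Python phonenumbers module is no use here, because it cannot tell when a number has been ported to another operator. Any help is appreciated. Thanks.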
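Before the user's input is passed to the site, it helps to normalize it to the bare 10-digit form used in the requests further down (`cn=9820123456`). A minimal sketch, assuming Indian mobile numbers are 10 digits starting with 6 to 9 and may be typed with a `+91`/`91` country code or a leading `0`; `normalize_indian_mobile` is a hypothetical helper name:

```python
import re

# Assumption: a valid Indian mobile number is 10 digits starting with 6-9.
MOBILE_RE = re.compile(r"[6-9]\d{9}")


def normalize_indian_mobile(raw: str) -> str:
    """Return the bare 10-digit number, or raise ValueError if it does not look valid."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, '+', brackets, ...
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # strip the +91 / 91 country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]  # strip the leading trunk prefix 0
    if not MOBILE_RE.fullmatch(digits):
        raise ValueError(f"not a valid Indian mobile number: {raw!r}")
    return digits
```

Typical use: `phone_number = normalize_indian_mobile(input("Mobile number: "))`.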

[Question comments]:

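Have you tried printing out the content of the response (response.content)?
The server is telling you no. I suggest trying Selenium to achieve what you want.
@BrokenBenchmark I'll try that. Thanks.
@Chris Thank you.
@Chris I tried Selenium, but it gives me errors. I tried every locator strategy in find_element(), but each one raises an error. When using find_element(By.TAG_NAME, 'input'), is_displayed() returns False while is_enabled() returns True.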
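Following the first comment's suggestion, it is worth looking at the status code and body even when the server refuses the request. With the standard library's `urllib`, a 403 is raised as `HTTPError`, which still carries the code and the response body. A minimal sketch; `fetch_status_and_body` is a hypothetical helper, and its `opener` parameter exists only so it can be exercised without network access:

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, headers=None, opener=urllib.request.urlopen, timeout=10):
    """Return (status_code, body_bytes) for a GET request, including error responses.

    urllib raises HTTPError for 4xx/5xx answers such as the 403 from the question;
    the exception object still exposes the status code and the response body.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

For example, `status, body = fetch_status_and_body("https://www.mobikwik.com/")` followed by `print(status, body[:500])` shows what the server actually sent back instead of just the status.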

Tags: python web-scraping beautifulsoup python-requests scrapy

[Answer 1]:

There are two XHR requests. I'm not sure which one you want, so I did both. You just need to recreate the requests the browser makes.
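If you would rather not run Scrapy, the same XHR can be recreated with the standard library. A sketch, assuming (untested against the live site) that a subset of the browser headers used in this answer is enough for the server to accept the request; `build_connection_request` and `fetch_json` are illustrative names, not part of any MobiKwik API:

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from this answer; it is an undocumented internal API and may change.
CONNECTION_URL = "https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails"

# Assumption: these headers are sufficient. Accept-Encoding is deliberately left
# out because urllib does not decompress gzip/br responses by itself.
BROWSER_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Origin": "https://www.mobikwik.com",
    "Referer": "https://www.mobikwik.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    ),
    "X-MClient": "0",
}


def build_connection_request(phone_number: str) -> urllib.request.Request:
    """Recreate the getconnectiondetails XHR as a plain GET request."""
    query = urllib.parse.urlencode({"cn": phone_number})
    return urllib.request.Request(f"{CONNECTION_URL}?{query}", headers=BROWSER_HEADERS)


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (raises HTTPError on a 403 etc.)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `data = fetch_json(build_connection_request("9820123456"))`, after which `data["data"]["operatorId"]` should correspond to the value read in the shell session.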

getconnectiondetails:

scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: json_data['data']['operatorId']
Out[7]: 338

In [8]: json_data['data']['circleId']
Out[8]: 15
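Outside the shell, the same extraction can be done on the raw response body with the `json` module. This sketch assumes the `{"data": {"operatorId": ..., "circleId": ...}}` shape seen in the session above and raises a clear error when the API answers with something else; `extract_operator_circle` is a hypothetical helper name:

```python
import json
from typing import Tuple


def extract_operator_circle(body: bytes) -> Tuple[int, int]:
    """Pull operatorId and circleId out of a getconnectiondetails response body."""
    payload = json.loads(body)
    data = payload.get("data") or {}
    try:
        return int(data["operatorId"]), int(data["circleId"])
    except KeyError as missing:
        # The API returned an error message or a different schema.
        raise ValueError(f"response has no {missing} field: {payload!r}") from None
```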

recommendedplans (338 and 15 in the URL are the operatorId and circleId returned by the first request):

scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: for item in json_data['data']['plans']:
   ...:     print(item['id'])
   ...:
1104293
1155779
1155937
1164885
1156067
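The two requests chain together: the IDs from getconnectiondetails go into the recommendedplans path. A minimal sketch of that step, assuming the URL layout and JSON shape shown in the sessions above stay stable; `build_plans_url` and `plan_ids` are hypothetical names:

```python
import urllib.parse
from typing import Any, Dict, List

PLANS_BASE = "https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans"


def build_plans_url(operator_id: int, circle_id: int, phone_number: str) -> str:
    """Insert the IDs from getconnectiondetails into the recommendedplans path."""
    query = urllib.parse.urlencode({"cn": phone_number})
    return f"{PLANS_BASE}/{operator_id}/{circle_id}?{query}"


def plan_ids(payload: Dict[str, Any]) -> List[Any]:
    """Collect the 'id' of every plan, as the loop in the shell session does."""
    return [plan["id"] for plan in payload["data"]["plans"]]
```

Building the URL from the first response, rather than hard-coding `338/15`, keeps the second request correct for numbers on other operators or circles.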

[Comments]:

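The first one is what I need. I had tried the User-Agent and Referer headers but didn't get any response. Thank you, this was really helpful and broadened my horizons. As you may have noticed, the website shows the actual operator name and circle name rather than numbers. Is there a way to get what is displayed on screen instead of the corresponding numbers? Thank you.
No, there is no operator name.
Sir, please visit mobikwik.com and enter any Indian number. It shows you the current operator and the circle in which the SIM is active.
I mean there is no operator name in the API, only the ID.
Yes, there is no operator name in the API; I think they refer to it by number. Is there a way to capture the operator name and circle from the website when it is displayed?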
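Since the API answers only with numeric IDs, one workaround is to keep your own ID-to-name table, filled in by hand from what the website displays for numbers you have already checked. The sketch below assumes such hand-built tables; `OPERATOR_NAMES`, `CIRCLE_NAMES` and `label_connection` are hypothetical names, and the tables are deliberately left empty rather than guessing real operator or circle names:

```python
from typing import Dict, Mapping, Tuple

# Hypothetical lookup tables, to be populated manually (e.g. {operator_id: "name"}).
OPERATOR_NAMES: Dict[int, str] = {}
CIRCLE_NAMES: Dict[int, str] = {}


def label_connection(
    operator_id: int,
    circle_id: int,
    operator_names: Mapping[int, str] = OPERATOR_NAMES,
    circle_names: Mapping[int, str] = CIRCLE_NAMES,
) -> Tuple[str, str]:
    """Translate IDs into display names, falling back to the raw ID when unknown."""
    operator = operator_names.get(operator_id, f"operator #{operator_id}")
    circle = circle_names.get(circle_id, f"circle #{circle_id}")
    return operator, circle
```

The fallback keeps the program usable for IDs that are not yet in the table instead of failing with a KeyError.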