【Question Title】: A Python program that fetches response data from an AJAX website?
【Posted】: 2021-12-30 18:28:26
【Question】:

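Please note that I'm new to programming. These are problems I ran into while learning web scraping with Python. The site I'm using is https://www.mobikwik.com/ (an online recharge and payment site for mobile, DTH and electricity bills), but all I get when scraping is a 403 response. I then figured this might be because the site uses AJAX. My goal is to take a mobile number entered by the user and pass it to the site's mobile-operator search. The page then loads the current operator and circle, and I want to display those in my program. The Python phonenumbers module is useless here if the number has been ported to another operator. Any help is appreciated. Thanks.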

【Discussion】:

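  • Have you tried printing out the content of the response (response.content)?
  • The server is telling you no. I suggest trying selenium to achieve what you want.
  • @BrokenBenchmark I'll try that. Thanks.
  • @Chris Thank you.
  • @Chris I tried selenium, but it gives me errors. I tried every variant of find_element() and they all fail. With find_element(By.TAG_NAME, 'input'), is_displayed() returns False while is_enabled() returns True.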
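A likely cause of that is_displayed() result is that find_element(By.TAG_NAME, 'input') returns the first <input> in the DOM, which can be a hidden field rather than the search box. A minimal sketch, assuming the search box is some visible, enabled <input> (the page's markup has not been checked here); first_visible is a hypothetical helper, not part of Selenium:

```python
def first_visible(elements):
    """Return the first element that is displayed and enabled, or None.

    find_element(By.TAG_NAME, "input") only ever returns the first <input>
    in the page, which may be hidden; scanning all candidates avoids that.
    """
    for element in elements:
        if element.is_displayed() and element.is_enabled():
            return element
    return None
```

With a live driver it could be combined with an explicit wait, e.g. WebDriverWait(driver, 10).until(lambda d: first_visible(d.find_elements(By.TAG_NAME, "input"))), with WebDriverWait imported from selenium.webdriver.support.ui. until() keeps polling until the lambda returns a truthy value and then returns that element, so send_keys() can be called on the result.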

Tags: python web-scraping beautifulsoup python-requests scrapy


【Solution 1】:

There are two XHR requests. I'm not sure which one you want, so I did both. You just need to recreate the requests.

  1. getconnectiondetails:
scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: json_data['data']['operatorId']
Out[7]: 338

In [8]: json_data['data']['circleId']
Out[8]: 15
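
  2. recommendedplans (the 338/15 in the URL path appear to be the operatorId and circleId returned by the first request):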
scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: for item in json_data['data']['plans']:
   ...:     print(item['id'])
   ...:
1104293
1155779
1155937
1164885
1156067
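The same two requests can also be made outside the Scrapy shell with the requests library, starting from a number typed by the user as the question describes. A minimal sketch, assuming the undocumented rapi.mobikwik.com endpoints above still accept these headers (only a subset of the browser headers is sent; copy the full dict back if you get a 403) and that the 338/15 path segments are the operatorId/circleId from the first call. normalize_number, lookup and the +91/leading-0 handling are hypothetical helpers, not part of any API:

```python
import re

# Undocumented endpoints taken from the Scrapy session above; they may change.
CONNECTION_URL = "https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails"
PLANS_URL = ("https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/"
             "recommendedplans/{operator_id}/{circle_id}")

# Subset of the browser headers used above; which ones the server really
# requires is an assumption. Add the rest back if you get a 403.
HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Origin": "https://www.mobikwik.com",
    "Referer": "https://www.mobikwik.com/",
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"),
    "X-MClient": "0",
}


def normalize_number(raw):
    """Reduce input such as '+91 98201-23456' to 10 digits (assumed format)."""
    digits = re.sub(r"\D", "", raw)          # drop spaces, dashes and '+'
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]                  # strip the +91 country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]                  # strip a leading trunk 0
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit mobile number, got {raw!r}")
    return digits


def lookup(raw_number, session=None):
    """Return (operator_id, circle_id, plan_ids) for a mobile number."""
    if session is None:
        import requests  # third-party: pip install requests
        session = requests.Session()
    number = normalize_number(raw_number)

    # Request 1: getconnectiondetails -> operatorId / circleId
    resp = session.get(CONNECTION_URL, params={"cn": number},
                       headers=HEADERS, timeout=10)
    resp.raise_for_status()  # raises on a 403 instead of failing later
    data = resp.json()["data"]
    operator_id, circle_id = data["operatorId"], data["circleId"]

    # Request 2: recommendedplans for that operator/circle pair
    plans_url = PLANS_URL.format(operator_id=operator_id, circle_id=circle_id)
    resp = session.get(plans_url, params={"cn": number},
                       headers=HEADERS, timeout=10)
    resp.raise_for_status()
    plan_ids = [plan["id"] for plan in resp.json()["data"]["plans"]]
    return operator_id, circle_id, plan_ids
```

Usage, with a network connection: operator_id, circle_id, plan_ids = lookup(input("Mobile number: ")). The optional session argument lets you reuse one requests.Session across lookups, or pass in a stand-in object when testing the parsing without the network.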

【Discussion】:

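  • The first one is what I need. I had tried using the User-Agent and Referer headers but didn't get any response. Thank you, this was really helpful and broadened my horizons. As you may have noticed, the site shows the actual operator name and circle name rather than numbers. Is there a way to get what is displayed on screen instead of the corresponding numbers? Thanks.
  • No, there is no operator name.
  • Sir, please visit mobikwik.com and enter any Indian number. It will show you the current operator and the circle in which the SIM is active.
  • I mean there is no operator name in the API, only the ID.
  • Yes, there is no operator name in the API; I think they refer to it by number. Is there a way to capture the operator name and circle from the website as they are displayed?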
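Since the API only returns IDs, one workaround is a local lookup table. The page itself must get the names from somewhere (possibly another XHR or its JavaScript bundle, which the browser's Network tab might reveal), but that has not been verified. A minimal sketch, assuming you fill in the hypothetical OPERATOR_NAMES and CIRCLE_NAMES dicts yourself, e.g. by noting which names the site shows for numbers you know; unknown IDs fall back to the raw number:

```python
# Hypothetical lookup tables: populate them yourself; the API discussed
# above only returns numeric IDs.
OPERATOR_NAMES = {}  # e.g. {338: "<operator name shown on the site>"}
CIRCLE_NAMES = {}    # e.g. {15: "<circle name shown on the site>"}


def describe(operator_id, circle_id, operator_names=None, circle_names=None):
    """Turn numeric IDs into display names, falling back to the raw IDs."""
    operator_names = OPERATOR_NAMES if operator_names is None else operator_names
    circle_names = CIRCLE_NAMES if circle_names is None else circle_names
    operator = operator_names.get(operator_id, f"operator #{operator_id}")
    circle = circle_names.get(circle_id, f"circle #{circle_id}")
    return f"{operator}, {circle}"
```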