Can't scrape data when selecting drop-down values with BeautifulSoup (answer)

【Question title】: Can't scrape the data while selecting drop-down box values using BeautifulSoup
【Posted】: 2021-06-27 02:32:13
【Question】:

I'm scraping data from https://oxygen.digiavidity.com/?fbclid=IwAR3d_HtQPWni0lyHOMQOdokZGg3J7acwYc80EOFX7g8XYHloC550R5BtO94 using BeautifulSoup.

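However, when I select a particular district in the District drop-down box (leaving the other two drop-down boxes at their default values) to get all the supplier names (shown in bold) and contacts for that district, I can't get the data I need.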

Suppose I set the drop-down boxes as follows:

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://oxygen.digiavidity.com/? 
       fbclid=IwAR3d_HtQPWni0lyHOMQOdokZGg3J7acwYc80EOFX7g8XYHloC550R5BtO94"
soup = BeautifulSoup(requests.get(url).content, "lxml")

x=soup.find_all('div',class_='list-group')
for val in x:
   name=val.find('h5',class_='mb-1').text
   contact=val.find('p').text
   print(name)
   print(contact)

Can someone please help me? Thanks in advance!

【Question comments】:

Tags: python web-scraping beautifulsoup

【Solution 1】:

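There's no need to scrape the website, because the data is loaded from an API. BeautifulSoup only parses the HTML returned by your request; it can't interact with the drop-downs, and the supplier records are fetched separately by the page's JavaScript (the request shows up in the Network tab of your browser's developer tools). Instead, call that API directly with requests, decode the JSON body into Python objects with response.json(), and load the result into pandas.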

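import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0; Touch)',
    'Content-type': 'application/json; charset=UTF-8',
}

# Endpoint the page uses to load the supplier records (all districts at once)
response = requests.post('https://oxygen.digiavidity.com/ViewData/All', headers=headers, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of trying to parse an error page
df = pd.DataFrame(response.json())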

Output of df.head():

	_id	Ident	District	Area_Name	Supplier_Name	Supplier_Contact	Updated_date	Updated_Time	Fresh_Cylinder_Availability	Oxygen_Refilling	Additional_Information	Delivery_Range	SPOC	Availability_Status
0	60b91659c21655ec6eac3bf6	1	Kolkata	Kolkata	Swarnabha Dey	9038399847	3-June-2021	8:31 PM	Yes	No	photo identity proof and prescription required	All over West Bengal	Ranita	nan
1	60b91659c21655ec6eac3bf7	2	Bankura	Bankura	Shreyasi(Volunteer)	7866855988	3-Jun-2021	12:57 PM	Yes	Yes	photo identity proof and prescription required	Bankura	Ranita	immediate refilling will be done only in town. Rest will take some time or contacts will be shared
2	60b91659c21655ec6eac3bf8	3	Bankura	Maliyaja, Bankura	Baishali Tiwari	9831935524	20-May-2021	8:14:00 AM	No	No	nan	Bankura	Chirantan	Delivering cylinders only to hospitals
3	60b91659c21655ec6eac3bf9	4	Birbhum	Rampurhat	Deb Bikram Dutta, Tarun Dutta (Don't call before 10am)	9434132232	3-Jun-2021	1:00:00 PM	Yes	Yes	Prescription and Aadhar card required	Rampurhat	Ranita	Both fresh cylinder and refilling available
4	60b9165ac21655ec6eac3bfa	5	Birbhum	Bolpur	Ani	7029177504	3-Jun-2021	13:03:00	Yes	Yes	Whatsapp him the patient details to his number	Bolpur	Ranita	Both fresh cylinder and refilling available

You can then filter by district like this: df[df['District'] == 'Birbhum']
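Since the original goal was only the supplier names and contacts for one district, you can also keep just those two columns after filtering, which mimics picking a value in the District drop-down. This is a minimal sketch assuming the column names shown in the df.head() output above; the sample rows are copied from that output so it runs without calling the API, and suppliers_for is a hypothetical helper name:

```python
import pandas as pd

# A few rows copied from the df.head() output above, so this runs offline.
# In practice df would come from pd.DataFrame(response.json()).
# Contacts are stored as strings here; the real column's dtype may differ.
df = pd.DataFrame({
    "District": ["Kolkata", "Bankura", "Bankura", "Birbhum", "Birbhum"],
    "Supplier_Name": [
        "Swarnabha Dey",
        "Shreyasi(Volunteer)",
        "Baishali Tiwari",
        "Deb Bikram Dutta, Tarun Dutta (Don't call before 10am)",
        "Ani",
    ],
    "Supplier_Contact": ["9038399847", "7866855988", "9831935524", "9434132232", "7029177504"],
})


def suppliers_for(df: pd.DataFrame, district: str) -> pd.DataFrame:
    """Hypothetical helper: mimic choosing `district` in the District drop-down
    and keep only the supplier name and contact columns."""
    mask = df["District"] == district
    return df.loc[mask, ["Supplier_Name", "Supplier_Contact"]].reset_index(drop=True)


birbhum = suppliers_for(df, "Birbhum")
for name, contact in birbhum.itertuples(index=False):
    print(name, contact)
```

The comparison is exact and case-sensitive, so the district string has to match the API's spelling (for example 'Birbhum').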

【Comments】:

How can I print the data without the _id column? @RJAdriaansen
df = df.drop(columns=['_id'])
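If the goal is only to display the table without _id, you can also leave out the row index when printing. A small sketch on two sample rows (a subset of the columns) from the df.head() output above:

```python
import pandas as pd

# Two sample rows from the df.head() output above (subset of columns).
df = pd.DataFrame({
    "_id": ["60b91659c21655ec6eac3bf6", "60b91659c21655ec6eac3bf7"],
    "District": ["Kolkata", "Bankura"],
    "Supplier_Name": ["Swarnabha Dey", "Shreyasi(Volunteer)"],
})

# drop() returns a new DataFrame, so df itself still has its _id column here;
# to_string(index=False) omits the 0, 1, ... row labels from the printout.
table = df.drop(columns=["_id"]).to_string(index=False)
print(table)
```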