【Question title】: Can't scrape the data while selecting drop-down box values using BeautifulSoup
【Posted】: 2021-06-27 02:32:13
【Question】:

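I am using BeautifulSoup to scrape data from the website https://oxygen.digiavidity.com/?fbclid=IwAR3d_HtQPWni0lyHOMQOdokZGg3J7acwYc80EOFX7g8XYHloC550R5BtO94.

However, if I select a specific district from the District drop-down box to get all the supplier names (in bold) and contacts for that district, while leaving the other two drop-down boxes at their default values, I can't get the required data.

Suppose I set the drop-down boxes as follows:

Here is my code: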

import requests
from bs4 import BeautifulSoup

url = "https://oxygen.digiavidity.com/?fbclid=IwAR3d_HtQPWni0lyHOMQOdokZGg3J7acwYc80EOFX7g8XYHloC550R5BtO94"
soup = BeautifulSoup(requests.get(url).content, "lxml")

x=soup.find_all('div',class_='list-group')
for val in x:
   name=val.find('h5',class_='mb-1').text
   contact=val.find('p').text
   print(name)
   print(contact)

Can someone please help me? Thanks in advance!

【Question discussion】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

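    Since the data is loaded from an API, there is no need to scrape the website: the page's HTML does not contain the filtered results. You can fetch the data with requests and parse the JSON into Python objects with response.json(). Then you can load it into pandas.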

    import requests
    import pandas as pd
    
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0; Touch)',
        'Content-type': 'application/json; charset=UTF-8',
    }
    
    response = requests.post('https://oxygen.digiavidity.com/ViewData/All', headers=headers)
    df = pd.DataFrame(response.json())
    

    Result of df.head():

      | _id | Ident | District | Area_Name | Supplier_Name | Supplier_Contact | Updated_date | Updated_Time | Fresh_Cylinder_Availability | Oxygen_Refilling | Additional_Information | Delivery_Range | SPOC | Availability_Status
    0 | 60b91659c21655ec6eac3bf6 | 1 | Kolkata | Kolkata | Swarnabha Dey | 9038399847 | 3-June-2021 | 8:31 PM | Yes | No | photo identity proof and prescription required | All over West Bengal | Ranita | nan
    1 | 60b91659c21655ec6eac3bf7 | 2 | Bankura | Bankura | Shreyasi(Volunteer) | 7866855988 | 3-Jun-2021 | 12:57 PM | Yes | Yes | photo identity proof and prescription required | Bankura | Ranita | immediate refilling will be done only in town. Rest will take some time or contacts will be shared
    2 | 60b91659c21655ec6eac3bf8 | 3 | Bankura | Maliyaja, Bankura | Baishali Tiwari | 9831935524 | 20-May-2021 | 8:14:00 AM | No | No | nan | Bankura | Chirantan | Delivering cylinders only to hospitals
    3 | 60b91659c21655ec6eac3bf9 | 4 | Birbhum | Rampurhat | Deb Bikram Dutta, Tarun Dutta (Don't call before 10am) | 9434132232 | 3-Jun-2021 | 1:00:00 PM | Yes | Yes | Prescription and Aadhar card required | Rampurhat | Ranita | Both fresh cylinder and refilling available
    4 | 60b9165ac21655ec6eac3bfa | 5 | Birbhum | Bolpur | Ani | 7029177504 | 3-Jun-2021 | 13:03:00 | Yes | Yes | Whatsapp him the patient details to his number | Bolpur | Ranita | Both fresh cylinder and refilling available

    You can filter by district like this: df[df['District'] == 'Birbhum']
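The question asked for the supplier names and contacts of one district, so the filter can be combined with a column selection. The following is a minimal sketch, not the answer author's code: `suppliers_in` is a hypothetical helper, the sample rows are copied from the df.head() output above so the example runs without network access, and storing the contacts as strings is an assumption about the API's response.

```python
import pandas as pd

# Sample rows taken from the df.head() output above. In practice `df` would
# come from pd.DataFrame(response.json()); string contacts are an assumption.
df = pd.DataFrame([
    {"District": "Kolkata", "Supplier_Name": "Swarnabha Dey", "Supplier_Contact": "9038399847"},
    {"District": "Bankura", "Supplier_Name": "Shreyasi(Volunteer)", "Supplier_Contact": "7866855988"},
    {"District": "Birbhum", "Supplier_Name": "Ani", "Supplier_Contact": "7029177504"},
])


def suppliers_in(frame, district):
    """Hypothetical helper: supplier names and contacts for one district."""
    # Compare case-insensitively and ignore stray whitespace in either value.
    mask = frame["District"].str.strip().str.casefold() == district.strip().casefold()
    return frame.loc[mask, ["Supplier_Name", "Supplier_Contact"]].reset_index(drop=True)


birbhum = suppliers_in(df, "birbhum")
for row in birbhum.itertuples(index=False):
    print(row.Supplier_Name)
    print(row.Supplier_Contact)
```

The case-insensitive comparison is a design choice for hand-entered data; a plain `==` as in the answer is enough when the district names are known to be consistent.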

    【Discussion】:

    • How can I print the data without the _id column? @RJAdriaansen
    • df = df.drop(columns=['_id'])
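As a small illustration of the reply above, the sketch below drops `_id` from a two-row sample (values taken from the df.head() output) and prints the rest without the index column. The variable name `trimmed` and the use of `errors="ignore"` are additions for the example, not part of the original reply.

```python
import pandas as pd

# Two sample rows with a subset of the columns shown in df.head().
df = pd.DataFrame([
    {"_id": "60b91659c21655ec6eac3bf6", "Ident": 1, "District": "Kolkata"},
    {"_id": "60b91659c21655ec6eac3bf7", "Ident": 2, "District": "Bankura"},
])

# drop() returns a new DataFrame; errors="ignore" avoids a KeyError
# if the `_id` column has already been removed.
trimmed = df.drop(columns=["_id"], errors="ignore")

# to_string(index=False) also hides the 0, 1, 2, ... row labels.
print(trimmed.to_string(index=False))
```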