【Question Title】: Extract data from a website with pandas read_html
【Posted】: 2021-03-01 00:52:58
【Question Description】:

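I am trying to extract data from a website (the URL is in the code below).

The table contains span tags that are messing up the extraction: each table value ends up concatenated with the text of its span tag. I want to extract the cell content and the span content into separate cells. Any help would be greatly appreciated.

Here is the code: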

import pandas as pd

url = "https://www.sqimway.com/lte_band.php"

lte_band = pd.read_html(url)

lte_band[0]
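One way to keep the span text apart is to parse the table yourself instead of relying on the merged cell text that read_html produces. The sketch below uses only the standard library's html.parser and assumes each extra value sits in a <span> inside a <td> or <th>; the real markup of the sqimway page has not been checked here, the class name CellSpanParser is hypothetical, and header cells with rowspan/colspan are not expanded.

```python
from html.parser import HTMLParser


class CellSpanParser(HTMLParser):
    """Collect table rows, keeping each cell's own text apart from its <span> text.

    Assumes markup like <td>2110<span>0</span></td>; text from nested spans
    is merged into the span bucket. rowspan/colspan are not expanded.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # each row is a list of (cell_text, span_text) pairs
        self._row = None      # cells of the <tr> currently being parsed
        self._cell = None     # [cell_parts, span_parts] of the current <td>/<th>
        self._span_depth = 0  # > 0 while inside a <span> within a cell

    @staticmethod
    def _clean(parts):
        # Join the collected text pieces and collapse runs of whitespace.
        return " ".join(" ".join(parts).split())

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = [[], []]
        elif tag == "span" and self._cell is not None:
            self._span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._span_depth:
            self._span_depth -= 1
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append((self._clean(self._cell[0]), self._clean(self._cell[1])))
            self._cell = None
            self._span_depth = 0
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            # Text inside a <span> goes to the span bucket, everything else to the cell.
            self._cell[1 if self._span_depth else 0].append(data)


# Hypothetical sample markup standing in for the real page.
html = """
<table>
  <tr><th>Band</th><th>Downlink (MHz)</th></tr>
  <tr><td>1</td><td>2110<span>0</span></td></tr>
</table>
"""
parser = CellSpanParser()
parser.feed(html)
parser.close()

# Turn each (text, span) pair into two separate columns.
split_rows = [[part for cell in row for part in cell] for row in parser.rows]
for row in split_rows:
    print(row)
```

Each cell comes back as a (text, span_text) pair, so the value and its span land in neighbouring columns; the flattened rows could then be handed to pandas.DataFrame.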

【Question Discussion】:

    Tags: python pandas web-scraping


    【Solution 1】:

    If you have pandas 0.24+, you can use pandas.MultiIndex.to_flat_index() and then join the unique values of each header tuple into a single column name.

    # Take the first table returned by read_html.
    df = lte_band[0]

    # Flatten the two-level header: drop repeated levels from each tuple,
    # sorting by tuple position so the remaining levels keep their order.
    df.columns = [" ".join(sorted(set(q), key=q.index)) for q in df.columns.to_flat_index()]
    

    Output of df.columns:

    Index(['Band', 'Name', 'Mode', 'Downlink (MHz) Low Earfcn',
           'Downlink (MHz) Middle Earfcn', 'Downlink (MHz) High Earfcn',
           'BandwidthDL/UL (MHz)', 'Uplink (MHz) Low Earfcn',
           'Uplink (MHz) Middle Earfcn', 'Uplink (MHz) High Earfcn',
           'Duplex spacing(MHz)', 'Geographicalarea', '3GPPrelease',
           'Channel bandwidth (MHz) 1.4', 'Channel bandwidth (MHz) 3',
           'Channel bandwidth (MHz) 5', 'Channel bandwidth (MHz) 10',
           'Channel bandwidth (MHz) 15', 'Channel bandwidth (MHz) 20'],
          dtype='object')
    

    Formatted, one name per line:

    Band
    Name
    Mode
    Downlink (MHz) Low Earfcn
    Downlink (MHz) Middle Earfcn
    Downlink (MHz) High Earfcn
    BandwidthDL/UL (MHz)
    Uplink (MHz) Low Earfcn
    Uplink (MHz) Middle Earfcn
    Uplink (MHz) High Earfcn
    Duplex spacing(MHz)
    Geographicalarea
    3GPPrelease
    Channel bandwidth (MHz) 1.4
    Channel bandwidth (MHz) 3
    Channel bandwidth (MHz) 5
    Channel bandwidth (MHz) 10
    Channel bandwidth (MHz) 15
    Channel bandwidth (MHz) 20
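The header expression in this answer removes repeated levels from each tuple returned by to_flat_index() while keeping the levels in their original order. A plain-Python sketch of the same step, assuming Python 3.7+ so that dict.fromkeys preserves insertion order, which gives the same result as sorted(set(q), key=q.index); the helper name flatten_header and the sample tuples are hypothetical, chosen to mirror the column names listed above.

```python
def flatten_header(levels):
    """Join one MultiIndex header tuple into a single name, dropping repeated levels.

    dict.fromkeys keeps the first occurrence of each value in insertion order,
    matching sorted(set(levels), key=levels.index) without the sort.
    """
    return " ".join(dict.fromkeys(levels))


# Hypothetical header tuples of the kind to_flat_index() yields for a
# two-row table header (a cell spanning both rows repeats its label).
headers = [
    ("Band", "Band"),
    ("Downlink (MHz)", "Low Earfcn"),
    ("Channel bandwidth (MHz)", "1.4"),
]
print([flatten_header(h) for h in headers])
```

With the DataFrame from read_html, it would be applied as df.columns = [flatten_header(q) for q in df.columns.to_flat_index()].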
    

    【Discussion】:
