[Question title]: Reading CSV with sparsely labeled column headers using pandas
[Posted]: 2018-01-22 02:00:11
[Question description]:

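I have a .csv file that I am trying to read into a pandas DataFrame with multi-row column headers, but the first header row is only sparsely labeled.

For example: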

Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,,
,RH=0.8,,,,RH=0.9,,,
,n_=_60,,,,n_=_29,,,
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o)
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02
1.83E-03,7.54E-02,0.124370072,0.124367091,7.76E-02,8.54E-02,0.121274897,0.121273799,8.46E-02
2.44E-03,7.76E-02,0.136590839,0.136586865,7.99E-02,5.45E-02,0.100995665,0.100994202,5.40E-02
3.05E-03,8.73E-02,0.141422799,0.141417832,8.98E-02,7.57E-02,0.170033442,0.170031614,7.50E-02
3.66E-03,7.29E-02,0.143599074,0.143593115,7.50E-02,0.10001777,0.165468366,0.165466173,9.91E-02

When I read the CSV file,

import pandas as pd
Cosp2 = pd.read_csv(DPath, index_col=0, header=[1, 3])  # DPath: path to the .csv file above
print(Cosp2)

I end up with Unnamed: #_level_0 labels on every column whose first-level header is not explicitly labeled.

             RH=0.8 Unnamed: 2_level_0    Unnamed: 3_level_0  \
nat_freq avrg_sp(T)       avrg_sp(h2o) denoised_avrg_sp(h2o)   
0.00061      0.0840           0.117551              0.117550   
0.00122      0.0749           0.126468              0.126466   
0.00183      0.0754           0.124370              0.124367   
0.00244      0.0776           0.136591              0.136587   
0.00305      0.0873           0.141423              0.141418   
0.00366      0.0729           0.143599              0.143593   

         Unnamed: 4_level_0     RH=0.9 Unnamed: 6_level_0  \
nat_freq       pred_sp(h2o) avrg_sp(T)       avrg_sp(h2o)   
0.00061              0.0864   0.128697           0.163304   
0.00122              0.0770   0.090500           0.200350   
0.00183              0.0776   0.085400           0.121275   
0.00244              0.0799   0.054500           0.100996   
0.00305              0.0898   0.075700           0.170033   
0.00366              0.0750   0.100018           0.165468   

            Unnamed: 7_level_0 Unnamed: 8_level_0  
nat_freq denoised_avrg_sp(h2o)       pred_sp(h2o)  
0.00061               0.163304           0.127553  
0.00122               0.200350           0.089700  
0.00183               0.121274           0.084600  
0.00244               0.100994           0.054000  
0.00305               0.170032           0.075000  
0.00366               0.165466           0.099100 

Is there a way to make pandas propagate the level-0 labels across the unlabeled columns? I would like something that looks like this:

             RH=0.8                                                  \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)   
0.00061      0.0840     0.117551              0.117550       0.0864   
0.00122      0.0749     0.126468              0.126466       0.0770   
0.00183      0.0754     0.124370              0.124367       0.0776   
0.00244      0.0776     0.136591              0.136587       0.0799   
0.00305      0.0873     0.141423              0.141418       0.0898   
0.00366      0.0729     0.143599              0.143593       0.0750   

             RH=0.9                                                  
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)  
0.00061    0.128697     0.163304              0.163304     0.127553  
0.00122    0.090500     0.200350              0.200350     0.089700  
0.00183    0.085400     0.121275              0.121274     0.084600  
0.00244    0.054500     0.100996              0.100994     0.054000  
0.00305    0.075700     0.170033              0.170032     0.075000  
0.00366    0.100018     0.165468              0.165466     0.099100 

[Question comments]:

    Tags: python pandas


    [Solution 1]:

    First, use get_level_values together with to_series to get the first column level as a Series:

    a = Cosp2.columns.get_level_values(0).to_series()
    print (a)
    RH=0.8                            RH=0.8
    Unnamed: 2_level_0    Unnamed: 2_level_0
    Unnamed: 3_level_0    Unnamed: 3_level_0
    Unnamed: 4_level_0    Unnamed: 4_level_0
    RH=0.9                            RH=0.9
    Unnamed: 6_level_0    Unnamed: 6_level_0
    Unnamed: 7_level_0    Unnamed: 7_level_0
    Unnamed: 8_level_0    Unnamed: 8_level_0
    dtype: object
    

    Then use mask to turn every label that startswith 'Unnamed' into NaN, and replace those NaNs with ffill (equivalent to fillna with method='ffill'):

    b = a.mask(a.str.startswith('Unnamed')).ffill()
    print (b)
    RH=0.8                RH=0.8
    Unnamed: 2_level_0    RH=0.8
    Unnamed: 3_level_0    RH=0.8
    Unnamed: 4_level_0    RH=0.8
    RH=0.9                RH=0.9
    Unnamed: 6_level_0    RH=0.9
    Unnamed: 7_level_0    RH=0.9
    Unnamed: 8_level_0    RH=0.9
    dtype: object
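
To see the mask-then-ffill step in isolation, here is a minimal sketch on a standalone Series. The labels are made up for illustration (they only imitate the placeholders pandas generates), so treat this as a demonstration of the idea rather than part of the original answer.

```python
import pandas as pd

# Hypothetical level-0 labels, imitating what pandas produces for a sparse header row.
labels = pd.Series(["RH=0.8", "Unnamed: 2_level_0", "Unnamed: 3_level_0",
                    "RH=0.9", "Unnamed: 5_level_0"])

# mask() replaces every entry where the condition is True with NaN ...
masked = labels.mask(labels.str.startswith("Unnamed"))

# ... and ffill() carries the last real label forward into those gaps.
filled = masked.ffill()

print(filled.tolist())
# ['RH=0.8', 'RH=0.8', 'RH=0.8', 'RH=0.9', 'RH=0.9']
```

Note that this only works when the first column of each group carries the real label; a placeholder that comes before any real label would stay NaN.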
    

    Finally, create a new MultiIndex with from_arrays:

    Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)])
    print (Cosp2)
                 RH=0.8                                                  \
    nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)   
    0.00061      0.0840     0.117551              0.117550       0.0864   
    0.00122      0.0749     0.126468              0.126466       0.0770   
    0.00183      0.0754     0.124370              0.124367       0.0776   
    0.00244      0.0776     0.136591              0.136587       0.0799   
    0.00305      0.0873     0.141423              0.141418       0.0898   
    0.00366      0.0729     0.143599              0.143593       0.0750   
    
                 RH=0.9                                                  
    nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)  
    0.00061    0.128697     0.163304              0.163304     0.127553  
    0.00122    0.090500     0.200350              0.200350     0.089700  
    0.00183    0.085400     0.121275              0.121274     0.084600  
    0.00244    0.054500     0.100996              0.100994     0.054000  
    0.00305    0.075700     0.170033              0.170032     0.075000  
    0.00366    0.100018     0.165468              0.165466     0.099100  
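
The three steps can also be combined into one reusable function. The sketch below is one possible way to do that, not the answerer's code: the helper name fill_sparse_level and its parameters are hypothetical, the sample data is the first two data rows from the question embedded as a string, and it assumes every label in the chosen level is a string.

```python
import io

import pandas as pd

# First rows of the sample file from the question, embedded so the example is self-contained.
CSV_TEXT = """Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,,
,RH=0.8,,,,RH=0.9,,,
,n_=_60,,,,n_=_29,,,
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o)
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02
"""


def fill_sparse_level(columns, level=0, prefix="Unnamed"):
    """Return a new MultiIndex with placeholder labels in `level` forward-filled.

    Hypothetical helper: assumes all labels in `level` are strings and that
    placeholders start with `prefix`.
    """
    # One array of labels per level, in level order.
    arrays = [columns.get_level_values(i) for i in range(columns.nlevels)]
    labels = arrays[level].to_series()
    # Blank out the placeholders, then carry the last real label forward.
    arrays[level] = labels.mask(labels.str.startswith(prefix)).ffill()
    # Keep the original level names (e.g. 'nat_freq' on the second level).
    return pd.MultiIndex.from_arrays(arrays, names=columns.names)


df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, header=[1, 3])
df.columns = fill_sparse_level(df.columns)

# With the labels propagated, a whole RH class can be selected by its level-0 key.
print(df["RH=0.9"])
```

Passing names=columns.names explicitly is a design choice to avoid relying on from_arrays inferring the level names from the individual arrays.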
    

    [Comments]:
