【Question title】: Breaking the dataframe when a particular string is found and creating multiple dataframes from it
【Posted】: 2023-01-12 19:33:23
【Question】:

My data is in the following format:

col_1         col_2                            col_3

NaN            NaN                              NaN
Date         21-04-2022                         NaN
Id            Name                            status
01            A11                              Pass
02            A22                              F_1
03            A33                              P_2
SUMMARY    'Total :$20  Approved $ 10'         NaN
NaN            NaN                             NaN
Date         22-04-2022                        NaN
Id            Name                           status
04            A12                              P_2
05            A23                              F_1
06            A34                              P_2
SUMMARY    'Total :$30  Approved $ 20'         NaN

Expected output: df_1 -

Id            Name                            status
01            A11                              Pass
02            A22                              F_1
03            A33                              P_2
SUMMARY    'Total :$20  Approved $ 10'         NaN

df_2 -

Id            Name                           status
04            A12                              P_2
05            A23                              F_1
06            A34                              P_2
SUMMARY    'Total :$30  Approved $ 20'         NaN

The above is only sample data. My actual data has roughly 24K rows, so many dataframes will be created. How should I approach this?
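One way to approach it, sketched under two assumptions: the data is already loaded into a DataFrame with the columns `col_1`, `col_2`, `col_3` shown above, and every block has exactly one `Id` row followed later by exactly one `SUMMARY` row. The idea is to find the positions of both marker rows and slice between each pair with `iloc`. `split_blocks` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'col_1': [np.nan, 'Date', 'Id', '01', '02', '03', 'SUMMARY',
              np.nan, 'Date', 'Id', '04', '05', '06', 'SUMMARY'],
    'col_2': [np.nan, '21-04-2022', 'Name', 'A11', 'A22', 'A33',
              'Total :$20  Approved $ 10',
              np.nan, '22-04-2022', 'Name', 'A12', 'A23', 'A34',
              'Total :$30  Approved $ 20'],
    'col_3': [np.nan, np.nan, 'status', 'Pass', 'F_1', 'P_2', np.nan,
              np.nan, np.nan, 'status', 'P_2', 'F_1', 'P_2', np.nan],
})

def split_blocks(frame, start='Id', end='SUMMARY', col='col_1'):
    """Hypothetical helper: one DataFrame per start..end block, both rows kept."""
    # Positional indices of the marker rows (NaN cells compare as False)
    starts = np.flatnonzero(frame[col].eq(start).to_numpy())
    ends = np.flatnonzero(frame[col].eq(end).to_numpy())
    # Assumes starts and ends alternate one-to-one, in order
    return [frame.iloc[s:e + 1] for s, e in zip(starts, ends)]

dfs = split_blocks(df)
print(len(dfs))
print(dfs[0])
```

Each returned frame keeps the `Id`/`Name`/`status` row as its first row and the `SUMMARY` row as its last, matching the expected output; the `NaN` and `Date` rows between blocks are dropped.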

【Comments】:

    Tags: python pandas dataframe


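    【Answer 1】:

    You can build a helper boolean mask, turn it into a group id with a cumulative sum, and use that id to split the dataframe into smaller pieces: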

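    import pandas as pd

    df = pd.DataFrame({'col_1': [1, 2, 'Id', 3, 4, 5, 'SUMMARY',
                                 1, 2, 'Id', 3, 4, 5, 'SUMMARY']})

    # True on every 'Id' row (start of a block) and on the row right after
    # each 'SUMMARY' row (start of the rows between blocks);
    # fill_value=False keeps the shifted mask boolean instead of inserting NaN
    mask = df['col_1'].eq('Id') | df['col_1'].eq('SUMMARY').shift(fill_value=False)
    # Running count of boundaries: each 'Id' ... 'SUMMARY' block gets an odd id
    df['group_id'] = mask.cumsum()

    dfs = []
    for group_id in df['group_id'].unique():
        # Odd ids are the wanted blocks; even ids are the rows in between
        if group_id % 2 != 0:
            dfs.append(df[df['group_id'].eq(group_id)])

    print(dfs[0])
    print(dfs[1])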
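With about 24K rows and many blocks, filtering the whole frame once per group id rescans every row for each block. A single `groupby` pass over the same group ids is a common alternative. This is a minimal sketch reusing the answer's sample data and mask, and it makes the same assumption: each block starts with exactly one `Id` row and ends with exactly one `SUMMARY` row.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 'Id', 3, 4, 5, 'SUMMARY',
                             1, 2, 'Id', 3, 4, 5, 'SUMMARY']})

# Same boundary mask as in the answer: 'Id' rows and rows after 'SUMMARY'
mask = df['col_1'].eq('Id') | df['col_1'].eq('SUMMARY').shift(fill_value=False)
group_id = mask.cumsum()

# One groupby pass instead of re-filtering the full frame per group;
# odd ids are the Id..SUMMARY blocks, even ids are the rows between them
dfs = [block for gid, block in df.groupby(group_id) if gid % 2 == 1]

print(len(dfs))
for block in dfs:
    print(block)
```

The blocks keep their original row index, so each piece can still be traced back to its position in the source data.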
    
