【Question title】:Iterate over dataframes and merge by conditions
【Posted】:2021-03-31 04:10:53
【Question】:

I have this dataframe:

  id-input     id-output       Date         Price   Type 
    1            3           20/09/2020      100     ABC
    2            1           20/09/2020      200     ABC
    2            1           21/09/2020      300     ABC
    1            3           21/09/2020      50      AD
    1            2           21/09/2020      40      AD
 

I want to get this output:

  id-inp-ABC  id-out-ABC  Date-ABC    Price-ABC  Type-ABC  id-inp-AD  id-out-AD  Date-AD     Price-AD  Type-AD
  1           3           20/09/2020  10         ABC       2          1          20/09/2020  10        AD
  1'          3           20/09/2020  90         ABC       NaN        NaN        NaN         NaN       NaN
  2           1           20/09/2020  40         ABC       1          2          21/09/2020  40        AD
  2'          1           20/09/2020  160        ABC       NaN        NaN        NaN         NaN       NaN
  2           1           21/09/2020  300        ABC       NaN        NaN        NaN         NaN       NaN
    

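My idea is:

- Split the dataframe into two dataframes by type, then iterate over both and check whether id-input == id-output.

- Check whether the prices are equal; if not, split the row and extract the price. Then rename the columns and merge.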
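The first and last steps of this idea (split by type, rename, merge) could be sketched without explicit loops. This is only an illustration under assumptions: it uses the column names from the sample table, the `-ABC`/`-AD` suffixes from the desired output, and a left merge keyed on ABC `id-input` equal to AD `id-output`. The price-splitting step is left out because the question does not fully specify its rule.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 3, "20/09/2020", 100, "ABC"],
        [2, 1, "20/09/2020", 200, "ABC"],
        [2, 1, "21/09/2020", 300, "ABC"],
        [1, 3, "21/09/2020", 50, "AD"],
        [1, 2, "21/09/2020", 40, "AD"],
    ],
    columns=["id-input", "id-output", "Date", "Price", "Type"],
)

# Step 1: split by Type and suffix every column with its type.
abc = df[df["Type"] == "ABC"].add_suffix("-ABC")
ad = df[df["Type"] == "AD"].add_suffix("-AD")

# Step 2: pair each ABC row with the AD rows whose id-output equals its id-input.
# how="left" keeps ABC rows without an AD partner; their AD columns become NaN.
paired = abc.merge(ad, how="left", left_on="id-input-ABC", right_on="id-output-AD")
print(paired)
```

Because the merge handles all rows at once, no intermediate result is overwritten between iterations.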

import pandas as pd

# df is the dataframe above; this snippet refers to its columns
# as 'type', 'id-inp', 'id-out' and 'price'.
grp = df.groupby('type')

transformed_df_list = []

for idx, frame in grp:
    frame.reset_index(drop=True, inplace=True)
    transformed_df_list.append(frame.copy())

ABC = pd.DataFrame(transformed_df_list[0])
AD = pd.DataFrame(transformed_df_list[1])

for i, row in ABC.iterrows():
    for j, row1 in AD.iterrows():
        if row['id-inp'] == row1['id-out']:
            row_df = pd.DataFrame([row1])
            row_df = row_df.rename(columns={'id-inp': 'id-inp-AD', 'id-out': 'id-out-AD', 'Date': 'Date-AD', 'price': 'price-AD'})
            # output is reassigned on every match, so only the last merge survives
            output = pd.merge(ABC.set_index('id-inp', drop=False), row_df.set_index('id-out-AD', drop=False), how='left', left_on=['id-inp'], right_on=['id-inp-AD'])

But the result has NaN in the id-inp-AD, id-out-AD, Date-AD, Price-AD and Type-AD columns, and row_df contains only the last row:

1            2           21/09/2020      40      A

I also want the iteration to respect the row order, and every insertion into the output dataframe to be sorted by date.
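One possible way to get date ordering is to parse the `Date` strings before sorting, since day-first text such as `20/09/2020` does not sort chronologically as plain strings. The sketch below assumes the `%d/%m/%Y` format seen in the sample data and uses a few rows from the question in shuffled order purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id-input": [2, 1, 2],
        "Date": ["21/09/2020", "20/09/2020", "20/09/2020"],
        "Price": [300, 100, 200],
    }
)

# Parse day-first strings so sorting is chronological, not lexicographic.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# kind="stable" keeps the original relative order of rows that share a date.
ordered = df.sort_values("Date", kind="stable").reset_index(drop=True)
print(ordered)
```

A stable sort is chosen here so that rows with equal dates stay in their input order, which seems to match the "respect the order" requirement.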

【Comments】:

    Tags: python dataframe


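    【Solution 1】:

    The most elegant way to solve this is pandas.DataFrame.pivot. Note that the result has multi-level column names rather than single-level ones. If you need to convert the DataFrame back to single-level column names, see the second answer here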

    import pandas as pd
    
    # Sample rows from the question: id-input, id-output, Date, Price, Type
    rows = [
        [1, 3, '20/09/2020', 100, 'ABC'],
        [2, 1, '20/09/2020', 200, 'ABC'],
        [2, 1, '21/09/2020', 300, 'ABC'],
        [1, 3, '21/09/2020', 50, 'AD'],
        [1, 2, '21/09/2020', 40, 'AD']
    ]
    df = pd.DataFrame(data=rows, columns=["id-input", "id-output", "Date", "Price", "Type"])
    
    # Spread every remaining column into one sub-column per Type value
    df_pivot = df.pivot(columns=["Type"])
    print(df_pivot)
    

    Output

         id-input      id-output             Date              Price      
    Type      ABC   AD       ABC   AD         ABC          AD    ABC    AD
    0         1.0  NaN       3.0  NaN  20/09/2020         NaN  100.0   NaN
    1         2.0  NaN       1.0  NaN  20/09/2020         NaN  200.0   NaN
    2         2.0  NaN       1.0  NaN  21/09/2020         NaN  300.0   NaN
    3         NaN  1.0       NaN  3.0         NaN  21/09/2020    NaN  50.0
    4         NaN  1.0       NaN  2.0         NaN  21/09/2020    NaN  40.0
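If single-level names like those in the question's desired output are wanted, one option is to join the two levels of each column label. This is a sketch, not necessarily the approach of the linked answer; the `-` separator and the `flat` variable name are assumptions chosen to resemble `Price-ABC`.

```python
import pandas as pd

rows = [
    [1, 3, "20/09/2020", 100, "ABC"],
    [2, 1, "20/09/2020", 200, "ABC"],
    [2, 1, "21/09/2020", 300, "ABC"],
    [1, 3, "21/09/2020", 50, "AD"],
    [1, 2, "21/09/2020", 40, "AD"],
]
df = pd.DataFrame(rows, columns=["id-input", "id-output", "Date", "Price", "Type"])
df_pivot = df.pivot(columns="Type")

# Each column label is a tuple such as ("Price", "ABC"); join it into "Price-ABC".
flat = df_pivot.copy()
flat.columns = [f"{name}-{type_}" for name, type_ in flat.columns]
print(flat.columns.tolist())
```

Note that the pivoted rows still keep their original index, so ABC and AD values sit on different rows; the flattening only changes the column labels.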
    

    【Comments】:
