【问题标题】:Sorting and grouping pandas dataframe in a weird mannerSorting and grouping pandas dataframe in a weird manner
【发布时间】:2022-11-20 11:01:34
【问题描述】:

List = [0, 3, 2, 1]

This is the pandas dataframe i have.

TIME DATA DMMA CODE
0 0 days 00:00:00 Layer1 B
1 0 days 00:00:00 Layer2 S
2 0 days 00:08:00 Layer3 B
3 0 days 00:16:00 Layer4 S
4 0 days 00:24:00 Layer5 B
5 0 days 00:53:00 Layer6 S
6 0 days 01:16:00 Layer7 B
7 0 days 01:29:00 Layer8 S

This is the end-result im trying to get.

TIME DATA
0 0 days 00:00:00 , 0 days 00:00:00 Layer1 , Layer2
1 0 days 00:08:00 , 0 days 01:29:00 Layer3 , Layer8
2 0 days 00:24:00 , 0 days 00:53:00 Layer5 , Layer6
3 0 days 01:16:00 , 0 days 00:16:00 Layer7 , Layer4

Basically index of the element inside "List" is the index of datas that have DMMA CODE of "B" and the element itself is the index of the datas that have DMMA CODE of "S".

First element of the list tells that "First CODE:B should be grouped with the first CODE:S"

Second element of the list tells that "Second CODE:B should be grouped with the fourth CODE:S" etc.

I am not familiar with pandas and couldn't even find a way to approach this problem.

【问题讨论】:

  • How would you do this outside of pandas?

标签: python pandas list dataframe


【解决方案1】:

Your data is basically out of order, so let's re-order it:

info = [0, 3, 2, 1]
mask = df['DMMA CODE'].eq('S')
df.loc[mask] = df[mask].iloc[info].values

# Then we can get what you want fairly easily:
groups = df.groupby('DMMA CODE').cumcount()
out = df.groupby(groups).agg(list)
print(out)

Output:

                                 TIME              DATA DMMA CODE
0  [0 days 00:00:00, 0 days 00:00:00]  [Layer1, Layer2]    [B, S]
1  [0 days 00:08:00, 0 days 01:29:00]  [Layer3, Layer8]    [B, S]
2  [0 days 00:24:00, 0 days 00:53:00]  [Layer5, Layer6]    [B, S]
3  [0 days 01:16:00, 0 days 00:16:00]  [Layer7, Layer4]    [B, S]

【讨论】:

  • your solution is really neat!
【解决方案2】:

I have a suggestion to do the sorting using python dictionary, but there are some steps involving pandas as well. First I need to output a dictionary for each DMMA CODE "B" and "S", and each to have the natural order index 0,1,2,3. From the starting df

              TIME    DATA DMMA_CODE
0  0_days_00:00:00  Layer1         B
1  0_days_00:00:00  Layer2         S
2  0_days_00:08:00  Layer3         B
3  0_days_00:16:00  Layer4         S
4  0_days_00:24:00  Layer5         B
5  0_days_00:53:00  Layer6         S
6  0_days_01:16:00  Layer7         B
7  0_days_01:29:00  Layer8         S

Next I append the items according to the desired order, ie, for DMMA CODE "B" with order 0,1,2,3 whereas for DMMA CODE "S" with order 0,3,2,1.

import pandas as pd

df = df.sort_values('DMMA_CODE')
b = df[df['DMMA_CODE']=='B'].reset_index(drop=True).to_dict()
s = df[df['DMMA_CODE']=='S'].reset_index(drop=True).to_dict()

d = {}
time, data = [], []
order_list = [0,3,2,1]
for i, order in enumerate(order_list):
    time.append(b['TIME'][i] + ' , ' + s['TIME'][order])
    data.append(b['DATA'][i] + ' , ' + s['DATA'][order])
d['TIME'] = time
d['DATA'] = data
print(pd.DataFrame(d))

Output:

                                TIME             DATA
0  0_days_00:00:00 , 0_days_00:00:00  Layer1 , Layer2
1  0_days_00:08:00 , 0_days_01:29:00  Layer3 , Layer8
2  0_days_00:24:00 , 0_days_00:53:00  Layer5 , Layer6
3  0_days_01:16:00 , 0_days_00:16:00  Layer7 , Layer4

【讨论】:

    【解决方案3】:

    Here is an alternative way using sort_values()

    ndf = (df.sort_values('DMMA CODE')
    .sort_values('DMMA CODE',key = lambda x: [i for i in list(range(0,4)) + list(np.array(l))]))
    ndf.groupby(ndf['DMMA CODE'].eq('B').cumsum()).agg(', '.join)
    

    or

    df.loc[[j for i in zip(b,[i[-1] for i in sorted(zip(l,s))]) for j in i]].reset_index(drop=True).groupby(lambda x: x//2).agg(list)
    

    or

    df.groupby(df.groupby('DMMA CODE').cumcount().mask(df['DMMA CODE'].eq('S'),lambda x: x.map(dict(enumerate(l))))).agg(list)
    

    Output:

                                           TIME            DATA DMMA CODE
    DMMA CODE                                                            
    1          0 days 00:00:00, 0 days 00:00:00  Layer1, Layer2      B, S
    2          0 days 00:08:00, 0 days 01:29:00  Layer3, Layer8      B, S
    3          0 days 00:24:00, 0 days 00:53:00  Layer5, Layer6      B, S
    4          0 days 01:16:00, 0 days 00:16:00  Layer7, Layer4      B, S
    

    【讨论】:

      猜你喜欢
      • 2022-12-02
      • 2022-12-28
      • 2022-12-01
      • 2022-12-02
      • 2022-12-27
      • 2022-12-01
      • 2022-12-01
      • 2022-12-02
      • 2018-12-10
      相关资源
      最近更新 更多