【Question title】: Reshape dataframe using pandas melt to get two value columns
【Posted】: 2022-01-16 18:08:36
【Question description】:

My data is in the following format:

import pandas as pd

x = pd.DataFrame([
    {'date': '2011-01-01', 'col1': '1','col2': '5', 'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
    {'date': '2011-01-02', 'col1': '1','col2': '9', 'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
    {'date': '2011-01-03', 'col1': '3','col2': '3', 'A_Q': '0', 'A_W': 'cc', 'B_Q': '4', 'B_W': 'vv'},
    {'date': '2011-02-04', 'col1': '4','col2': '1', 'A_Q': '3', 'A_W': 'dd', 'B_Q': '5', 'B_W': 'gg'},
])
    date      col1 col2 A_Q A_W  B_Q B_W
0   2011-01-01  1    5   1   aa   2  zz
1   2011-01-02  1    9   -1  bb   3  rr
2   2011-01-03  3    3   0   cc   4  vv
3   2011-02-04  4    1   3   dd   5  gg

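I would like to reshape the dataframe using melt or a similar function so that the result has two value columns. Any ideas on how to do this without splitting the input array? The desired output is: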


     date     col1 col2 VAR Q   W
0   2011-01-01  1   5   A   1   aa
1   2011-01-01  1   5   B   2   zz
2   2011-01-02  1   9   A   -1  bb
3   2011-01-02  1   9   B   3   rr
4   2011-01-03  3   3   A   0   cc
5   2011-01-03  3   3   B   4   vv
6   2011-02-04  4   1   A   3   dd
7   2011-02-04  4   1   B   5   gg

【Question comments】:

    Tags: python pandas melt


    【Solution 1】:

    The first idea is to create a MultiIndex by splitting the column names on _ and then reshape with DataFrame.stack:

    df = x.set_index(['date','col1', 'col2'])
    df.columns = df.columns.str.split('_', expand=True)
    df = df.stack(0).reset_index().rename(columns={'level_3':'VAR'})
    print (df)
             date col1 col2 VAR   Q   W
    0  2011-01-01    1    5   A   1  aa
    1  2011-01-01    1    5   B   2  zz
    2  2011-01-02    1    9   A  -1  bb
    3  2011-01-02    1    9   B   3  rr
    4  2011-01-03    3    3   A   0  cc
    5  2011-01-03    3    3   B   4  vv
    6  2011-02-04    4    1   A   3  dd
    7  2011-02-04    4    1   B   5  gg
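
To see what the column split produces before stacking, here is a minimal sketch. It assumes a two-row subset of the question's data, and the name `cols` is only for illustration. Splitting with expand=True turns the flat names into a two-level MultiIndex, and stack(0) then moves the first level (A/B) into the rows.

```python
import pandas as pd

# Two-row subset of the question's data (assumption: same column layout).
x = pd.DataFrame([
    {'date': '2011-01-01', 'col1': '1', 'col2': '5',
     'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
    {'date': '2011-01-02', 'col1': '1', 'col2': '9',
     'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
])

df = x.set_index(['date', 'col1', 'col2'])

# Index.str.split with expand=True returns a MultiIndex.
cols = df.columns.str.split('_', expand=True)
print(cols.tolist())
# [('A', 'Q'), ('A', 'W'), ('B', 'Q'), ('B', 'W')]
```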
    

    Or use wide_to_long and then reshape with DataFrame.stack and Series.unstack:

    df = (pd.wide_to_long(x,stubnames=["A","B"],
                           i=['date','col1', 'col2'],
                           j="new", sep="_",suffix=".*")
            .stack()
            .unstack(-2)
            .reset_index()
            .rename(columns={'level_3':'VAR'}))
    print (df)
    new        date col1 col2 VAR   Q   W
    0    2011-01-01    1    5   A   1  aa
    1    2011-01-01    1    5   B   2  zz
    2    2011-01-02    1    9   A  -1  bb
    3    2011-01-02    1    9   B   3  rr
    4    2011-01-03    3    3   A   0  cc
    5    2011-01-03    3    3   B   4  vv
    6    2011-02-04    4    1   A   3  dd
    7    2011-02-04    4    1   B   5  gg
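
The extra stack/unstack step can be motivated with a short sketch of the intermediate result. It assumes a two-row subset of the question's data, and `wide` is an illustrative name. With stubnames A and B, wide_to_long keeps A and B as columns and puts the suffixes (Q, W) into the new index level, which is the opposite of the desired layout.

```python
import pandas as pd

# Two-row subset of the question's data (assumption: same column layout).
x = pd.DataFrame([
    {'date': '2011-01-01', 'col1': '1', 'col2': '5',
     'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
    {'date': '2011-01-02', 'col1': '1', 'col2': '9',
     'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
])

wide = pd.wide_to_long(x, stubnames=["A", "B"],
                       i=['date', 'col1', 'col2'],
                       j="new", sep="_", suffix=".*")

# The stubs stay as columns; the suffixes live in the 'new' index level.
print(wide.columns.tolist())
print(list(wide.index.names))
print(sorted(set(wide.index.get_level_values('new'))))
```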
    

    Or swap the two parts around the _ (so first_last becomes last_first, e.g. A_Q becomes Q_A); then wide_to_long alone is enough:

    df1 = x.copy()
    df1.columns = df1.columns.str.replace(r'(\w+)_(\w+)', r'\2_\1', regex=True)
    
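    # Alternative to the regex line above (thanks to sammywemmy). Use one or the other,
    # not both: applying both swaps the names back to the original order.
    # df1.columns = df1.columns.str.split('_').str[::-1].str.join('_')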
    
    df1 = pd.wide_to_long(df1,stubnames=["Q","W"],
                            i=['date','col1', 'col2'],
                            j="VAR", sep="_",suffix=".*").reset_index()
    print (df1)
             date col1 col2 VAR   Q   W
    0  2011-01-01    1    5   A   1  aa
    1  2011-01-01    1    5   B   2  zz
    2  2011-01-02    1    9   A  -1  bb
    3  2011-01-02    1    9   B   3  rr
    4  2011-01-03    3    3   A   0  cc
    5  2011-01-03    3    3   B   4  vv
    6  2011-02-04    4    1   A   3  dd
    7  2011-02-04    4    1   B   5  gg
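
Since the question asks about melt specifically, the same reshape can also be sketched with melt followed by a pivot. This is only one possible approach under stated assumptions, not the answer's method: the helper names `id_cols`, `long`, `name`, `field` and `out` are made up for illustration, and passing a list to the `index` argument of pivot assumes pandas 1.1 or later.

```python
import pandas as pd

x = pd.DataFrame([
    {'date': '2011-01-01', 'col1': '1', 'col2': '5', 'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
    {'date': '2011-01-02', 'col1': '1', 'col2': '9', 'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
    {'date': '2011-01-03', 'col1': '3', 'col2': '3', 'A_Q': '0', 'A_W': 'cc', 'B_Q': '4', 'B_W': 'vv'},
    {'date': '2011-02-04', 'col1': '4', 'col2': '1', 'A_Q': '3', 'A_W': 'dd', 'B_Q': '5', 'B_W': 'gg'},
])

id_cols = ['date', 'col1', 'col2']

# Step 1: melt everything except the id columns into name/value pairs.
long = x.melt(id_vars=id_cols, var_name='name')

# Step 2: split names such as 'A_Q' into the group ('A') and the field ('Q').
long[['VAR', 'field']] = long['name'].str.split('_', expand=True)

# Step 3: pivot the field back out so that Q and W become separate columns.
out = (long.pivot(index=id_cols + ['VAR'], columns='field', values='value')
           .reset_index()
           .rename_axis(columns=None))
print(out)
```

The values stay strings here because the input columns are strings; pd.to_numeric can be applied to Q afterwards if numbers are needed.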
    

    【Comments】:

    • In my opinion, a simpler option is x.columns.str.split('_').str[::-1].str.join('_') ... instead of the regular expression.
    【Solution 2】:

    One option is pivot_longer from pyjanitor:

    # pip install pyjanitor
    import pandas as pd
    import janitor
    x.pivot_longer(index = ['date', 'col1', 'col2'], 
                   names_to = ('VAR', '.value'), 
                   names_sep='_', 
                   sort_by_appearance=True)
     
             date col1 col2 VAR   Q   W
    0  2011-01-01    1    5   A   1  aa
    1  2011-01-01    1    5   B   2  zz
    2  2011-01-02    1    9   A  -1  bb
    3  2011-01-02    1    9   B   3  rr
    4  2011-01-03    3    3   A   0  cc
    5  2011-01-03    3    3   B   4  vv
    6  2011-02-04    4    1   A   3  dd
    7  2011-02-04    4    1   B   5  gg
    

    【Comments】:
