【Question Title】: Split and Pivot a Data Frame
【Posted】: 2015-08-04 18:33:52
【Question】:

I have a pandas DataFrame with the following values:

import pandas as pd

df1 = pd.DataFrame([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000], [0, 3, 0, 3, 0, 3, 0, 3, 0, 3], [233, 233, 96, 96, 53, 53, 29, 29, 24, 24], [251.109065, 251.109065, 300.141548, 412.916402, 291.836682, 394.260558, 327.351227, 478.924355, 371.598847, 574.811102], [18.858343, 18.858343, 67.874851, -127.405555, 58.692756, -148.001670, 95.252774, -63.949017, 136.983014, 26.888185]]).T


df1.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']

df1 

   col1  col2  col3  col4  col5        col6        col7
0     2     1  2000     0   233  251.109065   18.858343
1     2     1  2000     3   233  251.109065   18.858343
2     2     2  2000     0    96  300.141548   67.874851
3     2     2  5000     3    96  412.916402 -127.405555
4     2     3  2000     0    53  291.836682   58.692756
5     2     3  5000     3    53  394.260558 -148.001670
6     2     4  2000     0    29  327.351227   95.252774
7     2     4  5000     3    29  478.924355  -63.949017
8     2     5  2000     0    24  371.598847  136.983014
9     2     5  5000     3    24  574.811102   26.888185

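Now, for each combination of col1 and col2 values, I want to split col3 into two separate columns, one for each value of col4. col6 and col7 need to be split into two columns each in the same way. So my resulting DataFrame needs to look like this: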

df2 = pd.DataFrame([[2, 2, 2, 2, 2], [1, 2, 3, 4, 5], [2000, 2000, 2000, 2000, 2000], [2000, 5000, 5000, 5000, 5000], [233, 96, 53, 29, 24], [251.109065, 300.141548, 291.836682, 327.351227, 371.598847], [251.109065, 412.916402, 394.260558, 478.924355, 574.811102], [18.858343, 67.874851, 58.692756, 95.252774, 136.983014], [18.858343, -127.405555, -148.00167, -63.949017, 26.888185]]).T


df2.columns = ['col1', 'col2', 'col3_0', 'col3_3', 'col5', 'col6_0', 'col6_3', 'col7_0', 'col7_3']

df2

   col1  col2  col3_0  col3_3  col5      col6_0      col6_3      col7_0      col7_3
0     2     1    2000    2000   233  251.109065  251.109065   18.858343   18.858343
1     2     2    2000    5000    96  300.141548  412.916402   67.874851 -127.405555
2     2     3    2000    5000    53  291.836682  394.260558   58.692756 -148.001670
3     2     4    2000    5000    29  327.351227  478.924355   95.252774  -63.949017
4     2     5    2000    5000    24  371.598847  574.811102  136.983014   26.888185

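Note that "0" and "3" are the values from col4, which are used as suffixes for the new columns: col3_0, col3_3, col6_0, col6_3, col7_0, and col7_3. Let me know if I can provide any further information about the split. Any help is much appreciated.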

【Question Comments】:

    Tags: python-2.7 pandas pivot dataframe pivot-table


    【Solution 1】:
    res = pd.merge(df1[df1.col4 == 0].drop('col4', axis=1), df1[df1.col4 == 3].drop('col4', axis=1), on=['col1', 'col2', 'col5'], suffixes=['_0', '_3'])
    
       col1  col2  col3_0  col5    col6_0    col7_0  col3_3    col6_3    col7_3
    0     2     1    2000   233  251.1091   18.8583    2000  251.1091   18.8583
    1     2     2    2000    96  300.1415   67.8749    5000  412.9164 -127.4056
    2     2     3    2000    53  291.8367   58.6928    5000  394.2606 -148.0017
    3     2     4    2000    29  327.3512   95.2528    5000  478.9244  -63.9490
    4     2     5    2000    24  371.5988  136.9830    5000  574.8111   26.8882
    
    # sort the columns by name (no transpose needed, so column dtypes are preserved)
    res.sort_index(axis=1)
    
       col1  col2  col3_0  col3_3  col5    col6_0    col6_3    col7_0    col7_3
    0     2     1    2000    2000   233  251.1091  251.1091   18.8583   18.8583
    1     2     2    2000    5000    96  300.1415  412.9164   67.8749 -127.4056
    2     2     3    2000    5000    53  291.8367  394.2606   58.6928 -148.0017
    3     2     4    2000    5000    29  327.3512  478.9244   95.2528  -63.9490
    4     2     5    2000    5000    24  371.5988  574.8111  136.9830   26.8882
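
As an alternative to the merge above (not the answerer's method), the same reshape can be sketched as a pivot: move the key columns and col4 into the index, unstack col4, and flatten the resulting column names. This sketch assumes each (col1, col2, col5, col4) combination occurs only once and that col4 holds integers. If col4 were stored as floats, the suffixes would come out as `_0.0` and `_3.0`, so the frame is rebuilt column-wise here to keep integer dtypes. The name `wide` is only illustrative.

```python
import pandas as pd

# Same data as df1 in the question, built column-wise so integer columns stay integers.
df1 = pd.DataFrame({
    'col1': [2] * 10,
    'col2': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'col3': [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000],
    'col4': [0, 3] * 5,
    'col5': [233, 233, 96, 96, 53, 53, 29, 29, 24, 24],
    'col6': [251.109065, 251.109065, 300.141548, 412.916402, 291.836682,
             394.260558, 327.351227, 478.924355, 371.598847, 574.811102],
    'col7': [18.858343, 18.858343, 67.874851, -127.405555, 58.692756,
             -148.001670, 95.252774, -63.949017, 136.983014, 26.888185],
})

# Put the keys and col4 in the index, then spread col4's values across the columns.
wide = df1.set_index(['col1', 'col2', 'col5', 'col4']).unstack('col4')

# Flatten the (column name, col4 value) pairs into names such as 'col3_0'.
wide.columns = ['{}_{}'.format(name, key) for name, key in wide.columns]

# Bring the keys back as ordinary columns and order all columns by name.
wide = wide.reset_index()
wide = wide[sorted(wide.columns)]
print(wide)
```

Unlike the two-way merge, this form does not hard-code the values 0 and 3, so it would also produce one column per value if col4 had more than two distinct values.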
    

    【Comments】:

      【Solution 2】:

      You can do this with a simple merge:

      df1_0 = df1[df1.col4==0].drop('col4',axis=1)
      df1_3 = df1[df1.col4==3].drop('col4',axis=1)
      
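      # col5 is constant within each (col1, col2) pair, so it is used as a key too;
      # otherwise it would be duplicated as col5_0 and col5_3
      result = pd.merge(df1_0, df1_3, on=['col1', 'col2', 'col5'], suffixes=['_0', '_3'])
      result = result[sorted(result.columns)]  # order the columns by name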
      
         col1  col2  col3_0  col3_3  col5      col6_0      col6_3      col7_0  \
      0     2     1    2000    2000   233  251.109065  251.109065   18.858343   
      1     2     2    2000    5000    96  300.141548  412.916402   67.874851   
      2     2     3    2000    5000    53  291.836682  394.260558   58.692756   
      3     2     4    2000    5000    29  327.351227  478.924355   95.252774   
      4     2     5    2000    5000    24  371.598847  574.811102  136.983014   
      
             col7_3  
      0   18.858343  
      1 -127.405555  
      2 -148.001670  
      3  -63.949017  
      4   26.888185 
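
A note on the `on` list, since it decides what happens to col5: `pd.merge` appends the suffixes to every overlapping column that is not a merge key. The small sketch below uses a hypothetical four-row frame with the same column roles as df1 to show the difference. The names `df`, `left`, `right`, `with_col5`, and `without_col5` are illustrative only.

```python
import pandas as pd

# Tiny hypothetical frame with the same column roles as df1.
df = pd.DataFrame({
    'col1': [2, 2, 2, 2],
    'col2': [1, 1, 2, 2],
    'col3': [2000, 2000, 2000, 5000],
    'col4': [0, 3, 0, 3],
    'col5': [233, 233, 96, 96],
})

left = df[df.col4 == 0].drop('col4', axis=1)
right = df[df.col4 == 3].drop('col4', axis=1)

# col5 listed as a key: it appears once, without a suffix.
with_col5 = pd.merge(left, right, on=['col1', 'col2', 'col5'], suffixes=('_0', '_3'))

# col5 not listed as a key: it overlaps, so it is split into col5_0 and col5_3.
without_col5 = pd.merge(left, right, on=['col1', 'col2'], suffixes=('_0', '_3'))

print(list(with_col5.columns))
print(list(without_col5.columns))
```

Listing col5 as a key is only safe because it takes a single value within each (col1, col2) pair in this data. If it could differ between the col4 == 0 and col4 == 3 rows, the inner merge would silently drop those pairs.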
      

      【Comments】:
