【问题标题】:Reshaping pandas dataframe (transpose rows by columns)重塑熊猫数据框(按列转置行)
【发布时间】:2021-09-27 05:48:12
【问题描述】:

我是学习 Python 的新手,并希望在 R 上实现重塑它的方式。

我目前有 d1 形式的数据框,想将其转换为 d2:

d1:列是:Country、Movie_type(值为 Action、Romance 或 Drama)、Jan-20、Feb-20 和 Mar-20。

d2:列是:国家、月-年(值是 Jan-20、Feb-20 或 Mar-20)、动作、戏剧、浪漫......

(请注意,Drama 和 Romance 的列不显示 d1 的实际计数,只有 Action 显示)。

d1 = pd.DataFrame({

    "Country" : {0: "Australia",1: "Australia", 2:"Australia", 3:"Canada", 4:"Canada", 5:"Israel", 6:"India",7: "India", 8:"Poland", 9: "Zambia"},
    "Movie_type" : {0: "Action",1:"Drama",2: "Romance",3: "Romance",4: "Action",5: "Action",6: "Action",7: "Drama",8: "Drama",9:"Action"},
    "Jan-20" :{0: 0,1: 1,2: 2,3: 3,4: 0,5: 3,6: 2,7: 0,8: 2,9: 5},
    "Feb-20" :{0: 2,1: 3,2: 2,3: 1,4: 0,5: 0,6: 2,7: 4,8: 2,9: 7},
    "Mar-20" :{0: 1,1: 5,2: 2,3: 7,4: 4,5: 8,6: 2,7: 4,8: 5,9: 9}
}
)



d2 = pd.DataFrame({

    "Country" : {0: "Australia",1: "Australia", 2:"Australia", 3:"Canada", 4:"Canada", 5:"Canada", 6:"Israel",7: "Israel", 8:"Israel", 9: "India", 10: "India", 11: "India", 12: "Poland", 13: "Poland", 14: "Poland", 15: "Zambia", 16: "Zambia", 17: "Zambia"},
    "Month_year" : {0: "Jan-20",1: "Feb-20", 2:"Mar-20", 3: "Jan-20",4: "Feb-20", 5:"Mar-20",6: "Jan-20",7: "Feb-20", 8:"Mar-20",9: "Jan-20",10: "Feb-20", 11:"Mar-20",12: "Jan-20",13: "Feb-20", 14:"Mar-20",15: "Jan-20",16: "Feb-20", 17:"Mar-20"},
    "Action" :{0: 0,1: 0,2: 0,3: 0,4: 0,5: 0,6: 3,7: 0,8: 0,9: 2,10:0,11:0,12:0,13:0,14:0,15:5,16:0,17:0},
    "Drama" :{0: 0,1: 0,2: 0,3: 0,4: 0,5: 0,6: 0,7: 0,8: 0,9: 0,10:0,11:0,12:0,13:0,14:0,15:0,16:0,17:0},
    "Romance" :{0: 0,1: 0,2: 0,3: 0,4: 0,5: 0,6: 0,7: 0,8: 0,9: 0,10:0,11:0,12:0,13:0,14:0,15:0,16:0,17:0}
}
)

如何将 Movie_types 转换为列名,并将 MMM-YY(当前在 d1 中作为列名)转换为行的差异值?如果这是一个堆栈/取消堆栈/枢轴或重塑问题,我很乐意接受教育。

【问题讨论】:

    标签: python pandas dataframe reshape


    【解决方案1】:

    使用DataFrame.meltDataFrame.pivot,正确的顺序是将值转换为日期时间:

    df = (d1.melt(['Country','Movie_type'], 
                 var_name='Month_year')
            .assign(Month_year = lambda x: pd.to_datetime(x['Month_year'], format='%b-%y'))
            .pivot(['Country','Month_year'], 'Movie_type','value')
            .fillna(0)
            .astype(int)
            .rename(index = lambda x: x.strftime('%b-%y'), level=1)
            .reset_index()
            )
    print (df)
    Movie_type    Country Month_year  Action  Drama  Romance
    0           Australia     Jan-20       0      1        2
    1           Australia     Feb-20       2      3        2
    2           Australia     Mar-20       1      5        2
    3              Canada     Jan-20       0      0        3
    4              Canada     Feb-20       0      0        1
    5              Canada     Mar-20       4      0        7
    6               India     Jan-20       2      0        0
    7               India     Feb-20       2      4        0
    8               India     Mar-20       2      4        0
    9              Israel     Jan-20       3      0        0
    10             Israel     Feb-20       0      0        0
    11             Israel     Mar-20       8      0        0
    12             Poland     Jan-20       0      2        0
    13             Poland     Feb-20       0      2        0
    14             Poland     Mar-20       0      5        0
    15             Zambia     Jan-20       5      0        0
    16             Zambia     Feb-20       7      0        0
    17             Zambia     Mar-20       9      0        0
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2021-11-03
      • 1970-01-01
      • 1970-01-01
      • 2017-08-13
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多