【问题标题】:Pandas - Iterate over a distinct value of a column and get mean()/std()Pandas - 遍历列的不同值并获取 mean()/std()
【发布时间】:2023-04-02 00:34:01
【问题描述】:

请协助我处理以下案例。

我有一个看起来像这样的 Pandas DataFrame:

DATE; NAME; COUNT1; COUNT2; COUNT3;
01; Name1; 14; 3; 6;
02; Name1; 17; 4; 2;
03; Name1; 11; 1; 9;
01; Name2; 14; 3; 6;
02; Name2; 17; 4; 2;
03; Name2; 11; 1; 9;
01; Name3; 14; 3; 6;
02; Name3; 11; 1; 9;
03; Name3; 17; 4; 2;

我需要获取每个名称(例如 COUNT3 个)的 mean() 和 std() 值。

结果应该是这样的:

Name1 std(): value
Name1 mean(): value
Name2 std(): value
Name2 mean(): value
Name3 std(): value
Name3 mean(): value

提前致谢!

【问题讨论】:

    标签: python loops pandas dataframe


    【解决方案1】:

    我认为您需要groupby 并通过agg 聚合:

    df1 = df.groupby('NAME')['COUNT3'].agg(['mean','std'])
    print (df1)
               mean       std
    NAME                     
    Name1  5.666667  3.511885
    Name2  5.666667  3.511885
    Name3  5.666667  3.511885
    

    如果你想改变输出格式添加unstack:

    df1 = df.groupby('NAME')['COUNT3']
            .agg(['mean','std'])
            .unstack()
            .swaplevel(0,1)
            .rename_axis(('names','aggreg'))
            .reset_index(name='val')
    print (df1)
       names aggreg       val
    0  Name1   mean  5.666667
    1  Name2   mean  5.666667
    2  Name3   mean  5.666667
    3  Name1    std  3.511885
    4  Name2    std  3.511885
    5  Name3    std  3.511885
    

    对于更多的列在一起:

    df2 = df.groupby('NAME')['COUNT1','COUNT2','COUNT3'].agg(['mean','std'])
    print (df2)
          COUNT1         COUNT2              COUNT3          
            mean  std      mean       std      mean       std
    NAME                                                     
    Name1     14  3.0  2.666667  1.527525  5.666667  3.511885
    Name2     14  3.0  2.666667  1.527525  5.666667  3.511885
    Name3     14  3.0  2.666667  1.527525  5.666667  3.511885
    

    df2 = df.groupby('NAME')['COUNT1','COUNT2','COUNT3']
            .agg(['mean','std'])
            .unstack()
            .reorder_levels((2,1,0))
            .rename_axis(('names','aggreg','columns'))
            .reset_index(name='val')
    print (df2)
        names aggreg columns        val
    0   Name1   mean  COUNT1  14.000000
    1   Name2   mean  COUNT1  14.000000
    2   Name3   mean  COUNT1  14.000000
    3   Name1    std  COUNT1   3.000000
    4   Name2    std  COUNT1   3.000000
    5   Name3    std  COUNT1   3.000000
    6   Name1   mean  COUNT2   2.666667
    7   Name2   mean  COUNT2   2.666667
    8   Name3   mean  COUNT2   2.666667
    9   Name1    std  COUNT2   1.527525
    10  Name2    std  COUNT2   1.527525
    11  Name3    std  COUNT2   1.527525
    12  Name1   mean  COUNT3   5.666667
    13  Name2   mean  COUNT3   5.666667
    14  Name3   mean  COUNT3   5.666667
    15  Name1    std  COUNT3   3.511885
    16  Name2    std  COUNT3   3.511885
    17  Name3    std  COUNT3   3.511885
    

    【讨论】:

      猜你喜欢
      • 2015-09-21
      • 1970-01-01
      • 2016-11-28
      • 1970-01-01
      • 2019-01-11
      • 2021-04-14
      • 2019-03-28
      • 2012-12-26
      • 1970-01-01
      相关资源
      最近更新 更多