【Question title】: Calculate z-score for multiple columns of a dataset on groupby and transform back to the original shape in pandas without using a loop
【Posted】: 2021-03-08 13:56:57
【Question】:

I have a DataFrame:

df = pd.DataFrame([["A",1,98,56,61], ["B",1,99,54,36], ["C",1,97,32,83],["B",1,96,31,90], ["C",1,45,32,12], ["A",1,67,33,55], ["C",1,54,65,73], ["A",1,34,84,98], ["B",1,76,12,99]], columns=["id","date","c1","c2","c3"])

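I need to compute the z-score of columns "c1", "c2", and "c3" within each "id" group using groupby, and get the result back in the original shape without using a loop.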

Expected output:

df_out = pd.DataFrame([["A",1,1.21179,-0.079921,-0.543442], ["B",1,0.84893,1.26172,-1.401826], ["C",1,1.395551,-0.707107,0.860437],["B",1,0.55507,-0.077644,0.539164], ["C",1,-0.89609,-0.707107,-1.402194], ["A",1,0.025511,-1.182827,-0.858988], ["C",1,-0.49946,1.414214,0.541757], ["A",1,-1.237301,1.262748,1.40243], ["B",1,-1.404,-1.184075,0.862662]], columns=["id","date","c1","c2","c3"])

How can I do this?

【Comments】:

    Tags: python python-3.x pandas dataframe


    【Solution 1】:

    Use GroupBy.transform together with DataFrame.join:

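    from scipy.stats import zscore
    
    # z-score c1, c2, c3 within each (id, date) group; zscore defaults to ddof=0.
    # transform keeps the original row index, so the result joins back by index.
    df = df[['id','date']].join(df.groupby(['id','date']).transform(zscore))
    print(df)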
      id  date        c1        c2        c3
    0  A     1  1.211790 -0.079921 -0.543442
    1  B     1  0.848930  1.261720 -1.401826
    2  C     1  1.395551 -0.707107  0.860437
    3  B     1  0.555070 -0.077644  0.539164
    4  C     1 -0.896090 -0.707107 -1.402194
    5  A     1  0.025511 -1.182827 -0.858988
    6  C     1 -0.499460  1.414214  0.541757
    7  A     1 -1.237301  1.262748  1.402430
    8  B     1 -1.404000 -1.184075  0.862662
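
For readers who would rather not depend on SciPy, the same idea can be sketched with pandas alone. This is a minimal sketch, not the answerer's method: it assumes grouping on "id" only (as the question asks; with this data "date" is constant, so the groups are the same) and a population standard deviation (ddof=0), which is what scipy.stats.zscore uses by default. The names value_cols, z and out are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [["A", 1, 98, 56, 61], ["B", 1, 99, 54, 36], ["C", 1, 97, 32, 83],
     ["B", 1, 96, 31, 90], ["C", 1, 45, 32, 12], ["A", 1, 67, 33, 55],
     ["C", 1, 54, 65, 73], ["A", 1, 34, 84, 98], ["B", 1, 76, 12, 99]],
    columns=["id", "date", "c1", "c2", "c3"],
)

value_cols = ["c1", "c2", "c3"]  # columns to standardize (illustrative name)

# For each "id" group, subtract the group mean and divide by the group
# population standard deviation (ddof=0). transform returns a frame with
# the same index and shape as the selected columns, so no loop is needed.
z = df.groupby("id")[value_cols].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)

# Re-attach the key columns; join aligns on the original row index.
out = df[["id", "date"]].join(z)
print(out)
```

Note that pandas' own std defaults to ddof=1 (the sample standard deviation), so dropping the ddof=0 argument would give values that differ from the expected output in the question.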
    

    【Discussion】:
