【Question Title】: Z-score for multiple columns with groupby, concatenated to the same dataset as column name + "_zscore", and tagging z-score values less than -1
【Posted】: 2021-06-07 10:40:41
【Question Description】:

I have a DataFrame:

df = pd.DataFrame([["A",1,98,56,61], ["B",1,99,54,36], ["C",1,97,32,83],["B",1,96,31,90], ["C",1,45,32,12], ["A",1,67,33,55], ["C",1,54,65,73], ["A",1,34,84,98], ["B",1,76,12,99]], columns=["id","date","c1","c2","c3"])

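I need to compute the z-scores of columns "c1", "c2", and "c3" grouped by "id", and concatenate them to the same DataFrame under the same column names + "_zscore", so that the result keeps the original shape. Then, without using loops, each z-score should be tagged in a column named with the same column name + "_tag": -1 if the z-score is less than -1, and 1 otherwise.

Expected output: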

df_out = pd.DataFrame([["A",1,98,56,61,1.21179,-0.079921,-0.543442,1,1,1], ["B",1,99,54,36,0.84893,1.26172,-1.401826,1,1,-1], ["C",1,97,32,83,1.395551,-0.707107,0.860437,1,1,1],["B",1,96,31,90,0.55507,-0.077644,0.539164,1,1,1], ["C",1,45,32,12,-0.89609,-0.707107,-1.402194,1,1,-1], ["A",1,67,33,55,0.025511,-1.182827,-0.858988,1,-1,1], ["C",1,54,65,73,-0.49946,1.414214,0.541757,1,1,1], ["A",1,34,84,98,-1.237301,1.262748,1.40243,-1,1,1], ["B",1,76,12,99,-1.404,-1.184075,0.862662,-1,-1,1]], columns=["id","date","c1","c2","c3","c1_zscore","c2_zscore","c3_zscore","c1_tag","c2_tag","c3_tag"])

How can I do this?

【Question Discussion】:

    Tags: python python-3.x pandas python-2.7 dataframe


    【Solution 1】:

    Try groupby().transform with scipy.stats.zscore:

    from scipy.stats import zscore
    
    df.join(df.groupby('id')[['c1','c2','c3']]
              .transform(zscore).add_suffix('_zscore')
           )
    

    Output:

      id  date  c1  c2  c3  c1_zscore  c2_zscore  c3_zscore
    0  A     1  98  56  61   1.211790  -0.079921  -0.543442
    1  B     1  99  54  36   0.848930   1.261720  -1.401826
    2  C     1  97  32  83   1.395551  -0.707107   0.860437
    3  B     1  96  31  90   0.555070  -0.077644   0.539164
    4  C     1  45  32  12  -0.896090  -0.707107  -1.402194
    5  A     1  67  33  55   0.025511  -1.182827  -0.858988
    6  C     1  54  65  73  -0.499460   1.414214   0.541757
    7  A     1  34  84  98  -1.237301   1.262748   1.402430
    8  B     1  76  12  99  -1.404000  -1.184075   0.862662
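
The output above covers only the "_zscore" columns; the "_tag" columns from the question can be added without a loop by using a vectorized comparison. The following is a minimal sketch, not the answerer's code: the names `cols`, `z`, `tags`, and `df_out` are illustrative, and the z-score is computed with plain pandas using the population standard deviation (ddof=0), which matches the default of scipy.stats.zscore, so it stands in for the scipy call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", 1, 98, 56, 61], ["B", 1, 99, 54, 36], ["C", 1, 97, 32, 83],
     ["B", 1, 96, 31, 90], ["C", 1, 45, 32, 12], ["A", 1, 67, 33, 55],
     ["C", 1, 54, 65, 73], ["A", 1, 34, 84, 98], ["B", 1, 76, 12, 99]],
    columns=["id", "date", "c1", "c2", "c3"],
)

cols = ["c1", "c2", "c3"]  # illustrative name: the columns to standardize

# Per-group z-score; ddof=0 mirrors the default of scipy.stats.zscore.
z = (
    df.groupby("id")[cols]
      .transform(lambda s: (s - s.mean()) / s.std(ddof=0))
      .add_suffix("_zscore")
)

# Vectorized tagging: -1 where the z-score is below -1, otherwise 1.
tags = pd.DataFrame(
    np.where(z < -1, -1, 1),
    index=z.index,
    columns=[f"{c}_tag" for c in cols],
)

# Column-wise concatenation keeps the original row order and index.
df_out = pd.concat([df, z, tags], axis=1)
print(df_out)
```

Because `transform` returns a result aligned to the original index, both `z` and `tags` line up row by row with `df`, so no merge key is needed.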
    

    【Discussion】:
