【Question title】: Groupby year-month and find top N smallest standard deviation values columns in Python
【Posted】: 2022-01-03 10:02:41
【Question description】:

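With the sample data and code below, I am trying to group by year-month and find the top K columns with the smallest standard deviation among all columns whose names end with _values: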

import pandas as pd
import numpy as np
from statistics import stdev

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])

k = 3    # set k as 3

value_cols = df.columns[df.columns.str.endswith('_values')]

def find_topK_smallest_std(group):
    std = stdev(group[value_cols])
    cols = std.nsmallest(k).index
    out_cols = [f'std_{i+1}' for i in range(k)]
    rv = group.loc[:, cols]
    rv.columns = out_cols
    return rv

df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)

But it raises a TypeError. How can I fix this? Sincere thanks in advance.

Output:

TypeError: can't convert type 'str' to numerator/denominator

Reference link:

Groupby year-month and find top N smallest values columns in Python

【Comments】:

    Tags: python-3.x pandas numpy statistics


    【Solution 1】:

    In your solution, apply stdev to each column with DataFrame.apply; if you need it per row instead, add axis=1:

    def find_topK_smallest_std(group):
        # processing per column
        std = group[value_cols].apply(stdev)
        cols = std.nsmallest(k).index
        out_cols = [f'std_{i+1}' for i in range(k)]
        rv = group.loc[:, cols]
        rv.columns = out_cols
        return rv
    
    df = df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)
    print (df)
                   std_1     std_2     std_3
    2013-02-26  7.333694  3.126731  1.389472
    2013-02-27  7.529254  7.843101  6.621605
    2013-02-28  6.165574  5.612724  0.866300
    2013-03-01  5.693051  3.711608  4.521452
    2013-03-02  7.322250  4.763135  5.178144
                 ...       ...       ...
    2013-05-22  8.795736  3.864723  6.316478
    2013-05-23  7.959282  5.140268  1.839659
    2013-05-24  5.412016  5.890717  9.081583
    2013-05-25  1.088414  1.610210  9.016004
    2013-05-26  4.930571  6.893207  2.338785
    
    [90 rows x 3 columns]
    
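As a hedged side note on why the original call failed, and one possible alternative: iterating over a DataFrame yields its column labels, so `stdev(group[value_cols])` was handed strings such as `'A_values'` rather than numbers, which would explain the TypeError. pandas also provides a built-in `std()` that, like `statistics.stdev`, computes the sample standard deviation (ddof=1) by default. The sketch below rebuilds the question's data; the names `labels`, `month`, `monthly_std` and `topk` are illustrative and not from the original post, and grouping by `df.index.to_period('M')` is swapped in for `pd.Grouper(freq='M')`.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(
    np.random.uniform(0, 10, size=(90, 6)),
    index=dates,
    columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'],
)
k = 3
value_cols = df.columns[df.columns.str.endswith('_values')]

# Iterating a DataFrame yields column labels (strings), not numeric values;
# this is what statistics.stdev saw in the question's version
labels = list(df[value_cols])

# Sample standard deviation of every *_values column, one row per year-month
month = df.index.to_period('M')
monthly_std = df[value_cols].groupby(month).std()

# For each year-month, the names of the k columns with the smallest std
topk = {str(period): list(row.nsmallest(k).index)
        for period, row in monthly_std.iterrows()}
print(topk)
```

This version reports which columns have the smallest standard deviation in each month, rather than returning their row values as the function above does; the column names in `topk` can then be used to select from each month's rows if the values themselves are needed.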

    【Comments】:
