pandas groupby then filter by date to get mean (answers)

【Question title】: pandas groupby then filter by date to get mean
【Posted】: 2021-07-13 04:34:18
【Question description】:

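Using a pandas DataFrame, I'm trying to compute, for each row, the mean number of purchases made by the same CustId in the preceding 90 days (excluding the current row itself), and add it as a new column "PurchaseMeanLast90Days".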

Here is the code I tried, which does not give the right result:

group = df.groupby(['CustId'])
df['PurchaseMeanLast90Days'] = group.apply(lambda g: g[g['Date'] > (pd.DatetimeIndex(g['Date']) + pd.DateOffset(-90))])['Purchases'].mean()
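One way to see why the attempt above goes wrong, sketched with the question's sample data: inside each group, the mask compares every row's Date with that same row's Date shifted back 90 days, so it is always True and nothing is filtered out. The chained `['Purchases'].mean()` then collapses everything into one scalar (the mean of all purchases), which gets broadcast to every row of the new column. This only re-evaluates the mask for illustration; it is not a fix.

```python
import pandas as pd

# The question's sample data; dates are month/day/year strings.
df = pd.DataFrame({
    "CustId": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "Date": pd.to_datetime(
        ["1/01/2021", "1/12/2021", "3/28/2021", "4/01/2021", "4/20/2021",
         "5/01/2021", "1/01/2021", "2/01/2021", "3/01/2021", "4/01/2021"],
        format="%m/%d/%Y",
    ),
    "Purchases": [5, 1, 2, 4, 2, 5, 1, 1, 2, 3],
})

# Re-evaluate the question's mask for each customer group.
mask_all_true = {}
for cust_id, g in df.groupby("CustId"):
    # Element by element, each date is compared with itself minus 90 days.
    mask = g["Date"] > (pd.DatetimeIndex(g["Date"]) + pd.DateOffset(-90))
    mask_all_true[int(cust_id)] = bool(mask.all())
print(mask_all_true)

# With nothing filtered out, the expression reduces to one overall mean,
# which is what ends up in every row of the new column.
overall_mean = df["Purchases"].mean()
print(overall_mean)
```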

Here is my data:

Index	CustId	Date	Purchases
0	1	1/01/2021	5
1	1	1/12/2021	1
2	1	3/28/2021	2
3	1	4/01/2021	4
4	1	4/20/2021	2
5	1	5/01/2021	5
6	2	1/01/2021	1
7	2	2/01/2021	1
8	2	3/01/2021	2
9	2	4/01/2021	3

For example, for row index 5 the mean() should include these rows, giving (2 + 4 + 2) / 3 ≈ 2.67:

Index	CustId	Date	Purchases
2	1	3/28/2021	2
3	1	4/01/2021	4
4	1	4/20/2021	2

The new DataFrame would look like this (I have not computed the values for CustId=2):

Index	CustId	Date	Purchases	PurchaseMeanLast90Days
0	1	1/01/2021	5	0
1	1	1/12/2021	1	5
2	1	3/28/2021	2	3
3	1	4/01/2021	4	2.67
4	1	4/20/2021	2	3.0
5	1	5/01/2021	5	2.67
6	2	1/01/2021	1	...
7	2	2/01/2021	1	...
8	2	3/01/2021	2	...
9	2	4/01/2021	3	...
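To pin down the definition, including the inclusive 90-day boundary that the comments below settle on, here is a slow brute-force sketch that follows it literally. The function name `mean_last_n_days` is hypothetical, and it assumes a customer has at most one row per date, so "strictly earlier date" is the same as "excluding the current row". It gives 2.67 for row 5 (the mean of the three rows listed above) and its values match the output printed in Solution 1.

```python
import pandas as pd


def mean_last_n_days(df: pd.DataFrame, days: int = 90) -> pd.Series:
    """Hypothetical O(n^2) reference for the question's definition.

    For each row, average the Purchases of the same CustId whose Date lies
    in [Date - days, Date): strictly earlier rows, lower bound inclusive.
    Assumes at most one row per customer per date. Rows with no earlier
    purchases in the window get 0.
    """
    result = []
    for _, row in df.iterrows():
        window = df[
            (df["CustId"] == row["CustId"])
            & (df["Date"] < row["Date"])
            & (df["Date"] >= row["Date"] - pd.Timedelta(days=days))
        ]
        result.append(window["Purchases"].mean() if len(window) else 0.0)
    return pd.Series(result, index=df.index, name="PurchaseMeanLast90Days")


# The question's sample data.
df = pd.DataFrame({
    "CustId": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "Date": pd.to_datetime(
        ["1/01/2021", "1/12/2021", "3/28/2021", "4/01/2021", "4/20/2021",
         "5/01/2021", "1/01/2021", "2/01/2021", "3/01/2021", "4/01/2021"],
        format="%m/%d/%Y",
    ),
    "Purchases": [5, 1, 2, 4, 2, 5, 1, 1, 2, 3],
})

ref = mean_last_n_days(df)
print(ref.round(2).tolist())
# [0.0, 5.0, 3.0, 2.67, 3.0, 2.67, 0.0, 1.0, 1.0, 1.33]
```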

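【Question discussion】:

Are you sure index 4 is computed correctly? There are 98 days between 4/20/2021 and 1/12/2021, so it should be 3.0, not 2.33.
Thanks, I've fixed the mistake.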
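The day count in the comment can be checked directly with pandas Timestamps. With an inclusive 90-day window, row 4's window starts on 1/20/2021, so 1/12/2021 falls outside it and only 3/28 (2 purchases) and 4/01 (4 purchases) remain, giving 3.0.

```python
import pandas as pd

row4 = pd.Timestamp("2021-04-20")
gap_days = (row4 - pd.Timestamp("2021-01-12")).days
print(gap_days)  # 98, more than 90

# Oldest date still inside row 4's window (boundary inclusive).
window_start = row4 - pd.Timedelta(days=90)
print(window_start.date())  # 2021-01-20

# Purchases of CustId 1 on 3/28 and 4/01, the only earlier rows in the window.
row4_mean = (2 + 4) / 2
print(row4_mean)  # 3.0
```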

Tags: python pandas dataframe filter mean

【Solution 1】:

You can do a rolling calculation:

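import pandas as pd

# Parse the month/day/year strings (dayfirst=False is the default)
df["Date"] = pd.to_datetime(df["Date"], dayfirst=False)
# Each customer's dates must be increasing for the time-based window, and
# the .values assignment below relies on rows being ordered by CustId
df = df.sort_values(["CustId", "Date"])
df["PurchaseMeanLast90Days"] = (
    (
        df.groupby("CustId")
        # closed="both": the window is [Date - 90 days, Date], current row included
        .rolling("90D", min_periods=1, on="Date", closed="both")["Purchases"]
        # shift(1) drops the current (last) row; divide by the earlier-row count
        .apply(lambda x: x.shift(1).sum() / (len(x) - 1))
    )
    .fillna(0)  # first row of a customer: 0 / 0 -> NaN -> 0
    .values
)
print(df)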

Prints:

   Index  CustId       Date  Purchases  PurchaseMeanLast90Days
0      0       1 2021-01-01          5                0.000000
1      1       1 2021-01-12          1                5.000000
2      2       1 2021-03-28          2                3.000000
3      3       1 2021-04-01          4                2.666667
4      4       1 2021-04-20          2                3.000000
5      5       1 2021-05-01          5                2.666667
6      6       2 2021-01-01          1                0.000000
7      7       2 2021-02-01          1                1.000000
8      8       2 2021-03-01          2                1.000000
9      9       2 2021-04-01          3                1.333333
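If the Python-level `apply` becomes slow on a large frame, one variant (a suggestion, not part of the answer above) is to ask the same 90-day window for its sum and count, then take the current row out of both. It is meant to produce the same column as the output printed above; like the answer, it assumes rows can be put in (CustId, Date) order so that the grouped results line up positionally.

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({
    "CustId": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "Date": ["1/01/2021", "1/12/2021", "3/28/2021", "4/01/2021", "4/20/2021",
             "5/01/2021", "1/01/2021", "2/01/2021", "3/01/2021", "4/01/2021"],
    "Purchases": [5, 1, 2, 4, 2, 5, 1, 1, 2, 3],
})
# Month/day/year strings -> datetimes; sort so each customer's dates increase
# (needed by the time-based window and by the positional .to_numpy() below).
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.sort_values(["CustId", "Date"])

# Rolling sum and count per customer over [Date - 90 days, Date].
rolling = df.groupby("CustId").rolling("90D", on="Date", closed="both")["Purchases"]
window_sum = pd.Series(rolling.sum().to_numpy(), index=df.index)
window_count = pd.Series(rolling.count().to_numpy(), index=df.index)

# Remove the current row from both, then average the earlier rows.
# A customer's first row gives 0 / 0 = NaN, which is replaced by 0.
df["PurchaseMeanLast90Days"] = (
    (window_sum - df["Purchases"]) / (window_count - 1)
).fillna(0)
print(df)
```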

【Discussion】: