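I think you need a rolling window over each unique date, followed by a shift of 1 day so that the current date is excluded.
Because a mean is by definition sum / count, this solution rolls the sum and the count separately and then divides them.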
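# 'date' must be a datetime64 column (e.g. via pd.to_datetime), otherwise asfreq fails
df1 = (df.groupby('date')['numbers']
         .agg(['sum','size'])                  # per-date total and row count
         .asfreq('D', fill_value=0)            # insert missing calendar days as zeros
         .rolling(window=3, min_periods=1)
         .sum())
# sum / size is the 3-day mean; shift() moves it one day forward to exclude the current date
df['av'] = df['date'].map(df1['sum'].div(df1['size']).shift())
print (df)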
        date  numbers   av
0 2022-01-01        1  NaN
1 2022-01-01        2  NaN
2 2022-01-01        3  NaN
3 2022-01-03        4  2.0
4 2022-01-03        7  2.0
5 2022-01-05        5  5.5
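For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the printed output above, and the helper name `add_previous_days_mean` and its `days` parameter are my own choices for illustration, not part of pandas.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption about the input).
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-01",
                            "2022-01-03", "2022-01-03", "2022-01-05"]),
    "numbers": [1, 2, 3, 4, 7, 5],
})

def add_previous_days_mean(frame, days=3):
    """Hypothetical helper: mean of `numbers` over the `days` calendar days before each date."""
    daily = (frame.groupby("date")["numbers"]
                  .agg(["sum", "size"])              # per-date total and row count
                  .asfreq("D", fill_value=0)         # fill in missing calendar days
                  .rolling(window=days, min_periods=1)
                  .sum())
    # Windowed mean, moved one day forward so the current date is excluded.
    prev_mean = daily["sum"].div(daily["size"]).shift()
    # Returns a new frame; the input frame is left unchanged.
    return frame.assign(av=frame["date"].map(prev_mean))

out = add_previous_days_mean(df)
print(out)
```

Returning a new frame with `assign` instead of writing to `df['av']` is only a design preference; it keeps the original data untouched while experimenting with different window lengths.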
Explanation:
First, aggregate sum and size (the row count) per date:
print (df.groupby('date')['numbers'].agg(['sum','size']))
            sum  size
date
2022-01-01    6     3
2022-01-03   11     2
2022-01-05    5     1
Then add the missing consecutive dates with DataFrame.asfreq:
print (df.groupby('date')['numbers']
         .agg(['sum','size'])
         .asfreq('D', fill_value=0))
            sum  size
date
2022-01-01    6     3
2022-01-02    0     0
2022-01-03   11     2
2022-01-04    0     0
2022-01-05    5     1
Take a rolling sum over each 3-day window:
df1 = (df.groupby('date')['numbers']
         .agg(['sum','size'])
         .asfreq('D', fill_value=0)
         .rolling(window=3, min_periods=1)
         .sum())
print (df1)
             sum  size
date
2022-01-01   6.0   3.0
2022-01-02   6.0   3.0
2022-01-03  17.0   5.0
2022-01-04  11.0   2.0
2022-01-05  16.0   3.0
Divide the sum column of df1 by the size column to get the mean:
print (df1['sum'].div(df1['size']))
date
2022-01-01 2.000000
2022-01-02 2.000000
2022-01-03 3.400000
2022-01-04 5.500000
2022-01-05 5.333333
Freq: D, dtype: float64
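The reason for rolling sum and size separately, rather than rolling a per-day mean, is that dates have different numbers of rows, so an average of daily averages would weight the rows unevenly. A small sketch using the window ending 2022-01-03 from the sample data (the variable names are only for illustration):

```python
import pandas as pd

# Values inside the 3-day window ending 2022-01-03, taken from the sample data.
day_1 = pd.Series([1, 2, 3])   # rows dated 2022-01-01
day_3 = pd.Series([4, 7])      # rows dated 2022-01-03 (2022-01-02 has no rows)

# Pooled mean: total sum divided by total count, as in the answer.
pooled = (day_1.sum() + day_3.sum()) / (len(day_1) + len(day_3))

# Mean of the per-day means: each day gets equal weight regardless of row count.
mean_of_means = (day_1.mean() + day_3.mean()) / 2

print(pooled)         # 3.4
print(mean_of_means)  # 3.75
```

The pooled value matches the 3.4 shown for 2022-01-03 above, which is what you want when every row should count equally.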
Exclude the current day by shifting the result one day forward with Series.shift:
print (df1['sum'].div(df1['size']).shift())
date
2022-01-01 NaN
2022-01-02 2.0
2022-01-03 2.0
2022-01-04 3.4
2022-01-05 5.5
Freq: D, dtype: float64
Finally, build the new column with Series.map:
print (df['date'].map(df1['sum'].div(df1['size']).shift()))
0 NaN
1 NaN
2 NaN
3 2.0
4 2.0
5 5.5
Name: date, dtype: float64
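As a sanity check, the same values can be computed the slow way, date by date, by selecting the rows from the 3 calendar days strictly before each date. This is only a sketch for verifying small inputs: the sample frame is reconstructed from the output above, and `previous_days_mean` is a hypothetical helper name.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption about the input).
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-01",
                            "2022-01-03", "2022-01-03", "2022-01-05"]),
    "numbers": [1, 2, 3, 4, 7, 5],
})

def previous_days_mean(date, frame, days=3):
    """Mean of `numbers` for rows dated within the `days` days strictly before `date`."""
    start = date - pd.Timedelta(days=days)
    mask = (frame["date"] >= start) & (frame["date"] < date)
    return frame.loc[mask, "numbers"].mean()   # NaN when no earlier rows exist

# One lookup per row; quadratic in the number of rows, so not meant for large data.
check = pd.Series([previous_days_mean(d, df) for d in df["date"]], index=df.index)
print(check.tolist())   # [nan, nan, nan, 2.0, 2.0, 5.5]
```

The result agrees with the `av` column, which suggests the asfreq / rolling / shift chain really does mean "the previous 3 calendar days, current date excluded".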