How can I fill a column with values that are computed between two dates in pandas? (Answers)

【Question Title】: How can I fill a column with values that are computed between two dates in pandas?
【Posted】: 2021-12-26 01:02:28
【Question Description】:

I have this dataframe:

Date	Position	TrainerID	Win%
2017-09-03	4	1788	0 (0 wins, 1 race)
2017-09-16	5	1788	0 (0 wins, 2 races)
2017-10-14	1	1788	33 (1 win, 3 races)

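For each row, I want the Win% column to hold the trainer's winning percentage over the races of the past 1000 days, as shown above.

I tried something like this: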

def compute_winning_percentage(a, b):
    return (a / b)*100

featured_data['Percentage win of trainer in the last 1000 days'] = featured_data.groupby('TrainerID').apply(
    compute_winning_percentage(len(featured_data.loc[featured_data.Position == 1]),
                               featured_data[featured_data.Position].cumcount()))

But I get an error, and I don't know how to work in the "past 1000 days" part.

How can I do this?

【Question Discussion】:

Tags: python pandas dataframe

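【Solution 1】:

Create an indicator column that marks wins, then group it by TrainerID and apply rolling and mean over a 1000-day window to get the win percentage, and finally merge the computed percentage column back into the original dataframe.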

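import pandas as pd

# Time-based rolling windows need a datetime 'Date' column, sorted within each trainer
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['TrainerID', 'Date'])

# Create indicator column: True where the trainer won the race
df['win'] = df['Position'].eq(1)

# Group by trainer and take the rolling mean of the indicator over a 1000-day window
w = df.groupby('TrainerID').rolling('1000D', on='Date')['win'].mean().mul(100)

# Merge the result back into the dataframe
df_new = df.merge(w.reset_index(name='Win_%'), on=['TrainerID', 'Date'])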

>>> df_new

        Date  Position  TrainerID    win      Win_%
0 2017-09-03         4       1788  False   0.000000
1 2017-09-16         5       1788  False   0.000000
2 2017-10-14         1       1788   True  33.333333
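
To try this end to end, here is a minimal self-contained sketch. The three rows for trainer 1788 come from the question; trainer 2000 and its races are made-up data, added only to show an old race falling out of the 1000-day window. Casting the indicator to 0/1 integers is my own choice here, not something the approach requires.

```python
import pandas as pd

# Rows for trainer 1788 are from the question; trainer 2000 is hypothetical
# data added to show a race older than 1000 days dropping out of the window.
df = pd.DataFrame({
    "Date": ["2017-09-03", "2017-09-16", "2017-10-14",
             "2014-01-01", "2017-09-03", "2017-09-20"],
    "Position": [4, 5, 1, 1, 2, 1],
    "TrainerID": [1788, 1788, 1788, 2000, 2000, 2000],
})

# A time-based window needs a datetime column sorted within each group.
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(["TrainerID", "Date"]).reset_index(drop=True)

# 1 where the trainer won the race, 0 otherwise.
df["win"] = df["Position"].eq(1).astype(int)

# Per trainer, mean of the indicator over the trailing 1000 days, as a percentage.
w = (
    df.groupby("TrainerID")
      .rolling("1000D", on="Date")["win"]
      .mean()
      .mul(100)
)

# Attach the percentages back to the original rows.
df_new = df.merge(w.reset_index(name="Win_%"), on=["TrainerID", "Date"])
print(df_new)
```

In this data, trainer 2000's win on 2014-01-01 is more than 1000 days before the 2017 races, so it does not count toward those rows. One caveat: if a trainer can have several races on the same date, merging on ['TrainerID', 'Date'] would match those rows many-to-many, so a unique race key would probably be needed in that case.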

【Discussion】:

This is exactly what I wanted!