【Question title】: Calculate 2-month rolling average for group in Pandas
【Posted】: 2022-01-05 20:03:50
【Question description】:

I'm trying to calculate a two-month rolling average of each player's 'score' using the following df:

import pandas as pd

df = pd.DataFrame({'player_id': [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
                   'date': ['2018-06-22', '2018-06-23', '2018-07-24', '2018-07-25',
                            '2018-07-22', '2018-07-23', '2018-07-24', '2018-07-25',
                            '2018-08-22', '2018-08-23', '2018-08-24', '2018-08-25'],
                   'score': [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3]})

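I want the averages grouped by the 'player_id' column, so that the rows for player 1098 get averages of -2, -0.5, 0.33 and 1 respectively. If there is only one date in 2020, the average is just that value. If there is one date in January 2020 and one in February 2020, the January row gets its own value and the February row gets the average of the two.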
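One possible reading of the rule above is a window made of the row's own calendar month plus the previous calendar month, counting only that player's rows up to and including the current one. The sketch below implements that reading with a plain per-row loop; the helper `two_calendar_month_mean` and the `month_no` column are hypothetical names introduced for illustration, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
    "date": ["2018-06-22", "2018-06-23", "2018-07-24", "2018-07-25",
             "2018-07-22", "2018-07-23", "2018-07-24", "2018-07-25",
             "2018-08-22", "2018-08-23", "2018-08-24", "2018-08-25"],
    "score": [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3],
})
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["player_id", "date"]).reset_index(drop=True)

# Hypothetical helper column: a running month number, so "previous month" is month_no - 1
df["month_no"] = df["date"].dt.year * 12 + df["date"].dt.month


def two_calendar_month_mean(group: pd.DataFrame) -> pd.Series:
    """For each row, average this player's scores from the row's calendar month
    and the previous calendar month, using only rows up to and including it."""
    means = []
    for i in range(len(group)):
        current = group["month_no"].iloc[i]
        seen = group.iloc[: i + 1]
        in_window = seen["month_no"] >= current - 1
        means.append(seen.loc[in_window, "score"].mean())
    return pd.Series(means, index=group.index)


# Each part keeps its group's row labels, so the concatenated Series aligns with df
df["two_month_mean"] = pd.concat(
    [two_calendar_month_mean(g) for _, g in df.groupby("player_id")]
).round(2)
print(df[["player_id", "date", "score", "two_month_mean"]])
```

On this data it gives the same numbers as a 60-day window. The two readings differ when rows fall in adjacent months but are 60 or more days apart, or are less than 60 days apart but two calendar months apart. The loop is quadratic per player, which is fine for small groups.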

【Question discussion】:

  • Could you show what the output of a two-month window would be on the existing data?

Tags: python pandas dataframe moving-average


【Solution 1】:

You can group by 'player_id' and use the expanding().mean() method on the 'score' column:

df['rolling mean'] = df.groupby('player_id')['score'].expanding().mean().round(2).droplevel(0)
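For reference, here is that one-liner run on the question's DataFrame. `expanding()` averages every earlier row of the same player, so it reproduces the numbers asked for player 1098 (-2, -0.5, 0.33, 1), but it is a cumulative mean rather than a two-month window.

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
    "date": ["2018-06-22", "2018-06-23", "2018-07-24", "2018-07-25",
             "2018-07-22", "2018-07-23", "2018-07-24", "2018-07-25",
             "2018-08-22", "2018-08-23", "2018-08-24", "2018-08-25"],
    "score": [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3],
})

# Cumulative (expanding) mean of 'score' within each player_id;
# droplevel(0) removes the player_id level so the result aligns with df's index
df["rolling mean"] = (
    df.groupby("player_id")["score"]
      .expanding()
      .mean()
      .round(2)
      .droplevel(0)
)
print(df)
```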

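Edit:

Given the new information in the comments, you probably want the rolling().mean() method instead. I added some rows to the OP's DataFrame to better show what's happening. For the DataFrame df: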

df = pd.DataFrame({'player_id': [1098, 1098, 1098, 1098, 1098, 1098, 1098, 1098, 
                                 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116], 
                   'date': ['2018-06-22', '2018-06-23', '2018-07-24', '2018-07-25', 
                            '2019-06-22', '2019-06-25', '2019-07-25', '2020-06-22', 
                            '2018-07-22', '2018-07-23', '2018-07-24', '2018-07-25', 
                            '2018-08-22', '2018-08-23', '2018-08-24', '2018-08-25'], 
                   'score': [-2, 1, 2, 3, 7, 8, 6, 5, -8, 3, 2, -3, -2, 1, 2, 3]})

Here we compute a 60-day (roughly two-month) rolling mean for each 'player_id':

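df['date'] = pd.to_datetime(df['date'])
# Sort so each player's rows are in date order: the time-based window needs this,
# and the positional .to_numpy() assignment below relies on the same row order
df = df.sort_values(by=['player_id', 'date'])
df['rolling_mean'] = df.set_index('date').groupby('player_id', sort=False)['score'].rolling('60D').mean().round(2).to_numpy()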

Output:

    player_id       date  score  rolling_mean
0        1098 2018-06-22     -2         -2.00
1        1098 2018-06-23      1         -0.50
2        1098 2018-07-24      2          0.33
3        1098 2018-07-25      3          1.00
4        1098 2019-06-22      7          7.00
5        1098 2019-06-25      8          7.50
6        1098 2019-07-25      6          7.00
7        1098 2020-06-22      5          5.00
8        1116 2018-07-22     -8         -8.00
9        1116 2018-07-23      3         -2.50
10       1116 2018-07-24      2         -1.00
11       1116 2018-07-25     -3         -1.50
12       1116 2018-08-22     -2         -1.60
13       1116 2018-08-23      1         -1.17
14       1116 2018-08-24      2         -0.71
15       1116 2018-08-25      3         -0.25
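The `.to_numpy()` call assigns the results by position, so it depends on `df` having been sorted by `player_id` and `date` beforehand. A sketch of an index-aligned alternative, assuming the same 16-row `df`, runs `DataFrame.rolling` with `on='date'` inside each player's group and lets pandas align the results back on the original row labels:

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": [1098, 1098, 1098, 1098, 1098, 1098, 1098, 1098,
                  1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
    "date": ["2018-06-22", "2018-06-23", "2018-07-24", "2018-07-25",
             "2019-06-22", "2019-06-25", "2019-07-25", "2020-06-22",
             "2018-07-22", "2018-07-23", "2018-07-24", "2018-07-25",
             "2018-08-22", "2018-08-23", "2018-08-24", "2018-08-25"],
    "score": [-2, 1, 2, 3, 7, 8, 6, 5, -8, 3, 2, -3, -2, 1, 2, 3],
})
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(by=["player_id", "date"])  # dates must be increasing within each group

# 60-day window over the 'date' column, computed per player;
# each part keeps the group's original row labels
parts = [
    group.rolling("60D", on="date")["score"].mean()
    for _, group in df.groupby("player_id")
]
df["rolling_mean"] = pd.concat(parts).round(2)  # aligned by index, not by position
print(df)
```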

【Discussion】:

【Solution 2】:

Not sure if it's the most elegant, but here you go:

import pandas as pd

# DataFrame you provided
df = pd.DataFrame({'player_id': [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
                   'date': ['2018-06-22', '2018-06-23', '2018-07-24', '2018-07-25',
                            '2018-07-22', '2018-07-23', '2018-07-24', '2018-07-25',
                            '2018-08-22', '2018-08-23', '2018-08-24', '2018-08-25'],
                   'score': [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3]})

# As best practice, convert date strings to datetime
df['date'] = pd.to_datetime(df['date'])

"""
Group by player and date, then take a rolling mean with a window of 1 row.
Note that rolling(1) sets the window size (not min_periods), so each row's
"average" is its own score. The result is a Series indexed by
(player_id, date, original row index).
"""
rolling_average_series = df.groupby(by=['player_id', 'date'])['score'].rolling(1).mean()

# Convert the Series to a DataFrame with .to_frame(), then reset the index
# so that player_id and date become columns again
df_grouped = rolling_average_series.to_frame().reset_index(level=['player_id', 'date'])

Apologies for the weird formatting.
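Because every (player_id, date) pair occurs only once in this data and `rolling(1)` uses a one-row window, the resulting "average" equals each row's own score. A quick check of that on the question's DataFrame, reusing the same grouping:

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
    "date": ["2018-06-22", "2018-06-23", "2018-07-24", "2018-07-25",
             "2018-07-22", "2018-07-23", "2018-07-24", "2018-07-25",
             "2018-08-22", "2018-08-23", "2018-08-24", "2018-08-25"],
    "score": [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3],
})
df["date"] = pd.to_datetime(df["date"])

# Each (player_id, date) group holds a single row in this data
rolling_average_series = df.groupby(["player_id", "date"])["score"].rolling(1).mean()
df_grouped = rolling_average_series.to_frame().reset_index(level=["player_id", "date"])

# With a one-row window the "mean" is just the score itself
# (df is already in player_id/date order, so the rows line up)
print((df_grouped["score"].to_numpy() == df["score"].to_numpy()).all())
```

Getting an actual two-month average needs a window that spans several rows, for example grouping by `player_id` alone with a time-based window.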

【Discussion】:

【Solution 3】:
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'player_id': [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
                   'date': ['2018-06-22', '2018-06-23', '2018-07-24', '2018-07-25',
                            '2018-07-22', '2018-07-23', '2018-07-24', '2018-07-25',
                            '2018-08-22', '2018-08-23', '2018-08-24', '2018-08-25'],
                   'score': [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3]})

window = 4
min_periods = 3
cap = 10  # not used below

def get_month(date):
    return date.month

df = df.sort_values(by='date')
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].apply(get_month)
# Days since the previous row, and the running total of elapsed days
df['elapsed_days'] = (df['date'] - df['date'].shift(1)).dt.days
df['accum_elapsed_days'] = df['elapsed_days'].cumsum()

# Mean of the last `window` rows (all players together), needing at least `min_periods` rows
df['rolling_mean'] = df['score'].rolling(window, min_periods=min_periods).mean().round(2)

print(df[['date', 'score', 'month', 'elapsed_days', 'accum_elapsed_days', 'rolling_mean']])

df.set_index('date', inplace=True)
df[['score', 'rolling_mean']].plot()
plt.show()
      

Output:

         date  score  month  elapsed_days  accum_elapsed_days  rolling_mean
0  2018-06-22     -2      6           NaN                 NaN           NaN
1  2018-06-23      1      6           1.0                 1.0           NaN
4  2018-07-22     -8      7          29.0                30.0         -3.00
5  2018-07-23      3      7           1.0                31.0         -1.50
2  2018-07-24      2      7           1.0                32.0         -0.50
6  2018-07-24      2      7           0.0                32.0         -0.25
3  2018-07-25      3      7           1.0                33.0          2.50
7  2018-07-25     -3      7           0.0                33.0          1.00
8  2018-08-22     -2      8          28.0                61.0          0.00
9  2018-08-23      1      8           1.0                62.0         -0.25
10 2018-08-24      2      8           1.0                63.0         -0.50
11 2018-08-25      3      8           1.0                64.0          1.00
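The rolling mean above runs over all players' rows in date order, so scores from players 1098 and 1116 share the same window. If the same row-count window (4 rows, at least 3 present) is wanted separately for each player, a sketch using `groupby` before `rolling`:

```python
import pandas as pd

df = pd.DataFrame({
    "player_id": [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
    "date": ["2018-06-22", "2018-06-23", "2018-07-24", "2018-07-25",
             "2018-07-22", "2018-07-23", "2018-07-24", "2018-07-25",
             "2018-08-22", "2018-08-23", "2018-08-24", "2018-08-25"],
    "score": [-2, 1, 2, 3, -8, 3, 2, -3, -2, 1, 2, 3],
})
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["player_id", "date"])

# Last 4 rows of the same player, requiring at least 3; droplevel(0) drops the
# player_id level so the result aligns with df's original index
df["rolling_mean"] = (
    df.groupby("player_id")["score"]
      .rolling(4, min_periods=3)
      .mean()
      .droplevel(0)
      .round(2)
)
print(df)
```

Note that this still counts rows rather than days, so it is not a calendar-based two-month window.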
      

【Discussion】:
