【Question title】: Pandas: Rolling average starts again on new Multi-Index value
【Posted】: 2021-07-08 08:44:24
【Question description】:

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Team':['A','A','A','A','B','B','B','B'],
                   'Date':list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score':[7,3,3,6,7,3,7,5],
                   }).set_index(['Team', 'Date'])

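I want to add a rolling-average column that resets whenever the level-0 index takes a new value. The following simple code does not work, because the rolling average carries over from one index value to the next: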

df['Avg'] = df['Score'].rolling(window=2).mean()


                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  6.5
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

How can I get the following DataFrame instead?

                 Score  Avg
Team Date                  
A    2021-01-01      7  NaN
     2021-01-02      3  5.0
     2021-01-03      3  3.0
     2021-01-04      6  4.5
B    2021-01-05      7  NaN
     2021-01-06      3  5.0
     2021-01-07      7  5.0
     2021-01-08      5  6.0

Thanks

【Comments】:

    Tags: python pandas


    【Solution 1】:

    Use df.groupby (taking .values when assigning the result to the new column):

    # Select 'Score' so that .values is one-dimensional, even if df already has other columns
    df['Avg'] = df.groupby('Team')['Score'].rolling(window=2).mean().values
    

    This produces:

                     Score  Avg
    Team Date                  
    A    2021-01-01      7  NaN
         2021-01-02      3  5.0
         2021-01-03      3  3.0
         2021-01-04      6  4.5
    B    2021-01-05      7  NaN
         2021-01-06      3  5.0
         2021-01-07      7  5.0
         2021-01-08      5  6.0
    

    【Comments】:

      【Solution 2】:

      Use groupby rolling mean on level='Team', then droplevel so that the index aligns correctly:

      df['Avg'] = (
          df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
      )
      

      df:

                       Score  Avg
      Team Date                  
      A    2021-01-01      7  NaN
           2021-01-02      3  5.0
           2021-01-03      3  3.0
           2021-01-04      6  4.5
      B    2021-01-05      7  NaN
           2021-01-06      3  5.0
           2021-01-07      7  5.0
           2021-01-08      5  6.0
      

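      The advantage of droplevel over values is that the result is assigned by index label rather than by row position, so every average lands on the row it belongs to.

      Given a DataFrame whose groups are not in sorted order, for example: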
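
To see why droplevel(0) is needed, it can help to look at the index of the intermediate result. The sketch below rebuilds the frame from the question and assumes a reasonably recent pandas, in which groupby(...).rolling(...) prepends the group key to the original index. The names `rolled` and `aligned` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score': [7, 3, 3, 6, 7, 3, 7, 5],
                   }).set_index(['Team', 'Date'])

# The rolling mean is computed per team; the group key is added in front
# of the existing (Team, Date) index, so 'Team' appears twice.
rolled = df.groupby(level='Team')['Score'].rolling(window=2).mean()
print(rolled.index.names)

# Dropping the outermost level restores the original (Team, Date) index,
# which lets pandas align the result with df by label on assignment.
aligned = rolled.droplevel(0)
print(aligned.index.names)

df['Avg'] = aligned
print(df)
```

Because the assignment goes through the index, it does not depend on the order in which the rows happen to be stored.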

      df = pd.DataFrame({'Team': ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
                         'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                         'Score': [7, 7, 7, 8, 1, 2, 1, 2],
                         }).set_index(['Team', 'Date'])
      

      df:

                       Score
      Team Date             
      B    2021-01-01      7
           2021-01-02      7
           2021-01-03      7
           2021-01-04      8
      A    2021-01-05      1
           2021-01-06      2
           2021-01-07      1
           2021-01-08      2
      

      Note the difference between droplevel and values:

      df['drop_level'] = (
          df.groupby(level='Team')['Score'].rolling(window=2).mean().droplevel(0)
      )
      df['values'] = (
          df.groupby(level='Team')['Score'].rolling(window=2).mean().values
      )
      
                       Score  drop_level  values
      Team Date                                 
      B    2021-01-01      7         NaN     NaN
           2021-01-02      7         7.0     1.5
           2021-01-03      7         7.0     1.5
           2021-01-04      8         7.5     1.5  # These are the averages from A
      A    2021-01-05      1         NaN     NaN
           2021-01-06      2         1.5     7.0  # These are the averages from B
           2021-01-07      1         1.5     7.0
           2021-01-08      2         1.5     7.5
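
As a possible alternative that is not part of the answers above, groupby(...).transform returns a result that already carries the original index, so neither droplevel nor values is needed. This is a minimal sketch on the same unordered frame; the lambda is an assumption about how one might express the per-group rolling mean, and it is likely slower than the built-in groupby rolling on large data.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
                   'Date': list(pd.date_range(start='1/1/2021', periods=8)),
                   'Score': [7, 7, 7, 8, 1, 2, 1, 2],
                   }).set_index(['Team', 'Date'])

# transform applies the function to each team's scores and returns a Series
# indexed like df, so the averages stay on the rows they were computed from.
df['Avg'] = (
    df.groupby(level='Team')['Score']
      .transform(lambda s: s.rolling(window=2).mean())
)
print(df)
```

On this frame the new column should match the drop_level column shown above, with NaN on the first row of each team.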
      

      【Comments】: