【Question title】: Pandas rolling over days and getting sum
【Posted】: 2020-07-25 09:35:35
【Question】:

This is my DataFrame:

import pandas

d = {'dates': ['2020-07-16','2020-07-15','2020-07-14','2020-07-13','2020-07-16','2020-07-15','2020-07-14','2020-07-13'],
     'location': ['Paris','Paris','Paris','Paris','NY','NY','NY','NY'], 'T': [100,200,300,400,10,20,30,40]}
df = pandas.DataFrame(data=d)
# Parse the date strings into datetime64 values
df['dates'] = pandas.to_datetime(df['dates'])
df
    dates   location    T
0   2020-07-16  Paris   100
1   2020-07-15  Paris   200
2   2020-07-14  Paris   300
3   2020-07-13  Paris   400
4   2020-07-16  NY       10
5   2020-07-15  NY       20
6   2020-07-14  NY       30
7   2020-07-13  NY       40

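For each location, I want the sum of the T values over a rolling window of the past 2 days (including the current date). This is the result I want: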

    dates   location    T     SUM2D
0   2020-07-16  Paris   100     300
1   2020-07-15  Paris   200     500
2   2020-07-14  Paris   300     700
3   2020-07-13  Paris   400     NaN
4   2020-07-16  NY       10      30
5   2020-07-15  NY       20      50
6   2020-07-14  NY       30      70
7   2020-07-13  NY       40     NaN

I tried playing with this statement, without success:

df['SUM2D'] = df.set_index('dates').groupby('location').rolling(window=2, freq='D').sum()['T'].values

【Question comments】:

    Tags: python pandas dataframe pandas-groupby


    【Solution 1】:

    Try sorting the DataFrame before setting the index:

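    # Sort by location, then date, so the rows match the group-by-group order of the rolling result
    df = df.sort_values(['location','dates']).set_index('dates')
    # window=2 counts rows, not calendar days; freq='D' is omitted because recent pandas versions reject it in rolling()
    df['SUM2D'] = df.groupby('location')['T'].rolling(window=2).sum().values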
    
    df[::-1]
    

    Result:

               location    T  SUM2D
    dates                          
    2020-07-16    Paris  100  300.0
    2020-07-15    Paris  200  500.0
    2020-07-14    Paris  300  700.0
    2020-07-13    Paris  400    NaN
    2020-07-16       NY   10   30.0
    2020-07-15       NY   20   50.0
    2020-07-14       NY   30   70.0
    2020-07-13       NY   40    NaN
    

    A more concise and elegant solution is to use transform, applied to the original DataFrame (before the index was changed):

    df['SUM2D'] = df.sort_values(['dates']).groupby('location')['T'].transform(lambda x: x.rolling(2, 2).sum())
    

    The result is now:

           dates location    T  SUM2D
    0 2020-07-16    Paris  100  300.0
    1 2020-07-15    Paris  200  500.0
    2 2020-07-14    Paris  300  700.0
    3 2020-07-13    Paris  400    NaN
    4 2020-07-16       NY   10   30.0
    5 2020-07-15       NY   20   50.0
    6 2020-07-14       NY   30   70.0
    7 2020-07-13       NY   40    NaN
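
Both snippets above use an integer window, so they sum the current row and the previous row per location, which equals "the past 2 days" only when every location has exactly one row per consecutive day. If dates can be missing, a time-based window may be closer to the stated goal. The following is a minimal sketch under that assumption, not part of the original answer; the names `ordered`, `rolled` and `result` are illustrative, and `min_periods=2` is chosen here only to reproduce the NaN on the first day.

```python
import pandas as pd

d = {
    'dates': ['2020-07-16', '2020-07-15', '2020-07-14', '2020-07-13',
              '2020-07-16', '2020-07-15', '2020-07-14', '2020-07-13'],
    'location': ['Paris'] * 4 + ['NY'] * 4,
    'T': [100, 200, 300, 400, 10, 20, 30, 40],
}
df = pd.DataFrame(data=d)
df['dates'] = pd.to_datetime(df['dates'])

# Sort so that rows line up with the group-by-group order of the rolling result
# (groupby sorts the location keys, and each group keeps its date order).
ordered = df.sort_values(['location', 'dates'])

# '2D' is a time offset: each window covers the current date and the day before it.
# min_periods=2 yields NaN when fewer than two observations fall inside the window.
rolled = (
    ordered.set_index('dates')
    .groupby('location')['T']
    .rolling('2D', min_periods=2)
    .sum()
)

result = ordered.copy()
result['SUM2D'] = rolled.to_numpy()   # positional assignment, same row order as `ordered`
result = result.sort_index()          # restore the original row order (0..7)
print(result)
```

With this variant, a location that skips a day gets NaN for the row after the gap instead of a sum across non-adjacent dates.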
    

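    【Discussion】:

    • Just add df[::-1] to your first solution to put the dates back in order. Thanks!
    • Just edited: the dates are now reordered. If the solution works for you, please consider accepting it as the answer.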