How to resample a DataFrame in pandas and apply a weighted average? Answers

[Question title]: How to resample a dataframe in pandas and apply a weighted average?
[Posted]: 2020-06-12 04:40:58
[Question description]:

I have a DataFrame indexed by time with two columns: price and quantity.

I want to build a new Series containing the average price over each 15-minute interval, weighted by quantity.
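Written out, the target value for one 15-minute bin $B$ is the quantity-weighted mean price, where $p_i$ and $q_i$ denote the price and quantity of row $i$ (notation introduced here for illustration):

$$\bar{p}_B = \frac{\sum_{i \in B} p_i \, q_i}{\sum_{i \in B} q_i}$$

For the rows shown in the question every price is 203.0, so the result for their bin is 203.0 whatever the quantities are.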

Here is the head of my DataFrame:

                          price  quantity
ts                                        
2020-06-10 15:56:34+00:00  203.0       400
2020-06-10 15:57:10+00:00  203.0      1300
2020-06-10 15:57:11+00:00  203.0      1100
2020-06-10 15:57:13+00:00  203.0      3000
2020-06-10 15:57:14+00:00  203.0       700

Here is my best attempt:

import numpy as np

def resample_method(x):
    return np.average(x.price, weights=x.quantity)

df.resample("15T").apply(resample_method)

Although the code above expresses my intent (I believe), I get the following error:

Exception has occurred: AttributeError
'Series' object has no attribute 'price'

[Question discussion]:

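When you use apply here, it works column by column, so the function receives one column at a time. That means when it is called on the price column, the quantity column is not available. What you need to do is compute a weighted price (price × quantity) for each row, then resample that weighted-price column at 15T, summing it and dividing by the summed quantity in each bin.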
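A minimal sketch of the approach described in that comment, assuming `df` has a tz-aware `DatetimeIndex` and numeric `price`/`quantity` columns as in the question. The helper name `weighted_price` and the intermediate name `notional` are hypothetical. It sums price × quantity and quantity separately in each bin and divides; bins with no rows would otherwise divide by zero, so they are returned as NaN.

```python
import numpy as np
import pandas as pd


def weighted_price(df: pd.DataFrame, freq: str = "15min") -> pd.Series:
    """Quantity-weighted average price per `freq` bin (hypothetical helper)."""
    # Per-row price * quantity: each row's contribution to the numerator.
    notional = df["price"] * df["quantity"]
    # Sum the weighted prices and the weights separately in each time bin.
    num = notional.resample(freq).sum()
    den = df["quantity"].resample(freq).sum()
    # Empty bins have den == 0; turn that into NaN instead of dividing by zero.
    return num / den.replace(0, np.nan)


# The five rows shown in the question.
df = pd.DataFrame(
    {"price": [203.0] * 5, "quantity": [400, 1300, 1100, 3000, 700]},
    index=pd.to_datetime(
        [
            "2020-06-10 15:56:34+00:00",
            "2020-06-10 15:57:10+00:00",
            "2020-06-10 15:57:11+00:00",
            "2020-06-10 15:57:13+00:00",
            "2020-06-10 15:57:14+00:00",
        ],
        utc=True,
    ),
)
df.index.name = "ts"

print(weighted_price(df))
```

Because only built-in `sum` aggregations are used, this avoids calling a Python function per bin, which tends to matter on large tick data.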

Tags: python pandas numpy

[Solution 1]:

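As @Scott Boston pointed out in the comments, when you use resample you cannot access both columns at the same time. One trick is to append the quantity column to the index, because every column passed to the function carries the index with it.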

import numpy as np

# note: I used '1T' (1 minute) instead of '15T' like you, but it is a simple change in the method
dfr = (df.set_index('quantity', append=True)  # move quantity into the index as level 1
         .resample('1T', level=0)  # resample on the datetime index, which is level 0
         .apply(lambda x: np.average(x, weights=x.index.get_level_values(1)))  # weights come from level 1 (quantity)
      )
print(dfr)  # the result is not very interesting here, but it works
                           price
ts                              
2020-06-10 15:56:00+00:00  203.0
2020-06-10 15:57:00+00:00  203.0
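A different route, not part of the original answer: group with `pd.Grouper(freq=...)` instead of calling `resample(...).apply(...)`. `DataFrame.groupby(...).apply` passes each bin to the function as a whole sub-DataFrame, so a function like the question's `resample_method` can see both `price` and `quantity`. This is a sketch assuming the same `ts`-indexed frame; `weighted_price_groupby` is a hypothetical name. Bins whose quantities sum to zero (including any empty bins) are mapped to NaN, because `np.average` raises when the weights sum to zero.

```python
import numpy as np
import pandas as pd


def weighted_price_groupby(df: pd.DataFrame, freq: str = "15min") -> pd.Series:
    """Quantity-weighted average price per bin via groupby (hypothetical helper)."""

    def per_bin(g: pd.DataFrame) -> float:
        # g is the full sub-DataFrame for one time bin, so both columns exist.
        if g["quantity"].sum() == 0:
            return np.nan  # np.average would raise on zero total weight
        return float(np.average(g["price"], weights=g["quantity"]))

    # Grouper bins the DatetimeIndex like resample, but apply sees whole frames.
    return df.groupby(pd.Grouper(freq=freq)).apply(per_bin)


# The five rows shown in the question.
df = pd.DataFrame(
    {"price": [203.0] * 5, "quantity": [400, 1300, 1100, 3000, 700]},
    index=pd.to_datetime(
        [
            "2020-06-10 15:56:34+00:00",
            "2020-06-10 15:57:10+00:00",
            "2020-06-10 15:57:11+00:00",
            "2020-06-10 15:57:13+00:00",
            "2020-06-10 15:57:14+00:00",
        ],
        utc=True,
    ),
)
df.index.name = "ts"

print(weighted_price_groupby(df))
```

This keeps the question's per-bin function almost unchanged, at the cost of a Python call per bin.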

[Discussion]: