【Posted】: 2016-01-09 05:29:41
【Question】:
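TL;DR: is there a solution for:
- adding data to a DataFrame in real time (non-constant sampling rate: sometimes 1 second between new data points, sometimes 0.2 seconds, sometimes 2 seconds, etc.)
- being able to compute a rolling_mean over a fixed 5-second window (whether that window contains 10, 100, or only 2 samples)
More precisely: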
import time
import pandas as pd

df = pd.DataFrame(columns=['x'])
for i in range(10):
    # append one row, indexed by the current timestamp
    df.loc[pd.Timestamp.now()] = {'x': 10 + i}
    time.sleep(0.2)  # here 0.2 seconds between each new data point...
df.loc[pd.Timestamp.now()] = {'x': 20}
time.sleep(1)  # here 1 second...
df.loc[pd.Timestamp.now()] = {'x': 21}
time.sleep(3)  # here 3 seconds...
df.loc[pd.Timestamp.now()] = {'x': 22}
This gives the following df:
x
2016-01-08 13:57:10.679 10
2016-01-08 13:57:10.882 11
2016-01-08 13:57:11.085 12
2016-01-08 13:57:11.287 13
2016-01-08 13:57:11.489 14
2016-01-08 13:57:11.691 15
2016-01-08 13:57:11.893 16
2016-01-08 13:57:12.095 17
2016-01-08 13:57:12.297 18
2016-01-08 13:57:12.499 19
2016-01-08 13:57:12.701 20
2016-01-08 13:57:13.703 21
2016-01-08 13:57:16.706 22
And this is pd.rolling_mean(df, 5):
x
2016-01-08 13:57:10.679 NaN
2016-01-08 13:57:10.882 NaN
2016-01-08 13:57:11.085 NaN
2016-01-08 13:57:11.287 NaN
2016-01-08 13:57:11.489 12
2016-01-08 13:57:11.691 13
2016-01-08 13:57:11.893 14
2016-01-08 13:57:12.095 15
2016-01-08 13:57:12.297 16
2016-01-08 13:57:12.499 17
2016-01-08 13:57:12.701 18
2016-01-08 13:57:13.703 19
2016-01-08 13:57:16.706 20
Of course, pd.rolling_mean(df, 5) computes the rolling mean over a period of 5 rows, which is not what I want: I want a period of 5 seconds.
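To make the difference concrete, here is a minimal sketch of a window measured in seconds rather than rows. It assumes a newer pandas than the one used in the question: as far as I know, offset strings such as "5s" are accepted by .rolling() from pandas 0.19 onward, on a sorted DatetimeIndex. The timestamps and values are copied from the df printed above; the names idx and rolling_5s are only illustrative.

```python
import pandas as pd

# Timestamps and values copied from the df shown in the question.
idx = pd.to_datetime([
    "2016-01-08 13:57:10.679", "2016-01-08 13:57:10.882",
    "2016-01-08 13:57:11.085", "2016-01-08 13:57:11.287",
    "2016-01-08 13:57:11.489", "2016-01-08 13:57:11.691",
    "2016-01-08 13:57:11.893", "2016-01-08 13:57:12.095",
    "2016-01-08 13:57:12.297", "2016-01-08 13:57:12.499",
    "2016-01-08 13:57:12.701", "2016-01-08 13:57:13.703",
    "2016-01-08 13:57:16.706",
])
df = pd.DataFrame({"x": range(10, 23)}, index=idx)

# Offset-based window: each row averages every sample whose timestamp lies
# in the 5 seconds up to and including that row, however many there are.
rolling_5s = df["x"].rolling("5s").mean()
print(rolling_5s)
```

With this data, the last row averages the seven samples from 13:57:11.893 onward, while the earlier rows average everything seen so far, because the first 12 samples all fall within about 3 seconds of each other.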
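One solution would be df.resample('1S', ...), but since I want to compute a new rolling_mean every time new data is added, I would have to .resample(...) the whole DataFrame several times per minute. That is very time-consuming, and I don't think it is a clean solution. (In my real use case, the DataFrame is large.)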
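One way to avoid reprocessing the whole DataFrame on every new sample is to keep only the samples of the last 5 seconds, together with their running sum. The following is a rough sketch of that idea in plain Python, not a pandas feature; the class name TimeWindowMean and its add method are hypothetical, and timestamps are assumed to be floats in seconds that arrive in increasing order.

```python
from collections import deque


class TimeWindowMean:
    """Running mean over the last `window` seconds (hypothetical helper)."""

    def __init__(self, window=5.0):
        self.window = window
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0        # sum of the values currently in the window

    def add(self, t, value):
        """Record a sample taken at time t (seconds); return the current mean."""
        self.samples.append((t, value))
        self.total += value
        # Evict samples outside the window (t - window, t]. The sample just
        # added is always inside it, so the deque never becomes empty.
        while self.samples[0][0] <= t - self.window:
            _, old_value = self.samples.popleft()
            self.total -= old_value
        return self.total / len(self.samples)


# Seconds-of-minute and values taken from the df shown in the question.
times = [10.679, 10.882, 11.085, 11.287, 11.489, 11.691, 11.893,
         12.095, 12.297, 12.499, 12.701, 13.703, 16.706]
values = range(10, 23)

tracker = TimeWindowMean(window=5.0)
means = [tracker.add(t, v) for t, v in zip(times, values)]
print(means[-1])
```

Each sample is appended once and evicted at most once, so the cost per new data point does not grow with the total history. In a live loop, t could come from time.time(). Over a very long stream the running sum may accumulate floating-point error, so recomputing it from the deque now and then is a reasonable precaution.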
What would be a clean solution?
【Comments】:
- Did you ever find a solution that works without taking a lot of time?
Tags: python pandas time-series