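【Posted】: 2015-04-30 04:39:39
【Question】:
I have a long list (10 years) of hourly values, and I want to average column 3 per day, so that each date gets a mean value derived from column 3.
My data looks like this: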
> 1/1/2005,16:00:00,83.3971,-3.8950
> 1/1/2005,17:00:00,0.0000,-3.9146
> 1/1/2005,18:00:00,0.0000,-3.9337
> 1/1/2005,19:00:00,0.0000,-3.9532
> 1/1/2005,20:00:00,0.0000,-3.9727
> 1/1/2005,21:00:00,0.0000,-3.9920
> 1/1/2005,22:00:00,0.0000,-4.0116
> 1/1/2005,23:00:00,0.0000,-4.0311
> 1/2/2005,0:00:00,0.0000,-4.0503
> 1/2/2005,1:00:00,0.0000,-4.0697
> 1/2/2005,2:00:00,0.0000,-4.0891
> 1/2/2005,3:00:00,0.0000,-4.1083
> 1/2/2005,4:00:00,0.0000,-4.1279
> 1/2/2005,5:00:00,0.0000,-4.1472
> 1/2/2005,6:00:00,0.0000,-4.1662
> 1/2/2005,7:00:00,0.0000,-4.1858
> 1/2/2005,8:00:00,0.0000,-4.2053
> 1/2/2005,9:00:00,152.7058,-4.2242
> 1/2/2005,10:00:00,302.6400,-4.2436
> 1/2/2005,11:00:00,405.2218,-4.2630
> 1/2/2005,12:00:00,452.6208,-4.2821
> 1/2/2005,13:00:00,441.4662,-4.3016
> 1/2/2005,14:00:00,372.5459,-4.3208
> 1/2/2005,15:00:00,250.8291,-4.3398
> 1/2/2005,16:00:00,86.6172,-4.3592
> 1/2/2005,17:00:00,0.0000,-4.3785
> 1/2/2005,18:00:00,0.0000,-4.3973
> 1/2/2005,19:00:00,0.0000,-4.4167
>...
> 12/30/2014,23:00:00,0.0000,0.7601
> 12/31/2014,0:00:00,0.0000,0.7601
> 12/31/2014,1:00:00,0.0000,0.7601
> 12/31/2014,2:00:00,0.0000,0.7601
> 12/31/2014,3:00:00,0.0000,0.7601
> 12/31/2014,4:00:00,0.0000,0.7601
> 12/31/2014,5:00:00,0.0000,0.7601
> 12/31/2014,6:00:00,0.0000,0.7601
> 12/31/2014,7:00:00,0.0000,0.7601
> 12/31/2014,8:00:00,0.0000,2.6808
> 12/31/2014,9:00:00,153.8084,1.6338
> 12/31/2014,10:00:00,301.9711,1.3491
> 12/31/2014,11:00:00,402.5888,1.2512
> 12/31/2014,12:00:00,447.9860,1.2191
> 12/31/2014,13:00:00,434.9283,1.2277
>...
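This might be a good opportunity to highlight the "Split, Apply, Combine" premise with a simple use case?
Perhaps read the CSV into pandas, index it by datetime objects, then group by day and aggregate as sum divided by count (i.e. the mean)?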
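A minimal sketch of that split-apply-combine idea, assuming the file has no header row and its columns are date, time, and two value columns. The column names (`date`, `time`, `value`, `other`) are placeholders of mine, and the inline sample is just four rows copied from the data above:

```python
import io
import pandas as pd

# Four rows copied from the question's data (date, time, value, other).
sample = io.StringIO(
    "1/1/2005,16:00:00,83.3971,-3.8950\n"
    "1/1/2005,17:00:00,0.0000,-3.9146\n"
    "1/2/2005,9:00:00,152.7058,-4.2242\n"
    "1/2/2005,10:00:00,302.6400,-4.2436\n"
)

names = ["date", "time", "value", "other"]  # hypothetical column names
df = pd.read_csv(sample, header=None, names=names)

# Build a DatetimeIndex from the separate date and time columns.
df.index = pd.to_datetime(df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M:%S")

# Split by calendar day, apply the mean, combine into one Series.
day_key = df.index.normalize()  # timestamps truncated to midnight
daily_mean = df["value"].groupby(day_key).mean()

# The same thing spelled out as sum / count.
grouped = df["value"].groupby(day_key)
daily_mean_manual = grouped.sum() / grouped.count()

print(daily_mean)
```

Both forms give the same result here; `mean()` is simply the shorter way to write the sum divided by the count.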
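Question: I need average daily values, starting from the 10-year hourly time series above. For example, I have an hourly dataset from January 1, 2005 to December 31, 2014, and for each day of the year I want the average of that day's daily means across the 10 years. You dig?
I have already gone from hourly to daily using: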
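import pandas as pd
# Assumes no header row. The dict form of parse_dates merges columns 0 and 1 into 'datetime' (deprecated in recent pandas).
df = pd.read_csv('file.csv', parse_dates={'datetime': [0, 1]}, index_col='datetime', header=None, usecols=[0, 1, 2])
day_avgs = df.groupby(pd.Grouper(freq='D')).mean()  # pd.Grouper replaces the removed pd.TimeGrouper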
This does return the average daily values, see below:
date
2005-01-01 106.307291
2005-01-02 102.578729
2005-01-03 103.332883
2005-01-04 104.139979
2005-01-05 104.999592
... ...
2014-12-02 108.292092
2014-12-03 107.189729
2014-12-04 106.142721
2014-12-05 105.151696
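However, I don't know how to group the daily values in "day_avgs" by calendar date (there are 10 of each) and then average them into a single daily mean, i.e. the mean for that date across the whole 10-year dataset. Capisce?
In other words, I want to compute the average for each day of the year (365) from the 10 years of daily means.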
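One possible way to express that second grouping step, as a sketch rather than a definitive answer. The data here is synthetic (each hourly value is set to its own year so the result can be checked by hand), and the names `hourly`, `daily`, and `by_calendar_day` are mine:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real file: ten years of hourly values,
# each value equal to its year, so the expected means are easy to verify.
idx = pd.date_range("2005-01-01 00:00", "2014-12-31 23:00", freq="h")
hourly = pd.Series(np.asarray(idx.year, dtype=float), index=idx, name="value")

# Step 1: hourly -> daily means (the part the question already has).
daily = hourly.resample("D").mean()

# Step 2: group the daily means by (month, day) so the same calendar date
# from every year lands in one group, then average across the years.
by_calendar_day = daily.groupby([daily.index.month, daily.index.day]).mean()
by_calendar_day.index.names = ["month", "day"]

print(by_calendar_day.loc[(1, 1)])   # mean of the ten January 1 daily means
print(len(by_calendar_day))          # number of calendar dates found
```

Grouping by (month, day) instead of `dayofyear` keeps March 1 aligned across leap and non-leap years. The price is a 366th group for February 29, which is averaged over the leap years only (2008 and 2012 in this range). Note also that this is a mean of daily means, which matches the mean of all hourly values for a date only when every day has the same number of observations.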
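【Comments】:
-
Why divide by 8? You have an extra 8 observations; do you want to discount the 0.0000 values?
-
Also, your post contains several questions, which is discouraged. Ideally there is one question per post, so you need to edit your question.
-
My question is only one question, but it has steps, no doubt. I can work with an average that includes that day's mean or one that doesn't, whichever is easiest for the respondent. The -8 was just an example of the averaging calculation. I think this question (again, only one) deserves to stay unedited, because I believe the answer will go a long way toward helping others. Thanks
Tags: python pandas group-by time-series python-datetime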