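【Posted】: 2020-11-06 16:58:55
【Question】:
I'm working with a very large Excel file that I read into a data frame. I clean and filter the data frame so that only the data I want is left. The new data frame looks like this: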
Date DirectionName FlagText
0 2018-02-02 00:00:03.000 South Friday
1 2018-02-02 00:00:22.010 South Friday
2 2018-02-02 00:00:22.020 South Friday
3 2018-02-02 00:00:36.040 South Friday
4 2018-02-02 00:00:49.070 South Friday
... ... ... ...
445632 2018-02-23 23:59:28.070 South Friday
445633 2018-02-23 23:59:29.000 South Friday
445634 2018-02-23 23:59:33.090 South Friday
445635 2018-02-23 23:59:45.070 South Friday
445636 2018-02-23 23:59:50.080 South Friday
I then group by day and hour so that I can see how many rows fall within each one-hour window. This is the code I'm using to do that:
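import pandas as pd

pd.set_option('display.max_rows', None)  # print every row of the result
# Build a "day/hour" string key such as "16/0" from each timestamp
df['Day/Hour'] = df['Date'].apply(lambda x: "%d/%d" % (x.day, x.hour))
df.groupby(['Day/Hour', 'DirectionName']).size()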
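The output covers four dates (02/02/18, 09/02/18, 16/02/18 and 23/02/18) and shows all 24 hours for each one. It counts how many "South" rows fall within each hour. For example, between midnight and 1 am on 16 February it counted 219 "South" rows.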
Day/Hour DirectionName
16/0 South 219
16/1 South 163
16/10 South 1594
16/11 South 1775
16/12 South 2026
16/13 South 2111
16/14 South 2400
16/15 South 2588
16/16 South 2927
16/17 South 2690
16/18 South 2071
16/19 South 1513
16/2 South 87
16/20 South 1025
16/21 South 798
16/22 South 831
16/23 South 590
16/3 South 125
16/4 South 117
16/5 South 290
16/6 South 802
16/7 South 1760
16/8 South 1964
16/9 South 1592
2/0 South 250
2/1 South 137
2/10 South 1493
2/11 South 1716
2/12 South 1970
2/13 South 2081
2/14 South 2363
2/15 South 2583
2/16 South 2746
2/17 South 2647
2/18 South 2107
2/19 South 1521
2/2 South 92
2/20 South 1047
2/21 South 851
2/22 South 813
2/23 South 557
2/3 South 92
2/4 South 110
2/5 South 272
2/6 South 832
2/7 South 1972
2/8 South 2106
2/9 South 1695
23/0 South 214
23/1 South 123
23/10 South 1592
23/11 South 1767
23/12 South 2030
23/13 South 2046
23/14 South 2387
23/15 South 2616
23/16 South 2796
23/17 South 2581
23/18 South 1979
23/19 South 1490
23/2 South 95
23/20 South 1056
23/21 South 858
23/22 South 783
23/23 South 563
23/3 South 83
23/4 South 134
23/5 South 265
23/6 South 803
23/7 South 1859
23/8 South 2089
23/9 South 1670
9/0 South 222
9/1 South 114
9/10 South 1505
9/11 South 1688
9/12 South 1888
9/13 South 2052
9/14 South 2366
9/15 South 2656
9/16 South 2906
9/17 South 2728
9/18 South 2043
9/19 South 1488
9/2 South 87
9/20 South 1097
9/21 South 840
9/22 South 711
9/23 South 628
9/3 South 88
9/4 South 134
9/5 South 293
9/6 South 890
9/7 South 1941
9/8 South 2095
9/9 South 1639
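A side note on the ordering above: because 'Day/Hour' is a string, the rows sort lexicographically, so 16/10 comes before 16/2 and day 16 comes before day 2. A minimal sketch of one way around that, assuming 'Date' is already a datetime column, is to group on numeric day and hour keys instead. The small frame below is a made-up stand-in for my data, and the names `day`, `hour` and `by_day_hour` are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned frame (only a handful of rows).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-02-02 00:00:03.000", "2018-02-02 00:00:22.010",
        "2018-02-02 02:15:00.000", "2018-02-02 10:30:00.000",
        "2018-02-16 00:00:36.040", "2018-02-16 10:05:00.000",
    ]),
    "DirectionName": ["South"] * 6,
})

# Integer keys sort numerically, unlike the "day/hour" strings.
day = df["Date"].dt.day.rename("Day")
hour = df["Date"].dt.hour.rename("Hour")

by_day_hour = df.groupby([day, hour, "DirectionName"]).size()
print(by_day_hour)
```

With integer keys, hour 2 is listed before hour 10 and day 2 before day 16.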
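I'm confused about why I can't work out how to manipulate this new data frame. I want to add up the counts that share the same hour across the four days.
For example: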
2/0 South 250 +
9/0 South 222 +
16/0 South 219 +
23/0 South 214 +
= 250 + 222 + 219 + 214 = 905.
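I need this total before I can find the average: 905 / 4 = 226.25. With that average I would know that, on Fridays in February 2018, an average of 226.25 rows were recorded between midnight and 1 am.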
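To make the arithmetic concrete, here is a rough sketch of the calculation I'm after, not necessarily the best way to do it. It uses only the hour 0 and hour 1 counts copied from the output above. The frame `counts` and the columns 'Count' and 'Hour' are names I made up for this illustration. It assumes the grouped result can be turned into an ordinary frame, for instance with `.size().reset_index(name='Count')`.

```python
import pandas as pd

# Counts for hours 0 and 1, copied from the grouped output in the question.
# With the real data, something like
#   df.groupby(['Day/Hour', 'DirectionName']).size().reset_index(name='Count')
# should produce a frame with these same three columns.
counts = pd.DataFrame({
    "Day/Hour": ["2/0", "2/1", "9/0", "9/1", "16/0", "16/1", "23/0", "23/1"],
    "DirectionName": ["South"] * 8,
    "Count": [250, 137, 222, 114, 219, 163, 214, 123],
})

# Take the hour out of the "day/hour" key so the four days can be grouped together.
counts["Hour"] = counts["Day/Hour"].str.split("/").str[1].astype(int)

# Total and average count per hour across the days.
hourly = counts.groupby(["Hour", "DirectionName"])["Count"].agg(["sum", "mean"])
print(hourly)
```

For hour 0 this gives a sum of 905 and a mean of 226.25, matching the hand calculation. Note that the mean is taken over the days that actually appear for a given hour, so a day with no rows in that hour would not count as a zero.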
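I hope I've explained this well enough; I know it's a bit confusing. I've avoided posting the original dataset because I didn't think it was necessary. Any help is much appreciated.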
【Comments】:
Tags: python pandas dataframe datetime