Pandas - Confused about how to manipulate dataframe (answers)

【Question title】: Pandas - Confused about how to manipulate dataframe
【Posted】: 2020-11-06 16:58:55
【Question】:

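I'm working with a very large Excel file, which I read into a dataframe. I clean/filter the dataframe so that only the data I want is left. The new dataframe looks like this: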

        Date                     DirectionName  FlagText
0       2018-02-02 00:00:03.000  South          Friday
1       2018-02-02 00:00:22.010  South          Friday
2       2018-02-02 00:00:22.020  South          Friday
3       2018-02-02 00:00:36.040  South          Friday
4       2018-02-02 00:00:49.070  South          Friday
...     ...                      ...            ...
445632  2018-02-23 23:59:28.070  South          Friday
445633  2018-02-23 23:59:29.000  South          Friday
445634  2018-02-23 23:59:33.090  South          Friday
445635  2018-02-23 23:59:45.070  South          Friday
445636  2018-02-23 23:59:50.080  South          Friday

I then group by day/hour so that I can see how much data falls in each one-hour period. I'm using this code to do that:

pd.set_option('display.max_rows', None)
df['Day/Hour'] = df['Date'].apply(lambda x: "%d/%d" % (x.day, x.hour))
df.groupby(['Day/Hour', 'DirectionName']).size()
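The "%d/%d" key built above is a string, so the groups sort as text (16/0, 16/1, 16/10, ...) and the hour is not available on its own. A minimal sketch, using a few hypothetical timestamps and assuming Date is already a datetime64 column, that groups on integer day and hour as separate index levels instead:

```python
import pandas as pd

# Hypothetical sample timestamps standing in for the real 'Date' column
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-02-02 00:00:03", "2018-02-02 00:00:22", "2018-02-02 01:15:00",
        "2018-02-09 00:10:00", "2018-02-09 10:30:00",
    ]),
    "DirectionName": ["South"] * 5,
})

# Group on integer day and hour (separate index levels) instead of a "day/hour" string,
# so the groups sort numerically and the hour can be addressed on its own
counts = df.groupby(
    [df["Date"].dt.day.rename("Day"), df["Date"].dt.hour.rename("Hour"), "DirectionName"]
).size()
print(counts)
```

With the hour as its own index level, sums or means across days become a plain groupby on that level, e.g. counts.groupby(level="Hour").sum().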

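The output shows the four dates 02/02/18, 09/02/18, 16/02/18 and 23/02/18, each with its 24 hourly ranges. It counts how many "South" records fall within each one-hour range; for example, on 16 February between 00:00 and 01:00 it counted 219 South records.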

Day/Hour  DirectionName
16/0      South             219
16/1      South             163
16/10     South            1594
16/11     South            1775
16/12     South            2026
16/13     South            2111
16/14     South            2400
16/15     South            2588
16/16     South            2927
16/17     South            2690
16/18     South            2071
16/19     South            1513
16/2      South              87
16/20     South            1025
16/21     South             798
16/22     South             831
16/23     South             590
16/3      South             125
16/4      South             117
16/5      South             290
16/6      South             802
16/7      South            1760
16/8      South            1964
16/9      South            1592
2/0       South             250
2/1       South             137
2/10      South            1493
2/11      South            1716
2/12      South            1970
2/13      South            2081
2/14      South            2363
2/15      South            2583
2/16      South            2746
2/17      South            2647
2/18      South            2107
2/19      South            1521
2/2       South              92
2/20      South            1047
2/21      South             851
2/22      South             813
2/23      South             557
2/3       South              92
2/4       South             110
2/5       South             272
2/6       South             832
2/7       South            1972
2/8       South            2106
2/9       South            1695
23/0      South             214
23/1      South             123
23/10     South            1592
23/11     South            1767
23/12     South            2030
23/13     South            2046
23/14     South            2387
23/15     South            2616
23/16     South            2796
23/17     South            2581
23/18     South            1979
23/19     South            1490
23/2      South              95
23/20     South            1056
23/21     South             858
23/22     South             783
23/23     South             563
23/3      South              83
23/4      South             134
23/5      South             265
23/6      South             803
23/7      South            1859
23/8      South            2089
23/9      South            1670
9/0       South             222
9/1       South             114
9/10      South            1505
9/11      South            1688
9/12      South            1888
9/13      South            2052
9/14      South            2366
9/15      South            2656
9/16      South            2906
9/17      South            2728
9/18      South            2043
9/19      South            1488
9/2       South              87
9/20      South            1097
9/21      South             840
9/22      South             711
9/23      South             628
9/3       South              88
9/4       South             134
9/5       South             293
9/6       South             890
9/7       South            1941
9/8       South            2095
9/9       South            1639

I'm confused about why I can't manipulate this new dataframe. I want to be able to add up the counts that share the same hour.

For example:

2/0       South             250 +
9/0       South             222 +
16/0      South             219 + 
23/0      South             214

= 250 + 222 + 219 + 214 = 905.

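I need this total so that I can then find the average: 905 / 4 = 226.25. With this average I would know that, in February 2018, an average of 226.25 records were logged on Fridays between 00:00 and 01:00.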
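If the Day/Hour series is kept as it is, the hour can be parsed back out of the string key and used as the grouping key. A minimal sketch, rebuilt from the four hour-0 counts quoted above (the variable names are illustrative):

```python
import pandas as pd

# The four hour-0 counts from the question, in the same (Day/Hour, DirectionName) shape
index = pd.MultiIndex.from_tuples(
    [("2/0", "South"), ("9/0", "South"), ("16/0", "South"), ("23/0", "South")],
    names=["Day/Hour", "DirectionName"],
)
s = pd.Series([250, 222, 219, 214], index=index)

# Take the part after "/" in each key and turn it into an integer hour
hour = (
    s.index.get_level_values("Day/Hour")
    .map(lambda key: int(key.split("/")[1]))
    .rename("Hour")
)

total_per_hour = s.groupby(hour).sum()   # sum over the four Fridays
mean_per_hour = s.groupby(hour).mean()   # average over the four Fridays
print(total_per_hour.loc[0], mean_per_hour.loc[0])  # 905 226.25
```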

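I hope I've explained this well enough; I know it's a bit confusing. I've avoided including the original dataset because I didn't think it was necessary. Thanks very much for your help.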

【Question comments】:

Tags: python pandas dataframe datetime

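【Solution 1】:

As I understand it, you can't combine the values that share the same hour because the hour is packed together with the day in a single "Day/Hour" string. My solution is to keep them in separate columns: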

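df['Day'] = df['Date'].dt.day    # integer day of month, not a string
df['Hour'] = df['Date'].dt.hour  # integer hour 0-23, so it can be compared with 0 below

# count the non-null FlagText values in each (Day, Hour, DirectionName) group
counts = df.groupby(['Day', 'Hour', 'DirectionName'])['FlagText'].count()
counts = counts.reset_index()

Since the day and the hour are now separate columns, you can aggregate everything that happened in hour zero:

counts_zero_hour = counts[counts['Hour'] == 0]['FlagText']
result = counts_zero_hour.mean()  # same as sum() divided by the number of days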
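Rather than filtering one hour at a time, the per-day counts can be averaged for every hour in a single groupby. A minimal sketch, using a hypothetical counts table shaped like the one in this answer (Day, Hour, DirectionName, with the count still in a column named FlagText) and filled with the hour-0 and hour-1 figures from the question:

```python
import pandas as pd

# Hypothetical per-day counts table; values are the hour-0 and hour-1 counts from the question
counts = pd.DataFrame({
    "Day":           [2, 9, 16, 23, 2, 9, 16, 23],
    "Hour":          [0, 0, 0, 0, 1, 1, 1, 1],
    "DirectionName": ["South"] * 8,
    "FlagText":      [250, 222, 219, 214, 137, 114, 163, 123],
})

# Average count per hour across all days, for every hour at once
hourly_mean = counts.groupby(["Hour", "DirectionName"])["FlagText"].mean()
print(hourly_mean)
```

Swapping mean() for sum() gives the per-hour totals (905 for hour 0) instead of the averages.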

【Comments】:

TypeError: 'str' object is not callable
----> 9 counts = df.groupby(['Day', 'Hour', 'DirectionName'])['FlagText'].apply('count')
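The traceback points at the string 'count' being passed to apply. A minimal sketch that sidesteps this by calling the aggregation methods directly; the sample rows are hypothetical, and the None value is there only to show that count() skips missing FlagText values while size() counts every row:

```python
import pandas as pd

# Hypothetical rows; the None shows where count() and size() differ
df = pd.DataFrame({
    "Day": [2, 2, 9],
    "Hour": [0, 0, 0],
    "DirectionName": ["South"] * 3,
    "FlagText": ["Friday", "Friday", None],
})

keys = ["Day", "Hour", "DirectionName"]
non_null = df.groupby(keys)["FlagText"].count()  # ignores missing FlagText values
rows = df.groupby(keys).size()                   # counts every row in the group
print(non_null.tolist(), rows.tolist())          # [2, 0] [2, 1]
```

For the question's data FlagText is always filled in, so either call gives the same counts.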