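【Posted】: 2019-08-22 07:19:12
【Question】:
I want to compute, from a dataset, how long the server was down. I know when each outage started, but not how long it lasted.
I have this df: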
index a b c reboot
2018-06-25 12:51:00 NaN NaN NaN 1
2018-06-25 12:52:00 NaN NaN NaN 0
2018-06-25 12:53:00 NaN NaN NaN 0
2018-06-25 12:54:00 NaN NaN NaN 0
2018-06-25 12:55:00 NaN NaN NaN 0
2018-06-25 12:56:00 NaN NaN NaN 0
2018-06-25 12:57:00 NaN NaN NaN 0
2018-06-25 12:58:00 NaN 0.6 0.6 0
2018-06-25 12:59:00 NaN NaN 0.5 0
2018-06-25 13:00:00 NaN NaN 0.3 0
2018-06-25 13:01:00 2.55 94.879997 0.23 0
2018-06-25 13:02:00 1.17 NaN 0.13 0
2018-06-25 13:03:00 1.08 98.199997 0.10 0
2018-06-25 13:28:00 NaN NaN NaN 1
2018-06-25 13:29:00 NaN NaN NaN 0
2018-06-25 13:30:00 NaN NaN NaN 0
2018-06-25 13:31:00 NaN NaN NaN 0
2018-06-25 13:31:00 0.5 0.2 0.1 0
2018-06-25 13:32:00 NaN NaN NaN 0
2018-06-25 13:33:00 NaN NaN NaN 0
2018-06-25 13:34:00 3 0.6 0.5 0
I want to count the consecutive rows where a, b and c are all NaN, starting from a row with reboot == 1, to get a result like this:
index period reboot
2018-06-25 12:51:00 7 1
2018-06-25 13:28:00 4 1
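One way to get a table like the one requested is to label consecutive runs of rows by whether a, b and c are all NaN, then keep only the runs whose first row has reboot == 1. The following is a minimal pandas sketch, assuming (as the sample suggests) that the 1 always sits on the first row of an outage. The names `flat`, `run_id`, `runs` and `result` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample rows from the question: (time, a, b, c, reboot).
rows = (
    [("12:51", nan, nan, nan, 1)]
    + [(f"12:{m}", nan, nan, nan, 0) for m in range(52, 58)]
    + [
        ("12:58", nan, 0.6, 0.6, 0),
        ("12:59", nan, nan, 0.5, 0),
        ("13:00", nan, nan, 0.3, 0),
        ("13:01", 2.55, 94.879997, 0.23, 0),
        ("13:02", 1.17, nan, 0.13, 0),
        ("13:03", 1.08, 98.199997, 0.10, 0),
        ("13:28", nan, nan, nan, 1),
        ("13:29", nan, nan, nan, 0),
        ("13:30", nan, nan, nan, 0),
        ("13:31", nan, nan, nan, 0),
        ("13:31", 0.5, 0.2, 0.1, 0),  # duplicated timestamp, as in the post
        ("13:32", nan, nan, nan, 0),
        ("13:33", nan, nan, nan, 0),
        ("13:34", 3, 0.6, 0.5, 0),
    ]
)
df = pd.DataFrame(rows, columns=["index", "a", "b", "c", "reboot"])
df["index"] = pd.to_datetime("2018-06-25 " + df["index"])
df = df.set_index("index")

# Work on a positional index so the duplicated 13:31:00 label
# cannot cause alignment problems.
flat = df.reset_index()
flat["all_nan"] = flat[["a", "b", "c"]].isna().all(axis=1)
# A new run starts wherever the all-NaN flag differs from the previous row.
run_id = flat["all_nan"].ne(flat["all_nan"].shift()).cumsum().rename("run_id")

runs = flat.groupby(run_id).agg(
    start=("index", "first"),    # timestamp of the first row in the run
    period=("index", "size"),    # number of rows in the run
    all_nan=("all_nan", "all"),  # True for runs made of all-NaN rows
    reboot=("reboot", "first"),  # reboot flag on the first row of the run
)
# Keep only all-NaN runs that begin with a reboot flag.
result = (
    runs[runs["all_nan"] & runs["reboot"].eq(1)]
    .set_index("start")
    .rename_axis("index")[["period", "reboot"]]
)
print(result)
```

On the sample this gives periods of 7 and 4 rows. The NaN run starting at 13:32:00 is dropped because its first row has reboot == 0. Note that the period is a row count, so it equals minutes only while the data has exactly one row per minute.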
I have already tried doing this column by column, without the reboot condition.
Input:
index a b c reboot
2018-06-25 12:51:00 NaN NaN NaN 1
2018-06-25 12:52:00 NaN NaN NaN 0
2018-06-25 12:53:00 NaN NaN NaN 0
2018-06-25 12:54:00 NaN NaN NaN 0
2018-06-25 12:55:00 NaN NaN NaN 0
2018-06-25 12:56:00 NaN NaN NaN 0
2018-06-25 12:57:00 NaN NaN NaN 0
2018-06-25 12:58:00 NaN NaN NaN 0
2018-06-25 12:59:00 NaN NaN NaN 0
2018-06-25 13:00:00 NaN NaN NaN 0
2018-06-25 13:01:00 2.55 94.879997 0.23 0
2018-06-25 13:02:00 1.17 NaN 0.13 0
2018-06-25 13:03:00 1.08 98.199997 0.10 0
2018-06-25 13:28:00 NaN NaN NaN 1
2018-06-25 13:29:00 NaN NaN NaN 0
2018-06-25 13:30:00 NaN NaN NaN 0
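import numpy as np

step = 1  # minimum run length to report; not defined in the original post, assumed here
a = df.index
b = df.b.values
# Run boundaries: position 0, every point where isnan(b) flips, and the end
idx0 = np.flatnonzero(np.r_[True, np.diff(np.isnan(b)) != 0, True])
count = np.diff(idx0)  # length of each run
idx = idx0[:-1]  # start position of each run
# Keep runs that are NaN and at least `step` rows long
valid_mask = (count >= step) & np.isnan(b[idx])
out_idx = idx[valid_mask]
out_num = a[out_idx]  # timestamp at the start of each NaN run
out_count = count[valid_mask]
outb = zip(out_num, out_count)
periodb = list(outb)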
Result:
[(Timestamp('2018-06-25 12:51:00'), 10),
 (Timestamp('2018-06-25 13:28:00'), 3)]
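The same run-length idea can be applied to all three columns at once and then combined with the reboot flag. The following is a hedged sketch in plain NumPy. `nan_runs` is a hypothetical helper name, and the arrays reproduce the Input sample by hand instead of reading it from `df`.

```python
import numpy as np


def nan_runs(mask, min_len=1):
    """Return (start_positions, lengths) of the runs of True in a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    # Run boundaries: position 0, every position where the mask flips, and the end.
    edges = np.flatnonzero(np.r_[True, mask[1:] != mask[:-1], True])
    starts, lengths = edges[:-1], np.diff(edges)
    # Keep only the runs of True that are long enough.
    keep = mask[starts] & (lengths >= min_len)
    return starts[keep], lengths[keep]


nan = np.nan
# The Input sample: 10 all-NaN rows, 3 rows with data, 3 all-NaN rows.
values = np.array(
    [[nan, nan, nan]] * 10
    + [[2.55, 94.879997, 0.23], [1.17, nan, 0.13], [1.08, 98.199997, 0.10]]
    + [[nan, nan, nan]] * 3
)
reboot = np.array([1] + [0] * 12 + [1] + [0] * 2)

# A row is "down" only if a, b and c are all NaN.
all_nan = np.isnan(values).all(axis=1)
starts, lengths = nan_runs(all_nan)

# Keep the runs whose first row carries the reboot flag.
is_reboot = reboot[starts] == 1
reboot_starts, reboot_lengths = starts[is_reboot], lengths[is_reboot]
print(reboot_starts, reboot_lengths)
```

For this input the surviving runs start at row positions 0 and 13 with lengths 10 and 3, which matches the per-column result above. With a real frame, the positions could be mapped back to timestamps with `df.index[reboot_starts]`.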
【Comments】:
-
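Is the 1 in the reboot column always, and only, on the first row of a group of missing rows?
-
Yes, a reboot starts at a 1, but we don't know when it ends. Only once new values (a, b, c) appear do we say the server is back up.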
-
(The reboot ends when we detect the first new value in a, b, c, or all of them.)
-
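So there are only 3 groups of all-NaN rows (starting at 12:51:00, 13:28 and 13:32:00), and reboot is 1 only on the first row of a group? Or could a 1 also appear elsewhere, for example at 12:54:00?
-
At 13:32:00 we only have missing values, not a reboot, because reboot == 0. The first reboot, at 12:51:00, lasted 7 minutes: at 12:58:00 I know the server is up because there are new values in b and c.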
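The rule described in these comments (an outage starts at reboot == 1 and ends at the first later row with any value in a, b or c) can also be measured as elapsed time instead of a row count. The following is a minimal sketch, assuming a DatetimeIndex and using only the first nine rows of the sample. `has_data`, `next_up` and `duration` are illustrative names.

```python
import numpy as np
import pandas as pd

nan = np.nan
# First nine rows of the sample (12:51 to 12:59), one row per minute.
times = pd.date_range("2018-06-25 12:51", periods=9, freq="min")
df = pd.DataFrame(
    {
        "a": [nan] * 9,
        "b": [nan] * 7 + [0.6, nan],
        "c": [nan] * 7 + [0.6, 0.5],
        "reboot": [1] + [0] * 8,
    },
    index=times,
).rename_axis("index")

flat = df.reset_index()
# A row counts as "server up" if at least one of a, b, c has a value.
has_data = flat[["a", "b", "c"]].notna().any(axis=1)
# For every row, the timestamp of the next row (itself included) that has data.
next_up = flat["index"].where(has_data).bfill()

# Elapsed time from each reboot row to the first row with data.
reboots = flat[flat["reboot"] == 1]
duration = next_up.loc[reboots.index] - reboots["index"]
duration.index = reboots["index"]
print(duration)
```

For the reboot at 12:51:00 this gives a Timedelta of 7 minutes, in line with the comment above. If no later row has data, the duration comes out as NaT.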
Tags: python pandas numpy data-science