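[Posted]: 2021-02-11 23:12:54
[Question]:
I want to group `df` by `ID`, and then, within each `ID`, group the rows where `State` starts with a `1` and ends with the first `0` that follows. If there is no trailing `0`, as in the expected output below, the `1`s on their own are treated as one group. If there are consecutive `1`s, keep going to the next value until a `0` is found: every row from the first `1` up to and including the first `0` belongs to one group. Consecutive `0`s are not of interest, except the first one, which ends a group. I then want to assign the same group number to all rows in each group. In the example `df`, `ID` has 2 values, 32 and 64, and they are treated as independent groups.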
df:
ID Timestamp Value State
103177 64 2010-09-21 23:13:21.090 21.5 1.0
252019 64 2010-09-22 00:44:14.890 21.5 1.0
271381 64 2010-09-22 00:44:15.890 21.5 0.0
268939 64 2010-09-22 00:44:17.890 23.0 0.0
259875 64 2010-09-22 00:44:18.440 23.0 1.0
18870 64 2010-09-22 00:44:19.890 24.5 1.0
205910 32 2010-09-22 00:44:23.440 24.5 1.0
103865 32 2010-09-22 01:04:33.440 23.5 0.0
152281 32 2010-09-22 01:27:01.790 22.5 1.0
138988 32 2010-09-22 02:18:52.850 21.5 0.0
Reproducible example:
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'ID': {103177: 64,
252019: 64,
271381: 64,
268939: 64,
259875: 64,
18870: 64,
205910: 32,
103865: 32,
152281: 32,
138988: 32},
'Timestamp': {103177: Timestamp('2010-09-21 23:13:21.090000'),
252019: Timestamp('2010-09-22 00:44:14.890000'),
271381: Timestamp('2010-09-22 00:44:15.890000'),
268939: Timestamp('2010-09-22 00:44:17.890000'),
259875: Timestamp('2010-09-22 00:44:18.440000'),
18870: Timestamp('2010-09-22 00:44:19.890000'),
205910: Timestamp('2010-09-22 00:44:23.440000'),
103865: Timestamp('2010-09-22 01:04:33.440000'),
152281: Timestamp('2010-09-22 01:27:01.790000'),
138988: Timestamp('2010-09-22 02:18:52.850000')},
'Value': {103177: 21.5,
252019: 21.5,
271381: 21.5,
268939: 23.0,
259875: 23.0,
18870: 24.5,
205910: 24.5,
103865: 23.5,
152281: 22.5,
138988: 21.5},
'State': {103177: 1.0,
252019: 1.0,
271381: 0.0,
268939: 0.0,
259875: 1.0,
18870: 1.0,
205910: 1.0,
103865: 0.0,
152281: 1.0,
138988: 0.0}})
df
Expected output:
ID Timestamp Value State Group
103177 64 2010-09-21 23:13:21.090 21.5 1.0 1
252019 64 2010-09-22 00:44:14.890 21.5 1.0 1
271381 64 2010-09-22 00:44:15.890 21.5 0.0 1
268939 64 2010-09-22 00:44:17.890 23.0 0.0 -
259875 64 2010-09-22 00:44:18.440 23.0 1.0 2 (* `State` only has `1`, didn't end with `0`.)
18870 64 2010-09-22 00:44:19.890 24.5 1.0 2 (* `State` only has `1`, didn't end with `0`.)
205910 32 2010-09-22 00:44:23.440 24.5 1.0 3 * New `ID`, thus `Group` increases by 1.
103865 32 2010-09-22 01:04:33.440 23.5 0.0 3
152281 32 2010-09-22 01:27:01.790 22.5 1.0 4
138988 32 2010-09-22 02:18:52.850 21.5 0.0 4
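The rule described above can be sketched with a per-`ID` shift and a cumulative sum. This is only one possible approach, not a definitive answer. The function name `assign_groups` is hypothetical, and the sketch assumes the rows of each `ID` are contiguous and already in time order, as in the sample. Rows that belong to no group get a missing value instead of `-`.

```python
import pandas as pd


def assign_groups(df: pd.DataFrame) -> pd.Series:
    """Number each run of 1s (plus the first 0 that closes it) within each ID.

    Assumption: rows of the same ID are contiguous and sorted by time.
    """
    state = df['State']
    # Previous State within the same ID; NaN on the first row of each ID.
    prev = df.groupby('ID')['State'].shift()
    # A group starts at a 1 that is not preceded by a 1 in the same ID.
    starts = state.eq(1) & prev.ne(1)
    # A row is in a group if it is a 1, or a 0 that directly follows a 1.
    member = state.eq(1) | (state.eq(0) & prev.eq(1))
    # Running count of starts gives the group number; mask the other rows.
    return starts.cumsum().where(member).astype('Int64')


# Only ID and State matter for the grouping, so the sample keeps just those.
sample = pd.DataFrame(
    {'ID': [64] * 6 + [32] * 4,
     'State': [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]},
    index=[103177, 252019, 271381, 268939, 259875,
           18870, 205910, 103865, 152281, 138988],
)
sample['Group'] = assign_groups(sample)
print(sample)
```

On the sample this gives groups 1, 1, 1, missing, 2, 2, 3, 3, 4, 4. If the rows of different `ID`s were interleaved, the global cumulative sum would no longer number the groups correctly, so the frame would need to be sorted first.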
[Comments]:
Tags: python pandas group-by time-series