[Posted]: 2021-02-28 00:10:58
[Question]:
I created some data that looks like this:
import pandas as pd
d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019, 09:57:23', '02.10.2019, 10:03:58',
              '02.10.2019, 13:11:58', '02.10.2019, 13:22:55', '03.10.2019, 09:56:52', '03.10.2019, 09:57:15',
              '04.10.2019, 09:57:23', '04.10.2019, 10:03:58', '04.10.2019, 13:11:58', '04.10.2019, 13:22:55'],
     'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed',
                'Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed'],
     'Name': ['Bayer', 'Bayer', 'ITM', 'ITM', 'ITM', 'ITM', 'ITM', 'ITM', 'Treso', 'Treso', 'Geco', 'Geco']}
df = pd.DataFrame(data=d)
Time Action Name
0 01.10.2019, 09:56:52 Opened Bayer
1 01.10.2019, 09:57:15 Closed Bayer
2 02.10.2019, 09:57:23 Opened ITM
3 02.10.2019, 10:03:58 Closed ITM
4 02.10.2019, 13:11:58 Opened ITM
5 02.10.2019, 13:22:55 Closed ITM
6 03.10.2019, 09:56:52 Opened ITM
7 03.10.2019, 09:57:15 Closed ITM
8 04.10.2019, 09:57:23 Opened Treso
9 04.10.2019, 10:03:58 Closed Treso
10 04.10.2019, 13:11:58 Opened Geco
11 04.10.2019, 13:22:55 Closed Geco
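Any time-based rule first needs the `Time` strings turned into real timestamps. A minimal sketch, assuming every value is day-first (`DD.MM.YYYY`) and may or may not have a comma after the date: strip the comma and parse with an explicit format, so `01.10.2019` is read as 1 October rather than 10 January. The helper name `parse_times` is hypothetical.

```python
import pandas as pd


def parse_times(times: pd.Series) -> pd.Series:
    """Hypothetical helper: parse 'DD.MM.YYYY[,] HH:MM:SS' strings day-first."""
    # Drop the optional comma so every value has the same shape.
    cleaned = times.str.replace(",", "", regex=False)
    # An explicit format avoids pandas guessing the day/month order.
    return pd.to_datetime(cleaned, format="%d.%m.%Y %H:%M:%S")


parsed = parse_times(pd.Series(["01.10.2019, 09:56:52", "02.10.2019 09:57:23"]))
print(parsed)
```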
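Now I want to remove rows based on these conditions:
- If the time between an Opened and the following Closed is less than 5 minutes and the name is the same, that pair should be removed.
- If an Opened action for the same name occurs again after a Closed row on the same day, everything with that name between the first Opened and the last Opened should be removed. For example, rows 2 to 5 should be removed, but the removal should not extend to row 7, because it is on the next day.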
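Following the `shift()` suggestion in the comments, the first condition can be sketched by comparing each `Opened` row with the row directly after it. This assumes the rows are sorted by time and that every `Opened` is immediately followed by its `Closed` (as confirmed in the comments); the sample rows are a subset of the data above.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["01.10.2019, 09:56:52", "01.10.2019, 09:57:15",
             "02.10.2019, 09:57:23", "02.10.2019, 10:03:58",
             "04.10.2019, 13:11:58", "04.10.2019, 13:22:55"],
    "Action": ["Opened", "Closed", "Opened", "Closed", "Opened", "Closed"],
    "Name": ["Bayer", "Bayer", "ITM", "ITM", "Geco", "Geco"],
})
ts = pd.to_datetime(df["Time"], format="%d.%m.%Y, %H:%M:%S")

# Pair each Opened row with the next row (assumed to be its Closed row).
is_open = df["Action"].eq("Opened")
same_name = df["Name"].eq(df["Name"].shift(-1))
duration = ts.shift(-1) - ts

# An Opened row whose session lasted less than 5 minutes.
too_short = is_open & same_name & duration.lt(pd.Timedelta(minutes=5))

# Also flag the Closed row right after it, then drop both rows.
drop = too_short | too_short.shift(1, fill_value=False)
result = df[~drop]
print(result)
```

Only the Bayer pair (23 seconds) is dropped here; the ITM and Geco sessions both last longer than 5 minutes.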
An example of the second condition: given this input:
Time Action Name
0 02.10.2019, 09:57:23 Opened ITM
1 02.10.2019, 10:03:58 Closed ITM
2 02.10.2019, 13:11:58 Opened ITM
3 02.10.2019, 13:22:55 Closed ITM
4 03.10.2019, 09:56:52 Opened ITM
5 03.10.2019, 09:57:15 Closed ITM
my output should look like this:
0 02.10.2019, 13:11:58 Opened ITM
1 02.10.2019, 13:22:55 Closed ITM
2 03.10.2019, 09:56:52 Opened ITM
3 03.10.2019, 09:57:15 Closed ITM
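This is because the last two rows are on the next day (3 October instead of 2 October), and the other times are less than 5 minutes.
But if we have this case: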
0 02.10.2019, 09:57:23 Opened ITM
1 02.10.2019, 10:03:58 Closed ITM
2 02.10.2019, 13:11:58 Opened ITM
3 02.10.2019, 13:22:55 Closed ITM
4 02.10.2019, 09:56:52 Opened ITM
5 02.10.2019, 09:57:15 Closed ITM
then all rows except rows 2 and 3 should be removed:
2 02.10.2019, 13:11:58 Opened ITM
3 02.10.2019, 13:22:55 Closed ITM
My desired output is:
Time Action Name
0 02.10.2019, 09:57:23 Opened ITM
3 02.10.2019, 13:22:55 Closed ITM
4 03.10.2019, 09:56:52 Opened ITM
5 03.10.2019, 09:57:15 Closed ITM
6 04.10.2019, 09:57:23 Opened Treso
7 04.10.2019, 10:03:58 Closed Treso
8 04.10.2019, 13:11:58 Opened Geco
9 04.10.2019, 13:22:55 Closed Geco
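One possible reading of the second condition, consistent with the first four rows of the desired output, is to collapse all sessions of the same name on the same calendar day into the first `Opened` and the last `Closed`. A minimal sketch under that assumption (rows sorted by time, `Opened`/`Closed` strictly alternating), using the ITM rows from the examples; it does not apply the 5-minute rule.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["02.10.2019, 09:57:23", "02.10.2019, 10:03:58",
             "02.10.2019, 13:11:58", "02.10.2019, 13:22:55",
             "03.10.2019, 09:56:52", "03.10.2019, 09:57:15"],
    "Action": ["Opened", "Closed", "Opened", "Closed", "Opened", "Closed"],
    "Name": ["ITM"] * 6,
})
ts = pd.to_datetime(df["Time"], format="%d.%m.%Y, %H:%M:%S")

# One group per (name, calendar day).
groups = df.groupby([df["Name"], ts.dt.date], sort=False)

# With alternating rows, the first row of a group is its first Opened
# and the last row is its last Closed; union() also sorts the labels.
keep = groups.head(1).index.union(groups.tail(1).index)
result = df.loc[keep]
print(result)
```

Grouping by calendar day means a session that opens at 23:59 and closes at 00:04 would be split across two groups, which is the case raised in the last comment; this sketch does not handle it.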
What I tried:
df_new = (df.assign(group=pd.to_datetime(df["Time"]).diff().dt.seconds.gt(300).cumsum())
            .groupby(["group", "Time", "Action", "Name"])
            .first())
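Can someone help me?
[Comments]:
-
Are Opened and Closed always consecutive?
-
Yes, every Opened should be followed by a Closed, so they should be consecutive.
-
It looks like shift() can handle this.
-
Thanks for the comment. How would I then add the second condition? :)
-
Could you have something like an Opened at 23:59 and a Closed at 00:04?
Tags: python pandas dataframe csv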