【Posted】: 2015-06-24 21:13:09
【Question】:
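I am working with time-series sensor data. Here is our process: there are 3 columns (EditDate, ID, InsertDate).
EditDate: the date when the sensor data was edited/modified for that week
ID: a manufacturing tool identifier
InsertDate: the date when all the sensor information for that week is added to the data frame at once
We add the data every Friday morning at 6:30 (InsertDate). My task is to find outliers in the last 7 days of data (note: the original data frame also contains data from previous weeks). My outlier function itself works correctly, but I am getting the date handling wrong, and that is where I need help.
For example, consider this data frame: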
EditDate <- c("04/17/2015 5:46:23 AM", "04/17/2015 5:23:23 AM","04/16/2015 9:46:34 AM","04/15/2015 23:46:11AM","04/11/2015 11:46:17 AM","04/10/2015 6:34:23 AM","04/10/2015 6:29:34 AM","04/8/2015 5:46:12 AM","04/5/2015 5:46:22 AM","04/3/2015 6:31:22 AM","04/3/2015 6:29:23 AM")
ID <- c("DX154", "DX156","DX157","DX159","DX132","DX137","DX111","DX123","DX136","DX051","DX021")
InsertDate <- c("4/17/2015 6:30:00 AM", "4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/17/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/10/2015 6:30:00 AM","4/3/2015 6:30:00 AM")
df1 <- data.frame(EditDate , ID, InsertDate)
Output
+------------------------+-------+----------------------+
| EditDate | ID | InsertDate |
+------------------------+-------+----------------------+
| 04/17/2015 5:46:23 AM | DX154 | 4/17/2015 6:30:00 AM |
| 04/17/2015 5:23:23 AM | DX156 | 4/17/2015 6:30:00 AM |
| 04/16/2015 9:46:34 AM | DX157 | 4/17/2015 6:30:00 AM |
| 04/15/2015 23:46:11AM | DX159 | 4/17/2015 6:30:00 AM |
| 04/11/2015 11:46:17 AM | DX132 | 4/17/2015 6:30:00 AM |
| 04/10/2015 6:34:23 AM | DX137 | 4/17/2015 6:30:00 AM |
| 04/10/2015 6:29:34 AM | DX111 | 4/10/2015 6:30:00 AM |
| 04/8/2015 5:46:12 AM | DX123 | 4/10/2015 6:30:00 AM |
| 04/5/2015 5:46:22 AM | DX123 | 4/10/2015 6:30:00 AM |
| 04/3/2015 6:31:22 AM | DX123 | 4/10/2015 6:30:00 AM |
| 04/3/2015 6:29:23 AM | DX123 | 4/3/2015 6:30:00 AM |
+------------------------+-------+----------------------+
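The date columns above are built from character strings, so they presumably need to be parsed before they can be compared against a time such as `Sys.time()`. A minimal sketch, assuming the `month/day/year hour:minute:second AM/PM` layout shown in the table and a UTC time zone (the time zone and the variable names `edit_raw` and `edit_time` are assumptions, not part of the original script):

```r
# Two sample values copied from the EditDate column above.
edit_raw <- c("04/17/2015 5:46:23 AM", "04/10/2015 6:29:34 AM")

# %I is the 12-hour clock and %p the AM/PM marker; tz = "UTC" is an assumption.
edit_time <- as.POSIXct(edit_raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Strings that do not match the format are returned as NA, so it is worth
# checking for them before filtering on the parsed values.
anyNA(edit_time)
```

Note that `%p` is interpreted according to the current locale, so this may behave differently on a non-English system.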
Once I have the data frame, this is what I do:
BackAWeek <-Sys.time() - (604800*2) #604800 is a week in seconds
df2 <- subset(df1, df1$EditDate<BackAWeek)
df3 <- subset(df1, df1$EditDate>BackAWeek)
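df2 should contain the last 7 days of data, and df3 should contain all data that does not belong to the last week. A week in this sense is defined by the insert date (for example, suppose we have 4 weeks of data: df2 should return all data from Friday 6:30:00 AM of week 3 through Friday 6:29:59 AM of week 4).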
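One way to express the Friday-to-Friday window described above is to derive it from the calendar rather than from a fixed offset. The following is only a sketch: the helper name `week_window`, its arguments, and the UTC time zone are assumptions made for illustration.

```r
# Hypothetical helper: bounds of the most recently completed
# Friday 06:30 to Friday 06:30 window, relative to `now`.
week_window <- function(now = Sys.time(), tz = "UTC") {
  lt <- as.POSIXlt(now, tz = tz)
  # POSIXlt$wday counts 0 = Sunday ... 5 = Friday
  days_back <- (lt$wday - 5) %% 7
  friday <- as.Date(lt) - days_back
  end <- as.POSIXct(paste(friday, "06:30:00"), tz = tz)
  # On a Friday before 06:30 the latest cutoff is the previous Friday
  if (end > now) {
    friday <- friday - 7
    end <- as.POSIXct(paste(friday, "06:30:00"), tz = tz)
  }
  start <- as.POSIXct(paste(friday - 7, "06:30:00"), tz = tz)
  list(start = start, end = end)
}

# Example: a run on Wednesday, 04/22/2015
w <- week_window(as.POSIXct("2015-04-22 12:00:00", tz = "UTC"))
```

For that Wednesday run, `w$start` is 04/10/2015 6:30 AM and `w$end` is 04/17/2015 6:30 AM, so rows whose parsed EditDate satisfies `start <= EditDate < end` would fall into the week of interest regardless of the day the script is run.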
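My current script requires me to run it every Friday morning at 6:31:00 to get the last 7 days of data, which is not always possible. Suppose I run the script in the middle of the following week (e.g. Wednesday, 04/22/2015) to look at the data: the script takes the current time and subtracts 7 days, so I miss any data entered before 04/15/2015.
If I run the script on 04/22/2015, the data frame I get is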
EditDate ID InsertDate
04/17/2015 5:46:23 AM DX154 4/17/2015 6:30:00 AM
04/17/2015 5:23:23 AM DX156 4/17/2015 6:30:00 AM
04/16/2015 9:46:34 AM DX157 4/17/2015 6:30:00 AM
04/15/2015 23:46:11AM DX159 4/17/2015 6:30:00 AM
but what I want is
EditDate ID InsertDate
04/17/2015 5:46:23 AM DX154 4/17/2015 6:30:00 AM
04/17/2015 5:23:23 AM DX156 4/17/2015 6:30:00 AM
04/16/2015 9:46:34 AM DX157 4/17/2015 6:30:00 AM
04/15/2015 23:46:11AM DX159 4/17/2015 6:30:00 AM
04/11/2015 11:46:17 AM DX132 4/17/2015 6:30:00 AM
04/10/2015 6:34:23 AM DX137 4/17/2015 6:30:00 AM
Please suggest how to fix my code so that it always considers the Friday-to-Friday 6:30 AM window, no matter when during the week I run it.
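Because every weekly batch shares a single InsertDate, one possible approach is to select the rows carrying the most recent InsertDate instead of subtracting from the current time. This is a sketch under that assumption, using a shortened copy of the sample data; the variable names `insert_time`, `latest`, and `keep` and the UTC time zone are illustrative choices.

```r
# Shortened copy of the sample data (four of the eleven rows).
df1 <- data.frame(
  ID = c("DX154", "DX137", "DX111", "DX021"),
  InsertDate = c("4/17/2015 6:30:00 AM", "4/17/2015 6:30:00 AM",
                 "4/10/2015 6:30:00 AM", "4/3/2015 6:30:00 AM"),
  stringsAsFactors = FALSE
)

# Parse the 12-hour timestamps; tz = "UTC" is an assumption.
insert_time <- as.POSIXct(df1$InsertDate,
                          format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# The latest batch is the one with the maximum InsertDate.
latest <- max(insert_time, na.rm = TRUE)
keep <- !is.na(insert_time) & insert_time == latest

df2 <- df1[keep, ]   # rows from the most recent weekly insert
df3 <- df1[!keep, ]  # everything from earlier weeks
```

With the full sample data this selection does not depend on when the script runs, as long as no newer batch has been inserted in the meantime.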
【Discussion】:
Tags: r datetime dataframe time-series