Posted: 2020-04-10 14:40:32
Question:
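I have a data frame of station repairs.
The workflow goes like this: a mechanic goes to a station and presses a button, which records an action called Release. Once they have fixed the station, they press the button again, and this time the action is Return.
Below you can see that row 1 and row 2 form one completed task, which took Jane Jetson 10 seconds to finish.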
dt name foo_id foo_role bikeId station_name station_id action
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic 12345 FooStation 1234.89 Release
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic 12345 FooStation 1234.89 Return
3 2019-12-12 13:06:16 John Doe 106338 Mechanic 12345 FooStation 1234.89 Release
4 2019-12-12 13:06:19 John Doe 106338 Mechanic 12345 FooStation 1234.89 Return
5 2019-12-12 13:07:16 John Doe 106338 Mechanic 12345 FooStation 1234.89 Release
6 2019-12-12 14:07:16 John Doe 106338 Mechanic 56789 Some Station 4567.12 Release
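What I want to happen:
- For each mechanic, I want to know how long it took to fix a station, measured from a Release action to the Return that follows it.
- If a Release has no Return, I want to take Sys.time() and subtract dt from it. You can see this case in row 5 and row 6.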
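A minimal dplyr sketch of the two rules above, under some assumptions: rows are grouped by foo_id, bikeId and station_id (assumed to identify a repair as well as the longer grouping in the attempt below), and within a group a Release is never followed by another Release before its Return. The helper repair_durations() and its now argument are hypothetical names; now defaults to Sys.time() and is exposed only so the result is reproducible.

```r
library(dplyr)

# The six rows from the question (tz fixed to UTC for reproducibility).
foo <- data.frame(
  dt = as.POSIXct(c(1576173947, 1576173957, 1576173976,
                    1576173979, 1576174036, 1576177636),
                  origin = "1970-01-01", tz = "UTC"),
  name = c("Jane Jetson", "Jane Jetson", rep("John Doe", 4)),
  foo_id = c(106337L, 106337L, rep(106338L, 4)),
  bikeId = c(rep(12345L, 5), 56789L),
  station_name = c(rep("FooStation", 5), "Some Station"),
  station_id = c(rep(1234.89, 5), 4567.12),
  action = c("Release", "Return", "Release", "Return", "Release", "Release"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per Release with its repair time in seconds.
# Assumes a Release is never followed by another Release before its Return.
repair_durations <- function(df, now = Sys.time()) {
  df %>%
    group_by(foo_id, bikeId, station_id) %>%
    arrange(dt, .by_group = TRUE) %>%
    mutate(
      next_action = lead(action),
      next_dt = lead(dt),
      timediffsecs = case_when(
        # Release followed by its Return: seconds until the Return
        action == "Release" & !is.na(next_action) & next_action == "Return" ~
          as.numeric(difftime(next_dt, dt, units = "secs")),
        # Release with no Return yet: seconds until `now`
        action == "Release" ~ as.numeric(difftime(now, dt, units = "secs")),
        TRUE ~ NA_real_
      )
    ) %>%
    ungroup() %>%
    filter(action == "Release") %>%
    select(name, bikeId, station_name, dt, timediffsecs)
}

res <- repair_durations(foo, now = max(foo$dt) + 100)
res
```

With now fixed 100 seconds after the last row, this returns 10, 3, 3700 and 100 seconds for the four Release rows (rows 1, 3, 5 and 6). Looking forward with lead() from the Release row, rather than backward with lag() from the next row, makes an open Release simply the case where no Return follows.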
This is what I did (I'm not 100% sure I need the previous action, but I included it just in case):
library(dplyr)
library(tidyr)
# Sort, then compare each row with the previous row of the same
# mechanic/bike/station group
foo = arrange(foo, foo_id, name, foo_role, bikeId, station_id) %>%
  group_by(foo_id, name, foo_role, bikeId, station_name, station_id) %>%
  mutate(prev_dt = lag(dt, order_by = foo_id),
         prev_action = lag(action, order_by = foo_id, default = 'NaN'))
# Seconds elapsed since the previous row in the group
foo$timediffsecs = as.numeric(difftime(foo$dt, foo$prev_dt, units = 'secs'))
> foo
# A tibble: 6 x 11
# Groups: foo_id, name, foo_role, bikeId, station_name, station_id [3]
dt name foo_id foo_role bikeId station_name station_id action prev_dt prev_action timediffsecs
<dttm> <fct> <int> <fct> <int> <fct> <dbl> <chr> <dttm> <chr> <dbl>
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic 12345 FooStation 1235. Release NA NaN NA
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic 12345 FooStation 1235. Return 2019-12-12 13:05:47 Release 10
3 2019-12-12 13:06:16 John Doe 106338 Mechanic 12345 FooStation 1235. Release NA NaN NA
4 2019-12-12 13:06:19 John Doe 106338 Mechanic 12345 FooStation 1235. Return 2019-12-12 13:06:16 Release 3
5 2019-12-12 13:07:16 John Doe 106338 Mechanic 12345 FooStation 1235. Release 2019-12-12 13:06:19 Return 57
6 2019-12-12 14:07:16 John Doe 106338 Mechanic 56789 Some Station 4567. Release NA NaN NA
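Problem:
Row 5 is a new cycle, because a Release and a Return already happened before it, yet timediffsecs recorded 57 seconds. For row 5, prev_dt and prev_action should be NA and timediffsecs should be Sys.time() - dt. Row 6 should also have timediffsecs = Sys.time() - dt.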
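What I think might work:
I changed the NA in prev_action to NaN so that I could write some if-else statements, but I'm not sure how to construct one for this. I also wanted the NA values in prev_dt to default to dt, but ran into problems doing that. The only reason I'm trying this is so that I can use conditional statements; if that isn't necessary, there is no need to change the NAs.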
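The conditional can be written directly with case_when(), which picks the first matching branch; giving lag() and lead() a default of "" avoids comparing against NA, so the NA values do not need to be replaced with NaN. This is a sketch, not the only approach: the helper flag_repairs() and the now argument are hypothetical, and the grouping is assumed to be foo_id, bikeId and station_id. It keeps one row per event, like the output in the question, and fills prev_dt and prev_action only for a Return that directly follows a Release.

```r
library(dplyr)

# The six rows from the question (tz fixed to UTC for reproducibility).
foo <- data.frame(
  dt = as.POSIXct(c(1576173947, 1576173957, 1576173976,
                    1576173979, 1576174036, 1576177636),
                  origin = "1970-01-01", tz = "UTC"),
  name = c("Jane Jetson", "Jane Jetson", rep("John Doe", 4)),
  foo_id = c(106337L, 106337L, rep(106338L, 4)),
  bikeId = c(rep(12345L, 5), 56789L),
  station_name = c(rep("FooStation", 5), "Some Station"),
  station_id = c(rep(1234.89, 5), 4567.12),
  action = c("Release", "Return", "Release", "Return", "Release", "Release"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that keeps the row-by-row layout of the question.
flag_repairs <- function(df, now = Sys.time()) {
  df %>%
    group_by(foo_id, bikeId, station_id) %>%
    arrange(dt, .by_group = TRUE) %>%
    mutate(
      # A Return is paired with the previous row only if that row was a Release
      matched = action == "Return" & lag(action, default = "") == "Release",
      prev_dt = replace(lag(dt), !matched, NA),
      prev_action = replace(lag(action), !matched, NA),
      # A Release is still open if the next row in the group is not a Return
      open = action == "Release" & lead(action, default = "") != "Return",
      timediffsecs = case_when(
        matched ~ as.numeric(difftime(dt, prev_dt, units = "secs")),
        open ~ as.numeric(difftime(now, dt, units = "secs")),
        TRUE ~ NA_real_
      )
    ) %>%
    ungroup() %>%
    select(-matched, -open)
}

res <- flag_repairs(foo, now = max(foo$dt) + 100)
res[, c("dt", "action", "prev_action", "timediffsecs")]
```

For the six sample rows, timediffsecs comes out as NA, 10, NA, 3, 3700 and 100 when now is 100 seconds after the last row, and rows 5 and 6 keep NA in prev_dt and prev_action.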
tl;dr: I want timediffsecs to record the correct number of seconds. Rows 5 and 6 are the problem: for both of them, timediffsecs should be Sys.time() - dt.
Data:
structure(list(dt = structure(c(1576173947, 1576173957, 1576173976,
1576173979, 1576174036, 1576177636), class = c("POSIXct", "POSIXt"
), tzone = ""), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jane Jetson",
"John Doe"), class = "factor"), foo_id = c(106337L, 106337L,
106338L, 106338L, 106338L, 106338L), foo_role = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "Mechanic", class = "factor"),
bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L
), station_name = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("FooStation",
"Some Station"), class = "factor"), station_id = c(1234.89,
1234.89, 1234.89, 1234.89, 1234.89, 4567.12), action = c("Release",
"Return", "Release", "Return", "Release", "Release")), row.names = c(NA,
-6L), class = "data.frame")
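Comments:
- How are you matching Release and Return? Will they always be on consecutive rows?
- Right now I'm grouping without including the action. The rows aren't always consecutive, but at a given station a Release won't happen again until it has been Returned.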
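Because a Release is not repeated at a station until it has been Returned, each Release can be treated as the start of a numbered repair cycle; once rows are grouped and sorted by dt, this does not depend on them being consecutive in the original data. A sketch under the same assumed grouping (foo_id, bikeId, station_id); repair_cycles() and now are hypothetical names.

```r
library(dplyr)

# The six rows from the question (tz fixed to UTC for reproducibility).
foo <- data.frame(
  dt = as.POSIXct(c(1576173947, 1576173957, 1576173976,
                    1576173979, 1576174036, 1576177636),
                  origin = "1970-01-01", tz = "UTC"),
  name = c("Jane Jetson", "Jane Jetson", rep("John Doe", 4)),
  foo_id = c(106337L, 106337L, rep(106338L, 4)),
  bikeId = c(rep(12345L, 5), 56789L),
  station_name = c(rep("FooStation", 5), "Some Station"),
  station_id = c(rep(1234.89, 5), 4567.12),
  action = c("Release", "Return", "Release", "Return", "Release", "Release"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number the repair cycles in each group and
# summarise one row per cycle.
repair_cycles <- function(df, now = Sys.time()) {
  df %>%
    group_by(foo_id, bikeId, station_id) %>%
    arrange(dt, .by_group = TRUE) %>%
    # Every Release opens a new cycle; later rows belong to it until the next Release
    mutate(cycle = cumsum(action == "Release")) %>%
    # Drop any Return recorded before the group's first Release
    filter(cycle > 0) %>%
    group_by(cycle, .add = TRUE) %>%
    summarise(
      release_dt = min(dt[action == "Release"]),
      # Open cycles (no Return yet) are measured up to `now`
      return_dt = if (any(action == "Return")) max(dt[action == "Return"]) else now,
      .groups = "drop"
    ) %>%
    mutate(timediffsecs = as.numeric(difftime(return_dt, release_dt, units = "secs")))
}

res <- repair_cycles(foo, now = max(foo$dt) + 100)
res
```

This gives one row per cycle, with timediffsecs of 10, 3, 3700 and 100 for the sample data when now is 100 seconds after the last row. Since only the Release and the last Return of each cycle are used, it should also tolerate extra rows between a Release and its Return, which a lag()-based version would not.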
Tags: r loops if-statement lag