【Posted】:2022-01-13 12:20:04
【Question】:
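I have a large dataset that records ids at a specific location throughout the day.
What I want to do is plot the cumulative number of visits for each id over the period in which the data were collected.
A sample of the data is shown below; the full dataset covers several days.
I have tried a few variations using cumsum but couldn't get them to work.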
dput(df)
structure(list(date = c("06/01/2021", "06/01/2021", "06/01/2021",
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021",
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021",
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021",
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021",
"06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021", "06/01/2021",
"06/01/2021", "06/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021", "07/01/2021",
"07/01/2021", "07/01/2021", "08/01/2021", "08/01/2021", "08/01/2021",
"08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021",
"08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021", "08/01/2021",
"08/01/2021"), time = c("08:02:54", "08:04:48", "08:04:49", "08:05:49",
"08:05:50", "08:05:50", "08:05:51", "08:06:32", "08:06:33", "08:07:34",
"08:07:34", "08:07:35", "08:07:36", "08:07:36", "08:09:52", "08:09:53",
"08:09:53", "08:10:02", "08:10:04", "08:10:05", "08:10:05", "08:10:07",
"08:10:08", "08:10:22", "08:10:42", "08:10:43", "08:11:14", "08:11:15",
"08:11:38", "08:11:39", "08:11:39", "08:11:40", "08:11:40", "08:11:41",
"08:11:48", "08:11:50", "08:11:51", "08:11:51", "08:11:52", "08:11:53",
"08:11:54", "08:11:54", "08:12:36", "08:12:37", "08:12:38", "08:12:38",
"08:13:25", "08:13:25", "08:14:09", "08:14:18", "08:14:19", "08:14:24",
"08:14:24", "08:14:25", "08:14:37", "08:14:38", "08:14:58", "08:14:58",
"08:14:59", "08:14:59", "08:15:03", "08:15:04", "08:15:04", "08:15:05",
"08:15:12", "08:15:13", "08:15:13", "08:15:33", "08:15:34", "08:15:37",
"08:15:39", "08:15:51", "08:16:12", "08:16:13", "08:16:14", "08:16:31",
"08:16:32", "08:16:42", "08:17:00", "08:17:00", "08:17:01", "08:17:03",
"08:17:19", "08:17:20", "08:17:22", "08:17:26", "08:17:26", "08:17:27",
"08:17:27", "08:17:32", "08:17:32", "08:17:33", "08:17:50", "08:17:51",
"08:17:51", "08:17:52", "08:18:38", "08:18:39", "08:18:39", "08:18:40",
"08:18:41", "08:18:41", "08:19:44", "08:19:44", "08:19:46", "08:19:46",
"08:22:27", "08:23:20", "08:23:20", "08:23:47", "08:23:48", "08:23:48",
"08:23:52", "08:23:52"), id = c(2L, 3L, 2L, 3L, 4L, 5L, 3L, 4L,
3L, 2L, 3L, 3L, 2L, 4L, 5L, 2L, 3L, 2L, 2L, 2L, 4L, 3L, 2L, 2L,
4L, 5L, 3L, 2L, 4L, 5L, 3L, 3L, 4L, 5L, 6L, 4L, 3L, 5L, 4L, 5L,
4L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L,
5L, 3L, 2L, 4L, 5L, 3L, 2L, 2L, 4L, 5L, 3L, 3L, 4L, 5L, 6L, 4L,
3L, 5L, 4L, 5L, 4L, 3L, 2L, 2L, 3L, 2L, 4L, 5L, 3L, 3L, 4L, 5L,
6L, 4L, 3L, 5L, 4L, 5L, 4L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 3L,
2L, 4L, 5L, 3L, 3L, 4L, 5L, 6L, 4L, 3L)), class = "data.frame", row.names = c(NA,
-114L))
head(df)
date time id
1 06/01/2021 08:02:54 2
2 06/01/2021 08:04:48 3
3 06/01/2021 08:04:49 2
4 06/01/2021 08:05:49 3
5 06/01/2021 08:05:50 4
6 06/01/2021 08:05:50 5
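Because `date` and `time` are stored as separate character columns, a common first step (a sketch, not taken from the question) is to combine them into a single `POSIXct` timestamp so that visits can be ordered and plotted on one time axis across days. The format string assumes the dates are day/month/year (06/01/2021 = 6 January 2021), which would make the sample span consecutive days; if they are month/day/year, swap `%d` and `%m`. The column name `datetime` is hypothetical, and the rows are the six shown by `head(df)` plus row 31 of the `dput` output.

```r
# The six rows shown by head(df), plus row 31 of the dput output (first row on 07/01/2021)
df <- data.frame(
  date = c(rep("06/01/2021", 6), "07/01/2021"),
  time = c("08:02:54", "08:04:48", "08:04:49", "08:05:49",
           "08:05:50", "08:05:50", "08:11:39"),
  id   = c(2L, 3L, 2L, 3L, 4L, 5L, 3L)
)

# Assumption: dates are day/month/year; use "%m/%d/%Y %H:%M:%S" if they are month-first
df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

str(df)
```

If any value comes back as `NA`, the format string does not match the data.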
【Discussion】:
-
What defines a "visit"? Is every row a visit? On the same day, ID 2 has entries at
08:02:54 and 08:04:49. Do both of those count as visits? -
Yes, they do; every row is one visit.
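Given that every row is one visit, one possible approach (a sketch, not a confirmed answer from this thread) is to sort by timestamp and then take a running `cumsum` of ones within each `id`, rather than over the whole column. Base R's `ave()` applies `cumsum` group-wise and returns a vector aligned with the original rows. The sample rows (taken from the question's data), the day/month/year date format, and the column names `datetime` and `cum_visits` are assumptions for illustration. The plot draws one step line per `id` using base graphics.

```r
# The six rows shown by head(df), plus row 31 of the dput output
df <- data.frame(
  date = c(rep("06/01/2021", 6), "07/01/2021"),
  time = c("08:02:54", "08:04:48", "08:04:49", "08:05:49",
           "08:05:50", "08:05:50", "08:11:39"),
  id   = c(2L, 3L, 2L, 3L, 4L, 5L, 3L)
)

# Assumption: dates are day/month/year
df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Order visits chronologically so each running total follows time
df <- df[order(df$datetime), ]

# Each row is one visit: a running sum of 1s within each id is the cumulative visit count
df$cum_visits <- ave(rep(1L, nrow(df)), df$id, FUN = cumsum)

# One step line per id: cumulative visits against time
ids  <- sort(unique(df$id))
cols <- seq_along(ids)
plot(df$datetime, df$cum_visits, type = "n",
     xlab = "Time", ylab = "Cumulative visits")
for (k in seq_along(ids)) {
  visits <- df[df$id == ids[k], ]
  lines(visits$datetime, visits$cum_visits, type = "s", col = cols[k])
  points(visits$datetime, visits$cum_visits, col = cols[k], pch = 16)
}
legend("topleft", legend = ids, col = cols, lty = 1, pch = 16, title = "id")
```

On the full dataset the same steps should apply unchanged. With ggplot2, `geom_step()` mapped with `colour = factor(id)` is an equivalent way to draw the lines.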
Tags: r date time-series cumulative-sum