【Posted】: 2017-10-10 02:27:37
【Question】:
I have given the dput of this data frame below:
lf3 = structure(list(session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L,
6L, 7L), userId = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5), datetime =
structure(c(1457029336,
1457029337, 1457029340, 1457029596, 1457313569, 1457030783, 1457030784,
1457030918, 1457030920, 1457370365), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), referer = c(22, 2, 7, 5, 23, 20, 7, 24, 18,
22), request = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)), .Names = c("session_id",
"userId", "datetime", "referer", "request"), row.names = c(NA,
10L), class = "data.frame")
Now I want to extract the sessions that meet a specified minimum criterion/value. I tried this code:
lf3 %>% group_by(session_id) %>% tally(sort = TRUE) %>% filter(n>2)
But I want to get back the same data frame, containing only the sessions that pass this condition, like this:
  session_id userId            datetime referer request
1          1      1 2016-03-03 18:22:16      22       1
2          1      1 2016-03-03 18:22:17       2       2
3          1      1 2016-03-03 18:22:20       7       3
How can I do this?
【Comments】:
-
Update your question with the expected output.
-
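So it would return only the rows with session_id = 1, the one session whose frequency is greater than 2. The desired output would look like this data frame: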
structure(list(session_id = c(1L, 1L, 1L), userId = c(1, 1, 1 ), datetime = structure(c(1457029336, 1457029337, 1457029340), class = c("POSIXct", "POSIXt"), tzone = "UTC"), referer = c(22, 2, 7), request = c(1, 2, 3)), .Names = c("session_id", "userId", "datetime", "referer", "request"), row.names = c(NA, 3L), class = "data.frame")
-
I would prefer base R, using ave:
lf3[ave(lf3$userId, lf3$session_id, FUN = length) > 2, ]
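For anyone reading along, here is a minimal sketch of how the ave idea from the comment above works. It uses a reduced copy of the question's data (the datetime and referer columns are left out for brevity). The names min_rows, rows_per_session, and kept are made up for illustration and are not from the original post.

```r
# Reduced copy of the question's lf3 (datetime and referer omitted for brevity)
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# Hypothetical threshold name: a session is kept when it has more rows than this
min_rows <- 2

# ave() applies FUN within each session_id group and returns a vector as long
# as the input, so every row carries the size of the session it belongs to
rows_per_session <- ave(lf3$userId, lf3$session_id, FUN = length)

# Logical row filter: all original columns are kept, unlike tally(),
# which collapses each group to a single count row
kept <- lf3[rows_per_session > min_rows, ]
print(kept)
```

If you would rather stay in dplyr, the usual equivalent is to filter within groups instead of tallying: lf3 %>% group_by(session_id) %>% filter(n() > 2) %>% ungroup().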