【Posted】:2021-11-07 03:16:03
【Question】:
This is what my data looks like.
# A tibble: 4,722,462 x 5
   started_at          member_casual weekday   ride_length month
   <dttm>              <chr>         <fct>           <dbl> <fct>
 1 2020-08-20 18:08:14 member        Thursday        0.160 August
 2 2020-08-27 18:46:04 casual        Thursday        1.15  August
 3 2020-08-26 19:44:14 casual        Wednesday       2.15  August
 4 2020-08-27 12:05:41 casual        Thursday        0.801 August
 5 2020-08-27 16:49:02 casual        Thursday        0.180 August
 6 2020-08-27 17:26:23 casual        Thursday        0.691 August
 7 2020-08-26 20:14:02 casual        Wednesday       0.333 August
 8 2020-08-26 21:59:50 casual        Wednesday       0.212 August
 9 2020-08-26 19:17:42 casual        Wednesday       0.242 August
10 2020-08-27 15:13:57 casual        Thursday        0.467 August
# ... with 4,722,452 more rows
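I want to group by 'weekday' and 'member_casual' and then summarise to get the average number of riders for each day of the week. For example, for the 'Monday' and 'casual' row: (number of rows in the data where weekday is Monday and member_casual is casual) / (actual number of Mondays in the given time frame). This is the closest I have got.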
library(dplyr)
# Total number of weeks in the given time frame.
weeks_ <- as.numeric(difftime(max(df2$started_at), min(df2$started_at), units = "weeks"))
# Assuming the time frame contains only complete weeks.
df2 %>%
  group_by(weekday, member_casual) %>%
  summarise("Average Riders" = n() / weeks_)
Because the time frame is very large, this output is not exact, but it is close enough.
   weekday   member_casual `Average Riders`
   <fct>     <chr>                    <dbl>
 1 Monday    casual                   4404.
 2 Monday    member                   6688.
 3 Tuesday   casual                   4279.
 4 Tuesday   member                   7289.
 5 Wednesday casual                   4434.
 6 Wednesday member                   7648.
 7 Thursday  casual                   4447.
 8 Thursday  member                   7285.
 9 Friday    casual                   5807.
10 Friday    member                   7452.
11 Saturday  casual                   9366.
12 Saturday  member                   7612.
13 Sunday    casual                   7527.
14 Sunday    member                   6331.
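For comparison, here is a minimal sketch of the exact ratio described above, where the denominator is the number of distinct calendar dates per weekday that actually appear in the data rather than an approximate week count. The `toy` tibble is hypothetical stand-in data with the same column names as `df2`, and `ride_date` and `n_days` are illustrative names. It assumes every day in the time frame has at least one ride; a weekday date with no rides at all would not be counted in the denominator.

```r
library(dplyr)

# Hypothetical toy data standing in for df2 (same column names as in the question).
toy <- tibble(
  started_at = as.POSIXct(c(
    "2020-08-24 08:00:00",  # Monday
    "2020-08-24 09:30:00",  # Monday
    "2020-08-31 10:00:00",  # Monday
    "2020-08-25 11:00:00"   # Tuesday
  ), tz = "UTC"),
  member_casual = c("casual", "casual", "casual", "member"),
  weekday = factor(c("Monday", "Monday", "Monday", "Tuesday"))
)

avg_riders <- toy %>%
  mutate(ride_date = as.Date(started_at, tz = "UTC")) %>%
  group_by(weekday) %>%
  # Denominator: distinct calendar dates with this weekday that appear in the data.
  mutate(n_days = n_distinct(ride_date)) %>%
  group_by(weekday, member_casual) %>%
  # Numerator: number of rides for this weekday / rider-type combination.
  summarise(`Average Riders` = n() / first(n_days), .groups = "drop")

print(avg_riders)
```

In the toy data there are three casual rides spread over two distinct Mondays, so the Monday/casual average is 1.5, and one member ride on a single Tuesday, so the Tuesday/member average is 1.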
【Comments】:
-
The logic is not clear: "the number of Mondays in the given timeframe"
-
What have you tried that didn't work? In what way is your output not what you want?
-
@camille That is just the count, not the average count per week.
-
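@akrun I mean the actual number of Mondays that occur between the latest and the earliest date in the data frame, and I want each day of the week further grouped by 'member_casual'.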
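The denominator described in this comment, the number of times each weekday occurs on the calendar between two dates, could be sketched in base R roughly as follows. `count_weekdays()` is a hypothetical helper written for illustration and is not part of any package; the August 2020 date range is only an example.

```r
# Count how often each weekday occurs between two dates (inclusive).
# count_weekdays() is a hypothetical helper, not part of any package.
count_weekdays <- function(first_day, last_day) {
  days <- seq(as.Date(first_day), as.Date(last_day), by = "day")
  # POSIXlt$wday is locale-independent: 0 = Sunday, ..., 6 = Saturday.
  wday <- as.POSIXlt(days)$wday
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  table(factor(day_names[wday + 1], levels = day_names))
}

# Example range: August 2020, which starts on a Saturday.
n_per_weekday <- count_weekdays("2020-08-01", "2020-08-31")
print(n_per_weekday)
```

For August 2020 this gives 5 Saturdays, Sundays and Mondays and 4 of every other weekday. With the real data, the range would run from `min(df2$started_at)` to `max(df2$started_at)`, and each group's `n()` would be divided by the count for its weekday. Unlike counting distinct dates in the data, this also counts days on which no ride was recorded.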
-
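Could you update your post with a small reproducible example that includes the expected output? The input you showed is an excerpt of the full data, while the expected output is computed from the full data. With a small example, it becomes much easier to cross-check.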
Tags: r