【Question title】: Average number of riders for all the days of a week using dplyr [duplicate]
【Posted】: 2021-11-07 03:16:03
【Question】:

This is what my data looks like.

# A tibble: 4,722,462 x 5
   started_at          member_casual weekday   ride_length month 
   <dttm>              <chr>         <fct>           <dbl> <fct> 
 1 2020-08-20 18:08:14 member        Thursday        0.160 August
 2 2020-08-27 18:46:04 casual        Thursday        1.15  August
 3 2020-08-26 19:44:14 casual        Wednesday       2.15  August
 4 2020-08-27 12:05:41 casual        Thursday        0.801 August
 5 2020-08-27 16:49:02 casual        Thursday        0.180 August
 6 2020-08-27 17:26:23 casual        Thursday        0.691 August
 7 2020-08-26 20:14:02 casual        Wednesday       0.333 August
 8 2020-08-26 21:59:50 casual        Wednesday       0.212 August
 9 2020-08-26 19:17:42 casual        Wednesday       0.242 August
10 2020-08-27 15:13:57 casual        Thursday        0.467 August
# ... with 4,722,452 more rows

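I want to group by 'weekday' and 'member_casual', then summarise to get the average number of riders for each day of the week. That is, for the 'Monday' and 'casual' row: (number of times Monday & casual appear together in the data) / (number of actual Mondays in the given time frame). This is the closest I have got.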

# Find the total number of weeks in the given time frame.
weeks_ <- as.numeric(difftime(max(df2$started_at), min(df2$started_at), units = "weeks"))
# Assuming there are only complete weeks
df2 %>%
  group_by(weekday, member_casual) %>%
  summarise("Average Riders" = n() / weeks_)

Because the time frame is very large, this output is not exact, but it is close enough.

   weekday   member_casual `Average Riders`
   <fct>     <chr>                    <dbl>
 1 Monday    casual                   4404.
 2 Monday    member                   6688.
 3 Tuesday   casual                   4279.
 4 Tuesday   member                   7289.
 5 Wednesday casual                   4434.
 6 Wednesday member                   7648.
 7 Thursday  casual                   4447.
 8 Thursday  member                   7285.
 9 Friday    casual                   5807.
10 Friday    member                   7452.
11 Saturday  casual                   9366.
12 Saturday  member                   7612.
13 Sunday    casual                   7527.
14 Sunday    member                   6331.
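
For an exact denominator, one option is to count how often each weekday actually occurs on the calendar between the first and last ride, instead of dividing by a fractional number of weeks. The sketch below is only an illustration under assumptions: `count_weekdays()` and `average_riders()` are hypothetical helper names that do not appear in the original post, and the dates are taken in UTC.

```r
library(dplyr)

# Hypothetical helper (not part of the original post): count how many times
# each weekday occurs on the calendar between two dates, inclusive.
count_weekdays <- function(start, end) {
  day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")
  all_days <- seq(as.Date(start), as.Date(end), by = "day")
  # "%u" is the ISO weekday number (1 = Monday), so the result does not
  # depend on the locale's day names.
  idx <- as.integer(format(all_days, "%u"))
  tibble(weekday = day_names[idx]) %>%
    count(weekday, name = "n_days")
}

# Hypothetical wrapper: rides per (weekday, member_casual) divided by the
# number of calendar occurrences of that weekday.
average_riders <- function(df) {
  # as.Date() on a date-time uses UTC by default; pass a tz if that matters.
  denom <- count_weekdays(min(df$started_at), max(df$started_at))
  df %>%
    mutate(weekday = as.character(weekday)) %>%
    count(weekday, member_casual, name = "rides") %>%
    left_join(denom, by = "weekday") %>%
    mutate(`Average Riders` = rides / n_days)
}

# Small check: 2020-08-20 is a Thursday, so this 8-day span has two Thursdays.
count_weekdays("2020-08-20", "2020-08-27")
```

Calling `average_riders(df2)` would then divide each count by the matching weekday's `n_days`. Days with no rides at all still count towards the denominator, which matches the "actual Mondays in the time frame" definition.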

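【Comments】:

  • The logic is not clear: "the number of Mondays in the given timeframe"
  • What have you tried that didn't work? In what way is your output not what you want?
  • @camille That is just a count, not an average count per week.
  • @akrun I mean the actual number of Mondays that occur between the latest and the oldest date in the data frame, and I want that for every day of the week, further grouped by 'member_casual'.
  • Could you update your post with a small reproducible example with expected output? The input data you showed is from the full data, and the expected output is from the full data. If we have a small example, cross-checking becomes easier.

Tags: r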


【Solution 1】:

We can use

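library(dplyr)

df1 %>%
   # n = total number of rides for each member type
   add_count(member_casual) %>%
   group_by(weekday, member_casual) %>%
   # first(n) keeps the result to one row per group; the value is the share
   # of each member type's rides that fall on that weekday
   summarise(Average_Riders = n() / first(n), .groups = 'drop')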
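
To get an average per day rather than a share, one alternative is to divide by the number of distinct dates on which each weekday appears in the data. This is a minimal sketch, not the answerer's method: it rebuilds the ten sample rows from the question, assumes the timestamps are in UTC, and introduces the column names `ride_date` and `n_days` for illustration only.

```r
library(dplyr)

# The ten sample rows printed in the question (ride_length and month omitted).
# Assumption: timestamps are in UTC.
rides <- tibble(
  started_at = as.POSIXct(c(
    "2020-08-20 18:08:14", "2020-08-27 18:46:04", "2020-08-26 19:44:14",
    "2020-08-27 12:05:41", "2020-08-27 16:49:02", "2020-08-27 17:26:23",
    "2020-08-26 20:14:02", "2020-08-26 21:59:50", "2020-08-26 19:17:42",
    "2020-08-27 15:13:57"), tz = "UTC"),
  member_casual = c("member", rep("casual", 9)),
  weekday = c("Thursday", "Thursday", "Wednesday", "Thursday", "Thursday",
              "Thursday", "Wednesday", "Wednesday", "Wednesday", "Thursday")
)

result <- rides %>%
  mutate(ride_date = as.Date(started_at)) %>%
  group_by(weekday) %>%
  # Number of distinct calendar dates on which this weekday appears in the data
  mutate(n_days = n_distinct(ride_date)) %>%
  group_by(weekday, member_casual) %>%
  summarise(Average_Riders = n() / first(n_days), .groups = "drop")

result
```

On these ten rows there are two Thursdays (2020-08-20 and 2020-08-27) and one Wednesday, so the averages are 2.5 for Thursday/casual, 0.5 for Thursday/member, and 4 for Wednesday/casual. A weekday date with no rides at all is not counted here, so on sparse data the result can differ from a calendar-based denominator.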

【Comments】:

【Solution 2】:

library(dplyr)
df %>%
    group_by(weekday, member_casual) %>%
    count()

  weekday   member_casual     n
  <chr>     <chr>         <int>
1 Thursday  casual            5
2 Thursday  member            1
3 Wednesday casual            4

【Comments】:
