【Question title】: Creating "Per Day" rows from selective "Per Month" figures using tidyverse
【Posted】: 2019-11-08 05:55:50
【Question description】:

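I have a set of sales reports from stores that report their sales figures either "per day" or "per month". When I plot them on the same chart, the "per month" figures show up as spikes, which makes the chart hard to read.

I would like to spread each "per month" figure evenly across the days of that month, so that I can plot a daily sales chart.

I managed to compute a sales_per_day column for the dataset using tidyverse and lubridate. How do I create one row per day, i.e. for 2019-01, turn each single monthly row into 30 daily rows?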

sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"), 
  sales = c(100,200,300,400,5000), 
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
  freq = c("daily", "daily", "daily", "daily", "monthly"))

> sales
# A tibble: 5 x 4
  distributor sales date       freq   
  <chr>       <dbl> <chr>      <chr>  
1 StoreA        100 2019-01-01 daily  
2 StoreA        200 2019-01-02 daily  
3 StoreA        300 2019-01-03 daily  
4 StoreA        400 2019-01-04 daily  
5 StoreB       5000 2019-01-30 monthly


wanted_sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB", "StoreB", "StoreB", "StoreB"), 
  sales = c(100, 200, 300, 400, 5000 / 30, 5000 / 30, 5000 / 30, 5000 / 30), 
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"),
  freq = c("daily", "daily", "daily", "daily", "daily", "daily", "daily", "daily" ))

> wanted_sales
# A tibble: 8 x 4
  distributor sales date       freq 
  <chr>       <dbl> <chr>      <chr>
1 StoreA       100  2019-01-01 daily
2 StoreA       200  2019-01-02 daily
3 StoreA       300  2019-01-03 daily
4 StoreA       400  2019-01-04 daily
5 StoreB       167. 2019-01-01 daily
6 StoreB       167. 2019-01-02 daily
7 StoreB       167. 2019-01-03 daily
8 StoreB       167. 2019-01-04 daily

per_day <- sales %>% filter(freq == "monthly") %>%
  group_by(date) %>%
  mutate(mdays = as.integer(days_in_month(as_date(date)))) %>%
  mutate(sales_per_day = sales / mdays)

> per_day
# A tibble: 1 x 6
# Groups:   date [1]
  distributor sales date       freq    mdays sales_per_day
  <chr>       <dbl> <chr>      <chr>   <int>         <dbl>
1 StoreB       5000 2019-01-30 monthly    31          161.

I would like the resulting per_day tibble to have 30 rows, with the date column holding the sequence "2019-01-01", "2019-01-02", ... "2019-01-30".

【Question discussion】:

    Tags: r tidyverse


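    【Solution 1】:

    We can convert date to an actual Date class and create a new column startdate, which holds the first day of that particular month when freq is not "daily"; for those rows sales is also divided by 30. Then, for each date, we use complete to create the sequence of dates and set freq to "daily".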

    library(dplyr)
    library(tidyr)
    library(lubridate)
    
    sales %>%
      mutate(date = as.Date(date), 
             startdate = if_else(freq == "daily", date, floor_date(date, "month")), 
             sales = if_else(freq == "daily", sales, sales/30)) %>%
      group_by(date) %>%
      complete(date = seq(startdate, date, "1 day"), sales = sales, 
               freq = "daily", distributor = distributor) %>%
      select(-startdate)
    
    # Groups:   date [30]
    #   date       sales freq  distributor
    #   <date>     <dbl> <chr> <chr>      
    # 1 2019-01-01  100  daily StoreA     
    # 2 2019-01-02  200  daily StoreA     
    # 3 2019-01-03  300  daily StoreA     
    # 4 2019-01-04  400  daily StoreA     
    # 5 2019-01-01  167. daily StoreB     
    # 6 2019-01-02  167. daily StoreB     
    # 7 2019-01-03  167. daily StoreB     
    # 8 2019-01-04  167. daily StoreB     
    # 9 2019-01-05  167. daily StoreB     
    #10 2019-01-06  167. daily StoreB     
    # … with 25 more rows
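
A possible variant, not part of the original answer: recent tidyr releases (1.2.0 onward, as far as I know) no longer let complete() restructure a grouping column, so the pipeline above may raise an error there. The sketch below builds the same kind of daily rows with tidyr::uncount() instead. The helper columns n_days and day_index are names introduced here for illustration, and the sketch assumes, like the answer, that a monthly figure is spread over the days from the first of the month up to the reported date.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same example data as in the question
sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"),
  sales = c(100, 200, 300, 400, 5000),
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
  freq = c("daily", "daily", "daily", "daily", "monthly"))

daily_sales <- sales %>%
  mutate(date = as.Date(date),
         # monthly rows start on the first of the month, daily rows cover only themselves
         startdate = if_else(freq == "monthly", floor_date(date, "month"), date),
         # number of days each row stands for (assumption: up to the reported date)
         n_days = as.integer(date - startdate) + 1L,
         sales = sales / n_days) %>%
  # repeat each row n_days times; day_index counts 1..n_days within each original row
  uncount(n_days, .id = "day_index") %>%
  mutate(date = startdate + day_index - 1L,
         freq = "daily") %>%
  select(distributor, sales, date, freq)
```

With the example data this gives 4 StoreA rows plus 30 StoreB rows, and the StoreB total is preserved. To spread a monthly figure over the whole calendar month instead, n_days could be taken from days_in_month(date).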
    

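    【Discussion】:

    • Awesome!!! I never knew about complete(), what a clever function! In my desperate attempts, I did this: month_days <- tibble(date = as_date(paste0(as_date("2010-01-01") + 1:as.integer(as_date(now()) - as_date("2010-01-01"))))) %>% mutate(ymon = floor_date(date, unit = "month")). This created a tibble listing the dates together with a year_mon value, which I could then use with left_join. It did work, but your solution is way better! Thanks for showing me a very elegant way, Ronak!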
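
For reference, the calendar-table approach described in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: the names monthly, calendar, day and monthly_as_daily are made up here, the calendar is limited to 2019 rather than starting in 2010, and the monthly figure is divided by the full number of days in the month (31 for January), as in the question's per_day attempt.

```r
library(dplyr)
library(lubridate)

# One monthly row, keyed by the first day of its month
monthly <- tibble(
  distributor = "StoreB",
  sales = 5000,
  date = as.Date("2019-01-30")) %>%
  mutate(ymon = floor_date(date, unit = "month"),
         sales_per_day = sales / as.integer(days_in_month(date)))

# Lookup table: every calendar day together with the month it belongs to
calendar <- tibble(day = seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")) %>%
  mutate(ymon = floor_date(day, unit = "month"))

# Joining on the month key yields one row per day of the matching month
monthly_as_daily <- monthly %>%
  left_join(calendar, by = "ymon") %>%
  mutate(sales = sales_per_day, date = day, freq = "daily") %>%
  select(distributor, sales, date, freq)
```

The join needs no grouping, at the cost of keeping a separate calendar table around.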