Creating "per day" rows from selective "per month" figures using tidyverse

【Question title】: creating "Per Day" rows, from selective "Per Month" figures using tidyverse
【Posted】: 2019-11-08 05:55:50
【Question description】:

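I have a set of sales reports from stores that report their sales figures either "per day" or "per month". When I plot them on the same chart, the "per month" figures look like spikes and make the chart hard to read.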

I'd like to convert these "once per month" figures into values spread evenly across the days of the month, so that I can plot daily sales.

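Using tidyverse and lubridate, I managed to calculate a "sales_per_day" column for the dataset. How do I create the "1 row per day" rows, i.e. for 2019-01, create 30 daily rows from each row of monthly data?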

library(tidyverse)
library(lubridate)

sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"), 
  sales = c(100,200,300,400,5000), 
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
  freq = c("daily", "daily", "daily", "daily", "monthly"))

> sales
# A tibble: 5 x 4
  distributor sales date       freq   
  <chr>       <dbl> <chr>      <chr>  
1 StoreA        100 2019-01-01 daily  
2 StoreA        200 2019-01-02 daily  
3 StoreA        300 2019-01-03 daily  
4 StoreA        400 2019-01-04 daily  
5 StoreB       5000 2019-01-30 monthly


wanted_sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB", "StoreB", "StoreB", "StoreB"), 
  sales = c(100, 200, 300, 400, 5000 / 30, 5000 / 30, 5000 / 30, 5000 / 30), 
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"),
  freq = c("daily", "daily", "daily", "daily", "daily", "daily", "daily", "daily" ))

> wanted_sales
# A tibble: 8 x 4
  distributor sales date       freq 
  <chr>       <dbl> <chr>      <chr>
1 StoreA       100  2019-01-01 daily
2 StoreA       200  2019-01-02 daily
3 StoreA       300  2019-01-03 daily
4 StoreA       400  2019-01-04 daily
5 StoreB       167. 2019-01-01 daily
6 StoreB       167. 2019-01-02 daily
7 StoreB       167. 2019-01-03 daily
8 StoreB       167. 2019-01-04 daily

per_day <- sales %>% filter(freq == "monthly") %>%
  group_by(date) %>%
  mutate(mdays = as.integer(days_in_month(as_date(date)))) %>%
  mutate(sales_per_day = sales / mdays)

> per_day
# A tibble: 1 x 6
# Groups:   date [1]
  distributor sales date       freq    mdays sales_per_day
  <chr>       <dbl> <chr>      <chr>   <int>         <dbl>
1 StoreB       5000 2019-01-30 monthly    31          161.

I'd like the resulting per_day tibble to have 30 rows, with the $date column being the sequence "2019-01-01", "2019-01-02", ..., "2019-01-30".

【Question discussion】:

Tags: r tidyverse

【Solution 1】:

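We can convert date to an actual Date class and create a new column startdate, which holds the first day of that month when freq is not "daily"; in that case sales is also divided by 30. For each date we then use complete to create the sequence of dates and set freq to "daily".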

library(dplyr)
library(tidyr)
library(lubridate)

sales %>%
  mutate(date = as.Date(date), 
         startdate = if_else(freq == "daily", date, floor_date(date, "month")), 
         sales = if_else(freq == "daily", sales, sales/30)) %>%
  group_by(date) %>%
  complete(date = seq(startdate, date, "1 day"), sales = sales, 
           freq = "daily", distributor = distributor) %>%
  select(-startdate)

# Groups:   date [30]
#   date       sales freq  distributor
#   <date>     <dbl> <chr> <chr>      
# 1 2019-01-01  100  daily StoreA     
# 2 2019-01-02  200  daily StoreA     
# 3 2019-01-03  300  daily StoreA     
# 4 2019-01-04  400  daily StoreA     
# 5 2019-01-01  167. daily StoreB     
# 6 2019-01-02  167. daily StoreB     
# 7 2019-01-03  167. daily StoreB     
# 8 2019-01-04  167. daily StoreB     
# 9 2019-01-05  167. daily StoreB     
#10 2019-01-06  167. daily StoreB     
# … with 25 more rows
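
For readers on a recent tidyr: as far as I recall, releases from 1.2.0 onward refuse to complete() on a grouping column, so the call above may error there. A minimal sketch of the same idea using tidyr::uncount() follows; it is an alternative, not the answerer's method. The helper columns n_days, first_day and day_index are names made up for this illustration, and using day(date) as the divisor assumes the monthly total should be spread from the 1st up to the report date (30 rows for 2019-01-30), as the question asks.

```r
library(dplyr)
library(tidyr)
library(lubridate)

sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"),
  sales = c(100, 200, 300, 400, 5000),
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
  freq = c("daily", "daily", "daily", "daily", "monthly")
)

daily_sales <- sales %>%
  mutate(
    date = as.Date(date),
    # Assumption: a monthly figure covers day 1 up to the report date.
    n_days = if_else(freq == "daily", 1L, as.integer(day(date))),
    first_day = if_else(freq == "daily", date, floor_date(date, "month")),
    # Spread the total evenly over the days it covers.
    sales = sales / n_days
  ) %>%
  # Repeat each row n_days times; day_index counts 1..n_days within a row.
  uncount(n_days, .id = "day_index") %>%
  mutate(date = first_day + (day_index - 1L), freq = "daily") %>%
  select(distributor, sales, date, freq)

# 4 StoreA rows plus 30 StoreB rows
nrow(daily_sales)
```

Swapping day(date) for days_in_month(date) would spread the total over the full calendar month instead (31 days for January 2019).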

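【Discussion】:

Awesome!!! I never knew about complete(), what a clever function! In my desperate attempts I did this:

month_days <- tibble(date = as_date(paste0(as_date("2010-01-01") + 1:as.integer(as_date(now()) - as_date("2010-01-01"))))) %>% mutate(ymon = floor_date(date, unit = "month"))

This created a tibble with a list of dates and a year_mon value, which I could then use with left_join. It did work, but your solution is so much better! Thanks for showing me a really elegant way, Ronak!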
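
The calendar-lookup idea described in this comment can be sketched as below. The comment does not show the join itself, so everything after month_days is an assumption: the names cal_date, monthly_as_daily and daily_sales are hypothetical, the calendar is limited to 2019 to keep it small, and the monthly total is divided by the full number of days in the month.

```r
library(dplyr)
library(lubridate)

sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"),
  sales = c(100, 200, 300, 400, 5000),
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
  freq = c("daily", "daily", "daily", "daily", "monthly")
)

# Calendar lookup: one row per day, tagged with the first day of its month.
month_days <- tibble(
  cal_date = seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
) %>%
  mutate(ymon = floor_date(cal_date, unit = "month"))

# Expand each monthly row to one row per calendar day of its month.
# Note: several monthly rows in the same month make this a many-to-many
# join, which newer dplyr versions flag with a warning.
monthly_as_daily <- sales %>%
  filter(freq == "monthly") %>%
  mutate(
    date = as.Date(date),
    ymon = floor_date(date, unit = "month"),
    sales = sales / as.integer(days_in_month(date))
  ) %>%
  left_join(month_days, by = "ymon") %>%
  mutate(date = cal_date, freq = "daily") %>%
  select(distributor, sales, date, freq)

# Recombine with the rows that were already daily.
daily_sales <- sales %>%
  filter(freq == "daily") %>%
  mutate(date = as.Date(date)) %>%
  bind_rows(monthly_as_daily)
```

Compared with complete() or uncount(), this keeps the expansion logic in an explicit lookup table, which can be convenient if the same calendar is reused for other joins.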