Monthly to Daily Value - R

【Question title】: Monthly to Daily Value - R
【Posted】: 2022-01-12 18:18:10
【Question description】:

I have a data.frame df with monthly data:

Company	Store	Brand	Month	Sales	Budget	Quantity	Year
A	Store A	Brand A	Jun	$300	$300	3000	2022
A	Store A	Brand A	Jul	$300	$300	3000	2022
A	Store A	Brand A	Aug	$300	$300	3000	2022
A	Store A	Brand A	Sep	$300	$300	3000	2022

I would like a daily average instead. For example, June has 30 days, so $300 in sales / 30 days = $10 per day:

Company	Store	Brand	Month	Sales	Budget	Quantity	Date
A	Store A	Brand A	Jun	$10	$10	100	01-06-2022
A	Store A	Brand A	Jun	$10	$10	100	02-06-2022
A	Store A	Brand A	Jun	$10	$10	100	03-06-2022
A	Store A	Brand A	Jun	$10	$10	100	04-06-2022
A	Store A	Brand A	Jun	$10	$10	100	05-06-2022
A	Store A	Brand A	Jun	$10	$10	100	06-06-2022
A	Store A	Brand A	Jun	$10	$10	100	07-06-2022

I don't know which function to use for this.

Thanks!

【Comments】:

Which language? Ago is not a recognizable month. Perhaps you meant Aug?
Yes! Sorry, my main language is Spanish.
@Onyambu, going by web.library.yale.edu/cataloging/months I'd guess it is Spanish or Portuguese.

Tags: r date

【Solution 1】:

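A few things up front:

I'm inferring that your locale is set to Spanish (based on "Ago", which I assume is August). To run this code, I first set my own locale so that the month names parse correctly. You may not need this, but others (working in other languages) may need this or something similar to test the code.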
```
prevlocale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "Spanish") # Windows locale name; on Linux/macOS something like "es_ES.UTF-8"
# [1] "Spanish_Spain.1252"
format(as.Date("2022-08-01"), format = "%b")
# [1] "ago"
### when done and you want to return to your local locale
Sys.setlocale("LC_TIME", prevlocale)
```
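Your number columns are not numbers. I'll convert them to numeric, otherwise the arithmetic won't work; you can reformat them as $-strings afterwards if you need to.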
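
A minimal sketch of that round trip, for illustration only: the vector `sales_chr` is made up, and stripping commas as well as `$` is my own assumption about how such strings might be formatted.

```r
# Hypothetical input: currency strings like those in the Sales/Budget columns
sales_chr <- c("$300", "$1,250.50")

# Strip "$" and thousands separators, then convert to numeric
sales_num <- as.numeric(gsub("[$,]", "", sales_chr))

# Do the arithmetic on the numeric values (here: spread over a 30-day month)
daily <- sales_num / 30

# Format back to $-strings with two decimals once the math is done
daily_chr <- sprintf("$%.2f", daily)
daily_chr
# [1] "$10.00" "$41.68"
```

Keeping the columns numeric until the very end avoids having to re-parse the strings for every calculation.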

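We can't do this for all rows in one simple step, because each month has a different number of days. The first challenge is therefore to determine how many days each month has. There are several ways to do this (including the lubridate package); I'll offer a base-R solution (using as.POSIXlt) that handles it and returns the correct vector of dates.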

```
yrmon2days <- function(yr, mon) {
  # One year and one month abbreviation at a time (not vectorized)
  stopifnot(length(yr) == 1L, length(mon) == 1L)
  # "%b" is parsed in the current LC_TIME locale; an unrecognized month
  # gives NA here, and seq() below then fails
  day2 <- day1 <- as.POSIXlt(as.Date(paste(yr, mon, "01", sep = "-"), format = "%Y-%b-%d"))
  day2$mon <- day2$mon + 1L # first day of the following month
  seq(day1, day2 - 1, by = "day") # day2 - 1 is one second earlier, so still within the month
}
yrmon2days(2022, "Feb")
#  [1] "2022-02-01 UTC" "2022-02-02 UTC" "2022-02-03 UTC" "2022-02-04 UTC" "2022-02-05 UTC" "2022-02-06 UTC" "2022-02-07 UTC"
#  [8] "2022-02-08 UTC" "2022-02-09 UTC" "2022-02-10 UTC" "2022-02-11 UTC" "2022-02-12 UTC" "2022-02-13 UTC" "2022-02-14 UTC"
# [15] "2022-02-15 UTC" "2022-02-16 UTC" "2022-02-17 UTC" "2022-02-18 UTC" "2022-02-19 UTC" "2022-02-20 UTC" "2022-02-21 UTC"
# [22] "2022-02-22 UTC" "2022-02-23 UTC" "2022-02-24 UTC" "2022-02-25 UTC" "2022-02-26 UTC" "2022-02-27 UTC" "2022-02-28 UTC"
```

This is currently not vectorized. It could be, but other complexities in the surrounding data make that overkill for this step.
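
For reference, a vectorized day count could look roughly like the sketch below. The helper `days_in_month` is a hypothetical name of mine, and it assumes the months are already available as numbers 1 to 12 rather than as locale-dependent abbreviations.

```r
# Hypothetical helper: days in each (year, month-number) pair, vectorized.
# Assumes mon_num holds integers in 1..12.
days_in_month <- function(yr, mon_num) {
  first <- as.Date(sprintf("%d-%02d-01", yr, mon_num))
  # First day of the following month, rolling December over into the next year
  next_yr  <- yr + (mon_num == 12L)
  next_mon <- ifelse(mon_num == 12L, 1L, mon_num + 1L)
  next_first <- as.Date(sprintf("%d-%02d-01", next_yr, next_mon))
  # Difference between the two first-of-month dates, in whole days
  as.integer(next_first - first)
}

days_in_month(c(2022L, 2022L, 2024L, 2022L), c(2L, 6L, 2L, 12L))
# [1] 28 30 29 31
```

This only returns counts; producing one row per day would still need something like the row-wise expansion used in this answer.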

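I experimented with dplyr::group_by and generic grouping, and while that makes sense, I didn't want to assume exactly one row per year/month. With that precaution in mind, it's clear we need to operate row by row, so that a group can never (though it would not happen with this data) contain anything other than one row.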

dplyr

```
library(dplyr)
dat %>%
  # strip the "$" and convert the currency columns to numeric
  mutate(across(c(Sales, Budget), ~ as.numeric(gsub("\\$", "", .)))) %>%
  rowwise() %>%
  # one output row per day of the month; in dplyr >= 1.1.0, summarize()
  # returning several rows per group is deprecated in favor of reframe()
  summarize(
    Date = yrmon2days(Year, Month),
    Company, Store, Brand, Year, Month,
    across(c(where(is.numeric), -Year), ~ . / length(Date))
  )
# # A tibble: 122 x 9
#    Date                Company Store   Brand    Year Month Sales Budget Quantity
#    <dttm>              <chr>   <chr>   <chr>   <int> <chr> <dbl>  <dbl>    <dbl>
#  1 2022-06-01 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  2 2022-06-02 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  3 2022-06-03 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  4 2022-06-04 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  5 2022-06-05 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  6 2022-06-06 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  7 2022-06-07 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  8 2022-06-08 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  9 2022-06-09 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
# 10 2022-06-10 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
# # ... with 112 more rows
```

base R

```
# strip the "$" and convert the currency columns to numeric
dat[c("Sales","Budget")] <- lapply(dat[c("Sales","Budget")], function(z) as.numeric(gsub("\\$", "", z)))
# numeric columns to be divided by the day count (Year is numeric but must stay as-is)
isnum <- sapply(dat, is.numeric)
isnum[which(colnames(dat) == "Year")] <- FALSE
# expand each monthly row into one row per day, then stack the pieces
out <- do.call(rbind, lapply(seq_len(nrow(dat)), function(rn) {
  Date <- yrmon2days(dat$Year[rn], dat$Month[rn])
  Nums <- lapply(dat[rn,isnum], `/`, length(Date))
  suppressWarnings( # "row names were found from a short variable and have been discarded"
    cbind(dat[rn,!isnum], Nums, data.frame(Date = Date))
  )
}))
head(out)
#   Company   Store   Brand Month Year Sales Budget Quantity       Date
# 1       A Store A Brand A   Jun 2022    10     10      100 2022-06-01
# 2       A Store A Brand A   Jun 2022    10     10      100 2022-06-02
# 3       A Store A Brand A   Jun 2022    10     10      100 2022-06-03
# 4       A Store A Brand A   Jun 2022    10     10      100 2022-06-04
# 5       A Store A Brand A   Jun 2022    10     10      100 2022-06-05
# 6       A Store A Brand A   Jun 2022    10     10      100 2022-06-06
```
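
The question shows the dates as day-month-year strings (01-06-2022). If that display format matters, one possible way to get it is `format()`, sketched here on a small stand-in vector (`dates` is made up for the example, not taken from `out`):

```r
# Stand-in for the Date column: the first three days of June 2022
dates <- seq(as.Date("2022-06-01"), by = "day", length.out = 3)

# Render as dd-mm-yyyy strings; these are character values, no longer dates
date_chr <- format(dates, "%d-%m-%Y")
date_chr
# [1] "01-06-2022" "02-06-2022" "03-06-2022"
```

Since the result is plain text, it is probably best applied as a last step, after any sorting or date arithmetic.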

Data

```
dat <- structure(list(
  Company = c("A", "A", "A", "A"),
  Store = c("Store A", "Store A", "Store A", "Store A"),
  Brand = c("Brand A", "Brand A", "Brand A", "Brand A"),
  Month = c("Jun", "Jul", "Ago", "Sep"),
  Sales = c("$300", "$300", "$300", "$300"),
  Budget = c("$300", "$300", "$300", "$300"),
  Quantity = c(3000L, 3000L, 3000L, 3000L),
  Year = c(2022L, 2022L, 2022L, 2022L)
), class = "data.frame", row.names = c(NA, -4L))
```

【Discussion】:

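When I use either option (dplyr or base), I get this error: "Error in seq.int(0, to0 - from, by) : 'to' must be a finite number". I tried with my own data frame and with your sample data, and I get the same error.
Check that all of your months are being recognized. What does unique(setdiff(tolower(dat$Month), tolower(format(as.Date(paste(2022, 1:12, 1, sep="-")), format = "%b")))) return?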
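
If the locale cannot be changed, one possible workaround is to skip "%b" parsing altogether and map the abbreviations to month numbers by hand. This is only a sketch: the lookup table `month_lookup` is my own assumption about which Spanish-style abbreviations appear in the data, and it would need adjusting to match the real values.

```r
# Assumed abbreviations (lower case); extend or edit to match your data
month_lookup <- c(ene = 1L, feb = 2L, mar = 3L, abr = 4L, may = 5L, jun = 6L,
                  jul = 7L, ago = 8L, sep = 9L, oct = 10L, nov = 11L, dic = 12L)

months <- c("Jun", "Jul", "Ago", "Sep") # as in the sample data
mon_num <- unname(month_lookup[tolower(months)])

# Fail early with a clear signal if any abbreviation is missing from the table
stopifnot(!anyNA(mon_num))

# Build first-of-month dates without relying on LC_TIME
first_day <- as.Date(sprintf("%d-%02d-01", 2022L, mon_num))
first_day
# [1] "2022-06-01" "2022-07-01" "2022-08-01" "2022-09-01"
```

An abbreviation that is not in the table (for example an English "Aug" mixed into Spanish data) stops at the `stopifnot()` line instead of surfacing later as the `seq.int` error.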