A few things up front:
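- I'm inferring that your locale is set to Spanish (based on "Ago", which I'm assuming is August). To run this code, I first set my own locale so that the month names parse correctly. You may not need this, but others (using other languages) may need this or something similar to test this code.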
prevlocale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "Spanish")  # Windows locale name; on Linux/macOS it is usually something like "es_ES.UTF-8"
# [1] "Spanish_Spain.1252"
format(as.Date("2022-08-01"), format = "%b")
# [1] "ago"
### when done and you want to return to your local locale
Sys.setlocale("LC_TIME", prevlocale)
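If changing the session locale is inconvenient (locale names differ between operating systems), a locale-independent alternative is a small lookup table. This is a minimal sketch, assuming the month abbreviations in your data are the twelve listed below; es_months and es_month_num are hypothetical names, not part of any package.

```r
# Hypothetical lookup: Spanish month abbreviations -> month numbers 1-12.
# Assumes abbreviations like "Ene", "Feb", ..., "Ago", ..., "Dic".
es_months <- c("ene", "feb", "mar", "abr", "may", "jun",
               "jul", "ago", "sep", "oct", "nov", "dic")

es_month_num <- function(x) {
  # tolower() makes the match case-insensitive; unknown names return NA
  match(tolower(x), es_months)
}

# Build a first-of-month Date without relying on "%b" or the locale
aug1 <- as.Date(sprintf("%d-%02d-01", 2022L, es_month_num("Ago")))
```

For example, es_month_num(c("Jun", "Jul", "Ago", "Sep")) gives 6 7 8 9, and aug1 is 2022-08-01.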
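- Your Sales and Budget columns are not numbers (they are strings such as "$300"). I'll convert them to numeric, since otherwise the arithmetic won't work; you can re-format them as $-strings afterwards if needed.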
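A minimal sketch of that round trip, assuming amounts look like "$300" (optionally with comma thousands separators, which the sample data does not have):

```r
sales_chr <- c("$300", "$1,250.50")
# Remove "$" and any thousands separators; inside [...] both are literal characters
sales_num <- as.numeric(gsub("[$,]", "", sales_chr))
# ... arithmetic happens on sales_num ...
# Re-format as "$"-strings with two decimals for display
sales_fmt <- sprintf("$%.2f", sales_num)
```

Here sales_num is c(300, 1250.5) and sales_fmt is c("$300.00", "$1250.50").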
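- We can't do this for all rows with one simple step, because each month has a different number of days. So the first challenge is to determine how many days each month has. There are several ways to do this (including the lubridate package); I'll offer a base-R solution (using as.POSIXlt) that returns the correct vector of dates for a given year and month.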
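yrmon2days <- function(yr, mon) {
  # one year and one month per call (not vectorized)
  stopifnot(length(yr) == 1L, length(mon) == 1L)
  # first day of the month; "%b" matches month names in the current LC_TIME locale
  day2 <- day1 <- as.POSIXlt(as.Date(paste(yr, mon, "01", sep = "-"), format = "%Y-%b-%d"))
  # first day of the next month (month 12 rolls over into January of the next year)
  day2$mon <- day2$mon + 1L
  # every day from day1 up to one second before the next month starts
  seq(day1, day2 - 1, by = "day")
}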
yrmon2days(2022, "Feb")
# [1] "2022-02-01 UTC" "2022-02-02 UTC" "2022-02-03 UTC" "2022-02-04 UTC" "2022-02-05 UTC" "2022-02-06 UTC" "2022-02-07 UTC"
# [8] "2022-02-08 UTC" "2022-02-09 UTC" "2022-02-10 UTC" "2022-02-11 UTC" "2022-02-12 UTC" "2022-02-13 UTC" "2022-02-14 UTC"
# [15] "2022-02-15 UTC" "2022-02-16 UTC" "2022-02-17 UTC" "2022-02-18 UTC" "2022-02-19 UTC" "2022-02-20 UTC" "2022-02-21 UTC"
# [22] "2022-02-22 UTC" "2022-02-23 UTC" "2022-02-24 UTC" "2022-02-25 UTC" "2022-02-26 UTC" "2022-02-27 UTC" "2022-02-28 UTC"
This is currently not vectorized; it could be, but other complexities in the surrounding data make that somewhat overkill for this step.
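If you do want a vectorized day count, here is a small base-R sketch. It assumes months are already integers 1-12 (not locale-dependent names like "Ago"), and days_in_month is a hypothetical helper name, not part of any package.

```r
# Hypothetical vectorized helper: number of days in each (year, month) pair
days_in_month <- function(yr, mon) {
  yr  <- as.integer(yr)
  mon <- as.integer(mon)
  first <- as.Date(sprintf("%d-%02d-01", yr, mon))
  # 1st of the following month: after December, roll over to January of yr + 1
  next_yr  <- yr + (mon == 12L)
  next_mon <- mon %% 12L + 1L
  next_first <- as.Date(sprintf("%d-%02d-01", next_yr, next_mon))
  # difference in days, with the difftime class dropped
  as.integer(next_first - first)
}

days_in_month(2022L, 6:9)
# [1] 30 31 31 30
```

Arguments recycle like ordinary R vectors, so days_in_month(c(2023, 2024), 2) gives 28 and 29 (2024 is a leap year).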
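- I thought about using dplyr::group_by and grouping in general, and while that makes sense, I don't want to assume there is exactly one row per year/month. With that precaution, it's clearer to operate row by row: grouping by year/month could (though not with this data) put more than one row in a group.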
dplyr
library(dplyr)
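dat %>%
  # strip "$" and convert Sales/Budget to numbers
  mutate(across(c(Sales, Budget), ~ as.numeric(gsub("\\$", "", .)))) %>%
  rowwise() %>%
  # one output row per day; since dplyr 1.1.0, reframe() is the recommended verb
  # for multi-row results (summarize() still works here but warns)
  summarize(
    Date = yrmon2days(Year, Month),
    Company, Store, Brand, Year, Month,
    # spread each monthly amount evenly across the days of that month
    across(c(where(is.numeric), -Year), ~ . / length(Date))
  )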
# # A tibble: 122 x 9
# Date Company Store Brand Year Month Sales Budget Quantity
# <dttm> <chr> <chr> <chr> <int> <chr> <dbl> <dbl> <dbl>
# 1 2022-06-01 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 2 2022-06-02 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 3 2022-06-03 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 4 2022-06-04 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 5 2022-06-05 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 6 2022-06-06 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 7 2022-06-07 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 8 2022-06-08 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 9 2022-06-09 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 10 2022-06-10 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# # ... with 112 more rows
Base R
# strip "$" and convert Sales/Budget to numbers
dat[c("Sales","Budget")] <- lapply(dat[c("Sales","Budget")], function(z) as.numeric(gsub("\\$", "", z)))
# numeric columns to spread across days (Year is numeric but must not be divided)
isnum <- sapply(dat, is.numeric)
isnum[which(colnames(dat) == "Year")] <- FALSE
out <- do.call(rbind, lapply(seq_len(nrow(dat)), function(rn) {
  Date <- yrmon2days(dat$Year[rn], dat$Month[rn])
  # divide this row's monthly amounts by the number of days in its month
  Nums <- lapply(dat[rn,isnum], `/`, length(Date))
  suppressWarnings( # "row names were found from a short variable and have been discarded"
    cbind(dat[rn,!isnum], Nums, data.frame(Date = Date))
  )
}))
head(out)
# Company Store Brand Month Year Sales Budget Quantity Date
# 1 A Store A Brand A Jun 2022 10 10 100 2022-06-01
# 2 A Store A Brand A Jun 2022 10 10 100 2022-06-02
# 3 A Store A Brand A Jun 2022 10 10 100 2022-06-03
# 4 A Store A Brand A Jun 2022 10 10 100 2022-06-04
# 5 A Store A Brand A Jun 2022 10 10 100 2022-06-05
# 6 A Store A Brand A Jun 2022 10 10 100 2022-06-06
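For larger data, a loop-free variant of the same row-wise idea is to replicate row indices with rep() and number the days within each row's block with sequence(). This is a sketch, not the approach above: it assumes months are already integers 1-12 (so no locale is involved), and the data frame d and its Mon column are hypothetical. It ends with a check that the daily values add back up to each row's monthly total.

```r
# Hypothetical data: two stores share June 2022, so one row per year/month does not hold
d <- data.frame(
  Store = c("Store A", "Store B", "Store A"),
  Year  = c(2022L, 2022L, 2022L),
  Mon   = c(6L, 6L, 7L),
  Sales = c(300, 600, 310)
)

first  <- as.Date(sprintf("%d-%02d-01", d$Year, d$Mon))
# 1st of the following month, rolling the year over after December
nxt    <- as.Date(sprintf("%d-%02d-01", d$Year + (d$Mon == 12L), d$Mon %% 12L + 1L))
n_days <- as.integer(nxt - first)

# Repeat each input row once per day of its own month
idx   <- rep(seq_len(nrow(d)), times = n_days)
daily <- d[idx, ]
# sequence(n_days) counts 1..n within each row's block, giving the day offset
daily$Date  <- first[idx] + sequence(n_days) - 1L
daily$Sales <- daily$Sales / n_days[idx]
rownames(daily) <- NULL

# Sanity check: daily values sum back to each original row's total
totals <- tapply(daily$Sales, idx, sum)
stopifnot(isTRUE(all.equal(as.numeric(totals), d$Sales)))
```

Because each input row gets its own block of days, the two June rows stay separate (91 daily rows in total), which is the same reason for working row by row given above.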
Data
dat <- structure(list(Company = c("A", "A", "A", "A"), Store = c("Store A", "Store A", "Store A", "Store A"), Brand = c("Brand A", "Brand A", "Brand A", "Brand A"), Month = c("Jun", "Jul", "Ago", "Sep"), Sales = c("$300", "$300", "$300", "$300"), Budget = c("$300", "$300", "$300", "$300"), Quantity = c(3000L, 3000L, 3000L, 3000L), Year = c(2022L, 2022L, 2022L, 2022L)), class = "data.frame", row.names = c(NA, -4L))