这是一个data.table 解决方案。我试图避免手动计算,并采用基于长到宽转换的解决方案。
这是我的解决方案,后面有详细的说明:
library(lubridate)
library(data.table)
dt <- setDT(dt)
dt[,Date := date(Date)]
dt[,type := fifelse(Type == "SS_RT",fifelse(Remark == "AT_1_O","A1","A2"),"B")]
## transform to wide
df2 <- rbind(dcast(data = dt,Date~type ,value.var = "Price",fill = 0)[,linetype := "count"],
dcast(data = dt,Date~type ,value.var = "Price",fill = 0,fun.aggregate = sum)[,linetype := "value"])
## A and tot
df2[,tot := rowSums(.SD),.SDcols = c("A1","A2","B")]
df2[,A := A1+A2]
## create pc
cols <- c("A","A1","A2","B")
df2[,paste0(cols,"_pc") := lapply(.SD,function(x) round(x/tot*100) ),.SDcols = cols]
cols <- c("A1","A2")
df2[,paste0(cols,"_exc") := lapply(.SD,function(x) round(x/(A1+A2)*100) ),.SDcols = cols]
## add missing dates
df2 <- merge(CJ(Date = seq(min(dt$Date),max(dt$Date),1),linetype = c("count","value")),
df2,all = T,by = c("Date","linetype"))
df2[is.na(df2)] <- 0
df2[,linetype := NULL]
df2
Date A1 A2 B tot A A_pc A1_pc A2_pc B_pc A1_exc A2_exc
1: 2020-12-01 3 1 3 7 4 57 43 14 43 75 25
2: 2020-12-01 3800 1200 4200 9200 5000 54 41 13 46 76 24
3: 2020-12-02 0 0 0 0 0 0 0 0 0 0 0
4: 2020-12-02 0 0 0 0 0 0 0 0 0 0 0
5: 2020-12-03 0 0 0 0 0 0 0 0 0 0 0
6: 2020-12-03 0 0 0 0 0 0 0 0 0 0 0
7: 2020-12-04 0 0 0 0 0 0 0 0 0 0 0
8: 2020-12-04 0 0 0 0 0 0 0 0 0 0 0
9: 2020-12-05 0 0 0 0 0 0 0 0 0 0 0
10: 2020-12-05 0 0 0 0 0 0 0 0 0 0 0
11: 2020-12-06 0 0 0 0 0 0 0 0 0 0 0
12: 2020-12-06 0 0 0 0 0 0 0 0 0 0 0
13: 2020-12-07 0 1 2 3 1 33 0 33 67 0 100
14: 2020-12-07 0 1600 3200 4800 1600 33 0 33 67 0 100
所以第一步是我按照您的规则创建type 变量:
dt[,Date := date(Date)]
dt[,type := fifelse(Type == "SS_RT",fifelse(Remark == "AT_1_O","A1","A2"),"B")]
我们知道A 就是A1 + A2。它允许我将表格转换为宽格式。我做了两次:一次计算,一次计算每种类型的总和:
dcast(data = dt,Date ~ type ,value.var = "Price",fill = 0)
Date A1 A2 B
1: 2020-12-01 3 1 3
2: 2020-12-07 0 1 2
这里我计算每种类型的出现次数,因为它使用默认聚合:lenght。
如果我使用sum 作为聚合函数:
dcast(data = dt,Date~type ,value.var = "Price",fill = 0,fun.aggregate = sum)
Date A1 A2 B
1: 2020-12-01 3800 1200 4200
2: 2020-12-07 0 1600 3200
我添加了linetype 变量,这将帮助我添加缺失的日期(我使用它来保持每个日期两行)。
我绑定两者,我得到:
Date A1 A2 B linetype
1: 2020-12-01 3 1 3 count
2: 2020-12-07 0 1 2 count
3: 2020-12-01 3800 1200 4200 value
4: 2020-12-07 0 1600 3200 value
然后我计算A 和总数:
df2[,tot := rowSums(.SD),.SDcols = c("A1","A2","B")]
df2[,A := A1+A2]
然后,我使用 lapply 和要转换的列的向量计算百分比 (_pc) 和 Excl 变量(为简单起见,我将其命名为 _exc)。我使用fifelse 来避免除以0:
cols <- c("A","A1","A2","B")
df2[,paste0(cols,"_pc") := lapply(.SD,function(x) round(x/tot*100) ),.SDcols = cols]
cols <- c("A1","A2")
df2[,paste0(cols,"_exc") := lapply(.SD,function(x) round(x/(A1+A2)*100) ),.SDcols = cols]
Date A1 A2 B linetype tot A A_pc A1_pc A2_pc B_pc A1_exc A2_exc
1: 2020-12-01 3 1 3 count 7 4 57 43 14 43 75 25
2: 2020-12-01 3800 1200 4200 value 9200 5000 54 41 13 46 76 24
3: 2020-12-07 0 1 2 count 3 1 33 0 33 67 0 100
4: 2020-12-07 0 1600 3200 value 4800 1600 33 0 33 67 0 100
然后我通过合并linetype 和Date 的所有组合并保留所有行来添加缺失的日期。我使用CJ 函数创建了一个data.table,其中包含两个变量的所有组合:
CJ(Date = seq(min(dt$Date),max(dt$Date),1),linetype = c("count","value"))
Date linetype
1: 2020-12-01 count
2: 2020-12-01 value
3: 2020-12-02 count
4: 2020-12-02 value
5: 2020-12-03 count
6: 2020-12-03 value
7: 2020-12-04 count
8: 2020-12-04 value
9: 2020-12-05 count
10: 2020-12-05 value
11: 2020-12-06 count
12: 2020-12-06 value
13: 2020-12-07 count
14: 2020-12-07 value
然后用 0 替换缺失值并抑制 linetype 变量。
然后您可以使用setcolorder 重新排序列,并使用kabbleExtra(请参阅here)生成您的html 输出。
您可以对dplyr 执行相同操作,使用pivot_wider 转换为宽,mutate_all 代替lapply(.SD,...) 进行计算,expand.grid 代替CJ 生成缺失表日期。