【Posted】: 2017-03-13 11:42:11
【Question】:
I am struggling with a data transformation in R. The data I receive looks like this:
input <- data.frame(AF = sample(0:1, 100, replace=TRUE),
                    CAD = sample(0:1, 100, replace=TRUE),
                    CHF = sample(0:1, 100, replace=TRUE),
                    DEM = sample(0:1, 100, replace=TRUE),
                    DIAB = sample(0:1, 100, replace=TRUE))
input$Counts <- rowSums(input)
The output I want to achieve is:
output <- data.frame(Condition = c('AF', 'CAD', 'CHF', 'DEM', 'DIAB'),
                     '1' = sample(11:20, 5, replace=TRUE),
                     '2' = sample(11:20, 5, replace=TRUE),
                     '3' = sample(11:20, 5, replace=TRUE),
                     '4' = sample(11:20, 5, replace=TRUE),
                     '5' = sample(11:20, 5, replace=TRUE))
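Each cell holds the number of observations that have the given condition (now in the first column) and the given row sum (now a separate column for each value).
My solution is below, but I wonder whether there is a more elegant one?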
data.frame(Condition = colnames(input[ ,1:5]),
"One" = c(nrow(input[input$AF==1 & input$Counts==1,]),
nrow(input[input$CAD==1 & input$Counts==1,]),
nrow(input[input$CHF==1 & input$Counts==1,]),
nrow(input[input$DEM==1 & input$Counts==1,]),
nrow(input[input$DIAB==1 & input$Counts==1,])),
"Two" = c(nrow(input[input$AF==1 & input$Counts==2,]),
nrow(input[input$CAD==1 & input$Counts==2,]),
nrow(input[input$CHF==1 & input$Counts==2,]),
nrow(input[input$DEM==1 & input$Counts==2,]),
nrow(input[input$DIAB==1 & input$Counts==2,])),
"Three" = c(nrow(input[input$AF==1 & input$Counts==3,]),
nrow(input[input$CAD==1 & input$Counts==3,]),
nrow(input[input$CHF==1 & input$Counts==3,]),
nrow(input[input$DEM==1 & input$Counts==3,]),
nrow(input[input$DIAB==1 & input$Counts==3,])),
"Four" = c(nrow(input[input$AF==1 & input$Counts==4,]),
nrow(input[input$CAD==1 & input$Counts==4,]),
nrow(input[input$CHF==1 & input$Counts==4,]),
nrow(input[input$DEM==1 & input$Counts==4,]),
nrow(input[input$DIAB==1 & input$Counts==4,])),
"Five" = c(nrow(input[input$AF==1 & input$Counts==5,]),
nrow(input[input$CAD==1 & input$Counts==5,]),
nrow(input[input$CHF==1 & input$Counts==5,]),
nrow(input[input$DEM==1 & input$Counts==5,]),
nrow(input[input$DIAB==1 & input$Counts==5,])),
"Six" = c(nrow(input[input$AF==1 & input$Counts==6,]),
nrow(input[input$CAD==1 & input$Counts==6,]),
nrow(input[input$CHF==1 & input$Counts==6,]),
nrow(input[input$DEM==1 & input$Counts==6,]),
nrow(input[input$DIAB==1 & input$Counts==6,]))
)
【Comments】:
-
It helps to state it in words: "split by Counts, then aggregate each column by sum".
-
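Take the row sums, then count.
nrow(input[input$Var==1 & input$Counts==whatever,]) is just a roundabout way of summing each column after splitting by Counts and then combining the results.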
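The "split by Counts, then sum each column" idea from the comments can be sketched in base R roughly as follows. This is only one possible approach, not necessarily what the commenters had in mind. The names conditions, has_any, counts and result are made up for this illustration, and the fixed seed is an assumption added so the sketch is reproducible. Rows with Counts == 0 are dropped because the desired output has no column for them, and a count value that never occurs in the data gets no column at all.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Rebuild data shaped like the question's `input`
conditions <- c("AF", "CAD", "CHF", "DEM", "DIAB")
input <- as.data.frame(replicate(5, sample(0:1, 100, replace = TRUE)))
names(input) <- conditions
input$Counts <- rowSums(input)

# Keep only rows that have at least one condition
has_any <- input$Counts > 0

# Split the condition columns by Counts, then sum every column per group.
# Because the columns are 0/1, each sum equals the number of matching rows.
counts <- sapply(split(input[has_any, conditions], input$Counts[has_any]),
                 colSums)

# One row per condition, one column per observed Counts value
result <- data.frame(Condition = rownames(counts), counts,
                     check.names = FALSE, row.names = NULL)
print(result)
```

Since every condition column is 0/1, summing a column within a Counts group gives the same number as nrow(input[input$Var==1 & input$Counts==k,]), so the repeated nrow() calls are not needed.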
Tags: r aggregate multiple-columns rows split-apply-combine