Calculate percent from data frame with several species, treatments and variables using dplyr

【Question title】: Calculate percent from data frame with several species, treatments and variables using dplyr
【Posted】: 2016-05-27 17:48:10
【Question description】:

Problem

Create a new column containing percentages.

Data

df <- data.frame(
  species   = c("A","A","A","A","B","B","B","B","A","A","A","A","B","B","B","B"),
  number    = c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2),
  treatment = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
  variable  = c("x","y","x","y","x","y","x","y","x","y","x","y","x","y","x","y"),
  value     = sample(1:16)
)

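Question

I want to calculate the percentage of each variable within a given species, number and treatment, i.e. the values of variables x and y (the first two rows) should sum to 100%.

I tried this with dplyr: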

result <- df%>%
    group_by(variable) %>%
    mutate(percent = value*100/sum(value))

test <- subset(result, variable == "x")
sum(test$percent) # sums to 100%

"test" is wrong, because it gives the percentage of all x values across both species and both treatments.

Desired output

 species number treatment variable value    percent
    A      1         0        x     40         40
    A      1         0        y     60         60
    A      2         0        x      1         10
    A      2         0        y      9         90

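【Question comments】:

You need df %>% group_by(variable) %>% mutate(percent = value*100/sum(df$value))
No, that was just my attempt. Any solution is fine.
I meant sum(df$value) rather than sum(value)
Compare the output of @akrun's approach with yours: they are different. Given the way you describe the problem, akrun's approach gives you the correct solution.
When you use sample, please use set.seed so that the example is reproducible.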
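
A minimal sketch of the set.seed suggestion above. The seed value 42 is an arbitrary choice and is not from the original post, and rep() is used only as a shorthand that should produce the same columns as the question's data frame:

```r
# Fix the random seed so sample() returns the same permutation on every run.
# The seed value 42 is arbitrary (an assumption, not from the original post).
set.seed(42)

df <- data.frame(
  species   = rep(c("A", "B"), each = 4, times = 2),  # A A A A B B B B, twice
  number    = rep(c(1, 2), each = 2, times = 4),      # 1 1 2 2, four times
  treatment = rep(c(0, 1), each = 8),                 # eight 0s, then eight 1s
  variable  = rep(c("x", "y"), times = 8),            # x y alternating
  value     = sample(1:16)                            # reproducible shuffle of 1..16
)

df
```

Running the block again with the same seed should give the same value column, which makes any posted output comparable between readers.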

Tags: r dataframe dplyr plyr

【Solution 1】:

Here is an answer using tidyr:

library(tidyr)
library(dplyr)

# Put x and y in separate columns, then express each as a percentage of x + y
df %>% spread(variable, value) %>%
        mutate(percent.x = 100 * x / (x + y),
               percent.y = 100 * y / (x + y))
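
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a sketch of the same reshaping with the newer function, not part of the original answer. The seed and the rep() shorthand for the data are assumptions added so the block runs on its own:

```r
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed, added only for reproducibility
df <- data.frame(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# One row per (species, number, treatment), with x and y as separate columns
wide <- df %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(percent.x = 100 * x / (x + y),
         percent.y = 100 * y / (x + y))

wide
```

The wide layout has one row per group, so it suits a report table, while the long layout keeps one row per measurement.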

Here is also a dplyr-only solution:

df %>% group_by(number, treatment, species) %>% 
        mutate(percent = 100 * value / sum(value))

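Your problem is that you are calling group_by() on entirely the wrong variable. You want the percentages to be defined within each (number, treatment, species) combination and to vary across your variable, so you should group_by() the former, not the latter.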
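
One way to convince yourself that the grouping is right is to sum the percentages within each group. This is a small sketch, assuming dplyr 1.0.0 or later for the .groups argument. The seed, the rep() shorthand for the data and the names result and check are illustrative additions:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, added only for reproducibility
df <- data.frame(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# sum(value) is evaluated within each (species, number, treatment) group,
# so x and y in the same group share one denominator
result <- df %>%
  group_by(species, number, treatment) %>%
  mutate(percent = 100 * value / sum(value)) %>%
  ungroup()

# Every group total should be 100
check <- result %>%
  group_by(species, number, treatment) %>%
  summarise(total = sum(percent), .groups = "drop")

check
```

Grouping by variable instead would give only two groups (x and y), each of which sums to 100 over all species and treatments, which is the behaviour described in the question.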

【Comments】:

Thanks. I thought I had already tried "group_by(number, treatment, species)", but it didn't work. Anyway, it works now! :)

【Solution 2】:

Is this what you are looking for? I am using the data.table package:

library(data.table)
DT <- as.data.table(df)

DT_output <- DT[,list(value=sum(value)),by=c('species', 'number', 'treatment', 'variable')]
DT_temp <- DT[,list(sum=sum(value)),by=c('species', 'number', 'treatment' )]

# Attach the group totals; assign back to DT_output so the sum column exists below
DT_output <- merge(DT_output, DT_temp, by = c('species', 'number', 'treatment'))

DT_output[, percent := 100 * value / sum]

setorder(DT_output, species,treatment,number,variable)
DT_output
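
The aggregate-and-merge steps above can also be collapsed into a single grouped assignment. This is a sketch of that variant, not the answerer's code. It assumes each (species, number, treatment, variable) combination occurs only once, as in the question's data, and the seed and rep() shorthand are additions for reproducibility:

```r
library(data.table)

set.seed(1)  # arbitrary seed, added only for reproducibility
DT <- data.table(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# := adds the column by reference; sum(value) is computed within each by-group
DT[, percent := 100 * value / sum(value), by = .(species, number, treatment)]

setorder(DT, species, treatment, number, variable)
DT
```

If the data could contain repeated rows per variable within a group, the explicit aggregation step from the answer would still be needed first.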

【Comments】: