Posted: 2020-01-01 01:57:48
Question:
I have a data.frame in which every cell holds one of five possible character states (genotypes):
genotypes <- c("0/0","1/1","0/1","1/0","./.")
library(dplyr)
set.seed(1)
# 100 rows x 30 columns (V1..V30), one random genotype per cell
df <- do.call(rbind, lapply(1:100, function(i)
  matrix(sample(genotypes, 30, replace = TRUE), nrow = 1,
         dimnames = list(NULL, paste0("V", 1:30))))) %>%
  data.frame()
I want to summarize each row by counting how many cells are:

- ref.hom (0/0)
- alt.hom (1/1)
- het (0/1 or 1/0)
- na (./.)
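Each category is just an equality test applied to every cell, so the counts can be computed for all rows at once instead of row by row. A minimal base R sketch, assuming the cells are plain character strings (it rebuilds a data frame of the same shape so it runs on its own; it uses `rowSums()` rather than dplyr):

```r
genotypes <- c("0/0", "1/1", "0/1", "1/0", "./.")
set.seed(1)
# same shape as in the question: 100 rows x 30 columns V1..V30, filled row by row
df <- as.data.frame(
  matrix(sample(genotypes, 100 * 30, replace = TRUE), nrow = 100, byrow = TRUE,
         dimnames = list(NULL, paste0("V", 1:30))),
  stringsAsFactors = FALSE
)

# df == "0/0" yields a logical matrix; rowSums() counts the TRUEs in each row
sum.df <- data.frame(
  ref.hom = rowSums(df == "0/0"),
  alt.hom = rowSums(df == "1/1"),
  het     = rowSums(df == "0/1" | df == "1/0"),
  na      = rowSums(df == "./.")
)
head(sum.df)
```

Because every cell falls into exactly one category, each row of `sum.df` should add up to the number of columns (30 here), which is a quick sanity check.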
My current approach seems very slow:
sum.df <- do.call(rbind, lapply(1:nrow(df), function(i) {
  data.frame(ref.hom = sum(df[i, ] == "0/0"),
             alt.hom = sum(df[i, ] == "1/1"),
             # count both heterozygous codings by OR-ing the logical tests
             het     = sum(df[i, ] == "0/1" | df[i, ] == "1/0"),
             na      = sum(df[i, ] == "./."))
}))
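A note on counting heterozygous calls: `which()` returns index positions, so an expression of the form `length(which(a) | which(b))` ORs two integer vectors element-wise (recycling the shorter one) and returns the length of that result, not the number of matches. Summing a logical vector counts the `TRUE`s directly. A small check on a hypothetical row `x`:

```r
x <- c("0/1", "0/1", "0/1", "1/0", "0/0")

# which() gives positions c(1, 2, 3) and 4; OR-ing them recycles to length 3
buggy <- length(which(x == "0/1") | which(x == "1/0"))
# summing the logical vector counts every heterozygous call
correct <- sum(x == "0/1" | x == "1/0")

print(c(buggy = buggy, correct = correct))
```

Here `buggy` is 3 while the row actually contains 4 heterozygous calls.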
Is there a more efficient way to do this, perhaps a dplyr-based one?
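One possible dplyr-flavoured version, sketched under the assumption that dplyr 1.0.0 or later is installed (it relies on `across()`, which older releases lack) and that the genotype columns are the ones whose names start with `"V"`. The counting itself is still done by vectorized `rowSums()`; dplyr only provides the column selection and the pipeline:

```r
library(dplyr)

genotypes <- c("0/0", "1/1", "0/1", "1/0", "./.")
set.seed(1)
# 100 rows x 30 character columns V1..V30
df <- as.data.frame(
  matrix(sample(genotypes, 100 * 30, replace = TRUE), nrow = 100, byrow = TRUE,
         dimnames = list(NULL, paste0("V", 1:30))),
  stringsAsFactors = FALSE
)

# transmute() keeps only the new columns; across() builds a logical
# data frame over V1..V30 and rowSums() counts the TRUEs per row
sum.df <- df %>%
  transmute(
    ref.hom = rowSums(across(starts_with("V"), ~ .x == "0/0")),
    alt.hom = rowSums(across(starts_with("V"), ~ .x == "1/1")),
    het     = rowSums(across(starts_with("V"), ~ .x %in% c("0/1", "1/0"))),
    na      = rowSums(across(starts_with("V"), ~ .x == "./."))
  )
head(sum.df)
```

Using `%in%` for `het` matches either heterozygous coding in one test; the result should have the same counts as the base R `rowSums()` approach on the same data.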
Tags: r dataframe dplyr bioinformatics summarize