【Posted】: 2015-04-05 12:08:22
【Question】:
I have a data frame (called dat) that looks like this:
chr chrStart chrEnd Gene RChr RStart REnd Rname distance
chr1 39841 39883 Gene1 chr1 398 3984 Cha1b 0
chr1 39841 39883 Gene1 chr1 398 3985 Ab 0
chr1 39841 39883 Gene1 chr1 398 3986 Tia 0
chr1 39841 39883 Gene1 chr1 398 3987 MEA 0
chr1 39841 39883 Gene1 chr1 398 3988 La 0
chr1 39841 39883 Gene1 chr1 398 3989 M3 0
chr1 14893 15893 Gene2 chr1 398 3984 Cha1b 0
chr1 14893 15893 Gene2 chr1 398 3985 Cha1b 0
chr1 14893 15893 Gene2 chr1 398 3986 Cha1b 0
chr1 14893 15893 Gene2 chr1 398 3987 MEA 0
chr1 14893 15893 Gene2 chr1 398 3988 MEA 0
chr1 14893 15893 Gene2 chr1 398 3989 M3 0
I want to get the frequency with which each distinct Rname occurs for each gene, so the result for the data above should look like this:
Gene Rname Freq
Gene1 Cha1b 1
Gene1 Ab 1
Gene1 Tia 1
Gene1 MEA 1
Gene1 La 1
Gene1 M3 1
Gene2 Cha1b 3
Gene2 MEA 2
Gene2 M3 1
I tried using dplyr with two group_by calls, but I don't think that makes sense, and it only gives me the frequency of all Rnames for each gene:
library(dplyr)
GroupTbb <- dat %>%
  group_by(Gene) %>%
  group_by(Rname) %>%
  summarise(freq = sum(Rname))
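The attempt above likely fails for two reasons: a second group_by() call replaces the first grouping by default rather than adding to it, and sum() is not defined for a character or factor column such as Rname. Below is a minimal sketch of one way to count rows per Gene/Rname pair with dplyr. The small dat built here is a hypothetical stand-in with only the two relevant columns, and it assumes the final M3 row belongs to Gene2, as the expected output suggests.

```r
library(dplyr)

# Hypothetical stand-in for dat: only the two columns needed for counting.
# Assumption: the final M3 row belongs to Gene2, matching the expected output.
dat <- data.frame(
  Gene  = rep(c("Gene1", "Gene2"), each = 6),
  Rname = c("Cha1b", "Ab", "Tia", "MEA", "La", "M3",
            "Cha1b", "Cha1b", "Cha1b", "MEA", "MEA", "M3"),
  stringsAsFactors = FALSE
)

# Group by both columns in a single call, then count the rows in each group.
freq_tbl <- dat %>%
  group_by(Gene, Rname) %>%
  summarise(Freq = n()) %>%
  ungroup()

# Shorthand with the same result: count() groups, tallies, and ungroups.
freq_tbl2 <- dat %>% count(Gene, Rname, name = "Freq")

print(freq_tbl)
```

Passing both columns to one group_by() call is what makes the count per pair rather than per Rname alone; n() counts rows, so no numeric column is needed.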
【Comments】:
- A base R option is subset(as.data.frame(table(dat[c('Gene', 'Rname')])), Freq != 0)
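As a minimal sketch of how the base R suggestion above works, the steps can be split out: table() cross-tabulates every Gene/Rname combination, as.data.frame() reshapes the table into long form with a Freq column, and subset() drops the combinations that never occur. The dat below is a hypothetical stand-in with only the two relevant columns, and it assumes the final M3 row belongs to Gene2, as the expected output suggests.

```r
# Hypothetical stand-in for dat: only the two columns needed for counting.
# Assumption: the final M3 row belongs to Gene2, matching the expected output.
dat <- data.frame(
  Gene  = rep(c("Gene1", "Gene2"), each = 6),
  Rname = c("Cha1b", "Ab", "Tia", "MEA", "La", "M3",
            "Cha1b", "Cha1b", "Cha1b", "MEA", "MEA", "M3"),
  stringsAsFactors = FALSE
)

# Cross-tabulate Gene against Rname; pairs that never occur get a count of 0.
tab <- table(dat[c("Gene", "Rname")])

# Long format: one row per Gene/Rname pair, with the count in Freq.
long <- as.data.frame(tab)

# Keep only the pairs that actually occur in the data.
res <- subset(long, Freq != 0)
print(res)
```

The Freq != 0 filter matters because table() fills in every possible combination, including pairs such as Gene2/Ab that are absent from the data.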
Tags: r