【Posted】: 2016-08-20 15:55:07
【Question】:
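I have two data frames. The first is a list of genetic variants, with their identifiers and their positions on the chromosome. The second is a list of genes, where the columns in each row give the gene's start and stop positions on the chromosome.
I want to see which genetic variants fall within the "range" of a gene, as given by the start_20 and stop_20 columns. A variant may fall within the range of more than one gene. For example, the snp "rs1" here should map to both gene A and gene B.
This is what I have tried so far: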
# df of gene ranges
chromosome<-c("1", "1", "2")
start_20<-c("1", "1", "5")
stop_20<-c("4", "4", "6")
gene<-c("A", "B", "C")
genelist=data.frame(chromosome, start_20, stop_20, gene,stringsAsFactors=F )
# df of snps and their positions
chromosome<-c("1", "2")
snp<-c("rs1", "rs2")
position<-c("3", "5")
snplist=data.frame(chromosome,snp,position,stringsAsFactors=F)
# The aim is to match snps to genes by base-pair position
# (i.e. snp rs1 has position "3", which means it maps to gene A and gene B).
genelist.bychrome <- vector("list", 2)
# List of genes split by chromosome.
for(i in 1:2) genelist.bychrome[[i]] <- genelist[genelist[,"chromosome"]==i,]
# Empty container of length nrow(snplist); matched genes are stored here
gene.matched <- rep("",nrow(snplist))
gene.matched<-as.list(gene.matched)
#looping across each observation in snplist
for(i in 1:nrow(snplist)){
# snplist[i,"chromosome"] is the chromosome of interest
# Because of consecutive ordering genelist.bychrome[[3]] gives the genelist for chromosome 3
# Therefore, genelist.bychrome[[ snplist[i,"chromosome"] ]] gives the genelist for the chromosome of interest
# VERY IMPORTANT: get.gene gives the index in genelist.bychrome[[ snplist[i,"chromosome"] ]], NOT genelist
if(snplist[i,"chromosome"] <= 1){
get.gene<- which((genelist.bychrome[[ snplist[i,"chromosome"] ]][,"stop_20"] >= snplist[i,"position"]) &
# get matching list element of genelist.bychrome
# in this element collect indices for rows where stop position is greater than or equal to the position of the snp and
# start position is less than the position of the snp
# this should collect multiple rows for some snps
# dump the gene for this index in the matching element of gene.matched
# i.e get.gene<- which(genelist.bychrome[[1]] [,"stop_20"] >= snplist[1,3]) & (genelist.bychrome[[1]] [,"start_20"] <= snplist[1,3])
# gene.matched <- genelist.bychrome[[1]][get.gene,"gene"]
( genelist.bychrome[[ snplist[i,"chromosome"] ]][,"start_20"] <= snplist[i,"position"])) # correct
if(length(get.gene)!=0) gene.matched[i]<- genelist.bychrome[[ snplist[i,"chromosome"] ]][get.gene,"gene"]
}
} # end for()
#bind the matched genes to the snplist
snplist.new <- cbind(snplist,gene.matched)
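To make the desired result concrete, here is a minimal sketch of the mapping I am after, written in base R. It is not a fix of the loop above. It assumes the positions are stored as numbers rather than character strings (character values compare lexicographically, so "10" would sort before "4"), and the names `pairs`, `hits` and `snp_genes` are only illustrative.

```r
# Same toy data as above, but with numeric positions (an assumption;
# the data frames in the question store them as character strings).
genelist <- data.frame(
  chromosome = c("1", "1", "2"),
  start_20   = c(1, 1, 5),
  stop_20    = c(4, 4, 6),
  gene       = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
snplist <- data.frame(
  chromosome = c("1", "2"),
  snp        = c("rs1", "rs2"),
  position   = c(3, 5),
  stringsAsFactors = FALSE
)

# Pair every snp with every gene on the same chromosome.
pairs <- merge(snplist, genelist, by = "chromosome")

# Keep only the pairs where the snp lies inside the gene range (inclusive).
inside <- pairs$position >= pairs$start_20 & pairs$position <= pairs$stop_20
hits <- pairs[inside, c("chromosome", "snp", "position", "gene")]

# Optionally collapse to one row per snp, with gene names joined by commas.
snp_genes <- aggregate(gene ~ snp, data = hits, FUN = paste, collapse = ",")
```

With the toy data, `hits` pairs rs1 with genes A and B and rs2 with gene C. The merge builds every snp-gene pair on a chromosome before filtering, so on genome-scale data it could become very large; a dedicated interval-overlap tool would probably be a better fit there.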
Any tips would be much appreciated! Thank you.
【Discussion】:
-
You say, "snp 1 has position '3', which means it maps to genes 7 and 8". How do you know that position 3 maps to 7 and 8?
-
Hi, sorry, I just realized that was a mistake in my question (I wrote a new version to make it more intuitive). Genes 7 and 8 should now read gene A and gene B.
Tags: r bioinformatics