【Title】: Looping with biomart in R
【Posted】: 2021-01-19 19:05:02
【Question】:

I have a list of data sets, each created from one of many files.


list.function <-  function() { 
   
  sample1 <- data.frame(ensembl.id = c("ENSG00000000005.6", "ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14", "ENSG00000000460.17"), counts = c(4, 5, 6, 1, 1))
  sample2 <- data.frame(ensembl.id =  c("ENSG00000000005.6", "ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14", "ENSG00000000460.17"), counts = c(4, 5, 6, 1, 1))
  sample3 <- data.frame(ensembl.id =  c("ENSG00000000005.6", "ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14", "ENSG00000000460.17"), counts = c(4, 5, 6, 1, 1))
  sample4 <- data.frame(ensembl.id =  c("ENSG00000000005.6", "ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14", "ENSG00000000460.17"), counts = c(4, 5, 6, 1, 1))
  
  sapply(paste('sample', seq(1,4,1), sep=''), get, environment(), simplify = FALSE) 
} 

my.list3 <- list.function()
my.list3
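As an aside, here is a minimal sketch of building the same named list with base R's mget() instead of the sapply(..., get, ...) idiom. It assumes the sample data frames are created inside the function, as above; the name list.function2 is hypothetical.

```r
list.function2 <- function() {
  ids <- c("ENSG00000000005.6", "ENSG00000000003.15", "ENSG00000000419.13",
           "ENSG00000000457.14", "ENSG00000000460.17")
  # Four toy data frames standing in for data read from files
  sample1 <- data.frame(ensembl.id = ids, counts = c(4, 5, 6, 1, 1))
  sample2 <- sample1
  sample3 <- sample1
  sample4 <- sample1
  # mget() looks up each name in this function's environment and returns
  # a list whose element names are the object names
  mget(paste0("sample", 1:4), envir = environment())
}

my.list3 <- list.function2()
names(my.list3)
```

Because the list elements carry the names sample1 ... sample4, any later lapply() over this list returns results under the same names.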



library("biomaRt")
grch38     <- useMart("ensembl",dataset="hsapiens_gene_ensembl")

I am trying to automate the annotation of every data set like this:


my.list4 = lapply(my.list3, function(x){
  
atributos = getBM(attributes = c("ensembl_gene_id_version", "external_gene_name",  "chromosome_name", "gene_biotype", "entrezgene_description"),
                  filters = "ensembl_gene_id_version",
                  values = x$ensembl.id,
                  mart = grch38)


atributos_unique = atributos %>% distinct(ensembl_gene_id_version, .keep_all = TRUE)


merged = merge(x, atributos_unique, by.x="ensembl.id", by.y="ensembl_gene_id_version" )


merged$gene_biotype = as.factor(merged$gene_biotype)
})

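It runs over all the data sets correctly, but the output is wrong!

I need the final output to be a separate "merged" data frame for each data set in my "my.list3" list, with the same name as the original data set.

Any ideas?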

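【Comments】:

  • Could you post the output of dput(head(my_data2))? Even better, a reproducible example in which you also define the mart.
  • I added a very similar data set (with very little information, so biomart doesn't take forever). I can't get the operation to output named data frames correctly.
  • Just add one more line inside the lapply call: merged or return(merged). You are not returning the data frame from the function call.
  • Please write that as an answer so everyone can see it! Using return(merged) does work!

Tags: r loops lapply biomart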


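【Answer 1】:

You are not returning the data frame from the function. An R function returns the value of its last evaluated expression, and here that is the assignment merged$gene_biotype = as.factor(...), whose (invisible) value is the factor itself. So each element of my.list4 ends up being a factor instead of the merged data frame. Because lapply() keeps the names of my.list3, returning merged also gives you one data frame per sample under its original name.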
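A small offline sketch of that behaviour (no biomaRt involved; annotate_wrong, annotate_right and toy are hypothetical names). The only difference between the two functions is whether the data frame is the last expression.

```r
# Hypothetical toy versions of the anonymous function passed to lapply()
annotate_wrong <- function(x) {
  merged <- x
  # Last expression is an assignment: its invisible value is the factor
  merged$gene_biotype <- as.factor(c("protein_coding", "lncRNA"))
}

annotate_right <- function(x) {
  merged <- x
  merged$gene_biotype <- as.factor(c("protein_coding", "lncRNA"))
  merged  # return the data frame (equivalent to return(merged))
}

toy <- list(sample1 = data.frame(ensembl.id = c("A", "B")),
            sample2 = data.frame(ensembl.id = c("C", "D")))

str(lapply(toy, annotate_wrong))  # a named list of factors
str(lapply(toy, annotate_right))  # a named list of data frames
```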

library(biomaRt)
library(tidyverse)

grch38 = useMart("ensembl", dataset="hsapiens_gene_ensembl")

my.list4 = lapply(my.list3, function(x){
  atributos = getBM(attributes = c("ensembl_gene_id_version",
                                   "external_gene_name",
                                   "chromosome_name",
                                   "gene_biotype",
                                   "entrezgene_description"),
                    filters = "ensembl_gene_id_version",
                    values = x$ensembl.id,
                    mart = grch38)

  # Keep one row per gene ID (a gene can have several descriptions)
  atributos_unique = atributos %>%
    distinct(ensembl_gene_id_version, .keep_all = TRUE)

  merged = merge(x,
                 atributos_unique,
                 by.x = "ensembl.id",
                 by.y = "ensembl_gene_id_version")

  merged$gene_biotype = as.factor(merged$gene_biotype)
  return(merged) # or just merged
})

Add return(merged) (or just merged) as the last line of the anonymous function.
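To check the corrected pipeline without querying Ensembl, here is a sketch under stated assumptions: the data frame atributos below is a hypothetical stub standing in for the getBM() result, annotate is a hypothetical helper, and dplyr's distinct(..., .keep_all = TRUE) is swapped for the base-R equivalent !duplicated(), which likewise keeps the first row per ID.

```r
# Hypothetical stand-in for the getBM() result (no network access needed);
# the duplicated first row mimics a gene returned more than once
atributos <- data.frame(
  ensembl_gene_id_version = c("ENSG00000000003.15", "ENSG00000000003.15",
                              "ENSG00000000005.6"),
  external_gene_name = c("TSPAN6", "TSPAN6", "TNMD"),
  gene_biotype = c("protein_coding", "protein_coding", "protein_coding")
)

annotate <- function(x, annotation) {
  # Base-R equivalent of distinct(ensembl_gene_id_version, .keep_all = TRUE)
  annotation_unique <- annotation[!duplicated(annotation$ensembl_gene_id_version), ]
  merged <- merge(x, annotation_unique,
                  by.x = "ensembl.id", by.y = "ensembl_gene_id_version")
  merged$gene_biotype <- as.factor(merged$gene_biotype)
  merged  # the data frame is the returned value
}

my.list3 <- list(
  sample1 = data.frame(ensembl.id = c("ENSG00000000005.6", "ENSG00000000003.15"),
                       counts = c(4, 5)),
  sample2 = data.frame(ensembl.id = c("ENSG00000000003.15", "ENSG00000000419.13"),
                       counts = c(6, 1))
)

# lapply() keeps the element names, so my.list4$sample1 is the annotated
# version of my.list3$sample1
my.list4 <- lapply(my.list3, annotate, annotation = atributos)

# Optional: copy each element into its own object (sample1, sample2, ...)
list2env(my.list4, envir = .GlobalEnv)
```

Note that merge() performs an inner join by default, so IDs missing from the annotation (ENSG00000000419.13 here) are dropped; passing all.x = TRUE keeps them with NA annotation columns.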
