【Question title】: How to get the filenames based on names in another file in R?
【Posted】: 2020-11-24 10:19:53
【Question description】:

I have a directory, data, that contains the following files:

data
  |___ UPA.csv
  |___ M_B.csv
  |___ M_C.csv
  |___ M_D.csv
  |___ M_E.csv

UPA.csv looks like this:

Genes
AC018653.3
AC022509.1
AC022509.2
AC055720.2
AC082651.1
AC084346.2
AC084824.4
AC092171.4
AC092803.2

M_B.csv looks like this:

AC084346.2
AD097808.3
AC084824.4
ADFR3564.8
A1982983.4

M_C.csv looks like this:

AC098789.3
AC022509.2
AC783546.3
AC055720.2

M_D.csv looks like this:

AC018653.3
AS989473.9
AC022509.1
AE378467.1

I want to check which Genes in UPA.csv are also found in the other files, and get the names of those files.

I expect the output to look like this:

M_B.csv: AC084346.2, AC084824.4
M_C.csv: AC022509.2, AC055720.2
M_D.csv: AC018653.3, AC022509.1

To do this, I tried the following:

setwd("/data/")
library(tidyverse)
library(magrittr)

genes <- Sys.glob(file.path("M_*.csv"))
genes.read <- lapply(genes,function(x) read.delim(x, header = FALSE))
genes.read <- lapply(genes.read, function(x) set_colnames(x, "Genes"))
ref2 <- list.files(pattern = "UP")
ref2
ref.read <- read.delim(ref2[[1]])
intersect <- lapply(seq_along(genes.read), function(x) 
  intersect(genes.read[[x]], ref.read))
for(i in 1:length(genes.read)) { 
  cat(genes[[i]],":",intersect[[i]]$Genes, "\n")
}

The code above prints only the filenames, not the genes:

M_B.csv:
M_C.csv:
M_D.csv:

【Question discussion】:

    Tags: r file tidyverse filenames magrittr


    【Solution 1】:

    Try the following approach:

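    # Reference gene list (the file has a "Genes" header)
    UPA <- read.csv('UPA.csv')
    # All files named M_*.csv in the working directory
    filenames <- list.files(pattern = 'M_.*\\.csv$')
    
    # For each file, keep the genes that also occur in UPA and tag them with the filename
    result <- do.call(rbind, lapply(filenames, function(x) {
      data <- read.delim(x, header = FALSE)
      names(data) <- 'Genes'
      cbind(file = x, subset(data, Genes %in% UPA$Genes))
    }))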
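
To try this approach without the original files, here is a minimal sketch that recreates the sample data from the question in a temporary directory and then applies the same `%in%` filter. The names `data_dir` and `write_genes` are illustrative assumptions, not part of the original answer. M_E.csv is left out because its contents are not shown in the question. The sketch builds each per-file data frame with `rep()` so that a file with no matching genes simply contributes zero rows.

```r
# Temporary directory standing in for the question's data folder (assumption)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical helper: write one gene per line, with an optional header line
write_genes <- function(name, genes, header = NULL) {
  writeLines(c(header, genes), file.path(data_dir, name))
}

write_genes("UPA.csv",
            c("AC018653.3", "AC022509.1", "AC022509.2", "AC055720.2", "AC082651.1",
              "AC084346.2", "AC084824.4", "AC092171.4", "AC092803.2"),
            header = "Genes")
write_genes("M_B.csv", c("AC084346.2", "AD097808.3", "AC084824.4", "ADFR3564.8", "A1982983.4"))
write_genes("M_C.csv", c("AC098789.3", "AC022509.2", "AC783546.3", "AC055720.2"))
write_genes("M_D.csv", c("AC018653.3", "AS989473.9", "AC022509.1", "AE378467.1"))

# Reference genes and the list of M_*.csv files
UPA <- read.csv(file.path(data_dir, "UPA.csv"))
filenames <- list.files(data_dir, pattern = "^M_.*\\.csv$")

# For each file, keep only the genes that also appear in UPA$Genes
result <- do.call(rbind, lapply(filenames, function(x) {
  data <- read.delim(file.path(data_dir, x), header = FALSE, col.names = "Genes")
  hits <- as.character(data$Genes[data$Genes %in% UPA$Genes])
  data.frame(file = rep(x, length(hits)), Genes = hits, stringsAsFactors = FALSE)
}))

print(result)
```

Comparing plain vectors with `%in%` avoids any dependence on how two data frames are intersected, which is where the attempt in the question returned nothing.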
    

    With tidyverse, you can do the following:

    library(tidyverse)
    
    # Read each file, keep the genes found in UPA, and record the source filename
    result <- map_df(filenames, function(x) {
      read.delim(x, header = FALSE) %>%
        setNames('Genes') %>%
        filter(Genes %in% UPA$Genes) %>%
        mutate(file = x)
    })
    

    This should give you output like the following:

    result
    
    #       Genes    file 
    #1 AC084346.2 M_B.csv
    #2 AC084824.4 M_B.csv
    #3 AC022509.2 M_C.csv
    #4 AC055720.2 M_C.csv
    #...
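
The question asked for one line per file rather than a long table. A minimal sketch of that last step, assuming `result` has the `Genes` and `file` columns shown above, is to split the genes by file and paste each group together. The data frame below is a hand-built copy of the rows expected for the sample files, so the snippet runs on its own. The name `out_lines` is illustrative.

```r
# Hand-built copy of the long-format result for the sample files (assumption)
result <- data.frame(
  Genes = c("AC084346.2", "AC084824.4", "AC022509.2",
            "AC055720.2", "AC018653.3", "AC022509.1"),
  file  = c("M_B.csv", "M_B.csv", "M_C.csv",
            "M_C.csv", "M_D.csv", "M_D.csv"),
  stringsAsFactors = FALSE
)

# Group the genes by filename
by_file <- split(result$Genes, result$file)

# Join each group with ", " and prefix it with its filename
out_lines <- paste0(names(by_file), ": ",
                    vapply(by_file, paste, character(1), collapse = ", "))

writeLines(out_lines)
# M_B.csv: AC084346.2, AC084824.4
# M_C.csv: AC022509.2, AC055720.2
# M_D.csv: AC018653.3, AC022509.1
```

Files without any matching gene do not appear in `result`, so they are also absent from this output.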
    

    【Discussion】:

    • Thanks. What is list_dat in the tidyverse version?
    • Sorry, it should be filenames. I have updated the answer.