【Question title】: rbind files in subdirectory based on filename
【Posted】: 2017-09-06 00:40:06
【Question】:

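I have a directory whose subdirectories each contain CSVs with the same names. I'm trying to combine the CSVs that share a name into one data frame, and add the subdirectory name as a column. In the example below, I would end up with one data frame called "data" and one called "name", each containing the observations from Run1 and Run2, with an added column called Run. Ideally the solution would not depend on the names of the CSVs, but any solution would be very helpful.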

The person in this question has the same problem, but I need an R solution: Combining files with same name in r and writing them into different files in R

dir <- getwd()

subDir <- 'temp'

dir.create(subDir)

setwd(file.path(dir, subDir))

dir.create('Run1')
dir.create('Run2')

employeeID <- c('123','456','789')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))

employeeID <- c('123','456','789')
first <- c('John','Jane','Tom')
last <- c('Doe','Smith','Franks')

data <- data.frame(employeeID,salary,startdate)
name <- data.frame(employeeID,first,last)

write.csv(data, file = "Run1/data.csv",row.names=FALSE, na="")
write.csv(name, file = "Run1/name.csv",row.names=FALSE, na="")

employeeID <- c('465','798','132')
salary <- c(100000, 500000, 300000)
startdate <- as.Date(c('2000-11-1','2001-3-25','2003-3-14'))

employeeID <- c('465','798','132')
first <- c('Jay','Susan','Tina')
last <- c('Jones','Smith','Thompson')

data <- data.frame(employeeID,salary,startdate)
name <- data.frame(employeeID,first,last)

write.csv(data, file = "Run2/data.csv",row.names=FALSE, na="")
write.csv(name, file = "Run2/name.csv",row.names=FALSE, na="")

library(dplyr)

# List the files in all subdirectories
files <- list.files(recursive = TRUE)

# Read the CSVs into a list
df_list <- lapply(files, read.csv)

# Name each data frame with the run and filename, e.g. "Run1/data"
names(df_list) <- sub("\\..*$", "", files)

# Add .id = 'run' so that the run is one of the columns.
# This would work if all of the files were the same, but I don't know how to subset the data frames based on name.
all_dat <- df_list %>%
  bind_rows(.id = 'run')
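
The last comment above is about subsetting by name. As a minimal base R sketch of how the run and the file name can be separated from each relative path, assuming paths shaped like Run1/data.csv as in this example (the variable names `run`, `stem`, and `groups` are illustrative only):

```r
# Relative paths as list.files(recursive = TRUE) would return them here
files <- c("Run1/data.csv", "Run1/name.csv", "Run2/data.csv", "Run2/name.csv")

# Subdirectory (the run) and file name without its extension
run  <- dirname(files)
stem <- tools::file_path_sans_ext(basename(files))

# Positions of the files that share a name, keyed by that name
groups <- split(seq_along(files), stem)
```

Here `groups$data` holds the positions of both data.csv files and `groups$name` those of both name.csv files, so a list of data frames read in the same order could be subset with them.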

【Comments】:

    Tags: r


    【Solution 1】:
    library(dplyr)

    files_to_df <- function(pattern) {

      # e.g. pattern <- "data" finds Run1/data.csv and Run2/data.csv
      filenames <- list.files(recursive = TRUE, pattern = pattern)

      df_list <- lapply(filenames, read.csv, header = TRUE)

      # Name each data frame after its subdirectory (the run)
      names(df_list) <- dirname(filenames)

      # Combine into one data frame; the list names become the 'run' column
      df <- bind_rows(df_list, .id = 'run')

      # Return the combined data frame (assign() inside the function would
      # only create a local variable, so the caller assigns the result)
      return(as.data.frame(df))
    }
    
    name_df <- files_to_df('name')
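
The question also asked for an approach that does not depend on knowing the CSV names. The following is a minimal base R sketch of one way to do that, not part of the original answer: `combine_by_filename()` and its `root` argument are hypothetical names, and it assumes that files sharing a name have identical columns (which `rbind()` requires). The demo files are written to a temporary directory so the block runs on its own.

```r
# Hypothetical helper (not from the original answer): returns a named list
# with one combined data frame per distinct CSV file name under `root`.
combine_by_filename <- function(root = ".") {
  files <- list.files(root, pattern = "\\.csv$", recursive = TRUE)

  # Group the relative paths by file name without extension
  groups <- split(files, tools::file_path_sans_ext(basename(files)))

  lapply(groups, function(paths) {
    dfs <- lapply(paths, function(p) {
      df <- read.csv(file.path(root, p), stringsAsFactors = FALSE)
      # Prepend the subdirectory name as the 'run' column
      data.frame(run = rep(dirname(p), nrow(df)), df,
                 stringsAsFactors = FALSE)
    })
    # Stack the runs; assumes identical columns across same-named files
    do.call(rbind, dfs)
  })
}

# Small demo in a temporary directory with made-up values
root <- tempfile("runs")
for (r in c("Run1", "Run2")) {
  dir.create(file.path(root, r), recursive = TRUE)
  write.csv(data.frame(employeeID = 1:2, salary = c(10, 20)),
            file.path(root, r, "data.csv"), row.names = FALSE)
  write.csv(data.frame(employeeID = 1:2, first = c("A", "B")),
            file.path(root, r, "name.csv"), row.names = FALSE)
}

combined <- combine_by_filename(root)
names(combined)
```

Each element can then be used directly, e.g. `combined$data` and `combined$name`. Keeping the results in a list avoids creating a variable called `data`, which would mask the `data()` function from utils.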
    

    【Discussion】:
