【Question title】:Read several files in different directories in R [closed]
【Posted】:2017-05-10 19:14:14
【Question description】:

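I want to read multiple .csv files from different directories and then put them into a single data frame.

I have two kinds of directories to read: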

A:/LogIIS/FOLDER01/files.csv


In the other folders there are several files.csv files, one level deeper, as in the following examples:

A:/LogIIS/FOLDER02/FOLDER_A/files.csv

A:/LogIIS/FOLDER02/FOLDER_B/files.csv

A:/LogIIS/FOLDER02/FOLDER_C/files.csv


A:/LogIIS/FOLDER03/FOLDER_A/files.csv

A:/LogIIS/FOLDER03/FOLDER_B/files.csv

A:/LogIIS/FOLDER03/FOLDER_C/files.csv

A:/LogIIS/FOLDER03/FOLDER_D/files.csv


Thanks in advance!

【Question discussion】:

    Tags: r csv


    【Solution 1】:

    If you need to define the file pattern explicitly (file name or extension), you can use the pattern argument of the list.files function.

    library(data.table)
    
    # make an explicit list of folders
    # (FOLDER01 holds its .csv files directly, the others one level deeper)
    folders = list(
      file.path('A:','LogIIS','FOLDER01'),
      file.path('A:','LogIIS','FOLDER02','FOLDER_A'),
      file.path('A:','LogIIS','FOLDER02','FOLDER_B'),
      file.path('A:','LogIIS','FOLDER02','FOLDER_C'),
      file.path('A:','LogIIS','FOLDER03','FOLDER_A'),
      file.path('A:','LogIIS','FOLDER03','FOLDER_B'),
      file.path('A:','LogIIS','FOLDER03','FOLDER_C'),
      file.path('A:','LogIIS','FOLDER03','FOLDER_D')
    )
    
    # iterate through each folder and return the full paths of its .csv files;
    # unlist the per-folder results into a single character vector
    files = unlist(lapply(folders, function(folder) {
      list.files(folder, pattern = "\\.csv$", full.names = TRUE)
    }))
    
    # read each file into a data.table, collect the results in a list,
    # and combine the list into a single data.table (columns matched by name)
    result = rbindlist(lapply(files, fread), use.names = TRUE, fill = FALSE)
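
If every .csv file under A:/LogIIS should be read, the explicit folder list can be skipped: list.files() also accepts recursive = TRUE, which walks all subfolders, while the pattern argument keeps only .csv files. The sketch below is a variation, not the answer's code: it uses base R (read.csv() and rbind()) instead of data.table, assumes all files share the same column headers, and read_all_csv is a hypothetical helper name.

```r
# Minimal sketch, assuming every .csv file anywhere under `root` belongs in
# the result and all files share the same column headers (base R only).
# read_all_csv is a hypothetical helper name.
read_all_csv <- function(root, pattern = "\\.csv$") {
  # recursive = TRUE descends into every subfolder, so files sitting directly
  # in FOLDER01 and files in FOLDER02/FOLDER_A are found in a single call
  files <- list.files(root, pattern = pattern, recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  # read each file into a data.frame, then stack them row-wise;
  # returns NULL when no file matches
  do.call(rbind, lapply(files, read.csv))
}

# usage with the path from the question:
# all_data <- read_all_csv("A:/LogIIS")
```

Swapping read.csv for data.table::fread and do.call(rbind, ...) for data.table::rbindlist gives the data.table version of the same idea.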
    

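    【Discussion】:

    • @Mikuma, the code returns "Error in fread(x): Expected sep (' ') but new line, EOF (or other non-printing character) ends field 5 when detecting types from point 0: #Software: Microsoft Internet Information Services 8.5"
    • @helio7sr It looks like your CSV files contain some irregular lines. You may want to clean up the CSV files, or change how fread interprets them. Run ?data.table::fread to read about the arguments that control how source files are parsed.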
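
The error message suggests these are IIS log files rather than plain CSV: W3C extended logs written by IIS start with directive lines such as #Software:, #Version:, #Date: and #Fields:, and their data lines are space-separated. A minimal sketch, assuming that format, drops every # line, takes the column names from the first #Fields: line, and parses the rest with base R read.table(). read_iis_log is a hypothetical helper name, and treating IIS's "-" placeholder as NA is a choice, not a requirement.

```r
# Sketch, assuming IIS W3C extended logs: space-separated fields, directive
# lines starting with "#", and a "#Fields:" line naming the columns.
# read_iis_log is a hypothetical helper name.
read_iis_log <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # column names come from the first "#Fields:" directive
  fields_line <- grep("^#Fields:", lines, value = TRUE)[1]
  if (is.na(fields_line)) stop("no #Fields: directive in ", path)
  col_names <- strsplit(sub("^#Fields:\\s*", "", fields_line), " ")[[1]]
  # drop every directive line (IIS can repeat them mid-file after a restart)
  data_lines <- lines[!startsWith(lines, "#")]
  read.table(text = data_lines, sep = " ", col.names = col_names,
             check.names = FALSE,       # keep IIS names such as "s-ip"
             na.strings = "-",          # IIS writes "-" for empty fields
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}
```

Assuming all logs were written with the same #Fields: list, the files could then be combined with do.call(rbind, lapply(files, read_iis_log)).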
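    【Solution 2】:

    I would also use the list.files() function with a loop to pull in all the data. First, list all directories under the common top-level directory, in this case A:/LogIIS: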

    common_path = "A:/LogIIS/"
    primary_dirs = list.files(common_path)
    primary_dirs 
    [1] "FOLDER01" "FOLDER02" "FOLDER03"
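
list.files() returns plain files as well as folders, so a stray file sitting directly in A:/LogIIS would also end up in primary_dirs. base R's list.dirs() with recursive = FALSE returns only the immediate subdirectories. A minimal sketch, where list_subdirs is a hypothetical wrapper name:

```r
# Sketch: list only the immediate subdirectories of `path`.
# list_subdirs is a hypothetical wrapper around base R's list.dirs().
list_subdirs <- function(path) {
  # recursive = FALSE: only direct children (FOLDER01, FOLDER02, ...),
  #   and the top-level path itself is not included
  # full.names = FALSE: bare names, matching list.files()'s default output
  list.dirs(path, full.names = FALSE, recursive = FALSE)
}

# primary_dirs <- list_subdirs("A:/LogIIS")
```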
    

    Now I will run a nested loop over all primary_dirs. In your example every .csv file has the common name files.csv, which simplifies the problem. You haven't said how the csv files should be combined, but I will assume they have the same column headers and stack them row-wise with rbind(); cbind() would instead place them side by side as extra columns.

    main_data = NULL  ## start empty: rbind(NULL, df) simply returns df
    

    Using the answer from here:

    for (dir in primary_dirs) {
      dir_path = paste0(common_path, dir)
      sub_folders = list.files(dir_path)
      if ("files.csv" %in% sub_folders) {
        ## files.csv is in this directory: read it in and append it to main_data
        temp_data = read.csv(file = file.path(dir_path, "files.csv"))
        ## same column headers, so stack the rows
        main_data = rbind(main_data, temp_data)
      } else {
        ## otherwise, go one directory deeper
        for (sub_dir in sub_folders) {
          sub_path = file.path(dir_path, sub_dir)
          sub_sub_files = list.files(sub_path)
          if ("files.csv" %in% sub_sub_files) {
            ## found files.csv: read it in and append it
            temp_data = read.csv(file = file.path(sub_path, "files.csv"))
            main_data = rbind(main_data, temp_data)
          } else {
            warning("could not find 'files.csv' two directories deep in ", sub_path)
          }
        }
      }
    }
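
The difference between rbind() and cbind() can be checked on two small data frames with identical headers (hypothetical data standing in for two files.csv files). rbind() stacks rows, which is what accumulating one file at a time needs; cbind() places columns side by side and, in this case, silently recycles the one-row frame.

```r
# two hypothetical pieces with identical column headers
part1 <- data.frame(date = c("2017-05-01", "2017-05-02"), hits = c(10L, 12L))
part2 <- data.frame(date = "2017-05-03", hits = 7L)

# accumulate the way the loop does: start from NULL, rbind() each piece
main_data <- NULL
for (part in list(part1, part2)) {
  main_data <- rbind(main_data, part)
}
dim(main_data)          # 3 rows, 2 columns

# cbind() glues columns side by side instead; part2's single row is recycled
side_by_side <- cbind(part1, part2)
dim(side_by_side)       # 2 rows, 4 columns
```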
    

    【Discussion】:
