Dynamically create a list of data frames from CSV files: answers

【Question title】: Dynamically create a list of data frames from CSV files
【Posted】: 2016-04-29 09:02:21
【Question】:

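I want to read many CSV files at once into one big hash-like structure in which each dataset can be accessed by a key (the name of its CSV file). AFAIK R has no hashes, so I chose a list with named elements (please correct me if this is wrong). My code so far: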

csv_files <- list.files(pattern="*.csv");
datasets  <- vector("list", length(csv_files));
names(datasets) <- csv_files;

for (i in 1:length(datasets)){
  csv_file <- names(datasets[i])
  datasets[i] <- read.csv(file=csv_file, header=T, sep=",", skip=0, check.names=TRUE)
}

But this code does not work (datasets ends up containing something, but not the individual CSV data frames) and it returns the following warnings:

Warning messages:
1: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",",  :
  number of items to replace is not a multiple of replacement length
2: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",",  :
  number of items to replace is not a multiple of replacement length
3: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",",  :
  number of items to replace is not a multiple of replacement length
4: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",",  :
  number of items to replace is not a multiple of replacement length
5: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",",  :
  number of items to replace is not a multiple of replacement length

Here is what I found while troubleshooting:

Reading from the CSV file seems to work fine:

> csv_file <- names(datasets[1])
> temp_dataset <- read.csv(file=csv_file, header=T, sep=",", skip=0, check.names=TRUE)
> temp_dataset
   ord orig pred as o.p
1    1    0    0  1   0
2    2    0    0  1   0
3    3    0    0  1   0
4    4    0    0  0   0
5    5    0    0  0   0
6    6    0    0  0   0
7    7    0    0  0   0
8    8    0    0  0   0
9    9    0    0  0   0
10  10    0    0  0   0
11  11    0    0  0   0
12  12    0    0  0   0
13  13    0    0  0   0
14  14    0    0  0   0
15  15    0    0  0   0
16  16    0    0  0   0
17  17    0    0  0   0
18  18    0    0  0   0
19  19    0    0  0   0
20  20    0    0  0   0
21  21    0    0  0   0
22  22    0    0  0   0
23  23    4    0  0   4
24  24  402    0  1 402
25  25    0    0  1   0
26  26    0    0  1   0
27  27    0    0  1   0
28  28    1    0  1   0

The problem is in assigning this data to a specific data frame inside the list:

> datasets[1] <- temp_dataset[-1]
Warning message:
In datasets[1] <- temp_dataset[-1] :
  number of items to replace is not a multiple of replacement length

It seems that only the first column gets assigned to the specific element inside the list:

> datasets[1]
$repeating.csv
repeating.csv
 [1]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[17]   0   0   0   0   0   0   4 402   0   0   0   1

What am I missing?

【Comments】:

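Try datasets[[i]] <- ... Why do you use the line csv_file <- names(datasets[i])? Try file = csv_files[i] in read.csv instead.
If you want to read all the files into a list of data frames, you can use sapply(csv_files, read.csv). Each list element will be named after its file.
Correction: it should be datasets=sapply(csv_files, read.csv), so that the list is assigned to an object.
From ?Extract: "The most important distinction between [, [[ and $ is that the [ can select more than one element whereas the other two select a single element." So [ always gives you a sublist, while [[ gives you a single element of the list.
May I ask why I was downvoted? Is it because the question is about building a "more complex" data structure in R, because I chose the wrong way to reach my goal, or something else? Please leave a comment. Thanks.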
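The [ versus [[ distinction from ?Extract explains both the warning and the "only the first column" result. The sketch below uses a small made-up data frame (df) and hypothetical keys "a.csv" and "b.csv" in place of real files. Assigning with [ treats the data frame as a list of three column values for one slot, so R keeps the first column and warns; assigning with [[ stores the whole data frame.

```r
# Illustrative stand-in for one CSV file's contents (made-up data)
df <- data.frame(ord = 1:3, orig = c(0, 4, 402), pred = c(0, 0, 0))

datasets <- vector("list", 2)
names(datasets) <- c("a.csv", "b.csv")

# [ selects a sublist, so this is "list", not "data.frame"
print(class(datasets[1]))

# Assigning with [ offers the 3 columns of df as replacement values for
# 1 slot: R keeps only the first column and warns. Record the warning
# text and let the assignment finish.
warn_msg <- NULL
invisible(withCallingHandlers(
  datasets[1] <- df,
  warning = function(w) {
    warn_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
))
print(warn_msg)
print(datasets[["a.csv"]])  # only the ord column (1:3), not a data frame

# [[ selects a single element, so the whole data frame is stored
datasets[["a.csv"]] <- df
print(is.data.frame(datasets[["a.csv"]]))
```

In the question's loop this means changing datasets[i] <- read.csv(...) to datasets[[i]] <- read.csv(...).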

Tags: r list csv dataframe hashtable

【Solution 1】:

Based on the comments from @RichardScriven and @joran, I ended up with the following solution:

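CSV_PATH <- "/home/wakatana/r/csv"
# List the CSV files in CSV_PATH, not in the working directory
# (pattern is a regular expression, not a glob)
CSV_FILES <- list.files(path = CSV_PATH, pattern = "\\.csv$")
DATASETS  <- vector("list", length(CSV_FILES))
names(DATASETS) <- CSV_FILES
for (i in seq_along(CSV_FILES)) {
  message(CSV_FILES[i])
  full.csv.path <- file.path(CSV_PATH, CSV_FILES[i])
  if (CSV_FILES[i] == "skip_first_four_lines.csv") {
    # Take the column names from the header line, then read the data
    # with the first four lines skipped
    header.names <- names(read.csv(file = full.csv.path, nrows = 1))
    DATASETS[[i]] <- read.csv(file = full.csv.path, header = FALSE, sep = ",",
                              skip = 4, col.names = header.names)
  } else {
    # [[ stores the whole data frame as a single list element
    DATASETS[[i]] <- read.csv(file = full.csv.path, header = TRUE, sep = ",",
                              skip = 0, check.names = TRUE)
  }
}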

If someone shows a different approach and explains why it is better, I will accept their answer.
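One possible alternative, sketched under assumptions: build the named list in a single lapply call instead of preallocating and indexing. The demo directory under tempdir(), the two example files, and the helper read_one are hypothetical, added only so the snippet runs on its own; the special case mirrors the answer's skip_first_four_lines.csv branch. Arguments in its favour: there is no index bookkeeping, full.names = TRUE makes the paths independent of the working directory, and lapply always returns a list (sapply, as suggested in the comments, simplifies to a matrix when all files have the same number of columns).

```r
# Hypothetical demo directory with two small CSV files, so the sketch
# runs without the original /home/wakatana/r/csv folder
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(ord = 1:3, orig = c(0, 4, 402)),
          file.path(csv_dir, "repeating.csv"), row.names = FALSE)
write.csv(data.frame(ord = 1:2, orig = c(1, 0)),
          file.path(csv_dir, "other.csv"), row.names = FALSE)

# Hypothetical per-file reader; the special case mirrors the answer's
# skip_first_four_lines.csv branch
read_one <- function(path) {
  if (basename(path) == "skip_first_four_lines.csv") {
    header <- names(read.csv(path, nrows = 1))
    read.csv(path, header = FALSE, skip = 4, col.names = header)
  } else {
    read.csv(path)
  }
}

# full.names = TRUE returns paths usable from any working directory;
# "\\.csv$" matches file names ending in .csv
csv_paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# lapply always returns a list; setNames keys each data frame by file name
datasets <- setNames(lapply(csv_paths, read_one), basename(csv_paths))

str(datasets)
```

A dataset is then looked up by key with datasets[["repeating.csv"]]. sapply(csv_paths, read_one, simplify = FALSE) also returns a list of data frames, but it is keyed by the full path rather than the base file name.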

【Comments】: