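[Posted]: 2018-10-08 13:35:36
[Question]:
I have a loop that iterates over every file in a directory. It works fine with one file, but as soon as the directory contains 2 or more files, the output for the second (and any later) file is all NA.
I tried switching from read.csv to fread, I tried converting the .csv to .txt, and I tried different ways of selecting specific columns (e.g., keep, select), but I always get NA on the second pass through the loop. It is not the second file itself, because if I delete the first file, the second file is processed perfectly.
Not sure whether it's something at the end of the .csv, or whether row names are being added to the second file, or what. Thanks!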
library(data.table)  # provides fread()
filenames <- list.files()
n_filenames <- length(filenames)
SSRT_cb1_pre <- data.frame(matrix(ncol = 4, nrow = n_filenames))
cols <- c(13, 23, 24, 25, 28, 29, 31, 32)
for (i in 1:n_filenames) {
  print(filenames)
  dt_pre <- fread(filenames[i], header=T, sep=",", select=cols,
                  stringsAsFactors=F, na.strings=c("NA", "", "."))
  dt_pre$RT <- as.numeric(dt_pre$rt)
  data_real_pre <- subset(dt_pre, SSTBlocks.thisRepN>=0)
  data_corr_pre <- subset(data_real_pre, corr == 1)
  data_corr_pre_RTmean <- aggregate(RT ~ P, data = data_corr_pre,
                                    FUN=mean, na.rm=TRUE)
  data_corr_pre_SSDmean <- aggregate(SSD ~ P, data = data_corr_pre,
                                     FUN = mean, na.rm = TRUE)
  pre_sub <- data_corr_pre_RTmean[i,1]
  preMeanRT <- data_corr_pre_RTmean[i,2]
  preMeanSSD <- data_corr_pre_SSDmean[i,2]
  SSRT_cb1_pre[i, 1] <- i
  SSRT_cb1_pre[i, 2] <- pre_sub
  SSRT_cb1_pre[i, 3] <- preMeanRT
  SSRT_cb1_pre[i, 4] <- preMeanSSD
}
SSRT_cb1_pre
The code above gives me this output:
Output:
SSRT_cb1_pre
  i sub1     preRT    preSSD
1 1  301 0.4877872 0.2580645
2 2   NA        NA        NA
Updated code:
filenames <- list.files()
n_filenames <- length(filenames)
n_rows <- n_filenames/2
SSRT_cb1_pre <- data.frame(matrix(ncol = 4, nrow = n_filenames)) # for output
colnames(SSRT_cb1_pre) <- c("i","sub1", "preRT", "preSSD")
cols <- c(13, 23, 24, 25, 28, 29, 31, 32)
colsnames <- c("SSTBlocks.thisRepN", "SSD", "corr", "rt", "sess", "CB", "P", "expName")
for (i in 1:n_filenames) {
  print(filenames)
  dt_pre <- fread(filenames[i], header=T, sep=",", select=colsnames,
                  stringsAsFactors=F, na.strings=c("NA", "", "."))
  dt_pre$RT <- as.numeric(dt_pre$rt)
  data_real_pre <- subset(dt_pre, SSTBlocks.thisRepN>=0)
  data_corr_pre <- subset(data_real_pre, corr == 1)
  data_corr_pre_RTmean <- data_corr_pre[, mean(RT, na.rm=T), by = P] #suggested by Yannis Vassiliadis Stackoverflow as alt to aggregate
  data_corr_pre_SSDmean <- data_corr_pre[, mean(SSD, na.rm=T), by = P]
  # values to collect from each file
  pre_sub <- data_corr_pre_RTmean[i, 1]
  preMeanRT <- data_corr_pre_RTmean[i, 2]
  preMeanSSD <- data_corr_pre_SSDmean[i, 2]
  # output for values - should iterate through
  SSRT_cb1_pre[i, 1] <- i
  SSRT_cb1_pre[i, 2] <- pre_sub
  SSRT_cb1_pre[i, 3] <- preMeanRT
  SSRT_cb1_pre[i, 4] <- preMeanSSD
}
SSRT_cb1_pre
class(data_corr_pre_RTmean)
class(data_corr_pre_SSDmean)
This gives the output:
[1] "301_1_PsychoPy_SST_Pretest_2.csv" "303_1_PsychoPy_SST_Pretest.csv"
[1] "301_1_PsychoPy_SST_Pretest_2.csv" "303_1_PsychoPy_SST_Pretest.csv"
Warning messages:
1: In as.numeric(dt_pre$rt) : NAs introduced by coercion
2: In as.numeric(dt_pre$rt) : NAs introduced by coercion
>
> SSRT_cb1_pre
  i sub1     preRT    preSSD
1 1  301 0.4877872 0.2580645
2 2   NA        NA        NA
> class(data_corr_pre_RTmean)
[1] "data.table" "data.frame"
> class(data_corr_pre_SSDmean)
[1] "data.table" "data.frame"
[Discussion]:
-
Thanks - I wish it were that simple, but spaces or no spaces didn't change anything.
-
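I have a feeling that if you use column names instead of column indices, you will avoid the error.
The select argument of fread also accepts character vectors. By the way, given that dt_pre and data_real_pre have class data.table, I suggest using dt_corr_pre[, mean(RT, na.rm=T), by = P] as a faster alternative to aggregate. -
Thanks for your help. Whether I refer to columns by name or by index doesn't seem to matter. I just noticed that when i = 1 the values are passed to SSRT_cb1_pre correctly, but when i = 2 they are passed as NA. When I inspect the variables holding the computed means for i = 2, they are stored correctly in data_corr_pre_RTmean and data_corr_pre_SSDmean, so they just aren't being passed to the new data.frame SSRT_cb1_pre.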
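A minimal sketch of the two suggestions in this comment, assuming the data.table package is installed. The inline CSV text and its values are hypothetical stand-ins for one participant file; only the column names P, rt, SSD and corr come from the question.

```r
library(data.table)

# Hypothetical stand-in for one participant file (the real files have more columns)
csv_text <- "P,rt,SSD,corr\n301,0.45,0.25,1\n301,0.52,0.30,1\n301,,0.20,0\n"

# select= accepts column names as well as column positions
dt <- fread(text = csv_text, select = c("P", "rt", "corr"))

# Grouped mean in data.table syntax, comparable to aggregate(rt ~ P, ...)
rt_mean <- dt[corr == 1, .(meanRT = mean(rt, na.rm = TRUE)), by = P]
print(rt_mean)
```

Selecting by name keeps the code working if the column order differs between files, which positional indices would not.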
-
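So, assuming P takes multiple values within each file, data_corr_pre_RTmean and data_corr_pre_SSDmean are data frames. Then for the second file you take the second row of data_corr_pre_RTmean to create preMeanRT. But if that second row doesn't exist, it means P takes only one value, i.e. you shouldn't even be using aggregate, just mean. -
Thanks Yannis - I have now revised my code (see above), and yes - those variables are data.table/data.frame, but the second set of means still isn't getting through (eventually I will have many more files, but I'm just trying to work out the kinks for now), even though they are being generated and stored. Is there something I need to change about: SSRT_cb1_pre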
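The point of this comment can be reproduced in a few lines of base R. This is a sketch with hypothetical values, assuming each file holds a single participant P so that the per-file summary has exactly one row.

```r
# One-row summary, as produced when a file contains a single participant P
# (the numbers are made up for illustration)
one_file_summary <- data.frame(P = 303, RT = 0.51)

i <- 2                                    # second pass through the loop
missing_row <- one_file_summary[i, 2]     # row 2 does not exist, so this is NA
only_row    <- one_file_summary[1, 2]     # the summary's single row

print(missing_row)
print(only_row)
```

The loop counter i indexes files, not rows of the per-file summary, so using it as a row index only happens to work when i is 1.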
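One possible way to restructure the loop, following the reasoning in the comments: take the per-file values directly instead of indexing row i of the summary. This is only a sketch under assumptions. It assumes one participant per file, it writes two small hypothetical files into a temporary folder so that it runs on its own, and it uses base read.csv rather than fread to avoid a package dependency. The names tmp, d and results are illustrative and not from the original code.

```r
# Write two small hypothetical files so the sketch is self-contained
tmp <- file.path(tempdir(), "sst_loop_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(P = 301, rt = c(0.45, 0.52), SSD = c(0.25, 0.30), corr = 1),
          file.path(tmp, "301.csv"), row.names = FALSE)
write.csv(data.frame(P = 303, rt = c(0.60, 0.50), SSD = c(0.20, 0.40), corr = 1),
          file.path(tmp, "303.csv"), row.names = FALSE)

filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Pre-allocated output, as in the question
results <- data.frame(matrix(ncol = 4, nrow = length(filenames)))
colnames(results) <- c("i", "sub1", "preRT", "preSSD")

for (i in seq_along(filenames)) {
  d <- read.csv(filenames[i])
  d <- d[d$corr == 1, ]                  # keep correct trials only
  # i selects the output row; the per-file values never depend on i
  results[i, 1] <- i
  results[i, 2] <- d$P[1]
  results[i, 3] <- mean(d$rt, na.rm = TRUE)
  results[i, 4] <- mean(d$SSD, na.rm = TRUE)
}
print(results)
```

If the per-file summary tables from the question are kept, the equivalent change would be to read their first row, for example data_corr_pre_RTmean[1, 2], rather than row i.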