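【Posted】: 2021-11-16 11:10:54
【Question】:
I am trying to read a large dataset in chunks: find the mean of each chunk (each representing part of a larger column), add those means to a matrix, then take the mean of the means to get the overall mean of each column. I have this set up, but my while loop does not repeat. I think it may have something to do with how I refer to "chunk" and "chunks".
Here is one approach in R, using "iris.csv":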
fl <- file("iris.csv", "r")
clname <- readLines(fl, n = 1)  # read the header line
r <- unlist(strsplit(clname, split = ","))
length(r)  # number of columns in the matrix
cm <- matrix(NA, nrow = 1000, ncol = length(r))  # matrix to be filled in on each iteration
numchunk <- 0  # chunk counter
while (numchunk <= 0) {  # intended to stop when there are no more chunks to read
  numchunk <- numchunk + 1  # move on to the next chunk
  x <- readLines(fl, n = 100)  # read 100 lines at a time
  chunk <- as.numeric(unlist(strsplit(x, split = ",")))  # parse the chunk as numbers
  m <- matrix(chunk, ncol = length(r), byrow = TRUE)  # put the chunk in a matrix
  cm[numchunk, ] <- colMeans(m)  # store the chunk's column means as a row of the larger matrix
  print(numchunk)  # print the number of chunks read so far
}
cm
close(fl)
final_mean <- colSums(cm) / nrow(cm)
return(final_mean)
This works when I set n = 1000, but I want it to work on larger datasets, where the while loop has to keep running. Can anyone help me fix this?
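For reference, the loop above runs only once because `numchunk <= 0` is already false after the first `numchunk <- numchunk + 1`. Below is a minimal sketch of one possible fix, assuming every column is numeric and the file has no quoted fields or missing values: keep reading until `readLines` returns zero lines, and accumulate column sums and a row count rather than a matrix of chunk means. The helper name `chunked_col_means`, its `chunk_size` argument, and the small demo file are hypothetical and not part of the original post.

```r
# Hypothetical helper: column means of a CSV file, read chunk_size lines at a time.
# Assumes all columns are numeric, with no quoted fields and no missing values.
chunked_col_means <- function(path, chunk_size = 100) {
  con <- file(path, "r")
  on.exit(close(con))                          # close the connection on exit
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  sums <- numeric(length(header))              # running column sums
  n_rows <- 0                                  # running row count
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break              # nothing left to read: end of file
    values <- as.numeric(unlist(strsplit(lines, ",")))
    m <- matrix(values, ncol = length(header), byrow = TRUE)
    sums <- sums + colSums(m)
    n_rows <- n_rows + nrow(m)
  }
  setNames(sums / n_rows, header)              # overall mean of each column
}

# Demo on a small all-numeric file (250 rows, so the last chunk is shorter).
demo <- data.frame(a = 1:250, b = seq(2, 500, by = 2))
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE, quote = FALSE)
result <- chunked_col_means(path, chunk_size = 100)
print(result)
```

Accumulating sums and a row count sidesteps two pitfalls of averaging chunk means: a short final chunk would otherwise carry the same weight as a full one, and any unused `NA` rows left in a preallocated matrix would make `colSums` return `NA`.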
【Comments】:
Tags: r matrix while-loop large-data readlines