Merge multiple *.bil climate data files into a *.csv: answers

【Question Title】: Merge multiple *.bil climate data into *.csv
【Posted】: 2015-01-29 01:59:11
【Question Description】:

I have more than 7,000 *.bil files that I'm trying to merge into a single *.csv file and export. I can read a *.bil file using raster and as.data.frame:

library(raster)
setwd("/.../Prism Weather Data All/")
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")
r <- raster("PRISM_ppt_stable_4kmM2_189501_bil.bil")
test <- as.data.frame(r, na.rm=TRUE)

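This sets the working directory and gets all the *.bil files. So far I have only read one file with raster and converted it with as.data.frame to verify that it's correct, and that works perfectly. What I can't figure out is how to merge all 7,000 files (filenames) into one.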
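A note on the `pattern` argument: `list.files()` treats it as a regular expression, so a pattern such as `".bil"` means "any character followed by `bil`" anywhere in the name. PRISM file names already contain `_bil`, so sidecar files (a `.hdr` header, or a `.bil.aux.xml` that GDAL may write) would match as well. A minimal check, using made-up file names that follow the naming convention above:

```r
# Hypothetical file names following the PRISM naming pattern from the question
files <- c("PRISM_ppt_stable_4kmM2_189501_bil.bil",
           "PRISM_ppt_stable_4kmM2_189501_bil.hdr",
           "PRISM_ppt_stable_4kmM2_189501_bil.bil.aux.xml")

# ".bil" is a regex: "." matches any character and there is no end anchor,
# so every name above matches (each contains "_bil")
loose <- files[grepl(".bil", files)]

# Escape the dot and anchor at the end to keep only the raster files
strict <- files[grepl("\\.bil$", files)]

print(loose)
print(strict)
```

The same escaped, anchored regex can be passed directly as `pattern = "\\.bil$"` to `list.files()`.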

Any help with this would be greatly appreciated. Thanks in advance.

【Question Discussion】:

values(r) may be faster than as.data.frame(r).
Try as.data.frame(stack(filenames)), but there are many ways to approach this.

Tags: r dataframe raster

【Solution 1】:

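Assuming 7000 is an exact count rather than an approximation, and that the data in every file has the same structure (same number of columns and rows):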

library(raster)
setwd("/.../Prism Weather Data All/")

filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")

# take the dimensions from the first file (assuming they're all the same)
first <- as.data.frame(raster(filenames[1]), na.rm=TRUE)
nc <- ncol(first)  # number of columns of each file
nr <- nrow(first)  # number of rows of each file

# initialize what is likely to be a large object
final.df <- as.data.frame(matrix(NA, ncol=length(filenames)*nc, nrow=nr))
counter <- 1
# loop through the files
for (i in filenames){
    r <- raster(i)
    test <- as.data.frame(r, na.rm=TRUE)
    # parentheses matter: ":" is evaluated before "+"
    final.df[, counter:(counter+nc-1)] <- test
    counter <- counter+nc  # first free column for the next file
}

# write the csv
write.csv(final.df, "final-filename.csv")

Keep in mind that your machine must have enough memory to hold all the data, because R keeps its objects in memory.
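To check whether everything can fit, a back-of-the-envelope estimate helps: each double takes 8 bytes, so the values alone need about cells × files × 8 bytes. The grid size below (621 rows × 1405 columns) is an assumption for illustration only; substitute `ncell(r)` from one of your own rasters.

```r
# Assumed grid dimensions (replace with ncell(r) for one of your rasters)
n_rows  <- 621
n_cols  <- 1405
n_files <- 7000L

cells <- n_rows * n_cols
# 8 bytes per double-precision value, ignoring R's per-object overhead
bytes <- cells * n_files * 8
gib   <- bytes / 1024^3

cat(sprintf("%.0f cells x %d files ~ %.1f GiB\n", cells, n_files, gib))
```

With these assumed numbers the estimate comes to roughly 45.5 GiB before any data.frame copies made during assignment or while writing the CSV; dropping NA cells with `na.rm=TRUE` reduces it.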

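If the files have different numbers of columns, you can adapt this by adjusting the indices in the final.df assignment inside the loop and incrementing counter accordingly.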
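Index arithmetic inside `[ ]` is where this kind of loop most easily goes wrong: in R the `:` operator binds more tightly than `+`, so `counter:counter+nc` is evaluated as `(counter:counter) + nc`, a single index rather than a range. That is consistent with the "new columns would leave holes after existing columns" error reported in the comments. A small base-R check with made-up sizes:

```r
counter <- 1
nc <- 3

# ":" has higher precedence than "+", so this is (1:1) + 3, i.e. just 4
wrong <- counter:counter + nc

# Parenthesise the upper bound to get nc consecutive indices
right <- counter:(counter + nc - 1)

print(wrong)
print(right)

# Fill a pre-allocated data frame block by block with the corrected range
df <- as.data.frame(matrix(NA_real_, nrow = 2, ncol = 2 * nc))
for (k in 1:2) {
  block <- as.data.frame(matrix(k, nrow = 2, ncol = nc))
  df[, counter:(counter + nc - 1)] <- block
  counter <- counter + nc   # next free column
}
print(df)
```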

EDIT: producing the expected result

I think looping over the files is the only way to do this kind of job. Admittedly, 7,000 files is a fairly large set, so expect the loop to take some time.

library(raster)
setwd("/.../Prism Weather Data All/")

filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")

first <- as.data.frame(raster(filenames[1]), na.rm=TRUE)
nc <- ncol(first)  # number of columns you expect the data in the files to have
## upper bound on the total rows: rows per file times the number of files
## (e.g. times 12 for a year of monthly data) PLUS some tolerance (10% is an
## arbitrary choice), so the object ends up larger than needed
nr <- ceiling(1.1 * nrow(first) * length(filenames))

# initialize what is likely to be a large object
final.df <- as.data.frame(matrix(NA, ncol=nc, nrow=nr))
counter <- 1
# loop through the files
for (i in filenames){
    r <- raster(i)
    test <- as.data.frame(r, na.rm=TRUE)
    numrow2 <- nrow(test)
    final.df[counter:(counter+numrow2-1), ] <- test
    counter <- counter+numrow2  # first free row for the next file
}

final.df <- final.df[seq_len(counter-1), , drop=FALSE]  ## remove unused rows

# write the csv
write.csv(final.df, "final-filename.csv")
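An alternative to pre-allocating `nr` rows and trimming afterwards is to keep each file's data frame in a list and bind them once at the end, which avoids guessing the final row count. In the sketch below the helper `read_one()` is hypothetical and returns small made-up data frames; with real files its body would be something like `as.data.frame(raster(f), na.rm = TRUE)`.

```r
# Hypothetical stand-in for reading one .bil file
read_one <- function(f) {
  n <- nchar(f)                      # fake, file-dependent row count
  data.frame(value = seq_len(n), source = f)
}

files <- c("a.bil", "bb.bil", "ccc.bil")

# Read every file into its own list element (one data frame per file) ...
pieces <- lapply(files, read_one)

# ... then stack them row-wise in a single call
final.df <- do.call(rbind, pieces)

print(nrow(final.df))
print(table(final.df$source))
```

`lapply()` is still a loop under the hood, so it will not make 7,000 reads faster by itself; what it removes is the counter bookkeeping and the trimming step.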

I hope this helps.

【Discussion】:

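That was my initial thought, but I didn't want to use a for loop for fear that it would take a long time, which is what it's doing. I'll run it and see how it goes. Thanks for your help.
Thanks for the help, but this doesn't work. I separated out just 1 year, which has 12 files, and it creates 36 columns instead of merging them by column.
Here is the error => Error in `[<-.data.frame`(`*tmp*`, , counter:counter + nc, value = list( : new columns would leave holes after existing columns
You can avoid holding final.df in memory by writing each piece out inside the loop with append=TRUE.
@PavoDive Thanks for that. It ended up a bit different, and I had to rework the data somewhat, but it's processing now. Thanks for your help.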
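On the `append=TRUE` suggestion: `write.csv()` ignores an `append` argument (it warns and overwrites), so incremental writing is usually done with `write.table()` and `sep = ","`, writing the header only for the first chunk. A minimal sketch with made-up chunks and a temporary file:

```r
out <- tempfile(fileext = ".csv")

# Made-up chunks standing in for the per-file data frames
chunks <- list(data.frame(id = 1:2, value = c(0.5, 1.5)),
               data.frame(id = 3:4, value = c(2.5, 3.5)))

for (k in seq_along(chunks)) {
  first <- k == 1
  write.table(chunks[[k]], out,
              sep = ",",
              row.names = FALSE,
              col.names = first,   # header only on the first write
              append = !first)     # create the file once, then append
}

back <- read.csv(out)
print(back)
```

In the answer's loop, each iteration would write `test` this way instead of assigning it into `final.df`, so only one file's data is in memory at a time.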

【Solution 2】:

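I've been working with Prism data, and below is another approach. It applies if you can merge on each "station" (row name) across the 7,000 .bil files: each month becomes a separate column, corresponding to the same station ID/row.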

setwd("/.../Prism Weather Data All/")
library(raster)

#This makes sure only .bil is read (not asc.bil, etc)

filenames <- dir("/.../Prism Weather Data All/", pattern = "\\.bil$")

z <- NULL

#loop through the data, and name each column the name of the date in 
#the spreadsheet (according to Prism's naming convention, the date 
#starts at character 24 and ends at character 29)

for (file in filenames){
  r <- raster(file)                     # read the current file only
  test <- as.data.frame(r, na.rm=TRUE)  # assumes the same NA cells in every file
  names(test) <- substring(file, 24, 29)
  z <- if (is.null(z)) test else cbind(z, test)
}

#then export the data.frame to CSV!
write.csv(z, "final-filename.csv")
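The column-naming step relies on the date sitting at characters 24 to 29, which holds for `PRISM_ppt_stable_4kmM2_189501_bil.bil` but shifts when the variable part of the name has a different length (for example a hypothetical `tmean` file). A slightly more robust sketch extracts the digit block just before `_bil.bil` with a regular expression. The file names and values are made up, and the small data frames stand in for `as.data.frame(raster(file), na.rm = TRUE)`. Note that `cbind()` aligns rows by position, not by row name, so this assumes every file covers the same cells in the same order.

```r
files <- c("PRISM_ppt_stable_4kmM2_189501_bil.bil",
           "PRISM_tmean_stable_4kmM3_189502_bil.bil")

# Fixed positions only work when every name has the same prefix length
fixed <- substring(files, 24, 29)

# Pull the digit block just before "_bil.bil" instead
dates <- sub(".*_([0-9]+)_bil\\.bil$", "\\1", files)

# Stand-ins for one data frame per file, same cells in the same order
cols <- lapply(seq_along(files), function(k) data.frame(v = c(1, 2, 3) * k))
for (k in seq_along(cols)) names(cols[[k]]) <- dates[k]

# One column per month, rows aligned by position
z <- do.call(cbind, cols)
print(fixed)
print(z)
```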

【Discussion】: