【Posted】: 2014-03-11 17:14:17
【Question】:
I have more than 800 dbf files that I need to import and merge in R. I have been able to read in all of the files with the following code:
library(foreign)
setwd("c:/temp/help/")
files <- list.files(pattern="\\.dbf$")
all.the.data <- lapply(files, read.dbf, as.is=FALSE)
DATA <- do.call("rbind",all.the.data)
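However, these dbf files have different numbers of columns, and even when the number of columns happens to match, the column headers can differ. Here are four of the dbf files as examples: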
file01 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1002_2km", class = "factor"),
VALUE_11 = 11443500, VALUE_31 = 13500, VALUE_42 = 928800,
VALUE_43 = 162000, VALUE_90 = 18900), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_31", "VALUE_42", "VALUE_43", "VALUE_90"), row.names = c(NA,
-1L), class = "data.frame", data_types = c("C", "F", "F", "F",
"F", "F"))
file02 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1002_5km", class = "factor"),
VALUE_11 = 66254400, VALUE_21 = 125100, VALUE_31 = 80100,
VALUE_41 = 4234500, VALUE_42 = 3199500, VALUE_43 = 4194000,
VALUE_52 = 376200, VALUE_90 = 72000), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_21", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43",
"VALUE_52", "VALUE_90"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C",
"F", "F", "F", "F", "F", "F", "F", "F"))
file03 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1003_2km", class = "factor"),
VALUE_11 = 1972800, VALUE_31 = 125100, VALUE_41 = 5316300,
VALUE_42 = 990900, VALUE_43 = 1995300, VALUE_52 = 740700,
VALUE_90 = 1396800, VALUE_95 = 25200), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", "VALUE_52",
"VALUE_90", "VALUE_95"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C",
"F", "F", "F", "F", "F", "F", "F", "F"))
file04 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1003_5km", class = "factor"),
VALUE_11 = 43950600, VALUE_31 = 270000, VALUE_41 = 12969900,
VALUE_42 = 5105700, VALUE_43 = 12614400, VALUE_52 = 1491300,
VALUE_90 = 2055600, VALUE_95 = 70200), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", "VALUE_52",
"VALUE_90", "VALUE_95"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C",
"F", "F", "F", "F", "F", "F", "F", "F"))
I would like the resulting data frame to look like this (shown here for file01 and file02 only):
merged <- structure(list(PLOTBUFFER = structure(1:2, .Label = c("1002_2km",
"1002_5km"), class = "factor"), VALUE_11 = c(11443500, 66254400
), VALUE_21 = c(0, 125100), VALUE_31 = c(13500, 80100), VALUE_41 = c(0,
4234500), VALUE_42 = c(928800, 3199500), VALUE_43 = c(162000,
4194000), VALUE_52 = c(0, 376200), VALUE_90 = c(18900, 72000)), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_21", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43",
"VALUE_52", "VALUE_90"), class = "data.frame", row.names = c(NA,
-2L))
If a column is missing from one of the data sets, it should simply be filled with zeros or NULL.
Thanks
-al
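For illustration, the zero-filling behaviour described above could be sketched in base R roughly as follows. This is only a sketch under assumptions: `bind_fill` is a hypothetical helper name (not part of `foreign` or any other package), and the two data frames are cut-down stand-ins for file01 and file02 with fewer columns.

```r
# Cut-down stand-ins for file01 and file02 (values taken from the question,
# only a few columns kept for brevity).
file01 <- data.frame(PLOTBUFFER = "1002_2km", VALUE_11 = 11443500,
                     VALUE_31 = 13500, stringsAsFactors = FALSE)
file02 <- data.frame(PLOTBUFFER = "1002_5km", VALUE_11 = 66254400,
                     VALUE_21 = 125100, VALUE_31 = 80100,
                     stringsAsFactors = FALSE)

# Hypothetical helper: row-bind data frames whose columns differ,
# filling any column a frame lacks with `fill`.
bind_fill <- function(frames, fill = 0) {
  # Union of all column names, in order of first appearance.
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(fill, nrow(df))
    }
    df[all_cols]  # same column order in every frame so rbind lines up
  })
  do.call(rbind, padded)
}

merged <- bind_fill(list(file01, file02))
print(merged)
```

Because no rows need to be matched against each other, this only pads and stacks the frames; the columns come out in order of first appearance, so they may need reordering afterwards.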
@infominer's suggestion works for the 4 files I included as examples, but when I try to use merge_recurse on the full list of 802 elements, I get an error.
files <- list.files(pattern="\\.dbf$")
all.the.data <- lapply(files, read.dbf, as.is=FALSE)
merged <- merge_recurse(all.the.data)
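Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
【Comments】: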
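One possible way around the nesting error, sketched under assumptions: as far as I understand, `merge_recurse` calls itself once per list element, so a list of 802 data frames can exceed R's expression nesting limit. Folding the list with `Reduce()` performs the same pairwise outer merge in an ordinary loop. The `frames` list below is synthetic data standing in for `all.the.data`, and the column names are only modelled on the question.

```r
# Synthetic stand-in for the 802-element list from lapply(files, read.dbf):
# one-row data frames that share PLOTBUFFER but carry different VALUE_* columns.
set.seed(42)
value_cols <- paste0("VALUE_", c(11, 21, 31, 41, 42, 43, 52, 90, 95))
frames <- lapply(seq_len(802), function(i) {
  df <- data.frame(PLOTBUFFER = sprintf("plot%04d", i),
                   stringsAsFactors = FALSE)
  for (col in sample(value_cols, size = sample(3:9, 1))) {
    df[[col]] <- i * 100
  }
  df
})

# Reduce() folds the list iteratively, so the call stack stays shallow
# however long the list is. all = TRUE keeps every row and every column.
merged <- Reduce(function(acc, df) merge(acc, df, all = TRUE, sort = FALSE),
                 frames)

# The outer merge leaves NA where a file lacked a column; replace with 0.
merged[is.na(merged)] <- 0

dim(merged)
```

Repeated merging gets slower as the accumulated data frame grows, so for much larger lists it may be preferable to pad and stack the frames rather than merge them.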