【问题标题】:merge multiple dbf files with non-matching headers in R在R中合并多个具有不匹配标头的dbf文件
【发布时间】:2014-03-11 17:14:17
【问题描述】:

我有超过 800 个 dbf 文件需要在 R 中导入和合并。我已经能够使用以下代码引入所有文件:

library(foreign)
setwd("c:/temp/help/")

files <- list.files(pattern="\\.dbf$")
all.the.data <- lapply(files, read.dbf, as.is=FALSE)
DATA <- do.call("rbind",all.the.data)

但是,这些 dbf 文件具有不同的列数,即使它们有时具有相同的列数,这些标题也可能不同。以下是四个 dbf 文件以提供示例:

file01 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1002_2km", class = "factor"), 
                         VALUE_11 = 11443500, VALUE_31 = 13500, VALUE_42 = 928800, 
                         VALUE_43 = 162000, VALUE_90 = 18900), .Names = c("PLOTBUFFER", 
                                                                          "VALUE_11", "VALUE_31", "VALUE_42", "VALUE_43", "VALUE_90"), row.names = c(NA, 
                                                                                                                                                     -1L), class = "data.frame", data_types = c("C", "F", "F", "F", 
                                                                                                                                                                                                "F", "F"))
file02 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1002_5km", class = "factor"), 
                         VALUE_11 = 66254400, VALUE_21 = 125100, VALUE_31 = 80100, 
                         VALUE_41 = 4234500, VALUE_42 = 3199500, VALUE_43 = 4194000, 
                         VALUE_52 = 376200, VALUE_90 = 72000), .Names = c("PLOTBUFFER", 
                                                                          "VALUE_11", "VALUE_21", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", 
                                                                          "VALUE_52", "VALUE_90"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C", 
                                                                                                                                                                "F", "F", "F", "F", "F", "F", "F", "F"))
file03 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1003_2km", class = "factor"), 
                         VALUE_11 = 1972800, VALUE_31 = 125100, VALUE_41 = 5316300, 
                         VALUE_42 = 990900, VALUE_43 = 1995300, VALUE_52 = 740700, 
                         VALUE_90 = 1396800, VALUE_95 = 25200), .Names = c("PLOTBUFFER", 
                                                                           "VALUE_11", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", "VALUE_52", 
                                                                           "VALUE_90", "VALUE_95"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C", 
                                                                                                                                                                 "F", "F", "F", "F", "F", "F", "F", "F"))
file04 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1003_5km", class = "factor"), 
                         VALUE_11 = 43950600, VALUE_31 = 270000, VALUE_41 = 12969900, 
                         VALUE_42 = 5105700, VALUE_43 = 12614400, VALUE_52 = 1491300, 
                         VALUE_90 = 2055600, VALUE_95 = 70200), .Names = c("PLOTBUFFER", 
                                                                           "VALUE_11", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", "VALUE_52", 
                                                                           "VALUE_90", "VALUE_95"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C", 
                                                                                                                                                                 "F", "F", "F", "F", "F", "F", "F", "F"))

我希望数据框与此匹配:

merged <- structure(list(PLOTBUFFER = structure(1:2, .Label = c("1002_2km", 
    "1002_5km"), class = "factor"), VALUE_11 = c(11443500, 66254400
    ), VALUE_21 = c(0, 125100), VALUE_31 = c(13500, 80100), VALUE_41 = c(0, 
    4234500), VALUE_42 = c(928800, 3199500), VALUE_43 = c(162000, 
    4194000), VALUE_52 = c(0, 376200), VALUE_90 = c(18900, 72000)), .Names = c("PLOTBUFFER", 
    "VALUE_11", "VALUE_21", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", 
    "VALUE_52", "VALUE_90"), class = "data.frame", row.names = c(NA, 
    -2L))

如果一个数据集中缺少列,则只需用零或 NULL 填充。

谢谢

-al

@infominer 的建议适用于我作为示例包含的 4 个文件,但是当我尝试在包含 802 个元素的大型列表中使用 merge_recurse 时,我收到了一个错误。

files <- list.files(pattern="\\.dbf$")
all.the.data <- lapply(files, read.dbf, as.is=FALSE)
merged <- merge_recurse(all.the.data)

错误:求值嵌套太深:无限递归/选项(表达式=)? 总结期间出错:评估嵌套太深:无限递归/选项(表达式=)?

【问题讨论】:

    标签: r header merge dbf


    【解决方案1】:

    使用包重塑

    library(reshape)
    merged.files <-merge_recurse(list(file01,file02,file03,file04))
    

    编辑:

    感谢Ramnath,试试这个代码

    Reduce(function(...) merge(..., all=T),all.the.data)
    

    改编自https://stackoverflow.com/a/6947326/2747709

    【讨论】:

    • 您正在处理多少个文件?当列表太大而无法处理 merge_recurse 时,这基本上是一个错误
    • 802 个元素或 dbfs,总计 2.2 mb
    • 802,看起来并不多,但是我们再次递归合并。让我知道这个更新的是否适合你
    • Reduce 功能有效!感谢@infominer 找到解决方案!
    猜你喜欢
    • 2013-10-10
    • 2015-11-22
    • 1970-01-01
    • 2016-10-19
    • 2020-07-16
    • 2023-02-08
    • 2015-03-20
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多