Reduce() in R over similar variable names causing error

【Question title】: Reduce() in R over similar variable names causing error
【Posted】: 2015-04-07 08:30:57
【Question】:

I have 19 nested lists generated from lapply and split operations. The lists have the following form:

#list1
Var col1 col2 col3
A    2     3    4
B    3     4    5

#list2
Var col1 col2 col3
A    5    6     7
B    5    4     4

......

#list19
Var col1 col2 col3
A    3   6    7
B    7   4    4

I have been able to merge the lists with:

merge.all <- function(x, y) merge(x, y, all=TRUE, by="Var")
out <- Reduce(merge.all, DataList)

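However, I get an error because the other columns have the same names in every list.

How can I prepend the name of each list to its variable names, so that I get something like this: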

Var list1.col1 list1.col2 list1.col3  ..........   list19.col3
 A    2          3          4                            7 
 B    3          4          5          ..........        4
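
To see where the name clash comes from, here is a small sketch with three hypothetical data frames (d1, d2 and d3 are made up for illustration, not taken from the question). merge() only adds its .x/.y suffixes to names that collide in the pair currently being merged, so after a few rounds of Reduce() the column names no longer say which list a value came from. With more inputs the suffixed names may start to repeat, depending on the R version.

```r
# Hypothetical inputs: same key column, same value column names
d1 <- data.frame(Var = c("A", "B"), col1 = c(2, 3), col2 = c(3, 4))
d2 <- data.frame(Var = c("A", "B"), col1 = c(5, 5), col2 = c(6, 4))
d3 <- data.frame(Var = c("A", "B"), col1 = c(3, 7), col2 = c(6, 4))

merge.all <- function(x, y) merge(x, y, all = TRUE, by = "Var")
out <- Reduce(merge.all, list(d1, d2, d3))

# Round 1 suffixes the clashing names with .x/.y; in round 2 nothing
# clashes any more, so d3's columns keep their bare names.
names(out)
# "Var" "col1.x" "col2.x" "col1.y" "col2.y" "col1" "col2"
```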

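【Question comments】:

Tags: r list merge reduce lapply

【Solution 1】:

I'm pretty sure someone will come up with a better solution. However, if you are after a quick and dirty one, this seems to work.

My plan was simply to change the column names before merging.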

#Sample Data
df1 <- data.frame(Var = c("A","B"), col1 = c(2,3), col2 = c(3,4), col3 = c(4,5))
df2 <- data.frame(Var = c("A","B"), col1 = c(5,5), col2 = c(6,4), col3 = c(7,5))
df19 <- data.frame(Var = c("A","B"), col1 = c(3,7), col2 = c(6,4), col3 = c(7,4))

mylist <- list(df1, df2, df19)
names(mylist) <- c("df1", "df2", "df19") #just manually naming, presumably your list has names


## Prefix each data frame's column names with its name in the list
## (still an ugly mix of `lapply` and a `for` loop):

mycolnames <- colnames(df1)
mycolnames1 <- lapply(names(mylist), function(x) paste0(x, mycolnames))


for(i in seq_along(mylist)){
  colnames(mylist[[i]]) <- mycolnames1[[i]]
  colnames(mylist[[i]])[1] <- "Var" #put Var back in so you can merge
}



## Merge
merge.all <- function(x, y)
  merge(x, y, all=TRUE, by="Var")

out <- Reduce(merge.all, mylist)
out


#  Var df1col1 df1col2 df1col3 df2col1 df2col2 df2col3 df19col1 df19col2 df19col3
#1   A       2       3       4       5       6       7        3        6        7
#2   B       3       4       5       5       4       5        7        4        4

There you go. It works, but it is very ugly.
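
The same rename-then-merge idea can be written without the loop. This is only a sketch under a few assumptions: the list is named, every element shares the key column "Var", and the helper prefix_cols and the sample list DataList below are made up for illustration. It uses "." as the separator so the names match the list1.col1 style asked for in the question.

```r
# Hypothetical sample data; the element names mirror the question
DataList <- list(
  list1  = data.frame(Var = c("A", "B"), col1 = c(2, 3), col2 = c(3, 4), col3 = c(4, 5)),
  list2  = data.frame(Var = c("A", "B"), col1 = c(5, 5), col2 = c(6, 4), col3 = c(7, 4)),
  list19 = data.frame(Var = c("A", "B"), col1 = c(3, 7), col2 = c(6, 4), col3 = c(7, 4))
)

# Prefix every column except the key with the name of the list element
prefix_cols <- function(df, nm, key = "Var") {
  is_key <- names(df) %in% key
  names(df)[!is_key] <- paste(nm, names(df)[!is_key], sep = ".")
  df
}

# Map() walks the data frames and their list names in parallel
renamed <- Map(prefix_cols, DataList, names(DataList))

merge.all <- function(x, y) merge(x, y, all = TRUE, by = "Var")
out <- Reduce(merge.all, renamed)
names(out)
# "Var" "list1.col1" "list1.col2" "list1.col3" "list2.col1" "list2.col2"
# "list2.col3" "list19.col1" "list19.col2" "list19.col3"
```

Because the key column is found by name rather than by position, this should also cope with data frames whose key is not the first column.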

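【Comments】:

【Solution 2】:

To make the column names unique, you can use a function that prefixes every column other than the merge variable with the name of its list element.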

resetNames <- function(x, byvar = "Var") {
    asrl <- as.relistable(lapply(x, names))
    allnm <- names(unlist(x, recursive = FALSE))
    rpl <- replace(allnm, unlist(asrl) %in% byvar, byvar)
    Map(setNames, x, relist(rpl, asrl))
}

Reduce(merge.all, resetNames(dlist))
#  Var list1.col1 list1.col2 list1.col3 list2.col1 list2.col2 list2.col4 list3.col1
#1   A          2          3          4          5          6          7          3
#2   B          3          4          5          5          4          4          7
#  list3.col2 list3.col3 list4.col1 list4.col2 list4.col3
#1          6          7          3          6          7
#2          4          4          4          5          6
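
If resetNames() looks opaque, the intermediate values on a tiny input may help. The list x below is a hypothetical two-element example, not the dlist used in this answer. The steps are the same ones the function performs, written out one at a time.

```r
# Hypothetical two-element input
x <- list(
  list1 = data.frame(Var = c("A", "B"), col1 = 2:3),
  list2 = data.frame(Var = c("A", "B"), col1 = c(5L, 5L))
)

# Remember the shape of the per-data-frame name vectors
asrl <- utils::as.relistable(lapply(x, names))

# Flattening one level makes R build "<list name>.<column name>" labels
allnm <- names(unlist(x, recursive = FALSE))
allnm
# "list1.Var" "list1.col1" "list2.Var" "list2.col1"

# Put the bare key name back wherever the original column was the key
rpl <- replace(allnm, unlist(asrl) %in% "Var", "Var")
rpl
# "Var" "list1.col1" "Var" "list2.col1"

# Cut the flat vector back into one name vector per data frame
new_names <- utils::relist(rpl, asrl)

# Apply the new names to each data frame
renamed <- Map(setNames, x, new_names)
```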

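Running this on your list with an extra data frame added gives no warnings. And there is always data.table. Its merge method does not return a warning about duplicated column names.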

library(data.table)
Reduce(merge.all, lapply(dlist, as.data.table))

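Another option is to check the names as the data comes into the function and change them there, which avoids the warning. It isn't perfect, but it works fine here.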

merge.all <- function(x, y) {
    m <- match(names(y)[-1], gsub("[.](x|y)$", "", names(x)[-1]), 0L)
    names(y)[-1][m] <- paste0(names(y)[-1][m], "DUPE")
    merge(x, y, all=TRUE, by="Var")
}

rm <- Reduce(merge.all, dlist)
names(rm)
#  [1] "Var"        "col1"       "col2"       "col3"       "col1DUPE.x"
#  [6] "col2DUPE.x" "col4"       "col1DUPE.y" "col2DUPE.y" "col3DUPE.x"
# [11] "col1DUPE"   "col2DUPE"   "col3DUPE.y"

where dlist is

structure(list(list1 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = 2:3, col2 = 3:4, col3 = 4:5), .Names = c("Var", 
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA, 
-2L)), list2 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = c(5L, 5L), col2 = c(6L, 4L), 
    col4 = c(7L, 4L)), .Names = c("Var", "col1", "col2", "col4"
), class = "data.frame", row.names = c(NA, -2L)), list3 = structure(list(
    Var = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    col1 = c(3L, 7L), col2 = c(6L, 4L), col3 = c(7L, 4L)), .Names = c("Var", 
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA, 
-2L)), list4 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = 3:4, col2 = c(6L, 5L), col3 = c(7L, 
6L)), .Names = c("Var", "col1", "col2", "col3"), row.names = c(NA, 
-2L), class = "data.frame")), .Names = c("list1", "list2", "list3", 
"list4"))
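
The dput() output above is hard to read. As far as I can tell it describes the same data as the sketch below, written with plain data.frame() calls. Var is built as a factor to match the dump. Note that list2 has a col4 instead of a col3.

```r
# Readable reconstruction of the dput() dump (values copied from it)
dlist <- list(
  list1 = data.frame(Var = factor(c("A", "B")), col1 = 2:3,
                     col2 = 3:4, col3 = 4:5),
  list2 = data.frame(Var = factor(c("A", "B")), col1 = c(5L, 5L),
                     col2 = c(6L, 4L), col4 = c(7L, 4L)),
  list3 = data.frame(Var = factor(c("A", "B")), col1 = c(3L, 7L),
                     col2 = c(6L, 4L), col3 = c(7L, 4L)),
  list4 = data.frame(Var = factor(c("A", "B")), col1 = 3:4,
                     col2 = c(6L, 5L), col3 = c(7L, 6L))
)

# Column names per element; list2 is the odd one out
lapply(dlist, names)
```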

【Comments】: