How to remove duplicated column names in R? Answers

[Question title]: How to remove duplicated column names in R?
[Posted]: 2014-07-31 07:56:52
[Question]:

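I have very large matrices, and I know that some of their column names are duplicated. I want to find those duplicated column names and remove the duplicated columns. I tried duplicated(), but it removes duplicated entries instead. Could someone help me do this in R? The key point is that columns with duplicated names may not have duplicated contents.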
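A shared name does not imply shared contents, so it can help to first list which columns repeat a name and check whether they actually hold the same values as the first column with that name. A minimal base-R sketch, assuming a small made-up matrix `m` (the data are purely illustrative):

```r
# Toy matrix: two columns share the name "A" but hold different values
m <- matrix(1:12, nrow = 3, dimnames = list(NULL, c("A", "B", "A", "C")))

# TRUE for every column whose name already appeared further left
dup_flag <- duplicated(colnames(m))

# Positions and distinct names of the repeated columns
dup_pos   <- which(dup_flag)
dup_names <- unique(colnames(m)[dup_flag])

# For each repeated column, compare it with the first column of the same name
same_content <- vapply(dup_pos, function(j) {
  first <- match(colnames(m)[j], colnames(m))
  identical(m[, j], m[, first])
}, logical(1))

dup_pos       # 3
dup_names     # "A"
same_content  # FALSE: the second "A" column holds different data
```

Here dropping the second "A" column by name would discard data that differs from the first one, which is exactly the situation the question warns about.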

[Question discussion]:

Tags: r

[Solution 1]:

Assuming temp is your matrix:

temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

##      A  A  B
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

you can do:

temp <- temp[, !duplicated(colnames(temp))]

##      A  B
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15

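Or, if you want to keep the last of the duplicated columns instead, apply this to the original temp (not to the result above):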

temp <- temp[, !duplicated(colnames(temp), fromLast = TRUE)] 

##       A  B
## [1,]  6 11
## [2,]  7 12
## [3,]  8 13
## [4,]  9 14
## [5,] 10 15
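One caveat with `[` on a matrix: if only a single column survives, R drops the result to a plain vector by default. A small sketch, assuming a hypothetical matrix whose only two columns are both named "A", shows the difference and how `drop = FALSE` keeps the matrix shape:

```r
# Both columns share the name "A"
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("A", "A")))

keep <- !duplicated(colnames(m))   # TRUE, FALSE

# Default indexing drops the single remaining column to a vector
v <- m[, keep]

# drop = FALSE keeps a 3 x 1 matrix and its column name
m2 <- m[, keep, drop = FALSE]
```

For very large matrices where the number of surviving columns is not known in advance, adding `drop = FALSE` keeps downstream code that expects a matrix from breaking.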

[Discussion]:

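[Solution 2]:

Or, assuming you can work with data.frames, use subset():

subset(iris, select = which(!duplicated(names(iris))))

Note that dplyr::select does not work here, because it already requires the column names of the input data to be unique.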

[Discussion]:

A pipe-friendly version (%>% comes from magrittr, also re-exported by dplyr): iris <- iris %>% subset(., select = which(!duplicated(names(.))))
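The built-in `iris` data has no repeated names, so the call above returns it unchanged. A sketch on `cbind(iris, iris)`, which keeps both copies of every column name, shows `subset()` actually removing the repeats (the object names `df` and `out` are illustrative):

```r
# cbind() on data frames keeps duplicated names instead of renaming them
df <- cbind(iris, iris)

# Keep the first column for each distinct name
out <- subset(df, select = which(!duplicated(names(df))))

names(out)   # the five original iris column names
```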

[Solution 3]:

To remove a specific duplicated column by name, you can do the following:

test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)) & names(test) == "Species")
if (length(idx) > 0) test = test[, -idx] # guard: test[, -integer(0)] would drop every column

Removing all duplicated columns is a bit simpler:

test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)))
if (length(idx) > 0) test = test[, -idx] # guard: test[, -integer(0)] would drop every column

Or:

test = cbind(iris, iris) # example with multiple duplicate columns
test = test[,!duplicated(names(test))]
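The logical form is the safer of the two. When nothing is duplicated, `which()` returns `integer(0)`, and `x[, -integer(0)]` selects zero columns rather than all of them. A minimal sketch, assuming a hypothetical data frame with no duplicated names:

```r
df <- data.frame(x = 1:3, y = 4:6)   # no repeated names at all

idx <- which(duplicated(names(df)))  # integer(0)

# Negative indexing with an empty index returns a data frame with 0 columns
bad <- df[, -idx]

# Logical indexing keeps every column when nothing is duplicated
good <- df[, !duplicated(names(df)), drop = FALSE]
```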

[Discussion]:

[Solution 4]:

temp = matrix(seq_len(15), 5, 3)
colnames(temp) = c("A", "A", "B")

temp = as.data.frame.matrix(temp)
temp = temp[!duplicated(colnames(temp))]
temp = as.matrix(temp)

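[Discussion]:

Why convert it to a data frame and then back to a matrix? How is this different from my answer? Don't you need the extra comma?
It matters because my data was a data.table (data.frame), so I couldn't get your solution to work. Once I converted it to a matrix, it worked like a charm. Leaving out the comma was accidental and doesn't affect anything.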
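On the comma question: for a plain data.frame, single-bracket indexing without a comma (list-style) always returns a data frame, while `df[, j]` with one selected column drops to a vector. A sketch, assuming a hypothetical data frame whose two columns are both named `A`:

```r
# check.names = FALSE keeps the repeated name instead of renaming it to "A.1"
df <- data.frame(A = 1:3, A = 4:6, check.names = FALSE)
keep <- !duplicated(names(df))   # TRUE, FALSE

list_style   <- df[keep]     # data frame with one column named "A"
matrix_style <- df[, keep]   # drops to the integer vector 1:3
```

So on a data.frame the missing comma changes the shape of the result only when a single column survives.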

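[Solution 5]:

Store the positions of all duplicated columns in a vector, e.g. duplicates, and use -duplicates with single-bracket subsetting to remove those columns.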

# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22,
                24, 25, 28, 32, 34, 36, 38, 40,
                44, 46, 48, 51, 54, 65, 158)

# Remove duplicates from food and assign it to food2
food2 <- food[, -duplicates]

[Discussion]:

Hard-coding the duplicated column numbers isn't great. Using which(duplicated(colnames(food))) instead is better and more flexible.
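Following that suggestion, a sketch that derives the positions instead of hard-coding them. The real `food` data is not shown in the answer, so the small `food` below is a hypothetical stand-in:

```r
# Hypothetical stand-in for `food`: names "a" and "b" each appear twice
food <- data.frame(a = 1:2, b = 3:4, a = 5:6, c = 7:8, b = 9:10,
                   check.names = FALSE)

# Positions of every repeat after the first occurrence of a name
duplicates <- which(duplicated(colnames(food)))   # 3 5

# Guard against integer(0), which would otherwise drop every column
food2 <- if (length(duplicates) > 0) food[, -duplicates] else food
```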