Find the names of constant columns in an R data.frame

【Question title】: Find the names of constant columns in an R data.frame
【Posted】: 2020-05-04 14:02:36
【Question】:

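This is a follow-up to this question. In my data.frame DATA, some columns are constant within each unique value of the first column, study.name. For example, the columns setting, prof, and random are constant across all rows for Shin.Ellis, constant across all rows for Trus.Hsu, and so on. Including Shin.Ellis and Trus.Hsu, there are 10 unique study.name values.

How can I find the names of these constant columns?

One solution is given below (see NAMES), but why does NAMES include "error", which is not constant within every study?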

DATA <- read.csv("https://raw.githubusercontent.com/izeh/m/master/cc.csv")
DATA <- setNames(DATA, sub("\\.\\d+$", "", names(DATA)))

is_constant <- function(x) length(unique(x)) == 1L 

(NAMES <- names(Filter(all, aggregate(.~study.name, DATA, is_constant)[-1])) )

# > [1] "setting" "prof"   "error"   "random"   ## "error" is NOT a constant variable 
                                                ## BUT why it is outputted here!

# Desired output: 
# [1] "setting" "prof" "random"

【Question comments】:

Tags: r list function dataframe lapply

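【Solution 1】:

We need to pass na.action to handle the NA elements. Otherwise the formula method of aggregate uses its default, na.action = na.omit, and drops every row that contains an NA in any column before is_constant is applied.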

names(Filter(all, aggregate(.~study.name, DATA, is_constant, 
            na.action = na.pass)[-1]))
#[1] "setting" "prof"    "random"
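To make the effect of na.action concrete, here is a minimal sketch on a small made-up data frame (TOY and its columns error and other are hypothetical and are not the cc.csv data from the question). The NA in other sits on the only row where error differs, so dropping that row makes error look constant.

```r
# Hypothetical toy data, not the data from the question
TOY <- data.frame(
  study.name = c("A", "A", "A", "B", "B"),
  setting    = c(1, 1, 1, 2, 2),   # constant within each study
  error      = c(1, 2, 1, 3, 3),   # varies within study A (row 2)
  other      = c(5, NA, 5, 6, 6)   # NA on the same row where error differs
)

is_constant <- function(x) length(unique(x)) == 1L

# Default na.action = na.omit: row 2 is removed before aggregating,
# so "error" and "other" both appear constant
with_omit <- names(Filter(all, aggregate(. ~ study.name, TOY, is_constant)[-1]))

# na.action = na.pass: every row is kept, so row 2 is still checked
with_pass <- names(Filter(all, aggregate(. ~ study.name, TOY, is_constant,
                                         na.action = na.pass)[-1]))

with_omit  # "setting" "error" "other"
with_pass  # "setting"
```

Note that under na.pass, unique() counts NA as a value of its own, so a column with one NA among otherwise identical values is reported as not constant. Whether that is the desired behaviour depends on the data.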

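【Comments】:

A quick follow-up. Suppose DATA is: a <- data.frame(study.name = c(1,1,2,3), mod.s=c(3,3,1,2), mod.g=c(1,1,3,1)); b <- data.frame(study.name = c(1,1,2,3), mod.s=c(3,3,2,2), mod.g=c(1,2,3,2)); DATA <- cbind(a,b). Now if you run your code, it should not return anything, because "mod.s" and "mod.g" are not constant in DATA, yet it incorrectly returns "mod.s" and "mod.g". Can you help?
@rnorouzian In your earlier dataset, there was only one study.name column.
@rnorouzian Please check your data. The output is exactly what it should be, because is_constant by group returns TRUE. There are duplicate column names 'mod.s' and 'mod.g'. Is that intended?
@rnorouzian Maybe you meant DATA <- rbind(a, b), in which case you get names(Filter(all, aggregate(.~study.name, DATA, is_constant, na.action = na.pass)[-1])) # character(0)
Sorry, I suddenly fell asleep! No, DATA <- cbind(a,b) is correct. If you remember, this form of DATA relates to my question HERE. You can skip the extra "study.name" column.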
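The duplicate-name point raised in the comments can be checked directly. The sketch below uses the a and b from the comment. Renaming the columns with make.unique() is an assumption made here for illustration and is not something proposed in the thread. Once every column has its own name, each copy is tested separately.

```r
a <- data.frame(study.name = c(1, 1, 2, 3), mod.s = c(3, 3, 1, 2), mod.g = c(1, 1, 3, 1))
b <- data.frame(study.name = c(1, 1, 2, 3), mod.s = c(3, 3, 2, 2), mod.g = c(1, 2, 3, 2))
DATA <- cbind(a, b)

# cbind() keeps the repeated names as they are
has_dups <- anyDuplicated(names(DATA)) > 0

# Assumption for illustration: give every column a distinct name
names(DATA) <- make.unique(names(DATA))
new_names <- names(DATA)

is_constant <- function(x) length(unique(x)) == 1L

# Each (now uniquely named) column is checked within each study.name group
const_cols <- names(Filter(all, aggregate(. ~ study.name, DATA, is_constant,
                                          na.action = na.pass)[-1]))
const_cols
```

With distinct names, only the second mod.g copy (mod.g.1, which takes the values 1 and 2 within study 1) is dropped from the result. Whether the two copies should instead be compared against each other depends on the linked question, which is not reproduced here.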