
【Question title】: How to select columns based on indices stored in a vector in R?
【Posted】: 2021-02-02 05:27:13
【Question】:

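I am trying to select the columns of a data frame whose correlation exceeds a chosen cutoff. I use the findCorrelation function to store the indices of all the highly correlated columns in a variable. When I print this variable, I see that the indices are not sorted. How can I use this variable to select columns from the original data frame?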

correlationMatrix <- cor(cor_numVar[,1:274])
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff=0.5)
train[,highlyCorrelated]

【Comments】:

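Use something like: original_df[, sort(highlyCorrelated)]
I'm really sorry. Since I want to remove these highly correlated columns, I actually want to do something like original_df[, -sort(highlyCorrelated)]. In the meantime I found a solution by converting the indices to column names: to_be_removed <- colnames(correlationMatrix)[highlyCorrelated]; original_df[!names(original_df) %in% to_be_removed]. However, it gives me an error.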
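
The index-based approach from the comments can be sketched in base R as follows. This is only an illustration: the vector idx is a hypothetical stand-in for the unsorted output of findCorrelation, and df is simply a slice of the built-in mtcars data. It suggests that sorting is optional, since it only changes the order of the selected columns, and that a negative index needs a guard for the case where no column is flagged.

```r
# Illustration only: idx is a hypothetical, deliberately unsorted index vector
# standing in for the result of findCorrelation().
df <- mtcars[, 2:10]
idx <- c(3L, 1L, 2L)

# Selecting with unsorted indices works; columns come back in index order.
by_index <- df[, idx, drop = FALSE]
names(by_index)   # "hp" "cyl" "disp"

# sort() only restores the original column order.
in_order <- df[, sort(idx), drop = FALSE]
names(in_order)   # "cyl" "disp" "hp"

# A negative index drops the columns; the order of idx is irrelevant here.
kept <- df[, -idx, drop = FALSE]
names(kept)       # "drat" "wt" "qsec" "vs" "am" "gear"

# Caveat: with an empty index vector, df[, -empty] selects zero columns,
# so guard the case where nothing was flagged.
empty <- integer(0)
safe <- if (length(empty) > 0) df[, -empty, drop = FALSE] else df
ncol(safe)        # 9
```

Because the indices are positions, they are only meaningful for the same object (or the same column subset) that the correlation matrix was computed from.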

Tags: r dataframe indexing subset correlation

【Solution 1】:

Here is an example using the mtcars dataset:

correlationMatrix <- cor(mtcars[, 2:10])
highlyCorrelated <- caret::findCorrelation(correlationMatrix, cutoff=0.5)
colnames(correlationMatrix)[-highlyCorrelated]  # less correlated variables
colnames(correlationMatrix)[highlyCorrelated]   # highly correlated variables
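
To go from those column names back to the original data frame, one possible sketch is shown below. It assumes, as in the example above, that the correlation matrix was computed on a subset of the columns (mtcars[, 2:10]), so the indices refer to positions in that subset rather than in mtcars itself. To keep the sketch runnable without caret, highlyCorrelated is a hypothetical hard-coded vector rather than a real findCorrelation result.

```r
# Illustration only: highlyCorrelated is a hypothetical stand-in for the
# output of caret::findCorrelation(correlationMatrix, cutoff = 0.5).
correlationMatrix <- cor(mtcars[, 2:10])
highlyCorrelated <- c(3L, 1L, 2L)

# The indices point into correlationMatrix, not into mtcars,
# so translate them to column names before touching the full data frame.
to_be_removed <- colnames(correlationMatrix)[highlyCorrelated]

# Keep only the flagged columns.
selected <- mtcars[, to_be_removed, drop = FALSE]

# Drop the flagged columns; setdiff() keeps every column when
# to_be_removed is empty.
reduced <- mtcars[, setdiff(names(mtcars), to_be_removed), drop = FALSE]

names(selected)  # "hp" "cyl" "disp"
names(reduced)   # "mpg" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
```

Matching by name rather than by position avoids an off-by-offset selection when the matrix covers only part of the data frame, which may be relevant to the question, where the matrix comes from cor_numVar but the indices are applied to train.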

【Discussion】: