R predict missing values: answers

【Question title】: R predict missing values [closed]
【Posted】: 2015-05-09 10:05:00
【Question description】:

How should I predict missing values (NA) in R from the other values in the data? Filling in the mean is not good enough.

All values are reliable. The column values are tree range rates, and the rows are three heights, in meters.

My Excel file is here.

Is there a way to do this? I have been trying to use the predict function, but without success.

【Question comments】:

Tags: r machine-learning classification missing-data predict

【Solution 1】:

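There are many ways to approach this, but here is one. I also tried it on your dataset, but the data is either too small, has too many linear combinations, or something else, because it did not converge.

Amelia - http://fastml.com/impute-missing-values-with-amelia/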

library(Amelia) # provides amelia(); run install.packages("Amelia") first if needed

data(mtcars)

mtcars1 <- mtcars[rep(row.names(mtcars), 10), ] # repeat the rows 10x to enlarge the dataset

# insert NAs into a vector at random positions
insert_nas <- function(x) {
    len <- length(x)
    n <- sample(1:floor(0.2 * len), 1) # random number of missing obs (at most 20%)
    i <- sample(1:len, n)              # which positions to make missing
    x[i] <- NA
    x
}

# assigning into mtcars1[] keeps a data.frame (sapply would return a matrix)
mtcars1[] <- lapply(mtcars1, insert_nas)

ords <- c('cyl', 'hp', 'vs', 'am', 'gear', 'carb') # integers - your dataset has no integers so don't specify this
#idvars <- c('these', 'will', 'be', 'ignored')
#noms <- c('some', 'nominal', 'columns') # categorical

a.out <- amelia(mtcars1, ords = ords)

a.out$imputations[[1]] # the first imputed dataset

# you can also ensemble your imputations if you'd like. Here we average 3 of the 5 returned imputations
final_data <- as.data.frame(sapply(colnames(a.out$imputations[[1]]), function(i)
    rowMeans(cbind(a.out$imputations[[1]][, i], a.out$imputations[[2]][, i], a.out$imputations[[3]][, i]))))
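
Since the question mentions trying predict, here is a minimal sketch of single regression imputation using only base R (lm plus predict). This is not part of the Amelia approach above. The choice of mpg as the column to fill, wt and hp as predictors, and the six artificially removed values are all assumptions made for illustration; with your own data you would pick the column with NAs and predictors that are fully observed.

```r
# Regression imputation sketch using only base R (illustrative assumptions throughout)
set.seed(42)                           # make the random NA positions reproducible
df <- mtcars[, c("mpg", "wt", "hp")]

# Remove 6 mpg values so there is something to predict (illustration only)
miss <- sample(nrow(df), 6)
df$mpg[miss] <- NA

# With R's default na.action (na.omit), lm() fits on the complete rows only
fit <- lm(mpg ~ wt + hp, data = df)

# Predict mpg only for the rows where it is missing, then fill those cells
na_rows <- is.na(df$mpg)
df$mpg[na_rows] <- predict(fit, newdata = df[na_rows, ])

sum(is.na(df$mpg)) # should be 0: every missing mpg has been filled
```

Note that a single regression fill like this treats the predicted values as if they were observed, so it understates uncertainty. Multiple imputation (as Amelia does) is usually the safer choice when the imputed data feeds further analysis.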

【Comments】: