Implementation of LIME on h2o modelling in R (answer)

【Question Title】: Implementation of LIME on h2o modelling in R
【Posted】: 2017-12-17 00:05:23
【Question Description】:

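I want to apply LIME in R to a model created with h2o (deep learning). To use the data with the model, I created H2OFrames and then converted them back to data frames for use in LIME (the lime() function), because LIME's explain function does not recognize an H2OFrame. Up to this point, the lime() function runs fine.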

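The next step is to call the explain function on the test data to generate explanations. Here, R throws an error with both a data frame and an H2OFrame.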

This is the error when using a data frame:

Error in chk.H2OFrame(x) : must be an H2OFrame

This is the error when using an H2OFrame:

Error in UseMethod("permute_cases") : 
  no applicable method for 'permute_cases' applied to an object of class "H2OFrame"

if(!require(pacman))  install.packages("pacman")
pacman::p_load(h2o, lime, data.table, e1071)

data(iris)
h2o.init( nthreads = -1 )
h2o.no_progress()

# Split up the data set
iris <- as.h2o(iris)

split <- h2o.splitFrame( iris, c(0.6, 0.2), seed = 1234 )
iris_train <- h2o.assign( split[[1]], "train" ) # 60%
iris_valid <- h2o.assign( split[[2]], "valid" ) # 20%
iris_test  <- h2o.assign( split[[3]], "test" )  # 20%


output <- 'Species'
input <- setdiff(names(iris),output)


model_dl_1 <- h2o.deeplearning(
  model_id = "dl_1", 
  training_frame = iris_train, 
  validation_frame = iris_valid,
  x = input,
  y = output,
  hidden = c(32, 32, 32),
  epochs = 10, # hopefully converges earlier...
  score_validation_samples = 10000, 
  stopping_rounds = 5,
  stopping_tolerance = 0.01
)

pred1 <- h2o.predict(model_dl_1, iris_test)
list(dimension = dim(pred1), pred1$predict)

# Convert the H2OFrames back to data frames

train_org <- as.data.frame(iris_train) # converting train H2OFrame to data frame
sapply(train_org, class)               # checking the class of train_org
test_df <- as.data.frame(iris_test)    # converting test H2OFrame to data frame
test_sample <- test_df[1, ]            # first test row

# This works
# lime is used to build the explainer from the training data
explain <- lime(train_org, model_dl_1, bin_continuous = FALSE, n_bins = 5,
                n_permutations = 1000)


# Explain new observation
explanation <- explain(test_sample, n_labels = 1, n_features = 1)
h2o.shutdown(prompt = FALSE)

Can anyone help me find a solution, or a way to use LIME's explain function with a suitable data frame?

【Question Discussion】:

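Please provide a fully reproducible code example along with version information for the lime and h2o R packages.
You need to update the code in your post so that it is reproducible; it can use any dataset (iris is fine). Please see the Stack Overflow guidelines on MCVE here: stackoverflow.com/help/mcve If I can't copy/paste your code to help you debug it, it's not an MCVE.
@ErinLeDell, thanks for the feedback, I'll make the changes.
@ErinLeDell, I've posted the full code. Could you please take a look? Thanks for your time.
Could you clarify which of the two error messages comes from the code you posted, and on which line?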

Tags: r dataframe h2o

【Solution 1】:

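The underlying lime package relies on two functions, predict_model() and model_type(), and you need to define methods for both for any model class it does not currently support.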
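To see this mechanism without an h2o cluster, here is a minimal sketch, assuming the CRAN lime package is installed. The nearest-centroid classifier, fit_centroids() and the class name toy_centroid_model are hypothetical, made up only for illustration; once that class has its own model_type() and predict_model() methods, lime() and explain() accept it like a supported model.

```r
library(lime)

# Hypothetical toy classifier (not part of lime or h2o): one centroid per class
fit_centroids <- function(x, y) {
  centroids <- t(sapply(split(x, y), colMeans))  # classes x features
  structure(list(centroids = centroids), class = "toy_centroid_model")
}

# Tell lime what kind of model this is
model_type.toy_centroid_model <- function(x, ...) "classification"

# Return one probability column per class; for a "classification" model
# lime asks for probabilities, so `type` is ignored in this sketch
predict_model.toy_centroid_model <- function(x, newdata, type, ...) {
  m <- as.matrix(newdata[, colnames(x$centroids), drop = FALSE])
  # Squared Euclidean distance from every row to every class centroid
  d <- do.call(cbind, lapply(rownames(x$centroids), function(cl) {
    rowSums(sweep(m, 2, x$centroids[cl, ])^2)
  }))
  # Softmax over negative distances, shifted by the row minimum for stability
  p <- exp(-(d - apply(d, 1, min)))
  p <- p / rowSums(p)
  colnames(p) <- rownames(x$centroids)
  as.data.frame(p)
}

set.seed(42)
train_x <- iris[, 1:4]  # features only, no label column
model <- fit_centroids(train_x, iris$Species)

explainer <- lime(train_x, model, bin_continuous = FALSE)
explanation <- explain(train_x[1, ], explainer, n_labels = 1, n_features = 2)
print(explanation[, c("label", "feature", "feature_weight")])
```

The h2o methods in the steps that follow have the same shape; only the body of predict_model() changes to call h2o.predict().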

For your specific example, here is what you need to do.

Step 1: Define a model_type method for models of class H2OMultinomialModel. All this does is tell lime what type of model it is dealing with, e.g. "classification" or "regression".

model_type.H2OMultinomialModel <- function(x, ...) {
    # Function tells lime() what model type we are dealing with
    # 'classification', 'regression', 'survival', 'clustering', 'multilabel', etc
    #
    # x is our h2o model

    return("classification")

}

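Step 2: Define a predict_model method for models of class H2OMultinomialModel. The key point is that lime needs class probabilities, not predicted labels, in order to work (this took me a while to figure out; it has to do with the lime:::output_type(explaination) variable).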

predict_model.H2OMultinomialModel <- function(x, newdata, type, ...) {
    # Runs h2o.predict() and returns a data frame for lime
    #
    # x is the h2o model
    # newdata is a plain data frame (converted to an H2OFrame below)
    # type is not used; class probabilities are always returned

    pred <- h2o.predict(x, as.h2o(newdata))

    # return classification probabilities only
    return(as.data.frame(pred[,-1]))

}
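To make the "probabilities, not predictions" point concrete, here is a sketch using made-up numbers laid out the way h2o.predict() returns results for a multinomial model: a predict label column followed by one probability column per class. Dropping the first column, as pred[,-1] does in the method above, leaves exactly the per-class probability table that lime works with. The values are invented for illustration and no h2o cluster is needed.

```r
# Made-up values in the layout of h2o.predict() output for a multinomial model
pred <- data.frame(
  predict    = factor(c("setosa", "virginica")),  # predicted label
  setosa     = c(0.97, 0.01),                     # class probabilities
  versicolor = c(0.02, 0.14),
  virginica  = c(0.01, 0.85)
)

# Drop the label column: this is what predict_model() should hand to lime
probs <- pred[, -1]
print(probs)
```

Returning pred$predict instead would give lime a single column of labels, with nothing to fit its local model against.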

With these methods defined, you can run the lime script.

# Build the lime explainer from the training data
explainer <- lime(train_org, model_dl_1, bin_continuous = FALSE, n_bins = 5, n_permutations = 1000)

# Explain new observation
explanation <- explain(test_sample, explainer, n_labels = 1, n_features = 1)

【Discussion】:

Note that the lime package has since been updated to support h2o directly. You may need to install the GitHub version: github.com/thomasp85/lime