【Title】: Keeping the ID Key (Or Any Other Column) When Scoring a New Data Set?
【Posted】: 2017-08-30 04:17:30
【Question】:

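This may be a silly question, but when I use the H2O predict function in R, is there a way to tell it to keep one or more columns from the data being scored? Specifically, I want to keep my unique ID key.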

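As it stands, I end up with a very inefficient approach: I assign an index key to the original data set, assign an index key to the scores, and then merge the scores back onto the scoring data set. I would rather just say "score this data set and keep columns x, y, z, ...". Any suggestions?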

The inefficient code:

# dplyr provides inner_join(); h2o is assumed to be loaded and initialized already
library(dplyr)

# Use the H2O predict function to score the new data
NL2L_SCore_SetScored.hex <- h2o.predict(object = best_gbm, newdata = NL2L_SCore_Set.hex)

# Convert the scores from an H2OFrame to a data frame
NL2L_SCore_SetScored.df <- as.data.frame(NL2L_SCore_SetScored.hex)
# Add an index to the scores so we can merge the two data sets
NL2L_SCore_SetScored.df$ID <- seq.int(nrow(NL2L_SCore_SetScored.df))

# Convert the original scoring set from an H2OFrame to a data frame
NL2L_SCore_Set.df <- as.data.frame(NL2L_SCore_Set.hex)
# Add an index to the original scoring data so we can merge the two data sets
NL2L_SCore_Set.df$ID <- seq.int(nrow(NL2L_SCore_Set.df))

# Then merge by the newly created ID key so the scores sit next to the scoring
# data. Ideally I wouldn't have to create this key at all and could keep the
# original columns from the data set, which include the customer ID key.
Full_Scored_Set <- inner_join(NL2L_SCore_Set.df, NL2L_SCore_SetScored.df, by = "ID")

【Comments】:

    Tags: r data-manipulation predict h2o scoring


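    【Answer 1】:

    Instead of doing a join, you can simply column-bind the ID column onto the prediction frame, because the rows of the prediction frame are in the same order as the rows of the data you scored.

    R example (ignore the fact that I am predicting on the original training set; this is for demonstration purposes only):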

    library(h2o)
    h2o.init()
    
    data(iris)
    iris$id <- 1:nrow(iris)  #add ID column
    iris_hf <- as.h2o(iris)  #convert iris to an H2OFrame
    
    fit <- h2o.gbm(x = 1:4, y = 5, training_frame = iris_hf)
    pred <- h2o.predict(fit, newdata = iris_hf)
    pred$id <- iris_hf$id
    head(pred)
    

    Now you have a prediction frame with the ID column:

      predict    setosa   versicolor    virginica id
    1  setosa 0.9989301 0.0005656447 0.0005042210  1
    2  setosa 0.9985183 0.0006462680 0.0008354416  2
    3  setosa 0.9989298 0.0005663071 0.0005038929  3
    4  setosa 0.9989310 0.0005660443 0.0005029535  4
    5  setosa 0.9989315 0.0005649384 0.0005035886  5
    6  setosa 0.9983457 0.0011517334 0.0005025218  6
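
If you want to carry over several columns at once rather than assigning them one by one, a minimal sketch using `h2o.cbind` might look like the following. It relies on the same row-order assumption as above. The names `keep_cols` and `scored` are made up for illustration, and the choice of `id` and `Species` as the columns to keep is arbitrary.

```r
library(h2o)
h2o.init()

data(iris)
iris$id <- seq_len(nrow(iris))   # add an ID column
iris_hf <- as.h2o(iris)          # convert iris to an H2OFrame

# Train on columns 1 to 4 only, so the ID column is not used as a predictor
fit <- h2o.gbm(x = 1:4, y = 5, training_frame = iris_hf)
pred <- h2o.predict(fit, newdata = iris_hf)

# Hypothetical choice of columns to carry over from the scored data
keep_cols <- c("id", "Species")

# Column-bind the kept columns and the predictions inside the H2O cluster;
# this assumes the prediction rows are in the same order as the input rows
scored <- h2o.cbind(iris_hf[, keep_cols], pred)
head(scored)
```

Because the bind happens on the H2O side, nothing needs to be converted to a data frame first; call `as.data.frame(scored)` only if you need the result in R.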
    

    【Comments】:
