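【Posted】: 2017-08-30 04:17:30
【Question】:
This may be a silly question, but when I use the H2O predict function in R, is there a way to tell it to keep one or more columns from the scoring data? Specifically, I want to keep my unique ID key.
As it stands, I have ended up with a very inefficient approach: I assign an index key to the original dataset, assign an index key to the scores, and then merge the scores onto the scoring dataset. I would rather just say "score this dataset and keep columns x, y, z...". Any suggestions?
The inefficient code: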
library(h2o)    # h2o.predict
library(dplyr)  # inner_join
#Use H2O predict function to score new data
NL2L_SCore_SetScored.hex = h2o.predict(object = best_gbm, newdata = NL2L_SCore_Set.hex)
#Convert scores hex to data frame from H2O
NL2L_SCore_SetScored.df<-as.data.frame(NL2L_SCore_SetScored.hex)
#add index to the scores so we can merge the two datasets
NL2L_SCore_SetScored.df$ID <- seq.int(nrow(NL2L_SCore_SetScored.df))
#Convert original scoring set to data frame from H2O
NL2L_SCore_Set.df<-as.data.frame(NL2L_SCore_Set.hex)
#add index to original scoring data so we can merge the two datasets
NL2L_SCore_Set.df$ID <- seq.int(nrow(NL2L_SCore_Set.df))
#Then merge by newly created ID Key so I have the scores on my scoring data
#set. Ideally I wouldn't have to even create this key and could keep
#original Columns from the data set, which include the customer id key
Full_Scored_Set = inner_join(NL2L_SCore_Set.df, NL2L_SCore_SetScored.df, by = "ID")
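For comparison, here is a minimal sketch of what "score this dataset and keep column x" could look like if the ID column is bound to the predictions inside H2O with `h2o.cbind`, rather than through two data frames and a join. It relies on the prediction frame having one row per row of `newdata`, in the same order. The toy data (`iris`), the `cust_id` column, and the names `frame.hex`, `model`, `preds.hex`, and `scored.hex` are all assumptions made for illustration; they are not from my actual project.

```r
library(h2o)
h2o.init()

# Toy data standing in for the scoring set; `cust_id` is a hypothetical ID column.
df <- iris
df$cust_id <- seq_len(nrow(df))
frame.hex <- as.h2o(df)

# Train a small model on the feature columns only (the ID is not a predictor).
features <- setdiff(colnames(frame.hex), c("Species", "cust_id"))
model <- h2o.gbm(x = features, y = "Species",
                 training_frame = frame.hex, ntrees = 5)

# Score the data; the result has one row per input row, in the same order.
preds.hex <- h2o.predict(object = model, newdata = frame.hex)

# Bind the ID column to the predictions without leaving H2O.
scored.hex <- h2o.cbind(frame.hex[, "cust_id"], preds.hex)

# Convert once at the end, only if a local data frame is needed.
scored.df <- as.data.frame(scored.hex)
```

With my own objects this would presumably become something like `h2o.cbind(NL2L_SCore_Set.hex[, "<id column>"], NL2L_SCore_SetScored.hex)`, where `<id column>` stands for whatever the customer ID column is actually called.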
【Comments】:
Tags: r data-manipulation predict h2o scoring