【Posted】: 2018-07-25 18:06:08
【Question】:
I am trying to fit a Tweedie model with xgboost, but I get an obscure error message.
Here is a reproducible example.
Prepare the data:
library(xgboost)
library(dplyr)
set.seed(123)
xx <- rpois(5000, 0.02)
xx[xx>0] <- rgamma(sum(xx>0), 50)
yy <- matrix(rnorm(15000), 5000,3, dimnames = list(1:5000, c("a", "b", "c")))
train_test <- sample(c(0,1), 5000, replace = TRUE)
Set up xgboost. The important settings here are objective = 'reg:tweedie', eval_metric = "tweedie-nloglik" and tweedie_variance_power = 1.2:
dtrain <- xgb.DMatrix(
  data = yy %>% subset(train_test == 0),
  label = xx %>% subset(train_test == 0)
)
dtest <- xgb.DMatrix(
  data = yy %>% subset(train_test == 1),
  label = xx %>% subset(train_test == 1)
)
watchlist <- list(eval = dtest, train = dtrain)
param <- list(max.depth = 2,
              eta = 0.3,
              nthread = 1,
              silent = 1,
              objective = 'reg:tweedie',
              eval_metric = "tweedie-nloglik",
              tweedie_variance_power = 1.2)
Finally, train the model with xgb.train:
resBoost <- xgb.train(params = param, data=dtrain, nrounds = 20, watchlist=watchlist)
This produces the following obscure error message:
Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) :
[17:59:18] amalgamation/../src/metric/elementwise_metric.cc:168: Check failed: param != nullptr tweedie-nloglik must be in formattweedie-nloglik@rho
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(dmlc::StackTrace[abi:cxx11]()+0x1bc) [0x7f1f0ce742ac]
[bt] (1) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1f0ce74e88]
[bt] (2) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::metric::EvalTweedieNLogLik::EvalTweedieNLogLik(char const*)+0x1eb) [0x7f1f0cea00db]
[bt] (3) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(+0x68ef1) [0x7f1f0ce78ef1]
[bt] (4) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::Metric::Create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x263) [0x7f1f0ce7ede3]
[bt] (5) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::LearnerImpl::Configure(std::vector<std::pair
The problem seems to be related to the parameter eval_metric = "tweedie-nloglik", because training runs if I change eval_metric to logloss:
param$eval_metric <- "logloss"
resBoost <- xgb.train(params = param, data=dtrain, nrounds = 20, watchlist=watchlist)
[1] eval-logloss:0.634391 train-logloss:0.849734
[2] eval-logloss:0.634391 train-logloss:0.849734
...
Any idea how to use eval_metric = "tweedie-nloglik"? It seems to be the most appropriate metric for my use case. Thanks.
【Comments】:
-
I think you should replace it with "tweedie-nloglik@rho", where rho is a number between 1 and 2, according to this: github.com/dmlc/xgboost/blob/master/src/metric/…
-
Frans has it right. This may be specific to the R implementation. I think in Python it might work the way Bastien wrote it, but I'm not sure.
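-
As a minimal sketch of the suggestion above, the metric name can be built with the rho value appended after the @ sign, which is the format the error message asks for. This assumes rho should equal the tweedie_variance_power of 1.2 used in the question, and it reuses dtrain and watchlist from the question. The exists() guard is only an illustration device so that the snippet runs on its own; it is not part of xgboost.

```r
# Assumption: rho in the metric name matches tweedie_variance_power.
rho <- 1.2

param <- list(max.depth = 2,
              eta = 0.3,
              nthread = 1,
              objective = "reg:tweedie",
              # Yields "tweedie-nloglik@1.2", the format named in the error message.
              eval_metric = paste0("tweedie-nloglik@", rho),
              tweedie_variance_power = rho)

# Retrain only if the objects from the question are present in the session.
if (exists("dtrain") && exists("watchlist")) {
  resBoost <- xgboost::xgb.train(params = param, data = dtrain,
                                 nrounds = 20, watchlist = watchlist)
}
```

Building the string from rho keeps the evaluation metric and the objective on the same variance power, so changing rho in one place updates both.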