【Question title】: LightGBM: validation AUC score during model fit differs from manual testing AUC score for same test set
【Posted】: 2020-07-13 05:47:57
【Question】:

I have a LightGBM classifier with the following parameters:

from lightgbm import LGBMClassifier

lgbmodel_2_wt = LGBMClassifier(boosting_type='gbdt',
                        num_leaves=105,
                        max_depth=11,
                        learning_rate=0.03,
                        n_estimators=5000,
                        categorical_feature=[0,1,3,4,5,6,7,8,9,10,11,12,13,14,15],
                        objective='binary',
                        class_weight={0: 0.6, 1: 1},
                        min_split_gain=0.01,
                        min_child_weight=2,
                        min_child_samples=20,
                        subsample=0.9,
                        colsample_bytree=0.8,
                        reg_alpha=0.1,
                        reg_lambda=0.1,
                        n_jobs=-1,
                        verbose=-1)

Here is the model fit call:

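from lightgbm import record_evaluation

# Note: the verbose and early_stopping_rounds arguments to fit() were removed in
# LightGBM 4.0; newer versions use the log_evaluation and early_stopping callbacks.
history = {}
eval_history = record_evaluation(history)
lgbmodel_2_wt.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
    eval_metric='auc', verbose=500, early_stopping_rounds=30,
    callbacks=[eval_history])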

The fit above returns the following evaluation results:

Training until validation scores don't improve for 30 rounds
[500]   training's auc: 0.902706    training's binary_logloss: 0.379436 valid_1's auc: 0.887315 valid_1's binary_logloss: 0.369
Early stopping, best iteration is:
[860]   training's auc: 0.909587    training's binary_logloss: 0.366997 valid_1's auc: 0.88844  valid_1's binary_logloss: 0.366346

According to the output above, the best AUC score is 0.88844. However, when I manually predict on the same set (i.e. "X_test"), the result is different:

from sklearn.metrics import roc_auc_score

y_pred = lgbmodel_2_wt.predict(X_test)  # returns hard class labels (0 or 1)
roc_auc_score(y_test, y_pred)

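The snippet above gives an AUC score of 0.7901740256981424. Which AUC score should I treat as correct, given that both are computed on the same test set? Online documentation for LightGBM is limited, and I have had a hard time interpreting the results. Any help is appreciated.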

【Comments】:

Tags: python machine-learning classification auc lightgbm


【Solution 1】:

Me too! This solved my problem: https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier.fit

I found that we need to add first_metric_only = True to the model constructor, for example:

from lightgbm import LGBMClassifier

gbm = LGBMClassifier(learning_rate=0.01, first_metric_only=True)

gbm.fit(train_X, train_Y, eval_set=[(test_X, test_Y)], eval_metric=['auc'],
        early_stopping_rounds=10, verbose=2)
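
To illustrate why first_metric_only can change which iteration is reported as best: when several metrics are tracked on the validation set (for example auc plus the objective's default binary_logloss), LightGBM's early stopping checks all of them unless first_metric_only is set. The sketch below is a simplified pure-Python model of that rule, not LightGBM's callback code; the function stop_round and the score lists are hypothetical and made up for illustration.

```python
def stop_round(history, patience, first_metric_only=False):
    """Simplified early-stopping rule (illustrative only).

    history maps metric name -> (per-round scores, higher_is_better).
    Returns (round where training stops, best round, triggering metric),
    or None if no metric stalls for `patience` rounds.
    """
    names = list(history)
    if first_metric_only:
        names = names[:1]  # only the first metric may trigger a stop
    best = {name: 0 for name in names}  # best round seen so far, per metric
    n_rounds = len(next(iter(history.values()))[0])
    for i in range(n_rounds):
        for name in names:
            scores, higher_is_better = history[name]
            if higher_is_better:
                improved = scores[i] > scores[best[name]]
            else:
                improved = scores[i] < scores[best[name]]
            if improved:
                best[name] = i
            elif i - best[name] >= patience:
                return i, best[name], name
    return None


# Made-up validation curves: AUC keeps improving, log loss bottoms out early.
history = {
    "auc": ([0.80, 0.82, 0.83, 0.84, 0.85, 0.86], True),
    "binary_logloss": ([0.50, 0.45, 0.46, 0.47, 0.48, 0.49], False),
}

stopped_by_any = stop_round(history, patience=2)
stopped_by_first = stop_round(history, patience=2, first_metric_only=True)
```

In this toy data the stop is triggered by binary_logloss while auc is still rising, so the reported best iteration is not the one with the best AUC. Restricting the check to the first metric avoids that.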

【Comments】:

【Solution 2】:

Try

y_pred = lgbmodel_2_wt.predict_proba(X_test)[:, 1]

instead of

y_pred = lgbmodel_2_wt.predict(X_test)
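
The reason this matters is that predict returns hard 0/1 class labels, while AUC is a ranking metric that should be computed from continuous scores such as the positive-class probability. Thresholding collapses the scores into two values, so most of the ranking information is lost and the AUC usually drops. The sketch below shows the effect with a small hand-written rank-based AUC helper (auc is a hypothetical illustration, not scikit-learn's implementation) and made-up probabilities.

```python
def auc(y_true, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted positive-class probabilities.
y_true = [0, 0, 0, 1, 1, 1]
proba = [0.10, 0.30, 0.45, 0.40, 0.60, 0.90]

# What predict() would effectively return with a 0.5 threshold.
labels = [1 if p >= 0.5 else 0 for p in proba]

auc_from_proba = auc(y_true, proba)    # uses the full ranking
auc_from_labels = auc(y_true, labels)  # only two distinct score values

print(auc_from_proba, auc_from_labels)
```

Here the AUC from probabilities is 8/9 and the AUC from thresholded labels is 7.5/9, the same kind of gap as between the 0.888 reported during fit and the 0.790 obtained from predict in the question.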

【Comments】:
