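【Posted】: 2018-11-14 15:40:04
【Question】:
Hello, I am working on a regression problem. My dataset has 13 features and 550068 rows. I tried several models and found that boosting algorithms (xgboost, catboost, lightgbm) perform well on this large dataset. Here is the code: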
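import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
# LightGBM regressor; training stops once the eval-set RMSE has not improved for 10 rounds
gbm = lgb.LGBMRegressor(objective='regression', num_leaves=100, learning_rate=0.2, n_estimators=1500)
gbm.fit(x_train, y_train,
        eval_set=[(x_test, y_test)],
        eval_metric='l2_root',  # alias for RMSE
        early_stopping_rounds=10)  # LightGBM >= 4.0: use callbacks=[lgb.early_stopping(10)] instead
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration_)
# score() returns R^2 on the training data (a regression metric, not a classification accuracy)
accuracy = round(gbm.score(x_train, y_train) * 100, 2)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)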
import xgboost as xgb
boost_params = {'eval_metric': 'rmse'}
xgb0 = xgb.XGBRegressor(
    max_depth=8,
    learning_rate=0.1,
    n_estimators=1500,
    objective='reg:linear',  # named 'reg:squarederror' in newer XGBoost releases
    gamma=0,
    min_child_weight=1,
    subsample=1,
    colsample_bytree=1,
    scale_pos_weight=1,
    seed=27,
    **boost_params)
# No eval_set or early stopping here, so all 1500 trees are built
xgb0.fit(x_train, y_train)
# R^2 on the training data, in percent
accuracyxgboost = round(xgb0.score(x_train, y_train) * 100, 2)
predict_xgboost = xgb0.predict(x_test)
msexgboost = mean_squared_error(y_test, predict_xgboost)
rmsexgboost = np.sqrt(msexgboost)
from catboost import Pool, CatBoostRegressor
train_pool = Pool(x_train, y_train)
# rsm=0.8 samples 80% of the features at each split selection
cbm0 = CatBoostRegressor(rsm=0.8, depth=7, learning_rate=0.1,
                         eval_metric='RMSE')
cbm0.fit(train_pool)
test_pool = Pool(x_test)
predict_cat = cbm0.predict(test_pool)
# R^2 on the training data, in percent
acc_cat = round(cbm0.score(x_train, y_train) * 100, 2)
msecat = mean_squared_error(y_test, predict_cat)
rmsecat = np.sqrt(msecat)
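With the models above I get an RMSE of about 2850. I would now like to improve performance by reducing the root mean squared error. How can I do that? I am new to boosting algorithms: which parameters affect the model, and how do I tune hyperparameters for these algorithms (xgboost, catboost, lightgbm)? I am running Windows 10 on a 7th-generation Intel i5.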
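For the hyperparameter-tuning part of the question, one common approach is a cross-validated randomized search. The sketch below is only an illustration under stated assumptions: it uses synthetic data and scikit-learn's GradientBoostingRegressor as a stand-in estimator, and the search ranges are made up, not recommendations. LGBMRegressor and XGBRegressor follow the same scikit-learn estimator interface (CatBoostRegressor generally does too), so they could be passed to the search in the same way with their own parameter names, for example num_leaves for LightGBM or depth and rsm for CatBoost.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the real data (13 features, as in the question).
X, y = make_regression(n_samples=400, n_features=13, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical search space; the values are illustrative only.
param_distributions = {
    "learning_rate": [0.03, 0.1, 0.2],
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "subsample": [0.7, 0.85, 1.0],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                          # number of sampled configurations
    scoring="neg_mean_squared_error",  # scikit-learn maximizes, hence the sign
    cv=3,                              # 3-fold CV on the training split only
    random_state=0,
)
search.fit(x_train, y_train)

# Cross-validated RMSE of the best configuration.
cv_rmse = float(np.sqrt(-search.best_score_))
# RMSE on the held-out test split, which the search never saw.
test_rmse = float(np.sqrt(np.mean((y_test - search.predict(x_test)) ** 2)))

print("best params:", search.best_params_)
print("CV RMSE:", cv_rmse, "test RMSE:", test_rmse)
```

The test split is kept out of the search so that the reported test RMSE is not biased by the tuning itself. On a dataset with several hundred thousand rows, tuning on a subsample first and keeping n_iter small may be more practical on a laptop CPU.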
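【Discussion】:
-
Understand your data and do feature engineering. It will pay off far more than trying different boosting techniques. Read interviews with Kaggle winners: feature engineering is what they all talk about.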
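As a rough illustration of what feature engineering can look like for tree-based models, the sketch below adds per-group statistics (mean target and row count) learned from the training data. Everything in it is hypothetical: the toy frames, the column names user_id, category and target, and the helper add_group_features are invented for the example and are not taken from the asker's dataset.

```python
import pandas as pd

# Hypothetical toy data; column names are invented for illustration.
train = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "category": ["a", "b", "a", "a", "b"],
    "target": [10.0, 20.0, 30.0, 50.0, 40.0],
})
test = pd.DataFrame({"user_id": [1, 4], "category": ["a", "b"]})


def add_group_features(train_df, other_df, key, target="target"):
    """Add per-group mean target and row count, learned from train_df only."""
    stats = train_df.groupby(key)[target].agg(["mean", "count"])
    global_mean = train_df[target].mean()
    out = other_df.copy()
    # Groups unseen in training fall back to the global mean and a zero count.
    out[f"{key}_target_mean"] = other_df[key].map(stats["mean"]).fillna(global_mean)
    out[f"{key}_count"] = other_df[key].map(stats["count"]).fillna(0).astype(int)
    return out


test_fe = add_group_features(train, test, "user_id")
test_fe = add_group_features(train, test_fe, "category")
print(test_fe)
```

The statistics are computed from the training frame only, so nothing from the test rows leaks into the features. Applying the same target means to the training rows themselves would leak the target; an out-of-fold scheme is the usual remedy there.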
Tags: machine-learning boost data-science xgboost hyperparameters