【Posted】: 2019-09-27 22:30:27
【Question】:
I want to use LightGBM to predict the tradeMoney of a house, but I run into trouble when I specify categorical_feature in LightGBM's lgb.Dataset.
The dtypes of my data are as follows:
type(train)
pandas.core.frame.DataFrame
train.dtypes
area float64
rentType object
houseFloor object
totalFloor int64
houseToward object
houseDecoration object
region object
plate object
buildYear int64
saleSecHouseNum int64
subwayStationNum int64
busStationNum int64
interSchoolNum int64
schoolNum int64
privateSchoolNum int64
hospitalNum int64
drugStoreNum int64
I train on it with LightGBM as follows:
categorical_feats = ['rentType', 'houseFloor', 'houseToward', 'houseDecoration', 'region', 'plate']
folds = KFold(n_splits=5, shuffle=True, random_state=2333)
oof_lgb = np.zeros(len(train))
predictions_lgb = np.zeros(len(test))
feature_importance_df = pd.DataFrame()
for fold_, (trn_idx, val_idx) in enumerate(folds.split(train.values, target.values)):
    print("fold {}".format(fold_))
    trn_data = lgb.Dataset(train.iloc[trn_idx], label=target.iloc[trn_idx], categorical_feature=categorical_feats)
    val_data = lgb.Dataset(train.iloc[val_idx], label=target.iloc[val_idx], categorical_feature=categorical_feats)
    num_round = 10000
    clf = lgb.train(params, trn_data, num_round, valid_sets=[trn_data, val_data], verbose_eval=500, early_stopping_rounds=200)
    oof_lgb[val_idx] = clf.predict(train.iloc[val_idx], num_iteration=clf.best_iteration)
    predictions_lgb += clf.predict(test, num_iteration=clf.best_iteration) / folds.n_splits
print("CV Score: {:<8.5f}".format(r2_score(target, oof_lgb)))
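But even though I specified categorical_feature, it still raises this error:
ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields rentType, houseFloor, houseToward, houseDecoration, region, plate
Here is my environment:
LightGBM version: 2.2.3
Pandas version: 0.24.2
Python version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Can anyone help me?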
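For context on the error: the six fields it lists are exactly the columns with dtype object. As far as I understand it, categorical_feature only tells LightGBM which columns to treat as categorical; it does not convert string columns, and the pandas input path accepts only int, float, bool, or pandas category dtype. A minimal sketch of one possible workaround, casting the string columns to category before building the Dataset, is below. The helper name to_category and the toy values are hypothetical and only for illustration, and the sketch assumes the categorical columns hold plain strings.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def to_category(train, test, categorical_feats):
    """Return copies of train/test with the listed columns cast to a shared
    pandas category dtype. (Hypothetical helper, not part of LightGBM.)"""
    train, test = train.copy(), test.copy()
    for col in categorical_feats:
        # Use one set of levels for both frames so the underlying integer
        # codes mean the same thing at training and prediction time.
        levels = sorted(set(train[col].dropna()) | set(test[col].dropna()))
        dtype = CategoricalDtype(categories=levels)
        train[col] = train[col].astype(dtype)
        test[col] = test[col].astype(dtype)
    return train, test


# Toy stand-ins for the real frames; the values are made up.
train = pd.DataFrame({
    "area": [50.0, 80.5, 62.0],
    "rentType": ["whole", "shared", "whole"],
    "region": ["RG1", "RG2", "RG1"],
})
test = pd.DataFrame({
    "area": [70.0],
    "rentType": ["shared"],
    "region": ["RG3"],
})

train_cat, test_cat = to_category(train, test, ["rentType", "region"])
print(train_cat.dtypes)
```

With the columns converted this way, the frames should be usable in the loop above in place of train and test, with categorical_feature either left as the same list of names or omitted so that the category columns are picked up automatically. Label-encoding the columns to integers would be another option.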
【Comments】: