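【Posted】: 2020-03-24 17:39:02
【Question】:
I have a training CSV, and I have successfully predicted the target column for my test CSV. The problem is that I need to inverse-transform the results back to strings for further analysis.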
from sklearn import preprocessing
lbl = preprocessing.LabelEncoder()
for x in train.columns:
    if train[x].dtype == 'object':
        lbl.fit(list(train[x].values))
        train[x] = lbl.transform(list(train[x].values))
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
y = train['target']
del train['target']
X = train
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=1,stratify=y)
clf = RandomForestClassifier(n_estimators = 500, max_depth = 6)
clf.fit(X_train,y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',max_depth=6,max_features='auto', max_leaf_nodes=None,min_impurity_split=1e-07, min_samples_leaf=1,min_samples_split=2, min_weight_fraction_leaf=0.0,n_estimators=500, n_jobs=1, oob_score=False, random_state=None,verbose=0, warm_start=False)
predictions_test = clf.predict(X_test)
lbl = preprocessing.LabelEncoder()
lbl.fit(test['target'])
predictions_test = lbl.inverse_transform(predictions_test)
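If I delete the values from the target column in the test CSV, I get the output below. How can I write the predicted values to the CSV file?
array([nan, nan, nan, ..., nan, nan, nan])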
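One possible way to get string predictions, sketched under assumptions: a LabelEncoder fitted on the test target column only knows the values in that column, so an empty column gives it no strings to map back to. The sketch below keeps one encoder per string column, all fitted on the training data, and reuses the training target encoder for inverse_transform. The toy data, the column names color and size, the encoders dictionary, and the file name predictions.csv are all hypothetical stand-ins, not taken from the original files.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real train/test CSV files.
train = pd.DataFrame({
    "color": ["red", "green", "blue"] * 10,
    "size": [1, 2, 3] * 10,
    "target": ["apple", "pear", "plum"] * 10,
})
# The test data has no target values, as in the question.
test = pd.DataFrame({
    "color": ["blue", "red", "green"],
    "size": [3, 1, 2],
})

# Keep one fitted encoder per non-numeric column so that fitting the next
# column does not overwrite the mapping learned for the previous one.
encoders = {}
for col in train.columns:
    if not pd.api.types.is_numeric_dtype(train[col]):
        encoders[col] = LabelEncoder().fit(train[col].tolist())
        train[col] = encoders[col].transform(train[col].tolist())

# Encode the test features with the encoders fitted on the training data.
for col in test.columns:
    if col in encoders:
        test[col] = encoders[col].transform(test[col].tolist())

X = train.drop(columns="target")
y = train["target"]
clf = RandomForestClassifier(n_estimators=50, max_depth=6, random_state=1)
clf.fit(X, y)

# Predict integer class codes, then map them back to the original strings
# with the encoder that was fitted on the training target column.
encoded_predictions = clf.predict(test[X.columns])
test["target"] = encoders["target"].inverse_transform(encoded_predictions)

# Passing a path such as "predictions.csv" (hypothetical name) writes a file;
# with no path, to_csv returns the CSV text, which keeps this sketch free of
# side effects.
csv_text = test.to_csv(index=False)
print(csv_text)
```

Note that the feature columns in the written CSV are still integer-encoded; if the original strings are needed there too, the same encoders dictionary can be used to inverse-transform those columns before writing. Categories that appear only in the test file would make transform raise an error, which this sketch does not handle.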
【Comments】:
-
There is no prediction array. It should be predictions_test: predictions_test = lbl.inverse_transform(predictions_test)
-
@IronHandOdin That was a typo. I changed lbl.fit(y) to lbl.fit(test['target']) and it seems to work. If I delete the target column values from the test Excel file, the output is array([nan, nan, nan, ..., nan, nan, nan])
-
Could you provide a sample of the data?
-
Here is my test set. If I delete the target column from the CSV, it returns NaN: drive.google.com/file/d/1Svhc0uGrreEw_OAVipLYnMi7Skl0VV4D/…
-
@IronHandOdin I don't know whether my approach is correct. I'm a beginner.
Tags: python machine-learning scikit-learn