Handling categorical values in tuples to be predicted: answers

【Question title】: Handling categorical values in tuples to be predicted
【Posted】: 2020-05-29 09:40:20
【Question description】:

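I am building an API around a trained sklearn model. I saved the model in .joblib format and load it in the API backend before making predictions. The problem is that my data contains categorical columns, and I trained the model after one-hot encoding those columns with the get_dummies() method from the pandas library. My API receives JSON data in which the categorical column values are not encoded. How should I encode the tuple to be predicted before passing it to the model? Can anyone help? Thanks.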

The dataset I am using has the following columns before and after encoding:

Before:

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal'],
      dtype='object')

After:

Index(['age', 'sex', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'ca', 'cp_0', 'cp_1', 'cp_2', 'cp_3', 'thal_0',
       'thal_1', 'thal_2', 'thal_3', 'slope_0', 'slope_1', 'slope_2'],
      dtype='object')


Tags: python pandas machine-learning scikit-learn one-hot-encoding

【Solution 1】:

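At prediction time, run get_dummies on the incoming data, then align the result to the training columns:

df = df.reindex(columns=features, fill_value=0)

where features is the list of feature (column) names the model was trained on, in the same order. Any dummy column that the incoming data did not produce is added and filled with 0. Note that reindex returns a new DataFrame, so the result has to be assigned.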
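As a minimal sketch of how this might look for a single JSON record: the `features` list below is copied from the "After" index in the question, while the `categorical` list, the `encode_record` helper, and the sample record values are hypothetical and only there for illustration. It assumes the categorical values arrive as integers, so that the dummy column names come out as `cp_3` rather than `cp_3.0`.

```python
import pandas as pd

# Column order the model was trained on (the "After" index from the question).
# In practice this list would be saved at training time alongside the model.
features = ['age', 'sex', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
            'exang', 'oldpeak', 'ca', 'cp_0', 'cp_1', 'cp_2', 'cp_3',
            'thal_0', 'thal_1', 'thal_2', 'thal_3',
            'slope_0', 'slope_1', 'slope_2']

# Assumption: these are the columns that were one-hot encoded during training.
categorical = ['cp', 'thal', 'slope']


def encode_record(record):
    """Turn one decoded JSON record (a dict) into a one-row frame matching `features`."""
    df = pd.DataFrame([record])
    # A single row only yields one dummy column per categorical field,
    # e.g. cp=3 produces just 'cp_3'.
    df = pd.get_dummies(df, columns=categorical, dtype=int)
    # Add the dummy columns that are missing (filled with 0) and
    # put every column in the order the model expects.
    return df.reindex(columns=features, fill_value=0)


# Hypothetical request payload, already parsed from JSON.
record = {'age': 63, 'sex': 1, 'cp': 3, 'trestbps': 145, 'chol': 233,
          'fbs': 1, 'restecg': 0, 'thalach': 150, 'exang': 0,
          'oldpeak': 2.3, 'slope': 0, 'ca': 0, 'thal': 1}

row = encode_record(record)
# `row` can now be passed to the loaded model, e.g. model.predict(row).
```

One caveat with this approach: a category value that never appeared in training (say cp=7) produces a column that reindex silently drops, so all the cp_* columns end up 0 for that row. It may be worth validating incoming values before encoding.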
