Random Forest algorithm as an input in Python [duplicate]: answers

【Question title】: Random Forest algorithm as an input in Python [duplicate]
【Posted】: 2022-01-20 00:31:09
【Question description】:

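I built, trained, and saved a random forest (RF) model in Python with the following features:

Number of deleted files (integer)
Path (string)
Severity (integer), the label to be predicted

Since scikit-learn does not handle strings, I transformed the data with CountVectorizer. How can I take a path entered by the user (a string) and convert it into the same format the saved model was trained on, so that I can predict the severity? Note that predicting directly with the string, print(clf.predict([[5, '/some/path']])), raises this error: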

ValueError: Iterable over raw text documents expected, string object received.
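
A minimal sketch of where this ValueError comes from, assuming a few hypothetical training paths: CountVectorizer.transform rejects a bare string and expects an iterable of documents, such as a one-element list.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training paths, for illustration only
train_paths = ["/var/log/app", "/home/user/docs", "/var/tmp"]
vectorizer = CountVectorizer()
vectorizer.fit(train_paths)

error_message = ""
try:
    # A bare string is rejected: transform() wants an iterable of documents
    vectorizer.transform("/some/path")
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Wrapping the single path in a list yields one row of token counts
features = vectorizer.transform(["/some/path"])
print(features.shape)
```

Because neither "some" nor "path" occurs in these training paths, the resulting row is all zeros: CountVectorizer silently ignores tokens it did not see during fitting.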

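【Question discussion】:

No, both solutions produce a different error: "TypeError: float() argument must be a string or a number, not 'CountVectorizer'"
Then please open a new question with a complete minimal reproducible example, explain that these solutions do not work (and link it here).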
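
The TypeError quoted in the first comment suggests that the CountVectorizer object itself, rather than its transformed output, ended up inside the feature row. A minimal sketch reproducing that message with NumPy, assuming scikit-learn's input validation performs a similar conversion to float: any non-numeric object in the row makes the conversion fail.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()

error_message = ""
try:
    # The estimator object sits where a number is expected, so float() fails on it
    np.asarray([[5, vectorizer]], dtype=float)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording differs slightly between Python versions ("a number" vs. "a real number"), but the object type named at the end is the clue.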

Tags: python python-3.x machine-learning scikit-learn random-forest

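【Solution 1】:

If your model was trained on the transformed paths (i.e. transformed with CountVectorizer), you need to apply the same transformation at inference time as well. So it should look something like this: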

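from scipy.sparse import csr_matrix, hstack

# Reuse the CountVectorizer you fitted on the training paths, e.g.
# vectorizer.fit(X_train_paths). A fresh CountVectorizer() is unfitted and cannot transform.

# transform() expects an iterable of documents, so wrap the single path in a list
path_features = vectorizer.transform(['/some/path'])

# Stack the numeric feature and the path features into one row, in the same
# column order used for training (assumed here: deleted-file count first, then path tokens)
X_new = hstack([csr_matrix([[5]]), path_features], format='csr')
print(clf.predict(X_new))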
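
As an alternative to transforming the input by hand (not what the answer above does), one option is to bundle the vectorizer and the random forest into a scikit-learn Pipeline with a ColumnTransformer and save that single object. This is a sketch under assumptions: the column names deleted_files and path, the tiny dataset, and the file name are all hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical training data; column names and values are made up for illustration
train = pd.DataFrame({
    "deleted_files": [1, 12, 3, 40, 0, 25],
    "path": ["/var/log/app", "/etc/passwd", "/home/user/docs",
             "/etc/shadow", "/tmp/cache", "/etc/ssh"],
    "severity": [0, 2, 0, 2, 0, 1],
})
feature_columns = ["deleted_files", "path"]

preprocess = ColumnTransformer([
    # A single column name (a string, not a list) hands CountVectorizer 1-D text
    ("path_tokens", CountVectorizer(), "path"),
    # The integer feature is passed through unchanged
    ("counts", "passthrough", ["deleted_files"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(train[feature_columns], train["severity"])

# Persist the whole pipeline, so the fitted vectorizer is saved together with the forest
with tempfile.TemporaryDirectory() as tmp_dir:
    model_file = os.path.join(tmp_dir, "severity_model.joblib")
    joblib.dump(model, model_file)
    loaded = joblib.load(model_file)

# Raw user input, using the same column names as in training
user_input = pd.DataFrame({"deleted_files": [5], "path": ["/some/path"]})
prediction = loaded.predict(user_input)
print(prediction)
```

With this design the raw path goes straight into predict(), and the preprocessing used at inference cannot drift from the one used in training, because both live in the same saved object.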

【Discussion】: