How to scale a single sample for prediction in sklearn? Answers

【Question Title】: How to scale single sample for prediction in sklearn?
【Posted】: 2019-11-25 19:13:46
【Question Description】:

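I have a set of scaled data that I have already fitted a regression model on.

When a single new sample comes in for prediction, how are you supposed to scale that input before predicting?

I could concat it to the original dataframe, rescale, and extract the bottom row. But wouldn't that cause data leakage? Right? And I would have to refit the model too?

What is the correct way to handle this situation?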

【Question Discussion】:

Tags: python scikit-learn

【Solution 1】:

You should scale the test data with the scaler you already fitted on the training data.
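For the single-sample case in the question, a minimal sketch looks like this. It assumes a `StandardScaler` and made-up toy values, not the asker's data. The one extra detail is that sklearn transformers expect a 2-D array of shape `(n_samples, n_features)`, so a single sample has to be reshaped into one row first:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data: 2 samples, 2 features (illustrative values only)
X_train = np.array([[1.0, 10.0],
                    [3.0, 30.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # learns mean_ = [2, 20] and scale_ = [1, 10] from the training data only

# A single new sample arrives as a 1-D array of feature values
new_sample = np.array([4.0, 40.0])

# Transformers expect shape (n_samples, n_features), so turn the sample into one row
new_sample_scaled = scaler.transform(new_sample.reshape(1, -1))
print(new_sample_scaled)  # [[2. 2.]]
```

The scaled row can then be passed straight to the already-trained model's `predict`, with no refitting of either the scaler or the model.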

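Inserting the row into the original dataframe and rescaling is not the right approach. It causes data leakage, because the scaling would then depend on data the model never saw during training, and in production you will not have the real incoming data available in advance to fit on.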

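Likewise, if you receive several such samples and decide to refit the scaler on this new data, that is considered bad practice and also causes data leakage. You should only use your original fitted scaler.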
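To make the leakage concrete, here is a small sketch with one toy feature (the numbers are invented for illustration). It compares the fitted scaler with the concat-and-refit approach from the question. Refitting moves the mean and standard deviation, so both the new sample and the original training rows end up on a different scale than the one the model was trained on:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [3.0]])   # toy training feature
new_sample = np.array([[5.0]])       # one incoming sample, already 2-D

# Correct: fit once on the training data, then only transform new data
scaler = StandardScaler().fit(X_train)
correct = scaler.transform(new_sample)           # (5 - 2) / 1 = 3.0

# Anti-pattern from the question: concatenate, refit, take the bottom row
combined = np.vstack([X_train, new_sample])
refit_scaler = StandardScaler().fit(combined)
leaky = refit_scaler.transform(combined)[-1:]    # mean/std now include the new sample

print(correct)  # [[3.]]
print(leaky)    # roughly [[1.2247]]

# The training rows are rescaled differently too, so a model fitted on the
# original scaling would now receive inputs on a scale it was not trained on
print(scaler.transform(X_train).ravel())        # [-1.  1.]
print(refit_scaler.transform(X_train).ravel())  # roughly [-1.2247  0.]
```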

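What I find interesting is that if your training and test data have different distributions, then no matter which scaling strategy you choose, it will not work well on the test data. Here is a useful
link that describes the problem and possible solutions.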

Here is an example of scaling your training and test data, reproduced from - here

from sklearn.preprocessing import RobustScaler
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

## load the dataset
dataset = fetch_california_housing()
X_full, y_full = dataset.data, dataset.target
##split into train and test
X_train,X_test,y_train,y_test = train_test_split(X_full,y_full)


## initialize the scaler
scale = RobustScaler()

### you are fitting the scaler and then transforming the data
## the scaler looks at the data in the train set and creates a model
## which will be used to transform the data
X_train_scaled = scale.fit_transform(X_train)
print(X_train)
print(X_train_scaled)


#### the scaler has been fitted once; from now on, use it as-is
### on all test/prediction data that comes in.
### That is why the line below only calls transform on the data;
### fitting it again would mean data leakage
X_test_scale = scale.transform(X_test)
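Because the same fitted scaler has to be reused on every incoming sample, in practice it is usually saved together with the model. Below is a minimal sketch using `joblib`, which is installed alongside scikit-learn. The file name and the random stand-in data are assumptions for illustration only:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import RobustScaler

# Stand-in training data (random, for illustration only)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))

scaler = RobustScaler().fit(X_train)

# Save the fitted scaler next to the model; "scaler.joblib" is a hypothetical file name
path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
joblib.dump(scaler, path)

# Later, e.g. in a prediction service: load it and only ever call transform
loaded_scaler = joblib.load(path)
single_sample = np.array([0.5, -1.0, 2.0]).reshape(1, -1)
print(loaded_scaler.transform(single_sample))
```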

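【Discussion】:

But how do I use the model to scale the test data?
@LewisMorris Please see my answer below
@LewisMorris - I've added code with comments to give you an example; hopefully that clears it up
This is awesome!! I get it now.... You fit the scaler on the training data, and then you can transform any data before passing it to the predict method. Thank you so much. I'm forever grateful that you helped me this much.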

【Solution 2】:

This example uses `MinMaxScaler` to scale the data, but the same principle applies in all cases.

Summary of the process:

Step 1: Fit the scaler on the TRAINING data
Step 2: Use the scaler to transform the training data
Step 3: Use the transformed training data to fit the predictive model
Step 4: Use the scaler to transform the TEST data
Step 5: Predict using the trained model and the transformed TEST data
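The five steps above can also be bundled with sklearn's `Pipeline` (here built via `make_pipeline`), so that `fit` performs steps 1 to 3 and `predict` performs steps 4 and 5 without the scaler ever being refitted. This is an alternative to the manual approach in this answer, sketched on the same iris split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 1-3: fit() fits the scaler on X_train, transforms X_train, then fits SVC on the result
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)

# Steps 4-5: predict() only calls transform() on the fitted scaler, never fit()
y_pred = pipe.predict(X_test)

# A single sample goes through the same path, as long as it is 2-D
single_pred = pipe.predict(X_test[0].reshape(1, -1))
print(y_pred[:5], single_pred)
```

The single-sample call gives the same prediction as the first row of the batch, since both the scaling and the SVC prediction are applied row by row.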

Example using the iris data:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

data = datasets.load_iris()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train) # fit it on the training data

model = SVC()
model.fit(X_train_scaled, y_train)

X_test_scaled = scaler.transform(X_test) # apply it on the test data
y_pred = model.predict(X_test_scaled) # model prediction on the scaled test set
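One thing that sometimes prompts people to refit: `MinMaxScaler` keeps the training minimum and maximum, so a new sample outside the training range is not squeezed into [0, 1]. A toy sketch (values invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[0.0], [10.0]])   # toy training feature with range [0, 10]
scaler = MinMaxScaler().fit(X_train)

# A new sample outside the training range maps outside [0, 1]
new_sample = np.array([[15.0]])
print(scaler.transform(new_sample))   # [[1.5]]
```

An out-of-range value like this is expected behaviour and is not, by itself, a reason to refit the scaler.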

Hope this helps. Cheers

【Discussion】:
