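【Posted】: 2020-07-12 13:47:17
【Question】:
I'm not sure at what point I should apply scaling to my data, or how I should do it. Also, is the process the same for supervised and unsupervised learning, and is it the same for regression, classification, and neural networks?
First way: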
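import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("mydata.csv")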
features = df.iloc[:,:-1]
results = df.iloc[:,-1]
scaler = StandardScaler()
features = scaler.fit_transform(features)
x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.3, random_state=0)
Second way:
df = pd.read_csv("mydata.csv")
features = df.iloc[:,:-1]
results = df.iloc[:,-1]
scaler = StandardScaler()
x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.3, random_state=0)
x_train = scaler.fit_transform(x_train)
x_test = scaler.fit_transform(x_test)
Third way:
df = pd.read_csv("mydata.csv")
features = df.iloc[:,:-1]
results = df.iloc[:,-1]
scaler = StandardScaler()
x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.3, random_state=0)
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
Or maybe a fourth way?
Also, I have some samples I want to predict on that are not in df. How should I handle that data? Should I do:
samples = scaler.fit_transform(samples)
or:
samples = scaler.transform(samples)
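To make the difference between these two calls concrete, here is a minimal sketch on made-up numbers. The arrays below are hypothetical and not taken from mydata.csv. It only shows what each call computes and is not meant as a definitive answer for every model type: `transform` reuses the mean and standard deviation learned from the training data, while `fit_transform` re-estimates them from the new samples themselves.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training features (one column), standing in for x_train.
x_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
# Hypothetical new samples that were never part of df.
samples = np.array([[10.0], [12.0]])

scaler = StandardScaler()
scaler.fit(x_train)  # learns mean_ and scale_ from the training data only

# transform: applies the training mean/std, so the samples land on the
# same scale the model saw during training.
reused = scaler.transform(samples)

# fit_transform on a fresh scaler: mean/std are recomputed from the two
# samples alone, so the result no longer relates to the training scale.
refit = StandardScaler().fit_transform(samples)

print("training mean/std:", scaler.mean_, scaler.scale_)
print("transform(samples):     ", reused.ravel())
print("fit_transform(samples): ", refit.ravel())
```

In this toy case the refit values are centered on zero regardless of how far the samples sit from the training data, which is the information a model trained on scaled x_train would lose.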
【Discussion】:
Tags: python machine-learning keras scikit-learn