Yes, you can! However, field C must be correlated with the other columns. If it is not, the predictions will be close to random.
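Before training, it is worth checking how strongly C relates to the other columns. The sketch below uses a hypothetical synthetic DataFrame (C is built from A and B; D and E are pure noise) and pandas' `DataFrame.corr()`. Note that the Pearson correlation only measures linear relationships, so a value near zero does not rule out a nonlinear dependency.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for your real columns A-E:
# C depends on A and B, while D and E are unrelated noise.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.normal(size=n),
    "D": rng.normal(size=n),
    "E": rng.normal(size=n),
})
df["C"] = 2 * df["A"] - df["B"] + rng.normal(scale=0.5, size=n)

# Pearson correlation of C with every other column
corr_with_c = df.corr()["C"].drop("C")
print(corr_with_c)
```

With your real data, replace the synthetic DataFrame with the one you load from your file.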
- Train the model with A, B, D, E as the inputs (x)
- Make C the target (y)
Split the dataset into training, validation, and test sets.
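The steps above can be sketched with scikit-learn's `train_test_split`, called twice to get a 60/20/20 split. The random `dataset` array is a hypothetical stand-in for your data, assuming columns 0 to 4 hold A, B, C, D, E; the split ratios are only an example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for your data: 100 rows, columns A, B, C, D, E
rng = np.random.default_rng(0)
dataset = rng.normal(size=(100, 5))

X = dataset[:, [0, 1, 3, 4]]  # inputs: A, B, D, E
y = dataset[:, 2]             # target: C

# First hold out 20% of the rows as the final test set ...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ... then take 25% of the remainder (20% of the total) as the validation set
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(X_train.shape, X_val.shape, X_test.shape)
```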
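To answer your other question (would it still work if I did not train my model with this variable?):
- No. If C is never used as the training target, the model has no way to learn how to map the four input fields to the single output field, which in this case is C.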
To understand this better, compare your approach with the classic Boston Housing regression example.
import pandas as pd
import numpy as np
# Read dataset into X and Y
df = pd.read_csv('YOURDATASET.csv', sep=r'\s+', header=None)
dataset = df.values
# Assume each row is one sample and columns 0-4 hold the features A, B, C, D, E
X = dataset[:, [0, 1, 3, 4]]  # inputs: A, B, D, E
Y = dataset[:, 2]             # target: C
# print("X:", X)
# print("Y:", Y)
# Define the neural network
from keras.models import Sequential
from keras.layers import Dense
def build_nn():
    model = Sequential()
    # 4 inputs (A, B, D, E) feeding one hidden layer of 20 ReLU units
    model.add(Dense(20, input_dim=4, kernel_initializer='normal', activation='relu'))
    # No activation needed in the output layer (because this is regression)
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile the model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
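# Evaluate the model (k-fold cross-validation)
# keras.wrappers.scikit_learn exists in Keras 2.x; newer Keras releases dropped it,
# and the separate scikeras package provides a replacement KerasRegressor
from keras.wrappers.scikit_learn import KerasRegressor
# sklearn imports:
from sklearn.model_selection import cross_val_score, KFold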
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# Standardise the inputs before feeding them into the network, because the input variables are on different scales
estimators = []
estimators.append(('standardise', StandardScaler()))
estimators.append(('multiLayerPerceptron', KerasRegressor(build_fn=build_nn, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)
# KerasRegressor.score returns the negative MSE loss, so these scores are negative
print("Mean:", results.mean())
print("StdDev:", results.std())
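If the Keras scikit-learn wrapper is not available in your environment, the same standardise, regress, and 10-fold cross-validate pipeline can be sketched with scikit-learn alone. This swaps in `MLPRegressor` (one hidden layer of 20 ReLU units, loosely mirroring `build_nn`); it is not the Keras model above. The synthetic data, where C is a noisy linear combination of A, B, D, E, is hypothetical, and the hyperparameters (`learning_rate_init`, `max_iter`) are illustrative guesses.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Silence "maximum iterations reached" warnings for this small demo
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Hypothetical synthetic data: columns A, B, C, D, E, where C depends on the others
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 4))  # A, B, D, E
c = features @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=n)
dataset = np.column_stack([features[:, :2], c, features[:, 2:]])  # A, B, C, D, E

X = dataset[:, [0, 1, 3, 4]]  # inputs: A, B, D, E
Y = dataset[:, 2]             # target: C

# Standardise the inputs, then fit a one-hidden-layer MLP regressor
pipeline = Pipeline([
    ("standardise", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(20,), activation="relu",
                         learning_rate_init=0.01, max_iter=2000, random_state=0)),
])

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
# For regressors, cross_val_score defaults to the R^2 score (higher is better)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Mean R^2:", results.mean())
print("StdDev:", results.std())
```

Unlike the Keras version, these scores are R² values rather than negative MSE, so they are not directly comparable with the numbers printed above.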