[Posted]: 2021-06-25 20:11:47
[Question]:
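I am building a neural network with Keras and using a scikit-learn pipeline for preprocessing. So far I have been able to build the pipeline and an initial (very basic) model architecture, but I run into problems when I combine the two. The pipeline works with other machine learning models; the problem only shows up with deep learning.
I keep getting the following ValueError: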
ValueError: Input 0 of layer sequential_53 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape (None, 49)
When I update the model's input_dim to resolve the initial error, I get a similar error after the first epoch finishes running:
Epoch 1/100
283/300 [===========================>..] - ETA: 0s - loss: 0.5751 - binary_accuracy: 0.7925
---------------------------------------------------------------------------
ValueError Traceback (most recent call last) in
      1 # Fit Model
----> 2 history = pipeline.fit(X_train, y_train)
ValueError: Input 0 of layer sequential_54 is incompatible with the layer: expected axis -1 of input shape to have value 49 but received input with shape (None, 5)
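A likely reading of the two errors (not confirmed anywhere in this thread): the raw frame has 5 columns, but one-hot encoding inside the ColumnTransformer widens it to 49, so a model sized from X.shape[-1] no longer matches what the pipeline feeds it. The sketch below reproduces the same effect on a small toy frame. The frame X_demo and its column names are made up for illustration; the preprocessing steps mirror the ones in the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: 2 numeric columns and 1 categorical column with 4 levels.
X_demo = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 58.0],
    "income": [30.0, 52.0, 41.0, 77.0],
    "city": pd.Categorical(["a", "b", "c", "d"]),
})

numeric_features = list(X_demo.select_dtypes(include=["number"]))
categorical_features = list(X_demo.select_dtypes(include=["category"]))

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("normalize", MinMaxScaler(feature_range=(0, 1))),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])

n_raw = X_demo.shape[1]                                # width before preprocessing
n_encoded = preprocessor.fit_transform(X_demo).shape[1]  # width the model actually sees

print(f"raw columns: {n_raw}, columns after preprocessing: {n_encoded}")
```

Under the same reading, the second error would come from validation_data=(X_val, y_val): that tuple goes straight to Keras without passing through the preprocessor, so it still has the raw width when validation runs at the end of the first epoch.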
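What is the best way to embed a Keras neural network in a scikit-learn pipeline (one that needs to one-hot encode categorical variables)?
The code is summarized below: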
# Preprocessing Pipeline
numeric_features = list(X.select_dtypes(include=['number']))
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('normalize', MinMaxScaler(feature_range=(0,1)))])
categorical_features = list(X.select_dtypes(include=['category']))
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)]
)
# Split Data into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)
# Split Training Data into Training and Validation Sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42, shuffle=True)
def CreateModel():
    # Define Model
    model = Sequential([
        layers.Dense(units=32, activation='relu', input_dim=X.shape[-1]),
        layers.Dense(units=16, activation='relu'),
        layers.Dense(units=1, activation='sigmoid')
    ])
    # Specify Optimizer
    optimizer = optimizers.Adam(epsilon=0.01)
    # Compile the Model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['binary_accuracy'])
    return model
# Add Early Stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=10, min_delta=0.001, restore_best_weights=True)
# Instantiate Baseline Classification Models
clf = KerasClassifier(build_fn=CreateModel, verbose=1, epochs=100, batch_size=16, validation_data=(X_val, y_val), callbacks=[early_stopping])
# Fit to the training set
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', clf)
])
# Fit Model
history = pipeline.fit(X_train, y_train)
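One possible workaround, sketched under the assumption that the shape errors come from the one-hot expansion: fit the preprocessor on the training split first, transform the validation split with it, and size the input layer from the transformed width. This drops KerasClassifier and trains the Keras model directly, so it is not a drop-in sklearn Pipeline. The helper names transform_splits, build_model and fit_with_validation are hypothetical; the layer sizes and training settings are copied from the code above.

```python
import numpy as np
from sklearn.base import clone


def transform_splits(preprocessor, X_train, X_val):
    """Fit a copy of the preprocessor on the training split, then transform both splits."""
    fitted = clone(preprocessor)
    Xt_train = fitted.fit_transform(X_train)
    Xt_val = fitted.transform(X_val)
    # A ColumnTransformer with a OneHotEncoder may return a SciPy sparse matrix;
    # convert to dense arrays before handing them to Keras.
    if hasattr(Xt_train, "toarray"):
        Xt_train, Xt_val = Xt_train.toarray(), Xt_val.toarray()
    return fitted, np.asarray(Xt_train, dtype="float32"), np.asarray(Xt_val, dtype="float32")


def build_model(n_features):
    """Same architecture as CreateModel, but sized from the preprocessed width."""
    # Imported here so transform_splits can be used without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(epsilon=0.01),
        loss="binary_crossentropy",
        metrics=["binary_accuracy"],
    )
    return model


def fit_with_validation(preprocessor, X_train, y_train, X_val, y_val):
    """Preprocess both splits, then train with early stopping on the validation loss."""
    from tensorflow import keras

    fitted, Xt_train, Xt_val = transform_splits(preprocessor, X_train, X_val)
    model = build_model(Xt_train.shape[1])
    early_stopping = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, min_delta=0.001, restore_best_weights=True
    )
    history = model.fit(
        Xt_train, np.asarray(y_train),
        validation_data=(Xt_val, np.asarray(y_val)),
        epochs=100, batch_size=16, callbacks=[early_stopping], verbose=1,
    )
    return fitted, model, history
```

With the variables from the question this would be called as fit_with_validation(preprocessor, X_train, y_train, X_val, y_val). To predict on new data, pass it through the returned fitted preprocessor first, for example model.predict(fitted.transform(X_test)), converting to a dense array if the result is sparse.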
[Comments]:
- As far as I know, there is no perfect solution for this. Several options are discussed at stackoverflow.com/q/59755378/10495893
Tags: python tensorflow keras scikit-learn one-hot-encoding