Keras multi-input network using images and structured data: how do I build the correct input data? (Answers)

【Question title】: Keras Multi Input Network, using Images and structured data: How do I build the correct input data?
【Posted】: 2020-11-09 19:49:12
【Question】:

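I am building a multi-input network with the Keras functional API, but I am struggling to find and understand the correct format for the input data.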

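I have two main inputs:

The first is an image, which goes through a fine-tuned ResNet50 CNN.
The second is a simple numpy array (X_train) containing metadata about the image (its position and size). This goes through a simple dense network.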

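I load the images from a dataframe that contains the metadata and the file path of the corresponding image. I use ImageDataGenerator and its flow_from_dataframe method to load the images: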

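from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_flow = datagen.flow_from_dataframe(
    dataframe=df_train,
    x_col="cropped_img_filepath",
    y_col="category",
    batch_size=batch_size,
    shuffle=False,
    class_mode="categorical",
    target_size=(224, 224),
)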

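I can train each of the two networks separately on its own data, so up to this point there is no problem.
The outputs of the two networks are then concatenated and fed into a dense network that outputs a probability vector of length 10: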

# Create the input for the final dense network using the output of both the dense MLP and CNN
combinedInput = concatenate([cnn.output, mlp.output])

x = Dense(512, activation="relu")(combinedInput)
x = Dense(256, activation="relu")(x)
x = Dense(128, activation="relu")(x)
x = Dense(32, activation="relu")(x)
x = Dense(10, activation="softmax")(x)



model = Model(inputs=[cnn.input, mlp.input], outputs=x)

# Compile the model 
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="categorical_crossentropy",
              metrics=['accuracy'],
              optimizer=opt)

# Train the model
model_history = model.fit(x=(train_flow, X_train), 
                          y=y_train, 
                          epochs=1, 
                          batch_size=batch_size)

However, when I try to train the whole network, I get the following error:

ValueError: Failed to find data adapter that can handle input: ( containing values of types {"", ""}),

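I understand that I am not using the correct format for my input data.
I can train my CNN with train_flow and my dense network with X_train, so I was hoping this would work.

Do you know how to combine the image data and the numpy array into a single multi-input?

Thanks for any information you can provide!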

【Question comments】:

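Could you describe how you created X_train? From the error, it does not look like a NumPy array.
Thanks for the quick reply! It is indeed a numpy.ndarray, which I assume is the same thing as an array. I built it by calling pandas.DataFrame.to_numpy() on my df_train DataFrame.
I think you need a custom generator that yields x1, x2, y. Have a look at this link: github.com/keras-team/keras/issues/8130#issuecomment-336855177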
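To illustrate the idea in the last comment, here is a minimal sketch of a generator that yields x1, x2 and y together. It uses only NumPy and in-memory arrays; the function name paired_batches and the stand-in data are hypothetical and are not taken from the linked issue or from the asker's code.

```python
import numpy as np

def paired_batches(images, metadata, labels, batch_size):
    """Yield ([image_batch, metadata_batch], label_batch) forever, keeping rows aligned."""
    n = len(images)
    assert len(metadata) == n and len(labels) == n, "all inputs need the same number of rows"
    while True:
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)  # the last batch of a pass may be smaller
            # The same slice is applied to every array, so row i of each batch
            # always refers to the same sample.
            yield [images[start:end], metadata[start:end]], labels[start:end]

# Hypothetical stand-in data: 5 tiny "images", 4 metadata values each, 10 classes
images = np.zeros((5, 8, 8, 3), dtype="float32")
metadata = np.arange(20, dtype="float32").reshape(5, 4)
labels = np.eye(10, dtype="float32")[[0, 1, 2, 3, 4]]

gen = paired_batches(images, metadata, labels, batch_size=2)
(x_img, x_meta), y = next(gen)
```

Because a generator like this never ends, it would normally be passed to model.fit together with steps_per_epoch, so that Keras knows where one epoch stops.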

Tags: python tensorflow keras input deep-learning

【Solution 1】:

I finally found a way to do it, inspired by the post that @Nima Aghli suggested.
Here is how I did it:

First, set up the preprocessing function (in my case, the one for ResNet50):

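import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess_function(x):
    # Add a batch axis when a single (rank-3) image is passed in
    if x.ndim == 3:
        x = x[np.newaxis, :, :, :]
    return preprocess_input(x)

# Initializing the datagen, using the above function:
datagen = ImageDataGenerator(preprocessing_function=preprocess_function)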

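Then define the custom data generator. It yields randomly sampled batches that pair the images with their metadata, and it never runs out of data (so you can train for any number of epochs):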

def createGenerator(dff, verif=False, batch_size=BATCH_SIZE):

    # Shuffles the dataframe once, and so the batches as well
    dff = dff.sample(frac=1)

    # shuffle=False is EXTREMELY important to keep the order of images and coords
    flow = datagen.flow_from_dataframe(
        dataframe=dff,
        directory=None,
        x_col="cropped_img_filepath",
        y_col="category",
        batch_size=batch_size,
        shuffle=False,
        class_mode="categorical",
        target_size=(224, 224),
        seed=42,
    )
    idx = 0
    batch = 0
    while True:
        # Get the next batch of (images, labels)
        X1 = next(flow)
        # Index to reach; the last batch of a pass can be smaller than batch_size
        end = idx + X1[0].shape[0]
        # Get the matching rows of metadata from the dataframe
        X2 = dff[["x", "y", "w", "h"]].iloc[idx:end].to_numpy()
        dff_verif = dff.iloc[idx:end]
        # Update the index for the next batch
        idx = end
        # print("batch nb : ", batch, ",   batch_size : ", X1[0].shape[0])
        batch += 1
        # Back to the start once the whole dataframe has been consumed
        if idx == len(dff):
            # print("END OF THE DATAFRAME\n")
            idx = 0

        # Yield the image, metadata and target batches
        if verif:
            yield [X1[0], X2], X1[1], dff_verif
        else:
            yield [X1[0], X2], X1[1]  # images and metadata, plus their shared labels

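I deliberately left the comments in, since they help to follow what each step does.
The main difficulty was to draw images from the whole dataframe without ever skipping any, while keeping the image and metadata batches the same size.
We also have to be careful about the order of the images and the metadata, so that each row of metadata ends up attached to the right image in the yielded arrays.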
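The index bookkeeping described above can be checked in isolation. The following is a small pure-Python sketch, not part of the original answer: the helper name batch_bounds and the numbers are hypothetical, and min() stands in for the shorter last batch that the image iterator returns at the end of a pass.

```python
import math

def batch_bounds(n_rows, batch_size, n_batches):
    """Return the (start, end) row slices used for the first n_batches batches."""
    bounds = []
    idx = 0
    for _ in range(n_batches):
        # The last batch of a pass stops at the end of the data
        end = min(idx + batch_size, n_rows)
        bounds.append((idx, end))
        # Wrap around once every row has been served
        idx = 0 if end == n_rows else end
    return bounds

n_rows, batch_size = 10, 4
# Number of batches needed to see every row exactly once
steps_per_epoch = math.ceil(n_rows / batch_size)
bounds = batch_bounds(n_rows, batch_size, steps_per_epoch + 1)
```

With 10 rows and a batch size of 4, one pass takes 3 batches, the third one holding only 2 rows, and the fourth batch starts again at row 0. A value computed like steps_per_epoch here is what one would typically pass to model.fit alongside the generator, assuming every row of the dataframe points to a valid image file.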

【Comments】: