What should be filters and kernel size in Conv2DTranspose?

【Question title】: What should be filters and kernel size in Conv2DTranspose?
【Posted】: 2021-03-17 10:51:21
【Question】:

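I am trying to build a simple GAN but cannot work out the right parameters. Consider the generator and discriminator code below. It produces images with height 32 and width 54.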

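# Imports are assumed to come from tensorflow.keras; num_classes must be
# defined elsewhere (the post does not show it).
from tensorflow.keras import layers
from tensorflow.keras.layers import (BatchNormalization, Conv2D,
                                     Conv2DTranspose, Dense, Dropout,
                                     Embedding, Flatten, Input, LeakyReLU,
                                     Reshape)
from tensorflow.keras.models import Model, Sequential


def build_generator(latent_size=100):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 32, 54, 3)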
    cnn = Sequential()

    cnn.add(Dense(3*54*32, input_dim=latent_size, activation='relu'))
    cnn.add(Reshape((4, 3, 432)))

    # upsample to (8, 6, ...)
    cnn.add(Conv2DTranspose(192, 2, strides=2, padding='valid',
                        activation='relu',
                        kernel_initializer='glorot_normal'))
    cnn.add(BatchNormalization())

    # upsample to (16, 18, ...)
    cnn.add(Conv2DTranspose(96, 5, strides=(2,3), padding='same',
                        activation='relu',
                        kernel_initializer='glorot_normal'))
    cnn.add(BatchNormalization())

    # upsample to (32, 54, ...)
    cnn.add(Conv2DTranspose(3, 5, strides=(2,3), padding='same',
                        activation='tanh',
                        kernel_initializer='glorot_normal'))


    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))

    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')

    cls = Embedding(num_classes, latent_size,
                    embeddings_initializer='glorot_normal')(image_class)

    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])

    fake_image = cnn(h)

    return Model([latent, image_class], fake_image)


def build_discriminator():
    # build a relatively standard conv net, with LeakyReLUs as suggested in
    # the reference paper
    cnn = Sequential()

    cnn.add(Conv2D(32, 3, padding='same', strides=2,
                   input_shape=(32, 54, 3)))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(64, 3, padding='same', strides=1))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(128, 3, padding='same', strides=2))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(256, 3, padding='same', strides=1))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Flatten())

    image = Input(shape=(32, 54, 3))

    features = cnn(image)

    # first output (name=generation) is whether or not the discriminator
    # thinks the image that is being shown is fake, and the second output
    # (name=auxiliary) is the class that the discriminator thinks the image
    # belongs to.
    fake = Dense(1, activation='sigmoid', name='generation')(features)
    aux = Dense(num_classes, activation='softmax', name='auxiliary')(features)

    return Model(image, [fake, aux])

However, I want to generate images of size (200, 200) instead of (54, 32). I have tried changing several parameters in the layers, but I always get this error:

ValueError: Input 0 of layer auxiliary is incompatible with the layer: expected axis -1 of input shape to have value 4000000 but received input with shape (None, 179200)

Which parameters should be changed to generate images of shape (200, 200)?
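The error comes from the `auxiliary` Dense layer, whose input width is the flattened size of the last convolution output and therefore depends on the input image size. A minimal sketch of that arithmetic follows, assuming Keras computes the `padding='same'` output length as ceil(size / stride); the helper names are hypothetical and not part of the post. The two numbers in the error message come from edits the post does not show, so they are not reproduced here.

```python
import math


def conv2d_same_out(size, stride):
    """Output length of a Conv2D axis with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)


def discriminator_flat_size(height, width, strides=(2, 1, 2, 1), last_filters=256):
    """Flattened feature size after the four Conv2D layers in the question.

    strides and last_filters mirror build_discriminator() as posted.
    """
    for stride in strides:
        height = conv2d_same_out(height, stride)
        width = conv2d_same_out(width, stride)
    return height * width * last_filters


print(discriminator_flat_size(32, 54))    # 8 * 14 * 256 = 28672
print(discriminator_flat_size(200, 200))  # 50 * 50 * 256 = 640000
```

If the Dense layers and the `Input` layer are built for different image sizes, these two numbers disagree, which is the kind of mismatch the error reports.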

【Comments】:

Tags: python machine-learning keras deep-learning conv-neural-network

【Solution 1】:

A simple solution is to start from a 25 × 25 feature map:

cnn.add(Dense(25*25*432, input_dim=latent_size, activation='relu'))
cnn.add(Reshape((25, 25, 432)))

Then apply three transposed convolutions, each doubling the size, to reach 25 × 2 × 2 × 2 = 200:

cnn.add(Conv2DTranspose(192, 2, strides=2, padding='valid',
                    activation='relu',
                    kernel_initializer='glorot_normal'))
cnn.add(BatchNormalization())

cnn.add(Conv2DTranspose(96, 2, strides=2, padding='valid',
                    activation='relu',
                    kernel_initializer='glorot_normal'))
cnn.add(BatchNormalization())

# output layer: tanh and no BatchNormalization, as in the question's generator
cnn.add(Conv2DTranspose(3, 2, strides=2, padding='valid',
                    activation='tanh',
                    kernel_initializer='glorot_normal'))
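The spatial sizes can be checked without building the model. The sketch below assumes the Keras rule for `Conv2DTranspose` when `output_padding` is left unset: `size * stride` for `padding='same'`, and `size * stride + max(kernel - stride, 0)` for `padding='valid'`. The helper names are hypothetical.

```python
def conv2d_transpose_out(size, kernel, stride, padding):
    """Output length along one axis of a Conv2DTranspose (output_padding unset)."""
    if padding == 'same':
        return size * stride
    if padding == 'valid':
        return size * stride + max(kernel - stride, 0)
    raise ValueError(f"unsupported padding: {padding!r}")


def trace_sizes(size, layer_specs):
    """Return the axis length before and after each (kernel, stride, padding) layer."""
    sizes = [size]
    for kernel, stride, padding in layer_specs:
        sizes.append(conv2d_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


# The three layers from the answer: kernel 2, stride 2, 'valid'.
print(trace_sizes(25, [(2, 2, 'valid')] * 3))  # [25, 50, 100, 200]

# The question's generator, height and width traced separately.
print(trace_sizes(4, [(2, 2, 'valid'), (5, 2, 'same'), (5, 2, 'same')]))  # [4, 8, 16, 32]
print(trace_sizes(3, [(2, 2, 'valid'), (5, 3, 'same'), (5, 3, 'same')]))  # [3, 6, 18, 54]
```

With kernel size equal to the stride and `'valid'` padding, each layer exactly doubles the size, so the filter count only affects the channel dimension and not the 200 × 200 output shape.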

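【Discussion】:

Hi @maggu. Thanks for your answer. I followed your instructions, but I cannot train the model. Please take a look at this colab: colab.research.google.com/drive/…. I hope sharing my colab link does not violate any rules.
Could you also check the build_discriminator() code? I am not sure whether it is correct.
So you also need to adapt the discriminator to the desired image size, in this case input_shape=(200, 200, 3) and, further down, image = Input(shape=(200, 200, 3))... Does it work then?
After making the changes you described, I get this error: "ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 32 and the array at index 1 has size 200"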
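The last error reads like a NumPy concatenation of two image batches whose heights differ (32 versus 200). One plausible cause, assuming the training loop stacks real and generated images with `np.concatenate` (the Colab notebook is not shown here, so this is a guess), is that the real training images are still 32 × 54 while the generator now emits 200 × 200. The sketch below reproduces that kind of mismatch with placeholder arrays; `stack_real_and_fake` is a hypothetical helper, not code from the post.

```python
import numpy as np


def stack_real_and_fake(real_batch, fake_batch):
    """Concatenate two image batches after checking that the image shapes agree."""
    if real_batch.shape[1:] != fake_batch.shape[1:]:
        raise ValueError(
            f"real images have shape {real_batch.shape[1:]} but generated "
            f"images have shape {fake_batch.shape[1:]}; resize the dataset "
            "or change the generator so they agree"
        )
    return np.concatenate((real_batch, fake_batch))


# Placeholder batches standing in for dataset images and generator output.
real_32x54 = np.zeros((4, 32, 54, 3), dtype=np.float32)
real_200 = np.zeros((4, 200, 200, 3), dtype=np.float32)
fake_200 = np.zeros((4, 200, 200, 3), dtype=np.float32)

try:
    np.concatenate((real_32x54, fake_200))
except ValueError as err:
    print("concatenate failed:", err)

print(stack_real_and_fake(real_200, fake_200).shape)  # (8, 200, 200, 3)
```

If this is the cause, the dataset images would need to be resized to 200 × 200 before training, in addition to changing the generator and discriminator shapes.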