【Question title】: What are the minimum input sizes for VGG-16 and ResNet, and can I change them?
【Posted】: 2020-08-28 00:14:50
【Question】:

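I'm working on a small project and want to feed arrays of shape (999, 13, 1) into both networks, but passing them as input raises an exception saying that one of the layers requires an input of at least 32x32x3. I'd like to know whether I can modify the Keras implementations of VGG-16 and ResNet to accept smaller, different inputs (assuming that is even worth doing rather than building from scratch), or whether there is a minimum acceptable input size I have to respect.

Let me explain in more detail: the input consists of mel-frequency cepstral coefficient (MFCC) features extracted from several audio files. 999 corresponds to the 10 seconds of data I extract, 13 is the number of cepstral coefficients I keep, and 1 holds the value of that particular coefficient. VGG16 expects RGB images (at least as far as I know), so I could copy the last axis three times and get an "image" of shape (999, 13, 3). The problem is that using 32 cepstral coefficients raises a lot of OOM errors, because the input becomes too large for the VGG layers to compute. Shortening the recording (from 999 to a lower number) would weaken my model's predictions.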
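The channel-copy idea described above can be sketched with NumPy as follows. The random array is a hypothetical stand-in for one MFCC example, and the zero-padding step is an assumption added for illustration: it only shows one way to reach the 32-wide minimum quoted in the exception, and nothing in the post indicates that it helps accuracy.

```python
import numpy as np

# Hypothetical stand-in for one MFCC example: (999, 13, 1), as in the question.
mfcc = np.random.rand(999, 13, 1).astype("float32")

# Copy the single channel three times to mimic an RGB image -> (999, 13, 3).
rgb_like = np.repeat(mfcc, 3, axis=-1)

# Assumption: zero-pad the coefficient axis up to the 32-wide minimum
# mentioned in the exception. The time axis (999) is already large enough.
min_side = 32
pad_cols = max(0, min_side - rgb_like.shape[1])
padded = np.pad(rgb_like, ((0, 0), (0, pad_cols), (0, 0)), mode="constant")

print(rgb_like.shape)  # (999, 13, 3)
print(padded.shape)    # (999, 32, 3)
```

Note that padding makes the input larger, so it does nothing for the OOM errors mentioned above; it only addresses the minimum-size check.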

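【Comments】:

  • Is 999, 13, 1 the shape of your image?
  • So it's 999, 13, 3?
  • Your data suggests it is better suited to a Conv1D model than to Conv2D-based models such as VGG or ResNet.
  • Technically, a custom VGG16 can be designed with Conv1D. Would that help?
  • Check the updated answer. Maybe you could change your title to something like "VGG16 implementation for spectrograms or 1D data" so that it is useful to other users later.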
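For the Conv1D route suggested in these comments, each example needs the shape (steps, features) rather than (steps, features, 1). A minimal sketch, assuming a hypothetical batch of four zero-filled examples, is to drop the trailing singleton axis:

```python
import numpy as np

# Hypothetical batch of 4 examples shaped (999, 13, 1), as in the question.
batch = np.zeros((4, 999, 13, 1), dtype="float32")

# Drop the trailing singleton axis so each example is (steps, features),
# which is the per-example layout a Conv1D layer expects.
batch_1d = np.squeeze(batch, axis=-1)

print(batch_1d.shape)  # (4, 999, 13)
```

Passing `axis=-1` explicitly makes `np.squeeze` raise an error if the last axis is not of size 1, instead of silently removing some other axis.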

Tags: python keras resnet vgg-net


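【Solution 1】:

Here is a VGG16-style implementation for your spectrogram, built from Conv1D layers. Your input should have shape (999, 13), where 999 is the time dimension and 13 is the number of filters.

You can change some of the intermediate parameters as needed.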

# Import `layers` and `models` explicitly; the function below refers to both by name.
from tensorflow.keras import layers, models


def VGG16_1d(classes=3):
    # Input: 999 time steps with 13 features per step.
    img_input = layers.Input((999, 13))
    # Block 1
    x = layers.Conv1D(64, 3,
                      activation='relu',
                      padding='same',
                      name='block1_conv1')(img_input)
    x = layers.Conv1D(64, 3,
                      activation='relu',
                      padding='same',
                      name='block1_conv2')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block1_pool', padding='same')(x)

    # Block 2
    x = layers.Conv1D(128, 3,
                      activation='relu',
                      padding='same',
                      name='block2_conv1')(x)
    x = layers.Conv1D(128, 3,
                      activation='relu',
                      padding='same',
                      name='block2_conv2')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block2_pool', padding='same')(x)

    # Block 3
    x = layers.Conv1D(256, 3,
                      activation='relu',
                      padding='same',
                      name='block3_conv1')(x)
    x = layers.Conv1D(256, 3,
                      activation='relu',
                      padding='same',
                      name='block3_conv2')(x)
    x = layers.Conv1D(256, 3,
                      activation='relu',
                      padding='same',
                      name='block3_conv3')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block3_pool', padding='same')(x)

    # Block 4
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block4_conv1')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block4_conv2')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block4_conv3')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block4_pool', padding='same')(x)

    # Block 5
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block5_conv1')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block5_conv2')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block5_conv3')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block5_pool', padding='same')(x)

    # Classification block
    x = layers.Flatten(name='flatten')(x)
    x = layers.Dense(128, activation='relu', name='fc1')(x) # reduced dim for 1-d task
    x = layers.Dense(128, activation='relu', name='fc2')(x)
    x = layers.Dense(classes, activation='softmax', name='predictions')(x)


    # Create model.
    model = models.Model(img_input, x, name='vgg16')
    return model

model = VGG16_1d(3)
model.summary()

Output:

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 999, 13)]         0         
_________________________________________________________________
block1_conv1 (Conv1D)        (None, 999, 64)           2560      
_________________________________________________________________
block1_conv2 (Conv1D)        (None, 999, 64)           12352     
_________________________________________________________________
block1_pool (MaxPooling1D)   (None, 500, 64)           0         
_________________________________________________________________
block2_conv1 (Conv1D)        (None, 500, 128)          24704     
_________________________________________________________________
block2_conv2 (Conv1D)        (None, 500, 128)          49280     
_________________________________________________________________
block2_pool (MaxPooling1D)   (None, 250, 128)          0         
_________________________________________________________________
block3_conv1 (Conv1D)        (None, 250, 256)          98560     
_________________________________________________________________
block3_conv2 (Conv1D)        (None, 250, 256)          196864    
_________________________________________________________________
block3_conv3 (Conv1D)        (None, 250, 256)          196864    
_________________________________________________________________
block3_pool (MaxPooling1D)   (None, 125, 256)          0         
_________________________________________________________________
block4_conv1 (Conv1D)        (None, 125, 512)          393728    
_________________________________________________________________
block4_conv2 (Conv1D)        (None, 125, 512)          786944    
_________________________________________________________________
block4_conv3 (Conv1D)        (None, 125, 512)          786944    
_________________________________________________________________
block4_pool (MaxPooling1D)   (None, 63, 512)           0         
_________________________________________________________________
block5_conv1 (Conv1D)        (None, 63, 512)           786944    
_________________________________________________________________
block5_conv2 (Conv1D)        (None, 63, 512)           786944    
_________________________________________________________________
block5_conv3 (Conv1D)        (None, 63, 512)           786944    
_________________________________________________________________
block5_pool (MaxPooling1D)   (None, 32, 512)           0         
_________________________________________________________________
flatten (Flatten)            (None, 16384)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 128)               2097280   
_________________________________________________________________
fc2 (Dense)                  (None, 128)               16512     
_________________________________________________________________
predictions (Dense)          (None, 3)                 387       
=================================================================
Total params: 7,023,811
Trainable params: 7,023,811
Non-trainable params: 0
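
The shapes and parameter counts in this summary can be checked by hand. The sketch below is plain Python, and its helper names are hypothetical rather than part of Keras. It assumes that `MaxPooling1D` with `padding='same'` and stride 2 yields `ceil(length / 2)` steps, and that a `Conv1D` layer with bias has `kernel_size * in_channels * filters + filters` parameters.

```python
import math


def conv1d_params(in_channels, filters, kernel_size=3):
    # Weights (kernel_size * in_channels * filters) plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def pooled_length(length, stride=2):
    # With padding='same', the output length is ceil(length / stride).
    return math.ceil(length / stride)


# Follow the time axis through the five pooling layers.
length = 999
lengths = []
for _ in range(5):
    length = pooled_length(length)
    lengths.append(length)

flat = lengths[-1] * 512        # Flatten after block5_pool (512 channels)
fc1_params = flat * 128 + 128   # Dense(128) on the flattened vector

print(lengths)                  # [500, 250, 125, 63, 32]
print(flat)                     # 16384
print(conv1d_params(13, 64))    # 2560
print(fc1_params)               # 2097280
```

Because `fc1` scales with the flattened length, it accounts for about 2.1 million of the roughly 7 million parameters, and shortening the input shrinks this layer proportionally.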

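【Comments】:

  • Thanks a lot, friend, I'll implement it as soon as I can. In the end I also solved the problem by lowering the batch size, but I like your answer better!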