Is it possible to use arbitrary image sizes in Caffe? (Answers)

[Question title]: Is it possible to use arbitrary image sizes in caffe?
[Posted]: 2017-07-19 09:06:14
[Question description]:

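I know that Caffe has a so-called spatial pyramid pooling (SPP) layer, which lets a network work with images of arbitrary size. The problem I ran into is that the network seems to refuse arbitrary image sizes within a single batch. Am I missing something, or is this a real limitation?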

My train_val.prototxt:

name: "digits"
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/test_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "bn1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv2"
  top: "pool2"
  spp_param {
    pyramid_height: 2
  }
} 
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "bn2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

Link to another question about the follow-up problem.

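[Comments]:

I think it would be better to resize them before training. If you have defined pad and filters, arbitrary sizes will cause confusion. You can resize with OpenCV: cv2.resize(img, (width, height))
If I resize them, what benefit do I get from using a spatial pyramid pooling layer? That would not make sense to me.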

Tags: machine-learning neural-network computer-vision deep-learning caffe

[Solution 1]:

You are mixing several concepts here.

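Can a network accept arbitrary input shapes?
Well, not every network can work with any input shape. In many cases a network is restricted to the input shape it was trained with.
In most cases, when fully connected layers ("InnerProduct") are used, these layers expect an exact input dimension, so changing the input shape "breaks" them and restricts the network to a specific, predefined input shape.
"Fully convolutional networks", on the other hand, are more flexible with respect to input shape and can usually handle any input shape.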
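To make the dimension argument concrete, here is a minimal NumPy sketch (not Caffe's implementation). It assumes a simplified max-pooling pyramid in which level l splits the feature map into 2^l x 2^l bins; the function name `spp_max` and the feature-map sizes are made up for illustration. Flattening feature maps of different sizes gives vectors of different lengths, which a fully connected layer with a fixed weight matrix cannot consume, whereas the pyramid pooling yields the same length for both.

```python
import numpy as np

def spp_max(feature_map, pyramid_height=2):
    """Simplified spatial pyramid max pooling over a (C, H, W) array.

    Assumption of this sketch: level l uses 2**l x 2**l bins.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for level in range(pyramid_height):
        bins = 2 ** level
        # Bin edges cover the whole map; floor/ceil keeps every bin non-empty.
        row_edges = np.linspace(0, height, bins + 1)
        col_edges = np.linspace(0, width, bins + 1)
        for i in range(bins):
            r0, r1 = int(np.floor(row_edges[i])), int(np.ceil(row_edges[i + 1]))
            for j in range(bins):
                c0, c1 = int(np.floor(col_edges[j])), int(np.ceil(col_edges[j + 1]))
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small = rng.random((50, 9, 9))    # hypothetical 50-channel map from a small image
large = rng.random((50, 19, 23))  # hypothetical 50-channel map from a larger image

flat_small, flat_large = small.reshape(-1), large.reshape(-1)
spp_small, spp_large = spp_max(small), spp_max(large)

print(flat_small.shape, flat_large.shape)  # (4050,) (21850,)
print(spp_small.shape, spp_large.shape)    # (250,) (250,)
```

With 50 channels and a pyramid height of 2, each channel contributes 1 + 4 = 5 values, so both inputs end up as 250-element vectors regardless of their spatial size.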

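Can the input shape change during batch training?
Even if your network architecture allows arbitrary input shapes, you cannot use whatever shape you like during batch training, because all samples in a single batch must have the same input shape: how would you concatenate a 27x27 image with another image of shape 17x17?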
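The same constraint can be reproduced outside Caffe. The sketch below uses NumPy as a stand-in for a batch blob (an analogy, not Caffe's "Data" layer code): stacking a 27x27 sample with a 17x17 sample fails, while two samples of the same shape stack into a single (N, C, H, W) array.

```python
import numpy as np

a = np.zeros((1, 27, 27))  # one single-channel 27x27 sample
b = np.zeros((1, 17, 17))  # one single-channel 17x17 sample

try:
    batch = np.stack([a, b])
    stacked = True
except ValueError as err:
    # NumPy refuses to stack arrays whose shapes differ.
    stacked = False
    print("cannot build batch:", err)

# Samples of identical shape stack into an (N, C, H, W) array.
same = np.stack([a, np.ones_like(a)])
print(same.shape)  # (2, 1, 27, 27)
```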

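The error you are getting seems to come from the "Data" layer, which is struggling to concatenate samples of different shapes into a single batch.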

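You can work around this by setting batch_size: 1 to process one sample at a time, and setting iter_size: 32 in solver.prototxt to average the gradients over 32 samples, which gives you the SGD effect of batch_size: 32.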
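Why averaging per-sample gradients matches a larger batch can be checked numerically. The following is a minimal sketch using a hypothetical linear least-squares model in NumPy, not Caffe's solver: the gradient of the mean loss over 32 samples equals the average of 32 gradients computed one sample at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))  # 32 samples, 5 features (made-up data)
y = rng.normal(size=32)
w = rng.normal(size=5)

def grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

# One gradient computed on the full batch of 32 samples.
full_batch = grad(w, X, y)

# 32 gradients computed on batches of size 1, then averaged.
accumulated = np.zeros_like(w)
for i in range(32):
    accumulated += grad(w, X[i:i + 1], y[i:i + 1])
accumulated /= 32

print(np.allclose(full_batch, accumulated))  # True
```

The equivalence holds for a loss that is a plain average over samples. Layers whose output depends on the other samples in the batch, such as BatchNorm with use_global_stats: false, are outside what this sketch shows.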

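[Discussion]:

Thanks for the clear answer. I did what you suggested and it solved the problem. I also ran into a follow-up problem; could you take a look at my edited question? That would be great! In any case, I will mark your answer as correct :-)
@R_Valdez If you have a new question, you should ask it as a new question (consider linking the two for context).
OK, I will ask a new question.
I have asked another question. You can find the link in my edited question.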