Does Caffe also compute gradients for layers with a learning rate of zero (lr_mult = 0) during the backward pass?

【Question title】: Does Caffe also compute gradients for layers with a learning rate of zero (lr_mult = 0) during the backward pass?
【Posted】: 2018-09-28 05:57:41
【Question description】:

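I recently trained a Faster R-CNN model using D-X-Y's C++ implementation (https://github.com/D-X-Y/caffe-faster-rcnn/). To save training time, I froze the lower (shared) convolutional layers by setting lr_mult = 0. I compared the iteration time with and without the frozen layers and found no significant difference. Does Caffe still compute gradients for the layers that have lr_mult = 0?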

【Question comments】:

Tags: performance machine-learning computer-vision caffe gradient-descent

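【Solution 1】:

I'm not 100% sure about this, but AFAIK Caffe computes the gradients even with lr_mult: 0, because they may be needed elsewhere.
Have you tried setting propagate_down: false to stop the gradient from propagating?

From caffe.proto: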

  // Specifies whether to backpropagate to each bottom. If unspecified,
  // Caffe will automatically infer whether each input needs backpropagation
  // to compute parameter gradients. If set to true for some inputs,
  // backpropagation to those inputs is forced; if set false for some inputs,
  // backpropagation to those inputs is skipped.
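As a minimal sketch of the suggestion above, the fragment below freezes one convolution layer with lr_mult: 0 and sets propagate_down: false on the layer that consumes its output. The layer names (conv1, conv2), blob names, and convolution_param values are placeholders for illustration and are not taken from the asker's network. Caffe expects either no propagate_down entries or exactly one per bottom blob.

```
layer {
  name: "conv1"            # hypothetical frozen layer
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # Learning-rate multiplier 0 for weights and bias, so the solver
  # never updates these parameters.
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  convolution_param { num_output: 64 kernel_size: 3 }
}
layer {
  name: "conv2"            # hypothetical first trainable layer
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  # One entry per bottom: skip backpropagation into "conv1",
  # so nothing below this layer receives a gradient.
  propagate_down: false
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param { num_output: 64 kernel_size: 3 }
}
```

Placing propagate_down: false on the lowest layer that still trains cuts the backward pass at that point. Whether this shortens the iteration time is worth measuring, for example by timing the backward pass with and without the setting.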

【Discussion】: