【Question Title】: Deep Learning caffe - classification of data leads to NaN
【Posted】: 2016-10-24 15:06:17
【Question Description】:

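I have a trained caffe network for a 2-class problem and want to check the network's output for individual data samples. So I run the classification like this: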

import caffe

proto = 'deploy.prototxt'
model = 'snapshot_iter_4000.caffemodel'
net = caffe.Net(proto, model, caffe.TEST)

# get image from database to variable "image" (an N x C x H x W array)
out = net.forward_all(data=image)
print(out)
>> {'prob': array([[ nan,  nan],
    [ nan,  nan]], dtype=float32)}
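For context, all-NaN rows in a prob output are what any softmax returns once the logits reaching it (here, from fc8) are already infinite. Caffe's Softmax layer subtracts the maximum logit for numerical stability, but that cannot rescue non-finite input. A minimal pure-Python sketch of the same computation (an illustration, not Caffe's implementation):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Ordinary logits give finite probabilities that sum to 1.
print(softmax([2.0, -1.0]))
# Infinite logits, as produced by diverged weights: inf - inf = nan,
# so every output probability is nan.
print(softmax([float("inf"), float("inf")]))
```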

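I looked at the training output and saw that the accuracy never improves (it always stays around 0.48). I checked all input LMDBs, and none of them contain NaN data. Moreover, I always train several classifiers on the same dataset, and they work as expected.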
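A minimal sketch of such an input check, assuming each LMDB record has already been decoded into a NumPy array. Besides NaN/Inf it also reports the overall value range, which is worth knowing alongside the NaN check. summarize_inputs is a hypothetical helper, not part of Caffe:

```python
import numpy as np

def summarize_inputs(arrays):
    """Report NaN/Inf presence and the overall value range of raw input arrays.

    `arrays` is any iterable of numeric arrays (e.g. one per LMDB record).
    Returns a dict so the result is easy to print or assert on.
    """
    lo, hi = np.inf, -np.inf
    has_nan = has_inf = False
    count = 0
    for a in arrays:
        a = np.asarray(a, dtype=np.float64)
        has_nan |= bool(np.isnan(a).any())
        has_inf |= bool(np.isinf(a).any())
        finite = a[np.isfinite(a)]          # ignore NaN/Inf when taking the range
        if finite.size:
            lo = min(lo, float(finite.min()))
            hi = max(hi, float(finite.max()))
        count += 1
    return {"count": count, "has_nan": has_nan, "has_inf": has_inf,
            "min": lo, "max": hi}
```

With pycaffe, each LMDB value could be parsed into a caffe.proto.caffe_pb2.Datum and converted with caffe.io.datum_to_array before being passed in.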

Has anyone run into this problem? Does caffe have any known numerical instabilities?

I'd be glad if someone could help me! Thanks =)

Here is the solver.prototxt I use for all networks:

test_iter: 100
test_interval: 100
base_lr: 0.03
display: 50
max_iter: 6000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 2000
snapshot: 2000
snapshot_prefix: "snapshot"
solver_mode: GPU
net: "train_val.prototxt"
solver_type: SGD
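With lr_policy: "step", Caffe computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). A small sketch of that formula with the values from this solver (step_lr is an illustrative helper, not a Caffe function) shows that snapshot_iter_4000 was trained at 0.03 for its first 2000 iterations and at 0.003 until iteration 4000:

```python
def step_lr(iteration, base_lr=0.03, gamma=0.1, stepsize=2000):
    """Learning rate under the "step" policy: base_lr * gamma ** (iteration // stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

if __name__ == "__main__":
    # Print the rate at a few iterations across the 6000-iteration run.
    for it in (0, 1999, 2000, 3999, 4000, 5999):
        print(it, step_lr(it))
```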

And the network architecture (AlexNet):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 70
  }
  data_param {
    source: "./dataset/train_db"
    batch_size: 300
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    crop_size: 70
  }
  data_param {
    source: "./dataset/val_db"
    batch_size: 300
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}


layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
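Because the crop here is 70 x 70 rather than the 227 x 227 usually used with AlexNet, it can help to trace the spatial size through the layers above. The sketch assumes Caffe's output-size rules (convolution rounds down, pooling rounds up; none of the pooling layers here use padding); conv_out, pool_out and trace_spatial are hypothetical helpers, not pycaffe functions:

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Convolution output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1):
    # Pooling output size (no padding): ceil((size - kernel) / stride) + 1
    return int(math.ceil((size - kernel) / stride)) + 1

def trace_spatial(size=70):
    """Spatial size (H == W) after each conv/pool layer of the net above."""
    steps = [
        ("conv1", lambda s: conv_out(s, 11, stride=4)),
        ("pool1", lambda s: pool_out(s, 3, stride=1)),
        ("conv2", lambda s: conv_out(s, 5, pad=2)),
        ("pool2", lambda s: pool_out(s, 3, stride=2)),
        ("conv3", lambda s: conv_out(s, 3, pad=1)),
        ("conv4", lambda s: conv_out(s, 3, pad=1)),
        ("conv5", lambda s: conv_out(s, 3, pad=1)),
        ("pool5", lambda s: pool_out(s, 3, stride=2)),
    ]
    shapes = []
    for name, fn in steps:
        size = fn(size)
        shapes.append((name, size))
    return shapes

if __name__ == "__main__":
    for name, size in trace_spatial(70):
        print(name, size)
```

With crop_size: 70 the trace ends at a 3 x 3 pool5 map, so fc6 receives 256 * 3 * 3 = 2304 inputs; in a live pycaffe session, net.blobs['pool5'].data.shape would confirm this.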

【Question Comments】:

  • Is it possible that you have NaNs in 'snapshot_iter_4000.caffemodel'?
  • Take a look at this thread
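The first comment's question can be answered directly by scanning the trained weights. A minimal sketch, assuming the parameters are given as a mapping from layer name to a list of arrays; with pycaffe that mapping can be built as {name: [b.data for b in blobs] for name, blobs in net.params.items()}. find_nonfinite_params is a hypothetical helper:

```python
import numpy as np

def find_nonfinite_params(params):
    """Return (layer_name, blob_index) pairs whose values contain NaN or Inf.

    `params` maps layer name -> sequence of arrays (index 0 is usually the
    weights, index 1 the biases).
    """
    bad = []
    for name, blobs in params.items():
        for i, data in enumerate(blobs):
            if not np.isfinite(np.asarray(data)).all():
                bad.append((name, i))
    return bad
```

An empty result means the snapshot itself is finite and the NaN must arise from the input; a non-empty one points at divergence during training.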

Tags: nan deep-learning caffe pycaffe


【Solution 1】:

Update:

Judging from the feedback under my answer, the cause of the NaN in the question was:

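The top: "data" of the Data layer is in the range [0, 255], and the initial learning rate base_lr: 0.03 is too large for inputs of that scale, which makes training diverge.

Normalizing top: "data" to [0, 1] in the Data layer solved the problem: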

transform_param {
    mirror: true
    scale: 0.00390625  # 1/256: maps [0, 255] to [0, ~0.996]
    crop_size: 70
}
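Once the network is trained on scaled data, the image passed to net.forward_all needs the same scaling. Assuming deploy.prototxt uses a plain Input layer (as is typical), Caffe does not apply transform_param at inference, so the TEST-phase transform (center crop, then scale; mirroring only applies in training) has to be done by hand. A minimal NumPy sketch, assuming the image is a C x H x W array of raw [0, 255] values as stored in a Caffe Datum; preprocess_for_deploy is a hypothetical helper:

```python
import numpy as np

def preprocess_for_deploy(img_chw, crop_size=70, scale=0.00390625):
    """Mimic the TEST-phase transform_param above for one C x H x W image.

    Takes the center crop, multiplies by `scale`, and adds a batch axis so the
    result has shape (1, C, crop_size, crop_size).
    """
    img = np.asarray(img_chw, dtype=np.float32)
    _, h, w = img.shape
    if h < crop_size or w < crop_size:
        raise ValueError("image is smaller than crop_size")
    top = (h - crop_size) // 2
    left = (w - crop_size) // 2
    crop = img[:, top:top + crop_size, left:left + crop_size]
    return (crop * scale)[np.newaxis, ...]
```

Usage would then be out = net.forward_all(data=preprocess_for_deploy(image)). pycaffe's caffe.io.Transformer offers similar preprocessing, but its defaults (e.g. H x W x C input in [0, 1]) differ from this raw-Datum case.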

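In your case, the NaN more likely indicates that training diverged, i.e. it did not converge (which the accuracy of about 0.48 on a 2-class problem, essentially chance level, also suggests). Since your input LMDB has worked before, the more likely cause is a learning rate that is too large: it over-updates the model parameters during training until they become NaN. So try a smaller learning rate, e.g. 10 times smaller, until training works.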
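The interaction between learning rate and input scale already shows up in a one-weight toy model. The sketch below is an illustration, not AlexNet or Caffe's solver: plain gradient descent on 0.5 * (w*x - y)^2 multiplies the error by (1 - lr * x^2) every step, so it only converges when lr * x^2 < 2. With base_lr: 0.03, a raw pixel value x = 255 gives lr * x^2 of about 1951, while the scaled value 255 * 0.00390625 gives about 0.03. run_gd is a hypothetical helper:

```python
import math

def run_gd(x, y, lr, steps=500):
    """Fit y ~= w * x by plain gradient descent on 0.5 * (w*x - y)**2.

    Returns the final weight; it becomes inf and then nan once the
    iterates overflow float64.
    """
    w = 0.0
    for _ in range(steps):
        grad = (w * x - y) * x  # derivative of 0.5 * (w*x - y)**2 w.r.t. w
        w -= lr * grad
    return w

raw = 255.0                  # pixel value without transform_param scaling
scaled = raw * 0.00390625    # same pixel after scale: 0.00390625

print(run_gd(raw, 1.0, lr=0.03))     # lr * x**2 ~= 1951: diverges to nan
print(run_gd(scaled, 1.0, lr=0.03))  # lr * x**2 ~= 0.03: converges, w * scaled ~= 1
```

In a deep network the gradients also depend on every layer's activations, so the real threshold is not this simple formula, but the direction is the same: larger inputs call for a smaller learning rate.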
The thread @Shai linked in the comments above is also worth reading.

【Discussion】:

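  • Sorry, I was being imprecise. I used exactly the same training parameters (solver.prototxt) on the same dataset, so the learning rate (0.03, actually) shouldn't be the problem. However, I'm not using a custom loss layer, and I checked the inputs for NaNs. I found that even without NaNs, about 1 out of 50 classifiers ends up with an accuracy of ~50%. So I thought that maybe, since I'm using stochastic gradient descent, the random initialization is the problem. What do you think?
  • Is the net.prototxt the same as well?
  • Yes, I use the same caffe setup for all classifiers.
  • Can you upload your net.prototxt and solver.prototxt?
  • Do you mean that you have successfully trained exactly the same network several times (solver/net.prototxt unchanged), and only this one run failed?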