Please credit the original source when reposting:

http://www.cnblogs.com/darkknightzh/p/6221664.html

References:

https://github.com/torch/nn/issues/873

http://stackoverflow.com/questions/37459812/finetune-a-torch-model

https://github.com/torch/nn/blob/master/doc/module.md

https://github.com/torch/torch7/blob/master/doc/utility.md

=====================================================

Update 2017-09-28 (layer-wise fine-tuning):

References:

http://www.thedataware.com/post/the-torch-adventures-setting-layer-wise-training-parameters

https://github.com/NVIDIA/DIGITS/tree/master/examples/fine-tuning

https://github.com/NVIDIA/DIGITS/blob/master/examples/fine-tuning/lenet-fine-tune.lua#L56

https://stackoverflow.com/questions/37459812/finetune-a-torch-model

https://www.zhihu.com/question/44376850

Note: of the links above, only the first one's method actually fine-tunes parameters.

In deep learning, the layer types that carry learnable parameters are: convolution layers (conv: weight + bias), batch normalization layers (bn: weight + bias), and fully connected layers (linear: weight + bias).

Consequently, in Torch, calling local params, gradParams = model:parameters() returns by default a #params equal to twice the total number of layers of these three types (one tensor for the weight, one for the bias). If a layer has no bias, it contributes only one parameter tensor.
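A minimal sketch of this counting rule, using a toy model (the model itself is an assumption for illustration, not from the original post):

```lua
require 'nn'

local model = nn.Sequential()
model:add(nn.SpatialConvolution(1, 8, 3, 3))  -- conv: weight + bias -> 2 tensors
model:add(nn.SpatialBatchNormalization(8))    -- bn: weight + bias -> 2 tensors
model:add(nn.ReLU())                          -- no parameters -> 0 tensors
model:add(nn.Linear(8, 2))                    -- linear: weight + bias -> 2 tensors

local params, gradParams = model:parameters()
print(#params)  -- 6: three parameterized layers, 2 tensors each
```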

Using the method from http://www.thedataware.com/post/the-torch-adventures-setting-layer-wise-training-parameters, you can update individual layers. That article assigns a different learning rate to each layer; if only certain layers get a nonzero learning rate while every other layer's rate is set to 0, then only those layers' parameters are updated. (Alternatively, define fineTuneLayerIdx = {10, 11, 12} and change for i = 1, #params to for i = 1, #fineTuneLayerIdx, which also reduces the amount of computation.)

Note that fine-tuning the last few layers is relatively painless: print(params) to inspect the parameter tensors and work out which indices need updating. If you want to update layers in the middle of the network, however, you have to map the indices to layers by hand, which is especially tedious for architectures like Inception or ResNet.
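The freezing trick above can be sketched end to end on a tiny model (the model, indices, and data here are assumptions for illustration): only the parameter tensors listed in fineTuneLayerIdx are handed to the optimizer, so all other layers stay frozen.

```lua
require 'nn'
require 'optim'

local model = nn.Sequential()
model:add(nn.Linear(4, 8))
model:add(nn.Tanh())
model:add(nn.Linear(8, 2))
local criterion = nn.MSECriterion()

local params, gradParams = model:parameters()
-- params = {L1.weight, L1.bias, L2.weight, L2.bias};
-- fine-tune only the second Linear layer (tensors 3 and 4)
local fineTuneLayerIdx = {3, 4}

local optimState = {}
for _, idx in ipairs(fineTuneLayerIdx) do
  optimState[idx] = { learningRate = 0.01, momentum = 0.9 }
end

-- one training step: forward/backward over the whole model
local X, Y = torch.randn(4), torch.randn(2)
model:zeroGradParameters()
local out = model:forward(X)
local err = criterion:forward(out, Y)
model:backward(X, criterion:backward(out, Y))

-- update only the selected parameter tensors
local frozenBefore = params[1]:clone()
for _, idx in ipairs(fineTuneLayerIdx) do
  local feval = function(x) return err, gradParams[idx] end
  optim.sgd(feval, params[idx], optimState[idx])
end

print(frozenBefore:equal(params[1]))  -- true: layer 1 was never updated
```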

The code from that page, which sets a learning rate for every layer, is as follows:

-- assumes require 'nn' and require 'optim', and that model, criterion,
-- epochs, and batch_size are already defined
local params, gradParams = model:parameters()

-- Set the learning rate to 0.01
local learningRates = torch.Tensor(#params):fill(0.01)
-- Set the learning rate of the second layer to 0.001
learningRates[2] = 0.001

optimState = {}
for i = 1, #params do
  table.insert(optimState, {
    learningRate = learningRates[i],
    learningRateDecay = 0.0001,
    momentum = 0.9,
    dampening = 0.0,
    weightDecay = 5e-4
  })
end

for e = 1, epochs do
  -- Get MNIST batch
  X, Y = get_mnist_batch(batch_size)

  -- forward -> backward (outside of feval)
  model:zeroGradParameters()
  out = model:forward(X)
  err = criterion:forward(out, Y)
  gradOutputs = criterion:backward(out, Y)
  model:backward(X, gradOutputs)

  -- layer-wise optimization: one optim.sgd call per parameter tensor,
  -- each with its own optimState (and thus its own learning rate)
  for i = 1, #params do
    -- gradients were already computed by model:backward above, so feval
    -- just returns the stored loss and gradient (its argument is unused)
    local feval = function(x)
      return err, gradParams[i]
    end

    -- run optimizer
    optim.sgd(feval, params[i], optimState[i])
  end
  
end
-- model trained
