Please credit the source when reposting:
http://www.cnblogs.com/darkknightzh/p/6221664.html
References:
https://github.com/torch/nn/issues/873
http://stackoverflow.com/questions/37459812/finetune-a-torch-model
https://github.com/torch/nn/blob/master/doc/module.md
https://github.com/torch/torch7/blob/master/doc/utility.md
=====================================================
Update 170928 (specific layers can now be fine-tuned):
References:
http://www.thedataware.com/post/the-torch-adventures-setting-layer-wise-training-parameters
https://github.com/NVIDIA/DIGITS/tree/master/examples/fine-tuning
https://github.com/NVIDIA/DIGITS/blob/master/examples/fine-tuning/lenet-fine-tune.lua#L56
https://stackoverflow.com/questions/37459812/finetune-a-torch-model
https://www.zhihu.com/question/44376850
Note: at present, only the method from the first URL above actually fine-tunes parameters.
In current deep networks, the layers that carry parameters are: convolution layers (conv: weight + bias), batch-normalization layers (bn: weight + bias), and fully connected layers (linear: weight + bias).
Therefore, calling local params, gradParams = model:parameters() in Torch returns, by default, #params equal to twice the total number of layers of these three types. If a layer has no bias, it contributes only one parameter tensor.
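As a minimal sketch of the counting rule above (the toy model here is invented purely for illustration), each parameterized layer contributes its weight and bias tensors to the flat list, while parameter-free layers contribute nothing:

```lua
require 'nn'

local model = nn.Sequential()
model:add(nn.SpatialConvolution(1, 8, 3, 3))  -- conv: weight + bias -> 2 entries
model:add(nn.SpatialBatchNormalization(8))    -- bn: weight + bias   -> 2 entries
model:add(nn.ReLU())                          -- no parameters       -> 0 entries
model:add(nn.Linear(8, 10))                   -- linear: weight+bias -> 2 entries

local params, gradParams = model:parameters()
print(#params)  -- 6: three parameterized layers x 2 tensors each
```

If the convolution were built with :noBias(), it would contribute a single tensor and #params would drop to 5.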
Using the method from http://www.thedataware.com/post/the-torch-adventures-setting-layer-wise-training-parameters, you can update individual layers. That article assigns a different learning rate to each layer; if only certain layers have a nonzero learning rate and all the others are set to 0 (or, to save computation, define fineTuneLayerIdx = {10, 11, 12} first and change for i = 1, #params to for i = 1, #fineTuneLayerIdx), then only those layers' parameters are updated. Note that fine-tuning the last few layers is manageable: print(params) to inspect the parameter tensors and work out which ones need updating. If you need to update layers in the middle, you have to map indices to layers yourself, and for networks such as Inception or ResNet, matching up the middle-layer parameters is even more painful.
The code from that page, which sets a learning rate for every layer, is as follows:
local params, gradParams = model:parameters()

-- Set the learning rate to 0.01
local learningRates = torch.Tensor(#params):fill(0.01)
-- Set the learning rate of the second layer to 0.001
learningRates[2] = 0.001

optimState = {}
for i = 1, #params do
  table.insert(optimState, {
    learningRate = learningRates[i],
    learningRateDecay = 0.0001,
    momentum = 0.9,
    dampening = 0.0,
    weightDecay = 5e-4
  })
end

for e = 1, epochs do
  -- Get MNIST batch
  X, Y = get_mnist_batch(batch_size)

  -- forward -> backward (outside of feval)
  model:zeroGradParameters()
  out = model:forward(X)
  err = criterion:forward(out, Y)
  gradOutputs = criterion:backward(out, Y)
  model:backward(X, gradOutputs)

  -- layer-wise optimization
  for i = 1, #params do
    local feval = function(x)
      return err, gradParams[i]
    end
    -- run optimizer
    optim.sgd(feval, params[i], optimState[i])
  end
end
-- model trained
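The fineTuneLayerIdx variant mentioned earlier can be sketched as follows. Everything here (the two-Linear toy model, the indices {3, 4}, the dummy data) is invented for illustration; run print(params) on your own model to find the right indices. Only the tensors listed in fineTuneLayerIdx receive SGD updates; all other parameters stay frozen:

```lua
require 'nn'
require 'optim'

-- Hypothetical toy model: freeze the first Linear, tune only the second.
local model = nn.Sequential()
model:add(nn.Linear(4, 8))   -- params[1] = weight, params[2] = bias (frozen)
model:add(nn.Tanh())
model:add(nn.Linear(8, 2))   -- params[3] = weight, params[4] = bias (tuned)
local criterion = nn.MSECriterion()

local params, gradParams = model:parameters()
local fineTuneLayerIdx = {3, 4}  -- hypothetical indices: last Linear's tensors

-- one optimState per tensor we intend to update
local optimState = {}
for i = 1, #fineTuneLayerIdx do
  optimState[i] = { learningRate = 0.01, momentum = 0.9, dampening = 0.0 }
end

local w1Before = params[1]:clone()  -- snapshot of a frozen tensor
local X = torch.randn(16, 4)
local Y = torch.randn(16, 2)

for e = 1, 5 do
  model:zeroGradParameters()
  local out = model:forward(X)
  local err = criterion:forward(out, Y)
  model:backward(X, criterion:backward(out, Y))

  -- loop over the selected tensors only, instead of all of #params
  for i = 1, #fineTuneLayerIdx do
    local idx = fineTuneLayerIdx[i]
    local feval = function(x) return err, gradParams[idx] end
    optim.sgd(feval, params[idx], optimState[i])
  end
end

print(params[1]:equal(w1Before))  -- true: frozen layer never moved
```

Compared with setting the other layers' learning rates to 0, this skips their optim.sgd calls entirely, which also avoids applying weight decay or momentum to the frozen tensors.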