【Question Title】: mxnet training not progressing
【Posted】: 2017-08-23 21:27:41
【Question Description】:

Thanks in advance for your help.

I'm having trouble getting an mxnet model to converge to anything: it seems to get stuck close to its initial weights.

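Here is a minimal example (though I've had trouble getting many models like this to work today). I've tried the code below with a range of epoch counts (up to 100) and a range of learning rates (0.001 to 10), but I can't get anything sensible out of it.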

import mxnet as mx
import numpy as np

inputs = np.expand_dims(np.random.uniform(size=10000), axis=1)
labels = np.sin(inputs)

data_iter = mx.io.NDArrayIter(data=inputs, label=labels, data_name='data', label_name='label', batch_size=50)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')

fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=64)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')

fc3 = mx.sym.FullyConnected(data=ac2, num_hidden=16)
ac3 = mx.sym.Activation(data=fc3, act_type='relu')

output = mx.sym.FullyConnected(data=ac3, num_hidden=1)
loss = mx.symbol.MakeLoss(mx.symbol.square(output - label), name="loss")

model = mx.module.Module(symbol=loss, data_names=('data',), label_names=('label',))

import logging
logging.getLogger().setLevel(logging.DEBUG)
model.fit(data_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate':0.1},
          eval_metric='mse',
          num_epoch=5)

This results in:

INFO:root:Epoch[0] Train-mse=0.221155
INFO:root:Epoch[0] Time cost=0.173
INFO:root:Epoch[1] Train-mse=0.225179
INFO:root:Epoch[1] Time cost=0.176
INFO:root:Epoch[2] Train-mse=0.225179
INFO:root:Epoch[2] Time cost=0.179
INFO:root:Epoch[3] Train-mse=0.225179
INFO:root:Epoch[3] Time cost=0.176
INFO:root:Epoch[4] Train-mse=0.225179
INFO:root:Epoch[4] Time cost=0.183

Clearly, the training isn't really making any progress.
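One plausible reading of the flat log is a monitoring problem as well as a training problem. With `MakeLoss`, the module's output is the loss value itself, so `eval_metric='mse'` compares the squared error to the label rather than the prediction to the label. The NumPy sketch below does not use MXNet. It assumes the network has collapsed to predicting a constant (the label mean) and computes both quantities on a fine grid of inputs. The collapsed-network assumption and all variable names are hypothetical.

```python
import numpy as np

# Midpoint grid over [0, 1): a deterministic stand-in for np.random.uniform inputs.
N = 100_000
x = (np.arange(N) + 0.5) / N
y = np.sin(x)

# Hypothetical collapsed network: it predicts the same value (the label mean) everywhere.
prediction = np.full_like(y, y.mean())

# What MakeLoss(square(output - label)) emits as the module's output: the loss values.
loss_output = (prediction - y) ** 2

# The 'mse' metric applied to the module output, i.e. comparing loss values with labels.
metric_on_loss = np.mean((loss_output - y) ** 2)
# The quantity one actually wants to monitor: prediction vs. label.
metric_on_prediction = np.mean((prediction - y) ** 2)

print(f"mse(loss output, label) = {metric_on_loss:.4f}")
print(f"mse(prediction, label)  = {metric_on_prediction:.4f}")
```

Under that assumption, the first number comes out near 0.23, close to the flat 0.225 in the log above. The second is about 0.061. This is a consistency check, not proof, and it does not by itself explain why the weights stopped moving.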

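【Question Discussion】:

  • You should try a tanh activation on the output layer, so that the network's output range matches the range of sin.
  • Good point. This is a simple example I put together for SO, but my problem persists even with a more sensible output layer :) I'm using mxnet incorrectly somehow, but I can't see where!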

Tags: machine-learning neural-network deep-learning mxnet


【Solution 1】:

I took your code, updated it a bit, and was able to get it to converge. The code is pasted below.

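The updates I made:
- reduced the network to two fully connected hidden layers of 128 units each;
- replaced the custom loss with the built-in linear regression output (`LinearRegressionOutput`);
- added momentum and changed the learning rate;
- ran more epochs.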
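The momentum change matters more than the raw learning rate suggests. Under a roughly constant gradient, heavy-ball momentum builds up to a step of about `lr / (1 - momentum)`. So `learning_rate=0.005` with `momentum=0.9` eventually moves like an un-momentum'd rate of about 0.05, while the first updates stay small.

The pure-Python sketch below assumes the textbook heavy-ball form of the update. MXNet's SGD optimizer describes a momentum rule of the same shape, but this is not MXNet code, and `momentum_steps` is a hypothetical helper.

```python
def momentum_steps(lr, momentum, grad=1.0, n_steps=200):
    """Per-step weight change of heavy-ball SGD under a constant gradient."""
    state = 0.0
    steps = []
    for _ in range(n_steps):
        # Velocity: decayed previous velocity plus the new (negative) gradient step.
        state = momentum * state - lr * grad
        steps.append(state)  # the weight update is: weight += state
    return steps


steps = momentum_steps(lr=0.005, momentum=0.9)
first_step = -steps[0]       # equals lr: momentum has not built up yet
effective_lr = -steps[-1]    # approaches lr / (1 - momentum)
print(f"first step: {first_step:.4f}, steady-state step: {effective_lr:.4f}")
```

At steady state, this setting moves about half as far per step as the question's plain `learning_rate=0.1`. It ramps up gradually over the first few dozen updates instead of taking full-size steps from the start.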

Hope this helps!

import mxnet as mx
import numpy as np

inputs = np.expand_dims(np.random.uniform(size=10000), axis=1)
labels = np.sin(inputs)

data_iter = mx.io.NDArrayIter(data=inputs, label=labels, data_name='data', label_name='label', batch_size=50)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')

fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=128)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')

output = mx.sym.FullyConnected(data=ac2, num_hidden=1)
#loss = mx.symbol.MakeLoss(mx.symbol.square(output - label), name="loss")
loss = mx.sym.LinearRegressionOutput(data=output, label=label, name="loss")

model = mx.module.Module(symbol=loss, data_names=('data',), label_names=('label',))

import logging
logging.getLogger().setLevel(logging.DEBUG)
model.fit(data_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate':0.005, 'momentum': 0.9},
          eval_metric='mse',
          num_epoch=50)

Results:

INFO:root:Epoch[0] Train-mse=0.076923
INFO:root:Epoch[0] Time cost=0.148
INFO:root:Epoch[1] Train-mse=0.061155
INFO:root:Epoch[1] Time cost=0.178
INFO:root:Epoch[2] Train-mse=0.061154
INFO:root:Epoch[2] Time cost=0.168
INFO:root:Epoch[3] Train-mse=0.061153
INFO:root:Epoch[3] Time cost=0.151
INFO:root:Epoch[4] Train-mse=0.061151
INFO:root:Epoch[4] Time cost=0.182
INFO:root:Epoch[5] Train-mse=0.061150
INFO:root:Epoch[5] Time cost=0.186
INFO:root:Epoch[6] Train-mse=0.061149
INFO:root:Epoch[6] Time cost=0.197
INFO:root:Epoch[7] Train-mse=0.061147
INFO:root:Epoch[7] Time cost=0.174
INFO:root:Epoch[8] Train-mse=0.061145
INFO:root:Epoch[8] Time cost=0.148
INFO:root:Epoch[9] Train-mse=0.061142
INFO:root:Epoch[9] Time cost=0.150
INFO:root:Epoch[10] Train-mse=0.061140
INFO:root:Epoch[10] Time cost=0.145
INFO:root:Epoch[11] Train-mse=0.061136
INFO:root:Epoch[11] Time cost=0.135
INFO:root:Epoch[12] Train-mse=0.061133
INFO:root:Epoch[12] Time cost=0.136
INFO:root:Epoch[13] Train-mse=0.061128
INFO:root:Epoch[13] Time cost=0.137
INFO:root:Epoch[14] Train-mse=0.061122
INFO:root:Epoch[14] Time cost=0.146
INFO:root:Epoch[15] Train-mse=0.061116
INFO:root:Epoch[15] Time cost=0.135
INFO:root:Epoch[16] Train-mse=0.061108
INFO:root:Epoch[16] Time cost=0.152
INFO:root:Epoch[17] Train-mse=0.061098
INFO:root:Epoch[17] Time cost=0.179
INFO:root:Epoch[18] Train-mse=0.061086
INFO:root:Epoch[18] Time cost=0.160
INFO:root:Epoch[19] Train-mse=0.061069
INFO:root:Epoch[19] Time cost=0.151
INFO:root:Epoch[20] Train-mse=0.061050
INFO:root:Epoch[20] Time cost=0.145
INFO:root:Epoch[21] Train-mse=0.061024
INFO:root:Epoch[21] Time cost=0.164
INFO:root:Epoch[22] Train-mse=0.060990
INFO:root:Epoch[22] Time cost=0.151
INFO:root:Epoch[23] Train-mse=0.060944
INFO:root:Epoch[23] Time cost=0.141
INFO:root:Epoch[24] Train-mse=0.060881
INFO:root:Epoch[24] Time cost=0.136
INFO:root:Epoch[25] Train-mse=0.060790
INFO:root:Epoch[25] Time cost=0.124
INFO:root:Epoch[26] Train-mse=0.060658
INFO:root:Epoch[26] Time cost=0.151
INFO:root:Epoch[27] Train-mse=0.060455
INFO:root:Epoch[27] Time cost=0.166
INFO:root:Epoch[28] Train-mse=0.060131
INFO:root:Epoch[28] Time cost=0.148
INFO:root:Epoch[29] Train-mse=0.059582
INFO:root:Epoch[29] Time cost=0.219
INFO:root:Epoch[30] Train-mse=0.058581
INFO:root:Epoch[30] Time cost=0.160
INFO:root:Epoch[31] Train-mse=0.056593
INFO:root:Epoch[31] Time cost=0.178
INFO:root:Epoch[32] Train-mse=0.052252
INFO:root:Epoch[32] Time cost=0.184
INFO:root:Epoch[33] Train-mse=0.042274
INFO:root:Epoch[33] Time cost=0.168
INFO:root:Epoch[34] Train-mse=0.023321
INFO:root:Epoch[34] Time cost=0.162
INFO:root:Epoch[35] Train-mse=0.005860
INFO:root:Epoch[35] Time cost=0.161
INFO:root:Epoch[36] Train-mse=0.000848
INFO:root:Epoch[36] Time cost=0.164
INFO:root:Epoch[37] Train-mse=0.000319
INFO:root:Epoch[37] Time cost=0.176
INFO:root:Epoch[38] Train-mse=0.000221
INFO:root:Epoch[38] Time cost=0.148
INFO:root:Epoch[39] Train-mse=0.000163
INFO:root:Epoch[39] Time cost=0.199
INFO:root:Epoch[40] Train-mse=0.000123
INFO:root:Epoch[40] Time cost=0.141
INFO:root:Epoch[41] Train-mse=0.000096
INFO:root:Epoch[41] Time cost=0.133
INFO:root:Epoch[42] Train-mse=0.000078
INFO:root:Epoch[42] Time cost=0.144
INFO:root:Epoch[43] Train-mse=0.000065
INFO:root:Epoch[43] Time cost=0.174
INFO:root:Epoch[44] Train-mse=0.000056
INFO:root:Epoch[44] Time cost=0.208
INFO:root:Epoch[45] Train-mse=0.000050
INFO:root:Epoch[45] Time cost=0.152
INFO:root:Epoch[46] Train-mse=0.000045
INFO:root:Epoch[46] Time cost=0.154
INFO:root:Epoch[47] Train-mse=0.000041
INFO:root:Epoch[47] Time cost=0.151
INFO:root:Epoch[48] Train-mse=0.000039
INFO:root:Epoch[48] Time cost=0.177
INFO:root:Epoch[49] Train-mse=0.000036
INFO:root:Epoch[49] Time cost=0.135

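【Discussion】:

  • Indeed, the code itself was mostly fine, but I needed to be more careful with the optimizer settings and the loss function. Thanks for your help :)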
【Solution 2】:

I suggest setting a weight initializer explicitly. Try this:

# reuses `model` and `data_iter` from the code above
model.fit(data_iter,
          optimizer='sgd',
          initializer=mx.init.Xavier(),  # here it is; you may also try other initializers
          optimizer_params={'learning_rate':0.005, 'momentum': 0.9},
          eval_metric='mse',
          num_epoch=50)

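Without an explicit initializer, you start from weights drawn from a narrow uniform distribution around zero: `Module.fit` defaults to `mx.init.Uniform(0.01)`, i.e. weights in [-0.01, 0.01]. The signal then shrinks at every layer, so the weight updates are tiny and can vanish, or differ very little across layers. This can leave you with something close to a linear model instead of one that captures the nonlinearity in the data. See also these articles: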

https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94

https://towardsdatascience.com/weight-initialization-techniques-in-neural-networks-26c649eb3b78
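To see why the default initializer can stall this network, here is a NumPy-only sketch (not MXNet). It pushes inputs through the question's three ReLU hidden layers (widths 128, 64, 16) with zero biases, in two runs:
- weights from U(-0.01, 0.01), which is what `Module.fit` uses when no initializer is given;
- weights from a Xavier-style uniform initializer.

The Xavier scale `sqrt(3 / ((fan_in + fan_out) / 2))` is our reading of `mx.init.Xavier()`'s defaults (`rnd_type='uniform'`, `factor_type='avg'`, `magnitude=3`) and should be treated as an assumption. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYER_SIZES = [1, 128, 64, 16]  # input width, then the question's hidden-layer widths


def uniform_001(fan_in, fan_out):
    # Small uniform init, U(-0.01, 0.01), as used by Module.fit by default.
    return rng.uniform(-0.01, 0.01, size=(fan_in, fan_out))


def xavier_avg(fan_in, fan_out, magnitude=3.0):
    # Assumed Xavier defaults: uniform, averaged fan, magnitude 3.
    scale = np.sqrt(magnitude / ((fan_in + fan_out) / 2.0))
    return rng.uniform(-scale, scale, size=(fan_in, fan_out))


def last_hidden_rms(init_fn, x):
    """Forward x through the ReLU hidden layers (zero biases); return the last layer's RMS."""
    h = x
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        h = np.maximum(h @ init_fn(fan_in, fan_out), 0.0)  # FullyConnected + relu
    return float(np.sqrt(np.mean(h ** 2)))


x = rng.uniform(size=(1000, 1))
rms_uniform = last_hidden_rms(uniform_001, x)
rms_xavier = last_hidden_rms(xavier_avg, x)
print(f"U(-0.01, 0.01): {rms_uniform:.2e}   Xavier: {rms_xavier:.2e}")
```

With the small uniform weights, the last hidden layer's activations come out several orders of magnitude smaller than with Xavier. The output layer therefore sees almost no input-dependent signal, and gradients flowing back to the early layers are scaled down by the same tiny weights. That is consistent with the constant-prediction plateau in the logs above, though the exact numbers depend on the random seed.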
