TensorFlow Usage Notes (3): Learning Rate Scheduling

file: tensorflow/python/training/learning_rate_decay.py

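In a neural network, the learning rate hyperparameter controls the size of each parameter update. A learning rate that is too small slows down optimization and lengthens training; one that is too large can make the parameters oscillate back and forth around a local optimum, so the network fails to converge.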
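To make this trade-off concrete, here is a minimal sketch in plain Python (not TensorFlow) that runs gradient descent on the toy function f(x) = x^2. The helper name `gradient_descent` and the three learning rates are assumptions chosen for illustration only.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent (hypothetical helper)."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # update scaled by the learning rate
    return x

slow = gradient_descent(lr=0.01)   # too small: still far from the optimum at 0
good = gradient_descent(lr=0.4)    # reasonable: ends up very close to 0
stuck = gradient_descent(lr=1.0)   # too large: x flips sign every step, never shrinks
print(slow, good, stuck)
```

With `lr=1.0` each update maps x to -x, so the iterate keeps bouncing between the two sides of the minimum, which is the oscillation described above.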

TensorFlow defines many learning rate decay schedules:

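Exponential decay is one of the most commonly used schedules: the learning rate decreases exponentially with the current training step.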

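tf.train.exponential_decay(
    learning_rate,     # initial learning rate
    global_step,       # current training step
    decay_steps,       # decay period, in steps
    decay_rate,        # decay rate factor
    staircase=False,   # True: staircase (stepwise) decay; False (default): continuous decay
    name=None
)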
'''
decayed_learning_rate = learning_rate *
                      decay_rate ^ (global_step / decay_steps)
'''
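The formula can be checked by hand with a small plain-Python sketch. The function name `exponential_decay_value` is hypothetical, not a TensorFlow API; the staircase branch assumes the documented behavior that `global_step / decay_steps` becomes an integer division when `staircase=True`.

```python
import math

def exponential_decay_value(learning_rate, global_step, decay_steps,
                            decay_rate, staircase=False):
    """Plain-Python version of the decay formula (hypothetical helper)."""
    exponent = global_step / decay_steps
    if staircase:
        # Assumption: staircase mode floors the exponent, so the rate
        # only changes once every decay_steps steps.
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent

continuous = exponential_decay_value(0.5, 15, 10, 0.9)
stepped = exponential_decay_value(0.5, 15, 10, 0.9, staircase=True)
print(continuous, stepped)
```

At step 15 with `decay_steps=10`, the continuous version uses an exponent of 1.5, while the staircase version uses 1, so the staircase rate stays at 0.5 * 0.9 until step 20.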

Example code:

import tensorflow as tf
import matplotlib.pyplot as plt
style1 = []
style2 = []
N = 200

# Build the graph once and feed the step through a placeholder,
# instead of creating new ops on every loop iteration.
global_step = tf.placeholder(tf.int32, shape=[])
# Standard (continuous) exponential decay
learning_rate1 = tf.train.exponential_decay(
  learning_rate=0.5, global_step=global_step, decay_steps=10, decay_rate=0.9, staircase=False)
# Staircase decay
learning_rate2 = tf.train.exponential_decay(
  learning_rate=0.5, global_step=global_step, decay_steps=10, decay_rate=0.9, staircase=True)

with tf.Session() as sess:
  for step in range(N):
    # Evaluate both schedules at the current step
    lr1, lr2 = sess.run([learning_rate1, learning_rate2], feed_dict={global_step: step})
    style1.append(lr1)
    style2.append(lr2)

step = range(N)

plt.plot(step, style1, 'g-', linewidth=2, label='exponential_decay')
plt.plot(step, style2, 'r--', linewidth=1, label='exponential_decay_staircase')
plt.title('exponential_decay')
plt.xlabel('step')
plt.ylabel('learning rate')
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()
