Starting with ADAM and then fine-tuning with SGD. Changing the optimizer: answers

【Question title】: Starting with ADAM and then fine tune with SGD. Changing the optimizer
【Posted】: 2021-11-02 19:43:00
【Question description】:

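I read this great blog post about a bag of tricks for image classification.

There is one part I can't figure out how to implement in TensorFlow. Or rather, I don't know how to do it, or even whether it is possible.

So, start with Adam: just set a learning rate that isn't absurdly high, commonly defaulted to 0.0001, and you will usually get some very good results. Then, once your model starts to saturate with Adam, fine-tune with SGD at a smaller learning rate to squeeze in that last bit of accuracy!

Can you change the optimizer without recompiling somehow?

I have tried googling but can't seem to find much information. Does anyone know whether this is possible in TensorFlow and, if so, how to do it? (Or do you have a source for such information?)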

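【Question discussion】:

Tags: tensorflow machine-learning keras

【Solution 1】:

You can start from the "training loop from scratch" guide in the TensorFlow documentation. Create two train_step functions, the first using the Adam optimizer and the second using the SGD optimizer.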

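import tensorflow as tf
from tensorflow import keras

# model, loss_fn and train_acc_metric are assumed to be defined
# as in the TensorFlow guide.
optimizer1 = keras.optimizers.Adam(learning_rate=1e-3)
optimizer2 = keras.optimizers.SGD(learning_rate=1e-3)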

@tf.function
def train_step1(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer1.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

@tf.function
def train_step2(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer2.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

Main loop:

import time

epochs = 20
train_step = train_step1  # start with the Adam step
start_time = time.time()
for epoch in range(epochs):
    # Switch to the SGD step once past the halfway point.
    if epoch > epochs // 2:
        train_step = train_step2

    total_train_loss = 0.
    # print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        loss_value = train_step(x_batch_train, y_batch_train)
        total_train_loss += loss_value.numpy()
        ...
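
The loop above switches optimizers at a fixed epoch, whereas the advice quoted in the question switches once the model "starts to saturate". A minimal sketch of one way to detect that, assuming you track one loss value per epoch: the helper name `has_plateaued`, its `patience` and `min_delta` parameters, and the simulated loss values are all hypothetical and not part of the original answer.

```python
def has_plateaued(losses, patience=3, min_delta=1e-3):
    """Return True if the last `patience` losses failed to beat the best
    earlier loss by at least `min_delta` (hypothetical helper)."""
    if len(losses) <= patience:
        return False  # not enough history yet
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta


# Simulated per-epoch losses standing in for total_train_loss.
epoch_losses = [1.0, 0.6, 0.4, 0.4, 0.41, 0.4, 0.39, 0.38]

history = []
phase = "adam"
switch_epoch = None
for epoch, loss in enumerate(epoch_losses):
    history.append(loss)
    if phase == "adam" and has_plateaued(history):
        # In the real loop this is where train_step = train_step2 would go;
        # the SGD step then applies from the next epoch onward.
        phase = "sgd"
        switch_epoch = epoch

print(phase, switch_epoch)
```

The check only ever switches in one direction, so a later improvement under SGD does not flip the loop back to Adam.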

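Note that a separate graph is built for each train_step function. In graph mode you cannot use a single train_step function that takes the optimizer as an argument and switch that argument between Adam and SGD during training.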
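
If you train with `compile()` and `fit()` instead of a custom loop, an alternative is simply to recompile: calling `compile()` again replaces the optimizer but does not reset the layer weights, so training continues from where Adam left off. The sketch below is not from the original answer; the tiny model, the synthetic data, and the learning rates are assumptions chosen only for illustration.

```python
import numpy as np
from tensorflow import keras

# Tiny synthetic regression problem, purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4)).astype("float32")
y = x @ np.array([[1.0], [-2.0], [0.5], [3.0]], dtype="float32")

model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])

# Phase 1: train with Adam.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(x, y, epochs=5, batch_size=32, verbose=0)

weights_before = [w.copy() for w in model.get_weights()]

# Phase 2: recompile with SGD at a smaller learning rate.
# compile() swaps the optimizer and leaves the layer weights untouched.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-4), loss="mse")
weights_unchanged = all(
    np.array_equal(a, b) for a, b in zip(weights_before, model.get_weights())
)
print("weights preserved across compile():", weights_unchanged)

# Continue training (fine-tuning) with SGD.
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
```

Adam's internal state (its moment estimates) is discarded at the switch, which is expected here because SGD does not use it.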

【Discussion】: