Before running the TensorFlow session, you should instantiate an optimizer, as shown below:
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
tf.train.GradientDescentOptimizer(learning_rate) creates an object of the class GradientDescentOptimizer, which, as the name says, implements the gradient descent algorithm.
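As a rough illustration of what "gradient descent" means here, the following plain-Python sketch repeatedly steps against the gradient. It does not use TensorFlow; the function name gradient_descent and the cost (w - 3)^2 are assumptions made up for this example.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeatedly apply the update w <- w - learning_rate * grad_fn(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w


# Hypothetical cost (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step, so w_final ends up very close to 3.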
The method minimize() is called with the "cost" as its argument and consists of two methods: compute_gradients() followed by apply_gradients().
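The two-step structure can be sketched in plain Python. This is a stand-in, not the TensorFlow class: ToyOptimizer, its method signatures, and the hand-written gradient cost_grad are all assumptions made for illustration.

```python
class ToyOptimizer:
    """Plain-Python stand-in for the two-step structure; not the TensorFlow class."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def compute_gradients(self, grad_fn, params):
        # Step 1: pair each gradient with the name of the parameter it belongs to.
        grads = grad_fn(params)
        return [(grads[name], name) for name in params]

    def apply_gradients(self, grads_and_vars, params):
        # Step 2: use the (gradient, variable) pairs to update the parameters.
        for grad, name in grads_and_vars:
            params[name] -= self.learning_rate * grad

    def minimize(self, grad_fn, params):
        # minimize() is simply step 1 followed by step 2.
        self.apply_gradients(self.compute_gradients(grad_fn, params), params)


def cost_grad(p):
    # Hand-written gradient of the hypothetical cost (w - 3)^2 + (b + 1)^2.
    return {"w": 2.0 * (p["w"] - 3.0), "b": 2.0 * (p["b"] + 1.0)}


params = {"w": 0.0, "b": 0.0}
opt = ToyOptimizer(learning_rate=0.1)
for _ in range(200):
    opt.minimize(cost_grad, params)
```

Keeping the two steps separate is what allows the gradients to be inspected or modified (for example clipped) before they are applied.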
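For most (custom) optimizer implementations, it is the method apply_gradients() that needs to be adapted.
This method relies on the (new) Optimizer class, which we will create, to implement the following methods: _create_slots(), _prepare(), _apply_dense(), and _apply_sparse().
Ops are generally written in C++. Without having to change the C++ headers yourself, you can still return a Python wrapper of some Ops through these methods.
This is done as follows: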
def _create_slots(self, var_list):
    # Create slots for allocating and later managing additional
    # variables associated with the variables to train,
    # for example the first and second moments.
    '''
    for v in var_list:
        self._zeros_slot(v, "m", self._name)
        self._zeros_slot(v, "v", self._name)
    '''

def _apply_dense(self, grad, var):
    # Define your favourite variable update,
    # for example:
    '''
    # Here we apply gradient descent by subtracting the gradient
    # times the learning_rate (defined in __init__) from the variable.
    var_update = state_ops.assign_sub(var, self.learning_rate * grad)
    '''
    # The trick is now to pass the Ops to control_flow_ops and
    # optionally group them with any computation on the slots
    # that you wish to keep track of,
    # for example:
    '''
    m_t = ...m...  # do something with m and grad
    v_t = ...v...  # do something with v and grad
    '''
    return control_flow_ops.group(*[var_update, m_t, v_t])
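To make the role of the hooks and slots more concrete, here is a self-contained plain-Python sketch. It is not TensorFlow code: ToyBaseOptimizer and ToyAdam are hypothetical classes, the method signatures are simplified (variables are dictionary keys, and _apply_dense receives the parameter dictionary), and the Adam-style update is only one possible choice for what to do with "m" and "v", not something prescribed above.

```python
import math


class ToyBaseOptimizer:
    """Hypothetical plain-Python base class; it only mimics the hook structure."""

    def __init__(self):
        self._slots = {}  # slot name -> {variable name -> value}

    def _zeros_slot(self, var_name, slot_name):
        # Create the slot with value 0.0 unless it already exists.
        self._slots.setdefault(slot_name, {}).setdefault(var_name, 0.0)

    def get_slot(self, var_name, slot_name):
        return self._slots[slot_name][var_name]

    def apply_gradients(self, grads_and_vars, params):
        # The base class drives the update and calls the subclass hooks.
        self._create_slots([name for _, name in grads_and_vars])
        self._prepare()
        for grad, name in grads_and_vars:
            self._apply_dense(grad, name, params)


class ToyAdam(ToyBaseOptimizer):
    """Adam-style update, chosen only as an example of using two slots."""

    def __init__(self, learning_rate=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        super().__init__()
        self.lr, self.beta1, self.beta2, self.eps = learning_rate, beta1, beta2, eps
        self.t = 0

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, "m")  # first moment
            self._zeros_slot(v, "v")  # second moment

    def _prepare(self):
        self.t += 1  # step counter used for bias correction

    def _apply_dense(self, grad, name, params):
        # Update both moment slots, then take a bias-corrected step.
        m = self.beta1 * self.get_slot(name, "m") + (1 - self.beta1) * grad
        v = self.beta2 * self.get_slot(name, "v") + (1 - self.beta2) * grad * grad
        self._slots["m"][name], self._slots["v"][name] = m, v
        m_hat = m / (1 - self.beta1 ** self.t)
        v_hat = v / (1 - self.beta2 ** self.t)
        params[name] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


params = {"w": 0.0}
opt = ToyAdam(learning_rate=0.1)
for _ in range(500):
    grad = 2.0 * (params["w"] - 3.0)  # gradient of the hypothetical cost (w - 3)^2
    opt.apply_gradients([(grad, "w")], params)
```

The subclass never touches the update loop itself; it only fills in the hooks, which mirrors the division of labour described above.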
For a more detailed explanation with an example, see this blog post:
https://www.bigdatarepublic.nl/custom-optimizer-in-tensorflow/