How to code adagrad in python theano: answers

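【Question title】: How to code adagrad in python theano
【Posted】: 2015-03-31 09:37:56
【Question】:

To simplify the problem: suppose a dimension (or feature) has already been updated n times. The next time I see that feature, I want to set its learning rate to 1/n.

I came up with this code: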

def test_adagrad():
  embedding = theano.shared(value=np.random.randn(20,10), borrow=True)
  times = theano.shared(value=np.ones((20,1)))
  lr = T.dscalar()
  index_a = T.lvector()
  hist = times[index_a]
  cost = T.sum(theano.sparse_grad(embedding[index_a]))
  gradients = T.grad(cost, embedding)
  updates = [(embedding, embedding+lr*(1.0/hist)*gradients)]
  ### Here should be some codes to update also times which are omitted ### 
  train = theano.function(inputs=[index_a,   lr],outputs=cost,updates=updates)
  for i in range(10):
    print train([1,2,3],0.05)

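Theano does not raise any error, but the training result is sometimes NaN. Does anyone know how to fix this?

Thanks for your help.

PS: I suspect that the operations in the sparse space are causing the problem, so I tried replacing * with theano.sparse.mul. This gave the same results as I mentioned before.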
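
For reference, the rule described in the question (a per-feature step scaled by 1/n, where n counts how often that feature has been updated) can be sketched in plain NumPy without Theano. This only illustrates the intended arithmetic and is not a fix for the Theano code above. The helper name `count_scaled_update` and its arguments are hypothetical, and the sketch assumes gradient descent (a minus sign), whereas the code above adds the step.

```python
import numpy as np

def count_scaled_update(embedding, times, index, grad_rows, lr):
    """Hypothetical helper: update only the rows listed in `index`,
    scaling each row's step by 1 / (number of times that row was seen)."""
    index = np.asarray(index)
    # times has shape (n_rows, 1), so times[index] broadcasts over columns.
    step = lr * grad_rows / times[index]
    # ufunc.at applies the operation in place and handles repeated indices.
    np.subtract.at(embedding, index, step)
    # Count this visit for every touched row.
    np.add.at(times, index, 1.0)
    return embedding, times

# Usage: rows 1 and 2 are visited twice; the second step is half as large.
emb = np.zeros((4, 2))
times = np.ones((4, 1))
g = np.ones((2, 2))
count_scaled_update(emb, times, [1, 2], g, lr=0.5)
count_scaled_update(emb, times, [1, 2], g, lr=0.5)
```

Rows that are never indexed keep both their values and their counts, which is the behaviour the question asks for.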

【Question comments】:

Tags: python gradient theano

【Solution 1】:

Maybe you can make use of the following example for implementation of adadelta and use it to derive your own. Please update if you succeed :-)

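【Comments】:

Thank you very much for your answer.
You're welcome :-) If you find it useful, please mark the answer as "accepted" and upvote it :-) Also, if you want to follow up for future users, you could attach your implementation...

【Solution 2】:

I was looking for the same thing and ended up implementing it myself, in the style of the resource that zuuz already pointed out. So maybe this helps anyone looking for help here.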

import numpy as np
import theano
import theano.tensor as T


def adagrad(lr, tparams, grads, inp, cost):
    # shared variables that hold the most recently computed gradients
    gshared = [theano.shared(np.zeros_like(p.get_value(),
                                           dtype=theano.config.floatX),
                             name='%s_grad' % k)
               for k, p in tparams.items()]
    grads_updates = list(zip(gshared, grads))
    # shared variables that accumulate the sum of all squared gradients
    hist_gshared = [theano.shared(np.zeros_like(p.get_value(),
                                                dtype=theano.config.floatX),
                                  name='%s_hist_grad' % k)
                    for k, p in tparams.items()]
    rgrads_updates = [(rg, rg + T.sqr(g)) for rg, g in zip(hist_gshared, grads)]

    # compute the cost, store the gradients and update the squared-gradient sums
    f_grad_shared = theano.function(inp, cost,
                                    updates=grads_updates + rgrads_updates,
                                    on_unused_input='ignore')

    # apply the actual update with the initial learning rate lr;
    # n is a small constant that avoids division by zero
    n = 1e-6
    updates = [(p, p - (lr/(T.sqrt(rg) + n))*g)
               for p, g, rg in zip(tparams.values(), gshared, hist_gshared)]

    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore')

    return f_grad_shared, f_update
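
The arithmetic that the two compiled functions perform can be checked outside Theano. The following is a minimal NumPy sketch, assuming a single parameter array and a gradient supplied by the caller. `adagrad_step` is a hypothetical name, and its default `eps` mirrors the constant `n = 1e-6` from the answer.

```python
import numpy as np

def adagrad_step(param, grad, hist, lr, eps=1e-6):
    """One AdaGrad step on a single array (hypothetical helper)."""
    # Accumulate the squared gradient (the role of rgrads_updates).
    hist = hist + grad ** 2
    # Scale the step by the root of the accumulated sum (the role of updates).
    param = param - lr / (np.sqrt(hist) + eps) * grad
    return param, hist

# Usage: minimise f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
hist = np.zeros_like(w)
for _ in range(100):
    w, hist = adagrad_step(w, w, hist, lr=0.5)
```

Because the accumulated sum only grows, the effective step size for each coordinate shrinks over time, which is the behaviour the question was trying to obtain with its 1/n rule.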

【Comments】:

【Solution 3】:

I find this implementation from Lasagne very concise and readable. You can use it pretty much as is:

for param, grad in zip(params, grads):
    value = param.get_value(borrow=True)
    accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                         broadcastable=param.broadcastable)
    accu_new = accu + grad ** 2
    updates[accu] = accu_new
    updates[param] = param - (learning_rate * grad /
                              T.sqrt(accu_new + epsilon))
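
One detail worth noticing is that this snippet adds epsilon inside the square root, while Solution 2 adds its constant after the square root. The NumPy comparison below is a rough sketch of that difference. The function names and the default values for `lr` and `eps` are assumptions chosen for illustration and are not taken from Lasagne.

```python
import numpy as np

def step_eps_inside(grad, accu, lr=0.1, eps=1e-6):
    # Style of the snippet above: epsilon is added before the square root.
    return lr * grad / np.sqrt(accu + eps)

def step_eps_outside(grad, accu, lr=0.1, eps=1e-6):
    # Style of Solution 2: the constant is added after the square root.
    return lr * grad / (np.sqrt(accu) + eps)

grad = np.array([0.0, 3.0])
accu = grad ** 2  # accumulator contents after the very first step
inside = step_eps_inside(grad, accu)
outside = step_eps_outside(grad, accu)
```

In both variants a coordinate whose gradient and accumulator are exactly zero yields a zero step rather than 0/0, and for non-negligible gradients the two steps are nearly identical.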

【Comments】: