This problem is easy to trigger when the model has multiple outputs — typically an RNN whose hidden state is carried over between batches — as in the following training loop:

        # zero the parameter gradients
        model.zero_grad()

        # forward + backward + optimize
        outputs, hidden = model(inputs, hidden)
        loss = _loss(outputs, session, items)
        acc_loss += loss.item()  # loss.data[0] is deprecated; use .item()

        loss.backward()
        # Add parameters' gradients to their values, multiplied by learning rate
        for p in model.parameters():
            p.data.add_(p.grad.data, alpha=-learning_rate)  # two-positional-arg add_ is deprecated
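Because `hidden` returned by the first batch still carries a `grad_fn` pointing into that batch's graph, the second `backward()` tries to propagate through a graph whose buffers were already freed. A minimal self-contained reproduction (using a plain `nn.RNN` as a stand-in for the model above, which is an assumption — the original model and `_loss` are not shown in full):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
hidden = torch.zeros(1, 2, 8)  # (num_layers, batch, hidden_size)

# First batch: builds a graph that includes `hidden`
out, hidden = rnn(torch.randn(2, 5, 4), hidden)
out.pow(2).mean().backward()  # frees that graph's buffers

# Second batch: `hidden` still points into the freed graph
out, hidden = rnn(torch.randn(2, 5, 4), hidden)
err = None
try:
    out.pow(2).mean().backward()
except RuntimeError as e:
    err = e
    print("RuntimeError:", e)  # "Trying to backward through the graph a second time ..."
```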

 

First solution:

detach/repackage the hidden state in between batches. There are (at least) three ways to do this:

  1. hidden.detach_()
  2. hidden = hidden.detach()
  3. hidden = Variable(hidden.data, requires_grad=True)  (Variable is deprecated; hidden = hidden.detach().requires_grad_() is the modern equivalent)

Second solution:

replace loss.backward() with loss.backward(retain_graph=True), but note that each successive batch will take longer than the previous one, because backpropagation has to run all the way back to the start of the first batch.

In general, the second solution is much slower, and on a machine with limited memory it will eventually run out of memory, since every batch's graph is kept alive.
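What retain_graph=True actually does is keep the graph's saved buffers alive after backward(), so a second backward through the same graph is legal (and gradients accumulate). A minimal illustration, unrelated to any specific model:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()

y.backward(retain_graph=True)  # graph kept alive after this call
y.backward()                   # second backward works only because the graph was retained
print(x.grad)                  # gradients accumulate: 2 + 2 = 4 per element
```

This is exactly why the RNN loop gets slower with this fix: the retained graphs chain together, and each backward traverses all of them.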
