Posted: 2018-07-15 14:18:37
Question:
I'm implementing DDPG with PyTorch (0.4), but I'm having trouble backpropagating the loss. First, here is the code that performs the update:
def update_nets(self, transitions):
    """
    Performs one update step
    :param transitions: list of sampled transitions
    """
    # get batches
    batch = transition(*zip(*transitions))
    states = torch.stack(batch.state)
    actions = torch.stack(batch.action)
    next_states = torch.stack(batch.next_state)
    rewards = torch.stack(batch.reward)
    # zero gradients
    self._critic.zero_grad()
    # compute critic's loss
    y = rewards.view(-1, 1) + self._gamma * \
        self.critic_target(next_states, self.actor_target(next_states))
    loss_critic = F.mse_loss(y, self._critic(states, actions),
                             size_average=True)
    # backpropagate it
    loss_critic.backward()
    self._optim_critic.step()
    # zero gradients
    self._actor.zero_grad()
    # compute actor's loss
    loss_actor = ((-1.) * self._critic(states, self._actor(states))).mean()
    # backpropagate it
    loss_actor.backward()
    self._optim_actor.step()
    # do soft updates
    self.perform_soft_update(self.actor_target, self._actor)
    self.perform_soft_update(self.critic_target, self._critic)
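For comparison, here is a common way to structure a DDPG update, sketched in current PyTorch under assumptions: the `Critic` class, the `ddpg_update` function and the tiny linear networks are hypothetical stand-ins, not the question's code, and terminal-state masking is omitted as in the original. The Bellman target is computed inside `torch.no_grad()`, so nothing is ever backpropagated into the target networks, and the stored batch tensors are plain tensors without autograd history.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class Critic(nn.Module):
    """Hypothetical Q(s, a) network: concatenates state and action."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Linear(state_dim + action_dim, 1)

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=1))


def ddpg_update(actor, critic, actor_target, critic_target,
                optim_actor, optim_critic,
                states, actions, rewards, next_states, gamma=0.99):
    # Bellman target: computed without building a graph through the target nets.
    with torch.no_grad():
        y = rewards.view(-1, 1) + gamma * critic_target(
            next_states, actor_target(next_states))

    # Critic step: regress Q(s, a) towards the fixed target y.
    optim_critic.zero_grad()
    loss_critic = F.mse_loss(critic(states, actions), y)
    loss_critic.backward()
    optim_critic.step()

    # Actor step: maximise Q(s, actor(s)) by minimising its negative mean.
    optim_actor.zero_grad()
    loss_actor = -critic(states, actor(states)).mean()
    loss_actor.backward()
    optim_actor.step()
    return loss_critic.item(), loss_actor.item()


# Usage with random data (all sizes are arbitrary).
torch.manual_seed(0)
state_dim, action_dim, batch_size = 3, 2, 4
actor = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic = Critic(state_dim, action_dim)
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
optim_actor = torch.optim.Adam(actor.parameters(), lr=1e-3)
optim_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Batch tensors as they would come out of a replay buffer: no grad_fn attached.
states = torch.randn(batch_size, state_dim)
actions = torch.randn(batch_size, action_dim)
rewards = torch.randn(batch_size)
next_states = torch.randn(batch_size, state_dim)

for _ in range(3):  # repeated updates on the same batch
    losses = ddpg_update(actor, critic, actor_target, critic_target,
                         optim_actor, optim_critic,
                         states, actions, rewards, next_states)
```

Calling `ddpg_update` several times on the same batch runs without the "backward through the graph a second time" error, because every `backward()` call walks a graph that was built in that same call.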
where self._actor, self._critic, self.actor_target and self.critic_target are networks.
If I run this, I get the following error on the second iteration:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
at
  line 221, in update_nets
    loss_critic.backward()
  line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
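This error is raised whenever `backward()` reaches a graph whose saved buffers were already freed by an earlier `backward()`. A minimal sketch that reproduces it independently of DDPG (the tensors `w`, `y` and `z` are purely illustrative) and shows what `retain_graph=True` changes:

```python
import torch

# Leaf tensor tracked by autograd (stands in for any network parameter).
w = torch.tensor([1.0, 2.0], requires_grad=True)

# w * w saves its inputs for the backward pass; backward() frees them afterwards.
y = (w * w).sum()
y.backward()

try:
    y.backward()  # walks the same graph again, but its buffers are gone
    second_call_error = None
except RuntimeError as err:
    second_call_error = str(err)

# retain_graph=True keeps the buffers alive, so a second call is allowed.
w.grad = None
z = (w * w).sum()
z.backward(retain_graph=True)
z.backward()  # gradients accumulate: 2 * w is added twice

print(second_call_error is not None, w.grad)
```

`retain_graph=True` makes a repeated traversal legal, but whether that is desirable depends on why the same graph is being traversed twice in the first place.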
I don't know what is causing this.
What I know so far is that the loss_critic.backward() call raises the error.
I have debugged loss_critic: it holds a valid value.
If I replace the loss computation with a simple
loss_critic = torch.tensor(1., device=self._device, dtype=torch.float, requires_grad=True)
i.e. a tensor containing the value 1, everything works fine.
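I have also checked that I am not storing any results that could cause the error.
Moreover, updating the actor with loss_actor does not cause any problems.
Does anyone know what is going wrong here?
Thanks!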
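One way this error can appear only on the *second* iteration, without any explicit double `backward()`, is when a tensor that still carries autograd history is kept across iterations, for example an action produced by the actor and pushed into a replay buffer. Whether that happens in the code above is an assumption, not something the question shows. The hypothetical `actor`/`critic` layers and the one-element list standing in for a replay buffer below only illustrate the mechanism and the usual `detach()` remedy:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Sequential(nn.Linear(2, 1), nn.Tanh())  # hypothetical actor
critic = nn.Linear(1, 1)                           # hypothetical critic


def critic_step(stored_action):
    # A fresh critic graph is built on every step on top of the stored tensor.
    loss = critic(stored_action).pow(2).mean()
    critic.zero_grad()
    loss.backward()


def run_two_steps(buffer):
    """Runs critic_step twice on the same stored tensor; returns error messages."""
    errors = []
    for _ in range(2):
        try:
            critic_step(buffer[0])
        except RuntimeError as err:
            errors.append(str(err))
    return errors


state = torch.randn(1, 2)

# Stored WITHOUT detach: the action still references the actor's graph,
# which the first backward() frees, so the second step fails.
errors_attached = run_two_steps([actor(state)])

# Stored WITH detach: the action is a plain tensor with no history.
errors_detached = run_two_steps([actor(state).detach()])

print(len(errors_attached), len(errors_detached))
```

Note that `zero_grad()` does not help here: it only resets the `.grad` fields of parameters and has no effect on a graph whose buffers have already been freed.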
Update
I replaced
# zero gradients
self._critic.zero_grad()
and
# zero gradients
self._actor.zero_grad()
with
# zero gradients
self._critic.zero_grad()
self._actor.zero_grad()
self.critic_target.zero_grad()
self.actor_target.zero_grad()
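(in both places), but it still fails with the same error. In addition, here is the code that performs the soft update at the end of each iteration: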
def perform_soft_update(self, target, trained):
    """
    Performs the soft update
    :param target: Net to be updated
    :param trained: Trained net - used for update
    """
    for param_target, param_trained in \
            zip(target.parameters(), trained.parameters()):
        param_target.data.copy_(
            param_target.data * (
                1.0 - self._tau) + param_trained * self._tau
        )
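The soft update corresponds to θ_target ← τ·θ_trained + (1 − τ)·θ_target. The code above mixes `param_trained` itself (which has `requires_grad=True`) rather than `param_trained.data` into that expression. A minimal alternative sketch that performs the same update in place inside `torch.no_grad()`, so autograd records nothing at all; `soft_update` is a hypothetical helper name, not part of PyTorch:

```python
import torch
import torch.nn as nn


def soft_update(target: nn.Module, trained: nn.Module, tau: float) -> None:
    """theta_target <- tau * theta_trained + (1 - tau) * theta_target, in place."""
    with torch.no_grad():  # the update itself must not be recorded by autograd
        for p_target, p_trained in zip(target.parameters(), trained.parameters()):
            p_target.mul_(1.0 - tau).add_(p_trained, alpha=tau)


# Usage: target starts at 0, trained is fixed at weight 1 and bias 2.
target = nn.Linear(1, 1)
trained = nn.Linear(1, 1)
with torch.no_grad():
    target.weight.fill_(0.0)
    target.bias.fill_(0.0)
    trained.weight.fill_(1.0)
    trained.bias.fill_(2.0)

soft_update(target, trained, tau=0.1)
print(target.weight.item(), target.bias.item())
```

With τ = 0.1, the target moves one tenth of the way towards the trained parameters, giving roughly 0.1 for the weight and 0.2 for the bias (up to float32 rounding).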
Tags: neural-network pytorch backpropagation reinforcement-learning loss