Torch backward through gModule

【Question title】: torch backward through gModule
【Posted】: 2016-02-06 07:59:25
【Question】:

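I have a graph as follows, in which the input x has two paths to reach y. The two paths are combined in a gModule that uses CMulTable. Now, if I call gModule:backward(x,y), I get a table of two values. Do they correspond to the error derivatives obtained from the two paths?

Since path2 contains other nn layers, I assume I need to derive the derivatives along that path step by step. But then why do I get a table of two values for dy/dx?

To make things clearer, the test code is as follows: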

require 'nn'
require 'nngraph'

input1 = nn.Identity()()
input2 = nn.Identity()()
score = nn.CAddTable()({nn.Linear(3, 5)(input1), nn.Linear(3, 5)(input2)})
g = nn.gModule({input1, input2}, {score})  -- gModule

mlp = nn.Linear(3, 3)  -- path2 layer

x = torch.rand(3, 3)
x_p = mlp:forward(x)
result = g:forward({x, x_p})
error = torch.rand(result:size())  -- note: this name shadows Lua's built-in error()
gradient1 = g:backward(x, error)  -- this is a table of 2 tensors
gradient2 = g:backward(x_p, error)  -- this is also a table of 2 tensors

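So what is wrong with my steps?

P.S. Perhaps I have found the reason, because g:backward({x,x_p}, error) returns the same table. So I guess the two values stand for dy/dx and dy/dx_p respectively.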

【Comments】:

Tags: lua neural-network torch backpropagation

【Solution 1】:

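I think you simply made a mistake when constructing your gModule. The gradInput of every nn.Module must have exactly the same structure as its input; that is how backpropagation works.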
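To illustrate the structure rule on its own, here is a minimal sketch using nn.CAddTable (the table layer from the question's test code). The variable names are only illustrative. Because the module receives a table of two tensors, its gradInput comes back as a table of two tensors with matching sizes.

```lua
require 'torch'
require 'nn'

-- CAddTable takes a table of tensors as input,
-- so its gradInput is a table with the same layout.
local add = nn.CAddTable()
local a, b = torch.rand(5), torch.rand(5)

local out = add:forward({a, b})
local gradInput = add:backward({a, b}, torch.rand(out:size()))

print(type(gradInput), #gradInput)
print(gradInput[1]:size(1), gradInput[2]:size(1))
```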

Here is an example of how to create a module like yours using nngraph:

require 'torch'
require 'nn'
require 'nngraph'

function CreateModule(input_size)
    local input = nn.Identity()()   -- network input

    local nn_module_1 = nn.Linear(input_size, 100)(input)
    local nn_module_2 = nn.Linear(100, input_size)(nn_module_1)

    local output = nn.CMulTable()({input, nn_module_2})

    -- pack a graph into a convenient module with standard API (:forward(), :backward())
    return nn.gModule({input}, {output})
end


input = torch.rand(30)

my_module = CreateModule(input:size(1))

output = my_module:forward(input)
criterion_err = torch.rand(output:size())

gradInput = my_module:backward(input, criterion_err)
print(gradInput)

Update

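As I said, the gradInput of every nn.Module must have exactly the same structure as its input. So, if you define your module as nn.gModule({input1, input2}, {score}), the result of the backward pass (the gradInput) will be a table of gradients w.r.t. input1 and input2, which in your case are x and x_p.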
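If the goal is the total derivative with respect to x, the two table entries still have to be combined by hand, because mlp sits outside the gModule. The following is a sketch under that assumption, reusing the graph from the question. The names err, grad_via_mlp and total_grad_x are introduced here for illustration only.

```lua
require 'torch'
require 'nn'
require 'nngraph'

-- Graph from the question: each input goes through its own Linear, then the results are summed.
local input1 = nn.Identity()()
local input2 = nn.Identity()()
local score = nn.CAddTable()({nn.Linear(3, 5)(input1), nn.Linear(3, 5)(input2)})
local g = nn.gModule({input1, input2}, {score})

local mlp = nn.Linear(3, 3)  -- the "path2" layer that produces x_p from x

local x = torch.rand(3, 3)
local x_p = mlp:forward(x)
local result = g:forward({x, x_p})
local err = torch.rand(result:size())  -- stand-in for a criterion gradient

-- gradInput mirrors the input table:
-- grads[1] is the gradient w.r.t. x (direct path), grads[2] the gradient w.r.t. x_p.
local grads = g:backward({x, x_p}, err)

-- Chain rule: push the x_p gradient back through mlp, then add the direct-path gradient.
local grad_via_mlp = mlp:backward(x, grads[2])
local total_grad_x = grads[1] + grad_via_mlp

print(total_grad_x:size())
```

Building the whole thing as a single gModule with one input, as in the example above, lets nngraph do this summation itself.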

Only one question remains: why don't you get an error when you call:

gradient1 = g:backward(x, error) 
gradient2 = g:backward(x_p, error)

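An exception ought to be raised, because the first argument must be a table of two tensors, not a tensor. Well, while computing :backward(input, gradOutput), many torch modules (gModule evidently among them) don't use the input argument; they usually rely on a copy of input stored during the last :forward(input) call. In fact, the argument matters so little there that these modules don't even bother to validate it.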
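One way to check this is to compare the gradients returned for a wrongly shaped first argument against those returned for the correct table. This sketch assumes the nngraph behaviour reported in the question (no error for a tensor argument). The names d_x and max_diff are illustrative.

```lua
require 'torch'
require 'nn'
require 'nngraph'

local input1 = nn.Identity()()
local input2 = nn.Identity()()
local score = nn.CAddTable()({nn.Linear(3, 5)(input1), nn.Linear(3, 5)(input2)})
local g = nn.gModule({input1, input2}, {score})
local mlp = nn.Linear(3, 3)

local x = torch.rand(3, 3)
local x_p = mlp:forward(x)
local err = torch.rand(g:forward({x, x_p}):size())

-- Wrong shape for the first argument: a single tensor instead of a table.
-- Clone the result, because the module reuses its gradInput buffers.
local d_x = g:backward(x, err)[1]:clone()

-- Correct shape: a table that matches the forward input.
local g2 = g:backward({x, x_p}, err)

-- Largest absolute difference between the two gradients w.r.t. x.
local max_diff = (d_x - g2[1]):abs():max()
print(max_diff)
```

Note that each :backward() call also accumulates parameter gradients, so repeated calls like these should be followed by g:zeroGradParameters() before real training.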

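【Comments】:

Hi Alex, thanks for your answer. Instead of using a single input x, I created the gModule with two inputs a and b, where the value of b depends on a. I did this because the nn layer is more complicated than a linear transformation: it has an LSTM structure.
I have also included a mock-up of my code; please take a look, @Alexander Lutsenko.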