How to create an LSTM that allows dynamic sequence length in PyTorch

【Question title】: How to create LSTM that allows dynamic sequence length in PyTorch
【Posted】: 2022-12-09 07:42:02
【Question description】:

I have created an LSTM in PyTorch and I need it to accept a variable sequence length. Here is my code:

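import torch
import torch.nn as nn
from torch.autograd import Variable

# `device` is not defined in the original post; this definition is an assumption.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Seq2SeqSingle(nn.Module):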
    def __init__(self, input_size, hidden_size, num_layers, in_features, out_features):
        super(Seq2SeqSingle, self).__init__()
        self.out_features = out_features
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size

        self.fc_i = nn.Linear(input_size, out_features)
        self.fc_o = nn.Linear(out_features, input_size)
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
        self.fc_0 = nn.Linear(128*11, out_features)         ## <----------- LOOK HERE
        self.fc_1 = nn.Linear(out_features, out_features)

    def forward(self, x):
        #print(x.shape)
        output = self.fc_i(torch.relu(x))
        output = self.fc_o(torch.relu(output))
        
        h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device)
        c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device)
        output, (h_out, c_out) = self.lstm(output, (h_0, c_0))
        output = output.reshape(x.size(0), -1)
        output = self.fc_0(torch.relu(output))
        output = self.fc_1(torch.relu(output))
        output = nn.functional.softmax(output, dim = 1)
        return output

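To match the size of the LSTM layer's output, I have to multiply 128 (the hidden size) by 11 (the sequence length). Obviously, if I change the sequence length it crashes. How can I avoid specifying this fixed size?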
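
The arithmetic behind the crash can be checked with a small sketch. It assumes a batch size of 4 and uses a zero tensor in place of the real LSTM output; `flattened_width` is a hypothetical helper, not part of the code above.

```python
import torch

def flattened_width(seq_len, hidden_size=128, batch_size=4):
    # Stand-in for the LSTM output with batch_first=True: (N, L, H)
    output = torch.zeros(batch_size, seq_len, hidden_size)
    # Same reshape as in forward(): every time step is flattened into one row
    flat = output.reshape(output.size(0), -1)
    return flat.size(1)

print(flattened_width(11))  # 1408, which equals 128 * 11 and matches fc_0
print(flattened_width(12))  # 1536, which no longer matches fc_0's in_features
```

The flattened width grows with the sequence length, while the `in_features` of `fc_0` is fixed at construction time, so any length other than 11 produces a shape mismatch.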

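【Comments】:

Usually, people use the last hidden state rather than flattening all the hidden states for the next layer. If you are worried about losing information from earlier steps, you can aggregate all the hidden states with a mean, a sum, or a weighted sum (attention).
@joe32140 How would I do that? "Use the last hidden state rather than flattening all the hidden states for the next layer"
It looks like you are trying to classify the input sequence, i.e. assign a single label to a given input. Can you confirm this in your question?
When batch_first=True, the output has shape (N, L, D * H_{out}), so you can do last_hidden = output[:,-1,:]. Note that if you padded the sequences, picking the last hidden state may not be the best approach.
The length may change, but the size D * H_out does not depend on the sequence length. last_hidden = output[:,-1,:] means you take only the hidden state of the last step.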
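
The options mentioned in the comments can be sketched roughly as follows. This is only an illustration under stated assumptions: a unidirectional, single-layer LSTM with an input size of 8, and hypothetical helper names (`pool_last`, `pool_mean`, `pool_last_valid`). The last helper addresses the padding caveat by picking the last real step of each sequence from a `lengths` tensor.

```python
import torch
import torch.nn as nn

hidden_size = 128
# Assumed sizes for illustration only; unidirectional, so D = 1
lstm = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=1, batch_first=True)

def pool_last(output):
    # Hidden state of the final time step: (N, L, H) -> (N, H)
    return output[:, -1, :]

def pool_mean(output):
    # Average over the time dimension: (N, L, H) -> (N, H)
    return output.mean(dim=1)

def pool_last_valid(output, lengths):
    # For padded batches: pick step lengths[i] - 1 for each sequence i
    idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, output.size(2))
    return output.gather(1, idx).squeeze(1)

for seq_len in (5, 11, 20):
    x = torch.randn(4, seq_len, 8)
    output, _ = lstm(x)
    # Both pooled shapes stay (4, 128) regardless of seq_len
    print(seq_len, tuple(pool_last(output).shape), tuple(pool_mean(output).shape))
```

For a unidirectional LSTM the state at the last real step is not affected by padding that comes after it, which is why indexing with `lengths - 1` works here. Another option is `torch.nn.utils.rnn.pack_padded_sequence`, which lets the LSTM skip the padded steps altogether.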

Tags: machine-learning deep-learning pytorch lstm torch

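【Solution 1】:

You cannot avoid specifying a fixed size, because a linear layer needs a fixed input dimension. As joe32140 said in a comment, a common practice is to feed only the hidden state of the last step into the linear layer, so the size no longer depends on the number of steps.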
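
Applied to the model from the question, this could look roughly like the sketch below. It is one possible adaptation, not the asker's final code: the class name `Seq2SeqSingleLastStep` and the sizes in the usage lines are made up for illustration, the unused `in_features` argument is dropped, and the explicit zero initial states are omitted because `nn.LSTM` defaults to zeros when none are passed.

```python
import torch
import torch.nn as nn

class Seq2SeqSingleLastStep(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, out_features):
        super().__init__()
        self.fc_i = nn.Linear(input_size, out_features)
        self.fc_o = nn.Linear(out_features, input_size)
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        # in_features is hidden_size, not hidden_size * sequence_length
        self.fc_0 = nn.Linear(hidden_size, out_features)
        self.fc_1 = nn.Linear(out_features, out_features)

    def forward(self, x):
        output = self.fc_i(torch.relu(x))
        output = self.fc_o(torch.relu(output))
        # Initial hidden and cell states default to zeros
        output, (h_out, c_out) = self.lstm(output)
        # Keep only the last time step: (N, L, H) -> (N, H)
        last_hidden = output[:, -1, :]
        output = self.fc_0(torch.relu(last_hidden))
        output = self.fc_1(torch.relu(output))
        return torch.softmax(output, dim=1)

# Hypothetical sizes; the same model accepts several sequence lengths
model = Seq2SeqSingleLastStep(input_size=8, hidden_size=128, num_layers=2, out_features=16)
for seq_len in (5, 11, 30):
    probs = model(torch.randn(4, seq_len, 8))
    print(seq_len, tuple(probs.shape))  # shape is (4, 16) for every length
```

If the batches are padded, the last time step may be a padding step, so the caveat from the comments still applies.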
