【Question title】: PyTorch LSTM categorical model - output to target mapping
【Posted】: 2021-01-14 22:46:58
【Question】:

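I have a network that outputs a vector of length 2. My targets are in the form 1 or 0, referring to the two possible classes. What is the best way to obtain a loss: should I convert the targets to, e.g., 2-dimensional vectors, or should I convert the network's output, e.g., by taking the position of the largest value as the output?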

My network looks like this:

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm1 = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.lstm2 = nn.LSTM(hidden_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, 32)
        self.fc2 = nn.Linear(32, 1)
        self.dropout = nn.Dropout(p=0.2)
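        # Note: BatchNorm1d treats dim 1 of a 3-D input as the feature axis, so on the
        # batch_first LSTM output (batch, seq, hidden) this only works when seq_len == layer_dim
        self.batch_normalisation1 = nn.BatchNorm1d(layer_dim)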
        self.batch_normalisation2 = nn.BatchNorm1d(2)
        self.activation = nn.Softmax(dim=2)
    
    def forward(self, x):
        h0, c0 = self.init_hidden(x)
        out, (hn1, cn1) = self.lstm1(x, (h0, c0))
        out = self.dropout(out)
        out = self.batch_normalisation1(out)
        
        h1, c1 = self.init_hidden(out)
        out, (hn2, cn2) = self.lstm2(out, (h1, c1))
        out = self.dropout(out)
        out = self.batch_normalisation1(out)
        
        h2, c2 = self.init_hidden(out)
        out, (hn3, cn3) = self.lstm2(out, (h2, c2))
        out = self.dropout(out)
        out = self.batch_normalisation1(out)
        
        out = self.fc1(out[:, -1, :])
        out = self.dropout(out)
        out = self.fc2(out)
        return out
    
    def init_hidden(self, x):
        # Zero initial hidden and cell states, created on the same device as the input
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        return h0, c0
    
    def pred(self, x):
        out = self(x)
        return out > 0

An example input to this network is:

tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [2.3597e-04, 1.1507e-02, 8.7719e-02, 6.1093e-02, 9.5556e-01],
         [2.1474e-03, 5.3805e-03, 9.6491e-02, 2.2508e-01, 8.2222e-01]]])

which has shape torch.Size([1, 3, 5]). The targets are currently 1 or 0. However, the network outputs a vector such as:

tensor([[0.5293, 0.4707]], grad_fn=<SoftmaxBackward>)

What is the best way to set up the loss between these targets and the network's output?

Update:

I can now train the model as suggested in the answer:

model = LSTMClassifier(5, 128, 3, 1)
Epochs = 10
batch_size = 32

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-6)

for epoch in range(Epochs):
    if epoch == 0:
        # Baseline accuracy before training: count predictions that match the label
        model.eval()
        accurate = 0
        for X_instance, y_instance in zip(val_x, val_y):
            if model.pred(X_instance.view(-1, 3, 5)).item() == bool(int(y_instance)):
                accurate += 1
        print(f"Untrained accuracy test set: {accurate/len(val_x)}")
    print(f"Epoch {epoch + 1}")
    
    for n, (X, y) in enumerate(train_batches):
        model.train()
        optimizer.zero_grad()
        y_pred = model(X)
        loss = criterion(y_pred, y)
        loss.backward()
        optimizer.step()

    model.eval()
    accurate = 0
    for X_instance, y_instance in zip(val_x, val_y):
        # Count a hit whenever the prediction matches the label (both classes)
        if model.pred(X_instance.view(-1, 3, 5)).item() == bool(int(y_instance)):
            accurate += 1
    print(f"Accuracy test set: {accurate/len(val_x)}")

【Question discussion】:

    Tags: python pytorch


    【Solution 1】:

    You should not use any activation at the end of the network, and you should output a single neuron instead of two (trained with BCEWithLogitsLoss).
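
As a rough illustration of why the final activation can be dropped, the sketch below compares BCEWithLogitsLoss on raw outputs with an explicit sigmoid followed by BCELoss. The logits and targets are made-up values for illustration, not taken from the question's data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw network outputs (logits), one per sample: shape (batch, 1)
logits = torch.randn(4, 1)
# Binary targets as floats, with the same shape as the logits
targets = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# BCEWithLogitsLoss applies the sigmoid internally
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Two-step version for comparison: explicit sigmoid, then plain BCELoss
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_with_logits.item(), loss_two_step.item())
```

The two values should agree up to floating-point error; the fused version is generally the more numerically stable of the two, which is one reason to leave the activation out of the model.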

    Below is your neural network code, with comments added and unnecessary parts removed:

    import torch
    import torch.nn as nn

    class LSTMClassifier(nn.Module):
        def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
            super().__init__()
            self.hidden_dim = hidden_dim
            self.layer_dim = layer_dim
            self.lstm1 = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
            self.lstm2 = nn.LSTM(hidden_dim, hidden_dim, layer_dim, batch_first=True)
            self.fc1 = nn.Linear(hidden_dim, 32)
            # Output 1 neuron instead of two
            self.fc2 = nn.Linear(32, 1)
            # Model should not depend on batch size
            # self.batch_size = None
            # You are not using this variable
            # self.hidden = None
            self.dropout = nn.Dropout(p=0.2)
            self.batch_normalisation1 = nn.BatchNorm1d(layer_dim)
            self.batch_normalisation2 = nn.BatchNorm1d(2)
    
        def forward(self, x):
            # Hidden and cell states default to zeros when none are passed
            # h0, c0 = self.init_hidden(x)
            out, _ = self.lstm1(x)
            # No need for initial values
            # out, (hn1, cn1) = self.lstm1(x, (h0, c0))
            out = self.dropout(out)
            out = self.batch_normalisation1(out)
    
            # Same for all other cells you re-init with zeros, it's implicit
            out, _ = self.lstm2(out)
            out = self.dropout(out)
            out = self.batch_normalisation1(out)
    
            out, _ = self.lstm2(out)
            out = self.dropout(out)
            out = self.batch_normalisation1(out)
    
            out = self.fc1(out[:, -1, :])
            out = self.dropout(out)
            # No need for activation
            # out = F.softmax(self.fc2(out))
            out = self.fc2(out)
            return out
    
        # Return True (1) or False (0)
        def pred(self, x):
            return self(x) > 0
    

    I also added a pred method, which converts logits to targets (e.g. for use with some metrics).

    Basically, if your logit is below 0, the prediction is False; otherwise it is True. No activation is needed in this case.
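
A small sketch of why thresholding the logit at 0 needs no activation: the sigmoid is monotonic and maps 0 to 0.5, so comparing the logit with 0 gives the same result as comparing the probability with 0.5. The logit values below are illustrative only.

```python
import torch

# Hypothetical logits as a single-output network might produce them
logits = torch.tensor([[-0.5519], [0.3448], [-2.0], [3.0]])

# Thresholding the raw logit at 0 ...
pred_from_logits = logits > 0
# ... matches thresholding the sigmoid probability at 0.5
pred_from_probs = torch.sigmoid(logits) > 0.5

print(pred_from_logits.squeeze(1).tolist())  # [False, True, False, True]
print(torch.equal(pred_from_logits, pred_from_probs))  # True
```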

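    【Comments】:

    • Thanks, that seems to make sense. However, when I try to train with BCEWithLogitsLoss, I get the error RuntimeError: result type Float can't be cast to the desired output type Long. The targets I pass look like tensor([[0],[1], [1] .... [0], [0]]), while the network's output looks like tensor([[-0.5519], [0.3448] .... ])
    • This error was caused by the targets being converted to torch.LongTensor; casting the targets to float fixes it. I will leave this here for reference.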
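
The fix described in the comments can be sketched as follows, assuming the targets arrive as an integer tensor of shape (batch, 1). The tensors here are made-up examples, and targets_long and targets_float are hypothetical names.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# Hypothetical network outputs and integer targets, both of shape (batch, 1)
logits = torch.tensor([[-0.5519], [0.3448], [1.2]])
targets_long = torch.tensor([[0], [1], [1]])  # dtype is torch.int64 (Long)

# BCEWithLogitsLoss expects floating-point targets, so cast before computing the loss
targets_float = targets_long.float()
loss = criterion(logits, targets_float)

print(targets_float.dtype, loss.item())
```

In a training loop, the same cast could be applied once when the dataset is built, or per batch right before calling the criterion.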