PyTorch: ValueError: Expected input batch_size (256) to match target batch_size (128)

【Question title】: PyTorch: ValueError: Expected input batch_size (256) to match target batch_size (128)
【Posted】: 2021-03-06 11:37:15
【Question description】:

I am getting a ValueError while training a BiLSTM part-of-speech tagger with PyTorch. ValueError: Expected input batch_size (256) to match target batch_size (128).

def train(model, iterator, optimizer, criterion, tag_pad_idx):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        text = batch.p
        tags = batch.t
        
        optimizer.zero_grad()
        
        #text = [sent len, batch size]
        
        predictions = model(text)
        
        #predictions = [sent len, batch size, output dim]
        #tags = [sent len, batch size]
        
        predictions = predictions.view(-1, predictions.shape[-1])
        tags = tags.view(-1)
        
        #predictions = [sent len * batch size, output dim]
        #tags = [sent len * batch size]
        
        loss = criterion(predictions, tags)
                
        acc = categorical_accuracy(predictions, tags, tag_pad_idx)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)



def evaluate(model, iterator, criterion, tag_pad_idx):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text = batch.p
            tags = batch.t
            
            predictions = model(text)
            
            predictions = predictions.view(-1, predictions.shape[-1])
            tags = tags.view(-1)
            
            loss = criterion(predictions, tags)
            
            acc = categorical_accuracy(predictions, tags, tag_pad_idx)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

class BiLSTMPOSTagger(nn.Module):
        def __init__(self, 
                     input_dim, 
                     embedding_dim, 
                     hidden_dim, 
                     output_dim, 
                     n_layers, 
                     bidirectional, 
                     dropout, 
                     pad_idx):
            
            super().__init__()
            
            self.embedding = nn.Embedding(input_dim, embedding_dim, padding_idx = pad_idx)
            
            self.lstm = nn.LSTM(embedding_dim, 
                                hidden_dim, 
                                num_layers = n_layers, 
                                bidirectional = bidirectional,
                                dropout = dropout if n_layers > 1 else 0)
            
            self.fc = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
            
            self.dropout = nn.Dropout(dropout)
            
        def forward(self, text):
            embedded = self.dropout(self.embedding(text))
            outputs, (hidden, cell) = self.lstm(embedded)
            predictions = self.fc(self.dropout(outputs))        
            return predictions

..............................................................................

INPUT_DIM = len(POS.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 128
OUTPUT_DIM = len(TAG.vocab)
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.25
PAD_IDX = POS.vocab.stoi[POS.pad_token]

print(INPUT_DIM)  #output 22147
print(OUTPUT_DIM) #output 42

model = BiLSTMPOSTagger(INPUT_DIM, 
                        EMBEDDING_DIM, 
                        HIDDEN_DIM, 
                        OUTPUT_DIM, 
                        N_LAYERS, 
                        BIDIRECTIONAL, 
                        DROPOUT, 
                        PAD_IDX)

N_EPOCHS = 10

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion, TAG_PAD_IDX)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion, TAG_PAD_IDX)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut1-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')





ValueError                                Traceback (most recent call last)
<ipython-input-55-83bf30366feb> in <module>()
      7     start_time = time.time()
      8 
----> 9     train_loss, train_acc = train(model, train_iterator, optimizer, criterion, TAG_PAD_IDX)
     10     valid_loss, valid_acc = evaluate(model, valid_iterator, criterion, TAG_PAD_IDX)
     11 

4 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2260     if input.size(0) != target.size(0):
   2261         raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
-> 2262                          .format(input.size(0), target.size(0)))
   2263     if dim == 2:
   2264         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

ValueError: Expected input batch_size (256) to match target batch_size (128).

【Comments】:

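Could you add the code for your train() and evaluate() functions? I suspect the problem occurs in train() when the loss is computed. Print the shapes of your predictions and targets to check whether they match.
Also, the stack trace of the error would help. But I guess the error is thrown when the loss is computed.
@TheodorPeifer I just edited the question to include the training and evaluation functions. Thanks!!
@planet_pluto I included the stack trace. Thanks.
Could you print tags.shape and predictions.shape once before and once after applying .view()?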
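
The shape check requested in the comments could be sketched as below. This is only an illustration: `report_shapes` is a hypothetical helper that is not part of the question's code, and the dummy tensor sizes (5 tokens per sentence, batch of 4) are made up. Only the 42 output tags come from the question.

```python
import torch


def report_shapes(predictions, tags):
    """Print and return the shapes before and after the flattening used in train()."""
    flat_predictions = predictions.view(-1, predictions.shape[-1])
    flat_tags = tags.view(-1)
    shapes = {
        "predictions_before": tuple(predictions.shape),
        "tags_before": tuple(tags.shape),
        "predictions_after": tuple(flat_predictions.shape),
        "tags_after": tuple(flat_tags.shape),
    }
    for name, shape in shapes.items():
        print(f"{name}: {shape}")
    return shapes


# Dummy data with hypothetical sizes: [sent len, batch size, output dim] and [sent len, batch size]
dummy_predictions = torch.randn(5, 4, 42)
dummy_tags = torch.randint(0, 42, (5, 4))
shapes = report_shapes(dummy_predictions, dummy_tags)
```

Inside train(), the same call would go right after `predictions = model(text)`. The loss can only be computed if the first dimension of `predictions_after` equals the single dimension of `tags_after`.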

Tags: python neural-network pytorch lstm part-of-speech

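【Solution 1】:

(Continued from the comments)

I guess your batch size is 128 (it is not defined in the question), right? An LSTM returns one output per time step, but for classification you normally only want the last one. The first dimension of outputs is the sequence length, which seems to be 2 in your case. When you apply .view, that 2 is multiplied by your batch size (128), which gives 256. So right after the LSTM layer you need to take the last element of the output sequence outputs. Like this: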

def forward(self, text):
    embedded = self.dropout(self.embedding(text))
    outputs, (hidden, cell) = self.lstm(embedded)
    
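    # take last output
    # (batch_size, sequence_size and hidden_size are not defined in this answer;
    # they stand for the corresponding dimensions of outputs)
    outputs = outputs.reshape(batch_size, sequence_size, hidden_size)
    outputs = outputs[:, -1]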

    predictions = self.fc(self.dropout(outputs))        
    return predictions
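
To see which index actually selects the last time step, here is a minimal sketch of the shapes `nn.LSTM` returns. The embedding and hidden sizes are taken from the question. The sequence length of 2 is the answer's guess, and the random input only stands in for real embeddings.

```python
import torch
import torch.nn as nn

SEQ_LEN, BATCH, EMB, HIDDEN = 2, 128, 100, 128  # SEQ_LEN = 2 is an assumption from the answer

# Default layout (batch_first=False): input and output are [seq len, batch size, features]
lstm = nn.LSTM(EMB, HIDDEN, num_layers=2, bidirectional=True)
outputs, (hidden, cell) = lstm(torch.randn(SEQ_LEN, BATCH, EMB))

last_step = outputs[-1]         # last time step of every sentence
last_sentence = outputs[:, -1]  # every time step of the last sentence in the batch

print(tuple(outputs.shape))        # (2, 128, 256): features = 2 directions * HIDDEN
print(tuple(last_step.shape))      # (128, 256)
print(tuple(last_sentence.shape))  # (2, 256)

# With batch_first=True the layout is [batch size, seq len, features],
# and [:, -1] is then the index that picks the last time step.
lstm_bf = nn.LSTM(EMB, HIDDEN, num_layers=2, bidirectional=True, batch_first=True)
outputs_bf, _ = lstm_bf(torch.randn(BATCH, SEQ_LEN, EMB))
last_step_bf = outputs_bf[:, -1]
print(tuple(last_step_bf.shape))   # (128, 256)
```

With the default layout, `outputs[:, -1]` keeps the sequence dimension and drops the batch dimension. After the linear layer that would give [2, 42], which is consistent with the prediction shape reported in the discussion.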

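【Discussion】:

You are right, my batch size is 128. I applied your code above and it shows ValueError: Expected input batch_size (2) to match target batch_size (128). Predictions: torch.Size([2, 42]), tags: torch.Size([1, 128]); predictions: torch.Size([2, 42]), tags: torch.Size([128])
Could you add batch_first=True as an argument to nn.LSTM and try again?
Hm, then maybe you have to reshape it. I edited my answer.
Still the same.
Really? That is confusing. When you print the shape of outputs directly after the LSTM layer, it should be 2, 128, 42, right? Your goal is to take the last of the two to get 1, 128, 42 and then reshape it to 128, 42. Then pass it into the dense layer, which outputs 128, 1. Try to achieve that; normally it should work with the steps above. (Edit: also try dropping the reshape and using [-1] instead of [:, -1].)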
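
The original mismatch can be reproduced from the shapes mentioned in this thread alone. The sketch below assumes that `criterion` is `nn.CrossEntropyLoss` (it is not shown in the question, but the traceback goes through `nll_loss`), that the model output was [2, 128, 42], and that the tags were [1, 128] as printed above. Random tensors stand in for the real data.

```python
import torch
import torch.nn as nn

OUTPUT_DIM = 42
criterion = nn.CrossEntropyLoss()  # assumption: the question does not show its criterion

predictions = torch.randn(2, 128, OUTPUT_DIM)   # [sent len, batch size, output dim]
tags = torch.randint(0, OUTPUT_DIM, (1, 128))   # [1, batch size], as printed in the discussion

flat_predictions = predictions.view(-1, OUTPUT_DIM)  # [2 * 128, 42] = [256, 42]
flat_tags = tags.view(-1)                            # [1 * 128] = [128]

error_message = None
try:
    criterion(flat_predictions, flat_tags)
except (ValueError, RuntimeError) as err:  # the exception type may differ between versions
    error_message = str(err)
print(error_message)

# When the tags have the same [sent len, batch size] layout as the text, the loss is computed.
matching_tags = torch.randint(0, OUTPUT_DIM, (2, 128))
loss = criterion(flat_predictions, matching_tags.view(-1))
print(loss.item())
```

Under these assumptions, the 256 versus 128 in the error comes from the text and tag tensors having different sequence lengths (2 and 1). The thread does not show why the two fields differ, so comparing `batch.p.shape` and `batch.t.shape` is a reasonable next check.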