PyTorch Multi-GPU K80s Batch fails for Tensors

【Question title】: PyTorch Multi-GPU K80s Batch fails for Tensors
【Posted】: 2017-07-24 14:44:36
【Question】:

When training on a single GPU (the default), my training works with mini-batches.

if USE_CUDA:
    encoderchar = encoderchar.cuda()
    encoder = encoder.cuda()
    decoder = decoder.cuda()

However, when I train using all available GPUs, I get an error.

if USE_CUDA:
    encoderchar = torch.nn.DataParallel(encoderchar, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    encoder = torch.nn.DataParallel(encoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    decoder = torch.nn.DataParallel(decoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    encoderchar = encoderchar.cuda()
    encoder = encoder.cuda()
    decoder = decoder.cuda()

The following error occurs during the forward pass.

RuntimeError                              Traceback (most recent call last)
<ipython-input-10-227f3e86847c> in <module>()
18         loss, ar1, ar2 = train(data_input_batch_index, data_input_batch_length, data_target_batch_index, data_target_batch_length, 
19                                encoderchar, encoder, decoder, encoderchar_optimizer, encoder_optimizer, decoder_optimizer,
---> 20                                criterion, batch_size)
21 
22         # Keep track of loss
<ipython-input-8-21861d792653> in train(input_batch, input_batch_length, target_batch, target_batch_length, encoderchar, encoder, decoder, encoderchar_optimizer, encoder_optimizer, decoder_optimizer, criterion, batch_size)
21             #reshaped_input_length =  Variable(torch.LongTensor(reshaped_input_length)).cuda()
22         hidden_all, output = encoderchar(w, reshaped_input_length)
---> 23         encoder_input[ix] = output.transpose(0,1).contiguous().view(batch_size, -1)
24 
25     temporary_target_batch_length = [15] * batch_size
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/variable.py in __setitem__(self, key, value)
78         else:
79             if isinstance(value, Variable):
---> 80                 return SetItem(key)(self, value)
81             else:
82                 return SetItem(key, value)(self)
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py in forward(self, i, value)
37         else:  # value is Tensor
38             self.value_size = value.size()
---> 39         i._set_index(self.index, value)
40         return i
41 

RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THC/THCTensorCopy.cu:31

The arguments passed to the encoderchar forward call are a CUDA long tensor and a list.

hidden_all, output = encoderchar(w, reshaped_input_length)
encoder_input[ix] = output.transpose(0,1).contiguous().view(batch_size, -1)

nvidia-smi shows the following after the error is thrown.

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0     18320    C   python                                         453MiB |
|    1     18320    C   python                                         266MiB |
|    2     18320    C   python                                         266MiB |
|    3     18320    C   python                                         266MiB |
|    4     18320    C   python                                         266MiB |
|    5     18320    C   python                                         266MiB |
|    6     18320    C   python                                         266MiB |
|    7     18320    C   python                                         262MiB |
+-----------------------------------------------------------------------------+

What is going wrong here?

【Comments】:

What are the sizes/dimensions of hidden_all, output and encoder_input? And what is the value of batch_size?
Here are the sizes: hidden_all - torch.Size([15, 128, 500]), output - torch.Size([1, 128, 500]), encoder_input - torch.Size([15, 128, 500]). **This code runs fine in a single-GPU environment.**
batch_size is 128.

Tags: batch-processing pytorch

【Solution 1】:

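DataParallel needs to know which dim to split the input data along (i.e. which dim is the batch_size). By default it assumes that dim=0 is the batch dimension of the input.

In your case, the input to the encoderchar module has its batch size in dim 1.

So either modify the DataParallel instantiation to specify dim=1: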

encoderchar = torch.nn.DataParallel(encoderchar, device_ids=[0, 1, 2, 3, 4, 5, 6, 7], dim=1)

Or change the input size like this (moving the batch_size dim to 0):

w = w.view(batch_size, -1)
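
To see why the default dim=0 could produce the size mismatch, here is a pure-Python sketch of the chunking arithmetic. It assumes that w has shape (15, 128), i.e. (seq_len, batch), which the thread does not state. It also assumes that DataParallel splits inputs the way torch.chunk does and concatenates replica outputs along the same dim. chunk_sizes, split_shape and gather_shape are hypothetical helpers written for illustration, not PyTorch APIs.

```python
import math


def chunk_sizes(size, num_chunks):
    """Piece sizes when a dimension of length `size` is split torch.chunk-style:
    ceil-sized pieces, with a smaller last piece if needed."""
    step = math.ceil(size / num_chunks)
    sizes = []
    remaining = size
    while remaining > 0:
        piece = min(step, remaining)
        sizes.append(piece)
        remaining -= piece
    return sizes


def split_shape(shape, num_chunks, dim):
    """Per-replica shapes after splitting `shape` along `dim`."""
    return [shape[:dim] + (s,) + shape[dim + 1:]
            for s in chunk_sizes(shape[dim], num_chunks)]


def gather_shape(shapes, dim):
    """Shape after concatenating per-replica outputs along `dim`."""
    total = sum(s[dim] for s in shapes)
    first = shapes[0]
    return first[:dim] + (total,) + first[dim + 1:]


w_shape = (15, 128)  # ASSUMED (seq_len, batch); not given in the thread

by_dim0 = split_shape(w_shape, 8, dim=0)  # default: splits the sequence axis
by_dim1 = split_shape(w_shape, 8, dim=1)  # dim=1: splits the batch axis

# Each replica returns `output` of shape (1, local_batch, 500), as reported
# in the comments for the single-GPU case.
out_dim0 = gather_shape([(1, b, 500) for (_, b) in by_dim0], dim=0)
out_dim1 = gather_shape([(1, b, 500) for (_, b) in by_dim1], dim=1)

print(by_dim0, out_dim0)
print(by_dim1, out_dim1)
```

Under these assumptions, dim=0 gathers an output of shape (8, 128, 500), so view(batch_size, -1) yields (128, 4000) instead of the (128, 500) slot that encoder_input[ix] expects, which would be consistent with "sizes do not match". With dim=1 the gathered shape is (1, 128, 500), the same as on a single GPU. Separately, view reinterprets memory rather than swapping axes, so if w really is laid out as (seq_len, batch), w.transpose(0, 1) is the call that moves the batch dimension to the front.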

【Comments】:

I tried that too. It now fails with a different error:
    ---> 16 packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
         63 if len(lengths) != batch_size:
    ---> 64     raise ValueError("lengths array has wrong size")
input_lengths is a list of length 128 (batch_size). Do we need to slice that as well?
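Are you calling pack_padded_sequence inside the encoderchar module? Your batch size is 128 and you use 8 devices, so DataParallel splits your batch into 8 batches of size 16 each. Make sure, one way or another, that pack_padded_sequence gets the correct batch size, i.e. 16.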
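
The split described in this comment can be sketched in plain Python. split_lengths is a hypothetical helper, and the length values are placeholders. The sketch assumes each replica receives a contiguous, equal-sized slice of the batch in device order.

```python
def split_lengths(lengths, num_replicas):
    """Split a per-sequence length list into contiguous per-replica slices."""
    if len(lengths) % num_replicas != 0:
        raise ValueError("batch size must be divisible by the number of replicas")
    per_replica = len(lengths) // num_replicas
    return [lengths[i * per_replica:(i + 1) * per_replica]
            for i in range(num_replicas)]


batch_size, num_devices = 128, 8
input_lengths = [15] * batch_size  # placeholder lengths, one per sequence

shards = split_lengths(input_lengths, num_devices)
print(len(shards), len(shards[0]))
```

Each of the 8 shards holds 16 lengths, matching the 16 sequences a replica would see when the batch of 128 is divided across 8 devices.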
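Yes, I call pack_padded_sequence inside the encoderchar module. Isn't that hacky? I'll do it, though.
I don't find it hacky at all. I doubt DataParallel can split packed sequences, so doing it inside your module is probably the right approach.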
However, this is my forward function. DataParallel is splitting up input_seqs, but why not input_lengths?
    def forward(self, input_seqs, input_lengths, hidden=None):
        # Note: we run this all at once (over multiple batches of multiple sequences)
        embedded = self.embedding(input_seqs)
        packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
        outputs, hidden = self.gru(packed, hidden)
        outputs, output_lengths = torch.nn.utils.rnn.pad_packed_sequence(outputs)  # unpack (back to padded)
        return outputs, hidden
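
On the question in the last comment: the DataParallel documentation states that tensors are scattered along dim, while tuple, list and dict arguments are shallow-copied. For a list of plain Python ints, that would mean every replica receives all 128 lengths. The toy model below mimics only that rule. FakeTensor and toy_scatter are hypothetical stand-ins, not PyTorch's implementation, and the batch is placed on axis 0 for simplicity.

```python
class FakeTensor:
    """Stand-in for a tensor: a list of rows with the batch on axis 0."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)


def toy_scatter(args, num_replicas):
    """Build per-replica argument tuples: FakeTensors are chunked along
    axis 0, lists are shallow-copied whole, anything else is passed as-is."""
    per_replica = []
    for r in range(num_replicas):
        replica_args = []
        for arg in args:
            if isinstance(arg, FakeTensor):
                step = len(arg) // num_replicas
                replica_args.append(FakeTensor(arg.rows[r * step:(r + 1) * step]))
            elif isinstance(arg, list):
                replica_args.append(list(arg))
            else:
                replica_args.append(arg)
        per_replica.append(tuple(replica_args))
    return per_replica


input_seqs = FakeTensor([[0] * 15 for _ in range(128)])  # 128 dummy sequences
input_lengths = [15] * 128                               # plain Python list

replicas = toy_scatter((input_seqs, input_lengths), 8)
seqs0, lengths0 = replicas[0]
print(len(seqs0), len(lengths0))
```

In this model each replica gets 16 sequences but 128 lengths, which is the mismatch pack_padded_sequence reports. Slicing the list per replica, or passing the lengths as a tensor whose batch dimension matches dim, are two ways to keep them in step; neither is confirmed in the thread.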