IndexError: index -9223372036854775808 is out of bounds for dimension 1 with size 2 (answers)

【Question title】: IndexError: index -9223372036854775808 is out of bounds for dimension 1 with size 2
【Posted】: 2021-10-20 06:05:08
【Question description】:

I am trying to train a Siamese network using binary cross-entropy.

I get the following error in train_epoch:

y_true_2[range(y_true_2.shape[0]), y_true.long()] = 1
IndexError: index -9223372036854775808 is out of bounds for dimension 1 with size 2

Here is the code snippet for reference:

def train_epoch(train_loader, model, loss_fn, optimizer, cuda, log_interval, metrics, logging):
    for metric in metrics:
        metric.reset()

    model.train()
    losses = []
    total_loss = 0

    for batch_idx, ((x0, x1), y) in enumerate(train_loader):

        x0, x1, y_true = x0.cpu(), x1.cpu(), y.cpu()
        gc.collect()
        optimizer.zero_grad()
        output1, output2 = model(x0, x1)

        # Distance metric: PairwiseDistance
        p_dist = torch.nn.PairwiseDistance(keepdim=True)

        dy = p_dist(output1, output2)
        dy = torch.nan_to_num(dy)
        y_true = torch.nan_to_num(y_true)

        # The next lines normalize dy to the range [0, 1] by dividing it by its maximum value

        maximum_dy = torch.max(dy)
        maximum_dy = torch.nan_to_num(maximum_dy)
        dy = dy / maximum_dy

        maximum_y_true = torch.max(y_true)
        maximum_y_true = torch.nan_to_num(maximum_y_true)

        y_true = y_true / maximum_y_true

        dy = torch.squeeze(dy, 1)

        # Output tensor of dimension [4, 2] and input tensor of dimension [4] for the BCE loss function
        input_dy = torch.empty(dy.size(0), 2)
        input_dy[:, 0] = 1 - dy
        input_dy[:, 1] = dy

        y_true_2 = torch.zeros(dy.size(0), 2)
        y_true_2[range(y_true_2.shape[0]), y_true.long()] = 1

        m = nn.Sigmoid()
        loss = loss_fn(m(input_dy), y_true_2)

        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        total_loss += loss.item()

        input_dy_metric = torch.round(input_dy)


        for metric in metrics:
            metric(input_dy_metric, y_true_2)
            metric.total += y_true_2.shape[0]

        if batch_idx % log_interval == 0:
            message = 'Train: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                batch_idx, len(train_loader),
                100. * batch_idx / len(train_loader), np.mean(losses))
            for metric in metrics:
                message += '\t{}: {}'.format(metric.name(), metric.value())

            print(message)
            losses = []

    total_loss /= (batch_idx + 1)
    return total_loss, metrics

Please help me figure out what might be wrong. Thanks in advance.

【Comments】:

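Besides using a debugger, you could also try printing y_true_2.shape[0] and y_true.long(). That would at least give you a hint as to which index triggers the IndexError.
Also, take a look at y_true before it is converted with Tensor.long.
Here is the output of the checks you suggested: 1. print(y_true_2) = tensor([[0., 1.], [0., 1.], [0., 1.], [0., 1.]]) 2. print(y_true) = tensor([1., 1., 1., 1.]) 3. print(y_true_2.shape[0]) = 4 4. print(range(y_true_2.shape[0])) = range(0, 4) 5. print(y_true.long()) = tensor([1, 1, 1, 1])
Please post the full stack trace of the error so that the failing line can be confirmed.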
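
The checks suggested in these comments can be sketched as a small diagnostic, run on each batch just before the indexing line. The helper name `describe_bad_labels` and the sample tensor are hypothetical and not part of the original code; the sketch only reports which positions in the labels cannot be used as class indices.

```python
import torch

def describe_bad_labels(y_true, num_classes=2):
    """Hypothetical debugging helper: report labels unusable as indices."""
    # Positions that hold NaN.
    nan_mask = torch.isnan(y_true)
    # Replace NaN with 0 so that the range comparison below is meaningful.
    safe = torch.nan_to_num(y_true, nan=0.0)
    # Positions whose value falls outside [0, num_classes).
    range_mask = (safe < 0) | (safe >= num_classes)
    return {
        "nan_positions": nan_mask.nonzero().flatten().tolist(),
        "out_of_range_positions": (range_mask & ~nan_mask).nonzero().flatten().tolist(),
    }

# Made-up labels: one NaN (position 1) and one out-of-range value (position 3).
report = describe_bad_labels(torch.tensor([1.0, float("nan"), 0.0, 3.0]))
print(report)
```

Since the batch printed above looks fine, a check like this would probably need to run on every batch to catch the one that fails.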

Tags: python pytorch conv-neural-network training-data loss-function

【Solution 1】:

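-9223372036854775808 is the smallest int64 value. It is what y_true.long() typically produces when y_true contains NaN, so this index points to a NaN value in the labels.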
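
One way a NaN could end up in y_true, given the code in the question, is the normalization y_true / maximum_y_true: if every label in a batch is 0, the maximum is 0 and the division is 0/0. This is an assumption about the data, not something the question confirms. Note also that torch.nan_to_num is called before the division in the question's code, so it cannot catch a NaN created by the division. A minimal sketch with a made-up batch:

```python
import torch

# Hypothetical batch in which every label is 0 (an assumption about the data).
y_true = torch.tensor([0.0, 0.0, 0.0, 0.0])

# Same normalization as in the question: divide by the batch maximum.
maximum_y_true = torch.max(y_true)   # the maximum is 0
y_norm = y_true / maximum_y_true     # 0 / 0 gives NaN in every position

print(torch.isnan(y_norm))

# Casting NaN to an integer type is not well defined and the result depends
# on the platform; on many x86 CPU builds it is the smallest int64 value.
print(y_norm.long())
```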

Try replacing the NaN values with a definite value before the failing line, as shown here:

y_true_2 = torch.where(torch.isnan(y_true_2), torch.zeros_like(y_true_2), y_true_2)
y_true = torch.where(torch.isnan(y_true), torch.zeros_like(y_true), y_true)
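
As a variation on the same idea, the NaN replacement and the one-hot construction can be combined in one step. This is only a sketch: the helper name `one_hot_targets` is hypothetical, mapping NaN labels to class 0 is an assumption that may not suit every dataset, and `torch.nn.functional.one_hot` is swapped in for the manual indexing used in the question.

```python
import torch
import torch.nn.functional as F

def one_hot_targets(y_true, num_classes=2):
    """Hypothetical helper: build one-hot float targets from float labels."""
    # Replace NaN labels with 0 (an assumption), then cast to integer indices.
    idx = torch.nan_to_num(y_true, nan=0.0).long()
    # Clamp so that an unexpected value can never index out of range.
    idx = idx.clamp(0, num_classes - 1)
    return F.one_hot(idx, num_classes=num_classes).float()

# Made-up labels, including one NaN.
y_true = torch.tensor([1.0, float("nan"), 0.0, 1.0])
targets = one_hot_targets(y_true)
print(targets)
```

Whichever form is used, the replacement has to run after the division by maximum_y_true; otherwise a NaN produced by the division would still reach the indexing line.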
