Trying to understand cross_entropy loss in PyTorch (answers)

【Question title】: Trying to understand cross_entropy loss in PyTorch
【Posted】: 2019-12-01 08:16:20
【Question】:

This is a very newbie question, but I'm trying to get my head around cross_entropy loss in Torch, so I wrote the following code:

import torch

x = torch.FloatTensor([
    [1., 0., 0.],
    [0., 1., 0.],
    [0., 0., 1.],
])

print(x.argmax(dim=1))

y = torch.LongTensor([0, 1, 2])
loss = torch.nn.functional.cross_entropy(x, y)

print(loss)

The output is:

tensor([0, 1, 2])
tensor(0.5514)

Given that my input matches the expected output, I don't understand why the loss isn't 0.


Tags: python machine-learning pytorch

【Answer 1】:

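That's because the input you give the cross-entropy function is not treated as probabilities, as you assumed, but as logits, which are converted into probabilities row by row with this formula:

probas = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)

So the probability matrix PyTorch actually works with in your case is: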

[0.5761168847658291,  0.21194155761708547,  0.21194155761708547]
[0.21194155761708547, 0.5761168847658291, 0.21194155761708547]
[0.21194155761708547,  0.21194155761708547, 0.5761168847658291]
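A minimal pure-Python sketch (standard library only, so it runs without NumPy or PyTorch) that applies the row-wise softmax formula above to the question's input and reproduces that matrix. The helper name `softmax_rows` is hypothetical, not part of any library.

```python
import math

def softmax_rows(logits):
    """Row-wise softmax: exp(x_ij) / sum_k exp(x_ik) for each row i."""
    probs = []
    for row in logits:
        exps = [math.exp(v) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

# The question's input, which cross_entropy interprets as logits.
x = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

probs = softmax_rows(x)
for row in probs:
    print(row)

# The correct class only gets probability e / (e + 2), roughly 0.576,
# so its negative log-probability is roughly 0.5514 rather than 0.
print(-math.log(probs[0][0]))
```

Because every row has the same shape, each sample contributes the same term, and the mean over the three rows is the 0.5514 printed in the question.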

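【Discussion】:

From a mathematical point of view, the OP would need to turn y into a probability distribution.
Yes. If we change the input to something like x = torch.FloatTensor([[10., 0., 0.], [0., 10., 0.], [0., 0., 10.]]), the F.cross_entropy result gets close to zero. So F.cross_entropy rewards a larger gap between the ground-truth logit and the other classes' logits; setting the ground-truth entry to 1 is not what makes the loss minimal.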
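A small plain-Python sketch of why widening that gap drives the loss toward zero, assuming the same diagonal pattern as in the comment: for a row with logit `g` on the true class and `0` on the other two, the loss is `-log(e^g / (e^g + 2)) = log(1 + 2e^-g)`. The function name `diag_loss` is made up for illustration.

```python
import math

def diag_loss(g, num_classes=3):
    """Cross-entropy for one row whose true-class logit is g and whose
    other (num_classes - 1) logits are 0: log(1 + (num_classes - 1) * e^-g)."""
    return math.log1p((num_classes - 1) * math.exp(-g))

# All rows in the comment's example look alike, so the per-row loss equals the mean loss.
for g in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(f"gap={g:>4}: loss={diag_loss(g):.6f}")
```

For finite logits the loss stays strictly positive; it only approaches 0 as the gap grows.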

【Answer 2】:

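The torch.nn.functional.cross_entropy function combines log_softmax (a softmax followed by a logarithm) and nll_loss (the negative log-likelihood loss) into a single function, i.e. it is equivalent to F.nll_loss(F.log_softmax(x, 1), y).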
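Written out as a formula (assuming the default `reduction='mean'` and no class weights), for N samples with logits x_{i,j} over C classes and integer targets y_i, that combination is:

```latex
\ell(x, y) = -\frac{1}{N}\sum_{i=1}^{N} \log\frac{e^{x_{i,y_i}}}{\sum_{j=1}^{C} e^{x_{i,j}}}
           = \frac{1}{N}\sum_{i=1}^{N}\left(\log\sum_{j=1}^{C} e^{x_{i,j}} - x_{i,y_i}\right)
```

For the question's input, every row has logit 1 on its target class and 0 on the other two, so each term equals log(e + 2) - 1 ≈ 1.5514 - 1 = 0.5514, which is the printed loss.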

Code:

import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])
y = torch.LongTensor([0, 1, 2])

print(torch.nn.functional.cross_entropy(x, y))

print(F.softmax(x, 1).log())
print(F.log_softmax(x, 1))

print(F.nll_loss(F.log_softmax(x, 1), y))

Output:

tensor(0.5514)
tensor([[-0.5514, -1.5514, -1.5514],
        [-1.5514, -0.5514, -1.5514],
        [-1.5514, -1.5514, -0.5514]])
tensor([[-0.5514, -1.5514, -1.5514],
        [-1.5514, -0.5514, -1.5514],
        [-1.5514, -1.5514, -0.5514]])
tensor(0.5514)
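The same decomposition can be mirrored in plain Python to make each step explicit: log_softmax subtracts the row's log-sum-exp from every logit, and nll_loss (assuming the default mean reduction and no class weights) just picks `-logp[i][y[i]]` for each row and averages; it does not take a log itself. The helper names below are hypothetical stand-ins, not the PyTorch functions.

```python
import math

def log_softmax_rows(logits):
    """log_softmax per row: x_ij - log(sum_k exp(x_ik))."""
    out = []
    for row in logits:
        lse = math.log(sum(math.exp(v) for v in row))
        out.append([v - lse for v in row])
    return out

def nll_loss_mean(log_probs, targets):
    """Negative log-likelihood with mean reduction: take the target
    column of each row, negate it, and average over rows."""
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)

x = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
y = [0, 1, 2]

logp = log_softmax_rows(x)
print([[round(v, 4) for v in row] for row in logp])
print(round(nll_loss_mean(logp, y), 4))
```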

Read more about the torch.nn.functional.cross_entropy loss function here.


【Answer 3】:

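A complete, copy/paste-runnable example showing the categorical cross-entropy loss calculated via:

- paper + pencil + calculator
- NumPy
- PyTorch

Apart from tiny rounding differences, all 3 results are the same: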

import torch
import torch.nn.functional as F

import numpy as np

def main():

    ### paper + pencil + calculator calculation #################

    """
    predictions before softmax:
                  columns
               (4 categories)
        rows     1, 4, 1, 1
    (3 samples)  5, 1, 2, 1
                 1, 2, 5, 1

    ground truths (NOT one hot encoded)
          1, 0, 2

    preds softmax calculation:
    (e^1/(e^1+e^4+e^1+e^1)), (e^4/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1))
    (e^5/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1)), (e^2/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1))
    (e^1/(e^1+e^2+e^5+e^1)), (e^2/(e^1+e^2+e^5+e^1)), (e^5/(e^1+e^2+e^5+e^1)), (e^1/(e^1+e^2+e^5+e^1))

    preds after softmax:
    0.04332, 0.87005, 0.04332, 0.04332
    0.92046, 0.01686, 0.04583, 0.01686
    0.01686, 0.04583, 0.92046, 0.01686

    categorical cross-entropy loss calculation:
    (-ln(0.87005) + -ln(0.92046) + -ln(0.92046)) / 3 = 0.10166

    Note the loss ends up relatively low because all 3 predictions are correct
    """


    ### calculation via NumPy ###################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = np.array([[1, 4, 1, 1],
                      [5, 1, 2, 1],
                      [1, 2, 5, 1]], dtype=np.float32)

    # ground truths, NOT one hot encoded
    gndTrs = np.array([1, 0, 2], dtype=np.int64)

    preds = softmax(preds)

    loss = calcCrossEntropyLoss(preds, gndTrs)

    print('\n' + 'NumPy loss = ' + str(loss) + '\n')

    ### calculation via PyTorch #################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = torch.tensor([[1, 4, 1, 1],
                          [5, 1, 2, 1],
                          [1, 2, 5, 1]], dtype=torch.float32)

    # ground truths, NOT one hot encoded
    gndTrs = torch.tensor([1, 0, 2], dtype=torch.int64)

    loss = F.cross_entropy(preds, gndTrs)

    print('PyTorch loss = ' + str(loss) + '\n')
# end function

def softmax(x: np.ndarray) -> np.ndarray:
    numSamps = x.shape[0]

    for i in range(numSamps):
        x[i] = np.exp(x[i]) / np.sum(np.exp(x[i]))
    # end for

    return x
# end function

def calcCrossEntropyLoss(preds: np.ndarray, gndTrs: np.ndarray) -> float:
    assert len(preds.shape) == 2
    assert len(gndTrs.shape) == 1
    assert preds.shape[0] == gndTrs.shape[0]

    numSamps = preds.shape[0]

    mySum = 0.0
    for i in range(numSamps):
        # Note: in numpy, "log" is actually natural log (ln)
        mySum += -1 * np.log(preds[i, gndTrs[i]])
    # end for

    crossEntLoss = mySum / numSamps
    return crossEntLoss
# end function

if __name__ == '__main__':
    main()

Program output:

NumPy loss = 0.10165966302156448

PyTorch loss = tensor(0.1017)
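One caveat about the hand-written NumPy `softmax` in this answer: it calls `exp` on the raw logits, which overflows for large values (a logit of 1000, for example). PyTorch's documentation for `softmax` recommends `log_softmax` for use with `NLLLoss` because it is faster and has better numerical properties. Below is a minimal sketch of the usual log-sum-exp trick in plain Python, using hypothetical helper names, applied to the same example data; it is an illustration, not PyTorch's actual implementation.

```python
import math

def stable_log_softmax(row):
    """log_softmax via the log-sum-exp trick: subtracting the row max
    before exponentiating keeps every exp() argument <= 0, so it cannot overflow."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy_mean(logits, targets):
    """Mean categorical cross-entropy from raw logits and integer class labels."""
    total = 0.0
    for row, t in zip(logits, targets):
        total += -stable_log_softmax(row)[t]
    return total / len(logits)

preds = [[1, 4, 1, 1],
         [5, 1, 2, 1],
         [1, 2, 5, 1]]
gndTrs = [1, 0, 2]
print(cross_entropy_mean(preds, gndTrs))

# Adding the same constant to every logit in a row leaves the loss unchanged,
# and because the max is subtracted first, even very large logits are handled.
big = [[v + 1000 for v in row] for row in preds]
print(cross_entropy_mean(big, gndTrs))
```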
