Constructing a Custom Loss Function in Keras: Answers

【Question title】: Constructing a Custom Loss Function in Keras
【Posted】: 2019-01-18 12:50:38
【Question】:

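I am trying to write a custom loss function in Keras from this paper. Namely, the loss I want to create is this:

This is a ranking loss for a multi-class, multi-label problem. Here are the details: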

Y_i = set of positive labels for sample i
Y_i^bar = set of negative labels for sample i (complement of Y_i)
c_j^i = prediction on i^th sample at label j
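
The equation image from the original post is not reproduced here. Going by the definitions above and the Update 3 code further down (which multiplies exp(-c) over the positive labels by exp(c) over the negative labels), the loss appears to have the form below. Treat it as a reconstruction from that code, not a quotation from the paper; the symbol m for the number of samples is an assumption introduced here.

```latex
E = \sum_{i=1}^{m} \frac{1}{|Y_i|\,|\bar{Y}_i|}
    \sum_{(k,l) \in Y_i \times \bar{Y}_i} \exp\!\bigl(-(c_k^i - c_l^i)\bigr)
```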

In what follows, y_true and y_pred both have 18 dimensions.

def multilabel_loss(y_true, y_pred):
    """ Multi-label loss function.

    More complete description here...

    """    
    zero = K.tf.constant(0, dtype=tf.float32)
    where_one = K.tf.not_equal(y_true, zero)
    where_zero = K.tf.equal(y_true, zero)

    Y_p = K.tf.where(where_one)
    Y_n = K.tf.where(where_zero)

    n = K.tf.shape(y_true)[0]
    loss = 0

    for i in range(n):
        # Here i is the ith sample; for a specific i, I find all locations
        # where Y_p, Y_n belong to the ith sample; axis 0 denotes
        # the sample index space
        Y_p_i = K.tf.equal(Y_p[:,0], K.tf.constant(i, dtype=tf.int64))
        Y_n_i = K.tf.equal(Y_n[:,0], K.tf.constant(i, dtype=tf.int64))

        # Here I plug in those locations to get the values
        Y_p_i = K.tf.where(Y_p_i)
        Y_n_i = K.tf.where(Y_n_i)

        # Here I get the indices of the values above
        Y_p_ind = K.tf.gather(Y_p[:,1], Y_p_i)
        Y_n_ind = K.tf.gather(Y_n[:,1], Y_n_i)

        # Here I compute Y_i and its complement
        yi = K.tf.shape(Y_p_ind)[0]
        yi_not = K.tf.shape(Y_n_ind)[0]

        # The value to normalize the inner summation
        normalizer = K.tf.divide(1, K.tf.multiply(yi, yi_not))

        # This creates a matrix of all combinations of indices k, l from the 
        # above equation; then it is reshaped
        prod = K.tf.map_fn(lambda x: K.tf.map_fn(lambda y: K.tf.stack( [ x, y ] ), Y_n_ind ), Y_p_ind )
        prod = K.tf.reshape(prod, [-1, 2, 1])
        prod = K.tf.squeeze(prod)

        # Next, the indices are fed into the corresponding prediction
        # matrix, where the values are then exponentiated and summed
        y_pred_gather = K.tf.gather(y_pred[i,:].T, prod)
        s = K.tf.cast(K.sum(K.tf.exp(K.tf.subtract(y_pred_gather[:,0], y_pred_gather[:,1]))), tf.float64)
        loss = loss + K.tf.multiply(normalizer, s)
    return loss

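My questions are as follows:

1. When I go to compile my graph, I get an error revolving around n, namely TypeError: 'Tensor' object cannot be interpreted as an integer. I have looked around, but I cannot find a way to stop this. My hunch is that I need to avoid the for loop altogether, which brings me to the second question.
2. How can I write this loss without the for loop? I am fairly new to Keras and spent a good few hours writing this custom loss myself. I would like to write it more concisely. What stops me from using matrices throughout is that Y_i and its complement can have a different size for each i.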
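
For context on that message: it is raised by Python's built-in range(), which needs a concrete integer, whereas K.tf.shape(y_true)[0] is a symbolic tensor whose value is only known when the graph runs. A minimal sketch of the mechanism, using a hypothetical stand-in class named Tensor (not TensorFlow's class) so that it runs without TensorFlow installed:

```python
class Tensor:
    """Hypothetical stand-in for a symbolic tensor: it carries no concrete
    integer value, so it does not implement __index__."""


def try_range(n):
    """Return the error message range() raises for n, or None on success."""
    try:
        range(n)
    except TypeError as err:
        return str(err)
    return None


print(try_range(3))         # a plain int is accepted, so this prints None
print(try_range(Tensor()))  # same wording as the error in the question
```

This is why looping over the batch with a Python for loop cannot work on a symbolic batch size; the loop has to be replaced by tensor operations.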

Please let me know if you would like me to elaborate on my code. Happy to do so.

Update 3

Based on the suggestions from @Parag S. Chandakkar, I have the following:

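# assumes standalone Keras 2.x on the TensorFlow backend, where K.tf exposes TensorFlow
import numpy as np
import tensorflow as tf
from keras import backend as K

def multi_label_loss(y_true, y_pred):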

    # set consistent casting
    y_true = tf.cast(y_true, dtype=tf.float64)
    y_pred = tf.cast(y_pred, dtype=tf.float64)

    # this gets all positive predictions and negative predictions
    # and exponentiates them within their respective label sets
    PT = K.tf.multiply(y_true, tf.exp(-y_pred))
    PT_complement = K.tf.multiply((1-y_true), tf.exp(y_pred))

    # this step gets the weight vector that we'll normalize by
    m = K.shape(y_true)[0]
    W = K.tf.multiply(K.sum(y_true, axis=1), K.sum(1-y_true, axis=1))
    W_inv = 1./W
    W_inv = K.reshape(W_inv, (m,1))

    # this step computes the outer product of two tensors
    def outer_product(inputs):
        """
        inputs: list of two tensors (of equal dimensions)
            for which you need to compute the outer product
        """
        x, y = inputs

        batchSize = K.shape(x)[0]

        outerProduct = x[:,:, np.newaxis] * y[:,np.newaxis,:]
        outerProduct = K.reshape(outerProduct, (batchSize, -1))

        # returns a flattened batch-wise set of tensors
        return outerProduct

    # set up inputs to outer product
    inputs = [PT, PT_complement]

    # compute final loss
    loss = K.sum(K.tf.multiply(W_inv, outer_product(inputs)))

    return loss
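
One way to gain confidence in the vectorized version is to compare it with a brute-force loop on a small batch. The sketch below is a NumPy port written for this check, not the Keras code itself; the function names loss_loops and loss_vectorized and the test data are made up for illustration.

```python
import numpy as np


def loss_loops(y_true, y_pred):
    """Brute-force reference: explicit loops over samples and label pairs."""
    total = 0.0
    for labels, scores in zip(y_true, y_pred):
        pos = np.flatnonzero(labels == 1)  # indices in Y_i
        neg = np.flatnonzero(labels == 0)  # indices in the complement of Y_i
        pair_sum = sum(np.exp(-(scores[k] - scores[l])) for k in pos for l in neg)
        total += pair_sum / (len(pos) * len(neg))
    return float(total)


def loss_vectorized(y_true, y_pred):
    """NumPy port of the Update 3 steps: masked exponentials, outer product, weights."""
    pt = y_true * np.exp(-y_pred)                   # exp(-c_k) on positive labels, 0 elsewhere
    pt_complement = (1 - y_true) * np.exp(y_pred)   # exp(c_l) on negative labels, 0 elsewhere
    w_inv = 1.0 / (y_true.sum(axis=1) * (1 - y_true).sum(axis=1))
    outer = pt[:, :, np.newaxis] * pt_complement[:, np.newaxis, :]
    return float(np.sum(w_inv[:, np.newaxis, np.newaxis] * outer))


rng = np.random.default_rng(0)
y_true = np.array([[1, 0, 0, 1], [0, 1, 1, 1], [1, 0, 0, 0]], dtype=float)
y_pred = rng.normal(size=y_true.shape)
print(np.isclose(loss_loops(y_true, y_pred), loss_vectorized(y_true, y_pred)))
```

Both versions divide by |Y_i| times the size of its complement, so every sample needs at least one positive and one negative label; otherwise the weight is a division by zero.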

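【Question comments】:

I would like to take a look at it. Could you add some short comments inside the for loop of your code explaining what exactly you are doing, or how it relates to the loss function given above?
@sdcbr I added some comments. I hope they help a bit! Thanks for taking a look.
Just a style note: use # for comments and """ for docstrings (the description of what your function does, which comes right after the signature at the top of the function). A docstring can be long and have all the indentation and whatnot that you want. I edited yours to illustrate.
Thanks, @Engineero. I will keep that in mind next time!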

Tags: python tensorflow keras tensor loss-function

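【Solution 1】:

This is not an answer so much as my thought process, which should help you write concise code.

First, I don't think you need to worry about that error for now, because your code will probably look very different once you eliminate the for loop.

Now, I have not looked at the paper, but the predictions c_j^i should be the raw values from the last non-softmax layer (that is my assumption).

So you can add an extra exp layer and compute exp(c_j^i) for each prediction. The for loop arises because of the summation. If you look closely, all it does is first form all pairs of labels and then subtract their corresponding predictions. So, first express the exponentiated subtraction as a product: exp(c_l^i) * exp(-c_k^i). To see what is going on, take a simple example.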

import numpy as np
a = [1, 2, 3]
a = np.reshape(a, (3,1))

Following the explanation above, you want the result below.

r1 = sum([1 * 2, 1 * 3, 2 * 3])  # = sum([2, 3, 6]) = 11

You can get the same result from the product of a with its transpose, which is one way to eliminate the for loop.

r2 = a * a.T
# r2 = array([[1, 2, 3],
#             [2, 4, 6],
#             [3, 6, 9]])

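Extract the upper triangular part (excluding the diagonal), i.e. 2, 3, 6, and sum that array to get 11, which is the result you want. There may be some differences; for example, you may need to form all pairs exhaustively. You should still be able to express that in matrix-multiplication form.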
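
A minimal sketch of that extraction step, assuming NumPy's np.triu with k=1 to keep only the entries strictly above the diagonal (this only checks the toy arithmetic; it is not meant for use inside the loss):

```python
import numpy as np

a = np.reshape([1, 2, 3], (3, 1))
r2 = a * a.T                  # 3x3 table of all pairwise products
upper = np.triu(r2, k=1)      # zero out the diagonal and everything below it
print(upper[upper != 0])      # the products 1*2, 1*3, 2*3
print(upper.sum())            # 11
```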

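After you take care of the summation term, you can easily calculate the normalization term if you pre-compute the quantities |Y_i| and |\bar{Y_i}| for each sample i. Pass them in as input arrays, and pass them to the loss function as part of y_pred. The final summation over i will be done by Keras.

Edit 1: Even if |Y_i| and |\bar{Y_i}| take different values, you should be able to build a general formula to extract the upper triangular part irrespective of the matrix size, since you are pre-computing |Y_i| and |\bar{Y_i}|.

Edit 2: I think you did not fully understand what I meant. In my opinion, NumPy should not be used in the loss function at all. This is (mostly) doable using TensorFlow alone. I will explain again, while keeping my earlier explanation.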

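1. I now know that there is a Cartesian product between the positive and negative labels (of sizes |Y_i| and |\bar{Y_i}| respectively). So, first, put a layer of exp after the raw predictions (in TF, not in NumPy).
2. Now you need to know which of the 18 dimensions of y_true correspond to positive labels and which to negative ones. If you are using one-hot encoding, you can find this on the fly using tf.where and tf.gather (see here).
3. By now you should know the indices j (in c_j^i) that correspond to the positive and negative labels. All you need to do is compute \sum_{(k, l)} exp(c_k^i) * (1 / exp(c_l^i)) over the pairs (k, l). To do that, form one tensor made of exp(c_k^i) for all k (call it A) and another made of 1 / exp(c_l^i) for all l (call it B), then compute sum(A * B^T). If you use the Cartesian product, there is also no need to extract the upper triangular part. At this point you should have the result of the innermost summation.
4. Contrary to what I said earlier, I think you can also compute the normalization factor on the fly from y_true.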
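
The steps above can be sketched for a single sample as follows. NumPy is used here only to check the arithmetic; inside a loss function the same steps would use TF operations. The sign convention follows the Update 3 code in the question (exp(-c_k) for positive labels, exp(c_l) for negative ones), so swap the signs if your loss is defined the other way. The sample values are made up.

```python
import numpy as np

y_true = np.array([1, 0, 0, 1, 0])            # one sample, 5 labels
c = np.array([0.3, -1.2, 0.8, 2.0, -0.5])     # raw predictions for that sample

pos = np.flatnonzero(y_true == 1)             # indices k of positive labels
neg = np.flatnonzero(y_true == 0)             # indices l of negative labels

A = np.exp(-c[pos])                           # one entry per positive label
B = np.exp(c[neg])                            # one entry per negative label

pairwise = A[:, np.newaxis] * B[np.newaxis, :]  # every (k, l) combination
inner_sum = pairwise.sum()

# The double sum factorizes, so the full table is not strictly required.
factorized = A.sum() * B.sum()

brute_force = sum(np.exp(-(c[k] - c[l])) for k in pos for l in neg)
print(np.isclose(inner_sum, brute_force), np.isclose(inner_sum, factorized))
```

Because the sum over all (k, l) pairs equals the product of the two separate sums, the inner summation can be computed without building the pairwise table at all.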

You only need to figure out how to extend this to three dimensions to handle multiple samples.
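
One possible way to handle a whole batch, sketched under the assumption that y_true is a 0/1 matrix of shape (samples, labels): multiply by the mask instead of gathering indices, so every row keeps the same length even though the label sets differ in size. The function name ranking_loss_batch is hypothetical, the sign convention is the one from the Update 3 code in the question, and NumPy again stands in for the equivalent TF operations.

```python
import numpy as np


def ranking_loss_batch(y_true, y_pred):
    """Batched loss using 0/1 masks instead of per-sample index lists."""
    pos_sum = np.sum(y_true * np.exp(-y_pred), axis=1)       # sum over positive labels
    neg_sum = np.sum((1 - y_true) * np.exp(y_pred), axis=1)  # sum over negative labels
    n_pos = y_true.sum(axis=1)                               # |Y_i|
    n_neg = (1 - y_true).sum(axis=1)                         # size of the complement
    return float(np.sum(pos_sum * neg_sum / (n_pos * n_neg)))


y_true = np.array([[1, 0, 0], [1, 1, 0]], dtype=float)
y_pred = np.zeros_like(y_true)
# With all-zero predictions every pair contributes exp(0) = 1,
# so each sample's normalized sum is 1.0 and this prints 2.0.
print(ranking_loss_batch(y_true, y_pred))
```

The masked entries contribute zero to each row sum, which is what lets a fixed-shape tensor represent label sets of varying size.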

Note: Using NumPy is probably possible via tf.py_func, but it seems unnecessary here. Just use TF functions.

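【Comments】:

This is very helpful! Please see the edit/update in my question. I have written a NumPy implementation of the loss, but I am still stuck on the Kronecker product.
Thanks again. I updated my question and added TensorFlow code that produces the loss for a single sample. I still have two questions. I do not understand why point (1), putting a layer of exp after the raw predictions, is necessary; also, I still have trouble generalizing this to 3 dimensions, because each sample has different sizes for Y_i and its complement. You have been a big help though!
You are right. You do not have to add an exp layer; using tf.exp works too. I only meant that you should not use NumPy functions. As for the other issue of Y_i having different sizes, note that the input size may vary but the output is always a scalar, so you can use tf.map_fn. All you need to do is write a function that takes y_true and y_pred as input, identifies Y_i and \bar{Y_i}, and outputs the innermost sum. Pass this function as an argument to map_fn.
I have updated my question again. I think I got it this time. I would be glad to hear your thoughts.
I tested them and they match. Thank you for your help! Your answer is accepted because it led me to the solution.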