TensorFlow 2D histogram: answers

【Question title】: Tensorflow 2d Histogram
【Posted】: 2019-03-30 06:35:13
【Question】:

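I am trying to create a 2D histogram in TensorFlow for use in a custom loss function. More generally, I think people could benefit from using the co-activations of neurons, which requires a similar structure.

Here is specifically what I am trying to do:

Given an Nx2 tensor, where N is some number of samples, I would like to create a (binned) histogram of co-activations. For example, in the simple case of input=[[0, 0.01], [0, 0.99], [0.5, 0.5]] with 10000 bins in total, I would like to generate a 100x100 tensor that is all 0 except for the 3 entries at (0, 0.01), (0, 0.99), and (0.5, 0.5), where the value would be 1/3 (scaling is easy, so I would be fine with 1 instead).

I can easily do this with standard numpy or array operations: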

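import numpy as np

# data is the Nx2 array described above; its columns are 0-indexed
neuron1 = data[:, 0]
neuron2 = data[:, 1]

hist_2d = np.zeros((100, 100))

# Pair the two activations of each sample; nesting the loops would count all N*N cross pairs
for neuron1_output, neuron2_output in zip(neuron1, neuron2):
    # min() keeps an activation of exactly 1.0 inside the last bin
    hist_2d[min(int(100 * neuron1_output), 99), min(int(100 * neuron2_output), 99)] += 1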
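
For reference, the same binning can be written without a Python loop. The following is a minimal NumPy sketch, assuming activations lie in [0, 1]; the function name `cohistogram` is made up for illustration and is not part of the question.

```python
import numpy as np

def cohistogram(data, nbins=100):
    """Count the rows of an (N, 2) array with values in [0, 1] into an nbins x nbins grid."""
    data = np.asarray(data, dtype=float)
    # Bin index for every value; the clip keeps an activation of exactly 1.0 in the last bin.
    idx = np.minimum((data * nbins).astype(int), nbins - 1)
    hist = np.zeros((nbins, nbins))
    # np.add.at accumulates correctly even when several samples share a cell.
    np.add.at(hist, (idx[:, 0], idx[:, 1]), 1)
    return hist

# The example input from the question.
data = [[0.0, 0.01], [0.0, 0.99], [0.5, 0.5]]
hist_2d = cohistogram(data)
print(np.argwhere(hist_2d == 1))  # row/column indices of the occupied cells
```

Dividing `hist_2d` by `hist_2d.sum()` gives the 1/3 scaling mentioned in the question.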

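If I want to use hist_2d as part of a loss function in TensorFlow, though, it seems I cannot do this kind of iteration.

Does anyone know a good way to generate the 2D histogram I am looking for? I was glad to find tf.histogram_fixed_width(), but that only generates 1D histograms. I have started looking into tf.while_loop() and tf.map_fn(), but I am quite new to TensorFlow, so I am not sure which avenue is the most promising.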

【Comments】:

Tags: python numpy tensorflow histogram distribution

【Solution 1】:

Maybe this snippet will help you.

import tensorflow as tf


@tf.function
def get2dHistogram(x, y,
                   value_range,
                   nbins=100,
                   dtype=tf.dtypes.int32):
    """
    Bins the x, y coordinates of points into a simple square 2D histogram.

    Given the tensors x and y (coordinates of the points), this operation
    returns a rank 2 `Tensor` of shape [nbins, nbins] holding the counts:
    entry [i, j] is the number of points whose y falls in bin i and whose
    x falls in bin j. The bins are equal width and determined by the
    arguments `value_range` and `nbins`.

    Args:
      x: Numeric 1-D `Tensor`, x coordinates of the points.
      y: Numeric 1-D `Tensor`, y coordinates of the points.
      value_range: value_range[0] gives the limits for x,
        value_range[1] the limits for y.
      nbins: Scalar `int32 Tensor`. Number of histogram bins per axis.
      dtype: dtype for the returned histogram.
    """
    x_range = value_range[0]
    y_range = value_range[1]

    # Bin index along y for every point.
    histy_bins = tf.histogram_fixed_width_bins(y, y_range, nbins=nbins, dtype=dtype)

    # For each y-bin i, histogram the x values of the points that fall into it.
    H = tf.map_fn(
        lambda i: tf.histogram_fixed_width(
            tf.boolean_mask(x, tf.equal(histy_bins, i)),
            x_range, nbins=nbins, dtype=dtype),
        tf.range(nbins, dtype=dtype))
    return H  # Matrix of shape [nbins, nbins]

Written for TensorFlow 2.0, but you can surely adapt it.
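
As an alternative to looping over bins with `tf.map_fn`, the counts can probably be obtained in one pass by combining the two bin indices into a single flat index and counting with `tf.math.bincount`. This is only a sketch under the assumption of TensorFlow 2.x in eager mode; the function name `histogram2d_bincount` is hypothetical, and the sample values are chosen mid-bin to avoid float rounding at bin edges.

```python
import tensorflow as tf

def histogram2d_bincount(x, y, value_range, nbins=100):
    """Count points into an [nbins, nbins] grid; rows index y-bins, columns index x-bins."""
    # Per-point bin indices along each axis (int32, clipped to [0, nbins - 1]).
    x_bins = tf.histogram_fixed_width_bins(x, value_range[0], nbins=nbins)
    y_bins = tf.histogram_fixed_width_bins(y, value_range[1], nbins=nbins)
    # One flat index per point: row-major position in the nbins x nbins grid.
    flat = y_bins * nbins + x_bins
    counts = tf.math.bincount(flat, minlength=nbins * nbins, maxlength=nbins * nbins)
    return tf.reshape(counts, [nbins, nbins])

x = tf.constant([0.005, 0.005, 0.505])
y = tf.constant([0.015, 0.995, 0.505])
hist = histogram2d_bincount(x, y, [[0.0, 1.0], [0.0, 1.0]])
print(tf.where(hist > 0))  # (y-bin, x-bin) indices of the occupied cells
```

The result uses the same layout as `get2dHistogram` above (y-bin along rows). Like any hard binning, it provides no gradient with respect to `x` or `y`.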

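【Comments】:

【Solution 2】:

Posting an "answer" that I came up with; it is more of a workaround.

The whole reason I wanted to create a 2D histogram is that I wanted to calculate the entropy of the joint distribution of the activations of two neurons. I had already discretized the activation values into bins, so it is fine if I shuffle the distribution around, since that does not change the entropy value.

Given that, here is what I did: I created a 1D histogram with a squared number of bins and simply shifted the values so that the first half of the digits corresponds to neuron 1's activation and the second half to neuron 2's activation. In Python: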
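
The flattening idea can be checked on its own. The NumPy sketch below is an illustration, not the answer's code: it uses integer bin indices `i * nbins + j` rather than a shifted floating-point value, and the names `entropy` and `acts` are made up. It shows that the flattened 1D histogram holds the same counts as the 2D one, so the joint entropy is identical.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy in nats of a histogram of counts; empty bins contribute nothing."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
nbins = 10
acts = rng.uniform(0.0, 1.0, size=(1000, 2))  # stand-in for two neurons' activations

# Per-sample bin index of each neuron, clipped so 1.0 falls in the last bin.
i = np.minimum((acts[:, 0] * nbins).astype(int), nbins - 1)
j = np.minimum((acts[:, 1] * nbins).astype(int), nbins - 1)

# Joint histogram built directly in 2D.
hist_2d = np.zeros((nbins, nbins))
np.add.at(hist_2d, (i, j), 1)

# The same counts as a 1D histogram over nbins * nbins flattened bins.
hist_1d = np.bincount(i * nbins + j, minlength=nbins * nbins)

joint_2d = entropy(hist_2d)
joint_1d = entropy(hist_1d)
print(joint_2d, joint_1d)
```

Because entropy depends only on the multiset of bin probabilities, any one-to-one relabelling of the bins, including this flattening, leaves it unchanged.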

import tensorflow as tf


# Calculate the entropy of a 1D tensor, fuzzing the edges with epsilon to keep numbers
# clean.
def calculate_entropy(y, epsilon):
    clipped = tf.clip_by_value(y, epsilon, 1 - epsilon)
    # tf.math.log replaces tf.log, which was removed in TensorFlow 2.x
    return -tf.cast(tf.reduce_sum(clipped * tf.math.log(clipped)), dtype=tf.float32)


# Sandbox for developing the calculation of the entropies of y
def tf_entropies(y, epsilon, nbins):
    # Create histograms for the activations in the batch.
    value_range = [0.0, 1.0]
    # For prototype, only consider first two features.
    neuron1 = y[:, 0]
    neuron2 = y[:, 1]
    hist1 = tf.histogram_fixed_width(neuron1, value_range, nbins=nbins)
    hist2 = tf.histogram_fixed_width(neuron2, value_range, nbins=nbins)
    # Normalize by the number of samples so that each distribution sums to 1
    # (counting the non-zero bins instead would not give probabilities)
    count = tf.reduce_sum(hist1)
    dist1 = tf.divide(hist1, count)
    dist2 = tf.divide(hist2, count)
    neuron1_entropy = calculate_entropy(dist1, epsilon)
    neuron2_entropy = calculate_entropy(dist2, epsilon)

    # Calculate the joint distribution and then get the entropy
    # Snap neuron1 down to the lower edge of its bin, i.e. k / nbins
    recast_n1 = tf.cast(tf.divide(tf.cast(nbins * neuron1, tf.int32), nbins), tf.float32)
    meshed = recast_n1 + tf.divide(neuron2, nbins)  # Shift over the numbers for neuron2
    joint_hist = tf.histogram_fixed_width(meshed, value_range, nbins=nbins * nbins)
    joint_dist = tf.divide(joint_hist, count)
    joint_entropy = calculate_entropy(joint_dist, epsilon)

    return neuron1_entropy, neuron2_entropy, joint_entropy, joint_dist

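Once I have the joint histogram, I can get the joint entropy using the normal procedure. I verified that I get the right result by implementing the same logic with ordinary numpy operations. The entropy calculations match.

I hope this helps others who run into a similar problem.

【Comments】:

The problem with your solution is that the rounding (the implicit cast to integer) has zero gradient. So it seems your entropy cannot be used for backpropagation/optimization.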
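
The zero-gradient point can be seen numerically. The NumPy sketch below is not from the thread. It compares hard binning with a swapped-in technique, a triangular-kernel "soft" histogram, under a tiny perturbation of one input. The function names, the kernel choice, and the sample values are all assumptions made for illustration.

```python
import numpy as np

def hard_hist(x, nbins):
    """Ordinary histogram on [0, 1]: each sample adds 1 to exactly one bin."""
    idx = np.minimum((x * nbins).astype(int), nbins - 1)
    return np.bincount(idx, minlength=nbins).astype(float)

def soft_hist(x, nbins):
    """Soft histogram: each sample is shared linearly between the nearest bin centres."""
    centres = (np.arange(nbins) + 0.5) / nbins
    # Weight is 1 at a bin centre and falls linearly to 0 one bin width away.
    w = np.clip(1.0 - np.abs(x[:, None] - centres[None, :]) * nbins, 0.0, None)
    return w.sum(axis=0)

x = np.array([0.12, 0.47, 0.81])
delta = 1e-4
step = np.array([delta, 0.0, 0.0])  # nudge only the first sample

hard_change = hard_hist(x + step, 10) - hard_hist(x, 10)
soft_change = soft_hist(x + step, 10) - soft_hist(x, 10)
print(hard_change)  # no bin changes: the finite-difference gradient is zero
print(soft_change)  # two neighbouring bins change in proportion to delta
```

A histogram of this soft kind is one possible way to keep an entropy term trainable, at the cost of blurring the bins; whether it suits a given loss is a separate question.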