Answers to: Problem with "Regression with Probabilistic Layers in TensorFlow Probability"

[Question title]: Problem with "Regression with Probabilistic Layers in TensorFlow Probability"
[Posted]: 2020-06-10 10:15:16
[Question description]:

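I'm having trouble with tfp.layers.DistributionLambda. I'm new to TF and struggling to get the tensors to flow. Could someone offer some insight into how the parameters of the output distribution are set?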

Context:

The TFP team wrote a tutorial, Regression with Probabilistic Layers in TensorFlow Probability, which builds the following model:

# Build model.
model = tfk.Sequential([
  tf.keras.layers.Dense(1 + 1),
  tfp.layers.DistributionLambda(
      lambda t: tfd.Normal(loc=t[..., :1],
                           scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])

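My question:

It uses tfp.layers.DistributionLambda to output a Normal distribution, but it is not clear to me how the parameters of tfd.Normal (the mean/loc and the standard deviation/scale) are set, so I have been unable to change the Normal into a Gamma distribution. I tried the following, but it did not work (the predicted distribution parameters are nan).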

def dist_output_layer (t, softplus_scale=0.05):
    """Create distribution with variable mean and variance
    """
    mean = t[..., :1]
    std_dev = 1e-3 + tf.math.softplus(softplus_scale * mean)

    alpha = (mean/std_dev)**2
    beta = alpha/mean

    return tfd.Gamma(concentration = alpha, 
                     rate = beta
                    )

# Build model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20,activation="relu"), # "By using a deeper neural network and introducing nonlinear activation functions, however, we can learn more complicated functional dependencies!
    tf.keras.layers.Dense(1 + 1), #two neurons here b/c the output layer's distribution's mean and std. deviation
    tfp.layers.DistributionLambda(dist_output_layer)
])

Many thanks.

[Question comments]:

Tags: python tensorflow tensorflow-probability

[Solution 1]:

To be honest, there is a lot to say about the code snippet you pasted from Medium.

Still, I hope you will find my comments below somewhat useful.

# Build model.
model = tfk.Sequential([

    # The first layer is a Dense layer with 2 units, one for each of the parameters that will
    # be learnt (see next layer). Its output shape is (batch_size, 2).
    # Note that this Dense layer has no activation function, because we want unconstrained real
    # values that will be used to parameterize the Normal distribution in the following layer.
    tf.keras.layers.Dense(1 + 1),

    # The following layer is a DistributionLambda that wraps a Normal distribution. The
    # DistributionLambda takes a function in its constructor, and this function receives the output
    # tensor of the previous layer as its input (that is, the Dense layer described above).
    # The goal is to learn the 2 parameters of the distribution: loc (the mean) and scale (the standard
    # deviation). A lambda is used for this. The ellipsis (the 3 dots) in the loc and scale
    # arguments stands for all leading dimensions, here the batch dimension, so t[..., :1] is the
    # first unit and t[..., 1:] is the second. Also note that scale (the standard deviation) must be
    # positive. The softplus function is used to keep the learnt scale positive.
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])
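
The same positivity constraint matters for the Gamma distribution from the question, where both concentration and rate must be positive. In the question's attempt the mean is taken straight from t[..., :1], which can be zero or negative, and std_dev is derived from the mean instead of from the second unit t[..., 1:]; either is a plausible source of the nan values. The sketch below illustrates one possible fix on plain floats so that it runs without TensorFlow. The helper names softplus and gamma_params_from_raw are hypothetical, as is the choice to pass the raw mean through softplus; in a real layer the same arithmetic would be applied to the tensor slices with tf.math.softplus before calling tfd.Gamma(concentration=..., rate=...).

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gamma_params_from_raw(raw_mean: float, raw_std: float,
                          softplus_scale: float = 0.05,
                          epsilon: float = 1e-3):
    """Hypothetical helper: map two unconstrained outputs to (concentration, rate).

    raw_mean plays the role of t[..., :1] and raw_std the role of t[..., 1:].
    """
    # Assumption: force the mean to be positive, since a Gamma mean must be > 0.
    mean = epsilon + softplus(raw_mean)
    # Use the second output for the standard deviation, as the Normal example does.
    std_dev = epsilon + softplus(softplus_scale * raw_std)

    # For a Gamma distribution: mean = alpha / beta, variance = alpha / beta**2.
    concentration = (mean / std_dev) ** 2   # alpha = mean**2 / variance
    rate = concentration / mean             # beta  = mean / variance
    return concentration, rate


# A negative raw mean still yields valid (positive) parameters.
concentration, rate = gamma_params_from_raw(raw_mean=-3.0, raw_std=0.5)
print("concentration:", concentration, "rate:", rate)
print("implied mean:", concentration / rate)
print("implied std dev:", math.sqrt(concentration) / rate)
```

The implied mean and standard deviation printed at the end recover the constrained values, which is a quick way to check that the moment matching is consistent.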

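[Comments]:

Great explanation! Any idea where the "0.05" in the softplus comes from? I'm having a hard time understanding it.
@LucasMiranda Sorry, I didn't see your comment. In any case, Winthrop Harvey has provided some pointers above. You may also want to look at this GitHub page, github.com/tensorflow/probability/issues/703, where it is discussed as well.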

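[Solution 2]:

Regarding the question about adding .05: it is a small offset that avoids some gradient problems that could arise without it. Essentially it states a prior belief that the true variability is no smaller than epsilon (here .05), so we make sure the std dev never gets smaller than that by adding it.

See https://github.com/tensorflow/probability/issues/751

Money quote: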

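"If, in practice on a given task, infinitesimal scales end up being a problem, the workaround we typically use is softplus-and-shift, e.g. scale = epsilon + tf.math.softplus(unconstrained_scale), where epsilon is some tiny value like 1e-5 that we are a priori confident is much smaller than the true scale."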

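Edit: the value that is actually added is 1e-3, for the reason I described above. As for the multiplication by 0.05... it is probably just scaling or gradient tuning again. Or perhaps it is there so that the scale parameter starts out at a certain size.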
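
To make the two constants concrete, here is a minimal sketch of the softplus-and-shift mapping on plain floats, written without TensorFlow so it can be run anywhere. The helper names softplus and scale_from_raw are hypothetical; the formula mirrors scale = 1e-3 + tf.math.softplus(0.05 * t[..., 1:]) from the tutorial code.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def scale_from_raw(raw: float, multiplier: float = 0.05,
                   epsilon: float = 1e-3) -> float:
    """Hypothetical helper: map an unconstrained output to a positive scale."""
    return epsilon + softplus(multiplier * raw)


# Very negative raw outputs approach epsilon but never drop below it,
# and the multiplier makes the scale change slowly with the raw output.
for raw in (-1000.0, -10.0, 0.0, 10.0, 1000.0):
    print(f"raw={raw:8.1f}  scale={scale_from_raw(raw):.6f}")
```

With a raw output of 0 the scale is 1e-3 + ln 2, roughly 0.694, and a change of 1 in the raw output moves the softplus argument by only 0.05. That is consistent with the guess above that the multiplier tempers how fast the scale moves during training, though the thread does not confirm the authors' intent.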

[Comments]: