Standardizing input images for quantized neural networks: answers

【Question title】: Standardization of input images used in quantized neural networks
【Posted】: 2022-01-26 18:25:54
【Question】:

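I have been working with quantized neural networks (which require input images with pixel values in [0, 255]) for a while. For the ssd_mobilenet_v1.tflite model, the following standardization parameters are given at https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/2: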

 mean: 127.5
 std : 127.5

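So with these parameters, the general formula normalized_input = (input - mean) / std doesn't make sense to me. When a pixel value is below 128, the term in parentheses becomes 0, and so does the normalized input. That means every value below 128 results in a black pixel. That can't be right, or am I wrong?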
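For reference, here is the formula evaluated with mean = std = 127.5 for a few 8-bit pixel values. This is a minimal NumPy sketch (NumPy is an assumption, not part of the original post); the key point is that the arithmetic must happen in floating point, because the results lie in [-1, 1].

```python
import numpy as np

MEAN = 127.5
STD = 127.5

def normalize(pixels: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1]."""
    # Cast first: uint8 cannot represent negative or fractional results.
    return (pixels.astype(np.float32) - MEAN) / STD

pixels = np.array([0, 64, 127, 128, 255], dtype=np.uint8)
print(normalize(pixels))
```

Values below 128 become negative rather than zero: 0 maps to -1, 255 maps to 1, and 127 and 128 land just below and just above 0.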

Thanks for your help. I'd be glad to discuss this here.

Kind regards, Chris

【Comments】:

Tags: python tensorflow image-processing neural-network

【Answer 1】:

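I would say it is completely normal that every value in the tensor is normalized with the given mean and std. Values below 127.5 simply become negative (close to -1 for values near 0), which can show up as black pixels when displayed: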

import tensorflow as tf

mean = 127.5
std = 127.5
# Two channels of small values in [0, 1) plus one channel filled with 128
image = tf.concat(
    [tf.random.uniform((1, 2, 2, 2)), tf.fill((1, 2, 2, 1), 128.0)], axis=-1
)
normalized_image = (image - mean) / std  # values below 127.5 become negative
print(image)
print(normalized_image)

tf.Tensor(
[[[[  0.50647175   0.20693159 128.        ]
   [  0.18777049   0.9095379  128.        ]]

  [[  0.42894745   0.76806736 128.        ]
   [  0.58564055   0.31613588 128.        ]]]], shape=(1, 2, 2, 3), dtype=float32)
tf.Tensor(
[[[[-0.9960277  -0.998377    0.00392157]
   [-0.9985273  -0.99286634  0.00392157]]

  [[-0.99663574 -0.99397594  0.00392157]
   [-0.99540675 -0.9975205   0.00392157]]]], shape=(1, 2, 2, 3), dtype=float32)
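The negative values do not lose any information; they are just outside the range an image viewer expects. For example, matplotlib's imshow clips float RGB data to [0, 1], so most of these values would render as black. A minimal NumPy sketch, assuming you only want to look at the image, is to invert the normalization before displaying it:

```python
import numpy as np

MEAN = 127.5
STD = 127.5

def denormalize(normalized: np.ndarray) -> np.ndarray:
    """Invert (x - MEAN) / STD and return displayable uint8 pixels."""
    restored = normalized * STD + MEAN
    # Round before casting so float error (e.g. 127.99999) does not truncate to 127.
    return np.clip(np.rint(restored), 0, 255).astype(np.uint8)

original = np.array([[0, 50, 128], [200, 254, 255]], dtype=np.uint8)
normalized = (original.astype(np.float32) - MEAN) / STD
restored = denormalize(normalized)
print(np.array_equal(original, restored))
```

The round trip recovers the original pixels exactly, which shows the normalization itself is lossless.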

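I often come across projects that compute the mean and standard deviation over the entire image dataset and then normalize the images with these statistics: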

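import tensorflow as tf
import matplotlib.pyplot as plt

image = tf.concat(
    [tf.random.uniform((1, 2, 2, 2)), tf.fill((1, 2, 2, 1), 128.0)], axis=-1
)
# Mean and std over this whole tensor; with a real dataset, compute them over all training images
normalized_image = (image - tf.reduce_mean(image, keepdims=True)) / tf.math.reduce_std(image, keepdims=True)

print(image)
print(normalized_image)

# Separate axes: calling plt.imshow twice on the same axes would overwrite the first image
fig, (ax_in, ax_norm) = plt.subplots(1, 2)
ax_in.imshow(tf.squeeze(image, axis=0))  # imshow clips float RGB data to [0, 1]
ax_norm.imshow(tf.squeeze(normalized_image, axis=0))
plt.show()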

tf.Tensor(
[[[[7.1283507e-01 6.4363706e-01 1.2800000e+02]
   [1.5691042e-02 2.3734951e-01 1.2800000e+02]]

  [[6.6603470e-01 1.3576746e-01 1.2800000e+02]
   [3.1267488e-01 9.6504271e-01 1.2800000e+02]]]], shape=(1, 2, 2, 3), dtype=float32)
tf.Tensor(
[[[[-0.70291406 -0.7040649   1.414201  ]
   [-0.71450937 -0.7108226   1.414201  ]]

  [[-0.70369244 -0.71251214  1.414201  ]
   [-0.7095697  -0.69871914  1.414201  ]]]], shape=(1, 2, 2, 3), dtype=float32)
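The snippet above uses a single mean and std for one tensor. When the statistics come from a whole training set, they are commonly computed per channel. A minimal NumPy sketch, assuming the dataset fits in memory as an (N, H, W, C) uint8 array; the random data is only a stand-in for real images:

```python
import numpy as np

def channel_stats(images: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std over all images and pixels, each of shape (C,)."""
    data = images.astype(np.float64)
    return data.mean(axis=(0, 1, 2)), data.std(axis=(0, 1, 2))

def standardize(images: np.ndarray, mean: np.ndarray, std: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Broadcasts over the last (channel) axis; eps guards against a constant channel.
    return (images.astype(np.float32) - mean) / (std + eps)

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(16, 8, 8, 3), dtype=np.uint8)  # stand-in dataset
mean, std = channel_stats(train)
train_std = standardize(train, mean, std)
print(mean, std)
print(train_std.mean(axis=(0, 1, 2)), train_std.std(axis=(0, 1, 2)))
```

The statistics are computed once on the training set, and the same mean and std are then reused for validation and inference inputs.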

In many other projects, uint8 images are simply scaled to the [0, 1] range, which just means dividing each image by 255. Take a look at this post for more details.
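A minimal NumPy sketch of that [0, 1] scaling (the helper name to_unit_range is hypothetical):

```python
import numpy as np

def to_unit_range(images: np.ndarray) -> np.ndarray:
    """Scale uint8 images from [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0

img = np.array([[0, 128, 255]], dtype=np.uint8)
print(to_unit_range(img))
```

In TensorFlow, tf.image.convert_image_dtype(image, tf.float32) performs the same scaling for uint8 input.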

【Comments】:

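Oh yes, you're right. My question was a bit unclear: values below 128 lead to negative normalized values (which I assumed should be clamped to 0), and values above 128 lead to values smaller than 1, which also become zero when I use uint8_t as the data type. So why would anyone do this? Normalization itself makes sense to me, but with these values for std and mean it clearly doesn't.
I think it's different if you compute the mean and standard deviation over a batch of images and then normalize the images with those values. See the updated answer.
OK, so I "just" need to use other values for the mean and std? Aren't the mean and std given in TensorFlow meant to be used to normalize images for further processing in the neural network?
Updated the answer with some explanation.
I get it now. Thanks!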
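On the uint8_t concern in the comments above: a fully quantized TFLite model typically takes raw uint8 pixels, and its input tensor carries affine quantization parameters so that real_value = scale * (q - zero_point). The sketch below uses a hypothetical scale of 1/128 and zero_point of 128 (the real values should be read from the model, e.g. the `quantization` entry returned by `tf.lite.Interpreter.get_input_details()`). With these assumed values, the dequantized input lands in almost the same [-1, 1] range as (q - 127.5) / 127.5, which suggests feeding the raw uint8 pixels instead of normalizing them yourself and casting back to uint8.

```python
import numpy as np

# Hypothetical input quantization parameters; read the real ones from your model.
SCALE = 1.0 / 128.0
ZERO_POINT = 128

def dequantize(q: np.ndarray) -> np.ndarray:
    """Affine dequantization: real = scale * (q - zero_point)."""
    return SCALE * (q.astype(np.float32) - ZERO_POINT)

q = np.arange(256, dtype=np.uint8)  # every possible uint8 pixel value
real = dequantize(q)
reference = (q.astype(np.float32) - 127.5) / 127.5  # the float-model normalization
print(real.min(), real.max())
print(np.abs(real - reference).max())
```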

【Answer 2】:

Sorry, Alone!!! I think TensorFlow's Normalize function is a fractional function that takes beta, gamma, and sigma values into account.

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),  # one 32x32 RGB image per sample
    tf.keras.layers.Normalization(mean=3., variance=2., name='Layer_1'),  # (x - 3) / sqrt(2)
    tf.keras.layers.Normalization(mean=4., variance=6., name='Layer_2'),  # (x - 4) / sqrt(6)
    tf.keras.layers.Dense(256, activation='relu', name='Layer_3'),  # applied along the last axis
])

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(6, activation=tf.nn.softmax, name='Layer_4'))
model.summary()
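For comparison: according to the Keras documentation, tf.keras.layers.Normalization has no trainable beta or gamma; with a fixed mean and variance it computes (input - mean) / sqrt(variance). A NumPy sketch of what the two stacked Normalization layers above compute (the sample input is made up, and this reproduces the documented formula rather than calling Keras):

```python
import numpy as np

def keras_style_normalize(x: np.ndarray, mean: float, variance: float) -> np.ndarray:
    """(x - mean) / sqrt(variance), the formula documented for Normalization."""
    return (x - mean) / np.sqrt(variance)

x = np.array([1.0, 3.0, 5.0])
layer_1 = keras_style_normalize(x, mean=3.0, variance=2.0)        # like 'Layer_1'
layer_2 = keras_style_normalize(layer_1, mean=4.0, variance=6.0)  # like 'Layer_2'
print(layer_1)
print(layer_2)
```

Instead of passing mean and variance, the layer can also compute them from data with its adapt() method.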

# Note: get_variable creates new, randomly initialized variables in this scope;
# they are not the weights of the Keras layers defined above.
with tf.compat.v1.variable_scope('Layer_1', reuse=tf.compat.v1.AUTO_REUSE):
    v2 = tf.compat.v1.get_variable('v2', shape=[256])       # <tf.Variable 'Layer_1/v2:0' shape=(256,) dtype=float32, numpy=array([-0.06715409,  0.10130859,  0.05591007, -0.05931217,  0.10036706, ...
    x1 = tf.compat.v1.get_variable('x', shape=[256])        # <tf.Variable 'Layer_1/x:0' shape=(256,) dtype=float32, numpy=array([-6.63143843e-02,  3.17198113e-02,  1.04614533e-01, -2.30028257e-02, ...
    y1 = tf.compat.v1.get_variable('y', shape=[256])        # <tf.Variable 'Layer_1/y:0' shape=(256,) dtype=float32, numpy=array([-0.10782533,  0.01488321, -0.04950972, -0.09561327,  0.10698273, ...
    y2 = tf.compat.v1.get_variable('y_', shape=[256])       # <tf.Variable 'Layer_1/y_:0' shape=(256,) dtype=float32, numpy=array([-0.04931336, -0.10670284, -0.10054329, -0.09619174,  0.08752564, ...
    mu = tf.compat.v1.get_variable('mu', shape=[256])       # <tf.Variable 'Layer_1/mu:0' shape=(256,) dtype=float32, numpy=array([-0.06098992,  0.02202646, -0.05624849,  0.0602672 , -0.02878931, ...
    sigma = tf.compat.v1.get_variable('sigma', shape=[256]) # <tf.Variable 'Layer_1/sigma:0' shape=(256,) dtype=float32, numpy=array([ 2.84786597e-02,  1.00004725e-01, -8.51654559e-02, -5.34656569e-02, ...
    gamma = tf.compat.v1.get_variable('gamma', shape=[256]) # <tf.Variable 'Layer_1/gamma:0' shape=(256,) dtype=float32, numpy=array([ 0.10177503,  0.04634983, -0.02325767,  0.04158259,  0.10051229, ...
    beta = tf.compat.v1.get_variable('beta', shape=[256])   # <tf.Variable 'Layer_1/beta:0' shape=(256,) dtype=float32, numpy=array([-7.85651207e-02, -4.94908020e-02,  8.88925046e-03,  9.37148184e-03, ...
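The gamma and beta mentioned in this answer belong to batch normalization (tf.keras.layers.BatchNormalization), not to the Normalization layer. At inference time, batch normalization computes gamma * (x - moving_mean) / sqrt(moving_variance + epsilon) + beta per feature. A NumPy sketch with made-up parameter values (epsilon = 1e-3 matches the Keras default):

```python
import numpy as np

def batch_norm_inference(x, gamma, beta, moving_mean, moving_variance, epsilon=1e-3):
    """Inference-time batch normalization, applied per feature (last axis)."""
    return gamma * (x - moving_mean) / np.sqrt(moving_variance + epsilon) + beta

x = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 samples, 2 features (made-up values)
out = batch_norm_inference(
    x,
    gamma=np.array([1.0, 2.0]),
    beta=np.array([0.0, 0.5]),
    moving_mean=np.array([2.0, 3.0]),
    moving_variance=np.array([1.0, 1.0]),
)
print(out)
```

With gamma = 1, beta = 0 and epsilon = 0, this reduces to the plain (x - mean) / sqrt(variance) that the Normalization layer computes.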

【Comments】:

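Yes, the Normalization layer (which is not part of the question) "[..] will shift and scale inputs into a distribution centered around 0 with standard deviation 1. It accomplishes this by precomputing the mean and variance of the data, and calling (input - mean) / sqrt(var) at runtime."
Why do I get the feeling that all the unnecessary images you post in every answer are trolling? ;)
I don't know. I use a standard dataset and perform the task you asked about; this question is about image normalization.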