Answers to: Issue with Imagenet classification with VGG16 pretrained weights

【Question title】: Issue with Imagenet classification with VGG16 pretrained weights
【Posted】: 2023-03-19 08:08:01
【Question】:

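I am trying to run a vanilla ImageNet classification in TensorFlow using the VGG16 network (with VGG16 provided through the Keras backbone).

However, when I try to classify a sample elephant image, it gives completely unexpected results.

I can't figure out what the problem might be.

Here is the complete code I used: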

import tensorflow as tf
import numpy as np
from PIL import Image
from tensorflow.python.keras._impl.keras.applications import imagenet_utils


model = tf.keras.applications.VGG16()
VGG = model.graph

VGG.get_operations()
input = VGG.get_tensor_by_name("input_1:0")
output = VGG.get_tensor_by_name("predictions/Softmax:0")
print(input)
print(output)

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = np.expand_dims(image_array, axis=0)


with tf.Session(graph=VGG) as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    pred = (sess.run(output,{input:image_array}))
    print(imagenet_utils.decode_predictions(pred))

Here is the sample output I get:

Tensor("input_1:0", shape=(?, 224, 224, 3), dtype=float32)
Tensor("predictions/Softmax:0", shape=(?, 1000), dtype=float32)

[[('n02281406', 'sulphur_butterfly', 0.0022673723), ('n01882714', 'koala', 0.0021256246), ('n04325704', 'stole', 0.0020583202), ('n01496331', 'electric_ray', 0.0020416214), ('n01797886', 'ruffed_grouse', 0.0020229272)]]

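Judging by the probabilities, something seems to be wrong with the image data being passed in (all of them are very low).

But I can't figure out what is wrong.
And, as a human, I am quite sure the picture shows an elephant!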

【Comments】:

Tags: python tensorflow computer-vision conv-neural-network imagenet

【Solution 1】:

I think there are two mistakes. The first is that you have to rescale the image by dividing all pixels by 255.

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = image_array / 255.  # not "/=": in-place division fails on a uint8 array
image_array = np.expand_dims(image_array, axis=0)
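
A side note on dividing an array that comes straight from PIL: such arrays are uint8, and NumPy refuses an in-place `/=` on integer arrays because the float result cannot be stored back. The sketch below uses a synthetic array as a stand-in for the image (an assumption, so it runs without Elephant.jpg); whether dividing by 255 is the right preprocessing for VGG16 at all is a separate question that the rest of the thread takes up.

```python
import numpy as np

# Synthetic stand-in for np.array(PIL image): 8-bit RGB pixels, all white.
pixels = np.full((224, 224, 3), 255, dtype=np.uint8)

try:
    pixels /= 255.0            # in-place true division cannot write floats into uint8
    in_place_ok = True
except TypeError:              # NumPy's ufunc casting error is a TypeError subclass
    in_place_ok = False

# Converting to float first (or using a non in-place division) avoids the error.
scaled = pixels.astype(np.float32) / 255.0
```
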

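The second point occurred to me when looking at the prediction values. You have a vector of 1000 elements, and after rescaling they all have a prediction of about 0.1%. That means you have an untrained model. I don't know how to load the weights in TensorFlow, but in Keras, for example, you can do this: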

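from keras import applications  # import added; the original snippet omitted it

app = applications.vgg16
model = app.VGG16(
        include_top=False,    # False drops the 1000-class ImageNet classifier; use True to classify
        weights='imagenet',   # load the pretrained weights; weights=None gives random weights
        pooling="avg")        # only takes effect when include_top=False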

From what I have read, you have to download a separate file containing the weights, for example from GitHub.

Hope this helps.

EDIT1：

I tried the same model using Keras:

from keras.applications.vgg16 import VGG16, decode_predictions
from PIL import Image  # needed for Image.open below
import numpy as np

model = VGG16(weights='imagenet')

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = image_array/255.
x = np.expand_dims(image_array, axis=0)

preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=5)[0])

With the rescaling in place, I get bad predictions:

Predicted: [('n03788365', 'mosquito_net', 0.22725257), ('n15075141', 'toilet_tissue', 0.026636025), ('n04209239', 'shower_curtain', 0.019786758), ('n02804…', …, 0.01353887), ('n03131574', 'crib', 0.01316699)]

Without the rescaling, the result is good:

Predicted: [('n02504458', 'African_elephant', 0.95870858), ('n01871265', 'tusker', 0.040065952), ('n02504013', 'Indian_elephant', 0.0012253703), ('n01704323', 'triceratops', 5.0949382e-08), ('n02454379', 'armadillo', 5.0408511e-10)]

Now, if I remove the weights, I get the "same" result as you do with TensorFlow:

Predicted: [('n07717410', 'acorn_squash', 0.0010033853), ('n02980441', 'castle', 0.0010028203), ('n02124075', 'Egyptian_cat', 0.0010028186), ('n0417…', …, 0.0010027955), ('n02492660', 'howler_monkey', 0.0010027081)]

To me, this means that no weights are being applied. Maybe they are downloaded but not used.
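
As a quick sanity check for this "no weights" symptom, one can compare the top probability with the uniform value 1/N. This is only a minimal sketch in plain Python: `looks_untrained`, `entropy_ratio` and the factor 5 are hypothetical names and a threshold picked for illustration, not part of Keras or TensorFlow.

```python
import math

def looks_untrained(probs, factor=5.0):
    """True if the top probability is below `factor` times the uniform value 1/N."""
    uniform = 1.0 / len(probs)
    return max(probs) < factor * uniform

def entropy_ratio(probs):
    """Shannon entropy divided by its maximum log(N); close to 1.0 means near-uniform."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

# Illustrative 1000-class vectors (synthetic, not taken from a real model run).
flat = [0.001] * 1000                      # every class at 0.1%
peaked = [0.95] + [0.05 / 999] * 999       # one dominant class

flat_flagged = looks_untrained(flat)       # flat output is flagged
peaked_flagged = looks_untrained(peaked)   # confident output is not
```

With the 1000-class outputs in this thread, top values around 0.001 to 0.002 would be flagged by this check, while the 0.96 African_elephant result would not.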

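【Discussion】:

Thanks for the reply. Regarding your point 1, I realized that later; the result is the same after the change. Regarding point 2, VGG16() also provides the parameters you mentioned (weights and pooling), and their default values are fine, so there is no problem here. As you mentioned, VGG16 initially downloads the weights from GitHub (from (direct link to the file!!) github.com/fchollet/deep-learning-models/releases/download/v0.1/…)
Because of the size of the answer, I replied by editing my post.
Thanks for the detailed reply, but it still doesn't solve my problem. I checked the source of vgg16.py, and it seems model.load_weights(weights_path) is called in all cases.
Yes, in Keras you can load a weights file (for example, from training that has already been done), but I don't know the TensorFlow equivalent. I didn't find anything clear on Google, but there must be something. I also checked a book I have; the author creates placeholders for the weights and builds the layers himself, so that doesn't help either.
Don't divide by 255.0. Instead, use the keras.applications.vgg16.preprocess_input utility. Also, comment out init_op = tf.global_variables_initializer() and sess.run(init_op), which reset the weights to random values. Let me know if it doesn't work.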

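【Solution 2】:

It seems we can (or need to?) use the session from Keras, which has the loaded graph and weights associated with it, rather than creating a new session in TensorFlow and using the graph obtained from the Keras model, as shown below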

VGG = model.graph

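I think the graph obtained above does not have the weights (which is why the predictions were wrong), and the graph from the Keras session has the correct weights (so these two graph instances should be different)

Here is the complete code: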

import tensorflow as tf
import numpy as np
from PIL import Image
from tensorflow.python.keras._impl.keras.applications import imagenet_utils
from tensorflow.python.keras._impl.keras import backend as K


model = tf.keras.applications.VGG16()
sess = K.get_session()
VGG = model.graph  # only used to look up tensors by name; the loaded weights live in the Keras session

VGG.get_operations()
input = VGG.get_tensor_by_name("input_1:0")
output = VGG.get_tensor_by_name("predictions/Softmax:0")
print(input)
print(output)

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = np.expand_dims(image_array, axis=0)
image_array = image_array.astype(np.float32)
image_array = tf.keras.applications.vgg16.preprocess_input(image_array)

pred = (sess.run(output,{input:image_array}))
print(imagenet_utils.decode_predictions(pred))

This gives the expected result:

[[('n02504458', 'African_elephant', 0.8518132), ('n01871265', 'tusker', 0.1398836), ('n02504013', 'Indian_elephant', 0.0082286), ('n01704323', 'triceratops', 6.965483e-05), ('n02397096', 'warthog', 1.8662439e-06)]]

Thanks to Idavid for the tip about using the preprocess_input() function, and to Nicolas for the tip about the weights not being loaded.
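
For context on why preprocess_input matters here: for VGG16, Keras documents it as converting the image from RGB to BGR and zero-centering each channel with the ImageNet means, without any scaling. Below is a NumPy sketch of that behavior, assuming the commonly cited mean values; `vgg16_preprocess_sketch` is a hypothetical helper for illustration only, and real code should keep calling preprocess_input itself.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed values for the Caffe-style VGG preprocessing).
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(rgb_batch):
    """rgb_batch: array of shape (N, H, W, 3), RGB channel order, values in 0..255."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]           # reverse the last axis: RGB -> BGR
    return x - BGR_MEANS       # zero-center each channel; note there is no division by 255

# Tiny synthetic batch: one 2x2 image that is pure red.
red = np.zeros((1, 2, 2, 3), dtype=np.uint8)
red[..., 0] = 255
processed = vgg16_preprocess_sketch(red)
```

This may also be why the raw 0..255 pixels in Solution 1 still produced a sensible prediction while the values divided by 255 did not: the network expects inputs on roughly the 0..255 scale, only shifted by the channel means.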

【Discussion】: