How to apply Grad-CAM on my trained model? (Answers)

[Question title]: How to apply Grad-CAM on my trained model?
[Posted]: 2021-06-28 20:41:45
[Question]:

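I trained a model to decide whether an image is right or wrong (only 2 classes), and I followed the Grad-CAM guide on the Keras website. Each input image is resized to (250, 250) and normalized by dividing its NumPy array by 255 before being passed to the model for training. The code is attached below. I get the following error: Invalid reduction dimension (1 for input with 1 dimension(s) [Op:Mean]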

Data

image = cv2.imread("/content/drive/MyDrive/SendO2/Train/correct/droidcam-20210128-152301.jpg")
image = cv2.resize(image, (250, 250))
image = image.astype('float32') / 255
image = np.expand_dims(image, axis=0)

Model

model = Sequential()

#Adding first convolutional layer
model.add(Conv2D(64, (3,3), activation="relu"))

#Adding maxpooling
model.add(MaxPooling2D((2,2)))

#Adding second convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Adding third convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Adding fourth convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Adding fifth convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))

#Flattening the layers
model.add(Flatten())

model.add(Dense(128, input_shape = X.shape[1:], activation="relu"))

# Output layer. The image is either right or wrong, so only 2 neurons are needed.
model.add(Dense(2, activation = "softmax"))
# model.add(Dense(2, activation = "sigmoid"))

model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"])

GradCAM

def get_img_array(img_path, size):
    # `img` is a PIL image resized to `size`
    img = keras.preprocessing.image.load_img(img_path, target_size=size)
    # `array` is a float32 NumPy array of shape (height, width, 3)
    array = keras.preprocessing.image.img_to_array(img)
    # We add a dimension to transform our array into a "batch"
    # of shape (1, height, width, 3)
    array = np.expand_dims(array, axis=0)
    print(array.shape)
    return array

def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations
    # of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model(
        [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
    )

    # Then, we compute the gradient of the top predicted class for our input image
    # with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]

    # This is the gradient of the output neuron (top predicted or chosen)
    # with regard to the output feature map of the last conv layer
    grads = tape.gradient(class_channel, last_conv_layer_output)

    # This is a vector where each entry is the mean intensity of the gradient
    # over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # pooled_grads = tf.reduce_mean(grads)

    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    # then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    # For visualization purpose, we will also normalize the heatmap between 0 & 1
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()

Setting the parameters

img_size = (250, 250)
preprocess_input = keras.applications.xception.preprocess_input
decode_predictions = keras.applications.xception.decode_predictions

last_conv_layer_name = "dense_1"

# The local path to our target image
img_path =  "/content/drive/MyDrive/SendO2/Train/correct/droidcam-20210128-152301.jpg"

preprocess_input = keras.applications.xception.preprocess_input
decode_predictions = keras.applications.xception.decode_predictions


display(Image(img_path))

Running it

# Prepare image
img_array = preprocess_input(get_img_array(img_path, size=img_size))

# Make model
model = model_builder(weights="imagenet")

# Remove last layer's softmax
model.layers[-1].activation = None

# Print what the top predicted class is
preds = model.predict(img_array)
print("Predicted:", decode_predictions(preds, top=1)[0])

# Generate class activation heatmap
heatmap = make_gradcam_heatmap(img_array, model, last_conv_layer_name)

# Display heatmap
plt.matshow(heatmap)
plt.show()

Here is the error:

I would appreciate it if anyone could help me here.

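[Comments]:

Try this. Let me know if it works for you.
Hey, thanks for the link, but I get this error: InvalidArgumentError: Invalid reduction dimension (1 for input with 1 dimension(s) [Op:Mean]. One thing I want to ask: your model has 10 output neurons, while mine has only 2. Do you think that affects the code? Error: imgur.com/a/Wg9QfOS
How did you try the code from the link above? If possible, please share a Colab link.
drive.google.com/drive/folders/…
Sorry for the late reply. Please take a look at the answer. I hope it helps.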

Tags: python-3.x tensorflow keras deep-learning neural-network

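[Solution 1]:

Here is a complete, working demo. As in your case, I classify 2 classes with softmax and use the sparse_categorical_crossentropy loss. Your model definition has problems, so I wrote my own; don't worry, it is very simple.

To make this useful, I will answer end to end, so that new visitors can benefit too. I will classify whether a digit is even or odd, which is a binary classification task. At the end, we take a raw sample image, preprocess it, and compute a class activation map to see where the model is focusing. Here are the contents:

Prepare a 2-class dataset
Build and train a 2-class classifier
Compute Grad-CAM

Dataset

We will use MNIST and relabel it into two classes: even or odd.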

import tensorflow as tf 

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()

# x set 
x_train = tf.expand_dims(x_train, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [84,84]) 

# y set 
# odd : 0 ; even : 1
y_train = (y_train % 2 == 0).astype(int)

print(x_train.shape, y_train.shape)
(60000, 84, 84, 1) (60000,)

Model

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dropout,
                                     Dense, GlobalAveragePooling2D)

model = Sequential()

#Adding first convolutional layer
model.add(Conv2D(16, kernel_size=(3,3), input_shape = (84,84,1)))  
model.add(Conv2D(32, kernel_size=(3,3), activation="relu"))  
model.add(Conv2D(64, kernel_size=(3,3), activation="relu"))  
model.add(Conv2D(128, kernel_size=(3,3), activation="relu"))  
model.add(GlobalAveragePooling2D())   
model.add(Dropout(0.5))         
model.add(Dense(2, activation=tf.nn.softmax))       
model.summary()

model.compile(optimizer = "adam", 
              loss = "sparse_categorical_crossentropy", 
              metrics = ["accuracy"])

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_16 (Conv2D)           (None, 82, 82, 16)        160       
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 80, 80, 32)        4640      
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 78, 78, 64)        18496     
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 76, 76, 128)       73856     
_________________________________________________________________
global_average_pooling2d_3 ( (None, 128)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 258       
=================================================================
Total params: 97,410
Trainable params: 97,410
Non-trainable params: 0

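Note that for Grad-CAM we will use the layer conv2d_19, the one right before the GAP layer. (You can choose another layer that outputs 2D feature maps.) Now, train the model: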

model.fit(x_train, y_train, epochs=20, batch_size=256, verbose=2)

Epoch 1/20
235/235 - 32s - loss: 0.5549 - accuracy: 0.7203
Epoch 2/20
235/235 - 29s - loss: 0.4586 - accuracy: 0.7890
Epoch 3/20
235/235 - 29s - loss: 0.4160 - accuracy: 0.8132
Epoch 4/20
235/235 - 29s - loss: 0.4000 - accuracy: 0.8226
Epoch 5/20
235/235 - 29s - loss: 0.3830 - accuracy: 0.8304
Epoch 6/20
235/235 - 29s - loss: 0.3690 - accuracy: 0.8367
Epoch 7/20
235/235 - 29s - loss: 0.3619 - accuracy: 0.8421
...
...
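
As noted before the training step, Grad-CAM needs a target layer that still outputs spatial feature maps. If you are unsure which layer name to pass, one option is to search the model from the end for the last layer whose output has four dimensions (batch, height, width, channels). The helper below is a minimal sketch and not part of the original answer: `find_last_feature_map_layer` is a hypothetical name, and it assumes the model is already built so that each layer exposes `name` and `output.shape`.

```python
def find_last_feature_map_layer(model):
    """Return the name of the last layer whose output is a 4D feature map.

    Assumes `model.layers` is an ordered list and that every layer exposes
    `name` and `output.shape`, as the layers of a built Keras model do.
    """
    for layer in reversed(model.layers):
        # (batch, height, width, channels) has rank 4
        if len(layer.output.shape) == 4:
            return layer.name
    raise ValueError(
        "No layer with a 4D output found; Grad-CAM needs a spatial feature map."
    )
```

For the model summarized above, this would pick `conv2d_19`, since the GAP, dropout, and dense layers all have 2D outputs. In the question, `last_conv_layer_name` was set to `dense_1`, a `Dense` layer without spatial axes to average over, which is a likely source of the reduction error.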

Grad-CAM

We will use exactly the same GradCAM class as in my other answer here; nothing changes. After that, we load a sample image:

import cv2
import matplotlib.pyplot as plt
import numpy as np

image = cv2.imread('/content/5.png', 0)
image = cv2.bitwise_not(image)          # ATTENTION 
image = cv2.resize(image, (84, 84))

# checking how it looks 
plt.imshow(image, cmap="gray")
plt.show()

image = tf.expand_dims(image, axis=-1)     # from 84 x 84 to 84 x 84 x 1 
image = tf.divide(image, 255)              # normalize
image = tf.reshape(image, [1, 84, 84, 1])  # reshape to add batch dimension

print(image.shape) # (1, 84, 84, 1)

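Note: you may wonder why I use image = cv2.bitwise_not(image). My sample has a white background and a black foreground, so it does not look like the dataset the model was trained on (MNIST). That is why I had to apply bitwise_not to my sample; you may not need it in your case. For more details, see my other answer.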
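
For 8-bit grayscale images, `cv2.bitwise_not(image)` gives the same result as `255 - image`. The sketch below is not part of the original answer: it inverts a sample only when its background looks light. `match_mnist_polarity` is a hypothetical helper, and the mean-brightness threshold of 127 is an assumed heuristic.

```python
import numpy as np

def match_mnist_polarity(image, threshold=127):
    """Invert a uint8 grayscale image if it looks like dark ink on a light background.

    MNIST digits are light strokes on a dark background, so a mostly bright
    image is flipped. The threshold is a heuristic assumption.
    """
    image = np.asarray(image, dtype=np.uint8)
    if image.mean() > threshold:
        return 255 - image  # same result as cv2.bitwise_not for uint8
    return image

# Tiny example: a white background with a single black pixel in the middle
sample = np.full((3, 3), 255, dtype=np.uint8)
sample[1, 1] = 0
print(match_mnist_polarity(sample))
```

An image that is already dark on average is returned unchanged, so the helper can be applied to samples of either polarity.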

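OK, we know which digit this image shows. I also added a few lines to the image to see whether the model responds to them or to the main ROI. Let's see what the model says: even (1) or odd (0).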

preds = model.predict(image) 
i = np.argmax(preds[0])
i  # 0: the model correctly recognizes this as an odd number

Finding the class activation map

# `conv2d_19` - remember this, we talked about it earlier 
icam = GradCAM(model, i, 'conv2d_19') 
heatmap = icam.compute_heatmap(image)
heatmap = cv2.resize(heatmap, (84, 84))

image = cv2.imread('/content/5.png')
image = cv2.resize(image, (84, 84))
print(heatmap.shape, image.shape)

(heatmap, output) = icam.overlay_heatmap(heatmap, image, alpha=0.5)
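
The `GradCAM` class itself is not reproduced in this answer. As a rough illustration of the core Grad-CAM computation, the NumPy sketch below takes the target layer's activations and the gradients of the class score with respect to them (both assumed to be already computed, for example with `tf.GradientTape`) and turns them into a normalized heatmap. `gradcam_from_arrays` is a hypothetical name, and this is not the author's implementation.

```python
import numpy as np

def gradcam_from_arrays(activations, gradients, eps=1e-8):
    """Combine feature maps and gradients into a Grad-CAM heatmap.

    activations, gradients: arrays of shape (height, width, channels)
    for a single image. Returns a (height, width) array in [0, 1].
    """
    activations = np.asarray(activations, dtype=np.float32)
    gradients = np.asarray(gradients, dtype=np.float32)
    # Channel weights: average each channel's gradient over the spatial axes
    weights = gradients.mean(axis=(0, 1))        # shape (channels,)
    # Weighted sum of the feature maps along the channel axis
    cam = (activations * weights).sum(axis=-1)   # shape (height, width)
    # ReLU keeps only the regions that support the chosen class
    cam = np.maximum(cam, 0)
    # Scale to [0, 1]; eps avoids division by zero for an all-zero map
    return cam / (cam.max() + eps)

# Random stand-in data shaped like the output of `conv2d_19`: (76, 76, 128)
rng = np.random.default_rng(0)
heatmap_demo = gradcam_from_arrays(
    rng.normal(size=(76, 76, 128)),
    rng.normal(size=(76, 76, 128)),
)
print(heatmap_demo.shape)
```

The spatial mean in `weights` is the step that requires a 4D layer output: with a `Dense` layer there are no height and width axes to reduce over.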

Visualization

fig, ax = plt.subplots(1, 3)
fig.set_size_inches(20,20)

ax[0].imshow(heatmap)
ax[1].imshow(image)
ax[2].imshow(output)
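
`overlay_heatmap` belongs to the `GradCAM` class from the linked answer and is not shown here. The general idea can be sketched in NumPy as an alpha blend of the image with a colored version of the heatmap, assuming a heatmap in [0, 1] and a uint8 RGB image of the same height and width. `blend_heatmap` is a hypothetical name, and the red-only coloring is a simplification, not the author's colormap.

```python
import numpy as np

def blend_heatmap(heatmap, image, alpha=0.5):
    """Alpha-blend a [0, 1] heatmap (H, W) over a uint8 RGB image (H, W, 3)."""
    heatmap = np.clip(np.asarray(heatmap, dtype=np.float32), 0.0, 1.0)
    image = np.asarray(image, dtype=np.float32)
    # Put the heatmap intensity in the red channel only (simplified coloring)
    colored = np.zeros_like(image)
    colored[..., 0] = heatmap * 255.0
    blended = alpha * colored + (1.0 - alpha) * image
    return np.clip(blended, 0, 255).astype(np.uint8)

# Stand-in data: a uniform gray image and a square "hot" region
demo_image = np.full((84, 84, 3), 200, dtype=np.uint8)
demo_heatmap = np.zeros((84, 84), dtype=np.float32)
demo_heatmap[30:50, 30:50] = 1.0
overlay = blend_heatmap(demo_heatmap, demo_image)
print(overlay.shape, overlay.dtype)
```

Note that `cv2.imread` returns channels in BGR order, so an image loaded that way would need its channels reversed before it matches the RGB assumption made here.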

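[Comments]:

You are using Guided Grad-CAM, which is not the same as Grad-CAM. In the paper Sanity Checks for Saliency Maps, Adebayo et al. show that Guided Grad-CAM is essentially a function of the input alone and therefore does not reveal what the model has learned. This problem does not affect Grad-CAM. Moreover, input gradients in general are considered easy to shift arbitrarily, which means that any saliency method relying on input gradients does not truly reflect the model's learned behavior. In other words, Guided Backpropagation and Guided Grad-CAM are strongly discouraged today.
Clear answer, just what I was looking for.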