Keras Classification - Object Detection: Answers

[Question title]: Keras Classification - Object Detection
[Posted]: 2017-06-12 13:06:53
[Question]:

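I am using Keras and Python to do classification, followed by object detection. I have classified cats/dogs with over 80% accuracy, and I am fine with that result for now. My question is: how do I detect the cat or dog in an input image? I'm completely confused. I want to use my own weights, not pretrained ones from the internet.

Here is my code so far: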

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

#########################################################################################################
#VALUES
# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000 #1000 cats/dogs
nb_validation_samples = 800 #400cats/dogs
nb_epoch = 50
#########################################################################################################

#MODEL
model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
##########################################################################################################
#TEST AUGMENTATION
img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
x = img_to_array(img)  # this is a Numpy array with shape (3, 150, 150)
x = x.reshape((1,) + x.shape)  # this is a Numpy array with shape (1, 3, 150, 150)

# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory
i = 0
for batch in train_datagen.flow(x, batch_size=1,
                          save_to_dir='data/TEST AUGMENTATION', save_prefix='cat', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # otherwise the generator would loop indefinitely
##########################################################################################################
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

#PREPARE TRAINING DATA
train_generator = train_datagen.flow_from_directory(
        train_data_dir, #data/train
        target_size=(img_width, img_height),  #RESIZE to 150/150
        batch_size=32,
        class_mode='binary')  #since we are using binarycrosentropy need binary labels

#PREPARE VALIDATION DATA
validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,  #data/validation
        target_size=(img_width, img_height), #RESIZE 150/150
        batch_size=32,
        class_mode='binary')


#START model.fit
history =model.fit_generator(
        train_generator, #train data
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,  #validation data
        nb_val_samples=nb_validation_samples)


model.save_weights('savedweights.h5')
# list all data in history
print(history.history.keys())

#ACC VS VAL_ACC
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy ACC VS VAL_ACC')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
#LOSS VS VAL_LOSS
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss LOSS vs VAL_LOSS')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()


model.load_weights('first_try.h5')

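So now that I have classified cats and dogs, what do I need to do to feed in an image and scan it to find the cat or dog with a bounding box? I'm completely new to this and not even sure I'm approaching the problem the right way. Thank you.

UPDATE: Hi, sorry for posting the results so late; I was unable to work on this for a few days. I am importing an image and reshaping it to shape 1,3,150,150, because a 150,150 shape raises an error: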

Exception: Error when checking : expected convolution2d_input_1 to have 4 dimensions, but got array with shape (150L, 150L)

Importing the image:

#load test image
img=load_img('data/prediction/cat.155.jpg')
#reshape to 1,3,150,150
img = np.arange(1* 150 * 150).reshape((1,3,150, 150))
#check shape
print(img.shape)

Then I changed def predict_function(x) to:

def predict_function(x):
    # example of prediction function for simplicity, you
    # should probably use `return model.predict(x)`
   # random.seed(x[0][0])
  #  return random.random()
   return model.predict(img)

Now when I run:

best_box = get_best_bounding_box(img, predict_function)
print('best bounding box %r' % (best_box, ))

the output I get is best bounding box: None

So I just ran:

model.predict(img)

and got the following:

model.predict(img)
Out[54]: array([[ 0.]], dtype=float32)

So it is not checking whether it is a cat or a dog at all... Any ideas?

NOTE: when def predict_function(x) uses:

random.seed(x[0][0])
   return random.random()

I do get output; it checks the boxes and returns the best one.

[Comments on the question]:

Tags: python classification keras object-detection

[Answer 1]:

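The machine learning model you built and the task you are trying to accomplish are not the same. The model solves a classification task, whereas your goal is to detect an object inside the image, which is an object detection task.

Classification answers a boolean question (cat or dog?), whereas a detection problem has more than two possible answers, because it also has to say where in the image the object is.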

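What can you do?

I can suggest three possibilities to try:

1. Use a sliding window combined with your model

Crop boxes of defined sizes (e.g. from 20X20 to 160X160) and slide them over the image. For each window, predict the probability that it is a dog, and finally take the window with the maximum predicted probability.

This generates multiple candidates for the bounding box, and you choose the one with the highest probability.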

This can be slow, since we need to run predictions on hundreds or even thousands of crops.
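To get a feel for how many predictions this means, the number of windows can be counted up front. This is a minimal sketch, assuming square windows and a fixed stride; `count_windows` and its parameters are hypothetical names for illustration, not part of Keras.

```python
def count_windows(img_h, img_w, window_sizes, step):
    """Count the square crops a sliding window would produce."""
    total = 0
    for win in window_sizes:
        # a window larger than the image cannot be placed anywhere
        if win > img_h or win > img_w:
            continue
        rows = (img_h - win) // step + 1  # vertical positions
        cols = (img_w - win) // step + 1  # horizontal positions
        total += rows * cols
    return total


# 256x256 image, window sizes 20, 40, ..., 140, stride 10
print(count_windows(256, 256, range(20, 160, 20), 10))  # 2380
```

Each of those crops costs one forward pass through the model, so a larger stride or fewer window sizes reduces the work quickly.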

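Another option is to try implementing an RCNN or Faster-RCNN network on top of your network. These networks basically reduce the number of candidate bounding-box windows that have to be evaluated.

UPDATE: sliding window example

The code below demonstrates how to do the sliding window algorithm. You can change the parameters.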

import random
import numpy as np

WINDOW_SIZES = [i for i in range(20, 160, 20)]


def get_best_bounding_box(img, predict_fn, step=10, window_sizes=WINDOW_SIZES):
    best_box = None
    best_box_prob = -np.inf

    # loop window sizes: 20x20, 40x40, ..., 140x140
    for win_size in window_sizes:
        for top in range(0, img.shape[0] - win_size + 1, step):
            for left in range(0, img.shape[1] - win_size + 1, step):
                # compute the (top, left, bottom, right) of the bounding box
                box = (top, left, top + win_size, left + win_size)

                # crop the original image
                cropped_img = img[box[0]:box[2], box[1]:box[3]]

                # predict how likely this cropped image is dog and if higher
                # than best save it
                print('predicting for box %r' % (box, ))
                box_prob = predict_fn(cropped_img)
                if box_prob > best_box_prob:
                    best_box = box
                    best_box_prob = box_prob

    return best_box


def predict_function(x):
    # example of prediction function for simplicity, you
    # should probably use `return model.predict(x)`
    # cast to a plain int: recent Python versions reject NumPy integers as seeds
    random.seed(int(x[0][0]))
    return random.random()


# dummy array of 256X256
img = np.arange(256 * 256).reshape((256, 256))

best_box = get_best_bounding_box(img, predict_function)
print('best bounding box %r' % (best_box, ))

Sample output:

predicting for box (0, 0, 20, 20)
predicting for box (0, 10, 20, 30)
predicting for box (0, 20, 20, 40)
...
predicting for box (110, 100, 250, 240)
predicting for box (110, 110, 250, 250)
best bounding box (140, 80, 160, 100)

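2. Train a new network for the object detection task

Take a look at the Pascal VOC dataset, which contains 20 classes, two of which are cats and dogs.

The dataset contains the locations of the objects as the Y target.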
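To illustrate what such a target looks like, here is a minimal sketch that reads class names and bounding boxes from a Pascal VOC style XML annotation. The `SAMPLE_XML` content (file name and coordinates) is made up for illustration, and `parse_voc_objects` is a hypothetical helper; only the tag layout follows the VOC annotation format.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC tag layout (illustrative values only)
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>cat</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_objects(xml_text, wanted=("cat", "dog")):
    """Return [(class_name, (xmin, ymin, xmax, ymax)), ...] for wanted classes."""
    root = ET.fromstring(xml_text.strip())
    targets = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        if name not in wanted:
            continue
        box = obj.find("bndbox")
        coords = tuple(
            int(float(box.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        targets.append((name, coords))
    return targets


for name, box in parse_voc_objects(SAMPLE_XML):
    print(name, box)
```

A detection network is then trained to predict both the class and the four box coordinates, instead of the single cat/dog probability of the classifier.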

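3. Use an existing network for this task

Last but not least, you can reuse an existing network, or even do "knowledge transfer" (transfer learning) for your specific task.

Have a look at the convnets-keras lib.

So pick the approach that suits you best and update us with the results.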

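[Comments]:

Hi, thank you for such a great reply!!! I'll try the sliding window first; could you give an example of how to do this in code? Sorry, I'm very new to Python and Keras.
I've added a dummy example of the windowing; hope it helps.
I'll give it a try, sir. Thank you, I'll leave feedback here about the results!
I've updated the question with the results; what is wrong now? ;/
Try model.predict([x]), but you still need to resize the input image to fit the size of the model's input vector. I'd suggest opening another question to get help with this specific issue.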
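The resizing step mentioned in the last comment could look roughly like this. It is a minimal sketch, not the answerer's code: it assumes each crop is an (H, W, 3) array with values in 0..255, that the model expects channels-first input of shape (1, 3, 150, 150) as in the question, and that pixel values were rescaled by 1/255 during training. `prepare_crop` and `make_predict_fn` are hypothetical helper names, and the nearest-neighbour resize is a NumPy-only stand-in for a proper image resize.

```python
import numpy as np


def prepare_crop(crop, target_hw=(150, 150)):
    """Resize an (H, W, 3) crop and return a (1, 3, th, tw) float32 batch."""
    crop = np.asarray(crop)
    h, w = crop.shape[:2]
    th, tw = target_hw
    rows = np.arange(th) * h // th   # source row for every target row
    cols = np.arange(tw) * w // tw   # source column for every target column
    resized = crop[rows][:, cols]    # nearest-neighbour resize -> (th, tw, 3)
    chw = resized.transpose(2, 0, 1)  # channels-first, as the question's model expects
    # same 1/255 rescaling as the training generator, plus a batch axis
    return (chw.astype(np.float32) / 255.0)[np.newaxis]


def make_predict_fn(model_predict, target_hw=(150, 150)):
    """Wrap a batch predictor so it returns one float per crop."""
    def predict_fn(crop):
        probs = model_predict(prepare_crop(crop, target_hw))
        return float(np.asarray(probs).ravel()[0])
    return predict_fn


# Stand-in for model.predict so the sketch runs without Keras
fake_predict = lambda batch: np.array([[batch.mean()]], dtype=np.float32)
predict_function = make_predict_fn(fake_predict)
print(predict_function(np.full((40, 60, 3), 255, dtype=np.uint8)))
```

With a trained model, `make_predict_fn(model.predict)` would replace the stand-in, and the wrapped function must use the crop it receives rather than a fixed global image; otherwise every window gets the same score.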