faster-rcnn - 爱码网

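Preface:

As shown in the figure below, my self-made "Deep Learning" study plan sets the main tasks for November: get familiar with the major DL network models, mainly for classification and detection; read papers; and get familiar with pathology data. We are a two-person team. This month I am focusing on the classic object detection algorithms and on running classic examples with tensorflow or keras, mainly R-CNN, SPP-Net, Fast-RCNN, Faster-RCNN, and YOLO. My teammate is studying the classic classification networks, mainly the GoogLeNet family (inception-v1, inception-v2, inception-v3) and resnet. Each of us will write a detailed report, one on detection and one on classification, and then we will keep refining them, discussing, and sharing, so that the team gets the most out of working together.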

[Figure: the Deep Learning study plan]

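This post does not cover the theory; plenty of well-organized resources are available online, and I will write a separate post if I reach a deeper understanding later. It documents a hands-on run of Faster-RCNN, using resnet weights for pre-training and the KITTI dataset for fine-tuning. I recently found a developer on GitHub who is keen on open source and who implements deep learning examples mainly with tensorflow and keras. The Faster-RCNN practice in this post is organized from that developer's open-source code. The goal is that a reader who needs it can follow this post step by step, get the code running, and see what Faster-RCNN can do. If you run into any problem along the way, leave a comment and I will do my best to help.

This post assumes you already have tensorflow and keras set up. If not, see the following posts:

1) http://blog.csdn.net/houchaoqun_xmu/article/details/72461592

2) http://blog.csdn.net/houchaoqun_xmu/article/details/78508783

To run the Faster-RCNN example in this post, python3, tensorflow (>= 1.1) and keras (2.0.9) are recommended.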

References:

Source code on GitHub: https://github.com/Houchaoqun/keras_frcnn

TFFRCNN: https://github.com/CharlesShang/TFFRCNN

KITTI Datasets: http://www.cvlibs.net/datasets/kitti/index.php

h5 model weights (inception-v3, resnet50, VGG16, VGG19): http://pan.baidu.com/s/1dET5J7z (password: hdp9)

"Can't open attribute (can't locate attribute: 'layer_names')": http://blog.csdn.net/dugudaibo/article/details/78008918

"Deep learning and computer vision: this one article is enough" (深度学习与计算机视觉看这一篇就够了): http://blog.csdn.net/u012507022/article/details/51441629

Preparation:

1) Download the source code from GitHub:

git clone https://github.com/Houchaoqun/keras_frcnn  

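2) Download the KITTI dataset:

- Training labels: http://kitti.is.tue.mpg.de/kitti/data_object_label_2.zip

- Training images: http://kitti.is.tue.mpg.de/kitti/data_object_image_2.zip

3) Download the model weights and put them at the path below (create the model folder first). This post uses the resnet50 weights: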

./model/resnet50_weights_tf_dim_ordering_tf_kernels.h5    # download links for the weights are listed under References above

4) Create the data folders that the source code expects (paths are case-sensitive):

./media/jintian/Netac/Datasets/Kitti/object/training/image_2   
  
./media/jintian/Netac/Datasets/Kitti/object/training/label_2  

5) Put the downloaded KITTI images and labels under the corresponding paths (as the source code expects). The paths used in this post look like this:

./media/jintian/Netac/Datasets/Kitti/object/training/image_2/002468.png  # training image
./media/jintian/Netac/Datasets/Kitti/object/training/label_2/002468.txt  # training label
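Before going further, it can help to confirm that the images and labels landed in the right folders. The sketch below is not part of the repository; it assumes each image NNNNNN.png is paired with a label file NNNNNN.txt of the same stem, and the helper name `find_unpaired` is made up for illustration.

```python
from pathlib import Path


def find_unpaired(image_dir, label_dir):
    """Return (image stems without a label, label stems without an image)."""
    # Assumption: images are *.png and labels are *.txt with matching stems.
    image_stems = {p.stem for p in Path(image_dir).glob("*.png")}
    label_stems = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


if __name__ == "__main__":
    # Folder layout taken from step 4 above.
    root = Path("./media/jintian/Netac/Datasets/Kitti/object/training")
    no_label, no_image = find_unpaired(root / "image_2", root / "label_2")
    print("images without labels:", no_label[:10])
    print("labels without images:", no_image[:10])
```

If both lists come back empty and the folders are not empty, every image has a matching label.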

Environment used in this post: python 3.6.3 + tensorflow 1.1.0 + keras 2.0.9

~/document/deepLearning/github/keras_frcnn$ python  
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)   
[GCC 7.2.0] on linux  
Type "help", "copyright", "credits" or "license" for more information.  
>>> import tensorflow as tf  
>>> tf.__version__  
'1.1.0'  
>>> import keras  
Using TensorFlow backend.  
... ...  
... ...  
  
>>> keras.__version__  
'2.0.9'  

Pay attention to the versions of the frameworks and tools, otherwise errors may occur. For example, with keras 2.1.1 the following error is raised:

Error when checking target: expected rpn_out_class to have shape (None, None  
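Since the error above depends on the keras version, a small guard at the top of a script can flag a mismatch early. This is only a sketch: the two version strings come from this post, while the helper names `version_tuple` and `keras_version_status` are hypothetical and not part of keras or the repository.

```python
import re

TESTED_KERAS = "2.0.9"     # version used in this post
KNOWN_BAD_KERAS = "2.1.1"  # version reported above to raise the rpn_out_class error


def version_tuple(version):
    """Turn '2.0.9' into (2, 0, 9); stop at the first part with no leading digits."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def keras_version_status(installed):
    """Classify an installed keras version string as tested, known-bad, or untested."""
    current = version_tuple(installed)
    if current == version_tuple(TESTED_KERAS):
        return "tested"
    if current == version_tuple(KNOWN_BAD_KERAS):
        return "known-bad"
    return "untested"


print(keras_version_status("2.0.9"))  # tested
print(keras_version_status("2.1.1"))  # known-bad
```

In a real script, the argument would be `keras.__version__` after `import keras`.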

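KITTI dataset:

KITTI is a public dataset for benchmarking algorithms such as vehicle detection, vehicle tracking, and semantic segmentation in traffic scenes. It is widely used to evaluate vehicle recognition algorithms for autonomous driving.

KITTI homepage: http://www.cvlibs.net/datasets/kitti/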


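The download entries for the training images and training labels are shown in the figure below.

Direct download links are given here (the official site asks for an email address and then sends the download link to it):

1) Training labels: http://kitti.is.tue.mpg.de/kitti/data_object_label_2.zip

2) Training images: http://kitti.is.tue.mpg.de/kitti/data_object_image_2.zip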

[Figure: training image and label downloads on the KITTI site]

keras_frcnn code structure:

~/document/deepLearning/github/keras_frcnn$ tree -L 2  
.  
├── config.pickle  
├── extract_featuremap.py  
├── generate_simple_kitti_anno_file.py  
├── images  
│   ├── 000000.png  
│   ├── 000001.png  
│   ├── 000002.png  
│   ├── 000003.png  
│   ├── 000004.png  
│   ├── 000005.png  
│   ├── 000006.png  
│   ├── 000007.png  
│   ├── 000008.png  
│   ├── 000009.png  
│   ├── 000010.png  
│   ├── 000011.png  
│   ├── 000012.png  
│   ├── 000013.png  
│   ├── 000014.png  
│   └── 000015.png  
├── keras_frcnn  
│   ├── config.py  
│   ├── config.pyc  
│   ├── data_augment.py  
│   ├── data_augment.pyc  
│   ├── data_generators.py  
│   ├── data_generators.pyc  
│   ├── fixed_batch_normalization.py  
│   ├── fixed_batch_normalization.pyc  
│   ├── __init__.py  
│   ├── __init__.pyc  
│   ├── losses.py  
│   ├── losses.pyc  
│   ├── pascal_voc_parser.py  
│   ├── __pycache__  
│   ├── resnet.py  
│   ├── resnet.pyc  
│   ├── roi_helpers.py  
│   ├── roi_helpers.pyc  
│   ├── roi_pooling_conv.py  
│   ├── roi_pooling_conv.pyc  
│   ├── simple_parser.py  
│   ├── simple_parser.pyc  
│   ├── vgg.py  
│   ├── visualize.py  
│   └── visualize.pyc  
├── kitti_simple_label-backup.txt  
├── kitti_simple_label.txt  
├── measure_map.py  
├── media  
│   ├── jintian  
│   └── tri_images  
├── model  
│   ├── inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5  
│   ├── inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5  
│   ├── kitti_frcnn_last.hdf5  
│   └── resnet50_weights_tf_dim_ordering_tf_kernels.h5  
├── README.md  
├── requirements.txt  
├── results_images  
│   └── backup-images  
├── test_frcnn_kitti.py  
└── train_frcnn_kitti.py  
  
9 directories, 54 files  

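Note: as the code structure above shows, the ./model directory holds several "XX.h5" files and one "XX.hdf5" file of model weights. In line with the resnet50 configuration used throughout this post, "resnet50_weights_tf_dim_ordering_tf_kernels.h5" provides the initial weights (download it yourself and place it in that directory; links are listed under References above), and "kitti_frcnn_last.hdf5" stores the weights obtained by fine-tuning the model on the KITTI dataset.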

Configuration file: ./keras_frcnn/keras_frcnn/config.py

# -*- encoding: utf-8 -*-  
from keras import backend as K  
  
  
class Config:  
    def __init__(self):  
        self.verbose = True  
  
        # use resnet50 as the pretrained base network  
        self.network = 'resnet50'  
  
        # setting for data augmentation  
        self.use_horizontal_flips = False  
        self.use_vertical_flips = False  
        self.rot_90 = False  
  
        # Faster-RCNN parameters  
        # anchor box scales  
        self.anchor_box_scales = [128, 256, 512]  
        # anchor box ratios  
        self.anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]  
        # size to resize the smallest side of the image  
        self.im_size = 600  
        # image channel-wise mean to subtract  
        self.img_channel_mean = [103.939, 116.779, 123.68]  
        self.img_scaling_factor = 1.0  
        # number of ROIs at once  
        self.num_rois = 4  
        # stride at the RPN (this depends on the network configuration)  
        self.rpn_stride = 16  
        self.balanced_classes = False  
        # scaling the stdev  
        self.std_scaling = 4.0  
        self.classifier_regr_std = [8.0, 8.0, 4.0, 4.0]  
        # overlaps for RPN  
        self.rpn_min_overlap = 0.3  
        self.rpn_max_overlap = 0.7  
        # overlaps for classifier ROIs  
        self.classifier_min_overlap = 0.1  
        self.classifier_max_overlap = 0.5  
  
        # placeholder for the class mapping, automatically generated by the parser  
        self.class_mapping = None  
  
        # location of pretrained weights for the base network  
        # path of the model weights; this post mainly uses the following paths:  
        # 1) './model/resnet50_weights_tf_dim_ordering_tf_kernels.h5'  
        # 2) './model/kitti_frcnn_last.hdf5'  
        self.model_path = './model/kitti_frcnn_last.hdf5'  
  
        self.data_dir = '.data/'  
        # number of training epochs  
        self.num_epochs = 3000  
        # file that stores the labels  
        self.kitti_simple_label_file = 'kitti_simple_label.txt'  
        # TODO: this field is set to simple_label txt, which in very simple format like:  
        # TODO: /path/image_2/000000.png,712.40,143.00,810.73,307.92,Pedestrian, see kitti_simple_label.txt for detail  
        self.simple_label_file = 'simple_label.txt'  
        self.config_save_file = 'config.pickle'  
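To make the anchor and resize settings in this configuration more concrete, the sketch below enumerates the anchor shapes implied by `anchor_box_scales` and `anchor_box_ratios`, and the image size implied by `im_size`. It assumes each anchor is a scale multiplied by the two entries of a ratio pair, and that resizing keeps the aspect ratio; the repository's own code may round differently. The function names are made up for illustration.

```python
def anchor_shapes(scales, ratios):
    """Return one (width, height) pair per scale/ratio combination."""
    return [(scale * rw, scale * rh) for scale in scales for rw, rh in ratios]


def resized_shape(width, height, min_side=600):
    """Scale an image so its smaller side equals min_side, keeping the aspect ratio."""
    factor = min_side / min(width, height)
    return int(round(width * factor)), int(round(height * factor))


# Values copied from config.py above.
anchor_box_scales = [128, 256, 512]
anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]

shapes = anchor_shapes(anchor_box_scales, anchor_box_ratios)
print(len(shapes))   # 9 anchor shapes per feature-map position
print(shapes[:3])    # [(128, 128), (128, 256), (256, 128)]

# 1242 x 375 is an illustrative KITTI-like image size, not a value from this post.
print(resized_shape(1242, 375))  # (1987, 600)
```

With three scales and three ratios, every position on the stride-16 feature map gets nine candidate boxes.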

num_epochs = 3000 means the model is trained for 3000 epochs, which takes several days on a 1080Ti GPU. Setting num_epochs to 200 also gives decent results. Adjust the parameters to your own needs.
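The `rpn_min_overlap` and `rpn_max_overlap` values in config.py are thresholds on the intersection over union (IoU) between an anchor and a ground-truth box. The sketch below shows the usual Faster-RCNN convention under that assumption; the repository's exact rule (boundary handling, or always keeping the best anchor for each ground-truth box) may differ, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rpn_label(best_iou, min_overlap=0.3, max_overlap=0.7):
    """Label an anchor from its best IoU with any ground-truth box."""
    if best_iou >= max_overlap:
        return "positive"
    if best_iou < min_overlap:
        return "negative"
    return "neutral"  # ignored when computing the RPN loss


overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))  # half-overlapping boxes, IoU = 1/3
print(rpn_label(overlap))  # neutral
```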

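Steps:

1) Complete the preparation described above.

2) Run the following command to generate the kitti_simple_label.txt label file (adjust the directories to your own setup):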

python generate_simple_kitti_anno_file.py \  
./data/training/image_2 \  
./data/training/label_2  

On success, the output looks like this:

~/hcq/deep_learning/github/keras_frcnn$ python generate_simple_kitti_anno_file.py \  
> ./data/training/image_2 \  
> ./data/training/label_2  
got 7481 label files.  
convert finished.  
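For readers curious what this conversion involves, here is a minimal sketch. It is not the code of generate_simple_kitti_anno_file.py. It relies on the KITTI object label layout, where the first field is the class name and fields 5 to 8 are the 2D box (left, top, right, bottom in pixels). The function name and the sample label line are illustrative.

```python
def kitti_to_simple(image_path, label_line):
    """Convert one KITTI object label line to 'path,x1,y1,x2,y2,class'."""
    fields = label_line.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 fields, got %d" % len(fields))
    class_name = fields[0]
    x1, y1, x2, y2 = fields[4:8]  # 2D box: left, top, right, bottom
    # Coordinates are kept as the original strings so no digits are lost.
    return ",".join([image_path, x1, y1, x2, y2, class_name])


# Illustrative label line in KITTI format (type, truncated, occluded, alpha,
# bbox x4, dimensions x3, location x3, rotation_y).
sample = "Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01"
print(kitti_to_simple("./data/training/image_2/000000.png", sample))
# ./data/training/image_2/000000.png,712.40,143.00,810.73,307.92,Pedestrian
```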

The format of kitti_simple_label.txt is as follows:

./data/training/image_2/000090.png,5.08,199.56,126.68,269.46,Car  
./data/training/image_2/005525.png,585.62,176.58,602.32,189.82,Car  
./data/training/image_2/005525.png,475.73,177.95,510.45,202.70,Car  
./data/training/image_2/005525.png,531.44,176.06,546.07,188.60,DontCare  
./data/training/image_2/005525.png,566.85,171.90,581.48,188.61,DontCare  
./data/training/image_2/000513.png,568.87,174.25,772.15,366.01,Car  
./data/training/image_2/000513.png,1163.37,178.04,1241.00,374.00,Car  
./data/training/image_2/000513.png,719.08,169.65,842.26,226.62,Car  
./data/training/image_2/000513.png,688.62,172.92,762.41,208.73,Car  
./data/training/image_2/000513.png,668.50,174.27,735.15,201.33,Car  
./data/training/image_2/000513.png,508.35,177.79,543.90,201.43,Car  
./data/training/image_2/000513.png,41.53,193.83,230.53,267.98,Car  
./data/training/image_2/000513.png,581.35,173.22,605.75,189.20,Car  
./data/training/image_2/000513.png,351.93,181.28,426.19,216.65,Car  
./data/training/image_2/000513.png,402.42,173.81,452.22,209.80,Car  
./data/training/image_2/000513.png,457.03,175.66,500.62,199.92,Car  
./data/training/image_2/000513.png,514.89,176.56,554.22,193.83,Car  

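As you can see, although this KITTI training set has only 7481 images, each image may contain several objects. For example, 000513.png contains many objects, so it corresponds to many label lines.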
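To see how many boxes belong to each image, the simple label lines can be grouped by path. The sketch below assumes the comma-separated format shown above (path, x1, y1, x2, y2, class); the function name is made up and this is not the repository's parser.

```python
from collections import Counter, defaultdict


def parse_simple_labels(lines):
    """Group 'path,x1,y1,x2,y2,class' lines by image and count boxes per class."""
    boxes_per_image = defaultdict(list)
    class_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, x1, y1, x2, y2, class_name = line.split(",")
        box = (float(x1), float(y1), float(x2), float(y2), class_name)
        boxes_per_image[path].append(box)
        class_counts[class_name] += 1
    return dict(boxes_per_image), class_counts


# A few lines copied from the listing above.
sample_lines = [
    "./data/training/image_2/000090.png,5.08,199.56,126.68,269.46,Car",
    "./data/training/image_2/005525.png,585.62,176.58,602.32,189.82,Car",
    "./data/training/image_2/005525.png,475.73,177.95,510.45,202.70,Car",
    "./data/training/image_2/005525.png,531.44,176.06,546.07,188.60,DontCare",
]
boxes, counts = parse_simple_labels(sample_lines)
print(len(boxes["./data/training/image_2/005525.png"]))  # 3
print(counts["Car"], counts["DontCare"])                 # 3 1
```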

The get_data() function in ./keras_frcnn/keras_frcnn/simple_parser.py splits the 7481 KITTI images into training and validation data. The counts are as follows:

Training images per class:  
{'Car': 28742,  
 'Cyclist': 1627,  
 'DontCare': 11295,  
 'Misc': 973,  
 'Pedestrian': 4487,  
 'Person_sitting': 222,  
 'Tram': 511,  
 'Truck': 1094,  
 'Van': 2914,  
 'bg': 0}  

Num classes (including bg) = 10  # number of object classes, including background  
Num train samples 6220  # training data  
Num val samples 1261  # validation data  
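The counts above put about 83% of the images (6220 of 7481) in the training set. One simple way to obtain a split of this kind is to assign each image at random. The sketch below does that with an assumed training fraction of 5/6 and a fixed seed; whether get_data() works exactly this way is an assumption, and the function name is illustrative.

```python
import random


def split_images(image_paths, train_fraction=5 / 6, seed=0):
    """Randomly assign each image to the training or validation list."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for path in image_paths:
        if rng.random() < train_fraction:
            train.append(path)
        else:
            val.append(path)
    return train, val


# Hypothetical file names following the KITTI numbering scheme.
all_images = ["%06d.png" % index for index in range(7481)]
train_images, val_images = split_images(all_images)
print(len(train_images) + len(val_images))  # 7481
```

Because the assignment is random, the exact counts vary with the seed; only the ratio stays close to the chosen fraction.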

3) Use the resnet50 weights to initialize the model. The key places are:

# [1] ./keras_frcnn/train_frcnn_kitti.py  
# base_net_weights  
try:  
    print('loading weights from {}'.format(cfg.base_net_weights))  
    model_rpn.load_weights(cfg.base_net_weights, by_name=True)  
    model_classifier.load_weights(cfg.base_net_weights, by_name=True)  
except Exception as e:  
    print(e)  
    print('Could not load pretrained model weights. Weights can be found in the keras application folder '  
          'https://github.com/fchollet/keras/tree/master/keras/applications')  

# [2] ./keras_frcnn/keras_frcnn/config.py  
  
# self.num_epochs = 3000  
self.num_epochs = 100  

4) Train the model by running:

python train_frcnn_kitti.py  

When training starts, the output looks like this:

[Figure: console output at the start of training]

After a few days of training, the model finished and the weights were saved to "./keras_frcnn/model/kitti_frcnn_last.hdf5".

Average number of overlapping bounding boxes from RPN = 34.579 for 1000 previous iterations  
1000/1000 [==============================] - 1329s 1s/step - rpn_cls: 0.2544 - rpn_regr: 0.0987 - detector_cls: 0.3043 - detector_regr: 0.1163   
Mean number of bounding boxes from RPN overlapping ground truth boxes: 35.861  
Classifier accuracy for bounding boxes from RPN: 0.87734375  
Loss RPN classifier: 0.23741177825622123  
Loss RPN regression: 0.09788954605642357  
Loss Detector classifier: 0.29615414681006225  
Loss Detector regression: 0.11018231464223936  
Elapsed time: 1329.4224348068237  
Training complete, exiting.  
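The log reports four separate loss terms. To follow a single number across epochs, they can be pulled out of the log and added up. The sketch below assumes the total loss is the plain sum of the four terms and that the lines keep the "Loss name: value" form shown above; the function name is illustrative.

```python
import re

LOSS_RE = re.compile(r"^Loss (?P<name>.+?):\s*(?P<value>[0-9.eE+-]+)\s*$")


def parse_losses(log_text):
    """Collect 'Loss <name>: <value>' lines into a dict of floats."""
    losses = {}
    for line in log_text.splitlines():
        match = LOSS_RE.match(line.strip())
        if match:
            losses[match.group("name")] = float(match.group("value"))
    return losses


# Loss lines copied from the training log above.
log = """\
Classifier accuracy for bounding boxes from RPN: 0.87734375
Loss RPN classifier: 0.23741177825622123
Loss RPN regression: 0.09788954605642357
Loss Detector classifier: 0.29615414681006225
Loss Detector regression: 0.11018231464223936
Elapsed time: 1329.4224348068237
"""
losses = parse_losses(log)
total = sum(losses.values())
print(round(total, 4))  # 0.7416
```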

5) Test the trained model by running:

### usage  
python test_frcnn_kitti.py # test the default image, default='images/000010.png'  
python test_frcnn_kitti.py -p ./images/000010.png # test a specific image  
python test_frcnn_kitti.py -p ./images # test every image in a folder, where images is a folder  
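The usage above accepts either a single image or a folder after -p. A script can support both with a small helper like the one below. This is a sketch of the behavior, not the code in test_frcnn_kitti.py; the function name and the list of accepted extensions are assumptions.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # assumed set of accepted extensions


def collect_images(path):
    """Return the image files to test: one file, or every image in a folder."""
    if os.path.isdir(path):
        names = sorted(os.listdir(path))
        return [os.path.join(path, name) for name in names
                if name.lower().endswith(IMAGE_EXTENSIONS)]
    if os.path.isfile(path):
        return [path]
    raise FileNotFoundError(path)


if __name__ == "__main__":
    if os.path.exists("./images"):
        for image_path in collect_images("./images"):
            print("would test", image_path)
```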

If opencv is not installed yet, install it with:

pip install opencv-python  

The test results are as follows:

- python test_frcnn_kitti.py # test the default image

Loading weights from ./model/kitti_frcnn_last.hdf5  
2017-11-26 14:14:11.646544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)  
predict image from images/000010.png  
Car:  
[ 359.  189.  539.  299.] prob: 0.9999945163726807  
[ 559.  179.  629.  229.] prob: 0.9998922348022461  
[ 819.  179.  919.  249.] prob: 0.9997738003730774  
[  999.   199.  1249.   369.] prob: 0.9964742064476013  
[ 789.  179.  849.  219.] prob: 0.9433906674385071  
[ 589.  179.  639.  219.] prob: 0.9140269756317139  
[ 809.  189.  879.  239.] prob: 0.8051236271858215  
DontCare:  
[ 139.  189.  179.  199.] prob: 0.8146459460258484  
Truck:  
[  879.     0.  1259.   249.] prob: 0.9663242697715759  
Elapsed time = 6.536675214767456  
result saved into  ./results_images/000010.png  
Please enter any keyboard to exit...  
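The console output lists every detection together with its probability, including some low-confidence boxes. If the output is saved as text, it can be parsed and filtered by a confidence threshold. The sketch below assumes the exact line format shown above; the function names and the 0.9 threshold are illustrative choices, not part of the repository.

```python
import re

HEADER_RE = re.compile(r"^(\w+):$")
BOX_RE = re.compile(r"^\[\s*(.*?)\s*\]\s*prob:\s*(\S+)$")


def parse_detections(output_text):
    """Map each class name to a list of ((x1, y1, x2, y2), probability)."""
    detections = {}
    current = None
    for raw in output_text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            detections.setdefault(current, [])
            continue
        box = BOX_RE.match(line)
        if box and current is not None:
            coords = tuple(float(value) for value in box.group(1).split())
            detections[current].append((coords, float(box.group(2))))
    return detections


def confident(detections, threshold):
    """Keep only detections whose probability is at least the threshold."""
    return {name: [d for d in dets if d[1] >= threshold]
            for name, dets in detections.items()}


# A few lines copied from the output above.
sample_output = """\
predict image from images/000010.png
Car:
[ 359.  189.  539.  299.] prob: 0.9999945163726807
[ 809.  189.  879.  239.] prob: 0.8051236271858215
DontCare:
[ 139.  189.  179.  199.] prob: 0.8146459460258484
Truck:
[  879.     0.  1259.   249.] prob: 0.9663242697715759
Elapsed time = 6.536675214767456
"""
kept = confident(parse_detections(sample_output), 0.9)
print({name: len(dets) for name, dets in kept.items()})
# {'Car': 1, 'DontCare': 0, 'Truck': 1}
```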

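- python test_frcnn_kitti.py -p ./images/001.png # test your own image

[Figure: detection result on the author's own photo]

As you can see, the result on this casually taken photo is not very good: only one car is detected. Training on more data later would likely improve this. Another reason is that I trained for only 250 epochs, although by then the total loss had largely stopped changing.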

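- python test_frcnn_kitti.py -p ./images # test every image in the images directory and save the results to results_images

[Figure: console output while testing the images directory]

As the figure above shows, the model tests the images in the images directory one by one and saves the results to the specified folder (results_images).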