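I have been working on quantization recently and read this paper on my advisor's recommendation. It is a 2018 paper from Google, explained in great detail, and it comes with code that actually works.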

1. Reference paper

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.

Write-up on the paper:

Google CVPR 2018 paper: CNN quantization techniques

Additionally, the minimum and maximum values for activations are determined during training. This allows a model trained with quantization in the loop to be converted to a fixed point inference model with little effort, eliminating the need for a separate calibration step.

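In other words, because the activation ranges are learned while training with quantization in the loop, the model can be converted to a fixed-point inference model without a separate calibration step. (Calibration is the step that would otherwise be needed to obtain those value ranges.)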
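The idea of determining activation ranges during training can be sketched as a running range that is updated on every batch. The paper aggregates observed ranges with an exponential moving average; the class name `EmaRange`, the decay values, and the toy batches below are assumptions made only for illustration, not the TensorFlow implementation.

```python
class EmaRange:
    """Tracks a smoothed [min, max] range over batches of activations."""

    def __init__(self, decay=0.99):
        self.decay = decay  # assumed smoothing factor, normally close to 1
        self.min = None
        self.max = None

    def update(self, batch):
        batch_min, batch_max = min(batch), max(batch)
        if self.min is None:
            # First batch: initialise the range directly.
            self.min, self.max = batch_min, batch_max
        else:
            d = self.decay
            self.min = d * self.min + (1 - d) * batch_min
            self.max = d * self.max + (1 - d) * batch_max
        return self.min, self.max


# Toy data with a small decay so the smoothing is easy to follow by hand.
tracker = EmaRange(decay=0.5)
tracker.update([0.0, 2.0, 4.0])
ema_min, ema_max = tracker.update([-2.0, 1.0, 8.0])
print(ema_min, ema_max)
```

At the end of training the tracked range is what a converter can use directly, which is why no extra calibration pass over data is required.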


2. Implementation

GitHub code:

https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

The linked model tar files contain the following:

  • Trained model checkpoints:

mobilenet_v1_1.0_224.ckpt.data-00000-of-00001 (stores the variables and their values)

mobilenet_v1_1.0_224.ckpt.index

mobilenet_v1_1.0_224.ckpt.meta (stores the graph structure)

  • Eval graph text protos (to be easily viewed): mobilenet_v1_1.0_224_eval.pbtxt
  • Frozen trained models: mobilenet_v1_1.0_224_frozen.pb (model size: 17173742 bytes)
  • Info file containing input and output information: mobilenet_v1_1.0_224_info.txt
  • Converted TensorFlow Lite flatbuffer model: mobilenet_v1_1.0_224.tflite (model size: 4276000 bytes)

Note that quantized model GraphDefs (pb文件) are still float models, they just have FakeQuantization operation embedded to simulate quantization. These are converted by TensorFlow Lite to be fully quantized. The final effect of quantization can be seen by comparing the frozen fake quantized graph to the size of the TFLite flatbuffer, i.e. The TFLite flatbuffer is about 1/4 the size. For more information on the quantization techniques used here, see here.

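In short: the quantized GraphDefs (the pb files) are still float models that only have fake-quantization ops embedded to simulate quantization, and TensorFlow Lite is what converts them into fully quantized models. The sizes listed above bear this out: 4276000 / 17173742 ≈ 0.25, so the TFLite flatbuffer is about 1/4 the size of the frozen graph.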
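What "fake quantization" means can be illustrated with a small sketch: a float value is snapped to one of 256 levels inside a known range and immediately mapped back to float, so the graph stays a float graph while seeing the rounding error of 8-bit inference. The function below is a simplified assumption for illustration; it is not TensorFlow's FakeQuant op, which additionally nudges the range so that zero is exactly representable.

```python
def fake_quantize(x, range_min, range_max, num_bits=8):
    """Quantize x to an integer level, then map it straight back to float."""
    levels = 2 ** num_bits - 1                  # 255 steps for 8 bits
    scale = (range_max - range_min) / levels    # width of one step
    clamped = min(max(x, range_min), range_max)
    q = round((clamped - range_min) / scale)    # integer level in [0, levels]
    return range_min + q * scale                # the result is still a float


step = 6.0 / 255
for value in (0.0, 0.3, 2.71828, 10.0):
    print(value, "->", fake_quantize(value, 0.0, 6.0))
```

Inside the range the output differs from the input by at most half a step; values outside the range are clamped to its ends.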

MobileNet V1 scripts (path: models/research/slim/nets/mobilenet_v1.md; the MobileNet scripts live in the nets folder)

This package contains scripts for training floating point and eight-bit fixed point TensorFlow models.

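Quantization tools used are described in contrib/quantize. (This is the quantization tooling.)

Conversion to fully quantized models for mobile can be done through TensorFlow Lite. (As noted above, the files at this point only contain fake quantization, so full quantization requires a conversion with TensorFlow Lite.)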


Accuracies were computed by evaluating using a single image crop. (That is, no multi-scale evaluation was used. Some academic papers report higher accuracy by using multiple crops at multiple scales.)


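I do not use bazel; running the scripts directly is enough. (bazel relies on the BUILD and WORKSPACE files, which are worth a look for the corresponding rules and dependencies.)

Training runs: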

######### float train ############

CUDA_VISIBLE_DEVICES=0 python nets/mobilenet_v1_train.py --dataset_dir "/home/fuhao/workspace/data/ILSVRC2012" --fine_tune_checkpoint "/tmp/checkpoints/mobilenet_v1_1.0_224.ckpt" --quantize=False --checkpoint_dir "/tmp/mobilenet/imagenet/float-train"

tensorboard --logdir=/tmp/mobilenet/imagenet/float-train

######### float train from scratch ############

CUDA_VISIBLE_DEVICES=0 python nets/mobilenet_v1_train.py --dataset_dir "/home/fuhao/workspace/data/ILSVRC2012" --quantize=False --checkpoint_dir "/tmp/mobilenet/imagenet/float-train-zero"

tensorboard --logdir=/tmp/mobilenet/imagenet/float-train-zero

######## int8 train ###########

CUDA_VISIBLE_DEVICES=0 python nets/mobilenet_v1_train.py --dataset_dir "/home/fuhao/workspace/data/ILSVRC2012" --fine_tune_checkpoint "/tmp/checkpoints/mobilenet_v1_1.0_224_quant.ckpt" --quantize=True --checkpoint_dir "/tmp/mobilenet/imagenet/int8-train"

Eval results:

#########    float  eval   ##########

(environment tf17-2, run from ~/workspace/projects/tf-models/research/slim)

CUDA_VISIBLE_DEVICES=0 python nets/mobilenet_v1_eval.py --dataset_dir "/home/fuhao/workspace/data/ILSVRC2012" --checkpoint_dir "/tmp/checkpoints/mobilenet_v1_1.0_224.ckpt" --eval_dir "/tmp/mobilenet/imagenet/eval-float"


eval/Recall_5[0.89988]
eval/Accuracy[0.7102]


######### quantization eval ##########

(environment tf17-2, run from ~/workspace/projects/tf-models/research/slim)

CUDA_VISIBLE_DEVICES=0 python nets/mobilenet_v1_eval.py --dataset_dir "/home/fuhao/workspace/data/ILSVRC2012" --checkpoint_dir "/tmp/checkpoints/mobilenet_v1_1.0_224_quant.ckpt" --eval_dir "/tmp/mobilenet/imagenet/eval-22-true" --quantize=True

eval/Recall_5[0.89226]
eval/Accuracy[0.69914]
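To put the two eval runs side by side, the drop caused by quantization can be computed directly from the numbers printed above. This is plain arithmetic on the reported metrics; the dictionary names are just labels chosen for this sketch.

```python
# Metrics copied from the two eval runs above.
float_eval = {"Recall_5": 0.89988, "Accuracy": 0.7102}
quant_eval = {"Recall_5": 0.89226, "Accuracy": 0.69914}

# Drop in percentage points when going from the float to the quantized model.
drops = {
    metric: round((float_eval[metric] - quant_eval[metric]) * 100, 2)
    for metric in float_eval
}

for metric, drop in drops.items():
    print(f"{metric}: float={float_eval[metric]:.5f} "
          f"quant={quant_eval[metric]:.5f} drop={drop} points")
```

So in these runs the quantized checkpoint loses roughly 1.1 points of Accuracy and about 0.76 points of Recall_5 relative to the float checkpoint.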


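Ideas and open questions:

Use TFLite to fully quantize the model, then inspect the contents of the quantized model. (How do you open a tflite file and view its contents?)

How is the tflite file actually used?

Visualizing the pb file:

Convert the pb to pbtxt, see the article "Reading and writing TensorFlow GraphDef pb files (binary format, text format)".

Visualizing the tflite file: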
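One way to look inside a .tflite file, and the starting point for running it, is the TensorFlow Lite Python interpreter. The sketch below assumes a TensorFlow version that exposes it as `tf.lite.Interpreter` (in older 1.x releases it lived under `tf.contrib.lite`). The helper names `describe_tensor` and `list_tflite_tensors` are made up for this example, and the sample tensor entry uses invented values purely to show the output format.

```python
def describe_tensor(detail):
    """Format one entry of Interpreter.get_tensor_details()."""
    scale, zero_point = detail["quantization"]
    return (f"{detail['name']}: shape={list(detail['shape'])} "
            f"scale={scale} zero_point={zero_point}")


def list_tflite_tensors(model_path):
    """Load a .tflite file and describe every tensor in it."""
    # Imported lazily so describe_tensor works without TensorFlow installed.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return [describe_tensor(d) for d in interpreter.get_tensor_details()]


# Hypothetical entry with invented values, only to show the output format.
example = {"name": "input", "shape": (1, 224, 224, 3),
           "quantization": (0.0078125, 128)}
print(describe_tensor(example))

# With TensorFlow installed and the model downloaded, one would call:
# for line in list_tflite_tensors("mobilenet_v1_1.0_224.tflite"):
#     print(line)
```

The per-tensor scale and zero point shown this way are the values that turn the stored integers back into real numbers.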

python tensorflow/contrib/lite/tools/visualize.py  /home/fuhao/workspace/projects/tf-models/research/slim/mobilenet/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.tflite /home/fuhao/workspace/projects/tf-models/research/slim/mobilenet/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.html

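Visualizing parameter distributions in TensorBoard:

This requires adding one extra line.

3. Further reading

TensorFlow MobileNet mobile transfer-learning guide, part 2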

https://blog.csdn.net/gubenpeiyuan/article/details/79671558

TensorFlow For Poets

https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#3

Developer guide

https://www.tensorflow.org/mobile/tflite/devguide


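A study of the TensorFlow API: math_ops (part 3)

https://www.jianshu.com/p/7fbce28e85a4

Usage of the math_ops functions; this part covers Reduction, Scan, Segmentation, and sequence comparison and indexing.



A few pitfalls I ran into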



https://github.com/tensorflow/tensorflow/blob/4952f981be07b8bf508f8226f83c10cdafa3f0c4/tensorflow/contrib/lite/toco/graph_transformations/quantize.cc#L171-L197

https://github.com/tensorflow/tensorflow/blob/4952f981be07b8bf508f8226f83c10cdafa3f0c4/tensorflow/contrib/lite/toco/graph_transformations/quantize.cc#L111

const MinMax& GetOrComputeMinMax(Model* model, const string& array_name) {

https://github.com/tensorflow/tensorflow/blob/4952f981be07b8bf508f8226f83c10cdafa3f0c4/tensorflow/contrib/lite/toco/graph_transformations/quantize.cc#L

reduce_min:

https://www.tensorflow.org/api_docs/python/tf/reduce_min

tf.quantize:

https://www.tensorflow.org/api_docs/python/tf/quantize

https://github.com/tensorflow/tensorflow/blob/4952f981be07b8bf508f8226f83c10cdafa3f0c4/tensorflow/contrib/lite/toco/model.h#L1303

https://github.com/tensorflow/tensorflow/blob/4952f981be07b8bf508f8226f83c10cdafa3f0c4/tensorflow/contrib/lite/toco/model.h#L130

struct QuantizationParams {
  int32 zero_point = 0;
  double scale = 0.;
};
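The two fields of `QuantizationParams` correspond to the affine mapping used in the paper, real = scale * (quantized - zero_point). The sketch below shows one way to derive them from a min/max range and to convert values back and forth for uint8. The function names and the example range are assumptions for illustration and are not taken from the toco source.

```python
def choose_quantization_params(rmin, rmax, qmin=0, qmax=255):
    """Derive (scale, zero_point) for the real range [rmin, rmax]."""
    # Widen the range to contain 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, qmin  # degenerate all-zero range, any scale works
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to a clamped integer in [qmin, qmax]."""
    q = int(round(r / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer back to a real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Example range chosen so the scale is exactly 0.5.
scale, zero_point = choose_quantization_params(-0.5, 127.0)
q = quantize(3.2, scale, zero_point)
print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

Real zero maps to the integer `zero_point` and back to exactly 0.0, which matters for zero padding; values outside the range saturate at the ends of the integer interval.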
