Performance issue with TensorFlow model server on CPU in comparison with TensorFlow model inference: answers

【Question title】: Performance issue with Tensorflow model server on CPU in comparison with Tensorflow model inference
【Posted】: 2017-11-27 13:41:57
【Question description】:

I am seeing a performance problem with TensorFlow model server on CPU. Its inference time is double that of raw TensorFlow model inference. Both are built with MKL, for CPU only.

Code to reproduce: https://github.com/BogdanRuzh/tf_model_service_benchmark

TensorFlow MKL build: bazel build --config=mkl -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 //tensorflow/tools/pip_package:build_pip_package

TensorFlow model server MKL build: bazel build --config=mkl --config=opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/model_servers:tensorflow_model_server

The target model is a simple CNN for segmentation.

The raw TensorFlow model processes one image in 0.17 seconds. TensorFlow model server processes the same image in 0.32 seconds.

How can I improve this performance? It is very important for my application.

【Question comments】:

Tags: python performance tensorflow deep-learning tensorflow-serving

【Solution 1】:

I think this explanation will help you. It says that a poorly configured TensorFlow build with Intel optimizations can perform worse than a plain build without them: https://github.com/tensorflow/serving/issues/1272#issuecomment-477878180

You can try configuring the batching parameters (with a configuration file and the --enable_batching flag): https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
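As a minimal sketch of that batching setup, the snippet below generates a parameters file in the text-protobuf form described in the linked batching guide and builds a matching server command line. The numeric values, the file name `batching_parameters.txt`, the model name `segmentation`, and the path `/models/segmentation` are illustrative assumptions, not tuned recommendations or anything taken from the question.

```python
def batching_config_text(max_batch_size=8, batch_timeout_micros=1000,
                         max_enqueued_batches=100, num_batch_threads=4):
    """Return batching parameters as text protobuf (values are placeholders)."""
    return (
        f"max_batch_size {{ value: {max_batch_size} }}\n"
        f"batch_timeout_micros {{ value: {batch_timeout_micros} }}\n"
        f"max_enqueued_batches {{ value: {max_enqueued_batches} }}\n"
        f"num_batch_threads {{ value: {num_batch_threads} }}\n"
    )


def server_command(config_path, model_name, model_base_path, port=8500):
    """Build a tensorflow_model_server argument list with batching enabled."""
    return [
        "tensorflow_model_server",
        f"--port={port}",
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
        "--enable_batching=true",
        f"--batching_parameters_file={config_path}",
    ]


if __name__ == "__main__":
    # Save this text as batching_parameters.txt before starting the server.
    print(batching_config_text())
    # Hypothetical model name and path, for illustration only.
    print(" ".join(server_command("batching_parameters.txt",
                                  "segmentation", "/models/segmentation")))
```

Batching mostly helps throughput when several requests arrive concurrently. For one image at a time, a large `batch_timeout_micros` may add latency instead of removing it, so it is worth measuring both ways.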

You can also set (inter/intra)_op_parallelism_threads.
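One way to apply this is to use the same thread counts on both sides of the comparison. The sketch below assumes TensorFlow 1.x for the in-process side (`tf.ConfigProto`) and a TensorFlow Serving release that exposes the `--tensorflow_intra_op_parallelism` and `--tensorflow_inter_op_parallelism` flags. Older releases may only offer `--tensorflow_session_parallelism`. The values 4 and 2 are placeholders to be tuned for the machine.

```python
def session_config_kwargs(intra_threads, inter_threads):
    """Keyword arguments for tf.ConfigProto (TensorFlow 1.x field names)."""
    return {
        "intra_op_parallelism_threads": intra_threads,
        "inter_op_parallelism_threads": inter_threads,
    }


def server_parallelism_flags(intra_threads, inter_threads):
    """Equivalent tensorflow_model_server flags (assumes a release that has them)."""
    return [
        f"--tensorflow_intra_op_parallelism={intra_threads}",
        f"--tensorflow_inter_op_parallelism={inter_threads}",
    ]


if __name__ == "__main__":
    INTRA, INTER = 4, 2  # placeholder values, not recommendations
    # In-process inference with TensorFlow 1.x would use these as:
    #   tf.Session(config=tf.ConfigProto(**session_config_kwargs(INTRA, INTER)))
    print(session_config_kwargs(INTRA, INTER))
    # Append these to the tensorflow_model_server command line.
    print(server_parallelism_flags(INTRA, INTER))
```

Keeping the two settings identical removes one possible source of the gap between the raw model and the server. It does not guarantee that the gap disappears.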

In addition, MKL has its own settings for improving performance: https://www.tensorflow.org/guide/performance/overview#tuning_mkl_for_the_best_performance
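The MKL settings in that guide are environment variables read when the process starts. The sketch below builds such an environment in Python. The helper name `mkl_env` and the core count of 4 are assumptions for illustration, and the values follow the guide's general starting points rather than anything measured for this model.

```python
import os


def mkl_env(num_physical_cores, blocktime_ms=0, base_env=None):
    """Return a copy of the environment with MKL/OpenMP tuning variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        # Number of OpenMP threads, usually the number of physical cores.
        "OMP_NUM_THREADS": str(num_physical_cores),
        # Milliseconds a thread waits after finishing work before sleeping.
        "KMP_BLOCKTIME": str(blocktime_ms),
        # Pin threads to cores; "verbose" logs the resulting binding.
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
        # Print the effective OpenMP settings at startup.
        "KMP_SETTINGS": "1",
    })
    return env


if __name__ == "__main__":
    env = mkl_env(num_physical_cores=4)  # placeholder core count
    for key in ("OMP_NUM_THREADS", "KMP_BLOCKTIME", "KMP_AFFINITY", "KMP_SETTINGS"):
        print(f"{key}={env[key]}")
    # Pass it when launching the server, for example:
    #   subprocess.Popen(["tensorflow_model_server", ...], env=env)
```

The same variables apply to the in-process Python benchmark, so they should be set for both runs before comparing timings.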

【Comments】: