Segmentation fault (core dumped) - Inferring with the TensorFlow C++ API from a SavedModel

【Question title】: Segmentation fault (core dumped) - Inferring with the TensorFlow C++ API from a SavedModel
【Posted】: 2020-10-11 14:15:15
【Question description】:

I am using the TensorFlow C++ API to load a SavedModel and run inference. The model loads fine, but when I run inference I get the following error:

$ ./bazel-bin/tensorflow/gan_loader/gan_loader
2020-06-21 19:29:18.669604: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /home/eduardo/Documents/GitHub/edualvarado/tensorflow/tensorflow/gan_loader/generator_model_final
2020-06-21 19:29:18.671368: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-21 19:29:18.671385: I tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /home/eduardo/Documents/GitHub/edualvarado/tensorflow/tensorflow/gan_loader/generator_model_final
2020-06-21 19:29:18.671474: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-06-21 19:29:18.688557: I tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-06-21 19:29:18.707707: I tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /home/eduardo/Documents/GitHub/edualvarado/tensorflow/tensorflow/gan_loader/generator_model_final
2020-06-21 19:29:18.714949: I tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 45356 microseconds.
Segmentation fault (core dumped)

The complete code of infering.py is below. The comment at the top contains the information about the SavedModel.

    /* INFO ABOUT SAVEDMODEL

The given SavedModel SignatureDef contains the following input(s):
  inputs['dense_1_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 100)
      name: serving_default_dense_1_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['conv2d_2'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28, 1)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
*/


#include <fstream>
#include <utility>
#include <vector>

#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/lib/strings/str_util.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"

// These are all common classes it's handy to reference with no namespace.
using tensorflow::Flag;
using tensorflow::int32;
using tensorflow::Status;
using tensorflow::string;
using tensorflow::Tensor;
using tensorflow::tstring;


/*
TODO: Functions
*/
Tensor CreateLatentSpace(const int latent_dim, const int num_samples) {
  Tensor tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({num_samples, latent_dim}));
  
  auto tensor_mapped = tensor.tensor<float, 2>(); 
  for (int idx = 0; idx < tensor.dim_size(0); ++idx) {
    for (int i = 0; i < tensor.dim_size(1); ++i) {
      tensor_mapped(idx, i) = drand48() - 0.5;
    }
  }
  return tensor;
}

int main(int argc, char* argv[]) {
  // These are the command-line flags the program can understand.
  // They define where the graph and input data is located, and what kind of
  // input the model expects. 
 
  // To create latent space
  int32 latent_dim = 100;
  int32 samples_per_row = 5;
  int32 num_samples = 25;
  
  // Input/Output names
  string input_layer = "serving_default_dense_1_input";
  string output_layer = "StatefulPartitionedCall";


  // Arguments
  std::vector<Flag> flag_list = {
      Flag("latent_dim", &latent_dim, "latent dimensions"),
      Flag("samples_per_row", &samples_per_row, "samples per row"),
      Flag("num_samples", &num_samples, "number of samples"),
      Flag("input_layer", &input_layer, "name of input layer"),
      Flag("output_layer", &output_layer, "name of output layer"),
  };
  string usage = tensorflow::Flags::Usage(argv[0], flag_list);
  const bool parse_result = tensorflow::Flags::Parse(&argc, argv, flag_list);
  if (!parse_result) {
    LOG(ERROR) << usage;
    return -1;
  }

  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  if (argc > 1) {
    LOG(ERROR) << "Unknown argument " << argv[1] << "\n" << usage;
    return -1;
  }

  // TODO: First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  tensorflow::SavedModelBundle model;
  tensorflow::SessionOptions session_options;
  tensorflow::RunOptions run_options;

  const string export_dir = "/home/eduardo/Documents/GitHub/edualvarado/tensorflow/tensorflow/gan_loader/generator_model_final";
  const std::unordered_set<std::string> tags = {"serve"};         

  auto load_graph_status = tensorflow::LoadSavedModel(session_options, run_options, export_dir, tags, &model);
  if (!load_graph_status.ok()) {
    std::cerr << "Failed: " << load_graph_status;
    return -1;
  }

  // TODO: Create latent space
  auto latent_space_tensor = CreateLatentSpace(100, 1);


  // TODO: Run the latent space through the model
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{input_layer, latent_space_tensor}},
                                   {output_layer}, {}, &outputs);

  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }
  
  // TODO: Save the figure


  return 0;
}

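I think I have tried almost everything, but unfortunately there is not much documentation for the C++ API. Could you give me some guidance on why this happens?

Thank you very much.

OS environment:

Ubuntu 18.04
TensorFlow 2.2.0
Bazel 2.0.0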

【Question comments】:

Tags: c++ tensorflow tensorflow-serving

【Solution 1】:

In the code snippet, the session pointer is never initialized before Run(..) is called on it.

std::unique_ptr<tensorflow::Session> session;
Status run_status = session->Run({{input_layer, latent_space_tensor}},
                               {output_layer}, {}, &outputs);

Make sure session points to a valid object before calling Run(..); that will fix the crash.

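One way to get an initialized session is to use the one that LoadSavedModel already created and stored in the bundle:

Status run_status = model.session->Run({{input_layer, latent_space_tensor}},
                                       {output_layer}, {}, &outputs);

tensorflow::Session is an abstract interface, so it cannot be constructed directly with make_unique; sessions come from a factory such as tensorflow::NewSession or, as here, from LoadSavedModel. The SavedModelBundle owns the session through its session member and releases it when the bundle goes out of scope.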
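The ownership relationship is easy to miss: in the question's code, LoadSavedModel fills the SavedModelBundle named model, and the bundle has its own session member, while the separately declared session pointer is never touched. The sketch below illustrates only that pattern. FakeSession, FakeBundle, FakeLoad and InferWithBundle are hypothetical stand-ins invented for illustration, not TensorFlow classes, so it compiles without TensorFlow.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a session type; NOT tensorflow::Session.
struct FakeSession {
  // Pretends to run the graph; succeeds when both names are non-empty.
  bool Run(const std::string& input, const std::string& output) {
    return !input.empty() && !output.empty();
  }
};

// Hypothetical stand-in mirroring a bundle that owns its session.
struct FakeBundle {
  std::unique_ptr<FakeSession> session;  // null until a loader fills it
};

// Plays the role of a loader: it creates the session and stores it
// inside the bundle that the caller passed in.
bool FakeLoad(FakeBundle* bundle) {
  assert(bundle != nullptr);
  bundle->session = std::make_unique<FakeSession>();
  return true;
}

// Runs inference through the session owned by the bundle, not through a
// separately declared pointer that nothing ever assigned.
bool InferWithBundle() {
  FakeBundle model;
  if (!FakeLoad(&model)) return false;
  return model.session->Run("serving_default_dense_1_input:0",
                            "StatefulPartitionedCall:0");
}
```

Applied to the question's program, the same idea would mean dropping the standalone session variable and calling Run through model.session after checking that loading succeeded.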

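【Comments】:

Hey, please point me to the initialization. I cannot find it in the code above.
It comes right after "TODO: First we load and initialize the model".
I think there is some confusion here: std::unique_ptr<tensorflow::Session> session; only declares a unique pointer to a tensorflow::Session. It does not point to any valid memory location, because you have not initialized it with anything. None of the code after this declaration appears to assign the session before session->Run(..) is called. I have added more information to the answer above. Hope this helps.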
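The point made in the last comment, that a std::unique_ptr which is declared but never assigned holds nullptr, can be checked in isolation. The following is a minimal sketch with a hypothetical DemoSession type (not a TensorFlow class) and two illustrative helpers, SafeRun and CheckedRun, that guard the pointer before dereferencing it.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a session type; NOT a TensorFlow class.
struct DemoSession {
  std::string Run() const { return "ok"; }
};

// Returns an error string instead of dereferencing a null pointer.
std::string SafeRun(const std::unique_ptr<DemoSession>& session) {
  if (!session) {
    return "error: session is null";
  }
  return session->Run();
}

// Debug-build alternative: abort at the call site with a readable message
// rather than crashing later with a bare segmentation fault.
std::string CheckedRun(const std::unique_ptr<DemoSession>& session) {
  assert(session && "session must be initialized before Run()");
  return session->Run();
}
```

Calling SafeRun on a default-constructed std::unique_ptr<DemoSession> takes the error branch, whereas dereferencing that pointer directly is undefined behavior and typically shows up as the segmentation fault seen in the question.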