Modifying the Caffe C++ prediction code for multiple inputs: answers

【Question title】: Modifying the Caffe C++ prediction code for multiple inputs
【Posted】: 2015-12-16 14:06:13
【Question】:

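I implemented a modified version of the Caffe C++ example, and while it works really well, it is very slow because it only accepts images one at a time. Ideally, I would like to pass Caffe a vector of 200 images and get back the best prediction for each image. I received some great help from Fanglin Wang and implemented some of his recommendations, but I am still having trouble working out how to retrieve the best result for each image.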

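The Classify method is now passed a vector of cv::Mat objects (the variable input_channels), which is a vector of grayscale floating-point images. I have removed the preprocessing method from the code because I do not need to convert these images to floating point or subtract the mean image. I have also been trying to get rid of the N variable, because I only want to return the top prediction and its probability for each image.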

#include "Classifier.h"
using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file, const string& trained_file, const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs, const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
  std::vector<float> output = Predict(input_channels);

    std::vector<int> maxN = Argmax(output, 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
    return predictions;
}

std::vector<float> Classifier::Predict(const std::vector<cv::Mat> &input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(&input_channels);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + num_images * output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects (one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

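Update

Thank you very much for your help, Shai. I made the changes you recommended, but I seem to be running into some strange compilation issues that I cannot resolve (I did manage to sort out a few of them).

These are the changes I made:

Header file: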

#ifndef __CLASSIFIER_H__
#define __CLASSIFIER_H__

#include <caffe/caffe.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>


using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier(const string& model_file,
             const string& trained_file,
             const string& label_file);

  std::vector< std::pair<int,float> > Classify(const std::vector<cv::Mat>& img);

 private:

  std::vector< std::vector<float> > Predict(const std::vector<cv::Mat>& img, int nImages);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages);

  void Preprocess(const std::vector<cv::Mat>& img,
                  std::vector<cv::Mat>* input_channels, int nImages);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  std::vector<string> labels_;
};

#endif /* __CLASSIFIER_H__ */

Class file:

#define CPU_ONLY
#include "Classifier.h"

using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

Classifier::Classifier(const string& model_file,
                       const string& trained_file,
                       const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
  CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

std::vector< std::pair<int,float> > Classifier::Classify(const std::vector<cv::Mat>& img) {
  std::vector< std::vector<float> > output = Predict(img, img.size());

  std::vector< std::pair<int,float> > predictions;
  for ( int i = 0 ; i < output.size(); i++ ) {
    std::vector<int> maxN = Argmax(output[i], 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  return predictions;
}

std::vector< std::vector<float> > Classifier::Predict(const std::vector<cv::Mat>& img, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(nImages, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels, nImages);

  Preprocess(img, &input_channels, nImages);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */

  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector <std::vector<float> > ret;
  for (int i = 0; i < nImages; i++) {
    const float* begin = output_layer->cpu_data() + i*output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels()* nImages; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const std::vector<cv::Mat>& img,
                            std::vector<cv::Mat>* input_channels, int nImages) {
  for (int i = 0; i < nImages; i++) {
      vector<cv::Mat> channels;
      cv::split(img[i], channels);
      for (int j = 0; j < channels.size(); j++){
           channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
      }
  }
}

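【Comments on the question】:

Could you briefly describe your modifications? Thanks.
The answer below (together with its comments) is correct. However, in your preprocessing step you need to (i) convert the image format to the network input format; (ii) resize the image to input_geometry_ if its size differs; (iii) subtract the image mean, which you need to load from the file imagenet_mean.binaryproto. After that you can split the image into separate per-channel image planes.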
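The last part of that comment (mean subtraction followed by splitting into channel planes) can be sketched without Caffe or OpenCV. This is only an illustration under stated assumptions: InterleavedToPlanar is a hypothetical helper, the image is assumed to be already converted to float and already resized, and a single mean value per channel stands in for the mean image loaded from imagenet_mean.binaryproto.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe or OpenCV): converts one interleaved
// H x W x C float image into planar C x H x W order and subtracts a
// per-channel mean on the way. Planar order is how each image is laid out
// inside a Caffe input blob.
std::vector<float> InterleavedToPlanar(const std::vector<float>& interleaved,
                                       int height, int width, int channels,
                                       const std::vector<float>& channel_mean) {
  assert(interleaved.size() ==
         static_cast<std::size_t>(height) * width * channels);
  assert(channel_mean.size() == static_cast<std::size_t>(channels));

  std::vector<float> planar(interleaved.size());
  for (int c = 0; c < channels; ++c) {
    for (int h = 0; h < height; ++h) {
      for (int w = 0; w < width; ++w) {
        // Source: pixel (h, w), channel c, channels stored side by side.
        const std::size_t src =
            (static_cast<std::size_t>(h) * width + w) * channels + c;
        // Destination: plane c, then row h, then column w.
        const std::size_t dst =
            (static_cast<std::size_t>(c) * height + h) * width + w;
        planar[dst] = interleaved[src] - channel_mean[c];
      }
    }
  }
  return planar;
}
```

In the code from the question, cv::split plus copyTo into the wrapped input planes plays the role of the index arithmetic here; for single-channel grayscale input the two layouts coincide and only the mean subtraction would remain.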

Tags: c++ machine-learning neural-network deep-learning caffe

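【Solution 1】:

Unfortunately, I believe that parallelization of the network's forward pass has not been implemented yet. However, if you wanted to, you could simply implement your own wrapper that repeatedly runs the data through copies of the network in parallel.

Take a look at "How many images can you pass to Caffe at a time?"

What you need to define in the prototxt linked there is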

input_shape {
  dim: 64  # number of images
  dim: 1   # channels
  dim: 28  # height
  dim: 28  # width
}

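The current implementation evaluates a batch of 64 images, though not necessarily in parallel. However, when running on a GPU, processing one batch of 64 images will be faster than processing 64 single-image batches.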
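To make the batch layout concrete, here is a minimal sketch of how a 4-D blob with the shape above maps onto one flat float buffer. BlobShape, FlatOffset and ImageStride are hypothetical names introduced for illustration; the arithmetic is the row-major (num, channels, height, width) ordering that Caffe blobs use.

```cpp
#include <cassert>
#include <cstddef>

// Shape of a 4-D blob in (num, channels, height, width) order.
struct BlobShape {
  int num;
  int channels;
  int height;
  int width;
};

// Hypothetical helper: position of element (n, c, h, w) in the flat buffer.
std::size_t FlatOffset(const BlobShape& s, int n, int c, int h, int w) {
  assert(n >= 0 && n < s.num);
  assert(c >= 0 && c < s.channels);
  assert(h >= 0 && h < s.height);
  assert(w >= 0 && w < s.width);
  return ((static_cast<std::size_t>(n) * s.channels + c) * s.height + h) *
             s.width + w;
}

// Hypothetical helper: number of floats occupied by one image in the batch.
std::size_t ImageStride(const BlobShape& s) {
  return static_cast<std::size_t>(s.channels) * s.height * s.width;
}
```

With the 64 x 1 x 28 x 28 shape, image n starts at offset n * 784, which is why a loop that advances a pointer by width * height per channel plane, as WrapInputLayer does in the question, walks through every image of the batch in order.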

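【Comments】:

Thanks for your help, Aiden. So is it not possible for me to pass the equivalent of a vector of blobs in one go and receive a vector of predictions back from the network?
@JackSimpson Specifying the number of images as the first blob dim is handled the same way as a vector of single-image blobs.
@JackSimpson: ypx is correct; 64 blobs in a vector and a single blob whose num dimension is 64 are equivalent.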

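【Solution 2】:

If I understand your problem correctly, you input n images and expect n pairs of (label, prob), but you get only one such pair.

I believe these modifications should do the trick for you:

Classifier::Predict should return a vector< vector<float> >, that is, one vector of probabilities per input image: a vector of size n whose elements are vectors of size output_layer->channels():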

std::vector< std::vector<float> >
Classifier::Predict(const std::vector<cv::Mat> &input_channels,
                    int num_images) {
  // same code here...

  /* changes here: copy the output layer into one std::vector per image */
  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector< std::vector<float> > ret;
  for ( int i = 0 ; i < num_images ; i++ ) {
      const float* begin = output_layer->cpu_data() + i*output_layer->channels();
      const float* end = begin + output_layer->channels();
      ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

In Classifier::Classify you need to pass each vector<float> through Argmax independently:

 std::vector< std::pair<int,float> >
 Classifier::Classify(const std::vector<cv::Mat> &input_channels) {

   // Assumes one single-channel cv::Mat per image, so size() is the batch size.
   std::vector< std::vector<float> > output =
       Predict(input_channels, static_cast<int>(input_channels.size()));

   std::vector< std::pair<int,float> > predictions;
   for ( size_t i = 0 ; i < output.size(); i++ ) {
       std::vector<int> maxN = Argmax(output[i], 1);
       int idx = maxN[0];
       // Pair the class index with this image's score; labels_[idx] gives the label text.
       predictions.push_back(std::make_pair(idx, output[i][idx]));
   }
   return predictions;
 }
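The two changes above can be checked without Caffe by working on a plain flat buffer. The sketch below is an illustration only: TopOnePerImage is a hypothetical helper, and it assumes the scores are stored one row of num_classes values per image, which is the layout the Predict loop above reads from output_layer->cpu_data().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: `scores` holds num_images * num_classes values, one
// row per image. Returns one (class index, score) pair per image, i.e. the
// top-1 prediction for every image in the batch.
std::vector<std::pair<int, float> > TopOnePerImage(
    const std::vector<float>& scores, int num_images, int num_classes) {
  assert(num_images >= 0 && num_classes > 0);
  assert(scores.size() ==
         static_cast<std::size_t>(num_images) * num_classes);

  std::vector<std::pair<int, float> > best;
  best.reserve(num_images);
  for (int i = 0; i < num_images; ++i) {
    // Slice out the scores that belong to image i.
    const auto row_begin =
        scores.begin() + static_cast<std::ptrdiff_t>(i) * num_classes;
    const auto row_end = row_begin + num_classes;
    // Top-1 only needs the maximum, so no partial sort is required.
    const auto it = std::max_element(row_begin, row_end);
    best.push_back(std::make_pair(static_cast<int>(it - row_begin), *it));
  }
  return best;
}
```

Returning the class index rather than the label string keeps the pair type as pair<int, float>; the label can be looked up afterwards with labels_[index].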

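【Comments】:

Hi Shai, thank you very much for your help! I followed your advice but seem to be having some problems compiling it. I have updated the question with the modifications I made. Sorry to take up more of your time, but do you think you could take a look?
Hi Shai, I did some rewriting and now seem to get this error when compiling, which I cannot work out how to fix: candidate function not viable: no known conversion from 'pair<typename __make_pair_return<basic_string<char> &>::type, typename __make_pair_return<vector<float, allocator<float> > &>::type>' to 'const pair<int, float>' for 1st argument _LIBCPP_INLINE_VISIBILITY void push_back(const_reference __x);
Nice, I asked someone and they asked whether I was trying to return a float or a vector of floats, so I changed it to output[i][idx], which fixed some of the problems :)
@JackSimpson Please do not put code in comments. It is unclear at the moment what your problem is. If the issue is far enough removed from this thread, please ask a new question.
Your num_channels_ is defined in the header file as an int, not as an array/pointer/vector, so writing num_channels_[0] is an error. Try replacing it with just num_channels_...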