Static polymorphism with boost variant: single visitor vs multi visitor vs dynamic polymorphism

【Question title】: Static Polymorphism with boost variant single visitor vs multi visitor vs dynamic polymorphism
【Posted】: 2016-09-10 01:54:50
【Question description】:

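I am comparing the performance of the following C++ polymorphism approaches:

Method [1]: static polymorphism using boost variant, with a separate visitor for each method
Method [2]: static polymorphism using boost variant, with a single visitor that calls the different methods via method overloading
Method [3]: plain old dynamic polymorphism

Platform:
- Intel x86 64-bit Red Hat, modern multi-core processor, 32 GB RAM
- gcc (GCC) 4.8.1 with -O2 optimization
- Boost 1.6.0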

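Some findings:

Method [1] seems to significantly outperform Methods [2] and [3]
Method [3] outperforms Method [2] most of the time

My question is: why does Method [2], where I use a visitor but rely on method overloading to call the correct method, perform worse than virtual methods? I would expect static polymorphism to do better than dynamic polymorphism. I understand that the extra argument passed in Method [2] carries some cost, since it is needed to determine which visit() method of the class to call, and that method overloading may produce more branches. But shouldn't this still outperform virtual methods?

The code is below: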

// qcpptest.hpp

#ifndef INCLUDED_QCPPTEST_H
#define INCLUDED_QCPPTEST_H

#include <boost/variant.hpp>

class IShape {
 public:
  virtual ~IShape() {}  // polymorphic base: makes deletion through IShape* safe
  virtual void rotate() = 0;
  virtual void spin() = 0;
};

class Square : public IShape {
 public:
  void rotate() {
   // std::cout << "Square:I am rotating" << std::endl;
    }
  void spin() { 
    // std::cout << "Square:I am spinning" << std::endl; 
  }
};

class Circle : public IShape {
 public:
  void rotate() { 
    // std::cout << "Circle:I am rotating" << std::endl; 
  }
  void spin() {
   // std::cout << "Circle:I am spinning" << std::endl; 
}
};

// template variation

// enum class M {ADD, DEL};
struct ADD {};
struct DEL {};

class TSquare {
    int i;
 public:
    void visit(const ADD& add) {
        this->i++;
    // std::cout << "TSquare:I am rotating" << std::endl;
  }
    void visit(const DEL& del) {
        this->i++;
    // std::cout << "TSquare:I am spinning" << std::endl;
  }

    void spin() {
        this->i++;
        // std::cout << "TSquare:I am spinning" << std::endl;
    }
    void rotate() {
        this->i++;
        // std::cout << "TSquare:I am rotating" << std::endl;
    }
};

class TCircle {
    int i;
 public:
    void visit(const ADD& add) {
        this->i++;
    // std::cout << "TCircle:I am rotating" << std::endl;
  }
    void visit(const DEL& del) {
        this->i++;
    // std::cout << "TCircle:I am spinning" << std::endl;
  }
    void spin() {
        this->i++;
        // std::cout << "TCircle:I am spinning" << std::endl;
    }
    void rotate() {
        this->i++;
        // std::cout << "TCircle:I am rotating" << std::endl;
    }
};

class MultiVisitor : public boost::static_visitor<void> {
 public:
  template <typename T, typename U>
  void operator()(T& t, const U& u) {
    // std::cout << "visit" << std::endl;
    t.visit(u);
  }
};

// separate visitors, single dispatch

class RotateVisitor : public boost::static_visitor<void> {
 public:
  template <class T>
  void operator()(T& x) {
    x.rotate();
  }
};

class SpinVisitor : public boost::static_visitor<void> {
 public:
  template <class T>
  void operator()(T& x) {
    x.spin();
  }
};

#endif

// qcpptest.cpp

#include <cstdlib>  // atoi
#include <iostream>
#include <vector>
#include <boost/chrono.hpp>
#include "qcpptest.hpp"

using MV = boost::variant<ADD, DEL>;
// MV const add = M::ADD;
// MV const del = M::DEL;
static MV const add = ADD();
static MV const del = DEL();

void make_virtual_shapes(int iters) {
  // std::cout << "make_virtual_shapes" << std::endl;
  std::vector<IShape*> shapes;
  shapes.push_back(new Square());
  shapes.push_back(new Circle());

  boost::chrono::high_resolution_clock::time_point start =
      boost::chrono::high_resolution_clock::now();

  for (int i = 0; i < iters; i++) {
    for (IShape* shape : shapes) {
      shape->rotate();
      shape->spin();
    }
  }

  boost::chrono::nanoseconds nanos =
      boost::chrono::high_resolution_clock::now() - start;
  std::cout << "make_virtual_shapes took " << nanos.count() * 1e-6
            << " millis\n";
}

void make_template_shapes(int iters) {
  // std::cout << "make_template_shapes" << std::endl;
  using TShapes = boost::variant<TSquare, TCircle>;
  // using MV = boost::variant< M >;

  // xyz
  std::vector<TShapes> tshapes;
  tshapes.push_back(TSquare());
  tshapes.push_back(TCircle());
  MultiVisitor mv;

  boost::chrono::high_resolution_clock::time_point start =
      boost::chrono::high_resolution_clock::now();

  for (int i = 0; i < iters; i++) {
    for (TShapes& shape : tshapes) {
      boost::apply_visitor(mv, shape, add);
      boost::apply_visitor(mv, shape, del);
      // boost::apply_visitor(sv, shape);
    }
  }
  boost::chrono::nanoseconds nanos =
      boost::chrono::high_resolution_clock::now() - start;
  std::cout << "make_template_shapes took " << nanos.count() * 1e-6
            << " millis\n";
}

void make_template_shapes_single(int iters) {
  // std::cout << "make_template_shapes_single" << std::endl;
  using TShapes = boost::variant<TSquare, TCircle>;
  // xyz
  std::vector<TShapes> tshapes;
  tshapes.push_back(TSquare());
  tshapes.push_back(TCircle());
  SpinVisitor sv;
  RotateVisitor rv;

  boost::chrono::high_resolution_clock::time_point start =
      boost::chrono::high_resolution_clock::now();

  for (int i = 0; i < iters; i++) {
    for (TShapes& shape : tshapes) {
      boost::apply_visitor(rv, shape);
      boost::apply_visitor(sv, shape);
    }
  }
  boost::chrono::nanoseconds nanos =
      boost::chrono::high_resolution_clock::now() - start;
  std::cout << "make_template_shapes_single took " << nanos.count() * 1e-6
            << " millis\n";
}

int main(int argc, const char* argv[]) {
  std::cout << "Hello, cmake" << std::endl;

  int iters = atoi(argv[1]);

  make_virtual_shapes(iters);
  make_template_shapes(iters);
  make_template_shapes_single(iters);

  return 0;
}

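【Question discussion】:

This program segfaults when compiled with -O3. Are you sure your logic is correct?
It only segfaults when argv[1] is not provided :)
Yes, you need to supply an argument such as 10, 1000 or 1000000. That is the number of times it runs the loop.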
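
The segfault discussed above comes from main() reading argv[1] without checking argc. A minimal sketch of a guarded parse follows; the helper name parse_iters and the fallback parameter are hypothetical and not part of the original program, which simply calls atoi(argv[1]).

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not in the original post): returns the iteration count
// from argv[1], or `fallback` when the argument is missing, not a number,
// non-positive, or out of range for int.
inline int parse_iters(int argc, const char* const argv[], int fallback) {
  if (argc < 2) return fallback;  // no argument supplied: avoid reading argv[1]
  char* end = nullptr;
  errno = 0;
  const long value = std::strtol(argv[1], &end, 10);
  // Reject empty input, trailing garbage, and overflow reported by strtol.
  if (end == argv[1] || *end != '\0' || errno == ERANGE) return fallback;
  if (value <= 0 || value > INT_MAX) return fallback;
  return static_cast<int>(value);
}
```

In main() this would replace the atoi call, for example `int iters = parse_iters(argc, argv, 1000);`, where the default of 1000 is an arbitrary choice.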

Tags: c++ performance templates boost polymorphism

【Solution 1】:

Method 2 is basically reimplementing dynamic dispatch inefficiently. When you have:

shape->rotate();
shape->spin();

this involves looking up the right function in the vtable and calling it. The inefficiency comes from that lookup. But when you have:

boost::apply_visitor(mv, shape, add);

this roughly expands into the following (assuming an as<> member function template that is just a reinterpret_cast without checking):

if (shape.which() == 0) {
    if (add.which() == 0) {
        mv(shape.as<TSquare&>(), add.as<ADD&>());
    }
    else if (add.which() == 1) {
        mv(shape.as<TSquare&>(), add.as<DEL&>());
    }
    else {
        // ???
    }
}
else if (shape.which() == 1) {
    if (add.which() == 0) {
        mv(shape.as<TCircle&>(), add.as<ADD&>());
    }
    else if (add.which() == 1) {
        mv(shape.as<TCircle&>(), add.as<DEL&>());
    }
    else {
        // ???
    }
}
else {
   // ???
}

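Here we have a combinatorial explosion of branches (which we did not have in Method 1), and we actually have to check every possible static type of every variant to figure out what to do (which we did not have to do in Method 3). Those branches will not be predictable either, since you take a different one every time, so you cannot pipeline any of this code without coming to a screeching halt.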
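
The difference in shape between the two visitations can be modelled without Boost. The sketch below is only an illustration under stated assumptions: ShapeTag, OpTag, dispatch_single, dispatch_double and leaf_count are hypothetical names, a plain integer tag stands in for which(), and nothing here claims to match the code Boost actually generates.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two variants in the question.
struct ShapeTag { int which; int counter; };  // which: 0 = TSquare, 1 = TCircle
struct OpTag { int which; };                  // which: 0 = ADD, 1 = DEL

// Method 1 shape: one visitor per operation, so only the shape tag is tested.
// Returns the number of tag tests on the path taken.
inline int dispatch_single(ShapeTag& s) {
  switch (s.which) {
    case 0: ++s.counter; return 1;  // TSquare
    case 1: ++s.counter; return 1;  // TCircle
  }
  assert(false && "unknown shape tag");
  return 0;
}

// Method 2 shape: binary visitation tests the shape tag, then the operation
// tag nested inside it. Returns the number of tag tests on the path taken.
inline int dispatch_double(ShapeTag& s, const OpTag& op) {
  switch (s.which) {
    case 0:
      switch (op.which) {
        case 0: ++s.counter; return 2;  // TSquare, ADD
        case 1: ++s.counter; return 2;  // TSquare, DEL
      }
      break;
    case 1:
      switch (op.which) {
        case 0: ++s.counter; return 2;  // TCircle, ADD
        case 1: ++s.counter; return 2;  // TCircle, DEL
      }
      break;
  }
  assert(false && "unknown tag");
  return 0;
}

// Number of distinct call sites the nested dispatch has to contain.
constexpr int leaf_count(int shape_types, int op_types) {
  return shape_types * op_types;
}
```

With two shape types and two operation types the nested form already needs four leaf call sites, and adding a third shape type raises that to six, while the single-level form grows by one case per new type.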

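The overloading on mv() is free; it is figuring out what we are calling mv with that is not. Note also how the cost would change when you extend either of the two axes: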

+---------------+----------------+----------------+----------+
|               |    Method 1    |    Method 2    | Method 3 |
+---------------+----------------+----------------+----------+
|    New Type   | More Expensive | More Expensive |   Free   |
| New Operation |      Free      | More Expensive |   Free*  |
+---------------+----------------+----------------+----------+

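Method 1 becomes more expensive when adding new types because we have to explicitly iterate over all of the types. Adding new operations is free because it does not matter what the operation is.

Method 3 is free to add new types and free to add new operations; the only change is a larger vtable. That has some effect due to object size, but it will generally be smaller than the increased iteration over types.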
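
As a rough illustration of the "new type is free" entry for Method 3, the sketch below adds a third shape without touching the call site. It is a variation on the question's IShape hierarchy, not the original code: the Counts struct, the Counts& parameters, the Triangle class, and the run_all and make_shapes helpers are all assumptions introduced so the effect can be observed.

```cpp
#include <memory>
#include <vector>

// Hypothetical counters standing in for real work.
struct Counts { int rotations = 0; int spins = 0; };

class IShape {
 public:
  virtual ~IShape() {}
  virtual void rotate(Counts& c) = 0;
  virtual void spin(Counts& c) = 0;
};

class Square : public IShape {
 public:
  void rotate(Counts& c) override { ++c.rotations; }
  void spin(Counts& c) override { ++c.spins; }
};

class Circle : public IShape {
 public:
  void rotate(Counts& c) override { ++c.rotations; }
  void spin(Counts& c) override { ++c.spins; }
};

// New type added later; nothing below this class has to change.
class Triangle : public IShape {
 public:
  void rotate(Counts& c) override { ++c.rotations; }
  void spin(Counts& c) override { ++c.spins; }
};

// Call site written once: each call is a vtable lookup, whatever the type.
inline Counts run_all(const std::vector<std::unique_ptr<IShape>>& shapes) {
  Counts c;
  for (const auto& shape : shapes) {
    shape->rotate(c);
    shape->spin(c);
  }
  return c;
}

// Builds the test population, optionally including the new type.
inline std::vector<std::unique_ptr<IShape>> make_shapes(bool with_triangle) {
  std::vector<std::unique_ptr<IShape>> shapes;
  shapes.push_back(std::unique_ptr<IShape>(new Square));
  shapes.push_back(std::unique_ptr<IShape>(new Circle));
  if (with_triangle) shapes.push_back(std::unique_ptr<IShape>(new Triangle));
  return shapes;
}
```

By contrast, a variant-based version would need Triangle added to the variant's type list, which is where the extra per-type checks in Methods 1 and 2 come from.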

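【Discussion】:

Thanks. Questions: 1. Isn't there also a check inside apply_visitor() for Method 1, albeit a single level that checks which shape the visitor is being called on? That would mean Method 2 has "1 additional check". Method 1: if (shape.which() == 0) { ... } else if (shape.which() == 1) { ... }. Method 2: as you described, so 2 checks instead of 1. Is one extra "if" condition really that expensive?
I mean, with more than two types I could understand there being multiple comparisons, but my example has only 2 types, so Method 2 should result in just one more comparison than Method 1. 2. The impact on performance seems disproportionately large.
@Sid Not two types, two variants. You don't have more comparisons, you have more nested comparisons. This is not the same as adding one more type to the Method 1 variant.
I understand it is a nested comparison, but in my example even the nested comparison only results in one extra CMP operation, right? The second variant (and hence the second nested level) has only two types.
@Sid There are jumps too. Code unpredictability is bad.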