What is the cost of calling a virtual function in a non-polymorphic way? Answers

【Question title】: What's the cost of calling a virtual function in a non-polymorphic way?
【Posted】: 2013-02-02 02:12:27
【Question description】:

I have a pure abstract base class and two derived classes:

#include <iostream>
#include <vector>
using std::cout; using std::endl;
struct B { virtual void foo() = 0; };
struct D1 : B { void foo() override { cout << "D1::foo()" << endl; } };
struct D2 : B { void foo() override { cout << "D2::foo()" << endl; } };

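Does calling foo at Point A cost the same as calling a non-virtual member function? Or is it more expensive than it would be if D1 and D2 were not derived from B?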

int main() {
 D1 d1; D2 d2; 
 std::vector<B*> v = { &d1, &d2 };

 d1.foo(); d2.foo(); // Point A (polymorphism not necessary)
 for(auto&& i : v) i->foo(); // Polymorphism necessary.

 return 0;
}

Answer: Andy Prowl's answer is the correct one; I just want to add gcc's assembly output (tested on godbolt: gcc-4.7 -O2 -march=native -std=c++11). The cost of the direct function calls is:

mov rdi, rsp
call    D1::foo()
mov rdi, rbp
call    D2::foo()

For the polymorphic calls:

mov rdi, QWORD PTR [rbx]
mov rax, QWORD PTR [rdi]
call    [QWORD PTR [rax]]
mov rdi, QWORD PTR [rbx+8]
mov rax, QWORD PTR [rdi]
call    [QWORD PTR [rax]]

However, if the objects are not derived from B and you just make the direct calls, gcc inlines the function calls:

mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:std::cout
call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)

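This enables further optimization when D1 and D2 are not derived from B, so I guess the answer is no, they are not equivalent (at least for this version of gcc with these optimizations; -O3 produced similar output, without inlining). Is there something that prevents the compiler from inlining when D1 and D2 do derive from B?

"Fix": use delegates (that is, reimplement virtual functions yourself):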

#include <functional>

struct DG { // Delegate
 std::function<void(void)> foo;
 // Captures c by reference: the wrapped object must outlive the delegate.
 template<class C> DG(C&& c) { foo = [&](void){c.foo();}; }
};

Then create a vector of delegates:

std::vector<DG> v = { d1, d2 };

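This allows inlining when you access the methods in a non-polymorphic way. However, I guess that going through the vector will be slower than just using virtual functions, or at best equally fast, since std::function uses virtual functions for type erasure (I can't test this with godbolt yet).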
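To make the delegate idea concrete, here is a minimal sketch, assuming two unrelated types with no common base. All names (P1, P2, Delegate, call_all, demo) are hypothetical and not from the original post. Unlike the DG above, it stores a plain non-owning pointer, so it only accepts lvalues and the wrapped objects must outlive the delegates.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for D1 and D2: no common base, no virtual functions.
struct P1 { std::string foo() const { return "P1::foo()"; } };
struct P2 { std::string foo() const { return "P2::foo()"; } };

// Type-erasing delegate. It does not own the wrapped object.
struct Delegate {
    std::function<std::string()> foo;

    template <class C>
    explicit Delegate(C& c) {
        C* p = &c;                        // non-owning pointer to the wrapped object
        foo = [p] { return p->foo(); };   // indirect call happens inside std::function
    }
};

// Calls every delegate in order and concatenates the results.
std::string call_all(const std::vector<Delegate>& v) {
    std::string out;
    for (const auto& d : v) out += d.foo() + ";";
    return out;
}

std::string demo() {
    P1 p1;
    P2 p2;
    std::vector<Delegate> v;
    v.emplace_back(p1);
    v.emplace_back(p2);
    // p1.foo() is an ordinary non-virtual call; the calls through v are type-erased.
    return p1.foo() + ";" + call_all(v);
}
```

The direct call p1.foo() involves no vtable at all, so the compiler is free to inline it; the calls through the vector still pay for an indirection inside std::function.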

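【Question comments】:

If D1 and D2 are derived from B and called directly, there is no reason the compiler cannot inline the calls.
You cannot measure the difference between those sequences of instructions.
Nothing prevents the compiler from inlining D1::foo() and D2::foo(). This is some glitch in GCC 4.7 and above. GCC 4.5 inlines this without problems. clang 3.4.1 inlines it too.
It still fails with gcc-4.9 (tip-of-trunk) -O3 -march=native -DNDEBUG (see the code and assembly here: goo.gl/NKm3Uz). It should inline these calls, since we have only one TU. In more complex programs, unless you use final, these calls are hard to inline even with LTO, because you can always create a new TU in which you derive from a class (dynamic libraries can do this too). IIRC Herb Sutter described this problem as "with virtual inheritance you pay for unlimited extensibility", and that has a cost.
Furthermore, with virtual inheritance the interface (or all possible interfaces, unless you use the adapter pattern) is put into the vtable along with the object, and this vtable can become large. Delegates offer smaller interfaces (and vtables), which improves cache usage in loops.

Tags: c++ performance virtual-functions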

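【Solution 1】:

Does calling foo at Point A cost the same as calling a non-virtual member function?

Yes.

Or is it more expensive than it would be if D1 and D2 were not derived from B?

No.

The compiler will resolve these function calls statically, because they are not performed through a pointer or a reference. Since the type of the object on which the function is invoked is known at compile time, the compiler knows which implementation of foo() must be called.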
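The distinction can be sketched as follows. The names (Base, Derived1, Derived2 and the three helper functions) are hypothetical, and the comments describe what a compiler is allowed to do, not what any particular compiler version is guaranteed to emit.

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};
struct Derived1 : Base { std::string name() const override { return "Derived1"; } };
struct Derived2 : Base { std::string name() const override { return "Derived2"; } };

// The object is named directly, so its dynamic type is exactly Derived1:
// the compiler may bind the call at compile time.
std::string direct_call() {
    Derived1 d;
    return d.name();
}

// Only the static type Base is visible here, so in general the call has to
// go through the vtable.
std::string through_reference(const Base& b) {
    return b.name();
}

// A qualified call suppresses virtual dispatch explicitly.
std::string qualified_call(const Derived2& d) {
    return d.Derived2::name();
}
```

All three functions return the name of the object's actual class; they differ only in how much the compiler knows when it picks the callee.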

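【Comments】:

See the assembly code in the answer. It should be, but in practice it isn't, because the compiler won't inline.
@gnzlbg: Have you tried heavier optimization, such as -O3? I don't see what prevents the compiler from inlining these calls.
@LokiAstari It makes a difference if the function you call does almost nothing and is a hot spot of your application. Sometimes such functions are meant to be transparent and you rely on them being inlined. The fact that they cannot be inlined even when they are not used polymorphically is, at least IMO, interesting. In a tight loop, whether the function is inlined may be the difference between the loop being unrolled or not. Still a lot of can, might, if...
@LokiAstari Really? I find what happens here very counterintuitive. Accepting the cost of polymorphic types in a small, non-critical part of my application does not mean I accept that cost in places where I know the compiler knows exactly which type's function I am calling. That is just wrong. I would rather use delegates and pay more in the non-critical parts than use inheritance and pay everywhere. This is a very important factor when deciding between inheritance and delegate-based polymorphism.
@gnzlbg A delegate actually does the same thing as a vtable: it is an indirect call to a function represented by a pointer. If you need such extreme performance tuning in your application, then you should use compile-time polymorphism.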
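For reference, compile-time polymorphism as suggested in the last comment could look like the sketch below. It assumes the set of types is known where the call is written; S1, S2, twice and sum_demo are hypothetical names.

```cpp
// Two unrelated types that share an interface only by convention.
struct S1 { int value() const { return 1; } };
struct S2 { int value() const { return 2; } };

// One instantiation is generated per T, so each call to t.value() is an
// ordinary direct call that the compiler can inline.
template <class T>
int twice(const T& t) {
    return 2 * t.value();
}

int sum_demo() {
    S1 a;
    S2 b;
    return twice(a) + twice(b);   // 2*1 + 2*2
}
```

The trade-off is that S1 and S2 objects can no longer live in one homogeneous container without some form of type erasure.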

【Solution 2】:

The simplest solution is to look at the compiler's internals. In Clang, we find canDevirtualizeMemberFunctionCall in lib/CodeGen/CGClass.cpp:

/// canDevirtualizeMemberFunctionCall - Checks whether the given virtual member
/// function call on the given expr can be devirtualized.
static bool canDevirtualizeMemberFunctionCall(const Expr *Base, 
                                              const CXXMethodDecl *MD) {
  // If the most derived class is marked final, we know that no subclass can
  // override this member function and so we can devirtualize it. For example:
  //
  // struct A { virtual void f(); }
  // struct B final : A { };
  //
  // void f(B *b) {
  //   b->f();
  // }
  //
  const CXXRecordDecl *MostDerivedClassDecl = getMostDerivedClassDecl(Base);
  if (MostDerivedClassDecl->hasAttr<FinalAttr>())
    return true;

  // If the member function is marked 'final', we know that it can't be
  // overridden and can therefore devirtualize it.
  if (MD->hasAttr<FinalAttr>())
    return true;

  // Similarly, if the class itself is marked 'final' it can't be overridden
  // and we can therefore devirtualize the member function call.
  if (MD->getParent()->hasAttr<FinalAttr>())
    return true;

  Base = skipNoOpCastsAndParens(Base);
  if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(Base)) {
    if (const VarDecl *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
      // This is a record decl. We know the type and can devirtualize it.
      return VD->getType()->isRecordType();
    }

    return false;
  }

  // We can always devirtualize calls on temporary object expressions.
  if (isa<CXXConstructExpr>(Base))
    return true;

  // And calls on bound temporaries.
  if (isa<CXXBindTemporaryExpr>(Base))
    return true;

  // Check if this is a call expr that returns a record type.
  if (const CallExpr *CE = dyn_cast<CallExpr>(Base))
    return CE->getCallReturnType()->isRecordType();

  // We can't devirtualize the call.
  return false;
}

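I believe the code (and the accompanying comments) is self-explanatory :)

【Comments】:

So if a member function or class is effectively final but not marked final, the call won't be devirtualized? (By "marked" I mean marked either by me or by the compiler itself.)
@gnzlbg: It's a bit more complicated than that. Basically, the goal here is to determine the final overrider. If it is marked final, then you know it is; if not, you can still devirtualize the call provided you are able to statically determine the dynamic type of the object, e.g. in int main() { Derived d; Base& b = d; b.foo(); } where b is "obviously" a reference to an instance of Derived.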
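The cases discussed above (a final class, a final member function, and a dynamic type that is visible locally) can be sketched like this. Shape, Triangle, Square and the count_* helpers are hypothetical names; whether a given compiler actually devirtualizes each call is an optimization decision, so the comments only state what it is allowed to assume.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

// Class marked final: nothing can derive from Triangle.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Member function marked final: subclasses of Square cannot override sides().
struct Square : Shape {
    int sides() const final { return 4; }
};

// The final overrider is known from the static type alone.
int count_triangle(const Triangle& t) { return t.sides(); }
int count_square(const Square& s) { return s.sides(); }

// No final needed: the reference is bound to a local object whose dynamic
// type the compiler can see.
int count_local() {
    Triangle t;
    const Shape& s = t;
    return s.sides();
}
```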