In function branch removal when a member is known at compile-time: answers

[Question title]: In function branch removal when a member is known at compile-time
[Posted]: 2016-01-17 15:43:30
[Question]:

Consider the following code:

// Class definition
class myclass
{
    public:
    constexpr myclass() noexcept: _value{0}, _option{true} {}
    constexpr myclass(int value) noexcept: _value{value}, _option{true} {}
    constexpr myclass(int value, bool option) noexcept: _value{value}, _option{option} {}
    constexpr int get_value() const noexcept {return _value;}
    constexpr int get_option() const noexcept {return _option;}
    private:
    int _value;
    bool _option;
};

// Some function that should be super-optimized
int f(myclass x, myclass y) 
{
    if (x.get_option() && y.get_option()) {
        return x.get_value() + y.get_value();
    } else {
        return x.get_value() * y.get_value();
    }
}

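My question is the following: with this kind of pattern, are compilers generally able to avoid the test when the option is known at compile time, for example when a and b are integers (in that case the implicit one-parameter constructor is called and option is always true)? When I say "generally", I mean in complex real-world programs, but where f is called on two ints.

[Question comments]:

Tags: c++ c++11 constructor compiler-optimization constexpr

[Answer 1]:

The short answer is "it depends". It depends on a lot of things, including the complexity of the code, the compiler used, and so on.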

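In general, constant propagation (in other words, "turning something that is passed into a function as a constant into the constant itself") is not a very difficult thing for a compiler to do. Clang/LLVM does this early in compilation, while generating the LLVM IR ("intermediate representation", a layer of code built from the source code that does not represent actual machine code). Other compilers have similar constructs, both in using an IR and in tracking constants separately from non-constant values.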

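So, as long as the compiler can "follow" the code, it can usually remove the test. If, for example, f and the call to f are in different source files, the call is not likely to be optimized this way.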
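A minimal sketch of what this means in practice. The names `pick` and `caller` are made up for illustration and do not come from the question, and whether a given compiler actually performs the transformation depends on the compiler and the optimization level.

```cpp
// Hypothetical helper (not from the question): the branch depends on `flag`.
static int pick(int a, int b, bool flag)
{
    return flag ? a + b : a * b;
}

// `flag` is a literal at the call site. Once `pick` is inlined here, an
// optimizer can substitute `true` for `flag` and drop the multiplication
// branch, leaving code equivalent to `return a + b;`.
int caller(int a, int b)
{
    return pick(a, b, true);
}
```

Because `pick` is visible in the same translation unit as `caller`, the compiler is able to follow the constant through the call. The values of `a` and `b` do not need to be known at compile time for the branch to disappear; only `flag` does.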

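Of course, if you want to be sure what your particular compiler does with your particular code, you have to check the code the compiler generates. As an illustration, here is the code from the question with a main function added: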

// Class definition
class myclass
{
    public:
    constexpr myclass() noexcept: _value{0}, _option{true} {}
    constexpr myclass(int value) noexcept: _value{value}, _option{true} {}
    constexpr myclass(int value, bool option) noexcept: _value{value}, _option{option} {}
    constexpr int get_value() const noexcept {return _value;}
    constexpr int get_option() const noexcept {return _option;}
    private:
    int _value;
    bool _option;
};

// Some function that should be super-optimized
int f(myclass x, myclass y) 
{
    if (x.get_option() && y.get_option()) {
        return x.get_value() + y.get_value();
    } else {
        return x.get_value() * y.get_value();
    }
}

int main()
{
    myclass a;
    myclass b(1);
    myclass c(2, false);

    int x = f(a, b);
    int y = f(b, c);

    return x + y;
}

The program above generates the same code as:

int main()
{
    return 3;
}
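The example above uses compile-time constants throughout. For the case the question actually asks about, where f is called on two ints whose values are only known at run time, a sketch like the following could be compiled and inspected. The file name `example.cpp` and the wrapper `add_ints` are assumptions made here for illustration; the commands in the comments use standard Clang and GCC options.

```cpp
// example.cpp (hypothetical file name). To inspect the optimized output:
//   clang++ -std=c++11 -O2 -S -emit-llvm example.cpp -o -   (LLVM IR)
//   g++     -std=c++11 -O2 -S            example.cpp -o -   (assembly)

// Class and function copied from the question.
class myclass
{
    public:
    constexpr myclass() noexcept: _value{0}, _option{true} {}
    constexpr myclass(int value) noexcept: _value{value}, _option{true} {}
    constexpr myclass(int value, bool option) noexcept: _value{value}, _option{option} {}
    constexpr int get_value() const noexcept {return _value;}
    constexpr int get_option() const noexcept {return _option;}
    private:
    int _value;
    bool _option;
};

int f(myclass x, myclass y)
{
    if (x.get_option() && y.get_option()) {
        return x.get_value() + y.get_value();
    } else {
        return x.get_value() * y.get_value();
    }
}

// Hypothetical wrapper: both arguments go through the implicit one-argument
// constructor, so _option is true for both, even though a and b themselves
// are only known at run time.
int add_ints(int a, int b)
{
    return f(a, b);
}
```

If the option test has been removed, the optimized body of `add_ints` should reduce to a single addition, with no comparison and no multiplication. The standalone definition of `f` is still emitted with its test, since it may be called from elsewhere.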

However, if we change the code to:

#include "myclass.h"

extern int f(myclass x, myclass y);

int main()
{
    myclass a;
    myclass b(1);
    myclass c(2, false);

    int x = f(a, b);
    int y = f(b, c);

    return x + y;
}

and define f in a separate file (compiling with -O2), the generated code for f is:

define i32 @_Z1f7myclassS_(i64 %x.coerce, i64 %y.coerce) #0 {
entry:
  %x.sroa.0.0.extract.trunc = trunc i64 %x.coerce to i32
  %y.sroa.0.0.extract.trunc = trunc i64 %y.coerce to i32
  %conv.i = and i64 %x.coerce, 1095216660480
  %tobool = icmp eq i64 %conv.i, 0
  %conv.i12 = and i64 %y.coerce, 1095216660480
  %tobool2 = icmp eq i64 %conv.i12, 0
  %or.cond = or i1 %tobool, %tobool2
  %add = add nsw i32 %y.sroa.0.0.extract.trunc, %x.sroa.0.0.extract.trunc
  %mul = mul nsw i32 %y.sroa.0.0.extract.trunc, %x.sroa.0.0.extract.trunc
  %retval.0 = select i1 %or.cond, i32 %mul, i32 %add
  ret i32 %retval.0
}

And for main:

define i32 @main() #0 {
entry:
  %call = tail call i32 @_Z1f7myclassS_(i64 4294967296, i64 4294967297)
  %call4 = tail call i32 @_Z1f7myclassS_(i64 4294967297, i64 2)
  %add = add nsw i32 %call4, %call
  ret i32 %add
}

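As you can see, each argument to f is converted into a 64-bit integer, with the value of option stored in the upper half of that 64-bit value. The function f then splits each 64-bit value into its two parts and, depending on the option bits, decides whether to return the result of the multiplication or of the addition.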
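The constants in the IR can be checked by hand. The sketch below mirrors the `trunc` and `and`/`icmp` steps; the helper names are made up here, and the layout (value in the low 32 bits, option in the byte starting at bit 32) is what the IR shows for this particular target, not something the C++ standard guarantees.

```cpp
#include <cstdint>

// Mask used by the `and` instructions in f: 0xFF00000000 == 1095216660480.
constexpr std::uint64_t option_mask = 0xFF00000000ULL;

// Hypothetical helper mirroring `trunc i64 ... to i32`: keep the low 32 bits.
constexpr std::int32_t unpack_value(std::uint64_t packed)
{
    return static_cast<std::int32_t>(packed & 0xFFFFFFFFULL);
}

// Hypothetical helper mirroring `and` followed by `icmp`: test the option byte.
constexpr bool unpack_option(std::uint64_t packed)
{
    return (packed & option_mask) != 0;
}

static_assert(option_mask == 1095216660480ULL, "mask covers bits 32..39");

// The three arguments passed by main in the IR above:
static_assert(unpack_value(4294967296ULL) == 0 && unpack_option(4294967296ULL),
              "a: value 0, option true");
static_assert(unpack_value(4294967297ULL) == 1 && unpack_option(4294967297ULL),
              "b: value 1, option true");
static_assert(unpack_value(2ULL) == 2 && !unpack_option(2ULL),
              "c: value 2, option false");
```

So the first call passes a and b (both options set, hence the addition), and the second passes b and c (one option clear, hence the multiplication).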

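[Comments]:

What do you mean by "different source files"? What if f is defined in a header file?
If f is in a header file, it can be inlined at the call site. If f is declared but not defined, the compiler has no choice but to call the function, and since the called function does not know what is being passed in, it has to check the input values. I'll add a few examples later to show what I mean.
LTO makes this possible even if it is defined in another translation unit!
@kukyakya: True, many compilers do support LTO. I'm not convinced that many large projects actually use it... certainly LLVM itself doesn't...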
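A sketch of the header-only variant discussed in these comments. The file name `myclass.h` appears in the answer, but its contents are not shown there, so the layout below is an assumption.

```cpp
// myclass.h (sketch; actual contents of the answer's header are not shown)
#ifndef MYCLASS_H
#define MYCLASS_H

class myclass
{
    public:
    constexpr myclass() noexcept: _value{0}, _option{true} {}
    constexpr myclass(int value) noexcept: _value{value}, _option{true} {}
    constexpr myclass(int value, bool option) noexcept: _value{value}, _option{option} {}
    constexpr int get_value() const noexcept {return _value;}
    constexpr int get_option() const noexcept {return _option;}
    private:
    int _value;
    bool _option;
};

// Defined in the header and marked inline, so every translation unit that
// includes this file sees the body of f and may inline it at the call site
// without violating the one-definition rule.
inline int f(myclass x, myclass y)
{
    if (x.get_option() && y.get_option()) {
        return x.get_value() + y.get_value();
    } else {
        return x.get_value() * y.get_value();
    }
}

#endif // MYCLASS_H
```

If f has to stay in its own source file, the alternative mentioned above is link-time optimization, which with GCC and Clang is typically enabled by passing `-flto` at both the compile and the link step. Whether the call is then actually inlined remains up to the optimizer.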