Mapping c++ to assembly: answer

【Question title】: Mapping c++ to assembly
【Posted】: 2017-05-26 15:00:47
【Question】:

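When compiling some code with clang 3.9.1 and optimizations (-O2), I ran into unexpected behavior at runtime that I don't see with other compilers (clang 3.8 and gcc 6.3).

I thought I might have some unintentional undefined behavior (compiling with ubsan makes the unexpected behavior go away), so I tried to simplify the program and found that one particular function seems to cause the difference in behavior.

Now I'm mapping the assembly back to C++ to see where things go wrong and to work out why this happens, and there are a few parts I'm having trouble mapping back.

C++: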

#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <cstdio>

enum class FooState { A, B };

struct Foo {
  std::atomic<std::int64_t> counter{0};
  std::atomic<std::int64_t> counter_a{0};
  std::atomic<std::int64_t> counter_b{0};
};

//__attribute__((noinline))
FooState to_state(const std::int64_t c) {
  return c >= 0 ? FooState::A : FooState::B;
}

static const int NUM_MODIFIES = 100;

int value_a = 0, value_b = 0;
Foo foo;
std::atomic<std::int64_t> total_sum{0};

void test_function() {
  bool done = false;
  while (!done) {
    const std::int64_t count =
        foo.counter.fetch_add(1, std::memory_order_seq_cst);
    const FooState state = to_state(count);

    int &val = FooState::A == state ? value_a : value_b;
    if (val == NUM_MODIFIES) {
      total_sum += val;
      done = true;
    }

    std::atomic<std::int64_t> &c =
        FooState::A == state ? foo.counter_a : foo.counter_b;
    c.fetch_add(1, std::memory_order_seq_cst);
  }
}

Assembly:

test_function():                     # @test_function()
        test    rax, rax
        setns   al
        lock
        inc     qword ptr [rip + foo]
        mov     ecx, value_a
        mov     edx, value_b
        cmovg   rdx, rcx
        cmp     dword ptr [rdx], 100
        je      .LBB1_3
        mov     ecx, foo+8
        mov     edx, value_a
.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        test    al, 1
        mov     eax, foo+16
        cmovne  rax, rcx
        lock
        inc     qword ptr [rax]
        test    rax, rax
        setns   al
        lock
        inc     qword ptr [rip + foo]
        mov     esi, value_b
        cmovg   rsi, rdx
        cmp     dword ptr [rsi], 100
        jne     .LBB1_2
.LBB1_3:
        lock
        add     qword ptr [rip + total_sum], 100
        test    al, al
        mov     eax, foo+8
        mov     ecx, foo+16
        cmovne  rcx, rax
        lock
        inc     qword ptr [rcx]
        ret

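I found that marking to_state as noinline, or changing done to a global, seems to "fix" the unexpected behavior.

The unexpected behavior I see is this: counter_a should be incremented when counter >= 0, and counter_b otherwise. As far as I can tell, this sometimes doesn't happen, but pinning down exactly when and why is difficult.

The part of the assembly I could use some help with is the test rax, rax; setns al and the test al, 1 sections. It looks like the initial test doesn't set al deterministically, and that value is then used to decide which counter to increment, but maybe I'm misunderstanding something.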
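A note on reading those instructions: a fetch_add(1) whose result only feeds c >= 0 can legally be lowered to lock inc followed by a flag test on the incremented value, and that is what the cmovg choosing between value_a and value_b does (the mov instructions in between do not touch the flags). On x86 the G condition is ZF == 0 && SF == OF, and after inc, SF XOR OF is the sign of the exact, unwrapped result old + 1, so G holds exactly when old >= 0, including for INT64_MAX. The sketch below checks that claim on boundary values. IncFlags, inc_flags, cond_g and source_says_a are hypothetical helpers, and the flag model is simplified to the three flags involved. The setns al that selects the counter has no such source in this listing: lock inc never loads the old counter value into a register, so test rax, rax tests whatever rax happens to hold.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified model of the x86 flags produced by `inc` on a 64-bit operand.
struct IncFlags {
  bool zf;  // wrapped result == 0
  bool sf;  // sign bit of the wrapped result
  bool of;  // signed overflow: only when the operand was INT64_MAX
};

IncFlags inc_flags(std::int64_t old) {
  // Increment in unsigned arithmetic so wrap-around is well defined.
  const std::uint64_t r = static_cast<std::uint64_t>(old) + 1u;
  IncFlags f;
  f.zf = (r == 0);
  f.sf = (r >> 63) != 0;
  f.of = (old == std::numeric_limits<std::int64_t>::max());
  return f;
}

// Condition tested by cmovg / jg: ZF == 0 && SF == OF.
bool cond_g(const IncFlags &f) { return !f.zf && (f.sf == f.of); }

// What the source asks for: to_state(count) == FooState::A.
bool source_says_a(std::int64_t old) { return old >= 0; }
```

For example, inc_flags(-1) sets ZF (the result wraps to 0), so cond_g is false, matching to_state(-1) == FooState::B.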

Below is a small example that demonstrates the problem. When compiled with clang 3.9 and -O2, it usually hangs forever; otherwise it runs to completion.

#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <limits>  // std::numeric_limits, used in main
#include <thread>
#include <cstdio>

enum class FooState { A, B };

struct Foo {
  std::atomic<std::int64_t> counter{0};
  std::atomic<std::int64_t> counter_a{0};
  std::atomic<std::int64_t> counter_b{0};
};

//__attribute__((noinline))
FooState to_state(const std::int64_t c) {
  return c >= 0 ? FooState::A : FooState::B;
}

//__attribute__((noinline))
FooState to_state2(const std::int64_t c) {
  return c >= 0 ? FooState::A : FooState::B;
}

static const int NUM_MODIFIES = 100;

int value_a = 0, value_b = 0;
Foo foo;
std::atomic<std::int64_t> total_sum{0};

void test_function() {
  bool done = false;
  while (!done) {
    const std::int64_t count =
        foo.counter.fetch_add(1, std::memory_order_seq_cst);
    const FooState state = to_state(count);

    int &val = FooState::A == state ? value_a : value_b;
    if (val == NUM_MODIFIES) {
      total_sum += val;
      done = true;
    }

    std::atomic<std::int64_t> &c =
        FooState::A == state ? foo.counter_a : foo.counter_b;
    c.fetch_add(1, std::memory_order_seq_cst);
  }
}

int main() {
  std::thread thread = std::thread(test_function);

  for (std::size_t i = 0; i <= NUM_MODIFIES; ++i) {
    const std::int64_t count =
        foo.counter.load(std::memory_order_seq_cst);
    const FooState state = to_state2(count);

    unsigned log_count = 0;

    auto &inactive_val = FooState::A == state ? value_b : value_a;
    inactive_val = i;

    if (FooState::A == state) {
      foo.counter_b.store(0, std::memory_order_seq_cst);
      const auto accesses_to_wait_for =
          foo.counter.exchange((std::numeric_limits<std::int64_t>::min)(),
                               std::memory_order_seq_cst);
      while (accesses_to_wait_for !=
             foo.counter_a.load(std::memory_order_seq_cst)) {
        std::this_thread::yield();

        if(++log_count <= 10) {
          std::printf("#1 wait_for=%lld, val=%lld\n",
            static_cast<long long>(accesses_to_wait_for),
            static_cast<long long>(foo.counter_a.load(std::memory_order_seq_cst)));
        }
      }
    } else {
      foo.counter_a.store(0, std::memory_order_seq_cst);

      auto temp = foo.counter.exchange(0, std::memory_order_seq_cst);
      std::int64_t accesses_to_wait_for = 0;
      while (temp != INT64_MIN) {
        ++accesses_to_wait_for;
        --temp;
      }

      while (accesses_to_wait_for !=
             foo.counter_b.load(std::memory_order_seq_cst)) {
        std::this_thread::yield();

        if (++log_count <= 10) {
          std::printf("#2 wait_for=%lld, val=%lld\n",
            static_cast<long long>(accesses_to_wait_for),
            static_cast<long long>(foo.counter_b.load(std::memory_order_seq_cst)));
        }
      }
    }

    std::printf("modify #%zu complete\n", i);
  }

  std::printf("modifies complete\n");

  thread.join();

  const std::size_t expected_result = NUM_MODIFIES;
  std::printf("%s\n", total_sum == expected_result ? "ok" : "fail");
}
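The handshake in main relies on a counting argument: every fetch_add that sees a non-negative counter must be matched by exactly one increment of counter_a, and every one that sees a negative counter by one increment of counter_b. A single-threaded model makes that concrete. It is a hypothetical sketch: the Model struct and its method names are invented, plain integers stand in for the atomics, and close_phase_a/close_phase_b mirror what the two branches in main do with exchange. With correct routing, the value returned when a phase is closed equals the matching counter, which is the condition main spins on.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical single-threaded model of the protocol in the repro.
struct Model {
  std::int64_t counter = 0;    // stands in for foo.counter
  std::int64_t counter_a = 0;  // stands in for foo.counter_a
  std::int64_t counter_b = 0;  // stands in for foo.counter_b

  // One iteration of test_function's loop, routed as the source intends.
  void worker_step() {
    const std::int64_t count = counter++;  // fetch_add returns the old value
    if (count >= 0) ++counter_a; else ++counter_b;
  }

  // main's FooState::A branch: returns how many worker steps saw a
  // non-negative counter, then switches the worker to phase B.
  std::int64_t close_phase_a() {
    counter_b = 0;
    const std::int64_t accesses = counter;
    counter = std::numeric_limits<std::int64_t>::min();
    return accesses;
  }

  // main's FooState::B branch: the counter started at INT64_MIN, so after
  // k steps it holds INT64_MIN + k. Recover k, then switch back to phase A.
  std::int64_t close_phase_b() {
    counter_a = 0;
    const std::int64_t temp = counter;
    assert(temp < 0 && "close_phase_b called outside phase B");
    counter = 0;
    // Same k as the decrement loop in main; cannot overflow since k >= 0.
    return temp - std::numeric_limits<std::int64_t>::min();
  }
};
```

If an increment were misrouted, the matching counter would fall short of the returned count and main's wait loop would never end, which would be consistent with the hang described above.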

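【Discussion】:

Why are you looking at the assembly to debug the code? Why not create an MCVE and use a debugger?
You keep saying "unexpected behavior", but I'm still not sure which behavior you weren't expecting. Could you clarify?
@UnholySheep Sorry. I've updated the post with more information.
The whole value_a/value_b thing, which doesn't really do anything, is strange. They're set to 0 and compared against, but never used. Whenever I see global variables and threads, I get nervous.
I'm also pretty sure none of your comparisons with atomics are thread-safe. Yes, they use atomics, but the ?: operator still compiles to multiple instructions, so the value can change in between.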

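Tags: c++ assembly clang

【Answer 1】:

I'm not 100% sure (I haven't debugged it, just simulated it in my head), but I think both test rax,rax + setns al pairs are testing the wrong thing.

The result of the first one depends on whether rax < 0 when the function is called (UB). The other test, inside the loop, will always be "NS" (it tests a 32-bit address in rax => SF=0 => al = 1), so with al stuck at 1 the rest of the loop will always select counter_a.

Now that I've read your question, I see you had the same suspicion (I actually looked at the code first).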
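The answer's argument can be checked by transliterating the loop at .LBB1_2 into C++. This is a hypothetical model, not decompiler output: rax is a plain integer, the addresses of counter_a and counter_b are made-up values (the mov ecx, foo+8 form only implies they fit in 32 bits), the val == 100 exit test is omitted, and all names are invented. The model also tracks should_be_b, the number of iterations in which the source would have chosen counter_b.

```cpp
#include <cassert>
#include <cstdint>

// Made-up addresses standing in for foo+8 and foo+16. The real ones are
// loaded as 32-bit immediates, so bit 63 is always clear.
constexpr std::uint64_t kAddrCounterA = 0x601048;  // &foo.counter_a
constexpr std::uint64_t kAddrCounterB = 0x601050;  // &foo.counter_b

// test rax, rax / setns al: al = 1 when the sign bit of rax is clear.
bool setns_al(std::uint64_t rax) { return (rax >> 63) == 0; }

// Hypothetical model of the miscompiled loop body (exit test omitted).
struct BuggyLoopModel {
  std::int64_t counter = 0;      // foo.counter
  std::int64_t counter_a = 0;    // foo.counter_a
  std::int64_t counter_b = 0;    // foo.counter_b
  std::int64_t should_be_b = 0;  // iterations where the source picks counter_b

  // rax_on_entry is whatever the caller left in rax; the listing tests it
  // without ever writing it first.
  void run(std::uint64_t rax_on_entry, int iterations) {
    bool al = setns_al(rax_on_entry);
    for (int i = 0; i < iterations; ++i) {
      const std::int64_t old = counter++;  // lock inc qword ptr [rip + foo]
      if (old < 0) ++should_be_b;          // what to_state(count) would say
      // test al, 1 / mov eax, foo+16 / cmovne rax, rcx
      const std::uint64_t rax = al ? kAddrCounterA : kAddrCounterB;
      if (rax == kAddrCounterA) ++counter_a; else ++counter_b;  // lock inc [rax]
      al = setns_al(rax);  // now tests an address, so al becomes 1
    }
  }
};
```

With a negative shared counter and non-negative garbage in rax, every increment lands on counter_a; if rax happens to be negative on entry, only the first one goes to counter_b.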

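【Comments】:

Yes, that's the part I suspected too. Do you see anywhere in the code that could allow this to be generated?
@CTT Not really. If it were state itself, I would expect the inverted logic (the enum is A=0, B=1, while al is set to 1/0). I did think it might be a leftover from "after receiving the return value of to_state(), before it was removed by inlining", but the inverted logic doesn't support that. So it's probably a lot more complicated. I'm still not 100% sure I deciphered it correctly. Also, I can't see anything UB-like in the C++ source, but I'm no C++ guru. One thing to check is whether the aliasing through references is valid, but AFAIG (g = guess) it is. (So I think you've hit a compiler bug... 50%?)
It was indeed a compiler bug. The fix is here: reviews.llvm.org/rL291630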