【Question title】: Failure of pointer dereferencing (SIGSEGV) after inline-assembly
【Posted】: 2020-10-08 15:09:00
【Question】:

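While trying different ways to code the leaves of a Schoenhage radix-conversion tree, I stumbled over the compilers (GCC, clang) optimizing division by a small constant into a multiplication by the reciprocal. That is exactly what they should do, no complaints. So I decided to add a bit of inline assembly to get comparable benchmarks, but all I got was a segfault.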

The code (not a minimal example, but some context might help):

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define ARRAY_LENGTH 33

static const char digits[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
                               '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
                               'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
                               'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
                               'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
                               'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
                               's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '+',
                               '/' };

static void to_radix_recursive(unsigned int a, unsigned int b,
                               char *result, int *index) {
   unsigned int r = 0u, q = 0u;
   int i;
   char c;

   if (a == 0u) {
      return;
   }
#ifdef CZ_USE_ASM
   //printf("BEFORE a = %u, b = %u, q = %u, r = %u\n", a, b, q, r);
   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
          );
   //printf("AFTER  a = %u, b = %u, q = %u, r = %u\n\n", a, b, q, r);
#else
   q = a / b;
   r = a % b;
#endif
   to_radix_recursive(q, b, result, index);
   c = digits[r];
   i = *index;        /* Line 41 */
   result[i] = c;
   (*index)++;
}

int main(void) {
   int idx;
   unsigned int a, b;
   char result[ARRAY_LENGTH] = {'\0'};

   /* All checks and balances omitted! */

   /* 0 < a <= UINT_MAX */
   a = 1234567;
#ifdef CZ_USE_CONSTANT
   /* Most compilers optimize to multiplication with reciprocal here */
   b = 10;
#else
   /* Should press the optimizer to use "divl" */
   for (b = 2u; b < 64u; b++) {
      idx = 0;
      to_radix_recursive(a, b, result, &idx);
      printf("Result recursive = %s for radix %u\n", result,b);
      for (int i = 0; i < ARRAY_LENGTH; i++) {
         result[i] = '\0';
      }
   }
#endif
   exit(EXIT_SUCCESS);
}

Expected output with -DCZ_USE_ASM:

Result recursive = 100101101011010000111 for radix 2
Result recursive = 2022201111201 for radix 3
Result recursive = 10231122013 for radix 4
Result recursive = 304001232 for radix 5
Result recursive = 42243331 for radix 6
Result recursive = 13331215 for radix 7
Result recursive = 4553207 for radix 8
Result recursive = 2281451 for radix 9
Result recursive = 1234567 for radix 10
Result recursive = 773604 for radix 11
Result recursive = 4B6547 for radix 12
Result recursive = 342C19 for radix 13
Result recursive = 241CB5 for radix 14
Result recursive = 195BE7 for radix 15
Result recursive = 12D687 for radix 16
Result recursive = ED4EA for radix 17
Result recursive = BDC71 for radix 18
Result recursive = 98IG4 for radix 19
Result recursive = 7E687 for radix 20
Result recursive = 6769J for radix 21
Result recursive = 55KGF for radix 22
Result recursive = 49AHJ for radix 23
Result recursive = 3H787 for radix 24
Result recursive = 3407H for radix 25
Result recursive = 2I679 for radix 26
Result recursive = 28JDJ for radix 27
Result recursive = 206JJ for radix 28
Result recursive = 1LHS8 for radix 29
Result recursive = 1FLM7 for radix 30
Result recursive = 1ADKN for radix 31
Result recursive = 15LK7 for radix 32
Result recursive = 11BM4 for radix 33
Result recursive = VDWR for radix 34
Result recursive = SRSC for radix 35
Result recursive = QGLJ for radix 36
Result recursive = ODTP for radix 37
Result recursive = MIaN for radix 38
Result recursive = KVQM for radix 39
Result recursive = JBO7 for radix 40
Result recursive = HbHG for radix 41
Result recursive = GRaJ for radix 42
Result recursive = FMTb for radix 43
Result recursive = ELUF for radix 44
Result recursive = DOTb for radix 45
Result recursive = CVKJ for radix 46
Result recursive = BffI for radix 47
Result recursive = B7e7 for radix 48
Result recursive = AO9C for radix 49
Result recursive = 9hfH for radix 50
Result recursive = 9FXA for radix 51
Result recursive = 8eTZ for radix 52
Result recursive = 8FQc for radix 53
Result recursive = 7jKJ for radix 54
Result recursive = 7N6b for radix 55
Result recursive = 71bl for radix 56
Result recursive = 6bu4 for radix 57
Result recursive = 6Ivb for radix 58
Result recursive = 60cp for radix 59
Result recursive = 5gu7 for radix 60
Result recursive = 5Qln for radix 61
Result recursive = 5BAN for radix 62
Result recursive = 4x3J for radix 63

But as said above: it segfaults instead. I pulled the code apart a bit so that each line performs one operation, and running valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./divmod prints:

==9546== Memcheck, a memory error detector
==9546== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9546== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9546== Command: ./divmod
==9546== 
==9546== Invalid read of size 4
==9546==    at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==  Address 0x2 is not stack'd, malloc'd or (recently) free'd
==9546== 
==9546== 
==9546== Process terminating with default action of signal 11 (SIGSEGV)
==9546==  Access not within mapped region at address 0x2
==9546==    at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==    by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546==  If you believe this happened as a result of a stack
==9546==  overflow in your program's main thread (unlikely but
==9546==  possible), you can try to increase the size of the
==9546==  main thread stack using the --main-stacksize= flag.
==9546==  The main thread stack size used in this run was 8388608.
==9546== 
==9546== HEAP SUMMARY:
==9546==     in use at exit: 0 bytes in 0 blocks
==9546==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9546== 
==9546== All heap blocks were freed -- no leaks are possible

The address 0x2 is very low, which hints that this really is a failed pointer dereference.

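The two commented-out printf calls are there for the reason you may already have guessed: if you enable one of them (I needed both yesterday), it works. That is almost always caused by some UB (undefined behavior) somewhere.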

With a vague fear that I made a very silly mistake and will be laughed at: what is the cause, and how do I fix it?

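【Comments】:

  • I'm not sure your code will be faster with inline assembly.
  • @Jabberwocky It won't, but I use the assembly only to get comparable benchmarks. Comparing "multiply by reciprocal" with "divl" is a bit like comparing apples with oranges.
  • What a clean and tidy question, it really made me smile :) But (opinion alert!) I would discourage mixing two languages in one source file, just as I would discourage @ 987654328@s. Don't do it unless your spidey sense tells you it is the only option :)
  • If your goal is to keep the division from being optimized away, you can get a similar result by putting the divisor into a volatile variable, so that the compiler cannot assume it is constant. That may cost some extra loads and stores, though.
  • @NateEldredge: Or just use asm("": "+r"(var)) to "launder" the value from the optimizer, which makes the compiler materialize var in a register and forget anything it knew about it (e.g. its value, that it is non-negative, etc.). That is basically what some Benchmark::DoNotOptimize wrappers do. It works because you tell the compiler that the value of var is an output of the asm statement, and the compiler does not inspect the asm statement to figure out what it does, even when it is empty.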
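
The two suggestions from the comments above can be sketched roughly as follows. This is only an illustration, assuming GCC or clang; the helper names div_volatile and div_laundered are hypothetical and do not appear in the original thread.

```c
#include <assert.h>

/* Hypothetical helper: the divisor is read through a volatile object, so the
   compiler has to load it at run time and cannot treat it as a constant. */
static unsigned int div_volatile(unsigned int a, unsigned int b) {
   volatile unsigned int divisor = b;
   assert(b != 0u);
   return a / divisor;
}

/* Hypothetical helper: the empty asm statement claims to read and write
   `divisor` in a register ("+r"). The compiler does not look inside the
   template, so it forgets whatever it knew about the value. */
static unsigned int div_laundered(unsigned int a, unsigned int b) {
   unsigned int divisor = b;
   assert(b != 0u);
   __asm__("" : "+r"(divisor));
   return a / divisor;
}
```

Calling either helper with a literal divisor, for example div_laundered(a, 10u), should leave the compiler unable to substitute the reciprocal multiplication, although the exact instructions emitted still depend on the compiler and its flags.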

Tags: c gcc x86 inline-assembly


【Solution 1】:

The problem is that in your inline assembly you do this:

   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
          );

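GCC and clang are very unforgiving here. If you modify a register, you have to tell the compiler that it will be modified. In this inline assembly you declared EAX and EDX as output-only registers (so the compiler knows they change), but you never told the compiler that you also modify (clobber) EBX. A simple fix is to add EBX to the clobber list, like this: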

   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=a"(q), "=d"(r)
           : "g"(a), "g"(b)
           : "ebx"
          );

Now the compiler will no longer assume that EBX still holds the same value it had before the inline assembly ran.
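
As a rough, self-contained sketch of the clobber-list fix, the statement can be wrapped in a helper (divmod_clobber is a hypothetical name, not from the original post). One detail here goes beyond the answer and is an assumption about what a robust version needs: the outputs are marked early-clobber ("=&a", "=&d"), because the template writes EDX and EAX before it has read both inputs, and without the & the compiler would be allowed to place an input in one of those registers. The fallback branch exists only so the sketch compiles on non-x86 targets.

```c
#include <assert.h>

/* Hypothetical helper, not part of the original post. */
static void divmod_clobber(unsigned int a, unsigned int b,
                           unsigned int *q, unsigned int *r) {
   unsigned int quot, rem;
   assert(b != 0u);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
   __asm__("xorl %%edx, %%edx;"
           "movl %2, %%eax;"
           "movl %3, %%ebx;"
           "divl %%ebx;"
           : "=&a"(quot), "=&d"(rem)  /* written before the inputs are read */
           : "g"(a), "g"(b)
           : "ebx"                    /* tell the compiler EBX is destroyed */
          );
#else
   /* Assumption: plain C fallback for targets without x86 divl. */
   quot = a / b;
   rem = a % b;
#endif
   *q = quot;
   *r = rem;
}
```

With the clobber in place, the compiler either avoids keeping live values in EBX across the statement or saves and restores them itself.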

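If your inline assembly starts with a MOV instruction, you have probably taken the wrong approach: you are not using the inline assembly operands (and their constraints) to let the compiler try to generate the most efficient code. Your inline assembly could look something like this instead: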

   __asm__("divl %4"
           : "=a"(q), "=d"(r)
           : "a"(a), "d"(0), "r"(b)
          );

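We created a fifth operand (%4) that passes the divisor in a register chosen by the compiler. We also set EDX to zero through an input operand instead of doing it inside the inline assembly. This version reuses the EAX and EDX registers for both the input and the output operands, so fewer registers may be needed overall.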
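
A minimal sketch of how the constraint-based form might be used in a complete function, assuming GCC or clang on x86. The name divmod_constraints is hypothetical, and the plain C fallback is only there so the sketch also compiles elsewhere.

```c
#include <assert.h>

/* Hypothetical helper, not part of the original post. */
static void divmod_constraints(unsigned int a, unsigned int b,
                               unsigned int *q, unsigned int *r) {
   unsigned int quot, rem;
   assert(b != 0u);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
   /* Operands: %0 = quotient (EAX), %1 = remainder (EDX),
      %2 = dividend (EAX), %3 = zero (EDX), %4 = divisor (any register). */
   __asm__("divl %4"
           : "=a"(quot), "=d"(rem)
           : "a"(a), "d"(0u), "r"(b)
          );
#else
   /* Assumption: plain C fallback for targets without x86 divl. */
   quot = a / b;
   rem = a % b;
#endif
   *q = quot;
   *r = rem;
}
```

In to_radix_recursive this would stand in for the CZ_USE_ASM branch, for example divmod_constraints(a, b, &q, &r). No clobber list is needed because every register the instruction touches is already described by an operand.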

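【Comments】:

  • Yes, I knew this would be embarrassing! ;-) Thanks!
  • @deamentiaemundi: Not embarrassing at all. Unfortunately, the unforgiving nature of GCC's inline assembly syntax can lead to subtle problems like this one. That same unforgiving nature also makes it more powerful, though, because it lets the compiler make full use of its optimizations.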