【Posted】: 2020-10-08 15:09:00
【Question】:
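While experimenting with different ways to encode the leaves of a Schoenhage radix-conversion tree, I noticed that the compilers (GCC, clang) optimize division by a small constant into multiplication by its reciprocal. That is what they should do, so no complaints there. I therefore decided to add a bit of inline assembly to get comparable benchmarks, but what I got instead was a segfault.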
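For context, this is roughly what that optimization amounts to for the constant divisor 10 (the CZ_USE_CONSTANT case below). The sketch assumes 32-bit unsigned operands. The multiplier 0xCCCCCCCD = ceil(2^35 / 10) followed by a right shift of 35 is the pair GCC and clang commonly emit for this case on x86-64, but the exact sequence depends on compiler and target. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient a / 10 without a divide instruction: multiply by
   m = ceil(2^35 / 10) = 0xCCCCCCCD in 64-bit arithmetic, then shift
   right by 35. Exact for every 32-bit unsigned a. */
static uint32_t div10_reciprocal(uint32_t a) {
    return (uint32_t)(((uint64_t)a * 0xCCCCCCCDu) >> 35);
}

/* Remainder recovered from the quotient: a - 10 * q. */
static uint32_t mod10_reciprocal(uint32_t a) {
    return a - 10u * div10_reciprocal(a);
}

/* Cross-check against the compiler's own / and % operators. */
static void check_div10(uint32_t a) {
    assert(div10_reciprocal(a) == a / 10u);
    assert(mod10_reciprocal(a) == a % 10u);
}
```

Because 10 * 0xCCCCCCCD exceeds 2^35 by only 2, the error introduced by the rounded-up multiplier stays below 1/10 for every a < 2^34, so the result matches a / 10 for all 32-bit inputs.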
The code (not a minimal example, but some context may help):
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define ARRAY_LENGTH 33
static const char digits[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
'9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '+',
'/' };
static void to_radix_recursive(unsigned int a, unsigned int b,
                               char *result, int *index) {
    unsigned int r = 0u, q = 0u;
    int i;
    char c;
    if (a == 0u) {
        return;
    }
#ifdef CZ_USE_ASM
    //printf("BEFORE a = %u, b = %u, q = %u, r = %u\n", a, b, q, r);
    __asm__("xorl %%edx, %%edx;"
            "movl %2, %%eax;"
            "movl %3, %%ebx;"
            "divl %%ebx;"
            : "=a"(q), "=d"(r)
            : "g"(a), "g"(b)
            );
    //printf("AFTER a = %u, b = %u, q = %u, r = %u\n\n", a, b, q, r);
#else
    q = a / b;
    r = a % b;
#endif
    to_radix_recursive(q, b, result, index);
    c = digits[r];
    i = *index; /* Line 41 */
    result[i] = c;
    (*index)++;
}
int main(void) {
    int idx;
    unsigned int a, b;
    char result[ARRAY_LENGTH] = {'\0'};
    /* All checks and balances omitted! */
    /* 0 < a <= UINT_MAX */
    a = 1234567;
#ifdef CZ_USE_CONSTANT
    /* Most compilers optimize this to a multiplication by the reciprocal */
    b = 10;
#else
    /* Should push the optimizer into using "divl" */
    for (b = 2u; b < 64u; b++) {
        idx = 0;
        to_radix_recursive(a, b, result, &idx);
        printf("Result recursive = %s for radix %u\n", result, b);
        for (int i = 0; i < ARRAY_LENGTH; i++) {
            result[i] = '\0';
        }
    }
#endif
    exit(EXIT_SUCCESS);
}
Expected output with -DCZ_USE_ASM:
Result recursive = 100101101011010000111 for radix 2
Result recursive = 2022201111201 for radix 3
Result recursive = 10231122013 for radix 4
Result recursive = 304001232 for radix 5
Result recursive = 42243331 for radix 6
Result recursive = 13331215 for radix 7
Result recursive = 4553207 for radix 8
Result recursive = 2281451 for radix 9
Result recursive = 1234567 for radix 10
Result recursive = 773604 for radix 11
Result recursive = 4B6547 for radix 12
Result recursive = 342C19 for radix 13
Result recursive = 241CB5 for radix 14
Result recursive = 195BE7 for radix 15
Result recursive = 12D687 for radix 16
Result recursive = ED4EA for radix 17
Result recursive = BDC71 for radix 18
Result recursive = 98IG4 for radix 19
Result recursive = 7E687 for radix 20
Result recursive = 6769J for radix 21
Result recursive = 55KGF for radix 22
Result recursive = 49AHJ for radix 23
Result recursive = 3H787 for radix 24
Result recursive = 3407H for radix 25
Result recursive = 2I679 for radix 26
Result recursive = 28JDJ for radix 27
Result recursive = 206JJ for radix 28
Result recursive = 1LHS8 for radix 29
Result recursive = 1FLM7 for radix 30
Result recursive = 1ADKN for radix 31
Result recursive = 15LK7 for radix 32
Result recursive = 11BM4 for radix 33
Result recursive = VDWR for radix 34
Result recursive = SRSC for radix 35
Result recursive = QGLJ for radix 36
Result recursive = ODTP for radix 37
Result recursive = MIaN for radix 38
Result recursive = KVQM for radix 39
Result recursive = JBO7 for radix 40
Result recursive = HbHG for radix 41
Result recursive = GRaJ for radix 42
Result recursive = FMTb for radix 43
Result recursive = ELUF for radix 44
Result recursive = DOTb for radix 45
Result recursive = CVKJ for radix 46
Result recursive = BffI for radix 47
Result recursive = B7e7 for radix 48
Result recursive = AO9C for radix 49
Result recursive = 9hfH for radix 50
Result recursive = 9FXA for radix 51
Result recursive = 8eTZ for radix 52
Result recursive = 8FQc for radix 53
Result recursive = 7jKJ for radix 54
Result recursive = 7N6b for radix 55
Result recursive = 71bl for radix 56
Result recursive = 6bu4 for radix 57
Result recursive = 6Ivb for radix 58
Result recursive = 60cp for radix 59
Result recursive = 5gu7 for radix 60
Result recursive = 5Qln for radix 61
Result recursive = 5BAN for radix 62
Result recursive = 4x3J for radix 63
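But, as mentioned above, it segfaults instead. I split the code up a bit so that each line performs a single operation, and running valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./divmod prints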
==9546== Memcheck, a memory error detector
==9546== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9546== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9546== Command: ./divmod
==9546==
==9546== Invalid read of size 4
==9546== at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== Address 0x2 is not stack'd, malloc'd or (recently) free'd
==9546==
==9546==
==9546== Process terminating with default action of signal 11 (SIGSEGV)
==9546== Access not within mapped region at address 0x2
==9546== at 0x4005FA: to_radix_recursive (divmod.c:41)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== by 0x4005F1: to_radix_recursive (divmod.c:39)
==9546== If you believe this happened as a result of a stack
==9546== overflow in your program's main thread (unlikely but
==9546== possible), you can try to increase the size of the
==9546== main thread stack using the --main-stacksize= flag.
==9546== The main thread stack size used in this run was 8388608.
==9546==
==9546== HEAP SUMMARY:
==9546== in use at exit: 0 bytes in 0 blocks
==9546== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==9546==
==9546== All heap blocks were freed -- no leaks are possible
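The address 0x2 is very low, which suggests that a bogus pointer really is being dereferenced.
The two printf calls are there for the reason you have probably guessed: if I enable one of them (yesterday I needed both), it works. That is almost always a sign of some UB (undefined behavior) somewhere.
With a vague fear of having made a really stupid mistake that everyone will laugh at: what is the cause, and how do I fix it?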
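A likely cause, inferred from the Valgrind output rather than confirmed: the template loads b into %ebx without listing %ebx as clobbered. %rbx is callee-saved, so the compiler may well keep index there across the recursive call; for radix 2 the asm would overwrite it with b = 2, which matches the faulting address 0x2 at line 41. Independently, because the "=a"/"=d" outputs are not early-clobber and "g" allows registers, the compiler may place a or b in %eax or %edx, which the template overwrites before reading them. A minimal sketch that avoids both problems by letting the constraints describe every register divl uses (the helper name divmod_u32 is hypothetical) could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit divide/modulo that, with GCC or
   clang on x86/x86-64, goes through the divl instruction.
   - "0"(a) puts the dividend in %eax (tied to output 0, "=a"),
   - "1"(0u) zeroes the high half in %edx (tied to output 1, "=d"),
   - "rm"(b) lets the compiler choose where the divisor lives,
   so no register is modified behind the compiler's back. */
static void divmod_u32(uint32_t a, uint32_t b, uint32_t *q, uint32_t *r) {
    assert(b != 0u);  /* divl by zero raises a divide error (SIGFPE on Linux) */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t quot, rem;
    __asm__("divl %2"
            : "=a"(quot), "=d"(rem)
            : "rm"(b), "0"(a), "1"(0u)
            : "cc");
    *q = quot;
    *r = rem;
#else
    /* Portable fallback on other compilers/targets. */
    *q = a / b;
    *r = a % b;
#endif
}
```

In to_radix_recursive, the #ifdef CZ_USE_ASM block would then reduce to a single call such as divmod_u32(a, b, &q, &r).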
【Discussion】:
-
I'm not sure your code will be any faster with inline assembly.
-
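@Jabberwocky It won't be, but I'm only using the assembly to get comparable benchmarks. Comparing "multiply by the reciprocal" with "divl" is a bit like comparing apples and oranges.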
-
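What a clean and tidy question, it really made me smile :) But (opinion alert!) I would discourage you from mixing two languages in one source file, just as I discourage @987654328@s: unless your spider-sense tells you it's the only option, don't do it :)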
-
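If your goal is to keep the division from being optimized away, you can get a similar result by putting the divisor in a volatile variable, so that the compiler cannot assume it is a constant. This may cause some extra loads and stores, though.
-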
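A minimal sketch of that suggestion, assuming a divisor of 10 stored in a hypothetical file-scope volatile object (radix_divisor). Each call performs a volatile load, so the compiler cannot treat the value as the constant 10 and cannot substitute a reciprocal multiplication; the price is the extra memory access the comment mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical divisor object. Because it is volatile, every read must
   really happen at run time, so the value 10 cannot be folded into the
   division at compile time. */
static volatile uint32_t radix_divisor = 10u;

static uint32_t div_by_volatile_radix(uint32_t a) {
    uint32_t b = radix_divisor;  /* exactly one volatile load */
    assert(b != 0u);
    return a / b;                /* divisor unknown at compile time */
}
```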
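@NateEldredge: Or just use asm("" : "+r"(var)) to "launder" the value past the optimizer: it makes the compiler materialize var in a register and forget everything it knew about it (e.g. its value, that it is non-negative, etc.). This is essentially what some Benchmark::DoNotOptimize wrappers do. It works because you tell the compiler that the value of var is an output of the asm statement, and the compiler does not look inside the asm statement to figure out what it does, even when it is empty.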
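A minimal sketch of the laundering trick from the comment above, assuming GCC or clang (the names launder_u32 and div_by_laundered_ten are hypothetical). The empty template is declared to read and modify v in a register, so afterwards the compiler no longer knows that v == 10 and has to emit a run-time division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the empty asm claims to read and modify v in a
   register ("+r"), so the optimizer must forget anything it knew about v.
   On compilers without GNU inline asm this is a plain no-op. */
static inline uint32_t launder_u32(uint32_t v) {
#if defined(__GNUC__)
    __asm__("" : "+r"(v));
#endif
    return v;
}

static uint32_t div_by_laundered_ten(uint32_t a) {
    uint32_t b = launder_u32(10u);  /* no longer a known constant */
    assert(b != 0u);
    return a / b;
}
```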
Tags: c gcc x86 inline-assembly