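You're right: it's redundant if you know that esp is already pointing at the location where you pushed your caller's ebp.
When gcc compiles a function with -fno-omit-frame-pointer, it does in fact do the optimization you suggest: it just pops ebp when it knows that esp is already pointing at the right place.
This is very common in functions that use call-preserved registers (like ebx), which have to be saved/restored just like ebp. Compilers typically do all the saves/restores in the prologue/epilogue, before anything like reserving space for a C99 variable-size array. So pop ebx always leaves esp pointing at the right place for pop ebp.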
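To make that reasoning concrete, here is a minimal sketch in C of a toy stack model. Everything in it (ToyCpu, push32, pop32, leave_insn, epilogues_agree) is a hypothetical name invented for illustration, not anything a compiler or the ABI defines. It models push ebp / mov ebp, esp / push esi as a prologue, then checks that a pop esi / pop ebp epilogue restores the same state as mov esi, [ebp-4] / leave.

```c
#include <stdint.h>

/* Toy model of a 32-bit stack: three registers and a small memory.
   esp and ebp are byte offsets into mem. All names are illustrative. */
enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t esp, ebp, esi;
    uint32_t mem[STACK_WORDS];
} ToyCpu;

static void push32(ToyCpu *c, uint32_t v) {
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop32(ToyCpu *c) {
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* leave behaves like: mov esp, ebp / pop ebp */
static void leave_insn(ToyCpu *c) {
    c->esp = c->ebp;
    c->ebp = pop32(c);
}

/* push ebp / mov ebp, esp / push esi, then clobber esi in the "body" */
static void prologue(ToyCpu *c) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    push32(c, c->esi);
    c->esi = 0xdeadbeefu;
}

/* Epilogue A: pop esi / pop ebp (no mov esp, ebp needed) */
static void epilogue_pops(ToyCpu *c) {
    c->esi = pop32(c);
    c->ebp = pop32(c);
}

/* Epilogue B: mov esi, [ebp-4] / leave */
static void epilogue_leave(ToyCpu *c) {
    c->esi = c->mem[(c->ebp - 4) / 4];
    leave_insn(c);
}

/* Returns 1 if both epilogues restore esp, ebp and esi identically. */
int epilogues_agree(void) {
    ToyCpu a = { .esp = STACK_WORDS * 4 - 16, .ebp = 0x1234, .esi = 42 };
    ToyCpu b = a;
    uint32_t entry_esp = a.esp;

    prologue(&a);
    epilogue_pops(&a);
    prologue(&b);
    epilogue_leave(&b);

    return a.esp == entry_esp && a.ebp == 0x1234 && a.esi == 42
        && b.esp == a.esp && b.ebp == a.ebp && b.esi == a.esi;
}
```

The two epilogues only agree because nothing moved esp between the prologue and the epilogue. If the body had adjusted esp (for example by reserving a variable-size array), epilogue A would pop from the wrong place, while epilogue B would still work.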
For example, here is clang 3.8's output (with -O3 -m32) for this function, from the Godbolt compiler explorer. As usual, compilers don't quite generate optimal code:
void extint(int); // a function that can't inline because the compiler can't see the definition.
int save_reg_framepointer(int a){
    extint(a);
    return a;
}
# clang3.8
push ebp
mov ebp, esp # stack-frame boilerplate
push esi # save a call-preserved reg
push eax # align the stack to 16B
mov esi, dword ptr [ebp + 8] # load `a` into a register that will survive the function call.
mov dword ptr [esp], esi # store the arg for extint. Doing this with an ebp-relative address would have been slightly more efficient, but a push esi here, instead of the earlier push eax, would make even more sense
call extint
mov eax, esi # return value
add esp, 4 # pop the arg
pop esi # restore esi
pop ebp # restore ebp. Notice the lack of a mov esp, ebp here, or even a lea esp, [ebp-4] before the first pop.
ret
Of course, a human (borrowing a trick from gcc) would write something like this:
# hand-written based on tricks from gcc and clang, and avoiding their suckage
call_non_inline_and_return_arg:
push ebp
mov ebp, esp # stack-frame boilerplate if we have to.
push esi # save a call-preserved reg
mov esi, dword [ebp + 8] # load `a` into a register that will survive the function call
push esi # replacing push eax / mov
call extint
mov eax, esi # return value. Could mov eax, [ebp+8]
mov esi, [ebp-4] # restore esi without a pop, since we know where we put it, and esp isn't pointing there.
leave # same as mov esp, ebp / pop ebp. 3 uops on recent Intel CPUs
ret
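Since the stack needs to be aligned by 16 before a call (according to the rules of the SystemV i386 ABI; see the links in the x86 tag wiki), we might as well save/restore an extra reg, instead of just doing push [ebp+8] and then (after the call) mov eax, [ebp+8]. Compilers prefer saving/restoring a call-preserved register over reloading local data multiple times.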
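The alignment arithmetic behind that choice can be sketched in a few lines of C. The names here (SLOT_BYTES, CALL_ALIGN, padding_before_call) are made up for illustration. The sketch assumes the caller had esp 16B-aligned just before its own call, and counts every 4-byte slot used since then, starting with the return address.

```c
/* Illustrative constants for the 32-bit case discussed above. */
enum { SLOT_BYTES = 4, CALL_ALIGN = 16 };

/* slots_pushed: 4-byte stack slots used since the caller's aligned esp,
   counting the return address, saved registers and outgoing args.
   Returns the bytes of padding still needed for esp to be 16B-aligned
   at the next call. */
unsigned padding_before_call(unsigned slots_pushed) {
    unsigned used = slots_pushed * SLOT_BYTES;
    return (CALL_ALIGN - used % CALL_ALIGN) % CALL_ALIGN;
}
```

With return address + ebp + esi + the arg, that's 4 slots and padding_before_call(4) is 0, so the extra saved register costs no stack space compared to padding. With only return address + ebp + the arg, padding_before_call(3) is 4: the stack would be misaligned by one slot unless something else is pushed.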
If it weren't for the stack-alignment rules in the current version of the ABI, I might write:
# hand-written: esp alignment not preserved on the call
call_no_stack_align:
push ebp
mov ebp, esp # stack-frame boilerplate if we have to.
push dword [ebp + 8] # function arg. 2 uops for push with a memory operand
call extint # esp is offset by 12 from before the `call` that called us: return address, ebp, and function arg.
mov eax, [ebp+8] # return value, which extint won't have modified because it only takes one arg
leave # same as mov esp, ebp / pop ebp. 3 uops on recent Intel CPUs
ret
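gcc will actually use leave instead of mov / pop in cases where it does need to modify esp before popping ebx. For example, flip Godbolt to gcc (instead of clang), and take out -m32 so we're compiling for x86-64 (where args are passed in registers). This means there's no need to pop args off the stack after a call, so rsp is already set correctly to just pop two regs. (push/pop use 8 bytes of stack each, but rsp still has to be 16B-aligned before a call in the SysV AMD64 ABI, so gcc actually does a sub rsp, 8 and a corresponding add around the call.)
Another missed optimization: with gcc -m32, the variable-length-array function uses add esp, 16 / leave after the call. The add is totally useless, because leave overwrites esp anyway. (Add -m32 to the gcc args on Godbolt to see it.)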
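The variable-length-array function from the Godbolt link isn't reproduced here, so as a stand-in, here is a minimal sketch of a function of that kind. The names sum_ints and vla_sum are hypothetical, and __attribute__((noinline)) is a GCC/clang extension used only to keep the call visible in the asm even though the definition is in the same file. I haven't confirmed that this exact function reproduces the redundant add on any particular gcc version; it's just something to paste in and inspect.

```c
#include <stddef.h>

/* Hypothetical callee. noinline keeps a real call in the generated code. */
__attribute__((noinline))
int sum_ints(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* n must be at least 1: a zero-length VLA is undefined behaviour. */
int vla_sum(size_t n) {
    int buf[n];                 /* C99 VLA: esp moves by a run-time amount */
    for (size_t i = 0; i < n; i++)
        buf[i] = (int)i;
    return sum_ints(buf, n);    /* after this call, esp must come back from ebp */
}
```

Because the size of buf isn't known at compile time, the epilogue can't just pop its way back to the saved ebp. It has to restore esp from ebp (with leave, or with a lea relative to ebp before the pops), which is why any add esp right before leave is wasted work.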