Stack Protection and Smashing using GCC

【Question title】: Stack Protection and Smashing using GCC
【Posted】: 2015-05-21 08:37:44
【Question description】:

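I am reading Smashing the Stack for Fun and Profit (in particular, this post refers to the "Buffer Overflows" section). The article was written for a 32-bit machine, but I am working on a 64-bit machine and have accounted for that in my examples. One particular example is causing problems that I cannot explain. example3.c overwrites the return address in order to skip an instruction in the main function. Here is my code: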

#include <stdio.h>

void function(int a, int b, int c)
{
  char buf1[5];
  char buf2[10];
  int *retptr;

  retptr = (void*)(buf2 + 40);
  (*retptr) += 8;
}

int main(void)
{
  int x;

  x = 0;
  function(1,2,3);
  x = 1;
  printf("%d\n", x);
  return 0;
}

I compile this program with gcc v4.8.2 using the following command:

gcc example3.c -o example3

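Note that by default the gcc compiler appears to enable some stack protection measures, such as address space layout randomization and stack canaries. I have accounted for these safety measures in my calculation of the ret pointer value. Here is the corresponding assembly produced by gcc example3.c -S -fverbose-asm -o stack-protection.s: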

    .file   "example3.c"
# GNU C (Ubuntu 4.8.2-19ubuntu1) version 4.8.2 (x86_64-linux-gnu)
#   compiled by GNU C version 4.8.2, GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -imultiarch x86_64-linux-gnu example3.c -mtune=generic
# -march=x86-64 -auxbase-strip verbose-stack-pro.s -fverbose-asm
# -fstack-protector -Wformat -Wformat-security
# options enabled:  -faggressive-loop-optimizations
# -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcommon
# -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -ffunction-cse -fgcse-lm -fgnu-runtime
# -fident -finline-atomics -fira-hoist-pressure -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -fmath-errno -fmerge-debug-strings
# -fmove-loop-invariants -fpeephole -fprefetch-loop-arrays
# -freg-struct-return -fsched-critical-path-heuristic
# -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
# -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
# -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fshow-column
# -fsigned-zeros -fsplit-ivs-in-unroller -fstack-protector
# -fstrict-volatile-bitfields -fsync-libcalls -ftrapping-math
# -ftree-coalesce-vars -ftree-cselim -ftree-forwprop -ftree-loop-if-convert
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version
# -funit-at-a-time -funwind-tables -fverbose-asm -fzero-initialized-in-bss
# -m128bit-long-double -m64 -m80387 -maccumulate-outgoing-args
# -malign-stringops -mfancy-math-387 -mfp-ret-in-387 -mfxsr -mglibc
# -mieee-fp -mlong-double-80 -mmmx -mno-sse4 -mpush-args -mred-zone -msse
# -msse2 -mtls-direct-seg-refs

    .text
    .globl  function
    .type   function, @function
function:
.LFB0:
    .cfi_startproc
    pushq   %rbp    #
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp  #,
    .cfi_def_cfa_register 6
    subq    $64, %rsp   #,
    movl    %edi, -52(%rbp) # a, a
    movl    %esi, -56(%rbp) # b, b
    movl    %edx, -60(%rbp) # c, c
    movq    %fs:40, %rax    #, tmp65
    movq    %rax, -8(%rbp)  # tmp65, D.2197
    xorl    %eax, %eax  # tmp65
    leaq    -32(%rbp), %rax #, tmp61
    addq    $40, %rax   #, tmp62
    movq    %rax, -40(%rbp) # tmp62, ret
    movq    -40(%rbp), %rax # ret, tmp63
    movl    (%rax), %eax    # *ret_1, D.2195
    leal    8(%rax), %edx   #, D.2195
    movq    -40(%rbp), %rax # ret, tmp64
    movl    %edx, (%rax)    # D.2195, *ret_1
    movq    -8(%rbp), %rax  # D.2197, tmp66
    xorq    %fs:40, %rax    #, tmp66
    je  .L2 #,
    call    __stack_chk_fail    #
.L2:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   function, .-function
    .section    .rodata
.LC0:
    .string "%d\n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp    #
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp  #,
    .cfi_def_cfa_register 6
    subq    $16, %rsp   #,
    movl    $0, -4(%rbp)    #, x
    movl    $3, %edx    #,
    movl    $2, %esi    #,
    movl    $1, %edi    #,
    call    function    #
    movl    $1, -4(%rbp)    #, x
    movl    -4(%rbp), %eax  # x, tmp61
    movl    %eax, %esi  # tmp61,
    movl    $.LC0, %edi #,
    movl    $0, %eax    #,
    call    printf  #
    movl    $0, %eax    #, D.2200
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
    .section    .note.GNU-stack,"",@progbits

Executing example3 has the desired effect of skipping the second assignment to x, and the program outputs 0.

However, if I instead compile with the -fno-stack-protector option:

gcc -fno-stack-protector example3.c -S -fverbose-asm -o no-stack-protection.s

I get the following assembly file:

    .file   "example3.c"
# GNU C (Ubuntu 4.8.2-19ubuntu1) version 4.8.2 (x86_64-linux-gnu)
#   compiled by GNU C version 4.8.2, GMP version 5.1.3, MPFR version 3.1.2-p3, MPC version 1.0.1
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -imultiarch x86_64-linux-gnu example3.c -mtune=generic
# -march=x86-64 -auxbase-strip verbose-no-stack-pro.s -fno-stack-protector
# -fverbose-asm -Wformat -Wformat-security
# options enabled:  -faggressive-loop-optimizations
# -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcommon
# -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -ffunction-cse -fgcse-lm -fgnu-runtime
# -fident -finline-atomics -fira-hoist-pressure -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -fmath-errno -fmerge-debug-strings
# -fmove-loop-invariants -fpeephole -fprefetch-loop-arrays
# -freg-struct-return -fsched-critical-path-heuristic
# -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
# -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
# -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fshow-column
# -fsigned-zeros -fsplit-ivs-in-unroller -fstrict-volatile-bitfields
# -fsync-libcalls -ftrapping-math -ftree-coalesce-vars -ftree-cselim
# -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
# -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pta
# -ftree-reassoc -ftree-scev-cprop -ftree-slp-vectorize
# -ftree-vect-loop-version -funit-at-a-time -funwind-tables -fverbose-asm
# -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387
# -maccumulate-outgoing-args -malign-stringops -mfancy-math-387
# -mfp-ret-in-387 -mfxsr -mglibc -mieee-fp -mlong-double-80 -mmmx -mno-sse4
# -mpush-args -mred-zone -msse -msse2 -mtls-direct-seg-refs

    .text
    .globl  function
    .type   function, @function
function:
.LFB0:
    .cfi_startproc
    pushq   %rbp    #
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp  #,
    .cfi_def_cfa_register 6
    movl    %edi, -36(%rbp) # a, a
    movl    %esi, -40(%rbp) # b, b
    movl    %edx, -44(%rbp) # c, c
    leaq    -32(%rbp), %rax #, tmp61
    addq    $40, %rax   #, tmp62
    movq    %rax, -8(%rbp)  # tmp62, ret
    movq    -8(%rbp), %rax  # ret, tmp63
    movl    (%rax), %eax    # *ret_1, D.2195
    leal    8(%rax), %edx   #, D.2195
    movq    -8(%rbp), %rax  # ret, tmp64
    movl    %edx, (%rax)    # D.2195, *ret_1
    popq    %rbp    #
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   function, .-function
    .section    .rodata
.LC0:
    .string "%d\n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp    #
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp  #,
    .cfi_def_cfa_register 6
    subq    $16, %rsp   #,
    movl    $0, -4(%rbp)    #, x
    movl    $3, %edx    #,
    movl    $2, %esi    #,
    movl    $1, %edi    #,
    call    function    #
    movl    $1, -4(%rbp)    #, x
    movl    -4(%rbp), %eax  # x, tmp61
    movl    %eax, %esi  # tmp61,
    movl    $.LC0, %edi #,
    movl    $0, %eax    #,
    call    printf  #
    movl    $0, %eax    #, D.2196
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
    .section    .note.GNU-stack,"",@progbits

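The corresponding executable does not produce the desired value 0, but rather a random value that I cannot reconcile with the assembly file.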

My picture of the stack frame in the -fno-stack-protector case is (sfp = saved frame pointer, ret = return address):

low memory address     buf2 (16 bytes)  buf1 (8 bytes)  retptr (8 bytes)  sfp (8 bytes) ret       high memory address
<---                  [              ][              ][                ][             ][    ] ...
top of stack                                                                                      bottom of stack

My question:

Did I miscalculate the position of the return address in the unprotected case?

【Question discussion】:

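Compile with -S -fverbose-asm, and possibly also with -O.
I have updated my question to include the output from -fverbose-asm. -O seems to optimize the function away, so the version without stack protection contains no function at all. I cannot see from the comments added to the assembly file what is going wrong. Both versions appear to perform the same operations on the retptr variable.
The best approach is to single-step through the code so that you can inspect the register and memory values. It has been a while since I did asm, but there is a sub qword instruction on the rsp register that is not present in the unprotected version.
My guess is that you are overlooking the variable GCC adds to implement stack protection. One implementation shown here [1] uses an extra local variable in the function, which would disturb your view of the stack. [1][wiki.osdev.org/Stack_Smashing_Protector] Also, you might argue that you should see the smash in the -fno-stack-protector case rather than in the former case. I think you made some mistake when visualizing the stack in the former case, because there I do not get your output, i.e. 0: ideone (ideone.com/dRVgZ2)
@Nishant Regarding stack protection, I have explicitly accounted for the extra variable that gets added, which can be seen in the stack-protection.s assembly file; the canary is added before the buffers, but it has no effect on my calculation of the return address, because retptr in the protected case takes the place of the canary. I do not think ideone.com is admissible here; whatever value you add to the buffer (to compute retptr), it compiles successfully and outputs 1, although it should cause a memory violation, which suggests aggressive optimization.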

Tags: c stack-smash

【Solution 1】:

Did I miscalculate the position of the return address in the unprotected case?

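That part you got right, at least as long as the address fits in an int. On x86-64 the proper type for retptr would be long *, so that the value it points at covers the full 64-bit return address.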

You can double-check the position of the return address by running the following program:

#include <stdio.h>

void function(int a, int b, int c)
{
  char buf1[5];
  char buf2[10];
  int *retptr;

  retptr = (void*)(buf2 + 40);
  printf("retptr points to: %p\n", (long*)(long)*retptr);
  (*retptr) += 8;
}

int main(void)
{
  int x;


  printf("ret address is %p\n", &&label);
  x = 0;
  function(1,2,3);
label:
  x = 1;
  printf("%d\n", x);

  return 0;
}

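By running this program you should be able to confirm that the address of the instruction following the call to function is the same value that is stored where retptr points.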

I believe the reason you are not getting the expected 0 lies in this line:

(*retptr) += 8;

On my 64-bit system, x = 1 compiles to:

  40058a:   c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)
  400591:   8b 45 fc                mov    -0x4(%rbp),%eax
  400594:   89 c6                   mov    %eax,%esi

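The first line stores 1 into x, and the other two lines pass the value of x as an argument to printf(). Note that the first instruction is 7 bytes long, not 8. If you change the increment to 7, you should see 0, as you expect.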

In effect, by adding 8, the ret instruction sets the instruction pointer to point at 45 rather than at 8b. The code then becomes:

  45 fc                 rex.RB cld 
  89 c6                 mov    %eax,%esi

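I am not entirely sure what happens at this point, and I suspect it depends on the CPU model. Mine seems to skip ahead to mov %eax,%esi, so printf shows the value of %eax. If you look at the disassembly of function(), %rax was last used to hold the value of retptr, and that is the seemingly random value that gets printed.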

【Discussion】:

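Ah, silly of me not to use long. In any case, GCC seems to allocate 8 bytes for the pointer. I am a bit confused about the instruction being 7 bytes long and about why the 8-byte increment works in the protected case. How did you obtain the layout of the instructions that implement x = 1? I used GDB's disassemble command, but I did not get the level of detail you show here. One point: "...the ret instruction sets the instruction pointer to point at 45 rather than c7", should that read 8b rather than c7? One last question: how did you obtain the final assembly snippet?
I used objdump -D on the executable to get the disassembly. Regarding c7 rather than 8b, I did mean c7: when jumping to c7 45 fc 01 00 00 00, because of the extra byte added to the return address, we end up jumping to 45 fc 01 00 00.... As for the final code snippet, I wrote a small program: char asm_snippet[]={0x45, 0xfc, 0x01, 0x00, 0x00 .... and then used objdump -D. You can also use x /32i asm_snippet in gdb. I learned this from here: lkml.org/lkml/2008/1/7/406
Isn't the idea that we want to jump past the instruction c7 45 fc 01 00 00 00? So adding 7 bytes takes us to 8b 45 fc, and adding the extra byte makes us jump past 8b and into 45 fc?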