String reverse function in x86 NASM assembly: answers

【Question title】: String reverse function x86 NASM assembly
【Posted】: 2020-03-21 02:23:12
【Question description】:

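I am trying to write a function in x86 NASM assembly that reverses the order of the characters in a string. I tried to do it with registers (I know doing it with the stack is more efficient), but I keep getting a segmentation fault. The C declaration looks like this: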

extern char* reverse(char*);

The assembly part:

section .text
global reverse
reverse:
        push ebp           ; prologue
        mov ebp, esp       
        mov eax, [ebp+8]   ; eax <- points to string
        mov edx, eax
look_for_last:
        mov ch, [edx]      ; put char from edx in ch
        inc edx
        test ch, ch        
        jnz look_for_last  ; if char != 0 loop
        sub edx, 2         ; found last
swap:                      ; eax = first, edx = last (characters in string)
        test eax, edx       
        jg end             ; if eax > edx reverse is done
        mov cl, [eax]      ; put char from eax in cl
        mov ch, [edx]      ; put char from edx in ch
        mov [edx], cl      ; put cl in edx
        mov [eax], ch      ; put ch in eax
        inc eax
        dec edx
        jmp swap            
end:
        mov eax, [ebp+8]   ; move char pointer to eax (func return)
        pop ebp            ; epilogue
        ret

The line that seems to cause the segmentation fault is

mov cl, [eax]

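Why does this happen? As I understand it, eax never goes beyond the bounds of the string, so there should always be something at [eax]. How can I be getting a segmentation fault?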

【Comments】:

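"I know doing it with the stack is more efficient": no, it's not! Reversing in place is more efficient than doing a 4-byte push for every byte of the string. Your specific implementation isn't optimized, but it's probably still faster. (For example, you're not using bswap to reverse 4 bytes at a time, writing cl and ch separately creates a false dependency on ECX on some CPUs (AMD), and your loop structure, with a jmp at the bottom and also a conditional branch at the top, is inefficient.)
And of course you could use SIMD for strlen (what you're calling look_for_last) to go 16 bytes at a time with SSE2, or at least 4 bytes at a time with a bithack. See "Why does glibc's strlen need to be so complicated to run quickly?". Having to do this before you start swapping does hurt somewhat, and a push/pop implementation can do it on the fly. But push/pop needs two loops as well, and both of them copy and touch 5x (= 4+1) as much memory as a read-only loop plus an in-place loop. If you were reversing into a large buffer, you could even skip the strlen
Anyway, with SSSE3 pshufb you could be reversing 16 bytes per clock cycle on modern x86 CPUs, versus 1 for yours. I don't think AVX2 would help before IceLake; other CPUs would bottleneck on shuffle throughput. AVX512VBMI vpermb on IceLake would be nice, though: a single uop for 64 bytes. See also gcc.gnu.org/bugzilla/show_bug.cgi?id=92246 (GCC missed optimization for auto-vectorizing a word, aka short, reverse loop).
(I know this is a lot more optimization than you were probably ready to digest, but some of it might be interesting. See also stackoverflow.com/tags/x86/info for other guides, docs, and performance links.)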
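
The "reverse 4 bytes at a time" idea from the comments can be sketched in C, as a rough illustration rather than a tuned implementation. The names `swap_bytes32` and `reverse_by_words` are hypothetical. `swap_bytes32` is a portable stand-in for what the `bswap` instruction does, and compilers often turn this pattern into a single `bswap`, though that is not guaranteed. The sketch assumes a NUL-terminated, writable string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable byte swap of a 32-bit value (what x86 bswap does in one instruction). */
static uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: reverse a NUL-terminated string in place,
 * swapping 4-byte chunks from both ends while at least 8 bytes remain. */
char *reverse_by_words(char *s)
{
    char *lo = s;
    char *hi = s + strlen(s);          /* one past the last character */

    while ((size_t)(hi - lo) >= 8) {
        uint32_t a, b;
        memcpy(&a, lo, 4);             /* unaligned-safe loads */
        memcpy(&b, hi - 4, 4);
        a = swap_bytes32(a);           /* reverse the bytes inside each chunk */
        b = swap_bytes32(b);
        memcpy(lo, &b, 4);             /* and exchange the two chunks */
        memcpy(hi - 4, &a, 4);
        lo += 4;
        hi -= 4;
    }

    /* Fewer than 8 bytes left in the middle: finish byte by byte. */
    while (hi - lo >= 2) {
        hi--;
        char t = *lo;
        *lo = *hi;
        *hi = t;
        lo++;
    }
    return s;
}
```

Loading a chunk, swapping its bytes, and storing it at the mirrored position reverses the bytes in memory regardless of the machine's endianness, which is why the same code works without any byte-order checks.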

Tags: string x86 nasm

【Solution 1】:

OK, I figured it out: I was mistakenly using test eax, edx when I should have been using cmp eax, edx. It works now.
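
To make the difference concrete, here is a small C model of how `jg` behaves after each instruction. It is only a sketch: the helper names `jg_after_test` and `jg_after_cmp` are hypothetical, and the addresses in the usage note are made-up illustrative values. It relies on two documented facts: `test` sets the flags from a bitwise AND and always clears OF, while `cmp` sets them from a subtraction, and `jg` is taken when ZF = 0 and SF = OF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would jg be taken after "test a, b"?  test computes a AND b. */
static bool jg_after_test(uint32_t a, uint32_t b)
{
    uint32_t r = a & b;
    bool zf = (r == 0);
    bool sf = (r >> 31) != 0;   /* sign flag = top bit of the result */
    bool of = false;            /* test always clears OF */
    return !zf && sf == of;
}

/* Would jg be taken after "cmp a, b"?  cmp computes a - b. */
static bool jg_after_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    bool zf = (r == 0);
    bool sf = (r >> 31) != 0;
    /* Signed overflow on subtraction: the operands have different signs
     * and the result's sign differs from the first operand's. */
    bool of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return !zf && sf == of;
}
```

With two stack-like addresses that both have the top bit set, say 0xbffff008 and 0xbffff004, `jg_after_test` returns false even though the first pointer has already passed the second, so the loop never exits and eventually reads unmapped memory. With low addresses such as 0x0804a008 and 0x0804a00c it returns true immediately, so nothing gets swapped at all. `jg_after_cmp` is true exactly when the first value is greater as a signed integer, which is the check the loop was meant to perform.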

【Discussion】: