Determine number of negative and positive numbers in an array: answers

【Question title】: Determine number of negative and positive numbers in an array
【Posted】: 2016-02-14 15:02:11
【Question description】:

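I need to determine the number of negative and positive numbers in an array in an assembly program. The assembler doesn't seem to recognize them as negative. How can I fix this? I define the array like this: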

word_array db 3, -2, 11, -1, -2, -7, -5, -20

I have this function that counts the positive numbers:

count_positives:
mov dx, word [word_array + 2*ecx - 2]
cmp edx, 0
JL skip
inc ebx
skip:
loopnz count_positives

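【Comments on the question】:

You declared word_array as bytes by using db. Perhaps you meant dw (for 16-bit WORDs), as in: word_array dw 3, -2, 11, -1, -2, -7, -5, -20. Also, since you load only the lower 16 bits of the edx register by writing dx, I'd suggest changing cmp edx, 0 to cmp dx, 0. If you don't compare the right part of the register, your negative numbers may show up as positive.
Actually, use test dx, dx, or skip the load entirely and use cmp word [word_array + 2*ecx - 2], 0. And don't use the loop instruction; it's slow. Use dec ecx / jnz.
@PeterCordes: Since I was only commenting, I was trying to suggest why the existing code might not produce the correct result.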
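
Putting the suggestions from these comments together, the question's loop might look roughly like the sketch below. The names count_nonneg and count are made up for illustration, and the code assumes a non-empty array in a flat 32-bit NASM program. Like the original JL test, it counts zero as "positive", so it really counts non-negative elements.

```nasm
section .data
word_array: dw 3, -2, 11, -1, -2, -7, -5, -20   ; dw, so each element is 16 bits
count       equ ($ - word_array) / 2            ; number of elements (hypothetical name)

section .text
; Returns the number of non-negative elements in EBX.  Clobbers ECX.
; Assumes count > 0.
count_nonneg:
    xor   ebx, ebx                           ; counter = 0
    mov   ecx, count                         ; ecx runs from count down to 1
.next:
    cmp   word [word_array + 2*ecx - 2], 0   ; 16-bit compare straight from memory
    jl    .skip                              ; negative: don't count it
    inc   ebx
.skip:
    dec   ecx                                ; dec/jnz instead of the slow loop instruction
    jnz   .next
    ret
```

For the array above, only 3 and 11 are non-negative, so EBX ends up as 2.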

Tags: assembly x86 nasm 32-bit

【Solution 1】:

Read the comments.

proc:
  mov si, data ; si points to the data
  mov cx, [len] ; cx gets the length of the data
  shr cx,1 ; the length was in bytes, we want words
  mov bx, 0
  mov dx, cx

checkNext:
  mov ax, [si]
  test ax, ax ; alternatively: test ax, 8000h
  js isNegative
  inc bx ; counting positive numbers

isNegative:
  add si, 2 ; moving to next word
  loop checkNext ; decrease cx, jump if not 0

  sub dx, bx ; bx has the positive numbers, dx - the negative ones
  ret ; done

data dw -1,2,-3,4
len dw $-data

【Discussion】:

I think you're missing a mov dx, cx after loading the length into cx, because you use dx without initializing it.

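【Solution 2】:

You're loading only the low 16 bits of EDX (by writing DX), so the upper bits, including the sign bit of the 32-bit register, keep whatever garbage was there before. Use 16-bit operand size for the compare.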
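
To make the effect concrete, here is a small sketch. The label demo and the constant 0x12340000 are arbitrary choices for illustration. It shows the same register contents compared with 32-bit and then 16-bit operand size.

```nasm
section .text
; Returns EAX = 1 if the value was seen as negative, 0 otherwise.
demo:
    mov   edx, 0x12340000   ; pretend EDX still holds unrelated data
    mov   dx, -2            ; writes only bits 0..15: EDX = 0x1234FFFE
    cmp   edx, 0            ; 32-bit compare: 0x1234FFFE is a positive number
    jl    .negative         ; not taken
    cmp   dx, 0             ; 16-bit compare: 0xFFFE is -2
    jl    .negative         ; taken
    xor   eax, eax
    ret
.negative:
    mov   eax, 1
    ret
```

Even if the upper half of EDX happened to be zero, EDX would be 0x0000FFFE = 65534, which is still positive as a 32-bit value. The other option is to sign-extend while loading, with movsx edx, word [mem], and then keep using 32-bit compares.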

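Count either the negatives or the non-negatives, then subtract from the total count to get the other.

If you need to count both negatives and strictly positive numbers, you need two counters, and one test or cmp followed by two branches (so that zero doesn't go into either counter).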
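
A minimal sketch of that two-counter, two-branch idea follows. The name count_pos_neg and the choice of return registers are assumptions for illustration, and the array is assumed to be non-empty.

```nasm
section .rodata
word_array: dw -1, 2, 0, -3, 4
count       equ ($ - word_array) / 2   ; number of elements (hypothetical name)

section .text
; Returns neg_count in EAX, pos_count in EDX.  Clobbers ECX.
count_pos_neg:
    xor   eax, eax                           ; neg_count = 0
    xor   edx, edx                           ; pos_count = 0
    mov   ecx, count
.next:
    cmp   word [word_array + 2*ecx - 2], 0   ; one compare sets the flags...
    jl    .negative                          ; ...first branch: element < 0
    jz    .done                              ; ...second branch: element == 0, count nothing
    inc   edx                                ; element > 0
    jmp   .done
.negative:
    inc   eax
.done:
    dec   ecx
    jnz   .next
    ret
```

Both conditional jumps read the flags from the same cmp, because jumps don't modify flags. For this array the result is EAX = 2 and EDX = 2, with the zero counted in neither.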

Adapted from Sten's answer, but with some improvements. Note that test value, -1 is equivalent to cmp value, 0.

section .rodata

word_array dw -1,2,-3,4
len  equ $-word_array     ; length in bytes.  assembler constant, so we can mov reg, imm8/imm32   rather than loading it as data.

section .text
;; clobbers ESI, ECX.  Returns in EAX, EDX
proc:
  mov   esi, word_array  ; esi points to the array.  In MASM, use OFFSET word_array
  mov   ecx, len/2 - 1      ; [esi + ecx*2] points to the last element
  xor   edx, edx           ; non_neg_count = 0

countloop:
    ; cmp   word [esi + ecx*2], 0   ; This can't macro-fuse (memory and immediate operand).  Also can't micro-fuse on SnB, because of a 2-reg addressing mode
  movsx   eax, word [esi + ecx*2]  ; use a 2-reg addressing mode to save loop overhead, since there's no ALU execution port component to this insn.  It doesn't need to micro-fuse to be one uop
  test    eax, eax        ; can macro-fuse with js
  js isNegative
  inc   edx               ; counting non-negative numbers
isNegative:
  dec   ecx               ; can macro-fuse with jge, but probably won't unless alignment stops it from being decoded in the same cycle as the earlier test/js
  jge countloop       ; jge, not jnz, because we want ecx from [0 : len-1], rather than [1 : len]

; after the loop, ecx=-1, edx=non_neg_count
; neg_count = array_count - non_neg_count
  mov   eax, len/2
  sub   eax, edx        ;   eax =  neg_count

  ret    ; return values in eax, edx

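The loop is 4 uops on Intel. (Or more likely 5 on Sandybridge-family before Haswell, where only one of the two test/branch pairs macro-fuses if both hit the decoders in the same cycle. HSW can do 2 macro-fusions in one decode group.)

A branchless version using sets bl / add edx, ebx would probably work well.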
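
One way that branchless idea could look is sketched below. This is an illustration, not the answer's own code: the name count_branchless is made up, and because sets produces 1 for negative elements, the sketch accumulates the negative count in EAX and derives the non-negative count by subtraction, keeping the same return registers as the routine above.

```nasm
section .rodata
word_array: dw -1, 2, -3, 4
len         equ $ - word_array     ; length in bytes

section .text
; Returns neg_count in EAX, non_neg_count in EDX.  Clobbers EBX, ECX.
count_branchless:
    xor   eax, eax                 ; neg_count = 0
    xor   ebx, ebx                 ; zero all of EBX once; sets only writes BL
    mov   ecx, len/2 - 1           ; index of the last element
.next:
    cmp   word [word_array + ecx*2], 0
    sets  bl                       ; BL = 1 if the element is negative, else 0
    add   eax, ebx                 ; neg_count += 0 or 1, no branch on the data
    dec   ecx
    jge   .next                    ; loop while ecx >= 0
    mov   edx, len/2
    sub   edx, eax                 ; non_neg_count = total - neg_count
    ret
```

The only branch left is the loop branch, which predicts well, so the running time no longer depends on how the signs are distributed in the data. For this array both counts come out as 2.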

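You could save a little code size by zeroing eax and then using scasw in the loop, which compares ax with the word at [edi] and increments edi by 2, but that's usually not a good choice for performance.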
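
For reference, a scasw-based loop might be sketched like this. The name count_scasw is hypothetical, and the code assumes a flat memory model (ES and DS address the same memory) and a non-empty array. Note that scasw sets flags from AX minus the memory operand, so the comparison is reversed relative to cmp word [mem], 0.

```nasm
section .rodata
word_array: dw -1, 2, -3, 4
len         equ $ - word_array     ; length in bytes

section .text
; Returns neg_count in EDX.  Clobbers EAX, ECX, EDI.
count_scasw:
    cld                            ; make sure scasw moves EDI upward
    mov   edi, word_array          ; scasw reads from ES:[EDI]
    mov   ecx, len/2               ; number of elements
    xor   eax, eax                 ; AX = 0, the value to compare against
    xor   edx, edx                 ; neg_count = 0
.next:
    scasw                          ; flags from AX - word [edi]; then edi += 2
    jle   .skip                    ; 0 <= element: not negative
    inc   edx                      ; 0 > element: negative
.skip:
    dec   ecx
    jnz   .next
    ret
```

For this array EDX ends up as 2.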

If the distinction between positive and non-negative matters:

section .rodata

word_array dw -1,2,0,-3,4
len  equ $-word_array     ; length in bytes.  assembler constant, so we can mov reg, imm8/imm32   rather than loading it as data.

section .text
;; clobbers ESI, EDI, EBX.  Returns in EAX, EDX
proc_pos_and_neg:
  mov   esi, word_array   ; esi points to the array.  In MASM, use OFFSET word_array
  xor   edx, edx           ; pos_count = 0
  xor   eax, eax           ; neg_count = 0

  lea   edi, [esi + len]  ; points one past the end of the array
  xor   ebx, ebx          ; clear upper portion, because setcc r32 isn't available, only setcc r8  :(

countloop:
  cmp    word [esi], 0
  setg   bl               ; 0 or 1, depending on  array[i] > 0
  lea    edx, [edx + ebx]  ; add without affecting flags
  setl   bl
  add    eax, ebx          ; can clobber flags now

  add    esi, 2            ; simple pointer-increment
  cmp    esi, edi
  jb  countloop            ; loop while our pointer is below the pointer to one-past-the-end

ret     ; neg_count in eax,  pos_count in edx

If you need it, the zero count is n - eax - edx, where n is the number of elements.
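
As a small sketch of that arithmetic, a wrapper could look like the following. The name count_zeros and the use of ECX for the result are assumptions for illustration. It relies on proc_pos_and_neg and the len constant defined above.

```nasm
; Returns zero_count in ECX, plus neg_count in EAX and pos_count in EDX.
count_zeros:
    call  proc_pos_and_neg     ; EAX = neg_count, EDX = pos_count
    mov   ecx, len/2           ; n = number of 16-bit elements
    sub   ecx, eax             ; n - neg_count
    sub   ecx, edx             ; n - neg_count - pos_count = zero_count
    ret
```

For the array -1, 2, 0, -3, 4 this gives 5 - 2 - 2 = 1.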

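I used a different loop structure here just for variety. The loop should be 7 uops.

Reading ebx after setcc writes bl avoids a partial-register merging penalty, because we xor-zeroed EBX outside the loop. (A context switch or interrupt that saves/restores EBX will lose that performance benefit, but for a short loop it's probably still worth hoisting the xor-zeroing out of the loop.)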

【Discussion】: