seebro
            mov    esi,   this            ; esi = pointer to the vector (x,y,z,w)
            movups xmm0,  [esi]           ; first vector in xmm0
            movaps xmm2,  xmm0            ; copy original vector
            mulps  xmm0,  xmm0            ; square each component
            movaps xmm1,  xmm0            ; copy result
            shufps xmm1,  xmm1, 4Eh       ; shuffle: f1,f0,f3,f2
            addps  xmm0,  xmm1            ; add: f3+f1,f2+f0,f1+f3,f0+f2 
            movaps xmm1,  xmm0            ; copy results
            shufps xmm1,  xmm1, 11h       ; shuffle: f0+f2,f1+f3,f0+f2,f1+f3 
            addps  xmm0,  xmm1            ; add: all four lanes = f0+f1+f2+f3

            rsqrtps xmm0,  xmm0           ; recip. sqrt (faster than ss + shufps)
            mulps   xmm2,  xmm0           ; mul by reciprocal
            movups  [esi], xmm2           ; bring back result

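Core idea: after the two shuffle/add passes, all four lanes of the xmm register hold the sum of the squared components. rsqrtps then yields the reciprocal of the vector length in every lane, and multiplying it by the original vector saved in xmm2 completes the normalization. Note that the sum also includes w*w, so it equals x*x + y*y + z*z only when the fourth component is 0.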
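The same sequence can be sketched with SSE intrinsics in C, which avoids compiler-specific inline assembly. This is a minimal sketch, not the original author's code: the function name `normalize4` is hypothetical, and it assumes the input is four packed floats with w = 0 and a non-zero length.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: normalizes four packed floats in place.
   Assumes v[3] == 0 (so the sum is x*x + y*y + z*z) and a non-zero length. */
void normalize4(float *v)
{
    __m128 orig = _mm_loadu_ps(v);               /* movups: load, no alignment needed */
    __m128 sq   = _mm_mul_ps(orig, orig);        /* mulps: square each component      */
    __m128 t    = _mm_shuffle_ps(sq, sq, 0x4E);  /* shufps 4Eh: swap the two halves   */
    sq = _mm_add_ps(sq, t);                      /* addps: pairwise sums              */
    t  = _mm_shuffle_ps(sq, sq, 0x11);           /* shufps 11h: swap within the pairs */
    sq = _mm_add_ps(sq, t);                      /* all four lanes = f0+f1+f2+f3      */
    __m128 inv_len = _mm_rsqrt_ps(sq);           /* rsqrtps: approximate 1/length     */
    _mm_storeu_ps(v, _mm_mul_ps(orig, inv_len)); /* mulps + movups: scale and store   */
}
```

Calling `normalize4` on the array `{3, 0, 4, 0}` should leave values close to `{0.6, 0, 0.8, 0}`, accurate only to about three or four decimal digits because of the approximate reciprocal square root.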

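rsqrtps uses a hardware table lookup to approximate the reciprocal square root, trading accuracy for speed; the relative error is at most 1.5 × 2^-12. rsqrtss is the scalar form of the same approximation, so it is no more accurate. For a full-precision result, use sqrtps followed by divps, or refine the rsqrtps estimate with a Newton-Raphson step.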
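The two more accurate options can be sketched as follows. The function names `rsqrt_refined` and `inv_sqrt_exact` are hypothetical, and the refinement is the standard Newton-Raphson iteration for 1/sqrt(x), y' = y * (1.5 - 0.5 * x * y * y), not something taken from the original post.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: rsqrtps estimate refined by one Newton-Raphson step.
   Roughly doubles the number of correct bits of the initial estimate. */
__m128 rsqrt_refined(__m128 x)
{
    const __m128 half         = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y   = _mm_rsqrt_ps(x);                    /* approximate 1/sqrt(x) */
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);    /* x * y * y             */
    /* y * (1.5 - 0.5 * x * y * y) */
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

/* Hypothetical helper: full-precision 1/sqrt(x) using sqrtps and divps.
   Slower than rsqrtps, but each operation is correctly rounded. */
__m128 inv_sqrt_exact(__m128 x)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x));
}
```

Either function could replace the plain rsqrtps step when the normalized vector must be accurate to nearly full single precision. The refined version is usually the cheaper of the two, though the actual cost depends on the target CPU.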
