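I. Introduction
NEON was at first an optional module on cores such as the ARM Cortex-A5.
It was not until the ARM Cortex-A7 fully supported the NEON coprocessor that NEON came into wide use in real-world projects. Because of the limits of 32-bit registers, ARM's engineers wanted to raise the CPU's data-processing capability by widening the registers. With the wider registers, an instruction that used to process a single data element can now process several elements at once.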
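The "one instruction, several data elements" idea can be sketched in C with the intrinsics from `arm_neon.h`. This is a minimal illustration, not code from the original article: the helper name `add_u32` and the `HAVE_NEON` macro are hypothetical, and the scalar loop is included so the sketch also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n unsigned 32-bit values. */
void add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if HAVE_NEON
    /* NEON path: each vector add processes four 32-bit lanes at once. */
    for (; i + 4 <= n; i += 4) {
        uint32x4_t va = vld1q_u32(a + i);
        uint32x4_t vb = vld1q_u32(b + i);
        vst1q_u32(dst + i, vaddq_u32(va, vb));
    }
#endif
    /* Scalar path: one element per iteration; also handles the tail. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

With NEON enabled, the main loop runs n/4 times instead of n times; the scalar loop only processes the leftover elements.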
II. What Is NEON
1)、NEON is a wide SIMD data processing architecture
2)、Extension of the ARM instruction set
3)、32 registers, 64 bits wide (dual view as 16 registers, 128 bits wide)
4)、Registers are considered as vectors of elements of the same data type
5)、Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single-precision float
6)、Instructions perform the same operation in all lanes
III. NEON Instructions
1)、Vectors and Scalars
Registers hold one or more elements of the same data type.
Vn can be used to reference either a 64-bit Dn or 128-bit Qn register
A register, data type combination describes a vector of elements
Some instructions can reference individual scalar elements
Scalar elements are referenced using the array notation Vn[x]
Array ordering is always from the least significant bit.
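The lane numbering described above can be checked with a small C sketch. It assumes a 64-bit D register viewed as four 16-bit elements; the helper name `lane16_of_d` and the `HAVE_NEON` macro are hypothetical. The non-NEON branch states the same rule with plain shifts: lane x starts at bit 16*x, counting from the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: return 16-bit lane `lane` (0..3) of a 64-bit value,
 * i.e. the Dn[x] view of the register with 16-bit elements. */
uint16_t lane16_of_d(uint64_t d, unsigned lane)
{
    assert(lane < 4);
#if HAVE_NEON
    /* Reinterpret the 64 bits as a vector of four 16-bit lanes. */
    uint16x4_t v = vcreate_u16(d);
    /* The lane index of vget_lane_u16 must be a compile-time constant. */
    switch (lane) {
    case 0:  return vget_lane_u16(v, 0);
    case 1:  return vget_lane_u16(v, 1);
    case 2:  return vget_lane_u16(v, 2);
    default: return vget_lane_u16(v, 3);
    }
#else
    /* Same rule without NEON: lane 0 is the least significant 16 bits. */
    return (uint16_t)(d >> (16 * lane));
#endif
}
```

For the value 0x4444333322221111, lane 0 is 0x1111 and lane 3 is 0x4444.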
2)、NEON Operations
● Arithmetic
○ VABA, VABD, VABS, VNEG, VADD, VSUB, VADDHN, VSUBHN, VHADD, VHSUB,
VPADD, VPADAL, VMAX, VMIN, VPMAX, VPMIN, VCLS, VCLZ, VCNT
● Multiplication
○ VMUL, VMLA, VMLS, VQDMULL, VQDMLAL, VQDMLSL, VQDMULH
● Shifts
○ VSHL, VSHR, VSRA, VSLI, VSRI
● Comparison and Selection
○ VCEQ, VCGE, VCGT, VCLE, VCLT, VTST, VBIF, VBIT, VBSL
● Logical
○ VAND, VBIC, VEOR, VORN, VORR, VMVN
● Reciprocal Estimate/Step, Reciprocal Square Root Estimate/Step
○ VRECPE, VRSQRTE, VRECPS, VRSQRTS
● Miscellaneous
○ VMOV, VDUP, VCVT, VEXT, VREV, VSWP, VTBL, VTBX, VTRN, VUZP, VZIP
● Load/Store
○ VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4
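As a rough illustration of how a few of the instructions listed above combine, the sketch below uses the intrinsics that correspond to VLD1, VMLA and VST1 to compute acc[i] += a[i] * b[i] on single-precision floats. The helper name `mla_f32` and the `HAVE_NEON` macro are hypothetical, and the exact instructions emitted depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: acc[i] += a[i] * b[i] for n floats. */
void mla_f32(float *acc, const float *a, const float *b, size_t n)
{
    assert(acc != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t vacc = vld1q_f32(acc + i);   /* load four floats (VLD1) */
        float32x4_t va   = vld1q_f32(a + i);
        float32x4_t vb   = vld1q_f32(b + i);
        vacc = vmlaq_f32(vacc, va, vb);          /* multiply-accumulate (VMLA) */
        vst1q_f32(acc + i, vacc);                /* store four floats (VST1) */
    }
#endif
    /* Scalar path for the tail, or for targets without NEON. */
    for (; i < n; ++i) {
        acc[i] += a[i] * b[i];
    }
}
```

The load, compute, store pattern shown here is the usual shape of a NEON loop; the other operation groups (shifts, comparisons, logical operations) slot into the middle step in the same way.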