Using Multiply Accumulate Instruction Inline Assembly in C++: Answers

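【Question title】: Using Multiply Accumulate Instruction Inline Assembly in C++
【Posted】: 2011-04-02 19:00:41
【Question】:

I am implementing an FIR filter on an ARM9 processor and am trying to use the SMLAL instruction.

Initially I implemented the following filter and it worked correctly, except that this approach uses too much processing power to be usable in our application.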

uint32_t DDPDataAcq::filterSample_8k(uint32_t sample)
 {
    // This routine is based on the fir_double_z routine outlined by Grant R Griffin
    // - www.dspguru.com/sw/opendsp/alglib.htm 
    int i = 0; 
    int64_t accum = 0; 
    const int32_t *p_h = hCoeff_8K; 
    const int32_t *p_z = zOut_8K + filterState_8K;


    /* Cast the sample to a signed 32 bit int 
     * We need to preserve the signedness of the number, so if the 24 bit
     * sample is negative we need to move the sign bit up to the MSB and pad the number
     * with 1's to preserve two's complement. 
     */
    int32_t s = sample; 
    if (s & 0x800000)
        s |= ~0xffffff;

    // store input sample at the beginning of the delay line as well as ntaps more
    zOut_8K[filterState_8K] = zOut_8K[filterState_8K+NTAPS_8K] = s;

    for (i =0; i<NTAPS_8K; ++i)
    {
        accum += (int64_t)(*p_h++) * (int64_t)(*p_z++);
    }

    //convert the 64 bit accumulator back down to 32 bits
    int32_t a = (int32_t)(accum >> 9);


    // decrement state, wrapping if below zero
    if ( --filterState_8K < 0 )
        filterState_8K += NTAPS_8K;

    return a; 
}
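
The sign-extension step at the top of the routine can be checked on its own. Below is a minimal sketch, assuming the 24-bit sample sits in the low bits of the `uint32_t`. The helper name `signExtend24` is hypothetical and not part of the original code. Unlike the original, it also masks off the upper 8 bits first, which is an extra assumption about the input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the original post): widen a 24-bit
// two's-complement sample, held in the low 24 bits of a uint32_t,
// to a signed 32-bit value.
static int32_t signExtend24(uint32_t sample)
{
    uint32_t v = sample & 0x00FFFFFFu;   // assumption: ignore anything above bit 23
    if (v & 0x00800000u)                 // bit 23 is the sign bit of the 24-bit value
        v |= 0xFF000000u;                // pad the upper byte with 1s
    return static_cast<int32_t>(v);      // reinterpret as signed two's complement
}
```

For instance, `signExtend24(0xFFFFFF)` yields -1 and `signExtend24(0x7FFFFF)` yields 8388607, matching what the `if (s & 0x800000)` branch in the routine is meant to do.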

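I have been trying to replace the multiply-accumulate with inline assembly, since GCC was not using the MAC instruction even with optimization turned on. I replaced the for loop with the following: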

uint32_t accum_low = 0; 
int32_t accum_high = 0; 

for (i =0; i<NTAPS_4K; ++i)
{
    __asm__ __volatile__("smlal %0,%1,%2,%3;"
        :"+r"(accum_low),"+r"(accum_high)
        :"r"(*p_h++),"r"(*p_z++)); 
} 

accum = (int64_t)accum_high << 32 | (accum_low);

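With the SMLAL instruction, the output is no longer the filtered data I expect. I get seemingly random values that have no pattern or connection to the original signal or to the data I expect.

I have a feeling that I am doing something wrong when splitting the 64-bit accumulator into the high and low registers for the instruction, or that I am recombining them incorrectly. Either way, I am not sure why I cannot get the correct output by swapping the C code for the inline assembly.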
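
One way to check the split-and-recombine logic independently of the inline assembly is a host-side model. The sketch below is an assumption-laden illustration, not a diagnosis of the problem: `smlalModel` and `combine` are hypothetical names, and the model only mimics the arithmetic SMLAL is documented to perform (a signed 32x32 multiply added to a 64-bit value held in a low/high register pair). The recombination is done in unsigned arithmetic, because left-shifting a negative signed value is not well defined in C++ before C++20, even though GCC handles it as expected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable model of SMLAL: {hi:lo} += a * b,
// where a * b is a signed 32x32 -> 64-bit product.
static void smlalModel(uint32_t &lo, int32_t &hi, int32_t a, int32_t b)
{
    // Rebuild the 64-bit accumulator from its two halves (unsigned to avoid
    // shifting a negative value).
    uint64_t acc = (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) | lo;
    // Add the signed product; unsigned addition wraps modulo 2^64.
    acc += static_cast<uint64_t>(static_cast<int64_t>(a) * static_cast<int64_t>(b));
    // Split back into low and high words.
    lo = static_cast<uint32_t>(acc);
    hi = static_cast<int32_t>(static_cast<uint32_t>(acc >> 32));
}

// Recombine the two halves into a signed 64-bit value.
static int64_t combine(uint32_t lo, int32_t hi)
{
    return static_cast<int64_t>(
        (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) | lo);
}
```

Feeding the same coefficient and delay-line values through this model and through the plain `int64_t` loop should give identical sums. If they do, the low/high handling is sound and the fault more likely lies elsewhere.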

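【Question comments】:

Why not just use a DSP library?
Which compiler version are you using? I compiled your pure C code with GCC 4.4.3 using the options -O3 -march=armv5te, and it generated smlal instructions.
I have been using 4.3.2. I did not know you could specify the -march flag like that. Once I added it, GCC generated the assembly I was hoping for as well. Thanks!
@Nils: turn your comment into an answer :)
It seems that most of the products you sum do not change from call to call. If you keep a running sum, you can update it with the appropriate deltas on each call, and then you would not need assembler. But of course I may have misread.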

Tags: c++ assembly arm filtering

【Solution 1】:

Which compiler version are you using? I compiled your pure C code with GCC 4.4.3 using the options -O3 -march=armv5te, and it generated smlal instructions.
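
To try this, the multiply-accumulate loop can be isolated in a small function and compiled to assembly. This is only a sketch: `firAccumulate`, the file names, and the `arm-none-eabi-` toolchain prefix in the comment are assumptions, while `-O3`, `-march=armv5te`, and `-S` are the standard GCC options mentioned above. Whether smlal actually appears may depend on the GCC version.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ multiply-accumulate loop, the same shape as the loop in the
// question. firAccumulate is a hypothetical name used for illustration.
//
// To inspect the generated code, something along these lines could be used
// (toolchain prefix and file names are assumptions):
//   arm-none-eabi-g++ -O3 -march=armv5te -S fir.cpp -o fir.s
// and then fir.s searched for "smlal".
int64_t firAccumulate(const int32_t *h, const int32_t *z, int ntaps)
{
    int64_t accum = 0;
    for (int i = 0; i < ntaps; ++i)
    {
        // Widen both operands so the product is computed in 64 bits.
        accum += static_cast<int64_t>(h[i]) * static_cast<int64_t>(z[i]);
    }
    return accum;
}
```

Keeping the loop in plain C++ leaves register allocation to the compiler and lets the same source run on a host machine for testing.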

【Comments】: