【Question Title】: How to use assembly to get the result of a __stdcall function that returns float
【Posted】: 2013-08-10 21:50:31
【Question】:

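I have an assembly routine that calls, in a generic way, a function that is known to use the stdcall convention and to return a float. A marshalling framework uses this routine to expose stdcall functions to a scripting language.

Background

Here is the function, written with GNU inline assembly and compiled with MinGW 4.3 on Win32: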

inline uint64_t stdcall_invoke_return_float(int args_size_bytes,
                                            const char * args_ptr,
                                            void * func_ptr)
{
    uint64_t result;
    assert(
        0 == args_size_bytes % 4
            || !"argument size must be a multiple of 4 bytes");
#if defined(__GNUC__)
    asm
    (
        /* INPUT PARAMS:  %0 is the address where top of FP stack to be stored
         *                %1 is the number of BYTES to push onto the stack, */
        /*                   and during the copy loop it is the address of */
        /*                   the next word to push */
        /*                %2 is the base address of the array */
        /*                %3 is the address of the function to call */
            "testl %1, %1    # If zero argument bytes given, skip \n\t"
            "je    2f        # right to the function call.        \n\t"
            "addl  %2, %1\n"
        "1:\n\t"
            "subl  $4, %1    # Push arguments onto the stack in   \n\t"
            "pushl (%1)      # reverse order. Keep looping while  \n\t"
            "cmp   %2, %1    # addr to push (%1) > base addr (%2) \n\t"
            "jg    1b        # Callee cleans up b/c __stdcall.    \n"
        "2:\n\t"
            "call  * %3      # Callee will leave result in ST0    \n\t"
            "fsts  %0        # Copy 32-bit float from ST0->result"
        : "=m" (result)
        : "r" (args_size_bytes), "r" (args_ptr), "mr" (func_ptr)
        : "%eax", "%edx", "%ecx" /* eax, ecx, edx are caller-save */, "cc"
    );
#else
#error "Replacement for inline assembler required"
#endif
    return result;
}

This is just a bit of glue that makes writing test cases easier:

template<typename FuncPtr, typename ArgType>
float float_invoke(FuncPtr f, int nargs, ArgType * args)
{
    uint64_t result = stdcall_invoke_return_float(
        nargs * sizeof(ArgType),
        reinterpret_cast<const char *>(args),
        reinterpret_cast<void *>(f)
    );
    return *reinterpret_cast<float *>(&result);
}
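A side note on the glue code: reading a uint64_t through a float * relies on type punning that the C++ aliasing rules do not guarantee. The following is a minimal sketch of an alternative that copies the bits with std::memcpy. It assumes the 32-bit float was stored in the low half of the 64-bit value (as fsts does on little-endian x86); the helper name low_bits_to_float is hypothetical and not part of the original code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the low 32 bits of `raw` as a float.
// Assumes the float was written to the low half of the 64-bit slot and
// that float is a 32-bit IEEE-754 type.
inline float low_bits_to_float(std::uint64_t raw)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float is expected to be 32 bits wide");
    // Drop the upper 32 bits, which the 32-bit store never wrote.
    const std::uint32_t low = static_cast<std::uint32_t>(raw);
    float value;
    // memcpy is the well-defined way to copy an object representation.
    std::memcpy(&value, &low, sizeof value);
    return value;
}
```

With this helper, float_invoke could end with return low_bits_to_float(result); whether that changes the behaviour seen in the tests is not something this sketch establishes.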

Now I have some test cases that call this function:

__stdcall float TestReturn1_0Float()
{ return 1.0f; }

__stdcall float TestFloat(float a)
{ return a; }

__stdcall float TestSum2Floats(float a, float b)
{ return a + b; }

static const float args[2] = { 10.0f, -1.0f };

assert_equals(1.0f, float_invoke(TestReturn1_0Float, 0, args)); // test 1
assert_equals(10.0f, float_invoke(TestFloat, 1, args));         // test 2
assert_equals(-1.0f, float_invoke(TestFloat, 1, args + 1));     // test 3
assert_equals(9.0f, float_invoke(TestSum2Floats, 2, args));     // test 4

Problem

Randomly, test 3 gives me garbage output instead of returning -1.0.

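I'm wondering whether I am

failing to preserve some state before the call instruction?
messing up some state with the fsts instruction?
fundamentally misunderstanding how to get the float value from a stdcall function that returns float?

Any help is greatly appreciated.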

【Question Comments】:

Tags: c++ windows x86 inline-assembly stdcall

【Solution 1】:

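You allow a memory reference for the function pointer (the "mr" constraint). GCC may construct that reference relative to the stack pointer, on the mistaken assumption that the inline assembly does not change the stack pointer. Since the asm pushes the arguments before the call, such a reference would then resolve to the wrong address.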

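【Comments】:

That's a good point; a disassembly would easily show whether this is the case. But I don't think so in this case, because the compiler knows the function's code location (a compile/link-time constant) but not the offset between it and the current stack pointer (a runtime variable, since the stack pointer varies with the calling context and/or the calling thread). A function address relative to the stack pointer would be ... unusual, since executing code located on the stack has been ... "deprecated".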

【Solution 2】:

Lacking a Windows machine, I can't fully test this; on Linux, the following gets me the return value of a float function:

extern float something(int);

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int val = atoi(argv[1]);
    float ret;

    asm("pushl %1\n\t"
        "call * %2\n\t"
        "addl $4, %%esp"
       : "=t"(ret)
       : "r"(val), "r"(something)
       : "%eax", "%ecx", "%edx", "memory", "cc");

    printf("something(%d) == %f\n", val, ret);
    return 0;
}

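The key is to use the "=t"(ret) constraint, which takes the top of the floating-point stack; see Machine Constraints (in the GCC manual). If Windows stdcall also returns a float result in ST(0), this should work. No fld/fst is needed, because the compiler can do those for you when necessary.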
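To see the "=t" constraint in isolation, without any function call involved, here is a minimal sketch, assuming GCC-style inline assembly on an x86 target. The function name load_one_from_x87 is made up for illustration; on other targets the sketch simply falls back to returning the constant.

```cpp
// Returns 1.0f. On x86 the value is produced on the x87 register stack
// and handed to the compiler through the "=t" output constraint.
float load_one_from_x87()
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    float value;
    // fld1 pushes +1.0, so ST(0) holds the result when the asm ends.
    // "=t" tells the compiler the output lives in ST(0); the compiler
    // emits whatever store/pop is needed to move it into `value`.
    asm("fld1" : "=t"(value));
    return value;
#else
    return 1.0f;  // fallback for targets without an x87 unit
#endif
}
```

The asm body contains no fst/fstp of its own, which is the point made above: with "=t" the compiler takes care of getting the value off the FP stack.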

When you call a function from within inline assembly, you also need to specify the memory and cc clobbers.

【Comments】:

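It seems to work consistently with the =t constraint. However, I still want to know why using the fsts instruction works inconsistently! Also, why do I need the memory clobber?
The memory "clobber" is actually a barrier. It means the compiler will order the inline asm() block such that the rest of the generated code completes all of its loads/stores before the block and redoes them after it. The same goes for cc with respect to the condition codes, i.e. the state of a preceding comparison (so it won't, say, do a cmp before the asm block and test the result after it).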
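The barrier effect described in this comment can be sketched with an empty asm statement, assuming a GCC-compatible compiler. The names shared_flag and store_then_reload are hypothetical and only serve the illustration.

```cpp
static int shared_flag = 0;

// Writes a value, crosses a compiler barrier, then reads the value back.
int store_then_reload()
{
    shared_flag = 42;
#if defined(__GNUC__)
    // The asm body is empty, so no instruction is emitted. The "memory"
    // clobber tells the compiler that memory may be read or written here:
    // the store above must be issued before this point, and the load
    // below must actually re-read shared_flag afterwards. The "cc"
    // clobber likewise says the condition codes may have changed.
    asm volatile("" : : : "memory", "cc");
#endif
    return shared_flag;
}
```

An asm block that calls a function is in the same position: the callee may touch arbitrary memory and flags, so the clobbers keep the compiler from carrying cached values or a pending comparison across it.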