How can I reduce the stack frame of a deeply recursive function in C? Answers

【Question title】: How can I reduce the stack frame of a deeply recursive function in C?
【Posted】: 2015-04-14 14:32:36
【Question】:

Suppose I have a recursive function that manipulates a graph structure:

#include <stddef.h>

typedef struct Data { int value; } Data;   /* hypothetical placeholder for the real payload */

typedef struct Node {
    Data data;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors[];   /* flexible array member */
} Node;

void doSomethingWithNode(Node *node)
{
    if ((!node) || node->visited)
        return;
    node->visited = 1;
    /* Do something with node->data */
    /* size_t ...some local variables...; */
    size_t i;
    for (i = 0; i < node->num_neighbors; i++)
    {
        /* Do some calculations with the local variables */
        if (1 /* something */)
            doSomethingWithNode(node->neighbors[i]);
    }
}

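Because I use local variables in the loop, the compiler (gcc) creates a larger stack frame for this function than I'd like (lots of pushq and popq instructions, even with -O3), which is a problem because the function is deeply recursive. Since the order in which the nodes are visited doesn't matter, I could refactor this code to use a stack of Node pointers, reducing the overhead to one pointer per iteration.

Are there any hints I can give the compiler (gcc) to fix this problem?
If not, is it possible to use the call stack itself for my pointer stack without resorting to assembly?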

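【Question comments】:

All recursive code can also be expressed non-recursively using loops. If the default 8 MB (on Linux) isn't enough, you can also increase the stack size at link time (using e.g. the -z stack-size linker option). That said, I don't really think it's necessary, since the number of local variables is relatively small (depending on the "some local variables", of course) and there are no arrays. Also, local variables aren't really handled with push and pop instructions, so are you sure you're looking at the right code?
After a quick look at the gcc man page, I see an option -fconserve-stack. Have you tried it?
@Marian: Thanks! I'll try it.
@Marian I just tried compiling a programming-language implementation with -fconserve-stack. It made no difference to a test program that probes the maximum recursion depth: the interpreter reached the same number of recursive calls whether or not it was compiled with that option. time make tests showed no difference either. The option has a generic name, but it probably targets specific situations that must arise before it does anything. Perhaps you need many non-overlapping block scopes in the same function whose variables can be folded into the same stack space, or something like that.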

Tags: c gcc recursion stack callstack

【Solution 1】:

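You could maintain a vector or list (or some queue, or perhaps a stack, or even an arbitrary unordered set) of nodes to visit (and you may want to maintain a set or hash table of already visited nodes).

Then you would have a loop that picks the node at the front of the container to visit, and possibly adds some unvisited nodes at the back of that container....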
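A minimal sketch of that loop, assuming a simplified hypothetical Node (an int value standing in for Data, and a fixed capacity of four neighbor pointers instead of the flexible array member) and taking "something" to be always true. It uses a growable array as a FIFO queue: take from the front, append unvisited neighbors at the back.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified node: int payload, at most 4 neighbors. */
typedef struct Node {
    int value;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors[4];
} Node;

/* Visits every node reachable from start without recursion, using a
   growable FIFO queue of pointers. Returns the number of nodes visited,
   or (size_t)-1 if memory runs out. */
size_t visitAllFifo(Node *start)
{
    size_t head = 0, tail = 0, cap = 16, count = 0;
    Node **queue;

    if (!start || start->visited)
        return 0;
    queue = malloc(cap * sizeof *queue);
    if (!queue)
        return (size_t)-1;

    start->visited = 1;              /* mark when queued: never queued twice */
    queue[tail++] = start;

    while (head < tail) {
        Node *node = queue[head++];  /* take from the front */
        count++;
        /* Do something with node->value here. */
        for (size_t i = 0; i < node->num_neighbors; i++) {
            Node *next = node->neighbors[i];
            if (!next || next->visited)
                continue;
            if (tail == cap) {       /* grow before appending */
                Node **bigger = realloc(queue, 2 * cap * sizeof *queue);
                if (!bigger) {
                    free(queue);
                    return (size_t)-1;
                }
                queue = bigger;
                cap *= 2;
            }
            next->visited = 1;
            queue[tail++] = next;    /* append at the back */
        }
    }
    free(queue);
    return count;
}
```

Slots before head are never reused, so the array ends up as large as the number of reachable nodes. Taking nodes from the back instead of the front (LIFO) turns this into the pointer-stack variant the question mentions.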

Read the Wikipedia pages on continuation-passing style and tail calls.

Also google the Deutsch-Schorr-Waite algorithm, which may give you some ideas.
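For reference, a minimal sketch of the classic Deutsch-Schorr-Waite marking algorithm, assuming a hypothetical BNode with exactly two child pointers (the restriction mentioned in the comments) plus a mark bit and a direction bit. The path back to the root is kept by temporarily reversing child pointers, so no stack of any kind is needed.

```c
#include <stddef.h>

/* Hypothetical binary node: two child pointers, a mark bit, and one bit
   recording which child is currently being explored. */
typedef struct BNode {
    unsigned mark : 1;   /* set once the node has been visited */
    unsigned flag : 1;   /* 0: exploring left, 1: exploring right */
    struct BNode *left, *right;
} BNode;

/* Depth-first marking with no recursion and no explicit stack.
   Every reversed pointer is restored by the time the loop ends. */
void schorrWaiteMark(BNode *root)
{
    BNode *prev = NULL;   /* top of the "stack" threaded through the nodes */
    BNode *cur = root;

    for (;;) {
        if (cur && !cur->mark) {
            /* Advance into cur's left subtree, reversing the left pointer. */
            BNode *next = cur->left;
            cur->mark = 1;
            cur->flag = 0;
            cur->left = prev;
            prev = cur;
            cur = next;
        } else if (prev == NULL) {
            break;        /* back above the root: done */
        } else if (prev->flag == 0) {
            /* Left subtree done: restore left, park back pointer in right. */
            BNode *back = prev->left;
            prev->left = cur;
            cur = prev->right;
            prev->right = back;
            prev->flag = 1;
        } else {
            /* Right subtree done: restore right and retreat one level. */
            BNode *back = prev->right;
            prev->right = cur;
            cur = prev;
            prev = back;
        }
    }
}
```

While the loop runs, the graph is temporarily inconsistent (child pointers point back up the path), so nothing else may read it concurrently and the loop must not be abandoned midway.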

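【Comments】:

That's what I meant when I said "I could refactor this code to use a stack".
Yes, but as I said, the order doesn't matter, and a stack is probably easier to use.
But thanks for suggesting the Deutsch-Schorr-Waite algorithm. That looks exactly right! The only problem is that it seems to assume only two neighbors...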

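【Solution 2】:

Can you move the calculations into their own non-recursive function? That way, the stack space for all the temporaries no longer exists when you make the recursive call.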
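A minimal sketch of this suggestion, assuming a simplified hypothetical Node (an int value instead of Data, at most four neighbors) and a made-up criterion standing in for "something". `__attribute__((noinline))` is a GCC (and Clang) extension; without it the helper may be inlined, and its temporaries may end up back in the recursive function's frame.

```c
#include <stddef.h>

/* Hypothetical simplified node: int payload, at most 4 neighbors. */
typedef struct Node {
    int value;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors[4];
} Node;

/* The temporaries of the per-neighbor calculation live only in this
   helper's frame, which is popped before the recursive call is made.
   The even-sum test is a placeholder for the real "something". */
__attribute__((noinline))
static int shouldVisit(const Node *node, size_t i)
{
    const Node *next = node->neighbors[i];
    if (next == NULL)
        return 0;
    long sum = (long)node->value + next->value;   /* stand-in calculation */
    return sum % 2 == 0;
}

/* Only node and i have to stay live across each recursive call. */
void doSomethingWithNode(Node *node)
{
    if (!node || node->visited)
        return;
    node->visited = 1;
    /* Do something with node->value */
    for (size_t i = 0; i < node->num_neighbors; i++)
        if (shouldVisit(node, i))
            doSomethingWithNode(node->neighbors[i]);
}
```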

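Update: it looks like at least some of the data in the local variables is essential to the recursion. You can use alloca to allocate memory on the stack explicitly.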

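【Comments】:

No, their values need to be preserved across iterations of the loop.
Maybe two loops? One recursive and one not? i has to be on the stack because it's essential to the recursion state. Are any of the other variables essential in that case?
To do that, I would need to maintain a list of pointers and then pass it to the recursive calls. At that point, I might as well use a loop instead of recursion, which is exactly what I'm asking in my second question.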

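【Solution 3】:

What do you expect the compiler to do to fix the problem?

You can certainly go over your code and minimize the number of local variables, making it as clear as possible that (for example) each one is assigned only once by using const, and so on. Where possible, this may allow the compiler to reuse the space.

Failing that, you could probably save some memory by iterating instead, since then no return addresses are needed.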

【Comments】:

【Solution 4】:

You can use malloc and realloc to manage a dynamically growing stack of nodes. Here is a "class" that manages the stack:

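#include <stdlib.h>

typedef struct Stack {
    void **pointers;
    size_t count;
    size_t alloc;
} Stack;

void Stack_new(Stack *stack)
{
    stack->alloc = 10;
    stack->count = 0;
    stack->pointers = malloc(stack->alloc * sizeof(void*));
    if (stack->pointers == NULL)
        abort();                     /* out of memory */
}

void Stack_free(Stack *stack)
{
    free(stack->pointers);
    stack->pointers = NULL;          /* C has NULL, not null */
    stack->count = stack->alloc = 0;
}

void Stack_push(Stack *stack, void *value)
{
    if (stack->alloc < stack->count + 1) {
        /* Double the capacity (restart at 10 after Stack_free) and keep
           the old block valid until realloc has succeeded. */
        size_t alloc = stack->alloc ? stack->alloc * 2 : 10;
        void **pointers = realloc(stack->pointers, alloc * sizeof(void*));
        if (pointers == NULL)
            abort();                 /* out of memory */
        stack->pointers = pointers;
        stack->alloc = alloc;
    }
    stack->pointers[stack->count++] = value;
}

/* Returns NULL when the stack is empty. */
void *Stack_pop(Stack *stack)
{
    if (stack->count > 0)
        return stack->pointers[--stack->count];
    return NULL;
}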

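【Comments】:

I don't quite see what this has to do with local variables in a recursive function.
Yes, I could. But why not use the call stack, if that's possible? That's why I'm asking.
@Lundin: I was only trying to help by keeping the pointer stack off the call stack, so that the call stack isn't used for the recursion. But now I see the problem: e.g. 10 variables on the stack at each recursion level.
@Lundin It is relevant because the stack data structure can be used as a stack running in parallel with the recursive calls, or it can even support an alternative algorithm in which the program logic is iterative (or tail-recursive) and all of the context lives in the explicit stack.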
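To illustrate the last point, a minimal sketch in which the whole recursion state lives in an explicit heap-allocated stack. Each hypothetical Frame holds what one recursive call would keep alive: the node, the loop index i, and acc, a made-up stand-in for a local whose value must survive from one loop iteration to the next. Node is a simplified hypothetical struct (int value, at most four neighbors), and "something" is taken to be "the neighbor is unvisited".

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified node: int payload, at most 4 neighbors. */
typedef struct Node {
    int value;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors[4];
} Node;

/* One frame per "recursion level". */
typedef struct Frame {
    Node *node;
    size_t i;     /* next neighbor index, like the loop variable i */
    long acc;     /* hypothetical local carried across iterations */
} Frame;

/* Iterative equivalent of the recursive traversal. The frames live in a
   heap array that doubles when full, so depth is limited only by memory.
   Returns the sum of acc over all frames (a stand-in result), or -1 if
   memory runs out. */
long visitWithFrames(Node *root)
{
    size_t depth = 0, cap = 16;
    long total = 0;
    Frame *frames;

    if (!root || root->visited)
        return 0;
    frames = malloc(cap * sizeof *frames);
    if (!frames)
        return -1;

    root->visited = 1;
    frames[depth++] = (Frame){ root, 0, 0 };

    while (depth > 0) {
        Frame *f = &frames[depth - 1];   /* re-read: realloc may move it */
        if (f->i == f->node->num_neighbors) {
            total += f->acc;             /* loop finished: "return" */
            depth--;
            continue;
        }
        Node *next = f->node->neighbors[f->i++];
        f->acc += next ? next->value : 0;   /* stand-in loop calculation */
        if (!next || next->visited)
            continue;
        if (depth == cap) {              /* grow before the "call" */
            Frame *bigger = realloc(frames, 2 * cap * sizeof *frames);
            if (!bigger) {
                free(frames);
                return -1;
            }
            frames = bigger;
            cap *= 2;
        }
        next->visited = 1;               /* "call": push a new frame */
        frames[depth++] = (Frame){ next, 0, 0 };
    }
    free(frames);
    return total;
}
```

Marking a node when its frame is pushed plays the role of setting visited at the top of the recursive function; the per-level cost is sizeof(Frame) on the heap instead of a full call frame on the call stack.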

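【Solution 5】:

"It is deeply recursive" implies that the deepest recursion occurs along paths in which each node has at most 1 unvisited neighbor.

Let the code recurse only when there is more than 1 interesting neighbor; otherwise, loop.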

void doSomethingWithNode(Node *node) {
  while (node) {
    if (node->visited) return;
    node->visited = 1;
    /* Do something with node->data */
    /* size_t ...some local variables...; */
    size_t i;
    Node *first = NULL;
    for (i = 0; i < node->num_neighbors; i++) {
      /* Do some calculations with the local variables */
      if (1 /* something */) {
        // Save the first interesting node->neighbors[i] for later
        if (first == NULL &&
            node->neighbors[i] != NULL &&
            node->neighbors[i]->visited == 0) {
          first = node->neighbors[i];
        } else {
          doSomethingWithNode(node->neighbors[i]);
        }
      }
    }
    // Continue with the saved neighbor by looping instead of recursing
    node = first;
  }
}

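This doesn't reduce the stack frame; instead, it eliminates the recursion when there is only 1 level to descend. IOW: when recursion isn't needed.

For graphs in which most nodes have at most one interesting unvisited neighbor, the recursion depth should now be much smaller than the original worst case of O(n). It is not bounded by O(log2(n)) in general, though: if every node on a long path also has another interesting neighbor listed before the next node on the path, the function still recurses once per node.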
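A self-contained version of this answer's function that can be run to observe the effect, assuming a simplified hypothetical Node (int value, at most four neighbors), "something" always true, and hypothetical global counters depth and maxDepth that record how deep the recursion gets. NULL and already-visited neighbors are skipped up front instead of being passed to a call that returns immediately.

```c
#include <stddef.h>

/* Hypothetical simplified node: int payload, at most 4 neighbors. */
typedef struct Node {
    int value;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors[4];
} Node;

/* Hypothetical instrumentation: current and maximum recursion depth. */
static size_t depth, maxDepth;

/* Recurse only for the second and later interesting neighbors; continue
   with the first one by looping. */
void doSomethingWithNodeLoop(Node *node)
{
    if (++depth > maxDepth)
        maxDepth = depth;
    while (node && !node->visited) {
        node->visited = 1;
        /* Do something with node->value */
        Node *first = NULL;
        for (size_t i = 0; i < node->num_neighbors; i++) {
            Node *next = node->neighbors[i];
            if (!next || next->visited)
                continue;
            if (first == NULL)
                first = next;            /* handled by the next iteration */
            else
                doSomethingWithNodeLoop(next);
        }
        node = first;
    }
    depth--;
}

/* Runs the traversal from start and returns the deepest nesting seen. */
size_t maxDepthFrom(Node *start)
{
    depth = maxDepth = 0;
    doSomethingWithNodeLoop(start);
    return maxDepth;
}
```

On a simple chain the maximum depth stays at 1 however long the chain is. On a "comb" in which every spine node lists a leaf first and the next spine node second, the depth still grows by one per spine node, so the benefit depends on the shape of the graph and on the order of the neighbors.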

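【Comments】:

【Solution 6】:

If you have a larger number of local variables and arrays, you can try allocating that memory with malloc and accessing it through a single pointer at fixed offsets, then free the memory when the function exits.

This way you save stack space and (possibly) reuse the same part of the heap for all iterations.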

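【Comments】:

【Solution 7】:

I find a lot of the other answers inelegant and requiring a lot of overhead. There may be no good way; any approach depends on the kind of recursion at hand.

In your case the recursion is at the end and needs only the variable i. To reduce the stack frame, you can put all the other variables in global space.

If you want to reduce it further and get rid of i, you can use node->visited as the counter: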

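static struct VARS {
    int iSomething;
    Data *dataptr;
    double avg;
} gVars;   /* the "locals", kept out of the stack frame */

void doSomethingWithNode(Node *node)
{
    if ((!node) || node->visited)
        return;
    /* Do something with node->data */
    /* some local variables in global space */
    gVars.iSomething = 1;
    /* node->visited doubles as the loop counter. It starts at 1, not 0, so
       the node already counts as visited if a cycle leads back to it during
       the loop; the neighbor index is therefore node->visited - 1. */
    for (node->visited = 1; node->visited <= node->num_neighbors; node->visited++)
    {
        /* Do some calculations with the (global) variables */
        if (1 /* something */)
            doSomethingWithNode(node->neighbors[node->visited - 1]);
    }
}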

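【Comments】:

ps: since this is still a hack, it isn't elegant either.
I don't see how this helps. i isn't the problem. The problem is the other local variables, which I can't remove from the loop.
If those local variables in the loop don't carry information from one iteration to the next, you can move them to global space as well. If they do, you could even consider moving them into the node.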
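A minimal sketch of moving the state into the node, assuming two hypothetical extra fields, parent and next_i, in a simplified Node (int value, at most four neighbors). With the caller and the loop index stored in the node itself, the traversal needs no stack at all and, unlike Deutsch-Schorr-Waite, works for any number of neighbors; the cost is two extra fields per node. Any other per-level locals could be added to the node the same way.

```c
#include <stddef.h>

/* Hypothetical node with two extra fields holding what the recursive
   version kept on the call stack: the caller and the loop index. */
typedef struct Node {
    int value;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors[4];
    struct Node *parent;   /* node that discovered this one */
    size_t next_i;         /* next neighbor index to examine */
} Node;

/* Depth-first traversal with O(1) memory outside the nodes: instead of
   returning from a call, follow the parent link stored in the node.
   Returns the number of nodes visited. */
size_t visitUsingNodeState(Node *root)
{
    size_t count = 0;
    Node *cur = root;

    if (!root || root->visited)
        return 0;
    root->visited = 1;
    root->parent = NULL;
    root->next_i = 0;
    count++;                                /* do something with root */

    while (cur) {
        if (cur->next_i < cur->num_neighbors) {
            Node *next = cur->neighbors[cur->next_i++];
            if (next && !next->visited) {   /* "recursive call" */
                next->visited = 1;
                next->parent = cur;
                next->next_i = 0;
                count++;                    /* do something with next */
                cur = next;
            }
        } else {
            cur = cur->parent;              /* "return" to the caller */
        }
    }
    return count;
}
```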

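【Solution 8】:

Put all local variables that aren't essential to the recursion into a struct Locals and access them through plocals->. The advantage over moving the calculations into their own non-recursive function (Arkadiy's answer) is that, if needed, the variables stay valid and keep their values across the recursion.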

#include <stddef.h>

struct Data {
    char data[1];
};

typedef struct Node {
    struct Data data;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors;
} Node;

struct Locals {
    /* local variables not essential for recursion, for example: */
    int placeholder;   /* hypothetical member: ISO C does not allow an empty struct */
};
static void doSomethingWithNodeRecurse(Node *node, struct Locals *plocals)
{
    if ((!node) || node->visited)
        return;
    node->visited = 1;
    /* Do something with node->data */
    /* local variables essential for recursion */
    size_t i;
    for (i = 0; i < node->num_neighbors; i++)
    {
        /* Do some calculations with the local variables */
        if (1/* something */)
            doSomethingWithNodeRecurse(&node->neighbors[i], plocals);
        /* Do some calculations with the local variables */
    }
}

void doSomethingWithNode(Node *node)
{
    struct Locals locals;

    doSomethingWithNodeRecurse(node, &locals);
}

If the variables are still too big to allocate on the stack, you can allocate them on the heap, as Vagish suggested:

#include <stddef.h>
#include <stdlib.h>

struct Data {
    char data[1];
};

typedef struct Node {
    struct Data data;
    size_t visited;
    size_t num_neighbors;
    struct Node *neighbors;
} Node;

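struct Locals {
    /* local variables too big for allocation on the stack, for example: */
    double placeholder[1024];   /* hypothetical member: ISO C does not allow an empty struct */
};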
void doSomethingWithNode(Node *node)
{
    struct Locals *plocals;

    if ((!node) || node->visited)
        return;

    /* ---> allocate the variables on the heap <--- */
    if ((plocals = malloc(sizeof *plocals)) == NULL) abort();

    node->visited = 1;
    /* Do something with node->data */
    size_t i;
    for (i = 0; i < node->num_neighbors; i++)
    {
        /* Do some calculations with the local variables */
        if (1/* something */)
            doSomethingWithNode(&node->neighbors[i]);
        /* Do some calculations with the local variables */
    }
    /* ---> free the variables <--- */
    free(plocals);
}

【Comments】: