How to print an AST from a parser generator to graphviz? Answer

【Question title】: How to print an AST from a parser generator to graphviz?
【Posted】: 2021-05-29 17:58:36
【Question】:

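I am trying to print an AST representation from PackCC's calculator example.

I added a "Node" object and functions that write a dot file.

This seems to work for '+' and '-' expressions, but when I mix '+' and '*' the result is unexpected.

Note that these two operators have different precedence, in case that helps.

Here is the grammar file.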

%prefix "calc"
%value "Node*" # The type of "$$" values.

###############################################################################

%header{
    typedef struct Node {
        const char* data;
        struct Node* l_child;
        struct Node* r_child;
    }Node;
    
    Node* node(const char* text, Node* left, Node* right);
    
    void dot_graph(char* name, Node* root);
}

###############################################################################

%source {
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h> // strdup() is used in the rule actions
    
    Node* node(const char* text, Node* left, Node* right)
    {
        Node* res = malloc(sizeof (Node));
        res->data = text;
        res->l_child = left;
        res->r_child = right;
        return res;
    }
    
    void fill_dot(FILE* f, Node* n, Node* parent, int parentNum)
    {
        static int currentNum; // This number will be printed to make unique names.
        // Lines that represents nodes.
        if (parent == NULL) {
            fprintf(f, "    node%d [label=\"%s\"]\n", currentNum, n->data);
        }
        else {
            fprintf(f, "    node%d [label=\"%s\"]\n", currentNum, n->data);
            fprintf(f, "    node%d -- node%d\n", parentNum, currentNum);
            parentNum += 1;
        }
        currentNum += 1;
    
        if (n->l_child != NULL) {
            fill_dot(f, n->l_child, n, parentNum);
        }
    
        if (n->r_child != NULL) {
            fill_dot(f, n->r_child, n, parentNum);
        }
        parentNum -= 1;
        if( parentNum == -1) // Hopefully the end of the recursion.
            currentNum = 0;
    
    }
    
    void dot_graph(char* name, Node* root)
    {
        FILE* res = fopen(name, "w");
        if (res == NULL) {
            printf("File problem: %s\n", name);
            exit(1);
        }
        fprintf(res, "graph {\n"); // Opens the dot file
        fill_dot(res, root, NULL, 0); // fills with the nodes.
        fprintf(res, "}\n"); // Close the dot file.
        fclose(res);
    }

}

###############################################################################

statement <- _ e:expression _ EOL { $$ = e; puts("Expression parsed");}
           / ( !EOL . )* EOL      { printf("error\n"); }

expression <- e:term { $$ = e; }

term <- l:term _ < '+' > _ r:factor { $$ = node(strdup($1), l, r); }
      / l:term _ < '-' > _ r:factor { $$ = node(strdup($2), l, r); }
      / e:factor                { $$ = e; }

factor <- l:factor _ < '*' > _ r:unary { $$ = node(strdup($1), l, r); }
        / l:factor _ < '/' > _ r:unary { $$ = node(strdup($2), l, r); }
        / e:unary                  { $$ = e; }

unary <- < '+' > _ e:unary { $$ = node(strdup($1), e, NULL ); }
       / < '-' > _ e:unary { $$ = node(strdup($2), e, NULL ); }
       / e:primary     { $$ = e; }

primary <- < [0-9]+ >               { $$ = node(strdup($1), NULL, NULL); }
         / '(' _ e:expression _ ')' { $$ = e; }

_      <- [ \t]*
EOL    <- '\n' / '\r\n' / '\r' / ';'

%%
int main() {
    Node* root = node("", NULL, NULL);
    calc_context_t *ctx = calc_create(NULL);
    while (calc_parse(ctx, &root)) { // The root node will be "extended" by children during parsing.
    }
    dot_graph("calc.dot", root);
    calc_destroy(ctx);
    // No "free" for the nodes to shorten this file.
    return 0;
}

Commands to run packcc, then compile and run the parser:

    ./packcc calc.peg && gcc calc.c && ./a.out

Command to view the graph if you are on Linux:

     dot -Tx11 calc.dot

I'd like to know whether the problem is in the parser actions or in my painfully designed dot printer.

【Comments】:

Tags: c parsing graphviz

【Answer 1】:

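It's the way you walk the tree: you are not identifying the parent node correctly during the traversal. As far as I can see, this has nothing to do with parsing (and hence with packcc). The results and code below were produced without using packcc or your grammar at all; I just built a couple of trees by hand with your node function. That is usually a better way to debug, and also a better way to create a minimal example, because it helps make clear what is irrelevant to the problem (in this case the parser, which contains a lot of unrelated code).

Here is what your function produces, with the correct lines on the right (generated with diff --side-by-side so you can see the differences):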


     -- YOUR CODE --              -- CORRECT --
graph {                       graph {
    node0 [label="+"]             node0 [label="+"]
    node1 [label="4"]             node1 [label="4"]
    node0 -- node1                node0 -- node1
    node2 [label="*"]             node2 [label="*"]
    node0 -- node2                node0 -- node2
    node3 [label="5"]             node3 [label="5"]
    node1 -- node3          |     node2 -- node3
    node4 [label="6"]             node4 [label="6"]
    node1 -- node4          |     node2 -- node4
}                             }

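Everything is fine except that node1 is used as the parent of node3 and node4 instead of node2. Of course, this is a very simple graph; a larger graph would have more errors. (I suspect the only reason it works with addition is that left associativity produces a left-leaning tree. Indeed, if you draw the graph for 4*5+6 instead of 4+5*6, you won't see any problem.) (By the way, it's good that you showed everything, including the build instructions. You just left out the input string :-( but I figured it out.)

"Painfully designed" is not a bad description; it's a bit Rube Goldberg. As a rule of thumb, whenever you find yourself using a static variable in a recursion, you should stop right away and rethink your approach. It is never the right solution. If you need to pass data around, use arguments and return values. That is almost always simpler and clearer, and it is always less fragile and more thread-safe.

With that in mind, here is a simple recursive printer. I removed the parent argument, since you were only using it as a flag: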

/* If parent is -1, this is the root node. Otherwise, it's the nodeid of
 * the parent node. nextid is the next node id to use for a child, if any.
 * Returns the next id to use.
 */
static int gv_helper(FILE* f, Node* node, int parent, int nextid) {
  if (node == NULL) return nextid;
  int id = nextid++;   /* Get an id for this node */

  /* Show this node */
  fprintf(f, "    node%d [label=\"%s\"]\n", id, node->data);
  if (parent >= 0)
    fprintf(f, "    node%d -- node%d\n", parent, id);

  /* Recurse over the children. Advance nextid as we go. */
  nextid = gv_helper(f, node->l_child, id, nextid);
  return gv_helper(f, node->r_child, id, nextid);
}

/* The public entry avoids having to pass magical arguments. But sometimes it
 * would be better to give the client access to all the arguments.
 */
int gv_print(FILE* f, Node* root) {
  return gv_helper(f, root, -1, 0);
}

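Instead of incrementing node ids and parent ids independently (which can't work anyway), the code simply observes that the node id assigned in a call is the parent id for the recursive calls on that node's children. That way, the recursion helps make the solution clearer instead of relying on state that is hard to analyze.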
