【Question Title】: boost::spirit access position iterator from semantic actions
【Posted】: 2013-11-05 22:02:39
【Question】:

Suppose I have code like this (line numbers added for reference):

1:
2:function FuncName_1 {
3:    var Var_1 = 3;
4:    var  Var_2 = 4;
5:    ...

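I want to write a grammar that parses this kind of text and puts all identifier information (function and variable names) into a tree (utree?). Each node should hold line_num, column_num and the symbol value. Example: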

root: FuncName_1 (line:2,col:10)
  children[0]: Var_1 (line:3, col:8)
  children[1]: Var_2 (line:4, col:9)

I want to put it into a tree because I plan to traverse that tree, and for each node I need to know its "context" (all parents of the current node).

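For example, when processing the node for Var_1, I must know that it is the name of a local variable of the function FuncName_1 (which is itself a node, one level up).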

There are a few things I can't figure out:

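  1. Can this be done in Spirit with semantic actions and utree, or should I use a variant-based tree?
  2. How do I pass all three pieces of information (column, line, symbol name) to a node at once? I know I have to use pos_iterator as the grammar's iterator type, but how do I access this information in a semantic action?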

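I'm new to Boost, so I have read the Spirit documentation over and over and tried googling my problem, but I can't put all the pieces together into a solution. It seems nobody has had a use case like mine before (or I just can't find it). The only solutions involving position iterators seem to be about parse-error handling, which is not the case I'm interested in. The code I have so far, which only parses the input, is below, but I don't know how to continue.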

  #include <iostream>
  #include <string>
  #include <boost/spirit/include/qi.hpp>
  #include <boost/spirit/include/support_line_pos_iterator.hpp>

  namespace qi = boost::spirit::qi;
  typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;

  template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
  struct ParseGrammar: public qi::grammar<Iterator, Skipper>
  {
        ParseGrammar():ParseGrammar::base_type(SourceCode)
        {
           using namespace qi;
           KeywordFunction = lit("function");
           KeywordVar    = lit("var");
           SemiColon     = lit(';');

           Identifier = lexeme [alpha >> *(alnum | '_')];
           VarAssignemnt = KeywordVar >> Identifier >> char_('=') >> int_ >> SemiColon;
           SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignemnt >> '}';
        }

        qi::rule<Iterator, Skipper> SourceCode;
        qi::rule<Iterator > KeywordFunction;
        qi::rule<Iterator,  Skipper> VarAssignemnt;
        qi::rule<Iterator> KeywordVar;
        qi::rule<Iterator> SemiColon;
        qi::rule<Iterator > Identifier;
  };

  int main()
  {
     std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 = 4; }";

     pos_iterator_t first(content.begin()), iter = first, last(content.end());
     ParseGrammar<pos_iterator_t> resolver;    //  Our parser
     bool ok = phrase_parse(iter,
                            last,
                            resolver,
                            qi::space);

     std::cout << std::boolalpha;
     std::cout << "\nok : " << ok << std::endl;
     std::cout << "full   : " << (iter == last) << std::endl;
     if(ok && iter == last)
     {
        std::cout << "OK: Parsing fully succeeded\n\n";
     }
     else
     {
        int line   = get_line(iter);
        int column = get_column(first, iter);
        std::cout << "-------------------------\n";
        std::cout << "ERROR: Parsing failed or not complete\n";
        std::cout << "stopped at: " << line  << ":" << column << "\n";
        std::cout << "remaining: '" << std::string(iter, last) << "'\n";
        std::cout << "-------------------------\n";
     }
     return 0;
  }

【Question Discussion】:

Tags: c++ boost abstract-syntax-tree boost-spirit-qi


【Solution 1】:

This was a fun exercise. I finally put together a working demo of on_success[1] for annotating AST nodes.

Suppose we want an AST like this:

namespace ast
{
struct LocationInfo {
    unsigned line, column, length;
};

struct Identifier     : LocationInfo {
    std::string name;
};

struct VarAssignment  : LocationInfo {
    Identifier id;
    int value;
};

struct SourceCode     : LocationInfo {
    Identifier function;
    std::vector<VarAssignment> assignments;
};
}

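I know, "location info" is probably redundant for the SourceCode node, but you know... Anyway, to make it easy to assign attributes to these nodes without semantic actions or lots of purpose-built constructors: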

#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier,    (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode,    (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))

There. Now we can declare rules that expose these attributes:

qi::rule<Iterator, ast::SourceCode(),    Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()>         Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;

We (basically) don't change the grammar at all: attribute propagation is "automatic"[2].

KeywordFunction = lit("function");
KeywordVar      = lit("var");
SemiColon       = lit(';');

Identifier      = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment   = KeywordVar >> Identifier >> '=' >> int_ >> SemiColon; 
SourceCode      = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';

The magic

How do we attach source location info to the nodes?

auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier,    set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode,    set_location_info);

Now, annotate is just a lazy (phx::function) wrapper around a callable object, defined as:

template<typename It>
struct annotation_f {
    typedef void result_type;

    annotation_f(It first) : first(first) {}
    It const first;

    template<typename Val, typename First, typename Last>
    void operator()(Val& v, First f, Last l) const {
        do_annotate(v, f, l, first);
    }
  private:
    void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
        using std::distance;
        li.line   = get_line(f);
        li.column = get_column(first, f);
        li.length = distance(f, l);
    }
    static void do_annotate(...) { }
};

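Because of the way get_column works, the functor is stateful: it remembers the begin iterator[3]. As you can see, the first do_annotate overload accepts anything derived from LocationInfo; for any other attribute type the catch-all do_annotate(...) overload is selected, and it does nothing.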

Now, the proof of the pudding:

std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 = 4; }";

pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first);    //  Our parser

ast::SourceCode program;
bool ok = phrase_parse(iter,
        last,
        resolver,
        qi::space,
        program);

std::cout << std::boolalpha;
std::cout << "ok  : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
    std::cout << "OK: Parsing fully succeeded\n\n";

    std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
    for (auto const& va : program.assignments)
        std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
    int line   = get_line(iter);
    int column = get_column(first, iter);
    std::cout << "-------------------------\n";
    std::cout << "ERROR: Parsing failed or not complete\n";
    std::cout << "stopped at: " << line  << ":" << column << "\n";
    std::cout << "remaining: '" << std::string(iter, last) << "'\n";
    std::cout << "-------------------------\n";
}

This prints:

ok  : true
full: true
OK: Parsing fully succeeded

Function name: FuncName_1 (see L1:1:56)
variable Var_1 assigned value 3 at L2:3:14
variable Var_2 assigned value 4 at L3:3:15

Full demo program

Live On Coliru

It also shows:

  • error handling, e.g.:

    Error: expecting "=" in line 3: 
    
    var  Var_2 - 4; }
               ^---- here
    ok  : false
    full: false
    -------------------------
    ERROR: Parsing failed or not complete
    stopped at: 1:1
    remaining: 'function FuncName_1 {
    var Var_1 = 3;
    var  Var_2 - 4; }'
    -------------------------
    
  • the BOOST_SPIRIT_DEBUG macro

  • a hacky way to conveniently stream the LocationInfo part of any AST node, sorry :)
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;

typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;

namespace ast
{
    namespace manip { struct LocationInfoPrinter; }

    struct LocationInfo {
        unsigned line, column, length;
        manip::LocationInfoPrinter printLoc() const;
    };

    struct Identifier     : LocationInfo {
        std::string name;
    };

    struct VarAssignment  : LocationInfo {
        Identifier id;
        int value;
    };

    struct SourceCode     : LocationInfo {
        Identifier function;
        std::vector<VarAssignment> assignments;
    };

    ///////////////////////////////////////////////////////////////////////////
    // Completely unnecessary tweak to get a "poor man's" io manipulator going
    // so we can do `std::cout << x.printLoc()` on types of `x` deriving from
    // LocationInfo
    namespace manip {
        struct LocationInfoPrinter {
            LocationInfoPrinter(LocationInfo const& ref) : ref(ref) {}
            LocationInfo const& ref;
            friend std::ostream& operator<<(std::ostream& os, LocationInfoPrinter const& lip) {
                return os << lip.ref.line << ':' << lip.ref.column << ':' << lip.ref.length;
            }
        };
    }

    manip::LocationInfoPrinter LocationInfo::printLoc() const { return { *this }; }
    // feel free to disregard this hack
    ///////////////////////////////////////////////////////////////////////////
}

BOOST_FUSION_ADAPT_STRUCT(ast::Identifier,    (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode,    (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))

struct error_handler_f {
    typedef qi::error_handler_result result_type;
    template<typename T1, typename T2, typename T3, typename T4>
        qi::error_handler_result operator()(T1 b, T2 e, T3 where, T4 const& what) const {
            std::cerr << "Error: expecting " << what << " in line " << get_line(where) << ": \n" 
                << std::string(b,e) << "\n"
                << std::setw(std::distance(b, where)) << '^' << "---- here\n";
            return qi::fail;
        }
};

template<typename It>
struct annotation_f {
    typedef void result_type;

    annotation_f(It first) : first(first) {}
    It const first;

    template<typename Val, typename First, typename Last>
    void operator()(Val& v, First f, Last l) const {
        do_annotate(v, f, l, first);
    }
  private:
    void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
        using std::distance;
        li.line   = get_line(f);
        li.column = get_column(first, f);
        li.length = distance(f, l);
    }
    static void do_annotate(...) {}
};

template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, ast::SourceCode(), Skipper>
{
    ParseGrammar(Iterator first) : 
        ParseGrammar::base_type(SourceCode),
        annotate(first)
    {
        using namespace qi;
        KeywordFunction = lit("function");
        KeywordVar      = lit("var");
        SemiColon       = lit(';');

        Identifier      = as_string [ alpha >> *(alnum | char_("_")) ];
        VarAssignment   = KeywordVar > Identifier > '=' > int_ > SemiColon; // note: expectation points
        SourceCode      = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';

        on_error<fail>(VarAssignment, handler(_1, _2, _3, _4));
        on_error<fail>(SourceCode, handler(_1, _2, _3, _4));

        auto set_location_info = annotate(_val, _1, _3);
        on_success(Identifier,    set_location_info);
        on_success(VarAssignment, set_location_info);
        on_success(SourceCode,    set_location_info);

        BOOST_SPIRIT_DEBUG_NODES((KeywordFunction)(KeywordVar)(SemiColon)(Identifier)(VarAssignment)(SourceCode))
    }

    phx::function<error_handler_f> handler;
    phx::function<annotation_f<Iterator>> annotate;

    qi::rule<Iterator, ast::SourceCode(),    Skipper> SourceCode;
    qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
    qi::rule<Iterator, ast::Identifier()>             Identifier;
    // no skipper, no attributes:
    qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
};

int main()
{
    std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 - 4; }";

    pos_iterator_t first(content.begin()), iter = first, last(content.end());
    ParseGrammar<pos_iterator_t> resolver(first);    //  Our parser

    ast::SourceCode program;
    bool ok = phrase_parse(iter,
            last,
            resolver,
            qi::space,
            program);

    std::cout << std::boolalpha;
    std::cout << "ok  : " << ok << std::endl;
    std::cout << "full: " << (iter == last) << std::endl;
    if(ok && iter == last)
    {
        std::cout << "OK: Parsing fully succeeded\n\n";

        std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
        for (auto const& va : program.assignments)
            std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
    }
    else
    {
        int line   = get_line(iter);
        int column = get_column(first, iter);
        std::cout << "-------------------------\n";
        std::cout << "ERROR: Parsing failed or not complete\n";
        std::cout << "stopped at: " << line  << ":" << column << "\n";
        std::cout << "remaining: '" << std::string(iter, last) << "'\n";
        std::cout << "-------------------------\n";
    }
    return 0;
}

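[1] Sadly un(der)documented, except for the conjure samples.

[2] Well, I used as_string to get the assignment to Identifier right without too much work.

[3] There may be smarter ways to do this performance-wise, but let's keep it simple for now.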

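【Discussion】:

  • @Gregory81 It is possible to do this without on_success, but in my opinion it doesn't "scale" well to larger grammars. You could consider using semantic actions (see e.g. this answer, or the text_node_t builder from this larger example)...
  • @Gregory81 I'd rather keep my "consulting" limited to Stack Overflow, because I believe that serves future users looking for similar information best. Also, the system is open, and I'm often amazed by the excellent solutions other people come up with. Everybody wins.
  • [Removed my previous comment containing my e-mail address so as not to attract spammers] @sehe, 1. OK, I understand your position on private contact. 2. I tried compiling the code you provided, and it seems to use a few C++11 features; my compiler (Visual Studio 2010) is unhappy with some constructs. 3. Because of my limited knowledge of Boost Spirit, there are a few things in your code I don't understand. 3.1 What is an "expectation point"? Could you point me to some documentation, a tutorial or other information so I can familiarize myself with it?
  • @sehe Thanks for your answer. It condenses the use of on_success perfectly; that is somewhat hard to follow in the conjure example because of the large number of files.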