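【Question title】: Spirit parser segfaults
【Posted】: 2015-09-26 19:28:42
【Question】:

I'm getting a segfault when I run this program. It looks like it comes from the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. I'd also appreciate, if possible, any tips/tricks for cleaning up this grammar.

Thanks!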

//code here:
/***
*I-EBNF parser
*
*This defines a grammar for BNF.
*/

//Speeds up compilation times.
//This is a relatively small grammar, this is useful.
#define BOOST_SPIRIT_NO_PREDEFINED_TERMINALS
#define BOOST_SPIRIT_QI_DEBUG

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/support.hpp>
#include <vector>
#include <string>
#include <iostream>

namespace Parser
{

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

enum class RHSType
{
    Terminal, Identifier
};
struct RHS
{
    RHSType type;
    std::string value;
};
struct Rule
{
    std::string identifier; //lhs
    std::vector<RHS> rhs;
};
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(
    Parser::RHS,
    (Parser::RHSType, type)
    (std::string, value)
)
BOOST_FUSION_ADAPT_STRUCT(
    Parser::Rule,
    (std::string, identifier)
    (std::vector<Parser::RHS>, rhs)
)

namespace Parser
{
typedef std::vector<Rule> RuleList;

//our grammar definition
template <typename Iterator>
struct Grammar: qi::grammar<Iterator, std::list<Rule>, ascii::space_type>
{
    Grammar(): Grammar::base_type(rules)
    {
        qi::char_type char_;

        letter = char_("a-zA-Z");
        digit = char_('0', '9');
        symbol = char_('[') | ']' | '[' | ']' | '(' | ')' | '<' | '>'
| '\'' | '\"' | '=' | '|' | '.' | ',' | ';';
        character = letter | digit | symbol | '_';
        identifier = letter >> *(letter | digit | '_');
        terminal = (char_('\'') >> character >> *character >>
char_('\'')) | (char_('\"') >> character >> *character >> char_('\"'));
        lhs = identifier;
        rhs = terminal | identifier | char_('[') >> rhs >> char_(']')
| char_('{') >> rhs >> char_('}') | char_('(') >> rhs >> char_(')') |
rhs >> char_('|') >> rhs | rhs >> char_(',') >> rhs;
        rule = identifier >> char_('=') >> rhs;
        rules = rule >> *rule;
    }

private:
    qi::rule<Iterator, char(), ascii::space_type> letter, digit,
symbol, character;
    qi::rule<Iterator, std::string(), ascii::space_type> identifier,
lhs, terminal;
    qi::rule<Iterator, RHS, ascii::space_type> rhs;
    qi::rule<Iterator, Rule, ascii::space_type> rule;
    qi::rule<Iterator, std::list<Rule>, ascii::space_type> rules;
};

}

int main()
{
    Parser::Grammar<std::string::const_iterator> parser;
    boost::spirit::ascii::space_type space;
    std::string input;
    std::vector<std::string> output;
    bool result;

    while (std::getline(std::cin, input))
        {
            if (input.empty())
                {
                    break;
                }
            std::string::const_iterator it, itEnd;
            it = input.begin();
            itEnd = input.end();
            result = phrase_parse(it, itEnd, parser, space, output);
            if (result && it == itEnd)
                {
                    std::cout << "success" << std::endl;
                }
        }

    return 0;
}

¹ Cross-posted from the [spirit-general] mailing list: http://boost.2283326.n4.nabble.com/parser-segfault-tips-tricks-td4680336.html

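【Comments】:

  • I'm live-coding the answer here: livecoding.tv/sehe (experiment)
  • @Nevermore Yes, that's a good point. I just came up with the following sample (on the live stream): assigned1 = some_var | "a 'string' value" | [ 'okay', "this", is_another, identifier, { ( "nested" ) } ] | "done". (Note: I posted the question here on behalf of the asker on the mailing list)
  • OK. By the way, I can't reproduce this on gcc 4.9.2 + boost 1.58.
  • @Nevermore Do you mean you can't compile it? Or that it doesn't crash?
  • @Nevermore Well, it depends on the input you use, because you need to trigger the left recursion (see my answer). Also, there is certainly some undefined behavior due to the way the attributes are (incorrectly) declared. That might explain why the crash is inconsistent.

Tags: c++ parsing segmentation-fault grammar boost-spirit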


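【Answer 1】:

On September 26, 2015 at 1:45 AM, Tyler Littlefield wrote:

Hello all: I'm getting a segfault when I run this program. It looks like it comes from the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. I'd also appreciate, if possible, any tips/tricks for cleaning up this grammar.

First off, it doesn't compile.

It shouldn't compile, since the grammar doesn't expose an attribute (did you mean list<Rule>() instead of list<Rule>?).

But you could never assign that to the output variable (which is a std::vector<std::string>?!?)

Likewise, you forgot the parentheses here: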

qi::rule<Iterator, RHS(), ascii::space_type> rhs;
qi::rule<Iterator, Rule(), ascii::space_type> rule;
qi::rule<Iterator, std::list<Rule>(), ascii::space_type> rules;
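
To illustrate why the parentheses matter: Qi takes a rule's attribute as a function-style signature, so RHS() reads as "a function returning RHS with no arguments", whereas a bare RHS is just an object type. The sketch below uses only the standard library. result_of_signature is a hypothetical helper, not Spirit's actual machinery, and the empty Rule struct is a stand-in; it only shows how a template can recover the attribute type from the signature form. As noted in the comments further down, newer Boost releases are said to accept both spellings.

```cpp
#include <list>
#include <type_traits>

// Hypothetical helper (not part of Boost): extracts the result type from a
// function-style signature such as T(). It is deliberately left undefined
// for anything that is not a signature.
template <typename Sig>
struct result_of_signature;

template <typename R, typename... Args>
struct result_of_signature<R(Args...)> { using type = R; };

struct Rule {};  // stand-in for Parser::Rule

// std::list<Rule>() names a function type ...
static_assert(std::is_function<std::list<Rule>()>::value,
              "signature form is a function type");
// ... whereas std::list<Rule> is a plain object type.
static_assert(!std::is_function<std::list<Rule>>::value,
              "bare form is not a function type");
// Only the signature form lets the helper recover the attribute type.
static_assert(std::is_same<result_of_signature<std::list<Rule>()>::type,
                           std::list<Rule>>::value,
              "attribute type recovered from the signature");
```

With this helper, result_of_signature<std::list<Rule>> (without the parentheses) would fail to compile, which mirrors the kind of mismatch described above.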

The rhs rule has infinite left recursion:

    rhs               = terminal
                      | identifier
                      | ('[' >> rhs >> ']')
                      | ('{' >> rhs >> '}')
                      | ('(' >> rhs >> ')')
                      | (rhs >> '|' >> rhs)  // OOPS
                      | (rhs >> ',' >> rhs)  // OOPS
                      ;

This likely explains the crash, because it leads to a stack overflow.
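
The usual way out of left recursion can be sketched by hand, without Spirit: parse one item that is guaranteed to consume input, then loop over "separator, item" pairs. The code below is a minimal illustration under simplifying assumptions (items are identifier-like tokens only, no brackets or quoted terminals); ListParser and parse_list are hypothetical names, not taken from the original grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hand-written sketch (not Spirit). Items are runs of [A-Za-z0-9_];
// separators are '|' or ','.
struct ListParser {
    std::string src;
    std::size_t pos;

    explicit ListParser(std::string text) : src(std::move(text)), pos(0) {}

    void skip_blanks() {
        while (pos < src.size() && (src[pos] == ' ' || src[pos] == '\t')) ++pos;
    }

    // Consumes at least one character on success, so every step of the
    // list loop below makes progress.
    bool expression(std::string& out) {
        skip_blanks();
        const std::size_t start = pos;
        while (pos < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_'))
            ++pos;
        if (pos == start) return false;
        out = src.substr(start, pos - start);
        return true;
    }

    // A left-recursive version would call itself before consuming anything:
    //     bool rhs() { return rhs() && match('|') && rhs(); }  // never returns
    // The loop below is the hand-written analogue of `expression % separator`.
    bool expr_list(std::vector<std::string>& out) {
        std::string item;
        if (!expression(item)) return false;
        out.push_back(item);
        for (;;) {
            skip_blanks();
            if (pos >= src.size() || (src[pos] != '|' && src[pos] != ',')) break;
            const std::size_t save = pos;
            ++pos;  // consume the separator
            if (!expression(item)) { pos = save; break; }  // trailing separator: backtrack
            out.push_back(item);
        }
        assert(pos <= src.size());  // invariant: never past the end
        return true;
    }
};

// Convenience wrapper: returns the parsed items, or an empty vector on failure.
std::vector<std::string> parse_list(const std::string& text) {
    ListParser parser(text);
    std::vector<std::string> items;
    if (!parser.expr_list(items)) items.clear();
    return items;
}
```

For example, parse_list("a | b, c") yields the three items a, b and c, while parse_list("| a") yields nothing because the input does not start with an item.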

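Notes

The recorded live stream (part #1, part #2) shows exactly the steps I took to first clean up the grammar and then make things actually compile-worthy.

There was quite a lot of work there:

  • cleanup: use implicit qi::lit for the interpunction ([], {}, () and =, |, ,)
  • use kleene+ instead of a >> *a (in more than one place)
  • prefer the % parser for parsing... lists
  • I had to "wiggle" the rules around "RHS" a bit; there was infinite recursion in the last two branches (see // OOPS). I fixed it by introducing a "pure" expression rule (which parses just a single RHS structure; I've renamed this type to Expression).

    The "list" parsing (which recognizes lists of expressions separated by , or |) was moved into the original rhs rule, which I renamed to expr_list to be more descriptive: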

    expression        = qi::attr(Parser::ExprType::Terminal)   >> terminal 
                      | qi::attr(Parser::ExprType::Identifier) >> identifier 
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ] 
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ] 
                      ;
    
    expr_list         = expression % (char_("|,")) // TODO FIXME?
                      ;
    
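  • In order for the synthesized attributes to actually transform into the RHS (now: Expression) type, we need to actually expose a RHSType (now: ExprType) value for the first adapted member. You can see in the lines above that we use qi::attr() for this purpose.

  • Modern compilers and boost versions can simplify the BOOST_FUSION_ADAPT_STRUCT invocations considerably: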

    BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
    BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)
    
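  • I "upgraded" some rules to be lexemes, meaning they don't obey a skipper.

    I guessed that I should probably add the space character to the set of acceptable characters inside a terminal (string literal). If that's not what you want, just remove the last character from char_("[][]()<>\'\"=|.,;_ ");.

  • I also changed the skipper to blank_type, as that will not skip newline characters. You could use this to parse multi-line input directly with the same grammar. Do, e.g.: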

    rules = rule % qi::eol;
    

See also: Boost spirit skipper issues, for more information about skippers, lexemes and their interaction(s).
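
The difference between a "blank" skipper and a "space" skipper can be shown with the standard character classes alone. This is a minimal sketch, assuming the default "C" locale and that the blank class means space and tab only; is_blank, is_space and skip_with are hypothetical helpers, not Spirit functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Character classes, assuming the default "C" locale.
inline bool is_blank(char c) { return std::isblank(static_cast<unsigned char>(c)) != 0; }
inline bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Hypothetical helper: returns the position of the first character at or
// after `pos` that the skipper predicate does not accept.
inline std::size_t skip_with(const std::string& text, std::size_t pos,
                             bool (*skipper)(char)) {
    assert(pos <= text.size());
    while (pos < text.size() && skipper(text[pos])) ++pos;
    return pos;
}
```

For the input "a \t\n b", skipping from index 1 with is_blank stops at index 3 (the newline), so a line break stays visible to the grammar and can act as a rule separator. Skipping with is_space runs on to index 5 (the letter b), which silently swallows the line break.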

Without further ado, here is a working example:

Live On Coliru

#define BOOST_SPIRIT_DEBUG

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <list>
#include <string>
#include <vector>

namespace Parser
{
    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    enum class ExprType { Terminal, Identifier, Compound };

    static inline std::ostream& operator<<(std::ostream& os, ExprType type) {
        switch (type) {
            case ExprType::Terminal:   return os << "Terminal";
            case ExprType::Identifier: return os << "Identifier";
            case ExprType::Compound:   return os << "Compound";
        }
        return os << "(unknown)";
    }

    struct Expression { // TODO make recursive (see `boost::make_recursive_variant`)
        ExprType    type;
        std::string value;
    };

    using ExprList = std::vector<Expression>;

    struct Rule {
        std::string identifier; // lhs
        ExprList    expr_list;
    };
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)

namespace Parser
{
    typedef std::list<Rule> RuleList;

    //our grammar definition
    template <typename Iterator>
    struct Grammar: qi::grammar<Iterator, RuleList(), ascii::blank_type>
    {
        Grammar(): Grammar::base_type(rules)
        {
            qi::char_type char_;

            symbol            = char_("[][]()<>\'\"=|.,;_ ");
            character         = qi::alpha | qi::digit | symbol;
            identifier        = qi::alpha >> *(qi::alnum | char_('_'));


            // TODO capture strings including interpunction(?)
            terminal          = ('\'' >> +(character - '\'') >> '\'') 
                              | ('\"' >> +(character - '\"') >> '\"');

            expression        = qi::attr(Parser::ExprType::Terminal)   >> terminal 
                              | qi::attr(Parser::ExprType::Identifier) >> identifier 
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ] 
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ] 
                              ;

            expr_list         = expression % (char_("|,")) // TODO FIXME?
                              ;
                                    // above accepts mixed separators:
                                    //     a, b, c | d, e
                                    //
                                    // original accepted:
                                    //
                                    //     a, b, [ c | d ], e
                                    //     a| b| [ c , d ]| e
                                    //     a| b| [ c | d ]| e
                                    //     a, b, [ c , d ], e

            rule              = identifier >> '=' >> expr_list;
            //rules           = rule % qi::eol; // alternatively, parse multi-line input in one go
            rules             = +rule;

            BOOST_SPIRIT_DEBUG_NODES((rules)(rule)(expr_list)(expression)(identifier)(terminal))
        }

    private:
        qi::rule<Iterator, Expression(),  ascii::blank_type> expression;
        qi::rule<Iterator, ExprList(),    ascii::blank_type> expr_list;
        qi::rule<Iterator, Rule(),        ascii::blank_type> rule;
        qi::rule<Iterator, RuleList(),    ascii::blank_type> rules;
        // lexemes:
        qi::rule<Iterator, std::string()> terminal, identifier;
        qi::rule<Iterator, char()>        symbol, character;
    };

}

int main() {
    using It = std::string::const_iterator;

    Parser::Grammar<It> parser;
    boost::spirit::ascii::blank_type blank;

    std::string input;

    while (std::getline(std::cin, input))
    {
        if (input.empty()) {
            break;
        }

        It it = input.begin(), itEnd = input.end();

        Parser::RuleList output;
        bool result = phrase_parse(it, itEnd, parser, blank, output);
        if (result) {
            std::cout << "success\n";
            for (auto& rule : output) {
                std::cout << "\ntarget: " << rule.identifier << "\n";
                for (auto& rhs : rule.expr_list) {
                    std::cout << "rhs:    " << boost::fusion::as_vector(rhs) << "\n";
                }
            }
        } else {
            std::cout << "parse failed\n";
        }

        if (it != itEnd)
            std::cout << "remaining unparsed: '" << std::string(it, itEnd) << "'\n";
    }
}

Output:

success

target: assigned1
rhs:    (Identifier some_var)
rhs:    (Terminal a 'string' value)
rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs:    (Terminal done)
success

target: assigned2
rhs:    (Compound { a })

And with debug enabled (#define BOOST_SPIRIT_DEBUG):

<rules>
<try>assigned1 = some_var</try>
<rule>
    <try>assigned1 = some_var</try>
    <identifier>
    <try>assigned1 = some_var</try>
    <success> = some_var | "a 'st</success>
    <attributes>[[a, s, s, i, g, n, e, d, 1]]</attributes>
    </identifier>
    <expr_list>
    <try> some_var | "a 'stri</try>
    <expression>
        <try> some_var | "a 'stri</try>
        <terminal>
        <try>some_var | "a 'strin</try>
        <fail/>
        </terminal>
        <identifier>
        <try>some_var | "a 'strin</try>
        <success> | "a 'string' value</success>
        <attributes>[[s, o, m, e, _, v, a, r]]</attributes>
        </identifier>
        <success> | "a 'string' value</success>
        <attributes>[[Identifier, [s, o, m, e, _, v, a, r]]]</attributes>
    </expression>
    <expression>
        <try> "a 'string' value" </try>
        <terminal>
        <try>"a 'string' value" |</try>
        <success> | [ 'okay', "this",</success>
        <attributes>[[a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]]</attributes>
        </terminal>
        <success> | [ 'okay', "this",</success>
        <attributes>[[Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]]]</attributes>
    </expression>
    <expression>
        <try> [ 'okay', "this", i</try>
        <terminal>
        <try>[ 'okay', "this", is</try>
        <fail/>
        </terminal>
        <identifier>
        <try>[ 'okay', "this", is</try>
        <fail/>
        </identifier>
        <expr_list>
        <try> 'okay', "this", is_</try>
        <expression>
            <try> 'okay', "this", is_</try>
            <terminal>
            <try>'okay', "this", is_a</try>
            <success>, "this", is_another</success>
            <attributes>[[o, k, a, y]]</attributes>
            </terminal>
            <success>, "this", is_another</success>
            <attributes>[[Terminal, [o, k, a, y]]]</attributes>
        </expression>
        <expression>
            <try> "this", is_another,</try>
            <terminal>
            <try>"this", is_another, </try>
            <success>, is_another, identi</success>
            <attributes>[[t, h, i, s]]</attributes>
            </terminal>
            <success>, is_another, identi</success>
            <attributes>[[Terminal, [t, h, i, s]]]</attributes>
        </expression>
        <expression>
            <try> is_another, identif</try>
            <terminal>
            <try>is_another, identifi</try>
            <fail/>
            </terminal>
            <identifier>
            <try>is_another, identifi</try>
            <success>, identifier, { ( "n</success>
            <attributes>[[i, s, _, a, n, o, t, h, e, r]]</attributes>
            </identifier>
            <success>, identifier, { ( "n</success>
            <attributes>[[Identifier, [i, s, _, a, n, o, t, h, e, r]]]</attributes>
        </expression>
        <expression>
            <try> identifier, { ( "ne</try>
            <terminal>
            <try>identifier, { ( "nes</try>
            <fail/>
            </terminal>
            <identifier>
            <try>identifier, { ( "nes</try>
            <success>, { ( "nested" ) } ]</success>
            <attributes>[[i, d, e, n, t, i, f, i, e, r]]</attributes>
            </identifier>
            <success>, { ( "nested" ) } ]</success>
            <attributes>[[Identifier, [i, d, e, n, t, i, f, i, e, r]]]</attributes>
        </expression>
        <expression>
            <try> { ( "nested" ) } ] </try>
            <terminal>
            <try>{ ( "nested" ) } ] |</try>
            <fail/>
            </terminal>
            <identifier>
            <try>{ ( "nested" ) } ] |</try>
            <fail/>
            </identifier>
            <expr_list>
            <try> ( "nested" ) } ] | </try>
            <expression>
                <try> ( "nested" ) } ] | </try>
                <terminal>
                <try>( "nested" ) } ] | "</try>
                <fail/>
                </terminal>
                <identifier>
                <try>( "nested" ) } ] | "</try>
                <fail/>
                </identifier>
                <expr_list>
                <try> "nested" ) } ] | "d</try>
                <expression>
                    <try> "nested" ) } ] | "d</try>
                    <terminal>
                    <try>"nested" ) } ] | "do</try>
                    <success> ) } ] | "done"</success>
                    <attributes>[[n, e, s, t, e, d]]</attributes>
                    </terminal>
                    <success> ) } ] | "done"</success>
                    <attributes>[[Terminal, [n, e, s, t, e, d]]]</attributes>
                </expression>
                <success> ) } ] | "done"</success>
                <attributes>[[[Terminal, [n, e, s, t, e, d]]]]</attributes>
                </expr_list>
                <success> } ] | "done"</success>
                <attributes>[[Compound, [(,  , ", n, e, s, t, e, d, ",  , )]]]</attributes>
            </expression>
            <success> } ] | "done"</success>
            <attributes>[[[Compound, [(,  , ", n, e, s, t, e, d, ",  , )]]]]</attributes>
            </expr_list>
            <success> ] | "done"</success>
            <attributes>[[Compound, [{,  , (,  , ", n, e, s, t, e, d, ",  , ),  , }]]]</attributes>
        </expression>
        <success> ] | "done"</success>
        <attributes>[[[Terminal, [o, k, a, y]], [Terminal, [t, h, i, s]], [Identifier, [i, s, _, a, n, o, t, h, e, r]], [Identifier, [i, d, e, n, t, i, f, i, e, r]], [Compound, [{,  , (,  , ", n, e, s, t, e, d, ",  , ),  , }]]]]</attributes>
        </expr_list>
        <success> | "done"</success>
        <attributes>[[Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]]]</attributes>
    </expression>
    <expression>
        <try> "done"</try>
        <terminal>
        <try>"done"</try>
        <success></success>
        <attributes>[[d, o, n, e]]</attributes>
        </terminal>
        <success></success>
        <attributes>[[Terminal, [d, o, n, e]]]</attributes>
    </expression>
    <success></success>
    <attributes>[[[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]</attributes>
    </expr_list>
    <success></success>
    <attributes>[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]]</attributes>
</rule>
<rule>
    <try></try>
    <identifier>
    <try></try>
    <fail/>
    </identifier>
    <fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]]]</attributes>
</rules>
success

target: assigned1
rhs:    (Identifier some_var)
rhs:    (Terminal a 'string' value)
rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs:    (Terminal done)
<rules>
<try>assigned2 = { a }</try>
<rule>
    <try>assigned2 = { a }</try>
    <identifier>
    <try>assigned2 = { a }</try>
    <success> = { a }</success>
    <attributes>[[a, s, s, i, g, n, e, d, 2]]</attributes>
    </identifier>
    <expr_list>
    <try> { a }</try>
    <expression>
        <try> { a }</try>
        <terminal>
        <try>{ a }</try>
        <fail/>
        </terminal>
        <identifier>
        <try>{ a }</try>
        <fail/>
        </identifier>
        <expr_list>
        <try> a }</try>
        <expression>
            <try> a }</try>
            <terminal>
            <try>a }</try>
            <fail/>
            </terminal>
            <identifier>
            <try>a }</try>
            <success> }</success>
            <attributes>[[a]]</attributes>
            </identifier>
            <success> }</success>
            <attributes>[[Identifier, [a]]]</attributes>
        </expression>
        <success> }</success>
        <attributes>[[[Identifier, [a]]]]</attributes>
        </expr_list>
        <success></success>
        <attributes>[[Compound, [{,  , a,  , }]]]</attributes>
    </expression>
    <success></success>
    <attributes>[[[Compound, [{,  , a,  , }]]]]</attributes>
    </expr_list>
    <success></success>
    <attributes>[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{,  , a,  , }]]]]]</attributes>
</rule>
<rule>
    <try></try>
    <identifier>
    <try></try>
    <fail/>
    </identifier>
    <fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{,  , a,  , }]]]]]]</attributes>
</rules>
success

target: assigned2
rhs:    (Compound { a })

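Loose Ends

There are some TODOs left in the code. Implementing them would require some more effort, but I'm not sure that is actually the right direction to go in, so I'll await feedback :)

One of the elementary TODOs left would be to represent the recursive nature of expressions in the AST proper. Right now, I "copped out" by just inserting the source string for a nested Compound expression.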
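
As a rough idea of what a recursive AST could look like, here is a minimal standard-library sketch in which a Compound node keeps its parsed children instead of the raw source text. The children member and the to_string function are assumptions for illustration; the answer itself only hints at boost::make_recursive_variant, which is a different technique. A std::vector of the enclosing type as a member is formally permitted from C++17 onward.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class ExprType { Terminal, Identifier, Compound };

// Sketch of a recursive node. `children` is a hypothetical member that is
// only used when type == ExprType::Compound.
struct Expression {
    ExprType                type;
    std::string             value;     // text of a Terminal or Identifier
    std::vector<Expression> children;  // sub-expressions of a Compound
};

// Renders the tree back to text. For brevity, every Compound is printed
// with square brackets; the original bracket kind is not recorded here.
std::string to_string(const Expression& e) {
    switch (e.type) {
        case ExprType::Terminal:   return '"' + e.value + '"';
        case ExprType::Identifier: return e.value;
        case ExprType::Compound:   break;
    }
    std::string out = "[";
    for (std::size_t i = 0; i < e.children.size(); ++i) {
        if (i != 0) out += ", ";
        out += to_string(e.children[i]);
    }
    return out + "]";
}
```

A grammar producing this structure would need the expression rule to synthesize the child list rather than wrap it in qi::raw; that part is left open here, as it is in the answer.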

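【Comments】:

  • "did you mean list<Rule>() instead of list<Rule>?" Those should be functionally equivalent by now; you may be looking at an implementation bug.
  • Possibly. Likely, even, because if you look at the rest of the code, the exposed and synthesized attributes in rhs and in the grammar itself definitely don't match...
  • I didn't look at the rest of the code. That line just caught my attention because, after years of people being bitten by it, the T vs. T() design issue should finally have been resolved.
  • @K-ballo I was aware of that. I just didn't make assumptions about the boost version used, and inferred it from the fact that the OP was even able to compile his code :) That shouldn't have been possible with the bleeding-edge boost of the time.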