Getting string tokens / params as parsing

【Question title】: Getting string tokens / params as parsing
【Posted】: 2014-06-19 05:30:39
【Question】:

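I have been struggling to find a way in C++ to get tokens and parameters from my own script files. I can do this in C# with regular expressions, but I need to do it in C++.

Suppose I have this script file: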

Name = Tywin Lanninster
Age = 35

CustomAttributes = {
 Health = 100,
 Mana = 100,
 Height = 4,
 Weight = 40
}

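What I need to know is how to iterate through all of this and trigger something on each token.
For example, getting the Name value and the Age value, then getting all the data inside the block along with its values, in other words being able to parse it. I have Boost, but Spirit is not what I am looking for. I have seen that std::string has a find_first_of function, so I am looking for an answer along those lines.

I found a question similar to mine from 2007. The code given there is incomplete and therefore not enough, but it shows what I am looking for: http://www.daniweb.com/software-development/cpp/threads/76797/simple-script-parser-how-to-

Please help me!

So I did this:

This is my first parsing load loop: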

while (getline(file, text)) {
    if (!getNextToken()) continue;
    std::cout << "Token: '" << token << "'" << std::endl;
    if (token == "Name")
    {
        getString();
    }
    else {
        std::cout << "Unknown token: '" << token << "'" << std::endl;
    }
}

The function I use to get tokens; with it I can get things like Age and Name:

bool Script::getNextToken()
{
    char letter;

    for (int i = 0; i != text.size(); i++)
    {
        letter = text[i];
        if (letter == TOKEN_EQUAL)
        {
            token = text.substr(0, i-1);
            text.erase(0, i+2);
            return true;
        }
    }

    return false;
}

The code I use to get a string inside quotes:

bool Script::getString()
{
    char letter;
    std::string str;
    std::cout << "Trying to get string from: " << text << std::endl;
    for (int i = 0; i != text.size() + 1; i++)
    {
        letter = text[i];
        if (letter == TOKEN_STRING)
        {
            text.erase(0, 1);
            int pos = text.find_first_of(TOKEN_STRING);
            str = text.substr(i, pos);
            std::cout << "String Found: " << str << std::endl;
            text.erase(pos);
            std::cout << "text left: " << text << std::endl;
            break;
        }
    }
    // ... (the rest of the function, including its return value, is not shown in the post)
}

But I don't know how to get the content inside { }, because it spans several lines and I am using getline.
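One way to handle a block that spans several lines while still reading with getline is to remember which block is currently open. The following is only a minimal sketch under stated assumptions: `ParsedScript`, `trim`, `splitKeyValue` and `parseScript` are hypothetical names that do not appear in the post, every entry is assumed to sit on its own line as `Key = Value`, the opening `{` is assumed to be on the same line as its key, and nested blocks are not handled.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: strips surrounding spaces/tabs/CRs and a trailing comma.
static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t\r");
    std::size_t e = s.find_last_not_of(" \t\r,");
    if (b == std::string::npos || e == std::string::npos || e < b) return "";
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: splits "Key = Value" at the first '='.
static bool splitKeyValue(const std::string& line, std::string& key, std::string& value) {
    std::size_t eq = line.find('=');
    if (eq == std::string::npos) return false;
    key = trim(line.substr(0, eq));
    value = trim(line.substr(eq + 1));
    return !key.empty();
}

// Hypothetical result type: top-level entries plus entries grouped per { } block.
struct ParsedScript {
    std::map<std::string, std::string> values;
    std::map<std::string, std::map<std::string, std::string> > blocks;
};

ParsedScript parseScript(std::istream& in) {
    ParsedScript result;
    std::string line, key, value;
    std::string currentBlock;  // empty while we are outside a { } block
    while (std::getline(in, line)) {
        std::string t = trim(line);
        if (t.empty()) continue;                      // skip blank lines
        if (t == "}") { currentBlock.clear(); continue; }
        if (!splitKeyValue(t, key, value)) continue;  // ignore lines without '='
        if (value == "{") {
            currentBlock = key;                       // a block starts here
            result.blocks[key];                       // create it even if it stays empty
        } else if (!currentBlock.empty()) {
            result.blocks[currentBlock][key] = value; // entry inside the open block
        } else {
            result.values[key] = value;               // top-level entry
        }
    }
    return result;
}
```

Passing a std::ifstream opened on the script (or a std::istringstream for testing) to `parseScript` keeps the line-by-line loop from the question; the only extra state is the name of the block that is currently open. All values are kept as strings here, so converting `Age` or `Health` to int is left to the caller.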

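【Comments】:

Can you use regular expressions in C++11?
I have Boost 1.55, but doing it with regular expressions would simplify everything away and ignore the CustomAttributes token and the { } braces, as if it were not a parser.
I have used regular expressions in C#, but I want to interpret the file and be able to get tokens one by one, as in the link I showed.
I am thinking of reading all the characters of a std::string in a loop until the end of the string; when I hit the token (=), I take everything before it and then keep reading until a new line or another token. Any clues on this? I can simplify my question further.
If the grammar you are trying to parse is more complex than the example you gave, you should try tools such as lex and yacc. They are designed to simplify lexical and syntactic analysis and give good performance.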
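The character-by-character loop described in the comments above could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: `tokenize` is a hypothetical name, the whole file is assumed to have been read into one std::string first, and `=`, `{`, `}` and `,` are assumed to be the only special characters.

```cpp
#include <string>
#include <vector>

// Splits the script into tokens. '=', '{', '}' and ',' become one-character
// tokens; a newline also ends the current token, so a value that contains
// spaces stays in one piece up to the end of its line.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::string current;

    // Pushes the pending token (without trailing whitespace), if there is one.
    auto flush = [&]() {
        std::size_t e = current.find_last_not_of(" \t\r");
        if (e != std::string::npos) tokens.push_back(current.substr(0, e + 1));
        current.clear();
    };

    for (char c : src) {
        if (c == '=' || c == '{' || c == '}' || c == ',') {
            flush();
            tokens.push_back(std::string(1, c));
        } else if (c == '\n') {
            flush();
        } else if (current.empty() && (c == ' ' || c == '\t' || c == '\r')) {
            // skip whitespace before a token starts
        } else {
            current += c;
        }
    }
    flush();
    return tokens;
}
```

Because the tokens no longer depend on line boundaries, a second pass can walk the resulting vector and treat everything between `{` and `}` as one block, which sidesteps the getline problem from the question.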

Tags: c++ parsing token params

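【Solution 1】:

Update:

Since your file format is a bit more complex than I first thought, I gave it another try, this time with boost-spirit. Getting it to work with anything more complex than the simple tutorial examples is indeed tedious, but with the help of some stackoverflow entries I managed to get something working.

Here it is: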

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>

#include <iostream>
#include <fstream>
#include <string>

namespace data {

  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;

  struct custom_attributes {
/*
    custom_attributes(int h = 0, int m = 0, int ht = 0, int w = 0):
      health(h), mana(m), height(ht), weight(w){}
*/
    int health;
    int mana;
    int height;
    int weight;
  };

  struct player {
/*
    player(std::string n="", int a=0, custom_attributes at={}):
      name(n), age(a), attrs(at) {}
*/
    std::string name;
    int age;
    custom_attributes attrs;
  };
}  

BOOST_FUSION_ADAPT_STRUCT(
    data::custom_attributes,
    (int, health)
    (int, mana)
    (int, height)
    (int, weight)
)

BOOST_FUSION_ADAPT_STRUCT(
    data::player,
    (std::string, name)
    (int, age)
    (data::custom_attributes, attrs)
)

namespace data {
  template <typename Iterator>
  struct attributes_parser : qi::grammar<Iterator, custom_attributes(), ascii::space_type> {
    attributes_parser() : attributes_parser::base_type(start) {

      using qi::int_;
      using qi::lit;
      using qi::lexeme;
      using ascii::char_;
      using ascii::space_type;

      start %=
        lit("CustomAttributes")
        >> '=' >> '{'
        >>  lit("Health") >> '=' >> int_ >> ','
        >>  lit("Mana") >> '=' >> int_ >> ','
        >>  lit("Height") >> '=' >> int_ >> ','
        >>  lit("Weight") >> '=' >> int_
        >>  '}'
        ;
    }
    qi::rule<Iterator, custom_attributes(), ascii::space_type> start;
  };

  template <typename Iterator>
  struct player_parser : qi::grammar<Iterator, player(), ascii::space_type> {
    player_parser() : player_parser::base_type(start) {
      using qi::int_;
      using qi::lit;
      using qi::lexeme;
      using ascii::char_;
      using ascii::space_type;

      quoted_string %= lexeme[+(char_ - '"')];

      start %=
        lit("Name") >> '=' >> '"' >> quoted_string >> '"' >>
        lit("Age") >> '=' >> int_ >> attributes
        ;
    }

    qi::rule<Iterator, std::string(), ascii::space_type> quoted_string;
    qi::rule<Iterator, player(), ascii::space_type> start;
    attributes_parser<Iterator> attributes;
  };

}

int main(){

  using boost::spirit::ascii::space;
  typedef boost::spirit::istream_iterator iterator_type;

  typedef data::player_parser<iterator_type> player_parser;
  player_parser g;
  data::player plr;

  std::ifstream in("player.txt");
  in.unsetf(std::ios::skipws);

  iterator_type iter(in), end;

  bool r = phrase_parse(iter, end, g, space, plr);

  if (r && iter == end){
    std::cout << "parsing succeeded:" << std::endl;
    std::cout << "  name = " << plr.name << std::endl;
    std::cout << "  age = " << plr.age << std::endl;
    std::cout << "    health = " << plr.attrs.health << std::endl;
    std::cout << "    mana = " << plr.attrs.mana << std::endl;
    std::cout << "    height = " << plr.attrs.height << std::endl;
    std::cout << "    weight = " << plr.attrs.weight << std::endl;
    
  } else {
    std::cout << "parsing failed" << std::endl;

  }
}

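A few points:

I used Boost version 1.49 for this.

The code above works for the example given in the question, provided it is stored in a file (aptly named player.txt) and strings that are not keywords, such as the value of the Name attribute, are wrapped in double quotes.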
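For reference, a player.txt that satisfies that condition would presumably look like this. It is simply the example from the question with quotes added around the Name value; nothing else is changed.

```text
Name = "Tywin Lanninster"
Age = 35

CustomAttributes = {
 Health = 100,
 Mana = 100,
 Height = 4,
 Weight = 40
}
```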

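I chose to split the grammar in two so that you can reuse and extend the attributes part. That seemed a reasonable guess about where you are heading next. However, it means that support for boost-fusion serialization will be more difficult. There are stackoverflow entries dealing with that issue. Please check them, and ask again if you run into problems.

Old answer:

Since you were going to use regular expressions, I assume the document's syntax is simple and you want to read it line by line. In addition, the example in your question can be tokenized with spaces or carriage returns as separators.

You can simply use iostream and the >> operator, together with a set of combinators for each line format (that is, functions that each perform one simple operation and that you can combine to produce more complex ones).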

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

using std::string;
using std::istream;
using std::vector;

class parse_exn : public std::runtime_error {
    public:
        parse_exn(const string &msg) : std::runtime_error(msg) {}
};

// Reads one whitespace-delimited token and checks that it equals atom.
bool expect(istream &is, const string &atom) {
    string data;
    is >> data;
    // fail() rather than good(): the last token may set eofbit while still being read correctly
    return !is.fail() && data == atom; // or throw
}

// Reads "name = <int>" and returns the integer.
int readAttribute(istream &is, const string &name) {
    int result;
    if (expect(is, name) && expect(is, "=")) {
        is >> result;
        if (!is.fail())
            return result;
    }
    throw parse_exn("invalid attribute");
}

Yourclass readStruct(istream &is, const vector<string> &attributes) {
    // parse header (name, "=", "{")
    // parse each attribute and commas
    // parse footer ("}")
    // return the object
    // or throw an exn if something went wrong
}

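Yourclass should be a class you define that holds the data you parse. You should get the idea. Now please try to write something, and come back if you get stuck.

Note: the code above is untested.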
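As an illustration of how the combinator idea might be filled in for the CustomAttributes block, here is a hedged sketch. `Attributes`, `expectToken`, `readIntAttribute` and `readAttributes` are hypothetical names standing in for Yourclass and the helpers above, and the sketch assumes every `=`, `{` and `}` is separated from its neighbours by whitespace, as in the example file.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for "Yourclass".
struct Attributes {
    int health;
    int mana;
    int height;
    int weight;
};

// Reads one whitespace-delimited token and throws unless it equals atom.
void expectToken(std::istream& is, const std::string& atom) {
    std::string data;
    if (!(is >> data) || data != atom)
        throw std::runtime_error("expected '" + atom + "'");
}

// Reads "<name> = <int>" and returns the integer.
int readIntAttribute(std::istream& is, const std::string& name) {
    expectToken(is, name);
    expectToken(is, "=");
    int value = 0;
    if (!(is >> value))
        throw std::runtime_error("invalid value for " + name);
    return value;
}

// Reads the whole "CustomAttributes = { ... }" block.
Attributes readAttributes(std::istream& is) {
    Attributes a = Attributes();
    expectToken(is, "CustomAttributes");
    expectToken(is, "=");
    expectToken(is, "{");
    a.health = readIntAttribute(is, "Health");
    expectToken(is, ",");  // reading the int stops right before the comma
    a.mana = readIntAttribute(is, "Mana");
    expectToken(is, ",");
    a.height = readIntAttribute(is, "Height");
    expectToken(is, ",");
    a.weight = readIntAttribute(is, "Weight");
    expectToken(is, "}");
    return a;
}
```

Since operator>> skips newlines like any other whitespace, the block can span several lines without special handling. The trade-off is that input such as `Health=100` (no spaces around `=`) would be rejected, and a multi-word value like the Name in the question would need a different reader, for example std::getline on the rest of the line.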

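【Comments】:

Hello, thanks for your answer. Here is what I did:
Obviously the grammar is more complex than it looks. The code above won't do.
Hello, I am amazed this can be done with Boost! It seems to be exactly what I need, but does it work? For some reason I only get the Name attribute value, not Age or the custom attributes. If I could tell what is going wrong, I could take it from here!
I tested my code with your data, with only one difference: I put double quotes around the name. If you think that will be the only pattern, you can certainly change it and have it look for 2 words there.
It doesn't work for me. I used the same data as in the example and added quotes around the name value. It says parsing succeeded, but for some reason I get neither the custom attributes nor the Age value, only the Name attribute. I literally copy-pasted it, since I have never used Boost::Qi before.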