Answers to: Inheritance design for a tokenizer written in C++

【Question title】: Inheritance design for tokenizer written in C++
【Posted】: 2015-04-10 21:42:05
【Question】:

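I'm writing a simple parser in C++ that is meant to parse a subset of an s-expression language.

I'm trying to design the inheritance hierarchy for my tokenizer in a clean way, but I keep running into object slicing because I have been trying to avoid dynamic allocation. I want to avoid dynamic allocation so that I don't risk introducing memory leaks into my tokenizer and parser.

The overall structure is: a Parser has a Tokenizer instance. The parser calls Tokenizer::peek(), which returns the token at the head of the input. I want peek() to return a Token instance by value, rather than dynamically allocating a Token of the correct derived class and returning a pointer.

More concretely, suppose there are two token types: Int and Float. Here is an example that will hopefully clarify the problem: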

#include <iostream>
#include <string>

class Token {
public:
  virtual ~Token() {}
  virtual std::string str() { return "default"; }
};

template <typename T>
class BaseToken : public Token {
public:
  T value;
  BaseToken(const T &t) : value(t) {}
  virtual std::string str() {
    // std::to_string stands in for the undefined to_str helper
    return std::to_string(value);
  }
};

class TokenInt : public BaseToken<int> {
public:
  TokenInt(int i) : BaseToken(i) {}
};

class TokenFloat : public BaseToken<float> {
public:
  TokenFloat(float f) : BaseToken(f) {}
};

// Returning by value copies only the Token base subobject (slicing)
Token peek() {
  return TokenInt(10);
}

int main() {
  Token t = peek();  // t is a plain Token here, not a TokenInt
  std::cout << "Token is: " << t.str() << "\n";
  return 0;
}

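Clearly, the output is "Token is: default" rather than "Token is: 10", because the TokenInt is sliced down to a Token.

My question is: is there an appropriate inheritance structure or design pattern that achieves this kind of polymorphism without dynamic allocation?

【Comments】:

Use dynamic memory allocation with smart pointers. Dynamic allocation solves the object slicing problem, and smart pointers take care of the memory management.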

Tags: c++ parsing inheritance design-patterns

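【Answer 1】:

In order to return a value, you have to know how big it is. The only ways to do this are to:

1. Return a smart pointer to the base Token type, as suggested in the comments,
2. Return a union of all token types and then cast based on a type tag, or
3. Return a generic Token type that contains a type tag and the matched text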

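【Comments】:

Too bad that numbers 2 and 3 are so painful (#2 because unions are a pain, #3 because abstracting the type of the contained value is painful), because there really is no good reason that a tokenizer should have to allocate anything.
Unions are painful, especially with non-POD classes. I would suggest using a discriminated union, specifically boost::variant. It's stack based, so you avoid any dynamic allocation problems, and it's quite easy to use IMO.
@Anna, you should write up an answer. Your idea seems like a winner to me.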

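【Answer 2】:

So, expanding on my comment, you could use boost::variant. The documentation has a good tutorial (http://www.boost.org/doc/libs/1_57_0/doc/html/variant.html), but here is an example of how to use it in your case (note: I added some functionality to show how to use the very handy static_visitor).

boost::variant is also header-only, so no special care is needed when linking.

(Note: you could use boost::variant directly as your Token type; however, if you encapsulate it in a class, you can hide the use of visitors inside the class methods.)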

#include <iostream>
#include <string>
#include <sstream>
#include <boost/variant.hpp>

typedef boost::variant<std::string, int, float> TokenData;


// Define a function overloaded on the different variant contained types:
std::string type_string(int i)
{
   return "Integer";
}

std::string type_string(std::string const& s)
{
   return "String";
}

std::string type_string(float f)
{
   return "Float";
}

// Visitors implement type specific behavior. See the boost::variant docs
// for some more interesting visitors (recursive, multiple dispatch, etc)

class TypeVisitor : public boost::static_visitor<std::string>  {
public:
   template <typename T>
   std::string operator()(T const& val) const
   {
      return type_string(val);
   }
};

// Token class - no inheritance, so no possible slicing!

class Token {
public:
   template <typename T>
   Token(const T& value):
      m_value(value)
   {}

   std::string str() const {
      // Variants by default have their stream operators defined to act
      // on the contained type. You might want to just define operator<< 
      // for the Token class (see below), but I'm copying your method  
      // signature here.

      std::stringstream sstr;
      sstr << m_value;
      return sstr.str();
   }

   std::string token_type() const {
      // Note: you can actually just use m_value.type() to get the type_info for
      // the variant's type and do a lookup based on that; however, this shows how 
      // to use a static_visitor to do different things based on type
      return boost::apply_visitor(TypeVisitor(), m_value);
   }

private:
   TokenData m_value;

friend
    std::ostream& operator<<(std::ostream&, Token const&);

};

// An alternative to the "str" method
std::ostream& operator<<(std::ostream& oo, Token const& tok)
{
    return oo << tok.m_value;
}


int main(){
   Token t1(10), t2("Hello"), t3(1.5f);
   std::cout << "Token 1 is: " << t1.str() << " Type: "<< t1.token_type() << "\n";
   std::cout << "Token 2 is: " << t2.str() << " Type: "<< t2.token_type() << "\n";
   // Use Token::operator<< instead:
   std::cout << "Token 3 is: " << t3 << " Type: "<< t3.token_type() << "\n";

}

Output:

Token 1 is: 10 Type: Integer
Token 2 is: Hello Type: String
Token 3 is: 1.5 Type: Float
