Use boost::tokenizer with boost::iterator_range

【Question title】: Use boost::tokenizer with boost::iterator_range
【Posted】: 2012-10-29 01:34:01
【Question description】:

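I am using boost::tokenizer to read a CSV-like file and I store the tokens in a std::vector. It works well, but I would like to store only a boost::iterator_range for each token instead of a copy of its text.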

I tried:

#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
#include <boost/range/iterator_range.hpp>

typedef std::string::const_iterator string_iter;
typedef boost::iterator_range<string_iter> string_view;

int main(){
    std::string line;

    std::vector<string_view> contents;

    boost::tokenizer<boost::escaped_list_separator<char>, string_iter, string_view> tok(line.begin(), line.end());
    // Iterating over the tokenizer is what triggers the compile error.
    contents.assign(tok.begin(), tok.end());
}

But compilation fails:

/usr/include/boost/token_functions.hpp: In instantiation of 'bool boost::escaped_list_separator::operator()(InputIterator&, InputIterator, Token&) [with InputIterator = __gnu_cxx::__normal_iterator; Token = boost::iterator_range<__gnu_cxx::__normal_iterator>; Char = char; Traits = std::char_traits]':
/usr/include/boost/token_iterator.hpp:70:11: required from 'void boost::token_iterator::initialize() [with TokenizerFunc = boost::escaped_list_separator; Iterator = __gnu_cxx::__normal_iterator; Type = boost::iterator_range<__gnu_cxx::__normal_iterator>]'
/usr/include/boost/token_iterator.hpp:77:63: required from 'boost::token_iterator::token_iterator(TokenizerFunc, Iterator, Iterator) [with TokenizerFunc = boost::escaped_list_separator; Iterator = __gnu_cxx::__normal_iterator; Type = boost::iterator_range<__gnu_cxx::__normal_iterator>]'
/usr/include/boost/tokenizer.hpp:86:53: required from 'boost::tokenizer::iter boost::tokenizer::begin() const [with TokenizerFunc = boost::escaped_list_separator; Iterator = __gnu_cxx::__normal_iterator; Type = boost::iterator_range<__gnu_cxx::__normal_iterator>; boost::tokenizer::iter = boost::token_iterator<boost::escaped_list_separator, __gnu_cxx::__normal_iterator, boost::iterator_range<__gnu_cxx::__normal_iterator> >]'
/home/wichtounet/dev/gooda-to-afdo-converter/src/gooda_reader.cpp:58:37: required from here
/usr/include/boost/token_functions.hpp:187:16: error: no match for 'operator+=' in 'tok += (& next)->__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator*()'
/usr/include/boost/token_functions.hpp:193:11: error: no match for 'operator+=' in 'tok += (& next)->__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator*()'
/usr/include/boost/token_functions.hpp: In instantiation of 'void boost::escaped_list_separator::do_escape(iterator&, iterator, Token&) [with iterator = __gnu_cxx::__normal_iterator; Token = boost::iterator_range<__gnu_cxx::__normal_iterator>; Char = char; Traits = std::char_traits]':
/usr/include/boost/token_functions.hpp:176:11: required from 'bool boost::escaped_list_separator::operator()(InputIterator&, InputIterator, Token&) [with InputIterator = __gnu_cxx::__normal_iterator; Token = boost::iterator_range<__gnu_cxx::__normal_iterator>; Char = char; Traits = std::char_traits]'
/usr/include/boost/token_iterator.hpp:70:11: required from 'void boost::token_iterator::initialize() [with TokenizerFunc = boost::escaped_list_separator; Iterator = __gnu_cxx::__normal_iterator; Type = boost::iterator_range<__gnu_cxx::__normal_iterator>]'
/usr/include/boost/token_iterator.hpp:77:63: required from 'boost::token_iterator::token_iterator(TokenizerFunc, Iterator, Iterator) [with TokenizerFunc = boost::escaped_list_separator; Iterator = __gnu_cxx::__normal_iterator; Type = boost::iterator_range<__gnu_cxx::__normal_iterator>]'
/usr/include/boost/tokenizer.hpp:86:53: required from 'boost::tokenizer::iter boost::tokenizer::begin() const [with TokenizerFunc = boost::escaped_list_separator; Iterator = __gnu_cxx::__normal_iterator; Type = boost::iterator_range<__gnu_cxx::__normal_iterator>; boost::tokenizer::iter = boost::token_iterator<boost::escaped_list_separator, __gnu_cxx::__normal_iterator, boost::iterator_range<__gnu_cxx::__normal_iterator> >]'
/home/wichtounet/dev/gooda-to-afdo-converter/src/gooda_reader.cpp:58:37: required from here
/usr/include/boost/token_functions.hpp:130:9: error: no match for 'operator+=' in 'tok += '\012''
/usr/include/boost/token_functions.hpp:134:9: error: no match for 'operator+=' in 'tok += (& next)->__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator*()'
/usr/include/boost/token_functions.hpp:138:9: error: no match for 'operator+=' in 'tok += (& next)->__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator*()'
/usr/include/boost/token_functions.hpp:142:9: error: no match for 'operator+=' in 'tok += (& next)->__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator*()'

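I also tried computing the two iterators myself using boost::token_iterator, but so far without success.

Is there a way to get only an iterator range for each token, instead of a string, to save some performance?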

【Comments】:

Tags: c++ boost boost-tokenizer

【Solution 1】:

Ah! You need an include:

#include <iostream>
#include <boost/tokenizer.hpp>
#include <boost/range/iterator_range.hpp>
#include <string>

int main()
{
    std::string line;
    typedef std::string::const_iterator string_iter;
    typedef boost::iterator_range<string_iter> string_view;

    boost::tokenizer<boost::escaped_list_separator<char>, string_iter, string_view> tok(line.begin(), line.end());
}

This compiles fine.

【Comments】:

OK, my bad :( My example was really poor. The problem only appears when iterating over the tokenizer. I have updated my example. Sorry.
OK. But as I see it, holding on to iterators can be dangerous.
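As long as I keep the original string alive alongside the iterators, it should not cause any problems. I have already done this with boost::split.
An interesting puzzle. Apparently iterator_range does not provide a += operator. Looking at the types, I don't see why it should. Perhaps using iterator_range here should be reconsidered.
Have you tried the string algorithms or regular expressions? :) Maybe this solves it: svn.boost.org/trac/boost/ticket/3997 (or that)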

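【Solution 2】:

This won't work. The tokenizer needs a type (the third template parameter) to which it can append the results of the tokenizer function. Specifically, that type must provide operator += ( tokenizer<...>::iterator::value_type ). The code snippet below should get you a step further, although I am not sure it is worth the effort...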

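#include <algorithm>   // std::copy
#include <cstddef>     // std::size_t
#include <iostream>
#include <iterator>    // std::ostream_iterator
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
#include <boost/range/iterator_range.hpp>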

typedef std::string::const_iterator string_iter;
typedef boost::iterator_range<string_iter> string_view;

// a constant size character buffer, skips anything beyond CSize...
template< std::size_t CSize >
class assignable_view {
   std::size_t m_size;
   char m_buffer[CSize];

   friend std::ostream& operator << (std::ostream& p_out, assignable_view const & p_view)
   {
      if (p_view.m_size > 0u) {
         std::copy(p_view.m_buffer, p_view.m_buffer + p_view.m_size, std::ostream_iterator<char>(p_out));
      }
      return p_out;
   }

public:
   template <class TIter>
   void operator += (TIter p_input) 
   {
      if (m_size < CSize) {
         m_buffer[m_size++] = p_input;
      }   
   }   
   assignable_view() 
      : m_size(0u) {}
};

int main(){
    std::string line
        = "Field 1,\"putting quotes around fields, allows commas\",Field 3";

    std::vector<string_view> contents;

    boost::tokenizer<
       boost::escaped_list_separator<char>, 
       string_iter, 
       assignable_view<11>    
    > tok(line.begin(), line.end());

    for (auto const & t_s : tok) {
       std::cout << t_s << std::endl;
    }
    //contents.assign(tok.begin(), tok.end());
}
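
If quoted fields and escape sequences are not needed, the iterator pairs can also be computed by hand, without boost::tokenizer. The following is only a minimal sketch under that assumption: `field_range`, `split_ranges` and `field_text` are hypothetical names made up for this example, and it uses std::pair rather than boost::iterator_range so that it depends on the standard library alone. The same two iterators could be passed to a boost::iterator_range constructor instead.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper types for illustration; not part of Boost.
typedef std::string::const_iterator string_iter;
typedef std::pair<string_iter, string_iter> field_range;  // half-open [first, second)

// Splits `line` on `delim` and returns one iterator pair per field.
// No characters are copied. Quotes and escapes are NOT interpreted,
// unlike boost::escaped_list_separator.
// The returned iterators are valid only while `line` is alive and unmodified.
std::vector<field_range> split_ranges(const std::string& line, char delim = ',')
{
    std::vector<field_range> fields;
    string_iter field_begin = line.begin();
    for (string_iter it = line.begin(); it != line.end(); ++it) {
        if (*it == delim) {
            fields.push_back(field_range(field_begin, it));
            field_begin = it + 1;  // the next field starts after the delimiter
        }
    }
    fields.push_back(field_range(field_begin, line.end()));  // last field
    return fields;
}

// Copies a field into a std::string, for the cases where a copy is really needed.
std::string field_text(const field_range& f)
{
    return std::string(f.first, f.second);
}
```

Calling `split_ranges(line)` on a std::string that outlives the result gives one pair per field, and `field_text` copies a field only on demand. As noted in the comments above, the original string has to be kept alongside the iterators. A token with escape sequences cannot in general be described as a single contiguous range of the input, which is one reason this sketch ignores them.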

【Comments】: