Reading custom file formats in C++: answers

【Question title】: Reading custom file formats in C++
【Posted】: 2022-01-14 22:51:21
【Question】:

I am reading a configuration file in the following format into my C++ code:

# name score
Marc 19.7
Alex 3.0
Julia 21.2

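So far I have adapted a solution I found here: Parse (split) a string in C++ using string delimiter (standard C++). The code snippet below reads the file line by line, skips the first line, and calls parseDictionaryLine on every other line. That function splits the string as described in the linked thread and inserts the values into a (self-implemented) hash table.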

void parseDictionaryLine(std::string &line, std::string &delimiter, hash_table &table) {
    size_t position = 0;
    std::string name;
    float score;

    while((position = line.find(delimiter)) != std::string::npos) {
        name = line.substr(0, position);
        line.erase(0, position + delimiter.length());
        score = stof(line);
        table.hinsert(name, score);
    }
}

void loadDictionary(const std::string &path, hash_table &table) {
    std::string line;
    std::ifstream fin(path);
    std::string delimiter = " ";
    int lineNumber = 0;
    if(fin.is_open()) {
        while(getline(fin, line)) {
            if(lineNumber++ < 1) {
                continue; // first line
            }
            parseDictionaryLine(line, delimiter, table);
        }
        fin.close();
    }
    else {
        std::cerr << "Unable to open file." << std::endl;
    }
}

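My question is whether there is a more elegant way to do this in C++. In particular, is there (1) a better split function, like the one in Python, (2) a better way to test whether a line is a comment line (starting with #), such as a startsWith, and (3) maybe even a way to handle the file with an iterator, similar to a context manager in Python, that guarantees the file actually gets closed? My solution works for the simple case shown here, but it gets clumsier with more complex variations (e.g. several comment lines at unpredictable positions, or more parameters per line). It also worries me that my solution does not check whether the file actually conforms to the prescribed format (two values per line, the first a string, the second a float). Implementing those checks with my approach seems cumbersome.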

I know there are JSON and other file formats, and libraries made for this use case, but I am working with legacy code and cannot go there.

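【Comments】:

boost.org/doc/libs/1_78_0/doc/html/string_algo/… You don't need to worry about calling fin.close(); it is done automatically when your function ends.
If you know the exact format of a string, what about std::istringstream and the ordinary stream extraction operator >>? Otherwise, std::istringstream (again) and std::getline in a loop, using the delimiter as the "newline" character? And there are certainly plenty of "split by delimiter" examples online.
@AlanBirtles I'll look into the Boost solution, that seems very helpful. How does fin.close() get called automatically? If the program crashes earlier, e.g. when trying to convert a string to a float, I would guess the file never gets closed. @Some programmer dude Good point about istringstream; that is the second option, and yes, there are examples (I posted one). The whole string handling just seems a bit clumsy to me.
fin is a local object, so it is destroyed automatically when the function ends, and its destructor calls close. (The exception is a program that dies from something that is not a C++ exception, such as a segfault; in that case the operating system closes any open file handles.)
Sounds simple enough; try an SO search. You will at least find some suggestions there, or even solutions.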

Tags: c++ iterator fstream

【Solution 1】:

You can use operator>>, which splits at whitespace for you, like this:

#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

std::istringstream input{
"# name score\n"
"Marc 19.7\n"
"Alex 3.0\n"
"Julia 21.2\n"
};


auto ReadDictionary(std::istream& stream)
{
    // unordered_map has O(1) average lookup, map has O(log n) lookup,
    // so I prefer unordered maps as dictionaries.
    std::unordered_map<std::string, double> dictionary;
    std::string header;

    // read the first line from input (the comment line or header)
    std::getline(stream, header);

    std::string name;
    std::string score;

    // read name and score from the stream (>> splits at whitespace for you)
    while (stream >> name >> score)
    {
        dictionary.insert({ name, std::stod(score) });
    }

    return dictionary;
}


int main()
{
    auto dictionary = ReadDictionary(input); // todo replace with file stream

    // range based for loop : https://en.cppreference.com/w/cpp/language/range-for
    // structured binding : https://en.cppreference.com/w/cpp/language/structured_binding
    for (const auto& [name, score] : dictionary)
    {
        std::cout << name << ": " << score << "\n";
    }

    return 0;
}

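【Discussion】:

【Solution 2】:

I'll try to answer all your questions.

First, for splitting a string you should not use the linked question/answer. It dates from 2010 and is outdated; at the very least, you need to scroll all the way to the bottom, where you will find more modern answers.

In C++, many things are done with iterators, because many algorithms and constructors in C++ take iterators. So a better way to split a string is to use iterators, which then always gives you a one-liner.

Background: a std::string is also a container. You can iterate over its elements, or over things like the words or values in it. For whitespace-separated values you can use a std::istream_iterator on a std::istringstream. But for many years now there has been a dedicated iterator for iterating over patterns in a string:

std::sregex_token_iterator. And because it was designed specifically for this purpose, you should use it.

And when it is only used to split a string, the overhead of the regex is usually acceptable. That way you can split by strings, commas, colons or anything else. Example: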

#include <iostream>
#include <string>
#include <vector>
#include <regex>

const std::regex re(";");

int main() {

    // Test string to be split
    std::string test{ "Label;42;string;3.14" };

    // Split and store any number of elements in the vector. One-liner.
    // Submatch index -1 selects the text *between* the matches of re.
    std::vector<std::string> data(std::sregex_token_iterator(test.begin(), test.end(), re, -1), {});

    // Some debug output
    for (const std::string& s : data) std::cout << s << '\n';
}

So no matter how many fields there are, all of them are copied into the std::vector.

So now you have a one-line solution for splitting strings.

To check whether a line starts with a particular character such as '#', you can use

the index operator (if (string[0] == '#'))
the front function of std::string (if (string.front() == '#'))
or a regular expression

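But you need to be careful here: the string must not be empty, because calling front() on an empty string is undefined behavior. So it is better to write: if (not string.empty() and string.front() == '#')

Closing the file, or iterating over it:

If you use std::ifstream, the constructor opens the file for you and the destructor closes it automatically when the stream variable goes out of scope. The typical pattern here is: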

// C++17 if-statement with initializer: open the file and check whether it could be opened
if (std::ifstream fileStream{"test.txt"}; fileStream) {

    // Do things

}  // <-- This will close the file automatically for you

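More generally, you should use a more object-oriented approach. The data and the functions that operate on it should be encapsulated in a class. You then overload the extractor operator >> and the inserter operator << to read and write the data. That way only the class needs to know how its data is handled: if you decide to use a different mechanism, you change the class and the rest of the program keeps working.

In your example the input and output are so simple that plain stream IO is enough. There is no need to split strings at all.

Please see the example below.

Note in particular how few statements there are in main.

If you change something inside the classes, main keeps working as it is.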

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>

// Data in one line
struct Data {
    // Name and score
    std::string name{};
    double score{};

    // Extractor and inserter
    friend std::istream& operator >> (std::istream& is, Data& d) { return is >> d.name >> d.score; }
    friend std::ostream& operator << (std::ostream& os, const Data& d) { return os << d.name << '\t' << d.score; }
};

// Database: all data from the source file
struct DataBase {
    std::vector<Data> data{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, DataBase& d) {
        // Clear old data
        d.data.clear();

        // Read all lines from source stream
        for (std::string line{}; std::getline(is, line);) {

            // Ignore empty and comment lines
            if (not line.empty() and line.front() != '#') {

                // Call the extractor of the Data class to get the data.
                // Only store the element if the line could be parsed; malformed lines are skipped.
                if (Data element{}; std::istringstream(line) >> element) {

                    // And save the new data in the database
                    d.data.push_back(std::move(element));
                }
            }
        }
        return is;
    }
    // Inserter. Output all data
    friend std::ostream& operator << (std::ostream& os, const DataBase& d) {
        std::copy(d.data.begin(), d.data.end(), std::ostream_iterator<Data>(os, "\n"));
        return os;
    }
};

int main() {

    // Open the file and check whether it is open
    if (std::ifstream ifs{ "test.txt" }; ifs) {

        // Our database
        DataBase db{};

        // Read all data
        ifs >> db;

        // Debug output: show all data
        std::cout << db;
    }
    else std::cerr << "\nError: Could not open source file\n";
}

【Discussion】: