String gets junk on end after conversion to c_str()

【Question title】: String gets junk on end after conversion to c_str()
【Posted】: 2010-12-07 17:16:49
【Question】:

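This is a homework assignment, for anyone who is wondering.

I'm writing a vocabulary translator (English -> German and vice versa) and am supposed to save everything the user does to a file. Simple enough.

This is the code: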

std::string file_name(user_name + ".reg");
std::ifstream file(file_name.c_str(), std::ios::binary | std::ios::ate);
// At this point, we have already verified the file exists. This shouldn't ever throw!
// Possible scenario:  user deletes file between calls.
assert( file.is_open() );

// Get the length of the file and reset the seek.
size_t length = file.tellg();
file.seekg(0, std::ios::beg);

// Create and write to the buffer.
char *buffer = new char[length];
file.read(buffer, length);
file.close();

// Find the last comma, after which comes the current dictionary.
std::string strBuffer = buffer;
size_t position = strBuffer.find_last_of(',') + 1;
curr_dict_ = strBuffer.substr(position);

// Start the trainer; import the dictionary.
trainer_.reset( new Trainer(curr_dict_.c_str()) );

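The problem is evidently curr_dict_, which is supposed to store the name of my dictionary. For example, my teacher has a dictionary file named 10WS_PG2_P4_de_en_gefuehle.txt. The Trainer imports the entire contents of the dictionary file like this: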

std::string s_word_de;
std::string s_word_en;
std::string s_discard;
std::string s_count;
int i_word;

std::ifstream in(dictionaryDescriptor);

if( in.is_open() )
{
    getline(in, s_discard); // Discard first line.
    while( in >> i_word &&
        getline(in, s_word_de, '<') &&
        getline(in, s_discard, '>') &&
        getline(in, s_word_en, '(') &&
        getline(in, s_count, ')') )
    {   
        dict_.push_back(NumPair(s_word_de.c_str(), s_word_en.c_str(), Utility::lexical_cast<int, std::string>(s_count)));
    }
}
else
    std::cout << dictionaryDescriptor;

A single line is written like this:

1             überglücklich <-> blissful                     (0)

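curr_dict_ seems to be imported fine, but when I output it I get a whole bunch of junk characters at the end!

I even used a hex editor to make sure the file containing the dictionary doesn't have excess characters at the end. It doesn't.

The registry file that the code at the top reads to find the dictionary: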

Christian.reg

Christian,abc123,10WS_PG2_P4_de_en_gefuehle.txt

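What am I doing wrong?

【Comments】:

This statement leaks: char *buffer = new char[length]; prefer std::vector<char> buffer(length);
Or better yet, don't read into a char buffer at all; read directly into a string. (As a bonus, that also prevents this particular bug...)
@Martin: Thanks, fixed. @Rudolph: Looks lovely; I'll try to integrate it.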

Tags: c++ file file-io

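【Answer 1】:

The read function (as in the line file.read(buffer, length);) does not nul-terminate the character buffer. You need to do that manually: allocate one extra character, and after reading put the nul at position gcount().

【Comments】:

So I haven't actually used gcount before, but from what cplusplus tells me, it just returns the position of the last character read. Sweet! How do I use it to put a null terminator into the char *?
You need to put it in manually (e.g. buffer[file.gcount()] = 0).
That is also why you need to allocate one extra character (with either the char buffer or the vector approach).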

【Answer 2】:

I would do it like this:

std::string strBuffer(length, '\0');
myread(file, &strBuffer[0], length); // guaranteed to read length bytes from file into buffer

This avoids the need for an intermediate buffer entirely.

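【Comments】:

-1: Relies on the internals of the std::string implementation. Any std::string implemented with non-contiguous storage will fail. See stackoverflow.com/questions/760790/…
@Zan Lynx: I beg to differ. 1) C++03 does require &str[0] to return a pointer to contiguous storage. 2) The string's length is not affected by the read, because the string keeps its length separately from the string data (i.e. it does not rely on the data being '\0'-terminated). The reason data(), c_str() and operator[] behave this way is to allow (but not require) implementations to provide a reference-counted version of the string. So please remove your incorrect -1.
Fine, operator[] will return contiguous storage. However, the length is still wrong. It will be set to the length of the file, but there is no guarantee that read actually read that much data.
I assumed the user knows that read may return before it is finished, so a loop is needed to read the complete file. That is a different issue. The example code is (as stated above) code without error checking, so it provides exactly the same functionality as the OP's code. But just to be pedantic, I have added the loop needed to actually guarantee the file is read into the buffer.
Also consider the case where the file contents are rewritten between getting the file length and reading the data.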
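
One way to sidestep the separate length query raised in the last two comments is to read until end-of-stream instead of sizing the buffer up front. This is a sketch of a different technique from the one in the answer, and the helper name slurp is hypothetical. It does not make concurrent rewrites of the file safe, but the resulting string's length always matches what was read.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copies everything remaining in `in` into a string.
// No tellg()/seekg() call is needed, so there is no stale length to trust.
std::string slurp(std::istream& in)
{
    std::ostringstream contents;
    contents << in.rdbuf();  // stream the whole underlying buffer across
    return contents.str();
}
```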