What is the most suitable type of vector to keep the bytes of a file? (Answers)

【Question title】: What is the most suitable type of vector to keep the bytes of a file?
【Posted】: 2016-10-14 18:56:27
【Question description】:

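What is the most suitable type of vector to keep the bytes of a file?

I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted as 0!

The goal is to save this data (bytes) to a file and later retrieve it from that file.

NOTE: The file contains null bytes ("00000000" bits)!

I'm a bit lost. Help me! =D Thanks!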

UPDATE I:

To read the file I'm using this function:

char* readFileBytes(const char *name){
    std::ifstream fl(name);
    fl.seekg( 0, std::ios::end );
    size_t len = fl.tellg();
    char *ret = new char[len];
    fl.seekg(0, std::ios::beg);
    fl.read(ret, len);
    fl.close();
    return ret;
}

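NOTE I: I need to find a way to make sure the bits "00000000" can be recovered from the file!

NOTE II: Any suggestions for safely saving these "00000000" bits to a file?

NOTE III: When using a char array, I have problems converting the bits "00000000" to that type.

Code snippet: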

int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7]     ) | 
                (bit8Array[6] << 1) | 
                (bit8Array[5] << 2) | 
                (bit8Array[4] << 3) | 
                (bit8Array[3] << 4) | 
                (bit8Array[2] << 5) | 
                (bit8Array[1] << 6) | 
                (bit8Array[0] << 7);
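Packing the bits this way does yield a zero byte, and zero is a perfectly valid byte value; trouble only starts if that zero is later treated as a string terminator. A minimal sketch, assuming the bits arrive most significant first as in bit8Array (packBits and appendPacked are hypothetical helper names), that stores the result in a std::vector<unsigned char>, whose size() counts a zero byte like any other:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: packs 8 bits (most significant bit first, as in
// bit8Array) into one byte. An all-zero input yields the value 0,
// which is a valid byte like any other.
unsigned char packBits(const int (&bits)[8]) {
    unsigned char byte = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Shift the accumulated value left and append the next bit (0 or 1).
        byte = static_cast<unsigned char>((byte << 1) | (bits[i] & 1));
    }
    return byte;
}

// Hypothetical helper: appends the packed byte to a buffer. The vector
// tracks its own length, so a 0 byte is stored and counted normally.
void appendPacked(std::vector<unsigned char>& out, const int (&bits)[8]) {
    out.push_back(packBits(bits));
}
```

For example, the bits {0, 1, 1, 1, 1, 0, 0, 0} pack to 0x78 ('x'), and {0, 0, 0, 0, 0, 0, 0, 0} packs to 0, which still increases the vector's size by one.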

UPDATE II:

Following @chqrlie's advice:

#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstdio>   // printf
#include <cstring>
#include <iterator>

std::vector<unsigned char> readFileBytes(const char* filename)
{
    // Open the file.
    std::ifstream file(filename, std::ios::binary);

    // Stop eating new lines in binary mode!
    file.unsetf(std::ios::skipws);

    // Get its size
    std::streampos fileSize;

    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // Reserve capacity.
    std::vector<unsigned char> unsignedCharVec;
    unsignedCharVec.reserve(fileSize);

    // Read the data.
    unsignedCharVec.insert(unsignedCharVec.begin(),
               std::istream_iterator<unsigned char>(file),
               std::istream_iterator<unsigned char>());

    return unsignedCharVec;
}

int main(){

    std::vector<unsigned char> unsignedCharVec;

    // txt file contents "xz"
    unsignedCharVec=readFileBytes("xz.txt");

    // Letters -> UTF8/HEX -> bits!
    // x -> 78 -> 0111 1000
    // z -> 7a -> 0111 1010

    for(unsigned char c : unsignedCharVec){
        printf("%c\n", c);
        for(int o=7; o >= 0; o--){
            printf("%i", ((c >> o) & 1));
        }
        printf("%s", "\n");
    }

    // Prints...
    // x
    // 01111000
    // z
    // 01111010

    return 0;
}
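The istream_iterator approach above works once skipws is cleared, but it still goes through formatted extraction. A commonly used alternative, sketched here with a hypothetical function name readFileBytesBuf, is std::istreambuf_iterator<char>, which reads raw characters straight from the stream buffer and never skips whitespace:

```cpp
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical variant of readFileBytes: std::istreambuf_iterator reads
// raw characters from the stream buffer, so no whitespace is skipped and
// the skipws flag does not need to be cleared. Zero bytes are kept.
std::vector<unsigned char> readFileBytesBuf(const char* filename) {
    std::ifstream file(filename, std::ios::binary);
    if (!file) {
        return {};  // could not open the file: return an empty vector
    }
    // The range constructor converts each char to unsigned char.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}
```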

UPDATE III:

This is the code I use to write the binary file:

void writeFileBytes(const char* filename, std::vector<unsigned char>& fileBytes){
    std::ofstream file(filename, std::ios::out|std::ios::binary);
    file.write(fileBytes.size() ? (char*)&fileBytes[0] : 0, 
               std::streamsize(fileBytes.size()));
}

writeFileBytes("xz.bin", fileBytesOutput);
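The same write can also report failure to the caller. A minimal sketch; writeFileBytesChecked is a hypothetical name, and the const reference, the empty-vector guard and the close() check are additions not in the original:

```cpp
#include <fstream>
#include <vector>

// Hypothetical variant of writeFileBytes: takes the data by const
// reference and reports whether the bytes reached the file.
bool writeFileBytesChecked(const char* filename,
                           const std::vector<unsigned char>& fileBytes) {
    std::ofstream file(filename, std::ios::out | std::ios::binary);
    if (!file) {
        return false;  // the file could not be opened
    }
    if (!fileBytes.empty()) {
        // ofstream::write takes const char*, so the unsigned char buffer is
        // reinterpreted as char; in binary mode every byte, 0 included, is
        // written unchanged.
        file.write(reinterpret_cast<const char*>(fileBytes.data()),
                   static_cast<std::streamsize>(fileBytes.size()));
    }
    // close() flushes the buffer and sets failbit if that fails.
    file.close();
    return !file.fail();
}
```

Called as writeFileBytesChecked("xz.bin", fileBytesOutput), it writes the same bytes as the original call but lets the caller detect a failed write.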

UPDATE IV:

Further reading on UPDATE III:

c++ - Save the contents of a "std::vector<unsigned char>" to a file

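CONCLUSION:

The solution to the "00000000" bits (1 byte) problem was, following my friends' guidance, to change the storage type of the file bytes to std::vector<unsigned char>. std::vector<unsigned char> is a universal type (it exists in every environment) and can hold any octet value (unlike the char* in "UPDATE I")!

Also, switching from an array (char) to a vector (unsigned char) was crucial to success! With a vector I can manipulate the data more safely and completely independently of its contents (with the char array I had problems).

Thank you very much!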

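【Question discussion】:

What are you doing with those bytes?
unsigned char will hold generic bytes.
I would use uint8_t
@NathanOliver Saving them to a file and reading that file later. Thanks!
I agree with krzaq - use uint8_t.

Tags: c++ visual-c++ byte bit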

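【Solution 1】:

Use std::vector<unsigned char>. Don't use std::uint8_t: it won't exist on systems that don't have a native hardware type of exactly 8 bits. unsigned char will always exist; it is usually the smallest addressable type the hardware supports, and it is required to be at least 8 bits wide, so if you're dealing with 8-bit bytes it will hold the bits you need.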
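The guarantees mentioned above can be written down as compile-time checks. A minimal sketch, assuming a C++11 or later compiler; allOctets is a hypothetical helper used only to show that every octet value, including 0, fits in an unsigned char:

```cpp
#include <climits>
#include <limits>
#include <vector>

// CHAR_BIT is the number of bits in a byte (a char); the standard
// requires it to be at least 8, so unsigned char holds at least 0..255.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(std::numeric_limits<unsigned char>::max() >= 255,
              "unsigned char can hold every octet value 0..255");
static_assert(sizeof(unsigned char) == 1, "sizeof(unsigned char) is 1 by definition");

// Hypothetical helper: a byte buffer holding every octet value 0..255,
// including the all-zero byte from the question.
std::vector<unsigned char> allOctets() {
    std::vector<unsigned char> v;
    for (int i = 0; i <= 255; ++i) {
        v.push_back(static_cast<unsigned char>(i));
    }
    return v;
}
```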

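If you really, really, really like fixed-width types, you could consider std::uint_least8_t, which always exists and has at least eight bits, or std::uint_fast8_t, which also has at least eight bits. But file I/O traffics in char types, and mixing char and its variants with the vaguely specified "least" and "fast" types can get confusing.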
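As an illustration of why mixing the "least"/"fast" types with char-based I/O needs care: std::uint_fast8_t may be wider than a char on some implementations, so a buffer of it is not necessarily a plain byte sequence. One option, sketched under that assumption with a hypothetical helper toFileBytes, is to narrow each value to unsigned char explicitly before writing:

```cpp
#include <cstdint>
#include <vector>

// std::uint_fast8_t is "at least 8 bits" but may be a wider type on some
// implementations, so its storage is not guaranteed to be one byte per
// value. Hypothetical helper: copies the low 8 bits of each value into an
// unsigned char buffer that can be passed to ofstream::write.
std::vector<unsigned char> toFileBytes(const std::vector<std::uint_fast8_t>& values) {
    std::vector<unsigned char> bytes;
    bytes.reserve(values.size());
    for (std::uint_fast8_t v : values) {
        // Keep only the low octet; the mask documents the intended range.
        bytes.push_back(static_cast<unsigned char>(v & 0xFFu));
    }
    return bytes;
}
```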

【Discussion】:

It looks to me like "unsigned char" is the solution for my "00000000" bits (byte). I'll run some tests and report back! Thanks! =D

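【Solution 2】:

There are 3 problems with your code:

1. You use the char type and return a char *. However, the return value is not a proper C string, because you neither allocate an extra byte for the '\0' terminator nor null-terminate it.
2. If the file may contain null bytes, you should probably use unsigned char or uint8_t to make it explicit that the array does not contain text.
3. You don't return the array size to the caller, so the caller cannot tell how long the array is. You should probably use std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new.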
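Applied to the UPDATE I function, the three points above might look like the following minimal sketch (readFileBytesSized is a hypothetical name): the file is opened in binary mode, the element type is unsigned char, and the returned std::vector carries its own length, so no terminator is needed.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical rewrite of readFileBytes from UPDATE I: binary mode (no
// newline translation), unsigned char elements, and a std::vector that
// carries its own length, so zero bytes need no special handling.
std::vector<unsigned char> readFileBytesSized(const char* name) {
    std::ifstream fl(name, std::ios::binary);
    if (!fl) {
        return {};  // could not open the file
    }
    fl.seekg(0, std::ios::end);
    const std::streamoff len = fl.tellg();
    if (len <= 0) {
        return {};  // empty file, or the size could not be determined
    }
    fl.seekg(0, std::ios::beg);

    std::vector<unsigned char> ret(static_cast<std::size_t>(len));
    // istream::read takes char*, so the buffer is reinterpreted as char.
    fl.read(reinterpret_cast<char*>(ret.data()), len);
    // If fewer bytes arrived than expected, keep only what was read.
    ret.resize(static_cast<std::size_t>(fl.gcount()));
    return ret;
}
```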

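【Discussion】:

I followed your advice. It looks to me like "unsigned char" is the solution for my "00000000" bits (byte). I'll run some tests and report back to you! Thanks! =D
@EduardoLucio Point 3 is the important one here. You need a way to know the length of the data; otherwise the convention is to mark the end with a byte whose bits are all 0. I assume that is the root of your problem. Apart from that, it doesn't matter whether you use char, unsigned char or uint8_t, except for documenting what you're doing: they all behave the same. Keeping bytes in a char array is very common, and nobody will be confused by it.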
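To make point 3 concrete, here is a small sketch (both helper names are hypothetical) contrasting a length found by scanning for a terminator with std::strlen, which stops at the first zero byte, against a std::vector, which stores its size explicitly:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: length by the C-string convention. std::strlen
// counts bytes up to, but not including, the first '\0'.
std::size_t lengthAsCString(const char* data) {
    return std::strlen(data);
}

// Hypothetical helper: length as recorded by the container itself;
// every element counts, zero bytes included.
std::size_t lengthAsVector(const std::vector<unsigned char>& data) {
    return data.size();
}
```

For the bytes 'x', 0, 'z', the C-string convention reports a length of 1, while the vector reports 3.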

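【Solution 3】:

uint8_t is the winner in my eyes:

It is exactly 8 bits, or 1 byte, long;
It is unsigned, without you having to type unsigned every time;
It is exactly the same on all platforms;
It is a generic type that doesn't imply any particular use, unlike char / unsigned char, which are associated with text characters, even though they can technically be used for any purpose, just like uint8_t.

Bottom line: uint8_t is functionally equivalent to unsigned char, but it does a better job of saying in the source code that this is data of some unspecified nature.

So use std::vector<uint8_t>.
#include <stdint.h> makes the uint8_t definition available (in C++, <cstdint> declares std::uint8_t).

P.S. As pointed out in the comments, the C++ standard defines char as 1 byte, and strictly speaking a byte does not have to be the same as an octet (8 bits). On such a hypothetical system, char would still exist and be 1 byte long, but uint8_t is defined as exactly 8 bits (an octet) and therefore might not exist (due to implementation difficulty/overhead). So char is theoretically more portable, but uint8_t is stricter and comes with broader guarantees about expected behavior.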
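The P.S. can be handled in code: <cstdint> defines the UINT8_MAX macro exactly when the implementation provides std::uint8_t. A minimal sketch, assuming C++11 or later; byte_t and ByteBuffer are hypothetical alias names, not standard ones:

```cpp
#include <cstdint>
#include <vector>

// <cstdint> defines UINT8_MAX if and only if the implementation provides
// std::uint8_t, so the macro can be used to detect the type.
// byte_t is a hypothetical alias name chosen for this sketch.
#ifdef UINT8_MAX
using byte_t = std::uint8_t;    // exactly 8 bits, no padding bits
#else
using byte_t = unsigned char;   // always exists, at least 8 bits
#endif

// Hypothetical buffer type for file bytes.
using ByteBuffer = std::vector<byte_t>;
```

On common platforms both branches end up as an 8-bit unsigned byte; the fallback only matters on the hypothetical systems described in the P.S.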

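【Discussion】:

@VioletGiraffe Yes. sizeof(char) == sizeof(signed char) == sizeof(unsigned char) == 1.
@NathanOliver: Thank you, sir, it works now. I have another question: do you know how to make music with C++
@LightnessRacesinOrbit std::cout << "\a"; will start your journey; you just need to add some pauses between the "notes"
I'm afraid that on a system where 1 char is more than 8 bits, uint8_t is either unsupported or also more than 8 bits, since a data type cannot have a sizeof
@Slava It won't exist, because uint8_t is required to be exactly 8 bits wide. An implementation can only provide the type if there is a fundamental type of the matching size.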