Big Endian or Small Endian while storing the data into Cassandra? Answers

【Question title】: Big Endian or Small Endian while storing the data into Cassandra?
【Posted】: 2013-10-01 05:45:27
【Question】:

I need to write a byte array value to Cassandra from Java code. My C++ program will then read the same byte array data back from Cassandra.

That byte array is composed of three parts, described below:

short schemaId = 32767;
long lastModifiedDate = 1379811105109L;
byte[] avroBinaryValue = os.toByteArray();

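Now I will write schemaId, lastModifiedDate and avroBinaryValue together into a single byte array and write the resulting byte array to Cassandra. My C++ program will then retrieve that byte array from Cassandra and deserialize it to extract schemaId, lastModifiedDate and avroBinaryValue.

So now I am confused: should my Java code use big endian when writing to Cassandra, or should the data be stored in little endian?

Below is the code I have so far, which serializes everything into a single byte array in Java...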

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class SerializeTest {

    public static void main(String[] args) throws Exception {

        String os = "whatever os is";
        // use an explicit charset so the bytes do not depend on the platform default
        byte[] avroBinaryValue = os.getBytes(StandardCharsets.UTF_8);

        long lastModifiedDate = 1379811105109L;
        short schemaId = 32767;

        ByteArrayOutputStream byteOsTest = new ByteArrayOutputStream();
        DataOutputStream outTest = new DataOutputStream(byteOsTest);

        // DataOutputStream writes multi-byte values high byte first (big endian)
        outTest.writeShort(schemaId); // first write schemaId
        outTest.writeLong(lastModifiedDate); // second lastModifiedDate
        outTest.writeInt(avroBinaryValue.length); // then attributeLength
        outTest.write(avroBinaryValue); // then its value
        outTest.flush();

        byte[] allWrittenBytesTest = byteOsTest.toByteArray();

        // write this allWrittenBytesTest into Cassandra

        // now deserialize it and extract everything from it
        DataInputStream inTest = new DataInputStream(new ByteArrayInputStream(allWrittenBytesTest));

        short schemaIdTest = inTest.readShort();
        long lastModifiedDateTest = inTest.readLong();

        int sizeAvroTest = inTest.readInt();
        byte[] avroBinaryValue1 = new byte[sizeAvroTest];
        // readFully fills the whole array; read() may return fewer bytes
        inTest.readFully(avroBinaryValue1, 0, sizeAvroTest);

        System.out.println(schemaIdTest);
        System.out.println(lastModifiedDateTest);
        System.out.println(new String(avroBinaryValue1, StandardCharsets.UTF_8));
    }
}
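
To see which byte order the code above actually produces, a small hex dump of the fixed-size header can help. This is only a sketch: the class and method names (EndianDump, header, toHex) are made up for illustration. It relies on the documented behaviour of DataOutputStream, which writes multi-byte values high byte first, that is, big endian.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EndianDump {

    // Serializes only schemaId and lastModifiedDate, the same way the question does.
    public static byte[] header(short schemaId, long lastModifiedDate) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(schemaId);        // 2 bytes, high byte first
            out.writeLong(lastModifiedDate); // 8 bytes, high byte first
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // not expected for an in-memory stream
            throw new UncheckedIOException(e);
        }
    }

    // Formats each byte as two upper-case hex digits separated by spaces.
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // prints: 7F FF 00 00 01 41 43 26 99 55
        System.out.println(toHex(header((short) 32767, 1379811105109L)));
    }
}
```

The most significant byte of each value comes first, so a C++ reader has to treat the stream as big endian regardless of the byte order of the machine it runs on.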

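I would also like to know whether there is a more efficient or more correct way to do this in Java. Since a C++ program has to retrieve this data from Cassandra, I do not want any problems on the C++ side either, so I am trying to make sure everything is right when I write this data to Cassandra from the Java side.

For now, to test this, I write the byte array from the Java program to a file, read the same file with the C++ program, and then deserialize the byte array accordingly.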
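
The file-based test described above could look roughly like this on the Java side. It is a sketch under assumptions: the class name RecordFileDump, its helper methods and the temporary file location are invented for illustration, and the bytes are written exactly as given so the C++ program sees the same layout it would later get from Cassandra.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RecordFileDump {

    // Writes the serialized record to a new temporary file and returns its path.
    public static Path dump(byte[] record) {
        try {
            Path file = Files.createTempFile("record", ".bin");
            Files.write(file, record); // raw bytes, no conversion of any kind
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back, e.g. to check the dump before handing it to C++.
    public static byte[] load(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] record = {0x7F, (byte) 0xFF, 0x00, 0x01}; // stand-in for the real record
        Path file = dump(record);
        System.out.println("wrote " + file + ", identical on reload: "
                + Arrays.equals(record, load(file)));
    }
}
```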

I hope my question is clear enough. Can anyone help me with this?

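【Comments】:

You do know that ByteBuffer lets you specify big or little endian directly, and that you then need to make sure you decode it correctly on the C++ side? See this question for a similar, though not identical, example.
@WhozCraig: Thanks for the suggestion. I did not know about ByteBuffer at all; I have just read up on it. It looks like I can improve my Java program by using ByteBuffer when writing to Cassandra, right? And then in my C++ program I can specify which byte order to follow on the C++ side. Could you give me an example, based on my Java solution above, of how to do the same thing with ByteBuffer? That would help me a lot. Thanks.
The question linked in my comment has a good set of examples showing how to do this with a short. You should be able to do the same for a 32-bit or 64-bit int or long. "Knowing" that the values in the byte stream are big endian or little endian (I personally prefer the former) greatly simplifies the C++ side; SO has many examples of reassembling one, or you can use @987654337 @.
@WhozCraig: I have updated my question to use ByteBuffer. Could you take a look and let me know whether I got it right? I am also not sure how to deserialize if I go the ByteBuffer route.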
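
For the ByteBuffer route discussed in these comments, a minimal sketch might look like the following. It is not the asker's updated code (which is not shown here); the class and method names are hypothetical. A ByteBuffer is big endian by default, and the order is set explicitly below only to make the choice visible.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ByteBufferRecord {

    // Layout: 2-byte schemaId, 8-byte lastModifiedDate, 4-byte length, payload bytes.
    public static byte[] encode(short schemaId, long lastModifiedDate, byte[] avroBinaryValue) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 8 + 4 + avroBinaryValue.length);
        buf.order(ByteOrder.BIG_ENDIAN); // explicit, although it is already the default
        buf.putShort(schemaId);
        buf.putLong(lastModifiedDate);
        buf.putInt(avroBinaryValue.length);
        buf.put(avroBinaryValue);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = "whatever os is".getBytes(StandardCharsets.UTF_8);
        byte[] record = encode((short) 32767, 1379811105109L, payload);

        // Deserialize with the same byte order that was used for writing.
        ByteBuffer in = ByteBuffer.wrap(record).order(ByteOrder.BIG_ENDIAN);
        short schemaId = in.getShort();
        long lastModifiedDate = in.getLong();
        byte[] value = new byte[in.getInt()];
        in.get(value); // copies exactly value.length bytes

        System.out.println(schemaId);
        System.out.println(lastModifiedDate);
        System.out.println(new String(value, StandardCharsets.UTF_8));
    }
}
```

Switching both calls to ByteOrder.LITTLE_ENDIAN would work just as well; what matters is that the writer and the C++ reader agree on one order.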

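Tags: java c++ cassandra bytearray endianness

【Solution 1】:

Why not use a serialization framework such as Google protobuf (http://code.google.com/p/protobuf/)? Then you would not have to worry about the low-level details, and you could read and write the data from any language or tool.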

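【Discussion】:

I cannot use it, because I do not want to serialize twice: my actual value is an Avro binary-encoded value. I am also not asking for an evaluation of different serialization frameworks.
As far as I know, a serialization framework is the better approach. As for the question itself, endianness does not matter as long as you store and read with the same byte order. If you are going to read the values from machines with a different endianness, use the most common one so that fewer conversions are needed.
Yes, I agree, Pradheep. But my actual value is an Avro binary-encoded value, which is itself a data serialization format. I cannot binary-encode the data again with some other serialization format. I need to merge three byte arrays into one... If I used another serialization format, I would have to serialize/deserialize twice, which is not what I want.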
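
One way to keep the reader independent of the reading machine's native byte order, as the discussion above touches on, is to reassemble each value byte by byte with shifts. The sketch below assumes the big-endian layout produced by the question's Java code; the class and method names are hypothetical, and the same shift-and-or logic carries over to C++ almost unchanged.

```java
public class PortableDecode {

    // Reassembles a 16-bit value stored high byte first.
    public static short readShortBE(byte[] b, int off) {
        return (short) (((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF));
    }

    // Reassembles a 32-bit value stored high byte first.
    public static int readIntBE(byte[] b, int off) {
        int v = 0;
        for (int i = 0; i < 4; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }

    // Reassembles a 64-bit value stored high byte first.
    public static long readLongBE(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        // schemaId 32767, lastModifiedDate 1379811105109, payload length 3
        byte[] record = {
            0x7F, (byte) 0xFF,
            0x00, 0x00, 0x01, 0x41, 0x43, 0x26, (byte) 0x99, 0x55,
            0x00, 0x00, 0x00, 0x03,
            'a', 'b', 'c'
        };
        System.out.println(readShortBE(record, 0));
        System.out.println(readLongBE(record, 2));
        System.out.println(readIntBE(record, 10));
    }
}
```

Because the decoding never reinterprets memory as a native integer, it produces the same result on big-endian and little-endian hosts, and the Avro payload after the length field is passed through untouched rather than being serialized a second time.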