Constructing and sending binary data over network: answers

【Question title】: Constructing and sending binary data over network
【Posted】: 2018-03-31 15:56:50
【Question】:

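I'm creating a command-line client for Minecraft. A full specification of the protocol can be found here: http://mc.kev009.com/Protocol. To answer your question in advance: yes, I'm a bit of a C++ noob.

I'm running into several problems implementing this protocol, each of them critical.

1. The protocol says that all types are big-endian. I don't know how I should check whether my data is little-endian and, if it is, how to convert it to big-endian.
2. The string data type is a bit weird. It's a modified UTF-8 string preceded by a short containing the string length. I don't know how I should pack this into a simple char[] array, nor how to convert my simple strings into modified UTF-8 ones.
3. Even if I knew how to convert my data to big-endian and create modified UTF-8 strings, I still wouldn't know how to pack it all into a char[] array and send it as one packet. All I've done before is simple HTTP networking, which is plain ASCII.

Explanations, links, related function names and short snippets are much appreciated!

EDIT

1 and 3 are now answered. 1 is answered below by user470379. 3 is answered by this AWESOME thread, which explains what I want to do very well: http://cboard.cprogramming.com/networking-device-communication/68196-sending-non-char*-data.html I'm still not sure about the modified UTF-8, though.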

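【Comments】:

There are already plenty of good questions and answers about endianness on SO.

Tags: c++ networking binary minecraft

【Answer 1】:

For #1, you'll need to use ntohs and friends. Use the *s (short) versions for 16-bit integers and the *l (long) versions for 32-bit integers. hton* (host to network) converts outgoing data to big-endian regardless of the endianness of the platform you're on, and ntoh* (network to host) converts incoming data back (again, regardless of platform endianness).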
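
As a rough illustration of the above, here is a minimal sketch that combines htons/htonl with memcpy to place integers into a char buffer in network byte order and read them back. The helper names put_u16, put_u32, get_u16 and get_u32 are made up for this example, and it assumes a POSIX system where these functions are declared in <arpa/inet.h>.

```cpp
#include <arpa/inet.h>  // htons, htonl, ntohs, ntohl (POSIX)
#include <cstring>      // std::memcpy
#include <stdint.h>     // uint16_t, uint32_t

// Hypothetical helpers, not part of any library.

// Store a 16-bit value at buf in network (big-endian) byte order.
// Returns a pointer just past the bytes written.
inline char* put_u16(char* buf, uint16_t host_value)
{
    uint16_t wire = htons(host_value);      // host order -> network order
    std::memcpy(buf, &wire, sizeof wire);   // copy the raw bytes into the buffer
    return buf + sizeof wire;
}

// Same as put_u16, for a 32-bit value.
inline char* put_u32(char* buf, uint32_t host_value)
{
    uint32_t wire = htonl(host_value);
    std::memcpy(buf, &wire, sizeof wire);
    return buf + sizeof wire;
}

// Read a 16-bit network-order value from buf and return it in host order.
inline uint16_t get_u16(char const* buf)
{
    uint16_t wire;
    std::memcpy(&wire, buf, sizeof wire);
    return ntohs(wire);                     // network order -> host order
}

// Same as get_u16, for a 32-bit value.
inline uint32_t get_u32(char const* buf)
{
    uint32_t wire;
    std::memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);
}
```

Because each put_* helper returns the next free position, calls can be chained to lay out several fields one after another in the same buffer.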

【Comments】:

Thanks. Any idea how to store these ints byte-wise in a char[] array?

【Answer 2】:

Off the top of my head...

const char* s;  // the string you want to send
short len = strlen(s);

// allocate a buffer with enough room for the length info and the string
char* xfer = new char[ len + sizeof(short) ];

// copy the length info into the start of the buffer
// note:  you need to handle endian-ness of the short here.
memcpy(xfer, &len, sizeof(short));

// copy the string into the buffer
strncpy(xfer + sizeof(short), s, len);

// now xfer is the string you want to send across the wire.
// it starts with a short to identify its length.
// it is NOT null-terminated.

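【Comments】:

You should really declare two variables for the length, one as size_t to hold the actual length; or at least make len an unsigned short. As it stands, this code will fail badly when strlen(s) % 65536 > 32768.
Good point. Or we could just do short len = boost::numeric_cast<short>(strlen(s));, which throws an exception if it doesn't fit in a short.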
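
The concerns raised in these comments can be sketched without Boost. The following is only one possible approach: it checks the length explicitly before narrowing and writes the two length bytes most-significant first, so no byte-order conversion is needed. pack_string is a hypothetical name, and the 0x7FFF limit assumes the protocol's length prefix is a signed 16-bit short.

```cpp
#include <stdexcept>  // std::length_error
#include <stdint.h>   // uint16_t
#include <string>
#include <vector>

// Hypothetical helper: build "2-byte big-endian length + string bytes".
// Throws std::length_error if the length does not fit a signed 16-bit short
// (assumed limit; adjust it if the protocol allows an unsigned length).
inline std::vector<char> pack_string(std::string const& s)
{
    if (s.size() > 0x7FFF)
        throw std::length_error("string too long for a 16-bit length prefix");

    uint16_t len = static_cast<uint16_t>(s.size());
    std::vector<char> out;
    out.reserve(2 + s.size());
    out.push_back(static_cast<char>(len >> 8));    // high byte first: big-endian
    out.push_back(static_cast<char>(len & 0xFF));  // then the low byte
    out.insert(out.end(), s.begin(), s.end());     // payload, not null-terminated
    return out;
}
```

Returning a std::vector<char> also avoids the manual new[]/delete[] pairing of the raw buffer version.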

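【Answer 3】:

The traditional approach is to define a C++ message structure for each protocol message and implement serialization and deserialization functions for it. For example, Login Request can be represented like this: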

#include <string>
#include <stdint.h>

struct LoginRequest
{
    int32_t protocol_version;
    std::string username;
    std::string password;
    int64_t map_seed;
    int8_t dimension;
};

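Now serialization functions are needed. First, serialization functions for integers and strings are required, since those are the member types in LoginRequest.

The integer serialization functions need to convert to and from the big-endian representation. Since the members of a message are copied to and from a buffer, the byte order can be reversed while copying: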

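#include <boost/detail/endian.hpp>
#include <boost/utility/enable_if.hpp>        // boost::enable_if
#include <boost/type_traits/is_integral.hpp>  // boost::is_integral
#include <algorithm>
#include <stddef.h>  // size_t
#include <string.h>  // memcpy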

#ifdef BOOST_LITTLE_ENDIAN

    inline void xcopy(void* dst, void const* src, size_t n)
    {
        char const* csrc = static_cast<char const*>(src);
        std::reverse_copy(csrc, csrc + n, static_cast<char*>(dst));
    }

#elif defined(BOOST_BIG_ENDIAN)

    inline void xcopy(void* dst, void const* src, size_t n)
    {
        char const* csrc = static_cast<char const*>(src);
        std::copy(csrc, csrc + n, static_cast<char*>(dst));
    }

#endif

// serialize an integer in big-endian format
// returns one past the last written byte, or >buf_end if would overflow
template<class T>
typename boost::enable_if<boost::is_integral<T>, char*>::type serialize(T val, char* buf_beg, char* buf_end)
{
    char* p = buf_beg + sizeof(T);
    if(p <= buf_end)
        xcopy(buf_beg, &val, sizeof(T));
    return p;
}

// deserialize an integer from big-endian format
// returns one past the last read byte, or >buf_end if would underflow (incomplete message)
template<class T>
typename boost::enable_if<boost::is_integral<T>, char const*>::type deserialize(T& val, char const* buf_beg, char const* buf_end)
{
    char const* p = buf_beg + sizeof(T);
    if(p <= buf_end)
        xcopy(&val, buf_beg, sizeof(T));
    return p;
}

For strings (treating modified UTF-8 the same way as asciiz strings):

// serialize a UTF-8 string
// returns one past the last written byte, or >buf_end if would overflow
char* serialize(std::string const& val, char* buf_beg, char* buf_end)
{
    int16_t len = static_cast<int16_t>(val.size()); // assumes val.size() <= 32767
    buf_beg = serialize(len, buf_beg, buf_end);
    char* p = buf_beg + len;
    if(p <= buf_end)
        memcpy(buf_beg, val.data(), len);
    return p;
}

// deserialize a UTF-8 string
// returns one past the last read byte, or >buf_end if would underflow (incomplete message)
char const* deserialize(std::string& val, char const* buf_beg, char const* buf_end)
{
    int16_t len;
    buf_beg = deserialize(len, buf_beg, buf_end);
    if(buf_beg > buf_end)
        return buf_beg; // incomplete message
    char const* p = buf_beg + len;
    if(p <= buf_end)
        val.assign(buf_beg, p);
    return p;
}
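
The string functions above copy bytes unchanged, which is fine for plain ASCII text. For other input, the following is a rough sketch of a conversion step, assuming that "modified UTF-8" here means the Java variant: U+0000 is written as the two bytes C0 80, and characters above U+FFFF are written as a surrogate pair with each half encoded in three bytes. to_modified_utf8 and append3 are hypothetical names, and the input is assumed to be well-formed standard UTF-8.

```cpp
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::invalid_argument
#include <string>

// Hypothetical helper: append a 16-bit value as a three-byte UTF-8 style sequence.
inline void append3(std::string& out, unsigned long u)
{
    out += static_cast<char>(0xE0 | (u >> 12));
    out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (u & 0x3F));
}

// Hypothetical helper: convert well-formed standard UTF-8 to Java-style modified UTF-8.
inline std::string to_modified_utf8(std::string const& utf8)
{
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c == 0x00) {                      // NUL becomes the two-byte form C0 80
            out += '\xC0';
            out += '\x80';
            i += 1;
        } else if (c < 0xF0) {                // 1- to 3-byte sequences are copied unchanged
            std::size_t n = c < 0x80 ? 1 : (c < 0xE0 ? 2 : 3);
            if (i + n > utf8.size())
                throw std::invalid_argument("truncated UTF-8 sequence");
            out.append(utf8, i, n);
            i += n;
        } else {                              // 4-byte sequence: re-encode as a surrogate pair
            if (i + 4 > utf8.size())
                throw std::invalid_argument("truncated UTF-8 sequence");
            unsigned long cp = ((c & 0x07ul) << 18)
                             | ((utf8[i + 1] & 0x3Ful) << 12)
                             | ((utf8[i + 2] & 0x3Ful) << 6)
                             |  (utf8[i + 3] & 0x3Ful);
            cp -= 0x10000;
            append3(out, 0xD800 + (cp >> 10));    // high surrogate
            append3(out, 0xDC00 + (cp & 0x3FF));  // low surrogate
            i += 4;
        }
    }
    return out;
}
```

If this assumption about the encoding holds, the converted string is what would be passed to the string serialize function, so that the length prefix counts the encoded bytes.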

And a couple of helper functors:

struct Serializer
{
    template<class T>
    char* operator()(T const& val, char* buf_beg, char* buf_end)
    {
        return serialize(val, buf_beg, buf_end);
    }
};

struct Deserializer
{
    template<class T>
    char const* operator()(T& val, char const* buf_beg, char const* buf_end)
    {
        return deserialize(val, buf_beg, buf_end);
    }
};

Now, using these primitive functions, we can easily serialize and deserialize the LoginRequest message:

template<class Iterator, class Functor>
Iterator do_io(LoginRequest& msg, Iterator buf_beg, Iterator buf_end, Functor f)
{
    buf_beg = f(msg.protocol_version, buf_beg, buf_end);
    buf_beg = f(msg.username, buf_beg, buf_end);
    buf_beg = f(msg.password, buf_beg, buf_end);
    buf_beg = f(msg.map_seed, buf_beg, buf_end);
    buf_beg = f(msg.dimension, buf_beg, buf_end);
    return buf_beg;
}

char* serialize(LoginRequest const& msg, char* buf_beg, char* buf_end)
{
    return do_io(const_cast<LoginRequest&>(msg), buf_beg, buf_end, Serializer());
}

char const* deserialize(LoginRequest& msg, char const* buf_beg, char const* buf_end)
{
    return do_io(msg, buf_beg, buf_end, Deserializer());
}

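Using the helper functors above and representing the input/output buffers as char iterator ranges, only one function template is needed to do both serialization and deserialization of the message.

Putting it all together, usage: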

int main()
{
    char buf[0x100];
    char* buf_beg = buf;
    char* buf_end = buf + sizeof buf;

    LoginRequest msg = {}; // zero-initialize so serialize() reads defined integer values

    char* msg_end_1 = serialize(msg, buf, buf_end);
    if(msg_end_1 > buf_end)
        ; // more buffer space required to serialize the message

    char const* msg_end_2 = deserialize(msg, buf_beg, buf_end);
    if(msg_end_2 > buf_end)
        ; // incomplete message, more data required
}
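
The usage above stops at filling the buffer. To actually put the serialized bytes on the wire, something along the following lines could be used. This is a minimal sketch assuming a connected POSIX stream socket; send_all is a hypothetical helper name. It loops because send() may accept fewer bytes than requested.

```cpp
#include <sys/types.h>   // ssize_t
#include <sys/socket.h>  // send (POSIX)
#include <cerrno>        // errno, EINTR
#include <cstddef>       // std::size_t

// Hypothetical helper: keep calling send() until all len bytes are written.
// Returns true on success, false if send() reports an error.
inline bool send_all(int fd, char const* data, std::size_t len)
{
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;        // interrupted by a signal: retry
            return false;        // any other error: give up
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

With the buffer from main() above, the call would look like send_all(sock, buf, msg_end_1 - buf), where sock is an already connected socket descriptor.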

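【Comments】:

You. Sir. Are. Amazing. Enjoy your +25.
Wouldn't it be better to use the default network byte-order functions, though? beej.us/guide/bgnet/output/html/multipage/htonsman.html
There are no 64-bit versions of the hton functions, only 16 and 32. betoh and htobe are just different Linux-specific names for the same functionality, with 64-bit integer support. You can also avoid them entirely by using a reversed version of memcpy() when converting big-endian to little-endian and back.
A reversed version of memcpy() would break everything if the host is already big-endian. You're right that there are no 64-bit versions, but I don't need those for this program.
The reversed version of memcpy() is used only when converting between big-endian and little-endian; when no conversion is needed (i.e. big-endian to big-endian), a plain forward memcpy() is used. Host byte order can be detected at preprocessing or compile time.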
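
For the 64-bit case discussed in these comments, another option is to avoid detecting the host byte order at all. The sketch below builds the big-endian bytes with shifts, which gives the same result on little-endian and big-endian hosts. put_u64_be and get_u64_be are hypothetical names, and this is an alternative to the xcopy approach in the answer, not a description of it.

```cpp
#include <stdint.h>  // uint64_t

// Hypothetical helper: write v into buf[0..7], most significant byte first.
inline void put_u64_be(unsigned char* buf, uint64_t v)
{
    for (int i = 0; i < 8; ++i)
        buf[i] = static_cast<unsigned char>(v >> (56 - 8 * i));
}

// Hypothetical helper: read a big-endian 64-bit value from buf[0..7].
inline uint64_t get_u64_be(unsigned char const* buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | buf[i];  // shift in one byte at a time
    return v;
}
```

The shifts operate on the value rather than on its in-memory representation, which is why no preprocessor check is required.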