Encoding a UTF-16 Byte Array into a string character C# .NET: Answers

【Question title】: Encoding a UTF-16 Byte Array into a string character C# .NET
【Posted】: 2020-07-17 13:04:32
【Question description】:

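I have a byte array that I believe correctly stores a UTF-16 encoded surrogate pair for the Unicode character 𐎑.

Running that byte array through .NET's System.Text.Encoding.Unicode.GetString() returns an unexpected result.

Actual result: ��

Expected result: 𐎑

Code example: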

byte[] inputByteArray = new byte[4];
inputByteArray[0] = 0x91;
inputByteArray[1] = 0xDF;
inputByteArray[2] = 0x00;
inputByteArray[3] = 0xD8;

// System.Text.Encoding.Unicode accepts little-endian UTF-16.
// Least significant byte first within the byte array: LSByte in [0], MSByte in [3]
string str = System.Text.Encoding.Unicode.GetString(inputByteArray);

// This returns �� rather than the expected symbol: 𐎑
Console.WriteLine(str);

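More detail on how I derived this particular byte array from the character 𐎑:

This character is in the Supplementary Multilingual Plane. Its Unicode code point is 0x10391. Encoded as a UTF-16 surrogate pair, this should be:

Subtract 0x10000 from the Unicode value: val = 0x00391 = (0x10391 - 0x10000)

High surrogate: 0xD800 = (0xD800 + (0x00391 >> 10)), the top 10 bits

Low surrogate: 0xDF91 = (0xDC00 + (0x00391 & 0b_0011_1111_1111)), the bottom 10 bits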
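
The arithmetic above can be checked with a short sketch. The class name SurrogateMath and the helper ToSurrogates are hypothetical names introduced here for illustration; the result is compared against the built-in char.ConvertFromUtf32. It only verifies the two code unit values and says nothing about how they are laid out in a byte array.

```csharp
using System;

public static class SurrogateMath
{
    // Hypothetical helper: splits a supplementary-plane code point
    // (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int val = codePoint - 0x10000;              // 20-bit offset
        char high = (char)(0xD800 + (val >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (val & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    public static void Main()
    {
        var (high, low) = ToSurrogates(0x10391);
        Console.WriteLine($"{(int)high:X4} {(int)low:X4}"); // D800 DF91

        // Cross-check against the framework's own conversion.
        string builtIn = char.ConvertFromUtf32(0x10391);
        Console.WriteLine(builtIn[0] == high && builtIn[1] == low); // True
    }
}
```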

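【Question comments】:

Tags: c# .net unicode encoding utf-16

【Solution 1】:

Encoding.Unicode is little-endian on a per-UTF-16-code-unit basis: the two bytes inside each 16-bit code unit are swapped, but the order of the code units themselves is unchanged. You still need to put the high surrogate code unit before the low surrogate code unit. Here is sample code that works: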

using System;
using System.Text;

class Test
{
    static void Main()
    {
        byte[] data =
        {
            0x00, 0xD8, // High surrogate
            0x91, 0xDF  // Low surrogate
        };
        string text = Encoding.Unicode.GetString(data);
        Console.WriteLine(char.ConvertToUtf32(text, 0)); // 66449, i.e. 0x10391
    }
}
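
One way to confirm the expected byte layout is to go in the other direction and let the framework produce the bytes. The following is a minimal sketch, assuming only the standard char.ConvertFromUtf32, Encoding.Unicode.GetBytes and BitConverter.ToString calls; ByteOrderCheck and EncodeLittleEndian are hypothetical names used for illustration.

```csharp
using System;
using System.Text;

public static class ByteOrderCheck
{
    // Hypothetical helper: encodes one code point as little-endian UTF-16 bytes.
    public static byte[] EncodeLittleEndian(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint); // high surrogate, then low
        return Encoding.Unicode.GetBytes(s);         // each code unit LSB first
    }

    public static void Main()
    {
        byte[] bytes = EncodeLittleEndian(0x10391);
        Console.WriteLine(BitConverter.ToString(bytes)); // 00-D8-91-DF

        // Decoding those bytes round-trips to the original code point.
        string roundTrip = Encoding.Unicode.GetString(bytes);
        Console.WriteLine(char.ConvertToUtf32(roundTrip, 0) == 0x10391); // True
    }
}
```

The printed order, 00 D8 followed by 91 DF, matches the array in the answer: the high surrogate's bytes come first, and only the bytes within each code unit are reversed.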

【Comments】: