How does concatenating these two strings lead to this result? Answer

【Question Title】: How does concatting these two strings lead to this result?
【Posted】: 2021-12-22 04:37:09
【Question】:

I wrote the code below to illustrate the behavior that occurs when concatenating these two strings.

const getBytes = x => {
    let buf = Buffer.from(x);
    const n = [];
    
    for (const value of buf.values()) {
        n.push(value);
    }

    return [n, parseInt(buf.toString('hex'), 16)];
};


let x = unescape('%uDB40');
let y = unescape('%uDD31');

console.log(typeof(x), typeof(y));
console.log(Buffer.from(x), getBytes(x));
console.log(Buffer.from(y), getBytes(y));
console.log(Buffer.from(x+y), getBytes(x+y));

The output is:

string string
<Buffer ef bf bd> [ [ 239, 191, 189 ], 15712189 ]
<Buffer ef bf bd> [ [ 239, 191, 189 ], 15712189 ]
<Buffer f3 a0 84 b1> [ [ 243, 160, 132, 177 ], 4087383217 ]

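I can't understand how the concatenation ends up with a completely different result, and this keeps me from porting the behavior successfully.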

【Discussion】:

Tags: javascript hex emulation

【Solution 1】:

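As the Buffer.from documentation states: "When converting between Buffers and strings, a character encoding may be specified. If no character encoding is specified, UTF-8 will be used as the default."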

To convert from a JavaScript string (UTF-16) to a Buffer (UTF-8), you wrote:

let buf = Buffer.from(x);
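Because no encoding argument is passed, that call is equivalent to Buffer.from(x, "utf8"). The minimal Node.js sketch below makes this visible and contrasts it with the "utf16le" encoding, which (as far as I know) copies the raw 16-bit code units, so lone surrogates are kept rather than replaced. The variable names are illustrative only.

```javascript
// Same strings as the question: unescape('%uDB40') and unescape('%uDD31').
const highSurrogate = "\uDB40"; // lone high surrogate
const lowSurrogate = "\uDD31";  // lone low surrogate

// No encoding argument means UTF-8, so these two buffers are identical.
const utf8Default = Buffer.from(highSurrogate + lowSurrogate);
const utf8Explicit = Buffer.from(highSurrogate + lowSurrogate, "utf8");
console.log(utf8Default.equals(utf8Explicit));

// UTF-8: a lone surrogate becomes U+FFFD; the valid pair becomes U+E0131.
console.log(Buffer.from(highSurrogate).toString("hex"));
console.log(utf8Default.toString("hex"));

// UTF-16LE: each code unit is written as two little-endian bytes, unchanged.
console.log(Buffer.from(highSurrogate, "utf16le").toString("hex"));
console.log(Buffer.from(highSurrogate + lowSurrogate, "utf16le").toString("hex"));
```

If the goal is to preserve the original code units byte for byte, passing an explicit encoding such as "utf16le" avoids the lossy replacement step.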

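If a UTF-16 to UTF-8 character conversion fails, the Unicode replacement character U+FFFD is written into the Buffer instead, as the Unicode Standard specifies. That is exactly what happens here. On its own, U+DB40 is a high surrogate and U+DD31 is a low surrogate; neither is a valid character by itself, so each is encoded as U+FFFD (ef bf bd). Concatenated, x + y forms a valid surrogate pair that encodes the single code point U+E0131 (0x10000 + (0xDB40 - 0xD800) * 0x400 + (0xDD31 - 0xDC00) = 0xE0131), whose UTF-8 encoding is f3 a0 84 b1. The second value returned by getBytes is simply those bytes read as one big-endian integer (0xefbfbd = 15712189, 0xf3a084b1 = 4087383217).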
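For porting purposes, the conversion can be reproduced by hand. The sketch below is a hypothetical helper (utf16ToUtf8Bytes is not a Node.js API); it assumes Buffer.from's behavior as seen in the question's output: a high surrogate followed by a low surrogate is combined into one code point, and every lone surrogate is replaced with U+FFFD.

```javascript
// Hypothetical helper: encode a JS string (UTF-16 code units) as UTF-8 bytes,
// combining surrogate pairs and replacing lone surrogates with U+FFFD.
function utf16ToUtf8Bytes(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: valid only when followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++; // the low surrogate has been consumed as part of the pair
      } else {
        cp = 0xfffd; // lone high surrogate
      }
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      cp = 0xfffd; // lone low surrogate
    }
    // Standard UTF-8 encoding of a single code point (1 to 4 bytes).
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}

const hi = "\uDB40";
const lo = "\uDD31";
console.log(utf16ToUtf8Bytes(hi));      // [ 239, 191, 189 ]
console.log(utf16ToUtf8Bytes(lo));      // [ 239, 191, 189 ]
console.log(utf16ToUtf8Bytes(hi + lo)); // [ 243, 160, 132, 177 ]
```

The three results match the byte arrays printed by getBytes in the question, which is a quick way to check a port in another language.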

I based this answer on these links: Node.js: Buffer, U+DB40, U+DD31, U+FFFD, U+E0131.

For a JavaScript reference: JavaScript: The Definitive Guide, by David Flanagan.

For a Unicode reference: The Unicode Standard.

【Discussion】: