由于似乎没有办法“低于” QWebChannel(无需破解 Qt 本身),因此似乎有三种选择:
创建一个单独的 WebSockets 连接。这将需要更大的非本地更改,并可能创建额外的流程。
在 C++ 端进行字节到文本解码,可能使用 QTextDecoder。那可能是最简单的。然而,有架构上的原因更喜欢在我们进行转义序列处理的地方进行文本解码。这当然更传统,并且可能更好地处理极端情况。另外,DomTerm 需要与 QtDomTerm 前端以外的其他前端一起工作。
将字节流编码为字符串序列,使用 QWebChannel 传输后者,在 JavaScript 端转换回字节,然后在 JavaScript 中处理文本解码。
我选择实施第三个选项。后端转换有一些开销。但是,我设计并实现了一种非常有效的编码:
/** Encode an arbitrary sequence of bytes as an ASCII string.
* This is used because QWebChannel doesn't have a way to transmit
* data except as strings or JSON-encoded strings.
* We restrict the encoding to ASCII (i.e. codes less then 128)
* to avoid excess bytes if the result is UTF-8-encoded.
*
* The encoding optimizes UTF-8 data, with the following byte values:
* 0-3: 1st byte of a 2-byte sequence encoding an arbitrary 8-bit byte.
* 4-7: 1st byte of a 2-byte sequence encoding a 2-byte UTF8 Latin-1 character.
* 8-13: mean the same ASCII control character
* 14: special case for ESC
* 15: followed by 2 more bytes encodes a 2-byte UTF8 sequence.
* bytes 16-31: 1st byte of a 3-byte sequence encoding a 3-byte UTF8 sequence.
* 32-127: mean the same ASCII printable character
* The only times we generate extra bytes for a valid UTF8 sequence
* if for code-points 0-7, 14-26, 28-31, 0x100-0x7ff.
* A byte that is not part of a valid UTF9 sequence may need 2 bytes.
* (A character whose encoding is partial, may also need extra bytes.)
*/
static QString encodeAsAscii(const char * buf, int len)
{
QString str;
const unsigned char *ptr = (const unsigned char *) buf;
const unsigned char *end = ptr + len;
while (ptr < end) {
unsigned char ch = *ptr++;
if (ch >= 32 || (ch >= 8 && ch <= 13)) {
// Characters in the printable ascii range plus "standard C"
// control characters are encoded as-is
str.append(QChar(ch));
} else if (ch == 27) {
// Special case for ESC, encoded as '\016'
str.append(QChar(14));
} else if ((ch & 0xD0) == 0xC0 && end - ptr >= 1
&& (ptr[0] & 0xC0) == 0x80) {
// Optimization of 2-byte UTF-8 sequence
if ((ch & 0x1C) == 0) {
// If Latin-1 encode 110000aa,10bbbbbb as 1aa,0BBBBBBB
// where BBBBBBB=48+bbbbbb
str.append(4 + QChar(ch & 3));
} else {
// Else encode 110aaaaa,10bbbbbb as '\017',00AAAAA,0BBBBBBB
// where AAAAAA=48+aaaaa;BBBBBBB=48+bbbbbb
str.append(QChar(15));
str.append(QChar(48 + (ch & 0x3F)));
}
str.append(QChar(48 + (*ptr++ & 0x3F)));
} else if ((ch & 0xF0) == 0xE0 && end - ptr >= 2
&& (ptr[0] & 0xC0) == 0x80 && (ptr[1] & 0xC0) == 0x80) {
// Optimization of 3-byte UTF-8 sequence
// encode 1110aaaa,10bbbbbb,10cccccc as AAAA,0BBBBBBB,0CCCCCCC
// where AAAA=16+aaaa;BBBBBBB=48+bbbbbb;CCCCCCC=48+cccccc
str.append(QChar(16 + (ch & 0xF)));
str.append(QChar(48 + (*ptr++ & 0x3F)));
str.append(QChar(48 + (*ptr++ & 0x3F)));
} else {
// The fall-back case - use 2 bytes for 1:
// encode aabbbbbb as 000000aa,0BBBBBBB, where BBBBBBB=48+bbbbbb
str.append(QChar((ch >> 6) & 3));
str.append(QChar(48 + (ch & 0x3F)));
}
}
return str;
}
是的,这是矫枉过正,几乎可以肯定是过早的优化,但我忍不住 :-)