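【Posted】:2019-04-15 13:31:52
【Question】:
If you save the string below in a text file encoded as UTF-8 (without a BOM) and then open the file with notepad.exe, you will see strange characters on screen. Yet Notepad decodes the same string correctly when the trailing 'a' is removed. This is very odd behavior. I am using Windows 10 1809.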
[19, 16, 12, 14, 15, 15, 12, 17, 18, 15, 14, 15, 19, 13, 20, 18, 16, 19, 14, 16, 20, 16, 18, 12, 13, 14, 15, 20, 19, 17, 14, 17, 18, 16, 13, 12, 17, 14, 16, 13, 13, 12, 15, 20, 19, 15, 19, 13, 18, 19, 17, 14, 17, 18, 12, 15, 18, 12, 19, 15, 12, 19, 18, 12, 17, 20, 14, 16, 17, 18, 15, 12, 13, 19, 18, 17, 18, 14, 19, 18, 16, 15, 18, 17, 15, 15, 19, 16, 15, 14, 19, 13, 19, 15, 17, 16, 12, 12, 18, 12, 14, 12, 16, 19, 12, 19, 12, 17, 19, 20, 19, 17, 19, 20, 16, 19, 16, 19, 16, 12, 12, 18, 19, 17, 18, 16, 12, 17, 13, 18, 20, 19, 18, 20, 14, 16, 13, 12, 12, 14, 13, 19, 17, 20, 18, 15, 12, 15, 20, 14, 16, 15, 16, 19, 20, 20, 12, 17, 13, 20, 16, 20, 13a
I would like to know whether this is a Windows bug, or whether there is anything I can do to work around it.
【Comments】:
-
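It looks like Notepad treats the whole string as fixed 2-byte units, so internally it converts the text as if it were UCS-2. The pairs "[1", "9,", " 1", "6,", " 1", "2,", " 1" map to ㅛ ⰹ ㄠ ⰶ ㄠ ⰲ ㄠ, so the first character is really '[1', the second is '9,', the third is ' 1', and so on. When you remove the last 'a', the byte count becomes odd, so Notepad can no longer split the text into 2-byte characters. Sorry if this is confusing. I only know bits and pieces and am still trying to figure it all out.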
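The byte pairing described in this comment can be checked with a short Python sketch. It only reinterprets the UTF-8 bytes of the first 14 characters as little-endian 2-byte units; it does not model whatever detection heuristic Notepad actually uses, and the helper name `can_pair_as_utf16` is made up for illustration.

```python
# First 14 characters of the string from the question (all ASCII).
prefix = "[19, 16, 12, 1"
raw = prefix.encode("utf-8")  # 14 bytes, identical to the ASCII bytes

# Reinterpret the same bytes as little-endian 2-byte code units.
misread = raw.decode("utf-16-le")
print(misread)  # the seven characters quoted in the comment


def can_pair_as_utf16(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-16LE."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


print(can_pair_as_utf16(raw))       # even byte count
print(can_pair_as_utf16(raw[:-1]))  # odd byte count, last unit is truncated
```

Dropping one byte leaves a dangling half unit, which is consistent with the observation that the string displays correctly once the trailing 'a' is removed.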
Tags: windows utf-8 character-encoding notepad