Preventing a Unicode Byte Order Mark from being written in the middle of a file

【Question title】: Preventing Unicode Byte Order Mark to be written in the middle of a file
【Posted】: 2014-07-06 09:29:59
【Question】:

This code writes two strings to a file channel:

final byte[] title = "Title: ".getBytes("UTF-16");
final byte[] body = "This is a string.".getBytes("UTF-16");
ByteBuffer titlebuf = ByteBuffer.wrap(title);
ByteBuffer bodybuf = ByteBuffer.wrap(body);
FileChannel fc = FileChannel.open(p, READ, WRITE, TRUNCATE_EXISTING);
fc.position(title.length); // second string written first, but not relevant to the problem
while (bodybuf.hasRemaining()) fc.write(bodybuf);
fc.position(0);
while (titlebuf.hasRemaining()) fc.write(titlebuf);

Each string ends up prefixed with a BOM:

[Title: ?T]  *254 255* 0 84 0 105 0 116 0 108 0 101 0 58 0 32 *254 255* 0 84

A BOM at the start of the file is fine, but one in the middle of the stream causes problems.

How can I prevent this?

【Comments】:

Tags: java unicode byte-order-mark filechannel

【Solution 1】:

The BOM bytes are inserted when you call getBytes("UTF-16"):

final byte[] title = "Title: ".getBytes("UTF-16");

Check title.length and you will see that it includes 2 extra bytes for the BOM.
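As a quick way to confirm this, the sketch below compares the encoded length of the same string with and without a BOM. The class name `BomLengthDemo` and the helper `bomOverhead` are made up for illustration; only the standard `String.getBytes(Charset)` and `StandardCharsets` constants are used.

```java
import java.nio.charset.StandardCharsets;

class BomLengthDemo {
    // Hypothetical helper: how many extra bytes "UTF-16" adds compared to "UTF-16BE".
    static int bomOverhead(String s) {
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        byte[] withoutBom = s.getBytes(StandardCharsets.UTF_16BE);
        return withBom.length - withoutBom.length;
    }

    public static void main(String[] args) {
        String s = "Title: ";
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        byte[] withoutBom = s.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(withBom.length);    // 16: 2 BOM bytes + 7 chars * 2 bytes
        System.out.println(withoutBom.length); // 14: 7 chars * 2 bytes
        // The first two bytes are the big-endian BOM, 0xFE 0xFF.
        System.out.println((withBom[0] & 0xFF) + " " + (withBom[1] & 0xFF)); // 254 255
    }
}
```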

So you can either strip the BOM from these arrays before wrapping them in a ByteBuffer, or skip over it when writing the ByteBuffer to the file.
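Both variants could look roughly like the following. This is a minimal sketch, not part of the original answer: `BomStripper` and `stripUtf16Bom` are hypothetical names, and the skip variant assumes the array is known to start with a 2-byte BOM.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomStripper {
    // Returns a copy without the leading UTF-16 BOM, or the array itself if there is none.
    static byte[] stripUtf16Bom(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            boolean bigEndianBom = b0 == 0xFE && b1 == 0xFF;
            boolean littleEndianBom = b0 == 0xFF && b1 == 0xFE;
            if (bigEndianBom || littleEndianBom) {
                return Arrays.copyOfRange(bytes, 2, bytes.length);
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] raw = "This is a string.".getBytes(StandardCharsets.UTF_16);

        // Variant 1: remove the BOM from the array, then wrap it.
        ByteBuffer stripped = ByteBuffer.wrap(stripUtf16Bom(raw));

        // Variant 2: keep the array, but wrap only the part after the BOM.
        ByteBuffer skipped = ByteBuffer.wrap(raw, 2, raw.length - 2);

        System.out.println(stripped.remaining()); // 34
        System.out.println(skipped.remaining());  // 34
    }
}
```

Only the second and later strings should be treated this way; the first string in the file can keep its BOM.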

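Another solution is to use one of the UTF-16 variants with an explicit byte order (little or big endian), which do not write a BOM. Choose the variant that matches the byte order of the rest of the file; Java's plain "UTF-16" encoder writes big-endian, so "UTF-16BE" is the matching choice if another part of the file was encoded with "UTF-16".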

final byte[] title = "Title: ".getBytes("UTF-16LE");
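Applied to the code from the question, this could look like the sketch below. It assumes the title should keep its BOM (encoded with "UTF-16", which Java writes as big-endian), so the body uses "UTF-16BE" to match. The class name `SingleBomWrite`, the method `writeAndReadBack`, and the temp file are illustrative only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

class SingleBomWrite {
    // Writes title + body so that only the start of the file carries a BOM,
    // then decodes the whole file again to check the result.
    static String writeAndReadBack(Path p) throws IOException {
        byte[] title = "Title: ".getBytes(StandardCharsets.UTF_16);             // BOM + big-endian
        byte[] body = "This is a string.".getBytes(StandardCharsets.UTF_16BE); // big-endian, no BOM

        try (FileChannel fc = FileChannel.open(p, READ, WRITE, CREATE, TRUNCATE_EXISTING)) {
            // Same order as in the question: body first, at the offset after the title.
            ByteBuffer bodybuf = ByteBuffer.wrap(body);
            fc.position(title.length);
            while (bodybuf.hasRemaining()) fc.write(bodybuf);

            ByteBuffer titlebuf = ByteBuffer.wrap(title);
            fc.position(0);
            while (titlebuf.hasRemaining()) fc.write(titlebuf);
        }
        // The UTF-16 decoder consumes the single leading BOM.
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_16);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("bom-demo", ".txt");
        try {
            System.out.println(writeAndReadBack(p)); // Title: This is a string.
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```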

If UTF-16 is not required, you can also use UTF-8:

final byte[] title = "Title: ".getBytes("UTF-8");

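【Comments】:

Thanks, all of the options work. I also read elsewhere that the BOM is written for UTF-16 only so that the two byte orders can be told apart; as you said, if one is specified explicitly, the BOM is not written.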