Accents aren't printed after UTF encoding/decoding

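【Question title】: Accents aren't printed after UTF encoding/decoding
【Posted】: 2015-04-26 14:10:04
【Question】:

I have read several articles on this whole topic, but I still don't understand what is happening here. Please see for yourself in the working example below (actually, it isn't an example; it is the complete class I'm working on, with a main() added).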

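import java.awt.Dimension;
import java.awt.LayoutManager;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import javax.swing.BoxLayout;
import javax.swing.JFrame;
import javax.swing.JTextArea;
import javax.swing.JTextField;

public class Console extends JFrame {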

    private static final long serialVersionUID = 2260047176466126845L;
    private static final String ENCODING = "UTF-8";

    private BlockingQueue<Integer> inBuffer = new LinkedBlockingQueue<Integer>();
    private JTextArea display = new JTextArea();
    private JTextField input = new JTextField();
    private ActionListener listener = new ActionListener() {

        @Override
        public void actionPerformed(ActionEvent e) {
            System.out.println("Input: " + input.getText());
            byte[] bytes = (input.getText() + "\n").getBytes(Charset.forName(ENCODING));
            input.setText("");
            System.out.println("Bytes: " + Arrays.toString(bytes));
            for(byte b : bytes) {
                inBuffer.offer((int) b);
            }
        }
    };

    public Console() {
        super("Debugging");

        LayoutManager layout = new BoxLayout(this.getContentPane(), BoxLayout.Y_AXIS);
        setLayout(layout);
        display.setPreferredSize(new Dimension(420, 210));
        display.setEditable(false);
        input.addActionListener(listener);
        input.setPreferredSize(new Dimension(420, 20));
        add(display);
        add(input);
        pack();
        setVisible(true);
    }

    public final BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    new InputStream() {

                        boolean lastWasEnd = false;

                        @Override
                        public int read() throws IOException {
                            Integer c;
                            if(lastWasEnd) {
                                lastWasEnd = false;
                                return -1;
                            }

                            try {
                                c = inBuffer.poll(10, TimeUnit.MINUTES);
                                lastWasEnd = inBuffer.isEmpty();
                                return c;
                            } catch (InterruptedException e) {
                                e.printStackTrace();
                            }

                            return -1;
                        }
                    }, Charset.forName(ENCODING)
            )
    );

    public final PrintStream out = new PrintStream(new OutputStream() {

        @Override
        public void write(int b) throws IOException {
            display.append(Character.toString((char) b));
        }

    });

    public static void main(String args[]) {
        Console cons = new Console();
        cons.out.println(">> Console started. Using charset: " + Charset.forName(ENCODING));
        while(true) {
            System.out.println("Loop");
            try {
                cons.out.println(">> " + cons.in.readLine());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

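Everything goes well until I try to write any character outside the standard ASCII range, such as (but not limited to) áéíóúñ. In those cases I get missing-character squares. I tried other encodings, to no avail.

Update:

Some specific questions:

Why doesn't specifying the charset in the InputStreamReader constructor make it decode multi-byte characters correctly?
InputStreams sometimes receive characters that are longer than one byte. How do they recognize and handle those characters?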
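
A minimal sketch bearing on the two questions above, under stated assumptions: an InputStream only ever deals in bytes and knows nothing about characters, while an InputStreamReader built with a charset reassembles multi-byte sequences, even when the underlying stream hands over one byte per call. The class name ReaderDecodingDemo and the helpers oneByteAtATime and readLineUtf8 are hypothetical and exist only for this illustration. The test string is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReaderDecodingDemo {

    // Hypothetical helper: serves the given bytes one per read() call,
    // much like a queue-backed stream would.
    static InputStream oneByteAtATime(final byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                // Mask with 0xFF so the value lies in 0..255, as the
                // InputStream.read() contract requires; -1 signals end of stream.
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    // Reads one line through an InputStreamReader that decodes UTF-8.
    static String readLineUtf8(byte[] data) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(oneByteAtATime(data), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "áéíóúñ" followed by a newline, written with escapes.
        String text = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1";
        byte[] encoded = (text + "\n").getBytes(StandardCharsets.UTF_8);
        System.out.println(encoded.length);                     // 13: six two-byte characters plus '\n'
        System.out.println(readLineUtf8(encoded).equals(text)); // true
    }
}
```

If a round trip like this succeeds, the input side is decoding correctly and the fault has to be looked for elsewhere, for instance on the output path.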

Update 2:

I completely forgot about this piece of code:

@Override
public void write(int b) throws IOException {
    display.append(Character.toString((char) b));
}

This is what was causing the trouble. I will rewrite it properly and expect no further encoding/decoding problems.
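
To illustrate why the cast in that write(int b) breaks accented characters, here is a small sketch. It assumes the bytes reaching the stream are UTF-8 (in the original class that depends on the charset the PrintStream encodes with). The class ByteCastDemo and its methods are hypothetical names used only here. Each byte of a multi-byte character is turned into its own char, and because Java bytes are signed, a byte such as 0xC3 becomes the char U+FFC3 rather than anything resembling the intended letter.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {

    // Mirrors the cast from the question: one byte becomes one char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b); // sign-extends first, so 0xC3 turns into U+FFC3
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence at once, keeping multi-byte characters intact.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] encoded = "\u00e9".getBytes(StandardCharsets.UTF_8);
        String broken = castEachByte(encoded);
        System.out.println(broken.length());                       // 2
        System.out.println(Integer.toHexString(broken.charAt(0))); // ffc3
        System.out.println(decodeUtf8(encoded).equals("\u00e9"));  // true
    }
}
```

Characters in that range often have no glyph in the display font, which would be consistent with the missing-character squares described in the question.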

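【Comments】:

Change BlockingQueue<Integer> to BlockingQueue<Character>.
I would have to convert to bytes in order to use InputStream.read(), and at that point I'd face the same problem again! (I think...)

Tags: java unicode encoding inputstream bufferedreader

【Answer 1】:

UTF-8 is a multi-byte encoding. This means that a character may have a representation longer than one byte, particularly if it is not a US-ASCII character. For reasons that are unclear, you are deliberately breaking the string down into bytes and appending them one by one. You are therefore splitting these characters into individual bytes and then treating each byte as a whole character.

This won't work if a character is longer than one byte.

Consider why you are trying to enqueue individual bytes rather than whole characters; if there is no good reason, try converting the string to characters instead of bytes.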
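
One way to follow that suggestion, sketched under assumptions: queue characters and expose them through a Reader instead of an InputStream, so no byte conversion is needed on the input side at all. CharQueueReader, submitLine and roundTrip are hypothetical names, not part of the question's class; in the console, submitLine would be called from the text field's ActionListener.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CharQueueReader extends Reader {
    private final BlockingQueue<Character> queue = new LinkedBlockingQueue<>();

    // Called by the producer: enqueue the line's characters plus a newline.
    void submitLine(String line) {
        for (char c : (line + "\n").toCharArray()) {
            queue.offer(c);
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        try {
            cbuf[off] = queue.take(); // block until at least one char is available
            int n = 1;
            Character next;
            // Then drain whatever else is already queued, without blocking.
            while (n < len && (next = queue.poll()) != null) {
                cbuf[off + n++] = next;
            }
            return n;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for input", e);
        }
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }

    // Hypothetical convenience for demonstration: push one line in, read it back.
    static String roundTrip(String line) {
        CharQueueReader source = new CharQueueReader();
        source.submitLine(line);
        try {
            return new BufferedReader(source).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1"; // "áéíóúñ"
        System.out.println(roundTrip(text).equals(text));     // true
    }
}
```

A BufferedReader can wrap any Reader, so code that calls readLine() would not need to change; only the byte-based InputStream and InputStreamReader layers go away.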

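【Comments】:

I want an interactive console that offers some kind of "standard" I/O, working the way calls to System.in and System.out do. With that in mind, I tried to use a PrintStream for output and an InputStream conveniently wrapped in a BufferedReader for input. I have to define some way to route everything from my JTextField to InputStream.read(), which returns the bytes one by one.
I thought that decoding those bytes was done automatically inside InputStreamReader(InputStream, Charset), but that doesn't seem to be the case. I also tried UTF-16 as the charset in order to force two-byte decoding, but the faulty behavior was unaffected.
Well, I completely forgot about the .append(). I was frantically rewriting the input part while the output was the problem all along. I'm marking yours as the accepted answer. Thanks.

【Answer 2】:

For the record, I ended up implementing a basic buffered OutputStream.write(), shown below, and now all I/O works fine.

This is what I wrote to fix the output. I would like to improve the end-of-line ('\n') detection so that it looks less hacky, but I haven't found a proper solution yet, so comparing with 10 will do for now.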

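// Requires java.nio.ByteBuffer, java.nio.charset.Charset and java.nio.charset.CharsetDecoder.
// Note: new PrintStream(OutputStream) encodes text with the platform default charset,
// so this only round-trips correctly when that default matches ENCODING.
public final PrintStream out = new PrintStream(new OutputStream() {
    // Assumption: the original post does not show where decoder is declared;
    // it is defined here so that the snippet compiles.
    private final CharsetDecoder decoder = Charset.forName(ENCODING).newDecoder();
    // Holds the bytes of the current line; a line longer than 8192 bytes would overflow it.
    private final ByteBuffer buffer = ByteBuffer.allocate(8192);

    @Override
    public void write(int b) throws IOException {
        buffer.put((byte) b);
        if (b == 10) { // 10 is '\n': a full line has arrived
            buffer.flip();
            // Decode the whole line at once so multi-byte characters stay intact.
            String output = decoder.decode(buffer).toString();
            display.append(output);
            buffer.clear();
        }
    }
});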
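
A possible alternative to comparing against 10, offered as a sketch rather than as what the answer's author ended up with: decode on flush() with an incremental CharsetDecoder, and let an auto-flushing PrintStream decide when to flush. DecodingOutputStream, its sink parameter and the render helper are hypothetical names; in the console the sink could be something like display::append (ideally dispatched to the Swing event thread). Passing false as the last argument of decode keeps an incomplete multi-byte sequence in the buffer until the remaining bytes arrive.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class DecodingOutputStream extends OutputStream {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private final ByteBuffer bytes = ByteBuffer.allocate(8192);
    private final CharBuffer chars = CharBuffer.allocate(8192);
    private final Consumer<String> sink; // receives decoded text

    DecodingOutputStream(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void write(int b) {
        if (!bytes.hasRemaining()) {
            flush(); // make room; at most a partial character stays behind
        }
        bytes.put((byte) b);
    }

    @Override
    public void flush() {
        bytes.flip();
        // endOfInput=false: a trailing incomplete sequence is left unconsumed.
        decoder.decode(bytes, chars, false);
        bytes.compact(); // keep the unconsumed bytes for the next call
        chars.flip();
        if (chars.hasRemaining()) {
            sink.accept(chars.toString());
        }
        chars.clear();
    }

    // Hypothetical demo helper: prints one line through a UTF-8 PrintStream
    // and returns what the sink received.
    static String render(String text) {
        StringBuilder shown = new StringBuilder();
        try (PrintStream out = new PrintStream(
                new DecodingOutputStream(shown::append), true, "UTF-8")) {
            out.println(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return shown.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1";    // "áéíóúñ"
        System.out.println(render(text).trim().equals(text));    // true
    }
}
```

Constructing the PrintStream with an explicit charset name also removes the dependence on the platform default encoding.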
