【Question title】: Xalan Transformer transform breaks Unicode 6 astral characters
【Posted】: 2015-08-22 04:05:56
【Question】:

The character U+1F48B was introduced in Unicode 6.0.

Unicode 6.0 support was introduced in Java 7.

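I can't get Xalan 2.7.2's serializer to write that character correctly; instead it writes the two UTF-16 surrogates as separate character references: &#55357;&#56459;

Downstream, this fails badly: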

org.xml.sax.SAXParseException; Character reference "&#55357" is an invalid XML character.
    at org.apache.xerces.parsers.AbstractSAXParser.parse
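
For context on why the parser objects: 55357 is 0xD83D, a UTF-16 high surrogate, and surrogate values fall outside the XML 1.0 Char production, so a character reference to one is never valid. The code point as a whole would need to be written as a single reference, &#128139;. A minimal sketch of the arithmetic follows; the class and method names are illustrative only and are not part of Xalan or the JDK.

```java
public class SurrogateMath {

    // Split a supplementary code point (above U+FFFF) into its
    // UTF-16 high and low surrogate values.
    public static int[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;
        int high = 0xD800 + (offset >> 10);   // top 10 bits
        int low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
        return new int[] { high, low };
    }

    // The decimal character reference for the whole code point.
    public static String toCharRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        int codePoint = 0x1F48B;
        int[] pair = toSurrogates(codePoint);

        System.out.println(pair[0] + " " + pair[1]);                    // 55357 56459
        System.out.println(Character.isHighSurrogate((char) pair[0])); // true
        System.out.println(toCharRef(codePoint));                      // &#128139;
    }
}
```

The two numbers printed first are exactly the values in the `chars` array of the test program, which suggests the serializer is escaping each `char` on its own instead of combining the pair first.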

By contrast, Saxon 8.7 serializes it correctly.

Does anyone know how to get Xalan to write it correctly?

Here is code that demonstrates the problem:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SurrogatePairSerialisation {

    public static String TRANSFORMER_FACTORY_PROCESSOR_XALAN = "org.apache.xalan.processor.TransformerFactoryImpl";

    public static String TRANSFORMER_FACTORY_SAXON = "net.sf.saxon.TransformerFactoryImpl";

    public static TransformerFactory transformerFactory;

    static {
        System.setProperty("javax.xml.transform.TransformerFactory",
                TRANSFORMER_FACTORY_PROCESSOR_XALAN);
    //          TRANSFORMER_FACTORY_SAXON);

        transformerFactory = javax.xml.transform.TransformerFactory.newInstance();  
    }



public static void main(String[] args) throws Exception {

    // Verify using Java 7 or greater
    System.out.println(System.getProperty("java.vendor") );
    System.out.println( System.getProperty("java.version") ); 

    char[] chars = {55357, 56459};
    int codePoint = Character.codePointAt(chars, 0);

    // Verify it's a valid code point
    System.out.println(Character.isValidCodePoint(codePoint));

    // Convert it to a string
    String astral = new String(Character.toChars(codePoint));

    // Show that we can write the string to a file
    FileOutputStream fos = new FileOutputStream(new File(System.getProperty("user.dir") + "/astral.txt"));
    fos.write(astral.getBytes("UTF-8"));
    fos.close(); // it is written as U+1F48B, as expected

    // Now show how it all falls apart with Xalan 
    // Create a DOM doc containing astral char
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    documentBuilderFactory.setNamespaceAware(true);
    DocumentBuilder db = documentBuilderFactory.newDocumentBuilder();   
    Document doc = db.newDocument();
    Element foo = doc.createElement("foo");
    doc.appendChild(foo);
    foo.setTextContent(astral);


    // Write using Transformer transform 
    FileOutputStream fos2 = new FileOutputStream(new File(System.getProperty("user.dir") + "/astral.xml"));
    writeDocument(doc, fos2);
    fos2.close(); // Xalan writes &#55357;&#56459; but Saxon 8.7 is ok


}


protected static void writeDocument(Document document, OutputStream outputStream) throws Exception {
    Transformer serializer = transformerFactory.newTransformer();

    System.out.println(serializer.getClass().getName());

    serializer.setOutputProperty(javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(javax.xml.transform.OutputKeys.OMIT_XML_DECLARATION, "yes");
    serializer.setOutputProperty(javax.xml.transform.OutputKeys.METHOD, "xml");

    serializer.transform( new DOMSource(document) , new StreamResult(outputStream) );               
}


}
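
As a stopgap, and not a fix in Xalan itself, the serialized text could be post-processed so that each adjacent high/low surrogate reference pair is merged into one reference for the real code point. The sketch below is only that: `SurrogateRefRepair` and `repair` are hypothetical names, it assumes the references are decimal (as in the output above), and it is not CDATA-aware, so a CDATA section that literally contains such text would be altered too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SurrogateRefRepair {

    // Two adjacent decimal character references, e.g. &#55357;&#56459;
    private static final Pattern TWO_REFS =
            Pattern.compile("&#(\\d{1,7});&#(\\d{1,7});");

    // Merge each high/low surrogate reference pair into a single
    // reference for the supplementary code point; leave the rest alone.
    public static String repair(String xml) {
        Matcher m = TWO_REFS.matcher(xml);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find(pos)) {
            int first = Integer.parseInt(m.group(1));
            int second = Integer.parseInt(m.group(2));
            boolean isPair = first >= 0xD800 && first <= 0xDBFF
                    && second >= 0xDC00 && second <= 0xDFFF;
            if (isPair) {
                int codePoint = Character.toCodePoint((char) first, (char) second);
                out.append(xml, pos, m.start()).append("&#").append(codePoint).append(';');
                pos = m.end();
            } else {
                // Copy through the first reference only, so the second one
                // can still be the start of a later pair.
                int afterFirst = m.end(1) + 1;
                out.append(xml, pos, afterFirst);
                pos = afterFirst;
            }
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair("<foo>&#55357;&#56459;</foo>")); // <foo>&#128139;</foo>
    }
}
```

To use something like this with the program above, the transform would have to target a `StreamResult` wrapping a `java.io.StringWriter` so the text can be repaired before it is written out as UTF-8.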

【Discussion】:

    Tags: java unicode xmlserializer xalan jaxp

