【Question title】: Java XML output of an object: special characters are not being escaped correctly
【Posted】: 2016-11-08 16:34:43
【Question description】:

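For a project I have to write a class that takes several "Page" objects, each with the parameters nameSpaceID, articleID, title and a set of strings, and outputs them to an XML file. I tried to solve this by using an XMLOutputFactory with an XMLStreamWriter to write the XML into a StringWriter, then running the StringWriter's content through a Transformer (from TransformerFactory) to get the right formatting (indentation and so on), and finally writing the result to an .xml file. So far everything works, but I need help with escaping special characters: for example, if I put a > in a title, it does not come out the way I want. I tried escaping it with StringEscapeUtils.escapeXml10(String), but that only made my output worse.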

import java.io.FileOutputStream;
import org.apache.commons.lang3.StringEscapeUtils;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * 
 */

/**
 * @author Paul
 *
 */
public class PageExport {
    /**
     * @param args
     */
    public void printPagestoXML(Page[] pages, String fileName, String filePath){
        try {
            StringWriter xmlRAW = new StringWriter();
            XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newFactory();
            xmlOutputFactory.setProperty("escapeCharacters", false);
            XMLStreamWriter xmlStreamWriter = xmlOutputFactory.createXMLStreamWriter(xmlRAW);

            xmlStreamWriter.writeStartDocument("UTF-8", "1.0");

            xmlStreamWriter.writeStartElement("pages");

            for(int i = 0; i < pages.length; i++){
                xmlStreamWriter.writeStartElement("page");
                xmlStreamWriter.writeAttribute("pageID", pages[i].getArticleID() + "");
                xmlStreamWriter.writeAttribute("namespaceID", pages[i].getNamespaceID() + "");
                xmlStreamWriter.writeAttribute("title", StringEscapeUtils.escapeXml10(pages[i].getTitle()));

                if (pages[i].getCategories() != null){
                    xmlStreamWriter.writeStartElement("categories");

                    for(int j = 0; j < pages[i].getCategories().size(); j++) {
                        xmlStreamWriter.writeEmptyElement("category");
                        xmlStreamWriter.writeAttribute("name", pages[i].getCategories().toArray()[j].toString());
                    }

                    xmlStreamWriter.writeEndElement(); //end of categories
                }

                xmlStreamWriter.writeEndElement(); //end of page i
            }
            xmlStreamWriter.writeEndElement(); //end of pages

            xmlStreamWriter.writeEndDocument(); // end of document

            xmlStreamWriter.flush();
            xmlStreamWriter.close();

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StreamResult streamResult = new StreamResult(new FileOutputStream(filePath + fileName));
            transformer.transform(new StreamSource(new StringReader(xmlRAW.getBuffer().toString())), streamResult);
        }
        catch (Exception e){
            System.out.println(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String goodFilePath = System.getProperty("user.dir") + "/src/data/";
        String goodFileName = "test.xml";
        Set<String> testCategories = new HashSet<String>();
        testCategories.add("this");
        testCategories.add("is");
        testCategories.add("sparta");
        Page[] testPages = {new Page(0, 1337, "l33t", testCategories), new Page(0, 1338, "l33t>", testCategories)};
        PageExport pe = new PageExport();
        pe.printPagestoXML(testPages, goodFileName, goodFilePath);
    }

}

Output of this code (the title of the second page is the important part):

<?xml version="1.0" encoding="UTF-8"?>
<pages>
  <page pageID="1337" namespaceID="0" title="l33t">
    <categories>
      <category name="this"/>
      <category name="is"/>
      <category name="sparta"/>
    </categories>
  </page>
  <page pageID="1338" namespaceID="0" title="l33t&amp;gt;">
    <categories>
      <category name="this"/>
      <category name="is"/>
      <category name="sparta"/>
    </categories>
  </page>
</pages>

Without StringEscapeUtils.escapeXml10(title):

<?xml version="1.0" encoding="UTF-8"?>
<pages>
  <page pageID="1337" namespaceID="0" title="l33t">
    <categories>
      <category name="this"/>
      <category name="is"/>
      <category name="sparta"/>
    </categories>
  </page>
  <page pageID="1338" namespaceID="0" title="l33t&gt;">
    <categories>
      <category name="this"/>
      <category name="is"/>
      <category name="sparta"/>
    </categories>
  </page>
</pages>

What I want:

<?xml version="1.0" encoding="UTF-8"?>
<pages>
  <page pageID="1337" namespaceID="0" title="l33t">
    <categories>
      <category name="this"/>
      <category name="is"/>
      <category name="sparta"/>
    </categories>
  </page>
  <page pageID="1338" namespaceID="0" title="l33t>">
    <categories>
      <category name="this"/>
      <category name="is"/>
      <category name="sparta"/>
    </categories>
  </page>
</pages>

Edit: I solved this by setting DOCTYPE_PUBLIC to "yes". New code:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.log4j.Logger;

/**
 * @author Paul
 *
 */

public class PageExport {

    Logger log = Logger.getLogger(PageExport.class);

    /**
     * Converts a collection of Pages into an XML String and then into an XML file.
     * 
     * @param   pages The collection of Pages that shall be written into the file.
     * @param   filepath The full path of the XML file.
     * @see     #printPagestoXML(Page[], String, String)
     * @see     Page
     * 
     */

    public void printPagestoXML(Page[] pages, String filepath){
        //Converting a single input filepath into a filepath & filename and
        //then running the method with the arguments
        String newfilepath = "";
        String[] splitpath = filepath.split("/");
        for (int i = 0; i < splitpath.length - 1 ; i++){
            newfilepath += (splitpath[i] + "/");
        }
        printPagestoXML(pages,  newfilepath, splitpath[splitpath.length - 1].split("\\.")[0]);
    }

    /**
     * Converts a collection of Pages into an XML String and then into an XML file.
     * 
     * @param   pages The collection of Pages that shall be written into the file.
     * @param   filepath The path of the XML file.
     * @param   filename Name of the .xml file (Without .xml)
     * @see     #printPagestoXML(Page[], String, String)
     * @see     Page
     * 
     */

    public void printPagestoXML(Page[] pages, String filepath, String filename){

        try {
            //Method starts off by creating a new outputfactory, that prints to a StringWriter,
            //so that the xml String can still be transformed before getting output.
            StringWriter rawXml = new StringWriter();
            XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newFactory();
            XMLStreamWriter xmlStreamWriter = xmlOutputFactory.createXMLStreamWriter(rawXml);

            xmlStreamWriter.writeStartDocument("UTF-8", "1.0"); //start of the XML stream

            xmlStreamWriter.writeStartElement("pages"); //the first element "pages"

            for(int i = 0; i < pages.length; i++){  
                //loop to create elements for all pages in the collection
                log.info("Creating Page " + i + ": " + pages[i].getTitle());
                xmlStreamWriter.writeStartElement("page");
                xmlStreamWriter.writeAttribute("pageID", pages[i].getArticleID() + "");
                xmlStreamWriter.writeAttribute("namespaceID", pages[i].getNamespaceID() + "");
                xmlStreamWriter.writeAttribute("title", pages[i].getTitle());

                if (pages[i].getCategories() != null){  
                    xmlStreamWriter.writeStartElement("categories");

                    for(int j = 0; j < pages[i].getCategories().size(); j++) {  
                        //loop to create all categories for the currently creating page
                        log.trace("Creating Category " + j + ": " + pages[i].getCategories().toArray()[j].toString());
                        xmlStreamWriter.writeEmptyElement("category");
                        xmlStreamWriter.writeAttribute("name", pages[i].getCategories().toArray()[j].toString());
                    }

                    xmlStreamWriter.writeEndElement(); //end of categories
                }
                else {
                    // in case a page has no categories, the element won't be created and a message is logged
                    log.info("Page " + (i + 1) + " does not have categories (" + pages[i].toString() + ")");
                }

                xmlStreamWriter.writeEndElement(); //end of page i
            }
            log.info("Last page written.");
            xmlStreamWriter.writeEndElement(); //end of pages
            xmlStreamWriter.writeEndDocument(); // end of document

            xmlStreamWriter.flush();
            xmlStreamWriter.close(); //close the streamwriter

            /*
             * The StringWriter variable rawXml now contains the whole XML string, but it still has to be
             * transformed, otherwise it would all be printed in one line.
             */
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");    //Setting the output properties
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");            //for the transformer
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StreamResult streamResult = new StreamResult(new FileOutputStream(filepath + filename + ".xml"));

            //initiation of the output streamresult with the filepath
            transformer.transform(new StreamSource(new StringReader(rawXml.toString())), streamResult);

            log.info(filename + ".xml created.");
            //transformation / formatting of the xml string and output into .xml file
        } catch (Exception e){
            log.warn(e.getMessage());
        }
    }
}

【Discussion】:

  • title="l33t&gt;" is a valid encoding. Any XML parser will convert it back to l33t> for you. Is there a specific reason you must have > rather than &gt;?
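The point of this comment can be checked with a minimal sketch, assuming the JDK's built-in DOM parser. The class name `EntityRoundTrip` and the helper `readTitle` are hypothetical and not part of the question's code. It parses one document with the escaped form and one with the literal form and compares the decoded attribute values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is not from the question.
class EntityRoundTrip {

    // Parses a small document and returns the decoded "title" attribute of its root element.
    static String readTitle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("title");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String escaped = readTitle("<page title=\"l33t&gt;\"/>");
        String literal = readTitle("<page title=\"l33t>\"/>");
        System.out.println(escaped);                 // prints l33t>
        System.out.println(escaped.equals(literal)); // prints true
    }
}
```

Both spellings carry the same attribute value for any consumer that reads the file with an XML parser, so the difference only matters to tools that treat the file as plain text.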

Tags: java xml


【Solution 1】:

Please read Character Data and Markup:

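The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.

The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.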

It should now be clear why it doesn't work the way you expected.
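As a rough illustration of the rule quoted above, the following sketch writes an attribute containing `<` and `&` through a default `XMLStreamWriter` and returns the serialized text. The class name `AttributeEscaping` and the method `writePage` are hypothetical. It assumes the factory's default settings, that is, without the implementation-specific `escapeCharacters` property from the question being turned off.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; the name is not from the question.
class AttributeEscaping {

    // Serializes a single <page> element whose "title" attribute holds the given raw text.
    static String writePage(String title) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            writer.writeStartElement("page");
            writer.writeAttribute("title", title); // raw text in, the writer does the escaping
            writer.writeEndElement();
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The serialized attribute value uses &lt; and &amp; instead of the raw characters.
        System.out.println(writePage("a < b & c"));
    }
}
```

The caller passes the unescaped title and lets the writer produce the markup-safe form, which is why no extra escaping step is needed before `writeAttribute`.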

【Discussion】:

    【Solution 2】:

    Add the following line to the dependencies in build.gradle:

    compile 'commons-lang:commons-lang:2.5'

    To unescape, use

    String title = StringEscapeUtils.unescapeJava(.getTitle());
    

    String title = StringEscapeUtils.unescapeJava(userProfile.getScreen_name().replace("\n", "\\n")
                        .replace("&amp;", "&"));
    

    To escape, use

    String title = StringEscapeUtils.escapeJava(xmlResponse.getTitle());
    

    String title = StringEscapeUtils.escapeJava(xmlResponse.getTitle()).replace(Constants.ESCAPED_NEWLINE, Constants.NEWLINE);
    

    【Discussion】:

    • Using StringEscapeUtils is a red herring. XMLStreamWriter.writeAttribute() will escape > regardless of what you pass to it.
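A minimal sketch of why pre-escaping makes the output worse: the writer escapes the `&` that the manual step introduced, so the value is escaped twice. The class `DoubleEscaping` and its methods are hypothetical, and `manualEscape` is only a hand-written stand-in for a library call such as `StringEscapeUtils.escapeXml10`, covering just the five predefined entities.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is not from the question.
class DoubleEscaping {

    // Stand-in for a library escaper (assumption: only the five predefined entities matter here).
    static String manualEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Writes <page title="..."/> with a default XMLStreamWriter and returns the XML text.
    static String write(String title) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            writer.writeStartElement("page");
            writer.writeAttribute("title", title);
            writer.writeEndElement();
            writer.close();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML text and returns the decoded "title" attribute.
    static String readTitle(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getAttribute("title");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String once = readTitle(write("l33t>"));                // writer escapes by itself
        String twice = readTitle(write(manualEscape("l33t>"))); // escaped manually, then again
        System.out.println(once);  // prints l33t>
        System.out.println(twice); // prints l33t&gt;
    }
}
```

Passing the raw title straight to `writeAttribute`, as the edited code in the question does, keeps the round trip lossless.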