【Question title】: Strange XML indentation
【Posted】: 2013-05-14 13:22:46
【Question】:

I'm writing out an XML file, and the indentation is coming out wrong:

<BusinessEvents>

<MailEvent>
          <to>Wellington</to>
          <weight>10.0</weight>
          <priority>air priority</priority>
          <volume>10.0</volume>
          <from>Christchurch</from>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <PPW>8.0</PPW>
          <PPV>2.5</PPV>
     </MailEvent>
<DiscontinueEvent>
          <to>Wellington</to>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <from>Sydney</from>
     </DiscontinueEvent>
<RoutePriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <duration>15.0</duration>
          <maxweight>40.0</maxweight>
          <maxvolume>20.0</maxvolume>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <frequency>3.0</frequency>
          <from>Wellington</from>
          <volumecost>2.0</volumecost>
     </RoutePriceUpdateEvent>
<CustomerPriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <priority>air priority</priority>
          <from>Sydney</from>
          <volumecost>2.0</volumecost>
     </CustomerPriceUpdateEvent>
</BusinessEvents>

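As you can see, the first child node isn't indented at all, yet that node's children are indented twice? And then the closing tag is indented only once?

I suspect this might have something to do with adding the root to the document via doc.appendChild(root), but when I do that I get an error:

"An attempt was made to insert a node where it is not permitted."

Here is my parser: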

        DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder icBuilder;
        try {
            icBuilder = icFactory.newDocumentBuilder();
            String businessEventsFile = System.getProperty("user.dir") + "/testdata/businessevents/businessevents.xml";
            Document doc = icBuilder.parse (businessEventsFile);

            Element root = doc.getDocumentElement();

            Element element;

            if(event instanceof CustomerPriceUpdateEvent){
                element = doc.createElement("CustomerPriceUpdateEvent");
            }
            else if(event instanceof DiscontinueEvent){
                element = doc.createElement("DiscontinueEvent");
            }
            else if(event instanceof MailEvent){
                element = doc.createElement("MailEvent");
            }
            else if(event instanceof RoutePriceUpdateEvent){
                element = doc.createElement("RoutePriceUpdateEvent");
            }
            else{
                throw new Exception("business event isnt valid");
            }

            for(Map.Entry<String, String> field : event.getFields().entrySet()){
                Element newElement = doc.createElement(field.getKey());
                newElement.appendChild(doc.createTextNode(field.getValue()));
                element.appendChild(newElement);
            }

            root.appendChild(element);


            // output DOM XML to console
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
//            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "5");
            DOMSource source = new DOMSource(doc);
            StreamResult console = new StreamResult(businessEventsFile);
            transformer.transform(source, console);

Any insight would be much appreciated.

【Comments】:

    Tags: java xml dom transformer


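    【Solution 1】:

    I ran into the same problem some time ago. The cause turned out to be that the parsed document contains whitespace, stored as text nodes, throughout the tree.

    For example, after parsing your document you probably have a whitespace-only text node in front of the <MailEvent> node under the <BusinessEvents> node. The Transformer preserves whitespace text nodes (which I think is the correct behavior).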
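
    To see those nodes for yourself, here is a minimal sketch that lists the direct children of the root element after parsing. The class and method names (WhitespaceNodesDemo, describeRootChildren) are made up for this illustration, and the input is a shortened version of the file from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, only for illustration.
class WhitespaceNodesDemo {

    /** Labels each direct child of the root element of the given XML text. */
    static List<String> describeRootChildren(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
        List<String> labels = new ArrayList<>();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Same whitespace test as in removeEmptyTextNodes()
                boolean blank = child.getNodeValue().trim().isEmpty();
                labels.add(blank ? "whitespace text" : "text");
            } else {
                labels.add(child.getNodeName());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String xml = "<BusinessEvents>\n\n<MailEvent>\n<to>Wellington</to>\n"
                + "</MailEvent>\n</BusinessEvents>";
        System.out.println(describeRootChildren(xml));
    }
}
```

    With the JDK's default parser this should print [whitespace text, MailEvent, whitespace text]: the line breaks before and after <MailEvent> are real nodes in the tree, and the Transformer writes them back out as they are.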

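    So if there is no whitespace at all between the tags in the XML text, the Transformer indents the tags correctly. You can try this with your own code by manually removing all whitespace (including line breaks) from the input and then formatting it. The output will then probably look much more like what you expect.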
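
    The experiment can be sketched as follows, assuming the JDK's built-in (Xalan-derived) Transformer, which is what understands the indent-amount key used in the question. CompactInputDemo and indent are names invented for this sketch, and the exact output can vary a little between JDK versions.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, only for illustration.
class CompactInputDemo {

    /** Parses the XML text and serializes it again with indentation enabled. */
    static String indent(String xml, int amount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Xalan-specific key, the same one the question uses
            transformer.setOutputProperty(
                    "{http://xml.apache.org/xslt}indent-amount",
                    Integer.toString(amount));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not reformat the XML", e);
        }
    }

    public static void main(String[] args) {
        // No whitespace at all between the tags
        String compact = "<BusinessEvents><MailEvent><to>Wellington</to>"
                + "</MailEvent></BusinessEvents>";
        System.out.println(indent(compact, 4));
    }
}
```

    Because the parsed tree contains no whitespace text nodes, the indentation comes entirely from the Transformer, so <MailEvent> ends up one level in and <to> two levels in.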

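    One way to solve this is to remove the superfluous whitespace from the document after it has been parsed. Simply removing every whitespace-only text node makes the formatting look better, but the question is whether some of those whitespace text nodes are actually needed.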
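
    For comparison, the "simply remove everything" variant can be sketched with an XPath query. This is not the method used in this answer, just an illustration of the blunt approach and of its side effect; StripAllBlankText and its methods are hypothetical names.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, only for illustration.
class StripAllBlankText {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
    }

    /** Removes every whitespace-only text node and returns how many were removed. */
    static int stripBlankTextNodes(Document doc) {
        NodeList blanks;
        try {
            blanks = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
        // The XPath result is a snapshot, so removing nodes while looping is safe
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
        return blanks.getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<r>\n  <a> </a>\n  <b>x</b>\n</r>");
        System.out.println(stripBlankTextNodes(doc) + " nodes removed");
        // The single space inside <a> is gone as well
        System.out.println("a has children: "
                + doc.getElementsByTagName("a").item(0).hasChildNodes());
    }
}
```

    This prints "4 nodes removed" and "a has children: false". The three indentation nodes under <r> are gone, but so is the single space that was the entire content of <a>, which is exactly the kind of whitespace you may want to keep.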

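    So my approach to cleaning the document before formatting is to remove all text nodes that contain only whitespace, except where such a text node is the only child of its parent (it has no siblings). More precisely, whitespace-only text nodes are removed only under parents that also have at least one element, CDATA, or comment child.

    The method cleanEmptyTextNodes(Node parentNode) below recursively removes these whitespace-only text nodes from a subtree.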

    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.StringWriter;
    
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.xml.sax.SAXException;
    
    public class FormatXml {
    
        public static void main(String[] args) throws ParserConfigurationException,
                FileNotFoundException, SAXException, IOException,
                TransformerException {
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
                    .newInstance();
            DocumentBuilder documentBuilder = docBuilderFactory
                    .newDocumentBuilder();
            Document node = documentBuilder.parse(new FileInputStream("data.xml"));
            System.out.println(format(node, 4));
        }
    
        public static String format(Node node, int indent)
                throws TransformerException {
            cleanEmptyTextNodes(node);
            StreamResult result = new StreamResult(new StringWriter());
            getTransformer(indent).transform(new DOMSource(node), result);
            return result.getWriter().toString();
        }
    
        private static Transformer getTransformer(int indent) {
            Transformer transformer;
            try {
                transformer = TransformerFactory.newInstance().newTransformer();
            } catch (Exception e) {
                throw new RuntimeException("Failed to create the Transformer", e);
            }
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(
                    "{http://xml.apache.org/xslt}indent-amount",
                    Integer.toString(indent));
            return transformer;
        }
    
        /**
         * Removes text nodes that contain only whitespace. Such nodes are
         * removed only if their parent has at least one child of one of the
         * following types, in which case all whitespace-only text-node
         * children of that parent are removed:
         * - ELEMENT child
         * - CDATA child
         * - COMMENT child
         *
         * The purpose of this is to make the format() method (which uses a
         * Transformer for formatting) more consistent regarding indentation
         * and line breaks.
         */
        private static void cleanEmptyTextNodes(Node parentNode) {
            boolean removeEmptyTextNodes = false;
            Node childNode = parentNode.getFirstChild();
            while (childNode != null) {
                removeEmptyTextNodes |= checkNodeTypes(childNode);
                childNode = childNode.getNextSibling();
            }
    
            if (removeEmptyTextNodes) {
                removeEmptyTextNodes(parentNode);
            }
        }
    
        private static void removeEmptyTextNodes(Node parentNode) {
            Node childNode = parentNode.getFirstChild();
            while (childNode != null) {
                // grab the "nextSibling" before the child node is removed
                Node nextChild = childNode.getNextSibling();
    
                short nodeType = childNode.getNodeType();
                if (nodeType == Node.TEXT_NODE) {
                    boolean containsOnlyWhitespace = childNode.getNodeValue()
                            .trim().isEmpty();
                    if (containsOnlyWhitespace) {
                        parentNode.removeChild(childNode);
                    }
                }
                childNode = nextChild;
            }
        }
    
        private static boolean checkNodeTypes(Node childNode) {
            short nodeType = childNode.getNodeType();
    
            if (nodeType == Node.ELEMENT_NODE) {
                cleanEmptyTextNodes(childNode); // recurse into subtree
            }
    
            if (nodeType == Node.ELEMENT_NODE
                    || nodeType == Node.CDATA_SECTION_NODE
                    || nodeType == Node.COMMENT_NODE) {
                return true;
            } else {
                return false;
            }
        }
    
    }
    

    The formatted output produced from your input:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <BusinessEvents>
        <MailEvent>
            <to>Wellington</to>
            <weight>10.0</weight>
            <priority>air priority</priority>
            <volume>10.0</volume>
            <from>Christchurch</from>
            <day>Mon May 20 14:30:08 NZST 2013</day>
            <PPW>8.0</PPW>
            <PPV>2.5</PPV>
        </MailEvent>
        <DiscontinueEvent>
            <to>Wellington</to>
            <priority>air priority</priority>
            <company>Kiwi Co</company>
            <from>Sydney</from>
        </DiscontinueEvent>
        <RoutePriceUpdateEvent>
            <weightcost>3.0</weightcost>
            <to>Wellington</to>
            <duration>15.0</duration>
            <maxweight>40.0</maxweight>
            <maxvolume>20.0</maxvolume>
            <priority>air priority</priority>
            <company>Kiwi Co</company>
            <day>Mon May 20 14:30:08 NZST 2013</day>
            <frequency>3.0</frequency>
            <from>Wellington</from>
            <volumecost>2.0</volumecost>
        </RoutePriceUpdateEvent>
        <CustomerPriceUpdateEvent>
            <weightcost>3.0</weightcost>
            <to>Wellington</to>
            <priority>air priority</priority>
            <from>Sydney</from>
            <volumecost>2.0</volumecost>
        </CustomerPriceUpdateEvent>
    </BusinessEvents>
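
    The question rewrites the same file every time an event is added, so the cleanup has to run on each pass. Below is a rough sketch of that parse, append, clean, write cycle against a temporary file. Note the assumptions: AppendEventDemo, appendEvent, and demo are invented names, each event gets only a single <to> child, and for brevity the cleanup is a blunt "remove every whitespace-only text node" XPath query rather than the more careful cleanEmptyTextNodes() above.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, only for illustration.
class AppendEventDemo {

    /** Appends <eventName><to>destination</to></eventName> and rewrites the file. */
    static void appendEvent(Path xmlFile, String eventName, String destination) {
        try {
            Document doc;
            try (InputStream in = Files.newInputStream(xmlFile)) {
                doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            }

            Element event = doc.createElement(eventName);
            Element to = doc.createElement("to");
            to.appendChild(doc.createTextNode(destination));
            event.appendChild(to);
            doc.getDocumentElement().appendChild(event);

            // Drop the indentation that the previous write left in the file
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            try (OutputStream out = Files.newOutputStream(xmlFile)) {
                transformer.transform(new DOMSource(doc), new StreamResult(out));
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not update " + xmlFile, e);
        }
    }

    /** Adds two events to a fresh temporary file and returns the file's final text. */
    static String demo() {
        try {
            Path file = Files.createTempFile("businessevents", ".xml");
            Files.write(file, "<BusinessEvents/>".getBytes(StandardCharsets.UTF_8));
            appendEvent(file, "MailEvent", "Wellington");
            appendEvent(file, "DiscontinueEvent", "Sydney");
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return text;
        } catch (Exception e) {
            throw new IllegalStateException("Demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    Since the old indentation is stripped before every write, the event appended on the second pass should line up with the one written on the first pass instead of drifting as in the question.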
    
