How to parse nested elements using DOM

【Question title】: How to parse nested elements using DOM
【Posted】: 2015-09-28 08:36:09
【Question】:

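I am developing an application in which I want to use DOM to parse an XML file that contains many nested elements. The snippet below shows the kind of XML file I am working with.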

<?xml version="1.0"?>
<audobon>
    <bird id="1">
        <title>eagle</title>
        <link id="1">wikipedia.org/eagle</link>
        <description>Large bird of prey</description>

    </bird>
    <bird id="2">
        <title>Duck</title>
        <link id="1">wikipedia.org/wood_duck</link>
        <link id="2">wikipedia.org/mallard_duck</link>
        <description>Aquatic, omnivorous bird.</description>

    </bird>
    <bird id="3">
        <title>Crane</title>
        <link id="1">wikipedia.org/crane</link>     
        <description>Aquatic, carnivorous bird</description>
    </bird>

    <bird id="4">
        <title>pigeon</title>
        <link id="1">wikipedia.org/common_pigeon</link>
        <link id="2">wikipedia.org/passenger_pigeon</link>
        <link id="3">wikipedia.org/homing_pigeon</link>
        <description>Domesticated or wild bird</description>

    </bird>

</audobon>

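So in this example, I want to iterate over every "bird" element and pull out its "link" elements, whose number varies from bird to bird.

This is the code I am currently using.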

public static void main(String[] args) {
    try {
        File fXmlFile = new File("C:/Users/I844763/Documents/AudobonXML.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(fXmlFile);
        doc.getDocumentElement().normalize();
        System.out.println("Root element :" + doc.getDocumentElement().getNodeName());

        NodeList nList = doc.getElementsByTagName("bird");
        ParsedDataLength = nList.getLength();
        NodeList LinkList = null;

        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);
            System.out.println("\nCurrent Element :" + nNode.getNodeName());

            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) nNode;
                System.out.println("Bird Id: " + eElement.getAttribute("id"));
                System.out.println("Description: " + eElement.getElementsByTagName("description").item(0).getTextContent());
                System.out.println("Title : " + eElement.getElementsByTagName("title").item(0).getTextContent());

                //need method for setting value of i here to number of links in individual Bird section

                for (int i = 0; temp < nList.getLength(); i++) {
                    System.out.println("    Link : " + eElement.getElementsByTagName("link").item(i).getTextContent());
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

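So, basically, I need a way to determine the upper bound for i in the second loop. I considered adding an extra element to the XML file that holds the number of links in each bird entry, but I would like a more flexible approach. Thanks.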

【Question comments】:

Tags: java xml parsing dom

【Solution 1】:

Maybe I am misunderstanding, but isn't this what you want?

//need method for setting value of i here to number of links in individual Bird section

NodeList linkNodes = eElement.getElementsByTagName("link");

for (int i = 0; i < linkNodes.getLength(); i++) {
    System.out.println("    Link : " + linkNodes.item(i).getTextContent());
}

Running your code with that addition produces the following output:

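Root element :audobon

Current Element :bird
Bird Id: 1
Description: Large bird of prey
Title : eagle
    Link : wikipedia.org/eagle

Current Element :bird
Bird Id: 2
Description: Aquatic, omnivorous bird.
Title : Duck
    Link : wikipedia.org/wood_duck
    Link : wikipedia.org/mallard_duck

Current Element :bird
Bird Id: 3
Description: Aquatic, carnivorous bird
Title : Crane
    Link : wikipedia.org/crane

Current Element :bird
Bird Id: 4
Description: Domesticated or wild bird
Title : pigeon
    Link : wikipedia.org/common_pigeon
    Link : wikipedia.org/passenger_pigeon
    Link : wikipedia.org/homing_pigeon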
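If you want to try the per-bird lookup in isolation, here is a minimal sketch. It assumes the XML is supplied as a string rather than read from a file, and the class name `BirdLinks` and helper `linksPerBird` are hypothetical names made up for this illustration. It collects the link text for each bird into a map keyed by the bird's id attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BirdLinks {
    // Hypothetical helper: maps each bird's "id" attribute to the text of its <link> elements.
    static Map<String, List<String>> linksPerBird(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList birds = doc.getElementsByTagName("bird");
            for (int b = 0; b < birds.getLength(); b++) {
                Element bird = (Element) birds.item(b);

                // Scoped to this bird, so getLength() is the number of links for this bird only.
                NodeList links = bird.getElementsByTagName("link");
                List<String> urls = new ArrayList<>();
                for (int i = 0; i < links.getLength(); i++) {
                    urls.add(links.item(i).getTextContent().trim());
                }
                result.put(bird.getAttribute("id"), urls);
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers do not have to declare them.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the sample file from the question.
        String xml = "<audobon>"
                + "<bird id=\"1\"><title>eagle</title>"
                + "<link id=\"1\">wikipedia.org/eagle</link></bird>"
                + "<bird id=\"2\"><title>Duck</title>"
                + "<link id=\"1\">wikipedia.org/wood_duck</link>"
                + "<link id=\"2\">wikipedia.org/mallard_duck</link></bird>"
                + "</audobon>";
        linksPerBird(xml).forEach((id, urls) -> System.out.println("Bird " + id + ": " + urls));
    }
}
```

Because the loop bound comes from the NodeList itself, the XML file needs no extra element that records the number of links.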

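【Comments】:

Interesting, I tried something similar earlier but it did not work. Maybe that was because I was defining the NodeList outside the loop.
getElementsByTagName() (like many other DOM node methods) uses the current node as its context. So myDocument.getElementsByTagName("foo") and someElementInTheSameDoc.getElementsByTagName("foo") will return different node lists.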
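To make the difference in scope concrete, here is a small sketch that compares the two calls on the same document. The class name `ScopeDemo` and helper `linkCounts` are hypothetical, and the input is assumed to contain at least one bird element. Note that on an Element the method matches all descendants, not only direct children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ScopeDemo {
    // Hypothetical helper: returns {links in the whole document, links under the first <bird>}.
    // Assumes the document contains at least one <bird> element.
    static int[] linkCounts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Called on the Document: every <link> in the file is matched.
            int inDocument = doc.getElementsByTagName("link").getLength();

            // Called on one Element: only <link> descendants of that bird are matched.
            Element firstBird = (Element) doc.getElementsByTagName("bird").item(0);
            int inFirstBird = firstBird.getElementsByTagName("link").getLength();

            return new int[] {inDocument, inFirstBird};
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<audobon>"
                + "<bird id=\"1\"><link id=\"1\">wikipedia.org/eagle</link></bird>"
                + "<bird id=\"2\"><link id=\"1\">wikipedia.org/wood_duck</link>"
                + "<link id=\"2\">wikipedia.org/mallard_duck</link></bird>"
                + "</audobon>";
        int[] counts = linkCounts(xml);
        System.out.println("Whole document: " + counts[0] + ", first bird: " + counts[1]);
    }
}
```

This may explain the earlier failed attempt: a NodeList obtained from the document before the loop would contain the links of every bird, so indexing into it per bird would not line up.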