Retrieving all XML from nodes with Javascript

【Question title】: Retrieving all XML from nodes with Javascript
【Posted】: 2016-07-01 17:30:01
【Question】:

I'm trying to get the XML from a node. Suppose I have an XML file:

<?xml version="1.0"?>
<story title="My title here">
    <subject key="key1" caption="Intro">
        Text here for subject 1. There might be an occasional <html> markup present.
        <action tag="actiontag"/>
    </subject>
    <subject key="key2" caption="Chap1">
        Text for chapter 2
        <directions>
            <dir go="chap5" to="Solving"/>
            <dir go="chap12" to="Searching">
                <extra1 subtitle="subtitle">You can expect extra text here as well.</extra1>
                <extra2 subtitle="subtitle2"/>
            </dir>
            <dir go="chap2,chap5" to="Finding"/>
        </directions>
    </subject>
    <chapters key="chap1" caption="Chapter1">
        The text for chapter1 goes here
        <newtag>This one has text as well</newtag>
    </chapters>
</story>

Now I'm trying to get the entire XML code, including nodes and tags, into an array of objects. Ideally the result would be:

subjects[0].key=key1
subjects[0].caption=Intro
subjects[0].txt=Text here for subject 1. There might be an occasional <html> markup present.<action tag="actiontag"/>
subjects[1].key=key2
subjects[1].caption=Chap1
subjects[1].txt=Text for chapter 2<directions><dir go="chap5" to="Solving"/><dir go="chap12" to="Searching"><extra1 subtitle="subtitle">You can expect extra text here as well.</extra1><extra2 subtitle="subtitle2"/></dir><dir go="chap2,chap5" to="Finding"/></directions>

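This "text" can later be processed as XML. So far I've been able to read the XML and access the tags individually. I've also been able to iterate through the file and get the text, but I can't seem to iterate over all the nodes/text/tags and keep their formatting intact.

What I have is: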

var xmlDoc;

function loadxml() {
    if (window.XMLHttpRequest) {// code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    }
    else {// code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    xmlhttp.open("GET", "assets/myfile.xml", false);
    xmlhttp.send();
    xmlDoc = xmlhttp.responseXML;
    init(xmlDoc);   // the request is synchronous, so the response is already available here
}

function init(xmlDoc) {
    var subjects = [];
    var x, i;
    x = xmlDoc.getElementsByTagName('subject');
    for (i = 0; i < x.length; i++) {
        subjects.push({ key: x[i].getAttribute('key'), caption: x[i].getAttribute('caption'), txt: x[i].childNodes[0].nodeValue });
    }
    //just to check if there's something recorded..
    document.getElementById("result").innerHTML = subjects[1].txt;
}

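The array of objects is fine and works. But how do I change x[i].childNodes[0].nodeValue so that it holds all child nodes of [subject] and keeps the accompanying tags and formatting?

Thanks for your time.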

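【Comments】:

Not an answer to your question, but do you really need the code for IE6 and below?
It might be easier with jQuery; try $.parseXML()
He already has a parsed XML document, @derloopkat, from xmlhttp.responseXML.
How do you plan to use the value of subject? If you're just going to parse it further, use txt: x[i].childNodes[0] so you have a node to work with. If you want to append it to the DOM, use appendChild(subjects[1].txt) instead of innerHTML.
@Mike McCaughan: IE6 and below is not needed. That was a quick copy and paste from the web. I'm not sure how to deal with the child nodes. Eventually I need the tags in there, and I need to act on their content. What I'm struggling with right now is getting a clear picture of how to handle childNodes. My first thought was to process .txt from a string as an XML document.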
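
The comments above circle around how to handle childNodes. As a minimal sketch, and not something proposed in this thread: browsers provide XMLSerializer, whose serializeToString method turns a node back into markup, so serializing each child of a subject and joining the pieces should yield its inner XML. The helper name innerXml is hypothetical, and the serializer is passed in as a parameter (an assumption made here so the function is not tied to a browser global).

```javascript
// Hypothetical helper, not from the thread.
// Serializes every child of `node` (text nodes and elements alike) and joins the pieces.
// `serializer` is any object with a serializeToString(node) method,
// for example `new XMLSerializer()` in a browser.
function innerXml(node, serializer) {
    var parts = [];
    for (var i = 0; i < node.childNodes.length; i++) {
        parts.push(serializer.serializeToString(node.childNodes[i]));
    }
    return parts.join('');
}
```

In the question's loop this would be used as txt: innerXml(x[i], new XMLSerializer()). Whitespace and line breaks from the file are kept as they are, so trim or collapse them afterwards if the compact form is wanted.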

Tags: javascript xml

【Solution 1】:

function loadxml() {
    if (window.XMLHttpRequest) {// code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    }
    else {// code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    xmlhttp.open("GET", "assets/myfile.xml", false);
    xmlhttp.send();
    xmlDoc = xmlhttp.responseXML;
    var responseText = xmlhttp.responseText;
    var textNodes = responseText.split(/<subject.*>/);   // one chunk per opening <subject ...> tag
    textNodes.shift();   //remove first chunk of text
    for (var i = 0; i < textNodes.length; i++) {
        textNodes[i] = textNodes[i].replace(/\r?\n|\r/g, '');   // remove line breaks
        textNodes[i] = textNodes[i].replace(/^\s*/, '');        // strip leading whitespace
        textNodes[i] = textNodes[i].replace(/>\s*/g, '>');      // replace "> " with ">"
        textNodes[i] = textNodes[i].replace(/\s*</g, '<');      // replace " <" with "<"
    }
    init(xmlDoc, textNodes);   // the request is synchronous, so call init directly
}

function init(xmlDoc, textNodes) {
    var subjects = [];
    var x, i;
    x = xmlDoc.getElementsByTagName('subject');
    for (i = 0; i < x.length; i++) {
        subjects.push({ key: x[i].getAttribute('key'), caption: x[i].getAttribute('caption'), txt: textNodes[i] });
    }
    console.log(subjects);
}

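【Comments】:

This answer achieves your goal by using both responseXML and responseText from the ajax response. responseXML is useful for getting each subject and its corresponding key and caption attributes. responseText is useful for getting each subject's txt in the format you're looking for: line breaks and whitespace stripped, but all markup inside the text node preserved.
You're right, xmlhttp.responseText is the magic word here. A minor detail is that the closing tag is still there, but that's easy to fix. Thanks!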
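
The leftover closing tag mentioned in the last comment could be handled along these lines. This is only a sketch under stated assumptions: extractSubjectBodies is a hypothetical helper name, the sample string is a shortened version of the question's file, and the approach assumes subject elements are never nested. Each chunk is cut at the first </subject> before the same whitespace cleanup as in the answer is applied.

```javascript
// Hypothetical helper, not from the thread.
// Returns the inner markup of each <subject> element as a compact string.
function extractSubjectBodies(xmlText) {
    var chunks = xmlText.split(/<subject\b[^>]*>/);
    chunks.shift(); // discard everything before the first <subject ...>
    return chunks.map(function (chunk) {
        var end = chunk.indexOf('</subject>');
        if (end !== -1) {
            chunk = chunk.slice(0, end); // drop the closing tag and whatever follows it
        }
        return chunk
            .replace(/\r?\n|\r/g, '')    // remove line breaks
            .replace(/^\s+|\s+$/g, '')   // trim leading and trailing whitespace
            .replace(/>\s+/g, '>')       // remove whitespace after ">"
            .replace(/\s+</g, '<');      // remove whitespace before "<"
    });
}

// Shortened sample based on the XML in the question.
var sample = [
    '<story title="My title here">',
    '    <subject key="key1" caption="Intro">',
    '        Text here for subject 1.',
    '        <action tag="actiontag"/>',
    '    </subject>',
    '    <chapters key="chap1" caption="Chapter1">',
    '        The text for chapter1 goes here',
    '    </chapters>',
    '</story>'
].join('\n');

var bodies = extractSubjectBodies(sample);
console.log(bodies);
```

Like the answer, this works on the raw text with regular expressions, so it is fragile if the markup contains nested subject elements or a ">" inside an attribute value.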