【Question title】: Iterating through XML document with XPath without printing duplicates
【Posted】: 2021-11-04 01:01:43
【Question】:

I have this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<tns:request xmlns:tns="urn">
    <tns:CorrectingData>
        <tns:CorrectingDataBlock>
            <tns:CurrentVersionData>current</tns:CurrentVersionData>
            <tns:NewVersionData>new</tns:NewVersionData>
        </tns:CorrectingDataBlock>
        <tns:CorrectingDataBlock>
            <tns:CurrentVersionData>100</tns:CurrentVersionData>
            <tns:NewVersionData>200</tns:NewVersionData>
        </tns:CorrectingDataBlock>
    </tns:CorrectingData>
</tns:request>

and the corresponding XSD file:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:tns="urn" targetNamespace="urn" elementFormDefault="qualified">

<xsd:element name="request" type="tns:requestType">
</xsd:element>

<xsd:complexType name="requestType">
    <xsd:sequence>
        <xsd:element name="CorrectingData" type="tns:CorrectingDataType" minOccurs="0" maxOccurs="1">
        </xsd:element>
    </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="CorrectingDataType">
    <xsd:sequence>
        <xsd:element name="CorrectingDataBlock" type="tns:CorrectingDataTextType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="CorrectingDataTextType">
    <xsd:sequence>
        <xsd:element name="CurrentVersionData" type="xsd:string">
        </xsd:element>
        <xsd:element name="NewVersionData" type="xsd:string">
        </xsd:element>
    </xsd:sequence>
</xsd:complexType>

</xsd:schema>

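I need to display the whole XSD document on an edit form and, where possible, fill in data from the XML document. In my example I have simplified the code so that it just prints the values of the leaf elements to the console.

To insert data from the XML document, I want to iterate over each tns:CorrectingDataBlock element and print the values of its leaves (tns:CurrentVersionData and tns:NewVersionData). I need this output: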

current
new
100
200

I have this JavaScript (TypeScript) code that walks the XSD document and builds an XPath, which I then use to find and print the leaf values in the XML document.

// ...
// dataXPath = `/*[local-name()='${childElement.getAttribute("name")}']`
// I'm using a recursive method that builds dataXPath from the root and passes it down to each child node, all the way to the leaf.
// Once I reach a leaf node, I print its value.

const childElementDataXPath: string = dataXPath + `/*[local-name()='${childElement.getAttribute("name")}']`;
const snapshotXPathResult: XPathResult = this._dataDocument.evaluate(childElementDataXPath, this._dataDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

for (let i: number = 0; i < snapshotXPathResult.snapshotLength; i++) {
  const node: Element = snapshotXPathResult.snapshotItem(i) as Element;
  console.log(node.textContent);
}
// ...

The resulting XPaths built by this code are:

/*[local-name()='request']/*[local-name()='CorrectingData']/*[local-name()='CorrectingDataBlock']/*[local-name()='CurrentVersionData']

/*[local-name()='request']/*[local-name()='CorrectingData']/*[local-name()='CorrectingDataBlock']/*[local-name()='NewVersionData']

and the code produces this output:

current
100
new
200
current
100
new
200

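Question: How can I change my code to get the output I want? What am I doing wrong?

Notes:

  • I have to use the local-name() function in the XPath, because at run time I don't know the namespace of the XML document.

Full code listing: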

export class Parser {

    public _schemeDocument: Document;
    public _dataDocument: Document;

    public processElement(element: Element, dataXPath: string): void {
        const typeName: string = this._getElementTypeName(element);
        const typeElement: Element = this._getTypeElementByName(typeName);

        dataXPath += `/*[local-name()='${element.getAttribute("name")}']`;

        if (typeElement && this._isComplexType(typeElement)) {
            const sequence = this._schemeDocument.evaluate("./*[local-name()='sequence']", typeElement).iterateNext();
            if (sequence) {
                Array.prototype.forEach.call((sequence as Element).children, (childElement: Element) => {
                    if (this._isElement(childElement)) {
                        const childElementDataXPath: string = dataXPath + `/*[local-name()='${childElement.getAttribute("name")}']`;
                        const snapshotXPathResult: XPathResult = this._dataDocument.evaluate(childElementDataXPath, this._dataDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

                        const childTypeName: string = this._getElementTypeName(childElement);
                        const childTypeElement: Element = this._getTypeElementByName(childTypeName);
                        if (childTypeElement && this._isComplexType(childTypeElement)) {
                            for (let i: number = 0; i < snapshotXPathResult.snapshotLength; i++) {
                                this.processElement(childElement, dataXPath);
                            }
                        } else {
                            const childElementCaption: string = this._getElementCaption(childElement);

                            for (let i: number = 0; i < snapshotXPathResult.snapshotLength; i++) {
                                const node: Element = snapshotXPathResult.snapshotItem(i) as Element;
                                const childElementValue: string = node ? node.textContent : "EMPTY";
                                console.log(childElementCaption + ": " + childElementValue);
                            }
                        }
                    }
                });
            }
        }
    }

    private _getElementTypeName(element: Element): string {
        const splittedTypeName: string[] = element.getAttribute("type").split(":");
        return splittedTypeName.length > 1 ? splittedTypeName[1] : splittedTypeName[0];
    }

    private _getTypeElementByName(typeName: string): Element {
        const simpleTypeXPath = `//*[local-name()='simpleType'][@name='${typeName}']`;
        const complexTypeXPath = `//*[local-name()='complexType'][@name='${typeName}']`;
        return this._schemeDocument.evaluate(`${simpleTypeXPath}|${complexTypeXPath}`, this._schemeDocument).iterateNext() as Element;
    }

    private _getElementCaption(element: Element): string {
        const elementCaption: Node = this._schemeDocument.evaluate(".//*[local-name()='documentation']", element).iterateNext();
        return elementCaption ? elementCaption.textContent : "EMPTY";
    }

    private _isComplexType(element: Element): boolean {
        return element.localName === "complexType";
    }

    private _isElement(element: Element): boolean {
        return element.localName === "element";
    }
}

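【Comments】:

  • If you have _dataDocument, why can't you read _dataDocument.documentElement.namespaceURI to get the namespace of the root element?
  • Thanks for the comment! You are right, I can use that in my code. But even if I do, I don't see how it helps with my main problem, the duplicated output.
  • The XPath from the changed code looks like this: /tns:request/tns:CorrectingData/tns:CorrectingDataBlock/tns:CurrentVersionData. It still has the same flaw of duplicated output. @MartinHonnen
  • It's hard to tell from the snippet. If you write recursive code, make sure that each recursive call does not process everything from the root down again, but instead processes the child nodes relative to their parent node.
  • @MartinHonnen Good advice! I'm trying to apply your suggestion to my system. I'll write as soon as I have results.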
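
The advice in the last comments (process each node relative to its parent instead of querying from the root again) can be sketched without a DOM as follows. This is only an illustration under stated assumptions: `DataNode`, `SchemaNode`, and `collectLeaves` are hypothetical names, and the plain objects stand in for the parsed XML and XSD documents from the question.

```typescript
// Hypothetical stand-ins for DOM elements; not part of the original code.
interface DataNode {
  name: string;            // local name, namespace prefix already stripped
  text?: string;           // text content of a leaf
  children?: DataNode[];
}

interface SchemaNode {
  name: string;
  children?: SchemaNode[]; // absent for simple (leaf) types
}

// Walk the schema, but look for matches only among the children of the
// data node currently being processed, never from the document root.
function collectLeaves(schema: SchemaNode, parent: DataNode, out: string[]): void {
  const matches = (parent.children ?? []).filter(c => c.name === schema.name);
  for (const match of matches) {
    if (schema.children) {
      for (const childSchema of schema.children) {
        collectLeaves(childSchema, match, out);
      }
    } else {
      out.push(match.text ?? "EMPTY");
    }
  }
}

// Schema and data mirroring the example documents from the question.
const blockSchema: SchemaNode = {
  name: "CorrectingDataBlock",
  children: [{ name: "CurrentVersionData" }, { name: "NewVersionData" }],
};
const requestSchema: SchemaNode = {
  name: "request",
  children: [{ name: "CorrectingData", children: [blockSchema] }],
};

const block = (current: string, next: string): DataNode => ({
  name: "CorrectingDataBlock",
  children: [
    { name: "CurrentVersionData", text: current },
    { name: "NewVersionData", text: next },
  ],
});
// The artificial "#document" node plays the role of the Document object.
const dataDocument: DataNode = {
  name: "#document",
  children: [{
    name: "request",
    children: [{ name: "CorrectingData", children: [block("current", "new"), block("100", "200")] }],
  }],
};

const leaves: string[] = [];
collectLeaves(requestSchema, dataDocument, leaves);
console.log(leaves.join("\n"));
```

With the real documents, the analogous change would presumably be to pass each matched element as the context node (the second argument of `evaluate`) together with a relative step such as `*[local-name()='CurrentVersionData']`, rather than evaluating an absolute path against `this._dataDocument` on every call.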

Tags: javascript xml xpath xsd


【Answer 1】:

Why not simply use DOM methods:

const xmlSource = `<?xml version="1.0" encoding="UTF-8"?>
<tns:request xmlns:tns="urn">
    <tns:CorrectingData>
        <tns:CorrectingDataBlock>
            <tns:CurrentVersionData>current</tns:CurrentVersionData>
            <tns:NewVersionData>new</tns:NewVersionData>
        </tns:CorrectingDataBlock>
        <tns:CorrectingDataBlock>
            <tns:CurrentVersionData>100</tns:CurrentVersionData>
            <tns:NewVersionData>200</tns:NewVersionData>
        </tns:CorrectingDataBlock>
    </tns:CorrectingData>
</tns:request>`;

const xmlDoc = new DOMParser().parseFromString(xmlSource, 'application/xml');

const blockList = xmlDoc.getElementsByTagNameNS('*', 'CorrectingDataBlock');

for (let i = 0; i < blockList.length; i++) {
  Array.from(blockList[i].children).forEach(c => console.log(c.textContent));
}

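【Comments】:

  • I can't use this code, because I need to display the whole XSD document and, where possible, insert data from the XML document. Your approach rules out displaying the whole XSD document and focuses on displaying the XML document. I apologize for not providing enough information. That is the main reason I collect the XPath recursively and use it to look up data in the XML document.
【Answer 2】:

I changed the processElement() method to: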

public processElement(element: Element, dataXPath: string, numberOfDataElementsOfSameType: number = 1, elementDataIndex: number = 0): void {
    const typeName: string = this._getElementTypeName(element);
    const typeElement: Element = this._getTypeElementByName(typeName);

    if (numberOfDataElementsOfSameType > 1) {
        dataXPath += `/*[local-name()='${element.getAttribute("name")}'][${elementDataIndex + 1}]`;
    } else {
        dataXPath += `/*[local-name()='${element.getAttribute("name")}']`;
    }

    if (typeElement && this._isComplexType(typeElement)) {
        const sequenceElement: Element = this._schemeDocument.evaluate("./*[local-name()='sequence']", typeElement).iterateNext() as Element;
        if (sequenceElement) {
            Array.prototype.forEach.call(sequenceElement.children, (childElement: Element) => {
                const childDataXPath: string = dataXPath + `/*[local-name()='${childElement.getAttribute("name")}']`;
                const childResult: XPathResult = this._dataDocument.evaluate(childDataXPath, this._dataDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
                const childTypeName: string = this._getElementTypeName(childElement);
                const childTypeElement: Element = this._getTypeElementByName(childTypeName);
                if (childTypeElement && this._isComplexType(childTypeElement)) {
                    for (let i: number = 0; i < childResult.snapshotLength; i++) {
                        if (childResult.snapshotLength > 1) {
                            this.processElement(childElement, dataXPath, childResult.snapshotLength, i);
                        } else {
                            this.processElement(childElement, dataXPath, 1, 0);
                        }
                    }
                } else {
                    const childElementCaption: string = this._getElementCaption(childElement);
                    for (let i: number = 0; i < childResult.snapshotLength; i++) {
                        const childDataElement: Element = childResult.snapshotItem(i) as Element;
                        const childDataElementValue: string = childDataElement ? childDataElement.textContent : "EMPTY";
                        console.log(childElementCaption + ": " + childDataElementValue);
                    }
                }
            });
        }
    }
}

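The main change is that the XPath now includes the index of the element within the XML document whenever that element occurs more than once. Each recursive call therefore matches the leaves of a single block only, which avoids the duplicated leaf elements. For the example with several tns:CorrectingDataBlock elements, the new XPath looks like this: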

/*[local-name()='request']/*[local-name()='CorrectingData']/*[local-name()='CorrectingDataBlock'][1]/*[local-name()='CurrentVersionData']
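
As a minimal sketch of how such an indexed path is assembled, the step construction can be isolated in a small function. `buildStep` is a hypothetical helper that does not appear in the answer's code; it only mirrors the `if (numberOfDataElementsOfSameType > 1)` branch above and relies on XPath positions being 1-based.

```typescript
// Hypothetical helper (not from the answer): builds one namespace-agnostic
// XPath step and appends a 1-based position predicate only when the element
// has more than one sibling with the same name.
function buildStep(name: string, siblingCount: number = 1, index: number = 0): string {
  const step = `/*[local-name()='${name}']`;
  return siblingCount > 1 ? `${step}[${index + 1}]` : step;
}

// Path to the NewVersionData leaf of the second of two blocks (index 1).
const secondBlockLeaf: string =
  buildStep("request") +
  buildStep("CorrectingData") +
  buildStep("CorrectingDataBlock", 2, 1) +
  buildStep("NewVersionData");

console.log(secondBlockLeaf);
```

Because the predicate `[2]` follows the name test, it selects the second CorrectingDataBlock child of the same parent, so the leaf step after it can only match inside that one block.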
