Parse specific XML attribute from one file and append it to another iff another attribute is present in the second file - Answer

【Question Title】: Parse specific XML attribute from one file and append it to another iff another attribute is present in the second file
【Posted】: 2016-08-26 09:00:19
【Question Description】:

I have three XML files (samples below). I named the files after their respective audioId attribute values, so the files in question are called 93.xml and 2137.xml:

93.xml:

<word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" Use="P,L" audioId="93" />

2137.xml:

<word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" Use="P,L" audioId="2137" />

mainDataSet.xml:

<word id="2137" title="over" level="1" grouping="Sight Words" YRule="0" MagicE="0" SoftC="0" doublevowel="0" longvowel="0" displayorder="101" silentletters="0"/>

The file mainDataSet.xml contains about 3,000 entries; for the purposes of this question I have included only one.

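My question is: how can I append the title attribute from mainDataSet.xml to the word in the matching file if the id in mainDataSet.xml matches the audioId in the other file (or even if the id in mainDataSet.xml matches the file name)? For example, with the samples I provided, the output should be: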

<word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" Use="P,L" audioId="2137" title="over" />
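The matching rule described above can be sketched on already-parsed elements with the standard library's xml.etree.ElementTree: copy title onto the per-audio word only when some mainDataSet.xml entry has an id equal to its audioId. The helper name add_title_if_matched is hypothetical, and the inline XML strings are trimmed versions of the samples, standing in for the real files.

```python
import xml.etree.ElementTree as ET


def add_title_if_matched(audio_word, main_root):
    """Hypothetical helper: copy @title from the mainDataSet entry whose
    @id equals audio_word's @audioId; leave audio_word unchanged otherwise."""
    audio_id = audio_word.get("audioId")
    for entry in main_root.iter("word"):
        if entry.get("id") == audio_id and entry.get("title") is not None:
            audio_word.set("title", entry.get("title"))
            return True
    return False


# Trimmed stand-ins for mainDataSet.xml, 2137.xml and 93.xml.
main_root = ET.fromstring('<words><word id="2137" title="over" level="1"/></words>')
w2137 = ET.fromstring('<word Stage="0" audioId="2137" />')
w93 = ET.fromstring('<word Stage="0" audioId="93" />')

print(add_title_if_matched(w2137, main_root), ET.tostring(w2137, encoding="unicode"))
print(add_title_if_matched(w93, main_root), ET.tostring(w93, encoding="unicode"))
```

Only the 2137 element gains a title here; the 93 element is left untouched because mainDataSet has no entry with that id.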

To parse my XML from mainDataSet.xml, I am currently doing:

import xml.etree.ElementTree

e = xml.etree.ElementTree.parse('mainDataSet.xml').getroot()
for atype in e.findall('word'):
    print(atype.get('title'))
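Because mainDataSet.xml has roughly 3,000 entries, one option is to turn the loop above into a dictionary from id to title, built once, so each audio file can be matched with a single lookup instead of a rescan. A sketch, assuming the entries sit under one wrapping root element; the inline string stands in for the real file.

```python
import xml.etree.ElementTree as ET

# Stand-in for ET.parse('mainDataSet.xml').getroot(); assumes a wrapping root element.
main_root = ET.fromstring(
    '<words>'
    '<word id="2137" title="over" level="1"/>'
    '<word id="93" title="something else" level="1"/>'
    '</words>'
)

# Map each entry's id to its title, skipping entries that lack either attribute.
title_by_id = {
    w.get("id"): w.get("title")
    for w in main_root.findall("word")
    if w.get("id") is not None and w.get("title") is not None
}

print(title_by_id)
```

With this dictionary, checking whether an audioId has a match is just `audio_id in title_by_id`.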

【Question Discussion】:

Tags: python xml xml-parsing lxml elementtree

【Solution 1】:

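To add an attribute, use the .attrib dictionary. Here is sample code that iterates over the word elements inside mainDataSet.xml, retrieves each id attribute value, parses the corresponding XML file (93.xml and 2137.xml in this case), updates its word element, and writes the tree back to the file: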

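import os
import xml.etree.ElementTree as ET


e = ET.parse('mainDataSet.xml').getroot()
for word in e.findall('word'):
    word_id = word.attrib.get("id")
    title = word.attrib.get("title")
    if word_id and title is not None:
        # each per-audio file is named after its audioId, e.g. 2137.xml
        filename = "%s.xml" % word_id
        # skip mainDataSet entries that have no matching audio file
        if not os.path.exists(filename):
            continue
        e_word = ET.parse(filename)
        # add the title attribute to the root <word> of the audio file
        e_word.getroot().attrib['title'] = title
        e_word.write(filename)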
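The key step is that ElementTree exposes an element's attributes as the plain dictionary .attrib, so assigning a new key adds the attribute; Element.set does the same. This in-memory check (no files touched) shows what the 2137.xml element looks like after the update, and why it is worth making sure a title exists before assigning it: a None value cannot be serialized.

```python
import xml.etree.ElementTree as ET

word = ET.fromstring(
    '<word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" '
    'Stage="0" Use="P,L" audioId="2137" />'
)

# Assigning into the .attrib dict adds (or overwrites) an attribute...
word.attrib["title"] = "over"
# ...and Element.set is an equivalent method-call form.
word.set("title", "over")

xml_text = ET.tostring(word, encoding="unicode")
print(xml_text)

# A missing title (attrib.get returning None) makes serialization raise TypeError.
bad = ET.Element("word", audioId="93")
bad.attrib["title"] = None
serialization_failed = False
try:
    ET.tostring(bad, encoding="unicode")
except TypeError:
    serialization_failed = True
print("None title serializes:", not serialization_failed)
```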

The sample mainDataSet.xml I used:

<words>
    <word id="2137" title="over" level="1" grouping="Sight Words" YRule="0" MagicE="0" SoftC="0" doublevowel="0" longvowel="0" displayorder="101" silentletters="0"/>
    <word id="93" title="something else" level="1" grouping="Sight Words" YRule="0" MagicE="0" SoftC="0" doublevowel="0" longvowel="0" displayorder="101" silentletters="0"/>
</words>
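The wrapping <words> element matters: findall('word') only searches the direct children of the element it is called on. With the single bare <word> shown in the question, getroot() returns that <word> itself and findall('word') finds nothing, whereas iter('word') also yields the element itself when its tag matches. A quick comparison on inline strings:

```python
import xml.etree.ElementTree as ET

bare = ET.fromstring('<word id="2137" title="over"/>')
wrapped = ET.fromstring('<words><word id="2137" title="over"/></words>')

# findall searches direct children only.
print(len(bare.findall("word")), len(wrapped.findall("word")))

# iter walks the element itself plus all of its descendants.
print(len(list(bare.iter("word"))), len(list(wrapped.iter("word"))))
```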

This is what I got after running the script:

93.xml:

<word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" Use="P,L" audioId="93" title="something else" />

2137.xml:

<word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" Use="P,L" audioId="2137" title="over" />

【Discussion】:

Perfect. Danke! I'll go through this and let you know how it goes!

【Solution 2】:

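For the OP or future readers, consider an XSLT 1.0 solution, which Python can run with the lxml module. For reference, XSLT is a special-purpose language (its scripts are well-formed XML files) designed to transform XML files. The script is portable to other general-purpose languages (Java, PHP, C#), to XSLT processors (Saxon, Xalan), and even to command-line interpreters (Bash, PowerShell). Specifically for this question, XSLT provides the document() function, which can access nodes in external XML files for comparisons such as matching the id.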

Input (root tags added)

mainDataSet.xml

<root>
   <word id="2137" title="over" level="1" grouping="Sight Words" YRule="0" 
         MagicE="0" SoftC="0" doublevowel="0" longvowel="0" 
         displayorder="101" silentletters="0"/>
</root>

2137.xml

<root>
    <word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" 
          Stage="0" Use="P,L" audioId="2137" />
</root>

93.xml

<root>
   <word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7"
         Stage="0" Use="P,L" audioId="93" />
</root>

XSLT script (saved externally as a .xsl file and read in by the .py script; assumes all XML files are in the same directory)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />

  <xsl:template match="root">
    <xsl:copy>
      <xsl:apply-templates select="word"/>
    </xsl:copy>
  </xsl:template>  

  <xsl:template match="word">    
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:if test="@audioId = document('mainDataSet.xml')/root/word/@id">
        <xsl:attribute name="title">
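          <!-- take the title of the mainDataSet entry whose id matches this word's audioId -->
          <xsl:value-of select="document('mainDataSet.xml')/root/word[@id = current()/@audioId]/@title"/>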
        </xsl:attribute>
      </xsl:if>
    </xsl:copy>
  </xsl:template>  

</xsl:stylesheet>
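The lookup the stylesheet performs, finding the mainDataSet.xml entry whose id equals the current word's audioId, can also be expressed with the limited XPath subset in Python's standard ElementTree. This is a stdlib-only sketch of that one predicate, not a replacement for the XSLT; it assumes the <root> wrapper used in this answer and numeric ids, so interpolating the id into the path is safe. The helper name title_for is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the parsed mainDataSet.xml with the <root> wrapper.
main_root = ET.fromstring('<root><word id="2137" title="over" level="1"/></root>')


def title_for(audio_id):
    # ElementTree supports the [@attr='value'] predicate; find returns None on no match.
    match = main_root.find("word[@id='%s']" % audio_id)
    return None if match is None else match.get("title")


print(title_for("2137"))
print(title_for("93"))
```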

Python script

import lxml.etree as ET

# LOAD XSL ONCE AND COMPILE IT
xslt = ET.parse('XSLTScript.xsl')
transform = ET.XSLT(xslt)

for i in ['2137', '93']:
    # LOAD XML
    dom = ET.parse('{}.xml'.format(i))

    # TRANSFORM XML
    newdom = transform(dom)

    # PRETTY PRINT OUTPUT
    tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True)
    print(tree_out.decode("utf-8"))

    # SAVE TO FILE (overwrites the input file)
    with open('{}.xml'.format(i), 'wb') as xmlfile:
        xmlfile.write(tree_out)

Output (using the posted data)

2137.xml

<root>
  <word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" 
        Use="P,L" audioId="2137" title="over"/>
</root>

93.xml

<root>
  <word BloomsTaxonomy="1,2,3" DictationGroupid="i-e combination List 7" Stage="0" 
        Use="P,L" audioId="93"/>
</root>

【Discussion】:

Wow. Both solutions work! I'll go through them and let you know. Danke!
I ended up using @alecxe's solution. This is a nice idea too! Thanks anyway!