How to build large XML documents in memory with Python using the standard library?

【Question title】: How to build large XML documents in memory with Python using standard library?
【Posted】: 2014-03-07 21:06:55
【Question description】:

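I'm trying to create a large XML file in memory that will then be inserted into a Blob field of an ESRI feature class.

I tried using ElementTree, but Python eventually crashes. I'm probably not doing this in the best way. A sample of my code (not exact):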

import xml.etree.ElementTree as ET

# Pseudocode: update_cursor is an update cursor opened on the feature class
with update_cursor on feature class:
    for row in update_cursor:
        root = ET.Element("root")
        tree = ET.ElementTree(root)
        for id in id_list:
            if row[0] in id:
                equipment = ET.Element("equipment")
                root.append(equipment)

                attrib1 = ET.Element("attrib1")
                equipment.append(attrib1)
                attrib1.text = "myattrib1"

                attrib2 = ET.Element("attrib2")
                equipment.append(attrib2)
                attrib2.text = "myattrib2"

                # ....and about 5 more of these appended to equipment

        xml_data = ET.tostring(root)

        # insert xml_data into blob field

Sample XML:

<root>
  <equipment>
    <attrib1>One</attrib1>
    <attrib2>Two</attrib2>
    <attrib3>Three</attrib3>
    ...
    <attrib10>Ten</attrib10>
  </equipment>
  <equipment>
    <attrib1>One</attrib1>
    <attrib2>Two</attrib2>
    <attrib3>Three</attrib3>
    ...
    <attrib10>Ten</attrib10>
  </equipment>
</root>

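Now, I realize this is probably a very amateur way of doing it, but I'm not sure of the best way to build this XML in memory.

For each row in update_cursor, there can be multiple "equipment" elements added to the root, and each "equipment" element will have exactly the same child elements, but with different attribute values.

I ran this with about 200 IDs matching a single row, so it had to create the equipment element and all of its child elements in memory 200 times.

So, what is the best way to create XML in memory with Python using the standard library?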

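【Comments】:

It would help us a lot if you described what the input looks like (i.e. row and id_list).
I'm working with spatial data. The row just gets the unique ID of a point, and id_list is just a list of IDs that match this unique ID. If an ID matches, the XML is populated with the attributes of that ID from the list. Each unique ID can have multiple matches from id_list, and they represent equipment.
I'm just wondering whether there are better ways to write the XML than what I have here.

Tags: python xml in-memory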

【Solution 1】:

Your data structure looks very simple. Don't bother with an XML library. Just write your lines directly to a cStringIO.StringIO.

import cStringIO  # Python 2 only; on Python 3 use io.StringIO instead

# Pseudocode: update_cursor is an update cursor opened on the feature class
with update_cursor on feature class:
    for row in update_cursor:
        buffer = cStringIO.StringIO()
        buffer.write("<root>\n")
        for id in id_list:
            if row[0] in id:
                buffer.write("    <equipment>\n")
                buffer.write("        <attrib1>One</attrib1>\n")
                buffer.write("        <attrib2>Two</attrib2>\n")
                buffer.write("        <attrib3>Three</attrib3>\n")

                # ....and about 5 more of these appended to equipment

                buffer.write("    </equipment>\n")

        buffer.write("</root>\n")

        xml_data = buffer.getvalue()

        # insert xml_data into blob field
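
One thing to watch with hand-written XML is that real attribute values may contain characters such as & or <, which must be escaped. Below is a minimal sketch of the same buffer approach for Python 3, assuming each piece of equipment is available as a dict of tag name to value. The function name build_equipment_xml and the record shape are made up for illustration and are not part of the original code.

```python
import io
from xml.sax.saxutils import escape


def build_equipment_xml(records):
    """Build the <root> document as a string from an iterable of dicts.

    Each dict maps a child tag name (e.g. "attrib1") to its text value.
    The record shape is an assumption made for this sketch.
    """
    buffer = io.StringIO()
    buffer.write("<root>\n")
    for record in records:
        buffer.write("  <equipment>\n")
        for tag, value in record.items():
            # escape() replaces &, < and > so the output stays well-formed
            buffer.write("    <{0}>{1}</{0}>\n".format(tag, escape(str(value))))
        buffer.write("  </equipment>\n")
    buffer.write("</root>\n")
    return buffer.getvalue()


xml_data = build_equipment_xml([
    {"attrib1": "One", "attrib2": "Pump & valve"},
])
```

Tag names are written as-is here, so this only holds up when they come from a fixed, trusted list.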

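【Comments】:

Well, I guess what I didn't mention is that I'm not always creating this data from scratch. Sometimes there will be existing data in the blob, and I'll have to read it, add new equipment to it, and rewrite the XML to the blob. I didn't know about cStringIO though, that's cool.
Either way, I think you're going to need something tailored to your use case rather than a more general XML tool. If you can use non-bundled packages, lxml might do a better job. If you can write a C extension and use a more lightweight DOM library, that might be worth it too. You could also give xml.etree.cElementTree a try, but I'm not sure how different its memory requirements are. Another idea is to work with smaller files that can be concatenated at the end.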
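
For the case raised in the comments, where the blob may already hold XML, one possible approach with the standard library is to parse what is there, append to it, and serialize again. This is only a sketch: the function name append_equipment and the dict-based record shape are assumptions for illustration, and reading from or writing to the blob field is left out.

```python
import xml.etree.ElementTree as ET


def append_equipment(existing_xml, new_records):
    """Return the serialized document with one <equipment> added per record.

    existing_xml is the XML already stored in the blob (bytes or str),
    or None/empty when the blob has no data yet.
    """
    # Reuse the stored document if there is one, otherwise start a new root
    root = ET.fromstring(existing_xml) if existing_xml else ET.Element("root")
    for record in new_records:
        equipment = ET.SubElement(root, "equipment")
        for tag, value in record.items():
            ET.SubElement(equipment, tag).text = str(value)
    return ET.tostring(root)


existing = b"<root><equipment><attrib1>One</attrib1></equipment></root>"
updated = append_equipment(existing, [{"attrib1": "Uno", "attrib2": "Dos"}])
```

Unlike the string-buffer approach, this has to hold the whole tree in memory, so it trades some memory for not having to edit the stored XML text by hand.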

【Solution 2】:

You can use ET.SubElement to create an element and append it to its parent in one step:

equipment = ET.SubElement(root, "equipment")
ET.SubElement(equipment, "attrib1").text = "One"
ET.SubElement(equipment, "attrib2").text = "Two"
ET.SubElement(equipment, "attrib3").text = "Three"
...

Shorter and clearer.
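
Put in a loop, the SubElement calls might look like the sketch below. The matches list is a hypothetical stand-in for the attributes of each matching ID, since the real data comes from the cursor and id_list.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the attributes of each matching ID
matches = [
    {"attrib1": "One", "attrib2": "Two", "attrib3": "Three"},
    {"attrib1": "Uno", "attrib2": "Dos", "attrib3": "Tres"},
]

root = ET.Element("root")
for attributes in matches:
    # SubElement creates the element and appends it to its parent
    equipment = ET.SubElement(root, "equipment")
    for tag, text in attributes.items():
        ET.SubElement(equipment, tag).text = text

xml_data = ET.tostring(root)  # bytes on Python 3
```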

【Comments】: