Parsing data from XML using Python 3: answers

【Question title】: Parse data from XML using Python 3
【Posted】: 2022-01-11 15:52:00
【Question】:

Below is a minimal working example of the XML file (the real file is 81k lines long, so I am only showing a small part of it).

<?xml version="1.0" encoding="ISO-8859-1"?>
<modeling>
  <dos>
    <i name="efermi">     -2.48501882 </i>
    <total>
      <array>
        <dimension dim="1">gridpoints</dimension>
        <dimension dim="2">spin</dimension>
        <field>energy</field>
        <field>total</field>
        <field>integrated</field>
        <set>
          <set comment="spin 1">
            <r>   -55.6029     0.0000     0.0000 </r>
            <r>   -55.3940     0.0000     0.0000 </r>
            <r>   -55.1850     0.0000     0.0000 </r>
            <r>   -54.9761     0.0000     0.0000 </r>
          </set>
          <set comment="spin 2">
            <r>   -55.6029     0.0000     0.0000 </r>
            <r>   -55.3940     0.0000     0.0000 </r>
            <r>   -55.1850     0.0000     0.0000 </r>
            <r>   -54.9761     0.0000     0.0000 </r>
          </set>
        </set>
      </array>
    </total>
    <partial>
      <array>
        <dimension dim="1">gridpoints</dimension>
        <dimension dim="2">spin</dimension>
        <dimension dim="3">ion</dimension>
        <field>energy</field>
        <field>    s</field>
        <field>   py</field>
        <field>   pz</field>
        <field>   px</field>
        <field>  dxy</field>
        <field>  dyz</field>
        <field>  dz2</field>
        <field>  dxz</field>
        <field>x2-y2</field>
        <set>
          <set comment="ion 1">
            <set comment="spin 1">
              <r>   -55.6029     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
              <r>   -55.3940     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
              <r>   -55.1850     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
              <r>   -54.9761     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
            </set>
            <set comment="spin 2">
              <r>   -55.6029     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
              <r>   -55.3940     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
              <r>   -55.1850     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
              <r>   -54.9761     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 </r>
            </set>
          </set>
        </set>
      </array>
    </partial>
  </dos>
</modeling>

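The dos tag is nested, as shown above, and the values inside spin 1 and the other sets are not necessarily 0 in the real file. I have managed to reach the dos tag and read the efermi value, but I do not understand how to fetch each set separately and selectively so that I can plot it with matplotlib.

This is my current code: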

#!/usr/bin/env python
import xml.etree.ElementTree as ET

tree = ET.parse("trial.xml")
root = tree.getroot()

for elem in root:
  print(elem.tag)
  if elem.tag == 'dos':
    for x in elem:
      print(x.attrib.get('name'), x.text)

【Comments on the question】:

Tags: python xml-parsing

【Solution 1】:

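You can fetch the sets directly with findall (https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findall) and an XPath expression, that is, a slash-separated path such as './node_a/node_b/req_node', as in the code below. ElementTree supports only a limited subset of XPath, but that is enough here. The comment attribute of each set can be read as well.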

import xml.etree.ElementTree as ET
import pandas as pd
from matplotlib import pyplot as plt

# ET.parse() reads a file; ET.fromstring() expects XML text, not a filename
tree = ET.parse("test.xml")
root = tree.getroot()


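# Each match is one innermost <set>, i.e. one spin channel of one ion
for elem in root.findall('./dos/partial/array/set/set/set'):
    comment = elem.get('comment')  # e.g. "spin 1"
    print(comment)
    data = list()
    # Each <r> element holds one row of whitespace-separated numbers
    for row in elem.findall('r'):
        data.append(list(map(float, row.text.split())))
    df = pd.DataFrame(data)
    print(df)
    df.plot()
    plt.show()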
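
The loop above visits every innermost set. Since the question asks how to pick sets selectively, here is a minimal sketch of one way to do that with the attribute predicates ElementTree's XPath subset supports ([@comment='...']). The helper name read_partial_dos and the inline SAMPLE string are assumptions made for illustration. SAMPLE is a shortened stand-in for the question's file and its non-zero values are invented, not real data.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file; the numbers are made up.
SAMPLE = """<modeling>
  <dos>
    <i name="efermi">     -2.48501882 </i>
    <partial>
      <array>
        <field>energy</field>
        <field>    s</field>
        <field>   py</field>
        <set>
          <set comment="ion 1">
            <set comment="spin 1">
              <r>   -55.6029     0.1000     0.2000 </r>
              <r>   -55.3940     0.3000     0.4000 </r>
            </set>
            <set comment="spin 2">
              <r>   -55.6029     0.5000     0.6000 </r>
              <r>   -55.3940     0.7000     0.8000 </r>
            </set>
          </set>
        </set>
      </array>
    </partial>
  </dos>
</modeling>"""


def read_partial_dos(root, ion, spin):
    """Return (field_names, rows) for one ion/spin block of the partial DOS.

    Hypothetical helper: it assumes the comment attributes follow the
    "ion N" / "spin N" pattern seen in the question's file.
    """
    array = root.find("./dos/partial/array")
    # Field names are padded with spaces in the file, so strip them.
    names = [field.text.strip() for field in array.findall("field")]
    # Attribute predicates select exactly one ion and one spin channel.
    path = f"./set/set[@comment='ion {ion}']/set[@comment='spin {spin}']/r"
    rows = [[float(v) for v in r.text.split()] for r in array.findall(path)]
    return names, rows


# For a file on disk, use ET.parse("trial.xml").getroot() instead.
root = ET.fromstring(SAMPLE)
efermi = float(root.find("./dos/i[@name='efermi']").text)
names, rows = read_partial_dos(root, ion=1, spin=2)
print(efermi, names)
print(rows)
```

The result can then go to pandas with column labels, for example pd.DataFrame(rows, columns=names).plot(x="energy"), so the legend shows the orbital names instead of column indices.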

【Comments】: