Parsing XML with Python and exporting to Excel

【Question title】: Parsing XML with Python and exporting to Excel
【Posted】: 2019-12-04 21:46:12
【Question】:

I have an XML file that looks like this:

     <Result name="1">
       <point>
       <objects>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
             <path/>
          <object/>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
               </path>
            <object/>
         <objects/>
      <Result/>
   <Results/>

I would like a Python script that exports this to Excel in the following format:

Any help is much appreciated. Thank you.

【Comments】:

What have you tried?

Tags: python xml parsing

【Solution 1】:

I suggest reading the Python documentation. From https://docs.python.org/2/library/xml.etree.elementtree.html:

 import xml.etree.ElementTree as ET
 tree = ET.parse('country_data.xml')
 root = tree.getroot()

I also suggest looking at https://xlsxwriter.readthedocs.io/

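As for the logic of manipulating the data: unless you know the exact order and structure of the XML file, you may run into problems. As far as I can tell, nothing in the file distinguishes <node>A</node> from <node>a</node>, and nothing limits the number of nodes to exactly 8. You will need to check for these and similar cases to make sure everything ends up correctly aligned in the Excel file.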
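The checks described above can be sketched with ElementTree alone. This is only an illustration under stated assumptions: `SAMPLE` is a cleaned-up, well-formed version of the XML from the question (the original sample is not well-formed and ElementTree would reject it), and `result_rows` is a hypothetical helper name. It pairs nodes purely by position and refuses a path with an odd number of nodes.

```python
import xml.etree.ElementTree as ET

# Assumption: a cleaned-up, well-formed version of the sample in the question.
SAMPLE = """
<Results>
  <Result name="1">
    <objects>
      <object>
        <path>
          <node>A</node><node>a</node>
          <node>B</node><node>b</node>
        </path>
      </object>
      <object>
        <path>
          <node>C</node><node>c</node>
          <node>D</node><node>d</node>
        </path>
      </object>
    </objects>
  </Result>
</Results>
"""


def result_rows(xml_text):
    """Return one (name, upper_row, lower_row) tuple per <Result> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter("Result"):
        upper, lower = [], []
        for path in result.iter("path"):
            texts = [node.text for node in path.findall("node")]
            # Nothing in the XML marks which node is which, so pair them by
            # position and reject input that cannot be split into pairs.
            if len(texts) % 2 != 0:
                raise ValueError("odd number of <node> elements in a <path>")
            upper.extend(texts[0::2])  # 1st, 3rd, ... node of each path
            lower.extend(texts[1::2])  # 2nd, 4th, ... node of each path
        rows.append((result.get("name"), upper, lower))
    return rows


print(result_rows(SAMPLE))
```

Each tuple could then be handed to XlsxWriter, for example one `worksheet.write_row(row, col, data)` call per row. Whether positional pairing is the right rule depends on the real files, which the question does not show.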

【Comments】:

【Solution 2】:

One solution is to convert this XML (?) into an HTML table and then load the HTML table into Excel.

For example (using the BeautifulSoup library):

data = '''
     <Result name="1">
       <point>
       <objects>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
             <path/>
          <object/>
          <object>
             <path>
                <node>A</node>
                <node>a</node>
                <node>B</node>
                <node>b</node>
                <node>C</node>
                <node>c</node>
                <node>D</node>
                <node>d</node>
               </path>
            <object/>
         <objects/>
      <Result/>
   <Results/>'''

import re
from bs4 import BeautifulSoup

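# The sample closes elements with <path/> instead of </path>; rewrite those
# before parsing. html.parser is lenient about the remaining unbalanced tags.
soup = BeautifulSoup(re.sub(r'<(.*?)/>', r'</\1>', data), 'html.parser')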

num_objects = len(soup.select('object'))
num_nodes = len(soup.select_one('object').select('node')) // 2

print('<html><table border=1>')
print('<tr>')
print('<th>Result</th>')
for i in range(num_objects):
    print('<th colspan={}>Object</th>'.format(num_nodes))
print('</tr>')

for result in soup.select('Result[name]'):
    print('<tr>')
    print('<td rowspan=2>{}</td>'.format(result['name']))

    nodes = result.select('node')
    for node in nodes[::2]:
        print('<td>' + node.text + '</td>')
    print('</tr>')
    print('<tr>')
    for node in nodes[1::2]:
        print('<td>' + node.text + '</td>')
    print('</tr>')
print('</table></html>')

Prints:

<html><table border=1>
<tr>
<th>Result</th>
<th colspan=4>Object</th>
<th colspan=4>Object</th>
</tr>
<tr>
<td rowspan=2>1</td>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
</tr>
<tr>
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
</tr>
</table></html>
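A side note on the `re.sub` call in the script: it is what turns the sample's `<path/>`-style closers into real closing tags before parsing. Because `.` does not match a newline by default, the pattern only behaves as intended when each tag sits on its own line, as it does in the sample. A minimal sketch of that step in isolation follows; `STRICT` is a hypothetical stricter pattern and not part of the original answer.

```python
import re

ORIGINAL = re.compile(r'<(.*?)/>')   # pattern used in the script above
STRICT = re.compile(r'<(\w+)\s*/>')  # hypothetical variant: tag name only

multi_line = "<path>\n<node>A</node>\n<path/>"
one_line = "<path><node>A</node><path/>"

# One tag per line: only the trailing <path/> is rewritten.
print(ORIGINAL.sub(r'</\1>', multi_line))

# Everything on one line: the lazy group runs from the first "<" to the
# first "/>", so the original pattern mangles the markup.
print(ORIGINAL.sub(r'</\1>', one_line))   # </path><node>A</node><path>

# The stricter pattern only accepts a bare tag name before "/>".
print(STRICT.sub(r'</\1>', one_line))     # <path><node>A</node></path>
```

Both patterns would also rewrite a genuinely self-closing element such as `<br/>`, so this trick is only safe on input known to contain none.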

In Firefox, the generated HTML table looks like this:

Loading the resulting HTML into LibreOffice Calc is easy, and it should be just as easy in Excel.
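To open the table in a spreadsheet application, the HTML has to end up in a file rather than on standard output. A minimal sketch, assuming a hypothetical helper `rows_to_html` and a hypothetical output name `table.html`; it builds only the two data rows and leaves out the header row from the script above.

```python
import html
from pathlib import Path


def rows_to_html(name, upper, lower):
    """Build the two-row table as a single HTML string."""
    def cells(values):
        # Escape cell text so characters like "<" or "&" survive intact.
        return ''.join('<td>{}</td>'.format(html.escape(v)) for v in values)

    return (
        '<html><table border=1>'
        '<tr><td rowspan=2>{}</td>{}</tr>'
        '<tr>{}</tr>'
        '</table></html>'
    ).format(html.escape(name), cells(upper), cells(lower))


table = rows_to_html('1', ['A', 'B'], ['a', 'b'])

# Hypothetical output file; open it from the spreadsheet application.
Path('table.html').write_text(table, encoding='utf-8')
```

Running the original script with its output redirected to a file would achieve the same thing without changing the code.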

【Comments】: