【Question Title】: Effective way to convert an XML file to a CSV file?
【Posted】: 2021-01-03 12:19:14
【Question Description】:

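I am trying to find a way to convert an xml file to a csv file using Python. I want the script to parse the xml file for each alarm (see the xml snippet below).

So it would generate an xls file with columns for eventType, probableCause, description, and severities, in a format similar to this:

My code doesn't work; it only writes the column names.

XML sample: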

<?xml version="1.0" encoding="UTF-8"?>

<faults version="1" xmlns="urn:nortel:namespaces:mcp:faults" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:faults NortelFaultSchema.xsd ">
    <family longName="1OffMsgr" shortName="OOM"/>
    <family longName="ACTAGENT" shortName="ACAT">
        <logs>
           <log>
                <eventType>RES</eventType>
                <number>1</number>
                <severity>INFO</severity>
                <descTemplate>
                     <msg>Accounting is enabled upon this NE.</msg>
               </descTemplate>
               <note>This log is generated when setting a Session Manager's AM from &lt;none&gt; to a valid AM.</note>
               <om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the  StdRecordStream group will appear and start counting the recording units sent to the configured AM.
                   On the configured AM, the &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will appear and start counting the recording units received from this Session Manager's instances.
               </om>
            </log>
           <log>
                <eventType>RES</eventType>
                <number>2</number>
                <severity>ALERT</severity>
                <descTemplate>
                     <msg>Accounting is disabled upon this NE.</msg>
               </descTemplate>
               <note>This log is generated when setting a Session Manager's AM from a valid AM to &lt;none&gt;.</note>
               <action>If you do not intend for the Session Manager to produce accounting records, then no action is required.  If you do intend for the Session Manager to produce accounting records, then you should set the Session Manager's AM to a valid AM.</action>
               <om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the StdRecordStream group that matched the previous datafilled AM will disappear.
                   On the previously configured AM, the  &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will disappear.
               </om>
            </log>
        </logs>
    </family>
    <family longName="ACODE" shortName="AC">
        <alarms>
            <alarm>
                <eventType>ADMIN</eventType>
                <number>1</number>
                <probableCause>INFORMATION_MODIFICATION_DETECTED</probableCause>
                <descTemplate>
                    <msg>Configured data for audiocode server updated: $1</msg>
                     <param>
                         <num>1</num>
                         <description>AudioCode configuration data got updated</description>
                         <exampleValue>acgwy1</exampleValue>
                     </param>
               </descTemplate>
               <manualClearable></manualClearable>
               <correctiveAction>None. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
               <alarmName>Audiocode Server Updated</alarmName>
               <severities>
                     <severity>MINOR</severity>
               </severities>               
            </alarm>
            <alarm>
                <eventType>ADMIN</eventType>
                <number>2</number>
                <probableCause>CONFIG_OR_CUSTOMIZATION_ERROR</probableCause>
                <descTemplate>
                    <msg>Deployment for audiocode server failed: $1. Reason: $2.</msg>
                     <param>
                         <num>1</num>
                         <description>AudioCode Name</description>
                         <exampleValue>audcod</exampleValue>
                     </param>
                     <param>
                         <num>2</num>
                         <description>AudioCode Deployment failed reason</description>
                         <exampleValue>Failed to parse audiocode configuration data</exampleValue>
                     </param>
               </descTemplate>
               <manualClearable></manualClearable>
               <correctiveAction>Check the configuration of audiocode server. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
               <alarmName>Audiocode Server Deploy Failed</alarmName>
               <severities>
                     <severity>MINOR</severity> 
               </severities>               
            </alarm>
        </alarms>
    </family>
</faults>

What I tried (small sample):

from logging import root
from xml.etree import ElementTree
import os
import csv

tree = ElementTree.parse('Fault.xml')

sitescope_data = open('Out.csv', 'w', newline='', encoding='utf-8')
csvwriter = csv.writer(sitescope_data)

col_names = ['eventType', 'probableCause', 'description']
csvwriter.writerow(col_names)

root = tree.getroot()
for eventData in root.findall('alarms'):
    event_data = []
    event = eventData.find('alarm')


    event_id = event.find('eventType')
    if event_id != None :
        event_id = event_id.text
    event_data.append(event_id)

    csvwriter.writerow(event_data)

sitescope_data.close()

【Question Discussion】:

Tags: python xml csv type-conversion elementtree


【Solution 1】:
# Continues from the question's script: `tree` and `csvwriter` are defined there.
# Run this before sitescope_data.close().
root = tree.getroot()

def get_uri(elem):
    # ElementTree stores a namespaced tag as "{uri}tag"; return the "{uri}" prefix.
    if elem.tag[0] == "{":
        uri, ignore, tag = elem.tag[1:].partition("}")
        return f"{{{uri}}}"
    return ""

uri = get_uri(root)

def recurse(root):
    for child in root:
        recurse(child)
        print(child.tag)  # shows the namespace prefix carried by every tag
    for event in root.findall(f'{uri}alarm'):
        event_data = []
        event_id = event.find(f'{uri}eventType')
        if event_id is not None:
            event_id = event_id.text
        event_data.append(event_id)

        probableCause = event.find(f'{uri}probableCause')
        if probableCause is not None:
            probableCause = probableCause.text
        event_data.append(probableCause)

        # Third column: every <severity> under <severities>, comma-joined.
        severities = event.find(f'{uri}severities')
        if severities is not None:
            severity_data = ','.join([sv.text for sv in severities.findall(f'{uri}severity')])
            event_data.append(severity_data)
        else:
            event_data.append("")

        csvwriter.writerow(event_data)


recurse(root)

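Notes:

  1. Use recursion to walk the XML.
  2. The print statement will show you that every one of your tags is prefixed with {urn:nortel:namespaces:mcp:faults}, which comes from the xmlns attribute on the root element. That is probably what was tripping you up the most. I added a function that extracts this "uri" text and prepends it to every tag name.
  3. You want to append more than one column each time you write a row to the csv.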
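
As an alternative to prefixing every tag by hand, the notes above can also be sketched with ElementTree's `namespaces` mapping, which lets `.//f:alarm` find alarms at any depth without explicit recursion. This is only a minimal sketch, not the answer's code: the `f` prefix, the `alarm_rows` helper, the trimmed `SAMPLE` document (including its second `MAJOR` severity), and the in-memory `io.StringIO` output are all assumptions made for illustration.

```python
import csv
import io
from xml.etree import ElementTree

# Hypothetical, trimmed-down stand-in for Fault.xml (same default namespace).
SAMPLE = """<faults version="1" xmlns="urn:nortel:namespaces:mcp:faults">
  <family longName="ACODE" shortName="AC">
    <alarms>
      <alarm>
        <eventType>ADMIN</eventType>
        <probableCause>INFORMATION_MODIFICATION_DETECTED</probableCause>
        <severities><severity>MINOR</severity></severities>
      </alarm>
      <alarm>
        <eventType>ADMIN</eventType>
        <probableCause>CONFIG_OR_CUSTOMIZATION_ERROR</probableCause>
        <severities><severity>MINOR</severity><severity>MAJOR</severity></severities>
      </alarm>
    </alarms>
  </family>
</faults>"""

# Map an arbitrary prefix ("f" is our choice) to the document's namespace URI.
NS = {"f": "urn:nortel:namespaces:mcp:faults"}

def alarm_rows(root):
    """Yield [eventType, probableCause, severities] for every <alarm>."""
    for alarm in root.iterfind(".//f:alarm", NS):
        severities = [
            s.text.strip()
            for s in alarm.iterfind("f:severities/f:severity", NS)
            if s.text
        ]
        yield [
            alarm.findtext("f:eventType", default="", namespaces=NS),
            alarm.findtext("f:probableCause", default="", namespaces=NS),
            ",".join(severities),
        ]

root = ElementTree.fromstring(SAMPLE)
out = io.StringIO()  # stands in for open('Out.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(["eventType", "probableCause", "severities"])
writer.writerows(alarm_rows(root))
print(out.getvalue())
```

Because the joined severities contain a comma, `csv.writer` quotes that cell, so it still lands in a single column when the file is opened in a spreadsheet.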

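【Discussion】:

  • No problem. By the way, next time you post xml, make sure you post the whole file or close off all the tags, to spare poor sods like me while we try to find your problem. Cheers
  • I think I'm having some trouble moving to the next column and writing the next piece of data to the csv ('probableCause' in the xml). If you have time, could you briefly explain how I should do this properly? Thanks a lot.
  • probableCause is simple; I updated the code to show it. description is tricky because you have to do parameter substitution.
  • Ah, okay. And what about 'severity'? Since it is nested inside 'severities', do I have to write another for loop to reach it?
  • Yes, you need to iterate over each severity in severities and join them into a single comma-separated string.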
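
The parameter substitution mentioned in the comments is not shown in the answer, so here is one possible sketch of it. It assumes that each `$n` placeholder in `<msg>` should be replaced by the `<exampleValue>` of the `<param>` whose `<num>` is `n`; the thread does not say which field is wanted, so the hypothetical `field` argument lets you pick `description` instead. The `render_description` helper and the single-alarm `SAMPLE` are illustrative names, not part of the original code.

```python
import re
from xml.etree import ElementTree

NS = {"f": "urn:nortel:namespaces:mcp:faults"}

# One <alarm> cut down from the question's XML, for illustration only.
SAMPLE = """<alarm xmlns="urn:nortel:namespaces:mcp:faults">
  <descTemplate>
    <msg>Deployment for audiocode server failed: $1. Reason: $2.</msg>
    <param>
      <num>1</num>
      <description>AudioCode Name</description>
      <exampleValue>audcod</exampleValue>
    </param>
    <param>
      <num>2</num>
      <description>AudioCode Deployment failed reason</description>
      <exampleValue>Failed to parse audiocode configuration data</exampleValue>
    </param>
  </descTemplate>
</alarm>"""

def render_description(alarm, field="exampleValue"):
    """Fill the $n placeholders of <msg> from the matching <param> entries."""
    template = alarm.find("f:descTemplate", NS)
    if template is None:
        return ""
    msg = template.findtext("f:msg", default="", namespaces=NS)
    # Build {"1": value, "2": value, ...} keyed by each param's <num>.
    params = {
        p.findtext("f:num", default="", namespaces=NS).strip():
            p.findtext(f"f:{field}", default="", namespaces=NS).strip()
        for p in template.iterfind("f:param", NS)
    }
    # Replace each $n with its value; unknown placeholders are left untouched.
    return re.sub(r"\$(\d+)", lambda m: params.get(m.group(1), m.group(0)), msg)

alarm = ElementTree.fromstring(SAMPLE)
print(render_description(alarm))
# Deployment for audiocode server failed: audcod. Reason: Failed to parse audiocode configuration data.
```

The returned string can be appended to `event_data` as the description column, in the same way as the other fields.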