【Question title】: python: parsing XML fields
【Posted】: 2019-11-14 11:43:51
【Question】:

With the Python 3 script below, I can parse the XML records and turn them into a list (by extracting the value attribute of each field).

Please help me improve it so that it prints name:value pairs from the XML records.

For example, given the following snippet:

<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>

I currently get this output:

RESGJG, PY

Required output:

RecordType:RESGJG, RecordTypeHEC:PY
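A minimal sketch of the core idea, using only the two sample field elements above: read both the name and the value attribute of each field and join them with ":". The wrapping group element and the variable names are assumptions added so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# The two sample fields, wrapped in a <group> element so they form one XML document
snippet = """<group>
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
</group>"""

group = ET.fromstring(snippet)
# Pair each field's name attribute with its value attribute, e.g. "RecordType:RESGJG"
pairs = [f"{f.get('name')}:{f.get('value')}" for f in group.iter('field')]
line = ", ".join(pairs)
print(line)  # RecordType:RESGJG, RecordTypeHEC:PY
```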

My input file is dummy.xml. (Note that it contains two records; each record starts with record source="AJS/SHD".)

<?xml version="1.0" encoding="UTF-8"?>
<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
<field name="NodeID" value="rock.dsjjgds.cm"/>
<field name="SequenceNumber" value="7937973"/>
<field name="StartDate" value="20171049979"/>
<field name="EndDate" value="201704059739793"/>
<field name="CallDuration" value="973979i"/>
<field name="CauseForRecordClosing" value="normal"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
<field name="CallID" value="sdidydakyd2133@10.10.10.1"/>
<field name="User-Agent" value="NotPresent"/>
<field name="Request-URI" value="sip:+47668384"/>
<field name="CalledPartyNumber" value="sip:+08779379972"/>
<field name="CallingPartyNumber" value="sip:+07073873772@10.0.0.1"/>
<field name="To" value="sip:+878379739"/>
<field name="From" value="sip:+937973962"/>
</group>
<group name="VPN">
<field name="VPN_NAME_B" value="blshahd"/>
<field name="VPN_Group_B" value="ctr"/>
<field name="B_ExtType" value="part"/>
<field name="B_ISDN" value="7973"/>
<field name="B_SIP" value="67367672"/>
<field name="B_PABXID" value="797397"/>
</group>
</record>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="MESGJG"/>
<field name="RecordTypeHEC" value="DY"/>
<field name="NodeID" value="rock.dsjjgds.cm"/>
<field name="SequenceNumber" value="7937973"/>
<field name="StartDate" value="20171049979"/>
<field name="EndDate" value="201704059739793"/>
<field name="CallDuration" value="973979i"/>
<field name="CauseForRecordClosing" value="normal"/>
</group>
<group name="SIP">
<field name="ICID" value="dshhkdhs"/>
<field name="CallID" value="sdidydakyd2133@10.10.10.1"/>
<field name="User-Agent" value="NotPresent"/>
<field name="Request-URI" value="sip:+47668384"/>
<field name="CalledPartyNumber" value="sip:+08779379972"/>
<field name="CallingPartyNumber" value="sip:+07073873772@10.0.0.1"/>
<field name="To" value="sip:+878379739"/>
<field name="From" value="sip:+937973962"/>
</group>
<group name="VPN">
<field name="VPN_NAME_B" value="blshahd"/>
<field name="VPN_Group_B" value="ctr"/>
<field name="B_ExtType" value="part"/>
<field name="B_ISDN" value="7973"/>
<field name="B_SIP" value="67367672"/>
<field name="B_PABXID" value="797397"/>
</group>
</record>
</records>

I have tried the script below to parse the XML fields and print them as a list.

import operator
from functools import reduce
from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse("dummy.xml")
root = tree.getroot()
data = []
# Collect the value attribute of every <field>, one list per <group>
groups = root.findall('.//group')
for group in groups:
    data.append([f.attrib['value'] for f in group.findall('./field')])
# Flatten the per-group lists and join them into one comma-separated string
q = reduce(operator.concat, data)
s = ", ".join(q)
print(s)

The output is:

RESGJG, PY, rock.dsjjgds.cm, 7937973, 20171049979, 201704059739793, 973979i, normal, dshhkdhs, sdidydakyd2133@10.10.10.1, NotPresent, sip:+47668384, sip:+08779379972, sip:+07073873772@10.0.0.1, sip:+878379739, sip:+937973962, blshahd, ctr, part, 7973, 67367672, 797397, MESGJG, DY, rock.dsjjgds.cm, 7937973, 20171049979, 201704059739793, 973979i, normal, dshhkdhs, sdidydakyd2133@10.10.10.1, NotPresent, sip:+47668384, sip:+08779379972, sip:+07073873772@10.0.0.1, sip:+878379739, sip:+937973962, blshahd, ctr, part, 7973, 67367672, 797397

Required output:

RecordType:RESGJG, RecordTypeHEC:PY, NodeID:rock.dsjjgds.cm, SequenceNumber:7937973, StartDate:20171049979, EndDate:201704059739793, CallDuration:973979i, CauseForRecordClosing:normal, ICID:dshhkdhs, CallID:sdidydakyd2133@10.10.10.1, User-Agent:NotPresent, Request-URI:sip:+47668384, CalledPartyNumber:sip:+08779379972, CallingPartyNumber:sip:+07073873772@10.0.0.1, To:sip:+878379739, From:sip:+937973962, VPN_NAME_B:blshahd, VPN_Group_B:ctr, B_ExtType:part, B_ISDN:7973, B_SIP:67367672, B_PABXID:797397,

RecordType:MESGJG, RecordTypeHEC:DY, NodeID:rock.dsjjgds.cm, SequenceNumber:7937973, StartDate:20171049979, EndDate:201704059739793, CallDuration:973979i, CauseForRecordClosing:normal, ICID:dshhkdhs, CallID:sdidydakyd2133@10.10.10.1, User-Agent:NotPresent, Request-URI:sip:+47668384, CalledPartyNumber:sip:+08779379972, CallingPartyNumber:sip:+07073873772@10.0.0.1, To:sip:+878379739, From:sip:+937973962, VPN_NAME_B:blshahd, VPN_Group_B:ctr, B_ExtType:part, B_ISDN:7973, B_SIP:67367672, B_PABXID:797397,

Please help.

【Question comments】:

  • You only get f.attrib['value']. You also need f.attrib['name']... and make data a dict, since you want a dict.
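Following that comment's suggestion, a sketch that builds one dict per record, mapping each field name to its value. The SAMPLE string is a hypothetical, shortened stand-in for dummy.xml with the same structure. Note that if the same field name appeared in two groups of one record, the later value would overwrite the earlier one; that does not happen in the data shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for dummy.xml: two records, a few fields each
SAMPLE = """<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
</group>
<group name="VPN">
<field name="B_ISDN" value="7973"/>
</group>
</record>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="MESGJG"/>
<field name="RecordTypeHEC" value="DY"/>
</group>
<group name="VPN">
<field name="B_ISDN" value="7973"/>
</group>
</record>
</records>"""

root = ET.fromstring(SAMPLE)
records = []
for record in root.findall('record'):
    # One dict per <record>; keys are the field names from all of its groups
    records.append({f.get('name'): f.get('value') for f in record.iter('field')})

for rec in records:
    print(rec)
```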

Tags: python regex python-3.x xml


【Answer 1】:

Your code only fetches the value attribute; it completely ignores name.

Also, using reduce here is overkill.

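import xml.etree.ElementTree as ET

root = ET.parse("dummy.xml").getroot()
# Print each <group> on its own line as "name: value" pairs
for group in root.findall('.//group'):
    print(', '.join('{}: {}'.format(field.attrib['name'], field.attrib['value']) for field in group.findall('./field')))
    print()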

This outputs:

RecordType: RESGJG, RecordTypeHEC: PY, NodeID: rock.dsjjgds.cm, SequenceNumber: 7937973, StartDate: 20171049979, EndDate: 201704059739793, CallDuration: 973979i, CauseForRecordClosing: normal

ICID: dshhkdhs, CallID: sdidydakyd2133@10.10.10.1, User-Agent: NotPresent, Request-URI: sip:+47668384, CalledPartyNumber: sip:+08779379972, CallingPartyNumber: sip:+07073873772@10.0.0.1, To: sip:+878379739, From: sip:+937973962

VPN_NAME_B: blshahd, VPN_Group_B: ctr, B_ExtType: part, B_ISDN: 7973, B_SIP: 67367672, B_PABXID: 797397

RecordType: MESGJG, RecordTypeHEC: DY, NodeID: rock.dsjjgds.cm, SequenceNumber: 7937973, StartDate: 20171049979, EndDate: 201704059739793, CallDuration: 973979i, CauseForRecordClosing: normal

ICID: dshhkdhs, CallID: sdidydakyd2133@10.10.10.1, User-Agent: NotPresent, Request-URI: sip:+47668384, CalledPartyNumber: sip:+08779379972, CallingPartyNumber: sip:+07073873772@10.0.0.1, To: sip:+878379739, From: sip:+937973962

VPN_NAME_B: blshahd, VPN_Group_B: ctr, B_ExtType: part, B_ISDN: 7973, B_SIP: 67367672, B_PABXID: 797397
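On the point that reduce is overkill: if a flat list of values is still wanted, a sketch using itertools.chain.from_iterable (a swapped-in technique, not something the answer itself uses) avoids reduce(operator.concat, ...), which builds a new list at every step. The data list here is a hypothetical, shortened version of the question's per-group value lists.

```python
from itertools import chain

# Per-group value lists, shaped like the question's `data` (shortened, hypothetical)
data = [["RESGJG", "PY"], ["dshhkdhs"], ["blshahd", "ctr"]]

# chain.from_iterable walks the sublists one after another without copying them repeatedly
flat = list(chain.from_iterable(data))
print(", ".join(flat))  # RESGJG, PY, dshhkdhs, blshahd, ctr
```

Alternatively, root.findall('.//field') already returns every field element in document order, so the nested lists and the flattening step can be skipped entirely.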

【Comments】:

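  • Hi, thanks for your reply. I'm new to programming and wasn't sure whether mentioning "dict format" was the right way to put it, but the output I need is the same as mentioned above.
  • Your first code is good. My only problem is that I need to print all the fields under dict {} separately for each record, since this dummy.xml has 2 records. Could I get help printing them separately, line by line (as shown in the required output)?
  • @SSK Please make up your mind about the required output. It started as a dict, became a string, and now it's a dict again?
  • Sorry, I didn't see the updated answer before I sent my comment. Thanks, it helped and meets my requirement.
  • Thank you very much. I got the required output from the updated code above.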
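The updated code these comments refer to is not shown here. As a sketch of one way to get one line per record (rather than one per group, as in the answer's output), assuming the same structure as dummy.xml: iterate over record elements instead of group elements and collect every field inside each record. The SAMPLE string and the record_lines helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for dummy.xml (same structure, fewer fields)
SAMPLE = """<records>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="RESGJG"/>
<field name="RecordTypeHEC" value="PY"/>
</group>
<group name="VPN">
<field name="B_PABXID" value="797397"/>
</group>
</record>
<record source="AJS/SHD" type="call">
<group name="General">
<field name="RecordType" value="MESGJG"/>
<field name="RecordTypeHEC" value="DY"/>
</group>
<group name="VPN">
<field name="B_PABXID" value="797397"/>
</group>
</record>
</records>"""


def record_lines(root):
    """Return one "name:value, name:value, ..." string per <record> (hypothetical helper)."""
    lines = []
    for record in root.findall('record'):
        # iter('field') walks every <group> of this record in document order
        pairs = (f"{f.get('name')}:{f.get('value')}" for f in record.iter('field'))
        lines.append(", ".join(pairs))
    return lines


# With the real file: root = ET.parse("dummy.xml").getroot()
root = ET.fromstring(SAMPLE)
lines = record_lines(root)
print("\n\n".join(lines))
```

The required output in the question ends each line with a trailing comma; if that is really needed, append "," to each string before printing.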