XML Etree parse with tags named the same: answers

[Question title]: XML Etree parse with tags named the same
[Posted]: 2018-03-30 14:37:18
[Question]:

All, I've been trying to solve this for a few days. I think I'm close, but it just returns blanks because it isn't picking up the right XML.

Sample XML

<Attribute>
    <Name>Column1</Name>
    <Value>abcded</Value>
</Attribute>
<Attribute>
    <Name>Column2</Name>
    <Value>abcdef</Value>
</Attribute>
<Attribute>
    <Name>coumn3</Name>
    <Value>abcdef</Value>
</Attribute>

Code

for node in parsed_xml.iter():
    Attributes = node.get.attrib('column1')
    correlationssnnamecount = node.find('column2')
    divphoneaddrcount = node.find('column3')

df_xml = df_xml.append(
    pd.Series([column1, getvalueofnode(column2),
               getvalueofnode(column3)], 
               index=dfcols),
               ignore_index=True)

print df_xml

What I'm looking for is basically a DataFrame whose column headers are the names (e.g. "Column1") with the corresponding values underneath.

[Question discussion]:

Tags: python xml pandas elementtree

[Answer 1]:

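Since the XML in the question has changed, here is an updated solution. Note, however, that for the file to be well-formed XML it needs a single root element, i.e. something like

<data> ...</data>

at the beginning and end.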
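Since the question's sample has several top-level <Attribute> elements and no root, the wrapper can also be added in code instead of editing the file. A minimal sketch, assuming the whole document fits in memory as a str; parse_fragment is a hypothetical helper name, and stripping a leading <?xml ...?> declaration is an added assumption (a declaration is not allowed once the text is wrapped):

```python
import re
import xml.etree.ElementTree as ET

def parse_fragment(text):
    """Parse XML with several top-level elements by wrapping it in <data>."""
    # Drop a leading XML declaration; it would be invalid inside the wrapper
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    return ET.fromstring("<data>" + text + "</data>")

fragment = """
<Attribute>
    <Name>Column1</Name>
    <Value>abcded</Value>
</Attribute>
<Attribute>
    <Name>Column2</Name>
    <Value>abcdef</Value>
</Attribute>
"""

root = parse_fragment(fragment)
print([a.findtext("Name") for a in root.iter("Attribute")])  # ['Column1', 'Column2']
```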

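import xml.etree.ElementTree as ET
import pandas as pd

inp = ET.parse("inputfile.xml")
inroot = inp.getroot()
ss = {}

# Each <Attribute> holds a <Name>/<Value> pair: map the name text to the value text
for child in inroot.iter('Attribute'):
    for n, v in zip(child.findall(".//Name"), child.findall(".//Value")):
        ss.update({n.text: v.text})

# One-row DataFrame: the Name texts become the column headers
df = pd.DataFrame(ss, index=[0])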

Out:
   Column1 Column2  coumn3
0  abcded  abcdef  abcdef
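To try the approach above without an inputfile.xml on disk, here is a sketch on an inline copy of the sample, already wrapped in <data>. It swaps the zip over findall for findtext, which assumes each <Attribute> has exactly one <Name> and one <Value>:

```python
import xml.etree.ElementTree as ET
import pandas as pd

# The question's sample, wrapped in a <data> root so it is well-formed
xml_text = """<data>
<Attribute><Name>Column1</Name><Value>abcded</Value></Attribute>
<Attribute><Name>Column2</Name><Value>abcdef</Value></Attribute>
<Attribute><Name>coumn3</Name><Value>abcdef</Value></Attribute>
</data>"""

root = ET.fromstring(xml_text)

# Assumes exactly one <Name> and one <Value> per <Attribute>
row = {attr.findtext("Name"): attr.findtext("Value") for attr in root.iter("Attribute")}

df = pd.DataFrame([row])  # a single row; the names become the column headers
print(df)
```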

If your inputfile.xml is the XML file given here: https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

then it would be:

import xml.etree.ElementTree as ET
import pandas as pd

inp = ET.parse("inputfile.xml")
inroot = inp.getroot()

df = pd.DataFrame(columns=['Country', 'rank', 'year', 'gdppc'])

# Collect every <rank>, <year> and <gdppc> element in document order
ranks = inroot.findall(".//rank")
years = inroot.findall(".//year")
gdppc = inroot.findall(".//gdppc")

# Each direct child of the root is a <country> element with a "name" attribute
df['Country'] = [child.attrib['name'] for child in inroot]
df['rank'] = [r.text for r in ranks]
df['year'] = [y.text for y in years]
df['gdppc'] = [g.text for g in gdppc]

This produces the DataFrame you asked for:

df:

      Country     rank  year  gdppc
0  Liechtenstein    1  2008  141100
1      Singapore    4  2011   59900
2         Panama   68  2011   13600
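A variant of the country example that builds one dict per <country> instead of four parallel findall lists. This is a sketch on a trimmed inline copy of the docs' sample (the <neighbor> elements are omitted). The design choice: every field of a row is read from the same element, so a country missing, say, <gdppc> gets None in its own row, whereas the parallel-list version would end up with lists of different lengths.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Trimmed copy of the country example from the ElementTree docs (no <neighbor> elements)
xml_text = """<data>
  <country name="Liechtenstein"><rank>1</rank><year>2008</year><gdppc>141100</gdppc></country>
  <country name="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc></country>
  <country name="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc></country>
</data>"""

root = ET.fromstring(xml_text)

rows = []
for country in root.iter("country"):
    # All values for this row come from the same <country> element
    rows.append({
        "Country": country.get("name"),
        "rank": country.findtext("rank"),
        "year": country.findtext("year"),
        "gdppc": country.findtext("gdppc"),
    })

df = pd.DataFrame(rows, columns=["Country", "rank", "year", "gdppc"])
print(df)
```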

[Discussion]:

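Hi Vipluv, thanks for the reply. Will this also work for the .XML above? I was changing it while you were probably typing :( As an added complication, I have a bunch of .xml files in an S3 bucket, which is why I'm using an iterator.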
Oh... that's completely different. So the findall approach may not help much, i.e. you would do inroot.findall(".//Name") to get all the names and then inroot.findall(".//Value")... You'll need to use child.attrib and child.tag, as shown in [docs.python.org/2/library/xml.etree.elementtree.html]
Yes, I've been trying to figure out child.attrib and child.tag, but unfortunately without much success.
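Yes, this works. I just need to work out how to get rid of all the noise in the .XML documents, since they don't contain <data> and </data>... That said, the solution works really well.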
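Pulling together the two points from the comments (child.tag/child.attrib, and many files in an S3 bucket), here is a hedged sketch. It assumes each S3 object has already been read into a string (for example via boto3's get_object(...)["Body"].read().decode(), not shown) and that each document has a root element; row_from_xml and frame_from_documents are hypothetical helper names, and the docs list is made-up sample data:

```python
import xml.etree.ElementTree as ET
import pandas as pd

def row_from_xml(text):
    """Map each <Attribute>'s Name text to its Value text, using child.tag."""
    root = ET.fromstring(text)
    row = {}
    for attribute in root.iter("Attribute"):
        fields = {}
        for child in attribute:
            # child.tag is "Name" or "Value"; child.attrib is empty here because
            # these are child elements, not XML attributes
            fields[child.tag] = child.text
        row[fields.get("Name")] = fields.get("Value")
    return row

def frame_from_documents(texts):
    """One DataFrame row per XML document; names missing from a document become NaN."""
    return pd.DataFrame([row_from_xml(t) for t in texts])

# Hypothetical stand-ins for file contents already fetched from S3
docs = [
    "<data><Attribute><Name>Column1</Name><Value>a</Value></Attribute></data>",
    "<data><Attribute><Name>Column1</Name><Value>b</Value></Attribute>"
    "<Attribute><Name>Column2</Name><Value>c</Value></Attribute></data>",
]
print(frame_from_documents(docs))
```

Building the frame from a list of dicts lets pandas take the union of all names as columns, so a file that lacks an attribute simply gets NaN in that column.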