【Question Title】: Python - XML file to Pandas Dataframe [duplicate]
【Posted】: 2021-04-09 19:13:46
【Question】:

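I'm fairly new to Python and would appreciate some help converting an XML file to a Pandas DataFrame. I have searched other resources but am still stuck. I want every field between the tags to end up in one table. Any help is greatly appreciated! Thanks.

Below is the code I tried, but it does not work correctly.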

import xml.etree.ElementTree as ET
import pandas as pd

xml_data = open('5249009-08-34-59-126029.xml', 'r').read()
root = ET.XML(xml_data)

data = []
cols = []
for i, child in enumerate(root):
    data.append([subchild.text for subchild in child])
    cols.append(child.tag)

df = pd.DataFrame(data).T 
df.columns = cols 

print(df)

Here is the sample input data:

<?xml version="1.0"?>

<RECORDING>

<IDENT>0</IDENT>

<DEVICEID>133242232</DEVICEID>

<DEVICEALIAS>52232009</DEVICEALIAS>

<GROUP>1823481655</GROUP>

<GATE>1011655</GATE>

<ANI>7777777777</ANI>

<DNIS>777777777</DNIS>

<USER1>00:07:53.2322691,00:03:21.34232761</USER1>

<USER2>text</USER2>

<USER3/>

<USER4/>

<USER5>34fc0a8d-d5632c9b1</USER5>

<USER6>000dfsdf98701596638094</USER6>

<USER7>97</USER7>

<USER8>00701596638094</USER8>

<USER9>10155</USER9>

<USER10/>

<USER11/>

<USER12/>

<USER13>Text</USER13>

<USER14>4</USER14>

<USER15>10</USER15>

<CALLSEGMENTID/>

<CALLID>9870</CALLID>

<FILENAME>\\folderpath\folderpath\folderpath\folderpath\2020\Aug\05\5249009\52343109-234234-34-59-1234234029</FILENAME>

<DURATION>201</DURATION>

<STARTYEAR>2020</STARTYEAR>

<STARTMONTH>08</STARTMONTH>

<STARTMONTHNAME>August</STARTMONTHNAME>

<STARTDAY>05</STARTDAY>

<STARTDAYNAME>Wednesday</STARTDAYNAME>

<STARTHOUR>08</STARTHOUR>

<STARTMINUTE>34</STARTMINUTE>

<STARTSECOND>59</STARTSECOND>

<PRIORITY>50</PRIORITY>

<RECORDINGTYPE>S</RECORDINGTYPE>

<CALLDIRECTION>I</CALLDIRECTION>

<SCREENCAPTURE>7</SCREENCAPTURE>

<KEEPCALLFORDAYS>90</KEEPCALLFORDAYS>

<BLACKOUTREMOTEAUDIO>false</BLACKOUTREMOTEAUDIO>

<BLACKOUTS/>

</RECORDING>

【Comments on the Question】:

Tags: python xml pandas dataframe


【Solution 1】:

One possible way to parse the file:

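import pandas as pd
from bs4 import BeautifulSoup

# The "xml" parser in BeautifulSoup requires lxml to be installed.
with open("your_file.xml", "r") as f:
    soup = BeautifulSoup(f, "xml")

# Map each direct child tag of <RECORDING> to its text (empty tags become "").
d = {}
for tag in soup.RECORDING.find_all(recursive=False):
    d[tag.name] = tag.get_text(strip=True)

# A list holding one dict gives a one-row DataFrame.
df = pd.DataFrame([d])
print(df)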

Prints:

  IDENT   DEVICEID DEVICEALIAS       GROUP     GATE         ANI       DNIS                               USER1 USER2 USER3 USER4               USER5                   USER6 USER7           USER8  USER9 USER10 USER11 USER12 USER13 USER14 USER15 CALLSEGMENTID CALLID                                           FILENAME DURATION STARTYEAR STARTMONTH STARTMONTHNAME STARTDAY STARTDAYNAME STARTHOUR STARTMINUTE STARTSECOND PRIORITY RECORDINGTYPE CALLDIRECTION SCREENCAPTURE KEEPCALLFORDAYS BLACKOUTREMOTEAUDIO BLACKOUTS
0     0  133242232    52232009  1823481655  1011655  7777777777  777777777  00:07:53.2322691,00:03:21.34232761  text              34fc0a8d-d5632c9b1  000dfsdf98701596638094    97  00701596638094  10155                        Text      4     10                 9870  \\folderpath\folderpath\folderpath\folderpath\...      201      2020         08         August       05    Wednesday        08          34          59       50             S             I             7              90               false          
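
The same dictionary can probably be built with the standard library alone. In the sample input every child of <RECORDING> is a leaf element, so iterating over the children of each child (as the code in the question does) produces empty lists; reading child.tag and child.text directly avoids that. What follows is only a minimal sketch using xml.etree.ElementTree instead of BeautifulSoup. SAMPLE is a shortened, made-up excerpt with the same shape as the input, and record_to_row is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up sample with the same shape as the input in the question.
SAMPLE = """<?xml version="1.0"?>
<RECORDING>
  <IDENT>0</IDENT>
  <DEVICEID>133242232</DEVICEID>
  <USER3/>
  <DURATION>201</DURATION>
</RECORDING>"""


def record_to_row(xml_text):
    """Map each direct child of the root element to its text ('' if empty)."""
    root = ET.fromstring(xml_text)
    # child.text is None for empty tags such as <USER3/>, hence the `or ""`.
    return {child.tag: (child.text or "").strip() for child in root}


row = record_to_row(SAMPLE)
print(row)
```

Passing the result to pd.DataFrame([row]) should then give a one-row frame, as in the answer above.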

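【Comments】:

  • @adrej Kesely - How would I modify the code to loop over multiple XML files in the same directory?
  • @JK34JK34 Try the example in this answer: stackoverflow.com/questions/18262293/… Then you can append each dictionary to a list and build the frame from that list with df = pd.DataFrame(lst).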
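
The link in the comment above is truncated, so the following is not that answer's code. It is just one way the "append each dictionary to a list" idea might look, assuming every file has a single flat root element like <RECORDING>. It uses pathlib and xml.etree.ElementTree rather than BeautifulSoup; load_records and the SOURCE_FILE column are hypothetical names, and the two demo files are made up.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_records(directory):
    """Return one dict per *.xml file in `directory`, in sorted filename order."""
    rows = []
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        # One key per direct child of the root; empty tags become "".
        row = {child.tag: (child.text or "").strip() for child in root}
        row["SOURCE_FILE"] = path.name  # hypothetical extra column to trace each row
        rows.append(row)
    return rows


# Demo with two tiny, made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, ident in [("a.xml", "0"), ("b.xml", "1")]:
        Path(tmp, name).write_text(
            f"<RECORDING><IDENT>{ident}</IDENT><USER3/></RECORDING>"
        )
    rows = load_records(tmp)

print(rows)
```

The list can then be handed to pd.DataFrame(rows), which gives one row per file; tags missing from a file would show up as NaN in that row.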