Posted: 2019-01-26 02:56:21
Question:
I'm using Beautiful Soup to parse a set of XML files and extract some information from them, like this:
```python
import os
from glob import glob

from bs4 import BeautifulSoup

a_lis = []
for filepath in glob(os.path.join('../data/trainingFiles/', '*.xml')):
    with open(filepath) as f:
        content = f.read()
    # 'lxml' selects lxml's HTML parser, which lowercases tag names
    results = BeautifulSoup(content, 'lxml')
    #print(results)
    for LabelInteractions in results.find_all("labelinteractions"):
        #print(LabelInteractions)
        for labelinteractions in LabelInteractions.find_all('labelinteraction'):
            print(labelinteractions)
```
Output:
```
<labelinteraction precipitant="ritonavir" precipitantcode="N0000007423" type="Unspecified interaction"></labelinteraction>
<labelinteraction precipitant="gc stimulator" precipitantcode="NO MAP" type="Unspecified interaction"></labelinteraction>
....
<labelinteraction precipitant="riociguat" precipitantcode="N0000188995" type="Unspecified interaction"></labelinteraction>
<labelinteraction effect=" 25064002: Headache (finding)" precipitant="alcohol" precipitantcode="N0000007432" type="Pharmacodynamic interaction"></labelinteraction>
```
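Each printed element is a Beautiful Soup `Tag`, and its attributes are exposed as a plain dict through `Tag.attrs`. Elements without an `effect` attribute simply lack that key. The minimal check below uses two elements copied from the output above. It assumes Python's built-in `html.parser` in place of `lxml` so it runs without the lxml package. Like the `lxml` HTML mode used in the question, `html.parser` lowercases tag and attribute names.

```python
from bs4 import BeautifulSoup

# Two elements copied from the printed output above.
SAMPLE = (
    '<labelinteraction precipitant="ritonavir" precipitantcode="N0000007423" '
    'type="Unspecified interaction"></labelinteraction>'
    '<labelinteraction effect=" 25064002: Headache (finding)" precipitant="alcohol" '
    'precipitantcode="N0000007432" type="Pharmacodynamic interaction"></labelinteraction>'
)

# html.parser ships with Python; swapped in here for 'lxml' (an assumption).
soup = BeautifulSoup(SAMPLE, "html.parser")
tags = soup.find_all("labelinteraction")
first, second = tags

# Tag.attrs is a dict mapping attribute name -> attribute value.
print(first.attrs)
# Tag.get returns None when the attribute is absent.
print(first.get("effect"))
print(second.get("effect"))
```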
How can I convert these XML attributes into a pandas DataFrame with the following columns?
precipitant precipitantcode type effect
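One way to get there is to collect each element's `attrs` dict into a list and pass that list to `pd.DataFrame` with an explicit `columns` list. Keys missing from a row (such as `effect`) become `NaN`, and the `columns` argument fixes the column order. The sketch below rests on two assumptions. First, the helper name `interactions_to_frame` is hypothetical. Second, it takes the XML text as strings and parses with `html.parser` rather than `lxml`.

```python
import os
from glob import glob

import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["precipitant", "precipitantcode", "type", "effect"]


def interactions_to_frame(xml_documents):
    """Hypothetical helper: one DataFrame row per <labelinteraction> element.

    xml_documents is an iterable of XML strings (e.g. each file's content).
    """
    rows = []
    for content in xml_documents:
        soup = BeautifulSoup(content, "html.parser")
        for group in soup.find_all("labelinteractions"):
            for tag in group.find_all("labelinteraction"):
                # Copy the attribute dict; absent attributes become NaN later.
                rows.append(dict(tag.attrs))
    # The columns list orders the columns and ignores any other attributes.
    return pd.DataFrame(rows, columns=COLUMNS)


if __name__ == "__main__":
    # Same file loop as in the question; the directory is the asker's.
    contents = []
    for filepath in glob(os.path.join("../data/trainingFiles/", "*.xml")):
        with open(filepath) as f:
            contents.append(f.read())
    print(interactions_to_frame(contents))
```

Building the list of dicts first and creating the DataFrame once at the end avoids growing a DataFrame row by row inside the loop.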
Comments:
Tags: python python-3.x pandas beautifulsoup lxml