【Posted】:2017-06-11 03:35:37
【Question】:
I am using BeautifulSoup to parse Tableau twb XML files to get the list of worksheets in a report.
The XML that holds the value I am looking for is
<window class='worksheet' name='ML Productivity'>
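I am struggling with how to find every element with class='worksheet' and then read its name attribute; for the element above, I want to get the value 'ML Productivity'.
My current code is below.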
import sys, os
import bs4 as bs
twbpath = "C:/tbw tbwx files/"
outpath = "C:/out/"
outFile = open(outpath + 'output.txt', "w")
#twbList = open(outpath + 'twb.txt', "w")
for subdir, dirs, files in os.walk(twbpath):
    for file in files:
        if file.endswith('.twb'):
            print(subdir.replace(twbpath,'') + '-' + file)
            filepath = open(subdir + '/' + file, encoding='utf-8').read()
            soup = bs.BeautifulSoup(filepath, 'xml')
            classnodes = soup.findAll('window')
            for classnode in classnodes:
                if str(classnode) == 'worksheet':
                    outFile.writelines(file + ',' + str(classnode) + '\n')
                    print(subdir.replace(twbpath,'') + '-' + file, classnode)
outFile.close()
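One likely reason the loop never writes anything: `str(classnode)` is the full markup of the `<window>` element, so it can never equal the string 'worksheet'. A minimal sketch of one way to do the lookup is to filter on the `class` attribute and then read `name`. The sample XML, the extra window names ('Overview', 'Sales by Region'), the helper name `worksheet_names`, and its `parser` parameter are all hypothetical and only there for illustration; the default 'xml' parser matches the code above and needs lxml installed.

```python
import bs4 as bs

# Hypothetical, trimmed-down stand-in for a .twb file; real workbooks are much larger.
SAMPLE_TWB = """<workbook>
  <windows>
    <window class='worksheet' name='ML Productivity'></window>
    <window class='dashboard' name='Overview'></window>
    <window class='worksheet' name='Sales by Region'></window>
  </windows>
</workbook>"""


def worksheet_names(xml_text, parser='xml'):
    """Return the name attribute of every <window class='worksheet'> element."""
    soup = bs.BeautifulSoup(xml_text, parser)
    # Match on the attribute value instead of comparing the whole tag as a string.
    windows = soup.find_all('window', attrs={'class': 'worksheet'})
    # Skip any window that happens to have no name attribute.
    return [w['name'] for w in windows if w.has_attr('name')]
```

Inside the loop from the question, the text read from each .twb file could be passed to `worksheet_names(...)`, and each returned name written to the output file in place of `str(classnode)`.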
【Discussion】:
Tags: python xml beautifulsoup