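【Posted】: 2017-04-21 04:18:01
【Question】:
I have Python code that parses an XML file and extracts all of its tags. Now I want to extract specific values associated with certain tags, but I ran into problems doing so. A sample of my XML file looks like this: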
<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Name</Data></Cell>
<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Label</Data></Cell>
<Cell ss:StyleID="s79"><Data ss:Type="String">Minimum Value</Data></Cell>
<Cell ss:StyleID="s79"><Data ss:Type="String">Maximum Value</Data></Cell>
<Cell ss:StyleID="s80"><Data ss:Type="String">Mean Value</Data></Cell>
<Row ss:AutoFitHeight="0" ss:Height="15">
<Cell ss:StyleID="s73"><Data ss:Type="String">Marks</Data></Cell>
<Cell ss:StyleID="s73"><Data ss:Type="String">Marks of Students</Data></Cell>
<Cell ss:StyleID="s82"><Data ss:Type="Number">0</Data></Cell>
<Cell ss:StyleID="s82"><Data ss:Type="Number">96</Data></Cell>
<Cell ss:StyleID="s83"><Data ss:Type="Number">65.71</Data></Cell>
</Row>
The above is only a part of the whole XML file I want to extract from. I wrote this code to print all the tags in the XML file:
import xml.etree.ElementTree
xmlTree = xml.etree.ElementTree.parse('sample_xml.xml').getroot()
elemList = []
for elem in xmlTree.iter():
    elemList.append(elem.tag)  # collect every tag, namespace prefix included
# Just printing out the result
for element in elemList:
    print(element)
When I run this code, all I see is a long run of repeated output like the following sample:
{urn:schemas-microsoft-com:office:spreadsheet}Interior
{urn:schemas-microsoft-com:office:spreadsheet}NumberFormat
{urn:schemas-microsoft-com:office:spreadsheet}Protection
{urn:schemas-microsoft-com:office:spreadsheet}Worksheet
{urn:schemas-microsoft-com:office:spreadsheet}Table
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Column
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
{urn:schemas-microsoft-com:office:spreadsheet}Row
{urn:schemas-microsoft-com:office:spreadsheet}Cell
{urn:schemas-microsoft-com:office:spreadsheet}Data
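I don't know which Cell, Data, or Row to target in order to extract the values I need (Marks, Marks of Students, the minimum value, the maximum value), as shown in the sample XML at the beginning. How can I do this?
Update: Following a suggestion, I can extract the text associated with the tags using the following code: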
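One possible way to target specific cells, sketched under assumptions: the `{urn:schemas-microsoft-com:office:spreadsheet}` prefix in the output above is the namespace that ElementTree attaches to every tag, so lookups need to be namespace-qualified. The sketch below parses an inline, well-formed stand-in for the file and groups the `Data` text by `Row`. The `Workbook`/`Worksheet`/`Table` wrapper, the `<Row>` around the header cells, and the helper name `rows_as_lists` are assumptions for illustration, not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the tag output; the "ss" key is just a local alias.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# Hypothetical, well-formed stand-in for sample_xml.xml (wrapper elements are assumed).
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet><Table>
<Row>
<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Name</Data></Cell>
<Cell ss:StyleID="s65"><Data ss:Type="String">Variable Label</Data></Cell>
<Cell ss:StyleID="s79"><Data ss:Type="String">Minimum Value</Data></Cell>
<Cell ss:StyleID="s79"><Data ss:Type="String">Maximum Value</Data></Cell>
<Cell ss:StyleID="s80"><Data ss:Type="String">Mean Value</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0" ss:Height="15">
<Cell ss:StyleID="s73"><Data ss:Type="String">Marks</Data></Cell>
<Cell ss:StyleID="s73"><Data ss:Type="String">Marks of Students</Data></Cell>
<Cell ss:StyleID="s82"><Data ss:Type="Number">0</Data></Cell>
<Cell ss:StyleID="s82"><Data ss:Type="Number">96</Data></Cell>
<Cell ss:StyleID="s83"><Data ss:Type="Number">65.71</Data></Cell>
</Row>
</Table></Worksheet>
</Workbook>"""


def rows_as_lists(root):
    """Return the Data text of every Cell, grouped by Row."""
    rows = []
    # iter() needs the fully qualified tag name in {uri}name form.
    for row in root.iter("{%s}Row" % NS["ss"]):
        # findall() accepts a prefix map, so "ss:Cell/ss:Data" resolves via NS.
        rows.append([data.text for data in row.findall("ss:Cell/ss:Data", NS)])
    return rows


root = ET.fromstring(SAMPLE)
header, *records = rows_as_lists(root)  # first row is assumed to hold the labels
for record in records:
    print(dict(zip(header, record)))
```

With a real file, `ET.parse('sample_xml.xml').getroot()` would replace `ET.fromstring(SAMPLE)`. Pairing each row with the header row assumes the first `Row` holds the column labels, which may not be true of the full workbook.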
for elem in xmlTree.iter():
    if elem.text is not None:
        print(elem.text)
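The problem now is that my XML file contains a lot of different text, but I only want to extract the 4 texts that follow these 4 tag texts: Marks, Marks of Students, Minimum Marks, Maximum Marks. When the current tag matched Marks, I tried to use next() on the iterator to move to the next tag and keep matching the next 3 tags in that order, but it did not produce the desired result. Here is what I wrote: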
for elem in xmlTree.iter():
    if elem.text == 'Marks':
        if next(xmlTree.iter()) == 'Marks of Students':
            if next(xmlTree.iter()) == 'Minimum Value':
                if next(xmlTree.iter()) == 'Maximum Value':
                    print(next(elem.text))
                    print(next(elem.text))
                    print(next(elem.text))
                    print(next(elem.text))
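A likely reason this attempt prints nothing: every call to `xmlTree.iter()` creates a fresh iterator, so `next(xmlTree.iter())` always returns the root element, and an `Element` never compares equal to a string. If the `print` lines were reached, `next(elem.text)` would raise `TypeError`, because a string is not an iterator. A minimal sketch of one alternative follows, which creates the iterator once and keeps consuming that same iterator after the match. The helper name `texts_from`, the inline sample, and its `Workbook`/`Table` wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Fully qualified Data tag, using the namespace URI from the printed tags.
DATA_TAG = "{urn:schemas-microsoft-com:office:spreadsheet}Data"

# Hypothetical, well-formed stand-in for the real file.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Table>
<Row>
<Cell><Data ss:Type="String">Variable Name</Data></Cell>
<Cell><Data ss:Type="String">Variable Label</Data></Cell>
<Cell><Data ss:Type="String">Minimum Value</Data></Cell>
<Cell><Data ss:Type="String">Maximum Value</Data></Cell>
<Cell><Data ss:Type="String">Mean Value</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0" ss:Height="15">
<Cell><Data ss:Type="String">Marks</Data></Cell>
<Cell><Data ss:Type="String">Marks of Students</Data></Cell>
<Cell><Data ss:Type="Number">0</Data></Cell>
<Cell><Data ss:Type="Number">96</Data></Cell>
<Cell><Data ss:Type="Number">65.71</Data></Cell>
</Row>
</Table>
</Workbook>"""


def texts_from(root, start_text, count):
    """Return `count` Data texts, starting at the first one equal to start_text."""
    data_iter = root.iter(DATA_TAG)  # created once, shared by the loop and islice
    for elem in data_iter:
        if elem.text == start_text:
            # islice resumes from the position the for loop has reached.
            following = [e.text for e in islice(data_iter, count - 1)]
            return [elem.text] + following
    return []  # start_text was not found


root = ET.fromstring(SAMPLE)
print(texts_from(root, "Marks", 4))
```

This relies on document order, so it assumes the wanted values sit in the cells immediately after the matching one. Grouping cells by `Row` is usually more robust when the layout can vary.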
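【Discussion】:
- I could not reproduce the problem after modifying your XML to make it well-formed. Please post a minimal but complete sample XML, along with the corresponding output that demonstrates the problem...
Tags: python xml parsing xml-parsing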