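python 2.7: differing attributes (answers)

【Question title】: python 2.7: differing attributes
【Posted】: 2016-06-25 04:42:00
【Question】:

I have just started using (= learning) Python 2.7. My current focus is extracting information from XML files. So far xml.etree.ElementTree has taken me quite far. I have now run into a KeyError. The cause, as far as I can tell, is elements that have differing attributes.

The key part of the (larger) XML file: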

<?xml version='1.0' encoding='utf-8' ?>

<XMLFILE>
  <datasources>
    <datasource caption='Sheet1 (ExcelSample)'>
      <connection class='excel-direct' filename='~\SomeExcel.xlsx' .....>
        ......
      </connection>
      <column header='Unit Price' datatype='real' name='[Calculation_1]'     role='measure' type='quantitative'>
        <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
      </column>
      <column datatype='integer' name='[Sales]' role='measure' type='quantitative' user:auto-column='numrec'>
        <calculation class='trial' formula='1' />
      </column>
    </datasource>
  </datasources>
  ........
</XMLFILE>

My Python code extracts datatype and name, i.e. the attributes that are present in both columns, without problems:

for cal in xmlfile.findall('datasources/datasource/column'):
    dt = cal.attrib['datatype']
    nm = cal.attrib['name']
    # Print each value next to its matching label (name first, then datatype).
    print 'Column name:', nm, '    ', 'datatype:', dt

Result:

Column name: Calculation_1,    datatype:real
Column name: Sales,    datatype:integer

However, if I use cal.attrib['header'], Python 2.7 prints:

KeyError: 'header'

Question: how do I tell Python 2.7 to produce the desired output:

Calculation "Unit Price": Sum(Profit)/Sum(Sales)

More precisely, what Python should do: for every column that contains the attribute 'header' (there may be more than one, unlike in the example above), print:

header: Unit Price
    formula: Sum(Profit)
header: Sales per day in month
    formula: Sales / count(days(month))

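(Note: to show a more complete desired output, I added another column that is not yet in my example.)

Thanks a lot for your help!

【Comments】:

Tags: python xml python-2.7

【Solution 1】:

You can use an XPath predicate expression to filter elements by a specific condition, i.e. select only the column elements that have a header attribute: column[@header] *. Your for loop would then look like this: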

for cal in xmlfile.findall('datasources/datasource/column[@header]'):
    print "header: " + cal.attrib["header"]
    print "    formula: " + cal.find('calculation').attrib["formula"]

*) Note that the @attribute_name syntax is used to refer to XML attributes in XPath.
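To make the predicate easy to try in isolation, here is a minimal, self-contained sketch. It assumes a simplified stand-in for the question's file: the elided parts and the namespaced user:auto-column attribute are left out, and the helper name headers_with_formulas is made up for illustration. The print_function import is only there so the same code runs on Python 2.7 and Python 3.

```python
from __future__ import print_function  # same print syntax on Python 2.7 and 3
import xml.etree.ElementTree as ET

# Simplified stand-in for the question's XML (an assumption, not the real file).
SAMPLE = """
<XMLFILE>
  <datasources>
    <datasource caption='Sheet1 (ExcelSample)'>
      <column header='Unit Price' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
        <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
      </column>
      <column datatype='integer' name='[Sales]' role='measure' type='quantitative'>
        <calculation class='trial' formula='1' />
      </column>
    </datasource>
  </datasources>
</XMLFILE>
"""

xmlfile = ET.fromstring(SAMPLE)  # the root element <XMLFILE>


def headers_with_formulas(root):
    """Return (header, formula) pairs for columns that have a header attribute."""
    pairs = []
    # [@header] keeps only <column> elements that carry a header attribute.
    for col in root.findall('datasources/datasource/column[@header]'):
        calc = col.find('calculation')
        # Guard against a column without a <calculation> child.
        formula = calc.get('formula') if calc is not None else None
        pairs.append((col.get('header'), formula))
    return pairs


for header, formula in headers_with_formulas(xmlfile):
    print('header: ' + header)
    print('    formula: ' + str(formula))
```

Note that the path is relative to the element findall is called on. Here that element is the root <XMLFILE>, so the path starts at datasources.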

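If instead you mean to loop over all column elements, whether or not they have a header attribute, and print the header value only when a column has that attribute, you can do that with a simple if block, as follows: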

if "header" in cal.attrib:
    print "header: " + cal.attrib["header"]
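The fragment above can be placed inside the original loop. The following is a sketch under the same kind of assumption: a cut-down copy of the question's XML, with a hypothetical helper describe_columns that collects the lines instead of printing them directly so the result is easy to inspect.

```python
from __future__ import print_function  # same print syntax on Python 2.7 and 3
import xml.etree.ElementTree as ET

# Cut-down stand-in for the question's XML (an assumption, not the real file).
SAMPLE = """
<XMLFILE>
  <datasources>
    <datasource caption='Sheet1 (ExcelSample)'>
      <column header='Unit Price' datatype='real' name='[Calculation_1]' />
      <column datatype='integer' name='[Sales]' />
    </datasource>
  </datasources>
</XMLFILE>
"""

root = ET.fromstring(SAMPLE)


def describe_columns(root):
    """Visit every column; add a header line only when the attribute exists."""
    lines = []
    for cal in root.findall('datasources/datasource/column'):
        # name is present on every column in this sample, so indexing is safe.
        lines.append('name: ' + cal.attrib['name'])
        # header is optional, so test for it before indexing.
        if 'header' in cal.attrib:
            lines.append('    header: ' + cal.attrib['header'])
    return lines


for line in describe_columns(root):
    print(line)
```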

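【Comments】:

Hi, thanks a lot. Although Python no longer raises an error, unfortunately it no longer prints anything either...

【Solution 2】:

Maybe you could use the BeautifulSoup (bs4) module instead of xml.etree.

Take a look at "Python BeautifulSoup XML Parsing" and "Extracting properly data with bs4?".

【Comments】:

【Solution 3】:

A KeyError is Python's way of telling you that the attribute you asked for was not found on that element. That is fine: your findall XPath can return some elements that have no 'header' attribute. Since you are only interested in the elements that match your search and happen to carry a 'header' attribute, you can do the following: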

for cal in xmlfile.findall('datasources/datasource/column'):
    try:
        header = cal.attrib["header"]
        # Do something with the header
        print header
    except KeyError:
        # This is where you end up if the element doesn't have a 'header' attribute.
        # You shouldn't have to do anything with this 'cal' element.
        pass  # an except block needs at least one statement

Of course, you could check whether the header exists first, but after using Python for a while I find this approach simpler.
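A third option, not mentioned in the answers here, is Element.get, which returns a default value instead of raising KeyError. The sketch below compares it with the try/except approach on two hand-written column elements. The helper names header_via_get and header_via_try are made up for illustration.

```python
from __future__ import print_function  # same print syntax on Python 2.7 and 3
import xml.etree.ElementTree as ET

# Two hand-written elements: one with a header attribute, one without.
with_header = ET.fromstring("<column header='Unit Price' name='[Calculation_1]' />")
without_header = ET.fromstring("<column name='[Sales]' />")


def header_via_get(element, default=None):
    # Element.get returns the attribute value, or `default` when it is absent.
    return element.get('header', default)


def header_via_try(element, default=None):
    # Same result, written as "try it and handle the KeyError".
    try:
        return element.attrib['header']
    except KeyError:
        return default


for col in (with_header, without_header):
    header = header_via_get(col)
    if header is not None:
        print('header: ' + header)
```

Both helpers return the same values. Which one reads better is largely a matter of taste.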

【Comments】: