XML Parsing in Python using document builder factory: Answers

【Question title】: XML Parsing in Python using document builder factory
【Posted】: 2009-08-04 19:39:12
【Question】:

I am working with STAF and STAX, where Python is used for coding. I am new to Python. Basically, my task is to parse an XML file in Python using the Document Factory Parser.

The XML file I want to parse is:

<?xml version="1.0" encoding="utf-8"?>
<operating_system>
  <unix_80sp1>
    <tests type="quick_sanity_test">
      <prerequisitescript>preparequicksanityscript</prerequisitescript>
      <acbuildpath>acbuildpath</acbuildpath>
      <testsuitscript>test quick sanity script</testsuitscript>
      <testdir>quick sanity dir</testdir>
    </tests>
    <machine_name>u80sp1_L004</machine_name>
    <machine_name>u80sp1_L005</machine_name>
    <machine_name>xyz.pxy.dxe.cde</machine_name>
    <vmware id="155.35.3.55">144.35.3.90</vmware>
    <vmware id="155.35.3.56">144.35.3.91</vmware>
  </unix_80sp1>
</operating_system>

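I need to read all the tags.
For the machine_name tags, I need to read them into a list, so that all machine names end up in one list. After reading the tags, machname should be [u80sp1_L004, u80sp1_L005, xyz.pxy.dxe.cde].
I also need all the vmware tags: all the attributes should go into vmware_attr = [155.35.3.55, 155.35.3.56] and all the vmware values into vmware_value = [144.35.3.90, 144.35.3.91].

I can read all the tags correctly except the vmware and machine_name tags. I am using the following code (I am new to XML and VMware). Need help.

The following code needs to be modified.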

factory = DocumentBuilderFactory.newInstance();
factory.setValidating(1)
factory.setIgnoringElementContentWhitespace(0)
builder = factory.newDocumentBuilder()
document = builder.parse(xmlFileName)

vmware_value = None
vmware_attr = None
machname = None

# Get the text value for the element with tag name "vmware"
nodeList = document.getElementsByTagName("vmware")
for i in range(nodeList.getLength()):
    node = nodeList.item(i)
    if node.getNodeType() == Node.ELEMENT_NODE:
        children = node.getChildNodes()
        for j in range(children.getLength()):
            thisChild = children.item(j)
            if (thisChild.getNodeType() == Node.TEXT_NODE):
                vmware_value = thisChild.getNodeValue()
                # vmware_attr = ???  what method to use?

# Get the text value for the element with tag name "machine_name"
nodeList = document.getElementsByTagName("machine_name")
for i in range(nodeList.getLength()):
    node = nodeList.item(i)
    if node.getNodeType() == Node.ELEMENT_NODE:
        children = node.getChildNodes()
        for j in range(children.getLength()):
            thisChild = children.item(j)
            if (thisChild.getNodeType() == Node.TEXT_NODE):
                machname = thisChild.getNodeValue()

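Also, how do I check whether a tag exists or not? I need to code the parsing properly.

【Comments on the question】:

I know spacing is significant in Python, so I don't know how I should format that wall of code. You're on your own there, OP.

Tags: python xml parsing

【Solution 1】:

You need to instantiate vmware_value, vmware_attr and machname as lists rather than as single values, so instead of this: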

vmware_value = None
vmware_attr = None
machname = None

Do this:

vmware_value = []
vmware_attr = []
machname = []
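
As a small illustration of why the lists matter (the sample values are copied from the XML in the question, and the variable names last_only and collected are made up for this sketch): plain assignment inside a loop keeps only the last value seen, while a list keeps every one.

```python
# Values copied from the <vmware> elements in the question's XML.
values = ["144.35.3.90", "144.35.3.91"]

# Plain assignment: each pass replaces the previous value,
# so only the last one survives the loop.
last_only = None
for v in values:
    last_only = v

# List plus append: each pass adds one more entry.
collected = []
for v in values:
    collected.append(v)

print(last_only)
print(collected)
```

This is presumably why the code in the question never produced the full list of machine names.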

Then, to add items to a list, use the list's append method. For example:

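# Assumes a Jython runtime (which STAX uses), so the Java DOM classes can be imported
from javax.xml.parsers import DocumentBuilderFactory
from org.w3c.dom import Node

factory = DocumentBuilderFactory.newInstance()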
factory.setValidating(1)
factory.setIgnoringElementContentWhitespace(0)
builder = factory.newDocumentBuilder()
document = builder.parse(xmlFileName)

vmware_value = []
vmware_attr = []
machname = []

# Get the text value for the element with tag name "vmware"
nodeList = document.getElementsByTagName("vmware")
for i in range(nodeList.getLength()):
    node = nodeList.item(i)
    # getAttribute is defined on Element in both the Java DOM API and Python's xml.dom
    vmware_attr.append(node.getAttribute("id"))
    if node.getNodeType() == Node.ELEMENT_NODE:
        children = node.getChildNodes()
        for j in range(children.getLength()):
            thisChild = children.item(j)
            if (thisChild.getNodeType() == Node.TEXT_NODE):
                vmware_value.append(thisChild.getNodeValue())

I have also edited the code into what I think should append the correct values to vmware_attr and vmware_value.
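
For anyone who wants to try the same idea outside STAX, here is a minimal sketch using Python's standard xml.dom.minidom instead of the Java DocumentBuilderFactory. That swap is an assumption made for illustration, not what the question uses. The helper names element_text and collect are hypothetical, and the XML is a trimmed copy of the file from the question. It also covers the machine_name tags, which the code above leaves out.

```python
from xml.dom import minidom

# Trimmed copy of the XML from the question.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<operating_system>
  <unix_80sp1>
    <machine_name>u80sp1_L004</machine_name>
    <machine_name>u80sp1_L005</machine_name>
    <machine_name>xyz.pxy.dxe.cde</machine_name>
    <vmware id="155.35.3.55">144.35.3.90</vmware>
    <vmware id="155.35.3.56">144.35.3.91</vmware>
  </unix_80sp1>
</operating_system>"""


def element_text(node):
    """Join the text children of an element (hypothetical helper)."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def collect(document):
    """Return (machname, vmware_attr, vmware_value) as three lists."""
    machname = [
        element_text(node)
        for node in document.getElementsByTagName("machine_name")
    ]
    vmware_attr = []
    vmware_value = []
    for node in document.getElementsByTagName("vmware"):
        vmware_attr.append(node.getAttribute("id"))  # the id attribute
        vmware_value.append(element_text(node))      # the element's text
    return machname, vmware_attr, vmware_value


document = minidom.parseString(SAMPLE_XML)
machname, vmware_attr, vmware_value = collect(document)
print(machname)
print(vmware_attr)
print(vmware_value)
```

The structure mirrors the loop above: one pass over the elements returned by getElementsByTagName, appending to a list on each pass.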

I have had to assume that STAX uses xml.dom syntax, so if that isn't the case you will have to edit my suggestion accordingly.
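
On the question of checking whether a tag exists: getElementsByTagName returns an empty node list when nothing matches, so testing its length should be enough. The sketch below uses Python's standard xml.dom.minidom as a stand-in (an assumption; the helper name has_tag and the one-line XML are made up for illustration). With the Java DOM classes, the equivalent checks would be nodeList.getLength() == 0 and element.hasAttribute("id").

```python
from xml.dom import minidom

# Tiny made-up document: it has a <vmware> tag but no <machine_name> tag.
document = minidom.parseString(
    '<unix_80sp1><vmware id="155.35.3.55">144.35.3.90</vmware></unix_80sp1>'
)


def has_tag(document, tag_name):
    """True if at least one element with this tag name exists (hypothetical helper)."""
    return len(document.getElementsByTagName(tag_name)) > 0


has_vmware = has_tag(document, "vmware")
has_machine = has_tag(document, "machine_name")

# Attributes can be checked the same way before reading them.
first_vmware = document.getElementsByTagName("vmware")[0]
has_id = first_vmware.hasAttribute("id")

print(has_vmware, has_machine, has_id)
```

Checking first avoids indexing into an empty list or reading an attribute that is not there.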

【Comments】: