Python Parse XML using xml.dom.minidom - Extract Items within List

【Question title】: Python Parse XML using xml.dom.minidom - Extract Items within List
【Posted】: 2012-08-17 03:14:19
【Question】:

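I have a very long XML document, which is actually an eBay listing retrieved through the eBay API, and I am trying to extract the following structure from that XML DOM:

I've only included the snippet I'm having trouble with. If you need to see the whole file, let me know and I can upload it somewhere or attach it as an image.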

<ItemSpecifics>
<NameValueList>
<Name>Room</Name>
<Value>Living Room</Value>
</NameValueList>
<NameValueList>
<Name>Type</Name>
<Value>Sofa Set</Value>
</NameValueList>
<NameValueList>...</NameValueList>
<NameValueList>
<Name>Upholstery Fabric</Name>
<Value>Microfiber</Value>
</NameValueList>
<NameValueList>
<Name>Color</Name>
<Value>Beiges</Value>
</NameValueList>
<NameValueList>
<Name>Style</Name>
<Value>Contemporary</Value>
</NameValueList>
<NameValueList>
<Name>MPN</Name>
<Value>F7615, F7616, F7617, F7618, F7619, F7620</Value>
</NameValueList>
</ItemSpecifics>

Here is the DOM structure of another eBay item:

<ItemSpecifics>
<NameValueList>
<Name>Brand</Name>
<Value>Nikon</Value>
</NameValueList>
<NameValueList>
<Name>Model</Name>
<Value>D3100</Value>
</NameValueList>
<NameValueList>
<Name>MPN</Name>
<Value>9798</Value>
</NameValueList>
<NameValueList>
<Name>Type</Name>
<Value>Digital SLR</Value>
</NameValueList>
<NameValueList>
<Name>Megapixels</Name>
<Value>14.2 MP</Value>
</NameValueList>
<NameValueList>
<Name>Optical Zoom</Name>
<Value>3.1x</Value>
</NameValueList>
<NameValueList>
<Name>Screen Size</Name>
<Value>3"</Value>
</NameValueList>
<NameValueList>
<Name>Color</Name>
<Value>Black</Value>
</NameValueList>
</ItemSpecifics>

But when I try to extract the elements above, I end up with the following error:

   attID=att.attributes.getNamedItem('Name').nodeValue
AttributeError: 'NoneType' object has no attribute 'nodeValue'

This is what I get after parsing the response:

[<DOM Element: NameValueList at 0x103398878>, <DOM Element: NameValueList at 0x103398ab8>, <DOM Element: NameValueList at 0x103398cf8>, <DOM Element: NameValueList at 0x103398f38>, <DOM Element: NameValueList at 0x1033b31b8>, <DOM Element: NameValueList at 0x1033b33f8>, <DOM Element: NameValueList at 0x1033b3638>, <DOM Element: NameValueList at 0x1033b3878>]

This is what I get inside the for loop before the error occurs:

<DOM Element: NameValueList at 0x103398878>

Here is my code:

  result = {}
  attributeSet=response.getElementsByTagName('NameValueList')
  print attributeSet
  attributes={}
  for att in attributeSet:
    print att
    attID=att.attributes.getNamedItem('Name').nodeValue
    attValue=getSingleValue(att,'Value')
    attributes[attID]=attValue
  result['attributes']=attributes
  return result

Here is my XML request method:

import httplib

def sendRequest(apicall, xmlparameters):
  connection = httplib.HTTPSConnection(serverUrl)
  connection.request("POST", '/ws/api.dll', xmlparameters, getHeaders(apicall))
  response = connection.getresponse()
  data = None  # stays None if the request fails, instead of being unbound
  if response.status != 200:
    print "Error sending request:" + response.reason
  else:
    data = response.read()
  connection.close()  # close the connection on both the success and error paths
  return data

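【Question discussion】:

Is <NameValueList>...</NameValueList> really in your data set?
Yes, the XML DOM I posted in the question is exactly what gets printed.
That's interesting. I thought it might have been truncated, but that seems like a strange place to do it.
So I tried a different item and updated the question.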

Tags: python xml dom xml-parsing parsexml

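【Solution 1】:

attributes.getNamedItem() gives you the attributes of an element, not its child elements. The <NameValueList> element has no Name attribute, only a <Name> child element, so getNamedItem('Name') returns None, and reading .nodeValue from None raises the AttributeError you see. You have to loop over the elements that <NameValueList> contains, or use .getElementsByTagName('Name') and .getElementsByTagName('Value') to get the individual child nodes.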
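If you want to stay with minidom, the child-element approach described above might look like the following. This is a minimal Python 3 sketch, not the poster's code: text_of and name_value_pairs are hypothetical helper names, and the sample XML is trimmed from the question, including its <NameValueList>...</NameValueList> placeholder, which the loop skips because it has no <Name> or <Value> child.

```python
from xml.dom.minidom import parseString

# Trimmed sample from the question, including the placeholder entry.
sample = """<ItemSpecifics>
<NameValueList><Name>Brand</Name><Value>Nikon</Value></NameValueList>
<NameValueList><Name>Model</Name><Value>D3100</Value></NameValueList>
<NameValueList>...</NameValueList>
</ItemSpecifics>"""

def text_of(element):
    # Join the text nodes directly under element, e.g. <Name>Brand</Name> -> "Brand"
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)

def name_value_pairs(dom):
    pairs = {}
    for nvl in dom.getElementsByTagName("NameValueList"):
        names = nvl.getElementsByTagName("Name")
        values = nvl.getElementsByTagName("Value")
        # Skip entries without both children, e.g. <NameValueList>...</NameValueList>
        if not names or not values:
            continue
        pairs[text_of(names[0])] = text_of(values[0])
    return pairs

dom = parseString(sample)
first = dom.getElementsByTagName("NameValueList")[0]
# <NameValueList> has no XML attribute called Name, so this is None,
# which is what triggers "'NoneType' object has no attribute 'nodeValue'".
print(first.attributes.getNamedItem("Name"))  # None
print(name_value_pairs(dom))  # {'Brand': 'Nikon', 'Model': 'D3100'}
```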

But do yourself a big favor and use the ElementTree API instead; it is far more Pythonic and much easier to use than the XML DOM API:

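from xml.etree import ElementTree as ET

root = ET.fromstring(data)  # data is the XML string returned by sendRequest()
results = {}
# './/' searches at any depth, not only among the root's direct children
for nvl in root.findall('.//NameValueList'):
    name = nvl.findtext('Name')    # None if there is no <Name> child
    value = nvl.findtext('Value')
    if name is not None:           # skip placeholders such as <NameValueList>...</NameValueList>
        results[name] = value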
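One case the ElementTree snippet above does not cover: if the full API response declares a default XML namespace, ElementTree puts it into every tag name (for example {urn:...}NameValueList), so a plain 'NameValueList' path matches nothing. eBay Trading API responses typically declare xmlns="urn:ebay:apis:eBLBaseComponents"; treat that URI as an assumption and check it against your own response. Below is a minimal sketch using the namespaces mapping that findall() and findtext() accept; item_specifics is a hypothetical helper, and the GetItemResponse wrapper in the sample is illustrative only.

```python
from xml.etree import ElementTree as ET

# Assumption: the response declares this default namespace; adjust it to match yours.
NS = {"e": "urn:ebay:apis:eBLBaseComponents"}

def item_specifics(xml_text):
    root = ET.fromstring(xml_text)
    results = {}
    # Every tag in the path needs the "e:" prefix because of the default namespace.
    for nvl in root.findall(".//e:NameValueList", NS):
        name = nvl.findtext("e:Name", namespaces=NS)
        value = nvl.findtext("e:Value", namespaces=NS)
        if name is not None:
            results[name] = value
    return results

# Illustrative response; only the ItemSpecifics values come from the question.
sample = """<GetItemResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <Item><ItemSpecifics>
    <NameValueList><Name>Brand</Name><Value>Nikon</Value></NameValueList>
    <NameValueList><Name>Screen Size</Name><Value>3"</Value></NameValueList>
  </ItemSpecifics></Item>
</GetItemResponse>"""

print(item_specifics(sample))  # {'Brand': 'Nikon', 'Screen Size': '3"'}
```

Without the namespace mapping, root.findall(".//NameValueList") on this sample returns an empty list.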

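【Discussion】:

Thanks for the reply. This document isn't just <ItemSpecifics>, so how would I narrow it down with the ElementTree API in this case?
@Null-Hypothesis: Updated; all the Name and Value elements are used to build the results dictionary in that snippet.
@Null-Hypothesis: What is the XML response? Is it a urllib2 urlopen response?
@Null-Hypothesis: data is a Python string, so just use the .fromstring(data) function from the ElementTree module.
@Null-Hypothesis: That's what the documentation link is for :-) Use it, there's a tutorial there!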
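On the narrowing question from the comments: an ElementTree path can restrict the search to NameValueList elements that sit directly under ItemSpecifics, which matters if the same tag also appears elsewhere in the response. This is a sketch under assumptions: the XML below is made up, and the Variations/VariationSpecifics part only illustrates a second place where NameValueList could occur. If your response declares a default namespace, every tag in the path also needs a namespace prefix.

```python
from xml.etree import ElementTree as ET

# Made-up response: NameValueList appears under ItemSpecifics and under
# a variation's VariationSpecifics.
data = """<Response>
  <Item>
    <ItemSpecifics>
      <NameValueList><Name>Room</Name><Value>Living Room</Value></NameValueList>
      <NameValueList><Name>Color</Name><Value>Beiges</Value></NameValueList>
    </ItemSpecifics>
    <Variations><Variation><VariationSpecifics>
      <NameValueList><Name>Color</Name><Value>Black</Value></NameValueList>
    </VariationSpecifics></Variation></Variations>
  </Item>
</Response>"""

root = ET.fromstring(data)
specifics = {}
# Only NameValueList elements that are direct children of an ItemSpecifics element
for nvl in root.findall(".//ItemSpecifics/NameValueList"):
    specifics[nvl.findtext("Name")] = nvl.findtext("Value")

print(specifics)  # {'Room': 'Living Room', 'Color': 'Beiges'}
```

With a bare ".//NameValueList" path, the variation's Color entry would overwrite the item-level one.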