【Question title】: Python: Get specific node values and attributes using lxml + objectify + findall or fromstring
【Posted】: 2014-08-22 01:24:16
【Question】:

I took an XML feed from NVD and trimmed it down. Here is a snippet:

<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nvd.nist.gov/feeds/cve/1.2 http://nvd.nist.gov/schema/nvdcve.xsd" pub_date="2014-07-01" nvd_xml_version="1.2">
   <entry CVSS_base_score="6.4" CVSS_exploit_subscore="10.0" CVSS_impact_subscore="4.9" CVSS_score="6.4" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:P/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2011-1381" published="2014-06-27" seq="2011-1381" severity="Medium" type="CVE">
      <desc>
        <descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.</descript>
      </desc>
   </entry>
   <entry CVSS_base_score="3.5" CVSS_exploit_subscore="6.8" CVSS_impact_subscore="2.9" CVSS_score="3.5" CVSS_vector="(AV:N/AC:M/Au:S/C:P/I:N/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2014-4669" published="2014-06-28" seq="2014-4669" severity="Low" type="CVE">
      <desc>
        <descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.</descript>
      </desc>
   </entry>
</nvd>

As the title and the snippet above suggest, I only want the value and attributes of the 'descript' nodes. I tried the findall method, but it returns an empty list:

root = etree.fromstring(open("c:/temp/CVE/sample.xml").read()).getroottree().getroot()
root.findall('entry')

This returns:

[]

When I print the root tag, I get the following:

'{http://nvd.nist.gov/feeds/cve/1.2}nvd'

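I also tried printing the tags of the root's immediate children and of their children:

for e in root.iterchildren():
    print("Immediate parent : %s" % e.tag)
    # Iterating an element yields its direct children (getchildren() is deprecated)
    for c in e:
        print("\t\tchildren : %s" % c.tag)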

This is what it prints:

Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc
Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc

Again, I only want the attributes and value of the 'descript' nodes. Any ideas are much appreciated. Thanks in advance!

【Discussion】:

    Tags: python xml parsing lxml


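    【Solution 1】:

    The document declares a default namespace, so every element name must be namespace-qualified. Add a namespace prefix to the XPath expression and map that prefix to the namespace URI:

    from lxml import etree

    # Parse the file directly; getroot() returns the <nvd> root element
    root = etree.parse("c:/temp/CVE/sample.xml").getroot()
    for descript in root.xpath('//ns:entry/ns:desc/ns:descript', namespaces={'ns': 'http://nvd.nist.gov/feeds/cve/1.2'}):
        print(descript.text)
        print(descript.attrib.get('source'))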
    

    This prints:

    Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.
    cve
    HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.
    cve
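The same namespace handling also applies to findall, which is what the question originally tried. The following is a minimal sketch, not part of the original answer: it assumes an inline, shortened copy of the sample XML (the description text is truncated here), and the names NVD_NS and SAMPLE are made up for illustration. It falls back to the standard library's ElementTree, whose findall accepts the same arguments, if lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib ElementTree API is close enough for findall()
    import xml.etree.ElementTree as etree

NVD_NS = "http://nvd.nist.gov/feeds/cve/1.2"  # namespace URI from the <nvd> root

# Shortened stand-in for sample.xml; description text is truncated for brevity
SAMPLE = """<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry name="CVE-2011-1381">
    <desc><descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform ...</descript></desc>
  </entry>
  <entry name="CVE-2014-4669">
    <desc><descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users ...</descript></desc>
  </entry>
</nvd>"""

root = etree.fromstring(SAMPLE)

# Unqualified name: matches nothing, because every tag is really "{NVD_NS}entry"
unqualified = root.findall("entry")

# Option 1: a prefix map; the prefix "ns" is arbitrary, only the URI matters
with_prefix = root.findall("ns:entry/ns:desc/ns:descript", namespaces={"ns": NVD_NS})

# Option 2: Clark notation, embedding the URI in braces before the tag name
with_clark = root.findall(".//{%s}descript" % NVD_NS)

# Collect (text, source attribute) pairs for each matched <descript>
results = [(d.text, d.get("source")) for d in with_prefix]
print(len(unqualified), len(with_prefix), len(with_clark))
for text, source in results:
    print(source, text)
```

Both options select the same elements. The prefix map reads better when a path has several steps, while Clark notation needs no extra argument.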
    

    See also this related thread:

    【Comments】:

    • Thanks a lot. That worked for me.