【Question Title】: Parsing data from an XML file in Python
【Posted】: 2015-02-14 15:56:42
【Question】:

I have the following XML file:

<address addr="x.x.x.x" addrtype="ipv4"/>
<hostnames>
</hostnames>
<ports><port protocol="tcp" portid="1"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="tcpmux" method="table" conf="3"/></port>
<port protocol="tcp" portid="64623"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="unknown" method="table" conf="3"/></port>
</ports>
<times srtt="621179" rttvar="35357" to="762607"/>
</host>
<host starttime="1418707433" endtime="1418707742"><status state="up" reason="syn-ack" reason_ttl="0"/>
<address addr="y.y.y.y" addrtype="ipv4"/>
<hostnames>
</hostnames>
<ports><port protocol="tcp" portid="1"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="tcpmux" method="table" conf="3"/></port>
<port protocol="tcp" portid="64680"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="unknown" method="table" conf="3"/></port>
</ports>
<times srtt="834906" rttvar="92971" to="1206790"/>
</host>
<host starttime="1418707433" endtime="1418707699"><status state="up" reason="syn-ack" reason_ttl="0"/>
<address addr="w.w.w.w" addrtype="ipv4"/>
<hostnames>
</hostnames>
<ports><extraports state="filtered" count="997">
<extrareasons reason="no-responses" count="997"/>
</extraports>
<port protocol="tcp" portid="25"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="smtp" method="table" conf="3"/></port>
<port protocol="tcp" portid="443"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="https" method="table" conf="3"/></port>
<port protocol="tcp" portid="7443"><state state="open" reason="syn-ack" reason_ttl="0"/><service name="oracleas-https" method="table" conf="3"/></port>
</ports>
<times srtt="690288" rttvar="110249" to="1131284"/>
</host>

This is what I tried in order to extract the data for each IP:

import sys
import xml.etree.ElementTree as ET
input=sys.argv[1]

tree=ET.parse(input)
root=tree.getroot()

for host in root.findall('host'):
    updown=host.find('status').get('state')
    if updown=='up':
        print 'IP Address: '+host.find('address').get('addr')
        ports=[port.get('portid') for port in root.findall('.//port')]
        state=[port.get('state') for port in root.findall('.//port/state')]
        name=[port.get('name') for port in root.findall('.//port/service')]

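But it returns the information for all IPs at once. How can I get the information specific to each IP?

I think I should change root.findall, but I don't know how.

【Question Comments】:

    Tags: python xml xml-parsing elementtree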


    【Solution 1】:

    Inside the loop, just change root.findall() to host.findall():

    for host in root.findall('host'):
        updown=host.find('status').get('state')
        if updown=='up':
            print('IP Address: '+host.find('address').get('addr'))
            # Search from host, not root, so only this host's ports match.
            ports=[port.get('portid') for port in host.findall('.//port')]
            state=[st.get('state') for st in host.findall('.//port/state')]
            name=[svc.get('name') for svc in host.findall('.//port/service')]
    

    This restricts the search for ports, states and names to the current host instead of the whole XML document.
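To make the per-host scoping concrete, here is a minimal sketch that collects the ports of each host into a dictionary. The `<nmaprun>` root element, the trimmed sample data, and the helper name `ports_by_host` are assumptions for illustration; the excerpt in the question does not show the document root.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the question's data. The <nmaprun> root element
# is an assumption, since the excerpt does not show the document root.
SAMPLE = """<nmaprun>
<host><status state="up"/><address addr="x.x.x.x" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="1"><state state="open"/><service name="tcpmux"/></port>
<port protocol="tcp" portid="64623"><state state="open"/><service name="unknown"/></port>
</ports></host>
<host><status state="up"/><address addr="w.w.w.w" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="25"><state state="open"/><service name="smtp"/></port>
</ports></host>
</nmaprun>"""


def ports_by_host(root):
    """Map each 'up' host address to a list of (portid, state, service) tuples."""
    result = {}
    for host in root.findall('host'):
        if host.find('status').get('state') != 'up':
            continue
        addr = host.find('address').get('addr')
        rows = []
        # Searching from `host` (not `root`) keeps the results per host.
        for port in host.findall('.//port'):
            state = port.find('state')
            service = port.find('service')
            rows.append((
                port.get('portid'),
                state.get('state') if state is not None else None,
                service.get('name') if service is not None else None,
            ))
        result[addr] = rows
    return result


if __name__ == '__main__':
    for addr, rows in ports_by_host(ET.fromstring(SAMPLE)).items():
        print('IP Address: ' + addr)
        for portid, state, name in rows:
            print('  %s %s %s' % (portid, state, name))
```

Reading the state and service from each `port` element, rather than building three separate lists, keeps the values aligned even if a port has no `<service>` child.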

    【Comments】:

      【Solution 2】:

      This part of the code looks suspicious to me:

              ports=[port.get('portid') for port in root.findall('.//port')]
              state=[port.get('state') for port in root.findall('.//port/state')]
              name=[port.get('name') for port in root.findall('.//port/service')]
      

      Inside the loop, you search the entire root node for './/port...' elements.
      It looks like you need this instead:

              ports=[port.get('portid') for port in host.findall('.//port')]
              state=[port.get('state') for port in host.findall('.//port/state')]
              name=[port.get('name') for port in host.findall('.//port/service')]
      

      【Comments】:

        【Solution 3】:

        By specifying

        root.findall('.//port')
        

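        you start again from the root of the document, so every port in the file is returned. Search relative to the current host instead: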

        ports=[port.get('portid') for port in host.findall('./ports/port')]
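The difference between the two starting points can be checked on a small document. This is only a sketch: the `<scan>` root, the sample ports, and the helper `portids` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-host document; the <scan> root name is an assumption.
doc = ET.fromstring(
    '<scan>'
    '<host><ports><port portid="1"/><port portid="25"/></ports></host>'
    '<host><ports><port portid="443"/></ports></host>'
    '</scan>'
)
first_host = doc.find('host')


def portids(node, path):
    """Return the portid of every element matched by `path` below `node`."""
    return [p.get('portid') for p in node.findall(path)]


# From the document root, './/port' matches ports of every host.
all_ports = portids(doc, './/port')
# From a host, './/port' matches any descendant port of that host only.
host_descendants = portids(first_host, './/port')
# './ports/port' is stricter: only <port> children of the host's <ports>.
host_explicit = portids(first_host, './ports/port')

print(all_ports)
print(host_descendants)
print(host_explicit)
```

For data shaped like the question's, both host-relative paths select the same elements; the explicit `./ports/port` form simply avoids matching `<port>` elements that might appear elsewhere under the host.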
        

        【Comments】:
