【Question title】: How to remove blocks in XML based on a condition
【Posted】: 2021-08-06 02:08:56
【Question description】:

My XML file contains 10k users, and I need to remove every user whose email does not contain @acme.com.

<?xml version="1.0" encoding="UTF-8"?>
<users type="array">
  <user>
    <id type="integer">14000760626</id>
    <name> Credentialing Department</name>
    <email>user1@acme.com</email>
    <created-at type="dateTime">2020-03-26T10:23:34-04:00</created-at>
    <updated-at type="dateTime">2020-03-26T10:23:34-04:00</updated-at>
    <active type="boolean">false</active>
    <job-title></job-title>
    <phone>1234567890</phone>
    <mobile>1234567890</mobile>
    <description></description>
    <time-zone>Eastern Time (US &amp; Canada)</time-zone>
    <deleted type="boolean">false</deleted>
    <language>en</language>
    <address></address>
    <external-id nil="true"/>
    <helpdesk-agent type="boolean">false</helpdesk-agent>
    <location-name nil="true"/>
    <time-format>12h</time-format>
    <company-names type="array"/>
    <custom_field>
    </custom_field>
  </user>
</users>

I tried following "how do I filter values from XML file in python", but got stuck adapting the following line:

>>> xmldata.xpath('/localization/b[@n="Levels"]/l[@k=$level]/v/text()',level='Level1')
['Beginner Level']

I also tried other approaches, but some data is always lost. Sample result:

<?xml version="1.0" encoding="UTF-8"?>
<users type="array">
<user>
<id>14000760626</id>
<name> Credentialing Department</name>
<email>test@aoncology.com</email>
<created-at>2020-03-26T10:23:34-04:00</created-at>
<updated-at>2020-03-26T10:23:34-04:00</updated-at>
<active>false</active>
<job-title>None</job-title>
<phone>1234567890</phone>
<mobile>1234567890</mobile>
<description>None</description>
<time-zone>Eastern Time (US & Canada)</time-zone>
<deleted>false</deleted>
<language>en</language>
<address>None</address>
<external-id>None</external-id>
<helpdesk-agent>false</helpdesk-agent>
<location-name>None</location-name>
<time-format>12h</time-format>
<company-names>None</company-names>
<custom_field>
    </custom_field>
</user>

</users>

【Comments】:

    Tags: python xml-parsing


    【Solution 1】:

    If I understand correctly, you are looking for something like this:

    Assuming a simplified XML:

    users = """<?xml version="1.0" encoding="UTF-8"?>
    <users type="array">
      <user>
        <id type="integer">14000760626</id>
        <name> Credentialing Department</name>
        <email>user1@acme.com</email>      
      </user>
      <user>
        <id>14000760626</id>
        <name> Credentialing Department</name>
        <email>test@aoncology.com</email>
       </user>
    </users>"""
    

    Then:

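    from lxml import etree

    doc = etree.XML(users.encode())
    for user in doc.xpath('//users/user'):
        # findtext returns None when <email> is missing or empty, so fall back to ''
        email = user.findtext('email') or ''
        if '@acme.com' not in email:
            user.getparent().remove(user)
    print(etree.tostring(doc).decode())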
    

    Output:

    <users type="array">
      <user>
        <id type="integer">14000760626</id>
        <name> Credentialing Department</name>
        <email>user1@acme.com</email>      
      </user>
      </users>
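
Since the question got stuck on adapting an `xpath(...)` call with a `$level` variable, the same filter can also be sketched as a single XPath expression. This is only an illustration under assumptions: the `$domain` variable name, the shortened sample data, and the third user without an `<email>` are made up for the example.

```python
from lxml import etree

# Shortened, made-up sample: one acme user, one other domain, one without <email>.
users = b"""<?xml version="1.0" encoding="UTF-8"?>
<users type="array">
  <user>
    <id type="integer">1</id>
    <email>user1@acme.com</email>
  </user>
  <user>
    <id type="integer">2</id>
    <email>test@aoncology.com</email>
  </user>
  <user>
    <id type="integer">3</id>
  </user>
</users>"""

doc = etree.XML(users)

# $domain is an XPath variable bound by the keyword argument below.
# A <user> with no <email> is selected too: the empty node-set converts
# to '', and contains('', '@acme.com') is false.
to_remove = doc.xpath('/users/user[not(contains(email, $domain))]',
                      domain='@acme.com')
for user in to_remove:
    user.getparent().remove(user)

remaining_ids = doc.xpath('/users/user/id/text()')
remaining_types = doc.xpath('/users/user/id/@type')
print(remaining_ids, remaining_types)
```

Note that `contains` matches the substring anywhere in the address. If only addresses that end with the domain should be kept, the check is probably easier to do in Python with `str.endswith`.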
    

    The code above requires: from lxml import etree
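
For the real file with 10k users, a possible sketch is to parse the file, remove the unwanted `<user>` elements in place, and write the tree back out. Because the elements that remain are never rebuilt, their `type` attributes and the `&amp;` escape should survive. The function name `keep_domain`, the file names, and the choice of `endswith` (stricter than the "contains" wording in the question) are assumptions for illustration.

```python
import os
import tempfile
from lxml import etree

def keep_domain(src_path, dst_path, domain="@acme.com"):
    """Copy src_path to dst_path, keeping only users whose email ends with domain."""
    tree = etree.parse(src_path)
    root = tree.getroot()
    # findall returns a list, so removing elements inside the loop is safe.
    for user in root.findall("user"):
        email = (user.findtext("email") or "").strip()
        if not email.endswith(domain):
            root.remove(user)
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)
    return len(root.findall("user"))

# Demo on a small made-up file in a temporary directory.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<users type="array">
  <user>
    <id type="integer">14000760626</id>
    <email>user1@acme.com</email>
    <time-zone>Eastern Time (US &amp; Canada)</time-zone>
    <external-id nil="true"/>
  </user>
  <user>
    <id type="integer">14000760627</id>
    <email>test@aoncology.com</email>
  </user>
</users>"""

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "users.xml")
    dst = os.path.join(tmp, "users_acme.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write(sample)
    kept = keep_domain(src, dst)
    with open(dst, encoding="utf-8") as f:
        result = f.read()

print(kept)
```

On your own data the call would look like `keep_domain("users.xml", "users_acme.xml")`. Writing to a new file keeps the original untouched in case the filter is wrong.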

    【Comments】:

    • Hey Jack - just checking, since I'm not a developer: in the terminal, 1) run users = """<... doc = etree.XML(users.encode()) for ... doc.xpath ... user.xpath ... print etree.tostring ... lxml import etree