【Question title】: Parse comma separated list from xml in Python
【Posted】: 2014-04-09 10:16:21
【Question】:

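I've spent hours looking for a solution to this and come up empty-handed. I'm trying to parse an XML document in Python and return the elements as a comma-separated list.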

Here is an example of the XML document:

<?xml version="1.0" encoding="utf-8"?>
<Report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema" ReportName="My DestinationUrl Performance Report" ReportTime="4/7/2014" TimeZone="Various" ReportAggregation="Daily" LastCompletedAvailableDay="4/8/2014 5:00:00 PM (GMT)" LastCompletedAvailableHour="4/8/2014 5:00:00 PM (GMT)" PotentialIncompleteData="false">
  <DestinationUrlPerformanceReportColumns>
    <Column name="GregorianDate" />
    <Column name="AccountName" />
    <Column name="CampaignName" />
    <Column name="CampaignId" />
    <Column name="AdGroupName" />
    <Column name="AdGroupId" />
    <Column name="DestinationUrl" />
    <Column name="Impressions" />
    <Column name="Clicks" />
    <Column name="Spend" />
    <Column name="Conversions" />
  </DestinationUrlPerformanceReportColumns>
  <Table>
    <Row>
      <GregorianDate value="4/7/2014" />
      <AccountName value="BingAccount" />
      <CampaignName value="Campaign#1" />
      <CampaignId value="12345678" />
      <AdGroupName value="Adgroup1" />
      <AdGroupId value="901234567" />
      <DestinationUrl value="www.example.com" />
      <Impressions value="8" />
      <Clicks value="0" />
      <Spend value="0.00" />
      <Conversions value="0" />
    </Row>
    <Row>
      <GregorianDate value="4/7/2014" />
      <AccountName value="BingAccount" />
      <CampaignName value="Campaign#2" />
      <CampaignId value="83984398493" />
      <AdGroupName value="Adgroup#2" />
      <AdGroupId value="3439843983" />
      <DestinationUrl value="www.example.co.uk" />
      <Impressions value="20" />
      <Clicks value="2" />
      <Spend value="0.10" />
      <Conversions value="0" />
    </Row>
  </Table>
  <Copyright>©2014 Microsoft Corporation. All rights reserved. </Copyright>
</Report>

I want to return the values of each row as a comma-separated list, so the desired result is:

('4/7/2014','BingAccount','Campaign#1','12345678','Adgroup1','901234567','www.example.com','8','0','0.00','0')
('4/7/2014','BingAccount','Campaign#2','83984398493','Adgroup#2','3439843983','www.example.co.uk','20','2','0.10','0')
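If "comma-separated" should mean literal text lines rather than Python tuples, the standard `csv` module can turn the tuples into such lines. This is a minimal sketch that assumes the rows have already been extracted; the values are copied from the desired result above.

```python
import csv
import io

# Row tuples as in the desired result above (already extracted from the XML).
rows = [
    ("4/7/2014", "BingAccount", "Campaign#1", "12345678", "Adgroup1",
     "901234567", "www.example.com", "8", "0", "0.00", "0"),
    ("4/7/2014", "BingAccount", "Campaign#2", "83984398493", "Adgroup#2",
     "3439843983", "www.example.co.uk", "20", "2", "0.10", "0"),
]

# csv.writer quotes any value that itself contains a comma or a quote,
# which a plain ",".join() would not do.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
```

To write to a file instead, open it with `open("out.csv", "w", newline="")` (the file name is only an example) and pass that file object to `csv.writer`.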

Here is what I have so far, but I can't get any further:

from xml.dom import minidom

xmldoc = minidom.parse('file.xml')

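# childNodes also contains the whitespace text between elements,
# so index 3 happens to be the <Table> element here
rows = xmldoc.firstChild.childNodes[3].childNodes

for i in rows:
    # prints tuples of node objects, not their "value" attributes
    print(tuple(i.childNodes))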
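In the minidom code above, `childNodes` also includes the whitespace text nodes between elements, and each `<Row>`'s children are node objects rather than their `value` attributes. A minimal sketch that finishes the minidom approach: it selects rows with `getElementsByTagName("Row")`, keeps only element children, and reads each cell with `getAttribute("value")`. It parses a trimmed inline copy of the report (three columns only) with `minidom.parseString`; for the real file, `minidom.parse('file.xml')` as in the question should behave the same way.

```python
from xml.dom import minidom

# Trimmed copy of the report from the question (three columns only).
REPORT = """<Report xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema">
  <Table>
    <Row>
      <GregorianDate value="4/7/2014" />
      <CampaignName value="Campaign#1" />
      <Spend value="0.00" />
    </Row>
    <Row>
      <GregorianDate value="4/7/2014" />
      <CampaignName value="Campaign#2" />
      <Spend value="0.10" />
    </Row>
  </Table>
</Report>"""

xmldoc = minidom.parseString(REPORT)

rows = []
for row in xmldoc.getElementsByTagName("Row"):
    # Skip the whitespace text nodes; keep only element children.
    cells = [n for n in row.childNodes if n.nodeType == n.ELEMENT_NODE]
    rows.append(tuple(n.getAttribute("value") for n in cells))

for r in rows:
    print(r)
```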

【Discussion】:

    Tags: python xml


    【Answer 1】:

    Try xml.etree:

    In [4]: print a
    <?xml version="1.0" encoding="utf-8"?>
    <Report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema" ReportName="My DestinationUrl Performance Report" ReportTime="4/7/2014" TimeZone="Various" ReportAggregation="Daily" LastCompletedAvailableDay="4/8/2014 5:00:00 PM (GMT)" LastCompletedAvailableHour="4/8/2014 5:00:00 PM (GMT)" PotentialIncompleteData="false">
      <DestinationUrlPerformanceReportColumns>
        <Column name="GregorianDate" />
        <Column name="AccountName" />
        <Column name="CampaignName" />
        <Column name="CampaignId" />
        <Column name="AdGroupName" />
        <Column name="AdGroupId" />
        <Column name="DestinationUrl" />
        <Column name="Impressions" />
        <Column name="Clicks" />
        <Column name="Spend" />
        <Column name="Conversions" />
      </DestinationUrlPerformanceReportColumns>
      <Table>
        <Row>
          <GregorianDate value="4/7/2014" />
          <AccountName value="BingAccount" />
          <CampaignName value="Campaign#1" />
          <CampaignId value="12345678" />
          <AdGroupName value="Adgroup1" />
          <AdGroupId value="901234567" />
          <DestinationUrl value="www.example.com" />
          <Impressions value="8" />
          <Clicks value="0" />
          <Spend value="0.00" />
          <Conversions value="0" />
        </Row>
        <Row>
          <GregorianDate value="4/7/2014" />
          <AccountName value="BingAccount" />
          <CampaignName value="Campaign#2" />
          <CampaignId value="83984398493" />
          <AdGroupName value="Adgroup#2" />
          <AdGroupId value="3439843983" />
          <DestinationUrl value="www.example.co.uk" />
          <Impressions value="20" />
          <Clicks value="2" />
          <Spend value="0.10" />
          <Conversions value="0" />
        </Row>
      </Table>
      <Copyright>©2014 Microsoft Corporation. All rights reserved. </Copyright>
    </Report>
    
    In [5]: import xml.etree.ElementTree as ET
    
    In [6]: root = ET.fromstring(a)
    
    In [7]: [tuple([y.attrib['value'] for y in x]) for x in root[1]]
    Out[7]:
    [('4/7/2014',
      'BingAccount',
      'Campaign#1',
      '12345678',
      'Adgroup1',
      '901234567',
      'www.example.com',
      '8',
      '0',
      '0.00',
      '0'),
     ('4/7/2014',
      'BingAccount',
      'Campaign#2',
      '83984398493',
      'Adgroup#2',
      '3439843983',
      'www.example.co.uk',
      '20',
      '2',
      '0.10',
      '0')]
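The answer's `root[1]` relies on `<Table>` being the second child of `<Report>`. A sketch that instead looks elements up by name: because `<Report>` declares a default namespace, the lookups pass a namespace mapping (the prefix `r` is an arbitrary choice), and each tuple is ordered by the `<Column>` header rather than by position. The inline XML is a trimmed copy of the question's report (three columns only).

```python
import xml.etree.ElementTree as ET

# Maps an arbitrary prefix to the report's default namespace.
NS = {"r": "http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema"}

# Trimmed copy of the report from the question (three columns only).
REPORT = """<Report xmlns="http://adcenter.microsoft.com/advertiser/reporting/v5/XMLSchema">
  <DestinationUrlPerformanceReportColumns>
    <Column name="GregorianDate" />
    <Column name="CampaignName" />
    <Column name="Clicks" />
  </DestinationUrlPerformanceReportColumns>
  <Table>
    <Row>
      <GregorianDate value="4/7/2014" />
      <CampaignName value="Campaign#1" />
      <Clicks value="0" />
    </Row>
    <Row>
      <GregorianDate value="4/7/2014" />
      <CampaignName value="Campaign#2" />
      <Clicks value="2" />
    </Row>
  </Table>
</Report>"""

root = ET.fromstring(REPORT)

# Column order as declared in the report header.
columns = [c.get("name") for c in
           root.findall("r:DestinationUrlPerformanceReportColumns/r:Column", NS)]

rows = []
for row in root.findall("r:Table/r:Row", NS):
    # Look each cell up by its column name instead of by position.
    rows.append(tuple(row.find("r:" + name, NS).get("value") for name in columns))

print(columns)
for r in rows:
    print(r)
```

A bare `root.find("Table")` would return `None` here, since ElementTree stores these tags as `{namespace}Table`; that is why the mapping is needed. If a row could lack a cell, `row.find(...)` returns `None` and the `.get` call would fail, so a check would be needed in that case.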
    

    【Discussion】:
