【Title】: Python extract values from list of OrderedDicts
【Posted】: 2017-12-22 02:54:52
【Question】:

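I have parsed an XML file with xmltodict and found the path to the <coordinates> tag, from which I want to extract the latitude and longitude values to add to a dataframe. Here is a small sample: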

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
   <Document>
      <Folder>
         <name>One Line Diagram</name>
         <open>0</open>
         <Folder>
            <name>SectionOne</name>
            <open>0</open>
            <Folder>
               <name>Node</name>
               <open>0</open>
               <Placemark>
                  <name>5680420</name>
                  <styleUrl>#Style_0</styleUrl>
                  <description />
                  <MultiGeometry type="MultiGeometry" Type="MultiGeometry">
                     <Polygon>
                        <outerBoundaryIs>
                           <LinearRing>
                              <coordinates>-83.6514766,67.0234192 -83.6515403,67.0233918 -83.6515309,67.0233134 -83.6514609,67.0232885 -83.5778406,67.0246267 -83.5777768,67.0246541 -83.5777861,67.0247325 -83.5778560,67.0247574 -83.6514766,67.0234192</coordinates>
                           </LinearRing>
                        </outerBoundaryIs>
                     </Polygon>
                  </MultiGeometry>
               </Placemark>
               <Placemark>
                  <name>25934531</name>
                  <styleUrl>#Style_0</styleUrl>
                  ML60
                  <description />
                  <MultiGeometry type="MultiGeometry" Type="MultiGeometry">
                     <Polygon>
                        <outerBoundaryIs>
                           <LinearRing>
                              <coordinates>-83.6512679,67.0216805 -83.6513317,67.0216531 -83.6513222,67.0215747 -83.6512522,67.0215498 -83.5967049,67.0225434 -83.5966412,67.0225708 -83.5966505,67.0226492 -83.5967204,67.0226741 -83.6512679,67.0216805</coordinates>
                           </LinearRing>
                        </outerBoundaryIs>
                     </Polygon>
                  </MultiGeometry>
               </Placemark>
            </Folder>
         </Folder>
      </Folder>
   </Document>
</kml>

The path is below.

> doc['kml']['Document']['Folder']['Folder']['Folder'][0]['Placemark'][0]['MultiGeometry']['Polygon']['outerBoundaryIs']['LinearRing']['coordinates']

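This is a very long XML document with 4 Folder tags, but I only need the values from the first one, ['Folder'][0]. What I can't figure out is how to iterate over all ['Placemark'][n] until every coordinate has been extracted.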

I have tried several things. The latest is below: an attempt to start working my way toward the correct tag, but to no avail.

root_elements = doc['Document'] if type(doc['Document']) == OrderedDict else [doc['Document']]
for element in root_elements:
    print(element['Placemark'])

Traceback:

Traceback (most recent call last)
<ipython-input-69-db580dc8b6e2> in <module>()
----> 1 root_elements = doc['Document'] if type(doc['Document']) == OrderedDict else [doc['Document']]
      2 for element in root_elements:
      3     print(element['Placemark'])

KeyError: 'Document'

Any help is appreciated.

【Comments】:

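  • The error is telling you that doc has no key 'Document'. Doesn't the path you posted start with doc['kml']['Document'] (not doc['Document'])?
  • Now that you mention it, why yes. I feel so dumb. Thanks.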
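
A KeyError like the one above is easier to pin down by walking the path one step at a time and reporting where it breaks. The following is only a minimal sketch: `walk` is a hypothetical helper, and `doc` is a hand-built, heavily trimmed dict standing in for what xmltodict.parse returns.

```python
def walk(node, path):
    """Follow `path` (dict keys and list indexes) through nested containers.

    Raises a KeyError that names the failing step and the keys available there.
    """
    for depth, step in enumerate(path):
        try:
            node = node[step]
        except (KeyError, IndexError, TypeError):
            if isinstance(node, dict):
                available = list(node.keys())
            else:
                available = type(node).__name__
            raise KeyError(
                "step %d (%r) not found; available here: %r" % (depth, step, available)
            )
    return node


# Hypothetical stand-in for the parsed document, not real xmltodict output.
doc = {"kml": {"Document": {"Folder": {"name": "One Line Diagram"}}}}

print(list(doc.keys()))  # the only top-level key is 'kml'
print(walk(doc, ["kml", "Document", "Folder", "name"]))

try:
    walk(doc, ["Document"])  # same mistake as in the question
except KeyError as err:
    print(err)
```

Printing the keys at the failing level shows immediately that the path has to start at 'kml'.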

Tags: python xml xml-parsing xmltodict


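【Solution 1】:

Your XML is missing the closing tags for two Folder elements (the fourth-to-last and third-to-last lines below; just copy and paste them into your XML file).

The XML, indented with this tool: https://www.freeformatter.com/xml-formatter.html#ad-output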

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
   <Document>
      <Folder>
         <name>One Line Diagram</name>
         <open>0</open>
         <Folder>
            <name>SectionOne</name>
            <open>0</open>
            <Folder>
               <name>Node</name>
               <open>0</open>
               <Placemark>
                  <name>5680420</name>
                  <styleUrl>#Style_0</styleUrl>
                  <description />
                  <MultiGeometry type="MultiGeometry" Type="MultiGeometry">
                     <Polygon>
                        <outerBoundaryIs>
                           <LinearRing>
                              <coordinates>-83.6514766,67.0234192 -83.6515403,67.0233918 -83.6515309,67.0233134 -83.6514609,67.0232885 -83.5778406,67.0246267 -83.5777768,67.0246541 -83.5777861,67.0247325 -83.5778560,67.0247574 -83.6514766,67.0234192</coordinates>
                           </LinearRing>
                        </outerBoundaryIs>
                     </Polygon>
                  </MultiGeometry>
               </Placemark>
               <Placemark>
                  <name>25934531</name>
                  <styleUrl>#Style_0</styleUrl>
                  ML60
                  <description />
                  <MultiGeometry type="MultiGeometry" Type="MultiGeometry">
                     <Polygon>
                        <outerBoundaryIs>
                           <LinearRing>
                              <coordinates>-83.6512679,67.0216805 -83.6513317,67.0216531 -83.6513222,67.0215747 -83.6512522,67.0215498 -83.5967049,67.0225434 -83.5966412,67.0225708 -83.5966505,67.0226492 -83.5967204,67.0226741 -83.6512679,67.0216805</coordinates>
                           </LinearRing>
                        </outerBoundaryIs>
                     </Polygon>
                  </MultiGeometry>
               </Placemark>
            </Folder>
         </Folder>
      </Folder>
   </Document>
</kml>

Use xmltodict to extract the coordinates from a coordinates.xml file containing your XML (including the two missing Folder closing tags):

import xmltodict

with open('coordinates.xml') as coords:
    doc = xmltodict.parse(coords.read())

coordinates = []

# Loop over each Placemark tag in the innermost Folder
for placemark in doc['kml']['Document']['Folder']['Folder']['Folder']['Placemark']:
    # Get the coordinates string from the current placemark
    coordinate_string = placemark['MultiGeometry']['Polygon']['outerBoundaryIs']['LinearRing']['coordinates']

    # Split the string into coordinate pairs on spaces (" "),
    # then split each pair into its x and y parts on the comma (",")
    coordinate_list = [pair.split(",") for pair in coordinate_string.split(" ")]
    coordinates.append(coordinate_list)

print(coordinates)
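
One caveat with this kind of loop: as far as I know, xmltodict returns a list only when a tag repeats, so a Folder holding a single Placemark comes back as one dict, and iterating over it would walk its keys instead. A minimal sketch of a guard follows. `as_list`, `ring_coordinates` and `make_placemark` are hypothetical helpers, and the dicts are hand-built stand-ins for parsed output with made-up, shortened coordinates.

```python
def as_list(value):
    """Return a list: [] for None, the value itself if already a list, else [value]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ring_coordinates(placemark):
    """Pull the raw coordinates string out of one parsed Placemark."""
    ring = placemark["MultiGeometry"]["Polygon"]["outerBoundaryIs"]["LinearRing"]
    return ring["coordinates"]


def make_placemark(name, coords):
    """Build a stand-in for one parsed Placemark (illustration only)."""
    return {
        "name": name,
        "MultiGeometry": {
            "Polygon": {"outerBoundaryIs": {"LinearRing": {"coordinates": coords}}}
        },
    }


# A Folder with one Placemark (a dict) and a Folder with two (a list).
one = {"Placemark": make_placemark("5680420", "-83.65,67.02 -83.66,67.03")}
many = {
    "Placemark": [
        make_placemark("5680420", "-83.65,67.02 -83.66,67.03"),
        make_placemark("25934531", "-83.61,67.01 -83.62,67.00"),
    ]
}

for folder in (one, many):
    for placemark in as_list(folder.get("Placemark")):
        print(placemark["name"], ring_coordinates(placemark))
```

xmltodict.parse also takes a force_list argument, for example force_list=('Placemark',), which should make that tag always come back as a list and so remove the need for the helper.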

Output of printing the "coordinates" list:

[[[u'-83.6514766', u'67.0234192'], [u'-83.6515403', u'67.0233918'], [u'-83.6515309', u'67.0233134'], [u'-83.6514609', u'67.0232885'], [u'-83.5778406', u'67.0246267'], [u'-83.5777768', u'67.0246541'], [u'-83.5777861', u'67.0247325'], [u'-83.5778560', u'67.0247574'], [u'-83.6514766', u'67.0234192']], [[u'-83.6512679', u'67.0216805'], [u'-83.6513317', u'67.0216531'], [u'-83.6513222', u'67.0215747'], [u'-83.6512522', u'67.0215498'], [u'-83.5967049', u'67.0225434'], [u'-83.5966412', u'67.0225708'], [u'-83.5966505', u'67.0226492'], [u'-83.5967204', u'67.0226741'], [u'-83.6512679', u'67.0216805']]]

coordinates[0] gives the list of coordinates for the first Placemark tag:

[[u'-83.6514766', u'67.0234192'], [u'-83.6515403', u'67.0233918'], [u'-83.6515309', u'67.0233134'], [u'-83.6514609', u'67.0232885'], [u'-83.5778406', u'67.0246267'], [u'-83.5777768', u'67.0246541'], [u'-83.5777861', u'67.0247325'], [u'-83.5778560', u'67.0247574'], [u'-83.6514766', u'67.0234192']]

coordinates[0][0] gives the first coordinate pair of the first Placemark tag:

[u'-83.6514766', u'67.0234192']

coordinates[0][0][0] gives the x value of the first coordinate pair of the first Placemark tag:

-83.6514766
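
The values above are still strings, and the question asks for something that can go into a dataframe. A minimal sketch of that last step follows, assuming KML's lon,lat[,alt] ordering. `parse_coordinates` and `to_rows` are hypothetical helper names, and the sample string is the start of the first Placemark's coordinates.

```python
def parse_coordinates(text):
    """Turn 'lon,lat lon,lat ...' into a list of (lon, lat) float tuples."""
    points = []
    for pair in text.split():  # split() also tolerates newlines and repeated spaces
        parts = pair.split(",")
        # Keep longitude and latitude; an optional third value (altitude) is ignored.
        points.append((float(parts[0]), float(parts[1])))
    return points


def to_rows(name, text):
    """Build one flat record per coordinate pair, tagged with the Placemark name."""
    return [
        {"name": name, "lon": lon, "lat": lat}
        for lon, lat in parse_coordinates(text)
    ]


sample = "-83.6514766,67.0234192 -83.6515403,67.0233918 -83.6515309,67.0233134"
rows = to_rows("5680420", sample)
for row in rows:
    print(row)
```

A list of dicts in this shape can be passed directly to pandas.DataFrame, giving one row per point with name, lon and lat columns.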

【Comments】:

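  • Sorry for the formatting. What you see is a small part of the whole file. I just posted some of it so the board could get an idea of the data.
  • Peter Out, what you did is the same as what I did, but I need to extract the tuples from multiple ['Placemark'] tags, and the number of them can change between files. This morning I thought the simple solution might be placemark = doc['kml']['Document']['Folder']['Folder']['Folder']['Placemark'] and then len(placemark). I'll try this.
  • I will add that I like how you put the path to where I needed to go into a variable. Like many on the board, I am teaching myself Python, but sometimes the obvious slips by.
  • Sure, I'm no Python expert myself.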