【Question Title】: How to denormalise coordinates?
【Posted】: 2019-11-11 22:30:02
【Question Description】:

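I am annotating a dataset for a computer vision application. I have the normalised coordinates (xmin, ymin, xmax, ymax) in the form of XML files.

The complete XML looks like this: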

<annotation>
    <folder>image</folder>
    <filename>100_icdar13.png</filename>
    <path>/Users/image/100_icdar13.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>816</width>
        <height>608</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>192</xmin>
            <ymin>157</ymin>
            <xmax>530</xmax>
            <ymax>223</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>561</xmin>
            <ymin>159</ymin>
            <xmax>645</xmax>
            <ymax>219</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>74</xmin>
            <ymin>247</ymin>
            <xmax>465</xmax>
            <ymax>311</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>493</xmin>
            <ymin>255</ymin>
            <xmax>625</xmax>
            <ymax>305</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>85</xmin>
            <ymin>339</ymin>
            <xmax>496</xmax>
            <ymax>400</ymax>
        </bndbox>
    </object>
</annotation>

I want to denormalise this dataset and export all the boxes in the following format:

x1, y1, x2, y2, x3, y3, x4, y4, text

How can I do this, and what algorithm can I use to achieve it?
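For reference, denormalising usually means scaling fractional coordinates in the range [0, 1] by the image size. The sketch below assumes that convention; the helper name `denormalise_box` and the sample box are hypothetical. Note that the values in the XML above (for example xmin 192 on a width of 816) already look like pixel coordinates, so this step may not be needed for this file.

```python
def denormalise_box(box, width, height):
    """Scale a fractional (xmin, ymin, xmax, ymax) box to integer pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    # x values scale with the image width, y values with the image height.
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


# Hypothetical fractional box on an 816x608 image (the size given in <size>).
print(denormalise_box((0.25, 0.25, 0.75, 0.5), 816, 608))  # (204, 152, 612, 304)
```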

【Comments】:

Tags: python tensorflow machine-learning computer-vision object-detection


【Solution 1】:

You can use ElementTree to parse the XML and extract the coordinates:

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element

xml_raw = '''
<annotation>
    ...
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>192</xmin>
            <ymin>157</ymin>
            <xmax>530</xmax>
            <ymax>223</ymax>
        </bndbox>
    </object>
    <object>
        ...
    </object>
    ...
</annotation>
'''
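if __name__ == '__main__':
    root: Element = ET.fromstring(xml_raw)
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        name_el = obj.find('name')
        # find() returns None for a missing child, so skip incomplete <object> entries.
        if bndbox is None or name_el is None:
            continue

        name = name_el.text
        xmin, xmax, ymin, ymax = [int(bndbox.find(x).text) for x in ['xmin', 'xmax', 'ymin', 'ymax']]
        # All four corners of the box, as (x, y) pairs.
        coords = [(x, y) for x in [xmin, xmax] for y in [ymin, ymax]]
        print(name, coords)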

Which outputs:

text [(192, 157), (192, 223), (530, 157), (530, 223)]
text [(561, 159), (561, 219), (645, 159), (645, 219)]
text [(74, 247), (74, 311), (465, 247), (465, 311)]
text [(493, 255), (493, 305), (625, 255), (625, 305)]
text [(85, 339), (85, 400), (496, 339), (496, 400)]
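To get a single line in the requested `x1, y1, x2, y2, x3, y3, x4, y4, text` layout, the four corners can be flattened and joined. The following is a minimal sketch; the helper name `box_to_line` is hypothetical, and the corner order (clockwise from the top-left) is an assumption, since the question does not say which order it needs.

```python
import xml.etree.ElementTree as ET


def box_to_line(obj):
    """Format one <object> element as 'x1, y1, x2, y2, x3, y3, x4, y4, text'."""
    bndbox = obj.find('bndbox')
    xmin, ymin, xmax, ymax = (
        int(bndbox.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')
    )
    # Assumed order: top-left, top-right, bottom-right, bottom-left.
    corners = [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]
    return ', '.join(str(v) for v in corners) + ', ' + obj.find('name').text


# First box from the question's XML.
sample = ET.fromstring(
    '<object><name>text</name><bndbox>'
    '<xmin>192</xmin><ymin>157</ymin><xmax>530</xmax><ymax>223</ymax>'
    '</bndbox></object>'
)
print(box_to_line(sample))  # 192, 157, 530, 157, 530, 223, 192, 223, text
```

If a different corner order is required, only the `corners` list needs to change.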

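【Comments】:

  • I get this error when running the script: File "denormalise.py", line 30: name = obj.find('name').text AttributeError: 'NoneType' object has no attribute 'text'
  • Where do you get the text field in x1, y1, x2, y2, x3, y3, x4, y4, text from? I assumed it comes from annotation > object > name. You may need to modify obj.find('name').text accordingly.
  • Your assumption is correct. Does the code have to contain the ''' ... ''' block, or should I replace it with xml_raw = open("demofile.xml", "r")?
  • You can use with open('raw_data.xml') as f: root = ET.parse(f) instead.
  • When I do xml_raw = open("demofile.xml", "r") I get an error. TypeError: a bytes-like object is required, not '_io.TextIOWrapper'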
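The TypeError in the last comment arises because `ET.fromstring` expects the XML text itself (a `str` or `bytes`), not an open file object. A minimal sketch of the two usual ways to load from disk follows; the helper names are hypothetical, and the temporary `demofile.xml` exists only so the example runs on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_root(path):
    """Let ElementTree open and parse the file, then return the root element."""
    return ET.parse(path).getroot()


def load_root_from_text(path):
    """Read the file contents first, then pass the text to ET.fromstring."""
    with open(path, encoding='utf-8') as f:
        return ET.fromstring(f.read())


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demofile.xml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<annotation><object><name>text</name></object></annotation>')
    print(load_root(path).tag, load_root_from_text(path).tag)  # annotation annotation
```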
【Solution 2】:

Here is the answer:

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element

# Parse the annotation file and take its root <annotation> element.
root: Element = ET.parse('100_icdar13.xml').getroot()
for obj in root.findall('object'):
    bndbox: Element = obj.find('bndbox')
    name = obj.find('name').text
    xmin, xmax, ymin, ymax = [int(bndbox.find(x).text) for x in ['xmin', 'xmax', 'ymin', 'ymax']]
    # All four corners of the box, as (x, y) pairs.
    coords = [(x, y) for x in [xmin, xmax] for y in [ymin, ymax]]
    print(coords, name)

Output:

[(201, 162), (201, 229), (207, 162), (207, 229)] text
[(208, 162), (208, 229), (223, 162), (223, 229)] text
[(224, 162), (224, 229), (239, 162), (239, 229)] text
[(493, 255), (493, 305), (625, 255), (625, 305)] text
[(85, 339), (85, 400), (496, 339), (496, 400)] text
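For a whole dataset, the same loop can be run over every XML file in a folder and the boxes written to text files. This is only a sketch under stated assumptions: the function name `export_annotations`, the one-`.txt`-per-`.xml` layout, and the clockwise corner order starting at the top-left are all hypothetical choices, not something the question specifies.

```python
import glob
import os
import xml.etree.ElementTree as ET


def export_annotations(xml_dir, out_dir):
    """Write one .txt per .xml file, one box per line; return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for xml_path in sorted(glob.glob(os.path.join(xml_dir, '*.xml'))):
        root = ET.parse(xml_path).getroot()
        lines = []
        for obj in root.findall('object'):
            bndbox = obj.find('bndbox')
            xmin, ymin, xmax, ymax = (
                int(bndbox.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')
            )
            # Assumed order: top-left, top-right, bottom-right, bottom-left.
            corners = [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]
            lines.append(', '.join(str(v) for v in corners) + ', ' + obj.find('name').text)
        # Name the output after the XML file, e.g. 100_icdar13.xml -> 100_icdar13.txt.
        stem = os.path.splitext(os.path.basename(xml_path))[0]
        out_path = os.path.join(out_dir, stem + '.txt')
        with open(out_path, 'w', encoding='utf-8') as f:
            f.writelines(line + '\n' for line in lines)
        written.append(out_path)
    return written
```

A call such as `export_annotations('annotations', 'labels')` (both folder names are placeholders) would then produce one text file per annotated image.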

【Comments】:
