PyTorch for Object detection - Image augmentation: Answers

【Question title】: PyTorch for Object detection - Image augmentation
【Posted】: 2021-06-06 01:17:55
【Question description】:

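I am using PyTorch for object detection and fine-tuning an existing model (transfer learning), as described in this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

Although various transforms are used for image augmentation (a horizontal flip in this tutorial), the tutorial says nothing about transforming the bounding boxes/annotations so that they stay consistent with the transformed image. Am I missing something basic here?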

【Question comments】:

Tags: python image computer-vision pytorch object-detection

【Solution 1】:

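During training, the transforms are indeed applied to both the images and the targets while the data is being loaded. In the PennFudanDataset class, we have these two lines: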

if self.transforms is not None:  
    img, target = self.transforms(img, target)

where target is a dictionary containing:

target = {}
target["boxes"] = boxes
target["labels"] = labels
target["masks"] = masks
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd

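self.transforms in the PennFudanDataset class is set to the return value of get_transform(), which wraps a list of transforms (T.ToTensor() and, for training, T.RandomHorizontalFlip()) in T.Compose(). This happens when the dataset is instantiated with: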

dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
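
To illustrate the idea, here is a minimal, dependency-free sketch of a detection-style pipeline in which every step takes and returns both the image and the target. It is not the tutorial's code: PairedCompose, to_float, mirror, and the toy data are hypothetical names, images are plain lists of rows instead of tensors, and the flip is always applied (not random) so the result is reproducible.

```python
class PairedCompose:
    """Hypothetical stand-in for a detection-style Compose."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        # Every step receives and returns the (image, target) pair.
        for transform in self.transforms:
            image, target = transform(image, target)
        return image, target


def to_float(image, target):
    """Image-only step: scale 0..255 pixels to 0..1, leave target unchanged."""
    return [[pixel / 255 for pixel in row] for row in image], target


def mirror(image, target):
    """Target-aware step: mirror the image and its [x1, y1, x2, y2] boxes."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_boxes = [[width - x2, y1, width - x1, y2]
                     for x1, y1, x2, y2 in target["boxes"]]
    return flipped_image, {**target, "boxes": flipped_boxes}


def get_transform(train):
    """Same shape as the tutorial's helper: augment only when training."""
    steps = [to_float]
    if train:
        steps.append(mirror)
    return PairedCompose(steps)


# 2 rows x 4 columns, with a bright column at x = 1.
image = [[0, 255, 0, 0],
         [0, 255, 0, 0]]
target = {"boxes": [[1, 0, 2, 2]], "labels": [1]}

new_image, new_target = get_transform(train=True)(image, target)
print(new_image)            # [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(new_target["boxes"])  # [[2, 0, 3, 2]]
```

The bright column moves from x = 1 to x = 2, and the box moves with it, which is exactly the consistency the question asks about.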

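The Compose() used here is not torchvision's standard one: it comes from T, a custom transforms module written for the object detection task. Specifically, in the __call__ of RandomHorizontalFlip(), we process both the image and the target (e.g., boxes, masks, keypoints) together.

For completeness, I borrow the code from the GitHub repo: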

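def __call__(self, image, target):
    # Flip with probability self.prob; otherwise return the inputs unchanged.
    if random.random() < self.prob:
        height, width = image.shape[-2:]
        image = image.flip(-1)  # mirror the image along its width axis
        bbox = target["boxes"]
        # Boxes are [x1, y1, x2, y2]: the old right edge becomes the new left edge.
        bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
        target["boxes"] = bbox
        if "masks" in target:
            # Masks share the image's spatial layout, so flip the same axis.
            target["masks"] = target["masks"].flip(-1)
        if "keypoints" in target:
            keypoints = target["keypoints"]
            keypoints = _flip_coco_person_keypoints(keypoints, width)
            target["keypoints"] = keypoints
    return image, target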

Here we can see how the bounding boxes, masks, and keypoints are flipped in step with the image.
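
The index swap in bbox[:, [0, 2]] = width - bbox[:, [2, 0]] is what keeps x1 < x2 after mirroring. A small sketch with plain lists compares it with a naive version that mirrors each x coordinate in place. The function names are hypothetical, and boxes are assumed to be in [x1, y1, x2, y2] format.

```python
def flip_boxes(boxes, width):
    """Mirror boxes horizontally; the old right edge becomes the new left edge."""
    return [[width - x2, y1, width - x1, y2] for x1, y1, x2, y2 in boxes]


def flip_boxes_naive(boxes, width):
    """Mirror each x coordinate in place, without swapping the edges."""
    return [[width - x1, y1, width - x2, y2] for x1, y1, x2, y2 in boxes]


width = 100
boxes = [[10, 20, 40, 60]]

flipped = flip_boxes(boxes, width)
naive = flip_boxes_naive(boxes, width)

print(flipped)  # [[60, 20, 90, 60]]
print(naive)    # [[90, 20, 60, 60]], x1 > x2, so not a valid box
print(flip_boxes(flipped, width) == boxes)  # True: flipping twice restores the box
```

The y coordinates are untouched because a horizontal flip only changes positions along the width axis.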

【Comments】: