DensePose: Dense Human Pose Estimation In The Wild

Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos
DensePose-COCO: a large-scale ground-truth dataset with image-to-surface correspondences manually annotated on 50K COCO images (one can imagine how laborious this annotation is)
DensePose-RCNN: densely regress part-specific UV coordinates within every human region at multiple frames per second (surprisingly, it still relies on something as basic as regression, and on an R-CNN-style architecture at that)

Abstract

dense human pose estimation: dense correspondences between an RGB image and a surface-based representation of the human body
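Annotating even a single image is clearly difficult, so the authors introduce an efficient annotation pipeline.
in the wild: in the presence of background, occlusions and scale variations; such data is even harder to annotate, let alone predict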

Introduction

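Understanding 2D images is closely tied to 3D reconstruction.
The work builds on DenseReg, which uses a CNN to regress point correspondences between a 3D model and an RGB image. The problem here is harder than in DenseReg, because in the wild human pose varies far more drastically.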
contributions:
1. introduce the first manually-collected ground truth dataset for the task, by gathering dense correspondences between the SMPL model and persons appearing in the COCO dataset
2. use the resulting dataset to train CNN-based systems that deliver dense correspondence ‘in the wild’, by regressing body surface coordinates at any image pixel, observing a superiority of region-based models over fully-convolutional networks
3. use sparse correspondences defined over a randomly chosen subset of image pixels per training sample to ‘inpaint’ the supervision signal in the rest of the image domain

COCO-DensePose Dataset

The body surface is split into parts: Head, Torso, Lower/Upper Arms, Lower/Upper Legs, Hands and Feet
head, hands and feet: use the manually obtained UV fields provided in the SMPL model
rest of the parts: obtain the unwrapping via multidimensional scaling applied to pairwise geodesic distances
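The unwrapping step can be illustrated with a minimal sketch of classical multidimensional scaling. These notes do not say which MDS variant the authors use, so classical (Torgerson) MDS is an assumption here, and the function name `classical_mds` is hypothetical. The input is any symmetric matrix of pairwise distances; for a body part it would hold geodesic distances between surface vertices.

```python
import numpy as np

def classical_mds(dist, dim=2):
    """Embed points in `dim` dimensions from a pairwise distance matrix.

    Classical (Torgerson) MDS: double-center the squared distances and
    keep the top eigenvectors. Assumed variant, not confirmed by the notes.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Centering matrix J = I - (1/n) * 1 1^T
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gram matrix of the centered (unknown) coordinates
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dim]  # indices of the largest ones
    # Clip tiny negative eigenvalues that arise from non-Euclidean input
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

The resulting 2D coordinates would still need to be rescaled to the unit square before they can serve as a UV field; that normalization is left out of the sketch.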

Accuracy of human annotators

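Human annotations also contain errors. According to the paper, the errors are small on parts with distinctive features, such as the face, hands and feet, and larger on broad uniform areas that are usually covered by clothing, such as the torso, back and hips.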

Evaluation Measures

Pointwise evaluation: evaluates correspondence accuracy over the whole image domain through the Ratio of Correct Point (RCP) correspondences (a correspondence is declared correct if the geodesic distance is below a certain threshold). Writing f(t) for the ratio of correct points at threshold t, the AUC (Area Under the Curve) is computed over the range of thresholds up to a:

$$\mathrm{AUC}_a = \frac{1}{a}\int_0^a f(t)\,dt$$
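As a rough illustration of the pointwise measure, the sketch below approximates the normalized area under the RCP curve with the trapezoidal rule. The function names, the list of per-point geodesic errors and the number of integration steps are all assumptions made for the example, not details taken from the paper.

```python
def ratio_correct_points(errors, t):
    """f(t): fraction of points whose geodesic error is below threshold t."""
    return sum(1 for e in errors if e < t) / len(errors)

def auc(errors, a, steps=1000):
    """Approximate (1/a) * integral of f(t) over [0, a] (trapezoidal rule)."""
    h = a / steps
    values = [ratio_correct_points(errors, k * h) for k in range(steps + 1)]
    integral = h * (sum(values) - 0.5 * (values[0] + values[-1]))
    return integral / a
```

Dividing by a keeps the score in [0, 1], so results for different maximum thresholds are directly comparable.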

Per-instance evaluation: geodesic point similarity (GPS)
$$\mathrm{GPS}_j = \frac{1}{|P_j|}\sum_{p \in P_j} \exp\left(\frac{-g(i_p, \hat{i}_p)^2}{2\kappa^2}\right)$$
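The per-instance score can be sketched as follows, assuming the geodesic distances g between each ground-truth point and its estimate on the surface have already been computed for one person instance. The function name and the `kappa` value used in any call are placeholders for illustration; the value of the normalizing parameter is not given in these notes.

```python
import math

def geodesic_point_similarity(distances, kappa):
    """GPS for one person instance.

    distances: geodesic distances between ground-truth and estimated
               surface points, one per annotated point of the instance.
    kappa:     normalizing parameter (value assumed by the caller).
    """
    if not distances:
        raise ValueError("an instance needs at least one annotated point")
    total = sum(math.exp(-(d * d) / (2.0 * kappa ** 2)) for d in distances)
    return total / len(distances)
```

A perfect prediction gives a GPS of 1, and the score decays smoothly towards 0 as the geodesic errors grow relative to kappa.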

Learning Dense Human Pose Estimation

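DensePose-RCNN: combines the DenseReg approach (FCN) with the Mask-RCNN architecture by proposing regions-of-interest (ROI), extracting region-adapted features through ROI pooling, and feeding the resulting features into a region-specific branch
The first stage, shown in the paper's first architecture figure, resembles Faster R-CNN: it extracts features, obtains RoI proposals, applies RoI pooling, and runs further convolutions on the pooled features to produce the class and the patch. The patch is then fed into the Mask R-CNN-like multi-task model shown in the paper's second figure, which produces the final result. This second stage uses cross-cascading.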
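To make "part-specific UV coordinates" concrete, here is a minimal sketch of how the per-region output might be decoded. It assumes the dense branch produces, for each RoI, a stack of part logits (channel 0 being background) and one U map and one V map per channel. The array layout and the function name `decode_densepose` are hypothetical and are not the authors' implementation.

```python
import numpy as np

def decode_densepose(part_logits, u_maps, v_maps):
    """Pick, per pixel, the UV pair regressed for the most likely part.

    part_logits, u_maps, v_maps: arrays of shape (num_parts + 1, H, W),
    with channel 0 reserved for background (assumed layout).
    """
    part = np.argmax(part_logits, axis=0)  # (H, W) part index per pixel
    # Gather the U and V values belonging to the winning part channel
    u = np.take_along_axis(u_maps, part[None], axis=0)[0]
    v = np.take_along_axis(v_maps, part[None], axis=0)[0]
    foreground = part > 0
    # Background pixels carry no surface coordinates
    return part, np.where(foreground, u, 0.0), np.where(foreground, v, 0.0)
```

Splitting the task into a coarse part classification followed by a regression within the chosen part follows the DenseReg idea that the section builds on.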
