00034-Convolutional Pose Machines

Author:

Shih-En Wei, The Robotics Institute, Carnegie Mellon University

Abstract:

Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation.

1. Introduction

CPMs (Convolutional Pose Machines):

inherit the benefits of the pose machine architecture: the implicit learning of long-range dependencies between image and multi-part cues, tight integration between learning and inference, and a modular sequential design.
learn feature representations for both image and spatial context directly from data.
allow for globally joint training with backpropagation.
efficiently handle large training datasets.

A CPM operates on 2D belief maps for the location of each part. At a particular stage in the CPM, the spatial context of part beliefs provides strong disambiguating cues to a subsequent stage. As a result, each stage of a CPM produces belief maps with increasingly refined estimates for the locations of each part.
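
To make the belief-map representation concrete, here is a minimal sketch of how a part location could be read off a stack of belief maps by taking the peak of each map. The array layout `(P, H, W)` and the function name `part_locations` are assumptions made for illustration, not something the notes or the paper prescribe.

```python
import numpy as np

def part_locations(belief_maps):
    """Return the (row, col) of the peak of each part's belief map.

    belief_maps: array of shape (P, H, W), one 2D map per part
    (an assumed layout, chosen for this illustration).
    """
    num_parts, height, width = belief_maps.shape
    # Flatten each map and find the index of its maximum.
    flat_peaks = belief_maps.reshape(num_parts, -1).argmax(axis=1)
    # Convert flat indices back to 2D coordinates.
    rows, cols = np.unravel_index(flat_peaks, (height, width))
    return np.stack([rows, cols], axis=1)

# Toy example: two parts on a 4x5 grid, each with a single strong peak.
maps = np.zeros((2, 4, 5))
maps[0, 1, 3] = 0.9
maps[1, 2, 0] = 0.7
locations = part_locations(maps)
print(locations)
```

A later stage that sharpens the maps would leave this read-out unchanged; only the peaks it finds become more reliable.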

We find, through experiments, that large receptive fields on the belief maps are crucial for learning long-range spatial relationships and result in improved accuracy.

Contributions:

learning implicit spatial models via a sequential composition of convolutional architectures
a systematic approach to designing and training such an architecture to learn both image features and image-dependent spatial models for structured prediction tasks, without the need for any graphical model style inference.

2. Related work

Pictorial structures models
Hierarchical models
Non-tree models
Sequential prediction

3. Method

3.1 Pose Machines

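Let $Y_p \in Z \subset \mathbb{R}^2$ denote the pixel location of the $p$-th part, where $Z$ is the set of all $(u, v)$ locations in an image. Our goal is to predict the image locations $Y = (Y_1, \ldots, Y_P)$ for all $P$ parts.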

A classifier in the first stage, $t = 1$, therefore produces the following belief values:
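
A hedged reconstruction of this first-stage prediction, in assumed notation: $g_1$ is the first-stage classifier, $\mathbf{x}_z$ the features extracted at image location $z$, and $b_1^p(Y_p = z)$ the belief that part $p$ lies at $z$. The index range is written as $1, \ldots, P$ here; the paper's own indexing may also include a background class.

$$
g_1(\mathbf{x}_z) \rightarrow \left\{ b_1^p(Y_p = z) \right\}_{p \in \{1, \ldots, P\}}
$$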

In subsequent stages, the classifier predicts a belief for assigning a location to each part, $Y_p = z, \ \forall z \in Z$, based on (1) features of the image data $\mathbf{x}^t_z \in \mathbb{R}^d$ again, and (2) contextual information from the preceding classifier in the neighborhood around each $Y_p$:
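
A hedged reconstruction of the later-stage prediction, in assumed notation: $g_t$ is the stage-$t$ classifier, $\mathbf{b}_{t-1}$ collects the beliefs from the preceding stage, and $\psi_t(z, \mathbf{b}_{t-1})$ is a mapping from those beliefs to context features around location $z$. As before, the index range $1, \ldots, P$ is a simplification that may differ from the paper's indexing.

$$
g_t\left(\mathbf{x}^t_z,\ \psi_t(z, \mathbf{b}_{t-1})\right) \rightarrow \left\{ b_t^p(Y_p = z) \right\}_{p \in \{1, \ldots, P\}}, \qquad t > 1
$$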

3.2 Convolutional Pose Machines

3.2.1 Keypoint Localization Using Local Image Evidence

The first stage of a convolutional pose machine predicts part beliefs from only local image evidence.

3.2.2 Sequential Prediction with Learned Spatial Context Features

A predictor in subsequent stages ($g_{t>1}$) can use the spatial context $\psi_{t>1}(\cdot)$ of the belief maps produced by the preceding stage.
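
The sequential structure can be sketched as a loop in which every later stage sees the image features together with the previous stage's belief maps. The sketch below is an illustration under stated assumptions, not the paper's network: concatenating the two inputs along the channel axis is one way to feed both to a stage, `make_toy_stage` is a hypothetical stand-in for a learned predictor (a random 1x1 linear map over channels), and all sizes are made up.

```python
import numpy as np

NUM_PARTS = 3        # P (hypothetical)
NUM_FEATURES = 8     # image feature channels (hypothetical)
HEIGHT, WIDTH = 6, 6

def make_toy_stage(in_channels, rng):
    """Return a stand-in predictor: a random 1x1 linear map over channels."""
    weights = rng.normal(size=(NUM_PARTS, in_channels))

    def stage(stage_input):
        # (P, C) contracted with (C, H, W) gives (P, H, W).
        return np.tensordot(weights, stage_input, axes=1)

    return stage

def run_stages(image_features, first_stage, later_stages):
    """Run stage 1 on image features, then refine with each later stage."""
    beliefs = first_stage(image_features)
    all_beliefs = [beliefs]
    for stage in later_stages:
        # Later stages receive image features plus the previous beliefs.
        stage_input = np.concatenate([image_features, beliefs], axis=0)
        beliefs = stage(stage_input)
        all_beliefs.append(beliefs)
    return all_beliefs

rng = np.random.default_rng(0)
features = rng.normal(size=(NUM_FEATURES, HEIGHT, WIDTH))
first = make_toy_stage(NUM_FEATURES, rng)
later = [make_toy_stage(NUM_FEATURES + NUM_PARTS, rng) for _ in range(2)]
all_beliefs = run_stages(features, first, later)
print([b.shape for b in all_beliefs])
```

The toy 1x1 map only mixes channels at a single location, so it cannot use any spatial context. A real later stage would use spatially extended filters so that each output location sees a neighborhood of the previous belief maps.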

The design of the network is guided by achieving a receptive field at the output layer of the second stage network that is large enough to allow the learning of potentially complex and long-range correlations between parts.

Accuracy improves with the size of the receptive field.
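
As a rough aid for reasoning about receptive field size, the standard recurrence for a stack of convolution and pooling layers can be written in a few lines. This is a generic sketch: the layer list below is hypothetical and is not the configuration used in the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of layers.

    layers: list of (kernel_size, stride) pairs, applied in order.
    """
    size = 1   # receptive field of a single input pixel
    jump = 1   # distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size

# Hypothetical stack: three 9x9 convolutions, each followed by 2x2 pooling.
base_layers = [(9, 1), (2, 2), (9, 1), (2, 2), (9, 1), (2, 2)]
base_field = receptive_field(base_layers)

# Adding one 11x11 convolution after the downsampling grows the field a lot,
# because each step at this depth spans 8 input pixels.
extended_field = receptive_field(base_layers + [(11, 1)])
print(base_field, extended_field)  # 64 144
```

The example illustrates why large kernels applied to downsampled belief maps are an economical way to reach a large receptive field.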

3.3 Learning in Convolutional Pose Machines

The cost function:
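
As a hedged summary of the paper's objective: each stage $t$ is penalized by the squared L2 distance between its predicted belief maps and ideal belief maps, $f_t = \sum_p \sum_{z \in Z} \| b_t^p(z) - b_*^p(z) \|_2^2$, and the overall objective sums the per-stage losses, $F = \sum_t f_t$. The ideal maps place a Gaussian peak at each ground-truth part location. The sketch below follows that description; the function names, the value of `SIGMA`, the unnormalized Gaussian, and the toy predictions are assumptions for illustration.

```python
import numpy as np

def gaussian_belief_map(height, width, center, sigma):
    """Ideal belief map: a Gaussian peak (value 1 at the peak) at `center`."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    squared_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-squared_dist / (2.0 * sigma ** 2))

def stage_loss(predicted, ideal):
    """Squared L2 distance summed over all parts and locations."""
    return float(np.sum((predicted - ideal) ** 2))

def total_loss(stage_predictions, ideal):
    """Sum of the per-stage losses (one loss term per stage)."""
    return sum(stage_loss(pred, ideal) for pred in stage_predictions)

HEIGHT, WIDTH, SIGMA = 8, 8, 1.0      # hypothetical sizes
truth = [(2, 3), (5, 5)]              # hypothetical ground-truth (row, col)
ideal = np.stack(
    [gaussian_belief_map(HEIGHT, WIDTH, c, SIGMA) for c in truth]
)

# Toy predictions that get closer to the ideal maps stage by stage.
stage_predictions = [np.zeros_like(ideal), 0.5 * ideal, ideal]
losses = [stage_loss(p, ideal) for p in stage_predictions]
total = total_loss(stage_predictions, ideal)
print(losses, total)
```

Because every stage contributes its own loss term, each stage receives a direct training signal rather than relying only on gradients propagated back from the final output.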

4. Evaluation

4.1 Analysis

Addressing vanishing gradients:

Benefit of end-to-end learning:

Comparison on training schemes:

Performance across stages:

4.2 Datasets and Quantitative Analysis

MPII Human Pose Dataset:

Leeds Sports Pose (LSP) Dataset:

FLIC Dataset:

Discussion