Image saliency_detection / segmentation papers

Existing saliency detection approaches usually focus on how to combine hierarchical features effectively, so as to encode rich semantic representations and capture distinctive objectness and detailed boundaries simultaneously. However, it is often overlooked that directly applying concatenation or element-wise operations to different feature maps is sub-optimal, because some maps are too cluttered and may introduce misleading information.

This work first proposes a Recurrent Localization Network, which consists of a contextual weighting module (CWM) and a recurrent module (RM). The CWM adaptively weights the feature maps at each position based on a predicted spatial response map. The recurrent module gradually refines the predicted saliency map over ‘time’.
This work also adopts a Boundary Refinement Network (BRN) to recover detailed boundary information. For each pixel, the BRN predicts an $n\times n$ coefficient map that indicates the relationship between the center point and its $n\times n$ neighbors.
In summary, the contextual weighting module is organized as an inception-like module with 3x3, 5x5 and 7x7 convolutional kernels, followed by a concatenation and a convolution operation. The CWM generates a response map that indicates the importance of each spatial position.
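The structure of the contextual weighting module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `conv2d_same` and `contextual_weighting`, the channel counts, the 1x1 fusion kernel, and the softmax normalization over spatial positions are all assumptions made for illustration. The notes only state that three branches are concatenated, convolved into a response map, and used to weight the features.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive zero-padded 'same' convolution (cross-correlation form).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]          # shifted view of the input
            out += np.einsum('oc,chw->ohw', w[:, :, dy, dx], patch)
    return out

def contextual_weighting(feat, branch_kernels, fuse_kernel):
    """feat: (C, H, W). branch_kernels: list of 3x3, 5x5, 7x7 kernels.
    fuse_kernel: maps the concatenated branches to a single channel."""
    branches = [conv2d_same(feat, w) for w in branch_kernels]
    context = np.concatenate(branches, axis=0)            # concatenate along channels
    logits = conv2d_same(context, fuse_kernel)[0]         # (H, W)
    # Assumption: normalize with a softmax over all spatial positions.
    e = np.exp(logits - logits.max())
    response = e / e.sum()
    weighted = feat * response[None]                      # same weight for every channel
    return weighted, response

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
kernels = [rng.standard_normal((2, 4, k, k)) * 0.1 for k in (3, 5, 7)]
fuse = rng.standard_normal((1, 6, 1, 1)) * 0.1
weighted, response = contextual_weighting(feat, kernels, fuse)
```

Here `response` has one value per spatial position, and `weighted` keeps the shape of the input feature map, so the module can be dropped between existing blocks without changing downstream shapes.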
For the feature map of each block, the recurrent module simultaneously utilizes both the current feed-forward input and the previous state of the same block.
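The recurrent update described above can be sketched as follows. This is only a sketch under stated assumptions: the function name `recurrent_block`, the use of 1x1 convolutions (written as channel-mixing matrices), the ReLU nonlinearity, and the choice to start from the purely feed-forward response are hypothetical and are not taken from the paper.

```python
import numpy as np

def recurrent_block(x, w_ff, w_rec, steps):
    """x: (C_in, H, W) feed-forward input of the block.
    w_ff: (C_out, C_in) feed-forward weights, w_rec: (C_out, C_out) recurrent weights.
    Returns the list of block states, one per time step."""
    ff = np.einsum('oc,chw->ohw', w_ff, x)        # feed-forward term, same at every step
    state = np.maximum(ff, 0.0)                   # t = 0: no previous state yet
    states = [state]
    for _ in range(steps - 1):
        rec = np.einsum('oc,chw->ohw', w_rec, state)   # contribution of the previous state
        state = np.maximum(ff + rec, 0.0)              # combine both, then ReLU (assumed)
        states.append(state)
    return states

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 6))
w_ff = rng.standard_normal((5, 3)) * 0.1
w_rec = rng.standard_normal((5, 5)) * 0.1
states = recurrent_block(x, w_ff, w_rec, steps=3)
```

Each entry of `states` is the block's feature map at one time step. With `w_rec` set to zero the block reduces to an ordinary feed-forward layer, which makes the role of the recurrent term easy to isolate.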
The boundary refinement network takes the current image and its saliency map as input and learns the propagation coefficients with several convolutional layers. The propagation coefficients are then used to refine the saliency map.
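The propagation step can be illustrated with a short NumPy sketch: each refined pixel is a weighted sum of its $n\times n$ neighbors in the input saliency map, using that pixel's own coefficients. The function name `propagate`, the `(n*n, H, W)` layout of the coefficients, the edge padding at the image border, and the requirement that $n$ is odd are assumptions for illustration. In the method itself the coefficients are predicted by the network, whereas here they are supplied by hand.

```python
import numpy as np

def propagate(saliency, coeffs, n):
    """saliency: (H, W) coarse saliency map.
    coeffs: (n*n, H, W); coeffs[k, i, j] weights the k-th neighbor of pixel (i, j),
    with neighbors enumerated row by row over the n x n window (n odd).
    Returns the refined (H, W) map."""
    pad = n // 2
    padded = np.pad(saliency, pad, mode='edge')   # assumption: replicate border values
    h, w = saliency.shape
    refined = np.zeros((h, w), dtype=float)
    k = 0
    for dy in range(n):
        for dx in range(n):
            # padded[dy:dy+h, dx:dx+w] holds, for every pixel, its neighbor at
            # offset (dy - pad, dx - pad).
            refined += coeffs[k] * padded[dy:dy + h, dx:dx + w]
            k += 1
    return refined

n = 3
rng = np.random.default_rng(0)
saliency = rng.random((6, 6))

# Coefficients that select only the center pixel leave the map unchanged.
center_only = np.zeros((n * n, 6, 6))
center_only[(n * n) // 2] = 1.0
unchanged = propagate(saliency, center_only, n)

# Uniform coefficients act as a local average.
uniform = np.full((n * n, 6, 6), 1.0 / (n * n))
smoothed = propagate(saliency, uniform, n)
```

Because the coefficients vary per pixel, the same operation can smooth flat regions while keeping edges sharp, depending on what the network predicts at each location.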