- Holistically-Nested Edge Detection (HED, ICCV 2015)
- Richer Convolutional Features for Edge Detection (RCF, CVPR 2017)
- Deeply Supervised Salient Object Detection with Short Connections (DSS, CVPR 2017)
Purpose
Generally speaking, HED, RCF, and DSS aim to address the following problems:
(1) holistic image training and prediction;
(2) multi-scale and multi-level feature learning;
(3) combining high-level and low-level features (short connections).
Model
1. HED
HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important in order to resolve the challenging ambiguity in edge and object boundary detection.
HED network architecture
The principle of network construction:
Multi-scale learning: inside (left) and outside (right).
On the one hand, multi-scale learning can be “inside” the neural network, in the form of increasingly larger receptive fields and downsampled layers. On the other hand, multi-scale learning can be “outside” the neural network, for example by “tweaking the scales” of the input images. The latter, however, is time-consuming, so this paper uses the former.
The architecture comprises a single-stream deep network with multiple side outputs. The goal is a network that learns features from which edge maps approaching the ground truth can be produced.
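The side-output idea can be sketched in a few lines of framework-free Python. This is only a rough illustration under simplifying assumptions, not the paper's implementation: each side output is a small 2-D list of logits, upsampling is nearest-neighbour, the fusion weights are passed in rather than learned, and the loss is plain binary cross-entropy (the class balancing used in the paper is omitted). All function names here are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def fuse_side_outputs(side_logits, factors, weights):
    """Bring every side output to input resolution, then take a weighted sum."""
    upsampled = [upsample_nearest(s, f) for s, f in zip(side_logits, factors)]
    h, w = len(upsampled[0]), len(upsampled[0][0])
    fused = [[sum(wt * up[i][j] for wt, up in zip(weights, upsampled))
              for j in range(w)] for i in range(h)]
    return upsampled, fused

def bce(logits, target):
    """Mean binary cross-entropy between sigmoid(logits) and a 0/1 target map."""
    total, n = 0.0, 0
    for lrow, trow in zip(logits, target):
        for l, t in zip(lrow, trow):
            p = sigmoid(l)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n

def deep_supervision_loss(side_logits, factors, weights, target):
    """Deep supervision: one loss per side output plus one loss on the fused map."""
    upsampled, fused = fuse_side_outputs(side_logits, factors, weights)
    return sum(bce(up, target) for up in upsampled) + bce(fused, target)

# Toy example: a full-resolution side output and a 2x-downsampled one.
sides = [[[1.0, 1.0], [1.0, 1.0]], [[3.0]]]
target = [[1, 0], [0, 1]]
loss = deep_supervision_loss(sides, factors=[1, 2], weights=[0.5, 0.5], target=target)
```

Because every side output receives its own loss term, the shallow layers are supervised directly instead of only through gradients flowing back from the final prediction.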
2. RCF
This paper attempts to use richer convolutional features for edge detection.
RCF network architecture
The principle of network construction:
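The paper hypothesizes that the rich hierarchical information in every stage helps considerably, so the model builds on HED and increases the amount of hierarchical information it uses: all convolutional layers in a stage, rather than only the last one, contribute to that stage's side output.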
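The difference from HED can be illustrated with a toy sketch. It assumes each convolutional layer's features have already been reduced to a single-channel 2-D map, and it uses a scalar `layer_weights` entry as a stand-in for the per-layer 1x1 convolution; both are simplifications, and the function names are hypothetical.

```python
def elementwise_sum(maps):
    """Sum a list of equally sized 2-D maps position by position."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(w)] for i in range(h)]

def hed_stage_side(stage_layers):
    """HED-style: only the last conv layer of the stage feeds the side output."""
    return stage_layers[-1]

def rcf_stage_side(stage_layers, layer_weights=None):
    """RCF-style: every conv layer of the stage contributes to the side output."""
    if layer_weights is None:
        layer_weights = [1.0] * len(stage_layers)
    scaled = [[[wt * v for v in row] for row in fmap]
              for wt, fmap in zip(layer_weights, stage_layers)]
    return elementwise_sum(scaled)

# One stage with three conv layers, each producing a 1x2 map.
stage = [[[1.0, 0.0]], [[0.0, 2.0]], [[0.5, 0.5]]]
hed_side = hed_stage_side(stage)   # uses only the third layer
rcf_side = rcf_stage_side(stage)   # uses all three layers
```

In this toy stage the response at the second position is strongest in the middle layer, so it is visible in `rcf_side` but largely lost in `hed_side`.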
3. DSS
The architecture proposed in this paper provides rich multi-scale feature maps at each layer, a property that is critically needed to perform segment detection.
DSS network architecture
By adding a series of short connections from deeper side outputs to shallower ones, the framework offers two advantages: (1) High-level features can be passed to shallower side-output layers and thus help them better locate the most salient region, while shallower side-output layers can learn rich low-level features that help refine the sparse and irregular prediction maps from deeper side-output layers. (2) By combining features from different levels, the resulting architecture provides rich multi-scale feature maps at each layer, a property that is essential for salient object detection.
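A minimal sketch of the short-connection idea follows. It assumes each side output is a square single-channel map, that resolutions differ by integer factors, and that a deeper output is itself refined before being passed on; the weight dictionary `r` and all function names are hypothetical. A connection missing from `r` is treated as absent, so sparser connection patterns can be expressed as well.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def add_scaled(a, b, scale):
    """Return a + scale * b for two equally sized 2-D maps."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def short_connections(activations, r):
    """activations[m] is the map of side output m (index 0 is the shallowest).
    r[(i, m)] is the weight of the short connection from deeper side i to
    shallower side m."""
    n_sides = len(activations)
    refined = [None] * n_sides
    # Walk from the deepest side output to the shallowest one.
    for m in range(n_sides - 1, -1, -1):
        out = [row[:] for row in activations[m]]
        for i in range(m + 1, n_sides):
            factor = len(activations[m]) // len(activations[i])
            out = add_scaled(out, upsample_nearest(refined[i], factor),
                             r.get((i, m), 0.0))
        refined[m] = out
    return refined

# Three side outputs at resolutions 4x4, 2x2 and 1x1.
acts = [[[0.0] * 4 for _ in range(4)],
        [[1.0, 1.0], [1.0, 1.0]],
        [[2.0]]]
weights = {(2, 1): 1.0, (2, 0): 1.0, (1, 0): 1.0}
refined = short_connections(acts, weights)
```

In this toy case the shallowest map starts out empty and ends up carrying the coarse localisation from both deeper outputs, which corresponds to the first advantage described above.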
Thinking
Although these papers take the role of each layer into account, they produce only one map at each stage, which limits the information available. We could consider generating multiple maps at each stage and then merging them at the end.