SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s): a trade-off between memory and accuracy.
Introduction
It is important to retain boundary information in the extracted image representation.
The encoder network in SegNet is topologically identical to the convolutional layers in VGG16. The fully connected layers of VGG16 are removed, which makes the SegNet encoder network significantly smaller and easier to train than many other recent architectures.
The key component of SegNet is the decoder network, which consists of a hierarchy of decoders, one corresponding to each encoder. The appropriate decoders use the max-pooling indices received from the corresponding encoder to perform non-linear upsampling of their input feature maps.
Reusing max-pooling indices in the decoding process has several practical advantages:
It improves boundary delineation
It reduces the number of parameters, enabling end-to-end training
This form of upsampling can be incorporated into any encoder-decoder architecture with only a little modification
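The pooling/unpooling mechanism described above can be sketched in plain NumPy. This is an illustrative sketch, not the paper's implementation: helper names are mine, and a 2×2 window with stride 2 is assumed.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling over a 2-D map that also records the argmax
    location of each window, mirroring the indices SegNet's encoder
    passes to its decoder."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    indices = np.zeros((h // k, w // k), dtype=np.int64)  # flat index into x
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            di, dj = divmod(np.argmax(window), k)
            pooled[i, j] = window[di, dj]
            indices[i, j] = (i*k + di) * w + (j*k + dj)
    return pooled, indices

def max_unpool(pooled, indices, out_shape):
    """SegNet-style non-linear upsampling: each value is placed back at
    its recorded max location; every other position stays zero, giving a
    sparse map that the subsequent decoder convolutions densify."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 2., 0., 3.],
              [4., 0., 1., 0.],
              [0., 5., 2., 2.],
              [1., 0., 0., 6.]])
p, idx = max_pool_with_indices(x)          # p = [[4, 3], [5, 6]]
up = max_unpool(p, idx, x.shape)           # maxima restored at original positions
```

Because only the argmax positions (not the feature values) cross from encoder to decoder, this is what lets the upsampling step improve boundary delineation at almost no memory cost.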
The encoder uses VGG16, but the decoder differs substantially.
SegNet is evaluated on two scene segmentation tasks: CamVid road scene segmentation and SUN RGB-D indoor scene segmentation.
Literature review
For each sample, the indices of the max locations computed during pooling are stored and passed to the decoder.
Architecture
Parameter reduction: from 134M to 14.7M
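Back-of-envelope arithmetic (illustrative, biases ignored) shows where that reduction comes from: almost all of VGG16's parameters live in its three fully connected layers, which SegNet drops.

```python
# VGG16 fully connected layer parameter counts (weights only):
fc6 = 7 * 7 * 512 * 4096   # flattened conv5 output (7x7x512) -> 4096
fc7 = 4096 * 4096
fc8 = 4096 * 1000          # ImageNet classifier head
fc_total = fc6 + fc7 + fc8
print(fc_total / 1e6)      # ~123.6M parameters in the FC layers alone
```

Removing roughly 124M FC parameters from VGG16 leaves only the ~14.7M convolutional parameters that form the SegNet encoder.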
The encoder provides an efficient way to store information by keeping only the max-pooling indices: the location of the maximum feature value in each pooling window is memorized for each encoder feature map.
Decoder
Compared with DeconvNet: DeconvNet has far more parameters, requires more computational resources, and is harder to train end-to-end, mainly because of its fully connected layers.
Compared with U-Net: U-Net does not reuse pooling indices but instead transfers the entire feature map (at the cost of more memory) to the corresponding decoder and concatenates it with the upsampled decoder feature map. In addition, SegNet initializes its encoder with pre-trained VGG weights.
Decoder Variants
Storing only the indices greatly reduces memory: the storage cost of the pooling indices in SegNet is almost negligible.
Training
Optimizer: SGD
Learning rate: 0.1
Momentum: 0.9
Mini-batch size: 12
Loss: cross-entropy
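The optimizer settings above correspond to the classic SGD-with-momentum update. A minimal sketch (the exact variant in the paper's Caffe setup may differ in detail):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One classic SGD-with-momentum update: the velocity accumulates a
    decaying sum of past gradients, and the weights move along it."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
g = np.array([0.5, -0.5])
w, v = sgd_momentum_step(w, g, v)   # v = [-0.05, 0.05], w = [0.95, -1.95]
```

With momentum 0.9, consistent gradient directions compound across steps (up to a 10× effective step length), which helps the fairly large learning rate of 0.1 converge.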