Minimal Notes: Fully Convolutional Adaptation Networks for Semantic Segmentation

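The paper's core contribution is FCAN, which explores training a semantic segmentation network on the GTA5 game dataset and then transferring that network to real-world street scenes for testing.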

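FCAN consists of two parts, Appearance Adaptation Networks (AAN) and Representation Adaptation Networks (RAN); see the overall architecture figure in the paper. GTA5 images form the source domain and real street scenes form the target domain. To address the domain shift that arises when a model trained on the source domain is transferred to the target domain, the paper offers two ideas: 1. make source domain images look more like target domain images; 2. learn domain-invariant features for semantic segmentation. These two steps are handled by AAN and RAN respectively.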

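AAN is essentially a style transfer network (its detailed structure is given in the paper's AAN figure). A white-noise image is fed into the network and updated iteratively. Image style is constrained by pulling the shallow-layer feature correlation of the output toward the shallow-layer feature correlation of the target domain image set $X_{t}$, while semantic content is constrained by pulling the deep-layer feature maps toward those of a single source domain image $x_{s}$. Together these transform $x_{s}$ into the style of $x_{t}$, so that GTA5 images look closer to real scenes. Because the main content of the image is still $x_{s}$, the shallow-layer style similarity term carries a very small weight in the overall loss ( $α = 10^{- 14}$ ).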
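The loss described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature maps are plain arrays standing in for CNN activations, the function names `gram` and `aan_loss` are hypothetical, the target style is taken as the mean Gram matrix over the target set, and squared distances are used without any normalization. None of these details is claimed to match the paper's exact formulation.

```python
import numpy as np

def gram(feat):
    """Feature correlation (Gram matrix) of a C x H x W feature map."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)      # C x (H*W)
    return flat @ flat.T            # C x C channel correlations

def aan_loss(out_deep, src_deep, out_shallow, tgt_shallow_list, alpha=1e-14):
    """Content term plus alpha-weighted style term (hypothetical sketch)."""
    # Content: deep features of the output should match those of x_s.
    content = np.sum((out_deep - src_deep) ** 2)
    # Style: Gram matrix of the output should match the mean Gram matrix
    # over the target images X_t (averaging is an assumption here).
    tgt_gram = np.mean([gram(f) for f in tgt_shallow_list], axis=0)
    style = np.sum((gram(out_shallow) - tgt_gram) ** 2)
    return content + alpha * style
```

In an actual style transfer loop the output image would be updated by gradient descent on this loss; the sketch only shows how the two terms are combined and why a tiny `alpha` keeps the content term dominant.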

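RAN adds an adversarial loss in the middle of the encoder-decoder semantic segmentation model. The discriminator uses ASPP to predict, pixel by pixel, whether each pixel belongs to the source domain or the target domain. The encoder acts as the generator: to fool the discriminator, it gradually learns to discard appearance and keep only semantic, domain-invariant features, which is exactly what we want. Decoding from these features and predicting a class for every pixel alleviates the domain shift problem.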
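A minimal NumPy sketch of the per-pixel adversarial objective follows. It is an assumption-laden illustration: the discriminator outputs are plain logit maps, the label convention (1 = source, 0 = target) is arbitrary, the function names are hypothetical, and the encoder loss uses the common flipped-label trick rather than necessarily the paper's exact minimax form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy averaged over all pixels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -np.mean(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))

def discriminator_loss(logits_src, logits_tgt):
    # The discriminator wants source pixels -> 1 and target pixels -> 0.
    return bce(sigmoid(logits_src), 1.0) + bce(sigmoid(logits_tgt), 0.0)

def encoder_adv_loss(logits_src, logits_tgt):
    # The encoder (generator) wants the opposite, so labels are flipped.
    return bce(sigmoid(logits_src), 0.0) + bce(sigmoid(logits_tgt), 1.0)
```

When the discriminator separates the two domains easily, `discriminator_loss` is near zero and `encoder_adv_loss` is large, which is the signal that pushes the encoder toward domain-invariant features.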

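In the experiments on AAN, the paper finds that the best results come from using AAN during training to translate source domain images into the target domain style, and not applying AAN to target domain images at test time.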

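For RAN, the authors add components one at a time: Adaptive Batch Normalization (ABN), Adversarial Domain Adaptation (ADA), a fully convolutional network as the decoder (Conv), the ASPP network, and finally AAN, and report the result at each step as an ablation study.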
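As a rough illustration of the first component in that list, Adaptive Batch Normalization is commonly understood as re-estimating batch-norm statistics on target domain data while keeping the learned weights. The sketch below assumes features of shape N x C and uses hypothetical helper names; it is not taken from the paper's implementation.

```python
import numpy as np

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Per-channel normalization of N x C features with given statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def adapt_bn_stats(target_feats):
    """Re-estimate per-channel mean and variance on target domain features."""
    return target_feats.mean(axis=0), target_feats.var(axis=0)

# Hypothetical data: the target features are shifted and scaled
# relative to the source features.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(1000, 4))
tgt = rng.normal(3.0, 2.0, size=(1000, 4))

src_mean, src_var = src.mean(axis=0), src.var(axis=0)
tgt_with_src_stats = batch_norm(tgt, src_mean, src_var)            # still shifted
tgt_with_tgt_stats = batch_norm(tgt, *adapt_bn_stats(tgt))         # re-centered
```

With source statistics the normalized target features remain far from zero mean; with re-estimated statistics they are centered and roughly unit variance, which is the effect ABN relies on.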

The paper then presents semi-supervised results and a comparison with state-of-the-art domain adaptation methods.