Video super-resolution paper notes: Deep Back-Projection Networks For Super-Resolution

Deep Back-Projection Networks For Super-Resolution: CVPR 2018

paper: https://arxiv.org/pdf/1803.02735v1.pdf
code: https://github.com/alterzero/DBPN-Pytorch

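1. Related work

The intro groups current deep-learning models for image SR (super-resolution) into four types:
1. Predefined upsampling: the image is interpolated to the target size before feature extraction
2. Single upsampling: upsampling happens once, right before the output (sub-pixel convolution probably works better here?)
3. Progressive upsampling: the features are upsampled step by step during feature extraction
4. Iterative up and downsampling: upsampling and downsampling alternate iteratively (this paper)

2. Method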

Core idea: residual learning!

2.1 Up-projection unit
- Formulation
  $\quad H_{0}^{t}=\left(L^{t-1} * p_{t}\right) \uparrow_{s} \ \ \ \ (1)$
  $\quad L_{0}^{t}=\left(H_{0}^{t} * g_{t}\right) \downarrow_{s} \ \ \ \ (2)$
  $\quad e_{t}^{l}=L_{0}^{t}-L^{t-1} \ \ \ \ (3)$
  $\quad H_{1}^{t}=\left(e_{t}^{l} * q_{t}\right) \uparrow_{s} \ \ \ \ (4)$
  $\quad H^{t}=H_{0}^{t}+H_{1}^{t} \ \ \ \ (5)$
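Here $*$ is convolution, $\uparrow_{s}$ and $\downarrow_{s}$ are upsampling and downsampling by scale factor $s$, $p_{t}$ and $q_{t}$ are deconvolution layers, and $g_{t}$ is a convolution layer.
$L^{t-1}$ is the output of the previous down-projection unit (or the initial features when $t=1$) and serves as the LR (low-resolution) prototype feature map for the current iteration. Convolving and upsampling it gives the HR (high-resolution) prototype feature map $H_{0}^{t}$ for the current iteration.
How do we get from the HR prototype $H_{0}^{t}$ to the final HR feature map of this iteration? By learning a residual: the residual $H_{1}^{t}$ between the prototype $H_{0}^{t}$ and the final version $H^{t}$.
How is that residual learned? First, convolve and downsample the HR prototype $H_{0}^{t}$ to project it back to LR space, giving the LR feature map $L_{0}^{t}$.
Subtracting the LR prototype $L^{t-1}$ from $L_{0}^{t}$ gives the LR residual $e_{t}^{l}$.
Finally, convolving and upsampling the LR residual $e_{t}^{l}$ gives the HR residual $H_{1}^{t}$, which is added to $H_{0}^{t}$ to produce $H^{t}$.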
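
The five steps of the up-projection unit can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' implementation: the class name, the default channel width of 64, the 4x kernel settings, and the PReLU activations are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class UpProjection(nn.Module):
    """Illustrative up-projection unit following Eqs. (1)-(5)."""

    def __init__(self, channels=64, kernel=8, stride=4, padding=2):
        super().__init__()
        # p_t: deconvolution, LR features -> HR prototype, Eq. (1)
        self.p = nn.ConvTranspose2d(channels, channels, kernel, stride, padding)
        # g_t: strided convolution, HR prototype -> LR space, Eq. (2)
        self.g = nn.Conv2d(channels, channels, kernel, stride, padding)
        # q_t: deconvolution, LR residual -> HR residual, Eq. (4)
        self.q = nn.ConvTranspose2d(channels, channels, kernel, stride, padding)
        # Activation choice is an assumption of this sketch
        self.act_p, self.act_g, self.act_q = nn.PReLU(), nn.PReLU(), nn.PReLU()

    def forward(self, l_prev):
        h0 = self.act_p(self.p(l_prev))  # Eq. (1): HR prototype
        l0 = self.act_g(self.g(h0))      # Eq. (2): back-projected LR map
        e_l = l0 - l_prev                # Eq. (3): LR residual
        h1 = self.act_q(self.q(e_l))     # Eq. (4): HR residual
        return h0 + h1                   # Eq. (5): final HR feature map


lr_features = torch.randn(1, 64, 16, 16)
print(UpProjection()(lr_features).shape)  # torch.Size([1, 64, 64, 64])
```

With the 8×8 kernel, stride 4, and padding 2, a 16×16 LR map becomes a 64×64 HR map, and the back-projection in Eq. (2) returns exactly to 16×16, so the subtraction in Eq. (3) is shape-compatible.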

2.2 Down-projection unit

Formulation
$L_{0}^{t}=\left(H^{t} * g_{t}^{\prime}\right) \downarrow_{s} \ \ \ \ (6)$
$H_{0}^{t}=\left(L_{0}^{t} * p_{t}^{\prime}\right) \uparrow_{s} \ \ \ \ (7)$
$e_{t}^{h}=H_{0}^{t}-H^{t} \ \ \ \ (8)$
$L_{1}^{t}=\left(e_{t}^{h} * g_{t}^{\prime}\right) \downarrow_{s} \ \ \ \ (9)$
$L^{t}=L_{0}^{t}+L_{1}^{t} \ \ \ \ (10)$
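The down-projection unit takes the HR features produced by the preceding up-projection unit and turns them into more expressive LR features, in preparation for the next up-projection.
$H^{t}$ is the output of the preceding up-projection unit and serves as the HR (high-resolution) prototype feature map. Convolving and downsampling it gives the LR (low-resolution) prototype feature map $L_{0}^{t}$ for the current iteration.
How do we get from the LR prototype $L_{0}^{t}$ to the final LR feature map of this iteration? By learning a residual: the residual $L_{1}^{t}$ between the prototype $L_{0}^{t}$ and the final version $L^{t}$.
How is that residual learned? First, convolve and upsample the LR prototype $L_{0}^{t}$ to project it back to HR space, giving the HR feature map $H_{0}^{t}$.
Subtracting the HR prototype $H^{t}$ from $H_{0}^{t}$ gives the HR residual $e_{t}^{h}$.
Finally, convolving and downsampling the HR residual $e_{t}^{h}$ gives the LR residual $L_{1}^{t}$, which is added to $L_{0}^{t}$ to produce $L^{t}$.
Clearly, this is the up-projection unit with the roles of upsampling and downsampling swapped.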
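
The down-projection unit can be sketched the same way. Again this is only an illustration under assumptions: the class name, channel width, 4x kernel settings, and PReLU activations are not taken from the notes. Eqs. (6) and (9) both use the symbol $g_{t}^{\prime}$; the sketch gives the two convolutions separate weights, which is also an assumption.

```python
import torch
import torch.nn as nn


class DownProjection(nn.Module):
    """Illustrative down-projection unit following Eqs. (6)-(10)."""

    def __init__(self, channels=64, kernel=8, stride=4, padding=2):
        super().__init__()
        # Strided convolution, HR features -> LR prototype, Eq. (6)
        self.g1 = nn.Conv2d(channels, channels, kernel, stride, padding)
        # p'_t: deconvolution, LR prototype -> HR space, Eq. (7)
        self.p = nn.ConvTranspose2d(channels, channels, kernel, stride, padding)
        # Strided convolution, HR residual -> LR residual, Eq. (9)
        # (separate weights from g1 are an assumption of this sketch)
        self.g2 = nn.Conv2d(channels, channels, kernel, stride, padding)
        self.act1, self.act2, self.act3 = nn.PReLU(), nn.PReLU(), nn.PReLU()

    def forward(self, h):
        l0 = self.act1(self.g1(h))     # Eq. (6): LR prototype
        h0 = self.act2(self.p(l0))     # Eq. (7): back-projected HR map
        e_h = h0 - h                   # Eq. (8): HR residual
        l1 = self.act3(self.g2(e_h))   # Eq. (9): LR residual
        return l0 + l1                 # Eq. (10): final LR feature map


hr_features = torch.randn(1, 64, 64, 64)
print(DownProjection()(hr_features).shape)  # torch.Size([1, 64, 16, 16])
```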

2.3 Dense Up-projection Unit

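The only difference between the Dense Up-projection Unit and the Up-projection Unit is the input. The Dense Up-projection Unit takes the concatenation of the outputs of all preceding Down-projection units. The concatenated feature map goes through a 1×1 convolution that reduces the channel dimension, and then enters an ordinary Up-projection Unit.
The Dense Down-projection Unit works the same way.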
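
A minimal sketch of the dense variant: concatenate the earlier LR feature maps, reduce them with a 1×1 convolution, then run a plain up-projection. The class names, the `num_inputs` parameter, and the PReLU activations are hypothetical choices for illustration, and a compact up-projection unit is included so the block runs on its own.

```python
import torch
import torch.nn as nn


class UpProjection(nn.Module):
    """Compact up-projection unit (illustrative, see Eqs. (1)-(5))."""

    def __init__(self, channels, kernel, stride, padding):
        super().__init__()
        self.p = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, kernel, stride, padding), nn.PReLU())
        self.g = nn.Sequential(
            nn.Conv2d(channels, channels, kernel, stride, padding), nn.PReLU())
        self.q = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, kernel, stride, padding), nn.PReLU())

    def forward(self, l_prev):
        h0 = self.p(l_prev)
        return h0 + self.q(self.g(h0) - l_prev)


class DenseUpProjection(nn.Module):
    """1x1 bottleneck over concatenated LR maps, then a plain up-projection."""

    def __init__(self, num_inputs, channels=64, kernel=8, stride=4, padding=2):
        super().__init__()
        # The 1x1 convolution reduces num_inputs * channels back to channels
        self.reduce = nn.Conv2d(num_inputs * channels, channels, kernel_size=1)
        self.up = UpProjection(channels, kernel, stride, padding)

    def forward(self, lr_maps):
        # lr_maps: list of LR feature maps from earlier down-projection units
        return self.up(self.reduce(torch.cat(lr_maps, dim=1)))


maps = [torch.randn(1, 64, 8, 8) for _ in range(3)]
print(DenseUpProjection(num_inputs=3)(maps).shape)  # torch.Size([1, 64, 32, 32])
```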

2.4 Network Architecture

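Initial feature extraction: a 3×3 convolution extracts the initial features, followed by a 1×1 convolution that reduces the channel dimension
Back-projection stages: alternating Up-projection and Down-projection units, stacked like building blocks
Reconstruction: the HR feature maps output by all Up-projection units are concatenated and passed through a 3×3 convolution to produce the final HR image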
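
The three parts can be wired together roughly as below. This is a small, non-dense sketch under assumptions, not the released model: `TinyDBPN`, `Projection`, the helper functions, the channel widths, the number of stages, and the PReLU activations are all hypothetical. Initial feature extraction here is a 3×3 convolution followed by a 1×1 convolution.

```python
import torch
import torch.nn as nn


def up(c, k, s, p):
    return nn.Sequential(nn.ConvTranspose2d(c, c, k, s, p), nn.PReLU())


def down(c, k, s, p):
    return nn.Sequential(nn.Conv2d(c, c, k, s, p), nn.PReLU())


class Projection(nn.Module):
    """Up-projection when first=up, back=down; down-projection when swapped."""

    def __init__(self, first, back, c, k, s, p):
        super().__init__()
        self.first, self.back, self.resid = first(c, k, s, p), back(c, k, s, p), first(c, k, s, p)

    def forward(self, x):
        y0 = self.first(x)            # prototype at the other resolution
        err = self.back(y0) - x       # projection error at the input resolution
        return y0 + self.resid(err)   # prototype plus projected residual


class TinyDBPN(nn.Module):
    def __init__(self, stages=3, n0=64, nr=16, k=8, s=4, p=2):
        super().__init__()
        # Initial feature extraction: 3x3 conv, then 1x1 conv to reduce channels
        self.feat = nn.Sequential(
            nn.Conv2d(3, n0, 3, padding=1), nn.PReLU(), nn.Conv2d(n0, nr, 1), nn.PReLU())
        # Back-projection stages: `stages` up units with a down unit between them
        self.ups = nn.ModuleList([Projection(up, down, nr, k, s, p) for _ in range(stages)])
        self.downs = nn.ModuleList([Projection(down, up, nr, k, s, p) for _ in range(stages - 1)])
        # Reconstruction: 3x3 conv over the concatenated HR feature maps
        self.recon = nn.Conv2d(stages * nr, 3, 3, padding=1)

    def forward(self, x):
        lr = self.feat(x)
        hr_maps = []
        for t, up_unit in enumerate(self.ups):
            hr = up_unit(lr)
            hr_maps.append(hr)
            if t < len(self.downs):
                lr = self.downs[t](hr)
        return self.recon(torch.cat(hr_maps, dim=1))


print(TinyDBPN()(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 3, 32, 32])
```

Every up-projection output is kept in `hr_maps`, so the reconstruction layer sees HR features from all stages rather than only the last one.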

2.5 Details

Large kernels are used in the projection units
- 2x enlargement: conv 6×6 filters, stride 2, padding 2
- 4x enlargement: conv 8×8 filters, stride 4, padding 2
- 8x enlargement: conv 12×12 filters, stride 8, padding 2
No BatchNorm or Dropout
Optimizer: Adam with momentum 0.9 and weight decay 1e-4
Loss: MSE
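
The kernel, stride, and padding triples listed above are chosen so that a deconvolution scales the feature map by exactly the enlargement factor and the matching strided convolution maps it back to the original size. A quick check with the standard output-size formulas (the function names and the LR size of 32 are just for illustration):

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


# scale -> (kernel, stride, padding), as listed in the notes
SETTINGS = {2: (6, 2, 2), 4: (8, 4, 2), 8: (12, 8, 2)}

lr_size = 32
for scale, (k, s, p) in SETTINGS.items():
    hr_size = deconv_out(lr_size, k, s, p)
    back = conv_out(hr_size, k, s, p)
    print(scale, hr_size, back)
# 2 64 32
# 4 128 32
# 8 256 32
```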

2.6 Supplement

The paper's source code upsamples with ConvTranspose2d; Sub-Pixel convolution (nn.PixelShuffle in PyTorch) could be used instead.
Sub-Pixel convolution was proposed in Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network.
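
For comparison, here is a sketch of the two upsampling options side by side for a 4x scale. The channel width and the 3×3 kernel in front of `nn.PixelShuffle` are assumptions made for illustration; swapping Sub-Pixel into DBPN is the note's suggestion and is not something the released code does.

```python
import torch
import torch.nn as nn

scale, channels = 4, 64

# Option 1: transposed convolution, using the 4x kernel settings from the notes
deconv_up = nn.ConvTranspose2d(channels, channels, kernel_size=8, stride=4, padding=2)

# Option 2: sub-pixel convolution. A regular convolution expands the channels by
# scale**2, then PixelShuffle rearranges those channels into spatial positions.
subpixel_up = nn.Sequential(
    nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),
)

x = torch.randn(1, channels, 16, 16)
print(deconv_up(x).shape)    # torch.Size([1, 64, 64, 64])
print(subpixel_up(x).shape)  # torch.Size([1, 64, 64, 64])
```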
