SinGAN: Learning a Generative Model from a Single Natural Image (ICCV 2019)

2. Method

The goal is to learn an unconditional generative model that captures the internal statistics of a single training image $x$.

Unlike texture generation, the paper targets general natural images.

2.1. Multi-scale architecture

The model is a pyramid of generators $\left \{ G_0,\cdots,G_N \right \}$, one for each level of the image pyramid $\left \{ x_0,\cdots,x_N \right \}$ built from the input image $x$. Here $x_n$ is $x$ downsampled by a factor of $r^n$, and $r\gt1$ is a hyperparameter. Each $G_n$ is paired with a discriminator $D_n$.
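The pyramid sizes can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pyramid_sizes`, the rounding rule, and the example values (a 256x256 image, $r=2$, $N=3$) are assumptions chosen for readability.

```python
def pyramid_sizes(height, width, r, num_scales):
    """Return [(h_0, w_0), ..., (h_N, w_N)]; level n is the input shrunk by r**n."""
    if r <= 1:
        raise ValueError("r must be greater than 1")
    sizes = []
    for n in range(num_scales + 1):
        factor = r ** n
        # Rounding and the 1-pixel floor are assumptions of this sketch.
        h_n = max(1, round(height / factor))
        w_n = max(1, round(width / factor))
        sizes.append((h_n, w_n))
    return sizes


# Index 0 is the finest scale (the original size), index N the coarsest.
sizes = pyramid_sizes(256, 256, r=2, num_scales=3)
print(sizes)
```

Each entry of `sizes` is the resolution at which the corresponding pair $G_n$, $D_n$ operates.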

The process starts at the coarsest scale $x_N$, where $G_N$ maps white Gaussian noise $z_N$ to an image $\tilde{x}_N$:
$\tilde{x}_N=G_N(z_N) \qquad(1)$
$\tilde{x}_N$ captures the general layout of the image and the global structure of its objects. The subsequent generators $G_n$ ($n\lt N$) progressively add finer details.

As shown in Figure 5, $G_n$ takes two inputs: white Gaussian noise $z_n$, and the upsampled version of the image generated at the previous (coarser) scale, $\left ( \tilde{x}_{n+1} \right )\uparrow^r$:
$\tilde{x}_n=G_n\left ( z_n, \left ( \tilde{x}_{n+1} \right )\uparrow^r \right ), \quad n\lt N \qquad(2)$
More specifically, $G_n$ performs the following residual operation:
$\tilde{x}_n=\left ( \tilde{x}_{n+1} \right )\uparrow^r+\psi_n\left ( z_n+\left ( \tilde{x}_{n+1} \right )\uparrow^r \right ) \qquad(3)$
where $\psi_n$ is a ConvNet with 5 blocks, each of the form Conv(3x3)-BatchNorm-LeakyReLU.
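The coarse-to-fine sampling of Eqs. (1)-(3) can be sketched on toy 1-D signals. This is only an illustration of the residual structure, under several assumptions: `psis[n]` is a hypothetical callable standing in for the ConvNet $\psi_n$, upsampling is nearest-neighbour, the noise scale `noise_std` is arbitrary, and the coarsest scale is treated as Eq. (3) with an all-zero previous image.

```python
import random


def upsample(signal, new_len):
    """Nearest-neighbour resize of a 1-D signal to new_len samples."""
    old_len = len(signal)
    return [signal[i * old_len // new_len] for i in range(new_len)]


def generate(psis, lengths, noise_std=1.0, rng=random):
    """Sample from the pyramid, coarsest scale first.

    psis[n]    : callable standing in for psi_n (hypothetical placeholder)
    lengths[n] : number of samples at scale n (index 0 is the finest scale)
    """
    N = len(psis) - 1
    # Eq. (1): the coarsest generator sees noise only.
    z = [rng.gauss(0.0, noise_std) for _ in range(lengths[N])]
    x = psis[N](z)
    for n in range(N - 1, -1, -1):
        up = upsample(x, lengths[n])  # upsampled output of scale n+1
        z = [rng.gauss(0.0, noise_std) for _ in range(lengths[n])]
        residual = psis[n]([zi + ui for zi, ui in zip(z, up)])  # psi_n(z_n + up)
        x = [ui + ri for ui, ri in zip(up, residual)]  # Eq. (3)
    return x


# With zero residuals at the finer scales, the output is just the
# upsampled coarsest-scale signal.
zero = lambda v: [0.0] * len(v)
ones = lambda v: [1.0] * len(v)
print(generate([zero, zero, ones], lengths=[8, 4, 2]))
```

Because each finer scale only adds a residual to the upsampled input, the layout fixed at scale $N$ is carried through and the finer generators only refine it.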

2.2. Training

Training proceeds sequentially from the coarsest scale to the finest. Once a GAN has been trained, it is kept fixed.

For the $n$-th GAN, the loss consists of an adversarial term and a reconstruction term:
$\underset{G_n}{\min}\ \underset{D_n}{\max}\ \mathcal{L}_{adv}(G_n,D_n)+\alpha\mathcal{L}_{rec}(G_n) \qquad(4)$

Adversarial loss
The WGAN-GP loss is used.
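For reference, the standard WGAN-GP critic objective is given below in generic notation. The symbols $P_r$ (real distribution), $P_g$ (generated distribution), $\hat{x}$ and $\lambda$ come from the WGAN-GP formulation and are not defined in these notes. The critic $D$ minimizes
$\mathbb{E}_{\tilde{x}\sim P_g}\left [ D(\tilde{x}) \right ]-\mathbb{E}_{x\sim P_r}\left [ D(x) \right ]+\lambda\,\mathbb{E}_{\hat{x}}\left [ \left ( \left \| \nabla_{\hat{x}}D(\hat{x}) \right \|_2-1 \right )^2 \right ]$
where $\hat{x}$ is sampled uniformly along straight lines between real and generated samples. Under the $\max_{D_n}$ convention of Eq. (4), $\mathcal{L}_{adv}$ would correspond to the negative of this expression.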

Reconstruction loss
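There must exist a specific set of noise maps that reproduces the original image $x$.
To this end, a set $\left \{ z_N^{rec},z_{N-1}^{rec},\cdots,z_0^{rec} \right \}=\left \{ z^*,0,\cdots,0 \right \}$ is chosen in advance, where $z^*$ is a fixed noise map (drawn once and kept fixed during training). Feeding these maps through the generators yields $\left \{ \tilde{x}_N^{rec},\tilde{x}_{N-1}^{rec},\cdots,\tilde{x}_0^{rec} \right \}$.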

Then, for $n\lt N$:
$\mathcal{L}_{rec}=\left \| G_n\left ( 0,\left ( \tilde{x}_{n+1}^{rec} \right )\uparrow^r \right ) -x_n\right \|^2 \qquad(5)$
For $n=N$, $\mathcal{L}_{rec}=\left \| G_N(z^*)-x_N \right \|^2$
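The two cases of the reconstruction loss can be sketched as one function. This is a toy illustration on flat lists of pixel values. The generator `G_n(z, prev)` is a hypothetical callable, and the names `reconstruction_loss`, `prev_rec_up` and `z_star` are assumptions of this sketch rather than identifiers from the paper's code.

```python
def squared_error(a, b):
    """Squared L2 distance between two equally sized flat signals."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def reconstruction_loss(G_n, x_n, prev_rec_up=None, z_star=None):
    """Eq. (5) when prev_rec_up is given (n < N), else the n = N case.

    G_n         : hypothetical callable G_n(z, prev) returning a flat list
    x_n         : the real image at scale n, as a flat list
    prev_rec_up : upsampled reconstruction from scale n+1 (n < N only)
    z_star      : the fixed noise map z* (n = N only)
    """
    if prev_rec_up is None:
        # Coarsest scale: reconstruct from the fixed noise map z*.
        x_rec = G_n(z_star, None)
    else:
        # Finer scales: zero noise, conditioned on the upsampled reconstruction.
        zeros = [0.0] * len(x_n)
        x_rec = G_n(zeros, prev_rec_up)
    return squared_error(x_rec, x_n)


# Toy generator: passes the previous image through, or the noise at scale N.
toy_G = lambda z, prev: prev if prev is not None else z
print(reconstruction_loss(toy_G, [1.0, 2.0], prev_rec_up=[1.0, 4.0]))
```

In Eq. (4) this term is weighted by $\alpha$ and added to the adversarial term for the same scale.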