Adversarially Regularized Autoencoders

Kim Y, Zhang K, Rush A M, et al. Adversarially regularized autoencoders[J]. arXiv preprint arXiv:1706.04223, 2017.
GitHub: https://github.com/jakezhaojb/ARAE
adversarially regularized autoencoder (ARAE)

Abstract

Deep latent variable models (i.e., models like VAEs and GANs that are seeded by a random variable) make it easy to generate continuous samples. Applying them to discrete structures such as text or discretized images is much more challenging. This paper proposes a flexible method for training deep latent variable models of discrete structures.

Background and Notation

Discrete Autoencoder

Encode the discrete sequence, then decode it; a softmax over the vocabulary produces the discrete output.

$L_{rec}(\phi,\psi) = -\log p_{\psi}(x \mid enc_{\phi}(x))$

$\hat{x} = \arg\max_{x}\ p_{\psi}(x \mid enc_{\phi}(x))$

The encoder and decoder are problem-specific; RNNs are a common choice for both.
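The two quantities above can be made concrete with a toy numpy sketch (my own illustration, not the paper's code): token-level cross-entropy for $L_{rec}$ and greedy argmax decoding for $\hat{x}$, given decoder logits over a small vocabulary.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reconstruction_loss(logits, target_ids):
    """L_rec = -log p(x | enc(x)): mean token-level cross-entropy."""
    probs = softmax(logits)  # (seq_len, vocab)
    return -np.mean(np.log(probs[np.arange(len(target_ids)), target_ids]))

def greedy_decode(logits):
    """x_hat = argmax over the vocabulary, token by token."""
    return np.argmax(logits, axis=-1)

# Toy decoder output: 3 time steps over a vocabulary of 4 tokens.
logits = np.array([[5.0, 0.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0, 0.0],
                   [0.0, 0.0, 5.0, 0.0]])
target = np.array([0, 1, 2])
print(greedy_decode(logits).tolist())  # [0, 1, 2]
print(reconstruction_loss(logits, target) < 0.1)  # True: near-perfect reconstruction
```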

Generative Adversarial Networks

WGAN:

$\min_{\theta}\max_{w\in \mathcal{W}} E_{z\sim P_r}[f_w(z)] - E_{\tilde{z}\sim P_z}[f_w(\tilde{z})]$

Weight clipping: $w \in [-\epsilon, \epsilon]$
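As a quick illustration (my own sketch, with a hypothetical linear critic standing in for a small network): the critic maximizes the mean score gap between real and generated samples, and weight clipping after each update enforces the Lipschitz constraint.

```python
import numpy as np

def critic(w, x):
    """A linear critic f_w(x) = x . w (a stand-in for a small MLP)."""
    return x @ w

def wgan_critic_loss(w, real, fake):
    """max_w E[f_w(real)] - E[f_w(fake)]  ->  minimize the negation."""
    return -(critic(w, real).mean() - critic(w, fake).mean())

def clip_weights(w, eps=0.01):
    """Weight clipping keeps every weight in [-eps, eps]."""
    return np.clip(w, -eps, eps)

rng = np.random.default_rng(0)
w = clip_weights(rng.normal(size=3), eps=0.01)
print(bool(np.all(np.abs(w) <= 0.01)))  # True: all weights clipped into range
```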

Adversarially Regularized Autoencoder

ARAE combines a discrete autoencoder with a GAN-regularized latent representation. The model, shown in the figure below, learns the discrete-space distribution $P_{\psi}$. Intuitively, this approach uses a more flexible prior distribution to provide a smoother discrete code space.
(Figure: ARAE model overview)

The model is a discrete autoencoder regularized with a prior distribution:

$\min_{\phi,\psi} L_{rec}(\phi,\psi) + \lambda^{(1)} W(P_Q, P_z)$

where $W$ is the Wasserstein distance between $P_Q$, the distribution over the discrete code space (i.e., the distribution of $enc_{\phi}(x)$ for $x \sim P_r$), and $P_z$. Training the model amounts to solving the following objectives:

  • (1) $\min_{\phi,\psi} L_{rec}(\phi,\psi) = E_{x\sim P_r}[-\log p_{\psi}(x \mid enc_{\phi}(x))]$
  • (2) $\max_{w\in \mathcal{W}} L_{cri} = E_{x\sim P_r}[f_w(enc_{\phi}(x))] - E_{\hat{z}\sim P_z}[f_w(\hat{z})]$
  • (3) $\min_{\phi} L_{enc}(\phi) = E_{x\sim P_r}[f_w(enc_{\phi}(x))] - E_{\hat{z}\sim P_z}[f_w(\hat{z})]$

(1) minimizes the autoencoder's reconstruction error, (2) optimizes the critic, and (3) optimizes the encoder/generator adversarially.

Empirically, the prior $P_z$ has a strong influence on the results. The simplest choice is a fixed Gaussian $N(0, I)$, but such a strong constraint easily causes the model to collapse. Instead of fixing $P_z$, the authors learn it through a generator that maps samples from a Gaussian $N(0, I)$ to $P_z$.

Algorithm 1 ARAE Training
for each training iteration do

  • (1) Train the encoder/decoder for reconstruction $(\phi, \psi)$
    • Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$ and compute $z^{(i)} = enc_{\phi}(x^{(i)})$
    • Backprop loss $L_{rec} = -\frac{1}{m}\sum_{i=1}^m \log p_{\psi}(x^{(i)} \mid z^{(i)})$
  • (2) Train the critic $(w)$
    • Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$ and $\{s^{(i)}\}_{i=1}^m \sim N(0, I)$
    • Compute $z^{(i)} = enc_{\phi}(x^{(i)})$ and $\hat{z}^{(i)} = g_{\theta}(s^{(i)})$
    • Backprop loss $-\frac{1}{m}\sum_{i=1}^m f_w(z^{(i)}) + \frac{1}{m}\sum_{i=1}^m f_w(\hat{z}^{(i)})$
    • Clip critic $w$ to $[-\epsilon, \epsilon]$
  • (3) Train the encoder/generator adversarially $(\phi, \theta)$
    • Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$ and $\{s^{(i)}\}_{i=1}^m \sim N(0, I)$
    • Compute $z^{(i)} = enc_{\phi}(x^{(i)})$ and $\hat{z}^{(i)} = g_{\theta}(s^{(i)})$
    • Backprop loss $\frac{1}{m}\sum_{i=1}^m f_w(z^{(i)}) - \frac{1}{m}\sum_{i=1}^m f_w(\hat{z}^{(i)})$
end for
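The adversarial part of the loop (steps 2 and 3) can be sketched in numpy with hypothetical linear stand-ins for the encoder, generator, and critic; the reconstruction step (1) is omitted since these toy maps have no decoder. The gradient steps are exact for the linear case.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy stand-ins: linear encoder, generator, and critic,
# just to make the alternating update schedule concrete.
enc = rng.normal(size=(8, 4))   # phi: data (dim 8) -> code z (dim 4)
gen = rng.normal(size=(2, 4))   # theta: noise s (dim 2) -> code z_hat
w   = rng.normal(size=4)        # critic weights, f_w(z) = z . w
eps, lr, m = 0.05, 0.1, 32

for _ in range(100):
    # (2) Train the critic: ascend E[f_w(z)] - E[f_w(z_hat)], then clip.
    x, s = rng.normal(size=(m, 8)), rng.normal(size=(m, 2))
    z, z_hat = x @ enc, s @ gen
    w = w + lr * (z.mean(axis=0) - z_hat.mean(axis=0))
    w = np.clip(w, -eps, eps)                      # weight clipping
    # (3) Train the generator adversarially: ascend E[f_w(z_hat)].
    gen = gen + lr * np.outer(s.mean(axis=0), w)

print(bool(np.abs(w).max() <= eps))  # True: clipping kept w in [-eps, eps]
```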

Extension: Unaligned Transfer

To handle transfer, the decoder is conditioned on an attribute, becoming $p_{\psi}(x \mid z, y)$ (I did not fully follow this part; I'll check the code later to see if it becomes clearer), and a classification loss is added to the objective:

$\min_{\phi,\psi} L_{rec}(\phi,\psi) + \lambda^{(1)} W(P_Q, P_z) - \lambda^{(2)} L_{class}(\phi, u)$

In this paper $\lambda^{(2)} = 1$, and training gains two extra steps: (2b) train the classifier, and (3b) train the encoder adversarially against the classifier.

Algorithm 2 ARAE Transfer Extension
Each loop additionally:

  • (2b) Train attribute classifier $(u)$
    • Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$, look up $y^{(i)}$, and compute $z^{(i)} = enc_{\phi}(x^{(i)})$
    • Backprop loss $-\frac{1}{m}\sum_{i=1}^m \log p_u(y^{(i)} \mid z^{(i)})$
  • (3b) Train the encoder adversarially $(\phi)$
    • Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$, look up $y^{(i)}$, and compute $z^{(i)} = enc_{\phi}(x^{(i)})$
    • Backprop loss $-\frac{1}{m}\sum_{i=1}^m \log p_u(1 - y^{(i)} \mid z^{(i)})$
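The two extra losses differ only in the label fed to the classifier, which a small numpy sketch makes explicit (my own illustration, assuming a binary attribute and a hypothetical logistic classifier over codes):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def classifier_loss(u, z, y):
    """(2b): -mean log p_u(y | z), with logistic classifier p_u(1|z) = sigmoid(z . u)."""
    p = sigmoid(z @ u)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def adversarial_encoder_loss(u, z, y):
    """(3b): -mean log p_u(1 - y | z) -- the encoder is pushed toward the flipped label."""
    return classifier_loss(u, z, 1 - y)

rng = np.random.default_rng(2)
u = rng.normal(size=4)           # classifier weights
z = rng.normal(size=(8, 4))      # a batch of codes enc_phi(x)
y = rng.integers(0, 2, size=8)   # binary attribute labels
# Both are cross-entropies, hence nonnegative; they use opposite labels.
print(bool(classifier_loss(u, z, y) >= 0 and adversarial_encoder_loss(u, z, y) >= 0))  # True
```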

Theoretical Properties

In a standard GAN we implicitly minimize the divergence between the real distribution and the model distribution. In this setting, as I understand it, we implicitly minimize that divergence in the embedding space, and also minimize the divergence between the data distribution $P_r$ and the latent variable model $p_{\psi}(x) = \int_z p_{\psi}(x \mid z)\, p(z)\, dz$.
I skip the more mathematical proofs.

Experiments

(Figures: experimental results)

Other notes

On GitHub, the authors have since updated the WGAN training to WGAN-GP (gradient penalty).
