Glow: Generative Flow with Invertible 1×1 Convolutions

Diederik P. Kingma, Prafulla Dhariwal

Abstract

flow-based generative models: tractability of the exact log-likelihood, tractability of exact latent-variable inference, and parallelizability of both training and synthesis
Glow: a simple type of generative flow using an invertible 1 × 1 convolution
capable of efficient realistic-looking synthesis and manipulation of large images
code: https://github.com/openai/glow

Introduction

two major unsolved problems of machine learning:
(1) data-efficiency: the ability to learn from few datapoints, like humans;
(2) generalization: robustness to changes of the task or its context

generative models:
(1) learning realistic world models
(2) learning meaningful features of the input while requiring little or no human supervision or labeling

generative models:
- likelihood-based methods: autoregressive models, VAEs, flow-based generative models
- GANs

merits of flow-based generative models:
1. Exact latent-variable inference and log-likelihood evaluation
2. Efficient inference and efficient synthesis
3. Useful latent space for downstream tasks
4. Significant potential for memory savings

Background: Flow-based Generative Models

x: high dimensional random vector with unknown true distribution p(x)
D: i.i.d. dataset drawn from p(x)
pθ(x): model
log-likelihood objective(the expected compression cost):

$$\min_\theta \mathcal{L}(\mathcal{D}) = \mathbb{E}_{x \sim \mathcal{D}}\left[-\log p_\theta(x)\right] = \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} -\log p_\theta(x)$$
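As a toy illustration of this objective, the sketch below computes the average negative log-likelihood of an i.i.d. dataset under a fixed standard-normal model; all names here are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch: the objective is the average negative log-likelihood
# over the dataset, here with the fixed model p_theta(x) = N(0, I).
def avg_nll(data):
    # -log N(x; 0, I) = 0.5 * (x^2 + log(2*pi)), summed over dimensions
    nll_per_point = 0.5 * np.sum(data**2 + np.log(2 * np.pi), axis=1)
    return nll_per_point.mean()

rng = np.random.default_rng(0)
D = rng.standard_normal((1000, 4))   # toy 4-dimensional dataset
print(avg_nll(D))  # roughly 0.5 * 4 * (1 + log 2*pi) ≈ 5.68 nats
```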

z: latent variable with a tractable/simple pdf $p_\theta(z)$, e.g. $\mathcal{N}(\mu, \Sigma)$
$x = g_\theta(z)$, where $g_\theta$ is bijective and $g_\theta^{-1} = f_\theta$
$f = f_1 \circ f_2 \circ \dots \circ f_K$, with $h_1 = f_1(x),\; h_2 = f_2(h_1),\; \dots,\; z = f_K(h_{K-1})$
flow: $x \xrightarrow{f_1} h_1 \xrightarrow{f_2} h_2 \cdots \xrightarrow{f_K} z$
By the change-of-variables formula for probability densities, we have

$$\log p_\theta(x) = \log p_\theta(z) + \sum_{i=1}^{K} \log \left| \det \left( \frac{dh_i}{dh_{i-1}} \right) \right|$$

where $h_0 \triangleq x$ and $h_K \triangleq z$.

If $\frac{dh_i}{dh_{i-1}}$ is a triangular matrix, its log-determinant is cheap to compute: $\log \left| \det \left( \frac{dh_i}{dh_{i-1}} \right) \right| = \mathrm{sum}\left( \log \left| \mathrm{diag}\left( \frac{dh_i}{dh_{i-1}} \right) \right| \right)$
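The triangular-Jacobian shortcut can be checked numerically. The sketch below (illustrative, not the paper's code) compares the full determinant against the sum of log-diagonal terms for a lower-triangular linear map, then applies the change-of-variables formula for that single flow step.

```python
import numpy as np

# For f(x) = L @ x with L lower triangular, the Jacobian is L itself,
# so log|det| reduces to a sum of log|diagonal| terms.
rng = np.random.default_rng(0)
d = 4
L = np.tril(rng.standard_normal((d, d)))
np.fill_diagonal(L, np.abs(np.diag(L)) + 0.5)    # keep it invertible

full = np.log(np.abs(np.linalg.det(L)))          # O(d^3) in general
diag = np.sum(np.log(np.abs(np.diag(L))))        # O(d) for triangular f
assert np.allclose(full, diag)

# log p(x) = log p(z) + log|det dz/dx|, with z = f(x) and p(z) = N(0, I)
x = rng.standard_normal(d)
z = L @ x
log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi))
log_px = log_pz + diag
print(log_px)
```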

Proposed Generative Flow

The proposed flow architecture is shown below. Each step of flow consists of an actnorm, an invertible 1×1 convolution, and an affine coupling layer; stacking these steps yields the multi-scale architecture in (b), where K is the flow depth and L is the number of levels.
(figure: (a) one step of flow; (b) the multi-scale architecture)

Actnorm: scale and bias layer with data-dependent initialization

activation normalization: performs an affine transformation of the activations using a scale and bias parameter per channel
initialization: the post-actnorm activations have zero mean and unit variance per channel given an initial minibatch of data (data-dependent initialization)
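A minimal NumPy sketch of this data-dependent initialization; function names and the 1e-6 stabilizer are illustrative choices, not from the paper.

```python
import numpy as np

# Choose per-channel scale and bias so the post-actnorm activations of
# an initial minibatch have zero mean and unit variance.
def actnorm_init(batch):                  # batch: (N, H, W, C)
    mean = batch.mean(axis=(0, 1, 2))     # per-channel statistics
    std = batch.std(axis=(0, 1, 2))
    scale = 1.0 / (std + 1e-6)            # small constant for stability
    bias = -mean * scale
    return scale, bias

def actnorm(x, scale, bias):
    return x * scale + bias               # per-channel affine transform

rng = np.random.default_rng(0)
mb = rng.normal(3.0, 2.0, size=(8, 4, 4, 2))    # toy minibatch
s, b = actnorm_init(mb)
out = actnorm(mb, s, b)
print(out.mean(axis=(0, 1, 2)), out.std(axis=(0, 1, 2)))  # ≈ 0 and ≈ 1
```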

Invertible 1 × 1 convolution

$h \in \mathbb{R}^{h \times w \times c}$; $W \in \mathbb{R}^{c \times c}$, the kernel of a 1×1 convolution with equal numbers of input and output channels

$$\log \left| \det \left( \frac{d\,\mathrm{conv2d}(h; W)}{dh} \right) \right| = h \cdot w \cdot \log |\det(W)|$$

computing $\det(W)$ directly costs $O(c^3)$
initialize the weights W as a random rotation matrix, having a log-determinant of 0

$$W = P L (U + \mathrm{diag}(s))$$

P: permutation matrix
L: lower triangular matrix with ones on the diagonal
U: upper triangular matrix with zeros on the diagonal
s: vector
$\log |\det(W)| = \mathrm{sum}(\log |s|)$
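A sketch of the LU parameterization and its cheap log-determinant, built here from random triangular factors rather than by decomposing a trained weight matrix.

```python
import numpy as np

# W = P L (U + diag(s)): P a fixed permutation, L unit lower triangular,
# U strictly upper triangular, s a free vector with nonzero entries.
# det(P) = ±1 and det(L) = 1, so |det W| = prod|s| and the O(c^3)
# determinant collapses to an O(c) sum of logs.
rng = np.random.default_rng(0)
c = 8
P = np.eye(c)[rng.permutation(c)]                         # permutation matrix
L = np.tril(rng.standard_normal((c, c)), k=-1) + np.eye(c)
U = np.triu(rng.standard_normal((c, c)), k=1)
s = rng.uniform(0.5, 2.0, size=c) * rng.choice([-1, 1], size=c)

W = P @ L @ (U + np.diag(s))
logdet_fast = np.sum(np.log(np.abs(s)))                   # O(c)
logdet_full = np.log(np.abs(np.linalg.det(W)))            # O(c^3)
print(logdet_fast, logdet_full)
assert np.allclose(logdet_fast, logdet_full)
```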

Affine Coupling Layers

A powerful reversible transformation where the forward function, the reverse function, and the log-determinant are all computationally efficient.
Zero initialization: initialize the last convolution of each NN() with zeros, such that each affine coupling layer initially performs an identity function
Split and concatenation: split() splits the input tensor h into two halves along the channel dimension; concat() concatenates them back into a single tensor
Permutation: ensures that after sufficient steps of flow, each dimension can affect every other dimension
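A toy affine coupling layer, with a small stand-in conditioner in place of the paper's CNN, showing exact invertibility; zero-initializing the conditioner's output would make the layer start as the identity.

```python
import numpy as np

# Affine coupling: split the channels, pass one half through unchanged,
# and scale/shift the other half conditioned on it. nn() is a toy
# stand-in for the paper's NN(); any function works since invertibility
# never requires inverting nn() itself.
def nn(x, W, b):
    return np.tanh(x @ W) + b            # illustrative conditioner

def coupling_forward(x, W, b):
    xa, xb = np.split(x, 2, axis=-1)
    log_s, t = np.split(nn(xa, W, b), 2, axis=-1)
    yb = xb * np.exp(log_s) + t
    logdet = np.sum(log_s, axis=-1)      # Jacobian is triangular
    return np.concatenate([xa, yb], axis=-1), logdet

def coupling_inverse(y, W, b):
    ya, yb = np.split(y, 2, axis=-1)
    log_s, t = np.split(nn(ya, W, b), 2, axis=-1)
    xb = (yb - t) * np.exp(-log_s)
    return np.concatenate([ya, xb], axis=-1)

rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d // 2, d))     # maps d/2 channels -> (log_s, t)
b = rng.standard_normal(d)
x = rng.standard_normal((3, d))
y, logdet = coupling_forward(x, W, b)
assert np.allclose(coupling_inverse(y, W, b), x)   # exact inversion
```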

Related Work

GAN

Quantitative Experiments

NN(): conv 3×3 (512 channels) + ReLU + conv 1×1 (512 channels) + ReLU + conv 3×3 (zero-initialized)
K=32,L=3
three permutation variants are compared: a reversing operation as described in RealNVP, a fixed random permutation, and the invertible 1×1 convolution

Qualitative Experiments

K=32,L=6
sampling from a reduced-temperature model often results in higher-quality samples: $p_{\theta,T}(x) \propto p_\theta(x)^{T^2}$
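For a Gaussian latent, a common way to realize this annealing in practice is to sample z with its standard deviation scaled by T before decoding through the inverse flow; the sketch below shows only the latent-sampling step (the decoder is omitted, and names are illustrative).

```python
import numpy as np

# Reduced-temperature sampling sketch: draw the Gaussian latent with its
# standard deviation scaled by T. With T < 1 the samples stay closer to
# the mode, which tends to improve perceived sample quality.
def sample_latent(n, dim, T, rng):
    return T * rng.standard_normal((n, dim))

rng = np.random.default_rng(0)
z_full = sample_latent(10000, 2, T=1.0, rng=rng)
z_cool = sample_latent(10000, 2, T=0.7, rng=rng)
print(z_full.std(), z_cool.std())   # ≈ 1.0 and ≈ 0.7
```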
Synthesis and interpolation: take a pair of real images, encode them with the encoder, and linearly interpolate between the latents to obtain samples; the resulting images are of extremely high quality
Semantic manipulation: calculate the average latent vector $z_{pos}$ over images with the attribute and $z_{neg}$ over images without it, then use the difference $(z_{pos} - z_{neg})$ as a direction for manipulation. This requires labeled attribute data.
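A sketch of computing such a manipulation direction; the encode/decode steps through $f_\theta$/$g_\theta$ are omitted and the latents here are synthetic stand-ins.

```python
import numpy as np

# Attribute direction in latent space: average the latents of labeled
# positive and negative examples and take the difference z_pos - z_neg.
rng = np.random.default_rng(0)
z_with = rng.normal(0.5, 1.0, size=(100, 8))      # latents with attribute
z_without = rng.normal(-0.5, 1.0, size=(100, 8))  # latents without it

direction = z_with.mean(axis=0) - z_without.mean(axis=0)

# Manipulate one latent by stepping along the direction with strength a;
# each edited latent would then be decoded back to an image.
z = rng.standard_normal(8)
edited = [z + a * direction for a in np.linspace(0.0, 1.0, 5)]
print(np.round(direction, 2))
```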

Conclusion

first likelihood-based model in the literature that can efficiently synthesize high-resolution natural images
far superior to autoencoders
