Glow: Generative Flow with Invertible 1×1 Convolutions
Diederik P. Kingma, Prafulla Dhariwal
Abstract
flow-based generative models: tractability of the exact log-likelihood, tractability of exact latent-variable inference, and parallelizability of both training and synthesis
Glow: a simple type of generative flow using an invertible 1×1 convolution
capable of efficient realistic-looking synthesis and manipulation of large images
code: https://github.com/openai/glow
Introduction
two major unsolved problems of machine learning:
(1) data-efficiency: the ability to learn from few datapoints, like humans;
(2) generalization: robustness to changes of the task or its context
generative models:
(1) learning realistic world models
(2) learning meaningful features of the input while requiring little or no human supervision or labeling
merits of flow-based generative models:
1. Exact latent-variable inference and log-likelihood evaluation
2. Efficient inference and efficient synthesis
3. Useful latent space for downstream tasks
4. Significant potential for memory savings
Background: Flow-based Generative Models
$\mathbf{x}$: high-dimensional random vector with unknown true distribution $p^*(\mathbf{x})$
$\mathcal{D}$: i.i.d. dataset drawn from $p^*(\mathbf{x})$
$p_\theta(\mathbf{x})$: model
log-likelihood objective (the expected compression cost): $\mathcal{L}(\mathcal{D}) = \frac{1}{N}\sum_{i=1}^{N} -\log p_\theta(\mathbf{x}^{(i)})$
$\mathbf{z}$: latent variable with tractable/simple pdf $p_\theta(\mathbf{z})$, e.g. $\mathcal{N}(\mathbf{z}; 0, \mathbf{I})$
$\mathbf{z} = f_\theta(\mathbf{x})$, $f_\theta$ is bijective, $\mathbf{x} = g_\theta(\mathbf{z}) = f_\theta^{-1}(\mathbf{z})$
flow: $f = f_1 \circ f_2 \circ \cdots \circ f_K$
By the change-of-variables formula for the density of a transformed random variable, with $\mathbf{h}_0 \triangleq \mathbf{x}$ and $\mathbf{h}_K \triangleq \mathbf{z}$:
$\log p_\theta(\mathbf{x}) = \log p_\theta(\mathbf{z}) + \sum_{i=1}^{K} \log \left| \det\left( d\mathbf{h}_i / d\mathbf{h}_{i-1} \right) \right|$
If each Jacobian $d\mathbf{h}_i / d\mathbf{h}_{i-1}$ is a triangular matrix, then $\log \left| \det\left( d\mathbf{h}_i / d\mathbf{h}_{i-1} \right) \right| = \mathrm{sum}\left( \log \left| \mathrm{diag}\left( d\mathbf{h}_i / d\mathbf{h}_{i-1} \right) \right| \right)$
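The change-of-variables objective can be sketched on a toy flow. Below, a single elementwise affine map stands in for the flow (my own minimal example, not a layer from the paper); its Jacobian is diagonal, hence triangular, so the log-determinant reduces to a sum of logs as noted above:

```python
import numpy as np

# Minimal one-step flow: z = s * x + b elementwise.
# The Jacobian dz/dx is diag(s), so log|det| = sum(log|s|).

def log_normal(v):
    """Log-density of a standard normal, summed over dimensions."""
    return -0.5 * np.sum(v**2 + np.log(2 * np.pi))

def flow_log_likelihood(x, s, b):
    """log p(x) = log p(z) + log|det dz/dx| for z = s*x + b."""
    z = s * x + b
    return log_normal(z) + np.sum(np.log(np.abs(s)))

# Sanity check: if z ~ N(0, I) then x = (z - b)/s ~ N(-b/s, 1/s^2),
# so the change-of-variables result must match the analytic density.
rng = np.random.default_rng(0)
s = np.array([2.0, 0.5, 3.0])
b = np.array([1.0, -1.0, 0.0])
x = rng.normal(size=3)

analytic = np.sum(-0.5 * ((x + b / s) * s) ** 2
                  - 0.5 * np.log(2 * np.pi) + np.log(np.abs(s)))
assert np.isclose(flow_log_likelihood(x, s, b), analytic)
```

Stacking $K$ such maps just adds the $K$ per-layer log-determinants, which is the sum in the formula above.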
Proposed Generative Flow
The proposed flow is shown in the figure: each step of flow consists of an actnorm, an invertible 1×1 convolution, and an affine coupling layer; stacking these steps yields the multi-scale architecture in (b), where $K$ is the flow depth and $L$ is the number of levels.
Actnorm: scale and bias layer with data dependent initialization
activation normalization: performs an affine transformation of the activations using a scale and bias parameter per channel
initialization: scale and bias are set so that the post-actnorm activations have zero mean and unit variance per channel, given an initial minibatch of data (data-dependent initialization)
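A sketch of the data-dependent initialization, assuming NHWC activations (function names are mine, not from the paper's code):

```python
import numpy as np

# Actnorm: per-channel affine transform y = x * scale + bias.
# Activations have shape (batch, H, W, channels).

def actnorm_init(batch, eps=1e-6):
    """Choose scale/bias so post-actnorm activations have zero mean
    and unit variance per channel on this initial minibatch."""
    mean = batch.mean(axis=(0, 1, 2))
    std = batch.std(axis=(0, 1, 2))
    scale = 1.0 / (std + eps)
    bias = -mean * scale
    return scale, bias

def actnorm_forward(x, scale, bias):
    # Per-image log-det of this layer is h * w * sum(log|scale|).
    return x * scale + bias

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=5.0, size=(8, 4, 4, 2))
scale, bias = actnorm_init(x)
y = actnorm_forward(x, scale, bias)
```

After initialization, scale and bias are treated as ordinary trainable parameters, independent of the data.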
Invertible 1×1 convolution
initialize the weights as a random rotation matrix, having a log-determinant of 0
LU parameterization: $\mathbf{W} = \mathbf{P}\mathbf{L}(\mathbf{U} + \mathrm{diag}(\mathbf{s}))$
$\mathbf{P}$: permutation matrix (kept fixed)
$\mathbf{L}$: lower triangular matrix with ones on the diagonal
$\mathbf{U}$: upper triangular matrix with zeros on the diagonal
$\mathbf{s}$: vector, so that $\log|\det \mathbf{W}| = \mathrm{sum}(\log|\mathbf{s}|)$
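The point of the LU parameterization is that the log-determinant becomes a cheap sum rather than an $O(c^3)$ determinant. A numpy sketch (random factors, my own construction rather than a decomposition of a trained kernel):

```python
import numpy as np

# W = P @ L @ (U + diag(s)): P a fixed permutation, L unit lower-
# triangular, U strictly upper-triangular, s a vector. Then
# |det W| = prod|s|, and the per-image log-det of the 1x1 conv
# over an h x w feature map is h * w * sum(log|s|).

rng = np.random.default_rng(2)
c = 4                                   # number of channels

P = np.eye(c)[rng.permutation(c)]       # permutation matrix (kept fixed)
L = np.tril(rng.normal(size=(c, c)), k=-1) + np.eye(c)
U = np.triu(rng.normal(size=(c, c)), k=1)
s = rng.normal(size=c)

W = P @ L @ (U + np.diag(s))

h, w = 8, 8                             # spatial size of the feature map
logdet_cheap = h * w * np.sum(np.log(np.abs(s)))
logdet_direct = h * w * np.log(np.abs(np.linalg.det(W)))
assert np.isclose(logdet_cheap, logdet_direct)
```

This works because $\det\mathbf{P} = \pm 1$ and both triangular factors contribute only their diagonals to the determinant.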
Affine Coupling Layers
A powerful reversible transformation whose forward function, reverse function, and log-determinant are all computationally efficient
Zero initialization: initialize the last convolution of each NN() with zeros, such that each affine coupling layer initially performs an identity function
Split and concatenation: split() splits the input tensor into two halves along the channel dimension; concat() concatenates them back into a single tensor
Permutation: ensures that, after sufficient steps of flow, each dimension can affect every other dimension
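The coupling mechanics above can be sketched in a few lines. Here a fixed random linear map stands in for NN() (the real model uses the conv stacks described later), and a flat vector stands in for the channel-split tensor:

```python
import numpy as np

# Affine coupling: x_a passes through unchanged; x_b is scaled and
# shifted by quantities computed from x_a, so inversion never needs
# to invert NN() itself.

rng = np.random.default_rng(3)
d = 8
Wnn = rng.normal(size=(d // 2, d))      # stand-in for NN(): x_a -> (log_s, t)

def nn(x_a):
    out = x_a @ Wnn
    log_s, t = out[: d // 2], out[d // 2 :]
    return np.tanh(log_s), t            # tanh keeps scales well-behaved

def coupling_forward(x):
    x_a, x_b = x[: d // 2], x[d // 2 :]              # split()
    log_s, t = nn(x_a)
    y_b = x_b * np.exp(log_s) + t
    return np.concatenate([x_a, y_b]), np.sum(log_s)  # concat(), log-det

def coupling_inverse(y):
    y_a, y_b = y[: d // 2], y[d // 2 :]
    log_s, t = nn(y_a)                  # y_a == x_a, so NN output matches
    return np.concatenate([y_a, (y_b - t) * np.exp(-log_s)])

x = rng.normal(size=d)
y, logdet = coupling_forward(x)
assert np.allclose(coupling_inverse(y), x)           # exactly invertible
```

The Jacobian with respect to $x_b$ is diagonal with entries $\exp(\log s)$, so the log-determinant is just the sum of the predicted log-scales.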
Related Work
GAN
Quantitative Experiments
NN(): conv+relu+conv+relu+conv
Three permutation methods compared: a reversing operation as described in RealNVP, a fixed random permutation, and the invertible 1×1 convolution
Qualitative Experiments
sampling from a reduced-temperature model often results in higher-quality samples
Synthesis and Interpolation: take a pair of real images, encode them with the encoder, and linearly interpolate between the latents to obtain intermediate samples; the results are of extremely high quality
Semantic Manipulation: calculate the average latent vector for images with the attribute and for images without, and use their difference as a direction for manipulation (requires attribute labels)
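The manipulation recipe is simple enough to sketch directly; the latents below are synthetic placeholders for encoder outputs, and the function names are illustrative:

```python
import numpy as np

# Semantic manipulation in latent space: the attribute direction is
# the difference between per-group mean latents; a latent is moved
# along it before decoding.

def attribute_direction(z_pos, z_neg):
    """Mean latent of images with the attribute minus mean without."""
    return z_pos.mean(axis=0) - z_neg.mean(axis=0)

def manipulate(z, direction, alpha):
    """Shift a latent along the attribute direction by strength alpha."""
    return z + alpha * direction

rng = np.random.default_rng(4)
z_pos = rng.normal(loc=1.0, size=(100, 16))   # e.g. latents of "smiling"
z_neg = rng.normal(loc=0.0, size=(100, 16))   # latents without the attribute
d = attribute_direction(z_pos, z_neg)
z_new = manipulate(rng.normal(size=16), d, alpha=0.5)
```

Varying alpha interpolates the attribute's strength; the shifted latent is then decoded back to an image with the inverse flow.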
Conclusion
first likelihood-based model in the literature that can efficiently synthesize high-resolution natural images
(personal note: far superior to autoencoders)