Google Research

The paper title covers two points:   1. Synthetic Depth-of-Field

                                                  2. Single-Camera Mobile Phone

Key Words:

Shallow depth-of-field: When the depth of field is small, or shallow, only a narrow range of distances is in focus; the background and foreground are blurred.

dual-pixel:

Dual-pixel technology effectively divides every pixel into two separate photosites: each pixel consists of two photodiodes that sit side by side under a single microlens.

alpha matte:

A matte is a layer (or any of its channels) that defines the transparent areas of that layer or another layer.

Bayer plane:

One of the single-color subsampled images (R, Gr, Gb, B) that together make up a raw image from a sensor with a Bayer color-filter-array mosaic.

[paper] 00035 Synthetic Depth-of-Field with a Single-Camera Mobile Phone

Background:

A cell phone's camera produces all-in-focus images.

AIM:

We present a system to computationally synthesize shallow depth-of-field images with a single mobile camera and a single button press.

Result:

    Our system can process a 5.4 megapixel image in 4 seconds on a mobile phone, is fully automatic, and is robust enough to be used by non-experts.

1. Introduction

Some methods: a) two cameras

     b) time-of-flight or structured-light direct depth sensor

     c) Lens Blur

     Our method:

Our system combines two different technologies and is able to function with only one of them. The first is a neural network trained to segment out people and their accessories. Second, if available, we use a sensor with dual-pixel (DP) auto-focus hardware, which effectively gives us a 2-sample light field with a narrow ∼1 millimeter baseline.

The first:

The second:

--Samuel W. Hasinoff, Dillon Sharlet, Ryan Geiss, Andrew Adams, Jonathan T Barron, Florian Kainz, Jiawen Chen, and Marc Levoy. 2016. Burst photography for high dynamic range and low-light imaging on mobile cameras. SIGGRAPH Asia (2016).

--Robert Anderson, David Gallup, Jonathan T Barron, Janne Kontkanen, Noah Snavely, Carlos Hernández, Sameer Agarwal, and Steven M Seitz. 2016. Jump: Virtual Reality Video. SIGGRAPH Asia (2016).

--Jonathan T Barron, Andrew Adams, YiChang Shih, and Carlos Hernández. 2015. Fast bilateral-space stereo for synthetic defocus. CVPR (2015).

--Johannes Kopf, Michael F Cohen, Dani Lischinski, and Matt Uyttendaele. 2007. Joint bilateral upsampling. ACM TOG (2007).

We present a calibration procedure.

summary:

Our rendering technique divides the scene into several layers at different disparities, splats pixels to translucent disks according to disparity and then composites the different layers weighted by the actual disparity.
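The layered idea above can be sketched in a few lines of NumPy. This is a minimal grayscale sketch, not the paper's renderer: the function names are hypothetical, the translucent-disk splat is approximated by a box blur of premultiplied color and alpha, and the compositing order (larger disparity behind) is an assumption.

```python
import numpy as np

def box_blur(img, radius):
    """Crude box blur standing in for splatting to translucent disks."""
    if radius == 0:
        return img.astype(float).copy()
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def render_shallow_dof(image, disparity, n_layers=4, blur_scale=2.0):
    """Bucket pixels into disparity layers, blur each layer's premultiplied
    color and alpha by a radius proportional to |disparity|, then composite
    back to front with the 'over' operator."""
    lo, hi = disparity.min(), disparity.max()
    edges = np.linspace(lo, hi + 1e-6, n_layers + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    out = np.zeros(image.shape, dtype=float)
    out_a = np.zeros(image.shape, dtype=float)
    for i in np.argsort(centers):  # back-to-front order (assumption)
        alpha = ((disparity >= edges[i]) & (disparity < edges[i + 1])).astype(float)
        radius = int(round(blur_scale * abs(centers[i])))
        color_b = box_blur(image * alpha, radius)
        alpha_b = box_blur(alpha, radius)
        # 'over' composite of this layer on top of what is behind it
        out = color_b + out * (1 - alpha_b)
        out_a = alpha_b + out_a * (1 - alpha_b)
    return out / np.maximum(out_a, 1e-6)
```

For an all-zero disparity map every pixel lands in the in-focus layer with radius 0, so the output equals the input, which is a useful sanity check.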

Another question:

The wide field-of-view of a typical mobile camera is ill-suited for portraiture. It causes a photographer to stand near subjects leading to unflattering perspective distortion of their faces.

2. Related work

--Carlos Hernández. 2014. Lens Blur in the new Google Camera app. http://research.googleblog.com/2014/04/lens-blur-in-new-google-camera-app.html.

---etc.

3. Person segmentation

Our contributions include: (a) training and data collection methodologies to train a fast and accurate segmentation model capable of running on a mobile device, and (b) edge-aware filtering to upsample the mask predicted by the neural network.

3.1 Data Collection

choosing a wide enough variety of poses, discarding poor training images, cleaning up inaccurate polygon masks, etc.

With each improvement we made over a 9-month period in our training data, we observed the quality of our defocused portraits to improve commensurately.

3.2 Training

The network takes as input a 4 channel 256 × 256 image, where 3 of the channels correspond to the RGB image resized and padded to 256 × 256 resolution preserving the aspect ratio. The fourth channel encodes the location of the face as a posterior distribution of an isotropic Gaussian centered on the face detection box with a standard deviation of 21 pixels and scaled to be 1 at the mean location.
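The fourth input channel described above is easy to construct. A minimal sketch, assuming the face box is given as (x0, y0, x1, y1) pixel coordinates (the box format and function name are assumptions, not from the paper):

```python
import numpy as np

def face_prior_channel(face_box, size=256, sigma=21.0):
    """Fourth input channel: an isotropic Gaussian centered on the face
    detection box, with standard deviation 21 px, scaled so that its
    value at the mean location is 1."""
    x0, y0, x1, y1 = face_box
    cx, cy = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
    ys, xs = np.mgrid[0:size, 0:size]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

prior = face_prior_channel((100, 80, 160, 140))
# Peak of 1 at the box center, falling off with distance.
```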

3.3 Inference

At inference time, we are provided with an RGB image and face rectangles output by a face detector. Our model is trained to predict the segmentation mask corresponding to the face location in the input.

3.4 Edge-Aware Filtering of a Segmentation Mask

Using the prior that mask boundaries are often aligned with image edges, we use an edge-aware filtering approach to upsample the low-resolution mask M(x) predicted by the network.
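The notes cite joint bilateral upsampling [Kopf et al. 2007]; a naive (unoptimized) sketch of that idea applied to a mask is below. This is an illustration of the general technique, not the paper's exact filter; the function name and parameters are assumptions, and the guide image is taken to be single-channel.

```python
import numpy as np

def joint_bilateral_upsample(mask_lo, guide_hi, sigma_s=2.0, sigma_r=0.1, radius=2):
    """Naive joint bilateral upsampling: each high-res output pixel averages
    nearby low-res mask values, weighted by spatial distance and by
    similarity in the high-res guide image (edge-awareness)."""
    H, W = guide_hi.shape
    h, w = mask_lo.shape
    sy, sx = h / H, w / W
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            ly, lx = y * sy, x * sx  # position in the low-res mask
            wsum, vsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy, qx = int(round(ly)) + dy, int(round(lx)) + dx
                    if 0 <= qy < h and 0 <= qx < w:
                        # guide value at the low-res sample's high-res position
                        gy = min(int(qy / sy), H - 1)
                        gx = min(int(qx / sx), W - 1)
                        ws = np.exp(-((qy - ly) ** 2 + (qx - lx) ** 2) / (2 * sigma_s ** 2))
                        wr = np.exp(-(guide_hi[y, x] - guide_hi[gy, gx]) ** 2 / (2 * sigma_r ** 2))
                        wsum += ws * wr
                        vsum += ws * wr * mask_lo[qy, qx]
            out[y, x] = vsum / max(wsum, 1e-12)
    return out
```

Where the guide image is flat the filter reduces to a plain smooth upsample; across a strong guide edge, the range weight `wr` suppresses samples from the other side, which is what keeps mask boundaries snapped to image edges.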

3.5 Accuracy and Efficiency

Our model requires 3.07 gigaflops, compared to 607 for PortraitFCN+ and 3160 for Mask R-CNN, as measured using the TensorFlow Model Benchmark Tool.

4. Depth from dual-pixel camera

Dual-pixel (DP) auto-focus systems work by splitting pixels in half, such that the left half integrates light over the right half of the aperture and vice versa.

This system is normally used for autofocus, where it is sometimes called phase-detection auto-focus.

Some techniques can compute depth but need more than two views.

Example:

--Edward H Adelson and John YA Wang. 1992. Single lens stereo with a plenoptic camera. TPAMI (1992).

--Jonathan T Barron, Andrew Adams, YiChang Shih, and Carlos Hernández. 2015. Fast bilateral-space stereo for synthetic defocus. CVPR (2015).

We build upon the stereo work of Barron et al. [2015] and the edge-aware flow work of Anderson et al. [2016] to construct a stereo algorithm that is both tractable at high resolution and well suited to the defocus task, by virtue of following the edges in the input image.

4.1 Computing Disparity

To get multiple frames for denoising, we keep a circular buffer of the last nine raw and DP frames captured by the camera.

To compute disparity, we take each non-overlapping 8 × 8 tile in the first view and search a range of −3 pixels to 3 pixels in the second view at DP resolution.

Several heuristics: the value of the SSD loss, the magnitude of the horizontal gradients in the tile, the presence of a close second minimum, and the agreement of disparities in neighboring tiles.
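The tile matching step above can be sketched directly. This is a simplified sketch (hypothetical function name, integer-only shifts, no subpixel refinement and none of the confidence heuristics) of the brute-force SSD search over a ±3 pixel range:

```python
import numpy as np

def tile_disparity(left, right, tile=8, search=3):
    """For each non-overlapping tile x tile patch in the first DP view,
    find the integer horizontal shift in [-search, search] of the second
    view that minimizes the sum of squared differences (SSD)."""
    H, W = left.shape
    ty, tx = H // tile, W // tile
    disp = np.zeros((ty, tx))
    for i in range(ty):
        for j in range(tx):
            y0, x0 = i * tile, j * tile
            patch = left[y0:y0 + tile, x0:x0 + tile]
            best, best_d = np.inf, 0
            for d in range(-search, search + 1):
                xs = x0 + d
                if xs < 0 or xs + tile > W:
                    continue  # shifted window falls outside the image
                ssd = np.sum((patch - right[y0:y0 + tile, xs:xs + tile]) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[i, j] = best_d
    return disp
```

In the real pipeline the SSD value itself, the tile's horizontal gradient energy, the gap to the second-best minimum, and neighbor agreement would then be used to score each tile's confidence, as listed above.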

4.2 Imaging Model and Calibration

This equation has two notable consequences. First, disparity depends on focus distance (z) and is zero when depth is equal to focus distance (D = z). Second, there is a linear relationship between inverse depth and disparity that does not vary spatially.
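The equation itself is not reproduced in these notes; a hypothetical reconstruction consistent with both stated consequences (not the paper's notation), with z_f the focus distance and κ a spatially constant factor determined by the baseline and optics, is:

```latex
D \;=\; \kappa\left(\frac{1}{z} - \frac{1}{z_f}\right)
\qquad\Longrightarrow\qquad
D = 0 \text{ when } z = z_f,
\quad D \text{ is linear in } 1/z .
```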

4.3 Combining Disparity and Segmentation

4.4 Edge-Aware Filtering of Disparity

We use the bilateral solver [Barron and Poole 2016] to turn the noisy disparities into a smooth edge-aware disparity map suitable for shallow depth-of-field rendering.

5. Rendering

5.1 Precomputing the blur parameters

5.2 Applying the blur

One obvious solution is to simply reexpress the scatter as a gather.
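A minimal sketch of that re-expression (hypothetical names; a hard-edged disk rather than the paper's translucent splat, and a simple energy-conserving weight): instead of scattering each source pixel into a disk, each output pixel gathers every neighbor whose blur disk covers it.

```python
import numpy as np

def gather_defocus(image, radius_map, max_r=3):
    """Scatter re-expressed as a gather: output pixel p sums every neighbor q
    whose blur disk (radius radius_map[q]) covers p, weighting each source
    by the inverse of its disk area and normalizing by the gathered weight."""
    H, W = image.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            wsum = 0.0
            for dy in range(-max_r, max_r + 1):
                for dx in range(-max_r, max_r + 1):
                    qy, qx = y + dy, x + dx
                    if 0 <= qy < H and 0 <= qx < W:
                        r = radius_map[qy, qx]
                        if dy * dy + dx * dx <= r * r:
                            w = 1.0 / (np.pi * max(r, 0.5) ** 2)
                            out[y, x] += w * image[qy, qx]
                            wsum += w
            out[y, x] /= max(wsum, 1e-12)
    return out
```

With a zero radius map every pixel only gathers from itself, so the image passes through unchanged.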

5.3 Producing the final image

Final image with synthetic noise

6. Results

Three pipelines:

1. DP + Segmentation

2. DP only

3. Segmentation only

7. Discussion and future work
