Deep Learning for Visual Manipulation and Synthesis


Deep Learning for Visual Manipulation and Synthesis Jun-Yan Zhu 朱俊彦 UC Berkeley 2017/01/11 @ VALSE

What is visual manipulation? Image Editing Program: input photo + user input → result. Desired output: stay close to the input; satisfy the user's constraint. [Schaefer et al. 2006]

Sketch2Photo [Tao et al. 2009]. What is visual synthesis? Image Generation Program: user input → result. Desired output: satisfy the user's constraint.

So far so good

Things can get really bad: the lack of safety wheels.

Adding the safety wheels. Image Editing Program: input photo + user input → output result. A desired output: stay close to the input; satisfy the user's constraint; lie on the natural image manifold.

Prior work: Heuristic-based Gradient [Perez et al. 2003] Bleeding artifacts [Tao et al. 2010] Color [Reinhard et al. 2004] Color and Texture [Johnson et al. 2011]

Prior work: Discriminative Learning Natural Human Motion (34 subjects) [Ren et al. 2005] Image Compositing (20 images) [Xue et al. 2012] Image Deblurring (40 images) [Liu et al. 2013]

Our Goal: - Learn the manifold of natural images without direct human annotations. - Improve visual manipulation and synthesis by constraining the result to lie on that learned manifold.

Why Deep Learning Methods? Impressive results on visual recognition: classification, detection, segmentation, 3D vision, videos, etc. No feature engineering. Recent development of generative models (e.g. Generative Adversarial Networks).

Deep Learning trends: performance

Deep Learning trends: research AlexNet [Krizhevsky et al.] ImageNet [Jia et al.]

Discriminative Model M = {x : P(real | x) = 1} [ICCV 15]: Realism CNN — predict realism, improve editing (image editing model). Generative Model M = {x : x = G(z)} [SIGGRAPH 14]: Project → Editing UI → Edit Transfer [ECCV 16].

Discriminative Model M = {x : P(real | x) = 1} [ICCV 15]: Realism CNN — predict realism, improve editing. Foreground object F + background B → image composite I.

Learning Visual Realism. CNN training: classifying natural photos vs. composite images — 25K natural photos vs. 25K composite images.

How do we get composite images? Target Object Composite Images Object Mask Object Masks with Similar Shapes Object Mask: (1) Human Annotation (2) Object Proposal [Lalonde and Efros 2007]

Ranking of Training Composites Most realistic composites Least realistic composites

Evaluation Dataset [Lalonde and Efros 2007]. Task: binary classification; 360 realistic photos (natural images + realistic composites) vs. 360 unrealistic photos. Metric: area under the ROC curve.
Methods without object mask:
Lalonde and Efros (no mask): 0.61
AlexNet + SVM: 0.73
RealismCNN: 0.84
RealismCNN + SVM: 0.88
Human: 0.91
Methods using object mask:
Reinhard et al.: 0.66
Lalonde and Efros (with mask): 0.81
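The numbers above are areas under the ROC curve. As a reminder of what that metric measures, here is a minimal sketch (toy scores and labels, not the paper's data) computing AUC as the probability that a random positive outscores a random negative:

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via its rank-statistic form: the probability that a randomly chosen
    positive example outscores a randomly chosen negative one (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy data: realism scores for 3 realistic (label 1) and 3 unrealistic (0) photos.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))   # one positive/negative pair is misordered
```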

Visual Realism Ranking Least Realistic Most Realistic Snowy Mountain Highway Ocean Red: unrealistic composite, Green: realistic composite, Blue: natural image

Our Pipeline Realism CNN Predict Realism Improve Composites Image Editing Model

Improving Visual Realism. Editing model: color adjustment g of foreground object F, scored by the Realism CNN. Original composite (realism score: 0.0) → improved composite (realism score: 0.8). Objective: E(g, F) = E_CNN + E_reg, minimized with quasi-Newton (L-BFGS).
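The optimization can be sketched as follows. This is a toy stand-in, not the paper's system: the hypothetical quadratic "realism" term below replaces the RealismCNN score E_CNN, and plain gradient descent replaces L-BFGS:

```python
import numpy as np

# Toy sketch of the slide's editing model: optimize a color adjustment of the
# foreground F to minimize E(g, F) = E_CNN + E_reg. Here a made-up quadratic
# term pulls the foreground's mean color toward an assumed background mean;
# the real system backpropagates through the RealismCNN and uses L-BFGS.

rng = np.random.default_rng(0)
F = rng.random((8, 8, 3))                 # foreground pixels
bg_mean = np.array([0.2, 0.5, 0.7])      # assumed background mean color

def energy(gain, bias):
    adjusted = F * gain + bias
    e_cnn = np.sum((adjusted.mean(axis=(0, 1)) - bg_mean) ** 2)  # stand-in realism term
    e_reg = 0.01 * (np.sum((gain - 1.0) ** 2) + bias ** 2)       # stay near identity edit
    return e_cnn + e_reg

gain, bias = np.ones(3), 0.0
e0 = energy(gain, bias)
for _ in range(500):                      # gradient descent in place of L-BFGS
    m = (F * gain + bias).mean(axis=(0, 1))
    d_mean = 2.0 * (m - bg_mean)          # dE_CNN / d(mean color)
    gain -= 0.05 * (d_mean * F.mean(axis=(0, 1)) + 0.02 * (gain - 1.0))
    bias -= 0.05 * (d_mean.sum() + 0.02 * bias)
print(e0, energy(gain, bias))             # energy drops as the composite improves
```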

Selecting Suitable Objects Best-fitting object selected by RealismCNN Object with most similar shape

Optimizing Color Compatibility Object mask Cut-n-paste Lalonde et al. Xue et al. Ours

Sanity Check: Real Photos Object mask Cut-n-paste Lalonde et al. Xue et al. Ours

Visualizing and Localizing Errors (∂E/∂I_p): the energy decreases over L-BFGS iterations (E = 50.73 → 9.38 → 5.05 → 3.44 → 3.00); the slide shows the result and gradient map at each stage.

Discriminative Model {x : P(real | x) = 1}
Pros: The CNN is easy to train. Graphics programs often produce better images than generative models. It is a general framework for many tasks (e.g. deblurring, retargeting, etc.).
Cons: Task-specific: a pre-trained model cannot be applied to other tasks. Graphics programs are often non-parametric and non-differentiable. Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging.
Code: github.com/junyanz/realismcnn
Data: people.eecs.berkeley.edu/~junyanz/projects/realism/

Discriminative Model M = {x : P(real | x) = 1} [ICCV 15]: Realism CNN — predict realism, improve editing (image editing model). Generative Model M = {x : x = G(z)} [SIGGRAPH 14]: Project → Editing UI → Edit Transfer [ECCV 16].

Learning Natural Image Manifold. Deep generative models: Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015]; Variational Auto-Encoder (VAE) [Kingma and Welling 2013]; DRAW (recurrent neural network) [Gregor et al. 2015]; PixelRNN and PixelCNN [van den Oord et al. 2016].

Image Classification via Neural Network Cat Input image I Slides credit: Andrew Owens

Can We Generate Images with Neural Networks? Gaussian noise (or another random distribution) → image.

Generative Adversarial Networks (GAN) Generative Model Synthesized image [Goodfellow et al. 2014]

Generative Adversarial Networks (GAN) Generative Model Discriminative Model real [Goodfellow et al. 2014]

Generative Adversarial Networks (GAN) Generative Model Discriminative Model fake [Goodfellow et al. 2014]
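The adversarial game in the three slides above can be made concrete with a 1-D toy example (a hypothetical fixed logistic discriminator and a shift-only generator, not the paper's networks):

```python
import numpy as np

# The GAN objective [Goodfellow et al. 2014]:
#   min_G max_D  E_x[log D(x)] + E_z[log(1 - D(G(z)))]
# 1-D toy sketch: real data is N(4, 1), the "generator" shifts Gaussian
# noise by an offset, and D is a fixed logistic classifier, just to make
# the value function concrete.

rng = np.random.default_rng(0)

def D(x, w=1.0, b=-4.0):
    """Discriminator: probability that x is real (logistic in x)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def G(z, shift):
    """Generator: shifts noise z toward the data distribution."""
    return z + shift

x_real = rng.normal(4.0, 1.0, size=1000)
z = rng.normal(0.0, 1.0, size=1000)

def value(shift):
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z, shift))))

# A generator matching the data (shift = 4) fools D better than shift = 0,
# so the value function -- which G minimizes -- is lower there.
print(value(0.0), value(4.0))
```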

Cat Generation (w.r.t. training iterations)

GAN as Manifold Approximation Sample training images from Amazon Shirts Random image samples from Generator G(z) [Radford et al. 2015]

Traverse on the GAN Manifold. Linear interpolation in z space: G(z_0 + t (z_1 − z_0)) between G(z_0) and G(z_1). Limitations: not photo-realistic enough, low resolution; produces images randomly, with no user control. [Radford et al. 2015]
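The interpolation itself is simple; a sketch of the latent path (the generator is left abstract, so only the z vectors are computed):

```python
import numpy as np

# Linear interpolation in latent space: G(z0 + t * (z1 - z0)) for t in [0, 1].
# Any trained generator G could then decode each vector into an image.

rng = np.random.default_rng(0)
z0, z1 = rng.normal(size=100), rng.normal(size=100)

def interpolate(z0, z1, steps=5):
    """Return latent vectors evenly spaced between z0 and z1 (inclusive)."""
    return [z0 + t * (z1 - z0) for t in np.linspace(0.0, 1.0, steps)]

path = interpolate(z0, z1)
print(len(path))   # endpoints recover z0 and z1; the middle vectors morph between them
```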

Overview: original photo → Project (projection on the manifold) → Editing UI (different degrees of image manipulation) → Edit Transfer (transition between the original and the edited projection).


Projecting an Image onto the Manifold. Input: real image x_R; output: latent vector z.
(1) Optimization: minimize a reconstruction loss L over the generative model G(z) (losses: 0.196, 0.238, 0.332).
(2) Inverting network z = P(x): an auto-encoder with a fixed decoder G (losses: 0.218, 0.242, 0.336).
(3) Hybrid method: use the network's prediction as initialization for the optimization problem (losses: 0.268, 0.153, 0.167).
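Method (1) can be sketched with a toy linear "generator" standing in for the real trained generator; gradient descent on z plays the role of the optimization, and in the hybrid method z_init would come from the inverting network instead of zero:

```python
import numpy as np

# Projection onto the manifold: find z minimizing ||G(z) - x||^2.
# A toy linear "generator" G(z) = W z stands in for the real DCGAN generator.

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))    # hypothetical generator weights
z_true = rng.normal(size=4)
x = W @ z_true                  # an "image" that lies exactly on the manifold

def project(x, z_init, lr=0.01, steps=2000):
    z = z_init.copy()
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ z - x)   # d/dz ||W z - x||^2
        z -= lr * grad
    return z

z_hat = project(x, np.zeros(4))
print(np.linalg.norm(W @ z_hat - x))     # reconstruction loss shrinks toward 0
```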


Manipulating the Latent Vector. Objective: starting from z_0, find z whose output G(z) satisfies each user guidance v_g (the guidance image), penalizing the constraint-violation loss L_g.
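A sketch of this objective with the same kind of toy linear generator (the guidance here is a hypothetical single-pixel constraint; the real UI supports color, sketch, and warp brushes):

```python
import numpy as np

# The editing objective, sketched with a toy linear "generator" G(z) = W z:
#   min_z  L_g(G(z), v_g) + lam * ||z - z0||^2
# The guidance is one hypothetical constraint: output pixel `pixel` should
# take the value v, while z stays close to the original projection z0.

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 4))
z0 = rng.normal(size=4)        # projection of the original photo
v, pixel = 2.0, 3              # hypothetical user constraint on one pixel
lam = 0.1

def objective(z):
    return (W[pixel] @ z - v) ** 2 + lam * np.sum((z - z0) ** 2)

def gradient(z):
    return 2.0 * (W[pixel] @ z - v) * W[pixel] + 2.0 * lam * (z - z0)

z = z0.copy()
for _ in range(2000):          # gradient descent on z
    z -= 0.01 * gradient(z)

# The constrained pixel moves toward v while z stays near z0.
print(W[pixel] @ z, np.linalg.norm(z - z0))
```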


Edit Transfer. Motion (u, v) + color (A ∈ R^{3×4}): estimate the per-pixel geometric and color variation between G(z_0) and G(z_1) (linear interpolation in z space), then apply it to the input to produce the result.
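The color part of the edit can be sketched directly; the 3×4 matrix below is a hypothetical brightening-plus-blue-tint transform (not an estimated one) applied to homogeneous RGB:

```python
import numpy as np

# Per-pixel color model: a 3x4 transform A applied to homogeneous RGB
# [r, g, b, 1]^T, i.e. a linear gain plus a per-channel bias.

A = np.array([[1.1, 0.0, 0.0, 0.00],
              [0.0, 1.1, 0.0, 0.00],
              [0.0, 0.0, 1.1, 0.05]])     # 3x4: 10% gain + small blue bias

img = np.full((4, 4, 3), 0.5)             # toy constant-gray image
homog = np.concatenate([img, np.ones((4, 4, 1))], axis=-1)   # append 1 per pixel
out = homog @ A.T                         # apply A at every pixel
print(out[0, 0])                          # -> [0.55, 0.55, 0.60]
```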

Image Manipulation Demo


Designing Products

Interactive Image Generation

The Simplest Generative Model: Averaging. AverageExplorer: {x : x = Σ_n w_n I_n^warp} — the generative model is a weighted average of warped images. Limitation: it cannot synthesize novel content. [Zhu et al. 2014]
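A sketch of this averaging model (warping omitted; the five toy images stand in for aligned photos):

```python
import numpy as np

# The "simplest generative model" from the slide: a weighted average of
# (warped) images, x = sum_n w_n * I_n^warp.

rng = np.random.default_rng(0)
images = rng.random((5, 8, 8, 3))         # five aligned toy images
w = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # user-controlled weights (sum to 1)

x = np.tensordot(w, images, axes=1)       # the weighted-average "generated" image
print(x.shape)                            # -> (8, 8, 3)
```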

Generative Image Transformation

iGAN (interactive GAN). Get the code: github.com/junyanz/igan. Intelligent drawing tools via GAN; debugging tools for understanding and visualizing deep generative networks. Work in progress: supporting more models (GAN, VAE, Theano/TensorFlow).

Generative Model {x : x = G(z), z ∈ Z}
Pros: Task-independent: offline generative model training is independent of the graphics application. Optimizing z is easier than optimizing x. Generative models keep getting better.
Cons: Low quality and low resolution call for post-processing (still engineering work). Limitation of current generative models: they cannot produce good texture.

Related work on GAN. Goodfellow's NIPS 2016 tutorial: [arXiv], [slides]. Early work: [Tu 07], [Gutmann and Hyvärinen 10], etc. New models: InfoGAN, SSGAN, VAE-GAN, LAPGAN, BiGAN, CoGAN, PPGAN, etc. Training techniques: DCGAN, Improved-GAN, EBGAN, unrolling. Image: inpainting, inverting features, style transfer, text-to-image, super-resolution, etc. Video: frame prediction, tiny videos, etc.

Image-to-Image Translation. Code: github.com/phillipi/pix2pix. Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. arXiv 2016.

Image-to-Image Problems

Conditional Adversarial Networks (cGAN). Loss: L1 + GAN. G: U-Net. D: 70×70 PatchGAN. U-Net [Ronneberger et al. 15]

Conditional GAN

Network and Loss Function. Loss function: L1 + GAN. Generator G: U-Net. Discriminator D: 70×70 PatchGAN (fully convolutional). U-Net [Ronneberger et al. 15]
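The generator-side objective can be sketched as follows (toy tensors; λ = 100 follows the paper, and the 4×4 grid of discriminator scores stands in for the PatchGAN's per-patch outputs):

```python
import numpy as np

# The pix2pix generator objective, sketched with placeholder tensors:
#   L_G = -E[log D(x, G(x))] + lam * E[||y - G(x)||_1]
# Each PatchGAN score judges one 70x70 receptive field real or fake;
# here a 4x4 grid of random scores stands in for those outputs.

rng = np.random.default_rng(0)
lam = 100.0                                   # L1 weight used in the paper

d_fake = rng.uniform(0.1, 0.9, size=(4, 4))   # D's per-patch score on G(x)
y = rng.random((64, 64, 3))                   # ground-truth target image
g_out = rng.random((64, 64, 3))               # generator output G(x)

gan_term = -np.mean(np.log(d_fake))           # G wants every patch judged "real"
l1_term = lam * np.mean(np.abs(y - g_out))    # stay close to the target
loss_G = gan_term + l1_term
print(gan_term, l1_term, loss_G)
```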

Different Losses

Architectures for Generator G

Patch Size of PatchGAN

Applications

Label → Facade

Label → Street View

Map Generation

Day → Night

Edge → Handbag. HED [Xie and Tu. 15]

Edge → Shoe. HED [Xie and Tu. 15]

User Sketch → Photo

Automatic Colorization

Failure Cases: sparse input image; unusual input image.

Summary: Image-to-Image Problems

Cat Paper Collection. GitHub: github.com/junyanz/catpapers. 90% of data is visual; most visual data is about cats. 60+ vision, learning, and graphics papers.

Thank You! Eli Yong Jae Philipp Alyosha Philipp Tinghui