Deep Learning for Visual Manipulation and Synthesis

1 Deep Learning for Visual Manipulation and Synthesis Jun-Yan Zhu 朱俊彦 UC Berkeley VALSE

2 What is visual manipulation? Image Editing Program: input photo, user input, result. Desired output: stay close to the input; satisfy the user's constraint. [Schaefer et al. 2006]

3 What is Visual Synthesis? Image Generation Program: user input, result. Desired output: satisfy the user's constraint. Sketch2Photo [Tao et al. 2009]

4 So far so good

5 Things can get really bad The lack of safety wheels

6 Adding the safety wheels. Image Editing Program: input photo, user input, output result. A desired output should: stay close to the input; satisfy the user's constraint; lie on the natural image manifold.

7 Prior work: Heuristic-based Gradient [Perez et al. 2003] Bleeding artifacts [Tao et al. 2010] Color [Reinhard et al. 2004] Color and Texture [Johnson et al. 2011]

8 Prior work: Discriminative Learning Natural Human Motion (34 subjects) [Ren et al. 2005] Image Compositing (20 images) [Xue et al. 2012] Image Deblurring (40 images) [Liu et al. 2013]

9 Our Goal: - Learn the manifold of natural images without direct human annotations. - Improve visual manipulation and synthesis by constraining the result to lie on that learned manifold.

10 Why Deep Learning Methods? Impressive results on visual recognition (classification, detection, segmentation, 3D vision, videos, etc.). No feature engineering. Recent development of generative models (e.g. Generative Adversarial Networks).

11 Deep Learning trends: performance

12 Deep Learning trends: research AlexNet [Krizhevsky et al.] ImageNet [Jia et al.]

13 Discriminative Model M: {x | P_real(x) = 1} [ICCV '15]: Realism CNN, Predict Realism, Improve Editing, Image Editing Model. Generative Model M: {x | x = G(z)} [SIGGRAPH '14]: Project, Editing UI, Edit Transfer [ECCV '16]

14 Discriminative Model M: {x | P_real(x) = 1} [ICCV '15]: Realism CNN, Predict Realism, Improve Editing, Image Editing Model. Foreground Object F, Background B, Image Composite I

15 Learning Visual Realism. CNN training: classifying natural photos vs. composite images (25K natural photos vs. 25K composite images).

16 How do we get composite images? Target Object Composite Images Object Mask Object Masks with Similar Shapes Object Mask: (1) Human Annotation (2) Object Proposal [Lalonde and Efros 2007]

17 Ranking of Training Composites Most realistic composites Least realistic composites

18 Evaluation Dataset [Lalonde and Efros 2007]. Task: binary classification of 360 realistic photos (natural images + realistic composites) vs. 360 unrealistic photos. Metric: area under ROC curve. Methods without object mask: Lalonde and Efros (no mask) 0.61; AlexNet + SVM 0.73; RealismCNN 0.84; RealismCNN + SVM (score missing); Human 0.91. Methods using object mask: Reinhard et al. (score missing); Lalonde and Efros (with mask) 0.81.
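The area under the ROC curve used on this slide can be computed directly from realism scores and binary labels. The sketch below is a minimal pairwise version; the scores and labels are made-up illustration values, not data from the talk.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen realistic image (label 1)
    scores higher than a randomly chosen unrealistic one (label 0).
    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical realism scores for three realistic and three unrealistic images.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

A score of 0.5 corresponds to chance and 1.0 to a perfect ranking, which is how the numbers in the table above should be read.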

19 Visual Realism Ranking Least Realistic Most Realistic Snowy Mountain Highway Ocean Red: unrealistic composite, Green: realistic composite, Blue: natural image

20 Our Pipeline Realism CNN Predict Realism Improve Composites Image Editing Model

21 Improving Visual Realism. Editing model: color adjustment g applied to foreground object F, scored by the Realism CNN. Original composite (realism score: 0.0), improved composite (realism score: 0.8). Energy: E(g, F) = E_CNN + E_reg, minimized with Quasi-Newton (L-BFGS).
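To make the structure of E(g, F) = E_CNN + E_reg concrete, here is a toy sketch. It makes several assumptions that are not from the talk: the color adjustment g is a per-channel gain and bias, E_CNN is replaced by a simple stand-in (squared distance between the adjusted foreground mean color and the background mean color), E_reg pulls g toward the identity adjustment, and plain gradient descent is used in place of L-BFGS.

```python
def energy(gain, bias, fg, bg, lam):
    """Toy E = E_cnn + E_reg for a per-channel gain/bias color adjustment."""
    # Stand-in for the CNN realism term: adjusted foreground color vs. background.
    e_cnn = sum((g * f + b - t) ** 2 for g, b, f, t in zip(gain, bias, fg, bg))
    # Regularizer: stay close to the identity adjustment (gain 1, bias 0).
    e_reg = lam * sum((g - 1.0) ** 2 + b ** 2 for g, b in zip(gain, bias))
    return e_cnn + e_reg


def improve_composite(fg, bg, lam=0.1, lr=0.1, steps=500):
    """Minimize the toy energy by gradient descent (the slide uses L-BFGS)."""
    gain, bias = [1.0] * 3, [0.0] * 3
    for _ in range(steps):
        for c in range(3):
            residual = gain[c] * fg[c] + bias[c] - bg[c]
            d_gain = 2.0 * residual * fg[c] + 2.0 * lam * (gain[c] - 1.0)
            d_bias = 2.0 * residual + 2.0 * lam * bias[c]
            gain[c] -= lr * d_gain
            bias[c] -= lr * d_bias
    return gain, bias


# Hypothetical mean RGB colors of the pasted object and of the background.
fg = [0.8, 0.4, 0.3]
bg = [0.3, 0.5, 0.6]
e_before = energy([1.0] * 3, [0.0] * 3, fg, bg, lam=0.1)
gain, bias = improve_composite(fg, bg)
e_after = energy(gain, bias, fg, bg, lam=0.1)
print(e_before, e_after)
```

In the actual method the first term comes from the Realism CNN, so its gradient with respect to g would be obtained by backpropagation rather than written by hand.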

22 Selecting Suitable Objects Best-fitting object selected by RealismCNN Object with most similar shape

23 Optimizing Color Compatibility Object mask Cut-n-paste Lalonde et al. Xue et al. Ours

24 Sanity Check: Real Photos Object mask Cut-n-paste Lalonde et al. Xue et al. Ours

25 Visualizing and Localizing Errors. Gradient map: ∇E = (∂E/∂I_p), the gradient of the energy E with respect to each pixel I_p. Panels: result and gradient map vs. number of L-BFGS iterations.

26 Discriminative Model {x | P_real(x) = 1}. Pros: CNN is easy to train. Graphics programs often produce better images than generative models. General framework for many tasks (e.g. deblurring, retargeting, etc.). Cons: Task-specific: cannot apply a pre-trained model to other tasks. Graphics programs are often non-parametric and non-differentiable. Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging. Code: github.com/junyanz/realismcnn Data: people.eecs.berkeley.edu/~junyanz/projects/realism/

27 Discriminative Model M: {x | P_real(x) = 1} [ICCV '15]: Realism CNN, Predict Realism, Improve Editing, Image Editing Model. Generative Model M: {x | x = G(z)} [SIGGRAPH '14]: Project, Editing UI, Edit Transfer [ECCV '16]

28 Learning Natural Image Manifold. Deep generative models: Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015]; Variational Auto-Encoder (VAE) [Kingma and Welling 2013]; DRAW (Recurrent Neural Network) [Gregor et al. 2015]; Pixel RNN and Pixel CNN [Oord et al. 2016].

29 Image Classification via Neural Network Cat Input image I Slides credit: Andrew Owens

30 Can We Generate Images with Neural Networks? Gaussian noise Image or Random Distribution

31 Generative Adversarial Networks (GAN) Generative Model Synthesized image [Goodfellow et al. 2014]

32 Generative Adversarial Networks (GAN) Generative Model Discriminative Model real [Goodfellow et al. 2014]

33 Generative Adversarial Networks (GAN) Generative Model Discriminative Model fake [Goodfellow et al. 2014]
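For reference, the real/fake game sketched on the three GAN slides above corresponds to the minimax objective of [Goodfellow et al. 2014]. The symbols p_data (distribution of real images) and p_z (noise prior) follow that paper and do not appear on the slides.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator D is trained to output "real" for x and "fake" for G(z), while the generator G is trained to make D misclassify its samples.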

34 Cat Generation (w.r.t. training iterations)

35 GAN as Manifold Approximation Sample training images from Amazon Shirts Random image samples from Generator G(z) [Radford et al. 2015]

36 Traverse on the GAN Manifold. Linear interpolation in z space from G(z_0) to G(z_1): G(z_0 + t (z_1 - z_0)). Limitations: not photo-realistic enough, low resolution; produces images randomly, no user control. [Radford et al. 2015]
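The interpolation formula on this slide is simple enough to sketch directly. Only the latent path is computed here; feeding each point through a trained generator G is left out, and the two latent vectors are made-up values.

```python
def lerp(z0, z1, t):
    """Point z0 + t * (z1 - z0) on the straight line between two latent vectors."""
    return [a + t * (b - a) for a, b in zip(z0, z1)]


# Hypothetical 2-D latent codes; real GAN latent vectors are much longer.
z0 = [0.0, 1.0]
z1 = [2.0, -1.0]

# Five evenly spaced latent codes from z0 (t = 0) to z1 (t = 1).
path = [lerp(z0, z1, k / 4) for k in range(5)]
for z in path:
    print(z)  # each z would be rendered as G(z) by a trained generator
```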

37 Overview original photo Project Editing UI different degree of image manipulation Edit Transfer projection on manifold transition between the original and edited projection

39 Projecting an Image onto the Manifold. Input: real image x^R. Output: latent vector z. Optimization: minimize the reconstruction loss L between the generative model output G(z) and x^R.

40 Projecting an Image onto the Manifold. Input: real image x^R. Output: latent vector z. Optimization. Inverting Network: z = P(x), an auto-encoder with a fixed decoder G.

41 Projecting an Image onto the Manifold. Input: real image x^R. Output: latent vector z. Optimization. Inverting Network: z = P(x). Hybrid Method: use the network as initialization for the optimization problem.
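The hybrid projection above can be illustrated on a toy problem. Everything concrete below is an assumption for illustration: the generator is a fixed 3x2 linear map, the reconstruction loss is squared error, and the "inverting network" is a crude hand-written guess rather than a trained model.

```python
# Toy linear "generator": maps a 2-D latent vector to a 3-pixel image.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]


def generate(z):
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def reconstruction_loss(z, x):
    return sum((g - xi) ** 2 for g, xi in zip(generate(z), x))


def predict_latent(x):
    """Hypothetical stand-in for the inverting network z = P(x)."""
    return [x[0], x[1]]


def project(x, z_init, lr=0.1, steps=200):
    """Refine z by gradient descent on the squared reconstruction error."""
    z = list(z_init)
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(generate(z), x)]
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(3))
                for j in range(2)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


x_real = [0.5, -0.25, 0.4]           # made-up "real image"
z_init = predict_latent(x_real)      # network prediction as initialization
z_star = project(x_real, z_init)     # optimization refines the prediction
print(reconstruction_loss(z_init, x_real), reconstruction_loss(z_star, x_real))
```

With a real GAN the loss is non-convex, which is why a good initialization from the inverting network matters; in this convex toy case any starting point would reach the same answer.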

42 Overview original photo Project Editing UI different degree of image manipulation Edit Transfer projection on manifold transition between the original and edited projection

43 Manipulating the Latent Vector. Objective: minimize the constraint violation loss L_g between the generated image G(z) and the user guidance image v_g. Symbols on the slide: guidance v_g, G(z), z_0.
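A toy sketch of this objective follows. The assumptions are loud ones: the generator is a fixed linear map, the user constrains a single pixel to a target value v, and a penalty lam * ||z - z_0||^2 keeps the edited code near the starting code z_0. The slide lists z_0 but does not spell out such a term, so the penalty is an illustrative choice.

```python
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]  # toy linear generator: 2-D latent code to 3 pixels


def generate(z):
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def edit_latent(z0, pixel, v, lam=0.5, lr=0.1, steps=300):
    """Minimize (G(z)[pixel] - v)^2 + lam * ||z - z0||^2 by gradient descent."""
    z = list(z0)
    for _ in range(steps):
        residual = generate(z)[pixel] - v           # constraint violation
        grad = [2.0 * residual * W[pixel][j] + 2.0 * lam * (z[j] - z0[j])
                for j in range(2)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


z0 = [0.5, -0.25]                       # made-up code of the projected photo
z_edit = edit_latent(z0, pixel=2, v=1.0)
print(generate(z0)[2], generate(z_edit)[2])
```

The edited pixel moves toward the requested value without reaching it exactly, because the penalty trades constraint satisfaction against staying near z_0.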

44 Overview original photo Project Editing UI different degree of image manipulation Edit Transfer projection on manifold transition between the original and edited projection

45 Edit Transfer. Motion (u, v) + Color (A, a 3×4 matrix): estimate per-pixel geometric and color variation. G(z_0), linear interpolation in z space, G(z_1). Input.

47 Edit Transfer. Motion (u, v) + Color (A, a 3×4 matrix): estimate per-pixel geometric and color variation. G(z_0), linear interpolation in z space, G(z_1). Input, Result.
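The per-pixel motion and color model can be sketched as follows. This assumes A acts on a homogeneous color [r, g, b, 1] and that (u, v) is a forward displacement of the pixel position; both conventions are assumptions, and estimating (u, v) and A between G(z_0) and G(z_1) is not shown.

```python
def apply_color_affine(A, rgb):
    """Apply a 3x4 affine color transform to one RGB pixel."""
    r, g, b = rgb
    h = (r, g, b, 1.0)  # homogeneous color, so the last column of A is a bias
    return tuple(sum(A[i][j] * h[j] for j in range(4)) for i in range(3))


def transfer_edit(xy, rgb, flow_uv, A):
    """Move a pixel by its flow vector and recolor it with its affine transform."""
    x, y = xy
    u, v = flow_uv
    return (x + u, y + v), apply_color_affine(A, rgb)


identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
# Hypothetical estimate for one pixel: shift right by 2, add 0.1 to red.
redden = [[1.0, 0.0, 0.0, 0.1],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]

new_xy, new_rgb = transfer_edit((10, 5), (0.2, 0.3, 0.4), (2, 0), redden)
print(new_xy, new_rgb)
```

Applying the estimated changes to the original photo, rather than showing G(z_1) directly, is what lets the result keep the input's resolution and detail.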

48 Image Manipulation Demo

50 Designing Products

52 Interactive Image Generation

53 The Simplest Generative Model: Averaging. AverageExplorer: {x | x = Σ_n w_n · I_n^warp}. Generative model: weighted average of warped images. Limitations: cannot synthesize novel content. [Zhu et al. 2014]
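The averaging model is a one-liner once the images are warped into alignment. The sketch below assumes the warping has already been done and normalizes the weights to sum to one; the normalization is an assumption, since the slide only gives the weighted sum.

```python
def weighted_average(images, weights):
    """Pixel-wise weighted average of pre-warped images (flat lists of pixels)."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_pixels = len(images[0])
    return [sum(w * img[p] for w, img in zip(weights, images)) / total
            for p in range(n_pixels)]


# Two hypothetical 2-pixel grayscale images, already warped into alignment.
images = [[0.0, 1.0],
          [1.0, 1.0]]
weights = [1.0, 3.0]  # e.g. derived from user strokes in an interactive tool
average = weighted_average(images, weights)
print(average)
```

Every output pixel is a convex combination of input pixels, which is exactly why this model cannot synthesize novel content.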

54 Generative Image Transformation

55 iGAN (a.k.a. interactive GAN). Get the code: github.com/junyanz/igan. Intelligent drawing tools via GAN. Debugging tools for understanding and visualizing deep generative networks. Work in progress: supporting more models (GAN, VAE, theano/tensorflow).

56 Generative Model {x | x = G(z), z ∈ Z}. Pros: Task-independent: offline generative model training is independent of the graphics applications. Optimizing z is easier than optimizing x. Generative models keep getting better. Cons: Low quality, low resolution => post-processing (still engineering work). Limitations of current generative models: cannot produce good texture.

57 Related work on GAN. Goodfellow's NIPS 2016 Tutorial: [arxiv], [slides]. Early work: [Tu '07], [Gutmann and Hyvarinen '10], etc. New models: InfoGAN, SSGAN, VAE-GAN, LAPGAN, BiGAN, CoGAN, PPGAN, etc. Training techniques: DCGAN, Improved-GAN, EBGAN, Unrolling. Image: Inpainting, Inverting features, Style Transfer, Text-to-Image, Super-resolution, etc. Video: Frame Prediction, Tiny Videos, etc.

58 Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. arXiv 2016. Code: github.com/phillipi/pix2pix

59 Image-to-Image Problems

60 Conditional Adversarial Networks (cGAN). Loss: L1 + GAN. G: U-Net. D: PatchGAN (70×70). U-Net [Ronneberger et al. '15]

61 Conditional GAN

62 Network and Loss Function. Loss function: L1 + GAN. Generator G: U-Net. Discriminator D: PatchGAN (FCN). U-Net [Ronneberger et al. '15]
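The 70×70 figure quoted for the PatchGAN is the receptive field of one discriminator output unit. The sketch below computes it from a layer list; the specific configuration (five 4×4 convolutions with strides 2, 2, 2, 1, 1) is an assumption based on the public pix2pix implementation, not something stated on the slides.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of one output unit of a conv stack.

    layers: (kernel_size, stride) pairs listed from input to output.
    """
    rf = 1
    # Walk backward from the output: each layer widens the field seen so far.
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


# Assumed PatchGAN discriminator layout: 4x4 kernels, strides 2, 2, 2, 1, 1.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))  # 70
```

Because the discriminator is fully convolutional, each output unit judges one such patch as real or fake, and the per-patch decisions are combined across the image.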

63 Different Losses

64 Architectures for Generator G

65 Patch Size of PatchGAN

66 Applications

67 Label → Facade

68 Label → Street View

69 Map Generation

70 Day → Night

71 Edge → Handbag. Edges from HED [Xie and Tu '15]

72 Edge → Shoe. Edges from HED [Xie and Tu '15]

73 User Sketch → Photo

74 Automatic Colorization

75 Failure Cases - Sparse input image. - Unusual input image.

76 Summary: Image-to-Image Problems

77 Cat Paper Collection. GitHub: github.com/junyanz/catpapers. 90% of data is visual; most visual data is about cats. 60+ vision, learning and graphics papers.

79 Thank You! Eli Yong Jae Philipp Alyosha Philipp Tinghui
