Learning to Generate Images

1 Learning to Generate Images. Jun-Yan Zhu, Ph.D. at UC Berkeley, Postdoc at MIT CSAIL

2 Computer Vision before 2012: image → Features → Clustering → Pooling → Classification → "Cat"

3 Computer Vision Now: the pipeline Features → Clustering → Pooling → Classification → "Cat" becomes image → Deep Net → "Cat". [LeCun et al., 1998], [Krizhevsky et al., 2012]

4 Deep Learning for Computer Vision. [Plot: Top-5 accuracy of deep nets on the ImageNet benchmark [Deng et al., 2009].] Applications: object detection [Redmon et al., 2018], human understanding [Güler et al., 2018], autonomous driving [Zhao et al., 2017].

5 Can Deep Learning Help Graphics? Graphics pipeline for a "Cat": Modeling → Texturing → Lighting → Rendering

6 Can Deep Learning Help Graphics? Graphics pipeline for a "Cat": Modeling → Texturing → Lighting → Rendering; a deep net then judges the result: Good/Bad.

7 Selecting the most attractive expressions from 101 photos [Zhu et al., SIGGRAPH Asia 2014]

8 Selecting the most realistic composites: most realistic vs. least realistic composites [Zhu et al., ICCV 2015]

9 Can Deep Learning Help Graphics? Instead of Modeling → Texturing → Lighting → Rendering: "Cat" → Deep Net → image

10 Generating images is hard! Instead of the graphics pipeline (Modeling → Texturing → Lighting → Rendering), a deep net maps "8" to an image of only 28x28 pixels.

11 Generative Adversarial Networks (GANs) [Goodfellow et al. 2014]

12 Random code z → Generator G → fake image G(z). Example outputs: aleju/cat-generator. [Goodfellow et al., 2014]

13 Random code z → Generator G → fake image G(z) → Discriminator D: real (1) or fake (0)? A two-player game: G tries to generate fake images that can fool D; D tries to detect fake images. [Goodfellow et al., 2014]
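
The two-player game is commonly written as the minimax objective below, from [Goodfellow et al., 2014]. The symbols p_data (distribution of real images) and p_z (distribution of random codes) are notation assumed here rather than taken from the slides:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D maximizes V by pushing D(x) toward 1 and D(G(z)) toward 0; G minimizes V by pushing D(G(z)) toward 1.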

14 Learning objective (GANs): random code z → Generator G → fake image G(z) → Discriminator D: fake (0.1). [Goodfellow et al., 2014]

15 Learning objective (GANs): on the fake image, D(G(z)) = 0.1 (fake); on a real image x, D(x) = 0.9 (real). [Goodfellow et al., 2014]

16 Learning objective (GANs): on the fake image, D(G(z)) rises to 0.3 (still fake); on the real image x, D(x) = 0.9 (real). [Goodfellow et al., 2014]
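
Slides 14 to 16 show the discriminator's scores moving during training. A minimal per-example sketch of the losses behind those scores: the discriminator loss is binary cross-entropy with real = 1 and fake = 0, and the generator uses the non-saturating loss -log D(G(z)) suggested in [Goodfellow et al., 2014]. Which generator loss these slides assume is not stated, so treat that choice as an assumption. The scores 0.9, 0.1, and 0.3 come from the slides:

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for D with label 1 for real and 0 for fake.

    d_real is D(x) on a real image x; d_fake is D(G(z)) on a generated image.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss -log D(G(z)) (assumed variant).

    It shrinks as D is fooled into scoring the fake image closer to 1.
    """
    return -math.log(d_fake)


# Scores from slides 14-16: D(x) = 0.9 on a real image; D(G(z)) goes 0.1 -> 0.3.
for d_fake in (0.1, 0.3):
    print(f"D(G(z))={d_fake}: D loss={discriminator_loss(0.9, d_fake):.3f}, "
          f"G loss={generator_loss(d_fake):.3f}")
# D(G(z))=0.1: D loss=0.211, G loss=2.303
# D(G(z))=0.3: D loss=0.462, G loss=1.204
```

As G improves from 0.1 to 0.3, its own loss falls while D's loss rises, which is the tug-of-war the slides describe.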

17 Limitations of GANs: No user control (random code → output, vs. user input → output). Low resolution and quality.

18 Contributions. Co-authors: Phillip Isola, Taesung Park, Ting-Chun Wang, Richard Zhang, Tinghui Zhou, Ming-Yu Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro, Alexei A. Efros

19 Goals: Improve Control, Quality, and Resolution. GANs; pix2pix: conditional on user inputs; CycleGAN: learning without pairs; pix2pixHD: high quality and resolution.

21 Learning objective (GANs): random code z → Generator G → output image G(z) → Discriminator D: real or fake? [Goodfellow et al., 2014]

22 Learning objective (pix2pix): input image x → Generator G → output image G(x) → Discriminator D: real or fake? [Isola, Zhu, Zhou, Efros, 2016]

23 Learning objective (pix2pix): input image x → Generator G → output image G(x) → Discriminator D: "Real". [Isola, Zhu, Zhou, Efros, 2016]

24 Learning objective (pix2pix): input image x → Generator G → output image G(x) → Discriminator D: "Real too", so judging the output alone does not check that it matches the input. [Isola, Zhu, Zhou, Efros, 2016]

25 Learning objective (pix2pix): input image x → Generator G → G(x); Discriminator D sees the pair (x, G(x)): real or fake pair? [Isola, Zhu, Zhou, Efros, 2016]
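
The pix2pix paper also adds an L1 reconstruction term to the generator's objective, weighted by λ (λ = 100 in that paper). Neither the L1 term nor λ appears on these slides. A minimal per-example sketch in pure Python follows. It treats images as flat lists of pixel values (an assumption for illustration), and `d_fake_pair` stands for the conditional discriminator's score on the pair (x, G(x)):

```python
import math
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def pix2pix_generator_loss(d_fake_pair: float,
                           output: Sequence[float],
                           target: Sequence[float],
                           lam: float = 100.0) -> float:
    """Generator loss for one example (sketch).

    d_fake_pair: D(x, G(x)), the probability D assigns to the pair
        (input x, output G(x)) being a real pair.
    output, target: G(x) and the ground-truth image y for the same input x.
    lam: weight of the L1 term (100 in the pix2pix paper).
    """
    adversarial = -math.log(d_fake_pair)          # fool D on the pair
    reconstruction = l1_distance(output, target)  # stay close to ground truth
    return adversarial + lam * reconstruction


# Output equals ground truth, so only the adversarial term remains.
print(pix2pix_generator_loss(0.5, [0.2, 0.4], [0.2, 0.4]))  # ≈ 0.693
```

The pair matters because D sees x together with G(x). An output that looks realistic but does not correspond to x can then still be labeled fake.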

26 #edges2cats Ivy Vitaly

27 Input image x → Generator G → G(x); Discriminator D: real or fake pair? Input: sketch, grayscale. Output: photo, color. [Isola, Zhu, Zhou, Efros, 2016]

28 Automatic Colorization with pix2pix: three input/output pairs. Data from [Russakovsky et al., 2015]

29 Interactive Colorization [Zhang*, Zhu*, Isola, Geng, Lin, Yu, Efros, 2017]

30 Edges → Images: three input/output pairs. Edges from [Xie & Tu, 2015]

31 Sketches → Images: three input/output pairs from a model trained on Edges → Images. Data from [Eitz, Hays, Alexa, 2012]

32 Input / Output / Ground truth. Data from [maps.google.com]

33 Input / Output / Ground truth. Data from [maps.google.com]

34 Paired

35 Paired vs. Unpaired

36 Goals: Improve Control, Quality, and Resolution. GANs; pix2pix: conditional on user inputs; CycleGAN: learning without pairs; pix2pixHD: high quality and resolution.

37 Cycle-Consistent Adversarial Networks [Zhu*, Park*, Isola, and Efros, 2017]

38 Cycle-Consistent Adversarial Networks [Mark Twain, 1903] [Zhu*, Park*, Isola, and Efros, 2017]

39 Cycle Consistency Loss: x → G(x) → F(G(x)); the discriminator scores $D_Y(G(x))$, and the reconstruction error is $\|F(G(x)) - x\|_1$. See also [Yi et al., 2017], [Kim et al., 2017]. [Zhu*, Park*, Isola, and Efros, 2017]

40 Cycle Consistency Loss: forward cycle x → G(x) → F(G(x)), with $D_Y(G(x))$ and reconstruction error $\|F(G(x)) - x\|_1$; backward cycle y → F(y) → G(F(y)), with $D_X(F(y))$ and reconstruction error $\|G(F(y)) - y\|_1$. See also [Yi et al., 2017], [Kim et al., 2017]. [Zhu*, Park*, Isola, and Efros, 2017]
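
The two reconstruction errors can be sketched in pure Python. Images are treated as flat lists of pixel values, and the translators G and F below are toy, hypothetical functions rather than learned networks. Averaging the L1 error per pixel and per batch is an assumption about normalization:

```python
from typing import Callable, List, Sequence

Image = Sequence[float]


def l1(a: Image, b: Image) -> float:
    """Mean absolute error between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Callable[[Image], Image],
                           F: Callable[[Image], Image],
                           xs: Sequence[Image],
                           ys: Sequence[Image]) -> float:
    """Forward cycle ||F(G(x)) - x||_1 plus backward cycle ||G(F(y)) - y||_1,
    each averaged over its batch."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical toy translators on pixel values, for illustration only.
def G(x: Image) -> List[float]:
    return [2 * p + 1 for p in x]


def F_exact(y: Image) -> List[float]:   # exact inverse of G
    return [(p - 1) / 2 for p in y]


def F_lossy(y: Image) -> List[float]:   # discards content: maps every pixel to 0
    return [0.0 for _ in y]


xs = [[0.0, 0.5], [1.0, 0.25]]
ys = [[1.0, 2.0], [3.0, 1.5]]
print(cycle_consistency_loss(G, F_exact, xs, ys))  # 0.0
print(cycle_consistency_loss(G, F_lossy, xs, ys))  # 1.3125
```

An exact inverse gives zero cycle loss. A translator that throws away its input cannot be undone, so the loss grows. In CycleGAN this term is added to the two adversarial losses with a weight (λ = 10 in the paper).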

41 Horse ↔ Zebra

42 Orange ↔ Apple

43 Collection Style Transfer: a photograph (Alexei Efros) rendered in the styles of Monet, Van Gogh, Cezanne, and Ukiyo-e.

44 Monet's paintings → photographic style

45 Why CycleGAN works

46 Style and Content Separation. Paired separation: Separating Style and Content with Bilinear Models [Tenenbaum and Freeman, 2000]. Unpaired separation: the adversarial loss changes the style; the cycle consistency loss preserves the content. Two empirical assumptions: content is easy to keep; style is easy to change.

47 Neural Style Transfer [Gatys et al., 2015]. Style and content: content loss is a feature difference; style loss is a Gram matrix difference; both losses are hard-coded.
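
The two hard-coded losses of neural style transfer can be sketched in pure Python. The sketch assumes one layer's features are given as a C × N list (C channels, N spatial positions). The normalization used here (1/N inside the Gram matrix, a mean over entries in each loss) is an assumption; [Gatys et al., 2015] use a different constant:

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """G[i][j] = (1/N) * sum_k F[i][k] * F[j][k]: channel-by-channel
    correlations, which discard where in the image each feature occurred."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(feat_a: Matrix, feat_b: Matrix) -> float:
    """Mean squared difference between the two Gram matrices."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


def content_loss(feat_a: Matrix, feat_b: Matrix) -> float:
    """Mean squared feature difference, position by position."""
    total = sum((a - b) ** 2
                for ra, rb in zip(feat_a, feat_b) for a, b in zip(ra, rb))
    return total / (len(feat_a) * len(feat_a[0]))


# Same features at shuffled positions: same "style", different "content".
f1 = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
f2 = [[2.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # columns of f1 permuted
print(style_loss(f1, f2))    # 0.0
print(content_loss(f1, f2))  # 8/6 ≈ 1.333
```

Both losses are fixed formulas on pretrained features. CycleGAN instead learns what "style" means for a collection through its adversarial loss.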

48 Comparison with neural style transfer. Rows: horse → zebra; photo → Van Gogh. Columns: Input; Style image I; Style image II; Entire collection; CycleGAN.

49 Cycle loss upper-bounds conditional entropy. [Figure: high vs. low conditional entropy.] ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching [Li et al., NIPS 2017]. Also see CycleGAN as Approximate Bayesian Inference [Tiao et al., 2018].

50 Cycle loss upper-bounds conditional entropy. ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching [Li et al., NIPS 2017]. Also see CycleGAN as Approximate Bayesian Inference [Tiao et al., 2018].

51 Customizing Gaming Experience: Grand Theft Auto V (GTA5) and street-view images of German cities. Data from [Richter et al., 2016], [Cordts et al., 2016]

52 Customizing Gaming Experience: input GTA5 CG image → output image with German street-view style.

53 Domain Adaptation with CycleGAN: train on GTA5 data, test on real images. Metrics: mean IoU, per-pixel accuracy. Rows: Oracle (train and test on Real); Train on CG, test on Real. See Judy Hoffman's talk on Adversarial Domain Adaptation at 14:30.

54 Domain Adaptation with CycleGAN: GTA5 data + domain adaptation, test on real images. Metrics: mean IoU, per-pixel accuracy. Rows: Oracle (train and test on Real); Train on CG, test on Real; FCN in the wild [previous SOTA]. See Judy Hoffman's talk on Adversarial Domain Adaptation at 14:30.

Train on CG, test on Real: mean IoU 17.9, per-pixel accuracy 54.0. FCN in the wild [previous SOTA]: mean IoU 27.…

55 Domain Adaptation with CycleGAN: train on CycleGAN data, test on real images. Metrics: mean IoU, per-pixel accuracy. Rows: Oracle (train and test on Real); Train on CG, test on Real; FCN in the wild [previous SOTA]; Train on CycleGAN, test on Real. See Judy Hoffman's talk on Adversarial Domain Adaptation at 14:30.
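
Slides 53 to 55 report mean IoU and per-pixel accuracy for semantic segmentation. A minimal sketch of both metrics computed from a confusion matrix in pure Python follows. The toy labels and class names are hypothetical and not taken from the experiments. Skipping classes absent from both prediction and label is one common convention, assumed here:

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], label: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(label, pred):
        cm[t][p] += 1
    return cm


def per_pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class is correct."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm: List[List[int]]) -> float:
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true other
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# Toy 8-pixel image with 3 hypothetical classes: road=0, car=1, tree=2.
label = [0, 0, 0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, label, 3)
print(per_pixel_accuracy(cm))  # 0.75
print(mean_iou(cm))            # (3/5 + 2/3 + 1/2) / 3 ≈ 0.589
```

Per-pixel accuracy is dominated by large classes such as road. Mean IoU weights every class equally, which is why it is usually the lower of the two numbers. In practice one confusion matrix is typically accumulated over the whole test set before either metric is computed.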

56 Failure case

57 Failure case

58 Open Source CycleGAN and pix2pix: among the most popular GitHub research projects since …; among the most cited papers in Graphics/CV/ML since 2017.

59 CycleGAN in Classes: CycleGAN results by students. MS emoji ↔ Apple emoji (Roger Grosse, UofT); input photo → stained glass art (Alena Harley, FastAI).

60 Applications and Extensions: Attribute Editing [Lu et al.] (Low-res, Bald, Bangs; arxiv:); Object Editing [Liang et al.] (Mask, Input, Output; arxiv:); Front/Character Transfer [Ignatov et al.] (Input, output; arxiv:); Data generation [Wang et al.] (samples by CycleWGAN; arxiv:).

61 Photo Enhancement. WESPE: Weakly Supervised Photo Enhancer for Digital Cameras. arXiv. Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool

62 Image Dehazing. Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing. CVPRW 2018. Deniz Engin, Anıl Genc, Hazım Kemal Ekenel

63 Unsupervised Motion Retargeting. Neural Kinematic Networks for Unsupervised Motion Retargetting. CVPR 2018 (oral). Ruben Villegas, Jimei Yang, Duygu Ceylan, Honglak Lee

64 Neural Kinematic Networks for Unsupervised Motion Retargetting. CVPR 2018 (oral) Ruben Villegas, Jimei Yang, Duygu Ceylan, Honglak Lee

65 Applications Beyond Computer Vision: medical imaging and biology [Wolterink et al., 2017]; voice conversion [Fang et al., 2018], [Kaneko et al., 2017]; cryptography [CipherGAN: Gomez et al., ICLR 2018]; robotics; NLP: unsupervised machine translation; NLP: text style transfer.

66 Input MR → Generated CT; Ground truth CT for comparison.

67 Latest from #CycleGAN: input dog → output cat; input cat → output dog (itok_msi)

68 CycleGAN for Customized Gaming (Chintan Trivedi): battle royale games, Fortnite input → PUBG style, at low resolution (256p/512p). Panels: Input, Final result.

69 Goals: Improve Control, Quality, and Resolution. GANs; pix2pix: conditional on user inputs; CycleGAN: learning without pairs; pix2pixHD: high quality and resolution.

70 The Curse of Dimensionality: pix2pix output for a semantic label map (Tree, Building, Car, Road, Sidewalk).

71 pix2pixHD: a coarse-to-fine generator with a low-res generator G_1 and a high-res generator G_2 (cf. Image Pyramid [Burt and Adelson, 1987]; also see [Zhang et al., 2017], [Karras et al., 2018]); a low-res discriminator D_1 and a high-res discriminator D_2, each asking real/fake? [Wang, Liu, Zhu, Tao, Kautz, Catanzaro, 2018]
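
Slide 71 relates the coarse-to-fine design to an image pyramid [Burt and Adelson, 1987]. The high-res discriminator D_2 judges the full-resolution image, and the low-res discriminator D_1 judges a downsampled copy. A minimal sketch of building such a pyramid with 2×2 average pooling follows. The pooling choice is an assumption: the Burt-Adelson Gaussian pyramid blurs with a small kernel before subsampling.

```python
from typing import List

Image = List[List[float]]


def downsample2x(img: Image) -> Image:
    """Average each 2x2 block (assumes even height and width)."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def image_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [full-res, half-res, quarter-res, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


# A 4x4 toy image with pixel values 0..15.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
full, half, quarter = image_pyramid(img, 3)
print(half)     # [[2.5, 4.5], [10.5, 12.5]]
print(quarter)  # [[7.5]]
```

A discriminator on the coarse level sees global layout with a large effective receptive field. The full-resolution discriminator can concentrate on fine texture and detail.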

72 pix2pixHD output for a semantic label map (Tree, Building, Car, Road, Sidewalk).

75 pix2pixHD for sketch → face

Improve Control, Quality, and Resolution: still a long way to go. Timeline: GANs (2014), pix2pix (2016), CycleGAN, pix2pixHD.

76 Improve Control, Quality, and Resolution: still a long way to go. GANs, pix2pix, CycleGAN, pix2pixHD. Learning to generate images from trillions of photos. Help more people tell their own visual stories.

78 Thank You! LynnHo
