LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND-DRAWN IMAGES
Kevin Ellis (MIT), Daniel Ritchie (Brown University), Armando Solar-Lezama (MIT), Joshua B. Tenenbaum (MIT)
Presented by: Maliha Arif
Advanced Computer Vision, CAP 6412
29th March 2018
OUTLINE
- Introduction / Motivation
- Roadmap
- Trace Hypothesis
- Stage 1: Obtaining a Trace Set Using a Neural Network
  - Training
  - Generalization to Hand Drawings
- Stage 2: Obtaining a Program from the Trace Set
  - Search Algorithm
  - Compression Factor
- Applications
MOTIVATION
Figure: pipeline from hand-drawn image, to trace set (rendered in LaTeX), to synthesized program.
TRACE HYPOTHESIS
The problem is split into two stages: Stage 1 infers a trace set (the drawing commands) from the image with a neural network; Stage 2 synthesizes a program from the trace set.
STAGE 1: NEURAL NETWORK
NEURAL NETWORK COMPONENTS
CNN layers:
- Layer 1: 20 8×8 convolutions, 2 16×4 convolutions, and 2 4×16 convolutions, followed by 8×8 pooling with a stride of 4.
- Layer 2: 10 8×8 convolutions, followed by 4×4 pooling with a stride of 4.
Drawing commands are predicted token by token using MLPs and STNs.
Example: a circle command is predicted first, then its X coordinate, then its Y coordinate.
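As a concrete reading of this layer specification, here is a minimal PyTorch sketch; the framework, the 'same'-style padding, the 1-channel 256×256 input, the ReLU nonlinearities, and the channel-wise concatenation of the three parallel filter banks are all assumptions, not details from the slides:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        # Layer 1: three parallel filter banks, concatenated on the channel axis
        self.conv8x8 = nn.Conv2d(1, 20, kernel_size=(8, 8), padding=4)
        self.conv16x4 = nn.Conv2d(1, 2, kernel_size=(16, 4), padding=(8, 2))
        self.conv4x16 = nn.Conv2d(1, 2, kernel_size=(4, 16), padding=(2, 8))
        self.pool1 = nn.MaxPool2d(kernel_size=8, stride=4)
        # Layer 2: 10 8x8 convolutions over the 24 concatenated channels
        self.conv2 = nn.Conv2d(24, 10, kernel_size=8, padding=4)
        self.pool2 = nn.MaxPool2d(kernel_size=4, stride=4)

    def forward(self, x):
        # x: (N, 1, 256, 256) -> (N, 24, 257, 257) after concatenation
        x = torch.cat([self.conv8x8(x), self.conv16x4(x), self.conv4x16(x)], dim=1)
        x = self.pool1(torch.relu(x))
        x = self.pool2(torch.relu(self.conv2(x)))
        return x.flatten(1)  # image features consumed by the MLP decoders
```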
NEURAL NETWORK
The first token (e.g., the circle command) is predicted by multinomial logistic regression, i.e., a softmax layer.
Subsequent tokens are predicted from the previously predicted tokens using a differentiable attention mechanism: an STN (Spatial Transformer Network).
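A minimal sketch of the first-token prediction; the vocabulary, hidden width, and embedding size here are invented for illustration, and the 2560-dimensional image features are assumed to come from the extractor sketched above:

```python
import torch
import torch.nn as nn

class TokenDecoder(nn.Module):
    """Hypothetical decoder: softmax (multinomial logistic regression) over
    the next drawing command, conditioned on image features and an
    embedding of the previously predicted tokens."""
    def __init__(self, feature_dim=2560, prev_dim=16, vocab_size=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + prev_dim, 64),
            nn.ReLU(),
            nn.Linear(64, vocab_size),  # e.g. {circle, line, rectangle, STOP}
        )

    def forward(self, image_features, prev_embedding):
        logits = self.mlp(torch.cat([image_features, prev_embedding], dim=-1))
        return torch.log_softmax(logits, dim=-1)
```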
STN: SPATIAL TRANSFORMER NETWORK
CNNs lack the ability to be spatially invariant in a computationally and parameter-efficient manner; max-pooling layers provide only limited spatial invariance, since their receptive fields are fixed and local.
An STN (Spatial Transformer Network) is a dynamic mechanism that can actively spatially transform an image or feature map.
Spatial Transformer Networks: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, NIPS 2015.
STN: SPATIAL TRANSFORMER NETWORKS
Spatial transformers can be incorporated into CNNs to benefit a variety of tasks, such as:
- Image classification
- Co-localization
- Spatial attention
Here they are used for spatial attention.
STN: SPATIAL TRANSFORMER NETWORKS
The spatial transformer mechanism is split into three parts: a localization network, a grid generator, and a sampler.
STN: SPATIAL TRANSFORMER NETWORKS
Grid generator + sampler: an affine transform maps a target sampling grid back onto the source; output pixels are defined to lie on a regular grid.
STN: SPATIAL TRANSFORMER NETWORKS
Figure: sampling from the source feature map onto the target grid.
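A minimal sketch of the grid generator + sampler using PyTorch's built-in STN primitives; the localization network that produces the affine parameters theta is omitted, and the identity transform is used in the usage example:

```python
import torch
import torch.nn.functional as F

def spatial_transform(feature_map, theta):
    """Apply a per-example 2x3 affine transform.

    feature_map: (N, C, H, W); theta: (N, 2, 3) affine matrices, e.g.
    produced by a small localization network (not shown here).
    """
    # Grid generator: map each output pixel on a regular grid back to a
    # (possibly sub-pixel) source location under the affine transform.
    grid = F.affine_grid(theta, feature_map.size(), align_corners=False)
    # Sampler: bilinearly interpolate the source at those locations.
    return F.grid_sample(feature_map, grid, align_corners=False)

# Usage: the identity transform leaves the feature map unchanged.
x = torch.randn(1, 1, 28, 28)
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
y = spatial_transform(x, theta)
```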
STAGE 1: NEURAL NETWORK
PROBABILITY DISTRIBUTION OF THE CNN
The network defines a distribution over the next drawing command and, by chaining these predictions, a distribution over entire traces.
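In symbols, the two distributions are related by the usual autoregressive factorization (the notation here is assumed; the slide gives no formula):

```latex
P_\theta[T \mid I] \;=\; \prod_{n=1}^{|T|} P_\theta\!\left[t_n \mid I,\, t_{1:n-1}\right]
```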
TRAINING
The network may predict incorrectly. To make inference more robust, SMC (Sequential Monte Carlo) sampling is employed, using pixel-wise distance as a surrogate for the likelihood function.
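A minimal sketch of this idea: maintain a population of partial traces, weight each by the pixel-distance surrogate likelihood, and resample. `propose_next_command`, `render`, and `pixel_distance` are hypothetical stand-ins for the network's proposal distribution, the renderer, and the distance used here, and `beta` is an assumed temperature:

```python
import numpy as np

def smc_infer_trace(target, num_particles=100, num_steps=20, beta=1.0):
    particles = [[] for _ in range(num_particles)]  # partial traces
    for _ in range(num_steps):
        # Proposal: the neural network extends each partial trace.
        particles = [p + [propose_next_command(target, p)] for p in particles]
        # Surrogate likelihood: exp(-beta * pixel distance to the target).
        weights = np.array([np.exp(-beta * pixel_distance(render(p), target))
                            for p in particles])
        weights = weights / weights.sum()
        # Resample particles in proportion to their weights.
        idx = np.random.choice(num_particles, size=num_particles, p=weights)
        particles = [list(particles[i]) for i in idx]
    return particles
```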
RESULTS
Edit distance = size of the symmetric difference between the ground-truth trace set and the trace set produced by the model.
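For trace sets stored as Python sets of hashable commands, this metric is one line:

```python
def edit_distance(predicted, ground_truth):
    # Size of the symmetric difference between the two trace sets.
    return len(predicted ^ ground_truth)
```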
GENERALIZING TO HAND DRAWINGS
Actual hand drawings are not used for training; instead, noise is introduced into the rendering of the training target images (sketched below) in the following ways:
- Rescaling the image intensity by a factor chosen from [0.5, 1.5]
- Translating the image by ±3 pixels, chosen uniformly
- Rendering the LaTeX using the pencildraw style
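A minimal NumPy sketch of the first two (pixel-level) corruptions, assuming a float image in [0, 1]; the pencildraw style applies at LaTeX render time and is not shown, and `np.roll` wraps at the border, a simplification of a true translation:

```python
import numpy as np

def corrupt(img, rng):
    # Rescale intensity by a factor drawn uniformly from [0.5, 1.5].
    img = np.clip(img * rng.uniform(0.5, 1.5), 0.0, 1.0)
    # Translate by up to +/-3 pixels per axis, chosen uniformly.
    dy, dx = rng.integers(-3, 4, size=2)
    return np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))

# Usage: rng = np.random.default_rng(0); noisy = corrupt(clean_image, rng)
```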
LIKELIHOOD FUNCTION L_learned
The model now predicts two scalars, |T1 − T2| and |T2 − T1|, where T1 is rendered with noise and T2 is rendered cleanly; each scalar counts the drawing commands present in one trace set but not the other.
EVALUATION
100 hand-drawn images on graph paper, snapped to a 16×16 grid.
IoU (intersection over union) between the ground-truth and predicted trace sets is used to measure the system's accuracy at recovering trace sets.
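With the same set representation as the edit-distance snippet above, the metric reads:

```python
def iou(predicted, ground_truth):
    # Intersection over union of two trace sets.
    return len(predicted & ground_truth) / len(predicted | ground_truth)
```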
STAGE 2: PROGRAM SYNTHESIS
COST FUNCTION
Cost is computed as follows:
- Programs incur a cost of 1 for each command (primitive drawing action, loop, or reflection).
- They incur a cost of 1/3 for each unique coefficient they use in a linear transformation.
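A sketch of this cost, assuming a hypothetical program representation as a list of commands, each carrying the coefficients of the linear transformations it uses:

```python
def program_cost(commands):
    # 1 per command (primitive drawing action, loop, or reflection),
    # plus 1/3 per *unique* coefficient across linear transformations.
    unique_coeffs = {c for cmd in commands for c in cmd.coefficients}
    return len(commands) + len(unique_coeffs) / 3.0
```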
BIAS-OPTIMAL SEARCH POLICY
A search algorithm is n-bias optimal w.r.t. a distribution P_bias[·] if it is guaranteed to find a solution in a sector σ of the program space after searching for at least time n · t(σ)/P_bias[σ], where t(σ) is the time needed to verify that σ contains a solution.
A 1-bias optimal search algorithm is used, which amounts to time-sharing: a fraction P_bias[σ] of the search time is spent in each sector σ, so the whole program space is eventually explored.
TRAINING
The search policy π_θ(σ | T) assigns probability to each sector σ of the program space given a trace set T.
A loss is defined to train the search policy, where D is the training corpus, i.e., minimum-cost programs for a small set of trace sets, and θ denotes the parameters of the search policy.
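One loss consistent with the 1-bias-optimal setup on the previous slide is the expected search time; this specific form is an assumption, since the slide omits the formula:

```latex
\mathrm{Loss}(\theta) \;=\; \mathbb{E}_{(T,\,\sigma)\sim D}\!\left[\frac{t(\sigma \mid T)}{\pi_\theta(\sigma \mid T)}\right]
```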
BILINEAR MODEL
The policy is a bilinear form, π_θ(σ | T) ∝ exp(φ(σ)ᵀ θ f(T)), where φ(σ) denotes a one-hot encoding of the parameter settings of σ, 24 dimensions long, and f(T) is a feature vector for the trace set T.
SEARCH POLICY - COMPARISON
COMPRESSION (TRACE SET AND PROGRAM)
The compression factor is the cost of the trace set divided by the cost of the synthesized program.
Ising model lattice structure: compression factor 21/6 = 3.5×.
Another example: compression factor 13/6 ≈ 2.2×.
APPLICATIONS
Correcting errors made by the neural network using the prior probability of programs.
Figure: hand drawing, trace set rendered in LaTeX, and the program synthesizer's output.
APPLICATIONS
Modelling similarity between drawings.
APPLICATIONS
Extrapolating figures.
Figure: two extrapolation examples.
THANK YOU
Discussion