
LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND-DRAWN IMAGES. Kevin Ellis - MIT, Daniel Ritchie - Brown University, Armando Solar-Lezama - MIT, Joshua B. Tenenbaum - MIT. Presented by: Maliha Arif, Advanced Computer Vision - CAP 6412, 29th March 2018

OUTLINE
- Introduction / Motivation
- Road map
- Trace Hypothesis
- Stage 1: Obtaining a Trace Set using a Neural Network; Training; Generalization to Hand Drawings
- Stage 2: Obtaining a Program from the Trace Set; Search Algorithm; Compression Factor
- Applications

MOTIVATION. Pipeline: hand-drawn image → trace set (rendered in LaTeX) → synthesized program.

TRACE HYPOTHESIS. The trace set serves as an intermediate representation between images and programs: Stage 1 maps a hand-drawn image to a trace set, and Stage 2 maps the trace set to a program.

STAGE 1: NEURAL NETWORK

NEURAL NETWORK COMPONENTS. CNN layers. Layer 1: 20 8×8 convolutions, 2 16×4 convolutions, and 2 4×16 convolutions, followed by 8×8 pooling with a stride of 4. Layer 2: 10 8×8 convolutions, followed by 4×4 pooling with a stride of 4. Drawing commands are predicted token by token using MLPs and STNs. For example, the circle command is predicted first, then its X coordinate, then its Y coordinate.
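A minimal PyTorch sketch of this convolutional front end, assuming a single-channel input image and treating the three Layer-1 filter shapes as parallel branches whose outputs are concatenated (the branching and the ImageEncoder name are assumptions, not details from the slide):

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Sketch of the slide's CNN: parallel 8x8, 16x4, and 4x16 filters, then pooling."""
    def __init__(self):
        super().__init__()
        # Layer 1: three filter shapes, assumed here to run in parallel on the image.
        self.conv_square = nn.Conv2d(1, 20, kernel_size=(8, 8), padding=(4, 4))
        self.conv_tall   = nn.Conv2d(1, 2,  kernel_size=(16, 4), padding=(8, 2))
        self.conv_wide   = nn.Conv2d(1, 2,  kernel_size=(4, 16), padding=(2, 8))
        self.pool1 = nn.MaxPool2d(kernel_size=8, stride=4)
        # Layer 2: 10 8x8 filters over the 24 concatenated channels.
        self.conv2 = nn.Conv2d(24, 10, kernel_size=8, padding=4)
        self.pool2 = nn.MaxPool2d(kernel_size=4, stride=4)
        self.act = nn.ReLU()

    def forward(self, image):                     # image: (batch, 1, H, W)
        x = torch.cat([self.act(self.conv_square(image)),
                       self.act(self.conv_tall(image)),
                       self.act(self.conv_wide(image))], dim=1)
        x = self.pool1(x)
        x = self.pool2(self.act(self.conv2(x)))
        return x.flatten(start_dim=1)             # one feature vector per image
```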

NEURAL NETWORK. The first token (e.g., the circle command) is predicted with multinomial logistic regression, i.e., a softmax layer. Subsequent tokens are predicted from the image and the previously predicted tokens using a differentiable attention mechanism: an STN (Spatial Transformer Network).
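A schematic of the token-by-token decoding loop, under the simplifying assumption that each token comes from its own softmax head and that attend() stands in for the STN-based attention; every name here is illustrative:

```python
import torch
import torch.nn as nn

class TokenHead(nn.Module):
    """One head per token position: multinomial logistic regression = linear + softmax."""
    def __init__(self, feat_dim, vocab_size):
        super().__init__()
        self.linear = nn.Linear(feat_dim, vocab_size)

    def forward(self, features):
        return torch.log_softmax(self.linear(features), dim=-1)

def decode(image_features, heads, attend, max_tokens=3):
    """Greedy autoregressive decoding: each token conditions on the ones before it."""
    tokens = []
    feats = image_features
    for t in range(max_tokens):
        logp = heads[t](feats)
        tokens.append(logp.argmax(dim=-1))
        # Re-focus the image features on the region implied by the tokens so far
        # (this is where the spatial transformer acts in the paper's model).
        feats = attend(image_features, tokens)
    return tokens
```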

STN: SPATIAL TRANSFORMER NETWORK. CNNs lack the ability to be spatially invariant to the input data in a computationally and parameter-efficient manner. Max-pooling layers provide only limited spatial invariance, because their receptive fields are fixed and local. An STN (Spatial Transformer Network) is a dynamic mechanism that can actively spatially transform an image or feature map. [Spatial Transformer Networks: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, NIPS 2015]

STN: SPATIAL TRANSFORMER NETWORKS. Spatial transformers can be incorporated into CNNs to benefit many tasks, such as image classification, co-localization, and spatial attention. We use them here for spatial attention.

STN: SPATIAL TRANSFORMER NETWORKS. The spatial transformer mechanism is split into three parts: a localisation network, a grid generator, and a sampler.

STN: SPATIAL TRANSFORMER NETWORKS. Grid generator + sampler: the grid generator applies an affine transform to a target sampling grid, where output pixels are defined to lie on a regular grid; the sampler then reads the input at the corresponding source locations.
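A minimal sketch of the grid generator and sampler using PyTorch's built-in affine_grid and grid_sample, assuming the affine transform arrives as a batch of 2×3 matrices theta from a localisation network:

```python
import torch
import torch.nn.functional as F

def spatial_transform(feature_map, theta):
    """Warp feature_map by the affine transform theta.

    feature_map: (batch, channels, H, W)
    theta:       (batch, 2, 3) affine matrices from a localisation network
    """
    # Grid generator: for every output pixel on a regular grid,
    # compute the (x, y) source coordinates under theta.
    grid = F.affine_grid(theta, feature_map.size(), align_corners=False)
    # Sampler: bilinearly interpolate the source at those coordinates.
    return F.grid_sample(feature_map, grid, align_corners=False)

# The identity transform leaves the image unchanged:
x = torch.randn(1, 1, 32, 32)
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
assert torch.allclose(spatial_transform(x, theta), x, atol=1e-4)
```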

STN: SPATIAL TRANSFORMER NETWORKS. (Figure: mapping between the source image and the target sampling grid.)

STAGE 1: NEURAL NETWORK

PROBABILITY DISTRIBUTION OF THE CNN. The network outputs a distribution over the next drawing command given the image and the commands predicted so far; chaining these gives a distribution over entire traces.
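In symbols, the factorization the slide is describing (a sketch; I is the image, t_1, ..., t_K are the predicted drawing commands, and a STOP token ends decoding):

```latex
P_\theta[T \mid I] \;=\; P_\theta[\mathrm{STOP} \mid I, t_{1:K}] \;\prod_{i=1}^{K} P_\theta\!\left[t_i \mid I, t_{1:i-1}\right]
```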

TRAINING. The network may predict incorrectly. To make inference more robust, SMC (Sequential Monte Carlo) sampling is employed, using pixel-wise distance as a surrogate for the likelihood function.
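A toy sketch of that SMC loop; propose_command(), render(), and pixel_distance() are hypothetical stand-ins for the network's proposal distribution, the LaTeX renderer, and an L2 pixel distance:

```python
import math
import random

def smc_infer(image, num_particles=100, num_steps=10, sigma=1.0):
    """Sequential Monte Carlo over partial traces.

    propose_command, render, and pixel_distance are stand-ins (not from
    the paper) for the model's proposal, the renderer, and pixel distance.
    """
    particles = [[] for _ in range(num_particles)]      # partial traces
    for _ in range(num_steps):
        # Extend each particle by sampling the next command from the network.
        particles = [trace + [propose_command(image, trace)] for trace in particles]
        # Surrogate likelihood: the closer the rendering, the larger the weight.
        weights = [math.exp(-pixel_distance(render(trace), image) / sigma)
                   for trace in particles]
        # Resample particles in proportion to their weights.
        particles = random.choices(particles, weights=weights, k=num_particles)
    # Return the particle whose rendering best matches the image.
    return min(particles, key=lambda tr: pixel_distance(render(tr), image))
```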

RESULTS. Edit distance = size of the symmetric difference between the ground-truth trace set and the set produced by the model.
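Concretely, with trace sets represented as Python sets of commands, the slide's metric is just the size of the symmetric difference:

```python
def edit_distance(ground_truth: set, predicted: set) -> int:
    """Size of the symmetric difference between two trace sets."""
    return len(ground_truth ^ predicted)

# e.g. one missing line and one spurious line => distance 2
assert edit_distance({"circle(1,2)", "line(0,0,4,4)"},
                     {"circle(1,2)", "line(1,1,4,4)"}) == 2
```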

GENERALIZING TO HAND DRAWINGS. Actual hand drawings are not used for training; instead, noise is introduced into the rendering of the training target images. The noisy targets are produced in the following ways (see the sketch below):
- rescaling the image intensity by a factor chosen uniformly from [0.5, 1.5]
- translating the image by ±3 pixels, chosen uniformly
- rendering the LaTeX using the pencildraw style
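A sketch of the first two perturbations with NumPy, assuming images are float arrays in [0, 1]; the pencildraw style is applied at the LaTeX rendering step and is not reproduced here:

```python
import numpy as np

def perturb(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the slide's intensity and translation noise to a rendered image."""
    # Rescale intensity by a factor drawn uniformly from [0.5, 1.5].
    out = np.clip(image * rng.uniform(0.5, 1.5), 0.0, 1.0)
    # Translate by up to +/-3 pixels along each axis, chosen uniformly.
    # (np.roll wraps around at the border; a real renderer would pad instead.)
    dy, dx = rng.integers(-3, 4, size=2)
    return np.roll(out, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(0)
noisy = perturb(np.full((64, 64), 0.8), rng)
```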

LIKELIHOOD FUNCTION L_learned. The model now predicts two scalars, |T1 − T2| and |T2 − T1| (the sizes of the two set differences between the trace sets), where T1 is the rendered output with noise and T2 is the clean rendered output.

EVALUATION. 100 hand-drawn images on graph paper, snapped to a 16×16 grid. Accuracy at recovering trace sets is measured by IoU: the intersection over union between the ground-truth and predicted trace sets.

STAGE 2: PROGRAM SYNTHESIS

COST FUNCTION. Cost is computed as follows: programs incur a cost of 1 for each command (primitive drawing action, loop, or reflection), and a cost of 1/3 for each unique coefficient they use in a linear transformation.
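As a sketch, assuming a program is represented as a list of commands that each carry the coefficients of their linear transformations (this representation is illustrative):

```python
def program_cost(commands):
    """Cost = 1 per command + 1/3 per *unique* linear-transform coefficient."""
    unique_coeffs = {c for cmd in commands for c in cmd.get("coeffs", [])}
    return len(commands) + len(unique_coeffs) / 3.0

prog = [
    {"op": "loop",    "coeffs": [2, 4]},   # e.g. indices of the form 2*i + 4
    {"op": "circle",  "coeffs": [2, 4]},   # reuses the same coefficients
    {"op": "reflect", "coeffs": []},
]
assert abs(program_cost(prog) - (3 + 2 / 3)) < 1e-9
```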

BIAS-OPTIMAL SEARCH POLICY. Bias optimality: a search algorithm is n-bias optimal w.r.t. a distribution P_bias[·] if it is guaranteed to find a solution in a program subspace σ after searching for at least time n × t(σ)/P_bias[σ], where t(σ) is the time needed to verify that σ contains a solution. We use a 1-bias optimal search algorithm, which means we spend a P_bias[σ] fraction of the search time looking in each program subspace σ; in effect, the whole program space is explored in parallel.
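A toy illustration of why time-sharing in proportion to P_bias is 1-bias optimal: each subspace σ receives a P_bias[σ] fraction of every unit of search time, so a solution verifiable in time t(σ) is found after at most t(σ)/P_bias[σ] total time. All names here are hypothetical:

```python
def bias_optimal_search(subspaces, p_bias, search_step, budget):
    """Time-share across program subspaces in proportion to p_bias.

    subspaces:   list of search states, one per program subspace sigma
    p_bias:      dict mapping subspace -> probability under the search policy
    search_step: advances one subspace by dt time; returns a solution or None
    """
    elapsed = 0.0
    while elapsed < budget:
        for sigma in subspaces:
            # Each subspace gets a slice of time proportional to its bias.
            dt = p_bias[sigma]
            solution = search_step(sigma, dt)
            if solution is not None:
                return solution
        elapsed += 1.0          # one full round = one unit of total search time
    return None
```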

TRAINING. The search policy is given by π_θ(σ | T). A loss is defined to train it: Loss(θ; D) = −Σ_{(T, σ) ∈ D} log π_θ(σ | T), where D is the training corpus, i.e., minimum-cost programs (and the subspaces σ containing them) for a set of trace sets, and θ denotes the parameters of the search policy.

BILINEAR MODEL. π_θ(σ | T) ∝ exp(φ_trace(T)ᵀ W φ_σ(σ)), where φ_σ(σ) denotes a one-hot encoding of the parameter settings of σ, 24 dimensions long, and φ_trace(T) is a feature vector for the trace set.
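A sketch of that bilinear scorer in PyTorch; the 24-dimensional one-hot encoding of σ is from the slide, while the trace-feature dimension and all names are assumptions:

```python
import torch
import torch.nn as nn

class BilinearPolicy(nn.Module):
    """pi(sigma | T) proportional to exp(phi_trace(T)^T W phi_sigma(sigma))."""
    def __init__(self, trace_dim=64, sigma_dim=24):
        super().__init__()
        self.W = nn.Parameter(torch.zeros(trace_dim, sigma_dim))

    def forward(self, trace_features, sigma_onehots):
        # trace_features: (trace_dim,)   sigma_onehots: (num_sigmas, sigma_dim)
        scores = sigma_onehots @ (self.W.t() @ trace_features)
        return torch.softmax(scores, dim=0)   # distribution over subspaces

policy = BilinearPolicy()
probs = policy(torch.randn(64), torch.eye(24))  # one candidate per parameter setting
assert torch.isclose(probs.sum(), torch.tensor(1.0))
```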

SEARCH POLICY - COMPARISON

COMPRESSION (TRACE SET AND PROGRAM). Ising-model lattice structure: compression factor 21/6 = 3.5× (trace-set size over program size). Another example: compression factor 13/6 ≈ 2.2×.

APPLICATIONS. Correcting errors made by the neural network using the prior probability of programs. (Figure columns: hand drawing, trace set rendered in LaTeX, program synthesizer's output.)

APPLICATIONS. Modelling similarity between drawings.

APPLICATIONS. Extrapolating figures (two examples shown).

THANK YOU. Discussion.