Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov

Similar documents
COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Neural style transfer

CS231N Project Final Report - Fast Mixed Style Transfer

Exploring Style Transfer: Extensions to Neural Style Transfer

A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge

Convolutional Neural Networks + Neural Style Transfer. Justin Johnson 2/1/2017

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs

Image Transformation via Neural Network Inversion

arxiv: v5 [cs.cv] 16 May 2018

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Recurrent Convolutional Neural Networks for Scene Labeling

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

CS 229 Final Report: Artistic Style Transfer for Face Portraits

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models

Dynamic Routing Between Capsules

Convolution Neural Networks for Chinese Handwriting Recognition

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

Machine Learning 13. week

Deep Learning and Its Applications

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Improving Semantic Style Transfer Using Guided Gram Matrices

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Deep Learning with Tensorflow AlexNet

Multi-style Transfer: Generalizing Fast Style Transfer to Several Genres

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

An Exploration of Computer Vision Techniques for Bird Species Classification

Computer Vision: Making machines see

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Lecture 7: Semantic Segmentation

A Deep Learning Approach to Vehicle Speed Estimation

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

Universal Style Transfer via Feature Transforms

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Histograms of Sparse Codes for Object Detection

MIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius

Structured Prediction using Convolutional Neural Networks

Real-time Object Detection CS 229 Course Project

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

A Deep Learning Framework for Authorship Classification of Paintings

CS 6501: Deep Learning for Computer Graphics. Training Neural Networks II. Connelly Barnes

arxiv: v2 [cs.cv] 11 Sep 2018

CS230: Lecture 3 Various Deep Learning Topics

MetaStyle: Three-Way Trade-Off Among Speed, Flexibility, and Quality in Neural Style Transfer

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

Decoder Network over Lightweight Reconstructed Feature for Fast Semantic Style Transfer

Lecture : Neural net: initialization, activations, normalizations and other practical details Anne Solberg March 10, 2017

Introduction to Neural Networks

Perceptron: This is convolution!

Transfer Learning. Style Transfer in Deep Learning

Advanced Video Analysis & Imaging

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Fuzzy Set Theory in Computer Vision: Example 3, Part II

A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Mattthias Bethge Presented by Weidi Xie (1st Oct 2015 )

Fast Patch-based Style Transfer of Arbitrary Style

arxiv: v1 [cs.cv] 17 Nov 2016

Graph Neural Network. learning algorithm and applications. Shujia Zhang

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

Deep Learning. Volker Tresp Summer 2014

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Using Machine Learning for Classification of Cancer Cells

Large-scale Video Classification with Convolutional Neural Networks

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

ImageNet Classification with Deep Convolutional Neural Networks

INF 5860 Machine learning for image classification. Lecture 11: Visualization Anne Solberg April 4, 2018

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

Machine Learning for Physicists Lecture 6. Summer 2017 University of Erlangen-Nuremberg Florian Marquardt

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications

Fuzzy Set Theory in Computer Vision: Example 3

Improving Face Recognition by Exploring Local Features with Visual Attention

Deep Learning for Computer Vision II

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Convolution Neural Network for Traditional Chinese Calligraphy Recognition

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage

Deep neural networks II

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Separating Style and Content for Generalized Style Transfer

INTRODUCTION TO DEEP LEARNING

Neural Bag-of-Features Learning

Bidirectional Recurrent Convolutional Networks for Video Super-Resolution

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

CSE 559A: Computer Vision

Where s Waldo? A Deep Learning approach to Template Matching

Neural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing

Real-time convolutional networks for sonar image classification in low-power embedded systems

What was Monet seeing while painting? Translating artworks to photo-realistic images M. Tomei, L. Baraldi, M. Cornia, R. Cucchiara

Toward Scale-Invariance and Position-Sensitive Region Proposal Networks

Como funciona o Deep Learning

How to Estimate the Energy Consumption of Deep Neural Networks

Deep Feature Interpolation for Image Content Changes

Spatial Control in Neural Style Transfer

Facial Expression Classification with Random Filters Feature Extraction

Gradient Descent Optimization Algorithms for Deep Learning Batch gradient descent Stochastic gradient descent Mini-batch gradient descent

Learning to Recognize Faces in Realistic Conditions

Transcription:

Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization Presented by: Karen Lucknavalai and Alexandr Kuznetsov

Example Style Content Result

Motivation Transforming content of an image to different styles Photos to Paintings Horses to Zebras Day to Night Current Approaches Optimization based Arbitrary styles Slow (takes minutes) Feed-forward based Limited number of styles Fast

Previous works Image Style Transfer Using CNNs by Gatys et al. Seminal work (2016) Separates style and content Uses VGG Slow optimization process A Learned Representation For Artistic Style by Dumoulin et al. Uses feed forward network Fast Supports 32 styles Restricted to trained styles

Contributions Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization Achieves speed comparable to existinsting approaches Not limited to predefined set of styles Single feed-forward network Adaptive Instance Normalization Flexible user controls Content-style Style interpolation Color and spatial controls

Background: Batch Normalization (BN) Designed to accelerate training are learned parameters during the training are calculated using pixels of the whole mini-batch Replaces them with statistics during the test time Discrepancy between train and test

Background: Instance Normalization (IN) Similar to batch normalization But are computed only across spatial domains For each channel and each sample Images are independent of the batch While for BN, it s computed for the whole batch Uses the same image statistics for training and for testing Ulyanov et al. shows significant improvement in style transfer by replacing BN with IN layers.

Background: Conditional Instance Normalization(CIN) Based on Instance Normalization Learn different set of parameters Supports multiple different styles (32) using a single network for each style s Instance Normalization supports only single style Requires retraining for new styles

Interpreting Instance Normalization Normalization of contrast of the content images doesn t help Instance normalization is style normalization Convolutional feature statistics captures style Different styles have different statistics

Adaptive Instance Normalization (AdaIN) Based on Instance Normalization, but sheared params are calculated, not learned Can be used for arbitrary styles without retraining Normalize content features (like IN) Shear content features by style feature statistics (mean and variance) Feature statistics captures style Spatial structure captures content Normalize by style of content, shear by desired style

Proposed Style Transfer Algorithm Content Result Style Fixed first few layers of pretrained VGG-net Decoder mirrors encoder Content Loss Style Loss Nearest upsampling instead of pooling No normalization layers in decoder AdaIN layer aligns the channel-wise mean and variance of content image to style image

VGG-19 Designed for image recognition 3x3 convolutional layers With max pooling layers in between Followed by fully connected layers And softmax Deeper layers capture complex features Image from Stanford cs231n

Visualizing deep convolutional neural networks using natural pre-images by Mahendran et al.

Training Trained using MS-COCO for the content images Paintings from WikiArt for the style images Adam optimizer (instead of a stochastic gradient descent procedure to update the network weights in backpropagation) During training the images are resized to be all be 256 x 256 Resized the smallest dimension to be 512 while preserving the aspect ratio Then randomly cropped to 256 x 256

Training Loss function computed from the pre-trained VGG-19 Content loss is the Euclidean distance between the target features and the features of the output from the AdaIN layer Used AdaIN output as content target The style loss is calculated in terms of mean and standard deviation since their AdaIN layer transfers the mean and standard deviation

Results Chen & schmidt - patch based Ulyanov et al. - fast feed-forward method that is restricted to a single style Gatys et al. - original optimization based method (flexible but slow)

Loss Comparisons Gatys method is 3 orders of magnitude slower Ulyanov method needed to be trained on the test styles AdaIN method did not see the test styles beforehand, therefore is the best

Comparison with Baselines Concatenation method unable to transfer just the style BN and IN decoders Fails to disentangle the style information from the content of the style image Unable to preserve the original content Underperformed with the style transfer AdaIN more effective at combining the content and style

Training Curves of Style and Content Loss Want low style and content loss AdaIN does the best job preserving both style and content

Speed Comparison Achieves speeds similar to the those who train on a limited set of styles while achieving the flexibility of being applicable to arbitrary styles * * Time in parentheses includes the style encoding procedure

Speed Comparison Achieves speeds similar to the those who train on a limited set of styles while achieving the flexibility of being applicable to arbitrary styles * * Time in parentheses includes the style encoding procedure

Speed Comparison Achieves speeds similar to the those who train on a limited set of styles while achieving the flexibility of being applicable to arbitrary styles * * Time in parentheses includes the style encoding procedure

Runtime Controls The Huang-Belongie method also allows users to control various properties of the style transfer including degree of stylization interpolation between different/multiple styles transferring the style while preserving original colors using different styles in different spatial regions All of these controls are applied at runtime using a network that has already been trained.

Content-Style trade off The degree of style transfer can be controlled in two ways 1. During training by adjusting the style weight λ in the loss function a. b. 2. The higher the value of λ the more the style will be transferred The lower the value of λ the less the style will be transferred At test time by interpolating between feature maps that are fed to the decoder. a. b. α = 0 the network will reconstruct the content image α = 1 the network will reconstruct the most stylized image

Multiple-Style Interpolation Interpolate between a set of K style images si with weights wi st the sum of the weights is one Content Image

Multiple-Style Interpolation

Multiple-Style Interpolation

Multiple-Style Interpolation

Color Control Matches the color distribution of the content image to the style image so the style image Now perform style transfer with this color-aligned style image

Spatial Control To style match different areas of an image to different styles Use masks to separate out the desired regions Run the AdaIN separately for each region

Prior Work Gaty s - 2016 Showed that DNN encode style and content of images Style and content are somewhat separable Optimization process manipulated pixel values to match feature statistics Ulyanov - 2017 Improved speed with feed-forward neural network Trained a network to modify pixel values to indirectly match feature statistics

Presented Work Huang & Belongie - 2017 Created single feedforward network with an Adaptive Instance Normalization Layer to transfer feature statistics from the style image to the content image Directly aligned image statistics in the feature space in one shot with the AdaIN layer then inverted the features back to pixel space

Future Work More advanced network architectures More complicated training schemes Residual architecture Architecture with additional skip connections from the encoder Incremental training Replace AdaIN with Correlation alignment Histogram matching Using higher-order statistics

Summary Compared Instance and Batch Normalization Showed that it is optimal to use neither in the decoder Showed that Instance Normalization performs a form of style normalization by normalizing feature statistics AdaIN Aligns the channel-wise mean & variance Allows for style transfer to arbitrary styles in real-time