Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Similar documents
Deep Learning with Tensorflow AlexNet

Deep Learning for Computer Vision II

arxiv: v1 [cs.lg] 16 Jan 2013

Deconvolutions in Convolutional Neural Networks

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Perceptron: This is convolution!

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Structured Prediction using Convolutional Neural Networks

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

Fuzzy Set Theory in Computer Vision: Example 3, Part II

ImageNet Classification with Deep Convolutional Neural Networks

Face Recognition A Deep Learning Approach

Deep Neural Networks:

Convolutional Neural Networks

INF 5860 Machine learning for image classification. Lecture 11: Visualization Anne Solberg April 4, 2018

Know your data - many types of networks

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?

Study of Residual Networks for Image Recognition

Two-Stream Convolutional Networks for Action Recognition in Videos

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Large-scale Video Classification with Convolutional Neural Networks

Rich feature hierarchies for accurate object detection and semantic segmentation

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

Advanced Video Analysis & Imaging

Rotation Invariance Neural Network

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Dynamic Routing Between Capsules

Weighted Convolutional Neural Network. Ensemble.

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

A Deep Learning Approach to Vehicle Speed Estimation

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. By Joa õ Carreira and Andrew Zisserman Presenter: Zhisheng Huang 03/02/2018

Computer Vision Lecture 16

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

Machine Learning. MGS Lecture 3: Deep Learning

3D model classification using convolutional neural network

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

Using Machine Learning for Classification of Cancer Cells

CPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2016

CNN Visualizations. Seoul AI Meetup Martin Kersner, 2018/01/06

Deep Learning for Computer Vision with MATLAB By Jon Cherrie

Multi-Glance Attention Models For Image Classification

Keras: Handwritten Digit Recognition using MNIST Dataset

Deep Learning. Volker Tresp Summer 2014

Deep Learning and Its Applications

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network

Fully Convolutional Network for Depth Estimation and Semantic Segmentation

Plankton Classification Using ConvNets

Facial Expression Classification with Random Filters Feature Extraction

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov

Training Convolutional Neural Networks for Translational Invariance on SAR ATR

Deep neural networks II

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Tiny ImageNet Visual Recognition Challenge

Return of the Devil in the Details: Delving Deep into Convolutional Nets

An Exploration of Computer Vision Techniques for Bird Species Classification

Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks *

Deep Learning & Neural Networks

All You Want To Know About CNNs. Yukun Zhu

M. Sc. (Artificial Intelligence and Machine Learning)

Automatic detection of books based on Faster R-CNN

Machine Learning 13. week

CNN Basics. Chongruo Wu

Advanced Machine Learning

Classification of objects from Video Data (Group 30)

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

Deep Learning Workshop. Nov. 20, 2015 Andrew Fishberg, Rowan Zellers

CSE 559A: Computer Vision

arxiv: v1 [cs.cv] 29 Sep 2016

Real-time convolutional networks for sonar image classification in low-power embedded systems

CAP 6412 Advanced Computer Vision

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Seminars in Artifiial Intelligenie and Robotiis

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Recurrent Convolutional Neural Networks for Scene Labeling

WE are witnessing a rapid, revolutionary change in our

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Pedestrian and Part Position Detection using a Regression-based Multiple Task Deep Convolutional Neural Network

Convolutional Networks in Scene Labelling

Transfer Learning. Style Transfer in Deep Learning

Automated Diagnosis of Vertebral Fractures using 2D and 3D Convolutional Networks

arxiv: v1 [stat.ml] 16 Nov 2017

Survey of Convolutional Neural Network

Kaggle Data Science Bowl 2017 Technical Report

Lecture 7: Semantic Segmentation

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

Single Image Depth Estimation via Deep Learning

Image and Video Understanding

Residual Networks for Tiny ImageNet

Recurrent Neural Networks and Transfer Learning for Action Recognition

Multi-Task Self-Supervised Visual Learning

Data Augmentation for Skin Lesion Analysis

Transcription:

Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer Vision 1/34

Background deconvnet Overview Creating a CNN Deep learning is a machine learning technique which utilizes multiple layers of Neural Networks. The basic Building Blocks are: Multiple Layers Bias Input Weighting (and not the weighting between Neurons ) Pooling Operation between layers Pooling together inputs which reduces the size of the input between layers. Can be overlapped like in the Alex Network ReLUs (Rectified Linear Units) Modeling a neuron s output and keeping it positive such as f (x) = max(0, x). Can also be tied to Local Response Normalization (brightness normalization) Types of Learning Stochastic Gradient Decent Restricted Boltzmann Machines Auto-encoders 2/34

deconvnet Overview Creating a CNN Kinds of Networks 3/34

History deconvnet Overview Creating a CNN 4/34

deconvnet Overview Creating a CNN What is a Convolutional Neuro-Network 5/34

deconvnet Overview Creating a CNN Data Augmentation 1 Augmenting data is crucial to avoid overfitting and build robustness 2 Ideally you want to cover all deformations occurring in target domain (not more) translations scales rotations contrast lighting colors flip 3 If possible, average or max over these transformations at test time 4 Allows for smaller datasets (Imagenet would be far too small without this for the Alex Network) 6/34

Mutliscale deconvnet Overview Creating a CNN 1 Skip Connections can improve multiscale detection 7/34

deconvnet Overview Creating a CNN Spatio-Temporal Features 1 If you know where you need to scale then you can do use multiple streams 8/34

deconvnet Overview Creating a CNN Different kinds of Networks 9/34

deconvnet Overview Creating a CNN ImageNet pre-training 1 Leveraging Previously Trained datasets (Transfer Learning) 10/34

deconvnet Overview Creating a CNN ImageNet pre-training 1 Size of your dataset dictates the number of levels (or at least the approach) 11/34

deconvnet Overview Creating a CNN Why are CNNs so good? 1 Size of your dataset dictates the number of levels (or at least the approach) 12/34

deconvnet Overview Creating a CNN Pedestrian Detection 13/34

deconvnet Overview Methods Final Network Analysis Discussion Table of Contents 1 Overview Creating a CNN 2 deconvnet Overview Methods Final Network Analysis Discussion 14/34

deconvnet Overview Methods Final Network Analysis Discussion Visualizing and Understanding Convolutional Networks Authors: Matthew D. Zeiler and Rob Fergus. Dept. of Computer Science, Courant Institute, NYU. Abstract: Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012). However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We also perform an ablation study to discover the performance contribution from different model layers. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets. 15/34

Purpose deconvnet Overview Methods Final Network Analysis Discussion The purpose of the paper is to propose a way of visualizing the inter workings of the Convolutional Neuro-Networks called a Deconvolutional Network (deconvnet) Understand in general Understand how the specific network is working Reverse the flow of the network to visualize the learned elements. Not a generative projection from the model! They are showing the highest activation levels of the filter from a validation set 16/34

deconvnet Overview Methods Final Network Analysis Discussion Basic Opperations Unpooling The unpooling step is to reverse of the pooling operation between layers. This normally is a one way function so they have to include variable (called switches) which represent which elements the pooling operation is selecting to move to the next layer. Rectification This operation makes sure that the output of the reverse direction is non-negative. This is done in the normal feed forward network stage as well. Filtering The CNN uses convolution with learned filters at beginning stage of each layer. To invert this, they use a transpose of the same filter but apply rectified maps to this element (instead of a different part of the layer) 17/34

deconvnet Overview Methods Final Network Analysis Discussion Visualization of Deconvnet 18/34

The Network deconvnet Overview Methods Final Network Analysis Discussion 19/34

Training deconvnet Overview Methods Final Network Analysis Discussion They use a dense amount of connections for the 3, 4, and 5 layers unlike the Alex Network which used sparse connections because of less hardware restrictions They trained on ImageNet 2012 training set (1.3 Million Images / 1000 different classes) Image Processing Steps Resize smallest dimension to 256 Cropping the Center 256x256 region Subtracting the per-pixel Mean (across all images) 10 different sub-crops of size 244x244 (corners + center with(out) horizontal flips) 20/34

deconvnet Overview Methods Final Network Analysis Discussion Training Parameters Continued Learning Parameters Stochastic gradient descent with a mini-batch size of 128 to update parameters Starting with a learning rate of 10 2 and a momentum term of 0.9 They anneal the learning rate throughout training manually when the validation error plateaus Dropout is used in the fully connected layers (6 and 7) with a rate of 0.5 All weights are initialized to 10 2 The visualization of the first layers showed that some filters were becoming too dominate so they renormalized layers with a high RMS It took them 12 days of training on a single GPU using 70 epochs 21/34

deconvnet Overview Methods Final Network Analysis Discussion Final Visualizations Part 1 22/34

deconvnet Overview Methods Final Network Analysis Discussion Final Visualizations Part 2 23/34

deconvnet Overview Methods Final Network Analysis Discussion Final Visualizations Part 3 l5 r1 c2 patches have little in common but the background l3 r1 c1 complex invariances capturing similar textures l4 significant variation but class specific l5 entire obj with significant pose variation 24/34

deconvnet Overview Methods Final Network Analysis Discussion Figure Evolution During Training 25/34

deconvnet Overview Methods Final Network Analysis Discussion Figure Invariance 26/34

deconvnet Overview Methods Final Network Analysis Discussion Architecture Selection 27/34

deconvnet Overview Methods Final Network Analysis Discussion Occlusion Sensitivity 28/34

deconvnet Overview Methods Final Network Analysis Discussion Correspondence Analysis 29/34

Results deconvnet Overview Methods Final Network Analysis Discussion First they Replicated the Alex Network results, than surpassed the best with error rate of 14.8% Then they played around with tweaking the network (results bottom right) Convolutional Part is important for best results 30/34

deconvnet Overview Methods Final Network Analysis Discussion Results Generalizaed on Caltech-101/256 Training on 15 or 30 randomly selected images per class and test on 50 per class The results showed how pre-training on a large dataset increase accuracy tremendously Just need 6 Caltech-256 training images to beat the next best which needs 10 times as many 31/34

deconvnet Overview Methods Final Network Analysis Discussion Results Generalizaed on Pascal 2012 They could not beat the state of the art here overall (but they did for some classes) Might be because of the difference in the datasets 3.2% from best reported result but who s counting 32/34

deconvnet Overview Methods Final Network Analysis Discussion Feature Analysis They add an SVM or softmax classifier at the end of each layer of the Imagenet-pretrained models They show how well each additional layer does at classifying the image 33/34

Discussion deconvnet Overview Methods Final Network Analysis Discussion Features are far from randomize Increasing Invariance while ascending through the levels Can use this to debug problems Occlusion can be use to localization of objects in the image The increase in classification is cause by holistic elements of the model and no specific aspect 34/34