Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper

Background: fully-connected single-layer neural networks. Feed-forward classification, trained through back-propagation.

Example Computer Vision Task: label these images

Neural Networks in Computer Vision. Introduced in the 1980s and abandoned in the 1990s. Images are complex signals: most of any image is background; pixel [x,y] in one image isn't the same 3D point as in another; targets can be any place or scale, sometimes any orientation. Fully-connected networks were too specific: not translationally invariant, not scale invariant. Fully-connected networks have too many weights: images have lots of pixels. Sometimes used to classify image chips after focus of attention.

So computer vision went in other directions. Object recognition (1990-2010): Local feature extraction: edges, homogeneous regions, high-information image patches. Features to descriptors: histograms of edges (e.g. HOG, SIFT), graphs of regions, image patch codes (e.g. BoW). Classify images based on descriptors: kernelized support vector machines (SVMs).

until deep convolutional neural networks arrived

Biological motivation: human vision. Cells in the early vision system (LGN/V1) have very small receptive fields (input from a specific small piece of the visual field), respond to specific local patterns, and are spatially distributed like an image. Tootell, et al. 1982.

Biological motivation: human vision (II). Further down the visual pathway (V8/ITC): larger receptive fields, more complex patterns, responses to wider variations. Some cells are viewpoint dependent, others are viewpoint independent. Translationally invariant within the cell's receptive field. Tsunoda, et al. 2001. Inspiration for DCNNs: mimic this structure.

1st Convolutional Layer. Input is an image. For every NxN window, there is a neuron; N is small. The neuron has a weight vector, a bias term and a nonlinear function f. Here's the 1st piece of magic: every neuron has the same weight vector, bias term and non-linear function.

Convolution. Convolution is the process of sliding one function over another and computing the dot product at every position. Formally, discrete 2D convolution is defined as: $(f * g)[x,y] = \sum_{i}\sum_{j} f[i,j]\, g[x-i,\, y-j]$. In practice, g[x,y] is non-zero over a small range N. When every neuron has the same weights, the layer as a whole computes a convolution of the input image with the shared weight mask.
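The sliding dot product described above can be sketched in plain Python. This is a minimal illustration rather than the slides' own code: the function name `conv_layer`, the toy image and the 2x2 edge mask are all assumptions, and the mask is applied without flipping (strictly a cross-correlation, which is the sliding dot product the slide describes).

```python
def conv_layer(image, weights, bias, f):
    """Slide one shared N x N weight mask over every N x N window of the image.

    image   : list of rows (H x W grid of numbers)
    weights : list of rows (N x N mask shared by every neuron)
    bias    : bias term shared by every neuron
    f       : non-linear function shared by every neuron
    Returns an (H-N+1) x (W-N+1) grid, one output per fully contained window.
    """
    n = len(weights)
    out_h = len(image) - n + 1
    out_w = len(image[0]) - n + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Dot product of the shared mask with the window whose
            # top-left corner is at (y, x), plus the shared bias.
            total = bias
            for i in range(n):
                for j in range(n):
                    total += weights[i][j] * image[y + i][x + j]
            row.append(f(total))
        output.append(row)
    return output


# Hypothetical toy input: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 mask that responds to a dark-to-bright vertical edge.
edge_mask = [
    [-1, 1],
    [-1, 1],
]
response = conv_layer(image, edge_mask, 0, lambda v: v)
print(response)  # [[0, 2, 0], [0, 2, 0]]
```

The output is itself a small image, and the only non-zero responses sit where the edge is.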

Why Convolution? Convolutional layers solve two problems. Small numbers of parameters: given a 3x3 window size, the entire layer has only 10 weights (9 mask weights plus 1 bias)! Translational invariance: translating the input just translates the output. In terms of biological motivation, the outputs look like early vision features: dot products with small masks, same features computed everywhere, output structured like an image. But 1 early vision feature is not enough!
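To put the parameter saving in numbers, the sketch below compares the 3x3 convolutional layer with a fully connected layer. The comparison case (a 256x256 single-channel image with one fully connected neuron per pixel) and both function names are assumptions chosen for illustration.

```python
def conv_layer_params(window):
    # One shared window x window weight mask plus one shared bias.
    return window * window + 1


def fully_connected_params(height, width, neurons):
    # Every neuron has its own weight for every pixel, plus its own bias.
    return neurons * (height * width + 1)


conv_params = conv_layer_params(3)
# Assumed comparison case: 256x256 grayscale image, one neuron per pixel.
fc_params = fully_connected_params(256, 256, 256 * 256)

print(conv_params)  # 10
print(fc_params)    # 4295032832
```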

1st layer: lots of feature banks. In practice, you don't have just 1 first-layer filter. You train a bank of filters (weight vectors). Krizhevsky et al. 2012 train 96 1st-level features. Here is an example of the trained weight vectors.

Max Pooling Layers. Convolutional layers transform (raw) images to (feature) images: the data didn't get smaller or become more symbolic, and there is no increase in receptive field size. Max pooling layers reduce resolution: for every non-overlapping NxN window, select the maximum value. No weights to train. Reduces size by a factor of N^2.

Convolution & Max Pooling. Piece of magic #2: alternate convolution and max pooling layers until the images get really small, then use a fully-connected backprop net.

Convolution & Max Pooling (II). Note that the convolution masks are 3D: Width x Height x Depth, where Depth = # of features. 1st layer: depth = 3 (color images: R, G & B). Other layers: depth = # of features at previous level. Max pooling increases the receptive field of following layers. The 1st layer computes local features, e.g. edges, bars & impulses. Successive layers combine previous features into larger features, until the image size is 1x1xD (or at least very small), at which point you have 2D translationally invariant features, which become the input to a fully connected single hidden layer.
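A minimal sketch of non-overlapping max pooling in plain Python follows. The function name `max_pool` and the toy feature image are assumptions, and the code assumes the image height and width are multiples of N.

```python
def max_pool(feature_image, n):
    """Keep the maximum of every non-overlapping n x n window.

    Assumes the height and width of feature_image are multiples of n.
    """
    height = len(feature_image)
    width = len(feature_image[0])
    pooled = []
    for y in range(0, height, n):
        row = []
        for x in range(0, width, n):
            # Collect the n*n values of this window and keep the largest.
            window = [feature_image[y + i][x + j]
                      for i in range(n) for j in range(n)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature image.
features = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(features, 2)
print(pooled)  # [[4, 2], [2, 8]]
```

With N = 2 the 16 input values shrink to 4, a reduction by N^2, and nothing in the function is trainable.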
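The effect of alternating the two layer types on image size and mask depth can be traced with a little arithmetic. The sketch below is illustrative only: the 32x32 RGB input, the 3x3 windows, the 2x2 pooling and the filter-bank sizes (8, 16, 32) are hypothetical choices, not values from the slides, and convolutions are assumed to keep only fully contained windows.

```python
def conv_shape(shape, window, num_filters):
    # Each mask spans the full input depth, so it holds
    # window * window * depth weights, and the output depth
    # equals the number of filters in the bank.
    height, width, depth = shape
    weights_per_mask = window * window * depth
    out_shape = (height - window + 1, width - window + 1, num_filters)
    return out_shape, weights_per_mask


def pool_shape(shape, n):
    # Non-overlapping n x n max pooling; depth is unchanged.
    # Integer division drops any leftover border row or column.
    height, width, depth = shape
    return (height // n, width // n, depth)


shape = (32, 32, 3)  # hypothetical small color image: depth 3 for R, G, B
trace = [shape]
mask_sizes = []
for num_filters in (8, 16, 32):  # hypothetical filter-bank sizes
    shape, weights_per_mask = conv_shape(shape, 3, num_filters)
    mask_sizes.append(weights_per_mask)
    trace.append(shape)
    shape = pool_shape(shape, 2)
    trace.append(shape)

print(trace)
print(mask_sizes)  # [27, 72, 144]
```

After three convolution and pooling pairs the image has shrunk from 32x32x3 to 2x2x32, small enough to hand to a fully connected layer.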

Training (light). The algorithm to train a deep convolutional network is still back-propagation: present an image to the network and calculate the output; compute the error between actual output and target output; propagate the error backwards through the network; update weights based on error gradients. The last layer is a traditional output layer, often with one node per label. The penultimate layer is a fully-connected hidden layer. We need to back-propagate error gradients through max pooling layers and convolutional layers.

Review: updating output layer weights Error function: Chain rule (no non-linear function):
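A standard form of this update can be written out as follows. It is a sketch under stated assumptions, not the slide's own notation: a squared-error loss, a linear output $o_k$, hidden activations $h_j$, targets $t_k$, weights $w_{jk}$ and a learning rate $\eta$ are all symbols introduced here for illustration.

```latex
% Assumed squared-error loss and linear output unit
E = \tfrac{1}{2} \sum_k (t_k - o_k)^2, \qquad o_k = \sum_j w_{jk}\, h_j

% Chain rule with no non-linear function
\frac{\partial E}{\partial w_{jk}}
  = \frac{\partial E}{\partial o_k}\,\frac{\partial o_k}{\partial w_{jk}}
  = -(t_k - o_k)\, h_j

% Gradient-descent update
\Delta w_{jk} = -\eta\, \frac{\partial E}{\partial w_{jk}} = \eta\,(t_k - o_k)\, h_j
```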

Review: Updating with non-linear functions Chain rule (output): If F = Sigmoid, then So Chain rule for hidden layer: Where
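The same derivation with a non-linear function can be sketched as below. The notation is again an assumption made for illustration: squared error $E = \tfrac{1}{2}\sum_k (t_k - o_k)^2$, output $o_k = F(a_k)$ with $a_k = \sum_j w_{jk} h_j$, hidden units $h_j = F(a_j)$ with $a_j = \sum_i v_{ij} x_i$, and inputs $x_i$.

```latex
% Chain rule (output layer) with a non-linear function F
\frac{\partial E}{\partial w_{jk}}
  = \frac{\partial E}{\partial o_k}\,\frac{\partial o_k}{\partial a_k}\,\frac{\partial a_k}{\partial w_{jk}}
  = -(t_k - o_k)\, F'(a_k)\, h_j

% If F is the sigmoid, F'(a) = F(a)\,(1 - F(a)), so
\frac{\partial E}{\partial w_{jk}} = -(t_k - o_k)\, o_k\,(1 - o_k)\, h_j

% Chain rule for the hidden layer
\frac{\partial E}{\partial v_{ij}} = -\delta_j\, x_i

% where
\delta_j = h_j\,(1 - h_j) \sum_k \delta_k\, w_{jk},
\qquad \delta_k = (t_k - o_k)\, o_k\,(1 - o_k)
```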

Updating max pooling layers. Q: How do we update a max pooling node? A: We don't! It has no weights to update! (trick question)

Updating convolutional layers Consider just one convolutional neuron It feeds a single max pooling element Which feeds elements j We will use q to index both pooling & convolutional neurons Let be the back-propagated gradient to the output of the pooling node Pq. Then

Updating convolutional layers (II). This makes the update gradients sparse. But every convolutional neuron has to share weights, so average the $\Delta w_{i,j}$ values across the layer.
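One way to picture the sparse gradients and the weight sharing is the plain-Python sketch below. It is an illustration under simplifying assumptions, not the slides' derivation: the convolutional neurons are linear (no bias, no non-linearity), the function name `shared_weight_gradient` and the toy data are made up, and ties in a pooling window go to whichever neuron Python's tuple comparison ranks highest.

```python
def shared_weight_gradient(image, weights, pool_n, pooled_grad):
    """Gradient for the shared mask, given gradients at the pooling outputs.

    pooled_grad[py][px] is the back-propagated gradient arriving at the
    output of pooling node (py, px).
    """
    n = len(weights)
    out_h = len(image) - n + 1
    out_w = len(image[0]) - n + 1
    # Forward pass: linear convolutional neurons with shared weights.
    conv = [[sum(weights[i][j] * image[y + i][x + j]
                 for i in range(n) for j in range(n))
             for x in range(out_w)] for y in range(out_h)]
    grad_sum = [[0.0] * n for _ in range(n)]
    for py in range(out_h // pool_n):
        for px in range(out_w // pool_n):
            # Find the convolutional neuron that won this pooling window.
            candidates = [(conv[py * pool_n + a][px * pool_n + b],
                           py * pool_n + a, px * pool_n + b)
                          for a in range(pool_n) for b in range(pool_n)]
            _, wy, wx = max(candidates)
            # Only the winner receives the pooling node's gradient; every
            # other neuron in the window contributes zero (sparse update).
            for i in range(n):
                for j in range(n):
                    grad_sum[i][j] += pooled_grad[py][px] * image[wy + i][wx + j]
    # Shared weights: average the per-neuron gradients across the layer.
    num_neurons = out_h * out_w
    return [[g / num_neurons for g in row] for row in grad_sum]


# Hypothetical toy data: 3x3 image, 2x2 mask, one 2x2 pooling window.
image = [
    [1, 2, 0],
    [0, 1, 3],
    [2, 0, 1],
]
weights = [
    [1, 0],
    [0, 1],
]
grad = shared_weight_gradient(image, weights, 2, [[2.0]])
print(grad)  # [[1.0, 0.0], [0.5, 1.5]]
```

Here the four convolutional outputs are 2, 5, 0 and 2, so only the neuron that produced 5 gets a gradient, and that single contribution is then averaged over all four neurons.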

One more thing: convolutional layers rarely use tanh or sigmoid. More often, they use f(x) = max(0, x), the rectified linear unit (ReLU). This function is non-linear, has trivial derivatives, and avoids vanishing (positive) gradients. This is the 3rd piece of magic.
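Assuming the function meant here is the rectified linear unit, f(x) = max(0, x), the sketch below contrasts its derivative with the sigmoid's. The function names are chosen for illustration, and the derivative value at exactly 0 is a convention.

```python
import math


def relu(x):
    # Rectified linear unit: passes positive inputs, zeroes the rest.
    return max(0.0, x)


def relu_derivative(x):
    # 1 for positive inputs, 0 otherwise (the value at 0 is a convention).
    return 1.0 if x > 0 else 0.0


def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


for x in (0.5, 5.0, 10.0):
    # The sigmoid's gradient shrinks quickly as x grows; ReLU's stays at 1.
    print(x, relu_derivative(x), sigmoid_derivative(x))
```

For large positive inputs the sigmoid's derivative approaches zero, while the ReLU derivative stays at 1, which is the sense in which positive gradients do not vanish.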

Deep Convolutional Nets in practice. There are three well-known, publicly available deep convolutional nets for object recognition: AlexNet (Krizhevsky, Sutskever & Hinton 2012), GoogLeNet (Szegedy, et al. 2015), and DeCAF (Donahue, et al. 2013). Why just three? They take many training images and a very long time to train, even given lots of GPUs.

AlexNet architecture (Krizhevsky, et al. 2012). Deep filter banks (max 192). Stride layer (instead of max pooling) at beginning. Window sizes range from 3x3 to 11x11. Actually, two nets that merge at the end (for efficiency). The majority of weights are in the last two fully connected layers.

Training AlexNet. Images resized to 256x256. 60 million parameters. 1,000 training classes. 90 training cycles, with 1.2 million training images per cycle. 6 days on two NVIDIA GTX 580 3GB GPUs.
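The figures above imply a rough training throughput, worked out in the short sketch below. It only does arithmetic on the numbers quoted on the slide and assumes the 6 days were spent entirely on training passes.

```python
cycles = 90
images_per_cycle = 1_200_000
days = 6

total_presentations = cycles * images_per_cycle  # images shown to the network
seconds = days * 24 * 60 * 60
images_per_second = total_presentations / seconds

print(total_presentations)       # 108000000
print(round(images_per_second))  # 208
```

That is roughly 200 images per second across the two GPUs.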