ImageNet Classification with Deep Convolutional Neural Networks

Transcription:

ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton. University of Toronto, Canada. Paper with the same name to appear in NIPS 2012.

Main idea / Architecture / Technical details

Neural networks. [Diagram: a single neuron, and a small network with data, hidden, and output layers.] A neuron's total input is x = w1 f(z1) + w2 f(z2) + w3 f(z3), where the f(z_i) are the outputs of the neurons feeding into it; x is called the total input to the neuron, and f(x) is its output. A neural network computes a differentiable function of its input. For example, ours computes p(label | an input image).
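To make the slide's arithmetic concrete, here is a minimal NumPy sketch (not from the talk) of the single-neuron computation; the tanh choice for f and the example numbers are assumptions for illustration, since the slide leaves f unspecified.

```python
import numpy as np

def neuron(inputs, weights, f=np.tanh):
    """Single neuron: total input x = sum_i w_i * f(z_i), output f(x).

    `inputs` are the incoming activities f(z_i), `weights` the w_i;
    tanh is used here purely as an example non-linearity."""
    x = np.dot(weights, inputs)   # total input to the neuron
    return f(x)                   # the neuron's output

# Three incoming activities and three weights, as in the slide's diagram.
print(neuron(np.array([0.2, -0.5, 0.8]), np.array([0.1, 0.4, -0.3])))
```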

Convolutional neural networks. Here's a one-dimensional convolutional neural network [Diagram: data, hidden, and output layers]: each hidden neuron applies the same localized, linear filter to the input.
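As a rough illustration of "the same localized, linear filter applied everywhere", here is a small NumPy sketch of a 1-D convolutional layer; the filter length, example signal, and ReLU non-linearity are made up for the example.

```python
import numpy as np

def conv1d_layer(signal, filt, f=lambda x: np.maximum(0.0, x)):
    """Slide one shared linear filter along the input; every hidden
    neuron uses the same weights and sees a small local window."""
    n, k = len(signal), len(filt)
    hidden = np.empty(n - k + 1)
    for i in range(n - k + 1):
        hidden[i] = np.dot(filt, signal[i:i + k])  # local, linear filtering
    return f(hidden)                               # point-wise non-linearity

data = np.array([0., 1., 2., 1., 0., -1., -2., -1.])
print(conv1d_layer(data, np.array([-1., 0., 1.])))  # a simple edge-like filter
```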

Convolution in 2D. [Diagram: an input image convolved with a filter bank to produce output maps.]
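In 2-D, each filter in the bank is slid over the input image to produce one output map. A deliberately simple (and slow) NumPy sketch, with made-up sizes and no padding or stride:

```python
import numpy as np

def conv2d_filter_bank(image, filters):
    """image: (H, W); filters: (num_filters, kH, kW).
    Returns one output map per filter (valid convolution)."""
    H, W = image.shape
    nF, kH, kW = filters.shape
    out = np.empty((nF, H - kH + 1, W - kW + 1))
    for f in range(nF):
        for i in range(H - kH + 1):
            for j in range(W - kW + 1):
                out[f, i, j] = np.sum(filters[f] * image[i:i + kH, j:j + kW])
    return out

image = np.random.rand(8, 8)
filters = np.random.rand(3, 3, 3)                # a bank of three 3x3 filters
print(conv2d_filter_bank(image, filters).shape)  # -> (3, 6, 6)
```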

Local pooling (max).
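A sketch of local max pooling over non-overlapping 2x2 windows; the window size and non-overlapping layout are assumptions for illustration, not taken from the talk.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: each output value is the maximum
    of a size x size block of the input feature map."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    blocks = feature_map[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    return blocks.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fm))   # 2x2 output, each entry the max of a 2x2 block
```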

Overview of our model. Deep: 7 hidden weight layers. Learned: all feature extractors are initialized to white Gaussian noise and learned from the data. Entirely supervised. More data = good. [Diagram: a stack of convolutional layers on top of the image, followed by fully-connected layers.] Convolutional layer: convolves its input with a bank of 3D filters, then applies a point-wise non-linearity. Fully-connected layer: applies linear filters to its input, then applies a point-wise non-linearity.

Overview of our model. Trained with stochastic gradient descent on two NVIDIA GPUs for about a week. 650,000 neurons; 60,000,000 parameters; 630,000,000 connections. Final feature layer: 4096-dimensional. [Same diagram as above: convolutional layers on the image, followed by fully-connected layers.]

96 learned low-level filters

Main idea / Architecture / Technical details

Training. Uses stochastic gradient descent and the backpropagation algorithm (just repeated application of the chain rule). [Diagram: forward and backward passes through the image, the local convolutional filters, and the fully-connected filters.]
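The training loop can be sketched on a tiny one-layer network; the sizes, learning rate, squared-error loss, and ReLU non-linearity below are illustrative assumptions rather than the paper's settings, but the mechanics (forward pass, chain rule, gradient step) are the same.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(0.05, 0.10, size=(3, 5))      # small random initial weights
x = np.abs(rng.normal(size=5)) + 0.1          # one toy "training image"
target = np.array([1.0, 0.0, 0.5])            # its desired output

def forward(x):
    """Forward pass: linear filtering followed by a point-wise non-linearity."""
    return np.maximum(0.0, W @ x)

def sgd_step(x, target, lr=0.01):
    """One stochastic-gradient step, with gradients obtained by
    backpropagation (repeated application of the chain rule)."""
    global W
    h = W @ x                                  # total input to the hidden units
    y = np.maximum(0.0, h)                     # forward pass
    grad_y = y - target                        # dLoss/dy for squared error
    grad_h = grad_y * (h > 0)                  # chain rule through the ReLU
    W -= lr * np.outer(grad_h, x)              # chain rule through the linear map

print("before:", forward(x))
for _ in range(500):
    sgd_step(x, target)
print("after: ", forward(x))                   # much closer to the target
```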

Our model. Max-pooling layers follow the first, second, and fifth convolutional layers. The number of neurons in each layer is 253440, 186624, 64896, 64896, 43264, 4096, 4096, 1000.

Main idea / Architecture / Technical details

Input representation. Centered (0-mean) RGB values: an input image (256x256) minus the mean input image.
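A sketch of this preprocessing in NumPy; the number of images and the pixel value range are assumptions for illustration.

```python
import numpy as np

# Pretend training set: N images of 256x256 RGB values in [0, 255].
images = np.random.randint(0, 256, size=(10, 256, 256, 3)).astype(np.float32)

mean_image = images.mean(axis=0)        # the mean input image
centered = images - mean_image          # centered (0-mean) RGB values

print(centered.mean())                  # approximately 0
```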

Neurons. As before, x = w1 f(z1) + w2 f(z2) + w3 f(z3) is called the total input to the neuron, and f(x) is its output. f(x) = tanh(x): very bad (slow to train). f(x) = max(0, x): very good (quick to train).
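For reference, the two non-linearities compared above as plain NumPy functions; the usual explanation for tanh's slower training is that it saturates (near-zero gradients for large |x|), offered here as context rather than as a claim from the talk.

```python
import numpy as np

def tanh_unit(x):
    return np.tanh(x)                 # saturates: slope -> 0 for large |x|

def relu_unit(x):
    return np.maximum(0.0, x)         # non-saturating for x > 0

xs = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(tanh_unit(xs))                  # values squashed into (-1, 1)
print(relu_unit(xs))                  # negative inputs clipped to 0
```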

Data augmentation. Our neural net has 60M real-valued parameters and 650,000 neurons, so it overfits a lot. Therefore we train on 224x224 patches extracted randomly from the 256x256 images, and also on their horizontal reflections.
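A sketch of this augmentation for a single 256x256 image; the per-sample random crop position and the 50% chance of reflection are implementation assumptions, not necessarily the paper's exact sampling scheme.

```python
import numpy as np

rng = np.random.default_rng()

def random_patch(image, patch=224):
    """Extract a random patch x patch crop from the image and flip it
    horizontally half of the time."""
    H, W = image.shape[:2]
    top = rng.integers(0, H - patch + 1)
    left = rng.integers(0, W - patch + 1)
    crop = image[top:top + patch, left:left + patch]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]          # horizontal reflection
    return crop

image = np.random.rand(256, 256, 3)
print(random_patch(image).shape)      # (224, 224, 3)
```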

Testing. Average the predictions made at five 224x224 patches (the four corner patches and the center patch) and their horizontal reflections. Logistic regression has the nice property that it outputs a probability distribution over the class labels, so no score normalization or calibration is necessary to combine the predictions of different models (or of the same model on different patches), as would be necessary with an SVM.
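A sketch of the ten-patch averaging; `predict` is a hypothetical stand-in for the trained network and is assumed to return a probability distribution over the 1000 classes.

```python
import numpy as np

def ten_patch_predict(image, predict, patch=224):
    """Average class probabilities over the four corner patches, the
    center patch, and the horizontal reflection of each (10 in total)."""
    H, W = image.shape[:2]
    tops = [0, 0, H - patch, H - patch, (H - patch) // 2]
    lefts = [0, W - patch, 0, W - patch, (W - patch) // 2]
    probs = []
    for t, l in zip(tops, lefts):
        crop = image[t:t + patch, l:l + patch]
        probs.append(predict(crop))
        probs.append(predict(crop[:, ::-1]))   # reflected patch
    return np.mean(probs, axis=0)              # still a valid distribution

# Toy stand-in for the network: uniform probabilities over 1000 classes.
dummy_predict = lambda crop: np.full(1000, 1e-3)
print(ten_patch_predict(np.random.rand(256, 256, 3), dummy_predict).sum())  # ~1.0
```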

Dropout. Independently set each hidden unit's activity to zero with probability 0.5. We do this in the two globally-connected hidden layers at the net's output. [Diagram: a hidden layer's activity on a given training image, with some hidden units turned off by dropout and the rest unchanged.]
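A sketch of training-time dropout on one hidden layer's activities; how the activities are rescaled or averaged at test time is omitted here.

```python
import numpy as np

rng = np.random.default_rng()

def dropout(activities, p_drop=0.5):
    """Independently set each hidden unit's activity to zero with
    probability p_drop; the surviving units are left unchanged."""
    mask = rng.random(activities.shape) >= p_drop
    return activities * mask

hidden = np.random.rand(8)
print(dropout(hidden))        # roughly half of the activities zeroed
```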

Implementation. The only thing that needs to be stored on disk is the raw image data; we stored it in JPEG format. It can be loaded and decoded entirely in parallel with training, so only 27GB of disk storage is needed to train this system. The system uses about 2GB of RAM on each GPU and around 5GB of system memory during training.

Implementation. Written in Python/C++/CUDA. Training works sort of like an instruction pipeline, with the following four operations happening in parallel: train on batch n (on the GPUs); copy batch n+1 to GPU memory; transform batch n+2 (on the CPU); load batch n+3 from disk (on the CPU).
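A rough sketch of such a four-stage pipeline using Python threads and bounded queues; the stage bodies are placeholders, since the real stages are JPEG decoding, CPU-side transformation, host-to-GPU copies, and GPU training steps.

```python
import queue, threading

def stage(inbox, outbox, work):
    """Generic pipeline stage: take a batch, process it, pass it on."""
    while True:
        batch = inbox.get()
        if batch is None:                  # shutdown signal
            if outbox is not None:
                outbox.put(None)
            return
        if outbox is not None:
            outbox.put(work(batch))
        else:
            work(batch)

# Placeholder stage bodies standing in for loading, transforming,
# copying to the GPU, and training.
load      = lambda b: f"loaded {b}"
transform = lambda b: f"transformed ({b})"
copy_gpu  = lambda b: f"on GPU ({b})"
train     = lambda b: print(f"trained on ({b})")

q0, q1, q2, q3 = (queue.Queue(maxsize=1) for _ in range(4))
stages = [(q0, q1, load), (q1, q2, transform), (q2, q3, copy_gpu), (q3, None, train)]
threads = [threading.Thread(target=stage, args=s) for s in stages]
for t in threads:
    t.start()
for n in range(4):                         # feed four batch indices through
    q0.put(f"batch {n}")
q0.put(None)                               # shut the pipeline down
for t in threads:
    t.join()
```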

Validation classification

Validation classification

Validation classification

Validation localizations

Validation localizations

Retrieval experiments. The first column contains query images from the ILSVRC-2010 test set; the remaining columns contain retrieved images from the training set.

Retrieval experiments