Using neural nets to recognize hand-written digits Srikumar Ramalingam School of Computing University of Utah

Reference Most of the slides are taken from the first chapter of the online book by Michael Nielsen: neuralnetworksanddeeplearning.com

Introduction Deep learning allows computational models that are composed of multiple layers to learn representations of data. It has significantly improved state-of-the-art results in speech recognition, visual object recognition, object detection, drug discovery, and genomics. The word "deep" comes from having multiple layers of non-linearity. [Source: Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep Learning, Nature 2015]

Introduction The word "neural" is used because these models are loosely inspired by neuroscience. The goal is generally to approximate some function f, e.g., consider a classifier y = f(x). We define a mapping y = f(x; θ) and learn the value of the parameters θ that results in the best function approximation. A feedforward network is a specific type of deep neural network where information flows through the function being evaluated from the input x, through the intermediate computations used to define f, and finally to the output y.

Perceptron A perceptron takes several Boolean inputs (x1, x2, x3) and returns a Boolean output: it outputs 1 if the weighted sum Σj wj xj exceeds a threshold, and 0 otherwise. The weights (w1, w2, w3) and the threshold are real numbers.

Simplification (Threshold -> Bias) Move the threshold to the other side of the inequality and call it the bias b = −threshold: the perceptron outputs 1 if w·x + b > 0, and 0 otherwise.
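A minimal sketch of the two equivalent forms in Python (the weights, threshold, and input below are arbitrary illustrative values, not taken from the slides):

```python
import numpy as np

def perceptron_threshold(x, w, threshold):
    """Original form: output 1 if the weighted sum exceeds the threshold."""
    return 1 if np.dot(w, x) > threshold else 0

def perceptron_bias(x, w, b):
    """Equivalent form with the threshold moved across as a bias b = -threshold."""
    return 1 if np.dot(w, x) + b > 0 else 0

# The two forms agree for any input (arbitrary illustrative parameters).
w, threshold = np.array([0.5, -1.0, 0.8]), 0.3
x = np.array([1, 0, 1])
print(perceptron_threshold(x, w, threshold), perceptron_bias(x, w, -threshold))
```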

NAND gate using a perceptron - NAND is equivalent to NOT AND: it outputs 0 only when both inputs are 1.
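One choice of parameters that implements NAND is both weights equal to −2 and a bias of 3; a quick truth-table check in Python:

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron rule: output 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# NAND: both weights -2, bias 3 -> output is 0 only for input (1, 1).
w_nand, b_nand = np.array([-2, -2]), 3
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), perceptron(np.array([x1, x2]), w_nand, b_nand))
```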

It's an old paradigm The first learning machine: the Perceptron, built at Cornell in 1960. The Perceptron was a linear classifier on top of a simple feature extractor: y = sign(Σ_{i=1..N} Wi Fi(X) + b). The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching. Designing a feature extractor requires considerable effort by experts. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Design the weights and thresholds for the following truth tables: (1) when all three Boolean variables are 1s, we output 1, otherwise we output 0; (2) when all three Boolean variables are 0s, we output 1, otherwise we output 0; (3) when two of the three Boolean variables are 1s, we output 1, otherwise we output 0. A verification sketch for the first case follows below.
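For the first case (output 1 only when all three inputs are 1), one choice that works is unit weights with a threshold of 2.5; the parameters are my own illustrative choice, not taken from the slides:

```python
from itertools import product

def perceptron(x, w, threshold):
    """Output 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

# Case 1: fire only when all three inputs are 1.
# Unit weights and a threshold of 2.5 work: the sum reaches 3 only for (1, 1, 1).
w, threshold = (1, 1, 1), 2.5
for x in product((0, 1), repeat=3):
    print(x, perceptron(x, w, threshold))
```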

NAND is universal for computation: both the XOR gate and the AND gate can be built entirely from NAND gates.

OR gate using perceptrons?

Sigmoid neuron A sigmoid neuron takes real numbers (x1, x2, x3) within 0 to 1 and returns a number within 0 to 1. The weights (w1, w2, w3) and the bias term b are real numbers.

Sigmoid function σ(z) = 1 / (1 + e^(−z)), applied to the weighted input z = Σj wj xj + b.

The sigmoid function can be seen as a smoothed step function.
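A minimal sigmoid-neuron sketch in Python (the input, weights, and bias below are illustrative placeholders):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: a smooth version of the step function."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    """Output of a sigmoid neuron: sigmoid of the weighted input plus bias."""
    return sigmoid(np.dot(w, x) + b)

# Illustrative values: inputs in [0, 1], real-valued weights and bias.
x = np.array([0.2, 0.7, 0.5])
w = np.array([0.6, -0.4, 0.9])
b = 0.1
print(sigmoid_neuron(x, w, b))  # a real number between 0 and 1
```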

Small changes in parameters produce small changes in output for sigmoid neurons: Δoutput ≈ Σj (∂output/∂wj) Δwj + (∂output/∂b) Δb, i.e., Δoutput is approximately a linear function of the small changes Δwj and Δb in the weights and bias. This does not hold for perceptrons: their output can flip from 0 to 1, or vice versa, for a small change in the inputs.
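A quick numerical check of this linear approximation for a single sigmoid neuron (all parameter values and perturbations below are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Current parameters and input (hypothetical values for illustration).
w = np.array([0.6, -0.4, 0.9])
b = 0.1
x = np.array([0.2, 0.7, 0.5])

# Small perturbations of the weights and bias.
dw = np.array([0.01, -0.02, 0.005])
db = 0.01

z = np.dot(w, x) + b

# Exact change in the output.
exact = sigmoid(np.dot(w + dw, x) + b + db) - sigmoid(z)

# First-order approximation: d(output)/dw_j = sigmoid'(z) * x_j, d(output)/db = sigmoid'(z).
sp = sigmoid(z) * (1 - sigmoid(z))
approx = sp * (np.dot(dw, x) + db)

print(exact, approx)  # the two values should be very close
```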

The architecture of neural networks

MNIST data Each grayscale image is of size 28x28. There are 60,000 training images and 10,000 test images, with 10 possible labels (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).

Digit recognition using 3 layers Example output encoding: the digit 6 is represented as [0 0 0 0 0 0 1 0 0 0]. Each input pixel is normalized to a value between 0 and 1.
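A sketch of such a 3-layer forward pass and the one-hot output encoding, assuming 784 input pixels (28x28), a hidden layer of 30 sigmoid neurons (an assumed size, not stated on the slide), and 10 output neurons:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(digit):
    """Encode a digit 0-9 as a 10-dimensional target, e.g., 6 -> [0 0 0 0 0 0 1 0 0 0]."""
    y = np.zeros(10)
    y[digit] = 1.0
    return y

# Layer sizes: 784 inputs, 30 hidden neurons (assumed), 10 outputs; random initial weights.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((30, 784)), rng.standard_normal(30)
W2, b2 = rng.standard_normal((10, 30)), rng.standard_normal(10)

def forward(x):
    """Forward pass through the input, hidden, and output layers."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

x = rng.random(784)   # a fake image with pixel values normalized to [0, 1]
a = forward(x)        # 10 activations; the predicted digit is np.argmax(a)
print(np.argmax(a), one_hot(6))
```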

Compute the weights and biases for the last layer

Cost function Let w and b denote the parameters to compute, n the number of input samples, x an input vector, and a the corresponding output of the network. We assume that the network should approximate a function y(x) and outputs a. We use a quadratic cost function, i.e., mean squared error or MSE: C(w, b) = (1/2n) Σx ||y(x) − a||².
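A direct translation of this quadratic cost into Python (the network outputs and targets below are placeholders):

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """Mean squared error: C = (1/2n) * sum_x ||y(x) - a||^2."""
    n = len(outputs)
    return sum(np.sum((y - a) ** 2) for a, y in zip(outputs, targets)) / (2 * n)

# Two fake training examples: network outputs a and one-hot targets y(x).
outputs = [np.array([0.1, 0.8, 0.1]), np.array([0.6, 0.3, 0.1])]
targets = [np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0])]
print(quadratic_cost(outputs, targets))  # 0 only if every output matches its target
```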

Cost function Can the cost function be negative in the above example? What does it mean when the cost is approximately equal to zero?

Gradient Descent Let us consider a cost function C(v1, v2) that depends on two variables. The goal is to change the two variables to minimize the cost function. Small changes in the parameters lead to small changes in the cost: ΔC ≈ (∂C/∂v1) Δv1 + (∂C/∂v2) Δv2 = ∇C · Δv, where ∇C = (∂C/∂v1, ∂C/∂v2) is the gradient vector. Change the parameters in the direction opposite the gradient, scaled by a positive learning rate η. Update rule: v -> v' = v − η∇C.
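A minimal gradient-descent sketch for a two-variable cost; the cost C(v1, v2) = v1² + v2², the starting point, and the learning rate are my own illustrative choices:

```python
import numpy as np

def C(v):
    """Illustrative cost: C(v1, v2) = v1^2 + v2^2, minimized at (0, 0)."""
    return v[0] ** 2 + v[1] ** 2

def grad_C(v):
    """Gradient of the cost: (dC/dv1, dC/dv2) = (2*v1, 2*v2)."""
    return np.array([2 * v[0], 2 * v[1]])

v = np.array([1.5, -2.0])    # starting point
eta = 0.1                    # positive learning rate
for step in range(50):
    v = v - eta * grad_C(v)  # update rule: v -> v' = v - eta * grad C

print(v, C(v))  # v is now very close to the minimizer (0, 0)
```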

Cost function from the network The cost C(w, b) = (1/2n) Σx ||y(x) − a||² is an average over all n training samples, so the gradient is an average of per-sample gradients: ∇C = (1/n) Σx ∇Cx. What are the challenges in gradient descent when you have a large number of training samples?

Stochastic gradient descent The idea is to compute the gradient using a small set of randomly chosen training data. We assume that the average gradient obtained from the small set is close to the gradient obtained from the entire set.

Stochastic gradient descent Let us consider a mini-batch of m randomly chosen samples. Provided that the sample size is large enough, we expect the average gradient from the m samples to be approximately equal to the average gradient from all n samples: (1/m) Σj ∇CXj ≈ (1/n) Σx ∇Cx = ∇C.
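A sketch of mini-batch SGD under these assumptions; the per-sample cost, parameter vector, batch size, and learning rate are all placeholders chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake training set: each sample x has per-sample cost ||v - target_x||^2,
# whose gradient with respect to the parameters v is 2 * (v - target_x).
targets = rng.standard_normal((1000, 2))

def grad_Cx(v, x_index):
    """Gradient of the per-sample cost for training sample x_index."""
    return 2 * (v - targets[x_index])

v = np.array([5.0, -3.0])   # parameters to learn
eta, m = 0.05, 32           # learning rate and mini-batch size (illustrative)

for epoch in range(20):
    order = rng.permutation(len(targets))
    for start in range(0, len(order), m):
        batch = order[start:start + m]
        # The average gradient over the mini-batch approximates the full gradient.
        g = np.mean([grad_Cx(v, i) for i in batch], axis=0)
        v = v - eta * g

print(v, targets.mean(axis=0))  # v converges near the minimizer (the mean of the targets)
```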

Thank You

Source: http://math.arizona.edu/~calc/rules.pdf
