Introduction to Multilayer Perceptrons


An Introduction to Multilayered Neural Networks. Marco Gori, University of Siena.

Outline of the course: motivations and biological inspiration; multilayer perceptrons: architectural issues; learning as function optimization; Backpropagation; the applicative perspective.

A difficult problem for knowledge-based solutions: an "A" perceived by a webcam. How can I provide a satisfactory statement to associate the picture with an "A"?

Emulation of the brain?... Or simply inspiration? "I just want to point out that the componentry used in the memory may be entirely different from the one that underlies the basic active organs." John von Neumann, 1958. ... Inspiration at the level of neurons... hard to go beyond!

Artificial neurons: radial basis units and sigmoidal units.
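As a concrete illustration (not part of the original slides), here is a minimal sketch of the two unit types; the weight vector `w`, bias `b`, center `c`, and width `sigma` are assumed parameter names.

```python
import numpy as np

def sigmoidal_unit(x, w, b):
    """Sigmoidal unit: squashes the affine combination w.x + b into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def radial_basis_unit(x, c, sigma):
    """Radial basis unit: Gaussian response centered at c with width sigma."""
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.5, -1.0])
print(sigmoidal_unit(x, w=np.array([1.0, 2.0]), b=0.1))
print(radial_basis_unit(x, c=np.array([0.0, 0.0]), sigma=1.0))
```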

Supervised and reinforcement learning. Reinforcement info: reward/punishment. Supervised info: a specific target.

Directed acyclic graph architecture: a partial ordering on the nodes; feedforward architecture; multilayer architecture.

Forward propagation. Let v1, ..., vn be any topological sorting of the nodes and let pa(i) be the parents of node i; visiting the nodes in this order, each node computes its activation from the activations of its parents.
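A minimal sketch of this scheme, assuming sigmoidal units and a dictionary-based representation of the DAG (the names `parents`, `weights`, `bias`, and `inputs` are illustrative, not from the slides):

```python
import numpy as np

def forward(topological_order, parents, weights, bias, inputs):
    """Forward propagation over a feedforward DAG.

    topological_order: node ids sorted so that parents precede children.
    parents[i]: list of parent node ids of node i.
    weights[(i, j)]: weight on the edge from parent j to node i.
    inputs: dict mapping input node ids to their clamped values.
    """
    activation = dict(inputs)
    for i in topological_order:
        if i in activation:          # input node, value already clamped
            continue
        net = bias[i] + sum(weights[(i, j)] * activation[j] for j in parents[i])
        activation[i] = 1.0 / (1.0 + np.exp(-net))   # sigmoidal unit
    return activation
```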

Boolean Functions

Boolean functions by MLP. Every Boolean function can be expressed in the first canonical form. Every minterm is a linearly-separable function (it takes the value one on a single vertex of the hypercube), and OR is linearly separable. Similar conclusions hold using the second canonical form. Hence every Boolean function can be represented by an MLP with two layers: minterms at the first layer, OR at the second one.
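A small sketch of this minterm-plus-OR construction, using XOR as an illustrative Boolean function (the specific weights and the choice of XOR are mine, not from the slides):

```python
import numpy as np

def threshold(net):
    """Hard threshold unit: fires when its net input is positive."""
    return (net > 0).astype(float)

def xor_mlp(x1, x2):
    """XOR = minterm(x1, NOT x2) OR minterm(NOT x1, x2).

    First layer: one threshold unit per minterm (each is linearly separable,
    isolating one vertex of the hypercube).  Second layer: an OR unit.
    """
    x = np.array([x1, x2], dtype=float)
    m1 = threshold(np.dot([ 1.0, -1.0], x) - 0.5)   # x1 AND (NOT x2)
    m2 = threshold(np.dot([-1.0,  1.0], x) - 0.5)   # (NOT x1) AND x2
    return threshold(m1 + m2 - 0.5)                 # OR at the second layer

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))
```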

Membership functions. The membership (set) function of a set takes the value one on the points belonging to the set and zero elsewhere, for all points of the domain. A convex set can be represented by an MLP.

Membership for complex domains: non-connected domains, non-convex domains.

Membership functions (Lippmann, ASSP 1987). Every hidden unit is associated with a hyperplane. Every convex set is associated with units in the first hidden layer. Every non-connected or non-convex set can be represented by a proper combination (at the second hidden layer) of units representing convex sets in the first hidden layer. Basic statement: two hidden layers suffice to approximate any set function.
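A sketch of the convex-set part of this construction for a simple region (the triangle and its half-planes are an illustrative choice, not from the slides):

```python
import numpy as np

def step(net):
    return (net > 0).astype(float)

def inside_triangle(x):
    """Membership of a convex set (a triangle) built with threshold units.

    Each first-layer unit is a hyperplane (here a half-plane); the
    second-layer unit ANDs them, so it fires only inside the convex region.
    Non-convex or non-connected sets would OR several such convex blocks
    with one more layer, as in Lippmann's construction.
    """
    h = np.array([
        step(x[0]),                  # half-plane x >= 0
        step(x[1]),                  # half-plane y >= 0
        step(1.0 - x[0] - x[1]),     # half-plane x + y <= 1
    ])
    return step(h.sum() - 2.5)       # AND of the three half-planes

print(inside_triangle(np.array([0.2, 0.3])))   # 1.0 (inside)
print(inside_triangle(np.array([0.8, 0.8])))   # 0.0 (outside)
```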

Universal approximation. Given a continuous target function and a tolerance, find a network whose output stays within that tolerance of the target. One hidden layer (with enough hidden units) suffices!
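A quick numerical illustration of the statement, not the slides' proof: a single hidden layer of sigmoidal units fitted to sin(x). For simplicity the hidden weights are drawn at random and only the output weights are fitted by least squares, which is enough to show that one hidden layer can drive the approximation error down.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 50
x = np.linspace(0.0, 2.0 * np.pi, 200)[:, None]        # inputs, shape (200, 1)
y = np.sin(x).ravel()                                   # target values

w = rng.normal(scale=2.0, size=(1, n_hidden))           # hidden weights
b = rng.normal(scale=2.0, size=n_hidden)                # hidden biases
hidden = 1.0 / (1.0 + np.exp(-(x @ w + b)))             # hidden activations
c, *_ = np.linalg.lstsq(hidden, y, rcond=None)          # output weights
print("max approximation error:", np.max(np.abs(hidden @ c - y)))
```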

Supervised learning. Consider the triple of input, target, and network output; the error is due to the mismatch between the target and the network output.

Gradient descent. The optimization may involve a huge number of parameters, even one million (Bourlard, 1997). Gradient-based heuristics are the only ones that remain meaningful in such huge spaces. The trajectory ends up in local minima of the error function. How is the gradient calculated? (a very important issue!)

Backpropagation: Bryson & Ho (1969), Werbos (1974), le Cun (1985), Rumelhart-Hinton-Williams (1986). Error accumulation over the training examples; DAG hypothesis on the architecture.

Backpropagation (cont'd). If the node is an output node, its delta comes directly from the mismatch with its target; else (hidden node), its delta is obtained by accumulating the deltas of its children, weighted by the corresponding connection weights and by the derivative of the activation function.

Backpropagation (cont'd). The forward pass follows any topological sorting of the nodes; the backward propagation of the deltas follows the sorting induced by the inverse ordering.
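A minimal sketch of one forward/backward pass for a single-hidden-layer MLP with sigmoidal units and a sum-of-squares error (the parameter names `W1`, `b1`, `W2`, `b2` are my own; the slides do not fix a notation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(x, y, W1, b1, W2, b2):
    """One forward/backward pass, returning the gradient of
    E = 0.5 * ||y_hat - y||^2 with respect to each parameter."""
    # forward pass (topological order: input -> hidden -> output)
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(W2 @ h + b2)

    # backward pass (inverse topological order)
    delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)   # output deltas
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)    # hidden deltas

    return {
        "W2": np.outer(delta_out, h), "b2": delta_out,
        "W1": np.outer(delta_hid, x), "b1": delta_hid,
    }
```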

Batch learning. We calculate the true gradient over the whole training set; the weight updating then takes place within the classic framework of numerical analysis.

On-line learning. Weights are updated after presenting each example; a momentum term filters out abrupt changes.
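A sketch of such a per-example update with a momentum term; `grads` is assumed to come from a per-example backward pass (e.g. the backprop sketch above), and `params`/`velocity` are dicts of arrays with matching keys.

```python
def online_update(params, velocity, grads, eta=0.1, momentum=0.9):
    """On-line (per-example) weight update with a momentum term."""
    for key in params:
        # momentum filters out abrupt changes between consecutive examples
        velocity[key] = momentum * velocity[key] - eta * grads[key]
        params[key] = params[key] + velocity[key]
    return params, velocity
```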

Backprop heuristics. Weight initialization: avoid small weights (no backpropagation of the delta errors) and avoid large weights (neuron saturation). Learning rate: the learning rate can change during learning (higher when the gradient is small). There is no magic solution!

Backprop heuristics (cont'd). Activation function: symmetric vs. asymmetric. Target values chosen to avoid saturation. Input normalization: the input variables (coordinates) should be uncorrelated. Normalization w.r.t. the fan-in of the first hidden layer.
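A sketch of one common way to obtain uncorrelated, rescaled input coordinates (PCA whitening); this is an illustrative choice, the slides only state the requirement:

```python
import numpy as np

def whiten_inputs(X, eps=1e-8):
    """Center, decorrelate (PCA) and rescale the input variables.

    X has one example per row.  After this transform the coordinates are
    approximately uncorrelated, with zero mean and unit variance."""
    X = X - X.mean(axis=0)                        # zero mean
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # principal directions
    return (X @ eigvec) / np.sqrt(eigval + eps)   # uncorrelated, unit variance
```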

The problem of local minima: a simple example of sub-optimal learning; local minima as symmetric configurations...

Basic results. No local minima in the case of linearly separable patterns (Gori & Tesi, IEEE-PAMI, 1992). No local minima if the number of hidden units is equal to the number of examples (Yu et al., IEEE-TNN, 1995). A general comment: the theoretical investigations on this problem have not produced results that are relevant for the design of the networks.

Complexity issues: Backprop is optimal. Classic numerical algorithms require O(W) operations for the computation of a single partial derivative, hence O(W^2) for the whole gradient; Backprop requires only O(W) for the whole gradient computation. E.g., in the case of 10,000 parameters, we need 10,000 FPO versus 100 million FPO! This is one of the main reasons for the success of Backpropagation.
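For contrast, a sketch of the finite-difference alternative (the error function `E` is an assumed callable): it needs one extra error evaluation per parameter, and each evaluation already costs O(W), which is where the O(W^2) total comes from.

```python
import numpy as np

def numerical_gradient(E, w, h=1e-6):
    """Finite-difference gradient: one extra evaluation of E per parameter.

    Each evaluation of E costs O(W) operations, so the whole gradient costs
    O(W^2); backpropagation obtains the same gradient in O(W)."""
    grad = np.zeros_like(w)
    E0 = E(w)
    for i in range(w.size):
        w_shift = w.copy()
        w_shift[i] += h
        grad[i] = (E(w_shift) - E0) / h
    return grad
```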

Cross-validation; penalties to limit large weights. (Figure courtesy of MathWorks.)
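A minimal sketch of such a penalty (weight decay), assuming an error function `E` and its gradient `grad_E` are available; the penalty strength `lam` is an illustrative hyperparameter:

```python
import numpy as np

def penalized_error(E, w, lam=1e-3):
    """Error with a quadratic penalty that discourages large weights."""
    return E(w) + 0.5 * lam * np.sum(w ** 2)

def penalized_gradient(grad_E, w, lam=1e-3):
    """Gradient of the penalized error: the penalty simply adds lam * w."""
    return grad_E + lam * w
```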

An Introduction to Multilayered Neural Networks. The applicative perspective: pattern recognition.

MLP as classifiers: a simple pre-processing stage maps the raw input to a preprocessed input, which the MLP maps to the output.

The cascade

Autoassociator-based classifiers. Motivation: better behavior for pattern verification (Gori & Scarselli, IEEE-PAMI, 1998).
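A sketch of the verification idea with a fully assumed interface (the `reconstruct` callable and the threshold are not specified in the slides): an autoassociator trained only on the claimed class reconstructs its own patterns well, so a pattern is accepted when its reconstruction error is small.

```python
import numpy as np

def verify(x, reconstruct, threshold):
    """Autoassociator-based verification (sketch, assumed interface).

    `reconstruct` maps x back to an approximation of x; patterns far from
    the training class are reconstructed poorly and therefore rejected."""
    error = np.sum((x - reconstruct(x)) ** 2)
    return error < threshold
```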

Some successful applications: airline market assistance (BehavHeuristics Inc.); Automated Real Estate Appraisal Systems (HNC Software); OCR (Caere, Audre Recognition System); path planning, NeuroRoute (Protel); electronic nose (AromaScan Inc.); quality control (Anheuser-Busch, Dunlop); banknote acceptor BANK (DF Elettronica, Florence).