Supervised Learning in Neural Networks (Part 1)
Dr. Qadri Hamarsheh

A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm. A variety of learning algorithms exist, each offering advantages of its own. Basically, learning algorithms differ from each other in the way in which the adjustment Δw_kj to the synaptic weight w_kj is formulated.
Theoretically, a neural network could learn by:
1. Developing new connections.
2. Deleting existing connections.
3. Changing connection weights (also done in practice).
4. Changing the threshold values of neurons (also done in practice).
5. Changing the activation function, propagation function or output function.
6. Developing new neurons.
7. Deleting existing neurons.
Fundamentals of learning and training:
Learning is a process by which the free parameters (weights and biases) of a neural network are adapted through a continuing process of stimulation by the environment. This definition of the learning process implies the following sequence of events:
1. The neural network is stimulated by an environment.
2. The neural network changes its internal structure as a result of this stimulation.
3. The neural network responds in a new way to the environment.
Setting the Weights
The method of setting the values of the weights (training) is an important characteristic of different neural nets. There are different types of training:
o Supervised: the network is trained by providing it with input patterns and matching output patterns.
o Unsupervised (or self-organization): an output unit is trained to respond to clusters of patterns within the input; the system must develop its own representation of the input stimuli.
o Reinforcement learning, sometimes called reward-penalty learning, is a combination of the above two methods; it is based on presenting an input vector x to a neural network and looking at the output vector calculated by the network. If it is considered "good," then a "reward" is given to the network in the sense that the existing connection weights are increased; otherwise the network is "punished" and the connection weights are decreased.
o Nets whose weights are fixed without an iterative training process.

Supervised learning network paradigms
Supervised Learning in Neural Networks: Perceptrons and Multilayer Perceptrons.
Training set: A training set (named P) is a set of training patterns, which we use to train our neural net.
Batch training of a network proceeds by making weight and bias changes based on an entire set (batch) of input vectors.
Incremental training changes the weights and biases of a network as needed after the presentation of each individual input vector. Incremental training is sometimes referred to as online or adaptive training.
Hebbian learning rule (suggested by Hebb in his classic book The Organization of Behavior): The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes modifying the weight w_jk by
Δw_jk = γ y_j y_k
where γ is a positive constant of proportionality representing the learning rate. Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights:
Δw_jk = γ y_j (d_k - y_k)
in which d_k is the desired activation provided by a teacher. This is often called the Widrow-Hoff rule or the delta rule.
Error-correction learning
Error-correction learning diagram
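To make the two rules concrete, here is a minimal Python sketch (not part of the original notes): it applies each update to a single scalar weight, and the function names, the value of gamma, and the example activations are assumptions chosen purely for illustration.

```python
# Minimal sketch of the Hebbian and Widrow-Hoff (delta) weight updates.
# Names and numbers are illustrative, not taken from the notes.

def hebbian_update(w_jk, y_j, y_k, gamma=0.1):
    """Hebbian rule: dw_jk = gamma * y_j * y_k (strengthen when j and k fire together)."""
    return w_jk + gamma * y_j * y_k

def delta_update(w_jk, y_j, y_k, d_k, gamma=0.1):
    """Widrow-Hoff (delta) rule: dw_jk = gamma * y_j * (d_k - y_k)."""
    return w_jk + gamma * y_j * (d_k - y_k)

# Unit j fires (y_j = 1); unit k should fire (d_k = 1) but currently does not (y_k = 0).
w = 0.2
w = hebbian_update(w, y_j=1.0, y_k=1.0)          # both units active -> weight grows
w = delta_update(w, y_j=1.0, y_k=0.0, d_k=1.0)   # output too low -> weight grows again
print(w)  # about 0.4
```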

Let d_k(n) denote some desired response or target response for neuron k at time n. Let the corresponding value of the actual response (output) of this neuron be denoted by y_k(n). Typically, the actual response y_k(n) of neuron k is different from the desired response d_k(n). Hence, we may define an error signal
e_k(n) = d_k(n) - y_k(n)
The ultimate purpose of error-correction learning is to minimize a cost function based on the error signal e_k(n). A criterion commonly used for the cost function is the instantaneous value of the mean-square-error criterion
J(n) = (1/2) e_k^2(n)
The network is then optimized by minimizing J(n) with respect to the synaptic weights of the network. Thus, according to the error-correction learning rule (or delta rule), the synaptic weight adjustment is given by
Δw_kj(n) = η e_k(n) x_j(n)
where η is the learning rate and x_j(n) is the input applied to synapse j of neuron k. Let w_kj(n) denote the value of the synaptic weight w_kj at time n. At time n an adjustment Δw_kj(n) is applied to the synaptic weight w_kj(n), yielding the updated value
w_kj(n + 1) = w_kj(n) + Δw_kj(n)
The perceptron training algorithm
The Perceptron Learning Rule
Perceptrons are trained on examples of desired behavior, which can be summarized by a set of input-output pairs
{p1, t1}, {p2, t2}, ..., {pq, tq}
The objective of training is to reduce the error e, which is the difference t - a between the target vector t and the perceptron output a. This is done by adjusting the weights (W) and biases (b) of the perceptron network according to the following equations:
W_new = W_old + ΔW = W_old + e p^T
b_new = b_old + Δb = b_old + e
where e = t - a.
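As a concrete illustration of the error-correction equations above, the following sketch (an addition, not code from the notes) repeatedly applies the delta rule to one linear neuron with NumPy; the learning rate eta, the input vector, and the target value are assumed example values.

```python
import numpy as np

# One error-correction (delta rule) step for a single linear neuron.
def delta_rule_step(w, x, d, eta=0.1):
    y = np.dot(w, x)        # actual response y_k(n)
    e = d - y               # error signal e_k(n) = d_k(n) - y_k(n)
    w = w + eta * e * x     # delta rule: dw_kj(n) = eta * e_k(n) * x_j(n)
    return w, e

w = np.zeros(3)                   # initial synaptic weights
x = np.array([1.0, 0.5, -0.2])    # assumed input vector
for n in range(25):
    w, e = delta_rule_step(w, x, d=1.0)
print(w, e)                       # the error e shrinks toward zero as w adapts
```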

Diagram of a neuron
The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is -1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1. The neuron uses the following transfer or activation function:
X = Σ_{i=1}^{n} x_i w_i,  Y = +1 if X ≥ θ, and Y = -1 if X < θ
Single neuron training algorithm
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.
Single-layer two-input perceptron
The Perceptron
The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter. The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative. The aim of the perceptron is to classify inputs x_1, x_2, ..., x_n into one of two classes, say A1 and A2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:
Σ_{i=1}^{n} x_i w_i - θ = 0
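The hard-limiter activation just described can be written as a short function. The sketch below is illustrative only; the weights, inputs, and threshold values are assumed.

```python
import numpy as np

# Hard-limiter (threshold) activation of a single neuron:
# Y = +1 if the weighted sum X reaches the threshold theta, otherwise Y = -1.
def hard_limiter(x, w, theta):
    net = np.dot(x, w)                 # X = sum_i x_i * w_i
    return 1 if net >= theta else -1

# Assumed example weights, inputs, and threshold.
w = np.array([0.4, 0.3])
print(hard_limiter(np.array([1.0, 1.0]), w, theta=0.5))  # X = 0.7 >= 0.5 -> +1
print(hard_limiter(np.array([1.0, 0.0]), w, theta=0.5))  # X = 0.4 <  0.5 -> -1
```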

Linear separability in the perceptrons
How does the perceptron learn its classification tasks? This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain output consistent with the training examples. If at iteration p the actual output is Y(p) and the desired output is Yd(p), then the error is given by:
e(p) = Yd(p) - Y(p),  where p = 1, 2, 3, ...
Iteration p here refers to the pth training example presented to the perceptron. If the error e(p) is positive, we need to increase the perceptron output Y(p), but if it is negative, we need to decrease Y(p).
The perceptron learning rule
w_i(p + 1) = w_i(p) + α x_i(p) e(p),  where p = 1, 2, 3, ...
α is the learning rate, a positive constant less than unity. The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
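A single application of this learning rule can be traced numerically. The snippet below is a worked illustration with assumed values for α, the weights, the inputs, and the outputs.

```python
# One application of the perceptron learning rule, with assumed numbers.
alpha = 0.1                       # learning rate, 0 < alpha < 1
w = [0.3, -0.1]                   # current weights w_i(p)
x = [1, 1]                        # inputs x_i(p)
Yd, Y = 1, -1                     # desired and actual outputs at iteration p
e = Yd - Y                        # e(p) = Yd(p) - Y(p) = 2
w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
print(w)                          # [0.5, 0.1]: both weights move so the output increases
```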

Perceptron's training algorithm
Step 1: Initialization
Set initial weights w_1, w_2, ..., w_n and threshold θ to random numbers in the range [-0.5, 0.5].
Step 2: Activation
Activate the perceptron by applying inputs x_1(p), x_2(p), ..., x_n(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:
Y(p) = step[ Σ_{i=1}^{n} x_i(p) w_i(p) - θ ]
where n is the number of the perceptron inputs, and step is a step activation function.
Step 3: Weight training
Update the weights of the perceptron
w_i(p + 1) = w_i(p) + Δw_i(p)
where Δw_i(p) is the weight correction at iteration p. The weight correction is computed by the delta rule:
Δw_i(p) = α x_i(p) e(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
Example of perceptron learning: the logical operation AND.
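The four steps above can be combined into a short training loop. The sketch below is an illustration, not code from the notes: it trains a two-input perceptron on the AND truth table with 0/1 inputs and targets, and the values of alpha and max_epochs, as well as the threshold update, are assumptions (the notes initialize θ randomly but do not spell out how θ itself is adjusted, so here it is adapted like a bias term so that training converges regardless of the initial draw).

```python
import random

def step(x):
    """Step activation used in the AND example: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def train_perceptron(samples, alpha=0.1, max_epochs=100):
    """Steps 1-4 of the training algorithm for a single two-input perceptron.
    samples: list of ((x1, x2), Yd) pairs with 0/1 targets."""
    # Step 1: initialization, weights and threshold drawn from [-0.5, 0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(2)]
    theta = random.uniform(-0.5, 0.5)
    for epoch in range(max_epochs):
        errors = 0
        for x, Yd in samples:
            # Step 2: activation, Y(p) = step(sum_i x_i(p) w_i(p) - theta)
            Y = step(sum(xi * wi for xi, wi in zip(x, w)) - theta)
            e = Yd - Y
            if e != 0:
                errors += 1
                # Step 3: weight training by the delta rule, dw_i = alpha * x_i * e
                w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
                theta -= alpha * e   # assumed: adapt theta like a bias (not spelled out in the notes)
        # Step 4: iteration, stop once a full epoch produces no errors
        if errors == 0:
            break
    return w, theta

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(AND)
print(w, theta)  # learned weights and threshold realizing the logical AND function
```

Running the same loop on the Exclusive-OR truth table, by contrast, never reaches an error-free epoch no matter how many epochs are allowed, which previews the limitation noted below.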

Two-dimensional plots of basic logical operations
A perceptron can learn the operations AND and OR, but not Exclusive-OR.