LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS

Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains a classification (class) attribute. We also say that the class label is known for all data. DATA is divided, as in any classification problem, into TRAINING and TEST data sets.

Neural Networks Classifier ALL DATA must be normalized, i.e. all values of attributes in the dataset have to be changed to contain values in the interval [0,1] or [-1,1]. TWO BASIC normalization techniques: Max-Min normalization and Decimal Scaling normalization.

Data Normalization Max-Min Normalization performs a linear transformation on the original data. Given an attribute A, we denote by min_A, max_A the minimum and maximum values of the attribute A. Max-Min Normalization maps a value v of A to v' in the range [new_min_A, new_max_A] as follows.

Data Normalization The Max-Min normalization formula is as follows: v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A. Example: we want to normalize data to the range of the interval [-1,1]. We put: new_max_A = 1, new_min_A = -1. In general, to normalize within an interval [a,b], we put: new_max_A = b, new_min_A = a.

Example of Max-Min Normalization Max-Min normalization formula: v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A. Example: we want to normalize data to the range of the interval [0,1]. We put: new_max_A = 1, new_min_A = 0. Say max_A was 100 and min_A was 20 (that means the maximum and minimum values for the attribute A). Now, if v = 40 (if for this particular pattern the attribute value is 40), v' is calculated as v' = (40 - 20) x (1 - 0) / (100 - 20) + 0 = 20/80 = 0.25.
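
For illustration only, the Max-Min formula above can be written as a small Python function (the function and variable names are mine, not from the lecture):

def max_min_normalize(v, min_a, max_a, new_min_a, new_max_a):
    # v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A
    return (v - min_a) / (max_a - min_a) * (new_max_a - new_min_a) + new_min_a

# Worked example from this slide: min_A = 20, max_A = 100, target range [0, 1], v = 40
print(max_min_normalize(40, 20, 100, 0, 1))  # 0.25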

Decimal Scaling Normalization Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. A value v of A is normalized to v' by computing v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1. Example: the values of A range from -986 to 917, so max(|v|) = 986 and j = 3. The value v = -986 normalizes to v' = -986/1000 = -0.986.
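
A minimal Python sketch of decimal scaling (names are mine; it simply searches for the smallest j such that max(|v|) / 10^j < 1):

def decimal_scaling_normalize(values):
    # Find the smallest integer j such that max(|v|) / 10**j < 1
    j = 0
    while max(abs(v) for v in values) / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

# Worked example from this slide: values range from -986 to 917, so j = 3
print(decimal_scaling_normalize([-986, 917]))  # [-0.986, 0.917]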

Neural Network A Neural Network is a set of connected INPUT/OUTPUT UNITS, where each connection has a WEIGHT associated with it. Neural Network learning is also called CONNECTIONIST learning due to the connections between units. It is a case of SUPERVISED, INDUCTIVE, or CLASSIFICATION learning.

Neural Network A Neural Network learns by adjusting the weights so as to be able to correctly classify the training data and hence, after the testing phase, to classify unknown data. A Neural Network needs a long time for training. A Neural Network has a high tolerance to noisy and incomplete data.

Neural Network Learning Learning is performed by the backpropagation algorithm. The inputs are fed simultaneously into the input layer. The weighted outputs of these units are, in turn, fed simultaneously into a layer of neuron-like units known as a hidden layer. The hidden layer's weighted outputs can be input to another hidden layer, and so on. The number of hidden layers is arbitrary, but in practice usually one or two are used. The weighted outputs of the last hidden layer are inputs to the units making up the output layer.

A Multilayer Feed-Forward (MLFF) Neural Network (figure): output vector (classes) at the output nodes O_k; hidden nodes O_j; input nodes receiving the input vector (record x_i); weights w_ij and w_jk on the connections. The network is fully connected.

MLFF Neural Network The units in the hidden layers and the output layer are sometimes referred to as neurons, due to their symbolic biological basis, or as output units. The multilayer neural network shown on the previous slide has two layers of output units. Therefore, we say that it is a two-layer neural network.

MLFF Neural Network A network containing two hidden layers is called a three-layer neural network, and so on. The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer.

MLFF Neural Network (the figure from the earlier slide is repeated here): a fully connected network with input nodes (input vector, record x_i), hidden nodes O_j, and output nodes O_k (output vector of classes), with weights w_ij and w_jk on the connections.

MLFF Network Input INPUT: records without the class attribute, with normalized attribute values. We call it an input vector. INPUT VECTOR: X = {x1, x2, ..., xn}, where n is the number of (non-class) attributes.

MLFF Network Topology INPUT LAYER: there are as many nodes as non-class attributes, i.e. as the length of the input vector. HIDDEN LAYER: the number of nodes in the hidden layer and the number of hidden layers depend on implementation. Hidden nodes are O_j, j = 1, 2, ..., #hidden nodes.

MLFF Network Topology OUTPUT LAYER: corresponds to the class attribute. There are as many nodes as classes (values of the class attribute): O_k, k = 1, 2, ..., #classes. The network is fully connected, i.e. each unit provides input to each unit in the next forward layer.

Classification by Backpropagation Backpropagation is a neural network learning algorithm. It learns by iteratively processing a set of training data (samples), comparing the network's classification of each record (sample) with the actual known class label (classification).

Classification by Backpropagation For each training sample, the weights are modified so as to minimize the mean squared error between the network's classification (prediction) and the actual classification (value of the class attribute). These weight modifications are made in the backwards direction, that is, from the output layer through each hidden layer down to the first hidden layer. Hence the name backpropagation.

Steps in Backpropagation Algorithm STEP ONE: initialize the weights and biases. The weights in the network are initialized to small random numbers, ranging for example from -1.0 to 1.0, or from -0.5 to 0.5. Each unit has a BIAS associated with it (see next slide). The biases are similarly initialized to small random numbers. STEP TWO: feed the training sample (record) into the network.

Steps in Backpropagation Algorithm STEP THREE: propagate the inputs forward; we compute the net input and output of each unit in the hidden and output layers. STEP FOUR: backpropagate the error. STEP FIVE: update weights and biases to reflect the propagated errors. STEP SIX: repeat and apply terminating conditions.
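
As an illustration of how the six steps fit together, here is a Python sketch (not code from the lecture; all names are mine, and it assumes one hidden layer with sigmoid units and case updating):

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(samples, n_in, n_hid, n_out, l=0.9, n_epochs=100):
    # STEP ONE: initialize weights and biases to small random numbers
    w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
    w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)]
    th_h = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
    th_o = [random.uniform(-0.5, 0.5) for _ in range(n_out)]
    for _ in range(n_epochs):                   # STEP SIX: repeat (here: a fixed number of epochs)
        for x, t in samples:                    # STEP TWO: feed each training sample (record)
            # STEP THREE: propagate the inputs forward
            o_h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(n_in)) + th_h[j])
                   for j in range(n_hid)]
            o_o = [sigmoid(sum(o_h[j] * w_ho[j][k] for j in range(n_hid)) + th_o[k])
                   for k in range(n_out)]
            # STEP FOUR: backpropagate the error
            err_o = [o_o[k] * (1 - o_o[k]) * (t[k] - o_o[k]) for k in range(n_out)]
            err_h = [o_h[j] * (1 - o_h[j]) * sum(err_o[k] * w_ho[j][k] for k in range(n_out))
                     for j in range(n_hid)]
            # STEP FIVE: update weights and biases (case updating)
            for j in range(n_hid):
                for k in range(n_out):
                    w_ho[j][k] += l * err_o[k] * o_h[j]
            for i in range(n_in):
                for j in range(n_hid):
                    w_ih[i][j] += l * err_h[j] * x[i]
            th_o = [th_o[k] + l * err_o[k] for k in range(n_out)]
            th_h = [th_h[j] + l * err_h[j] for j in range(n_hid)]
    return w_ih, w_ho, th_h, th_o

# Hypothetical usage: 3 inputs, 2 hidden units, 1 output, as in the worked example later in these notes
trained = train_backprop([([1, 0, 1], [1])], n_in=3, n_hid=2, n_out=1)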

A Neuron; a Hidden, or Output Unit (figure): inputs x_0, x_1, ..., x_n (the input vector x) with weights w_0, w_1, ..., w_n (the weight vector w), a bias Θ, the weighted sum, the activation function f, and the output y. The inputs to the unit are the outputs from the previous layer. These are multiplied by their corresponding weights in order to form a weighted sum, which is added to the bias associated with the unit. A nonlinear activation function f is applied to the net input.

Step Three: propagate the inputs forward For unit j in the input layer, its output is equal to its input, that is, O_j = I_j for input unit j. The net input to each unit in the hidden and output layers is computed as follows. Given a unit j in a hidden or output layer, the net input is I_j = Σ_i w_ij O_i + θ_j, where w_ij is the weight of the connection from unit i in the previous layer to unit j; O_i is the output of unit i from the previous layer; and θ_j is the bias of unit j.

Step Three: propagate the inputs forward Each unit in the hidden and output layers takes its net input and then applies an activation function to it. The function symbolizes the activation of the neuron represented by the unit. It is also called a logistic, sigmoid, or squashing function. Given the net input I_j to unit j, the output of unit j is O_j = f(I_j), computed as O_j = 1 / (1 + e^(-I_j)).
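
In Python, the net input and sigmoid output of a single unit might be computed as follows (a sketch with my own names; the numbers come from the worked example later in these notes):

import math

def net_input(weights, prev_outputs, bias):
    # I_j = sum_i w_ij * O_i + theta_j
    return sum(w * o for w, o in zip(weights, prev_outputs)) + bias

def sigmoid(net):
    # O_j = 1 / (1 + e^(-I_j))
    return 1.0 / (1.0 + math.exp(-net))

# Unit 4 of the worked example: inputs (1, 0, 1), weights (w14, w24, w34) = (0.2, 0.4, -0.5), bias -0.4
i4 = net_input([0.2, 0.4, -0.5], [1, 0, 1], -0.4)
print(round(i4, 2), round(sigmoid(i4), 3))  # -0.7 and about 0.332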

Back propagation Formulas (summary, annotated on the network diagram of input, hidden, and output nodes): net input I_j = Σ_i w_ij O_i + θ_j; output O_j = 1 / (1 + e^(-I_j)); output-layer error Err_k = O_k (1 - O_k)(T_k - O_k); hidden-layer error Err_j = O_j (1 - O_j) Σ_k Err_k w_jk; weight update w_ij = w_ij + (l) Err_j O_i; bias update θ_j = θ_j + (l) Err_j.

Step 4: Back propagate the error When reaching the output layer, the error is computed and propagated backwards. For a unit k in the output layer the error is computed by the formula Err_k = O_k (1 - O_k)(T_k - O_k), where O_k is the actual output of unit k (computed by the activation function O_k = 1 / (1 + e^(-I_k))) and T_k is the TRUE output based on the known class label (classification) of the training sample. Observe: O_k (1 - O_k) is the derivative (rate of change) of the activation function.

Step 4: Back propagate the error The error is propagated backwards by updating the weights and biases to reflect the error of the network's classification. For a unit j in the hidden layer the error is computed by the formula Err_j = O_j (1 - O_j) Σ_k Err_k w_jk, where w_jk is the weight of the connection from unit j to unit k in the next higher layer, and Err_k is the error of unit k.

Step 5: Update weights and biases Weights are updated by the following equations, where l is a constant between 0.0 and 1.0 reflecting the learning rate; this learning rate is fixed for the implementation: Δw_ij = (l) Err_j O_i and w_ij = w_ij + Δw_ij. Biases are updated by the following equations: Δθ_j = (l) Err_j and θ_j = θ_j + Δθ_j.
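
These formulas translate directly into small Python helpers (a sketch with my own names; the sample numbers are taken from the worked example that follows later in the notes):

def output_error(o_k, t_k):
    # Err_k = O_k (1 - O_k)(T_k - O_k)
    return o_k * (1 - o_k) * (t_k - o_k)

def hidden_error(o_j, next_errors, next_weights):
    # Err_j = O_j (1 - O_j) * sum_k Err_k * w_jk
    return o_j * (1 - o_j) * sum(e * w for e, w in zip(next_errors, next_weights))

def updated_weight(w_ij, l, err_j, o_i):
    # w_ij = w_ij + (l) Err_j O_i
    return w_ij + l * err_j * o_i

def updated_bias(theta_j, l, err_j):
    # theta_j = theta_j + (l) Err_j
    return theta_j + l * err_j

# Output unit 6 of the worked example, assuming T6 = 1 and learning rate l = 0.9
err6 = output_error(0.474, 1)
print(round(err6, 4))                                    # about 0.1311
print(round(updated_weight(-0.3, 0.9, err6, 0.332), 3))  # about -0.261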

Weights and Biases Updates Case updating: we update weights and biases after the presentation of each sample (record). Epoch: one iteration through the training set is called an epoch. Epoch updating: the weight and bias increments are accumulated in variables, and the weights and biases are updated after all of the samples in the training set have been presented. Case updating is more accurate.

Terminating Conditions Training stops when all Δw_ij in the previous epoch are below some threshold, or the percentage of samples misclassified in the previous epoch is below some threshold, or a pre-specified number of epochs has expired. In practice, several hundred thousand epochs may be required before the weights converge.
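
As a sketch, the terminating conditions could be checked after each epoch like this (the threshold values and names are my own assumptions, not from the lecture):

def should_stop(weight_changes, misclassified_fraction, epoch,
                weight_threshold=1e-4, error_threshold=0.05, max_epochs=500_000):
    # Stop when all weight changes in the previous epoch are below a threshold,
    # or the misclassification rate is below a threshold,
    # or a pre-specified number of epochs has expired.
    return (all(abs(dw) < weight_threshold for dw in weight_changes)
            or misclassified_fraction < error_threshold
            or epoch >= max_epochs)

print(should_stop([0.2, -0.01], misclassified_fraction=0.03, epoch=12))  # True (error below threshold)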

Back Propagation Formulas (the summary slide is repeated): I_j = Σ_i w_ij O_i + θ_j; O_j = 1 / (1 + e^(-I_j)); Err_k = O_k (1 - O_k)(T_k - O_k); Err_j = O_j (1 - O_j) Σ_k Err_k w_jk; w_ij = w_ij + (l) Err_j O_i; θ_j = θ_j + (l) Err_j.

Example of Back Propagation Input nodes = 3, hidden neurons = 2, output node = 1. Initialize weights: random numbers from -1.0 to 1.0. Initial input and weights: x1 = 1, x2 = 0, x3 = 1; w14 = 0.2, w15 = -0.3, w24 = 0.4, w25 = 0.1, w34 = -0.5, w35 = 0.2, w46 = -0.3, w56 = -0.2.

Example of Back Propagation A bias is added to the hidden and output nodes. Initialize biases: random values from -1.0 to 1.0. Bias (random): θ4 = -0.4, θ5 = 0.2, θ6 = 0.1.

Net Input and Output Calculation Unit 4: net input I4 = 0.2 + 0 - 0.5 - 0.4 = -0.7, output O4 = 1 / (1 + e^(0.7)) = 0.332. Unit 5: net input I5 = -0.3 + 0 + 0.2 + 0.2 = 0.1, output O5 = 1 / (1 + e^(-0.1)) = 0.525. Unit 6: net input I6 = (-0.3)(0.332) - (0.2)(0.525) + 0.1 = -0.105, output O6 = 1 / (1 + e^(0.105)) = 0.474.

Calculation of Error at Each Node Unit 6: Err6 = 0.474 (1 - 0.474)(1 - 0.474) = 0.1311 (we assume the true output T6 = 1). Unit 5: Err5 = 0.525 x (1 - 0.525) x 0.1311 x (-0.2) = -0.0065. Unit 4: Err4 = 0.332 x (1 - 0.332) x 0.1311 x (-0.3) = -0.0087.

Calculation of Weight and Bias Updates Learning rate l = 0.9. New values: w46 = -0.3 + 0.9(0.1311)(0.332) = -0.261; w56 = -0.2 + (0.9)(0.1311)(0.525) = -0.138; w14 = 0.2 + 0.9(-0.0087)(1) = 0.192; w15 = -0.3 + (0.9)(-0.0065)(1) = -0.306; and similarly for the remaining weights. θ6 = 0.1 + (0.9)(0.1311) = 0.218; and similarly for the remaining biases.
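
The whole worked example can be reproduced with the following Python sketch (variable names mirror the slides; this is an illustration only, and small differences in the last decimal place come from using unrounded intermediate values):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial inputs, weights, and biases from the example slides
x1, x2, x3 = 1, 0, 1
w14, w15, w24, w25, w34, w35, w46, w56 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2, -0.3, -0.2
theta4, theta5, theta6 = -0.4, 0.2, 0.1
l = 0.9   # learning rate
T6 = 1    # assumed true output for unit 6

# Forward pass (Step 3)
I4 = w14 * x1 + w24 * x2 + w34 * x3 + theta4   # -0.7
I5 = w15 * x1 + w25 * x2 + w35 * x3 + theta5   #  0.1
O4, O5 = sigmoid(I4), sigmoid(I5)              # about 0.332 and 0.525
I6 = w46 * O4 + w56 * O5 + theta6              # about -0.105
O6 = sigmoid(I6)                               # about 0.474

# Backpropagate the error (Step 4)
Err6 = O6 * (1 - O6) * (T6 - O6)               # about 0.131
Err5 = O5 * (1 - O5) * Err6 * w56              # about -0.0065
Err4 = O4 * (1 - O4) * Err6 * w46              # about -0.0087

# Update weights and biases (Step 5)
w46 += l * Err6 * O4                           # about -0.261
w56 += l * Err6 * O5                           # about -0.138
w14 += l * Err4 * x1                           # about  0.192
w15 += l * Err5 * x1                           # about -0.306
theta6 += l * Err6                             # about  0.218

print(round(O6, 3), round(w46, 3), round(theta6, 3))  # 0.474 -0.261 0.218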

Some Facts to be Remembered NNs perform well, generally better with a larger number of hidden units. More hidden units generally produce lower error. Determining the network topology is difficult. Choosing a single learning rate is impossible. It is difficult to reduce training time by altering the network topology or learning parameters. NNs trained with subsets (see the next slides) often produce better results.

Advanced Features of Neural Networks (to be covered by students' presentations): Training with Subsets, Modular Neural Networks, Evolution of Neural Networks.

Training with Subsets Select subsets of the data; build a new classifier on each subset; aggregate it with the previous classifiers; compare the error after adding a classifier; repeat as long as the error decreases.

Training with subsets (figure): the whole dataset is split into subsets (Subset 1, Subset 2, Subset 3, ..., Subset n) that can fit into memory; each subset trains its own network (NN 1, NN 2, NN 3, ..., NN n), and the resulting networks are combined into a single neural network model.
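
A very rough Python sketch of the idea (the splitting and the averaging aggregation rule are my own assumptions; the slides do not fix a particular aggregation method):

def split_into_subsets(dataset, subset_size):
    # Split the whole dataset into subsets that can fit into memory
    return [dataset[i:i + subset_size] for i in range(0, len(dataset), subset_size)]

def aggregate(models, x):
    # Combine the per-subset networks into a single model, here by averaging their outputs
    outputs = [model(x) for model in models]
    return sum(outputs) / len(outputs)

# Stand-ins for NN 1 ... NN n trained on different subsets
models = [lambda x, b=b: x + b for b in (0.0, 0.1, 0.2)]
print(split_into_subsets(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(round(aggregate(models, 1.0), 2))        # 1.1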

Modular Neural Network A Modular Neural Network is made up of a combination of several neural networks. The idea is to reduce the load on each neural network, as opposed to trying to solve the problem with a single neural network.

Evolving Network Architectures Small networks without a hidden layer cannot solve problems such as XOR, which are not linearly separable. Large networks can easily overfit a problem to match the training data, limiting their ability to generalize over a problem set.

Constructive vs Destructive Algorithms Constructive algorithms take a minimal network and build up new layers, nodes, and connections during training. Destructive algorithms take a maximal network and prune unnecessary layers, nodes, and connections during training.

Faster Convergence Back propagation requires many epochs to converge (an epoch is one presentation of all the training examples in the dataset). Some ideas to overcome this are: Stochastic learning: update the weights after each example, instead of updating them after one epoch. Momentum: this optimization speeds up learning when the weights keep moving in a single direction, by increasing the size of the steps. The closer the momentum value is to one, the more each weight change includes not only the current error but also the weight change from previous examples (which often leads to faster convergence).
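
A minimal sketch of a momentum-style update in Python (names and constants are my own; gradient_step stands for the error-driven term (l) Err_j O_i used in these notes):

def momentum_update(weight, gradient_step, previous_change, momentum):
    # The new change blends the current error-driven step with the previous weight change;
    # a momentum value close to one carries more of the previous change forward.
    change = gradient_step + momentum * previous_change
    return weight + change, change

w, prev = 0.2, 0.0
for step in (0.045, 0.036, 0.045):   # error-driven steps that keep pointing the same way
    w, prev = momentum_update(w, step, prev, momentum=0.7)
print(round(w, 3))  # a larger move than 0.2 + 0.126 = 0.326, which plain steps alone would give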