CMPT 882 Week 3 Summary


- Artificial Neural Networks (ANNs) are networks of interconnected simple units based on a greatly simplified model of the brain. ANNs are useful learning tools because they can compute results quickly and interpolate data well.
- There are two main types of ANNs: feed forward networks and recurrent networks. Perceptrons are a special case of feed forward networks with only input and output nodes.
- Three main perceptron learning algorithms are covered: the mistake bound perceptron algorithm, the perceptron training rule, and the delta rule. The delta rule uses gradient descent, which makes it easy to compute what changes are needed to optimize the network.
- The backpropagation learning algorithm is widely used for multi-layer feed forward networks. It uses gradient descent as well.
- Bayesian learning uses statistics and knowledge of prior probabilities to classify or predict. Bayes' Theorem is central to Bayesian learning.

Overview

Artificial Neural Networks (ANNs) are loosely modeled after the brain. ANNs are composed of units (also called nodes) that are modeled after neurons, with weighted links interconnecting the units. The main difference between ANNs and other learning mechanisms is that they are composed of these simple units, which work together in a highly parallel manner.

Artificial Neural Networks should be used when:
- The target function is not known
- Readability of the result is not important
- The input is multi-dimensional
- Input values are continuous or discrete
- The input data is possibly noisy
- The output is a vector of continuous or discrete values

Applications of ANNs:
- Control (such as driving cars, as in ALVINN)
- Recognition/classification (written or spoken words)
- Prediction (trends)

Typical Unit:

In a typical unit, there are n input units (X_i) connected by n weighted links (W_i), and one output. The unit is composed of two main parts: the first part sums the weighted inputs and sends the result to the threshold function. If the activation is greater than 0, the unit fires and sends 1 as its output; otherwise it sends 0 (or -1). The extra input X_0 can be set to a fixed value: instead of tuning the threshold function to activate at some point Y, X_0 can be set so that it contributes -Y to the sum, letting the threshold stay at 0.
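To make the summing-and-threshold behaviour of such a unit concrete, here is a minimal Python sketch. It is not from the original notes: the use of numpy, the function name, and the AND example weights are assumptions made for illustration.

```python
import numpy as np

def threshold_unit(x, w):
    """A single ANN unit: weighted sum of inputs followed by a hard threshold.

    x and w are equal-length arrays; x[0] is the fixed bias input described
    above, so its weight plays the role of -Y (the negated threshold).
    """
    activation = np.dot(w, x)          # first part: sum the weighted inputs
    return 1 if activation > 0 else 0  # second part: threshold at 0

# Example: a unit computing logical AND of two binary inputs.
w = np.array([-1.5, 1.0, 1.0])         # bias weight -1.5, input weights 1 and 1
for a in (0, 1):
    for b in (0, 1):
        x = np.array([1, a, b])        # x[0] = 1 is the bias input
        print(a, b, "->", threshold_unit(x, w))
```

With these weights the unit fires only when both inputs are 1, which is one way a single threshold unit can represent a linearly separable function such as AND.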

Types of Neural Networks

Feed Forward Networks:
- Activation flows in one direction only
- Multi-layer Feed Forward Networks:
  - Can approximate any function
  - Not guaranteed to reach the best possible approximation; training can become trapped in a local minimum
- Perceptrons (no hidden units):
  - Guaranteed to converge on any linearly separable function
  - Simpler to work with and to inspect, since there are only two layers

Recurrent Networks:
- Cycles are used to allow for internal state

Perceptron Learning

Perceptrons

Perceptrons can represent any linearly separable surface. That is, when the input values are plotted in n-dimensional space (where n is the number of inputs), a perceptron can represent the function only if a straight plane can divide all true members on one side from all false members on the other. Although this is difficult to visualize with many input variables, it is easy to show with two variables (where a line represents the plane in the 2D space).

[Figures: a linearly separable surface and a non-linearly separable surface]

Many such functions are possible, for example AND and OR. Functions that perceptrons cannot describe are those that cannot be linearly separated, such as XOR.

Learning

Learning algorithms are used to change the weights of the links between units. The goal of a learning algorithm is always to reduce the error between the output on the training set and the target. Let w be the vector of input weights and x the vector of inputs to the ANN.

Approaches to Perceptron Learning:

- Mistake Bound Perceptron Algorithm
  - Initialize w = 0
  - For each training example:
    - Predict 1 iff w · x > 0, else 0
    - On a mistake on a positive example: w ← w + x
    - On a mistake on a negative example: w ← w - x

- Perceptron Training Rule
  - Initialize w to small random numbers
  - Let η be the learning rate, which controls the size of the change at each step; t is the target of the unit and o is the output of the unit
  - For each training example:
    - Δw_j ← η(t - o)x_j
    - w_j ← w_j + Δw_j
- Delta Rule [Gradient Descent]
  - Initialize w to small random numbers
  - Until a termination condition is met (an error bound, or a number of iterations over the training examples):
    - Initialize all Δw_i to 0
    - For each training example (x, t):
      - Compute the output of the node: o(x)
      - For each weight w_i: Δw_i ← Δw_i + η(t - o)x_i
    - For each weight w_i: w_i ← w_i + Δw_i
  - Attempts to minimize the squared error over the training examples
  - Guaranteed that there is a single minimum of the error
  - Uses the sigmoid function σ(x) = 1/(1 + e^(-x))
  - dσ(x)/dx = σ(x)(1 - σ(x))
  - The sigmoid function is used because its derivative is very easy to compute
  - Gradient: ∇E[w] = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]
  - Training rule: Δw = -η ∇E[w]

Advantages of the Delta training rule versus the Perceptron training rule:
- Guaranteed to converge to a hypothesis with minimum squared error (given a sufficiently small learning rate)
- Tolerates noise in the data
- Handles functions that are not linearly separable
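The following is a minimal Python sketch of the batch delta rule listed above, training a single sigmoid unit. It is an illustration rather than part of the original notes: the function names, the use of numpy, the learning-rate and iteration values, and the OR example are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_rule(X, t, eta=0.1, iterations=1000):
    """Batch delta rule for a single sigmoid unit.

    X: array of shape (m, n) of training inputs (include a constant 1
       column if a bias weight is wanted); t: array of m target values.
    """
    w = np.random.uniform(-0.05, 0.05, X.shape[1])   # small random weights
    for _ in range(iterations):                       # termination: fixed iteration count
        dw = np.zeros_like(w)                         # initialize all Δw_i to 0
        for x, target in zip(X, t):
            o = sigmoid(np.dot(w, x))                 # compute the unit's output
            dw += eta * (target - o) * x              # Δw_i ← Δw_i + η(t - o)x_i
        w += dw                                       # w_i ← w_i + Δw_i
    return w

# Example: learn OR of two binary inputs (first column is the bias input).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = delta_rule(X, t)
print(np.round(sigmoid(X @ w)))   # expected: [0. 1. 1. 1.]
```

The update Δw_i ← Δw_i + η(t - o)x_i follows the listing above; folding in the derivative factor o(1 - o) from dσ/dx would give the exact gradient of the squared error instead.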

Backpropagation Algorithm

The backpropagation algorithm trains a multi-layer network using a weight adjustment based on the sigmoid function, like the delta rule. The backpropagation method, as well as all of the methods previously mentioned, is an example of supervised learning, where the target of the function is known.

The following is an example of the backpropagation algorithm working on a small Artificial Neural Network. The network has a single hidden layer of size two, with three input nodes and three output nodes.

First, initialize the weighted links. Typically the weights are initialized to small random numbers. Then, for each training example in the training set:

Input the training data to the input nodes, then calculate O_k, the output of node k. This is done for each node in the hidden layer(s) and the output layer.

Then calculate δ_k for each output node, where t_k is the target of the node:

δ_k ← O_k(1 - O_k)(t_k - O_k)

Now calculate δ_h for each hidden node, where child(h) is the set of nodes that node h feeds into:

δ_h ← O_h(1 - O_h) Σ_{k ∈ child(h)} w_{h,k} δ_k

Finally, adjust the weights of all the links, where x_i is the activation of node i (the input flowing along the link from i to j) and η is the learning rate:

w_{i,j} ← w_{i,j} + η δ_j x_i

Most likely the neural network will need to be trained over many iterations of the training set to find an acceptable approximation of the function it is being trained on.
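To show how the three update equations fit together, here is a minimal Python sketch of one backpropagation step for the 3-2-3 network in the example. It is my own illustration, not part of the original notes: the use of numpy, the variable names, the learning rate, and the sample input and target are assumptions, and bias weights are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Network from the example above: 3 input, 2 hidden, 3 output nodes.
rng = np.random.default_rng(0)
W_ih = rng.uniform(-0.05, 0.05, (2, 3))   # weights input -> hidden
W_ho = rng.uniform(-0.05, 0.05, (3, 2))   # weights hidden -> output
eta = 0.5                                  # learning rate

def backprop_step(x, t):
    """One training example: forward pass, deltas, then weight updates."""
    global W_ih, W_ho
    # Forward pass: compute O_k for the hidden and output nodes.
    o_h = sigmoid(W_ih @ x)
    o_k = sigmoid(W_ho @ o_h)
    # Output-node deltas: δ_k = O_k(1 - O_k)(t_k - O_k)
    delta_k = o_k * (1 - o_k) * (t - o_k)
    # Hidden-node deltas: δ_h = O_h(1 - O_h) Σ_k w_{h,k} δ_k
    delta_h = o_h * (1 - o_h) * (W_ho.T @ delta_k)
    # Weight updates: w_{i,j} ← w_{i,j} + η δ_j x_i
    W_ho += eta * np.outer(delta_k, o_h)
    W_ih += eta * np.outer(delta_h, x)

# Example: one pass over a single (input, target) pair.
backprop_step(np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0]))
```

In practice this step would be repeated for every example in the training set over many iterations, as the notes describe.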

Bayesian Learning

Bayesian learning is based on probability distributions that are known in advance. Using these prior statistics, a Bayesian network can learn to classify data and predict outcomes.

Bayes' Theorem:

P(A|B) = P(B|A)P(A) / P(B)

A simple proof of Bayes' Theorem:

By definition, P(A|B) = P(A,B) / P(B).
It follows that P(A,B) = P(A|B)P(B).
By the same reasoning, P(A,B) = P(B|A)P(A).
Therefore P(B|A)P(A) = P(A|B)P(B), since both equal P(A,B).
Dividing both sides by P(B) gives P(A|B) = P(B|A)P(A) / P(B).
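As a small numeric illustration of Bayes' Theorem (my own example, not from the notes; the probabilities are invented for the sake of the calculation):

```python
# Hypothetical numbers: a test for a condition with
#   P(A) = 0.01          prior probability of the condition
#   P(B|A) = 0.95        probability of a positive test given the condition
#   P(B|not A) = 0.05    false-positive rate
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

# Total probability of a positive test: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # about 0.161
```

Even with a fairly accurate test, the posterior P(A|B) stays small because the prior P(A) is small, which is the kind of reasoning Bayesian learning relies on.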

Definitions

Artificial Neural Networks (ANNs): networks that allow for learning using a highly parallel arrangement of simple units; well suited to data that is noisy and vector based.

Backpropagation: a learning algorithm for multi-layer feed forward networks that uses the sigmoid function.

Hidden layer: the set of nodes that are neither input nor output units.

Learning rate: a value greater than 0 but less than 1, used so that the weights on the links do not change too quickly; otherwise the ANN might never converge on the optimal solution.

Linearly separable function: a function whose positive and negative examples, when plotted in n-dimensional space, can be completely separated by a straight plane across the space.

Multi-layer Feed Forward Networks: networks with at least one unit that is neither input nor output, where data flows in only one direction.

Perceptrons: networks with no hidden units (every unit is an input or output unit), where data flows in only one direction.

Supervised learning: all learning algorithms in which known targets are used to adjust the network.

Target: the expected output for a given input; used to calculate the error.

Threshold function: the function that decides whether a unit should fire. Typically it outputs 1 when the threshold is exceeded and 0 (or -1) otherwise.

Units/Nodes: the simple elements of an ANN; they take input from other nodes or from the training data, sum it, and apply a threshold function to decide what output to send.

Weighted links: connect units together; conceptually, the weight shows the strength of the bond between two units.