Linear Separability. Capabilities of Threshold Neurons


Linear Separability

Input space in the two-dimensional case (n = 2): [three plots of the two-dimensional input space, each divided into an output-1 region and an output-0 region by a line determined by a particular choice of the weights w1, w2 and the threshold]

So by varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1 and another region that yields output 0.

As we have seen, a two-dimensional input space can be divided by any straight line. A three-dimensional input space can be divided by any two-dimensional plane. In general, an n-dimensional input space can be divided by an (n-1)-dimensional plane or hyperplane. Of course, for n > 3 this is hard to visualize.

Capabilities of Threshold Neurons

What do we do if we need a more complex function? We can combine multiple artificial neurons to form networks with increased capabilities. For example, we can build a two-layer network with any number of neurons in the first layer giving input to a single neuron in the second layer. The neuron in the second layer could, for example, implement an AND function over its inputs x_i. What kind of function can such a network realize?

Assume that the dotted lines in the diagram represent the input-dividing lines implemented by the neurons in the first layer [diagram: the input space with axes labeled 1st comp. and 2nd comp.]. Then, for example, the second-layer neuron could output 1 if the input is within a polygon, and 0 otherwise.

However, we still may want to implement functions that are more complex than that. An obvious idea is to extend our network even further. Let us build a network that has three layers, with arbitrary numbers of neurons in the first and second layers and one neuron in the third layer. The first and second layers are completely connected, that is, each neuron in the first layer sends its output to every neuron in the second layer.
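To make this concrete, here is a minimal sketch of a single threshold neuron in Python; it is my own illustration rather than code from the lecture, and the weights and threshold are arbitrary example values.

```python
import numpy as np

def threshold_neuron(x, w, theta):
    """Output 1 if the weighted input sum reaches the threshold, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# Illustrative values (not from the slides): the line w1*x1 + w2*x2 = theta
# divides the two-dimensional input space into an output-1 and an output-0 region.
w = np.array([1.0, 2.0])
theta = 2.0

print(threshold_neuron(np.array([2.0, 1.0]), w, theta))  # on the "1" side of the line
print(threshold_neuron(np.array([0.0, 0.0]), w, theta))  # on the "0" side of the line
```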

What type of function can a three-layer network realize? Assume that the polygons in the diagram indicate the input regions for which each of the second-layer neurons yields output 1 [diagram: the input space with axes labeled 1st comp. and 2nd comp., with polygons marked by the second-layer outputs o_i]. Then, for example, the third-layer neuron could output 1 if the input is within any of the polygons, and 0 otherwise.

The more neurons there are in the first layer, the more vertices the polygons can have. With a sufficient number of first-layer neurons, the polygons can approximate any given shape. The more neurons there are in the second layer, the more of these polygons can be combined to form the output function of the network. With a sufficient number of neurons and appropriate weight vectors w_i, a three-layer network of threshold neurons can realize any (!) function f: R^n -> {0, 1}.

Terminology

Usually, we draw neural networks in such a way that the input enters at the bottom and the output is generated at the top. Arrows indicate the direction of data flow. The first layer, termed input layer, just contains the input vector and does not perform any computations. The second layer, termed hidden layer, receives input from the input layer and sends its output to the output layer. After applying their activation function, the neurons in the output layer contain the output vector.

Terminology example: [diagram of a small example network and its function f, labeled from top to bottom: output vector, output layer, hidden layer, input layer, input vector]
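As a small illustration of this construction (my own sketch, not code from the slides), the first-layer neurons below each carve out a half-plane, a second-layer AND neuron intersects them into a polygon, and a third-layer OR neuron takes the union of such polygons; all weights are arbitrary example values.

```python
import numpy as np

def threshold(x, w, theta):
    """A single threshold neuron: 1 if w.x >= theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

def and_neuron(inputs):
    """Threshold neuron implementing AND: fires only if all inputs are 1."""
    return threshold(np.array(inputs), np.ones(len(inputs)), len(inputs))

def or_neuron(inputs):
    """Threshold neuron implementing OR: fires if at least one input is 1."""
    return threshold(np.array(inputs), np.ones(len(inputs)), 1)

# First layer: three half-planes whose intersection is a triangle (illustrative values).
half_planes = [
    (np.array([ 1.0,  0.0]),  0.0),   # x1 >= 0
    (np.array([ 0.0,  1.0]),  0.0),   # x2 >= 0
    (np.array([-1.0, -1.0]), -1.0),   # x1 + x2 <= 1
]

def triangle(x):
    """Second layer: AND of the half-plane neurons -> 1 inside the polygon."""
    return and_neuron([threshold(x, w, t) for w, t in half_planes])

def network(x):
    """Third layer: OR over polygon detectors (here just one, for brevity)."""
    return or_neuron([triangle(x)])

print(network(np.array([0.2, 0.2])))  # inside the triangle -> 1
print(network(np.array([2.0, 2.0])))  # outside the triangle -> 0
```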

General Network Structure

Feedback-Based Weight Adaptation

Feedback from the environment (possibly a teacher) is used to improve the system's performance. Synaptic weights are modified to reduce the system's error in computing a desired function. For example, if increasing a specific weight increases the error, then the weight is decreased. Small adaptation steps are needed to find an optimal set of weights. The learning rate can vary during the learning process. This is typical for supervised learning.

Network Training

Basic idea: Define an error function to measure the deviation of the network output from the desired output across all training exemplars. As the weights of the network completely determine the function computed by it, this error is a function of all weights. We need to find those weights that minimize the error. An efficient way of doing this is based on the technique of gradient descent.

Gradient descent is a very common technique for finding the absolute minimum of a function. It is especially useful for high-dimensional functions. We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight space and adjusting the weights in the opposite direction.

Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x). Starting from a point x0, compute the slope f'(x0) and take a small step against it, scaled by the learning rate η:

x1 = x0 - η · f'(x0)

Repeat this iteratively until, for some xi, f'(xi) is sufficiently close to 0.

Gradients of two-dimensional functions: The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
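A minimal sketch of this one-dimensional gradient-descent loop (my own example; the error function, learning rate, and stopping tolerance are arbitrary illustrative choices):

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_steps=10000):
    """Iterate x <- x - eta * f'(x) until the slope is sufficiently close to 0."""
    x = x0
    for _ in range(max_steps):
        slope = f_prime(x)
        if abs(slope) < tol:
            break
        x = x - eta * slope
    return x

# Illustrative error function f(x) = (x - 3)^2, so f'(x) = 2 * (x - 3);
# its absolute minimum is at x = 3.
minimum = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # approximately 3.0
```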

Multilayer Networks

The backpropagation algorithm was popularized by Rumelhart, Hinton, and Williams (1986). This algorithm solved the credit assignment problem, i.e., crediting or blaming individual neurons across layers for particular outputs. The error at the output layer is propagated backwards to units at lower layers, so that the weights of all neurons can be adapted appropriately.

Backpropagation Learning Algorithm

Backpropagation:
  Start with randomly chosen weights;
  while MSE is above the desired threshold and computational bounds are not exceeded, do
    for each input pattern x_p, 1 <= p <= P,
      Compute hidden node inputs;
      Compute hidden node outputs;
      Compute inputs to the output nodes;
      Compute the network outputs;
      Compute the error between output and desired output;
      Modify the weights between hidden and output nodes;
      Modify the weights between input and hidden nodes;
    end-for
  end-while

There is a tradeoff between a network's ability to precisely learn the given exemplars and its ability to generalize (i.e., inter- and extrapolate). This problem is similar to fitting a function to a given set of data points. Let us assume that you want to find a fitting function f: R -> R for a set of three data points, and you try to do this with polynomials of degree one (a straight line), two, and nine. [Figure: the three data points together with the fitted polynomials of degree 1, degree 2, and degree 9.] Obviously, the polynomial of degree 2 provides the most plausible fit.

The same principle applies to ANNs: If an ANN has too few neurons, it may not have enough degrees of freedom to precisely approximate the desired function. If an ANN has too many neurons, it will learn the exemplars perfectly, but its additional degrees of freedom may cause it to show implausible behavior for untrained inputs; it then generalizes poorly. Unfortunately, there are no known equations that could tell you the optimal size of your network for a given application; there are only heuristics.

Reducing Overfitting with Dropout

During each training step, we turn off a randomly chosen subset of 50% of the hidden-layer neurons, i.e., we set their output to zero. During testing, we once again use all neurons but reduce their outputs by 50% to compensate for the increased number of inputs to each unit. By doing this, we prevent each neuron from relying on the output of any particular other neuron in the network. It can be argued that in this way we train an astronomical number of decoupled sub-networks, whose expertise is combined when using all neurons again. Due to the changing composition of sub-networks, it is much more difficult to overfit any of them.
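To make the training loop and the dropout scheme above concrete, here is a minimal sketch in Python/NumPy (my own illustration, not code from the lecture): a network with one hidden layer of sigmoid units is trained with backpropagation on a squared-error loss, a random 50% of the hidden units is turned off in each training step, and at test time all hidden units are used with their outputs reduced by 50%. The task, network size, learning rate, and number of steps are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative task: logical OR of two inputs; 4 hidden units, 1 output unit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
eta = 0.5                                 # learning rate

for step in range(5000):
    for x, t in zip(X, T):
        x = x.reshape(1, -1)
        # Forward pass with dropout: turn off a random 50% of hidden units.
        h = sigmoid(x @ W1)
        mask = (rng.random(h.shape) > 0.5).astype(float)
        h = h * mask
        y = sigmoid(h @ W2)

        # Backward pass: propagate the output error to the hidden layer.
        delta_out = (y - t) * y * (1 - y)
        delta_hidden = (delta_out @ W2.T) * h * (1 - h)

        # Modify hidden->output and input->hidden weights by gradient descent.
        W2 -= eta * h.T @ delta_out
        W1 -= eta * x.T @ delta_hidden

# Testing: use all hidden units but halve their outputs to compensate.
h_test = 0.5 * sigmoid(X @ W1)
print(sigmoid(h_test @ W2))   # outputs should approach [0, 1, 1, 1]
```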
