Perceptrons and Backpropagation Fabio Zachert Cognitive Modelling WiSe 2014/15

Content
- History
- Mathematical View of Perceptrons
- Network Structures
- Gradient Descent
- Backpropagation (Single-Layer and Multi-Layer Networks)
- Learning the Structure
- Questions

History
- Inspired by the brain's information processing
- 1943: Warren McCulloch and Walter Pitts create a computational model of the neuron
- 1974: Paul Werbos creates the backpropagation algorithm
- Generally used for regression and classification in AI

Mathematical Model of Perceptrons A unit computes the weighted sum of its inputs and passes it through an activation function: a = g(in), with in = Σ_j w_j · x_j.

Mathematical Model of Perceptrons Activation function: a non-linear function with output between 0 and 1, often a threshold function or a sigmoid. Sigmoid: g(z) = 1 / (1 + e^(−z))
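As a concrete illustration (not part of the original slides), here is a minimal Python sketch of a single sigmoid unit; the weight values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def unit_output(w, x):
    """Perceptron unit: weighted input sum, then the activation function."""
    return sigmoid(np.dot(w, x))

w = np.array([-0.3, 0.5, 0.5])   # arbitrary weights; the first is the bias weight
x = np.array([1.0, 1.0, 0.0])    # leading 1.0 is the constant bias input
print(unit_output(w, x))         # sigmoid(0.2) ≈ 0.55
```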

Boolean Functions A single unit with suitable weights and threshold can implement operators such as AND and OR. But how to create the XOR operator? XOR is not linearly separable, so no single unit can represent it. Use a network of units.
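A small sketch of AND and OR as single threshold units (the weight values here are illustrative; the exact values from the slide's figure are not preserved):

```python
import numpy as np

def threshold_unit(w, x):
    """Threshold activation: fires (1) iff the weighted input sum is positive."""
    return 1 if np.dot(w, x) > 0 else 0

w_and = np.array([-1.5, 1.0, 1.0])  # fires only when both inputs are 1
w_or = np.array([-0.5, 1.0, 1.0])   # fires when at least one input is 1

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([1.0, a, b])       # leading 1.0 is the bias input
    print(f"{a} {b} | AND={threshold_unit(w_and, x)} OR={threshold_unit(w_or, x)}")

# XOR has no such weight vector: (0,1) and (1,0) cannot be separated from
# (0,0) and (1,1) by a single line. A network of units solves it, e.g.
# XOR(a, b) = AND(OR(a, b), NOT(AND(a, b))).
```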

Network Structures Combine multiple units to generate more complex functions.
- Feed-forward networks: no loops allowed
- Recurrent networks: loops allowed; more complex functions; possibility to simulate memory; no convergence guaranteed

Network Structures [figure: a feed-forward network built up over several slides, highlighting the input units, the hidden layer, and the output unit]

Gradient Descent Problem: find argmin_x f(x) when the minimum cannot be calculated directly. The gradient defines the direction of the steepest slope at a given point, so use the gradient of f(x) to search downhill towards the minimum. Algorithm (a sketch follows below):
- Init x
- Until convergence: x ← x − α · f'(x)
- α is the learning rate
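A minimal Python sketch of this loop, minimizing the one-dimensional f(x) = (x − 3)² through its derivative; the convergence test here is a gradient threshold:

```python
def gradient_descent(df, x0, alpha=0.1, tol=1e-6, max_steps=10000):
    """Step against the derivative df until the update falls below tol."""
    x = x0
    for _ in range(max_steps):
        step = alpha * df(x)
        x = x - step              # x <- x - alpha * f'(x)
        if abs(step) < tol:       # gradient-threshold convergence test
            break
    return x

# f(x) = (x - 3)^2 has f'(x) = 2 * (x - 3) and its minimum at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # ≈ 3.0
```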

Gradient Descent When to converge?
- After n steps
- Gradient threshold
- Gain threshold
How to set the learning rate α?
- Experience of the programmer
- Adaptive algorithms
There is no guarantee of finding the global minimum.

Single-Layer Backpropagation

Single-Layer Backpropagation A supervised learning task:
- Given (x, y) pairs, learn a general function for all x
- Minimize the sum of squared errors (SSE), E(w) = ½ · Σ (y − h_w(x))², for the network function h(x) with weight vector w
- Use gradient descent to find the optimal w
- Calculate the partial derivative of the error with respect to the weight vector

Single-Layer Backpropagation Recall: differentiation. A partial derivative is the derivative of a function of multiple inputs with respect to one single input. Rules:
- Sum rule: (f + g)' = f' + g'
- Chain rule: (f(g(x)))' = f'(g(x)) · g'(x)
- Product rule: (f · g)' = f' · g + f · g'
- Power rule: (xⁿ)' = n · xⁿ⁻¹
- Constant-factor rule: (c · f)' = c · f'

Single-Layer Backpropagation Calculate the partial derivative of the error with respect to each weight. For h_w(x) = g(in) with in = Σ_j w_j · x_j, this yields ∂E/∂w_j = −(y − h_w(x)) · g'(in) · x_j.
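The derivation itself is a standard chain-rule computation, reconstructed here in the notation above (the slide's original formulas are not preserved):

```latex
\begin{align}
E &= \tfrac{1}{2}\,\bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr)^2,
  \qquad h_{\mathbf{w}}(\mathbf{x}) = g(\mathit{in}),
  \qquad \mathit{in} = \textstyle\sum_j w_j x_j \\
\frac{\partial E}{\partial w_j}
  &= \bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr)
     \cdot \frac{\partial}{\partial w_j}\bigl(y - g(\mathit{in})\bigr)
  && \text{chain rule} \\
  &= -\bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr) \cdot g'(\mathit{in})
     \cdot \frac{\partial\, \mathit{in}}{\partial w_j}
  && \text{chain rule again} \\
  &= -\bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr) \cdot g'(\mathit{in}) \cdot x_j
  && \text{sum and constant-factor rules}
\end{align}
```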

Single-Layer Backpropagation Algorithm: for each training sample, calculate the perceptron output and the error; then update the weights based on the error gradient, w_j ← w_j + α · (y − h_w(x)) · g'(in) · x_j (see the sketch below).
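A minimal Python sketch of this training loop for a single sigmoid unit learning the OR function; the data, learning rate, and epoch count are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# OR training data; the leading 1.0 in each row is the bias input
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])

w = np.zeros(3)
alpha = 0.5
for epoch in range(5000):
    for x_i, y_i in zip(X, y):
        h = sigmoid(np.dot(w, x_i))          # perceptron output
        err = y_i - h                        # per-sample error
        # follow the negative error gradient; g'(in) = h * (1 - h) for the sigmoid
        w += alpha * err * h * (1 - h) * x_i

print(np.round(sigmoid(X @ w)))  # -> [0. 1. 1. 1.]
```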

Multi-Layer Backpropagation

Multi-Layer Backpropagation Extend the network with a hidden layer.
- The update rule for the output layer is based on the output a_j of the hidden layer: w_j,k ← w_j,k + α · a_j · Δ_k, with Δ_k = g'(in_k) · (y_k − a_k)
- Propagate the error back onto the hidden layer: one hidden-layer unit is partly responsible for the errors of the connected output units. The update rule for the hidden layer is based on its output: w_i,j ← w_i,j + α · a_i · Δ_j, with Δ_j = g'(in_j) · Σ_k w_j,k · Δ_k

Multi-Layer Backpropagation Algorithm (a sketch follows below):
1. Init the input layer
2. Feed the input forward through the layers
3. Calculate the error of the output layer
4. Propagate the error backwards through the layers
5. Correct the weights based on the error gradient
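A compact Python sketch of these five steps for a network with one hidden layer, trained on XOR; the hidden-layer size, learning rate, epoch count, and random seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])   # XOR targets

W1 = rng.normal(size=(3, 4))   # input (+bias) -> 4 hidden units
W2 = rng.normal(size=(5, 1))   # hidden (+bias) -> output unit
alpha = 0.5

for epoch in range(20000):
    # 1-2: init the input layer and feed the input forward through the layers
    a0 = np.hstack([np.ones((len(X), 1)), X])    # inputs with bias column
    a1 = sigmoid(a0 @ W1)                        # hidden activations
    a1b = np.hstack([np.ones((len(X), 1)), a1])  # hidden with bias column
    a2 = sigmoid(a1b @ W2)                       # network output

    # 3: error (delta) of the output layer
    d2 = (y - a2) * a2 * (1 - a2)

    # 4: propagate the error backwards (bias row of W2 excluded)
    d1 = (d2 @ W2[1:].T) * a1 * (1 - a1)

    # 5: correct the weights along the error gradient
    W2 += alpha * a1b.T @ d2
    W1 += alpha * a0.T @ d1

print(np.round(a2.ravel()))  # -> [0. 1. 1. 0.]
```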

Learning the Structure But how to learn the size/structure of the network? Oversized nets tend to overfit: they learn the noise instead of the general function. Try to estimate the optimal size with:
- The optimal-brain-damage algorithm
- Cross-validation

N-Fold Cross-Validation Separate the data into learning and evaluation parts:
- Split the dataset into n parts (1, 2, 3, ..., n)
- Train on n−1 parts, evaluate on the remaining part
- Rotate which part is used for evaluation
- Calculate the average evaluation error
- Use the network size with the lowest evaluation error (see the sketch below)
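A sketch of the procedure in Python; train_fn and error_fn stand in for whatever training and evaluation code is used (they, and the commented train_net/evaluate_net helpers, are hypothetical placeholders, not functions from the slides):

```python
import numpy as np

def n_fold_cv_error(X, y, n, train_fn, error_fn, seed=0):
    """Average evaluation error over n folds.

    train_fn(X_train, y_train) -> model and
    error_fn(model, X_val, y_val) -> scalar error are placeholders.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n)
    errors = []
    for i in range(n):
        val = folds[i]                                        # evaluation part
        train = np.hstack([folds[j] for j in range(n) if j != i])
        model = train_fn(X[train], y[train])
        errors.append(error_fn(model, X[val], y[val]))
    return np.mean(errors)

# Choose the hidden-layer size with the lowest average evaluation error:
# best = min(sizes, key=lambda s: n_fold_cv_error(
#     X, y, 10, lambda Xt, yt: train_net(Xt, yt, hidden=s), evaluate_net))
```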

Questions

Questions How to initialize weights?

Sources
- Russell, S., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Pearson Education.
- Madsen, K., Nielsen, H. B., & Tingleff, O. (2004). Methods for Non-Linear Least Squares Problems (2nd ed.).
- Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
- http://en.wikipedia.org/wiki/differentiation_rules

Thank you for your attention!