Neural Networks

Motivation. Problem: With our linear methods, we can train the weights but not the basis functions. (Diagram labels: activator, trainable weight, fixed basis function.)

Flashback: Linear regression. Basis functions, sometimes called features.
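For reference, the linear model behind this flashback (the slide's own formula is not in the transcript; this is the standard form it takes):

y(x, w) = w_0 + Σ_j w_j φ_j(x)

where the φ_j are fixed basis functions (the features) and only the weights w_j are trained.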

Motivation. Problem: With our linear methods, we can train the weights but not the basis functions. Can we learn the basis functions? Maybe we can reuse ideas from our previous technology, such as adjustable weight vectors?

Historical notes. This motivation is not how the technology came about. Neural networks were developed in an attempt to achieve AI by modelling the brain (but artificial neural networks have very little to do with biological neural networks).

Truth in advertising. We usually still want to extract features from the data, but in a different way: we do not have to guess at basis functions, but simply use features to pull out what is most relevant in our data, based on our intuition about the data.

Feed-forward neural network. Two levels of linear regression (or classification): a hidden layer feeding an output layer.

Flashback: Linear classification. A non-linear function assigns the class.

Feed-forward neural network
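To make the two-level structure concrete, here is a minimal sketch (not code from the lecture; the function name, the array shapes, and the choice of a single tanh hidden layer with a linear output are mine):

import numpy as np

def forward(x, W1, b1, W2, b2):
    # Hidden layer: adaptive "basis functions" z = tanh(W1 x + b1)
    a = W1 @ x + b1          # hidden pre-activations
    z = np.tanh(a)           # hidden values z_i
    # Output layer: another linear model, now on top of the learned z
    y = W2 @ z + b2          # outputs y_k (linear/identity activation)
    return y, z, a

With W1 of shape (M, D), the M rows of W1 act as the weight vectors inside the basis functions, which is exactly the part our linear methods could not train.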

Network training. If we can give the network a probabilistic interpretation, we can use our standard estimation methods for training; regression problems and classification problems each lead to their own error function (spelled out below). Training then means minimizing that error function. Typically, we cannot minimize analytically but need numerical optimization algorithms. Notice: there will typically be more than one minimum of the error function (due to symmetries). Notice: at best, we can find local minima.
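The formulas for the two cases did not survive into the transcript; the standard maximum-likelihood error functions, which are presumably what the slides show, are:

Regression (Gaussian noise model): E(w) = 1/2 Σ_n (y(x_n, w) - t_n)²

Classification (cross-entropy): E(w) = - Σ_n Σ_k t_nk ln y_k(x_n, w)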

Example: tanh activation function for 3 hidden layers; linear activation function for the output layer.

Parameter optimization. Numerical optimization is beyond the scope of this class; you can (and should) get away with just using appropriate libraries. These can typically find (local) minima of general functions, but the most efficient algorithms also need the gradient of the function to minimize.
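As an illustration of the division of labour (a sketch assuming SciPy is available; the toy quadratic stands in for the network error that back propagation would supply together with its gradient):

import numpy as np
from scipy.optimize import minimize

def error_and_grad(w):
    # Toy stand-in for the network error: E(w) = ||w - 1||², plus its gradient.
    diff = w - 1.0
    return np.sum(diff ** 2), 2.0 * diff

# jac=True tells SciPy that the objective returns (value, gradient) in one call;
# L-BFGS-B is one of the gradient-based methods alluded to above.
result = minimize(error_and_grad, x0=np.zeros(3), jac=True, method="L-BFGS-B")
print(result.x)   # close to [1, 1, 1]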

Back propagation. Back propagation is an efficient algorithm for computing both the error function and the gradient.

Back propagation. After running the feed-forward algorithm, we know the values z_i and y_k. We need to compute the δs (called errors).

∂E_n/∂w_ki = δ_k z_i: the error δ_k at the destination node (output or hidden layer) times the value z_i at the source node (hidden or input layer).

We compute the node values from input to output (the feed-forward algorithm)

We compute the error values from output to input (the back-propagation algorithm)

Back propagation. The choice of error function and output activator typically makes it simple to deal with the output layer.

Back propagation. For hidden layers, we apply the chain rule.
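Written out for a single hidden layer (these are the standard back-propagation equations; the slides' own formulas are not in the transcript), using z_i for hidden values and y_k for outputs:

Output layer, with the matching error function and output activator: δ_k = y_k - t_k

Hidden layer, by the chain rule: δ_i = h'(a_i) Σ_k w_ki δ_k, where h is the hidden activation function and a_i the input to hidden unit i

Gradients, by the "error at destination times value at source" rule: ∂E_n/∂w_ki = δ_k z_i for the output weights and ∂E_n/∂w_ij = δ_i x_j for the input-to-hidden weights.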

Calculating error and gradient
Algorithm (a code sketch follows below):
1. Apply feed-forward to calculate the hidden and output variables, and then the error.
2. Calculate the output errors and propagate the errors back to get the gradient.
3. Iterate ad lib.

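A minimal, self-contained sketch of these three steps for a network with one tanh hidden layer and a linear output, trained by plain gradient descent on the sum-of-squares error (the toy data, learning rate, and variable names are my own choices, not the lecture's):

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem: fit t = sin(x).
X = np.linspace(-3, 3, 50).reshape(1, -1)   # inputs, shape (1, N)
T = np.sin(X)                               # targets, shape (1, N)

M = 3                                       # number of hidden units
W1 = rng.normal(scale=0.5, size=(M, 1))
b1 = np.zeros((M, 1))
W2 = rng.normal(scale=0.5, size=(1, M))
b2 = np.zeros((1, 1))

eta = 0.002                                 # gradient-descent step size (ad hoc)
for step in range(10000):
    # 1. Feed-forward: hidden and output values, then the error.
    A = W1 @ X + b1                         # hidden pre-activations
    Z = np.tanh(A)                          # hidden values z_i
    Y = W2 @ Z + b2                         # outputs y_k (linear activation)
    E = 0.5 * np.sum((Y - T) ** 2)          # sum-of-squares error

    # 2. Output errors, propagated back to give the gradient.
    d_out = Y - T                           # δ_k = y_k - t_k
    d_hid = (1 - Z ** 2) * (W2.T @ d_out)   # chain rule; tanh'(a) = 1 - tanh(a)²
    gW2 = d_out @ Z.T
    gb2 = d_out.sum(axis=1, keepdims=True)
    gW1 = d_hid @ X.T
    gb1 = d_hid.sum(axis=1, keepdims=True)

    # 3. Iterate: take one gradient-descent step and repeat.
    W2 -= eta * gW2
    b2 -= eta * gb2
    W1 -= eta * gW1
    b1 -= eta * gb1

print("final sum-of-squares error:", E)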

Overfitting. The complexity of the network is determined by the size of the hidden layer, and overfitting is still a problem. Use, e.g., held-out test or validation data to detect and alleviate this problem.
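A small illustration of the held-out-data idea, early stopping (the error curves below are made up to show the typical shapes; in practice they come from evaluating the network during training):

# Pick the point where the error on the *held-out* data, not the training
# data, is lowest; training beyond that point only overfits.
train_err = [1.0 / (1 + e) for e in range(30)]            # keeps decreasing
val_err = [0.5 + 0.02 * abs(e - 12) for e in range(30)]   # U-shaped, best near epoch 12
stop_at = min(range(30), key=lambda e: val_err[e])
print("stop training at epoch", stop_at)
print("training error still falling there:", train_err[stop_at])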