CP365 Artificial Intelligence


Tech News! Apple news conference tomorrow? Google cancels Project Ara modular phone

Weather-Based Stock Market Predictions?

Dataset Preparation
Clean: remove bogus data / fill in missing data
Normalize: adjust features to be of similar magnitudes

Deal with Missing Data
Option 1: remove datapoints with any missing feature values
Option 2: fill in missing data with <data_missing> tags for categorical data
Option 3: fill in missing data with global means for numeric data
Option 4: fill in missing data with values from similar data points
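
As a concrete illustration of Options 2 and 3, here is a minimal Python sketch that tags missing categorical values and fills missing numeric values with the global column mean. The row layout and the column names are hypothetical, chosen only for the example.

def fill_missing(rows, numeric_cols, categorical_cols):
    # Global mean of each numeric column, ignoring missing entries (Option 3)
    means = {}
    for col in numeric_cols:
        vals = [r[col] for r in rows if r[col] is not None]
        means[col] = sum(vals) / len(vals)
    for r in rows:
        for col in numeric_cols:
            if r[col] is None:
                r[col] = means[col]
        for col in categorical_cols:
            if r[col] is None:
                r[col] = "<data_missing>"   # Option 2
    return rows

data = [{"height": 131.2, "color": "blue"},
        {"height": None,  "color": None},
        {"height": 176.7, "color": "green"}]
print(fill_missing(data, ["height"], ["color"]))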

Remove Outliers Some datapoints may have ridiculous feature values. We can remove outliers from our dataset to increase performance. What is an outlier?

Outliers
Patient Height (cm) | Patient Weight (kg) | ... | Prognosis
131.2 | 59.2 | ... | Good
176.7 | 82.9 | ... | Good
12613.9 | 66.0 | ... | Poor
161.0 | 70.2 | ... | Poor

Outliers The height of 12613.9 cm is an obvious outlier. How can we define what makes an outlier? We could use 3σ as the threshold.

Outliers The height column has mean 156.3 and σ = 23.1 (computed without the possible outlier). The 3σ thresholds would be (156.3 - 3 * 23.1, 156.3 + 3 * 23.1), or (87, 225.6).
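
A minimal Python sketch of the 3σ rule just described. The reference mean (156.3) and standard deviation (23.1) are the slide's values, computed without the suspected outlier; the helper function itself is illustrative, not part of the lecture code.

def three_sigma_filter(values, mean, sigma, num_sigma=3.0):
    low = mean - num_sigma * sigma
    high = mean + num_sigma * sigma
    return [v for v in values if low <= v <= high]

heights = [131.2, 176.7, 12613.9, 161.0]
# 12613.9 falls outside (87.0, 225.6) and is removed
print(three_sigma_filter(heights, mean=156.3, sigma=23.1))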

A Bad Dataset
Patient Height (nm) | Patient Weight (tons) | ... | Prognosis
1.31 x 10^9 | 0.065 | ... | Good
1.76 x 10^9 | 0.091 | ... | Good
1.23 x 10^9 | 0.073 | ... | Poor
1.61 x 10^9 | 0.077 | ... | Poor
How will these large differences in feature magnitude affect learning?

Data Normalization Procedure [Diagram: the range of extreme values for Patient Height (nm), 1.23 x 10^9 to 1.76 x 10^9, is mapped onto a normalized range such as 0.0 to 1.0 (or -1.0 to 1.0).]

Data Normalization Formula
Say we want the normalized value, newpt, for the first height, pt = 1.31 x 10^9.
oldmax = 1.76 x 10^9, oldmin = 1.23 x 10^9, newmax = 1.0, newmin = 0.0
newpt = (pt - oldmin) / (oldmax - oldmin) * (newmax - newmin) + newmin
newpt = 0.15
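
A minimal Python sketch of this min-max normalization formula, applied to the four heights above; the function name is just for illustration.

def normalize(pt, old_min, old_max, new_min=0.0, new_max=1.0):
    return (pt - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

heights = [1.31e9, 1.76e9, 1.23e9, 1.61e9]
old_min, old_max = min(heights), max(heights)
print([round(normalize(h, old_min, old_max), 2) for h in heights])
# -> [0.15, 1.0, 0.0, 0.72]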

How do we know if an ML model is any good?

Overfitting

[Plot: testing error vs. training epoch]

A Biological Neuron

Human Brain

How many neurons? Number of neurons in the cerebral cortex:
Rat | 20,000,000
Dog | 160,000,000
Cat | 300,000,000
Pig | 450,000,000
Horse | 1,200,000,000
Dolphin | 5,800,000,000
African Elephant | 11,000,000,000
Human | 20,000,000,000

How many connections?
Human | 100,000,000,000,000
Google (2012) | 1,700,000,000
Google/Stanford (2013) | 11,200,000,000
Digital Reasoning (2015) | 160,000,000,000

Artificial Neuron [Diagram: input connections with weights w1, w2, w3 feed a threshold function, whose result drives the output connections.]

Hard Threshold
S = sum of all input_i * weight_i
if S > THRESHOLD: output = 1
else: output = 0

Hard Threshold: Step Function
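
A minimal Python sketch of a hard-threshold (step-function) neuron. The AND weights and threshold shown are illustrative hand-picked values, not taken from the slides.

def step_neuron(inputs, weights, threshold):
    s = sum(x * w for x, w in zip(inputs, weights))   # weighted sum of inputs
    return 1 if s > threshold else 0

# Logical AND: fires only when both inputs are 1
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, step_neuron([x1, x2], weights=[1.0, 1.0], threshold=1.5))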

Exercise: write down artificial neurons with weights and thresholds that model the following functions: identity, logical AND, logical OR, logical XOR, constant function.

Sigmoid Threshold
S = sum of all input_i * weight_i
output = 1 / (1 + e^(-S))

Sigmoid Threshold: 'S' Function

Sigmoid unit with weights w1 = 0.1, w2 = 0.2, w3 = 0.42. Features: x1 = 0.66, x2 = 0.11, x3 = 0.20

Output Calculations
s = w1 * x1 + w2 * x2 + w3 * x3
s = 0.1 * 0.66 + 0.2 * 0.11 + 0.42 * 0.20
s = 0.172
y1 = 1 / (1 + e^(-0.172)) ≈ 0.54
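
A minimal Python sketch reproducing this forward computation.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

weights = [0.1, 0.2, 0.42]
features = [0.66, 0.11, 0.20]
s = sum(w * x for w, x in zip(weights, features))
print(round(s, 3), round(sigmoid(s), 2))   # -> 0.172 0.54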

Perceptron Network [Diagram: an input layer connected directly to an output layer.]

Perceptron: Linear Boundary

Linear Boundary?

Multilayer Network [Diagram: an input layer feeding one or more hidden layers, which feed an output layer.]

ANN Learning How to get the weights?

ANN Learning How to get the weights? [Plot: error surface as a function of weight1 and weight2.]

ANN Learning How do we get the right weights? Perceptron: gradient descent. Multilayer network: backpropagation.

Node Activation Function
Activation (output) of node j: a_j = g(in_j) = g( Σ_{i=0..n} w_ij * a_i )
in_j is the sum over all incoming weights times input activations, and g is the threshold activation function.

Minimize Global Error Function
error = Σ_j (t_j - a_j)^2
For every output node j, take the difference between the target value t_j and the generated output a_j, square it, and sum over all output nodes.
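
A minimal Python sketch of this error function for a single training example; the target and output values are made up for illustration.

def global_error(targets, outputs):
    return sum((t - a) ** 2 for t, a in zip(targets, outputs))

print(round(global_error([1.0, 0.0], [0.8, 0.3]), 2))   # -> 0.13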

Perceptron Learning
Δw_ij = η (t_j - a_j) a_i
Update the weight on connection i → j: η is the learning rate (around 0.3), (t_j - a_j) is the difference between the target and the generated output, and a_i is the input activation.

Let's learn NAND!
Starting weight values: W1 = 0.81, W2 = 0.55, W3 = 0.16
η = 0.3, use the sigmoid threshold
a_j = g(in_j) = g( Σ_{i=0..n} w_ij * a_i ), Δw_ij = η (t_j - a_j) a_i
[Diagram: Out receives In1 through W1, In2 through W2, and a constant bias input of 1.0 through W3.]
Dataset: NAND
Input1 | Input2 | Label
0 | 0 | 1
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
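
One possible Python sketch of this exercise: a single sigmoid unit trained on NAND with the perceptron rule Δw_ij = η (t_j - a_j) a_i. The number of epochs is an arbitrary illustrative choice.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

weights = [0.81, 0.55, 0.16]          # W1, W2, W3 (weight on the constant bias input 1.0)
eta = 0.3
nand_data = [([0, 0], 1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

for epoch in range(5000):
    for inputs, target in nand_data:
        a = inputs + [1.0]             # append the constant bias input
        out = sigmoid(sum(w * x for w, x in zip(weights, a)))
        for i in range(len(weights)):  # delta-rule weight update
            weights[i] += eta * (target - out) * a[i]

for inputs, target in nand_data:
    a = inputs + [1.0]
    out = sigmoid(sum(w * x for w, x in zip(weights, a)))
    print(inputs, target, round(out, 2))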

ANN Learning - Backpropagation Forward pass: put in input values and feed the activation forward through the hidden layer(s) to produce the output.

ANN Learning - Backpropagation Backward pass: calculate the error in the output layer and then backpropagate it to update the lower weights.

ANN Learning - Backpropagation
Δw_ij = η δ_j a_i
Update the weight on connection i → j: a_i is the input activation and δ_j is the error measure for node j. δ_j is computed differently for output and hidden nodes.

ANN Learning Backpropagation for Output Nodes
δ_j = a_j (1 - a_j)(t_j - a_j)
Error measure for output node j: a_j (1 - a_j) is the derivative of the sigmoid function, and (t_j - a_j) is the difference between the target and the generated output.

ANN Learning Backpropagation for Hidden Nodes
δ_j = a_j (1 - a_j) Σ_k δ_k w_jk
Error measure for hidden node j: a_j (1 - a_j) is the derivative of the sigmoid function, and Σ_k δ_k w_jk combines the error measures of the output nodes that node j's weights contribute to.

ANN Learning
Initialize random network weights
for epoch in range(NUMBER_EPOCHS):
    Train the network on a random presentation of the instances
    Update the weights with backpropagation
    Report the global error function value
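
A minimal Python sketch of this whole procedure: a tiny one-hidden-layer network trained by backpropagation with the update rules above. The network size (2-3-1), learning rate, epoch count, and the XOR dataset are illustrative choices, not values given in the slides.

import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

random.seed(0)
n_in, n_hid, n_out = 2, 3, 1
eta = 0.5
# w_ih[i][j]: weight from input i (plus bias) to hidden node j; w_ho likewise
w_ih = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
w_ho = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid + 1)]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]   # XOR

for epoch in range(10000):
    random.shuffle(data)                    # random presentation of instances
    error = 0.0
    for x, t in data:
        # Forward pass
        a_in = x + [1.0]                    # constant bias input
        a_hid = [sigmoid(sum(a_in[i] * w_ih[i][j] for i in range(n_in + 1)))
                 for j in range(n_hid)]
        a_hidb = a_hid + [1.0]              # bias for the hidden layer
        a_out = [sigmoid(sum(a_hidb[j] * w_ho[j][k] for j in range(n_hid + 1)))
                 for k in range(n_out)]
        error += sum((t[k] - a_out[k]) ** 2 for k in range(n_out))
        # Backward pass: error measures for output and hidden nodes
        d_out = [a_out[k] * (1 - a_out[k]) * (t[k] - a_out[k]) for k in range(n_out)]
        d_hid = [a_hid[j] * (1 - a_hid[j]) * sum(d_out[k] * w_ho[j][k] for k in range(n_out))
                 for j in range(n_hid)]
        # Weight updates: dw_ij = eta * delta_j * a_i
        for j in range(n_hid + 1):
            for k in range(n_out):
                w_ho[j][k] += eta * d_out[k] * a_hidb[j]
        for i in range(n_in + 1):
            for j in range(n_hid):
                w_ih[i][j] += eta * d_hid[j] * a_in[i]
    if epoch % 2000 == 0:
        print("epoch", epoch, "global error", round(error, 4))

for x, t in sorted(data):
    a_in = x + [1.0]
    a_hid = [sigmoid(sum(a_in[i] * w_ih[i][j] for i in range(n_in + 1)))
             for j in range(n_hid)] + [1.0]
    out = sigmoid(sum(a_hid[j] * w_ho[j][0] for j in range(n_hid + 1)))
    print(x, t, round(out, 2))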

Choosing the Learning Rate, η What happened when our learning rate was too high for linear regression? How do we choose an appropriate learning rate for ANNs?

Bold Driver After each epoch:
if error went down: η = η * 1.05
else: η = η * 0.50
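
A minimal Python sketch of how the bold driver rule plugs into a training loop. train_one_epoch is a hypothetical stand-in for one epoch of network training that returns the global error; here it is faked with a simple curve (including one bump) so the sketch runs on its own.

def train_one_epoch(eta, epoch):
    # Fake error curve, for illustration only: mostly decreasing, with a bump at epoch 5
    return 1.0 / (epoch + 1) + (0.2 if epoch == 5 else 0.0)

eta, prev_error = 0.3, float("inf")
for epoch in range(10):
    error = train_one_epoch(eta, epoch)
    if error < prev_error:
        eta *= 1.05        # error went down: be bolder
    else:
        eta *= 0.50        # error went up: back off
    prev_error = error
    print(epoch, round(eta, 3), round(error, 3))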

Choosing the Network Structure How many nodes? What are their connections?

Choosing the Network Structure The number of output nodes is determined by the number of function outputs.

Choosing the Network Structure The number of input nodes is determined by the number of function inputs.

Choosing the Network Structure Too few hidden nodes: unable to get a detailed enough approximation of the target function.

Choosing the Network Structure Too many hidden nodes: slower to train and easier to overfit the training data.

ANN Representational Power With one hidden layer: model all continuous functions. With two hidden layers: model all functions.

Rules of Thumb
Use 1 or 2 hidden layers
Use about (2/3)n hidden nodes for reasonably complex functions
Don't train for too many epochs

Splitting Up Datasets
Training data: used to train your ML model
Validation data: used to improve your ML model while training
Testing data: used to test the performance of your ML model

K-Fold Cross Validation [Diagram: the full dataset is split into k chunks.]

K-Fold Cross Validation: Pass 1 [Diagram: one chunk is held out as the validation dataset; the remaining chunks form the training dataset.]

K-Fold Cross Validation: Pass 2 [Diagram: a different chunk is held out as the validation dataset.]

K-Fold Cross Validation
Perform k training/validation passes
Each pass counts as one classification-accuracy sample
Extreme case: k = dataset size, known as leave-one-out testing
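
A minimal Python sketch of k-fold splitting. evaluate is a hypothetical placeholder for training a model on the training chunks and measuring accuracy on the held-out validation chunk.

def k_fold_splits(dataset, k):
    chunk = len(dataset) // k
    for i in range(k):
        validation = dataset[i * chunk:(i + 1) * chunk]
        training = dataset[:i * chunk] + dataset[(i + 1) * chunk:]
        yield training, validation

def evaluate(training, validation):
    return 1.0                                  # placeholder accuracy

dataset = list(range(20))
scores = [evaluate(tr, va) for tr, va in k_fold_splits(dataset, k=5)]
print(sum(scores) / len(scores))
# Leave-one-out testing is the extreme case: k = len(dataset)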

ANN Implementation?

Break!