CP365 Artificial Intelligence

Kelley McLaughlin
6 years ago
1 CP365 Artificial Intelligence

2 Tech News! Apple news conference tomorrow?

3 Tech News! Apple news conference tomorrow? Google cancels Project Ara modular phone

4 Weather-Based Stock Market Predictions?

5 Dataset Preparation Clean: remove bogus data/fill in missing data. Normalize data: adjust features to be similar magnitudes.

6 Deal with Missing Data Option 1: remove datapoints with any missing feature values

7 Deal with Missing Data Option 1: remove datapoints with any missing feature values Option 2: fill in missing data with <data_missing> tags for categorical data

8 Deal with Missing Data Option 1: remove datapoints with any missing feature values Option 2: fill in missing data with <data_missing> tags for categorical data Option 3: fill in missing data with global means for numeric data

9 Deal with Missing Data Option 1: remove datapoints with any missing feature values Option 2: fill in missing data with <data_missing> tags for categorical data Option 3: fill in missing data with global means for numeric data Option 4: fill in missing data with values from similar data points
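As a rough illustration of Options 1 to 3, here is a minimal Python sketch. The `patients` records, the field names, and the helper names `drop_incomplete` and `fill_missing` are hypothetical and not from the slides; missing values are assumed to be stored as `None`.

```python
from statistics import mean

MISSING_TAG = "<data_missing>"  # tag used for missing categorical values (Option 2)


def drop_incomplete(rows):
    """Option 1: keep only datapoints with no missing feature values."""
    return [row for row in rows if None not in row.values()]


def fill_missing(rows, numeric_keys, categorical_keys):
    """Options 2 and 3: tag missing categorical values, mean-fill numeric ones."""
    # Global mean of each numeric feature, computed over the known values only.
    means = {
        key: mean(row[key] for row in rows if row[key] is not None)
        for key in numeric_keys
    }
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for key in numeric_keys:
            if new_row[key] is None:
                new_row[key] = means[key]
        for key in categorical_keys:
            if new_row[key] is None:
                new_row[key] = MISSING_TAG
        filled.append(new_row)
    return filled


# Hypothetical toy dataset, for illustration only.
patients = [
    {"height_cm": 170.0, "prognosis": "Good"},
    {"height_cm": None, "prognosis": "Poor"},
    {"height_cm": 180.0, "prognosis": None},
]

complete_only = drop_incomplete(patients)
filled = fill_missing(patients, ["height_cm"], ["prognosis"])
```

Option 4 (borrowing values from similar data points) would need a similarity measure, which is left out of this sketch.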

10 Remove Outliers Some datapoints may have ridiculous feature values. We can remove outliers from our dataset to increase performance. What is an outlier?

11 Outliers Patient Height (cm) Patient Weight (kg)... Prognosis Good Good Poor Poor

12 Outliers Patient Height (cm) Patient Weight (kg)... Prognosis Good Good Poor Poor Obvious outlier. How can we define what makes an outlier? We could use 3σ as the threshold.

13 Outliers Patient Height (cm) Patient Weight (kg)... Prognosis Good Good Poor Poor This column has $\bar{x} = 156.3$ and $\sigma = 23.1$ (without the possible outlier). The 3σ thresholds would be $(156.3 - 3 \cdot 23.1,\ 156.3 + 3 \cdot 23.1)$, or $(87, 225.6)$.
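The 3σ rule can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are hypothetical, and the sample standard deviation is an assumption (the slide does not say which estimator it used). As on the slide, the thresholds are computed from reference values that exclude the suspected outlier.

```python
from statistics import mean, stdev


def three_sigma_bounds(reference):
    """Return (low, high) thresholds at mean - 3*sigma and mean + 3*sigma."""
    mu = mean(reference)
    sigma = stdev(reference)  # sample standard deviation (assumed estimator)
    return mu - 3 * sigma, mu + 3 * sigma


def remove_outliers(values, reference):
    """Keep only the values that fall inside the 3-sigma thresholds."""
    low, high = three_sigma_bounds(reference)
    return [v for v in values if low <= v <= high]


# Hypothetical column: three plausible values plus one suspicious one.
trusted = [150.0, 160.0, 170.0]
column = trusted + [400.0]

bounds = three_sigma_bounds(trusted)
cleaned = remove_outliers(column, trusted)
```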

14 A Bad Dataset Patient Height (nm): 1.31 x 10^9, 1.76 x 10^9, 1.23 x 10^9, 1.61 x 10^9. Patient Weight (tons)... Prognosis: Good, Good, Poor, Poor.

15 A Bad Dataset How will these large differences affect learning? Patient Height (nm): 1.31 x 10^9, 1.76 x 10^9, 1.23 x 10^9, 1.61 x 10^9. Patient Weight (tons)... Prognosis: Good, Good, Poor, Poor.

16 Data Normalization Procedure Patient Height (nm): 1.31 x 10^9, 1.76 x 10^9, 1.23 x 10^9, 1.61 x 10^9. Range of extreme values: 1.23 x 10^9 to 1.76 x 10^9.

17 Data Normalization Procedure Patient Height (nm): 1.31 x 10^9, 1.76 x 10^9, 1.23 x 10^9, 1.61 x 10^9. Range of extreme values: 1.23 x 10^9 to 1.76 x 10^9. Normalized range mapping: 0.0 to 1.0.

18-19 Data Normalization Formula Patient Height (nm): 1.31 x 10^9, 1.76 x 10^9, 1.23 x 10^9, 1.61 x 10^9. Say we want the normalized value, newpt, for the first height, 1.31 x 10^9, called pt. oldmax = 1.76 x 10^9, oldmin = 1.23 x 10^9, newmax = 1.0, newmin = 0.0. $newpt = \frac{pt - oldmin}{oldmax - oldmin}\,(newmax - newmin) + newmin$, which gives newpt = 0.15.
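The normalization formula translates directly into code. The sketch below uses the slide's oldmin, oldmax, newmin, and newmax; the function name `normalize` is hypothetical, and the fourth height is assumed to share the same 10^9 scale as the others.

```python
def normalize(pt, old_min, old_max, new_min=0.0, new_max=1.0):
    """Map pt from the range [old_min, old_max] to [new_min, new_max]."""
    return (pt - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Patient heights in nanometres (fourth value's exponent is an assumption).
heights_nm = [1.31e9, 1.76e9, 1.23e9, 1.61e9]
old_min, old_max = min(heights_nm), max(heights_nm)

normalized = [normalize(h, old_min, old_max) for h in heights_nm]
```

The first height maps to roughly 0.15, the smallest value to 0.0, and the largest to 1.0.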

20 How do we know if an ML model is any good?

21 Overfitting

22 Overfitting (plot: error vs. epoch, with training and testing curves)

23 A Biological Neuron

24 Human Brain

25 How many neurons? Number of neurons (cerebral cortex) by animal: Rat 20,000,000; Dog 160,000,000; Cat 300,000,000; Pig 450,000,000; Horse 1,200,000,000; Dolphin 5,800,000,000; African Elephant 11,000,000,000; Human 20,000,000,000

26 How many connections? Human: 100,000,000,000,000

27 How many connections? Human: 100,000,000,000,000; Google (2012): 1,700,000,000; Google/Stanford (2013): 11,200,000,000; Digital Reasoning (2015): 160,000,000,000

28 Artificial Neuron Output connections Threshold Function w1 w2 w3 Input connections and weights

29 Hard Threshold $S = \sum_i input_i \cdot weight_i$. If S > THRESHOLD: output = 1, else: output = 0. (Diagram: threshold function with input weights w1, w2, w3.)
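A hard-threshold neuron is small enough to write out in full. In the sketch below, the function names and the particular weights and thresholds chosen for AND and OR are illustrative assumptions; many other choices work equally well.

```python
def hard_threshold_neuron(inputs, weights, threshold):
    """Output 1 if the weighted input sum exceeds the threshold, else 0."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > threshold else 0


def logical_and(a, b):
    # Both inputs must be on for the sum (2.0) to exceed 1.5.
    return hard_threshold_neuron([a, b], [1.0, 1.0], threshold=1.5)


def logical_or(a, b):
    # A single active input (sum 1.0) is enough to exceed 0.5.
    return hard_threshold_neuron([a, b], [1.0, 1.0], threshold=0.5)


and_table = [logical_and(a, b) for a in (0, 1) for b in (0, 1)]
or_table = [logical_or(a, b) for a in (0, 1) for b in (0, 1)]
```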

30 Hard Threshold: Step Function

31 Write down artificial neurons with weights and thresholds that model the following functions: Identity Logical AND Logical OR Logical XOR Constant function

32 Sigmoid Threshold $S = \sum_i input_i \cdot weight_i$, $output = \frac{1}{1 + e^{-S}}$. (Diagram: threshold function with input weights w1, w2, w3.)
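For comparison with the hard threshold, here is a minimal sketch of a sigmoid neuron. The function names are hypothetical and the inputs in the usage lines are arbitrary illustration values, not the ones from the slides.

```python
import math


def sigmoid(s):
    """Smooth 'S'-shaped threshold: 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_neuron(inputs, weights):
    """Weighted sum of the inputs passed through the sigmoid."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(s)


# A zero weighted sum sits exactly at the midpoint of the curve.
midpoint = sigmoid_neuron([1.0, 1.0], [0.0, 0.0])
# Large positive sums saturate toward 1, large negative sums toward 0.
high, low = sigmoid(10.0), sigmoid(-10.0)
```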

33 Sigmoid Threshold: 'S' Function

34 sigmoid w1 = 0.1 w3 = 0.42 w2 = 0.2

35 sigmoid w1 = 0.1 w3 = 0.42 w2 = 0.2 Features x1 = 0.66 x2 = 0.11 x3 = 0.20

36 Output Calculations $s = w_1 x_1 + w_2 x_2 + w_3 x_3 = 0.1 \cdot 0.66 + 0.2 \cdot 0.11 + 0.42 \cdot 0.2$, $output = \frac{1}{1 + e^{-s}}$

37 y1 = 0.52 sigmoid w1 = 0.1 w3 = 0.42 w2 = 0.2 Features x1 = 0.66 x2 = 0.11 x3 = 0.20

38 Perceptron Network Output Layer Input Layer

39 Perceptron: Linear Boundary

40 Linear Boundary?

41 Multilayer Network Output Layer Hidden Layer(s) Input Layer

42 ANN Learning How to get the weights?

43 ANN Learning How to get the weights? error weight1 weight2

44 ANN Learning How do we get the right weights? Perceptron: Gradient descent Multilayer Network: Back propagation

45-47 Node Activation Function Activation (output) of node j: $a_j = g(input_j) = g\left(\sum_{i=0}^{n} w_{ij} a_i\right)$. The sum runs over all weights and input values feeding node j; g is the threshold activation function.

48-49 Minimize Global Error Function For every output node, j, take the difference in target value vs. generated output value, square it, and sum up: $error = \sum_j (t_j - a_j)^2$
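The global error is a one-line computation. A minimal sketch, with a hypothetical function name and made-up target and activation values:

```python
def global_error(targets, activations):
    """Sum over output nodes of (target - activation) squared."""
    return sum((t - a) ** 2 for t, a in zip(targets, activations))


# Two output nodes, each off by 0.5: error = 0.25 + 0.25.
example_error = global_error([1.0, 0.0], [0.5, 0.5])
```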

50-53 Perceptron Learning Update the weight on connection i → j: $\Delta w_{ij} = \eta\,(t_j - a_j)\,a_i$, where $\eta$ is the learning rate (0.3ish), $(t_j - a_j)$ is the difference in target and generated output, and $a_i$ is the input activation.

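54 Let's learn NAND! Starting weight values: W1 = 0.81, W2 = 0.55, W3 = 0.16. $a_j = g(input_j) = g\left(\sum_{i=0}^{n} w_{ij} a_i\right)$, $\eta = 0.3$, $\Delta w_{ij} = \eta\,(t_j - a_j)\,a_i$. Use sigmoid threshold. Dataset: NAND (columns: Input1, Input2, Label). Network: In1 (weight W1), In2 (weight W2), and a constant 1.0 input (weight W3) feed Out.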
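The NAND exercise can be run as a short script. This is a sketch under a few assumptions: the dataset rows are the standard NAND truth table (the slide's table values did not survive), W3 is treated as the weight on a constant 1.0 input as the diagram labels suggest, and the epoch count of 5000 is an arbitrary choice. The starting weights and η = 0.3 are the slide's.

```python
import math

# Standard NAND truth table: ((input1, input2), label).
NAND = [((0.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def output(weights, in1, in2):
    """Sigmoid activation of the output node; third input is the constant 1.0."""
    activations = (in1, in2, 1.0)
    return sigmoid(sum(w * a for w, a in zip(weights, activations)))


def train(weights, eta=0.3, epochs=5000):
    """Apply delta_w_i = eta * (t - a) * a_i after every training example."""
    weights = list(weights)
    for _ in range(epochs):
        for (in1, in2), target in NAND:
            activations = (in1, in2, 1.0)
            out = output(weights, in1, in2)
            for i, a_i in enumerate(activations):
                weights[i] += eta * (target - out) * a_i
    return weights


trained = train([0.81, 0.55, 0.16])
predictions = [round(output(trained, in1, in2)) for (in1, in2), _ in NAND]
```

Rounding the trained outputs to 0 or 1 should reproduce the NAND labels, since NAND is linearly separable.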

55 ANN Learning - Backpropagation Output Layer Hidden Layer Input Layer Put in input values and feed the activation forward to produce the output.

56 ANN Learning - Backpropagation Output Layer Hidden Layer Input Layer Calculate the error in the output layer and then backpropagate it to update lower weights.

57-59 ANN Learning - Backpropagation Update the weight on connection i → j: $\Delta w_{ij} = \eta\,\delta_j\,a_i$, where $a_i$ is the input activation. Think of $\delta_j$ as the error measure for node j. It is computed differently for output and hidden nodes.

60-62 ANN Learning Backpropagation for Output Nodes Error measure for output node, j: $\delta_j = a_j (1 - a_j)(t_j - a_j)$. The factor $a_j (1 - a_j)$ is the derivative of the sigmoid function; $(t_j - a_j)$ is the difference in target vs. generated output.

63-65 ANN Learning Backpropagation for Hidden Nodes Error measure for hidden node, j: $\delta_j = a_j (1 - a_j) \sum_k \delta_k w_{jk}$. The factor $a_j (1 - a_j)$ is the derivative of the sigmoid function; the sum is a combination of the errors of the output nodes k that node j feeds into, each weighted by the connection $w_{jk}$.
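The output-node and hidden-node error measures can be combined into a single update step. The sketch below assumes one hidden layer, sigmoid units, no bias inputs, and a nested-list weight layout (`w_hidden[j][i]` connects input i to hidden node j, `w_output[k][j]` connects hidden node j to output node k). These layout choices and the function names are assumptions for illustration.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(inputs, w_hidden, w_output):
    """Feed activations forward; return (hidden activations, output activations)."""
    hidden = [sigmoid(sum(w * a for w, a in zip(ws, inputs))) for ws in w_hidden]
    outputs = [sigmoid(sum(w * a for w, a in zip(ws, hidden))) for ws in w_output]
    return hidden, outputs


def backprop_step(inputs, targets, w_hidden, w_output, eta=0.3):
    """One weight update: delta_w_ij = eta * delta_j * a_i for every connection."""
    hidden, outputs = forward(inputs, w_hidden, w_output)
    # Output nodes: delta = a (1 - a) (t - a).
    delta_out = [a * (1 - a) * (t - a) for a, t in zip(outputs, targets)]
    # Hidden nodes: delta_j = a_j (1 - a_j) * sum_k delta_k * w_jk.
    delta_hidden = [
        hidden[j] * (1 - hidden[j])
        * sum(delta_out[k] * w_output[k][j] for k in range(len(w_output)))
        for j in range(len(hidden))
    ]
    new_w_output = [
        [w + eta * delta_out[k] * hidden[j] for j, w in enumerate(ws)]
        for k, ws in enumerate(w_output)
    ]
    new_w_hidden = [
        [w + eta * delta_hidden[j] * inputs[i] for i, w in enumerate(ws)]
        for j, ws in enumerate(w_hidden)
    ]
    return new_w_hidden, new_w_output


# Tiny 2-2-1 network with all-zero weights, one training example.
zero_hidden = [[0.0, 0.0], [0.0, 0.0]]
zero_output = [[0.0, 0.0]]
step_hidden, step_output = backprop_step([1.0, 0.0], [1.0], zero_hidden, zero_output)
```

With all-zero weights every activation is 0.5, so the output delta is 0.125 and each output weight moves by 0.3 * 0.125 * 0.5 = 0.01875, while the hidden weights stay at zero because no error flows back through zero-valued connections.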

66 ANN Learning Initialize random network weights. for epoch in range NUMBER_EPOCHS: train network on random presentation of instances; update weights with backpropagation; report global error function value.

67 Choosing the Learning Rate, η What happened when our learning rate was too high for linear regression? How do we choose an appropriate learning rate for ANNs?

68 Bold Driver After each epoch... if error went down: η = η * 1.05, else: η = η * 0.50
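The Bold Driver rule is easy to express as a helper called once per epoch. The function name and signature below are hypothetical; the 1.05 and 0.50 factors are the slide's.

```python
def bold_driver(eta, previous_error, current_error, grow=1.05, shrink=0.50):
    """Return the learning rate to use for the next epoch."""
    if current_error < previous_error:
        return eta * grow   # error went down: speed up slightly
    return eta * shrink     # error did not go down: back off sharply


eta_after_improvement = bold_driver(0.3, previous_error=1.0, current_error=0.9)
eta_after_setback = bold_driver(0.3, previous_error=1.0, current_error=1.1)
```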

69 Choosing the Network Structure Output Layer How many nodes? What are their connections? Hidden Layer Input Layer

70 Choosing the Network Structure # of output nodes determined by the number of function outputs. Output Layer Hidden Layer Input Layer

71 Choosing the Network Structure # of input nodes determined by the number of function inputs. Output Layer Hidden Layer Input Layer

72 Choosing the Network Structure Too few hidden nodes: unable to get a detailed enough approximation of the target function. Output Layer Hidden Layer Input Layer

73 Choosing the Network Structure Output Layer Too many hidden nodes: slower to train and easier to overfit training data Hidden Layer Input Layer

74 ANN Representational Power With one hidden layer: Model all continuous functions With two hidden layers: Model all functions

75 Rules of Thumb Use 1 or 2 hidden layers

76 Rules of Thumb Use 1 or 2 hidden layers Use about (2/3)n hidden nodes for reasonably complex functions

77 Rules of Thumb Use 1 or 2 hidden layers Use about (2/3)n hidden nodes for reasonably complex functions Don't train for too many epochs

78 Splitting up datasets Training data: use to train your ML model. Validation data: use to improve your ML model while training. Testing data: use to test performance of your ML model.

79 K-Fold Cross Validation Full Dataset Dataset split into k chunks

80 K-Fold Cross Validation: Pass 1 Training Dataset Validation Dataset

81 K-Fold Cross Validation: Pass 2 Training Dataset Validation Dataset

82 K-Fold Cross Validation Perform K training/validation passes. Each pass counts as a classification accuracy sample. Extreme case: K = dataset size (leave-one-out testing).
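The splitting step of K-fold cross validation can be sketched as follows. The function name `k_fold_splits` is hypothetical, and the sketch splits by index without shuffling; in practice the dataset would usually be shuffled first.

```python
def k_fold_splits(n_items, k):
    """Return k (training_indices, validation_indices) pairs, one per pass."""
    indices = list(range(n_items))
    # Spread any remainder over the first few folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        splits.append((training, validation))
        start += size
    return splits


five_fold = k_fold_splits(10, 5)
# Extreme case: K equal to the dataset size gives leave-one-out testing.
leave_one_out = k_fold_splits(4, 4)
```

Each pass trains on the training indices and measures accuracy on the validation indices, giving K accuracy samples in total.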

83 ANN Implementation?

84 Break!
