CP365 Artificial Intelligence


Lorraine Baldwin

1 CP365 Artificial Intelligence

2 Example Problem Problem: Does a given image contain cats? Input vector: RGB/BW pixels of the image. Output: Yes or No.

3 Example Problem Problem: What category is a news story? Input vector: text file of the words in the news story. Output: Sport or Technology or Business... Example story: "Bomb squad finds Schrödinger's Cat alive. A mysterious box appeared in a parking lot at Erie Community College campus in Amherst, N.Y., last Friday afternoon. The state police bomb squad responded and took an x-ray of the sealed box, which showed a cat inside!"

4 Example Problem Problem: Does a patient have prostate cancer? Input vector: RGB/BW pixels of the biopsy image Output: Malignant or benign tumor.

5 Example Problem Problem: What will be the value of the NASDAQ stock index tomorrow at 2pm? Input vector: past financial data Output:

6 Example Problem Problem: What did that human just say into the microphone? Input vector: audio data wave forms. Output: Volume down, volume down!

7 Machine Learning Models: Linear Regression; KNN (K-Nearest Neighbors); ANN (Artificial Neural Networks); Decision Trees; Naive Bayes

8 Will Random Student be Successful at CC?

9 Past Students [Table: Student | HS GPA | CC GPA, one row per past student (PS #)]

10 Supervised Learning: Data on past students → Learn → ML Model

11 Supervised Learning: New random student → ML Model → Make prediction for the new student

12 Building an ML Model [Scatterplot: College GPA vs. HS GPA]

13 Trend Line [Scatterplot with trend line: College GPA vs. HS GPA]

14 Another Problem Suppose you're too lazy to run a marathon, but you'd like to know what time you would get if you were to run one. What can you do?

15 Possible Solution: Find another variable that correlates well with marathon time.

16 Possible Solution: Find another variable that correlates well with marathon time. Running a 5K isn't too hard. Use 5K time to predict marathon time.

17 Running Dataset [Table: Runner ID | Marathon Time (min) | 5K Time (min)]

18 Scatterplot of Dataset

19 Scatterplot of Dataset How do we calculate a trend line?

20 Least-Squares Regression We want to minimize the distance between the predicted values (regression line) and actual data points.

21 Least-Squares Regression Square distances and sum them up. Minimize that value.

22 Our First ML Model: Least-Squares Linear Regression. $y = mx + b$. The model is the pair $(m, b)$.
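A minimal sketch of what "the model is (m, b)" means in practice. It fits a line with the closed-form least-squares formulas, which is a different route from the gradient descent the slides develop later; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
model = fit_line(xs, ys)
print(model)
```

Because the sample points sit exactly on a line, the fitted pair recovers that line's slope and intercept; with noisy data it would return the best compromise in the least-squares sense.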

23 An Example Model For Student Success Predictions. Model is (1.0, -2.0).

24 An Example Model For Student Success Predictions. Model is (1.0, -2.0). How do we use the model to make a prediction?

25 An Example Model For Student Success Predictions. Model is (1.0, -2.0). How do we use the model to make a prediction? New student: HS GPA = 3.25. Predicted CC GPA = 3.25 × 1.0 − 2.0 = 1.25.
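The prediction step can be sketched in a few lines of Python. The function name `predict` is an illustrative choice; the model (1.0, -2.0) and the HS GPA of 3.25 come from the slide.

```python
def predict(model, x):
    """Apply a linear model (m, b) to one input value: y = m * x + b."""
    m, b = model
    return m * x + b

model = (1.0, -2.0)                      # (slope m, intercept b) from the slide
predicted_cc_gpa = predict(model, 3.25)  # new student's HS GPA
print(predicted_cc_gpa)                  # 1.25
```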

26 Function Approximation There exists some real function (very high order!) that determines a student's success in college.

27 Function Approximation There exists some real function (very high order!) that determines a student's success in college. Our model approximates that function by learning from the available dataset.

28 Potato Diagram [Figure: the Space of Students (input vectors in $\mathbb{R}^n$), containing PS #1, PS #2, PS #3, mapped into the Space of Predictions $[0, 4]$, with an example prediction of 2.56]

29 Potato Diagram [Same figure] A difficulty: $n$ is often very large.

30 Good Function Approximations How do we know if our learned model is a good approximation for the true function?

31 Cost Function: $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

32 Cost Function: Our cost function is called $C$. $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

33 Cost Function: Its two inputs are a hypothesis (an ML model) and the training dataset. $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

34 Cost Function: For least-squares linear regression, this is the formula. $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

35 Cost Function: For every labeled training example that we have, sum up... $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

36 Cost Function: ...the difference between each model prediction... $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

37 Cost Function: ...and actual target value... $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

38 Cost Function: ...squared. $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$

39 Cost Function: So our job is to change $h$ in order to minimize the cost. $C(h, X) = \sum_{i=0}^{n} (h(x_i) - y_i)^2$
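As a rough illustration, the sum-of-squared-errors cost can be written directly from the formula. The names `cost` and `make_line` and the three (HS GPA, CC GPA) pairs are hypothetical; the hypothesis reuses the example model (1.0, -2.0) from the earlier slides.

```python
def make_line(m, b):
    """Build a hypothesis h(x) = m * x + b."""
    return lambda x: m * x + b

def cost(h, data):
    """Sum of squared differences between predictions h(x) and targets y."""
    return sum((h(x) - y) ** 2 for x, y in data)

# Hypothetical labeled examples: (HS GPA, CC GPA).
data = [(2.0, 0.5), (3.0, 1.0), (4.0, 2.5)]

h = make_line(1.0, -2.0)
print(cost(h, data))   # errors are -0.5, 0.0, -0.5, so the cost is 0.5
```

A lower cost means the hypothesis sits closer to the labeled examples, which is the quantity the rest of the deck tries to minimize.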

40 Cost Function Landscape and Contour

46 1D Valley Finding

47 Small learning rate = slow convergence

48 Large learning rate = bounce around
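The two learning-rate behaviors can be reproduced on a toy one-dimensional valley. This sketch assumes the valley is $f(x) = x^2$ (derivative $2x$); that function, the starting point, and the two rates are illustrative choices, not values from the slides.

```python
def descend(x0, learning_rate, steps):
    """Run gradient descent on f(x) = x**2 and return every visited x."""
    x = x0
    path = [x]
    for _ in range(steps):
        gradient = 2 * x                 # derivative of x**2
        x = x - learning_rate * gradient
        path.append(x)
    return path

slow = descend(10.0, learning_rate=0.01, steps=20)    # creeps toward 0
bouncy = descend(10.0, learning_rate=0.95, steps=20)  # overshoots each step

print(slow[-1])      # still far from the minimum at 0
print(bouncy[:4])    # the sign flips on every step
```

With the small rate each step shrinks x by only 2%, while with the large rate each step jumps across the valley floor to the opposite side.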

49 Gradient Descent: Go down the hill toward the valley! For each x, y pair... $\theta = \theta - \frac{\alpha\,(h_\theta(x) - y)\,x}{m}$

50 Gradient Descent: Go down the hill toward the valley! Old parameter values. $\theta = \theta - \frac{\alpha\,(h_\theta(x) - y)\,x}{m}$

51 Gradient Descent: Go down the hill toward the valley! Learning rate. $\theta = \theta - \frac{\alpha\,(h_\theta(x) - y)\,x}{m}$

52 Gradient Descent: Go down the hill toward the valley! The derivative of the cost function. $\theta = \theta - \frac{\alpha\,(h_\theta(x) - y)\,x}{m}$

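53 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$

54 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$; $C(\theta) = ((\theta x + b) - y)^2$

55 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$; $C(\theta) = ((\theta x + b) - y)^2$. Recall the chain rule: $(f \circ g)' = (f' \circ g)\,g'$

56 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$; $C(\theta) = ((\theta x + b) - y)^2$. Recall the chain rule: $(f \circ g)' = (f' \circ g)\,g'$. $g = (\theta x + b) - y$

57 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$; $C(\theta) = ((\theta x + b) - y)^2$. Recall the chain rule: $(f \circ g)' = (f' \circ g)\,g'$. $g = (\theta x + b) - y$, $f(g) = g^2$

58 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$; $C(\theta) = ((\theta x + b) - y)^2$. Recall the chain rule: $(f \circ g)' = (f' \circ g)\,g'$. $g = (\theta x + b) - y$, $g' = x$, $f(g) = g^2$, $f' \circ g = 2((\theta x + b) - y)$

59 Derivative of the cost function: $C(\theta) = (h_\theta(x) - y)^2$; $C(\theta) = ((\theta x + b) - y)^2$. Recall the chain rule: $(f \circ g)' = (f' \circ g)\,g'$. $g = (\theta x + b) - y$, $g' = x$, $f(g) = g^2$, $f' \circ g = 2((\theta x + b) - y)$, so $(f' \circ g)\,g' = 2x\,(h_\theta(x) - y)$. The constant 2 is absorbed into the learning rate, leaving $x\,(h_\theta(x) - y)$.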
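One way to sanity-check a derivative like this is to compare it with a finite-difference estimate. In the sketch below the chain rule gives $2x\,(h_\theta(x) - y)$ for a single example; the update rule in the slides omits the constant 2, which only rescales the learning rate. The function names and the sample values are hypothetical.

```python
def cost_one(theta, b, x, y):
    """Squared error for a single example with h(x) = theta * x + b."""
    return ((theta * x + b) - y) ** 2

def analytic_derivative(theta, b, x, y):
    """Chain-rule derivative of cost_one with respect to theta."""
    return 2 * x * ((theta * x + b) - y)

def numeric_derivative(theta, b, x, y, eps=1e-6):
    """Central finite-difference estimate of the same derivative."""
    up = cost_one(theta + eps, b, x, y)
    down = cost_one(theta - eps, b, x, y)
    return (up - down) / (2 * eps)

theta, b, x, y = 1.0, -2.0, 3.25, 2.0   # hypothetical values
print(analytic_derivative(theta, b, x, y))
print(numeric_derivative(theta, b, x, y))
```

The two printed values should agree to several decimal places, which is a quick way to catch a sign or factor mistake in a hand derivation.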

60 Gradient Descent: Go down the hill toward the valley! Scaled by the number of examples. $\theta = \theta - \frac{\alpha\,(h_\theta(x) - y)\,x}{m}$

61 Gradient Descent: Go down the hill toward the valley! With higher dimensional x's... $\theta_j = \theta_j - \frac{\alpha\,(h_\theta(x) - y)\,x_j}{m}$

62 Gradient Descent: Go down the hill toward the valley! For linear regression with 1 variable, $x_0$ is the regular x value and $x_1$ is a 1 (bias, or how to adjust the y-intercept). $\theta_j = \theta_j - \frac{\alpha\,(h_\theta(x) - y)\,x_j}{m}$

63 Gradient Descent: Go down the hill toward the valley! Simultaneously update $\theta_0$ and $\theta_1$. Don't recalculate the error after updating one! $\theta_j = \theta_j - \frac{\alpha\,(h_\theta(x) - y)\,x_j}{m}$

64 Group Exercise: $\theta_j = \theta_j - \frac{\alpha\,(h_\theta(x) - y)\,x_j}{m}$. Do 3 iterations of gradient descent for linear regression ($\alpha = 0.05$, starting $\theta = [1.0, 1.0]$). [Table: Rowing 2K Time (mins) | Rowing Marathon Time (mins)]

65 Gradient Descent in Python
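The code from this slide did not survive, so the following is only a sketch of batch gradient descent under the conventions stated on the earlier slides: each input is [x_0, x_1] with x_1 = 1 for the bias, errors are computed once per iteration with the old parameters, and the summed update is scaled by the number of examples m. The function names and the data are hypothetical; the learning rate and starting parameters match the group exercise.

```python
def predict(theta, x):
    """h_theta(x): dot product of the parameters and the feature list."""
    return sum(t * xj for t, xj in zip(theta, x))

def gradient_step(theta, xs, ys, alpha):
    """One simultaneous update of every theta_j."""
    m = len(xs)
    # Errors use the OLD theta for all parameters (no recalculation midway).
    errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
    return [
        t - alpha * sum(e * x[j] for e, x in zip(errors, xs)) / m
        for j, t in enumerate(theta)
    ]

def gradient_descent(xs, ys, theta, alpha, iterations):
    for _ in range(iterations):
        theta = gradient_step(theta, xs, ys, alpha)
    return theta

# Hypothetical data lying on y = 2x + 1; x_1 = 1.0 is the bias feature.
raw_x = [0.0, 1.0, 2.0, 3.0]
xs = [[x, 1.0] for x in raw_x]
ys = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(xs, ys, [1.0, 1.0], alpha=0.05, iterations=5000)
print(theta)   # approaches [2.0, 1.0]
```

Building the whole error list before touching any parameter is what makes the update simultaneous; updating theta[0] first and then recomputing errors for theta[1] would be a different algorithm.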

66 Example Data

67 Simple Approximation

68 More Complicated Approximation

69 Higher-Order Function Approximation

70 Another Approximation

71 A New Problem Suppose you just watched the Korean horror movie, A Tale of Two Sisters, and now you want to watch something related. How do we know if another movie is similar or related to AToTS?

72 Spatial Representation [Figure: 3D axes labeled Horror, Scariness, Terror]

73 Spatial Representation [Figure: A Tale of Two Sisters plotted on axes Horror, Scariness, Terror]

74 Spatial Representation [Figure: A Tale of Two Sisters plotted on axes Horror, Scariness, Terror]

75 Spatial Representation [Figure: The Grudge and The Smurfs plotted on axes Horror, Scariness, Terror]

76 Measure Distances [Figure: distances between movies on axes Horror, Scariness, Terror]

77 Euclidean Distance Metric: $\mathrm{dist}(a, b) = \sqrt{\sum_{i=0}^{n} (a_i - b_i)^2}$
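A short sketch of the Euclidean metric in Python. The function name is an illustrative choice; the two vectors are the query and the first dataset vector from the KNN slides that follow, where the distance is listed as 3.72.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

d = euclidean_distance([5.5, 2.3, 3.9], [4.6, -1.3, 3.6])
print(round(d, 2))   # 3.72
```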

78 Supervised learning: how to classify? What label should the new vector [5.5, 2.3, 3.9] get? Existing dataset (vector, label): [4.6, -1.3, 3.6], 1; [1.1, 1.2, 0.3], 0; [6.2, 2.2, 3.1], 1; [-3.0, 0.1, 2.2], 1; [-2.0, -1.7, 1.9], 0

79 Supervised learning: K Nearest Neighbors (KNN). Calculate the distance from [5.5, 2.3, 3.9] to each vector. Dataset (vector, label): [4.6, -1.3, 3.6], 1; [1.1, 1.2, 0.3], 0; [6.2, 2.2, 3.1], 1; [-3.0, 0.1, 2.2], 1; [-2.0, -1.7, 1.9], 0

80 Supervised learning: K Nearest Neighbors (KNN). Distances from [5.5, 2.3, 3.9] (vector, label, distance): [4.6, -1.3, 3.6], 1, 3.72; [1.1, 1.2, 0.3], 0, 5.79; [6.2, 2.2, 3.1], 1, 1.07; [-3.0, 0.1, 2.2], 1, 8.94; [-2.0, -1.7, 1.9], 0, 8.73

81 Supervised learning: K Nearest Neighbors (KNN). Distances from [5.5, 2.3, 3.9] (vector, label, distance): [4.6, -1.3, 3.6], 1, 3.72; [1.1, 1.2, 0.3], 0, 5.79; [6.2, 2.2, 3.1], 1, 1.07; [-3.0, 0.1, 2.2], 1, 8.94; [-2.0, -1.7, 1.9], 0, 8.73. For K=2, choose the two closest neighbors: [6.2, 2.2, 3.1] and [4.6, -1.3, 3.6].

82 Supervised learning: K Nearest Neighbors (KNN). Distances from [5.5, 2.3, 3.9] (vector, label, distance): [4.6, -1.3, 3.6], 1, 3.72; [1.1, 1.2, 0.3], 0, 5.79; [6.2, 2.2, 3.1], 1, 1.07; [-3.0, 0.1, 2.2], 1, 8.94; [-2.0, -1.7, 1.9], 0, 8.73. For K=2, choose the two closest neighbors and take a majority vote of their labels (here both are labeled 1).

83 Supervised learning: KNN - Regression. Query [5.5, 2.3, 3.9] (distance 3.72 to the first vector); the dataset vectors [4.6, -1.3, 3.6], [1.1, 1.2, 0.3], [6.2, 2.2, 3.1], [-3.0, 0.1, 2.2], [-2.0, -1.7, 1.9] now carry real-valued labels (e.g., 0.9 for the last vector). Instead of taking a majority vote, average these real-valued labels.

84 Supervised learning: KNN - Regression. Query [5.5, 2.3, 3.9] (distance 3.72 to the first vector); the dataset vectors [4.6, -1.3, 3.6], [1.1, 1.2, 0.3], [6.2, 2.2, 3.1], [-3.0, 0.1, 2.2], [-2.0, -1.7, 1.9] now carry real-valued labels (e.g., 0.9 for the last vector). OR: use a weighted average.
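A sketch of KNN regression with both a plain and a weighted average. The slides do not say which weights to use, so inverse-distance weighting is an assumption here, as are the function names and the one-dimensional data set.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_regress(query, dataset, k, weighted=False):
    """Predict a real value from the k nearest (vector, value) pairs."""
    # Pair each target value with its distance, then keep the k closest.
    scored = sorted(
        (euclidean_distance(query, vector), value) for vector, value in dataset
    )[:k]
    if not weighted:
        return sum(value for _, value in scored) / k
    # Assumed weighting: closer neighbors count more (1 / distance).
    # The small constant avoids division by zero for an exact match.
    weights = [1.0 / (distance + 1e-9) for distance, _ in scored]
    total = sum(w * value for w, (_, value) in zip(weights, scored))
    return total / sum(weights)

# Hypothetical one-dimensional data set: (vector, real-valued label).
dataset = [([0.0], 1.0), ([2.0], 3.0), ([10.0], 100.0)]

print(knn_regress([0.5], dataset, k=2))                 # plain average
print(knn_regress([0.5], dataset, k=2, weighted=True))  # pulled toward [0.0]
```

The weighted version leans toward the label of the closer neighbor, whereas the plain average treats both neighbors equally.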

85 KNN Python
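The code from this slide is not preserved, so here is a minimal sketch of a KNN classifier using the query vector and labeled data set from the preceding slides. The function names are hypothetical, and ties in the vote are broken in favor of the label seen first among the nearest neighbors, which is an assumption rather than something the slides specify.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(query, dataset, k):
    """Majority vote over the labels of the k nearest (vector, label) pairs."""
    neighbors = sorted(
        dataset, key=lambda item: euclidean_distance(query, item[0])
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Data set and query vector from the slides.
dataset = [
    ([4.6, -1.3, 3.6], 1),
    ([1.1, 1.2, 0.3], 0),
    ([6.2, 2.2, 3.1], 1),
    ([-3.0, 0.1, 2.2], 1),
    ([-2.0, -1.7, 1.9], 0),
]

print(knn_classify([5.5, 2.3, 3.9], dataset, k=2))   # 1
```

There is no training step beyond storing the data set; all the work happens when a query arrives, since every stored vector must be compared against it.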

86 Time Complexity of KNN? Model creation time vs. classification time?

87 KNN function approximation

88 3 Main Types of Machine Learning. Supervised learning: learning from labeled examples. Unsupervised learning: finding patterns in unlabeled examples. Reinforcement learning: learning based on rewards earned.

89 Two Types of Supervised Learning. Classification: choosing a discrete class for a new feature vector.

90 Two Types of Supervised Learning. Classification: choosing a discrete class for a new feature vector. Regression: predicting a floating point value based on the feature vector.

91 Practical Machine Learning: 1) Decide on a problem domain.

92 Practical Machine Learning: 1) Decide on a problem domain. 2) Identify good input features.

93 Practical Machine Learning: 1) Decide on a problem domain. 2) Identify good input features. 3) Gather a dataset of input feature values.

94 Practical Machine Learning: 1) Decide on a problem domain. 2) Identify good input features. 3) Gather a dataset of input feature values. 4) Clean up and preprocess the data as necessary.

95 Practical Machine Learning: 1) Decide on a problem domain. 2) Identify good input features. 3) Gather a dataset of input feature values. 4) Clean up and preprocess the data as necessary. 5) Label the dataset with class labels.

96 Practical Machine Learning: 1) Decide on a problem domain. 2) Identify good input features. 3) Gather a dataset of input feature values. 4) Clean up and preprocess the data as necessary. 5) Label the dataset with class labels. 6) Train a model (hypothesis) on the labeled data.

97 Practical Machine Learning: 1) Decide on a problem domain. 2) Identify good input features. 3) Gather a dataset of input feature values. 4) Clean up and preprocess the data as necessary. 5) Label the dataset with class labels. 6) Train a model (hypothesis) on the labeled data. 7) Test the model's performance on an unseen dataset.
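The last two steps, training on labeled data and testing on unseen data, can be sketched with a simple holdout split. Everything here is illustrative: the function names, the 25% test fraction, the toy threshold "model", and the synthetic data set are assumptions, not part of the slides.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and hold out a fraction as the unseen test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold_model(train_set):
    """Toy training: place a threshold halfway between the two class means."""
    zeros = [x for x, label in train_set if label == 0]
    ones = [x for x, label in train_set if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= threshold else 0

def accuracy(model, test_set):
    """Fraction of held-out examples the model labels correctly."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return correct / len(test_set)

# Hypothetical labeled data: one feature, class 1 when the feature is >= 10.
examples = [(float(x), 1 if x >= 10 else 0) for x in range(20)]

train_set, test_set = train_test_split(examples)
model = train_threshold_model(train_set)
print(accuracy(model, test_set))
```

Keeping the test examples out of training is what makes the reported accuracy an estimate of performance on data the model has not seen.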
