Network Traffic Measurements and Analysis


1 DEIB - Politecnico di Milano Fall, 2017

2

3 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning; James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning; Andrew Ng: Machine Learning course

4 Introduction Given this data, what throughput should we expect when 100 people are connected to the BS? [Scatter plot: Expected throughput (Mbps) vs. Number of active UEs]

5 Introduction Given this data, what throughput should we expect when 100 people are connected to the BS? [Scatter plot: Expected throughput (Mbps) vs. Number of active UEs] Fit a straight line?

6 Introduction Given this data, what throughput should we expect when 100 people are connected to the BS? [Scatter plot: Expected throughput (Mbps) vs. Number of active UEs] Fit a straight line? Fit a second-order polynomial?

7 Supervised learning problems Two types of problems: Regression: predict a continuous-valued output. Classification: classify data into discrete classes. Both generally consider many features: predict the throughput based on the number of connected users, signal strength, device type; classify incoming traffic as malicious based on source IP address, length of flows, evil bit

8

9 Agenda We'll focus on the following supervised learning algorithms: Regression: (Multiple) Linear regression, Nearest Neighbour. Classification: Logistic Regression, Decision trees, Naive Bayes classifiers. Other very popular and powerful algorithms are available: Support Vector Machines (SVM), Neural Networks, Random Forests...

10

11 Linear Regression Starting point: the Training Set. m training examples; x input variables (features, predictors); y output variable (target, response); (x^(i), y^(i)) denotes the i-th training example. Linear regression takes the training set and generates a hypothesis function h that tries to estimate the value of y given x: h_θ(x) = θ_0 + θ_1 x (1) θ_0, θ_1 are the parameters, determined by a learning system and used to output a prediction ŷ based on the input x

12 Linear Regression - implementation How to choose the parameters? We would like to choose θ s.t. h_θ(x) is close to y for our training examples. h_θ(x) tries to map x into y; since we have both x and y, we can evaluate how well h_θ(x) does this... Define the Mean Squared Error (MSE) cost function: J(θ) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

13 Linear Regression - implementation How to choose the parameters? We would like to choose θ s.t. h_θ(x) is close to y for our training examples. h_θ(x) tries to map x into y; since we have both x and y, we can evaluate how well h_θ(x) does this... Define the Mean Squared Error (MSE) cost function: J(θ) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² We want to solve a minimization problem: min_θ J(θ) (2)

14 Linear Regression - Normal Equation Let X (the design matrix) be the m×2 matrix whose i-th row is [1, x^(i)], and let y be the column vector [y^(1), y^(2), ..., y^(m)]^T. It is possible to show that the optimal parameter vector θ = [θ_0, θ_1]^T is: θ = (X^T X)^{-1} X^T y At an arbitrary input x, the prediction is ŷ = h_θ(x)
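As an illustration (not part of the original slides), here is a minimal NumPy sketch of the normal equation on a hypothetical toy data set; the variable names and numbers are assumptions.

import numpy as np

# Toy training set (assumed values): number of active UEs (x) and throughput in Mbps (y).
x = np.array([10.0, 20.0, 40.0, 60.0, 80.0])
y = np.array([38.0, 33.0, 25.0, 18.0, 12.0])

# Design matrix with a leading column of ones for the intercept theta_0.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y (pinv for numerical robustness).
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

# Prediction at an arbitrary input, e.g. 100 active UEs.
y_hat = np.array([1.0, 100.0]) @ theta
print(theta, y_hat)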

15 [Plot: linear regression fit over the training data (data1); Expected throughput (Mbps) vs. Number of active UEs]

16 Multilinear regression Sometimes, we would like to use more than one feature for making our prediction, e.g.: the throughput may be predicted based on (i) how many users are connected, x_1, and (ii) the channel quality, x_2. We may also create new features starting from existing ones, e.g., the square of the number of connected users, x_3 = x_1². In general, we can have n features x_1, ..., x_n. The hypothesis in the case of multilinear regression is: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n (3)

17 Multilinear regression The normal equation can still be used as a solution, with the design matrix X defined as the m×(n+1) matrix whose i-th row is [1, x_1^(i), x_2^(i), x_3^(i), ..., x_n^(i)], and θ = [θ_0, θ_1, ..., θ_n]^T = (X^T X)^{-1} X^T y

18 [Plot: quadratic and cubic multilinear regression fits over the training data (data1); Expected throughput (Mbps) vs. Number of active UEs]

19 Normal equation - Practical problems The normal equation requires computing the matrix (X^T X)^{-1}. (X^T X) is an (n+1)×(n+1) matrix. Most implementations compute the matrix inverse in O(n³): slow if n is large! How large? n ≈ 1000 is still (relatively) small. (X^T X) may be non-invertible

20 Normal equation - Practical problems The normal equation requires computing the matrix (X^T X)^{-1}. (X^T X) is an (n+1)×(n+1) matrix. Most implementations compute the matrix inverse in O(n³): slow if n is large! How large? n ≈ 1000 is still (relatively) small. (X^T X) may be non-invertible Cause 1: m ≤ n (more features than examples... bad idea)

21 Normal equation - Practical problems The normal equation requires computing the matrix (X^T X)^{-1}. (X^T X) is an (n+1)×(n+1) matrix. Most implementations compute the matrix inverse in O(n³): slow if n is large! How large? n ≈ 1000 is still (relatively) small. (X^T X) may be non-invertible Cause 1: m ≤ n (more features than examples... bad idea) Cause 2: redundant features (some of the columns of X are linearly dependent: remove them!)

22 Gradient descent What if n is too large? Can we still learn the optimal parameters θ?

23 Gradient descent What if n is too large? Can we still learn the optimal parameters θ? The answer is the Gradient Descent algorithm: Start with some initial values for θ, e.g. θ_i = 0 for all i

24 Gradient descent What if n is too large? Can we still learn the optimal parameters θ? The answer is the Gradient Descent algorithm: Start with some initial values for θ, e.g. θ_i = 0 for all i Change θ in the direction of the negative gradient, in order to reduce the cost function J(θ) a little bit.

25 Gradient descent What if n is too large? Can we still learn the optimal parameters θ? The answer is the Gradient Descent algorithm: Start with some initial values for θ, e.g. θ_i = 0 for all i Change θ in the direction of the negative gradient, in order to reduce the cost function J(θ) a little bit. Repeat until convergence (a local minimum is found). For (multi)linear regression, the cost function is convex and has only a single minimum.

26 Gradient descent What if n is too large? Can we still learn the optimal parameters θ? The answer is the Gradient Descent algorithm: Start with some initial values for θ, e.g. θ_i = 0 for all i Change θ in the direction of the negative gradient, in order to reduce the cost function J(θ) a little bit. Repeat until convergence (a local minimum is found). For (multi)linear regression, the cost function is convex and has only a single minimum. Gradient descent will converge to the same optimal θ found by the normal equation (if α is small enough...)

27 Gradient descent Do the following until convergence: θ_j := θ_j − α ∂J(θ)/∂θ_j, for all j (4) that is: θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) (5) with x_0^(i) = 1 for convenience.

28 Gradient descent The parameter α in: θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) (6) is the learning rate. How to choose it?

29 Gradient descent The parameter α in: θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) (6) is the learning rate. How to choose it? too small: baby steps, convergence takes too long

30 Gradient descent The parameter α in: θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) (6) is the learning rate. How to choose it? too small: baby steps, convergence takes too long too big: huge steps, you can overshoot the minimum and fail to converge Normalizing features (e.g., between -1 and 1) helps in providing numerical stability
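A minimal NumPy sketch of the batch update above; the learning rate and iteration count are arbitrary illustrative choices, and constant factors are folded into α as in the update rule on the slide.

import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=5000):
    # X: m x (n+1) design matrix (first column of ones), y: m-vector of targets.
    m, n1 = X.shape
    theta = np.zeros(n1)                     # start with theta_i = 0 for all i
    for _ in range(n_iters):
        errors = X @ theta - y               # h_theta(x^(i)) - y^(i) for every example
        theta -= alpha * (X.T @ errors) / m  # step against the gradient of J(theta)
    return theta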

31

32 k-nearest Neighbours Regression Simple idea: given unknown input x, output y by averaging the k most similar examples in the training set.

33 k-nearest Neighbours Regression Simple idea: given unknown input x, output y by averaging the k most similar examples in the training set. How many examples should we take (i.e., what is k)?

34 k-nearest Neighbours Regression Simple idea: given unknown input x, output y by averaging the k most similar examples in the training set. How many examples should we take (i.e., what is k)? How should we average the examples? Generally, Inverse Distance Weighting (IDW) is applied. ŷ = k w k y k (7) with w k inversely proportional to dist(x, x (k) ).

35 k-nn Training data knn, k = 2 knn, k = 4 Normalized Expected throughput (Mbps) Normalized Number of active UE

36 k-nn Training data knn, k = 2 knn, k = 4 Normalized Expected throughput (Mbps) Normalized Number of active UE How many samples are needed for prediction? Will it generalize well to new data points?

37 LOESS Linear regression and k-NN can be fused together in the LOESS (LOcal regression) method. For a new input x_0: gather the k points (x^(i), y^(i)) whose x^(i) are nearest to x_0; fit a linear model h_θ using only these k points; output ŷ = h_θ(x_0)
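Following that recipe, a LOESS-style prediction at a single query point could look like the sketch below: an unweighted local linear fit on the k nearest points. A full LOESS implementation would also weight the neighbours by distance; this simplification is an assumption.

import numpy as np

def loess_predict(x0, x_train, y_train, k=5):
    # Indices of the k training points nearest to the query point x0.
    idx = np.argsort(np.abs(x_train - x0))[:k]
    Xk = np.column_stack([np.ones(k), x_train[idx]])   # local design matrix
    theta = np.linalg.pinv(Xk.T @ Xk) @ Xk.T @ y_train[idx]
    return np.array([1.0, x0]) @ theta                 # evaluate the local line at x0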

38 Bias-Variance trade-off We have seen a few algorithms. They work well in practice, but may suffer from:

39 Bias-Variance trade-off We have seen a few algorithms. They work well in practice, but may suffer from: underfitting (high bias). It occurs when the model used is too simple (e.g. linear regression).

40 Bias-Variance trade-off We have seen a few algorithms. They work well in practice, but may suffer from: underfitting (high bias). It occurs when the model used is too simple (e.g. linear regression). overfitting (high variance). It occurs when the model follows the training data too closely (e.g. k-NN, high-order polynomial fitting)

41 Evaluating a hypothesis How do we know if our hypothesis is suffering from underfitting or overfitting? Linear regression works by finding θ so that the cost function J(θ) is minimized. What if we found J(θ) = 0? Does it mean we have found the perfect model?

42 Evaluating a hypothesis How do we know if our hypothesis is suffering from underfitting or overfitting? Linear regression works by finding θ so that the cost function J(θ) is minimized. What if we found J(θ) = 0? Does it mean we have found the perfect model? Can't be sure unless we try to generalize! Standard way to evaluate the hypothesis: split the training set! 70% used for training the model, by minimizing J_train; 20% used for computing the cross-validation error J_cv; 10% used for computing the test error J_test

43 Evaluating a hypothesis [Plot: training data (data1) with linear, quadratic, and 9th-degree polynomial fits; Normalized Expected throughput (Mbps) vs. Normalized Number of active UEs] Which one will have the smallest J_train? Should we select the model based on that?

44 Good practices for model selection Divide your initial data set into three parts. Train all your models on the training set. Compute J_cv on the cross-validation set. Pick the model that has the smallest J_cv error. Compute the generalization error J_test on the test set.

45 k-fold cross validation If the total number of observations m is small, we may have too few examples in each set. Also, J_cv depends on which 20% of the data is used. To cope with these issues, we can use k-fold cross validation: randomly divide the entire set into k folds of similar size; use the first fold as the cross-validation set and the other k − 1 as the training set, obtaining J_cv,1; repeat k times, each time changing the cross-validation fold; set J_cv = (1/k) Σ_{i=1}^{k} J_cv,i. Generally, k = 5 or k = 10 is used
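A hand-rolled sketch of that procedure (scikit-learn's KFold does the same bookkeeping); the fit and cost callables are assumptions standing in for the training and error routines defined earlier.

import numpy as np

def k_fold_cv_error(X, y, fit, cost, k=5, seed=0):
    # fit(X, y) returns learned parameters, cost(theta, X, y) returns a scalar error.
    idx = np.random.default_rng(seed).permutation(X.shape[0])   # shuffle before splitting
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                           # i-th fold is the validation set
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        theta = fit(X[train], y[train])
        errors.append(cost(theta, X[val], y[val]))
    return np.mean(errors)                                       # J_cv averaged over the k folds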

46 Bias-variance diagnosis How do we understand if our model is suffering from overfitting? We look at the train and cross-validation errors. High bias: both errors are high High variance: training error is low and cv error is high

47 Bias-variance diagnosis Another possibility is looking at the learning curves

48 Bias-variance diagnosis Another possibility is looking at the learning curves High bias: errors are close and high, adding data does not help, you should use a different / more complex model High variance: adding data could help, but you should consider using a simpler model or using regularization

49 Regularization / Ridge regression Overfitting happens when the model is too complex, or uses an excessive number of features. Main idea: modify the cost function by adding a penalty on the parameters: J(θ) = (1/m) [ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θ_j² ] (8) The parameter λ controls the amount of penalization. λ should be chosen carefully: if too big we'll have underfitting; if too small it's like not using it
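For linear regression the penalized cost also has a closed-form solution; the sketch below assumes the usual convention of leaving the intercept θ_0 unpenalized.

import numpy as np

def ridge_fit(X, y, lam=1.0):
    # X: m x (n+1) design matrix with a leading column of ones, lam: regularization strength.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                                # do not shrink the intercept theta_0
    return np.linalg.pinv(X.T @ X + penalty) @ X.T @ y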

50

51 When the target variable y is discrete (e.g., 0/1), we talk about classification problems. [Scatter plot: Application Type (0 = Type 1, 1 = Type 2) vs. Flow duration]

52 We could use linear regression with a threshold: Estimate h_θ(x) = θ^T x = θ_0 + θ_1 x. Output 0 if θ^T x < 0.5, 1 otherwise. [Plot: data (data1) with fitted line; Application Type (0 = Type 1, 1 = Type 2) vs. Flow duration]

53 Linear regression as classifier Problem 1: the addition of a single point changes things a lot! [Two plots: fitted line and 0.5 threshold before and after adding one point; Application Type (0 = Type 1, 1 = Type 2) vs. Flow duration] Problem 2: h_θ(x) outputs values much greater than 1 or much smaller than 0

54 k-nn classifier We could use a k-nearest Neighbour classifier: we take the k most similar example and output the class who occurs most often (majority voting). 1 (Type 2) Application Type (Type 1) Flow duration

55 k-nn classifier We could use a k-nearest Neighbour classifier: we take the k most similar example and output the class who occurs most often (majority voting). 1 (Type 2) Application Type (Type 1) Flow duration Will it generalize well to new data points?

56 Logistic regression To cope with those problems, logistic regression is introduced: h_θ(x) = 1 / (1 + e^(−θ^T x)) (9) where the function f(z) = 1 / (1 + e^(−z)) is the sigmoid function. [Plot of the sigmoid function 1/(1+e^(−z)) vs. z]

57 Sigmoid function The sigmoid function h_θ(x) = 1 / (1 + e^(−θ^T x)): outputs values between 0 and 1; equals 0.5 at θ^T x = 0. When θ^T x ≥ 0, h_θ(x) ≥ 0.5. When θ^T x < 0, h_θ(x) < 0.5. The output can be interpreted as the probability of belonging to a particular class.

58 Logistic regression cost function To fit the parameters, a modified cost function is used (so that it is convex): J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] (10) One can use gradient descent (or other more complicated algorithms) to solve for the parameters θ.
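A minimal gradient-descent sketch for this cost; the learning rate and iteration count are arbitrary illustrative choices.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(X, y, alpha=0.1, n_iters=10000):
    # X: m x (n+1) design matrix (first column of ones), y: labels in {0, 1}.
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)                # predicted probabilities h_theta(x^(i))
        theta -= alpha * (X.T @ (h - y)) / m  # gradient step on the cross-entropy cost J(theta)
    return theta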

59 Logistic regression [Plot: Application Type (0 = Type 1, 1 = Type 2) vs. Flow duration with the fitted logistic curve] In this example, θ = [−20.9, 2.83]^T. The boundary value is at θ^T x = 0, that is x = 7.38

60 Decision boundary With more than one feature, logistic regression finds a decision boundary in the space of the features: [Scatter plot: Flow Duration [s] vs. Packet Length [byte] for Type 1 / Type 2, with the separating line] In this example, h_θ(x) = 1 / (1 + e^(−(θ_0 + θ_1 x_1 + θ_2 x_2))), and the line x_2 = −(1/θ_2)(θ_0 + θ_1 x_1) is the decision boundary.

61 Decision trees Very simple and powerful algorithms, used standalone or to build more complex algorithms (e.g. Random Forest). They can be used for both classification and regression. Main idea: follow a series of questions, and take a path depending on the answer. [Scatter plot: Flow Duration [s] vs. Packet Length [byte] for Type 1 / Type 2, with the corresponding one-split tree: "packet length < 773?" YES → Type 0, NO → Type 1]

62 Growing (learning) a tree Growing a decision tree is a greedy process that tries to minimize the misclassification error: the final tree is not optimal; different trees may be learned from the same data set

63 Growing (learning) a tree Growing a decision tree is a greedy process that tries to minimize the misclassification error: the final tree is not optimal; different trees may be learned from the same data set. How does it work: start from the training set, identify a feature and a split criterion on it. Two child nodes are generated

64 Growing (learning) a tree Growing a decision tree is a greedy process that tries to minimize the misclassification error: the final tree is not optimal; different trees may be learned from the same data set. How does it work: start from the training set, identify a feature and a split criterion on it. Two child nodes are generated For each generated node, repeat the process.

65 Growing (learning) a tree Growing a decision tree is a greedy process that tries to minimize the misclassification error: the final tree is not optimal; different trees may be learned from the same data set. How does it work: start from the training set, identify a feature and a split criterion on it. Two child nodes are generated For each generated node, repeat the process. Identify a stop criterion, or stop when all nodes contain examples of the same class.

66 Growing (learning) a tree Growing a decision tree is a greedy process that tries to minimize the misclassification error: the final tree is not optimal; different trees may be learned from the same data set. How does it work: start from the training set, identify a feature and a split criterion on it. Two child nodes are generated For each generated node, repeat the process. Identify a stop criterion, or stop when all nodes contain examples of the same class. When stopping, output the class label that occurs most often.

67 Growing (learning) a tree Growing a decision tree is a greedy process that tries to minimize the misclassification error: the final tree is not optimal; different trees may be learned from the same data set. How does it work: start from the training set, identify a feature and a split criterion on it. Two child nodes are generated For each generated node, repeat the process. Identify a stop criterion, or stop when all nodes contain examples of the same class. When stopping, output the class label that occurs most often. Prune the tree

68 Growing a decision tree The output variable y takes discrete values k; in our example k ∈ {0, 1}. The training set is represented by L = {(x^(i), y^(i)), i = 1...m}. Each node t in the tree will contain a set of associated observations L(t). The root of the tree contains all observations: L(t_1) = L. How and when to split a node t?

69 Growing a decision tree The output variable y takes discrete values k; in our example k ∈ {0, 1}. The training set is represented by L = {(x^(i), y^(i)), i = 1...m}. Each node t in the tree will contain a set of associated observations L(t). The root of the tree contains all observations: L(t_1) = L. How and when to split a node t? If all the observations in a node L(t) belong to the same class k, we do not split: we declare t to be a leaf node, and whenever a new observation x reaches t, we output y = k.

70 Splitting criteria Assume a node t contains samples of different classes. We need to decide: which feature x_j to split on, and at which value v to split, in order to produce the question: is x_j < v?

71 Node impurity We introduce the measure Q(t). If all observations of node t are of the same class, Q(t) = 0. When the distribution of the classes in a node is uniform, Q(t) takes the maximum value. When we split a node, we try to decrease the impurity as much as possible

72 Impurity measures Let p_{t,k} = p(k | t) be the proportion of class-k observations in node t. Gini index: Q(t) = Σ_k p_{t,k} (1 − p_{t,k}) (11) Cross-entropy: Q(t) = −Σ_k p_{t,k} log(p_{t,k}) (12)

73 Split criterion To measure a split's change in impurity, we can evaluate: ΔQ = Q(t) − p_L Q(t_L) − p_R Q(t_R) where p_L and p_R are the proportions of observations that fall in the left and right child nodes, respectively. In order to find which node t to operate on and which variable, we test all possible splits! At each step, we choose the split for which ΔQ is highest.
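A sketch of the impurity computation and the exhaustive split search for a single numeric feature; the helper names are illustrative, not from the slides.

import numpy as np

def gini(y):
    # Gini impurity of a set of class labels.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * (1.0 - p))

def best_split(x, y):
    # Try every midpoint between consecutive values of feature x; return (threshold, impurity decrease).
    xs = np.unique(x)
    best_v, best_gain, q_parent = None, 0.0, gini(y)
    for v in (xs[:-1] + xs[1:]) / 2.0:
        left, right = y[x < v], y[x >= v]
        p_left = len(left) / len(y)
        gain = q_parent - p_left * gini(left) - (1 - p_left) * gini(right)   # Delta Q
        if gain > best_gain:
            best_v, best_gain = v, gain
    return best_v, best_gain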

74 Tree pruning After the learning phase, the full tree may be very complex; some nodes can be discarded or pruned. Given a full tree T_0, pruning seeks another tree T with a smaller number of leaves, at the cost of a higher misclassification error R. The process tries to minimize: R_α(T) = R(T) + α|T| (13) where α is a penalty on the tree size, and |T| is the number of leaves (terminal nodes) of the tree.

75 Decision tree summary Pros: Highly interpretable. Cons: Very unstable (a small change in the training set may produce a very different tree); may be complex to train (depending on the data); may not generalize well to new data

76 Other approaches Support Vector Machines: Finds the decision boundary so that the distance (margin) between the classes is maximized. Binary classifier. Neural Networks: Learn the weights and activations (parameters) of a complex layered structure; by doing this, the network also learns which features are best. Work very well in many scenarios and can output multiple classes. Basis for deep learning. Difficult to interpret

77 Ensemble methods Fuse information from many weak classifiers into a strong one. Gold standard: Random Forest. Main goal: reduce the variance/overfitting of trees. Majority voting over multiple trees learnt from multiple training sets obtained by sampling the original set with replacement (bagging). For each tree, each split considers only a random subset of the features. Boosting can additionally be applied: learn trees sequentially, each time creating a new bag that pays more attention to misclassified samples.

78 Discriminative vs Generative classifiers The algorithms seen so far are discriminative algorithms. They look at examples from all classes (0s and 1s) and find a decision boundary or a set of rules that separates the classes. They learn: p(y | x) (14) in a direct way.

79 Discriminative vs Generative classifiers The algorithms seen so far are discriminative algorithms. They look at examples from all classes (0s and 1s) and find a decision boundary or a set of rules that separates the classes. They learn: p(y | x) (14) in a direct way. In contrast, a generative learning algorithm looks at only one class of examples at a time, and learns p(x | y) and p(y) (15) i.e., what the features look like given a particular class (and the class prior).

80 Naive Bayes classifier Recalling Bayes' Theorem: p(y | x) = p(x | y) p(y) / p(x) (16) Naive Bayes assumes that the variables x_1, x_2, ..., x_n are conditionally independent given the class. Therefore: p(x | y) = Π_i p(x_i | y) (17) p(y) is the class prior and can be easily obtained from the available data; p(x) = Σ_j p(x | y_j) p(y_j), but it is often dropped as the probability of the data is constant

81 Naive Bayes classifier Recalling Bayes' Theorem: p(y | x) = p(x | y) p(y) / p(x) (16) Naive Bayes assumes that the variables x_1, x_2, ..., x_n are conditionally independent given the class. Therefore: p(x | y) = Π_i p(x_i | y) (17) p(y) is the class prior and can be easily obtained from the available data; p(x) = Σ_j p(x | y_j) p(y_j), but it is often dropped as the probability of the data is constant Note: in practice, variables are almost never truly independent. However, Naive Bayes works well in practice.

82 Naive Bayes: estimation We need to estimate p(x_i | y) for each feature from the available data. For continuous features, generally a Gaussian distribution is assumed: p(x_i; μ_i, σ_i²) = (1 / √(2π σ_i²)) exp(−(x_i − μ_i)² / (2σ_i²)) (18) Therefore we simply need to estimate μ_i and σ_i for each feature of our data. For discrete variables, binomial/multinomial distributions are used.
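A sketch of the Gaussian Naive Bayes estimation and prediction steps; log-probabilities are used to avoid numerical underflow, and the function names are illustrative.

import numpy as np

def nb_fit(X, y):
    # Per-class priors, feature means and feature variances (Gaussian Naive Bayes).
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    prior = {k: np.mean(y == k) for k in classes}
    mu = {k: X[y == k].mean(axis=0) for k in classes}
    var = {k: X[y == k].var(axis=0) for k in classes}
    return classes, prior, mu, var

def nb_predict(x, classes, prior, mu, var):
    # Return the class maximizing log p(y) + sum_i log p(x_i | y).
    def log_post(k):
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var[k]) + (x - mu[k]) ** 2 / var[k])
        return np.log(prior[k]) + log_lik
    return max(classes, key=log_post)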

83 Naive Bayes: example [Scatter plot: Flow Duration [s] vs. Packet Length [byte] for Type 1 / Type 2]

84 Naive Bayes: example p(x_1 | y = 0): μ_1 = 364, σ_1² = ... p(x_2 | y = 0): μ_2 = 3.1, σ_2² = 1.81 p(x_1 | y = 1): μ_1 = ..., σ_1² = ... p(x_2 | y = 1): μ_2 = 13.8, σ_2² = 8.82 [Scatter plot: Flow Duration [s] vs. Packet Length [byte] for Type 1 / Type 2]

85 Naive Bayes: classification Given a new observation x, we can predict the class it belongs to: compute p(y_j | x) using Bayes' Theorem for all y_j; output the class j for which p(y_j | x) is maximized. Example 1: x_1 = 700, x_2 = 6: p(x_1 = 700 | y = 0) p(x_2 = 6 | y = 0) p(y = 0) = ... p(x_1 = 700 | y = 1) p(x_2 = 6 | y = 1) p(y = 1) = ... Example 2: x_1 = 900, x_2 = 6: p(x_1 = 900 | y = 0) p(x_2 = 6 | y = 0) p(y = 0) = ... p(x_1 = 900 | y = 1) p(x_2 = 6 | y = 1) p(y = 1) = ...

86 Other approaches Linear Discriminant Analysis: Similar to Naive Bayes, without the independence assumption; assumes that all classes share the same covariance matrix. Quadratic Discriminant Analysis: Similar to LDA, but assumes that each class has its own covariance matrix. LDA is simpler than QDA (it has lower variance): use it when m is small.

87 Error analysis For regression problems, the MSE can be used to evaluate the cross-validation or test error: MSE = (1/m) Σ_{i=1}^{m} (y_i − ŷ_i)² (19) For a binary classification problem, we could use the classifier accuracy: ACC = 1 − (1/m) Σ_{i=1}^{m} I(y_i ≠ ŷ_i) (20) where I(y_i ≠ ŷ_i) = 1 if y_i ≠ ŷ_i and 0 otherwise. Is it a good metric?

88 Skewed classes Assume you have a test set with m = 100 examples of traffic flows, and need to classify between neutral (0) and malicious (1) traffic. Assume that 99 examples are neutral and only one is malicious. What is the accuracy of a dummy classifier that always outputs 0? (99%, even though it never detects an attack.) What can we do to better analyse and compare classifier performance?

89 Precision and Recall Define: True Positive (TP): malicious flows that were classified as malicious. False Positive (FP): neutral flows that were classified as malicious (false alarms). True Negative (TN): neutral flows classified as neutral. False Negative (FN): malicious flows classified as neutral (misses). We have: Precision: how often does our algorithm cause a false alarm? Recall: how sensitive is our algorithm?

90 Precision and Recall Precision: (i) how often does our algorithm cause a false alarm? (ii) among all predicted positive examples, how many were actually positive? Precision = True Positives / Number of Predicted Positives = TP / (TP + FP) (21) Recall: (i) how sensitive is our algorithm? (ii) among all positive examples present in the set, how many were identified? Recall = True Positives / Number of Actual Positives = TP / (TP + FN) (22)

91 F_1 Score Often you can control a trade-off between recall and precision using a threshold. An always-1 classifier has a recall of 100% but a very low precision (it produces many false positives). Similarly, you can have classifiers with low recall and high precision. How to compare? Compute the F_1 score: F_1 = 2 · Precision · Recall / (Precision + Recall) (23)
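The three metrics in a few lines, assuming binary labels with 1 = malicious (the positive class).

import numpy as np

def precision_recall_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # malicious flows flagged as malicious
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))   # missed attacks
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1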

92 Multiclass classification Sometimes you need to classify among multiple classes. Some of the algorithms we have seen naturally support this (k-NN, Naive Bayes). What about logistic regression?

93 One vs all classification Assume we have three classes in the training set: A, B, C. We can create three new datasets and learn three classifiers: h_θ1: A (1) vs B and C (0); h_θ2: B (1) vs A and C (0); h_θ3: C (1) vs A and B (0). On a new input x, look at the output of the three classifiers and assign the class for which h_θi(x) is maximized

94 Confusion Matrix For multiclass problems, one can use the confusion matrix to easily visualize errors:
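A confusion matrix can be built by counting (true class, predicted class) pairs, as in the sketch below (scikit-learn's confusion_matrix does the equivalent); the example labels are hypothetical.

import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    # Rows index the true class, columns the predicted class.
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    index = {c: i for i, c in enumerate(classes)}
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

# Example with three application classes:
print(confusion_matrix(['A', 'B', 'C', 'A'], ['A', 'C', 'C', 'B'], ['A', 'B', 'C']))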
