Network Traffic Measurements and Analysis
- Douglas Conley
- 5 years ago
Transcription
1 DEIB - Politecnico di Milano Fall, 2017
3 Sources
- Hastie, Tibshirani, Friedman: The Elements of Statistical Learning
- James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning
- Andrew Ng: Machine Learning course
6 Introduction
Given this data, what throughput should we expect when 100 people are connected to the BS?
[Figure: expected throughput (Mbps) versus number of active UE]
- Fit a straight line?
- Fit a second-order polynomial?
7 Supervised learning problems
Two types of problems:
- Regression: predict a continuous-valued output
- Classification: classify data into discrete classes
Both generally consider many features:
- Predict the throughput based on the number of connected users, signal strength, device type
- Classify incoming traffic as malicious based on source IP address, length of flows, evil bit
9 Agenda
We'll focus on the following supervised learning algorithms:
- Regression: (multiple) linear regression, nearest neighbour
- Classification: logistic regression, decision trees, Naive Bayes classifiers
Other very popular and powerful algorithms are available: Support Vector Machines (SVM), Neural Networks, Random Forests...
11 Linear Regression
Starting point: the training set
- $m$: number of training examples
- $x$: input variables (features, predictors)
- $y$: output variable (target, response)
- $(x^{(i)}, y^{(i)})$: the $i$-th training example
Linear regression takes the training set and generates a hypothesis function $h$ that tries to estimate the value of $y$ given $x$:
\[ h_\theta(x) = \theta_0 + \theta_1 x \tag{1} \]
$\theta_0, \theta_1$ are the parameters, determined by a learning system and used to output a prediction of $y$ based on the input $x$.
13 Linear Regression - implementation
How to choose the parameters?
- We would like to choose $\theta$ such that $h_\theta(x)$ is close to $y$ for our training examples.
- $h_\theta(x)$ tries to convert $x$ into $y$. Since we have both $x$ and $y$, we can evaluate how well $h_\theta(x)$ does this.
Define the Mean Squared Error (MSE) cost function:
\[ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \]
We want to solve a minimization problem:
\[ \min_\theta J(\theta) \tag{2} \]
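14 Linear Regression - Normal Equation
Let $X$ (the design matrix) and $y$ be
\[ X = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ 1 & x^{(3)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix} \]
It is possible to show that the optimal parameter vector $\theta = [\theta_0, \theta_1]^T$ is:
\[ \theta = (X^T X)^{-1} X^T y \]
At an arbitrary input $x$, the prediction is $\hat{y} = h_\theta(x)$.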
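The normal equation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code: the function names `fit_normal_equation` and `predict` and the throughput data are hypothetical, and the sketch solves the linear system $(X^T X)\theta = X^T y$ rather than forming the inverse explicitly, which is a common numerical choice.

```python
import numpy as np

def fit_normal_equation(x, y):
    """Return theta solving (X^T X) theta = X^T y for inputs x and targets y."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)              # one feature per column
    # Design matrix: a leading column of ones, then the features.
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(y, dtype=float))

def predict(theta, x):
    """Evaluate h_theta(x) = theta_0 + theta_1 x_1 + ... for each row of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)
    return theta[0] + x @ theta[1:]

# Hypothetical, noise-free data: throughput falls linearly with active users.
users = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
throughput = 40.0 - 0.3 * users

theta = fit_normal_equation(users, throughput)
y_hat_100 = predict(theta, np.array([100.0]))[0]   # prediction for 100 users
```

Because the design matrix is built from however many columns `x` has, the same sketch also covers the multi-feature case.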
15 [Figure: data and linear regression fit; expected throughput (Mbps) versus number of active UE]
16 Multilinear regression
Sometimes we would like to use more than one feature for making our prediction, e.g.:
- the throughput may be predicted based on (i) how many users are connected, $x_1$, and (ii) the channel quality, $x_2$
- we may also create new features starting from existing ones, e.g., the square of the number of users connected, $x_3 = x_1^2$
- in general, we can have $n$ features $x_1, \ldots, x_n$
The hypothesis in the case of multilinear regression is:
\[ h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n \tag{3} \]
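17 Multilinear regression
The normal equation can still be used as a solution, with the following design matrix:
\[ X = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & \cdots & x_n^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} & x_3^{(3)} & \cdots & x_n^{(3)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} & x_3^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} \]
and
\[ \theta = [\theta_0, \theta_1, \ldots, \theta_n]^T = (X^T X)^{-1} X^T y \]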
18 [Figure: data with quadratic and cubic multilinear regression fits; expected throughput (Mbps) versus number of active UE]
21 Normal equation - Practical problems
The normal equation requires computing the matrix $(X^T X)^{-1}$.
- $(X^T X)$ is an $(n + 1) \times (n + 1)$ matrix
- Most implementations compute the matrix inverse in $O(n^3)$
- Slow if $n$ is large! How large? $n$ on the order of 1000 is still (relatively) small.
$(X^T X)$ may be non-invertible:
- Cause 1: $m \le n$ (more features than examples... bad idea)
- Cause 2: redundant features (some of the columns of $X$ are linearly dependent; remove them!)
26 Gradient descent
What if $n$ is too large? Can we still learn the optimal parameters $\theta$? The answer is the Gradient Descent algorithm:
- Start with some initial values for $\theta$, e.g. $\theta_i = 0 \;\forall i$
- Change $\theta$ in the direction opposite to the gradient, in order to reduce the cost function $J(\theta)$ a little bit.
- Repeat until convergence (a local minimum is found)
For (multi)linear regression, the cost function is convex and has only a single minimum. Gradient descent will converge to the same optimal $\theta$ found by the normal equation (if $\alpha$ is small enough...).
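27 Gradient descent
Do the following until convergence:
\[ \theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \forall j \tag{4} \]
that is:
\[ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \tag{5} \]
with $x_0^{(i)} = 1$ for convenience. The constant factor 2 that comes from differentiating the square in $J(\theta)$ is absorbed into $\alpha$. All $\theta_j$ are updated simultaneously.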
30 Gradient descent
The variable $\alpha$ in:
\[ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \tag{6} \]
is the learning rate. How to choose it?
- too small: baby steps, convergence takes too long
- too big: huge steps, you can miss the minimum and fail to converge
Normalizing features (e.g., between -1 and 1) helps in providing numerical stability.
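The update rule can be sketched as a short NumPy loop. This is an illustrative sketch under stated assumptions: the function name `gradient_descent`, the fixed iteration count used in place of a convergence test, the value `alpha=0.1`, and the throughput data are all hypothetical choices, not taken from the slides. The feature is rescaled to [-1, 1] first, as the slide recommends.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent for linear regression.

    X is the design matrix (first column all ones), y the targets.
    A fixed number of iterations stands in for a convergence test.
    """
    m, n = X.shape
    theta = np.zeros(n)                       # start from theta_j = 0 for all j
    for _ in range(n_iters):
        residual = X @ theta - y              # h_theta(x^(i)) - y^(i) for every i
        theta = theta - alpha * (X.T @ residual) / m   # simultaneous update of all theta_j
    return theta

# Hypothetical data: throughput versus number of active users.
users = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
throughput = 40.0 - 0.3 * users

# Rescale the feature to the interval [-1, 1] before running gradient descent.
x_scaled = 2.0 * (users - users.min()) / (users.max() - users.min()) - 1.0
X = np.column_stack([np.ones_like(x_scaled), x_scaled])

theta_gd = gradient_descent(X, throughput)
theta_ne = np.linalg.solve(X.T @ X, X.T @ throughput)   # normal equation, for comparison
```

On this small convex problem both approaches should agree closely; with a much larger `alpha` the loop would diverge instead.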
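34 k-nearest Neighbours Regression
Simple idea: given an unknown input $x$, output $\hat{y}$ by averaging the $k$ most similar examples in the training set.
- How many examples should we take (i.e., what is $k$)?
- How should we average the examples?
Generally, Inverse Distance Weighting (IDW) is applied:
\[ \hat{y} = \sum_{j=1}^{k} w_j \, y^{(j)} \tag{7} \]
where $(x^{(j)}, y^{(j)})$ is the $j$-th nearest neighbour of $x$ and $w_j$ is inversely proportional to $\mathrm{dist}(x, x^{(j)})$, with the weights normalized to sum to 1.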
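A minimal sketch of k-NN regression with inverse distance weighting follows. The function name `knn_regress`, the use of Euclidean distance, and the small constant `eps` that guards against division by zero when the query coincides with a training point are assumptions made for the illustration.

```python
import numpy as np

def knn_regress(x_train, y_train, x_new, k=3, eps=1e-12):
    """Predict y at x_new as the inverse-distance-weighted mean of the k nearest examples."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dist = np.linalg.norm(x_train - np.asarray(x_new, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]            # indices of the k closest training examples
    weights = 1.0 / (dist[nearest] + eps)     # weights inversely proportional to distance
    weights /= weights.sum()                  # normalize so the weights sum to 1
    return float(weights @ y_train[nearest])

# Hypothetical one-feature training set (one row per example).
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 10.0, 20.0, 30.0])

pred_mid = knn_regress(x_train, y_train, [1.5], k=2)   # halfway between two examples
pred_hit = knn_regress(x_train, y_train, [2.0], k=2)   # coincides with a training input
```

When the query sits exactly on a training input, that example dominates the weighted average, so the prediction is essentially its stored target.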
36 k-nn
[Figure: training data with k-nn fits for k = 2 and k = 4; normalized expected throughput (Mbps) versus normalized number of active UE]
- How many samples are needed for prediction?
- Will it generalize well to new data points?
37 LOESS
Linear regression and k-nn can be fused together in the LOESS (LOcal regrESSion) method. For a new input $x_0$:
- gather the $k$ points $(x_i, y_i)$ whose $x_i$ are nearest to $x_0$
- fit a linear model $h_\theta$ using only these $k$ points
- output $\hat{y} = h_\theta(x_0)$
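The three steps can be sketched as follows for a single feature. This is a simplified illustration: the function name `local_linear_predict` and the data are hypothetical, and the sketch fits an unweighted line to the k nearest points exactly as the steps describe, whereas full LOESS implementations additionally weight the neighbours by their distance to $x_0$.

```python
import numpy as np

def local_linear_predict(x_train, y_train, x0, k=5):
    """Fit a straight line to the k training points nearest to x0 and evaluate it at x0."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    nearest = np.argsort(np.abs(x_train - x0))[:k]        # step 1: k nearest points
    X = np.column_stack([np.ones(len(nearest)), x_train[nearest]])
    theta, *_ = np.linalg.lstsq(X, y_train[nearest], rcond=None)   # step 2: local linear fit
    return float(theta[0] + theta[1] * x0)                # step 3: predict at x0

# Hypothetical nonlinear data: y = x^2 sampled every 0.5 on [0, 10].
x = np.linspace(0.0, 10.0, 21)
y = x ** 2

pred = local_linear_predict(x, y, 5.0, k=5)
```

Although each local model is linear, sliding the neighbourhood along the x axis lets the overall prediction follow the curvature of the data.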
40 Bias-Variance trade off
We have seen a few algorithms. They work well in practice, but may suffer from:
- underfitting (high bias): it occurs when the model used is too simple (e.g. linear regression on data that is not linear).
- overfitting (high variance): it occurs when the model follows the training data too closely (e.g. k-nn, high-order polynomial fitting).
42 Evaluating a hypothesis
How do we know if our hypothesis is suffering from underfitting or overfitting? Linear regression works by finding $\theta$ so that the cost function $J(\theta)$ is minimized. What if we found $J(\theta) = 0$? Does it mean we have found the perfect model? We can't be sure unless we try to generalize!
Standard way to evaluate the hypothesis: split the data set!
- 70% used for training the model, by minimizing $J_{train}$
- 20% used for computing the cross-validation error $J_{cv}$
- 10% used for computing the test error $J_{test}$
43 Evaluating a hypothesis
[Figure: data with linear, quadratic and 9th-degree polynomial fits; normalized expected throughput (Mbps) versus normalized number of active UE]
Which one will have the smallest $J_{train}$? Should we select the model based on that?
44 Good practices for model selection
- Divide your initial data set into three parts
- Train all your models on the training set
- Compute $J_{cv}$ on the cross-validation set. Pick the model that has the smallest $J_{cv}$
- Compute the generalization error $J_{test}$ of the selected model on the test set.
45 k-fold cross validation
If the total number of observations $m$ is small, we may have too few examples in each set. Also, $J_{cv}$ depends on which 20% of the data is used. To cope with these issues, we can use k-fold cross validation:
- randomly divide the entire set into $k$ folds of similar size.
- use the first fold as the cross-validation set and the other $k - 1$ folds as the training set. Obtain $J_{cv,1}$
- repeat $k$ times, each time changing the cross-validation fold, to obtain $J_{cv,1}, \ldots, J_{cv,k}$
- set $J_{cv} = \frac{1}{k} \sum_{i=1}^{k} J_{cv,i}$
Generally, $k = 5$ or $k = 10$ is used.
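The procedure can be sketched with NumPy for polynomial models of different degree. The function name `k_fold_cv_error`, the synthetic data, the noise level and the random seeds are all hypothetical, and `np.polyfit` is used here only as a convenient least-squares fitter.

```python
import numpy as np

def k_fold_cv_error(x, y, degree, k=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)   # random folds of similar size
    errors = []
    for i in range(k):
        val_idx = folds[i]                               # fold i is the validation set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((pred - y[val_idx]) ** 2)) # J_cv,i
    return float(np.mean(errors))                        # J_cv = mean of the J_cv,i

# Hypothetical data: a quadratic trend plus a little Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.1, size=x.size)

cv_errors = {d: k_fold_cv_error(x, y, d) for d in (1, 2, 9)}
best_degree = min(cv_errors, key=cv_errors.get)          # model with the smallest J_cv
```

Using the same `seed` for every degree means all candidate models are compared on identical folds.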
46 Bias-variance diagnosis
How do we understand if our model is suffering from overfitting? We look at the training and cross-validation errors.
- High bias: both errors are high
- High variance: the training error is low and the cross-validation error is high
48 Bias-variance diagnosis
Another possibility is looking at the learning curves:
- High bias: the errors are close and high; adding data does not help, you should use a different / more complex model
- High variance: adding data could help, but you should consider using a simpler model or using regularization
49 Regularization / Ridge regression
Overfitting happens when the model is too complex, or uses an excessive number of features. Main idea: modify the cost function by adding a penalty on the parameters of the features:
\[ J(\theta) = \frac{1}{m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right] \tag{8} \]
The variable $\lambda$ controls the amount of penalization. $\lambda$ should be chosen carefully:
- if too big, we'll have underfitting
- if too small, it's like not using it
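For the penalized cost above, setting the gradient to zero gives a closed form analogous to the normal equation, $\theta = (X^T X + \lambda D)^{-1} X^T y$, where $D$ is the identity matrix with its first diagonal entry set to zero so that $\theta_0$ is not penalized. The slides do not state this formula; the sketch below, with the hypothetical name `fit_ridge` and synthetic data, is one way to illustrate the effect of $\lambda$.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize sum of squared errors + lam * sum_{j>=1} theta_j^2 in closed form.

    X is the design matrix with a leading column of ones.
    """
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0                  # the intercept theta_0 is not penalized
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

# Hypothetical data: a linear trend with noise, fitted with a 5th-degree polynomial.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 - x + rng.normal(0.0, 0.1, size=x.size)
X = np.column_stack([x ** p for p in range(6)])   # columns 1, x, x^2, ..., x^5

theta_no_reg = fit_ridge(X, y, 0.0)    # lambda = 0: plain least squares
theta_small = fit_ridge(X, y, 1.0)     # moderate penalty
theta_big = fit_ridge(X, y, 1e6)       # huge penalty: all slopes shrink towards 0
```

With a very large $\lambda$ the model collapses to a constant close to the mean of `y`, which is the underfitting case mentioned on the slide.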
51 When the target variable $y$ is discrete (e.g., 0/1), we talk about classification problems.
[Figure: application type (0 = Type 1, 1 = Type 2) versus flow duration]
52 We could use linear regression with a threshold:
- Estimate $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x$
- Output 0 if $\theta^T x < 0.5$, 1 otherwise
[Figure: data and linear fit; application type (0 = Type 1, 1 = Type 2) versus flow duration]
53 Linear regression as classifier
- Problem 1: the addition of a single point changes things a lot!
- Problem 2: $h_\theta(x)$ can output values much greater than 1 or much smaller than 0
[Figure: two panels showing the data and the linear fit before and after adding a single point; application type (0 = Type 1, 1 = Type 2) versus flow duration, with the 0.5 threshold marked]
55 k-nn classifier
We could use a k-nearest Neighbour classifier: we take the $k$ most similar examples and output the class that occurs most often among them (majority voting).
[Figure: application type (0 = Type 1, 1 = Type 2) versus flow duration]
Will it generalize well to new data points?
56 Logistic regression
To cope with those problems, logistic regression is introduced:
\[ h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} \tag{9} \]
where the function $f(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.
[Figure: plot of the sigmoid $1/(1+e^{-z})$ versus $z$]
57 Sigmoid function
The sigmoid function $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$:
- outputs values between 0 and 1
- crosses 0.5 at $\theta^T x = 0$
- when $\theta^T x \ge 0$, $h_\theta(x) \ge 0.5$
- when $\theta^T x < 0$, $h_\theta(x) < 0.5$
The output can be interpreted as the probability of belonging to a particular class.
58 Logistic regression cost function
To learn the parameters, a modified cost function is used, chosen so that it is convex in $\theta$:
\[ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] \tag{10} \]
One can use gradient descent (or other more complicated algorithms) to solve for the parameters $\theta$.
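A minimal sketch of logistic regression trained by gradient descent follows. For this cost function the gradient has the same form as in linear regression, $\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$, with $h_\theta$ now the sigmoid. The function names, the learning rate, the iteration count and the flow-duration data are hypothetical; the two classes are made to overlap on purpose so that the optimal parameters stay finite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta) of equation (10)."""
    h = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)))

def fit_logistic(X, y, alpha=0.1, n_iters=20000):
    """Batch gradient descent on the cross-entropy cost."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        gradient = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta = theta - alpha * gradient
    return theta

# Hypothetical flow durations (s) and application types (0 or 1), slightly overlapping.
duration = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
label = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(duration), duration])

theta = fit_logistic(X, label)
boundary = -theta[0] / theta[1]          # duration at which theta^T x = 0
final_cost = cost(theta, X, label)
```

The value `boundary` is where the predicted probability equals 0.5: longer flows are assigned to class 1, shorter ones to class 0.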
59 Logistic regression
[Figure: logistic regression fit; application type (0 = Type 1, 1 = Type 2) versus flow duration]
In this example, $\theta = [-20.9, 2.83]^T$. The boundary value is at $\theta^T x = 0$, that is $x = 20.9 / 2.83 \approx 7.38$.
60 Decision boundary
With more than one feature, logistic regression finds a decision boundary in the space of the features.
[Figure: Type 1 and Type 2 samples; flow duration [s] versus packet length [byte]]
In this example,
\[ h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2)}} \]
and the line
\[ x_2 = -\frac{1}{\theta_2} \left( \theta_0 + \theta_1 x_1 \right) \]
is the decision boundary.
61 Decision trees
Very simple and powerful algorithms, used standalone or to build more complex algorithms (e.g. Random Forest). They can be used for both classification and regression. Main idea: follow a series of questions, and take a path depending on the answers.
[Figure: Type 1 and Type 2 samples, flow duration [s] versus packet length [byte], next to a one-level tree with root question "packet length < 773?" whose YES and NO branches lead to the leaves Type 0 and Type 1]
67 Growing (learning) a tree
Growing a decision tree is a greedy process that tries to minimize the misclassification error:
- the final tree is not optimal
- different trees may be learned from the same data set
How does it work:
- Start from the training set, identify a feature and a split criterion on it. Two children nodes are generated.
- For each generated node, repeat the process.
- Identify a stop criterion, or stop when all nodes contain examples of the same class.
- When stopping, output the class label that occurs most.
- Prune the tree.
69 Growing a decision tree
- The output variable $y$ takes a class label $k$. In our example $k \in \{0, 1\}$.
- The training set is represented by $L = \{(x^{(i)}, y^{(i)})\}$, $i = 1, \ldots, m$.
- Each node $t$ in the tree will contain a set of associated observations $L(t)$.
- The root of the tree contains all observations: $L(t_1) = L$.
How and when to split a node $t$? If all the observations in a node, $L(t)$, belong to the same class $j$, we do not split. We declare $t$ to be a leaf node, and whenever a new observation $x$ reaches $t$, we declare $\hat{y} = j$.
70 Splitting criteria
Assume a node $t$ contains samples of different classes. We need to decide:
- which feature $x_j$ to split on,
- at which value $v$ to split,
in order to output the question: is $x_j < v$?
71 Node impurity
We introduce the impurity measure $Q(t)$.
- If all observations of node $t$ are of the same class, $Q(t) = 0$.
- When the distribution of the classes in a node is uniform, $Q(t)$ takes its maximum value.
- When we split a node, we try to decrease the impurity as much as possible.
72 Impurity measures
Let $p_{t,k} = p(k \mid t)$ be the proportion of class $k$ observations in node $t$.
Gini index:
\[ Q(t) = \sum_k p_{t,k} (1 - p_{t,k}) \tag{11} \]
Cross-entropy:
\[ Q(t) = -\sum_k p_{t,k} \log(p_{t,k}) \tag{12} \]
73 Split criterion
To measure a split's change in impurity, we can evaluate:
\[ \Delta Q = Q(t) - p_L Q(t_L) - p_R Q(t_R) \]
where $p_L$ and $p_R$ are the proportions of the observations of $t$ that fall in the left and right children nodes $t_L$ and $t_R$, respectively. In order to find which node $t$ to operate on and which variable, we test all possible splits! At each step, we choose the node and split for which $\Delta Q$ is highest.
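The exhaustive search for the best split of one node can be sketched with the Gini index as impurity measure. The function names `gini` and `best_split`, the choice of candidate thresholds halfway between consecutive observed values, and the small data set are assumptions made for this illustration.

```python
import numpy as np

def gini(labels):
    """Gini index Q(t) = sum_k p_k (1 - p_k) of the labels in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def best_split(X, y):
    """Test every feature j and threshold v; return (j, v, impurity decrease) of the best split."""
    best = (None, None, 0.0)
    parent = gini(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive observed values.
        for v in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] < v                  # observations answering YES to "x_j < v?"
            p_left = left.mean()
            delta = parent - p_left * gini(y[left]) - (1.0 - p_left) * gini(y[~left])
            if delta > best[2]:
                best = (j, float(v), float(delta))
    return best

# Hypothetical flows: columns are packet length [byte] and flow duration [s].
X = np.array([[200.0, 3.0], [400.0, 2.0], [600.0, 4.0],
              [900.0, 12.0], [1100.0, 15.0], [1300.0, 14.0]])
y = np.array([0, 0, 0, 1, 1, 1])

feature, threshold, decrease = best_split(X, y)
```

Growing a full tree would apply `best_split` recursively to the two children until a stop criterion is met.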
74 Tree pruning
After the learning phase, the full tree may be very complex. Some nodes can be discarded or pruned. Given a full tree $T_0$, pruning seeks another tree $T$ with a smaller number of leaves, at the cost of a higher misclassification error $R$. The process tries to minimize:
\[ R_\alpha(T) = R(T) + \alpha |T| \tag{13} \]
where $\alpha$ is a penalty on the tree size, and $|T|$ is the number of terminal nodes (leaves) in the tree.
75 Decision tree summary
Pros:
- Highly interpretable
Cons:
- Very unstable (a small change in the training set may produce a very different tree)
- May be complex to train (depending on the data)
- May not generalize well to new data
76 Other approaches
Support Vector Machines:
- Find the decision boundary so that the distance (margin) between the classes is maximized
- Binary classifier
Neural Networks:
- Learn weights and activations (parameters) of a complex layered structure. By doing this, the network also learns which are the best features.
- Work very well in many scenarios and can output multiple classes
- Basis for deep learning
- Difficult to interpret
77 Ensemble methods
Fuse information from many weak classifiers into a strong one. Gold standard: Random Forest.
- Main goal: reduce the variance/overfitting of trees
- Majority voting over multiple trees, learnt from multiple training sets obtained by sampling the original set with replacement (bagging)
- For each tree, each split considers only a random subset of the features
- Boosting can additionally be applied: learn the trees sequentially, each time creating a new bag that pays more attention to misclassified samples
79 Discriminative vs Generative classifiers
The algorithms seen so far are discriminative algorithms. They look at examples from all classes (0s and 1s) and find a decision boundary or a set of rules that separates the classes. They learn
\[ p(y \mid x) \tag{14} \]
in a direct way.
In contrast, a generative learning algorithm looks at only one class of examples at a time, and learns
\[ p(x \mid y) \text{ and } p(y) \tag{15} \]
that is, what the features are like given a particular class (and the class prior).
81 Naive Bayes classifier
Recalling Bayes' theorem:
\[ p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} \tag{16} \]
Naive Bayes assumes that the variables $x_1, x_2, \ldots, x_n$ are conditionally independent given the class. Therefore:
\[ p(x \mid y) = \prod_i p(x_i \mid y) \tag{17} \]
- $p(y)$ is the class prior and can be easily obtained from the available data
- $p(x) = \sum_j p(x \mid y_j)\, p(y_j)$, but it is often dropped because it is the same for every class and so does not change which class maximizes $p(y \mid x)$
Note: in real data the variables are almost never truly independent. However, Naive Bayes works well in practice.
82 Naive Bayes: estimation
We need to estimate $p(x_i \mid y)$ for each feature from the available data. For continuous features, generally a Gaussian distribution is assumed:
\[ p(x_i; \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right) \tag{18} \]
Therefore we simply need to estimate $\mu_i$ and $\sigma_i$ for each feature of our data, separately for each class. For discrete variables, binomial/multinomial distributions are used.
83 Naive Bayes: example
[Figure: Type 1 and Type 2 samples; flow duration [s] versus packet length [byte]]
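84 Naive Bayes: example
- $p(x_1 \mid y = 0)$: $\mu_1 = 364$, $\sigma_1^2 =$ [...]
- $p(x_2 \mid y = 0)$: $\mu_2 = 3.1$, $\sigma_2^2 = 1.81$
- $p(x_1 \mid y = 1)$: $\mu_1 =$ [...], $\sigma_1^2 =$ [...]
- $p(x_2 \mid y = 1)$: $\mu_2 = 13.8$, $\sigma_2^2 =$ [...]
[Figure: Type 1 and Type 2 samples; flow duration [s] versus packet length [byte]]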
85 Naive Bayes: classification
Given a new observation $x$, we can predict the class it belongs to:
- compute $p(y_j \mid x)$ using Bayes' theorem for all $y_j$
- output the class $j$ for which $p(y_j \mid x)$ is maximized
Example 1: $x_1 = 700$, $x_2 = 6$
- $p(x_1 = 700 \mid y = 0)\, p(x_2 = 6 \mid y = 0)\, p(y = 0) =$ [...]
- $p(x_1 = 700 \mid y = 1)\, p(x_2 = 6 \mid y = 1)\, p(y = 1) =$ [...]
Example 2: $x_1 = 900$, $x_2 = 6$
- $p(x_1 = 900 \mid y = 0)\, p(x_2 = 6 \mid y = 0)\, p(y = 0) =$ [...]
- $p(x_1 = 900 \mid y = 1)\, p(x_2 = 6 \mid y = 1)\, p(y = 1) =$ [...]
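The estimation and classification steps can be sketched as a small Gaussian Naive Bayes classifier. The function names and the training data below are hypothetical and are not the values of the slide example. The sketch compares log-probabilities instead of products of densities, a common numerical choice that leaves the maximizing class unchanged, and it drops $p(x)$ as discussed above.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior p(y) and the per-feature mean and variance for every class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[int(c)] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0))
    return params

def log_joint(params, x):
    """Return log p(x | y) + log p(y) for every class, using the independence assumption."""
    scores = {}
    for c, (prior, mu, var) in params.items():
        log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        scores[c] = float(np.log(prior) + log_lik)
    return scores

def predict_nb(params, x):
    """Output the class with the largest posterior (p(x) is the same for all classes)."""
    scores = log_joint(params, x)
    return max(scores, key=scores.get)

# Hypothetical flows: columns are packet length [byte] and flow duration [s].
X = np.array([[300.0, 2.0], [350.0, 3.0], [420.0, 4.0],
              [1100.0, 12.0], [1250.0, 14.0], [1400.0, 15.0]])
y = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X, y)
class_short = predict_nb(params, np.array([400.0, 3.0]))
class_long = predict_nb(params, np.array([1200.0, 13.0]))
```

Each feature contributes an independent Gaussian term to the score, which is exactly the factorization of equation (17).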
86 Other approaches
Linear Discriminant Analysis (LDA):
- Similar to Naive Bayes, without the independence assumption
- Assumes that all classes share the same covariance matrix
Quadratic Discriminant Analysis (QDA):
- Similar to LDA, but assumes that each class has its own covariance matrix
LDA is simpler than QDA (it has lower variance). Use it when $m$ is small.
87 Error analysis
For regression problems, the MSE can be used to evaluate the cross-validation or test error:
\[ \mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 \tag{19} \]
For a binary classification problem, we could use the classifier accuracy:
\[ \mathrm{ACC} = 1 - \frac{1}{m} \sum_{i=1}^{m} I(y_i \ne \hat{y}_i) \tag{20} \]
where $I(y_i \ne \hat{y}_i) = 1$ if $y_i \ne \hat{y}_i$ and 0 otherwise. Is it a good metric?
88 Skewed classes
Assume you have a test set with $m = 100$ examples of traffic flows, and need to classify them as neutral (0) or malicious (1) traffic. Assume that 99 examples are neutral and only one is malicious.
- What is the accuracy of a dummy classifier that always outputs 0?
- What can we do to better analyse and compare classifier performance?
89 Precision and Recall
Define:
- True Positive (TP): malicious flows that were classified as malicious
- False Positive (FP): neutral flows that were classified as malicious (false alarms)
- True Negative (TN): neutral flows classified as neutral
- False Negative (FN): malicious flows classified as neutral (misses)
We have:
- Precision: how often does our algorithm cause a false alarm?
- Recall: how sensitive is our algorithm?
90 Precision and Recall
Precision: (i) how often does our algorithm cause a false alarm? (ii) among all predicted positive examples, how many were actually positive?
\[ \text{Precision} = \frac{\text{True Positives}}{\text{Number of Predicted Positives}} = \frac{TP}{TP + FP} \tag{21} \]
Recall: (i) how sensitive is our algorithm? (ii) among all positive examples present in the set, how many were identified?
\[ \text{Recall} = \frac{\text{True Positives}}{\text{Number of Actual Positives}} = \frac{TP}{TP + FN} \tag{22} \]
91 F1 Score
Often you can control a tradeoff between recall and precision using a threshold. An always-1 classifier has a recall of 100% but a very low precision (it produces many false positives). Similarly, you can have classifiers with low recall and high precision. How to compare? Compute the $F_1$ score:
\[ F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{23} \]
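The metrics can be checked on the skewed test set described earlier (99 neutral flows, 1 malicious). The function names are hypothetical, and returning 0 when a denominator is zero is a convention assumed for this sketch, not something the slides specify.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: a ratio with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Skewed test set: 99 neutral flows (0) and a single malicious flow (1).
y_true = [0] * 99 + [1]
always_zero = [0] * 100          # dummy classifier that never raises an alarm
always_one = [1] * 100           # dummy classifier that always raises an alarm

acc_zero = accuracy(y_true, always_zero)
prf_zero = precision_recall_f1(y_true, always_zero)
prf_one = precision_recall_f1(y_true, always_one)
```

The always-0 classifier reaches 99% accuracy while its recall and F1 are zero, which is why accuracy alone is misleading on skewed classes.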
92 Multiclass classification
Sometimes you need to classify among multiple classes. Some of the algorithms we have seen naturally have this possibility (k-nn, Naive Bayes). What about logistic regression?
93 One vs all classification
Assume we have three classes in the training set: A, B, C. We can create three new datasets and learn three classifiers:
- $h_{\theta_1}$: A (1) vs B and C (0)
- $h_{\theta_2}$: B (1) vs A and C (0)
- $h_{\theta_3}$: C (1) vs A and B (0)
On a new input $x$, look at the output of the three classifiers and assign the class for which $h_{\theta_i}(x)$ is maximized.
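A sketch of the one-vs-all scheme built on a plain logistic regression follows. The function names, the learning rate, the iteration count and the three small clusters of points are hypothetical choices for the illustration; each binary classifier is trained by batch gradient descent on the relabelled data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.5, n_iters=5000):
    """Binary logistic regression trained with batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta = theta - alpha * (X.T @ (sigmoid(X @ theta) - y)) / len(y)
    return theta

def fit_one_vs_all(X, labels):
    """Learn one classifier per class: that class (1) versus all the others (0)."""
    return {str(c): fit_logistic(X, (labels == c).astype(float))
            for c in np.unique(labels)}

def predict_one_vs_all(models, x):
    """Assign the class whose classifier outputs the largest h_theta(x)."""
    return max(models, key=lambda c: sigmoid(x @ models[c]))

# Hypothetical two-feature data with three well-separated classes.
points = np.array([[0.0, 0.0], [0.2, 0.1],
                   [1.0, 0.0], [0.9, 0.2],
                   [0.0, 1.0], [0.1, 0.9]])
labels = np.array(["A", "A", "B", "B", "C", "C"])
X = np.column_stack([np.ones(len(points)), points])   # leading 1 for theta_0

models = fit_one_vs_all(X, labels)
pred_a = predict_one_vs_all(models, np.array([1.0, 0.05, 0.05]))
pred_b = predict_one_vs_all(models, np.array([1.0, 0.95, 0.10]))
pred_c = predict_one_vs_all(models, np.array([1.0, 0.05, 0.95]))
```

Every query vector starts with a 1 so that it lines up with the intercept column of the design matrix.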
94 Confusion Matrix
For multiclass problems, one can use the confusion matrix to easily visualize errors: each entry counts how many examples of one true class were assigned to each predicted class.
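A confusion matrix can be computed with a few lines of NumPy. The function name, the labels and the layout (rows for true classes, columns for predicted classes) are assumptions of this sketch; the opposite layout is also in use.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Entry (i, j) counts the examples of true class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

# Hypothetical true and predicted labels for six examples.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

cm = confusion_matrix(y_true, y_pred, ["A", "B", "C"])
correct = int(np.trace(cm))      # correct predictions lie on the diagonal
```

Off-diagonal entries show which classes are confused with each other, for instance one A predicted as B in this example.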
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationClassification. Slide sources:
Classification Slide sources: Gideon Dror, Academic College of TA Yaffo Nathan Ifill, Leicester MA4102 Data Mining and Neural Networks Andrew Moore, CMU : http://www.cs.cmu.edu/~awm/tutorials 1 Outline
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationEvaluation of different biological data and computational classification methods for use in protein interaction prediction.
Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman Protein 2006 Motivation Correctly
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More information8. Tree-based approaches
Foundations of Machine Learning École Centrale Paris Fall 2015 8. Tree-based approaches Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More informationSupervised Learning for Image Segmentation
Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationMachine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013
Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork
More informationPart I. Classification & Decision Trees. Classification. Classification. Week 4 Based in part on slides from textbook, slides of Susan Holmes
Week 4 Based in part on slides from textbook, slides of Susan Holmes Part I Classification & Decision Trees October 19, 2012 1 / 1 2 / 1 Classification Classification Problem description We are given a
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationEvaluation. Evaluate what? For really large amounts of data... A: Use a validation set.
Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationINF 4300 Classification III Anne Solberg The agenda today:
INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15