Machine Learning Models for Pattern Classification. Comp 473/6731

1 Machine Learning Models for Pattern Classification Comp 473/6731 November 24th 2016 Prof. Neamat El Gayar

2 Neural Networks Low-level computational algorithms Learn by example (no explicit algorithm required) Good performance for real-time applications Adaptive, generalize, robust, fault tolerant, etc. Drawbacks: Convergence to the optimal solution not guaranteed Some implementation heuristics (learning rate, stopping criteria, etc.) Learning sometimes tedious and slow Black-box architecture Interpretable model? Not easy to describe human knowledge 2

3 Topics Popular Classifiers: KNN Support Vector Machines Decision Trees Classifier testing and Evaluation 3

4 More Classifiers

5 Outline Lecture Some popular ML models for classification K-nearest neighbor (KNN) Support vector machines (SVM) Decision Trees 5

6 K-Nearest Neighbor Classifiers Learning by analogy: Tell me who your friends are and I'll tell you who you are A new example is assigned to the most common class among the (K) examples that are most similar to it. To determine the class of a new example E: Calculate the distance between E and all examples in the training set Select the K nearest examples to E in the training set Assign E to the most common class among its K nearest neighbors 6
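The three steps above translate almost directly into code. A minimal sketch in Python/NumPy (the function and variable names such as knn_predict are illustrative, not from the lecture):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Calculate the distance between the new example and every training example
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Select the indices of the K nearest training examples
    nearest = np.argsort(dists)[:k]
    # 3. Assign the most common class among the K nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.1, 3.9]), k=3))  # -> 1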

7 K-Nearest Neighbor Classification (kNN) Unlike the previous learning methods, kNN does not build a model from the training data. To classify a test instance d, define the k-neighborhood P as the k nearest neighbors of d Count the number n of training instances in P that belong to class c_j Estimate Pr(c_j | d) as n/k No training is needed. Classification time is linear in the training set size for each test case. 7

8 kNN algorithm k is usually chosen empirically via a validation set or cross-validation by trying a range of k values. The distance function is crucial, but depends on the application. 8
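One hedged way to pick k in practice, assuming scikit-learn is available (the dataset and the candidate k values below are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 11):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, round(scores.mean(), 3))   # keep the k with the best mean accuracy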

9 Example: k = 6 (6NN) Classes: Government, Science, Arts. For a new point, what is Pr(science | new point)? 9

10 Discussions kNN can deal with complex and arbitrary decision boundaries. Despite its simplicity, researchers have shown that the classification accuracy of kNN can be quite strong and in many cases as accurate as more elaborate methods. kNN is slow at classification time. kNN does not produce an understandable model. 10

11 K-Nearest Neighbor Classifier Strengths: Simple to implement and use Comprehensible: easy to explain the prediction Robust to noisy data by averaging over the k nearest neighbors Some appealing applications 11

12 Outline Lecture Some popular ML models K-nearest neighbor (KNN) Support vector machines (SVM) Decision Trees Summary & some ML Advice 12

13 Why are SVMs interesting? Discriminant-based classifiers Linear classifiers that use the kernel trick for nonlinearity Maximize generalization Support vectors represent knowledge Learning is based on a complex optimization problem 13

14 Introduction Support vector machines were invented by V. Vapnik and his co-workers in the 1970s in Russia and became known to the West in the early 1990s. SVMs are linear classifiers that find a hyperplane to separate two classes of data, positive and negative. Kernel functions are used for nonlinear separation. SVM not only has a rigorous theoretical foundation, but also performs classification more accurately than most other methods in applications, especially for high-dimensional data. It is perhaps the best classifier for text classification. 14

15 Review of Linear Classifiers Linear classifiers One of the simplest classifiers Linear decision boundary Applicable to linearly separable tasks Perceptron One of the most popular linear classifiers Perceptron learning Given a training set containing a number of examples with labels Iteratively update the weights of a linear equation until convergence Sensitive to initialisation and example input orders during learning Could generate decision boundaries of different generalization capabilities Figure: inputs x1, x2; output y = +1 if w^T x + b >= 0 and y = -1 otherwise; decision boundary w^T x + b = 0. 15
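As a reference point for the discussion that follows, here is a minimal sketch of perceptron learning (illustrative code, not the lecture's): the weights w and bias b are updated on every misclassified example until the data is separated.

import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=100):
    # y must be in {+1, -1}; returns weights w and bias b
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:   # misclassified (or on the boundary)
                w += lr * y_i * x_i               # perceptron update rule
                b += lr * y_i
                errors += 1
        if errors == 0:                           # converged on separable data
            break
    return w, b

X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))   # reproduces y on this separable toy set

The final w and b depend on the initialisation and the order of the examples, which is exactly the motivation for SVMs in the next slides.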

16 Motivation and Concept Perceptron learning (markers denote +1 and -1) Q: How would you classify this data set with the perceptron? 16

17 Motivation and Concept Perceptron learning (markers denote +1 and -1; decision boundary w^T x + b = 0) Q: How would you classify this data set? A: Using the perceptron learning rule to learn weights w, b 17

18 Motivation and Concept Perceptron learning (markers denote +1 and -1; decision boundary w^T x + b = 0) Q: How would you classify this data set? A: Using the perceptron learning rule to learn weights w, b Q: With different initial values of w, b and example orders, what happens? 18

19 Motivation and Concept Perceptron learning (markers denote +1 and -1; decision boundary w^T x + b = 0) Q: How would you classify this data set? A: Using the perceptron learning rule to learn weights w, b Q: With different initial values of w, b and example orders, what happens? A: Many different decision boundaries 19

20 Motivation and Concept Perceptron learning (markers denote +1 and -1; decision boundary w^T x + b = 0) Q: How would you classify this data set? A: Using the perceptron learning rule to learn weights w, b Q: With different initial values of w, b and example orders, what happens? A: Many different decision boundaries Q: Which decision boundary is the best for generalisation? 20

21 Motivation and Concept Margin of a linear classifier (markers denote +1 and -1; decision boundary w^T x + b = 0) Definition: The margin of a linear classifier is the width that the boundary could be increased by before hitting a data point. 21

22 Motivation and Concept Maximum margin (markers denote +1 and -1; decision boundary w^T x + b = 0) The maximum margin is the widest such width before hitting a data point. The maximum-margin linear classifier is the linear Support Vector Machine. 22

23 Motivation and Concept Support Vectors (markers denote +1 and -1; decision boundary w^T x + b = 0) Support Vectors are those data points that the margin pushes up against 23

24 Motivation and Concept SVM: the best solution in terms of generalisation (markers denote +1 and -1; decision boundary w^T x + b = 0; support vectors highlighted) Intuitively this feels safest. If we've made a small error in the location of the boundary, this gives us the least chance of causing a misclassification. The model is immune to removal of any non-support-vector data points. There's some theory (using VC dimension) related to the proposition that this is a good thing. Empirically it works very well. 24

25 SVM Learning Objectives: find appropriate weights w and bias b to minimize training errors (the same as for the perceptron) maximize the margin What is the relationship between the weights and the margin? With some analytic geometry we obtain: the support vectors satisfy y_sv (w^T x_sv + b) = 1, and the margin is M = 2 / ||w||, so maximizing the margin amounts to minimizing (1/2) w^T w. The learning rule is no longer as simple as the perceptron's: we need to search the space of w's and b's to find the widest margin that matches all the data points (support vectors). How? Using a Quadratic Programming (QP) algorithm! 25

26 Basic concepts Let the set of training examples D be {(x_1, y_1), (x_2, y_2), ..., (x_r, y_r)}, where x_i = (x_1, x_2, ..., x_n) is an input vector in a real-valued space X ⊆ R^n and y_i is its class label (output value), y_i ∈ {1, -1}. 1: positive class and -1: negative class. SVM finds a linear function of the form (w: weight vector) f(x) = ⟨w · x⟩ + b, with y_i = 1 if ⟨w · x_i⟩ + b ≥ 0 and y_i = -1 if ⟨w · x_i⟩ + b < 0.

27 The hyperplane The hyperplane that separates positive and negative training data is ⟨w · x⟩ + b = 0. It is also called the decision boundary (surface). So many possible hyperplanes, which one to choose? 27

28 Maximal margin hyperplane SVM looks for the separating hyperplane with the largest margin. Machine learning theory says this hyperplane minimizes the error bound 28

29 Linear SVM: separable case Assume the data are linearly separable. Consider a positive data point (x+, 1) and a negative one (x-, -1) that are closest to the hyperplane ⟨w · x⟩ + b = 0. We define two parallel hyperplanes, H+ and H-, that pass through x+ and x- respectively. H+ and H- are also parallel to ⟨w · x⟩ + b = 0. 29

30 Compute the margin Now let us compute the distance between the two margin hyperplanes H+ and H-. Their distance is the margin (d+ + d- in the figure). Recall from vector algebra that the (perpendicular) distance from a point x_i to the hyperplane ⟨w · x⟩ + b = 0 is |⟨w · x_i⟩ + b| / ||w|| (36), where ||w|| is the norm of w: ||w|| = sqrt(⟨w · w⟩) = sqrt(w_1^2 + w_2^2 + ... + w_n^2) (37). 30

31 Compute the margin (cont...) Let us compute d+. Instead of computing the distance from x+ to the separating hyperplane ⟨w · x⟩ + b = 0, we pick any point x_s on ⟨w · x⟩ + b = 0 and compute the distance from x_s to the hyperplane ⟨w · x⟩ + b = 1 by applying the distance Eq. (36) and noticing that ⟨w · x_s⟩ + b = 0: d+ = |⟨w · x_s⟩ + b - 1| / ||w|| = 1 / ||w|| (38), and hence margin = d+ + d- = 2 / ||w|| (39). 31

32 An optimization problem! Definition (Linear SVM: separable case): Given a set of linearly separable training examples D = {(x_1, y_1), (x_2, y_2), ..., (x_r, y_r)}, learning is to solve the following constrained minimization problem: Minimize ⟨w · w⟩ / 2 Subject to y_i(⟨w · x_i⟩ + b) ≥ 1, i = 1, 2, ..., r (40) The constraint summarizes ⟨w · x_i⟩ + b ≥ 1 for y_i = 1 and ⟨w · x_i⟩ + b ≤ -1 for y_i = -1. 32

33 The final decision boundary Finding the support vectors is equivalent to training the SVM. From the optimization we obtain the values of the α_i, which are used to compute the weight vector w and the bias b. The decision boundary is ⟨w · x⟩ + b = Σ_{i ∈ SV} α_i y_i ⟨x_i · x⟩ + b = 0 (57), where the x_i are the support vectors. Testing: use (57). Given a test instance z, compute sign(⟨w · z⟩ + b) = sign(Σ_{i ∈ SV} α_i y_i ⟨x_i · z⟩ + b) (58) If (58) returns 1, the test instance z is classified as positive; otherwise, it is classified as negative. 33
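A hedged sketch of Eq. (58) using scikit-learn (assumed available, not part of the lecture): after fitting a linear SVC, dual_coef_ holds the products α_i y_i for the support vectors, so the decision value for a test point z can be reproduced by hand and compared with decision_function.

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e3).fit(X, y)

z = np.array([1.5, 1.0])
# sum over support vectors of (alpha_i * y_i) <x_i . z>, plus the bias b
manual = clf.dual_coef_ @ (clf.support_vectors_ @ z) + clf.intercept_
print(np.sign(manual), np.sign(clf.decision_function([z])))   # the two signs agree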

34 How to deal with nonlinear separation? The SVM formulations require linear separation. Real-life data sets may need nonlinear separation. To deal with nonlinear separation, the same formulation and techniques as for the linear case are still used. We only transform the input data into another space (usually of a much higher dimension) so that a linear decision boundary can separate positive and negative examples in the transformed space. The transformed space is called the feature space. The original data space is called the input space. 34

35 Space transformation The basic idea is to map the data in the input space X to a feature space F via a nonlinear mapping φ: X → F, x ↦ φ(x) (76) After the mapping, the original training data set {(x_1, y_1), (x_2, y_2), ..., (x_r, y_r)} becomes: {(φ(x_1), y_1), (φ(x_2), y_2), ..., (φ(x_r), y_r)} (77) 35

36 Geometric interpretation In this example, the transformed space is also 2-D. But usually, the number of dimensions in the feature space is much higher than that in the input space 36

37 An example space transformation Suppose our input space is 2-dimensional, and we choose the following transformation (mapping) from 2-D to 3-D: φ(x_1, x_2) = (x_1^2, x_2^2, √2 x_1 x_2) The training example ((2, 3), -1) in the input space is transformed to the following in the feature space: ((4, 9, 8.5), -1) 37

38 Kernel functions In solving the quadratic optimization problem of SVM we only require dot products ⟨φ(x) · φ(z)⟩ and never the mapped vector φ(x) in its explicit form. This is a crucial point. Good news: the explicit transformation is not needed. Thus, if we have a way to compute the dot product ⟨φ(x) · φ(z)⟩ using the input vectors x and z directly, there is no need to know the feature vector φ(x) or even φ itself. In SVM, this is done through the use of kernel functions, denoted by K: K(x, z) = ⟨φ(x) · φ(z)⟩ (82) 38

39 An example kernel function Polynomial kernel: K(x, z) = ⟨x · z⟩^d (83) Let us compute the kernel with degree d = 2 in a 2-dimensional space: x = (x_1, x_2) and z = (z_1, z_2). ⟨x · z⟩^2 = (x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 = ⟨(x_1^2, x_2^2, √2 x_1 x_2) · (z_1^2, z_2^2, √2 z_1 z_2)⟩ = ⟨φ(x) · φ(z)⟩ (84) This shows that the kernel ⟨x · z⟩^2 is a dot product in a transformed feature space. 39
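A quick numeric check of (84) (illustrative code, not from the slides): the degree-2 polynomial kernel ⟨x · z⟩^2 gives the same value as the explicit mapping φ(x) = (x_1^2, x_2^2, √2 x_1 x_2).

import numpy as np

def phi(v):
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

x, z = np.array([2.0, 3.0]), np.array([1.0, -1.5])
print(np.dot(x, z) ** 2)        # kernel value K(x, z) computed in the input space
print(np.dot(phi(x), phi(z)))   # the same value via the explicit feature mapping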

40 Kernel trick The derivation in (84) is only for illustration purposes. We do not need to find the mapping function. We can simply apply the kernel function directly, by replacing all the dot products ⟨φ(x) · φ(z)⟩ in the optimization problem with the kernel function K(x, z) (e.g., the polynomial kernel ⟨x · z⟩^d in (83)). This strategy is called the kernel trick. 40

41 Commonly used kernels It is clear that the idea of kernel generalizes the dot product in the input space. This dot product is also a kernel with the feature map being the identity 41

42 Some other issues in SVM SVM works only in a real-valued space. For a categorical attribute, we need to convert its categorical values to numeric values. SVM does only two-class classification. For multi-class problems, some strategies can be applied, e.g., one-against-rest and error-correcting output coding. The hyperplane produced by SVM is hard for human users to understand. The matter is made worse by kernels. Thus, SVM is commonly used in applications that do not require human understanding. 42
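A hedged sketch of the one-against-rest strategy, assuming scikit-learn (the three-class dataset is illustrative): one binary SVM is trained per class, and the class with the largest decision value wins.

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(clf.predict(X[:5]))   # multi-class predictions built from binary SVMs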

43 Kernel SVM Weight/bias solution and decision rule: the weight vector is w = Σ_{i ∈ SV} α_i y_i φ(x_i), and the bias b is obtained by averaging over the support vectors. The decision rule becomes sign(⟨w · φ(x)⟩ + b) = sign(Σ_{i ∈ SV} α_i y_i ⟨φ(x_i) · φ(x)⟩ + b) = sign(Σ_{i ∈ SV} α_i y_i K(x_i, x) + b). In practice, we don't use this transformation explicitly but a kernel on the inner product. 43

44 Conclusions (Linear) SVM is a state-of-the-art linear classifier Developed based on statistical learning theory Learning process: seeking the support vectors, i.e. the maximum margin Best generalisation performance guaranteed for linearly separable data sets Kernel trick: extending linear SVM to non-linear SVM In principle, data points are mapped onto a higher-dimensional feature space so that they are linearly separable in that space In reality, a kernel function works directly on the inner product of data points With the kernel transformation, linear SVM is extended to non-linear SVM Can be extended to multi-category classification: decompose the problem into multiple binary classification tasks, or use variants of SVM that tackle multi-category classification in a straightforward way 44

45 Outline Lecture Some popular ML models K-nearest neighbor (KNN) Support vector machines (SVM) Decision Trees Summary & some ML Advice 45

46 Example Pattern is described by a list of attributes Fruit: Colour {red, green, yellow} Texture {shiny, rough, smooth} Shape { round, thin} Taste {sweet, sour} Size {big, small} 46

47 47

48 What is a decision Tree? It is a hierarchical data structure implementing a divide-and-conquer strategy It is composed of internal decision nodes and terminal leaves Decision nodes: implement a function (test) Leaf nodes: produce an output (decision) 48

49 What makes DT interesting? Nonlinear decision boundaries Categorical data Interpretable models Extract classification Rules from model 49

50 Decision Tree for Classification A pattern is classified by a sequence of questions What is special about that? You can handle nominal data Interpretability: 1. Get insight about why a test pattern was classified as belonging to a certain class X = {sweet, yellow, thin, medium}: this is a banana because it is yellow and thin 2. Derive classification rules (logical descriptions for categories) Apple = (green AND medium) OR (red AND medium) Integrate expert knowledge Rapid classification by simple queries (using only the necessary tests) Decision trees are sometimes preferred over more accurate (NN?) but less interpretable models 50

51 51

52 Decision Trees Univariate Trees: Classification Trees Discrete Variables Continuous Variables Pruning Rule Extraction from Trees 52

53 Tree Uses Nodes, and Leaves 53

54 The figures below are such examples. This type of tree is known as an Ordinary Binary Classification Tree (OBCT). The decision hyperplanes, splitting the space into regions, are parallel to the axes of the space. Other types of partition are also possible, yet less popular. 54

55 Divide and Conquer Internal decision nodes Univariate: Uses a single attribute, x i Numeric x i : Binary split : x i > w m Discrete x i : n-way split for n possible values Multivariate: Uses all attributes, x Leaves Classification: Class labels, or proportions Regression: Numeric; r average, or local fit Learning is greedy; find the best split recursively (Breiman et al, 1984; Quinlan, 1986, 1993) 55

56 Tree Induction Creating a Tree How does it work: Start at the root (complete training data) repeat the following steps recursively { Look for the best split Split the training data into: 2 splits if the attribute is numeric n splits if the attribute is discrete Continue on the resulting splits until no more splitting is needed, then create a leaf node } 56

57 Classification Trees (ID3, CART, C4.5) For node m, N_m instances reach m, and N_m^i of them belong to class C_i: P̂(C_i | x, m) = p_m^i = N_m^i / N_m Node m is pure if p_m^i is 0 or 1. The measure of impurity is the entropy I_m = -Σ_{i=1}^{K} p_m^i log2 p_m^i 57
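A minimal sketch of the entropy impurity I_m computed from the class counts at a node (the names below are illustrative):

import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()              # class proportions p_m^i, skipping empty classes
    return -np.sum(p * np.log2(p))

print(entropy([10, 10]))   # 1.0: maximally impure for two classes
print(entropy([20, 0]))    # 0: a pure node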

58 Best Split If node m is pure, generate a leaf and stop; otherwise split and continue recursively. Impurity after the split: N_mj of the N_m instances take branch j, and N_mj^i of them belong to class C_i: P̂(C_i | x, m, j) = p_mj^i = N_mj^i / N_mj Overall impurity: I'_m = -Σ_{j=1}^{n} (N_mj / N_m) Σ_{i=1}^{K} p_mj^i log2 p_mj^i Find the variable and split that minimize the impurity (among all variables, and among all split positions for numeric variables). The overall impurity quantifies the goodness of a split. 58

59 Finding the best split position for numeric variables How can we choose w_m? The test at decision node m is f_m(x): x_i > w_m, which divides the input space into L_m = {x | x_i > w_m} (left branch) and R_m = {x | x_i ≤ w_m} (right branch). No need to try all values: we have at most N_m - 1 possible splits, and it is enough to test splits where adjacent points belong to different classes. For x_1 try points A, B, C, D (5, 6.5, 7.5, 8.5); for x_2 try points A, B, C, D (2.5, 3.5, 4.5, 5.5). 59
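A sketch (illustrative, not the lecture's code) of the search for the best threshold w_m on one numeric attribute: candidate thresholds are the midpoints between adjacent sorted values, and the split with the lowest weighted entropy is kept.

import numpy as np

def entropy_of_labels(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_numeric_split(x, y):
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_w, best_imp = None, np.inf
    for j in range(len(x) - 1):
        w = (x[j] + x[j + 1]) / 2.0          # candidate threshold between adjacent points
        left, right = y[x <= w], y[x > w]
        imp = (len(left) * entropy_of_labels(left)
               + len(right) * entropy_of_labels(right)) / len(y)   # weighted impurity
        if imp < best_imp:
            best_w, best_imp = w, imp
    return best_w, best_imp

x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_numeric_split(x, y))   # threshold 5.0 gives zero impurity here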

60 Which is the best tree? For a given training data set, many trees exist that code the data with no error. We need to find the smallest among these trees. Tree size is measured as: the number of nodes in the tree and the complexity of the decision nodes. 60

61 Model Selection in Trees 61

62 Pruning Trees Remove subtrees for better generalization (decreases variance) Prepruning: early stopping (e.g., stop if the instances reaching a node are < 5% of the data) Postpruning: grow the whole tree, then prune subtrees which overfit on the pruning set Prepruning is faster; postpruning is more accurate (requires a separate pruning set) 62

63 Rule Extraction from Trees C4.5Rules (Quinlan, 1993) If-then rules =>Rule Base 63

64 Rule Extraction from Trees (cont.) A decision tree does its own feature extraction: Certain features might not be used (see feature x3) Features close to the root are globally more important Interpretability: - Allows the model to be verified by an expert - Get insight on important variables and how they affect the output decision - Describe features and their relations that characterize a certain class - Rules can also be pruned 64

65 Multivariate Trees 65

66 Remarks: A critical factor in the design is the size of the tree. Usually one grows a tree to a large size and then applies various pruning techniques. Decision trees belong to the class of unstable classifiers. This can be overcome by a number of averaging techniques. Bagging is a popular technique. Using bootstrap techniques in X, various trees are constructed, T_i, i = 1, 2, ..., B. The decision is taken according to a majority voting rule. 66

67 Outline Lecture Some popular ML models K-nearest neighbor (KNN) Support vector machines (SVM) Decision Trees Evaluation and best practices 67

68 Evaluation and best Practices

69 Classifier Performance assessment How good is the classifier that I designed? How well does it compare to competing techniques? Related question: Can an ensemble of classifiers improve performance? 69

70 Topics Accuracy and Error Measures Classifier accuracy measures Predictor error measures Classifier Evaluation: Evaluation Criteria Evaluation Methods: Holdout method and Random Resampling Cross Validation Bootstrapping Comparing classifier performance

71 Learning a Class from Examples Class C of a family car Prediction: Is car x a family car? Knowledge extraction: What do people expect from a family car? Output: Positive (+) and negative ( ) examples Input representation: x 1 : price, x 2 : engine power 71

72 Training set X X = {x^t, r^t}, t = 1, ..., N, where r^t = 1 if x^t is positive and r^t = 0 if x^t is negative, and x = [x_1, x_2]^T 72

73 Class C: (p_1 ≤ price ≤ p_2) AND (e_1 ≤ engine power ≤ e_2)

74 Hypothesis class H h(x) = 1 if h classifies x as positive, 0 if h classifies x as negative Error of h on X: E(h | X) = Σ_{t=1}^{N} 1(h(x^t) ≠ r^t) 74

75 Noise and Model Complexity Use the simpler one because Simpler to use (lower computational complexity) Easier to train (lower space complexity) Easier to explain (more interpretable) Generalizes better (lower variance - Occam s razor) 75

76 Bias-Variance Trade-off Bias and variance measure the alignment or match of the learning algorithm to the classification problem. Bias measures the quality of the match (keep it low): the model has enough free parameters to match the problem and is not too simple; high bias means the model is too rigid to capture the data characteristics. Variance measures the precision of the match (keep it low): performance does not change with slight changes in the training data; this favours simple models. Procedures with increased flexibility tend to have low bias but high variance, and vice versa: the Bias-Variance Trade-off 76

77 Accuracy and Error Measures

78 Classification measures Accuracy is only one measure (error = 1 - accuracy). Accuracy is not suitable in some applications. In text mining, we may only be interested in the documents of a particular topic, which are only a small portion of a big document collection. In classification involving skewed or highly imbalanced data, e.g., network intrusion and financial fraud detection, we are interested only in the minority class. High accuracy does not mean any intrusion is detected. E.g., with 1% intrusion, we achieve 99% accuracy by doing nothing. The class of interest is commonly called the positive class, and the rest are the negative classes. 78

79 Precision and recall measures Used in information retrieval and text classification. We use a confusion matrix to introduce them. 79

80 Precision and recall measures (cont...) p = TP / (TP + FP), r = TP / (TP + FN) Precision p is the number of correctly classified positive examples divided by the total number of examples that are classified as positive. Recall r is the number of correctly classified positive examples divided by the total number of actual positive examples in the test set. 80

81 An example This confusion matrix gives precision p = 100% and recall r = 1% because we only classified one positive example correctly and no negative examples wrongly. Note: precision and recall only measure classification on the positive class. 81

82 F1-value (also called F1-score) It is hard to compare two classifiers using two measures. The F1 score combines precision and recall into one measure: F1 = 2pr / (p + r), the harmonic mean of p and r. The harmonic mean of two numbers tends to be closer to the smaller of the two. For the F1-value to be large, both p and r must be large. 82

83 Measuring Error Error rate = # of errors / # of instances = (FN+FP) / N Recall = # of found positives / # of positives = TP / (TP+FN) = sensitivity = hit rate Precision = # of found positives / # of found = TP / (TP+FP) Specificity = TN / (TN+FP) False alarm rate = FP / (FP+TN) = 1 - Specificity 83
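A minimal sketch computing the measures above from raw confusion-matrix counts (the TP/FP/TN/FN numbers are made up for illustration):

TP, FP, TN, FN = 30, 10, 50, 10
N = TP + FP + TN + FN

error_rate  = (FN + FP) / N
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)           # = sensitivity = hit rate
specificity = TN / (TN + FP)
false_alarm = FP / (FP + TN)           # = 1 - specificity
f1          = 2 * precision * recall / (precision + recall)

print(error_rate, precision, recall, specificity, false_alarm, f1)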

84 ROC Curve 84

85 Evaluation of classifiers

86 Evaluating classification methods (Han & Kamber, 2 nd edition, 2006; chapter 6) Predictive accuracy/error Efficiency time to construct the model time to use the model Robustness: handling noise and missing values Scalability: efficiency in disk-resident databases Interpretability: understandable and insight provided by the model Compactness of the model: size of the tree, or the number of rules. 86

87 Algorithm Preference Criteria (Application-dependent): Misclassification error, or risk (loss functions) Training time/space complexity Testing time/space complexity Interpretability Easy programmability Cost-sensitive learning 87

88 Evaluation methods Holdout set: The available data set D is divided into two disjoint subsets, the training set D_train (for learning a model) and the test set D_test (for testing the model). Important: the training set should not be used in testing and the test set should not be used in learning. An unseen test set provides an unbiased estimate of accuracy. The test set is also called the holdout set. (The examples in the original data set D are all labeled with classes.) This method is mainly used when the data set D is large. 88

89 Evaluation methods (cont...) k-fold cross-validation: The available data is partitioned into k equal-size disjoint subsets. Use each subset as the test set and combine the remaining k-1 subsets as the training set to learn a classifier. The procedure is run k times, which gives k accuracies. The final estimated accuracy of learning is the average of the k accuracies. 10-fold and 5-fold cross-validation are commonly used. This method is used when the available data is not large. 89
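A hedged sketch of k-fold cross-validation written out by hand (the classifier and the dataset plugged in are illustrative):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

def k_fold_accuracy(model, X, y, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)                          # k equal-size disjoint subsets
    accs = []
    for i in range(k):
        test = folds[i]                                     # one subset for testing
        train = np.concatenate(folds[:i] + folds[i + 1:])   # the remaining k-1 for training
        model.fit(X[train], y[train])
        accs.append((model.predict(X[test]) == y[test]).mean())
    return np.mean(accs)                                    # average of the k accuracies

X, y = load_iris(return_X_y=True)
print(k_fold_accuracy(KNeighborsClassifier(n_neighbors=5), X, y))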

90 Resampling and K-Fold Cross-Validation The need for multiple training/validation sets {X_i, V_i}_i: training/validation sets of fold i K-fold cross-validation: Divide X into K parts X_i, i = 1, ..., K; then V_1 = X_1, T_1 = X_2 ∪ X_3 ∪ ... ∪ X_K V_2 = X_2, T_2 = X_1 ∪ X_3 ∪ ... ∪ X_K ... V_K = X_K, T_K = X_1 ∪ X_2 ∪ ... ∪ X_{K-1} Any two training sets T_i share K-2 parts 90

91 Cross-Validation To estimate the generalization error, we need data unseen during training. We split the data as Training set (50%) Validation set (25%) Test (publication) set (25%) Resampling is used when there is little data 91

92 5×2 Cross-Validation Five times 2-fold cross-validation (Dietterich, 1998): the data X is split into two halves X_1^(i) and X_2^(i) five times, i = 1, ..., 5; each replication gives the pairs T = X_1^(i), V = X_2^(i) and T = X_2^(i), V = X_1^(i), i.e. ten training/validation pairs in total

93 Bootstrapping Draw N instances from a dataset of size N with replacement. The probability that we do not pick a particular instance after N draws is (1 - 1/N)^N ≈ e^(-1) ≈ 0.368; that is, only 36.8% is new! 93
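A quick simulation of the statement above (illustrative): after N draws with replacement, the fraction of instances never picked is close to (1 - 1/N)^N ≈ e^(-1).

import numpy as np

N = 10_000
rng = np.random.default_rng(0)
sample = rng.integers(0, N, size=N)            # N draws with replacement from N instances
never_picked = N - len(np.unique(sample))      # instances that were never drawn
print(never_picked / N, (1 - 1 / N) ** N, np.exp(-1))   # all close to 0.368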

94 Evaluation methods (cont...) Leave-one-out cross-validation: This method is used when the data set is very small. It is a special case of cross-validation where each fold has only a single test example and all the rest of the data is used in training. If the original data has m examples, this is m-fold cross-validation. 94

95 Evaluation methods (cont ) Validation set: the available data is divided into three subsets, a training set, a validation set and a test set. A validation set is used frequently for estimating parameters in learning algorithms. In such cases, the values that give the best accuracy on the validation set are used as the final parameter values. Cross-validation can be used for parameter estimating as well. 95

96 More on Classifier Performance Using statistical tests for evaluation and comparison Improving accuracy: Bagging and boosting Classifier Combination 96

97 Best Practices Understand your model (strength/limitations/best practices) REASON ABOUT YOUR RESULTS (get insight on your data) 97

98 Diagnostics tell you what to try next Diagnostics for bias and variance: Variance: training error will be much lower than test error. Bias: training error will also be high. Fixes to try: Try getting more training examples. Fixes high variance. Try a smaller set of features. Fixes high variance. Try a larger set of features. Fixes high bias. Enhance the model or switch to a different model. 98

99 Good machine learning practice Understand your application problem: get an intuitive understanding of what works and what doesn't work in your problem. Convey insight about the problem, and justify your research claims: i.e., rather than saying Here's an algorithm that works, it's more interesting to say Here's an algorithm that works because of component X, and here's my justification. Error analysis: try to understand what your sources of error are. 99

100 Getting started on a problem Approach #1: Careful design. Spend a long time designing exactly the right features, collecting the right dataset, and designing the right algorithmic architecture. Implement it and hope it works. Benefit: Nicer, perhaps more scalable algorithms. May come up with new, elegant learning algorithms; contribute to basic research in machine learning. 100

101 Getting started on a problem Approach #2: Build-and-fix. Implement something quick-and-dirty. Run error analysis and diagnostics to see what's wrong with it, and fix its errors. Benefit: Will often get your application problem working more quickly. Faster time to market. 101

102 Putting it All Together! Time spent coming up with diagnostics for learning algorithms is time well spent. It's often up to how skilled you are to come up with the right diagnostics (cost-sensitive learning, time/space complexity, convergence, etc.). Error analysis and learning curves also give insight into the problem. Two approaches to applying learning algorithms: Design very carefully, then implement. Build a quick-and-dirty prototype, diagnose, and fix. 102

103 Learning curves: classification error vs. training set size for a simple classifier and a complex classifier, approaching the Bayes error. 103

104 Thank YOU!
