Data Mining and Knowledge Discovery: Practice Notes
|
|
- Amberlynn Davis
- 5 years ago
- Views:
Transcription
1 Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak 2016/01/12 1
2 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms Decision tree induction, entropy, information gain, overfitting, Occam s razor, model pruning, naïve Bayes classifier, KNN, association rules, support, confidence, numeric prediction, regression tree, model tree, heuristics vs. exhaustive search, predictive vs. descriptive DM Evaluation Train set, test set, accuracy, confusion matrix, cross validation, true positives, false positives, ROC space, error, precision, recall 2
3 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 3
4 Comparison of naïve Bayes and decision trees Similarities Classification Same evaluation Differences Missing values Numeric attributes Interpretability of the model Model size 4
5 Comparison of naïve Bayes and decision trees: Handling missing values Will the spider catch these two ants? Color = white, Time = night missing value for attribute Size Color = black, Size = large, Time = day Naïve Bayes uses all the available information. 5
6 Comparison of naïve Bayes and decision trees: Handling missing values 6
7 Comparison of naïve Bayes and decision trees: Handling missing values Algorithm ID3: does not handle missing values Algorithm C4.5 (J48) deals with two problems: Missing values in train data: Missing values are not used in gain and entropy calculations Missing values in test data: A missing continuous value is replaced with the median of the training set A missing categorical values is replaced with the most frequent value 7
8 Comparison of naïve Bayes and decision trees: numeric attributes Decision trees ID3 algorithm: does not handle continuous attributes data need to be discretized Decision trees C4.5 (J48 in Weka) algorithm: deals with continuous attributes as shown earlier Naïve Bayes: does not handle continuous attributes data need to be discretized (some implementations do handle) 8
9 Comparison of naïve Bayes and decision trees: Interpretability Decision trees are easy to understand and interpret (if they are of moderate size) Naïve bayes models are of the black box type. Naïve bayes models have been visualized by nomograms. 9
10 Comparison of naïve Bayes and decision trees: Model size Naïve Bayes model size is low and quite constant with respect to the data Trees, especially random forest tend to be very large 10
11 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 11
12 Comparison of cross validation and testing on a separate test set Both are methods for evaluating predictive models. Testing on a separate test set is simpler since we split the data into two sets: one for training and one for testing. We evaluate the model on the test data. Cross validation is more complex: It repeats testing on a separate test n times, each time taking 1/n of different data examples as test data. The evaluation measures are averaged over all testing sets therefore the results are more reliable. 12
13 (Train Validation Test) Set Training set: a set of examples used for learning Validation set: a set of examples used to tune the parameters of a classifier Test set: a set of examples used only to assess the performance of a fully-trained classifier Why separate test and validation sets? The error rate estimate of the final model on validation data will be biased (smaller than the true error rate) since the validation set is used to select the final model. After assessing the final model on the test set, YOU MUST NOT tune the model any further! 13
14 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 14
15 Decision tree pruning To avoid overfitting Reduce size of a model and therefore increase understandability. 15
16 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 16
17 Discretization A good choice of intervals for discretizing your continuous feature is key to improving the predictive performance of your model. Hand-picked intervals good knowledge about the data Equal-width intervals probably won't give good results Find the right intervals using existing data: Equal frequency intervals If you have labeled data, another common technique is to find the intervals which maximize the information gain Caution: The decision about the intervals should be done based on training data only 17
18 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 18
19 Why can t we always achieve 100% accuracy on the training set? Two examples have the same attribute values but different classes Run out of attributes 19
20 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 20
21 Relative frequency vs. Laplace estimate Relative frequency P(c) = n(c) /N A disadvantage of using relative frequencies for probability estimation arises with small sample sizes, especially if they are either very close to zero, or very close to one. In our spider example: P(Time=day caught=no) = = 0/3 = 0 n(c) number of examples where c is true N number of all examples k number of classes Laplace estimate Assumes uniform prior distribution of k classes P(c) = (n(c) + 1) / (N + k) In our spider example: P(Time=day caught=no) = (0+1)/(3+2) = 1/5 With lots of evidence approximates relative frequency If there were 300 cases when the spider didn t catch ants at night: P(Time=day caught=no) = (0+1)/(300+2) = 1/302 = With Laplace estimate probabilities can never be 0. 21
22 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 22
23 Why does Naïve Bayes work well? Because classification doesn't require accurate probability estimates as long as maximum probability is assigned to correct class. 23
24 Discussion 1. Compare naïve Bayes and decision trees (similarities and differences). 2. Compare cross validation and testing on a separate test set. 3. Why do we prune decision trees? 4. What is discretization. 5. Why can t we always achieve 100% accuracy on the training set? 6. Compare Laplace estimate with relative frequency. 7. Why does Naïve Bayes work well (even if independence assumption is clearly violated)? 8. What are the benefits of using Laplace estimate instead of relative frequency for probability estimation in Naïve Bayes? 24
25 Benefits of Laplace estimate With Laplace estimate we avoid assigning a probability of 0, as it denotes an impossible event Instead we assume uniform prior distribution of k classes 25
26 Numeric prediction 26
27 Height Example data about 80 people: Age and Height Age Height 27
28 Test set 28
29 Height Baseline numeric predictor Average of the target variable Height Average predictor Age 29
30 Baseline predictor: prediction Average of the target variable is
31 Height Linear Regression Model Height = * Age Height Prediction Age 31
32 Linear Regression: prediction Height = * Age
33 H e i g h t Regression tree H e i g h t P re d i c ti o n A g e 33
34 Regression tree: prediction 34
35 Height Model tree Height Prediction Age 35
36 Model tree: prediction 36
37 Height KNN K nearest neighbors Looks at K closest examples (by non-target attributes) and predicts the average of their target variable In this example, K= Height Prediction KNN, n= Age 37
38 KNN prediction A g e H e ig h t
39 KNN prediction A g e H e ig h t
40 KNN prediction A g e H e ig h t
41 KNN prediction A g e H e ig h t
42 KNN video 42
43 Which predictor is the best? Age Height Baseline Linear regression Regressi on tree Model tree knn
44 Evaluating numeric prediction 44
45 Numeric prediction Classification Data: attribute-value description Target variable: Continuous Target variable: Categorical (nominal) Evaluation: cross validation, separate test set, Error: MSE, MAE, RMSE, Algorithms: Linear regression, regression trees, Baseline predictor: Mean of the target variable Error: 1-accuracy Algorithms: Decision trees, Naïve Bayes, Baseline predictor: Majority class 45
46 Discussion 1. Can KNN be used for classification tasks? 2. Compare KNN and Naïve Bayes. 3. Compare decision trees and regression trees. 4. Consider a dataset with a target variable with five possible values: 1. non sufficient 2. sufficient 3. good 4. very good 5. excellent 1. Is this a classification or a numeric prediction problem? 2. What if such a variable is an attribute, is it nominal or numeric? 46
47 KNN for classification? Yes. A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function. If K = 1, then the case is simply assigned to the class of its nearest neighbor. 47
48 Discussion 1. Can KNN be used for classification tasks? 2. Compare KNN and Naïve Bayes. 3. Compare decision trees and regression trees. 4. Consider a dataset with a target variable with five possible values: 1. non sufficient 2. sufficient 3. good 4. very good 5. excellent 1. Is this a classification or a numeric prediction problem? 2. What if such a variable is an attribute, is it nominal or numeric? 48
49 Comparison of KNN and naïve Bayes Naïve Bayes KNN Used for Handle categorical data Handle numeric data Model interpretability Lazy classification Evaluation Parameter tuning 49
50 Comparison of KNN and naïve Bayes Used for Naïve Bayes Classification KNN Classification and numeric prediction Handle categorical data Yes Proper distance function needed Handle numeric data Discretization needed Yes Model interpretability Limited No Lazy classification Partial Yes Evaluation Cross validation, Cross validation, Parameter tuning No No 50
51 Discussion 1. Can KNN be used for classification tasks? 2. Compare KNN and Naïve Bayes. 3. Compare decision trees and regression trees. 4. Consider a dataset with a target variable with five possible values: 1. non sufficient 2. sufficient 3. good 4. very good 5. excellent 1. Is this a classification or a numeric prediction problem? 2. What if such a variable is an attribute, is it nominal or numeric? 51
52 Comparison of regression and decision trees 1. Data 2. Target variable 3. Evaluation 4. Error 5. Algorithm 6. Heuristic 7. Stopping criterion 52
53 Comparison of regression and decision trees Regression trees Data: attribute-value description Target variable: Continuous Decision trees Target variable: Categorical (nominal) Evaluation: cross validation, separate test set, Error: MSE, MAE, RMSE, Error: 1-accuracy Algorithm: Top down induction, shortsighted method Heuristic: Standard deviation Stopping criterion: Standard deviation< threshold Heuristic : Information gain Stopping criterion: Pure leafs (entropy=0) 53
54 Discussion 1. Can KNN be used for classification tasks? 2. Compare KNN and Naïve Bayes. 3. Compare decision trees and regression trees. 4. Consider a dataset with a target variable with five possible values: 1. non sufficient 2. sufficient 3. good 4. very good 5. excellent 1. Is this a classification or a numeric prediction problem? 2. What if such a variable is an attribute, is it nominal or numeric? 54
55 Classification or a numeric prediction problem? Target variable with five possible values: 1.non sufficient 2.sufficient 3.good 4.very good 5.excellent Classification: the misclassification cost is the same if non sufficient is classified as sufficient or if it is classified as very good Numeric prediction: The error of predicting 2 when it should be 1 is 1, while the error of predicting 5 instead of 1 is 4. If we have a variable with ordered values, it should be considered numeric. 55
56 Nominal or numeric attribute? A variable with five possible values: 1.non sufficient 2.sufficient 3.good 4.very good 5.Excellent Nominal: Numeric: If we have a variable with ordered values, it should be considered numeric. 56
57 Association Rules 57
58 Association rules Rules X Y, X, Y conjunction of items Task: Find all association rules that satisfy minimum support and minimum confidence constraints - Support: Sup(X Y) = #XY/#D p(xy) - Confidence: Conf(X Y) = #XY/#X p(xy)/p(x) = p(y X) 58
59 Association rules - algorithm 1. Generate frequent itemsets with a minimum support constraint 2. Generate rules from frequent itemsets with a minimum confidence constraint * Data are in a transaction database 59
60 Association rules transaction database Items: A=apple, B=banana, C=coca-cola, D=doughnut Client 1 bought: A, B, C, D Client 2 bought: B, C Client 3 bought: B, D Client 4 bought: A, C Client 5 bought: A, B, D Client 6 bought: A, B, C 60
61 Frequent itemsets Generate frequent itemsets with support at least 2/6 61
62 Frequent itemsets algorithm Items in an itemset should be sorted alphabetically. 1. Generate all 1-itemsets with the given minimum support. 2. Use 1-itemsets to generate 2-itemsets with the given minimum support. 3. From 2-itemsets generate 3-itemsets with the given minimum support as unions of 2-itemsets with the same item at the beginning From n-itemsets generate (n+1)-itemsets as unions of n- itemsets with the same (n-1) items at the beginning. To generate itemsets at level n+1 items from level n are used with a constraint: itemsets have to start with the same n-1 items. 62
63 Frequent itemsets lattice Frequent itemsets: A&B, A&C, A&D, B&C, B&D A&B&C, A&B&D 63
64 Rules from itemsets A&B is a frequent itemset with support 3/6 Two possible rules A B confidence = #(A&B)/#A = 3/4 B A confidence = #(A&B)/#B = 3/5 All the counts are in the itemset lattice! 64
65 Quality of association rules Support(X) = #X / #D. P(X) Support(X Y) = Support (XY) = #XY / #D P(XY) Confidence(X Y) = #XY / #X P(Y X) Lift(X Y) = Support(X Y) / (Support (X)*Support(Y)) Leverage(X Y) = Support(X Y) Support(X)*Support(Y) Conviction(X Y) = 1-Support(Y)/(1-Confidence(X Y)) 65
66 Quality of association rules Support(X) = #X / #D. P(X) Support(X Y) = Support (XY) = #XY / #D P(XY) Confidence(X Y) = #XY / #X P(Y X) Lift(X Y) = Support(X Y) / (Support (X)*Support(Y)) How many more times the items in X and Y occur together then it would be expected if the itemsets were statistically independent. Leverage(X Y) = Support(X Y) Support(X)*Support(Y) Similar to lift, difference instead of ratio. Conviction(X Y) = 1-Support(Y)/(1-Confidence(X Y)) Degree of implication of a rule. Sensitive to rule direction. 66
67 Discussion Transformation of an attribute-value dataset to a transaction dataset. What are the benefits of a transaction dataset? What would be the association rules for a dataset with two items A and B, each of them with support 80% and appearing in the same transactions as rarely as possible? minsupport = 50%, min conf = 70% minsupport = 20%, min conf = 70% What if we had 4 items: A, A, B, B Compare decision trees and association rules regarding handling an attribute like PersonID. What about attributes that have many values (eg. Month of year) 67
68 Next week Written exam 60 minutes of time 4 tasks: 2 computational (60%), 2 theoretical (40%) Literature is not allowed Each student can bring one hand-written A4 sheet of paper, and a hand calculator Data mining seminar One page seminar proposal on paper 68
Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 06/0/ Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationData Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate
More informationData Mining and Knowledge Discovery Practice notes Numeric prediction and descriptive DM
Practice notes 4..9 Practice plan Data Mining and Knowledge Discovery Knowledge Discovery and Knowledge Management in e-science Petra Kralj Novak Petra.Kralj.Novak@ijs.si Practice, 9//4 9//: Predictive
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/11/16 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationData Mining and Knowledge Discovery Practice notes 2
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More informationExam Advanced Data Mining Date: Time:
Exam Advanced Data Mining Date: 11-11-2010 Time: 13.30-16.30 General Remarks 1. You are allowed to consult 1 A4 sheet with notes written on both sides. 2. Always show how you arrived at the result of your
More informationData Mining Classification - Part 1 -
Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More information7. Decision or classification trees
7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationData Mining Algorithms: Basic Methods
Algorithms: The basic methods Inferring rudimentary rules Data Mining Algorithms: Basic Methods Chapter 4 of Data Mining Statistical modeling Constructing decision trees Constructing rules Association
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationClassification. Instructor: Wei Ding
Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute
More informationDecision Tree CE-717 : Machine Learning Sharif University of Technology
Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More informationMachine Learning. Decision Trees. Manfred Huber
Machine Learning Decision Trees Manfred Huber 2015 1 Decision Trees Classifiers covered so far have been Non-parametric (KNN) Probabilistic with independence (Naïve Bayes) Linear in features (Logistic
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationPart I. Instructor: Wei Ding
Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set
More informationCS Machine Learning
CS 60050 Machine Learning Decision Tree Classifier Slides taken from course materials of Tan, Steinbach, Kumar 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationProbabilistic Classifiers DWML, /27
Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation
Classification: Basic Concepts, Decision Trees, and Model Evaluation Data Warehousing and Mining Lecture 4 by Hossen Asiful Mustafa Classification: Definition Given a collection of records (training set
More informationInterpretation and evaluation
Interpretation and evaluation 1. Descriptive tasks Evaluation based on novelty, interestingness, usefulness and understandability Qualitative evaluation: obvious (common sense) knowledge knowledge that
More informationCS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods
+ CS78: Machine Learning and Data Mining Complexity & Nearest Neighbor Methods Prof. Erik Sudderth Some materials courtesy Alex Ihler & Sameer Singh Machine Learning Complexity and Overfitting Nearest
More informationCSC411/2515 Tutorial: K-NN and Decision Tree
CSC411/2515 Tutorial: K-NN and Decision Tree Mengye Ren csc{411,2515}ta@cs.toronto.edu September 25, 2016 Cross-validation K-nearest-neighbours Decision Trees Review: Motivation for Validation Framework:
More informationDATA MINING LECTURE 10B. Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines
DATA MINING LECTURE 10B Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines NEAREST NEIGHBOR CLASSIFICATION 10 10 Illustrating Classification Task Tid Attrib1
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationAssociation Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model.
Comp 135 Machine Learning Computer Science Tufts University Fall 2017 Roni Khardon Unsupervised learning but complementary to data exploration in clustering. The goal is to find weak implications in the
More informationPerformance Evaluation of Various Classification Algorithms
Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationRandom Forest A. Fornaser
Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University
More information9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form)
Comp 135 Introduction to Machine Learning and Data Mining Our first learning algorithm How would you classify the next example? Fall 2014 Professor: Roni Khardon Computer Science Tufts University o o o
More informationNotes based on: Data Mining for Business Intelligence
Chapter 9 Classification and Regression Trees Roger Bohn April 2017 Notes based on: Data Mining for Business Intelligence 1 Shmueli, Patel & Bruce 2 3 II. Results and Interpretation There are 1183 auction
More informationDecision tree learning
Decision tree learning Andrea Passerini passerini@disi.unitn.it Machine Learning Learning the concept Go to lesson OUTLOOK Rain Overcast Sunny TRANSPORTATION LESSON NO Uncovered Covered Theoretical Practical
More informationNonparametric Classification Methods
Nonparametric Classification Methods We now examine some modern, computationally intensive methods for regression and classification. Recall that the LDA approach constructs a line (or plane or hyperplane)
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationDecision trees. Decision trees are useful to a large degree because of their simplicity and interpretability
Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationClassification and Regression
Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan
More informationChapter 2: Classification & Prediction
Chapter 2: Classification & Prediction 2.1 Basic Concepts of Classification and Prediction 2.2 Decision Tree Induction 2.3 Bayes Classification Methods 2.4 Rule Based Classification 2.4.1 The principle
More informationData Preparation. Nikola Milikić. Jelena Jovanović
Data Preparation Nikola Milikić nikola.milikic@fon.bg.ac.rs Jelena Jovanović jeljov@fon.bg.ac.rs Normalization Normalization is the process of rescaling the values of an attribute to a specific value range,
More informationK- Nearest Neighbors(KNN) And Predictive Accuracy
Contact: mailto: Ammar@cu.edu.eg Drammarcu@gmail.com K- Nearest Neighbors(KNN) And Predictive Accuracy Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni.
More informationData Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology
❷Chapter 2 Basic Statistics Business School, University of Shanghai for Science & Technology 2016-2017 2nd Semester, Spring2017 Contents of chapter 1 1 recording data using computers 2 3 4 5 6 some famous
More informationCMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)
CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification
More informationClassification. 1 o Semestre 2007/2008
Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class
More informationMachine Learning. Decision Trees. Le Song /15-781, Spring Lecture 6, September 6, 2012 Based on slides from Eric Xing, CMU
Machine Learning 10-701/15-781, Spring 2008 Decision Trees Le Song Lecture 6, September 6, 2012 Based on slides from Eric Xing, CMU Reading: Chap. 1.6, CB & Chap 3, TM Learning non-linear functions f:
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More information2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.
Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationIntroduction to Machine Learning. Xiaojin Zhu
Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006
More informationLecture 19: Decision trees
Lecture 19: Decision trees Reading: Section 8.1 STATS 202: Data mining and analysis November 10, 2017 1 / 17 Decision trees, 10,000 foot view R2 R5 t4 1. Find a partition of the space of predictors. X2
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationPattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42
Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationAssociation Pattern Mining. Lijun Zhang
Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More informationk-nearest Neighbor (knn) Sept Youn-Hee Han
k-nearest Neighbor (knn) Sept. 2015 Youn-Hee Han http://link.koreatech.ac.kr ²Eager Learners Eager vs. Lazy Learning when given a set of training data, it will construct a generalization model before receiving
More informationClassification: Decision Trees
Classification: Decision Trees IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University 1 Decision Tree Example Will a pa)ent have high-risk based on the ini)al 24-hour observa)on?
More informationUnderstanding Rule Behavior through Apriori Algorithm over Social Network Data
Global Journal of Computer Science and Technology Volume 12 Issue 10 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172
More informationMS1b Statistical Data Mining Part 3: Supervised Learning Nonparametric Methods
MS1b Statistical Data Mining Part 3: Supervised Learning Nonparametric Methods Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Supervised Learning: Nonparametric
More informationNon-trivial extraction of implicit, previously unknown and potentially useful information from data
CS 795/895 Applied Visual Analytics Spring 2013 Data Mining Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs795-s13/ What is Data Mining? Many Definitions Non-trivial extraction of implicit, previously
More informationCLASSIFICATION JELENA JOVANOVIĆ. Web:
CLASSIFICATION JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is classification? Binary and multiclass classification Classification algorithms Naïve Bayes (NB) algorithm
More informationMachine Learning. Classification
10-701 Machine Learning Classification Inputs Inputs Inputs Where we are Density Estimator Probability Classifier Predict category Today Regressor Predict real no. Later Classification Assume we want to
More informationPart 12: Advanced Topics in Collaborative Filtering. Francesco Ricci
Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules
More informationText Categorization (I)
CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization
More informationData mining techniques for actuaries: an overview
Data mining techniques for actuaries: an overview Emiliano A. Valdez joint work with Banghee So and Guojun Gan University of Connecticut Advances in Predictive Analytics (APA) Conference University of
More informationAdditive Models, Trees, etc. Based in part on Chapter 9 of Hastie, Tibshirani, and Friedman David Madigan
Additive Models, Trees, etc. Based in part on Chapter 9 of Hastie, Tibshirani, and Friedman David Madigan Predictive Modeling Goal: learn a mapping: y = f(x;θ) Need: 1. A model structure 2. A score function
More informationData Mining Practical Machine Learning Tools and Techniques
Engineering the input and output Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Attribute selection z Scheme-independent, scheme-specific
More informationModel Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal
More informationData Mining Practical Machine Learning Tools and Techniques
Decision trees Extending previous approach: Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank to permit numeric s: straightforward
More informationExample of DT Apply Model Example Learn Model Hunt s Alg. Measures of Node Impurity DT Examples and Characteristics. Classification.
lassification-decision Trees, Slide 1/56 Classification Decision Trees Huiping Cao lassification-decision Trees, Slide 2/56 Examples of a Decision Tree Tid Refund Marital Status Taxable Income Cheat 1
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationStructure of Association Rule Classifiers: a Review
Structure of Association Rule Classifiers: a Review Koen Vanhoof Benoît Depaire Transportation Research Institute (IMOB), University Hasselt 3590 Diepenbeek, Belgium koen.vanhoof@uhasselt.be benoit.depaire@uhasselt.be
More informationInstance-Based Representations. k-nearest Neighbor. k-nearest Neighbor. k-nearest Neighbor. exemplars + distance measure. Challenges.
Instance-Based Representations exemplars + distance measure Challenges. algorithm: IB1 classify based on majority class of k nearest neighbors learned structure is not explicitly represented choosing k
More informationData mining: concepts and algorithms
Data mining: concepts and algorithms Practice Data mining Objective Exploit data mining algorithms to analyze a real dataset using the RapidMiner machine learning tool. The practice session is organized
More informationCS273 Midterm Exam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014
CS273 Midterm Eam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014 Your name: Your UCINetID (e.g., myname@uci.edu): Your seat (row and number): Total time is 80 minutes. READ THE
More informationEvaluation of different biological data and computational classification methods for use in protein interaction prediction.
Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman Protein 2006 Motivation Correctly
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationCSE 158. Web Mining and Recommender Systems. Midterm recap
CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158
More informationComparative analysis of classifier algorithm in data mining Aikjot Kaur Narula#, Dr.Raman Maini*
Comparative analysis of classifier algorithm in data mining Aikjot Kaur Narula#, Dr.Raman Maini* #Student, Department of Computer Engineering, Punjabi university Patiala, India, aikjotnarula@gmail.com
More informationMidterm Examination CS 540-2: Introduction to Artificial Intelligence
Midterm Examination CS 54-2: Introduction to Artificial Intelligence March 9, 217 LAST NAME: FIRST NAME: Problem Score Max Score 1 15 2 17 3 12 4 6 5 12 6 14 7 15 8 9 Total 1 1 of 1 Question 1. [15] State
More informationIEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde
IEE 520 Data Mining Project Report Shilpa Madhavan Shinde Contents I. Dataset Description... 3 II. Data Classification... 3 III. Class Imbalance... 5 IV. Classification after Sampling... 5 V. Final Model...
More informationMIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA
Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on
More informationChapter 4: Algorithms CS 795
Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that
More informationEnsemble Learning. Another approach is to leverage the algorithms we have via ensemble methods
Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning
More informationCPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Decision Tree Example Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short} Class: Country = {Gromland, Polvia} CS4375 --- Fall 2018 a
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationData Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification
Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More information