Data Mining and Knowledge Discovery Practice notes: Numeric prediction and descriptive DM

Data Mining and Knowledge Discovery
Knowledge Discovery and Knowledge Management in e-science
Petra Kralj Novak, Petra.Kralj.Novak@ijs.si

Practice plan
- Predictive data mining: decision trees, Naïve Bayes classifier, evaluating classifiers (separate test set, cross validation, confusion matrix, classification accuracy), predictive data mining in Weka
- Numeric prediction and descriptive data mining: numeric prediction, association rules, regression models and evaluation in Weka, descriptive data mining in Weka, discussion about seminars and the exam
- Written exam
- Seminar proposal presentations
- Deadline for data mining papers (written seminar)
- Data mining seminar presentations

Classification vs. numeric prediction
- Data: attribute-value description in both cases.
- Target variable: categorical (nominal) in classification, continuous in numeric prediction.
- Evaluation: cross validation or a separate test set in both cases; classification accuracy for classification; MSE, MAE, RMSE, ... for numeric prediction (defined in the sketch below).
- Algorithms: decision trees, Naïve Bayes, ... for classification; linear regression, regression trees, model trees, KNN, ... for numeric prediction.
- Baseline predictor: the majority class for classification, the mean of the target variable for numeric prediction.

Example
Test set: data about people, described by one input attribute and a continuous target variable.
[Figure: scatter plot of the target variable against the input attribute for the test examples]
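
The error measures and the baseline predictor mentioned above are easy to state in code. Below is a minimal sketch in Python/NumPy (the practice itself uses Weka); the target values and predictions are made up for illustration and are not the data from the example.

    import numpy as np

    # Made-up test-set target values and the predictions of some model on them.
    y_true = np.array([0.85, 1.20, 1.57, 1.70, 1.75])
    y_pred = np.array([0.90, 1.15, 1.60, 1.65, 1.80])

    # Baseline numeric predictor: always predict the mean of the target variable.
    baseline = np.full_like(y_true, y_true.mean())

    def mae(y, p):
        return np.mean(np.abs(y - p))      # mean absolute error

    def mse(y, p):
        return np.mean((y - p) ** 2)       # mean squared error

    def rmse(y, p):
        return np.sqrt(mse(y, p))          # root mean squared error

    for name, p in [("baseline", baseline), ("model", y_pred)]:
        print(f"{name}: MAE={mae(y_true, p):.3f} MSE={mse(y_true, p):.3f} RMSE={rmse(y_true, p):.3f}")

A model should beat the baseline on these measures before it is worth considering.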

Baseline numeric predictor
- Predicts the average of the target variable for every example.
- In this example, the average of the target variable is 1.63.
[Figure: test set with the constant average predictor drawn over it]

Linear regression
- Fits the target variable as a linear function of the input attribute.
[Figure: linear regression model and its predictions on the test set]

Regression tree
[Figure: regression tree and its predictions on the test set]
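
In the practice these models are built in Weka; a rough scikit-learn equivalent is sketched below. The "age and height" style data are invented for illustration, they are not the data set behind the figures, and the parameter values are arbitrary.

    import numpy as np
    from sklearn.dummy import DummyRegressor          # baseline: mean of the target
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor    # regression tree

    # Invented data: one numeric attribute ("age") and a continuous target ("height").
    rng = np.random.default_rng(0)
    age = rng.uniform(1, 80, size=(100, 1))
    height = np.clip(0.7 + 0.05 * age.ravel(), None, 1.8) + rng.normal(0, 0.05, 100)

    models = {
        "baseline": DummyRegressor(strategy="mean"),
        "linear regression": LinearRegression(),
        "regression tree": DecisionTreeRegressor(min_samples_leaf=10),
    }
    for name, model in models.items():
        model.fit(age, height)
        print(name, np.round(model.predict([[10], [40], [70]]), 2))

A model tree (a regression tree with linear models in the leaves, shown on the next slide) has no direct scikit-learn counterpart; in Weka it is available as the M5P scheme.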

Model tree
- Like a regression tree, but with a linear model in each leaf instead of a constant.
[Figure: model tree and its predictions on the test set]

KNN, K nearest neighbors
- Looks at the K closest examples (by the non-target attributes) and predicts the average of their target variable.
- In this example, K = 3.
[Figure: 3-NN predictions on the test set; table of test examples with their actual and predicted target values]
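
A minimal sketch of the K-nearest-neighbor prediction described above, with K = 3: the prediction is the average target value of the three closest training examples. The numbers are invented and the variable names are mine.

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Invented training data: one attribute, continuous target.
    X_train = np.array([[2.], [5.], [9.], [14.], [21.], [35.], [50.], [70.]])
    y_train = np.array([0.90, 1.10, 1.35, 1.60, 1.75, 1.72, 1.70, 1.65])

    knn = KNeighborsRegressor(n_neighbors=3)   # K = 3, as in the example
    knn.fit(X_train, y_train)
    print(knn.predict([[10.], [60.]]))

    # The same prediction written out by hand for one query point:
    query = 10.0
    nearest = np.argsort(np.abs(X_train.ravel() - query))[:3]   # 3 closest by the attribute
    print(y_train[nearest].mean())                              # average of their targets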

Which predictor is the best?
[Table: for each test example, the true target value and the predictions of the baseline, linear regression, regression tree, model tree and kNN models]
- The predictors can also be compared by their cross-validated errors; see the sketch below.

Evaluating numeric prediction
- Numeric predictors are evaluated with MSE, MAE, RMSE, ... on a separate test set or by cross validation.

Classification or a numeric prediction problem?
Target variable with five values: 1 non sufficient, 2 sufficient, 3 good, 4 very good, 5 excellent.
- Classification: the misclassification cost is the same whether "non sufficient" is classified as "sufficient" or as "very good".
- Numeric prediction: the error of predicting 2 when the true value is 1 is 1, while the error of predicting 5 instead of 1 is 4.
- If we have a variable with ordered values, it should be considered numeric.

Nominal or numeric attribute?
The same question arises for an attribute with five values: 1 non sufficient, 2 sufficient, 3 good, 4 very good, 5 excellent; it can be declared either nominal or numeric.
- If we have a variable with ordered values, it should be considered numeric.
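
One way to answer "which predictor is the best?" more systematically than by comparing individual predictions is cross-validated error, as in the sketch below (invented data, scikit-learn instead of Weka, arbitrary parameter settings).

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.dummy import DummyRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(1, 80, size=(200, 1))
    y = np.clip(0.7 + 0.05 * X.ravel(), None, 1.8) + rng.normal(0, 0.05, 200)

    models = {
        "baseline": DummyRegressor(strategy="mean"),
        "linear regression": LinearRegression(),
        "regression tree": DecisionTreeRegressor(min_samples_leaf=15),
        "kNN, K=3": KNeighborsRegressor(n_neighbors=3),
    }
    for name, model in models.items():
        # 10-fold cross-validated mean absolute error (lower is better)
        mae = -cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error").mean()
        print(f"{name:18s} MAE = {mae:.3f}")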

Can KNN be used for classification tasks?
- Yes. In numeric prediction tasks, the average of the target variable in the neighborhood is computed; in classification tasks, the distribution of the classes in the neighborhood is computed.

Similarities between KNN and Naïve Bayes
- Both are black box models, which do not give insight into the data.
- Both are "lazy classifiers": they do not build a model in the training phase, but need the data when predicting the value for a new example (partially true for Naïve Bayes).

Regression trees vs. decision trees
- Data: attribute-value description in both cases.
- Target variable: continuous for regression trees, categorical (nominal) for decision trees.
- Evaluation: cross validation or a separate test set in both cases; MSE, MAE, RMSE, ... for regression trees, accuracy for decision trees.
- Algorithm: top-down induction, a greedy (short-sighted) method, in both cases.
- Heuristic: standard deviation for regression trees (see the split sketch below), information gain for decision trees.
- Stopping criterion: standard deviation below a threshold for regression trees, pure leaves (entropy = 0) for decision trees.
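
The split heuristic named above for regression trees can be made concrete: try each candidate split point, keep the one that most reduces the weighted standard deviation of the target, and stop splitting a node once its standard deviation falls below a threshold. The sketch below uses invented data and my own function names.

    import numpy as np

    def best_split(x, y):
        """Split point on attribute x minimizing the weighted std. dev. of target y."""
        best_t, best_score = None, np.std(y)       # "no split" is the starting point
        for t in np.unique(x)[:-1]:                # candidate split points
            left, right = y[x <= t], y[x > t]
            score = (len(left) * np.std(left) + len(right) * np.std(right)) / len(y)
            if score < best_score:
                best_t, best_score = t, score
        return best_t, best_score

    x = np.array([2., 5., 9., 14., 21., 35., 50., 70.])             # input attribute
    y = np.array([0.90, 1.10, 1.35, 1.60, 1.75, 1.72, 1.70, 1.65])  # continuous target

    t, score = best_split(x, y)
    print("std before split:", round(np.std(y), 3))
    print("best split at x <=", t, "weighted std after split:", round(score, 3))
    # Stopping criterion: do not split a node whose standard deviation
    # is already below a chosen threshold.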

Association rules
- Rules: X → Y, where X and Y are conjunctions of items.
- Task: find all association rules that satisfy the minimum support and minimum confidence constraints.
- Support: Sup(X → Y) = #XY / #D ≈ p(XY)
- Confidence: Conf(X → Y) = #XY / #X ≈ p(XY) / p(X) = p(Y|X)
(#XY is the number of transactions containing both X and Y, #X the number containing X, and #D the number of all transactions in the database.)

Association rules - algorithm
1. Generate frequent itemsets with a minimum support constraint.
2. Generate rules from the frequent itemsets with a minimum confidence constraint.
The data are in a transaction database.

Association rules - transaction database
Items: A = apple, B = banana, C = coca-cola, D = doughnut
- Client 1 bought: A, B, C, D
- Client 2 bought: B, C
- Client 3 bought: B, D
- Client 4 bought: A, C
- Client 5 bought: A, B, D
- Client 6 bought: A, B, C

Frequent itemsets
Task: generate the frequent itemsets with support at least 2/6.

Frequent itemsets - algorithm
Items in an itemset should be sorted alphabetically.
1. Generate all 1-itemsets with the given minimum support.
2. Use the 1-itemsets to generate 2-itemsets with the given minimum support.
3. From the 2-itemsets generate 3-itemsets with the given minimum support as unions of 2-itemsets with the same item at the beginning.
4. In general, from n-itemsets generate (n+1)-itemsets as unions of n-itemsets with the same (n-1) items at the beginning.
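
A compact sketch of the level-wise procedure just described, in plain Python, run on the example transaction database with minimum support 2/6. The slide only describes the steps, so the code organization and names are mine.

    from itertools import combinations

    # Example transaction database (A=apple, B=banana, C=coca-cola, D=doughnut).
    transactions = [
        {"A", "B", "C", "D"},   # client 1
        {"B", "C"},             # client 2
        {"B", "D"},             # client 3
        {"A", "C"},             # client 4
        {"A", "B", "D"},        # client 5
        {"A", "B", "C"},        # client 6
    ]
    min_count = 2               # minimum support 2/6

    def count(itemset):
        """Number of transactions containing all items of the itemset."""
        return sum(itemset <= t for t in transactions)

    # 1-itemsets with the given minimum support (items kept sorted alphabetically).
    items = sorted({i for t in transactions for i in t})
    frequent = [(i,) for i in items if count({i}) >= min_count]
    all_frequent = list(frequent)

    # From n-itemsets generate (n+1)-itemsets as unions of n-itemsets that share
    # the same first (n-1) items, then keep only the candidates with enough support.
    while frequent:
        candidates = {tuple(sorted(set(a) | set(b)))
                      for a, b in combinations(frequent, 2) if a[:-1] == b[:-1]}
        frequent = sorted(c for c in candidates if count(set(c)) >= min_count)
        all_frequent.extend(frequent)

    for itemset in all_frequent:
        print("&".join(itemset), f"{count(set(itemset))}/{len(transactions)}")

Running it reproduces the frequent itemsets listed on the next slide: all four single items, the pairs A&B, A&C, A&D, B&C and B&D, and the triples A&B&C and A&B&D.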

Frequent itemsets - lattice
Frequent itemsets: A&B, A&C, A&D, B&C, B&D, A&B&C, A&B&D
[Figure: itemset lattice with the support count of each itemset]

Rules from itemsets
A&B is a frequent itemset with support 3/6. Two possible rules:
- A → B, confidence = #(A&B) / #A = 3/4
- B → A, confidence = #(A&B) / #B = 3/5
All the counts are in the itemset lattice!

Quality of association rules
- Support(X) = #X / #D ≈ p(X)
- Support(X → Y) = Support(XY) = #XY / #D ≈ p(XY)
- Confidence(X → Y) = #XY / #X ≈ p(Y|X)
- Lift(X → Y) = Support(X → Y) / (Support(X) * Support(Y)): how many more times the items in X and Y occur together than would be expected if the itemsets were statistically independent.
- Leverage(X → Y) = Support(X → Y) - Support(X) * Support(Y): similar to lift, but a difference instead of a ratio.
- Conviction(X → Y) = (1 - Support(Y)) / (1 - Confidence(X → Y)): the degree of implication of a rule; sensitive to rule direction.
These measures are computed for the rules A → B and B → A in the sketch below.

Discussion
- Transformation of an attribute-value dataset to a transaction dataset.
- What would be the association rules for a dataset with two items A and B, each of them with support 80% and appearing in the same transactions as rarely as possible?
  - with minsupport = 50% and minconfidence = 70%
  - with minsupport = ...% and minconfidence = 70%
- What if we had 4 items: A1, A2, B1, B2?
- Compare decision trees and association rules regarding the handling of an attribute like PersonID. What about attributes that have many values (e.g. month of year)?
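
The quality measures defined above can be checked on the two rules derived from the itemset A&B. A small sketch in plain Python; the helper names are mine.

    transactions = [
        {"A", "B", "C", "D"}, {"B", "C"}, {"B", "D"},
        {"A", "C"}, {"A", "B", "D"}, {"A", "B", "C"},
    ]
    N = len(transactions)

    def sup(itemset):
        """Relative support: the fraction of transactions containing the itemset."""
        return sum(itemset <= t for t in transactions) / N

    def conf(X, Y):
        return sup(X | Y) / sup(X)

    def lift(X, Y):
        return sup(X | Y) / (sup(X) * sup(Y))

    def leverage(X, Y):
        return sup(X | Y) - sup(X) * sup(Y)

    def conviction(X, Y):
        return (1 - sup(Y)) / (1 - conf(X, Y))

    for X, Y in [({"A"}, {"B"}), ({"B"}, {"A"})]:
        print(f"{''.join(X)} -> {''.join(Y)}:",
              f"sup={sup(X | Y):.2f}",
              f"conf={conf(X, Y):.2f}",
              f"lift={lift(X, Y):.2f}",
              f"leverage={leverage(X, Y):.2f}",
              f"conviction={conviction(X, Y):.2f}")

The confidences come out as 3/4 and 3/5, matching the computation above; the lift of 0.9 for both rules says that A and B actually occur together slightly less often than expected under independence, which also shows up as a small negative leverage.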