# Data Mining and Knowledge Discovery Practice notes Numeric prediction and descriptive DM

Size: px
Start display at page:

Download "Data Mining and Knowledge Discovery Practice notes Numeric prediction and descriptive DM"

## Transcription

1 Practice notes 4..9 Practice plan Data Mining and Knowledge Discovery Knowledge Discovery and Knowledge Management in e-science Petra Kralj Novak Practice, 9//4 9//: Predictive data mining Decision trees Naïve Bayes classifier Evaluating classifiers (separate test set, cross validation, confusion matrix, classification accuracy) Predictive data mining in Weka 9//4: Numeric prediction and descriptive data mining Numeric prediction Association rules Regression models and evaluation in Weka Descriptive data mining in Weka Discussion about seminars and exam 9//8: Written exam //6: Seminar proposal presentations 9/3/: deadline for data mining papers (written seminar) 9/3/3: Data mining seminar presentations Numeric prediction Data: attribute-value description Classification Numeric prediction Baseline, Linear Regression, Regression tree, Model Tree, KNN 3 Continuous Evaluation: cross validation, separate test set, MSE, MAE, RMSE, Algorithms: Linear regression, regression trees, Baseline predictor: Mean of the target variable Categorical (nominal) -accuracy Algorithms: Decision trees, Naïve Bayes, Baseline predictor: Majority class 4 Example Test set data about 8 people: and

2 Practice notes 4..9 Baseline numeric predictor Average of the target variable Average predictor 7 Baseline predictor: prediction Average of the target variable is.63 8 Linear Regression Model =.56 * Linear Regression: prediction =.56 * Prediction Regression tree Regression tree: prediction.5.5 Prediction 5

3 Practice notes 4..9 Model tree Model tree: prediction.5.5 Prediction KNN K nearest neighbors Looks at K closest examples (by non-target attributes) and predicts the average of their target variable In this example, K= Prediction KNN, n=

4 Practice notes Which predictor is the best? Baseline Linear regression Regressi on tree Model tree knn Evaluating numeric prediction. non sufficient. sufficient Classification or a numeric prediction problem? Target variable with five.non sufficient.sufficient 3.good 4.very good 5.excellent Classification: the misclassification cost is the same if non sufficient is classified as sufficient or if it is classified as very good Numeric prediction: The error of predicting when it should be is, while the error of predicting 5 instead of is 4. If we have a variable with ordered values, it should be considered numeric. 3 Nominal or numeric attribute? A variable with five.non sufficient.sufficient 3.good 4.very good 5.excellent Nominal: Numeric: If we have a variable with ordered values, it should be considered numeric. 4 4

5 Practice notes non sufficient. sufficient 5 Can KNN be used for classification tasks? YES. In numeric prediction tasks, the average of the neighborhood is computed In classification tasks, the distribution of the classes in the neighborhood is computed 6. non sufficient. sufficient 7 Similarities between KNN and Naïve Bayes. Both are black box models, which do not give the insight into the data. Both are lazy classifiers : they do not build a model in the training phase and use it for predicting, but they need the data when predicting the value for a new example (partially true for Naïve Bayes) 8 Regression trees Data: attribute-value description Decision trees. non sufficient. sufficient 9 Continuous MSE, MAE, RMSE, Algorithm: Top down induction, shortsighted method Heuristic: Standard deviation Stopping criterion: Standard deviation< threshold Categorical (nominal) Evaluation: cross validation, separate test set, -accuracy Heuristic : Information gain Stopping criterion: Pure leafs (entropy=) 3 5

6 Practice notes 4..9 Association rules Association Rules Rules X Y, X, Y conjunction of items Task: Find all association rules that satisfy minimum support and minimum confidence constraints - Support: Sup(X Y) = #XY/#D p(xy) - Confidence: Conf(X Y) = #XY/#X p(xy)/p(x) = p(y X) 3 3 Association rules - algorithm. generate frequent itemsets with a minimum support constraint. generate rules from frequent itemsets with a minimum confidence constraint * Data are in a transaction database Association rules transaction database Items: A=apple, B=banana, C=coca-cola, D=doughnut Client bought: A, B, C, D Client bought: B, C Client 3 bought: B, D Client 4 bought: A, C Client 5 bought: A, B, D Client 6 bought: A, B, C Frequent itemsets Frequent itemsets algorithm Generate frequent itemsets with support at least /6 35 Items in an itemset should be sorted alphabetically. Generate all -itemsets with the given minimum support. Use -itemsets to generate -itemsets with the given minimum support. From -itemsets generate 3-itemsets with the given minimum support as unions of -itemsets with the same item at the beginning. From n-itemsets generate (n+)-itemsets as unions of n-itemsets with the same (n- ) items at the beginning. 36 6

7 Practice notes 4..9 Frequent itemsets lattice Rules from itemsets Frequent itemsets: A&B, A&C, A&D, B&C, B&D A&B&C, A&B&D 37 A&B is a frequent itemset with support 3/6 Two possible rules A B confidence = #(A&B)/#A = 3/4 B A confidence = #(A&B)/#B = 3/5 All the counts are in the itemset lattice! 38 Quality of association rules Quality of association rules Support(X) = #X / #D. P(X) Support(X Y) = Support (XY) = #XY / #D P(XY) Confidence(X Y) = #XY / #X P(Y X) Lift(X Y) = Support(X Y) / (Support (X)*Support(Y)) Leverage(X Y) = Support(X Y) Support(X)*Support(Y) Conviction(X Y) = -Support(Y)/(-Confidence(X Y)) Support(X) = #X / #D. P(X) Support(X Y) = Support (XY) = #XY / #D P(XY) Confidence(X Y) = #XY / #X P(Y X) Lift(X Y) = Support(X Y) / (Support (X)*Support(Y)) How many more times the items in X and Y occur together then it would be expected if the itemsets were statistically independent. Leverage(X Y) = Support(X Y) Support(X)*Support(Y) Similar to lift, difference instead of ratio. Conviction(X Y) = -Support(Y)/(-Confidence(X Y)) Degree of implication of a rule. Sensitive to rule direction Discussion Transformation of an attribute-value dataset to a transaction dataset. What would be the association rules for a dataset with two items A and B, each of them with support 8% and appearing in the same transactions as rarely as possible? minsupport = 5%, min conf = 7% minsupport = %, min conf = 7% What if we had 4 items: A, A, B, B Compare decision trees and association rules regarding handling an attribute like PersonID. What about attributes that have many values (eg. Month of year) 4 7

### Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate

### Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

### Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 06/0/ Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

### Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

### Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

### Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/11/16 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

### Data Mining and Knowledge Discovery Practice notes 2

Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

### Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

### Supervised and Unsupervised Learning (II)

Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised

### Association Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model.

Comp 135 Machine Learning Computer Science Tufts University Fall 2017 Roni Khardon Unsupervised learning but complementary to data exploration in clustering. The goal is to find weak implications in the

### Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the

### Data Mining: STATISTICA

Outline Data Mining: STATISTICA Prepare the data Classification and regression (C & R, ANN) Clustering Association rules Graphic user interface Prepare the Data Statistica can read from Excel,.txt and

### 2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

### Tutorial on Association Rule Mining

Tutorial on Association Rule Mining Yang Yang yang.yang@itee.uq.edu.au DKE Group, 78-625 August 13, 2010 Outline 1 Quick Review 2 Apriori Algorithm 3 FP-Growth Algorithm 4 Mining Flickr and Tag Recommendation

### Outline. Prepare the data Classification and regression Clustering Association rules Graphic user interface

Data Mining: i STATISTICA Outline Prepare the data Classification and regression Clustering Association rules Graphic user interface 1 Prepare the Data Statistica can read from Excel,.txt and many other

### Data mining techniques for actuaries: an overview

Data mining techniques for actuaries: an overview Emiliano A. Valdez joint work with Banghee So and Guojun Gan University of Connecticut Advances in Predictive Analytics (APA) Conference University of

### Machine Learning. Classification

10-701 Machine Learning Classification Inputs Inputs Inputs Where we are Density Estimator Probability Classifier Predict category Today Regressor Predict real no. Later Classification Assume we want to

### Classification. Instructor: Wei Ding

Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute

### Frequent Pattern Mining

Frequent Pattern Mining How Many Words Is a Picture Worth? E. Aiden and J-B Michel: Uncharted. Reverhead Books, 2013 Jian Pei: CMPT 741/459 Frequent Pattern Mining (1) 2 Burnt or Burned? E. Aiden and J-B

### Machine Learning. Decision Trees. Manfred Huber

Machine Learning Decision Trees Manfred Huber 2015 1 Decision Trees Classifiers covered so far have been Non-parametric (KNN) Probabilistic with independence (Naïve Bayes) Linear in features (Logistic

### Part I. Instructor: Wei Ding

Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set

### CSE4334/5334 DATA MINING

CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy

### IEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde

IEE 520 Data Mining Project Report Shilpa Madhavan Shinde Contents I. Dataset Description... 3 II. Data Classification... 3 III. Class Imbalance... 5 IV. Classification after Sampling... 5 V. Final Model...

### Data mining: concepts and algorithms

Data mining: concepts and algorithms Practice Data mining Objective Exploit data mining algorithms to analyze a real dataset using the RapidMiner machine learning tool. The practice session is organized

### CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

### Data warehouse and Data Mining

Data warehouse and Data Mining Lecture No. 14 Data Mining and its techniques Naeem A. Mahoto Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

### International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

### Wild Mushrooms Classification Edible or Poisonous

Wild Mushrooms Classification Edible or Poisonous Yulin Shen ECE 539 Project Report Professor: Hu Yu Hen 2013 Fall ( I allow code to be released in the public domain) pg. 1 Contents Introduction. 3 Practicality

### Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule

### Supervised Learning Classification Algorithms Comparison

Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------

### Association Rule Learning

Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016 COMP9417 ML & DM (CSE, UNSW) Association

### I211: Information infrastructure II

Data Mining: Classifier Evaluation I211: Information infrastructure II 3-nearest neighbor labeled data find class labels for the 4 data points 1 0 0 6 0 0 0 5 17 1.7 1 1 4 1 7.1 1 1 1 0.4 1 2 1 3.0 0 0.1

### Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10.

Code No: 126VW Set No. 1 JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD B.Tech. III Year, II Sem., II Mid-Term Examinations, April-2018 DATA WAREHOUSING AND DATA MINING Objective Exam Name: Hall Ticket

### Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners

Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager

### Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules

### CS273 Midterm Exam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014

CS273 Midterm Eam Introduction to Machine Learning: Winter 2015 Tuesday February 10th, 2014 Your name: Your UCINetID (e.g., myname@uci.edu): Your seat (row and number): Total time is 80 minutes. READ THE

### Chapter 4: Mining Frequent Patterns, Associations and Correlations

Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent

### 10/5/2017 MIST.6060 Business Intelligence and Data Mining 1. Nearest Neighbors. In a p-dimensional space, the Euclidean distance between two records,

10/5/2017 MIST.6060 Business Intelligence and Data Mining 1 Distance Measures Nearest Neighbors In a p-dimensional space, the Euclidean distance between two records, a = a, a,..., a ) and b = b, b,...,

### Classification Algorithms in Data Mining

August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

### Data Set. What is Data Mining? Data Mining (Big Data Analytics) Illustrative Applications. What is Knowledge Discovery?

Data Mining (Big Data Analytics) Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University of Iowa Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://user.engineering.uiowa.edu/~ankusiak/

### SCHEME OF COURSE WORK. Data Warehousing and Data mining

SCHEME OF COURSE WORK Course Details: Course Title Course Code Program: Specialization: Semester Prerequisites Department of Information Technology Data Warehousing and Data mining : 15CT1132 : B.TECH

### Lecture 25: Review I

Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

### Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Frequent Pattern Mining Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Item sets A New Type of Data Some notation: All possible items: Database: T is a bag of transactions Transaction transaction

### Association Pattern Mining. Lijun Zhang

Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

### Probabilistic Classifiers DWML, /27

Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium

### SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

### Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Global Journal of Computer Science and Technology Volume 12 Issue 10 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172

### CS570 Introduction to Data Mining

CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

### Frequent Pattern Mining

Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent

### Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

### Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged.

Frequent itemset Association&decision rule mining University of Szeged What frequent itemsets could be used for? Features/observations frequently co-occurring in some database can gain us useful insights

### Data Preprocessing. Supervised Learning

Supervised Learning Regression Given the value of an input X, the output Y belongs to the set of real values R. The goal is to predict output accurately for a new input. The predictions or outputs y are

### Fundamental Data Mining Algorithms

2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html REVIEW What is Data

### Performance and Scalability: Apriori Implementa6on

Performance and Scalability: Apriori Implementa6on Apriori R. Agrawal and R. Srikant. Fast algorithms for mining associa6on rules. VLDB, 487 499, 1994 Reducing Number of Comparisons Candidate coun6ng:

### 1) Give decision trees to represent the following Boolean functions:

1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following

### Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

Business Intelligence Professor Chen NAME: Due Date: Tutorial for Performing Market Basket Analysis (with ItemCount) 1. To perform a Market Basket Analysis, we will begin by selecting Open Template from

### Data Mining Classification - Part 1 -

Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model

### Knowledge Discovery in Databases

Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Lecture notes Knowledge Discovery in Databases Summer Semester 2012 Lecture 3: Frequent Itemsets

### CHAPTER 6 EXPERIMENTS

CHAPTER 6 EXPERIMENTS 6.1 HYPOTHESIS On the basis of the trend as depicted by the data Mining Technique, it is possible to draw conclusions about the Business organization and commercial Software industry.

### Data Mining: Classifier Evaluation. CSCI-B490 Seminar in Computer Science (Data Mining)

Data Mining: Classifier Evaluation CSCI-B490 Seminar in Computer Science (Data Mining) Predictor Evaluation 1. Question: how good is our algorithm? how will we estimate its performance? 2. Question: what

### MS1b Statistical Data Mining Part 3: Supervised Learning Nonparametric Methods

MS1b Statistical Data Mining Part 3: Supervised Learning Nonparametric Methods Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Supervised Learning: Nonparametric

### Naïve Bayes for text classification

Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

### Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

### Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

### Network Traffic Measurements and Analysis

DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

### Association Rule Mining

Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}

### Text Categorization (I)

CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization

### What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT

### Production rule is an important element in the expert system. By interview with

2 Literature review Production rule is an important element in the expert system By interview with the domain experts, we can induce the rules and store them in a truth maintenance system An assumption-based

### Classification Algorithms for Determining Handwritten Digit

Classification Algorithms for Determining Handwritten Digit Hayder Naser Khraibet AL-Behadili Computer Science Department, Shatt Al-Arab University College, Basrah, Iraq haider_872004 @yahoo.com Abstract:

### Performance Evaluation of Various Classification Algorithms

Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------

### CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM 4.1 Introduction Nowadays money investment in stock market gains major attention because of its dynamic nature. So the

### Interpretation and evaluation

Interpretation and evaluation 1. Descriptive tasks Evaluation based on novelty, interestingness, usefulness and understandability Qualitative evaluation: obvious (common sense) knowledge knowledge that

### Weka ( )

Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

### Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the

### 10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

### CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof. Ruiz Problem

### American University in Cairo Date: Winter CSCE664 Advanced Data Mining Final Exam

American University in Cairo Date: Winter 2012 Computer Science Department Time : 3 hours CSCE664 Advanced Data Mining Final Exam Name: ID: Question Maximum Grade 1 20 2 15 3 15 4 20 5 15 6 15 7 15 8 15

### 6.034 Quiz 2, Spring 2005

6.034 Quiz 2, Spring 2005 Open Book, Open Notes Name: Problem 1 (13 pts) 2 (8 pts) 3 (7 pts) 4 (9 pts) 5 (8 pts) 6 (16 pts) 7 (15 pts) 8 (12 pts) 9 (12 pts) Total (100 pts) Score 1 1 Decision Trees (13

### Chapter 4: Algorithms CS 795

Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that

### Classification: Basic Concepts, Decision Trees, and Model Evaluation

Classification: Basic Concepts, Decision Trees, and Model Evaluation Data Warehousing and Mining Lecture 4 by Hossen Asiful Mustafa Classification: Definition Given a collection of records (training set

### Lab Exercise Two Mining Association Rule with WEKA Explorer

Lab Exercise Two Mining Association Rule with WEKA Explorer 1. Fire up WEKA to get the GUI Chooser panel. Select Explorer from the four choices on the right side. 2. To get a feel for how to apply Apriori,

### Chapter 12 Feature Selection

Chapter 12 Feature Selection Xiaogang Su Department of Statistics University of Central Florida - 1 - Outline Why Feature Selection? Categorization of Feature Selection Methods Filter Methods Wrapper Methods

### Effectiveness of Freq Pat Mining

Effectiveness of Freq Pat Mining Too many patterns! A pattern a 1 a 2 a n contains 2 n -1 subpatterns Understanding many patterns is difficult or even impossible for human users Non-focused mining A manager

### A Conflict-Based Confidence Measure for Associative Classification

A Conflict-Based Confidence Measure for Associative Classification Peerapon Vateekul and Mei-Ling Shyu Department of Electrical and Computer Engineering University of Miami Coral Gables, FL 33124, USA

### Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Association rules Marco Saerens (UCL), with Christine Decaestecker (ULB) 1 Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004),

### Data Mining Concepts

Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

### Seminars of Software and Services for the Information Society

DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI Master of Science in Engineering in Computer Science (MSE-CS) Seminars in Software and Services for the Information Society

### Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

### Jarek Szlichta

Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns

### CHAPTER 8. ITEMSET MINING 226

CHAPTER 8. ITEMSET MINING 226 Chapter 8 Itemset Mining In many applications one is interested in how often two or more objectsofinterest co-occur. For example, consider a popular web site, which logs all

### INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 043 INFORMATION TECHNOLOGY DEFINITIONS AND TERMINOLOGY Course Name : DATA WAREHOUSING AND DATA MINING Course Code : AIT006 Program

### The Data Mining Application Based on WEKA: Geographical Original of Music

Management Science and Engineering Vol. 10, No. 4, 2016, pp. 36-46 DOI:10.3968/8997 ISSN 1913-0341 [Print] ISSN 1913-035X [Online] www.cscanada.net www.cscanada.org The Data Mining Application Based on

### Study on Classifiers using Genetic Algorithm and Class based Rules Generation

2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

### Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

### Nearest neighbor classification DSE 220

Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000

### A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis

A Critical Study of Selected Classification s for Liver Disease Diagnosis Shapla Rani Ghosh 1, Sajjad Waheed (PhD) 2 1 MSc student (ICT), 2 Associate Professor (ICT) 1,2 Department of Information and Communication