A Systematic Overview of Data Mining Algorithms

Sargur Srihari, University at Buffalo, The State University of New York

Data Mining Algorithm

A well-defined procedure that takes data as input and produces models or patterns as output.
- Well-defined: precisely encoded as a finite set of rules.
- Algorithm: terminates after a finite number of steps and produces an output. For example, gradient descent is a computational method; for it to be an algorithm we must specify where to begin, how to calculate the direction of descent, and when to terminate the search.
- Model structure: a global summary of the data set, e.g., Y = aX + c, where Y and X are variables and a, c are parameters extracted from the data.
- Pattern structure: statements about restricted regions of the space, e.g., "if X > x_1 then prob(Y > y_1) = p_1".

Components of a Data Mining Algorithm

1. Task: e.g., visualization, classification, clustering, regression, etc.
2. Structure: functional form of the model or pattern, e.g., linear regression, hierarchical clustering.
3. Score function: to judge the quality of a fitted model or pattern, e.g., generalization performance on unseen data.
4. Search or optimization method: e.g., steepest descent.
5. Data management technique: storing, indexing, and retrieving data. ML algorithms typically do not specify this; massive data sets require it.

Components of 3 Well-Known Data Mining Algorithms

Component           | CART (model)                  | Backpropagation (parameter est.) | A Priori
--------------------|-------------------------------|----------------------------------|---------------------------
Task                | Classification and Regression | Classification and Regression    | Rule Pattern Discovery
Structure           | Decision Tree                 | Neural Network                   | Association Rules
Score Function      | Cross-validated Loss Function | Squared Error                    | Support/Accuracy
Search Method       | Greedy Search over Structures | Gradient Descent on Parameters   | Breadth-First with Pruning
Data Mgmt Technique | Unspecified                   | Unspecified                      | Linear Scans

CART Algorithm

Task: Classification and Regression Trees (CART) is a widely used statistical procedure that produces classification and regression models with a tree-based structure. Only classification is considered here: mapping an input vector x to a categorical label y.

Example Task: Wine Classification

Two-dimensional wine data (alcohol content (%) vs. color intensity); the goal is to classify samples into 3 different wine types (cultivars). The data originate from a 13-dimensional data set, each variable measuring a different wine characteristic.

[Figures: scatterplot of the data with the three classes marked o, x, and *; and the corresponding classification tree, with threshold tests shown beside the branches. Uncertainty about the class label at a leaf node is labelled "?".]

CART 5-tuple

1. Task = prediction (classification)
2. Model Structure = tree
3. Score Function = cross-validated loss function
4. Search Method = greedy local search
5. Data Management Method = unspecified

The tree is a hierarchy of univariate binary decisions: each internal node specifies a binary test on a single variable, using thresholds on real- and integer-valued variables. Any of several splitting criteria can be used to choose the best variable for splitting the data. A traversal sketch follows.
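To make the hierarchy of binary threshold tests concrete, here is a minimal prediction sketch; the tuple representation of internal nodes is a hypothetical choice, not something the slides specify.

```python
def predict(node, x):
    """Walk a hierarchy of univariate binary decisions: internal nodes
    are (variable index, threshold, left subtree, right subtree) tuples,
    leaves are class labels."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

# A two-leaf tree: "if x[0] <= 12.8 predict class 0, else class 1".
tree = (0, 12.8, 0, 1)
print(predict(tree, [13.2]))  # -> 1
```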

Score Function of CART

Quality of a tree structure:

$$\sum_{i=1}^{n} C\big(y(i), \hat{y}(i)\big)$$

where C(y(i), ŷ(i)) is the loss incurred when the class label of the i-th data vector, y(i), is predicted by the tree to be ŷ(i). The loss C is specified by an m x m matrix, where m is the number of classes.
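A minimal sketch of this score function, assuming integer class labels 0..m-1 and a hypothetical cost matrix (the slide specifies only its m x m form):

```python
import numpy as np

def tree_loss(y_true, y_pred, cost):
    """Sum of C(y(i), y_hat(i)) over the data; cost[j, k] is the loss
    of predicting class k when the true class is j."""
    return sum(cost[t, p] for t, p in zip(y_true, y_pred))

cost = 1 - np.eye(3, dtype=int)                     # 0/1 loss, m = 3 classes
print(tree_loss([0, 1, 2, 2], [0, 2, 2, 2], cost))  # -> 1
```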

CART Search

Greedy local search is used to identify candidate structures: the algorithm recursively expands the tree from the root node, then prunes back specific branches of the large tree. Greedy local search is the most common method for practical tree learning.
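A minimal sketch of this greedy recursive expansion, reusing the hypothetical node tuples from the earlier predict sketch. It uses Gini impurity as one of the "several splitting criteria" mentioned above, and grows to a fixed depth rather than implementing CART's prune-back step.

```python
import numpy as np

def gini(y):
    """Gini impurity of a set of integer class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedy step: the single (variable, threshold) test that most
    reduces the weighted impurity of the two children."""
    best = (None, None, gini(y))
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            score = (left.mean() * gini(y[left])
                     + (~left).mean() * gini(y[~left]))
            if score < best[2]:
                best = (j, t, score)
    return best

def grow(X, y, depth=0, max_depth=3):
    """Recursively expand from the root node; no pruning in this sketch."""
    j, t, _ = best_split(X, y)
    if j is None or depth == max_depth or len(np.unique(y)) == 1:
        return np.bincount(y).argmax()          # leaf: majority class
    left = X[:, j] <= t
    return (j, t,
            grow(X[left], y[left], depth + 1, max_depth),
            grow(X[~left], y[~left], depth + 1, max_depth))
```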

Result of CART

Representational power is coarse: decision regions are constrained to be hyper-rectangles with boundaries parallel to the input variable axes.

[Figure: decision boundaries of the classification tree (alcohol content (%) vs. color intensity) superposed on the data; note the axis-parallel nature of the boundaries.]

CART Scoring/Stopping Criterion

Cross-validation is used to estimate misclassification:
- Partition the sample into training and validation sets.
- Estimate misclassification on the validation set.
- Repeat with different partitions and average the results for each tree size.

[Figure: estimated misclassification vs. tree complexity (number of leaves in the tree), illustrating overfitting as trees grow.]
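A short sketch of this procedure using scikit-learn (my choice; the slides name no library), whose bundled wine data appears to be the same 13-variable, 3-cultivar set: 5-fold cross-validated misclassification as a function of tree size.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
for leaves in (2, 4, 8, 16, 32):                 # tree complexity
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0)
    err = 1 - cross_val_score(tree, X, y, cv=5).mean()
    print(f"{leaves:2d} leaves: estimated misclassification {err:.3f}")
```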

CART Data Management

Assumes that all the data fit in main memory. For tree algorithms, data management is non-trivial, since the algorithm recursively partitions the data set and must repeatedly find different subsets of observations in the database. A naïve implementation involves repeated scans of a secondary storage medium, leading to poor time performance.

Reductionist Viewpoint of Data Mining Algorithms

A data mining algorithm is a tuple: {model structure, score function, search method, data management technique}. Combining different model structures with different score functions, etc., yields a potentially infinite number of different algorithms.

Reductionist Viewpoint Applied to 3 Algorithms

1. Multilayer Perceptron (MLP) for regression and classification
2. A Priori algorithm for association rule learning
3. Vector space algorithms for text retrieval

Multilayer Perceptron (MLP)

An artificial neural network: a non-linear mapping from a real-valued input vector x to a real-valued output vector y. An MLP can therefore be used as a non-linear model for regression as well as for classification.

MLP Formulas

From the first layer of weights:

$$s_1 = \sum_{i=1}^{4} \alpha_i x_i, \qquad s_2 = \sum_{i=1}^{4} \beta_i x_i$$

Non-linear transformation at the hidden nodes:

$$h_1 = h(s_1) = \frac{1}{1 + e^{-s_1}}, \qquad h_2 = h(s_2) = \frac{1}{1 + e^{-s_2}}$$

Output value:

$$y = \sum_{i=1}^{2} w_i h_i$$

(Multilayer perceptron with two hidden nodes, d_1 = 2, and one output node, d_2 = 1.)
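A direct transcription of these formulas into NumPy; the 4-2-1 shape matches the slide, while the random weights are just for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def mlp_forward(x, alpha, beta, w):
    """Forward pass of the slide's 4-2-1 MLP:
    s_1 = alpha . x, s_2 = beta . x; h_i = sigmoid(s_i); y = w . h."""
    h = sigmoid(np.array([alpha @ x, beta @ x]))
    return w @ h

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # 4 input values
alpha, beta = rng.normal(size=(2, 4))  # first-layer weights
w = rng.normal(size=2)                 # second-layer weights
print(mlp_forward(x, alpha, beta, w))
```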

MLP in Matrix Notation

$$\mathbf{h} = h(\mathbf{x}\,W_1), \qquad \mathbf{y} = f(\mathbf{h}\,W_2)$$

where x is the 1 x p row vector of input values, W_1 is the p x d_1 first-layer weight matrix, h is the 1 x d_1 vector of hidden-node outputs, W_2 is the d_1 x d_2 second-layer weight matrix, and y is the 1 x d_2 vector of output values (here d_1 = 2 and d_2 = 1, matching the multilayer perceptron with two hidden nodes and one output node).
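The same computation in matrix form, as a shape-checked sketch; taking f to be the identity is an assumption, though it is consistent with the linear output sum on the previous slide.

```python
import numpy as np

rng = np.random.default_rng(2)
p, d1, d2 = 4, 2, 1                 # input, hidden, output dimensions
x = rng.normal(size=(1, p))         # 1 x p input row vector
W1 = rng.normal(size=(p, d1))       # p x d1 first-layer weight matrix
W2 = rng.normal(size=(d1, d2))      # d1 x d2 second-layer weight matrix

h = 1 / (1 + np.exp(-(x @ W1)))     # 1 x d1 hidden-node outputs
y = h @ W2                          # f = identity here: 1 x d2 output
assert h.shape == (1, d1) and y.shape == (1, d2)
```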

MLP Result on Wine Data

Highly non-linear decision boundaries. Unlike CART, there is no simple summary form that describes the workings of the neural network model.

[Figure: type of decision boundaries produced by a neural network on the wine data (alcohol content (%) vs. color intensity).]

MLP Algorithm-Tuple

1. Task = prediction: classification or regression
2. Structure = layers of non-linear transformations of weighted sums of the inputs
3. Score Function = sum of squared errors
4. Search Method = steepest descent from random initial parameter values
5. Data Management Technique = online or batch

MLP Score, Search, Data Management

Score function (sum of squared errors):

$$S_{\text{SSE}} = \sum_{i=1}^{n} \big(y(i) - \hat{y}(i)\big)^2$$

where y(i) is the true target value and ŷ(i) is the output of the network.

Search: a highly non-linear multivariate optimization problem. Backpropagation uses steepest descent to reach a local minimum.

Data management: online (update one data point at a time) or batch mode (update after seeing all data points).
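A minimal sketch of one online steepest-descent (backpropagation) update for the two-hidden-node network above, with gradients derived by hand from the SSE score; the weights, learning rate, and single training point are illustrative assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def sgd_step(x, t, W1, w, lr=0.1):
    """One online steepest-descent update of the 4-2-1 MLP
    on the squared error (y - t)^2 for a single data point."""
    h = sigmoid(W1 @ x)                           # hidden-node outputs
    y = w @ h                                     # network output
    err = 2.0 * (y - t)                           # dE/dy
    grad_w = err * h                              # dE/dw
    grad_W1 = np.outer(err * w * h * (1 - h), x)  # dE/dW1 (chain rule)
    return W1 - lr * grad_W1, w - lr * grad_w

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 4))   # first-layer weights
w = rng.normal(scale=0.5, size=2)         # second-layer weights
x, t = rng.normal(size=4), 1.0            # one training point
for _ in range(100):                      # online mode: point by point
    W1, w = sgd_step(x, t, W1, w)
print(w @ sigmoid(W1 @ x))                # approaches t = 1.0
```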