Discriminant Analysis

Outline
- Introduction
- Linear Discriminant Analysis
- Examples
Introduction

What is Discriminant Analysis?
A statistical technique to classify objects into mutually exclusive and exhaustive groups based on a set of measurable features of the objects.

Purpose of Discriminant Analysis
To classify objects (people, customers, things, etc.) into one of two or more groups based on a set of features that describe the objects (e.g. gender, age, income, weight, preference score, etc.).

Two things to check:
- Which set of features best determines group membership of the object? (Feature Selection)
- What classification rule or model best separates those groups? (Classification)
Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA), also called Fisher's linear discriminant, is a method used in statistics and machine learning to find the linear combination of features that best separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.
Dimensionality Reduction

Curse of dimensionality: problems caused by high-dimensional feature vectors
- Data sparsity
- Undertrained classifiers

Goal: reduce the dimension of the feature vectors without loss of information.

LDA (also known as Fisher's discriminant analysis) pursues this goal by trying to optimize class separability.
Linear Discriminant Analysis (LDA)

Problem statement
Assign a class category (the group, or class label), e.g. good or bad, to each product.
- The class category is also called the dependent variable.
- Each measurement on the product is a feature that describes the object; features are also called independent variables.

The dependent variable (Y) is the group. The dependent variable is always a categorical (nominal scale) variable.
The independent variables (X) are the object features that might determine the group. Independent variables can be on any measurement scale (nominal, ordinal, interval, or ratio).

LDA assumes that the groups are linearly separable and uses a linear discriminant model.

What does linearly separable mean? It means the groups can be separated by a linear combination of the features that describe the objects.
Linear Discriminant Analysis (LDA)

PCA vs LDA
- PCA tries to find the directions of strongest variance (correlation) in the dataset, ignoring class labels.
- LDA tries to optimize class separability.

The goal of LDA is to maximize class separability.
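The PCA vs LDA contrast above can be sketched numerically. The sketch below uses made-up 2-D data (all numbers are illustrative assumptions): two classes whose largest-variance direction is not the direction that separates them, so PCA and LDA pick different axes.

```python
# Sketch contrasting PCA and LDA directions on made-up 2-D data.
# The class means and spreads below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], [0.5, 3.0], size=(200, 2))  # class 1
X2 = rng.normal([3.0, 0.0], [0.5, 3.0], size=(200, 2))  # class 2
X = np.vstack([X1, X2])

# PCA direction: top eigenvector of the overall covariance,
# which here follows the large vertical spread
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = evecs[:, np.argmax(evals)]

# Fisher/LDA direction for two classes: inv(Sw) (mu1 - mu2),
# which here follows the horizontal class separation
Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
lda_dir = np.linalg.inv(Sw) @ (X1.mean(axis=0) - X2.mean(axis=0))
lda_dir /= np.linalg.norm(lda_dir)

print("PCA direction:", np.round(pca_dir, 2))
print("LDA direction:", np.round(lda_dir, 2))
```

With this data, PCA latches onto the high-variance vertical axis while LDA finds the horizontal axis along which the class means differ.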
Different Approaches to LDA

Class-dependent transformation
- Maximizes the ratio of between-class variance to within-class variance.
- Involves two optimizing criteria (one per class), so the data sets are transformed independently.

Class-independent transformation
- Maximizes the ratio of overall variance to within-class variance.
- Uses only one optimizing criterion to transform the data sets, so all data points, irrespective of their class identity, are transformed with the same transform.

Numerical Example
Given a two-class problem. Input: two sets of 2-D data points (Class 1 and Class 2).
Numerical Example: Step 1
Compute the mean of each data set and the mean of the entire data set.
- Mean of Set 1 (data points in Class 1): μ1, an n×1 column vector
- Mean of Set 2 (data points in Class 2): μ2, an n×1 column vector
- Mean of the entire data (Class 1 and Class 2 together): μ3, an n×1 column vector
where n is the number of dimensions; in our case n = 2.

Numerical Example: Step 2
Compute the Within-Class Scatter Matrix S_w and the Between-Class Scatter Matrix S_b.

Within-Class Scatter Matrix:
S_w = Σ_j p_j C_j
where p_j is the prior probability of the j-th class and C_j is the covariance matrix of the j-th class (set j).

Between-Class Scatter Matrix:
S_b = Σ_j (μ_j − μ3)(μ_j − μ3)^T
where μ3 is the mean of the entire data and μ_j is the mean of the j-th class (set j).
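Steps 1 and 2 can be sketched in NumPy. The two 2-D data sets below are illustrative assumptions, not the slide's original numbers; with equal priors p_1 = p_2 = 0.5, the within-class scatter is the prior-weighted sum of the class covariances and the between-class scatter is built from the class means.

```python
# Sketch of Steps 1-2: means and scatter matrices on made-up data.
import numpy as np

X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0],
               [3.0, 6.0], [4.0, 4.0]])              # Class 1 (assumed)
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0],
               [8.0, 7.0], [10.0, 8.0]])             # Class 2 (assumed)

# Step 1: class means and overall mean (n = 2 dimensions)
mu1 = X1.mean(axis=0)
mu2 = X2.mean(axis=0)
mu3 = np.vstack([X1, X2]).mean(axis=0)

# Step 2: scatter matrices, with equal priors p1 = p2 = 0.5
p1 = p2 = 0.5
C1 = np.cov(X1, rowvar=False, bias=True)   # covariance of class 1
C2 = np.cov(X2, rowvar=False, bias=True)   # covariance of class 2
Sw = p1 * C1 + p2 * C2                     # within-class scatter

d1 = (mu1 - mu3).reshape(-1, 1)
d2 = (mu2 - mu3).reshape(-1, 1)
Sb = d1 @ d1.T + d2 @ d2.T                 # between-class scatter
```

For two equally sized classes, μ1 − μ3 = −(μ2 − μ3), so S_b has rank 1: there is only one useful discriminant direction.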
Numerical Example: Step 3
Eigenvector computation.

Class-dependent transformation: obtain the eigenvectors (transform_j) of the per-class optimizing criterion C_j^{-1} S_b, where C_j is the covariance matrix of class j. This maximizes the ratio of between-class variance to within-class variance and involves two optimizing criteria (one per class), so the data sets are transformed independently.

Class-independent transformation: obtain the eigenvectors (transform_spec) of the optimizing criterion S_w^{-1} S_b. This maximizes the ratio of overall variance to within-class variance and uses only one optimizing criterion, so all data points, irrespective of their class identity, are transformed with the same transform.

Numerical Example: Step 4
Transformed matrix calculation.
- Class-dependent: transformed_set_j = transform_j^T × set_j, where transform_j is composed of the eigenvectors of C_j^{-1} S_b.
- Class-independent: transformed_set = transform_spec^T × data_set, where transform_spec is composed of the eigenvectors of S_w^{-1} S_b.
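Steps 3 and 4 for the class-independent case can be sketched as follows, again on made-up data (the points are illustrative assumptions). Since S_w is positive definite here, the eigenvalues of S_w^{-1} S_b are real; the top eigenvector is kept as the transform.

```python
# Sketch of Steps 3-4 (class-independent): eigenvectors of
# inv(Sw) @ Sb and projection of the data. Data points are assumed.
import numpy as np

X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0],
               [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0],
               [8.0, 7.0], [10.0, 8.0]])
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu3 = np.vstack([X1, X2]).mean(axis=0)

Sw = 0.5 * np.cov(X1, rowvar=False, bias=True) \
   + 0.5 * np.cov(X2, rowvar=False, bias=True)
d1, d2 = (mu1 - mu3)[:, None], (mu2 - mu3)[:, None]
Sb = d1 @ d1.T + d2 @ d2.T

# Step 3: eigenvectors of the class-independent criterion inv(Sw) Sb
evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
# Keep the eigenvector with the largest eigenvalue as the transform
# (real parts taken defensively; the eigenvalues are real here)
top = np.argmax(np.real(evals))
transform_spec = np.real(evecs[:, [top]])

# Step 4: transformed data, transform_spec^T applied to each point
Y1 = X1 @ transform_spec
Y2 = X2 @ transform_spec
```

In the transformed 1-D space the two class means end up far apart relative to the within-class spread, which is exactly what the criterion maximizes.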
Numerical Example: Step 5
Euclidean distance calculation:
D_n = ‖ transform_spec^T x − μ_ntrans ‖
where μ_ntrans is the mean of the transformed data set for class n, n is the class index, and x is the test vector. For n classes, n Euclidean distances are obtained for each test point.

Numerical Example: Step 6
Classification: the result is based on the smallest Euclidean distance. The smallest among the n distances classifies the test vector x as belonging to class n.
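Steps 5 and 6 can be sketched end to end. The data points and the test vector below are illustrative assumptions; the test vector is projected with the class-independent transform and assigned to the class whose transformed mean is nearest.

```python
# Sketch of Steps 5-6: nearest transformed class mean. All data
# points and the test vector are illustrative assumptions.
import numpy as np

X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0],
               [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0],
               [8.0, 7.0], [10.0, 8.0]])
classes = [X1, X2]
mu3 = np.vstack(classes).mean(axis=0)

# Scatter matrices with equal priors
Sw = sum(np.cov(Xc, rowvar=False, bias=True) for Xc in classes) / len(classes)
Sb = sum(np.outer(Xc.mean(axis=0) - mu3, Xc.mean(axis=0) - mu3)
         for Xc in classes)

# Class-independent transform: top eigenvector of inv(Sw) Sb
evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
w = np.real(evecs[:, np.argmax(np.real(evals))])

# Step 5: one Euclidean distance per class in the transformed space
x = np.array([3.0, 3.0])                       # hypothetical test vector
dists = [abs(x @ w - Xc.mean(axis=0) @ w) for Xc in classes]

# Step 6: the smallest distance decides the class
predicted = int(np.argmin(dists)) + 1          # 1-based class index
```

The test vector (3, 3) lies near the Class 1 cloud, so its transformed distance to the Class 1 mean is the smaller one.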
Extension to Multiple Classes

Between-Class Scatter Matrix:
S_b = Σ_i (μ_i − μ)(μ_i − μ)^T

Within-Class Scatter Matrix:
S_w = Σ_i p_i C_i

where μ_i and C_i are the mean and covariance matrix of class i, p_i is its prior probability, and μ is the mean of the entire data.
Extension to Multiple Classes S 1 S w b r φ λ r = φ i i Questions? 12