Probability Base Classification Technique: A Preliminary Study for Two Groups


Mathematical Theory and Modeling, ISSN 2224-5804 (Paper), ISSN 2225-0522 (Online), www.iiste.org

Friday Zinzendoff Okwonu*, Abdul Rahman Othman
Department of Mathematics and Computer Science, Delta State University, P.M.B., Abraka, Nigeria.
Center for Mathematical Sciences, School of Distance Education, Universiti Sains Malaysia, 11800, Penang, Malaysia
* E-mail: fzokwonu_delsu@yahoo.com

Abstract
The conventional Fisher linear classification technique for the two-group problem is developed strictly from the within-group sample mean vectors and the within-group sample variance-covariance matrices. A comparable classification procedure that incorporates the within-group probabilities is considered. The conventional procedure based on Fisher's technique assumes that the within-group probabilities are equal; as such, its computational procedure omits the within-group probabilities when solving classification problems. The new approach modifies the coefficient of Fisher's technique by applying the within-group probability of the respective groups. The classification performance of these techniques is investigated on generated contaminated normal data sets using homoscedastic and heteroscedastic variance-covariance matrices for various sample sizes and dimensions. The procedures are compared by setting the mean probabilities of correct classification obtained from the contaminated data set against the mean of the optimal probability computed from the uncontaminated data set. The comparison revealed that both techniques perform comparably. However, the Monte Carlo simulation indicates that, as the proportion of contamination increases, the probability base approach performs better for homoscedastic covariance matrices, whereas Fisher's technique outperforms the probability base procedure for heteroscedastic covariance matrices.
The comparative analysis indicates that the probability base approach performs comparably with the conventional procedure. The implication is that classification problems can be solved by incorporating the respective within-group probabilities into the classification model.

Keywords: Classification, Homoscedastic and Heteroscedastic Covariance Matrices, Mean Probability.

1. Introduction
Conventionally, the linear classification problem for two groups is solved using the Fisher Linear Classification Analysis (FLCA). This procedure depends strictly on the within-group sample mean vectors and the within-group sample variance-covariance matrices. Fisher's technique assumes that the data are multivariate normal and that the variance-covariance matrices are homoscedastic. The sample mean vectors and sample covariance matrices are unstable because these estimates are easily influenced by influential observations (Maronna et al. 2006; Munoz-Pichardo et al. 2011). Sajobi et al. (2012) proposed to robustify the sample mean vectors and the covariance matrices by replacing the maximum likelihood estimates with estimators computed by coordinate-wise trimming. Hubert et al. (2010) proposed a permutation-invariant technique, a deterministic algorithm for the minimum covariance determinant procedure. This procedure uses a deterministic method rather than random subsets to robustify the sample mean and covariance matrix. Bouveyron & Brunet (2012) proposed a robust and flexible Fisher linear discriminant analysis based on a probabilistic concept that relaxes the equal covariance assumption. This technique does not incorporate the within-group probabilities in computing the classification coefficient. This paper considers a modification of Fisher's technique that introduces the within-group probabilities into the separation parameter w. The new procedure solves classification problems for two groups by incorporating the information that the within-group probabilities provide, with the aim of obtaining the maximum correct classification rate.
This procedure adheres strictly to the homoscedastic assumption on the covariance matrices. The performance of these methods is investigated for contaminated normal data sets with equal and unequal variance-covariance matrices. The Method section presents the Fisher linear classification analysis followed by the probability base classification technique. Simulation results are given in the Result section, followed by the discussion and conclusions.

2. Method
The Method section consists of the Fisher linear classification analysis and the probability base classification technique. Both procedures are applied to perform classification for the two-group problem.

2.1 Fisher Linear Classification Analysis (FLCA)
The two-group linear classification technique based on Fisher's technique assumes that the within-group probabilities and the misclassification costs are equal; as such, its classification rule drops the probabilities for each group. That is:

$$\xi - \bar{\xi} - \ln\left[\frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1}\right] \ge 0, \qquad (1)$$

$$\xi - \bar{\xi} - \ln\left[\frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1}\right] < 0, \qquad (2)$$

where the logarithmic term equals zero when the probabilities and the costs are equal.

As observed in the literature, Fisher's technique performs optimally if the data set is drawn from the multivariate normal distribution and if the variance-covariance matrices are equal. When the classification coefficient is inconsistent, the misclassification rate tends to increase. The within-group mean vectors, the variance-covariance matrices and the pooled common covariance matrix are defined as follows:

$$\bar{x}_i = \sum_{j=1}^{N_i} x_{ij} / N_i, \quad i = 1, 2, \qquad (3)$$

$$S_i = \sum_{j=1}^{N_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)' / (N_i - 1), \qquad (4)$$

$$S_{pooled} = \frac{\sum_{i=1}^{2} (N_i - 1) S_i}{\sum_{i=1}^{2} N_i - 2}. \qquad (5)$$

Equations (3-5) are applied to develop Equations (1-2). Based on the equality assumptions in Equations (1-2), Fisher's procedure reduces to:

$$\xi \ge \bar{\xi}, \qquad (6)$$

$$\xi < \bar{\xi}, \qquad (7)$$

where $\xi = (\bar{x}_1 - \bar{x}_2)' S_{pooled}^{-1} x = q'x$ is the classification score and $\bar{\xi} = ((\bar{x}_1 + \bar{x}_2)/2)' q$ is the cutoff point. Equations (6-7) define Fisher's classification rule: an observation is allocated to group one if Equation (6) is satisfied; otherwise, when Equation (7) is satisfied, it is assigned to group two.

2.2 Probability Base Classification Technique (PCT)
This section describes a classification procedure that includes the within-group probabilities in the classification coefficient. Based on Equation (3), the difference of the within-group mean vectors for the two groups is obtained, say $d = \bar{x}_1 - \bar{x}_2$, and the sum of the within-group mean vectors is given as $\hat{d} = \bar{x}_1 + \bar{x}_2$. To formulate the coefficient of the new procedure, the following are obtained:

= d, d% = +, β = d /d, % ε = β. (8)
Based on the definitions in Equation (8), the following is obtained:

w e β β = + e /ε + p (9)

where $p_i = N_i/N$ is the within-group probability, $N_i$ is the sample size of each group, $N$ is the total sample size of the two groups and $p = \sum_i p_i$ is the total probability. The classification model is given as:

$$z = w' S_{pooled}^{-1} x = u'x, \qquad u = S_{pooled}^{-1} w. \qquad (10)$$

The classification cutoff point is given as follows:

$$\bar{z} = \frac{u'\hat{d}}{2}. \qquad (11)$$

The classification rule is defined as:

$$z < \bar{z}. \qquad (12)$$

In this regard, an observation is assigned to group one if Equation (12) is satisfied; otherwise, the observation is classified to group two if the following holds:

$$z \ge \bar{z}. \qquad (13)$$

3. Result
The Monte Carlo simulation is designed to investigate the comparative classification performance of the above techniques for unequal and equal variance-covariance matrices based on contaminated normal data sets. The contaminated normal model used in this study for the respective groups is given as:

$$(1 - \varepsilon) N_{dp}(0, 1) + \varepsilon N_{dp}(\mu, \sigma I_{dp}). \qquad (14)$$

This model requires that the majority of the data come from the normal distribution while the rest come from the contaminating distribution (Cont. Dist.). In each case, the data set is randomly reshuffled and divided into two parts: a training set (60%) and a validation set (40%). To determine the performance of each procedure, the mean of the optimal probability (Opt.) is used as the performance benchmark. The comparative analyses are based on the comparison between the mean of the optimal probability computed from the uncontaminated normal data set and the mean probabilities of correct classification obtained from each technique. In the respective figures, the straight line is the performance benchmark.

Figure 1 and Figure 2 show that Fisher's technique performed better than the probability base approach as the proportion of contamination increased for the unequal variance-covariance matrices. Figure 3 reveals that the probability base approach performed better than Fisher's technique for the equal variance-covariance matrices, and the two performed comparably in Figure 4. Tables 1 and 2 report the performance of these techniques for heteroscedastic matrices, while Tables 3 and 4 show the performance of both techniques for homoscedastic matrices. The best procedure appears in bold.
The analysis reveals that the FLCA and the PCT techniques are comparable in all cases investigated.

Figure 1. Effect of contamination on the mean probability of correct classification. Vertical axis: mean probabilities of correct classification; horizontal axis: proportion of contamination; plotted series: mean of the optimal probability, FLCA and PCT.
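The simulation design described in the Result section can be sketched in Python as follows. This is a minimal illustration, assuming NumPy, of how Fisher's rule in Equations (3)-(7) could be evaluated under the contamination model (14) with the 60%/40% split. The function names, the group separation `delta`, and the values chosen for `mu`, `var`, `n`, `d` and `reps` are hypothetical and are not the settings of the study. The second parameter of the contaminating distribution is treated here as a variance, each observation is contaminated independently with probability `eps`, and only Fisher's rule is sketched, not the PCT coefficient.

```python
import numpy as np


def fisher_fit(x1, x2):
    """Return Fisher's coefficient q and the cutoff from two training groups."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)             # Eq. (3): group means
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))           # Eq. (4): divisor N_i - 1
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)  # Eq. (5)
    q = np.linalg.solve(s_pooled, m1 - m2)                 # q = S_pooled^{-1}(m1 - m2)
    cutoff = 0.5 * (m1 + m2) @ q                           # cutoff point
    return q, cutoff


def fisher_predict(x, q, cutoff):
    """Assign group 1 when the score reaches the cutoff (Eq. 6), else group 2 (Eq. 7)."""
    return np.where(x @ q >= cutoff, 1, 2)


def contaminated_sample(rng, n, center, eps, mu, var):
    """Draw n points around `center`; each point is an outlier with probability eps.

    Assumption: clean points are N(0, I) and outliers are N(mu, var * I),
    both shifted by the (hypothetical) group center.
    """
    d = len(center)
    clean = rng.standard_normal((n, d))
    outliers = mu + np.sqrt(var) * rng.standard_normal((n, d))
    is_outlier = rng.random(n) < eps
    return center + np.where(is_outlier[:, None], outliers, clean)


def mean_correct_rate(eps, n=60, d=3, delta=2.0, mu=4.0, var=6.0, reps=200, seed=0):
    """Mean proportion of correctly classified validation points over `reps` runs."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(reps):
        x = np.vstack([
            contaminated_sample(rng, n, np.zeros(d), eps, mu, var),      # group 1
            contaminated_sample(rng, n, np.full(d, delta), eps, mu, var),  # group 2
        ])
        y = np.repeat([1, 2], n)
        idx = rng.permutation(2 * n)        # random reshuffle
        cut = (6 * 2 * n) // 10             # 60% training, 40% validation
        tr, va = idx[:cut], idx[cut:]
        q, cutoff = fisher_fit(x[tr][y[tr] == 1], x[tr][y[tr] == 2])
        rates.append(np.mean(fisher_predict(x[va], q, cutoff) == y[va]))
    return float(np.mean(rates))


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.2, 0.3):
        print(eps, round(mean_correct_rate(eps), 4))
```

Calling `mean_correct_rate(0.0)` plays the role of the uncontaminated benchmark in this sketch, and larger values of `eps` show how the rate moves away from it under these assumed settings.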

Table 1. Mean probability of correct classification and standard deviation (in brackets), Optimal = 0.8340

Cont. Dist.   N    d    ε    FLCA              PCT               OPT-FLCA   OPT-PCT
εN(3,0)       30        0    0.834 (0.0055)    0.8338 (0.00)     0.006      0.000
εN(3,0)       30        0    0.807 (0.0065)    0.806 (0.040)     0.068      0.034
εN(3,0)       30        30   0.745 (0.000)     0.7393 (0.0096)   0.095      0.0947

FLCA: Fisher linear classification analysis
PCT: Probability base classification technique
OPT-FLCA: Difference between the mean of the optimal probability and the mean probability of FLCA
OPT-PCT: Difference between the mean of the optimal probability and the mean probability of PCT

Figure 2. Effect of contamination on the mean probability of correct classification. Vertical axis: mean probabilities of correct classification; horizontal axis: proportion of contamination; plotted series: mean of the optimal probability, FLCA and PCT.

Table 2. Mean probability of correct classification and standard deviation (in brackets), Optimal = 0.8749

Cont. Dist.   N    d    ε    FLCA              PCT               OPT-FLCA   OPT-PCT
εN3(4.5)      60   3    0    0.8553 (0.0068)   0.8506 (0.0033)   0.096      0.044
εN3(4.5)      60   3    0    0.84 (0.0084)     0.7997 (0.0030)   0.0608     0.075
εN3(4.5)      60   3    30   0.7570 (0.0096)   0.76 (0.0070)     0.79       0.488

Figure 3. Effect of contamination on the mean probability of correct classification. Vertical axis: mean probabilities of correct classification; horizontal axis: proportion of contamination; plotted series: mean of the optimal probability, FLCA and PCT.

Table 3. Mean probability of correct classification and standard deviation (in brackets), Optimal = 0.9099

Cont. Dist.   N    d    ε    FLCA              PCT               OPT-FLCA   OPT-PCT
εN3(,9)       30   3    0    0.8967 (0.006)    0.9009 (0.0063)   0.03       0.009
εN3(,9)       30   3    0    0.8774 (0.000)    0.879 (0.08)      0.035      0.0308
εN3(,9)       30   3    30   0.839 (0.046)     0.8406 (0.08)     0.0707     0.0694

Figure 4. Effect of contamination on the mean probability of correct classification. Vertical axis: mean probabilities of correct classification; horizontal axis: proportion of contamination; plotted series: mean of the optimal probability, FLCA and PCT.

Table 4. Mean probability of correct classification and standard deviation (in brackets), Optimal = 0.9484

Cont. Dist.   N    d    ε    FLCA              PCT               OPT-FLCA   OPT-PCT
εN5(4,6)      00   5    0    0.9383 (0.0085)   0.936 (0.0074)    0.00       0.0
εN5(4,6)      00   5    0    0.897 (0.007)     0.890 (0.00)      0.0567     0.0564
εN5(4,6)      00   5    30   0.8379 (0.004)    0.8438 (0.004)    0.05       0.046

3. Discussion
The conventional approach to classification based on Fisher's technique does not incorporate the within-group probabilities in the classification coefficient; see Equations (1-2). A comparable classification technique that incorporates the within-group probabilities in the classification coefficient was proposed. The classification performance of these techniques was investigated by violating the homoscedasticity and multivariate normality assumptions. The Monte Carlo simulations are based on the following controlled variables: the mean vector shift, the variance shift, the sample size and dimension, and the proportion of contamination. The comparative classification performance based on the figures and tables revealed that these techniques performed comparably. Both techniques utilize all the information gleaned from the data set. The probability base approach provides more information to the end user than the conventional technique.

4. Conclusion
A comparable classification technique based on the probability concept for the two-group problem was compared with the conventional Fisher linear classification procedure.
The new technique based on the within-group probabilities is suitable for two-group classification problems in which the probabilities of the respective groups are given. The comparative analyses revealed that both techniques performed comparably.

Acknowledgement
This research work was funded through the short term grant of the Universiti Sains Malaysia, Penang, Malaysia.

References
Bouveyron, C. & Brunet, C. (2012), "Probabilistic Fisher Discriminant Analysis: A Robust and Flexible Alternative to Fisher Discriminant Analysis", Neurocomputing 90, 12-22.
Hubert, M., Rousseeuw, P. J. & Verdonck, T. (2010), "A Deterministic Algorithm for the MCD", citeseerx.ist.psu.edu/viewdoc/summary?, -6.
Maronna, R., Martin, R. D. & Yohai, V. J. (2006), "Robust Statistics: Theory and Methods", John Wiley, New York.
Munoz-Pichardo, J. M., Enguix-Gonzalez, A., Munoz-Garcia, J. & Moreno-Rebollo, J. L. (2011), "Influence Analysis on Discriminant Coordinates", Communications in Statistics-Simulation and Computation, 40(60), 793-807.
Sajobi, T. T., Lix, L. M., Dansu, B. M., Laverty, W. & Li, L. (2012), "Robust Descriptive Discriminant Analysis for Repeated Measures Data", Computational Statistics and Data Analysis, 56(9), 2782-2794.

This academic article was published by The International Institute for Science, Technology and Education (IISTE). More information about the publisher can be found on the IISTE homepage: http://www.iiste.org