Self-Supervised Learning Based on Discriminative Nonlinear Features for Image Classification


Self-Supervised Learning Based on Discriminative Nonlinear Features for Image Classification

Qi Tian 1, Ying Wu 2, Jie Yu 1, Thomas S. Huang 3

1 Department of Computer Science, University of Texas at San Antonio, San Antonio, TX
2 Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL
3 Beckman Institute, University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801

Summary

It is often tedious and expensive to label large training datasets for learning-based image classification. This problem can be alleviated by self-supervised learning techniques, which take a hybrid of labeled and unlabeled data to train classifiers. However, the feature dimension is usually very high (typically from tens to several hundreds), and the learning is afflicted by the curse of dimensionality, as the search space grows exponentially with the dimension. Discriminant-EM (DEM) proposed a framework for such tasks by applying self-supervised learning in an optimal discriminating subspace of the original feature space. However, the algorithm is limited by its linear transformation structure, which cannot capture the nonlinearity in the class distribution. This paper extends the linear DEM to a nonlinear kernel algorithm, Kernel DEM (KDEM), based on kernel multiple discriminant analysis (KMDA). KMDA provides a better ability to simplify the probabilistic structure of the data distribution in a discriminating subspace. KMDA and KDEM are evaluated on both benchmark databases and synthetic data. Experimental results show that a classifier using KMDA is comparable with the support vector machine (SVM) on standard benchmark tests, and that KDEM outperforms a variety of supervised and semi-supervised learning algorithms on several image classification tasks. Extensive results show the effectiveness of our approach.

Corresponding author: Qi Tian, 6900 North Loop 1604 West, Department of Computer Science, University of Texas at San Antonio, San Antonio, TX. Email: qtian@cs.utsa.edu, Tel: (210), Fax: (210)

Revised draft submitted to Pattern Recognition, Special Issue on Image Understanding for Digital Photographs, June 2004

Self-Supervised Learning Based on Discriminative Nonlinear Features for Image Classification

Qi Tian 1, Ying Wu 2, Jie Yu 1, Thomas S. Huang 3

1 Department of Computer Science, University of Texas at San Antonio, San Antonio, TX
2 Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL
3 Beckman Institute, University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801

Abstract

For learning-based tasks such as image classification, the feature dimension is usually very high, and the learning is afflicted by the curse of dimensionality, as the search space grows exponentially with the dimension. Discriminant-EM (DEM) proposed a framework that applies self-supervised learning in a discriminating subspace. This paper extends the linear DEM to a nonlinear kernel algorithm, Kernel DEM (KDEM), and evaluates KDEM extensively on benchmark image databases and synthetic data. Various comparisons with other state-of-the-art learning techniques are investigated for several image classification tasks. Extensive results show the effectiveness of our approach.

Keywords: Discriminant analysis, kernel function, unlabeled data, support vector machine, image classification

1. INTRODUCTION

Content-based image retrieval (CBIR) is a technique which uses visual content to search images from a large-scale image database according to users' interests, and it has been an active and fast-advancing research area since the 1990s [1]. A challenge in content-based image retrieval is the semantic gap between the high-level semantics in the human mind and the low-level features computed by the machine. Users seek semantic similarity, but the machine can only provide similarity by data processing; the mapping between the two is nonlinear, and it is impractical to represent it explicitly. A promising approach to this problem is machine learning, by which the mapping can be learned through a set of examples.

The task of image retrieval is to find as many images similar to the query images as possible in a given database. The retrieval system acts as a classifier that divides the images in the database into two classes, relevant or irrelevant, and many supervised or semi-supervised learning approaches have been employed for this classification problem. Successful examples of learning approaches in content-based image retrieval can be found in the literature [2-9]. However, representing an image directly in the image space is formidable, since the dimensionality of the image space is intractable. Dimension reduction can be achieved by identifying invariant image features, and in some cases domain knowledge can be exploited to extract image features from the visual input.

On the other hand, the generalization ability of many current methods largely depends on the training dataset. In general, good generalization requires large and representative labeled training datasets. In content-based image retrieval [1-9], there are only a limited number of labeled training images, given by user query and relevance feedback [10]. Pure supervised learning from such a small training dataset will generalize poorly; if the classifier is over-trained on the small training dataset, over-fitting will probably occur.
However, there are a large number of unlabeled images, or unlabeled data in general, in the given database. Unlabeled data contain information about the joint distribution over features, which can be used to help supervised learning. These algorithms normally

assume that only a fraction of the data is labeled with ground truth, but they still take advantage of the entire data set to generate good classifiers, under the assumption that nearby data are likely to be generated by the same class. This learning paradigm can be viewed as an integration of pure supervised and unsupervised learning.

Discriminant-EM (DEM) [11] is a self-supervised learning algorithm for such purposes, taking a small set of labeled data together with a large set of unlabeled data. The basic idea is to learn the discriminating features and the classifier simultaneously by inserting a multi-class linear discriminant step into the standard expectation-maximization (EM) [12] iteration loop. DEM assumes that the probabilistic structure of the data distribution in the lower-dimensional discriminating space is simplified and can be captured by a lower-order Gaussian mixture. Fisher discriminant analysis (FDA) and multiple discriminant analysis (MDA) [12] are the traditional two-class and multi-class discriminant analyses, and they treat every class equally when finding the optimal projection subspaces. In contrast to FDA and MDA, Zhou and Huang [13] proposed biased discriminant analysis (BDA), which treats all positive (i.e., relevant) examples as one class and the negative (i.e., irrelevant) examples as different classes for content-based image retrieval. The intuition behind BDA is that all positive examples are alike, while each negative example is negative in its own way. Compared with state-of-the-art methods such as support vector machines (SVM) [14], BDA [13] outperforms SVM when the number of negative examples is small (<20). However, one drawback of BDA is that it ignores unlabeled data in the learning process. Unlabeled data can improve classification under the assumption that nearby data are likely to be generated by the same class [15]. In the past years there has been a growing interest in the use of unlabeled data for enhancing classification accuracy in supervised learning, for example in text classification [16, 17], facial expression recognition [18], and image retrieval [11, 19].

DEM differs from BDA in its use of unlabeled data and in the way the positive and negative examples are treated in the discrimination step. However, the discrimination step is linear in both DEM and BDA, so both have difficulty handling data sets which are not linearly separable. In CBIR, the image distribution is usually modeled as a mixture of Gaussians, which is highly non-linearly separable. In this paper, we generalize DEM from the linear setting to a nonlinear one. Nonlinear kernel discriminant analysis transforms the original data space X to a higher-dimensional kernel feature space F and then projects the transformed data to a lower-dimensional discriminating subspace, such that nonlinear discriminating features can be identified and the training data can be better classified in a nonlinear feature subspace.

The rest of the paper is organized as follows. In Section 2, we present nonlinear discriminant analysis using kernel functions [20, 21]. In Section 3, two schemes are presented for sampling training data for efficient learning of nonlinear kernel discriminants. In Section 4, Kernel DEM is formulated, and in Section 5 we apply the Kernel DEM algorithm to various applications and compare it with other state-of-the-art methods. Our experiments include standard benchmark testing, image classification on Corel photos and synthetic data, and view-independent hand posture recognition. Finally, conclusions and future work are given in Section 6.

2. NONLINEAR DISCRIMINANT ANALYSIS

Preliminary results of applying DEM to CBIR have been shown in [11]. In this section, we generalize DEM from the linear setting to a nonlinear one. We first map the data x via a nonlinear mapping φ into some high-, or even infinite-, dimensional feature space F and then apply linear DEM in the feature space F. To avoid working with the mapped data explicitly (which is impossible if F is of infinite dimension), we adopt the well-known kernel trick [22].
The kernel functions k(x, z) compute a dot product in a feature¹

¹ A term used in the kernel machine literature to denote the new space after the nonlinear transform; this should not be confused with the feature space concept used in content-based image retrieval, which denotes the space of features or descriptors extracted from the media data.

space F: k(x, z) = (φ(x) · φ(z)). By formulating an algorithm in F using φ only inside dot products, we can replace every occurrence of a dot product by the kernel function k, which amounts to performing the same linear algorithm as before, but implicitly in the kernel feature space F. The kernel principle has quickly gained attention in image classification in recent years [13, 19-23].

2.1. Linear Features and Multiple Discriminant Analysis

It is common practice to preprocess data by extracting linear and nonlinear features. In many feature extraction techniques, one has a criterion assessing the quality of a single feature, which ought to be optimized. Often, prior information is available that can be used to formulate quality criteria; probably even more commonly, the features are extracted for a certain purpose, e.g., for subsequently training some classifier. What one would like to obtain is a feature which is as invariant as possible while still covering as much as possible of the information necessary for describing the data's properties of interest. A classical and well-known technique that solves this type of problem, considering only one linear feature, is the maximization of the so-called Rayleigh coefficient [24, 21]:

    J(W) = (W^T S_1 W) / (W^T S_2 W)    (1)

Here, W denotes the weight vector of a linear feature extractor (i.e., for an example x, the feature is given by the projection W^T x), and S_1 and S_2 are symmetric matrices designed such that they measure the desired information and the undesired noise along the direction W. The ratio in equation (1) is maximized when one covers as much as possible of the desired information while avoiding the undesired. If we look for discriminating directions for classification, we can choose S_B to measure the separability of the class centers (between-class variance), i.e., S_1 in equation (1), and S_W to measure the within-class variance, i.e., S_2 in equation (1). In this case, we recover the well-known Fisher discriminant [25], where S_B and S_W are given by:

    S_B = Σ_{i=1..C} N_i (m_i − m)(m_i − m)^T    (2)

    S_W = Σ_{i=1..C} Σ_{j=1..N_i} (x_j^(i) − m_i)(x_j^(i) − m_i)^T    (3)

We use {x_j^(i), j = 1, …, N_i}, i = 1, …, C (C = 2 for Fisher discriminant analysis) to denote the feature vectors of the training samples. C is the number of classes, x_j^(i) is the j-th sample from the i-th class, N_i is the number of samples of the i-th class, m_i is the mean vector of the i-th class, and m is the grand mean of all examples.

If S_1 in equation (1) is the covariance matrix of all the samples,

    S_1 = (1/N) Σ_{i=1..C} Σ_{j=1..N_i} (x_j^(i) − m)(x_j^(i) − m)^T    (4)

and S_2 is the identity matrix, we recover standard principal component analysis (PCA) [26]. If S_1 is the data covariance and S_2 the noise covariance (which can be estimated analogously to equation (4), but over examples sampled from the assumed noise distribution), we obtain oriented PCA [26], which aims at finding a direction that describes most of the variance in the data while avoiding known noise as much as possible. PCA and FDA (i.e., linear discriminant analysis, LDA) are both common techniques for feature dimension reduction. LDA constructs the most discriminative features, while PCA constructs the most descriptive features in the sense of packing the most energy. There has been a tendency to prefer LDA over PCA because, as intuition would suggest, the former deals directly with discrimination between classes, whereas the latter pays no particular attention to the underlying class structure. An interesting result reported by Martinez and Kak [27] in their study on face recognition is that this is not always true: PCA might outperform LDA when the number of samples per class is small or when the training data non-uniformly sample the underlying distribution.
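The scatter matrices of equations (2)-(4) and the discriminant directions they define can be computed in a few lines. Below is a minimal NumPy sketch of linear FDA/MDA (our own illustrative code, not the authors' implementation); it solves for the eigenvectors of S_W^{-1} S_B:

```python
import numpy as np

def mda_projection(X, y, n_components=None):
    """Multiple discriminant analysis sketch: maximize the Rayleigh quotient
    W^T S_B W / W^T S_W W of eq. (1) via the eigenvectors of S_W^{-1} S_B,
    with S_B and S_W built as in eqs. (2)-(3)."""
    classes = np.unique(y)
    C = len(classes)
    d = X.shape[1]
    m = X.mean(axis=0)                       # grand mean of all examples
    S_B = np.zeros((d, d))                   # between-class scatter, eq. (2)
    S_W = np.zeros((d, d))                   # within-class scatter, eq. (3)
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                 # class mean m_i
        S_B += len(Xc) * np.outer(mc - m, mc - m)
        S_W += (Xc - mc).T @ (Xc - mc)
    # eigenvectors of S_W^{-1} S_B; at most C-1 non-trivial directions
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-evals.real)
    k = n_components or (C - 1)
    return evecs.real[:, order[:k]]          # columns = projection directions
```

For C classes the returned projection has at most C − 1 useful columns, matching the dimension bound d_2 ≤ C − 1 of MDA.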

When the number of training samples is large and the training data are representative of each class, LDA will outperform PCA.

Multiple discriminant analysis (MDA) is a natural generalization of Fisher's linear discriminant analysis to multiple classes [12]. The goal is to maximize the ratio in equation (1). The advantage of using this ratio is that it has been proven [28] that, if S_W is a nonsingular matrix, the ratio is maximized when the column vectors of the projection matrix W are the eigenvectors of S_W^{-1} S_B. It should be noted that W maps the original d_1-dimensional data space X to a d_2-dimensional space (d_2 ≤ C − 1, where C is the number of classes). For both FDA and MDA, the columns of the optimal W are the generalized eigenvectors w_i associated with the largest eigenvalues; W_opt = [w_1, w_2, …, w_{C−1}] contains in its columns the C − 1 eigenvectors corresponding to the C − 1 largest eigenvalues of S_B w_i = λ_i S_W w_i [12].

2.2. Kernel Discriminant Analysis

To take the nonlinearity of the data into account, we propose a kernel-based approach. The original MDA algorithm is applied in a feature space F which is related to the original space by a nonlinear mapping φ: x → φ(x). Since in general the number of components of φ(x) can be very large or even infinite, this mapping is too expensive and will not be carried out explicitly, but through the evaluation of a kernel K with elements k(x_i, x_j) = φ(x_i)^T φ(x_j). This is the same idea adopted by the support vector machine [14], kernel PCA [29], and invariant feature extraction [30, 31]. The trick is to rewrite the MDA formulae using only dot products of the form φ(x_i)^T φ(x_j), so that the reproducing kernel matrix can be substituted into the formulation and the solution, eliminating the need for a direct nonlinear transformation. Using the superscript φ to denote quantities in the new space, with S_B^φ and S_W^φ for the between-class and within-class scatter matrices, we have the objective function in the following form:

    W_opt = arg max_W (W^T S_B^φ W) / (W^T S_W^φ W)    (5)

where

    S_B^φ = Σ_{i=1..C} N_i (m_i^φ − m^φ)(m_i^φ − m^φ)^T    (6)

    S_W^φ = Σ_{i=1..C} Σ_{j=1..N_i} (φ(x_j^(i)) − m_i^φ)(φ(x_j^(i)) − m_i^φ)^T    (7)

with m_i^φ = (1/N_i) Σ_{k=1..N_i} φ(x_k^(i)) and m^φ = (1/N) Σ_{k=1..N} φ(x_k), where i = 1, …, C and N is the total number of samples. In general, there is no other way to express the solution W_opt ∈ F, either because F is of too high or infinite dimension, or because we do not even know the actual feature space connected to a certain kernel. But we know [22, 24] that any column w of the solution W_opt must lie in the span of all training samples in F, i.e., w ∈ span{φ(x_1), …, φ(x_N)}. Thus, for some expansion coefficients α = [α_1, …, α_N]^T,

    w = Σ_{k=1..N} α_k φ(x_k) = Φα    (8)

where Φ = [φ(x_1), …, φ(x_N)]. We can therefore project a data point x_k onto one coordinate of the linear subspace of F as follows (we drop the subscript on w in the ensuing):

    w^T φ(x_k) = α^T Φ^T φ(x_k)    (9)

    = α^T [k(x_1, x_k), …, k(x_N, x_k)]^T = α^T ξ_k    (10)

where

    ξ_k = [k(x_1, x_k), …, k(x_N, x_k)]^T    (11)

and we have rewritten the dot products φ(x)^T φ(y) with the kernel notation k(x, y). Similarly, we can project each of the class means onto an axis of the subspace of the feature space F using only dot products:

    w^T m_i^φ = α^T (1/N_i) Σ_{k=1..N_i} [φ(x_1)^T φ(x_k^(i)), …, φ(x_N)^T φ(x_k^(i))]^T    (12)

    = α^T (1/N_i) Σ_{k=1..N_i} [k(x_1, x_k^(i)), …, k(x_N, x_k^(i))]^T    (13)

    = α^T μ_i    (14)

It follows that

    w^T S_B^φ w = α^T K_B α    (15)

where K_B = Σ_{i=1..C} N_i (μ_i − μ)(μ_i − μ)^T, with μ the grand mean of the ξ_k, and

    w^T S_W^φ w = α^T K_W α    (16)

where K_W = Σ_{i=1..C} Σ_{k=1..N_i} (ξ_k^(i) − μ_i)(ξ_k^(i) − μ_i)^T. The goal of kernel multiple discriminant analysis (KMDA) is then to find

    A_opt = arg max_A (A^T K_B A) / (A^T K_W A)    (17)

where A = [α_1, …, α_{C−1}], C is the total number of classes, N is the size of the training set, and K_B and K_W are N × N matrices which require only kernel computations on the training samples [22]. Once we have solved for the α_i's, the projection of a new pattern z onto w_i is given by equations (9)-(10). Similarly, algorithms using different matrices for S_1 and S_2 in equation (1) are easily obtained along the same lines.
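Equations (10)-(17) translate directly into matrix code. Below is a minimal NumPy sketch of KMDA with a Gaussian RBF kernel (illustrative only; the function names, the value of the regularization constant, and the use of a plain eigensolver on K_W^{-1} K_B instead of a dedicated generalized eigensolver are our assumptions):

```python
import numpy as np

def rbf_kernel(X, Z, c=10.0):
    """Gaussian RBF kernel k(x, z) = exp(-||x - z||^2 / c)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / c)

def kmda(X, y, c=10.0, reg=1e-3):
    """Build K_B and K_W from the kernel columns xi_k (eqs. (10)-(16)) and
    maximize the ratio of eq. (17); returns the expansion coefficients A."""
    N = len(X)
    K = rbf_kernel(X, X, c)                    # column k is xi_k
    mu = K.mean(axis=1, keepdims=True)         # grand mean of the xi_k
    K_B = np.zeros((N, N))
    K_W = np.zeros((N, N))
    classes = np.unique(y)
    for cls in classes:
        idx = np.where(y == cls)[0]
        mu_c = K[:, idx].mean(axis=1, keepdims=True)   # class mean mu_i
        K_B += len(idx) * (mu_c - mu) @ (mu_c - mu).T
        diff = K[:, idx] - mu_c
        K_W += diff @ diff.T
    K_W += reg * np.eye(N)                     # ridge regularization (Sec. 2.4)
    evals, vecs = np.linalg.eig(np.linalg.solve(K_W, K_B))
    order = np.argsort(-evals.real)
    return vecs.real[:, order[:len(classes) - 1]]      # A = [alpha_1, ...]

def project(A, X_train, Z, c=10.0):
    """Project new patterns z via w^T phi(z) = A^T xi_z (eqs. (9)-(10))."""
    return rbf_kernel(X_train, Z, c).T @ A
```

Note that the kernel vectors here are all N training samples; Section 3 replaces them with a decimated subset for efficiency.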

2.3. Biased Discriminant Analysis

BDA [13] differs from the traditional MDA defined in equations (1)-(3) and (5)-(7) in the computation of the between-class scatter matrix S_B and the within-class scatter matrix S_W, which are replaced by S_NP and S_P, respectively:

    S_NP = Σ_{i=1..N_N} (y_i − m_x)(y_i − m_x)^T    (18)

    S_P = Σ_{i=1..N_P} (x_i − m_x)(x_i − m_x)^T    (19)

where {x_i, i = 1, …, N_P} denotes the positive examples, {y_i, i = 1, …, N_N} denotes the negative examples, and m_x is the mean vector of the set {x_i}. S_NP is the scatter matrix between the negative examples and the centroid of the positive examples, and S_P is the scatter matrix within the positive examples. The subscript NP indicates the asymmetric property of this approach, i.e., the user's biased opinion towards the positive class; hence the name biased discriminant analysis [13].

2.4. Regularization and Discounting Factors

It is well known that sample-based plug-in estimates of the scatter matrices in equations (2), (3), (6), (7), (18), and (19) will be severely biased for a small number of training samples, i.e., the largest eigenvalue becomes larger while the small ones become smaller. If the number of feature dimensions is large compared with the number of training examples, the problem becomes ill-posed. Especially in the case of kernel algorithms, we effectively work in the space spanned by all N mapped training examples φ(x_i), which are, in practice, often linearly dependent. For instance, for KMDA, a solution with zero within-class scatter (i.e., A^T K_W A = 0) is very likely due to overfitting. A compensation, or regularization, can be done by adding small quantities to the diagonal of the scatter matrices [32].
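The regularization just described amounts to a small ridge on the diagonal. A one-line sketch (scaling the added quantity by the average eigenvalue is our choice; the paper does not prescribe a particular value):

```python
import numpy as np

def regularize(S, eps=1e-3):
    """Add a small multiple of the identity to a scatter matrix so that it
    is safely invertible (Section 2.4). eps is a tuning constant of our
    choosing, not a value from the paper."""
    n = S.shape[0]
    return S + eps * (np.trace(S) / n) * np.eye(n)
```

Applying this to K_W before solving equation (17) prevents the degenerate zero within-class-scatter solutions mentioned above.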

3. TRAINING ON A SUBSET

We still have one problem: while we can avoid working explicitly in the extremely high- or infinite-dimensional space F, we now face a problem in N variables, a number which in many practical applications does not allow storing or manipulating N × N matrices on a computer anymore. Furthermore, solving an eigen-problem or a QP of this size is very time consuming (O(N^3)). To maximize equation (17), we need to solve an N × N eigen- or mathematical programming problem, which might be intractable for large N. Approximate solutions can be obtained by sampling a representative subset of the training data {x̃_k, k = 1, …, M}, M << N, and using ξ̃_k = [k(x̃_1, x_k), …, k(x̃_M, x_k)]^T in place of ξ_k. Two data-sampling schemes are proposed.

3.1. PCA-based Kernel Vector Selection

The first scheme is blind to the class labeling. We select representatives, or kernel vectors, by identifying those training samples which are likely to play a key role in Ξ = [ξ_1, …, ξ_N]. Ξ is an N × N matrix, but rank(Ξ) << N when the size of the training dataset is very large. This fact suggests that some training samples can be ignored in calculating the kernel features ξ. We first compute the principal components of Ξ and denote the N × N matrix of concatenated eigenvectors by P. Thresholding the elements of abs(P) by some fraction of its largest element allows us to identify the salient PCA coefficients. For each column corresponding to a non-zero eigenvalue, we choose the training samples which correspond to a salient PCA coefficient, i.e., the training samples corresponding to rows that survive the thresholding. Doing so for every non-zero eigenvalue, we arrive at a decimated training set, which represents data at the periphery of each data cluster. Figure 1 shows an example of KMDA with 2D two-class non-linearly-separable samples.
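The selection procedure of Section 3.1 can be sketched as follows (our own illustrative code; the threshold fraction frac is an assumption, since the paper leaves the fraction unspecified):

```python
import numpy as np

def pca_kernel_vectors(K, frac=0.1):
    """PCA-based kernel vector selection: eigendecompose Xi (here taken to
    be the symmetric kernel matrix K), threshold abs(P) at a fraction of
    its largest element, and keep every training sample whose row is
    salient in some column with a non-zero eigenvalue."""
    evals, P = np.linalg.eigh(K)                       # K is symmetric
    nonzero = np.abs(evals) > 1e-10 * np.abs(evals).max()
    coeffs = np.abs(P[:, nonzero])                     # columns with non-zero eigenvalue
    salient = coeffs > frac * coeffs.max()             # salient PCA coefficients
    return np.flatnonzero(salient.any(axis=1))         # surviving sample rows
```

Raising frac decimates the training set more aggressively; the surviving indices are then used as the kernel vectors when forming the reduced kernel features ξ̃_k.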

3.2. Evolutionary Kernel Vector Selection

The second scheme takes advantage of the class labels in the data. We maintain, at every iteration, a set of kernel vectors which are meant to be the key pieces of data for training. M initial kernel vectors KV^(0) are chosen at random. At iteration k, the current kernel vectors KV^(k) are used to perform KMDA, giving the nonlinear projection y^(k) = w^T φ(x) = A_opt^(k)T ξ^(k) of the original data x. We assume a Gaussian distribution θ_j^(k) for each class in the nonlinear discriminating space; the parameters θ_j^(k) can be estimated from {y^(k)}, so that the labeling l^(k) = arg max_j p(l_j | y^(k), θ_j^(k)) and the training error e^(k) can be obtained. If e^(k) < e^(k−1), we randomly select M training samples from the correctly classified ones as the kernel vectors KV^(k+1) for iteration k + 1 (another possibility is that, whenever a current kernel vector is correctly classified, we randomly select a sample in its topological neighborhood to replace it in the next iteration). Otherwise, i.e., if e^(k) ≥ e^(k−1), we terminate. The evolutionary kernel vector selection algorithm is summarized below.

Evolutionary Kernel Vector Selection: given a set of training data D = (X, L) = {(x_i, l_i), i = 1, …, N}, identify a set of M kernel vectors KV = {v_j, j = 1, …, M}.

    k = 0;  e = 1;  KV^(0) = random_pick(X);          // Initialization
    do {
        A_opt^(k) = KMDA(X, KV^(k));                  // Perform KMDA
        Y^(k) = Proj(X, A_opt^(k));                   // Project X
        θ^(k) = Bayes(Y^(k), L);                      // Bayesian classifier
        L^(k) = Labeling(Y^(k), θ^(k));               // Classification
        e^(k) = Error(L^(k), L);                      // Calculate error
        if (e^(k) < e)
            e = e^(k);  KV = KV^(k);
            KV^(k+1) = random_pick({x_i : l_i^(k) = l_i});
            k = k + 1;

        else
            break;
    }
    return KV;

4. KERNEL DEM ALGORITHM

In this paper, image classification is formulated as a transductive problem: to generalize the mapping function learned from the labeled training data set L to a specific unlabeled data set U. We assume here that L and U are from the same distribution, which is reasonable because, in content-based image retrieval, the query images are drawn from the same image database. In short, image retrieval is to classify the images in the database by:

    y_i = arg max_{c_j, j=1..C} p(y_i = c_j | x_i, L, U),  x_i ∈ U    (20)

where C is the number of classes and y_i is the class label of x_i. The expectation-maximization (EM) [12] approach can be applied to this transductive learning problem, since the labels of the unlabeled data can be treated as missing values. We assume that the hybrid data set is drawn from a mixture density distribution of C components {c_j, j = 1, …, C}, parameterized by Θ = {θ_j, j = 1, …, C}. The mixture model can be represented as:

    p(x | Θ) = Σ_{j=1..C} p(x | c_j; θ_j) p(c_j | θ_j)    (21)

where x is a sample drawn from the hybrid data set D = L ∪ U. We make another assumption, namely that each component of the mixture model corresponds to one class, i.e., {y_j = c_j, j = 1, …, C}.

Since the training data set D is the union of the labeled data set L and the unlabeled data set U, the joint probability density of the hybrid data set can be written as:

    p(D | Θ) = Π_{x_i ∈ U} Σ_{j=1..C} p(c_j | Θ) p(x_i | c_j; Θ) · Π_{x_i ∈ L} p(y_i = c_i | Θ) p(x_i | y_i = c_i; Θ)    (22)

This equation holds when we assume that each sample is independent of the others. The first part of equation (22) is for the unlabeled data set and the second part for the labeled data set. The parameters Θ can be estimated by maximizing the a posteriori probability p(Θ | D) or, equivalently, log p(Θ | D). Let l(Θ | D) = log(p(Θ) p(D | Θ)); then we have

    l(Θ | D) = log p(Θ) + Σ_{x_i ∈ U} log( Σ_{j=1..C} p(c_j | Θ) p(x_i | c_j; Θ) ) + Σ_{x_i ∈ L} log( p(y_i = c_i | Θ) p(x_i | y_i = c_i; Θ) )    (23)

Since the log of a sum is hard to deal with, a binary indicator z_i = (z_i1, …, z_iC) is introduced, with z_ij = 1 if and only if y_i = c_j, and z_ij = 0 otherwise, so that

    l(Θ | D, Z) = log p(Θ) + Σ_{x_i ∈ D} Σ_{j=1..C} z_ij log( p(c_j | Θ) p(x_i | c_j; Θ) )    (24)

The EM algorithm can be used to estimate the parameters Θ by an iterative hill-climbing procedure which alternately calculates E(Z), the expected values of the missing labels, and estimates the parameters Θ given E(Z). The EM algorithm generally reaches a local maximum of l(Θ | D). As an extension of the EM algorithm, [11] proposed a three-step algorithm called Discriminant-EM (DEM), which loops over an expectation step, a discriminant step (via MDA), and a maximization step; DEM estimates the parameters of a generative model in a discriminating space. As discussed in Section 2.2, Kernel DEM (KDEM) is a generalization of DEM in which, instead of a simple linear transformation to project the data into discriminant subspaces, the data are first projected nonlinearly into a high-dimensional feature space F where they are better linearly separated. The

nonlinear mapping φ(·) is implicitly determined by the kernel function, which must be chosen in advance. The transformation from the original data space X to the discriminating space, which is a linear subspace of the feature space F, is given by w^T φ(·) implicitly, or by A^T ξ explicitly. A low-dimensional generative model is used to capture the transformed data:

    p(y | Θ) = Σ_{j=1..C} p(w^T φ(x) | c_j; θ_j) p(c_j | θ_j)    (25)

Empirical observations suggest that the transformed data y approximate a Gaussian in the discriminating space, so in our current implementation we use low-order Gaussian mixtures to model the transformed data. Kernel DEM can be initialized by selecting all labeled data as kernel vectors and training a weak classifier based on the labeled samples only. Then the three steps of Kernel DEM are iterated until an appropriate convergence criterion is met:

    E-step: set Z^(k+1) = E[Z | D; Θ^(k)]
    D-step: set A_opt^(k+1) = arg max_A (A^T K_B A) / (A^T K_W A), and project the data points x onto the resulting linear subspace of the feature space F
    M-step: set Θ^(k+1) = arg max_Θ p(Θ | D; Z^(k+1))

The E-step gives probabilistic labels to the unlabeled data, which are then used by the D-step to separate the data. As mentioned above, this assumes that the class distribution is moderately smooth.

5. EXPERIMENTS AND ANALYSIS

In this section, we compare KMDA and KDEM with other supervised learning techniques on various benchmark datasets and synthetic data for several image classification tasks.
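The E-, D-, and M-steps of the KDEM loop in Section 4 can be sketched as a short program. In the sketch below, project_fit is a caller-supplied stand-in for the D- and M-steps (in the paper, KMDA followed by fitting one Gaussian per class in the projected space), and hard labels replace the expected labels E(Z) for brevity; all names are ours, not the authors':

```python
import numpy as np

def kdem(X_lab, y_lab, X_unl, project_fit, n_iter=5):
    """KDEM loop sketch. project_fit(X, y) -> (transform, params) performs
    the D-step and M-step, returning a projection function and per-class
    (mean, variance, prior) parameters; the E-step then relabels the
    unlabeled data with the most probable class under those Gaussians."""
    X = np.vstack([X_lab, X_unl])
    y_unl = np.zeros(len(X_unl), dtype=int)            # weak initial labels
    for _ in range(n_iter):
        y_all = np.concatenate([y_lab, y_unl])
        transform, params = project_fit(X, y_all)      # D-step + M-step
        Y = transform(X_unl)                           # project unlabeled data
        # E-step: per-class diagonal-Gaussian log-likelihood plus log prior
        ll = np.stack([
            np.log(p) - 0.5 * ((Y - m) ** 2 / v + np.log(v)).sum(axis=1)
            for (m, v, p) in params
        ])
        y_unl = ll.argmax(axis=0)
    return y_unl
```

In the full algorithm the posterior weights from the E-step would enter the scatter matrices and the Gaussian updates, rather than the hard arg-max labels used in this sketch.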

5.1. Benchmark Test for KMDA

In the first experiment, we verify the ability of KMDA with our data sampling algorithms. Several benchmark datasets² are used. KMDA is compared with a single RBF classifier (RBF), a support vector machine (SVM), AdaBoost, and the kernel Fisher discriminant (KFD) on the benchmark datasets [33], as well as with linear MDA. Kernel functions that have proven useful include the Gaussian RBF, k(x, z) = exp(−‖x − z‖² / c), and polynomial kernels, k(x, z) = (x^T z)^d, for positive constants c ∈ R and d ∈ N, respectively [22]. In this work, Gaussian RBF kernels are used in all kernel-based algorithms. In Table 1, KMDA_random is KMDA with kernel vectors randomly selected from the training samples, KMDA_pca is KMDA with kernel vectors selected based on PCA, and KMDA_evolutionary is KMDA with kernel vectors selected by the evolutionary scheme. The benchmark test shows that kernel MDA achieves performance comparable to the other state-of-the-art techniques across the different training datasets, in spite of the use of a decimated training set. Comparing the three schemes for selecting kernel vectors, both the PCA-based and the evolutionary scheme work slightly better than random selection, with a smaller error rate and/or a smaller standard deviation. Finally, Table 1 clearly shows the superior performance of KMDA over linear MDA.

5.2. Kernel Setting

Two parameters need to be determined for kernel algorithms using the Gaussian RBF kernel: the variance c and the number of kernel vectors used. To date there is no principled method for choosing a kernel function and its parameters, so in the second experiment we determine the variance c and the number of kernel vectors empirically, using the Gaussian RBF kernel as an example and the same benchmark dataset as in Section 5.1. Figure 2 shows the average error rate (in percent) of KDEM with the Gaussian RBF kernel for different variances c and varying numbers of kernel vectors on the Heart

data. By empirical observation, we find that c = 10 with 20 kernel vectors gives nearly the best performance at a relatively low computational cost. Similar results are obtained in tests on the Breast-Cancer data and the Banana data, so this kernel setting is used in the rest of our experiments.

5.3. Image Classification on Corel Photos

In the third experiment, a series of classification methods are compared in a photo classification scenario: MDA, BDA, DEM, KMDA, KBDA, and KDEM. The stock photos are selected from the widely used COREL dataset, which covers a wide range of more than 500 categories, from animals and birds to Tibet and the Czech Republic. In this test, we use Corel photos from 15 classes with 99 images in each class. Four randomly selected classes of images are used in the experiment: one class is considered positive while the other 3 classes are considered negative. The training set consists of 40 to 80 images, half positive and half negative. The rest of the images in the database (1485) are all used as the testing set.

We first describe the visual features used: color moments [34], wavelet-based texture [35], and the water-filling edge-based structure feature [36]. The color space we use is HSV because of its de-correlated coordinates and its perceptual uniformity. We extract the first three moments (mean, standard deviation, and skewness) from the three color channels and therefore have a color feature vector of length 3 × 3 = 9. For the wavelet-based texture, the original image is fed into a wavelet filter bank and decomposed into 10 de-correlated sub-bands. Each sub-band captures the characteristics of a certain scale and orientation of the original image; for each sub-band, we extract the standard deviation of the wavelet coefficients and therefore have a texture feature vector of length 10. For the water-filling edge-based structure feature vector, we first pass the original images through an edge detector to generate their corresponding edge maps.
We extract eighteen (18) elements from the edge maps, including max fill time, max fork count, etc. For a complete description of this edge feature vector, interested readers

² The benchmark data sets are obtained from

are referred to [36]. In our experiments, the 37 visual features (9 color moments, 10 wavelet moments, and 18 water-filling features) are pre-extracted from the image database and stored off-line.

Table 2 compares the classification methods MDA, BDA, DEM, and their kernel versions. Several conclusions can be drawn: (1) The classification error drops with increasing training set size for all methods. (2) All the kernel algorithms perform better than their linear counterparts, showing that the kernel machine has a better capacity than linear approaches for separating linearly non-separable data. (3) KDEM performs best among all the methods (comparable to KMDA at training size 80), showing that the incorporation of unlabeled data and the nonlinear mapping contribute to the improved performance. (4) DEM performs better than MDA, showing that the incorporation of unlabeled data in semi-supervised learning helps improve classification accuracy. (5) The results for BDA and DEM are mixed, but KDEM performs better than KBDA; we compare KBDA and KDEM further in the next section. As to computational complexity, Table 2 also shows the actual execution time of each algorithm. The Matlab programs were run on a Pentium IV processor with a 2.4 GHz CPU and 256 MB of memory. KDEM is the most computationally complex of all the algorithms, but the extra cost introduced by semi-supervised learning (i.e., EM) and the nonlinear mapping is still affordable, e.g., within 3 seconds for a training set of size 80 and a testing set of size 1485. Finally, a summary of the advantages and disadvantages of the discriminant algorithms MDA, BDA, DEM, KMDA, KBDA, and KDEM is given in Table 3.

5.4. KDEM versus KBDA for Image Classification

As reported in [13], biased discriminant analysis (BDA) has achieved satisfactory results in content-based image retrieval when the number of training samples is small (<20).
BDA differs from traditional MDA in that it tends to cluster all the positive samples while scattering all the negative samples away from the centroid of the positive examples. This works very well with relatively small training sets. However, BDA is biased towards the centroid of the positive examples. It will be effective only if these positive examples are the most-informative images [13, 14], e.g., images close to the classification boundary. If the positive examples are instead most-positive images [13, 14], e.g., images far away from the classification boundary, the optimal transformation found from them will not help the classification of images on the boundary. Moreover, BDA ignores the unlabeled data and uses only the labeled data in learning.

In the fourth experiment, Kernel DEM (KDEM) is further compared with Kernel BDA (KBDA) on both an image database and synthetic data. Figure 3 shows the average classification error rate in percentage for KDEM and KBDA with the same RBF kernel for face and non-face classification. The face images are from the MIT facial image database [37] and the non-face images are from the Corel database. There are 2,429 face images from the MIT database and 1,485 non-face images (15 classes with about 99 in each class) from the Corel database. Some examples of face and non-face images are shown in Figure 4. For the training dataset, 100 face images are randomly selected from the MIT database, and a varying number of 5 to 200 non-face images are randomly selected from the Corel database. The testing dataset consists of 200 randomly selected images (100 faces and 100 non-faces) from the two databases. The images are resized to 16×16 and converted to column-wise concatenated feature vectors.

In Figure 3, when the number of negative examples is small (<20), KBDA outperforms KDEM, while KDEM performs better when more negative examples are provided. This agrees with our expectation. In the above experiment, the number of negative examples is increased from 5 to 200, so there is a possibility that most of the negative examples come from the same class. To further test the capability of KDEM and KBDA in classifying negative examples with a varying number of classes, we perform experiments on synthetic data, for which we have more control over the data distribution.
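The synthetic setups used in the following two experiments (random Gaussian classes, and optionally equally correlated features) can be sketched as below. This is our own minimal generator under the stated assumptions, not the exact code behind the reported numbers; the correlation trick follows the standard Cholesky approach described in [39], and the per-class spread drawn from [0.1, 0.3] is an assumption matching the text.

```python
import numpy as np

def make_synthetic(dim=10, n_neg_classes=3, n_per_side=100, rho=0.0, rng=None):
    """Positives centered at the origin, negative class centroids at unit
    distance from the origin, per-class spread drawn from [0.1, 0.3].
    rho > 0 adds equal pairwise feature correlation via a Cholesky factor."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Cholesky factor of the equicorrelation matrix (identity if rho == 0)
    C = np.linalg.cholesky((1 - rho) * np.eye(dim) + rho * np.ones((dim, dim)))

    def sample(n, center):
        std = rng.uniform(0.1, 0.3)              # random class spread
        return center + std * rng.standard_normal((n, dim)) @ C.T

    X_pos = sample(n_per_side, np.zeros(dim))
    X_neg = []
    for _ in range(n_neg_classes):
        c = rng.standard_normal(dim)
        c /= np.linalg.norm(c)                   # random centroid at distance 1
        X_neg.append(sample(n_per_side // n_neg_classes, c))
    X = np.vstack([X_pos] + X_neg)
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X) - len(X_pos))]
    return X, y
```

With `rho = 0` the features are independent, matching the multi-class experiment below; with, e.g., `rho = 0.6` all pairwise feature correlations within a class are about 0.6, matching the correlated-features experiment. (The negative side is split evenly across classes, so its total may fall just short of `n_per_side` when the division is not exact.)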
A series of synthetic datasets is generated based on Gaussian or Gaussian mixture models, with feature dimensions of 2, 5 and 10 and the number of negative classes varying from 1 to 9. In the feature space, the centroid of the positive samples is set at the origin and the centroids of the negative classes are set randomly at distance 1 from the origin. The variance of each class is a random number between 0.1 and 0.3. The features are independent of each other. We include the 2-D synthetic data for visualization purposes. Both the training and testing sets have a fixed size of 200 samples, with 100 positive samples and 100 negative samples drawn from the varying number of classes.

Figure 5 compares the KDEM, KBDA and DEM algorithms on the 2-D, 5-D and 10-D synthetic data. In all cases, as the number of negative classes increases from 1 to 9, KDEM always performs better than KBDA and DEM, which shows its superior capability for multi-class classification. Linear DEM has performance comparable to KBDA on the 2-D synthetic data and outperforms KBDA on the 10-D synthetic data. One possible reason is that learning is on hybrid data in both DEM and KDEM, while only labeled data is used in KBDA. This indicates that proper incorporation of unlabeled data in semi-supervised learning does improve classification to some extent.

Correlated versus Independent Features

Feature independence is usually assumed in high-dimensional feature spaces for training samples. However, this assumption is not always true in practice. One such example is the color correlogram [38], a feature commonly used in content-based image retrieval. In the fifth experiment, we investigate how feature correlation affects algorithm performance in two-class classification problems such as image retrieval. Two classes of samples, i.e., positive and negative, are synthesized based on a Gaussian mixture model (C = 2) with feature dimension 10, variance 1, and distance 2 between the centroids of the two classes. The method for generating correlated features is described in [39].

Table 4 shows the average error rate for different pair-wise correlation coefficients ρ among the positive examples and the negative examples. For example, the pair (0.1, 0.6) means the correlation among all pair-wise feature components is 0.1 for the negative examples and 0.6 for the positive examples. From Table 4, the corresponding error rate of KDEM is 4.05%. Clearly, when features are highly correlated, it becomes easier for the Kernel DEM algorithm to separate the positive examples from the negative examples, because the data are more clustered under larger correlation. Similar results are obtained for the linear DEM algorithm. This is reasonable, and an illustrative example is visualized in Figure 6. c1 and c2 represent the centroids of the 2-D two-class samples. When the features of the two classes are independent of each other, their distributions can be illustrated by the two circles in blue and red, respectively. Using discriminant analysis, the optimal transformation will be projection 1 onto a 1-D subspace. However, the two classes will still have some overlap between B and C along the projected 1-D subspace, and this will result in classification error. But when the features of the two classes are highly correlated, the distributions of the two classes are illustrated by the two ellipses in blue and red, respectively. The optimal transformation found by discriminant analysis will then be projection 2. In this case, the projected samples of the two classes have no overlap and can be easily separated in the 1-D subspace.

Hand Posture Recognition

In the sixth experiment, we examine KDEM on a hand gesture recognition task. The task is to classify among 14 different hand postures, each of which represents a gesture command, such as navigating, pointing, grasping, etc. Our raw dataset consists of 14,000 unlabeled hand images together with 560 labeled images (approximately 40 labeled images per hand posture), most from video of subjects making each of the hand postures. These 560 labeled images are used to test the classifiers by calculating the classification errors. Hands are localized in the video sequences by adaptive color segmentation, and the hand regions are cropped and converted to gray-level images. Gabor wavelet filters [40] with 3 levels and 4 orientations are used to extract 12 texture features.
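The 12 Gabor texture features just mentioned (3 scales × 4 orientations, one statistic per filter response) can be sketched as follows; the filter size, frequencies, bandwidths, and the mean-magnitude statistic are illustrative guesses, not the parameters actually used with [40]:

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    # Real-valued Gabor: Gaussian envelope times an oriented cosine carrier
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr)

def gabor_features(img, n_scales=3, n_orients=4):
    """12 texture features (3 scales x 4 orientations): the mean absolute
    filter response over the image, computed in the frequency domain."""
    feats = []
    for s in range(n_scales):
        freq = 0.25 / (2 ** s)               # halve the carrier frequency per scale
        for o in range(n_orients):
            theta = o * np.pi / n_orients
            k = gabor_kernel(15, freq, theta, sigma=3.0 * 2 ** s)
            # circular convolution via FFT (adequate for a feature sketch)
            K = np.fft.fft2(k, s=img.shape)
            resp = np.real(np.fft.ifft2(np.fft.fft2(img) * K))
            feats.append(np.abs(resp).mean())
    return np.array(feats)
```

Applied to a cropped gray-level hand image, this yields the 12-dimensional texture part of the I-feature vector; the shape and edge statistics described next are concatenated to it.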
Ten coefficients from the Fourier descriptor of the occluding contour are used to represent the hand shape. We also use area, contour length, total edge length, density, and 2nd moments of the edge distribution, for a total of 28 low-level image features (I-features). For comparison, we also represent images by the coefficients of the 22 largest principal components of the resized dataset (these are eigenimages, or E-features). In our experiments, we use 140 labeled images (i.e., 10 for each hand posture) and 10,000 unlabeled images (randomly selected from the whole database) for training both EM and DEM. Table 5 shows the comparison.

Six classification algorithms are compared in this experiment. The multi-layer perceptron [41] used here has one hidden layer of 25 nodes. We experiment with two schemes of the nearest-neighbor classifier: one uses just the 140 labeled samples, and the other uses these 140 labeled samples to bootstrap the classifier by a growing scheme, in which newly labeled samples are added to the classifier according to their labels. We observe that multi-layer perceptrons are often trapped in local minima, and nearest neighbors suffers from the sparsity of the labeled templates. The poor performance of pure EM is due to the fact that the generative model does not capture the ground-truth distribution well, since the underlying data distribution is highly complex. It is not surprising that linear DEM and KDEM outperform the other methods, since the D-step optimizes the separability of the classes. Comparing KDEM with DEM, we find that KDEM often projects classes to approximately Gaussian clusters in the transformed space, which facilitates modeling them with Gaussians. Figure 7 shows typical transformed datasets for linear and nonlinear discriminant analysis, in a projected 2-D subspace of three different hand postures. The different postures are more separated and clustered in the nonlinear subspace found by KMDA. Figure 8 shows some examples of correctly classified and mislabeled hand postures for KDEM and linear DEM.

6. CONCLUSIONS

Two sampling schemes are proposed for efficient, kernel-based, nonlinear, multiple discriminant analysis. These algorithms identify a representative subset of the training samples for the purpose of classification.

Benchmark tests show that KMDA with these adaptations not only outperforms linear MDA but also performs comparably with the best known supervised learning algorithms. We also present a self-supervised discriminant analysis technique, Kernel DEM (KDEM), which employs both labeled and unlabeled data in training. On real image databases and synthetic data, across several applications of image classification, KDEM shows superior performance over biased discriminant analysis (BDA), naïve supervised learning, and existing semi-supervised learning algorithms.

Our future work includes several aspects. (1) We will look into advanced regularization schemes for the discriminant-based approaches. (2) We will integrate biased discriminant analysis with multiple discriminant analysis in a unified framework, so that their advantages can be combined and their disadvantages mutually compensated. (3) To avoid heavy computation over the whole database, we will investigate schemes for selecting a representative subset of the unlabeled data whenever unlabeled data helps. (4) Gaussian or Gaussian mixture models are assumed for the data distribution in the projected optimal subspace, even when the initial data distribution is highly non-Gaussian; we will examine this data modeling issue more closely with Gaussian (or Gaussian mixture) and non-Gaussian distributions.

ACKNOWLEDGEMENT

This work was supported in part by the University of Texas at San Antonio, and by the National Science Foundation (NSF) under grant IIS at Northwestern University.

References

1. A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, Content-based image retrieval at the end of the early years, IEEE Trans. PAMI, vol. 22, no. 12, pp. 1349-1380, December 2000.
2. K. Tieu, and P. Viola, Boosting image retrieval, Proc. of IEEE Conf. CVPR, June 2000.

3. I. J. Cox, M. L. Miller, T. P. Minka, and T. V. Papathomas, The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments, IEEE Trans. Image Processing, vol. 9, no. 1, 2000.
4. S. Tong, and E. Chang, Support vector machine active learning for image retrieval, Proc. of ACM Intl. Conf. Multimedia, pp. 107-118, Ottawa, Canada, Oct. 2001.
5. Q. Tian, P. Hong, and T. S. Huang, Update relevant image weights for content-based image retrieval using support vector machines, Proc. of IEEE Intl Conf. Multimedia and Expo, vol. 2, NY, 2000.
6. Y. Rui, and T. S. Huang, Optimal learning in image retrieval, Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, South Carolina, June 2000.
7. N. Vasconcelos, and A. Lippman, Bayesian relevance feedback for content-based image retrieval, Proc. of IEEE Workshop on Content-based Access of Image and Video Libraries, CVPR'00, Hilton Head Island, June 2000.
8. C. Yang, and T. Lozano-Pérez, Image database retrieval with multiple-instance learning techniques, Proc. of the 16th Intl Conf. Data Engineering, San Diego, CA, 2000.
9. J. Laaksonen, M. Koskela, and E. Oja, PicSOM: self-organizing maps for content-based image retrieval, Proc. IEEE Intl Conf. Neural Networks, Washington, DC, 1999.
10. Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, Relevance feedback: a power tool in interactive content-based image retrieval, IEEE Trans. Circuits and Systems for Video Tech., vol. 8, no. 5, Sept. 1998.
11. Y. Wu, Q. Tian, and T. S. Huang, Discriminant EM algorithm with application to image retrieval, Proc. of IEEE Conf. Computer Vision and Pattern Recognition, South Carolina, June 13-15, 2000.
12. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition, John Wiley & Sons, Inc., 2001.
13. X. Zhou, and T. S. Huang, Small sample learning during multimedia retrieval using BiasMap, Proc. of IEEE Conf. Computer Vision and Pattern Recognition, Hawaii, December 2001.
14. V. Vapnik, The Nature of Statistical Learning Theory, second edition, Springer-Verlag, 1999.
15. F. G. Cozman, and I. Cohen, Unlabeled data can degrade classification performance of generative classifiers, Proc. of 15th Intl Florida Artificial Intelligence Society Conf., Florida, 2002.

16. K. Nigam, A. K. McCallum, S. Thrun, and T. M. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning, 39(2/3):103-134, 2000.
17. T. Mitchell, The role of unlabeled data in supervised learning, Proc. Sixth Intl Colloquium on Cognitive Science, Spain, 1999.
18. I. Cohen, N. Sebe, F. G. Cozman, M. C. Cirelo, and T. S. Huang, Learning Bayesian network classifiers for facial expression recognition with both labeled and unlabeled data, Proc. of IEEE Conf. Computer Vision and Pattern Recognition, Madison, WI, 2003.
19. L. Wang, K. L. Chan, and Z. Zhang, Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval, Proc. of IEEE Conf. CVPR, Madison, WI, 2003.
20. Y. Wu, and T. S. Huang, Self-supervised learning for object recognition based on kernel discriminant-EM algorithm, Proc. of IEEE International Conf. on Computer Vision, Canada, 2001.
21. Q. Tian, J. Yu, Y. Wu, and T. S. Huang, Learning based on kernel discriminant-EM algorithm for image classification, IEEE Intl Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2004), May 17-24, 2004, Montreal, Quebec, Canada.
22. B. Schölkopf, and A. J. Smola, Learning with Kernels, Cambridge, Mass.: MIT Press, 2002.
23. L. Wolf, and A. Shashua, Kernel principal angles for classification machines with applications to image sequence interpretation, Proc. of IEEE Conf. CVPR, Madison, WI, 2003.
24. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R. Müller, Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 5, May 2003.
25. R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, vol. 7, pp. 179-188, 1936.
26. K. I. Diamantaras, and S. Y. Kung, Principal Component Neural Networks, New York: Wiley, 1996.
27. A. M. Martínez, and A. C. Kak, PCA versus LDA, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, February 2001.
28. R. A. Fisher, The statistical utilization of multiple measurements, Annals of Eugenics, vol. 8, 1938.
29. B. Schölkopf, A. Smola, and K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, vol. 10, pp. 1299-1319, 1998.

30. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R. Müller, Invariant feature extraction and classification in kernel spaces, Proc. of Neural Information Processing Systems, Denver, Nov. 1999.
31. V. Roth, and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Proc. of Neural Information Processing Systems, Denver, Nov. 1999.
32. J. Friedman, Regularized discriminant analysis, Journal of the American Statistical Association, vol. 84, no. 405, pp. 165-175, 1989.
33. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R. Müller, Fisher discriminant analysis with kernels, Proc. of IEEE Workshop on Neural Networks for Signal Processing, 1999.
34. M. Stricker, and M. Orengo, Similarity of color images, Proceedings of SPIE Storage and Retrieval for Image and Video Databases, San Diego, CA, 1995.
35. J. R. Smith, and S. F. Chang, Transform features for texture classification and discrimination in large image databases, Proceedings of IEEE Intl Conference on Image Processing, Austin, TX, 1994.
36. X. S. Zhou, Y. Rui, and T. S. Huang, Water-filling algorithm: a novel way for image feature extraction based on edge maps, Proc. of IEEE Intl Conf. on Image Processing, Kobe, Japan, 1999.
37. CBCL Face Database #1, MIT Center for Biological and Computational Learning.
38. J. Huang, S. R. Kumar, M. Mitra, W. J. Zhu, and R. Zabih, Image indexing using color correlogram, Proc. of IEEE Conf. CVPR, pp. 762-768, June 1997.
39. M. Haugh, Generating correlated random variables and stochastic processes, Lecture Notes 5, Monte Carlo Simulation IEOR E4703, Columbia University, March 2004.
40. A. K. Jain, and F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition, vol. 24, no. 12, pp. 1167-1186, 1991.
41. S. Haykin, Neural Networks: A Comprehensive Foundation, second edition, Prentice Hall, New Jersey, 1999.

Note: The figures in this paper are better viewed in color.

Figure 1. KMDA with a 2D two-class nonlinearly separable example: (a) original data; (b) the kernel features of the data; (c) the normalized coefficients of PCA on Ξ, in which only a small number are large (in black); (d) the nonlinear mapping.

Figure 2. The average error rate for KDEM with Gaussian RBF kernel under varying variance c and number of kernel vectors, on the Heart data.

Figure 3. Comparison of KDEM and KBDA for face and non-face classification.

Figure 4. Examples of (a) face images from the MIT facial database and (b) non-face images from the Corel database.

Figure 5. Comparison of the KDEM, KBDA and DEM algorithms on (a) 2-D, (b) 5-D and (c) 10-D synthetic data with a varying number of negative classes.

Figure 6. An illustrative example of the optimal 1-D subspace projection of correlated and independent features for 2-D two-class samples.

Figure 7. Data distribution in the projected subspace: (a) linear MDA; (b) kernel MDA. The different postures are more separated and clustered in the nonlinear subspace found by KMDA.

Figure 8. (a) Some images correctly classified by both DEM and KDEM; (b) images mislabeled by DEM but correctly labeled by KDEM; (c) images that neither DEM nor KDEM can correctly label.


More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS P.G. Demdov Yaroslavl State Unversty Anatoly Ntn, Vladmr Khryashchev, Olga Stepanova, Igor Kostern EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS Yaroslavl, 2015 Eye

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Modular PCA Face Recognition Based on Weighted Average

Modular PCA Face Recognition Based on Weighted Average odern Appled Scence odular PCA Face Recognton Based on Weghted Average Chengmao Han (Correspondng author) Department of athematcs, Lny Normal Unversty Lny 76005, Chna E-mal: hanchengmao@163.com Abstract

More information

Learning a Class-Specific Dictionary for Facial Expression Recognition

Learning a Class-Specific Dictionary for Facial Expression Recognition BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 4 Sofa 016 Prnt ISSN: 1311-970; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-016-0067 Learnng a Class-Specfc Dctonary for

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Comparison Study of Textural Descriptors for Training Neural Network Classifiers

Comparison Study of Textural Descriptors for Training Neural Network Classifiers Comparson Study of Textural Descrptors for Tranng Neural Network Classfers G.D. MAGOULAS (1) S.A. KARKANIS (1) D.A. KARRAS () and M.N. VRAHATIS (3) (1) Department of Informatcs Unversty of Athens GR-157.84

More information

Orthogonal Complement Component Analysis for Positive Samples in SVM Based Relevance Feedback Image Retrieval

Orthogonal Complement Component Analysis for Positive Samples in SVM Based Relevance Feedback Image Retrieval Orthogonal Complement Component Analyss for ostve Samples n SVM Based Relevance Feedback Image Retreval Dacheng Tao and Xaoou Tang Department of Informaton Engneerng The Chnese Unversty of Hong Kong {dctao2,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

PRÉSENTATIONS DE PROJETS

PRÉSENTATIONS DE PROJETS PRÉSENTATIONS DE PROJETS Rex Onlne (V. Atanasu) What s Rex? Rex s an onlne browser for collectons of wrtten documents [1]. Asde ths core functon t has however many other applcatons that make t nterestng

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Towards Semantic Knowledge Propagation from Text to Web Images

Towards Semantic Knowledge Propagation from Text to Web Images Guoun Q (Unversty of Illnos at Urbana-Champagn) Charu C. Aggarwal (IBM T. J. Watson Research Center) Thomas Huang (Unversty of Illnos at Urbana-Champagn) Towards Semantc Knowledge Propagaton from Text

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 2 Sofa 2016 Prnt ISSN: 1311-9702; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-2016-0017 Hybrdzaton of Expectaton-Maxmzaton

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Incremental MQDF Learning for Writer Adaptive Handwriting Recognition 1

Incremental MQDF Learning for Writer Adaptive Handwriting Recognition 1 200 2th Internatonal Conference on Fronters n Handwrtng Recognton Incremental MQDF Learnng for Wrter Adaptve Handwrtng Recognton Ka Dng, Lanwen Jn * School of Electronc and Informaton Engneerng, South

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Image Alignment CSC 767

Image Alignment CSC 767 Image Algnment CSC 767 Image algnment Image from http://graphcs.cs.cmu.edu/courses/15-463/2010_fall/ Image algnment: Applcatons Panorama sttchng Image algnment: Applcatons Recognton of object nstances

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM

Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM Classfcaton of Face Images Based on Gender usng Dmensonalty Reducton Technques and SVM Fahm Mannan 260 266 294 School of Computer Scence McGll Unversty Abstract Ths report presents gender classfcaton based

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Brushlet Features for Texture Image Retrieval

Brushlet Features for Texture Image Retrieval DICTA00: Dgtal Image Computng Technques and Applcatons, 1 January 00, Melbourne, Australa 1 Brushlet Features for Texture Image Retreval Chbao Chen and Kap Luk Chan Informaton System Research Lab, School

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

Two-Dimensional Supervised Discriminant Projection Method For Feature Extraction

Two-Dimensional Supervised Discriminant Projection Method For Feature Extraction Appl. Math. Inf. c. 6 No. pp. 8-85 (0) Appled Mathematcs & Informaton cences An Internatonal Journal @ 0 NP Natural cences Publshng Cor. wo-dmensonal upervsed Dscrmnant Proecton Method For Feature Extracton

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Semi-Supervised Discriminant Analysis Based On Data Structure

Semi-Supervised Discriminant Analysis Based On Data Structure IOSR Journal of Computer Engneerng (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. VII (May Jun. 2015), PP 39-46 www.osrournals.org Sem-Supervsed Dscrmnant Analyss Based On Data

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

Hybrid Non-Blind Color Image Watermarking

Hybrid Non-Blind Color Image Watermarking Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,

More information

Feature Selection for Target Detection in SAR Images

Feature Selection for Target Detection in SAR Images Feature Selecton for Detecton n SAR Images Br Bhanu, Yngqang Ln and Shqn Wang Center for Research n Intellgent Systems Unversty of Calforna, Rversde, CA 95, USA Abstract A genetc algorthm (GA) approach

More information

PCA Based Gait Segmentation

PCA Based Gait Segmentation Honggu L, Cupng Sh & Xngguo L PCA Based Gat Segmentaton PCA Based Gat Segmentaton Honggu L, Cupng Sh, and Xngguo L 2 Electronc Department, Physcs College, Yangzhou Unversty, 225002 Yangzhou, Chna 2 Department

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Applying EM Algorithm for Segmentation of Textured Images

Applying EM Algorithm for Segmentation of Textured Images Proceedngs of the World Congress on Engneerng 2007 Vol I Applyng EM Algorthm for Segmentaton of Textured Images Dr. K Revathy, Dept. of Computer Scence, Unversty of Kerala, Inda Roshn V. S., ER&DCI Insttute

More information

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity ISSN(Onlne): 2320-9801 ISSN (Prnt): 2320-9798 Internatonal Journal of Innovatve Research n Computer and Communcaton Engneerng (An ISO 3297: 2007 Certfed Organzaton) Vol.2, Specal Issue 1, March 2014 Proceedngs

More information