Probabilistic Visual Learning for Object Representation


IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 19, NO. 7, JULY 1997, p. 696

Baback Moghaddam, Student Member, IEEE, and Alex Pentland, Member, IEEE

The authors are with the Perceptual Computing Section, The Media Laboratory, Massachusetts Institute of Technology, 20 Ames Street, Cambridge, MA. E-mail: {baback, sandy}@media.mit.edu. Manuscript received 3 Nov. Recommended for acceptance by J. Daugman. For information on obtaining reprints of this article, please send e-mail to: transpami@computer.org, and reference the IEEECS Log Number.

Abstract: We present an unsupervised technique for visual learning, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a Mixture-of-Gaussians model (for multimodal distributions). These probability densities are then used to formulate a maximum-likelihood estimation framework for visual search and target detection for automatic object recognition and coding. Our learning technique is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects, such as hands.

Index Terms: Face recognition, gesture recognition, target detection, subspace methods, maximum-likelihood, density estimation, principal component analysis, Eigenfaces.

1 INTRODUCTION

Visual attention is the process of restricting higher-level processing to a subset of the visual field, referred to as the focus-of-attention (FOA). The critical component of visual attention is the selection of the FOA. In humans, this process is not based purely on bottom-up processing and is, in fact, goal-driven. The measure of interest or saliency is modulated by the behavioral state and the demands of the particular visual task that is currently active. Palmer [25] has suggested that visual attention is the process of locating the object of interest and placing it in a canonical (or object-centered) reference frame suitable for recognition (or template matching). We have developed a computational technique for automatic object recognition, which is in accordance with Palmer's model of visual attention (see Section 4.1). The system uses a probabilistic formulation for the estimation of the position and scale of the object in the visual field and remaps the FOA to an object-centered reference frame, which is subsequently used for recognition and verification.

At a simple level, the underlying mechanism of attention during a visual search task can be based on a spatiotopic saliency map $S(i, j)$, which is a function of the image information in a local region $R$

$$S(i, j) = f\left[\{I(i + r, j + c) : (r, c) \in R\}\right] \tag{1}$$

For example, saliency maps have been constructed which employ spatio-temporal changes as cues for foveation [] or other low-level image features, such as local symmetry for detection of interest points [32]. However, bottom-up techniques based on low-level features lack context with respect to high-level visual tasks, such as object recognition. In a recognition task, the selection of the FOA is driven by higher-level goals and, therefore, requires internal representations of an object's appearance and a means of comparing candidate objects in the FOA to the stored object models.

In view-based recognition (as opposed to 3D geometric or invariant-based recognition), the saliency can be formulated in terms of visual similarity, using a variety of metrics ranging from simple template matching scores to more sophisticated measures using, for example, robust statistics for image correlation [5]. In this paper, however, we are primarily interested in saliency maps which have a probabilistic interpretation as object-class membership functions or likelihoods. These likelihood functions are learned by applying density estimation techniques in complementary subspaces obtained by an eigenvector decomposition.
Our approach to this learning problem is view-based, i.e., the learning and modeling of the visual appearance of the object from a (suitably normalized and preprocessed) set of training imagery. Fig. 1 shows examples of the automatic selection of FOA for detection of faces and hands. In each case, the target object's probability distribution was learned from training views and then subsequently used in computing likelihoods for detection. The face representation is based on appearance (normalized grayscale image), whereas the hand's representation is based on the shape of its contour. The maximum likelihood (ML) estimates of position and scale are shown in the figure by the cross-hairs and bounding box, respectively.

1.1 Object Detection

The standard detection paradigm in image processing is that of normalized correlation or template matching. However, this approach is only optimal in the simplistic case of a deterministic signal embedded in additive white Gaussian noise. When we begin to consider a target class detection problem (e.g., finding a generic human face or a human hand in a scene), we must incorporate the underlying probability distribution of the object. Subspace methods


and eigenspace decompositions are particularly well-suited to such a task since they provide a compact and parametric description of the object's appearance and also automatically identify the degrees-of-freedom of the underlying statistical variability.

Fig. 1. (a) Input image, (b) face detection, (c) input image, (d) hand detection.

In particular, the eigenspace formulation leads to a powerful alternative to standard detection techniques such as template matching or normalized correlation. The reconstruction error (or residual) of the eigenspace decomposition (referred to as the distance-from-face-space in the context of the work with eigenfaces [36]) is an effective indicator of similarity. The residual error is easily computed using the projection coefficients and the original signal energy. This detection strategy is equivalent to matching with a linear combination of eigentemplates and allows for a greater range of distortions in the input signal (including lighting, and moderate rotation and scale). In a statistical signal detection framework, the use of eigentemplates has been shown to yield superior performance in comparison with standard matched filtering [8], [27].

In [27], we used this formulation for a modular eigenspace representation of facial features where the corresponding residual, referred to as distance-from-feature-space or DFFS, was used for localization and detection. Given an input image, a saliency map was constructed by computing the DFFS at each pixel. When using $M$ eigenvectors, this requires $M$ convolutions (which can be efficiently computed using an FFT) plus an additional local energy computation. The global minimum of this distance map was then selected as the best estimate of the location of the target.

In this paper, we will show that the DFFS can be interpreted as an estimate of a marginal component of the probability density of the object and that a complete estimate must also incorporate a second marginal density based on a complementary distance-in-feature-space (DIFS). Using our estimates of the object densities, we formulate the problem of target detection from the point of view of a maximum likelihood (ML) estimation problem. Specifically, given the visual field, we estimate the position (and scale) of the image region which is most representative of the target of interest. Computationally, this is achieved by sliding an m-by-n observation window throughout the image and at each location computing the likelihood that the local subimage $x$ is an instance of the target class $\Omega$, i.e., $P(x \mid \Omega)$. After this probability map is computed, we select the location corresponding to the highest likelihood as our ML estimate of the target location. Note that the likelihood map can be evaluated over the entire parameter space affecting the object's appearance, which can include transformations such as scale and rotation.

1.2 Relationship to Previous Research

In recent years, computer vision research has witnessed a growing interest in eigenvector analysis and subspace decomposition methods. In particular, eigenvector decomposition has been shown to be an effective tool for solving problems which use high-dimensional representations of phenomena which are intrinsically low-dimensional. This general analysis framework lends itself to several closely related formulations in object modeling and recognition, which employ the principal modes or characteristic degrees-of-freedom for description. The identification and parametric representation of a system in terms of these principal modes is at the core of recent advances in physically-based modeling [26], correspondence and matching [34], and parametric descriptions of shape [7]. Eigenvector-based methods also form the basis for data analysis techniques in pattern recognition and statistics where they are used to extract low-dimensional subspaces comprised of statistically uncorrelated variables which tend to simplify tasks such as classification.
The Karhunen-Loeve Transform (KLT) [9] and Principal Components Analysis (PCA) [4] are examples of eigenvector-based techniques which are commonly used for dimensionality reduction and feature extraction in pattern recognition. In computer vision, eigenvector analysis of imagery has been used for characterization of human faces [7] and automatic face recognition using eigenfaces [36], [27]. More recently, principal component analysis of imagery has also been applied for robust target detection [27], [6], nonlinear image interpolation [3], visual learning for object recognition [23], [38], as well as visual servoing for robotics [24].

Specifically, Murase and Nayar [23] used a low-dimensional parametric eigenspace for recovering object identity and pose by matching views to a spline-based hypersurface. Nayar et al. [24] have extended this technique to visual feedback control and servoing for a robotic arm in peg-in-the-hole insertion tasks. Pentland et al. [27] proposed a view-based multiple-eigenspace technique for face recognition under varying pose as well as for the detection and description of facial features. Similarly, Burl et al. [6] used Bayesian classification for object detection using a feature vector derived from principal component images. Weng [38] has proposed a visual learning framework based on the KLT in conjunction with an optimal linear discriminant transform for learning and recognition of objects from 2D views.

However, these authors (with the exception of [27]) have used eigenvector analysis primarily as a dimensionality reduction technique for subsequent modeling, interpolation, or classification. In contrast, our method uses an eigenspace decomposition as an integral part of an efficient technique for probability density estimation of high-dimensional data.

2 DENSITY ESTIMATION IN EIGENSPACE

In this section, we present our recent work using eigenspace decompositions for object representation and modeling. Our learning method estimates the complete probability distribution of the object's appearance using an eigenvector decomposition of the image space. The desired target density is decomposed into two components: the density in the principal subspace (containing the traditionally-defined principal components) and its orthogonal complement (which is usually discarded in standard PCA). We derive the form for an optimal density estimate for the case of Gaussian data and a near-optimal estimator for arbitrarily complex distributions in terms of a Mixture-of-Gaussians density model. We note that this learning method differs from supervised visual learning with function approximation networks [30] in which a hypersurface representation of an input/output map is automatically learned from a set of training examples. Instead, we use a probabilistic formulation which combines the two standard paradigms of unsupervised learning, PCA and density estimation, to arrive at a computationally feasible estimate of the class conditional density function.

Specifically, given a set of training images $\{x^t\}_{t=1}^{N_T}$ from an object class $\Omega$, we wish to estimate the class membership or likelihood function for this data, i.e., $P(x \mid \Omega)$. In this section, we examine two density estimation techniques for visual learning of high-dimensional data. The first method is based on the assumption of a Gaussian distribution while the second method generalizes to arbitrarily complex distributions using a Mixture-of-Gaussians density model.
Before introducing these estimators, we briefly review eigenvector decomposition as commonly used in PCA.

2.1 Principal Component Imagery

Given a training set of m-by-n images $\{I^t\}_{t=1}^{N_T}$, we can form a training set of vectors $\{x^t\}$, where $x \in \mathbb{R}^{N}$ with $N = mn$, by lexicographic ordering of the pixel elements of each image $I^t$. The basis functions for the KLT [9] are obtained by solving the eigenvalue problem

$$\Lambda = \Phi^T \Sigma \Phi \tag{2}$$

where $\Sigma$ is the covariance matrix, $\Phi$ is the eigenvector matrix of $\Sigma$, and $\Lambda$ is the corresponding diagonal matrix of eigenvalues. The unitary matrix $\Phi$ defines a coordinate transform (rotation) which decorrelates the data and makes explicit the invariant subspaces of the matrix operator $\Sigma$. In PCA, a partial KLT is performed to identify the largest-eigenvalue eigenvectors and obtain a principal component feature vector $y = \Phi_M^T \tilde{x}$, where $\tilde{x} = x - \bar{x}$ is the mean-normalized image vector and $\Phi_M$ is a submatrix of $\Phi$ containing the principal eigenvectors. PCA can be seen as a linear transformation $y = \mathcal{T}(x) : \mathbb{R}^N \to \mathbb{R}^M$ which extracts a lower-dimensional subspace of the KL basis corresponding to the maximal eigenvalues. These principal components preserve the major linear correlations in the data and discard the minor ones.

By ranking the eigenvectors of the KL expansion with respect to their eigenvalues and selecting the first $M$ principal components, we form an orthogonal decomposition of the vector space $\mathbb{R}^N$ into two mutually exclusive and complementary subspaces: the principal subspace (or feature space) $F = \{\Phi_i\}_{i=1}^{M}$ containing the principal components and its orthogonal complement $\bar{F} = \{\Phi_i\}_{i=M+1}^{N}$. This orthogonal decomposition is illustrated in Fig. 2a, where we have a prototypical example of a distribution which is embedded entirely in $F$. In practice there is always a signal component in $\bar{F}$ due to the minor statistical variabilities in the data or simply due to the observation noise which affects every element of $x$.

Fig. 2. (a) Decomposition into the principal subspace $F$ and its orthogonal complement $\bar{F}$ for a Gaussian density. (b) A typical eigenvalue spectrum and its division into the two orthogonal subspaces.

1. In practice, the number of training images $N_T$ is far less than the dimensionality of the imagery $N$; consequently, the covariance matrix $\Sigma$ is singular. However, the first $M < N_T$ eigenvectors can always be computed (estimated) from $N_T$ samples using, for example, a Singular Value Decomposition [2].
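The partial KLT described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `fit_eigenspace` and `project`, the random toy data, and the 1/N_T covariance normalization are all assumptions made here. It uses a thin SVD of the centered data so that the N-by-N covariance matrix is never formed, which is the situation footnote 1 describes.

```python
import numpy as np

def fit_eigenspace(X, M):
    """Estimate the mean and the first M principal eigenvectors from training rows.

    X has shape (N_T, N): one lexicographically ordered image per row.
    Returns (mean, Phi_M, eigvals) with Phi_M of shape (N, M).
    """
    n_samples = X.shape[0]
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Thin SVD of the centered data: the rows of Vt are eigenvectors of the
    # sample covariance, so the N x N matrix is never formed explicitly.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    eigvals = s ** 2 / n_samples  # assumption: 1/N_T covariance normalization
    return mean, Vt[:M].T, eigvals[:M]

def project(x, mean, Phi_M):
    """Principal component feature vector y = Phi_M^T (x - mean)."""
    return Phi_M.T @ (x - mean)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 400))  # toy data: N_T = 20 samples, N = 400
mean, Phi_M, lam = fit_eigenspace(X_train, M=5)
y = project(X_train[0], mean, Phi_M)
print(Phi_M.shape, y.shape)
```

The columns of `Phi_M` are orthonormal and the returned eigenvalues equal the variances of the training projections, which is what the decorrelation property of the KLT predicts.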

In a partial KL expansion, the residual reconstruction error is defined as

$$\epsilon^2(x) = \sum_{i=M+1}^{N} y_i^2 = \|\tilde{x}\|^2 - \sum_{i=1}^{M} y_i^2 \tag{3}$$

and can be easily computed from the first $M$ principal components and the $L_2$ norm of the mean-normalized image $\tilde{x}$. Consequently the $L_2$ norm of every element $x \in \mathbb{R}^N$ can be decomposed in terms of its projections in these two subspaces. We refer to the component in the orthogonal subspace $\bar{F}$ as the distance-from-feature-space (DFFS), which is a simple Euclidean distance and is equivalent to the residual error $\epsilon^2(x)$ in (3). The component of $x$ which lies in the feature space $F$ is referred to as the distance-in-feature-space (DIFS); it is generally not a distance-based norm, but can be interpreted in terms of the probability distribution of $y$ in $F$.

2.2 Gaussian Densities

We begin by considering an optimal approach for estimating high-dimensional Gaussian densities. We assume that we have (robustly) estimated the mean $\bar{x}$ and covariance $\Sigma$ of the distribution from the given training set $\{x^t\}$ (see footnote 2). Under this assumption, the likelihood of an input pattern $x$ is given by

$$P(x \mid \Omega) = \frac{\exp\left[-\frac{1}{2}(x - \bar{x})^T \Sigma^{-1} (x - \bar{x})\right]}{(2\pi)^{N/2} |\Sigma|^{1/2}} \tag{4}$$

The sufficient statistic for characterizing this likelihood is the Mahalanobis distance

$$d(x) = \tilde{x}^T \Sigma^{-1} \tilde{x} \tag{5}$$

where $\tilde{x} = x - \bar{x}$. However, instead of evaluating this quadratic product explicitly, a much more efficient and robust computation can be performed, especially with regard to the matrix inverse $\Sigma^{-1}$. Using the eigenvectors and eigenvalues of $\Sigma$ we can rewrite $\Sigma^{-1}$ in the diagonalized form

$$d(x) = \tilde{x}^T \Sigma^{-1} \tilde{x} = \tilde{x}^T \left[\Phi \Lambda^{-1} \Phi^T\right] \tilde{x} = y^T \Lambda^{-1} y \tag{6}$$

where $y = \Phi^T \tilde{x}$ are the new variables obtained by the change of coordinates in a KLT. Because of the diagonalized form, the Mahalanobis distance can also be expressed in terms of the sum

$$d(x) = \sum_{i=1}^{N} \frac{y_i^2}{\lambda_i} \tag{7}$$

In the KLT basis, the Mahalanobis distance in (5) is conveniently decoupled into a weighted sum of uncorrelated component energies. Furthermore, the likelihood becomes a product of independent separable Gaussian densities.
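As a numerical sanity check of (3) and (5)-(7), the following sketch compares the direct and the shortcut computations on a small synthetic problem. The covariance matrix, dimensions, and variable names are assumptions chosen only for illustration; the problem is kept small so that the full eigendecomposition is available.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 3

# Synthetic full-rank covariance (assumption: any symmetric positive-definite
# matrix serves for the check).
A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)

# Eigen-decomposition, reordered so that eigenvalues are descending.
lam, Phi = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, Phi = lam[order], Phi[:, order]

x_tilde = rng.normal(size=N)   # a mean-normalized input vector
y = Phi.T @ x_tilde            # KLT coefficients

# Eq. (3): the residual from the discarded coefficients equals the signal
# energy minus the energy of the first M coefficients.
dffs_direct = np.sum(y[M:] ** 2)
dffs_fast = x_tilde @ x_tilde - np.sum(y[:M] ** 2)

# Eqs. (5)-(7): explicit quadratic form versus the decoupled sum.
d_explicit = x_tilde @ np.linalg.solve(Sigma, x_tilde)
d_diag = np.sum(y ** 2 / lam)

print(dffs_direct, dffs_fast)
print(d_explicit, d_diag)
```

The second form of the DFFS needs only the first M projections and the signal energy, which is what makes it cheap to evaluate at every image location.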
Despite its simpler form, evaluation of (7) is still computationally infeasible due to the high-dimensionality. We therefore seek to estimate $d(x)$ using only $M$ projections. Intuitively, an obvious choice for a lower-dimensional representation is the principal subspace indicated by PCA, which captures the major degrees of statistical variability in the data (see footnote 3). Therefore, we divide the summation into two independent parts corresponding to the principal subspace $F = \{\Phi_i\}_{i=1}^{M}$ and its orthogonal complement $\bar{F} = \{\Phi_i\}_{i=M+1}^{N}$

$$d(x) = \sum_{i=1}^{M} \frac{y_i^2}{\lambda_i} + \sum_{i=M+1}^{N} \frac{y_i^2}{\lambda_i} \tag{8}$$

We note that the terms in the first summation can be computed by projecting $x$ onto the $M$-dimensional principal subspace $F$. The remaining terms in the second sum, $\{y_i\}_{i=M+1}^{N}$, however, cannot be computed explicitly in practice because of the high-dimensionality. However, the sum of these terms is available and is in fact the DFFS quantity $\epsilon^2(x)$, which can be computed from (3). Therefore, based on the available terms, we can formulate an estimator for $d(x)$ as follows

$$\hat{d}(x) = \sum_{i=1}^{M} \frac{y_i^2}{\lambda_i} + \frac{1}{\rho}\left[\sum_{i=M+1}^{N} y_i^2\right] = \sum_{i=1}^{M} \frac{y_i^2}{\lambda_i} + \frac{\epsilon^2(x)}{\rho} \tag{9}$$

where the term in the brackets is $\epsilon^2(x)$, which as we have seen can be computed using the first $M$ principal components. We can therefore write the form of the likelihood estimate based on $\hat{d}(x)$ as the product of two marginal and independent Gaussian densities

$$\hat{P}(x \mid \Omega) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{M} \frac{y_i^2}{\lambda_i}\right)}{(2\pi)^{M/2} \prod_{i=1}^{M} \lambda_i^{1/2}}\right] \left[\frac{\exp\left(-\frac{\epsilon^2(x)}{2\rho}\right)}{(2\pi\rho)^{(N-M)/2}}\right] = P_F(x \mid \Omega)\, \hat{P}_{\bar{F}}(x \mid \Omega) \tag{10}$$

where $P_F(x \mid \Omega)$ is the true marginal density in $F$-space and $\hat{P}_{\bar{F}}(x \mid \Omega)$ is the estimated marginal density in the orthogonal complement $\bar{F}$-space. The optimal value of $\rho$ can now be determined by minimizing a suitable cost function $J(\rho)$. From an information-theoretic point of view, this cost function should be the Kullback-Leibler divergence or relative entropy [9] between the true density $P(x \mid \Omega)$ and its estimate $\hat{P}(x \mid \Omega)$, given in (11).

2. In practice, a full rank $N$-dimensional covariance $\Sigma$ cannot be estimated from $N_T$ independent observations when $N_T < N$. But, as we shall see, our estimator does not require the full covariance, but only its first $M$ principal eigenvectors, where $M < N_T$.

3. We will see shortly that, given the typical eigenvalue spectra observed in practice (e.g., Fig. 2b), this choice is optimal for a different reason: It minimizes the information-theoretic divergence between the true density and our estimate of it.
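The estimator in (9) and (10) translates directly into a log-likelihood routine that needs only the first M eigenvectors, their eigenvalues, and a single weight for the complement. The sketch below is a hedged illustration: the function name `log_likelihood_estimate`, the toy covariance, and the choice of working in the log domain are assumptions made here, not details from the paper. The toy covariance has exactly equal trailing eigenvalues, a case in which the estimate should coincide with the exact Gaussian log-density.

```python
import numpy as np

def log_likelihood_estimate(x, mean, Phi_M, lam_M, rho):
    """Log of the two-part estimate in Eq. (10); also returns DIFS and DFFS."""
    N, M = Phi_M.shape
    x_tilde = x - mean
    y = Phi_M.T @ x_tilde
    difs = np.sum(y ** 2 / lam_M)                      # in-feature-space term
    dffs = max(float(x_tilde @ x_tilde - y @ y), 0.0)  # residual, Eq. (3)
    log_p_f = -0.5 * (difs + M * np.log(2 * np.pi) + np.sum(np.log(lam_M)))
    log_p_fbar = -0.5 * (dffs / rho + (N - M) * np.log(2 * np.pi * rho))
    return log_p_f + log_p_fbar, difs, dffs

# Toy check (assumption): trailing eigenvalues all equal to rho.
rng = np.random.default_rng(2)
N, M, rho = 6, 2, 0.5
lam = np.array([5.0, 3.0, rho, rho, rho, rho])
Phi, _ = np.linalg.qr(rng.normal(size=(N, N)))   # random orthonormal basis
Sigma = Phi @ np.diag(lam) @ Phi.T
mean = rng.normal(size=N)
x = mean + rng.normal(size=N)

log_p_hat, difs, dffs = log_likelihood_estimate(x, mean, Phi[:, :M], lam[:M], rho)

# Exact Gaussian log-density from Eq. (4), for comparison.
d = (x - mean) @ np.linalg.solve(Sigma, x - mean)
log_p_exact = -0.5 * (d + N * np.log(2 * np.pi) + np.sum(np.log(lam)))
print(log_p_hat, log_p_exact)
```

Working with log-likelihoods rather than the densities themselves avoids numerical underflow when N is large; the location of the maximum is unchanged.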

The divergence between the true density and its estimate is

$$J(\rho) = \int P(x \mid \Omega) \log \frac{P(x \mid \Omega)}{\hat{P}(x \mid \Omega)}\, dx = E\left[\log \frac{P(x \mid \Omega)}{\hat{P}(x \mid \Omega)}\right] \tag{11}$$

Using the diagonalized forms of the Mahalanobis distance $d(x)$ and its estimate $\hat{d}(x)$ and the fact that $E[y_i^2] = \lambda_i$, it can be easily shown that

$$J(\rho) = \frac{1}{2} \sum_{i=M+1}^{N} \left[\frac{\lambda_i}{\rho} - 1 + \log \frac{\rho}{\lambda_i}\right] \tag{12}$$

The optimal weight $\rho^*$ can then be found by minimizing this cost function with respect to $\rho$. Solving the equation $\partial J / \partial \rho = 0$ yields

$$\rho^* = \frac{1}{N - M} \sum_{i=M+1}^{N} \lambda_i \tag{13}$$

which is simply the arithmetic average of the eigenvalues in the orthogonal subspace $\bar{F}$ (see footnote 4). In addition to its optimality, $\rho^*$ also results in an unbiased estimate of the Mahalanobis distance, i.e., $E[\hat{d}(x; \rho^*)] = E[d(x)]$. This derivation shows that once we select the $M$-dimensional principal subspace $F$ (as indicated, for example, by PCA), the optimal estimate of the sufficient statistic $\hat{d}(x)$ will have the form of (9) with $\rho$ given by (13). In actual practice, of course, we only have the first $N_T - 1$ of the eigenvalues in Fig. 2b and, consequently, the remainder of the spectrum must be estimated by fitting a (typically 1/f) function to the available eigenvalues, using the fact that the very last eigenvalue is simply the estimated pixel noise variance. The value of $\rho^*$ can then be estimated from the extrapolated portion of the spectrum.

It is interesting to consider the minimal cost $J(\rho^*)$

$$J(\rho^*) = \frac{1}{2} \sum_{i=M+1}^{N} \log \frac{\rho^*}{\lambda_i} \tag{14}$$

from the point of view of the $\bar{F}$-space eigenvalues $\{\lambda_i : i = M + 1, \ldots, N\}$. It is easy to show that $J(\rho^*)$ is minimized when the $\bar{F}$-space eigenvalues have the least spread about their mean $\rho^*$. This suggests a strategy for selecting the principal subspace: choose $F$ such that the eigenvalues associated with its orthogonal complement $\bar{F}$ have the least absolute deviation about their mean. In practice, the higher-order eigenvalues typically decay and stabilize near the observation noise variance. Therefore, this strategy is usually consistent with the standard PCA practice of discarding the higher-order components, since these tend to correspond to the flattest portion of the eigenvalue spectrum (see Fig. 2b).
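The claim that the arithmetic mean of the discarded eigenvalues minimizes the divergence can be checked numerically. In the sketch below, the eigenvalue list, the search grid, and the helper name `kl_cost` are illustrative assumptions; the cost function itself is the expression in (12).

```python
import numpy as np

def kl_cost(rho, lam_bar):
    """Cost J(rho) of Eq. (12) for the complement-space eigenvalues lam_bar."""
    return 0.5 * np.sum(lam_bar / rho - 1.0 + np.log(rho / lam_bar))

# Hypothetical eigenvalues of the orthogonal complement (i = M+1, ..., N).
lam_bar = np.array([0.9, 0.6, 0.5, 0.45, 0.4])

rho_star = lam_bar.mean()                  # closed form, Eq. (13)

# Brute-force search over a grid of candidate weights.
grid = np.linspace(0.2, 1.5, 1301)
costs = np.array([kl_cost(r, lam_bar) for r in grid])
rho_grid = grid[np.argmin(costs)]

print(rho_star, rho_grid)
print(kl_cost(rho_star, lam_bar))
```

The grid minimizer agrees with the closed form up to the grid spacing, and the cost at the optimum matches (14). When all complement eigenvalues are equal the cost is zero, in line with the limiting case discussed in the text.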
In the limit, as the $\bar{F}$-space eigenvalues become exactly equal, the divergence $J(\rho^*)$ will be zero and our density estimate $\hat{P}(x \mid \Omega)$ approaches the true density $P(x \mid \Omega)$.

We note that, in most applications, it is customary to simply discard the $\bar{F}$-space component and simply work with $P_F(x \mid \Omega)$. However, the use of the DFFS metric or, equivalently, the marginal density $\hat{P}_{\bar{F}}(x \mid \Omega)$ is critically important in formulating the likelihood of an observation $x$, especially in an object detection task, since there are an infinity of vectors which are not members of $\Omega$ which can have likely $F$-space projections. Without $\hat{P}_{\bar{F}}(x \mid \Omega)$ a detection system can result in a significant number of false alarms.

2.3 Multimodal Densities

In the previous section we assumed that the probability density of the training images was Gaussian. This led to a likelihood estimate in the form of a product of two independent multivariate Gaussian distributions (or equivalently the sum of two Mahalanobis distances: DIFS + DFFS). In our experience, the distribution of samples in the feature space is often accurately modeled by a single Gaussian distribution. This is especially true in cases where the training images are accurately aligned views of similar objects seen from a standard view (e.g., aligned frontal views of human faces at the same scale and orientation). However, when the training set represents multiple views or multiple objects under varying illumination conditions, the distribution of training views in $F$-space is no longer unimodal. In fact the training data tends to lie on complex and nonseparable low-dimensional manifolds in image space. One way to tackle this multimodality is to build a view-based (or object-based) formulation where separate eigenspaces are used for each view [27]. Another approach is to capture the complexity of these manifolds in a universal or parametric eigenspace using splines [23], or local basis functions [3].
If we assume that the $\bar{F}$-space components are Gaussian and independent of the principal features in $F$ (this would be true in the case of pure observation noise in $\bar{F}$) we can still use the separable form of the density estimate $\hat{P}(x \mid \Omega)$ in (10), where $P_F(x \mid \Omega)$ is now an arbitrary density $P(y)$ in the principal component vector $y$. Fig. 3 illustrates the decomposition, where the DFFS is the residual $\epsilon^2(x)$ as before. The DIFS, however, is no longer a simple Mahalanobis distance but can nevertheless be interpreted as a distance by relating it to $P(y)$, e.g., as DIFS $= -\log P(y)$.

Fig. 3. Decomposition into the principal subspace $F$ and its orthogonal complement $\bar{F}$ for an arbitrary density.

4. Cootes et al. [8] have used a similar decomposition of the Mahalanobis distance but instead use an ad-hoc parameter value of $\rho = \frac{1}{2}\lambda_{M+1}$ as an approximation.

The density $P(y)$ can be estimated using a parametric mixture model. Specifically, we can model arbitrarily complex densities using a Mixture-of-Gaussians

$$P(y \mid \Theta) = \sum_{i=1}^{N_c} \pi_i\, g(y; \mu_i, \Sigma_i) \tag{15}$$

where $g(y; \mu, \Sigma)$ is an $M$-dimensional Gaussian density with mean vector $\mu$ and covariance $\Sigma$, and the $\pi_i$ are the mixing parameters of the components, satisfying $\sum_i \pi_i = 1$. The mixture is completely specified by the parameter $\Theta = \{\pi_i, \mu_i, \Sigma_i\}_{i=1}^{N_c}$. Given a training set $\{y^t\}_{t=1}^{N_T}$, the mixture parameters can be estimated using the ML principle

$$\Theta^* = \arg\max_{\Theta} \left[\prod_{t=1}^{N_T} P(y^t \mid \Theta)\right] \tag{16}$$

This estimation problem is best solved using the Expectation-Maximization (EM) algorithm [], which consists of the following two-step iterative procedure:

E-step:

$$h_i^k(t) = \frac{\pi_i^k\, g(y^t; \mu_i^k, \Sigma_i^k)}{\sum_{j=1}^{N_c} \pi_j^k\, g(y^t; \mu_j^k, \Sigma_j^k)} \tag{17}$$

M-step:

$$\pi_i^{k+1} = \frac{\sum_{t=1}^{N_T} h_i^k(t)}{\sum_{i=1}^{N_c} \sum_{t=1}^{N_T} h_i^k(t)} \tag{18}$$

$$\mu_i^{k+1} = \frac{\sum_{t=1}^{N_T} h_i^k(t)\, y^t}{\sum_{t=1}^{N_T} h_i^k(t)} \tag{19}$$

$$\Sigma_i^{k+1} = \frac{\sum_{t=1}^{N_T} h_i^k(t)\, (y^t - \mu_i^{k+1})(y^t - \mu_i^{k+1})^T}{\sum_{t=1}^{N_T} h_i^k(t)} \tag{20}$$

The E-step computes the a posteriori probabilities $h_i(t)$, which are the expectations of missing component labels $z_i(t) = \{0, 1\}$, which denote the membership of $y^t$ in the $i$th component. Once these expectations have been computed, the M-step maximizes the joint likelihood of the data and the missing variables $z_i(t)$. The EM algorithm is monotonically convergent in likelihood and is thus guaranteed to find a local maximum in the total likelihood of the training set. Further details of the EM algorithm for estimation of mixture densities can be found in [3].

Given our operating assumptions (that the training data is $M$-dimensional (at most) and resides solely in the principal subspace $F$ with the exception of perturbations due to white Gaussian measurement noise, or, equivalently, that the $\bar{F}$-space component of the data is itself a separable Gaussian density), the estimate of the complete likelihood function $P(x \mid \Omega)$ is given by

$$\hat{P}(x \mid \Omega) = P(y \mid \Theta^*)\, \hat{P}_{\bar{F}}(x \mid \Omega) \tag{21}$$

where $\hat{P}_{\bar{F}}(x \mid \Omega)$ is a Gaussian component density based on the DFFS, as before.
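One iteration of the E-step and M-step in (17)-(20) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `gaussian_pdf` and `em_step`, the two-cluster toy data, the fixed initialization, and the iteration count are all chosen here for demonstration, and no safeguards against singular covariances are included.

```python
import numpy as np

def gaussian_pdf(Y, mu, Sigma):
    """Density g(y; mu, Sigma) evaluated at every row of Y."""
    M = Y.shape[1]
    diff = Y - mu
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(Sigma)
    return np.exp(-0.5 * (quad + M * np.log(2 * np.pi) + logdet))

def em_step(Y, pi, mu, Sigma):
    """One EM iteration; also returns the log-likelihood under the input parameters."""
    n_comp = len(pi)
    # E-step, Eq. (17): posterior responsibility of each component.
    weighted = np.stack(
        [pi[i] * gaussian_pdf(Y, mu[i], Sigma[i]) for i in range(n_comp)], axis=1)
    h = weighted / weighted.sum(axis=1, keepdims=True)
    # M-step, Eqs. (18)-(20).
    mass = h.sum(axis=0)
    pi_new = mass / mass.sum()
    mu_new = (h.T @ Y) / mass[:, None]
    Sigma_new = np.empty_like(Sigma)
    for i in range(n_comp):
        diff = Y - mu_new[i]
        Sigma_new[i] = (h[:, i, None] * diff).T @ diff / mass[i]
    loglik = np.sum(np.log(weighted.sum(axis=1)))
    return pi_new, mu_new, Sigma_new, loglik

# Toy data (assumption): two well-separated 2D clusters.
rng = np.random.default_rng(3)
Y = np.vstack([rng.normal(size=(200, 2)) + [-3.0, 0.0],
               rng.normal(size=(200, 2)) + [3.0, 0.0]])

pi = np.array([0.5, 0.5])
mu = np.array([[-1.0, 0.0], [1.0, 0.0]])
Sigma = np.stack([np.eye(2), np.eye(2)])

logliks = []
for _ in range(30):
    pi, mu, Sigma, ll = em_step(Y, pi, mu, Sigma)
    logliks.append(ll)
print(mu)
```

The recorded log-likelihoods should be non-decreasing from one iteration to the next, which is the monotonic convergence property stated above. In a real system the initialization and the number of components would need more care than this sketch gives them.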
3 MAXIMUM LIKELIHOOD DETECTION

The density estimate $\hat{P}(x \mid \Omega)$ can be used to compute a local measure of target saliency at each spatial position $(i, j)$ in an input image based on the vector $x$ obtained by the lexicographic ordering of the pixel values in a local neighborhood $R$

$$S(i, j; \Omega) = \hat{P}(x \mid \Omega) \tag{22}$$

for $x$ = Ø[$\{I(i + r, j + c) : (r, c) \in R\}$], where Ø[·] is the operator which converts a subimage into a vector. The ML estimate of position of the target $\Omega$ is then given by

$$(i, j)^{ML} = \arg\max S(i, j; \Omega) \tag{23}$$

This ML formulation can be extended to estimate object scale with multiscale saliency maps. The likelihood computation is performed (in parallel) on linearly scaled versions of the input image $I^{(k)}$ for a predetermined set of scales. The ML estimate of the spatial and scale indices is then defined by

$$(i, j, k)^{ML} = \arg\max S(i, j, k; \Omega) \tag{24}$$

4 APPLICATIONS

The above ML detection technique has been tested in the detection of complex natural objects including human faces, facial features (e.g., eyes), and nonrigid articulated objects such as human hands. In this section, we will present several examples from these application domains.

4.1 Face Detection, Coding, and Recognition

Over the years, various strategies for facial feature detection have been proposed, ranging from edge map projections [5], to more recent techniques using generalized symmetry operators [32] and multilayer perceptrons [37]. In any robust face processing system, this task is critically important, since a face must first be geometrically normalized by aligning its features with those of a stored model before recognition can be attempted.

The eigentemplate approach to the detection of facial features in "mugshots" was proposed in [27], where the DFFS metric was shown to be superior to standard template matching for target detection. The detection task was the estimation of the position of facial features (the left and right eyes, the tip of the nose and the center of the mouth) in frontal view photographs of faces at fixed scale. Fig. 4 shows examples of facial feature training templates and the resulting detections on the MIT Media Laboratory's database of 7,562 mugshots.
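The sliding-window ML detection of Section 3, (22) and (23), can be sketched as a brute-force loop. The function names `saliency_map` and `ml_position`, the toy image, and the stand-in likelihood (an isotropic Gaussian around a single template, used here only so the example is self-contained) are assumptions; in the method described above the score would be the log of the density estimate. Note also that this sketch indexes each window by its top-left corner rather than by a neighborhood centered on (i, j).

```python
import numpy as np

def saliency_map(image, m, n, log_likelihood):
    """Evaluate a log-likelihood on every m-by-n window of a 2D image."""
    H, W = image.shape
    S = np.full((H - m + 1, W - n + 1), -np.inf)
    for i in range(H - m + 1):
        for j in range(W - n + 1):
            # Lexicographic ordering of the window pixels into a vector x.
            x = image[i:i + m, j:j + n].ravel()
            S[i, j] = log_likelihood(x)
    return S

def ml_position(S):
    """Index of the maximum of the saliency map, as in Eq. (23)."""
    i, j = np.unravel_index(np.argmax(S), S.shape)
    return int(i), int(j)

# Toy scene (assumption): low-amplitude noise with one planted target.
rng = np.random.default_rng(4)
template = rng.normal(size=(4, 4))
image = 0.1 * rng.normal(size=(20, 20))
image[5:9, 7:11] = template

# Stand-in likelihood: isotropic Gaussian centered on the template.
def toy_log_likelihood(x):
    return -0.5 * np.sum((x - template.ravel()) ** 2)

S = saliency_map(image, 4, 4, toy_log_likelihood)
print(ml_position(S))
```

For the multiscale case in (24), the same map would be computed on each rescaled copy of the image and the maximum taken over position and scale jointly. A practical implementation would replace the explicit loops with convolutions, as noted earlier for the DFFS.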

Fig. 4. (a) Examples of facial feature training templates. (b) The resulting typical detections.

We have compared the detection performance of three different detectors on approximately 7,000 test images from this database: a sum-of-square-differences (SSD) detector based on the average facial feature (in this case the left eye), an eigentemplate or DFFS detector, and an ML detector based on $S(i, j; \Omega)$ as defined in Section 2.2. Fig. 5a shows the receiver operating characteristic (ROC) curves for these detectors, obtained by varying the detection threshold independently for each detector. The DFFS and ML detectors were computed based on a five-dimensional principal subspace. Since the projection coefficients were unimodal, a Gaussian distribution was used in modeling the true distribution for the ML detector as in Section 2.2. Note that the ML detector exhibits the best detection vs. false-alarm tradeoff and yields the highest detection rate (95 percent). Indeed, at the same detection rate, the ML detector has a false-alarm rate which is nearly two orders of magnitude lower than the SSD.

Fig. 5b provides the geometric intuition regarding the operation of these detectors. The SSD detector's threshold is based on the radial distance between the average template (the origin of this space) and the input pattern. This leads to hyperspherical detection regions about the origin. In contrast, the DFFS detector measures the orthogonal distance to $F$, thus forming planar acceptance regions about $F$. Consequently, to accept valid object patterns in $\Omega$ which are very different from the mean, the SSD detector must operate with high thresholds which result in many false alarms. However, the DFFS detector cannot discriminate between the object class $\Omega$ and non-$\Omega$ patterns in $F$. The solution is provided by the ML detector, which incorporates both the $\bar{F}$-space component (DFFS) and the $F$-space likelihood (DIFS). The probabilistic interpretation of Fig. 5b is as follows: SSD assumes a single prototype (the mean) in additive white Gaussian noise, whereas the DFFS assumes a uniform density in $F$. The ML detector, on the other hand, uses the complete probability density for detection.

Fig. 5. (a) Detection performance of an SSD, DFFS, and an ML detector. (b) Geometric interpretation of the detectors.

We have incorporated and tested the multiscale version of the ML detection technique in a face detection task. This multiscale head finder was tested on the FERET database, where 97 percent of 2,000 faces were correctly detected. Fig. 6 shows examples of the ML estimate of the position and scale on these images. The multiscale saliency maps $S(i, j, k; \Omega)$ were computed based on the likelihood estimate $\hat{P}(x \mid \Omega)$ in a 10-dimensional principal subspace using a Gaussian model (Section 2.2). Note that this detector is able to localize the position and scale of the head despite variations in hair style and hair color, as well as presence of sunglasses. Illumination invariance was obtained by normalizing the input subimage $x$ to a zero-mean unit-norm vector.

Fig. 6. Examples of multiscale face detection.

We have also used the multiscale version of the ML detector as the attentional component of an automatic system for recognition and model-based coding of faces. The block diagram of this system is shown in Fig. 7, which consists of a two-stage object detection and alignment stage, a contrast normalization stage, and a feature extraction stage whose output is used for both recognition and coding. Fig. 8 illustrates the operation of the detection and alignment stage on a natural test image containing a human face. The function of the face finder is to locate regions in the image which have a high likelihood of containing a face.


The first step in this process is illustrated in Fig. 8b, where the ML estimate of the position and scale of the face is indicated by the cross-hairs and bounding box. Once these regions have been identified, the estimated scale and position are used to normalize for translation and scale, yielding a standard "head-in-the-box" format image (Fig. 8c). A second feature detection stage operates at this fixed scale to estimate the position of four facial features: the left and right eyes, the tip of the nose, and the center of the mouth (Fig. 8d). Once the facial features have been detected, the face image is warped to align the geometry and shape of the face with that of a canonical model. Then the facial region is extracted (by applying a fixed mask) and subsequently normalized for contrast. The geometrically aligned and normalized image (shown in Fig. 9a) is then projected onto a custom set of eigenfaces to obtain a feature vector which is then used for recognition purposes as well as facial image coding.

Fig. 9 shows the normalized facial image extracted from Fig. 8d, its reconstruction using a 100-dimensional eigenspace representation (requiring only 85 bytes to encode) and a comparable nonparametric reconstruction obtained using a standard transform-coding approach for image compression (requiring 530 bytes to encode). This example illustrates that the eigenface representation used for recognition is also an effective model-based representation for data compression. The first eight eigenfaces used for this representation are shown in Fig. 10.

Fig. 7. The face processing system.

Fig. 8. (a) Original image. (b) Position and scale estimate. (c) Normalized head image. (d) Position of facial features.

Fig. 9. (a) Aligned face. (b) Eigenspace reconstruction (85 bytes). (c) JPEG reconstruction (530 bytes).

Fig. 10. The first eight eigenfaces.

Fig. 11 shows the results of a similarity search in an image database tool called Photobook [28]. Each face in the database was automatically detected and aligned by the face processing system in Fig. 7.
The normalized faces were then projected onto a 100-dimensional eigenspace. The image in the upper left is the one searched on and the remainder are the ranked nearest neighbors in the FERET database. The top three matches in this case are images of the same person taken a month apart and at different scales. The recognition accuracy (defined as the percent correct rank-one matches) on a database of 55 individuals is 99 percent [2]. More recently, in the September 1996 FERET face recognition competition, an improved Bayesian matching version of our system [22] achieved a 95 percent recognition rate with a database of approximately 1,200 individuals, the highest recognition rate obtained in the competition [29].

Fig. 11. Photobook: FERET face database.

In order to have an estimate of the recognition performance on much larger databases, we have conducted our own tests on a database of 7,562 images of approximately 3,000 people. The images were collected in a small booth at a Boston photography show, and include men, women, and children of all ages and races. Head position was controlled by asking people to take their own picture when they were lined up with the camera. Two LEDs placed at the bottom of holes adjacent to the camera allowed them to judge their alignment; when they could see both LEDs, then they were correctly aligned. The eigenfaces for this database were approximated using a principal components analysis on a representative sample of 28 faces. Recognition and matching was subsequently performed using the first 20 eigenvectors.

To assess the average recognition rate, 200 faces were selected at random, and a nearest-neighbor rule was used to find the most-similar face from the entire database. If the most-similar face was of the same person, then a correct recognition was scored. In this experiment, the eigenvector-based recognition system produced a recognition accuracy of 95 percent. This performance is somewhat surprising, because the database contains wide variations in expression, and has relatively weak control of head position and illumination. In a verification task, our system yielded a false rejection rate of 1.5 percent at a false acceptance rate of 0.0 percent.

4.2 View-Based Recognition

The problem of face recognition under general viewing conditions (change in pose) can also be approached using an eigenspace formulation. There are essentially two ways of approaching this problem using an eigenspace framework. Given N individuals under M different views, one can do recognition and pose estimation in a universal eigenspace computed from the combination of NM images. In this way, a single parametric eigenspace will encode both identity as well as pose. Such an approach, for example, has recently been used by Murase and Nayar [23] for general 3D object recognition. Alternatively, given N individuals under M different views, we can build a view-based set of M distinct eigenspaces, each capturing the variation of the N individuals in a common view.
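To make the two constructions concrete, the following NumPy sketch builds a single parametric eigenspace from all views and a separate eigenspace per view, then reconstructs one image with each. The data are synthetic stand-ins, and the names `pca_basis`, `reconstruct`, and `residual`, as well as the sizes `M`, `N`, `D`, and `k`, are illustrative assumptions rather than anything specified in the paper. The reconstruction residual is used here only as a simple diagnostic; it is not the likelihood estimate used by the authors.

```python
import numpy as np

def pca_basis(images, k):
    """Return (mean, basis); basis holds the k leading eigenvectors as rows."""
    mean = images.mean(axis=0)
    # Rows of vt are eigenvectors of the sample covariance of the centered data.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruct(x, mean, basis):
    """Project x onto the eigenspace and map the coefficients back to image space."""
    coeffs = basis @ (x - mean)
    return mean + basis.T @ coeffs

def residual(x, mean, basis):
    """Euclidean norm of the part of x that the eigenspace fails to explain."""
    return float(np.linalg.norm(x - reconstruct(x, mean, basis)))

# Synthetic data (assumption): M views, N individuals, D pixels per image.
rng = np.random.default_rng(0)
M, N, D, k = 3, 20, 50, 4
views = []
for _ in range(M):
    centre = rng.normal(scale=5.0, size=D)    # each view has its own mean image
    directions = rng.normal(size=(k, D))      # and its own k directions of variation
    views.append(centre + rng.normal(size=(N, k)) @ directions)

# Parametric: one eigenspace computed from all N*M images.
param_mean, param_basis = pca_basis(np.vstack(views), k)
# View-based: M distinct eigenspaces, one per view.
view_spaces = [pca_basis(v, k) for v in views]

x = views[1][0]                               # an image taken from view 1
param_err = residual(x, param_mean, param_basis)
view_errs = [residual(x, m, b) for m, b in view_spaces]
best_view = int(np.argmin(view_errs))         # view whose eigenspace explains x best
```

On this toy data the eigenspace of the correct view reconstructs the image almost exactly, while the single parametric eigenspace with the same number of eigenvectors leaves a larger residual.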
The view-based eigenspace is essentially an extension of the eigenface technique to multiple sets of eigenvectors, one for each combination of scale and orientation. One can view this architecture as a set of parallel observers, each trying to explain the image data with their set of eigenvectors (see also Darrell and Pentland [10]). In this view-based, multiple-observer approach, the first step is to determine the location and orientation of the target object by selecting the eigenspace which best describes the input image. This can be accomplished by calculating the likelihood estimate using each viewspace's eigenvectors and then selecting the maximum.

The key difference between the view-based and parametric representations can be understood by considering the geometry of facespace. In the high-dimensional vector space of an input image, multiple-orientation training images are represented by a set of M distinct regions, each defined by the scatter of N individuals. Multiple views of a face form nonconvex (yet connected) regions in image space [2]. Therefore, the resulting ensemble is a highly complex and nonseparable manifold. The parametric eigenspace attempts to describe this ensemble with a projection onto a single low-dimensional linear subspace (corresponding to the first n eigenvectors of the NM training images). In contrast, the view-based approach corresponds to M independent subspaces, each describing a particular region of the facespace (corresponding to a particular view of a face). The relevant analogy here is that of modeling a complex distribution by a single cluster model or by the union of several component clusters. Naturally, the latter (view-based) representation can yield a more accurate representation of the underlying geometry.

This difference in representation becomes evident when considering the quality of reconstructed images using the two different methods. Fig. 13 compares reconstructions obtained with the two methods when trained on images of faces at multiple orientations. In Fig.
13a, we see first an image in the training set, followed by reconstructions of this image using, first, the parametric eigenspace, and then, the view-based eigenspace. Note that in the parametric reconstruction, neither the pose nor the identity of the individual is adequately captured. The view-based reconstruction, on the other hand, provides a much better characterization of the object. Similarly, in Fig. 13b, we see a novel view (+68°) with respect to the training set (-90° to +45°). Here, both reconstructions correspond to the nearest view in the training set (+45°), but the view-based reconstruction is seen to be more representative of the individual's identity. Although the quality of the reconstruction is not a direct indicator of the recognition power, from an information-theoretic point of view, the multiple eigenspace representation is a more accurate representation of the signal content.

We have evaluated the view-based approach with data similar to that shown in Fig. 12. This data consists of 189 images: nine views of each of 21 people. The nine views of each person were evenly spaced from -90° to +90° along the horizontal plane. In the first series of experiments, the interpolation performance was tested by training on a subset of the available views {±90°, ±45°, 0°} and testing on the intermediate views {±68°, ±23°}. A 90 percent average recognition rate was obtained. A second series of experiments tested the extrapolation performance by training on a range of views (e.g., -90° to +45°) and testing on novel views outside the training range (e.g., +68° and +90°). For testing views separated by ±23° from the training range, the average recognition rates were 83 percent. For ±45° testing views, the average recognition rates were 50 percent (see [27] for further details).

Fig. 12. Some of the images used to test accuracy at face recognition despite wide variations in head orientation. Average recognition accuracy was 92 percent, the orientation error had a standard deviation of 5°.

Fig. 13. Parametric vs. view-based eigenspace reconstructions for a training view and a novel testing view. The input image is shown in the left column. The middle and right columns correspond to the parametric and view-based reconstructions, respectively. All reconstructions were computed using the first 10 eigenvectors. A schematic representation of the two approaches.

Fig. 14. (a) Facial eigenfeature regions. (b) Recognition rates for eigenfaces, eigenfeatures, and the combined modular representation.

4.3 Modular Recognition

The eigenface recognition method is easily extended to facial features as shown in Fig. 14a. This leads to an improvement in recognition performance by incorporating an additional layer of description in terms of facial features. This can be viewed as either a modular or layered representation of a face, where a coarse (low-resolution) description of the whole head is augmented by additional (higher-resolution) details in terms of salient facial features. The utility of this layered representation (eigenface plus eigenfeatures) was tested on a small subset of our large face database. We selected a representative sample of 45 individuals with two views per person, corresponding to different facial expressions (neutral vs. smiling). This set of images was partitioned into a training set (neutral) and a testing set (smiling).
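One way to picture the layered representation is the sketch below, which scores rank-one nearest-neighbor recognition for whole-face coefficients, feature-region coefficients, and the two combined. The text does not state how the two layers are combined, so concatenating the coefficient vectors is purely an assumption of this sketch, as are the synthetic data, the array sizes, and the helper names `pca_basis`, `project`, and `rank_one_accuracy`.

```python
import numpy as np

def pca_basis(x, k):
    """Return (mean, basis); basis holds the k leading eigenvectors as rows."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    """Eigenspace coefficients for each row of x."""
    return (x - mean) @ basis.T

def rank_one_accuracy(gallery, probes):
    """Fraction of probes whose nearest gallery vector belongs to the same person.

    Row i of gallery and row i of probes are assumed to show the same individual.
    """
    dists = np.linalg.norm(probes[:, None, :] - gallery[None, :, :], axis=2)
    return float(np.mean(np.argmin(dists, axis=1) == np.arange(len(probes))))

# Synthetic stand-ins (assumption): one training and one testing view per person.
rng = np.random.default_rng(1)
n_people, d_face, d_feat, k = 45, 64, 32, 10
face_train = rng.normal(size=(n_people, d_face))   # whole-face vectors
feat_train = rng.normal(size=(n_people, d_feat))   # facial-feature region vectors
face_test = face_train + 0.05 * rng.normal(size=face_train.shape)
feat_test = feat_train + 0.05 * rng.normal(size=feat_train.shape)

face_mean, face_basis = pca_basis(face_train, k)
feat_mean, feat_basis = pca_basis(feat_train, k)

g_face = project(face_train, face_mean, face_basis)
p_face = project(face_test, face_mean, face_basis)
g_feat = project(feat_train, feat_mean, feat_basis)
p_feat = project(feat_test, feat_mean, feat_basis)

# Combined (modular) description: concatenate the two coefficient vectors.
g_comb = np.hstack([g_face, g_feat])
p_comb = np.hstack([p_face, p_feat])

acc_face = rank_one_accuracy(g_face, p_face)
acc_feat = rank_one_accuracy(g_feat, p_feat)
acc_comb = rank_one_accuracy(g_comb, p_comb)
```

Sweeping `k` and recording the three accuracies would give curves of the same kind as recognition rate versus number of eigenvectors, though the numbers from synthetic data say nothing about real faces.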
Since the difference between these particular facial expressions is primarily articulated in the mouth, this feature was discarded for recognition purposes. Fig. 14b shows the recognition rates as a function of the number of eigenvectors for eigenface-only, eigenfeature-only, and the combined representation. What is surprising is that (for this small dataset at least) the eigenfeatures alone were sufficient in achieving an (asymptotic) recognition rate of 95 percent (equal to that of the eigenfaces). More surprising, perhaps, is the observation that in the lower dimensions of eigenspace, eigenfeatures outperformed the eigenface recognition. Finally, by using the combined representation, we gain a slight improvement in the asymptotic recognition rate (98 percent). A similar effect was reported by Brunelli and Poggio [4], where the cumulative normalized correlation scores of templates for the face, eyes, nose, and mouth showed improved performance over the face-only templates.

A potential advantage of the eigenfeature layer is the ability to overcome the shortcomings of the standard eigenface method. A pure eigenface recognition system can be fooled by gross variations in the input image (hats, beards, etc.). Fig. 15a shows additional testing views of three individuals in the above dataset of 45. These test images are indicative of the type of variations which can lead to false matches: a hand near the face, a painted face, and a beard. Fig. 15b shows the nearest matches found based on standard eigenface matching. None of the three matches correspond to the correct individual. On the other hand, Fig. 15c shows the nearest matches based on the eyes and nose, and results in correct identification in each case. This simple example illustrates the potential advantage of a modular representation in disambiguating low-confidence eigenface matches.

Fig. 15. (a) Test views. (b) Eigenface matches. (c) Eigenfeature matches.

Fig. 16. (a) Examples of combined texture/edge-based face representations. (b) Few of the resulting eigenvectors.

We have also extended the normalized eigenface representation into an edge-based domain for facial description. We simply run the normalized facial image through a Canny edge detector to yield an edge map as shown in Fig. 16a. Such an edge map is simply an alternative representation which imparts mostly shape (as opposed to texture) information and has the advantage of being less susceptible to illumination changes.
The recognition rate of a pure edge-based normalized eigenface representation (on a FERET database of 55 individuals) was found to be 95 percent, which is surprising considering that it utilizes what appears to be (to humans at least) a rather impoverished representation. The slight drop in recognition rate is most likely due to the increased dimensionality of this representation space and its greater sensitivity to expression changes, etc.

Interestingly, we can combine both texture and edge-based representations of the object by simply performing a KL expansion on the augmented images shown in Fig. 16. The resulting eigenvectors conveniently decorrelate the joint representation and provide a basis set which optimally spans both domains simultaneously. With this bimodal representation, the recognition rate was found to be 97 percent. Though still less than a normalized grayscale representation, we believe a bimodal representation can have distinct advantages for tasks other than recognition, such as detection and image interpolation.

4.4 Hand Modeling and Recognition

We have also applied our eigenspace density estimation technique to articulated and nonrigid objects, such as hands. In this particular domain, however, the original intensity image is an unsuitable representation since, unlike faces, hands are essentially textureless objects. Their identity is characterized by the variety of shapes they can assume. For this reason, we have chosen an edge-based representation of hand shapes which is invariant with respect to illumination, contrast, and scene background. A training set of hand gestures was obtained against a black background. The 2D contour of the hand was then extracted using a Canny edge-operator. These binary edge maps, however, are highly uncorrelated with each other due to their sparse nature. This leads to a very high-dimensional principal subspace. Therefore, to reduce the intrinsic dimensionality, we induced spatial correlation via a diffusion process on the binary edge map, which effectively broadens and smears the edges, yielding a continuous-valued contour image which represents the object shape in terms of the spatial distribution of edges. Fig. 17 shows examples of training images and their diffused edge map representations. We note that this spatiotopic representation of shape is biologically motivated and therefore differs from methods based purely on computational considerations (e.g., moments [3], Fourier descriptors [20], snakes [6], Point Distribution Models [7], and modal descriptions [34]).

Fig. 17. Top: Examples of hand gestures. Bottom: Their diffused edge-based representation.

It is important to verify whether such a representation is adequate for discriminating between different hand shapes. Therefore, we tested the diffused contour image representation in a recognition experiment which yielded a 100 percent rank-one accuracy on 375 frames from an image sequence containing seven hand gestures. The matching technique was a nearest-neighbor classification rule in a 6-dimensional principal subspace. Fig. 18a shows some examples of the various hand gestures used in this experiment. Fig. 18b shows the 5 images that are most similar to the two gestures appearing in the top left. Note that the hand gestures judged most similar are all objectively the same shape.

Naturally, the success of such a recognition system is critically dependent on the ability to find the hand (in any of its articulated states) in a cluttered scene, to account for its scale, and to align it with respect to an object-centered reference frame prior to recognition.
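The effect of the diffusion step on sparse edge maps can be illustrated with a small NumPy sketch. The text does not specify the diffusion process or its parameters, so a separable Gaussian blur (which is what linear isotropic diffusion over a fixed time amounts to) is assumed here; the value of `sigma`, the toy square contours, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def diffuse(edges, sigma=2.0):
    """Smear a binary edge map with a separable Gaussian blur (rows, then columns)."""
    kernel = gaussian_kernel(sigma)
    blur = lambda v: np.convolve(v, kernel, mode="same")
    rows = np.apply_along_axis(blur, 1, edges.astype(float))
    return np.apply_along_axis(blur, 0, rows)

def correlation(a, b):
    """Normalized correlation coefficient between two images."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def square_contour(size, top, left, side):
    """Binary image containing the one-pixel-wide outline of a square."""
    img = np.zeros((size, size))
    img[top, left:left + side] = 1
    img[top + side - 1, left:left + side] = 1
    img[top:top + side, left] = 1
    img[top:top + side, left + side - 1] = 1
    return img

a = square_contour(64, 20, 20, 20)
b = square_contour(64, 22, 22, 20)   # same shape, shifted by two pixels

raw_corr = correlation(a, b)                         # thin contours barely overlap
diffused_corr = correlation(diffuse(a), diffuse(b))  # smeared contours overlap widely
```

Two nearly identical contours that barely overlap as binary maps become strongly correlated once diffused, which is the property that lets a low-dimensional principal subspace capture the shape variation.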
This localization was achieved with the same multiscale ML detection paradigm used with faces, with the exception that the underlying image representation of the hands was the diffused edge map rather than the grayscale image. The probability distribution of hand shapes in this representation was automatically learned using our eigenspace density estimation technique. In this case, however, the distribution of training data is multimodal due to the different hand shapes. Therefore, the multimodal density estimation technique in Section 2.3 was used. Fig. 19a shows a projection of the training data on the first two dimensions of the principal subspace F (defined in this case by M = 6) which exhibit the underlying multimodality of the data. Fig. 19b shows a 10-component Mixture-of-Gaussians density estimate for the training data. The parameters of this estimate were obtained with 20 iterations of the EM algorithm. The orthogonal F̄-space component of the density was modeled with a Gaussian distribution as in Section 2.3.

Fig. 18. (a) A random collection of hand gestures. (b) Images ordered by similarity (left-to-right, top-to-bottom) to the image at the upper left.

Fig. 19. (a) Distribution of training hand shapes (shown in the first two dimensions of the principal subspace). (b) Mixture-of-Gaussians fit using 10 components.

The resulting complete density estimate $\hat{P}(\mathbf{x} \mid \Omega)$ was then used in a detection experiment on test imagery of hand gestures against a cluttered background scene. In accordance with our representation, the input imagery was first preprocessed to generate a diffused edge map and then scaled accordingly for a multiscale saliency computation. Fig. 20 shows two examples from the test sequence, where we have shown the original image, the negative log-likelihood saliency map, and the ML estimates of position and scale (superimposed on the diffused edge map). Note that these examples represent two different hand gestures at slightly different scales.

Fig. 20. (a) Original grayscale image. (b) Negative log-likelihood map (at most likely scale). (c) ML estimate of position and scale superimposed on edge map.

To better quantify the performance of the ML detector on hands, we carried out the following experiment. The original 375-frame video sequence of training hand gestures was divided into two parts. The first (training) half of this sequence was used for learning, including computation of the KL basis and the subsequent EM clustering. For this experiment we used a five-component mixture in a 10-dimensional principal subspace. The second (testing) half of the sequence was then embedded in the background scene, which contains a variety of shapes. In addition, severe noise conditions were simulated as shown in Fig. 21a. We then compared the detection performance of an SSD detector (based on the mean edge-based hand representation) and a probabilistic detector based on the complete estimated density. The resulting negative-log-likelihood detection maps were passed through a valley-detector to isolate local minimum candidates, which were then subjected to an ROC analysis. A correct detection was defined as a below-threshold local minimum within a five-pixel radius of the ground truth target location. Fig. 21b shows the performance curves obtained for the two detectors. We note, for example, that, at an 85 percent detection probability, the ML detector yields (on the average) one false alarm per frame, whereas the SSD detector yields an order of magnitude more false alarms.

Fig. 21. (a) Example of test frame containing a hand gesture amidst severe background clutter. (b) ROC curve performance contrasting SSD and ML detectors.
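The scoring procedure used in this experiment (valley detection on the negative log-likelihood map, thresholding, and a five-pixel hit radius) can be sketched as follows. This is a minimal illustration under stated assumptions: the 8-neighbor definition of a local minimum, the synthetic map, the two thresholds, and the function names `local_minima` and `score` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def local_minima(nll, threshold):
    """Interior pixels below threshold that are strictly lower than all 8 neighbours."""
    h, w = nll.shape
    centre = nll[1:-1, 1:-1]
    is_min = centre < threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Shifted view of the map aligned with the interior pixels.
            neighbour = nll[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
            is_min &= centre < neighbour
    rows, cols = np.nonzero(is_min)
    return [(int(r) + 1, int(c) + 1) for r, c in zip(rows, cols)]

def score(candidates, truth, radius=5.0):
    """Return (target detected?, number of false alarms) for one frame."""
    hits = [c for c in candidates
            if np.hypot(c[0] - truth[0], c[1] - truth[1]) <= radius]
    return len(hits) > 0, len(candidates) - len(hits)

# Synthetic negative log-likelihood map (assumption): flat background,
# one deep valley at the true target and one shallow spurious valley.
nll = np.full((40, 60), 10.0)
nll[20, 30] = 1.0
nll[5, 5] = 6.0
truth = (20, 30)

# Sweeping the threshold traces out detection versus false-alarm trade-offs.
roc_points = []
for threshold in (3.0, 8.0):
    candidates = local_minima(nll, threshold)
    detected, false_alarms = score(candidates, truth)
    roc_points.append((threshold, detected, false_alarms))
```

Averaging the detection flag and the false-alarm count over all test frames at each threshold would give one point per threshold on a curve of detection probability against false alarms per frame.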
5 DISCUSSION

In this paper, we have described an eigenspace density estimation technique for unsupervised visual learning which exploits the intrinsic low dimensionality of the training imagery to form a computationally simple estimator for the complete likelihood function of the object. Our estimator is based on a subspace decomposition and can be evaluated using only the M-dimensional principal component vector. We derived the form for an optimal estimator and its associated expected cost for the case of a Gaussian density. In contrast to previous work on learning and characterization, which uses PCA primarily for dimensionality reduction and/or feature extraction, our method uses the eigenspace decomposition as an integral part of estimating complete density functions in high-dimensional image spaces. These density estimates were then used in a maximum likelihood formulation for target detection. The multiscale version of this detection strategy was demonstrated in applications in which it functioned as an attentional subsystem for object recognition. The performance was found to be superior to existing detection techniques in experiments with large numbers of test data.

We note that, from a probabilistic perspective, the class conditional density $P(\mathbf{x} \mid \Omega)$ is the most important object representation to be learned. This density is the critical compo-
