Hybrid Body Representation for Integrated Pose Recognition, Localization and Segmentation


Cheng Chen and Guoliang Fan
School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078


Abstract

We propose a hybrid body representation that represents each typical pose by both template-like view information and part-based structural information. Specifically, each body part as well as the whole body is represented by an off-line learned shape model in which region-based and edge-based priors are combined in a coupled shape representation. Part-based spatial priors are represented by a star graphical model. This hybrid body representation can synergistically integrate pose recognition, localization and segmentation into one computational flow. Moreover, as an important step for feature extraction and model inference, segmentation is involved in the low-level, mid-level and high-level vision stages, where top-down prior knowledge and bottom-up data processing are well integrated via the proposed hybrid body representation.

1. Introduction

We consider pose recognition, localization and segmentation of the whole body as well as body parts in a single image. This research is a fundamental step toward video-based human motion analysis, which has been intensively studied recently. Pose recognition, localization and segmentation in a still image are challenging problems due to the variability of human body shapes and poses as well as the inherent ambiguity in the observed image. Our goal is to develop a hybrid human representation and the corresponding processing to assemble the three tasks into one integrated framework.

We propose a hybrid body representation, as shown in Fig. 1, where the four images show the input image represented by watershed cells, the part-based body representation, the online learned whole shape prior, and the part/whole segmentation results, respectively. In particular, the segmentation process, which has been found useful for object recognition and localization, is involved in both learning and inference in this work.

Figure 1. The input image represented by watershed cells, the part-based body representation, the online learned whole shape prior, and the part-whole segmentation results (from left to right).

The proposed research is deeply inspired and motivated by shape representation theories in cognitive psychology, where there are two prevailing theories, i.e., the structural description-based and the view-based representations [10]. The former suggests that a complex object is represented by a collection of simpler elements with specific inter-relationships. The latter postulates a very simple template-like representation in which an object is holistically represented by a simple vector or matrix feature without an intermediate representational step. Current cognitive studies indicate that neither of these two representation schemes alone can provide a complete characterization of the human vision system for object recognition [7]. Similarly, existing shape representations in computer vision can be roughly grouped into two categories. One is template-like or silhouette-based methods, which are suitable for shape prior-based segmentation. The other is part-based methods, which can capture the intra-class variability.

The main idea of our research is to integrate both view-based and structural description-based models into a hybrid body representation that supports integrated pose recognition, localization, and segmentation. In particular, it can facilitate shape prior guided segmentation, by which bottom-up features can be extracted to drive the top-down inference in a cascade fashion. Additionally, both off-line and online learning are involved to learn general and subject-specific knowledge respectively, including the colors, shapes and spatial structure.


2. Related work

Existing pose recognition, localization, and segmentation methods can be broadly grouped into three major categories according to how the body is represented: the representation-free methods, the view-based methods, and the structural description-based methods.

The first category mainly contains bottom-up approaches in which there is no explicit shape prior representation, such as [15] and [19]. All the information used is a series of region grouping rules established according to physical constraints such as body part proximity. In general, these approaches focus on exploiting bottom-up cues.

The second category includes all silhouette-based pose analysis methods. In [1], a specific view-based approach was proposed where pose information is implicitly embodied in a classifier learned from SIFT-like features. In general, no intermediate feature or color is used in these approaches. View-based approaches normally aim at detecting a particular body pose without extracting body parts. Thus they cannot recover anthropometric information.

The pictorial structure model proposed in [5] is a typical approach belonging to the third category, in which the human body is described by several parts with their appearances and spatial relationships. This kind of approach usually requires a robust part detector. The edge histogram [3] and other SIFT-like features are widely used to represent parts. Very recently, a region-based deformable model was used to represent parts [17], where segmentation was used to verify the object hypothesis. The method in [17] is similar in spirit to the part-level inference proposed here. However, in our approach, where an image is represented by small building blocks (watershed cells), the coupled shape model is involved in a hypothesis-and-test paradigm in which the region prior forms a segmentation given a position hypothesis and the edge prior evaluates the formed segmentation. As the name suggests, the hybrid human body representation proposed here absorbs recent multifaceted advances in the field.
The proposed representation involves shape prior guided segmentation and inference in a multi-stage fashion. Unlike previous methods, we use segmentation to extract bottom-up features to drive the top-down inference. Our contributions in this work include: (1) a hybrid human body representation that supports online color model learning and involves an online learned deformable shape model to segment the whole body and parts, (2) an effective hypothesis-and-test paradigm for the part-level inference that involves the coupled region-edge shape priors, (3) a three-stage cascade computational flow that integrates pose recognition, localization and segmentation into a biologically plausible framework, and (4) a new watershed-based Graph-cut segmentation where both region and edge shape priors are used for optimal segmentation.

3. Overview of our approach

The proposed hybrid body representation synergistically integrates pose recognition, localization and segmentation of the whole body as well as body parts in an image, as shown in Fig. 2. Several key issues are addressed.

Figure 2. Overview of our approach. The diagram spans low-level, middle-level and high-level vision and shows a bottom-up flow, a top-down flow and a feedback flow. Its blocks are: input, low-level grouping into UC regions, segmentation-based hypothesis-and-test, map images, localization and pose recognition via inference, part locations, online learning and deformation, Graph-cut based segmentation, and a check of the associations of part and whole segmentations. The top-down inputs are the off-line learned part shape priors, the off-line learned spatial priors, the off-line learned body shape and the on-line learned body shape.

Off-line and online learning: Off-line and online learning are used to obtain general and subject-specific information, respectively. The former acquires the general shape and spatial priors for both body parts and the whole body, and the latter captures the subject-specific information, including colors and shapes.

Part-whole organization: Parts and the whole are two complementary components for object representation. The part-level inference produces the map images that are assembled to localize the whole body as well as body parts.

Coupled region-edge shape model: The coupled region-edge shape representation supports a hypothesis-and-test paradigm, where the region-based prior is used to form a segmentation and the edge-based prior is used to evaluate the formed segmentation. After the online learning of the whole body, both priors are used in a new Graph-cut segmentation framework for an optimal segmentation.

Integration of bottom-up and top-down: Both the data-driven bottom-up and knowledge-driven top-down flows are integrated at low-level vision (watershed segmentation), mid-level vision (part-level inference/segmentation) and high-level vision (whole-level inference/segmentation). Potentially, this work can lead to a dynamic computational framework by incorporating two feedback flows (from high-level vision to mid-level and low-level vision).

4. Hybrid human body representation

Consider a walking cycle with $K$ typical poses $W = \{W^{(k)} \mid k = 1, \ldots, K\}$. We model each pose $W^{(k)}$ by both part-based and whole-based statistical representations, $W^{(k)} = \{V^{(k)}_{1:d}, L^{(k)}, SW^{(k)}_{off}\}$, where $V^{(k)}_{1:d}$ are the shape priors of the $d$ parts, $L^{(k)}$ is a set of statistical parameters that encode the spatial relationships between parts in a star graphical model, and $SW^{(k)}_{off}$ is the off-line learned shape prior of the whole body. The shape prior of each part $V_i^{(k)}$ is represented by the region-based shape prior $SP_i^{(k)}$, the edge-based shape prior $M_i^{(k)}$, and the average orientation $\theta_i^{(k)}$, i.e., $V_i^{(k)} = \{SP_i^{(k)}, M_i^{(k)}, \theta_i^{(k)}\}$. Moreover, during the inference processes, the part-based and whole-based color models as well as the subject-specific whole shape model are learned online as part of the hybrid body representation. For clarity, we may omit the pose index $(k)$ in some places below.

4.1. Part-based shape prior

Inspired by the MetaMorphs model in [9], we develop an implicit shape model for each part where both region-based and edge-based shape priors are holistically represented. The learning process is similar to the one in [13].

Figure 3. The top row shows the training images without alignment. The bottom row shows the aligned training images, the learned shape prior, and the extracted edge-based shape prior.

For each part, we have obtained a set of training images (pre-segmented binary images with a fixed window size and the measured orientation). Let $\theta_i$ be the average orientation of part $V_i$. All training images are first aligned to the average orientation. Let $\{\Omega_j^i(p) \mid j = 1, \ldots, Q\}$ denote the aligned training images, where $p = (x, y)$ is the location of one pixel in the window where the shape prior is defined. The shape prior $SP_i(p)$ can be obtained by averaging all aligned training images, as shown in Fig. 3,

$$SP_i(p) = \frac{1}{Q} \sum_{j=1}^{Q} \Omega_j^i(p).$$

$SP_i(p)$ and $1 - SP_i(p)$ reflect the probability of pixel $p$ belonging to the object and the background, respectively. Given a threshold $\varepsilon$, an average object boundary $M_i$ can be extracted from the learned region-based shape prior $SP_i(p)$ by a level-set-like method,

$$M_i = \{p \mid SP_i(p) = \varepsilon\}. \qquad (1)$$

Since both the region-based and edge-based priors are embedded in $V_i$, the two priors can be learned simultaneously in the training process. More importantly, this coupled representation allows a hypothesis-and-test paradigm for the inference at the part level, where the region prior induces a segmentation given a position hypothesis and the edge prior is used to validate the segmentation, resulting in part-based possibility maps for whole-based inference.

4.2. Part-based spatial prior

We use the spatial prior model proposed in [3] to characterize the variability of the spatial configuration of body parts. For pose $k$, we define the part-based spatial prior by a star graphical model, as shown in the second image of Fig. 1, that is parameterized by $L^{(k)} = \{\mu_i^{(k)}, \Sigma_i^{(k)} \mid i = 1, \ldots, d,\ i \neq r\}$. Specifically, $\{\mu_i^{(k)}, \Sigma_i^{(k)} \mid i = 1, \ldots, d,\ i \neq r\}$ denote the Gaussian priors for the relative locations between the non-reference part $i$ and the reference part $r$. These statistical parameters can be obtained by a maximum-likelihood estimator (MLE) from labeled training data. Given a particular spatial configuration of the $d$ parts, $L = (l_1, \ldots, l_d)$, the joint distribution of the $d$ parts with respect to pose $k$ can be written as

$$p_{L^{(k)}}(L) = p_{L^{(k)}}(l_1, \ldots, l_d) = p_{L^{(k)}}(l_r) \prod_{i \neq r} p_{L^{(k)}}(l_i \mid l_r). \qquad (2)$$

As in [3], we assume that $p_{L^{(k)}}(L)$ is Gaussian. Therefore, the conditional distribution $p_{L^{(k)}}(l_i \mid l_r)$ is still Gaussian. As defined above, $\mu_i^{(k)}$ and $\Sigma_i^{(k)}$ are the mean and covariance of the (relative) spatial distribution of part $i$ in pose $k$. Then, for each non-reference part $i$, the conditional distribution of its position with respect to pose $k$ is defined as

$$p_{L^{(k)}}(l_i \mid l_r) = \mathcal{N}(l_i - l_r \mid \mu_i^{(k)}, \Sigma_i^{(k)}). \qquad (3)$$

4.3. Whole body shape prior

For each pose, a whole body shape prior is needed for body segmentation after pose recognition and localization.
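The region-based prior and its boundary from Section 4.1 come down to averaging aligned binary masks and thresholding the result, and the whole-body prior is built in the same way. The following is a minimal sketch of that step, not the authors' implementation: the function names and toy masks are hypothetical, and because an averaged prior rarely takes the value ε exactly on a pixel grid, the level set of Eq. (1) is approximated here by the pixels with SP ≥ ε that touch a pixel with SP < ε.

```python
def learn_shape_prior(masks):
    """Average Q aligned binary masks into the region prior SP(p)."""
    q = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[y][x] for m in masks) / q for x in range(cols)]
            for y in range(rows)]


def extract_boundary(sp, eps):
    """Discrete stand-in for M = {p | SP(p) = eps} (an assumption):
    pixels with SP >= eps that touch a pixel with SP < eps or the window edge."""
    rows, cols = len(sp), len(sp[0])
    boundary = set()
    for y in range(rows):
        for x in range(cols):
            if sp[y][x] < eps:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < rows and 0 <= nx < cols)
                if outside or sp[ny][nx] < eps:
                    boundary.add((x, y))  # stored as (x, y), like p in the text
                    break
    return boundary


# Two toy 4x4 training masks (hypothetical data), assumed already aligned.
masks = [
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 0], [0, 0, 0, 0]],
]
sp = learn_shape_prior(masks)
edge_prior = extract_boundary(sp, eps=0.75)
print(sp[1])
print(sorted(edge_prior))
```

Because both priors are derived from the same averaged image, learning one yields the other, which is the sense in which the representation is coupled.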
Both off-line and online learning are involved in generating shape models that capture the general representation as well as the subject-specific information, as shown in Fig. 4.

Figure 4. The learning of the whole body shape prior. The diagram shows off-line learning based on the average position and orientation of parts, followed by on-line learning after localization.

4.3.1. Off-line learning

The off-line learning is similar to that of parts, except that a part-based alignment is needed due to the spatial variability of each pose. For each pose, we can compute the average location and orientation of all parts. For each training image with a segmented body and parts, we want to find a set of control points based on which the training image can be deformed so that all parts are transformed to the average location and orientation. To preserve the shape information of parts, we use the edge points of all parts as control points, which can be transformed to a target image via 2-D rigid transformations obtained from the averaged locations and orientations. After we obtain control points in both the source and target images, the multilevel B-spline method [11] is used to obtain non-linear transformations by which all pixels in the source image are mapped to the target image. Small holes can be filled by simple morphological operations. These aligned binary images are used to construct a whole body shape prior, i.e., $SW_{off}(p)$.

4.3.2. Online learning

The online learning is used to create a subject-specific shape model $SW_{on}(p)$ after all parts are localized. The goal is to deform $SW_{off}(p)$ so that the locations of all detected parts are reflected in the shape prior. A technique similar to the one described above for off-line learning is used to find the non-linear transformation functions for every pixel in $SW_{off}(p)$, by which $SW_{off}(p)$ is converted to $SW_{on}(p)$, which carries a subject-specific shape model. It is worth noting that neither $SW_{off}(p)$ nor $SW_{on}(p)$ is a binary image. Appropriate interpolation is needed to fill the possible holes in $SW_{on}(p)$ during the deformation process.

5. Low-level vision: watershed transform

Grouping pixels into small homogeneous regions is becoming a popular pre-processing step for many computer vision tasks. This is well supported by the cognitive theory proposed in [16], which considers uniform connectedness (UC) regions as the building blocks for object representation.
In this work, we chose the watershed transform [20] because of its many biologically plausible properties, such as fast, local computation. More importantly, both boundary and regional information are available for each cell. To overcome the over-segmentation problem, the geodesic reconstruction preprocessing [14] is used to control the watershed size through some morphological parameters, which can be dynamically adjusted according to the feedback from the high-level vision (as shown in Fig. 2). A given image $I$ is represented by $Z$ watershed cells, $I = \{C_i \mid i = 1, 2, \ldots, Z\}$. Each cell consists of a set of pixels $C_i = \{p_1^{(i)}, p_2^{(i)}, \ldots, p_{\eta_i}^{(i)}\}$, where $\eta_i$ is the number of pixels. Moreover, we use a 3-D Gaussian model $\{\mu_i^{(c)}, \Sigma_i^{(c)}\}$ to represent the color distribution in the L*a*b* color space for cell $C_i$. The watershed cells are used as the building blocks in the following processes.

6. Mid-level vision: part-based inference

The goal of the mid-level vision is to generate immediate part detection results that will be useful for the high-level vision. What we need here is a map image that indicates how likely it is that an object (i.e., a body part) is present at each location. Recently, the idea of using segmentation to verify object hypotheses has achieved remarkable success in object detection [18, 12]. We hereby propose a new hypothesis-and-test paradigm, as shown in Fig. 5, where the region-based shape prior is used to form a segmentation for a given position hypothesis and the edge-based shape prior is used to validate the formed segmentation.

Figure 5. The hypothesis-and-test paradigm for part inference. A good and a bad hypothesis are each segmented using the region-based shape prior and then evaluated using the edge-based shape prior, yielding a high and a low score, respectively.

6.1. Hypothesis step: region-based segmentation

Given the shape prior $SP(p)$, $p \in \Omega$, for a given part (the part index is omitted in this section), where $\Omega$ is a rectangular window, we can use $\Omega$ as a sliding window to scan through the whole image and examine the existence of the part at each location.
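The hypothesis step that follows works on watershed cells and their color models rather than on raw pixels. As a minimal sketch of the per-cell 3-D Gaussian model of Section 5, the function below fits a mean and covariance to the L*a*b* values of one cell. The function name and the sample values are hypothetical, and the maximum-likelihood normalization (dividing by the pixel count) is an assumption, since the text does not state which estimator is used.

```python
def cell_color_model(pixels):
    """Fit the mean and 3x3 covariance of one cell's L*a*b* pixel colors.

    pixels: list of (L, a, b) tuples for the pixels of a single watershed cell.
    Uses the maximum-likelihood estimate (divide by n), which is an assumption.
    """
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in pixels) / n
            for b in range(3)]
           for a in range(3)]
    return mean, cov


# Hypothetical colors of a three-pixel cell.
cell_pixels = [(50.0, 10.0, 20.0), (52.0, 12.0, 20.0), (54.0, 14.0, 20.0)]
mean, cov = cell_color_model(cell_pixels)
print(mean)
print(cov[0])
```

A very small or perfectly uniform cell gives a singular covariance (the b channel above has zero variance), so a practical version would probably need some regularization before the Gaussian is evaluated.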
For a hypothesized location, we use $SP(p)$ to induce a local figure-ground segmentation that is composed of some of the watershed cells covered by $\Omega$. This segmentation will be used to validate the existence of the part at that location. In order to take advantage of watershed cells and their color models, we propose a new semi-parametric kernel-based model learning technique to learn the figure/ground color models online from the watershed results directly, as shown in Fig. 6(a). We treat the Gaussian model learned from a watershed cell as a kernel center, and learn the figure/ground color models as follows,

$$\hat{f}_{ob}(x) = \sum_{i=1,\ C_i \cap \Omega \neq \emptyset}^{Z} \alpha_i K_i(x), \qquad (4)$$

$$\hat{f}_{bg}(x) = \sum_{i=1,\ C_i \cap \Omega \neq \emptyset}^{Z} \beta_i K_i(x), \qquad (5)$$

where $x$ is a color vector; $C_i$ is one of the $Z$ watershed cells that overlaps with window $\Omega$; $K_i(x) = \mathcal{N}(x \mid \mu_i^{(c)}, \Sigma_i^{(c)})$ is the color model associated with $C_i$; and $\alpha_i$ and $\beta_i$ denote the contribution of cell $C_i$ to the object and the background, respectively, which can be calculated from $SP(p)$, $p \in \Omega$, as

$$\alpha_i = \frac{1}{T} \sum_{p \in (C_i \cap \Omega)} SP(p), \qquad (6)$$

$$\beta_i = \frac{1}{T} \sum_{p \in (C_i \cap \Omega)} (1 - SP(p)), \qquad (7)$$

where $T$ is the size of the shape prior window $\Omega$. Based on the figure/ground color models, we can use the maximum a posteriori (MAP) criterion to identify the watershed cells that belong to the object. Let $\tau_i$ be the class label for $C_i$:

$$\tau_i = \begin{cases} 1 \text{ (object)}, & \alpha_i \hat{f}_{ob}(\mu_i^{(c)}) > \beta_i \hat{f}_{bg}(\mu_i^{(c)}); \\ 0 \text{ (background)}, & \alpha_i \hat{f}_{ob}(\mu_i^{(c)}) < \beta_i \hat{f}_{bg}(\mu_i^{(c)}). \end{cases} \qquad (8)$$

Therefore, we can obtain the corresponding segmentation for the position hypothesis $l$ as $X = \{C_i \mid \tau_i = 1\}$. Unlike the approach in [18], where the shape prior is used once for online color model learning, here we use the shape-based prior twice. The first time is for the online color model learning and the second time is for the MAP-based segmentation. Considering that a false negative is more detrimental than a false positive in mid-level vision, we fully incorporate the region-based shape prior into the segmentation. This may lead to more false positives due to more object-like segmentations. However, the subsequent edge-based evaluation will mitigate this problem.

6.2. Test step: edge-based evaluation

After the segmentation $X$ is formed, we evaluate it according to the edge prior $M$. Let $\Gamma(X)$ be the boundary of $X$; we compare $\Gamma(X)$ with $M$ in terms of shape similarity and smoothness.
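Before the evaluation is detailed, the hypothesis step of Section 6.1 (Eqs. 4 to 8) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the data layout, the function names and the toy numbers are hypothetical, the kernels use a diagonal covariance to keep the example short, and a tie in Eq. (8), which the text leaves undefined, is assigned to the background.

```python
import math


def gauss(x, mean, var):
    """Gaussian density with diagonal covariance (a simplifying assumption)."""
    density = 1.0
    for xi, m, v in zip(x, mean, var):
        density *= math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
    return density


def segment_hypothesis(cells, sp):
    """Label the cells covered by the window for one position hypothesis.

    cells: list of dicts with 'mean', 'var' (per-channel color model) and
           'pixels' (the (x, y) window coordinates of C_i intersected with
           the window).
    sp:    region prior SP(p) over the window, indexed as sp[y][x].
    Returns the list of labels tau_i (1 = object, 0 = background).
    """
    T = len(sp) * len(sp[0])  # size of the shape prior window
    alpha = [sum(sp[y][x] for x, y in c["pixels"]) / T for c in cells]      # Eq. 6
    beta = [sum(1 - sp[y][x] for x, y in c["pixels"]) / T for c in cells]   # Eq. 7

    def mixture(weights, x):  # Eqs. 4 and 5: weighted sum of cell kernels
        return sum(w * gauss(x, c["mean"], c["var"])
                   for w, c in zip(weights, cells))

    labels = []
    for a, b, c in zip(alpha, beta, cells):
        ob = a * mixture(alpha, c["mean"])
        bg = b * mixture(beta, c["mean"])
        labels.append(1 if ob > bg else 0)  # Eq. 8; ties go to background
    return labels


# Toy 2x4 window: the prior favors the left half (hypothetical data).
sp = [[0.9, 0.9, 0.1, 0.1],
      [0.9, 0.9, 0.1, 0.1]]
cells = [
    {"mean": (60.0, 20.0, 20.0), "var": (4.0, 4.0, 4.0),
     "pixels": [(0, 0), (1, 0), (0, 1), (1, 1)]},
    {"mean": (30.0, 0.0, 0.0), "var": (4.0, 4.0, 4.0),
     "pixels": [(2, 0), (3, 0), (2, 1), (3, 1)]},
]
labels = segment_hypothesis(cells, sp)
print(labels)
```

The prior enters twice, as the text describes: once through the mixture weights of the color models and once as the per-cell prior in the MAP comparison.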
The score of $X$ with respect to its compliance with $M$, i.e., $\rho_M(X)$, is given by

$$\rho_M(X) = \exp(-d_c(\Gamma(X), M)) + \zeta (1 - S(\Gamma(X), M)), \qquad (9)$$

where $d_c$ in the first term is the chamfer distance indicating the shape similarity; the second term measures the boundary smoothness; and $\zeta$ balances the relative importance of the two terms. It is expected that a valid segmentation should be smooth and match $M$ well. The chamfer distance is sensitive to translation, rotation and scale. This is desired for rejecting false hypotheses. The second term aims to reject segmentations with ragged boundaries.

Figure 6. (a) Kernel-based color model learning using the region-based priors and watershed cells. (b) Smoothness measurement.

We define an effective smoothness measurement as follows. As shown in Fig. 6(b), assume that $\Gamma(X)$ touches $n$ cells $\{C_1, \ldots, C_n\}$, and define $H_i = \{h_1^{(i)}, \ldots, h_{n_i}^{(i)}\}$ to be the set of $n_i$ boundary pixels shared by cell $C_i$ and $\Gamma(X)$. Let $\phi_M(p): \mathbb{R}^2 \to \mathbb{R}$ be the signed Euclidean distance transform that is positive or negative for $p$ inside or outside $M$, respectively. The maximum and minimum distances from $H_i$ to $M$ are $d_{max}^{(i)} = \max(\phi_M(h_1^{(i)}), \ldots, \phi_M(h_{n_i}^{(i)}))$ and $d_{min}^{(i)} = \min(\phi_M(h_1^{(i)}), \ldots, \phi_M(h_{n_i}^{(i)}))$, respectively. The degree of parallelness between $M$ and $H_i$ is measured by

$$S_M(H_i) = \frac{d_{max}^{(i)} - d_{min}^{(i)}}{n_i}. \qquad (10)$$

When $H_i$ is parallel to $M$ (e.g., $H_2$ in Fig. 6(b)), $S_M(H_i) = 0$, indicating good local smoothness. When $H_i$ is perpendicular to $M$ (e.g., $H_3$ in Fig. 6(b)), $S_M(H_i) = 1$, indicating poor local smoothness. In general, the smaller the value, the more parallel $H_i$ is to $M$. Therefore, we define the overall smoothness of $\Gamma(X)$ as

$$S(\Gamma(X), M) = \frac{1}{n} \sum_{i=1}^{n} S_M(H_i). \qquad (11)$$

For each of the $d$ parts of pose $k$, the score function (9) returns a map image that records the possibility that the part exists at every location, and this map image is the input for the high-level vision. It is worth noting that at each position hypothesis, the shape prior is hypothesized with different orientations around the mean angle, and we use the winner-take-all strategy to generate the map image. The optimal angle at each location is also recorded.
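The test step of Section 6.2 can be illustrated with a brute-force version of the score in Eq. (9). This is a sketch under assumptions, not the authors' implementation: the chamfer distance is taken as the average distance from each boundary pixel to its nearest pixel of M (the text does not say which variant is used), the signed distance is computed by exhaustive search rather than a distance transform, and the interior of M is passed in explicitly as a set of pixels. All names and the toy geometry are hypothetical.

```python
import math


def dist_to(p, pts):
    """Euclidean distance from pixel p to the nearest pixel in pts."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts)


def chamfer(boundary, m):
    """One-directional average chamfer distance (an assumed variant)."""
    return sum(dist_to(p, m) for p in boundary) / len(boundary)


def signed_dist(p, m, interior):
    """phi_M(p): positive inside M, negative outside."""
    d = dist_to(p, m)
    return d if p in interior else -d


def smoothness(segments, m, interior):
    """Eqs. 10 and 11; segments is the list of per-cell boundary sets H_i."""
    total = 0.0
    for h in segments:
        phi = [signed_dist(p, m, interior) for p in h]
        total += (max(phi) - min(phi)) / len(h)   # Eq. 10
    return total / len(segments)                  # Eq. 11


def edge_score(segments, m, interior, zeta=1.0):
    """Eq. 9: shape similarity term plus weighted smoothness term."""
    boundary = [p for h in segments for p in h]
    return (math.exp(-chamfer(boundary, m))
            + zeta * (1 - smoothness(segments, m, interior)))


# Toy edge prior: a horizontal line, with the region below it as interior.
m = [(x, 5) for x in range(10)]
interior = {(x, y) for x in range(10) for y in range(6, 10)}
parallel = [[(2, 5), (3, 5), (4, 5), (5, 5)]]        # lies along M
perpendicular = [[(3, 5), (3, 6), (3, 7), (3, 8)]]   # crosses M at a right angle
good = edge_score(parallel, m, interior)
bad = edge_score(perpendicular, m, interior)
print(good, bad)
```

For the perpendicular segment the measure evaluates to (3 - 0) / 4 = 0.75 rather than exactly 1, since the distance range of n collinear pixels is n - 1; it approaches 1 as the segment grows, in line with the limiting cases given in the text.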

7. High-level vision: recognition/localization

We use $g_i^{(k)}(I, l_i)$ to represent the map image for part $i$ of pose $k$ in image $I$, where $l_i$ denotes an arbitrary position. Let $I_{maps}^{(k)} = \{g_1^{(k)}(I, l_1), \ldots, g_d^{(k)}(I, l_d)\}$ denote the set of $d$ map images. Part localization and pose recognition are formulated as an inference process guided by the spatial priors of the different poses, represented by $\{L^{(k)} \mid k = 1, \ldots, K\}$. Using Bayes' law, the posterior distribution for pose $k$ can be written in terms of the map images $I_{maps}^{(k)}$ and the spatial prior defined in (2) as

$$p_{L^{(k)}}(L \mid I_{maps}^{(k)}) \propto p_{L^{(k)}}(I_{maps}^{(k)} \mid L)\, p_{L^{(k)}}(L). \qquad (12)$$

Let $p_{L^{(k)}}(I_{maps}^{(k)} \mid L) = \prod_{i=1}^{d} g_i^{(k)}(I, l_i)$. By manipulating the terms in (12), we have

$$p_{L^{(k)}}(L \mid I_{maps}^{(k)}) \propto p_{L^{(k)}}(l_r)\, g_r^{(k)}(I, l_r) \prod_{i \neq r} p_{L^{(k)}}(l_i \mid l_r)\, g_i^{(k)}(I, l_i). \qquad (13)$$

Then pose recognition and part localization can be jointly obtained by the following optimization:

$$\{k^*, L^*\} = \arg\max_{k, L}\ p_{L^{(k)}}(L \mid I_{maps}^{(k)}). \qquad (14)$$

However, the direct evaluation of (14) is computationally prohibitive. We use the efficient inference engine proposed in [3] to obtain the solution here. For any non-reference part $i$ of pose $k$, the quality of its optimal location relative to the reference location $l_r$ is

$$\epsilon_{k,i}(l_r) = \max_{l_i}\ p_{L^{(k)}}(l_i \mid l_r)\, g_i^{(k)}(I, l_i). \qquad (15)$$

Given that $p_{L^{(k)}}(l_i \mid l_r)$ is Gaussian, $\epsilon_{k,i}(l_r)$ can be computed by the generalized distance transform. The posterior probability of an optimal configuration for pose $k$ can then be expressed in terms of the reference location $l_r$ and $\epsilon_{k,i}$, so that the posterior probability in (13) becomes

$$p_{L^{(k)}}(L \mid I_{maps}^{(k)}) \propto p_{L^{(k)}}(l_r)\, g_r^{(k)}(I, l_r) \prod_{i \neq r} \epsilon_{k,i}(l_r), \qquad (16)$$

which leads to a new map image $G_k(I, l_r)$ that indicates how likely the reference part of pose $k$ is to be at each location. This new map image $G_k(I, l_r)$ is the pooling result of all the map images in $I_{maps}^{(k)}$ via the spatial prior model of pose $k$, i.e., $L^{(k)}$. Therefore, pose recognition and reference part localization can be efficiently implemented by

$$\{k^*, l_r^*\} = \arg\max_{k, l_r}\ G_{k=1:K}(I, l_r). \qquad (17)$$
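The pooling in Eqs. (15) to (17) can be made concrete with a small brute-force sketch. It is not the efficient inference engine of [3]: the inner maximization is done by exhaustive search instead of the generalized distance transform, the Gaussian is isotropic and unnormalized, and the prior on the reference location is taken as flat. The function names, the sparse dictionary representation of map images and all numbers are hypothetical.

```python
import math


def cond(l_i, l_r, mu, var):
    """Unnormalized isotropic Gaussian on the offset l_i - l_r (an assumption)."""
    dx = l_i[0] - l_r[0] - mu[0]
    dy = l_i[1] - l_r[1] - mu[1]
    return math.exp(-0.5 * (dx * dx + dy * dy) / var)


def pooled_map(maps, prior, ref):
    """Eq. 16 by brute force, with a flat prior on the reference location.

    maps:  {part: {location: score}}, the map images of one pose.
    prior: {part: (mu, var)} for every non-reference part of that pose.
    """
    G = {}
    for l_r, g_r in maps[ref].items():
        score = g_r
        for part, g in maps.items():
            if part == ref:
                continue
            mu, var = prior[part]
            # Eq. 15: best placement of this part given the reference location.
            score *= max(cond(l_i, l_r, mu, var) * v for l_i, v in g.items())
        G[l_r] = score
    return G


def recognise(maps_by_pose, priors, ref="torso"):
    """Eq. 17: joint argmax over the pose k and the reference location l_r."""
    best = None
    for k, maps in maps_by_pose.items():
        G = pooled_map(maps, priors[k], ref)
        l_r = max(G, key=G.get)
        if best is None or G[l_r] > best[2]:
            best = (k, l_r, G[l_r])
    return best


# Toy data: the head peak sits two pixels above the torso peak.
maps = {
    "torso": {(2, 3): 0.9, (0, 0): 0.1},
    "head": {(2, 1): 0.8, (4, 3): 0.2},
}
priors = {
    "contact": {"head": ((0, -2), 1.0)},   # head expected above the torso
    "passing": {"head": ((2, 0), 1.0)},    # head expected to the right
}
# For brevity the same map images are reused for both poses; in the text
# each pose has its own map images.
maps_by_pose = {"contact": maps, "passing": maps}
pose, l_ref, score = recognise(maps_by_pose, priors)
print(pose, l_ref, score)
```

The exhaustive search costs time proportional to the product of the number of reference and part locations, which is what the generalized distance transform avoids.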
After the reference part is located, the position of each non-reference part can be obtained by

$$l_i^* = \arg\max_{l_i}\ p_{L^{(k^*)}}(l_i \mid l_r^*)\, g_i^{(k^*)}(I, l_i). \qquad (18)$$

According to the maximum value of the obtained $G_{k^*}(I, l_r^*)$, we could design a feedback loop to adjust the size of the watershed cells in low-level vision, as shown in Fig. 2.

8. Whole body segmentation via Graph-cut

Recently, the graph-cut approach has achieved considerable success in image segmentation. It has the capacity to fuse both boundary and regional cues in a unified optimization framework [2]. Several existing methods, such as [6], only incorporate a single shape prior (edge-based or region-based) into the segmentation process. Our contribution here is to combine two shape priors in a segmentation where the image is represented by watershed cells.

After pose recognition and localization, the online learned whole body shape priors $SW_{on}^{L^*}(p)$ and $M_w^{L^*}$ and the pose configuration $L^*$ can be obtained. Given image $I = \{C_i \mid i = 1, \ldots, Z\}$, let $\tau = \{\tau_i \mid i = 1, \ldots, Z\}$ denote the set of binary class labels for all watershed cells ($\tau_i = 0$: background and $\tau_i = 1$: object). Following the segmentation energy definition from [2],

$$E(\tau) = \lambda \sum_{i=1}^{Z} R(\tau_i) + \sum_{C_i \cap C_j \neq \emptyset} E(H_{i,j})\, \delta(\tau_i, \tau_j), \qquad (19)$$

where $R(\tau_i)$ is the regional term, which relates to the posterior probability of $C_i$ belonging to class $\tau_i$; $E(H_{i,j})$ is the boundary term, which represents the consistency between the edge-based shape prior $M_w^{L^*}$ and the local boundary formed by two cells, $H_{i,j} = C_i \cap C_j$; $\delta(\tau_i, \tau_j) = 1$ when $\tau_i \neq \tau_j$ and $\delta(\tau_i, \tau_j) = 0$ otherwise; and $\lambda$ specifies the relative importance of the two terms.

The calculation of $R(\tau_i)$ involves online learning of the figure/ground color models, where the region-based shape prior $SW_{on}^{L^*}(p)$ is used for kernel-based density estimation, as discussed in Section 6.1. Let $\hat{f}_{ob}^{(w)}(x)$ and $\hat{f}_{bg}^{(w)}(x)$ be the figure/ground color models, and let $\alpha_i^{(w)}$ and $\beta_i^{(w)}$, computed from $SW_{on}^{L^*}(p)$, denote the prior probabilities of $C_i$ belonging to the object and the background, respectively.
Therefore, $R(\tau_i)$ is defined as

$$R(\tau_i = 1) = -\ln\left(\alpha_i^{(w)} \hat{f}_{ob}^{(w)}(\mu_i^{(c)})\right), \qquad (20)$$

$$R(\tau_i = 0) = -\ln\left(\beta_i^{(w)} \hat{f}_{bg}^{(w)}(\mu_i^{(c)})\right), \qquad (21)$$

where $\mu_i^{(c)}$ is the mean color vector of $C_i$. Using the same idea of edge-based shape evaluation defined in (9), let $X = H_{i,j}$, and we can define $E(H_{i,j}) = \rho_M(X)$, which evaluates the consistency between $H_{i,j}$ and the edge-based shape prior $M_w^{L^*}$ in terms of the degree of parallelness and the shape similarity.

In a similar way, all body parts can also be segmented. Moreover, the segmentation in the high-level vision stage helps us extract more useful features to prune possible false positives. For example, false positives can be identified by checking the color similarity between the two arms or the two legs. The feedback loop from segmentation to localization (as shown in Fig. 2) makes our framework a dynamic system that has the potential to be further optimized.
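To make the structure of the energy in Eq. (19) concrete, the sketch below evaluates it for a given labelling of a tiny cell graph and finds the best labelling by enumerating all of them. It is an illustration under assumptions, not the Graph-cut solver used in the paper: exhaustive search replaces the min-cut computation, the regional term is taken as the negative log of the prior-weighted color likelihood as in [2], and the boundary cost is charged only where neighbouring cells take different labels. The function names and all numbers are hypothetical; the default value of lambda is the one reported in the experiments.

```python
import itertools
import math


def energy(labels, post_ob, post_bg, edges, lam=1 / 30):
    """Evaluate Eq. 19 for one labelling of the watershed cells.

    post_ob[i], post_bg[i]: alpha_i * f_ob(mu_i) and beta_i * f_bg(mu_i).
    edges: {(i, j): E(H_ij)} for every pair of adjacent cells.
    """
    regional = sum(-math.log(post_ob[i] if t == 1 else post_bg[i])
                   for i, t in enumerate(labels))            # Eqs. 20 and 21
    boundary = sum(w for (i, j), w in edges.items()
                   if labels[i] != labels[j])                # delta term
    return lam * regional + boundary


def best_labelling(post_ob, post_bg, edges, lam=1 / 30):
    """Exhaustive minimization; only feasible for a handful of cells."""
    n = len(post_ob)
    return min(itertools.product((0, 1), repeat=n),
               key=lambda t: energy(t, post_ob, post_bg, edges, lam))


# Three cells in a chain: cell 0 looks like body, cell 2 like background.
post_ob = [0.9, 0.6, 0.1]
post_bg = [0.1, 0.4, 0.9]
edges = {(0, 1): 0.05, (1, 2): 0.01}
best = best_labelling(post_ob, post_bg, edges)
print(best, energy(best, post_ob, post_bg, edges))
```

With these numbers the cut falls between cells 1 and 2, where the boundary cost is lowest, which is the interplay between the regional and boundary terms that the energy is meant to capture.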

9. Experimental results

Here we validate the effectiveness of the proposed approach on the CMU Mobo database [8], which contains images of 25 individuals walking on a treadmill. Each image is down-sampled, and each body part is defined in a fixed-size pixel window. For each pose, there are six parts in total (the head, torso, left arm, right arm, left leg, and right leg), and 200 manually segmented binary images are used for the off-line learning of part-based and whole body shape priors. The algorithm was programmed in C++, and the test platform is a Pentium 4 3.2 GHz with 1 GB RAM. Experimental results are reported for three tasks: pose recognition, body/part localization, and body segmentation. The main competing algorithms are the 1-fan method in [3] for the first two tasks and the graph-cut algorithm defined in (19) for the last task. Although our approach is a dynamic process with potential for further optimization, we have fixed the watershed transform in all experiments (without the feedback from high-level vision to fine-tune the watershed transform as shown in Fig. 2).

Figure 7. Pose definitions [4]: High-point, Passing, Recoil and Contact.

9.1. Pose recognition

In a walking cycle, the human pose is a continuous time-varying variable. According to [4], a complete walking cycle can be defined by eight poses that are further grouped into four poses due to symmetry. They are Contact, Recoil, Passing and High-point, as shown in Fig. 7. In our experiments, we combine poses Recoil and Contact due to their strong similarity. For each of the three poses, the torso is used as the reference part, and 230 labeled training data are collected for learning the part-based spatial prior. The proposed approach achieves a recognition rate of 98% for the three poses over 144 test images from 21 persons. Misclassification only occurs for pose Passing, which is sometimes very similar to the other two poses. One possible way to improve the recognition for this pose is to incorporate the motion information from video sequences. The 1-fan method in [3] achieved a recognition rate of 93%.

9.2. Localization

Based on the same test images used for pose recognition, we have tested the two methods on the localization of three poses, i.e., High-point (H-point), Contact and Passing. Table 1 compares the proposed method with the 1-fan method in terms of localization error in pixels for the six body parts. The results on localizing the head and torso are comparable for the two methods, and our approach shows significant advantages in localizing the other body parts.

Table 1. Comparison of localization errors (in pixels) between the two methods for three poses.

Poses    Methods  Head  Torso  Larm  Rarm  Lleg  Rleg
H-point  1-fan    6.7   7.9    10.7  13    16.4  15.4
H-point  hybrid   6.2   6.2    6.4   9.
Contact  1-fan
Contact  hybrid
Passing  1-fan
Passing  hybrid

The reason for the above observations is that the relative position between the head and torso has the least variability, and the part-based spatial prior that is shared by the two methods plays the major role in part localization, leading to similar results. However, there is drastic (relative) spatial variability, both positional and orientational, for the arms and legs, and the improvements from the proposed method are much more significant due to the enhanced saliency of the part-based map images generated by the segmentation-based hypothesis-and-test paradigm. Some part localization results for the three poses are shown in Fig. 8 (the first three rows), where the proposed method successfully detects (and segments) all body parts despite the significant variability.

Figure 8. Part localization of three poses and online learned whole body shape models that are used for the whole body segmentation.

9.3. Segmentation

After localization, an online learned subject-specific shape prior (as shown in the last row of Fig. 8) is used in the new Graph-cut algorithm, where both region and edge priors are involved in the energy function defined in (19). For comparison, the second term of the energy function defined in (19) is replaced by a standard color similarity-based boundary penalty term [2] without using the edge-based shape prior. Currently, we have only performed Graph-cut segmentation on pose Contact because it has the least occlusion. By setting $\lambda = 1/30$ in (19) and using 60 test images, segmentation results are evaluated by the ratio between the falsely detected region size (including both false positives and false negatives) and the ground truth region size. The error rate of the segmentation using both region-based and edge-based priors is 17.2%, while that of the segmentation without the edge-based shape prior is 38.1%. Therefore, we have obtained a relative error reduction of more than 50%.

10. Discussion and Conclusion

In this paper, we have proposed a hybrid body representation that supports an integrated pose recognition, localization and segmentation framework. In particular, segmentation, as a bridge between bottom-up cues and top-down knowledge, plays an important role in all three levels of vision. In our experiments, when the number of poses goes up, the performance of pose recognition deteriorates due to the overlap between adjacent poses. However, the performance of body/part localization and segmentation is quite stable and is less sensitive to the error of pose recognition. We also found that the proposed approach is very robust to the occlusion of one body part. More experiments need to be done to test its robustness to multiple occluded parts. The proposed framework has the potential to be a dynamic system with feedback loops that can be used for further optimization. In principle, this approach is applicable to any articulated object that has well-defined spatial configurations and can be decomposed into different parts that have little shape variability.
Our future research will focus on extending the proposed body representation to be a dynamic human body representation that supports video-based pose recognition, localization, tracking and segmentation.

Acknowledgments

The authors would like to thank the reviewers for their valuable comments. This work is supported by the National Science Foundation under Grant IIS (CAREER).

References

[1] A. Bissacco, M. H. Yang, and S. Soatto. Detecting humans via their pose. In Neural Information Processing Systems (NIPS).
[2] Y. Boykov and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. International Journal of Computer Vision, 70.
[3] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for part-based recognition using statistical models. In Proc. of IEEE CVPR.
[4] Dermot
[5] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55-79, January.
[6] D. Freedman and T. Zhang. Interactive graph cut based segmentation with shape priors. In Proc. of IEEE CVPR.
[7] R. L. Goldstone. Object, please remain composed. Behavioral and Brain Sciences, 21.
[8] R. Gross and J. Shi. The CMU motion of body (MoBo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University.
[9] X. Huang, D. Metaxas, and T. Chen. Metamorphs: Deformable shape and texture models. In Proc. of IEEE CVPR.
[10] J. E. Hummel and B. J. Stankiewicz. Two roles for attention in shape perception: A structural description model of visual scrutiny. Visual Cognition, 5:49-79.
[11] S. Lee, G. Wolberg, and S. Y. Shin. Scattered data interpolation with multilevel B-splines. IEEE Trans. on Visualization and Computer Graphics, 3.
[12] B. Leibe, E. Seemann, and B. Schiele. Pedestrian detection in crowded scenes. In Proc. IEEE Conference on Computer Vision and Pattern Recognition.
[13] X. Li and G. Hamarneh. Modeling prior shape and appearance knowledge in watershed segmentation. In The 2nd Canadian Conference on Computer and Robot Vision, pages 27-33.
[14] Y.-C. Lin, Y.-P. Tsai, Y.-P. Hung, and Z.-C. Shih. Comparison between immersion-based and toboggan-based watershed image segmentation. IEEE Trans. on Image Processing, 15.
[15] G. Mori, X. Ren, A. A. Efros, and J. Malik. Recovering human body configurations: combining segmentation and recognition. In Proc. of IEEE CVPR, volume 2.
[16] S. E. Palmer and I. Rock. Rethinking perceptual organization: The role of uniform connectedness. Psychonomic Bulletin and Review, 1:29-55.
[17] D. Ramanan. Learning to parse images of articulated bodies. In Proc. of Neural Info. Proc. Systems (NIPS).
[18] D. Ramanan. Using segmentation to verify object hypotheses. In Proc. of IEEE CVPR.
[19] P. Srinivasan and J. Shi. Bottom-up recognition and parsing of the human body. In Proc. of IEEE CVPR.
[20] L. Vincent and P. Soille. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13.
