Capturing Global and Local Dynamics for Human Action Recognition


2014 22nd International Conference on Pattern Recognition

Siqi Nie
Department of Electrical, Computer and System Engineering
Rensselaer Polytechnic Institute
Troy, New York

Qiang Ji
Department of Electrical, Computer and System Engineering
Rensselaer Polytechnic Institute
Troy, New York
qji@ecse.rpi.edu

Abstract

Human action analysis has achieved great success, especially with the recent development of advanced sensors and algorithms that can effectively track the body joints. Temporal motion of body joints carries crucial information about human actions. However, current dynamic models typically assume stationary local transition and therefore are limited to local dynamics. In contrast, we propose a novel human action recognition algorithm that is able to capture both global and local dynamics of joint trajectories by combining a Gaussian-Binary restricted Boltzmann machine (GB-RBM) with a hidden Markov model (HMM). We present a method to use RBM as a generative model for multi-class classification. Experimental results on benchmark datasets demonstrate the capability of the proposed method in exploiting the dynamic information at different levels.

I. INTRODUCTION

Human action is the combination of the movements of body joints over a time interval. Understanding a complex action requires studying not only the spatial configurations among the body joints, but also how they move at different time scales in the time domain. Capturing the movements of the body joints used to be a difficult task, which significantly limited the performance of previous video-based human action recognition, until the recent emergence of low-cost and reliable depth sensors such as Kinect and efficient pose tracking systems [18] that can provide well-estimated joint positions in real time. Joint trajectories present a more explicit representation of the action dynamics. However, these temporal characteristics of human actions have not yet been thoroughly exploited, partially due to the limitations of current models.
In this work, we interpret a human action as a set of 3D trajectories of dominant body joints. We comprehensively investigate the underlying temporal dynamics of these trajectories for action recognition. Modeling the temporal patterns of body joints of a complex human action is generally addressed by extracting bottom-level spatio-temporal features from the image sequences or designing top-level dynamic models such as hidden Markov model (HMM), dynamic Bayesian network (DBN) or conditional random field (CRF). Time-sliced dynamic models generally assume n-th order Markov property and stationary transition. Hence, they can only capture local stationary transitions and cannot represent global moving patterns. Moreover, these assumptions may not hold for many real-world applications. Spatio-temporal features are typically based on local interest points and therefore are also not able to describe the movement pattern throughout the whole action process.

Compared to time-sliced dynamic models, the restricted Boltzmann machine (RBM) has been demonstrated to have strong power to capture the joint distribution of the inputs and therefore can be used to model the global patterns when the input is a time sequence of joint positions. To the best of our knowledge, RBM has not yet been applied to analyze the global dynamics of trajectories for action recognition, although it has been widely used in many other applications such as image and document analysis.

To comprehensively model the temporal dynamics of human actions at different levels, we propose a hybrid approach that combines a Gaussian-Binary restricted Boltzmann machine (GB-RBM) to capture the global movement patterns with an HMM to capture the local dynamics. As GB-RBM is a variation of the standard RBM, we use the term RBM in the following sections to represent our model. The local and global models capture complementary dynamic information at different time scales and are combined through a fusion approach for action classification. A detailed illustration of the framework is given in Figure 1.
Fig. 1. Framework of the proposed method. For each class of action, one RBM and one HMM are trained to represent the global and local dynamics respectively. Each pair of RBMs M_i and M_j forms a pairwise classifier f_ij, which gives a preference score toward action class i or j. The preference scores of RBM and HMM are combined to make the final prediction.

The remainder of the paper is organized as follows. Section II presents an overview of the related work. Section III introduces the learning process of RBM for action representation. Section IV demonstrates the fusion method for global and local dynamic models. Experimental results are given in Section V. The paper is concluded in Section VI.

II. RELATED WORK

Human activity recognition has been widely investigated in the past few decades. Depending on the action complexity, human actions are categorized into four different levels: gestures, actions, interactions and group activities [1]. In this work, we focus on action recognition, i.e., a single person's activities that may be composed of multiple gestures organized temporally, such as waving, running, and jumping.

Research in action recognition generally follows two paths: single-layered approaches and hierarchical approaches. Single-layered approaches recognize human actions directly from sequential images, while hierarchical approaches represent actions with simpler sub-actions.

In single-layered approaches, a sequence of images may be considered as a 3D volume [16], trajectories [15], or spatio-temporal features. The most widely used spatio-temporal features for visible videos are histogram of gradients (HOG) and histogram of flows (HOF), which capture the local appearance or motion information. Features from depth images and joint trajectories have also been developed recently with the development of inexpensive and reliable depth sensors. For instance, Wang et al. [20] propose LOP features, which are the frequency coefficients of the Fourier transform of local features extracted from the depth images around human joints. Given some specifically designed features, template matching [16], neighborhood-based methods [22] and other models are typically used to make predictions.

Hierarchical approaches typically include statistical approaches and description-based approaches. Statistical approaches construct statistical state-based models that are concatenated hierarchically. Conditional random fields [7] and hidden Markov models [12] are common examples of statistical models. These dynamic models, either generative or discriminative, assume stationary transition and hence are only able to capture local temporal interactions between several consecutive frames. Description-based approaches divide human actions into sub-events. Prediction is made by modeling the temporal and spatial relationship of sub-events [6].
The restricted Boltzmann machine and its variants are generally used as a tool for feature learning or data pre-processing, yet could also be used for modeling the motion data. For example, Wang et al. [21] use RBM to get a prior probability for finger trace. Larochelle and Bengio [10] use RBM to generate features for character recognition. Our work is inspired by the idea of Taylor et al. [19], where a Conditional RBM (CRBM) is proposed to model the temporal transitions between consecutive time slices and generate pseudo movement sequences. Still, CRBM models local dynamics by assuming n-th order Markov property. Conditioned on previous slices, it models the information of the current time slice. Unlike these works, RBM is used as a generative model in this research, which models the high dimensional sequential data and returns the likelihood of the input. Moreover, RBM is combined with an HMM to jointly capture the global and local dynamics of human actions. By utilizing an approach to estimate the relative partition functions of RBMs, we are able to compare between different RBMs, and thus make predictions.

III. MODELING TRAJECTORIES

In this work, we propose to capture the global patterns of human joint trajectories using the restricted Boltzmann machine (RBM). We choose RBM due to its capability to model complex patterns in high dimensional data. One RBM is learned to capture the global moving pattern of one type of action. In this section, we first give a brief introduction of the restricted Boltzmann machine. We then introduce how to use RBM to model a sequence of motion data. An approach to estimate the partition function of RBM is then proposed to perform classification among multiple actions.

A. Restricted Boltzmann Machine

A restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over a set of inputs. As shown in Figure 2, all the neurons form a bipartite graph: they have input units, corresponding to data, hidden units that are learned, and each connection in an RBM must connect a visible unit to a hidden unit. In our work, the hidden units are binary and the visible variables are assumed to follow a normal distribution.

The energy function $E(v, h)$ is parameterized in Equation 1. Variable $a_i$ is the bias and $\sigma_i$ is the standard deviation of the Gaussian distribution for visible unit $v_i$. If the data is normalized in each dimension, then $\sigma_i = 1$, $a_i = 0$. $b_j$ is the bias of the hidden unit $h_j$. The joint distribution of the visible and hidden variables is given in Equation 2. With continuous inputs, the partition function $Z$ can be computed by the integral over all visible nodes and summation over all hidden units.

$$E(v, h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_i \sum_j \frac{v_i}{\sigma_i} w_{ij} h_j - \sum_j b_j h_j, \tag{1}$$

$$p(v, h) = \frac{1}{Z} \exp(-E(v, h)). \tag{2}$$

The probability of an observation can be calculated by marginalizing over the hidden variables, as shown in Equation 3.

$$p(v) = \frac{1}{Z} \sum_h \exp(-E(v, h)). \tag{3}$$

Hidden units in the RBM have two states: on and off. Given an input vector $v$, the binary state $h_j$ is set on with probability

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \tag{4}$$

where $\sigma(x)$ is the logistic sigmoid function $1/(1 + \exp(-x))$. A hidden unit $h_j$ is connected to all the inputs, so it is activated when there exists some specific pattern in the visible layer through Equation 4. The pattern is captured by the weights connecting each element in $v$ to $h_j$. Thus the hidden layer $h$ represents important patterns of $v$. Parameters of RBM include the weights of the connections between the hidden units and visible units as well as their biases. They are usually learned using the Contrastive Divergence (CD) [3] method to get an approximate Maximum-Likelihood solution.

B. Modeling Actions using RBM

Typically, the input to the RBM is limited to a single image. As we interpret an action as a combination of the 3D trajectories of human joints, we propose to use RBM to model the whole sequence of an action. The basic idea is to feed the joint positions along the temporal trajectory as the inputs to RBM, as shown in Figure 3, where the t-th visible variable corresponds to the joint positions at time slice t. Given a total of N actions, N RBMs $\{M_1, M_2, \ldots, M_N\}$ are learned, with each model $M_i$ learning the temporal dynamics for action $A_i$.
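As an illustration of Equations 1, 3 and 4, the following is a minimal NumPy sketch, not the authors' implementation. The function names, the array layout (`W` with one row per visible unit and one column per hidden unit) and the toy sizes are assumptions made for this example. The `free_energy` helper uses the standard closed form obtained by summing the binary hidden units out of Equation 3, so its negative equals the unnormalized log-likelihood of an input.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + exp(-x)).
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def energy(v, h, W, a, b, sigma):
    """Equation 1: energy of one joint configuration (v, h)."""
    quadratic = np.sum((v - a) ** 2 / (2.0 * sigma ** 2))
    interaction = (v / sigma) @ W @ h
    return quadratic - interaction - b @ h

def hidden_prob(v, W, b, sigma):
    """Equation 4, p(h_j = 1 | v); equals sigmoid(b + v @ W) when sigma = 1."""
    return sigmoid(b + (v / sigma) @ W)

def free_energy(v, W, a, b, sigma):
    """Negative log of the unnormalized marginal in Equation 3 (hidden units summed out)."""
    quadratic = np.sum((v - a) ** 2 / (2.0 * sigma ** 2))
    activation = b + (v / sigma) @ W
    return quadratic - np.sum(np.logaddexp(0.0, activation))

# Usage with small, arbitrary sizes (hypothetical values, not the paper's settings).
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = 0.1 * rng.normal(size=(n_visible, n_hidden))
a, b, sigma = np.zeros(n_visible), np.zeros(n_hidden), np.ones(n_visible)
v = rng.normal(size=n_visible)
print(hidden_prob(v, W, b, sigma))      # activation probability of each hidden unit
print(-free_energy(v, W, a, b, sigma))  # unnormalized log-likelihood of v
```

Working with the free energy rather than enumerating hidden states keeps the cost linear in the number of hidden units, which is what makes scoring a whole trajectory under each action model practical.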
RBM can be efficiently learned using the contrastive divergence algorithm (CD) [3]. However, parameter estimation of an RBM still faces the following challenges. Due to the non-convexity property of RBM, only local optimal solutions can be achieved. Different initializations could end up with different estimated parameters. Moreover, parameters are estimated in a generative manner, which does not necessarily benefit action classification.

We propose a model selection approach to simultaneously address all the above issues. Model selection is performed for every action in turn. To select a model for the i-th action, we first generate K candidate RBMs $\{M_i^k : k = 1, \ldots, K\}$ from different initializations. These RBM candidates are then evaluated on the training set $\{V_i^j\}$, with $V_i^j$ representing the j-th sample of the i-th action. The score of each model $M_i^k$ is defined in Equation 5, where $E(V_i^j \mid M_i^k)$ corresponds to the energy of $V_i^j$ on model $M_i^k$. The basic idea is that we hope the selected model can maximally differentiate the samples of the i-th action from other actions in terms of their likelihood. Finally the model that produces the highest score is selected as the model for the i-th action. The procedure is repeated N times until all the models are selected.

$$\mathrm{Score}(M_i^k) = \frac{\sum_j \exp(-E(V_i^j \mid M_i^k))}{\sum_{i'} \sum_j \exp(-E(V_{i'}^j \mid M_i^k))}. \tag{5}$$

The difference between using the energy function and the likelihood function is the partition function. Since we are computing the likelihood based on one single model, the partition function is a constant, which can be omitted in Equation 5.

With the RBM learned for each action, it is infeasible to compare between models directly, because calculating the likelihood requires calculating the partition functions, which is intractable for RBM with a large number of hidden units. Nevertheless, there still exists a method to estimate the relative partition function between different RBMs. For binary classification, Schmah et al. [17] propose a method to discriminatively estimate the difference of log-partition functions of two RBMs:

$$t_{ij} = \log Z_i - \log Z_j. \tag{6}$$

Fig. 2. Graphical illustration of RBM.

Fig. 3. Modeling actions with RBM.
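The model selection rule of Equation 5 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the energies are assumed to be precomputed for every training sample under each candidate, and the denominator is read as a sum over all training samples. Restricting the denominator to the other actions would give the same ranking of candidates, since one ratio is a monotone function of the other. The score is computed in the log domain to avoid underflow when energies are large.

```python
import numpy as np

def log_sum_exp(x):
    # Stable evaluation of log(sum(exp(x))).
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def log_selection_score(energies, labels, target):
    """Log of Equation 5 for one candidate model of action `target`.

    energies[s] is the energy of training sample s under the candidate,
    labels[s] is the action index of sample s.
    """
    energies = np.asarray(energies, dtype=float)
    labels = np.asarray(labels)
    own = -energies[labels == target]          # samples of the target action
    return log_sum_exp(own) - log_sum_exp(-energies)

def select_model(candidate_energies, labels, target):
    """Return the index of the best candidate and all log-scores."""
    scores = [log_selection_score(e, labels, target) for e in candidate_energies]
    return int(np.argmax(scores)), scores

# Usage: two hypothetical candidates for action 0, evaluated on four samples.
labels = np.array([0, 0, 1, 1])
candidate_a = np.array([5.0, 5.1, 5.0, 4.9])   # barely separates the two actions
candidate_b = np.array([1.0, 1.2, 9.0, 8.5])   # low energy only on action 0
best, scores = select_model([candidate_a, candidate_b], labels, target=0)
print(best, scores)
```

Because all samples are scored under the same candidate, the partition function cancels in the ratio, which is why only energies are needed here.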

We extend this approach to multi-class classification with a label ranking procedure [9] (Section IV).

C. Local Dynamic Model

Local dynamic models capture the local interactions between consecutive frames. In this work we implement a hidden Markov model as the local dynamic model. An HMM is defined by the prior of the hidden states, the transition probabilities and the emission probabilities. The well-known Expectation-Maximization algorithm (EM) [14] can be employed to estimate the parameters. The hidden states in HMM are a generalization of the input sequence. For instance, if the inputs are actual joint positions, then the hidden states represent some specific joint positions which are crucial in the sequence. In this way a sequence can be transformed into a sequence of states. To recognize actions, we follow the same procedure as RBM and learn a group of HMMs, each of which corresponds to one action. Given the query sample, its likelihood for each HMM is calculated using the Forward-Backward procedure. HMM is treated as a local dynamic model because it assumes stationary transition and Markov property of the states. We only consider the transition between two consecutive frames. The score of HMM is simply the likelihood of the observation, which is easy to compute using the Forward-Backward algorithm.

IV. FUSION OF GLOBAL AND LOCAL MODELS

In this section, we introduce how we transform the unnormalized likelihood of RBM into a confidence score and combine it with the likelihood of HMM for action recognition. A standard procedure to classify a query sample $v$ is to compute its likelihood for all the models, and choose the model with the greatest likelihood, as shown in Equation 7.

$$y = \arg\max_i \, p(v \mid M_i), \tag{7}$$

where $y$ is the predicted result for instance $v$. Let $p^*(v)$ denote the unnormalized likelihood in RBM with $\log p(v) = \log p^*(v) - \log Z$. A confidence score for a sequence is defined as:

$$F_{ij}(v) = \frac{1}{1 + \exp\big(-\alpha(\log p^*(v \mid M_i) - \log p^*(v \mid M_j) - t_{ij})\big)}, \tag{8}$$

where parameter $\alpha$ modifies the distribution of the score, in case all the scores are too close to 0 or 1.
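The pairwise confidence of Equation 8, together with the preference relation, score aggregation and fusion rule that this section goes on to define (Equations 9 to 11), can be sketched as below. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, `t[i, j]` is assumed to hold an estimate of $\log Z_i - \log Z_j$ for $i < j$, and since the text does not say how the HMM likelihoods are scaled before fusion, `s_hmm` is simply whatever per-action HMM score vector the caller supplies.

```python
import numpy as np

def pairwise_confidence(logp_i, logp_j, t_ij, alpha=1.0):
    """Equation 8: confidence that the sample belongs to action i rather than j."""
    x = alpha * (logp_i - logp_j - t_ij)
    return 0.5 * (1.0 + np.tanh(0.5 * x))  # stable form of 1 / (1 + exp(-x))

def rbm_scores(log_pstar, t, alpha=1.0):
    """Equations 9 and 10: sum the pairwise preferences for every action.

    log_pstar[i] is the unnormalized log-likelihood of the sample under RBM i.
    Only the entries t[i, j] with i < j are read.
    """
    n = len(log_pstar)
    scores = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            f_ij = pairwise_confidence(log_pstar[i], log_pstar[j], t[i, j], alpha)
            scores[i] += f_ij        # R_v(i, j) = F_ij(v) for i < j
            scores[j] += 1.0 - f_ij  # R_v(j, i) = 1 - F_ij(v)
    return scores

def fuse(s_rbm, s_hmm, omega):
    """Equation 11: linear fusion, then pick the action with the highest score."""
    return int(np.argmax(np.asarray(s_rbm) + omega * np.asarray(s_hmm)))

# Usage with three hypothetical actions.
log_pstar = np.array([-10.0, -12.0, -30.0])
log_z = np.array([5.0, 4.0, 3.0])           # made-up values, only differences matter
t = log_z[:, None] - log_z[None, :]
s_rbm = rbm_scores(log_pstar, t, alpha=1.0)
print(s_rbm, fuse(s_rbm, np.zeros(3), omega=0.5))
```

Only differences of log-partition functions enter the computation, so the individual partition functions never have to be evaluated.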
The output of such a soft binary classifier can be interpreted as a confidence value in the classification: the closer the output $F_{ij}$ to 1, the stronger the decision of choosing action $A_i$ is supported. A valued preference relation $R_v$ is defined for any query instance $v$:

$$R_v(i, j) = \begin{cases} F_{ij}(v) & \text{if } i < j, \\ 1 - F_{ji}(v) & \text{if } i > j. \end{cases} \tag{9}$$

In our approach, we evaluate the score of action $i$ as the sum of all its confidence values:

$$S_v(i) = \sum_{j \neq i} R_v(i, j). \tag{10}$$

The global and local temporal information can be integrated at different levels of the learning process. In this paper we propose to combine them in the prediction phase. The scores of the RBM and HMM models are linearly combined (Equation 11) with a tuned weight $\omega$, which maximizes the recognition accuracy on a validation set, and the label with the highest score is proposed as the final decision.

$$S(v) = S_{RBM}(v) + \omega S_{HMM}(v). \tag{11}$$

V. EXPERIMENTS

We evaluate our algorithm on three datasets: MSRC-12 Kinect gesture dataset [5], G3D dataset [2], and MSR Action3D dataset [11]. Models that will be compared in the experiments include a global model RBM, a local model HMM, and the combined model. We will also compare our proposed approach with other related works.

A. MSRC-12 Dataset

The Microsoft Research Cambridge-12 Kinect gesture dataset consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system. The data set includes 594 sequences and 719,359 frames collected from 30 people performing 12 gestures. The motion files contain tracks of 20 joints estimated using the Kinect Pose Estimation pipeline. The body poses are captured at a sample rate of 30 Hz with an accuracy of about two centimeters in joint positions.

To deal with the tracking noise, anisotropic diffusion [13] is employed to smooth the trajectories, correcting noise yet preserving meaningful changes in motion. Figure 4 illustrates an example of anisotropic diffusion. As the size of the visible layer of an RBM is fixed, linear interpolation is performed to convert all sequences into the same length (20 frames for each sequence).
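The two pre-processing steps just described, edge-preserving smoothing and resampling to a fixed length, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the smoothing is a one-dimensional Perona-Malik style scheme with the exponential conductance function, assumed here to be applied to each coordinate of each joint independently, and the parameter values (`n_iter`, `kappa`, `lam`) are hypothetical defaults chosen for the toy signal.

```python
import numpy as np

def anisotropic_diffusion_1d(x, n_iter=20, kappa=0.05, lam=0.25):
    """Perona-Malik style smoothing of a 1D signal (one coordinate of one joint)."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        d = np.diff(x)                          # differences between neighbouring frames
        flux = np.exp(-(d / kappa) ** 2) * d    # close to zero across large jumps
        x[:-1] += lam * flux                    # each sample moves toward its right neighbour
        x[1:] -= lam * flux                     # and toward its left neighbour
    return x

def resample_linear(seq, n_frames=20):
    """Linearly interpolate a (T, D) sequence to n_frames frames."""
    seq = np.asarray(seq, dtype=float)
    src = np.linspace(0.0, 1.0, seq.shape[0])
    dst = np.linspace(0.0, 1.0, n_frames)
    columns = [np.interp(dst, src, seq[:, d]) for d in range(seq.shape[1])]
    return np.stack(columns, axis=1)

# Usage: a noisy step-like coordinate of 100 frames, smoothed and resampled to 20 frames.
rng = np.random.default_rng(0)
coordinate = np.concatenate([np.zeros(50), np.ones(50)]) + rng.normal(scale=0.005, size=100)
smoothed = anisotropic_diffusion_1d(coordinate)
fixed_length = resample_linear(smoothed[:, None], n_frames=20)
print(fixed_length.shape)
```

The conductance term shrinks toward zero where neighbouring frames differ by much more than `kappa`, so small jitter is averaged out while abrupt turning points are left largely untouched.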
In this work, we only use the 3D location information of four dominant joints (i.e., two hands and two feet) due to the limited number of samples. However, the proposed approach can be applied to model more joints if training data are adequate. The 3D positions of the body joints along all three dimensions (x, y and z) are concatenated as the 240-dimension input vectors for the RBM model (20 frames, 4 joints and 3 coordinates per joint). The size of the hidden layer is set to 150 according to the suggestion in Hinton [8].

The dataset is constructed both to measure the performance of recognition systems and to evaluate various methods of teaching human subjects how to perform different actions. It is therefore partitioned along different methods of instruction, such as text-only or text and video. In our work, different instructions are ignored and only video-based actions are selected to evaluate the performance of the proposed algorithm. The action markers provided with the dataset are used to segment the actions from long sequences. A 4-fold cross-subject validation configuration is used in our experiment.

Detailed results are shown in Table I, and the confusion matrix is shown in Figure 5. The local dynamic model achieves a recognition accuracy of 85.2%, while the global dynamic model reaches 89.8%. This demonstrates the importance of incorporating global dynamics. Combining global and local dynamic models, the proposed method achieves an even better recognition accuracy of 93.1%. In particular, our method outperforms the state-of-the-art method as reported in Ellis et al. [4]. According to the confusion matrix, our algorithm performs well on most of the actions, and only fails on a small portion of actions such as Had Enough and Lift Arms.

Fig. 4. Illustration of the pre-processing of trajectories. The left figure shows the original tracking result of a joint and the right figure shows the processed trajectory after anisotropic diffusion smoothing. It is clear that the high frequency noise can be effectively removed while the turning points in a trajectory are preserved.

Fig. 5. The confusion matrix of the proposed method on MSRC-12 dataset. Classes: LiftArms, Duck, PushRight, Goggles, WindUp, Shoot, Bow, Throw, HadEnough, ChangeWeapon, BeatArms, Kick.

B. G3D Dataset

G3D dataset is an action dataset containing a range of gaming actions captured by Microsoft Kinect. The dataset contains 10 subjects performing 20 gaming actions. Synchronized video, depth and skeleton data are available in this dataset. We only use the skeleton information in the experiment. The action segmentation is manually labeled. The input vectors are extracted following the same procedure in Section V-A. Half of the samples are used as testing data, 5 samples from each action as validation data, and the other samples as training data.
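The input-vector construction of Section V-A that is reused here (20 frames, four joints, three coordinates, giving 240 values) can be sketched as follows. The joint indices in `HAND_FOOT_IDS`, the frame-major flattening order and the per-dimension standardization are assumptions made for this example: the text does not give the skeleton indexing of either dataset, and the standardization is included only because Section III notes that normalized data allow $\sigma_i = 1$ and $a_i = 0$.

```python
import numpy as np

# Hypothetical indices of the two hands and two feet in a 20-joint skeleton;
# the actual ordering depends on the dataset's file format.
HAND_FOOT_IDS = [7, 11, 15, 19]

def build_input_vector(frames, joint_ids=HAND_FOOT_IDS):
    """Flatten a resampled skeleton of shape (20, n_joints, 3) into one vector.

    The result is frame-major: all selected joints of frame 0, then frame 1, and so on.
    """
    frames = np.asarray(frames, dtype=float)
    return frames[:, joint_ids, :].reshape(-1)

def standardize(X, eps=1e-8):
    """Zero-mean, unit-variance scaling of each input dimension over the training set."""
    mean = X.mean(axis=0)
    std = X.std(axis=0) + eps
    return (X - mean) / std, mean, std

# Usage with random stand-in data: 50 sequences of 20 frames and 20 joints.
rng = np.random.default_rng(0)
sequences = rng.normal(size=(50, 20, 20, 3))
X = np.stack([build_input_vector(s) for s in sequences])
X_norm, mean, std = standardize(X)
print(X.shape)
```

The mean and standard deviation estimated on the training set would be reused unchanged for validation and test samples.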
To compare with the baseline method [2], we compute the F1 score for each category of actions. The result is shown in Table II. The proposed method outperforms the baseline model for most of the actions, but also encounters some failures in the actions of Tennis and ThrowBowlingBall. The reason is that when there is occlusion of the body parts, the Kinect tracker may fail occasionally and return inferred results that affect the accuracy, which is the case for TennisSwingBackhand, Golf and ThrowBowlingBall. Especially in the Golf action, only one side of the subject can be seen by the camera, so the position of one leg is inferred by the tracking procedure, which introduces considerable error. Also, the movement ranges of the actions Walk and Jump are relatively small, so they may be confused with each other. However, the overall accuracy of our algorithm is acceptable. From Table III, the combined model outperforms the global and local models, as expected.

TABLE I. PERFORMANCE COMPARISON OF DIFFERENT METHODS ON MSRC-12 DATASET

Method                  Accuracy
Hidden Markov Model     85.2%
Ellis et al. [4]        88.7%
RBM                     89.8%
Proposed Method         93.1%

C. MSR Action3D Dataset

MSR Action3D Dataset is a dataset of 20 actions including both depth images and skeleton tracking results. The dataset reasonably covers the various movements of arms, legs, torso and their combinations. Each action is performed by 10 subjects, repeated 2 or 3 times. There are 567 sequences altogether. We use the same four joint positions as our features. Following the same cross-subject setting as [11], 5 subjects for each action are selected for testing. For the remaining 5 subjects, 4 are used for training and 1 is used for validation. This dataset poses many challenges for recognition: there exist small between-class variations (e.g., high arm wave and horizontal arm wave), and some actions involve complex interactions among the body parts, thereby leading to a large amount of occlusion (one leg in front of the other, or part of the body outside the camera range), which significantly decreases the tracking performance.
Table IV lists the recognition rates of the local model (HMM), the global model (RBM) and the combined approach, as well as the results reported in [11], [20]. From the results we can see that the global model outperforms the local model by more than 20 percentage points (79.6% versus 55.3%). This demonstrates the importance of global dynamics for discriminating actions, and with the proposed classification approach, RBM can successfully capture the global dynamics for action recognition. Meanwhile, by combining the local model and global model, the proposed method can further improve the classification accuracy.

The performance of our method is below the method reported in [20]. Such a result is reasonable because our method only uses a subset of the joints, and we do not use any features from the depth images. Our feature set is much smaller than those of other methods that consider both appearance and shape. The proposed method needs more information to classify among similar actions such as draw X, draw tick, and draw circle. The model also cannot handle severely noisy or corrupted data, as in the bend action. However, the proposed algorithm performs quite well on the distinctive actions, especially complicated actions which involve both hands and feet, like tennis swing.

TABLE II. F1 SCORE OF PROPOSED MODEL AND BASELINE MODEL

Columns: Action, Bloom et al. [2], Proposed Method
Rows: Fighting, Golf, Tennis, Bowling, FPS, Driving, Misc, Avg

TABLE III. COMPARISON OF DIFFERENT METHODS ON G3D DATASET

Method                  Accuracy
Hidden Markov Model     77.4%
RBM                     84.0%
Proposed Method         86.4%

TABLE IV. COMPARISON OF DIFFERENT METHODS ON ACTION3D DATASET

Method                  Accuracy
Hidden Markov Model     55.3%
Li et al. [11]          74.7%
RBM                     79.6%
Proposed Method         80.2%
Wang et al. [20]        88.2%

VI. CONCLUSION

In this paper we propose a novel approach that captures both local and global dynamical information of the human joint trajectories for action recognition. The contributions of this paper are as follows. First, we introduce the Gaussian-Bernoulli restricted Boltzmann machine to model the motion data and capture the global dynamics of human actions. RBM is used as a generative model for dynamic modeling. A model selection method is introduced to generate discriminative models. We further propose a novel classification approach to apply RBM for action recognition. Second, we combine RBM with a hidden Markov model using a fusion procedure to jointly exploit global and local patterns. Finally, experimental results demonstrate the effectiveness of the proposed approach.

ACKNOWLEDGMENT

The work described in this paper is supported in part by the grant N from the Office of Naval Research.

REFERENCES

[1] J. Aggarwal and M. S. Ryoo. Human activity analysis: A review. ACM Computing Surveys (CSUR), 43(3):16.
[2] V. Bloom, D. Makris, and V. Argyriou. G3D: A gaming action dataset and real time action recognition evaluation framework. In Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE.
[3] M. Carreira-Perpinan and G. Hinton. On contrastive divergence learning. In Artificial Intelligence and Statistics, volume 2005, page 17.
[4] C. Ellis, S. Masood, M. Tappen, J. LaViola, and R. Sukthankar. Exploring the trade-off between accuracy and observational latency in action recognition. International Journal of Computer Vision, pages 1-17.
[5] S. Fothergill, H. M. Mentis, P. Kohli, and S. Nowozin. Instructing people for training gestural interactive systems. In J. A. Konstan, E. H. Chi, and K. Höök, editors, CHI. ACM.
[6] A. Gupta and L. S. Davis. Objects in action: An approach for combining action understanding and object perception. In Computer Vision and Pattern Recognition (CVPR), pages 1-8. IEEE.
[7] L. Han, X. Wu, W. Liang, G. Hou, and Y. Jia. Discriminative human action recognition in the learned hierarchical manifold space. Image and Vision Computing, 28(5).
[8] G. Hinton. A practical guide to training restricted Boltzmann machines. Momentum, 9:1.
[9] E. Hüllermeier, J. Fürnkranz, W. Cheng, and K. Brinker. Label ranking by learning pairwise preferences. Artificial Intelligence, 172(16).
[10] H. Larochelle and Y. Bengio. Classification using discriminative restricted Boltzmann machines. In International Conference on Machine Learning. ACM.
[11] W. Li, Z. Zhang, and Z. Liu. Action recognition based on a bag of 3D points. In Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE.
[12] F. Lv and R. Nevatia. Recognition and segmentation of 3-D human action using HMM and multi-class AdaBoost. European Conference on Computer Vision (ECCV).
[13] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(7).
[14] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2).
[15] C. Rao and M. Shah. View-invariance in action recognition. In Computer Vision and Pattern Recognition (CVPR), volume 2, pages II-316. IEEE.
[16] M. D. Rodriguez, J. Ahmed, and M. Shah. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In Computer Vision and Pattern Recognition (CVPR), pages 1-8. IEEE.
[17] T. Schmah, G. E. Hinton, S. L. Small, S. Strother, and R. S. Zemel. Generative versus discriminative training of RBMs for classification of fMRI images. In Advances in Neural Information Processing Systems (NIPS).
[18] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In Computer Vision and Pattern Recognition (CVPR). IEEE.
[19] G. Taylor, G. Hinton, and S. Roweis. Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems (NIPS), 19:1345.
[20] J. Wang, Z. Liu, Y. Wu, and J. Yuan. Mining actionlet ensemble for action recognition with depth cameras. In Computer Vision and Pattern Recognition (CVPR). IEEE.
[21] Z. Wang, G. Schalk, and Q. Ji. Anatomically constrained decoding of finger flexion from electrocorticographic signals. In Advances in Neural Information Processing Systems (NIPS).
[22] A. Yilmaz and M. Shah. Recognizing human actions in videos acquired by uncalibrated moving cameras. In International Conference on Computer Vision (ICCV), volume 1. IEEE.
