Robust visual tracking based on Informative random fern

5th International Conference on Computer Sciences and Automation Engineering (ICCSAE 2015)

Hao Dong 1,a, Rui Wang 1,b

1 School of Instrumentation Science and Opto-electronics Engineering, Beihang University, Beijing 100191, China
a qanxn_dh@63.com, b wangr@buaa.edu.cn

Keywords: Visual tracking, IRF-TLD, Gaussian projection, Real time

Abstract. In this paper, a novel visual tracking algorithm named Informative random fern Tracking Learning Detection (IRF-TLD) is proposed. Instead of the binary comparison in the standard random fern of TLD, we use a real value feature and Gaussian random projection to obtain high accuracy and a low memory requirement. Experimental results on challenging sequences demonstrate the superior performance of our IRF-TLD when compared with several state-of-the-art tracking algorithms.

1. Introduction

Visual tracking is one of the most important problems in computer vision. It is the basis for many applications such as surveillance, human computer interaction and action recognition. Many methods have been proposed for visual tracking over the past few decades. Generally speaking, most trackers can be divided into two categories: generative models and discriminative models. Generative models [1] are typically formulated as searching for the image region most similar to the target, with minimal reconstruction error. Because they consider only the appearance of the object, generative models often fail in cluttered backgrounds. For discriminative models [2], tracking is treated as a binary classification task that finds the decision boundary between the target and the background. Compared with generative models, discriminative models are usually more resistant to cluttered backgrounds since they explicitly sample image patches from the background as negative examples to train the classifier. Kalal et al. [3] proposed a novel approach called Tracking Learning Detection (TLD), in which tracking and detection are independent processes that exchange information via learning.
The random fern [4] classifier is one important component of the cascade detector in TLD and shows excellent performance. However, it has some potential problems. First, the comparison of each pixel pair produces only two outputs, 0 or 1, which loses a lot of information. In addition, the random fern classifier in TLD requires enormous memory, which grows exponentially with the number of pixel pairs in a fern. To address these issues, we extend TLD with an informative random fern, which produces a real value feature for a fern based on subtraction and Gaussian projection.

The rest of this paper is organized as follows. Section 2 introduces the random fern. The proposed method IRF-TLD is presented in Section 3. Section 4 shows the experimental results, followed by the conclusion in Section 5.

© 2016. The authors - Published by Atlantis Press

2. Preliminaries

In the random fern, simple intensity comparisons between pixel pairs are chosen as the binary features. Let $f_i$, $i = 1, \dots, N$, denote the binary features extracted from the image patch to be classified. The class $\hat{c}$ of this patch is given by

$$ \hat{c} = \arg\max_{c \in C} p(c \mid f_1, f_2, \dots, f_N) \tag{1} $$

Here, $C$ is the set of all classes. Using Bayes' formula, the posterior can be written as

$$ p(c \mid f_1, f_2, \dots, f_N) = \frac{p(f_1, f_2, \dots, f_N \mid c)\, p(c)}{p(f_1, f_2, \dots, f_N)} \tag{2} $$

We take the denominator as a constant and assume the prior $p(c)$ is uniform; then (1) is equal to

$$ \hat{c} = \arg\max_{c \in C} p(f_1, f_2, \dots, f_N \mid c) \tag{3} $$

Ozuysal et al. [4] proposed to divide the features into several groups and assumed that different groups are independent of each other. Formally,

$$ p(f_1, f_2, \dots, f_N \mid c) = \prod_{i=1}^{N/S} p(F_i \mid c) \tag{4} $$

where $S$ is the number of pixel pairs in each fern, $F_i$ is a group of features, named a fern, and $T = N/S$ is the total number of ferns. In practice, $S$ cannot be too small, so the memory occupation is very large.

3. IRF-TLD algorithm

3.1 IRF-TLD framework

Our whole IRF-TLD tracking approach is summarized in Fig. 1. It inherits the framework of TLD, which decomposes the long-term tracking task into tracking, detection and learning. The target is followed by a tracker from frame to frame, and its motion is estimated using the Lucas-Kanade tracker extended with failure detection. The task of learning is to initialize the cascade detector in the first frame and update it at run time using the P-N experts. In the original TLD, the cascade detector, which is responsible for selecting the most likely target candidate in each frame, consists of three stages: (i) patch variance: this stage rejects patches whose gray-value variance is smaller than 50 percent of the variance of the target patch; (ii) random fern: it performs a number of pixel comparisons on a patch, resulting in a binary code that indexes an array of posteriors; (iii) nearest neighbor: the last stage, which classifies each candidate patch as target object or background by appearance, using the Normalized Correlation Coefficient.

In our IRF-TLD, the tracker and learning methods of TLD are adopted, while the cascade detector is improved. Instead of the binary comparison in the original random fern of TLD, we introduce the informative random fern classifier to improve the robustness of the detector. The more informative real value obtained from the subtraction is used in our method.
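The binary fern of stage (ii) can be illustrated with a minimal Python sketch. It is not TLD's actual implementation: the function names, the 8 x 8 patch and the row-major `patch[y][x]` layout are assumptions made for illustration only. It shows how S pixel comparisons form a binary code and why the posterior table of one fern needs 2^S entries per class.

```python
import random


def make_fern(num_pairs, patch_w, patch_h, rng):
    """Draw `num_pairs` random pixel pairs ((x1, y1), (x2, y2)) inside the patch."""
    def point():
        return (rng.randrange(patch_w), rng.randrange(patch_h))
    return [(point(), point()) for _ in range(num_pairs)]


def fern_code(patch, fern):
    """Binary code of a fern: one bit per pixel-pair intensity comparison."""
    code = 0
    for (x1, y1), (x2, y2) in fern:
        bit = 1 if patch[y1][x1] > patch[y2][x2] else 0
        code = (code << 1) | bit
    return code


def posterior_table_size(num_pairs, num_classes=2):
    """Entries stored for one fern: one posterior per binary code and class."""
    return num_classes * (2 ** num_pairs)


# Hypothetical usage on a random 8 x 8 grayscale patch with S = 4 pairs.
rng = random.Random(0)
S = 4
patch = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
fern = make_fern(S, 8, 8, rng)
code = fern_code(patch, fern)  # integer in [0, 2**S)
table_entries = posterior_table_size(S)
```

Each comparison contributes a single bit, so the magnitude of the intensity difference is discarded, and the table size doubles with every additional pixel pair.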
Moreover, a random projection maps the real value features of each fern to a single value that is modeled by a parametric distribution, specifically a Gaussian distribution, on which the classification is done. In the following, the proposed IRF classifier is described in detail in three steps: feature formation, classification with probability and online update.

Fig. 1 Framework of IRF-TLD tracking algorithm

3.2 Feature Formation

We adopt the real value feature from [5], i.e., the real value feature $f_{i,j}$ described in Eq. (5) is extracted from pixel pair $j$ of fern $i$:

$$ f_{i,j} = I(d_1(i,j)) - I(d_2(i,j)) \tag{5} $$

where $I(d)$ represents the intensity of an image patch $I$ at $d$, and $d_1(i,j)$ and $d_2(i,j)$ denote the coordinates of the randomly generated pixel pair $j$ of fern $i$. Obviously, the real value feature preserves more information about the intensity difference between two pixels because $f_{i,j} \in \mathbb{R}$ instead of $f_{i,j} \in \{0, 1\}$.

Since the feature $f_{i,j}$ is a real value, it is necessary to encode all real values in each fern into a single real value to simplify the subsequent classification. A theoretical basis for this idea is given by the Johnson-Lindenstrauss (JL) lemma [6]: with high probability, the distances between points in a high-dimensional space are preserved if they are projected onto a randomly selected low-dimensional subspace. Besides, the literature [6] also proved that for k-sparse data (e.g., image and audio signals), a random matrix such as a Gaussian matrix that satisfies the JL lemma also satisfies the restricted isometry property in compressive sensing. Therefore, we use a Gaussian matrix to efficiently project the feature values of the different pixel pairs into a single real value. Formally,

$$ F_i = \sum_{j=1}^{S} r_j f_{i,j} \tag{6} $$

where $r_j \sim N(0, 1)$ is a real value generated randomly according to a Gaussian distribution.

Besides, comparing the proposed IRF with the standard random ferns method, the following analysis shows that the IRF requires a constant and much lower amount of memory. Assume that the number of classes is $\gamma = 2$ (foreground and background), that $T$ is the total number of ferns, and that the real value feature is stored in a single precision type which occupies 4 bytes. Then the memory requirement is $MEM_{Our} = \gamma \times T \times 4$ bytes. In the standard random ferns method used in TLD, each specific binary code is stored in an integer type which occupies 4 bytes, so the memory requirement is $MEM_{TLD} = \gamma \times T \times 2^{S} \times 4$ bytes. It is clear that the standard random fern method in TLD needs $2^{S}$ times more memory than the proposed IRF method.

3.3 Classification with probability

In IRF, the output $F_i$ is a single real value obtained by a random projection whose coefficients follow a Gaussian distribution. For simplicity, we model the probability $p(F_i \mid c)$ as a Gaussian distribution with parameters $(\mu_i^{c}, \sigma_i^{c})$ for fern $i$ of class $c$.
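As a rough illustration of Eqs. (5) and (6) and of the Gaussian model above, the following Python sketch computes the value of one informative fern and its log-likelihood under given class parameters. It is only a sketch under stated assumptions, not the authors' C++ code: the function names, the `patch[y][x]` layout, the variance guard and the example numbers are all hypothetical.

```python
import math
import random


def make_irf_fern(num_pairs, patch_w, patch_h, rng):
    """Random pixel pairs plus one projection weight r_j ~ N(0, 1) per pair."""
    fern = []
    for _ in range(num_pairs):
        d1 = (rng.randrange(patch_w), rng.randrange(patch_h))
        d2 = (rng.randrange(patch_w), rng.randrange(patch_h))
        fern.append((d1, d2, rng.gauss(0.0, 1.0)))
    return fern


def fern_value(patch, fern):
    """F_i = sum_j r_j * (I(d1) - I(d2)): Eq. (5) followed by Eq. (6)."""
    total = 0.0
    for (x1, y1), (x2, y2), r in fern:
        total += r * (patch[y1][x1] - patch[y2][x2])
    return total


def gaussian_log_pdf(value, mean, std):
    """log N(value; mean, std^2), used here as log p(F_i | c)."""
    std = max(std, 1e-6)  # assumed guard against a degenerate variance
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical usage: a 2 x 2 patch and a hand-written fern with two pairs.
patch = [[10, 20], [30, 5]]
fern = [((0, 0), (1, 0), 0.5), ((0, 1), (1, 1), -2.0)]
value = fern_value(patch, fern)  # 0.5 * (10 - 20) + (-2.0) * (30 - 5)
log_p = gaussian_log_pdf(value, mean=-50.0, std=10.0)
```

Only a mean and a standard deviation are kept per fern and class in this sketch, which is why the storage does not depend on the number of pixel pairs S.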
The discriminative function is then

$$ H(F) = \log \frac{\prod_{i=1}^{T} p(F_i \mid c=1)\, p(c=1)}{\prod_{i=1}^{T} p(F_i \mid c=0)\, p(c=0)} = \sum_{i=1}^{T} \log p(F_i \mid c=1) - \sum_{i=1}^{T} \log p(F_i \mid c=0) \tag{7} $$

where we assume a uniform prior $p(c=1) = p(c=0)$, $c \in \{0, 1\}$ is a binary variable which represents the sample label, and $F = \{F_1, F_2, \dots, F_T\}$ is the set containing the values of all $T$ ferns for an image patch. The IRF classifies the patch as the target if the corresponding value $H(F)$ is larger than zero.

3.4 Online Update

To integrate our IRF feature, in which the value of each fern is modeled as a Gaussian distribution with parameters $(\mu_i^{c}, \sigma_i^{c})$, into the target model, we simplify the update of the classifier to a parameter update:

$$ \mu_i^{c} \leftarrow \lambda \mu_i^{c} + (1 - \lambda)\, \mu_i^{c,new} $$

$$ \sigma_i^{c} \leftarrow \sqrt{\lambda (\sigma_i^{c})^2 + (1 - \lambda)(\sigma_i^{c,new})^2 + \lambda (1 - \lambda)(\mu_i^{c} - \mu_i^{c,new})^2} \tag{8} $$

where $\lambda$ is the learning rate, and $\mu_i^{c,new} = E[F_i \mid c]$ and $(\sigma_i^{c,new})^2 = E[(F_i \mid c)^2] - (E[F_i \mid c])^2$ are estimated from the training samples that are generated by the P-N experts at the current frame.

4. Experiments

Our tracker is implemented in C++ and runs at 25 frames per second on an Intel Dual-Core 3.30GHz CPU with 4G RAM. Three state-of-the-art algorithms, TLD [3], OAB [7] and CT [2], are used on 6 fully-annotated video sequences to validate the performance of our IRF-TLD algorithm. All of these algorithms are evaluated with the one-pass evaluation (OPE) [8]; the sequences with the corresponding ground truth files and the compared code library are available on the website http://visual-tracking.net. In all the experiments, the total number of ferns is set to $T = 50$, the number of pixel pairs in a fern is set to $S = 4$, and the learning rate $\lambda$ is set to 0.85.

4.1 Evaluation Metric

We use the precision plots and success plots [8] to evaluate the robustness of the tracking algorithms quantitatively. The precision plot shows the percentage of frames whose estimated center locations are within a given threshold distance of the ground truth. To compare the performance of different algorithms, the score at a threshold of 20 pixels is used as the representative precision score. The success plot is based on the overlap ratio

$$ OS = \frac{Area(b_t \cap b_a)}{Area(b_t \cup b_a)} $$

where $b_t$ is the tracked target box and $b_a$ denotes the ground truth box. The success plot shows the ratio of frames with $OS > t_0$ for all thresholds $t_0 \in [0, 1]$. The area under curve (AUC) of each success plot serves as the first measure to rank the tracking algorithms in the following.

4.2 Result and Analysis

The overall performance of the 4 tracking algorithms based on success plots and precision plots is illustrated in Fig. 2. According to the experimental results, our algorithm achieves outstanding performance in both metrics, overlap and center location error: in the success plot, it achieves an AUC score of 0.58 and ranks 1st. Moreover, our IRF-TLD algorithm outperforms TLD by 6.8%. Meanwhile, the overall precision of our IRF-TLD, at 79.3%, is also the highest among all algorithms and exceeds that of TLD.

Fig. 2 The overall performance of the 4 tracking algorithms on all video sequences

To further analyze the performance of IRF-TLD, the AUC scores and precision scores for each sequence are also shown in Table 1. Some sampled results on the sequences are illustrated in Fig. 3.
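The evaluation metrics of Section 4.1 can be sketched in Python as follows. This is a minimal illustration and not the benchmark toolkit of [8]: the `(x, y, w, h)` box format, the function names, the 101 evenly spaced thresholds and the approximation of the AUC by the mean success rate are assumptions, and both input lists are assumed to be non-empty and of equal length.

```python
import math


def overlap_score(bt, ba):
    """OS = area of intersection / area of union for (x, y, w, h) boxes."""
    ix = max(0.0, min(bt[0] + bt[2], ba[0] + ba[2]) - max(bt[0], ba[0]))
    iy = max(0.0, min(bt[1] + bt[3], ba[1] + ba[3]) - max(bt[1], ba[1]))
    inter = ix * iy
    union = bt[2] * bt[3] + ba[2] * ba[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(bt, ba):
    """Euclidean distance between the two box centers, in pixels."""
    dx = (bt[0] + bt[2] / 2.0) - (ba[0] + ba[2] / 2.0)
    dy = (bt[1] + bt[3] / 2.0) - (ba[1] + ba[3] / 2.0)
    return math.hypot(dx, dy)


def precision_score(tracked, truth, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    hits = sum(center_error(t, a) <= threshold for t, a in zip(tracked, truth))
    return hits / len(tracked)


def success_auc(tracked, truth, num_thresholds=101):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    scores = [overlap_score(t, a) for t, a in zip(tracked, truth)]
    rates = []
    for k in range(num_thresholds):
        t0 = k / (num_thresholds - 1)
        rates.append(sum(s > t0 for s in scores) / len(scores))
    return sum(rates) / len(rates)


# Hypothetical usage with two frames: one perfect hit and one complete miss.
truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
tracked = [(0, 0, 10, 10), (100, 100, 10, 10)]
precision = precision_score(tracked, truth)
auc = success_auc(tracked, truth)
```

Averaging the success rate over a uniform threshold grid is one simple way to approximate the area under the success curve; a finer grid gives a closer approximation.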
From Table 1, we can observe that IRF-TLD performs best on 4 out of 6 sequences. Note that these videos contain many challenging factors under which IRF-TLD still achieves favorable results. For instance, the sequences faceocc2, carscale and sylvester have the attributes of scale variation and (in-)out-of-plane rotation, and faceocc2 and carscale additionally have the occlusion attribute, which makes them far more challenging. Nevertheless, IRF-TLD performs consistently well from beginning to end.

Table 1 The AUC/Precision scores on each sequence

Sequences   Our           TLD           OAB           CT
Faceocc2    0.69/0.792    0.6/0.856     0.593/0.708   0.602/0.68
Sylvester   0.676/0.946   0.666/0.949   0.557/0.74    0.659/0.90
Carscale    0.575/0.78    0.452/0.853   0.398/0.663   0.433/0.78
Skating     0.366/0.495   0.90/0.38     0.394/0.688   0.086/0.090
Doll        0.56/0.98     0.566/0.983   0.533/0.874   0.455/0.684
Deer        0.690/0.95    0.590/0.732   0.640/0.958   0.039/0.042

IRF-TLD not only inherits the original tracking framework of TLD; its superiority over TLD mainly lies in the fact that the performance of its cascade detector is further improved by incorporating the IRF classifier. In addition, IRF-TLD produces a real value for a fern based on subtraction and Gaussian projection, leading to a more informative result than the binary feature used in TLD. Hence, maintaining the diversity of real value features enables IRF-TLD to perform well in the presence of drastic appearance changes.

5. Conclusion

In this paper, we have proposed a novel tracking method based on TLD and the informative random fern. The proposed method has the advantages of high accuracy and a low memory requirement, and is thus very appropriate for embedded systems. Experimental results show that it performs better than several other methods on most video sequences.

Fig. 3 Screenshots of some sampled tracking results: (a) faceocc2, (b) sylvester, (c) doll, (d) skating, (e) carscale, (f) deer. Legend: Our, TLD, OAB, CT.

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (6097408).

References

[1]. Ross, D.A., Lim, J., Lin, R.S., Yang, M.H. Incremental Learning for Robust Visual Tracking[J]. International Journal of Computer Vision, 2008, 77(1-3): 125-141.
[2]. Zhang, K., Zhang, L., Yang, M.-H. Real-time compressive tracking[C]. European Conference on Computer Vision. Italy, 2012, pp. 864-877.
[3]. Kalal, Z., Mikolajczyk, K., Matas, J. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409-1422.
[4]. Ozuysal, M., Fua, P., Lepetit, V. Fast keypoint recognition in ten lines of code[C]. IEEE Conference on Computer Vision and Pattern Recognition. USA, 2007, pp. 1-8.
[5]. Zhang, J., Lu, K., Cheng, F., Li, Y. Visual tracking with randomly projected ferns[J]. Signal Processing: Image Communication, 2014, 29(9): 987-997.
[6]. Achlioptas, D. Database-friendly random projections: Johnson-Lindenstrauss with binary coins[J]. Journal of Computer and System Sciences, 2003, 66(4): 671-687.
[7]. Grabner, H., Grabner, M., Bischof, H. Real-Time Tracking via On-line Boosting[C]. British Machine Vision Conference. UK, 2006, pp. 47-56.
[8]. Wu, Y., Lim, J., Yang, M.-H. Online object tracking: A benchmark[C]. IEEE Conference on Computer Vision and Pattern Recognition. USA, 2013, pp. 2411-2418.