Evaluation scheme for Tracking in AMI

AMI Communication
AUGMENTED MULTI-PARTY INTERACTION
http://www.amiproject.org/

S. Schreiber (a), D. Gatica-Perez (b)

(a) Technische Universität München, Germany
(b) IDIAP, Switzerland

AMI WP4 Tracking: Evaluation scheme.0
0 January 2006

1 Introduction

Since a number of tracking algorithms for AMI meeting scenarios are being developed at several institutes, there is a need to agree on a common scheme to evaluate the performance of the different approaches. The following sections introduce a fundamental concept for such a scheme, based on [1], defining how to evaluate multiple-object tracking for unknown configurations.

2 Coverage test

In order to determine the quality of a tracking result for a single object, we introduce two shape-independent measures, indicating whether a ground truth object is being tracked and which estimate $E_i$ is connected to which ground truth object $GT_j$:

Recall: $\alpha_{i,j} = \frac{|E_i \cap GT_j|}{|GT_j|}$

Precision: $\beta_{i,j} = \frac{|E_i \cap GT_j|}{|E_i|}$

Here $|\cdot|$ denotes the area of a region. Recall is the fraction of the ground truth area that is covered by the estimate, while precision is the fraction of the estimate area that is covered by the ground truth. An estimate much larger than the ground truth object reaches a recall close to one at a low precision, and an estimate much smaller than the object does the opposite, so both $\alpha$ and $\beta$ must be high to obtain good tracking results. For this reason, a coverage test using the F-measure [2]

$$F_{i,j} = \frac{2\,\alpha_{i,j}\,\beta_{i,j}}{\alpha_{i,j} + \beta_{i,j}} \qquad (1)$$

has to be passed, which returns a high value only if both $\alpha_{i,j}$ and $\beta_{i,j}$ are high. The test is considered passed if $F_{i,j}$ exceeds a fixed threshold $t_c$, which determines that $GT_j$ is being tracked by $E_i$.

3 Configuration test

To facilitate the explanations in the following sections, some definitions are introduced first. In this document, labeled tracking targets are denoted as ground truth objects GT, and tracker outputs are referred to as estimates E. The output of a tracking approach is considered correct if and only if each GT is tracked by exactly one E and each E tracks exactly one GT. The following sections define which kinds of errors arise and how they can be detected.

3.1 Configuration error measures

In this context, configuration means the number, the location and the size of all objects in a frame of the scenario.
According to the above definition of a correct tracker output, a configuration error occurs if the size or the location of a certain $E_i$ and its related $GT_j$ do not match. To identify all types of errors that may occur, four configuration measures are introduced:

a) Measure FP - False positive. There is an E indicating an object where no GT is.
b) Measure FN - False negative. A GT is not tracked by any E.
c) Measure MT - Multiple trackers. More than one E is associated with only one GT. In order to reflect the subjective impression of a human spectator, each excess E is counted as an MT error.
d) Measure MO - Multiple objects. More than one GT is associated with only one E. Again, an MO error is assigned for each excess GT.

An example of each of these errors is depicted in Fig. 1, where the ground truth is marked with green boxes and the estimates with red resp. blue boxes.
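The coverage test of Section 2 and the four error counts defined above can be sketched for a single frame as follows. This is a minimal illustration, not the code of the common evaluation tool: the function names are hypothetical, the boxes are assumed to follow the (x, y, w/2, h/2) convention of the data storage format, the threshold value 0.5 used in the example is only a placeholder for $t_c$, and occlusion handling is left out.

```python
from typing import Dict, List, Tuple

# Assumed box convention: (x, y, half_width, half_height), (x, y) = box center.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a center/half-extent box."""
    _, _, hw, hh = box
    return (2.0 * hw) * (2.0 * hh)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap of two axis-aligned boxes (0 if they are disjoint)."""
    ax, ay, ahw, ahh = a
    bx, by, bhw, bhh = b
    w = min(ax + ahw, bx + bhw) - max(ax - ahw, bx - bhw)
    h = min(ay + ahh, by + bhh) - max(ay - ahh, by - bhh)
    return max(w, 0.0) * max(h, 0.0)


def f_measure(est: Box, gt: Box) -> float:
    """F_{i,j} of Eq. (1), computed from recall (alpha) and precision (beta)."""
    inter = intersection_area(est, gt)
    if inter == 0.0:
        return 0.0
    alpha = inter / area(gt)   # recall: covered fraction of the ground truth
    beta = inter / area(est)   # precision: covered fraction of the estimate
    return 2.0 * alpha * beta / (alpha + beta)


def count_errors(ests: List[Box], gts: List[Box], t_c: float) -> Dict[str, int]:
    """Count FP, FN, MT and MO for one frame from the coverage test results."""
    passed = [[f_measure(e, g) > t_c for g in gts] for e in ests]
    gts_per_est = [sum(row) for row in passed]
    ests_per_gt = [sum(passed[i][j] for i in range(len(ests)))
                   for j in range(len(gts))]
    return {
        "FP": sum(1 for n in gts_per_est if n == 0),      # E without any GT
        "FN": sum(1 for n in ests_per_gt if n == 0),      # GT without any E
        "MT": sum(n - 1 for n in ests_per_gt if n > 1),   # excess E per GT
        "MO": sum(n - 1 for n in gts_per_est if n > 1),   # excess GT per E
    }


if __name__ == "__main__":
    gt_a: Box = (10.0, 10.0, 5.0, 5.0)
    gt_b: Box = (50.0, 50.0, 5.0, 5.0)
    estimates = [(10.0, 10.0, 5.0, 5.0), (12.0, 10.0, 5.0, 5.0),
                 (100.0, 100.0, 5.0, 5.0)]
    print(count_errors(estimates, [gt_a, gt_b], t_c=0.5))
```

In the example, two estimates pass the coverage test for the same ground truth object (one MT error), the second ground truth object is not tracked (one FN error), and the third estimate covers nothing (one FP error).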

Figure 1: Examples of the configuration errors (panels: false negative, false positive, multiple tracker, multiple object).

3.2 Occlusion handling

Situations with occlusion are treated in a special manner, since MO or MT errors might occur although the estimates are correctly placed. For this reason, the ground truth labels are extended by an additional flag $occ_j$ indicating an occlusion in the image data. This flag is defined for each object and is set to one if the fraction of the ground truth area of object j that is covered by another ground truth object k exceeds a certain threshold $t_o$:

$$occ_j = \begin{cases} 1, & \exists\, k \neq j \ \text{s.t.}\ \frac{|GT_j \cap GT_k|}{|GT_j|} > t_o \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

In all situations with a set occlusion flag no error is evaluated, i.e. none of the error scores introduced above is increased, and thus no ground truth data has to be available for these frames.

3.3 Configuration evaluation procedure

To enable a performance evaluation of different tracking approaches evaluated on diverse data sets, all measurements presented above have to be normalized by both the number of ground truth objects $N_{GT}^t$ per frame and the number of frames, as listed in the structure chart below. Since frames with no GT labeled at all may occur, normalizing simply by $N_{GT}^t$ would fail; the denominator was therefore chosen as $\max(N_{GT}^t, 1)$ to avoid a division by zero for $N_{GT}^t = 0$. For an easy comparison of tracking algorithms, a quality measure ME is computed from the error measurements. Since the human impression does not consider one of the error types much more severe than the others, the F-measure is again used to compute the quality measure.

Structure chart for the configuration evaluation procedure

  calculate F_{i,j} for each E_i combined with each GT_j
    if F_{i,j} > t_c
      if GT_j not already mapped: map GT_j to E_i
      else: increment MO
    else: increment FP
    if F_{i,j} > t_c
      if E_i not already mapped: map E_i to GT_j
      else: increment MT
    else: increment FN

  report FP, FN, MT and MO, each normalized by $\max(N_{GT}^t, 1)$ per frame and averaged over all $T$ evaluated frames (the superscript $t$ marks the count obtained in frame $t$):

$$\overline{FP} = \frac{1}{T}\sum_{t=1}^{T}\frac{FP^t}{\max(N_{GT}^t, 1)}, \qquad \overline{FN} = \frac{1}{T}\sum_{t=1}^{T}\frac{FN^t}{\max(N_{GT}^t, 1)},$$

$$\overline{MT} = \frac{1}{T}\sum_{t=1}^{T}\frac{MT^t}{\max(N_{GT}^t, 1)}, \qquad \overline{MO} = \frac{1}{T}\sum_{t=1}^{T}\frac{MO^t}{\max(N_{GT}^t, 1)}$$

  compute

$$ME = \frac{4\,\overline{FN}\,\overline{FP}\,\overline{MT}\,\overline{MO}}{\overline{FN} + \overline{FP} + \overline{MT} + \overline{MO}}$$

4 Identification test

In the field of tracking, identification means that a particular E tracks exactly one GT over its entire lifetime and thus correctly identifies this ground truth object. Among the several methods to associate identities that could be considered, each with its assets and drawbacks, an approach based on a majority rule was chosen to represent the identification associations. Thus a $GT_j$ is said to be identified by the $E_i$ which tracks object j most of the time, and vice versa $E_i$ identifies the $GT_j$ on which it spent most of the time.

4.1 Identification error measures

Two different types of identification failures arise in tracking scenarios. The first type occurs when one estimate i suddenly stops tracking ground truth object j and another estimate k continues tracking this ground truth object. The second type results from swapping the ground truth paths, i.e. an estimate i initially tracks $GT_j$ and after a while changes to track $GT_k$. To detect all these identification errors, the measures listed below are introduced:

a) Measure FIT - Falsely identified tracker. An $E_i$ which passed the coverage test for $GT_j$ is different from the estimate identifying this ground truth object before.
b) Measure FIO - Falsely identified object. A $GT_j$ which passed the coverage test for $E_i$ has not been the identified object in the frame before.

Since these measurements only report changes in the associations of Es and GTs, a purity measure is introduced to evaluate the degree of consistency of the associations between an E and a GT.

a) Measure OP - Object purity. If $GT_j$ is the ground truth object which has been identified by $E_i$ for most of the time, then OP is the ratio of the number of frames in which $GT_j$ is correctly identified by $E_i$ ($n_{i,j}$) to the overall number of frames ($n_j$) in which $GT_j$ exists.

The errors mentioned above are visualized in the example in Fig. 2, where each box describes an estimate.

Figure 2: Example of the identification errors (panels: situation at time step t-1; situation at time step t: FIT; situation at time step t: FIO).

4.2 Identification evaluation procedure

Similar to the configuration evaluation procedure, all measurements again have to be normalized by the number of ground truth objects $N_{GT}^t$ per frame and the number of frames, as listed in the structure chart below. For the identification task it is difficult to create a single value indicating the performance of the algorithm, thus all three measures should be reported to give an idea of the identification capability of an approach.

Structure chart for the identification evaluation procedure

  if GT_{j,t} mapped to E_{i,t}
    if GT_{j,t} mapped to E_{k,t}: increment FIT
    if GT_{j,t} not mapped before: increment FIO
  report FIT, FIO, OP

$$\overline{FIT} = \frac{1}{T}\sum_{t=1}^{T}\frac{FIT^t}{\max(N_{GT}^t, 1)}, \qquad \overline{FIO} = \frac{1}{T}\sum_{t=1}^{T}\frac{FIO^t}{\max(N_{GT}^t, 1)}, \qquad OP = \frac{1}{N_{GT}}\sum_{j=0}^{N_{GT}}\frac{n_{i,j}}{n_j}$$

5 Training and Evaluation Video Set

To get comparable evaluation results for the tracking algorithms developed by the different partners in AMI, a common video set is defined for the evaluation. This video set should contain as many as possible of the challenges which have led to the acquisition of the special side corpus AV6.7-ami. The following sets have therefore been defined for the evaluation; they may only be used for the evaluation task itself and not, e.g., for tuning parameters:

Eval I : Sequences from the side corpus AV6.7ami (2, 3, 9, 2, 4)
Eval II : Sequence from the AMI core corpus (008b)
Eval III : Sequences from the side corpus AV6.7ami (, 8, 3, 6)

Since each of the specified sequences consists of three avi-files (left, right and central camera view) on which our algorithms will be evaluated, this material offers a total amount of approximately.5 h of video data for the evaluation of tracking modules. For the deliverable, only results for Eval I and Eval II have to be reported. The video sequences are available at the following web links:

AMI core corpus : http://mmm.idiap.ch/private/amizoe/idiaphub.html
AV6.7ami : ftp://mmm.idiap.ch/private/ami/90640383/

All error measurements introduced above will be reported according to this video evaluation set. The video material is fully annotated, using different annotation rates depending on the level of dynamics of the person in the sequence. The advantage of this procedure is a reduction of the annotation effort for easy parts (like seated people) while giving more annotation resolution to parts that are more interesting for tracking (e.g. somebody leaving). For this reason, videos are annotated at three different levels of accuracy:

Slow (1 frame / 5 seconds) - people seated or standing for several minutes
Middle (1 frame / second) - people standing for one minute or so at most
Fast (2 frames / second) - people entering/seating/standing up/moving to the white board

The annotation data can be found at ftp://mmm.idiap.ch/private/ami/90640383/. To derive the annotation resolution, please refer to the frame number explicitly given in the files. All other video material from the AMI corpus (both main and side corpus), except the evaluation test set mentioned above, is free to be used for training the detectors and modules of the invented tracking algorithms.

6 Data storage format

In order to facilitate a joint evaluation of the AMI tracking technologies, a common evaluation tool has been developed and distributed among all partners (also downloadable at http://www.idiap.ch/ smith/amitrack.html). To simplify the usage of this tool, each tracking algorithm has to provide its output in the same way, i.e. a head bounding box is generated enclosing each tracked object. This result has to be stored for the evaluation tool in a simple ASCII file according to the following file format:

  frame [frame number]
  object [identifier] <tab> [head bounding box]
  object [identifier] <tab> [head bounding box]

In this file format description, all expressions in brackets have to be replaced by the real numbers. For each frame, first provide the frame number (the results and the ground truths must cover the same set of frame numbers), followed by the object parameters. Object parameters include a unique identifier and the location of the object in the image. The identifiers need not (and should not necessarily) match between the ground truth and the tracking results, but they should be consistent within each. For each frame, provide the object parameters of every object present (in the results or the ground truth). If there are no ground truths or estimates present, just provide the frame number.

Objects must be represented by bounding boxes (in both tracking results and ground truth). A bounding box is defined by four numbers, (x, y, w/2, h/2). The point (x, y) indicates the location of the center of the bounding box, w/2 is the distance from the center to one of the vertical edges (the half-width), and h/2 is the distance from the center to one of the horizontal edges (the half-height). All coordinates are referenced to the top left image origin.
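As an illustration of the file format described above, the following sketch reads such a file into a per-frame dictionary. It is not part of the common evaluation tool. The function names are hypothetical, and the sketch assumes that the four box numbers are separated by whitespace or commas (optionally wrapped in parentheses), which the format description does not specify.

```python
from typing import Dict, Optional, Tuple

# (x, y, half_width, half_height), referenced to the top left image origin.
Box = Tuple[float, float, float, float]
Frames = Dict[int, Dict[str, Box]]


def parse_tracking_text(text: str) -> Frames:
    """Parse 'frame' / 'object' entries into {frame: {identifier: box}}."""
    frames: Frames = {}
    current: Optional[int] = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        # Assumption: commas and parentheses around the box are optional.
        cleaned = raw.replace("(", " ").replace(")", " ").replace(",", " ")
        fields = cleaned.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "frame" and len(fields) == 2:
            current = int(fields[1])
            frames.setdefault(current, {})  # a frame may contain no objects
        elif fields[0] == "object" and len(fields) == 6:
            if current is None:
                raise ValueError(f"line {line_no}: object before any frame")
            x, y, hw, hh = (float(v) for v in fields[2:6])
            frames[current][fields[1]] = (x, y, hw, hh)
        else:
            raise ValueError(f"line {line_no}: unrecognised entry {raw!r}")
    return frames


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert (x, y, w/2, h/2) to (left, top, right, bottom)."""
    x, y, hw, hh = box
    return (x - hw, y - hh, x + hw, y + hh)


def same_frames(results: Frames, truth: Frames) -> bool:
    """Check that results and ground truth cover the same frame numbers."""
    return results.keys() == truth.keys()


if __name__ == "__main__":
    sample = "frame 1\nobject 7\t120 80 15 20\nobject 9\t(200,90,14,18)\nframe 2\n"
    parsed = parse_tracking_text(sample)
    print(parsed)
    print(to_corners(parsed[1]["7"]))
```

Identifiers are kept as strings because they only need to be consistent within one file, and a frame without objects is stored as an empty dictionary so that the frame sets of results and ground truth can still be compared.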

References

[1] K. Smith, S. Ba, J. Odobez, and D. Gatica-Perez, "Evaluating multi-object tracking," in Workshop on Empirical Evaluation Methods in Computer Vision (EEMCV), San Diego, CA, USA, June 2005.
[2] C. J. Van Rijsbergen, Information Retrieval, Butterworth-Heinemann, Newton, MA, USA, 1979.