Multi-camera multi-object voxel-based Monte Carlo 3D tracking strategies


RESEARCH Open Access

Cristian Canton-Ferrer*, Josep R Casas, Montse Pardàs and Enric Monte

Abstract

This article presents a new approach to the problem of simultaneous tracking of several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene, which is the starting point of the processing chain. We review how the initiation and termination of tracks influences the overall tracker performance, and present a Bayesian approach to efficiently create and destroy tracks. Two Monte Carlo-based schemes adapted to the incoming 3D discrete data are introduced. First, a particle filtering technique is proposed relying on a volume likelihood function taking into account both occupancy and color information. Sparse sampling is presented as an alternative based on a sampling of the surface voxels in order to estimate the centroid of the tracked people. In this case, the likelihood function is based on local neighborhood computations, thus dramatically decreasing the computational load of the algorithm. A discrete 3D re-sampling procedure is introduced to drive these samples along time. Multiple targets are tracked by means of multiple filters, and interaction among them is modeled through a 3D blocking scheme. Tests over the CLEAR-annotated database yield quantitative results showing the effectiveness of the proposed algorithms in indoor scenarios, and a fair comparison with other state-of-the-art algorithms is presented. We also consider the real-time performance of the proposed algorithm.

1 Introduction

Tracking multiple objects and keeping record of their identities along time in a cluttered dynamic scene is a major research topic in computer vision, basically fostered by the number of applications that benefit from the retrieved information. For instance, multi-person tracking has been found useful for automatic scene analysis [1], human-computer interfaces [2], and detection of unusual behaviors in security applications [3].
*Correspondence: cristian.canton@gmail.com
Image and Video Processing Group, Technical University of Catalonia, Barcelona, Spain
© 2011 Canton-Ferrer et al; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A number of methods for camera-based multi-person 3D tracking have been proposed in the literature [4-7]. A common goal in these systems is robustness under occlusions created by the multiple objects cluttering the scene when estimating the position of a target. Single-camera approaches [8] have been widely employed, but they are vulnerable to occlusions, rotation, and scale changes of the target. In order to avoid these drawbacks, multi-camera tracking techniques exploit spatial redundancy among different views and provide 3D information at the actual scale of the objects in the real world. Integration of data extracted from multiple cameras has been proposed in terms of a fusion at feature level as image correspondences [9] or multi-view histograms [10], among others. Information fusion at data or raw level has been achieved by means of voxel reconstructions [11], polygon meshes [12], etc.

Most multi-camera approaches rely on a separate analysis of each camera view, followed by a feature fusion process to finally generate an output. Exploiting the underlying epipolar geometry of a multi-camera setup toward finding the most coherent feature correspondence among views was first tackled by Mikič et al. [13] using algebraic methods together with a Kalman filter, and further developed by Focken et al. [14]. Exploiting epipolar consistency within a robust Bayesian framework was also presented by Canton-Ferrer et al. [9]. Other systems rely on detecting semantically relevant patterns among multiple cameras to feed the tracking algorithm, as done in [15] by detecting faces.

Particle filtering (PF) [16] has been a commonly employed algorithm because of its ability to deal with problems involving multi-modal distributions and non-linearities. Lanz et al. [10] proposed a multi-camera PF tracker exploiting foreground and color information, and several contributions have also followed this path [4,7]. Occlusions, being a common problem in feature fusion methods, have been addressed in [17] using an HMM to model the temporal evolution of occlusions within a PF algorithm. Information about the tracking scenario can also be exploited toward detecting and managing occlusions, as done in [18] by modeling the occluding elements, such as furniture, in a training phase before tracking. It must be noted that, in this article, we assume that all cameras cover the area under study. Other approaches to multi-camera/multi-person tracking do not require maximizing the overlap of the fields of view of multiple cameras, leading to the non-overlapped multi-camera tracking algorithms [19].

Multi-camera/multi-person tracking based on a data fusion performed before any analysis was pioneered by Lopez et al. [20] by using a voxel reconstruction of the scene. This idea was further developed by the authors in [5,21], finally leading to the present article. To our knowledge, this is the first approach to multi-person tracking exploiting data fusion from multiple cameras as the input of the algorithms.

In this article, we first introduce a methodology for multi-person tracking based on a colored voxel representation of the scene as the start of the processing chain. The contribution of this article is twofold. First, we emphasize the importance of the initiation and termination of tracks, usually neglected in most tracking algorithms, which indeed has an impact on the performance of the overall system. A general technique for the initiation/termination of tracks is presented. The second contribution is the filtering step, where two techniques are introduced. The first technique applies PF to input voxels to estimate the centroid of the tracked targets. However, this process is far from real-time performance, and an alternative, which we call Sparse Sampling (SS), is proposed. SS aims at decreasing computation time by means of a novel tracking technique based on the seminal PF principle.
Particles no longer sample the state space but instead a magnitude whose expectancy produces the centroid of the tracked person: the surface voxels. The likelihood evaluation relying on occupancy and color information is computed on local neighborhoods, thus dramatically decreasing the computation load of the overall algorithm. Finally, the effectiveness of the proposed techniques is assessed by means of objective metrics defined in the framework of the CLEAR [22] multi-target tracking database. Computational performance is reviewed toward proving the real-time operation of the SS algorithm. Fair comparisons with state-of-the-art methods evaluated using the same database are also presented and discussed.

2 Tracker design methodology

Typically, a multi-target tracking system can be depicted as in Figure 1 and comprises a number of elementary modules. Although most articles present techniques that contribute to the filtering module, the overall architecture is rarely addressed, assuming that some blocks are already available. In this section, this scheme will be analyzed and some proposals for each module will be presented. The filtering step, being our major contribution, will be addressed in a separate section.

Figure 1 Multi-person tracking scheme. (Blocks: Input data, Analysis, Create track?, Filter, Tracker state, Analysis, Delete track?, Output.)

2.1 Input and output data

When addressing the problem of multi-person tracking within a multi-camera environment, a strategy about how to process this information is needed. Many approaches perform an analysis of the images separately, and then combine the results using some geometric constraints [10]. This approach is denoted as an information combination by fusion of decisions. However, a major issue in this procedure is dealing with occlusion and perspective effects. A more efficient way to combine information is data fusion [23]. In our case, data fusion leads to a combination of information from all images to build up a new data representation, and to apply the algorithms directly on these data.
Several data representations aggregating the information of multiple views have been proposed in the literature, such as voxel reconstructions [11,24], level sets [25], polygon meshes [12], conexels [26], depth maps [27], etc. In our research, we opted for a colored voxel representation due to both its fast computation and accuracy.

For a given frame in the video sequence, a set of $N_C$ images is obtained from the $N_C$ cameras (see a sample in Figure 2(a)). Each camera is modeled using a pinhole camera model based on perspective projection, with camera calibration information available. Foreground regions from input images are obtained using a segmentation algorithm based on Stauffer-Grimson's background learning and subtraction technique [28], as shown in Figure 2(b).

Redundancy among cameras is exploited by means of a Shape-from-Silhouette (SfS) technique [11]. This process generates a discrete occupancy representation of the 3D space (voxels). A voxel is labeled as foreground or background by checking the spatial consistency of its projection on the $N_C$ segmented silhouettes, finally obtaining the 3D binary reconstruction shown in Figure 2(c). We will denote this raw voxel reconstruction as $\mathcal{V}$. The visibility of a surface voxel onto a given camera is assessed by computing the discrete ray originating from the camera's optical center to the center of this voxel using Bresenham's algorithm and testing whether this ray intersects with any other foreground voxel. The most saturated color among pixels of the set of cameras that see a surface voxel is assigned to it. A colored representation of surface voxels of the scene is obtained, denoted as $\mathcal{V}^C$. An example of this process is depicted in Figure 2(d). It should be taken into account that, without loss of generality, other background/foreground and 3D reconstruction algorithms may be used to generate the input data to the tracking algorithm presented in this article.

Figure 2 Input data generation example. (a) A sample of the original images. (b) Foreground segmentation of the input images employed by the SfS algorithm. (c) Example of the binary 3D voxel reconstruction. (d) The final colored version shown over a background image.

The resulting colored 3D scene reconstruction is fed to the proposed system, which assigns a tracker to each target, and the obtained tracks are processed by a higher semantic analysis module. Information about the environment (dimensions of the room, furniture, etc.) allows assessing the validity of tracked volumes and discarding false volume detections. Finally, the output of the overall tracking algorithm will be a number of hypotheses for the centroid position of each of the targets present in the scene.

2.2 Tracker state and filtering

One of the major challenges in multi-target tracking is the estimation of the number of targets and their positions in the scene, based on a set of uncertain observations. This issue can be addressed from two perspectives. The first one extends the theory of single-target algorithms to multiple targets. This approach defines the working state space $X$ as the concatenation of the positions of all $N_T$ targets as $X = [x_1, x_2, \ldots, x_{N_T}]$. The difficulty here is the time-variant dimensionality of this space.
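The silhouette-consistency test described above can be illustrated with a minimal sketch. It assumes that a voxel is kept only when its projection lands on the foreground mask of every camera, which is the simplest reading of "spatial consistency"; the names `is_foreground`, `shape_from_silhouette` and the two axis-aligned toy projections are hypothetical and stand in for the calibrated pinhole cameras of the article.

```python
from typing import Callable, Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]             # (column, row)
Mask = List[List[bool]]             # mask[row][column] is True on foreground pixels
Camera = Tuple[Callable[[Voxel], Pixel], Mask]


def is_foreground(mask: Mask, pixel: Pixel) -> bool:
    """True when the pixel lies inside the image and on the silhouette."""
    col, row = pixel
    return 0 <= row < len(mask) and 0 <= col < len(mask[0]) and mask[row][col]


def shape_from_silhouette(voxels: Iterable[Voxel], cameras: List[Camera]) -> Set[Voxel]:
    """Keep the voxels whose projection falls on the silhouette in every view
    (assumption: all cameras must agree for a voxel to be foreground)."""
    return {
        v for v in voxels
        if all(is_foreground(mask, project(v)) for project, mask in cameras)
    }


# Toy scene: a 4 x 4 x 4 grid seen by two hypothetical axis-aligned views.
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
view_along_z = (lambda v: (v[0], v[1]),
                [[c in (1, 2) and r in (1, 2) for c in range(4)] for r in range(4)])
view_along_x = (lambda v: (v[2], v[1]),
                [[c in (0, 1) and r in (1, 2) for c in range(4)] for r in range(4)])
occupied = shape_from_silhouette(grid, [view_along_z, view_along_x])
```

In a real setup each `project` callable would apply the calibrated perspective projection of one camera, and the masks would come from the foreground segmentation step.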
Monte Carlo approaches, and specifically PF approaches, to this problem have to face the exponential dependency between the number of particles required by the filter and the dimension of $X$, turning out to be computationally infeasible. Recently, a solution based on random finite sets achieving linear complexity has been presented [29].

Multi-target tracking can also be tackled by tracking each target independently, that is, by maintaining $N_T$ trackers with a state space $X_i = x_i$. In this case, the system attains a linear complexity with the number of targets, thus allowing feasible implementations. However, interactions among targets must be modeled in order to ensure the most independent set of tracks. This approach to multi-person tracking will be adopted in our research.

2.3 Track initiation and termination

A crucial factor in the performance of a tracking system is the module that addresses the initiation and termination of tracks. The initiation of a new tracker is independent of the employed filtering technique and only relies on the input data and the current state (position) of the tracks in the scene. On the other hand, the termination of a tracking filter is driven by the performance of the tracker.

The initialization of a new filter is determined by the correct detection of a person in the analyzed scene. This process is crucial when tracking, and its correct operation will drive the overall system's accuracy. However, despite the importance of this step, little attention is paid to it in the design of multi-object trackers in the literature. Only a few articles explicitly mention this process, such as [30], which employs a face detector to detect a person, or [31], which uses scout particle filters to explore the 3D space for new targets. Moreover, it is assumed that all targets in the scene are of interest, i.e., people, not accounting for spurious objects, e.g., furniture, shadows, etc. In this section, we introduce a method to properly handle the initiation and termination of filters from a Bayesian perspective.

2.3.1 Track initiation criteria

The 3D input data $\mathcal{V}$ fed to the tracking system is usually corrupted and presents a number of inaccuracies, such as objects not reconstructed, mergings among adjacent blobs, spurious blobs, etc. Hence, defining a track initialization criterion based solely on the presence of a blob might lead to poor performance of the system. For instance, objects such as furniture might be wrongly detected as foreground, reconstructed and tracked. Instead, a classification of the blobs based on probabilistic criteria can be applied during this initialization process, aiming at a more robust operation.

Training of this classifier is based on the development set of the used database, together with the available ground truth describing the position of the tracked objects. Let $X_{GT} = \{x_1, \ldots, x_{N_{GT}}\}$ be the ground truth positions of the $N_{GT}$ targets present in the scene of the development set at a given instant. Once the reconstruction $\mathcal{V}$ is available, a connected component analysis is performed over these data, thus obtaining a set of $K$ disjoint components, $C_i$, fulfilling:

$$\mathcal{V} = \bigcup_{i=1}^{K} C_i. \qquad (1)$$

We will consider the region of influence of a target with centroid $x$ as the ellipsoid $E(c, s)$ with axis size $s = (s_x, s_y, s_z)$ centered at $c = x$. A mapping is defined such that for every $x_j \in X_{GT}$ a component $C_i$ is assigned. Let us denote $[x]_{\{x,y,z\}}$ as the x, y or z coordinate of vector $x$. The assignation process is defined as follows: first, a region of influence $E(x_j, s)$ with size $s = (s_x, s_y, [x_j]_z)$ centered at $c = x_j$ is placed in the 3D space.
The radii $s_x$ and $s_y$ are chosen to contain an average person, $s_x = s_y = 30$ cm. Let us define the operator $\lvert \cdot \rvert$ applied to a volume as the number of non-zero voxels contained in it. Then, the assignation is defined as

$$x_j \leftarrow \arg\max_i \, \lvert E(x_j, s) \cap C_i \rvert, \qquad (2)$$

that is, $x_j$ is assigned to the component with the largest volume enclosed in the region of influence. It must be noted that some $x_j$ might not have any $C_i$ associated due to a wrong segmentation or faulty reconstruction of the target. Moreover, the set of components not associated to any ground truth position can be identified as spurious objects, reconstructed shadows, etc. Finally, we have grouped the set of connected components $C_i$ in two categories: person and non-person.

A set of features is extracted from each of these components, thus conforming the characteristics that will be used to train a person/no-person binary classifier. This set of extracted features is described in Table 1.

Table 1 Features employed by the person/no-person classifier, where $[V]_{\{x,y,z\}}$ denotes the x, y, or z coordinate of voxel $V$ and $s_v$ is the voxel size

Feature | Expression
Weight | $\lvert C_i \rvert \, s_v^3 \, \rho$, with $\rho = 1.1$ g/cm$^3$
Centroid (z-axis) | $\frac{1}{\lvert C_i \rvert} \sum_{V \in C_i} [V]_z$
Top | $\max_{V \in C_i} [V]_z$
Height | $\max_{V \in C_i} [V]_z - \min_{V \in C_i} [V]_z$
Bounding box (larger side) | $\max \{ \max_{V \in C_i} [V]_x - \min_{V \in C_i} [V]_x, \; \max_{V \in C_i} [V]_y - \min_{V \in C_i} [V]_y \}$
Bounding box (smaller side) | $\min \{ \max_{V \in C_i} [V]_x - \min_{V \in C_i} [V]_x, \; \max_{V \in C_i} [V]_y - \min_{V \in C_i} [V]_y \}$

In order to characterize the objects to be tracked and to decide the best classifier system, we have performed an exploratory data analysis [32], which allows us to contrast the underlying hypotheses of the classifiers with the actual data. Histograms of these features are computed as shown in Figure 3, and scatter plots depicting the cross dependencies among all features are computed. Observing Figure 3, we see that some variables are easily separable, e.g., weight, height, and bounding box. Moreover, they show a low cross dependency with other features.

Figure 3 Normalized histograms of the variables conforming the feature vector employed by the person/non-person classifier.

A number of standard binary classifiers have been tested and their performances have been evaluated, namely Gaussian, Mixture of Gaussians (MoG), Neural Networks, K-Means, PCA, Parzen and Decision Trees [33,34]. Due to the aforementioned properties of the statistical distributions of the features, some classifiers are unable to obtain a good performance, e.g., Gaussian, PCA, etc. Other classifiers require a large number of characterizing elements, such as K-Means, MoG, or Parzen. Decision trees [33] have reported the best results. Separable variables such as height, weight, and bounding box size are automatically selected to build up a decision tree that yields a high recognition rate, with a precision of 0.98 and a recall of 0.99 in our test database.

Another complementary criterion employed in the initiation of new tracks is based on the current state of the tracker. A new track will not be created if its distance to the closest target is below a threshold.

2.3.2 Track termination criteria

A target will be deleted if one of the following conditions is fulfilled:
- If two or more tracks fall too close to one another, this indicates that they might be tracking the same target; hence only one will be kept alive while the rest will be removed.
- If the tracker's efficiency becomes very low, it might indicate that the target has disappeared and should be removed.
- The person/no-person classifier is applied to the set of features extracted from the voxels assigned to a target. If the classifier outputs a no-person verdict for a number of frames, the target will be considered as lost.
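The features of Table 1 are simple statistics of a connected component, so they can be sketched in a few lines. The following is only an illustration under stated assumptions: voxels are given as integer grid indices, metric values are obtained by multiplying indices by the voxel size, and the function name `component_features` and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]


def component_features(component: Iterable[Voxel],
                       voxel_size_cm: float = 2.0,
                       density_g_cm3: float = 1.1) -> Dict[str, float]:
    """Table 1 features of one connected component of voxel indices."""
    voxels = list(component)
    if not voxels:
        raise ValueError("empty component")
    xs, ys, zs = zip(*voxels)
    count = len(voxels)
    # Horizontal extents of the bounding box, converted from indices to cm.
    side_x = (max(xs) - min(xs)) * voxel_size_cm
    side_y = (max(ys) - min(ys)) * voxel_size_cm
    return {
        "weight_g": count * voxel_size_cm ** 3 * density_g_cm3,
        "centroid_z_cm": sum(zs) / count * voxel_size_cm,
        "top_cm": max(zs) * voxel_size_cm,
        "height_cm": (max(zs) - min(zs)) * voxel_size_cm,
        "bbox_max_cm": max(side_x, side_y),
        "bbox_min_cm": min(side_x, side_y),
    }


features = component_features([(0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 0, 2)])
```

A feature vector of this kind is what a person/no-person classifier such as the decision tree mentioned above would consume; the classifier itself is not sketched here.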
3 Voxel-based solutions

The filtering block shown in Figure 1 addresses the problem of keeping consistent trajectories of the tracked objects, resolving crossings among targets and mergings with spurious objects (e.g., shadows), and producing an accurate estimation of the centroid of the target based on the input voxel information. Although there are a number of papers addressing the problem of multi-camera/multi-person tracking, very few contributions have been based on voxel analysis [20,21].

3.1 PF tracking

PF is an approximation technique for estimation problems where the variables involved do not hold Gaussianity uncertainty models and linear dynamics. The current tracking scenario can be tackled by means of this algorithm to estimate the 3D position of a person $x_t = (x, y, z)_t$ at time $t$, taking as observation a set of colored voxels representing the 3D scene up to time $t$, denoted as $z_{1:t}$. For a given target $x_t$, PF approximates the posterior density $p(x_t \mid z_{1:t})$ as a sum of $N_p$ Dirac functions:

$$p(x_t \mid z_{1:t}) \approx \sum_{j=1}^{N_p} w_t^j \, \delta(x_t - x_t^j), \qquad (3)$$
where $w_t^j$ are the weights associated to the particles, fulfilling $\sum_j w_t^j = 1$, and $x_t^j$ their positions. For this type of tracking problem, a sampling importance re-sampling (SIR) PF is applied to drive particles along time [16]. Assuming the importance density to be equal to the prior density, the weight update is recursively computed as

$$w_t^j \propto w_{t-1}^j \, p(z_t \mid x_t^j). \qquad (4)$$

SIR PF avoids the particle degeneracy problem by re-sampling at every time step. In this case, weights are set to $w_{t-1}^j = N_p^{-1}, \forall j$; therefore,

$$w_t^j \propto p(z_t \mid x_t^j). \qquad (5)$$

Hence, the weights are proportional to the likelihood function that will be computed over the incoming volume $z_t$. Finally, the best state at time $t$, $\hat{x}_t$, is derived based on the discrete approximation of Equation 3. The most common solution is the Monte Carlo approximation of the expectation as

$$\hat{x}_t = E[x_t \mid z_{1:t}] \approx \sum_{j=1}^{N_p} w_t^j \, x_t^j. \qquad (6)$$

Basically, in the PF operation loop two steps must be defined: likelihood evaluation and particle propagation. In the following, we present our proposal for the PF implementation.

3.1.1 Likelihood evaluation

Binary and color information contained in $z_t$ (the reconstructions $\mathcal{V}_t$ and $\mathcal{V}_t^C$ at time $t$) will be employed to define the likelihood function $p(z_t \mid x_t^j)$ relating the observation $z_t$ with the human body instance given by particle $x_t^j$, $1 \le j \le N_p$. Two partial likelihood functions, $p_{Raw}(\mathcal{V}_t \mid x_t^j)$ and $p_{Color}(\mathcal{V}_t^C \mid x_t^j)$, will be combined linearly to produce $p(z_t \mid x_t^j)$ as:

$$p(z_t \mid x_t^j) = \lambda \, p_{Raw}(\mathcal{V}_t \mid x_t^j) + (1 - \lambda) \, p_{Color}(\mathcal{V}_t^C \mid x_t^j). \qquad (7)$$

Factor $\lambda$ controls the influence of each term (foreground and color information) in the overall likelihood function. Empirical tests have shown that $\lambda = 0.8$ provides satisfactory results. A more detailed review of the impact of color information on the overall performance of the algorithm is addressed in Section 5.1.

Likelihood associated to raw data is defined as the ratio of overlap between the input data $\mathcal{V}_t$ and the ellipsoid $E_t^j$ defined by particle $x_t^j$ (see Section 2.3.1) as

$$p_{Raw}(\mathcal{V}_t \mid x_t^j) = \frac{\lvert \mathcal{V}_t \cap E_t^j \rvert}{\lvert E_t^j \rvert}. \qquad (8)$$

For a given target $k$, an adaptive reference histogram $H_t^k$ of the colored surface voxels is available.
This histogram is constructed using the YCbCr color space due to its robustness against light variations. The number of bins per channel will drive the ability of the system to distinguish between different color blobs; for our experiments, 21 bins per channel have been set empirically. The color likelihood function is constructed as

$$p_{Color}(\mathcal{V}_t^C \mid x_t^j) = B\left(H_t^k, H(\mathcal{V}_t^C \cap E_t^j)\right), \qquad (9)$$

where $B(\cdot)$ is the Bhattacharyya distance and $H(\cdot)$ stands for the color histogram extraction operation of the enclosed volume. Update of the reference histogram is performed in a linear manner following the rule:

$$H_t^k = \alpha H_{t-1}^k + (1 - \alpha) \, H(\mathcal{V}_t^C \cap E_t^{\hat{x}}), \qquad (10)$$

where $E_t^{\hat{x}}$ stands for the ellipsoid placed in the centroid estimation $\hat{x}_t$ and $\alpha$ is the adaptation coefficient. In our experiments, $\alpha = 0.9$ provided satisfactory results.

3.1.2 Particle propagation

The propagation model has been chosen to be a Gaussian noise added to the state of the particles after the re-sampling step: $x_{t+1}^j = x_t^j + N$. The covariance matrix $P$ corresponding to $N$ is proportional to the maximum variation of the centroid of the target, and this information is obtained from the development part of the testing dataset. More sophisticated schemes employ previously learnt motion priors to drive the particles more efficiently [6]. However, this would penalize the efficiency of the system when tracking unmodeled motion patterns and, since our algorithm is intended for any motion tracking, no dynamical model is adopted.

3.1.3 Interaction model

Let us assume that there are $N_T$ independently tracked targets. However, they are not fully independent, since each tracker can consider voxels from other targets in both the likelihood evaluation and the 3D re-sampling step, resulting in target merging or identity mismatches. In order to achieve the most independent set of trackers, a blocking method to model interactions is considered. Some blocking proposals can be found in 2D tracking related studies [6], and an extension to the 3D domain is proposed. Blocking methods rely on penalizing particles whose associated ellipsoid model overlaps with other targets' ellipsoids, as shown in Figure 4.
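Putting the likelihood evaluation and the propagation step of this section together, one SIR iteration for a single target can be sketched as follows. This is a simplified illustration, not the authors' implementation: only the occupancy term of Equation 8 is used (as if $\lambda = 1$ in Equation 7), the color term and the interaction model are left out, the ellipsoid size in voxels is approximated by its analytic volume, and the propagation noise is isotropic with a hypothetical standard deviation `sigma`. All function names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def inside_ellipsoid(p: Point, center: Point, radii: Point) -> bool:
    """True when p lies inside the axis-aligned ellipsoid E(center, radii)."""
    return sum(((p[k] - center[k]) / radii[k]) ** 2 for k in range(3)) <= 1.0


def raw_likelihood(voxels: Sequence[Point], center: Point,
                   radii: Point, voxel_size: float) -> float:
    """Equation 8: occupied voxels inside the ellipsoid over the ellipsoid size.
    The denominator is approximated by the analytic volume expressed in voxels."""
    overlap = sum(1 for v in voxels if inside_ellipsoid(v, center, radii))
    ellipsoid_voxels = 4.0 / 3.0 * math.pi * radii[0] * radii[1] * radii[2] / voxel_size ** 3
    return min(1.0, overlap / ellipsoid_voxels)


def sir_step(particles: List[Point], voxels: Sequence[Point], radii: Point,
             voxel_size: float, sigma: float,
             rng: random.Random) -> Tuple[Point, List[Point]]:
    """Evaluate weights (Eq. 5), estimate the centroid (Eq. 6), re-sample, propagate."""
    weights = [raw_likelihood(voxels, p, radii, voxel_size) for p in particles]
    total = sum(weights)
    if total > 0:
        weights = [w / total for w in weights]
    else:
        # No particle overlaps the data: fall back to uniform weights.
        weights = [1.0 / len(particles)] * len(particles)
    estimate = tuple(sum(w * p[k] for w, p in zip(weights, particles)) for k in range(3))
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    propagated = [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in resampled]
    return estimate, propagated
```

Calling `sir_step` once per frame, with the voxel centers of the current reconstruction, yields the centroid estimate of that frame and the particle set for the next one. Note that every particle scans the whole voxel set, which is the cost that motivates the cheaper local evaluation discussed later for SS.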


Figure 4 Particles from tracker A (yellow ellipsoid) falling into the exclusion zone of tracker B (green ellipsoid) will be penalized by a multiplicative factor $\alpha \in [0, 1]$.

Hence, blocking information can also be considered when computing the particle weights for the kth target as

$$w_t^{k,j} = p(z_t \mid x_t^{k,j}) \prod_{\substack{l=1 \\ l \neq k}}^{N_T} \phi\left(\hat{x}_{t-1}^k, \hat{x}_{t-1}^l\right), \qquad (11)$$

where $\hat{x}_{t-1}^k$ stands for the estimation of the PF at time $t-1$ for target $k$ and $\phi(\cdot)$ is the blocking function defining exclusion zones that penalize particles from target $l$ falling into the exclusion zone of target $k$. In this particular case, considering that people in the room are always sitting or standing up, this zone can be constrained to the xy plane. The proposed function is

$$\phi\left(\hat{x}_{t-1}^k, \hat{x}_{t-1}^l\right) = 1 - \exp\left(-k \left\lVert [\hat{x}_{t-1}^k]_{x,y} - [\hat{x}_{t-1}^l]_{x,y} \right\rVert^2\right), \qquad (12)$$

where $k \propto s_x^{-2}$ is the parameter that drives the sensitivity of the exclusion zone.

3.2 SS tracking

In the presented PF tracking algorithm, likelihood evaluation can be computationally expensive, thus rendering this approach unsuitable for real-time systems. Moreover, data are usually noisy and may contain merged blobs corresponding to different targets. A new technique, SS, is proposed as an efficient and flexible alternative to PF.

Assuming a homogeneous 3D object, it can be proved that its centroid can be computed exactly based only on the surface voxels, since the interior voxels do not provide any relevant information. Hence, this centroid can be estimated through a discrete version of Green's theorem on the surface voxels [35,36], while other approaches obtain an accurate approximation of the centroid using feature points (see [37] for a review). A common assumption of these techniques is the availability of surface data extracted beforehand; hence a labeling of the voxels in the scene should be available. By assuming that the object under study presents a central symmetry in the xy plane, the computation of the centroid can be done as an average of the positions of the surface voxels:

$$\hat{x} = \frac{\sum_{V \in \mathcal{V}} [V]_x}{\lvert \mathcal{V} \rvert} = \frac{\sum_{V \in \mathcal{V}^S} [V]_x}{\lvert \mathcal{V}^S \rvert}. \qquad (13)$$

3.2.1 Degree of mass and degree of surfaceness

Let us model the human body as an ellipsoid, as previously done in the PF approach. In order to test the robustness of the centroid computation of Equation 13 against missing data, we studied the error committed when only a fraction of these input data is employed. A number of voxels (surface or interior voxels in each case) is randomly selected and employed to compute the centroid. Then, the error is computed, showing that the surface-based estimation is more sensitive than the estimation using interior voxels (see Figure 5). However, this proves that the centroid can be computed from a number of randomly selected surface voxels while still achieving a satisfactory performance. This idea is the underlying principle of the SS algorithm.

Let us estimate the centroid of an object by analyzing a randomly selected number of voxels from the whole scene, denoted as $\mathcal{W}$. An approach to the computation of the centroid would be

$$\hat{x} \approx \frac{\sum_{W \in \mathcal{W}} \rho(W) [W]_x}{\sum_{W \in \mathcal{W}} \rho(W)}, \qquad \rho(W) = \begin{cases} 1 & \text{if } W \in \mathcal{V} \\ 0 & \text{if } W \notin \mathcal{V} \end{cases}, \qquad (14)$$

where $\rho(W)$ gives the mass density of voxel $W$. Since it is assumed that all voxels have the same mass, this is a binary function that checks the occupancy of a given voxel. Hence, only the fraction of (randomly selected) voxels inside the object will contribute to the computation of the centroid. Equation 14 can be rewritten as

$$\hat{x} \approx \sum_{W \in \mathcal{W}} \frac{\rho(W)}{\sum_{W \in \mathcal{W}} \rho(W)} [W]_x = \sum_{W \in \mathcal{W}} \tilde{\rho}(W) [W]_x, \qquad (15)$$

where $\tilde{\rho}(W)$ can be considered as the normalized mass contribution of voxel $W$ to the computation of the centroid. If function $\rho(W)$ takes values in the range [0,1], we may consider it as the degree of mass of $W$ or the importance of voxel $W$ in the calculation of $\hat{x}$. Then, $\tilde{\rho}(W)$ might be considered as a normalized weight assigned to $W$. Since we stated that the centroid
can be computed using surface voxels, Equation 13 can also be posed as

$$\hat{x} \approx \sum_{W \in \mathcal{W}} \frac{\rho_S(W)}{\sum_{W \in \mathcal{W}} \rho_S(W)} [W]_x = \sum_{W \in \mathcal{W}} \tilde{\rho}_S(W) [W]_x, \qquad (16)$$

where $\rho_S(W) \in [0, 1]$ measures the degree of surfaceness of voxel $W$. Within this context, functions $\rho(\cdot)$ and $\rho_S(\cdot)$ might be understood as pseudo-likelihood functions, and Equations 15 and 16 as a sample-based representation of an estimation problem.

Figure 5 Centroid estimation error when computed with a fraction of surface or interior voxels. The employed ellipsoid had radii s = (30, 30, 100) cm, and voxels with $s_v$ = 2 cm were used.

3.2.2 Difference with particle filters

There is an obvious similarity between this representation and the formulation of particle filters, but there is a significant difference. While particles in PF represent an instance of the whole body, our samples ($W \in \mathcal{W}$) are points in the 3D space. Moreover, particle likelihoods are computed over all data, while sample pseudo-likelihoods will be computed in a local domain.

The presented concepts are applied to define the SS algorithm. Let $y_t^i \in \mathbb{R}^3$ be a point in the 3D space and $\omega_t^i \in \mathbb{R}$ its associated weight, measuring the pseudo-likelihood of this position being part of the object or part of its surface. Under certain assumptions, the centroid can be computed as

$$\hat{x}_t \approx \sum_{i=1}^{N_s} \omega_t^i \, y_t^i, \qquad (17)$$

where $N_s$ is the number of sampling points. When using SS we are no longer sampling the state space, since $y_t^i$ cannot be considered an instance of the centroid of the target as happened with particles, $x_t^j$, in PF. Hence, we will talk about samples instead of particles, and we will refer to $\{(y_t^i, \omega_t^i)\}_{i=1}^{N_s}$ as the sampling set. This set will approximate the surface of the kth target, $\mathcal{V}_t^{S,k}$, and will fulfill the sparsity condition $N_s \ll \lvert \mathcal{V}_t^{S,k} \rvert$.

4 SS implementation

In order to define a method to recursively estimate $\hat{x}_t$ from the sampling set $\{(y_t^i, \omega_t^i)\}_{i=1}^{N_s}$, a filtering strategy has to be set. Essentially, the proposal is to follow the PF analysis loop (re-sampling, propagation, evaluation,
and estimation) with some opportune modifications to ensure the convergence of the algorithm.

4.1 Pseudo-likelihood evaluation

The weight $\omega_t^i$ associated to a sample $y_t^i$ will measure the likelihood of that 3D position being part of the surface of the tracked target. When computing the pseudo-likelihood, surface voxels have been chosen instead of interior voxels, based on the efficiency of surface samples to propagate rapidly, as will be explained in the next section. As in the defined PF likelihood function, two partial pseudo-likelihood functions, $p_{Raw}(\mathcal{V}_t \mid y_t^i)$ and $p_{Color}(\mathcal{V}_t^C \mid y_t^i)$, are linearly combined to form $p(z_t \mid y_t^i)$ as

$$p(z_t \mid y_t^i) = \lambda \, p_{Raw}(\mathcal{V}_t \mid y_t^i) + (1 - \lambda) \, p_{Color}(\mathcal{V}_t^C \mid y_t^i). \qquad (18)$$

Partial likelihoods will be computed on a local domain centered in the position $y_t^i$. Let $\mathcal{C}(y_t^i, q, r)$ be a neighborhood of radius $r$ over a connectivity $q$ domain on the 3D orthogonal grid around a sample placed in a voxel position $y_t^i$. Then, we define the occupancy and color neighborhoods around $y_t^i$ as $O_t^i = \mathcal{V}_t \cap \mathcal{C}(y_t^i, q, r)$ and $C_t^i = \mathcal{V}_t^C \cap \mathcal{C}(y_t^i, q, r)$, respectively.

For a given sample $y_t^i$ occupying a single voxel, its weight associated to the raw data will measure its likelihood to belong to the surface of an object. It can be modeled as

$$p_{Raw}(\mathcal{V}_t \mid y_t^i) = 1 - \left\lvert \frac{2 \lvert O_t^i \rvert}{\lvert \mathcal{C}(y_t^i, q, r) \rvert} - 1 \right\rvert. \qquad (19)$$

Ideally, when the sample $y_t^i$ is placed on a surface, half of its associated occupancy neighborhood will be occupied and the other half empty. The proposed expression attains its maximum when this condition is fulfilled.

Function $p_{Color}(\mathcal{V}_t^C \mid y_t^i)$ can be defined as the likelihood of a sample belonging to the surface corresponding to the kth target, characterized by an adaptive reference color histogram $H_t^k$:

$$p_{Color}(\mathcal{V}_t^C \mid y_t^i) = D\left(H_t^k, C_t^i\right). \qquad (20)$$

Since $C_t^i$ contains only local color information with reference to the global histogram $H_t^k$, the distance $D(\cdot)$ is constructed toward giving a measure of the likelihood between this local colored region and $H_t^k$.
For every voxel in $C_t^i$, it is decided whether it is similar to $H_t^k$ by selecting the histogram value for the tested color and checking whether it is above a threshold $\gamma$ or not. Finally, the ratio between the number of similar-color voxels and the total number of voxels in the neighborhood gives the color similarity score. Since the reference histogram is updated and changes over time, a variable threshold $\gamma_t$ is computed, so that 80% of the values of $H_t^k$ are taken into account.

One of the advantages of the SS algorithm is its computational efficiency. The complexity of computing $p(z_t \mid y_t^i)$ is quite reduced, since it only evaluates a local neighborhood around the sample, in comparison with the computational load required to evaluate the likelihood of a particle in the PF algorithm. This point will be quantitatively addressed in Section 5.2. The parameters defining the neighborhood were set to $q = 26$ and $r = 2$, yielding satisfactory results. Larger values of the radius $r$ did not significantly improve the overall algorithm performance but increased its computational complexity.

4.2 Sample propagation and 3D discrete re-sampling

A sample $y_t^i$ placed near a surface will have an associated weight $\omega_t^i$ with a high value. It is a valid assumption to consider that some surrounding positions might also be part of this surface. Hence, placing a number of new samples in the vicinity of $y_t^i$ would contribute to progressively exploring the surface of a voxel set. This idea leads to the spatial re-sampling and propagation scheme that will drive samples along time on the surface of the tracked target.

Given the discrete nature of the 3D voxel space, it will be assumed that every sample is constrained to occupy a single voxel or discrete 3D coordinate, and that there cannot be two samples placed in the same location. Re-sampling is mimicked from PF, so a number of replicas proportional to the normalized weight of the sample are generated. Then, these new samples are propagated and some discrete noise is added to their position, meaning that their new positions are also constrained to occupy a discrete 3D coordinate (see an example in Figure 6).
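As a rough illustration of the occupancy term in Equation 19, the sketch below evaluates it on a local neighborhood of a voxel set. It assumes that a 26-connected neighborhood of radius $r$ is the cube of side $2r + 1$ centered on the sample (including the sample's own voxel); the function name `raw_pseudo_likelihood` and the toy half-space scene are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def raw_pseudo_likelihood(occupied: Set[Voxel], sample: Voxel, radius: int = 2) -> float:
    """Equation 19 on a cubic neighborhood of side 2 * radius + 1 around the sample."""
    span = range(-radius, radius + 1)
    cells = [(sample[0] + dx, sample[1] + dy, sample[2] + dz)
             for dx in span for dy in span for dz in span]
    filled = sum(1 for cell in cells if cell in occupied)
    # Maximal when half of the neighborhood is occupied, zero when it is
    # completely full (interior) or completely empty (background).
    return 1.0 - abs(2.0 * filled / len(cells) - 1.0)


# Toy scene: everything at or below z = 0 is occupied.
half_space = {(x, y, z) for x in range(-6, 7) for y in range(-6, 7) for z in range(-6, 1)}
on_surface = raw_pseudo_likelihood(half_space, (0, 0, 0))
deep_inside = raw_pseudo_likelihood(half_space, (0, 0, -3))
outside = raw_pseudo_likelihood(half_space, (0, 0, 3))
```

Only $(2r + 1)^3 = 125$ cells are inspected per sample for $r = 2$, regardless of the size of the reconstruction, which is where the computational saving with respect to the PF likelihood comes from. With this odd-sized cube, a sample lying on a flat surface scores slightly below 1 because its own layer counts as occupied.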
However, two re-sampled and propagated samples may fall in the same 3D voxel location, as shown in Figure 6. In such a case, one of these samples randomly explores the adjacent voxels until reaching an empty location; if there is no suitable location for this sample, it is dismissed.

Figure 6 Example of discrete re-sampling and propagation (in 2D). (a) A sample is re-sampled and its replicas are randomly placed, each occupying a single voxel. (b) Two re-sampled samples fall in the same position (red cell) and one of them (blue) performs a random search through the adjacent voxels to find an empty location.

The choice of sampling the surface voxels of the object instead of its interior voxels to obtain its centroid is motivated by the fact that propagating samples along the surface rapidly spreads them all around the object, as depicted in Figure 7. Propagating samples on the surface is equivalent to propagating them on a 2D domain, hence the condition of not placing two samples in the same voxel makes them explore the surface faster (see Figure 6). On the other hand, interior samples propagate on a 3D domain, thus having more space to explore and therefore being slower to spread all around the volume (see Figure 6). Although both (pseudo-)likelihoods should produce a fair estimation of the object's centroid, both sampling sets must fulfill the condition of being randomly spread around the object volume; otherwise, the centroid estimation will be biased.

4.3 Interaction model

The flexibility of a sample-based analysis may sometimes lead to situations where samples spread out too far from the computed centroid. To cope with this problem, an intra-target sample interaction model is devised. If a sample is placed in a position such that

$$\left\| [y_i]_{x,y} - [x_{t-1}]_{x,y} \right\| > \delta,$$

that is, if its distance to the previously computed centroid $x_{t-1}$, measured on the $xy$ plane, exceeds $\delta$, it is removed (by assigning $\omega_i = 0$). The threshold is set as $\delta = a\, s_x$, with $s_x = 30$ cm. A factor $a = 1.5$ produced accurate results in our experiments. The interaction among targets is modeled in a similar way as in the PF approach. The formulas in Equations 11 and 12 are applied to samples with the appropriate scaling parameter $k$.

5 Results and evaluation

In order to assess the performance of the proposed tracking systems, they have been tested on the set of benchmarking image sequences provided by the CLEAR Evaluation Campaign 2007 [22]. Typically, these evaluation sequences involved up to five people moving around in a meeting room.
This benchmarking set was formed by two separate datasets, development and evaluation, containing sequences recorded by five of the participating partners. A sample of these data can be seen in Figure 8. The development set consisted of 5 sequences of approximately 20 min each, while the evaluation set was formed by 40 sequences of 5 min each, thus adding up to 5 h of data. Each sequence was recorded with four cameras placed in the corners of the room and a zenithal camera placed in the ceiling. All cameras were calibrated and had resolutions ranging from [...] to [...] pixels at an average frame rate of $f_R = 25$ fps. The test environments were 5 × 4 m rooms with occluding elements such as tables and chairs. Images of the empty rooms were also provided to train the background/foreground segmentation algorithms.

Metrics proposed in [4] for multi-person tracking evaluation have been adopted, namely the Multiple Object Tracking Precision (MOTP), which shows the tracker's ability to estimate precise object positions, and the Multiple Object Tracking Accuracy (MOTA), which expresses its performance at estimating the number of objects and at keeping consistent trajectories. MOTP scores the average metric error when estimating multiple target 3D centroids, while MOTA accounts for the proportion of targets that have been missed, wrongly detected or mismatched. The aim of a tracking system is to produce high values of MOTA and low values of MOTP, thus indicating its ability to correctly track all targets and estimate their positions accurately. When comparing two algorithms, the one with the highest MOTA score is preferred.

Figure 7 Sample positions evolution and centroid estimation. Likelihood based on: (a) interior voxels, or (b) surface voxels. Panel titles: (a) Reference, (b) Interior based likelihood, (c) Surface based likelihood.

Figure 8 CLEAR [22] evaluation dataset sample. Images from several partners showing a common indoor conference room configuration involving several participants.

5.1 Results

To demonstrate the effectiveness of the proposed multi-person tracking approaches, a set of experiments was conducted over the CLEAR 2007 database. The development part of the dataset was used to train the initiation/termination of tracks modules as described in Section 2.3 and the remaining test part was used for our experiments. First, the multi-camera data are pre-processed by performing the foreground/background segmentation and the 3D voxel reconstruction. In order to analyze the dependency of the tracker's performance on the resolution of the 3D reconstruction, several voxel sizes were employed, $s_v = \{2, 5, 10, 15\}$ cm. A colored version of these voxel reconstructions was also generated, according to the technique introduced in Section 2.1. Then, these data were fed to the proposed PF and SS approaches.

In both types of filters, SS or PF, three parameters drive the performance of the algorithm: the voxel size $s_v$, the number of samples $N_s$ or particles $N_p$, and the usage of color information. The experiments carried out explore the influence of these parameters on the MOTP (precision, in cm) and the MOTA (tracker accuracy, in % of correctly tracked targets), shown in Figure 9. Some remarks can be drawn:

- Number of samples/particles: There is a dependency between the MOTP score and the number of particles/samples, especially for the SS algorithm. The contribution of a new sample to the estimation of the centroid in the SS has less impact than the addition of a new particle in the PF, hence the slower decay of the MOTP curves for the SS than for the PF. Regarding the MOTA score, there is no significant dependency on $N_s$ or $N_p$. Two factors drive the MOTA of an algorithm: the track initiation/termination modules, which mainly contribute to the ratio of misses and false positives, and the filtering step, which has an impact on the mismatch ratio.
The low dependency of MOTA on $N_s$ or $N_p$ shows that most of the impact of the algorithm on this score is due to the particle/sample propagation and interaction strategies rather than the quantity of particles/samples itself. Moreover, the MOTA score is tightly correlated with the track initiation/termination policy. This assumption was experimentally validated by testing several classification methods (mixture of Gaussians, PCA, Parzen, and K-Means) in the initiation/termination modules, yielding a drop in the MOTA score proportional to their ability to correctly classify a blob as person/no-person.
- Voxel size: Scenes reconstructed with a large voxel size do not capture all spatial details well and may miss some objects, thus decreasing the performance of the system (both in SS and PF). It can be observed that MOTP and MOTA scores improve as the voxel size decreases.
- Color features: Color information improves the performance of SS and PF in both MOTP and MOTA scores. First, there is an improvement when using color information for a given voxel size, especially for the SS algorithm. Moreover, the smaller the voxel size, the more noticeable the difference between the experiments using raw and color features. This effect is supported by the fact that color characteristics are better captured when using small voxel sizes. The performance improvement when using color in the SS algorithm is more noticeable since samples are placed in the regions with a high likelihood of being part of the target. For instance, this effect is more evident in cases where the subject is sitting and the samples concentrate in the upper body part, disregarding the part of the chair. In the SS algorithm, the MOTP score benefits from this efficient sample placement. The PF algorithm is constrained to evaluate the color likelihood in the ellipsoid defined in Equation 9, thus not being able to differentiate between parts of the blob that do not belong to the tracked target. Color information used within the filtering loop leads to a better distinguishability among blobs, thus reducing the mismatch ratio and slightly improving the MOTA score. Merging of adjacent blobs or complex crossing among targets is also correctly resolved. An example of the impact of color information is shown in Figure 10, where the usage of color avoids the mismatch between two targets. This effect is more noticeable when targets in the scene are dressed in different colors.

Figure 9 MOTP and MOTA scores for the SS and the PF techniques using raw and colored voxels. Several voxel sizes $s_v = \{2, 5, 10, 15\}$ cm have been used in the experiments.

Figure 10 Zenithal view of two comparative experiments (raw features and color features) showing the influence of color in the SS algorithm. The crossover between two targets is correctly tackled when using color information, whereas using only raw features leads to a mismatch and, afterwards, a track loss (white ellipsoid) and the initiation of a new one (cyan ellipsoid).

We can compare the results obtained by SS and PF with other algorithms evaluated using the same CLEAR 2007 database, whose scores are reported in Table 2. Most of these methods exploited multi-view information, with the exception of [31], which only used the zenithal camera and faced the associated distortion and perspective problems. PF is the most employed technique due to its suitability to the characteristics of this problem, although the Kalman filtering used by [15] provided fair results when fed by higher-level semantic features extracted from the input data (in this case, faces). Note the low FP score for this system as a consequence of the unlikely event of detecting a face in a spurious object. A 3D voxel reconstruction was used as the input data in [5] together with a simple track management system. The rest of the methods [7,31] relied on a fixed human body appearance model similar to the ellipsoidal region of interest used in our PF proposal. However, the novelty of these methods lies in the strategies to combine the information coming from the analysis of different views without performing any 3D reconstruction. Comparing the best proposed tracking system [31] (see end note c) with our two approaches, we obtain a relative improvement of $\Delta(\mathrm{MOTP}, \mathrm{MOTA})_{SS} = (7.63, 17.13)\%$ and $\Delta(\mathrm{MOTP}, \mathrm{MOTA})_{PF} = (5.16, 7.15)\%$. In order to visually show the performance of the SS algorithm, some videos corresponding to the most challenging tracking scenarios have been made available at http://

Table 2 Results presented at the CLEAR 2007 [22] by several partners

Method                                     MOTP (mm)   MOTA (%)   FP (%)   Miss (%)   MM (cases)
Face detection + Kalman filtering [15]     91          59.66      6.99     30.89      2.
Appearance models + PF [7]
Upper body detection + PF [31]
Zenithal camera analysis + PF [31]
Voxel analysis + heuristic tracker [5]
Voxel analysis + PF (best case)
Voxel analysis + SS (best case)

Multi-camera information is used to track multiple people using several methods.

5.2 Computational performance

Comparing the metrics obtained by different algorithms can give an idea of their performance in a scenario where computational complexity is not taken into account. An analysis of the operation time of several algorithms under the same conditions, together with the produced MOTP/MOTA metrics, might give a more informative and fairer comparison tool. Although there is no standard procedure to measure the computational performance of a tracking process, we devised a method to assess the computational efficiency of our algorithms and present a comparative study. The real-time factor (RTF) associated with a performance measure MOTP/MOTA (on both vertical axes) of the SS and PF algorithms when dealing with raw and colored input voxels is presented in Figure 11. This factor indicates a proportional measure of the speed of the algorithm, where RTF = 1 stands for real-time operation while RTF > 1 and RTF < 1 indicate a faster or slower performance, respectively. Each point of every curve is the result of an experiment conducted over the whole CLEAR dataset for a given number of samples/particles of each algorithm.
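The text does not spell out how the RTF is computed. One reading that is consistent with the description above, and that is assumed here purely for illustration, is the ratio between the recorded duration of a sequence and the time needed to process it. The function name and its parameters are hypothetical; only the frame rate of 25 fps comes from the dataset description.

```python
def real_time_factor(n_frames, processing_seconds, frame_rate=25.0):
    """Assumed RTF: recorded duration divided by processing time.

    A value of 1 means the tracker keeps up with the camera frame rate,
    values above 1 mean faster than real time, values below 1 mean slower.
    """
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    recorded_seconds = n_frames / frame_rate
    return recorded_seconds / processing_seconds


# A 5 min sequence at 25 fps contains 7500 frames.
rtf_real_time = real_time_factor(7500, 300.0)   # processed in exactly 5 min
rtf_slow = real_time_factor(7500, 3000.0)       # processed in 50 min
```

Under this reading, plotting a tracking score against the RTF for increasing numbers of samples or particles shows how much accuracy each algorithm buys per unit of computation.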
The first noticeable characteristic of these charts is that, due to the computational complexity of each algorithm, when comparing the SS and PF algorithms under the same operating conditions, the RTF associated with SS is always higher than that associated with PF. Similarly, the computational load is higher when analyzing colored inputs than raw inputs. All the plotted curves attain lower RTF values as the voxel size $s_v$ decreases, since the amount of data to process increases (note the different RTF scale ranges for each voxel size in Figure 11). Regarding the MOTP/MOTA metrics, there is a common tendency toward a decrease in the MOTP and an increase in the MOTA as the RTF decreases. The separation between the SS and PF curves grows as the voxel size decreases, since the PF algorithm has to evaluate a larger amount of data. These results lead to the conclusion that the SS algorithm is able to produce similar and, in some cases, better results than the PF algorithm at a lower computational cost. For example, using $s_v = 5$ cm, a MOTP score of around 165 mm can be obtained using SS with an RTF ten times larger than when using PF, and similarly for the MOTA score.

Figure 11 Computational performance comparison between PF and SS using several voxel sizes $s_v = \{2, 5, 10, 15\}$ cm and features (raw or colored voxels). MOTP and MOTA scores are related to the real-time factor (RTF), showing the computational load required by each algorithm to attain a given tracking performance.

6 Conclusions

In this article, we have presented a number of contributions to the multi-person tracking task in a multi-camera environment. A block representation of the whole tracking process allowed us to identify the performance bottlenecks of the system and to address efficient solutions to each of them. Real-time performance of the system was a major goal, hence efficient tracking algorithms have been produced together with an analysis of their performance. The performance of these systems has been thoroughly tested over the CLEAR database and quantitatively compared through two scores: MOTP and MOTA. A number of experiments have been conducted to explore the influence of the resolution of the 3D reconstruction and of the color information. Results have been compared with other state-of-the-art algorithms evaluated with the same metrics using the same testing data. The relevance of the initiation and termination of filters has been shown, since these modules have a major impact on the MOTA score. However, most articles in the literature do not specifically address the operation of these modules.
We proposed a statistical classifier based on classification trees as a way to discriminate blobs between the person/no-person classes. This classifier was trained using data available in the development part of the employed database, and a number of features (namely weight, height, top in z-axis, and bounding box size) were extracted and provided as the input to the classifier. Another criterion, the proximity to other already existing tracks, was also employed to create or destroy a track. Performance scores in Table 2 for the PF and SS systems present the lowest values for the false positives (FP) and missed targets (Miss) ratios, hence supporting the relevance of the initiation and termination of tracks modules.

Two proposals for the filtering step of the tracking system have been presented: PF and SS. An independent tracker was assigned to every target and an interaction model was defined. The PF technique proved to be robust and led to state-of-the-art results, but its computational load was unaffordable for small voxel sizes. As an alternative, the SS algorithm has been presented, achieving a similar and, on some occasions, better performance than PF at a smaller computational cost. Its sample-based estimation of the centroid allowed a better adaptation to noisy data and a better distinguishability among merged blobs. In both PF and SS, color information provided a useful cue to increase the robustness of the system against track mismatches, thus increasing the MOTA score. In the SS, color information also allowed a better placement of the samples, making it possible to distinguish between parts belonging to the tracked object and parts of a merged spurious object, leading to a better MOTP score.

Future research within this topic involves multi-modal data fusion with audio data, aiming to improve the precision of the tracker (MOTP) and to avoid mismatches among targets, thus improving the MOTA score.

End notes

a Analogously to the pixel definition (picture element) as the minimum information unit in a discrete image, the voxel (volume element) is defined as the minimum information unit in a 3D discrete representation of a volume.
b For the sake of simplicity in the notation, pseudo-likelihood functions will be denoted as p(·) instead of defining a specific notation for them.

c When selecting the best system, the MOTA score is regarded as the most significant value.

The authors declare that they have no competing interests.

Received: 15 May 2011 Accepted: 23 November 2011 Published: 23 November 2011

References

1. S Park, MM Trivedi, Understanding human interactions with track and body synergies captured from multiple views. Comput Vis Image Understand. 111(1), 2-20 (2008)
2. Project CHIL: Computers in the Human Interaction Loop, http://chil.server.de
3. I Haritaoglu, D Harwood, LS Davis, W4: real-time surveillance of people and their activities. IEEE Trans Pattern Anal Mach Intell. 22(8) (2000)
4. K Bernardin, A Elbs, R Stiefelhagen, Multiple object tracking performance metrics and evaluation in a smart room environment, in Proceedings of IEEE International Workshop on Visual Surveillance (2006)
5. C Canton-Ferrer, J Salvador, JR Casas, Multi-person tracking strategies based on voxel analysis, in Proceedings of Classification of Events, Activities and Relationships Evaluation and Workshop, Lecture Notes on Computer Science (2007)
6. Z Khan, T Balch, F Dellaert, Efficient particle filter-based tracking of multiple interacting targets using an MRF-based motion model, in Proceedings of International Conference on Intelligent Robots and Systems. 1(1) (2003)
7. O Lanz, P Chippendale, R Brunelli, An appearance-based particle filter for visual tracking in smart rooms, in Proceedings of Classification of Events, Activities and Relationships Evaluation and Workshop, Lecture Notes on Computer Science (2007)
8. A Yilmaz, O Javed, M Shah, Object tracking: a survey. ACM Comput Surv. 38(4), 1-45 (2006)
9. C Canton-Ferrer, JR Casas, M Pardàs, Towards a Bayesian approach to robust finding correspondences in multiple view geometry environments, in Proceedings of 4th International Workshop on Computer Graphics and Geometric Modelling, Lecture Notes on Computer Science (2005)
10. O Lanz, Approximate Bayesian multibody tracking. IEEE Trans Pattern Anal Mach Intell. 28(9) (2006)
11. GKM Cheung, T Kanade, JY Bouguet, M Holler, A real time system for robust 3D voxel reconstruction of human motions, in IEEE Conference on Computer Vision and Pattern Recognition 2 (2000)
12. J Isidoro, S Sclaroff, Stochastic refinement of the visual hull to satisfy photometric and silhouette consistency constraints, in Proceedings of IEEE International Conference on Computer Vision 2 (2003)
13. I Mikič, S Santini, R Jain, Tracking objects in 3D using multiple camera views, in Proceedings of Asian Conference on Computer Vision (2000)
14. D Focken, R Stiefelhagen, Towards vision-based 3-D people tracking in a Smart Room, in Proceedings of IEEE International Conference on Multimodal Interfaces (2002)
15. N Katsarakis, F Talantzis, A Pnevmatikakis, L Polymenakos, The AIT 3D audiovisual person tracker for CLEAR 2007, in Proceedings of Classification of Events, Activities and Relationships Evaluation and Workshop, Lecture Notes on Computer Science (2007)
16. MS Arulampalam, S Maskell, N Gordon, T Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process. 50(2) (2002)
17. K Lien, C Huang, Multiview-based cooperative tracking of multiple human objects. EURASIP J Image Video Process. 8(2), 1-13 (2008)
18. T Osawa, X Wu, K Sudo, K Wakabayashi, H Arai, MCMC based multi-body tracking using full 3D model of both target and environment, in Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (2007)
19. J Black, T Ellis, P Rosin, Multi view image surveillance and tracking, in Proceedings of Workshop on Motion and Video Computing (2002)
20. A López, C Canton-Ferrer, JR Casas, Multi-person 3D tracking with particle filters on voxels, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 1 (2007)
21. C Canton-Ferrer, R Sblendido, JR Casas, M Pardàs, Particle filtering and sparse sampling for multi-person 3D tracking, in Proceedings of IEEE International Conference on Image Processing (2008)
22. CLEAR: Classification of Events, Activities and Relationships Evaluation and Workshop (2007)
23. DL Hall, SAH McMullen, Mathematical Techniques in Multisensor Data Fusion. Artech House (2004)
24. KN Kutulakos, SM Seitz, A theory of shape by space carving. Int J Comput Vis. 38(3) (2000)
25. O Faugeras, R Keriven, Variational principles, surface evolution, PDE's, level set methods and the stereo problem, in Proceedings of 5th IEEE EMBS International Summer School on Biomedical Imaging (2002)
26. JR Casas, J Salvador, Image-based multi-view scene analysis using conexels, in Proceedings of HCSNet Workshop on Use of Vision in Human-Computer Interaction (2006)
27. V Kolmogorov, R Zabih, What energy functions can be minimized via graph cuts? IEEE Trans Pattern Anal Mach Intell. 26(2) (2004)
28. C Stauffer, W Grimson, Adaptive background mixture models for real-time tracking, in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (1999)
29. E Maggio, E Piccardo, C Regazzoni, A Cavallaro, Particle PHD filtering for multi-target visual tracking, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. 1 (2007)
30. F Talantzis, A Pnevmatikakis, AG Constantinides, Audio-visual active speaker tracking in cluttered indoors environments. IEEE Trans Syst Man Cybern B. 38(3) (2008)
31. K Bernardin, T Gehrig, R Stiefelhagen, Multi-level particle filter fusion of features and cues for audio-visual person tracking, in Proceedings of Classification of Events, Activities and Relationships Evaluation and Workshop, Lecture Notes on Computer Science (2007)
32. JW Tukey, Exploratory Data Analysis. Addison-Wesley (1977)
33. L Breiman, JH Friedman, RA Olshen, CJ Stone, Classification and Regression Trees. Chapman and Hall (1993)
34. RO Duda, PE Hart, DG Stork, Pattern Classification. Wiley-Interscience (2000)
35. JJ Crisco, RD McGovern, Efficient calculation of mass moments of inertia for segmented homogeneous three-dimensional objects. J Biomech. 31(1) (1998)
36. JG Leu, Computing a shape's moments from its boundary. Pattern Recogn. 24(10) (1991)
37. L Yang, F Albregtsen, Fast and exact computation of Cartesian geometric moments using discrete Green's theorem. Pattern Recogn. 29(7) (1996)

Cite this article as: Canton-Ferrer et al.: Multi-camera multi-object voxel-based Monte Carlo 3D tracking strategies. EURASIP Journal on Advances in Signal Processing :114.
