J. Vis. Commun. Image R. 20 (2009) 9-27

Contents lists available at ScienceDirect

J. Vis. Commun. Image R.

Face detection and tracking using a Boosted Adaptive Particle Filter

Wenlong Zheng a,*, Suchendra M. Bhandarkar b

a Department of Information Systems, Northwest Missouri State University, 800 University Drive, Maryville, MO 64468, USA
b Department of Computer Science, The University of Georgia, Athens, GA, USA

Article info

Article history: Received 30 March 2007; Accepted 9 September 2008; Available online 7 September 2008

Keywords: Adaptive Particle Filter; Particle filter; Face detection; Video tracking; Image analysis; Boosted learning; Boosted Adaptive Particle Filter; Adaptive Learning Constraint

Abstract

A novel algorithm, termed a Boosted Adaptive Particle Filter (BAPF), for integrated face detection and face tracking is proposed. The proposed algorithm is based on the synthesis of an adaptive particle filtering algorithm and the AdaBoost face detection algorithm. An Adaptive Particle Filter (APF), based on a new sampling technique, is proposed. The APF is shown to yield more accurate estimates of the proposal distribution and the posterior distribution than the standard Particle Filter, thus enabling more accurate tracking in video sequences. In the proposed BAPF algorithm, the AdaBoost algorithm is used to detect faces in input image frames, whereas the APF algorithm is designed to track faces in video sequences. The proposed BAPF algorithm is employed for face detection, face verification, and face tracking in video sequences. Experimental results show that the proposed BAPF algorithm provides a means for robust face detection and accurate face tracking under various tracking scenarios.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Face detection is important in several automated systems that take as input images of the human face. Examples include fully automatic face recognition systems, video-based surveillance and warning systems, human face/body tracking systems and perceptual human-computer interfaces.
Most face detection algorithms can be classified as feature-based or appearance-based. In recent years, appearance-based face detection algorithms that employ machine learning and statistical estimation methods have demonstrated excellent results among all existing face detection methods. Examples of appearance-based face detection techniques include the AdaBoost algorithm [1,2], the FloatBoost algorithm [3], the S-AdaBoost algorithm [4], neural networks [5,6], Support Vector Machines (SVM) [7,8], Hidden Markov Models [9], and the Bayes classifier [10,11]. Viola and Jones [1,2] propose a robust AdaBoost face detection algorithm, which can detect faces in a rapid and robust manner with a high detection rate. Li et al. [3] propose the FloatBoost algorithm, an improved version of the AdaBoost algorithm, for learning a boosted classifier with minimum error rate. The FloatBoost algorithm uses a backtracking mechanism to improve the face detection rate after each iteration of the AdaBoost procedure. However, this method is computationally more intensive than the AdaBoost algorithm. Jiang and Loe [4] propose the S-AdaBoost algorithm, a variant of the AdaBoost algorithm, for handling outliers in pattern detection and classification. Since the S-AdaBoost algorithm uses different classifiers in different phases of computation, it suffers from computational inefficiency and lack of accuracy.

Among the various neural network-based approaches to face detection, the work of Rowley et al. [5] is particularly well known. Rowley et al. [5] employ a multilayer neural network to learn the face and non-face patterns from training sets consisting of face and non-face images. A significant drawback of their technique is that the detection is limited primarily to upright frontal faces. Although Rowley et al. [5] further generalize their method to detect rotated face images, the reported results are not promising because of the resulting low detection rate.

* Corresponding author. E-mail addresses: zheng@nwmissouri.edu (W. Zheng), suchi@cs.uga.edu (S.M. Bhandarkar).
Face detection techniques based on Support Vector Machines (SVMs) use structural risk minimization to minimize the upper bound of the expected generalization error [7,8]. The major disadvantages of SVMs include intensive computation during the learning process and high memory requirements. Face detection techniques based on Hidden Markov Models (HMMs) assume that face and non-face patterns can be characterized as outputs of a parametric random Markovian process whose parameters can be learned using a well-defined estimation procedure [9]. The goal of training an HMM is to estimate the appropriate parameters in the HMM model that maximize the probability or likelihood of the observed training data. Schneiderman and Kanade [10] present a naive Bayes classifier for face detection, which is based on the estimation of the joint probability of the local appearance and position of a face pattern at multiple scales. However, the performance of the naive Bayes classifier is reported to be poor [10]. To address this problem, Schneiderman [11] has proposed a restricted Bayesian network for face detection based on performing a search in the large space of possible network structures to determine the optimal structure of a Bayesian network-based classifier.

doi:10.1016/j.jvcir

Object tracking in video has also been studied extensively by researchers in computer vision because of various computer vision applications such as autonomous robots [12], video surveillance [13], human eye tracking [14] and human face tracking [15] that use tracking algorithms extensively. Object tracking algorithms designed to operate under more general and less structured situations need to deal with complex issues of uncertainty and error arising from occlusion, and changes in illumination, viewpoint and object scale [16]. Consequently, many techniques have been developed to tackle the various aforementioned issues in visual object tracking and reported in the literature over the past decade.

The various techniques for visual object tracking can be classified as image (region)-based, contour-based or filtering-based. Image (region)-based tracking methods typically extract generic region-based features from the input images and then combine these features using high-level scene information [16]. Intille et al. [16] propose a blob-tracker for human tracking in real time wherein the background is subtracted to extract foreground regions. The foreground regions are then divided into blobs based on color. This approach exhibits good run-time performance, but suffers from a major disadvantage in terms of merging of blobs when the objects in the scene approach each other.

Contour-based tracking methods assume that the tracked objects are bounded by contours with known properties [17-19]. The contour pixels are tracked from one image frame to the next using a predefined contour shape model. Dynamical or elastic contour models are used to handle changes in object shape due to deformation or change in scale.
Filtering-based methods are based on prediction and updating of object features over time, i.e., over successive frames in the video sequence. Tracking of object shape and object location over time is tackled adequately by the traditional Kalman filter in cases where the tracking problem can be effectively modeled as a linear dynamic system [20]. The Extended Kalman Filter (EKF) is an extension of the traditional Kalman filter to a non-linear but unimodal process where the non-linear behavior can be approximated by local linearization [21]. However, it is widely accepted that the particle filter is superior to the traditional Kalman filter in terms of tracking performance [22], since the particle filter provides a robust object tracking framework without being restricted to a linear system model.

Particle filters, also known as sequential Monte Carlo filters, have been widely used in visual tracking to address limitations arising from non-linearity and non-normality of the motion model [23,24]. The basic idea of the particle filter is to approximate the posterior density using a recursive Bayesian filter based on a set of particles with certain assigned weights. The Condensation algorithm, a simple particle filter proposed by Isard [25], is designed to solve tracking problems arising from non-linearity and non-normality of the motion model. During the sampling step, the Condensation algorithm draws a set of particles from a simple proposal distribution, namely the conditional distribution given the particle state in the previous frame. The proposal distribution is then used to approximate the target a posteriori distribution. A commonly observed shortcoming of the Condensation algorithm is that the proposal distribution does not incorporate the information from the current frame, thus resulting in a longer run time needed for convergence to the desired a posteriori distribution. Various approaches have been developed to improve the tracking performance of the conventional particle filter. Li et al.
[23] propose a Kalman particle filter (KPF) and an unscented particle filter (UPF) to improve the particle sampling procedure in the context of visual contour tracking. This approach makes use of a Kalman filter or an unscented Kalman filter to incorporate the current observation in the estimation of the proposal distribution. The Kalman filter or the unscented Kalman filter is shown to steer the set of particles to regions of high likelihood in the search space, thus reducing the number of particles needed to estimate the proposal distribution [23]. To address the occlusion problem, Wang and Cheong [26] propose a particle filter with a Markov random field (MRF)-based representation of the tracked object within a dynamic Bayesian framework. This method transforms an object into a composite of multiple MRF regions to improve the modeling and tracking accuracy. Chang et al. [22] present a kernel particle filter to improve the sampling efficiency for multiple object tracking using data association techniques. This scheme invokes kernels to continuously approximate the posterior density, where the kernels for object representation and localization are allocated based on the value of the gradient derived from the kernel density. However, this method cannot handle situations in which the motion pattern of objects in one group changes drastically. Rathi et al. [19] formulate geometric active contours as a parameterization technique to deal with deformable objects. However, their technique performs poorly when the tracked object is completely occluded over several frames. Isard and MacCormick [27] propose a Bayesian multiple blob-tracker (BraMBLe), an early variant of a particle filter, in which the number of tracked objects is allowed to vary during the tracking process. Nonetheless, this approach relies on modeling a fixed background to identify foreground objects, a situation that is not always practical in real-world tracking scenarios. To address this problem, Okuma et al. [24] relax the assumption of a fixed background and allow the background to vary in order to handle real-world image sequences. Okuma et al.
[24] also propose a boosted particle filter (BPF) for multiple object detection and tracking, which interleaves the AdaBoost algorithm with a simple particle filter based on the Condensation algorithm. However, this method does not present a systematic way to incorporate object models to guarantee accurate approximation of the proposal distribution, nor does it address the occlusion problem.

In this paper, we propose a novel particle filtering scheme, termed an Adaptive Particle Filter (APF), to enable estimation of the proposal distribution and the posterior distribution with a much higher degree of accuracy. Based on the previous work of Isard [25], Li et al. [23], Vermaak et al. [28] and Okuma et al. [24], we also propose a novel scheme for integration of face detection and face tracking by combining the APF-based tracking algorithm with the AdaBoost face detection algorithm. We term the combination of the proposed APF algorithm and the AdaBoost algorithm a Boosted Adaptive Particle Filter (BAPF). In the proposed BAPF-based face tracking scheme, the AdaBoost algorithm is used to detect faces and also to verify the existence of a tracked face in an input image frame, whereas the APF algorithm is designed to track the detected faces across the image frames in the video sequence. The BAPF algorithm is shown to yield very good tracking results in situations where the tracked objects are severely occluded. Experimental results show that the proposed BAPF scheme provides robust face detection and accurate face tracking in various tracking scenarios.

2. Statistical model

A glossary of mathematical notation used in the formulation of the statistical model and the particle filtering algorithm is given in Appendix A.

2.1. Observation model

We denote a state vector for an object by $x$, and an observation vector by $y$. It is important for contour tracking to obtain an accurate
estimate of the observation likelihood (also termed the observation density) $p(y \mid x)$. Blake et al. [29], Isard [25], and MacCormick and Blake [18,30] introduce statistical models for estimation of the observation density $p(y \mid x)$. These models use specific image features that are collected along a set of normals to a hypothesized contour. A finite number of sample points, called control points, are generated on the hypothesized contour. We follow the general direction provided by the aforementioned models for modeling the observation process, but specifically follow the model proposed by MacCormick [18]. Fig. 1 shows an observed contour and image features extracted along a measurement line.

We denote the finite number of sample points on a hypothesized contour by a set $\{x_i;\ i = 1, 2, \ldots, n\}$. We term the normals to the above contour measurement lines, and denote them by a set $\{s_i;\ i = 1, 2, \ldots, n\}$. The length of the measurement lines is fixed at a value $T$. The Canny edge detector is applied to the measurement line $s_i$ $(i = 1, 2, \ldots, n)$ in order to obtain the positions of the edge features $\{z_i^{(m)};\ m = 1, 2, \ldots, m_i\}$, where $m$ is the index of the detected features, and $m_i$ is the number of detected features. As can be seen, each feature is jointly generated by the boundary of an object and random clutter present in the image. The clutter features on the measurement line $s_i$ $(i = 1, 2, \ldots, n)$ are assumed to obey a Poisson distribution with density parameter $\lambda$ given by

$$p_T(m_i) = \frac{(\lambda T)^{m_i}\, e^{-\lambda T}}{m_i!} \qquad (1)$$

where $m_i$ is the number of detected clutter features. A boundary density function is assumed to obey a Gaussian distribution; thus the generic likelihood function of an observation at a sample point $x_i$ $(i = 1, 2, \ldots, n)$ can be described by [18]:

$$p_{x_i}(z \mid \nu = \{x_i\}) = \frac{(\lambda T)^{m_i}\, e^{-\lambda T}}{m_i!} \left( q_0 + \frac{q_1}{\lambda \sqrt{2\pi}\,\sigma} \sum_{m=1}^{m_i} \exp\!\left( -\frac{(z_i^{(m)} - x_i)^2}{2\sigma^2} \right) \right) \qquad (2)$$

where $q_0$ is the probability of undetected features for an object boundary, and $q_1$ is the probability of detected features for an object boundary.
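As a rough illustration of the measurement-line model, the sketch below evaluates the likelihood at a single sample point: a Poisson clutter term multiplied by a mixture of a non-detection probability and Gaussian boundary terms. The function name, the argument names (`lam` for the clutter density, `sigma` for the spread of the boundary density) and the numeric values in the usage note are assumptions made for this example; they are not taken from the authors' implementation.

```python
import math


def measurement_line_likelihood(z, x_i, lam, T, sigma, q0, q1):
    """Likelihood of edge features z on one measurement line (illustrative).

    z     : positions of the detected edge features along the line
    x_i   : position of the hypothesized contour point on the same line
    lam   : Poisson clutter density (hypothetical name)
    T     : length of the measurement line
    sigma : standard deviation of the Gaussian boundary density
    q0/q1 : probabilities that the boundary feature is undetected/detected
    """
    m = len(z)
    # Poisson probability of observing m clutter features on a line of length T.
    clutter = (lam * T) ** m * math.exp(-lam * T) / math.factorial(m)
    # Sum of Gaussian kernels centred on the hypothesized boundary position.
    boundary = sum(math.exp(-((zm - x_i) ** 2) / (2.0 * sigma ** 2)) for zm in z)
    return clutter * (q0 + q1 * boundary / (lam * math.sqrt(2.0 * math.pi) * sigma))
```

For instance, with `lam=0.1`, `T=10.0`, `sigma=1.0`, `q0=0.2` and `q1=0.8`, a feature detected close to the hypothesized point (`z=[0.1]`, `x_i=0.0`) yields a larger value than a feature far from it (`z=[4.0]`), which is the behavior that lets the filter favor contours aligned with image edges.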
Assuming that the sample points are independent and identically distributed, the overall likelihood function of the observation $p(y \mid x)$ can be represented as

$$p(y \mid x) = \prod_{i=1}^{n} \frac{(\lambda T)^{m_i}\, e^{-\lambda T}}{m_i!} \left( q_0 + \frac{q_1}{\lambda \sqrt{2\pi}\,\sigma} \sum_{m=1}^{m_i} \exp\!\left( -\frac{(z_i^{(m)} - x_i)^2}{2\sigma^2} \right) \right) \qquad (3)$$

where the product runs over the measurement lines $s_i$ intersecting $x$.

2.2. Dynamical model

Generally speaking, a particle filter algorithm requires a dynamical model to describe the evolution of a tracking system over time. An auto-regressive process (ARP) model has been widely used for the purpose of formulating such a dynamical model [18,31]. Blake et al. [29,32] model the object dynamics as a second-order process. Isard and Blake [33] and Li et al. [23] also follow the dynamical model of Blake et al. [29,32] in their work on object tracking. Following these previous works, this paper also employs a second-order ARP as the dynamical model for face tracking. It is widely accepted that the second-order ARP captures the various motions of interest that are relevant to visual tracking [18]. The parameters of the dynamical model in a typical real-world application can be obtained via learning from the input training data. The second-order ARP represents the state $x_t$ at time $t$ as a linear combination of the previous two states and additive Gaussian noise. The dynamical model can be represented as a second-order linear difference equation:

$$x_t - \bar{x} = A\,(x_{t-1} - \bar{x}) + B\,\omega_t \qquad (4)$$

where $\omega_t$ is Gaussian noise that is independent of the state vector $x_t$, $\bar{x}$ denotes the mean value of the state vector, and $A$ and $B$ are matrices describing the deterministic component and the stochastic component of the dynamical model, respectively. The state vector $x_t$ encodes the knowledge of the object contour in the current state and the previous state and is represented by

$$x_t = \begin{pmatrix} X_{t-1} \\ X_t \end{pmatrix}.$$

In most real-world applications, one can set some reasonable default values for the parameters $A$, $B$ and $\bar{x}$ of the dynamical model.
However, it is more effective to approximate these values via observation of video sequences in which the tracked object undergoes some typical representative motions [29,34]. The dynamical model can also be represented by a temporal Markov chain [33] given by

$$p(x_t \mid x_{t-1}) = C \exp\!\left( -\tfrac{1}{2} \left\| B^{-1} \big( (x_t - \bar{x}) - A\,(x_{t-1} - \bar{x}) \big) \right\|^2 \right) \qquad (5)$$

where $C$ is a constant, and $\|\cdot\|$ denotes the Euclidean norm.

Fig. 1. (a) Observation process: the ellipse is a hypothesized contour in an image. (b) The image features on the measurement line.
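To make the second-order auto-regressive dynamics concrete, the following minimal sketch propagates a single scalar contour parameter. The state is the pair (previous value, current value), as in the stacked state vector of the dynamical model, so the deterministic matrix takes the companion form `[[0, 1], [a2, a1]]` and the noise only enters the newest component. The coefficient names `a1`, `a2`, `b` and all numeric values are assumptions for illustration; the paper learns its parameters from training sequences.

```python
import random


def arp2_step(prev_state, mean, a1, a2, b, rng):
    """One step of a scalar second-order auto-regressive process.

    prev_state : (X_{t-2}, X_{t-1}), the stacked state at time t-1
    returns    : (X_{t-1}, X_t), the stacked state at time t
    """
    older, newer = prev_state
    omega = rng.gauss(0.0, 1.0)  # unit-variance Gaussian driving noise
    # X_t - mean = a2 * (X_{t-2} - mean) + a1 * (X_{t-1} - mean) + b * omega
    nxt = mean + a2 * (older - mean) + a1 * (newer - mean) + b * omega
    return (newer, nxt)


def simulate(initial_state, mean, a1, a2, b, steps, seed=0):
    """Generate a trajectory of X_t values starting from initial_state."""
    rng = random.Random(seed)
    state = initial_state
    path = [state[1]]
    for _ in range(steps):
        state = arp2_step(state, mean, a1, a2, b, rng)
        path.append(state[1])
    return path
```

Calling `simulate((0.0, 0.0), mean=0.0, a1=1.5, a2=-0.6, b=0.1, steps=50)` produces a smooth, damped random motion around the mean; setting `b=0.0` removes the stochastic component and leaves only the deterministic part.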

3. Face tracking using particle filtering

3.1. The filtering distribution

We denote a state vector for an object at time $t$ by $x_t$, and its history up to time $t$ by $x_{1:t} = \{x_1, x_2, \ldots, x_t\}$. Likewise, an observation vector at time $t$ is denoted by $y_t$ and its history up to time $t$ is denoted by $y_{1:t} = \{y_1, y_2, \ldots, y_t\}$. The standard problem of target tracking, in the terminology of statistical pattern recognition, is to estimate the state $x_t$ of the objects at time $t$, using a set of observations $y_{1:t}$ from a sequence of input images. The a posteriori density $p(x_t \mid y_{1:t})$ represents all the information about $x_t$ at time $t$ that is potentially deducible from the set of observations $y_{1:t}$ up to that time.

We assume that the object dynamics constitute a temporal Markov process and that the observations $y_t$ are independent. Therefore, the dynamics are determined by a transition prior $p(x_t \mid x_{t-1})$. Given the transition prior $p(x_t \mid x_{t-1})$ and the observation density $p(y_t \mid x_t)$, the posterior density $p(x_t \mid y_{1:t})$ can be computed by applying Bayes' rule [35] for inferring the posterior state density from time-varying observations. The posterior density is estimated recursively via Bayesian filtering [33,36] as follows

$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t, y_{1:t-1})\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} \qquad (6)$$

where

$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \qquad (7)$$

The posterior density $p(x_t \mid y_{1:t})$ is generally evaluated in two steps, namely the prediction step and the updating step. First, an effective prior $p(x_t \mid y_{1:t-1})$, shown in Eq. (7), is predicted from the posterior density $p(x_{t-1} \mid y_{1:t-1})$ via the transition prior $p(x_t \mid x_{t-1})$. Second, the posterior density $p(x_t \mid y_{1:t})$ is updated based upon the new observation $y_t$ at time $t$, as given by Eq. (6). The observation prior $p(y_t \mid y_{1:t-1})$, which is the denominator in Eq. (6), can be represented by

$$p(y_t \mid y_{1:t-1}) = \sum_{x_t} p(y_t, x_t \mid y_{1:t-1}) = \sum_{x_t} p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1}) \qquad (8)$$

Furthermore, the observation prior $p(y_t \mid y_{1:t-1})$ can be represented by the following integral:

$$p(y_t \mid y_{1:t-1}) = \int p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, dx_t \qquad (9)$$

Thus Eq. (6) becomes

$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{\int p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, dx_t} \qquad (10)$$

Using Eq. (7), we substitute the expression for the effective prior $p(x_t \mid y_{1:t-1})$ in Eq. (10) to obtain

$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}}{\int p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}\, dx_t} \qquad (11)$$

In addition to the estimate of the posterior density $p(x_t \mid y_{1:t})$, an estimate of a function $f(x_t)$ of the object state vector is also required in many situations. The expected value or estimate of the function $f(x_t)$ is given by

$$E[f(x_t)] = \int f(x_t)\, p(x_t \mid y_{1:t})\, dx_t \qquad (12)$$

Eqs. (10) and (11) represent an optimal solution to the standard problem of object tracking via recursive Bayesian filtering. This solution involves the computation of high-dimensional integrals and must deal with the non-linearity and non-normality of the motion model in many tracking scenarios. High-dimensional integrals usually cannot be computed easily in closed analytical form. Thus a particle filter, also known as a sequential Monte Carlo filter, is adopted as a practical solution to the otherwise intractable problem of object tracking via recursive Bayesian filtering.

3.2. The standard particle filter

A standard particle filter uses Monte Carlo simulation to estimate the posterior probability density function $p(x_t \mid y_{1:t})$ given in Eq. (10). Particle filtering uses random sampling strategies to model a complex posterior probability density function $p(x_t \mid y_{1:t})$. It uses $N$ weighted discrete particles to approximate the posterior probability density function $p(x_t \mid y_{1:t})$ via observation of the data. Each particle consists of a state vector $x$ and a weight $w$. The weighted particle set is given by $\{(x_t^{(i)}, w_t^{(i)});\ i = 1, 2, \ldots, N\}$. Particle filtering samples the space spanned by $x_t$ with $N$ discrete particles and approximates the distribution using the discrete points sampled by the particles and their associated weights. Specifically, we assume that $N$ particles are used in the sampling procedure to approximate the posterior probability density function $p(x_t \mid y_{1:t})$, and that the $N$ discrete sample points in the space of $x_t$ are given by $x_t^1, x_t^2, \ldots, x_t^N$, respectively. Thus we have

$$p(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{i}\, \delta(x_t - x_t^{i}) \qquad (13)$$

Since it is not feasible to draw samples directly from the posterior distribution, a proposal distribution $q(x_t \mid x_{t-1}, y_{1:t})$ is used to draw the samples easily for approximation of the posterior probabilities. Using the proposal distribution $q(x_t \mid x_{t-1}, y_{1:t})$, a particle filter generates the $i$th particle $x_t^{(i)}$ $(i = 1, 2, \ldots, N)$ from $x_{t-1}^{(i)}$, and computes the weight for $x_t^{(i)}$ using the following equation

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid x_{t-1}^{(i)}, y_{1:t})} \qquad (14)$$

The posterior distribution $p(x_t \mid y_{1:t})$ can thus be approximated as

$$p(x_t \mid y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)}\, \delta(x_t - x_t^{(i)}) \qquad (15)$$

The estimate of the function $f(x_t)$ of the state vector can then be computed as

$$E[f(x_t)] \approx \sum_{i=1}^{N} w_t^{(i)}\, f(x_t^{(i)}). \qquad (16)$$

3.3. The Adaptive Particle Filter

One of the active research areas in particle filtering in recent years is the generation of a good proposal distribution $q(x_t \mid x_{t-1}, y_{1:t})$ that would enable a more accurate estimate of the posterior distribution $p(x_t \mid y_{1:t})$. The aim is to obtain as close an approximation to the posterior probability distribution as possible. The standard particle filter based on the Condensation algorithm [33] does not use the knowledge obtained from the current image frame, which leads to higher inaccuracy in the estimate of the posterior distribution. Doucet et al. [36] propose an optimal proposal distribution (OPD) for state estimation of jump Markov linear systems, and recursively compute optimal state estimates based on the selection of the minimum variance of weights
$(i = 1, 2, \ldots, N)$. To overcome the problem of inefficient computation of the OPD, Li et al. [23] propose a Kalman particle filter (KPF) and an unscented particle filter (UPF) to drive a set of particles to the high-likelihood regions in the search space. Li et al. [23] propose a local linearization of the OPD to estimate the proposal distribution, which is assumed to be a Gaussian distribution. Therefore, the proposal distribution obtained by Li et al. [23] can be represented as

$$u_l(x) = q(x_t^{(i)} \mid x_{t-1}^{(i)}, y_{1:t}) = N(\hat{x}_t^{(i)}, \hat{P}_t^{(i)}), \quad i = 1, 2, \ldots, N \qquad (17)$$

where $\hat{x}_t^{(i)}$ and $\hat{P}_t^{(i)}$ denote the mean and covariance, respectively, of the Gaussian distribution $N(\hat{x}_t^{(i)}, \hat{P}_t^{(i)})$.

In this paper, we propose a new particle filtering scheme, termed an Adaptive Particle Filter (APF), which enables significantly more accurate estimation of the proposal distribution and the posterior distribution. The proposed APF can be viewed as an extension of the Condensation algorithm and the Kalman particle filter to obtain an accurate approximation of the proposal distribution and the posterior distribution. The proposed APF algorithm is depicted in Fig. 2. In the sampling step of the APF algorithm (Fig. 2), a new sampling strategy, distinct from the sampling strategies used in other particle filters, is used to improve the accuracy of the approximation. The sampling step is considered to be the most important step in a particle filtering algorithm. For each discrete particle $x^{(i)}$ in iteration $l$, the Adaptive Particle Filter generates a new particle based on a proposal distribution $u_l(x)$. We use the loop controlled by the parameter $l$ in the APF algorithm (Fig. 2) to implement the proposed sampling strategy. Let the parameter $L$ denote the fixed number of iterations of loop $l$, where the value of $L$ can be tuned based on the needs of different real-world applications. When $L = 1$, the APF is tantamount to the standard particle filter; thus the standard particle filter is a special case of the proposed APF.
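The special case of a single sampling pass can be sketched as a Condensation-style step. This is not the APF itself: it uses the transition prior as the proposal, so the importance weight p(y|x) p(x|x_prev) / q(x|x_prev, y) reduces to the observation likelihood, and it omits the additional sampling iterations and the Adaptive Learning Constraint. The one-dimensional random-walk motion model, the Gaussian likelihood, and every function name below are hypothetical choices made only to keep the example self-contained.

```python
import math
import random


def normalize(weights):
    """Scale weights to sum to one; fall back to uniform if all are zero."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def particle_filter_step(particles, weights, y, transition_sample, likelihood, rng):
    """Resample, predict with the transition prior, then reweight by p(y | x)."""
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    predicted = [transition_sample(x, rng) for x in resampled]
    new_weights = normalize([likelihood(y, x) for x in predicted])
    return predicted, new_weights


def estimate(particles, weights, f=lambda x: x):
    """Weighted-sum approximation of E[f(x)] over the particle set."""
    return sum(w * f(x) for x, w in zip(particles, weights))


def random_walk(x, rng, step_std=0.5):
    """Hypothetical 1-D motion model: Gaussian random walk."""
    return x + rng.gauss(0.0, step_std)


def gaussian_likelihood(y, x, obs_std=1.0):
    """Hypothetical observation model: unnormalized Gaussian around the state."""
    return math.exp(-0.5 * ((y - x) / obs_std) ** 2)
```

A short run shows the intended use: start from particles spread uniformly over an interval, call `particle_filter_step` once per frame with the current observation, and read the tracked value from `estimate`. With a constant observation the weighted mean settles near that observation after a few frames.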
When $L > 1$, the APF performs more sampling iterations than the standard particle filter. We prove in Appendix A that the additional iterations of the APF result in a lower estimation error for the proposal distribution and posterior distribution. In order to enable more accurate estimation of the proposal distribution, we iterate the sampling procedure subject to a constraint termed the Adaptive Learning Constraint (ALC), described using Eq. (18). A more detailed explanation and derivation of the ALC is provided in Appendix A.

[Eq. (18), the Adaptive Learning Constraint: an inequality defined in terms of the quantities $\max_l$ and $\min_l$, which are computed over the particles $x^{(i)}$ at iteration $l$; the full expression is not recoverable from the extracted text.] (18)

where the coefficients $K$ are constants, and $0 < a < 1$. As long as the proposed ALC is satisfied, the iteration that generates new particles for tracking in the current image frame continues. The iteration in the sampling step is halted when the proposed ALC ceases to be satisfied or the predefined loop threshold is reached. From both a theoretical and a practical viewpoint, the particles (and their associated weights and state vectors) resulting from the latest iteration of the ALC represent a better approximation to the proposal distribution and the posterior distribution. The theoretical analysis and experimental results presented in the following sections of the paper confirm that the performance of the APF algorithm is indeed superior to that of the conventional particle filtering algorithm. The other steps in the proposed APF algorithm are similar to those of other particle filters such as the KPF and the UPF. The initialization step in the proposed APF algorithm takes advantage of the information derived from the results of the AdaBoost face detection algorithm.

4. The Boosted Adaptive Particle Filter

The proposed Boosted Adaptive Particle Filter (BAPF) for face detection and tracking employs two object models: the contour-based model used in the Adaptive Particle Filter (APF) and the region-based model used in the AdaBoost face detection procedure.
The object models used in the context of tracking can typically be classified into three general categories [37]: contour-based models [23,38,39], region-based models [15,27,40], and feature point-based models [41-43,45]. Since the proposed BAPF algorithm uses two distinct models for face detection and tracking, it has distinct advantages over the conventional particle filter. The incorporation of the AdaBoost algorithm within the APF algorithm is shown to substantially improve the robustness of the resulting BAPF algorithm. The AdaBoost algorithm provides a natural mechanism for integration of the contour- and region-based representations. This makes the proposed BAPF algorithm more powerful than the naïve K-means clustering method proposed by Vermaak et al. [28] in their distribution mixture tracking algorithm. The proposed BAPF algorithm also performs better than the distribution mixture representation scheme of Okuma et al. [24] since the proposed approach employs a more effective particle filtering algorithm, i.e., the APF algorithm. In the proposed BAPF scheme, the AdaBoost algorithm allows for explicit detection of faces entering and leaving the field of view of the camera, thus providing a means for (re)initialization of the APF-based tracking algorithm. The AdaBoost algorithm provides a means of verification of the predictions made by the APF-based tracking algorithm whereas the APF-based tracking algorithm, in turn, provides the focus-of-attention regions for the AdaBoost algorithm. The resulting BAPF algorithm is seen to provide robust face detection and accurate face tracking under various tracking scenarios.

4.1. Face detection using AdaBoost

Among the various face detection methods described in the published research literature, the boosted learning-based face detection methods have demonstrated excellent results. Building on the previous work of Tieu and Viola [44] and Schneiderman [11], Viola and Jones [1,2] have proposed a robust AdaBoost face detection algorithm, which can detect faces in an input image in a rapid and robust manner with a high detection rate.
The face detection technique in AdaBoost comprises three aspects: the integral image, a strong classifier composed of weak classifiers based on the AdaBoost learning algorithm, and an architecture consisting of a cascade of a number of strong classifiers. We employ the AdaBoost scheme of Viola and Jones [1,2] for face detection. A 25-layer cascade of boosted classifiers is trained to detect multi-view faces. A set of sample face and non-face (termed background) images is used for training. Another set of non-face sample images is collected from video sequences containing no faces. The results of the AdaBoost training procedure and the AdaBoost face detection procedure are presented in Section 5.1.

4.2. Integrating the Adaptive Particle Filter with AdaBoost

The proposed face detection and tracking scheme is based on the integration of two distinct models: an AdaBoost face detection model and an Adaptive Particle Filter (APF)-based face tracking
model. The AdaBoost face detection model performs multi-view face detection based on the trained AdaBoost algorithm. The APF model performs visual contour tracking using the adaptive particle filtering algorithm described in Section 3.3. Fig. 3 shows the closed-loop control system view of the proposed integrated face detection and face tracking scheme.

Fig. 2. The algorithm describing the Adaptive Particle Filter.

Fig. 3. Integration of the APF with AdaBoost within a single feedback control system. (Diagram blocks and signals: Single Frame, Adaptive Particle Filter Face Tracking, AdaBoost Face Detection, Prediction, Initialization, Verification, Definition, Focus-of-attention.)

The process for face detection and tracking contains two phases: an initialization phase and a tracking phase. In the initialization phase of the BAPF, the AdaBoost face detection model is used to provide the initial parameters for the APF face tracking model based on observations of the image frames in the video stream over a certain time interval. During the tracking phase, the AdaBoost face detection model and the APF face tracking model improve the tracking performance via synergistic interaction. The AdaBoost face detection model helps the APF face tracking model to find and define new objects (faces), and to verify the current states of the objects (faces) being tracked. On the other hand, the APF face tracking model provides focus-of-attention regions within the image to speed up the AdaBoost face detection procedure.

After performing the AdaBoost face detection procedure on an image, we obtain a confidence measure $g$ for each detected face in the image. Likewise, using the APF tracking algorithm, the estimate of $f(x_t)$ generated by the Adaptive Particle Filter at each sample point along the contour is given by

$$E[f(x_t)] \approx \sum_{i=1}^{N} w_t^{(i)}\, f(x_t^{(i)}) \qquad (19)$$

The results of the AdaBoost algorithm and the APF algorithm are combined to update the position of a sampled point, as follows

$$E_c(f(x_t)) = (1 - c)\, E(f(x_t)) + c\, g\, d \qquad (20)$$

where $E_c(\cdot)$ represents the estimated position of a sampled point on the contour combining the estimation values from the APF and the AdaBoost algorithm, the parameter $c$ is the weight assigned to the AdaBoost detection procedure, the parameter $g$ is a confidence measure assigned to each detected face in the image, $d$ is the distance between the center of a detected face from the AdaBoost algorithm and the center of a sampled template contour from the APF algorithm, and $f(x_t)$ represents the location of the contour of the tracked face. In practice, the output of the AdaBoost face detection procedure is averaged over the previous $F$ frames before combining it with the APF output using Eq. (20). The value of $E_c(f(x_t))$ is fed back to the APF for further processing in successive iterations.

Eq. (20) shows how the AdaBoost algorithm and the APF algorithm are combined to obtain the estimated position of a sampled point on the contour. The estimated position is computed by adding a weighted estimate of the APF and a weighted distance between the center of a face detected using the AdaBoost algorithm and the center of a sampled template contour from the APF algorithm. The first term represents the influence of the APF, whereas the second term represents the influence of the AdaBoost algorithm. The parameter $c$ in Eq. (20) can be fine-tuned without affecting the convergence of the APF. When $c = 0$, the BAPF degenerates to the pure APF. By increasing the value of $c$, we lay a greater emphasis on the output of the AdaBoost face detection algorithm. On the other hand, when $c = 1$, the BAPF degenerates to the pure AdaBoost algorithm, which does not take into account the output of the APF-based tracker. In practice, one could adjust the value of the parameter $c$ based on different scene conditions determined by the ambient illumination, clutter, and inter-object and background occlusions.

5. Experimental results

5.1. AdaBoost face detection

The AdaBoost scheme of Viola and Jones [1,2] is used to detect faces in input images. In our experiment, we train a 25-layer cascade of strong classifiers to detect multi-view faces in video sequences. The training data set is composed of face and non-face images of size pixels.
A set of 6230 multi-view face images of eight persons is collected from video sequences under different conditions of ambient scene illumination, surface reflection, facial pose, facial expression and background composition in order to make the face detection scheme more robust in different scenarios. The face images are cropped and scaled to a resolution of [...] pixels. Another set of 6598 non-face sample images of size [...] pixels is collected from video sequences containing no faces. The non-face sample images are of the same size as the video frames acquired by the video camera for real-time face tracking, but this is not a strict requirement. Fig. 4 shows some random face samples used for training, and Fig. 5 shows some random non-face samples used for training. A larger training set of face and non-face examples typically leads to better detection results, although detection failures still exist in input image regions that contain overlaps or occlusions and are characterized by high scene clutter. Some results of face detection using our trained AdaBoost face detection procedure are illustrated in Fig. 6. AdaBoost face detection performs well in most cases, but generates false positives in very busy images that contain overlaps or occlusions and are characterized by high scene clutter.

5.2. Boosted Adaptive Particle Filter

The proposed Boosted Adaptive Particle Filter (BAPF) is implemented in C++ under the Microsoft Visual C++ .NET environment on a 1.6 GHz Pentium-M workstation. Video sequences are of size [...] pixels and are sampled at 30 frames per second. In the beginning, the AdaBoost face detection algorithm initializes the object state(s) in the Adaptive Particle Filter (APF)-based face tracking algorithm. The initialization is done using observations of the image frames in the input video sequence over a certain time interval. Since the contour defining the appearance of the face in the video sequences is roughly circular or elliptical in shape, we use a simple parameterized model to represent the contour, i.e., Ax² + By² + C = 0.
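The contour model Ax² + By² + C = 0 describes an axis-aligned ellipse centered at the origin when A > 0, B > 0 and C < 0, with semi-axes sqrt(-C/A) and sqrt(-C/B). As a minimal sketch, sample points on such a contour could be generated as shown below. The function name, the number of sample points and the origin-centered coordinate frame are illustrative assumptions and are not details given in the paper.

```python
import math
from typing import List, Tuple


def ellipse_contour_points(a_coef: float, b_coef: float, c_coef: float,
                           num_points: int = 32) -> List[Tuple[float, float]]:
    """Sample points on the contour A*x^2 + B*y^2 + C = 0.

    Hypothetical helper: assumes an origin-centered, axis-aligned ellipse,
    which requires A > 0, B > 0 and C < 0.
    """
    if a_coef <= 0 or b_coef <= 0 or c_coef >= 0:
        raise ValueError("need A > 0, B > 0 and C < 0 for a real ellipse")
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    semi_x = math.sqrt(-c_coef / a_coef)  # semi-axis along x
    semi_y = math.sqrt(-c_coef / b_coef)  # semi-axis along y
    points = []
    for k in range(num_points):
        theta = 2.0 * math.pi * k / num_points
        points.append((semi_x * math.cos(theta), semi_y * math.sin(theta)))
    return points


# Example: x^2 + 4*y^2 - 4 = 0 has semi-axes 2 (along x) and 1 (along y).
contour = ellipse_contour_points(1.0, 4.0, -4.0)
```

In a tracker, these points would additionally be translated and scaled by the current object state; that step is omitted here because the state parameterization is not specified in this section.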
Note that the proposed BAPF algorithm can also be used in the case of more complex contours that are modeled using a B-spline representation. The proposed BAPF algorithm has been applied to the various tracking scenarios shown in Fig. 7 through Fig. 12. The tracking results are presented for three test video sequences that are captured under varying conditions defined by ambient scene illumination, surface reflection, object scale, facial pose (consisting of both in-plane and out-of-plane rotations), facial expression, occlusions and background composition. In two of the test videos (test videos 1 and 3) there is a single face in the scene. Test video 1 is used in experiments comprising different tracking scenarios, whereas test video 3 is used to compare the tracking accuracy of the proposed BAPF algorithm with that of the Condensation algorithm (i.e., the conventional particle filter). The third test video (test video 2), in which there are two faces in the scene, is used in experiments that cover different multi-face tracking scenarios. All tracking results are obtained using N = 1000 particles in the APF, BAPF and Condensation algorithms.
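The particle-based estimate of E[f(x_t)] used by the tracker is a weighted average over the N particle hypotheses. The following minimal sketch illustrates that computation for 2D contour-point positions. The function name, the tuple-based point type and the explicit weight normalization are assumptions made for illustration; the paper's own implementation is in C++ and is not reproduced here.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_estimate(points: Sequence[Point],
                      weights: Sequence[float]) -> Point:
    """Weighted mean of per-particle positions.

    points[i] is the position f(x_t^(i)) predicted by particle i and
    weights[i] is its (not necessarily normalized) importance weight.
    """
    if len(points) != len(weights) or not points:
        raise ValueError("points and weights must be non-empty and equally long")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Dividing by the total makes the weights sum to one.
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (x, y)


# Two particles; the second carries three times the weight of the first.
estimate = weighted_estimate([(0.0, 0.0), (2.0, 4.0)], [1.0, 3.0])
```

With N = 1000 particles the same function applies unchanged; only the lengths of the two sequences grow.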

Fig. 4. Face examples.
Fig. 5. Non-face examples.
Fig. 6. Results of frontal face detection and multi-view face detection.


Fig. 7. Tracking results with scale changes on test video 1. From left to right, the frame numbers are 98, 043 and 067.
Fig. 8. Tracking results with illumination changes on test video 1. From left to right, the frame numbers are 866, 954 and [...].

Description of BAPF experiments

In Fig. 7 through 12, a yellow ellipse around a face implies the absence of occlusion, whereas a red ellipse indicates the presence of occlusion. Fig. 7 shows snapshots of a single-face tracking experiment on test video 1 where the scale of the tracked face is observed to change significantly. The tracking results show that the proposed BAPF tracking algorithm can handle significant scale changes in the object appearance. Fig. 8 shows snapshots of a single-face tracking experiment on test video 1 under changing illumination conditions. It demonstrates that the proposed BAPF algorithm is robust to various changes in ambient illumination conditions, which can be attributed to the inherent robustness of the APF algorithm and the integration of the statistical learning procedure in AdaBoost with the APF-based tracker. Fig. 9 shows snapshots of a single-face tracking experiment on test video 1 under changes in viewpoint and facial pose arising from out-of-plane rotations of the face. It shows that the proposed BAPF algorithm can handle multi-view face detection and tracking. Fig. 10 shows snapshots of a single-face tracking experiment on test video 1 with in-plane rotations of the face. It shows that the proposed BAPF algorithm can handle object appearance changes due to in-plane object rotations. Fig. 11 shows snapshots of a single-face tracking experiment on test video 1 where the face is periodically occluded. It confirms that the proposed BAPF algorithm performs correctly in the presence of occlusions because of the robustness of the APF. Fig. 12 presents snapshots of a two-face tracking experiment on test video 2 which contains instances of inter-object (i.e., inter-face) occlusion.
It can be observed that the proposed BAPF algorithm can deal with and recover from instances of inter-object (inter-face) occlusion in the video stream. The tracking performance of the proposed BAPF algorithm is compared with that of the Condensation algorithm [33], which is a well-known implementation of a conventional particle filter-based tracker. Both algorithms employ N = 1000 particles for face tracking in test video 3. The experimental results show that the tracking accuracy of the proposed BAPF algorithm is superior to that of the Condensation algorithm. Thus, in the context of object tracking, the proposed BAPF algorithm can be deemed to yield better performance than the conventional particle filter. However, the better performance of the proposed BAPF algorithm comes at the price of computational efficiency. The BAPF algorithm consumes more computing resources than the relatively straightforward Condensation algorithm since it performs more computation per iteration in order to yield more accurate non-linear estimates of the object state(s). The tracking accuracy of the proposed BAPF algorithm and the Condensation algorithm can be visually compared using the snapshots shown in Figs. 13 and 14, respectively.

BAPF performance analysis

Using tracking accuracy and computation time as the performance metrics, we quantitatively analyze and compare the performance of the BAPF algorithm, the APF algorithm, and the Condensation algorithm. The tracking accuracy is defined in terms of the displacement error between the centroid of a ground truth face and the centroid of a tracked face in a video sequence. All three aforementioned tracking algorithms employ N = 1000 particles for face tracking and are tested on test video 3.
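The accuracy metric described above, the displacement between the ground-truth centroid and the tracked centroid, can be summarized per sequence by its mean and standard deviation. The sketch below is one plausible way to compute these summaries. The function names are hypothetical, the Euclidean distance is assumed as the displacement measure, and the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(truth: Sequence[Point],
                        tracked: Sequence[Point]) -> List[float]:
    """Per-frame Euclidean distance (pixels) between the two centroids."""
    if len(truth) != len(tracked):
        raise ValueError("both sequences must cover the same frames")
    return [math.hypot(tx - gx, ty - gy)
            for (gx, gy), (tx, ty) in zip(truth, tracked)]


def summarize(errors: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the errors."""
    if not errors:
        raise ValueError("at least one frame is required")
    return statistics.fmean(errors), statistics.pstdev(errors)


# Toy example with two frames (the coordinates are made up).
ground_truth = [(100.0, 80.0), (102.0, 81.0)]
tracker_out = [(103.0, 84.0), (102.0, 81.0)]
mean_err, std_err = summarize(displacement_errors(ground_truth, tracker_out))
```

Tracking speed in frames/s would be measured separately, for example by dividing the number of processed frames by the elapsed wall-clock time.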
In the following empirical comparison, the performance of the APF algorithm is compared to that of the Condensation algorithm, the performance of the BAPF algorithm is compared to that of the APF algorithm, the performance of the APF algorithm is analyzed for different values of the parameter L, the performance of the BAPF algorithm is analyzed for different values of the parameter F, and the performance of the BAPF algorithm is analyzed for different values of the parameter γ.

We first compare the performance of the APF algorithm to that of the Condensation algorithm. Both algorithms employ N = 1000 particles for face tracking in test video 3. In the APF algorithm, the upper bound on the number of iterations of the loop controlled by parameter l is L = 3. The experimental results, as shown in Fig. 15 and Table 1, demonstrate that the tracking accuracy of the APF algorithm is superior to that of the Condensation algorithm. It can be seen from Table 1 that the mean displacement error in the case of the APF algorithm is significantly lower than that in the case of the Condensation algorithm. However, the tracking speed (in frames/s) of the APF algorithm is observed to be slower than that of the Condensation algorithm. As mentioned previously, the slower tracking speed of the APF algorithm can be attributed to the fact that the APF algorithm performs more computation per iteration than does the Condensation algorithm in order to yield more accurate estimates of the proposal distribution and the posterior distribution. In summary, the APF algorithm can be seen to offer significantly more accurate tracking than the conventional particle filter, albeit at the cost of a moderate though tolerable loss in tracking speed.

Fig. 9. Tracking results in the presence of multiple facial views and out-of-plane face rotations on test video 1. The yellow ellipse implies that no occlusion has occurred, whereas the red ellipse implies that occlusion has occurred. From top left to bottom right, the frame numbers are 55, 59, 524, 530, 533, 544, 566, 573 and 585.
Fig. 10. Tracking results in the presence of in-plane rotations on test video 1. From left to right, the frame numbers are 04, 35 and 52.
Fig. 11. Tracking results in the presence of occlusions on test video 1. The yellow ellipse implies that no occlusion has occurred, whereas the red ellipse implies that occlusion has occurred. From top left to bottom right, the frame numbers are 356, 359, 362, 366, 382 and 397.
Fig. 12. Tracking results in the presence of two faces on test video 2. The yellow ellipse implies that no occlusion has occurred, whereas the red ellipse implies that occlusion has occurred. From top left to bottom right, the frame numbers are 2, 4, 25, 38, 78 and 38.
Fig. 13. Tracking results with the BAPF at six different times in test video 3. From top left to bottom right, the frame numbers are 2, 40, 6, 36, 58 and 80.
Fig. 14. Tracking results with the Condensation algorithm at the same times as in Fig. 13. From top left to bottom right, the frame numbers are 2, 40, 6, 36, 58 and 80.
Fig. 15. Tracking results for the APF and Condensation algorithms (plot "APF vs. Condensation": displacement in pixels versus frame number).
Table 1. Summary of tracking results for the APF and Condensation algorithms (rows: mean displacement error (pixels), standard deviation (pixels), speed (frames/s)).

Next, the performance of the BAPF algorithm is compared to that of the APF algorithm. Both algorithms employ N = 1000 particles for face tracking in test video 3. The upper bound on the number of iterations of the loop controlled by parameter l is L = 3 for both algorithms. In the BAPF algorithm, the weight assigned to the result of AdaBoost face detection is γ = 0.8. The number of successive frames F over which the output of the AdaBoost face detection algorithm is averaged before combining it with the APF output is F = 1. The experimental results, shown in Fig. 16 and Table 2, demonstrate that the tracking accuracy of the BAPF algorithm is higher than that of the APF algorithm. It can be seen from Table 2 that the mean displacement error in the case of the BAPF algorithm is significantly lower than that in the case of the APF algorithm. However, the tracking speed of the BAPF algorithm is slightly slower than that of the APF algorithm since the BAPF algorithm runs the AdaBoost face detection procedure on each input frame of the video sequence. Thus, the BAPF algorithm can be seen to significantly improve the tracking accuracy of the APF algorithm, albeit at the cost of a slight decrease in tracking speed.
The performance of the APF algorithm is analyzed using different values of the parameter L, which is the upper bound on the number of iterations of the loop controlled by the variable l in the APF algorithm (Fig. 2). The APF algorithm employs N = 1000 particles for face tracking in test video 3. The value of L is varied from L = 1 to L = 4 in the experiments. The experimental results, as shown in Fig. 17 and Table 3, demonstrate that the tracking accuracy of the APF algorithm improves as the value of L increases. It can be seen from Table 3 that the mean displacement error of the APF algorithm decreases with increasing values of L. Thus, the APF algorithm with a larger value of L provides more accurate tracking. However, the tracking speed of the APF algorithm also decreases with increasing values of L since the APF algorithm performs more computation per iteration to estimate the posterior distribution. From Fig. 17 and Table 3, it can be seen that as the value of L is increased beyond 3, the accuracy of estimation of the posterior distribution improves only slightly whereas the computation time increases significantly. In order to strike a reasonable balance between tracking accuracy and tracking speed for real-time applications, we choose L = 3 for all of our experiments with the APF algorithm and the BAPF algorithm.

The performance of the BAPF algorithm is also analyzed for different values of the parameter F. In Eq. (20), we combine the results of the APF algorithm and the AdaBoost algorithm to obtain the contour of the tracked face in the current frame. The AdaBoost face detection procedure is performed for each frame of the input video sequence. Using the AdaBoost face detection algorithm, we obtain the estimated position of a detected face by averaging the output of the AdaBoost face detection algorithm over F successive frames including the current frame (i.e., the current frame and the F − 1 previous frames).
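Averaging the detector output over the current frame and the preceding F − 1 frames amounts to a sliding-window mean of the detected face centers. A minimal sketch follows. The class name and interface are hypothetical, and the sketch assumes that exactly one detection is available in every frame, which the AdaBoost detector does not guarantee in practice.

```python
from collections import deque
from typing import Deque, Tuple

Point = Tuple[float, float]


class DetectionSmoother:
    """Sliding-window mean of detected face centers over the last F frames."""

    def __init__(self, num_frames: int) -> None:
        if num_frames < 1:
            raise ValueError("F must be at least 1")
        # A bounded deque silently discards the oldest entry when full.
        self._history: Deque[Point] = deque(maxlen=num_frames)

    def update(self, center: Point) -> Point:
        """Add the current detection and return the averaged position."""
        self._history.append(center)
        n = len(self._history)  # smaller than F during the first frames
        mean_x = sum(c[0] for c in self._history) / n
        mean_y = sum(c[1] for c in self._history) / n
        return (mean_x, mean_y)


smoother = DetectionSmoother(num_frames=3)
for detected_center in [(0.0, 0.0), (3.0, 3.0), (6.0, 6.0), (9.0, 9.0)]:
    smoothed_center = smoother.update(detected_center)
```

With `num_frames=1` the smoother simply returns the current detection, which corresponds to the setting F = 1.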
For example, for F = 3, we average the positions of the detected face in the current frame and the previous two frames (using the AdaBoost face detection algorithm) to obtain an estimated position of the face in the current frame, which is then combined with the output of the APF algorithm using Eq. (20). In this experiment, the weight assigned to the result of the AdaBoost face detection procedure in the BAPF algorithm is γ = 0.8. The value of F is varied within a set of values S_F = {1, 3, 5, 10} whereas the upper bound on the number of sampling iterations L is fixed at 3. The BAPF algorithm employs N = 1000 particles for face tracking in test video 3. The experimental results in Fig. 18 and Table 4 show that the tracking accuracy of the BAPF algorithm decreases with increasing values of F. As can be seen from Table 4, the mean displacement error of the BAPF algorithm is an increasing function of F. The tracking speed of the BAPF algorithm does not depend on the value of the parameter F. Hence, we choose F = 1 for the BAPF algorithm in our experiments, i.e., we consider the result of the AdaBoost face detection procedure only for the current frame before integrating the results of the AdaBoost face detection procedure and the APF algorithm.

Fig. 16. Tracking results of the BAPF algorithm and the APF algorithm (plot "BAPF vs. APF": displacement in pixels versus frame number).
Table 2. Summary of tracking results of the BAPF and the APF algorithms (columns: BAPF, APF; rows: mean displacement error (pixels), standard deviation (pixels), speed (frames/s)).
Table 3. Summary of tracking results for the APF algorithm for different values of L (columns: L = 4, L = 3, L = 2, L = 1; rows: mean displacement error (pixels), standard deviation (pixels), speed (frames/s)).
Fig. 17. Tracking results of the APF algorithm for different values of L (displacement in pixels versus frame number; curves for L = 4, 3, 2 and 1).
Fig. 18. Tracking results of the BAPF algorithm for different values of F (displacement in pixels versus frame number; curves for F = 10, 5, 3 and 1).
Table 4. Summary of tracking results of the BAPF for different values of F (columns: F = 10, F = 5, F = 3, F = 1; rows: mean displacement error (pixels), standard deviation (pixels), speed (frames/s)).

The performance of the BAPF algorithm is analyzed using different values of the parameter γ (denoted as gamma in Fig. 19), where γ is the weight assigned to the result of the AdaBoost face detection procedure in the BAPF algorithm as shown in Eq. (20). The weight γ is varied within the set of values S_γ = {0, 0.5, 0.8, 1.0}. When γ = 0, the BAPF algorithm is equivalent to the pure APF algorithm. By increasing the value of γ, we lay greater emphasis on the result of the AdaBoost face detection procedure. When γ = 1, the BAPF algorithm is equivalent to the pure AdaBoost face detection algorithm. The BAPF algorithm employs N = 1000 particles for face tracking in test video 3 with parameters F = 1 and L = 3. The experimental results in Fig. 19 and Table 5 show that the tracking accuracy of the BAPF algorithm varies significantly with different values of the parameter γ. It can be seen from Table 5 that the mean displacement error of the BAPF algorithm is lowest for γ = 0.8 and highest for γ = 1. Thus the experimental results show that the performance of the BAPF-based tracker that simply uses the AdaBoost face detection algorithm in every single frame is the worst. This can be attributed to the fact that the AdaBoost face detection algorithm by itself is prone to detecting false faces or missing the tracked faces in the video sequence.
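The role of γ can be illustrated with a literal transcription of the combination rule E_γ(f(x_t)) = (1 − γ) E(f(x_t)) + γ g d. In the sketch below all quantities are treated as scalars, which is a simplifying assumption: the text does not specify how the distance d is signed or applied per image coordinate. The function name and the example values are hypothetical.

```python
def fuse_estimates(apf_estimate: float, confidence: float,
                   distance: float, gamma: float) -> float:
    """Blend the APF estimate with the AdaBoost term using weight gamma.

    apf_estimate -- E(f(x_t)) from the particle filter
    confidence   -- detector confidence g for the detected face
    distance     -- d, distance between the detected face center and the
                    center of the sampled template contour
    gamma        -- weight in [0, 1] assigned to the detector term
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return (1.0 - gamma) * apf_estimate + gamma * confidence * distance


# Sweep over the weights examined in the experiments (inputs are made up).
for gamma in (0.0, 0.5, 0.8, 1.0):
    fused = fuse_estimates(apf_estimate=10.0, confidence=0.9,
                           distance=4.0, gamma=gamma)
    print(f"gamma={gamma:.1f} -> fused estimate {fused:.2f}")
```

At γ = 0 the function returns the APF estimate unchanged, and at γ = 1 only the detector term g·d remains, mirroring the two degenerate cases described in the text.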
In the case of the BAPF algorithm where 0 < γ < 1, the APF tracking algorithm provides regions of interest to the AdaBoost face detection algorithm, whereas the AdaBoost face detection algorithm, in turn, provides a means for initialization and verification of the output of the APF tracking algorithm via the combination function in Eq. (20). Consequently, on account of the synergetic interaction between the APF tracking algorithm and the AdaBoost face detection algorithm, the performance of the BAPF algorithm is superior to that of either the APF algorithm or the AdaBoost algorithm used in isolation. Table 5 shows that the tracking speed of the BAPF algorithm does not vary for different values of γ since the APF algorithm and the AdaBoost algorithm are both performed for each frame of the input video sequence. In our experiments, we choose γ = 0.8 for the BAPF algorithm.

Fig. 19. Tracking results of the BAPF algorithm with different values of the parameter γ (displacement in pixels versus frame number; curves for gamma = 0, 0.5, 0.8 and 1.0).
Table 5. Summary of tracking results of the BAPF with different values of the parameter γ (columns: γ = 0, γ = 0.5, γ = 0.8, γ = 1.0; rows: mean displacement error (pixels), standard deviation (pixels), speed (frames/s)).

Discussion of BAPF experimental results

The BAPF-based face tracker was observed to track the faces successfully throughout each of the test video sequences except when the face being tracked is completely occluded for a long time duration, as shown in Fig. 20. In this case, the occluded face cannot be distinguished from the foreground or the background using the AdaBoost face detector. In the case where the face is occluded for a short time duration, we assume that the occluded face is stationary over the time period of occlusion. In such an instance, the BAPF algorithm is able to recover from temporary tracking failure since the AdaBoost face detection algorithm is capable of reinitializing the APF tracking algorithm. However, the above assumption does not hold in the case of occlusion that lasts for a longer time period, since the person with the occluded face could well exit the scene during the time period of occlusion. When the faces of three people are occluded and are aligned with the optical axis of the camera as shown in Fig. 20b, it is hard to detect and track the faces of the two people that are farthest from the camera, which results in tracking failure as well. From both the cases shown in Fig. 20, it is clear that from the appearance of the face alone it is not possible to reliably deduce the true location of the occluded face. For more robust and accurate tracking, we need to exploit further sources of information such as the appearance of the body or the limbs to augment the BAPF algorithm in order to handle cases of complete occlusion over long time durations. However, the augmented appearance model for accurate tracking would increase the complexity of the dynamical model, which, in turn, may reduce the robustness of the tracker in other scenarios.

Fig. 20. (a) Tracking failure in case of occlusion for a long time duration. (b) Tracking failure in case of three people overlapping.

6. Conclusions

This paper proposes a novel algorithm for face detection and tracking based on a combination of a novel adaptive particle filtering algorithm and the AdaBoost face detection algorithm. The proposed algorithm provides a general framework for detection and tracking of faces in video sequences. The proposed framework is also applicable to the tracking of other types of objects, such as deformable and elastic objects, if appropriate contour models such as B-splines are used. The proposed Adaptive Particle Filter (APF) uses a new sampling technique to obtain accurate estimates of the proposal distribution and the posterior distribution for improving tracking accuracy in video sequences. The proposed scheme, termed the Boosted Adaptive Particle Filter (BAPF), combines the APF with the AdaBoost face detection algorithm. The AdaBoost face detection algorithm is used to detect faces in the input images, whereas the APF is used to track the faces in the video sequences. The proposed BAPF algorithm is employed for face detection, face verification, and face tracking in video sequences. It is experimentally shown that the performance of face detection and face tracking can be mutually improved via synergetic interaction in the proposed BAPF scheme, resulting in a tracker that is both accurate and computationally efficient.
