Efficient Region Tracking With Parametric Models of Geometry and Illumination

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 10, OCTOBER 1998

Gregory D. Hager, Member, IEEE, and Peter N. Belhumeur, Member, IEEE

Abstract. As an object moves through the field of view of a camera, the images of the object may change dramatically. This is not simply due to the translation of the object across the image plane. Rather, complications arise due to the fact that the object undergoes changes in pose relative to the viewing camera, changes in illumination relative to light sources, and may even become partially or fully occluded. In this paper, we develop an efficient, general framework for object tracking, one which addresses each of these complications. We first develop a computationally efficient method for handling the geometric distortions produced by changes in pose. We then combine geometry and illumination into an algorithm that tracks large image regions using no more computation than would be required to track with no accommodation for illumination changes. Finally, we augment these methods with techniques from robust statistics and treat occluded regions on the object as statistical outliers. Throughout, we present experimental results performed on live video sequences demonstrating the effectiveness and efficiency of our methods.

Index Terms. Visual tracking, real-time vision, illumination, motion estimation, robust statistics.

G.D. Hager is with the Departments of Computer Science and Electrical Engineering, Yale University, New Haven, CT 06520-8285. E-mail: hager@cs.yale.edu.
P.N. Belhumeur is with the Departments of Computer Science and Electrical Engineering, Yale University, P.O. Box 208267, New Haven, CT 06520-8267. E-mail: belhumeur@yale.edu.
Manuscript received 4 Mar. 1997; revised 16 July 1998. Recommended for acceptance by J. Connell. For information on obtaining reprints of this article, please send e-mail to: pami@computer.org, and reference IEEECS Log Number 17165.
0162-8828/98/$10.00 © 1998 IEEE

1 INTRODUCTION

VISUAL tracking has emerged as an important component of systems in several application areas including vision-based control [1], [2], [3], [4], human-computer interfaces [5], [6], [7], surveillance [8], [9], agricultural automation [10], [11], medical imaging [12], [13], and visual reconstruction [14], [15], [16]. The central challenge in visual tracking is to determine the image configuration of a target region (or features) of an object as it moves through a camera's field of view. This is done by solving what is known as the temporal correspondence problem: the problem of matching the target region in successive frames of a sequence of images taken at closely spaced time intervals. The correspondence problem for visual tracking has, of course, much in common with the correspondence problems which arise in stereopsis and motion estimation. It differs, however, in that the goal is not to determine the exact correspondence for every image location in a pair of images, but rather to determine, in a global sense, the movement of an entire target region over a long sequence of images.

What makes tracking difficult is the potential variability in the images of an object over time. This variability arises from three principal sources: variation in target pose or target deformations, variation in illumination, and partial or full occlusion of the target. When ignored, any one of these three sources of variability is enough to cause a tracking algorithm to lose its target. Thus, the two principal challenges for visual tracking are to develop accurate models of image variability and to design effective and computationally efficient tracking algorithms which use these models.

In this article, we develop a framework for modeling image variability due to motion and illumination. In the case of motion, all points in the target region are presumed to be part of the same object, allowing us the luxury, at least for most applications, of assuming that these points move coherently in space. This permits us to develop low-order parametric models for the image motion of points within a target region, models that can be used to predict the movement of the points and track the target through an image sequence. In the case of illumination, we exploit the observations of [17], [18], [19] to model image variation due to changing illumination by low-dimensional linear subspaces. We then show that these models can be incorporated into an efficient estimation algorithm which establishes temporal correspondence of the target region by simultaneously determining both motion and illumination parameters. Finally, in the case of partial occlusion, we apply results from robust statistics [20] to develop automatic methods of rejecting occluded pixels in a computationally efficient manner. The result is a family of region-tracking algorithms which can easily track large image regions (for example, the face of a user at a workstation) at a 30 Hz frame rate using no special hardware other than a standard digitizer.

The tracking algorithms developed in this paper are based on minimizing the sum-of-squared differences (SSD) between two regions. Although this idea has been successfully employed in many contexts including stereo matching [21], optical flow computation [22], and visual motion analysis [23], previous SSD-based tracking algorithms have suffered from a variety of limitations. Many algorithms have modeled the motion of the target region as pure translation in the image plane [16], [3]. This implicitly assumes that the underlying object is translating parallel to the image plane and is being viewed orthographically. While computationally efficient, over a long sequence these assumptions are often violated [23]. More elaborate tracking algorithms have included parameterized models for articulation [24], [25] or nonrigid deformations [26], [27] as well as linear image subspaces [28], [29]. However, the resulting algorithms rely on nonlinear optimization techniques which require from several seconds to several minutes per frame to compute. Furthermore, none explicitly address the problem of illumination changes. In fact, many algorithms avoid issues related to illumination by estimating and accumulating changes from frame to frame. As a result, any error in motion estimation between any two frames is subsequently propagated through the entire sequence.

Another well-established route toward efficient tracking is to detect and track only a sparse collection of features (or contours) [30], [11], [31], [32], [33]. As such methods use local detection of areas of high contrast change, they tend to be insensitive to global changes in the intensity and/or composition of the incident illumination. However, in many situations persistent, strong edges are sparsely distributed throughout the image of the target. This sparseness makes it difficult to establish edge correspondences without strong geometric constraints [33], [31] or an accurate predictive model [11], [3]. In contrast, region-based methods such as those developed in this article make direct and complete use of all available image intensity information, thereby eliminating the need to identify and model a special set of features to track. By incorporating illumination models and robust estimation methods into an efficient correspondence algorithm, the performance of our region tracking algorithms appears to be comparable to that achieved by edge-based methods, thereby making region-based methods an effective complement to local feature-based algorithms.
The remainder of this article is organized as follows. Section 2 establishes a framework for posing the problem of region tracking for parametric motion models and describes conditions under which an efficient tracking algorithm can be developed. Section 3 then shows how models of illumination can be incorporated with no loss of computational efficiency. Section 4 details modifications for handling partial target occlusion via robust estimation techniques. Section 5 presents experimental results from an implementation of the algorithms. Finally, Section 6 presents a short discussion of performance-improving extensions to our tracking algorithm.

2 TRACKING MOVING OBJECTS

In this section, we describe a framework for the efficient tracking of a target region through an image sequence. We first write down a general parametric model for the set of allowable image motions and deformations of the target region. We then pose the tracking problem as the problem of finding the best (in a least squares sense) set of parameter values describing the motions and deformations of the target through the sequence. Finally, we describe how the best set of parameters can be efficiently computed.

2.1 On Recovering Structured Motion

Let $I(\mathbf{x}, t)$ denote the brightness value at the location $\mathbf{x} = (x, y)^T$ in an image acquired at time $t$ and let $\nabla_{\mathbf{x}} I(\mathbf{x}, t)$ denote the spatial gradient at that location and time. The symbol $t_0$ denotes an identified initial time and we refer to the image at time $t_0$ as the reference image. Let the set $\mathcal{R} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ be a set of $N$ image locations which define a target region. We refer to the brightness values of the target region in the reference image as the reference template.

Over time, the relative motion between the target object and the camera causes the image of the target to shift and to deform. Let us model the image motion of the target region of the object by a parametric motion model $\mathbf{f}(\mathbf{x}; \boldsymbol{\mu})$ parameterized by $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)^T$, with $\mathbf{f}(\mathbf{x}; \mathbf{0}) = \mathbf{x}$ and $N > n$. We assume that $\mathbf{f}$ is differentiable in both $\boldsymbol{\mu}$ and $\mathbf{x}$. We call $\boldsymbol{\mu}$ the motion parameter vector.
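We consider recovering the motion parameter vector for each image in the tracking sequence as tracking the object. We write $\boldsymbol{\mu}^*(t)$ to denote the ground truth values of these parameters at time $t$, and $\boldsymbol{\mu}(t)$ to denote the corresponding estimate. The argument $t$ will be suppressed when it is obvious from its context. Suppose that a reference template is acquired at time $t_0$ and that initially $\boldsymbol{\mu}^*(t_0) = \boldsymbol{\mu}(t_0) = \mathbf{0}$. Let us assume for now that the only changes in subsequent images of the target are completely described by $\mathbf{f}$, i.e., there are no changes in the illumination of the target. It follows that for any time $t > t_0$, there is a parameter vector $\boldsymbol{\mu}^*(t)$ such that

$$I(\mathbf{x}, t_0) = I(\mathbf{f}(\mathbf{x}; \boldsymbol{\mu}^*(t)), t) \quad \text{for all } \mathbf{x} \in \mathcal{R}. \tag{1}$$

This is a generalization of the so-called image constancy assumption [34]. Thus, the motion parameter vector of the target region can be estimated at time $t$ by minimizing the following least squares objective function

$$O(\boldsymbol{\mu}) = \sum_{\mathbf{x} \in \mathcal{R}} \bigl( I(\mathbf{f}(\mathbf{x}; \boldsymbol{\mu}), t) - I(\mathbf{x}, t_0) \bigr)^2. \tag{2}$$

For later developments, it is convenient to rewrite this optimization problem in vector notation. To this end, let us consider images of the target region as vectors in an $N$-dimensional space. The image of the target region at time $t$, under the change of coordinates with parameters $\boldsymbol{\mu}$, is written as

$$\mathbf{I}(\boldsymbol{\mu}, t) = \begin{bmatrix} I(\mathbf{f}(\mathbf{x}_1; \boldsymbol{\mu}), t) \\ I(\mathbf{f}(\mathbf{x}_2; \boldsymbol{\mu}), t) \\ \vdots \\ I(\mathbf{f}(\mathbf{x}_N; \boldsymbol{\mu}), t) \end{bmatrix}. \tag{3}$$

This vector is subsequently referred to as the rectified image at time $t$ with parameters $\boldsymbol{\mu}$. We also make use of the partial derivatives of $\mathbf{I}$ with respect to the components of $\boldsymbol{\mu}$ and the time parameter $t$. These are written as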

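$$\mathbf{I}_{\mu_i}(\boldsymbol{\mu}, t) = \frac{\partial \mathbf{I}}{\partial \mu_i} = \begin{bmatrix} I_{\mu_i}(\mathbf{f}(\mathbf{x}_1; \boldsymbol{\mu}), t) \\ I_{\mu_i}(\mathbf{f}(\mathbf{x}_2; \boldsymbol{\mu}), t) \\ \vdots \\ I_{\mu_i}(\mathbf{f}(\mathbf{x}_N; \boldsymbol{\mu}), t) \end{bmatrix} \tag{4}$$

and

$$\mathbf{I}_t(\boldsymbol{\mu}, t) = \frac{\partial \mathbf{I}}{\partial t} = \begin{bmatrix} I_t(\mathbf{f}(\mathbf{x}_1; \boldsymbol{\mu}), t) \\ I_t(\mathbf{f}(\mathbf{x}_2; \boldsymbol{\mu}), t) \\ \vdots \\ I_t(\mathbf{f}(\mathbf{x}_N; \boldsymbol{\mu}), t) \end{bmatrix}, \tag{5}$$

where $1 \le i \le n$. Using this vector notation, the image constancy assumption (1) can be rewritten as

$$\mathbf{I}(\boldsymbol{\mu}^*(t), t) = \mathbf{I}(\mathbf{0}, t_0)$$

and (2) becomes

$$O(\boldsymbol{\mu}) = \| \mathbf{I}(\boldsymbol{\mu}, t) - \mathbf{I}(\mathbf{0}, t_0) \|^2. \tag{6}$$

In general, (6) is a nonconvex objective function. Thus, in the absence of a good starting point, this problem will usually require some type of costly global optimization procedure to solve [35].

In the case of visual tracking, the continuity of motion provides such a starting point. Suppose that, at some arbitrary time $t > t_0$, the geometry of the target region is described by $\boldsymbol{\mu}(t)$. We recast the tracking problem as one of determining a vector of offsets, $\delta\boldsymbol{\mu}$, such that $\boldsymbol{\mu}(t + \tau) = \boldsymbol{\mu}(t) + \delta\boldsymbol{\mu}$ from an image acquired at $t + \tau$. Incorporating this modification into (6), we redefine the objective function as a function on $\delta\boldsymbol{\mu}$

$$O(\delta\boldsymbol{\mu}) = \| \mathbf{I}(\boldsymbol{\mu}(t) + \delta\boldsymbol{\mu}, t + \tau) - \mathbf{I}(\mathbf{0}, t_0) \|^2. \tag{7}$$

If the magnitudes of the components of $\delta\boldsymbol{\mu}$ are small, then it is possible to apply continuous optimization procedures to a linearized version of the problem [29], [34], [21], [36], [23]. The linearization is carried out by expanding $\mathbf{I}(\boldsymbol{\mu} + \delta\boldsymbol{\mu}, t + \tau)$ in a Taylor series about $\boldsymbol{\mu}$ and $t$,

$$\mathbf{I}(\boldsymbol{\mu} + \delta\boldsymbol{\mu}, t + \tau) = \mathbf{I}(\boldsymbol{\mu}, t) + M(\boldsymbol{\mu}, t)\,\delta\boldsymbol{\mu} + \tau\,\mathbf{I}_t(\boldsymbol{\mu}, t) + \text{h.o.t.}, \tag{8}$$

where h.o.t. denotes higher order terms of the expansion, and $M$ is the Jacobian matrix of $\mathbf{I}$ with respect to $\boldsymbol{\mu}$, i.e., the $N \times n$ matrix of partial derivatives which can be written in column form as

$$M(\boldsymbol{\mu}, t) = \bigl[\, \mathbf{I}_{\mu_1}(\boldsymbol{\mu}, t) \mid \mathbf{I}_{\mu_2}(\boldsymbol{\mu}, t) \mid \cdots \mid \mathbf{I}_{\mu_n}(\boldsymbol{\mu}, t) \,\bigr]. \tag{9}$$

As the expression above indicates, the values of the partial derivatives are a function of the evaluation point $(\boldsymbol{\mu}, t)$. These arguments will be suppressed when obvious from their context. By substituting (8) into (7) and ignoring the higher-order terms, we have

$$O(\delta\boldsymbol{\mu}) \approx \| \mathbf{I}(\boldsymbol{\mu}, t) + M\,\delta\boldsymbol{\mu} + \tau\,\mathbf{I}_t - \mathbf{I}(\mathbf{0}, t_0) \|^2. \tag{10}$$

With the additional approximation $\tau\,\mathbf{I}_t(\boldsymbol{\mu}, t) \approx \mathbf{I}(\boldsymbol{\mu}, t + \tau) - \mathbf{I}(\boldsymbol{\mu}, t)$, (10) becomes

$$O(\delta\boldsymbol{\mu}) \approx \| M\,\delta\boldsymbol{\mu} + \mathbf{I}(\boldsymbol{\mu}, t + \tau) - \mathbf{I}(\mathbf{0}, t_0) \|^2. \tag{11}$$

Solving the set of equations $\nabla O = \mathbf{0}$ yields the solution

$$\delta\boldsymbol{\mu} = -(M^T M)^{-1} M^T \bigl[ \mathbf{I}(\boldsymbol{\mu}, t + \tau) - \mathbf{I}(\mathbf{0}, t_0) \bigr], \tag{12}$$

provided the matrix $M^T M$ evaluated at $(\boldsymbol{\mu}, t)$ has full rank. When this is not the case, we are faced with a generalization of the aperture problem, i.e., the target region does not have sufficient structure to determine all of the elements of $\boldsymbol{\mu}$ uniquely. Further discussion of this point can be found in Section 2.4.

In subsequent developments, it will be convenient to define the error vector

$$\mathbf{e}(t + \tau) = \mathbf{I}(\boldsymbol{\mu}(t), t + \tau) - \mathbf{I}(\mathbf{0}, t_0).$$

Incorporating this definition into (12), we see that the solution of (6) at time $t + \tau$ given a solution at time $t$ is

$$\boldsymbol{\mu}(t + \tau) = \boldsymbol{\mu}(t) - (M^T M)^{-1} M^T \mathbf{e}(t + \tau). \tag{13}$$

It is important to note at this point that the solution for $\delta\boldsymbol{\mu}$ is homogeneous in $\mathbf{e}$. Thus, while errors in calculating $M$ may affect stability or speed of convergence, they do not affect the stationary points of (13).

2.2 An Efficient Tracking Algorithm

From (13), we see that to track the target region through the image sequence, we must compute the Jacobian matrix $M(\boldsymbol{\mu}, t)$. Each element of this matrix is given by

$$m_{ij} = I_{\mu_j}(\mathbf{f}(\mathbf{x}_i; \boldsymbol{\mu}), t) = \nabla_{\mathbf{f}} I(\mathbf{f}(\mathbf{x}_i; \boldsymbol{\mu}), t)^T\, \mathbf{f}_{\mu_j}(\mathbf{x}_i; \boldsymbol{\mu}), \tag{14}$$

where $\nabla_{\mathbf{f}} I$ is the gradient of $I$ with respect to the components of the vector $\mathbf{f}$. Recall that the Jacobian matrix of the transformation $\mathbf{f}$ regarded as a function of $\boldsymbol{\mu}$ is the $2 \times n$ matrix

$$\mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}; \boldsymbol{\mu}) = \left[\, \frac{\partial \mathbf{f}(\mathbf{x}; \boldsymbol{\mu})}{\partial \mu_1} \;\middle|\; \frac{\partial \mathbf{f}(\mathbf{x}; \boldsymbol{\mu})}{\partial \mu_2} \;\middle|\; \cdots \;\middle|\; \frac{\partial \mathbf{f}(\mathbf{x}; \boldsymbol{\mu})}{\partial \mu_n} \,\right]. \tag{15}$$

By making use of (15), $M$ can be written compactly in row form as

$$M(\boldsymbol{\mu}, t) = \begin{bmatrix} \nabla_{\mathbf{f}} I(\mathbf{f}(\mathbf{x}_1; \boldsymbol{\mu}), t)^T\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}_1; \boldsymbol{\mu}) \\ \nabla_{\mathbf{f}} I(\mathbf{f}(\mathbf{x}_2; \boldsymbol{\mu}), t)^T\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}_2; \boldsymbol{\mu}) \\ \vdots \\ \nabla_{\mathbf{f}} I(\mathbf{f}(\mathbf{x}_N; \boldsymbol{\mu}), t)^T\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}_N; \boldsymbol{\mu}) \end{bmatrix}. \tag{16}$$

Because $M$ depends on time-varying quantities, it may appear that it must be completely recomputed at each time step, a computationally expensive procedure involving the calculation of the image gradient vector, the calculation of a $2 \times n$ Jacobian matrix, and $n$ $2 \times 1$ vector inner products for each of the $N$ pixels of the target region. However, we now show that it is possible to reduce this computation by both eliminating the need to recompute image gradients and by factoring $M$.

First, we eliminate the need to compute image gradients.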
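To do so, let us assume that our estimate is exact, i.e., $\boldsymbol{\mu}(t) = \boldsymbol{\mu}^*(t)$. By differentiating both sides of (1), we obtain

$$\nabla_{\mathbf{x}} I(\mathbf{x}, t_0) = \mathbf{f}_{\mathbf{x}}(\mathbf{x}; \boldsymbol{\mu})^T\, \nabla_{\mathbf{f}} I(\mathbf{f}(\mathbf{x}; \boldsymbol{\mu}), t), \tag{17}$$

where $\mathbf{f}_{\mathbf{x}}$ is the $2 \times 2$ Jacobian matrix of $\mathbf{f}$ treated as a function of $\mathbf{x} = (x, y)^T$,

$$\mathbf{f}_{\mathbf{x}}(\mathbf{x}; \boldsymbol{\mu}) = \left[\, \frac{\partial \mathbf{f}(\mathbf{x}; \boldsymbol{\mu})}{\partial x} \;\middle|\; \frac{\partial \mathbf{f}(\mathbf{x}; \boldsymbol{\mu})}{\partial y} \,\right]. \tag{18}$$

Combining (17) with (16), we see that $M$ can be written as

$$M(\boldsymbol{\mu}) = \begin{bmatrix} \nabla_{\mathbf{x}} I(\mathbf{x}_1, t_0)^T\, \mathbf{f}_{\mathbf{x}}(\mathbf{x}_1; \boldsymbol{\mu})^{-1}\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}_1; \boldsymbol{\mu}) \\ \nabla_{\mathbf{x}} I(\mathbf{x}_2, t_0)^T\, \mathbf{f}_{\mathbf{x}}(\mathbf{x}_2; \boldsymbol{\mu})^{-1}\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}_2; \boldsymbol{\mu}) \\ \vdots \\ \nabla_{\mathbf{x}} I(\mathbf{x}_N, t_0)^T\, \mathbf{f}_{\mathbf{x}}(\mathbf{x}_N; \boldsymbol{\mu})^{-1}\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}_N; \boldsymbol{\mu}) \end{bmatrix}. \tag{19}$$

It follows that for any choice of image deformations, the image spatial gradients need only be calculated once on the reference template. This is not surprising given that the target at time $t > t_0$ is only a geometric distortion of the target at time $t_0$, and so its image gradients are also a distortion of those at $t_0$. This transformation also allows us to drop the time argument of $M$ and regard it solely as a function of $\boldsymbol{\mu}$.

The remaining nonconstant factor in $M$ is a consequence of the fact that, in general, $\mathbf{f}_{\mathbf{x}}$ and $\mathbf{f}_{\boldsymbol{\mu}}$ involve components of $\boldsymbol{\mu}$ and, hence, implicitly vary with time. However, suppose that we choose $\mathbf{f}$ so that $\mathbf{f}_{\mathbf{x}}^{-1} \mathbf{f}_{\boldsymbol{\mu}}$ can be factored into the product of a $2 \times k$ matrix $\Gamma$ which depends only on image coordinates, and a $k \times n$ matrix $\Sigma$ which depends only on $\boldsymbol{\mu}$ as

$$\mathbf{f}_{\mathbf{x}}(\mathbf{x}; \boldsymbol{\mu})^{-1}\, \mathbf{f}_{\boldsymbol{\mu}}(\mathbf{x}; \boldsymbol{\mu}) = \Gamma(\mathbf{x})\, \Sigma(\boldsymbol{\mu}). \tag{20}$$

For example, as discussed in more detail below, one family of such factorizations results when $\mathbf{f}$ is a linear function of the image coordinate vector $\mathbf{x}$. Combining (19) with (20), we have

$$M(\boldsymbol{\mu}) = \begin{bmatrix} \nabla_{\mathbf{x}} I(\mathbf{x}_1, t_0)^T\, \Gamma(\mathbf{x}_1) \\ \nabla_{\mathbf{x}} I(\mathbf{x}_2, t_0)^T\, \Gamma(\mathbf{x}_2) \\ \vdots \\ \nabla_{\mathbf{x}} I(\mathbf{x}_N, t_0)^T\, \Gamma(\mathbf{x}_N) \end{bmatrix} \Sigma(\boldsymbol{\mu}) = M_0\, \Sigma(\boldsymbol{\mu}). \tag{21}$$

As a result, we have shown that $M$ can be written as a product of a constant $N \times k$ matrix $M_0$ and a time-varying $k \times n$ matrix $\Sigma$. We can now exploit this factoring to define an efficient tracking algorithm which operates as follows:

Offline:
• Define the target region.
• Acquire and store the reference template.
• Compute and store $M_0$ and $\Lambda = M_0^T M_0$.

Online:
• Use the most recent motion parameter estimate $\boldsymbol{\mu}(t)$ to rectify the target region in the current image.
• Compute $\mathbf{e}(t + \tau)$ by taking the difference between the rectified image and the reference template.
• Solve the system $\Sigma^T \Lambda \Sigma\, \delta\boldsymbol{\mu} = -\Sigma^T M_0^T\, \mathbf{e}(t + \tau)$ for $\delta\boldsymbol{\mu}$, where $\Sigma$ is evaluated at $\boldsymbol{\mu}(t)$.
• Compute $\boldsymbol{\mu}(t + \tau) = \boldsymbol{\mu}(t) + \delta\boldsymbol{\mu}$.

The online computation performed by this algorithm is quite small and consists of two $n \times k$ matrix multiplies, $k$ $N$-vector inner products, $n$ $k$-vector inner products, and an $n \times n$ linear system solution, where $k$ and $n$ are typically far smaller than $N$. We note that the computation can be further reduced if $\Sigma$ is invertible. In this case, the solution to the linear system can be expressed as

$$\delta\boldsymbol{\mu} = -\Sigma^{-1} (M_0^T M_0)^{-1} M_0^T\, \mathbf{e}(t + \tau), \tag{22}$$

where $\Sigma^{-1}$ is evaluated at $\boldsymbol{\mu}(t)$. The factor $(M_0^T M_0)^{-1} M_0^T$ can be computed offline, so the online computation is reduced to $n$ $N$-vector inner products and $n$ $n$-vector inner products.

2.3 Some Examples

2.3.1 Linear Models

Let us assume that $\mathbf{f}(\mathbf{x}; \boldsymbol{\mu})$ is linear in $\mathbf{x}$. Then we have

$$\mathbf{f}(\mathbf{x}; \boldsymbol{\mu}) = A(\boldsymbol{\mu})\,\mathbf{x} + \mathbf{u}(\boldsymbol{\mu}) \tag{23}$$

and, hence, $\mathbf{f}_{\mathbf{x}} = A$. It follows that $\mathbf{f}_{\mathbf{x}}^{-1} \mathbf{f}_{\boldsymbol{\mu}}$ is linear in the components of $\mathbf{x}$ and the factoring defined in (20) applies. We now present three examples illustrating these concepts.

2.3.1.1 Pure Translation

In the case of pure translation, the allowed image motions are parameterized by the vector $\mathbf{u} = (u, v)^T$ giving

$$\mathbf{f}(\mathbf{x}; \mathbf{u}) = \mathbf{x} + \mathbf{u}. \tag{24}$$

It follows immediately that $\mathbf{f}_{\mathbf{x}}$ and $\mathbf{f}_{\boldsymbol{\mu}}$ are both the $2 \times 2$ identity matrix and, therefore,

$$M_0 = \bigl[\, \mathbf{I}_x(t_0) \mid \mathbf{I}_y(t_0) \,\bigr], \tag{25}$$

and $\Sigma$ is the $2 \times 2$ identity matrix. The resulting linear system is nonsingular if the image gradients in the template region are not all collinear, in which case the solution at each time step is just

$$\delta\boldsymbol{\mu} = -(M_0^T M_0)^{-1} M_0^T\, \mathbf{e}(t + \tau). \tag{26}$$

Note that in this case, $\Lambda^{-1} M_0^T$, with $\Lambda = M_0^T M_0$, is a constant matrix which can be computed offline.

2.3.1.2 Translation, Rotation, and Scale

The motion of objects which are viewed under scaled orthography and which do not undergo out-of-plane rotation can be modeled in the image plane by a planar rigid motion consisting of a translation $\mathbf{u}$ and a rotation through an angle $\theta$, plus scaling by a factor $s$. We subsequently refer to this as the RM+S model. The change of coordinates is given by

$$\mathbf{f}(\mathbf{x}; \mathbf{u}, \theta, s) = s\,R(\theta)\,\mathbf{x} + \mathbf{u}, \tag{27}$$

where $R(\theta)$ is a $2 \times 2$ rotation matrix.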
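As a rough illustration of the offline and online stages of Section 2.2, the following Python sketch tracks a pure translation as in Section 2.3.1.1. It is a sketch under stated assumptions and not the authors' implementation: the brightness function `pattern`, its analytic gradient `pattern_grad`, the 20 × 20 target region, and the function `track_translation` are all hypothetical, and the "current image" is simulated by shifting the pattern rather than read from a camera. Repeated updates on one simulated frame stand in for successive frames.

```python
import numpy as np

def pattern(x, y):
    # Hypothetical smooth brightness pattern standing in for a real image.
    return np.sin(0.3 * x) + np.cos(0.2 * y) + 0.5 * np.sin(0.1 * x + 0.15 * y)

def pattern_grad(x, y):
    # Analytic spatial gradient (I_x, I_y) of the pattern above.
    shared = 0.5 * np.cos(0.1 * x + 0.15 * y)
    gx = 0.3 * np.cos(0.3 * x) + 0.1 * shared
    gy = -0.2 * np.sin(0.2 * y) + 0.15 * shared
    return gx, gy

def track_translation(u_true, n_iter=25):
    # Offline: define the target region, store the reference template,
    # M0 = [I_x | I_y] as in (25), and Lambda = M0^T M0.
    xs, ys = np.meshgrid(np.arange(20.0), np.arange(20.0))
    x, y = xs.ravel(), ys.ravel()
    template = pattern(x, y)
    M0 = np.stack(pattern_grad(x, y), axis=1)   # N x 2
    Lam = M0.T @ M0                             # 2 x 2

    # Simulated current image: constancy (1) with f(x; u) = x + u implies
    # I(x, t) = I(x - u_true, t0).
    def current_image(px, py):
        return pattern(px - u_true[0], py - u_true[1])

    u = np.zeros(2)                             # initial estimate mu(t0) = 0
    for _ in range(n_iter):
        # Online: rectify with the current estimate, form the error vector,
        # and solve the 2 x 2 system for the update, as in (26).
        rectified = current_image(x + u[0], y + u[1])
        e = rectified - template
        du = -np.linalg.solve(Lam, M0.T @ e)
        u = u + du
    return u

u_hat = track_translation(np.array([1.5, -0.8]))
print(u_hat)
```

With a real image sequence, the gradients would instead be estimated on the stored template (for example by finite differences) and the rectification step would require interpolation; both are omitted here to keep the sketch short.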
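After some minor algebraic manipulations, we obtain

$$\Gamma(\mathbf{x}) = \begin{bmatrix} 1 & 0 & -y & x \\ 0 & 1 & x & y \end{bmatrix} \tag{28}$$

and

$$\Sigma(\theta, s) = \begin{bmatrix} \frac{1}{s} R(-\theta) & \mathbf{0} & \mathbf{0} \\ \mathbf{0}^T & 1 & 0 \\ \mathbf{0}^T & 0 & \frac{1}{s} \end{bmatrix}. \tag{29}$$

From this $M_0$ can be computed using (21) and, since $\Sigma$ is invertible, the solution to the linear system becomes

$$\delta\boldsymbol{\mu} = -\Sigma^{-1} (M_0^T M_0)^{-1} M_0^T\, \mathbf{e}(t + \tau). \tag{30}$$

This result can be explained as follows. The matrix $M_0$ is the linearization of the system about $\theta = 0$ and $s = 1$. At time $t$, the target has orientation $\theta(t)$ and scale $s(t)$. Image rectification effectively rotates the target by $-\theta$ and scales by $1/s$, so the displacements of the target are computed in the original target coordinate system. $\Sigma^{-1}$ then applies a change of coordinates to rotate and scale the computed displacements from the original target coordinate system back to the actual target coordinates.

2.3.1.3 Affine Motion

The image distortions of planar objects viewed under orthographic projection are described by a six-parameter linear change of coordinates. Suppose that we define

$$\boldsymbol{\mu} = (u, v, a, b, c, d)^T, \qquad \mathbf{f}(\mathbf{x}; \boldsymbol{\mu}) = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \mathbf{x} + \begin{bmatrix} u \\ v \end{bmatrix} = A\mathbf{x} + \mathbf{u}. \tag{31}$$

After some minor algebraic manipulations, we obtain

$$\Gamma(\mathbf{x}) = \begin{bmatrix} 1 & 0 & x & 0 & y & 0 \\ 0 & 1 & 0 & x & 0 & y \end{bmatrix} \tag{32}$$

and

$$\Sigma(\boldsymbol{\mu}) = \begin{bmatrix} A^{-1} & 0 & 0 \\ 0 & A^{-1} & 0 \\ 0 & 0 & A^{-1} \end{bmatrix}. \tag{33}$$

Note that $\Sigma$ is once again invertible, which allows for additional computational savings as before.

2.3.2 Nonlinear Motion Models

The separability property needed for factoring does not hold for every type of nonlinear motion. However, consider a motion model of the form

$$\mathbf{f}(\mathbf{x}; u, v, a) = \mathbf{x} + \begin{bmatrix} u \\ v + \tfrac{1}{2} a x^2 \end{bmatrix}, \tag{34}$$

where $\mathbf{x} = (x, y)^T$. Intuitively, this model performs a quadratic distortion of the image according to the equation $y = \tfrac{1}{2} a x^2$. For example, a polynomial model of this form was used in [27] to model the motions of lips and eyebrows on a face. Again, after several algebraic steps we arrive at

$$\Gamma(\mathbf{x}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & x & 1 & \tfrac{1}{2} x^2 \end{bmatrix} \quad \text{and} \quad \Sigma(\boldsymbol{\mu}) = \begin{bmatrix} 1 & 0 & 0 \\ -a & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{35}$$

Note that this general result holds for any distortion which can be expressed exclusively as either $y = f(x)$ or $x = g(y)$. However, adding more freedom to the motion model, for example combining affine and polynomial distortion, often makes factoring impossible.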
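The factorization (20) can be checked numerically. The Python sketch below does this for the RM+S model of Section 2.3.1.2: it differentiates $\mathbf{f}$ by central differences and compares $\mathbf{f}_{\mathbf{x}}^{-1}\mathbf{f}_{\boldsymbol{\mu}}$ with the product $\Gamma(\mathbf{x})\Sigma(\boldsymbol{\mu})$, taking $\Gamma(\mathbf{x})$ with rows $(1, 0, -y, x)$ and $(0, 1, x, y)$ and $\Sigma$ block diagonal with blocks $R(-\theta)/s$, $1$, and $1/s$. The helper names (`rot`, `jacobian_fd`, `gamma`, `sigma`), the parameter ordering $(u, v, \theta, s)$, and the evaluation point are illustrative assumptions; numerical differentiation is used only for convenience.

```python
import numpy as np

def rot(theta):
    # 2 x 2 rotation matrix R(theta).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def f(x, mu):
    # RM+S model (27) with the assumed ordering mu = (u, v, theta, s).
    u, theta, s = mu[:2], mu[2], mu[3]
    return s * rot(theta) @ x + u

def jacobian_fd(fun, z, h=1e-6):
    # Central finite-difference Jacobian of fun at z (one column per input).
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = h
        cols.append((fun(z + dz) - fun(z - dz)) / (2.0 * h))
    return np.stack(cols, axis=1)

def gamma(x):
    # Depends only on the image coordinates.
    return np.array([[1.0, 0.0, -x[1], x[0]],
                     [0.0, 1.0,  x[0], x[1]]])

def sigma(mu):
    # Depends only on the motion parameters.
    theta, s = mu[2], mu[3]
    S = np.zeros((4, 4))
    S[:2, :2] = rot(-theta) / s
    S[2, 2] = 1.0
    S[3, 3] = 1.0 / s
    return S

x = np.array([3.0, -2.0])
mu = np.array([0.5, -1.0, 0.4, 1.3])

f_x = jacobian_fd(lambda p: f(p, mu), x)     # 2 x 2
f_mu = jacobian_fd(lambda m: f(x, m), mu)    # 2 x 4
lhs = np.linalg.solve(f_x, f_mu)             # f_x^{-1} f_mu
rhs = gamma(x) @ sigma(mu)
print(np.max(np.abs(lhs - rhs)))
```

The same check can be repeated for other motion models by swapping in the corresponding `f`, `gamma`, and `sigma`.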
One possibiliy in such cases is o use a cascaded model in which he image is firs recified using an affine disorion model, and hen he resuling recified image is furher recified for polynomial disorion. 2.4 On he Srucure of mage Change The Jacobian marix M plays a cenral role in he algorihms described above, so i is informaive o digress briefly on is srucure. f we consider he recified image as a coninuous ime-varying quaniy, hen is oal derivaive wih respec o ime is d dµ = M + or = Mµ + d d & &. (36) Noe ha his is simply a differenial form of (8). Due o he image consancy assumpion (1), i follows ha & = when m = m *. This is, of course, a parameerized version of Horn s opical flow consrain equaion [34]. n his form, i is clear ha he role of M is o relae variaions in moion parameers o variaions in brighness values in he arge region. The soluion given in (13) effecively reverses his relaionship and provides a mehod for inerpreing observed changes in brighness as moion. n his sense, we can hink of he algorihm as performing correlaion on emporal changes (as opposed o spaial srucure) o compue moion. To beer undersand he srucure of M, recall ha in column form, i can be wrien in erms of he parial derivaives of he recified image: M = [µ 1 µ 2 µ n ]. (37) Thus, he model saes ha he emporal variaion in image brighness in he arge region is a weighed combinaion of he vecors µ i. We can hink of each of hese columns (which have an enry for every pixel in he arge region) as a moion emplae which direcly represens he changes in brighness induced by he moion represened by he corresponding moion parameer. For example, in Fig. 1 we have shown hese emplaes for four canonical moions of an image of a human face. The developmen in his secion has assumed ha we sar wih a given parameric moion model from which hese emplaes are derived. Based on ha model, he srucure of each enry of M is given by (15) which saes ha m, = f µ x= x. 
(38) i j f j i The image gradien f defines, a each poin in he image, he direcion of sronges inensiy change. The vecor f µ j evaluaed a x i is he insananeous direcion and magniude of moion of ha image locaion capured by he parameer m j. The collecion of he laer for all pixels in he region represens he moion field defined by he moion parameer m j. Thus, he change in he brighness of he image locaion x i due o he moion parameer m j is he projecion

6 EEE TRANSACTONS ON PATTERN ANALYSS AND MACHNE NTELLGENCE, VOL. 2, NO. 1, OCTOBER 1998 (a) (b) (c) (d) Fig. 1. The moion emplaes of a human face for four canonical moions. (a) X ranslaion. (b) Y ranslaion. (c) Roaion. (d) Scale. of he image gradien ono he moion vecor. This also explains why each pixel in he image conribues only one consrain o he parameer compuaion. More imporanly, he mehods described above assume ha M M is full rank. Alhough, in general, his condiion depends on boh he srucure of he moion o be compued and he srucure of he image iself, he form of (38) provides some insigh ino he rank srucure of M. n paricular, i follows ha for M M o be rank deficien, here mus exis a γ n such ha 4 f µ x= x 9γ 1 i f =, i N. (39) Geomerically, his condiion corresponds o a moion γ such ha he displacemen of every pixel in he image is orhogonal o he local image gradien. 1 Thus, we can view he rank deficiency of M as a generalizaion of he wellknown aperure problem [34] in opical flow. Finally, (38) suggess how our echniques can be used o perform srucured moion esimaion wihou an explici parameric moion model. Firs, if he changes in images due o moion can be observed direcly (for example, by compuing he differences of images aken before and afer small reference moions are performed), hen hese can be used as he moion emplaes which comprise M. Second, if a one or more moion fields can be observed (for example, by racking a se of fiducial poins in a series of raining images), hen projecing each elemen of he moion field ono he corresponding image gradien yields moion emplaes for hose moion fields. The linear esimaion process described above can be used o inerpre ime-varying images in erms of hose basis moions. 3 LLUMNATON-NSENSTVE TRACKNG The sysems described above are inherenly sensiive o changes in illuminaion of he arge region. 
This is not surprising, as the incremental estimation step is effectively computing a structured optical flow, and optical flow methods are well known to be sensitive to illumination changes [34]. Thus, shadowing or shading changes of the target object over time lead to bias or, in the worst case, complete loss of the target.

1. Note that one possibility is that the gradient at a point is zero, in which case this is true of any motion.

Recently, it has been shown that a relatively small number of basis images can often be used to account for large changes in illumination [19], [18], [17], [37]. Briefly, the reason for this is as follows. Consider a point $p$ on a Lambertian surface and a collimated light source characterized by a vector $s \in \mathbb{R}^3$, such that the direction of $s$ gives the direction of the light rays and $\|s\|$ gives the intensity of the light source. The irradiance at the point $p$ is given by

$$E = a\, n \cdot s, \tag{40}$$

where $n$ is the unit inward normal vector to the surface at $p$ and $a$ is the nonnegative absorption coefficient (albedo) of the surface at the point $p$ [34]. This shows that the irradiance at the point $p$, and hence the gray level measured by a camera, is linear in $s \in \mathbb{R}^3$.

Therefore, in the absence of self-shadowing, given three images of a Lambertian surface from the same viewpoint taken under three linearly independent light source directions, one can reconstruct the image of the surface under a novel lighting direction by a linear combination of the three original images [37], [38]. In other words, if the surface is purely Lambertian and there is no shadowing, then all images under varying illumination lie within a 3D linear subspace of $\mathbb{R}^N$, the space of all possible images (where $N$ is the number of pixels in the images).

A complication comes when handling shadowing: all images are no longer guaranteed to lie in a linear subspace [19]. Nevertheless, as done in [17], we can still use a linear model as an approximation: a small set of basis images can account for much of the shading changes that occur on patches of nonspecular surfaces.
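The linearity argument behind (40) can be checked numerically. The following is a minimal sketch, not taken from the paper: the albedos, normals, and light vectors are made-up synthetic values, and shadowing is deliberately left out (no clamping of negative irradiance), which is exactly the assumption under which the 3D subspace result holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic patch: per-pixel albedo a and unit normal n.
num_pixels = 500
albedo = rng.uniform(0.2, 1.0, size=num_pixels)
normals = rng.normal(size=(num_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

def render(s):
    """Irradiance E = a * (n . s) at every pixel; shadows are deliberately not modeled."""
    return albedo * (normals @ s)

# Three linearly independent light sources (one per row) and their images.
lights = np.array([[0.3, 0.0, 1.0],
                   [-0.3, 0.2, 1.0],
                   [0.0, -0.3, 1.0]])
basis_images = np.column_stack([render(s) for s in lights])   # shape (N, 3)

# Express a novel light as a combination of the three known lights ...
novel_light = np.array([0.1, 0.15, 0.9])
coeffs = np.linalg.solve(lights.T, novel_light)
# ... and the same coefficients combine the basis images into the novel image.
predicted = basis_images @ coeffs
max_error = np.max(np.abs(predicted - render(novel_light)))
```

Here `max_error` is at the level of floating-point round-off. Once shadows are simulated (for example by clamping negative values to zero), the reconstruction is no longer exact, which is the complication noted above.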
Naturally, we need more than three images (we use between eight and 15) and a higher than three-dimensional linear subspace (we use four or five) if we hope to provide a good approximation to these effects.

Returning to the problem of region tracking, suppose now that we have a basis of image vectors $B_1, B_2, \ldots, B_m$, where the $i$th element of each of the basis vectors corresponds to the image location $x_i \in \mathcal{R}$. To accommodate changes in contrast, we choose the first basis vector to be the template image itself, i.e., $B_1 = I(0, t_0)$. To model brightness changes, we choose the second basis vector to be a column of ones, i.e., $B_2 = (1, 1, \ldots, 1)^T$ (footnote 2). Let us choose the remaining basis vectors by performing SVD (singular value decomposition) on a set of training images of the target, taken under varying illumination. We denote the collection of basis vectors by the matrix $B = [B_1 \,|\, B_2 \,|\, \cdots \,|\, B_m]$ and the corresponding parameters by the vector $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)^T$.

2. In practice, choosing a value close to the mean of the brightness of the image produces a better conditioned linear system.

HAGER AND BELHUMEUR: EFFICIENT REGION TRACKING WITH PARAMETRIC MODELS OF GEOMETRY AND ILLUMINATION

Combining motion with illumination, the image constancy constraint, (1), can now be rewritten as

$$I(\mu^*(t), t) = I(0, t_0) + B\lambda(t), \tag{41}$$

and (2) becomes

$$O(\mu, \lambda) = \| I(\mu, t) - I(0, t_0) - B\lambda \|^2. \tag{42}$$

In short, we now have expressions which simultaneously model both geometric and photometric image changes. By rewriting this optimization (absorbing the sign of the free parameter vector $\lambda$) as

$$O(\delta\mu, \lambda) = \| I(\mu(t) + \delta\mu, t + \tau) + B\lambda - I(0, t_0) \|^2, \tag{43}$$

and substituting in (8), we arrive at

$$O(\delta\mu, \lambda) = \| M\delta\mu + B\lambda + I(\mu(t), t + \tau) - I(0, t_0) \|^2. \tag{44}$$

Solving $\nabla O(\delta\mu, \lambda) = 0$ yields

$$\begin{bmatrix} \delta\mu \\ \lambda \end{bmatrix} = -\begin{bmatrix} M^T M & M^T B \\ B^T M & B^T B \end{bmatrix}^{-1} \begin{bmatrix} M^T \\ B^T \end{bmatrix} e(t + \tau), \tag{45}$$

where $e(t + \tau) = I(\mu(t), t + \tau) - I(0, t_0)$ as before.

We would now like to apply the factoring methods of the previous section to reduce the online computation needed for estimation. However, letting $B(x; \lambda)$ denote the value for pixel location $x$ of $B\lambda$, from (41) we have

$$\nabla_x I(x, t_0) = f_x(x; \mu)^T\, \nabla_f I(f(x; \mu), t) - \nabla_x B(x; \lambda). \tag{46}$$

If we follow the same steps as before in factoring $M$, we find that $\nabla_x B(x; \lambda)$ will appear in $M$, thus requiring recomputation of that term. In practice, we have found that, for the specific case of illumination, these terms are small and can be safely ignored without seriously affecting the stability of the resulting tracking system (footnote 3). Ignoring these terms, $M$ factors as before and, since $B$ is constant, the system can be efficiently computed.

3. Note that this may not hold true for other subspace decompositions such as those used by [29].

Further efficiencies can be realized if we are only interested in the motion parameters and hence only need to compute the portions of (45) pertaining to those parameters. We can compute an explicit form of this expression by first optimizing over $\lambda$ as a function of $\delta\mu$ in (44) and substituting the solution back into (44). Doing so, solving the resulting expression for $\delta\mu$, and writing $M$ in factored form, we arrive at

$$\delta\mu = -\Sigma^{-1} \left( M_0^T N M_0 \right)^{-1} M_0^T N\, e(t + \tau), \tag{47}$$

where

$$N = \left( 1 - B (B^T B)^{-1} B^T \right). \tag{48}$$

Since $N$ is constant, the computation needed to realize (47) depends only on the number of motion fields to be computed, not on the illumination model. As a result, we can compute motion parameters while accounting for variations in illumination using no more online computation than would be required to compute pure motion.

4 MAKING TRACKING RESISTANT TO OCCLUSION

As a system tracks objects over a large space, it is not uncommon that other objects intrude into the picture. For example, the system may be in the process of tracking a target region which is the side of a building when, due to observer motion, a parked car begins to occlude a portion of that region. Similarly, the target object may rotate, causing the tracked region to slide off and pick up a portion of the background. Such intrusions will bias the motion parameter estimates and, in the long term, can potentially cause mistracking. In this section, we describe how to avoid such problems. For the sake of simplicity, we develop a solution for the case where we are only recovering motion parameters; the modifications for combined motion and illumination models are straightforward.

A common approach to this problem is to assume that occlusions create large image differences which can be viewed as outliers by the estimation process [29]. The error metric is then modified to reduce sensitivity to outliers by solving a robust optimization problem of the form

$$O_R(\mu) = \sum_{x \in \mathcal{R}} \rho\big( I(f(x; \mu), t) - I(x, t_0) \big), \tag{49}$$

where $\rho$ is one of a variety of robust regression metrics [39].

It is well known that optimization of (49) is closely related to another approach to robust estimation: iteratively reweighted least squares (IRLS). We have chosen to implement the optimization using a somewhat unusual form of IRLS due to Dutter and Huber [20]. In order to formulate the algorithm, we introduce the notion of an inner iteration which is performed one or more times at each time step. We will use a superscript to denote these iterations, and refer to each time step in the estimation as an outer iteration.
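For the illumination-compensated update of Section 3, the construction of $B$ and the projection in (47) and (48) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it works with the unfactored Jacobian $M$ (for $M = M_0\Sigma$ with invertible $\Sigma$ the result coincides with (47)), the function names `illumination_basis` and `projected_motion_update` are hypothetical, and all data are random stand-ins.

```python
import numpy as np

def illumination_basis(template, training_images, num_vectors):
    """Columns: template, constant image, then leading left singular vectors."""
    U, _, _ = np.linalg.svd(training_images, full_matrices=False)
    ones = np.ones_like(template)
    return np.column_stack([template, ones, U[:, :num_vectors]])

def projected_motion_update(M, B, e):
    """Motion increment after eliminating lambda: -(M^T N M)^-1 M^T N e."""
    N = np.eye(B.shape[0]) - B @ np.linalg.solve(B.T @ B, B.T)
    return -np.linalg.solve(M.T @ N @ M, M.T @ N @ e)

# Synthetic check with made-up data: 200 pixels, 4 motion parameters.
rng = np.random.default_rng(1)
template = rng.uniform(0.0, 1.0, size=200)
training = rng.normal(size=(200, 10))          # stand-in for 10 training images
B = illumination_basis(template, training, num_vectors=3)
M = rng.normal(size=(200, 4))                  # stand-in for the image Jacobian
true_dmu = np.array([0.5, -0.2, 0.1, 0.3])
true_lam = rng.normal(size=B.shape[1])
e = -(M @ true_dmu + B @ true_lam)             # error explained exactly by motion + lighting

dmu = projected_motion_update(M, B, e)
# Reference: solve for motion and illumination jointly, as in (45).
joint = -np.linalg.lstsq(np.hstack([M, B]), e, rcond=None)[0]
```

Both routes give the same motion increment, and `dmu` recovers `true_dmu`. Because $N$ depends only on $B$, it can be formed once offline; the dense $N \times N$ matrix used here is for clarity only and would be avoided in a real-time setting.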
Let $\delta\mu^i$ denote the value of $\delta\mu$ computed by the $i$th inner iteration, with $\delta\mu^0 = 0$. Define the vector of residuals in the $i$th iteration, $r^i$, as

$$r^i = e(t + \tau) - M(\mu)\,\delta\mu^i. \tag{50}$$

We introduce a diagonal weighting matrix $W^i = W(r^i)$ which has entries

$$w^i_{k,k} = \eta(r^i_k) = \rho'(r^i_k) / r^i_k, \quad 1 \le k \le N. \tag{51}$$

The inner iteration cycle at time $t + \tau$ consists of performing an estimation step by solving the linear system

$$\Sigma^T \Lambda \Sigma\, \delta\mu^{i+1} = \Sigma^T M_0^T W^i r^i, \tag{52}$$

where $\Sigma$ is evaluated at $\mu(t)$ and $r^i$ and $W^i$ are given by (50) and (51), respectively. This process is repeated for $k$ iterations.

This form of IRLS is particularly efficient for our problem. It does not require recomputation of $\Lambda$ or $\Sigma$ and, since the weighting matrix is diagonal, does not add significantly to the overall computation time needed to solve the linear system. In addition, the error vector $e$ is fixed over all inner iterations, so these iterations do not involve acquiring or warping images. As discussed in [20], on linear problems this procedure is guaranteed to converge to a unique global minimum for a large variety of choices of $\rho$. In this article, $\rho$ is taken to be a so-called Winsorizing function [39], which is of the form

$$\rho(r) = \begin{cases} r^2 / 2 & \text{if } |r| \le \tau \\ c|r| - c^2 / 2 & \text{if } |r| > \tau \end{cases} \tag{53}$$

where $r$ is normalized to have unit variance. The parameter $\tau$ is a user-defined threshold which places a limit on the variations of the residuals before they are considered outliers (continuity of $\rho$ at $|r| = \tau$ requires $c = \tau$). This function has the advantage of guaranteeing global convergence of the IRLS method while being cheap to compute. The updating function for matrix entries is

$$\eta(r) = \begin{cases} 1 & \text{if } |r| \le \tau \\ c / |r| & \text{if } |r| > \tau \end{cases} \tag{54}$$

As stated, the weighting matrix is computed anew at each outer iteration, a process which can require several inner iterations. However, given that tracking is a continuous process, it is natural to start each outer iteration with a weighting matrix which is closely related to that computed at the end of the previous outer iteration. In doing so, two issues arise. First, the fact that the linear system we are solving is a local linearization of a nonlinear system means that, in cases when interframe motion is large, the effect of higher-order terms of the Taylor series expansion will cause areas of the image to masquerade as outliers. Second, if we assume that areas of the image with low weights correspond to intruders, it makes sense to add a buffer zone around those areas before the next outer iteration to proactively cancel the effects of intruder motion.

Both of these problems can be dealt with by noting that the diagonal elements of $W$ themselves form an image where dark areas (those locations with low value) are areas of occlusion or intrusion, while bright areas (those with value one) are the expected target. Let $Q(x)$ be the pixel values in the eight-neighborhood of the image coordinate $x$ plus the value at $x$ itself. We use two common morphological operators [40]:

$$\mathrm{erode}(x) = \max_{v \in Q(x)} v, \tag{55}$$

$$\mathrm{dilate}(x) = \min_{v \in Q(x)} v. \tag{56}$$

When applied to a weighting matrix image, erode has the effect of removing small areas of outlier pixels, while dilate increases their size.
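The weighting function (54), an inner-iteration loop that reuses a fixed normal matrix, and the two morphological operators (55) and (56) can be sketched together as below. This is a sketch under explicit assumptions rather than the paper's code: it uses the unfactored $M$, takes $c = \tau$, works in raw (unnormalized) residual units, and adopts the sign convention of (44), in which the linearized residual is $e + M\delta\mu$ and each step is an increment on the previous estimate (a modified-residual iteration in the style of Huber [39]). All names and data are hypothetical.

```python
import numpy as np

def eta(r, c):
    """Weights as in (54): 1 where |r| <= c, c/|r| otherwise (threshold and slope taken equal)."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, c))

def robust_inner_iterations(M, e, c, num_iters=30):
    """Reweighted steps that reuse one fixed normal matrix M^T M."""
    A = M.T @ M                        # formed once; never reweighted
    d = np.zeros(M.shape[1])
    for _ in range(num_iters):
        r = e + M @ d                  # residual of the linearized model (assumed sign)
        d = d - np.linalg.solve(A, M.T @ (eta(r, c) * r))
    return d, eta(e + M @ d, c)

def _neighborhoods(W):
    """Stack the nine shifted copies of W: each pixel's 8-neighborhood plus itself."""
    P = np.pad(W, 1, mode="edge")
    rows, cols = W.shape
    return np.stack([P[i:i + rows, j:j + cols] for i in range(3) for j in range(3)])

def erode(W):
    """(55): max over Q(x); small dark (outlier) specks disappear."""
    return _neighborhoods(W).max(axis=0)

def dilate(W):
    """(56): min over Q(x); dark (outlier) areas grow by one pixel."""
    return _neighborhoods(W).min(axis=0)

# Synthetic check with made-up data: 10% of the pixels are grossly corrupted.
rng = np.random.default_rng(2)
M = rng.normal(size=(200, 2))
true_d = np.array([0.3, -0.2])
e = -(M @ true_d) + 0.1 * rng.normal(size=200)
e[:20] += 50.0                                        # simulated occlusion
d_ls = -np.linalg.solve(M.T @ M, M.T @ e)             # ordinary least squares
d_rob, weights = robust_inner_iterations(M, e, c=0.3)
```

In this synthetic run the robust estimate `d_rob` lies much closer to `true_d` than the least-squares estimate `d_ls`, and the corrupted pixels end up with weights near zero. Reshaping `weights` to the region's height and width gives the weight image on which `erode` and `dilate` operate.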
Between frames of the sequence we propagate the weighting matrix forward after applying one step of erode to remove small areas of outliers followed by two or three steps of dilate to provide a buffer about previously detected intruders.

5 IMPLEMENTATION AND EXPERIMENTS

This section illustrates the performance of the tracking algorithm under a variety of circumstances, noting particularly the effects of image warping, illumination compensation, and outlier detection. All experiments were performed on live video sequences by an SGI Indy equipped with a 175 MHz R4400 SC processor and VINO image acquisition system.

5.1 Implementation

We have implemented the methods described above within the X Vision environment [41]. The implemented system incorporates all of the linear motion models described in Section 2, nonorthonormal illumination bases as described in Section 3, and outlier rejection using the algorithm described in Section 4.

The image warping required to support the algorithm is implemented by factoring linear transformations into a rotation matrix and a positive-definite upper-diagonal matrix. This factoring allows image warping to be implemented by first acquiring a rotated rectangular image region surrounding the target, and then scaling and shearing the region using bilinear interpolation. The resolution of the region is then reduced by averaging neighboring pixels. Spatial and temporal derivatives are computed by applying Prewitt operators on the reduced scale images. More details on this level of the implementation can be found in [41].

In these experiments, the algorithm is initialized by interactively selecting a region to track in a live video stream. The algorithm immediately acquires the selected region as the reference template and performs tracking on all subsequent images of the stream. When an illumination basis is used, care was taken to select the reference template to correspond to the basis, but no automatic registration was performed.
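The warp factoring mentioned in Section 5.1 can be illustrated with a QR decomposition. This is only one plausible way to obtain such a factorization and is an assumption on our part, not a description of the X Vision code: the sketch reads "upper-diagonal" as upper-triangular with a positive diagonal, assumes the linear part has positive determinant, and the function name `rotation_times_upper` is hypothetical.

```python
import numpy as np

def rotation_times_upper(A):
    """Factor A (det > 0) as R @ U: R a rotation, U upper triangular with positive diagonal."""
    Q, U = np.linalg.qr(A)
    signs = np.sign(np.diag(U))
    signs[signs == 0] = 1.0
    # Flip matching columns of Q and rows of U so that diag(U) becomes positive;
    # the product is unchanged because each sign is applied twice.
    return Q * signs, signs[:, None] * U

# Example linear part of an affine warp (made-up values, determinant 1.2).
A = np.array([[1.2, 0.4],
              [-0.3, 0.9]])
R, U = rotation_times_upper(A)
```

With this split, `R` describes the rotated rectangle to acquire from the image, and `U` carries the remaining scale (its diagonal) and shear (its off-diagonal entry) applied by interpolation.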
Timings of the algorithm (footnote 4) indicate that it can perform frame rate (30 Hz) tracking of image regions of up to 100 × 100 pixels undergoing affine distortions and illumination changes at one-half resolution. Similar performance has been achieved on a 120 MHz Pentium processor and 70 MHz Sun SparcStation. Higher performance is achieved for smaller regions, lower resolutions, or fewer parameters. For example, tracking the same size region while computing just translation at one-fourth resolution takes just four milliseconds per cycle.

4. Because of additional data collection overhead, the tracking performance in the experiments presented here is slower than the stated figures.

5.2 Planar Tracking

As a baseline, we first consider tracking a non-specular planar object: the cover of a book. Affine warping augmented with brightness and contrast compensation is a good approximation in this case (it is exact for an orthographic camera model and purely Lambertian surface). As a point of comparison, recent work by Black and Jepson [29] used the rigid motion plus scaling model for SSD-based region tracking. Their reduced model is more efficient and may be more stable since fewer parameters must be computed, but it does ignore the effects of changing aspect ratio and shear.

We tested both the rigid motion plus scale (RM+S) and full affine (FA) motion models on the same live video sequence of the book cover in motion. Fig. 2 shows the set of motion templates (the columns of the motion matrix) for an 81 × 72 region of a book cover tracked at one third resolution. The upper series of images shows several images of the object with the region tracked indicated with a black frame (the RM+S algorithm) and a white frame (the FA algorithm) (footnote 5). The middle row of images shows the output of the warping operator from the RM+S algorithm. If the computed parameters were error-free, these images would be identical. However, because of the inability to correct for aspect ratio and skew, the best fit leads to a skewed image. The bottom row shows the output of the warping operator for the FA algorithm. Here, we see that full affine warping is much better at accommodating the full range of image distortions. The graph at the bottom of the figure shows the least squares residual (in squared gray-values per pixel). Here, the difference between the two geometric models is clearly evident.

5. These annotations indicate the region acquired in the first stage of image warping and so do not indicate distortions due to image shear.

Fig. 2. Top, several images of a planar region and the corresponding warped image computed by a tracker computing position, orientation, and scale (RM+S), and one computing a full affine deformation (FA). The image at the left is the initial reference image. Bottom, the graph of the SSD residuals for both algorithms.

5.3 Human Face Tracking

There has been a great deal of recent interest in face tracking in the computer vision literature [27], [6], [42]. Although faces can produce images with significant variation due to illumination, empirical results suggest that a small number of basis images of a face gathered under different illuminations is sufficient to accurately account for most gross shading and illumination effects [17]. At the same time, the depth variations exhibited by facial features are small enough to be well approximated by an affine warping model. The following experiments demonstrate the ability

of our algorithm to track a face as it undergoes changes in pose and illumination, and under partial occlusion. Throughout, we assume the subject is roughly looking toward the camera, so we use the rigid motion plus scaling (RM+S) motion model. Fig. 1 shows the columns of the motion matrix for this model.

Fig. 3. Top row, excerpts from a sequence of tracked images of a face. The black frames represent the region tracked by an SSD algorithm using no illumination model (RM+S) and the white frames represent the regions tracked by an algorithm which includes an illumination model (RM+S+I). In some cases the estimates are so close that only one box is visible. Middle row, the region within the frame warped by the current motion estimate. Bottom, the residuals of the algorithms expressed in gray-scale units per pixel as a function of time.

5.3.1 Geometry

We first performed a test to determine the accuracy of the computed motion parameters for the face and to investigate the effect of the illumination basis on the sensitivity of those estimates. During this test, we simultaneously executed two tracking algorithms: one using the rigid motion plus scale model (RM+S) and one which additionally included an illumination model for the face (RM+S+I). The algorithms were executed on a sequence which did not contain large changes in the illumination of the target.

The top row of Fig. 3 shows images excerpted from the video sequence. In each image, the black frames denote the region selected as the best match by RM+S and the white frames correspond to the best match computed by RM+S+I. For this test, we would expect both algorithms to be quite accurate and to exhibit similar performance unless the illumination basis significantly affected the sensitivity of the computation. As is apparent from the figures, the computed motion parameters of both algorithms are extremely similar for the entire run, so close that in many cases one frame is obscured by the other.

In order to demonstrate the absolute accuracy of the tracking solution, below each live image in Fig.
3 we have included the corresponding rectified image computed by RM+S+I. The rectified image at time zero is the reference template. If the motion of the target fit the RM+S motion model, and the computed parameters were exact, then we would expect each subsequent rectified image to be identical to the reference template. Despite the fact that the face is nonplanar and we are using a reduced motion model, we see that the algorithm is quite effective at computing an accurate geometric match.

Finally, the graph in Fig. 3 shows the residuals of the linearized SSD computation at each time step. As is apparent from the figures, the residuals of both algorithms are also extremely similar for the entire run. From this experiment we conclude that, in the absence of illumination changes, the performance of both algorithms is quite similar: including illumination models does not appear to reduce accuracy.

Fig. 4. The illumination basis for the face (contrast and brightness components not shown).

Fig. 5. The first row of images shows excerpts of a tracking sequence. The second row is a magnified view of the region in the white frame. The third row contains the images in the second row after adjustment for illumination using the illumination basis shown in Fig. 4 (for the sake of comparison, we have not adjusted for brightness and contrast across the sequence).

5.3.2 Illumination

In a second set of experiments, we kept the face nearly motionless and varied the illumination. We used an illumination basis of four orthogonal image vectors. This basis was computed offline by acquiring ten images of the face under various lighting conditions. A singular value decomposition was applied to the resulting image vectors and the vectors with the maximum singular values were chosen to be included in the basis. The illumination basis is shown in Fig. 4.

Fig. 5 shows the effects of illumination compensation for the illumination situations depicted in the first row. As with warping, if the compensation were perfect, the images of the bottom row would appear to be identical up to brightness and contrast. In particular, note how the strong shading effects of frames 11 through 12 have been corrected by the illumination basis.

5.3.3 Combining Illumination and Geometry

Next, we present a set of experiments illustrating the interaction of geometry and illumination. In these experiments, we again executed two algorithms labeled RM+S and RM+S+I. As the algorithms were operating, a light was periodically switched on and off and the face moved slightly. The results appear in Fig. 6. In the residual graph, we see that the illumination basis clearly accounts for the shading on the face quite well, leading to a much lower fluctuation of the residuals. The sequence of images shows an excerpt near the middle of the sequence where the RM+S algorithm (which could not compensate for illumination changes) completely lost the target for several frames, only regaining it after the original lighting was restored. Since the target was effectively motionless during this period, this can be completely attributed to biases due to illumination effects. Similar sequences with larger target motions often cause the purely geometric algorithm to lose the target completely, as shown in Fig. 7.

Fig. 6. Top, an excerpt from a tracking sequence containing changes in both geometry and illumination. The black frame corresponds to the algorithm without illumination (RM+S) and the white frame corresponds to the algorithm with an illumination basis (RM+S+I). Note that the algorithm which does not use illumination completely loses the target until the original lighting is restored. Bottom, the residuals, in gray-scale units per pixel, of the two algorithms as a light is turned on and off.

Fig. 7. A run combining illumination and geometry in which the algorithm without illumination compensation (black frame) loses the target while the algorithm with illumination compensation (white frame) does not.

5.3.4 Tracking With Outliers

Finally, we illustrate the performance of the method when the image of the target becomes partially occluded. We again track a face. The motion and illumination basis are the same as before. In the weighting matrix calculations, the pixel gray-scale variance was set to five (about what is observed in our camera) and the outlier threshold was set to a conservative value of five variance units.

The sequence is an office sequence which includes several intrusions including the background, a piece of paper, a telephone, and a soda can. As before, we executed two versions of the tracker, the nonrobust algorithm from the previous experiment (RM+S+I) and a robust version (RM+S+I+O). Fig. 8 shows the results. The upper series of images shows the region acquired by both algorithms (the black frame corresponds to RM+S+I, the white to RM+S+I+O).
As is clear from the sequence, the nonrobust algorithm is disturbed significantly by the occlusion, whereas the robust algorithm is much more stable. In fact, a slight motion of the head while the soda can is in the image caused the nonrobust algorithm to mistrack completely. The middle series of images shows the output of the warping operation for the robust algorithm. The lower row of images depicts the weighting values attached to each pixel in the warped image. Dark areas correspond to outliers. Note that, although the occluded region is clearly identified by the algorithm, there are some small regions away from the occlusion which received a slightly reduced weight. This is due to the fact that the robust metric used introduces some small bias into the computed parameters. In areas where the spatial gradient is large (e.g., near the eyes and mouth), this introduces some false rejection of pixels. At the same time, intruding regions of a similar intensity as the face are not rejected, as seen in the lower left of the left-most column of images.

It is also important to note that the dynamical performance of the tracker is reduced by including outlier rejection. Large, fast motions tend to cause the algorithm to turn off areas of the image where there are large gradients, slowing convergence. At the same time, performing outlier rejection is more computationally intensive, as it requires explicit computation of both the motion and illumination parameters to calculate the residual values.

Fig. 8. The first row of images shows excerpts of a tracking sequence with occurrences of partial occlusion. The black frame corresponds to the algorithm without outlier rejection (RM+S+I) and the white frame corresponds to the algorithm with outlier rejection (RM+S+I+O). The second row is a magnified view of the region in the white frame. The third row contains the corresponding outlier images where darker areas mark outliers. The graph at the bottom compares the residual values for both algorithms.

6 DISCUSSION AND CONCLUSIONS

We have shown a straightforward and efficient solution to the problem of tracking regions undergoing geometric distortion, changing illumination, and partial occlusion. The method is simple and efficient, yet robust to reasonable deviations from underlying motion and illumination models. For example, although we have modeled the face as a rigid object undergoing limited motion in our experiments, the algorithm can still track the subject as he or she is

changing expression or, as illustrated in the previous section, performing out-of-plane rotations.

Although the focus in this article has been on parameter estimation techniques for tracking using image rectification, the same estimation methods can be used for directly controlling devices. For example, instead of computing a parameter estimate $\mu$, the incremental solutions $\delta\mu$ can be used to control the position and orientation of a camera so as to stabilize the target image by active motion. Hybrid combinations of camera control and image warping are also possible.

One possible objection to the methods is the requirement that the change from frame to frame is small (generally within a few pixels), limiting the speed at which objects can move. Luckily, there are several means for improving the dynamical performance of the algorithms. One possibility is to include a model for the motion of the underlying object and to incorporate prediction into the tracking algorithm. Likewise, if a model of the noise characteristics of images is available, the updating method can be modified to incorporate this model. In fact, the form of the solution makes it straightforward to incorporate the estimation algorithm into a Kalman filter or similar iterative estimation procedure.

Performance can also be improved by operating the tracking algorithm at multiple levels of resolution. One possibility, as is used by many authors [29], [23], is to perform a complete coarse-to-fine progression of estimation steps on each image in the sequence. Another possibility, which we have used successfully in prior work [41], is to dynamically adapt resolution based on the motion of the target. That is, when the target moves quickly, estimation is performed at a coarse resolution, and when it moves slowly, the algorithm changes to a higher resolution. The advantage of this approach is that it not only increases the range over which the linearized problem is valid, but it also reduces the computation time required on each image when motion is fast.
We are actively continuing to evaluate the performance of these methods, and to extend their theoretical underpinnings. One area that still needs attention is the problem of determining an illumination basis online, i.e., while tracking the object. Initial experiments in this direction have shown that online determination of the illumination basis can be achieved, although we have not included such results in this paper. As in [29], we are also exploring the use of basis images to handle changes of view or aspect not well addressed by warping.

We are also looking at the problem of extending the method to utilize shape information on the target when such information is available [43]. In particular, it is well known [44] that under orthographic projection, the image deformations of a surface due to motion can be described with a linear motion model. This suggests that our methods can be extended to handle such models. Furthermore, as with the illumination basis, it may be possible to estimate the deformation models online, thereby making it possible to efficiently track arbitrary objects under changes in illumination, pose, and partial occlusion.

ACKNOWLEDGMENTS

G.D. Hager was supported by ARO grant DAAG55-98-1-0168, U.S. National Science Foundation grant IRI-9420982, and by funds provided by Yale University. P.N. Belhumeur was supported by a Presidential Early Career Award, a U.S. National Science Foundation Career Award IRI-9703134, and ARO grant DAAH04-95-1-0494. The authors would like to thank David Mumford, Alan Yuille, David Kriegman, Peter Hallinan, and Jorgen Karlholm for contributing to the ideas in this paper.

REFERENCES

[1] P. Allen, B. Yoshimi, and A. Timcenko, "Hand-Eye Coordination for Robotics Tracking and Grasping," K. Hashimoto, ed., Visual Servoing, pp. 33-70. World Scientific, 1994.
[2] S. Hutchinson, G.D. Hager, and P. Corke, "A Tutorial Introduction to Visual Servo Control," IEEE Trans. Robot. Automat., vol. 12, no. 5, 1996.
[3] N. Papanikolopoulos, P. Khosla, and T. Kanade, "Visual Tracking of a Moving Target by a Camera Mounted on a Robot: A Combination of Control and Vision," IEEE Trans.
Robot. Automat., vol. 9, no. 1, pp. 14-35, 1993.
[4] E. Dickmanns and V. Graefe, "Dynamic Monocular Machine Vision," Machine Vision and Applications, vol. 1, pp. 223-240, 1988.
[5] A.F. Bobick and A.D. Wilson, "A State-Based Technique for the Summarization of Recognition of Gesture," Proc. Int'l Conf. Computer Vision, pp. 382-388, 1995.
[6] T. Darrell, B. Moghaddam, and A. Pentland, "Active Face Tracking and Pose Estimation in an Interactive Room," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 67-72, 1996.
[7] D. Gavrila and L. Davis, "Tracking Humans in Action: A 3D Model-Based Approach," Proc. Image Understanding Workshop, pp. 737-746, 1996.
[8] R. Howarth and H. Buxton, "Visual Surveillance Monitoring and Watching," Proc. European Conf. Computer Vision, vol. 2, pp. 321-334, 1996.
[9] T. Frank, M. Haag, H. Kollnig, and H.-H. Nagel, "Tracking of Occluded Vehicles in Traffic Scenes," Proc. European Conf. Computer Vision, vol. 2, pp. 485-494, 1996.
[10] R.C. Harrell, D.C. Slaughter, and P.D. Adsit, "A Fruit-Tracking System for Robotic Harvesting," Machine Vision and Applications, vol. 2, pp. 69-80, 1989.
[11] D. Reynard, A. Wildenberg, A. Blake, and J. Marchant, "Learning Dynamics of Complex Motions From Image Sequences," Proc. European Conf. Computer Vision, vol. 1, pp. 357-368, 1996.
[12] E. Bardinet, L. Cohen, and N. Ayache, "Tracking Medical 3D Data With a Deformable Parametric Model," Proc. European Conf. Computer Vision, vol. 1, pp. 317-328, 1996.
[13] P. Shi, G. Robinson, T. Constable, A. Sinusas, and J. Duncan, "A Model-Based Integrated Approach to Track Myocardial Deformation Using Displacement and Velocity Constraints," Proc. Int'l Conf. Computer Vision, pp. 687-692, 1995.
[14] E. Boyer, "Object Models From Contour Sequences," Proc. European Conf. Computer Vision, vol. 2, pp. 109-118, 1996.
[15] L. Shapiro, Affine Analysis of Image Sequences. Cambridge, England: Cambridge Univ. Press, 1995.
[16] C. Tomasi and T. Kanade, "Shape and Motion From Image Streams Under Orthography: A Factorization Method," Int'l J. Computer Vision, vol. 9, no. 2, pp. 137-154, 1992.
[17] P. Hallinan, "A Low-Dimensional Representation of Human Faces for Arbitrary Lighting Conditions," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 995-999, 1994.
[18] R. Epstein, P. Hallinan, and A. Yuille, "5 ± 2 Eigenimages Suffice: An Empirical Investigation of Low-Dimensional Lighting Models," Technical Report 94-11, Harvard Univ., 1994.
[19] P.N. Belhumeur and D.J. Kriegman, "What Is the Set of Images of an Object Under All Possible Lighting Conditions," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 270-277, 1996.
[20] R. Dutter and P. Huber, "Numerical Methods for the Nonlinear Robust Regression Problem," J. Statist. Comput. Simulation, vol. 13, no. 2, pp. 79-113, 1981.

[21] B.D. Lucas and T. Kanade, "An Iterative Image Registration Technique With an Application to Stereo Vision," Proc. Int'l Joint Conf. Artificial Intelligence, pp. 674-679, 1981.
[22] P. Anandan, "A Computational Framework and an Algorithm for the Measurement of Structure From Motion," Int'l J. Computer Vision, vol. 2, pp. 283-310, 1989.
[23] J. Shi and C. Tomasi, "Good Features to Track," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 593-600, IEEE CS Press, 1994.
[24] J. Rehg and T. Kanade, "Visual Tracking of High DOF Articulated Structures: An Application to Human Hand Tracking," Proc. European Conf. Computer Vision, vol. B, pp. 35-46, 1994.
[25] C. Bregler, "Learning and Recognizing Human Dynamics in Video Sequences," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 568-574, 1997.
[26] J. Rehg and A. Witkin, "Visual Tracking With Deformation Models," Proc. IEEE Int'l Conf. Robotics and Automation, pp. 844-850, 1991.
[27] M. Black and Y. Yacoob, "Tracking and Recognizing Rigid and Non-Rigid Facial Motions Using Local Parametric Models of Image Motion," Proc. Int'l Conf. Computer Vision, pp. 374-381, 1995.
[28] H. Murase and S. Nayar, "Visual Learning and Recognition of 3-D Objects From Appearance," Int'l J. Computer Vision, vol. 14, pp. 5-24, 1995.
[29] M. Black and A. Jepson, "Eigentracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation," Proc. European Conf. Computer Vision, pp. 329-342, 1996.
[30] M. Isard and A. Blake, "Contour Tracking by Stochastic Propagation of Conditional Density," European Conf. on Computer Vision, vol. 1, pp. 343-356, 1996.
[31] D.G. Lowe, "Robust Model-Based Motion Tracking Through the Integration of Search and Estimation," Int'l J. Computer Vision, vol. 8, no. 2, pp. 113-122, 1992.
[32] D.B. Gennery, "Visual Tracking of Known Three-Dimensional Objects," Int'l J. Computer Vision, vol. 7, no. 3, pp. 243-270, 1992.
[33] A. Blake, R. Curwen, and A. Zisserman, "A Framework for Spatio-Temporal Control in the Tracking of Visual Contour," Int'l J. Computer Vision, vol. 11, no. 2, pp. 127-145, 1993.
[34] B. Horn, Computer Vision. Cambridge, Mass.: MIT Press, 1986.
[35] M. Betke and N. Makris, "Fast Object Recognition in Noisy Images Using Simulated Annealing," Proc. Int'l Conf. Computer Vision, pp. 523-530, 1995.
[36] R. Szeliski, "Image Mosaicing for Tele-Reality Applications," Proc. Workshop Applications of Computer Vision, pp. 44-53, 1994.
[37] A. Shashua, "Geometry and Photometry in 3D Visual Recognition," PhD thesis, Massachusetts Institute of Technology, 1992.
[38] R. Woodham, "Analysing Images of Curved Surfaces," Artificial Intelligence, vol. 17, pp. 117-140, 1981.
[39] P. Huber, Robust Statistics. New York: John Wiley & Sons, 1981.
[40] R.M. Haralick and L.G. Shapiro, Computer and Robot Vision. Reading, Mass.: Addison Wesley, 1993.
[41] G.D. Hager and K. Toyama, "XVision: A Portable Substrate for Real-Time Vision Applications," Computer Vision and Image Understanding, vol. 69, no. 1, pp. 23-37, 1998.
[42] S. McKenna, S. Gong, and J. Collins, "Face Tracking and Pose Representation," British Machine Vision Conf., 1996.
[43] P. Belhumeur and G.D. Hager, "Tracking in 3D: Image Variability Decomposition for Recovering Object Pose and Illumination," Proc. Int'l Conf. Pattern Analysis Applications, 1998. Also available as Yale Computer Science #1141.
[44] S. Ullman and R. Basri, "Recognition by a Linear Combination of Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp. 992-1,006, 1991.

Gregory D. Hager received his BA degree in computer science and mathematics from Luther College in 1983 and his MS and PhD in computer science from the University of Pennsylvania in 1985 and 1988, respectively. From 1988 to 1990, he was a Fulbright junior research fellow at the University of Karlsruhe and the Fraunhofer Institute IITB in Karlsruhe, Germany. Upon returning to the United States, he joined the Computer Science Department at Yale University, where he is currently an associate professor.
He is a member of IEEE and AAAI and is currently cochairman of the Robotics and Automation Society Technical Committee on Computer and Robot Vision. His research interests include visual tracking, hand-eye coordination, sensor data fusion, and sensor planning. A book on his dissertation work entitled Task-Directed Sensor Fusion and Planning has been published by Kluwer Academic Publishers, Inc.

Peter N. Belhumeur graduated in 1985 from Brown University with Highest Honors, receiving an ScB degree in computer and information engineering. He received an SM in 1991 and a PhD in 1993 from Harvard University, where he studied under a Harvard Fellowship. In 1993, he was a postdoctoral fellow at the University of Cambridge's Sir Isaac Newton Institute for Mathematical Sciences. He was appointed assistant professor of electrical engineering at Yale University in 1994 and was given a joint appointment with the Department of Computer Science in 1998. He is a recipient of the Presidential Early Career Award for Scientists and Engineers, the U.S. National Science Foundation Career Award, and a Yale University Junior Faculty Fellowship for Natural Sciences. He won the Best Paper Award at the 1996 IEEE Conference on Computer Vision and Pattern Recognition and an Outstanding Paper Award at the 1998 European Conference on Computer Vision. He is a Member of the IEEE.