Object Tracking with Dynamic Feature Graph

Similar documents
ROBUST FACE DETECTION UNDER CHALLENGES OF ROTATION, POSE AND OCCLUSION

Gesture Recognition using a Probabilistic Framework for Pose Matching

Introduction to SLAM Part II. Paul Robertson

CS485/685 Computer Vision Spring 2012 Dr. George Bebis Programming Assignment 2 Due Date: 3/27/2012

CS 223B Computer Vision Problem Set 3

Classification Method for Colored Natural Textures Using Gabor Filtering

Distribution Fields with Adaptive Kernels for Large Displacement Image Alignment

CS 231A Computer Vision (Fall 2012) Problem Set 3

Face detection and recognition. Detection Recognition Sally

Binary Morphological Model in Refining Local Fitting Active Contour in Segmenting Weak/Missing Edges

The SIFT (Scale Invariant Feature

Automatic Video Segmentation for Czech TV Broadcast Transcription

Fast Image Matching Using Multi-level Texture Descriptor

Evaluation and comparison of interest points/regions

Selection of Scale-Invariant Parts for Object Class Recognition

Local Image Features

Scale-invariant shape features for recognition of object categories

Visual Saliency Based Object Tracking

Shape Descriptor using Polar Plot for Shape Recognition.

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

Object detection using non-redundant local Binary Patterns

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE

Detecting Object Instances Without Discriminative Features

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

Local Feature Detectors

Computer Vision for HCI. Topics of This Lecture

A SAR IMAGE REGISTRATION METHOD BASED ON SIFT ALGORITHM

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

MAPI Computer Vision. Multiple View Geometry

A Feature Point Matching Based Approach for Video Objects Segmentation

Designing Applications that See Lecture 7: Object Recognition

Research on Image Splicing Based on Weighted POISSON Fusion

Face Detection for Automatic Avatar Creation by using Deformable Template and GA

TA Section 7 Problem Set 3. SIFT (Lowe 2004) Shape Context (Belongie et al. 2002) Voxel Coloring (Seitz and Dyer 1999)

Implementing the Scale Invariant Feature Transform(SIFT) Method

EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT. Recap and Outline

Object Tracking with an Adaptive Color-Based Particle Filter

Face detection in a video sequence - a temporal approach

A Novel Algorithm for Color Image matching using Wavelet-SIFT

Det De e t cting abnormal event n s Jaechul Kim

Improving Alignment of Faces for Recognition

Neighbourhood Operations

Particle Filtering. CS6240 Multimedia Analysis. Leow Wee Kheng. Department of Computer Science School of Computing National University of Singapore

Color Image Segmentation

CRF Based Point Cloud Segmentation Jonathan Nation

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

Motion illusion, rotating snakes

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882

Deformation Invariant Image Matching

A Review of Evaluation of Optimal Binarization Technique for Character Segmentation in Historical Manuscripts

Switching Hypothesized Measurements: A Dynamic Model with Applications to Occlusion Adaptive Joint Tracking

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Road Sign Analysis Using Multisensory Data

Study and Analysis of Edge Detection and Implementation of Fuzzy Set. Theory Based Edge Detection Technique in Digital Images

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Digital Image Processing. Image Enhancement in the Spatial Domain (Chapter 4)

Face Recognition using Hough Peaks extracted from the significant blocks of the Gradient Image

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing

Stable Interest Points for Improved Image Retrieval and Matching

Conditional Random Fields for Object Recognition

Local Features and Bag of Words Models

Local Image Features

Relaxing the 3L algorithm for an accurate implicit polynomial fitting

SUPER RESOLUTION IMAGE BY EDGE-CONSTRAINED CURVE FITTING IN THE THRESHOLD DECOMPOSITION DOMAIN

Topological Mapping. Discrete Bayes Filter

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION

Lecture 10 Detectors and descriptors

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement

Video Google: A Text Retrieval Approach to Object Matching in Videos

ECSE-626 Project: An Adaptive Color-Based Particle Filter

A New Feature Local Binary Patterns (FLBP) Method

Local features: detection and description. Local invariant features

Part-based and local feature models for generic object recognition

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing

3-D TERRAIN RECONSTRUCTION WITH AERIAL PHOTOGRAPHY

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza

SHIP RECOGNITION USING OPTICAL IMAGERY FOR HARBOR SURVEILLANCE

Computer Vision I - Filtering and Feature detection

Object Recognition with Invariant Features

SCALE INVARIANT FEATURE TRANSFORM (SIFT)

Local Features Tutorial: Nov. 8, 04

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Motion Estimation and Optical Flow Tracking

Detecting and Segmenting Humans in Crowded Scenes

TEXTURE CLASSIFICATION METHODS: A REVIEW

MULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION

Image Features: Detection, Description, and Matching and their Applications

3D Hand and Fingers Reconstruction from Monocular View

Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models

A Comparison and Matching Point Extraction of SIFT and ISIFT

The most cited papers in Computer Vision

Local features and image matching. Prof. Xin Yang HUST

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58

Online Spatial-temporal Data Fusion for Robust Adaptive Tracking

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Generic Face Alignment Using an Improved Active Shape Model

ELL 788 Computational Perception & Cognition July November 2015

Supervised texture detection in images

A Research on Moving Human Body Detection Based on the Depth Images of Kinect

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints

Transcription:

Feng Tang and Hai Tao
Department of Computer Engineering, University of California, Santa Cruz
{tang,tao}@soe.ucsc.edu

Abstract

Two major problems for model-based object tracking are: 1) how to represent an object so that it can be effectively discriminated from the background and other objects; 2) how to dynamically update the model to accommodate changes in the object's appearance and structure. Traditional appearance-based representations (such as a color histogram) fail when the object has rich texture. In this paper, we present a novel feature-based object representation, the attributed relational graph (ARG), for reliable object tracking. The object is modeled with invariant features (SIFT), and their relationships are encoded in the form of an ARG that can effectively distinguish the object from the background and from other objects. We adopt a competitive and efficient dynamic model to adaptively update the object model by adding new stable features and deleting inactive ones. A relaxation labeling method is used to match the model graph with the observation to obtain the best object position. Experiments show that our method maintains reliable tracks even under dramatic appearance changes, occlusions, etc.

1 Introduction

Model-based object tracking usually relies on a fixed object representation, and this so-called template is matched with the observation to obtain the object position in visual tracking. Two major problems for model-based object tracking are 1) how to represent an object so that it can be effectively discriminated from the background and other objects; 2) how to dynamically update the model to accommodate changes in the object's appearance and structure caused by changes in the surrounding conditions against which the tracked object is observed. Much work has been done on the first problem, but the second has received relatively little attention.

In terms of object representation, appearance-based and shape-based approaches have been widely used. Appearance-based models usually describe the object with a color histogram; shape models usually describe the object using 2D edges or 3D geometric models. In [5], the color histogram of a target region is used as the appearance representation. In [8], a contour-based face-and-shoulder model is used for modeling people. In [18], a Gaussian distribution of the object pixel values is used. These algorithms build appearance models from examples in training datasets and then use the model to track objects. Appearance-based representations have been very successful, but most of them are holistic representations that completely lose the object structure information and are sensitive to illumination changes and background clutter. For shape-based representation, shape contexts [1] characterize the local image by histogramming its edges into radial-polar bins. [3] takes a similar approach by characterizing each edge pixel with the local distribution of edges in its image neighborhood. Similarly, [15] measures the distributions of orientation information in a neighborhood chosen to be invariant to scale changes. Shape-based representations can effectively capture the object structure, but they do not consider the object appearance, which is also unsatisfying.

Recent developments in local invariant feature representations, and their success in pattern recognition, inspire us to exploit the advantages of local features in tracking. Such local features are usually designed to be invariant to appearance changes as well as geometric transformations (scale, rotation, etc.).
[10] describes features using scale-invariant, salient, convex local arrangements of contours in the image. The SIFT descriptor [13] uses a histogram of gradients and is scale and rotation invariant. [9] describes local features using the spin image, which is generated from a histogram of the relative positions of neighborhood points with respect to the interest point. [11] detects salient regions based on image complexity and uses local entropy as the feature descriptor. [12] proposes an affine-invariant descriptor for texture recognition. [16] gives a detailed performance evaluation of different descriptors. One disadvantage of local feature based representations is their lack of global structure information, yet such structural information is crucial for distinguishing the object from the background and from other objects.

In this paper, to overcome the disadvantages of traditional appearance-based representations, we propose a new Attributed Relational Graph (ARG) based object representation that incorporates both distinctive features, the Scale Invariant Feature Transform (SIFT), and their relations for tracking. Locally, features describe the object details; globally, the relations between features encode the object structure. This elastic representation has the flexibility to handle objects with coherent motion and a certain amount of variation caused by illumination changes and occlusion, while still discriminating between structurally different object types.

Compared to object representation, relatively little work has been done on model/template updating to accommodate object changes. In [4], a ranking system is proposed to select the best feature space among 49 candidates acquired by linear combinations of three likelihood images in R, G, B space. It effectively uses the immediately previous frame as the training frame for feature selection and the current frame as the test frame for foreground/background classification; the results are very impressive under various background changes. The difference between their work and ours is that their method is a discriminative tracker, which cannot easily handle occlusions, while our tracker is a generative one, which can. Kernel-based tracking [5] proposes a feature value weighting scheme based on background color information and focuses on salient target parts in the representation of the target and candidate models. In [14], the template is first updated with the image at the current template location; to eliminate drift, this updated template is then aligned with the first template to give the final update. These methods are all global updates, which are not suitable for our local feature based representation.

In a dynamic image sequence, local features may be unstable over time: a feature may show up in one frame, disappear in the next, and then show up again. In this case, using only the previous frame for the update is not enough. We model the object's dynamic behavior using a high-order HMM that can effectively and adaptively update the model to handle new features appearing and old features dying off. Our object model dynamically takes in good (stable) features and eliminates unstable features based on their past performance in a probabilistic sense. For example, if a feature has never been matched in the previous several consecutive frames, it is unlikely to be an active feature, so we delete it from the model. If a new feature persistently appears in several consecutive frames, it is very likely to be a genuinely new feature, so we add it to the model. This adaptively evolving scheme keeps the tracker settled on the optimal state even when the object undergoes significant changes.

The major contributions of this paper are: 1) We propose a novel, compact and robust feature-based object representation in the form of an attributed relational graph (ARG) that is invariant to various appearance changes as well as non-rigid motion. 2) We propose a competitive object dynamic model that captures both the short-term dynamics (inter-frame graph changes) and the long-term dynamics (dynamic model update - stable feature birth and death) and can effectively predict the evolving behavior of the object. 3) We present a general MAP framework for feature-based tracking.

The rest of the paper is organized as follows. Section 2 describes our feature-based object representation. Section 3 gives the main framework. Section 4 presents the likelihood computation and object dynamics based on this representation. Experimental results are demonstrated in Section 5. Section 6 concludes the paper.

2 Attributed relational graph based feature representation

We use SIFT features [13] as our object primitives and organize them into an attributed relational graph (ARG). The ARG encodes the relations between different features and thus provides a more reliable representation for matching and tracking.

2.1 Feature Representation

Each feature (or keypoint) is described by its location and scale, the orientation of the main intensity gradient within a neighborhood, and the gradient histogram in the local region. The features are located using the DoG detector, as developed by Lowe [13].
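The paper does not include code; the following minimal Python sketch, assuming OpenCV's SIFT implementation as a stand-in for Lowe's DoG detector, shows how the per-keypoint attributes used in this section (position, scale, dominant orientation, and the 128-bin gradient histogram) might be collected.

```python
# Illustrative sketch only (not the authors' code): extracting SIFT keypoints
# with OpenCV's DoG-based detector and packaging the attributes described in
# Section 2.1: position p, scale s, orientation o, and 128-bin histogram hist.
import cv2
import numpy as np

def extract_sift_features(gray_image):
    """Return a list of feature dicts {p, s, o, hist} from a grayscale image."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    if descriptors is None:
        return []
    features = []
    for kp, desc in zip(keypoints, descriptors):
        features.append({
            "p": np.array(kp.pt),              # 2-D image position
            "s": kp.size,                      # feature scale
            "o": np.deg2rad(kp.angle),         # dominant gradient orientation
            "hist": desc.astype(np.float32),   # 128-D gradient histogram
        })
    return features
```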
The SIFT descriptor is distinctive and has a high probability of finding the correct match under a certain extent of illumination change and affine transformation. All these characteristics, especially the robustness to illumination changes, match the requirements of the object tracking task. Moreover, SIFT features can be computed very efficiently, which makes fast tracking possible. A SIFT feature is represented as $f = \{p, s, o, hist\}$, where $p$ is the 2-D position of the feature in image coordinates, $s$ is the feature scale, $o$ is the dominant gradient direction, and $hist$ is the gradient orientation distribution quantized into 128 bins.

2.2 Organizing features with an ARG

Graph representations are widely used for representing structural information in different domains such as psychosociology, image interpretation and pattern recognition [7], [2]. The attributed relational graph is a more powerful approach to image representation than purely feature-based representations. The semantic information about the relations among image features is captured by the attributes associated with the relations between the corresponding features. This approach has been shown to provide a compact, concise and powerful representation capable of capturing the rich information content of images [7]. Feature relations encode the geometric structure, which is a global description of how the object looks. For simplicity and for computational reasons, we define the relations to be binary. Our definition of neighborhood is adaptive: if two features are close to each other relative to their scales, a relation is formed. This idea is used in many image analysis algorithms, for example [17]. More specifically, we define the relations as follows. Suppose $f$ and $f'$ are two features; their relation attributes are defined as $r(f, f') = \{r_d, r_s, r_o\}$, where $r_d = \|p - p'\|$ is the Euclidean distance between the two features, $r_s = |s - s'| / (s + s')$ is the scale difference, and $r_o = o - o'$ is the orientation difference. The attributed relational graph $G = \{f, r\}$ is then defined with node set $f = \{f_1, f_2, \dots, f_n\}$, as described in Section 2.1, and edge set $r = \{r_1, r_2, \dots, r_m\}$. Since we use relative attributes as the relations, our graph representation is rotation and translation invariant, which enhances the flexibility and robustness of our tracker.
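As a continuation of the sketch above, the following snippet builds the ARG edges from the extracted features. The paper forms an edge when two features are close relative to their scales but does not give the exact rule, so the `neighbor_factor` test below is an assumption for illustration only.

```python
# Illustrative sketch only: building the attributed relational graph from the
# feature dicts produced by extract_sift_features(). The scale-adaptive
# neighborhood test (factor `neighbor_factor`) is an assumed form, not the
# authors' tuned rule.
import numpy as np

def build_arg(features, neighbor_factor=5.0):
    """Return a list of edges (i, j, {rd, rs, ro}) over the feature list."""
    edges = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            fi, fj = features[i], features[j]
            rd = np.linalg.norm(fi["p"] - fj["p"])            # Euclidean distance
            if rd < neighbor_factor * (fi["s"] + fj["s"]):    # scale-adaptive neighborhood
                rs = abs(fi["s"] - fj["s"]) / (fi["s"] + fj["s"])  # scale difference
                ro = fi["o"] - fj["o"]                             # orientation difference
                edges.append((i, j, {"rd": rd, "rs": rs, "ro": ro}))
    return edges
```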

Figure 1. Object representation. The left image is the scene with the SIFT features overlaid as blue arrows; the middle figure is a scaled-up version of the red rectangle area; the right image is the relational graph, with the green lines showing the edges.

3 The MAP framework

Our feature-based tracker is formulated in a Maximum A Posteriori (MAP) framework using a Hidden Markov Model. The hidden state includes the information about the features and relations (in the form of an ARG). The object state at time $t$ is identified as $X_t = G_t = f^t$; we ignore the edge set here because the edges (relations) are completely dependent on the nodes: once the nodes are determined, the edges are uniquely determined. Specifically, our feature tracker is formulated as finding the maximum a posteriori probability

$$\hat{X}_t = \arg\max_{X_t} P(X_t \mid I_t, \dots, I_0, X_{t-1}, \dots, X_0).$$

We assume an $m$-th order Hidden Markov Model:

$$\arg\max_{X_t} P(X_t \mid I_t, \dots, I_0, X_{t-1}, \dots, X_0) = \arg\max_{X_t} P(X_t \mid I_t, I_{t-1}, X_{t-1}, \dots, X_{t-m}) = \arg\max_{X_t} P(I_t \mid X_t, I_{t-1}, X_{t-1}) \, P(X_t \mid I_{t-1}, X_{t-1}, \dots, X_{t-m}), \quad (1)$$

where $P(I_t \mid X_t, I_{t-1}, X_{t-1})$ is the likelihood and $P(X_t \mid I_{t-1}, X_{t-1}, \dots, X_{t-m})$ is the object dynamics. The reason we use a high-order hidden Markov model is that we need to incorporate the object state history to predict the graph dynamics. Details of the model are given in Section 4. Using our graph-based representation, the likelihood computation can be formulated as a graph matching algorithm; it is discussed in the following section.

4 Object dynamics and likelihood

Unlike other trackers in which the observations are image intensities, in our feature-based representation the observation in each frame is the set of extracted SIFT features, which are used to generate the relational graph. The object dynamics express how the object (in the form of a graph) evolves over time. To allow further flexibility, we model the short-term graph dynamics as well as the long-term dynamics. The short-term dynamics reflect the changes in feature attributes with respect to the previous frame. The long-term dynamics add new stable features and delete inactive features in a probabilistic sense based on the past performance of the features. The likelihood measures how well the model fits the observation. We compute the likelihood using an efficient relaxation labeling based graph matching algorithm.

4.1 Object Dynamics

Since both the graph nodes and edges evolve over time, their dynamics should be modeled simultaneously. However, the edges are fully dependent on the nodes; once the nodes are determined, the edges are uniquely determined. So we only need to model the node dynamics, and the relation dynamics are automatically incorporated. Based on this, we formulate the object dynamics as

$$P(X_t \mid I_{t-1}, X_{t-1}, \dots, X_{t-m}) = P(f^t \mid f^{t-1}, \dots, f^{t-m}). \quad (2)$$

Suppose $f^t = \{f^t_1, f^t_2, \dots, f^t_{N_t}\}$; the state transition can then be factorized as

$$P(f^t \mid f^{t-1}, \dots, f^{t-m}) = \prod_{i=1}^{N_t} P(f^t_i \mid f^{t-1}_i, \dots, f^{t-m}_i), \quad (3)$$

$$P(f^t_i \mid f^{t-1}_i, \dots, f^{t-m}_i) = \begin{cases}
P_s(f^t_i \mid f^{t-1}_i) & \text{when } f^t_i \text{ is matched to } f^{t-1}_i, \\
P_{new}(f^t_i \mid f^{t-1}_i, \dots, f^{t-m}_i) & \text{when } f^t_i \text{ is a newly added feature}, \\
P_{delete}(f^t_i \mid f^{t-1}_i, \dots, f^{t-m}_i) & \text{when } f^t_i = \emptyset.
\end{cases} \quad (4)$$

$P_s$ is the short-term dynamics, which models the feature attribute prior.
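A minimal sketch of how the factorization in Eqs. (2)-(4) might be organized in code follows; the term functions `p_s`, `p_new` and `p_delete` are placeholders for the quantities defined in Sections 4.1.1-4.1.3, and the data layout is an assumption, not the authors' implementation.

```python
# Illustrative sketch only: the factorized object dynamics of Eqs. (2)-(4).
import math

def log_dynamics(curr, prev, history, p_s, p_new, p_delete):
    """Sum of per-feature log transition probabilities, Eqs. (3)-(4).

    curr, prev: dicts feature_id -> attributes at time t and t-1.
    history:    dict feature_id -> match/miss record over the last m frames.
    """
    log_p = 0.0
    for fid in set(curr) | set(prev):
        if fid in curr and fid in prev:
            log_p += math.log(p_s(curr[fid], prev[fid]))        # matched: attribute prior
        elif fid in curr:                                       # appears only now
            log_p += math.log(p_new(history.get(fid, [])))      # stable-feature birth
        else:                                                   # present before, gone now
            log_p += math.log(p_delete(history.get(fid, [])))   # stable-feature death
    return log_p
```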
To better model the graph dynamics, we introduce stable features. These are persistent features that have been observed in several consecutive frames. Stable features can adaptively model the evolving behavior of the object. $P_{new}$ and $P_{delete}$ model the birth and death of stable features. They are conditioned on a feature's matching performance in previous frames, because we want only stable features to remain in the stable feature set, while unstable features with a poor history should have a very low probability of occurring in the next frame. We discuss the three terms in detail below.

4.1.1 Feature attribute prior

We consider the dynamic prior for feature position, scale and orientation to be Gaussian with the previous state attribute as the mean. The attribute prior can be formulated as

$$P_s(f^t_i \mid f^{t-1}_i) = P(p_t \mid p_{t-1}) \, P(s_t \mid s_{t-1}) \, P(o_t \mid o_{t-1}) \, P(h_t \mid h_{t-1}) \propto \exp\{-(p_t - p_{t-1})^T \Sigma_p^{-1} (p_t - p_{t-1})\} \, \exp\{-((s_t - s_{t-1})/\sigma_s)^2\} \, \exp\{-((o_t - o_{t-1})/\sigma_o)^2\} \, \exp\{-(d(h_t, h_{t-1})/\sigma_h)^2\}, \quad (5)$$

where $p$, $s$, $o$, $h$ are the feature position, scale, orientation and histogram respectively, as defined in Section 2.1, $d(h_t, h_{t-1})$ is the distance between histograms, and $\Sigma_p$, $\sigma_s$, $\sigma_o$, $\sigma_h$ are the corresponding covariance matrix and variances.
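A minimal sketch of the Gaussian attribute prior of Eq. (5) follows; the variance values, the isotropic position covariance, and the Euclidean histogram distance are assumptions (the paper does not specify which histogram distance it uses).

```python
# Illustrative sketch only: unnormalized P_s(f_t | f_prev) of Eq. (5) for a
# matched feature, using the {p, s, o, hist} dicts from the earlier sketches.
import numpy as np

def attribute_prior(f_t, f_prev,
                    sigma_p=4.0, sigma_s=0.2, sigma_o=0.3, sigma_h=0.5):
    dp = f_t["p"] - f_prev["p"]
    term_p = np.exp(-np.dot(dp, dp) / sigma_p**2)                  # position (isotropic Sigma_p)
    term_s = np.exp(-((f_t["s"] - f_prev["s"]) / sigma_s) ** 2)    # scale
    term_o = np.exp(-((f_t["o"] - f_prev["o"]) / sigma_o) ** 2)    # orientation
    d_hist = np.linalg.norm(f_t["hist"] - f_prev["hist"])          # histogram distance (assumed Euclidean)
    term_h = np.exp(-(d_hist / sigma_h) ** 2)
    return term_p * term_s * term_o * term_h
```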

4.1.2 Adding new stable features

Since new features may show up in each frame due to appearance or pose changes, it is not wise to treat newcomers equally with features that have already proved to be stable. We need a scheme that temporarily holds new features and, after some period of competition, adds the genuinely stable ones to the model. So we maintain a candidate feature set to hold potential stable features. Each feature has an associated status vector identifying in which frames it appears and in which it does not. We assume that the probability of a candidate feature being added to the model follows a Binomial distribution $B(m, p_b)$, where $m$ is the order of the HMM, i.e., the time window we use for evaluating feature stability, and $p_b$ is the probability that the feature is observed. For efficiency, in each frame only those features whose probability of being promoted is higher than a threshold $\tau_a$ can be added to the model. That is, $P_{new} \sim B(n_o; m, p_b)$ truncated to $[\tau_a, 1]$, where $n_o$ is the number of times the feature has been observed in the previous $m$ frames. The candidate feature set is updated in each frame after the state update.

4.1.3 Deleting inactive features

Due to object pose or illumination changes, features that were stable in previous frames may become inactive. A scheme for deleting such inactive features is incorporated into our framework. For features already in the model, we also maintain a history of their performance over the previous $m$ frames; if a feature has not been matched for a long time, we consider it inactive and delete it from the model. Similar to the scheme for incorporating new features, we model feature deletion as a Binomial distribution: $P_{delete} \sim B(n_m; m, p_m)$ truncated to $[\tau_d, 1]$, where $\tau_d$ is the minimum probability at which a feature can be deleted from the model.

Figure 2. Dynamic model update. Graph nodes are shown as colored circles, and the edges are the lines connecting them. The first row is the observation from the image sequence; the second row is the object model at each frame. At frame n, the model contains 5 features (1-5). In the 5 frames (n to n+4), the blue feature (5) is observed only twice, so it is selected as an unstable feature and deleted from the model (second row) at frame n+4. At frame n+3, the new feature 6 has been persistently observed for 3 consecutive frames, so it is a stable feature and is promoted from the candidate set. (For simplicity, the candidate set is not drawn in the figure.)

The model feature addition and deletion is a survival-of-the-fittest scheme. Figure 2 is a simplified illustration of this model update process. It always keeps good features (in terms of stability) in the model. This competitive strategy greatly enhances the flexibility of the model, making it more suitable for tracking under appearance and pose changes.
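A minimal sketch of this add/delete bookkeeping follows. For simplicity it approximates the truncated Binomial criteria of Sections 4.1.2-4.1.3 by thresholding the empirical observation/miss frequency over the last m frames; the exact probabilistic test in the paper may differ, and the data layout is assumed.

```python
# Illustrative sketch only: promoting candidate features and deleting inactive
# model features from their match history (Sections 4.1.2-4.1.3).

def update_model(model, candidates, history, m=5, tau_a=0.8, tau_d=0.8):
    """model, candidates: dicts feature_id -> attributes.
    history[fid]: list of 0/1 match flags, most recent last (one per frame)."""
    # Promote candidates observed in at least a fraction tau_a of the last m frames.
    for fid in list(candidates):
        freq = sum(history[fid][-m:]) / float(m)
        if freq >= tau_a:
            model[fid] = candidates.pop(fid)
    # Delete model features missed in at least a fraction tau_d of the last m frames.
    for fid in list(model):
        miss_freq = 1.0 - sum(history[fid][-m:]) / float(m)
        if miss_freq >= tau_d:
            del model[fid]
    return model, candidates
```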
4.2 Likelihood

The likelihood function $P(I_t \mid X_t)$ describes how well the underlying state $X_t$ fits the observation $I_t$. We propose a feature-based likelihood function that is computed as the matching score between the object representation and the observation. The likelihood computation is a graph matching problem.

4.2.1 Graph matching formulation

To handle the case of different numbers of features in the two graphs, we add dummy (null) nodes to both the observation graph node set and the model graph node set; that is, for the matching function $F$, we allow a feature to be matched to a dummy node when no good match can be found. The likelihood can be measured as the similarity between the observed data and the model graph (both matched node similarity and matched edge similarity). We assume the observation model

$$f_{Gi} = f_{Mi} + e(i), \qquad r_{Gi} = r_{Mi} + e'(i),$$

where $f_{Gi}$ and $f_{Mi}$ are the observation-graph and model-graph features, $r_{Gi}$ and $r_{Mi}$ are the observation-graph and model-graph relations, and $e(i)$ and $e'(i)$ are zero-mean independent Gaussian noise. The likelihood can then be formalized as a Gibbs distribution with energy

$$E_G(X) = E_G(F) = \sum_{f \in G} E_1(f, F(f)) + \alpha \sum_{r \in G_r} E_2(r, F(r)), \quad (6)$$

where $E_1$ and $E_2$ are the node and edge potentials defined below and $\alpha$ is the coefficient that balances feature matching and relation matching:

$$E_1(f_i, F(f_i)) = \begin{cases} \sum_{k=1}^{K_1} \left[f_i^k - F(f_i)^k\right]^2 / \left[\sigma_i^k\right]^2 & \text{if } F(f_i) \text{ is not null}, \\ P_{v1} & \text{if } F(f_i) \text{ is null}, \end{cases} \quad (7)$$

where $F(f_i)$ is the node matched to $f_i$, $K_1$ is the number of attributes associated with node $f_i$, $f_i^k$ is the $k$-th component of the attributes, and $(\sigma_i^k)^2$ is the variance of its Gaussian noise distribution. If a feature is matched to a dummy node (null), we assign a large penalty $P_{v1}$. Similarly,

$$E_2(r_i, F(r_i)) = \begin{cases} \sum_{k=1}^{K_2} \left[r_i^k - F(r_i)^k\right]^2 / \left[\sigma_r^k\right]^2 & \text{if } F(r_i) \text{ is not null}, \\ P_{v2} & \text{if } F(r_i) \text{ is null}, \end{cases} \quad (8)$$

where $F(r_i)$ is the edge matched to $r_i$, $K_2$ is the number of attributes associated with the edge, $r_i^k$ is the $k$-th component of the edge attributes, and $(\sigma_r^k)^2$ is the variance of its Gaussian noise distribution. If an edge is matched to a dummy edge, it is assigned a penalty $P_{v2}$.
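A minimal sketch of evaluating the matching energy of Eqs. (6)-(8) for a given assignment follows; the data layout, the penalty values and `alpha` are assumptions for illustration, not the authors' settings.

```python
# Illustrative sketch only: Gibbs matching energy of Eqs. (6)-(8).
# model_nodes/obs_nodes: lists of 1-D attribute vectors (numpy arrays).
# model_edges/obs_edges: dicts (i, j) -> 1-D relation attribute vectors.
# match: dict model node index -> observation node index, or None for a dummy match.
import numpy as np

def matching_energy(model_nodes, obs_nodes, model_edges, obs_edges, match,
                    sigma_node, sigma_edge, alpha=1.0, p_v1=10.0, p_v2=10.0):
    energy = 0.0
    # Node term, Eq. (7).
    for i, f_i in enumerate(model_nodes):
        j = match.get(i)
        if j is None:
            energy += p_v1
        else:
            diff = f_i - obs_nodes[j]
            energy += np.sum((diff / sigma_node) ** 2)
    # Edge term, Eq. (8): an edge is matched only if both endpoints are matched
    # and the corresponding observation edge exists.
    for (a, b), r_ab in model_edges.items():
        ja, jb = match.get(a), match.get(b)
        obs_r = None
        if ja is not None and jb is not None:
            obs_r = obs_edges.get((ja, jb), obs_edges.get((jb, ja)))
        if obs_r is None:
            energy += alpha * p_v2
        else:
            diff = r_ab - obs_r
            energy += alpha * np.sum((diff / sigma_edge) ** 2)
    return energy
```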

4.2.2 Relaxation labeling for graph matching

Generally speaking, graph matching is an NP-hard problem. Locally optimal search techniques are often used in real applications, and their performance depends strongly on the initial solution. Thus, in the first stage, we use feature distance alone to initialize the graph matching (without considering the relations), followed by a relaxation labeling process to refine the match. We first construct a graph larger than that of the previous state and then use sub-graph matching to match the model with this graph, as shown in Figure 3. To solve this matching problem, we use a relaxation labeling method. The basic idea is to use iterated local context updates to achieve a globally consistent result; details about relaxation labeling can be found in [23], [19].

Figure 3. Graph matching. Left is the model graph. Right is the graph constructed from the region around the predicted position, which is larger than the graph of the previous state. The sub-graph matching algorithm matches the model graph with this larger graph.

The relaxation labeling method uses the compatibility of label probabilities as constraints in the labeling algorithm, i.e., it considers a feature's neighbors. The compatibility $C_{ij}(\lambda_k, \lambda_l)$ is defined as the conditional probability that feature $i$ has label $\lambda_k$ given that feature $j$ has label $\lambda_l$, i.e., $C_{ij}(\lambda_k, \lambda_l) = P(\lambda_k \mid \lambda_l)$. Thus, the label probabilities are updated by considering the label probabilities of neighboring features. Assume that we have updated all probabilities up to some step and now seek the updated probabilities for the next step. We can estimate the change in confidence of $P_i(\lambda_k)$ by

$$\delta P_i(\lambda_k) = \sum_{j \in N} w_{ij} \sum_{l \in L} C_{ij}(\lambda_k, \lambda_l) P_j(\lambda_l), \quad (9)$$

where $N$ is the set of neighbors of $i$, $L$ is the label set, and $w_{ij}$ is the weight of each neighbor, subject to the constraint $\sum_j w_{ij} = 1$. The new probability for label $\lambda_k$ at the next step is computed from the values of the previous iteration using

$$p_i(\lambda_k) \leftarrow \frac{p_i(\lambda_k)\,[1 + \delta p_i(\lambda_k)]}{\sum_{l} p_i(\lambda_l)\,[1 + \delta p_i(\lambda_l)]}. \quad (10)$$

In this way, after some iterations, the probability of each feature labeling stabilizes, and the feature-to-feature matching is obtained. The matching score is taken as the likelihood. It also gives the approximate graph state in a maximum likelihood sense.
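A minimal sketch of this update loop follows, implementing Eqs. (9)-(10) directly; the fixed iteration count as a stopping rule and the input layout are assumptions.

```python
# Illustrative sketch only: relaxation-labeling update of Eqs. (9)-(10).
# P: (n_features, n_labels) matrix of label probabilities.
# neighbors[i]: list of neighbor indices of feature i.
# C[(i, j)]: (n_labels, n_labels) compatibility matrix for the pair (i, j).
# W: (n_features, n_features) neighbor weights, each row summing to 1.
import numpy as np

def relaxation_labeling(P, neighbors, C, W, n_iters=20):
    """Iterate the label-probability update until (approximately) stable."""
    P = P.copy()
    for _ in range(n_iters):
        delta = np.zeros_like(P)
        for i in range(P.shape[0]):
            for j in neighbors[i]:
                # Eq. (9): support from neighbor j for each label of feature i.
                delta[i] += W[i, j] * (C[(i, j)] @ P[j])
        # Eq. (10): multiplicative update followed by per-feature normalization.
        P = P * (1.0 + delta)
        P /= P.sum(axis=1, keepdims=True)
    return P
```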
5 Implementation and experimental results

We implemented our algorithm on a Pentium 4 3.2 GHz machine. The computation time depends on the object complexity: if the object has rich texture, it tends to have more features and therefore takes more time for graph matching and model update. On average, it runs at 3 frames per second using unoptimized C++ code. We tested our tracking algorithm on real video sequences with complex textures undergoing significant appearance changes and occlusions. For the feature description, we describe the features using a 128-bin gradient histogram. To construct the relations, we need a threshold that determines which nodes are neighbors: the larger the threshold, the more edges the graph includes, and if it is very large, the graph becomes a complete graph; if it is too small, there are very few edges and the structural information is not accurately described. The parameters are tuned by testing.

The parameters $\tau_a$ and $\tau_d$ are both set to 0.8, meaning that only those features that are really stable/unstable are selected for addition/deletion. For the matching thresholds, we tried different values, fixed them, and found they work well across other scenes. Figures 4, 5, 6 and 7 show the tracking results. They demonstrate that with our method the object can be robustly tracked even through severe appearance changes and occlusion. We compared our method with CamShift as implemented in the OpenCV library.

In Figure 4, the coffee pot is rotating while moving; its appearance is very complex (highly textured) and changes dramatically (from dark to light and then to dark again), and the background is also highly cluttered. The upper row (with a white cross as the object locator) shows the tracking result of CamShift, which completely loses track within ten frames. One reason is that the CamShift representation is a holistic color histogram; it loses much of the information (the highly textured area) that is crucial for correct tracking (CamShift is good at tracking regions with homogeneous color). Another reason is that the histogram-based representation cannot adapt well to dramatic appearance changes. Our representation models such details using the local feature graph, and the adaptive model update algorithm elegantly adapts to such changes and keeps the track. The lower row shows our results, with the red rectangle marking the tracked object.

In Figure 5, the pedestrian of interest is walking. The sun casts a shadow on the tracked pedestrian, which causes dramatic appearance changes, but the invariant feature representation keeps the tracking correct. In the second row, the pedestrian is gradually occluded by another pedestrian wearing a white shirt. However, our graph dynamics adaptively update the model and keep the good features in it. This is because the occluding object does not move consistently with the tracked object; the features of the occluding object change, so they have only a small chance of entering the candidate feature set and an even smaller chance of being added to the model graph. Thus the proposed algorithm can track through partial occlusion and illumination changes, whereas CamShift loses track after a few frames.

Figure 6 shows a vehicle tracking result under severe weather conditions; the vehicle's appearance is so similar to the background that even the human eye cannot easily distinguish them. The object also makes a sharp turn, which changes its pose significantly; nevertheless, our feature tracker robustly handles such cases. Figure 7 shows vehicle tracking under heavy occlusion. Almost 80% of the vehicle is occluded, but our tracker keeps track of the object. This is because the vehicle in this video has a very salient and stable feature, a gray region along the car's door and front window, which is highly distinctive from the background. This feature alone is enough to locate the object even when the trees occlude most of the other parts of the vehicle.

6 Conclusions

In this paper, we presented a novel tracking framework based on our graph-based object representation. The object is described by a collection of SIFT features, and the relations between features are encoded as the edges of the attributed graph. We also model the graph dynamics with feature/relation addition and deletion, which provides a competitive mechanism that always keeps the stable features in the model. The likelihood computation is formulated as a graph matching problem that can be efficiently solved using relaxation labeling. Experiments have demonstrated the power of our feature-based tracker under significant appearance changes, pose changes, occlusions, and so on. However, our method relies on the stability of the local features; if the features are very unstable, the performance of our tracker degrades. From the experiments, we see that feature stability over time is crucial to tracking. We view this as a starting point for future research in the following directions: (1) designing new features that are better suited to tracking in terms of distinctiveness and stability; (2) real-time tracking with salient features: using only a few stable, salient features, tracking can be performed very fast, and it should be possible to build a tracker that simultaneously tracks a dozen or more objects in real time.

References

[1] S. Belongie, J. Malik and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE Trans. Pattern Analysis and Machine Intelligence, 24(4): 509-522, 2002.
[2] H. Bunke and B. T. Messmer, "Efficient Attributed Graph Matching and Its Application to Image Analysis", ICIAP, pp. 45-55, 1995.
[3] O. Carmichael and M. Hebert, "Shape-based recognition of wiry objects", CVPR, vol. II, pp. 401-408, 2003.
[4] R. T. Collins and Y. Liu, "On-Line Selection of Discriminative Tracking Features", ICCV, Nice, France, pp. 346-352, October 2003.
[5] D. Comaniciu, "Kernel-based Object Tracking", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003.
[6] R. Deriche and O. Faugeras, "Tracking line segments", Image and Vision Computing, 8(4), 1990.
[7] M. A. Eshera and K. S. Fu, "An image understanding system using attributed symbolic representation and inexact graph-matching", IEEE Trans. Pattern Analysis and Machine Intelligence, 8(5): 604-618, September 1986.
[8] M. Isard and A. Blake, "Contour tracking by stochastic propagation of conditional density", ECCV, pp. 343-356, Cambridge, UK, 1996.
[9] A. Johnson and M. Hebert, "Object recognition by matching oriented points", CVPR, pp. 684-689, 1997.
[10] F. Jurie and C. Schmid, "Scale-invariant shape features for recognition of object categories", CVPR, 2004.
[11] T. Kadir and M. Brady, "Scale, Saliency and Image Description", International Journal of Computer Vision, 45(2): 83-105, November 2001.
[12] S. Lazebnik, C. Schmid and J. Ponce, "Affine-Invariant Local Descriptors and Neighborhood Statistics for Texture Recognition", ICCV, 2003.
[13] D. G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2): 91-110, 2004.
[14] I. Matthews, T. Ishikawa and S. Baker, "The Template Update Problem", IEEE Trans. Pattern Analysis and Machine Intelligence, 26(6): 810-815, June 2004.
[15] K. Mikolajczyk, A. Zisserman and C. Schmid, "Shape recognition with edge-based features", British Machine Vision Conference, September 2003.
[16] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors", CVPR, 2003.
[17] L. Shams, Y. Kamitani and S. Shimojo, "Graph matching vs. entropy-based methods for object detection", Neural Networks, 14: 345-354, 2000.
[18] H. Tao, H. S. Sawhney and R. Kumar, "Object tracking with Bayesian estimation of dynamic layer representations", IEEE Trans. Pattern Analysis and Machine Intelligence, 24(1): 75-89, 2002.

Figure 4. Tracking of a moving coffee pot. The upper row (with a white cross) shows the tracking result of CamShift; the lower row shows our results, with the red rectangle marking the tracked object position. (Note that the highly textured object is undergoing dramatic appearance changes.)

Figure 5. Tracking of a pedestrian under significant appearance changes and heavy occlusion. The odd rows are the results of the CamShift method, with the white cross marking the tracked object position. Figure a-1 is the tracking result and a-2 is the ground truth of the object (scaled up). The even rows are the results of our method; the red rectangles are the tracked positions, and the model and cropped tracking results with the SIFT features (blue arrows) overlaid are shown on the side.

Figure 6. Tracking of a vehicle under significant appearance changes.

Figure 7. Tracking of a vehicle under heavy occlusion and viewpoint changes.