Scale-invariant visual tracking by particle filtering


Scale-invariant visual tracking by particle filtering

Arie Nahmani* a, Allen Tannenbaum a,b
a Dept. of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
b Schools of Electrical and Computer and Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0250

ABSTRACT

Visual tracking is an important task that has received a lot of attention in recent years. Robust generic tracking tools are of major interest for applications ranging from surveillance and security to image-guided surgery. In these applications, the objects of interest may be translated and scaled. We present here an algorithm that uses scaled normalized cross-correlation matching as the likelihood within the particle filtering framework. Our algorithm does not require color or contour cues. Experimental results with constant rectangular templates show that the method is reliable in noisy and cluttered scenarios, and provides accurate and smooth trajectories in cases of target translation and scaling.

Keywords: Tracking, cross-correlation, CONDENSATION algorithm, scale-invariant, surveillance

1. INTRODUCTION

In this note, we investigate the problem of tracking arbitrary targets in video sequences. Many of the available algorithms tend to be application-specific, are appropriate for a very limited class of video sequences, and assume strong prior information about the tracked target (e.g., shape, texture, size, color, camera dynamics, or motion constraints). On the other hand, a number of more generic visual tracking algorithms search for distinctive features that can be followed from frame to frame. For these reasons, any progress on trackers for general arbitrary targets (without distinctive features) will be of interest for active vision, recognition, and surveillance applications. In the present work, we propose a video tracking framework for tracking non-articulated (blob-like) targets, which lack prominent features.
The proposed algorithm works in a variety of scenarios, and deals naturally with clutter and noise in the scenes, target scaling, and low-contrast targets. The most important assumption is that the target motion and scaling are smooth, without abrupt changes. We suppose that the target of interest is selected (by a human operator or by an automatic detection algorithm) in the first frame of the video sequence. Tracking is performed by acquiring the target's centroid trajectory in a given bounding box. We should note that this problem formulation is not new, and a large literature is available on the topic. We mention here only a few of the works most relevant to the approach taken in this paper. A comprehensive survey of visual tracking methods can be found in the paper by Yilmaz et al. [1]. A deep analysis of particle filters is provided in [2], where rigorous theory and applications of particle filters are presented. Also, a powerful application of particle filters to image sequences (the CONDENSATION algorithm) can be found in the paper by Blake and Isard [3]. Possible solutions to scale-invariant template matching are presented in [4-6]; see these works and the references therein. Although several attempts at combining area template matching with particle filtering have been made previously [7, 8], they used adaptive and learning schemes, which makes them different from the algorithm given in this paper. The remainder of this paper is organized as follows. Section 2 explains the scale-invariant template-matching problem. We briefly discuss classical template matching with the normalized cross-correlation coefficient function (NCC), and we define the concept of scaled normalized cross-correlation (SNCC). In Section 3, we consider the general problem of tracking with particle filters, and present the algorithm using measurement steps that are based on the SNCC. In Section 4, we test our algorithm on three video sequences that illustrate some of its key features.
Finally, in Section 5, we summarize our research and present our conclusions. We also discuss several problems that remain open, and propose future directions for the research.

2. SCALE-INVARIANT TEMPLATE MATCHING

Let I(m,n) denote the intensity value of the image (or the search region), and P(i,j) denote the intensity value of the template patch. We assume that the size of I is $M_x \times M_y$, and the size of P is $N_x \times N_y$; clearly, the size of I is greater than the size of P. It is known that a noisy version of the patch is placed somewhere in the image I. Our goal is to determine the most probable position of the patch in image I. The standard approach to this problem is to compute the coordinates of the maximum of the normalized cross-correlation coefficient (NCC) between the image and the template. These coordinates represent the location of the best match. The normalized cross-correlation coefficient is defined for any pixel (m,n) by:

$$\mathrm{NCC}(m,n) = \frac{\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\left(I(i+m-1,\,j+n-1) - \bar{I}(m,n)\right)\left(P(i,j) - \bar{P}\right)}{\sqrt{\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\left(I(i+m-1,\,j+n-1) - \bar{I}(m,n)\right)^2 \,\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\left(P(i,j) - \bar{P}\right)^2}} \quad (1)$$

where the mean intensities are defined by:

$$\bar{P} = \frac{1}{N_x N_y}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y} P(i,j), \quad (2)$$

$$\bar{I}(m,n) = \frac{1}{N_x N_y}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y} I(i+m-1,\,j+n-1), \quad (3)$$

$$m = 1,2,\ldots,M_x - N_x + 1, \qquad n = 1,2,\ldots,M_y - N_y + 1. \quad (4)$$

The values of NCC(m,n) lie between -1 and 1 (1 for a perfect match, and 0 for no correlation). This technique is used in many practical applications, and has demonstrated robustness to noise and intensity variations [9]. Unfortunately, it fails in the case of a scaling (zoom) of the desired target in the image I. The straightforward solution to this problem is to find the location of the maximum of the scaled normalized cross-correlation function (SNCC):

$$\mathrm{SNCC}(m,n,s) = \frac{\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\left(J - \bar{J}(m,n)\right)\left(P(i,j) - \bar{P}\right)}{\sqrt{\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\left(J - \bar{J}(m,n)\right)^2 \,\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\left(P(i,j) - \bar{P}\right)^2}} \quad (5)$$

where s > 0 is the scaling factor, $J = I(m + s(i-1),\, n + s(j-1))$ (if the indices are not integers, they should be rounded, or the value of J should be interpolated from the closest neighbors), $\bar{P}$ is defined in (2), and

$$\bar{J}(m,n) = \frac{1}{N_x N_y}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y} I(m + s(i-1),\, n + s(j-1)) \quad (6)$$
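As an illustration, the dense search implied by equation (1) can be sketched as follows. This is a minimal NumPy sketch with names of our own choosing, not the authors' implementation; production systems typically use the FFT-based formulation of [9] instead of this double loop.

```python
import numpy as np

def ncc_map(image, patch):
    """Dense normalized cross-correlation (eq. 1).

    Returns an (Mx-Nx+1) x (My-Ny+1) score map; the argmax of the map
    is the best-match location, with a score of 1 for a perfect match.
    """
    Mx, My = image.shape
    Nx, Ny = patch.shape
    p = patch - patch.mean()                      # P(i,j) - P_bar
    p_energy = np.sqrt((p ** 2).sum())
    out = np.zeros((Mx - Nx + 1, My - Ny + 1))
    for m in range(Mx - Nx + 1):
        for n in range(My - Ny + 1):
            win = image[m:m + Nx, n:n + Ny]       # window under the patch
            w = win - win.mean()                  # I(...) - I_bar(m,n)
            denom = np.sqrt((w ** 2).sum()) * p_energy
            out[m, n] = (w * p).sum() / denom if denom > 0 else 0.0
    return out
```

Note that the code uses 0-based indices, whereas the paper's equations (1)-(4) are written with 1-based indices; the score map and its argmax are the same either way.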

In other words, the template patch is compared to a scaled version of the image I, and the best match is found. Since the number of possible scalings is infinite, even an approximate solution over a grid of scales can be very computationally demanding, and is not appropriate for real-time applications. We propose to overcome this problem by assuming that the scale does not change abruptly, so that it can be modeled as a simple Markov process, e.g., for frame k:

$$s_k = s_{k-1} + v_k; \qquad v_k \sim N(0, \sigma_s^2); \qquad s_0 = 1 \quad (7)$$

Remarks: 1) One should make sure that $s_k$ remains positive for each frame. 2) If some prior knowledge about changes in scale is available, it can be incorporated into the model by modifying the distribution of $v_k$. For example, if we suppose that most of the time the scale does not change, then we should choose a truncated normal distribution added to a delta distribution at s = 0. This definition fits well into the particle filtering framework, and makes the problem tractable. Furthermore, we are interested only in non-negative values of the SNCC, so we use the half-wave rectified scaled cross-correlation, in which the negative values are replaced by zeros. In the next section, we combine the advantages of the SNCC and particle filtering techniques.

3. PARTICLE FILTERING

3.1 Particle filtering

Our tracker is based on the CONDENSATION algorithm proposed by Isard and Blake [3]. In this section, a short overview of the algorithm is given, and the application to scale-invariant tracking is presented. The algorithm uses the SNCC as the likelihood for determining the target's position. We refer the reader to [2] for complete background on particle filtering. In general, the goal of particle filtering is to estimate the sequence of hidden state parameters $X_k$ based only on the observed data $Z_k$. These estimates follow from the posterior distribution $P(X_k \mid Z_0, Z_1, \ldots, Z_k)$. It is assumed that the state and the observations are first-order Markov processes, and that each $Z_k$ depends only on $X_k$.
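A single SNCC evaluation at a hypothesized position and scale, as in equation (5), can be sketched as follows. This is our own illustrative code (0-based indices, nearest-neighbor rounding for non-integer indices, half-wave rectified as described above), not the authors' implementation.

```python
import numpy as np

def sncc(image, patch, m, n, s):
    """Half-wave rectified scaled NCC (eq. 5) at position (m, n), scale s > 0.

    Samples J(i, j) = I(m + s*i, n + s*j) (0-based) with nearest-neighbor
    rounding, clipped to the image borders, then correlates J against the
    template patch P. Negative correlations are clipped to zero.
    """
    Nx, Ny = patch.shape
    ii = np.clip(np.round(m + s * np.arange(Nx)).astype(int), 0, image.shape[0] - 1)
    jj = np.clip(np.round(n + s * np.arange(Ny)).astype(int), 0, image.shape[1] - 1)
    J = image[np.ix_(ii, jj)]                 # scaled window, same size as P
    Jc = J - J.mean()                         # J - J_bar(m,n)
    Pc = patch - patch.mean()                 # P - P_bar
    denom = np.sqrt((Jc ** 2).sum() * (Pc ** 2).sum())
    if denom == 0:
        return 0.0
    return max(0.0, float((Jc * Pc).sum() / denom))
```

At s = 1 this reduces to the plain NCC of equation (1) evaluated at (m, n).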
The particle filter estimates the distribution $P(X_k \mid Z_0, Z_1, \ldots, Z_k)$, and it does not require any linearity or Gaussianity assumptions on the model. The particle filter generates a set of N samples that approximate the filtering distribution. For the k-th frame, we denote the state vector by $X_k = (x_1, x_2, \ldots)$. For example, the state can be the top-left corner coordinates of the desired target ($x_1 = x$, $x_2 = y$) in the frame, and its scaling ($x_3 = s$). Additionally, the state can include the velocity and acceleration of the target. The state estimate is obtained recursively as follows:

$$P(X_k \mid Z_0, Z_1, \ldots, Z_k) \propto P(Z_k \mid X_k)\, P(X_k \mid X_{k-1})\, P(X_{k-1} \mid Z_0, Z_1, \ldots, Z_{k-1}) \quad (8)$$

where the likelihood is given by the SNCC:

$$P(Z_k \mid X_k) = P^{(\mathrm{SNCC})}(Z_k \mid X_k) \propto \mathrm{SNCC}_k \quad (9)$$

The prediction step, which corresponds to the distribution $P(X_k \mid X_{k-1})$, is governed by the system state dynamical equations. For example, if the state is assumed to evolve smoothly in time, and there is no additional information about the target dynamics, then the simplest model, given by

$$X_k = X_{k-1} + v_k, \qquad v_k \sim N(0, \Sigma) \quad (10)$$

is often appropriate. The mean of $X_k$ over all the particles approximates the actual value of $X_k$.

3.2 The algorithm

The state estimation is carried out by updating weighted particles according to (8). The algorithm steps are summarized as follows.

INITIALIZATION: The N particles $X_0^{(n)}$, $(n = 1, \ldots, N)$ are drawn from the uniform distribution, or selected by the operator. For each video frame (the k-th frame), we perform the following steps:

STEP 1: Using the particles from the previous frame, predict the new state by sampling from:

$$X_k^{(n)} \sim P(X_k \mid X_{k-1}^{(n)}). \quad (11)$$

STEP 2: Measure and weight the new position in terms of the measured features $Z_k$:

$$w_k^{(n)} = P^{(\mathrm{SNCC})}(Z_k \mid X_k^{(n)}), \qquad \sum_{n=1}^{N} w_k^{(n)} = 1. \quad (12)$$

STEP 3: Resample the particles $X_k^{(n)}$, $(n = 1, \ldots, N)$ according to the weights $w_k^{(n)}$.

STEP 4: Compute the state estimate from:

$$\hat{X}_k = \frac{1}{N}\sum_{n=1}^{N} X_k^{(n)}, \quad (13)$$

and repeat steps 1-4 for the next video frame. The result of this algorithm is the estimated state $\hat{X}_k$, which contains the position and scaling of the tracked target in every video frame.

4. EXPERIMENTAL RESULTS AND DISCUSSION

We tested the proposed algorithm in various situations, including highly cluttered exterior scenes with shadows and partial occlusions, with a high rate of success. A single template was used for every video. We chose the simplest motion model (10). We selected the target manually in the first video frame, and tracked the targets with 60 particles. The video resolution is 240x320, and the frame rate is 25 frames per second.
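Steps 1-4 of Section 3.2 can be sketched as a single per-frame update. This is a hedged illustration under our own naming conventions, not the authors' code: it assumes a caller-supplied `likelihood` function that maps a state vector (x, y, s) to its non-negative, half-wave rectified SNCC score, and the random-walk dynamics of equation (10).

```python
import numpy as np

def condensation_step(particles, likelihood, sigma, rng):
    """One frame of a CONDENSATION-style update (steps 1-4 of Sec. 3.2).

    particles : (N, d) array of states, e.g. (x, y, s) per particle.
    likelihood: maps one state vector to a non-negative score (the SNCC).
    sigma     : (d,) std. deviations of the random-walk dynamics (eq. 10).
    Returns (resampled particles, state estimate X_hat).
    """
    N = len(particles)
    # Step 1: predict by sampling from P(X_k | X_{k-1}) -- a random walk.
    pred = particles + rng.normal(0.0, sigma, size=particles.shape)
    pred[:, -1] = np.maximum(pred[:, -1], 1e-3)   # keep the scale positive
    # Step 2: weight each particle by the measurement likelihood (eq. 12).
    w = np.array([likelihood(p) for p in pred])
    total = w.sum()
    w = w / total if total > 0 else np.full(N, 1.0 / N)
    # Step 3: resample with replacement according to the weights.
    idx = rng.choice(N, size=N, p=w)
    resampled = pred[idx]
    # Step 4: the state estimate is the mean over the particles (eq. 13).
    x_hat = resampled.mean(axis=0)
    return resampled, x_hat
```

Starting from particles drawn uniformly over the frame (the INITIALIZATION step), repeating this update once per frame yields the estimated trajectory of the target's position and scale.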

Figure 1: Maneuvering vehicle sequence with the tracking results.

4.1 Sequence 1: Maneuvering Vehicle

In the first sequence, we want to track a vehicle. Despite the significant zoom and the moving camera, our tracker manages to follow the target (see Figure 1). This video represents a challenging scenario for tracking in outdoor conditions.

4.2 Sequence 2: Boat

In the second sequence, a boat is tracked. The contrast of the boat with the background is so low that following the boat is hard even for a human observer (see Figure 2). Additionally, the scene is very noisy (water glare and the plume behind the boat). The tracker manages to overcome these problems. Although in frame 798 the tracker has a wrong estimate of the scale (because of bad measurements), the algorithm reestablishes the correct estimate after a few frames.

Figure 2: The boat sequence with the tracking results.

4.3 Sequence 3: A Crowded Party

In this sequence, we want to track a single person in a large crowd. The tracking results are shown in Figure 3. In frame 83, the person is tracked despite variations in form and partial occlusion. In frames 115-125, a full occlusion occurs. At frame 123, our tracker temporarily loses track and the scaling is wrong. Nevertheless, the tracker finds the right position after the person reappears. We note that for all sequences, we used a simple target dynamics model and a constant template. We assumed that no additional information about the target is given besides the template. With learned higher-order models and a smoothly changing adaptive template, we expect to get even better results with the same algorithm.

Figure 3: Crowded party sequence with the tracking results.

5. CONCLUSION

In this paper, we presented an algorithm for tracking scaled and translated targets in video sequences without the need for adaptation and learning mechanisms. Using a rather low-dimensional state space, we achieve robust tracking results on many complicated and cluttered real-world video sequences, including sequences with a moving camera. The combination of the particle filter with a correlation tracker makes it possible to obtain smooth target trajectories. The algorithm can cope with translations, and with moderate deformations of the tracked target when the deformations affect only a small portion of the pixels in the template. The algorithm is also appropriate for small targets with low contrast. It is time efficient, and should be suitable for real-time applications. The disadvantage of our approach is that it is not capable of tracking targets subjected to large rotations. The problems of partial and full occlusions should also be addressed. The next step in our research will be to add rotation states to the particle filter definition, and to choose good dynamic models for rotation, in order to achieve rotation-invariant tracking. In addition, other types of correlation measures should be tested. Finally, the algorithm should be extended to multiple-target tracking.

REFERENCES

[1] Yilmaz, A., Javed, O., and Shah, M., "Object Tracking: A Survey," ACM Computing Surveys, Vol. 38(4), (2006).
[2] Doucet, A., de Freitas, N., and Gordon, N., Sequential Monte Carlo Methods in Practice, Springer, (2001).
[3] Isard, M., and Blake, A., "CONDENSATION - Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, Vol. 29(1), pp. 5-28, (1998).
[4] Cahn von Seelen, U.M., and Bajcsy, R., "Adaptive Correlation Tracking of Targets with Changing Scale," Reconnaissance, Surveillance, and Target Acquisition for the Unmanned Ground Vehicle, Morgan Kaufmann Publishers, San Francisco, CA, pp. 313-322, (1997).
[5] Zhao, F., Huang, Q., and Gao, W., "Image Matching by Normalized Cross-Correlation," ICASSP Proceedings, (2006).
[6] Ooi, J., and Rao, K., "New Insights Into Correlation-Based Template Matching," Proceedings of SPIE, Vol. 1468, pp. 740-751, (1991).
[7] Mei, X., Zhou, S.K., and Porikli, F., "Probabilistic Visual Tracking via Robust Template Matching and Incremental Subspace Update," IEEE International Conference on Multimedia and Expo, pp. 1818-1821, (2007).
[8] Zhou, S., Chellappa, R., and Moghaddam, B., "Appearance Tracking Using Adaptive Models in a Particle Filter," Proc. of Asian Conf. on Computer Vision, (2004).
[9] Lewis, J.P., "Fast Normalized Cross-Correlation," Vision Interface, Quebec, Canada, pp. 120-123, (1995).