
CENTER FOR MACHINE PERCEPTION, CZECH TECHNICAL UNIVERSITY

Tracking in image sequences

Lecture notes for the course Computer Vision Methods

Tomáš Svoboda, svobodat@fel.cvut.cz

March 23, 2011

Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Technická 2, 166 27 Prague 6, Czech Republic
fax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz

Contents

1 Patch tracking, KLT and NCC
  1.1 Measuring pixel-based (di)similarity between patches
  1.2 Lucas-Kanade tracker, and (some) of its modifications
2 Color model for particle filtering
3 Mean-shift tracking
References


The following chapters do not provide a complete survey of motion estimation or object tracking, see e.g. [8]. On the other hand, the presented methods are widely accepted and often serve as building blocks in more complex computer vision algorithms. The text is still a draft and the author will gladly receive any comments and criticism.

1 Patch tracking, Lucas-Kanade tracker and Normalized Cross-Correlation

This chapter introduces one of the essential cornerstones of computer vision: image alignment, see Figure 1.1. The problem of finding an image patch within a larger image appears in many areas of computer vision: image matching, tracking, or image based retrieval, to name just a few. There are many ways to compare two image patches; here, we measure the similarity by directly comparing the pixel intensities. The number of papers and the amount of work in this area is vast. We do not provide an exhaustive overview; the publications cited here are truly just the seminal works and partly a personal selection of the author(s).

Figure 1.1: The goal is to align a template image T(x) to an input image I(x), where x is a column vector containing the image coordinates [x, y]ᵀ. The I(x) could also be a small sub-window within an image.

1.1 Measuring pixel-based (di)similarity between patches

The sought criterion function measures a difference between an image patch I(x) and a template T(x), where x denotes the set of spatial image coordinates.

Figure 1.2: The image matrix, an image of the J character in this case, can be arranged into a vector.

Figure 1.3: Sum of squared differences

The sizes of the image patch and the template are assumed to be the same. We can see I and T as discrete 2D functions on a regular grid with the image intensities as the functional values. A matrix (algebraic) oriented view may see I and T as matrices, which may be conveniently re-arranged into vectors, see Figure 1.2. We can then work with the image as a 1D function, which simplifies thinking. A simple difference obviously does not suffice. The sum of squared differences (SSD) is an often used similarity measure:

  ssd(x, y) = Σ_k Σ_l (T(k, l) − I(x + k, y + l))²,   (1.1)

where x, y denote the position of the patch (think of the center or one of the corners) and k, l denote the local template coordinates. The equation above essentially means: compute the squared difference at every pixel and sum them all up. A simple experiment is visualized in Figure 1.3. The template, delineated by the green rectangle, is compared to all patches of the same size in the image. One can imagine the green window sliding over the whole image, with the ssd measure computed at every possible position. Dark blue is the best match (small difference) and dark red is the worst. We can see that the dark patch is clearly distinguishable from the brighter areas, but the fine wooden structure on the doors does not play any significant role.
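The sliding-window evaluation of (1.1) can be sketched in a few lines of numpy. This is an illustrative brute-force sketch (the function name `ssd_map` and the toy arrays are ours, not the code behind Figure 1.3):

```python
import numpy as np

def ssd_map(image, template):
    """Evaluate eq. (1.1) at every valid position of the template,
    i.e. slide the template window over the whole image."""
    H, W = image.shape
    h, w = template.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            d = image[y:y + h, x:x + w] - template
            out[y, x] = np.sum(d * d)   # sum of squared differences
    return out

# toy example: the template is an exact crop, so the minimum is zero
img = np.array([[1., 2., 3.],
                [4., 5., 6.]])
tpl = np.array([[2., 3.]])
scores = ssd_map(img, tpl)   # zero (perfect match) at row 0, column 1
```

For real images the double loop is slow; efficient implementations expand the square and evaluate all positions at once with convolutions or the FFT.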

Figure 1.4: Sum of absolute differences

A similar result can be observed for another alternative measure, the sum of absolute differences:

  sad(x, y) = Σ_k Σ_l |T(k, l) − I(x + k, y + l)|,   (1.2)

see Figure 1.4. The main advantage of sad over ssd is naturally the computational cost: an absolute difference is computed faster than a square. On the other hand, sad is not differentiable over the whole domain.

Normalized cross-correlation is known from statistics and signal processing. It is defined as:

  r(x, y) = Σ_k Σ_l (T(k, l) − T̄)(I(x + k, y + l) − Ī(x, y)) / √( Σ_k Σ_l (T(k, l) − T̄)² · Σ_k Σ_l (I(x + k, y + l) − Ī(x, y))² ),   (1.3)

where T̄ denotes the mean value of the template and Ī(x, y) the mean value of the image patch at position x, y. From the statistical point of view, the normalized cross-correlation can be computed from standard deviations and covariance:

  r_IT = σ_TI / (σ_T σ_I),   (1.4)

where σ_T is the standard deviation of the intensity signal in the template, σ_I that of the image patch, and σ_TI the covariance (sometimes called cross-covariance) between the template and the image patch. The interpretation is clear: r(x, y) = 1 means exactly the same patch, r(x, y) = −1 an exact negative, and r(x, y) = 0 just a random relation (independent variables have r = 0). The result for normalized cross-correlation is visible in Figure 1.5. We can see that this measure is much more discriminative than ssd or sad. It is able to distinguish individual similar areas on the wooden doors.
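For a single patch position, (1.3) and (1.4) reduce to a covariance divided by a product of standard deviations. A minimal numpy sketch (illustrative only, not the efficient algorithm of [3]; the function `ncc` is our naming):

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation of two same-size patches, eq. (1.3):
    covariance of the intensities divided by the product of their
    standard deviations."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt(np.sum(t * t) * np.sum(p * p))
    if denom == 0.0:
        return 0.0   # a textureless patch has no meaningful correlation
    return float(np.sum(t * p) / denom)

tpl = np.array([[1., 2.],
                [3., 4.]])
r_same = ncc(2.0 * tpl + 5.0, tpl)   # close to +1 despite gain and offset
r_neg = ncc(-tpl, tpl)               # close to -1: exact negative
```

The mean subtraction and the normalization make the measure invariant to affine intensity changes (gain and offset), which ssd and sad are not.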

Figure 1.5: Normalized cross-correlation

Although (1.3) may look frightening from the computational point of view, several elements may be pre-computed; an efficient algorithm exists [3] and implementations are widely available in many programming languages. Although normalized cross-correlation has been replaced by more invariant feature detectors [5], it may actually outperform the advanced rivals in several circumstances [6].

We see that all the criterion functions are far from convex. One possible way of aligning the template to its proper position would be to slide it over the whole image and select the best value of the particular criterion function. Despite efficient implementations, the computational complexity of this sliding-window approach may be prohibitive in many applications. Thankfully, in many applications we know at least an approximate position, and the number of possible locations is thus conveniently limited. A further possibility is an optimization approach, which is discussed in the next section.

1.2 Lucas-Kanade tracker, and (some) of its modifications

We explain the essentials of this widely adopted optimization technique. Our explanation mainly follows the paper [1]; the original paper is [4]. Just to remind: the goal is to align a template image T(x) to an input image I(x), where x is a column vector containing the image coordinates [x, y]ᵀ. The I(x) could also be a small sub-window within an image, see Figure 1.1. In the optimization formulation we start at some initial position and search for the optimal transformation that brings us to the perfect alignment.

Let us define a set of allowable warps (geometric transformations) W(x; p), where p is a vector of parameters. For translations it would be

  W(x; p) = [x + p₁, y + p₂]ᵀ,   (1.5)

but W(x; p) can be arbitrarily complex. The best alignment p* minimizes the image dissimilarity:

  p* = arg min_p Σ_x [I(W(x; p)) − T(x)]²,   (1.6)

which is the already known sum of squared differences. It is important to realize that (1.6) is a highly non-linear optimization problem. Though the warp W(x; p) can be a linear function of p, the pixel values I(x) are not linear in x. Essentially, intensity values are unrelated to their locations. We can linearize (1.6) at a point p by using the first-order Taylor expansion and, assuming some p is known, search for the optimal increment Δp:

  Δp* = arg min_Δp Σ_x [I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x)]²,   (1.7)

where ∇I is the derivative of the image function, i.e. the row-wise oriented¹ image gradient

  ∇I = [∂I/∂x, ∂I/∂y]   (1.8)

evaluated at W(x; p), and ∂W/∂p is the Jacobian of the warp,

  ∂W/∂p = [ ∂Wₓ/∂p₁  ∂Wₓ/∂p₂  … ;  ∂W_y/∂p₁  ∂W_y/∂p₂  … ].   (1.9)

For the translation warp (1.5),

  ∂W/∂p = [ 1  0 ;  0  1 ].   (1.10)

The problem (1.7) is linear in Δp and we can find a closed-form solution. We are looking for an extremum of (1.7); hence we differentiate w.r.t. Δp and get:

  2 Σ_x [∇I (∂W/∂p)]ᵀ [I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x)].   (1.11)

¹ Vectors are usually column-wise oriented; here we use a row vector in order to simplify the notation.

Setting (1.11) equal to zero and multiplying out the elements in the brackets (the multiplicative 2 is not important) gives

  0 = Σ_x [∇I (∂W/∂p)]ᵀ [I(W(x; p)) − T(x)] + Σ_x [∇I (∂W/∂p)]ᵀ [∇I (∂W/∂p)] Δp;

separating Δp yields:

  Δp = H⁻¹ Σ_x [∇I (∂W/∂p)]ᵀ [T(x) − I(W(x; p))],   (1.12)

where H is the (Gauss-Newton) approximation of the Hessian matrix,

  H = Σ_x [∇I (∂W/∂p)]ᵀ [∇I (∂W/∂p)].   (1.13)

It is insightful to see that the element T(x) − I(W(x; p)) is an error image, the difference between the template and the warped image. In order to make the equations clearer, we consider the warp to be a pure 2D translation. Inserting (1.8) and (1.10) into (1.13) yields:

  H = Σ_x [ (∂I/∂x)²        (∂I/∂x)(∂I/∂y) ;
            (∂I/∂x)(∂I/∂y)  (∂I/∂y)²       ].

Putting everything into (1.12) reveals:

  Δp = H⁻¹ Σ_x [ ΔI(x; p) ∂I/∂x ;  ΔI(x; p) ∂I/∂y ],   (1.14)

where ΔI(x; p) denotes the intensity difference between the template and the warped image. The update equation (1.14) illuminates the process in several respects. Clearly, if there is no difference between the warped image and the template, the update is zero. The higher the gradients in the image, the lower the inverse Hessian and hence the smaller the update. And, perhaps most importantly, no gradients in the image (just think of a textureless wall) mean a very unstable update. If the Hessian (1.13) is close to a singular matrix, the alignment becomes unstable. This texturedness condition has been pointed out in [2] and, probably independently, in [7]. Assume we have some initial estimate of p; it could also be p = 0. The iterative search for the best alignment can be summarized as follows:

1. Warp I with W(x; p) to obtain I(W(x; p)).
2. Warp the gradient ∇I with W(x; p).
3. Evaluate the Jacobian ∂W/∂p at (x; p) and compute the steepest descent image ∇I (∂W/∂p).
4. Compute the Hessian H = Σ_x [∇I (∂W/∂p)]ᵀ [∇I (∂W/∂p)].
5. Compute Δp = H⁻¹ Σ_x [∇I (∂W/∂p)]ᵀ [T(x) − I(W(x; p))].
6. Update the parameters p ← p + Δp.

Iterate until ‖Δp‖ ≤ ε.
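The six steps above can be sketched for the pure-translation warp (1.5), where the Jacobian is the identity and the steepest descent images are just the two gradient components. This is a simplified sketch of our own (not the reference implementation), assuming a grayscale numpy image and rounding the translation to whole pixels when cropping; a real tracker would interpolate sub-pixel positions:

```python
import numpy as np

def lk_translation(I, T, p0, n_iters=50, eps=1e-4):
    """Iterative Lucas-Kanade alignment with a translation warp W(x; p).
    I: input image, T: template, p0: initial translation (px, py)."""
    p = np.array(p0, dtype=float)
    h, w = T.shape
    for _ in range(n_iters):
        px, py = int(np.round(p[0])), int(np.round(p[1]))
        # 1. warp I with W(x; p): crop the patch at the current position
        Iw = I[py:py + h, px:px + w].astype(float)
        # 2.-3. gradients of the warped patch; for translation the Jacobian
        # is the identity, so Ix and Iy are the steepest descent images
        Iy, Ix = np.gradient(Iw)
        # 4. Gauss-Newton approximation of the Hessian, eq. (1.13)
        H = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                      [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
        # 5. error image and parameter update, eq. (1.14)
        err = T - Iw
        b = np.array([np.sum(err * Ix), np.sum(err * Iy)])
        dp = np.linalg.solve(H, b)
        # 6. update the parameters until the step becomes negligible
        p += dp
        if np.linalg.norm(dp) <= eps:
            break
    return p

# smooth synthetic image with non-vanishing gradients everywhere
X, Y = np.meshgrid(np.arange(40.0), np.arange(40.0))
I = (X - 20.0) ** 2 + (Y - 15.0) ** 2
T = I[12:22, 15:25]                     # template cropped at p = (15, 12)
p = lk_translation(I, T, [14.0, 12.0])  # start one pixel off in x
```

Note how a patch with weak gradients would make H nearly singular and `np.linalg.solve` unstable; this is exactly the texturedness condition of [2, 7].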

2 Color model for particle filtering

3 Mean-shift tracking

References

[1] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221-255, 2004.

[2] C. Harris and M. Stephens. A combined corner and edge detector. In M. M. Matthews, editor, Proceedings of the 4th Alvey Vision Conference, pages 147-151, University of Manchester, England, September 1988. On-line copies available on the web.

[3] J. P. Lewis. Fast template matching. In Vision Interface, pages 120-123, 1995. Extended version published on-line as "Fast Normalized Cross-Correlation" at http://www.idiom.com/~zilla/work/nvisioninterface/nip.pdf.

[4] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674-679, August 1981.

[5] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2):43-72, November 2005.

[6] Tomáš Petříček and Tomáš Svoboda. Matching by normalized cross-correlation: reimplementation, comparison to invariant features. Research Report CTU-CMP-2010-09, Center for Machine Perception, K13133 FEE Czech Technical University, Prague, Czech Republic, July 2010.

[7] Jianbo Shi and Carlo Tomasi. Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 593-600, 1994.

[8] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM Computing Surveys, 38(4), 2006.