
Superpixel Tracking

Shu Wang¹, Huchuan Lu¹, Fan Yang¹, and Ming-Hsuan Yang²
¹ School of Information and Communication Engineering, Dalian University of Technology, China
² Electrical Engineering and Computer Science, University of California at Merced, United States

This document illustrates the training and tracking procedures, presents some experimental results in high resolution, and gives the details of the Gaussian motion model. Our algorithm is implemented in MATLAB and runs on a machine with a 2.2 GHz dual-core CPU and 2 GB of memory. It takes less than 5 seconds per frame during tracking, and 2 seconds or more during each updating procedure; the most time-consuming part is the mean shift clustering algorithm. All the code and data sets will be made available at http://faculty.ucmerced.edu/mhyang/pubs/iccv11a.html.

The detail of our motion model: The motion (or dynamical) model of our tracker is assumed to be Gaussian distributed:

p(X_t \mid X_{t-1}) = \mathcal{N}(X_t; X_{t-1}, \Psi) \quad (1)

where \Psi is a diagonal covariance matrix whose elements are the standard deviations for location and scale, i.e., \sigma_c and \sigma_s. The values of \sigma_c and \sigma_s dictate how the proposed algorithm accounts for motion and scale change. Furthermore, \sigma_c and \sigma_s are related to the target size and the resolution of a given video. Based on these factors and the Gaussian probability density function, we compute the motion estimate p(X_t^{(l)} \mid X_{t-1}) of the N target candidates \{X_t^{(l)}\}_{l=1}^{N} as follows:

p(X_t^{(l)} \mid X_{t-1}) = \prod_{i=c,s} (2\pi\sigma_i^2)^{-1/2} \exp\!\left[-\frac{1}{2}\left(\frac{\lVert X_t^{i,(l)} - X_{t-1}^{i} \rVert_2}{\sigma_i \lambda_r}\right)^{2}\right], \quad l = 1, \ldots, N \quad (2)

The parameter \lambda_r represents the resolution of the video and is obtained by

\lambda_r = \frac{v_h + v_w}{2} \quad (3)

where v_h and v_w denote the height and the width of the frame image, respectively.
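As a concrete illustration, Equations (2) and (3) can be sketched in a few lines of NumPy. The function name and the [cx, cy, scale] state layout are assumptions made for illustration; the original implementation is in MATLAB.

```python
import numpy as np

def motion_prior(candidates, prev_state, sigma_c, sigma_s, frame_h, frame_w):
    """Gaussian motion prior p(X_t^(l) | X_{t-1}) for N candidates (Eq. 2).

    candidates: (N, 3) array of [cx, cy, scale] hypotheses.
    prev_state: length-3 array [cx, cy, scale] of the previous state.
    """
    lam_r = (frame_h + frame_w) / 2.0          # resolution factor (Eq. 3)
    diff = candidates - prev_state             # per-component displacement
    # location term uses sigma_c, scale term uses sigma_s (Eq. 2 product)
    loc = np.linalg.norm(diff[:, :2], axis=1) / (sigma_c * lam_r)
    scl = np.abs(diff[:, 2]) / (sigma_s * lam_r)
    prior = (2 * np.pi * sigma_c**2) ** -0.5 * np.exp(-0.5 * loc**2) \
          * (2 * np.pi * sigma_s**2) ** -0.5 * np.exp(-0.5 * scl**2)
    return prior
```

A candidate identical to the previous state receives the largest prior; candidates far away in location or scale are penalized, with the penalty softened in high-resolution videos through \lambda_r.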

Figure 1 shows part of the experimental comparison of our tracker against other state-of-the-art trackers, including the Visual Tracking Decomposition (VTD) tracker, the PDAT tracker, and the PROST tracker, on the bird2, lemming, transformer, and woman sequences.

Figure 1. Four common challenges encountered in tracking. The results of our tracker (yellow), PROST (green), and PDAT (magenta), together with the remaining compared trackers (red, white, and cyan), are shown as rectangles. Existing state-of-the-art trackers are not able to effectively handle heavy occlusion, large variation of pose and scale, and non-rigid deformation, while our tracker gives more robust results.

Figure 2 illustrates the whole tracking process, and Figure 3 explains Equation 11 in our paper.

Figure 2. Illustration of the use of the confidence map for state prediction during tracking. (a) A new frame at time t. (b) Neighborhood of the target location in the last frame, i.e., at state X_{t-1}. (c) Segmentation result of (b). (d) The computed superpixel confidence map: superpixels colored red indicate a strong likelihood of belonging to the target, and those colored dark blue indicate a strong likelihood of belonging to the background. (e), (f) and (g), (h) show two target candidates with high and low confidence, respectively.

Figure 3. Confidence map. Four target candidate regions corresponding to states X_t^{(i)}, i = 1,...,4, are shown in both the warped image and the confidence map. The candidates' confidence regions M_i, i = 1,...,4, have the same canonical size (upper right) after normalization. Based on the corresponding equation (Equation 1 in our paper), candidates X_t^{(1)} and X_t^{(2)} have similar positive confidence values C_1 and C_2, and candidates X_t^{(3)} and X_t^{(4)} have similar negative confidence values C_3 and C_4. However, candidate X_t^{(2)} covers less of the target area than X_t^{(1)}, and X_t^{(4)} covers more background area than X_t^{(3)}. Intuitively, the target-background confidence of X_t^{(1)} should be higher than that of X_t^{(2)}, while the confidence of X_t^{(4)} should be lower than that of X_t^{(3)}. Both factors are taken into account when computing the confidence map in the tracking stage.
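The normalization step of Figure 3 can be sketched as follows: each candidate's confidence region is resampled to a canonical size so that regions of different sizes are comparable, then aggregated. This is a simplified stand-in for Equation 1 of the paper (not reproduced here); the function name, the nearest-neighbour resampling, and the plain mean are assumptions for illustration.

```python
import numpy as np

def candidate_confidence(conf_map, box, canon=32):
    """Target-background confidence of one candidate region (cf. Fig. 3).

    conf_map: 2-D array of per-pixel superpixel confidences
              (positive = target-like, negative = background-like).
    box: (x, y, w, h) candidate rectangle in conf_map coordinates.
    """
    x, y, w, h = box
    crop = conf_map[y:y + h, x:x + w]
    # nearest-neighbour resampling to the canonical canon x canon grid,
    # so candidates of different sizes contribute comparable sums
    rows = np.linspace(0, h - 1, canon).astype(int)
    cols = np.linspace(0, w - 1, canon).astype(int)
    canonical = crop[np.ix_(rows, cols)]
    return canonical.mean()
```

Under this sketch, a candidate lying fully on target pixels scores higher than one straddling the target boundary, matching the intuition behind C_1 > C_2 and C_3 > C_4 in Figure 3.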

Figure 4 illustrates the entire training process, and Figure 5 shows tracking results compared with two color-based trackers: the mean shift (MS) tracker and the adaptive color-based particle filter (PF).

Figure 4. Illustration of the training process.

Figure 5. Tracking results compared with color-based trackers on the girl, bolt, liquor, basketball, singer1, and woman sequences. The results of the MS tracker, the PF method, and our algorithm are represented by red ellipses, green ellipses, and yellow rectangles, respectively. It is evident that our tracker is able to handle cluttered background (girl and basketball sequences), drastic movement (bolt sequence), heavy occlusion (liquor and woman sequences), and lighting condition change (singer1 sequence).
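The training stage clusters superpixel features with mean shift, the most time-consuming step noted above. A minimal flat-kernel mean shift sketch is given below; this is not the paper's implementation, and the bandwidth, merge rule, and fixed iteration count are assumptions for illustration.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, iters=50):
    """Minimal mean shift mode seeking with a flat (uniform) kernel.

    Each point is iteratively shifted to the mean of the original points
    within its bandwidth; points converging to the same mode share a label.
    """
    shifted = points.astype(float).copy()
    for _ in range(iters):
        for i, p in enumerate(shifted):
            nbrs = points[np.linalg.norm(points - p, axis=1) <= bandwidth]
            shifted[i] = nbrs.mean(axis=0)
    # merge converged points that landed on (nearly) the same mode
    labels = -np.ones(len(points), dtype=int)
    modes = []
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)
```

Unlike k-means, the number of clusters is not fixed in advance, which suits superpixel feature spaces where the number of appearance modes is unknown.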

Figures 6 and 7 show the experimental comparison of our tracker against other state-of-the-art trackers, including the Visual Tracking Decomposition (VTD) tracker and the PROST tracker.

Figure 6. Tracking results compared with other state-of-the-art trackers on the liquor, singer1, and basketball sequences. The results of our tracker (yellow) and the compared methods (red, white, green, blue, and cyan) are shown as rectangles.

Figure 7. Tracking results compared with other state-of-the-art trackers on the bird1, girl, and bolt sequences. The results of our tracker (yellow) and the compared methods (red, white, green, blue, and cyan) are shown as rectangles.

Figure 8 shows the tracking results with figure/ground segmentation across frames of the liquor sequence. The 1st row shows the original images from six different frames, and the 2nd row shows the local target regions based on the previous tracking results. The 3rd row shows the confidence maps of the corresponding local regions, obtained using the appearance model. The 4th row shows the figure/ground segmentation results, obtained by a simple adaptive threshold-based segmentation method. The 5th row shows the final tracking result for each frame.

Figure 8. Tracking results with figure/ground segmentation across frames of the liquor sequence (frames 278, 768, 775, 1287, 1453, and 1736). Rows, from top to bottom: original frames; local target regions based on previous tracking results; confidence maps of the corresponding local regions, obtained with the appearance model; figure/ground segmentation results; final tracking results. Frame 278 shows that the target appearance is well learned by our appearance model: a simple adaptive threshold applied to the confidence map (3rd row) of the local target region (2nd row) yields a clean figure/ground segmentation (4th row). Frames 768 and 775 show the full procedure when the target undergoes complete occlusion by a similar object; the confidence maps (3rd row) show that our appearance model assigns high confidence to the superpixels belonging to the target, and when heavy occlusion occurs the target disappears from the confidence map. Frame 1287 shows another example of successful tracking under full occlusion. Frame 1453 shows that our tracker is robust to target pose changes, still giving accurate results under severe pose variation (4th and 5th rows). Frame 1736 shows that our tracker survives all the challenging factors in the liquor sequence, and our appearance model still gives an accurate segmentation result by the end of the sequence.
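The adaptive threshold-based segmentation of the confidence map can be sketched as follows. The paper only states that a simple adaptive threshold is used; the particular rule below (the midpoint between the mean positive and mean negative confidences) is an assumption for illustration.

```python
import numpy as np

def segment_figure_ground(conf_map):
    """Adaptive threshold-based figure/ground segmentation (cf. Fig. 8, 4th row).

    Pixels above a per-frame threshold are labeled figure (True). The
    threshold adapts to the current confidence map: here, the midpoint
    between the mean confidences of tentatively positive and negative pixels.
    """
    pos = conf_map[conf_map > 0]
    neg = conf_map[conf_map <= 0]
    if pos.size == 0:          # target absent, e.g. under full occlusion
        return np.zeros_like(conf_map, dtype=bool)
    thresh = 0.5 * (pos.mean() + (neg.mean() if neg.size else 0.0))
    return conf_map > thresh
```

Because the threshold is recomputed per frame, the rule degrades gracefully during the full occlusions in Figure 8: when no pixel is confidently target-like, the figure mask is empty rather than noisy.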

The plots from the paper are reproduced at higher resolution in Figure 9.

Figure 9. Quantitative evaluations of our tracker and other state-of-the-art trackers on the lemming, basketball, bolt, bird2, transformer, liquor, woman, bird1, girl, and singer1 sequences. The results of ours, Visual Tracking Decomposition (VTD), and PROST are shown in magenta, black, and green, respectively; the remaining compared trackers are shown in red, blue, and cyan.