Long-Term Moving Object Segmentation and Tracking Using Spatio-Temporal Consistency

Di Zhong and Shih-Fu Chang
{dzhong, sfchang}@ee.columbia.edu
Department of Electrical Engineering, Columbia University, NY, USA

Abstract

The success of object-based media representation and description (e.g., MPEG-4 and 7) depends largely on effective object segmentation tools. In this paper, we expand our previous work on automatic video region tracking and develop a robust moving object detection system. In our system, we first utilize innovative methods of combining color and edge information to improve the object motion estimation results. Then we use long-term spatio-temporal constraints to achieve reliable object tracking over long sequences. Our extensive experiments demonstrate excellent results in handling challenging cases in general domains (e.g., stock footage), including depth-varying multi-layer backgrounds and fast camera motion.

1. Introduction

The newly established MPEG-4 standard has proposed an object-based framework for efficient multimedia representation. Similarly, the upcoming MPEG-7 standard, which aims at offering a comprehensive set of audiovisual description tools, also adopts an object-oriented model to capture information about objects, events, scenes and their relationships. In both standards, segmentation of objects is non-normative and is left to technology developers and researchers. Thus, the success of object-based media representation and description depends largely on effective tools for object segmentation. Although much work has been done in decomposing images into regions with uniform features, we are still lacking robust techniques for segmenting semantic video objects in general video sources.

In our previous work, AMOS [7], we developed a general interactive tool for semantic object segmentation. It can be used in offline applications where object-based compression and indexing is needed. When real-time processing is required, user inputs are usually not feasible or very limited. For example, in broadcast sports or news programs, if we want to parse and summarize video objects and events in real time, automatic object extraction methods are needed.

In this paper, we apply and expand our previous work on automatic video region tracking [6] and develop an automatic moving object tracking system by grouping low-level regions using domain models. Specifically, we look at the motion characteristics of objects and extract salient moving objects from complex scenes. Our main objectives are: real-time, fully automatic, and capable of handling practical situations involving complex scenes. These combined features distinguish our system from existing works.

Except for some special cases (e.g., surveillance videos), common TV programs and home videos usually contain camera motions. In these situations, to detect moving objects, we first need to compensate motions caused by camera operations. As pointed out in [1], the camera-induced image motion depends on the ego-motion parameters (i.e., rotation, zoom and translation) of the camera and the depth of each point in the scene. In general, it is an inherently ambiguous problem to estimate the depth information and these physical parameters. Existing camera motion detection approaches can be generally divided into two classes: 2D algorithms that assume the scene can be approximated by a flat surface, and 3D algorithms that work well only when significant depth variations are preserved in the scene. It has been noticed [1] that in 2D scenes, when the depth variations are not significant, the 3D algorithms are not robust or reliable. On the other hand, 2D algorithms using a 2D global parametric model (e.g., affine model) cannot handle 3D scenes where there are multiple moving layers under camera motions. As depth information is typically not well preserved, 2D algorithms are used more widely than 3D algorithms. When the scene is far from the camera and/or the camera motion only includes rotation and zoom, a single affine motion model can be used to model and compensate the camera-induced motion.
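To make "a single affine motion model" concrete, the following minimal sketch fits a 6-parameter affine model to a set of motion vectors by least squares and measures how well it compensates them. The function names (`solve3`, `fit_affine`, `apply_affine`), the 6-parameter form and the synthetic zoom-plus-pan field are assumptions made for illustration; the paper does not specify its estimator.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def fit_affine(points, flows):
    """Least-squares fit of u = a0 + a1*x + a2*y and v = a3 + a4*x + a5*y."""
    # Normal equations (A^T A) p = A^T t, where each row of A is [1, x, y].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(points, flows):
        row = (1.0, x, y)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atu) + solve3(ata, atv)


def apply_affine(model, x, y):
    """Motion vector predicted by the affine model at pixel (x, y)."""
    a0, a1, a2, a3, a4, a5 = model
    return a0 + a1 * x + a2 * y, a3 + a4 * x + a5 * y


# Synthetic camera zoom plus pan (assumed data): pixel (x, y) moves by
# (0.1*x + 2, 0.1*y - 1), which a single affine model can represent exactly.
points = [(x, y) for x in range(0, 50, 10) for y in range(0, 50, 10)]
flows = [(0.1 * x + 2.0, 0.1 * y - 1.0) for x, y in points]
model = fit_affine(points, flows)

# Largest compensation residual over all sample points.
residual = 0.0
for (x, y), (u, v) in zip(points, flows):
    pu, pv = apply_affine(model, x, y)
    residual = max(residual, abs(pu - u) + abs(pv - v))
```

On such a field the residual is close to zero. With several motion layers in the scene, no single model would fit every pixel, which is the situation the layered approach addresses.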
However, when the scene is close to the camera and the camera is translating, multiple moving planar surfaces may be produced in the image sequence. For example, in Figure 3, the fourth sequence contains many motion layers: the ground, the skater and the wall. In general, the above two scenarios may follow each other in the same video shot with gradual transitions between them.

To manage this problem, many approaches have been proposed that use multiple 2D parametric models to capture multiple motion layers. In [5], affine motion parameters are first estimated from the optical flow by linear regression, and then spatio-temporal segmentation is obtained by clustering in the affine parametric space. In [2], a dominant motion is first estimated by means of a merge procedure. Then motion vectors that can be well represented by this dominant motion model are identified and excluded, and secondary affine parameters are estimated from the remaining blocks. This procedure is repeated until all motion layers are detected. Similar approaches are also reported in [4]. These methods rely only on motion information in grouping image pixels or blocks into motion layers, and thus usually result in inaccurate segmentation on motion boundaries. As there is a strong dependence between motion estimation and layer segmentation, without good segmentation to begin with, motion estimation results will not be accurate.

Another problem in most prior works is that object tracking is not adequately addressed. It is assumed that moving objects detected at individual frames automatically form the temporal object track. In real-world scenes, objects and the camera usually do not have uniform temporal motions. Objects may show obvious motion in some frames, but show slight or even no motion in other frames. This introduces inconsistent detection results in long sequences.

To solve these problems, expanding our region segmentation and tracking algorithms proposed in [6], we develop a two-stage moving object detection method. This method uses regions with accurate boundaries to effectively improve motion estimation results, and uses the temporal constraint to achieve more reliable object tracking results over long sequences. In the rest of the paper, we first give an overview of the system. The two detection stages are discussed in Sections 3 and 4. Experimental results and discussions are given in Section 5.

2. System Overview

The system contains two stages (Figure 1).
In the first stage, we apply an iterative motion layer detection process based on the estimation and merging of affine motion models. Each iteration generates one motion layer. The difference from existing methods is that motion models are estimated from spatially segmented color regions instead of just pixels or blocks.

[Figure 1. Two-stage moving object detection based on region segmentation and tracking: (1) iterative motion layer detection; (2) object extraction using spatio-temporal constraints.]

In the second stage, temporal constraints are applied to detect moving objects in spatial and temporal space. Layers in individual frames are linked together based on characteristics of their underlying regions. One or more layers are declared as moving objects according to specific spatio-temporal consistency rules.

3. Iterative Motion Layer Detection

The iterative layer detection is applied to each individual frame as shown in Figure 2. The initial input to the system includes image regions automatically extracted using color and edge information. First, non-background regions (footnote 1) are merged into motion layers according to their affine motion models, e.g., the 8-parameter ego-motion model. Because different regions that belong to the same motion layer may have different estimated parameters due to inaccuracy in the initial dense motion field, a simple clustering approach in the affine parametric space usually does not work well. To solve this problem, we use the following distance measure to compare two neighboring regions $R_i$ and $R_j$:

$$D(i, j) = \min\left(\mathrm{MCErr}(R_i, M_j),\ \mathrm{MCErr}(R_j, M_i)\right) \qquad (1)$$

where $M_i$ and $M_j$ are the affine motion models of regions $R_i$ and $R_j$ respectively, and $\mathrm{MCErr}(R, M)$ is the motion compensation error of region $R$ under motion model $M$. A region is merged with its closest neighbor if their distance is below a given threshold TH_AFF.

[Figure 2. Iterative motion layer detection procedure. Flowchart labels: video regions; motion-based region merge; detect background layer; foreground layers; background layer.]

After regions are merged into motion layers, we try to identify one background layer in each iteration.
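The merging rule built on Eq. (1) can be sketched as follows. This is only an illustration under stated assumptions: regions are plain dictionaries holding a per-pixel flow and a 6-parameter affine model, `mc_err` measures the mean flow residual as a stand-in for MCErr (the paper does not define how the compensation error is computed), and the threshold value 0.5 is hypothetical.

```python
def apply_affine(model, x, y):
    """Motion vector predicted by a 6-parameter affine model at pixel (x, y)."""
    a0, a1, a2, a3, a4, a5 = model
    return a0 + a1 * x + a2 * y, a3 + a4 * x + a5 * y


def mc_err(region, model):
    """Mean flow residual of `region` under `model` (assumed stand-in for MCErr)."""
    total = 0.0
    for (x, y), (u, v) in region["flow"].items():
        pu, pv = apply_affine(model, x, y)
        total += abs(u - pu) + abs(v - pv)
    return total / len(region["flow"])


def distance(ri, rj):
    """Eq. (1): D(i, j) = min(MCErr(R_i, M_j), MCErr(R_j, M_i))."""
    return min(mc_err(ri, rj["model"]), mc_err(rj, ri["model"]))


def closest_mergeable(regions, neighbors, i, th_aff):
    """Closest neighbor of region i whose distance is below th_aff, else None."""
    best, best_d = None, th_aff
    for j in neighbors[i]:
        d = distance(regions[i], regions[j])
        if d < best_d:
            best, best_d = j, d
    return best


def make_region(pixels, motion, model):
    """Hypothetical helper: a region whose pixels all share one motion vector."""
    return {"flow": {p: motion for p in pixels}, "model": model}


regions = {
    # Regions 0 and 1 belong to the same layer, but region 1 has a poorly
    # estimated model; region 2 is an independently moving object.
    0: make_region([(0, 0), (1, 0), (2, 0)], (2.0, 0.0), (2.0, 0, 0, 0.0, 0, 0)),
    1: make_region([(3, 0), (4, 0)], (2.0, 0.0), (2.6, 0, 0, 0.3, 0, 0)),
    2: make_region([(3, 1), (4, 1)], (-3.0, 1.0), (-3.0, 0, 0, 1.0, 0, 0)),
}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
TH_AFF = 0.5  # hypothetical threshold value

merge_for_0 = closest_mergeable(regions, neighbors, 0, TH_AFF)
merge_for_2 = closest_mergeable(regions, neighbors, 2, TH_AFF)
```

Taking the minimum of the two cross-compensation errors means one well-estimated model is enough to merge two regions. Here the model of region 0 explains the flow of region 1, so the two merge even though the model of region 1 is inaccurate, while region 2 finds no neighbor below the threshold.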
Background identification is based on the assumption that a foreground layer usually has discontinuous motion fields around most of its outer boundaries, while a background layer usually has motion that is continuous across its outer boundaries with neighboring background layers. The boundaries of a layer consist of pixels that have at least one neighboring pixel not belonging to the layer. The outer boundary is the outermost closed curve that contains the whole layer. Let $b_1, \ldots, b_n$ be the $n$ points along the outer boundary of a layer $l$ (pixels on the frame boundary are not considered). We define the following energy function to measure its boundary discontinuity:

$$E_l = \frac{1}{n} \sum_{p = b_1}^{b_n} G(p), \qquad G(p) = \max\left(|p_1 - p_8|,\ |p_2 - p_7|,\ |p_3 - p_6|,\ |p_4 - p_5|\right) \qquad (2)$$

where $p_1, \ldots, p_8$ are the motion vectors of the 8 neighbors of $p$ (numbered clockwise, with $p_1$ at the upper-left corner). This energy function is similar to common edge detection operators such as the Roberts operator. A layer $l$ is detected as a potential background layer only when $E_l$ is smaller than a threshold (e.g., 0.4). If no background layer is detected, the algorithm stops and all remaining regions belong to foreground layers. When there is more than one possible background layer, the largest one is chosen as the background, and its affine motion model is used to compensate non-background regions. Those regions with small compensation errors are classified as background and excluded from the next iteration of layer merging and detection. After multiple iterations, multiple background layers may be produced, while multiple foreground layers remain.

Footnote 1: In the first iteration, all regions are non-background regions.

[Figure 2 labels, continued: Y: detect and exclude background regions; N: moving layers. Solid: layer boundary; dash: region boundary.]

4. Object Extraction Using Spatio-Temporal Constraints

The foreground layers detected at individual frames may be unreliable, for several reasons. First, the motion field and motion models may be inaccurate. Second, and more importantly, a moving object may have noticeable motion in some frames, where it can be easily detected, but in other frames it may be static and be mistakenly treated as background. A long-term decision over a long interval (e.g., a shot) is necessary to remove such errors and achieve reliable results. To apply temporal constraints, we first link foreground layers (i.e., tracking) in individual frames according to their underlying regions.
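Linking layers through their shared regions, the rule formalized in Eq. (3), can be sketched as below. This is a simplified illustration, not the authors' implementation: each layer is assumed to be a set of tracked-region ids (a region tracked across frames keeps its id), links are only formed between adjacent frames, and the names `link_layers`, `keep_long_lived` and `min_frames` are hypothetical. A union-find structure groups linked layers, so that links between A and B and between B and C place A and C in the same temporal layer.

```python
def link_layers(frames):
    """Group per-frame foreground layers into temporal layers.

    `frames` holds one list of layers per frame; each layer is a set of
    tracked-region ids. Returns temporal layers as sorted lists of
    (frame_index, layer_index) pairs.
    """
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for m, layers in enumerate(frames):
        for k in range(len(layers)):
            parent[(m, k)] = (m, k)

    for m in range(1, len(frames)):
        for k, layer in enumerate(frames[m]):
            # Pick the previous-frame layer sharing the most common regions.
            best, best_overlap = None, 0
            for l, prev in enumerate(frames[m - 1]):
                overlap = len(layer & prev)
                if overlap > best_overlap:
                    best, best_overlap = l, overlap
            if best is not None:
                union((m, k), (m - 1, best))

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())


def keep_long_lived(temporal_layers, min_frames):
    """Drop temporal layers that span fewer than `min_frames` distinct frames."""
    return [g for g in temporal_layers if len({m for m, _ in g}) >= min_frames]


# Assumed toy data: one object tracked through three frames, plus a spurious
# layer that appears in frame 0 only.
frames = [
    [{1, 2, 3}, {9}],
    [{1, 2}],
    [{2, 3, 4}],
]
temporal_layers = link_layers(frames)
objects = keep_long_lived(temporal_layers, min_frames=2)
```

The duration filter in `keep_long_lived` corresponds to only one of the validation constraints applied to temporal layers; checks on abrupt changes of position and size, and the morphological clean-up, are not covered by this sketch.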
A foreground layer $L^m$ in frame $m$ is linked with a layer $L^n$ in frame $n$ if the following condition is satisfied:

$$|L^m \cap L^n| = \max_{k,\,l} \left( |L^m_k \cap L^n_l| \right) \qquad (3)$$

where $L^m_k$ and $L^n_l$ are the $k$th and $l$th foreground layers in frames $m$ and $n$ respectively. The maximum is computed over all foreground layers in frames $m$ and $n$. The intersection of two layers in Eq. (3) is defined as the number of common regions they both contain. Two regions in different frames are said to be common if one is tracked by motion projection from the other. In other words, layer $L^m$ in frame $m$ is linked to the layer in a previous frame ($n$) that shares the most common regions. This process is iterated over the foreground layers that remain unlinked. In addition, we define the link as a transitive relationship: if layers A and B are linked and layers B and C are linked, then A and C are also linked. This ensures that each local motion layer belongs to one and only one temporal layer.

The above linking or tracking process results in a number of groups of foreground layers. We refer to these groups as temporal layers below. We use several spatio-temporal constraints to validate these temporal layers. The first one is the duration of a temporal layer. Layers with short duration are likely to be noise or background regions, and are thus dropped. Secondly, the frame-to-frame changes of the center coordinates and sizes of a temporal layer are examined. If there are large abrupt changes, the temporal layer is not a valid track and will not be detected as a foreground object. Finally, a morphological open and close procedure is applied at individual frames to remove small isolated regions and to fill holes within a moving layer.

There are some issues that are not addressed in our approach. For example, temporal occlusion is not considered here. When a moving object is first moving, then occluded by another object or the background, and later appears as a separate moving object again, it will be treated as a new moving object. However, we can use region-based object matching [3] to detect reoccurrence of the same object.

5. Results and Discussion

In Figure 3, each row includes the image of frame #1, and then shows the moving object tracking results at frames #1, #10, #20 and #30. All sequences have depth variance and camera motion (i.e., following the moving objects) in the scenes, resulting in multiple motion layers. The first sequence contains a skater running towards the camera. The ice field has a gradual depth change from near to far. In the second sequence, a person is walking away from the camera in an office. Cubicle walls exist at different depths. The third sequence is a bird's-eye view of a soccer player running in the field. Sequence 4 contains three background layers, which are the ground, wall and crowd. The last sequence contains the sky, the stage and the jumping skier. Note that regions within segmented objects are shown in random colors to demonstrate region segmentation results. One region being tracked at different frames is shown with the same color.

The gradual depth change in sequence 1 does not cause much of a problem, as the ground is merged into one large region in the first, color-based region segmentation stage. In sequence 2, the cubicle walls are tracked as separate regions. Although these regions are classified as foreground motion layers in some frames, their temporal durations are short and they are thus considered background. In the third sequence, both the player and the grass field have gradual depth variances. Similar to the first sequence, color segmentation proves to be useful in handling such situations. The above three sequences show good tracking results. Some small background regions are falsely included in sequence 4. These regions are mainly from the connecting parts of two background regions, and usually have inaccurate motion fields. Some foreground pixels are missed in sequence 5 because small isolated regions are removed in the final morphological operations.

In summary, our experiments demonstrated that the long-term region-based moving object detection approach is more robust and reliable than existing approaches that only use local motion information (e.g., frame-to-frame motion field). The method is designed to automatically detect and track salient moving objects within scenes with multiple motion layers. By using the temporal constraint, we can robustly and accurately segment moving objects over a long period. Our method can also handle objects with discontinuous motion (i.e., moving in some frames and still in other frames).

References:

1. G. Adiv, "Inherent ambiguities in recovering 3D motion and structure from a noisy flow field", IEEE Trans. on Pattern Analysis and Machine Intelligence, 11:447-489, May 1989.
2. G. D. Borshukov, G. Bozdagi, Y. Altunbasak and A. M. Tekalp, "Motion segmentation by multistage affine classification", IEEE Transactions on Image Processing, Vol. 6, No. 11, Nov. 1997.
3. S.-F. Chang, W. Chen, H. Meng, H. Sundaram and D. Zhong, "VideoQ: An Automated Content-Based Video Search System Using Visual Cues", ACM 5th Multimedia Conference, Seattle, WA, Nov. 1997.
4. F. Moscheni, F. Dufaux and M. Kunt, "A new two-stage global/local motion estimation based on a background/foreground segmentation", IEEE Proc. ICASSP 95, Detroit, MI, May 1995.
5. J. Y. A. Wang and E. H. Adelson, "Spatio-temporal segmentation of video data", SPIE Proc. Image and Video Processing II, San Jose, CA, Feb. 1994.
6. D. Zhong and S.-F. Chang, "Video Object Model and Segmentation for Content-Based Video Indexing", ISCAS'97, Hong Kong, June 9-12, 1997.
7. D. Zhong and S.-F. Chang, "AMOS - An Active MPEG-4 Video Object Segmentation System", ICIP-98, Chicago, Oct. 1998.

[Figure 3. Moving object detection and tracking results of five image sequences, rows (1) to (5); detected objects are shown at frames #1, #10, #20 and #30. Test videos are kindly provided by Actions, Sports, Adventures Inc. and Hot Shots Cool Cuts Inc. for research.]