Probabilistic Detection and Tracking of Motion Discontinuities

Michael J. Black, David J. Fleet
Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304
{black,fleet}@parc.xerox.com
http://www.parc.xerox.com/{black,fleet}/

Int. Conf. on Computer Vision (ICCV '99), Corfu, Greece, Sep. 1999. © IEEE 1999

Abstract

We propose a Bayesian framework for representing and recognizing local image motion in terms of two primitive models: translation and motion discontinuity. Motion discontinuities are represented using a non-linear generative model that explicitly encodes the orientation of the boundary, the velocities on either side, the motion of the occluding edge over time, and the appearance/disappearance of pixels at the boundary. We represent the posterior distribution over the model parameters given the image data using discrete samples. This distribution is propagated over time using the Condensation algorithm. To efficiently represent such a high-dimensional space we initialize samples using the responses of a low-level motion discontinuity detector.

1 Introduction

Motion discontinuities provide information about the position and orientation of surface boundaries in a scene. Additionally, analysis of the occlusion/disocclusion of pixels at a motion boundary provides information about the relative depth ordering of the neighboring surfaces. While these properties have made the detection of motion discontinuities an important problem in computer vision, experimental results have been somewhat disappointing. As discussed below, previous approaches have treated motion discontinuities as noise (violations of spatial smoothness) or have used approximate models of the motion discontinuities.

In this paper we formulate a generative model of motion discontinuities as illustrated in Figure 1. The model includes the orientation of the boundary, the velocities of the surfaces on either side, the foreground/background assignment, and an offset of the boundary from the center of the region. With this explicit model, we can predict the visibility of occluded and disoccluded pixels so that these pixels can be excluded when estimating the probability of a particular model. Moreover, an explicit displacement parameter allows us to predict the location of the edge, and hence track its movement through a region of interest. Tracking the motion of the edge allows foreground/background ambiguities to be resolved.

Figure 1: Model of an occlusion boundary, parameterized by foreground and background velocities, $\vec u_f$ and $\vec u_b$, an orientation $\theta$ with normal $\vec n$, and a signed distance $d$ from the neighborhood center $\vec x_c$. With this model we predict which pixels are visible between frames at times $t$ and $t-1$.

Explicit generative models such as this have not previously been used for detecting motion discontinuities due to the non-linearity of the model and the difficulty of estimating the model parameters. To solve this problem we exploit a probabilistic sampling-based method for estimating image motion [6]. Adopting a Bayesian framework, we define the likelihood of observing the image data given the parameters of the generative model. This likelihood distribution can be efficiently evaluated for a particular set of parameters. The prior probability distribution over the parameters is defined as a mixture of a temporal prior and an initialization prior. The temporal prior is defined in terms of the posterior distribution at the previous time instant and the temporal dynamics of the discontinuity model. The initialization prior incorporates predictions from a low-level motion feature detector [8].

The posterior distribution over the parameter space, conditioned on image measurements, is typically non-Gaussian. The distribution is represented using factored sampling and is predicted and updated over time using the Condensation algorithm to propagate conditional probability densities [12]. Given the relatively high dimensional parameter space, naive sampling methods will be extremely inefficient. But if the samples can be directed to the appropriate portion of the parameter space, small numbers of samples can well characterize such distributions [14]. It is for this purpose that we use an initialization prior (as shown in Figure 2).

Figure 2: Multi-stage probabilistic model. Low-level detectors help initialize a sampled distribution. Likelihoods are computed directly from pairs of images. The Condensation algorithm is used to incrementally predict and update the posterior distribution over model parameters.

At the low level, there is a set of dense motion discontinuity detectors that signal the presence of potential occlusion boundaries and give estimates of their orientations and velocities. In the example here, these detectors are based on approximate linear models of motion discontinuities, the coefficients of which can be estimated with robust optical flow techniques [8]. Neighborhoods of these filter outputs provide a prior distribution over model parameters that is sampled from when initializing non-linear models at the higher level. The likelihood of the non-linear model is then computed directly from the image data.

We illustrate the method on natural images and show how the Bayesian formulation and conditional density propagation allow motion discontinuities to be detected and tracked over multiple frames. Explicit image feature models, combined with a Bayesian formulation and the Condensation algorithm, offer a new set of tools for image motion estimation and interpretation.

2 Previous Work

Previous approaches for detecting occlusion boundaries have often treated the boundaries as a form of noise, that is, as the violation of a smoothness assumption. This approach is taken in regularization schemes where robust statistics, weak continuity, or line processes are used to disable smoothing across motion discontinuities [4, 11]. Similarly, parameterized models of image motion (e.g. translational, affine, or planar) assume that flow is represented by a low-order polynomial. Robust regression [4, 19] and mixture models [1, 15, 23] have been used to account for the multiple motions that occur at motion boundaries, but these methods fail to explicitly model the boundary and its spatiotemporal structure.

Numerous methods have attempted to detect discontinuities in optical flow fields by analyzing local distributions of flow [21] or by performing edge detection on the flow field [18, 20, 22]. It has often been noted that these methods are sensitive to the accuracy of the optical flow and that accurate optical flow is hard to estimate without prior knowledge of the occlusion boundaries. Other methods have focused on detecting occlusion from the structure of a correlation surface [3], or of the spatiotemporal brightness pattern [7, 9, 17]. Still others have used the presence of unmatched features to detect dynamic occlusions [16].

None of these methods explicitly models the image motion present at a motion feature, and they have not proved sufficiently reliable in practice. For example, they do not explicitly model which image pixels are occluded or disoccluded between frames. This means that these pixels, which in one frame have no match in the next frame, are treated as noise. With our explicit non-linear model, these pixels can be predicted and ignored. Additionally, most of the above methods have no explicit temporal model. With our generative model, we can predict the motion of the occlusion boundary over time and hence integrate information over a number of frames. When the motion of the discontinuity is consistent with that of the foreground we can explicitly determine the foreground/background relationships between the surfaces.

3 Generative Model

For the purposes of this work, we decompose an image into a grid of circular neighborhoods in which we estimate motion information. We assume that the motion in any region can be modeled by translation (for simplicity) or by dynamic occlusion.
Generative models of these motions are used to compute the likelihood of observing two successive images given a motion model and its parameter values.

The translation model has two parameters, i.e., the horizontal and vertical components of the velocity, denoted $\vec u_0 = (u_0, v_0)$. For points $\vec x$ at time $t$ in a region $R$, assuming brightness constancy, the translation model is

$$I(\vec x', t) = I(\vec x, t-1) + \nu(\vec x, t), \tag{1}$$

where $\vec x' = \vec x + \vec u_0$. In words, the intensity at location $\vec x'$ at time $t$ is equal to that at location $\vec x$ at time $t-1$ plus noise $\nu$. We assume here that the noise is white and Gaussian with a mean of zero and a standard deviation of $\sigma_n$.

The occlusion model contains 6 parameters: the edge orientation, the two velocities, and the distance from the center of the neighborhood to the edge. In our parameterization, as shown in Figure 1, the orientation, $\theta \in [-\pi, \pi)$, specifies the direction of a unit vector, $\vec n = (\cos\theta, \sin\theta)$, that is normal to the occluding edge. We represent the location of the edge by its signed perpendicular distance $d$ from the center of the region (positive meaning in the direction of the normal). Thus, the edge is normal to $\vec n$ and passes through the point $\vec x_c + d\,\vec n$, where $\vec x_c$ is the center of the region. Relative to the center of the region, we define the foreground to be the side to which the normal $\vec n$ points. Therefore, a point $\vec x$ is on the foreground if $(\vec x - \vec x_c)\cdot\vec n > d$. Similarly, points on the background satisfy $(\vec x - \vec x_c)\cdot\vec n < d$. Finally, we denote the velocities of the foreground (occluding) and background (occluded) sides by $\vec u_f$ and $\vec u_b$.

Assuming that the occluding edge moves with the foreground velocity, the occurrences of occlusion and disocclusion depend solely on the difference between the background and foreground velocities. In particular, occlusion occurs when the background moves faster than the foreground in the direction of the edge normal. If $u_{fn} = \vec u_f\cdot\vec n$ and $u_{bn} = \vec u_b\cdot\vec n$ denote the two normal velocities, occlusion occurs when $u_{bn} - u_{fn} > 0$. Disocclusion occurs when $u_{bn} - u_{fn} < 0$. The width of the occluded/disoccluded region, measured normal to the occluding edge, is $|u_{bn} - u_{fn}|$.

With this model, parameterized by $(\theta, \vec u_f, \vec u_b, d)$, a pixel $\vec x$ at time $t-1$ moves to location $\vec x'$ at time $t$, as follows:

$$\vec x' = \begin{cases} \vec x + \vec u_f & \text{if } (\vec x - \vec x_c)\cdot\vec n > d \\ \vec x + \vec u_b & \text{if } (\vec x - \vec x_c)\cdot\vec n < d - w \end{cases} \tag{2}$$

where $w = \max(u_{bn} - u_{fn}, 0)$ is the width of the occluded region. Background pixels with $d - w \le (\vec x - \vec x_c)\cdot\vec n < d$ are covered by the foreground at time $t$ and have no correspondence. Finally, with $\vec x'$ defined by (2), the brightness constancy assumption for a motion edge is given by (1).

Referring to Figure 1, in the case of disocclusion, a circular neighborhood at time $t-1$ will map to a pair of regions at time $t$, separated by the width of the disocclusion region $|u_{bn} - u_{fn}|$. Conversely, in the case of occlusion, a pair of neighborhoods at time $t-1$, separated by $|u_{bn} - u_{fn}|$, map to a circular neighborhood at time $t$.
Being able to look forward or backwards in time in this way allows us to treat occlusion and disocclusion symmetrically.

4 Probabilistic Framework

For a given image region and images up to time $t$, we wish to estimate the posterior probability distribution over models and model parameters at time $t$. This distribution is not directly observable and, as described below, we expect it to be multi-modal. It is discrete over the model types and continuous over model parameters.

Let states be denoted by $s_t = (\mu, \vec p)$, where $\mu$ is the model type (translation or occlusion), and $\vec p$ is a parameter vector appropriate for the model type. For the translation model $\vec p = (\vec u_0)$, and for the occlusion model $\vec p = (\theta, \vec u_f, \vec u_b, d)$. Our goal is to find the posterior probability distribution over states at time $t$ given the measurement history up to time $t$, i.e., $p(s_t \mid \vec Z_t)$. Here, $\vec Z_t = (z_t, \ldots, z_0)$ denotes the measurement history. Similarly, let $\vec S_t = (s_t, \ldots, s_0)$ denote the state history (a stochastic process).

Following [12], we assume that the temporal dynamics of the motion models form a Markov chain, in which case $p(s_t \mid \vec S_{t-1}) = p(s_t \mid s_{t-1})$. We also assume conditional independence of the observations and the dynamics, so that, given $s_t$, the current observation $z_t$ and previous observations $\vec Z_{t-1}$ are independent. With these assumptions one can show that the posterior distribution $p(s_t \mid \vec Z_t)$ can be factored and reduced using Bayes' rule to

$$p(s_t \mid \vec Z_t) = k\, p(z_t \mid s_t)\, p(s_t \mid \vec Z_{t-1}) \tag{3}$$

where $k$ is a constant factor to ensure that the distribution integrates to one. Here, $p(z_t \mid s_t)$ represents the likelihood of observing the current measurement given the current state, while $p(s_t \mid \vec Z_{t-1})$ is referred to as a temporal prior (the prediction of the current state given all previous observations).

According to the generative models discussed above (1), the likelihood of observing the current image pair given the current state is normally distributed. The current state defines a mapping from visible pixels in one frame to those in the next.
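The mapping from visible pixels in one frame to those in the next can be sketched in a few lines of NumPy. This is only an illustration of the occlusion model of Eq. (2), not the authors' implementation: the function name `warp_occlusion` and its argument layout are hypothetical, and the sketch assumes that a background pixel counts as visible only when it lies more than the occlusion width $w$ behind the edge.

```python
import numpy as np

def warp_occlusion(points, center, theta, u_f, u_b, d):
    """Map pixel locations at time t-1 to time t under the occlusion model.

    Returns (warped, visible). warped[i] is the predicted location of
    points[i] at time t. visible[i] is False for background pixels that
    the foreground covers between the two frames; those rows of warped
    are left unchanged because they have no correspondence.
    """
    points = np.asarray(points, dtype=float)
    u_f = np.asarray(u_f, dtype=float)
    u_b = np.asarray(u_b, dtype=float)
    n = np.array([np.cos(theta), np.sin(theta)])   # unit normal, points to foreground
    u_fn, u_bn = u_f @ n, u_b @ n                  # normal velocities
    w = max(u_bn - u_fn, 0.0)                      # width of the occluded strip
    s = (points - np.asarray(center, dtype=float)) @ n  # signed distance along n
    foreground = s > d
    background = s < d - w                         # assumption: strip of width w is hidden
    warped = points.copy()
    warped[foreground] += u_f
    warped[background] += u_b
    return warped, foreground | background
```

With a vertical edge through the center (`theta = 0`, `d = 0`), a stationary foreground and a background moving at 2 pixels per frame toward the edge, a background pixel one pixel behind the edge is flagged as occluded, while one three pixels behind it is moved to one pixel behind the edge.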
The intensity difference between a corresponding pair of pixel locations is taken to be normally distributed with a mean of zero and a standard deviation of $\sigma_n$.

Using Bayes' rule and the conditional independence assumed above, it is straightforward to show that the temporal prior can be written in terms of temporal dynamics that propagate states from time $t-1$ to time $t$ and the posterior distribution over states at time $t-1$. In particular,

$$p(s_t \mid \vec Z_{t-1}) = \int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid \vec Z_{t-1})\, d s_{t-1}. \tag{4}$$

The probability distribution $p(s_t \mid s_{t-1})$ embodies the temporal dynamics, describing how states evolve through time. For now assume that the model type (i.e. translation or occlusion) remains constant between frames (this can be extended to allow transitions between model types [5, 13]). For the translational model, we assume the velocity at time $t$ equals that at time $t-1$ plus Gaussian noise:

$$p(s_t \mid s_{t-1}) = G_{\sigma_u}(\Delta \vec u_0) \tag{5}$$

where $G_{\sigma_u}$ denotes a mean-zero Gaussian with standard deviation $\sigma_u$, and $\Delta \vec u_0 = \vec u_{0,t} - \vec u_{0,t-1}$ denotes the temporal velocity difference. Similarly, for the temporal dynamics of the occlusion model we assume that the expected orientation and velocities remain constant, while the location of the edge propagates with the velocity of the foreground. Moreover, independent noise is added to each. Therefore, we can express the conditional $p(s_t \mid s_{t-1})$ as

$$G_{\sigma_u}(\Delta \vec u_f)\; G_{\sigma_u}(\Delta \vec u_b)\; G_{\sigma_d}(\Delta d - \vec n\cdot\vec u_{f,t-1})\; G^w_{\sigma_\theta}(\Delta\theta) \tag{6}$$

where $G^w$ denotes a wrapped-normal (for circular distributions), and as above, $\Delta\theta = \theta_t - \theta_{t-1}$ and $\Delta d = d_t - d_{t-1}$.

Figure 3: One frame of the Pepsi Sequence, with responses from the low-level motion edge detector, which feed the initialization prior. The motion is primarily horizontal. Panels: Image, Orientation, Mean Horizontal Velocity, Horizontal Velocity Difference.

5 Low-Level Motion-Edge Detectors

Unlike a conventional Condensation tracker for which the prior is derived by propagating the posterior from the previous time, we have added an initialization prior that provides a form of bottom-up information to initialize new states. This is useful at time 0 when no posterior is available from a previous time instant. It is also useful to help avoid getting trapped at local maxima, thereby missing the occurrence of novel events that might not have been predicted from the posterior at the previous time. This use of bottom-up information, along with the temporal prediction of Condensation, allows us to effectively sample the most interesting portions of the state-space.

To initialize new states and provide a distribution over their parameters from which to sample, we use a method described by Fleet et al. [8] for detecting motion discontinuities. This approach uses a robust, gradient-based optical flow method with a linear parameterized motion model. Motion edges are expressed as a weighted sum of basis flow fields, the coefficients of which are estimated using an area-based regression technique. Fleet et al. then solve for the parameters of the motion edge that are most consistent (in a least squares sense) with the linear coefficients.

Figure 3 shows an example of applying this method to an image sequence in which a Pepsi can translates horizontally relative to the background. The method provides a mean velocity estimate at each pixel (i.e., the average of the velocities on each side of the motion edge). This is simply the translational velocity when no motion edge is present.
A confidence measure, $c(\vec x) \in [0, 1]$, can be used to determine where edges are most likely, and is computed from the squared error in fitting a motion edge using the linear coefficients. The bottom two images in Figure 3 show estimates for the orientation of the edge and the horizontal difference velocity across the edge at all points where $c(\vec x) > 0.5$.

While the method provides good approximate estimates of motion boundaries, it produces false positives and the parameter estimates are noisy, with estimates of disocclusion being more reliable than those of occlusion. Also, it does not determine which is the foreground side and hence does not predict the velocity of the occluding edge. Despite these weaknesses, it is a relatively quick, but sometimes error prone, source of information about the presence of motion discontinuities.

Initialization Prior. When initializing a new state we use the distribution of confidence values $c(\vec x)$ to first decide on the motion type (translation or discontinuity). If a discontinuity was detected, we would expect some fraction of confidence values, $c(\vec x)$, within our region of interest, to be high. We therefore rank order the confidence values within the region and let the probability of a discontinuity state be the 95th percentile confidence value, denoted $C_{95}$. Accordingly, the probability of translation is then $1 - C_{95}$.

Given a discontinuity model, we assume that motion boundary locations are distributed according to the confidence values in the region (pixel locations with large $c(\vec x)$ are more likely). Given a spatial position, the detector at that position provides estimates of the edge orientation and the image velocity on each side, but does not specify which side is the foreground. Thus, the probability distribution over the state space, conditioned on the detector estimates and location, will have two distinct modes, one for each of the two possible foreground assignments. We take this distribution to be a mixture of two Gaussians which are separable with standard deviations $1.5\sigma_u$ for the velocity axes, $5\sigma_\theta$ for the orientation axis, and $2\sigma_d$ for the position axis.
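The discontinuity branch of this initialization prior might be sketched as follows. The names `discontinuity_probability`, `sample_discontinuity` and `wrap_angle`, the array layout of the detector outputs, and the way the second mode is built (reversing the normal and swapping the two velocities) are assumptions made for illustration; the paper does not spell these details out. The standard deviations are passed in directly rather than hard-coded.

```python
import numpy as np

def wrap_angle(theta):
    """Wrap an angle to the interval [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi

def discontinuity_probability(conf):
    """C95: the 95th-percentile detector confidence within the region."""
    return float(np.percentile(conf, 95))

def sample_discontinuity(xy, center, conf, theta_hat, u_pos, u_neg,
                         std_u, std_theta, std_d, rng):
    """Draw (theta, u_f, u_b, d) for one new discontinuity state.

    xy (N, 2) pixel positions, conf (N,) confidences c(x), theta_hat (N,)
    detector orientations, u_pos / u_neg (N, 2) detector velocities on the
    side the detector normal points to / away from (hypothetical layout).
    """
    # Edge location: pixels with large c(x) are more likely.
    i = rng.choice(len(conf), p=conf / conf.sum())
    theta, u_f, u_b = theta_hat[i], u_pos[i], u_neg[i]
    # Equal mixture over the two foreground assignments. Swapping the
    # assignment reverses the normal, since the normal points to the
    # foreground (an assumption of this sketch).
    if rng.random() < 0.5:
        theta, u_f, u_b = theta + np.pi, u_neg[i], u_pos[i]
    n = np.array([np.cos(theta), np.sin(theta)])
    d = (xy[i] - center) @ n            # signed distance of the edge from the center
    # Separable Gaussian noise around the selected mode.
    theta = wrap_angle(theta + rng.normal(0.0, std_theta))
    u_f = u_f + rng.normal(0.0, std_u, size=2)
    u_b = u_b + rng.normal(0.0, std_u, size=2)
    d = d + rng.normal(0.0, std_d)
    return theta, u_f, u_b, d
```

A caller would first draw the model type, choosing a discontinuity with probability `discontinuity_probability(conf)`, and call `sample_discontinuity` only in that case, with `rng = np.random.default_rng()`.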
The standard deviations are larger than those used in the temporal dynamics described in Section 4 because we expect greater noise from the low-level estimates.

To generate a translational model, we choose a spatial position according to the distribution of $1 - c(\vec x)$. The distribution over translational velocities, given the detector estimate and spatial position, is then taken to be a Gaussian distribution centered at the mean velocity estimate of the detector at that location. The Gaussian distribution has standard deviations of $1.5\sigma_u$ along each velocity axis.

6 Computational Model

In this section we describe the computational embodiment of the probabilistic framework above. The non-linear nature of the discontinuity model means that $p(s_t \mid \vec Z_t)$ will not be Gaussian and we represent this distribution using a discrete set, $\{s_t^{(i)};\ i = 1, \ldots, S\}$, of random samples [6, 12, 24]. The posterior is computed by choosing discrete samples from the prior and then evaluating their likelihood. Normalizing the likelihoods of the samples so that they sum to one produces weights $\pi_t^{(n)}$:

$$\pi_t^{(n)} = \frac{p(z_t \mid s_t^{(n)})}{\sum_{i=1}^{S} p(z_t \mid s_t^{(i)})}.$$

The set of $S$ pairs, $(s_t^{(n)}, \pi_t^{(n)})$, provides a fair sampled representation of the posterior distribution as $S \to \infty$ [12].

6.1 Likelihood

To evaluate the likelihood $p(z_t \mid s_t^{(i)})$ of a particular state, we draw a uniform random sample $R$ of visible image locations (as constrained by the generative model and the current state). Typically we sample 50% of the pixels in the region (cf. [2]). Given this subset of pixels, we compute the likelihood as

$$p(z_t \mid s_t^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[ -\frac{1}{2\sigma_n^2 |R|} \sum_{\vec x \in R} E(\vec x, t; s_t^{(i)})^2 \right] \tag{7}$$

where $E(\vec x, t; s_t^{(i)}) = I(\vec x', t) - I(\vec x, t-1)$, $|R|$ is the number of pixels in $R$, and the warped image location $\vec x'$ is a function of the state $s_t^{(i)}$, e.g., as in (2). The warped image value $I(\vec x', t)$ is computed using bi-linear interpolation.

6.2 Prior

The prior used here is a mixture of the temporal prior and the initialization prior. In the experiments that follow we use mixture proportions of 0.8 and 0.2 respectively; that is, 80% of the samples are drawn from the temporal prior.

Temporal Prior. We draw samples from the temporal prior by first sampling from the posterior, $p(s_{t-1} \mid \vec Z_{t-1})$, to choose a particular state $s_{t-1}^{(n)}$. The discrete representation of the posterior means that this is done by constructing a cumulative probability distribution using the $\pi_{t-1}^{(n)}$ and then sampling from it. Given $s_{t-1}^{(n)}$, we then sample from the dynamics, $p(s_t \mid s_{t-1}^{(n)})$, which, as explained above, is a normal distribution about the predicted state. This is implemented using the following dynamics which propagate the state forward and add a sample of Gaussian noise:

$$\vec u_t = \vec u_{t-1} + N(0, \sigma_u) \tag{8}$$
$$d_t = d_{t-1} + \vec n\cdot\vec u_{f,t-1} + N(0, \sigma_d) \tag{9}$$
$$\theta_t = [\theta_{t-1} + N(0, \sigma_\theta)] \bmod 2\pi \tag{10}$$

where $\vec u_t$ denotes any one of $\vec u_{0,t}$, $\vec u_{f,t}$, or $\vec u_{b,t}$. With a wrapped-normal distribution over angles, the orientation $\theta_{t-1}$ is propagated by adding Gaussian noise and then removing an integer multiple of $2\pi$ so that $\theta_t \in [-\pi, \pi)$.
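As a rough illustration of drawing one occlusion state from the temporal prior, the sketch below selects a previous state through the cumulative distribution of the weights and then applies the dynamics of Eqs. (8) to (10). The function name `sample_temporal_prior` and the dictionary keys used to hold a state are hypothetical choices for this example, not part of the original method description.

```python
import numpy as np

def sample_temporal_prior(states, weights, sigma_u, sigma_d, sigma_theta, rng):
    """Draw one occlusion state for time t from the temporal prior.

    states:  list of dicts with keys "theta", "u_f", "u_b", "d" (time t-1)
    weights: normalized weights pi_{t-1}^{(n)} of those states
    """
    # Select a previous state by inverting the cumulative distribution.
    cdf = np.cumsum(weights)
    n = int(np.searchsorted(cdf, rng.random() * cdf[-1], side="right"))
    s = states[min(n, len(states) - 1)]
    normal = np.array([np.cos(s["theta"]), np.sin(s["theta"])])
    # Eq. (8): velocities diffuse with Gaussian noise.
    u_f = s["u_f"] + rng.normal(0.0, sigma_u, size=2)
    u_b = s["u_b"] + rng.normal(0.0, sigma_u, size=2)
    # Eq. (9): the edge moves with the normal component of the foreground velocity.
    d = s["d"] + normal @ s["u_f"] + rng.normal(0.0, sigma_d)
    # Eq. (10): add noise, then wrap the orientation back into [-pi, pi).
    theta = s["theta"] + rng.normal(0.0, sigma_theta)
    theta = (theta + np.pi) % (2.0 * np.pi) - np.pi
    return {"theta": theta, "u_f": u_f, "u_b": u_b, "d": d}
```

Calling this function once per sample, for the 80% of samples assigned to the temporal prior, would produce the predicted sample set whose likelihoods are then evaluated.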
This sampling process has the effect of diffusing the parameters of the states over time to perform a local search of the parameter space.

Initialization Prior. From the initialization prior for a particular neighborhood, discontinuity states are drawn with probability $C_{95}$. In these instances, we sample from the discrete set of confidence values, $c(\vec x)$, to choose a spatial position for the motion edge within the region. Then, as explained above, since the assignment of foreground/background is ambiguous, the distribution of the remaining parameters is taken to be an equal mixture of two Gaussians from which a sample is drawn. Translational models are drawn with probability $1 - C_{95}$, in which case we sample a spatial position according to the distribution of $1 - c(\vec x)$. We then sample from a Gaussian distribution over image velocities that is centered at the mean velocity estimate of the detector.

6.3 Algorithm Summary

Initially, at time 0, a set of $S$ samples is drawn from the initialization prior, and their likelihoods are computed and normalized to give the weights $\pi_0^{(n)}$. At each subsequent time, the algorithm then repeats the process of sampling from the combined prior, computing the likelihoods, and normalizing.

Note that given the sampled approximation to the distribution $p(s_t \mid \vec Z_t)$, we can compute the expected value for some state parameter, $f(s_t)$, as

$$E[f(s_t) \mid \vec Z_t] = \sum_{n=1}^{S} f(s_t^{(n)})\, \pi_t^{(n)}.$$

Care needs to be taken in computing this for the orientation as there are often 2 modes 180 degrees apart. For displaying results, we compute the mean state for each type of model (translation or discontinuity) by computing the expected value of the parameters of the state divided by the sum of all normalized likelihoods for that model type. These mean states can be overlaid on the image.

Detection can be performed by comparing the sum of the likelihoods for each model type. Given the way the Condensation algorithm allocates samples, this is not the most reliable measure of how well each model fits the data. If the likelihood of a model drops rapidly, the distribution may temporarily have many low likelihood states allocated to that portion of the state space.
The combined likelihood of these states may easily be greater than the likelihood of a new model that does a much better job of fitting the data. We therefore instead compute and compare the likelihoods of the mean models to determine which model is more likely.

7 Experimental Results

We illustrate the method with experiments on natural images. For these experiments, the standard deviation, $\sigma_n$, of the image noise was 7.0, we use circular image regions with a 16 pixel radius, and we use 3500 state samples to represent the distribution in each image region.
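To make the role of $\sigma_n$ concrete, the likelihood of Eq. (7) can be sketched as below, with the default `sigma_n=7.0` taken from the experimental setting above. The helper names `bilinear` and `likelihood`, the `(x, y)` column order, and the clamping of samples at the image border are assumptions of this sketch. The caller is expected to pass only the sampled visible locations and their warped positions.

```python
import numpy as np

def bilinear(image, xy):
    """Bilinearly interpolate image (H, W) at float locations xy (N, 2) = (x, y)."""
    xy = np.asarray(xy, dtype=float)
    h, w = image.shape
    x = np.clip(xy[:, 0], 0.0, w - 1.0)      # assumption: clamp at the border
    y = np.clip(xy[:, 1], 0.0, h - 1.0)
    x0 = np.minimum(np.floor(x).astype(int), w - 2)
    y0 = np.minimum(np.floor(y).astype(int), h - 2)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0, x0] + ax * image[y0, x0 + 1]
    bot = (1 - ax) * image[y0 + 1, x0] + ax * image[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bot

def likelihood(img_prev, img_curr, xy, xy_warped, sigma_n=7.0):
    """Eq. (7) for sampled visible locations xy (time t-1) and their warps (time t)."""
    err = bilinear(img_curr, xy_warped) - bilinear(img_prev, xy)   # E(x, t; s)
    mean_sq = np.sum(err ** 2) / len(err)                          # (1/|R|) sum of E^2
    return np.exp(-mean_sq / (2.0 * sigma_n ** 2)) / (np.sqrt(2.0 * np.pi) * sigma_n)
```

For a translational state, `xy_warped` is simply `xy` plus the velocity; for a discontinuity state it would come from the mapping of Eq. (2), restricted to visible pixels.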

Figure 4: Pepsi Sequence. Discontinuity (filled) and translational (empty) models shown superimposed on image. Various marginal probability distributions for each region. Plotted marginals: A: velocity; B: orientation; C: velocity $u_0$; D: offset $d$.

7.1 Pepsi Sequence

Since we represent distributions over models and model parameters, it is often difficult to visualize the results. Figure 4 shows marginal probability distributions for different parameters at various image locations. For each region we indicate which model was most likely. Translational models are shown as empty circles (Figure 4 C). For discontinuities, we compute the mean state, then sample pixel locations from the generative model; pixels that lie on the foreground are shown as white and background pixels as black. Figure 4 A, B, and D show three such regions.

To the right of the image are views of the marginal probability distributions for various parameters which illustrate the complexity of the posterior. Shown for region A is the probability of the horizontal velocity of the foreground; the ambiguity regarding foreground/background means that there are two possible interpretations at 1.7 and 0.8 pixels per frame. The distribution has multiple peaks, two of which correspond to these interpretations and both of which are significant. The foreground/background ambiguity is pronounced in region B where the motion is parallel to the edge; this results in strong bi-modality of the distribution with respect to the edge normal, $\theta$, since the direction of the normal points towards the foreground. In region C there is no ambiguity with respect to the horizontal velocity of the translation model, as shown by the tight peak in the distribution. Finally we plot the offset of the edge, $d$, for region D. In this case, the distribution is also non-Gaussian and skewed to one side of the boundary.
Figure 5 illustrates the temporal behavior of the method. Note that the correct assignment of model type is made at each frame, the mean orientation appears accurate, and the boundaries of the Pepsi can are tracked. In the first frame, the assignment of foreground for region A is incorrect, which is not surprising given that it is ambiguous from two frames. By propagating information over time, however, the initial ambiguities are resolved since the motion of the edge over time is consistent with the foreground being on the right. Note that the ambiguity remains for region B, as can be seen by inspecting the distribution in Figure 4, despite the fact that the displayed mean value matches the correct interpretation. In general, propagation of neighboring distributions would be needed to resolve such ambiguities.

[Figure 6: Low level detector responses for one pair of frames in the Flower Garden sequence. Panel labels: Horizontal Velocity, Confidence, Horizontal Velocity Difference, Edge Orientation.]

7.2 Flower Garden Sequence

Results on the Flower Garden sequence are shown in Figure 7. The low-level detector responses for the initialization prior are shown in Figure 6; they provide reasonable initial hypotheses but do not precisely localize the edge.

Figure 7 shows the results of our method within several image regions. Regions C, D, E, and F correctly model the tree boundary (both occlusion and disocclusion) and, after multiple frames, correctly assign the tree region to the foreground. Note that the orientation of the edge correctly matches that of the tree and that, after the edge passes through the region, the best model switches from discontinuity to translation. The bottom of Figure 7 shows the probability distribution corresponding to the horizontal velocity of the foreground in region C. At frames 2 and 3, there are two clear peaks corresponding to the motions of the foreground and background, indicating that this relationship is ambiguous. Frames 4-6 illustrate how the distribution has formed around the correct motion corresponding to the foreground. By frame 7, the peak has diminished as the translation model becomes more likely.
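The temporal behavior described above, where a two-peaked velocity distribution collapses onto one peak over several frames, can be reproduced in miniature with a generic predict, weight, and resample loop. The sketch below is only an illustration of that mechanism. The scalar state, the Gaussian likelihood, the diffusion noise, and all numeric values are assumptions, and it does not implement the generative image model or the temporal dynamics used in the experiments.

```python
import numpy as np


def propagate(particles, observation, obs_sigma, diffusion_sigma, rng):
    """One predict / weight / resample step of a basic particle filter."""
    # Predict: diffuse each sample with Gaussian process noise.
    predicted = particles + rng.normal(0.0, diffusion_sigma, size=particles.shape)
    # Weight: Gaussian likelihood of the observation given each sample.
    log_w = -0.5 * ((observation - predicted) / obs_sigma) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Resample with replacement in proportion to the weights.
    idx = rng.choice(predicted.size, size=predicted.size, p=w)
    return predicted[idx]


rng = np.random.default_rng(0)
# Ambiguous start: half the samples near each of two hypothetical velocities.
particles = np.concatenate(
    [rng.normal(-2.0, 0.1, 1750), rng.normal(1.0, 0.1, 1750)]
)

# Fraction of samples supporting the velocity near +1, recorded per frame.
history = [float(np.mean(particles > -0.5))]
for _ in range(10):
    # Each frame is only weakly informative (obs_sigma is large), so the
    # ambiguity is resolved gradually rather than in a single step.
    particles = propagate(particles, observation=1.0, obs_sigma=3.0,
                          diffusion_sigma=0.05, rng=rng)
    history.append(float(np.mean(particles > -0.5)))

print(history)
```

Each frame shifts only a modest amount of probability mass, but the evidence accumulates across frames, which mirrors how the foreground assignment becomes unambiguous only after several frames.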

[Figure 5: Pepsi Sequence. Top: mean states at frames 1, 3, 4, 7, and 9. Bottom: detail showing region A.]

Region B corresponds to translation and is correctly modeled as such. While translation can be equally well accounted for by the discontinuity model, the low-level detectors do not respond in this region and hence the distribution is initialized with more samples corresponding to the translational model. Region A is more interesting; if the sky were completely uniform, this region would also be modeled as translation. Note, however, that there are significant low-level detector responses in this area (Figure 6) due to the fact that the sky is not uniform. The probabilities of translation and discontinuity are roughly equal here and the displayed model flips back and forth between them. For the discontinuity model, the orientation corresponds to the orientation of the tree branches in the region.

8 Conclusions

Work on image motion estimation has typically exploited limited models of spatial smoothness. Our goal is to move towards a richer description of image motion using a vocabulary of motion primitives. Here we describe a step in that direction with the introduction of an explicit non-linear model of motion discontinuities and a Bayesian framework for representing a posterior probability distribution over models and model parameters. Unlike previous work that attempts to find a maximum-likelihood estimate of image motion, we represent the probability distribution over the parameter space using discrete samples. This facilitates the correct Bayesian propagation of information over time when ambiguities make the distribution non-Gaussian.

The applicability of discrete sampling methods to high dimensional spaces, as explored here, remains an open issue. We find that an appropriate initialization prior is needed to direct samples to the portions of the state space where the solution is likely.
We have proposed and demonstrated such a prior here but the more general problem of formulating such priors and incorporating them into a Bayesian framework remains open.

This work represents what we expect to be a rich area of inquiry. For example, we can now begin to think about the spatial interaction of these local models. For this we might formulate a probabilistic spatial grammar of motion features and how they relate to their neighbors in space and time. This requires incorporating the spatial propagation of probabilities in our Bayesian framework. This also raises the question of what is the right vocabulary for describing image motion and what role learning may play in formulating local models and in determining spatial interactions between them (see [10]). In summary, the techniques described here (generative models, Bayesian propagation, and sampling methods) will permit us to explore problems within motion estimation that were previously inaccessible.

Acknowledgements

We thank Allan Jepson for many discussions about motion discontinuities, generative models, sampling methods, and probability theory.

References

[1] S. Ayer and H. Sawhney. Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding. ICCV, pp. 777-784, 1995.
[2] A. Bab-Hadiashar and D. Suter. Optic flow calculation using robust statistics. CVPR, pp. 988-993, 1997.
[3] M. Black and P. Anandan. Constraints for the early detection of discontinuity from motion. AAAI, pp. 1060-1066, 1990.
[4] M. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. CVIU, 63(1):75-104, Jan. 1996.

[Figure 7: Flower Garden sequence (frames 2-7). Most likely mean models overlaid on images. Bottom: evolution of the marginal probability of the foreground velocity in region C.]

[5] M. Black and A. Jepson. A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions. ECCV, vol. 1406, LNCS, pp. 909-924, 1998.
[6] M. Black. Explaining optical flow events with parameterized spatio-temporal models. CVPR, pp. 326-332, 1999.
[7] G. Chou. A model of figure-ground segregation from kinetic occlusion. ICCV, pp. 1050-1057, 1995.
[8] D. Fleet, M. Black, and A. Jepson. Motion feature detection using steerable flow fields. CVPR, pp. 274-281, 1998.
[9] D. Fleet and K. Langley. Computational analysis of non-Fourier motion. Vision Res., 22:3057-3079, 1994.
[10] W. Freeman and E. Pasztor. Learning to estimate scenes from images. NIPS, 1999.
[11] J. Harris, C. Koch, E. Staats, and J. Luo. Analog hardware for detecting discontinuities in early vision. IJCV, 4(3):211-223, June 1990.
[12] M. Isard and A. Blake. Contour tracking by stochastic propagation of conditional density. ECCV, vol. 1064, LNCS, pp. 343-356, 1996.
[13] M. Isard and A. Blake. A mixed-state Condensation tracker with automatic model-switching. ICCV, pp. 107-112, 1998.
[14] M. Isard and A. Blake. ICondensation: Unifying low-level and high-level tracking in a stochastic framework. ECCV, vol. 1406, LNCS, pp. 893-908, 1998.
[15] A. Jepson and M. Black. Mixture models for optical flow computation. In Partitioning Data Sets: With Applications to Psychology, Vision and Target Tracking, pp. 271-286, DIMACS Workshop, April 1993.
[16] K. Mutch and W. Thompson. Analysis of accretion and deletion at boundaries in dynamic scenes. PAMI, 7(2), pp. 133-138, 1985.
[17] S. Niyogi. Detecting kinetic occlusion. ICCV, pp. 1044-1049, 1995.
[18] J. Potter. Scene segmentation using motion information. IEEE Trans. SMC, 5:390-394, 1980.
[19] H. Sawhney and S. Ayer. Compact representations of videos through dominant and multiple motion estimation. IEEE PAMI, 18(8):814-831, 1996.
[20] B. Schunck. Image flow segmentation and estimation by constraint line clustering. IEEE PAMI, 11(10):1010-1027, Oct. 1989.
[21] A. Spoerri and S. Ullman. The early detection of motion boundaries. ICCV, pp. 209-218, 1987.
[22] W. Thompson, K. Mutch, and V. Berzins. Dynamic occlusion analysis in optical flow fields. IEEE PAMI, 7(4):374-383, July 1985.
[23] Y. Weiss and E. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. CVPR, pp. 321-326, 1996.
[24] A. Yuille, P-Y. Burgi, and N. Grzywacz. Visual motion estimation and prediction: A probabilistic network model for temporal coherence. ICCV, pp. 973-978, 1998.