A Unified Approach to Moving Object Detection in 2D and 3D Scenes

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 6, JUNE 1998, p. 577

Michal Irani and P. Anandan

Abstract: The detection of moving objects is important in many tasks. Previous approaches to this problem can be broadly divided into two classes: 2D algorithms which apply when the scene can be approximated by a flat surface and/or when the camera is only undergoing rotations and zooms, and 3D algorithms which work well only when significant depth variations are present in the scene and the camera is translating. In this paper, we describe a unified approach to handling moving-object detection in both 2D and 3D scenes, with a strategy to gracefully bridge the gap between those two extremes. Our approach is based on a stratification of the moving object-detection problem into scenarios which gradually increase in their complexity. We present a set of techniques that match the above stratification. These techniques progressively increase in their complexity, ranging from 2D techniques to more complex 3D techniques. Moreover, the computations required for the solution to the problem at one complexity level become the initial processing step for the solution at the next complexity level. We illustrate these techniques using examples from real-image sequences.

Index Terms: Moving object detection, rigidity constraints, multiframe analysis, planar-parallax, parallax geometry, layers.

1 INTRODUCTION

Moving object detection is an important problem in image sequence analysis. It is necessary for surveillance applications, for guidance of autonomous vehicles, for efficient video compression, for smart tracking of moving objects, and many other applications. The 2D motion observed in an image sequence is caused by 3D camera motion (the egomotion), by the changes in internal camera parameters (e.g., camera zoom), and by 3D motions of independently moving objects. The key step in moving-object detection is accounting for (or compensating for) the camera-induced image motion.
After compensation for camera-induced image motion, the remaining residual motions must be due to moving objects. The camera-induced image motion depends both on the egomotion parameters and the depth of each point in the scene. Estimating all of these physical parameters (namely, egomotion and depth) to account for the camera-induced motion is, in general, an inherently ambiguous problem [3]. When the scene contains large depth variations, these parameters may be recovered. We refer to these scenes as 3D scenes. However, in 2D scenes, namely, when the depth variations are not significant, the recovery of the camera and scene parameters is usually not robust or reliable [3]. Sample publications that treat the problem of moving objects in 3D scenes are [4], [], [30], [3], [9]. A careful treatment of the issues and problems associated with moving-object detection in 3D scenes is given in [9].

[Author footnote: M. Irani is with the Department of Applied Math and Computer Science, The Weizmann Institute of Science, 76100 Rehovot, Israel. E-mail: irani@wisdom.weizmann.ac.il. P. Anandan is with Microsoft Corporation, One Microsoft Way, Redmond, WA 98052. E-mail: anandan@microsoft.com. Manuscript received 9 Jan. 1996; revised 5 Feb. 1998. Recommended for acceptance by B.C. Vemuri. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 0643. 0162-8828/98/$10.00 © 1998 IEEE]

An effective approach to accounting for camera-induced motion in 2D scenes is to model the image motion in terms of a global 2D parametric transformation. This approach is robust and reliable when applied to flat (planar) scenes, distant scenes, or when the camera is undergoing only rotations and zooms. However, the 2D approach cannot be applied to the 3D scenes. Examples of methods that handle moving objects in 2D scenes are [4], [7], [8], [0], [8], [4], [33], [5]. Therefore, 2D algorithms and 3D algorithms address the moving object-detection problem in very different types of scenarios.
These are two extremes in a continuum of scenarios: flat 2D scenes (i.e., no 3D parallax) vs. 3D scenes with dense depth variations (i.e., dense 3D parallax). Both classes fail on the other extreme case or even on the intermediate case (when 3D parallax is sparse relative to the amount of independent motion). In real-image sequences, it is not always possible to predict in advance which situation (2D or 3D) will occur. Moreover, both types of scenarios can occur within the same sequence, with gradual transitions between them. Unfortunately, no single class of algorithms (2D or 3D) can address the general moving object-detection problem. It is not practical to constantly switch from one set of algorithms to another, especially since neither class treats the intermediate case well.

In this paper, we present a unified approach to handling moving-object detection in both 2D and 3D scenes, with a strategy to gracefully bridge the gap between those two extremes. Our approach is based on a stratification of the moving object-detection problem into scenarios which gradually increase in their complexity:
1) scenarios in which the camera-induced motion can be modeled by a single 2D parametric transformation,
2) those in which the camera-induced motion can be modeled in terms of a small number of layers of parametric transformations, and
3) general 3D scenes, in which a more complete parallax motion analysis is required.

We present a set of techniques that match the above stratification. These techniques progressively increase in their complexity. Moreover, the computations required for the solution to the problem at one complexity level become the initial processing step for the solution at the next complexity level. In our approach, we always apply the 2D analysis first. When that is all the information that is contained in the video sequence, that is where the analysis should stop (to avoid encountering singularities). Our 3D analysis gradually builds on top of the 2D analysis, with the gradual increase in 3D information, as detected in the image sequence.

After 2D alignment, there can be two sources for residual motions: 3D parallax and independent motions. To distinguish between these two types of motions, we develop a new rigidity constraint based on the residual parallax displacements. This constraint is based on an analysis of the parallax displacements of a few points over multiple frames, as opposed to the epipolar constraint, which is based on many points over a pair of frames. As such, it is applicable even in cases where 3D parallax is very sparse and in the presence of independent motions.

The goal in taking this approach is to develop a strategy for moving object detection, so that the analysis performed is tuned to match the complexity of the problem and the availability of information at any time. This paper describes the core elements of such a strategy. The integration of these elements into a single algorithm remains a task for our future research. A shorter version of this paper appeared in [3].

2 2D SCENES

The instantaneous image motion of a general 3D scene can be expressed as in [], []:

$$\begin{bmatrix} u(x,y) \\ v(x,y) \end{bmatrix} = \begin{bmatrix} -\left(\dfrac{T_X}{Z} + \Omega_Y\right) + x\,\dfrac{T_Z}{Z} + y\,\Omega_Z - x^2\,\Omega_Y + xy\,\Omega_X \\[2ex] -\left(\dfrac{T_Y}{Z} - \Omega_X\right) - x\,\Omega_Z + y\,\dfrac{T_Z}{Z} - xy\,\Omega_Y + y^2\,\Omega_X \end{bmatrix} \tag{1}$$

where $(u(x, y), v(x, y))$ denotes the image velocity at image location $(x, y)$, $T = (T_X, T_Y, T_Z)^t$ denotes the translational motion of the camera, $R = (\Omega_X, \Omega_Y, \Omega_Z)^t$ denotes the camera rotation, and $Z$ denotes the depth of the scene point corresponding to $(x, y)$.

The instantaneous image motion (1) can often be approximated by a single 2D parametric transformation. Below, we review the conditions associated with the scene geometry and/or camera motion when such an approximation is valid.

1) A planar surface: When the scene can be modeled as a single planar surface, i.e., when $A \cdot X + B \cdot Y + C \cdot Z = 1$, where $(X, Y, Z)$ are 3D scene coordinates and $(A, B, C)$ denote the parameters that describe the plane, Eq. (1) can be reduced to:

$$\begin{bmatrix} u(x,y) \\ v(x,y) \end{bmatrix} = \begin{bmatrix} a + b\,x + c\,y + g\,x^2 + h\,xy \\ d + e\,x + f\,y + g\,xy + h\,y^2 \end{bmatrix} \tag{2}$$

where the parameters $(a, b, c, d, e, f, g, h)$ are functions of the camera motion $(R, T)$ and the planar surface parameters $(A, B, C)$. Thus, the image motion is described by an eight-parameter quadratic transformation in 2D [5].

2) Distant scene: When the scene is very distant from the camera, namely, when the deviations from a planar surface are small relative to the overall distance of the scene from the camera, the planar surface model is still a very good approximation. In this case the 2D quadratic transformation describes the image motion field to subpixel accuracy. Moreover, as the overall distance $Z \to \infty$, then $T_X/Z,\ T_Y/Z,\ T_Z/Z \to 0$, i.e., the translational component of image motion is negligible. (This is similar to the case of pure rotation described below.) The distant scene conditions are often satisfied in remote surveillance applications, where narrow field-of-view (FOV) cameras (typically 5° or less) are used to detect moving objects in a distant scene (typically at least 1 km away).

3) Camera rotation: When the camera undergoes a pure rotational motion (i.e., $T = 0$) or when the camera translation is negligible ($|T| \ll Z$), then Eq. (1) becomes

$$\begin{bmatrix} u(x,y) \\ v(x,y) \end{bmatrix} = \begin{bmatrix} -\Omega_Y + y\,\Omega_Z - x^2\,\Omega_Y + xy\,\Omega_X \\ \Omega_X - x\,\Omega_Z - xy\,\Omega_Y + y^2\,\Omega_X \end{bmatrix}. \tag{3}$$

Thus the 2D image motion field is described by a quadratic transformation in this situation as well.

4) Camera zoom: Finally, on top of its motion, when the camera zooms in, the image undergoes an additional dilation. The resulting image motion field can still be modeled as a quadratic transformation of the form of Eq. (2); the zoom will influence the parameters $b$ and $f$.

We refer to scenes that satisfy one or more of the above-mentioned conditions (and, hence, Eq. (2) is applicable) as 2D scenes. Under these conditions, we can use a previously developed method [6], [4] in order to compute the 2D parametric motion. This technique locks onto a dominant parametric motion between an image pair, even in the presence of independently moving objects. It does not require prior knowledge of their regions of support in the image plane [4]. This computation provides only the 2D motion parameters of the camera-induced motion, but no explicit 3D shape or motion information. To make the paper self-contained, we briefly review the steps for estimating these 2D motion parameters in the next few paragraphs. Note that this 2D estimation process is also used later as an initial step in the layered and the 3D analysis methods.

2.1 The 2D Parametric Estimation

A number of techniques have been described in the computer vision literature for the estimation of 2D parametric motion (e.g., [4], [7], [4], [33], [5], [6], [3]). In this paper, we follow the approach described in [4], which we briefly outline below.

[Fig. 1. 2D moving object detection. (a), (b) Two frames in a sequence obtained by a translating and rotating camera. The scene itself was not planar, but was distant enough (about 1 km away from the camera) so that effects of 3D parallax were negligible. The scene contained a car driving on a road. (c) Intensity differences before dominant (background) 2D alignment. (d) Intensity differences after dominant (background) 2D alignment. Nonoverlapping image boundaries were not processed. The 2D alignment compensates for the camera-induced motion, but not for the car's independent motion. (e) The detected moving object based on local misalignment analysis. The white region signifies the detected moving object.]

We will refer to the two image frames (whose image motion is being estimated) by the names inspection image and reference image, respectively. A Laplacian pyramid is first constructed from each of the two input images, and the motion parameters are then estimated in a coarse-to-fine manner. Within each level, the sum of squared differences (SSD) measure, integrated over regions of interest (initially the entire image region), is used as a match measure. This measure is minimized with respect to the 2D image motion parameters. The SSD error measure for estimating the image motion within a region is:

$$E(\alpha) = \sum_{(x,y)} \Big( I(x, y, t) - I\big(x - u(x, y; \alpha),\ y - v(x, y; \alpha),\ t - 1\big) \Big)^2 \tag{4}$$

where $I$ is the (Laplacian pyramid) image intensity, $\alpha = (a, b, c, d, e, f, g, h)$ denotes the parameters of the quadratic transformation, and $(u(x, y; \alpha), v(x, y; \alpha))$ denotes the image velocity at the location $(x, y)$ induced by the quadratic transformation with parameters $\alpha$ as defined in (2). The sum is computed over all the points within a region of interest, often the entire image. The objective function $E$ given in (4) is minimized w.r.t. the unknown parameters $\alpha = (a, b, c, d, e, f, g, h)$ via the Gauss-Newton optimization technique. Let $\alpha_i = (a_i, b_i, c_i, d_i, e_i, f_i, g_i, h_i)$ denote the current estimate of the quadratic parameters.
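As a rough illustration of the quadratic motion model of Eq. (2) and the SSD measure of Eq. (4), the following sketch evaluates both on small images stored as nested lists. It is not the authors' implementation: the function names, the use of raw pixel indices as image coordinates, bilinear sampling, and clamping at the image border are all assumptions made for this example, and the pyramid and the Gauss-Newton update are left out.

```python
def quadratic_flow(x, y, alpha):
    """Image velocity (u, v) at (x, y) under the 8-parameter model of Eq. (2)."""
    a, b, c, d, e, f, g, h = alpha
    u = a + b * x + c * y + g * x * x + h * x * y
    v = d + e * x + f * y + g * x * y + h * y * y
    return u, v


def bilinear(img, x, y):
    """Bilinearly interpolated intensity; coordinates are clamped to the image
    (border handling is an assumption of this sketch)."""
    rows, cols = len(img), len(img[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def ssd_error(current, previous, alpha):
    """Eq. (4): sum over all pixels of (I(x, y, t) - I(x - u, y - v, t - 1))^2."""
    total = 0.0
    for y in range(len(current)):
        for x in range(len(current[0])):
            u, v = quadratic_flow(x, y, alpha)
            diff = current[y][x] - bilinear(previous, x - u, y - v)
            total += diff * diff
    return total
```

For a horizontal intensity ramp shifted by one pixel, the parameter vector with a = 1 and all other entries zero drives the error to zero, whereas the all-zero vector does not. A minimizer such as Gauss-Newton would search for exactly that kind of minimum.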
After warping the inspection image (towards the reference image) by applying the quadratic transformation based on the current estimate $\alpha_i$, an incremental estimate $\delta\alpha = (\delta a, \delta b, \delta c, \delta d, \delta e, \delta f, \delta g, \delta h)$ can be determined. After iterating a certain number of times within a pyramid level, the process continues at the next finer level. With the above technique, the reference and inspection images are registered so that the desired image region is aligned, and the quadratic transformation (2) is estimated. The above estimation technique is a least-squares-based approach and hence possibly sensitive to outliers. However, as reported in [7], this sensitivity is minimized by doing the least-squares estimation over a pyramid. The pyramid-based approach locks on to the dominant image motion in the scene.

A robust version of the above method [4] handles scenes with multiple moving objects. It incorporates a gradual refinement of the complexity of the motion model (ranging from pure translation at low resolution levels, to a 2D affine model at intermediate levels, to the 2D quadratic model at the highest resolution level). Outlier rejection is performed before each refinement step within the multiscale analysis. This robust analysis further enhances the locking property of the above-mentioned algorithm onto a single dominant motion.

Once the dominant 2D parametric motion has been estimated, it is used for warping one image towards the other. When the dominant motion is that of the camera, all regions corresponding to static portions of the scene are completely aligned as a result of the 2D registration (except for nonoverlapping image boundaries), while independently moving objects are not. Detection of moving objects is therefore performed by determining local misalignments [4] after the global 2D parametric registration.

Fig. 1 shows an example of moving-object detection in a 2D scene. This sequence was obtained by a video camera with an FOV of four degrees. The camera was mounted on a vehicle moving on a bumpy dirt road at about 5 km/h and was looking sideways.
Therefore, the camera was both translating and rotating (camera jitter). The scene itself was not planar, but was distant enough (about 1 km away from the camera) so that 2D parametric transformations were sufficient to account for the camera-induced motion between successive frames. The scene contained a car moving independently on a road. Fig. 1a and Fig. 1b show two frames out of the sequence. Fig. 1c and Fig. 1d show intensity differences before and after dominant (background) 2D alignment, respectively. Fig. 1e shows the detected moving object based on local misalignment analysis [4].

[Fig. 2. Layered moving object detection. (a), (b) Two frames in a sequence obtained by a translating and rotating camera. The FOV captures a distant portion of the scene (hills and road) as well as a frontal portion of the scene (bushes). The scene contains a car driving on a road. (c) The image region which corresponds to the dominant 2D parametric transformation. This region corresponds to the remote part of the scene. White regions signify image regions which were misaligned after performing global image registration according to the computed dominant 2D parametric transformation. These regions correspond to the car and the frontal part of the scene (the bushes). (d) The image region which corresponds to the next detected dominant 2D parametric transformation. This region corresponds to the frontal bushes. The 2D transformation was computed by applying the 2D estimation algorithm again, but this time only to the image regions highlighted in white in Fig. 2c (i.e., only to image regions inconsistent in their image motion with the first dominant 2D parametric transformation). White regions in this figure signify regions inconsistent with the bushes' 2D transformation. These correspond to the car and to the remote parts of the scene. (e) The detected moving object (the car) highlighted in white.]

The frame-to-frame motion of the background in remote surveillance applications can typically be modeled by a 2D parametric transformation.
However, when a frontal portion of the scene enters the FOV, effects of 3D parallax motion are encountered. The simple 2D algorithm cannot account for camera-induced motion in scenes with 3D parallax. In the next two sections, we address the problem of moving-object detection in 3D scenes with parallax.

3 MULTIPLANAR SCENES

When the camera is translating, and the scene is not planar or is not sufficiently distant, then a single 2D parametric motion (Section 2) is insufficient for modeling the camera-induced motion. Aligning two images with respect to a dominant 2D parametric transformation may bring into alignment a large portion of the scene, which corresponds to a planar (or a remote) part of the scene. However, any other (e.g., near) portions of the scene that enter the FOV cannot be aligned by the dominant 2D parametric transformation. These out-of-plane scene points, although they have the same 3D motion as the planar points, have substantially different induced 2D motions. The differences in 2D motions are called 3D parallax motion [3], [5]. Effects of parallax are only due to camera translation and 3D scene variations. Camera rotation or zoom does not cause parallax (see Section 4.1).

Fig. 2 shows an example of a sequence where the effects of 3D parallax are evident. Figs. 2a and 2b show two frames from a sequence with the same setting and scenario described in Fig. 1, only in this case a frontal hill with bushes (which was much closer to the camera than the background scene) entered the FOV. Fig. 2c displays the image region which was found to be aligned after dominant 2D parametric registration (see Section 2). Clearly the global 2D alignment accounts for the camera-induced motion of the distant portion of the scene, but does not account for the camera-induced motion of the closer portion of the scene (the bushes). Thus, simple 2D techniques, when applied to these types of scenarios, will not be able to distinguish between the independent car motion and the 3D parallax motion of the bush. There is therefore a need to model 3D parallax as well.
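The claim that parallax comes only from camera translation and depth variation can be checked numerically. The sketch below assumes the standard instantaneous motion model with unit focal length, which is the form taken here for Eq. (1). The function names and the sample values are illustrative only. It compares the image velocity of a near point and a far point that project to the same image location.

```python
def instantaneous_flow(x, y, Z, T, Omega):
    """Image velocity (u, v) at (x, y) for a point at depth Z.

    Assumed form of Eq. (1), unit focal length:
      u = -(T_X/Z + O_Y) + x*T_Z/Z + y*O_Z - x^2*O_Y + x*y*O_X
      v = -(T_Y/Z - O_X) - x*O_Z + y*T_Z/Z - x*y*O_Y + y^2*O_X
    """
    TX, TY, TZ = T
    OX, OY, OZ = Omega
    u = -(TX / Z + OY) + x * TZ / Z + y * OZ - x * x * OY + x * y * OX
    v = -(TY / Z - OX) - x * OZ + y * TZ / Z - x * y * OY + y * y * OX
    return u, v


def parallax_between_depths(x, y, Z_near, Z_far, T, Omega):
    """Difference in image velocity between a near and a far point at (x, y)."""
    u_near, v_near = instantaneous_flow(x, y, Z_near, T, Omega)
    u_far, v_far = instantaneous_flow(x, y, Z_far, T, Omega)
    return u_near - u_far, v_near - v_far


# Pure rotation: depth never enters, so near and far points move identically.
rotation_only = parallax_between_depths(
    0.1, -0.2, 5.0, 500.0, T=(0.0, 0.0, 0.0), Omega=(0.01, -0.02, 0.005))

# Sideways translation: the near point moves faster, which is the parallax.
translation = parallax_between_depths(
    0.0, 0.0, 5.0, 500.0, T=(1.0, 0.0, 0.0), Omega=(0.0, 0.0, 0.0))
```

In this model the rotational terms do not depend on Z, so they cancel in the difference. Only the translational terms, scaled by 1/Z, survive, which is why a single 2D transformation cannot align both the distant hills and the nearby bushes.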
In this section, we describe one approach to modeling parallax motion, which builds on top of the 2D approach to modeling camera-induced motion. This approach is based on fitting multiple planar surfaces (i.e., multiple 2D layers [], [33]) to the scene. In Section 4, approaches to handling more complex types of scenes with (sparse and dense) 3D parallax will be described. They too build on top of the 2D (or layered) approach.

When the scene is piecewise planar, or is constructed of a few distinct portions at different depths, then the camera-induced motion can be accounted for by a few layers of 2D parametric transformations. This case is very typical of outdoor surveillance scenarios, especially when the camera FOV is narrow. The multilayered approach is an extension of the simple 2D approach and is implemented using a method similar to the sequential method presented in [4]: First, the dominant 2D parametric transformation between two frames is detected (Section 2). The two images are aligned accordingly, and the misaligned image regions are detected and segmented out (Fig. 2c). Next, the same 2D motion estimation technique is reapplied, but this time only to the segmented (misaligned) regions of the image, to detect the next dominant 2D transformation and its region of support within the image, and so on. For each additional layer, the two images are aligned according to the 2D parametric transformation of that layer, and the misaligned image regions are detected and segmented out (Fig. 2d). Each 2D layer is continuously tracked in time by using the obtained segmentation masks. Moving objects are detected as image regions that are inconsistent with the image motion of any of the 2D layers. Such an example is shown in Fig. 2e.

A moving object is not detected as a layer by this algorithm if it is small. However, if the object is large, it may itself be detected as a 2D layer. A few cues can be used to distinguish between moving objects and static scene layers:

1) Moving objects produce discontinuities in 2D motion everywhere on their boundary, as opposed to static 2D layers. Therefore, if a moving object is detected as a layer, it can be distinguished from real scene layers due to the fact that it appears "floating in the air" (i.e., has depth discontinuities all around it). A real scene layer, on the other hand, is always connected to another part of the scene (layer). On the connecting boundary, the 2D motion is continuous. If the connection to other scene portions is outside the FOV, then that layer is adjacent to the image boundary. Therefore, a 2D layer which is fully contained in the FOV, and exhibits 2D motion discontinuities all around it, is necessarily a moving object.
2) The 3D consistency over time of two 2D layers can be checked. In Section 4.2 we present a method for checking 3D consistency of two scene points over time based on their parallax displacements alone. If two layers belong to a single rigid scene, the parallax displacement of one layer with respect to the other is yet another 2D parametric transformation (which is obtained by taking the difference between the two 2D parametric layer transformations). Therefore, for example, consistency of two layers can be verified over time by applying the 3D-consistency check to parallax displacements of one layer with respect to the other (see Section 4.2).
3) Other cues, such as detecting negative depth, can also be used.

In the sequence shown in Figs. 1 and 2, we used the first cue (i.e., eliminated "floating" layers) to ensure moving objects were not interpreted as scene layers. The moving car was successfully and continuously detected over the entire two-minute video sequence, which alternated between the single-layered case (i.e., no 3D parallax; frontal scene part was not visible in the FOV) and the two-layered case (i.e., existence of 3D parallax).

4 SCENES WITH GENERAL 3D PARALLAX

While the single and multilayered parametric registration methods are adequate to handle a large number of situations, there are cases when the parallax cannot be modeled in terms of layers.
An example of such a situation is a cluttered scene which contains many small objects at multiple depths (these could be urban scenes or indoor scenes). In this section, we develop an approach to handling these more complex 3D scenes.

4.1 3D Scenes With Dense Parallax

The key observation that enables us to extend the 2D parametric registration approach to general 3D scenes is the following: the plane registration process (using the dominant 2D parametric transformation) removes all effects of camera rotation, zoom, and calibration, without explicitly computing them [5], [8], [6], [7]. The residual image motion after the plane registration is due only to the translational motion of the camera and to the deviations of the scene structure from the planar surface. Hence, the residual motion is an epipolar flow field. This observation has led to the so-called "plane + parallax" approach to 3D scene analysis [7], [5], [8], [6], [7].

4.1.1 The Plane + Parallax Decomposition

Fig. 3 provides a geometric interpretation of the planar parallax. Let $P = (X, Y, Z)^T$ and $P' = (X', Y', Z')^T$ denote the Cartesian coordinates of a scene point with respect to two different camera views, respectively. Let the $3 \times 3$ matrix $R$ and the $3 \times 1$ vector $T$ denote the rotation and translation between the two camera systems, respectively. Let $(x, y)$ and $(x', y')$ denote the image coordinates of the scene point $P$, and let $p = (x, y, 1)^T = \frac{1}{Z} K P$ and $p' = (x', y', 1)^T = \frac{1}{Z'} K' P'$ denote the same points in homogeneous coordinates. $K$ and $K'$ are $3 \times 3$ matrices representing the internal calibration parameters of the two cameras (see Appendix A). Also, define $t = (t_x, t_y, t_z)^T = K T$. Note that $(KP)_z = Z$, $(K'P')_z = Z'$, and $t_z = T_Z$. Note that when $T_Z \neq 0$, $e = \frac{1}{T_Z}\, t$ denotes the epipole (or the focus-of-expansion, FOE) in homogeneous coordinates. Let $\Pi$ be an arbitrary planar surface and $A'$ denote the homography that aligns the planar surface $\Pi$ between the second and first frame (i.e., for all points $P \in \Pi$, $P = A' P'$). Define $\mathbf{u} = p' - p = (u, v, 0)^T$, where $(u, v)$ is the measurable 2D image displacement vector of the image point $p$ between the two frames. It can be shown (see Appendix A, as well as [9], [5], [6], [7]) that

$$\mathbf{u} = \mathbf{u}_\pi + \boldsymbol{\mu} \tag{5}$$

where $\mathbf{u}_\pi$ denotes the planar part of the 2D image motion (the homography due to $\Pi$), and $\boldsymbol{\mu}$ denotes the residual planar parallax 2D motion. The homography due to $\Pi$ results in an image motion field that can be modeled as a 2D parametric transformation. In general, this transformation is a projective transformation; however, in the case of instantaneous camera motion, it can be well approximated by the quadratic transformation shown in (2). When $T_Z \neq 0$:

$$\mathbf{u}_\pi = p' - p_w\,; \qquad \boldsymbol{\mu} = \gamma\, \frac{T_Z}{d'_\pi}\, (e - p_w) \tag{6}$$

where $p_w$ denotes the image point (in homogeneous coordinates) in the first frame which results from warping the corresponding point $p'$ in the second image by the 2D parametric transformation of the reference plane $\Pi$. We will refer to the first frame as the reference frame. Also, $d'_\pi$ is the perpendicular distance from the second camera center to the reference plane $\Pi$ (see Fig. 3), and, as noted earlier, $e$ denotes the epipole (or FOE). $\gamma$ is a measure of the 3D shape of the point $P$. In particular, $\gamma = H/Z$, where $H$ is the perpendicular distance from $P$ to the reference plane $\Pi$, and $Z$ is the "range" (or "depth") of the point $P$ with respect to the first camera. We refer to $\gamma$ as the projective 3D structure of point $P$. In the case when $T_Z = 0$, the parallax motion $\boldsymbol{\mu}$ has a slightly different form: $\boldsymbol{\mu} = \frac{\gamma}{d'_\pi}\, t$, where $t$ is as defined earlier.

[Fig. 3. The plane + parallax decomposition. (a) The geometric interpretation. (b) The epipolar field of the residual parallax displacements.]

The use of the plane + parallax decomposition for egomotion estimation is described in [5], and for 3D shape recovery is described in [8], [6]. The plane + parallax decomposition is more general than the traditional decomposition in terms of rotational and translational motion (and includes the traditional decomposition as a special case).
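A small numerical sketch may help make the parallax term concrete. It takes Eq. (6) in the form mu = gamma * (T_Z / d'_pi) * (e - p_w) for T_Z != 0, and mu = (gamma / d'_pi) * t for T_Z = 0. The function names, the restriction to the two in-image components of each vector, and the sample numbers are assumptions of this example rather than part of the original method.

```python
def parallax_vector(p_w, epipole, gamma, T_Z, d_pi):
    """Residual parallax at warped point p_w when T_Z != 0 (assumed form of Eq. (6)).

    gamma = H / Z is the projective structure of the point, and d_pi is the
    distance from the second camera center to the reference plane.
    """
    scale = gamma * T_Z / d_pi
    return (scale * (epipole[0] - p_w[0]), scale * (epipole[1] - p_w[1]))


def parallax_vector_sideways(t_xy, gamma, d_pi):
    """Residual parallax when T_Z = 0: mu = (gamma / d_pi) * t (image components only)."""
    return (gamma / d_pi * t_xy[0], gamma / d_pi * t_xy[1])


def cross2(a, b):
    """z-component of the 2D cross product; zero when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]


e = (160.0, 120.0)      # hypothetical epipole, in pixels
p_w = (200.0, 100.0)    # hypothetical warped image point
mu_above = parallax_vector(p_w, e, gamma=0.05, T_Z=2.0, d_pi=10.0)
mu_on_plane = parallax_vector(p_w, e, gamma=0.0, T_Z=2.0, d_pi=10.0)
mu_below = parallax_vector(p_w, e, gamma=-0.05, T_Z=2.0, d_pi=10.0)
```

Under these assumptions a point on the reference plane (gamma = 0) has no residual motion. Points off the plane move along the line joining p_w to the epipole, and the sign of gamma decides whether the motion is towards or away from it. Detection based on the epipolar constraint depends on this radial structure.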
In addition, 1) the planar homography (i.e., the 2D parametric planar transformation) compensates for camera rotation, zoom, and other changes in the internal parameters of the camera, 2) this approach does not require any prior knowledge of the camera internal parameters (in other words, no prior camera calibration is needed), and 3) the planar homography, being a 2D parametric transformation, can be estimated in a more stable fashion than the rotation and translation parameters. In particular, it can be estimated even when the camera FOV is limited, the depth variations in the scene are small, and in the presence of independently moving objects (see Section 2).

Since the residual parallax displacements after the 2D alignment of the dominant planar surface are due to the translational component alone, they form a radial field centered at the epipole/FOE (see Fig. 3b). If the epipole is recovered, all that is required for detecting moving objects is to verify whether the residual 2D displacement associated with a given point is directed towards or away from the epipole. This is known as the epipolar constraint [29]. Residual 2D motion that violates this requirement can only be due to an independently moving object. Fig. 4a graphically illustrates this situation. An algorithm for detecting moving objects based on the plane + parallax decomposition is described in [20]. This technique, however, requires the estimation of the 3D shape and the epipole.

4.1.2 Difficulty of Epipole Recovery

While the plane + parallax strategy for moving object detection generally works well when the epipole (FOE) recovery is possible, its performance depends critically on the ability to accurately estimate the epipole. Since the epipole

recovery is based on the residual motion vectors, those vectors that are due to the moving object are likely to bias the estimated epipole away from the true epipole. (Note that this is true even of the "direct" methods that do not explicitly recover the residual motion vectors, but instead rely on spatiotemporal image gradients [18], since the information provided by the points on moving objects will influence the estimate.)

Fig. 4. Moving object detection based on violation of epipolar motion. (a) Moving object detection based on inconsistency of parallax motion with radial epipolar motion field. (b) False epipole estimation when 3D parallax is sparse relative to independent motion.

The problem of estimating the epipole is acute when the scene contains sparse parallax information and the residual motion vectors due to independently moving objects are significant (either in magnitude or in number). A graphic illustration of such a situation is provided in Fig. 4b. In the situation depicted in this figure, the magnitude and number of parallax vectors on the tree are considerably smaller than the residual motion vectors on the independently moving car. As a result, the estimated epipole is likely to be consistent with the motion of the car (in the figure, this would be somewhere outside the FOV on the left side of the image), and the tree will be detected as an independently moving object.

There are two obvious ways to overcome the difficulties in estimating the epipole. The first is to use prior knowledge regarding the camera/vehicle motion to reject potential outliers (namely, the moving objects) during the estimation. However, if only limited parallax information is available, any attempt to refine this prior information will be unstable. A more general approach would be to defer, or even completely eliminate, the computation of the epipole.
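The bias described above can be reproduced with a small numerical sketch. It is an illustration only, not the estimation procedure of any cited work: the epipole is estimated as the least-squares intersection of the lines through each warped point along its residual vector, and the epipolar test checks the sine of the angle between a residual vector and the direction to the epipole. The function names, the tolerance, and all point coordinates are assumptions made for this example; the per-point scalar `k` stands for $\gamma T_z / d'_\pi$ in (6).

```python
import math


def estimate_epipole(points, vectors):
    """Least-squares intersection of the lines through each point along its vector."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (vx, vy) in zip(points, vectors):
        norm = math.hypot(vx, vy)
        nx, ny = -vy / norm, vx / norm      # unit normal of the line
        c = nx * px + ny * py               # line equation: n . x = c
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12             # zero if all lines are parallel
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def violates_epipolar(p_w, mu, epipole, tol=0.05):
    """True if mu is not directed towards or away from the epipole."""
    rx, ry = epipole[0] - p_w[0], epipole[1] - p_w[1]
    cross = mu[0] * ry - mu[1] * rx
    sine = abs(cross) / (math.hypot(mu[0], mu[1]) * math.hypot(rx, ry))
    return sine > tol


# Hypothetical data: three static points with parallax mu = k * (e - p_w), as in (6).
TRUE_EPIPOLE = (100.0, 50.0)
static_pts = [(0.0, 0.0), (200.0, 0.0), (50.0, 200.0)]
static_k = [0.02, 0.05, 0.03]
static_mu = [(k * (TRUE_EPIPOLE[0] - x), k * (TRUE_EPIPOLE[1] - y))
             for k, (x, y) in zip(static_k, static_pts)]

# Hypothetical independently moving object: five points sharing one horizontal motion.
car_pts = [(300.0, 300.0), (310.0, 300.0), (300.0, 310.0),
           (310.0, 310.0), (320.0, 305.0)]
car_mu = [(5.0, 0.0)] * 5

e_static = estimate_epipole(static_pts, static_mu)
e_biased = estimate_epipole(static_pts + car_pts, static_mu + car_mu)

if __name__ == "__main__":
    print("epipole from static points only:", e_static)
    print("epipole with moving object included:", e_biased)
```

In this toy configuration the static points alone recover the true epipole, while adding the object's vectors pulls the estimate far away, so a static point then fails the epipolar test against the biased estimate.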
In the next section, we develop an approach to moving-object detection by directly comparing the parallax motion of pairs of points without estimating the epipole.

4.2 3D Scenes With Sparse Parallax

In this section we present a method we have developed for moving-object detection in the difficult "intermediate" cases, when 3D parallax information is sparse relative to independent motion information. This approach can be used to bridge the gap between the 2D cases and the dense 3D cases.

4.2.1 The Parallax-Based Shape Constraint

THEOREM 1. Given the planar-parallax displacement vectors $\mu_1$ and $\mu_2$ of two points that belong to the static background scene, their relative 3D projective structure $\frac{\gamma_2}{\gamma_1}$ is given by:

$$\frac{\gamma_2}{\gamma_1} = \frac{\mu_2^T (\Delta p_w)_\perp}{\mu_1^T (\Delta p_w)_\perp} \qquad (7)$$

where, as shown in Fig. 5a, $p_1$ and $p_2$ are the image locations (in the reference frame) of two points that are part of the static scene, $\Delta p_w = p_{w_2} - p_{w_1}$ is the vector connecting the "warped" locations of the corresponding second frame points (as in (6)), and $v_\perp$ signifies a vector perpendicular to $v$.

PROOF. See Appendix B.

Note that this constraint directly relates the relative projective structure of two points to their parallax displacements alone: no camera parameters, in particular the epipole (FOE), are involved. Neither is any additional parallax information required at other image points. Theoretically, one could use the two parallax vectors to recover the epipole (the intersection point of the two vectors) and then use the magnitudes and distances of the points from the computed epipole to estimate their relative projective structure. The benefit of the constraint (7) is that it provides this information directly from the positions and parallax vectors of the two points, without the need to go through the computation of the epipole, using as much information as one point can give on another. Fig. 5b graphically shows an example of a configuration in which estimating the epipole is very unreliable, whereas estimating the relative structure directly from (7) is reliable. Application of this constraint to the recovery of 3D structure of the scene is described in [12].
Here we focus on its application to moving object detection.
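As a minimal sketch of how (7) could be evaluated, the following example synthesizes the parallax of two static points from (6) and then recovers $\gamma_2 / \gamma_1$ from the warped positions and parallax vectors alone, without using the epipole in the recovery step. The helper names and all numeric values (epipole, the scale `s` standing for $T_z / d'_\pi$, the structure values) are hypothetical.

```python
def perp(v):
    """A vector perpendicular to the 2D vector v."""
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def parallax(gamma, s, e, p_w):
    """Equation (6): mu = gamma * (T_z / d'_pi) * (e - p_w), with s = T_z / d'_pi."""
    return (gamma * s * (e[0] - p_w[0]), gamma * s * (e[1] - p_w[1]))


def relative_structure(p_w1, mu1, p_w2, mu2):
    """Equation (7): gamma_2 / gamma_1 from two warped points and their parallax."""
    delta_perp = perp((p_w2[0] - p_w1[0], p_w2[1] - p_w1[1]))
    # The denominator vanishes when mu1 is parallel to the line joining the points.
    return dot(mu2, delta_perp) / dot(mu1, delta_perp)


# Hypothetical scene: the epipole and scale are used only to synthesize the data.
epipole, s = (100.0, 50.0), 0.4
p_w1, gamma1 = (10.0, 20.0), 0.05
p_w2, gamma2 = (-30.0, 80.0), 0.15
mu1 = parallax(gamma1, s, epipole, p_w1)
mu2 = parallax(gamma2, s, epipole, p_w2)

ratio = relative_structure(p_w1, mu1, p_w2, mu2)

if __name__ == "__main__":
    print("recovered gamma_2 / gamma_1:", ratio)
    print("true gamma_2 / gamma_1:", gamma2 / gamma1)
```

The recovery works because projecting $e - p_{w_1}$ and $e - p_{w_2}$ onto $(\Delta p_w)_\perp$ gives the same value, so the epipole and the scale cancel in the ratio.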

Fig. 5. The pairwise parallax-based shape constraint. (a) This figure geometrically illustrates the relative structure constraint (7): $\frac{\gamma_2}{\gamma_1} = \frac{\mu_2^T (\Delta p_w)_\perp}{\mu_1^T (\Delta p_w)_\perp} = \frac{AB}{AC}$. (b) When the parallax vectors are nearly parallel, the epipole estimation is unreliable. However, the relative structure $\frac{AB}{AC}$ can be reliably computed even in this case.

4.2.2 The Parallax-Based Rigidity Constraint

THEOREM 2. Given the planar-parallax displacement vectors of two points that belong to the background static scene over three frames, the following constraint must be satisfied:

$$\frac{\mu_2^{jT} (\Delta p_w^j)_\perp}{\mu_1^{jT} (\Delta p_w^j)_\perp} - \frac{\mu_2^{kT} (\Delta p_w^k)_\perp}{\mu_1^{kT} (\Delta p_w^k)_\perp} = 0 \qquad (8)$$

where $\mu_1^j$, $\mu_2^j$ are the parallax displacement vectors of the two points between the reference frame and the $j$th frame, $\mu_1^k$, $\mu_2^k$ are the parallax vectors between the reference frame and the $k$th frame, and $\Delta p_w^j$, $\Delta p_w^k$ are the corresponding distances between the warped points as in (7) and Fig. 5a.

PROOF. The relative projective structure $\frac{\gamma_2}{\gamma_1}$ is invariant to camera motion. Therefore, using (7), for any two frames $j$ and $k$ we get:

$$\frac{\gamma_2}{\gamma_1} = \frac{\mu_2^{jT} (\Delta p_w^j)_\perp}{\mu_1^{jT} (\Delta p_w^j)_\perp} = \frac{\mu_2^{kT} (\Delta p_w^k)_\perp}{\mu_1^{kT} (\Delta p_w^k)_\perp}$$

As in the case of the parallax-based shape constraint (7), the parallax-based rigidity constraint (8) relates the parallax vectors of pairs of points over three frames without referring to the camera geometry (especially the epipole/FOE). Furthermore, this constraint does not even explicitly refer to the structure parameters of the points in consideration. The rigidity constraint (8) can therefore be applied to detect inconsistencies in the 3D motion of two image points (i.e., say whether the two image points are projections of 3D points belonging to the same or different 3D moving objects) based on their parallax motion among three (or more) frames alone, without the need to estimate either camera geometry, camera motion, or structure parameters, and without relying on parallax information at other image points. A consistency measure is defined as the left-hand side of (8), after multiplying by the denominators (to eliminate singularities).
The farther this quantity is from zero, the higher is the 3D-inconsistency of the two points.

4.3 Applying the Parallax Rigidity Constraint to Moving Object Detection

Fig. 6a graphically displays an example of a configuration in which estimating the epipole in the presence of multiple moving objects can be very erroneous, even when using clustering techniques in the epipole domain as suggested by [21], [30]. Relying on the epipole computation to detect inconsistencies in 3D motion fails in detecting moving objects in such cases. The parallax rigidity constraint (8) can be applied to detect inconsistencies in the 3D motion of one image point relative to another directly from their parallax vectors over multiple (three or more) frames, without the need to estimate either camera geometry, camera motion, or shape parameters. This provides a useful mechanism for clustering (or segmenting) the parallax vectors (i.e., the residual motion after planar registration) into consistent groups belonging to consistently 3D moving objects, even in cases such as in Fig. 6a, where the parallax information is minimal and the independent motion is not negligible. Fig. 6b graphically explains how the rigidity constraint (8) detects the 3D inconsistency of Fig. 6a over three frames.
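The consistency measure built from (8) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the measure is the left-hand side of (8) multiplied through by both denominators, and the synthetic data are generated from (24), which gives $p_w = (p + k e) / (1 + k)$ with $k = \gamma T_z / d'_\pi$ and $\mu = p_w - p$. All names and numbers (epipoles, scales, structure values, the added independent shift) are hypothetical.

```python
def perp(v):
    return (-v[1], v[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def warp_and_parallax(p, gamma, s, e):
    """Warped location p_w and parallax mu = p_w - p of a static point, with s = T_z / d'_pi."""
    k = gamma * s
    p_w = ((p[0] + k * e[0]) / (1.0 + k), (p[1] + k * e[1]) / (1.0 + k))
    return p_w, sub(p_w, p)


def rigidity_measure(obs_j, obs_k):
    """Left-hand side of (8) multiplied by its denominators.

    Each observation is (p_w1, mu1, p_w2, mu2) for one frame.
    """
    pw1_j, mu1_j, pw2_j, mu2_j = obs_j
    pw1_k, mu1_k, pw2_k, mu2_k = obs_k
    dj = perp(sub(pw2_j, pw1_j))
    dk = perp(sub(pw2_k, pw1_k))
    return dot(mu2_j, dj) * dot(mu1_k, dk) - dot(mu2_k, dk) * dot(mu1_j, dj)


# Two static points seen under two different (hypothetical) camera translations.
p1, gamma1 = (10.0, 20.0), 0.05
p2, gamma2 = (-30.0, 80.0), 0.15
pw1_j, mu1_j = warp_and_parallax(p1, gamma1, 0.4, (100.0, 50.0))
pw2_j, mu2_j = warp_and_parallax(p2, gamma2, 0.4, (100.0, 50.0))
pw1_k, mu1_k = warp_and_parallax(p1, gamma1, 0.7, (-200.0, 300.0))
pw2_k, mu2_k = warp_and_parallax(p2, gamma2, 0.7, (-200.0, 300.0))

static_measure = rigidity_measure((pw1_j, mu1_j, pw2_j, mu2_j),
                                  (pw1_k, mu1_k, pw2_k, mu2_k))

# Point 2 additionally moves on its own in frame k: its warped position and
# residual vector both shift by the same (hypothetical) amount.
shift = (3.0, -2.0)
pw2_k_moving = (pw2_k[0] + shift[0], pw2_k[1] + shift[1])
mu2_k_moving = (mu2_k[0] + shift[0], mu2_k[1] + shift[1])

moving_measure = rigidity_measure((pw1_j, mu1_j, pw2_j, mu2_j),
                                  (pw1_k, mu1_k, pw2_k_moving, mu2_k_moving))

if __name__ == "__main__":
    print("static pair:", static_measure)
    print("pair with independent motion:", moving_measure)
```

Note that neither epipole enters `rigidity_measure`; they appear only in the data synthesis. The measure stays at zero (up to rounding) for the rigid pair even though the camera translation differs between the two frames, and departs clearly from zero once one point moves independently.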

Fig. 6. Reliable detection of 3D motion inconsistency with sparse parallax information. (a) Camera is translating to the right. The only static object with pure parallax motion is that of the tree. Ball is falling independently. The epipole may incorrectly be computed as e. The false epipole e is consistent with both motions. (b) The rigidity constraint applied to this scenario detects 3D inconsistency over three frames, since the relative-structure ratios of (8) obtained from the two frame pairs are not equal. In this case, even the signs do not match.

Fig. 7 shows an example of using the rigidity-based inconsistency measure described earlier to detect 3D inconsistencies. In this sequence the camera is in motion (translating from left to right), inducing parallax motion of different magnitudes on the house, road, and road sign. The car moves independently from left to right. The detected 2D planar motion was that of the house. The planar parallax motion was computed after 2D registration of the three images with respect to the house (see Fig. 7d). A single point on the road sign was selected as a point of reference (see Fig. 7e). Fig. 7f displays the measure of inconsistency of each point in the image with respect to the selected road sign point. Bright regions indicate large values when applying the inconsistency measure, i.e., violations in 3D rigidity detected over three frames with respect to the road sign point. The region which was detected as moving 3D-inconsistently with respect to the road sign point corresponds to the car. Regions close to the image boundary were ignored. All other regions of the image were detected as moving 3D-consistently with the road sign point. Therefore, assuming an uncalibrated camera, this method provides a mechanism for segmenting all nonzero residual motion vectors (after 2D planar stabilization) into groups moving consistently (in the 3D sense).

Fig. 8 shows another example of using the rigidity constraint (8) to detect 3D inconsistencies.
In this sequence the camera is mounted on a helicopter flying from left to right, inducing some parallax motion (of different magnitudes) on the house roof and trees (bottom of the image) and on the electricity poles (by the road). Three cars move independently on the road. The detected 2D planar motion was that of the ground surface (see Fig. 8d). A single point was selected on a tree as a point of reference (see Fig. 8e). Fig. 8f displays the measure of inconsistency of each point in the image with respect to the selected reference point. Bright regions indicate 3D-inconsistency detected over three frames. The three cars were detected as moving inconsistently with the selected tree point. Regions close to the image boundary were ignored. All other image regions were detected as moving consistently with the selected tree point.

The ability of the parallax rigidity constraint (8) to detect 3D-inconsistency with respect to a single point provides a natural way to bridge between 2D algorithms (which assume that any 2D motion different from the planar motion is an independently moving object) and 3D algorithms (which rely on having prior knowledge of a consistent set of points or, alternatively, dense parallax data).

5 CONCLUSION

Previous approaches to the problem of moving-object detection can be broadly divided into two classes: 2D algorithms, which apply when the scene can be approximated by a flat surface and/or when the camera is only undergoing rotations and zooms, and 3D algorithms, which work well only when significant depth variations are present in the scene and the camera is translating. These two classes of algorithms treat two extremes in a continuum of scenarios: no 3D parallax (2D algorithms) vs. dense 3D parallax (3D algorithms). Both classes fail on the other extreme case or even on the intermediate case (when 3D parallax is sparse relative to the amount of independent motion). In this paper, we have described a unified approach to handling moving-object detection in both 2D and 3D scenes, with a strategy to gracefully bridge the gap between those two extremes.
Our approach is based on a stratification of the moving object-detection problem into scenarios which gradually increase in their complexity. We presented a set of techniques that match the above stratification. These techniques progressively increase in their complexity, ranging from 2D techniques to more complex 3D techniques. Moreover, the computations required for the solution to the problem at one complexity level become the initial processing step for the solution at the next complexity level.

Fig. 7. Moving object detection relying on a single parallax vector. (a), (b), (c) Three image frames from a sequence obtained by a camera translating from left to right, inducing parallax motion of different magnitudes on the house, road, and road sign. The car moves independently from left to right. The middle frame (Fig. 7b) was chosen as the frame of reference. (d) Differences taken after 2D image registration. The detected 2D planar motion was that of the house, and is canceled by the 2D registration. All other scene parts that have different 2D motions (i.e., parallax motion or independent motion) are misregistered. (e) The selected point of reference (a point on the road sign) highlighted by a white circle. (f) The measure of 3D-inconsistency of all points in the image with respect to the road sign point. Bright regions indicate violations in 3D rigidity detected over three frames with respect to the selected road sign point. These regions correspond to the car. Regions close to the image boundary were ignored. All other regions of the image appear to move 3D-consistently with the road sign point.

Fig. 8. Moving object detection relying on a single parallax vector. (a), (b), (c) Three image frames from a sequence obtained by a camera mounted on a helicopter (flying from left to right while turning), inducing some parallax motion (of different magnitudes) on the house roof and trees (bottom of the image) and on the electricity poles (by the road). Three cars move independently on the road. The middle frame (Fig. 8b) was chosen as the frame of reference. (d) Differences taken after 2D image registration. The detected 2D planar motion was that of the ground surface and is canceled by the 2D registration. All other scene parts that have different 2D motions (i.e., parallax motion or independent motion) are misregistered. (e) The selected point of reference (a point on a tree at the bottom left of the image) highlighted by a white circle.
(f) The measure of 3D-inconsistency of each point in the image with the tree point. Bright regions indicate violations in 3D rigidity detected over three frames with respect to the selected tree point. These regions correspond to the three cars (in the reference image). Regions close to the image boundary were ignored. All other regions of the image appear to move 3D-consistently with the tree point.

The goal in taking this approach is to develop a strategy for moving object detection, so that the analysis performed is tuned to match the complexity of the problem and the availability of information at any time. This paper describes the core elements of such a strategy. The integration of these elements into a single algorithm remains a task for our future research.

APPENDIX A
DERIVATION OF THE PLANE + PARALLAX DECOMPOSITION

In this appendix, we rederive the decomposition of image motion into the image motion of a planar surface (a homography) and residual parallax displacements.

Let $P = (X, Y, Z)^T$ and $P' = (X', Y', Z')^T$ denote the Cartesian coordinates of a scene point with respect to two different camera views, respectively. An arbitrary 3D rigid coordinate transformation between $P$ and $P'$ can be expressed by:

$$P' = R P + T' \qquad (9)$$

where $R$ represents the rotation between the two camera coordinate systems, $T' = (T'_x, T'_y, T'_z)^T$ denotes the 3D translation in between the two views as expressed in the coordinate system of the second camera, and $T = (T_x, T_y, T_z)^T = -R^{-1} T'$ denotes the same quantity in the coordinate system of the first camera. Let $\Pi$ denote an arbitrary 3D planar surface (real or virtual). Let $N$ denote its normal as expressed in the coordinate system of the first camera, and $N'$ denote the same quantity in the coordinate system of the second camera. Any point $P \in \Pi$ satisfies the equation $N^T P = d_\pi$ (and similarly $N'^T P' = d'_\pi$). For a general scene point $P$:

$$N^T P = d_\pi + H; \qquad N'^T P' = d'_\pi + H \qquad (10)$$

where $H$ denotes the perpendicular distance of $P$ from the plane $\Pi$. Note that $H$ is invariant with respect to the two camera coordinate systems (see Fig. 3). By inverting (9), we obtain

$$P = R^{-1} P' - R^{-1} T' = R^{-1} P' + T \qquad (11)$$

From (10), we derive

$$\frac{N'^T P' - H}{d'_\pi} = 1 \qquad (12)$$

Substituting this in (11) obtains

$$P = R^{-1} P' + T \, \frac{N'^T P' - H}{d'_\pi} \qquad (13)$$

$$\;\; = \left( R^{-1} + \frac{T N'^T}{d'_\pi} \right) P' - \frac{H}{d'_\pi} T. \qquad (14)$$

Let $p = (x, y, 1)^T = \frac{1}{Z} K P$ and $p' = (x', y', 1)^T = \frac{1}{Z'} K' P'$ denote the images of the scene point $P$ in the two camera views as expressed in homogeneous coordinates. $K$ and $K'$ are $3 \times 3$ matrices representing the internal calibration parameters of the two cameras.
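In general $K$ has the following form [11]:

$$K = \begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & 1 \end{pmatrix}$$

Also, define $t = (t_x, t_y, t_z)^T = K T$. (Note that $(KP)_z = Z$, $(K'P')_z = Z'$, and $t_z = T_z$.) Multiplying both sides of (14) by $\frac{1}{Z'} K$ gives:

$$\frac{Z}{Z'} \, p = K \left( R^{-1} + \frac{T N'^T}{d'_\pi} \right) K'^{-1} p' - \frac{H}{Z' d'_\pi} \, t \qquad (15)$$

Hence,

$$p \cong A' p' - \frac{H}{Z' d'_\pi} \, t, \qquad (16)$$

where $\cong$ denotes equality up to an arbitrary scale. $A' = K \left( R^{-1} + \frac{T N'^T}{d'_\pi} \right) K'^{-1}$ is a $3 \times 3$ matrix which represents the coordinate transformation of the planar surface $\Pi$ between the two camera views, i.e., the homography between the two views due to the plane $\Pi$. Scaling both sides by their third component (i.e., projection) gives the equality:

$$p = \frac{A' p' - \frac{H}{Z' d'_\pi} t}{a'_3 p' - \frac{H T_z}{Z' d'_\pi}} \qquad (17)$$

$$\;\; = \frac{A' p'}{a'_3 p'} + \left( \frac{A' p'}{a'_3 p' - \frac{H T_z}{Z' d'_\pi}} - \frac{A' p'}{a'_3 p'} \right) - \frac{\frac{H}{Z' d'_\pi}}{a'_3 p' - \frac{H T_z}{Z' d'_\pi}} \, t \qquad (18)$$

$$\;\; = \frac{A' p'}{a'_3 p'} + \frac{\frac{H T_z}{Z' d'_\pi}}{a'_3 p' - \frac{H T_z}{Z' d'_\pi}} \, \frac{A' p'}{a'_3 p'} - \frac{\frac{H}{Z' d'_\pi}}{a'_3 p' - \frac{H T_z}{Z' d'_\pi}} \, t \qquad (19)$$

where $a'_3$ denotes the third row of the matrix $A'$. Moreover, by considering the third component of the vector equation (15), we obtain

$$\frac{Z}{Z'} = a'_3 p' - \frac{H T_z}{Z' d'_\pi}. \qquad (20)$$

Substituting this into (19), we obtain

$$p = \frac{A' p'}{a'_3 p'} + \frac{H}{Z} \frac{T_z}{d'_\pi} \frac{A' p'}{a'_3 p'} - \frac{H}{Z d'_\pi} \, t \qquad (21)$$

When $T_z \neq 0$, let $e = \frac{1}{T_z} t$ denote the epipole in the first image. Then,

$$p = \frac{A' p'}{a'_3 p'} + \frac{H}{Z} \frac{T_z}{d'_\pi} \left( \frac{A' p'}{a'_3 p'} - e \right) \qquad (22)$$

On the other hand, when $T_z = 0$, we obtain

$$p = \frac{A' p'}{a'_3 p'} - \frac{H}{Z d'_\pi} \, t. \qquad (23)$$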
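The chain (9) to (22) can be checked numerically. The sketch below is only a sanity check under assumed values: the calibration matrices, rotation angles, translation, plane, and scene point are all hypothetical, and the helper names are introduced for this example. It builds $A'$ as defined after (16), projects one scene point into both views, and compares $p$ with the right-hand side of (22). It uses $N' = R N$ and $d'_\pi = d_\pi + N'^T T'$, which follow from (9) and (10), and $R^{-1} = R^T$ for a rotation.

```python
import math


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inv_calibration(k):
    """Inverse of an upper-triangular matrix [[a, b, c], [0, d, e], [0, 0, 1]]."""
    a, b, c = k[0]
    d, e = k[1][1], k[1][2]
    return [[1.0 / a, -b / (a * d), (b * e - c * d) / (a * d)],
            [0.0, 1.0 / d, -e / d],
            [0.0, 0.0, 1.0]]


def rotation_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def check_decomposition():
    """Return (p, right-hand side of (22)) for one hypothetical configuration."""
    K = [[800.0, 2.0, 320.0], [0.0, 790.0, 240.0], [0.0, 0.0, 1.0]]
    Kp = [[650.0, 0.0, 300.0], [0.0, 655.0, 250.0], [0.0, 0.0, 1.0]]
    R = matmul(rotation_y(0.1), rotation_x(-0.05))
    Tp = [0.3, -0.1, 0.5]                    # T' in (9), second-camera coordinates
    N, d = [0.0, 0.6, 0.8], 10.0             # plane N^T P = d_pi in the first camera
    P = [1.0, -2.0, 8.0]                     # scene point, first-camera coordinates

    R_inv = transpose(R)
    T = [-x for x in matvec(R_inv, Tp)]      # T = -R^{-1} T'
    Np = matvec(R, N)                        # plane normal in the second camera
    d_p = d + dot(Np, Tp)                    # d'_pi
    Pp = [a + b for a, b in zip(matvec(R, P), Tp)]          # (9)
    H = dot(N, P) - d                        # (10)

    M = [[R_inv[i][j] + T[i] * Np[j] / d_p for j in range(3)] for i in range(3)]
    A = matmul(matmul(K, M), inv_calibration(Kp))           # homography A'

    p = [x / P[2] for x in matvec(K, P)]
    pp = [x / Pp[2] for x in matvec(Kp, Pp)]
    Ap = matvec(A, pp)
    p_w = [x / Ap[2] for x in Ap]            # A'p' / (a'_3 p')
    gamma = H / P[2]
    Tz = T[2]
    e = [x / Tz for x in matvec(K, T)]       # epipole, valid since T_z != 0 here
    rhs = [p_w[i] + gamma * Tz / d_p * (p_w[i] - e[i]) for i in range(3)]
    return p, rhs


if __name__ == "__main__":
    p, rhs = check_decomposition()
    print("p              :", p)
    print("rhs of eq. (22):", rhs)
```

For this configuration the two printed vectors should agree up to floating-point rounding; a disagreement would point to a sign or index error in one of the intermediate steps.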

The point denoted by the vector $\frac{A' p'}{a'_3 p'}$ is of special interest, since it represents the location to which the point $p'$ is transformed due to the homography $A'$. In Fig. 3, this is denoted as the point $p_w$. Also, we define $\gamma = \frac{H}{Z}$, which is the 3D projective structure of $P$ with respect to the planar surface $\Pi$. Substituting these into (22) and (23) yields: when $T_z \neq 0$:

$$p = p_w + \gamma \frac{T_z}{d'_\pi} (p_w - e) \qquad (24)$$

and when $T_z = 0$:

$$p = p_w - \frac{\gamma}{d'_\pi} \, t \qquad (25)$$

Rewriting (24) in the form of image displacements yields (in homogeneous coordinates):

$$p' - p = (p' - p_w) - \gamma \frac{T_z}{d'_\pi} (p_w - e). \qquad (26)$$

Define $u = p' - p = (u, v, 0)^T$, where $(u, v)$ is the measurable 2D image displacement vector of the image point $p$ between the two frames. Similarly, define $u_\pi = p' - p_w = (u_\pi, v_\pi, 0)^T$ and $\mu = \gamma \frac{T_z}{d'_\pi} (e - p_w) = (\mu_x, \mu_y, 0)^T$. Hence,

$$u = u_\pi + \mu, \qquad (27)$$

where $u_\pi$ denotes the planar part of the 2D image displacement (i.e., the homography due to $\Pi$), and $\mu$ denotes the residual parallax 2D displacement. When $T_z = 0$ then, from (25): $\mu = \frac{\gamma}{d'_\pi} t$.

APPENDIX B
THE PARALLAX-BASED SHAPE CONSTRAINT

In this appendix, we prove Theorem 1, i.e., we derive (7). Let $\mu_1$ and $\mu_2$ be the planar-parallax displacement vectors of two points that belong to the static background. From (6), we know that

$$\mu_1 = \gamma_1 \frac{T_z}{d'_\pi} (e - p_{w_1}); \qquad \mu_2 = \gamma_2 \frac{T_z}{d'_\pi} (e - p_{w_2}). \qquad (28)$$

Therefore,

$$\mu_1 \gamma_2 - \mu_2 \gamma_1 = \gamma_1 \gamma_2 \frac{T_z}{d'_\pi} (p_{w_2} - p_{w_1}). \qquad (29)$$

This last step eliminated the epipole $e$. Equation (29) entails that the vectors on both sides of the equation are parallel. Since $\gamma_1 \gamma_2 \frac{T_z}{d'_\pi}$ is a scalar, we get: $(\mu_1 \gamma_2 - \mu_2 \gamma_1) \parallel \Delta p_w$, where $\Delta p_w = p_{w_2} - p_{w_1}$. This leads to the pairwise parallax constraint

$$(\mu_1 \gamma_2 - \mu_2 \gamma_1)^T (\Delta p_w)_\perp = 0, \qquad (30)$$

where $v_\perp$ signifies a vector perpendicular to $v$. When $T_z = 0$, a constraint stronger than (30) can be derived: $\mu_1 \gamma_2 - \mu_2 \gamma_1 = 0$; however, (30) still holds. This is important, as we do not have a priori knowledge of $T_z$ to distinguish between the two cases. From (30), we can easily derive:

$$\frac{\gamma_2}{\gamma_1} = \frac{\mu_2^T (\Delta p_w)_\perp}{\mu_1^T (\Delta p_w)_\perp},$$

which is the same as (7) of Theorem 1.

ACKNOWLEDGMENTS

This work was done while the authors were at Sarnoff Corporation, Princeton, N.J. It was supported in part by DARPA under contract DAAA15-93-C-0061.

REFERENCES

[1] E.H. Adelson, "Layered Representations for Image Coding," Technical Report 181, MIT Media Lab, Vision and Modeling Group, Dec. 1991.
[2] G. Adiv, "Determining Three-Dimensional Motion and Structure From Optical Flow Generated by Several Moving Objects," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, no. 4, pp. 384-401, July 1985.
[3] G. Adiv, "Inherent Ambiguities in Recovering 3D Motion and Structure From a Noisy Flow Field," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, pp. 477-489, May 1989.
[4] Y. Aloimonos, ed., Active Perception. Erlbaum, 1993.
[5] S. Ayer and H. Sawhney, "Layered Representation of Motion Video Using Robust Maximum-Likelihood Estimation of Mixture Models and MDL Encoding," Int'l Conf. Computer Vision, pp. 777-784, Cambridge, Mass., June 1995.
[6] J.R. Bergen, P. Anandan, K.J. Hanna, and R. Hingorani, "Hierarchical Model-Based Motion Estimation," European Conf. Computer Vision, pp. 237-252, Santa Margherita Ligure, May 1992.
[7] J.R. Bergen, P.J. Burt, R. Hingorani, and S. Peleg, "A Three-Frame Algorithm for Estimating Two-Component Image Motion," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 886-896, Sept. 1992.
[8] P.J. Burt, R. Hingorani, and R.J. Kolczynski, "Mechanisms for Isolating Component Patterns in the Sequential Analysis of Multiple Motion," IEEE Workshop Visual Motion, pp. 187-193, Princeton, N.J., Oct. 1991.
[9] J. Costeira and T. Kanade, "A Multi-Body Factorization Method for Motion Analysis," Int'l Conf. Computer Vision, pp. 1,071-1,076, Cambridge, Mass., June 1995.
[10] T. Darrell and A. Pentland, "Robust Estimation of a Multi-Layered Motion Representation," IEEE Workshop Visual Motion, pp. 173-178, Princeton, N.J., Oct. 1991.
[11] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, Mass.: MIT Press, 1993.
[12] M. Irani and P. Anandan, "Parallax Geometry of Pairs of Points for 3D Scene Analysis," European Conf. Computer Vision, Cambridge, UK, Apr. 1996.
[13] M. Irani and P. Anandan, "A Unified Approach to Moving Object Detection in 2D and 3D Scenes," 13th Int'l Conf. Pattern Recognition, pp. 712-717, Vienna, Austria, Aug. 1996.

[14] M. Irani, B. Rousso, and S. Peleg, "Computing Occluding and Transparent Motions," Int'l J. Computer Vision, vol. 12, pp. 5-16, Feb. 1994.
[15] M. Irani, B. Rousso, and S. Peleg, "Recovery of Egomotion Using Region Alignment," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 268-272, Mar. 1997.
[16] S. Ju, M.J. Black, and A.D. Jepson, "Multi-Layer, Locally Affine Optical Flow and Regularization With Transparency," Proc. IEEE CVPR96, pp. 307-314, 1996.
[17] J.J. Koenderink and A.J. van Doorn, "Representation of Local Geometry in the Visual System," Biol. Cybern., vol. 55, pp. 367-375, 1987.
[18] R. Kumar, P. Anandan, and K. Hanna, "Direct Recovery of Shape From Multiple Views: A Parallax Based Approach," Proc. 12th ICPR, 1994.
[19] R. Kumar, P. Anandan, and K. Hanna, "Shape Recovery From Multiple Views: A Parallax Based Approach," DARPA IU Workshop, Monterey, Calif., Nov. 1994.
[20] R. Kumar, P. Anandan, M. Irani, J.R. Bergen, and K.J. Hanna, "Representation of Scenes From Collections of Images," Workshop on Representations of Visual Scenes, 1995.
[21] J.M. Lawn and R. Cipolla, "Robust Egomotion Estimation From Affine Motion Parallax," European Conf. Computer Vision, pp. 205-210, May 1994.
[22] H.C. Longuet-Higgins, "Visual Ambiguity of a Moving Plane," Proc. Royal Soc. London, Series B, vol. 223, pp. 165-175, 1984.
[23] H.C. Longuet-Higgins and K. Prazdny, "The Interpretation of a Moving Retinal Image," Proc. Royal Soc. London, Series B, vol. 208, pp. 385-397, 1980.
[24] F. Meyer and P. Bouthemy, "Region-Based Tracking in Image Sequences," European Conf. Computer Vision, pp. 476-484, Santa Margherita Ligure, May 1992.
[25] J.H. Rieger and D.T. Lawton, "Processing Differential Image Motion," J. Optical Soc. Am. A, vol. A2, no. 2, pp. 354-359, 1985.
[26] H. Sawhney, "3D Geometry From Planar Parallax," IEEE Conf. Computer Vision and Pattern Recognition, June 1994.
[27] A. Shashua and N. Navab, "Relative Affine Structure: Theory and Application to 3D Reconstruction From Perspective Views," IEEE Conf. Computer Vision and Pattern Recognition, pp. 483-489, Seattle, Wash., June 1994.
[28] M. Shizawa and K. Mase, "Principle of Superposition: A Common Computational Framework for Analysis of Multiple Motion," IEEE Workshop Visual Motion, pp. 164-172, Princeton, N.J., Oct. 1991.
[29] W.B. Thompson and T.C. Pong, "Detecting Moving Objects," Int'l J. Computer Vision, vol. 4, pp. 29-57, 1990.
[30] P.H.S. Torr and D.W. Murray, "Stochastic Motion Clustering," European Conf. Computer Vision, pp. 328-337, May 1994.
[31] P.H.S. Torr, A. Zisserman, and S.J. Maybank, "Robust Detection of Degenerate Configurations for the Fundamental Matrix," Int'l Conf. Computer Vision, pp. 1,037-1,042, Cambridge, Mass., June 1995.
[32] B.C. Vemuri, S. Huang, and S. Sahni, "A Robust and Efficient Algorithm for Image Registration," Proc. 15th Int'l Conf. Information Processing in Medical Imaging, Poultney, Vt., pp. 465-470, 1997.
[33] J. Wang and E. Adelson, "Layered Representation for Motion Analysis," IEEE Conf. Computer Vision and Pattern Recognition, pp. 361-366, New York, June 1993.

Michal Irani received the BSc degree in mathematics and computer science from the Hebrew University of Jerusalem, Israel, in 1985; the MSc and PhD in computer science from the Hebrew University of Jerusalem in 1989 and 1994, respectively. During 1993-1996, she was a member of the technical staff in the Vision Technologies Laboratory at David Sarnoff Research Center (SRI), Princeton, N.J. Dr. Irani is now a member of the faculty of the Applied Math and Computer Science department at the Weizmann Institute of Science, Israel.

P. Anandan obtained his PhD in computer science from the University of Massachusetts, Amherst, Mass., in 1987. During 1987-1991, he was an assistant professor of computer science at Yale University, New Haven, Conn., and during 1990-1997 he was at the David Sarnoff Research Center, Princeton, N.J. He is currently a senior researcher at Microsoft Corporation, Redmond, Wash. His research interests include computer vision with an emphasis on motion analysis and video processing, vision for robotics, and learning.