Motion Tracking and Event Understanding in Video Sequences

Isaac Cohen, Elaine Kang, Jinman Kang
Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA

Objectives
- Infer interesting events in a video. Examples: people meeting, exchanging objects, gesturing.
- Events may take place over a range of time scales and require object recognition.
- Provide a convenient form to define events: an event recognition language (ERL) that can be compiled automatically to produce recognition programs.
- Compute a structured representation of video: events and objects, and the spatial and temporal relations between them.

Example Video

Video Description
- Descriptions are useful for query, annotation, and compression
- Symbolic descriptions
  - Events: name, actors (objects), reference objects, duration, place
  - Sub-events and relations between them
- Object descriptions
  - Trajectory, shape, appearance
  - Background objects

Example Video: video with inferred annotations

Challenges
- Translation from signals to semantic information: signals are ambiguous (a one-to-many mapping)
- Image sequence analysis: detection and tracking of moving objects
- Generic and specific object recognition: object appearances change with many variables (viewpoint, illumination, occlusion, clothing, ...)
- Inference of events from object trajectories and identities

Topics
- Moving blob detection and tracking
  - Static and moving cameras
  - Perceptual grouping for tracking
- Detection and tracking of objects
  - Segmentation of blobs into objects
  - Tracking of articulated objects
- Event recognition
  - Definition, compilation, computation
- Miscellaneous
  - Activities, evaluations, future plans

Motion Detection (Static Cameras)
- Construct an adaptive model of the background
  - Each pixel is modeled as a multi-dimensional Gaussian distribution in color space
  - Updated when new pixel values are observed
- Extract the foreground
  - A pixel is foreground if it is sufficiently different from the current background model
  - Marked pixels are grouped into connected components

Background Learning
- Maintain an adaptive background model for the entire region of awareness
- Model each pixel as a multi-dimensional Gaussian distribution (µ, σ) in RGB space
- Update the background when a new pixel value x is observed, with learning rate α:

  µ ← α x + (1 − α) µ
  σ² ← max( α (x − µ)² + (1 − α) σ², σ²_min )
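
A minimal sketch of this running-Gaussian update in Python/NumPy, assuming per-pixel mean and variance arrays; the function name, the learning-rate value, and the variance floor var_min are illustrative choices, not taken from the talk:

```python
import numpy as np

def update_background(mu, var, frame, alpha=0.05, var_min=25.0):
    """One step of the per-pixel running Gaussian update sketched above.

    mu, var : background mean and variance, float arrays of shape (H, W, 3)
    frame   : new RGB frame, same shape
    alpha   : learning rate
    var_min : floor on the variance so the model never becomes degenerate
    """
    mu_new = alpha * frame + (1.0 - alpha) * mu
    var_new = alpha * (frame - mu_new) ** 2 + (1.0 - alpha) * var
    return mu_new, np.maximum(var_new, var_min)
```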

Foreground Extraction
- Detect changes produced by occluding elements
- Compare observed pixel values to the current distribution
- A new pixel observation x is marked as foreground if: (x − µ)² > (2σ)²
- Group foreground pixels into connected components, giving a layered description (sprites)
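
A matching sketch of the foreground test, with the same illustrative naming; k = 2 reproduces the (x − µ)² > (2σ)² rule above:

```python
import numpy as np

def foreground_mask(mu, var, frame, k=2.0):
    """Mark a pixel foreground if it deviates from the background Gaussian
    by more than k standard deviations in any channel: (x - mu)^2 > (k*sigma)^2."""
    deviates = (frame - mu) ** 2 > (k ** 2) * var   # per-channel test
    return deviates.any(axis=-1)                    # (H, W) boolean mask
```

The marked pixels can then be grouped into connected components, for instance with scipy.ndimage.label(mask).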

Detection Example. Note: blobs may not correspond to single objects.

Detection (Moving Camera)
- Compensate for the camera's motion by an affine transformation of images
  - Multi-resolution, robust feature-based matching
  - Accurate when depth variations in the background and the image area of moving objects are small
- Detect moving objects by
  - Residual motion
  - Color-based classification

Parameter Recovery
- Recover the 6 affine parameters (a, b, c, d, t_x, t_y) by solving a set of linear equations
- Feature-based approach:
  - Feature points are extracted by a corner detector
  - RANSAC for outlier removal
  - Linear least squares using the inliers
- Affine model:

  (x', y')ᵀ = [ a b ; c d ] (x, y)ᵀ + (t_x, t_y)ᵀ

- Each correspondence (x_i, y_i) ↔ (x'_i, y'_i) contributes two linear equations:

  [ x_i  y_i  1   0    0   0 ]
  [ 0    0    0  x_i  y_i  1 ]  (a, b, t_x, c, d, t_y)ᵀ = (x'_i, y'_i)ᵀ
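
A sketch of the linear least squares step, assuming correspondences are given as (N, 2) arrays; the helper name estimate_affine is mine, and the parameter ordering (a, b, t_x, c, d, t_y) follows the stacked system above:

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares estimate of (a, b, tx, c, d, ty) from correspondences
    src[i] -> dst[i]; each pair contributes the two rows shown on the slide."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    rhs = dst.reshape(-1)        # [x'_0, y'_0, x'_1, y'_1, ...]
    A[0::2, 0:2] = src           # rows for x': a*x + b*y + tx = x'
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src           # rows for y': c*x + d*y + ty = y'
    A[1::2, 5] = 1.0
    a, b, tx, c, d, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return a, b, tx, c, d, ty
```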

Panning Camera: mosaic shown from the first camera viewpoint

Hand-held Camera

Limitations
- Sudden illumination changes: the layer model is learnt over a sliding window
- Fragmented detection: neighborhood properties are not considered
- Mis-registration of images
  - Caused by large depth variations in the background and a large image area of moving objects
  - Addressed by registration in the joint image space

Outlier Detection
- e-RANSAC
  - RANSAC (Fischler and Bolles) is a standard tool for robust parameter estimation
  - Enhanced by controlling feature sampling
- Tensor voting
  - General-purpose grouping methodology
  - Encodes local properties and their uncertainties as tensors
  - Propagates local estimates to neighbors
  - Propagated information is combined as a weighted tensor addition
  - Provides a saliency map; outliers have low saliency
  - Used to group points lying in salient planes in the joint image space
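
For context, a minimal baseline RANSAC over the affine model, reusing the estimate_affine sketch above; the sample size of 3, the iteration count, and the pixel threshold are illustrative, and the controlled sampling of e-RANSAC is not modeled here:

```python
import numpy as np

def ransac_affine(src, dst, n_iter=500, thresh=3.0, rng=None):
    """Plain RANSAC: sample the 3 correspondences that determine an affine
    map, count inliers by reprojection error, keep the largest consensus set."""
    if rng is None:
        rng = np.random.default_rng()
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 3, replace=False)
        a, b, tx, c, d, ty = estimate_affine(src[idx], dst[idx])
        pred = src @ np.array([[a, c], [b, d]]) + np.array([tx, ty])
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() >= 3:   # refit on all inliers with least squares
        return estimate_affine(src[best_inliers], dst[best_inliers]), best_inliers
    return None, best_inliers
```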

Robust Affine Motion Estimation in Joint Image Space
- Parametric motion estimation is widely used for video processing: image mosaics, compression, and surveillance
- The affine motion model (six parameters) is commonly used due to its simplicity and the small inter-frame camera motion
- Issues
  - Correspondence-based parameter estimation often fails in the presence of many mismatches and multiple motions
  - Robustness of the estimation relies on outlier removal

Goal and Approach
- Goal: outlier removal for robust parameter estimation
- Approach
  - Represent correspondences in decoupled joint image spaces
  - Analyze the metric of the affine parameters in the defined space
  - Tensor voting-based outlier removal
  - Direct parameter estimation from the correlation matrix of the inliers

Illustration of Our Approach

Pipeline: Initial correspondences → Tensor encoding in joint image space → Second-order sparse tensor voting → Inlier detection and grouping → Parameter recovery → Affine parameters

- Initial correspondences are induced by the affine motions
- Viewed in 3D, affine motions form planes in the decoupled joint image space

Affine Joint Image Space
- Joint image space: q = (x, y, x', y')ᵀ pairs a point (x, y) with its correspondence (x', y')
- The affine transformation P (x, y)ᵀ + t = (x', y')ᵀ, with P = [ a b ; c d ] and t = (t_x, t_y)ᵀ, can be written in the joint image space as

  (qᵀ 1) C = 0

  where the nonzero columns of C vertically stack Pᵀ, −I₂, and tᵀ
- The equation defines a quadric in the 4-dimensional joint image space (x, y, x', y')
- The 5×5 matrix C has rank 2

Decoupled Affine Joint Image Space (1/2)
- The parameters (a, b, t_x) and (c, d, t_y) are independent
- We can therefore decouple the joint image space (x, y, x', y') into two spaces defined by (x, y, x') and (x, y, y')
- Benefits
  - Dimension reduction
  - Isotropic and orthogonal spaces
  - Allows the affine constraint to be enforced during tensor voting

Decoupled Affine Joint Image Space (2/2)
- Points and parameter vectors in the decoupled spaces:

  q_x = (x, y, x')ᵀ  with  p_x = (a, b, −1, t_x)
  q_y = (x, y, y')ᵀ  with  p_y = (c, d, −1, t_y)

- Affine transformation equations:  (q_xᵀ 1) p_x = 0  and  (q_yᵀ 1) p_y = 0
- Each space consists of all points (q_x, 1) or (q_y, 1)
- All points (q, 1)ᵀ lie on a 2D plane parameterized by p_x or p_y
- If a correspondence is correct, it lies on such a 2D plane; outlier/inlier detection thus amounts to extracting the points on a plane
- Properties of a plane:
  - (a, b, −1) and (c, d, −1) define the orientation of the plane
  - t_x and t_y define the perpendicular distance between the plane and the origin of the space
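
A small numerical check of this plane constraint, with a made-up affine motion; it verifies that every correct correspondence satisfies (q_xᵀ 1) · p_x = 0 in the decoupled space:

```python
import numpy as np

# Hypothetical affine motion and a few point correspondences.
a, b, c, d, tx, ty = 0.9, 0.1, -0.1, 0.9, 5.0, -3.0
pts = np.array([[10.0, 20.0], [40.0, 15.0], [25.0, 60.0]])
x_ = a * pts[:, 0] + b * pts[:, 1] + tx
y_ = c * pts[:, 0] + d * pts[:, 1] + ty

# Points in the decoupled space, with the homogeneous 1 appended.
qx = np.column_stack([pts, x_, np.ones(len(pts))])   # rows (x, y, x', 1)
px = np.array([a, b, -1.0, tx])                      # plane parameters

# Every correct correspondence satisfies the plane equation exactly.
print(qx @ px)   # -> [0. 0. 0.] up to rounding
```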

Generic Tensor Voting (1/2)
- Data representation: second-order symmetric tensors
  - Shape encodes orientation certainty; size encodes feature saliency
- In 3D
  - 3 eigenvalues (λ_max ≥ λ_mid ≥ λ_min)
  - 3 eigenvectors (e_max, e_mid, e_min)
- Surface extraction: saliency is λ_max − λ_mid, the normal is e_max

Generic Tensor Voting (2/2)
- Communication: voting; non-iterative, no initialization
- Constraint representation: voting fields
  - Tensor fields encode smoothness criteria
- Each input site propagates its information in a neighborhood
- Each site collects the propagated information

Initial Tensor Encoding
- Initial correspondences: Harris corners and cross-correlation
- Stick tensor encoding uses prior knowledge and assumptions about the normal direction of each plane: n_x = (a, b, −1), n_y = (c, d, −1)
- Assuming a = 1, b = 0, c = 0, d = 1, t_x = (x − x'), and t_y = (y − y'), the normal direction of each plane is initialized to n_x = (1, 0, −1) and n_y = (1, 0, −1)
- This gives a preferred normal direction for each plane and prevents the generation of unnecessary votes
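
A one-line sketch of the stick tensor itself (the outer product n nᵀ of the unit normal), with the identity-motion initialization named on the slide; the helper name is mine:

```python
import numpy as np

def stick_tensor(normal):
    """Second-order stick tensor n n^T for a plane hypothesis with unit
    normal n: its single non-zero eigenvalue lies along the normal, so an
    eigen-decomposition later recovers both orientation and saliency."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return np.outer(n, n)

# Identity-motion prior from the slide: both planes start with n = (1, 0, -1).
T0 = stick_tensor([1.0, 0.0, -1.0])
```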

Plane Extraction by Tensor Voting
- Vote with a planar field for plane extraction
- First sparse voting pass
  - Extract the normal direction of the 2D plane, encoded by e_1 with saliency λ_1 − λ_2
  - Remove random correspondences
- Second sparse voting pass
  - Enforce the normal direction extracted by the first pass

Inlier Detection & Grouping
- Inliers: points whose saliencies are higher than a threshold (the median)
- Two inlier grouping criteria: the normal direction e_1 and a locally smooth displacement (x − x') or (y − y')
- Parameter estimation, for each set of grouped inliers:
  - Compute the correlation matrices M_x and M_y
  - To estimate the parameters a, b, t_x, use the eigenvector corresponding to the smallest eigenvalue of M_x (and similarly for M_y)
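
A sketch of that last step, assuming the grouped inliers are given as rows (x, y, x', 1): the smallest eigenvector of the correlation matrix M_x is a total-least-squares estimate of the plane, rescaled so its third entry is −1 as in p_x = (a, b, −1, t_x). The function name is illustrative:

```python
import numpy as np

def recover_px(qx):
    """Recover p_x = (a, b, -1, tx) from inlier points qx of shape (N, 4):
    the eigenvector of M_x = qx^T qx with the smallest eigenvalue is the
    plane's homogeneous normal, up to scale."""
    Mx = qx.T @ qx
    w, V = np.linalg.eigh(Mx)   # eigenvalues in ascending order
    p = V[:, 0]                 # smallest-eigenvalue eigenvector
    return p / -p[2]            # rescale so the third entry is -1
```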

Experimental Results (1/2)
- Motion 1: foreground motion with rotation and translation
- Motion 2: conflicting foreground motion with translation
- Motion 3: transparent background motion with translation
- Motion 4: random motion

Experimental Results (2/2)

Result (1/3): input frames; initial correlation-based correspondences; residuals by RANSAC (top) and by our method (bottom)

Result (2/3): input frames; initial correlation-based correspondences; residuals by RANSAC (top) and by our method (bottom)

Result (3/3): input video sequence; stabilization using our method

Closer Look: input frames; accumulated residuals by our method; accumulated residuals by RANSAC

Summary and Future Work
- Robust affine parameter estimation in the presence of many mismatches
  - Representation of correspondences in a decoupled joint image space
  - Outlier removal using tensor voting
- Detect multiple motion layers by extracting multiple planes
- Future work: extension to the eight-parameter projective case
  - Analyze the geometric structure of the projective transformation in the joint image space

Detection of Moving Objects
Two-step approach:
- Image stabilization to compensate for the motion induced by the observer
- Moving region detection using residual flow in two consecutive frames

Detection of Moving Objects
- Layered representation of scenes
  - Static component of the scene: background layer
  - Moving objects: foreground layers
- Detecting moving objects
  - Color-based and pixel-based temporal distribution
  - Color model: RGB Gaussian pixel-based model
- Layer extraction
  - Background layer: µ
  - Foreground layers: pixels outside the 2σ range

Vehicle Detection: image sequence and detected moving objects

Panning Camera

Hand-held Camera

Entourage Results: mosaic and foreground regions

Summary: Moving Blob Detection
- Static cameras: the background learning method is effective but can be affected by rapid illumination changes, common in some indoor videos
- Moving cameras: ego-motion computation is effective when scene objects are far away or the camera motion has no translation

Tracking Moving Regions
- Tracking is performed by establishing correspondence between detected regions
  - Region similarity approach
  - Tensor voting perceptual grouping approach
  - Using geometric context

Tracking Moving Regions
- Detection is performed using two consecutive frames
- Tracking is performed by establishing correspondence between these regions
  - Template inference
  - Temporal coherence
  - Temporal integration

Graph Representation of Moving Regions
- A node: a detected moving region
- An edge: a possible match between two regions

Graph Construction

Object Trajectories
- Tracking is performed by establishing correspondences between these regions
  - Template inference
  - Temporal coherence (> 5 frames)
- Implemented as a graph search:
  - Node: a moving region
  - Edge: a link between matched regions

Object Trajectories
- Extract the optimal path along each of the graph's connected components
  - Multiple object tracking
  - Temporal integration
- Optimality criterion:
  - Associate a cost to each edge
  - Characterize optimal paths

Optimality Criterion
- Associate a cost to each edge:

  c_ij = C_ij / (1 + d_ij²)

- Similarity between the regions: correlation C_ij (shape, gray level or color distribution, ...)
- Proximity between the regions: distance d_ij
- Local optimum

Optimality Criterion: Temporal Integration
- Characterize each node by its length, the length of the maximal path ending at this node:

  l_i = max { l_j : j ∈ successor(i) } + 1

- Associate to each edge the cost:

  Γ_ij = l_j · c_ij
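
A compact sketch of these two formulas over a time-ordered correspondence graph; the data layout (successor lists, nodes sorted by frame) is an assumption made for illustration:

```python
def edge_cost(corr, dist):
    """c_ij = C_ij / (1 + d_ij^2): higher region correlation and closer
    regions give a larger cost, i.e. a more attractive match."""
    return corr / (1.0 + dist ** 2)

def path_lengths(successors, nodes_by_time):
    """l_i = max{ l_j : j in successor(i) } + 1, swept backwards in time.
    Gamma_ij = l_j * edge_cost(...) then favors edges extending long tracks."""
    length = {}
    for node in reversed(nodes_by_time):   # reverse temporal order
        succ = successors.get(node, [])
        length[node] = max((length[j] for j in succ), default=0) + 1
    return length

# Tiny example: a three-frame chain A1 -> B1 -> C1.
succ = {"A1": ["B1"], "B1": ["C1"], "C1": []}
print(path_lengths(succ, ["A1", "B1", "C1"]))   # {'C1': 1, 'B1': 2, 'A1': 3}
```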

Vehicle Tracking: image sequence and vehicle trajectory

Human Tracking: object trajectories

Convoy Tracking

Region Similarity Approach
- Regions are matched based on similarities: gray level distributions, distances
- Graph representation of moving regions
  - Node: a detected moving region
  - Edge: a possible match between two regions
- Inference of the median shape from the graph: temporal coherence
- Extract optimal paths along connected components of the graph: multiple object tracking

Multi-Object Tracking
- The graph description is well suited for:
  - A complete description of moving object properties
  - Identifying temporal changes of objects, such as occlusions
  - Tracking an arbitrary number of moving objects

Multi-object Tracking

Tracking by Similarity Graph
- Detected regions are matched based on similarities: gray level distributions, distances in the image
- Graph representation of matched regions
  - Node: a detected moving region
  - Edge: a possible match between two regions
- Extract optimal paths along connected components of the graph
- Allows for multiple object tracking

Tracking Moving Blobs
- Blobs split and merge due to occlusion or similarity with the background, and noise blobs appear
- Region similarities and trajectory smoothness can help infer good trajectories
- Perceptual grouping using the graph representation and tensor voting

Some Tracking Problems
- Input: a set of moving regions, with noise, crossings, occlusions, and bad detections
- Output: identification and tracking of the objects

2D+t Representation

Tracking by Tensor Voting
- Propagate motion and shape information into uncertainty regions around paths in the graph
- Accumulate votes by the tensor voting process
- Includes a process for merging and splitting blobs to yield smooth trajectories

Grouping Example

Multi-Scale Processing
- Gap completion depends on the scale of the voting field (σ)
- A larger σ provides more filtering and continuity, but is more expensive and the tracks are less precise
- (a) σ = 10: filtering; (b) σ = 20: merging
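
For intuition, a sketch of the usual tensor voting decay, exp(−(s² + cκ²)/σ²), with arc length s and curvature κ; the constant c is implementation-dependent, and this is only meant to show how σ widens the voting neighborhood, not to reproduce the talk's exact field:

```python
import numpy as np

def vote_strength(s, kappa, sigma, c=0.1):
    """Decay of a tensor voting field: s is the arc length from voter to
    receiver, kappa the curvature of the connecting arc, sigma the scale.
    A larger sigma lets each token reach (and thus bridge) farther gaps, at
    the cost of more computation and smoother, less precise tracks."""
    return np.exp(-(s ** 2 + c * kappa ** 2) / sigma ** 2)

# Reach of a vote across a 15-pixel gap, for the two scales on the slide:
for sigma in (10.0, 20.0):
    print(sigma, vote_strength(s=15.0, kappa=0.0, sigma=sigma))
```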

Multiple Object Tracking

Tracking with Multiple Cameras
- Motivation: multiple cameras can provide a larger field of view and reduce the effects of occlusion
- Issues
  - Registration of multiple views
  - Calibration of the different cameras
  - Synchronization of the different video streams (different frame rates)
  - Grouping of trajectories across views

Approach
- Infer trajectories in each camera view: tracking by tensor voting
- Register camera views using a homography (using the ground plane)
  - Misalignment can occur if trajectories are not on the ground plane or the streams are not synchronized
- Synchronize the multiple video streams
- Register and merge trajectories in 2D+t using a homography
- The current implementation is not real-time (~1 fps)

Two Video Streams: two partially overlapping views

Registration of Multiple Cameras
- Registration of the views using the ground plane
- Uses a projective transform (homography)
- Requires at least 4 pairs of corresponding points on the ground plane
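
A minimal direct linear transform (DLT) sketch for recovering such a homography from four or more ground-plane correspondences; this is a generic textbook method under my own naming, not necessarily the exact estimator used in the talk:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src (homogeneous) from
    >= 4 point pairs: stack two linear constraints per pair and take the
    null vector of the system via SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)     # right singular vector of smallest value
    return H / H[2, 2]           # normalize so H[2,2] = 1
```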

Camera Registration: homographies ¹H₂ and ²H₁ map between view 1 and view 2

Registration of Trajectories
- A homography is used for registering the camera views
- Misalignment can occur if:
  - Trajectories are not on the ground plane
  - Input streams are not synchronized
  - Shutter speeds of the cameras are different
- Object trajectories are therefore used for registering the views
  - Video synchronization using the observed trajectories
  - Align object trajectories

Registration of Video Streams

Projective Registration: registered trajectories and registered views

Reverse View: registered views and registered trajectories

Another Example

Projective Registration: registered trajectories and registered views

Reverse View: registered views and registered trajectories

Tracking Using Multiple Cameras: Summary
- Ability to merge trajectories across views
  - Reduces the effects of occlusions
  - Increases the field of view
- Speed: 1 to 1.5 frames per second
- Future extensions
  - Inference of the 3D tracks
    - Use camera calibration from vanishing points
    - Use multi-view geometry
  - Automatic synchronization of trajectories
  - Accommodate different shutter speeds

Summary: Blob Tracking
- Real-time tracking with a hierarchical approach and graph-based methods
  - Region similarity is exploited, but not continuity
- Perceptual grouping using tensor voting: slower, but better tracks
- Merging multiple camera tracks is at an early development stage
- Future work
  - Improve the efficiency of perceptual grouping
  - Develop an adaptive multi-scale approach
  - Integrated detection and tracking from multiple cameras

Essay #1
We have presented two detection algorithms:
- Background learning method for stationary cameras
- Residual motion for moving platforms
Describe these algorithms and suggest an approach for combining the two (i.e., a background learning method for moving platforms).

Essay #2
Various methods are used for estimating camera motion: direct parameter estimation, RANSAC. Can you describe the joint image space algorithm and explain why registering two images amounts to identifying planar patches in the joint image space?

Essay #3
Describe the tracking algorithms presented:
- Graph-based blob tracking
- Perceptual grouping-based tracking
Explain the advantages of combining both for tracking moving objects in a scene.