Motion Detection and Segmentation Using Image Mosaics


Research Showcase @ CMU
Institute for Software Research, School of Computer Science
2000

Motion Detection and Segmentation Using Image Mosaics
Kiran S. Bhat, Mahesh Saptharishi, Pradeep Khosla

Follow this and additional works at: http://repository.cmu.edu/isr

This Conference Proceeding is brought to you for free and open access by the School of Computer Science at Research Showcase @ CMU. It has been accepted for inclusion in Institute for Software Research by an authorized administrator of Research Showcase @ CMU. For more information, please contact research-showcase@andrew.cmu.edu.

Motion Detection and Segmentation Using Image Mosaics

Kiran S. Bhat (1), Mahesh Saptharishi (2), Pradeep K. Khosla (1, 2)
1. Robotics Institute  2. Electrical and Computer Engineering
Carnegie Mellon University, Pittsburgh, PA 15213
{kiranb mahesh pkk}@cs.cmu.edu

Abstract

We propose a motion segmentation algorithm for extracting foreground objects with a pan-tilt camera. Segmentation is achieved by spatio-temporal filtering of the scene to model the background. Temporal filtering is done by a set of modified AR (auto-regressive) filters which model the background statistics for a particular view of the scene. Backgrounds from different views of the pan-tilt camera are stitched together into a planar mosaic using a real-time image mosaicing strategy. Our algorithms work in real time, require no user intervention, and facilitate high-quality video transmission at low bandwidths.

1 Introduction

Consider a typical video conferencing application. Participants are either stationary or moving in a controlled environment such as a room. If we can segment the foreground (non-stationary objects) from the background (stationary objects), we can achieve low bit-rate videoconferencing by transmitting the background once and continuously transmitting only the foreground. Additionally, if the participants move outside the field of view, the camera should pan or tilt so that they always stay within it.

A system that implements the application described above requires robust algorithms for motion detection, segmentation and tracking with a pan-tilt camera. This paper describes a motion detection algorithm that learns the background statistics of a temporally consistent scene. We automatically build an image mosaic of the background by initially exploring the visibility range of the pan/tilt camera. We continuously adapt the background model contained in the viewable subset of the mosaic.
We perform motion segmentation by comparing the current camera image with its corresponding background indexed from the mosaic. After segmentation, we track the users by appropriately controlling the camera rotations. Multiple users are temporally corresponded using a novel object correspondence scheme.

The novelty of the suggested approach lies in two main areas. First, we propose a simple motion detection and segmentation algorithm that relies on low-level processing, but has the capacity to adapt itself using feedback from higher-level processes such as the classification and correspondence algorithms. Second, we propose a method for automatically constructing and indexing image mosaics by locating and tracking salient features in the scene. We have implemented the proposed algorithms for real-time motion detection, segmentation and tracking. The system requires no help from the user in carrying out its task.

Figure 1. Modular organization of the system. Each block is part of the pipelined processing chain and executes concurrently. The arrows illustrate the information flow. The detector block is explained in sections 3 and 4. [Diagram: Camera → Detector → Classifier → Correspondence → Tracking, with feedback paths from the classifier and correspondence agents back to the detector, pan/tilt angles from tracking to the camera, and foreground output to the application layer (video conferencing).]

2 System Architecture

We have developed an agent-based architecture where a series of modules can be cascaded to form a pipelined processing chain. Each module in the processing chain operates concurrently within its own thread. Figure 1 shows the connectivity of the architecture. The camera produces 320x240 8-bit grayscale images which are sent to the detector agent. The detector maintains the background mosaic and segments moving objects from the image. The classifier [1] labels the detected moving objects as person or people (group of persons). It rejects uninteresting detections such as shadows and false alarms, and feeds back this information to the detector.
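The pipelined, concurrent organization described above can be sketched with standard-library threads and queues. The stage bodies below are placeholders, not the paper's detector or classifier logic:

```python
import queue
import threading

def run_stage(process, inbox, outbox):
    """Pull items from inbox, process them, push results downstream."""
    while True:
        item = inbox.get()
        if item is None:              # sentinel propagates shutdown downstream
            outbox.put(None)
            return
        outbox.put(process(item))

frames, detections, labels = queue.Queue(), queue.Queue(), queue.Queue()

# Placeholder stages standing in for the detector and classifier agents.
detector = threading.Thread(
    target=run_stage, args=(lambda f: ("detection", f), frames, detections))
classifier = threading.Thread(
    target=run_stage, args=(lambda d: ("label", d), detections, labels))
detector.start()
classifier.start()

for frame_id in range(3):             # stand-in for 320x240 grayscale frames
    frames.put(frame_id)
frames.put(None)                      # end of stream

detector.join()
classifier.join()

results = []
while True:
    item = labels.get()
    if item is None:
        break
    results.append(item)
```

Each stage blocks on its input queue, so the modules run concurrently while items flow downstream in order; a None sentinel shuts the chain down.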
The detection process is described in sections 3 and 4. The correspondence agent [2] temporally associates the labelled objects. It feeds back the predicted future positions of the moving objects to the detector. The correspondence agent informs the tracker of the predicted future locations of non-stationary objects, which in turn controls the pan-tilt unit to track a selected object. The tracker informs the detector of the camera position for use with the mosaic indexing. The information feedback from the classifier and correspondence agents is used to adapt the local sensitivity parameters of the detection filters. This simple feedback mechanism is extremely effective in improving the SNR, and therefore the reliability, of the detections.

3 Modeling the Background

This section addresses the problem of modeling and adapting the background contained within the current viewable subset of the image mosaic. Our basic algorithm is similar to those described in [3, 4, 5]. The algorithm described in [3] uses an IIR filter to model the background. In [4], Grimson et al. use a Gaussian mixture to estimate the background of the scene. Both [3, 4] use color imagery in their background modeling process. [5] describes a technique that estimates the maximum and minimum intensity differences for each pixel while there are no moving objects in the scene. This information is then used to detect moving objects. Like the system described in [5], we use grayscale imagery to estimate the background. A key difference in our work is the use of feedback from higher-level processes such as the classifier and the correspondence agents to adapt the detection process. This information feedback helps us simplify the detection algorithm while performing just as well as more complicated detection processes such as [4].

3.1 The AR Filter

    B_{n,x,y} = (1 - η)·B_{n-1,x,y} + η·I_{n,x,y},  0 ≤ η ≤ 1    (1)

We seek to model the background by using a set of AR filters to represent each pixel. Let I_{n,x,y} represent a pixel at time n and at position (x, y) in a 320x240 8-bit grayscale image. Similarly, let B_{n-1,x,y} represent the predicted background value for that pixel. A significant difference D_{n,x,y} = I_{n,x,y} - B_{n-1,x,y} between the image and the background values suggests the presence of a moving object. If |D_{n,x,y}| - b_{n,x,y} > 0, the pixel is classified as foreground. If |D_{n,x,y}| - b_{n,x,y} ≤ 0, the pixel is classified as background, where b_{n,x,y} represents the threshold for a particular pixel.
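A minimal per-pixel sketch of this background model, combining the AR update of equation (1) with the threshold test; the learning rate and threshold values below are illustrative assumptions, not the paper's:

```python
import numpy as np

ETA = 0.05  # learning rate constant, 0 <= eta <= 1 (assumed value)

def update_background(background, frame, eta=ETA):
    """One AR step: B_n = (1 - eta) * B_{n-1} + eta * I_n."""
    return (1.0 - eta) * background + eta * frame

def classify_pixels(frame, background, threshold):
    """Foreground where |D| - b > 0, with D = I - B."""
    return np.abs(frame - background) - threshold > 0

# Example on a synthetic 4x4 image: a bright "object" enters a flat scene.
bg = np.full((4, 4), 100.0)
img = bg.copy()
img[1:3, 1:3] = 180.0
mask = classify_pixels(img, bg, threshold=20.0)  # True inside the object
bg = update_background(bg, img)                  # background drifts toward img
```

In the full system the threshold array is adapted by the same AR update, exactly as the background is.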
At each step, we wish to gradually minimize the difference between the background model B_{n-1,x,y} and the image I_{n,x,y} by using the update rule B_{n,x,y} = B_{n-1,x,y} + η·D_{n,x,y}, where η represents the learning rate constant. The update rule can also be posed as an AR filter, as shown in equation (1). Rather than making the threshold b_{n,x,y} a constant, we adapt it just as we adapt the background, using an AR filter.

This simple AR filter-based background model is quite effective, but suffers when objects in the scene are moving too slowly. Often when the moving objects are slow, the background incorrectly acquires part of the object, resulting in false alarms. To alleviate this problem, we introduce a conditionally lagged background model. The conditionally lagged background model, B̃_{n,x,y}, is set to the continuously updated model, B_{n,x,y}, if the pixel is classified as a foreground pixel. If the pixel is classified as a background pixel, then we do not update the conditionally lagged background. If, after some T time steps, the magnitude of the difference between the conditionally lagged background and the image, D̃_{n,x,y}, is less than the magnitude of D_{n,x,y}, the value of B_{n,x,y} is reset with the value of B̃_{n,x,y}. On the other hand, if the magnitude of D_{n,x,y} is less than the magnitude of D̃_{n,x,y}, the value of B̃_{n,x,y} is set to B_{n,x,y}. The classification of a pixel as foreground or background now depends on |D̃_{n,x,y}| - b_{n,x,y}. When T is chosen appropriately, this technique prevents the false alarms caused by the movement of slow objects.

Given a binary map of foreground detections, connected components analysis is used to segment the moving object from the background. Prior to this step, morphological operations such as closing and erosion are performed to remove stray detections. Each positive binary value (i.e., a detection) is replaced with the actual grayscale value from the original image.
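The cleanup-and-grouping step just described can be sketched with pure-NumPy stand-ins for the morphology and the connected-components analysis. Only the erosion is shown (the closing step is analogous), and the structuring element and size threshold are assumptions:

```python
import numpy as np

def erode(mask):
    """3x3 cross erosion: a pixel survives only if it and its 4 neighbours are set."""
    p = np.pad(mask, 1, constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def connected_components(mask):
    """Label 4-connected components by flood fill; returns label image and count."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1
        stack = [seed]
        while stack:
            r, c = stack.pop()
            if not mask[r, c] or labels[r, c]:
                continue
            labels[r, c] = current
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]:
                    stack.append((nr, nc))
    return labels, current

def segment(mask, frame, min_pixels=2):
    labels, n = connected_components(erode(mask))
    out = np.zeros_like(frame)
    for i in range(1, n + 1):
        comp = labels == i
        if comp.sum() >= min_pixels:
            out[comp] = frame[comp]   # keep grayscale values of the object only
    return out

# Example: a 4x4 blob plus a stray single-pixel detection on an 8x8 frame.
frame = np.arange(64.0).reshape(8, 8)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
mask[0, 7] = True
segmented = segment(mask, frame)   # stray pixel removed, blob retained
```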
Thus, the segmented image contains just the grayscale values of the moving object without the background.

3.2 The Feedback Mechanism

The feedback mechanism allows for adjusting the sensitivity parameters of the detections. The feedback contains information about the labels (as "person" or "people") and correspondences of each detected pixel. The information feedback is used to adapt a perceptron to better classify a pixel as foreground or background. The classification of a pixel as foreground depends on the inequality |D_{n,x,y}| - b_{n,x,y} > 0. We can redefine the classification rule as ω_0 + ω_1·|D_{n,x,y}| - ω_2·b_{n,x,y} > 0, where each of the weights ω_i is adaptable. The weights are updated using the standard perceptron learning rule. If a pixel was classified as foreground and was part of a successfully labelled and corresponded object, the classification (as foreground) is considered correct. On the other hand, if the pixel was part of a rejected object, then the classification is considered incorrect. In practice we have found that adapting ω_0 alone provides a considerable increase in detection performance.

For videoconferencing applications, it is important to track foreground objects that become stationary after a while. The information feedback from the correspondence agent is used to determine which pixels represent a stationary object. A new AR filter is initialized to keep track

of the changing intensity of the pixel. This new AR filter can be visualized as an additional background layer. As long as the difference between the value of the new AR filter and the original AR filter is significant (as determined by the perceptron), the layer is maintained. This ensures that the now-stationary foreground object is not regressed into the original background of the scene.

4 Mosaic Generation and Indexing

This section describes the algorithm used to construct an image mosaic from a sequence of background images. The background represented by the viewable subset of the image mosaic is continuously updated by the algorithm described in the previous section. Several techniques have been proposed to create an image mosaic from sequences of images [6, 7, 8]. They obtain the registration between images by minimizing the sum squared error of image intensities at each pixel. Although these techniques produce very accurate registration results, they tend to be slow, and typically require user interaction to initialize the registration. We create an image mosaic in near-real time (5 fps) by locating and tracking feature points in the image sequence. This technique is much faster than previous techniques, and does not require any user intervention. We also propose a method to accurately index the viewable portion of the image mosaic corresponding to a particular camera rotation.

4.1 Mosaic Building

We have implemented a fully automatic algorithm to stitch together images captured by a pan-tilt camera. The algorithm uses a feature tracker to robustly identify and track feature points through the background image sequence [9]. Consider a pixel x = (x, y) in an image I that translates to x' = (x + t_x, y + t_y) in the subsequent image J. We estimate the translation t = (t_x, t_y) by minimizing the intensity error between I and J:

    E(t) = Σ_{x ∈ W} [J(x + t) - I(x)]²    (2)

where W is a 7x7 feature window containing x. After a first-order Taylor series expansion of equation (2), we obtain equation (3), where e = J(x) - I(x) and g = ∇J(x):

    E(t) ≈ Σ_{x ∈ W} (e + gᵀt)²    (3)

This optimization problem has the least-squares solution

    (Σ g gᵀ) t = -Σ e g    (4)

The eigenvalues of the 2x2 matrix Z, defined in (5),

    Z = Σ g gᵀ    (5)

give a good measure of the texture of the image window. In particular, we choose a feature point if the minimum eigenvalue is greater than a set threshold. We select and track N feature points (N = 80 to 100) through the image sequence.

We use the coordinates of the tracked feature points to fit an affine model between the two images J and I in the sequence. The affine model is expressed in equation (6), where x = (x, y) is the coordinate of one feature point in image J and x' = (x', y') is its corresponding feature point coordinate in image I:

    x' = ax + by + f
    y' = cx + dy + h    (6)

We can obtain the affine parameters by solving the overconstrained system (7), written out in full in equation (8):

    b = Aw    (7)

    b = [x'_1 ... x'_N  y'_1 ... y'_N]ᵀ,  w = [a b f c d h]ᵀ,

        [ x_1 y_1 1  0   0   0 ]
        [         ...          ]
        [ x_N y_N 1  0   0   0 ]
    A = [ 0   0   0  x_1 y_1 1 ]    (8)
        [         ...          ]
        [ 0   0   0  x_N y_N 1 ]

The least-squares solution is given by the pseudoinverse, w = (AᵀA)⁻¹Aᵀb. Once the affine parameters w = [a, b, f, c, d, h]ᵀ have been calculated, we can warp all the images with respect to a common coordinate system. For our experiments, we have arbitrarily chosen the first image as the reference, and warped all other images into the first image's coordinate system. We use a triangular weighting function (with maximum weight at the image center and zero weight at the edges) to blend overlapping regions between different warped images.

The affine transformation allows for representation of the motion of a planar surface under orthographic projection. A more accurate motion model is the perspective transformation, which is characterized by 8 parameters. However, for our application, we have found that an affine model is sufficiently accurate. Moreover, it is extremely simple to implement and is very fast.
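The affine fit of equations (6)-(8) reduces to a single least-squares solve; a sketch, where the sample correspondences are made up for illustration:

```python
import numpy as np

def fit_affine(pts_j, pts_i):
    """pts_j, pts_i: (N, 2) arrays of matching (x, y) feature coordinates.

    Stacks the correspondences into b = A w (equation (8)) and returns
    w = [a, b, f, c, d, h] via least squares.
    """
    n = len(pts_j)
    A = np.zeros((2 * n, 6))
    A[:n, 0:2] = pts_j        # rows for x' = a*x + b*y + f
    A[:n, 2] = 1.0
    A[n:, 3:5] = pts_j        # rows for y' = c*x + d*y + h
    A[n:, 5] = 1.0
    rhs = np.concatenate([pts_i[:, 0], pts_i[:, 1]])
    w, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return w

# Four correspondences related by a pure translation of (3, -2).
pts_j = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
pts_i = pts_j + np.array([3.0, -2.0])
w = fit_affine(pts_j, pts_i)   # roughly [1, 0, 3, 0, 1, -2]
```

`np.linalg.lstsq` computes the same pseudoinverse solution w = (AᵀA)⁻¹Aᵀb, but more stably than forming AᵀA explicitly.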
4.2 Mosaic Indexing

Our system initially builds a background mosaic of the entire viewable environment. During videoconferencing, as the camera rotates, we index and update the corresponding viewable subset of the background mosaic to perform motion detection and segmentation. During the mosaic construction, we store the pan and tilt angles for all the

images in the sequence, in addition to their affine warp parameters. For given rotation angles (obtained from the tracking agent) we can coarsely index the mosaic using the stored affine parameters. However, due to hysteresis and other errors in the pan-tilt unit, the indexed image is offset relative to the actual camera image by around 2-3 pixels. We solve this problem by performing registration between the indexed mosaic image and the actual camera image. We approximately mask the large moving regions in both images by subtracting the images and ignoring regions with large errors. Then we solve for the translation between the two masked images using the same technique described in equations (2)-(4), with the exception that the window W now indicates the entire masked image. Once we find the translation parameter t, we can index the correct portion of the mosaic and perform accurate motion segmentation.

Figure 2. Results of mosaicing and detection: (a) raw camera images, (b) background mosaic subset, (c) output of the detector. (b) shows the mosaic constructed using a spatial sequence of 70 images; 4 images from this sequence are shown in (a). (c) shows a moving person at a particular camera rotation, the corresponding subset indexed from the background mosaic, and the extracted foreground.

5 Results and Conclusions

Figure 2(b) shows the mosaic of a room created from a sequence of 70 images, four of which are shown in Figure 2(a). Figure 2(c) shows, left to right, an image of the user, the corresponding indexed background from the mosaic, and the detected foreground object. Space limitations prevent us from showing a sequence of segmented images of the user as he moves around in the room. The mosaicing algorithm initially registers images at 5 Hz, and the subsequent indexing and detection algorithms execute at 10 Hz on a Pentium 266 MHz laptop. As seen from the figure, the detector successfully builds a background mosaic and segments the foreground objects.
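For concreteness, the offset correction of section 4.2 (mask large-error pixels, then solve the normal equations of (2)-(4) for a pure translation over the remaining pixels) can be sketched as follows; a single iteration only, and the error threshold is an assumption:

```python
import numpy as np

def refine_translation(mosaic_view, frame, error_thresh=30.0):
    """Estimate the small translation between an indexed mosaic view I and
    the camera frame J by solving (sum g g^T) t = -(sum e g)."""
    e = frame - mosaic_view                    # e = J(x) - I(x)
    keep = np.abs(e) < error_thresh            # mask out large-error (moving) regions
    gy, gx = np.gradient(frame.astype(float))  # g = gradient of J
    g = np.stack([gx[keep], gy[keep]], axis=1)
    Z = g.T @ g                                # the 2x2 matrix of equation (5)
    rhs = -(g.T @ e[keep])
    return np.linalg.solve(Z, rhs)             # t = (t_x, t_y)

# Example: mosaic view and camera frame differing by a 1-pixel shift in x.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
mosaic_view = np.sin(0.3 * xx) + np.cos(0.2 * yy)
camera = np.sin(0.3 * (xx - 1.0)) + np.cos(0.2 * yy)
t = refine_translation(mosaic_view, camera)    # roughly (1, 0)
```

Since the residual offsets are only 2-3 pixels, a single first-order step like this is usually sufficient; larger offsets would need iteration or a coarse-to-fine pyramid.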
The proposed algorithm for segmentation can be improved by incorporating other relevant visual cues. Our algorithms provide a powerful infrastructure for transmitting video in real time at low bandwidths.

References

[1] C. Diehl, M. Saptharishi, J. Hampshire, and P. Khosla. Collaborative surveillance using both fixed and mobile unattended ground sensor platforms. In SPIE Proceedings on Unattended Ground Sensor Technologies and Applications, vol. 3713, pp. 178-185, 1999.
[2] M. Saptharishi, J. Hampshire, and P. Khosla. Agent-based moving object correspondence using differential discriminative diagnosis. Submitted to CVPR, 2000.
[3] A. J. Lipton, H. Fujiyoshi, and R. S. Patil. Moving target classification and tracking from real-time video. In IEEE Workshop on Applications of Computer Vision, pp. 8-14, 1998.
[4] W. E. L. Grimson, L. Lee, R. Romano, and C. Stauffer. Using adaptive tracking to classify and monitor activities in a site. In CVPR, pp. 22-31, 1998.
[5] I. Haritaoglu, D. Harwood, and L. Davis. W4: Who? When? Where? What? A real time system for detecting and tracking people. In IEEE International Conference on Automatic Face and Gesture Recognition, pp. 222-227, 1998.
[6] R. Szeliski and H. Shum. Creating full view panoramic image mosaics and environment maps. In Computer Graphics Proceedings, Annual Conference Series, pp. 251-258, 1997.
[7] M. Irani, P. Anandan, J. Bergen, R. Kumar, and S. Hsu. Mosaic representations of video sequences and their applications. Signal Processing: Image Communication, special issue on Image and Video Semantics: Processing, Analysis, and Application, vol. 8, no. 4, pp. 327-351, May 1996.
[8] F. Dufaux and F. Moscheni. Background mosaicking for low bit rate video coding. In IEEE ICIP, pp. 673-676, September 1996.
[9] J. Shi and C. Tomasi. Good features to track. In CVPR, pp. 593-600, June 1994.