NEW CONCEPT FOR JOINT DISPARITY ESTIMATION AND SEGMENTATION FOR REAL-TIME VIDEO PROCESSING


Nicole Atzpadin¹, Serap Askar, Peter Kauff, Oliver Schreer
Fraunhofer Institut für Nachrichtentechnik, Heinrich-Hertz-Institut, Einsteinufer 37, D-10587 Berlin
Phone: ++49 30 31002 {-660, -610, -615, -620}, fax: ++49 30 3927200
Email: {nicole.atzpadin, serap.askar, peter.kauff, oliver.schreer}@hhi.fhg.de

ABSTRACT

We present a new concept of joint disparity estimation and segmentation developed for a real-time immersive video conferencing system, which uses the segmentation and disparity estimation results to calculate a new virtual view of a conferee. For the segmentation a background subtraction is used, which works well under limited conditions. The main goal of the new concept is to allow more general scenarios and to improve the disparities at depth discontinuities. This goal is reached with a two-stage disparity estimation, which is nearly a hierarchical approach. The main advantage of the two-stage calculation is that the segmentation can be performed in between these two steps and can use depth information to decide on foreground or background, while the second step of the disparity estimation can use the segmentation information to speed up the algorithm.

1. INTRODUCTION

The concept for joint disparity estimation and segmentation for real-time video processing presented in this paper was developed in the context of an immersive teleconferencing system. An immersive teleconferencing system enables conferees located in different geographical places to meet around a virtual table, appearing at each station in such a way as to create a convincing impression of presence [1]. The purpose is to enable the participants to make use of rich communication modalities as similar as possible to those used in a face-to-face meeting (e.g., gestures, eye contact, etc.) and to eliminate the limits of non-immersive teleconferencing (e.g., face-only images in separate windows, unrealistic avatars). To realize an immersive teleconferencing system, a multi-view camera set-up capturing the conferees is mounted around a large 2D display.
A full 3D analysis delivers the information needed to render the remote conferees on the display of the local conferee, adapted to the position of his head. To achieve realistic viewing conditions the conferees are shown in a shared virtual environment, which means that they have to be segmented from the real scene with a fast and robust foreground-background segmentation tool based on a background subtraction. The disparities, which represent the depth of the whole scene, are calculated between two stereo camera pairs with an efficient algorithm based on hybrid recursive matching (HRM), a universal approach to the real-time analysis of displacement vector fields. Figure 1 gives an overview of the 3D analysis of one stereo pair.

Fig. 1: overview of the 3D analysis for one stereo pair (left and right image feed segmentation and disparity estimation on a 4x4 grid, producing the left and right disparity maps)

¹ née Nicole Brandenburg
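The pipeline of Fig. 1 estimates disparities on a sparse 4x4 grid and later interpolates them to dense maps. As a rough, hypothetical illustration of such a sparse-to-dense step (the system's actual interpolation scheme [2] is more sophisticated; the function name and the choice of separable bilinear interpolation are assumptions for the sketch):

```python
import numpy as np

def dense_from_sparse(sparse_disp: np.ndarray, grid: int = 4) -> np.ndarray:
    """Upsample a disparity map estimated on a sparse grid x grid
    pixel lattice to full resolution by bilinear interpolation."""
    h, w = sparse_disp.shape
    # Coordinates of the sparse samples in the full-resolution image.
    ys = np.arange(h) * grid
    xs = np.arange(w) * grid
    full_y = np.arange(h * grid)
    full_x = np.arange(w * grid)
    # Interpolate rows, then columns (separable bilinear interpolation).
    tmp = np.empty((h, w * grid))
    for i in range(h):
        tmp[i] = np.interp(full_x, xs, sparse_disp[i])
    dense = np.empty((h * grid, w * grid))
    for j in range(w * grid):
        dense[:, j] = np.interp(full_y, ys, tmp[:, j])
    return dense
```

Values beyond the last sample row/column are simply held constant here; a real system would extrapolate or re-estimate near the image border.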

One of the most important modules is the foreground-background segmentation, which separates the foreground object from the background. This process is essential for the final view rendering, which integrates the conferees together in a shared virtual environment. A further advantage is that the foreground object only covers half of the pixels of the whole frame, which leads to less computational complexity in the subsequent processing steps. To achieve real-time performance the pixel-wise segmentation itself is performed on sub-sampled images. The resulting binary mask, marking each pixel as foreground or background, is interpolated to full image size by a sophisticated mask refinement scheme [2]. The overall goal of the whole 3D analysis is to estimate correct dense disparity maps from the left to the right and from the right to the left camera input image. To reach this goal in real time, disparities are estimated for a sparse grid of 4x4 pixels, post-processed and then interpolated to dense disparity maps. The disparity estimation algorithm is a hybrid recursive matching (HRM) approach which unites the advantages of block-recursive matching and pixel-recursive optical flow estimation in one common scheme. Its computational effort is minimised by the efficient selection of a small number of candidate vectors, guaranteeing both spatial and temporal consistency of the disparities.

2. NEW CONCEPT

The whole 3D analysis system introduced in the last section works well under ideal conditions. Ideal conditions exist if the background is static without any moving objects and if the foreground object and the background are not of similar colour. But a videoconferencing system cannot always provide ideal conditions, so other information has to be included in the decision process of the segmentation to allow a more general scenario. Another problem with the existing system is the non-consideration of the window size problem in the disparity estimation, as mentioned in [3]: the intensity window must be large enough to include enough intensity variation for matching but small enough to avoid the effects of projective distortion.
Many solutions have been proposed over the years; one of them is a hierarchical approach [4], where disparities are estimated from coarse to fine, starting with large windows which become smaller from one step to the next. To overcome the problems of the existing approach we propose a system with joint disparity estimation and segmentation, as shown in Fig. 2. The disparity estimation is split into two parts. In the first part disparities are estimated for the foreground and the background in the sub-sampled images. In the second part the disparities are improved only for the foreground object. Splitting the disparity estimation has the advantage that disparities, and therefore depth information, are available for the foreground-background segmentation. As a by-product the estimation can solve the window size problem, as it can use different window sizes in the two steps.

Fig. 2: joint disparity estimation and segmentation (left and right image → first disparity estimation → segmentation → disparity refinement → left and right disparity map)

The different processing stages will now be described in more detail.

2.1 HYBRID RECURSIVE MATCHING

The hybrid block- and pixel-recursive disparity estimator unites the advantages of block-recursive matching and pixel-recursive optical flow estimation in one common scheme [5]. The main idea of hybrid block- and pixel-recursive matching is to use neighbouring spatio-temporal candidates as input for the block-recursive estimation. The rationale is that such candidate vectors are the most likely to provide a good estimate of the disparity for the current pixel. In addition, a further update vector is tested against the best candidate. This update vector is computed by applying a local, pixel-recursive process to the current block, which uses the best candidate of the block recursion as a start vector. The whole algorithm can be divided into three stages:

1. three candidate vectors (two spatial and one temporal) are evaluated for the current block position by recursive block matching;
2. the candidate vector with the best result is chosen as the start vector for the pixel-recursive algorithm, which yields an update vector;
3. the final vector is obtained by comparing the update vector from the pixel-recursive stage with the start vector from the block-recursive one.

Only three candidates are tested in the recursive block matching to find the best match between the current pixel position in the left image and the related pixel position in the right image. A shape-driven displaced block difference (DBD) is taken as the matching criterion. Large window sizes are required in order to estimate highly reliable disparities, which can be inaccurate due to projective distortions but not completely wrong, as could happen with smaller window sizes. Unfortunately, large window sizes are connected to high computational complexity. A sub-sampling of the original images avoids this extra cost, because on sub-sampled images small window sizes also lead to reliable results. The hybrid analysis scheme has two main advantages in comparison to common approaches. The recursive structure speeds up the analysis dramatically. The combined choice of spatial and temporal candidates yields reliable vectors and spatially and temporally consistent disparity fields due to an efficient strategy of testing particular vector candidates. The latter aspect is important to avoid temporal inconsistencies in sequences, which may cause strongly visible and very annoying artefacts in virtual views synthesised on the basis of these disparities.
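The three-stage decision above can be sketched as follows. This is a simplified, hypothetical illustration: the function names, the plain SAD block difference standing in for the shape-driven DBD, and the callback standing in for the pixel-recursive stage are assumptions, not the paper's implementation.

```python
import numpy as np

def block_cost(left, right, y, x, d, half=4):
    """Displaced block difference: sum of absolute differences between a
    block in the left image and the horizontally displaced block in the
    right image (disparity d shifts the right-image columns)."""
    h, w = left.shape
    y0, y1 = max(y - half, 0), min(y + half + 1, h)
    x0, x1 = max(x - half, 0), min(x + half + 1, w)
    xs = np.clip(np.arange(x0, x1) - d, 0, w - 1)
    return np.abs(left[y0:y1, x0:x1] - right[y0:y1, xs]).sum()

def hrm_step(left, right, y, x, d_left, d_up, d_prev, pixel_update):
    """One hybrid-recursive-matching decision for block (y, x): test the
    spatial (left, up) and temporal (previous frame) candidates, then
    compare the best against an update vector from `pixel_update`, a
    placeholder for the pixel-recursive optical-flow stage."""
    candidates = [d_left, d_up, d_prev]               # stage 1
    costs = [block_cost(left, right, y, x, d) for d in candidates]
    start = candidates[int(np.argmin(costs))]         # best block candidate
    update = pixel_update(start)                      # stage 2 (simplified)
    # stage 3: keep whichever of start/update matches better
    if block_cost(left, right, y, x, update) < min(costs):
        return update
    return start
```

Scanning the image in raster order makes the left and upper neighbours available as spatial candidates, which is what gives the recursion its speed and spatial consistency.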
As a post-processing step, an efficient consistency check comparing the disparities from the left-to-right and right-to-left estimation discards all inconsistent disparities.

3. FG-BG SEGMENTATION

The segmentation algorithm is based on a background subtraction, comparing the current image of a particular conferee with a pre-known reference image (see Fig. 3).

Fig. 3: pre-known reference image and current image

The structure of the segmentation module is shown in Fig. 4. The reference background is captured during an initialisation phase at the beginning of the conference session, in which a short sequence is analysed. This provides further statistical information about the noise. From this a pixel-wise threshold is derived, which is used for the segmentation. During the conference session global illumination changes are compensated with an adaptive threshold offset.

Fig. 4: outline of the foreground-background segmentation (static background initialisation yields thresholds and offsets for both the colour and the disparity channels; the decision step is followed by morphological post-processing and threshold adaptation)
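The initialisation and decision steps can be sketched roughly as below. This is a minimal single-channel version under assumed conventions (the function names and the k-sigma noise rule are illustrative); the actual module works on the y, u, v colour channels plus a disparity channel and adds morphological post-processing.

```python
import numpy as np

def init_thresholds(ref_frames, k=3.0):
    """Derive the reference image and a pixel-wise threshold from a short
    initialisation sequence: per-pixel mean plus k times the per-pixel
    noise standard deviation."""
    stack = np.stack(ref_frames).astype(float)
    reference = stack.mean(axis=0)
    threshold = k * stack.std(axis=0) + 1e-3   # avoid zero thresholds
    return reference, threshold

def segment(frame, reference, threshold, offset=0.0):
    """Binary foreground mask: a pixel is foreground if its absolute
    difference to the reference exceeds the pixel-wise threshold plus a
    global offset compensating illumination changes."""
    diff = np.abs(frame.astype(float) - reference)
    return diff > (threshold + offset)
```

The global `offset` is the adaptive part: raising it for the whole frame absorbs slow illumination changes without re-learning the per-pixel noise model.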

The background subtraction for colour images is extended to a subtraction for colour and depth images. Disparities are estimated for the pre-known reference images in the initialisation phase and compared with the current disparities. Fig. 5 shows examples of consistent disparities. A threshold adaptation is also reasonable for the disparities because it allows small movements in the background.

Fig. 5: pre-calculated disparities and current disparities (inconsistent disparities are black)

The differences of the three colour channels y, u and v and the disparity difference are evaluated in the decision process to find a final segmentation result. The disparity difference is weighted more strongly than any single colour channel. If either the disparities of the background or those of the foreground are inconsistent, only colour information is used for the decision process. The result of the segmentation is a binary mask marking each pixel as foreground (i.e. object) or background. Because this mask is usually corrupted by noise, different kinds of well-known non-linear and morphological post-processing algorithms are applied to smooth the binary mask and obtain one closed object region, which represents the conferee. Fig. 6 shows results for the segmentation using only colour and for the segmentation using colour and depth, without any post-processing.

Fig. 6: segmentation results with colour (left) and with colour and depth (right)

The results using only colour are quite good in this case due to the nearly ideal scenario. Using depth gives an improvement in parts of the image where slight movements of the background lead to wrongly segmented pixels, especially around the picture on the wall. To speed up the process the segmentation is performed on sub-sampled images. Therefore a combination of up-sampling and refinement is used to bring the final mask back to full resolution. This process is performed block-wise. In homogeneous areas, i.e. completely inside or outside the object region, the binary mask can be interpolated by simply copying the known pixels. At the transition regions the segmentation must be repeated in full resolution to preserve the contour as much as possible. This is achieved with a hierarchical filling strategy [2]. Finally, based on the result of the mask refinement, the threshold offset is updated.

4. DISPARITY REFINEMENT

A refinement of the disparities resulting from the first step of the estimation is unavoidable due to the low resolution of the disparities produced by the sub-sampling: it is not sufficient to simply multiply all disparities by the sub-sampling factor. The refined disparities have to be estimated by defining a search area around the position determined by the disparity multiplied by the sub-sampling factor. Small window sizes are used for the comparison of intensities to find disparity borders which exactly coincide with the object boundaries. Fig. 7 shows results using a search window of size 8x8 pixels.

Fig. 7: sub-sampled disparities (left) and refined disparities (right)

The improvement around depth discontinuities can clearly be recognized. More details of the hands become visible. This is achieved by two search ranges in the two depth layers around the border. The refinement only has to be performed inside the foreground object. To save computational cost, disparities are only refined for pixels on a 4x4 grid. The remaining disparities are
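The refinement step can be sketched as a small-window search around the scaled coarse disparity. The function names, the plain SAD cost and the parameter defaults here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sad(left, right, y, x, d, half=2):
    """Small-window sum of absolute differences between the left-image
    block at column x and the right-image block displaced by d."""
    h, w = left.shape
    y0, y1 = max(y - half, 0), min(y + half + 1, h)
    x0, x1 = max(x - half, 0), min(x + half + 1, w)
    xs = np.clip(np.arange(x0, x1) - d, 0, w - 1)
    return np.abs(left[y0:y1, x0:x1] - right[y0:y1, xs]).sum()

def refine(left, right, y, x, coarse_d, sub=4, search=4):
    """Refine a disparity from the sub-sampled first stage: scale it by
    the sub-sampling factor and search +/- `search` pixels around that
    position with a small matching window."""
    base = coarse_d * sub
    candidates = range(base - search, base + search + 1)
    costs = [sad(left, right, y, x, d) for d in candidates]
    return base - search + int(np.argmin(costs))
```

The small window is what lets the refined disparity snap to the true object boundary, at the price of being reliable only near the coarse estimate, which is exactly why the two-stage split works.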

interpolated with a sophisticated interpolation scheme introduced in [2].

5. CONCLUSION

In this paper we have introduced a new concept of joint disparity estimation and segmentation, which stabilizes the segmentation and extends it to a more general scenario with small movements in the background and similar colours in the foreground and background. Additionally, the disparity estimation is improved, especially at depth discontinuities, due to a two-stage hierarchical approach refining disparities from coarse to fine. The main advantage of the new concept is that each algorithm uses the results of the other: the disparity estimation needs less computational effort in the second step because it only has to be performed inside the foreground object, and the segmentation allows more generality in the scenario because of the additional depth information.

6. ACKNOWLEDGEMENTS

This study is supported by the Ministry of Science and Technology of the Federal Republic of Germany, Grant No. 01 AK 022.

7. REFERENCES

[1] P. Kauff, O. Schreer: "Virtual Team User Environments - A Step From Tele-Cubicles Towards Distributed Tele-Collaboration in Mediated Workspaces", Int. IEEE Conf. on Multimedia and Expo (ICME 2002), Lausanne, Switzerland, August 2002

[2] S. Askar, P. Kauff, N. Brandenburg, O. Schreer: "Fast Adaptive Upscaling of Low Structured Images Using a Hierarchical Filling Strategy", Proc. of VIPromCom 2002, 4th EURASIP

[3] T. Kanade, M. Okutomi: "A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment", IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 920-932, September 1994

[4] J.-I. Park, S. Inoue: "Hierarchical Depth Mapping from Multiple Cameras", Proc. of International Conference on Image Analysis and Processing (ICIAP'97), pages 685-692, Florence, Italy, September 1997

[5] P. Kauff, N. Brandenburg, M. Karl, O. Schreer: "Fast Hybrid Block- and Pixel-Recursive Disparity Analysis for Real-Time Applications in Immersive Tele-Conference Scenarios", Proc. of WSCG 2001, 9th Int. Conf. on Computer Graphics, Visualization and Computer Vision, Pilzen, Czech Republic, February 2001