Disparity Search Range Estimation: Enforcing Temporal Consistency


MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com

Disparity Search Range Estimation: Enforcing Temporal Consistency

Dongbo Min, Sehoon Yea, Zafer Arican, Anthony Vetro

TR2010-013, April 2010

Abstract

This paper presents a new approach for estimating the disparity search range in stereo video that enforces temporal consistency. Reliable search range estimation is very important since an incorrect estimate causes most stereo matching methods to get trapped in local minima or produce unstable results over time. In this work, the search range is estimated based on a disparity histogram that is generated with sparse feature matching algorithms such as SURF. To achieve more stable results over time, we further propose to enforce temporal consistency by calculating a weighted sum of temporally neighboring histograms, where the weights are determined by the similarity of depth distribution between frames. Experimental results show that this proposed method yields accurate disparity search ranges for several challenging stereo videos and is robust to various forms of noise, scene complexity and camera configurations.

2010 International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2010. 201 Broadway, Cambridge, Massachusetts 02139


DISPARITY SEARCH RANGE ESTIMATION: ENFORCING TEMPORAL CONSISTENCY

Dongbo Min 1,2, Sehoon Yea 1, Zafer Arican 3 and Anthony Vetro 1
1 Mitsubishi Electric Research Laboratories (MERL), Cambridge, USA
2 Yonsei University, Seoul, Korea
3 Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

ABSTRACT

This paper presents a new approach for estimating the disparity search range in stereo video that enforces temporal consistency. Reliable search range estimation is very important since an incorrect estimate causes most stereo matching methods to get trapped in local minima or produce unstable results over time. In this work, the search range is estimated based on a disparity histogram that is generated with sparse feature matching algorithms such as SURF. To achieve more stable results over time, we further propose to enforce temporal consistency by calculating a weighted sum of temporally-neighboring histograms, where the weights are determined by the similarity of depth distribution between frames. Experimental results show that this proposed method yields accurate disparity search ranges for several challenging stereo videos and is robust to various forms of noise, scene complexity and camera configurations.

Index Terms— Disparity search range, temporal consistency, disparity histogram, feature matching

1. INTRODUCTION

For decades, the correspondence problem has been one of the most important issues in the field of computer vision. Various stereo matching algorithms have been proposed to address this problem, and recent years have witnessed significant improvements in those algorithms, including ones with real-time or near real-time performance [1][2]. Dense disparity maps acquired by stereo matching can be used in many applications, including image-based rendering, 3-D scene reconstruction, robot vision, tracking, etc.
Such applications of stereo matching often presume knowledge of an appropriate disparity search range, or simply use a fixed range. In practice, an appropriate search range for a scene is needed to facilitate the use of any stereo matching algorithm. The lack of a known search range often implies the need to search over a wider range of candidate disparity values, which generally requires more computation and memory. More importantly, most stereo matching algorithms are more likely to get trapped in local minima when given an inappropriate search range, which can significantly compromise the quality of the disparity map. However, acquiring appropriate disparity ranges is by no means trivial, especially when, for example, the scene or camera configuration changes over time. Cyganek and Borgosz [3] proposed a way of estimating the maximum disparity range based on a statistical analysis of the spatial correlation between stereo images, which they called image variograms. However, this method assumes that there are only positive disparities between the stereo images. Another disparity search range estimation approach was proposed based on Confidently Stable Matching [4]: the disparity search range is obtained by setting the initial search range to the size of the whole image and then performing the proposed matching in a hierarchical manner [5]. This method appears to work well for several stereo images, but temporal aspects are not considered. There are also depth estimation techniques that directly impose temporal constraints as part of the estimation process, for example [6], but without an appropriate search range such techniques are still prone to false matches and incorrect estimation results. In this paper, we propose a novel approach for estimating the disparity search range in stereo video. The search range is estimated with a disparity histogram, which can be generated by a sparse or dense feature matching algorithm.
We further propose a method to enforce temporal consistency in the estimation of the disparity search range. The proposed techniques produce a temporally-consistent estimate of the disparity range that is robust to various forms of noise, scene complexity and camera configurations. The remainder of this paper is organized as follows. In Section 2, we discuss disparity search range estimation based on histograms. In Section 3, we describe a novel method to improve the temporal consistency of the histogram-based method. We present experimental results in Section 4 and conclude the paper in Section 5.

2. DISPARITY SEARCH RANGE ESTIMATION

This work forms a disparity histogram to estimate the disparity search range for stereo video. The search range is computed by thresholding the disparity histogram, which represents the distribution of depth information in a scene. To generate the disparity histogram, an initial set of matching points can be obtained by a sparse feature matching method such as the KLT (Kanade-Lucas-Tomasi) [7, 8] or SURF (Speeded Up Robust Features) [9] trackers. These methods define a descriptor for interest points and track the points using gradient-based or nearest-neighbor methods. Alternatively, any dense matching method, e.g., based on graph cuts [10] or belief propagation [11], can be used to build the histogram. To reduce computation, the disparity map is typically computed on a sub-sampled version of the original input images [5].

Fig. 1. Problem in disparity search range estimation without enforcing temporal consistency.

In this work, SURF was used to generate the disparity histogram. SURF is a scale- and rotation-invariant interest point detector and descriptor. It relies on integral images for image convolution and builds on the strengths of existing detectors and descriptors with a Hessian-matrix-based measure and a distribution-based descriptor [9]. Using the pairs of matched feature points, the disparity histogram is computed as follows:

h[i] = \sum_{j=1}^{N} f\left( D(i), \left\lfloor d_j / B \pm 0.5 \right\rfloor B \right), \quad i = 1, 2, \ldots, M \quad (1)

where h[i] is the histogram count for the i-th bin, M is the total number of bins, and f(a, b) is 1 if a = b and 0 otherwise. By quantizing each disparity value d_j of a matching-point pair (out of the N total pairs) with the bin size B, the histogram bin with the closest representative value D(i) is incremented by one. Fig. 2 shows sample histograms computed using the SURF matching points. The disparity search range is then computed by thresholding the histogram and removing outliers.
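As a concrete illustration, the quantization of Eq. (1) can be written in a few lines of NumPy. This is a minimal sketch, not the authors' code; the function name and the returned (centres, counts) layout are illustrative assumptions.

```python
import numpy as np

def disparity_histogram(disparities, bin_size=7):
    """Sketch of Eq. (1): snap each matched disparity d_j to the
    nearest bin centre D(i) = i * bin_size and count hits per bin.
    The (centres, counts) return layout is an assumption of this
    sketch, not something the paper specifies."""
    d = np.asarray(disparities, dtype=float)
    # floor(|d|/B + 0.5) with the sign restored implements the
    # +/- 0.5 rounding of Eq. (1) symmetrically for negative d.
    idx = (np.floor(np.abs(d) / bin_size + 0.5) * np.sign(d)).astype(int)
    lo, hi = idx.min(), idx.max()
    counts = np.bincount(idx - lo, minlength=hi - lo + 1)
    centres = np.arange(lo, hi + 1) * bin_size
    return centres, counts
```

For example, with B = 7 the disparities {-7, 0, 1, 3, 7, 8} fall into the three bins centred at -7, 0 and 7 with counts 1, 3 and 2.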
Generally speaking, points with positive disparity are more important than those with negative disparity, since the human visual system (HVS) is more sensitive to near objects. Based on this assumption, the threshold we use is defined as:

T_h = \begin{cases} 2B & \text{if } d < 0 \\ \lfloor B/2 \rfloor + 1 & \text{otherwise} \end{cases} \quad (2)

The disparity search range R is then determined by:

R = \{ k \mid D(i) - B/2 \le k \le D(i) + B/2, \; h[i] > T_h \} \quad (3)

Fig. 2. Histogram similarity when the scene changes.

Fig. 1 plots the minimum and maximum disparity for a sequence of stereo images using this scheme. The corresponding synthesized views, based on an estimated disparity map at selected instances of time, are also shown. We can see that the estimated search range is unstable even though the consecutive frames have very similar depth distributions. The visual artifacts in the synthesized views are also apparent.

3. ENFORCING TEMPORAL CONSISTENCY

If the search range R contains unnecessary disparities or misses significant ones, the stereo matching process can get trapped in local minima more easily. The threshold-based range detection method described in the previous section tends to be sensitive to false matches due to noise, color mismatches, repetitive patterns, etc. Even a few false matches can yield different disparity ranges in temporally-neighboring frames with similar depth distributions, since consistency among consecutive frames is not considered. To address this problem, one might attempt to build a temporally more consistent feature matching algorithm, but that is outside the scope of this paper. Instead, we enhance the temporal consistency among the disparity histograms of consecutive frames, so that any decent feature matching method can be leveraged in a temporally consistent manner. The new histogram is obtained by calculating a weighted sum of temporally-neighboring histograms using
the weights determined by the similarity of depth distributions between frames. The depth distribution of a scene changes with the scene and camera configuration, and it can be represented by the disparity histogram. Therefore, the similarity of disparity histograms can be used to identify scene or camera configuration changes as well as to reduce the effect of outliers. The similarity of two disparity histograms is computed with the weighting factor w as follows:

w_{n,p} = \exp\left( -\sum_{i=1}^{M} \left| h_n^{nor}(i) - h_p^{nor}(i) \right| / \sigma_S \right) \quad (4)

where w_{n,p} represents the similarity between the n-th and p-th histograms, and \sigma_S is a weighting constant for the distance between histograms. Note that the normalized histograms h_n^{nor} must be used in Eq. (4), since the total number of matching points varies between consecutive frames. Because the distance is calculated with normalized histograms, it ranges from 0 to 2. Fig. 2 shows the histograms and their similarity when a scene change occurs: the distance between the 1st and 2nd histograms was 0.149, while it was 1.719 between the 2nd and 3rd. We empirically determined \sigma_S to be 0.4. The new histogram is then computed as the weighted sum of temporally-neighboring histograms:

h_p^w(i) = \sum_{n \in N(p)} w_{n,p} h_n(i) \quad (5)

where h_p^w is the p-th weighted histogram and N(p) is the set of neighboring frames for the p-th frame. In this work, only temporally causal neighboring frames are included in N(p). Each weighted histogram h_p^w is used only to estimate the disparity search range; it does not replace the original histogram, since that could cause error propagation.

Fig. 3. Range estimation results for the Heidelberg video: (a) without and (b) with temporal consistency.
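Putting Sections 2 and 3 together, the temporal weighting of Eqs. (4)-(5) followed by the threshold test of Eqs. (2)-(3) can be sketched as below. This is a hedged sketch under stated assumptions: the function names, the oldest-first histogram-list convention, and the returned interval layout are ours, and sigma_s = 0.4 follows the value reported above.

```python
import numpy as np

def weighted_histogram(hists, sigma_s=0.4):
    """Eqs. (4)-(5): weight each causal neighbour of the current
    (last) histogram by the similarity of the normalised histograms,
    then sum. `hists` is a list of equal-length count arrays, oldest
    first, with the current frame last."""
    h = [np.asarray(x, dtype=float) for x in hists]
    norm = [x / x.sum() for x in h]              # h^nor in Eq. (4)
    cur = norm[-1]
    h_w = np.zeros_like(cur)
    for h_n, h_n_nor in zip(h, norm):
        dist = np.abs(h_n_nor - cur).sum()       # L1 distance, in [0, 2]
        h_w += np.exp(-dist / sigma_s) * h_n     # w_{n,p} * h_n, Eq. (5)
    return h_w

def search_range(centres, counts, bin_size=7):
    """Eqs. (2)-(3): keep bins whose count exceeds a threshold that is
    stricter for negative disparities (2B) than for positive ones
    (floor(B/2) + 1), then return the covered disparity interval."""
    kept = [c for c, n in zip(centres, counts)
            if n > (2 * bin_size if c < 0 else bin_size // 2 + 1)]
    if not kept:
        return None
    return min(kept) - bin_size // 2, max(kept) + bin_size // 2
```

With B = 7, a negative-disparity bin must exceed 14 matches to survive, while a positive-disparity bin only needs to exceed floor(B/2) + 1 = 4; the weighted sum then keeps the surviving set stable across frames with similar depth distributions.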
4. EXPERIMENTAL RESULTS

To validate the performance of the proposed method, we performed experiments with the stereo video sequences Heidelberg and RhineValley, available at [12], whose resolutions are 1280×720 and 720×576, respectively. For both sequences, the bin size B was set to 7 and the number of temporally-neighboring frames was 12. The input images were sub-sampled by a factor of 2 prior to the sparse SURF feature matching, which tended to improve performance. Figs. 3 and 4 show the estimated disparity search ranges for Heidelberg and RhineValley, respectively. Both stereo sequences include two segments with a scene change between them. The results confirm that the proposed method provides more reliable disparity search ranges and is robust to the scene change.

Fig. 4. Range estimation results for the RhineValley video: (a) without and (b) with temporal consistency.

Next, we demonstrate the effect of a more accurate search range on the estimated disparity maps and view synthesis results. Figs. 5 and 6 show sample disparity maps obtained with and without temporal consistency in the search range estimation for both sequences. The dense disparity maps were computed based on the estimated search ranges and a hierarchical cost aggregation method [13]. The corresponding view synthesis results are also shown, where a novel view is synthesized at 30% of the original baseline distance. As these results show, incorrect disparity search ranges obtained without temporal consistency often lead to incorrect disparity maps and visual artifacts in the synthesized views. Enforcing temporal consistency yields a more reliable search range, which improves both the estimated disparity maps and the view synthesis.

Fig. 5. Disparity maps by search ranges without/with temporal consistency, 7th frame, Heidelberg: (a) without temporal consistency, (b) with temporal consistency, (c) synthesized frame using (a), (d) synthesized frame using (b).

Fig. 6. Disparity maps by search ranges without/with temporal consistency, 5th frame, RhineValley: (a) without temporal consistency, (b) with temporal consistency, (c) synthesized frame using (a), (d) synthesized frame using (b).

5. CONCLUSION

In this paper, we proposed a temporally-consistent method for estimating the disparity search range for stereo video. The weighted sum of temporally-neighboring histograms is formed taking histogram similarity into account, in order to make the disparity search range estimation more consistent over time and robust to scene or camera configuration changes. The experimental results show that the proposed method works well for challenging stereo video sequences. In the future, the Kullback-Leibler divergence [14, 15], which measures the difference between two probability distributions, might be used to compute the histogram similarity; since the histogram is normalized to calculate the weighting factor w, it can also be regarded as a probability distribution function. Furthermore, an extension of the proposed method to multiview video might be considered. In contrast to stereo video, multiple histograms exist in the multiview case, and it would be necessary to estimate a consistent search range from the multiple histograms.

6. REFERENCES

[1] D. Scharstein and R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, IJCV, vol. 47, no. 1-3, pp. 7-42, Apr. 2002.
[2] http://vision.middlebury.edu/stereo
[3] B. Cyganek and J. Borgosz, An improved variogram analysis of the maximum expected disparity in stereo images, Proc. SCIA, LNCS, pp. 640-645, 2003.
[4] R. Sara, Finding the largest unambiguous component of stereo matching, Proc. ECCV, pp. 900-914, 2002.
[5] J. Kostkova and R. Sara, Automatic disparity search range estimation for stereo pairs of unknown scenes, Proc. CVWW, 2004.
[6] M. Gong, Enforcing temporal consistency in real-time stereo estimation, Proc. ECCV, pp. 564-577, 2006.
[7] B. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, Proc. International Joint Conference on Artificial Intelligence, pp. 674-679, 1981.
[8] C. Tomasi and T. Kanade, Detection and tracking of point features, Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991.
[9] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, SURF: Speeded Up Robust Features, Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346-359, 2008.
[10] V. Kolmogorov and R. Zabih, Computing visual correspondence with occlusions using graph cuts, Proc. IEEE Int. Conf. Computer Vision, pp. 508-515, 2001.
[11] P. F. Felzenszwalb and D. P. Huttenlocher, Efficient belief propagation for early vision, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 261-268, 2004.
[12] http://www.3dtv.at/
[13] D. Min and K. Sohn, Cost aggregation and occlusion handling with WLS in stereo matching, IEEE Trans. Image Processing, vol. 17, no. 8, pp. 1431-1442, Aug. 2008.
[14] S. Kullback and R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, 1951.
[15] S. Kullback, The Kullback-Leibler distance, The American Statistician, vol. 41, pp. 340-341, 1987.