STEREO-DISPARITY ESTIMATION USING A SUPERVISED NEURAL NETWORK

Similar documents
CS 787: Assignment 4, Stereo Vision: Block Matching and Dynamic Programming Due: 12:00noon, Fri. Mar. 30, 2007.

Introduction à la vision artificielle X

COMP 558 lecture 22 Dec. 1, 2010

Stereo: the graph cut method

Multiple View Geometry

Model-Based Stereo. Chapter Motivation. The modeling system described in Chapter 5 allows the user to create a basic model of a

Stereo imaging ideal geometry

Correspondence and Stereopsis. Original notes by W. Correa. Figures from [Forsyth & Ponce] and [Trucco & Verri]

7. The Geometry of Multi Views. Computer Engineering, i Sejong University. Dongil Han

Lecture 14: Basic Multi-View Geometry

Stereo Vision. MAN-522 Computer Vision

Dense 3D Reconstruction. Christiano Gava

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Motion Estimation. There are three main types (or applications) of motion estimation:

Machine Learning 13. week

CS4495/6495 Introduction to Computer Vision. 3B-L3 Stereo correspondence

CS6220: DATA MINING TECHNIQUES

Practice Exam Sample Solutions

CS5670: Computer Vision

STEREO BY TWO-LEVEL DYNAMIC PROGRAMMING

Stereo Image Rectification for Simple Panoramic Image Generation

Random projection for non-gaussian mixture models

An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting.

Matching. Compare region of image to region of image. Today, simplest kind of matching. Intensities similar.

VIDEO OBJECT SEGMENTATION BY EXTENDED RECURSIVE-SHORTEST-SPANNING-TREE METHOD. Ertem Tuncel and Levent Onural

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

Announcements. Stereo

Stereo matching. Francesco Isgrò. 3D Reconstruction and Stereo p.1/21

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

A STEREO VISION SYSTEM FOR AN AUTONOMOUS VEHICLE

Stereo Matching.

Dense Disparity Maps Respecting Occlusions and Object Separation Using Partial Differential Equations

Dense 3D Reconstruction. Christiano Gava

Video Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

What have we leaned so far?

Rectification and Distortion Correction

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

Transactions on Information and Communications Technologies vol 19, 1997 WIT Press, ISSN

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

COMPARISION OF REGRESSION WITH NEURAL NETWORK MODEL FOR THE VARIATION OF VANISHING POINT WITH VIEW ANGLE IN DEPTH ESTIMATION WITH VARYING BRIGHTNESS

Computer Vision I. Dense Stereo Correspondences. Anita Sellent 1/15/16

11/14/2010 Intelligent Systems and Soft Computing 1

Image Inpainting by Hyperbolic Selection of Pixels for Two Dimensional Bicubic Interpolations

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

Announcements. Stereo

Artificial Neural Network-Based Prediction of Human Posture

Notes on Multilayer, Feedforward Neural Networks

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

EE795: Computer Vision and Intelligent Systems

Lecture 6 Stereo Systems Multi-view geometry

Fundamental matrix. Let p be a point in left image, p in right image. Epipolar relation. Epipolar mapping described by a 3x3 matrix F

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

IN recent years, neural networks have attracted considerable attention

Chaplin, Modern Times, 1936

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Use of Multi-category Proximal SVM for Data Set Reduction

Figure (5) Kohonen Self-Organized Map

Texture Image Segmentation using FCM

Improving Image Segmentation Quality Via Graph Theory

Using temporal seeding to constrain the disparity search range in stereo matching

Lecture 6 Stereo Systems Multi- view geometry Professor Silvio Savarese Computational Vision and Geometry Lab Silvio Savarese Lecture 6-24-Jan-15

Three-Dimensional Computer Vision

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS

Processing Missing Values with Self-Organized Maps

Final Exam Study Guide

Depth from two cameras: stereopsis

Stereo. Outline. Multiple views 3/29/2017. Thurs Mar 30 Kristen Grauman UT Austin. Multi-view geometry, matching, invariant features, stereo vision

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

CNN Template Design Using Back Propagation Algorithm

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images

A MULTI-RESOLUTION APPROACH TO DEPTH FIELD ESTIMATION IN DENSE IMAGE ARRAYS F. Battisti, M. Brizzi, M. Carli, A. Neri

Neural Network based textural labeling of images in multimedia applications

(Sample) Final Exam with brief answers

Lecture 19: Motion. Effect of window size 11/20/2007. Sources of error in correspondences. Review Problem set 3. Tuesday, Nov 20

Machine vision. Summary # 11: Stereo vision and epipolar geometry. u l = λx. v l = λy

EECS 442 Computer vision. Stereo systems. Stereo vision Rectification Correspondence problem Active stereo vision systems

Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923

A Vision System for Automatic State Determination of Grid Based Board Games

This leads to our algorithm which is outlined in Section III, along with a tabular summary of it's performance on several benchmarks. The last section

Stereo Video Processing for Depth Map

On-line and Off-line 3D Reconstruction for Crisis Management Applications

Multi-view stereo. Many slides adapted from S. Seitz

Handwritten Character Recognition with Feedback Neural Network

3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller

Depth from two cameras: stereopsis

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM

Visual Tracking (1) Feature Point Tracking and Block Matching

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa

CSE152 Introduction to Computer Vision Assignment 3 (SP15) Instructor: Ben Ochoa Maximum Points : 85 Deadline : 11:59 p.m., Friday, 29-May-2015

A Quantitative Approach for Textural Image Segmentation with Median Filter

Integration of Multiple-baseline Color Stereo Vision with Focus and Defocus Analysis for 3D Shape Measurement

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Particle Tracking. For Bulk Material Handling Systems Using DEM Models. By: Jordan Pease

6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION

EE795: Computer Vision and Intelligent Systems

NEURAL NETWORK-BASED SEGMENTATION OF TEXTURES USING GABOR FEATURES

Real-Time Disparity Map Computation Based On Disparity Space Image

Transcription:

2004 IEEE Workshop on Machine Learning for Signal Processing STEREO-DISPARITY ESTIMATION USING A SUPERVISED NEURAL NETWORK Y. V. Venkatesh, B. S. Venhtesh and A. Jaya Kumar Department of Electrical Engineering Indian Institute of Science Bangalore 560012, INDIA Phone: 0091 80 2293 2572 Fax: 0091 80 2360 0764 Email: yvveleqee.iisc.ernet..in Abstract. We deal with the problem of determining disparity in gray-level stereoimage-pairs, by treating it as a nonlinear classification problem, and invoking Marr and Poggio's 111 neighborhood criterion. To this end, we propose the application of an artificial neural network (ANN). The main contribution of the paper is believed to be the use of neurons which are trained to be disparity selective, and thereby dispensing with the standard assumptions made about the neighborhood. The disparity estimates so obtained for random-dot and natural stereoimage-pairs are comparable to those found in the literature. Whereas Khotanzad et al. [3] used a multi-layer perceptron (MLP) in order to learn the constraints of a cooperative stereo algorithm for binary. random-dot stereograms, we employ a single layer ANN. Further, in our scheme, the ANN weights adapt themselves to the neighborhood, and are able to learn the constraints successfully. INTRODUCTION When images of objects at different. depths (in the physical world) are capt.ured hy a stereocamera-pair, each point on an object is uniquely rep resented by a specific pisel on each image; and the position of the pixel depends on the depth of the physical point in the scene. The displacement in the positions of these pixels corresponding to t,he same physical point is called disparity. And the physical depth of the point on t.he object can be extracted from t.his disparity. Therefore, the problem of stereoimage-pair analysis amounts to searching for corresponding pixels in t,he left and right images (hence the name, correspondence problem (CP), which is considered to he not completely solved). 0-7803-8608-6/04/$20.00 02004 IEEE 785

Many assumptions and c0nstraint.s have been proposed to simplify the computational approach to the CP. For instance, the cameras are assumed to be set up in such a way that the displacement of pixels (in t,he two images) takes place along the same horizontal ('epipolar') line. Hence the search area (for solving the CP) is constrained t,o be along the epipolar line (hence the name, epipolar constraint). Marr and Poggio (41 formulated the CP based on the following three constraints: s Compatibility: a similarity measure is used for matching any t.wo pixels. e Uniqueness: a pixel in one image can be matched to only one (and hence unique) pixel in the other, Continuity: objects in the world have, in general, a smooth and continuously varying disparity. Based on these constraints, Marr and Poggio [4] propose t.wo methods to solve the CP. The first method involves an iterative process, using the const,raints as excitation and inhibition, and is called the cooperative stereo algorkhm. In the second method, image features: like zero crossings, are used to match the corresponding points in the images. The present paper is motivat.ed by the cooperative st,ereo algorithm. Marr and Poggio [I] propose a cooperative algorithm in order to extract disparity from binary random-dot-stereograms. Zitnick and Kanade [5] extend t,his area-based approach to gray level images, and invoke the above three constraints to rectify the matches iteratively. For binary random-dot stereo-pairs, Khotanzad et al. 131 suggest a non-iterative met,hod of solving the same problem using a neural network. They make use of a multi-layer perceptron (MLP) along with a neighborhood const.raint, as in [I], in order to learn the excitatory and inhibitory relations between the neighboring pixels. The main contribution of the present paper is an extension of the method proposed in 131 to gray scale images, but using a single layer perceptron. Recent. work by Henkel [ZIT suggest,s t,hat coherence property can be exploited in the attempt to estimate disparit,y in a stereo-image pair. Note that, Marr and Poggio [l] do not deal with coherence because of the neighborhood constraints. It should be added here that we do not make any assumptions on t,he neighborhoods. On t,he other hand, we use the network to prune the neighborhood by itself The outline of this paper is as follows. In Sec. 2, %'e formulat,e the correspondence problem using t.he compatibility matrix; and propose a neural architecture in Sec. 3. We present, the experimental results on random-dot and natural stereo pair in Sec. 4. Finally, we conclude the paper in Sec. 5. 786

Left image Right image i in F- J PROBLEM FORMULATION Figure 1: Structure of compatibility matrix In Marr and Poggio's [1] cooperative algorithm, a pixel in the line segment from the left, image is compared with each pixel in the corresponding line segment of the right image of the stereo pair - this is the assumption oj epipolargeometry- in order to arrive at a compatibility matrix. Since Marr and Poggio [I] deal with binary random-dot stereo pairs, they use the XNOR operator in order to compare the 2 pixels: when there is a match, the matrix element is set to 1, else 0, Similar logic is extended to build, for gray-level images, a compatibility matrix, AIm.,,(i:j) (for each pixel (m,, n)), using some match-measures, like correlation or squared differences of windows centered at (m, n + i)'h and (m: n +j)th pixel in the left and right images. For simplicity, we use t.he sum of the absolute pixel-bj7-pixel difference of t,he windows centered at (m, n+i) and (m, n i j)lh pixels in the left and right images, respect,ively. for i.e.: if IL and IE are the left and right images, t.hen the compat.ibility mat.rix at (m,n)th pixel is given by lli 1" where i:j = 1,2,...,( 20-1); 20; D > D, assuming that disparit,y ranges from -D,,, to D,,,; 2W is the window size; and 20 is the segment size. The matrix elements are normalized so t.hat the values lie between 0 and 1. Hence a low value of the matrix element implies a better match. In order t.hat t,he mat,rix emheds t,he correct, match within itself, the size of the segment considered must he great.er than the maximum disparity that exists in the segment, (D > Dma3). In fact, it is fixed at a value higher than the highest disparitjr value in the image. As a consequence, the disparity value of (m, n)th pixel can be determined from him,+. 787

We consider only the interactions along the horizontal direction, i.e., only horizontal line segments are used to construct the compat.ibilit,y matrix. This procedure can be extended to include vertical neighborhoods in order to get a 3D-compatibility volume [I]. According to [l], at. a match point (i,j) only the neighborhood along the line of vision (and in the same disparity plane) affecfs the disparity value. In the compatibility matrix, the lines of vision lie along t,he rows and columns, i.e.: Mm+(i + k,j) and Mm,n(i,j + k) (see Fig. l), and the disparity plane lies along the direction of the principal diagonal, i.e. Mm,+(i + k,j + k). The uniqueness condition gives rise to inhibition along the lines of vision and excibation along the same disparity. In 131, the authors use these constrained neighborhoods in order to (a,) ext,ract, t,he training paramet,ers, and (b) learn the excitat,ory and inhibit.ory effects. But (21 suggests that, disparity can be extract.ed from the compatibility matrix using the coherence property which predicts that the match values cluster along the different disparity planes, i.e., along Afm,n(i + k,j - k). It is likely to be advantageous to consider all the match points, without any constrained neighborhoods, so that the network should be able to prune the neighborhood by itself. NEURAL ARCHITECTURE In order to train a neural network to learn the constraints embedded within the compatibility matrix, we need to present to the network inputs and the desired outputs. The implication is that we need a stereo pair, ' 1 and IR: and its true disparity map, ID: such that 1o(m,n) gives the disparity of the corresponding pixel IL(m,n.). Hence to train the network, t.he compatibility matrix, ;,, A{, is fed as the input., and the out,put, forced to take on the value of ID(m,n). We employ a linear network which is trained in a supervised manner using the gradient descent, learning method. The network (Fig. 2) has a fullyconnected, 40' input nodes and ZD,,, number of output nodes, one for each disparit3. plane. The compat.ibility matrix (2Dx2D) is fed as a (4Dzx1) vector, x, t,o the network, and the node corresponding to the expected disparity value is made high. i.e., if d, is the desired output vector, then for (m,n)th pixel, di=( 1 : if i = ID(m, n) 0 : elsewhere where i = 1,2,,..,2D,,, Let wi,j be the weight connecting jth input node to the ith output node. Since we consider a linear node, the output of the ith output node, 0, is 788

where i = 1,2,.,., 2D,,, and x? is the jth element of the vector constructed from IV~.~. We make use of the quadratic error function, E, for gradient descent defined as where d, is as given by eqn.2. Now according to gradient descent, the change in the weight nth iteration is given by, in the and. where i,p = 1,2,..., 2D,,,,; j = 1,2,...) 4D2. and 7 is a pre-defined learning constant. Hence the update equat.ion is given by w$+ ) = + AwjT; (8) We train the network using (a) the above equation, and (h) l ~, In and ID. While t,esting the network, wit,h some ot,her st.ereo pairs, the output. disparit,y value is calculated as the weighted average of the neurons out.put. Therefore, the disparity of the (m,n)th pixel is given by, where o is the output of the network, for the input x, constructed from the compatibility matrix Since the group response is considered, we get disparity with suhpixel accuracy. 789

EXPERIMENTAL RESULTS Figure 2: Neural network structure In order to train the neural network, we generat,e random-dot and textured stereo pairs from known disparity maps. Figure 3 shows a gray scale randomdot stereo pair (256x256), and the corresponding true disparity map, ranging from -5 to +5 pixels. The network was trained with a segment size of 20 (i.e., D = 10) and window size 6 (i.e., lli = 3). The result is shown in Fig. 6 along with the 3D depth plot in Fig. 7. It is to he noted that the result obtained with D = 5 is comparable to that for D = 10. The continuity constraint is questionable, since a window with D = 5 will not have any neighbor from the same disparit,y plane for -5 and +5 disparity values, but the network is still able to recognize them. Figure 5 shows the plot of weights as gray level image(negative) of different disparity nodes. Note that it has excitatory weight,^ along the principal diagonal, but the inhibitory weights seem to be spread out on both sides, and do not seem to support the uniqueness const,raint. When the disparity value increases, the excit.ation shifts away from the principal diagonal, and down-wards. For zero disparity. the excitation lies on the principal diago nal. Observe that the excitation decreases from the center, and there is high inhibition on both the sides. (4 (h) (c) Figure 3: (a) and (b) Random-dot stereo pair and (c) its true disparity map (Dmas = 5). 790

(4 (b) Figure 4: (a) Estimated disparity map and (b) the 3D plot of estimated and the true disparity of the random dot stereo pair Figure 5: 2D image plot(negative) of weights of different disparity nodes, and t,heir corresponding binary excitation(white)-inhibition(b1ack) plots for D = 10 and D,,, = 5 The network has been tested on the penhgon stereo pair. The results are shown in Fig. 6; and the corresponding 3D plot in Fig. 6. Observe that the disparity map is noisy compared to the results obt.ained from cooperat,ive algorithms by Kanade et al. [j]. This shortcoming is perhaps due to the lack of cooperation hetween the neurons corresponding to different, compatibility matrices. The network does not seem to perform well when the window under considerat.ion has insufficient pat,terns in it. See Fig. 8, where the camera is misclassified. 79 1

Figure 6: (a) Left image of pentagon stereo pair and (b) the estimated disparity map, with D = 5 and W = 3 Figure 7: 3D plot of the estimated disparity map (4 (b) Figure 8: Right image of the tsukuba stereo pair 792

Even though the network works well for synthetic images, it fails to give comparably good results for noisy images. But this can be improved by constructing the 3D compatibility-matrix M, baking into account the neighborhood along the vertical direction as well, and by using bet,ter similarity measures to construct M. Since the method is non-iterative, it, is fast; and it, can be implemented in parallel, since the compatibility matrix for each pixel can be constructed separat,ely. CONCLUSIONS Based on the concept of cooperative stereo. a new method has been proposed, using an ANN, for stereo disparit.y estimation. Its performance compares favorably with that of recent algorithms (on stereo-image pair analysis) in the literature. Some illustrations are given. REFERENCES 111 D. Marr and T. Poggio, "Cooperative computation of stereo disparity," Science, vol. 194) pp. 283-?8i, October 1978. 121 R. D. Henkel, "A simple and fast neural network approach to stereo vision," Pmc. of NIPS'S7 in Denuer, AUT Press, Cambridge, pp. 808414, 1998. [3] Alireza Khotanzad, Amol Bokil, and Ying-Wung Lee, "Stereopsis by constraint learning feed-forward neural networks,': IEEE 'Ramactkm on Neural Networks, vol. 4, no. 2: pp. 332-342, March 1993. 14) D. Marr and T. Poggio, "A computational theory of human stereo vision," Proceedings of Royal Society London E, vol. 204, pp. 301-328, 1979. 151 C. Lawrence Zitnick and Takeo Kanade, "A cooperative algorithm for stereo matching and occlusion detection,': IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 875484, July 2000. 793