Dense Disparity Estimation via Global and Local Matching

Chun-Jen Tsai and Aggelos K. Katsaggelos
Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208-3118, USA
E-mail: tsai@ece.nwu.edu, aggk@ece.nwu.edu

Abstract

A new divide-and-conquer technique for disparity estimation is proposed in this paper. The technique performs feature matching recursively, starting with the strongest feature point in the left scanline. Once the first matching pair is established, the ordering constraint in disparity estimation allows the original intra-scanline matching problem to be divided into two smaller subproblems. Each subproblem can then be solved recursively, or via a disparity space technique. An extension to the standard disparity space technique is also proposed to complement the divide-and-conquer algorithm. Experimental results demonstrate the effectiveness of the proposed approaches.

1 Introduction

Disparity estimation is the most fundamental problem in stereo image processing. Given two images taken simultaneously with a pair of cameras, the goal of this process is to locate, for each point in one image, its corresponding point in the other image. Let I_L(x, y) and I_R(x, y) denote the left-channel and right-channel image functions. The solution to the correspondence problem of I_L(x, y) and I_R(x, y) is a disparity field, d(x, y), such that

    I_L(p) = I_R(p + d(p)),   (1)

where p = (x, y)^T denotes the image coordinates. Many researchers have worked on the disparity estimation problem since the 1970s. The criteria most often used for matching are photometric similarity and spatial consistency. The photometric similarity criterion arises naturally from equation (1). The spatial consistency criterion confines the search space for d(p) to piecewise smooth disparity fields. Several techniques are available for disparity estimation, including block matching [1], regularization techniques [2], and disparity space-based techniques [3, 4].
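As a minimal illustration of equation (1), the following Python sketch (not part of the paper; the function name is hypothetical) evaluates the photometric error that a candidate disparity field leaves behind, assuming a parallel-axis setup so that d shifts only the x coordinate:

```python
import numpy as np

def matching_error(I_L, I_R, d):
    """Per-pixel photometric error implied by Eq. (1): under a
    parallel-axis camera setup, I_L(x, y) should equal
    I_R(x + d(x, y), y).  Pixels whose match falls outside the
    right image are left at zero error here."""
    h, w = I_L.shape
    err = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            xr = int(round(x + d[y, x]))          # matched column in I_R
            if 0 <= xr < w:
                err[y, x] = abs(float(I_L[y, x]) - float(I_R[y, xr]))
    return err
```

For a perfect disparity field the error is zero wherever the match stays inside the image, which is the photometric similarity criterion in its purest form.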
To reduce the complexity of the disparity estimation problem, many researchers assume that the cameras are arranged in a parallel-axis configuration. In this case, the stereo matching problem is simplified to an intra-scanline pixel matching problem. The disparity space-based technique is of particular interest in this case because it incorporates occlusion models directly into the estimation
process, while other techniques usually require a separate occlusion determination step after the estimation of the disparity field. In addition, if a dense disparity field is desired, the disparity space technique usually gives better results. In this paper, we propose a divide-and-conquer technique for feature matching. The technique first establishes the matching of strong feature points. Because of the ordering constraint in disparity estimation, these matching points divide the original intra-scanline matching problem into several smaller subproblems. These subproblems can be solved recursively, or via a disparity space technique when there is no feature point in the sub-intervals. An extension to the standard disparity space technique is also proposed to complement the divide-and-conquer algorithm. The paper is organized as follows. In section 2, the divide-and-conquer global feature matching algorithm is introduced. In section 3, the proposed extended-step disparity space technique is described. Experiments are presented in section 4, while conclusions are given in section 5.

2 The Divide-and-Conquer Global Matching Algorithm

Figure 1 shows corresponding scanlines extracted from a pair of video conferencing stereo images. To create a dense disparity map, we have to perform pixel-wise matching between these two scanlines. Noise resulting from image acquisition and sampling, differences in lighting, etc., prevents perfect matching. Furthermore, some pixels in the left scanline have no match in the right scanline due to occlusions. Even though disparity space-based techniques handle these two problems (noise and occlusion) adequately well, they usually produce rough disparity estimates in uniform areas (for example, the background areas represented by the right and left end parts of the scanlines in Figure 1). In addition, image areas with high intensity variance are usually treated as occlusions when the actual disparity is not uniform in such areas.
In the proposed algorithm, the ordering constraint in disparity estimation is imposed explicitly. Referring, for example, to the scanline profiles in Fig. 1, the ordering constraint states that if A ↔ B is a matching pair, then any point to the right of A can only be matched to a point to the right of B. If, furthermore, C ↔ D is a matching pair to the right of A ↔ B, then the interval [A, C] should be matched to the interval [B, D]. Therefore, the establishment of the matching pairs A ↔ B and C ↔ D divides the full intra-scanline matching problem into three subproblems: the intra-interval matching between intervals [0, A] and [0, B], between [A, C] and [B, D], and between [C, N] and [D, N], where N is the last pixel in each scanline. This break-down of a large problem into subproblems fits the divide-and-conquer framework. Assume that the problem is to perform intra-scanline matching between intervals [A, C] and [B, D]. The algorithm is summarized as follows:
Figure 1: The scanline profiles from a pair of real stereo images. Cross marks in the profiles are some feature points with high intensity variance.

Step 1: Compute the intensity variance at each pixel in [A, C]. Pick the point with the largest intensity variance as the feature point.

Step 2: Apply block matching to find the corresponding feature point in the interval [B, D]. Two measures are used to compute the reliability of the match. The first one is the signal-variance-to-matching-error ratio (SER) and the second one is the variance similarity (VS) between the left feature point and the right feature point. They are defined, respectively, by

    SER(p) = [σ²_{I_L}(p) + σ²_{I_R}(p + d)] / Σ (I_L(·) − I_R(·))²,   (2)

and

    VS(p) = | σ²_{I_L}(p) − σ²_{I_R}(p + d) |,   (3)

where the summation in Eq. (2) is over the window used for matching. If SER(p) is too small or VS(p) is too large, the matching pair is considered unreliable. The pixel with the next highest variance is then picked as the feature point and Step 2 is repeated at this new position.

Step 3: If a reliable matching feature pair R ↔ S is found, with R ∈ [A, C] and S ∈ [B, D], the matching between intervals [A, C] and [B, D] is divided into two subproblems: the matching between [A, R] and [B, S], and the matching between [R, C] and [S, D]. Each subproblem is solved recursively, i.e., Step 1 is applied to it.

Step 4: If there is no reliable matching feature pair between [A, C] and [B, D], the disparity in (A, C) can either be filled in using linear interpolation of the disparity at A and C, or it can be solved with a disparity space technique. In this paper, the latter approach is demonstrated.
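The steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: `find_feature_pair` and `solve_local` are hypothetical stand-ins for Step 2's block matching and Step 4's disparity space fallback, and `ser_vs` computes the reliability measures of Eqs. (2) and (3) over 1-D scanline windows.

```python
import numpy as np

def ser_vs(left, right, p, d, W=7):
    """Reliability measures of Eqs. (2) and (3) for a candidate match
    between left-scanline pixel p and right-scanline pixel p + d.
    SER: sum of the two window variances divided by the matching
         error (sum of squared differences) over the window.
    VS:  absolute difference of the two window variances."""
    half = W // 2
    wl = np.asarray(left[p - half:p + half + 1], dtype=float)
    wr = np.asarray(right[p + d - half:p + d + half + 1], dtype=float)
    sse = float(np.sum((wl - wr) ** 2))
    ser = (wl.var() + wr.var()) / max(sse, 1e-9)   # guard divide-by-zero
    vs = abs(wl.var() - wr.var())
    return ser, vs

def match_interval(A, C, B, D, find_feature_pair, solve_local):
    """Divide-and-conquer skeleton of Steps 1-4: match the left
    interval [A, C] against the right interval [B, D].  A reliable
    feature pair (R, S) splits the problem in two (ordering
    constraint); feature-less intervals fall back to solve_local.
    Returns a list of matched (left, right) pixel pairs."""
    if C - A <= 1 or D - B <= 1:
        return []
    pair = find_feature_pair(A, C, B, D)      # Steps 1-2
    if pair is None:                          # Step 4: no reliable pair
        return solve_local(A, C, B, D)
    R, S = pair                               # Step 3: split and recurse
    return (match_interval(A, R, B, S, find_feature_pair, solve_local)
            + [(R, S)]
            + match_interval(R, C, S, D, find_feature_pair, solve_local))
```

The ordering constraint is what makes the two recursive calls independent: every match to the left of R must lie to the left of S, and symmetrically on the right.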
3 The Extended Disparity Space Technique

The disparity space image (DSI) is the intra-scanline matching error image. It is defined for scanline n as follows:

    DSI_n(x_L, x_R) = Σ_{i,j = −W/2}^{W/2} | I_L(x_L + i, n + j) − I_R(x_R + i, n + j) |,   (4)

where W is the matching window size. The estimation of the disparity between two scanlines is equivalent to finding the path across the DSI, starting at the lower-left corner and ending at the upper-right corner, which has the smallest cost. This path can be found using dynamic programming techniques. To simplify the dynamic programming for disparity estimation, three step types are typically assumed, namely left occlusion, match, and right occlusion. This simple step model tends to produce mis-detected occlusions in areas with non-uniform disparity. More sophisticated step models have been proposed ([5], [6]) to resolve the conflict between occlusion and non-uniform disparity. However, these new models introduce more cost parameters that are usually determined empirically. In this paper we propose an extended step model which does not introduce extra cost parameters. In Fig. 2, five possible steps (instead of three) are considered during the cost evaluation. The introduction of the two new steps, d → p and e → p, allows the path to model non-uniform disparity properly. According to Fig. 2, the cost function is now defined by

    COST(p) = min { COST(c) + DSI(p),
                    COST(a) + COST_occlusion,
                    COST(b) + COST_occlusion,
                    COST(d) + DSI(p) + (DSI(b) + DSI(c)) / 2,
                    COST(e) + DSI(p) + (DSI(a) + DSI(c)) / 2 },   (5)

where COST_occlusion is the fixed occlusion cost, and a, b, c, d, e, and p are pixel coordinates.

Figure 2: The extended step model for the disparity space technique.

Figure 3: Left and right frames from the "man" stereo image sequence.
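A sketch of the dynamic programming recursion of Eq. (5) in Python follows. The exact cell geometry of Fig. 2 is not fully recoverable from the text, so the intermediate cells averaged in the two extended steps reflect one plausible reading; the function name is hypothetical.

```python
import numpy as np

def extended_dsi_path_cost(dsi, occ_cost):
    """Accumulated path cost over a disparity space image using the
    five-step model of Eq. (5).  dsi[xl, xr] is the intra-scanline
    matching error of Eq. (4); paths run from cell (0, 0) to the
    opposite corner."""
    NL, NR = dsi.shape
    cost = np.full((NL, NR), np.inf)
    cost[0, 0] = dsi[0, 0]
    for xl in range(NL):
        for xr in range(NR):
            if xl == 0 and xr == 0:
                continue
            cands = []
            if xl >= 1 and xr >= 1:            # match step, c -> p
                cands.append(cost[xl-1, xr-1] + dsi[xl, xr])
            if xl >= 1:                         # occlusion step, a -> p
                cands.append(cost[xl-1, xr] + occ_cost)
            if xr >= 1:                         # occlusion step, b -> p
                cands.append(cost[xl, xr-1] + occ_cost)
            if xl >= 2 and xr >= 1:             # extended step, d -> p
                cands.append(cost[xl-2, xr-1] + dsi[xl, xr]
                             + 0.5 * (dsi[xl-1, xr] + dsi[xl-1, xr-1]))
            if xl >= 1 and xr >= 2:             # extended step, e -> p
                cands.append(cost[xl-1, xr-2] + dsi[xl, xr]
                             + 0.5 * (dsi[xl, xr-1] + dsi[xl-1, xr-1]))
            cost[xl, xr] = min(cands)
    return cost
```

A backtracking pass over the chosen predecessors would recover the actual disparity path; pixels reached via the two occlusion steps are the detected occlusions. Note that the extended steps reuse the DSI values themselves rather than introducing any new cost parameter, which is the point made in the text.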
4 Experiments

In this section, two stereo image pairs are used to evaluate the performance of the proposed algorithm. The first stereo image pair comes from the video conferencing sequence "man"; two frames of it are shown in Fig. 3. A comparison of the estimated disparity maps using the conventional ([4]) and the proposed disparity space techniques is shown in Fig. 4. The disparity maps computed with the proposed extended-step algorithm and the divide-and-conquer algorithm are also shown in Fig. 4.

Figure 4: From left to right: 1) disparity map estimated with the basic disparity space algorithm; 2) disparity map estimated with the modified disparity space algorithm in [6]; 3) disparity map estimated with the extended-step disparity space algorithm alone; 4) disparity map estimated with the proposed divide-and-conquer global matching disparity space algorithm. All disparity maps are histogram-equalized for visualization purposes. The brightest spots are detected occlusions.

By comparing the results in Fig. 4, one can see that the conventional algorithm does not handle the background well. The matching error in the background propagates into the foreground and undermines the estimation of the disparity on the person's face. On the other hand, the proposed divide-and-conquer algorithm clearly separates the foreground from the background and does not suffer from the lack of texture in the background. Notice how the global matching information introduced by the divide-and-conquer algorithm (Fig. 4.4) helps remove the mismatches of the purely local matching algorithm (the bright horizontal line across the chin of the face in Fig. 4.3). The "aqua" image sequence (Fig. 5) is used for the second experiment. The estimated disparity maps using the conventional disparity space algorithm and the proposed divide-and-conquer algorithm are shown in Fig. 6. From these figures, one can observe that the disparity map in the latter case is richer in detail.
You can see more structure in the coral, the largest fish, and the background using the proposed algorithm.

Figure 5: Left frame from the "aqua" stereo image sequence.

Figure 6: Left: disparity map from the conventional disparity space algorithm. Right: disparity map from the proposed algorithm.

5 Conclusions

We have presented a two-level disparity estimation algorithm. The top level uses a divide-and-conquer algorithm for global feature matching, while the bottom level uses a disparity space technique to perform local matching and occlusion detection. As the experiments show, the proposed technique works very well both for low-feature scenes from video conferencing sequences and for complex natural scenes. Another advantage of the divide-and-conquer algorithm is that it separates texture-less regions (e.g., the background) from texture-rich regions (e.g., the foreground). We are investigating the application of adaptive disparity space matching algorithms to improve the performance further.

References

[1] M. J. Hannah, "Computer Matching of Areas in Stereo Images," Ph.D. Dissertation, Stanford University, Stanford, CA, Report STAN-CS-74-483, 1974.
[2] D. Terzopoulos, "Regularization of Inverse Visual Problems Involving Discontinuities," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 4, pp. 413-423, July 1986.
[3] Y. Ohta and T. Kanade, "Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-7, no. 2, March 1985.
[4] S. S. Intille and A. F. Bobick, "Disparity-Space Images and Large Occlusion Stereo," MIT Media Lab Perceptual Computing Group Technical Report No. 220, 1993.
[5] L. Falkenhagen, "Disparity Estimation from Stereo Image Pairs Assuming Piecewise Continuous Surfaces," in: Y. Paker and S. Wilbur (Eds.), Image Processing for Broadcast and Video Production, ISBN 3-540-19947-0, Springer, Great Britain, 1994.
[6] A. Redert, C.-J. Tsai, E. Hendriks, and A. K. Katsaggelos, "Disparity Estimation with Modeling of Occlusion and Object Orientation," Proc. SPIE Visual Communication and Image Processing, San Jose, pp. 798-808, Jan. 1998.