Disparity Estimation with Modeling of Occlusion and Object Orientation

Andre Redert, Chun-Jen Tsai+, Emile Hendriks, Aggelos K. Katsaggelos+

Information Theory Group, Department of Electrical Engineering, Delft University of Technology, Delft, The Netherlands
+ Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA

1 ABSTRACT

Stereo matching is fundamental to applications such as 3-D visual communications and depth measurements. There are several different approaches towards this objective, including feature-based methods,[1,2] block-based methods,[3,4] and pixel-based methods.[5] Most approaches use regularization to obtain reliable fields. Generally speaking, when smoothing is applied to the estimated depth field, it results in a bias towards surfaces that are parallel to the image plane. This is called fronto-parallel bias.[4] Recent pixel-based approaches[5] claim that no disparity smoothing is necessary. In their approach, occlusions and objects are explicitly modeled. However, these models interfere with each other in the case of slanted objects and result in a fragmented disparity field. In this paper we propose a disparity estimation algorithm with explicit modeling of object orientation and occlusion. The algorithm incorporates adjustable resolution and accuracy. Smoothing can be applied without introducing the fronto-parallel bias. The experiments show that the algorithm is very promising.

2 INTRODUCTION

The estimation of disparity fields is a difficult problem. The disparity field between a pair of images depends directly on two things: first, the distance of the objects to the cameras, and second, the geometry of the stereo cameras. Although there are a number of techniques for disparity/depth estimation, many of them require camera calibration. For applications like multiview image generation,[6] stereo image sequence coding,[7] etc., a 2-D image registration method without camera calibration is preferred.
To reduce the complexity of disparity estimation, many researchers assume that the cameras are arranged in parallel-axis configurations so that the pixels on one scanline in the left image are matched to pixels on the same scanline in the right image. In this case, the stereo matching problem is simplified to an intra-scanline pixel matching problem. One popular approach for solving this intra-scanline matching is the disparity space technique.[1,8,5] This technique involves the computation of a matching error image (disparity space image), the definition of a matching path model, and an algorithm for minimal error path search. Previous approaches use a simple path model to reduce the size of the search space.[8] In these methods, different constraints are used to resolve the problem of multiple possible paths. For example, in Ref [1], inter-scanline matching is used along with intra-scanline
matching. In Ref [8], reliable feature points (Ground Control Points) are used to confine the matching paths. Ref [5] requires the matching path to contain a minimal number of discontinuities. However, all these methods fail to recognize that many ambiguities in the path search come from the simplified path model, which does not distinguish between occlusions and slanted objects. In this paper we propose a disparity estimation algorithm with explicit modeling of object orientation and occlusion. The experiments show that this new approach handles occlusions and slanted objects very well.

The paper is organized as follows. In Section 3, a disparity space is introduced and the proposed algorithm for disparity estimation in this space is formulated. In Section 4, the implementation details are described. Experiments with both real and synthetic image sequences are conducted and described in detail in the same section. Finally, Section 5 concludes the paper.

3 PROPOSED DISPARITY ESTIMATION ALGORITHM

The proposed disparity estimation algorithm is based on a disparity space technique. A disparity space is a two-dimensional matching space computed from a pair of corresponding scanlines in the left and the right images. There are different ways of generating this space. In this paper, disparity spaces are computed by transforming the disparity space coordinates $(x_L, x_R)$, first used by Ohta and Kanade,[1] through the following transformation:

\[
\begin{pmatrix} x \\ d \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_L \\ x_R \end{pmatrix}, \tag{1}
\]

where the $x$ and $d$ axes span the disparity spaces defined in this paper, while the $x_L$ and $x_R$ axes span the disparity spaces described in Ref [1]. In equation (1), $0 \le x, x_L, x_R \le W-1$ and $|d| \le (W-1)/2$ if the image width is $W$. Also note that the unit for $x$ and $d$ is different from the unit for $x_L$ and $x_R$ because the transformation is not norm-preserving.

Table 1: Increments for elementary steps in a path.

step type          $(\Delta x_L, \Delta x_R)$    $(\Delta x, \Delta d)$
left occlusion     (1, 0)                        (1/2, 1/2)
match              (1, 1)                        (1, 0)
right occlusion    (0, 1)                        (1/2, -1/2)
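The coordinate change and the step increments of Table 1 can be checked with a few lines of Python. This is a minimal sketch, not part of the original paper: it assumes the transformation has the form $x = (x_L + x_R)/2$, $d = (x_L - x_R)/2$, and the names `to_xd`, `to_lr`, and `step_increments` are hypothetical.

```python
from fractions import Fraction

# Elementary steps expressed as increments (delta x_L, delta x_R).
ELEMENTARY_STEPS = {
    "left occlusion": (1, 0),
    "match": (1, 1),
    "right occlusion": (0, 1),
}


def to_xd(x_left, x_right):
    """Map (x_L, x_R) to (x, d), assuming x = (x_L + x_R)/2, d = (x_L - x_R)/2."""
    return Fraction(x_left + x_right, 2), Fraction(x_left - x_right, 2)


def to_lr(x, d):
    """Inverse map under the same assumption: x_L = x + d, x_R = x - d."""
    return x + d, x - d


def step_increments():
    """The map is linear, so increments transform exactly like coordinates."""
    return {name: to_xd(dl, dr) for name, (dl, dr) in ELEMENTARY_STEPS.items()}


if __name__ == "__main__":
    for name, (dx, dd) in step_increments().items():
        # The slope delta d / delta x stays finite for every step type.
        print(f"{name}: dx={dx}, dd={dd}, slope={dd / dx}")
```

Exact rational arithmetic is used only so that the half-pixel increments print without rounding; floating point would work equally well.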
The relation between the two disparity spaces is illustrated in Figure 1. A path in a disparity space defines a mapping that associates left scanline pixels with corresponding right scanline pixels. In the $x_L$-$x_R$ space, a path starts at $(0, 0)$ and ends at $(W, W)$, while in the $x$-$d$ space, a path starts at $(0, 0)$ and ends at $(W, 0)$. Each path is composed of many elementary steps. Most researchers only consider three types of elementary steps: the match step, the left occlusion step, and the right occlusion step. The increments of each step for the $x_L$-$x_R$ and $x$-$d$ spaces, denoted respectively by $(\Delta x_L, \Delta x_R)$ and $(\Delta x, \Delta d)$, are listed in Table 1. From this table we can see that the slope $\Delta d / \Delta x$ of an elementary step in the new disparity space is confined to finite values between $-1$ and $1$. This is an advantage of the new disparity space.

Different costs are assigned to the elementary steps so that a low-cost path defines a good mapping between left scanline pixels and right scanline pixels. A dynamic programming (DP) technique is typically used to find the minimal cost path. Previous works[5,8] assign constant penalty costs to occlusion steps and use intensity difference measures as costs for matching steps. Since each cluster of matching steps here represents a planar object parallel to the camera image planes, slanted planar objects or curved objects, which cause a contracting/stretching mapping, will be modeled by many small fronto-parallel objects with occlusion gaps in between if we only allow three kinds of elementary steps. This effect is undesirable for most natural scenes. To avoid fragmenting a scene, one should consider other elementary matching step types with
$0 < |d'| < 1$, where $d'$ is the slope of the step. This is equivalent to explicitly modeling objects that are not parallel to the image planes.

Figure 1: An example of a path. Note that the units for the $x_L$-$x_R$ space and for the $x$-$d$ space are different.

To model the new elementary steps, the disparity space is first resampled at $\Delta x$ and $\Delta d$ intervals with $N = \Delta x / \Delta d$ an integer (Figure 2). The derivatives (slopes) of a disparity path now have $2N+1$ different values between $1$ and $-1$, corresponding to $2N+1$ elementary match steps. There are also $2N+1$ elementary occlusion steps. The resolution of the estimated disparity field is determined by $\Delta x$, while the accuracy is limited to $\Delta d$. At each node in Figure 2, there are $4N+2$ possible incoming paths (including $2N+1$ object paths and $2N+1$ occlusion paths). There are also $4N+2$ possible outgoing paths. Just like the incoming paths, each of these outgoing paths takes one of the $2N+1$ possible slopes and one of the following types: object or occlusion.

Figure 2: a) Possible outgoing steps (dotted lines) at node $k$. In this example, N = . b) Steps (dotted lines) compared by the DP algorithm in equation (2).

To fit the above-mentioned approach in the framework of dynamic programming, we must define a recursive cost function for a path. Let us use an integer index ($k$) to label a node and a noninteger index ($k + 1/2$) to label the path that stems from this node (see Figure 2). Let us also denote the slope of the incoming path by $d'_{k-1/2}$ and the type of that path by $o_{k-1/2}$. The slope and type of the outgoing path are denoted respectively by $d'_{k+1/2}$ and $o_{k+1/2}$. That is, we want to choose the slope and the type of the path so that the cost is minimized. With this notation the recursive cost function for a path is defined by:
\[
C(x_k, d_k, d'_{k+1/2}, o_{k+1/2}) = \min_{d'_{k-1/2},\, o_{k-1/2}} \Big[ C(x_{k-1}, d_{k-1}, d'_{k-1/2}, o_{k-1/2}) + C_N(x_k, d_k) + C_{LP}(x_k, d_k, x_{k-1}, d_{k-1}, o_{k-1/2}) + C_T(d'_{k-1/2}, o_{k-1/2}, d'_{k+1/2}, o_{k+1/2}) \Big]. \tag{2}
\]

The transition from one node to the next is computed by:

\[
x_k = x_{k-1} + \Delta x, \qquad d_k = d_{k-1} + d'_{k-1/2}\, \Delta x. \tag{3}
\]

The $C_N$ term sets a limit on the possible disparity range. For example,

\[
C_N(x_k, d_k) = \begin{cases} 0, & \text{if } |d_k| < \text{maximal allowable disparity} \\ \infty, & \text{otherwise} \end{cases} \tag{4}
\]

The $C_{LP}$ term is the matching cost for a local path (a single step) that extends the global path from node $k-1$ to node $k$. For an object step, the matching cost is computed by first transforming the single step back to the $x_L$-$x_R$ space, then computing the average absolute luminance difference between the corresponding pixels in the left and the right scanlines. To get more accurate matching costs, resampling of scanline pixels is required. In the case of an occlusion step, a constant cost is used. In both cases an extra term for disparity field smoothing can be added.

The $C_T$ term is a cost used to constrain the possible transitions from the incoming to the outgoing path. If an object-to-object transition takes place, we can use $C_T$ to impose a smoothness constraint on second-order disparity field derivatives. If an occlusion-to-occlusion transition occurs, a cost may be added here to penalize a change in the direction of the occlusion. Finally, attaching costs to object-to-occlusion and occlusion-to-object transitions keeps the algorithm from breaking the scene into many small occlusions and objects. This is similar to the cost that Cox[5] used in the MLMH algorithm to reduce the number of horizontal discontinuities. A difference, however, is that in our case the cost is only assigned for real occlusion-segmentation transitions, whereas in Cox[5] it is also assigned for slanted or curved objects.

In addition to the sampling of the $x$ and $d$ axes, we resample the $y$ axis with $\Delta y$ intervals. Algorithm (2) is applied once per $\Delta y$ scanlines.
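The recursion above can be illustrated with a small dynamic programming sketch in Python. It is a simplified illustration under stated assumptions, not the authors' implementation: disparity is stored as an integer index `j` (so $d = j\,\Delta d$), a step is a pair `(m, o)` with slope $m/N$ and type `o`, the state is indexed by the incoming step rather than the outgoing one (an equivalent shift of the recursion), and the function names and toy cost functions are hypothetical.

```python
from itertools import product

OBJECT, OCCLUSION = 0, 1


def min_cost_path(num_steps, n, max_j, local_cost, transition_cost):
    """Return (cost, steps) of the cheapest path from d = 0 to d = 0.

    local_cost(k, j, m, o): cost of the step leaving node (k, j).
    transition_cost(m_in, o_in, m_out, o_out): cost of chaining two steps.
    """
    steps = list(product(range(-n, n + 1), (OBJECT, OCCLUSION)))  # 4n + 2 steps
    # layer[(j, step)] = (cost of best path reaching j via step, backpointer)
    layer = {(m, (m, o)): (local_cost(0, 0, m, o), None)
             for m, o in steps if abs(m) <= max_j}
    layers = [layer]
    for k in range(1, num_steps):
        nxt = {}
        for (j, s_in), (cost_in, _) in layer.items():
            for s_out in steps:
                j_out = j + s_out[0]
                if abs(j_out) > max_j:  # disparity limit: infinite node cost
                    continue
                cost = (cost_in + transition_cost(*s_in, *s_out)
                        + local_cost(k, j, *s_out))
                key = (j_out, s_out)
                if key not in nxt or cost < nxt[key][0]:
                    nxt[key] = (cost, (j, s_in))
        layer = nxt
        layers.append(layer)
    # The path must end at zero disparity; pick the cheapest such state.
    key = min((q for q in layer if q[0] == 0), key=lambda q: layer[q][0])
    total, path = layer[key][0], []
    for lay in reversed(layers):  # follow backpointers to the first step
        path.append(key[1])
        key = lay[key][1]
    return total, path[::-1]


# Toy problem: a hypothetical true disparity index at nodes 0..4.
TARGET = [0, 1, 1, 1, 0]


def toy_local_cost(k, j, m, o):
    if o == OCCLUSION:
        return 10.0  # constant occlusion cost
    return abs(j - TARGET[k]) + abs(j + m - TARGET[k + 1])


def toy_transition_cost(m_in, o_in, m_out, o_out):
    return 2.0 if o_in != o_out else 0.0  # segmentation penalty only


cost, path = min_cost_path(4, 1, 2, toy_local_cost, toy_transition_cost)
print(cost, path)
```

Each node compares every incoming step with every outgoing step, which is where the squared factor in the number of step combinations per node comes from. A real matching cost would replace `toy_local_cost` with a luminance comparison between the two scanlines.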
The computational load to obtain a disparity field for a stereo image pair has the order

\[
O\!\left( \frac{N_y}{\Delta y} \cdot \frac{N_x}{\Delta x} \cdot \frac{N_d}{\Delta d} \cdot (4N+2)^2 \right) \approx O\!\left( \frac{16\, N_y N_x N_d N^3}{\Delta y\, \Delta x^2} \right), \tag{5}
\]

where $N_y$ is the total number of scanlines in each image, $N_x$ is the number of pixels on one scanline, $N_d$ is the size of the allowed disparity range, and $N = \Delta x / \Delta d$ as defined before.

4 EXPERIMENTS

Experiments on both real and synthetic stereo image pairs are conducted and the results are shown in this section. In these experiments, $C_{LP}$ and $C_T$ are defined as below:

\[
C_{LP} = \begin{cases} (\text{luminance matching cost}) + A_2 |d'|, & \text{for object path} \\ (\text{constant occlusion cost, } A_0) + A_1 (1 - |d'|), & \text{for occlusion path} \end{cases} \tag{6}
\]
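A local path cost of the kind given in equation (6) might be computed as in the following sketch. It is an illustration under assumptions rather than the authors' code: the step is mapped back to the scanlines with $x_L = x + d$ and $x_R = x - d$, luminance is resampled by linear interpolation with `numpy.interp`, and the function name, the number of sample points, and the weights `a0`, `a1`, `a2` (standing in for the occlusion cost, the occlusion direction bias, and the fronto-parallel bias) are hypothetical choices.

```python
import numpy as np


def local_path_cost(left, right, x0, d0, slope, dx, is_occlusion,
                    a0, a1=0.0, a2=0.0, samples=4):
    """Cost of one step starting at (x0, d0) with the given slope.

    left, right: 1-D luminance arrays for one pair of scanlines.
    """
    if is_occlusion:
        # Constant occlusion cost plus a bias towards |slope| = 1.
        return a0 + a1 * (1.0 - abs(slope))
    # Sample the step at the midpoints of `samples` equal sub-intervals.
    t = (np.arange(samples) + 0.5) / samples
    x = x0 + t * dx
    d = d0 + t * slope * dx
    # Back-transform to scanline coordinates (assumed: x_L = x + d, x_R = x - d)
    # and resample both scanlines by linear interpolation.
    columns = np.arange(len(left))
    lum_left = np.interp(x + d, columns, left)
    lum_right = np.interp(x - d, columns, right)
    # Average absolute luminance difference plus a bias towards slope 0.
    return float(np.mean(np.abs(lum_left - lum_right))) + a2 * abs(slope)


if __name__ == "__main__":
    left = np.arange(10) * 10.0   # luminance ramp
    right = left + 20.0           # same ramp seen two pixels further left
    print(local_path_cost(left, right, 4.0, 1.0, 0.0, 1.0, False, a0=5.0))
    print(local_path_cost(left, right, 4.0, 0.0, 0.0, 1.0, False, a0=5.0))
```

Note that `numpy.interp` clamps coordinates outside the scanline to the border values, so steps near the image edges would need separate handling in a complete implementation.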
The transition cost is

\[
C_T = \begin{cases} 0, & \text{for occlusion - occlusion transition} \\ A_3, & \text{for occlusion - object transition} \\ A_4 |d''|, & \text{for object - object transition} \end{cases} \tag{7}
\]

where $d''$ denotes the second-order horizontal derivative of the disparity. So next to $\Delta x$, $\Delta y$, and $\Delta d$ ($N = \Delta x / \Delta d$) we have 5 additional parameters:

$A_0$: occlusion penalty, same as in Cox's algorithm.
$A_1$: occlusion direction bias towards $|d'| = 1$ (single image occlusion).
$A_2$: disparity path bias towards $|d'| = 0$ (fronto-parallel bias).
$A_3$: object - occlusion segmentation penalty.
$A_4$: disparity 2nd horizontal derivative smoothing.

With $A_1 = A_2 = \infty$ and $A_3 = A_4 = 0$ we have the best approximation to Cox's ML algorithm.[5] Due to the elementary differences in path modeling (shown in Table 1), there are no choices for $\Delta x$, $\Delta d$ and $\Delta y$ that result in an exact match with the ML algorithm. Two choices give the best approximation. With $\Delta x = \Delta d = \Delta y = 1$ our algorithm has an elementary occlusion step size twice as large (lower occlusion resolution). With $\Delta x = \Delta d = 0.5$ and $\Delta y = 1$, the elementary matching step has half the size of that in the ML algorithm (higher object resolution).

Figure 3: The left image from the MAN sequence.
Figure 4: The right image from the MAN sequence.

Figures 3 and 4 are the stereo image pair from the MAN image sequence in the European PANORAMA[9] project. The parameters used in the algorithm are: x = ; y = ; d = ; A = 5; A = A = A 4 = ; and A 3 = 5. The estimated disparity map is shown in Figure 5. The 3-D mesh surface plot of this map is shown in Figure 6. Note that the absence of the fronto-parallel bias allows the uniform background to have a noisy shape, while the face is still recognizable.

To quantify the performance of the proposed algorithm, we created a synthetic pair of images for testing (see Figures 7 and 8). The true disparity map is shown in Figure 9. The results are shown in Table 2 and in Figures 11-16. In Table 2, the column o→o shows the percentage of correct detection of occlusions, the column d→d lists the percentage of correct detection of disparities, and the column PSNR ($= 10 \log_{10}(255^2 / \mathrm{MSE})$) is computed based on the MSE of all d→d disparities.

The algorithm has no fewer than 8 parameters, while the computational load can be very high for certain parameter regions. It is a difficult task to cover the whole parameter space with a small number of experiments. The experiments in Table 2 give a fairly good overview of the aspects of the algorithm. They lead us to the following observations:
Comparing experiments 1, 2, and 3 with the rest, it seems that the results get better for smaller $\Delta x$, $\Delta y$, and $\Delta d$, as expected.

From experiments, 3 and also 6, 9, it seems that the fronto-parallel bias works better than the smoothing on second derivatives.

From experiments 7, 11, and 12 we can see that the occlusion direction parameter $A_1$ does not have any significant effect. Therefore, the luminance data alone constrains the occlusion path to the condition $|d'| = 1$.

From experiments 4 and 6, we see that it is possible to model occlusions and object direction separately. In experiment 4, the high $A_2$ forces object paths to $|d'| = 0$, so objects are segmented into small objects/occlusions, with the fronto-parallel bias. In experiment 6, the low $A_2$ makes sure slanted or curved objects can exist without segmentation into occlusions. For a good comparison we set A to 5 (equal to A in experiment 4), keeping the fronto-parallel bias equal to experiment 4.

From experiments 6 and 7, the segmentation cost reduces the d→o error significantly (the o→d error is raised, but it should be kept in mind that there are many more disparities than occlusions in the map, so in an absolute sense the number of pixels that get the correct segmentation into object/occlusion increases).

From experiments 21, 22, and 23, it seems that a fronto-parallel bias equal to the bias in the normal DP algorithm (where A = A ) gives the best results. In experiment 23 the PSNR is highest, but compared to experiment, the number of correctly detected disparities is lower (see Figure 16).

Experiment 9 took nine hours on an SGI Octane machine. The computational load is very high for small $\Delta x$, $\Delta y$, and $\Delta d$.

Figure 5: The estimated disparity map of MAN. In this map, pixels with high intensity correspond to pixels with large disparity. Occlusions are marked as white pixels.
Figure 6: The 3D plot of the estimated disparity of MAN.

5 CONCLUSIONS

We have presented a new disparity estimation algorithm, based on deterministic dynamic programming.
The main contribution of this paper is the separate modeling of occlusions and slanted objects. The algorithm offers adjustable resolution, accuracy, and degree of scene segmentation. It also includes a second-order smoothing penalty term.
Figure 7: The left image from the synthetic sequence.
Figure 8: The right image from the synthetic sequence.
Figure 9: The true disparity map.
Figure 10: The 3D plot of the true disparity map.
Table 2: Experimental results of the synthetic image pair.

Exp | $\Delta x$ $\Delta y$ $\Delta d$ $A_0$ $A_1$ $A_2$ $A_3$ $A_4$ | o→o | d→d | PSNR | Figure
1 | 5 | 93.8% | 96.3% | 49.4 |
2 | 5 5 | 93.7% | 97.8% | 49.4 |
3 | 5 5 | 8.6% | 97.8% | 4.9 |
4 | .5 .5 5 | 95.4% | 96.7% | 53. | 11
5 | .5 .5 | 94.9% | 96.9% | 5. |
6 | .5 .5 5 5 | 95.% | 98.3% | 5.9 |
7 | .5 .5 5 5 5 | 93.8% | 99.7% | 5.7 | 12, 17
8 | .5 .5 5 5 5 | 9.8% | 99.8% | 5.8 |
9 | .5 .7 5 5 5 | 86.3% | 99.8% | 5. |
10 | .5 .5 5 5 5 | 83.3% | 99.7% | 47.7 |
11 | .5 .5 5 5 5 5 | 93.8% | 99.7% | 5.7 |
12 | .5 .5 5 5 5 | 93.9% | 99.7% | 5.4 |
13 | .5 .5 5 5 | 83.6% | 99.3% | 43.7 |
14 | .5 .5 5 5 | 84.8% | 99.8% | 5.3 |
15 | .5 .5 5 5 5 | 85.9% | 99.7% | 5. |
16 | .5 .5 5 | 78.9% | 99.7% | 4. | 13
17 | .5 .5 5 | 8.% | 99.6% | 5.3 |
18 | .5 .5 5 5 | 83.3% | 98.% | 5.7 |
19 | .5 .5 5 5 | 8.% | 98.9% | 45.6 |
20 | .5 .5 5 5 | 95.% | 98.3% | 5.9 | 14
21 | .5 .5 3 5 7 | 94.8% | 99.4% | 54. | 15
22 | .5 .5 3 3 7 | 9.6% | 99.6% | 5.6 |
23 | .5 .5 3 7 7 | 95.6% | 99.% | 55.8 | 16

Figure 11: Experiment 4. This experiment resembles Cox's ML algorithm. The path model allows only fronto-parallel objects. One can see false segmentation of objects and occlusions in the center of the map.
Figure 12: Experiment 7. The path model allows slanted objects and a bias toward fronto-parallel objects. A segmentation penalty is also added to the model.
Figure 13: Experiment 16. The path model allows slanted objects and has no fronto-parallel bias.
Figure 14: Experiment 20. The path model allows slanted objects and a bias toward fronto-parallel objects. No segmentation penalty is added.
Figure 15: Experiment 21. This experiment is the same as experiment 7 but with a higher segmentation penalty.
Figure 16: Experiment 23. The fronto-parallel bias in this experiment is higher than in conventional algorithms.
Figure 17: The 3-D plot of the estimated disparity field from experiment 7.

Good results have been obtained with both real and synthetic stereo image pairs. Disparity maps were obtained with a physically meaningful segmentation into objects and occlusions. The algorithm allows an adjustable fronto-parallel bias for objects and an adjustable bias for occlusion direction. In the experiments we found that the presence of the fronto-parallel bias enhances the results. The occlusion direction bias did not have any significant influence.

To build on these promising results, our future work will focus on new definitions for the cost functions in the algorithm. Currently we are investigating how to transform some assumptions about the 3D world into cost functions in the disparity space.

6 ACKNOWLEDGMENT

This work has been funded by the European ACTS project PANORAMA and a NATO Collaborative Research Grant.

7 REFERENCES

[1] Y. Ohta and T. Kanade, "Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-7, No. 2, pp. 139-154, March 1985.
[2] J. Liu and R. Skerjanc, "Stereo and motion correspondence in a sequence of stereo images," Signal Processing: Image Communication 5, pp. 305-318, 1993.
[3] E.A. Hendriks and G. Marosi, "Recursive disparity estimation algorithm for real time stereoscopic video applications," Proceedings of the International Conference on Image Processing, pp. 891-894, 1996.
[4] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: theory and experiment," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, No. 9, pp. 920-932, 1994.
[5] I.J. Cox, S.L. Hingorani, and S.B. Rao, "A maximum likelihood stereo algorithm," Computer Vision and Image Understanding, vol. 63, No. 3, pp. 542-567, April 1996.
[6] P.A. Redert, E.A. Hendriks, and J. Biemond, "Synthesis of multi viewpoint images at non-intermediate positions," Proceedings of ICASSP, Munich, Germany, pp. 2749-2752, 1997.
[7] D. Tzovaras, N. Grammalidis, and M.G. Strintzis, "Depth map coding for stereo and multiview image sequence transmission," in Proceedings of the International Workshop on Stereoscopic and Three Dimensional Imaging (IWS3DI), Santorini, Greece, pp. 75-8, 1995.
[8] S.S. Intille and A.F. Bobick, "Disparity-Space Images and Large Occlusion Stereo," MIT Media Lab Perceptual Computing Group Technical Report No. 220, 1993.
[9] European ACTS PANORAMA project, http://www.tnt.uni-hannover.de/project/eu/panorama