An Application of Machine Learning to the Optimization of Disparity Maps


Guido Cervone
Machine Learning and Inference Laboratory
George Mason University
Fairfax, VA

Marco Zucchelli
Computational Vision and Active Perception Lab
Royal Institute of Technology
Stockholm, Sweden
zucch@nada.kth.se

Abstract

This paper presents an application of machine learning to the optimization of disparity maps, which are used in 3D scene reconstruction. A new approach, called ODM, is presented; it uses machine learning to guide an evolutionary process by creating and instantiating hypotheses. The approach is based on the Learnable Evolution Model, specifically modified to cope with this problem. ODM combines disparity maps generated using different parameters, first by creating random combinations of the initial disparities, and then by creating and instantiating hypotheses that characterize the best combination for each pixel of the image. Experiments were conducted on two pairs of stereo images, one representing a computer-generated scene and the other a real-world scene. The results of this approach were also compared to a more traditional and widely used algorithm. Results show that it is possible to improve the initial disparity maps, in some cases by almost twenty percent.

1 Introduction

Inferring the 3D shape of a scene from its two-dimensional images is one of the biggest challenges of computer vision, and usually goes under the name of structure from motion. The use of a pair of images taken from different points of view, also known as stereo, is one of the most commonly used approaches, and it has been widely developed in the last 20 years. The basic problem of stereo is to automatically find correspondent points in the two images. Most matching techniques are either local (area based) or global.

In the former, variants of cross correlation are used to determine correspondent points. The main drawback of this approach is that results depend strongly on the size of the window used to evaluate the correlation: small windows tend to pick up details but are noisy, while large windows tend to average, so the matching is smoother but details are lost. Many attempts have been made to develop algorithms able to adapt the size of the window automatically. (Kanade and Okutomi, 1994) demonstrated that adaptive windows give much better results: their method uses statistical models of the disparity maps and adapts the window on the basis of this information. (Saito and Mori, 1995) applied a traditional genetic algorithm to the optimization of disparity maps, and successfully showed that the generated disparities were better than the initial ones. Global approaches still suffer from coarse-fine matching problems, even if they better optimize the continuity of the maps.

In this paper a novel approach is proposed that does not require any previous knowledge of the disparity map: maps are generated using different levels of coarseness and then combined to maximize an evaluation function that weights both the compatibility between corresponding points and the continuity of the disparity. Our approach, called ODM, uses a machine learning algorithm to guide an evolutionary process that finds the optimal combination of the generated disparities.
This methodology, called Learnable Evolution Model (Cervone, 1999; Michalski, 2000), has been employed in the optimization of complex evaluation functions and in the optimization of heat exchangers (Cervone, Kaufman, Michalski, 2000).

The Learnable Evolution Model, or LEM, is fundamentally different from the Darwinian-type model that underlies most current methods of evolutionary computation. The central engine of evolution in LEM is the Machine Learning mode, which creates new individuals by processes of generalization and instantiation rather than mutation and/or recombination, as in Darwinian-type evolutionary computation methods. Machine Learning mode consists of two processes: hypothesis generation, which determines a hypothesis characterizing differences between high-fitness and low-fitness individuals in one or more past populations, and hypothesis instantiation, which generates new individuals by instantiating the hypothesis in various ways. Machine Learning mode thus produces new individuals not through semi-random Darwinian-type operations, but through a deliberate reasoning process involving the generation and instantiation of high-level hypotheses about populations of individuals. Thus, in LEM, new individuals are genetically engineered, in the sense that they are determined according to descriptions learned from the analysis of the current and possibly past generations.

The Machine Learning mode can work alone (uniLEM), or in combination with the Darwinian mode (duoLEM), which consists of the traditional operators of recombination and mutation. LEM differs not only from Darwinian-type evolution but also from Lamarckian-type evolution, because in generating new individuals it takes into consideration not only the experience of single individuals, but also the experience of one or more populations of individuals.

The main modification to the LEM methodology is that ODM evolves matrices, while LEM evolves vectors. Although it is possible to represent a matrix as a vector, this representation tends to lose the spatial relationship between values that lie on the same row or column. Initial experiments showed that a plain vector representation suffered from this problem.

This paper is organized as follows: Section 2 gives an introduction to the stereo problem and describes how the initial disparities were generated. Section 3 introduces the ODM program that was written to solve this problem. Section 4 presents an experimental study of the application of the ODM program to two pairs of stereo images.

2 Stereo Problem Overview

A stereo system is a set of two images of the same scene taken from different points of view. If the cameras are calibrated, meaning that both the internal parameters and the relative positions are known, then the depth of scene points can be easily retrieved by a process of triangulation. This process consists in finding the intersection of the rays going through the center of each camera and the projection of the scene point onto the focal plane.

The matching of the two images is usually realized using cross correlation. In the case of calibrated cameras, the process of finding corresponding points can be reduced to a one-dimensional search, since the relative position of the cameras is known and the images can consequently be transformed to align the pixel rows. This process is called rectification and is a basic step of all stereo algorithms.

The initial disparity maps were generated using an advanced algorithm called Maximum Flow, which performs a global search over the whole image [Roy and Cox, 1998]. Its main advantage is that continuity of the map in the direction perpendicular to the scan lines can be better achieved. The algorithm searches the disparity for each pixel in a given range and at a given resolution. When a high resolution is used, maps are more precise, but the optimization can easily get stuck in local minima, producing undesirable noise; when the resolution is low, the algorithm tends to average, producing maps that are smooth but less precise in the details. This is sketched in Figure 2, where a cube is used. Compared with the traditional correlation technique, the maps are much smoother and more precise, which makes the combination problem even more challenging.

Figure 2: Coarse to fine disparity search. Coarse search is shown on the left, fine in the center, and the original model on the right.

3 ODM algorithm

ODM stands for Optimal Disparity Map. It is an algorithm that implements the LEM methodology, modified to cope with the stereo problem.
Inputs to ODM are the left and right images, a series of disparity maps, and an occlusion map. The output is an image that is the best disparity found by the evolutionary process. Since the total number of map combinations is huge (the number of disparities raised to the power of the number of pixels), the search space is reduced by subdividing the images into smaller NxN blocks and running the algorithm block by block. The experimental part investigates how the block size affects the optimization process.

ODM evolves matrices in which every element is a value between 0 and the number of initial disparities passed to the program, minus one. The disparities are reconstructed by assigning, for each value in the matrix, the corresponding value of the indicated disparity. Matrices are represented as vectors; however, extra variables are added to represent the leading disparity for each row and each column.

The first step in the evolutionary process consists in generating random combinations of the initial disparities, called the initial population, and evaluating them according to a fitness function. In practice, this means generating a number of NxN matrices that are combinations of the initial disparities and scoring each of them with the evaluation function.

The following steps are then repeated until a terminating condition is met, usually the generation of a maximum number of births or the convergence of the entire population to a single disparity. First, the population is sorted according to the fitness score of the individuals. Two groups of individuals are then selected from the population, denoted the HIGH group (H-group) and the LOW group (L-group). These groups represent high-performing and low-performing individuals, respectively, according to the given fitness function. Two parameters, called HT and LT, control the number of individuals to insert in each group; together with the population size, they are the only parameters of the ODM algorithm.

These groups serve as training examples for a machine learning program. ODM employs the AQ20 program [Cervone, Panait and Michalski, 2001], which is the newest implementation of the AQ methodology [Michalski, 1969; Michalski, 1983]. The concept behind the ODM algorithm is that there are reasons why the elements in the H-group have a higher fitness score than those in the L-group. The machine learning algorithm finds distinctive characteristics or patterns that uniquely classify the elements in the high group, such as relations between the pixels. For example, a learned hypothesis may say that all the pixels at n,m come from disparity 1, and all the pixels at p,q come from disparity 2.

The process of learning hypotheses is quite complicated and goes beyond the scope of this paper. Briefly, AQ is a progressive covering algorithm (a.k.a. separate and conquer). It is based on an algorithm for determining quasi-optimal (optimal or sub-optimal) solutions to general covering problems of high complexity [Michalski, 1969]. The central concept of the algorithm is a star, defined as a set of alternative general descriptions of a particular event (a seed) that satisfy given constraints, e.g., do not cover negative examples, do not contradict prior knowledge, etc. The algorithm starts by randomly selecting a seed from among the concept examples, and then creates a star for that example.
A star is a set of rules that cover the positive event and do not cover any negatives. Only one rule is selected from a star according to an optimization function, called LEF (Lexicographical Evaluation Function). In ODM, LEF selects the rule with the highest rule-fitness and, in case of a tie, the rule with the smallest number of conditions. The rule-fitness is an estimated average of the fitness value of the disparities included in the rule, calculated by averaging the fitness values of the elements covered by the rule. A new seed is then selected from the examples not yet covered, and the process repeats until there are no more examples to be covered. The result of this learning process is a set of rules that cover the positive examples but do not cover any of the negative examples.

The learned hypotheses are expressed in attributional predicate calculus. A learned rule looks like the following:

[group = high] <- [pixel(3,4) = 2] & [pixel(4,6) = 3] & [pixel(5,6) = 1,2]

and it reads: a disparity is part of the high group if the pixel at location 3,4 comes from disparity 2, the pixel at location 4,6 comes from disparity 3, and the pixel at location 5,6 comes from disparity 1 or 2.

The following step consists in creating new combinations of the initial disparities using the learned hypotheses. Using the example above, the pixel at location 3,4 is set equal to pixel 3,4 of disparity number 2, the pixel at location 4,6 is set equal to pixel 4,6 of disparity number 3, and so on. Each pixel not included in the rule is set equal to the corresponding pixel of a randomly selected disparity in the current population. Selecting a random disparity ensures that the algorithm is not too greedy, and it helps to maintain diversity in the population. The number of disparities to generate from each rule is determined by the rule-fitness, as described previously. The newly created disparities are evaluated and inserted in the original population.
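The group selection and rule instantiation steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the ODM implementation: the rule is written by hand rather than learned by AQ20, a single donor individual supplies all pixels not mentioned in the rule, higher fitness is treated as better, and the names `reconstruct`, `split_groups`, and `instantiate_rule` are hypothetical.

```python
import numpy as np

def reconstruct(individual, disparity_blocks):
    """Build the disparity block encoded by an index matrix.

    individual: (N, N) integers in [0, K), one disparity index per pixel.
    disparity_blocks: (K, N, N) array holding the K initial disparities.
    """
    rows, cols = np.indices(individual.shape)
    return disparity_blocks[individual, rows, cols]

def split_groups(population, fitness, ht=0.3, lt=0.3):
    """Return (H-group, L-group): top ht and bottom lt fractions by fitness."""
    order = np.argsort(fitness)[::-1]  # indices sorted best first
    n_high = max(1, int(round(ht * len(population))))
    n_low = max(1, int(round(lt * len(population))))
    high = [population[i] for i in order[:n_high]]
    low = [population[i] for i in order[-n_low:]]
    return high, low

def instantiate_rule(rule, population, rng):
    """Create one new individual that satisfies a rule.

    rule: dict mapping (row, col) -> list of allowed disparity indices.
    Assumption: pixels not named in the rule are copied from one randomly
    chosen member of the current population.
    """
    donor = population[rng.integers(len(population))]
    child = donor.copy()
    for (row, col), allowed in rule.items():
        child[row, col] = rng.choice(allowed)
    return child
```

With three initial disparities, a rule such as `{(3, 4): [2], (5, 6): [1, 2]}` fixes pixel (3,4) to disparity 2 and lets pixel (5,6) come from disparity 1 or 2, while every other pixel keeps the donor's choice.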
The population is then resized to the original number by eliminating the individuals that have the lowest fitness scores. This method, called truncation survival, is an inter-generational model, because it mixes the old individuals with the new ones.

The evaluation function used to calculate the fitness of the disparities considers both the smoothness and the compatibility of correspondent points (Marr, 1982). Its first term ensures that the brightness of correspondent points is approximately the same, while the second is the Laplacian of the disparity function, which is small for smooth functions and large if discontinuities are present. This evaluation function was used because it takes into consideration both the compatibility and the continuity of pixels.

4 Experiments

Experiments were performed on two pairs of stereo images. The first pair, described in 4.1, is a synthetic (computer-generated) scene, while the second pair, described in 4.2, is an indoor scene. The Maximum Flow algorithm used for the generation of the initial disparities was obtained freely. For each image three disparities were generated using different resolutions; the resolution is the only parameter of the Maximum Flow algorithm [Roy and Cox, 1998].
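Returning to the evaluation function and truncation survival of Section 3, the sketch below shows one plausible form of each. It is not the formula used in the paper. It assumes a squared brightness difference for the compatibility term, a squared five-point discrete Laplacian for the continuity term, integer disparities applied as horizontal shifts on rectified images, and a negated sum so that higher scores are better. The weight `lam` and all function names are hypothetical.

```python
import numpy as np

def compatibility(left, right, disparity):
    """Sum of squared brightness differences between matched pixels."""
    h, w = left.shape
    rows, cols = np.indices((h, w))
    # Assumption: left pixel (r, c) matches right pixel (r, c - d),
    # clamped to the image border.
    matched = np.clip(cols - disparity, 0, w - 1)
    diff = left.astype(float) - right[rows, matched].astype(float)
    return float(np.sum(diff ** 2))

def continuity(disparity):
    """Sum of the squared discrete Laplacian over interior pixels only."""
    d = disparity.astype(float)
    lap = (d[:-2, 1:-1] + d[2:, 1:-1] + d[1:-1, :-2] + d[1:-1, 2:]
           - 4.0 * d[1:-1, 1:-1])
    return float(np.sum(lap ** 2))

def fitness(left, right, disparity, lam=1.0):
    """Negated cost, so that a larger value means a better disparity."""
    return -(compatibility(left, right, disparity)
             + lam * continuity(disparity))

def truncate(population, scores, size):
    """Truncation survival: keep the `size` highest-scoring individuals."""
    order = np.argsort(scores)[::-1][:size]
    return [population[i] for i in order], [scores[i] for i in order]
```

Because the Laplacian here is evaluated only on interior pixels, no second derivative is computed on the border of a block, so a smaller block leaves a larger share of its pixels without a continuity term.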


In order to compare and validate ODM's performance, a more traditional and widely used Evolutionary Strategy (ES) algorithm was used [Back, Fogel, and Michalewicz, 1997]. ES is a Darwinian evolutionary algorithm that evolves vectors of real-valued numbers. It employs deterministic selection and binary tournament. Deterministic selection means that every individual in the population is selected and mutated. The mutation rate is automatically determined by the system depending on the improvements of the past populations. The new and old individuals are combined to form the new population, which is resized to the original size by a process called binary tournament: two individuals are randomly selected and their fitness values compared. The one with higher fitness is kept in the population, while the one with lower fitness is deleted.

In all the experiments reported in this paper, ODM was run with HT = LT = 0.3 and a population size of 100. ES was run with a population size of 10. Early experiments showed that ODM tends to give better results with larger populations, because more examples are passed to the learning algorithm, usually leading to more accurate rules. On the other hand, ES prefers smaller population sizes because of its uniform selection. If the population is too large, too many individuals are created at each cycle of the evolution, usually leading to slower convergence [Back, Fogel, and Michalewicz, 1997].

4.1 Synthetic Image

The first pair of images shows a computer-generated scene, and was obtained from www-dbv.cs.uni-bonn.de/stereo/data. The images model the simplified geometry of two cameras, in which only the y coordinate is the same, while the x coordinate may change. The synthetic images were chosen because they do not contain noise, and they provide a perfect environment to test a new algorithm.
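For reference, the ES baseline described at the start of this section can be sketched as below. This is an illustration only: the paper does not state how the mutation rate is adapted, so a 1/5-success-style rule is assumed here, individuals are plain real-valued vectors, and the names `binary_tournament_resize` and `es_generation` are hypothetical.

```python
import numpy as np

def binary_tournament_resize(population, scores, size, rng):
    """Shrink the population to `size` by repeatedly drawing two random
    individuals and deleting the one with the lower score."""
    population, scores = list(population), list(scores)
    while len(population) > size:
        i, j = rng.choice(len(population), size=2, replace=False)
        loser = i if scores[i] < scores[j] else j
        del population[loser]
        del scores[loser]
    return population, scores

def es_generation(population, scores, fitness_fn, sigma, rng):
    """One ES generation: mutate every individual, merge, then resize."""
    # Deterministic selection: every parent produces one mutated child.
    children = [x + rng.normal(0.0, sigma, size=x.shape) for x in population]
    child_scores = [fitness_fn(c) for c in children]
    # Assumption: widen the step size when more than one fifth of the
    # children improve on their parent, and narrow it otherwise.
    success = np.mean([c > p for c, p in zip(child_scores, scores)])
    sigma = sigma * 1.22 if success > 0.2 else sigma / 1.22
    merged, merged_scores = binary_tournament_resize(
        population + children, scores + child_scores, len(population), rng)
    return merged, merged_scores, sigma
```

Since the individual with the highest score can never lose a tournament outright, the best score in the population does not decrease from one generation to the next in this sketch.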
The original images were 256x256, but they were first resized to 128x128 and then divided into blocks of 8x8 and 16x16 in order to reduce the complexity of the problem. The Maximum Flow algorithm was run with three different resolutions, namely 11, 21, and 41. Figure 3 shows the scene taken by the left and right cameras, while Figure 4 shows the generated disparities and the ground truth. In Figure 4 darker pixels are closer to the camera, while lighter pixels are farther away.

Figure 3: The left and right stereo synthetic images

All three disparities seem to be good approximations of the scene; however, not all parts of the image are represented best in the same disparity. For example, in disparity 0 the sphere in the foreground has a clear dark color, suggesting that it is in the foreground. However, the change of color of the walls and of the ceiling is not smooth, thus losing the three-dimensionality of the corridor. On the other hand, in disparity 2 the ceiling and the walls change color smoothly, but the sphere does not have a very dark color.

Figure 4: Initial disparities generated by the maximum flow algorithm, and ground truth

Both the ODM and ES algorithms were run for 50 generations, starting with the same initial population. Table 1 presents the evaluation of the three initial disparities, in the form of compatibility, continuity, and their sum. Disparity 2 is better both in terms of compatibility and continuity.

Table 1: Evaluation of the three initial disparities (columns: Disparity 0, Disparity 1, Disparity 2; rows: Compatibility, Continuity, Sum)

Table 2 presents the results of the application of ODM and ES with different block sizes.

Table 2: Results of ODM and ES using different block sizes (columns: ODM 16, ES 16, ODM 8, ES 8; rows: Compatibility, Continuity, Sum)

The results show that both algorithms successfully improved the initial disparities.
ODM performed slightly better than ES; however, the difference between the two algorithms was insignificant. One reason why both algorithms performed similarly is that the images contained no noise, so the initial disparities were very good, leaving little room for improvement.

Both algorithms performed better in terms of continuity with a larger block size. This result can be attributed to the fact that it is not possible to compute the second derivative over the edges of the blocks. Experiments were also performed for larger block sizes, and consistently showed that larger block sizes lead to better continuity, but the increase in the complexity of the problem leads to worse compatibility.

The best disparity generated contained 16384 pixels (128x128), of which 29% were taken from Disparity 0, 38% from Disparity 1, and 33% from Disparity 2. Figure 7 shows the 3D reconstructions of the scene, realized using the disparity map.

Figure 7: 3D reconstruction of the synthetic scene (front view)

Figure 8: 3D reconstruction of the synthetic scene (lateral view)

4.2 Real world image

The second pair of stereo images is part of the imagery from the University of Tsukuba. The left and right images are shown in Figure 9. They represent an indoor scene with a statue and a lamp in the foreground, and a bookshelf in the background.

The original image size was 192x192, but the algorithm was run on blocks of 12x12. Three disparities were generated using the following resolutions: 10, 30, and 50; they are shown in Figure 10. The occlusion map was not present for this image, and consequently the value for the compatibility is higher than in the previous experiments.

Figure 10: Initial disparities

As in the previous experiment, the disparities varied depending on the resolution used. Table 3 shows the evaluation of the three initial disparities.
Figure 9: Left and right stereo real-world image

Table 3: Evaluation of the three initial disparities (columns: Disparity 0, Disparity 1, Disparity 2; rows: Compatibility, Continuity, Sum)

The values are larger than those in experiment 4.1 because the occlusion map was not present for this image.

Both ODM and ES were run for 20 generations, starting with the same initial population; however, ODM converged after 10 generations. Figure 11 and Figure 12 show the best-so-far curves for ODM and ES, respectively. In each figure, the first graph shows the compatibility of the pixels, the second graph shows the continuity of the pixels, and the third graph is the sum of the first two. The dotted line represents disparity 0, the dashed line disparity 1, and the straight continuous line disparity 2.

Figure 11: Best-so-far curve for ODM

Figure 12: Best-so-far curve for ES

The disparities generated in Figures 11 and 12 represent a 13% and a 9% improvement, respectively, over the best original disparity. Both algorithms successfully optimize both the continuity and the compatibility of the disparity. Note that the ODM algorithm was run for half the generations of ES. The disparity generated by ES after 10 generations is worse than the one generated by ODM, both in terms of compatibility and continuity.

Figure 10: Best disparity map found (left) and distribution of the disparity maps (right)

The mosaic plot in Figure 10 shows that the bottom left and bottom right parts of the generated disparity are taken from disparity 0. The lamp is taken mainly from disparity 2; in fact, it is possible to identify the edges of the lamp. The generated disparity contains 36864 pixels (192x192), of which 41% were taken from Disparity 0, 28% from Disparity 1, and 31% from Disparity 2.

5 Conclusions

This paper presented a methodology, called ODM, that optimizes disparity maps using a machine learning program that guides an evolutionary process. The machine learning program learns hypotheses that describe why some disparities are better than others, according to an evaluation function. The hypotheses are then used to generate new disparities. Experimental results on two pairs of stereo images showed an improvement of up to 20% over the initial disparities. All the results of the ODM algorithm were compared to a more traditional and widely used Evolutionary Strategy algorithm.

More experiments will be done to understand how the block size influences the learning and the evolutionary process on different pairs of stereo images. Different approaches can be taken to reduce the complexity of the problem; for example, instead of optimizing single pixels, it is possible to optimize clusters of pixels within the block.

Acknowledgement

The authors wish to thank Dr. Jana Kosecka for her valuable help throughout the development of this project, and Liviu Panait for his help in the implementation of the AQ20 and ODM programs.

References

[Back, Fogel, and Michalewicz, 1997] Back, T., Fogel, D.B. and Michalewicz, Z., Handbook of Evolutionary Computation, Oxford University Press, Oxford, 1997.

[Cervone, 1999] Cervone, G., An Experimental Application of the Learnable Evolution Model to Selected Optimization Problems. Master's Thesis, Dept. of Computer Science, George Mason University, Fairfax, VA, 1999.

[Cervone, Kaufman, Michalski, 2000] Cervone, G., Kaufman, K.A., Michalski, R.S., Experimental Validations of the Learnable Evolution Model. Proceedings of the 2000 Congress on Evolutionary Computation, La Jolla, CA, 2000.

[Kanade and Okutomi, 1994] Kanade, T. and Okutomi, M., A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment. IEEE Trans. Pattern Analysis and Machine Intelligence 16(9), 1994.

[Marr, 1982] Marr, D., Vision, W.H. Freeman, San Francisco, CA, 1982.

[Michalski, 1969] Michalski, R.S., On the Quasi-Optimal Solution of the General Covering Problem, Proceedings of the V International Symposium on Information Processing (FCIP 69), Bled, Yugoslavia, 1969.

[Michalski, 1983] Michalski, R.S., A Theory and Methodology of Machine Learning, in Michalski, R.S., Carbonell, J.G. and Mitchell, T.M. (Eds.), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, 1983.

[Michalski, 2000] Michalski, R.S., Learnable Evolution Model: Evolutionary Processes Guided by Machine Learning. Machine Learning 38(1-2), pp. 9-40, 2000.

[Pal, Bhandari, and Kundu] Pal, S.K., Bhandari, D., Kundu, M.K., Genetic Algorithms for Optimal Image Enhancement, Pattern Recognition Letters 15.

[Roy and Cox, 1998] Roy, S., Cox, I.J., A Maximum-Flow Formulation of the N-camera Stereo Correspondence Problem, Proceedings of the International Conference on Computer Vision, Bombay, January 1998.

[Saito and Mori, 1995] Saito, H., Mori, M., Application of Genetic Algorithms to Stereo Matching of Images, Pattern Recognition Letters 16, 1995.
