The Correspondence Problem in Perspective Images PhD Thesis Proposal


CENTER FOR MACHINE PERCEPTION, CZECH TECHNICAL UNIVERSITY

The Correspondence Problem in Perspective Images
PhD Thesis Proposal

Ondřej Chum

CTU CMP 2003 03, January 31, 2003
RESEARCH REPORT, ISSN

Supervisor: Dr. Jiří Matas

The author was supported by the Czech Ministry of Education under project MSM and by The Grant Agency of the Czech Republic under project GACR 102/02/1539.

Research Reports of CMP, Czech Technical University in Prague, No. 3, 2003. Published by Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Technická 2, Prague 6, Czech Republic.


Contents

1 Introduction
2 RANSAC
3 Randomized RANSAC
  3.1 Algorithm
  3.2 The T_{d,d} Test
  3.3 Experiments
  3.4 Conclusion
4 Locally optimized RANSAC
  4.1 Algorithm
  4.2 Local Optimization Methods
  4.3 Experimental Results
  4.4 Conclusions
5 Wide baseline stereo
  5.1 Maximally Stable Extremal Regions
  5.2 The proposed robust wide-baseline algorithm
  5.3 Experiments
  5.4 Conclusions
6 Conclusions and Thesis Proposal

Abstract

This thesis proposal addresses the correspondence problem, especially the matching of two views of a scene taken with unknown cameras from unknown and arbitrary viewpoints. This task is known as Wide Baseline Stereo Matching. Our recent research related to this field is described and the thesis goals are proposed.

1 Introduction

This thesis proposal describes our work towards solving the correspondence problem. The focus is on matching two views of a scene taken with unknown cameras from unknown and arbitrary viewpoints, known as Wide Baseline Stereo Matching. A significant part of the work focuses on the robust estimator RANSAC, as many computer vision algorithms include a robust estimation step where model parameters are computed from a data set containing a significant proportion of outliers. The RANSAC¹ algorithm, introduced by Fischler and Bolles in 1981 [5], is possibly the most widely used robust estimator in the field of computer vision. RANSAC has been applied in the context of short baseline stereo [3, 33], wide baseline stereo matching [23, 35, 25, 15], motion segmentation [3], mosaicing [17], detection of geometric primitives [3], robust eigenimage matching [1] and elsewhere. An overview of the algorithm is given in Section 2. In Section 3 we show that under a broad range of conditions, RANSAC efficiency is significantly improved if its hypothesis evaluation step is randomized. A new randomized (hypothesis evaluation) version of the RANSAC algorithm, R-RANSAC, is introduced. Computational savings are achieved by typically evaluating only a fraction of the data points for models contaminated with outliers. The idea is implemented in a two-step evaluation procedure. A mathematically tractable class of statistical preverification tests based on small test samples is introduced. For this class of preverification tests we derive an approximate relation for the optimal setting of its single parameter.
The proposed pre-test is evaluated on both synthetic data and real-world problems and a significant increase in speed is shown. A new modification of RANSAC, the locally optimized RANSAC, is introduced in Section 4. It has been observed that, to find an optimal solution (with a given probability), the number of samples drawn in RANSAC is significantly higher than predicted from the mathematical model. This is due to the assumption that a model with parameters computed from an outlier-free sample is consistent with all inliers. The assumption rarely holds in practice. The locally optimized RANSAC

¹ RANdom SAmple Consensus

makes no new assumptions about the data; on the contrary, it makes the above-mentioned assumption valid by applying local optimization to the solution estimated from the random sample. Finally, in Section 5 a novel algorithm for wide baseline stereo matching is introduced. A new set of image elements that are put into correspondence, the so-called extremal regions, is introduced. Extremal regions possess highly desirable properties: the set is closed under 1. continuous (and thus projective) transformation of image coordinates and 2. monotonic transformation of image intensities. An efficient (near-linear complexity) and practically fast (near frame rate) detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). A new robust similarity measure for establishing tentative correspondences is proposed. The robustness ensures that invariants from multiple measurement regions (regions obtained by invariant constructions from extremal regions), some significantly larger (and hence more discriminative) than the MSERs, may be used to establish tentative correspondences. The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes. Significant change of scale (3.5×), illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. Good estimates of epipolar geometry (average distance from corresponding points to the epipolar line below 0.09 of the inter-pixel distance) are obtained. In Section 6, conclusions and the thesis proposal are given.

2 RANSAC

The structure of the RANSAC algorithm is simple but powerful (see Algorithm 1). Repeatedly, subsets are randomly selected from the input data and model parameters fitting the sample are computed. The size of the random samples is the smallest sufficient for determining model parameters. In a second step, the quality of the model parameters is evaluated on the full data set. Different cost functions may be used [31] for the evaluation, the standard being the number of inliers, i.e. the number of data points consistent with the model. The process is terminated when the likelihood of finding a better model becomes low. The strength of the method stems from the fact that, to find a good solution, it is sufficient to select a single random sample not contaminated by outliers. Depending on the complexity of the model (the size of the random samples), RANSAC can handle contamination levels well above 50%, which is commonly assumed to be a practical limit in robust statistics [24].

In: U = {x_i}, a set of data points, |U| = N
    f : S → p, a function computing model parameters from a data point sample
    ρ(p, x), the cost function for a single data point (e.g. 1 if x is an inlier to the model with parameters p, 0 otherwise)
Out: p*, the parameters of the model maximizing the cost function

k := 0
Repeat until P{better solution exists} < η (a function of C* = max(C_i), i = 1..k, the cost (quality) of the best tested model, and the number of steps k):
  k := k + 1
  I. Hypothesis
  (1) select a random set S_k ⊂ U, |S_k| = m
  (2) compute parameters p_k = f(S_k)
  II. Evaluation
  (3) compute the cost (quality) C_k = Σ_{x∈U} ρ(p_k, x)
  (4) if C* < C_k then C* := C_k, p* := p_k

Algorithm 1: Summary of the standard version of the RANSAC algorithm.
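The hypothesize-and-verify loop of Algorithm 1 can be sketched as follows (a minimal illustration in Python, assuming a toy 2D line model with a two-point minimal sample; the function names and the line model are hypothetical simplifications, not the implementation used in the experiments):

```python
import math
import random

def ransac(points, threshold=0.1, eta=0.05, max_iter=10000):
    """Minimal RANSAC sketch for 2D line fitting (minimal sample m = 2).

    Returns the line (a, b, c), with a*x + b*y + c = 0 and a^2 + b^2 = 1,
    that maximizes the inlier count, together with that count."""
    n, m = len(points), 2
    best_line, best_count = None, 0
    k, k_max = 0, max_iter
    while k < k_max:
        k += 1
        # I. Hypothesis: draw a minimal sample and compute model parameters.
        (x1, y1), (x2, y2) = random.sample(points, m)
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # degenerate sample
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # II. Evaluation: the cost is the number of inliers.
        count = sum(1 for x, y in points if abs(a * x + b * y + c) <= threshold)
        if count > best_count:
            best_count, best_line = count, (a, b, c)
            # Terminate when P{better solution exists} < eta.
            p_good = (best_count / n) ** m
            if p_good >= 1:
                break
            if p_good > 0:
                k_max = min(max_iter,
                            math.ceil(math.log(eta) / math.log(1 - p_good)))
    return best_line, best_count
```

For example, with an inlier fraction ε = 0.5 and m = 2, the confidence-driven bound log 0.05 / log(1 − 0.25) gives roughly 11 samples.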

3 Randomized RANSAC

The speed of RANSAC depends on two factors. Firstly, the level of contamination determines the number of random samples that have to be taken to guarantee a certain confidence in the optimality of the solution. Secondly, the time spent evaluating the quality of each hypothesized model is proportional to the size N of the data set. Typically, a very large number of erroneous models obtained from contaminated samples are evaluated. Such models are consistent with only a small fraction of the data. This observation can be exploited to significantly increase the speed of the RANSAC algorithm. As the main contribution of this work, we show that under a broad range of conditions, RANSAC efficiency is significantly improved if its hypothesis evaluation step is randomized. The core idea of the Randomized (hypothesis evaluation) RANSAC is that most model parameter hypotheses evaluated are influenced by outliers. For such erroneous models, it is sufficient to test only a small number d of data points out of the total of N points (d ≪ N) to conclude, with high confidence, that they do not correspond to the sought solution. The idea is implemented in a two-step evaluation procedure. First, a statistical test is performed on d randomly selected data points. The final evaluation on all N data points is carried out only if the pre-test is passed. The increase in the speed of the modified RANSAC depends on the likelihoods of the two types of errors made in the pre-test: 1. rejection of an uncontaminated model and 2. acceptance of a contaminated model. Since RANSAC is already a randomized algorithm, the randomization of model evaluation does not change the nature of the solution: it is only correct with a certain probability. However, the same confidence in the solution is obtained in, on average, a shorter time. Finding an optimal pre-test with the fastest average behaviour is naturally desirable, but very complex.
Instead, we introduce in Section 3.2 a mathematically tractable class of pre-tests based on small test samples. For this class we derive an approximate relation for the optimal setting of its single parameter. The proposed pre-tests are assessed on both synthetic data and real-world problems and performance improvements are demonstrated. The structure of this section is as follows. First, in Section 3.1, the concept of evaluation with pre-tests is introduced and formulae describing the total complexity of the algorithm are derived. Both the number of samples drawn and the amount of time spent on the evaluation of a hypothesized model are discussed in detail. In Section 3.2, the d-out-of-d class of pre-tests is introduced and analyzed. In Section 3.3, both simulated and real experiments are presented and their results discussed. The work is concluded in Section 3.4, where plans for future work are also discussed.

3.1 Algorithm

In this section, the time complexity of the RANSAC algorithm is expressed as a function of quantities that characterise the input data and the complexity of the model. We start by introducing the notation. The set of all data points is denoted U, the number of data points N = |U|, and ε represents the fraction of inliers in the data set. The size of the sample is m, i.e. the number of data points necessary to compute model parameters. Let us first express the total time spent in the R-RANSAC procedure. From the analysis of the algorithm (Algorithm 2), the average time spent in R-RANSAC, measured in the number of verified data points, is

J = k(t_M + t̄),  (1)

where k is the number of samples drawn, t̄ is the average number of data points verified within one model evaluation, and t_M is the time necessary to compute the parameters of the model from the selected sample. The time needed to verify the consistency of one data point with the hypothesized parameters was chosen as the unit of time. Note that t_M is a constant independent of both the number of data points N and the fraction of inliers ε. From (1) we see that the average time spent in R-RANSAC depends on both the number of samples drawn k and the average time required to process each sample. The analysis of these two components follows. The number of tested hypotheses, which is equal to the number of samples, depends (besides other factors) on the termination condition. Two different termination criteria may be adopted in RANSAC. The hypothesize-verify loop is either stopped after evaluating more samples than are on average needed to select a good (uncontaminated) sample. Alternatively, the number of samples is chosen to ensure that the probability that a better-than-currently-best sample is missed is lower than a predefined confidence level.
We show that the stopping times for the two cases, average-driven and confidence-driven, differ only by a multiplicative factor, and hence the optimal value in the proposed test is reached with the same parameters. Since the sample is selected without replacement, the probability of taking a good sample is

P_I = C(I, m) / C(N, m) = (I! (N − m)!) / ((I − m)! N!) = ∏_{j=0}^{m−1} (I − j)/(N − j),

where I = εN stands for the number of inliers. For N ≫ m a simple and accurate

In: U = {x_i}, a set of data points, |U| = N
    f : S → p, a function computing model parameters from a data point sample
    ρ(p, x), the cost function for a single data point (e.g. 1 if x is an inlier to the model with parameters p, 0 otherwise)
Out: p*, the parameters of the model maximizing the cost function

k := 0
Repeat until P{better solution exists} < η (a function of C* = max(C_i), i = 1..k, the cost (quality) of the best tested model, and the number of steps k):
  k := k + 1
  I. Hypothesis
  (1) select a random set S_k ⊂ U, |S_k| = m
  (2) compute parameters p_k = f(S_k)
  II. Preliminary test
  (3) perform the test based on d ≪ N data points
  (4) continue verification only if the test is passed
  III. Evaluation
  (5) compute the cost (quality) C_k = Σ_{x∈U} ρ(p_k, x)
  (6) if C* < C_k then C* := C_k, p* := p_k

Algorithm 2: Summary of the RANSAC and R-RANSAC algorithms. Step II is added to RANSAC to randomize its cost function evaluation.

approximation is obtained,

P_I ≈ ε^m,  (2)

which is exact for sampling with replacement and commonly used in the literature. Since P_I < ε^m, running RANSAC without replacement in fact requires on average slightly more samples than estimated with approximation (2); for N ≫ m the difference is negligible. The average number of samples taken before the first uncontaminated sample passes the preverification test is given by (from the properties of the geometric distribution)

k̄ = 1 / (ε^m α),  (3)

where α is the probability of a good sample passing the preverification test. Note that for the randomized version of RANSAC the number of samples is higher than

or equal to that of the standard version, because a valid solution may be rejected in the preliminary test with probability 1 − α. In confidence-driven sampling, at least k samples have to be taken to reduce the probability of missing a good sample below a predefined confidence level η. Thus we get, as in [3],

η = (1 − ε^m α)^k,  (4)

and solving for k leads to

k = log η / log(1 − ε^m α).  (5)

Since (1 − x) is the first-order Taylor expansion of e^{−x} at zero, and (1 − x) ≤ e^{−x}, we have

η = (1 − ε^m α)^k ≤ e^{−ε^m α k},

and therefore ln η ≤ −ε^m α k, i.e. k ≤ (−ln η) / (ε^m α) = k̄ (−ln η).

We see that k ≤ k̄ (−ln η), where −ln η is a predefined constant, so all formulae obtained for the η-confidence-driven case can be trivially modified to cover the average case.

The number of data points tested. So far we have seen that the introduction of a preliminary test has increased the number of samples drawn. For the pre-test to make sense, this effect must be more than offset by the reduction in the average number of data points tested per hypothesis. There are two cases to be considered. First, with probability P_I, an uncontaminated ("good") sample is drawn. Then the preverification test is passed with probability α and all N data points are verified; otherwise, with probability 1 − α, the good sample is rejected and only t̄_α data points are on average tested. In the second case, a contaminated ("bad") sample is drawn, which happens with probability 1 − P_I. Again, either the pre-verification step is passed, this time with a different probability β, and the full test on all N data points is carried out, or, with probability 1 − β, only t̄_β data points are tested in the preverification test. Here β stands for the probability that a bad sample passes the preverification test. Note that it is important that β < α, i.e. a bad (contaminated) sample is consistent with a smaller number of data points than a good sample.
Forming a weighted average of the four cases, the formula for the average number of point evaluations per sample is obtained:

t̄(d) = P_I (α N + (1 − α) t̄_α) + (1 − P_I)(β N + (1 − β) t̄_β).  (6)

The values of α, β, t̄_α and t̄_β depend on the type of preverification test.
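The quantities derived above, P_I for sampling without replacement, the average-case sample count of eq. (3) and the confidence-driven count of eq. (5), can be checked numerically (an illustrative Python sketch; the function names are ours, not from the thesis):

```python
import math

def p_good_sample(n_inliers, n_points, m):
    """P_I: probability that a size-m sample drawn without replacement
    contains only inliers, prod_{j=0}^{m-1} (I - j) / (N - j)."""
    p = 1.0
    for j in range(m):
        p *= (n_inliers - j) / (n_points - j)
    return p

def samples_average(eps, m, alpha=1.0):
    """Average number of samples before the first uncontaminated sample
    passes the pre-test, eq. (3): 1 / (eps^m * alpha)."""
    return 1.0 / (eps ** m * alpha)

def samples_confidence(eps, m, alpha=1.0, eta=0.05):
    """Number of samples keeping the probability of missing a good
    sample below eta, eq. (5): log(eta) / log(1 - eps^m * alpha)."""
    p = eps ** m * alpha
    return math.ceil(math.log(eta) / math.log(1.0 - p))
```

For ε = 0.5 and m = 7, the two stopping counts differ roughly by the factor −ln η, as derived above: 128 samples on average versus 382 for η = 0.05.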

3.2 The T_{d,d} Test

In this section we introduce a simple and thus mathematically tractable class of preverification tests. Despite its simplicity, we show its potential in the simulations and experiments of Section 3.3. The test we analyze is defined as follows:

Definition 1 (the T(d,d) test) The T(d,d) test is passed if all d data points out of the d randomly selected ones are consistent with the hypothesized model.

In the rest of this section we derive the optimal value of d. First of all, we express the constants introduced in the previous section as α = ε^d and β = δ^d, where δ is the probability that a data point is consistent with a random model. Since we do not always need to test all d points (a single failure means that the pre-test failed), the average time spent in the preverification test is

t̄_α = Σ_{i=1}^{d} i (1 − ε) ε^{i−1}  and  t̄_β = Σ_{i=1}^{d} i (1 − δ) δ^{i−1}.

Since

Σ_{i=1}^{∞} i (1 − x) x^{i−1} = 1 / (1 − x),  (7)

we have t̄_α ≤ 1/(1 − ε) and t̄_β ≤ 1/(1 − δ). The approximation obtained after substituting (7) into (6),

t̄(d) ≈ ε^m (ε^d N + (1 − ε^d)/(1 − ε)) + (1 − ε^m) (δ^d N + (1 − δ^d)/(1 − δ)),

is too complicated for finding the optimal d. Therefore, we incorporate the following approximations: (1 − ε^m)(1 − δ^d)/(1 − δ) ≈ 1, (1 − ε^m) δ^d N ≈ δ^d N, and ε^d N ≫ (1 − ε^d)/(1 − ε), which are sufficiently accurate for commonly encountered values of ε, δ and N. After applying these approximations, we have

t̄(d) ≈ N δ^d + ε^{m+d} N.  (8)
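A sketch of the T(d,d) pre-test and of the approximate per-sample cost t̄(d) of eq. (8) (illustrative Python; `model_fits` is a hypothetical consistency predicate, not part of the thesis):

```python
import random

def t_dd_passed(model_fits, data, d):
    """T(d,d) pre-test: passed iff all d randomly selected data points
    are consistent with the hypothesized model; a single failure rejects,
    so on average fewer than d consistency checks are performed."""
    for x in random.sample(data, d):
        if not model_fits(x):
            return False
    return True

def avg_cost_per_sample(d, n, eps, delta, m):
    """Approximate average number of point evaluations per sample,
    t(d) ~ N * delta**d + eps**(m + d) * N, eq. (8)."""
    return n * delta ** d + eps ** (m + d) * n
```

For N = 1500, ε = 0.4, δ = 0.05 and m = 7, the d = 1 test cuts the approximate per-sample cost from about 1502 point evaluations to about 76.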

The average time spent in R-RANSAC, in the number of verified data points, is then approximately

J(T_{d,d}) ≈ (1 / (ε^m ε^d)) (N δ^d + ε^{m+d} N + t_M).  (9)

We are looking for the minimum of J(T_{d,d}), which is found by solving ∂J(T_{d,d})/∂d = 0 for d. The optimal length of the T_{d,d} test is

d* ≈ ln( (ln ε (t_M + 1)) / (N (ln δ − ln ε)) ) / ln δ.  (10)

The value of d_opt must be an integer greater than or equal to zero, so it can be expressed as

d_opt = max(0, argmin_{d ∈ {⌊d*⌋, ⌈d*⌉}} J(T_{d,d})).  (11)

Since the cost function J(T_{d,d}) has only one extremum, and J(T_{d,d}) → ∞ for d → ±∞, R-RANSAC is faster than the standard RANSAC if J(T_{0,0}) > J(T_{1,1}). From this inequality we get

N > (t_M + 1) (1 − ε) / (ε − δ).  (12)

3.3 Experiments

This section presents experiments that show the usefulness of the new randomized RANSAC algorithm with preverification tests. The speed-up is demonstrated on the problem of epipolar geometry estimation. Three experiments are conducted on data from a synthetic, a short (standard) baseline, and a wide-baseline stereo matching problem. The results of these experiments are summarized in Tables 1, 2, and 3 respectively. The structure of the tables is the following. The first column shows the length d of the T_{d,d} test, where d = 0 means standard RANSAC. The number of samples, each consisting of 7 point-to-point correspondences used for model parameter estimation, is given in the second column. Since the seven-point algorithm [8] for the computation of the fundamental matrix may lead to one or three solutions, the next column, labeled "models", shows the number of hypothesized fundamental matrices. The "tests" column displays the number of point-to-point correspondences evaluated during the procedure. In the penultimate column, the average number of inliers detected is given. The last column is rather informative

d samples models tests inliers time
Table 1: Synthetic experiment on 1500 correspondences, 40% of inliers, 3 repetitions.

d samples models tests inliers time
Table 2: Short baseline experiment on 676 tentative correspondences.

and shows the time in seconds taken by the algorithm. This is strongly dependent on the implementation.

Synthetic experiment. 1500 correspondences were generated, 900 outliers and 600 inliers. Since the run-time of both RANSAC and R-RANSAC is a random variable, the programs were executed 3 times and averages were taken. The results are shown in Table 1. Since the number of correspondences is large, the standard RANSAC algorithm spends a long time verifying all correspondences, as can be seen in the "tests" column.

The short baseline experiment was conducted on images from the standard Leuven castle dataset [22]. There were 676 tentative correspondences of Harris interest points, selected on the basis of the cross-correlation of neighbourhoods. The tentative correspondences contained approximately 60% of inliers. Looking at Table 2, we see that approximately twice as many fundamental matrices were hypothesized in R-RANSAC, but more than nine times fewer correspondences were evaluated.

Wide baseline experiment on the BOOKSHELF dataset. The tentative correspondences were formed as follows. Discriminative regions (MSERs, SECs) [14] were detected. Robust similarity functions on the affine-invariant description were used to establish mutually nearest pairs of regions. Point correspondences were obtained as the centres of gravity of those regions. There were less than 40% of inliers among the correspondences. The speed-up in this experiment, shown in Table 3, is approximately 50%.

3.4 Conclusion

We presented a new algorithm called R-RANSAC, which increases the speed of model parameter estimation under a broad range of conditions, due to randomiza-

d samples models tests inliers time
Table 3: Wide baseline experiment on 413 tentative correspondences.

Figure 1: Short baseline image set.

Figure 2: Wide baseline image set.

tion of the hypothesis evaluation step. For samples contaminated by outliers, it was shown that it is sufficient to test only a small number of data points d ≪ N to conclude with high confidence that they do not correspond to the sought solution. The idea was implemented in a two-step evaluation procedure (Algorithm 2). We introduced a mathematically tractable class of pre-tests based on small test samples. For this class, an approximate relation for the optimal setting of its single parameter was derived. The proposed pre-test was evaluated on both synthetic data and real-world problems and a significant increase in speed was observed. A task for the future is to design an optimal preverification test in a class broader than the T_{d,d}.

4 Locally optimized RANSAC

In the classical formulation of RANSAC, the problem is to find all inliers in a set of data points. The number of inliers I is typically not known a priori. Inliers are data points consistent with the best model, e.g. epipolar geometry or homography in a two-view correspondence problem, or line or ellipse parameters in the case of detection of geometric primitives. The RANSAC procedure finds, with a certain probability, the inliers and the corresponding model by repeatedly drawing random samples from the input set of data points. RANSAC is popular because it is simple and works well in practice. The reason is that almost no assumptions are made about the data and no (unrealistic) conditions have to be satisfied for RANSAC to succeed. However, it has been observed experimentally that RANSAC runs much longer (even by an order of magnitude) than theoretically predicted [29]. The discrepancy is due to one assumption of RANSAC that is rarely true in practice: it is assumed that a model with parameters computed from an uncontaminated sample is consistent with all inliers.² In this section we propose a novel, easy-to-implement modification of RANSAC exploiting the fact that a model hypothesized from an uncontaminated minimal sample is almost always sufficiently near the optimal solution, and that a local optimization step applied to selected models produces an algorithm with near-perfect agreement with the theoretical (i.e. optimal) performance. This approach not only increases the number of inliers found, and consequently speeds up the RANSAC procedure by allowing its earlier termination, but also returns models of higher precision. The increase of the average time spent in a single RANSAC verification step is minimal. The proposed optimization strategy guarantees that the number of samples to which the optimization is applied is insignificant.
The main contributions of this work are (a) a modification of RANSAC that simultaneously improves the speed of the algorithm and the quality of the solution (which is near-optimal), (b) the introduction of two local optimization methods, and (c) a rule for the application of the local optimization, together with a theoretical analysis showing that the local optimization is applied at most log k times, where k is the number of samples drawn. In experiments on two-view geometry estimation (epipolar geometry and homography), the speed-up achieved is two to three fold. The problem described above was noticed by Tordoff and Murray [11], who required real-time performance. The necessary speed-up was achieved by providing the estimation process with additional information in the form of a probability of correctness for each data point. However, such information is not always available and it is in general difficult to estimate the probabilities reliably.

² Experiments reported in Section 4.3 confirm that the assumption does not hold.

In contrast, the modification proposed in this work is internal only (it requires no extra input information) and does not interfere with other modifications of the algorithm, such as MLESAC [31], R-RANSAC [2] and NAPSAC [21]. MLESAC, proposed by Torr and Zisserman, defines a cost function in the maximum likelihood framework. The R-RANSAC algorithm increases the speed of the algorithm by randomizing its verification part. NAPSAC focuses on the selection of samples. In fact, all these modifications can be used in conjunction. The structure of this section is as follows. First, in Section 4.1, the motivation of this work is discussed in detail and the general algorithm of locally optimized RANSAC is described. Four different methods of local optimization are proposed in Section 4.2. All methods are experimentally tested and evaluated through epipolar geometry and homography estimation. The results are shown and discussed in Section 4.3. The work is concluded in Section 4.4.

4.1 Algorithm

The structure of the RANSAC algorithm is simple but powerful. Repeatedly, subsets are randomly selected from the input data and model parameters fitting the sample are computed. The size of the random samples is the smallest sufficient for determining model parameters. In a second step, the quality of the model parameters is evaluated on the full data set. Different cost functions may be used [31] for the evaluation, the standard being the number of inliers, i.e. the number of data points consistent with the model. The process is terminated [5, 33] when the likelihood of finding a better model becomes low, i.e. the probability η of missing a set of inliers of size I within k samples falls under a predefined threshold:

η = (1 − P_I)^k.  (13)

The symbol P_I stands for the probability that an uncontaminated sample of size m is randomly selected from N data points:

P_I = C(I, m) / C(N, m) = ∏_{j=0}^{m−1} (I − j)/(N − j) ≈ ε^m,  (14)

where ε is the fraction of inliers, ε = I/N.
The number of samples that has to be drawn to ensure a given η is k = log(η) / log(1 − P_I). From equations (13) and (14) it can be seen that the termination criterion based on the probability η expects that the selection of a single random sample not contaminated by outliers is followed by the discovery of the whole set of I inliers. However, this

assumption is often not valid, since the inliers are perturbed by noise. Since RANSAC generates hypotheses from minimal sets, the influence of the noise is not negligible, and a set of correspondences smaller than I is found. The consequence is an increase in the number of samples before the termination of the algorithm. The effect is clearly visible in the histograms of the number of inliers found by standard RANSAC. The first column of Figure 4 shows the histograms for five matching experiments. The number of inliers varies by about 2-3%. We propose a modification that increases the number of inliers found to near the optimum I. This is achieved via a local optimization of promising samples. For a summary of the locally optimized RANSAC, see Algorithm 3.

Repeat until the probability of finding a better solution falls under a predefined threshold, as in equation (13):
1. Select a random sample of the minimum number of data points, S_m.
2. Estimate the model parameters consistent with this minimal set.
3. Calculate the number of inliers I_k, i.e. the data points whose error is smaller than a predefined threshold θ.
4. If a new maximum has occurred (I_k > I_j for all j < k), run local optimization. Store the best model.

Algorithm 3: A brief summary of LO-RANSAC.

The local optimization step is carried out only if a new maximum in the number of inliers from the current sample has occurred, i.e. when standard RANSAC stores its best result. The number of data points consistent with a model from a randomly selected sample can be thought of as a random variable with an unknown (or very complicated) density function. This density function is the same for all samples, so the probability that the k-th sample will be the best so far is 1/k. The average number of times the maximum is reached within k samples is then

Σ_{j=1}^{k} 1/j ≤ ∫_1^k (1/x) dx + 1 = log k + 1.

Note that this is an upper bound, as the number of correspondences is finite and discrete, and so the same number of inliers will occur often.
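The log k + 1 bound on the number of new maxima can be checked with a quick simulation (illustrative Python; continuous scores are used so that the ties mentioned above, which only tighten the bound, do not arise):

```python
import math
import random

def count_new_maxima(k, rng):
    """Count how often the running maximum of k i.i.d. continuous scores
    improves; the expectation is the harmonic number H_k <= log(k) + 1."""
    best, records = float("-inf"), 0
    for _ in range(k):
        score = rng.random()
        if score > best:
            best, records = score, records + 1
    return records

rng = random.Random(1)
k, runs = 1000, 2000
avg = sum(count_new_maxima(k, rng) for _ in range(runs)) / runs
# H_1000 is about 7.49, safely below log(1000) + 1, which is about 7.91
```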
This theoretical bound was confirmed experimentally; the average numbers of local optimizations over an execution of (locally optimized) RANSAC can be found in Table 6. For more details about the experiments, see Section 4.3.

Figure 3: The average error (left) and the standard deviation of the error (right) for samples of 7, 8, 9, 14 and all 100 points, respectively, with respect to the noise level.

4.2 Local Optimization Methods

The following methods of local optimization have been tested. The choice is motivated by two observations that are given later in this section.

1. Standard. The standard implementation of RANSAC without any local optimization.

2. Simple. Take all data points with error smaller than θ and use a linear algorithm to hypothesize new model parameters.

3. Iterative. Take all data points with error smaller than K·θ and use the linear algorithm to compute new model parameters. Reduce the threshold and iterate until the threshold is θ.

4. Inner RANSAC. A new sampling procedure is run only on the I_k data points consistent with the hypothesised model. As the sampling runs on inlier data, there is no need for the size of the sample to be minimal. On the contrary, the size of the sample is selected to minimize the error of the model parameter estimation. In our experiments the sample sizes are set to min(I_k/2, 14) for epipolar geometry (see results in Section 4.2) and to min(I_k/2, 12) for homography estimation. The number of repetitions is set to ten in the experiments presented.

5. Inner RANSAC with iteration. This method is similar to the previous one, the difference being that each sample of the inner RANSAC is processed by method 3.

The local optimization methods are based on the two following observations.

Observation 1: The Size of Sample

The less information (data points) is used to estimate the model parameters in the presence of noise, the less accurate the model is. The reason for RANSAC to draw

minimal samples is that every extra point exponentially decreases the probability of selecting an outlier-free sample, which is approximately³ ε^m, where m is the size of the sample (i.e. the number of data points included in the sample). It has been shown in [3] that the fundamental matrix estimated from a seven-point sample is more precise than one estimated from eight points using a linear algorithm [7]. This is due to the singularity enforcement in the eight-point algorithm. However, the following experiment shows that this holds only for eight-point samples; taking nine or more points gives more stable results than those obtained when the fundamental matrix is computed from seven points only.

Experiment: This experiment shows how the quality of a hypothesis depends on the number of correspondences used to calculate the fundamental matrix. For seven points, the seven-point algorithm was used [3], and for eight and more points the linear algorithm [7] was used. The course of the experiment was as follows. Noise of different levels was added to noise-free image point correspondences divided into two sets of a hundred correspondences. Samples of different sizes were drawn from the first set and the average error over the second set was computed. This was repeated 100 times for each noise level. The results are displayed in Figure 3. This experiment demonstrates that the more points are used to estimate the model (in this case the fundamental matrix), the more precise the solution obtained (with the exception of eight points). The experiment also shows that the minimal sample gives hypotheses of rather poor quality. One can use cost functions that are more complicated than simply the number of inliers, but evaluating such a function only at parameters arising from the minimal sample will give results at best equal to the proposed method of local optimization.
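The first observation, that larger samples give more precise estimates in the presence of noise, can be illustrated on a much simpler model than the fundamental matrix, e.g. a 1D location parameter estimated by least squares (a hypothetical stand-in, not the thesis experiment):

```python
import random

def avg_abs_error(sample_size, noise, trials, rng):
    """Estimate a 1D location parameter (true value 0) by the least-squares
    estimate (the mean) of noisy samples of the given size; return the
    average absolute error over many trials."""
    total = 0.0
    for _ in range(trials):
        pts = [rng.gauss(0.0, noise) for _ in range(sample_size)]
        total += abs(sum(pts) / len(pts))
    return total / trials

rng = random.Random(0)
e_minimal = avg_abs_error(7, 1.0, 3000, rng)   # "minimal sample" size
e_larger = avg_abs_error(50, 1.0, 3000, rng)   # larger sample
# the error shrinks roughly as 1/sqrt(sample size)
```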
Observation 2: Iterative Scheme

It is well known from the robust statistics literature that pseudo-robust algorithms which first estimate model parameters from all data by least-squares minimization, then remove the data points with the biggest error (or residual), and iteratively repeat this procedure, do not lead to correct estimates. It can easily be shown that a single far-outlying data point, i.e. a leverage point, causes a total breakdown of the estimated model parameters. That is because such a leverage point outweighs even the majority of inliers in least-squares minimization. The algorithm works well only when the outliers are not too far off, so that the majority of inliers have a bigger influence on the least squares. In local optimization method 3 there are no leverage points, as each data point has error below K·θ with respect to the sampled model.

³ This is exact for the sampling with replacement.
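The breakdown caused by a leverage point can be reproduced in a few lines (a toy illustration with hypothetical data, not the thesis' experiments):

```python
def fit_line_lsq(pts):
    # Least-squares fit of y = a*x + b.
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

inliers = [(float(x), float(x)) for x in range(10)]   # exact points on y = x
a0, b0 = fit_line_lsq(inliers)
assert abs(a0 - 1.0) < 1e-9                           # perfect fit

data = inliers + [(100.0, -100.0)]                    # one far leverage point
a1, b1 = fit_line_lsq(data)
# The leverage point dominates the quadratic cost and ruins the slope.
assert abs(a1 - 1.0) > 1.0

# Worse: the point with the LARGEST residual is a true inlier, so the
# "fit, remove the worst point, repeat" scheme discards inliers first.
residuals = [abs(a1 * x + b1 - y) for x, y in data]
assert residuals.index(max(residuals)) != len(data) - 1
```

Note that under the corrupted fit the leverage point itself has a small residual, which is exactly why the iterative-removal scheme cannot recover.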

4.3 Experimental Results

The proposed algorithm was extensively tested on the problem of estimating two-view relations (epipolar geometry and homography) from image point correspondences. Five experiments are presented in this section, all of them on publicly available data, depicted in Figures 5 and 6. In experiments A and B, the epipolar geometry is estimated in a wide-baseline setting. In experiment C, the epipolar geometry was estimated too, this time from short-baseline stereo images. From the point of view of RANSAC use, the narrow- and wide-baseline problems differ in the number of correspondences and inliers (see Table 4), and also in the distribution of the errors of outliers. Experiments D and E recover a homography. The scene in experiment E is the same as in experiment A, and this experiment could be seen as a plane segmentation. All tentative correspondences were detected and matched automatically.

The algorithms were implemented in C and the experiments were run on an AMD K7 1800+ processor. The terminating criterion based on equation (13) was set to η < 0.05. The threshold θ was set to θ = 3.84 σ² for the epipolar geometry and θ = 5.99 σ² for the homography. In both cases the expected noise level was set to σ = 0.3. The characterization of the matching problems, such as the number of correspondences, the total number of inliers and the expected number of samples, is summarized in Table 4. The total number of inliers was set to the maximal number of inliers obtained over all methods and all repetitions. The expected number of samples was calculated according to the termination criterion mentioned above.

The performance of local optimization methods 1 to 5 was evaluated on problems A to E. The results for 100 runs are summarized in Table 5. For each experiment, a table containing the average number of inliers, the average number of samples drawn, the average time spent in RANSAC (in seconds) and the efficiency (the ratio of the number of samples drawn and expected) is shown.
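The expected sample counts in Table 4 follow the standard termination criterion; assuming equation (13) is the usual bound N ≥ log η / log(1 − ε^m), the count for an inlier fraction ε, sample size m and confidence η can be computed as:

```python
import math

def ransac_samples(epsilon, m, eta=0.05):
    """Number of samples needed so that the probability of never drawing
    an all-inlier sample of size m drops below eta (sampling with
    replacement approximation)."""
    return math.ceil(math.log(eta) / math.log(1.0 - epsilon ** m))

# Epipolar geometry (7-point samples), inlier fractions from Table 4:
print(ransac_samples(0.61, 7))   # experiment A: tens of samples suffice
print(ransac_samples(0.29, 7))   # experiment B: tens of thousands
# Homography (4-point samples):
print(ransac_samples(0.19, 4))
```

The steep growth with decreasing ε is exactly why enlarging the inlier set via local optimization shortens the run so dramatically.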
Table 6 shows how many times the local optimization has been applied, together with the theoretical upper bound derived in Section 4.1. Method 5 achieved the best results of all the methods in the number of samples in all experiments, and differs only slightly from the theoretically expected number. Standard RANSAC, on the other hand, far exceeds this limit. In Figure 4 the histograms of the sizes of the resulting inlier sets are shown. Each column shows results for one method, each row for one experiment. One can observe that the peaks shift to higher values with the increasing identification number of the method. Method 5 reaches the best results in terms of the sizes of the inlier sets and consequently in the number of samples before termination. This method should be used when the fraction of inliers is low. Resampling, on the other hand, might be quite costly in the case of a high number of inliers, especially if accompanied by a small

        A      B      C      D      E
ε      61%    29%    32%    19%    18%

Table 4: Characteristics of experiments A-E. Total number of correspondences (# corr), maximal number of inliers found within all tests (# inl), fraction of inliers ε and theoretically expected number of samples (# sam).

Table 5: The summary of the local optimization experiments: average number of inliers (inl) and samples taken (sam), average time in seconds and efficiency (eff). The best values for each row are highlighted in bold. For more details see the description in the text in Section 4.3.

Figure 4: Histograms of the number of inliers. Methods 1 to 5 (1 stands for standard RANSAC) are stored in columns and the different datasets are shown in rows (A to E). On each graph, the number of inliers is on the x-axis and the number of times this count was reached within one hundred repetitions is on the y-axis.

Table 6: The average number of local optimizations run during one execution of RANSAC and the logarithm of the average number of samples for comparison.

number of correspondences in total, as can be seen in experiment A (61% of inliers out of 94 correspondences). In this case, method 3 was the fastest. Method 3 obtained significantly better results than standard RANSAC in all experiments, with a speed-up of about 100%, and only slightly worse results than method 5. We suggest using method 3 in real-time procedures when a high number of inliers is expected. Methods 2 and 4 are inferior to the methods with iteration (3 and 5 respectively) without offering any time-saving advantage.

4.4 Conclusions

This section has introduced a simple modification of the RANSAC algorithm that increases the number of detected inliers. Consequently, the number of samples drawn decreases. In all experiments, the run-time is reduced by a factor of at

Figure 5: Image pairs and detected points used in the epipolar geometry experiments (A-C). Inliers are marked as dots in the left images and outliers as crosses in the right images.

Figure 6: Image pairs and detected points used in the homography experiments (D and E). Inliers are marked as dots in the left images and outliers as crosses in the right images.

least two, which may be very important in real-time applications incorporating a RANSAC step. Two methods of local optimization were proposed: method 3 is recommended for problems with a large fraction of inliers and a small number of data points (which is typical for real-time applications), and method 5 reaches almost optimal results. It has been shown and experimentally confirmed that the number of times local optimization is applied is lower than the logarithm of the number of samples drawn. The proposed improvement allows making precise quantitative statements about the number of samples drawn in RANSAC. The behavior of the modified RANSAC is in much closer agreement with the mathematical model than that of a straightforward implementation.

5 Wide baseline stereo

Finding reliable correspondences in two images of a scene taken from arbitrary viewpoints, possibly with different cameras and under different illumination conditions, is a difficult and critical step towards fully automatic reconstruction of 3D scenes [8]. A crucial issue is the choice of elements whose correspondence is sought. In the wide-baseline set-up, local image deformations cannot be realistically approximated by translation or translation with rotation, and a full affine model is required. Correspondence therefore cannot be established by comparing regions of a fixed (Euclidean) shape like rectangles or circles, since their shape is not preserved under affine transformation. In most images there are regions that can be detected with high repeatability, since they possess some distinguishing, invariant and stable property. We argue that such regions of, in general, data-dependent shape, called distinguished regions (DRs), may serve as the elements to be put into correspondence, either in stereo matching or in object recognition.

The first contribution of the work is the introduction of a new set of distinguished regions, the so-called extremal regions. Extremal regions have two desirable properties: the set is closed under continuous (and thus perspective) transformation of image coordinates, and it is closed under monotonic transformation of image intensities. An efficient (near-linear complexity) and practically fast detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). Robustness of a particular type of DR depends on the image data and must be tested experimentally. Successful wide-baseline experiments on indoor and outdoor datasets presented in Section 5.3 demonstrate the potential of MSERs.
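The closure under monotonic intensity transformations follows directly from the definition and can be checked mechanically. A minimal sketch on a hypothetical 3×3 image (the image and the transformation are arbitrary illustrations):

```python
def threshold_sets(img, levels):
    # For each threshold t, the set of pixels with intensity below t.
    return [frozenset((r, c) for r, row in enumerate(img)
                      for c, v in enumerate(row) if v < t) for t in levels]

img = [[10, 10, 200],
       [10, 90, 200],
       [50, 90, 250]]

# A monotonic (strictly order-preserving) intensity transformation.
g = lambda v: v * v + 7
img_g = [[g(v) for v in row] for row in img]

values = sorted({v for row in img for v in row})
orig = threshold_sets(img, values)
trans = threshold_sets(img_g, [g(v) for v in values])
# The family of thresholded pixel sets is identical, hence so are the
# extremal regions (their connected components).
assert orig == trans
```

Since v < t holds exactly when g(v) < g(t) for a strictly increasing g, every thresholded set, and therefore every extremal region, survives the transformation unchanged.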
Reliable extraction of a manageable number of potentially corresponding image elements is a necessary but certainly not a sufficient prerequisite for successful wide-baseline matching. With two sets of distinguished regions, the matching problem can be posed as a search in the correspondence space [6]. Forming a complete bipartite graph on the two sets of DRs and searching for a globally consistent subset of correspondences is clearly out of the question for computational reasons. Recently, a whole class of stereo matching and object recognition algorithms with a common structure has emerged [23, 34, 1, 35, 4, 28, 18, 11]. These methods exploit local invariant descriptors to limit the number of tentative correspondences. Important design decisions at this stage include: 1. the choice of measurement regions, i.e. the parts of the image on which invariants are computed, 2. the method of selecting tentative correspondences given the invariant description, and 3. the choice of invariants. Typically, distinguished regions or their scaled versions serve as measurement

regions, and tentative correspondences are established by comparing invariants using the Mahalanobis distance [25, 35, 26]. As a second novelty of the presented approach, a robust similarity measure for establishing tentative correspondences is proposed to replace the Mahalanobis distance. The robustness of the proposed similarity measure allows us to use invariants from a collection of measurement regions, even some that are much larger than the associated distinguished region. Measurements from large regions are either very discriminative (it is very unlikely that two large parts of the image are identical) or completely wrong (e.g. if an orientation or depth discontinuity becomes part of the region). The former helps establish reliable tentative (local) correspondences; the influence of the latter is limited due to the robustness of the approach.

Finding the epipolar geometry consistent with the largest number of tentative (local) correspondences is the final step of all wide-baseline algorithms. RANSAC has been by far the most widely adopted method since [32]. The presented algorithm takes novel steps to increase the number of matched regions and the precision of the epipolar geometry. The rough epipolar geometry estimated from tentative correspondences is used to guide the search for further region matches. It restricts the location to epipolar lines and provides an estimate of the affine mapping between corresponding regions. This mapping allows the use of correlation to filter out mismatches. The process significantly increases the precision of the EG estimate; the final average inlier distance from the epipolar line is below 0.1 pixel. For details see Section 5.2.

Related work. Since the influential paper by Schmid and Mohr [26], many image matching and wide-baseline stereo algorithms have been proposed, most commonly using Harris interest points as distinguished regions.
Tell and Carlsson [28] proposed a method in which line segments connecting Harris interest points form measurement regions. The measurements are characterised by scale-invariant Fourier coefficients. The Harris interest point detector is stable over a range of scales, but defines no scale- or affine-invariant measurement region. Baumberg [1] applied an iterative scheme originally proposed by Lindeberg and Garding to associate affine-invariant measurement regions with Harris interest points. In [18], Mikolajczyk and Schmid show that a scale-invariant MR can be found around Harris interest points. In [23], Pritchett and Zisserman form groups of line segments and estimate local homographies using parallelograms as measurement regions. Tuytelaars and Van Gool introduced two new classes of affine-invariant distinguished regions, one based on local intensity extrema [35], the other using point and curve features [34]. In the latter approach, DRs are characterised by measurements from inside an ellipse, constructed in an affine-invariant manner. Lowe [11] describes the Scale Invariant Feature Transform approach, which produces a scale- and orientation-invariant characterisation of interest points.

The rest of this section is structured as follows. Maximally Stable Extremal

Image I is a mapping I : D ⊂ Z² → S. Extremal regions are well defined on images if:

1. S is totally ordered, i.e. a reflexive, antisymmetric, transitive and total binary relation ≤ exists. In this work only S = {0, 1, ..., 255} is considered, but extremal regions can be defined on e.g. real-valued images (S = R).

2. An adjacency (neighbourhood) relation A ⊂ D × D is defined. In this work 4-neighbourhoods are used, i.e. p, q ∈ D are adjacent (pAq) iff Σ_{i=1}^{d} |p_i − q_i| ≤ 1.

Region Q is a contiguous subset of D, i.e. for each p, q ∈ Q there is a sequence p, a_1, a_2, ..., a_n, q with pAa_1, a_i A a_{i+1}, a_n A q.

(Outer) Region Boundary ∂Q = {q ∈ D \ Q : ∃ p ∈ Q : qAp}, i.e. the boundary ∂Q of Q is the set of pixels adjacent to at least one pixel of Q but not belonging to Q.

Extremal Region Q ⊂ D is a region such that for all p ∈ Q, q ∈ ∂Q : I(p) > I(q) (maximum intensity region) or I(p) < I(q) (minimum intensity region).

Maximally Stable Extremal Region (MSER). Let Q_1, ..., Q_{i−1}, Q_i, ... be a sequence of nested extremal regions, i.e. Q_i ⊂ Q_{i+1}. Extremal region Q_{i*} is maximally stable iff q(i) = |Q_{i+Δ} \ Q_{i−Δ}| / |Q_i| has a local minimum at i* (|·| denotes cardinality). Δ ∈ S is a parameter of the method.

Table 7: Definitions used in Section 5.1.

Regions are defined and their detection algorithm is described in Section 5.1. In Section 5.2, details of a novel robust matching algorithm are given. Experimental results on outdoor and indoor images taken with an uncalibrated camera are presented in Section 5.3. The presented experiments are summarized and the contributions of the work are reviewed in Section 5.4.

5.1 Maximally Stable Extremal Regions

In this section, we introduce a new type of image element useful in wide-baseline matching, the Maximally Stable Extremal Regions. The regions are defined solely by an extremal property of the intensity function in the region and on its outer boundary. The concept can be explained informally as follows. Imagine all possible thresholdings of a gray-level image I.
We will refer to the pixels below a threshold as black and to those above or equal as white. If we were shown a movie of thresholded images I_t, with frame t corresponding to threshold t, we would see first a white image. Subsequently black spots corresponding to local intensity minima would appear and grow. At some point regions corresponding to two local

minima will merge. Finally, the last image will be black. The set of all connected components of all frames of the movie is the set of all maximal regions; minimal regions can be obtained by inverting the intensity of I and running the same process. The formal definition of the MSER concept and the necessary auxiliary definitions are given in Table 7.

In many images, local binarization is stable over a large range of thresholds in certain regions. Such regions are of interest since they possess the following properties:

Invariance to affine transformation of image intensities.

Covariance to adjacency-preserving (continuous) transformations T : D → D of the image domain.

Stability, since only extremal regions whose support is virtually unchanged over a range of thresholds are selected.

Multi-scale detection. Since no smoothing is involved, both very fine and very large structures are detected.

The set of all extremal regions can be enumerated in O(n log log n), where n is the number of pixels in the image. The enumeration of extremal regions proceeds as follows. First, pixels are sorted by intensity. The computational complexity of this step is O(n) if the range of image values S is small, e.g. the typical {0, ..., 255}, since the sort can be implemented as BINSORT [27]. After sorting, pixels are placed in the image (in either decreasing or increasing order) and the list of connected components and their areas is maintained using the efficient union-find algorithm [27]. The complexity of our union-find implementation is O(n log log n), i.e. almost linear⁴. Importantly, the algorithm is very fast in practice: the MSER detection takes only 0.14 seconds on a Linux PC with an Athlon XP 1600+ processor for a 530x350 image (n = 185500). The process produces a data structure storing the area of each connected component as a function of intensity.
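The enumeration step can be sketched as follows. This is a simplified illustration of the BINSORT-plus-union-find idea, recording per intensity level the number of components and the largest component area, not the thesis' C implementation:

```python
def enumerate_components(img):
    """Insert pixels in increasing order of intensity and maintain
    connected components with union-find; return, for each threshold
    level, (number of components, largest component area)."""
    h, w = len(img), len(img[0])
    parent, area = {}, {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    # BINSORT: bucket pixels by intensity (S = {0, ..., 255}).
    buckets = [[] for _ in range(256)]
    for r in range(h):
        for c in range(w):
            buckets[img[r][c]].append((r, c))

    history = {}
    for level in range(256):
        for p in buckets[level]:
            parent[p], area[p] = p, 1
            r, c = p
            for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if q in parent:                     # 4-neighbour already placed
                    rp, rq = find(p), find(q)
                    if rp != rq:                    # merge the two components
                        parent[rq] = rp
                        area[rp] += area[rq]
        roots = {find(p) for p in parent}
        if roots:
            history[level] = (len(roots), max(area[r] for r in roots))
    return history

img = [[0, 0, 9],
       [0, 9, 9],
       [9, 9, 1]]
hist = enumerate_components(img)
assert hist[0] == (1, 3)    # the three darkest pixels form one component
assert hist[1] == (2, 3)    # the pixel of intensity 1 starts a new one
assert hist[9] == (1, 9)    # everything merges at the top level
```

A real MSER detector keeps the full area-versus-intensity history of each component and selects the levels where the relative area change q(i) has a local minimum.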
A merge of two components is viewed as the termination of the existence of the smaller component and the insertion of all pixels of the smaller component into the larger one. Finally, intensity levels that are local minima of the rate of change of the area function are selected as thresholds producing maximally stable extremal regions. In the output, each MSER is represented by the position of a local intensity minimum (or maximum) and a threshold.

Notes. The structure of the above algorithm and of an efficient watershed algorithm [36] is essentially identical. However, the structure of the output of

⁴ Even faster (but more complex) connected component algorithms exist, with O(nα(n)) complexity, where α is the inverse Ackermann function; α(n) ≤ 4 for all practical n.

the two algorithms is different. The watershed is a partitioning of D, i.e. a set of regions R_i such that ∪ R_i = D and R_j ∩ R_k = ∅ for j ≠ k. In watershed computation, the focus is on the thresholds where regions merge (and two watersheds touch). Such thresholds are of little interest here, since they are highly unstable: after a merge, the region area jumps. In MSER detection, we instead seek a range of thresholds that leaves the watershed basin effectively unchanged.

Detection of MSERs is also related to thresholding. Every extremal region is a connected component of a thresholded image. However, no global or optimal threshold is sought; all thresholds are tested and the stability of the connected components is evaluated. The output of the MSER detector is not a binarized image. For some parts of the image, multiple stable thresholds exist, and a system of nested subsets is output in this case. Finally, we remark that MSERs can be defined on any image (even high-dimensional) whose pixel values are from a totally ordered set.

5.2 The proposed robust wide-baseline algorithm

Distinguished region detection. As a first step, the DRs are detected: the MSERs computed on the intensity image (MSER+) and on the inverted image (MSER-).

Measurement regions. A measurement region of arbitrary size may be associated with each DR, provided the construction is affine-covariant. Smaller measurement regions are both more likely to satisfy the planarity condition and less likely to cross a discontinuity in depth or orientation. On the other hand, small regions are less discriminative, i.e. they are much less likely to be unique. Increasing the size of a measurement region carries the risk of including parts of the background that are completely different in the two images considered. Clearly, the optimal size of an MR depends on the scene content and is different for each DR. In [35], Tuytelaars et al.
double the elliptical DR to increase discriminability, while keeping the probability of crossing object boundaries at an acceptable level. In the proposed algorithm, measurement regions are selected at multiple scales: the DR itself and the convex hull of the DR scaled by factors of 1.5, 2 and 3. Since matching is accomplished in a robust manner, we benefit from the increased distinctiveness of large regions without being severely affected by clutter or non-planarity of the DR's pre-image. This is a novelty of our approach. Commonly, the Mahalanobis distance has been used in MR matching. However, the non-robustness of this metric means that matching may fail because of a single corrupted measurement (this happened in the experiments reported below).

Invariant description. In all experiments, rotational invariants (based on complex moments) were used after applying a transformation that diagonalises the region's covariance matrix. In combination, this is an affinely-invariant procedure. A combination of rotational and affinely invariant generalised colour moments [19] gave a similar result. On their own, these affine invariants failed on

problems with a large scale change.

Robust matching. A measurement taken from an almost planar patch of the scene with a stable invariant description will be referred to as a good measurement. Unstable measurements, or those computed on non-planar surfaces or at discontinuities in depth or orientation, will be referred to as corrupted measurements. The robust similarity is computed as follows. For each measurement M_A^i on region A, the k regions B_1, ..., B_k from the other image whose corresponding i-th measurements M_{B_1}^i, ..., M_{B_k}^i are nearest to M_A^i are found, and a vote is cast suggesting correspondence of A with each of B_1, ..., B_k. Votes are summed over all measurements. In the current implementation, 216 invariants are used at each scale, i.e. a total of 864 measurements (i ∈ [1, 864]). The DRs with the largest number of votes become the candidates for tentative correspondences. Experimentally, we found that setting k to 1% of the number of regions gives good results. A probabilistic analysis of the likelihood of success of the procedure is not simple, since the distribution of the invariants and of their noise is image-dependent. We therefore only suppose that corrupted measurements spread their votes randomly, not conspiring to create a high score, and that good measurements are more likely to vote for correct matches.

Tentative correspondences using correlation. The invariant description is used as a preliminary test. The final selection of tentative correspondences is based on correlation. First, transformations that diagonalise the covariance matrices of the DRs are applied. The resulting circular regions are correlated (for all relative rotations). This procedure is carried out efficiently in polar coordinates for different sizes of circles.

Rough epipolar geometry (EG) is estimated by applying RANSAC to the centers of gravity of the DRs. Subsequently, the precision of the EG estimate is significantly improved by the following process.
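The voting scheme above can be sketched as follows (hypothetical toy data; k and the number of measurements are illustrative, unlike the 864 invariants and k = 1% of the regions used in the implementation):

```python
def tentative_correspondences(meas_a, meas_b, k=1):
    """meas_a[A][i], meas_b[B][i]: the i-th invariant measurement of
    region A (resp. B). For every region A and every measurement index i,
    the k regions B whose i-th measurement is nearest receive one vote;
    votes are summed over all measurements."""
    votes = {(a, b): 0 for a in meas_a for b in meas_b}
    n_meas = len(next(iter(meas_a.values())))
    for a, ma in meas_a.items():
        for i in range(n_meas):
            nearest = sorted(meas_b, key=lambda b: abs(meas_b[b][i] - ma[i]))[:k]
            for b in nearest:
                votes[(a, b)] += 1
    return votes

# Two regions in image 1, three in image 2. Region 'a0' matches 'b0' on
# most measurements, but its measurement 2 is corrupted (e.g. a depth
# discontinuity entered the measurement region).
meas_a = {'a0': [1.0, 5.0, 99.0], 'a1': [3.0, 8.0, 2.0]}
meas_b = {'b0': [1.1, 5.2, 7.0], 'b1': [3.2, 7.9, 2.1], 'b2': [9.0, 0.0, 99.5]}
v = tentative_correspondences(meas_a, meas_b)
# The single corrupted measurement cannot destroy the match: 'a0'-'b0'
# still collects the majority of 'a0's votes.
assert max(('b0', 'b1', 'b2'), key=lambda b: v[('a0', b)]) == 'b0'
```

A single nearest-neighbour test on the full Mahalanobis distance would be swayed by the one corrupted coordinate; summing per-measurement votes is what makes the similarity robust.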
First, an affine transformation between pairs of potentially corresponding DRs, i.e. the DRs consistent with the rough EG, is computed. A correspondence of covariance matrices defines an affine transformation up to a rotation; the rotation is determined from the epipolar lines. Next, the DR correspondences are pruned: only those whose transformed images have correlation above a threshold are retained. In the next step, RANSAC is applied again, but this time with a very narrow threshold. The final improvement of the EG is achieved by adding to the RANSAC inliers those DR pairs whose convex hull centres are EG-consistent. Commonly, DRs differ in minute details that render their centres of gravity inconsistent with the fine EG, while the centres of their convex hulls are precise enough. The precision of the final EG, estimated linearly by the eight-point algorithm (without bundle adjustment or radial distortion correction), is surprisingly high. The average distance of inliers from the epipolar line is below 0.1

Figure 7: BOOKSHELF: Estimated epipolar geometry on an indoor scene with a significant scale change. In the cutouts, the change in the resolution of the detected DRs is clearly visible.

pixel (see Table 9).

5.3 Experiments

The following experiments were conducted:

Bookshelf (Fig. 7). The BOOKSHELF scene tests performance under a very large scale change. The corresponding DRs in the left view are confined to a small part of the image, since the rest of the scene is not visible in the second view. The different resolution of the detected features is evident in the close-up.

Valbonne (Fig. 8). This outdoor scene has been analysed in the literature [25, 23]. Repetitive patterns such as bricks are present. The part of the scene visible in both views covers a small fraction of the image.

Wash (Fig. 9). Results on this image set have been presented in [35]. The camera undergoes significant translation and rotation. The ordering constraint is notably violated; objects appear on different backgrounds.

Kampa (Fig. 10) is an example of an urban outdoor scene. A relatively large fraction of the images is covered by changing sky. Repeating windows made matching difficult.

Cylindrical Box (Fig. 11, top and bottom left) shows a metal box on a textured

Figure 8: VALBONNE: The estimated epipolar geometry and the points associated with the matched regions are shown in the first row. The cutouts in the second row show matched bricks.

floor. The regions matched on the box demonstrate performance on a non-planar surface. A significant change of illumination and a strong specular reflection are present in the second image, which was taken with a flash (this strongly decreases the number of MSER+ regions).

Shout (Fig. 11, bottom right). This scene has been used in [35]. Since the spectral power distribution of the illumination and the positions of the light sources are significantly different, we included this test to demonstrate performance under variable illumination conditions.

Results are summarized in Tables 8 and 9. Table 8 shows the number of detected DRs in the left and right images for both types of DRs (MSER- and

Table 8: Number of DRs detected in the images (MSER-, MSER+) for the scenes Bookshelf, Valbonne, Wash, Kampa, Cyl. Box and Shout. The number of tentative correspondences is given in the TC column.

Figure 9: WASH: Epipolar geometry and densely matched regions with full affine distortion.

MSER+). The number of tentative correspondences is given in the last column of Table 8.

Table 9 shows the number of correspondences established in the different stages of the algorithm. Column TC repeats the number of tentative correspondences. Column "rough EG" displays the number of tentative correspondences consistent with the rough estimate of the epipolar geometry. The ratio of TC to "rough EG" determines the speed of the RANSAC algorithm. The column headed "EG + corr" gives the number of correspondences consistent with the rough EG that passed the correlation test. Notice that these numbers are much higher than those in the "rough EG" column. The final number of correspondences is given in the penultimate column, "fine EG". The average distances from the epipolar lines are presented in columns "rough d" and "fine d". We can see that the precision of the estimated epipolar geometry is very high, much higher than the precision of the rough EG. The last column shows the number of mismatches (found manually).

Figure 10: Estimated EG on an outdoor scene.

Table 9: Experimental results (columns TC, rough EG, rough d, EG + corr, fine EG, fine d, miss for the scenes Bookshelf, Valbonne, Wash, Kampa, Cyl. Box and Shout). For details see the text at the beginning of Section 5.3.

5.4 Conclusions

A new method for wide-baseline matching was proposed. The three main novelties are: the introduction of MSERs, robust matching of local features, and the use of multiple scaled measurement regions.

The MSERs are sets of image elements, closed under the affine transformation of image coordinates and invariant to affine transformation of intensity. An efficient (near-linear complexity) and practically fast detection algorithm was presented. The stability and high utility of MSERs were demonstrated experimentally.

Another novelty of the approach is the use of a robust similarity measure for establishing tentative correspondences. Due to the robustness, we were able to consider invariants from multiple measurement regions, even some that were significantly larger (and hence probably more discriminative) than the associated MSER.

Good estimates of epipolar geometry were obtained on challenging wide-baseline problems with the robustified matching algorithm operating on the output produced by the MSER detector. The average distance from corresponding points to the epipolar line was below 0.09 of the inter-pixel distance. A significant change of scale (3.5×), changed illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. The test images included both outdoor and indoor scenes, some already used in published work.

In future work, we intend to proceed towards fully automatic projective reconstruction of the 3D scene, which requires computing a projective reconstruction and dense matching. Secondly, we will investigate the properties of robust similarity measures and their selection based on statistical properties of the data.

Figure 11: CYLINDRICAL BOX: Epipolar geometry (top) and matched regions (bottom left). Full affine distortion, a non-planar object, a textured surface and a strong specular reflection are present in the scene. SHOUT (bottom right), a scene with a change of the spectral power distribution of the illumination.

6 Conclusions and Thesis Proposal

This work presented two improvements to the RANSAC algorithm and a novel algorithm for the wide-baseline stereo correspondence problem. Parts of the work have already been presented in [2, 15, 13, 16, 12]. There remain open issues for further research. We would like to address some of the following topics:

Degenerate configurations in RANSAC. A degenerate configuration (DC) is a set of data points⁵ that is consistent with a whole family of model parameters. Examples of DCs are identical points for a line, or coplanar points for a fundamental matrix. In the presence of a significant DC, any model from the family defined by such a DC may have large support (set of inliers) and hence RANSAC may return an incorrect solution. The automatic detection of DCs and ways to deal with them can be studied.

Feedback in the correspondence problem. Once the model is hypothesized,

⁵ Interesting DCs are those with a number of data points greater than the minimal number of data points needed to calculate the model parameters uniquely.


More information

Structure from Motion. Introduction to Computer Vision CSE 152 Lecture 10

Structure from Motion. Introduction to Computer Vision CSE 152 Lecture 10 Structure from Motion CSE 152 Lecture 10 Announcements Homework 3 is due May 9, 11:59 PM Reading: Chapter 8: Structure from Motion Optional: Multiple View Geometry in Computer Vision, 2nd edition, Hartley

More information

Instance-level recognition part 2

Instance-level recognition part 2 Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,

More information

Video Google: A Text Retrieval Approach to Object Matching in Videos

Video Google: A Text Retrieval Approach to Object Matching in Videos Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

RANSAC RANdom SAmple Consensus

RANSAC RANdom SAmple Consensus Talk Outline importance for computer vision principle line fitting epipolar geometry estimation RANSAC RANdom SAmple Consensus Tomáš Svoboda, svoboda@cmp.felk.cvut.cz courtesy of Ondřej Chum, Jiří Matas

More information

Local Feature Detectors

Local Feature Detectors Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,

More information

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

Maximally Stable Extremal Regions and Local Geometry for Visual Correspondences

Maximally Stable Extremal Regions and Local Geometry for Visual Correspondences Maximally Stable Extremal Regions and Local Geometry for Visual Correspondences Michal Perďoch Supervisor: Jiří Matas Center for Machine Perception, Department of Cb Cybernetics Faculty of Electrical Engineering

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION Mr.V.SRINIVASA RAO 1 Prof.A.SATYA KALYAN 2 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRASAD V POTLURI SIDDHARTHA

More information

Instance-level recognition II.

Instance-level recognition II. Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

Automatic Image Alignment

Automatic Image Alignment Automatic Image Alignment Mike Nese with a lot of slides stolen from Steve Seitz and Rick Szeliski 15-463: Computational Photography Alexei Efros, CMU, Fall 2010 Live Homography DEMO Check out panoramio.com

More information

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image

More information

Multi-modal Registration of Visual Data. Massimiliano Corsini Visual Computing Lab, ISTI - CNR - Italy

Multi-modal Registration of Visual Data. Massimiliano Corsini Visual Computing Lab, ISTI - CNR - Italy Multi-modal Registration of Visual Data Massimiliano Corsini Visual Computing Lab, ISTI - CNR - Italy Overview Introduction and Background Features Detection and Description (2D case) Features Detection

More information

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit Augmented Reality VU Computer Vision 3D Registration (2) Prof. Vincent Lepetit Feature Point-Based 3D Tracking Feature Points for 3D Tracking Much less ambiguous than edges; Point-to-point reprojection

More information

Automatic Image Alignment

Automatic Image Alignment Automatic Image Alignment with a lot of slides stolen from Steve Seitz and Rick Szeliski Mike Nese CS194: Image Manipulation & Computational Photography Alexei Efros, UC Berkeley, Fall 2018 Live Homography

More information

Global localization from a single feature correspondence

Global localization from a single feature correspondence Global localization from a single feature correspondence Friedrich Fraundorfer and Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology {fraunfri,bischof}@icg.tu-graz.ac.at

More information

Tracking in image sequences

Tracking in image sequences CENTER FOR MACHINE PERCEPTION CZECH TECHNICAL UNIVERSITY Tracking in image sequences Lecture notes for the course Computer Vision Methods Tomáš Svoboda svobodat@fel.cvut.cz March 23, 2011 Lecture notes

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

RANSAC: RANdom Sampling And Consensus

RANSAC: RANdom Sampling And Consensus CS231-M RANSAC: RANdom Sampling And Consensus Roland Angst rangst@stanford.edu www.stanford.edu/~rangst CS231-M 2014-04-30 1 The Need for RANSAC Why do I need RANSAC? I know robust statistics! Robust Statistics

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Accelerating Pattern Matching or HowMuchCanYouSlide?

Accelerating Pattern Matching or HowMuchCanYouSlide? Accelerating Pattern Matching or HowMuchCanYouSlide? Ofir Pele and Michael Werman School of Computer Science and Engineering The Hebrew University of Jerusalem {ofirpele,werman}@cs.huji.ac.il Abstract.

More information

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 Image Features: Local Descriptors Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 [Source: K. Grauman] Sanja Fidler CSC420: Intro to Image Understanding 2/ 58 Local Features Detection: Identify

More information

School of Computing University of Utah

School of Computing University of Utah School of Computing University of Utah Presentation Outline 1 2 3 4 Main paper to be discussed David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV, 2004. How to find useful keypoints?

More information

Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation

Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation Engin Tola 1 and A. Aydın Alatan 2 1 Computer Vision Laboratory, Ecóle Polytechnique Fédéral de Lausanne

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp

More information

Robust Geometry Estimation from two Images

Robust Geometry Estimation from two Images Robust Geometry Estimation from two Images Carsten Rother 09/12/2016 Computer Vision I: Image Formation Process Roadmap for next four lectures Computer Vision I: Image Formation Process 09/12/2016 2 Appearance-based

More information

Nonparametric estimation of multiple structures with outliers

Nonparametric estimation of multiple structures with outliers Nonparametric estimation of multiple structures with outliers Wei Zhang and Jana Kosecka Department of Computer Science, George Mason University, 44 University Dr. Fairfax, VA 223 USA {wzhang2,kosecka}@cs.gmu.edu

More information

Feature Detectors and Descriptors: Corners, Lines, etc.

Feature Detectors and Descriptors: Corners, Lines, etc. Feature Detectors and Descriptors: Corners, Lines, etc. Edges vs. Corners Edges = maxima in intensity gradient Edges vs. Corners Corners = lots of variation in direction of gradient in a small neighborhood

More information

Structure Guided Salient Region Detector

Structure Guided Salient Region Detector Structure Guided Salient Region Detector Shufei Fan, Frank Ferrie Center for Intelligent Machines McGill University Montréal H3A2A7, Canada Abstract This paper presents a novel method for detection of

More information

THE image based localization problem as considered in

THE image based localization problem as considered in JOURNAL OF L A TEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 1 Image Based Localization Wei Zhang, Member, IEEE, and Jana Košecká Member, IEEE, Abstract In this paper we present an approach for image based

More information

A Comparison of SIFT, PCA-SIFT and SURF

A Comparison of SIFT, PCA-SIFT and SURF A Comparison of SIFT, PCA-SIFT and SURF Luo Juan Computer Graphics Lab, Chonbuk National University, Jeonju 561-756, South Korea qiuhehappy@hotmail.com Oubong Gwun Computer Graphics Lab, Chonbuk National

More information

Computer Vision I - Filtering and Feature detection

Computer Vision I - Filtering and Feature detection Computer Vision I - Filtering and Feature detection Carsten Rother 30/10/2015 Computer Vision I: Basics of Image Processing Roadmap: Basics of Digital Image Processing Computer Vision I: Basics of Image

More information

Accurate Image Registration from Local Phase Information

Accurate Image Registration from Local Phase Information Accurate Image Registration from Local Phase Information Himanshu Arora, Anoop M. Namboodiri, and C.V. Jawahar Center for Visual Information Technology, IIIT, Hyderabad, India { himanshu@research., anoop@,

More information

Nonparametric estimation of multiple structures with outliers

Nonparametric estimation of multiple structures with outliers Nonparametric estimation of multiple structures with outliers Wei Zhang and Jana Kosecka George Mason University, 44 University Dr. Fairfax, VA 223 USA Abstract. Common problem encountered in the analysis

More information

calibrated coordinates Linear transformation pixel coordinates

calibrated coordinates Linear transformation pixel coordinates 1 calibrated coordinates Linear transformation pixel coordinates 2 Calibration with a rig Uncalibrated epipolar geometry Ambiguities in image formation Stratified reconstruction Autocalibration with partial

More information

Automatic Image Alignment (feature-based)

Automatic Image Alignment (feature-based) Automatic Image Alignment (feature-based) Mike Nese with a lot of slides stolen from Steve Seitz and Rick Szeliski 15-463: Computational Photography Alexei Efros, CMU, Fall 2006 Today s lecture Feature

More information

CS 664 Image Matching and Robust Fitting. Daniel Huttenlocher

CS 664 Image Matching and Robust Fitting. Daniel Huttenlocher CS 664 Image Matching and Robust Fitting Daniel Huttenlocher Matching and Fitting Recognition and matching are closely related to fitting problems Parametric fitting can serve as more restricted domain

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

Automatic estimation of the inlier threshold in robust multiple structures fitting.

Automatic estimation of the inlier threshold in robust multiple structures fitting. Automatic estimation of the inlier threshold in robust multiple structures fitting. Roberto Toldo and Andrea Fusiello Dipartimento di Informatica, Università di Verona Strada Le Grazie, 3734 Verona, Italy

More information

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies M. Lourakis, S. Tzurbakis, A. Argyros, S. Orphanoudakis Computer Vision and Robotics Lab (CVRL) Institute of

More information

Determinant of homography-matrix-based multiple-object recognition

Determinant of homography-matrix-based multiple-object recognition Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom

More information

Factorization with Missing and Noisy Data

Factorization with Missing and Noisy Data Factorization with Missing and Noisy Data Carme Julià, Angel Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona,

More information

Fitting. Fitting. Slides S. Lazebnik Harris Corners Pkwy, Charlotte, NC

Fitting. Fitting. Slides S. Lazebnik Harris Corners Pkwy, Charlotte, NC Fitting We ve learned how to detect edges, corners, blobs. Now what? We would like to form a higher-level, more compact representation of the features in the image by grouping multiple features according

More information

Homographies and RANSAC

Homographies and RANSAC Homographies and RANSAC Computer vision 6.869 Bill Freeman and Antonio Torralba March 30, 2011 Homographies and RANSAC Homographies RANSAC Building panoramas Phototourism 2 Depth-based ambiguity of position

More information

Vision par ordinateur

Vision par ordinateur Epipolar geometry π Vision par ordinateur Underlying structure in set of matches for rigid scenes l T 1 l 2 C1 m1 l1 e1 M L2 L1 e2 Géométrie épipolaire Fundamental matrix (x rank 2 matrix) m2 C2 l2 Frédéric

More information

CAP 5415 Computer Vision Fall 2012

CAP 5415 Computer Vision Fall 2012 CAP 5415 Computer Vision Fall 01 Dr. Mubarak Shah Univ. of Central Florida Office 47-F HEC Lecture-5 SIFT: David Lowe, UBC SIFT - Key Point Extraction Stands for scale invariant feature transform Patented

More information

Wide Baseline Matching using Triplet Vector Descriptor

Wide Baseline Matching using Triplet Vector Descriptor 1 Wide Baseline Matching using Triplet Vector Descriptor Yasushi Kanazawa Koki Uemura Department of Knowledge-based Information Engineering Toyohashi University of Technology, Toyohashi 441-8580, JAPAN

More information

Step-by-Step Model Buidling

Step-by-Step Model Buidling Step-by-Step Model Buidling Review Feature selection Feature selection Feature correspondence Camera Calibration Euclidean Reconstruction Landing Augmented Reality Vision Based Control Sparse Structure

More information

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM Karthik Krish Stuart Heinrich Wesley E. Snyder Halil Cakir Siamak Khorram North Carolina State University Raleigh, 27695 kkrish@ncsu.edu sbheinri@ncsu.edu

More information

Instance-level recognition

Instance-level recognition Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction

More information

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1 Feature Detection Raul Queiroz Feitosa 3/30/2017 Feature Detection 1 Objetive This chapter discusses the correspondence problem and presents approaches to solve it. 3/30/2017 Feature Detection 2 Outline

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Extrinsic camera calibration method and its performance evaluation Jacek Komorowski 1 and Przemyslaw Rokita 2 arxiv:1809.11073v1 [cs.cv] 28 Sep 2018 1 Maria Curie Sklodowska University Lublin, Poland jacek.komorowski@gmail.com

More information

10/03/11. Model Fitting. Computer Vision CS 143, Brown. James Hays. Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem

10/03/11. Model Fitting. Computer Vision CS 143, Brown. James Hays. Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem 10/03/11 Model Fitting Computer Vision CS 143, Brown James Hays Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem Fitting: find the parameters of a model that best fit the data Alignment:

More information

EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm

EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm Group 1: Mina A. Makar Stanford University mamakar@stanford.edu Abstract In this report, we investigate the application of the Scale-Invariant

More information

Instance-level recognition

Instance-level recognition Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction

More information

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping

More information

Agenda. Rotations. Camera calibration. Homography. Ransac

Agenda. Rotations. Camera calibration. Homography. Ransac Agenda Rotations Camera calibration Homography Ransac Geometric Transformations y x Transformation Matrix # DoF Preserves Icon translation rigid (Euclidean) similarity affine projective h I t h R t h sr

More information

CS 558: Computer Vision 4 th Set of Notes

CS 558: Computer Vision 4 th Set of Notes 1 CS 558: Computer Vision 4 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview Keypoint matching Hessian

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian. Announcements Project 1 Out today Help session at the end of class Image matching by Diva Sian by swashford Harder case Even harder case How the Afghan Girl was Identified by Her Iris Patterns Read the

More information

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale. Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe presented by, Sudheendra Invariance Intensity Scale Rotation Affine View point Introduction Introduction SIFT (Scale Invariant Feature

More information

1 (5 max) 2 (10 max) 3 (20 max) 4 (30 max) 5 (10 max) 6 (15 extra max) total (75 max + 15 extra)

1 (5 max) 2 (10 max) 3 (20 max) 4 (30 max) 5 (10 max) 6 (15 extra max) total (75 max + 15 extra) Mierm Exam CS223b Stanford CS223b Computer Vision, Winter 2004 Feb. 18, 2004 Full Name: Email: This exam has 7 pages. Make sure your exam is not missing any sheets, and write your name on every page. The

More information

CS 231A Computer Vision (Winter 2014) Problem Set 3

CS 231A Computer Vision (Winter 2014) Problem Set 3 CS 231A Computer Vision (Winter 2014) Problem Set 3 Due: Feb. 18 th, 2015 (11:59pm) 1 Single Object Recognition Via SIFT (45 points) In his 2004 SIFT paper, David Lowe demonstrates impressive object recognition

More information

3D Modeling using multiple images Exam January 2008

3D Modeling using multiple images Exam January 2008 3D Modeling using multiple images Exam January 2008 All documents are allowed. Answers should be justified. The different sections below are independant. 1 3D Reconstruction A Robust Approche Consider

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford Image matching Harder case by Diva Sian by Diva Sian by scgbt by swashford Even harder case Harder still? How the Afghan Girl was Identified by Her Iris Patterns Read the story NASA Mars Rover images Answer

More information

Implementing the Scale Invariant Feature Transform(SIFT) Method

Implementing the Scale Invariant Feature Transform(SIFT) Method Implementing the Scale Invariant Feature Transform(SIFT) Method YU MENG and Dr. Bernard Tiddeman(supervisor) Department of Computer Science University of St. Andrews yumeng@dcs.st-and.ac.uk Abstract The

More information

Occluded Facial Expression Tracking

Occluded Facial Expression Tracking Occluded Facial Expression Tracking Hugo Mercier 1, Julien Peyras 2, and Patrice Dalle 1 1 Institut de Recherche en Informatique de Toulouse 118, route de Narbonne, F-31062 Toulouse Cedex 9 2 Dipartimento

More information

Lecture 3.3 Robust estimation with RANSAC. Thomas Opsahl

Lecture 3.3 Robust estimation with RANSAC. Thomas Opsahl Lecture 3.3 Robust estimation with RANSAC Thomas Opsahl Motivation If two perspective cameras captures an image of a planar scene, their images are related by a homography HH 2 Motivation If two perspective

More information

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University. 3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction

More information

ROBUST LINE-BASED CALIBRATION OF LENS DISTORTION FROM A SINGLE VIEW

ROBUST LINE-BASED CALIBRATION OF LENS DISTORTION FROM A SINGLE VIEW ROBUST LINE-BASED CALIBRATION OF LENS DISTORTION FROM A SINGLE VIEW Thorsten Thormählen, Hellward Broszio, Ingolf Wassermann thormae@tnt.uni-hannover.de University of Hannover, Information Technology Laboratory,

More information

A Novel Algorithm for Color Image matching using Wavelet-SIFT

A Novel Algorithm for Color Image matching using Wavelet-SIFT International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 A Novel Algorithm for Color Image matching using Wavelet-SIFT Mupuri Prasanth Babu *, P. Ravi Shankar **

More information

Automated Scene Matching in Movies

Automated Scene Matching in Movies Automated Scene Matching in Movies F. Schaffalitzky and A. Zisserman Robotics Research Group Department of Engineering Science University of Oxford Oxford, OX1 3PJ fsm,az @robots.ox.ac.uk Abstract. We

More information

Bagging for One-Class Learning

Bagging for One-Class Learning Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one

More information

Multiple Model Estimation : The EM Algorithm & Applications

Multiple Model Estimation : The EM Algorithm & Applications Multiple Model Estimation : The EM Algorithm & Applications Princeton University COS 429 Lecture Nov. 13, 2007 Harpreet S. Sawhney hsawhney@sarnoff.com Recapitulation Problem of motion estimation Parametric

More information

III. VERVIEW OF THE METHODS

III. VERVIEW OF THE METHODS An Analytical Study of SIFT and SURF in Image Registration Vivek Kumar Gupta, Kanchan Cecil Department of Electronics & Telecommunication, Jabalpur engineering college, Jabalpur, India comparing the distance

More information

IMPACT OF SUBPIXEL PARADIGM ON DETERMINATION OF 3D POSITION FROM 2D IMAGE PAIR Lukas Sroba, Rudolf Ravas

IMPACT OF SUBPIXEL PARADIGM ON DETERMINATION OF 3D POSITION FROM 2D IMAGE PAIR Lukas Sroba, Rudolf Ravas 162 International Journal "Information Content and Processing", Volume 1, Number 2, 2014 IMPACT OF SUBPIXEL PARADIGM ON DETERMINATION OF 3D POSITION FROM 2D IMAGE PAIR Lukas Sroba, Rudolf Ravas Abstract:

More information

Final Exam Study Guide

Final Exam Study Guide Final Exam Study Guide Exam Window: 28th April, 12:00am EST to 30th April, 11:59pm EST Description As indicated in class the goal of the exam is to encourage you to review the material from the course.

More information

Week 2: Two-View Geometry. Padua Summer 08 Frank Dellaert

Week 2: Two-View Geometry. Padua Summer 08 Frank Dellaert Week 2: Two-View Geometry Padua Summer 08 Frank Dellaert Mosaicking Outline 2D Transformation Hierarchy RANSAC Triangulation of 3D Points Cameras Triangulation via SVD Automatic Correspondence Essential

More information

CHAPTER 9. Classification Scheme Using Modified Photometric. Stereo and 2D Spectra Comparison

CHAPTER 9. Classification Scheme Using Modified Photometric. Stereo and 2D Spectra Comparison CHAPTER 9 Classification Scheme Using Modified Photometric Stereo and 2D Spectra Comparison 9.1. Introduction In Chapter 8, even we combine more feature spaces and more feature generators, we note that

More information

Using Edge Detection in Machine Vision Gauging Applications

Using Edge Detection in Machine Vision Gauging Applications Application Note 125 Using Edge Detection in Machine Vision Gauging Applications John Hanks Introduction This application note introduces common edge-detection software strategies for applications such

More information

Chapter 9 Object Tracking an Overview

Chapter 9 Object Tracking an Overview Chapter 9 Object Tracking an Overview The output of the background subtraction algorithm, described in the previous chapter, is a classification (segmentation) of pixels into foreground pixels (those belonging

More information

This chapter explains two techniques which are frequently used throughout

This chapter explains two techniques which are frequently used throughout Chapter 2 Basic Techniques This chapter explains two techniques which are frequently used throughout this thesis. First, we will introduce the concept of particle filters. A particle filter is a recursive

More information

Two-view geometry Computer Vision Spring 2018, Lecture 10

Two-view geometry Computer Vision Spring 2018, Lecture 10 Two-view geometry http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 10 Course announcements Homework 2 is due on February 23 rd. - Any questions about the homework? - How many of

More information