Image Processing and Computer Vision
Detecting primitives and rigid objects in images
Martin de La Gorce, martin.de-la-gorce@enpc.fr
February 2015
Detecting lines
How can we detect simple primitives such as lines or circles in an image?
We can use the Canny edge detector to get a list of edge points.
We then try to approximate this set of points by straight lines or circles.
We will see two different methods to detect lines that generalize to other kinds of primitives:
Hough transform
RANSAC
Line parameterization
A line can be parameterized by two parameters:
its angle θ ∈ [0, π]
its signed distance ρ ∈ ℝ to the center of the image
Line parameterization
We denote by l(θ, ρ) ⊂ ℝ² the line corresponding to the parameters (θ, ρ). We have
l(θ, ρ) = {p ∈ ℝ² : ⟨p, n̂(θ)⟩ = ρ}
with n̂(θ) = [cos(θ), sin(θ)] the unit vector pointing in the direction θ. This rewrites as
l(θ, ρ) = {p ∈ ℝ² : p_x cos(θ) + p_y sin(θ) = ρ}
Distance of a point to the line
Given a line l(θ, ρ) corresponding to the parameters (θ, ρ) and a point p ∈ ℝ², we denote by d(p, l(θ, ρ)) the Euclidean distance of the point p to the line l(θ, ρ):
d(p, l(θ, ρ)) = |⟨p, n̂(θ)⟩ − ρ|  (1)
             = |ρ̂(θ, p) − ρ|  (2)
with ρ̂(θ, p) = p_x cos(θ) + p_y sin(θ)
Line score
Given a set of points p_1, ..., p_N and the parameters (θ, ρ) we can calculate a score for the line:
S(θ, ρ) = Σ_{k=1}^{N} [d(p_k, l(θ, ρ)) < τ]
where [true] = 1 and [false] = 0. τ is a limit distance beyond which we consider that a point is too far from the line to belong to it.
The green points are referred to as inliers and the black points as outliers. S(θ, ρ) is therefore the number of inliers.
Brute force
Our goal is to find the parameters that give the best score, i.e. the line that contains a maximum of points in its vicinity.
We can discretize the parameter space on a 2D grid using discretization steps Δθ and Δρ.
A brute-force approach consists in evaluating S(θ, ρ) for all combinations of parameters in the discretized set.
Brute force
The brute-force algorithm would write

for j in range(len(thetas)):
    for i in range(len(rhos)):
        score[i, j] = 0
        for k in range(len(points)):
            if dist_to_line(points[k], rhos[i], thetas[j]) < tau:
                score[i, j] += 1

The complexity of this naive implementation is of order N_θ × N_ρ × N.
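The brute-force scoring above can be sketched as a short runnable NumPy function (the name line_scores_brute_force and the synthetic test data are our own, not from the slides):

```python
import numpy as np

def line_scores_brute_force(points, thetas, rhos, tau):
    """Score every (rho, theta) cell: count the points within distance tau
    of the line l(theta, rho) = {p : <p, n(theta)> = rho}."""
    score = np.zeros((len(rhos), len(thetas)), dtype=int)
    for j, theta in enumerate(thetas):
        n = np.array([np.cos(theta), np.sin(theta)])   # unit normal of the line
        for i, rho in enumerate(rhos):
            d = np.abs(points @ n - rho)               # point-to-line distances
            score[i, j] = np.sum(d < tau)
    return score

# 20 synthetic edge points on the vertical line x = 2 (i.e. theta = 0, rho = 2)
points = np.column_stack([np.full(20, 2.0), np.linspace(-5, 5, 20)])
thetas = np.linspace(0, np.pi, 8, endpoint=False)
rhos = np.linspace(-5, 5, 21)                          # step 0.5, rhos[14] == 2.0
score = line_scores_brute_force(points, thetas, rhos, tau=0.25)
i_best, j_best = np.unravel_index(np.argmax(score), score.shape)
```

The accumulator peak lands at (rhos[14], thetas[0]) = (2, 0), the parameters of the line that generated the points.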
Acceleration
Given a point p and an angle θ, there is a unique line with angle θ that contains p:
l(θ, ρ) = {p ∈ ℝ² : p_x cos(θ) + p_y sin(θ) = ρ}
So p ∈ l(θ, ρ) ⇔ ρ = p_x cos(θ) + p_y sin(θ)
Acceleration
Given a point p and an angle θ, the set of parallel lines with angle θ that pass near p at a distance smaller than τ is characterized by
d(p, l(θ, ρ)) < τ ⇔ ρ ∈ [ρ̂(θ, p) − τ, ρ̂(θ, p) + τ]
with ρ̂(θ, p) = p_x cos(θ) + p_y sin(θ)
Acceleration
A better approach than brute force consists in looping over the points p_i and the angles θ_0, ..., θ_{N_θ−1}, and avoiding the loop over all the ρ_0, ..., ρ_{N_ρ−1} using the equivalence
d(p, l(θ, ρ)) < τ ⇔ ρ ∈ [ρ̂(θ, p) − τ, ρ̂(θ, p) + τ]
Assuming that the ρ_0, ..., ρ_{N_ρ−1} are evenly spaced with step Δρ, we have ρ_i = ρ_0 + iΔρ and
d(p, l(θ, ρ_i)) < τ ⇔ ρ_i ∈ [ρ̂(θ, p) − τ, ρ̂(θ, p) + τ]
⇔ ρ_0 + iΔρ ∈ [ρ̂(θ, p) − τ, ρ̂(θ, p) + τ]
⇔ i ∈ [⌈f(ρ̂(θ, p) − τ)⌉, ⌊f(ρ̂(θ, p) + τ)⌋]
with f(ρ) = (ρ − ρ_0)/Δρ the function that retrieves the index i of some ρ_i in the set ρ_0, ..., ρ_{N_ρ−1}, i.e. f(ρ_i) = i
Acceleration
Given the angle θ, we can now avoid looping over the entire range of discrete ρ and loop only over the range given by
d(p, l(θ, ρ_i)) < τ ⇔ i ∈ [⌈f(ρ̂(θ, p) − τ)⌉, ⌊f(ρ̂(θ, p) + τ)⌋]
This leads to the pseudo-code:

for j in range(len(thetas)):
    for i in range(len(rhos)):
        score[i, j] = 0
for k in range(len(points)):
    for j in range(len(thetas)):
        rho_min = hat_rho(points[k], thetas[j]) - tau
        rho_max = rho_min + 2 * tau
        i_rho_min = ceil((rho_min - rhos[0]) / delta_rho)
        i_rho_max = floor((rho_max - rhos[0]) / delta_rho)
        for i in range(i_rho_min, i_rho_max + 1):
            if 0 <= i < len(rhos):
                score[i, j] += 1

The complexity is now of order N × N_θ × 2τ/Δρ.
Hough transform
Finally, assuming that τ = Δρ/2, the interval [ρ̂(θ, p) − τ, ρ̂(θ, p) + τ] is of length Δρ and thus contains exactly one element ρ_i of the set {ρ_0, ..., ρ_{N_ρ−1}}, with
i = round(f(ρ̂(θ, p))) = round((ρ̂(θ, p) − ρ_0)/Δρ)
We get what is called the Hough transform:

for j in range(len(thetas)):
    for i in range(len(rhos)):
        score[i, j] = 0
for k in range(len(points)):
    for j in range(len(thetas)):
        i = round((hat_rho(points[k], thetas[j]) - rhos[0]) / delta_rho)
        score[i, j] += 1
Hough transform
We can visualize the accumulation in the parameter space for two points as follows:
Hough transform
By detecting edges in an image and accumulating over all the edge points, we get a score image referred to as the accumulator. A line in the original image corresponds to a peak in this transformed image. We select the local maxima of this image above some inlier-count threshold to get a set of lines.
Line refinement
Once we have found primitive candidates we can refine their locations using a smoother version of the line score:
S_smooth(θ, ρ) = Σ_{i=1}^{N} h(d(p_i, l(θ, ρ)))
with h the function
h(x) = (1 − (x/τ)²)³ if |x| ≤ τ
h(x) = 0 if |x| > τ
Line refinement
We can maximize S_smooth(θ, ρ) using gradient ascent or a robust non-linear least squares minimization method (see links).
Note that we could also use this smoother score within the Hough transform.
Hough transform
The Hough transform generalizes to other primitive types that can be described by an analytic equation involving a few parameters. We will use an N-dimensional accumulator for primitives that are parameterized with N parameters. The size of the accumulator increases exponentially with the number of parameters.
Circles Hough transform
For example, a circle detector will require a 3D accumulator for the radius and the two coordinates of the center. If we fix the radius, we have two parameters left for the position of the center:
(figure: edges, Hough transform, detected circle)
Circles Hough transform
We want to find the local maxima of
S(x, y, r) = Σ_{k=1}^{N} [‖p_k − (x, y)‖ ∈ [r − τ, r + τ]]
Given a fixed radius r, we first construct the set of points whose distance to the origin is within the range [r − τ, r + τ]:
C_r = {(x, y) ∈ ℤ² : r − τ ≤ √(x² + y²) ≤ r + τ}
Given r, this can be done once and for all using some circle rasterization technique. The function S rewrites as
S(x, y, r) = Σ_{k=1}^{N} [(p_k − (x, y)) ∈ C_r]
Circles Hough transform
We have
S(x, y, r) = Σ_{k=1}^{N} [(p_k − (x, y)) ∈ C_r]
By posing Δ = p_k − (x, y) we have
(p_k − (x, y)) ∈ C_r ⇔ (x, y) = p_k − Δ and Δ ∈ C_r
Once C_r is computed, we can loop over the points in the image and over C_r to increment the accumulator:

for x in range(accumulator_width):
    for y in range(accumulator_height):
        score[x, y] = 0
for k in range(len(points)):
    for delta in Cr:
        score[points[k][0] - delta[0], points[k][1] - delta[1]] += 1
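A runnable sketch of this fixed-radius voting scheme (NumPy; rasterizing C_r by thresholding a distance grid, the function name and the example data are our own choices):

```python
import numpy as np

def hough_circles_fixed_radius(points, r, tau, width, height):
    """Vote for circle centers at a fixed radius r using precomputed offsets C_r."""
    # C_r: integer offsets whose distance to the origin lies in [r - tau, r + tau]
    span = int(np.ceil(r + tau))
    dx, dy = np.meshgrid(np.arange(-span, span + 1), np.arange(-span, span + 1))
    dist = np.hypot(dx, dy)
    mask = (dist >= r - tau) & (dist <= r + tau)
    Cr = np.column_stack([dx[mask], dy[mask]])

    score = np.zeros((width, height), dtype=int)
    for p in points:
        centers = np.rint(p).astype(int) - Cr          # candidate centers voted by p
        valid = ((centers[:, 0] >= 0) & (centers[:, 0] < width) &
                 (centers[:, 1] >= 0) & (centers[:, 1] < height))
        np.add.at(score, (centers[valid, 0], centers[valid, 1]), 1)
    return score

# 24 edge points on a circle of radius 5 centered at (20, 20)
angles = np.linspace(0, 2 * np.pi, 24, endpoint=False)
points = np.column_stack([20 + 5 * np.cos(angles), 20 + 5 * np.sin(angles)])
score = hough_circles_fixed_radius(points, r=5, tau=1.0, width=40, height=40)
```

Every edge point votes for the true center (20, 20), which therefore collects the maximum count of 24.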
Generalized Hough transform
Suppose we have a non-parametric rigid shape described by a set of points s_1, ..., s_m. This shape can be rigidly transformed:
T(θ, t, s_i) = R_θ s_i + t
with R_θ the rotation matrix
R_θ = [cos(θ) −sin(θ)]
      [sin(θ)  cos(θ)]
We want to find the transformation that aligns the shape points s_1, ..., s_m with the edge points p_1, ..., p_n in the image. We can formulate a score function:
S(θ, t) = Σ_{i=1}^{n} [min_j ‖p_i − T(θ, t, s_j)‖ < τ]  (3)
This counts the number of observed points that fall near the transformed shape and is thus similar to the cost used previously.
Generalized Hough transform
S(θ, t) = Σ_{i=1}^{n} [min_j ‖p_i − T(θ, t, s_j)‖ < τ]
We will consider for now that the rotation angle θ is fixed, and thus we use a 2D accumulator for the translation.
We can discretize t ∈ {t_1, ..., t_K} and compute all S(θ, t_k) using three nested loops over t, i and j.
We can define the set
C_θ = {p ∈ ℤ² : min_j ‖p − R_θ s_j‖ < τ}
We have
S(θ, t) = Σ_{i=1}^{n} [(p_i − t) ∈ C_θ]
and we can use the same method as for circles.
Generalized Hough transform
C_θ might be large; instead we use a new criterion¹:
S(θ, t) = Σ_{i=1}^{n} Σ_{j=1}^{m} [‖p_i − T(θ, t, s_j)‖ < τ]
This criterion counts the number of image point/model point pairs that are near each other after translation of the model.
We can discretize t ∈ {t_1, ..., t_K} and compute all S(θ, t_k) using three nested loops over t, i and j, but this is not very efficient...
¹ The fact that we removed the min operator and now use only sums gives us freedom in the order of the loops when we perform the summation.
Generalized Hough transform
We notice that a given pair (p_i, s_j) will contribute to increase the score of only a very small set of translations t_k:
‖p_i − (R_θ s_j + t_k)‖ < τ ⇔ ‖t_k − (p_i − R_θ s_j)‖ < τ
We can use nested loops over i and j and then enumerate only the few t values in the disk of center p_i − R_θ s_j and radius τ.
We can speed this up further by accumulating only for t = p_i − R_θ s_j and then convolving the accumulator with a disk of radius τ.
If we do not want to keep θ fixed, we use discretized angles θ_1, ..., θ_l, use a 3D accumulator and add a loop over θ².
² This approach has been proposed by Merlin and Farber.
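A minimal sketch of this fixed-θ voting, accumulating only at t = p_i − R_θ s_j and omitting the final disk convolution (the function name and the test data are our own):

```python
import numpy as np

def vote_translations(points, shape_pts, theta, width, height):
    """Generalized Hough voting for the translation, rotation angle fixed:
    every (image point, model point) pair votes for t = p_i - R_theta s_j."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    rotated = shape_pts @ R.T                          # R_theta s_j for all model points
    acc = np.zeros((width, height), dtype=int)
    for p in points:
        t = np.rint(p - rotated).astype(int)           # one vote per pair
        valid = ((t[:, 0] >= 0) & (t[:, 0] < width) &
                 (t[:, 1] >= 0) & (t[:, 1] < height))
        np.add.at(acc, (t[valid, 0], t[valid, 1]), 1)
    return acc

# Model shape, and image points equal to the shape translated by t = (7, 3)
shape_pts = np.array([[0., 0.], [4., 1.], [1., 5.], [6., 6.], [2., 9.]])
points = shape_pts + np.array([7., 3.])
acc = vote_translations(points, shape_pts, theta=0.0, width=20, height=20)
```

The matching pairs (i = j) all vote for the true translation (7, 3), while mismatched pairs scatter their votes, so the accumulator peaks there with m = 5 votes.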
Generalized Hough transform: using normals
By computing edge normals for both the model (n̂_{s_1}, ..., n̂_{s_m}) and the edge points in the image (n̂_{p_1}, ..., n̂_{p_n}), we can change the criterion to
S(θ, t) = Σ_{i=1}^{n} Σ_{j=1}^{m} [(‖p_i − T(θ, t, s_j)‖ < τ) & (‖n̂_{p_i} − R_θ n̂_{s_j}‖ < τ_n)]
This now counts the number of image point/model point pairs that are near each other and that have similar normals after rotation of the model.
Thanks to the additional normal agreement constraint, from a given pair (p_i, s_j) we can retrieve an interval for the rotation angle θ, and the pair will contribute to increase the score of only a very small set of translation/rotation pairs (t, θ), which accelerates the voting.
Generalized Hough transform
Some approximations that lead to an acceleration:
Instead of having one high-dimensional array, we store a few two-dimensional projections with common coordinates (e.g. (t_x, t_y), (t_x, θ), (t_y, θ)):
S_xy(t_x, t_y) = Σ_θ S(θ, t_x, t_y)  (4)
S_xθ(t_x, θ) = Σ_{t_y} S(θ, t_x, t_y)  (5)
S_yθ(t_y, θ) = Σ_{t_x} S(θ, t_x, t_y)  (6)
We then find consistent peaks in these lower-dimensional arrays.
RANSAC
Another generic approach to find primitives in a set of points is the RANSAC method, for "RANdom SAmple Consensus":
Sample randomly just enough points to estimate the parameters of the unique primitive that contains these points (2 points for lines, 3 points for circles).
Count how many points in the whole set agree with the generated primitive.
Repeat these steps many times and keep the primitive that has the largest number of inlier points.
RANSAC
Set of points:
RANSAC
Select two points at random:
RANSAC
Find the parameters of the line that fits the two points:
RANSAC
Calculate the distance of all the points to the line:
RANSAC
Select the data that support the current hypothesis:
RANSAC
Repeat sampling:
RANSAC
Repeat sampling:
RANSAC
Repeat sampling:
RANSAC

best_score = 0
for t in range(nb_iterations):
    i = random(len(points))
    j = random(len(points))
    rho, theta = get_line_parameters(points[i], points[j])
    score = 0
    for k in range(len(points)):
        if dist_to_line(points[k], rho, theta) < tau:
            score += 1
    if score > best_score:
        best_score = score
        best_rho = rho
        best_theta = theta
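The pseudo-code above can be turned into a small runnable NumPy implementation (the line-from-two-points parameterization and the synthetic data are our own):

```python
import numpy as np

def ransac_line(points, nb_iterations, tau, rng):
    """RANSAC for lines: fit (theta, rho) to 2 random points, keep the
    hypothesis with the largest number of inliers."""
    best_score, best_theta, best_rho = -1, None, None
    for _ in range(nb_iterations):
        i, j = rng.choice(len(points), size=2, replace=False)
        d = points[j] - points[i]
        theta = np.arctan2(d[0], -d[1])            # direction of the line normal
        n = np.array([np.cos(theta), np.sin(theta)])
        rho = points[i] @ n
        score = np.sum(np.abs(points @ n - rho) < tau)
        if score > best_score:
            best_score, best_theta, best_rho = score, theta, rho
    return best_theta, best_rho, best_score

# 40 inliers on the line x = 2 plus 10 uniform outliers
rng = np.random.default_rng(0)
inliers = np.column_stack([np.full(40, 2.0), np.linspace(-5, 5, 40)])
outliers = rng.uniform(-5, 5, size=(10, 2))
points = np.vstack([inliers, outliers])
theta, rho, score = ransac_line(points, nb_iterations=200, tau=0.1, rng=rng)
```

With 80% inliers, 200 iterations are far more than enough to sample an outlier-free pair, so the returned line recovers the inlier structure.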
RANSAC
It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, this probability increasing as more iterations are allowed.
Using some statistics we can estimate how many samples should be drawn to get a good chance of finding the best line.
The time needed to get a good solution increases with the number of outliers, as we are more likely to pick an outlier at each random selection.
RANSAC usually performs badly when the fraction of inliers is less than 50%.
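The "some statistics" can be made concrete with the standard estimate: if w is the inlier fraction and each hypothesis needs n sampled points, drawing at least one all-inlier sample with probability p requires about N = log(1 − p) / log(1 − wⁿ) iterations. A one-line helper (our own naming):

```python
import math

def ransac_iterations(inlier_ratio, sample_size, success_prob=0.99):
    """Iterations needed so that at least one sample is outlier-free
    with probability success_prob: N = log(1 - p) / log(1 - w**n)."""
    return math.ceil(math.log(1 - success_prob)
                     / math.log(1 - inlier_ratio ** sample_size))

# For lines (2 points per sample): iterations needed at 50% and 30% inliers
n_half = ransac_iterations(0.5, 2)
n_third = ransac_iterations(0.3, 2)
```

This quantifies the slide's remark: 17 iterations suffice at 50% inliers, but 49 are already needed at 30%, and the count grows quickly as the outlier fraction rises.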
Refinement: ICP
Once we have found candidate poses using the generalized Hough transform, we can refine them using an iterative approach. A classical method to align two sets of points given an initial estimate of the alignment, called the Iterative Closest Point (ICP) algorithm, consists in minimizing iteratively the energy
E(θ, t) = Σ_{j=1}^{m} min_i ‖p_i − T(θ, t, s_j)‖²
Note that the summation is done over the shape points s_j instead of the image points p_i as in Eq. 3. For each point of the shape we look for the closest point in the image.
Refinement: ICP
E(θ, t) = Σ_{j=1}^{m} min_i ‖p_i − T(θ, t, s_j)‖²
We reformulate this problem by adding a vector of auxiliary variables L = (l_1, ..., l_m) ∈ {1, ..., n}^m that encodes, for each point of the shape, the index of the corresponding point in the image:
E'(θ, t, L) = Σ_{j=1}^{m} ‖p_{l_j} − T(θ, t, s_j)‖²
ICP
E'(θ, t, L) = Σ_{j=1}^{m} ‖p_{l_j} − T(θ, t, s_j)‖²
Similarly to the k-means algorithm, we minimize E' alternately with respect to (θ, t) and with respect to L.
The minimization with respect to L is done by solving m independent problems: l_j = argmin_i ‖p_i − T(θ, t, s_j)‖². For each point of the shape we look for the closest point in the image.
The minimization with respect to t is done by displacing the shape by the mean of the point differences:
t_{k+1} = t_k + (1/m) Σ_{j=1}^{m} (p_{l_j} − T(θ, t_k, s_j))
ICP
E'(θ, t, L) = Σ_{j=1}^{m} ‖p_{l_j} − T(θ, t, s_j)‖²
The minimization with respect to θ is more complex and requires a singular value decomposition [1, 2]. Alternatively we can use a Gauss-Newton iterative minimization (see last slides).
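The SVD-based step solves jointly for the optimal rotation and translation given fixed correspondences, following the construction of [1]; a compact 2D sketch (our own function name and test data):

```python
import numpy as np

def best_rigid_transform(S, P):
    """R, t minimizing sum_j ||P_j - (R S_j + t)||^2 for matched point sets,
    via SVD of the centered cross-covariance (as in Arun et al. [1])."""
    cs, cp = S.mean(axis=0), P.mean(axis=0)
    H = (S - cs).T @ (P - cp)                      # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # exclude reflections
    R = Vt.T @ D @ U.T
    t = cp - R @ cs
    return R, t

# Recover a known 30-degree rotation and translation (1, 2) from noiseless data
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle),  np.cos(angle)]])
S = np.array([[0., 0.], [2., 0.], [1., 3.], [4., 1.]])
P = S @ R_true.T + np.array([1., 2.])
R_est, t_est = best_rigid_transform(S, P)
```

Inside ICP, this replaces the separate t update of the previous slide: each iteration alternates the closest-point assignment of L with one such closed-form (R, t) solve.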
ICP
iteration 1, iteration 2, iteration 3, iteration 4, iteration 5, iteration 6
Non-linear least squares
Given n functions f_i(Θ) from ℝ^m into ℝ, with Θ = (θ_1, ..., θ_m), we want to minimize the function
S(Θ) = Σ_{i=1}^{n} f_i(Θ)²
This is referred to in the literature as a non-linear least squares problem.
Gauss-Newton
S(Θ) = Σ_{i=1}^{n} f_i(Θ)²
This can be solved using iterative minimization. At each iteration, we linearize f_i(Θ) around Θ_t:
f_i(Θ) ≈ f_i(Θ_t) + ∇f_i(Θ_t)(Θ − Θ_t)
We get
S(Θ) ≈ Σ_{i=1}^{n} (f_i(Θ_t) + ∇f_i(Θ_t)(Θ − Θ_t))² = ‖F + J_f (Θ − Θ_t)‖²
with F the vector [f_1(Θ_t), ..., f_n(Θ_t)] and J_f the Jacobian matrix evaluated at Θ_t:
J_f = [∂f_1/∂θ_1 ... ∂f_1/∂θ_m]
      [   ...   ...    ...    ]
      [∂f_n/∂θ_1 ... ∂f_n/∂θ_m]
Gauss-Newton
We have from the previous slide
S(Θ) ≈ ‖F + J_f (Θ − Θ_t)‖²
We get the next estimate Θ_{t+1} as the minimum of this approximation. The gradient writes
2 J_f^T (F + J_f (Θ − Θ_t))
and we get the minimum for
Θ_{t+1} = Θ_t − (J_f^T J_f)^{−1} J_f^T F
This iterative method is called the Gauss-Newton algorithm.
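A compact implementation of this update, exercised on a toy zero-residual problem of our own choosing (the Rosenbrock function written as two residuals):

```python
import numpy as np

def gauss_newton(residuals, jacobian, theta0, n_iter=10):
    """Gauss-Newton: Theta <- Theta - (J^T J)^{-1} J^T F at each iteration."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        F = residuals(theta)
        J = jacobian(theta)
        theta = theta - np.linalg.solve(J.T @ J, J.T @ F)
    return theta

# Rosenbrock as least squares: f1 = 10(y - x^2), f2 = 1 - x, minimum at (1, 1)
res = lambda th: np.array([10 * (th[1] - th[0] ** 2), 1 - th[0]])
jac = lambda th: np.array([[-20 * th[0], 10.0],
                           [-1.0, 0.0]])
theta = gauss_newton(res, jac, [-1.2, 1.0])
```

Because the residuals vanish at the optimum, Gauss-Newton behaves like Newton's method here and converges in a couple of iterations from the classical starting point (−1.2, 1).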
Links
Tutorial on robust non-linear least squares and applications to computer vision: http://cvlabwww.epfl.ch/~fua/courses/lsq/intro.htm
[1] Arun, K., Huang, T., and Blostein, S. Least-Squares Fitting of Two 3-D Point Sets. IEEE Trans. PAMI, Vol. 9, No. 5, 1987.
[2] Horn, B. Closed-Form Solution of Absolute Orientation Using Unit Quaternions. JOSA A, Vol. 4, No. 4, 1987.