AIMS Computer Vision

Lecture 4.1: Reconstruction
HT 2018, Andrea Vedaldi

AIMS Computer Vision lectures:
1. Matching, indexing, and search
2. Object category detection
3. Visual geometry 1/2: Camera models and triangulation
4. Visual geometry 2/2: Reconstruction from multiple views
5. Segmentation, tracking, and depth sensors

For slides and up-to-date information: http://www.robots.ox.ac.uk/~vedaldi/teach.html

Outline
- Introduction
- Computing H or F from point matches
- Feature detection and matching
- RANSAC
- Determining the ego-motion from F
- Structure and motion from more than two views

Introduction

In the previous lectures we have seen the concept of epipolar geometry:
1. Let $X$ be a 3D point.
2. Given camera parameters $P = K[I \mid 0]$ and $P' = K'[R \mid t]$, the camera projections of the 3D point are (up to scale):
   $x \propto PX = KX, \qquad x' \propto P'X = K'(RX + t)$
3. Hence we have that:
   $t \times (K')^{-1} x' \propto t \times (RX + t) = t \times RX \propto t \times R K^{-1} x$
4. Due to orthogonality, we also have that:
   $\left((K')^{-1} x'\right)^\top \left(t \times (K')^{-1} x'\right) = 0$
5. Hence we get the epipolar constraint
   $(x')^\top F x = 0$
   where the fundamental matrix is given by:
   $F = (K')^{-\top} [t]_\times R K^{-1}$

In the previous lectures we have also seen stereo reconstruction from two views:
1. Obtain (somehow) the camera parameters $P = K[I \mid 0]$ and $P' = K'[R \mid t]$.
2. Compute the fundamental matrix $F = (K')^{-\top} [t]_\times R K^{-1}$.
3. Match points $x$ in one image to corresponding points $x'$ in the second along the epipolar lines $l' = Fx$.
4. Triangulation: compute the 3D points $X$ from $x$, $x'$, $P$, $P'$.

Next

What happens if:
1. you do not know the camera parameters?
2. you have more than two images?

You get Structure from Motion.

[Figure credit: Carl Olsson]
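The epipolar constraint above can be checked numerically. The following is a minimal sketch, assuming NumPy; the calibration matrices, rotation, translation and 3D point are made-up illustrative values, and the helper names `skew` and `fundamental_from_cameras` are not from the lecture.

```python
import numpy as np

def skew(t):
    """Return the 3x3 matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_cameras(K1, K2, R, t):
    """F = K2^{-T} [t]_x R K1^{-1} for cameras P = K1[I | 0] and P' = K2[R | t]."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

# Illustrative (assumed) calibration and motion.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])
angle = 0.1  # rotation about the y axis, in radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.1, 0.05])

F = fundamental_from_cameras(K1, K2, R, t)

X = np.array([0.3, -0.2, 4.0])   # a 3D point in the first camera frame
x1 = K1 @ X                      # homogeneous image point in view 1
x2 = K2 @ (R @ X + t)            # homogeneous image point in view 2
residual = x2 @ F @ x1           # x'^T F x, zero up to rounding error
```

The residual vanishes for any choice of `X`, because the constraint depends only on the two cameras and not on the scene point.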

The Structure from Motion (SFM) problem

Given two or more images of a scene, compute (i) the camera motion and (ii) the scene structure.

[Figure: Camera C and Camera C' viewing a scene]

Assumptions:
- Known intrinsic calibration $K$, $K'$
- Unknown extrinsic calibration $R$, $t$ (egomotion)

Prototypical SFM pipeline

1. Match corner points to find point correspondences. This is harder than before as the epipolar geometry is unavailable.
2. Compute the egomotion $R$, $t$:
   - For planar scenes: compute the homography matrix $H$ (e.g. the four-point algorithm seen in B14); extract the egomotion from $H$.
   - For general 3D scenes: compute the fundamental matrix $F$ (e.g. the eight-point algorithm); extract the egomotion from $F$.
3. Triangulate as before to obtain the 3D points.

Corner points computed for each frame

Start from two views. [Figure: two views of the scene]

Corner points computed for each frame

Extract some corner points, for example using the Harris detector. [Figure: detected corner points]

Egomotion from corner points

Egomotion = transformation between the cameras.

[Figure: 3D point X projected to x in camera C and x' in camera C', related by (R, t)]

Given point correspondences $x_i \leftrightarrow x'_i$ for $i = 1, \dots, n$, we want to determine $R$ and $t$.

Intuition: keep $C$ still, and move $C'$ until all rays intersect. Obviously three correspondences are not enough to fix $C'$. How many do we need?

Outline of egomotion computation

1. Compute the fundamental matrix $F$ from the correspondences $x_i \leftrightarrow x'_i$.
2. Decompose $F = (K')^{-\top} [t]_\times R K^{-1}$ to find $R$, $t$ (given the known $K$ and $K'$).
3. Compute the projection matrices $P$ and $P'$ if needed.

Actually, $t$ is found only up to scale. $F$ is a homogeneous matrix, so
$F \propto (K')^{-\top} [t]_\times R K^{-1} \propto (K')^{-\top} [\lambda t]_\times R K^{-1}$

Therefore translation and all lengths are recovered only up to scale:
$t \to \lambda t, \qquad X \to \lambda X$

Depth/scale ambiguity

We cannot distinguish:
- a large translation when viewing a large distant scene, from
- a small translation when viewing a small nearby scene.

Question: How might you resolve the depth/scale scaling ambiguity?
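The depth/scale ambiguity can be illustrated with a few lines of code. This is a minimal sketch, assuming NumPy; the camera, point and scale factor `lam` are arbitrary illustrative values and `project` is a hypothetical helper. Scaling both the translation and the scene by the same factor leaves both images unchanged.

```python
import numpy as np

def project(K, R, t, X):
    """Project the 3D point X with camera K[R | t]; return pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.2, 0.0, 0.0])
X = np.array([0.5, -0.3, 5.0])
lam = 10.0  # arbitrary scale factor applied to both t and X

I3, zero = np.eye(3), np.zeros(3)
# First camera K[I | 0], second camera K[R | t].
before = (project(K, I3, zero, X), project(K, R, t, X))
after = (project(K, I3, zero, lam * X), project(K, R, lam * t, lam * X))
```

Here `before` and `after` hold the same pixel coordinates, so no image measurement can recover `lam`.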

How many correspondences are required?

Because of the depth/speed scaling ambiguity, the rotation (3 DoF) can be determined completely but only the translation direction (2 DoF) is recoverable. This allows us to evaluate the number of correspondences needed:
1. For $n$ scene points there are $3n$ unknowns.
2. Between 2 views there are 5 = (3 rot + 2 trans) unknowns.
3. Each correspondence yields 4 measurements.
4. Hence $4n \ge 3n + 5$ and $n \ge 5$ correspondences are needed.

For $n < 7$ the solutions are non-linear, so we'll see solutions for $n = 7$ and $n = 8$.

Computing the fundamental matrix for $n \ge 8$

Task: Given $n$ correspondences $x_i \leftrightarrow x'_i$, compute $F$ such that $x_i'^\top F x_i = 0$.

Solution: Each correspondence generates one constraint
$\begin{bmatrix} x' & y' & 1 \end{bmatrix} \begin{bmatrix} f_1 & f_2 & f_3 \\ f_4 & f_5 & f_6 \\ f_7 & f_8 & f_9 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0$
which can be written as
$x'x f_1 + x'y f_2 + x' f_3 + y'x f_4 + y'y f_5 + y' f_6 + x f_7 + y f_8 + f_9 = 0$
or
$\begin{bmatrix} x'x & x'y & x' & y'x & y'y & y' & x & y & 1 \end{bmatrix} f = 0, \qquad f = (f_1, \dots, f_9)^\top$

Computing the fundamental matrix /ctd

For $n$ correspondences build up the $n \times 9$ system
$A_{n \times 9} f = \begin{bmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & & & & & & & & \vdots \\ x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1 \end{bmatrix} \begin{bmatrix} f_1 \\ \vdots \\ f_9 \end{bmatrix} = 0$

For $n = 8$ points $f$ can be found as the null space of $A$, and so $f$ and $F$ are determined up to scale (as expected).

Since the points are noisy, in general one wants to use $n > 8$. This can be done using least squares.

A least squares version of the 8-point algorithm

Due to noise, there will not be an exact solution to $Af = 0$ ($A$ has full rank).

Least squares formulation: find the unit vector $f$ that minimizes the norm of the residual $r = Af$:
$f^* = \operatorname{argmin}_{f : \|f\| = 1} \|Af\|^2$
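The least squares formulation above can be sketched as follows. This is a minimal sketch, assuming NumPy and noise-free synthetic data; the function name `eight_point` and the synthetic scene are illustrative assumptions. It solves the problem with the SVD of $A$.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from n >= 8 correspondences.

    x1, x2: (n, 2) arrays of pixel coordinates in the first and second image.
    Returns the unit-norm 3x3 matrix F minimising ||A f|| subject to ||f|| = 1.
    """
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    # One row [x'x, x'y, x', y'x, y'y, y', x, y, 1] per correspondence.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones_like(x)])
    # NumPy returns singular values in decreasing order, so the last row of
    # Vt is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)

# Synthetic two-view scene (all values are made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.1, 0.05])

p1 = (K @ X.T).T                       # view 1: K[I | 0]
p2 = (K @ (R @ X.T + t[:, None])).T    # view 2: K[R | t]
x1 = p1[:, :2] / p1[:, 2:]
x2 = p2[:, :2] / p2[:, 2:]

F = eight_point(x1, x2)

h1 = np.column_stack([x1, np.ones(len(x1))])
h2 = np.column_stack([x2, np.ones(len(x2))])
residuals = np.abs(np.sum((h2 @ F) * h1, axis=1))   # |x'^T F x| per match
```

In practice the image coordinates are usually normalised before building $A$ (Hartley normalisation) and the rank-2 constraint $\det F = 0$ is enforced afterwards; both steps are omitted here to keep the sketch close to the slides.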

Solution with eigenvalues: compute the eigen-decomposition of the matrix $M = A^\top A$ and set $f$ to the (unit) eigenvector $\hat e_1$ corresponding to the smallest eigenvalue $\lambda_1$.

Solution with SVD: compute the SVD of the matrix $A$ and set $f$ to the (unit) right singular vector $\hat e_1$ corresponding to the smallest singular value $\sigma_1$.

Proof of the eigendecomposition solution

The squared sum of the residuals $r = Af$ is
$\|r\|^2 = r^\top r = f^\top A^\top A f = f^\top M f$

$M = A^\top A$ is an $n \times n$ symmetric real matrix (in this proof $n$ denotes the number of columns of $A$, so $n = 9$ for the system above); hence it can be decomposed as
$M = V \Lambda V^\top = V \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} V^\top = \sum_{i=1}^{n} \lambda_i \hat e_i \hat e_i^\top$
where
- $V = [\hat e_1 \; \cdots \; \hat e_n]$ is the orthonormal matrix of eigenvectors;
- the eigenvalues are non-decreasing: $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$.

The eigenvalues are non-negative because:
$M \hat e_i = \lambda_i \hat e_i \;\Rightarrow\; \hat e_i^\top M \hat e_i = \hat e_i^\top \hat e_i \lambda_i \;\Rightarrow\; [A \hat e_i]^\top [A \hat e_i] = \lambda_i \ge 0$

Then
$f^\top M f = \lambda_1 (f^\top \hat e_1)^2 + \lambda_2 (f^\top \hat e_2)^2 + \dots + \lambda_n (f^\top \hat e_n)^2$

This expression is minimised when $f = \hat e_1$.

Proof of the SVD solution

Any $m \times n$ matrix $A$ where $m \ge n$ can be decomposed as
$A_{m \times n} = U_{m \times n} \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix}_{n \times n} V^\top_{n \times n}$
where $U$ is column-orthogonal, $V$ is fully orthogonal, and $\Sigma$ contains the singular values ordered so that $0 \le \sigma_1 \le \sigma_2 \le \dots \le \sigma_n$.

The singular vectors $V$ of $A$ are the same as the eigenvectors of $M = A^\top A$:
$M = A^\top A = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^2 V^\top$

In particular $f = \hat e_1$ is the first column of $V$.

The SVD is usually preferred to the eigenvalue decomposition because it is numerically more stable.

Computing F from 7 points

For the $7 \times 9$ set of equations $Af = 0$ we know that $f$ is in the null space of $A$. This null space is 2-dimensional and hence spanned by two vectors $f_1$ and $f_2$. Since $f$ is determined up to scale, all solutions are given by:
$f = \alpha f_1 + (1 - \alpha) f_2$

Reshaping the vectors results in a family of candidate fundamental matrices
$F = \alpha F_1 + (1 - \alpha) F_2$

To find which one is a proper fundamental matrix, use the non-linear constraint $\det F = 0$. This gives a cubic equation in $\alpha$. Can you see why? The cubic has either one or three real solutions for $\alpha$.
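The 7-point method above can be sketched as follows. This is a minimal sketch, assuming NumPy, noise-free synthetic data and calibrated image coordinates ($K = K' = I$, an assumption made here only for numerical convenience); the function name `seven_point` is illustrative. Since $\det(\alpha F_1 + (1-\alpha) F_2)$ is a cubic polynomial in $\alpha$, its coefficients can be recovered by interpolating four sampled values.

```python
import numpy as np

def seven_point(x1, x2):
    """Return the candidate F matrices (one per real root) from 7 matches."""
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones(7)])
    # With the default full_matrices=True, Vt is 9x9 and its last two rows
    # span the 2-dimensional null space of the 7x9 matrix A.
    _, _, Vt = np.linalg.svd(A)
    F1, F2 = Vt[-1].reshape(3, 3), Vt[-2].reshape(3, 3)
    # det(alpha*F1 + (1-alpha)*F2) is a cubic in alpha: fit its four
    # coefficients through four sample values, then find the roots.
    alphas = np.array([-1.0, 0.0, 1.0, 2.0])
    dets = [np.linalg.det(a * F1 + (1.0 - a) * F2) for a in alphas]
    roots = np.roots(np.polyfit(alphas, dets, 3))
    real_roots = roots[np.abs(roots.imag) < 1e-8].real
    return [a * F1 + (1.0 - a) * F2 for a in real_roots]

# Synthetic scene in calibrated coordinates (illustrative values).
rng = np.random.default_rng(1)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(7, 3))
angle = 0.2
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])
p1 = X
p2 = (R @ X.T).T + t
x1 = p1[:, :2] / p1[:, 2:]
x2 = p2[:, :2] / p2[:, 2:]

candidates = seven_point(x1, x2)
```

This parametrisation cannot represent the direction $F_1 - F_2$ (the limit $\alpha \to \infty$), which a more careful implementation would handle separately.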

A Visual Compass

If the motion of the camera is known to be a pure rotation, then the images are related by a homography
$x' = Hx$
where
$H = K' R K^{-1}$

Algorithm:
- Find correspondences $x_i \leftrightarrow x'_i$.
- Compute $H$ from the correspondences (see B14).
- Use $H$ to register images to a common reference frame to create a panoramic mosaic. [Figure: panoramic mosaic]
- Extract $R$ to find relative rotations.

But of course we cannot recover any scene structure!

Feature detection, matching and the F matrix

So far, we have not discussed matching. The reason is that computation of the fundamental matrix can be incorporated into the matching.

Outline:
- Extract image points as corners. Why corners?
- Obtain initial corner matches using local descriptors.
- Remove outliers and estimate the fundamental matrix $F$ using RANSAC.
- Obtain further corner matches using $F$.
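Returning to the visual compass, the relation $H = K' R K^{-1}$ and the extraction of $R$ from an estimated $H$ can be sketched as follows. This is a minimal sketch, assuming NumPy and exact (noise-free) data; `rotation_from_homography` is a hypothetical helper. Because $H$ is only known up to scale, the sketch removes the scale using $\det R = 1$.

```python
import numpy as np

def rotation_from_homography(H, K1, K2):
    """Recover R from H ~ K2 R K1^{-1}, where H has an unknown scale."""
    M = np.linalg.inv(K2) @ H @ K1
    # M = s * R for some scalar s, and det(R) = 1, so det(M) = s^3.
    return M / np.cbrt(np.linalg.det(M))

K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])
angle = 0.2
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])

H = K2 @ R @ np.linalg.inv(K1)

# A scene point seen before and after a pure rotation (no translation).
X = np.array([0.3, -0.2, 4.0])
x = K1 @ X
x_rot = K2 @ (R @ X)
x_pred = H @ x
pixel_true = x_rot[:2] / x_rot[2]
pixel_pred = x_pred[:2] / x_pred[2]

# Simulate an estimate of H with an arbitrary (here negative) scale.
R_rec = rotation_from_homography(-3.7 * H, K1, K2)
```

With noisy correspondences the rescaled matrix is only approximately a rotation; projecting it onto the nearest rotation (for example via an SVD) would be a natural extra step that is not shown here.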

Why corner points, especially?

Why not use lines, or take a dense pixel-based approach?

The key reason is that the search for matches is no longer 1D when the camera motion is unknown. A 2D region has to be searched. A dense approach is then likely to be too expensive, and matching sections of a line suffers from the aperture problem.

Corners are:
- relatively sparse;
- reasonably cheap to compute;
- well-localized;
- appear quite robustly from frame to frame.

Hence corners are good for matching.

Corner points computed for each frame

Recall that points with distinctively high autocorrelation provide the best chance of deriving a distinctive cross-correlation signal.

[Figure: autocorrelation for 2D (corner), 1D (edge) and uniform patches]

Initial matching

- Extract corners in both images (feature detection).
- For each corner $x$ in $C$, make a list of potential matches $x'$ in a region in $C'$ around $x$ (heuristic).
- Rank the matches by comparing the regions around the corners using cross-correlation.
- Sift them to reconcile forward-backward inconsistencies.

The idea here is not to do too much work, just enough to get some good matches.

Initial matching /ctd

[Figure: source patch compared with target patches (a), (b), (c), etc.]

Initial matching /ctd

[Figure: initial matches; some good matches, some mismatches]

Can still compute $F$ with around 50% mismatches. How?

RANSAC: RANdom SAmple Consensus

Suppose you tried to fit a straight line to data containing outliers, i.e. points which are not properly described by the assumed probability distribution.

The usual methods of least squares are hopelessly corrupted.

Need to detect outliers and exclude them.

Use estimation based on robust statistics. RANSAC was the first, devised by vision researchers, Fischler & Bolles (1981).

RANSAC algorithm for lines

1. For many repeated trials:
   1.1 Select a random sample of two points.
   1.2 Fit a line through them.
   1.3 Count how many other points are within a threshold distance of the line (inliers).
2. Select the line with the largest number of inliers.
3. Refine the line by fitting it to all the inliers (using least squares).

Remarks:
- Sample a minimal set of points for your problem (2 for lines).
- Repeat such that there is a high chance that at least one minimal set contains only inliers (see tutorial sheet).
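The three steps of RANSAC for lines can be sketched in plain Python. This is a minimal sketch under stated assumptions: the synthetic data (line $y = 2x + 1$, 50% outliers), the threshold, the number of trials and all function names are illustrative choices, and the refinement uses a total least squares fit, which is one possible reading of "least squares" in step 3. The helper `trials_needed` uses the standard bound $N \ge \log(1 - p) / \log(1 - w^s)$ for inlier ratio $w$, sample size $s$ and success probability $p$; this formula is not stated on the slides.

```python
import math
import random

def trials_needed(w, s, p):
    """Trials so that, with probability p, one sample of size s is all inliers."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** s))

def line_through(p, q):
    """Line a*x + b*y + c = 0 through p and q, with (a, b) a unit normal."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def fit_line_least_squares(points):
    """Total least squares line (minimises squared perpendicular distances)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # direction of the line
    a, b = -math.sin(theta), math.cos(theta)         # unit normal
    return a, b, -(a * mx + b * my)

def ransac_line(points, threshold, trials, rng):
    best_inliers = []
    for _ in range(trials):
        p, q = rng.sample(points, 2)                 # minimal sample
        if p == q:
            continue
        a, b, c = line_through(p, q)
        # The inlier list includes the two sampled points themselves.
        inliers = [s for s in points
                   if abs(a * s[0] + b * s[1] + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_line_least_squares(best_inliers), best_inliers

rng = random.Random(0)
points = []
for _ in range(50):                                  # inliers near y = 2x + 1
    x = rng.uniform(0.0, 10.0)
    points.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.02)))
for _ in range(50):                                  # gross outliers
    points.append((rng.uniform(0.0, 10.0), rng.uniform(0.0, 21.0)))

(a, b, c), inliers = ransac_line(points, threshold=0.1, trials=200, rng=rng)
slope, intercept = -a / b, -c / b
```

With half the data corrupt and two points per sample, `trials_needed(0.5, 2, 0.99)` gives 17 trials; the 200 trials used here are deliberately generous.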

RANSAC algorithm for lines

[Figure: data 50% corrupt; random samples with support 10 and support 50]

RANSAC algorithm for F

1. For many repeated trials:
   1.1 Select a random sample of seven correspondences.
   1.2 Compute $F$ using the cubic method.
   1.3 Count how many other correspondences are within a threshold distance of the epipolar lines (inliers).
2. Select the $F$ with the largest number of inliers.
3. Refine $F$ by fitting it to all the inliers (using the SVD method).

RANSAC algorithm for H

1. For many repeated trials:
   1.1 Select a random sample of four correspondences.
   1.2 Compute $H$ (as in B14).
   1.3 Count how many other correspondences are within a threshold distance of the predicted locations (inliers).
2. Select the $H$ with the largest number of inliers.
3. Refine $H$ by fitting it to all the inliers, optimizing the reprojection error
   $\min_H \sum_{(x, x') \in \text{Inliers}} d^2(x', Hx) + d^2(H^{-1} x', x)$
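Step 1.3 of the RANSAC algorithm for $F$ needs a distance between a point and its epipolar line. The following is a minimal sketch, assuming NumPy; the one-sided distance from $x'$ to $l' = Fx$, the 1-pixel threshold, the synthetic data and the function names are illustrative assumptions rather than the lecture's exact choices.

```python
import numpy as np

def epipolar_distances(F, x1, x2):
    """Distance (in pixels) of each x2[i] to its epipolar line l' = F x1[i]."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    lines = h1 @ F.T                           # row i is the line F x_i
    num = np.abs(np.sum(lines * h2, axis=1))   # |x'^T F x|
    return num / np.hypot(lines[:, 0], lines[:, 1])

def count_inliers(F, x1, x2, threshold):
    return int(np.sum(epipolar_distances(F, x1, x2) < threshold))

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Synthetic two-view data (illustrative values), with half the matches corrupt.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.1, 0.05])
F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)

X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(60, 3))
p1 = (K @ X.T).T
p2 = (K @ (R @ X.T + t[:, None])).T
x1 = p1[:, :2] / p1[:, 2:]
x2 = p2[:, :2] / p2[:, 2:]
x2[30:] = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(30, 2))  # mismatches

n_inliers = count_inliers(F, x1, x2, threshold=1.0)
```

A mismatch can still fall close to its epipolar line by chance, so `n_inliers` may slightly exceed the 30 correct matches; this is one reason the slides follow up by searching for further matches guided by $F$.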

Correspondences consistent with epipolar geometry

[Figure: epipolar geometry; panels showing initial matches and inliers]

Computing R and t from F

Recall that $F = (K')^{-\top} [t]_\times R K^{-1}$. We now show how to recover $R$ and $t$ from $F$ (given $K$ and $K'$).

1. Compute the essential matrix $E = [t]_\times R = K'^\top F K$.
2. Compute $t$ as the null vector of $E^\top$ (i.e. $E^\top t = 0$). $t$ is determined up to a scaling factor $\mu$; there are two solutions $\pm \mu t$.
3. Compute $R$ from $E$. The algorithm for this step is given later; it returns two solutions $R_1$ and $R_2$.
4. Overall, there are four solutions for the projection matrix:
   $P' = K'[R_1 \mid \mu t], \quad P' = K'[R_2 \mid \mu t], \quad P' = K'[R_1 \mid -\mu t], \quad P' = K'[R_2 \mid -\mu t]$
5. Exclude 3 of these using a visibility test.
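Steps 1 to 4 above can be sketched as follows. This is a minimal sketch, assuming NumPy and an exact $F$; the function names and test values are illustrative. For step 3 it uses the SVD recipe $R_1 = U W V^\top$, $R_2 = U W^\top V^\top$ that the slides present afterwards, with sign flips on $U$ and $V$ (an implementation detail assumed here) so that both candidates are proper rotations. The visibility test of step 5 is not implemented.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return ([R1, R2], t_hat), with t_hat a unit vector defined up to sign."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so U and V can be flipped to have
    # determinant +1; this makes R1 and R2 proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t_hat = U[:, 2]        # left null vector of E, i.e. E^T t_hat = 0
    return [R1, R2], t_hat

# Ground truth used to build a test F (illustrative values).
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.1, 0.05])
F = np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

E = K2.T @ F @ K1                      # step 1: essential matrix
Rs, t_hat = decompose_essential(E)     # steps 2 and 3
# Step 4: the four candidate second cameras P' = K'[R_k | +/- t_hat].
candidates = [K2 @ np.column_stack([Rk, s * t_hat])
              for Rk in Rs for s in (1.0, -1.0)]
```

One of the two rotations in `Rs` matches the true `R`, and `t_hat` matches the true translation direction up to sign, which reflects the four-fold ambiguity listed in step 4.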

The four solutions

The 3D point is in front of both cameras in only one case.

[Figure: the four camera configurations; the point is visible to both C and C' in only one of them]

Note these are computer vision cameras, so to be visible a ray must pass through the image on its way to the optic centre!

Computing $R_{1,2}$ from the essential matrix E (non-examinable)

Recall that $E = [t]_\times R$; we now recover $R$ from $E$.

Algorithm:
1. Compute the Singular Value Decomposition (SVD) of $E$:
   $E = U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} V^\top$
2. Set
   $W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
3. The two solutions are:
   $R_1 = U W V^\top, \qquad R_2 = U W^\top V^\top$

Structure and motion for more than two views

Why bother?
1. Matching becomes more verifiable, as 3D point estimates are available to reproject.
2. 3D point estimates improve as further views over a range of angles are obtained.
3. There is no increase in the degree of ambiguity, though the overall scale ambiguity persists.

Notation for three+ views

For three views let the cameras be $C$, $C'$, $C''$ with projection matrices $P$, $P'$ and $P''$, and with image points $x$, $x'$ and $x''$.

[Figure: 3D point X imaged at x, x', x'' in cameras C, C', C'']

For $m$ views, a point $X_j$ is imaged in the $i$-th camera $C^i$ at $x^i_j = P^i X_j$.

Point correspondence over 3 views

Given the projection matrices and $x \leftrightarrow x'$, how is the point $x''$ found?

Algorithm:
1. Compute the 3D point from $x$ and $x'$.
2. Then re-project using $P''$.

The search in the third image is 0-D, and the size of the search region depends only on uncertainty.

Problem statement: structure and motion

Given: $n$ matching image points $x^i_j$ over $m$ views.

Find: the cameras $P^i$ and the 3D points $X_j$ such that $x^i_j \propto P^i X_j$ by finding:
$\min_{P^i, X_j} \sum_{j=1}^{n} \sum_{i=1}^{m} d^2(x^i_j, P^i X_j)$

This is a serious minimization:
- For each camera, 6 parameters.
- For each 3D point, 3 parameters.
- Total of $6m + 3n - 1$ ($-1$ for scale) parameters overall.

For 50 frames, 1000 points, we have $3.3 \times 10^3$ unknowns!

Problem statement: structure and motion /ctd

The building block is computing correspondences $x^i_j \leftrightarrow x^{i+1}_j$, finding $F^{i,i+1}$ and then the matrices $P^i$, $P^{i+1}$.

Algorithm:
1. Compute interest points in each image.
2. Compute matches between consecutive image pairs $i$, $i+1$.
3. Compute $F^{i,i+1}$. Recover $P^i$, $P^{i+1}$.
4. Compute scene points.
5. Extend correspondences over image triples.
6. Extend correspondences over all images.
7. Optimize over all $P^i$, $X_j$.

[Figure: images 1 to 4 with points $x^1_j, \dots, x^4_j$, cameras $P^1, \dots, P^4$ and the 3D point $X_j$]
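The parameter count and the reprojection cost from the problem statement can be made concrete as follows. This is a minimal sketch, assuming NumPy; the array layouts, the function names and the toy cameras are illustrative assumptions, and no optimisation is performed.

```python
import numpy as np

def num_unknowns(m_views, n_points):
    """6 parameters per camera, 3 per 3D point, minus 1 for the overall scale."""
    return 6 * m_views + 3 * n_points - 1

def reprojection_cost(Ps, Xs, obs):
    """Sum over views i and points j of d^2(x_j^i, P^i X_j).

    Ps: (m, 3, 4) projection matrices, Xs: (n, 3) points,
    obs: (m, n, 2) observed pixel coordinates.
    """
    Xh = np.column_stack([Xs, np.ones(len(Xs))])     # (n, 4) homogeneous
    proj = np.einsum('mij,nj->mni', Ps, Xh)          # (m, n, 3)
    pixels = proj[..., :2] / proj[..., 2:3]
    return float(np.sum((pixels - obs) ** 2))

total = num_unknowns(50, 1000)    # 6*50 + 3*1000 - 1

# Toy example: two cameras, five points, perfect observations.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P0 = K @ np.column_stack([np.eye(3), np.zeros(3)])
P1 = K @ np.column_stack([np.eye(3), np.array([0.5, 0.0, 0.0])])
Ps = np.stack([P0, P1])
rng = np.random.default_rng(0)
Xs = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(5, 3))
Xh = np.column_stack([Xs, np.ones(5)])
proj = np.einsum('mij,nj->mni', Ps, Xh)
obs = proj[..., :2] / proj[..., 2:3]

cost_exact = reprojection_cost(Ps, Xs, obs)
obs_shifted = obs.copy()
obs_shifted[0, 0] += np.array([3.0, 4.0])     # move one observation by 5 px
cost_shifted = reprojection_cost(Ps, Xs, obs_shifted)
```

For 50 frames and 1000 points `total` is 3299, matching the figure of about $3.3 \times 10^3$ unknowns, and shifting a single observation by 5 pixels raises the cost from 0 to 25.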

2d3's Boujou system

Zisserman, Fitzgibbon, Torr, Beardsley

[Figure: original sequence and augmentation]

Batch SFM

- Up to now: batch, offline processing of video sequences.
- Post-production, 3D model reconstruction, etc.

[Figure: all DATA processed together]

Real-time, sequential SFM

- Real-time, sequential, fixed time budget (10s of milliseconds).
- Build and maintain a map, and localise w.r.t. the map.
- Real-time robotics applications, but in simplified 2D environments, specialised sensors, etc.
- Reliable, repeated measurement is crucial: it mitigates against drift, giving repeatable accuracy.

[Figure: DATA + DATA + DATA processed incrementally]

Sequential structure from motion: visual SLAM

Represent the joint distribution over camera and feature positions using a single multi-variate Gaussian:
$x = \begin{pmatrix} x_v \\ y_1 \\ y_2 \\ \vdots \end{pmatrix}, \qquad P = \begin{bmatrix} P_{xx} & P_{xy_1} & P_{xy_2} & \cdots \\ P_{y_1 x} & P_{y_1 y_1} & P_{y_1 y_2} & \cdots \\ P_{y_2 x} & P_{y_2 y_1} & P_{y_2 y_2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$

Use the Kalman Filter (see C4B Mobile Robotics) predict-measure-update framework to propagate uncertainty and fuse measurement data.

Example: real-time, sequential structure from motion

[Figure: loop of state estimate, predict, search, measurement of an image patch, update]

Davison, Reid, Smith, Williams, Klein, et al.
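The predict-measure-update cycle can be sketched with a generic linear Kalman filter on a toy joint state. This is a minimal sketch, assuming NumPy; the one-dimensional camera and feature, the motion and measurement models, the noise values and all names are illustrative assumptions and not the system shown in the lecture, which works with image measurements.

```python
import numpy as np

def kf_predict(x, P, motion, Q):
    """Propagate the state mean and covariance through a linear motion model."""
    return motion @ x, motion @ P @ motion.T + Q

def kf_update(x, P, z, H_obs, R_noise):
    """Fuse a linear measurement z = H_obs x + noise into the estimate."""
    S = H_obs @ P @ H_obs.T + R_noise            # innovation covariance
    gain = P @ H_obs.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + gain @ (z - H_obs @ x)
    P_new = (np.eye(len(x)) - gain @ H_obs) @ P
    return x_new, P_new

# Joint state: camera position x_v and one feature position y_1 (both 1D).
x = np.array([0.0, 5.0])
P = np.diag([0.01, 4.0])               # feature initially very uncertain

motion = np.eye(2)                     # assumed static model for the sketch
Q = np.diag([0.04, 0.0])               # only the camera accumulates noise
x_pred, P_pred = kf_predict(x, P, motion, Q)

H_obs = np.array([[-1.0, 1.0]])        # measure feature relative to camera
R_noise = np.array([[0.01]])
z = np.array([5.2])
x_new, P_new = kf_update(x_pred, P_pred, z, H_obs, R_noise)
```

After the update the feature variance `P_new[1, 1]` drops sharply and the off-diagonal term `P_new[0, 1]` becomes non-zero, which illustrates why the camera and the features are kept in a single joint Gaussian.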