Lecture 9 & 10: Stereo Vision

Dr. Juan Carlos Niebles, Stanford AI Lab
Professor Fei-Fei Li, Stanford Vision Lab

Dimensionality Reduction Machine (3D to 2D)
[Figure: 3D world, 2D image, point of observation]
Figures: Stephen E. Palmer, 2002

Pinhole camera
- It is common to draw the image plane in front of the focal point.
- Moving the image plane merely scales the image.
With focal length f:
$$x' = f\,\frac{x}{z}, \qquad y' = f\,\frac{y}{z}$$

Relating a real-world point to a point on the camera
In homogeneous coordinates (ideal world):
$$P' = \begin{bmatrix} f x \\ f y \\ z \end{bmatrix} = \underbrace{\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{M} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
Intrinsic assumptions:
- Unit aspect ratio
- Optical center at (0,0)
- No skew
Extrinsic assumptions:
- No rotation
- Camera at (0,0,0)

Relating a real-world point to a point on the camera
In homogeneous coordinates:
$$P' = \begin{bmatrix} f x \\ f y \\ z \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = K\,[\,I \;\; 0\,]\,P, \qquad K = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Intrinsic assumptions:
- Unit aspect ratio
- Optical center at (0,0)
- No skew
Extrinsic assumptions:
- No rotation
- Camera at (0,0,0)

Real-world camera
$$P' = K\,[\,I \;\; 0\,]\,P$$
Written out with the intrinsic parameters:
$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha & s & u_0 & 0 \\ 0 & \beta & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
Intrinsic assumptions:
- Optical center at (u0, v0)
- Rectangular pixels
- Small skew
Extrinsic assumptions:
- No rotation
- Camera at (0,0,0)
Slide inspiration: S. Savarese

Real-world camera + real-world transformation
$$P' = K\,[\,R \;\; t\,]\,P$$
$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha & s & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
Intrinsic assumptions:
- Optical center at (u0, v0)
- Rectangular pixels
- Small skew
Extrinsic assumptions:
- Allow rotation
- Camera at (tx, ty, tz)
Slide inspiration: S. Savarese
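The projection P' = K [R t] P can be sketched numerically as follows. This is a minimal illustration, not part of the original slides: the function name `project` and all numeric values of K, R and t are hypothetical, and the skew s is assumed to be zero.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D world points X (N x 3) to pixel coordinates (N x 2) using P' = K [R t] P."""
    X = np.asarray(X, dtype=float)
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous coordinates, N x 4
    M = K @ np.hstack([R, t.reshape(3, 1)])          # 3 x 4 camera matrix
    x = (M @ X_h.T).T                                # each row is (w*u, w*v, w)
    return x[:, :2] / x[:, 2:3]                      # divide by w to get (u, v)

# Hypothetical intrinsics: alpha = beta = 500, s = 0, (u0, v0) = (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # no rotation
t = np.zeros(3)      # camera at the world origin

pixels = project(K, R, t, [[0.0, 0.0, 2.0], [0.2, -0.1, 2.0]])
```

A point on the optical axis lands on the optical center (u0, v0); the second point is shifted by f·x/z and f·y/z, as in the pinhole equations.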

What will we learn today?
- Introduction to stereo vision
- Epipolar geometry: a gentle intro
- Parallel images & image rectification
- Solving the correspondence problem
- Homographic transformation
- Active stereo vision system
Reading:
- [HZ] Chapters: 4, 9, 11
- [FP] Chapters: 10

Recovering 3D from Images
- How can we automatically compute 3D geometry from images?
- What cues in the image provide 3D information?
[Figure: real 3D world, 2D image, point of observation]

Visual Cues for 3D
- Shading
[Figure: Merle Norman Cosmetics, Los Angeles]
Slide credit: J. Hayes

Visual Cues for 3D
- Shading
- Texture
[Figure: The Visual Cliff, by William Vandivert, 1960]
Slide credit: J. Hayes

Visual Cues for 3D
- Shading
- Texture
- Focus
[Figure: from The Art of Photography, Canon]
Slide credit: J. Hayes

Visual Cues for 3D
- Shading
- Texture
- Focus
- Motion
Slide credit: J. Hayes

Visual Cues for 3D
- Shading
- Texture
- Focus
- Motion
- Others: highlights, shadows, silhouettes, inter-reflections, symmetry, light polarization, ...
Shape From X
- X = shading, texture, focus, motion, ...
- We'll focus on the motion cue
Slide credit: J. Hayes

Stereo Reconstruction
The Stereo Problem
- Shape from two (or more) images
- Biological motivation
[Figure: known camera viewpoints]
Slide credit: J. Hayes

Why do we have two eyes?
Cyclops vs. Odysseus
Slide credit: J. Hayes

1. Two is better than one
Slide credit: J. Hayes

Depth from Convergence
Slide credit: J. Hayes

Pinhole camera (recap)
- It is common to draw the image plane in front of the focal point.
- Moving the image plane merely scales the image.
$$x' = f\,\frac{x}{z}, \qquad y' = f\,\frac{y}{z}$$

Triangulation
[Figure: scene point P, image points p and p', camera centers O and O']
Inputs:
1. Camera parameters known
2. Corresponding points p and p' known
Output:
- P is at the intersection of the rays Op and O'p'
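With noisy measurements the two rays rarely intersect exactly, so in practice P is usually found by least squares. The sketch below uses linear (DLT-style) triangulation, which is a swapped-in technique rather than the construction drawn on the slide. The function name `triangulate`, the helper `proj` and the camera matrices are hypothetical.

```python
import numpy as np

def triangulate(M1, M2, p1, p2):
    """Linear triangulation from two 3x4 camera matrices and matching points (u, v)."""
    # Each view gives two linear equations in the homogeneous point X:
    # u * (row 3 of M) - (row 1 of M) = 0 and v * (row 3 of M) - (row 2 of M) = 0
    A = np.array([
        p1[0] * M1[2] - M1[0],
        p1[1] * M1[2] - M1[1],
        p2[0] * M2[2] - M2[0],
        p2[1] * M2[2] - M2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # right singular vector of the smallest singular value
    return X[:3] / X[3]        # back to inhomogeneous 3D coordinates

def proj(M, P):
    """Project a 3D point with camera matrix M and return (u, v)."""
    x = M @ np.append(P, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: K = I, second camera shifted by one unit along x
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P_true = np.array([0.5, 0.2, 4.0])
P_est = triangulate(M1, M2, proj(M1, P_true), proj(M2, P_true))
```

For noise-free correspondences the recovered point coincides with the intersection of the two rays.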

Epipolar geometry
[Figure: scene point P, image points p and p', epipoles e and e', camera centers O and O']
- Epipolar plane
- Baseline
- Epipolar lines
- Epipoles e, e'
  = intersections of the baseline with the image planes
  = projections of the other camera center

Example: Converging image planes

Epipolar Constraint
- Two views of the same object
- Suppose I know the camera positions and camera matrices
- Given a point on the left image, how can I find the corresponding point on the right image?

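[Figure: point p in the left image, candidate scene points P?, epipoles e and e', epipolar line l' in the right image, camera centers O and O']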

Epipolar Constraint
- Potential matches for p have to lie on the corresponding epipolar line l'.
- Potential matches for p' have to lie on the corresponding epipolar line l.

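Epipolar Constraint
$$p \rightarrow M P = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad p' \rightarrow M' P = \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix}$$
$$M = K\,[\,I \;\; 0\,], \qquad M' = K'\,[\,R \;\; T\,]$$
The camera centers O and O' are related by the rotation R and translation T.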

Epipolar Constraint
$$M = K\,[\,I \;\; 0\,], \qquad M' = K'\,[\,R \;\; T\,]$$
K and K' are known (calibrated cameras). Assume K = K' = I, so that
$$M = [\,I \;\; 0\,], \qquad M' = [\,R \;\; T\,]$$

Epipolar Constraint
$T \times (R\,p')$ is perpendicular to the epipolar plane, so
$$p^T \cdot [\,T \times (R\,p')\,] = 0$$

Cross product as matrix multiplication
$$a \times b = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = [a_\times]\,b$$
$[a_\times]$ is a skew-symmetric matrix.

Epipolar Constraint
$$p^T \cdot [\,T \times (R\,p')\,] = 0 \;\Rightarrow\; p^T\,[T_\times]\,R\,p' = 0$$
$E = [T_\times]\,R$ is the essential matrix (Longuet-Higgins, 1981), so the constraint reads $p^T E\,p' = 0$.
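The constraint can be checked numerically. The sketch below assumes K = K' = I and, as an explicit convention chosen for this example, that R and T map coordinates from the second camera frame into the first (P = R P' + T). The helper name `skew` and all numeric values are hypothetical.

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix [a_x] such that skew(a) @ b equals np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

# Hypothetical relative pose: 10 degree rotation about the y axis plus a translation
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([1.0, 0.1, 0.0])

E = skew(T) @ R                       # essential matrix E = [T_x] R

P2 = np.array([0.3, -0.2, 5.0])       # scene point in the second camera frame
P1 = R @ P2 + T                       # same point in the first camera frame (assumed convention)
p = P1 / P1[2]                        # normalized image point in view 1
p_prime = P2 / P2[2]                  # normalized image point in view 2

residual = p @ E @ p_prime            # should vanish up to rounding error
```

The residual is zero for any scene point, because P, T and R P' all lie in the epipolar plane.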

Epipolar Constraint
[Figure: scene point P, image points p and p', epipolar lines l and l', epipoles e and e', camera centers O and O']
- E p' is the epipolar line associated with p' (l = E p')
- E^T p is the epipolar line associated with p (l' = E^T p)
- E is singular (rank two)
- E e' = 0 and E^T e = 0
- E is a 3x3 matrix; 5 DOF

Simplest Case: Parallel images
- Image planes of cameras are parallel to each other and to the baseline
- Camera centers are at same height
- Focal lengths are the same
Slide credit: J. Hayes

Simplest Case: Parallel images
- Image planes of cameras are parallel to each other and to the baseline
- Camera centers are at same height
- Focal lengths are the same
- Then, epipolar lines fall along the horizontal scan lines of the images
Slide credit: J. Hayes

Essential matrix for parallel images
Epipolar constraint: $p^T E\,p' = 0$, with $E = [t_\times]\,R$.
For parallel images, $R = I$ and $t = (T, 0, 0)$, so
$$E = [t_\times]\,R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{bmatrix}$$
Reminder, skew-symmetric matrix:
$$[a_\times] = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix}$$
Slide credit: J. Hayes

Essential matrix for parallel images
Epipolar constraint: $p^T E\,p' = 0$, with $E = [t_\times]\,R$, $R = I$, $t = (T, 0, 0)$:
$$\begin{pmatrix} u & v & 1 \end{pmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{bmatrix} \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} = 0 \;\Rightarrow\; \begin{pmatrix} u & v & 1 \end{pmatrix} \begin{pmatrix} 0 \\ -T \\ T v' \end{pmatrix} = 0 \;\Rightarrow\; T v = T v'$$
The y-coordinates of corresponding points are the same!
Slide credit: J. Hayes

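Triangulation: depth from disparity
[Figure: scene point P at depth z, image points p and p', focal length f, camera centers O and O' separated by baseline B]
$$\text{disparity} = u - u' = \frac{B\,f}{z}$$
Disparity is inversely proportional to depth!
Slide credit: J. Hayes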
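Inverting the relation gives z = B f / disparity. A minimal sketch, assuming the focal length is given in pixels and the baseline in meters; the function name and the numbers are hypothetical, and zero or negative disparities are mapped to infinite depth as one possible convention.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length, baseline):
    """Depth z = B * f / d; entries with d <= 0 are set to infinity."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_length * baseline / d[valid]
    return depth

# Hypothetical rig: f = 800 pixels, B = 0.12 m
z = depth_from_disparity([64.0, 32.0, 0.0], focal_length=800.0, baseline=0.12)
```

Halving the disparity doubles the depth, which is why depth resolution degrades quickly for distant points.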

Stereo image rectification
Slide credit: J. Hayes

Stereo image rectification
Algorithm:
- Re-project image planes onto a common plane parallel to the line between optical centers
- Pixel motion is horizontal after this transformation
- Two transformation matrices, one for each input image reprojection
C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
Slide credit: J. Hayes

Rectification example

Application: view morphing
S. M. Seitz and C. R. Dyer, Proc. SIGGRAPH 96, 1996, 21-30


Removing perspective distortion (rectification)
[Figure: a homography H applied to image points p']

Stereo matching: solving the correspondence problem
Goal: finding matching points between two images

Basic stereo matching algorithm
- For each pixel in the first image
  - Find corresponding epipolar line in the right image
  - Examine all pixels on the epipolar line and pick the best match
  - Triangulate the matches to get depth information
- Simplest case: epipolar lines are scanlines
  - When does this happen?
Slide credit: J. Hayes

Basic stereo matching algorithm
- If necessary, rectify the two stereo images to transform epipolar lines into scanlines
- For each pixel x in the first image
  - Find corresponding epipolar scanline in the right image
  - Examine all pixels on the scanline and pick the best match x'
  - Compute disparity x - x' and set depth(x) = B f / (x - x'), i.e. proportional to 1/(x - x')

Correspondence problem
Let's make some assumptions to simplify the matching problem:
- The baseline is relatively small (compared to the depth of scene points)
- Then most scene points are visible in both views
- Also, matching regions are similar in appearance
Slide credit: J. Hayes

Correspondence search with similarity constraint
[Figure: left and right images with a scanline; plot of matching cost against disparity]
- Slide a window along the right scanline and compare contents of that window with the reference window in the left image
- Matching cost: SSD or normalized correlation
Slide credit: J. Hayes

Correspondence search with similarity constraint
[Figure: left and right images with a scanline; matching cost computed with SSD (sum of squared differences)]

Correspondence search with similarity constraint
[Figure: left and right images with a scanline; matching cost computed with normalized correlation]
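The window search can be sketched as below for a rectified pair. This is an illustrative brute-force version under stated assumptions: grayscale float images, disparity defined as x - x' >= 0, and the window kept inside the image. The function names and the synthetic test images are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two windows (lower is better)."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized correlation between two windows (higher is better, at most 1)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def match_along_scanline(left, right, row, col, half, max_disp):
    """Disparity d minimizing SSD between the window at left[row, col] and right[row, col - d]."""
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:          # candidate window would leave the image
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = ssd(ref, cand)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic pair: the left image is the right image shifted by 5 pixels
rng = np.random.default_rng(0)
right = rng.random((20, 60))
left = np.roll(right, 5, axis=1)
d = match_along_scanline(left, right, row=10, col=30, half=2, max_disp=10)
```

Swapping `ssd` for `ncc` (and maximizing instead of minimizing) makes the cost insensitive to gain and offset changes between the two images.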

Effect of window size
Smaller window (W = 3):
+ More detail
- More noise
Larger window (W = 20):
+ Smoother disparity maps
- Less detail

The similarity constraint
- Corresponding regions in two images should be similar in appearance
- ...and non-corresponding regions should be different
- When will the similarity constraint fail?
Slide credit: J. Hayes

Limitations of similarity constraint
- Textureless surfaces
- Occlusions, repetition
- Specular surfaces
Slide credit: J. Hayes

Results with window search
[Figure panels: left image, right image, window-based matching, ground truth]

The role of the baseline
[Figure panels: small baseline, large baseline]
- Small baseline: large depth error
- Large baseline: difficult search problem
Slide credit: S. Seitz

Problem for wide baselines: Foreshortening
- Matching with fixed-size windows will fail!
- Possible solution: adaptively vary window size
- Another solution: model-based stereo (CS231a)
Slide credit: J. Hayes

What will we learn today?
- Introduction to stereo vision
- Epipolar geometry: a gentle intro
- Parallel images
- Solving the correspondence problem
- Homographic transformation
- Active stereo vision system
Reading:
- [HZ] Chapters: 4, 9, 11
- [FP] Chapters: 10

Reminder: transformations in 2D
Special case from lecture 2 (planar rotation & translation):
$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = H_e \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
- 3 DOF
- Preserve distance (areas)
- Regulate motion of rigid object

Reminder: transformations in 2D
Generic case (rotation in 3D, scale & translation):
$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & a_9 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
- 8 DOF
- Preserve collinearity

Goal: estimate the homographic transformation H between two images
Assumption: given a set of corresponding points.

Goal: estimate the homographic transformation H between two images
Assumption: given a set of corresponding points.
Question: how many points are needed?
Hint: DoF for H? 8!
- At least 4 points (8 equations)

DLT algorithm (Direct Linear Transformation)
$$p'_i = H\,p_i$$

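DLT algorithm (Direct Linear Transformation)
$$p'_i \times H\,p_i = 0 \;\Rightarrow\; A_i\,h = 0$$
$$H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}, \qquad h = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix}$$
- h is the unknown [9x1]
- A_i is a function of the measurements [2x9]
- Each correspondence gives 2 independent equations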

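DLT algorithm (Direct Linear Transformation)
Each correspondence $p_i \leftrightarrow p'_i$ gives $A_i\,h = 0$, with $A_i$ of size 2x9 and h of size 9x1. Stacking N correspondences:
$$A_1 h = 0, \quad A_2 h = 0, \quad \dots, \quad A_N h = 0 \;\Rightarrow\; A_{2N \times 9}\, h_{9 \times 1} = 0$$
Overdetermined homogeneous system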

DLT algorithm (Direct Linear Transformation)
$$A_{2N \times 9}\, h_{9 \times 1} = 0$$
How to solve? Singular Value Decomposition (SVD)!

DLT algorithm (Direct Linear Transformation)
$$A_{2N \times 9}\, h_{9 \times 1} = 0$$
How to solve? Singular Value Decomposition (SVD)!
$$A_{2N \times 9} = U_{2N \times 9}\; \Sigma_{9 \times 9}\; V^T_{9 \times 9}$$
The last column of V gives h, and therefore H.
Why? See p. 593 of AZ.
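The DLT steps can be sketched as follows. This is a minimal, unnormalized version under stated assumptions: h is stacked row by row, the two rows per point come from expanding p' x (H p) = 0, and the result is scaled so that H[2, 2] = 1 (which assumes that entry is non-zero). The function name `estimate_homography` and the test homography are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) with dst ~ H src from N >= 4 point pairs, each given as (u, v)."""
    rows = []
    for (u, v), (up, vp) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Two independent equations from p' x (H p) = 0, with h stacked row-major
        rows.append([0, 0, 0, -u, -v, -1, vp * u, vp * v, vp])
        rows.append([u, v, 1, 0, 0, 0, -up * u, -up * v, -up])
    A = np.array(rows)                      # shape (2N, 9)
    _, _, Vt = np.linalg.svd(A)             # full SVD, so Vt is 9 x 9
    H = Vt[-1].reshape(3, 3)                # last column of V = last row of V^T
    return H / H[2, 2]                      # fix the free scale (assumes H[2, 2] != 0)

# Hypothetical ground-truth homography and five points
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 30]], dtype=float)
src_h = np.column_stack([src, np.ones(len(src))])
dst_h = (H_true @ src_h.T).T
dst = dst_h[:, :2] / dst_h[:, 2:3]

H_est = estimate_homography(src, dst)
```

With noisy points the same code returns the least-squares solution with unit-norm h before rescaling; normalizing the point coordinates first generally improves conditioning.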

DLT algorithm (Direct Linear Transformation)
How to solve $A_{2N \times 9}\, h_{9 \times 1} = 0$?

Clarification about SVD
$$P_{m \times n} = U_{m \times n}\; D_{n \times n}\; V^T_{n \times n}$$
Here U has n orthogonal columns and V is an orthogonal matrix.
- This is one of the possible SVD decompositions
- This is typically used for efficiency
The classic SVD is actually:
$$P_{m \times n} = U_{m \times m}\; D_{m \times n}\; V^T_{n \times n}$$
with U and V both orthogonal.
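In NumPy the two forms correspond to the `full_matrices` flag of `np.linalg.svd`. The small matrix below is a hypothetical example with m = 4 and n = 3.

```python
import numpy as np

P = np.arange(12, dtype=float).reshape(4, 3)      # m = 4, n = 3

# Classic SVD: U is m x m, V^T is n x n, singular values returned as a vector
U_full, d_full, Vt_full = np.linalg.svd(P, full_matrices=True)

# Reduced ("economy") SVD: U is m x n with n orthonormal columns
U_thin, d_thin, Vt_thin = np.linalg.svd(P, full_matrices=False)

# Both forms reproduce P; the reduced one is cheaper when m is much larger than n
P_rebuilt = U_thin @ np.diag(d_thin) @ Vt_thin
```

One caveat when m < n (for example a DLT system built from exactly four points, 8 x 9): the reduced form returns only min(m, n) rows of V^T, so the null-space vector is only available from the full decomposition.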


Epipolar Constraint
[Figure: scene point X, image points x1 and x2, epipolar lines l1 and l2, epipoles e1 and e2, camera centers O1 and O2]
- E x2 is the epipolar line associated with x2 (l1 = E x2)
- E^T x1 is the epipolar line associated with x1 (l2 = E^T x1)
- E is singular (rank two)
- E e2 = 0 and E^T e1 = 0
- E is a 3x3 matrix; 5 DOF
CEE598 Visual Sensing for Civil Infrastructure Eng. & Mgmt., Mani Golparvar-Fard, 2015

Epipolar Constraint
$$P \rightarrow p = M P = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad M = K\,[\,I \;\; 0\,]$$
Here K is unknown.

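Epipolar Constraint
Replace the image points by their normalized versions, $p \rightarrow K^{-1} p$ and $p' \rightarrow K'^{-1} p'$:
$$p^T [T_\times] R\, p' = 0 \;\Rightarrow\; (K^{-1} p)^T [T_\times] R\, K'^{-1} p' = 0 \;\Rightarrow\; p^T K^{-T} [T_\times] R\, K'^{-1} p' = 0 \;\Rightarrow\; p^T F\, p' = 0$$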

Epipolar Constraint
$$p^T F\, p' = 0, \qquad F = K^{-T} [T_\times] R\, K'^{-1}$$
F = Fundamental Matrix (Faugeras and Luong, 1992)

Epipolar Constraint
[Figure: scene point X, image points x1 and x2, epipoles e1 and e2, camera centers O1 and O2]
- F x2 is the epipolar line associated with x2 (l1 = F x2)
- F^T x1 is the epipolar line associated with x1 (l2 = F^T x1)
- F is singular (rank two)
- F e2 = 0 and F^T e1 = 0
- F is a 3x3 matrix; 7 DOF
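These properties can be verified on a synthetic example. The sketch assumes F is built as K1^{-T} [T_x] R K2^{-1} with R, T mapping the second camera frame into the first (X1 = R X2 + T); the function names, intrinsics and pose are hypothetical.

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix [a_x] such that skew(a) @ b equals np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def fundamental_matrix(K1, K2, R, T):
    """F = K1^{-T} [T_x] R K2^{-1}, so that x1^T F x2 = 0 for pixel points x1, x2."""
    return np.linalg.inv(K1).T @ skew(T) @ R @ np.linalg.inv(K2)

# Hypothetical intrinsics (shared by both cameras) and relative pose
K1 = K2 = np.array([[500.0, 0.0, 320.0],
                    [0.0, 500.0, 240.0],
                    [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([1.0, 0.1, 0.0])
F = fundamental_matrix(K1, K2, R, T)

X2 = np.array([0.3, -0.2, 5.0])        # scene point in the second camera frame
X1 = R @ X2 + T                        # same point in the first camera frame
x1 = K1 @ (X1 / X1[2])                 # homogeneous pixel coordinates in image 1
x2 = K2 @ (X2 / X2[2])                 # homogeneous pixel coordinates in image 2

l2 = F.T @ x1                          # epipolar line in image 2 associated with x1
on_line = x2 @ l2                      # x2 lies on l2, so this should vanish

_, s, Vt = np.linalg.svd(F)
e2 = Vt[-1]                            # epipole in image 2: F e2 = 0
```

The smallest singular value of F is zero up to rounding, which is the rank-two property listed above.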

Why is F useful?
- Suppose F is known
- No additional information about the scene and camera is given
- Given a point x on the left image, how can I find the corresponding point on the right image?

Why is F useful?
- F captures information about the epipolar geometry of 2 views + camera parameters
- MORE IMPORTANTLY: F gives constraints on how the scene changes under viewpoint transformation (without reconstructing the scene!)
- Powerful tool in:
  - 3D reconstruction
  - Multi-view object/scene matching

Estimating F
The Eight-Point Algorithm (Longuet-Higgins, 1981) (Hartley, 1995)
$$P \rightarrow p = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad P \rightarrow p' = \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix}, \qquad p^T F\, p' = 0$$

Estimating F
$$p^T F\, p' = 0$$
Let's take 8 corresponding points.


Estimating F
Homogeneous system: $W f = 0$
- If W has rank 8, a non-zero solution f exists (unique up to scale)
- If N > 8, take the least-squares solution by SVD with $\|f\| = 1$; this gives the estimate $\hat{F}$

Rank-2 constraint
$$p^T \hat{F}\, p' = 0$$
The estimated $\hat{F}$ may have full rank ($\det(\hat{F}) \neq 0$), whereas F should have rank 2.
Find F that minimizes the Frobenius norm $\|F - \hat{F}\|$ subject to $\det(F) = 0$.
SVD (again!) can be used to solve this problem.
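The two SVD steps can be sketched together as below. This is an unnormalized eight-point sketch under stated assumptions: f is stacked row by row, each row of W is obtained by expanding p^T F p' = 0, and the synthetic correspondences use coordinates of order one (K = I) so that W is reasonably conditioned. All names and numeric values are hypothetical.

```python
import numpy as np

def eight_point(pts, pts_prime):
    """Estimate a rank-2 F with p^T F p' = 0 from N >= 8 correspondences given as (u, v)."""
    u, v = pts[:, 0], pts[:, 1]
    up, vp = pts_prime[:, 0], pts_prime[:, 1]
    one = np.ones(len(pts))
    # Row i of W multiplies f = (F11, F12, F13, F21, ..., F33)
    W = np.column_stack([u * up, u * vp, u, v * up, v * vp, v, up, vp, one])
    _, _, Vt = np.linalg.svd(W)
    F_hat = Vt[-1].reshape(3, 3)            # least-squares solution with ||f|| = 1
    # Closest rank-2 matrix in Frobenius norm: zero the smallest singular value
    U, s, Vt2 = np.linalg.svd(F_hat)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt2

# Synthetic correspondences (hypothetical geometry, K = I in both views)
rng = np.random.default_rng(1)
X2 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(12, 3))  # points in frame 2
theta = np.deg2rad(8.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([0.5, 0.05, 0.02])
X1 = X2 @ R.T + T                           # same points in frame 1
pts = X1[:, :2] / X1[:, 2:3]
pts_prime = X2[:, :2] / X2[:, 2:3]

F = eight_point(pts, pts_prime)

# Epipolar residuals p^T F p' for every correspondence
p_h = np.column_stack([pts, np.ones(len(pts))])
pp_h = np.column_stack([pts_prime, np.ones(len(pts_prime))])
residuals = np.einsum('ni,ij,nj->n', p_h, F, pp_h)
```

On noise-free data the rank-2 projection changes almost nothing; with noisy pixel measurements it is what guarantees that all epipolar lines meet at a single epipole.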

Data courtesy of R. Mohr and B. Boufama.

Mean errors: 10.0 pixels and 9.1 pixels

Normalization
Is the accuracy in estimating F a function of the reference system in the image plane?
E.g. under a similarity transformation (T = scale + translation):
$$q_i = T\, p_i, \qquad q'_i = T'\, p'_i$$
Does the accuracy in estimating F change if a transformation T is applied?

Normalization
- The accuracy in estimating F does change if a transformation T is applied
- There exists a T for which accuracy is maximized

Homogeneous system: $W f = 0$
- If N > 8, least-squares solution by SVD with $\|f\| = 1$, giving $\hat{F}$
- Entries of W built from products of pixel coordinates (the uv terms) are ~100000, while the last column of W is all ones, so the numerical values are unbalanced
- Small errors in the measurements (u, v) are amplified
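One common choice of T, which is an assumption here since the slides above do not state which transformation to use, is Hartley's normalization: translate the points so their centroid is at the origin and scale them so their mean distance from the origin is sqrt(2). The function name and the pixel coordinates below are hypothetical.

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 similarity T (scale + translation): centroid to origin, mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])

# Hypothetical pixel measurements
pts = np.array([[100.0, 200.0], [400.0, 250.0], [320.0, 480.0], [50.0, 60.0]])
T = normalizing_transform(pts)
q = (T @ np.column_stack([pts, np.ones(len(pts))]).T).T   # q_i = T p_i
```

If F_q is estimated from the normalized points q_i = T p_i and q'_i = T' p'_i, then q^T F_q q' = 0 implies p^T (T^T F_q T') p' = 0, so the estimate in pixel coordinates is F = T^T F_q T'.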

Active stereo (point)
[Figure: scene point P, projector, image point p, camera center O2]
Replace one of the two cameras by a projector
- Single camera
- Projector geometry calibrated
- What's the advantage of having the projector? Correspondence problem solved!

Active stereo (stripe)
[Figure: projector and camera center O]
- Projector and camera are parallel
- Correspondence problem solved!

Active stereo (shadows)
J. Bouguet & P. Perona, 99
S. Savarese, J. Bouguet & Perona, 00
[Figure: light source and camera center O]
- 1 camera, 1 light source
- Very cheap setup
- Calibrated light source

Active stereo (shadows)
J. Bouguet & P. Perona, 99
S. Savarese, J. Bouguet & Perona, 00

Active stereo (color-coded stripes)
L. Zhang, B. Curless, and S. M. Seitz 2002
S. Rusinkiewicz & Levoy 2002
[Figure: projector and camera center O]
- Dense reconstruction
- Correspondence problem again
- Get around it by using color codes

Active stereo (color-coded stripes)
Rapid shape acquisition: projector + stereo cameras
L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming. 3DPVT 2002

Active stereo (stripe)
Digital Michelangelo Project
http://graphics.stanford.edu/projects/mich/
Optical triangulation
- Project a single stripe of laser light
- Scan it across the surface of the object
- This is a very precise version of structured light scanning

Laser scanned models
The Digital Michelangelo Project, Levoy et al.
Slide credit: S. Seitz


Laser scanned models
1.0 mm resolution (56 million triangles)
The Digital Michelangelo Project, Levoy et al.
Slide credit: S. Seitz

What we have learned today
- Introduction to stereo vision
- Epipolar geometry: a gentle intro
- Parallel images & image rectification
- Solving the correspondence problem
- Homographic transformation
- Active stereo vision system
Reading:
- [HZ] Chapters: 4, 9, 11
- [FP] Chapters: 10