Learning Two-View Stereo Matching
Learning Two-View Stereo Matching
Jianxiong Xiao, Jingni Chen, Dit-Yan Yeung, Long Quan
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
The 10th European Conference on Computer Vision (ECCV)
Outline
1 Introduction
2 Semi-supervised Matching Framework
  - Local Label Preference Cost
  - Regional Surface Shape Cost
  - Global Epipolar Geometry Cost
  - Symmetric Visibility Consistency Cost
3 Iterative MV Optimization
4 Learning the Symmetric Affinity Matrix
5 More Details
6 Experiments
Introduction: Stereo Matching between Two Images
- Input: two wide-baseline images of the same static scene, neither calibrated nor rectified.
- Intended for more general applications, such as robust estimation of motion and structure.
Related Work
- Small-baseline matching algorithms: cannot be extended easily when the epipolar lines are not parallel.
- Wide-baseline matching: depends heavily on the epipolar geometry, which has to be provided, often by off-line calibration.
- Sparse matching: the fundamental matrix estimated this way often fits only subsets of the image, not the whole image.
- Region-growing methods: greedy; the discrete growing gives poor results when the two views have quite different pixel scales.
- Learning techniques: the information learned from other, unrelated images is weak, so the quality of the result depends greatly on the training data.
Our Semi-supervised Matching Approach
- Proposes a semi-supervised perspective on the matching problem, without training.
- Utilizes all available information (local, regional, and global) in a single optimization procedure.
- More robust to noise: each label vector is affected not merely by one matched pair but by all pairs with weighted paths to it.
- Capable of handling real-valued labels, the inherent requirement of sub-pixel-accuracy matching.
Three Main Categories of Learning Methods
- Supervised learning: given labeled examples, predict the label of a new point.
- Unsupervised learning: given unlabeled data only, look for interesting structure in it.
- Semi-supervised learning: given both labeled and unlabeled data, predict the labels of the unlabeled points.
Notations
For p = 1 or 2 and q = 3 − p:
- x^p_i: the pixel at coordinate position (s^p, t^p) in the p-th image, with s^p ∈ {1, ..., r^p}, t^p ∈ {1, ..., c^p}, and linear index i = (s^p − 1) c^p + t^p.
- X^p: the input image with n^p = r^p c^p pixels, X^p = (x^p_1, x^p_2, ..., x^p_{n^p})^T.
- x^q_j: the matching point of x^p_i, located at coordinate position (s^q, t^q) in the q-th continuous image space, with s^q, t^q ∈ R.
- Label vector y^p_i = (v^p_i, h^p_i)^T = ((s^1, t^1) − (s^2, t^2))^T ∈ R², representing the position offset from the point in the first image to the point in the second image.
- Label matrix Y^p = (y^p_1, ..., y^p_{n^p})^T and visibility vector O^p = (o^p_1, ..., o^p_{n^p})^T.
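The 1-based pixel indexing convention above, i = (s^p − 1) c^p + t^p, can be sketched as a pair of helper functions (the function names are ours, not from the slides):

```python
def coord_to_index(s, t, c):
    """Map 1-based pixel coordinates (s, t) to the linear index i = (s - 1) * c + t."""
    return (s - 1) * c + t

def index_to_coord(i, c):
    """Invert the mapping: recover 1-based (s, t) from the linear index i."""
    s = (i - 1) // c + 1
    t = (i - 1) % c + 1
    return s, t
```

The two functions are exact inverses of each other, so a label or visibility vector stored in raster order can be addressed either way.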
Smoothness
IDEA: nearby pixels are more likely to have similar label vectors.
- Smoothness assumption ⇒ a graph G = (V, E).
- Two images ⇒ two graphs G^1 = (V^1, E^1) and G^2 = (V^2, E^2).
- N(x^p_i): the set of data points in the neighborhood of x^p_i.
- Affinity matrix W^p: w^p_ij is non-zero iff x^p_i and x^p_j are neighbors in E^p.
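As a concrete illustration of such a graph, the following is a minimal hand-crafted stand-in (the paper instead learns W^p, as described later): a 4-connected grid graph whose edge weights are a Gaussian on intensity difference. The function name and the choice of σ are ours:

```python
import numpy as np

def grid_affinity(image, sigma=0.1):
    """Affinity matrix W for a 4-connected grid graph over an (r, c) intensity image.

    w_ij is non-zero only for neighboring pixels, as in the slide's definition.
    """
    r, c = image.shape
    n = r * c
    W = np.zeros((n, n))
    for s in range(r):
        for t in range(c):
            i = s * c + t
            for ds, dt in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ss, tt = s + ds, t + dt
                if 0 <= ss < r and 0 <= tt < c:
                    j = ss * c + tt
                    # Gaussian on intensity difference (an assumed weight, not the paper's)
                    W[i, j] = np.exp(-(image[s, t] - image[ss, tt]) ** 2 / (2 * sigma ** 2))
    return W
```

Because the weight depends only on the unordered pixel pair, this particular W is symmetric; the learned W^p of the paper is only encouraged to be symmetric.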
Semi-Supervised Setting
- Existing matching techniques such as SIFT are already powerful enough to recover some sparse matched pairs accurately and robustly.
- Labeled data (X^1_l, Y^1_l) and (X^2_l, Y^2_l); unlabeled data (X^1_u, Y^1_u) and (X^2_u, Y^2_u).
- Semi-supervised learning on the graph representation tries to estimate a label matrix Ŷ^p that is consistent with both: the initial incomplete label matrix, and the geometry of the data manifold induced by the graph structure.
Consistency Cost with Initial Labeling
Given a configuration Ŷ^p, consistency with the initial labeling can be measured by
C^p_l(Ŷ^p, O^p) = Σ_{x^p_i ∈ X^p_l} o^p_i ||ŷ^p_i − y^p_i||².
Consistency Cost with Geometry
Consistency with the geometry of the data in the image space, which follows from the smooth-manifold assumption, motivates a penalty term of the form
C^p_s(Ŷ^p, O^p) = (1/2) Σ_{x^p_i, x^p_j ∈ X^p} w^p_ij φ(o^p_i, o^p_j) ||ŷ^p_i − ŷ^p_j||²,
where φ(o^p_i, o^p_j) = (1/2)[(o^p_i)² + (o^p_j)²] penalizes rapid changes in Ŷ^p between points that are close, and enforces smoothness only within visible regions.
Local Label Preference Cost
The local cost is defined as
C^p_d(Ŷ^p, O^p) = Σ_{x^p_i ∈ X^p} [o^p_i ρ^p_i(ŷ^p_i) + (1 − o^p_i) τ^p_i].
- Similarity cost function ρ^p_i(y): the cost of matching pixel x^p_i in one image to the point given by label vector y in the other image space.
- Penalty term τ^p_i = max_{x^p_j ∈ N(x^p_i)} ||x^p_i − x^p_j||: prevents every point from declaring itself invisible simply to escape the matching cost.
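The occlusion penalty τ^p_i and the resulting local cost can be sketched as follows (names are ours; `features` stands for whatever per-pixel descriptor the norm is taken over):

```python
import numpy as np

def occlusion_penalty(features, neighbors):
    # tau_i = max_{j in N(i)} ||x_i - x_j||: the price of declaring pixel i
    # occluded, so that labelling every pixel invisible does not trivially
    # minimise the cost.
    return [max(np.linalg.norm(features[i] - features[j]) for j in neighbors[i])
            for i in range(len(features))]

def local_cost(rho_vals, tau, O):
    # C_d = sum_i [ o_i * rho_i(yhat_i) + (1 - o_i) * tau_i ]
    return sum(o * r + (1 - o) * t for o, r, t in zip(O, rho_vals, tau))
```

Each visible pixel pays its similarity cost ρ_i; each invisible pixel pays τ_i instead, so occlusion is only chosen when matching would be even more expensive.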
Regional Surface Shape Cost: Assumptions
- Shape cues: the shapes of the 3D object surfaces in the scene are very important cues for matching.
- Intuitive approach: reconstruct the 3D surface from two-view geometry. This is unstable, especially when the baseline is not large enough.
- Piecewise-planar patch assumption: since two data points with a high affinity relation are more likely to have similar label vectors, we assume that the label vector of a data point can be linearly approximated by the label vectors of its neighbors.
Reconstruction Cost
The label of a data point can be linearly reconstructed from its neighbors:
y^p_i ≈ Σ_{x^p_j ∈ N(x^p_i)} w^p_ij y^p_j.
The reconstruction cost can be defined as
C_r(Y^p) = Σ_{x^p_i ∈ X^p} ||y^p_i − Σ_{x^p_j ∈ N(x^p_i)} w^p_ij y^p_j||² = ||(I − W^p) Y^p||²_F = tr((Y^p)^T L^p Y^p) = (1/2) Σ_{x^p_i, x^p_j ∈ X^p} a^p_ij ||y^p_i − y^p_j||²,
where
- A^p = W^p + (W^p)^T − (W^p)^T W^p: adjacency matrix,
- D^p: a diagonal matrix containing the row sums of A^p, D^p ≈ I,
- L^p = D^p − A^p: un-normalized graph Laplacian matrix.
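The chain of equalities above can be checked numerically. A sketch (variable names ours), assuming the rows of W^p sum to one as the LLE constraint requires:

```python
import numpy as np

def lle_laplacian(W):
    # A = W + W^T - W^T W  (adjacency); D = diag(row sums of A); L = D - A.
    # When the rows of W sum to 1, the rows of A also sum to 1, so D = I.
    A = W + W.T - W.T @ W
    D = np.diag(A.sum(axis=1))
    return D - A
```

With row-stochastic W, ||(I − W)Y||²_F, tr(Yᵀ L Y), and (1/2) Σ a_ij ||y_i − y_j||² all agree, since A is symmetric by construction.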
LLE Alignment Cost
Goal: align the two 2D manifolds (image spaces) to one 2D manifold (the visible surface). The labeled data (known matched pairs) are accounted for by constraining the mapped coordinates of matched points to coincide. Stack the variables as
Ŷ^p_c = (Ŷ^p_l; Ŷ^p_u; Ŷ^q_u),  X^p_c = (X^p_l; X^p_u; X^q_u),  O^p_c = (O^p_l; O^p_u; O^q_u),
A^p_c =
[ A^p_ll + A^q_ll   A^p_lu   A^q_lu ]
[ A^p_ul            A^p_uu   0      ]
[ A^q_ul            0        A^q_uu ]
Then the regional cost is
C^p_r(Ŷ^1, Ŷ^2, O^1, O^2) = Σ_{x^p_i, x^p_j ∈ X^p_c} (a^p_c)_ij φ((o^p_c)_i, (o^p_c)_j) ||(ŷ^p_c)_i − (ŷ^p_c)_j||².
Global Epipolar Geometry Cost
For x^p_i at position (s^p, t^p), the epipolar line in the other image is
(a^p_i, b^p_i, c^p_i) = (s^p, t^p, 1) F^T_pq.
The squared Euclidean distance from the matched point to this line in the other image space is
d^p_i(y) = (a^p_i s^q + b^p_i t^q + c^p_i)² / ((a^p_i)² + (b^p_i)²),
where y = ((s^1, t^1) − (s^2, t^2))^T. The global cost is
C^p_g(Ŷ^p, O^p) = Σ_{x^p_i ∈ X^p} o^p_i d^p_i(ŷ^p_i).
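The line-and-distance computation is a direct transcription of the two formulas above, keeping the slide's (s, t, 1) coordinate ordering (function names are ours):

```python
import numpy as np

def epipolar_line(F, s, t):
    # (a, b, c) = (s, t, 1) F^T: coefficients of the epipolar line of pixel
    # (s, t) in the other image, using the slide's coordinate convention.
    return np.array([s, t, 1.0]) @ F.T

def epipolar_sq_dist(line, s_q, t_q):
    # Squared Euclidean distance from (s_q, t_q) to the line a*s + b*t + c = 0.
    a, b, c = line
    return (a * s_q + b * t_q + c) ** 2 / (a ** 2 + b ** 2)
```

Summing o_i-weighted distances over all pixels gives C^p_g for a candidate labeling.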
Symmetric Visibility Consistency Cost
If x^p_i in one image is labeled to match a point in the q-th image, then some point in the q-th image must be labeled to match x^p_i:
C^p_v(O^p, Ŷ^q) = β Σ_{x^p_i ∈ X^p} (o^p_i − γ^p_i(Ŷ^q))² + Σ_{x^p_i, x^p_j ∈ X^p} w^p_ij (o^p_i − o^p_j)².
- The γ function indicates whether one or more data points match a point near x^p_i from the other view, according to Ŷ^q.
- The last term enforces smoothness of the occlusion map.
- β controls the strength of the visibility constraint.
Voting for γ
For each point x^q_j at position (s^q, t^q) in X^q with label y^q_j = (v^q_j, h^q_j)^T:
- Place a 2D Gaussian ψ(s, t) on the p-th image, centered at the matched position c_j = (s^p, t^p)^T.
- Summing over all x^q_j gives a mixture of Gaussians Σ_{x^q_j ∈ X^q} ψ_{c_j}(s, t) in the voted image space.
- Truncate it: γ^p(s, t) = min{1, Σ_{x^q_j ∈ X^q} ψ_{c_j}(s, t)}.
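The voting scheme can be sketched as follows (the function name and the isotropic σ are our assumptions):

```python
import numpy as np

def gamma_map(matched_positions, shape, sigma=1.0):
    """Vote each matched position c_j into the p-th image as a 2D Gaussian,
    then truncate the resulting mixture at 1."""
    r, c = shape
    ss, tt = np.meshgrid(np.arange(r), np.arange(c), indexing="ij")
    votes = np.zeros(shape)
    for (sj, tj) in matched_positions:
        votes += np.exp(-((ss - sj) ** 2 + (tt - tj) ** 2) / (2 * sigma ** 2))
    return np.minimum(1.0, votes)  # gamma(s, t) = min{1, sum_j psi_{c_j}(s, t)}
```

The truncation makes γ behave like a soft indicator: a pixel that receives one or more strong votes saturates at 1, no matter how many points map onto it.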
Iterative MV Optimization
The optimization process has two steps:
1 M-step: estimate matching given visibility.
2 V-step: estimate visibility given matching.
M-step: Estimation of Matching Given Visibility
The visibility term C_v imposes two constraints on Ŷ given O:
1 Local constraint: each pixel x^p_i in the p-th image should not match invisible (occluded) points in the other image.
2 Global constraint: for each visible point in the q-th image, at least one data point in the p-th image should match it.
In the M-step, we approximate the visibility term by considering only the local constraint, which can be incorporated into the similarity function ρ^p_i(y) in C_d.
M-step: Estimation of Matching Given Visibility (cont'd)
Let Ŷ = ((Ŷ^1)^T, (Ŷ^2)^T)^T. The cost function is
C_M(Ŷ) = Σ_{p=1,2} (λ_l C^p_l + λ_s C^p_s + λ_d C^p_d + λ_r C^p_r + λ_g C^p_g) + ε||Ŷ||²,
where ε||Ŷ||² is a small regularization term that prevents degenerate situations. For fixed O^1 and O^2, the cost is minimized by setting the derivative with respect to Ŷ to zero, since the second derivative is a positive definite matrix.
V-step: Estimation of Visibility Given Matching
Let O = ((O^1)^T, (O^2)^T)^T. The cost function is
C_V(O) = Σ_{p=1,2} (λ_l C^p_l + λ_s C^p_s + λ_d C^p_d + λ_r C^p_r + λ_g C^p_g + λ_v C^p_v) + ε||O||².
For fixed Ŷ^1 and Ŷ^2, the cost is minimized by setting the derivative with respect to O to zero, since the second derivative is a positive definite matrix.
Solving the System of Linear Equations
By the way W^p and the cost functions are defined, the coefficient matrix is strictly diagonally dominant and positive definite. Hence both Gauss-Seidel and conjugate gradient iterations converge to the solution of the linear system with a theoretical guarantee. A GPU is helpful for speeding this up.
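A minimal Gauss-Seidel iteration, of the kind the slide refers to, looks like this (a generic sketch, not the paper's GPU implementation):

```python
import numpy as np

def gauss_seidel(A, b, iters=100):
    """Solve A x = b by Gauss-Seidel sweeps.

    Converges when A is strictly diagonally dominant, which the slide argues
    holds for the coefficient matrix of this system.
    """
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            # Use already-updated entries x[:i] and old entries x[i+1:]
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```

In practice one would stop when the residual ||A x − b|| falls below a tolerance rather than after a fixed number of sweeps.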
Learning the Affinity Matrix
- Directly defining the matrix: no reliable approach for model selection when only very few labeled points are available.
- Learning the matrix: more reliable and stable.
Manifold Assumptions
- Smooth-manifold and linear-reconstructability assumptions for the manifold in image space.
- The label space and the image space share the same local linear reconstruction weights.
The linear reconstruction weight matrix W^p is obtained by minimizing the energy
E_{W^p} = Σ_{x^p_i ∈ X^p} E_{x^p_i},  E_{x^p_i} = ||x^p_i − Σ_{x^p_j ∈ N(x^p_i)} w^p_ij x^p_j||² = Σ_{x^p_j, x^p_k ∈ N(x^p_i)} w^p_ij G^i_jk w^p_ik,
where G^i_jk = (x^p_i − x^p_j)^T (x^p_i − x^p_k). To avoid the undesirable contribution of negative weights, we enforce
Σ_{x^p_j ∈ N(x^p_i)} w^p_ij = 1,  w^p_ij ≥ 0.
Quadratic Programming Objective Function
min_{W^p} Σ_{x^p_i ∈ X^p} Σ_{x^p_j, x^p_k ∈ N(x^p_i)} w^p_ij G^i_jk w^p_ik + κ Σ_{ij} (w^p_ij − w^p_ji)²   (1)
s.t. ∀ x^p_i ∈ X^p: Σ_{x^p_j ∈ N(x^p_i)} w^p_ij = 1, w^p_ij ≥ 0,
where Σ_{ij} (w^p_ij − w^p_ji)² is a penalty term that encourages w^p_ij and w^p_ji to be similar.
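For a single pixel, the core of this problem is the classic LLE weight computation over the local Gram matrix. The sketch below solves the sum-to-one problem in closed form and enforces nonnegativity by a crude clip-and-renormalize, standing in for the quadratic program the slide actually solves (names and the regularization constant are ours):

```python
import numpy as np

def lle_weights(x_i, nbrs, reg=1e-3):
    """Weights w minimising ||x_i - sum_j w_j nbrs[j]||^2 s.t. sum_j w_j = 1.

    Uses the local Gram matrix G_jk = (x_i - x_j)^T (x_i - x_k), regularised
    for stability when neighbors are nearly collinear.
    """
    Z = x_i - nbrs                               # rows: x_i - x_j
    G = Z @ Z.T
    G = G + reg * np.trace(G) * np.eye(len(G))   # stabilising regularisation
    w = np.linalg.solve(G, np.ones(len(G)))
    w = np.clip(w / w.sum(), 0.0, None)          # crude stand-in for w >= 0 in the QP
    return w / w.sum()
```

The symmetry penalty κ Σ (w_ij − w_ji)² couples the per-pixel problems together, which is why the paper solves one joint QP instead of this per-pixel closed form.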
Label Initialization
1 For each image, identify the occlusion boundaries and depth ordering in the scene using [Hoiem et al. 2007].
2 Use SIFT key points and a nearest-neighbor algorithm to obtain an initial matching.
3 Enforce one-to-one cross-consistency constraints on the matching.
4 Apply discrete region growing [Kannala and Brandt 2007].
5 Interpolate the unmatched part by estimating local homography transformations.
6 Obtain the initial visibility matrix.
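The one-to-one cross-consistency check in step 3 amounts to keeping only mutual nearest neighbors between the two descriptor sets. A minimal sketch, assuming the SIFT descriptors are already extracted into two arrays (the function name and interface are illustrative, not the paper's code):

```python
import numpy as np

def mutual_nearest_matches(desc1, desc2):
    """One-to-one cross-consistency: keep (i, j) only if j is the nearest
    neighbour of i in image 2 AND i is the nearest neighbour of j in image 1."""
    # Pairwise squared distances between the two descriptor sets.
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1)
    nn12 = d.argmin(axis=1)   # best match in image 2 for each point i
    nn21 = d.argmin(axis=0)   # best match in image 1 for each point j
    return [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]
```

Because each point can be the mutual nearest neighbor of at most one point in the other image, the surviving matches are one-to-one by construction.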
65 More Details: Computing the Similarity Cost Function
Our algorithm works with some labeled data in a semi-supervised manner through the consistency cost C_l; the local cost C_d plays only an auxiliary role.
Unlike traditional unsupervised matching, our framework does not rely heavily on the similarity function ρ_i^p(y).
For efficient computation, we sample ρ_i^p(y) = exp(−‖x_i^p − x_j^q‖² / (2σ²)) only at some integer combinations of h and v.
We normalize the largest sampled value to 1, and then fit ρ_i^p(y) with a continuous and differentiable quadratic function, i.e. ρ_i^p(y) = ((v − v_o)² + (h − h_o)²) / (2σ²), where (v_o, h_o) and σ are the center and spread of the parabola for x_i^p.
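Since the sampled values are of the form exp(−d²/(2σ²)), taking the negative log turns the parabola fit into an ordinary linear least-squares problem. A sketch of this fitting step under that assumption (`fit_quadratic_cost` and its input format are illustrative, not the paper's implementation):

```python
import numpy as np

def fit_quadratic_cost(samples):
    """Fit -log rho = ((v - v_o)^2 + (h - h_o)^2) / (2 sigma^2) to sampled
    similarity values. `samples` is a list of (v, h, rho) with rho in (0, 1].
    Returns (v_o, h_o, sigma)."""
    v, h, rho = np.array(samples).T
    y = -np.log(rho)                    # cost values to fit
    # y = a*(v^2 + h^2) + b*v + c*h + e is linear in (a, b, c, e),
    # with a = 1/(2 sigma^2), b = -v_o/sigma^2, c = -h_o/sigma^2.
    A = np.stack([v**2 + h**2, v, h, np.ones_like(v)], axis=1)
    a, b, c, _ = np.linalg.lstsq(A, y, rcond=None)[0]
    sigma = np.sqrt(1.0 / (2.0 * a))
    return -b / (2.0 * a), -c / (2.0 * a), sigma
```

The recovered center (v_o, h_o) and spread σ then define the continuous, differentiable cost used in the optimization.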
66 More Details: The Complete Procedure
1 Compute the depth and occlusion-boundary images and the feature vectors.
2 Compute a sparse matching by SIFT with the confidence penalty τ, then interpolate the sparse matches using the depth information to obtain an initial solution.
3 Learn the affinity matrices W_1 and W_2.
4 While the cost change between two iterations ≥ threshold:
  1 Estimate the fundamental matrix F, and reject outliers to obtain a subset of label data,
  2 Compute the parameters of the similarity cost function ρ and the epipolar cost function d,
  3 Estimate matching given visibility,
  4 Compute the γ map,
  5 Estimate visibility given matching.
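The alternating loop in step 4 can be sketched as a simple driver that keeps re-estimating matching and visibility until the total cost stops changing. This only illustrates the control flow; the stage names in the `steps` dict are hypothetical placeholders for the paper's actual estimators:

```python
def iterative_mv_optimization(init_matching, init_visibility, steps,
                              max_iter=20, tol=1e-4):
    """Alternate M-step (matching given visibility) and V-step (visibility
    given matching) until the cost change falls below `tol`."""
    matching, visibility = init_matching, init_visibility
    prev_cost = float('inf')
    for _ in range(max_iter):
        F, inliers = steps['estimate_F'](matching)        # fit F, reject outliers
        params = steps['fit_costs'](inliers)              # rho and d parameters
        matching = steps['m_step'](visibility, F, params) # matching given visibility
        gamma = steps['gamma_map'](matching, F)
        visibility = steps['v_step'](matching, gamma)     # visibility given matching
        cost = steps['total_cost'](matching, visibility)
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return matching, visibility
```

Re-estimating F inside the loop is what lets the epipolar geometry and the dense matching refine each other instead of relying on a fixed off-line calibration.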
75 Experiments: Outline
1 Introduction
2 Semi-supervised Matching Framework
  Local Label Preference Cost
  Regional Surface Shape Cost
  Global Epipolar Geometry Cost
  Symmetric Visibility Consistency Cost
3 Iterative MV Optimization
4 Learning the Symmetric Affinity Matrix
5 More Details
6 Experiments
76 Experiments
Set the parameters to favor C_l and C_g for the M-step and C_v for the V-step; the parameters are tuned manually.
The intensity value is set to the norm of the label vector, i.e. ‖y‖.
For visualization, we scale the intensity to the range [0, 200], and only visible matches are shown, i.e. those with o ≥ 0.5.
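The visualization rule on this slide can be sketched as a small array operation: rescale the label-vector norms into [0, 200] and blank out pixels whose visibility indicator is below 0.5 (`visualize_matching` and its array inputs are assumptions for illustration, not the paper's code):

```python
import numpy as np

def visualize_matching(y_norm, visibility, o_thresh=0.5, max_intensity=200):
    """Map per-pixel label-vector norms ||y|| into [0, max_intensity] and
    show only pixels whose visibility value o is at least o_thresh."""
    lo, hi = y_norm.min(), y_norm.max()
    scaled = (y_norm - lo) / max(hi - lo, 1e-12) * max_intensity
    return np.where(visibility >= o_thresh, scaled, 0.0)
```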
77 Experiments: More Results (figures)
78 Experiments: 3D Reconstruction from 3 Views
(earlier steps shown in figures)
3 Projective reconstruction [Quan 1995].
4 Metric upgrade and bundle adjustment [Hartley and Zisserman 2004].
5 Feature tracks with large reprojection errors are considered outliers.
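The outlier-rejection rule in step 5 can be sketched as computing each track's reprojection error in every view and thresholding the worst one. A minimal illustration, assuming known camera matrices and homogeneous 3D points (`reject_outlier_tracks` and the pixel threshold are hypothetical, not the paper's settings):

```python
import numpy as np

def reject_outlier_tracks(P_list, X, x_obs, thresh=2.0):
    """P_list: list of 3x4 camera matrices; X: Nx4 homogeneous 3D points;
    x_obs: VxNx2 observed image points. Returns a boolean inlier mask that
    is False for tracks whose max reprojection error exceeds `thresh`."""
    errors = []
    for P, obs in zip(P_list, x_obs):
        proj = (P @ X.T).T                 # project points into this view
        proj = proj[:, :2] / proj[:, 2:3]  # homogeneous -> pixel coordinates
        errors.append(np.linalg.norm(proj - obs, axis=1))
    return np.max(errors, axis=0) <= thresh
```

In a bundle-adjustment pipeline the surviving tracks would then be re-optimized, while the rejected ones are dropped from the reconstruction.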
83 Experiments: 3D Reconstruction from 3 Views (Cont'd)
84 Experiments: Application to Structure from Motion
85 Q & A
More informationRectification and Distortion Correction
Rectification and Distortion Correction Hagen Spies March 12, 2003 Computer Vision Laboratory Department of Electrical Engineering Linköping University, Sweden Contents Distortion Correction Rectification
More informationStereo. 11/02/2012 CS129, Brown James Hays. Slides by Kristen Grauman
Stereo 11/02/2012 CS129, Brown James Hays Slides by Kristen Grauman Multiple views Multi-view geometry, matching, invariant features, stereo vision Lowe Hartley and Zisserman Why multiple views? Structure
More informationVision par ordinateur
Epipolar geometry π Vision par ordinateur Underlying structure in set of matches for rigid scenes l T 1 l 2 C1 m1 l1 e1 M L2 L1 e2 Géométrie épipolaire Fundamental matrix (x rank 2 matrix) m2 C2 l2 Frédéric
More informationFeature Trajectory Retrieval with Application to Accurate Structure and Motion Recovery
Feature Trajectory Retrieval with Application to Accurate Structure and Motion Recovery Kai Cordes, Oliver M uller, Bodo Rosenhahn, J orn Ostermann Institut f ur Informationsverarbeitung Leibniz Universit
More informationRobust Geometry Estimation from two Images
Robust Geometry Estimation from two Images Carsten Rother 09/12/2016 Computer Vision I: Image Formation Process Roadmap for next four lectures Computer Vision I: Image Formation Process 09/12/2016 2 Appearance-based
More informationWeek 2: Two-View Geometry. Padua Summer 08 Frank Dellaert
Week 2: Two-View Geometry Padua Summer 08 Frank Dellaert Mosaicking Outline 2D Transformation Hierarchy RANSAC Triangulation of 3D Points Cameras Triangulation via SVD Automatic Correspondence Essential
More informationFeature Based Registration - Image Alignment
Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros http://graphics.cs.cmu.edu/courses/15-463/2007_fall/463.html
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 10 130221 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Canny Edge Detector Hough Transform Feature-Based
More informationLocal Features Tutorial: Nov. 8, 04
Local Features Tutorial: Nov. 8, 04 Local Features Tutorial References: Matlab SIFT tutorial (from course webpage) Lowe, David G. Distinctive Image Features from Scale Invariant Features, International
More informationComputer Vision I - Algorithms and Applications: Multi-View 3D reconstruction
Computer Vision I - Algorithms and Applications: Multi-View 3D reconstruction Carsten Rother 09/12/2013 Computer Vision I: Multi-View 3D reconstruction Roadmap this lecture Computer Vision I: Multi-View
More informationInstance-level recognition
Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction
More informationInstance-level recognition part 2
Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,
More informationImage Transfer Methods. Satya Prakash Mallick Jan 28 th, 2003
Image Transfer Methods Satya Prakash Mallick Jan 28 th, 2003 Objective Given two or more images of the same scene, the objective is to synthesize a novel view of the scene from a view point where there
More informationMulti-View Stereo for Static and Dynamic Scenes
Multi-View Stereo for Static and Dynamic Scenes Wolfgang Burgard Jan 6, 2010 Main references Yasutaka Furukawa and Jean Ponce, Accurate, Dense and Robust Multi-View Stereopsis, 2007 C.L. Zitnick, S.B.
More informationMultiple View Geometry in Computer Vision Second Edition
Multiple View Geometry in Computer Vision Second Edition Richard Hartley Australian National University, Canberra, Australia Andrew Zisserman University of Oxford, UK CAMBRIDGE UNIVERSITY PRESS Contents
More informationNon-linear dimension reduction
Sta306b May 23, 2011 Dimension Reduction: 1 Non-linear dimension reduction ISOMAP: Tenenbaum, de Silva & Langford (2000) Local linear embedding: Roweis & Saul (2000) Local MDS: Chen (2006) all three methods
More informationComputer Vision Lecture 17
Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics 13.01.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar in the summer semester
More informationMotion Estimation (II) Ce Liu Microsoft Research New England
Motion Estimation (II) Ce Liu celiu@microsoft.com Microsoft Research New England Last time Motion perception Motion representation Parametric motion: Lucas-Kanade T I x du dv = I x I T x I y I x T I y
More information