Learning-based Localization

Size: px
Start display at page:

Download "Learning-based Localization"

Transcription

1 Learning-based Localization Eric Brachmann ECCV 2018 Tutorial on Visual Localization - Feature-based vs. Learned Approaches Torsten Sattler, Eric Brachmann

2 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 2

3 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 3

4 Machine Learning Monday Wednesday Saturday Sunday Wednesday y = f(i, w) Saturday Sunday Training Set 4

5 Convolutional Neural Networks w 1 5

6 Convolutional Neural Networks w 1 6

7 Convolutional Neural Networks w 1 f 1 I, w 1 7

8 Convolutional Neural Networks w w 2 f(x) x f 2 f 1 I, w 1, w 2 8

9 Convolutional Neural Networks Sunday Saturday Wednesday Monday y = f 3 (f 2 f 1 I, w 1, w 2, w 3 ) Sunday l(y) δl δw 1 = δl δf 3 δf 3 δf 2 δf 2 δf 1 δf 1 δw 1 9

10 Convolutional Neural Networks w [Long et al., Fully convolutional networks for semantic segmentation. CVPR15] 10

11 Convolutional Neural Networks w [12 Scenes Dataset: 11

12 Convolutional Neural Networks [Kokkinos, UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory, CVPR17] 12

13 Convolutional Neural Networks Advantages: Expensive but Parallelizable Powerful due to End-To-End Training Image I Feature Extraction (hand-crafted or learned) Classification (hand-crafted or learned) Output Image I Learned Output Disadvantages: Interpretability Training Data 13

14 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 14

15 Random Forests g awake (w 1 ) = 0.5 g smile w 2 = 0.7 g hat (w 3 ) = 0.0 g awake > 0 g smile > 0 Sunday Saturday Monday y = f(i, w) 15

16 Random Forests g awake > 0 Monday Saturday Sunday Saturday Monday Sunday 16

17 Random Forests g awake > 0 Monday Saturday Sunday g smile > 0 Sunday Monday Wednesday Saturday Sunday Saturday Monday Wednesday Saturday Sunday Monday Monday Wednesday Saturday Sunday 17

18 Random Forests [Shotton et al., Real-Time Human Pose Recognition in Parts from Single Depth Images, CVPR11] 18

19 Random Forests Advantages Fast Good Performance Need less training data Disadvantages No end-to-end training* Hand-crafted features* Interpretability (Andrej Karpathy, t-sne embeddings of IMAGENET) Rule 34 (of computer science after 2012): There is a paper about it, no exceptions. * Not in [Kontschieder et al., Deep Neural Decision Forests, ICCV15] 19

20 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 20

21 Camera Pose Regression r 1 r 2 r 3 t 1 t 2 t 3 = መh Annotated Training Data Input: RGB Image Feed-Forward CNN Output: Pose [7 Scenes Dataset: 21

22 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 22

23 Pose Parametrization More infos about the wonderful world of SO(3) in e.g. [Hartley et al., Rotation Averaging, IJCV13] መh SE(3) with SE 3 = { R t 0 1 R SO 3, t R3 } R SO 3 = {R R 3 3 RR T = I, det R = 1} መh Rotation Matrix R = r 1 r 2 r 3 r 4 r 5 r 6 r 7 r 8 r 9 Unit Quaternion e.g. [1] q = (c, v 1, v 2, v 3 ) Axis-Angle e.g. [2] log R = θ u = (u 1, u 2, u 3 ) Log Unit Quaternion e.g. [3] log q = 2 log R [4] Enforce/Map: RR T = I, det R = 1 Enforce/Map: q = 1 Enforce/Map: Enforce/Map: [1] Kendall et al., PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV15 [2] Brachmann et al., DSAC - Differentiable RANSAC for Camera Localization, CVPR17 [3] Brahmbhatt et al., Geometry-Aware Learning of Maps for Camera Localization, CVPR18 [4] Sola, Quaternion kinematics for the error-state Kalman filter,

24 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 24

25 Pose Loss Functions h = (R, t) l t, t = t t θ How to measure rotation error? Angular Distance [1]: θ R, R = log(r R T ) R R q θ Quaternion Distance[2]: q q Angle-Axis Distance, Log Quaternion Distance[3]: log R log R q q q log R log R log TR log TR [4] [1] Brachmann et al., DSAC - Differentiable RANSAC for Camera Localization, CVPR17 [2] Kendall et al., PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV15 [3] Brahmbhatt et al., Geometry-Aware Learning of Maps for Camera Localization, CVPR18 [4] Hartley et al., Rotation Averaging, IJCV13 25

26 Pose Loss Functions How to combine rotation error and translation error? Implicit/Metric [1]: Hand-Tuned [2]: l β h, h = l t, t + βl R, R Self-Tuned [3]: l h, h = l t, t + l R, R or max(l t, t, l R, R ) l σ 2 h, h = l t, t exp s t + s t + l R, R exp s R + s R [1] Brachmann et al., DSAC - Differentiable RANSAC for Camera Localization, CVPR17 [2] Kendall et al., PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV15 [3] Kendall and Cipolla, Geometric Loss Functions for Camera Pose Regression with Deep Learning, CVPR 2017 s = log σ 2 26

27 Pose Loss Functions How to combine rotation error and translation error? Measure the reprojection error [1]: l π h, h = v M π h, v π(h, v) [1] Kendall and Cipolla, Geometric Loss Functions for Camera Pose Regression with Deep Learning, CVPR

28 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 28

29 PoseNet Discussion [7 Scenes Dataset: PoseNet [1]: ~45cm, 10 (GoogLeNet, Quaternions, l β ) Spatial LSTM [2]: ~31cm, 10 (GoogLeNet+LSTM, Quaternions, l β ) PoseNet [3]: ~23cm, 8 (GoogLeNet, Quaternions, l σ 2) PoseNet [3]: ~23cm, 8 (GoogLeNet, Quaternions, l π ) PoseNet [4]: ~23cm, 8 (ResNet, Quaternions, l π ) PoseNet [4]: ~22cm, 8 (ResNet, log Quat., l σ 2) MapNet+ [4]: ~19cm, 7 (ResNet, log Quat., l σ 2+l Rel ) [1] PoseNet: A Convolutional Network for Real-Time 6-DoF Camera Localization, Kendall et al., ICCV 2015 [2] Image-Based Localization with Spatial LSTMs, Walch et al., ICCV 2017 [3] Geometric Loss Functions for Camera Pose Regression with Deep Learning, Kendall and Cipolla, CVPR 2017 [4] Geometry-Aware Learning of Maps for Camera Localization, Brahmbhatt et al., CVPR18 [5] Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization, Sattler et al., PAMI 2017 Sparse Features: ~5cm, 2 Advantages PoseNet: Simple End-To-End Learning Fast 29

30 Side Comment: Visual Odometry Pose Regression seems to work very well for estimating relative poses [1] Additional Input: Pose estimate of last frame Tracking Accuracy: ~5cm, 4 [1] Deep Auxiliary Learning For Visual Localization And Odometry, Valada et al., ICRA

31 Feature-Based Pipeline [7 Scenes Dataset: Image I Feature Extraction Feature Matching Estimate Pose መh = PnP({p i, y i }) 6D Pose መh Image Coordinates p i Scene Coordinates y i 31

32 Feature-Based Pipeline [7 Scenes Dataset: Image I Feature Extraction Feature Matching Estimate Pose መh = PnP({p i, y i }) 6D Pose መh Image Coordinates p i Scene Coordinates y i 32

33 Feature-Based Pipeline [7 Scenes Dataset: Image I Feature Extraction Feature Matching RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh Image Coordinates p i Scene Coordinates y i 33

34 Task Knowledge: Local Invariance Training Image Test Images መh?????? [7 Scenes Dataset: 34

35 Task Knowledge: The Camera Model y i p i Estimate Pose መh = PnP({p i, y i }) C Camera Coordinate System Scene Coordinate System Image Coordinate p i = Chy i Camera Matrix Pose Scene Coordinate 35

36 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 36

37 Feature-Based Pipeline Image I Feature Extraction Feature Matching RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh Image Coordinates p i Scene Coordinates y i [7 Scenes Dataset: 37

38 Scene Coordinate Regression [Sho13] Image I (RGB-D) Scene Coordinate Regression RANSAC Estimate Poses (Kabsch) Score Poses (Inliers) 6D Pose መh p i R 2 y i R 3 [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 38

39 Scene Coordinate Regression Forest Features: Depth Adapted Pixel Differences f DA RGB p = I RGB p + δ 1 I D (p), c 1 I RGB (p + δ 2 I D (p), c 2) f p < τ f DA D p = I D p + δ 1 I D (p) I D (p + δ 2 I D (p) ) Split Score: Reduction of Spatial Variance S L S R V S L, S R = S L y തy L + S R y S L y S R y തy R 39

40 Scene Coordinate Regression Forest y pred f S Mean-Shift f S X X 40

41 Scene Coordinate Regression Results 41

42 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% RGB-D [Sho13] 72.6% [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 42

43 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% RGB-D [Sho13] 72.6% [Val15] 91.3% yp(y) pred f S X [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Val15] Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization, Valentin et al., CVPR 15 43

44 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% RGB-D [Sho13] 72.6% [Val15] 91.3% [Cal17] 90.7% f S f S f S X [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Val15] Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization, Valentin et al., CVPR 15 [Cav17] On-the-Fly Adaptation of Regression Forests for Online Camera Localization, Cavallari et al., CVPR 17 X X 44

45 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% [Bra16] 55.2% RGB-D [Sho13] 72.6% [Val15] 91.3% [Cal17] 90.7% [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Val15] Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization, Valentin et al., CVPR 15 [Cav17] On-the-Fly Adaptation of Regression Forests for Online Camera Localization, Cavallari et al., CVPR 17 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 45

46 Scene Coordinate Regression Forests Advantages: Fast Good Results Training On-The-Fly V S L, S R = S L y തy L + S R y S L y S R y തy R Disadvantages: Needs 3D Model for Training No End-To-End Training 46

47 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 47

48 Scene Coordinate Regression with a CNN Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh p i R 2 y i R 3 48

49 Random Forest vs. CNN Forest Prediction: CNN Prediction: Ground Truth: Pose Estimation Succeeds (< 5cm, 5 ) Pose Estimation Fails (> 5cm, 5 ) 49

50 Random Forest vs. CNN Frequency CNN Prediction: 0,14 Random Forest CNN 0,12 0,1 Inlier Peaks 0,08 0,06 Inlier Threshold Forest Outliers 0,04 Obj. Corrd. Error < 10cm %Pose Err. < 5cm5 Random Forest ~15% 55.2% CNN ~25% 17.2% 0, Object Coordinate Error (mm) Optimized: l L2 y, y = y y 2 50

51 Scene Coordinate Loss Functions l L2 l L1 %Obj. Coor. Err. < 10cm %Pose Err. < 5cm5 l Huber l Tukey Random Forest ~15% 55.2% CNN (L 2 ) ~25% 17.2% CNN (L 1 ) ~45% 55.9% CNN (Huber) ~45% 54.4% CNN (Tukey) ~45% 52.1% 51

52 Scene Coordinate Loss Functions Frequency Random Forest CNN (L 2 ) CNN (L 1 ) 0,14 0,12 %Obj. Coor. Err. < 10cm %Pose Err. < 5cm5 0,1 Random Forest ~15% 55.2% 0,08 0,06 0,04 0,02 Inlier Threshold CNN (L 2 ) ~25% 17.2% CNN (L 1 ) ~45% 55.9% CNN (Huber) ~45% 54.4% CNN (Tukey) ~45% 52.1% Object Coordinate Error (mm) 52

53 Scene Coordinate Regression Nets Advantages: Fast Good Results Disadvantages: Needs 3D Model for Training No End-To-End Training Advantages: Fast Better Results Disadvantages: Needs 3D Model for Training End-To-End Training? 53

54 Even Better Loss Function? Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh Optimized: l L1 y, y = y y l መh, h DSAC Differentiable RANSAC [Bra17] [Bra17] DSAC Differentiable RANSAC for Camera Localization Brachmann et al., CVPR

55 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 55

56 What is RANSAC? h = f({y i }) h y i 56

57 What is RANSAC? 1) Sample multiple hypotheses h j h 1 = f(y 1, y 4 ) h 2 = f y 2, y 3 h 3 = f y 1, y 5 h 1 2) Score each hypothesis: s h j s h 1 = 2 s h 2 = 2 s h 3 = 4 h 3 h 2 3) Take best one (and refine): መh = argmax h j s(h j ) መh = h 3 57

58 Our Pipeline Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses 6D Pose መh w h 1 h 2 h 3 h 4 Reprojection Errors of h 2 s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) Input RGB Scene Coordinate Regression Hypothesis Sampling Scoring Hypothesis Selection Result መh 58

59 Our Pipeline Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses 6D Pose መh w h 1 h 2 h 3 h 4 Reprojection Errors of h 2 s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) Input RGB Scene Coordinate Regression Hypothesis Sampling Scoring Hypothesis Selection Result መh w l( መh, h ) 59

60 End-to-End Learning: How? h 1 s(h 1 ) w h 2 h 3 h 4 Reprojection Errors of h 2 s(h 2 ) s(h 3 ) s(h 4 ) መh argmax Selection h AM = argmax h j s(h j ) hard decision non-differentiable Soft argmax Selection exp(s(h j ))h j h SoftAM = σ k exp(s(h k )) j soft decision differentiable 60

61 End-to-End Learning: How? h 1 s(h 1 ) w h 2 h 3 h 4 Reprojection Errors of h 2 s(h 2 ) s(h 3 ) s(h 4 ) መh argmax Selection h AM = argmax h j s(h j ) hard decision non-differentiable Soft argmax Selection exp(s(h j ))h j h SoftAM = σ k exp(s(h k )) j soft decision differentiable Probabilistic Selection h DSAC = h j, where j~ hard decision differentiable exp(s(h j)) σ k exp(s(h k )) 61

62 DSAC Differentiable RANSAC Probabilistic Selection h DSAC = h j, where j~ exp(s(h j)) σ k exp(s(h k )) = P j w w Differentiation via the expected task loss: w E j~p(j w) l(h j, h ) = E j~p(j w) l h j, h log P j w + w w l h j, h Scene Coordinate Regression Scoring Derivative of the selection probability Derivative of the task loss 62

63 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 63

64 Accuracy on 7-Scenes [Sho13] Error < 5cm, 5 Sparse Features [Sho13] Brachmann et al. [Bra16] Ours [Bra17] RANSAC Soft argmax DSAC 38.6% 55.2% 61.0% (not end-to-end) 57.8% (end-to-end) 66.2% (end-to-end) [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 [Bra17] Learning to Predict Dense Correspondences For 6D Pose Estimation, Brachmann, Thesis,

65 Side Comment: Learning Scores [Bra17] learned s h - hard to regularize, overfits [Sho13] Inlier Count: s h = σ i 1 τ r i h, w - not differentiable [Bra18] Soft In. Count: s h = σ i sig(τ βr i h, w ) differentiable Score Regression Soft Inlier Count s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) Training Set (2 Images Total) Estimated Camera Poses 3D Model Overlay 3D Model Overlay Estimation with Learned Score [Bra17] Estimation with Soft Inlier Count [Bra18] Test Image [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra17] DSAC Differentiable RANSAC for Camera Localization, Brachmann et al., CVPR 17 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 65

66 Accuracy on 7-Scenes [Sho13] Error < 5cm, 5 Sparse Features [Sho13] Brachmann et al. [Bra16] Ours [Bra17] RANSAC Soft argmax DSAC DSAC++ [Bra18] 38.6% 55.2% 61.0% (not end-to-end) 57.8% (end-to-end) 66.2% (end-to-end) 76.1% (end-to-end) [Bra17] [Bra18] [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 [Bra17] Learning to Predict Dense Correspondences For 6D Pose Estimation, Brachmann, Thesis, 2017 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 66

67 Scene Coordinate Regression Nets Advantages: Fast Good Results Disadvantages: Needs 3D Model for Training No End-To-End Training Advantages: Fast End-To-End Training Better Best Results Disadvantages: Needs 3D Model for Training 67

68 Training without a 3D Model End-to-end training needs no 3D model but good initialization [Bra17] Initialization: Minimize 3D distance to ground truth scene coordinates min i y i w y i [Bra18] Initialization: Minimize 2D reprojection error min i π(y i w, h ) p i [Bra17] DSAC Differentiable RANSAC for Camera Localization, Brachmann et al., CVPR 17 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 68

69 Accuracy on 7-Scenes [Sho13] Error < 5cm, 5 Sparse Features [Sho13] Brachmann et al. [Bra16] Ours [Bra17] RANSAC Soft argmax DSAC DSAC++ [Bra18] (w/o M) [Bra18] 38.6% 55.2% 61.0% (not end-to-end) 57.8% (end-to-end) 66.2% (end-to-end) 76.1% (end-to-end) 60.4% (end-to-end) with 3D model w/o 3D model [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 [Bra17] Learning to Predict Dense Correspondences For 6D Pose Estimation, Brachmann, Thesis, 2017 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 69

70 Scene Coordinate Regression Nets Advantages: Fast Good Results Disadvantages: Needs 3D Model for Training No End-To-End Training Advantages: Fast End-To-End Training Better Best Results Disadvantages: Needs 3D Model for Training 70

71 Conclusion Learning pose estimation works! High accuracy possible. Learning pose estimation is fun! Interesting mixture of learning and geometry. Pose Regression fast coarse estimates Scene Coordinate Regression with RFs: fast and accurate with CNNs: best accuracy, no 3D models Code of many methods online: PoseNet: MapNet: SCoRF [Bra16]: VLocNet: DSAC: DSAC++: 71

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image - Supplementary Material -

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image - Supplementary Material - Uncertainty-Driven 6D Pose Estimation of s and Scenes from a Single RGB Image - Supplementary Material - Eric Brachmann*, Frank Michel, Alexander Krull, Michael Ying Yang, Stefan Gumhold, Carsten Rother

More information

arxiv: v4 [cs.cv] 26 Jul 2018

arxiv: v4 [cs.cv] 26 Jul 2018 Scene Coordinate and Correspondence Learning for Image-Based Localization Mai Bui 1, Shadi Albarqouni 1, Slobodan Ilic 1,2 and Nassir Navab 1,3 arxiv:1805.08443v4 [cs.cv] 26 Jul 2018 Abstract Scene coordinate

More information

Learning Less is More 6D Camera Localization via 3D Surface Regression

Learning Less is More 6D Camera Localization via 3D Surface Regression Learning Less is More 6D Camera Localization via 3D Surface Regression Eric Brachmann and Carsten Rother Visual Learning Lab Heidelberg University (HCI/IWR) http://vislearn.de Abstract Popular research

More information

Learning 6D Object Pose Estimation and Tracking

Learning 6D Object Pose Estimation and Tracking Learning 6D Object Pose Estimation and Tracking Carsten Rother presented by: Alexander Krull 21/12/2015 6D Pose Estimation Input: RGBD-image Known 3D model Output: 6D rigid body transform of object 21/12/2015

More information

6-DOF Model Based Tracking via Object Coordinate Regression Supplemental Note

6-DOF Model Based Tracking via Object Coordinate Regression Supplemental Note 6-DOF Model Based Tracking via Object Coordinate Regression Supplemental Note Alexander Krull, Frank Michel, Eric Brachmann, Stefan Gumhold, Stephan Ihrke, Carsten Rother TU Dresden, Dresden, Germany The

More information

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material Yi Li 1, Gu Wang 1, Xiangyang Ji 1, Yu Xiang 2, and Dieter Fox 2 1 Tsinghua University, BNRist 2 University of Washington

More information

Scene Coordinate and Correspondence Learning for Image-Based Localization

Scene Coordinate and Correspondence Learning for Image-Based Localization BUI ET AL.: SCENE COORDINATE AND CORRESPONDENCE LEARNING 1 Scene Coordinate and Correspondence Learning for Image-Based Localization Mai Bui 1 mai.bui@tum.de Shadi Albarqouni 1 shadi.albarqouni@tum.de

More information

arxiv: v3 [cs.cv] 13 Jul 2017

arxiv: v3 [cs.cv] 13 Jul 2017 Random Forests versus Neural Networks What s Best for Camera Localization? arxiv:1609.05797v3 [cs.cv] 13 Jul 2017 Daniela Massiceti1, Alexander Krull2, Eric Brachmann2, Carsten Rother2 and Philip H.S.

More information

Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network

Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network Zakaria Laskar 1, Iaroslav Melekhov 1, Surya Kalia 2 and Juho Kannala 1 1 Aalto University, Finland, 2 IIT,

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Fitting (LMedS, RANSAC)

Fitting (LMedS, RANSAC) Fitting (LMedS, RANSAC) Thursday, 23/03/2017 Antonis Argyros e-mail: argyros@csd.uoc.gr LMedS and RANSAC What if we have very many outliers? 2 1 Least Median of Squares ri : Residuals Least Squares n 2

More information

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington Perceiving the 3D World from Images and Videos Yu Xiang Postdoctoral Researcher University of Washington 1 2 Act in the 3D World Sensing & Understanding Acting Intelligent System 3D World 3 Understand

More information

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington 3D Object Recognition and Scene Understanding from RGB-D Videos Yu Xiang Postdoctoral Researcher University of Washington 1 2 Act in the 3D World Sensing & Understanding Acting Intelligent System 3D World

More information

The Kinect Sensor. Luís Carriço FCUL 2014/15

The Kinect Sensor. Luís Carriço FCUL 2014/15 Advanced Interaction Techniques The Kinect Sensor Luís Carriço FCUL 2014/15 Sources: MS Kinect for Xbox 360 John C. Tang. Using Kinect to explore NUI, Ms Research, From Stanford CS247 Shotton et al. Real-Time

More information

6D Object Pose Estimation Binaries

6D Object Pose Estimation Binaries 6D Object Pose Estimation Binaries March 20, 2018 All data regarding our ECCV 14 paper can be downloaded from our project page: https://hci.iwr.uni-heidelberg.de/vislearn/research/ scene-understanding/pose-estimation/#eccv14.

More information

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth Mahdi Rad 1 Vincent Lepetit 1, 2 1 Institute for Computer Graphics and

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,

More information

Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade

Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade 1 Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Victor A. Prisacariu, Luigi Di Stefano and

More information

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany Deep Incremental Scene Understanding Federico Tombari & Christian Rupprecht Technical University of Munich, Germany C. Couprie et al. "Toward Real-time Indoor Semantic Segmentation Using Depth Information"

More information

Deep Learning Based 3D Reconstruction of Indoor Scenes

Deep Learning Based 3D Reconstruction of Indoor Scenes Deep Learning Based 3D Reconstruction of Indoor Scenes Student Name: Srinidhi Hegde Roll Number: 2013164 BTP report submitted in partial fulfillment of the requirements for the Degree of B.Tech. in Computer

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Extrinsic camera calibration method and its performance evaluation Jacek Komorowski 1 and Przemyslaw Rokita 2 arxiv:1809.11073v1 [cs.cv] 28 Sep 2018 1 Maria Curie Sklodowska University Lublin, Poland jacek.komorowski@gmail.com

More information

Segmentation. Bottom up Segmentation Semantic Segmentation

Segmentation. Bottom up Segmentation Semantic Segmentation Segmentation Bottom up Segmentation Semantic Segmentation Semantic Labeling of Street Scenes Ground Truth Labels 11 classes, almost all occur simultaneously, large changes in viewpoint, scale sky, road,

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

arxiv: v3 [cs.cv] 14 Apr 2018

arxiv: v3 [cs.cv] 14 Apr 2018 Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks Anh Nguyen 1, Thanh-Toan Do 2, Darwin G. Caldwell 1, and Nikos G. Tsagarakis 1 arxiv:1708.09011v3 [cs.cv] 14 Apr

More information

Fast Edge Detection Using Structured Forests

Fast Edge Detection Using Structured Forests Fast Edge Detection Using Structured Forests Piotr Dollár, C. Lawrence Zitnick [1] Zhihao Li (zhihaol@andrew.cmu.edu) Computer Science Department Carnegie Mellon University Table of contents 1. Introduction

More information

Depth from Stereo. Dominic Cheng February 7, 2018

Depth from Stereo. Dominic Cheng February 7, 2018 Depth from Stereo Dominic Cheng February 7, 2018 Agenda 1. Introduction to stereo 2. Efficient Deep Learning for Stereo Matching (W. Luo, A. Schwing, and R. Urtasun. In CVPR 2016.) 3. Cascade Residual

More information

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Presented by: Rex Ying and Charles Qi Input: A Single RGB Image Estimate

More information

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:

More information

Image correspondences and structure from motion

Image correspondences and structure from motion Image correspondences and structure from motion http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2017, Lecture 20 Course announcements Homework 5 posted.

More information

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Articulated Pose Estimation with Flexible Mixtures-of-Parts Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601 Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601 Introduction Face ID is complicated by alterations to an individual s appearance Beard,

More information

ROBUST ESTIMATION TECHNIQUES IN COMPUTER VISION

ROBUST ESTIMATION TECHNIQUES IN COMPUTER VISION ROBUST ESTIMATION TECHNIQUES IN COMPUTER VISION Half-day tutorial at ECCV 14, September 7 Olof Enqvist! Fredrik Kahl! Richard Hartley! Robust Optimization Techniques in Computer Vision! Olof Enqvist! Chalmers

More information

Computing the relations among three views based on artificial neural network

Computing the relations among three views based on artificial neural network Computing the relations among three views based on artificial neural network Ying Kin Yu Kin Hong Wong Siu Hang Or Department of Computer Science and Engineering The Chinese University of Hong Kong E-mail:

More information

arxiv: v1 [cs.ro] 9 Mar 2018

arxiv: v1 [cs.ro] 9 Mar 2018 c IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

Joint Object Detection and Viewpoint Estimation using CNN features

Joint Object Detection and Viewpoint Estimation using CNN features Joint Object Detection and Viewpoint Estimation using CNN features Carlos Guindel, David Martín and José M. Armingol cguindel@ing.uc3m.es Intelligent Systems Laboratory Universidad Carlos III de Madrid

More information

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. Rhodes 5/10/17 What is Machine Learning? Machine learning

More information

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing 3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)

More information

arxiv: v2 [cs.cv] 31 Jul 2017

arxiv: v2 [cs.cv] 31 Jul 2017 : A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization Ronald Clark 1, Sen Wang 1, Andrew Markham 1, Niki Trigoni 1, Hongkai Wen 2 1 University of Oxford, Oxford, OX1 3PA 2 University of Warwick,

More information

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful

More information

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS Deep Neural Decision Forests Microsoft Research Cambridge UK, ICCV 2015 Decision Forests, Convolutional Networks and the Models in-between

More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr

More information

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet.

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet. Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet 7D Labs VINNOVA https://7dlabs.com Photo-realistic image synthesis

More information

ECE 6554:Advanced Computer Vision Pose Estimation

ECE 6554:Advanced Computer Vision Pose Estimation ECE 6554:Advanced Computer Vision Pose Estimation Sujay Yadawadkar, Virginia Tech, Agenda: Pose Estimation: Part Based Models for Pose Estimation Pose Estimation with Convolutional Neural Networks (Deep

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Direct Methods in Visual Odometry

Direct Methods in Visual Odometry Direct Methods in Visual Odometry July 24, 2017 Direct Methods in Visual Odometry July 24, 2017 1 / 47 Motivation for using Visual Odometry Wheel odometry is affected by wheel slip More accurate compared

More information

Instance-level recognition

Instance-level recognition Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction

More information

Multi-Output Learning for Camera Relocalization

Multi-Output Learning for Camera Relocalization Multi-Output Learning for Camera Relocalization Abner Guzman-Rivera Pushmeet Kohli Ben Glocker Jamie Shotton Toby Sharp Andrew Fitzgibbon Shahram Izadi Microsoft Research University of Illinois Imperial

More information

Learning Semantic Environment Perception for Cognitive Robots

Learning Semantic Environment Perception for Cognitive Robots Learning Semantic Environment Perception for Cognitive Robots Sven Behnke University of Bonn, Germany Computer Science Institute VI Autonomous Intelligent Systems Some of Our Cognitive Robots Equipped

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Chi Li, M. Zeeshan Zia 2, Quoc-Huy Tran 2, Xiang Yu 2, Gregory D. Hager, and Manmohan Chandraker 2 Johns

More information

Deep Learning: Image Registration. Steven Chen and Ty Nguyen

Deep Learning: Image Registration. Steven Chen and Ty Nguyen Deep Learning: Image Registration Steven Chen and Ty Nguyen Lecture Outline 1. Brief Introduction to Deep Learning 2. Case Study 1: Unsupervised Deep Homography 3. Case Study 2: Deep LucasKanade What is

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

CS 231A Computer Vision (Winter 2018) Problem Set 3

CS 231A Computer Vision (Winter 2018) Problem Set 3 CS 231A Computer Vision (Winter 2018) Problem Set 3 Due: Feb 28, 2018 (11:59pm) 1 Space Carving (25 points) Dense 3D reconstruction is a difficult problem, as tackling it from the Structure from Motion

More information

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Introduction In this supplementary material, Section 2 details the 3D annotation for CAD models and real

More information

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep CS395T paper review Indoor Segmentation and Support Inference from RGBD Images Chao Jia Sep 28 2012 Introduction What do we want -- Indoor scene parsing Segmentation and labeling Support relationships

More information

Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich. LSD-SLAM: Large-Scale Direct Monocular SLAM

Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich. LSD-SLAM: Large-Scale Direct Monocular SLAM Computer Vision Group Technical University of Munich Jakob Engel LSD-SLAM: Large-Scale Direct Monocular SLAM Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich Monocular Video Engel,

More information

Gesture Recognition: Hand Pose Estimation. Adrian Spurr Ubiquitous Computing Seminar FS

Gesture Recognition: Hand Pose Estimation. Adrian Spurr Ubiquitous Computing Seminar FS Gesture Recognition: Hand Pose Estimation Adrian Spurr Ubiquitous Computing Seminar FS2014 27.05.2014 1 What is hand pose estimation? Input Computer-usable form 2 Augmented Reality Gaming Robot Control

More information

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Krishna Kumar Singh and Yong Jae Lee University of California, Davis ---- Paper Presentation Yixian

More information

LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION

LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION Kihwan Kim, Senior Research Scientist Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz CORRESPENDECES IN COMPUTER

More information

A Systems View of Large- Scale 3D Reconstruction

A Systems View of Large- Scale 3D Reconstruction Lecture 23: A Systems View of Large- Scale 3D Reconstruction Visual Computing Systems Goals and motivation Construct a detailed 3D model of the world from unstructured photographs (e.g., Flickr, Facebook)

More information

Image-based Localization using Hourglass Networks

Image-based Localization using Hourglass Networks Image-based Localization using Hourglass Networks Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, and Esa Rahtu 2 Aalto University, Finland firstname.lastname@aalto.fi 2 Tampere University of Technology,

More information

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Marc Pollefeys Joined work with Nikolay Savinov, Christian Haene, Lubor Ladicky 2 Comparison to Volumetric Fusion Higher-order ray

More information

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report Figure 1: The architecture of the convolutional network. Input: a single view image; Output: a depth map. 3 Related Work In [4] they used depth maps of indoor scenes produced by a Microsoft Kinect to successfully

More information

CS381V Experiment Presentation. Chun-Chen Kuo

CS381V Experiment Presentation. Chun-Chen Kuo CS381V Experiment Presentation Chun-Chen Kuo The Paper Indoor Segmentation and Support Inference from RGBD Images. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. ECCV 2012. 50 100 150 200 250 300 350

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

Deep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D

Deep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D Deep Learning for Virtual Shopping Dr. Jürgen Sturm Group Leader RGB-D metaio GmbH Augmented Reality with the Metaio SDK: IKEA Catalogue App Metaio: Augmented Reality Metaio SDK for ios, Android and Windows

More information

Single Image Depth Estimation via Deep Learning

Single Image Depth Estimation via Deep Learning Single Image Depth Estimation via Deep Learning Wei Song Stanford University Stanford, CA Abstract The goal of the project is to apply direct supervised deep learning to the problem of monocular depth

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

Semantic Segmentation

Semantic Segmentation Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,

More information

From 3D descriptors to monocular 6D pose: what have we learned?

From 3D descriptors to monocular 6D pose: what have we learned? ECCV Workshop on Recovering 6D Object Pose From 3D descriptors to monocular 6D pose: what have we learned? Federico Tombari CAMP - TUM Dynamic occlusion Low latency High accuracy, low jitter No expensive

More information

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection on Self-Driving Cars in China. Lingyun Li Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Supervised Learning for Image Segmentation

Supervised Learning for Image Segmentation Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

Semantic Match Consistency for Long-Term Visual Localization

Semantic Match Consistency for Long-Term Visual Localization Semantic Match Consistency for Long-Term Visual Localization Carl Toft 1, Erik Stenborg 1, Lars Hammarstrand 1, Lucas Brynte 1, Marc Pollefeys 2,3, Torsten Sattler 2, Fredrik Kahl 1 1 Department of Electrical

More information

Deep Models for 3D Reconstruction

Deep Models for 3D Reconstruction Deep Models for 3D Reconstruction Andreas Geiger Autonomous Vision Group, MPI for Intelligent Systems, Tübingen Computer Vision and Geometry Group, ETH Zürich October 12, 2017 Max Planck Institute for

More information

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist

More information

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Instance-aware Semantic Segmentation via Multi-task Network Cascades Instance-aware Semantic Segmentation via Multi-task Network Cascades Jifeng Dai, Kaiming He, Jian Sun Microsoft research 2016 Yotam Gil Amit Nativ Agenda Introduction Highlights Implementation Further

More information

Object Detection by 3D Aspectlets and Occlusion Reasoning

Object Detection by 3D Aspectlets and Occlusion Reasoning Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition

More information

Mobile Robotics. Mathematics, Models, and Methods. HI Cambridge. Alonzo Kelly. Carnegie Mellon University UNIVERSITY PRESS

Mobile Robotics. Mathematics, Models, and Methods. HI Cambridge. Alonzo Kelly. Carnegie Mellon University UNIVERSITY PRESS Mobile Robotics Mathematics, Models, and Methods Alonzo Kelly Carnegie Mellon University HI Cambridge UNIVERSITY PRESS Contents Preface page xiii 1 Introduction 1 1.1 Applications of Mobile Robots 2 1.2

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

INFO0948 Fitting and Shape Matching

INFO0948 Fitting and Shape Matching INFO0948 Fitting and Shape Matching Renaud Detry University of Liège, Belgium Updated March 31, 2015 1 / 33 These slides are based on the following book: D. Forsyth and J. Ponce. Computer vision: a modern

More information

Human Pose Estimation with Deep Learning. Wei Yang

Human Pose Estimation with Deep Learning. Wei Yang Human Pose Estimation with Deep Learning Wei Yang Applications Understand Activities Family Robots American Heist (2014) - The Bank Robbery Scene 2 What do we need to know to recognize a crime scene? 3

More information

Instance-level recognition

Instance-level recognition Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction

More information

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Yaguang Li Joint work with Rose Yu, Cyrus Shahabi, Yan Liu Page 1 Introduction Traffic congesting is wasteful of time,

More information

CNN for Low Level Image Processing. Huanjing Yue

CNN for Low Level Image Processing. Huanjing Yue CNN for Low Level Image Processing Huanjing Yue 2017.11 1 Deep Learning for Image Restoration General formulation: min Θ L( x, x) s. t. x = F(y; Θ) Loss function Parameters to be learned Key issues The

More information

CS5670: Computer Vision

CS5670: Computer Vision CS5670: Computer Vision Noah Snavely Lecture 33: Recognition Basics Slides from Andrej Karpathy and Fei-Fei Li http://vision.stanford.edu/teaching/cs231n/ Announcements Quiz moved to Tuesday Project 4

More information

Learning and Recognizing Visual Object Categories Without First Detecting Features

Learning and Recognizing Visual Object Categories Without First Detecting Features Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather

More information

Learning video saliency from human gaze using candidate selection

Learning video saliency from human gaze using candidate selection Learning video saliency from human gaze using candidate selection Rudoy, Goldman, Shechtman, Zelnik-Manor CVPR 2013 Paper presentation by Ashish Bora Outline What is saliency? Image vs video Candidates

More information

Euclidean Reconstruction Independent on Camera Intrinsic Parameters

Euclidean Reconstruction Independent on Camera Intrinsic Parameters Euclidean Reconstruction Independent on Camera Intrinsic Parameters Ezio MALIS I.N.R.I.A. Sophia-Antipolis, FRANCE Adrien BARTOLI INRIA Rhone-Alpes, FRANCE Abstract bundle adjustment techniques for Euclidean

More information

Instance-level recognition part 2

Instance-level recognition part 2 Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,

More information

Dense Tracking and Mapping for Autonomous Quadrocopters. Jürgen Sturm

Dense Tracking and Mapping for Autonomous Quadrocopters. Jürgen Sturm Computer Vision Group Prof. Daniel Cremers Dense Tracking and Mapping for Autonomous Quadrocopters Jürgen Sturm Joint work with Frank Steinbrücker, Jakob Engel, Christian Kerl, Erik Bylow, and Daniel Cremers

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information