Learning-based Localization
|
|
- Steven Small
- 5 years ago
- Views:
Transcription
1 Learning-based Localization Eric Brachmann ECCV 2018 Tutorial on Visual Localization - Feature-based vs. Learned Approaches Torsten Sattler, Eric Brachmann
2 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 2
3 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 3
4 Machine Learning Monday Wednesday Saturday Sunday Wednesday y = f(i, w) Saturday Sunday Training Set 4
5 Convolutional Neural Networks w 1 5
6 Convolutional Neural Networks w 1 6
7 Convolutional Neural Networks w 1 f 1 I, w 1 7
8 Convolutional Neural Networks w w 2 f(x) x f 2 f 1 I, w 1, w 2 8
9 Convolutional Neural Networks Sunday Saturday Wednesday Monday y = f 3 (f 2 f 1 I, w 1, w 2, w 3 ) Sunday l(y) δl δw 1 = δl δf 3 δf 3 δf 2 δf 2 δf 1 δf 1 δw 1 9
10 Convolutional Neural Networks w [Long et al., Fully convolutional networks for semantic segmentation. CVPR15] 10
11 Convolutional Neural Networks w [12 Scenes Dataset: 11
12 Convolutional Neural Networks [Kokkinos, UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory, CVPR17] 12
13 Convolutional Neural Networks Advantages: Expensive but Parallelizable Powerful due to End-To-End Training Image I Feature Extraction (hand-crafted or learned) Classification (hand-crafted or learned) Output Image I Learned Output Disadvantages: Interpretability Training Data 13
14 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 14
15 Random Forests g awake (w 1 ) = 0.5 g smile w 2 = 0.7 g hat (w 3 ) = 0.0 g awake > 0 g smile > 0 Sunday Saturday Monday y = f(i, w) 15
16 Random Forests g awake > 0 Monday Saturday Sunday Saturday Monday Sunday 16
17 Random Forests g awake > 0 Monday Saturday Sunday g smile > 0 Sunday Monday Wednesday Saturday Sunday Saturday Monday Wednesday Saturday Sunday Monday Monday Wednesday Saturday Sunday 17
18 Random Forests [Shotton et al., Real-Time Human Pose Recognition in Parts from Single Depth Images, CVPR11] 18
19 Random Forests Advantages Fast Good Performance Need less training data Disadvantages No end-to-end training* Hand-crafted features* Interpretability (Andrej Karpathy, t-sne embeddings of IMAGENET) Rule 34 (of computer science after 2012): There is a paper about it, no exceptions. * Not in [Kontschieder et al., Deep Neural Decision Forests, ICCV15] 19
20 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 20
21 Camera Pose Regression r 1 r 2 r 3 t 1 t 2 t 3 = መh Annotated Training Data Input: RGB Image Feed-Forward CNN Output: Pose [7 Scenes Dataset: 21
22 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 22
23 Pose Parametrization More infos about the wonderful world of SO(3) in e.g. [Hartley et al., Rotation Averaging, IJCV13] መh SE(3) with SE 3 = { R t 0 1 R SO 3, t R3 } R SO 3 = {R R 3 3 RR T = I, det R = 1} መh Rotation Matrix R = r 1 r 2 r 3 r 4 r 5 r 6 r 7 r 8 r 9 Unit Quaternion e.g. [1] q = (c, v 1, v 2, v 3 ) Axis-Angle e.g. [2] log R = θ u = (u 1, u 2, u 3 ) Log Unit Quaternion e.g. [3] log q = 2 log R [4] Enforce/Map: RR T = I, det R = 1 Enforce/Map: q = 1 Enforce/Map: Enforce/Map: [1] Kendall et al., PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV15 [2] Brachmann et al., DSAC - Differentiable RANSAC for Camera Localization, CVPR17 [3] Brahmbhatt et al., Geometry-Aware Learning of Maps for Camera Localization, CVPR18 [4] Sola, Quaternion kinematics for the error-state Kalman filter,
24 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 24
25 Pose Loss Functions h = (R, t) l t, t = t t θ How to measure rotation error? Angular Distance [1]: θ R, R = log(r R T ) R R q θ Quaternion Distance[2]: q q Angle-Axis Distance, Log Quaternion Distance[3]: log R log R q q q log R log R log TR log TR [4] [1] Brachmann et al., DSAC - Differentiable RANSAC for Camera Localization, CVPR17 [2] Kendall et al., PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV15 [3] Brahmbhatt et al., Geometry-Aware Learning of Maps for Camera Localization, CVPR18 [4] Hartley et al., Rotation Averaging, IJCV13 25
26 Pose Loss Functions How to combine rotation error and translation error? Implicit/Metric [1]: Hand-Tuned [2]: l β h, h = l t, t + βl R, R Self-Tuned [3]: l h, h = l t, t + l R, R or max(l t, t, l R, R ) l σ 2 h, h = l t, t exp s t + s t + l R, R exp s R + s R [1] Brachmann et al., DSAC - Differentiable RANSAC for Camera Localization, CVPR17 [2] Kendall et al., PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV15 [3] Kendall and Cipolla, Geometric Loss Functions for Camera Pose Regression with Deep Learning, CVPR 2017 s = log σ 2 26
27 Pose Loss Functions How to combine rotation error and translation error? Measure the reprojection error [1]: l π h, h = v M π h, v π(h, v) [1] Kendall and Cipolla, Geometric Loss Functions for Camera Pose Regression with Deep Learning, CVPR
28 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 28
29 PoseNet Discussion [7 Scenes Dataset: PoseNet [1]: ~45cm, 10 (GoogLeNet, Quaternions, l β ) Spatial LSTM [2]: ~31cm, 10 (GoogLeNet+LSTM, Quaternions, l β ) PoseNet [3]: ~23cm, 8 (GoogLeNet, Quaternions, l σ 2) PoseNet [3]: ~23cm, 8 (GoogLeNet, Quaternions, l π ) PoseNet [4]: ~23cm, 8 (ResNet, Quaternions, l π ) PoseNet [4]: ~22cm, 8 (ResNet, log Quat., l σ 2) MapNet+ [4]: ~19cm, 7 (ResNet, log Quat., l σ 2+l Rel ) [1] PoseNet: A Convolutional Network for Real-Time 6-DoF Camera Localization, Kendall et al., ICCV 2015 [2] Image-Based Localization with Spatial LSTMs, Walch et al., ICCV 2017 [3] Geometric Loss Functions for Camera Pose Regression with Deep Learning, Kendall and Cipolla, CVPR 2017 [4] Geometry-Aware Learning of Maps for Camera Localization, Brahmbhatt et al., CVPR18 [5] Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization, Sattler et al., PAMI 2017 Sparse Features: ~5cm, 2 Advantages PoseNet: Simple End-To-End Learning Fast 29
30 Side Comment: Visual Odometry Pose Regression seems to work very well for estimating relative poses [1] Additional Input: Pose estimate of last frame Tracking Accuracy: ~5cm, 4 [1] Deep Auxiliary Learning For Visual Localization And Odometry, Valada et al., ICRA
31 Feature-Based Pipeline [7 Scenes Dataset: Image I Feature Extraction Feature Matching Estimate Pose መh = PnP({p i, y i }) 6D Pose መh Image Coordinates p i Scene Coordinates y i 31
32 Feature-Based Pipeline [7 Scenes Dataset: Image I Feature Extraction Feature Matching Estimate Pose መh = PnP({p i, y i }) 6D Pose መh Image Coordinates p i Scene Coordinates y i 32
33 Feature-Based Pipeline [7 Scenes Dataset: Image I Feature Extraction Feature Matching RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh Image Coordinates p i Scene Coordinates y i 33
34 Task Knowledge: Local Invariance Training Image Test Images መh?????? [7 Scenes Dataset: 34
35 Task Knowledge: The Camera Model y i p i Estimate Pose መh = PnP({p i, y i }) C Camera Coordinate System Scene Coordinate System Image Coordinate p i = Chy i Camera Matrix Pose Scene Coordinate 35
36 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 36
37 Feature-Based Pipeline Image I Feature Extraction Feature Matching RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh Image Coordinates p i Scene Coordinates y i [7 Scenes Dataset: 37
38 Scene Coordinate Regression [Sho13] Image I (RGB-D) Scene Coordinate Regression RANSAC Estimate Poses (Kabsch) Score Poses (Inliers) 6D Pose መh p i R 2 y i R 3 [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 38
39 Scene Coordinate Regression Forest Features: Depth Adapted Pixel Differences f DA RGB p = I RGB p + δ 1 I D (p), c 1 I RGB (p + δ 2 I D (p), c 2) f p < τ f DA D p = I D p + δ 1 I D (p) I D (p + δ 2 I D (p) ) Split Score: Reduction of Spatial Variance S L S R V S L, S R = S L y തy L + S R y S L y S R y തy R 39
40 Scene Coordinate Regression Forest y pred f S Mean-Shift f S X X 40
41 Scene Coordinate Regression Results 41
42 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% RGB-D [Sho13] 72.6% [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 42
43 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% RGB-D [Sho13] 72.6% [Val15] 91.3% yp(y) pred f S X [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Val15] Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization, Valentin et al., CVPR 15 43
44 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% RGB-D [Sho13] 72.6% [Val15] 91.3% [Cal17] 90.7% f S f S f S X [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Val15] Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization, Valentin et al., CVPR 15 [Cav17] On-the-Fly Adaptation of Regression Forests for Online Camera Localization, Cavallari et al., CVPR 17 X X 44
45 Scene Coordinate Regression Results RGB %Pose Err. < 5cm5 Sparse Features 38.6% [Bra16] 55.2% RGB-D [Sho13] 72.6% [Val15] 91.3% [Cal17] 90.7% [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Val15] Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization, Valentin et al., CVPR 15 [Cav17] On-the-Fly Adaptation of Regression Forests for Online Camera Localization, Cavallari et al., CVPR 17 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 45
46 Scene Coordinate Regression Forests Advantages: Fast Good Results Training On-The-Fly V S L, S R = S L y തy L + S R y S L y S R y തy R Disadvantages: Needs 3D Model for Training No End-To-End Training 46
47 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 47
48 Scene Coordinate Regression with a CNN Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh p i R 2 y i R 3 48
49 Random Forest vs. CNN Forest Prediction: CNN Prediction: Ground Truth: Pose Estimation Succeeds (< 5cm, 5 ) Pose Estimation Fails (> 5cm, 5 ) 49
50 Random Forest vs. CNN Frequency CNN Prediction: 0,14 Random Forest CNN 0,12 0,1 Inlier Peaks 0,08 0,06 Inlier Threshold Forest Outliers 0,04 Obj. Corrd. Error < 10cm %Pose Err. < 5cm5 Random Forest ~15% 55.2% CNN ~25% 17.2% 0, Object Coordinate Error (mm) Optimized: l L2 y, y = y y 2 50
51 Scene Coordinate Loss Functions l L2 l L1 %Obj. Coor. Err. < 10cm %Pose Err. < 5cm5 l Huber l Tukey Random Forest ~15% 55.2% CNN (L 2 ) ~25% 17.2% CNN (L 1 ) ~45% 55.9% CNN (Huber) ~45% 54.4% CNN (Tukey) ~45% 52.1% 51
52 Scene Coordinate Loss Functions Frequency Random Forest CNN (L 2 ) CNN (L 1 ) 0,14 0,12 %Obj. Coor. Err. < 10cm %Pose Err. < 5cm5 0,1 Random Forest ~15% 55.2% 0,08 0,06 0,04 0,02 Inlier Threshold CNN (L 2 ) ~25% 17.2% CNN (L 1 ) ~45% 55.9% CNN (Huber) ~45% 54.4% CNN (Tukey) ~45% 52.1% Object Coordinate Error (mm) 52
53 Scene Coordinate Regression Nets Advantages: Fast Good Results Disadvantages: Needs 3D Model for Training No End-To-End Training Advantages: Fast Better Results Disadvantages: Needs 3D Model for Training End-To-End Training? 53
54 Even Better Loss Function? Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses (Inliers) 6D Pose መh Optimized: l L1 y, y = y y l መh, h DSAC Differentiable RANSAC [Bra17] [Bra17] DSAC Differentiable RANSAC for Camera Localization Brachmann et al., CVPR
55 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 55
56 What is RANSAC? h = f({y i }) h y i 56
57 What is RANSAC? 1) Sample multiple hypotheses h j h 1 = f(y 1, y 4 ) h 2 = f y 2, y 3 h 3 = f y 1, y 5 h 1 2) Score each hypothesis: s h j s h 1 = 2 s h 2 = 2 s h 3 = 4 h 3 h 2 3) Take best one (and refine): መh = argmax h j s(h j ) መh = h 3 57
58 Our Pipeline Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses 6D Pose መh w h 1 h 2 h 3 h 4 Reprojection Errors of h 2 s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) Input RGB Scene Coordinate Regression Hypothesis Sampling Scoring Hypothesis Selection Result መh 58
59 Our Pipeline Image I Scene Coordinate Regression (CNN) RANSAC Estimate Poses (PnP) Score Poses 6D Pose መh w h 1 h 2 h 3 h 4 Reprojection Errors of h 2 s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) Input RGB Scene Coordinate Regression Hypothesis Sampling Scoring Hypothesis Selection Result መh w l( መh, h ) 59
60 End-to-End Learning: How? h 1 s(h 1 ) w h 2 h 3 h 4 Reprojection Errors of h 2 s(h 2 ) s(h 3 ) s(h 4 ) መh argmax Selection h AM = argmax h j s(h j ) hard decision non-differentiable Soft argmax Selection exp(s(h j ))h j h SoftAM = σ k exp(s(h k )) j soft decision differentiable 60
61 End-to-End Learning: How? h 1 s(h 1 ) w h 2 h 3 h 4 Reprojection Errors of h 2 s(h 2 ) s(h 3 ) s(h 4 ) መh argmax Selection h AM = argmax h j s(h j ) hard decision non-differentiable Soft argmax Selection exp(s(h j ))h j h SoftAM = σ k exp(s(h k )) j soft decision differentiable Probabilistic Selection h DSAC = h j, where j~ hard decision differentiable exp(s(h j)) σ k exp(s(h k )) 61
62 DSAC Differentiable RANSAC Probabilistic Selection h DSAC = h j, where j~ exp(s(h j)) σ k exp(s(h k )) = P j w w Differentiation via the expected task loss: w E j~p(j w) l(h j, h ) = E j~p(j w) l h j, h log P j w + w w l h j, h Scene Coordinate Regression Scoring Derivative of the selection probability Derivative of the task loss 62
63 Roadmap Machine Learning Basics [10min] Convolutional Neural Networks Random Forests Camera Pose Regression [15min] Pose Parametrization Pose Loss Functions Results Scene Coordinate Regression [35min] Scene Coordinate Regression Forests Scene Coordinate Regression CNNs End-to-End Learning Results 63
64 Accuracy on 7-Scenes [Sho13] Error < 5cm, 5 Sparse Features [Sho13] Brachmann et al. [Bra16] Ours [Bra17] RANSAC Soft argmax DSAC 38.6% 55.2% 61.0% (not end-to-end) 57.8% (end-to-end) 66.2% (end-to-end) [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 [Bra17] Learning to Predict Dense Correspondences For 6D Pose Estimation, Brachmann, Thesis,
65 Side Comment: Learning Scores [Bra17] learned s h - hard to regularize, overfits [Sho13] Inlier Count: s h = σ i 1 τ r i h, w - not differentiable [Bra18] Soft In. Count: s h = σ i sig(τ βr i h, w ) differentiable Score Regression Soft Inlier Count s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) s(h 1 ) s(h 2 ) s(h 3 ) s(h 4 ) Training Set (2 Images Total) Estimated Camera Poses 3D Model Overlay 3D Model Overlay Estimation with Learned Score [Bra17] Estimation with Soft Inlier Count [Bra18] Test Image [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra17] DSAC Differentiable RANSAC for Camera Localization, Brachmann et al., CVPR 17 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 65
66 Accuracy on 7-Scenes [Sho13] Error < 5cm, 5 Sparse Features [Sho13] Brachmann et al. [Bra16] Ours [Bra17] RANSAC Soft argmax DSAC DSAC++ [Bra18] 38.6% 55.2% 61.0% (not end-to-end) 57.8% (end-to-end) 66.2% (end-to-end) 76.1% (end-to-end) [Bra17] [Bra18] [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 [Bra17] Learning to Predict Dense Correspondences For 6D Pose Estimation, Brachmann, Thesis, 2017 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 66
67 Scene Coordinate Regression Nets Advantages: Fast Good Results Disadvantages: Needs 3D Model for Training No End-To-End Training Advantages: Fast End-To-End Training Better Best Results Disadvantages: Needs 3D Model for Training 67
68 Training without a 3D Model End-to-end training needs no 3D model but good initialization [Bra17] Initialization: Minimize 3D distance to ground truth scene coordinates min i y i w y i [Bra18] Initialization: Minimize 2D reprojection error min i π(y i w, h ) p i [Bra17] DSAC Differentiable RANSAC for Camera Localization, Brachmann et al., CVPR 17 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 68
69 Accuracy on 7-Scenes [Sho13] Error < 5cm, 5 Sparse Features [Sho13] Brachmann et al. [Bra16] Ours [Bra17] RANSAC Soft argmax DSAC DSAC++ [Bra18] (w/o M) [Bra18] 38.6% 55.2% 61.0% (not end-to-end) 57.8% (end-to-end) 66.2% (end-to-end) 76.1% (end-to-end) 60.4% (end-to-end) with 3D model w/o 3D model [Sho13] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, Shotton et al., CVPR 13 [Bra16] Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, Brachmann et al., CVPR 16 [Bra17] Learning to Predict Dense Correspondences For 6D Pose Estimation, Brachmann, Thesis, 2017 [Bra18] Learning Less is More - 6D Camera Localization via 3D Surface Regression, Brachmann and Rother, CVPR 18 69
70 Scene Coordinate Regression Nets Advantages: Fast Good Results Disadvantages: Needs 3D Model for Training No End-To-End Training Advantages: Fast End-To-End Training Better Best Results Disadvantages: Needs 3D Model for Training 70
71 Conclusion Learning pose estimation works! High accuracy possible. Learning pose estimation is fun! Interesting mixture of learning and geometry. Pose Regression fast coarse estimates Scene Coordinate Regression with RFs: fast and accurate with CNNs: best accuracy, no 3D models Code of many methods online: PoseNet: MapNet: SCoRF [Bra16]: VLocNet: DSAC: DSAC++: 71
Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image - Supplementary Material -
Uncertainty-Driven 6D Pose Estimation of s and Scenes from a Single RGB Image - Supplementary Material - Eric Brachmann*, Frank Michel, Alexander Krull, Michael Ying Yang, Stefan Gumhold, Carsten Rother
More informationarxiv: v4 [cs.cv] 26 Jul 2018
Scene Coordinate and Correspondence Learning for Image-Based Localization Mai Bui 1, Shadi Albarqouni 1, Slobodan Ilic 1,2 and Nassir Navab 1,3 arxiv:1805.08443v4 [cs.cv] 26 Jul 2018 Abstract Scene coordinate
More informationLearning Less is More 6D Camera Localization via 3D Surface Regression
Learning Less is More 6D Camera Localization via 3D Surface Regression Eric Brachmann and Carsten Rother Visual Learning Lab Heidelberg University (HCI/IWR) http://vislearn.de Abstract Popular research
More informationLearning 6D Object Pose Estimation and Tracking
Learning 6D Object Pose Estimation and Tracking Carsten Rother presented by: Alexander Krull 21/12/2015 6D Pose Estimation Input: RGBD-image Known 3D model Output: 6D rigid body transform of object 21/12/2015
More information6-DOF Model Based Tracking via Object Coordinate Regression Supplemental Note
6-DOF Model Based Tracking via Object Coordinate Regression Supplemental Note Alexander Krull, Frank Michel, Eric Brachmann, Stefan Gumhold, Stephan Ihrke, Carsten Rother TU Dresden, Dresden, Germany The
More informationDeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material
DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material Yi Li 1, Gu Wang 1, Xiangyang Ji 1, Yu Xiang 2, and Dieter Fox 2 1 Tsinghua University, BNRist 2 University of Washington
More informationScene Coordinate and Correspondence Learning for Image-Based Localization
BUI ET AL.: SCENE COORDINATE AND CORRESPONDENCE LEARNING 1 Scene Coordinate and Correspondence Learning for Image-Based Localization Mai Bui 1 mai.bui@tum.de Shadi Albarqouni 1 shadi.albarqouni@tum.de
More informationarxiv: v3 [cs.cv] 13 Jul 2017
Random Forests versus Neural Networks What s Best for Camera Localization? arxiv:1609.05797v3 [cs.cv] 13 Jul 2017 Daniela Massiceti1, Alexander Krull2, Eric Brachmann2, Carsten Rother2 and Philip H.S.
More informationCamera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network
Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network Zakaria Laskar 1, Iaroslav Melekhov 1, Surya Kalia 2 and Juho Kannala 1 1 Aalto University, Finland, 2 IIT,
More informationSpatial Localization and Detection. Lecture 8-1
Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday
More informationFitting (LMedS, RANSAC)
Fitting (LMedS, RANSAC) Thursday, 23/03/2017 Antonis Argyros e-mail: argyros@csd.uoc.gr LMedS and RANSAC What if we have very many outliers? 2 1 Least Median of Squares ri : Residuals Least Squares n 2
More informationPerceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington
Perceiving the 3D World from Images and Videos Yu Xiang Postdoctoral Researcher University of Washington 1 2 Act in the 3D World Sensing & Understanding Acting Intelligent System 3D World 3 Understand
More information3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington
3D Object Recognition and Scene Understanding from RGB-D Videos Yu Xiang Postdoctoral Researcher University of Washington 1 2 Act in the 3D World Sensing & Understanding Acting Intelligent System 3D World
More informationThe Kinect Sensor. Luís Carriço FCUL 2014/15
Advanced Interaction Techniques The Kinect Sensor Luís Carriço FCUL 2014/15 Sources: MS Kinect for Xbox 360 John C. Tang. Using Kinect to explore NUI, Ms Research, From Stanford CS247 Shotton et al. Real-Time
More information6D Object Pose Estimation Binaries
6D Object Pose Estimation Binaries March 20, 2018 All data regarding our ECCV 14 paper can be downloaded from our project page: https://hci.iwr.uni-heidelberg.de/vislearn/research/ scene-understanding/pose-estimation/#eccv14.
More informationBB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth
BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth Mahdi Rad 1 Vincent Lepetit 1, 2 1 Institute for Computer Graphics and
More informationConvolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech
Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:
More informationDeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,
More informationReal-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
1 Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Victor A. Prisacariu, Luigi Di Stefano and
More informationDeep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany
Deep Incremental Scene Understanding Federico Tombari & Christian Rupprecht Technical University of Munich, Germany C. Couprie et al. "Toward Real-time Indoor Semantic Segmentation Using Depth Information"
More informationDeep Learning Based 3D Reconstruction of Indoor Scenes
Deep Learning Based 3D Reconstruction of Indoor Scenes Student Name: Srinidhi Hegde Roll Number: 2013164 BTP report submitted in partial fulfillment of the requirements for the Degree of B.Tech. in Computer
More informationarxiv: v1 [cs.cv] 28 Sep 2018
Extrinsic camera calibration method and its performance evaluation Jacek Komorowski 1 and Przemyslaw Rokita 2 arxiv:1809.11073v1 [cs.cv] 28 Sep 2018 1 Maria Curie Sklodowska University Lublin, Poland jacek.komorowski@gmail.com
More informationSegmentation. Bottom up Segmentation Semantic Segmentation
Segmentation Bottom up Segmentation Semantic Segmentation Semantic Labeling of Street Scenes Ground Truth Labels 11 classes, almost all occur simultaneously, large changes in viewpoint, scale sky, road,
More informationYiqi Yan. May 10, 2017
Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field
More informationarxiv: v3 [cs.cv] 14 Apr 2018
Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks Anh Nguyen 1, Thanh-Toan Do 2, Darwin G. Caldwell 1, and Nikos G. Tsagarakis 1 arxiv:1708.09011v3 [cs.cv] 14 Apr
More informationFast Edge Detection Using Structured Forests
Fast Edge Detection Using Structured Forests Piotr Dollár, C. Lawrence Zitnick [1] Zhihao Li (zhihaol@andrew.cmu.edu) Computer Science Department Carnegie Mellon University Table of contents 1. Introduction
More informationDepth from Stereo. Dominic Cheng February 7, 2018
Depth from Stereo Dominic Cheng February 7, 2018 Agenda 1. Introduction to stereo 2. Efficient Deep Learning for Stereo Matching (W. Luo, A. Schwing, and R. Urtasun. In CVPR 2016.) 3. Cascade Residual
More informationDeep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon
Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in
More informationDeep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia
Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky
More informationPredicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Presented by: Rex Ying and Charles Qi Input: A Single RGB Image Estimate
More informationEncoder-Decoder Networks for Semantic Segmentation. Sachin Mehta
Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:
More informationImage correspondences and structure from motion
Image correspondences and structure from motion http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2017, Lecture 20 Course announcements Homework 5 posted.
More informationArticulated Pose Estimation with Flexible Mixtures-of-Parts
Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:
More information08 An Introduction to Dense Continuous Robotic Mapping
NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy
More informationMask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma
Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left
More informationDisguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601
Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601 Introduction Face ID is complicated by alterations to an individual s appearance Beard,
More informationROBUST ESTIMATION TECHNIQUES IN COMPUTER VISION
ROBUST ESTIMATION TECHNIQUES IN COMPUTER VISION Half-day tutorial at ECCV 14, September 7 Olof Enqvist! Fredrik Kahl! Richard Hartley! Robust Optimization Techniques in Computer Vision! Olof Enqvist! Chalmers
More informationComputing the relations among three views based on artificial neural network
Computing the relations among three views based on artificial neural network Ying Kin Yu Kin Hong Wong Siu Hang Or Department of Computer Science and Engineering The Chinese University of Hong Kong E-mail:
More informationarxiv: v1 [cs.ro] 9 Mar 2018
c IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising
More informationAnalysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009
Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context
More informationJoint Object Detection and Viewpoint Estimation using CNN features
Joint Object Detection and Viewpoint Estimation using CNN features Carlos Guindel, David Martín and José M. Armingol cguindel@ing.uc3m.es Intelligent Systems Laboratory Universidad Carlos III de Madrid
More informationPSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.
PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. Rhodes 5/10/17 What is Machine Learning? Machine learning
More information3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing
3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)
More informationarxiv: v2 [cs.cv] 31 Jul 2017
: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization Ronald Clark 1, Sen Wang 1, Andrew Markham 1, Niki Trigoni 1, Hongkai Wen 2 1 University of Oxford, Oxford, OX1 3PA 2 University of Warwick,
More informationCMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful
More informationDECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS
DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS Deep Neural Decision Forests Microsoft Research Cambridge UK, ICCV 2015 Decision Forests, Convolutional Networks and the Models in-between
More informationLecture 7: Semantic Segmentation
Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr
More informationSynscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet.
Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet 7D Labs VINNOVA https://7dlabs.com Photo-realistic image synthesis
More informationECE 6554:Advanced Computer Vision Pose Estimation
ECE 6554:Advanced Computer Vision Pose Estimation Sujay Yadawadkar, Virginia Tech, Agenda: Pose Estimation: Part Based Models for Pose Estimation Pose Estimation with Convolutional Neural Networks (Deep
More informationClassification of objects from Video Data (Group 30)
Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time
More informationDirect Methods in Visual Odometry
Direct Methods in Visual Odometry July 24, 2017 Direct Methods in Visual Odometry July 24, 2017 1 / 47 Motivation for using Visual Odometry Wheel odometry is affected by wheel slip More accurate compared
More informationInstance-level recognition
Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction
More informationMulti-Output Learning for Camera Relocalization
Multi-Output Learning for Camera Relocalization Abner Guzman-Rivera Pushmeet Kohli Ben Glocker Jamie Shotton Toby Sharp Andrew Fitzgibbon Shahram Izadi Microsoft Research University of Illinois Imperial
More informationLearning Semantic Environment Perception for Cognitive Robots
Learning Semantic Environment Perception for Cognitive Robots Sven Behnke University of Bonn, Germany Computer Science Institute VI Autonomous Intelligent Systems Some of Our Cognitive Robots Equipped
More informationarxiv: v1 [cs.cv] 28 Sep 2018
Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,
More informationDeep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Chi Li, M. Zeeshan Zia 2, Quoc-Huy Tran 2, Xiang Yu 2, Gregory D. Hager, and Manmohan Chandraker 2 Johns
More informationDeep Learning: Image Registration. Steven Chen and Ty Nguyen
Deep Learning: Image Registration Steven Chen and Ty Nguyen Lecture Outline 1. Brief Introduction to Deep Learning 2. Case Study 1: Unsupervised Deep Homography 3. Case Study 2: Deep LucasKanade What is
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationFaster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects
More informationCS 231A Computer Vision (Winter 2018) Problem Set 3
CS 231A Computer Vision (Winter 2018) Problem Set 3 Due: Feb 28, 2018 (11:59pm) 1 Space Carving (25 points) Dense 3D reconstruction is a difficult problem, as tackling it from the Structure from Motion
More informationDeep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Introduction In this supplementary material, Section 2 details the 3D annotation for CAD models and real
More informationCS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep
CS395T paper review Indoor Segmentation and Support Inference from RGBD Images Chao Jia Sep 28 2012 Introduction What do we want -- Indoor scene parsing Segmentation and labeling Support relationships
More informationJakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich. LSD-SLAM: Large-Scale Direct Monocular SLAM
Computer Vision Group Technical University of Munich Jakob Engel LSD-SLAM: Large-Scale Direct Monocular SLAM Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich Monocular Video Engel,
More informationGesture Recognition: Hand Pose Estimation. Adrian Spurr Ubiquitous Computing Seminar FS
Gesture Recognition: Hand Pose Estimation Adrian Spurr Ubiquitous Computing Seminar FS2014 27.05.2014 1 What is hand pose estimation? Input Computer-usable form 2 Augmented Reality Gaming Robot Control
More informationHide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization
Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Krishna Kumar Singh and Yong Jae Lee University of California, Davis ---- Paper Presentation Yixian
More informationLEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION
LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION Kihwan Kim, Senior Research Scientist Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz CORRESPENDECES IN COMPUTER
More informationA Systems View of Large- Scale 3D Reconstruction
Lecture 23: A Systems View of Large- Scale 3D Reconstruction Visual Computing Systems Goals and motivation Construct a detailed 3D model of the world from unstructured photographs (e.g., Flickr, Facebook)
More informationImage-based Localization using Hourglass Networks
Image-based Localization using Hourglass Networks Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, and Esa Rahtu 2 Aalto University, Finland firstname.lastname@aalto.fi 2 Tampere University of Technology,
More informationDiscrete Optimization of Ray Potentials for Semantic 3D Reconstruction
Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Marc Pollefeys Joined work with Nikolay Savinov, Christian Haene, Lubor Ladicky 2 Comparison to Volumetric Fusion Higher-order ray
More informationDepth Estimation from a Single Image Using a Deep Neural Network Milestone Report
Figure 1: The architecture of the convolutional network. Input: a single view image; Output: a depth map. 3 Related Work In [4] they used depth maps of indoor scenes produced by a Microsoft Kinect to successfully
More informationCS381V Experiment Presentation. Chun-Chen Kuo
CS381V Experiment Presentation Chun-Chen Kuo The Paper Indoor Segmentation and Support Inference from RGBD Images. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. ECCV 2012. 50 100 150 200 250 300 350
More informationMachine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,
Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image
More informationDeep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D
Deep Learning for Virtual Shopping Dr. Jürgen Sturm Group Leader RGB-D metaio GmbH Augmented Reality with the Metaio SDK: IKEA Catalogue App Metaio: Augmented Reality Metaio SDK for ios, Android and Windows
More informationSingle Image Depth Estimation via Deep Learning
Single Image Depth Estimation via Deep Learning Wei Song Stanford University Stanford, CA Abstract The goal of the project is to apply direct supervised deep learning to the problem of monocular depth
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts
More informationSemantic Segmentation
Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,
More informationFrom 3D descriptors to monocular 6D pose: what have we learned?
ECCV Workshop on Recovering 6D Object Pose From 3D descriptors to monocular 6D pose: what have we learned? Federico Tombari CAMP - TUM Dynamic occlusion Low latency High accuracy, low jitter No expensive
More informationObject Detection on Self-Driving Cars in China. Lingyun Li
Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not
More informationDeep learning for object detection. Slides from Svetlana Lazebnik and many others
Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep
More informationSupervised Learning for Image Segmentation
Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.
More informationAn Exploration of Computer Vision Techniques for Bird Species Classification
An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex
More informationSemantic Match Consistency for Long-Term Visual Localization
Semantic Match Consistency for Long-Term Visual Localization Carl Toft 1, Erik Stenborg 1, Lars Hammarstrand 1, Lucas Brynte 1, Marc Pollefeys 2,3, Torsten Sattler 2, Fredrik Kahl 1 1 Department of Electrical
More informationDeep Models for 3D Reconstruction
Deep Models for 3D Reconstruction Andreas Geiger Autonomous Vision Group, MPI for Intelligent Systems, Tübingen Computer Vision and Geometry Group, ETH Zürich October 12, 2017 Max Planck Institute for
More informationJOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA
JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist
More informationInstance-aware Semantic Segmentation via Multi-task Network Cascades
Instance-aware Semantic Segmentation via Multi-task Network Cascades Jifeng Dai, Kaiming He, Jian Sun Microsoft research 2016 Yotam Gil Amit Nativ Agenda Introduction Highlights Implementation Further
More informationObject Detection by 3D Aspectlets and Occlusion Reasoning
Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition
More informationMobile Robotics. Mathematics, Models, and Methods. HI Cambridge. Alonzo Kelly. Carnegie Mellon University UNIVERSITY PRESS
Mobile Robotics Mathematics, Models, and Methods Alonzo Kelly Carnegie Mellon University HI Cambridge UNIVERSITY PRESS Contents Preface page xiii 1 Introduction 1 1.1 Applications of Mobile Robots 2 1.2
More informationObject Detection Based on Deep Learning
Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
More informationINFO0948 Fitting and Shape Matching
INFO0948 Fitting and Shape Matching Renaud Detry University of Liège, Belgium Updated March 31, 2015 1 / 33 These slides are based on the following book: D. Forsyth and J. Ponce. Computer vision: a modern
More informationHuman Pose Estimation with Deep Learning. Wei Yang
Human Pose Estimation with Deep Learning Wei Yang Applications Understand Activities Family Robots American Heist (2014) - The Bank Robbery Scene 2 What do we need to know to recognize a crime scene? 3
More informationInstance-level recognition
Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction
More informationDiffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Yaguang Li Joint work with Rose Yu, Cyrus Shahabi, Yan Liu Page 1 Introduction Traffic congesting is wasteful of time,
More informationCNN for Low Level Image Processing. Huanjing Yue
CNN for Low Level Image Processing Huanjing Yue 2017.11 1 Deep Learning for Image Restoration General formulation: min Θ L( x, x) s. t. x = F(y; Θ) Loss function Parameters to be learned Key issues The
More informationCS5670: Computer Vision
CS5670: Computer Vision Noah Snavely Lecture 33: Recognition Basics Slides from Andrej Karpathy and Fei-Fei Li http://vision.stanford.edu/teaching/cs231n/ Announcements Quiz moved to Tuesday Project 4
More informationLearning and Recognizing Visual Object Categories Without First Detecting Features
Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather
More informationLearning video saliency from human gaze using candidate selection
Learning video saliency from human gaze using candidate selection Rudoy, Goldman, Shechtman, Zelnik-Manor CVPR 2013 Paper presentation by Ashish Bora Outline What is saliency? Image vs video Candidates
More informationEuclidean Reconstruction Independent on Camera Intrinsic Parameters
Euclidean Reconstruction Independent on Camera Intrinsic Parameters Ezio MALIS I.N.R.I.A. Sophia-Antipolis, FRANCE Adrien BARTOLI INRIA Rhone-Alpes, FRANCE Abstract bundle adjustment techniques for Euclidean
More informationInstance-level recognition part 2
Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,
More informationDense Tracking and Mapping for Autonomous Quadrocopters. Jürgen Sturm
Computer Vision Group Prof. Daniel Cremers Dense Tracking and Mapping for Autonomous Quadrocopters Jürgen Sturm Joint work with Frank Steinbrücker, Jakob Engel, Christian Kerl, Erik Bylow, and Daniel Cremers
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More information