Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit

Feature Point-Based 3D Tracking

Feature Points for 3D Tracking Much less ambiguous than edges: the error is a point-to-point reprojection error, not point-to-line. More robust to occlusions than template matching. But it does not work for all objects.

Feature Point Extraction How to extract the same physical points?

Good Features to Track J. Shi and C. Tomasi, "Good Features to Track", CVPR'94. Defines a "cornerness" measure. Idea: look for the points x that are easy to match under a 2D translation.

Local Feature Detection: The Math Consider shifting the window W by (u,v): how do the pixels in W change? Compare each pixel before and after by summing up the squared differences.

Local Feature Detection: The Math Summing up the squared differences of pixel intensities: E(u,v) = \sum_{(x,y) \in W} \left( I(x+u,\, y+v) - I(x,y) \right)^2

Small Motion Assumption Taylor series expansion of I(x+u, y+v): I(x+u,\, y+v) \approx I(x,y) + u I_x + v I_y. If the motion (u,v) is small, then the first-order approximation is good. Plugging this into the formula from the previous slide:

Local Feature Detection: The Math E(u,v) \approx \sum_{(x,y) \in W} \left( u I_x + v I_y \right)^2 = \sum_{(x,y) \in W} [u\ v] \begin{bmatrix} I_x \\ I_y \end{bmatrix} [I_x\ I_y] \begin{bmatrix} u \\ v \end{bmatrix} = [u\ v] \left( \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}

Local Feature Detection: The Math E(u,v) \approx [u\ v] \left( \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}. We are looking for image locations (x,y) such that E(u,v) is large for all directions [u, v]. How is it related to H?

Local Feature Detection: The Math H = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} is a 2x2 symmetric matrix. It can be decomposed into: H = [x_+\ x_-] \begin{bmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{bmatrix} \begin{bmatrix} x_+^\top \\ x_-^\top \end{bmatrix}, where \lambda_+ and \lambda_- are the eigenvalues of H, and x_+ and x_- are the eigenvectors of H.

Local Feature Detection: The Math With H = [x_+\ x_-] \begin{bmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{bmatrix} \begin{bmatrix} x_+^\top \\ x_-^\top \end{bmatrix} and E(u,v) \approx [u\ v]\, H \begin{bmatrix} u \\ v \end{bmatrix}: x_+ is the direction for (u,v) of largest increase in E, and \lambda_+ the amount of increase in direction x_+; x_- is the direction for (u,v) of smallest increase in E, and \lambda_- the amount of increase in direction x_-.

Harris Cornerness Computation Compute the gradient images g_x and g_y; form the products (g_x)^2, (g_y)^2 and g_x g_y; smooth each of them with Gauss(.) to obtain the entries of H.

Cornerness measure: min(\lambda_1, \lambda_2). A point is a good feature to track when the smallest eigenvalue of H is large.
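
This computation maps directly to a few lines of Python. Below is a minimal sketch of the pipeline above (gradients, products, Gaussian smoothing, then the smallest eigenvalue of H in closed form); the Sobel kernel size and the smoothing sigma are illustrative choices, not values from the lecture:

```python
import cv2
import numpy as np

def min_eig_cornerness(gray, sigma=1.5):
    """Shi-Tomasi cornerness: the smallest eigenvalue of H at every pixel."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # g_x
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # g_y
    # Entries of H: the gradient products, each smoothed with Gauss(.)
    gxx = cv2.GaussianBlur(gx * gx, (0, 0), sigma)
    gyy = cv2.GaussianBlur(gy * gy, (0, 0), sigma)
    gxy = cv2.GaussianBlur(gx * gy, (0, 0), sigma)
    # Closed-form eigenvalues of the 2x2 symmetric matrix [[gxx, gxy], [gxy, gyy]]
    half_trace = 0.5 * (gxx + gyy)
    delta = np.sqrt((0.5 * (gxx - gyy)) ** 2 + gxy ** 2)
    return half_trace - delta  # min(lambda_1, lambda_2) per pixel
```

OpenCV also ships essentially this measure as cv2.cornerMinEigenVal (with a box window instead of a Gaussian one).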

Feature Point Tracking Two general approaches: the KLT (Kanade-Lucas-Tomasi) tracker: detection in the first frame, then tracking; or detection in every frame, then matching.

KLT: Kanade-Lucas-Tomasi Tracker [Shi & Tomasi CVPR'94] Detection ("Good Features to Track", CVPR'94). Tracking: make use of the Lucas-Kanade algorithm to minimize \sum_j \left( I_t(f(m_j; p_i + \Delta_i)) - T(m_j) \right)^2 → correlation measure: Sum of Squared Differences; → f: translation model or affine model. Monitoring the templates: stop tracking a point when the similarity becomes low (SSD above a threshold).
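
This pipeline maps directly onto OpenCV built-ins. The sketch below (detection in the first frame, pyramidal Lucas-Kanade tracking, and a simple residual-based monitoring test) is a minimal illustration; the video file name, detector parameters, and error threshold are assumed placeholder values:

```python
import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder video source
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Detection in the first frame ("Good Features to Track")
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade tracking of the detected points
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    # Monitoring: drop points with low similarity (residual above a threshold)
    keep = (status.ravel() == 1) & (err.ravel() < 20.0)
    pts = new_pts[keep].reshape(-1, 1, 2)
    prev_gray = gray
```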

Disadvantages of the KLT Tracker It loses all the features after a while. Potential solution: regularly re-detect feature points, but the new detections can be confused with features that are still being tracked. There is a better solution.

Detection + Matching in Every Frame Detecting in every frame; Matching consecutive frames.

Feature Point Matching Possible correlation measures. Sum of squared differences: C = \sum_{i=-di}^{+di} \sum_{j=-dj}^{+dj} \left( I_1(x+i,\, y+j) - I_2(x'+i,\, y'+j) \right)^2. Normalized cross-correlation: C = \sum_{i=-di}^{+di} \sum_{j=-dj}^{+dj} \frac{\left( I_1(x+i,\, y+j) - \bar{I}_1 \right)\left( I_2(x'+i,\, y'+j) - \bar{I}_2 \right)}{\sigma_1 \sigma_2}. [Figure: points 1-5 in the left and right images; correlation windows of half-size (di, dj) centered on (x, y) in I_1 and (x', y') in I_2.]

Feature Point Matching Cross-correlation measure: C = \sum_{i=-di}^{+di} \sum_{j=-dj}^{+dj} \frac{\left( I_1(x+i,\, y+j) - \bar{I}_1 \right)\left( I_2(x'+i,\, y'+j) - \bar{I}_2 \right)}{\sigma_1 \sigma_2}. It is invariant to affine changes of the lighting, and lies between -1 (completely different patches) and +1 (identical patches). In practice: accept matches when C > 0.8.
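
A direct NumPy transcription of this measure could look as follows; a minimal sketch, assuming two equal-sized grayscale patches (the flat-patch guard is an addition for numerical safety):

```python
import numpy as np

def ncc(patch1, patch2):
    """Normalized cross-correlation between two equal-sized patches.
    Returns a value in [-1, +1]; accept a match when C > 0.8."""
    a = patch1.astype(np.float64) - patch1.mean()
    b = patch2.astype(np.float64) - patch2.mean()
    denom = a.std() * b.std() * a.size
    if denom == 0:  # flat patch: correlation undefined
        return 0.0
    return float((a * b).sum() / denom)
```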

Feature Point Matching 1. For each point, search for the correspondent that maximizes the correlation; the search is limited to a region of interest centered on the point, and the best correspondent according to the correlation is retained. 2. Reverse the roles of the two images. 3. Keep the points that choose each other.
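
This mutual-choice (left-right consistency) check is a few lines of Python; a sketch, where the correlation table C is a hypothetical input holding the correlation between every point of image 1 and every point of image 2:

```python
import numpy as np

def mutual_matches(C):
    """C[i, j] = correlation between point i in image 1 and point j in image 2.
    Keep only the pairs that choose each other (left-right consistency)."""
    best12 = C.argmax(axis=1)  # best correspondent in image 2 for each i
    best21 = C.argmax(axis=0)  # best correspondent in image 1 for each j
    return [(i, j) for i, j in enumerate(best12) if best21[j] == i]
```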

Using Interest Points for 3D Tracking: Tracking Planes Simon et al., "Pose Estimation for Planar Structures", IEEE CG&A'02. For a plane: H_{w,t} = H_t \cdot H_{t-1} \cdots H_1 \cdot H_{w,0}. The homographies H_t are estimated from the matches between consecutive frames.
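
As a sketch of how this could be implemented with OpenCV (the function names and the RANSAC reprojection threshold are assumptions for illustration): estimate each H_t robustly from the matches, then chain the homographies:

```python
import cv2
import numpy as np

def frame_homography(pts_prev, pts_curr):
    """Estimate H_t from point matches between frames t-1 and t (RANSAC for robustness)."""
    H_t, inlier_mask = cv2.findHomography(np.float32(pts_prev),
                                          np.float32(pts_curr),
                                          cv2.RANSAC, 3.0)  # 3 px reprojection threshold
    return H_t

def homography_to_world(H_w_0, per_frame_Hs):
    """Chain the frame-to-frame homographies: H_{w,t} = H_t . H_{t-1} ... H_1 . H_{w,0}."""
    H = H_w_0
    for H_t in per_frame_Hs:
        H = H_t @ H
    return H
```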

Interest Point-Based Tracking Advantage: Robust to occlusions

Reference Frame-Based Tracking [Vacchetti et al., PAMI'04] Reference frames are images of the object, captured and registered offline.

Reference Frame-Based Tracking Method Frame at time t

Reference Frame-Based Tracking Method How to match points with points in a (registered) reference frame of the object? [Figure: the reference frame and the frame at time t.]

Wide Baseline Matching During the tracking we roughly know where the camera is. We re-render the reference frame from the viewpoint estimated at time t-1. This "re-rendered reference frame" is an intermediate image that can easily be matched with the current frame.

Reference Frame-Based Tracking Method It works, but it is not accurate → the virtual objects jitter.

In the reference frame-based tracking method, the successive frames were tracked independently of each other.

Stable Tracking Method Idea: track interest points on the object over the successive frames and use them to improve the accuracy of the camera registration.

Stable Tracking Method The tracked points are the projections of 3D points lying on the object surface → we should also optimize over these points' 3D positions.

Stable Tracking Method We also optimize not only the current camera position but also the previous ones. The problem becomes: minimize, over the camera positions up to time t and the 3D positions of the tracked points, the sum (reprojection errors of the tracked points) + (reprojection errors of the points matched with the reference frames), with the constraint that the tracked points lie on the object surface.

Stable Tracking Method 1. We consider only the current and the previous frames to keep computation times reasonable. 2. The optimization of the tracked points' 3D positions under the constraint that they lie on the object surface can easily be performed using a transfer function Ψ (Ying Shan et al., "Model-Based Bundle Adjustment with Application to Face Modeling", ICCV'01) → the 3D positions are not explicitly computed. [Figure: the object, the transfer function Ψ(n_i, m_i), the error to minimize, and the cameras at time t-1 and time t.]

Full Method

Results

Augmented Reality

Face Tracking Face assumed to be rigid; generic 3D model of the face; one reference frame built manually on a frontal view of the face; automatic reinitialisation using a 2D detection.

Vision-Based 3D Tracking

Recursive Tracking t = 0 t = 1 t = 2...

3D Object Detection Keypoint detection (Harris, extrema of the Laplacian, affine regions, ...); keypoint recognition (descriptor matching or classification); robust pose estimation (RANSAC + P3P, ...). Inputs: registered image(s) of the object to detect, and the input image.
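
The last step of this pipeline is available directly in OpenCV. A minimal sketch with solvePnPRansac, using the P3P solver inside a RANSAC loop (the reprojection-error threshold is an assumed value, and the matched 2D-3D points are hypothetical inputs):

```python
import cv2
import numpy as np

def detect_object_pose(model_pts_3d, image_pts_2d, K):
    """Robust pose from 2D-3D keypoint matches (RANSAC + P3P).
    model_pts_3d: Nx3 points on the registered model; image_pts_2d: Nx2 detections;
    K: 3x3 camera matrix. Returns rotation/translation vectors and inlier indices."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(model_pts_3d), np.float32(image_pts_2d), K, None,
        flags=cv2.SOLVEPNP_P3P, reprojectionError=4.0)
    return (rvec, tvec, inliers) if ok else None
```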

Keypoint-Based Object Detection

Step 1: Detection invariant to scale and rotation, or perspective transformation

Step 2: Patch rectification

Step 3: Build description vector

Step 4: Match description vectors

Feature Detector in SIFT: Invariant to Rotation and Scale

Scale-Space Theory Starting from the original image, apply successive convolutions with a Gaussian filter or a Gaussian derivative filter while increasing σ [Lindeberg 9*].

Laplacian of Gaussian The Laplacian operator (G_xx + G_yy) for feature point detection.

Fast Approximation of the Laplacian of Gaussian Convolution with the Laplacian of Gaussian is not separable, and is therefore slow. However, the Laplacian of Gaussian can be approximated by the difference of two Gaussians: G(σ) - G(σ').
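
The approximation is easy to check numerically. A small SciPy sketch using the standard relation G(kσ) - G(σ) ≈ (k-1) σ² ∇²G (the image, σ, and ratio k are placeholder values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

img = np.random.rand(128, 128)  # placeholder for a real grayscale image
sigma, k = 1.6, 1.25            # base scale and ratio between the two Gaussians

# Difference of two Gaussians: two separable (hence fast) convolutions
dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)

# Scale-normalized Laplacian of Gaussian; DoG ~ (k - 1) * sigma^2 * LoG
log = (k - 1.0) * sigma**2 * gaussian_laplace(img, sigma)

print(np.abs(dog - log).max())  # small compared to the response magnitudes
```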

Efficient Scale-Space Detection

Accurate Keypoint Localization Keypoint locations: extrema of the Difference-of-Gaussian in scale space, computed over a pyramid (blur, subtract, resample). Sub-pixel and sub-scale interpolation: the Taylor expansion of the DoG function D around a sample point \mathbf{x} = (x, y, \sigma)^\top is D(\mathbf{x}) \approx D + \frac{\partial D}{\partial \mathbf{x}}^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}. The offset of the extremum is \hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}} (use finite differences for the derivatives).
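
A NumPy sketch of this interpolation, with the gradient and Hessian computed by central finite differences as the slide suggests (the stack layout D[s, y, x] is an assumed convention):

```python
import numpy as np

def subpixel_offset(D, x, y, s):
    """Offset of a DoG extremum at integer location (x, y, scale index s):
    x_hat = -(d2D/dx2)^-1 (dD/dx); D is the DoG stack indexed as D[s, y, x]."""
    # Gradient dD/dx by central differences
    g = 0.5 * np.array([D[s, y, x+1] - D[s, y, x-1],
                        D[s, y+1, x] - D[s, y-1, x],
                        D[s+1, y, x] - D[s-1, y, x]])
    # Hessian d2D/dx2 by central differences
    dxx = D[s, y, x+1] - 2*D[s, y, x] + D[s, y, x-1]
    dyy = D[s, y+1, x] - 2*D[s, y, x] + D[s, y-1, x]
    dss = D[s+1, y, x] - 2*D[s, y, x] + D[s-1, y, x]
    dxy = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1] - D[s, y-1, x+1] + D[s, y-1, x-1])
    dxs = 0.25 * (D[s+1, y, x+1] - D[s+1, y, x-1] - D[s-1, y, x+1] + D[s-1, y, x-1])
    dys = 0.25 * (D[s+1, y+1, x] - D[s+1, y-1, x] - D[s-1, y+1, x] + D[s-1, y-1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    return -np.linalg.solve(H, g)  # (dx, dy, dsigma) sub-pixel/sub-scale offset
```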

Results

Affine Region Detectors: Invariant to Affine Transformations

Harris-Affine & Hessian-Affine Region Detector Harris-Affine: uses the auto-correlation matrix as in the classic Harris detector: M = \begin{bmatrix} Gauss(.) * I_x^2 & Gauss(.) * (I_x I_y) \\ Gauss(.) * (I_x I_y) & Gauss(.) * I_y^2 \end{bmatrix}. Local maxima of the smallest eigenvalue indicate the presence of a corner. Hessian-Affine: considers the Hessian matrix: H = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}. Local maxima of the determinant or of the smallest eigenvalue indicate the presence of a blob structure.

Scale Selection Both the Harris-Affine and the Hessian-Affine detectors use the Laplacian to select the "characteristic" scale: the scale σ at which the normalized response σ^2 Lap(x, σ) reaches a local maximum.

Affine Transformation Estimation Warp by the affine transformation M^{1/2}, where M is the auto-correlation matrix.

Harris-Affine & Hessian-Affine Region Detector Algorithm: 1. Detect the initial region with the Harris or Hessian detector and select the scale; 2. Estimate the shape with the second moment matrix (= auto-correlation matrix); 3. Normalize the affine region to a circular one; 4. Go to step 2 if the eigenvalues of the second moment matrix for the new point are not equal.

Maximally Stable Extremal Region (MSER) Detector Binary thresholding with thresholds from 0 to 255; regions that remain unchanged over a large range of thresholds are kept.
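
OpenCV exposes an implementation of this detector; a minimal usage sketch (the image file name is a placeholder):

```python
import cv2

gray = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
mser = cv2.MSER_create()  # sweeps the binary threshold internally
regions, bboxes = mser.detectRegions(gray)
# Each region is the pixel list of an extremal region that stayed
# (nearly) unchanged over a large range of thresholds
print(len(regions), "maximally stable regions")
```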

Affine Normalization Warp by M_1^{1/2}; warp by M_2^{1/2}. We still have to correct for the orientation!

Select Canonical Orientation Create a histogram of local gradient directions (over [0, 2π]) computed over the image patch; each gradient contributes its norm, weighted by its distance to the patch center; assign the canonical orientation at the peak of the smoothed histogram.

SIFT Description Vector Made of local histograms of gradients. In practice: 8 orientations x 4 x 4 histograms = a 128-dimensional vector.
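
With recent OpenCV builds the full detector plus descriptor is a few lines; a usage sketch (the image file name is a placeholder):

```python
import cv2

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)  # (num keypoints, 128): 8 orientations x 4x4 histograms
```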

Handling Lighting Changes Additive brightness offsets do not affect the gradients; normalizing the vector to unit length removes contrast (gain) changes; saturation affects the gradient magnitudes much more than the orientations, so the magnitudes are thresholded.

Standard Approach Step 4: Match description vectors

Matching: Approximate Nearest Neighbour Best-Bin-First: approximate nearest-neighbour search in a k-d tree for a query descriptor q.
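
FLANN's randomized k-d trees provide this kind of best-bin-first, approximate search. A sketch with OpenCV's FLANN matcher plus Lowe's ratio test; desc1 and desc2 stand in for the float32 SIFT descriptors of the two images, and the index/search parameters are illustrative:

```python
import cv2
import numpy as np

# desc1, desc2: float32 SIFT descriptor matrices of the two images (see the
# SIFT sketch above); random placeholders here so the snippet runs standalone
desc1 = np.random.rand(200, 128).astype(np.float32)
desc2 = np.random.rand(300, 128).astype(np.float32)

FLANN_INDEX_KDTREE = 1
flann = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=4),
                              dict(checks=64))  # leaves visited: the "best bins"
matches = flann.knnMatch(desc1, desc2, k=2)

# Lowe's ratio test: keep a match only when it is clearly better than the runner-up
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(len(good), "matches kept")
```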