
CS 1674: Intro to Computer Vision Midterm Review Prof. Adriana Kovashka University of Pittsburgh October 10, 2016

Reminders: The midterm exam is in class this coming Wednesday. There will be no make-up exams unless you or a close relative is seriously ill!

Review requests I received (with number of requests):
- Textures and texture representations; image responses to size and orientation of Gaussian filter banks; comparisons (4)
- Corner detection algorithm, Harris (4)
- Invariance vs. covariance, affine intensity change, and applications to know (3)
- Scale-invariant detection, blob detection, Harris automatic scale selection (3)
- SIFT and feature description (3)
- Keypoint matching algorithm, feature matching (2)
- Examples of how to compute and apply a homography; epipolar geometry (2)
- Why it makes sense to use the ratio (distance to best match / distance to second-best match) when matching features across images
- Summary of equations students need to know
- Pyramids
- Convolution: practical use
- Filters for transforming the image

Transformations, Homographies, Epipolar Geometry

2D Linear Transformations

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

Only linear 2D transformations can be represented with a 2x2 matrix. Linear transformations are combinations of scale, rotation, shear, and mirror. Alyosha Efros

2D Affine Transformations Affine transformations are combinations of linear transformations and translations. They map lines to lines, and parallel lines remain parallel:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

Adapted from Alyosha Efros

Projective Transformations Projective transformations are combinations of affine transformations and projective warps. Parallel lines do not necessarily remain parallel:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

Kristen Grauman

How do we stitch together a panorama (a.k.a. mosaic)? Basic procedure: take a sequence of images from the same position, rotating the camera about its optical center. Compute the homography (transformation) between the second image and the first. Transform the second image to overlap with the first. Blend the two together to create a mosaic. (If there are more images, repeat.) Modified from Steve Seitz

Computing the homography Given pairs of corresponding points in the two images, (x_1, y_1) ↔ (x'_1, y'_1), ..., (x_n, y_n) ↔ (x'_n, y'_n), we need to set up an equation where the parameters of H are the unknowns. Kristen Grauman

Computing the homography

$$\mathbf{p}' = H\mathbf{p}: \qquad \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Can set the scale factor i = 1, so there are 8 unknowns. Set up a system of linear equations Ah = b, where the vector of unknowns is h = [a, b, c, d, e, f, g, h]^T. Need at least 8 equations, but the more the better. Solve for h; if overconstrained, solve using least squares: min ||Ah − b||^2. Kristen Grauman
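
A minimal MATLAB sketch of this least-squares setup (assuming A and b have already been assembled from the point correspondences; variable names are illustrative):

```matlab
% Solve the overconstrained linear system Ah = b in the least-squares
% sense; MATLAB's backslash operator minimizes ||A*h - b||^2.
h = A \ b;                    % h = [a b c d e f g h]'
H = reshape([h; 1], 3, 3)';   % append the fixed scale i = 1, reshape to 3x3
```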

Computing the homography Assume we have four matched points: how do we compute the homography H? Write p' = Hp with

$$H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}, \qquad \mathbf{h} = [h_1, \ldots, h_9]^T, \qquad \mathbf{p}' = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix}$$

and stack two equations per correspondence:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x'x & -x'y & -x' \\ 0 & 0 & 0 & x & y & 1 & -y'x & -y'y & -y' \end{bmatrix}\mathbf{h} = \mathbf{0}$$

Apply SVD: A = UDV^T ([U, S, V] = svd(A)); take h = the column of V corresponding to the smallest singular value. Derek Hoiem
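
A hedged MATLAB sketch of this SVD route (assumed inputs: pts1 and pts2 are N-by-2 arrays of corresponding (x, y) points, N >= 4):

```matlab
N = size(pts1, 1);
A = zeros(2*N, 9);                      % two equations per correspondence
for i = 1:N
    x  = pts1(i,1);  y  = pts1(i,2);    % point in image 1
    xp = pts2(i,1);  yp = pts2(i,2);    % matching point in image 2
    A(2*i-1,:) = [x y 1 0 0 0 -xp*x -xp*y -xp];
    A(2*i,:)   = [0 0 0 x y 1 -yp*x -yp*y -yp];
end
[U, S, V] = svd(A);
h = V(:, end);                          % column of V for smallest singular value
H = reshape(h, 3, 3)';                  % homography, defined up to scale
```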

Transforming the second image To apply a given homography H to a test point (x, y) on the image 1 canvas: compute p' = Hp (a regular matrix multiply), giving p' = (wx', wy', w); then convert p' from homogeneous to image coordinates, (x', y') = (wx'/w, wy'/w). Modified from Kristen Grauman
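
A three-line sketch of applying H to a test point (x and y are illustrative variable names):

```matlab
p  = [x; y; 1];          % test point in homogeneous coordinates
pp = H * p;              % pp = [w*x'; w*y'; w], a regular matrix multiply
xy = pp(1:2) / pp(3);    % divide by w: image coordinates (x', y')
```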

Transforming the second image Forward warping: send each pixel f(x, y) to its corresponding location (x', y') = H(x, y) in the right image g(x', y'). Modified from Alyosha Efros

Depth from disparity We have two images I(x, y) and I'(x', y') taken from cameras with different intrinsic and extrinsic parameters. How do we match a point in the first image to a point in the second? If we could find the corresponding points in the two images, we could compute a disparity map D(x, y) and estimate relative depth. Kristen Grauman

Epipolar geometry: notation Baseline: the line connecting the two camera centers. Epipoles: the intersections of the baseline with the image planes (equivalently, the projections of the other camera center). Epipolar plane: a plane containing the baseline. Epipolar lines: the intersections of an epipolar plane with the image planes (they always come in corresponding pairs). Note: all epipolar lines in an image intersect at its epipole. Derek Hoiem

Epipolar constraint The epipolar constraint is useful because it reduces the correspondence problem to a 1D search along an epipolar line. Kristen Grauman, image from Andrew Zisserman

Essential matrix Start from the coplanarity constraint X' · (T × RX) = 0, i.e., X'^T [T_×] R X = 0. Let E = [T_×] R; then X'^T E X = 0. E is called the essential matrix, and it relates corresponding image points between both cameras, given the rotation and translation. Before we said: if we observe a point in one image, its position in the other image is constrained to lie on the line defined by the relation above. It turns out EX is the epipolar line through x' in the other image, corresponding to x. Note: these points are in camera coordinate systems. Kristen Grauman

Basic stereo matching algorithm For each pixel x in the first image: find the corresponding epipolar scanline in the right image; search along the epipolar line and pick the best match x'; compute the disparity x − x' and set depth(x) = fT/(x − x'). Derek Hoiem

Correspondence search Slide a window along the right scanline and compare the contents of that window with the reference window in the left image; the disparity with the lowest matching cost wins. Matching cost: e.g., Euclidean distance (SSD), as in the sketch below. Derek Hoiem
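
A sketch of window-based correspondence search for a rectified pair (assumptions: grayscale double images imL and imR, window half-width w, maximum disparity dmax):

```matlab
[h, W] = size(imL);
D = zeros(h, W);                              % disparity map
for y = 1+w : h-w
    for x = 1+w : W-w
        ref = imL(y-w:y+w, x-w:x+w);          % reference window in left image
        best = inf;  bestd = 0;
        for d = 0 : min(dmax, x-w-1)          % slide along the right scanline
            cand = imR(y-w:y+w, x-d-w:x-d+w);
            cost = sum((ref(:) - cand(:)).^2);% SSD matching cost
            if cost < best, best = cost; bestd = d; end
        end
        D(y, x) = bestd;                      % disparity x - x'
    end
end
```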

Geometry for a simple stereo system Assume parallel optical axes and known camera parameters (i.e., calibrated cameras). What is the expression for Z? Similar triangles (p_l, P, p_r) and (O_l, P, O_r) give

$$\frac{T + x_r - x_l}{Z - f} = \frac{T}{Z} \qquad\Rightarrow\qquad Z = f\,\frac{T}{x_l - x_r}$$

so depth is inversely proportional to disparity. Kristen Grauman
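
Given the disparity map D from the sketch above, depth follows in one line (f in pixels, baseline T in world units; zero disparities are masked to avoid division by zero):

```matlab
Z = f * T ./ max(D, eps);   % depth is inversely proportional to disparity
```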

Results with window search On sample data (left and right images), window-based matching is noticeably noisier than the ground-truth disparity. Derek Hoiem

How can we improve? Uniqueness: for any point in one image, there should be at most one matching point in the other image. Ordering: corresponding points should be in the same order in both views. Smoothness: we expect disparity values to change slowly (for the most part). Derek Hoiem

Many of these constraints can be encoded in an energy function and solved using graph cuts, which brings the result much closer to ground truth. Y. Boykov, O. Veksler, and R. Zabih, "Fast Approximate Energy Minimization via Graph Cuts," PAMI 2001. For the latest and greatest: http://vision.middlebury.edu/stereo/ Derek Hoiem

Projective structure from motion Given: m images of n fixed 3D points, with x_ij = P_i X_j for i = 1, ..., m and j = 1, ..., n. Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn corresponding 2D points x_ij. Svetlana Lazebnik

Photosynth Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," SIGGRAPH 2006. http://photosynth.net/

3D from multiple images Building Rome in a Day: Agarwal et al. 2009

Recap: Epipoles Point x in the left image corresponds to epipolar line l' in the right image. The epipolar line passes through the epipole (the intersection of the cameras' baseline with the image plane). Derek Hoiem

Recap: Essential, Fundamental Matrices The fundamental matrix maps from a point in one image to a line in the other. If x and x' correspond to the same 3D point X, then x'^T F x = 0. The essential matrix is like the fundamental matrix but more constrained. Adapted from Derek Hoiem

Recap: stereo with calibrated cameras Given an image pair and R, T: detect some features; compute the essential matrix E; match features using the epipolar and other constraints; triangulate for 3D structure and get depth. Kristen Grauman

Texture representations

Correlation filtering Say the averaging window size is (2k+1) x (2k+1). First attribute a uniform weight to each pixel and loop over all pixels in the neighborhood around image pixel F[i, j]:

$$G[i,j] = \frac{1}{(2k+1)^2}\sum_{u=-k}^{k}\sum_{v=-k}^{k} F[i+u, j+v]$$

Now generalize to allow different weights depending on the neighboring pixel's relative position (non-uniform weights):

$$G[i,j] = \sum_{u=-k}^{k}\sum_{v=-k}^{k} H[u,v]\,F[i+u, j+v]$$

Kristen Grauman

Convolution vs. correlation Cross-correlation: $G[i,j] = \sum_{u,v} H[u,v]\,F[i+u, j+v]$. Convolution: $G[i,j] = \sum_{u,v} H[u,v]\,F[i-u, j-v]$, i.e., the kernel is flipped in both dimensions before the sliding sum. (The slide's example correlates and convolves a toy image F with the 3x3 Gaussian kernel H = [.06 .12 .06; .12 .25 .12; .06 .12 .06]; because that kernel is symmetric, the two results agree.)
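
A toy comparison of the two operations in base MATLAB (filter2 correlates, conv2 convolves; the kernel here is deliberately non-symmetric so the results differ):

```matlab
F = magic(5);                         % toy "image"
H = [1 2 1; 0 0 0; -1 -2 -1];         % a non-symmetric kernel
G_corr = filter2(H, F, 'same');       % cross-correlation
G_conv = conv2(F, H, 'same');         % convolution (kernel flipped first)
% Correlation with H equals convolution with the doubly-flipped kernel:
isequal(G_corr, conv2(F, rot90(H, 2), 'same'))   % should return true
% For a symmetric kernel (e.g., a Gaussian) the two operations coincide.
```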

Filters for computing gradients Convolving the image with a derivative filter such as the Sobel kernel [1 2 1; 0 0 0; -1 -2 -1] produces a gradient image. Slide credit: Derek Hoiem
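
A sketch of gradient computation with these filters (assumes a grayscale image im of type double):

```matlab
sobel_y = [1 2 1; 0 0 0; -1 -2 -1];   % responds to horizontal edges (d/dy)
sobel_x = sobel_y';                   % responds to vertical edges (d/dx)
Ix = conv2(im, sobel_x, 'same');
Iy = conv2(im, sobel_y, 'same');
mag = sqrt(Ix.^2 + Iy.^2);            % gradient magnitude
ori = atan2(Iy, Ix);                  % gradient orientation
```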

Texture representation: example Starting from the original image, compute derivative filter responses, square them, and compute statistics that summarize the patterns in small windows:

Window    mean d/dx value    mean d/dy value
Win. #1          4                 10
Win. #2         18                  7
...
Win. #9         20                 20

Kristen Grauman

Filter banks What filters to put in the bank? Typically we want a combination of scales and orientations, and different types of patterns: edges, bars, and spots. Matlab code available for these examples: http://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html Kristen Grauman

Matching with filters Goal: find the eye template in the image. Method 0: filter the image with the eye patch itself, $g[m,n] = \sum_{k,l} h[k,l]\,f[m+k, n+l]$, where f is the image and h is the filter. What went wrong? The filtered image responds most strongly wherever the image is bright, not where it looks like an eye. Derek Hoiem

Matching with filters Method 1: filter the image with a zero-mean eye template, $g[m,n] = \sum_{k,l} \big(h[k,l] - \mathrm{mean}(h)\big)\,f[m+k, n+l]$. This likes bright pixels where the filter is above average and dark pixels where the filter is below average; thresholding the (scaled) filtered image yields true detections along with some false detections. Derek Hoiem
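
A sketch of Method 1 (assumed inputs: grayscale double image im and template patch h; the threshold is illustrative):

```matlab
h0 = h - mean(h(:));                             % zero-mean template
response = filter2(h0, im, 'same');              % cross-correlation
detections = response > 0.9 * max(response(:));  % keep strongest responses
```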

Showing magnitude of responses. Kristen Grauman


Representing texture by the mean absolute response to each filter in the bank. Derek Hoiem

Computing distances using texture Each descriptor is a point in feature space (dimension 1, dimension 2, ...). The distance between descriptors a and b is

$$D(a,b) = \sqrt{(a_1-b_1)^2 + (a_2-b_2)^2}\,, \qquad\text{or in general}\qquad D(a,b) = \sqrt{\sum_{i=1}^{\#\mathrm{dim}} (a_i - b_i)^2}$$

Kristen Grauman
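
In code the distance is one line (a and b are the assumed texture descriptors, e.g., vectors of mean absolute filter-bank responses):

```matlab
D = sqrt(sum((a - b).^2));   % Euclidean distance between texture descriptors
```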

Feature detection: Harris

Corners as distinctive interest points We should easily recognize the keypoint by looking through a small window: shifting the window in any direction should give a large change in intensity. "Flat" region: no change in any direction. "Edge": no change along the edge direction. "Corner": significant change in all directions. A. Efros, D. Frolova, D. Simakov

Harris Detector: Mathematics Window-averaged squared change of intensity induced by shifting the image data by [u, v]:

$$E(u,v) = \sum_{x,y} w(x,y)\,\big[I(x+u, y+v) - I(x,y)\big]^2$$

where w(x, y) is the window function (1 in the window and 0 outside, or a Gaussian), I(x+u, y+v) is the shifted intensity, and I(x, y) is the intensity. D. Frolova, D. Simakov

Harris Detector: Mathematics Expanding I(x, y) in a Taylor series, we have, for small shifts [u, v], a quadratic approximation to the error surface between a patch and itself shifted by [u, v]:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

where M is a 2x2 matrix computed from image derivatives. D. Frolova, D. Simakov

Harris Detector: Mathematics

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

Notation: $I_x = \frac{\partial I}{\partial x}$, $I_y = \frac{\partial I}{\partial y}$, $I_x I_y = \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}$. K. Grauman

What does the matrix M reveal? Since M is symmetric, we can diagonalize it:

$$M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^T, \qquad M x_i = \lambda_i x_i$$

The eigenvalues of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window. K. Grauman

Corner response function "Edge": λ1 >> λ2 (or λ2 >> λ1). "Corner": λ1 and λ2 are both large, λ1 ~ λ2. "Flat" region: λ1 and λ2 are small. Adapted from A. Efros, D. Frolova, D. Simakov, K. Grauman

Harris Detector: Algorithm Compute image gradients I_x and I_y for all pixels. For each pixel, compute M by looping over its neighbors x, y, and compute the corner response $R = \det(M) - k\,\mathrm{trace}(M)^2$ (k: empirical constant, k = 0.04-0.06). Find points with a large corner response (R > threshold), and take the points of locally maximum R as the detected feature points (i.e., pixels where R is bigger than at all 4 or 8 neighbors). D. Frolova, D. Simakov
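
A compact MATLAB sketch of the whole algorithm (assumptions: grayscale double image im, a hand-built 5x5 Gaussian window, k = 0.05, an illustrative threshold; non-maximum suppression is left out):

```matlab
Ix = conv2(im, [-1 0 1],  'same');           % image gradients
Iy = conv2(im, [-1 0 1]', 'same');
[u, v] = meshgrid(-2:2, -2:2);               % 5x5 Gaussian window, sigma = 1
w = exp(-(u.^2 + v.^2) / 2);  w = w / sum(w(:));
Sxx = conv2(Ix.^2,  w, 'same');              % windowed entries of M
Syy = conv2(Iy.^2,  w, 'same');
Sxy = conv2(Ix.*Iy, w, 'same');
k = 0.05;
R = (Sxx.*Syy - Sxy.^2) - k*(Sxx + Syy).^2;  % det(M) - k*trace(M)^2
corners = R > 0.01 * max(R(:));              % then keep local maxima of R
```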

Example of Harris application. K. Grauman

Feature detection: Scale-invariance

Invariance vs covariance "A function is invariant under a certain family of transformations if its value does not change when a transformation from this family is applied to its argument. A function is covariant when it commutes with the transformation, i.e., applying the transformation to the argument of the function has the same effect as applying the transformation to the output of the function. [...] [For example,] the area of a 2D surface is invariant under 2D rotations, since rotating a 2D surface does not make it any smaller or bigger. But the orientation of the major axis of inertia of the surface is covariant under the same family of transformations, since rotating a 2D surface will affect the orientation of its major axis in exactly the same way." Local Invariant Feature Detectors: A Survey, by Tinne Tuytelaars and Krystian Mikolajczyk, Foundations and Trends in Computer Graphics and Vision, Vol. 3, No. 3 (2007), 177-280. Chapters 1, 3.2, 7. http://homes.esat.kuleuven.be/%7etuytelaa/ft_survey_interestpoints08.pdf

What happens if: Affine intensity change Only derivatives are used, so the response is invariant to an intensity shift I → I + b. Intensity scaling I → aI scales the response R as well, so points whose response clears a fixed threshold can change. The detector is therefore only partially invariant to an affine intensity change I → aI + b. L. Lazebnik

What happens if: Image translation Derivatives and the window function are shift-invariant, so corner location is covariant w.r.t. translation. L. Lazebnik

What happens if: Image rotation The second moment ellipse rotates, but its shape (i.e., its eigenvalues) remains the same, so corner location is covariant w.r.t. rotation. L. Lazebnik

What happens if: Scaling When the image is scaled up, every point within the small window looks like an edge, so all points will be classified as edges and the corner is missed: corner location is not covariant to scaling! L. Lazebnik

Scale Invariant Detection Problem: How do we choose corresponding circles independently in each image? Do objects in the image have a characteristic scale that we can identify? D. Frolova, D. Simakov

Scale Invariant Detection Solution: design a function f on the region which is scale invariant (it has the same shape even if the image is resized), and take a local maximum of this function. The region sizes s1 and s2 at which f peaks in image 1 and image 2 then correspond (e.g., when image 2 shows the same content at scale 1/2). Adapted from A. Torralba

Automatic Scale Selection Compute the function responses for increasing scale (the "scale signature") and pick the scale at which the response peaks; for corresponding points in two images,

$$f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I'_{i_1 \ldots i_m}(x', \sigma'))$$

K. Grauman, B. Leibe

What Is A Useful Signature Function? Laplacian of Gaussian = blob detector K. Grauman, B. Leibe

Difference of Gaussian ≈ Laplacian We can approximate the Laplacian with a difference of Gaussians, which is more efficient to implement:

$$L = \sigma^2\big(G_{xx}(x,y,\sigma) + G_{yy}(x,y,\sigma)\big) \quad\text{(Laplacian)}$$

$$DoG = G(x,y,k\sigma) - G(x,y,\sigma) \quad\text{(Difference of Gaussians)}$$
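
A sketch of the approximation (assumes a grayscale double image im and the Image Processing Toolbox's imgaussfilt; sigma and k are the usual SIFT-style values):

```matlab
sigma = 1.6;  k = sqrt(2);
G1  = imgaussfilt(im, sigma);      % image blurred with G(x, y, sigma)
G2  = imgaussfilt(im, k * sigma);  % image blurred with G(x, y, k*sigma)
DoG = G2 - G1;                     % approximates the scale-normalized Laplacian
```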

Difference of Gaussian: Efficient computation Compute in a Gaussian scale pyramid: blur repeatedly within each octave, then sample with step σ⁴ = 2 to form the next (half-size, then quarter-size) level from the original image. K. Grauman, B. Leibe

Find local maxima in the position-scale space of the Difference-of-Gaussian: find places where a response is greater than all of its neighbors in both position and scale. The output is a list of (x, y, s) triples. Adapted from K. Grauman, B. Leibe

Laplacian pyramid example Allows detection of increasingly coarse detail

Results: Difference-of-Gaussian K. Grauman, B. Leibe

Feature description

Gradients Magnitude m(x, y) = sqrt(I_x^2 + I_y^2) and orientation Θ(x, y) = atan(I_y / I_x); e.g., for a pixel with I_x = 1 and I_y = 0, m(x, y) = sqrt(1 + 0) = 1 and Θ(x, y) = atan(0/1) = 0.

Scale Invariant Feature Transform Full version: Divide the 16x16 window around the keypoint into a 4x4 grid of cells (the original figure shows the 2x2 case). Quantize the gradient orientations, i.e., snap each gradient to one of 8 angles. Each gradient contributes not just 1, but magnitude(gradient) to the histogram, i.e., stronger gradients contribute more. 16 cells * 8 orientations = a 128-dimensional descriptor for each detected feature. Adapted from L. Zitnick, D. Lowe

Scale Invariant Feature Transform Full version, continued: finally, normalize the descriptor to unit length, clip (threshold) entries above 0.2, and normalize again, so that afterwards ||d|| = 1 and every entry satisfies d_i ≤ 0.2. Adapted from L. Zitnick, D. Lowe
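
The normalize-clip-renormalize step as a three-line sketch (d is the assumed 128-dimensional descriptor vector):

```matlab
d = d / norm(d);   % normalize to unit length
d = min(d, 0.2);   % clip entries above 0.2 (limits influence of large gradients)
d = d / norm(d);   % renormalize
```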

Making the descriptor rotation invariant Rotate each patch according to its dominant gradient orientation; this puts the patches into a canonical orientation. K. Grauman; image from Matthew Brown (CSE 576: Computer Vision)

Keypoint matching

Matching local features To generate candidate matches between image 1 and image 2, find patches that have the most similar appearance (e.g., the lowest feature Euclidean distance). Simplest approach: compare them all and take the closest (or the closest k, or all within a thresholded distance). K. Grauman

Robust matching At what Euclidean distance value do we have a good match? To add robustness to matching, we can consider the ratio: distance to best match / distance to second-best match. If it is low, the first match looks good; if it is high, the match could be ambiguous. K. Grauman

Ratio: example Let q be the query from the first image, d1 be the closest match in the second image, and d2 be the second closest match Let dist(q, d1) and dist(q, d2) be the distances Let r = dist(q, d1) / dist(q, d2) What is the largest that r can be? What is the lowest that r can be? If r is 1, what do we know about the two distances? What about when r is 0.1?
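
A sketch of ratio-test matching over whole descriptor sets (assumed inputs: desc1 is N1-by-128 and desc2 is N2-by-128, one descriptor per row; 0.8 is Lowe's suggested cutoff; bsxfun keeps it compatible with pre-R2016b MATLAB):

```matlab
matches = zeros(0, 2);
for i = 1:size(desc1, 1)
    d = sqrt(sum(bsxfun(@minus, desc2, desc1(i,:)).^2, 2));  % all distances
    [ds, idx] = sort(d);
    if ds(1) / ds(2) < 0.8          % best / second-best: unambiguous match
        matches(end+1, :) = [i, idx(1)];                      %#ok<AGROW>
    end
end
```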

Indexing local features: Setup When we see close points in feature space (the descriptors' feature space), we have similar descriptors, which indicates similar local content between the query image and the database images. K. Grauman

Image matching

Describing images w/ visual words Summarize the entire image based on its distribution (histogram) of visual word occurrences, i.e., the number of times each word appears among the image's feature patches. Analogous to the bag-of-words representation commonly used for documents. K. Grauman

Bag of visual words: Two uses 1. Represent the image. 2. Using that representation, look for similar images. (BOW can also be used to compute an inverted index, which speeds up use #2.)

Visual words: main idea Extract some local features from a number of images (e.g., SIFT). In descriptor space, each point is 128-dimensional. D. Nister, CVPR 2006


Quantize the space by grouping (clustering) the features. Note: for now, we'll treat clustering as a black box. D. Nister, CVPR 2006

Inverted file index and bag-of-words similarity 1. (Offline) Extract features in the database images, cluster them to find the words, and build the index. 2. Extract words in the query (extract features and map each to the closest cluster center). 3. Use the inverted file index to find frames relevant to the query. 4. Rank the relevant frames by comparing the word counts (BOW histograms) of query and frame. A sketch of steps 1 and 3 follows. Adapted from K. Grauman
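
A minimal sketch of steps 1 and 3 with containers.Map (assumed inputs: words{f} is a row vector of visual-word ids observed in database frame f, and qwords is a row vector of word ids in the query):

```matlab
index = containers.Map('KeyType', 'double', 'ValueType', 'any');
for f = 1:numel(words)                     % offline: build the inverted file
    for w = unique(words{f})
        if isKey(index, w), index(w) = [index(w), f];
        else,               index(w) = f;
        end
    end
end
candidates = [];                           % query: frames sharing any word
for w = unique(qwords)
    if isKey(index, w), candidates = [candidates, index(w)]; end
end
candidates = unique(candidates);           % then rank these by BOW similarity
```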

Scoring retrieval quality Query against a database of 10 images, of which 5 are relevant (e.g., images of the Golden Gate). For the ordered results: precision = # returned relevant / # returned; recall = # returned relevant / # total relevant. Plotting precision against recall as we move down the ranked list gives the precision-recall curve. Ondrej Chum
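
A worked example under the setup above, with a hypothetical ranking (1 = relevant):

```matlab
rel = [1 0 1 1 0 1 0 0 1 0];                 % hypothetical ordered results
precision = cumsum(rel) ./ (1:numel(rel));   % precision after each result
recall    = cumsum(rel) / 5;                 % 5 relevant images in total
plot(recall, precision);                     % the precision-recall curve
% e.g., after 4 results: 3 relevant returned, precision = 3/4, recall = 3/5
```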