C18 Computer Vision. Lecture 1 Introduction: imaging geometry, camera calibration. Victor Adrian Prisacariu.


C18 Computer Vision. Lecture 1. Introduction: imaging geometry, camera calibration. Victor Adrian Prisacariu. http://www.robots.ox.ac.uk/~victor

InfiniTAM Demo

Course Content
VP: Intro, basic image features, basic multiview geometry:
- Introduction: imaging geometry, camera calibration.
- Salient feature detection: points, edges and SIFTs.
- Recovering 3D from two images I: epipolar geometry.
- Recovering 3D from two images II: stereo correspondences, triangulation, ego-motion.
AV: Neural networks and applications:
- Small baseline correspondences and optic flow.
- Large scale image retrieval.
- Fundamentals of convolutional neural networks.
- Object detection and recognition.
Slides at http://www.robots.ox.ac.uk/~victor -> Teaching. Lots borrowed from David Murray + AV C18.

Useful Texts
- Multiple View Geometry in Computer Vision, Richard Hartley and Andrew Zisserman
- Computer Vision: A Modern Approach, David Forsyth and Jean Ponce, Prentice Hall
- 3-Dimensional Computer Vision: A Geometric Viewpoint, Olivier Faugeras

Computer Vision: This time
1. Introduction: imaging geometry, camera calibration.
   1.1 Introduction.
   1.2 The perspective camera as a geometric device.
   1.3 Perspective using homogeneous coordinates.
   1.4 Calibration: the elements of the perspective model.
2. Salient feature detection: points, edges and SIFTs.
3. Recovering 3D from two images I: epipolar geometry.
4. Recovering 3D from two images II: stereo correspondences, triangulation, ego-motion.

1.1 Introduction
The aim in geometric computational vision is to take a number of 2D images and obtain an understanding of the 3D environment: what is in it, and how it evolves over time. What do we have here? Seems very easy...

Wrong! It's a very hard big-data problem.
From the hardware engineering perspective: The raw throughput is unsettlingly large: a colour stereo pair at 30 Hz -> 100s of MB/s. Now multiply by a non-trivial processing cost per byte. Image collections are huge. Certainly challenging, but no longer frightening.
From the applied maths perspective: Scene info is made highly implicit or lost by projection. Inverse mappings 2D -> 3D are ill-posed and ill-conditioned. They can only be solved by introducing constraints: about the way the world actually works, or about the way we expect it to work. We now know about these. No longer frightening.

Wrong! It's a very hard big-data problem.
From the information-engineering / AI perspective: Images have uneven information content, both absolutely and contextually. Computational visual semantics: what does visual stuff mean, exactly? If we are under time pressure, what is the important visual stuff right now? Still a massive challenge if we want genuine autonomy.

Natural vision is a hard problem. But we see effortlessly! Yep, spot on, if one neglects: the ~10^10 neurons involved; the aeons of evolution generating hard-wired priors P(I); that we sleep with eyes shut, and avert gaze when thinking. Very hard work for our brains, so does machine vision have a hope? Perhaps building a general vision system is a flawed concept. There is evidence that the human visual system is a bag of tricks: specialized processing for specialized tasks. Perhaps we should expect no more of computer vision? However, the h.v.s. does give us a convincing magic show: each trick flows seamlessly into the next. Do we have an inkling how that will be achieved?

So why bother? What are the advantages?
From a sensor standpoint: Vision is a passive sensor (unlike sonar, radar, lasers). It covers a wide range of depths, and it overlaps and complements other sensors. There is a wide diversity of methods to recover 3D information, so vision has process redundancy. Images provide data redundancy.
From a natural communications standpoint: The world is awash with information-rich photons. Because we have eyes, vision provides a natural language of communication. If we want robots/man-machine interfaces to act and interact in our environments, they have to speak that language.

Organizing the tricks Although human and computer vision might be bags of tricks, it is useful to place the tricks within larger processing paradigms. For example: a) Data-driven, bottom-up processing. b) Model-driven, top-down, generative processing. c) Dynamic Vision (mixes bottom-up with top-down feedback). d) Active Vision (task oriented). e) Data-driven discriminative approach (machine learning). These are neither all-embracing nor exclusive.

(a) Data-driven, bottom-up processing. Image processing produces a map of salient 2D features. Features are input into a range of shape-from-X processes, whose output is the 2.5D sketch. Only in the last stage do we get a fully 3D object-centered description.

(b) Model-driven, and (c) Dynamic vision. Model-driven, top-down, generative processing: a model of the scene is assumed known. Supply a pose for the object relative to the camera, and use projection to predict where salient features should be found in the image space. Search for the features, and refine the pose by minimizing the observed deviation. Dynamic vision: mixes bottom-up/top-down by introducing feedback.

(d) Active Vision. Introduces task-oriented sensing-perception-action loops: Visual data need only be good enough to drive the particular action. There is no need to build and maintain an overarching representation of the surroundings. Computational resources are focused where they are needed.

(e) Data-driven approach. The aim is to learn a description of the transformation between input and output using exemplars. Geometry is not forgotten, but implicit learned representations are favored.

1.2 The perspective camera as a geometric device

This is (a picture of) my cat. [Image: photo of the cat, with pixel axes running from 0 to 520; the cat's nose is at image point x = (295, 308).]

My cat lives in a 3D world. The point X = (X_1, X_2, X_3)^T in world space projects to the point x = (x_1, x_2)^T in image space.

Going from X in 3D to x in 2D. How do we get from X = (X_1, X_2, X_3)^T to x = (x_1, x_2)^T? [Figure: film/sensor facing the cat directly.] The output would be blurry if the film were just exposed to the cat.

Going from X in 3D to x in 2D. How do we get from X = (X_1, X_2, X_3)^T to x = (x_1, x_2)^T? [Figure: a barrier with a small aperture placed between the film/sensor and the cat.] Blur is reduced; this looks good.

Pinhole Camera. [Figure: image plane, pinhole, cat.] All rays pass through the center of projection (a single point). The image forms on the image plane.

Pinhole Camera. [Figure: image plane at distance f behind the pinhole; the optical axis passes through the camera origin o and the principal point p.] f is the focal length, o the camera origin, p the principal point. The 3D point X = (X_1, X_2, X_3)^T is imaged into x = (x_1, x_2)^T as:
x = (x_1, x_2)^T = (f X_1/X_3, f X_2/X_3)^T
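To make the projection concrete, here is a minimal numeric sketch in Python/NumPy (the focal length and point are made-up values, not from the lecture):

```python
import numpy as np

# Pinhole projection: x1 = f*X1/X3, x2 = f*X2/X3.
f = 1.5                          # focal length (illustrative value)
X = np.array([0.2, -0.1, 2.0])   # (X1, X2, X3) in camera coordinates

x = f * X[:2] / X[2]
print(x)                         # [ 0.15  -0.075]
```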


Homogeneous coordinates
The projection x_1 = f X_1/X_3, x_2 = f X_2/X_3 is non-linear. It can be made linear using homogeneous coordinates; this involves representing the image and scene in a higher-dimensional space. Limiting cases, e.g. vanishing points, are handled better. Homogeneous coordinates also allow transformations to be concatenated more easily.

3D Euclidean transforms: inhomogeneous coordinates
My cat moves through 3D space. The movement of the tip of the nose can be described using a Euclidean transform:
X'_{3x1} = R_{3x3} X_{3x1} + t_{3x1} (rotation R, translation t)

3D Euclidean transforms: inhomogeneous coordinates
Euclidean transform: X' = R X + t. Concatenation of successive transforms is a mess!
X_1 = R_1 X + t_1
X_2 = R_2 X_1 + t_2
X_2 = R_2 (R_1 X + t_1) + t_2 = R_2 R_1 X + R_2 t_1 + t_2

3D Euclidean transforms: homogeneous coordinates
We replace the 3D point (X, Y, Z)^T with the four-vector (X, Y, Z, 1)^T. The Euclidean transform becomes X' = E X with E = [R t; 0^T 1]. Transformations can now be concatenated by matrix multiplication:
X_1 = E_{10} X_0, X_2 = E_{21} X_1, so X_2 = E_{21} E_{10} X_0

Homogeneous coordinates: definition in R^3
X = (X, Y, Z)^T is represented in homogeneous coordinates by any 4-vector (X_1, X_2, X_3, X_4)^T such that X = X_1/X_4, Y = X_2/X_4, and Z = X_3/X_4. So the following homogeneous vectors represent the same point, for any λ ≠ 0: (X_1, X_2, X_3, X_4)^T and λ(X_1, X_2, X_3, X_4)^T. E.g. (2, 3, 5, 1)^T is the same as (3, 4.5, 7.5, 1.5)^T, and both represent the same inhomogeneous point (2, 3, 5)^T.

Homogeneous coordinates: definition in R^2
x = (x, y)^T is represented in homogeneous coordinates by any 3-vector (x_1, x_2, x_3)^T such that x = x_1/x_3 and y = x_2/x_3. E.g. (1, 2, 3)^T is the same as (3, 6, 9)^T, and both represent the same inhomogeneous point (1/3, 2/3)^T ≈ (0.33, 0.67)^T.

Homogeneous notation: rules for use
1. Convert the inhomogeneous point to a homogeneous vector: (X, Y, Z)^T -> (X, Y, Z, 1)^T.
2. Apply a 4x4 transform.
3. Dehomogenize the resulting vector: (X_1, X_2, X_3, X_4)^T -> (X_1/X_4, X_2/X_4, X_3/X_4)^T.
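These rules translate directly into code. A minimal sketch in Python/NumPy (the helper names, rotation, and translation are illustrative): convert to homogeneous coordinates, apply a 4x4 Euclidean transform, dehomogenize, and check against the inhomogeneous form R X + t.

```python
import numpy as np

def to_homogeneous(X):
    """Rule 1: append a 1, (X, Y, Z) -> (X, Y, Z, 1)."""
    return np.append(X, 1.0)

def from_homogeneous(Xh):
    """Rule 3: dehomogenize by dividing through by the last coordinate."""
    return Xh[:-1] / Xh[-1]

# Rule 2: a 4x4 Euclidean transform E = [R t; 0^T 1].
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([1.0, 2.0, 3.0])
E = np.eye(4)
E[:3, :3], E[:3, 3] = R, t

X = np.array([0.5, -0.5, 2.0])
X_new = from_homogeneous(E @ to_homogeneous(X))
assert np.allclose(X_new, R @ X + t)   # matches the inhomogeneous form
print(X_new)
```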

Projective transformations
A projective transformation is a linear transformation on homogeneous 4-vectors, represented by a non-singular 4x4 matrix:
(X'_1, X'_2, X'_3, X'_4)^T = [p_11 p_12 p_13 p_14; p_21 p_22 p_23 p_24; p_31 p_32 p_33 p_34; p_41 p_42 p_43 p_44] (X_1, X_2, X_3, X_4)^T
The effect on the homogeneous points is that the original and transformed points are linked through a projection center. The 4x4 matrix is defined up to scale, and so has 15 degrees of freedom.

More 3D-3D and 2D-2D Transforms
3D-3D: Projective (15 dof): X' = P_{4x4} X. Affine (12 dof): X' = [A_{3x3} t_3; 0^T 1] X. Similarity (7 dof): X' = [sR_{3x3} t_3; 0^T 1] X. Euclidean (6 dof): X' = [R_{3x3} t_3; 0^T 1] X.
2D-2D: Projective (aka homography, 8 dof): x' = H_{3x3} x. Affine (6 dof): x' = [A_{2x2} t_2; 0^T 1] x. Similarity (4 dof): x' = [sR_{2x2} t_2; 0^T 1] x. Euclidean (3 dof): x' = [R_{2x2} t_2; 0^T 1] x.

2D-2D Transform Examples
Euclidean (3 DoF): [cos θ, -sin θ, t_a; sin θ, cos θ, t_b; 0, 0, 1]
Similarity (4 DoF): [s cos θ, -s sin θ, t_a; s sin θ, s cos θ, t_b; 0, 0, 1]
Affine (6 DoF): [a_11, a_12, t_a; a_21, a_22, t_b; 0, 0, 1]
Projective (8 DoF): [h_11, h_12, h_13; h_21, h_22, h_23; h_31, h_32, h_33]
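A sketch of the four matrices in Python/NumPy (all parameter values are made up), each applied to the same homogeneous 2D point:

```python
import numpy as np

theta, s, ta, tb = np.deg2rad(15), 2.0, 1.0, -1.0
c, sn = np.cos(theta), np.sin(theta)

euclidean  = np.array([[c, -sn, ta], [sn,  c, tb], [0, 0, 1]])          # 3 DoF
similarity = np.array([[s*c, -s*sn, ta], [s*sn, s*c, tb], [0, 0, 1]])   # 4 DoF
affine     = np.array([[1.0, 0.3, ta], [0.1, 0.9, tb], [0, 0, 1]])      # 6 DoF
# Projective: non-trivial last row; 8 DoF since the 9 entries are up to scale.
projective = np.array([[1.0, 0.3, ta], [0.1, 0.9, tb], [0.01, 0.02, 1]])

x = np.array([2.0, 3.0, 1.0])        # a homogeneous 2D point
for H in (euclidean, similarity, affine, projective):
    xp = H @ x
    print(xp[:2] / xp[2])            # dehomogenized transformed point
```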

Perspective 3D-2D Transforms
Similar to a 3D-3D projective transform, but constrain the transformed point to lie on a plane X_3 = f (the image plane). Because X_3 = f is fixed, we can write:
λ (x_1, x_2, f, 1)^T = [p_11 p_12 p_13 p_14; p_21 p_22 p_23 p_24; f p_31 f p_32 f p_33 f p_34; p_31 p_32 p_33 p_34] (X_1, X_2, X_3, 1)^T
The 3rd row is redundant, so:
λ (x_1, x_2, 1)^T = [p_11 p_12 p_13 p_14; p_21 p_22 p_23 p_24; p_31 p_32 p_33 p_34] (X_1, X_2, X_3, 1)^T = P_{3x4} X
P_{3x4} is the projection matrix, and this is a perspective transform.

1.3 Perspective using homogeneous coordinates
x = (x_1, x_2)^T = (f X_1/X_3, f X_2/X_3)^T, with X = (X_1, X_2, X_3)^T. In homogeneous coordinates:
λ (x_1, x_2, 1)^T = [f 0 0 0; 0 f 0 0; 0 0 1 0] (X_1, X_2, X_3, 1)^T
so λx_1 = f X_1, λx_2 = f X_2, λ = X_3, giving x_1 = f X_1/X_3 and x_2 = f X_2/X_3.

Perspective using homogeneous coordinates
λ (x_1, x_2, 1)^T = [f 0 0 0; 0 f 0 0; 0 0 1 0] (X_1, X_2, X_3, 1)^T
(image point = projection matrix x world point)
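In code, projection is now one matrix multiplication followed by dehomogenization. A minimal Python/NumPy sketch (same illustrative focal length and point as the earlier pinhole example):

```python
import numpy as np

f = 1.5
P = np.array([[f, 0, 0, 0],      # vanilla 3x4 perspective projection matrix
              [0, f, 0, 0],
              [0, 0, 1, 0]])

Xh = np.array([0.2, -0.1, 2.0, 1.0])   # homogeneous world point
lx = P @ Xh                             # = (f*X1, f*X2, X3) = lambda*(x1, x2, 1)
x = lx[:2] / lx[2]                      # dehomogenize: lambda = X3
print(x)                                # [ 0.15  -0.075], matching f*Xi/X3
```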

Perspective using homogeneous coordinates
It is useful to split the overall projection matrix into three parts:
1. a part that depends on the internals of the camera (the intrinsic calibration matrix);
2. a vanilla projection matrix;
3. a Euclidean transformation between the world and camera frames (the extrinsic calibration matrix).
We first assume the scene and world are aligned with the camera coordinates, so that the extrinsic camera matrix is the identity, and get (image point = intrinsic calibration x vanilla projection x extrinsic calibration x world point):
λ (x, y, 1)^T = [f 0 0; 0 f 0; 0 0 1] [1 0 0 0; 0 1 0 0; 0 0 1 0] I_{4x4} X

Perspective using homogeneous coordinates
Now let's make things more general: insert a rotation R and translation t between world and camera coordinates, and some extra terms in the intrinsic calibration matrix (image point = intrinsic calibration x vanilla projection x extrinsic calibration x world point):
λ (x, y, 1)^T = [f sf u_0; 0 γf v_0; 0 0 1] [1 0 0 0; 0 1 0 0; 0 0 1 0] [r_11 r_12 r_13 t_1; r_21 r_22 r_23 t_2; r_31 r_32 r_33 t_3; 0 0 0 1] X

The camera pose (extrinsic parameters)
The camera's extrinsic calibration is just the rotation R and translation t that take points from the world frame to the camera frame:
X_c = [R t; 0^T 1] X_w

Building R
R captures rotation and can be built from various types of rotation representations (Euler angles, quaternions, etc.). Euler angles capture the angle of rotation about each axis, using 3 parameters, one per axis:
X' = R_z X_w, with R_z = [cos θ_z, -sin θ_z, 0; sin θ_z, cos θ_z, 0; 0, 0, 1]
X'' = R_y X', with R_y = [cos θ_y, 0, sin θ_y; 0, 1, 0; -sin θ_y, 0, cos θ_y]
X''' = R_x X'', with R_x = [1, 0, 0; 0, cos θ_x, -sin θ_x; 0, sin θ_x, cos θ_x]
R_cw = R_x R_y R_z. Order matters!
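A Python/NumPy sketch of composing R from per-axis rotations (using one common sign convention; the lecture's own convention may differ), showing that composition order matters:

```python
import numpy as np

def Rz(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def Ry(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rx(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

tx, ty, tz = np.deg2rad([10, 20, 30])
R1 = Rx(tx) @ Ry(ty) @ Rz(tz)
R2 = Rz(tz) @ Ry(ty) @ Rx(tx)
print(np.allclose(R1, R2))                 # False: order matters
print(np.allclose(R1 @ R1.T, np.eye(3)))   # True: rotations are orthogonal
```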

Building t

Inverting the transform
[R_cw t_cw; 0^T 1]^{-1} = [R_wc t_wc; 0^T 1]
For rotation: R_wc = R_cw^{-1} = R_cw^T. For translation: t_wc ≠ -t_cw; instead t_wc = -R_wc t_cw.
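A quick numeric check of the closed-form inverse in Python/NumPy (the rotation and translation are made-up values):

```python
import numpy as np

theta = np.deg2rad(40)            # an illustrative rigid transform
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([1.0, -2.0, 0.5])

E = np.eye(4); E[:3, :3], E[:3, 3] = R, t

E_inv = np.eye(4)                 # closed form: R_wc = R^T, t_wc = -R^T t
E_inv[:3, :3] = R.T
E_inv[:3, 3] = -R.T @ t

assert np.allclose(E_inv, np.linalg.inv(E))
```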

The intrinsic calibration parameters
These describe hardware properties of real cameras: the image plane might be skewed; the central axis of the lens might not line up with the optical axis; the light-gathering elements might not be square; lens distortion.
K = [f, sf, u_0; 0, γf, v_0; 0, 0, 1]
Different scaling on x and y: γ is the aspect ratio. The origin offset (u_0, v_0) is the principal point. s accounts for skew.

Summary of steps from Scene to Image
1. Move the scene point X_w (in homogeneous coordinates) into camera coordinates by the 4x4 extrinsic Euclidean transformation: X_c = [R t; 0^T 1] X_w.
2. Project into an ideal camera via a vanilla perspective transformation: x_ideal = [I | 0] X_c.
3. Map the ideal image into the real image using the intrinsic matrix: x = K x_ideal.
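The three steps chain together as below: a minimal Python/NumPy sketch with made-up intrinsics and extrinsics, not values from the lecture:

```python
import numpy as np

# Intrinsics (illustrative): focal length f, aspect ratio gamma,
# skew s, principal point (u0, v0).
f, gamma, s, u0, v0 = 800.0, 1.0, 0.0, 320.0, 240.0
K = np.array([[f, s * f, u0],
              [0, gamma * f, v0],
              [0, 0, 1]])

# Extrinsics (illustrative): world-to-camera rotation R and translation t.
th = np.deg2rad(10)
R = np.array([[np.cos(th), 0, np.sin(th)],
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
t = np.array([0.1, 0.0, 2.0])

E = np.eye(4); E[:3, :3], E[:3, 3] = R, t             # step 1: extrinsics
P_vanilla = np.hstack([np.eye(3), np.zeros((3, 1))])  # step 2: [I | 0]

Xw = np.array([0.3, -0.2, 5.0, 1.0])   # homogeneous world point
x_ideal = P_vanilla @ (E @ Xw)         # ideal (calibrated) image point
x_pix = K @ x_ideal                    # step 3: apply intrinsics
print(x_pix[:2] / x_pix[2])            # pixel coordinates
```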

1.4 Camera Calibration
Calibration is the process that finds K, and accounts for the internal physical characteristics of the camera. It is (usually) done once per camera. There are a variety of methods: self-calibration, auto-calibration, or pre-calibration. We will cover pre-calibration, using a specially made, known visual scene.

What is camera calibration?
λ (x, y, 1)^T = K [I | 0] [R t; 0^T 1] X, with K = [f sf u_0; 0 γf v_0; 0 0 1]
so the overall projection matrix is P_{3x4} = K [R t]. Camera calibration: recover K.

Camera Calibration Math
1. Recover the overall projection matrix P_{3x4}. Assume a target with at least 6 known scene points, and build and solve a system of (at least) 12 equations. From
λ (x, y, 1)^T = [p_11 p_12 p_13 p_14; p_21 p_22 p_23 p_24; p_31 p_32 p_33 p_34] (X, Y, Z, 1)^T,
with λ = p_31 X + p_32 Y + p_33 Z + p_34, each point gives two equations:
x = (p_11 X + p_12 Y + p_13 Z + p_14) / (p_31 X + p_32 Y + p_33 Z + p_34)
y = (p_21 X + p_22 Y + p_23 Z + p_24) / (p_31 X + p_32 Y + p_33 Z + p_34)
Rearranged, each point contributes two rows of the homogeneous linear system
[X Y Z 1 0 0 0 0 -xX -xY -xZ -x; 0 0 0 0 X Y Z 1 -yX -yY -yZ -y] p = 0
where p contains the 12 unknowns p_11 ... p_34.
2. Construct P_left = KR from the leftmost 3x3 block of P = K[R t].
3. Invert P_left, so P_left^{-1} = R^{-1} K^{-1} = R^T K^{-1}.
4. Decompose P_left^{-1} using QR decomposition: the orthogonal factor gives R, the upper-triangular factor gives K.
5. Normalise K (as the scale of P was unknown).
6. Recover t = K^{-1} (p_14, p_24, p_34)^T.
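A sketch of the whole recipe in Python/NumPy; the function name and the synthetic test camera are illustrative, but the steps follow the list above (SVD for the homogeneous linear system, then QR decomposition of the inverted leftmost block):

```python
import numpy as np

def calibrate(X_world, x_img):
    """Recover P (3x4), then K, R, t, from n >= 6 non-coplanar scene points
    X_world (n,3) and their measured pixel positions x_img (n,2)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(X_world, x_img):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(rows)

    # Step 1: p is the right singular vector with smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)

    # Steps 2-4: inv(KR) = R^T K^{-1} is orthogonal times upper triangular,
    # which is exactly what a QR decomposition produces.
    Q, U = np.linalg.qr(np.linalg.inv(P[:, :3]))
    R, K = Q.T, np.linalg.inv(U)

    # Step 5: fix signs so diag(K) > 0, then normalise so K[2,2] = 1.
    D = np.diag(np.sign(np.diag(K)))
    K, R = K @ D, D @ R
    scale = K[2, 2]
    K = K / scale
    if np.linalg.det(R) < 0:   # keep R a proper rotation
        R, scale = -R, -scale

    # Step 6: recover t from the last column of P.
    t = np.linalg.inv(K) @ P[:, 3] / scale
    return P, K, R, t

# Synthetic check: project known points with a made-up camera, then recover it.
K0 = np.array([[800.0, 0, 320], [0, 820, 240], [0, 0, 1]])
R0 = np.eye(3)
t0 = np.array([0.1, -0.2, 5.0])
Xw = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
               [1, 1, 0], [0, 1, 1], [1, 0, 1.0]])
proj = (K0 @ (R0 @ Xw.T + t0[:, None])).T
xi = proj[:, :2] / proj[:, 2:]
P, K, R, t = calibrate(Xw, xi)
print(np.round(K, 3))   # ~K0; R ~ R0, t ~ t0
```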

Camera Calibration Example Algorithm

Radial Distortion
So far, we have figured out the transformations that turn our camera into a notional camera with the world and camera coordinates aligned and an ideal image plane. One often has to correct for other optical distortions and aberrations. Radial distortion is the most common (see the Q sheet). Correction for this distortion is applied before carrying out calibration.
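The lecture does not give the distortion model itself; a commonly used polynomial radial model (an assumption here, not from the slides) can be inverted iteratively, as in this Python/NumPy sketch:

```python
import numpy as np

def undistort(xd, yd, k1, k2):
    """Invert the assumed model x_d = x_u * (1 + k1*r^2 + k2*r^4), where r is
    the undistorted radius and coordinates are taken relative to the
    distortion centre, by fixed-point iteration."""
    xu, yu = xd, yd                     # initial guess: no distortion
    for _ in range(20):
        r2 = xu**2 + yu**2
        factor = 1 + k1 * r2 + k2 * r2**2
        xu, yu = xd / factor, yd / factor
    return xu, yu

print(undistort(0.4, 0.3, k1=-0.2, k2=0.05))   # illustrative coefficients
```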

Practical Camera Calibration. Matlab Calibration Toolkit: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/example.html

Summary of Lecture
In this lecture we have:
- Introduced the aims of computer vision, and some paradigms.
- Explored linear transformations and introduced homogeneous coordinates.
- Defined perspective projection from a scene, and saw that it could be made linear using homogeneous coordinates.
- Discussed how to pre-calibrate a camera using an image of six or more known scene points.