Introduction to Homogeneous Coordinates

Last class we considered smooth translations and rotations of the camera coordinate system and the resulting motions of points in the image projection plane. These two transformations were expressed mathematically in slightly different ways: a translation was expressed by a vector subtraction, and a rotation was expressed by a matrix multiplication.

Introduction to homogeneous coordinates

Today we will learn about an alternative way to represent translations and rotations which allows them both to be expressed as a matrix multiplication. This has certain computational and notational conveniences, and eventually leads us to a wider and interesting class of transformations.

Suppose we wish to translate all points $(X, Y, Z)$ by adding some constant vector $(t_x, t_y, t_z)$ to all coordinates. So how do we do it? The trick is to write out scene points $(X, Y, Z)$ as points in $\mathbb{R}^4$, and we do so by tacking on a fourth coordinate with value 1. We can then translate by performing a matrix multiplication:
\[
\begin{bmatrix} X + t_x \\ Y + t_y \\ Z + t_z \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}.
\]
We can perform rotations using a $4 \times 4$ matrix as well. A rotation matrix goes into the upper-left $3 \times 3$ corner of the $4 \times 4$ matrix:
\[
\begin{bmatrix} R & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix},
\]
i.e. nothing interesting happens in the fourth row or column. Last lecture we looked at rotations about the $X$, $Y$, $Z$ axes only. We will discuss other rotation matrices later today.

Notice that we can also scale the three coordinates:
\[
\begin{bmatrix} \sigma_X X \\ \sigma_Y Y \\ \sigma_Z Z \\ 1 \end{bmatrix}
=
\begin{bmatrix} \sigma_X & 0 & 0 & 0 \\ 0 & \sigma_Y & 0 & 0 \\ 0 & 0 & \sigma_Z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}.
\]
This scaling doesn't come up so much in 3D, but it does come up when we do 2D transformations, as we will see later today.

We have represented a 3D point $(X, Y, Z)$ as a point $(X, Y, Z, 1)$ in $\mathbb{R}^4$. We now generalize this by allowing ourselves to represent $(X, Y, Z)$ as any 4D vector of the form $(wX, wY, wZ, w)$ where $w \neq 0$. Note that the set of points $\{\, (wX, wY, wZ, w) : w \neq 0 \,\}$ is a line in $\mathbb{R}^4$ that passes through the origin and through the point $(X, Y, Z, 1)$. That is, we are associating each point in $\mathbb{R}^3$ with a line in $\mathbb{R}^4$, in particular, a line that passes through the origin. This representation is called homogeneous coordinates.
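To make these matrix forms concrete, here is a minimal NumPy sketch (my own illustration, not from the original notes; the point coordinates and parameter values are made up):

```python
import numpy as np

def translation_4x4(tx, ty, tz):
    """4x4 homogeneous translation: identity with (tx, ty, tz) in the last column."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def scaling_4x4(sx, sy, sz):
    """4x4 homogeneous scaling: diag(sx, sy, sz, 1)."""
    return np.diag([sx, sy, sz, 1.0])

# The 3D point (X, Y, Z) = (2, 3, 5), written with a fourth coordinate of 1.
p = np.array([2.0, 3.0, 5.0, 1.0])

T = translation_4x4(10.0, 0.0, -1.0)
print(T @ p)   # -> [12.  3.  4.  1.], i.e. (X + t_x, Y + t_y, Z + t_z, 1)

# Both operations are now matrix multiplications, so they compose by
# matrix product: here, scale by 2 first, then translate.
M = T @ scaling_4x4(2.0, 2.0, 2.0)
print(M @ p)   # -> [14.  6.  9.  1.]
```

Because every operation is now a matrix multiplication, a chain of transformations collapses into a single matrix product, which is exactly the computational convenience mentioned above.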

Is this generalization consistent with the $4 \times 4$ rotation, translation, and scaling matrices which we introduced above? Yes, it is. For any $\mathbf{X} = (X, Y, Z, 1)$ and indeed for any $4 \times 4$ matrix $M$, we have
\[
w(M\mathbf{X}) = M(w\mathbf{X}).
\]
Thus, multiplying each component of the vector $\mathbf{X}$ by $w$ and then transforming by $M$ yields the same vector as transforming $\mathbf{X}$ itself by $M$ and then multiplying each component of $M\mathbf{X}$ by $w$. So multiplying each component of a 4D vector $\mathbf{X}$ by $w$ doesn't change how that vector is transformed.

Points at infinity

We have considered points $(wX, wY, wZ, w)$ under the condition that $w \neq 0$, and such a point is associated with $(X, Y, Z) \in \mathbb{R}^3$. What happens if we let $w = 0$? Does this mean anything? Consider $(X, Y, Z, \epsilon)$ where $\epsilon > 0$, and so
\[
(X, Y, Z, \epsilon) \sim \left( \tfrac{X}{\epsilon}, \tfrac{Y}{\epsilon}, \tfrac{Z}{\epsilon}, 1 \right).
\]
If we let $\epsilon \to 0$, then the corresponding 3D point $(\tfrac{X}{\epsilon}, \tfrac{Y}{\epsilon}, \tfrac{Z}{\epsilon})$ goes to infinity, and stays along the line from the origin through the point $(X, Y, Z)$. We thus identify the limit $(X, Y, Z, 0)$ as $\epsilon \to 0$ with a 3D point at infinity, namely a point in direction $(X, Y, Z)$.

What happens to a point at infinity when we perform a rotation, translation, or scaling? Since the bottom row of each of these $4 \times 4$ matrices is $(0, 0, 0, 1)$, it is easy to see that these transformations map points at infinity to points at infinity. In particular:

- a translation matrix does not affect a point at infinity, i.e. it behaves the same as the identity matrix;
- a rotation matrix maps a point at infinity in exactly the same way it maps a finite point, namely $(X, Y, Z, 1)$ rotates to $(X', Y', Z', 1)$ if and only if $(X, Y, Z, 0)$ rotates to $(X', Y', Z', 0)$;
- a scaling matrix maps a point at infinity in exactly the same way it maps a finite point, namely $(X, Y, Z, 1)$ maps to $(\sigma_X X, \sigma_Y Y, \sigma_Z Z, 1)$ if and only if $(X, Y, Z, 0)$ scales to $(\sigma_X X, \sigma_Y Y, \sigma_Z Z, 0)$.

We sometimes interpret points at infinity as direction vectors, that is, they have a direction but no position. One must be careful in referring to them in this way, though, since vectors have a length whereas points at infinity $(X, Y, Z, 0)$ do not have a length.
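This behavior is easy to check numerically. In the sketch below (again my own, with made-up numbers), the same $4 \times 4$ translation matrix moves a finite point but leaves a point at infinity unchanged:

```python
import numpy as np

T = np.eye(4)
T[:3, 3] = [10.0, 20.0, 30.0]               # translate by (10, 20, 30)

finite = np.array([1.0, 2.0, 3.0, 1.0])     # the point (1, 2, 3)
at_inf = np.array([1.0, 2.0, 3.0, 0.0])     # point at infinity in direction (1, 2, 3)

print(T @ finite)   # -> [11. 22. 33.  1.] : the finite point moves
print(T @ at_inf)   # -> [ 1.  2.  3.  0.] : the point at infinity is unchanged

# Consistency of the representation: w(MX) = M(wX), so scaling the
# homogeneous vector by w = 5 commutes with the transformation.
print(np.allclose(5 * (T @ finite), T @ (5 * finite)))   # -> True
```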

Homogeneous coordinates in 2D

One can use homogeneous coordinates for 1D or 2D or higher dimensional spaces as well. For example, take 2D. Suppose we wish to translate all points $(x, y)$ by adding some constant vector $(t_x, t_y)$. This transformation cannot be achieved by a $2 \times 2$ matrix, so we tack on a third coordinate with value 1, giving $(x, y, 1)$, and translate by performing a matrix multiplication:
\[
\begin{bmatrix} x + t_x \\ y + t_y \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\]
We can perform rotations using a $3 \times 3$ matrix as well, namely a 2D rotation matrix goes into the upper-left $2 \times 2$ corner:
\[
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
and we can also scale the two coordinates:
\[
\begin{bmatrix} \sigma_x x \\ \sigma_y y \\ 1 \end{bmatrix}
=
\begin{bmatrix} \sigma_x & 0 & 0 \\ 0 & \sigma_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
or perform a shear (not mentioned in class):
\[
\begin{bmatrix} x + sy \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & s & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\]
The previous arguments about points at infinity are exactly the same for 2D points $(x, y)$. Consider a point $(x, y, \epsilon)$ and ask what happens as $\epsilon \to 0$. We get $(\tfrac{x}{\epsilon}, \tfrac{y}{\epsilon}, 1)$, which is in the direction of $(x, y)$ but which goes to an infinite distance from the origin as $\epsilon \to 0$. Thus, $(x, y, 0)$ represents a 2D point at infinity in the direction of the vector $(x, y)$.

World vs. camera coordinates

Last lecture, we defined 3D points in the camera coordinate system. However, often we would also like to express the position of points $(X, Y, Z)$ in a world coordinate system which is independent of the camera. This should make sense intuitively: we would like to be able to speak about the positions of objects independently of where the camera is. If we are going to express point positions both in camera and in world coordinates, then we will need a way of transforming between these coordinate systems. We need to define a transformation from a point $(X_w, Y_w, Z_w)$ in world coordinates to its location $(X_c, Y_c, Z_c)$ in camera coordinates. We will use $4 \times 4$ translation and rotation matrices to do so.

Rotation matrices

[Note: change this in 200 -- consider reflections too!]

Earlier this lecture we mentioned rotation matrices but didn't spell out any details. Now we consider a rotation matrix whose rows define the directions of the camera's canonical axes in world coordinates. The three rows $u, v, n$ are orthonormal unit vectors. The third row $n = (n_x, n_y, n_z)$ is the direction of the optical axis (the $Z$ axis) of the camera, represented in world coordinates. The other two rows are the camera's $x$ and $y$ axes, which are perpendicular to the $n$ vector. These define the left-right and up-down directions of the camera. We can write the $3 \times 3$ rotation matrix as:
\[
R = \begin{bmatrix} u_x & u_y & u_z \\ v_x & v_y & v_z \\ n_x & n_y & n_z \end{bmatrix}.
\]
Note that this matrix maps $u, v, n$ to $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, respectively.
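The following sketch illustrates this numerically (my own example; the optical axis n is made up, and the recipe of crossing a fixed "up" vector with n is just one common way of completing n to an orthonormal frame):

```python
import numpy as np

# A made-up optical axis direction n for the camera, as a unit vector in
# world coordinates (these numbers are only for illustration).
n = np.array([1.0, 2.0, 2.0]) / 3.0

# Complete n to an orthonormal frame: cross a fixed "up" vector with n
# to get u, then set v = n x u.
u = np.cross([0.0, 1.0, 0.0], n)
u /= np.linalg.norm(u)
v = np.cross(n, u)

R = np.vstack([u, v, n])   # rows of R are the camera axes in world coordinates

# R maps u, v, n to the canonical basis vectors:
print(np.round(R @ u, 6))    # -> [1. 0. 0.]
print(np.round(R @ v, 6))    # -> [0. 1. 0.]
print(np.round(R @ n, 6))    # -> [0. 0. 1.]
print(np.round(R @ R.T, 6))  # -> 3x3 identity, since the rows are orthonormal
```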

What if we write this rotation using a $4 \times 4$ matrix?
\[
\begin{bmatrix} u_x & u_y & u_z & 0 \\ v_x & v_y & v_z & 0 \\ n_x & n_y & n_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
This matrix now maps the direction vectors $u, v, n$ (written with a fourth coordinate which is 0) to the direction vectors $(1, 0, 0, 0)$, $(0, 1, 0, 0)$, $(0, 0, 1, 0)$, respectively.

Camera extrinsic (or external) parameters

Suppose the position of the camera's center in world coordinates is a 3D point $C$. If we wish to transform $X_w$ into the camera's coordinate system, we first subtract off $C$ and then we perform a rotation:
\[
X_w \mapsto R(X_w - C) = X_c.
\]
Using 3D vectors and $3 \times 3$ matrices, we can write this as follows:
\[
X_c = R X_w - R C.
\]
Alternatively, we could use homogeneous coordinates and write
\[
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & -RC \\ \mathbf{0}^\top & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\]
so the transformation from world to camera coordinates is the product of a $4 \times 4$ translation matrix and a $4 \times 4$ rotation matrix:
\[
\begin{bmatrix} R & -RC \\ \mathbf{0}^\top & 1 \end{bmatrix}
=
\begin{bmatrix} R & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix}
\begin{bmatrix} I & -C \\ \mathbf{0}^\top & 1 \end{bmatrix}
\]
where $I$ is the $3 \times 3$ identity matrix. The rotation matrix $R$ and the translation vector $C$ define the camera's extrinsic parameters, namely its orientation and position, respectively, in world coordinates.
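Here is a short numerical check (my own, with made-up values for R, C, and the point) that the homogeneous factorization agrees with the direct form $X_c = R(X_w - C)$:

```python
import numpy as np

# Made-up extrinsics: a camera centred at C and rotated by R
# (here a rotation about the Z axis by 30 degrees).
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])
C = np.array([4.0, 1.0, 2.0])

Xw = np.array([5.0, 3.0, 7.0])   # a world point

# Direct form: Xc = R (Xw - C).
Xc_direct = R @ (Xw - C)

# Homogeneous form: [R -RC; 0 1] = [R 0; 0 1] @ [I -C; 0 1].
rot4 = np.eye(4)
rot4[:3, :3] = R
trans4 = np.eye(4)
trans4[:3, 3] = -C
E = rot4 @ trans4                # 4x4 world-to-camera transform

Xc_homog = E @ np.append(Xw, 1.0)
print(np.allclose(Xc_direct, Xc_homog[:3]))   # -> True
```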

Projection matrix (3D to 2D)

Recall from lecture that we defined the projection from 3D $(X, Y, Z)$ into the 2D coordinates $(x, y)$ via $(x, y) = (f\tfrac{X}{Z}, f\tfrac{Y}{Z})$. Let's now rewrite this 2D point in homogeneous coordinates,
\[
(x, y, 1) = \left( f\tfrac{X}{Z},\; f\tfrac{Y}{Z},\; 1 \right),
\]
and note that if $Z \neq 0$ then
\[
\left( f\tfrac{X}{Z},\; f\tfrac{Y}{Z},\; 1 \right) \sim (fX,\; fY,\; Z).
\]
This equivalence allows us to write the projection from a 3D point $(X, Y, Z)$ (which is in camera coordinates) to its 2D image point $(x, y)$ using a $3 \times 4$ matrix:
\[
\begin{bmatrix} fX \\ fY \\ Z \end{bmatrix}
=
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}.
\]
The projection matrix is not invertible; it is of rank 3. In particular, because the matrix above maps a 4D space to a 3D space, there must be some non-zero vector that maps to $(0, 0, 0)$. This vector defines the null space of the projection. What is this null vector? Answer: by inspection, it is the center of projection $(0, 0, 0, 1)$. Note that the point $(0, 0, 0)$ is undefined when interpreted as a 2D point in homogeneous coordinates: it is neither a finite point $(x, y, 1)$ nor a point at infinity $(x, y, 0)$, where I am assuming here that at least one of $x, y$ is non-zero.

Camera intrinsic (or internal) parameters

When we are working with real cameras, typically we do not want to index points on the projection plane by their actual position on the projection plane (measured in, say, millimeters). Rather, we want to index points by pixel numbers. Transforming from real physical positions to pixel positions involves a scaling (since the width of a pixel is typically not a standard unit like mm) and a translation (since the principal point, that is, the intersection of the optical axis with the image plane, typically doesn't correspond exactly to the center of the image sensor).

The sensor is made up of a square grid of sensor elements (pixels). We can talk about the position which is the center of this grid. For example, if the grid had 1536 rows and 2048 columns, then we would be talking about the position on the sensor that is halfway between rows 767 and 768 and halfway between columns 1023 and 1024 (assuming we start our count from 0). We define a coordinate system on the grid in whatever units we want, say millimeters, such that the origin is at the center of the grid and the $(u, v)$ axes are parallel to the camera's $x, y$ axes, respectively.

We have expressed the position on the sensor plane in physical coordinates, in units of millimeters. What we want instead is to express it in terms of pixel units. We can do this by scaling: we multiply the $x, y$ coordinates by the number of pixels per mm in the $x$ and $y$ directions. These scale factors are denoted $m_x, m_y$. Surprisingly, these are not always exactly the same value, though they are usually very close. The projection-and-scaling transformation is now:
\[
\begin{bmatrix} f m_x & 0 & 0 & 0 \\ 0 & f m_y & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
=
\begin{bmatrix} m_x & 0 & 0 \\ 0 & m_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}.
\]
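The sketch below illustrates this projection-and-scaling pipeline with made-up values of $f$, $m_x$, and $m_y$ (these particular numbers are not from the notes):

```python
import numpy as np

# Made-up intrinsic values: focal length f in metres and pixel densities
# m_x, m_y in pixels per metre (m_x and m_y need not be exactly equal).
f = 0.05
mx, my = 20000.0, 20100.0

# 3x4 perspective projection: camera coordinates -> homogeneous image point.
P_persp = np.array([[f,   0.0, 0.0, 0.0],
                    [0.0, f,   0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])

# Scaling from physical sensor units (metres) to pixel units.
S = np.diag([mx, my, 1.0])

Xc = np.array([0.2, -0.1, 2.0, 1.0])   # a point in camera coordinates, Z = 2
x = S @ P_persp @ Xc                   # homogeneous 2D point (f*m_x*X, f*m_y*Y, Z)
print(x / x[2])   # divide out w = Z -> pixel offsets from the principal point
```

Note that the result is still measured relative to the principal point; the translation to image pixel indices is handled next.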

This transformation gives positions in terms of pixel units. However, it does not yet correspond to the pixel indices used in an image: typically the index $(0, 0)$ is at the corner of the image. Thus we need another transformation, namely a translation, that gives the correct pixel coordinates of the principal point. Note that the principal point might not correspond exactly to the center of the sensor. You would think it should, but keep in mind that pixels are very small, and the placement of the sensor relative to the lens (which defines the optical axis and hence the principal point) might not be as accurate as you would hope.

Let $(p_x, p_y)$ be the position of the principal point in pixel coordinates. It may indeed be at the center of the image, i.e. at the position mentioned above, but it may be at a slightly different position too. Then, to get the position of a projected point $(x, y)$ in pixel coordinates, we need to add a translation component which is the pixel position of the principal point:
\[
\begin{bmatrix} f m_x & 0 & p_x & 0 \\ 0 & f m_y & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & p_x \\ 0 & 1 & p_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f m_x & 0 & 0 & 0 \\ 0 & f m_y & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
\]

A few more details...

While the above model typically describes all the parameters needed to transform from a scene point in camera coordinates into the projected image point in pixel coordinates, it is common to add yet one more parameter, called the skew parameter $s$. So a more general projection transformation is:
\[
\begin{bmatrix} f m_x & s & p_x & 0 \\ 0 & f m_y & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
\]
It is common to give the $3 \times 3$ matrix on the left side a special name:
\[
K = \begin{bmatrix} f m_x & s & p_x \\ 0 & f m_y & p_y \\ 0 & 0 & 1 \end{bmatrix}
\]
is called the camera calibration matrix. In part 3 of this course, we will show how to estimate this matrix. Notice that the parameters $f, m_x, m_y$ only appear as two products in this matrix. Sometimes one writes
\[
K = \begin{bmatrix} \alpha_x & s & p_x \\ 0 & \alpha_y & p_y \\ 0 & 0 & 1 \end{bmatrix}
\]
where $\alpha_x$ and $\alpha_y$ are the focal length expressed in horizontal and vertical pixel units, respectively.

We now have a transformation that takes a point $(X, Y, Z, 1)$ in world coordinates to a pixel position $(wx, wy, w)$. One sometimes writes this transformation as
\[
P = K R \,[\, I \mid -C \,]
\]
where $K$, $R$, and $I$ are $3 \times 3$, and the matrix on the right is $3 \times 4$. This transformation is sometimes called a finite projective camera.
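Finally, a sketch that assembles the full finite projective camera and projects a world point to pixel coordinates (all parameter values here are hypothetical, chosen only for illustration):

```python
import numpy as np

# Made-up calibration matrix K with alpha_x = alpha_y = 1000, zero skew,
# and principal point (p_x, p_y) = (1024, 768).
K = np.array([[1000.0,    0.0, 1024.0],
              [   0.0, 1000.0,  768.0],
              [   0.0,    0.0,    1.0]])

# Made-up extrinsics: camera at C = (0, 0, -5), aligned with the world
# axes (R = identity), looking down the world Z axis.
R = np.eye(3)
C = np.array([0.0, 0.0, -5.0])

# The finite projective camera P = K R [I | -C], a 3x4 matrix.
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

Xw = np.array([1.0, 2.0, 5.0, 1.0])   # a world point in homogeneous coordinates
x = P @ Xw                            # homogeneous pixel position (wx, wy, w)
print(x / x[2])   # -> [1124.  968.    1.]: the pixel position (x, y)
```

The order of the factors matches the text: $[\, I \mid -C \,]$ re-centres world coordinates on the camera, $R$ rotates into the camera frame, and $K$ converts to pixel coordinates.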