Image formation
Agenda: perspective projection, rotations, camera models
Light as a wave + particle
Light as a wave (ignore for now): refraction, diffraction
Image formation: digital image, film, human eye
Pixel brightness (more on the physics of light at the end of the semester)
Pinhole optics
Camera Obscura
World's largest photograph: El Toro Marine Corps, Irvine, CA, 2006
Accidental pinholes: what's the dark stuff? (The view from Antonio's hotel room.)
Accidental pinhole and pinspeck cameras: revealing the scene outside the picture. Antonio Torralba, William T. Freeman, Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT. CVPR 2012.
Perspective projection: closer objects appear larger; closer objects (on the ground plane) are lower in the image; parallel lines meet.
Great reference https://www.youtube.com/watch?v=q8xsxfu7dk0&list=plc0ieyeogt2xtmfaf2st_undeptre3f9s&index=2
Pinhole camera. [Figure: optical axis.] [Aside: right-handed coordinate system.] How do we compute P? [on board]
Pinhole Camera
Image inversion
Image inversion: the upside-down image perplexed people for a while, but software (or the brain) can simply invert it.
Physical model that avoids inversion: place the image plane (the "easel") in front of the pinhole. COP (center of projection) = pinhole = camera center. Distance from the COP to the easel = focal length.
Visual angle (a common unit in human vision). Note: the math is easier for a spherical easel (e.g., the retina): $\theta = L/f$, where L is the length of the projection on a sphere of radius f and θ is in radians. A human head is 9 inches high; at a distance of 9 feet (108 inches) it subtends 9/108 = 1/12 radians ≈ 4.8 degrees, regardless of focal length.
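The head-size arithmetic can be checked directly. This is a minimal sketch assuming the small-angle approximation θ ≈ size / distance; the exact angle 2·atan(size / 2·distance) is included only for comparison, and the focal lengths in the final loop are illustrative values, not from the slide. The helper names are my own.

```python
import math

def visual_angle(size, distance):
    """Small-angle visual angle in radians: theta ~= size / distance (same units)."""
    return size / distance

def exact_visual_angle(size, distance):
    """Exact angle subtended by an object centered on the line of sight."""
    return 2.0 * math.atan(size / (2.0 * distance))

head = 9.0            # head height in inches (from the slide)
dist = 9.0 * 12.0     # 9 feet, converted to inches

theta = visual_angle(head, dist)
print(theta, math.degrees(theta))   # 1/12 rad, about 4.77 degrees

# On a spherical easel of radius f the projection length is L = f * theta:
# L grows with f, but theta itself does not depend on f.
for f in (17.0, 50.0):              # illustrative focal lengths in mm (assumed)
    print(f, f * theta)
```

Note that no focal length enters `theta`; only the projected length on the easel scales with f.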
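Field of view (FOV) [figure: 24mm, 50mm, 135mm lenses]: $\text{FOV} \approx \frac{d}{f}$ (in radians), where d is the total sensor size (diagonal) and f is the focal length. This is the small-angle approximation of the exact pinhole value $2\arctan\left(\frac{d}{2f}\right)$; it is accurate for long lenses and overestimates the FOV of wide-angle lenses.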
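To see how good the d/f approximation is for the three lenses pictured, here is a small sketch. It assumes a 36 × 24 mm ("full-frame") sensor, which the slide does not specify, and compares the approximation against the exact pinhole formula. Function names are hypothetical.

```python
import math

def fov_small_angle(d, focal):
    """Slide's approximation: FOV ~= sensor diagonal / focal length (radians)."""
    return d / focal

def fov_exact(d, focal):
    """Exact diagonal FOV of an ideal pinhole camera: 2 * atan(d / (2 f))."""
    return 2.0 * math.atan(d / (2.0 * focal))

# Assumption: a 36 x 24 mm ("full-frame") sensor; its diagonal is about 43.3 mm.
diag = math.hypot(36.0, 24.0)

for focal in (24.0, 50.0, 135.0):
    approx = math.degrees(fov_small_angle(diag, focal))
    exact = math.degrees(fov_exact(diag, focal))
    print(f"{focal:5.0f} mm: approx {approx:6.1f} deg, exact {exact:5.1f} deg")
```

For the 24 mm lens the approximation overshoots noticeably; for 135 mm the two values nearly coincide.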
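Increasing the focal length and stepping back. What happens to apparent object size and FOV when we double the distance to the object and double the focal length?
$$x_{new} = \frac{2fX}{2Z} = \frac{fX}{Z} = x_{old}, \qquad \text{FOV}_{new} = \frac{\text{sensor size}}{2f} = \frac{1}{2}\,\text{FOV}_{old}$$
The object keeps the same image size, while the FOV is halved.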
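A numeric check of the two equations, as a sketch with illustrative (assumed) values for f, X, Z, and the sensor size; the FOV uses the same small-angle formula as the previous slide.

```python
def project_x(f, X, Z):
    """Pinhole projection of the X coordinate: x = f X / Z."""
    return f * X / Z

def fov_approx(sensor_size, f):
    """Small-angle FOV: sensor size / f (radians)."""
    return sensor_size / f

f, X, Z = 50.0, 1.0, 10.0   # illustrative values (assumed)
sensor = 43.3               # illustrative sensor diagonal (assumed)

x_old = project_x(f, X, Z)
x_new = project_x(2 * f, X, 2 * Z)   # double the focal length and the distance
print(x_old, x_new)                  # 5.0 5.0: same image size

ratio = fov_approx(sensor, 2 * f) / fov_approx(sensor, f)
print(ratio)                         # 0.5: the FOV halves
```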
Decreasing the focal length and moving forward
Perspective projection: closer objects appear larger; closer objects (on the ground plane) are lower in the image; parallel lines meet. All of these can be derived simply from $x = \frac{fX}{Z}$ (and $y = \frac{fY}{Z}$)!
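(Parallel lines meet) Vanishing point: proof. Parameterize a 3D line by a point A and a direction D:
$$\begin{bmatrix}X\\Y\\Z\end{bmatrix} = \begin{bmatrix}A_x\\A_y\\A_z\end{bmatrix} + \lambda\begin{bmatrix}D_x\\D_y\\D_z\end{bmatrix}$$
[Figure: COP, image point (x, y, f), 3D point (X, Y, Z).] Compute the projected point (x, y) as λ approaches infinity [on board]:
$$x = \frac{fX}{Z} = \frac{f(A_x + \lambda D_x)}{A_z + \lambda D_z} \to \frac{fD_x}{D_z}, \qquad y = \frac{fY}{Z} = \frac{f(A_y + \lambda D_y)}{A_z + \lambda D_z} \to \frac{fD_y}{D_z} \qquad \text{as } \lambda \to \infty$$
3D lines with identical direction vectors converge to the same 2D image location (assuming $D_z \neq 0$).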
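The limit can be watched numerically. This NumPy sketch takes two parallel lines with different (arbitrarily chosen, assumed) base points, projects points farther and farther along each, and compares them to the predicted vanishing point $(fD_x/D_z, fD_y/D_z)$. The `project` helper is hypothetical.

```python
import numpy as np

def project(P, f=1.0):
    """Pinhole projection (x, y) = (f X / Z, f Y / Z) of a 3D point P = (X, Y, Z)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

f = 1.0
D = np.array([1.0, 0.5, 2.0])      # shared direction (D_z != 0), assumed values
A1 = np.array([0.0, -1.0, 5.0])    # two different base points, assumed values
A2 = np.array([3.0, 2.0, 4.0])

vp = f * D[:2] / D[2]              # predicted vanishing point (f Dx/Dz, f Dy/Dz)

for lam in (1e1, 1e3, 1e6):
    p1 = project(A1 + lam * D, f)
    p2 = project(A2 + lam * D, f)
    print(lam, p1, p2)             # both approach vp as lam grows
print("vanishing point:", vp)      # [0.5  0.25]
```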
Special case: Manhattan world. Consider a city-block world where all lines follow one of 3 orthogonal directions; each direction has its own vanishing point (VP 1, VP 2, VP 3).
Special case: horizon line. Claim: the vanishing points of all 3D lines on the ground plane lie on a single horizon line.
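Horizon line: proof. From the vanishing-point derivation,
$$\begin{bmatrix}X\\Y\\Z\end{bmatrix} = \begin{bmatrix}A_x\\A_y\\A_z\end{bmatrix} + \lambda\begin{bmatrix}D_x\\D_y\\D_z\end{bmatrix}, \qquad (x, y) \to \left(\frac{fD_x}{D_z}, \frac{fD_y}{D_z}\right) \text{ as } \lambda \to \infty$$
The equation of the ground plane is Y = -h. [Figure: COP, image point (x, y, f), 3D point (X, Y, Z).] For any point A on the ground plane, $(A_x, -h, A_z)$, with a direction D along the ground plane, $(D_x, 0, D_z)$, where will the vanishing points converge? To $\left(\frac{fD_x}{D_z}, 0\right)$: every such vanishing point has y = 0, so they all lie on the horizon line y = 0. Why is the horizon line not always at the center of the image?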
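A randomized check of the claim, as a sketch: it samples (with an assumed seed, camera height, and sampling ranges) lines lying in the ground plane Y = -h, projects a point very far along each, and confirms every vanishing point has y = 0. Helper names are hypothetical.

```python
import numpy as np

def project(P, f):
    """Pinhole projection (f X / Z, f Y / Z)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

rng = np.random.default_rng(0)
f, h = 1.0, 1.5        # assumed focal length and camera height (ground plane Y = -h)

far_points, vps = [], []
for _ in range(5):
    A = np.array([rng.uniform(-5, 5), -h, rng.uniform(2, 10)])    # point on the ground
    D = np.array([rng.uniform(-1, 1), 0.0, rng.uniform(0.1, 1)])  # direction in the ground plane
    far_points.append(project(A + 1e12 * D, f))    # a point very far along the line
    vps.append(np.array([f * D[0] / D[2], 0.0]))   # predicted vanishing point (f Dx/Dz, 0)

far_points, vps = np.array(far_points), np.array(vps)
print(np.allclose(far_points, vps, atol=1e-6))     # far points reach the predicted VPs
print(vps[:, 1])                                   # every VP has y = 0 (the horizon)
```

This assumes the camera's optical axis is parallel to the ground; tilting the camera moves the horizon away from y = 0, which is the answer to the slide's closing question.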
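Image y position: proof. The equation of the ground plane is Y = -h. What y-coordinate will a point on the ground plane have? [Figure: ground points at depths Z1, Z2, Z3.]
$$y = \frac{fY}{Z} = -\frac{fh}{Z}$$
Ground points farther away project closer to the horizon (y = 0), so closer points appear lower in the image.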
Image height: proof. Bottom of tree: (X, -h, Z). Top of tree: (X, L - h, Z).
$$y_{top} - y_{bot} = \frac{f(L-h)}{Z} - \left(-\frac{fh}{Z}\right) = \frac{fL}{Z}$$
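Both ground-plane results can be evaluated side by side. In this sketch the camera height, tree height, and depths are illustrative assumptions; it shows the tree's base rising toward y = 0 and its image height shrinking as 1/Z.

```python
def ground_y(f, h, Z):
    """Image y of a ground-plane point (Y = -h) at depth Z: y = -f h / Z."""
    return -f * h / Z

def tree_image_height(f, h, L, Z):
    """y_top - y_bot for a vertical segment from (X, -h, Z) to (X, L - h, Z)."""
    y_bot = ground_y(f, h, Z)
    y_top = f * (L - h) / Z
    return y_top - y_bot

f, h, L = 1.0, 1.5, 4.0     # illustrative focal length, camera height, tree height (assumed)
for Z in (5.0, 10.0, 20.0):
    # depth, base position, image height from the two endpoints, closed form f L / Z
    print(Z, ground_y(f, h, Z), tree_image_height(f, h, L, Z), f * L / Z)
```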
Consequence of the derivations for image height and parallel lines: distances and angles aren't preserved by camera projection.
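Orthographic projection. Perspective: $x = \frac{fX}{Z}$, $y = \frac{fY}{Z}$. Orthographic: $x = X$, $y = Y$. [Figures: COP, image point (x, y, f), and 3D point (X, Y, Z) for each model.] Life would be much simpler: we could trust angles and distances (at least for shapes in planes parallel to the image plane).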
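Scaled orthographic projection. Consider two points A and B at different depths that are far away from the camera:
$$A = \begin{bmatrix}A_x\\A_y\\Z\end{bmatrix}, \qquad B = \begin{bmatrix}B_x\\B_y\\Z + \Delta Z\end{bmatrix}$$
If $Z \gg \Delta Z$, what happens to their image projections (e.g., $a_x$ and $b_x$)?
$$a_x = \frac{fA_x}{Z} = sA_x, \qquad b_x = \frac{fB_x}{Z + \Delta Z} \approx \frac{fB_x}{Z} = sB_x \quad \text{for } Z \gg \Delta Z, \qquad \text{where } s = \frac{f}{Z}$$
We can approximate sets of such points with a scaled orthographic model: a single scale factor s for all of them.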
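How good is the approximation? The relative error of $b_x \approx sB_x$ works out to exactly ΔZ / Z, which this sketch confirms numerically. The values of f, B_x, and Z are assumed for illustration; the function names are hypothetical.

```python
def perspective_x(f, X, Z):
    """Perspective projection of the x coordinate."""
    return f * X / Z

def scaled_ortho_x(f, X, Z_ref):
    """Scaled orthographic approximation: x ~= s * X with s = f / Z_ref."""
    return (f / Z_ref) * X

def relative_error(f, Bx, Z, dZ):
    """Relative error of the scaled orthographic model for a point at depth Z + dZ."""
    exact = perspective_x(f, Bx, Z + dZ)
    approx = scaled_ortho_x(f, Bx, Z)
    return abs(exact - approx) / abs(exact)

f, Bx, Z = 1.0, 2.0, 100.0          # illustrative values (assumed)
for dZ in (0.5, 5.0, 50.0):
    print(dZ, relative_error(f, Bx, Z, dZ))   # equals dZ / Z
```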
Perspective vs. orthographic: wide angle, standard, telephoto
Scaled orthographic
Perspective effects matter most for large objects, i.e., when the object's change in depth is large relative to its distance from the camera.
A look back, dominant effects of perspective: parallel lines meet at vanishing points; objects farther away are smaller; foreshortening.
Fronto-parallel view vs. foreshortened view vs. perspective view: rotating a far-away plane produces an affine (linear) warp; rotating a close-by plane produces a homography (nonlinear) warp.
2D geometric transformations. [Figure: a square under translation, Euclidean, similarity, affine, and projective transformations in the x-y plane.] Let's define families of transformations by the properties that they preserve:
Transformation     | Matrix           | # DoF | Preserves
translation        | [ I  t ] (2×3)   | 2     | orientation
rigid (Euclidean)  | [ R  t ] (2×3)   | 3     | lengths
similarity         | [ sR  t ] (2×3)  | 4     | angles
affine             | [ A ] (2×3)      | 6     | parallelism
projective         | [ H̃ ] (3×3)      | 8     | straight lines
Each transformation also preserves the properties listed in the rows below it.
But first, we'll need tools from geometry. Where we are headed: Euclidean (translation + rotation) preserves lengths and angles; affine preserves parallel lines; projective preserves lines. [Figure: nested sets, Euclidean ⊂ affine ⊂ projective.]
Agenda: perspective projection, rotations, camera models
Orthogonal transformations. Definition: orthogonal transformations are linear transformations that preserve distances and angles:
$$a^T b = F(a)^T F(b), \quad \text{where } F(a) = Aa,\; a \in \mathbb{R}^n,\; A \in \mathbb{R}^{n \times n}$$
$$a^T b = a^T A^T A\, b \;\; \forall a, b \iff A^T A = I$$
[We can conclude this by setting a, b to the coordinate vectors.]
Definition: A is a rotation matrix if $A^T A = I$ and $\det(A) = 1$.
Definition: A is a reflection matrix if $A^T A = I$ and $\det(A) = -1$.
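The two definitions translate directly into a test: check $A^T A = I$, then use the sign of the determinant. This NumPy sketch (the `classify` helper and the example matrices are my own) applies it to a 2D rotation, a reflection, and a non-orthogonal scaling, and checks that the rotation preserves a dot product.

```python
import numpy as np

def classify(A, tol=1e-9):
    """Return 'rotation', 'reflection', or 'not orthogonal' for a square matrix A."""
    n = A.shape[0]
    if not np.allclose(A.T @ A, np.eye(n), atol=tol):
        return "not orthogonal"
    # For an orthogonal matrix det(A) is +1 or -1.
    return "rotation" if np.linalg.det(A) > 0 else "reflection"

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # 2D rotation by theta
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])                        # reflection across the x-axis
S = np.diag([2.0, 1.0])                            # scaling: not orthogonal

print(classify(R), classify(F), classify(S))

# Orthogonal maps preserve dot products: a^T b == (R a)^T (R b)
a, b = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.isclose(a @ b, (R @ a) @ (R @ b)))
```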
2D rotations: $R = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}$, 1 DOF.
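3D rotations:
$$R\begin{bmatrix}X\\Y\\Z\end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13}\\ r_{21} & r_{22} & r_{23}\\ r_{31} & r_{32} & r_{33}\end{bmatrix}\begin{bmatrix}X\\Y\\Z\end{bmatrix}$$
Think of R as a change of basis, where the rows $r_i = R(i,:)$ are orthonormal basis vectors of the rotated coordinate frame [figure: axes r1, r2, r3]. How many DOFs? 3 = 2 (to point r1) + 1 (to rotate about r1).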
Euler's rotation theorem: any rotation of a rigid body in three-dimensional space is equivalent to a pure rotation about a single fixed axis. https://en.wikipedia.org/wiki/euler's_rotation_theorem
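The fixed axis promised by the theorem is the eigenvector of R with eigenvalue 1, and the angle follows from $\operatorname{trace}(R) = 1 + 2\cos\theta$. This sketch composes two rotations about different coordinate axes (angles chosen arbitrarily) and recovers the single equivalent axis; `rot_x`, `rot_z`, and `rotation_axis` are hypothetical helper names.

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_axis(R):
    """Axis of a 3D rotation: the eigenvector of R whose eigenvalue is closest to 1."""
    w, V = np.linalg.eig(R)
    i = np.argmin(np.abs(w - 1.0))
    axis = np.real(V[:, i])
    return axis / np.linalg.norm(axis)

R = rot_z(0.7) @ rot_x(0.4)                # composition of rotations about two axes
k = rotation_axis(R)
angle = np.arccos((np.trace(R) - 1.0) / 2.0)   # trace(R) = 1 + 2 cos(angle)
print(k, angle)
print(np.allclose(R @ k, k))               # the axis is left fixed by R
```

The sign of the returned axis is arbitrary; flipping it corresponds to rotating by the same angle in the opposite sense about the opposite axis.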
3D rotations: there are many parameterizations that try to capture the 3 DOFs. Helpful ones for vision: orthonormal matrix, axis-angle, exponential map. Axis-angle: represent a 3D rotation with a unit vector pointing along the axis of rotation and an angle of rotation about that vector (compare 2D, where the angle alone suffices).
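Review: dot and cross products. Dot product: $a \cdot b = \|a\|\,\|b\|\cos\theta$. Cross product:
$$a \times b = \begin{bmatrix} a_2 b_3 - a_3 b_2\\ a_3 b_1 - a_1 b_3\\ a_1 b_2 - a_2 b_1\end{bmatrix}$$
Cross-product matrix:
$$a \times b = \hat{a}\,b = \begin{bmatrix} 0 & -a_3 & a_2\\ a_3 & 0 & -a_1\\ -a_2 & a_1 & 0\end{bmatrix}\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}$$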
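The cross-product matrix can be checked against NumPy's built-in `np.cross`. The `hat` helper below is a sketch of $\hat{a}$; the vectors are arbitrary examples.

```python
import numpy as np

def hat(a):
    """Cross-product (skew-symmetric) matrix: hat(a) @ b == np.cross(a, b)."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3,  a2],
                     [ a3, 0.0, -a1],
                     [-a2,  a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-4.0, 0.5, 2.0])
print(hat(a) @ b, np.cross(a, b))      # both [2.5, -14.0, 8.5]
print(np.allclose(hat(a).T, -hat(a)))  # skew-symmetric
```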
Approach: axis-angle. Rotate a point x by an angle θ about an axis $\omega \in \mathbb{R}^3$, $\|\omega\| = 1$. https://en.wikipedia.org/wiki/axis-angle_representation
Rodrigues' rotation formula (https://en.wikipedia.org/wiki/rodrigues'_rotation_formula). Given $\omega \in \mathbb{R}^3$, $\|\omega\| = 1$:
1. Write x as the sum of its components parallel and perpendicular to ω: $x = x_\parallel + x_\perp$.
2. Rotate the perpendicular component by a 2D rotation of θ in the plane orthogonal to ω.
$$R = I + \hat{\omega}\sin\theta + \hat{\omega}^2(1 - \cos\theta)$$
[Rx can be simplified to cross- and dot-product computations.]
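Both routes on this slide can be implemented and compared: the closed-form matrix R, and the two-step recipe using only dot and cross products. This is a NumPy sketch; the function names and the test vectors are my own choices.

```python
import numpy as np

def hat(w):
    """Cross-product matrix: hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(w, theta):
    """R = I + sin(theta) hat(w) + (1 - cos(theta)) hat(w)^2 for a unit axis w."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)                  # enforce ||w|| = 1
    W = hat(w)
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)

def rotate_point(w, theta, x):
    """Same rotation via steps 1-2: split x, rotate only the perpendicular part."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    x_par = (w @ x) * w                        # parallel component (unchanged)
    x_perp = x - x_par                         # perpendicular component
    return x_par + np.cos(theta) * x_perp + np.sin(theta) * np.cross(w, x_perp)

# Usage: a quarter turn about the z-axis maps the x-axis onto the y-axis.
R = rodrigues([0.0, 0.0, 1.0], np.pi / 2)
x = np.array([1.0, 0.0, 0.0])
print(np.round(R @ x, 12))

# The matrix form and the dot/cross-product form agree for a generic axis.
w, theta, p = np.array([1.0, -2.0, 0.5]), 1.1, np.array([0.3, 0.7, -1.2])
print(np.allclose(rodrigues(w, theta) @ p, rotate_point(w, theta, p)))
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```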
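Exponential map representation. With $\omega \in \mathbb{R}^3$, $\|\omega\| = 1$:
$$R = \exp(\hat{v}), \quad \text{where } v = \theta\omega, \qquad \exp(\hat{v}) = I + \hat{v} + \frac{1}{2!}\hat{v}^2 + \cdots$$
[This is the standard Taylor series expansion of exp(x) at x = 0, $1 + x + \frac{1}{2!}x^2 + \cdots$; it reduces to Rodrigues' formula using the Taylor series expansions of sine and cosine.]
This implies that we can approximate the change in position of x due to a small rotation v as $\Delta x \approx \hat{v}x = v \times x$.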
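A sketch that sums the Taylor series of $\exp(\hat{v})$ directly (truncated at an assumed 30 terms, which is plenty for moderate angles) and checks two claims from the slide: it matches Rodrigues' formula, and for a small rotation $Rx - x \approx v \times x$. It deliberately avoids library matrix exponentials to mirror the series; the axis, angle, and point are arbitrary examples.

```python
import numpy as np

def hat(v):
    """Cross-product matrix of v."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def expm_series(M, terms=30):
    """Truncated Taylor series exp(M) = I + M + M^2/2! + ... (assumes ||M|| is moderate)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k              # now term = M^k / k!
        result = result + term
    return result

def rodrigues(w, theta):
    W = hat(w)
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)

w = np.array([1.0, 2.0, 2.0]) / 3.0      # unit axis (||w|| = 1)
theta = 0.8
v = theta * w                            # exponential-map parameters v = theta * w

R_exp = expm_series(hat(v))
print(np.allclose(R_exp, rodrigues(w, theta)))

# Small-rotation approximation: R x - x ~= v x x (cross product)
x = np.array([0.3, -1.0, 2.0])
v_small = 1e-4 * w
delta = expm_series(hat(v_small)) @ x - x
print(np.allclose(delta, np.cross(v_small, x), atol=1e-7))
```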
Agenda: perspective projection, rotations, camera models
Recall perspective projection: $x = f\frac{X}{Z}$, $y = f\frac{Y}{Z}$. [Figure: COP at the origin, image point (x, y, 1), 3D point (X, Y, Z).]
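Perspective projection revisited:
$$\lambda\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix} f & 0 & 0\\ 0 & f & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}X\\Y\\Z\end{bmatrix}$$
Given (X, Y, Z) and f, compute (x, y) and λ: $\lambda x = fX$ and $\lambda = Z$, so $x = \frac{fX}{Z}$ (and likewise $y = \frac{fY}{Z}$).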
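Special case: f = 1. Natural geometric intuition: the 3D point is obtained by scaling the ray pointed at the image coordinate (x, y, 1), and the scale factor is the true depth of the point:
$$\lambda\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix}X\\Y\\Z\end{bmatrix}, \qquad \lambda = Z$$
[Figure: COP, image point (x, y, 1), 3D point (X, Y, Z).] [Aside: given an image with focal length f, resize it by 1/f to obtain a unit-focal-length image.]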
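The matrix form and the f = 1 intuition in a few lines of NumPy. The focal length and 3D point are illustrative assumptions. The sketch projects the point, reads λ off the third coordinate, then divides by f (the "resize by 1/f" aside) and scales the resulting unit-focal-length ray by the depth to recover the point.

```python
import numpy as np

f = 2.0                                  # illustrative focal length (assumed)
K = np.diag([f, f, 1.0])                 # projection matrix from the slide (no skew, no offset)
P = np.array([1.0, -0.5, 4.0])           # 3D point (X, Y, Z) in camera coordinates (assumed)

q = K @ P                                # = lambda * (x, y, 1)
lam = q[2]                               # lambda = Z
x, y = q[0] / lam, q[1] / lam            # x = f X / Z, y = f Y / Z
print(lam, x, y)                         # 4.0 0.5 -0.25

# f = 1 intuition: scaling the unit-focal-length ray (x/f, y/f, 1) by the depth recovers P.
ray = np.array([x / f, y / f, 1.0])
print(np.allclose(lam * ray, P))         # True
```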
Homogeneous notation. For now, think of $\begin{bmatrix}x\\y\\z\end{bmatrix} \cong \begin{bmatrix}X\\Y\\Z\end{bmatrix}$ as shorthand notation for: $\exists\,\lambda$ s.t. $\lambda\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}X\\Y\\Z\end{bmatrix}$.
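Camera projection:
$$\lambda\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix} f & 0 & 0\\ 0 & f & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x\\ r_{21} & r_{22} & r_{23} & t_y\\ r_{31} & r_{32} & r_{33} & t_z\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}$$
The first matrix is the camera intrinsic matrix K (it can include skew and non-square pixel size); the second holds the camera extrinsics (rotation and translation); the last vector is the 3D point in world coordinates. [Figure: world coordinate frame and camera frame with axes r1, r2, r3, offset by translation T.] Aside: homogeneous notation, writing $\cong$ in place of $=$ and dropping λ, is shorthand for this λ-scaled equality.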
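A minimal NumPy sketch of the full pipeline λ[x, y, 1]ᵀ = K [R | t] [X, Y, Z, 1]ᵀ for a batch of points. The camera (f = 500, identity rotation, world origin 5 units in front of the camera) and the points are illustrative assumptions, chosen so the results are easy to verify by hand; `project_points` is a hypothetical helper, not a library function.

```python
import numpy as np

def project_points(K, R, t, Xw):
    """Project Nx3 world points with lambda [x, y, 1]^T = K [R | t] [X, Y, Z, 1]^T."""
    Xw = np.asarray(Xw, dtype=float)
    M = K @ np.hstack([R, t.reshape(3, 1)])          # 3x4 camera matrix
    Xh = np.hstack([Xw, np.ones((len(Xw), 1))])      # homogeneous world points (Nx4)
    q = (M @ Xh.T).T                                 # rows are lambda * (x, y, 1)
    return q[:, :2] / q[:, 2:3], q[:, 2]             # image coords, lambda (depth)

f = 500.0
K = np.array([[f, 0.0, 0.0],
              [0.0, f, 0.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # assumed: camera axes aligned with the world axes
t = np.array([0.0, 0.0, 5.0])      # assumed: world origin 5 units in front of the camera
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [1.0, 1.0, 5.0]])
xy, depth = project_points(K, R, t, pts)
print(xy)      # the farther point (depth 10) lands closer to the image center
print(depth)   # [ 5.  5. 10.]
```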
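Fancier intrinsics:
non-square pixels: $x_s = s_x x$, $y_s = s_y y$;
shifted origin: $x' = x_s + o_x$, $y' = y_s + o_y$;
skewed image axes: $x'' = x' + s\,y$, $y'' = y'$.
$$K = \begin{bmatrix} s_x & s & o_x\\ 0 & s_y & o_y\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} f & 0 & 0\\ 0 & f & 0\\ 0 & 0 & 1\end{bmatrix} = \begin{bmatrix} fs_x & fs & o_x\\ 0 & fs_y & o_y\\ 0 & 0 & 1\end{bmatrix}$$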
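To confirm that the step-by-step pixel mapping and the composed K agree, this sketch applies both to one point. All parameter values and the point are assumptions for illustration; the skew step follows the form $x'' = x' + s\,y$, which is the form consistent with the first row of K.

```python
import numpy as np

# Illustrative intrinsic parameters (assumed values, not from the slide).
f = 0.05                    # focal length
sx, sy = 10000.0, 12000.0   # pixel scale factors (non-square pixels)
s = 20.0                    # skew
ox, oy = 320.0, 240.0       # image origin offset (principal point)

K_pix = np.array([[sx, s, ox],
                  [0.0, sy, oy],
                  [0.0, 0.0, 1.0]])
K_f = np.diag([f, f, 1.0])
K = K_pix @ K_f             # expected: [[f sx, f s, ox], [0, f sy, oy], [0, 0, 1]]

P = np.array([0.4, -0.3, 2.0])           # 3D point in camera coordinates (assumed)

# Step by step, following the slide.
x, y = f * P[0] / P[2], f * P[1] / P[2]  # ideal pinhole image coordinates
xs, ys = sx * x, sy * y                  # non-square pixels
xp, yp = xs + ox, ys + oy                # shifted origin
u, v = xp + s * y, yp                    # skewed axes: x picks up a term proportional to y

q = K @ P
print(np.allclose([u, v], q[:2] / q[2]))
print(np.allclose(K, [[f * sx, f * s, ox], [0.0, f * sy, oy], [0.0, 0.0, 1.0]]))
```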
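Notation [using Matlab's rows × columns convention]:
$$\lambda\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix} fs_x & fs & o_x\\ 0 & fs_y & o_y\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x\\ r_{21} & r_{22} & r_{23} & t_y\\ r_{31} & r_{32} & r_{33} & t_z\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix} = K_{3\times3}\,\begin{bmatrix}R_{3\times3} & T_{3\times1}\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix} = M_{3\times4}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}$$
Claims (without proof):
1. A 3×4 matrix $M = [A \;\; b]$ can be a camera matrix iff $\det(A) \neq 0$, where A is its left 3×3 block.
2. M is determined only up to a scale factor.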
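Notation (more):
$$M_{3\times4}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix} = \begin{bmatrix}A_{3\times3} & b_{3\times1}\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix} = A_{3\times3}\begin{bmatrix}X\\Y\\Z\end{bmatrix} + b_{3\times1}$$
$$M = \begin{bmatrix} m_1^T\\ m_2^T\\ m_3^T\end{bmatrix}, \qquad A = \begin{bmatrix} a_1^T\\ a_2^T\\ a_3^T\end{bmatrix}, \qquad b = \begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix}$$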
Applying the projection matrix:
$$x = \frac{1}{\lambda}\left([X\;Y\;Z]\,a_1 + b_1\right), \qquad y = \frac{1}{\lambda}\left([X\;Y\;Z]\,a_2 + b_2\right), \qquad \lambda = [X\;Y\;Z]\,a_3 + b_3$$
Set of 3D points that project to x = 0: $[X\;Y\;Z]\,a_1 + b_1 = 0$. Set of 3D points that project to y = 0: $[X\;Y\;Z]\,a_2 + b_2 = 0$. Set of 3D points that project to x = ∞ or y = ∞: $[X\;Y\;Z]\,a_3 + b_3 = 0$.
The rows of the projection matrix describe the 3 planes defined by the image coordinate system. [Figure: planes with normals a1, a2, a3 passing through the COP; image plane with axes x and y.]
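Other geometric properties. [Figure: COP, image point (x, y), 3D point (X, Y, Z).] Draw the image plane in front of the pinhole; write (x, y) for normalized coordinates and (u, v) for image coordinates. What's the set of (X, Y, Z) points that project to the same (x, y)?
$$\begin{bmatrix}X\\Y\\Z\end{bmatrix} = \lambda w + \tilde{b}, \quad \text{where } w = A^{-1}\begin{bmatrix}x\\y\\1\end{bmatrix}, \quad \tilde{b} = -A^{-1}b$$
What's the position of the COP / pinhole? It is the point where all three planes meet (λ = 0):
$$A\begin{bmatrix}X\\Y\\Z\end{bmatrix} + b = 0 \;\Rightarrow\; \begin{bmatrix}X\\Y\\Z\end{bmatrix} = -A^{-1}b$$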
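A numeric check of both results for an assumed camera M = [A | b] = K [R | t]: the COP computed as $-A^{-1}b$ should equal $-R^T t$, and every point $C + \lambda w$ on the back-projected ray should land on the same pixel. The camera parameters and the pixel are illustrative; `np.linalg.solve` is used instead of forming $A^{-1}$ explicitly.

```python
import numpy as np

# Illustrative (assumed) camera: M = [A | b] = K [R | t]
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
th = 0.3
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])   # rotation about the Y axis
t = np.array([0.2, -0.1, 3.0])
M = K @ np.hstack([R, t[:, None]])
A, b = M[:, :3], M[:, 3]

cop = -np.linalg.solve(A, b)          # A C + b = 0  =>  C = -A^{-1} b
print(cop, np.allclose(cop, -R.T @ t))

# Back-projection: all points C + lam * w (lam > 0) project to the same pixel.
pix = np.array([100.0, 50.0, 1.0])
w = np.linalg.solve(A, pix)           # w = A^{-1} [x, y, 1]^T
for lam in (1.0, 2.0, 10.0):
    P = cop + lam * w
    q = A @ P + b                     # = lam * [x, y, 1]
    print(q[:2] / q[2])               # always [100, 50]
```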
Affine cameras: perspective vs. weak perspective. For an affine camera, the last row of M is $m_3^T = [0\;0\;0\;1]$.
Affine cameras capture a 3D affine transformation + orthographic projection + 2D affine transformation:
$$\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1\\ a_{21} & a_{22} & a_{23} & b_2\\ 0 & 0 & 0 & 1\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}, \qquad \begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{bmatrix}\begin{bmatrix}X\\Y\\Z\end{bmatrix} + \begin{bmatrix}b_1\\b_2\end{bmatrix}, \quad \text{i.e., } x = AX + b$$
Projection defined by 8 parameters. Parallel lines project to parallel lines. 2D points = linear projection of 3D points (+ 2D translation).
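Why parallel lines stay parallel: under x = AX + b, the image direction of any line with 3D direction D is A D, independent of where the line sits. This sketch uses a random (assumed, seeded) 2×3 matrix A and offset b, i.e., 6 + 2 = 8 parameters, and two parallel 3D lines.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))           # illustrative affine camera, 2x3 linear part (assumed)
b = rng.normal(size=2)                # 2D translation (assumed); 6 + 2 = 8 parameters

def affine_project(P):
    """Affine camera: [x, y]^T = A [X, Y, Z]^T + b."""
    return A @ P + b

D = np.array([1.0, 2.0, -0.5])        # common 3D direction
P1, P2 = np.array([0.0, 0.0, 5.0]), np.array([3.0, -1.0, 9.0])

# Image direction of each line: difference of two projected points on it
d1 = affine_project(P1 + D) - affine_project(P1)
d2 = affine_project(P2 + D) - affine_project(P2)
cross2d = d1[0] * d2[1] - d1[1] * d2[0]
print(d1, d2, np.isclose(cross2d, 0.0))   # parallel: both equal A @ D
```

Under a perspective camera the same experiment fails, because the division by λ makes the image direction depend on the line's position.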
Affine cameras: $m_3^T = [0\;0\;0\;1]$, so
$$x = [X\;Y\;Z]\,a_1 + b_1, \qquad y = [X\;Y\;Z]\,a_2 + b_2$$
Image coordinates (x, y) are an affine function of world coordinates (X, Y, Z). Example: the weak-perspective projection model. Projection defined by 8 parameters. Parallel lines project to parallel lines. The transformation can be written as a direct linear transformation plus an offset.
Geometric transformations: Euclidean (translation + rotation) preserves lengths and angles; affine preserves parallel lines; projective preserves lines. [Figure: nested sets, Euclidean ⊂ affine ⊂ projective.]