3D Vision: Real Objects, Real Cameras
Chapters 11 and 12 (parts of)
Computerized Image Analysis MN2
Anders Brun, anders@cb.uu.se
3D Vision
- Philosophy
- Image formation
  - The pinhole camera
  - Projective geometry
  - Artefacts and challenges
- Camera calibration
  - GoPro + homebrew
- Stereo vision
  - Hitta.se
- Structured light
  - Texture
Philosophy: Why 3-D?
- Why do we model things in 3-D?
- Without a 3-D model of the world, events are more difficult to predict: movement, grasping, collision estimation, real size estimation, ...
- Example:
  - 2-D: A car on the highway looks bigger and drives faster as it approaches.
  - 3-D: A car on the highway has constant size and speed as it approaches.
Philosophy: 3-D cues
Shape from:
- Focus
- Lighting
- Stereo
- Structured light
(Photo: Greg Keene)
Philosophy: 3-D cues
Philosophy: 3-D cues
Philosophy: 3-D cues
Philosophy: Marr and 2.5-D
- Primal sketch: edges and areas
- 2.5-D sketch: texture and depth
- 3-D model: a hierarchical 3-D model of the world
(Teddy dataset, from http://cat.middlebury.edu/)
Philosophies
- Build an accurate 3-D world representation:
  1. Build a complete 3-D model of the scene.
  2. Plan the task using the 3-D model.
  3. Example: Build a model of the scene, then find the teddy bear and send a robot arm to grab it.
- Plan as you go, act and react:
  1. Collect features from the scene.
  2. Use the features to guide your actions.
  3. Example: Find the teddy bear using template matching in the image, then send the robot hand in that direction. Possibly take more images when halfway.
Passive, Active and Dynamic Vision
- Passive vision: the camera has a fixed location.
- Dynamic vision: the camera is moving but cannot be steered.
- Active vision: the camera can be steered.
The pinhole camera
- The pinhole camera is an idealized model.
- A real aperture is not a point: it has a non-vanishing area and typically also a lens.
The pinhole camera model
- Where on the image plane inside the camera is the point P = (X, Y, Z) projected?
- With the focal point (the "pinhole") at the origin and focal length f:

  x = -f X / Z
  y = -f Y / Z
The pinhole camera model (alternative)
- Imagine an observer located at the focal point.
- A screen is placed at distance f (the focal length) from the observer.
- Where on this screen is P = (X, Y, Z) projected?

  x = +f X / Z
  y = +f Y / Z
The pinhole camera model
- In the pinhole camera, the world appears upside down (rotated 180 degrees).
- The alternative interpretation is useful in computer graphics: it tells you exactly where to draw P on a screen in front of the observer, in order to make it appear real to the observer (note the change of sign).
- The alternative interpretation leads directly to projective geometry.
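The alternative interpretation fits in a few lines of NumPy (a minimal sketch; the helper name `pinhole_project` is made up for illustration, not from the course material):

```python
import numpy as np

def pinhole_project(P, f):
    """Project a 3-D point P = (X, Y, Z) onto a screen at distance f
    in front of the observer: x = f*X/Z, y = f*Y/Z (no sign flip)."""
    X, Y, Z = (float(v) for v in P)
    return np.array([f * X / Z, f * Y / Z])

# A point twice as far away projects half as far from the optical axis:
near = pinhole_project((1.0, 2.0, 4.0), f=2.0)   # [0.5, 1.0]
far  = pinhole_project((1.0, 2.0, 8.0), f=2.0)   # [0.25, 0.5]
```

Note the 1/Z scaling: it is exactly this division that makes distant objects appear smaller.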
Projective Geometry (Very Briefly)
- Points in 2-D are represented by lines in 3-D.
- The 3-D space is called the embedding space.
- All points along a line are equivalent: x and αx belong to the same equivalence class.
- This is analogous to photography: every point (position) in a photograph (2-D) corresponds to a line, or ray, in reality (3-D).
Projective Geometry (Very Briefly)
- We can convert points in the ordinary plane to the projective plane:
  2-D (x, y) -> 3-D (x, y, 1)
- In general: D-dimensional -> (D+1)-dimensional.
- Points x and αx are equivalent, α ≠ 0.
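The conversion, and the fact that x and αx represent the same point, is easy to verify numerically (a small sketch; the helper names are made up for illustration):

```python
import numpy as np

def to_homogeneous(p):
    """2-D (x, y) -> 3-D (x, y, 1)."""
    return np.append(np.asarray(p, float), 1.0)

def from_homogeneous(ph):
    """Divide by the last coordinate to return to the ordinary plane."""
    return ph[:-1] / ph[-1]

p = np.array([3.0, -2.0])
xh = to_homogeneous(p)            # [3, -2, 1]
scaled = 5.0 * xh                 # alpha * x: same equivalence class
recovered = from_homogeneous(scaled)   # [3, -2] again
```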
Projective Geometry (Very Briefly)
A homography is a linear transformation H in the embedding space:

  α [ x' ]   [ h11 h12 h13 ] [ x ]
    [ y' ] = [ h21 h22 h23 ] [ y ]
    [ 1  ]   [ h31 h32 h33 ] [ 1 ]

In ordinary coordinates:

  x' = (h11 x + h12 y + h13) / (h31 x + h32 y + h33)
  y' = (h21 x + h22 y + h23) / (h31 x + h32 y + h33)
Projective Geometry (Very Briefly)
- Homography: a map from (D+1)-dim to (D+1)-dim.
- Linear in the (D+1)-dim embedding space: x' = H x.
- Represents a perspective transformation in D-dim space.
- This is very nice!
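Applying a homography is just a matrix product in the embedding space followed by division by the third coordinate (an illustrative sketch with a made-up H):

```python
import numpy as np

def apply_homography(H, p):
    """Map a 2-D point p through a 3x3 homography H."""
    xh = H @ np.array([p[0], p[1], 1.0])   # linear in the embedding space
    return xh[:2] / xh[2]                  # back to the ordinary plane

# A homography with a non-trivial bottom row acts as a perspective map:
H = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
q = apply_homography(H, (2.0, 4.0))        # [1.5, 2.0]
```

With an identity bottom row (h31 = h32 = 0, h33 = 1) the division is trivial and the map reduces to an ordinary affine transformation.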
Projective Geometry (Very Briefly)
Using homographies, we can express a rich class of transformations with linear mappings:

  Identity:    H = I
  Isometric:   H = [ R  t ; 0^T 1 ]
  Similarity:  H = [ sR t ; 0^T 1 ]
  Affine:      H = [ A  t ; 0^T 1 ]
  Perspective: any H with det(H) ≠ 0
Perspective Transformations
- Remember this example? We wanted to compute the perspective transformation parameters.
(From "Feature based methods for structure and motion estimation" by P. H. S. Torr and A. Zisserman)
Perspective Transformations
- Estimating H from point correspondences (simplified version; check the book for a more advanced version).
- Each point correspondence translates to 2 linear equations in the coefficients of H:

  h31 x x' + h32 y x' + h33 x' - h11 x - h12 y - h13 = 0
  h31 x y' + h32 y y' + h33 y' - h21 x - h22 y - h23 = 0

  or, in matrix form,

  [ P_x(x, y, x', y') ] h = [ 0 ]
  [ P_y(x, y, x', y') ]     [ 0 ]

- Assuming h33 = 1, we need 4 corresponding 2-D point pairs (x, y, x', y') to solve the equation system (8 unknowns).
- This way of solving for the parameters has severe practical disadvantages, but it shows that it is possible, at least...
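With h33 = 1 the two equations per correspondence become linear in the remaining 8 coefficients, so 4 point pairs give an 8x8 system. A sketch of this simplified estimator (the helper name and the use of least squares, so that more than 4 pairs also work, are my own choices):

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H from point pairs (x, y) -> (x', y'), assuming h33 = 1."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged:
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical check: recover a known H from the 4 unit-square corners.
H_true = np.array([[1.2, 0.1, 3.0],
                   [0.0, 0.9, -2.0],
                   [0.001, 0.002, 1.0]])
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(H_true @ [x, y, 1.0])[:2] / (H_true @ [x, y, 1.0])[2] for x, y in src]
H_est = estimate_homography(src, dst)
```

As the slide warns, fixing h33 = 1 fails when the true h33 is zero and can be numerically fragile; the cross-product formulation on the next slide avoids that.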
Perspective Transformations
A cleaner and more stable solution: multiply both sides of α(x', y', 1)^T = H(x, y, 1)^T with the cross-product matrix of (x', y', 1). Since the cross product of a vector with itself is zero, this eliminates α:

  0 = [  0   -1    y' ] [ h11 h12 h13 ] [ x ]
      [  1    0   -x' ] [ h21 h22 h23 ] [ y ]
      [ -y'   x'   0  ] [ h31 h32 h33 ] [ 1 ]

  0 = Q(x, y, x', y') h

Now each correspondence gives three equations, of which two are linearly independent.
Single perspective camera
A world point X projects to image coordinates u via the projection matrix M:

  α u = M X

  M = [ f  s  u0 ]   [ 1 0 0 0 ]   [ R   -Rt ]
      [ 0  g  v0 ] . [ 0 1 0 0 ] . [ 0^T   1 ]
      [ 0  0  1  ]   [ 0 0 1 0 ]

- Internal parameters: focal lengths f and g, skew s, principal point (u0, v0).
- External parameters: rotation R and camera position t.
Single perspective camera
- Estimation of M from known world coordinates (X, Y, Z, 1) and their projections in a camera (x', y', 1):

  α [ x' ]   [ m11 m12 m13 m14 ] [ X ]
    [ y' ] = [ m21 m22 m23 m24 ] [ Y ]
    [ 1  ]   [ m31 m32 m33 m34 ] [ Z ]
                                 [ 1 ]

- This is analogous to the homography estimation.
- Algorithms exist to solve this with 6 correspondences.
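The analogous direct linear estimate of M, in a simplified form that fixes m34 = 1 (11 unknowns, so 6 correspondences give 12 equations). The helper name, the normalization, and the example matrix are my own illustrative choices:

```python
import numpy as np

def estimate_projection_matrix(world, image):
    """Estimate the 3x4 matrix M from (X, Y, Z) -> (x, y), assuming m34 = 1."""
    A, b = [], []
    for (X, Y, Z), (x, y) in zip(world, image):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -X * x, -Y * x, -Z * x]); b.append(x)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -X * y, -Y * y, -Z * y]); b.append(y)
    m = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)[0]
    return np.append(m, 1.0).reshape(3, 4)

# Made-up camera and 6 non-coplanar calibration points:
M_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, 20.0],
                   [0.01, 0.02, 0.03, 1.0]])
world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1)]
def project(M, P):
    u = M @ np.array([*P, 1.0])
    return (u[0] / u[2], u[1] / u[2])
image = [project(M_true, P) for P in world]
M_est = estimate_projection_matrix(world, image)
# M_est reprojects all 6 calibration points exactly.
```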
Single perspective camera
- This enables calibration from 6 known points.
- M can be factored: you can estimate camera focal length, image coordinate system, camera position and rotation.
- Triangulation: if you know several M_i, then you can also estimate a position X (3-D) from several camera projections u_i (2-D).
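A standard linear triangulation sketch (one common approach, not necessarily the one in the course notes): each camera contributes two homogeneous equations, and X is the null vector of the stacked system.

```python
import numpy as np

def triangulate(Ms, us):
    """Estimate a 3-D point from projection matrices M_i and image points u_i."""
    rows = []
    for M, (x, y) in zip(Ms, us):
        rows.append(x * M[2] - M[0])   # x * (m3 . X) - (m1 . X) = 0
        rows.append(y * M[2] - M[1])   # y * (m3 . X) - (m2 . X) = 0
    # Null vector of the stacked system = smallest right singular vector.
    Xh = np.linalg.svd(np.asarray(rows))[2][-1]
    return Xh[:3] / Xh[3]

# Hypothetical two-camera setup: identical intrinsics, second camera
# translated 0.5 units along the x axis.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
M1 = K @ np.hstack([np.eye(3), [[0.0], [0.0], [0.0]]])
M2 = K @ np.hstack([np.eye(3), [[-0.5], [0.0], [0.0]]])
P = np.array([0.2, -0.1, 5.0])
us = [(M @ np.append(P, 1.0))[:2] / (M @ np.append(P, 1.0))[2] for M in (M1, M2)]
P_est = triangulate([M1, M2], us)   # recovers P for noiseless data
```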
Marker-based motion capture
(Images: courtesy of Lennart Svensson)
Mocap Images: courtesy of Lennart Svensson
External calibration
- Rotation + position: 6 DoF calibration
(Images: courtesy of Lennart Svensson)
Motion capture applications
- Animation
- Biomechanical analysis
- Industrial analysis
(Images: courtesy of Lennart Svensson)
Image formation: Lenses
- Thin lens (Newton's form): z z' = f^2
  where z and z' are the object and image distances, measured from the object and image focal points respectively.
(Figure: object plane, image plane, and the two focal points at distance f from the lens.)
Image formation: Lenses
- Magnification: m = x/X (image size over object size)
- From similar triangles, X/z = x/f, so

  m = x/X = f/z = z'/f

(Figure: object of height X at distance z from the object focal point images to height x at distance z' from the image focal point.)
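Plugging numbers into Newton's relations z z' = f^2 and m = f/z = z'/f (a small worked example; the lens and distances are made up):

```python
def thin_lens(z, f):
    """Return (image distance z', magnification m) for object distance z,
    both measured from the respective focal points (Newton's form)."""
    z_img = f * f / z      # z z' = f^2
    m = f / z              # m = f/z = z'/f
    return z_img, m

# 50 mm lens, object 2 m beyond the object focal point:
z_img, m = thin_lens(z=2.0, f=0.05)
# z_img = 0.00125 m: the image forms 1.25 mm beyond the image focal point.
# m = 0.025: the image is 40x smaller than the object.
```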
Image formation: Lenses
- Depth of field Δz: the range of object distances for which light from a point is scattered over an area smaller than a pixel (ε = size of a pixel) on the image plane.
- Objects within the depth of field are thus depicted sharply.
Image formation: Lenses
- Depth of field Δz (continued):
- Aperture size and focal length both affect the depth of field.
- A larger aperture yields a smaller depth of field.
Image formation: Lenses
- Depth of focus Δz' is the analogous quantity on the image side: how much the image plane can be shifted without scattering light from a point in focus over more than a pixel (ε = size of a pixel).
AACAM @ Matlab File Exchange
- Matlab code for a non-perfect pinhole camera:
  - Set aperture radius and focal length
  - Set depth of field
  - Set object distance and aperture radius
Image formation: Lenses
- (Systems of) lenses -> distortions:
- Spherical aberration: shorter focal length close to the edges of the lens.
(Image from Wikipedia)
Image formation: Lenses
- (Systems of) lenses -> distortions:
- Coma
(Image from Wikipedia)
Image formation: Lenses
- (Systems of) lenses -> distortions:
- Chromatic aberration
(Image from Wikipedia)
Image formation: Lenses
- (Systems of) lenses -> distortions:
- Astigmatism
(Image from Wikipedia)
Image formation: Lenses
- (Systems of) lenses -> distortions:
- Geometric distortion: barrel distortion and pincushion distortion
(Image from Wikipedia)
Is this really a problem?
- In old and cheap cameras, yes!
(Uppsala 1999-01-01, from http://www.uu.se/carpediem/1999/)
Is this really a problem?
- But also for e.g. modern GoPro cameras!
Camera Calibration Toolbox
- A Matlab toolbox for camera calibration:
- http://www.vision.caltech.edu/bouguetj/calib_doc/
- Freely available
Camera Calibration Toolbox
- Focal length: the focal length in pixels is stored in the 2x1 vector fc.
- Principal point: the principal point coordinates are stored in the 2x1 vector cc.
- Skew coefficient: the skew coefficient, defining the angle between the x and y pixel axes, is stored in the scalar alpha_c.
- Distortions: the image distortion coefficients (radial and tangential) are stored in the 5x1 vector kc.
Stereo: Basic equations
Two pinhole cameras with parallel optical axes and focal length f, separated by the baseline B, both observe P = (X, Y, Z):

  x1 = f X / Z
  x2 = f (X - B) / Z
  Z = f B / (x1 - x2) = f B / d

where d = x1 - x2 is the disparity.
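The whole round trip on this slide fits in a few lines (toy numbers; the helper names are made up for illustration):

```python
def project_stereo(X, Z, f, B):
    """Image x-coordinates of a point at (X, ., Z) in the left and right cameras."""
    x1 = f * X / Z            # left camera at the origin
    x2 = f * (X - B) / Z      # right camera displaced by the baseline B
    return x1, x2

def depth_from_disparity(d, f, B):
    """Z = f B / d, with disparity d = x1 - x2."""
    return f * B / d

x1, x2 = project_stereo(X=1.0, Z=5.0, f=500.0, B=0.1)
d = x1 - x2                          # 100.0 - 90.0 = 10.0
Z = depth_from_disparity(d, f=500.0, B=0.1)   # 5.0: the depth is recovered
```

Note that Z is inversely proportional to d: depth resolution degrades for distant objects, where the disparity is small.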
Stereo: the general case
- The relation between the two cameras may not be a pure parallax translation.
- Then the epipolar constraint applies.
- By rectification, epipolar lines are aligned with scanlines.
(From "Epipolar Rectification" by Fusiello et al.)
Stereo: Disparity Estimation
- Search horizontally for the patch disparity, using e.g. the sum of squared differences (SSD).
(Teddy dataset, from http://cat.middlebury.edu/)
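A one-dimensional toy version of the SSD search along a scanline (illustrative only; real implementations compare 2-D patches and handle image borders):

```python
import numpy as np

def ssd_disparity(left, right, x, half, max_d):
    """Find the disparity d minimizing the SSD between the patch around x
    in the left scanline and the patch around x - d in the right scanline."""
    patch = left[x - half : x + half + 1]
    best_d, best_ssd = 0, np.inf
    for d in range(max_d + 1):
        cand = right[x - d - half : x - d + half + 1]
        ssd = np.sum((patch - cand) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d

# Synthetic scanlines: the right image is the left one shifted by 3 pixels.
left = np.arange(40.0)
right = np.empty_like(left)
right[:-3] = left[3:]
right[-3:] = left[-3:]
d = ssd_disparity(left, right, x=20, half=3, max_d=8)   # 3
```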
Stereo: Depth estimation
- A simple formula converts disparity d to distance Z when the inter-camera distance (baseline) is B:

  Z = f B / d

(Left: patch-based estimate. Right: ground truth.)
Stereo: Constraints
- Constraints (Marr and Poggio):
  - Each point in each image is assigned at most one disparity value.
  - The disparity varies smoothly at most locations in the images.
- However! Different regularizations may be applied to the depth function.
Stereo from Segmentation
- Alternative approach:
  - Make a segmentation of the image first.
  - Apply a linear model in each segmented region.
  - Refine the models in the regions.
(From "Segment-based Stereo Matching Using Graph Cuts" by Hong and Chen)
Large Scale 3D Maps (C3/SAAB)
(Courtesy of Petter Torle, C3 Technologies)
Large Scale 3D Maps (C3/SAAB)
Structured Light
- A light source helps the stereo algorithm to find matching points.
- Often used in industrial applications.
(From: http://mesh.brown.edu/3dpgp-2009/homework/hw2/hw2.html)
More Structured Light
- Microsoft Kinect, using infrared light.
http://www.youtube.com/watch?v=nvvqjxgykcu
Other Computer Vision Code
- OpenCV
  - Free to use
  - Supports IPP speedups
  - http://en.wikipedia.org/wiki/opencv
  - http://sourceforge.net/projects/opencvlibrary/
  - http://opencv.willowgarage.com/wiki/
- Intel Integrated Performance Primitives 6.0
  - http://www.intel.com/cd/software/products/asmo-na/eng/302910.htm
  - Commercial (but cheap)
  - Includes computer vision, signal processing, data compression, ...
Typical Exam Questions
- Project this object (points) using a pinhole camera.
- Can a geometric transformation compensate for lens distortions in general?
- Explain the parameters building up the projection matrix M:

  α u = M X = [ f  s  u0 ]   [ 1 0 0 0 ]   [ R   -Rt ]
              [ 0  g  v0 ] . [ 0 1 0 0 ] . [ 0^T   1 ] X
              [ 0  0  1  ]   [ 0 0 1 0 ]
Thank You!
Email questions to: anders.brun@it.uu.se