Multiview 3D Video
Advanced Topics Multimedia Video (5LSH0), Module 02
3D Geometry, 3D Multiview Video Coding & Rendering
Peter H.N. de With, Sveta Zinger & Y. Morvan (p.h.n.de.with@tue.nl)

Outline
- Camera geometry: intrinsic/extrinsic camera parameters, camera calibration
- 3D video coding & multiview rendering: 3D coding architecture concept, depth signals and depth estimation, 3D multiview video coding, 3D rendering (algorithm and artifact removal)

A. Projective geometry - Introduction
Sveta Zinger, Video Coding and Architectures Research group, TU/e (s.zinger@tue.nl)

Introduction (1)
Projective geometry is the branch of geometry dealing with the properties and invariants of geometric figures under projection. It serves as a mathematical framework for 3D multi-view imaging and 3D computer graphics:
- modeling of the image formation process
- image synthesis
- reconstruction of 3D objects from multiple images

Introduction (2) [figure slide]
Introduction (3) [figure slide]

Euclidean geometry
Euclidean geometry is usually used to model lines, planes or points in 3D. Yet two parallel rails intersect in the image plane at the vanishing point. Why do we need projective geometry?
- it is easier to model the intersection of parallel lines at infinity
- the perspective scaling operation requires division in Euclidean geometry => non-linearity => better to avoid
Homogeneous coordinates (1)
- A point in Euclidean space is defined by a 3-element vector (X, Y, Z)^T: inhomogeneous coordinates.
- A point in projective space is defined by a 4-element vector (X1, X2, X3, X4)^T: homogeneous coordinates.

Homogeneous coordinates (2)
Inhomogeneous coordinates (X, Y, Z)^T and homogeneous coordinates (X1, X2, X3, X4)^T are related by
  X = X1/X4,  Y = X2/X4,  Z = X3/X4,  where X4 ≠ 0.
The mapping from n-dimensional Euclidean space to projective space is
  (X1, X2, ..., Xn)^T  ->  (λX1, λX2, ..., λXn, λ)^T,
where λ ≠ 0 is a free scaling parameter (the homogeneous scaling parameter).

Pinhole camera model (1)
1. The coordinate frame is aligned with the camera center.
2. The image lies in the focal plane (between object and camera): a positive image.
3. This allows projection of a 3D object onto a 2D image.

3D from multiple images: concept
A cloud of points is projected into multiple images from different viewpoints. Can we reverse the projection process and reconstruct the points?

3D from multiple images: some images from the input sequence [figure slide]

3D from multiple images: algorithm
http://www.cs.unc.edu/~marc/tutorial.pdf
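The relation between homogeneous and inhomogeneous coordinates above can be sketched in a few lines of NumPy. The point values and the scaling parameter `lam` (λ) below are illustrative choices, not from the slides:

```python
import numpy as np

def to_homogeneous(p, lam=1.0):
    """Map an n-D Euclidean point to homogeneous coordinates:
    (X1, ..., Xn)^T -> (lam*X1, ..., lam*Xn, lam)^T, lam != 0."""
    p = np.asarray(p, dtype=float)
    return np.append(lam * p, lam)

def to_inhomogeneous(x):
    """Map homogeneous coordinates back to Euclidean coordinates
    by dividing by the last component (which must be nonzero)."""
    x = np.asarray(x, dtype=float)
    assert x[-1] != 0, "point at infinity has no Euclidean equivalent"
    return x[:-1] / x[-1]

P = np.array([2.0, 4.0, 6.0])
h1 = to_homogeneous(P)            # (2, 4, 6, 1)
h2 = to_homogeneous(P, lam=3.0)   # (6, 12, 18, 3): the same projective point
print(to_inhomogeneous(h1), to_inhomogeneous(h2))
```

Note that h1 and h2 differ only by the free scale λ and therefore map back to the same Euclidean point, which is exactly the scaling invariance the slides rely on.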
3D from multiple images: result [figure slide]

Mod 02 Multiview 3D Geometry & Coding
B. Camera Geometry
PhD thesis of Ping Li, TU/e VCA: http://vca.ele.tue.nl/people/pli_publ.html

Camera geometry / Concept (1)
To understand the 3D structure of objects and scenes, a relation between point coordinates in the 3D world and pixel positions is required. We have three coordinate systems: image, camera and world. Goal: map a point in 3D space (the world coordinate system) to the image plane (the image coordinate system).

Camera geometry / Concept (2)
World, camera and image coordinates are linked by a set of parameters known as intrinsic and extrinsic parameters.
- Intrinsic parameters: focal length, width and height of a pixel on the sensor, position of the principal point (origin of the image coordinate system).
- Extrinsic parameters: camera position and camera orientation.

Camera geometry / Image formation
Remember the pinhole camera model: projection of a 3D point onto the image plane results in an image point.

Central projection using homogeneous coordinates
With homogeneous coordinates, the central projection can be written as a linear equation. The central projection maps 3D space to 2D space.
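The central projection as a linear map can be sketched as follows; the focal length and point values are illustrative assumptions:

```python
import numpy as np

f = 1.5  # illustrative focal length

# 3x4 central-projection matrix: maps homogeneous 3D points to
# homogeneous 2D image points in one linear operation.
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

X = np.array([2.0, 4.0, 10.0, 1.0])  # homogeneous 3D point (X, Y, Z, 1)
x = P @ X                            # homogeneous image point
u, v = x[:2] / x[2]                  # divide out the homogeneous scale
print(u, v)                          # (f*X/Z, f*Y/Z) = (0.3, 0.6)
```

This is why homogeneous coordinates pay off: the division by Z that makes perspective non-linear in Euclidean coordinates is deferred to a single normalization step at the end.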
Camera geometry / CCD camera
A CCD camera requires a more general conversion to pixel coordinates: the conversion from camera coordinates to pixel coordinates involves the pixel width/height on the sensor, hence fractions of pixels, and the principal point offset is expressed in pixel units.

Extrinsic camera parameters
Extrinsic parameters define the orientation and location of the camera in the world coordinate system. This involves a Euclidean transform between world and camera coordinates:
- R is a 3x3 rotation matrix
- t is a 3D translation vector

Projective camera / Summary (1) & (2)
We can finally map a 3D point to the image. Combining the camera matrix (intrinsic parameters) and the rotation/translation matrix (extrinsic parameters), we obtain the camera calibration matrix:
  pixel coordinates = [intrinsic parameters][extrinsic parameters] x world coordinates.
The resulting 3x4 projection matrix has 11 degrees of freedom (scaling invariance).

Camera calibration
Goal: estimate the coefficients of the camera calibration matrix. Once the camera calibration matrix parameters are known, the camera is calibrated.
Simple calibration algorithm: it is assumed that the world coordinates of points are known, together with their corresponding pixel coordinates. The points are usually arranged in a special pattern for easy calibration.

Linear method for estimating matrix C (1)
World-point coordinates and image-pixel positions are linked by the camera calibration matrix C, the 3x4 projection matrix. The algorithm consists of two steps:
1. Compute matrix C from a set of known 3D positions and their respective positions in the image.
2. Estimate the extrinsic and intrinsic parameters from C.
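The full projective camera described above, x ~ K [R | t] X, can be sketched as follows. The intrinsics K (focal lengths in pixels, principal point) and extrinsics R, t below are assumed example values:

```python
import numpy as np

# Assumed intrinsic parameters: focal lengths in pixel units and
# the principal point offset (u0, v0) in pixel units.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Assumed extrinsic parameters: camera aligned with the world axes,
# with the world origin 5 units in front of the camera.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# Camera calibration matrix: intrinsics times extrinsics, a 3x4 matrix.
P = K @ np.hstack([R, t.reshape(3, 1)])

Xw = np.array([0.5, -0.25, 5.0, 1.0])   # homogeneous world point
x = P @ Xw
u, v = x[:2] / x[2]                     # pixel coordinates
print(u, v)                             # (360.0, 220.0)
```

The camera depth is Z + 5 = 10, so u = 800 * 0.5 / 10 + 320 = 360 and v = 800 * (-0.25) / 10 + 240 = 220, matching the printed result.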
Linear method for estimating matrix C (2)
Use world-point coordinates and their corresponding pixel coordinates in the image to determine C. Each correspondence generates two linear equations.

Linear method for estimating matrix C (3)
Stack the equations into one equation system. The system has 12 unknown parameters, so at least 6 correspondence points are required. Typically, more points are used and the equation system becomes over-constrained; it is then solved using a least-squares minimization.

Application: back-projection of points to rays
Given a point x in an image, we determine the set of points in 3D space that map to this point. This ray is represented as the join of two points: the camera center and the point P⁺x, where P⁺ is the pseudo-inverse of the projection matrix P. The ray is then formed by the join of these two points.

C. Multiview 3D TV coding architecture

Introduction multiview coding
The presented work was initiated to support the development of video compression algorithms for 3D video systems. We present a 3D video system architecture based on new approaches for:
- depth estimation: acquisition of 3D content,
- new coding techniques for efficient storage and transmission, and
- rendering of 3D video.

MVC intro / Applications of 3D video
The MPEG community has a considerable interest in standardizing technologies for 3D and FTV applications, e.g.
- 3D TV, enabling the perception of depth using a multi-view display,
- free-viewpoint video, which allows the viewer to interactively select a viewpoint of the scene.
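The two-equations-per-correspondence construction and the least-squares solve can be sketched as below. The ground-truth camera and point coordinates are synthetic values chosen only to exercise the method; the solve uses the standard SVD trick (the null-space direction of the stacked system):

```python
import numpy as np

def estimate_camera_matrix(world_pts, image_pts):
    """Linear estimation of the 3x4 camera matrix C.
    world_pts: (N, 3), image_pts: (N, 2), N >= 6.
    Each correspondence (X, Y, Z) <-> (u, v) contributes two rows."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(A, dtype=float)
    # Least-squares solution of A c = 0 (up to scale): the right singular
    # vector belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def project(C, X):
    x = C @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic ground-truth camera and 8 non-coplanar calibration points.
C_true = np.array([[700.0,   0.0, 320.0, 100.0],
                   [  0.0, 700.0, 240.0, -50.0],
                   [  0.0,   0.0,   1.0,   4.0]])
world = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                  [1, 1, 0], [1, 0, 1], [0, 1, 1], [2, 1, 1]], dtype=float)
image = np.array([project(C_true, X) for X in world])

C_est = estimate_camera_matrix(world, image)
C_est /= C_est[-1, -1]   # fix the free scale before comparing
print(np.allclose(C_est, C_true / C_true[-1, -1], atol=1e-6))
```

With noise-free correspondences the estimate matches the ground truth up to the free scale; with real measurements the over-constrained system is solved in the least-squares sense, as the slide states.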
3D Video Representation Formats
Several 3D video representation formats are explored:
- the 1-texture + 1-depth format,
- the N-texture video format, and
- the N-texture + N-depth format, which was adopted in our work.

3D Video Coding Architecture
The proposed 3D video system architecture is composed of:
- a depth-estimation sub-system (3D acquisition),
- an H.264 multi-view texture coder and an H.264 multi-view depth coder,
- the corresponding H.264 multi-view texture and depth decoders, and
- a 3D-video rendering engine (view synthesis).
The pipeline spans multi-view video acquisition, multi-view video compression, transmission over the network, and decoding and rendering.

D. 3D Depth Estimation and 3D Rendering

Depth estimation using 2 views (1)
A popular method calculates depth for each scanline using a 1D optimization. Algorithm summary: for each scanline, calculate a table of matching costs; then optimize the matching-cost table using dynamic programming (cf. Viterbi).

Depth estimation using 2 views (2) [figure slide: table of matching costs, admitted depth]

Depth estimation using multiple views
To estimate accurate depth images, we propose two new constraints:
- instead of estimating depth images pair-wise, we employ all views simultaneously;
- to avoid scanline artifacts, we employ an inter-scanline cost that enforces smooth variations of depth (smoothness constraint).
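The scanline method above (matching-cost table plus a Viterbi-style dynamic program) can be sketched for disparity estimation on a single scanline. The absolute-difference matching cost, the linear smoothness penalty and all signal values are illustrative assumptions, not the thesis's exact cost functions:

```python
import numpy as np

def scanline_disparity(left, right, max_disp, smooth=0.5):
    """Per-pixel disparity for one scanline pair via dynamic programming.
    States = candidate disparities; transitions pay smooth * |d - d'|."""
    n = len(left)
    # Matching-cost table: cost[x, d] = |left[x] - right[x - d]|.
    cost = np.full((n, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        cost[d:, d] = np.abs(left[d:] - right[:n - d])
    # Forward (Viterbi) pass: accumulate cost with the smoothness penalty.
    acc = cost.copy()
    back = np.zeros((n, max_disp + 1), dtype=int)
    d_range = np.arange(max_disp + 1)
    for x in range(1, n):
        trans = acc[x - 1][None, :] + smooth * np.abs(d_range[:, None] - d_range[None, :])
        back[x] = np.argmin(trans, axis=1)       # best predecessor per disparity
        acc[x] += trans[d_range, back[x]]
    # Backtrack the cheapest disparity path.
    disp = np.zeros(n, dtype=int)
    disp[-1] = int(np.argmin(acc[-1]))
    for x in range(n - 1, 0, -1):
        disp[x - 1] = back[x, disp[x]]
    return disp

# Synthetic scanlines: the left view is the right view shifted by 2 pixels.
right = np.array([0, 0, 0, 5, 9, 5, 0, 0, 0, 0], dtype=float)
left = np.roll(right, 2)
print(scanline_disparity(left, right, max_disp=3))
```

On the textured middle region the recovered disparity is 2, the true shift; the smoothness term is exactly the inter-scanline idea applied along one line, discouraging the isolated disparity jumps that cause scanline artifacts.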
Depth estimation using multiple views [result figure]

Rendering of multi-view images (1)
A pixel position in the left view can be predicted from its corresponding position in the right view using the image warping equation. This requires the calibration parameters. Disadvantage: it generates holes in the rendered image, i.e. occluded pixels.

Rendering of multi-view images (2)
Relief texture mapping factorizes the image warping equation. The advantages of relief texture are that:
- the pre-warping step performs a horizontal and vertical pixel shift combined with pixel re-sampling, thus resolving occluded pixels;
- the post-warping equation corresponds to a planar texture mapping operation, and can thus be efficiently implemented on a GPU.

View Rendering Example - Original [figure slide]
View Rendering Example - Rendered [figure slide]

3D from multiple images: video demonstration
This video shows: the input image sequence, the reconstructed camera positions, the obtained cloud of points, the depth map, and the texture mapped onto the depth. http://www.cs.unc.edu/~marc/
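The hole-generating behavior of image warping can be sketched for the simplest case: a rectified pair, where each pixel shifts horizontally by a disparity derived from its depth (disparity = f * B / Z). The focal length, baseline, and the -1 hole marker below are assumptions for illustration:

```python
import numpy as np

def warp_scanline(texture, depth, f=40.0, baseline=0.1):
    """Warp one texture scanline to a neighboring viewpoint using per-pixel
    depth. Positions nothing maps to (disocclusions) stay marked as -1."""
    n = len(texture)
    out = np.full(n, -1.0)                               # -1 marks holes
    disparity = np.round(f * baseline / depth).astype(int)
    # Warp far-to-near so nearer pixels overwrite farther ones (occlusion).
    order = np.argsort(-depth)
    for x in order:
        xw = x - disparity[x]
        if 0 <= xw < n:
            out[xw] = texture[x]
    return out

texture = np.array([10., 20., 30., 40., 50., 60.])
depth   = np.array([10., 10., 2.5, 2.5, 10., 10.])   # middle pixels are near
print(warp_scanline(texture, depth))                  # [30. 40. -1. -1. 50. 60.]
```

The near pixels shift by 2 and overwrite the background, and the two positions they vacated remain holes: exactly the occluded pixels the slide mentions, which relief texture mapping (or inpainting, below) must resolve.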
View rendering: Free-ViewPoint (FVP)
Reference image -> 3D warping -> virtual image.

Challenges of FVP: cracks due to image sampling
Possible solution: median filtering.

Challenges of FVP: poorly defined borders => contour artifacts
Possible solution: label edges and delete them after warping.

Challenges of FVP: disocclusions => inpainting
Possible solution: fill in the disoccluded pixels with background texture information.

View rendering: free-viewpoint result [figure slide]

References
- Y. Morvan, "Acquisition, Compression and Rendering of Depth and Texture for Multi-view Video," Ph.D. thesis, Eindhoven University of Technology, 2009.
- http://mathworld.wolfram.com/projectivegeometry.html
- PhD thesis research of Luat Do, TU/e VCA: http://vca.ele.tue.nl/people/ldo_publ.html
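The median-filtering fix for sampling cracks can be sketched as follows: isolated crack pixels (marked -1 by the warping step, an assumed convention) are replaced by the median of their valid 3x3 neighbors, while larger disoccluded regions are left for inpainting:

```python
import numpy as np

def fill_cracks(img, hole=-1):
    """Fill 1-pixel cracks with the median of valid 3x3 neighbors.
    `hole` is the assumed marker value written by the warping step."""
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if img[y, x] == hole:
                patch = img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
                valid = patch[patch != hole]
                if valid.size:               # skip fully disoccluded regions
                    out[y, x] = np.median(valid)
    return out

img = np.full((5, 5), 8.0)
img[2, 2] = -1                   # a one-pixel sampling crack
print(fill_cracks(img)[2, 2])    # 8.0: the crack is filled from its neighbors
```

A median (rather than a mean) is the natural choice here: it ignores outliers, so a crack next to a depth discontinuity is filled from the dominant side instead of being blurred across it.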
E. 3D Coding of multiview images

Predictive coding of multiview images
For efficient transmission, independent compression of correlated camera views should be avoided. One predictive-coding algorithm investigated is based on an image rendering technique: the idea is to render the image as seen by the predicted camera using Depth Image Based Rendering (DIBR).

Predictive coding of depth images
Reminder: the adopted video format is N-texture + N-depth. Coding of multi-view depth images: render the depth image at the position of the predicted camera.

Including view rendering in H.264 coding
For the compression of multiple views, we have integrated the view-rendering algorithm into an H.264 encoder. A camera view (texture or depth) is predicted using either
- the central reference camera view, or
- the synthetic rendered view.

Coding structure for random access
The coding structure defines which view can be employed as a reference (predictor) for compression. For free-viewpoint video, the coding structure should allow random access to an arbitrary view. NB: there is a trade-off between coding efficiency and random access!

Coding results of Breakdancers seq. (1)
Texture is temporally stable: motion estimation (ME) slightly outperforms view-synthesis prediction, at the loss of random access. [Plots: simulcast coding structure vs. coding structure with random access.]
The proposed multi-view coding system provides:
- random access to arbitrary views,
- coding performance similar to simulcast coding.
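The encoder's choice between the two predictors can be sketched per block; picking the candidate with the lower SAD is a simplified stand-in for a real H.264 rate-distortion decision, and all pixel values are synthetic:

```python
import numpy as np

def choose_predictor(block, central, synthetic):
    """Pick the predictor (central camera view vs. DIBR-rendered view)
    with the lower sum of absolute differences (SAD) for this block."""
    sad_central = np.abs(block - central).sum()
    sad_synthetic = np.abs(block - synthetic).sum()
    if sad_central <= sad_synthetic:
        return "central", central
    return "synthetic", synthetic

block     = np.array([[50., 52.], [49., 51.]])   # block to be coded
central   = np.array([[40., 41.], [40., 42.]])   # central reference view
synthetic = np.array([[50., 51.], [50., 51.]])   # rendered (DIBR) prediction

mode, pred = choose_predictor(block, central, synthetic)
residual = block - pred          # only this residual needs to be encoded
print(mode)                      # synthetic
```

Here the rendered view is the better match, so the residual to encode is small; this per-block switching is what lets the scheme exploit inter-view redundancy without committing to one predictor for the whole view.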
Coding results of Breakdancers seq. (2)
Depth is not temporally stable: motion estimation (ME) does not work well for depth.

Multiview Coding / Conclusions
The presented 3D video processing and coding system:
- performs accurate depth estimation by employing multiple views simultaneously,
- renders high-quality images by appropriately handling occluded pixels, and
- achieves efficient compression by exploiting inter-view redundancy for the texture and depth images.
The coding system relies on H.264 and thus allows a gradual introduction of cost-efficient 3D systems.