5LSH0 Advanced Topics Video & Analysis


Multiview 3D video / Outline
Advanced Topics Multimedia Video (5LSH0), Module 02: 3D Geometry, 3D Multiview Video Coding & Rendering
Peter H.N. de With, Sveta Zinger & Y. Morvan (p.h.n.de.with@tue.nl)

Outline:
- Camera geometry: intrinsic/extrinsic camera parameters, camera calibration
- 3D video coding & multiview rendering: 3D coding architecture concept, depth signals and depth estimation, 3D multiview video coding, 3D rendering (algorithm and artifact removal)

A. Projective geometry - Introduction
Sveta Zinger, Video Coding and Architectures Research group, TU/e (s.zinger@tue.nl)

Introduction (1)
Projective geometry is the branch of geometry dealing with the properties and invariants of geometric figures under projection. It serves as a mathematical framework for 3D multi-view imaging and 3D computer graphics:
- modeling the image-formation process
- image synthesis
- reconstruction of 3D objects from multiple images

Introduction (2), (3)

Euclidean geometry
Euclidean geometry is usually used to model lines, planes or points in 3D, yet two parallel rails intersect in the image plane at the vanishing point. Why do we need projective geometry?
- It is easier to model the intersection of parallel lines at infinity.
- The perspective scaling operation requires division in Euclidean geometry, which introduces non-linearity and is better avoided.
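The claim that parallel lines are easier to handle projectively can be checked with the standard 2D homogeneous representation, where a line ax + by + c = 0 is the vector (a, b, c) and the intersection of two lines is their cross product. This is an added numpy illustration (the specific lines are made up for the example), not part of the original slides:

```python
import numpy as np

# A 2D line ax + by + c = 0 is the homogeneous vector l = (a, b, c);
# the intersection of two lines is their cross product, and a point
# (x1, x2, x3) with x3 = 0 is a point at infinity (a direction).
l1 = np.array([1.0, -1.0, 0.0])   # line y = x
l2 = np.array([1.0, -1.0, -2.0])  # parallel line y = x - 2
p = np.cross(l1, l2)              # intersection point (homogeneous)
print(p)  # last coordinate is 0: the lines meet at a point at infinity
```

The resulting direction (2, 2, 0) is the common direction of slope 1, which is exactly the vanishing direction of the two rails.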

Homogeneous coordinates (1)
A point in Euclidean 3D space is defined by a 3-element vector (X, Y, Z)^T: inhomogeneous coordinates. In projective space it is defined by a 4-element vector (X_1, X_2, X_3, X_4)^T: homogeneous coordinates.

Homogeneous coordinates (2)
Inhomogeneous coordinates (X, Y, Z)^T and homogeneous coordinates (X_1, X_2, X_3, X_4)^T are related by

  X = X_1 / X_4,   Y = X_2 / X_4,   Z = X_3 / X_4,   where X_4 ≠ 0.

The mapping from n-dimensional Euclidean space to projective space is

  (X_1, X_2, ..., X_n)^T  ->  (λX_1, λX_2, ..., λX_n, λ)^T,

where λ ≠ 0 is a free scaling parameter, or homogeneous scaling parameter.

Pinhole camera model (1)

3D from multiple images: concept
A cloud of points is projected into multiple images from different viewpoints.
1. The coordinate frame is aligned with the camera center.
2. The image lies in the focal plane (between object and camera): the positive image.
3. This allows projection of a 3D object onto a 2D image.
Can we reverse the projection process and reconstruct the points?

3D from multiple images: some images from the input sequence

3D from multiple images: algorithm
http://www.cs.unc.edu/~marc/tutorial.pdf
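The relations X = X_1/X_4, Y = X_2/X_4, Z = X_3/X_4 can be sketched in a few lines of numpy (the helper names are our own, not from the slides); the loop checks that the free scaling parameter λ leaves the represented point unchanged:

```python
import numpy as np

def to_homogeneous(X):
    """Append a 1: (X, Y, Z) -> (X, Y, Z, 1)."""
    return np.append(X, 1.0)

def to_inhomogeneous(Xh):
    """Divide by the last coordinate: (X1, X2, X3, X4) -> (X1/X4, X2/X4, X3/X4)."""
    assert Xh[-1] != 0
    return Xh[:-1] / Xh[-1]

X = np.array([2.0, 4.0, 6.0])
Xh = to_homogeneous(X)
# Overall scaling is not important: lambda * Xh represents the same point.
for lam in (1.0, 0.5, -3.0):
    assert np.allclose(to_inhomogeneous(lam * Xh), X)
```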

3D from multiple images: result

Mod 02 Multiview 3D Geometry & Coding
B. Camera Geometry
PhD thesis of Ping Li, TUe VCA: http://vca.ele.tue.nl/people/pli_publ.html

Camera geometry / Concept (1)
To understand the 3D structure of objects and scenes, a relation between point coordinates in the 3D world and pixel positions is required. We have three coordinate systems: image, camera and world. Goal: map a point in 3D space (the world coordinate system) to the image plane (the image coordinate system).

Camera geometry / Concept (2)
World, camera and image coordinates are linked by a set of parameters known as intrinsic and extrinsic parameters.
- Intrinsic parameters: focal length, width and height of a pixel on the sensor, position of the principal point (the origin of the image coordinate system).
- Extrinsic parameters: camera position and camera orientation.

Camera geometry / Image formation
Remember the pinhole camera model: projection of a 3D point onto the image plane results in an image point.

Central projection using homogeneous coordinates
With homogeneous coordinates, the central projection can be written as a linear equation. The central projection maps 3D space to 2D space.
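As a minimal sketch of the central projection as a linear map (assuming a camera at the origin and an illustrative focal length f): the homogeneous form P = diag(f, f, 1)[I | 0], followed by one perspective division, reproduces the non-linear pinhole formulas (fX/Z, fY/Z):

```python
import numpy as np

f = 0.05  # focal length in metres (assumed value for illustration)
# Central projection as a linear map on homogeneous coordinates:
# x = P X with P = diag(f, f, 1) [I | 0]  (camera at the origin).
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]])

X = np.array([0.2, 0.1, 2.0, 1.0])  # 3D point in camera coordinates
x = P @ X                            # homogeneous image point
x = x[:2] / x[2]                     # perspective division
# Same result as the non-linear pinhole formulas (fX/Z, fY/Z):
assert np.allclose(x, [f * 0.2 / 2.0, f * 0.1 / 2.0])
```

The division is confined to one final dehomogenization step, which is exactly why the projective formulation keeps the mapping itself linear.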

Camera geometry / CCD camera
A more general conversion to pixel coordinates is required: the conversion from camera coordinates to pixel coordinates is obtained with the pixel width/height (hence fractions of a pixel), and the principal point offset is expressed in pixel units.

Extrinsic camera parameters
Extrinsic parameters define the orientation and location of the camera in the world coordinate system. This involves a Euclidean transform between world and camera coordinates:
- R is a 3x3 rotation matrix
- t is a 3D translation vector

Projective camera / Summary (1), (2)
We can finally map the 3D point to the image. Combining the camera matrix (intrinsic parameters) with the rotation/translation matrix (extrinsic parameters), we obtain the camera calibration matrix:

  pixel coordinates = [intrinsic parameters] [extrinsic parameters] x world coordinates

The resulting 3x4 projection matrix has 11 degrees of freedom (scaling invariance).

Camera calibration
Goal: estimate the coefficients of the camera calibration matrix. Once the camera calibration matrix parameters are known, the camera is calibrated.
Simple calibration algorithm: it is assumed that the world coordinates of points are known together with their corresponding pixel coordinates. The points are usually arranged in a special pattern for easy calibration.

Linear method for estimating matrix C (1)
World-point coordinates and image-pixel positions are linked through the camera calibration matrix C, the 3x4 projection matrix. The algorithm consists of two steps:
1. Compute matrix C from a set of known 3D positions and their respective positions in the image.
2. Estimate the extrinsic and intrinsic parameters from C.
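Composing the 3x4 projection matrix from intrinsic and extrinsic parameters can be sketched as follows (all numeric values are illustrative assumptions, not from the slides); the final assertion demonstrates the scaling invariance behind the 11 degrees of freedom:

```python
import numpy as np

# Intrinsics K (focal lengths and principal point in pixels) and a
# simple extrinsic pose [R | t]; all numbers are illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                     # camera aligned with the world axes
t = np.array([0.0, 0.0, 1.0])     # world origin 1 m in front of the camera
P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix

def project(P, Xw):
    x = P @ np.append(Xw, 1.0)   # homogeneous pixel coordinates
    return x[:2] / x[2]          # perspective division

Xw = np.array([0.1, -0.2, 1.0])
# Scaling invariance: P and 5*P give identical pixel coordinates,
# which is why the 3x4 matrix has only 11 degrees of freedom.
assert np.allclose(project(P, Xw), project(5.0 * P, Xw))
print(project(P, Xw))
```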

Linear method for estimating matrix C (2)
Use world-point coordinates and their corresponding pixel coordinates in the image to determine C. Each correspondence generates two linear equations.

Linear method for estimating matrix C (3)
Stack the equations into one equation system. The system has 12 unknown parameters, so at least 6 correspondence points are required. Typically more points are used; the equation system then becomes over-constrained and is solved using a least-squares minimization.

Application: back-projection of points to rays
Given a point x in an image, we determine the set of points in 3D space that map to this point. This ray is represented as the join of two points: the camera center C (the point for which PC = 0) and the point P^+ x, where P^+ is the pseudo-inverse of the projection matrix P. The ray is then formed by the join of these two points: X(λ) = P^+ x + λC.

Mod 02 Multiview 3D Geometry & Coding
C. Multiview 3D TV coding architecture

Introduction multiview coding
The presented work was initiated to support the development of video compression algorithms for 3D video systems. We present a 3D video system architecture based on new approaches for:
- depth estimation: acquisition of 3D content,
- new coding techniques for efficient storage and transmission, and
- rendering of 3D video.

MVC intro / Applications of 3D video
The MPEG community has a considerable interest in standardizing technologies for 3D and FTV applications, e.g.:
- 3D TV, enabling the perception of depth using a multi-view display,
- free-viewpoint video, which allows the viewer to interactively select a viewpoint of the scene.
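The linear least-squares estimation of C can be sketched as follows, here solved with an SVD on synthetic, noise-free correspondences (the ground-truth matrix and the random points are invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth projection matrix (illustrative values) and known world
# points with their pixel projections, as the calibration setup assumes.
P_true = np.array([[700.0,   0.0, 320.0, 10.0],
                   [  0.0, 700.0, 240.0, 20.0],
                   [  0.0,   0.0,   1.0,  2.0]])
Xw = rng.uniform(-1.0, 1.0, size=(8, 3))          # 8 points (> 6 minimum)
Xh = np.hstack([Xw, np.ones((8, 1))])             # homogeneous world points
x = (P_true @ Xh.T).T
uv = x[:, :2] / x[:, 2:3]                          # pixel coordinates

# Each correspondence yields two linear equations in the 12 entries of C.
rows = []
for (u, v), Xi in zip(uv, Xh):
    rows.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
    rows.append(np.concatenate([np.zeros(4), Xi, -v * Xi]))
A = np.array(rows)                                 # over-constrained system

# Least-squares solution of A c = 0: the right singular vector
# belonging to the smallest singular value.
_, _, Vt = np.linalg.svd(A)
C = Vt[-1].reshape(3, 4)

# C equals P_true up to the free overall scale.
C = C / C[2, 3] * P_true[2, 3]
assert np.allclose(C, P_true, atol=1e-4)
```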

3D video Representation Formats
Several 3D video representation formats are explored:
- the 1-texture + 1-depth format,
- the N-texture video format, and
- the N-texture + N-depth format, which was adopted in our work.

3D Video Coding Architecture
[Block diagram: (A) multi-view video acquisition with depth estimation, (B) multi-view video compression with H.264 multi-view texture and depth coders, a transmission network, and (C, D) decoding and rendering with H.264 multi-view texture and depth decoders followed by 3D rendering / view synthesis.]
The proposed 3D video system architecture is composed of:
- a depth-estimation sub-system (3D acquisition),
- an H.264 multi-view video coder,
- an H.264 multi-view depth video coder,
- a 3D-video rendering engine.

Mod 02 Multiview 3D Geometry & Coding
D. 3D Depth Estimation and 3D Rendering

Depth estimation using 2 views (1)
A popular method calculates depth for each scanline using a 1D optimization. Algorithm summary: for each scanline, calculate a table of matching costs, then optimize the matching-cost table using dynamic programming (cf. the Viterbi algorithm).

Depth estimation using 2 views (2)
[Figure: table of matching costs with the admitted depth path.]

Depth estimation using multiple views
To estimate accurate depth images, we propose two new constraints:
- Instead of estimating depth images pair-wise, we employ all views simultaneously.
- To avoid scanline artifacts, we employ an inter-scanline cost that enforces smooth variations of depth (smoothness constraint).
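The per-scanline cost table plus dynamic programming can be sketched on a toy 1D example (an illustration with an assumed L1 smoothness term, not the actual implementation from the thesis):

```python
import numpy as np

def scanline_disparity(left, right, max_d, smooth=1.0):
    """Minimal 1D dynamic-programming sketch: per-pixel matching costs
    plus a smoothness penalty on disparity changes along the scanline."""
    n = len(left)
    # Table of matching costs: cost[x, d] = |left[x] - right[x - d]|.
    cost = np.full((n, max_d + 1), np.inf)
    for x in range(n):
        for d in range(min(x, max_d) + 1):
            cost[x, d] = abs(left[x] - right[x - d])
    # Viterbi-style optimization over the cost table.
    acc = cost.copy()
    back = np.zeros((n, max_d + 1), dtype=int)
    for x in range(1, n):
        for d in range(max_d + 1):
            prev = acc[x - 1] + smooth * np.abs(np.arange(max_d + 1) - d)
            back[x, d] = int(np.argmin(prev))
            acc[x, d] += prev[back[x, d]]
    # Backtrack the admitted disparity path.
    disp = np.zeros(n, dtype=int)
    disp[-1] = int(np.argmin(acc[-1]))
    for x in range(n - 1, 0, -1):
        disp[x - 1] = back[x, disp[x]]
    return disp

# Toy scanlines: the right view is the left view shifted by 2 pixels.
left = np.array([0, 0, 0, 9, 9, 9, 0, 0, 0, 0], dtype=float)
right = np.roll(left, -2)
disp = scanline_disparity(left, right, max_d=3)
print(disp)
```

The smoothness penalty is what suppresses the per-pixel ambiguity in the flat regions; without it, many zero-cost disparities would be equally admissible.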

Depth estimation using multiple views (results)

Rendering of multi-view images (1)
A pixel position in the left view can be predicted from its corresponding position in the right view using the image warping equation; this requires the calibration parameters. Disadvantage: it generates holes in the rendered image, i.e. occluded pixels.

Rendering of multi-view images (2)
Relief texture mapping factorizes the image warping equation into a pre-warping and a post-warping step. The advantages of relief texture are that:
- the pre-warping step performs a horizontal and vertical pixel shift combined with pixel re-sampling, thus resolving occluded pixels,
- the post-warping equation corresponds to a planar texture-mapping operation, and is thus efficiently implemented on a GPU.

View Rendering Example: Original / Rendered

3D from multiple images: video demonstration
This video (http://www.cs.unc.edu/~marc/) shows the input image sequence, the reconstructed camera positions and cloud of points, the obtained depth map, and the texture mapped onto the depth.
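The hole problem of image warping can be illustrated with a toy 1D warp: each pixel is shifted horizontally by its disparity, with a z-buffer-like test so that foreground (larger-disparity) pixels win. This is a deliberate simplification; the real warping equation uses the calibration parameters and depth, not a raw per-pixel shift:

```python
import numpy as np

# One scanline with a bright foreground object over a dark background.
src = np.array([10, 10, 10, 50, 50, 10, 10, 10], dtype=float)
disp = np.array([0, 0, 0, 2, 2, 0, 0, 0])  # foreground shifts by 2 px

HOLE = -1.0
dst = np.full(len(src), HOLE)
best_d = np.full(len(src), -1)
for x in range(len(src)):
    xp = x + disp[x]
    if 0 <= xp < len(dst) and disp[x] > best_d[xp]:
        best_d[xp] = disp[x]   # z-buffer on disparity: closer pixel wins
        dst[xp] = src[x]

holes = np.where(dst == HOLE)[0]  # disoccluded pixels behind the object
print(dst, holes)
```

The positions the foreground vacated receive no source pixel at all; these are exactly the occlusion holes that the pre-warping/inpainting steps must resolve.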

View rendering: Free-Viewpoint (FVP)
[Figure: reference image -> 3D warping -> virtual image.]

Challenges of FVP: cracks due to image sampling
Possible solution: median filtering.

Challenges of FVP: poorly defined borders lead to contour artifacts
Possible solution: label edges and delete them after warping.

Challenges of FVP: disocclusions
Possible solution: inpainting, i.e. filling in the disoccluded pixels with background texture information.

View rendering: free-viewpoint result

References
- Y. Morvan, Acquisition, Compression and Rendering of Depth and Texture for Multi-view Video, Ph.D. thesis, Eindhoven University of Technology, 2009.
- http://mathworld.wolfram.com/projectivegeometry.html
- PhD thesis research of Luat Do, TUe VCA: http://vca.ele.tue.nl/people/ldo_publ.html
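The median-filtering fix for cracks can be sketched as follows: isolated hole pixels are filled with the median of their valid 3x3 neighbours, while larger disocclusions are left for inpainting (a simplified illustration, not the thesis implementation):

```python
import numpy as np

def fill_cracks(img, hole=-1.0):
    """Fill isolated hole pixels ("cracks") with the median of their
    valid 3x3 neighbours; larger disocclusions are left untouched."""
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if img[y, x] != hole:
                continue
            ys = slice(max(0, y - 1), y + 2)
            xs = slice(max(0, x - 1), x + 2)
            neigh = img[ys, xs]
            valid = neigh[neigh != hole]
            if valid.size >= 5:  # mostly surrounded: a crack, not a disocclusion
                out[y, x] = np.median(valid)
    return out

img = np.full((5, 5), 7.0)
img[2, 2] = -1.0          # one-pixel crack from point-based warping
filled = fill_cracks(img)
assert filled[2, 2] == 7.0
```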

Mod 02 Multiview 3D Geometry & Coding
E. 3D Coding of multiview images

Predictive coding of multiview images
For efficient transmission, independent compression of correlated camera views should be avoided. One predictive-coding algorithm investigated is based on an image-rendering technique: the idea is to render the image as seen by the predicted camera using Depth Image Based Rendering (DIBR).

Predictive coding of depth images
Reminder: the adopted video format is N-texture + N-depth. Coding of multi-view depth images: render the depth image at the position of the predicted camera.

Including view rendering in H.264 coding
For the compression of multiple views, we have integrated the view-rendering algorithm into an H.264 encoder. A camera view (texture or depth) is predicted using either
- the central reference camera, or
- the synthetic rendered view.

Coding structure for random access
The coding structure defines which view can be employed as reference (predictor) for compression. For free-viewpoint video, the coding structure should allow random access to an arbitrary view.

Coding Results of the Breakdancers sequence (1)
Texture is temporally stable: motion estimation (ME) slightly outperforms view-synthesis prediction, at the loss of random access. [Figures: simulcast coding structure vs. coding structure with random access.] NB: there is a trade-off between coding efficiency and random access! The proposed multi-view coding system provides:
- random access to arbitrary views,
- coding performance similar to simulcast coding.
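The choice between the central reference camera and the synthetic rendered view can be sketched as a per-block mode decision on the residual cost. This is a hypothetical helper using SAD for illustration only; the actual encoder integration inside H.264 is more involved:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between a block and its prediction."""
    return float(np.abs(a - b).sum())

def choose_predictor(block, central_pred, synth_pred):
    """Pick the predictor (central camera vs. DIBR-rendered synthetic
    view) that gives the smaller residual. Hypothetical sketch."""
    c = sad(block, central_pred)
    s = sad(block, synth_pred)
    return ("central", c) if c <= s else ("synthetic", s)

block   = np.array([[10.0, 12.0], [11.0, 13.0]])
central = np.array([[30.0, 30.0], [30.0, 30.0]])  # poor match
synth   = np.array([[10.0, 12.0], [11.0, 14.0]])  # rendered view: close match
mode, cost = choose_predictor(block, central, synth)
assert mode == "synthetic" and cost == 1.0
```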

Coding results of the Breakdancers sequence (2)
Depth is not temporally stable: ME does not work well here.

Multiview Coding / Conclusions
The presented 3D video processing and coding system:
- performs accurate depth estimation by employing multiple views simultaneously,
- renders high-quality images by appropriately handling occluded pixels,
- achieves efficient compression by exploiting inter-view redundancy for the texture and depth images.
The coding system relies on H.264 and thus allows a gradual introduction of cost-efficient 3D systems.