
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND ASSOCIATED AUDIO

ISO/IEC JTC1/SC29/WG11 MPEG 95/ July 1995

TITLE: A Multi-Viewpoint Digital Video Codec utilizing Computer Graphics Tools & Image Warping Tech. based on Perspective Constructions of Intermediate Viewpoint Images

PURPOSE: An Efficient MPEG-2 Compatible Coding Scheme for Multiview Videos

SOURCE: Belle L. Tseng and Dimitris Anastassiou, Columbia University

1 MULTIVIEW PROFILE PROPOSAL

A multiview video is a 3D extension of the traditional movie sequence, in that there are multiple perspectives of the same scene at any time instance. Comparable to a movie made from a sequence of holograms, a multiview video offers a similar look-around capability. An ideal multiview system allows any user to watch a true 3D stereoscopic sequence from any perspective the viewer chooses. Such a system has practical uses in interactive applications, medical surgery technologies, educational and training demonstrations, and remote sensing developments, and is a step towards virtual reality.

We propose an efficient video coding scheme to accommodate transmission of multiple viewpoint images at any one instance in time. With the goals of compression and speed, a novel approach is presented that incorporates a variety of existing tools and techniques. Construction of each viewpoint image is predicted using a combination of perspective projection of 3D models, 3D texture mapping, and image warping. Foreseeable uses for this coding technique include all applications requiring multi-viewpoint video transmission over bandwidth-limited channels.

With the development of digital video technology, an ISO standardization of a video compression codec (MPEG-2) has recently been achieved. The coding standard is specified for one sequence of video, but has also recently been shown to be applicable to two sequences of stereoscopic signals. Extending the number of viewpoint videos beyond two views thus requires an intelligent and novel extension to the current MPEG-2 specification. Under the block-based constraint of MPEG-2, one disparity vector corresponding to each block of an image can be incorporated and transmitted as a motion vector. However, for acceptable predictions of multi-viewpoint images, accurate disparity values for every pixel are required for a more continuous interpolation of intermediate viewpoint images. Thus a disparity/depth map is required for every pixel, whose values can be transmitted as the gray-level intensity of a second "image". Also, since the majority of the depth image is quite flat, the depth map can be considerably compressed. Finally, a third signal is transmitted, consisting of the residual prediction error used for improving the construction of a selected viewpoint image.

The multiview profile will supplement the main profile of MPEG-2, so one viewpoint is transmitted in the main profile bitstream. In our case, the central viewpoint sequence is transmitted in the main profile. Let N = number of viewpoint sequences, where the minimum number of viewpoints is three, N >= 3. The number of viewpoints must be extendable in the future; since the number of viewpoints is an issue in itself, accommodation of additional viewpoints should be incorporated as an essential feature.
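Since the per-pixel depth map is transmitted as the gray-level intensity of a second "image", it can pass through the same 2D coding tools as the texture. A minimal sketch of that packing, assuming a floating-point depth buffer and a fixed working range (the function names, 8-bit precision, and linear quantization rule are illustrative assumptions, not part of the proposal):

```python
import numpy as np

def depth_to_gray(depth, z_near=1.0, z_far=100.0):
    """Quantize a float depth map to an 8-bit grayscale 'image' so a
    block-based 2D video coder can compress it alongside the texture."""
    d = np.clip(depth, z_near, z_far)
    gray = 255.0 * (d - z_near) / (z_far - z_near)
    return np.round(gray).astype(np.uint8)

def gray_to_depth(gray, z_near=1.0, z_far=100.0):
    """Invert the packing at the decoder (up to quantization error)."""
    return z_near + (gray.astype(np.float32) / 255.0) * (z_far - z_near)
```

Because most of the depth image is flat, as noted above, the quantized map contains large uniform regions and compresses well under block-based coding.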

One central image is designated to be the principal view from which the other multiviews are predicted. The central image is denoted I_Center, from viewpoint V_Center. The central viewpoint, usually the middle viewpoint, is chosen so that its image has the highest collection of overlapping objects with each of the other viewpoint images. In this manner, the central viewpoint image can be used to interpolate and predict most of the other multiviews. The other multiview images are designated I_X, from viewpoints V_X. The minimum number of viewpoints for the proposed system is three. An example of four additional viewpoint positions is illustrated, where V_X = V_Left, V_Right, V_Top, V_Bottom, with respective images I_X = I_Left, I_Right, I_Top, I_Bottom. The camera-captured images available at the encoder are referred to as real multiviews, whereas images not directly taken by a camera, but derived by prediction or interpolation methods, are called virtual multiviews. Virtual pictures also include those viewpoint images in between two real cameras; having the virtual constructions allows the viewer to see a smoother video transition between two real views.

The image coordinate system corresponding to each viewpoint is defined as (X_i, Y_i), where the viewpoint index i = Center, Left, Right, Top, Bottom. Let the global rectangular coordinate system (X, Y, Z) be defined as corresponding to the image coordinate system (X_C, Y_C) of the central viewpoint, where Z defines the orthogonal axis from the central image plane. The viewpoint vector for view index i is denoted V_i = [vx_i, vy_i, vz_i, va_i, vb_i, vc_i]. The camera position is represented by the coordinates (vx_i, vy_i, vz_i), and the camera zooming parameter is described by va_i. In addition, the camera rotations are given by the horizontal panning angle vb_i and the vertical tilting angle vc_i. Thus the central viewpoint is V_C = [0, 0, 0, 1, 0, 0].

Starting with the central viewpoint image I_C^t(X_C, Y_C) at time t, a depth value is calculated for every pixel point, forming a depth map image, named D_C^t, corresponding to the central image. A number of methods can be used to obtain the depth values, including depth calculation from defocused images, disparity estimation using the gradient approach between other viewpoint images, block correspondence matching between multiviews, and so on. Consequently, for every image point I_C^t(x_C, y_C) there is a corresponding depth value z_C = D_C^t(x_C, y_C). The set of 3D coordinate information (x_C, y_C, z_C) spanning the central image is thus similar to a 3D geometrical surface model, with a corresponding texture map represented by the intensity values of I_C^t(x_C, y_C). Next, a generic mesh representation is obtained whose vertices correspond to the pixels of the central image, generating a 3D surface model of the scene. This geometric representation offers ease in interpolating different viewpoints. As in rendering approaches in computer graphics, given a 3D geometrical model and its associated texture intensity, every viewpoint can be effectively and efficiently constructed by simple geometric transformations followed by 3D texture mapping. Furthermore, interpolating virtual viewpoint images is facilitated and made feasible. Utilizing computer graphics capabilities, currently available in hardware for speed purposes, texture mapping is the most appropriate and efficient method for rendering an image of a 3D surface structure.

In texture mapping, textured images are mapped onto corresponding geometric meshed surfaces. In addition, texture mapping incorporates the foreshortening effect of surface curvatures. The texture is a 3D function of its position in the object, and therefore the central viewpoint image represents a 3D texture function. Furthermore, other tools from computer graphics can be incorporated for a better overall construction. For example, if the light source can be extracted from the available viewpoint images, illumination and shading can be included to improve the predictions.

As the channel bandwidth increases, the quality of the decoded multiview videos must also improve. Our approach is to increase the number of transmitted prediction error images at any one time t, each associated with an additional non-central viewpoint image construction. Consequently, perfect reconstruction is always achievable with unlimited bandwidth.
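The prediction described above amounts to back-projecting each central pixel with its depth into the global coordinate system and reprojecting it into a target viewpoint V_i = [vx_i, vy_i, vz_i, va_i, vb_i, vc_i]. A minimal sketch of that geometry, assuming a pinhole camera with focal length f, principal point at the image center, pan-then-tilt rotation order, and nearest-pixel forward splatting (all assumptions; a real implementation would texture-map the mesh in hardware, as described above):

```python
import numpy as np

def predict_view(img_c, depth_c, v, f=1000.0):
    """Predict a viewpoint image by reprojecting the central view I_C^t
    through its depth map D_C^t into viewpoint v = [vx, vy, vz, va, vb, vc].
    The pinhole model, rotation order, and splatting are assumptions."""
    H, W = img_c.shape
    vx, vy, vz, va, vb, vc = v
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    # Back-project: pixel (x_C, y_C) plus depth z_C -> 3D point (X, Y, Z)
    # in the global frame attached to the central camera.
    z = depth_c.astype(np.float64)
    X = (xs - W / 2.0) * z / f
    Y = (ys - H / 2.0) * z / f
    P = np.stack([X.ravel(), Y.ravel(), z.ravel()])          # 3 x N points
    # Move into the target camera: translate by the camera position
    # (vx, vy, vz), then pan by vb about Y and tilt by vc about X.
    Ry = np.array([[np.cos(vb), 0.0, np.sin(vb)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(vb), 0.0, np.cos(vb)]])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(vc), -np.sin(vc)],
                   [0.0, np.sin(vc), np.cos(vc)]])
    Pc = Rx @ Ry @ (P - np.array([[vx], [vy], [vz]]))
    # Reproject; the zoom parameter va scales the focal length.
    u = np.round(f * va * Pc[0] / Pc[2] + W / 2.0).astype(int)
    w = np.round(f * va * Pc[1] / Pc[2] + H / 2.0).astype(int)
    pred = np.zeros_like(img_c)
    ok = (Pc[2] > 0) & (u >= 0) & (u < W) & (w >= 0) & (w < H)
    pred[w[ok], u[ok]] = img_c.ravel()[ok]   # nearest splat; no z-buffering
    return pred
```

Forward mapping leaves holes where surfaces become disoccluded in the new view; for the encoder-selected viewpoint, these failures are exactly what the transmitted residual image corrects.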

2 STEPS for ENCODER

1. Find the central viewpoint, the one achieving the best predictions for the other viewpoint images. Usually the central viewpoint is the middle camera viewpoint position. The central image is denoted I_C and the non-central images are designated I_X, corresponding to the central viewpoint V_C and the non-central viewpoints V_X respectively.

2. Encode the central image I_C^t in the main profile of MPEG-2.

3. Transmit the encoded bitstream for I_C^t.

4. Decode the bitstream for I_C^t to determine the received central image Î_C^t.

5. Calculate the depth map, named D_C^t, for the central viewpoint image I_C^t. One such depth estimation technique is depth from defocused images.

6. Encode the depth map D_C^t.

7. Transmit the encoded bitstream for D_C^t.

8. Decode the bitstream for D_C^t to determine the received depth map D̂_C^t.

9. Obtain a 3D mesh/wireframe representation, called M^t, for the central image I_C^t at time t by associating each coordinate pair (x_C, y_C) with its corresponding depth value D_C^t(x_C, y_C). Consequently, the 3D surface texture is derived from the 2D image, and a graphical model of the image scene is obtained.

10. Determine a non-central viewpoint, designated V_X^t, from the set of original real viewpoints, for which the best construction of the viewpoint image I_X^t is desired at time t. The selection of the non-central viewpoint V_X^t is based on a round-robin schedule, where every viewpoint is selected in turn with a period of N - 1 (see Figures, and the sketch after this list).

11. Transmit the selected viewpoint vector V_X^t for time t.

12. Predict the selected non-central viewpoint image, referred to as PI_X^t, by rendering the 3D mesh model M^t in the specified viewpoint V_X^t. First, construct the wireframe image from the mesh representation by simple geometric transformations with perspective projections. Then texture map corresponding areas of the central image onto the appropriate 3D wireframe substructures.

13. Calculate the prediction errors PE^t required for the final reconstruction of the selected viewpoint V_X^t by examining the difference between the original image I_X^t and the predicted image PI_X^t.

14. Encode the prediction errors PE^t.

15. Transmit the prediction errors PE^t associated with the chosen viewpoint V_X^t.

16. Determine whether the allocated bit rate allows transmission of an additional viewpoint prediction error. If bandwidth permits, go to step (10) and choose another non-central viewpoint. Otherwise, start over at step (2) for the next frame.
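Step 10's round-robin schedule is simple enough to pin down concretely. A minimal sketch, assuming N = 5 views and one or more residual "slots" per frame as determined by step 16's bandwidth test (the labels and the slot-count interface are illustrative; the bitstream carries the full viewpoint vector V_X^t, per step 11):

```python
from itertools import cycle

# The N - 1 non-central viewpoints, visited in a fixed cyclic order so
# that each view's residual is refreshed once every N - 1 selections.
NON_CENTRAL_VIEWS = ['Left', 'Right', 'Top', 'Bottom']   # example: N = 5
_schedule = cycle(NON_CENTRAL_VIEWS)

def viewpoints_for_frame(residual_slots):
    """Return the non-central viewpoints whose prediction errors are
    coded at this time instant (steps 10 and 16)."""
    return [next(_schedule) for _ in range(residual_slots)]

# With a budget of one residual per frame, the selection cycles
# Left, Right, Top, Bottom, Left, ... (period N - 1 = 4).
for t in range(6):
    print(t, viewpoints_for_frame(1))
```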

3 STEPS for DECODER

1. Decode the bitstream of the central image, denoted Î_C^t, from the main profile of MPEG-2.

2. Store the decoded central image Î_C^t in memory.

3. Decode the bitstream of the depth map, thereafter referred to as D̂_C^t.

4. Obtain a 3D mesh/wireframe representation, called M^t, for the central image I_C^t by associating each coordinate pair (x_C, y_C) with its corresponding depth value D_C^t(x_C, y_C). Consequently, the 3D surface texture is derived from the 2D image, and a graphical model of the image scene is obtained.

5. Store the 3D mesh model M^t for time t in memory.

6. Decode the selected viewpoint vector V_X^t for time t.

7. Store the viewpoint vector V_X^t selected for time t in memory.

8. Decode the prediction errors, denoted P̂E^t, associated with the selected viewpoint V_X^t.

9. Determine the requested viewpoint vector from the user at the decoding system, assigned V_U^t.

10. Predict the user-requested viewpoint image (whether real or virtual), referred to as PI_U^t, where the index U designates the appropriate viewpoint V_U^t. The predicted image PI_U^t is constructed by rendering the 3D mesh model M^t in the specified viewpoint V_U^t. First, construct the wireframe image from the mesh representation by simple geometric transformations with perspective projections. Then texture map corresponding areas of the central image onto the appropriate wireframe substructures. This module is similar to that of the encoder.

11. Determine whether the encoder-selected non-central viewpoint V_X^t is the same as the user-requested viewpoint V_U^t.

12. If equal, V_U^t = V_X^t, reconstruction of the final image Î_X^t is obtained by combining the prediction errors P̂E^t with the predicted image PI_X^t. This kind of reconstruction is defined as Type I Prediction. Store the reconstructed non-central image Î_X^t in memory. Start the next frame by returning to step (1).

13. If not equal, V_U^t ≠ V_X^t, the following steps are carried out.

Given the user-requested viewpoint V_U^t, retrieve from memory the image Î^{t-f} corresponding to the nearest past image reconstructed by a Type I Prediction. Similarly, given the user-requested viewpoint V_U^t, retrieve from memory the image Î^{t+b} corresponding to the nearest future image reconstructed by a Type I Prediction.

Form the 3D mesh model M^{t-f} created at time t-f using the positional information from D_C^{t-f} and Î^{t-f}. In parallel, form the 3D mesh M^{t+b} created at time t+b using the information from D_C^{t+b} and Î^{t+b}.

Generate the front mesh image MI^{t-f} for the image Î^{t-f} by locating a grid of fixed points. Also, generate the back mesh image MI^{t+b} for the image Î^{t+b} by locating a grid of fixed points.

By warping between the front mesh M^{t-f} and the back mesh M^{t+b}, generate the intermediate mesh-predicted image MPI^t for time t.

Finally, obtain a construction of Î_U^t by combining the intermediate mesh-predicted image MPI^t with the predicted image PI_U^t. The method for this final combination is left to the decoding system; any image fusion technique may be adopted here, e.g., an XOR operator or averaging (a minimal sketch of this fusion follows Section 4). Construction of this final image Î_U^t is termed a Type II Prediction.

If other viewpoint images are requested by the user, the steps may be repeated. Otherwise, return to step (1) to decode the next time frame.

4 ADVANTAGES

1. Tracking of image points is facilitated by knowing the real-world 3D coordinates of each point.

2. Prediction of non-transmitted virtual viewpoints is possible and efficient using the proposed combination of computer graphics tools.

3. View interpolation is fast and efficient because of hardware speed and hardware rendering capabilities.

4. A possible solution to object-based approaches for MPEG-4 is suggested by this method. Some MPEG-4 proposals have contemplated the construction of 3D object models from the image sources.

5. Takes advantage of many existing graphical tools and animation facilities.

6. Combines image processing with computer graphics capabilities.

7. Incorporating the foreshortening effect of 3D surfaces is automatic when using 3D structures with image texture data.

8. Quality of the multi-viewpoint video construction is proportional to the transmission bandwidth.

9. Freedom to improve individual coding modules in accordance with advancing technologies. Like MPEG-2's main profile, this codec does not prevent creativity and inventive spirit.

10. The flexibility of the multi-viewpoint parameters supports a wide range of multiview video applications.
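As promised above, here is a minimal sketch of the Type II fusion from Section 3, step 13: blending the mesh-predicted image MPI^t with the rendered prediction PI_U^t. The proposal leaves the fusion rule open; plain averaging is one of the options it names, and the distance-based weighting shown for the past/future warp is an illustrative assumption.

```python
import numpy as np

def type_ii_fuse(mpi_t, pi_t, alpha=0.5):
    """Combine the mesh-predicted image MPI^t (warped between the nearest
    past and future Type I reconstructions) with the depth-rendered
    prediction PI_U^t. A weighted average; the proposal leaves the exact
    fusion operator to the decoding system."""
    blend = (alpha * mpi_t.astype(np.float32)
             + (1.0 - alpha) * pi_t.astype(np.float32))
    return np.clip(np.round(blend), 0, 255).astype(np.uint8)

def past_weight(f, b):
    """Illustrative temporal weight for forming MPI^t from the warped
    past (t - f) and future (t + b) images: the nearer frame counts more.
    This linear rule is an assumption, not specified by the proposal."""
    return b / float(f + b)
```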

5 MULTIVIEW VIDEO PARAMETERS

Picture Width
Picture Height
Frame Rate
Bit Rate
Buffer Size
Number of Viewpoints
Viewpoint Vectors
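These sequence-level parameters map naturally onto a header structure. A minimal sketch, assuming field types and units by analogy with MPEG-2 sequence headers (all values in the example are illustrative, not from the proposal):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MultiviewSequenceParams:
    """Section 5's parameter set. Types and units are assumptions for
    illustration; the proposal only names the fields."""
    picture_width: int                 # luma samples
    picture_height: int                # luma samples
    frame_rate: float                  # frames per second
    bit_rate: int                      # bits per second, whole bitstream
    buffer_size: int                   # decoder buffer size, bits
    num_viewpoints: int                # N >= 3, extendable
    # One [vx, vy, vz, va, vb, vc] vector per viewpoint:
    viewpoint_vectors: List[List[float]] = field(default_factory=list)

# Example instance: a 5-view setup whose central view is V_C = [0,0,0,1,0,0].
params = MultiviewSequenceParams(
    picture_width=720, picture_height=576, frame_rate=25.0,
    bit_rate=8000000, buffer_size=1835008, num_viewpoints=5,
    viewpoint_vectors=[[0, 0, 0, 1, 0, 0]],
)
```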

6 APPROACHES / TECHNIQUES / TOOLS

MPEG-2 Video Coding Standard
Determine Camera Viewpoints
Depth from Defocus
3D Mesh Representation
3D Viewpoint Rendering Methods
Texture Mapping
Digital Image Warping