
Institutionen för systemteknik
Department of Electrical Engineering

Final Degree Project

Overview of 3D Video: Coding Algorithms, Implementations and Standardization

Final Degree Project performed in Information Coding
by Rubén Berzosa Calpe

Linköping, July 2011

Tekniska högskolan, Linköpings universitet
Department of Electrical Engineering
Linköping University
Linköping, Sweden

Linköpings tekniska högskola
Institutionen för systemteknik
Linköping

Overview of 3D Video: Coding Algorithms, Implementations and Standardization

Master thesis in Information Coding
at Linköping Institute of Technology
by Rubén Berzosa Calpe

LiTH-ISY-EX--YY/XXXX--SE


Abstract

3D technologies have attracted great interest around the world in recent years. Television, cinema and videogames are introducing 3D technologies, little by little, into the mass market. This comes as a result of the research done in the 3D field, which has solved many of its limitations, such as quality, content creation or 3D displays. This thesis focuses on 3D video, considering concepts that concern coding issues and video formats. The aim is to provide an overview of the current state of 3D video, including standardization and some interesting implementations and alternatives that exist. The report presents the background information necessary to understand the concepts developed: compression techniques, the different video formats, their standardization, and some advances on and alternatives to the processes previously explained. Finally, a comparison between the different concepts is presented to complete the overview, ending with some conclusions and proposed ideas for future work.


Acknowledgements

This thesis completes my Master degree in Electrical Engineering and has been carried out at the Information Coding Division of the Electrical Engineering Department at Linköpings universitet. I would like to thank my supervisor Jens Ogniewski, who, during the development of this thesis, has dedicated a lot of time to helping me and providing me with valuable advice, even on the written report. I can say I have finished this thesis, and learned a great deal doing it, because of his dedication. I also appreciate the opportunity Robert Forchheimer has given me to carry out my thesis at Information Coding. Living in Linköping for six months has been one of the best experiences I have ever had, something I will never forget. I have grown as a person, learning many things life had not yet taught me. The new situations and feelings I have experienced have enriched me in a way I would never have imagined. The fantastic people I have met here are undoubtedly one of the best things I take with me. Moreover, I would like to mention all the people in my life. My friends, who are always there when I need them and with whom I have shared so many good moments. You make me feel good when I am with you and you deserve to be included in these lines. You can be sure that I do not forget any of you. And finally, I would like to express my most special gratitude to my family. My parents, who have supported me all my life, in good times and bad, making every effort to bring out the best in me and do the best for me. And my sister, who is a special person in my life and has always been an example for me. Without you three, this thesis would never have become a reality and I would never have come this far. It has been a long way, on which you have accompanied me, enjoying my happy days and sharing my sad days. For this reason, this thesis is yours.


Contents

Acknowledgements

1 Introduction
  1.1 Background
  1.2 Thesis content
  1.3 Thesis outline

2 Basic coding issues
  2.1 Spatial prediction
    2.1.1 Discrete cosine transform
  2.2 Temporal prediction
  2.3 Inter-view prediction
    2.3.1 Inter-view prediction for key pictures
    2.3.2 Inter-view prediction for non key pictures

3 Specific implementations
  3.1 3D Video formats
    3.1.1 Video only formats: Conventional Stereo video; Multiview video; Coding standards
    3.1.2 Depth enhanced formats: Video plus depth; Multiview video plus depth; Layered depth video; Coding standards
  3.2 Other implementations: Computational saving methods; Wavelet transformation; Real 3D - Digital holography

4 Comparison
  4.1 Discrete Cosine Transform vs Wavelet Transform
  4.2 Video only formats, Depth enhanced formats and Holography
  4.3 Spatial, temporal and inter-view prediction

5 Conclusions
  5.1 Conclusions
  5.2 Future work

Bibliography

Chapter 1

Introduction

3D (1) video technologies in cinema, television and videogames have recently generated increasing interest around the world, paving the way for a great deal of research activity in 3D areas. This has resulted in many improvements in the 3D field, solving many of its limitations such as quality, content creation or 3D displays. It has to be noted, though, that 3D technology has been known since the 19th century. Consequently, 3D technology is, little by little, arriving at cinemas and at home through the different available channels: terrestrial broadcast, cable, Blu-ray disc or streaming. Some examples are the successful film Avatar, the football World Cup 2010 broadcast in 3D, the NVIDIA 3D Vision system with domestic and professional applications, or, in the videogames world, the new Nintendo 3DS and the PlayStation 3 with 3D videogames. However, it is required that all parts of the 3D processing chain are prepared, specifically acquisition, coding and display. One of the major issues is coding the information, due to the huge amount of data generated when recording several views and the possibility of adding depth data. That is why advances in coding techniques, and even standards for Multi-View Video (MVV), have been developed [1], helping the introduction of these systems to the 3D market and achieving good quality-compression compromises. In addition, 3D video can be presented in different representations, which can be divided into several classes: those that include depth data, 4D wavelets, object-model-based formats or video only formats are some examples. Video only formats are the most common formats nowadays and have their corresponding standards. Stereoscopic technology (2) is currently the format widely used on many platforms. On the other hand, formats with depth images may be the next generation to be used for 3D video. It seems, hence, that a lot of formats, techniques and standards exist or still have to be developed for 3D video applications.
(1) "3D" refers to three dimensions: width, length and depth.
(2) This technique creates the illusion of depth in an image through two different video streams, one for each eye.

For this reason, it is interesting to

write a thesis to expose and analyze the topic and its current situation. Finally, the properties of the human visual system are the key to how the previously mentioned systems work; consequently, different application scenarios and requirements concerning 3D exist. These topics are clarified below.

1.1 Background

When dealing with video and the compression of its signal, knowing what the human visual system is and how it works is always useful, since it helps to understand whether the signal is represented in a correct form or whether some visible information is removed. How details, both spatial and temporal, are perceived in video depends on the visual system. The visual system starts working when light arrives on the retina, generating a stimulation which leads to perception. The photoreceptors of the retina are rods and cones, and each of these types has its own properties and characteristics, providing different information:

Rods: there are roughly 120 million rods, uniformly distributed in the retina. Their connection is parallel and they are achromatic, being sensitive to intensity. Vision with these photoreceptor cells is scotopic, which means they work under low illumination. This also results in another characteristic: it is a peripheral visual perception.

Cones: there are about 6-7 million, not uniformly distributed and concentrated in the fovea. They are color sensitive; specifically, three types of cones exist: red-sensitive (more than 60%), green-sensitive (about 30%) and blue-sensitive (only around 5%). They provide photopic vision (3) and the details, working well under high illumination conditions.

More information about photoreceptors, eye structure and graphics such as the spatial distribution of rods and cones in the retina is available in [2]. Because there are more rods than cones, the eye is more sensitive to intensity changes than to color changes.
Apart from this assessment, all the previous information is enough to justify the use of the YCbCr (4) format instead of RGB (5), since it matches the human visual perception better. However, there are more aspects of the eye that must be taken into account. One of them is the persistence of vision, which occurs because the eye retains the displayed picture for a short time, affecting the frame rate in videos. Another important property is spatial integration, which is related to spatial frequencies and consists of averaging details when the eye cannot discriminate them. Therefore, it is essential to know the human visual system and its properties in order to exploit the different redundancies and create proper video formats.

(3) Vision of the eye under well-lit conditions.
(4) Y provides the brightness and Cb and Cr the chrominance.
(5) Red, Green and Blue. It refers to the color composition made from the primary colors.
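To make the colour-space argument concrete, here is a minimal sketch of an RGB-to-YCbCr conversion. It assumes BT.601 luma coefficients with offset-128 chroma; actual standards differ in exact constants and value ranges:

```python
# Sketch: one-pixel RGB -> YCbCr conversion (BT.601 coefficients assumed,
# full-range values, offset-128 chroma). Y is a weighted intensity, which
# matches the eye's higher sensitivity to brightness than to colour.

def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luma
    cb = 128 + 0.564 * (b - y)               # blue-difference chroma
    cr = 128 + 0.713 * (r - y)               # red-difference chroma
    return y, cb, cr

# A neutral grey carries no chroma: Cb and Cr sit at the 128 midpoint.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```

Because most of the perceptually important detail ends up in Y, the chroma channels can be subsampled (as in 4:2:0) with little visible loss, which is exactly the redundancy exploitation discussed above.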

But there is another subject concerning 3D technology which it is important to focus on before beginning the concepts of the following chapters: application scenarios. 3D video is becoming a reality with different applications, helped by Multiview Video Coding (MVC) techniques and the advances in displays and acquisition. These applications can be grouped into different categories:

Free viewpoint video (FVV): the viewer can choose the viewpoint in the 3D space he/she wants to observe. So it is possible to see the scene from different perspectives, navigating within a certain range.

Three dimensional TV (3DTV): this is an extension of stereoscopic technology. Several cameras capture the light field of the scene. Then, depending on the system, it delivers one stereoscopic view or several stereoscopic views. In the latter case, the position of the viewer defines which stereoscopic view he/she watches. Stereoscopic TV is included in this category and is the first application of 3DTV available to the consumer.

Immersive teleconference: interactivity and virtual reality are desirable. For this reason FVV and 3DTV are supported, providing interactivity and virtual reality respectively. In this application several viewers interact, so feeling immersed in a 3D environment is important.

Finally, it has to be pointed out that, when applying or investigating MVC, some requirements have to be known. Some of them are: view scalability, compression efficiency, robustness, random access and resource consumption. For an extended explanation of these concepts, the reader is referred to [3] and [4].

1.2 Thesis content

As previously mentioned, compression becomes one of the important issues due to the quantity of data generated, especially when recording two or more views. Without compression, the costs of hardware and systems to process digital video increase, and furthermore, the required bandwidth and storage are too high.
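The scale of the problem is easy to quantify. A short sketch of the arithmetic for a PAL-like sequence (720x575 pixels, 50 fps, 3 bytes per pixel), matching the worked example that follows (the text rounds the same quantities slightly differently):

```python
# Uncompressed-video storage and bandwidth arithmetic for the example
# discussed in this section: 720x575, 50 fps, 3 bytes per pixel.

width, height = 720, 575
bytes_per_pixel = 3
fps = 50

frame_bits = width * height * bytes_per_pixel * 8   # bits in one frame
rate_bps = frame_bits * fps                         # bits per second
hour_bits = rate_bps * 3600                         # bits in one hour

print(f"frame:     {frame_bits / 1e6:.2f} Mbit")    # ~9.94 Mbit
print(f"bandwidth: {rate_bps / 1e6:.1f} Mbit/s")    # ~496.8 Mbit/s
print(f"one hour:  {hour_bits / 1e12:.2f} Tbit")    # ~1.79 Tbit
```

Doubling the view count for stereo, or multiplying it by N for multiview, scales these figures linearly, which is why compression is indispensable.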
To understand this problem, the following numerical example of an uncompressed video and its requirements can be considered. Suppose a digital video sequence of 720x575 pixels at 50 frames per second (fps), where each pixel is represented by 3 bytes. The size of one frame is then 720x575x3x8 bits (about 9.94 Mbits). So, considering a video of one hour, the space required to store it is 9.94 Mbits x 50 x 3600 (about 1.79 Tbits), and the bandwidth needed to send the video is 9.94 Mbits x 50 (about 497 Mbps). This example clearly illustrates the necessity of compression. It becomes even more important when the case includes two or more views, since the data increases significantly. Fortunately, several techniques provide tools to exploit the

different redundancies existing in images. A lot of research has been done and this is still a focus of investigation, resulting in new approaches, novel video representations and improvements to existing techniques. Consequently, 3D technology is a field which still needs a lot of time to arrive at a mature stage. At the moment, the format intended for the mass consumer market, stereoscopic video, is slowly gaining acceptance. At the same time, other formats are preparing their place in the 3D world. However, there is a common feature among all of them: providing high quality to the consumer. This concerns content creators, service providers and display manufacturers. It seems, then, that a global vision of the topic and its current situation is useful and can clarify the future of 3D video. So this thesis gives an overview of the state of the art of 3D video, always focusing on the coding part.

1.3 Thesis outline

The thesis is structured as follows: In chapter 2 the most important techniques concerning coding are described, specifically compression tools. They mainly correspond to Multiview Video Coding (MVC), an extension of H.264/AVC. Chapter 3 features the 3D video formats and their representations. It also gives an overview of the current situation of the existing standards, relating the video representations to their corresponding standards. In chapter 4, the concepts and features explained in previous chapters are analyzed and compared. Finally, chapter 5 concludes the thesis and presents some ideas for future work.

Chapter 2

Basic coding issues

One characteristic of multiview video (MVV) is that the system uses multiple views of the same scene, which means several cameras are simultaneously capturing several video streams of the same scene. This results in a huge amount of data that has to be stored or transmitted. Encoding, which is the process in which data is transformed from one format to another, becomes an essential tool in this process, since the data needs to be compressed. Fortunately, in multiview coding, correlations between adjacent views, temporal correlations between frames and spatial correlations within pictures exist. The objective is to exploit these redundancies so as to obtain a compressed stream with an acceptable quality. For this reason, efficient compression techniques have to be used in the encoding process of MVV in order to achieve this objective. In this chapter, the main coding issues concerning MVC are presented. Some of them are shared with codecs which are not MVC; nevertheless, the combination of all of them provides the basic tools to implement the coding process in MVC. First, spatial prediction with the Discrete Cosine Transform (DCT) is presented, then temporal prediction is discussed, to finally describe inter-view prediction.

2.1 Spatial prediction

Images are strongly non-stationary data. For this reason, if a division into stationary regions is made, the resulting small areas of the picture are similar. As significant correlation exists between neighboring pixels, the information to be coded can be reduced if those redundancies are exploited. To exploit spatial redundancies, intra prediction, which is based on predictions from neighboring pixels, is applied. In this section, transform techniques such as the Discrete Cosine Transform are explained. It should be noted that other processes coming after the DCT in spatial coding, such as weighting, scanning or entropy coding, are very common as well.
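The neighbouring-pixel idea can be illustrated with a toy sketch in the spirit of H.264/AVC's 4x4 intra modes (this is a simplification, not the exact codec algorithm: real encoders choose among several directional modes and transform-code the residual):

```python
# Toy intra (spatial) prediction: predict a 4x4 block from the
# reconstructed column of pixels to its left, then keep only the
# residual. Small residual values are what makes intra coding cheap.

def predict_horizontal(left_column):
    """Each row is predicted as a copy of the pixel to its left."""
    return [[left] * 4 for left in left_column]

def residual(block, prediction):
    return [[b - p for b, p in zip(br, pr)]
            for br, pr in zip(block, prediction)]

left = [10, 20, 30, 40]            # already-decoded neighbour column
block = [[10, 11, 10, 12],         # a block with strong horizontal
         [20, 21, 19, 20],         # correlation...
         [30, 30, 31, 29],
         [40, 41, 40, 40]]

pred = predict_horizontal(left)
res = residual(block, pred)
# ...leaves a residual of small magnitudes, cheap to encode:
max_abs = max(abs(v) for row in res for v in row)
```

Instead of coding values in the range 10-41, only residuals in the range -1..2 remain, which compress far better after the transform and entropy coding steps described next.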

2.1.1 Discrete cosine transform

The DCT is a mathematical tool widely used in image and video coding processes, as it has very good energy compaction and de-correlation properties. It is close to optimal in terms of energy compaction capabilities and can be computed by fast algorithms. The DCT is a good approximation to the Karhunen-Loeve Transform (KLT), which is optimal among unitary transformations but more complex (and for this reason not considered by current coding standards). The mathematical expression is the following:

a_k(m) = \sqrt{\frac{s}{N}} \cos\left(\frac{(2m+1)k\pi}{2N}\right), \quad s = \begin{cases} 2 & k \neq 0 \\ 1 & k = 0 \end{cases} \quad (2.1)

A_{k,l}(m,n) = a_k(m)\, a_l(n)

With the objective of removing redundancy, the DCT transforms the signal into a new space where the signal representation is more compact, as described later. Specifically, it transforms an NxN picture block into an NxN block of coefficients in the frequency domain.

Figure 2.1: Matrix of coefficients. Source: [34]

The signal transform implies comparing (projecting, to compute the inner product) each block in the image with the different components (vectors) that define the transform space. Once the transform is done, the energy of an NxN pixel block is compacted into a few transformed coefficients and the correlation between these coefficients is significantly reduced.
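Equation (2.1) can be implemented directly. The sketch below (a naive O(N^4) evaluation, for illustration only; real codecs use fast factorizations) shows the energy-compaction property on a smooth 4x4 block:

```python
import math

# Direct implementation of the separable DCT basis of Eq. (2.1):
# a_k(m) = sqrt(s/N) * cos((2m+1) k pi / (2N)), s = 2 if k != 0 else 1.

def a(k, m, N):
    s = 1.0 if k == 0 else 2.0
    return math.sqrt(s / N) * math.cos((2 * m + 1) * k * math.pi / (2 * N))

def dct2(block):
    """2-D DCT of an NxN block via the separable basis A_{k,l}(m,n)."""
    N = len(block)
    return [[sum(block[m][n] * a(k, m, N) * a(l, n, N)
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]

# A smooth 4x4 block: identical rows, a gentle horizontal ramp.
block = [[8, 9, 10, 11]] * 4
coef = dct2(block)

# The transform is orthonormal, so total energy is preserved...
energy = lambda b: sum(v * v for row in b for v in row)
# ...but compacted: the DC coefficient alone carries almost all of it.
dc = coef[0][0]
```

Here the DC coefficient alone holds over 98% of the block energy, and every coefficient with a vertical frequency component is zero because the rows are identical, which is exactly the compaction and de-correlation behaviour described above.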

Figure 2.2: Matrix of DCT coefficients after the transformation. [34]

As a result, only a few coefficients are necessary to recover a substantial amount of information, since these few coefficients represent the largest part of the signal energy. An interesting aspect of the transform coefficients is that those representing high frequencies can be discarded, due to the fact that the human visual system is not very sensitive to high spatial frequencies. Hence the perceptual quality of the reconstructed image is not affected if this discarding process is done. Apart from the previous purpose, the DCT is also a tool employed in block-based hybrid video coding (1). Inside this process, motion prediction errors are encoded through the DCT to achieve a reduction of the information to be sent. Summarizing, the use of this transform allows the removal of spatial redundancy owing to its good energy compaction and de-correlation properties. However, some effects have to be considered, for example the blocking effect, a consequence of the block division and the isolated processing of each block, which results in a degradation of the image, although there are solutions such as the deblocking filter employed in H.264/AVC.

2.2 Temporal prediction

A strong correlation exists between successive frames in a sequence. The basic idea of temporal prediction is to remove these similarities by coding their differences, obtaining a bandwidth compression without significantly affecting the image resolution. In this section inter-prediction is explained: first a review of motion compensation, and next the picture coding structures. Finally, the two parts of motion compensation, prediction error coding and motion estimation, are specified. High temporal correlations between frames in a sequence of images are exploited by the motion compensation technique, whose capacity to remove temporal redundancies is commonly used in video compression processes.
(1) The hybrid coding technique basically consists of representing the image in terms of original data (DCT coefficients), predicted image (motion vectors) and prediction error (DCT coefficients).

A solution is to approximate the moving objects by using the regular non-overlapping blocks into which the image has previously been partitioned. After that, it is necessary to determine the motion vector so that a representation of the movement of the

image block can be made. Then, it is possible to reconstruct the current frame from the reference frame(s) through the prediction error and the motion vector. The following expressions describe the prediction, starting with the case where no motion compensation is used:

No motion compensation: if motion compensation is not used, the prediction is known as linear temporal prediction. It works well in stationary regions, but in real-world video the objects in the scene, as well as the camera, are usually moving. So it is not the proper process, since the values of pixels at the same spatial location in adjacent frames can be different:

\hat{f}(x, t) = f(x, t-1) \quad (2.2)

In this expression, \hat{f}(x, t) represents the predicted frame, t the time and x the pixel.

Uni-directional motion compensation: does not work well for regions uncovered by object motion, but solves the problem of the previous case:

\hat{f}(x, t) = f(x + d(x), t-1) \quad (2.3)

where d(x) represents the motion vector of pixel x from time t to t-1. The reference frame must be reconstructed before the coded frame. This case, in which the prediction of the current frame is made from a previous reference frame, is known as forward motion compensation.

Bi-directional motion compensation: handles the uncovered regions better. In this case, a pixel in the current frame is predicted from a pixel in the following frame (t+1) as well as a pixel in the previous frame (t-1). It should be noted that it is actually the differences that are encoded. The predicted value is:

\hat{f}(x, t) = a_f\, f(x + d_f(x), t-1) + a_b\, f(x + d_b(x), t+1) \quad (2.4)

Now, d_b represents the motion vector at x from t to t+1 and d_f the same from t to t-1. Analogously to the previous case, prediction made from a future reference frame is called backward motion compensation. In bi-directional motion compensation, both backward and forward motion compensation are used.
Finally, a_f and a_b are coefficients that should be determined through a predictor. The use of motion vectors for prediction enables the coding structure with I, P and B pictures. The first common approach is the IBBP... structure (where I and P pictures are references for B pictures; I, B and P denote the picture types within a group of pictures); nevertheless, it is not the most efficient temporal structure. Due to this disadvantage, hierarchical B pictures, which are more efficient than the traditional structure [5], are chosen to be described in this section. They represent a coding structure that uses bi-directional predictive

pictures (B pictures) as references for other B pictures within one group of pictures. These types of prediction schemes benefit from the increased flexibility that H.264/AVC offers at picture/sequence level in comparison to former video coding standards, and from the availability of the multiple reference picture technique [6]. Figure 2.3 shows a typical hierarchical reference picture structure.

Figure 2.3: Hierarchical reference picture structure for temporal prediction. [6]

In this figure, the Group of Pictures (GOP) is built from the pictures located between a key picture (included in the GOP) and the previous key picture. The first picture of the sequence is intra-coded, being an Instantaneous Decoder Refresh (IDR) picture. The indexes in the pictures indicate the hierarchical level, which is used to ensure that pictures are predicted from pictures with the same or a higher temporal hierarchy level. Usually pictures are predicted using the two nearest pictures, always considering the constraint that a reference picture has to be encoded before the picture which takes it as a reference. The arrows indicate for which pictures a picture can act as a reference, and, as expected, key pictures act as a reference for more pictures than B pictures do. The hierarchical B picture concept is easily applied to multiview video, as described in section 2.3. A general view of motion compensation has been given in the previous paragraphs. However, it is interesting to clarify the two parts of which motion compensation consists:

Motion estimation: is conducted by minimizing a Lagrangian cost function

J = D + \lambda R \quad (2.5)

This Lagrangian cost function J is the sum of the distortion D and the rate R, weighted by the Lagrange parameter \lambda.
For each block S_i of a picture, the motion estimation algorithm chooses the motion vector m_i within a search range M in the reference picture that minimizes J:

m_i = \arg\min_{m \in M} \{ D(S_i, m) + \lambda R(S_i, m) \} \quad (2.6)

Here, the distortion is calculated as the sum of squared errors between the current picture s and the previously decoded reference picture s':

D(S_i, m) = \sum_{(x,y) \in S_i} [s(x, y, t) - s'(x - m_x, y - m_y, t - m_t)]^2 \quad (2.7)

The rate R is the number of bits needed to transmit all components of the motion vector [6]. Although motion estimation algorithms are based

on temporal changes in images, depending on the application, motion estimation methods can differ. For video compression, the estimated motion vectors are part of the motion compensation process, in which a frame is coded from a reference frame.

Prediction error: is usually obtained by taking the difference between the original frame and the frame resulting from motion compensation. This prediction error is subjected to a transform coding operation and encoded to be sent to the receiver.

2.3 Inter-view prediction

In multiview video, sequences are captured by several video cameras, so the same scene is captured from nearby viewpoints. As a result, high correlations exist between the pictures of different views; in other words, inter-view redundancy is present. The objective of inter-view prediction is to exploit these similarities between neighboring views in order to achieve a good compression rate. It is important to stress that encoding each view separately (2) is an inefficient way to compress (multiview) video, as the results in [5] indicate, because inter-view redundancies are not considered. There are two different kinds of inter-view prediction, depending on whether it applies to key frames only or to both key and non-key frames.

2.3.1 Inter-view prediction for key pictures

A common method used in video coding based on motion compensated prediction is the inter prediction process, which basically consists of replacing intra-coded (I) pictures with inter-coded (P, B) pictures in order to reduce the bit rate. Adding this idea to the scheme where every view is coded independently results in a significant coding gain. The following figures, 2.4 and 2.5, illustrate this concept. It should be noted that in these figures the hierarchical B pictures concept is applied:

Figure 2.4: Temporal prediction using hierarchical B pictures. [6]

(2) Known as simulcast coding, which is the method of figure 2.4 and can be done by any H.264/AVC codec.

Figure 2.5: Inter-view prediction for key pictures. [6]

In these figures, the horizontal axis represents time and the vertical axis the different views of the video. Every view corresponds to a different camera, in this case a total of 8, and the GOP has a size of 8 pictures. Figure 2.4 represents the scheme where only temporal prediction is used (3). In figure 2.5, however, only view S0 maintains the same scheme as in the previous figure (temporal prediction). This view (in this case S0), known as the base view, has the minimum value of the view order index in the coded sequence and is characterized by not using inter-view prediction, so it can be decoded independently of the other views. There is only one base view in a coded video sequence. For the other views, inter-view prediction is applied by replacing all intra-coded key pictures with inter-coded pictures. An interesting aspect is how all these changes affect the processes in general. Since the pictures of each GOP are retained, the prediction structure does not change, and synchronization and random access are still provided, because the key pictures of the base view are coded in intra mode (coded as I pictures, independently of other views and pictures). In contrast, the new scheme introduces an effect on the encoding process (and, as a consequence, on the decoding process as well): it is no longer possible to process the video sequences of the individual views Sn independently, since reference pictures are shared when they have to be stored in a shared buffer (for parallel decoding) or interleaved into one bit stream (for sequential processing).

2.3.2 Inter-view prediction for non key pictures

Some analyses showed that using temporal prediction together with inter-view reference frames improves coding efficiency [6]. So it is logical to apply the idea of using inter-view prediction not only to key pictures, but to non-key pictures as well, so that statistical dependencies can be exploited better.
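The dependency rules for key pictures can be made explicit with a small sketch. The linear neighbour-to-neighbour chain below is an assumption for illustration only (the `key_picture_references` helper is hypothetical, not MVC syntax, and actual MVC reference structures are configurable):

```python
# Sketch of inter-view reference assignment for key pictures in the
# style of figure 2.5: 8 views, S0 the base view. Assumes a simple
# linear chain where each view's key picture predicts from its
# lower-index neighbour.

def key_picture_references(view, num_views=8):
    """Return the views a key picture may predict from (hypothetical)."""
    if view == 0:
        return []          # base view: intra-coded, independently decodable
    return [view - 1]      # others: inter-view prediction from a neighbour

deps = {v: key_picture_references(v) for v in range(8)}
# Only the base view has no dependencies; every other key picture
# requires a neighbouring view to be decoded first.
```

This makes visible why the views can no longer be processed independently: decoding view S3's key picture requires S2, which requires S1, and so on back to the base view.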
(3) Usually, simulcast coding is used as a reference to compare highly efficient temporal prediction structures with prediction structures that additionally use inter-view prediction.

Inter-view prediction can be extended to non-key pictures, as figure 2.6 illustrates. Compared with the previous cases (figure 2.4 and figure 2.5), this example,

which consists of the same number of cameras and the same GOP length, shows the whole behavior of the structure. Now all the non-key pictures are inter-coded pictures (B pictures), and they are also predicted from the pictures at the same time instant in the neighboring views using inter-view prediction. As figure 2.6 indicates, for key pictures the process remains as specified in the previous section.

Figure 2.6: Inter-view prediction for key and non key pictures. [6]

Indeed, there are more prediction tools, such as view synthesis prediction, which is discussed in [7]. The disadvantage of inter-view prediction for non-key pictures, compared with the previous prediction structures (simulcast and inter-view prediction for key pictures only), is that it is more complex, and as a result the computation process becomes more demanding as well. On the other hand, these schemes have coding efficiency advantages which help to compress the multiview video data.
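The Lagrangian block matching of equations (2.5)-(2.7), which underlies both the temporal and inter-view prediction described in this chapter, can be sketched in a few lines. Toy frame sizes are used, and the rate term R is approximated here by the motion-vector magnitude as a stand-in for its real coded bit cost:

```python
# Exhaustive block-matching motion estimation minimizing D + lambda*R,
# with D the sum of squared errors of Eq. (2.7). Toy-sized sketch; real
# encoders use fast search patterns and true bit-cost estimates for R.

def sse(cur, ref, bx, by, mx, my, B):
    """Sum of squared errors for a BxB block displaced by (mx, my)."""
    d = 0
    for y in range(B):
        for x in range(B):
            d += (cur[by + y][bx + x] - ref[by + y - my][bx + x - mx]) ** 2
    return d

def motion_search(cur, ref, bx, by, B=2, search=2, lam=1.0):
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            # skip candidates that read outside the reference frame
            if not (0 <= by - my and by - my + B <= len(ref) and
                    0 <= bx - mx and bx - mx + B <= len(ref[0])):
                continue
            cost = sse(cur, ref, bx, by, mx, my, B) + lam * (abs(mx) + abs(my))
            if best is None or cost < best[0]:
                best = (cost, (mx, my))
    return best[1]

# Reference frame with a bright 2x2 patch at (1,1); in the current
# frame the patch has moved one pixel right and down, to (2,2).
ref = [[0] * 5 for _ in range(5)]
cur = [[0] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (1, 2):
        ref[y][x] = 9
        cur[y + 1][x + 1] = 9

mv = motion_search(cur, ref, bx=2, by=2)   # block at the patch's new spot
```

The search recovers the displacement (1, 1), for which the distortion term is zero and only the small rate penalty remains; every other candidate pays a far larger distortion cost. The same search, applied against a picture of a neighbouring view instead of a previous frame, is what inter-view prediction amounts to.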


Chapter 3

Specific implementations

In the previous chapter, some coding techniques to compress the amount of data were presented. Although they are essential to enable the transmission process, these techniques are not suitable for all 3D video representations. For this reason, in this chapter the different 3D video formats are specified, paying special attention to those which do not only use the most common predictions. Some of these formats have their corresponding standard, but not all of them. Standardization, an integral part of telecommunications, is necessary to assure the integrity of the systems, covering aspects such as communication protocols, interoperability of implementations and safety requirements. It thus becomes an important part of the implementations of 3D video formats. For this reason, an analysis of the present standardization, including ideas for the upcoming standards, is given. Finally, other implementations and improvements of the coding tools and techniques (some of which already appeared in chapter 2) are specified, with the objective of finding solutions to problems such as computational complexity, or simply to evaluate new ways to achieve the same objective.

3.1 3D Video formats

3D video is becoming an interesting technology arriving at the home of the consumer. This content is provided through Blu-ray, 3DTV broadcast or the Internet. For these home applications a variety of 3D display systems exists (or is being designed), such as two-view stereo systems or multiview auto-stereoscopic displays. As a consequence, a lot of different 3D video formats are available or are being investigated [1]. The data included differs and is strongly related to specific display types, having a direct influence on the design of the 3D processing chain. As a result, several compression and coding algorithms exist for the different 3D video formats. The fact is that standard formats and efficient compression are indispensable for 3D applications, thus some of them are standardized and

widely established. On the other hand, others are currently under investigation. In this section, two classes of representations are distinguished, video only formats and depth enhanced formats, each of them including several representations with their own specific characteristics.

3.1.1 Video only formats

Basically, two video representations are part of this class: conventional stereo and multiview video, whose corresponding standardization is the H.264 family [8], including several extensions such as MVC. In these formats only color pixel video data is involved. Two or more cameras capture the scene, generating the corresponding multiple (two or more) signals, which are processed, however, without scene geometry information. Since the basic coding tools described in chapter 2 are mainly used in these representations, only a brief description and a standards overview are given in this section.

Conventional Stereo video

This representation is the most well-known type, being capable of creating the illusion of depth in the images. Basically, a pair of sequences shows the same scene from slightly different positions, one for the left eye and another for the right eye. As a result, the data to be stored or transmitted is twice as big as for conventional monoscopic video if the least complex method is used: encoding and decoding the two video signals separately, which is called simulcast coding. However, as seen in chapter 2, this technique is not efficient in compression terms. Other approaches can be applied, such as compatible stereoscopic coding, which is based on motion-compensated DCT and where the left view is coded independently and the right view is coded with references to the left view. Still, the most convenient method for efficient coding is combining temporal and inter-view prediction, as figure 3.1 illustrates, increasing the coding efficiency.

Figure 3.1: Stereo coding, combined temporal/interview prediction.
[9] An alternative to conventional stereo video (CSV) is the mixed resolution stereo format (MRS). In this case, one of the two views of the stereo pair is

downsampled using the binocular suppression theory¹, which introduces a low-pass effect on that view. An example is shown in figure 3.2, where the right view is subsampled to half resolution instead of being coded in full resolution. At the decoder it is then upsampled back to the original resolution for display.

Figure 3.2: Stereo image pair with low-pass filtered right view. [9]

By coding the two views with different resolutions in this way, the bit rate is significantly decreased while no overall losses in 3D perception quality are produced [9].

Multiview video

While CSV consists of only two views, the MVV format shows the same scene from different views, enabling, for example, free viewpoint video applications. The format contains N views captured by an array of cameras, so the signals created have N times the amount of data of single-view video to be stored or transmitted. Figure 3.3 illustrates a multiview video system with stereo video included, where multiple video capturing, data compression, transmission and reception of the compressed data are represented. Since the tools to reduce this data and the MVC features were reviewed in chapter 2, this subsection only gives a summary of MVV. MVC is used for coding N color-only video sequences. Its coder uses temporal prediction structures, where pictures are coded as I, P or B pictures. However, temporal prediction is not the only prediction applied, since correlations between views exist as well. As developed in section 2.2, the coding efficiency can be improved by using a hierarchical B-picture structure. This means that both temporal and interview reference pictures are used in the coding process to predict the current picture. Regarding the restrictions of MVC, it is notable that a good data rate for e.g. 10 views is not achievable, because the bit rate grows with the number of cameras.
Hence, it is necessary to use advanced approaches to decouple the number of views for coding and transmission from the number of required output views.

¹ It exploits properties of the human visual system: in a pair of corresponding points, one always suppresses the other. In this case (MRS), one image is spatio-temporally low-pass filtered compared to the image presented to the other eye, achieving savings in bandwidth.
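The mixed-resolution chain described above (low-pass filter and subsample one view before coding, upsample it again at the decoder) can be sketched on a single scanline. The sample values and the deliberately trivial filters are illustrative assumptions only:

```python
# Sketch of mixed-resolution stereo (MRS): one view is low-pass filtered and
# subsampled to half resolution before coding, then upsampled back at the
# decoder. A 1D scanline stands in for a whole image.

def downsample_half(row):
    """Average neighboring pairs: a crude low-pass filter plus 2:1 subsampling."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]

def upsample_double(row):
    """Restore the original length by sample repetition (nearest neighbor)."""
    out = []
    for v in row:
        out.extend([v, v])
    return out

right_view = [8, 8, 9, 9, 40, 42, 41, 41]       # hypothetical scanline
coded      = downsample_half(right_view)        # half the samples to transmit
decoded    = upsample_double(coded)             # decoder-side reconstruction

print(len(coded), len(decoded))   # half the data on the wire, full size shown
```

Only half of the samples of this view have to be coded and transmitted, while the binocular suppression effect lets the full-resolution view dominate the perceived quality.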

Figure 3.3: Multiview video system. [35]

MVC is suitable for multiview video signals; however, other formats that require geometry data are being investigated. Because of this restriction and because synthesis of new views is difficult when using MVC, new depth enhanced formats have appeared [9].

Coding standards

Regarding 3D, video only formats are covered by the H.264 family. H.264 can be considered a family of standards, consisting of several profiles. This means that not every decoder is able to decode all profiles, only those included in the specification of the decoder. With the aim of being a standard that works at lower bit rates than previous standards (H.263 or MPEG-2 for example) while offering good video quality, the H.264/Advanced Video Coding (AVC) video coding standard was designed, whose first version was completed in May 2003. It is a standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), and thanks to the flexibility the standard provides, another objective is fulfilled: to cover a variety of video applications in networks and systems such as mobile services, Internet Protocol Television (IPTV), High Definition Television (HDTV) or High Definition (HD) video storage. Since the central requirement of the design is high compression efficiency, it may seem this was the only factor considered for the standard. However, it is not the only standardization requirement. For instance, some general issues are usually considered: resource consumption, low delay or error robustness tend to be analyzed in the design process. Others are specific to the MVC extension and important for the correct development of the system, for example scalability or backward compatibility. Considering all the requirements and demands, the tools provided in the standard try to address all these issues.
For detailed information about all these tools and methods, and the profiles added over the years, the reader

is referred to the standard [8]. H.264/AVC is often the starting point for stereo and MVC. It is possible to distinguish three methods or extensions included in the standard. The processing chains of the standardized coding methods for video only formats are listed next:

H.264/AVC simulcast: already reviewed in chapter 2; the video sequences are independently encoded, transmitted and decoded. Although it can be used for CSV, MRS and MVV, it is not the most common solution since it offers lower coding efficiency than the following methods.

Figure 3.4: H.264/AVC simulcast. [1]

H.264/AVC including stereo supplemental enhancement information (SEI): for CSV the stereo video information SEI message was developed. It indicates to the decoder that the coded video sequence consists of stereo view content. Figure 3.5 shows the process followed: the two sequences are interlaced, line by line, into one. The encoder, which works in field coding mode after receiving the SEI message, then applies the techniques to reduce redundancy, and the bit stream is transmitted to be decoded later. Since the two images need to have the same size, this method is not suitable for MRS.

Figure 3.5: H.264/AVC stereo SEI. [1]

H.264/MVC: the multiview video coding extensions were completed in November. This extension, annex H of the standard, uses concepts seen in

chapter 2, in which a picture uses temporal and inter-view prediction. It includes new techniques to improve coding efficiency and to reduce the complexity on the decoder side, as well as new functionalities.

Figure 3.6: H.264/MVC. [1]

3.1.2 Depth enhanced formats

When investigating formats other than MVC, some restrictions appear due to the fact that MVC was designed for multiview video signals. Application formats for 3D video coding which need geometry data are a good example of formats affected by these restrictions. Depth enhanced formats differ from video only formats by including scene geometry data in the form of depth maps, which have different statistical properties than the video signal. A complete 3D video coding framework that targets a generic 3D video format for depth enhanced formats and associated efficient compression is depicted in figure 3.7. The estimation of the depth data is done at the transmitter side, limiting the number of input views and providing a multiview video plus depth format for transmission. At the receiver side, the data (video and depth) is decoded and view synthesis is used to generate the necessary additional views for the display. View synthesis supports the computation required by the conversion from representation to display format.

Figure 3.7: Overview of a 3D video system intended for depth enhanced formats. [26]
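A recurring building block of such depth enhanced systems is the 8-bit inverse-depth quantization of the depth maps (cf. equations (3.1) and (3.2) in the video plus depth subsection). A minimal sketch, with hypothetical near and far clipping planes chosen only for the example:

```python
# Sketch of the 8-bit inverse-depth quantization used for depth maps
# (cf. equations (3.1) and (3.2)). Z_MIN / Z_MAX are invented clipping planes.

Z_MIN, Z_MAX = 1.0, 100.0   # nearest / farthest representable depth (meters)

def quantize_depth(z):
    """Map a depth z in [Z_MIN, Z_MAX] to an 8-bit stored value (eq. 3.1)."""
    v = 255.0 * (1.0 / z - 1.0 / Z_MAX) / (1.0 / Z_MIN - 1.0 / Z_MAX)
    return round(v)

def recover_depth(stored):
    """Invert the mapping to get a depth value back (eq. 3.2)."""
    return 1.0 / (stored / 255.0 * (1.0 / Z_MIN - 1.0 / Z_MAX) + 1.0 / Z_MAX)

# Closest and farthest planes map to the extreme code values...
print(quantize_depth(Z_MIN), quantize_depth(Z_MAX))          # 255 0

# ...and quantization is finer near the camera than far away.
step_near = recover_depth(254) - recover_depth(255)
step_far  = recover_depth(0) - recover_depth(1)
print(step_near < step_far)                                  # True
```

The inverse-depth mapping is what gives close objects, where the eye is most sensitive to depth errors, the finest quantization steps.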

The advantage that these formats introduce is the ability to generate and display virtual views at arbitrary positions, useful if additional intermediate views are needed, provided that the occlusion problem is solved. In the following subsections some depth enhanced formats are specified, and finally an analysis of the situation of the coding standards is presented.

Video plus depth

Video plus depth (V+D) is the next format after MVV in terms of the complexity of the methods. It consists of a video signal and its corresponding per-pixel depth map, and both are transmitted. An example is illustrated in figure 3.8:

Figure 3.8: Video plus depth: regular 2D color video and an 8 bit depth image. [25]

As the figure shows, the depth data can be considered a luminance-only video signal, i.e. a gray scale image. If these images are fed into the luminance channel and the chrominance is fixed to a constant value, the resulting video signal can be processed by any state-of-the-art video codec. The range is quantized with 8 bits, which means 256 values are associated with the different points. The most distant point is represented by the value 0 and the closest point by the value 255. Therefore, the depth range, which is the distance of the 3D point from the camera, is limited between z_far and z_near, and every pixel has its z value. Depth data could be generated at the receiver, but that is not trivial. The best option is to provide depth information at the sender, so that 3D content producers have control over the display output. Then a method to store the depth data has to be applied. Usually the inverted real-world depth is used:

$$\mathrm{stored\ depth} = 255 \cdot \frac{\frac{1}{z} - \frac{1}{z_{max}}}{\frac{1}{z_{min}} - \frac{1}{z_{max}}} \qquad (3.1)$$

where an 8 bit representation (values from 0 to 255) is assumed. Thanks to this method it is possible to get a high depth resolution for close objects and a coarser depth resolution for more distant objects. On the other hand, the inverse quantized depth values are not identical to disparity values, due to the dependence between camera distance and disparity. The following equation is then used in synthesis scenarios to recover the depth values z from the depth maps:

$$z = \frac{1}{\frac{\mathrm{stored\ depth}}{255}\left(\frac{1}{z_{min}} - \frac{1}{z_{max}}\right) + \frac{1}{z_{max}}} \qquad (3.2)$$

One limitation of the video plus depth representation is its FVV functionality. When the position of the user changes, the rendered stereo pair is supposed to be adjusted to the new position. However, the navigation range of the head motion parallax is significantly limited, although by extending the video coding scheme to an N-view plus N-depth environment a higher navigation range is achieved. So, by rendering virtual intermediate views, the FVV functionality is increased. Figure 3.9 shows the synthesis of arbitrary intermediate views with this format.

Figure 3.9: Synthesized view from video and depth of adjacent camera views. [25]

Virtual views are generated in real time by Depth Image Based Rendering (DIBR) [10] at the receiver. This technique synthesizes views as follows: first the points of the original image are projected into the 3D scene using the depth maps, and then these 3D points are projected onto the image plane of the virtual view. It can be seen as 2D points being projected to 3D points, after which the reverse process is applied. The concatenation of these processes is known as 3D

image warping, and the idea of this process has been explained in the previous paragraphs. Another common problem concerns the color-with-depth representation, for example when generating depth or disparity information. Currently captured depth fields are not good enough, given that available cameras which capture video with per-sample depth do not provide good quality. Nevertheless, algorithms for depth and disparity estimation are being studied, so this problem can be solved. The last inconvenience is the increased complexity. In order to achieve the advantages of this representation, the implementations are more complex on the transmitter side as well as on the receiver side. At the transmitter, the generation of depth data has to be done before encoding, while at the receiver, view synthesis has to be executed after decoding in order to generate the stereo pair. Despite these limitations, the video plus depth concept is valuable because of its ability to generate virtual views at arbitrary positions and its backward compatibility. Only by specifying high-level syntax to let the decoder interpret the streams as color and depth, and by transmitting the information about the depth range, it is possible to use existing video codecs in the decoding process.

Multiview video plus depth

The multiview video plus depth (MVD) format appears as a new 3D video representation to cover the shortcomings of the previous representations: MVC becomes inefficient if the number of views is high and it does not provide continuity between views, while the continuity of V+D is still limited. The main advantage the MVD representation presents, however, is the ability to easily render intermediate views. To provide this ability, the MVD process includes several complex steps, which are shown in figure 3.10.

Figure 3.10: MVD process. [9]

At the encoder side a total of N color and N depth videos are encoded before being transmitted to the decoder. Previously, at the transmitter, the

depth is estimated for the N views. After transmission, the data is decoded at the receiver and the virtual views are rendered. Regarding the autostereoscopic displays that MVD can efficiently support, figure 3.11 illustrates an example.

Figure 3.11: MVD: example scheme of view synthesis for support of multiview displays. [10]

Here, the display consists of 9 different views (V1, V2, ..., V9), which are shown at the same time. From his/her position (indicated in the figure as Pos1, Pos2 and Pos3) the user sees a stereo pair of these views. Of all these views, only three are available at the decoder as original views (V1, V5 and V9), with their corresponding depth maps D1, D5 and D9. The depth image based rendering process is then applied to obtain the other views and complete the display of the 9 views. Following this method, the process becomes more efficient than using MVC, which would transmit the 9 display views directly. As in previous cases, the gain in efficiency is paid for with an increase in the complexity of the process.

Layered depth video

This representation is derived from MVD and appears as an alternative to it. It is believed to be more suitable than MVD for the 3DTV scenario because it is more compact and less information has to be transmitted. On the other hand, error-prone vision tasks operating on unreliable depth data have to be added. The Layered Depth Video (LDV) format consists of multiple layers, containing information about depth and texture and describing the scene in terms of geometry and color. Specifically, LDV uses one color video and an occlusion

layer, each with its associated depth map. The occlusion layer is characterized by including image content that is occluded by foreground objects in the main layer. Therefore, the idea of LDV is to describe hidden parts of the scene from the view of the reference camera using the additional layers mentioned above. Figures 3.12 and 3.13 represent this concept.

Figure 3.12: Layered depth video concept - before transmission. [27]

Figure 3.13: Layered depth video concept - after transmission. [27]

In this example, the scene is the left image of figure 3.12, where a blue ball is behind a brown ball. The scene is captured and then the foreground layer and the occlusion layer are created. The occlusion information is constructed by warping the neighboring V+D views from the MVD representation. After that, the LDV stream is encoded and transmitted, to finally obtain the center view and render two additional views: from the left and from the right side of the image. In these images the hidden element (the blue ball) is now visible. In figure 3.14, real images representing the different layers of figure 3.12 are presented.

Figure 3.14: Layered depth video before transmission - example with real images. [27]

It is also interesting to observe an example of how the two views are generated. Figure 3.15 shows how the generated images can look: the first picture is the central view with the occlusion layer, and the two images below it are the new views, left and right.

Figure 3.15: Layered depth video after transmission - example with real images. [33]
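The warping-plus-occlusion-filling idea behind this kind of rendering can be sketched in one dimension: main-layer pixels are shifted according to a per-pixel disparity (nearer pixels move more), and the disoccluded positions are filled from the occlusion layer. The image data, the disparities and the simple z-buffer rule are all invented for the example:

```python
# Toy DIBR-style warp in one dimension: pixels of the main layer are shifted
# by a per-pixel disparity derived from depth, and the holes that open up
# behind foreground objects are filled from an occlusion layer.

def render_view(color, disparity, occlusion):
    """Warp a scanline to a virtual view; fill disoccluded samples."""
    out = [None] * len(color)
    depth_buf = [-1] * len(color)        # larger disparity = nearer = wins
    for x, (c, d) in enumerate(zip(color, disparity)):
        tx = x + d                       # target position in the virtual view
        if 0 <= tx < len(out) and d > depth_buf[tx]:
            out[tx], depth_buf[tx] = c, d
    # disocclusions: positions where no source pixel landed
    return [occlusion[x] if v is None else v for x, v in enumerate(out)]

color     = ['B', 'B', 'F', 'F', 'B', 'B']   # B = background, F = foreground
disparity = [0, 0, 2, 2, 0, 0]               # foreground is closer, moves more
occlusion = ['b'] * 6                        # hidden background content

virtual = render_view(color, disparity, occlusion)
print(virtual)
```

The foreground samples move in the virtual view, and the positions they vacate are exactly the ones the occlusion layer was transmitted to fill.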

LDV can be seen as an extension of the V+D format, since that format could be considered the first layer of LDV. Nevertheless, strictly speaking it is derived from MVD. By warping the main layer image onto the other input images, it is possible to generate LDV from MVD: the idea is to determine the parts that are occluded in the main layer image and correspond to the other contributing input images. Once this process is done, all these images are considered residual images, and thus they are transmitted.

Coding standards

The current standardization situation for depth enhanced formats differs from the situation of the standards for video only formats. While coding algorithms are standardized for video only formats, some algorithms for depth enhanced formats are not yet standardized, so more standards have to be developed for these formats. However, two specifications have been elaborated:

MPEG-C part 3 standard (ISO/IEC 23002-3) [11]: already enables V+D, encoding the information into two different streams: video in one and depth in the other. This means video and depth are encoded separately. These two streams are then multiplexed into one stream, frame by frame, together with the depth map parameters. The standard defines the representation of depth maps to be encoded as 2D sequences and the parameters to interpret the depth values at the receiver. On the other hand, the standard does not cover the techniques concerning compression and transport.

H.264/AVC Auxiliary Picture Syntax: also uses the V+D format. In this case, a primary coded picture, the video, is supplemented by an auxiliary coded picture, the depth. The primary coded picture, since it contains all the macroblocks of the picture, is the only one which affects the decoding process. One requirement on primary and auxiliary coded pictures is that both have to contain the same number of macroblocks.
In contrast to MPEG-C part 3, where video and depth are coded independently, here the primary and auxiliary coded pictures are combined into a single source, which is then coded by H.264/AVC to be sent. After transmission, the primary and auxiliary coded pictures are decoded independently at the same time. Apart from these two standards, other extensions exist, specifically MPEG-4 Multiple Auxiliary Component (MAC) and H.264 Scalable Video Coding (SVC). With the first it is possible to encode auxiliary components, so a depth map could be employed. H.264/SVC, which is an extension of H.264/AVC (annex G) [8], provides a compatible base layer and one or more enhancement layers. The base layer has the minimum quality and the enhancement layers represent increased quality, where depth can be an option and be decoded by an SVC decoder. Despite the existence of these standards, as introduced at the beginning of this subsection, more coding algorithms are required for those formats that

are not properly supported, for example MVD. For this reason, and because of the requirements of the market, the Moving Picture Experts Group (MPEG) has initiated the development of a generic 3D video standard. Figure 3.16 illustrates these targets and main ideas:

Figure 3.16: Target of MPEG 3D video coding initiative. [1]

The aim is to support high-quality autostereoscopic displays and to solve problems with varying display types and sizes, adding for example a variable stereo baseline or an adjustment of the depth perception. These are not the only objectives to be achieved; other example objectives include the reduction of the rate requirements and an improved rendering ability.

3.2 Other implementations

In chapter 2 the basic and most common methods and tools used for 3D video signal treatment in the coding process were reviewed. However, these are not the only existing methods, since much research has been done in this field. As a result, other implementations and improvements to known techniques have appeared and can be adopted by some systems. This section presents interesting implementation results of this research, considering factors such as efficiency improvement and computational savings. In addition, real 3D with digital holography is treated in the following subsections.

Computational saving methods

One of the problems in encoding systems is the computational complexity. This is caused by the motion estimation and the disparity estimation, which are

used in MVC when coding each macroblock to provide a high coding efficiency. Some research on the computational complexity of MVC has been done and, as a result, some methods have been proposed to reduce this complexity with a minimal loss of image quality: [12], [13] or [14] are some of them. Since many such methods exist, only one of them, which appears in a recent publication [15], is explained in this subsection:

Early SKIP mode decision: as justified in chapter 2, the prediction modes of a macroblock (MB) in the current view and in its neighboring view are similar due to similarities in the content. This algorithm assumes this fact and takes advantage of it. The first important element is the global disparity vector (GDV), because it helps to locate a suitable MB in the neighboring view. The GDV is measured in units of MB size; however, it is not the exact disparity between both MBs. That is why, to estimate the mode of an MB, the modes of the corresponding MB and its 8 neighboring MBs are taken. Figure 3.17 depicts this concept:

Figure 3.17: MB and its neighboring MBs in the previously coded view. [15]

Then, a weight factor for SKIP mode is necessary to allow discerning when SKIP mode is suitable. For this, the weight factor is compared with a threshold, and when it is larger than the threshold, SKIP mode is considered better; otherwise the MB is assumed to need variable-size motion estimation and disparity estimation. One problem is how to fix the threshold, given that, since video contents are similar between views, the coding modes between views are similar as well. The definition of the weight factor of

SKIP mode for an MB is:

$$W_{SKIP\ mode} = \sum_{i=0}^{N-1} \alpha_i w_i \qquad (3.3)$$

where N is the total number of MBs considered, w_i is the weight of SKIP mode for MB i (MB 0 to 8 in figure 3.17), which is 1 when the corresponding MB is coded in SKIP mode and 0 otherwise, and α_i is the MB weight factor, a parameter that differs depending on the MB. For example, if all α_i were equal, the weight factor would simply be proportional to the number of the nine MBs coded in SKIP mode. An analysis made in [28] shows that the percentage of SKIP mode occurring in B slices is high, between 60% and 90% depending on the video. This means that this percentage of the MBs is usually coded in SKIP mode and, consequently, the computational complexity decreases.

Another interesting method is adaptive early termination, whose strategy is based on a threshold which can be determined by two different types of processes. One consists in using constant values, with the possibility of applying the same threshold independently of the coding conditions. The second, more complex method takes the rate-distortion costs of spatially and temporally neighboring MBs as references and fixes the early termination thresholds for the current MB. Also remarkable are the fast inter-mode size decision, which considers the fact that MBs in homogeneous regions choose large sizes² while MBs with active motion choose small sizes, and the selective intra prediction in inter frames, employed when a new object appears and the motion vector could not be as efficient as using the original data. A possible algorithm could be based on these four approaches, and the experimental results in [15] conclude that the objective of reducing the computational complexity is achieved, always considering that many other methods and algorithms can help the encoding process in complexity terms as well.

Wavelet transformation

The wavelet transform emerged as a useful tool in image and video compression because it offers flexibility when representing non-stationary images.
In addition, the wavelet representation offers a desirable property for video coding applications: a multiresolution expression of a signal. Actually, this multiresolution decomposition by itself provides a scalable bit stream. The wavelet concept, however, requires some mathematical support to be more understandable. Here, the Haar basis in one dimension, which is the simplest wavelet, is introduced. A wavelet decomposition transforms a function into scaling coefficients and detail coefficients. Then, ignoring the coefficients with low value allows lossy

² It refers to mode sizes: 16 x 16, 16 x 8, 8 x 8...

compression of the signal. Assume a number of nested linear function spaces:

$$\nu_0 \subset \nu_1 \subset \nu_2 \subset \ldots \subset \nu_j \qquad (3.4)$$

The dimension of the spaces increases with j. For each space ν_j a set of basis functions, called scaling functions, is defined. When normalized, the following expression generates them:

$$\phi_{j,t}(x) = 2^{j/2}\,\phi\!\left(2^j x - t\right), \quad t = 0, \ldots, 2^j - 1 \qquad (3.5)$$

with

$$\phi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

On the other hand, the wavelet functions span the wavelet spaces, each of which complements ν_j in ν_{j+1}. The inner product between each scaling and wavelet function at the same level is zero. Expression 3.6 gives the wavelet functions:

$$\psi_{j,t}(x) = 2^{j/2}\,\psi\!\left(2^j x - t\right), \quad t = 0, \ldots, 2^j - 1 \qquad (3.6)$$

with

$$\psi(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

Then a hierarchical basis, in which a given function is represented, is made from the scaling and wavelet functions: the wavelet decomposition. Compression can be achieved if the proper scaling and wavelet functions are chosen, which makes it possible to represent the original signal with few coefficients. As in the DCT case, wavelet transforms need complementary processes as well, for example quantization or entropy coding. Wavelet coding techniques can be sorted into different categories. Some of them are:

Spatial-domain motion compensation followed by 2D wavelet transform

Wavelet transform followed by frequency-domain motion compensation

3D wavelet transforms with or without motion estimation

However, one of the most interesting is the 4D wavelet [16], where a 4D matrix of pixels represents a multiview video stream. But wavelet coding is not easy to apply to MVC despite the flexibility it contributes.
The problem is that MVV is high-dimensional (two dimensions for the spatial directions, one dimension for the temporal direction and another one for the view direction) and existing motion-compensated temporal filters are not efficient when exploiting correlations between views, because the redundancies are different: view disparity is not the same as temporal motion.
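One analysis level of the Haar decomposition from equations (3.5) and (3.6) can be sketched directly: normalized pairwise averages give the scaling (approximation) coefficients and normalized pairwise differences give the detail coefficients, and smooth regions produce near-zero details that are cheap to code:

```python
# One level of the Haar wavelet analysis: split a signal into scaling
# coefficients (pairwise averages) and detail coefficients (pairwise
# differences); dropping small details would give lossy compression.
import math

def haar_step(signal):
    """One analysis level; the length of signal must be even."""
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    """Perfectly invert haar_step when no coefficients were discarded."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

x = [4.0, 4.0, 4.0, 4.0, 9.0, 9.0, 1.0, 1.0]
approx, detail = haar_step(x)
print(detail)               # smooth pairs yield zero detail coefficients
rec = haar_inverse(approx, detail)
print([round(v, 6) for v in rec])
```

Recursively applying `haar_step` to the approximation coefficients yields the multiresolution hierarchy described above; the 4D schemes extend this same idea across the spatial, temporal and view directions.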

In addition to the previous problem, some difficulties arise in compressing the 4D wavelet coefficients in an efficient way. 4D wavelet coefficients can be generated with any decomposition structure; for this reason some proposed methods consist in reorganizing these coefficients into 3D data. Nevertheless, by finding a solution to all these issues, as is done in [16], it is possible to construct a suitable 4D-wavelet-based MVC.

Real 3D - Digital holography

Holography is a technique from the branch of optics that uses a laser and its coherent light to construct a hologram which can create a 3D image. Hence, this image behaves as if the object were present, changing as the viewing position changes³, which is called motion parallax. An improvement is the possibility to show videos on a holographic volumetric display. As a consequence, digital holography, where holograms are digitally represented (i.e. they can be processed, analyzed and transmitted electronically), becomes an important topic inside the field of 3D displays. Specific hologram sequence compression techniques are needed, although some investigations have been done, for example using the MPEG-4 part 2 video coding algorithm for the compression of hologram sequences [36]. Digital holography is defined as the technology of acquiring and processing holographic measurement data, typically via a Charge Coupled Device (CCD)⁴ camera or a similar device. This means digital holography treats the data in order to reconstruct the object data from the recorded measurement data. More generally, it can be seen as a 3D technique for capturing real-world objects. Figure 3.18 shows an example of a 3D holography image⁵ and [17] is an interesting demonstration of a 3D video.

Figure 3.18: 3D holography image

³ It should be noted that motion parallax can be introduced without holographic images as well.
⁴ Definition extracted from Wikipedia.
⁵ Extracted from:

Next, the general principles of digital holography are detailed. For more extensive information about holography and digital holography, see [18]. The recording process is illustrated in figure 3.19. The recording medium, a CCD, receives two light waves to record the hologram: one is the plane reference wave, the other the wave reflected from the object. Since both waves arrive at the surface of the CCD, the interference between them is recorded. After that, the resulting hologram is electronically recorded and stored.

Figure 3.19: Recording with digital holography. [18]

When reconstructing, several methods can be chosen, such as the numerical hologram reconstruction, where intensity and phase are calculated. The basis is given by equation 3.7:

$$\Gamma(\xi, \eta) = \frac{i}{\lambda} \iint h(x, y)\, E_R(x, y)\, \frac{\exp\!\left(-i \frac{2\pi}{\lambda} \rho\right)}{\rho}\, dx\, dy \qquad (3.7)$$

where

$$\rho = \sqrt{(x - \xi)^2 + (y - \eta)^2 + d^2} \qquad (3.8)$$

and h(x, y) is the hologram function and ρ the distance between points in the hologram plane and in the reconstruction plane. The scheme in figure 3.20 specifies the geometrical quantities which appear in the above equations.

Figure 3.20: Coordinate system of numerical reconstruction. [18]
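A brute-force discretization of equations (3.7) and (3.8) can be written down directly as a sum over hologram pixels. All the physical parameters and the hologram samples below are invented toy values, and practical reconstructions use FFT-based Fresnel approximations rather than this O(N⁴) sum; the sketch only shows the structure of the computation:

```python
# Direct evaluation of the reconstruction integral (3.7) with the distance
# (3.8) on a tiny grid. Hologram values, reference wave, pixel pitch and the
# reconstruction distance are all hypothetical toy numbers.
import cmath, math

WAVELENGTH = 0.633e-6          # He-Ne laser wavelength in meters
D          = 0.5               # hologram-to-reconstruction distance in meters
PITCH      = 10e-6             # CCD pixel pitch in meters
N          = 8                 # grid size (tiny, for illustration only)

# made-up hologram samples h(x, y) and a unit plane reference wave E_R
h   = [[(i * j) % 3 / 2.0 for j in range(N)] for i in range(N)]
E_R = 1.0 + 0.0j

def gamma(xi, eta):
    """Discretized version of equation (3.7): a sum over hologram pixels."""
    total = 0.0 + 0.0j
    for i in range(N):
        for j in range(N):
            x, y = i * PITCH, j * PITCH
            rho = math.sqrt((x - xi) ** 2 + (y - eta) ** 2 + D ** 2)  # (3.8)
            total += h[i][j] * E_R * cmath.exp(-1j * 2 * math.pi * rho
                                               / WAVELENGTH) / rho
    return (1j / WAVELENGTH) * total * PITCH * PITCH   # dx dy

field = gamma(0.0, 0.0)
intensity = abs(field) ** 2    # what a display of the reconstruction shows
phase = cmath.phase(field)     # also directly available in digital holography
print(intensity >= 0.0)
```

Because Γ is computed as a complex field, both the intensity and the phase of the reconstruction are available, which is precisely what distinguishes numerical reconstruction from classical optical replay.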


Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

5LSH0 Advanced Topics Video & Analysis

5LSH0 Advanced Topics Video & Analysis 1 Multiview 3D video / Outline 2 Advanced Topics Multimedia Video (5LSH0), Module 02 3D Geometry, 3D Multiview Video Coding & Rendering Peter H.N. de With, Sveta Zinger & Y. Morvan ( p.h.n.de.with@tue.nl

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

Video Compression An Introduction

Video Compression An Introduction Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays

Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays Anthony Vetro, Sehoon Yea, Matthias Zwicker, Wojciech Matusik, Hanspeter

More information

Advanced Video Coding: The new H.264 video compression standard

Advanced Video Coding: The new H.264 video compression standard Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission

More information

Part 1 of 4. MARCH

Part 1 of 4. MARCH Presented by Brought to You by Part 1 of 4 MARCH 2004 www.securitysales.com A1 Part1of 4 Essentials of DIGITAL VIDEO COMPRESSION By Bob Wimmer Video Security Consultants cctvbob@aol.com AT A GLANCE Compression

More information

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK Professor Laurence S. Dooley School of Computing and Communications Milton Keynes, UK How many bits required? 2.4Mbytes 84Kbytes 9.8Kbytes 50Kbytes Data Information Data and information are NOT the same!

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

Development and optimization of coding algorithms for mobile 3DTV. Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci

Development and optimization of coding algorithms for mobile 3DTV. Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci Development and optimization of coding algorithms for mobile 3DTV Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci Project No. 216503 Development and optimization of coding algorithms

More information

Tech Note - 05 Surveillance Systems that Work! Calculating Recorded Volume Disk Space

Tech Note - 05 Surveillance Systems that Work! Calculating Recorded Volume Disk Space Tech Note - 05 Surveillance Systems that Work! Surveillance Systems Calculating required storage drive (disk space) capacity is sometimes be a rather tricky business. This Tech Note is written to inform

More information

Module 7 VIDEO CODING AND MOTION ESTIMATION

Module 7 VIDEO CODING AND MOTION ESTIMATION Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five

More information

Interframe coding A video scene captured as a sequence of frames can be efficiently coded by estimating and compensating for motion between frames pri

Interframe coding A video scene captured as a sequence of frames can be efficiently coded by estimating and compensating for motion between frames pri MPEG MPEG video is broken up into a hierarchy of layer From the top level, the first layer is known as the video sequence layer, and is any self contained bitstream, for example a coded movie. The second

More information

Multi-View Video Transmission over the Internet

Multi-View Video Transmission over the Internet Institutionen för Systemteknik Department of Electrical Engineering Master Thesis Multi-View Video Transmission over the Internet By Abdullah Jan Mirza Mahmood Fateh Ahsan LiTH-ISY-EX - - 10/4409 - - SE

More information

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11 INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11 CODING OF MOVING PICTRES AND ASSOCIATED ADIO ISO-IEC/JTC1/SC29/WG11 MPEG 95/ July 1995

More information

Multimedia Technology CHAPTER 4. Video and Animation

Multimedia Technology CHAPTER 4. Video and Animation CHAPTER 4 Video and Animation - Both video and animation give us a sense of motion. They exploit some properties of human eye s ability of viewing pictures. - Motion video is the element of multimedia

More information

Digital video coding systems MPEG-1/2 Video

Digital video coding systems MPEG-1/2 Video Digital video coding systems MPEG-1/2 Video Introduction What is MPEG? Moving Picture Experts Group Standard body for delivery of video and audio. Part of ISO/IEC/JTC1/SC29/WG11 150 companies & research

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro TR2006-035 April 2006 Abstract

More information

View Synthesis Prediction for Rate-Overhead Reduction in FTV

View Synthesis Prediction for Rate-Overhead Reduction in FTV MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com View Synthesis Prediction for Rate-Overhead Reduction in FTV Sehoon Yea, Anthony Vetro TR2008-016 June 2008 Abstract This paper proposes the

More information

3D Video Formats and Coding Standards

3D Video Formats and Coding Standards 3D Video Formats and Coding Standards Fraunhofer Institute for Telecommunications Heinrich-Hertz-Institut Berlin Einsteinufer 37 10587 Berlin Germany +49 30 310 02 0 info@hhi.fraunhofer.de http://www.hhi.fraunhofer.de

More information

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression IMAGE COMPRESSION Image Compression Why? Reducing transportation times Reducing file size A two way event - compression and decompression 1 Compression categories Compression = Image coding Still-image

More information

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A.

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A. An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering Rensselaer

More information

In the name of Allah. the compassionate, the merciful

In the name of Allah. the compassionate, the merciful In the name of Allah the compassionate, the merciful Digital Video Systems S. Kasaei Room: CE 315 Department of Computer Engineering Sharif University of Technology E-Mail: skasaei@sharif.edu Webpage:

More information

Very Low Bit Rate Color Video

Very Low Bit Rate Color Video 1 Very Low Bit Rate Color Video Coding Using Adaptive Subband Vector Quantization with Dynamic Bit Allocation Stathis P. Voukelatos and John J. Soraghan This work was supported by the GEC-Marconi Hirst

More information

Next-Generation 3D Formats with Depth Map Support

Next-Generation 3D Formats with Depth Map Support MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Next-Generation 3D Formats with Depth Map Support Chen, Y.; Vetro, A. TR2014-016 April 2014 Abstract This article reviews the most recent extensions

More information

A real-time SNR scalable transcoder for MPEG-2 video streams

A real-time SNR scalable transcoder for MPEG-2 video streams EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science A real-time SNR scalable transcoder for MPEG-2 video streams by Mohammad Al-khrayshah Supervisors: Prof. J.J. Lukkien Eindhoven

More information

Video Transcoding Architectures and Techniques: An Overview. IEEE Signal Processing Magazine March 2003 Present by Chen-hsiu Huang

Video Transcoding Architectures and Techniques: An Overview. IEEE Signal Processing Magazine March 2003 Present by Chen-hsiu Huang Video Transcoding Architectures and Techniques: An Overview IEEE Signal Processing Magazine March 2003 Present by Chen-hsiu Huang Outline Background & Introduction Bit-rate Reduction Spatial Resolution

More information

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France Video Compression Zafar Javed SHAHID, Marc CHAUMONT and William PUECH Laboratoire LIRMM VOODDO project Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier LIRMM UMR 5506 Université

More information

Depth Estimation for View Synthesis in Multiview Video Coding

Depth Estimation for View Synthesis in Multiview Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Depth Estimation for View Synthesis in Multiview Video Coding Serdar Ince, Emin Martinian, Sehoon Yea, Anthony Vetro TR2007-025 June 2007 Abstract

More information

JPEG 2000 vs. JPEG in MPEG Encoding

JPEG 2000 vs. JPEG in MPEG Encoding JPEG 2000 vs. JPEG in MPEG Encoding V.G. Ruiz, M.F. López, I. García and E.M.T. Hendrix Dept. Computer Architecture and Electronics University of Almería. 04120 Almería. Spain. E-mail: vruiz@ual.es, mflopez@ace.ual.es,

More information

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Yongying Gao and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 email:

More information

About MPEG Compression. More About Long-GOP Video

About MPEG Compression. More About Long-GOP Video About MPEG Compression HD video requires significantly more data than SD video. A single HD video frame can require up to six times more data than an SD frame. To record such large images with such a low

More information

Lecture 14, Video Coding Stereo Video Coding

Lecture 14, Video Coding Stereo Video Coding Lecture 14, Video Coding Stereo Video Coding A further application of the tools we saw (particularly the motion compensation and prediction) is stereo video coding. Stereo video is used for creating a

More information

Megapixel Video for. Part 2 of 4. Brought to You by. Presented by Video Security Consultants

Megapixel Video for. Part 2 of 4. Brought to You by. Presented by Video Security Consultants rought to You by 2009 Video Security Consultants Presented by Part 2 of 4 A1 Part 2 of 4 How to Avert a Compression Depression Illustration by Jerry King While bandwidth is widening, larger video systems

More information

Motion Estimation. Original. enhancement layers. Motion Compensation. Baselayer. Scan-Specific Entropy Coding. Prediction Error.

Motion Estimation. Original. enhancement layers. Motion Compensation. Baselayer. Scan-Specific Entropy Coding. Prediction Error. ON VIDEO SNR SCALABILITY Lisimachos P. Kondi, Faisal Ishtiaq and Aggelos K. Katsaggelos Northwestern University Dept. of Electrical and Computer Engineering 2145 Sheridan Road Evanston, IL 60208 E-Mail:

More information

Video Compression MPEG-4. Market s requirements for Video compression standard

Video Compression MPEG-4. Market s requirements for Video compression standard Video Compression MPEG-4 Catania 10/04/2008 Arcangelo Bruna Market s requirements for Video compression standard Application s dependent Set Top Boxes (High bit rate) Digital Still Cameras (High / mid

More information

Multimedia Standards

Multimedia Standards Multimedia Standards SS 2017 Lecture 5 Prof. Dr.-Ing. Karlheinz Brandenburg Karlheinz.Brandenburg@tu-ilmenau.de Contact: Dipl.-Inf. Thomas Köllmer thomas.koellmer@tu-ilmenau.de 1 Organisational issues

More information

MPEG: It s Need, Evolution and Processing Methods

MPEG: It s Need, Evolution and Processing Methods MPEG: It s Need, Evolution and Processing Methods Ankit Agarwal, Prateeksha Suwalka, Manohar Prajapati ECE DEPARTMENT, Baldev Ram mirdha institute of technology (EC) ITS- 3,EPIP SItapura, Jaipur-302022(India)

More information

Motion Estimation for Video Coding Standards

Motion Estimation for Video Coding Standards Motion Estimation for Video Coding Standards Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Introduction of Motion Estimation The goal of video compression

More information

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology Course Presentation Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology Image Compression Basics Large amount of data in digital images File size

More information

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) 1 Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering

More information

Scalable Multiresolution Video Coding using Subband Decomposition

Scalable Multiresolution Video Coding using Subband Decomposition 1 Scalable Multiresolution Video Coding using Subband Decomposition Ulrich Benzler Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung Universität Hannover Appelstr. 9A, D 30167 Hannover

More information

Developing a Multimedia Toolbox for the Khoros System. Yuh-Lin Chang. Rafael Alonso. Matsushita Information Technology Laboratory

Developing a Multimedia Toolbox for the Khoros System. Yuh-Lin Chang. Rafael Alonso. Matsushita Information Technology Laboratory Developing a Multimedia Toolbox for the Khoros System Yuh-Lin Chang Rafael Alonso Matsushita Information Technology Laboratory Panasonic Technologies, Inc. Two Research Way Princeton, NJ 08540, USA fyuhlin,alonsog@mitl.research.panasonic.com

More information

Compression of Stereo Images using a Huffman-Zip Scheme

Compression of Stereo Images using a Huffman-Zip Scheme Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract

More information

High Efficiency Video Coding. Li Li 2016/10/18

High Efficiency Video Coding. Li Li 2016/10/18 High Efficiency Video Coding Li Li 2016/10/18 Email: lili90th@gmail.com Outline Video coding basics High Efficiency Video Coding Conclusion Digital Video A video is nothing but a number of frames Attributes

More information

A New Data Format for Multiview Video

A New Data Format for Multiview Video A New Data Format for Multiview Video MEHRDAD PANAHPOUR TEHRANI 1 AKIO ISHIKAWA 1 MASASHIRO KAWAKITA 1 NAOMI INOUE 1 TOSHIAKI FUJII 2 This paper proposes a new data forma that can be used for multiview

More information

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING 1 Michal Joachimiak, 2 Kemal Ugur 1 Dept. of Signal Processing, Tampere University of Technology, Tampere, Finland 2 Jani Lainema,

More information

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks 2011 Wireless Advanced On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks S. Colonnese, F. Cuomo, O. Damiano, V. De Pascalis and T. Melodia University of Rome, Sapienza, DIET,

More information

Lecture 6: Compression II. This Week s Schedule

Lecture 6: Compression II. This Week s Schedule Lecture 6: Compression II Reading: book chapter 8, Section 1, 2, 3, 4 Monday This Week s Schedule The concept behind compression Rate distortion theory Image compression via DCT Today Speech compression

More information

Automatic 2D-to-3D Video Conversion Techniques for 3DTV

Automatic 2D-to-3D Video Conversion Techniques for 3DTV Automatic 2D-to-3D Video Conversion Techniques for 3DTV Dr. Lai-Man Po Email: eelmpo@cityu.edu.hk Department of Electronic Engineering City University of Hong Kong Date: 13 April 2010 Content Why 2D-to-3D

More information

Video Quality Analysis for H.264 Based on Human Visual System

Video Quality Analysis for H.264 Based on Human Visual System IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021 ISSN (p): 2278-8719 Vol. 04 Issue 08 (August. 2014) V4 PP 01-07 www.iosrjen.org Subrahmanyam.Ch 1 Dr.D.Venkata Rao 2 Dr.N.Usha Rani 3 1 (Research

More information

Lecture 5: Error Resilience & Scalability

Lecture 5: Error Resilience & Scalability Lecture 5: Error Resilience & Scalability Dr Reji Mathew A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S 010 jzhang@cse.unsw.edu.au Outline Error Resilience Scalability Including slides

More information

Compressed-Domain Video Processing and Transcoding

Compressed-Domain Video Processing and Transcoding Compressed-Domain Video Processing and Transcoding Susie Wee, John Apostolopoulos Mobile & Media Systems Lab HP Labs Stanford EE392J Lecture 2006 Hewlett-Packard Development Company, L.P. The information

More information

MPEG-2 standard and beyond

MPEG-2 standard and beyond Table of Content MPEG-2 standard and beyond O. Le Meur olemeur@irisa.fr Univ. of Rennes 1 http://www.irisa.fr/temics/staff/lemeur/ November 18, 2009 1 Table of Content MPEG-2 standard 1 A brief history

More information

HAMED SARBOLANDI SIMULTANEOUS 2D AND 3D VIDEO RENDERING Master s thesis

HAMED SARBOLANDI SIMULTANEOUS 2D AND 3D VIDEO RENDERING Master s thesis HAMED SARBOLANDI SIMULTANEOUS 2D AND 3D VIDEO RENDERING Master s thesis Examiners: Professor Moncef Gabbouj M.Sc. Payman Aflaki Professor Lauri Sydanheimo Examiners and topic approved by the Faculty Council

More information

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing ĐẠI HỌC QUỐC GIA TP.HỒ CHÍ MINH TRƯỜNG ĐẠI HỌC BÁCH KHOA KHOA ĐIỆN-ĐIỆN TỬ BỘ MÔN KỸ THUẬT ĐIỆN TỬ VIDEO AND IMAGE PROCESSING USING DSP AND PFGA Chapter 3: Video Processing 3.1 Video Formats 3.2 Video

More information

Final report on coding algorithms for mobile 3DTV. Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin

Final report on coding algorithms for mobile 3DTV. Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin Final report on coding algorithms for mobile 3DTV Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin MOBILE3DTV Project No. 216503 Final report on coding algorithms for mobile 3DTV Gerhard

More information

Mesh Based Interpolative Coding (MBIC)

Mesh Based Interpolative Coding (MBIC) Mesh Based Interpolative Coding (MBIC) Eckhart Baum, Joachim Speidel Institut für Nachrichtenübertragung, University of Stuttgart An alternative method to H.6 encoding of moving images at bit rates below

More information

Image and video processing

Image and video processing Image and video processing Digital video Dr. Pengwei Hao Agenda Digital video Video compression Video formats and codecs MPEG Other codecs Web video - 2 - Digital Video Until the arrival of the Pentium

More information

Department of Computer Science University of Western Cape Computer Science Honours Project: Anaglyph Video By: Jihaad Pienaar Supervisors: Mehrdad

Department of Computer Science University of Western Cape Computer Science Honours Project: Anaglyph Video By: Jihaad Pienaar Supervisors: Mehrdad Department of Computer Science University of Western Cape Computer Science Honours Project: Anaglyph Video By: Jihaad Pienaar Supervisors: Mehrdad Ghaziasgar and James Connan Glossary Depth Map Stereo

More information

The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map

The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map Sriram Sethuraman 1 and M. W. Siegel 2 1 David Sarnoff Research Center, Princeton,

More information

The Core Technology of Digital TV

The Core Technology of Digital TV the Japan-Vietnam International Student Seminar on Engineering Science in Hanoi The Core Technology of Digital TV Kosuke SATO Osaka University sato@sys.es.osaka-u.ac.jp November 18-24, 2007 What is compression

More information

Segmentation based coding of depth Information for 3D video

Segmentation based coding of depth Information for 3D video Department of Signal Theory and Communications M.Sc. dissertation Segmentation based coding of depth Information for 3D video Author: Payman Aflaki Beni Advisor: Professor Javier Ruiz Hidalgo Barcelona,

More information

The Scope of Picture and Video Coding Standardization

The Scope of Picture and Video Coding Standardization H.120 H.261 Video Coding Standards MPEG-1 and MPEG-2/H.262 H.263 MPEG-4 H.264 / MPEG-4 AVC Thomas Wiegand: Digital Image Communication Video Coding Standards 1 The Scope of Picture and Video Coding Standardization

More information

New Techniques for Improved Video Coding

New Techniques for Improved Video Coding New Techniques for Improved Video Coding Thomas Wiegand Fraunhofer Institute for Telecommunications Heinrich Hertz Institute Berlin, Germany wiegand@hhi.de Outline Inter-frame Encoder Optimization Texture

More information

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn Basic Video Compression Techniques Chapter 10 10.1 Introduction to Video Compression

More information

3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013

3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3D High-Efficiency Video Coding for Multi-View Video and Depth Data Karsten Müller, Senior Member, IEEE, Heiko Schwarz, Detlev

More information

Low Complexity Multiview Video Coding

Low Complexity Multiview Video Coding Low Complexity Multiview Video Coding Shadan Khattak Faculty of Technology De Montfort University A thesis submitted for the degree of Doctor of Philosophy April 2014 To my family. Abstract 3D video is

More information

Video Compression Standards (II) A/Prof. Jian Zhang

Video Compression Standards (II) A/Prof. Jian Zhang Video Compression Standards (II) A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2009 jzhang@cse.unsw.edu.au Tutorial 2 : Image/video Coding Techniques Basic Transform coding Tutorial

More information

Prof. Feng Liu. Spring /27/2014

Prof. Feng Liu. Spring /27/2014 Prof. Feng Liu Spring 2014 http://www.cs.pdx.edu/~fliu/courses/cs510/ 05/27/2014 Last Time Video Stabilization 2 Today Stereoscopic 3D Human depth perception 3D displays 3 Stereoscopic media Digital Visual

More information

Motion-Compensated Subband Coding. Patrick Waldemar, Michael Rauth and Tor A. Ramstad

Motion-Compensated Subband Coding. Patrick Waldemar, Michael Rauth and Tor A. Ramstad Video Compression by Three-dimensional Motion-Compensated Subband Coding Patrick Waldemar, Michael Rauth and Tor A. Ramstad Department of telecommunications, The Norwegian Institute of Technology, N-7034

More information

Lecture 5: Compression I. This Week s Schedule

Lecture 5: Compression I. This Week s Schedule Lecture 5: Compression I Reading: book chapter 6, section 3 &5 chapter 7, section 1, 2, 3, 4, 8 Today: This Week s Schedule The concept behind compression Rate distortion theory Image compression via DCT

More information

Scalable Extension of HEVC 한종기

Scalable Extension of HEVC 한종기 Scalable Extension of HEVC 한종기 Contents 0. Overview for Scalable Extension of HEVC 1. Requirements and Test Points 2. Coding Gain/Efficiency 3. Complexity 4. System Level Considerations 5. Related Contributions

More information

Representation and coding of 3D video data

Representation and coding of 3D video data Projet PERSEE Schémas Perceptuels et Codage vidéo 2D et 3D n ANR-09-BLAN-0170 Livrable D4.1 17/11/2010 Representation and coding of 3D video data Josselin GAUTIER IRISA Emilie BOSC INSA Luce MORIN INSA

More information

View Generation for Free Viewpoint Video System

View Generation for Free Viewpoint Video System View Generation for Free Viewpoint Video System Gangyi JIANG 1, Liangzhong FAN 2, Mei YU 1, Feng Shao 1 1 Faculty of Information Science and Engineering, Ningbo University, Ningbo, 315211, China 2 Ningbo

More information

Introduction of Video Codec

Introduction of Video Codec Introduction of Video Codec Min-Chun Hu anita_hu@mail.ncku.edu.tw MISLab, R65601, CSIE New Building 3D Augmented Reality and Interactive Sensor Technology, 2015 Fall The Need for Video Compression High-Definition

More information

A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND THREE STEP SEARCH ALGORITHM FOR P-FRAME

A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND THREE STEP SEARCH ALGORITHM FOR P-FRAME A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND THREE STEP SEARCH ALGORITHM FOR P-FRAME Dr. Mohammed H. Al-Jammas 1 and Mrs. Noor N. Hamdoon 2 1 Deputy Dean/College of Electronics

More information

EE 5359 H.264 to VC 1 Transcoding

EE 5359 H.264 to VC 1 Transcoding EE 5359 H.264 to VC 1 Transcoding Vidhya Vijayakumar Multimedia Processing Lab MSEE, University of Texas @ Arlington vidhya.vijayakumar@mavs.uta.edu Guided by Dr.K.R. Rao Goals Goals The goal of this project

More information

Introduction to Video Encoding

Introduction to Video Encoding Introduction to Video Encoding INF5063 23. September 2011 History of MPEG Motion Picture Experts Group MPEG1 work started in 1988, published by ISO in 1993 Part 1 Systems, Part 2 Video, Part 3 Audio, Part

More information

AUDIO AND VIDEO COMMUNICATION MEEC EXERCISES. (with abbreviated solutions) Fernando Pereira

AUDIO AND VIDEO COMMUNICATION MEEC EXERCISES. (with abbreviated solutions) Fernando Pereira AUDIO AND VIDEO COMMUNICATION MEEC EXERCISES (with abbreviated solutions) Fernando Pereira INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Electrotécnica e de Computadores September 2014 1. Photographic

More information

Rate Distortion Optimization in Video Compression

Rate Distortion Optimization in Video Compression Rate Distortion Optimization in Video Compression Xue Tu Dept. of Electrical and Computer Engineering State University of New York at Stony Brook 1. Introduction From Shannon s classic rate distortion

More information

Georgios Tziritas Computer Science Department

Georgios Tziritas Computer Science Department New Video Coding standards MPEG-4, HEVC Georgios Tziritas Computer Science Department http://www.csd.uoc.gr/~tziritas 1 MPEG-4 : introduction Motion Picture Expert Group Publication 1998 (Intern. Standardization

More information

Recent, Current and Future Developments in Video Coding

Recent, Current and Future Developments in Video Coding Recent, Current and Future Developments in Video Coding Jens-Rainer Ohm Inst. of Commun. Engineering Outline Recent and current activities in MPEG Video and JVT Scalable Video Coding Multiview Video Coding

More information

TECHNICAL RESEARCH REPORT

TECHNICAL RESEARCH REPORT TECHNICAL RESEARCH REPORT An Advanced Image Coding Algorithm that Utilizes Shape- Adaptive DCT for Providing Access to Content by R. Haridasan CSHCN T.R. 97-5 (ISR T.R. 97-16) The Center for Satellite

More information

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY 2015 1573 Graph-Based Representation for Multiview Image Geometry Thomas Maugey, Member, IEEE, Antonio Ortega, Fellow Member, IEEE, and Pascal

More information

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy JPEG JPEG Joint Photographic Expert Group Voted as international standard in 1992 Works with color and grayscale images, e.g., satellite, medical,... Motivation: The compression ratio of lossless methods

More information

Compression of Light Field Images using Projective 2-D Warping method and Block matching

Compression of Light Field Images using Projective 2-D Warping method and Block matching Compression of Light Field Images using Projective 2-D Warping method and Block matching A project Report for EE 398A Anand Kamat Tarcar Electrical Engineering Stanford University, CA (anandkt@stanford.edu)

More information

Advanced Encoding Features of the Sencore TXS Transcoder

Advanced Encoding Features of the Sencore TXS Transcoder Advanced Encoding Features of the Sencore TXS Transcoder White Paper November 2011 Page 1 (11) www.sencore.com 1.605.978.4600 Revision 1.0 Document Revision History Date Version Description Author 11/7/2011

More information