
Institutionen för systemteknik
Department of Electrical Engineering

Final Degree Project

Overview of 3D Video: Coding Algorithms, Implementations and Standardization

Final Degree Project performed in Information Coding
by Rubén Berzosa Calpe

Linköping, July 2011

Tekniska högskolan, Linköpings universitet
Department of Electrical Engineering
Linköping University
Linköping, Sweden

Linköpings tekniska högskola
Institutionen för systemteknik
Linköping

Overview of 3D Video: Coding Algorithms, Implementations and Standardization

Master thesis in Information Coding
at Linköping Institute of Technology
by Rubén Berzosa Calpe

LiTH-ISY-EX--YY/XXXX--SE


Abstract

3D technologies have attracted great interest around the world in recent years. Television, cinema and videogames are introducing 3D technologies, little by little, into the mass market. This comes as a result of the research done in the 3D field, which has solved many of its limitations, such as quality, content creation or 3D displays. This thesis focuses on 3D video, considering concepts that concern coding issues and video formats. The aim is to provide an overview of the current state of 3D video, including standardization and some interesting implementations and alternatives that exist. The report presents the background information necessary to understand the concepts developed: compression techniques, the different video formats, their standardization, and some advances on and alternatives to the processes previously explained. Finally, a comparison between the different concepts is presented to complete the overview, ending with some conclusions and proposed ideas for future work.


Acknowledgements

This thesis completes my Master degree in Electrical Engineering and has been carried out at the Information Coding Division of the Electrical Engineering Department at Linköpings universitet. I would like to thank my supervisor Jens Ogniewski, who, during the development of this thesis, has dedicated a lot of time to helping me and providing me with valuable advice, even on the written report. I can say I have finished this thesis, and learned a great deal doing it, because of his dedication. I also appreciate the opportunity Robert Forchheimer has given me to carry out my thesis at Information Coding. Living in Linköping for six months has been one of the best experiences I have ever had, something I will never forget. I have grown as a person, learning many things life had not yet taught me. The new situations and feelings I have experienced have enriched me in a way I would never have imagined. The fantastic people I have met here are undoubtedly one of the best things I take with me. Moreover, I would like to mention all the people in my life. My friends, who are always there when I need them and with whom I have shared so many good moments. You make me feel good when I am with you and you deserve to be included in these lines. You can be sure that I do not forget any of you. And finally, I would like to express my most special gratitude to my family. My parents, who have supported me all my life, in good times and bad, making every effort to bring out the best in me and do the best for me. And my sister, who is a special person in my life and has always been an example for me. Without you three, this thesis would never have become a reality and I would never have come this far. It has been a long way, on which you have accompanied me, enjoying my happy days and sharing my sad days. For this reason, this thesis is yours.


Contents

Acknowledgements

1 Introduction
  1.1 Background
  1.2 Thesis content
  1.3 Thesis outline

2 Basic coding issues
  2.1 Spatial prediction
    2.1.1 Discrete cosine transform
  2.2 Temporal prediction
  2.3 Inter-view prediction
    2.3.1 Inter-view prediction for key pictures
    2.3.2 Inter-view prediction for non key pictures

3 Specific implementations
  3.1 3D Video formats
    3.1.1 Video only formats: Conventional Stereo video; Multiview video; Coding standards
    3.1.2 Depth enhanced formats: Video plus depth; Multiview video plus depth; Layered depth video; Coding standards
  3.2 Other implementations: Computational saving methods; Wavelet transformation; Real 3D - Digital holography

4 Comparison
  4.1 Discrete Cosine Transform vs Wavelet Transform
  4.2 Video only formats, Depth enhanced formats and Holography
  4.3 Spatial, temporal and inter-view prediction

5 Conclusions
  5.1 Conclusions
  5.2 Future work

Bibliography

Chapter 1

Introduction

3D (1) video technologies in cinema, television and videogames have recently generated increasing interest around the world, paving the way for a great deal of research activity in 3D areas. This has resulted in many improvements in the 3D field, solving many of its limitations such as quality, content creation or 3D displays. It has to be noted, though, that 3D technology has been known since the 19th century. Consequently, 3D technology is, little by little, arriving at cinemas and at home through the different available channels: terrestrial broadcast, cable, Blu-ray disc or streaming. Some examples are the successful film Avatar, the football World Cup 2010 broadcast in 3D, the NVIDIA 3D Vision system with domestic and professional applications, or, in the videogames world, the new Nintendo 3DS and the PlayStation 3 with 3D videogames. However, it is required that all parts of the 3D processing chain are prepared, specifically acquisition, coding and display. One of the major issues is coding the information, due to the huge amount of data generated when recording several views and the possibility of adding depth data. That is why advances in coding techniques, and even standards for Multi-View Video (MVV), have been developed [1], helping the introduction of these systems to the 3D market and achieving good quality-compression compromises. In addition, 3D video can be presented in different representations, which can be divided into several classes: those that include depth data, 4D wavelets, object-model-based formats or video only formats are some examples. Video only formats are the most common formats nowadays and have their corresponding standards. Stereoscopic technology (2) is currently the format widely used on many platforms. On the other hand, formats with depth images may be the next generation to be used for 3D video. It seems, hence, that a lot of formats, techniques and standards exist or still have to be developed for 3D video applications.
(1) "3D" refers to three dimensions: width, length and depth.
(2) This technique creates the illusion of depth in an image through two different video streams, one for each eye.

For this reason, it is interesting to

write a thesis to expose and analyze the topic and its current situation. Finally, the properties of the human visual system are the key to how the previously mentioned systems work; consequently, different application scenarios and requirements concerning 3D exist. These topics are clarified below.

1.1 Background

When dealing with video and the compression of its signal, knowing what the human visual system is and how it works is always useful, since it helps to understand whether the signal is represented in a correct form or whether some visible information is removed. How details, both spatial and temporal, are perceived in video depends on the visual system. The visual system starts working when light arrives on the retina, generating a stimulation which leads to perception. The photoreceptors of the retina are rods and cones, and each of these types has its own properties and characteristics, providing different information:

Rods: there are roughly 120 million rods, uniformly distributed in the retina. Their connection is parallel and they are achromatic, being sensitive to intensity. Vision with these photoreceptor cells is scotopic, which means they work under low illumination. This also results in another characteristic: it is a peripheral visual perception.

Cones: there are about 6-7 million, not uniformly distributed and concentrated in the fovea. They are color sensitive; specifically, three types of cones exist: red-sensitive (more than 60%), green-sensitive (about 30%) and blue-sensitive (only around 5%). They provide photopic vision (3) and the details, working well under high illumination conditions.

More information about photoreceptors, eye structure and graphics such as the spatial distribution of rods and cones in the retina is available in [2]. Because there are more rods than cones, the eye is more sensitive to intensity changes than to color changes.
Apart from this assessment, all the previous information is enough to justify the use of the YCbCr (4) format instead of RGB (5), since it matches the human visual perception better. However, there are more aspects of the eye that must be taken into account. One of them is the persistence of vision, which occurs because the eye retains the displayed picture for a short time, affecting the frame rate in videos. Another important property is spatial integration, which is related to spatial frequencies and consists of averaging details when the eye cannot discriminate them. Therefore, it is essential to know the human visual system and its properties in order to exploit the different redundancies and create proper video formats.

(3) Vision of the eye under well-lit conditions.
(4) Y provides the brightness and Cb and Cr the chrominance.
(5) Red, Green and Blue. It refers to the color composition made from the primary colors.
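To make the colour-space argument concrete, here is a minimal sketch of an RGB-to-YCbCr conversion. It assumes BT.601 luma coefficients with offset-128 chroma; actual standards differ in exact constants and value ranges:

```python
# Sketch: one-pixel RGB -> YCbCr conversion (BT.601 coefficients assumed,
# full-range values, offset-128 chroma). Y is a weighted intensity, which
# matches the eye's higher sensitivity to brightness than to colour.

def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luma
    cb = 128 + 0.564 * (b - y)               # blue-difference chroma
    cr = 128 + 0.713 * (r - y)               # red-difference chroma
    return y, cb, cr

# A neutral grey carries no chroma: Cb and Cr sit at the 128 midpoint.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```

Because most of the perceptually important detail ends up in Y, the chroma channels can be subsampled (as in 4:2:0) with little visible loss, which is exactly the redundancy exploitation discussed above.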

But there is another subject concerning 3D technology which it is important to focus on before beginning the concepts of the following chapters: application scenarios. 3D video is becoming a reality with different applications, helped by Multiview Video Coding (MVC) techniques and the advances in displays and acquisition. These applications can be grouped into different categories:

Free viewpoint video (FVV): the viewer can choose the viewpoint in the 3D space he/she wants to observe. So it is possible to see the scene from different perspectives, navigating within a certain range.

Three dimensional TV (3DTV): this is an extension of stereoscopic technology. Several cameras capture the light field of the scene. Then, depending on the system, it delivers one stereoscopic view or several stereoscopic views. In the latter case, the position of the viewer defines which stereoscopic view he/she watches. Stereoscopic TV is included in this category and is the first application of 3DTV available to the consumer.

Immersive teleconference: interactivity and virtual reality are desirable. For this reason FVV and 3DTV are supported, providing interactivity and virtual reality respectively. In this application several viewers interact, so feeling immersed in a 3D environment is important.

Finally, it has to be pointed out that, when applying or investigating MVC, some requirements have to be known. Some of them are: view scalability, compression efficiency, robustness, random access and resource consumption. For an extended explanation of these concepts, the reader is referred to [3] and [4].

1.2 Thesis content

As previously mentioned, compression becomes one of the important issues due to the quantity of data generated, especially when recording two or more views. Without compression, the costs of hardware and systems to process digital video increase, and furthermore, the required bandwidth and storage are too high.
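The scale of the problem is easy to quantify. A short sketch of the arithmetic for a PAL-like sequence (720x575 pixels, 50 fps, 3 bytes per pixel), matching the worked example that follows (the text rounds the same quantities slightly differently):

```python
# Uncompressed-video storage and bandwidth arithmetic for the example
# discussed in this section: 720x575, 50 fps, 3 bytes per pixel.

width, height = 720, 575
bytes_per_pixel = 3
fps = 50

frame_bits = width * height * bytes_per_pixel * 8   # bits in one frame
rate_bps = frame_bits * fps                         # bits per second
hour_bits = rate_bps * 3600                         # bits in one hour

print(f"frame:     {frame_bits / 1e6:.2f} Mbit")    # ~9.94 Mbit
print(f"bandwidth: {rate_bps / 1e6:.1f} Mbit/s")    # ~496.8 Mbit/s
print(f"one hour:  {hour_bits / 1e12:.2f} Tbit")    # ~1.79 Tbit
```

Doubling the view count for stereo, or multiplying it by N for multiview, scales these figures linearly, which is why compression is indispensable.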
To understand this problem, the following numerical example of an uncompressed video and its requirements can be considered. Suppose a digital video sequence of 720x575 pixels at 50 frames per second (fps), where each pixel is represented by 3 bytes. The size of one frame is then 720x575x3x8 bits (about 9.94 Mbits). So, considering a video of one hour, the space required to store it is 9.94 Mbits x 50 x 3600 (about 1.79 Tbits), and the bandwidth needed to send the video is 9.94 Mbits x 50 (about 497 Mbps). This example clearly illustrates the necessity of compression. It becomes even more important when the case includes two or more views, since the data increases significantly. Fortunately, several techniques provide tools to exploit the

different redundancies existing in images. A lot of research has been done and this is still a focus of investigation, resulting in new approaches, novel video representations and improvements to existing techniques. Consequently, 3D technology is a field which still needs a lot of time to arrive at a mature stage. At the moment, the format intended for the mass consumer market, stereoscopic video, is slowly gaining acceptance. At the same time, other formats are preparing their place in the 3D world. However, there is a common feature among all of them: providing high quality to the consumer. This concerns content creators, service providers and display manufacturers. It seems, then, that a global vision of the topic and its current situation is useful and can clarify the future of 3D video. So this thesis gives an overview of the state of the art of 3D video, always focusing on the coding part.

1.3 Thesis outline

The thesis is structured as follows: In chapter 2 the most important techniques concerning coding are described, specifically compression tools. They mainly correspond to Multiview Video Coding (MVC), an extension of H.264/AVC. Chapter 3 features the 3D video formats and their representations. It also gives an overview of the current situation of the existing standards, relating the video representations to their corresponding standards. In chapter 4, the concepts and features explained in previous chapters are analyzed and compared. Finally, chapter 5 concludes the thesis and presents some ideas for future work.

Chapter 2

Basic coding issues

One characteristic of multiview video (MVV) is that the system uses multiple views of the same scene, which means several cameras are simultaneously capturing several video streams of the same scene. This results in a huge amount of data that has to be stored or transmitted. Encoding, which is the process in which data is transformed from one format to another, becomes an essential tool in this process, since the data needs to be compressed. Fortunately, in multiview coding, correlations between adjacent views, temporal correlations between frames and spatial correlations within pictures exist. The objective is to exploit these redundancies so as to obtain a compressed stream with an acceptable quality. For this reason, efficient compression techniques have to be used in the encoding process of MVV in order to achieve this objective. In this chapter, the main coding issues concerning MVC are presented. Some of them are shared with codecs which are not MVC; nevertheless, the combination of all of them provides the basic tools to implement the coding process in MVC. First, spatial prediction with the Discrete Cosine Transform (DCT) is presented, then temporal prediction is discussed, to finally describe inter-view prediction.

2.1 Spatial prediction

Images are strongly non-stationary data. For this reason, if a division into stationary regions is made, the resulting small areas of the picture are similar. As significant correlation exists between neighboring pixels, the information to be coded can be reduced if those redundancies are exploited. To exploit spatial redundancies, intra prediction, which is based on predictions from neighboring pixels, is applied. In this section, transform techniques such as the Discrete Cosine Transform are explained. It should be noted that other processes coming after the DCT in spatial coding, such as weighting, scanning or entropy coding, are very common as well.
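The neighbouring-pixel idea can be illustrated with a toy sketch in the spirit of H.264/AVC's 4x4 intra modes (this is a simplification, not the exact codec algorithm: real encoders choose among several directional modes and transform-code the residual):

```python
# Toy intra (spatial) prediction: predict a 4x4 block from the
# reconstructed column of pixels to its left, then keep only the
# residual. Small residual values are what makes intra coding cheap.

def predict_horizontal(left_column):
    """Each row is predicted as a copy of the pixel to its left."""
    return [[left] * 4 for left in left_column]

def residual(block, prediction):
    return [[b - p for b, p in zip(br, pr)]
            for br, pr in zip(block, prediction)]

left = [10, 20, 30, 40]            # already-decoded neighbour column
block = [[10, 11, 10, 12],         # a block with strong horizontal
         [20, 21, 19, 20],         # correlation...
         [30, 30, 31, 29],
         [40, 41, 40, 40]]

pred = predict_horizontal(left)
res = residual(block, pred)
# ...leaves a residual of small magnitudes, cheap to encode:
max_abs = max(abs(v) for row in res for v in row)
```

Instead of coding values in the range 10-41, only residuals in the range -1..2 remain, which compress far better after the transform and entropy coding steps described next.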

2.1.1 Discrete cosine transform

The DCT is a mathematical tool widely used in image and video coding processes, as it has very good energy compaction and de-correlation properties. It is close to optimal in terms of energy compaction capabilities and can be computed by fast algorithms. The DCT is a good approximation to the Karhunen-Loeve Transform (KLT), which is optimal among unitary transformations but more complex (and for this reason not considered by current coding standards). The mathematical expression is the following:

a_k(m) = \sqrt{\frac{s}{N}} \cos\left(\frac{(2m+1)k\pi}{2N}\right), \quad s = \begin{cases} 2 & k \neq 0 \\ 1 & k = 0 \end{cases} \quad (2.1)

A_{k,l}(m,n) = a_k(m)\, a_l(n)

With the objective of removing redundancy, the DCT transforms the signal into a new space where the signal representation is more compact, as described later. Specifically, it transforms an NxN picture block into an NxN block of coefficients in the frequency domain.

Figure 2.1: Matrix of coefficients. Source: [34]

The signal transform implies comparing (projecting, to compute the inner product) each block in the image with the different components (vectors) that define the transform space. Once the transform is done, the energy of an NxN pixel block is compacted into a few transformed coefficients and the correlation between these coefficients is significantly reduced.
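Equation (2.1) can be implemented directly. The sketch below (a naive O(N^4) evaluation, for illustration only; real codecs use fast factorizations) shows the energy-compaction property on a smooth 4x4 block:

```python
import math

# Direct implementation of the separable DCT basis of Eq. (2.1):
# a_k(m) = sqrt(s/N) * cos((2m+1) k pi / (2N)), s = 2 if k != 0 else 1.

def a(k, m, N):
    s = 1.0 if k == 0 else 2.0
    return math.sqrt(s / N) * math.cos((2 * m + 1) * k * math.pi / (2 * N))

def dct2(block):
    """2-D DCT of an NxN block via the separable basis A_{k,l}(m,n)."""
    N = len(block)
    return [[sum(block[m][n] * a(k, m, N) * a(l, n, N)
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]

# A smooth 4x4 block: identical rows, a gentle horizontal ramp.
block = [[8, 9, 10, 11]] * 4
coef = dct2(block)

# The transform is orthonormal, so total energy is preserved...
energy = lambda b: sum(v * v for row in b for v in row)
# ...but compacted: the DC coefficient alone carries almost all of it.
dc = coef[0][0]
```

Here the DC coefficient alone holds over 98% of the block energy, and every coefficient with a vertical frequency component is zero because the rows are identical, which is exactly the compaction and de-correlation behaviour described above.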

Figure 2.2: Matrix of DCT coefficients after the transformation. [34]

As a result, only a few coefficients are necessary to recover a substantial amount of information, since these few coefficients represent the largest part of the signal energy. An interesting aspect of the transform coefficients is that those representing high frequencies can be discarded, due to the fact that the human visual system is not very sensitive to high spatial frequencies. Hence the perceptual quality of the reconstructed image is not affected if this discarding process is done. Apart from the previous purpose, the DCT is also a tool employed in block-based hybrid video coding (1). Inside this process, motion prediction errors are encoded through the DCT to achieve a reduction of the information to be sent. Summarizing, the use of this transform allows the removal of spatial redundancy owing to its good energy compaction and de-correlation properties. However, some effects have to be considered, for example the blocking effect, a consequence of the block division and the isolated processing of each block, which results in a degradation of the image, although there are solutions such as the deblocking filter employed in H.264/AVC.

2.2 Temporal prediction

A strong correlation exists between successive frames in a sequence. The basic idea of temporal prediction is to remove these similarities by coding their differences, obtaining a bandwidth compression without significantly affecting the image resolution. In this section inter-prediction is explained: first a review of motion compensation, and next the picture coding structures. Finally, the two parts of motion compensation, prediction error coding and motion estimation, are specified. High temporal correlations between frames in a sequence of images are exploited by the motion compensation technique, whose capacity to remove temporal redundancies is commonly used in video compression processes.
(1) The hybrid coding technique basically consists of representing the image in terms of original data (DCT coefficients), predicted image (motion vectors) and prediction error (DCT coefficients).

A solution is to approximate the moving objects by using the regular non-overlapping blocks into which the image has previously been partitioned. After that, it is necessary to determine the motion vector so that a representation of the movement of the

image block can be made. Then, it is possible to reconstruct the current frame from the reference frame(s) through the prediction error and the motion vector. The following expressions describe the prediction, starting with the case where no motion compensation is used:

No motion compensation: if motion compensation is not used, the prediction is known as linear temporal prediction. It works well in stationary regions, but in real-world video the objects in the scene, as well as the camera, are usually moving. So it is not the proper process, since the values of pixels at the same spatial location in adjacent frames can be different:

\hat{f}(x, t) = f(x, t-1) \quad (2.2)

In this expression, \hat{f}(x, t) represents the predicted frame, t the time and x the pixel.

Uni-directional motion compensation: does not work well for regions uncovered by object motion, but solves the problem of the previous case:

\hat{f}(x, t) = f(x + d(x), t-1) \quad (2.3)

where d(x) represents the motion vector of pixel x from time t to t-1. The reference frame must be reconstructed before the coded frame. This case, in which the prediction of the current frame is made from a previous reference frame, is known as forward motion compensation.

Bi-directional motion compensation: handles the uncovered regions better. In this case, a pixel in the current frame is predicted from a pixel in the following frame (t+1) as well as a pixel in the previous frame (t-1). It should be noted that it is actually the differences that are encoded. The predicted value is:

\hat{f}(x, t) = a_f\, f(x + d_f(x), t-1) + a_b\, f(x + d_b(x), t+1) \quad (2.4)

Now, d_b represents the motion vector at x from t to t+1 and d_f the same from t to t-1. Analogously to the previous case, prediction made from a future reference frame is called backward motion compensation. In bi-directional motion compensation, both backward and forward motion compensation are used.
Finally, a_f and a_b are coefficients that should be determined through a predictor. The use of motion vectors for prediction enables the coding structure with I, P and B pictures. The first common approach is the IBBP... structure (where I and P pictures are references for B pictures; I, B and P denote the picture types within a group of pictures); nevertheless, it is not the most efficient temporal structure. Due to this disadvantage, hierarchical B pictures, which are more efficient than the traditional structure [5], are chosen to be described in this section. They represent a coding structure that uses bi-directional predictive

pictures (B pictures) as references for other B pictures within one group of pictures. These types of prediction schemes benefit from the increased flexibility that H.264/AVC offers at picture/sequence level in comparison to former video coding standards, and from the availability of the multiple reference picture technique [6]. Figure 2.3 shows a typical hierarchical reference picture structure.

Figure 2.3: Hierarchical reference picture structure for temporal prediction. [6]

In this figure, the Group of Pictures (GOP) is built from the pictures located between a key picture (included in the GOP) and the previous key picture. The first picture of the sequence is intra-coded, being an Instantaneous Decoder Refresh (IDR) picture. The indexes in the pictures indicate the hierarchical level, which is used to ensure that pictures are predicted from pictures with the same or a higher temporal hierarchy level. Usually pictures are predicted using the two nearest pictures, always considering the constraint that a reference picture has to be encoded before the picture which takes it as a reference. The arrows indicate for which pictures a picture can act as a reference, and, as expected, key pictures act as a reference for more pictures than B pictures do. The hierarchical B picture concept is easily applied to multiview video, as described in section 2.3. A general view of motion compensation has been given in the previous paragraphs. However, it is interesting to clarify the two parts of which motion compensation consists:

Motion estimation: is conducted by minimizing a Lagrangian cost function

J = D + \lambda R \quad (2.5)

This Lagrangian cost function J is the sum of the distortion D and the rate R, weighted by the Lagrange parameter \lambda.
For each block S_i of a picture, the motion estimation algorithm chooses the motion vector m_i within a search range M in the reference picture that minimizes J:

m_i = \arg\min_{m \in M} \{ D(S_i, m) + \lambda R(S_i, m) \} \quad (2.6)

Here, the distortion is calculated as the sum of squared errors between the current picture s and the previously decoded reference picture s':

D(S_i, m) = \sum_{(x,y) \in S_i} [s(x, y, t) - s'(x - m_x, y - m_y, t - m_t)]^2 \quad (2.7)

The rate R is the number of bits needed to transmit all components of the motion vector [6]. Although motion estimation algorithms are based

on temporal changes in images, depending on the application, motion estimation methods can differ. For video compression, the estimated motion vectors are part of the motion compensation process, in which a frame is coded from a reference frame.

Prediction error: is usually obtained by taking the difference between the original frame and the frame resulting from motion compensation. This prediction error is subjected to a transform coding operation and encoded to be sent to the receiver.

2.3 Inter-view prediction

In multiview video, sequences are captured by several video cameras, so the same scene is captured from nearby viewpoints. As a result, high correlations exist between the pictures of different views; in other words, inter-view redundancy is present. The objective of inter-view prediction is to exploit these similarities between neighboring views in order to achieve a good compression rate. It is important to stress that encoding each view separately (2) is an inefficient way to compress (multiview) video, as the results in [5] indicate, because inter-view redundancies are not considered. There are two different kinds of inter-view prediction, depending on whether it applies to key frames only or to both key and non-key frames.

2.3.1 Inter-view prediction for key pictures

A common method used in video coding based on motion compensated prediction is the inter prediction process, which basically consists of replacing intra-coded (I) pictures with inter-coded (P, B) pictures in order to reduce the bit rate. Adding this idea to the scheme where every view is coded independently results in a significant coding gain. The following figures, 2.4 and 2.5, illustrate this concept. It should be noted that in these figures the hierarchical B pictures concept is applied:

Figure 2.4: Temporal prediction using hierarchical B pictures. [6]

(2) Known as simulcast coding, which is the method of figure 2.4 and can be done by any H.264/AVC codec.

Figure 2.5: Inter-view prediction for key pictures. [6]

In these figures, the horizontal axis represents time and the vertical axis the different views of the video. Every view corresponds to a different camera, in this case a total of 8, and the GOP has a size of 8 pictures. Figure 2.4 represents the scheme where only temporal prediction is used (3). In figure 2.5, however, only view S0 maintains the same scheme as in the previous figure (temporal prediction). This view (in this case S0), known as the base view, has the minimum value of the view order index in the coded sequence and is characterized by not using inter-view prediction, so it can be decoded independently of the other views. There is only one base view in a coded video sequence. For the other views, inter-view prediction is applied by replacing all intra-coded key pictures with inter-coded pictures. An interesting aspect is how all these changes affect the processes in general. Since the pictures of each GOP are retained, the prediction structure does not change, and synchronization and random access are still provided, because the key pictures of the base view are coded in intra mode (coded as I pictures, independently of other views and pictures). In contrast, the new scheme introduces an effect on the encoding process (and, as a consequence, on the decoding process as well): it is no longer possible to process the video sequences of the individual views Sn independently, since reference pictures are shared when they have to be stored in a shared buffer (for parallel decoding) or interleaved into one bit stream (for sequential processing).

2.3.2 Inter-view prediction for non key pictures

Some analyses showed that using temporal prediction together with inter-view reference frames improves coding efficiency [6]. So it is logical to apply the idea of using inter-view prediction not only to key pictures, but to non-key pictures as well, so that statistical dependencies can be exploited better.
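The dependency rules for key pictures can be made explicit with a small sketch. The linear neighbour-to-neighbour chain below is an assumption for illustration only (the `key_picture_references` helper is hypothetical, not MVC syntax, and actual MVC reference structures are configurable):

```python
# Sketch of inter-view reference assignment for key pictures in the
# style of figure 2.5: 8 views, S0 the base view. Assumes a simple
# linear chain where each view's key picture predicts from its
# lower-index neighbour.

def key_picture_references(view, num_views=8):
    """Return the views a key picture may predict from (hypothetical)."""
    if view == 0:
        return []          # base view: intra-coded, independently decodable
    return [view - 1]      # others: inter-view prediction from a neighbour

deps = {v: key_picture_references(v) for v in range(8)}
# Only the base view has no dependencies; every other key picture
# requires a neighbouring view to be decoded first.
```

This makes visible why the views can no longer be processed independently: decoding view S3's key picture requires S2, which requires S1, and so on back to the base view.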
(3) Usually, simulcast coding is used as a reference to compare highly efficient temporal prediction structures with prediction structures that additionally use inter-view prediction.

Inter-view prediction can be extended to non-key pictures, as figure 2.6 illustrates. Compared with the previous cases (figure 2.4 and figure 2.5), this example,

which consists of the same number of cameras and the same GOP length, shows the whole behavior of the structure. Now all the non-key pictures are inter-coded pictures (B pictures), and they are also predicted from the pictures at the same time instant in the neighboring views using inter-view prediction. As figure 2.6 indicates, for key pictures the process remains as specified in the previous section.

Figure 2.6: Inter-view prediction for key and non key pictures. [6]

Indeed, there are more prediction tools, such as view synthesis prediction, which is discussed in [7]. The disadvantage of inter-view prediction for non-key pictures, compared with the previous prediction structures (simulcast and inter-view prediction for key pictures only), is that it is more complex, and as a result the computation process becomes more demanding as well. On the other hand, these schemes have coding efficiency advantages which help to compress the multiview video data.
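The Lagrangian block matching of equations (2.5)-(2.7), which underlies both the temporal and inter-view prediction described in this chapter, can be sketched in a few lines. Toy frame sizes are used, and the rate term R is approximated here by the motion-vector magnitude as a stand-in for its real coded bit cost:

```python
# Exhaustive block-matching motion estimation minimizing D + lambda*R,
# with D the sum of squared errors of Eq. (2.7). Toy-sized sketch; real
# encoders use fast search patterns and true bit-cost estimates for R.

def sse(cur, ref, bx, by, mx, my, B):
    """Sum of squared errors for a BxB block displaced by (mx, my)."""
    d = 0
    for y in range(B):
        for x in range(B):
            d += (cur[by + y][bx + x] - ref[by + y - my][bx + x - mx]) ** 2
    return d

def motion_search(cur, ref, bx, by, B=2, search=2, lam=1.0):
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            # skip candidates that read outside the reference frame
            if not (0 <= by - my and by - my + B <= len(ref) and
                    0 <= bx - mx and bx - mx + B <= len(ref[0])):
                continue
            cost = sse(cur, ref, bx, by, mx, my, B) + lam * (abs(mx) + abs(my))
            if best is None or cost < best[0]:
                best = (cost, (mx, my))
    return best[1]

# Reference frame with a bright 2x2 patch at (1,1); in the current
# frame the patch has moved one pixel right and down, to (2,2).
ref = [[0] * 5 for _ in range(5)]
cur = [[0] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (1, 2):
        ref[y][x] = 9
        cur[y + 1][x + 1] = 9

mv = motion_search(cur, ref, bx=2, by=2)   # block at the patch's new spot
```

The search recovers the displacement (1, 1), for which the distortion term is zero and only the small rate penalty remains; every other candidate pays a far larger distortion cost. The same search, applied against a picture of a neighbouring view instead of a previous frame, is what inter-view prediction amounts to.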


Chapter 3

Specific implementations

In the previous chapter, some coding techniques to compress the amount of data were presented. Although they are essential to enable the transmission process, these techniques are not suitable for all 3D video representations. For this reason, in this chapter the different 3D video formats are specified, paying special attention to those which do not only use the most common predictions. Some of these formats have their corresponding standard, but not all of them. Standardization, an integral part of telecommunications, is necessary to assure the integrity of the systems, covering aspects such as communication protocols, interoperability of implementations and safety requirements. It thus becomes an important part of the implementations of 3D video formats. For this reason, an analysis of the present standardization, including ideas for the upcoming standards, is given. Finally, other implementations and improvements of the coding tools and techniques (some of which already appeared in chapter 2) are specified, with the objective of finding solutions to problems such as computational complexity, or simply to evaluate new ways to achieve the same objective.

3.1 3D Video formats

3D video is becoming an interesting technology arriving at the home of the consumer. This content is provided through Blu-ray, 3DTV broadcast or the Internet. For these home applications a variety of 3D display systems exists (or is being designed), such as two-view stereo systems or multiview auto-stereoscopic displays. As a consequence, a lot of different 3D video formats are available or are being investigated [1]. The data included differs and is strongly related to specific display types, having a direct influence on the design of the 3D processing chain. As a result, several compression and coding algorithms exist for the different 3D video formats. The fact is that standard formats and efficient compression are indispensable for 3D applications, thus some of them are standardized and

widely established. On the other hand, others are currently under investigation. In this section, two classes of representations are distinguished, video only formats and depth enhanced formats, each of them including several representations with their own specific characteristics.

3.1.1 Video only formats

Basically, two video representations are part of this class: conventional stereo and multiview video, whose corresponding standardization is the H.264 family [8], including several extensions such as MVC. In these formats only color pixel video data is involved. Two or more cameras capture the scene, generating the corresponding multiple (two or more) signals, which are processed, however, without scene geometry information. Since the basic coding tools described in chapter 2 are mainly used in these representations, only a brief description and a standards overview are given in this section.

Conventional Stereo video

This representation is the most well-known type, being capable of creating the illusion of depth in the images. Basically, a pair of sequences shows the same scene from slightly different positions, one for the left eye and another for the right eye. As a result, the data to be stored or transmitted is twice as big as for conventional monoscopic video if the least complex method is used: encoding and decoding the two video signals separately, which is called simulcast coding. However, as seen in chapter 2, this technique is not efficient in compression terms. Other approaches can be applied, such as compatible stereoscopic coding, which is based on motion-compensated DCT and where the left view is coded independently and the right view is coded with references to the left view. Still, the most convenient method for efficient coding is combining temporal and inter-view prediction, as figure 3.1 illustrates, increasing the coding efficiency.

Figure 3.1: Stereo coding, combined temporal/interview prediction.
[9] An alternative to conventional stereo video (CSV) is the mixed resolution stereo format (MRS). In this case, one of the two views of the stereo pair is

downsampled using the binocular suppression theory¹, which introduces a low-pass effect on that view. An example is shown in figure 3.2, where the right view is subsampled to half resolution instead of being coded in full resolution. At the decoder it is then upsampled back to the original resolution for display.

Figure 3.2: Stereo image pair with low-pass filtered right view. [9]

By coding the two views with different resolutions in this way, the bit rate is significantly decreased while no overall losses in 3D perception quality are produced [9].

Multiview video

While CSV consists of only two views, the MVV format shows the same scene from different views, enabling, for example, free viewpoint video applications. The format contains N views captured by an array of cameras, so the signals created have N times the amount of data of single-view video to be stored or transmitted. Figure 3.3 illustrates a multiview video system with stereo video included, where multiple video capturing, data compression, transmission and reception of the compressed data are represented. Since the tools to reduce this data and the MVC features were reviewed in chapter 2, this subsection only gives a summary of MVV. MVC is used for coding N color-only video sequences. Its coder uses temporal prediction structures, where pictures are coded as I, P or B pictures. However, temporal prediction is not the only prediction applied, since correlations between views exist as well. As developed in section 2.2, the coding efficiency can be improved by using a hierarchical B-picture structure. This means that both temporal and interview reference pictures are used in the coding process to predict the current picture. Regarding the restrictions of MVC, it is notable that a good data rate for e.g. 10 views is not achievable, because the bit rate grows with the number of cameras.
Hence, it is necessary to use advanced approaches to decouple the number of views for coding and transmission from the number of required output views.

¹ It exploits properties of the human visual system: in a pair of corresponding points, one always suppresses the other. In this case (MRS), one image is spatio-temporally low-pass filtered compared to the image presented to the other eye, achieving savings in bandwidth.
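The mixed-resolution chain described above (low-pass filter and subsample one view before coding, upsample it again at the decoder) can be sketched on a single scanline. The sample values and the deliberately trivial filters are illustrative assumptions only:

```python
# Sketch of mixed-resolution stereo (MRS): one view is low-pass filtered and
# subsampled to half resolution before coding, then upsampled back at the
# decoder. A 1D scanline stands in for a whole image.

def downsample_half(row):
    """Average neighboring pairs: a crude low-pass filter plus 2:1 subsampling."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]

def upsample_double(row):
    """Restore the original length by sample repetition (nearest neighbor)."""
    out = []
    for v in row:
        out.extend([v, v])
    return out

right_view = [8, 8, 9, 9, 40, 42, 41, 41]       # hypothetical scanline
coded      = downsample_half(right_view)        # half the samples to transmit
decoded    = upsample_double(coded)             # decoder-side reconstruction

print(len(coded), len(decoded))   # half the data on the wire, full size shown
```

Only half of the samples of this view have to be coded and transmitted, while the binocular suppression effect lets the full-resolution view dominate the perceived quality.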

Figure 3.3: Multiview video system. [35]

MVC is suitable for multiview video signals; however, other formats that require geometry data are being investigated. Because of this restriction and because synthesis of new views is difficult when using MVC, new depth enhanced formats have appeared [9].

Coding standards

Regarding 3D, video only formats are covered by the H.264 family. H.264 can be considered a family of standards, consisting of several profiles. This means that not every decoder is able to decode all profiles, only those included in the specification of the decoder. With the aim of being a standard that works at lower bit rates than previous standards (H.263 or MPEG-2 for example) while offering good video quality, the H.264/Advanced Video Coding (AVC) video coding standard was designed, whose first version was completed in May 2003. It is a standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), and thanks to the flexibility the standard provides, another objective is fulfilled: to cover a variety of video applications in networks and systems such as mobile services, Internet Protocol Television (IPTV), High Definition Television (HDTV) or High Definition (HD) video storage. Since the central requirement of the design is high compression efficiency, it may seem this was the only factor considered for the standard. However, it is not the only standardization requirement. For instance, some general issues are usually considered: resource consumption, low delay or error robustness tend to be analyzed in the design process. Others are specific to the MVC extension and important for the correct development of the system, for example scalability or backward compatibility. Considering all the requirements and demands, the tools provided in the standard try to address all these issues.
For detailed information about all these tools and methods, and the profiles added over the years, the reader

is referred to the standard [8]. H.264/AVC is often the starting point for stereo and MVC. It is possible to distinguish three methods or extensions included in the standard. The processing chains of the standardized coding methods for video only formats are listed next:

H.264/AVC simulcast: already reviewed in chapter 2; the video sequences are independently encoded, transmitted and decoded. Although it can be used for CSV, MRS and MVV, it is not the most common solution since it offers lower coding efficiency than the following methods.

Figure 3.4: H.264/AVC simulcast. [1]

H.264/AVC including stereo supplemental enhancement information (SEI): for CSV the stereo video information SEI message was developed. It indicates to the decoder that the coded video sequence consists of stereo view content. Figure 3.5 shows the process followed: the two sequences are interlaced, line by line, into one. The encoder, which works in field coding mode after receiving the SEI message, then applies the techniques to reduce redundancy, and the bit stream is transmitted to be decoded later. Since the two images need to have the same size, this method is not suitable for MRS.

Figure 3.5: H.264/AVC stereo SEI. [1]

H.264/MVC: the multiview video coding extensions were completed in November. This extension, annex H of the standard, uses concepts seen in

chapter 2, in which a picture uses temporal and inter-view prediction. It includes new techniques to improve coding efficiency and to reduce the complexity on the decoder side, as well as new functionalities.

Figure 3.6: H.264/MVC. [1]

3.1.2 Depth enhanced formats

When investigating formats other than MVC, some restrictions appear due to the fact that MVC was designed for multiview video signals. Application formats for 3D video coding which need geometry data are a good example of formats affected by these restrictions. Depth enhanced formats differ from video only formats by including scene geometry data in the form of depth maps, which have different statistical properties than the video signal. A complete 3D video coding framework that targets a generic 3D video format for depth enhanced formats and associated efficient compression is depicted in figure 3.7. The estimation of the depth data is done at the transmitter side, limiting the number of input views and providing a multiview video plus depth format for transmission. At the receiver side, the data (video and depth) is decoded and view synthesis is used to generate the necessary additional views for the display. View synthesis supports the computation required by the conversion from representation to display format.

Figure 3.7: Overview of a 3D video system intended for depth enhanced formats. [26]
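A recurring building block of such depth enhanced systems is the 8-bit inverse-depth quantization of the depth maps (cf. equations (3.1) and (3.2) in the video plus depth subsection). A minimal sketch, with hypothetical near and far clipping planes chosen only for the example:

```python
# Sketch of the 8-bit inverse-depth quantization used for depth maps
# (cf. equations (3.1) and (3.2)). Z_MIN / Z_MAX are invented clipping planes.

Z_MIN, Z_MAX = 1.0, 100.0   # nearest / farthest representable depth (meters)

def quantize_depth(z):
    """Map a depth z in [Z_MIN, Z_MAX] to an 8-bit stored value (eq. 3.1)."""
    v = 255.0 * (1.0 / z - 1.0 / Z_MAX) / (1.0 / Z_MIN - 1.0 / Z_MAX)
    return round(v)

def recover_depth(stored):
    """Invert the mapping to get a depth value back (eq. 3.2)."""
    return 1.0 / (stored / 255.0 * (1.0 / Z_MIN - 1.0 / Z_MAX) + 1.0 / Z_MAX)

# Closest and farthest planes map to the extreme code values...
print(quantize_depth(Z_MIN), quantize_depth(Z_MAX))          # 255 0

# ...and quantization is finer near the camera than far away.
step_near = recover_depth(254) - recover_depth(255)
step_far  = recover_depth(0) - recover_depth(1)
print(step_near < step_far)                                  # True
```

The inverse-depth mapping is what gives close objects, where the eye is most sensitive to depth errors, the finest quantization steps.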

The advantage that these formats introduce is the ability to generate and display virtual views at arbitrary positions, useful if additional intermediate views are needed, provided that the occlusion problem is solved. In the following subsections some depth enhanced formats are specified, and finally an analysis of the situation of the coding standards is presented.

Video plus depth

Video plus depth (V+D) is the next format after MVV in terms of the complexity of the methods. It consists of a video signal and its corresponding per-pixel depth map, and both are transmitted. An example is illustrated in figure 3.8:

Figure 3.8: Video plus depth: regular 2D color video and an 8 bit depth image. [25]

As the figure shows, the depth data can be considered a luminance-only video signal, i.e. a gray scale image. If these images are fed into the luminance channel and the chrominance is fixed to a constant value, the resulting video signal can be processed by any state-of-the-art video codec. The range is quantized with 8 bits, which means 256 values are associated with the different points. The most distant point is represented by the value 0 and the closest point by the value 255. Therefore, the depth range, which is the distance of the 3D point from the camera, is limited between z_far and z_near, and every pixel has its z value. Depth data could be generated at the receiver, but that is not trivial. The best option is to provide depth information at the sender, so that 3D content producers have control over the display output. Then a method to store the depth data has to be applied. Usually the inverted real-world depth is used:

$$\mathrm{stored\ depth} = 255 \cdot \frac{\frac{1}{z} - \frac{1}{z_{max}}}{\frac{1}{z_{min}} - \frac{1}{z_{max}}} \qquad (3.1)$$

where an 8 bit representation (values from 0 to 255) is assumed. Thanks to this method it is possible to get a high depth resolution for close objects and a coarser depth resolution for more distant objects. On the other hand, the inverse quantized depth values are not identical to disparity values, due to the dependence between camera distance and disparity. The following equation is then used in synthesis scenarios to recover the depth values z from the depth maps:

$$z = \frac{1}{\frac{\mathrm{stored\ depth}}{255}\left(\frac{1}{z_{min}} - \frac{1}{z_{max}}\right) + \frac{1}{z_{max}}} \qquad (3.2)$$

One limitation of the video plus depth representation is its FVV functionality. When the position of the user changes, the rendered stereo pair is supposed to be adjusted to the new position. However, the navigation range of the head motion parallax is significantly limited, although by extending the video coding scheme to an N-view plus N-depth environment a higher navigation range is achieved. So, by rendering virtual intermediate views, the FVV functionality is increased. Figure 3.9 shows the synthesis of arbitrary intermediate views with this format.

Figure 3.9: Synthesized view from video and depth of adjacent camera views. [25]

Virtual views are generated in real time by Depth Image Based Rendering (DIBR) [10] at the receiver. This technique synthesizes views as follows: first the points of the original image are projected into the 3D scene using the depth maps, and then these 3D points are projected onto the image plane of the virtual view. It can be seen as 2D points being projected to 3D points, after which the reverse process is applied. The concatenation of these processes is known as 3D

image warping, and the idea of this process has been explained in the previous paragraphs. Another common problem concerns the color-with-depth representation, for example when generating depth or disparity information. Currently captured depth fields are not good enough, given that available cameras which capture video with per-sample depth do not provide good quality. Nevertheless, algorithms for depth and disparity estimation are being studied, so this problem can be solved. The last inconvenience is the increased complexity. In order to achieve the advantages of this representation, the implementations are more complex on the transmitter side as well as on the receiver side. At the transmitter, the generation of depth data has to be done before encoding, while at the receiver, view synthesis has to be executed after decoding in order to generate the stereo pair. Despite these limitations, the video plus depth concept is valuable because of its ability to generate virtual views at arbitrary positions and its backward compatibility. Only by specifying high-level syntax to let the decoder interpret the streams as color and depth, and by transmitting the information about the depth range, it is possible to use existing video codecs in the decoding process.

Multiview video plus depth

The multiview video plus depth (MVD) format appears as a new 3D video representation to cover the shortcomings of the previous representations: MVC becomes inefficient if the number of views is high and it does not provide continuity between views, while the continuity of V+D is still limited. The main advantage the MVD representation presents, however, is the ability to easily render intermediate views. To provide this ability, the MVD process includes several complex steps, which are shown in figure 3.10.

Figure 3.10: MVD process. [9]

At the encoder side a total of N color and N depth videos are encoded before being transmitted to the decoder. Previously, at the transmitter, the

depth is estimated for the N views. After transmission, the data is decoded at the receiver and the virtual views are rendered. Regarding the autostereoscopic displays that MVD can efficiently support, figure 3.11 illustrates an example.

Figure 3.11: MVD: example scheme of view synthesis for support of multiview displays. [10]

Here, the display consists of 9 different views (V1, V2, ..., V9), which are shown at the same time. From his/her position (indicated in the figure as Pos1, Pos2 and Pos3) the user sees a stereo pair of these views. Of all these views, only three are available at the decoder as original views (V1, V5 and V9), with their corresponding depth maps D1, D5 and D9. The depth image based rendering process is then applied to obtain the other views and complete the display of the 9 views. Following this method, the process becomes more efficient than using MVC, which would transmit the 9 display views directly. As in previous cases, the gain in efficiency is paid for with an increase in the complexity of the process.

Layered depth video

This representation is derived from MVD and appears as an alternative to it. It is believed to be more suitable than MVD for the 3DTV scenario because it is more compact and less information has to be transmitted. On the other hand, error-prone vision tasks operating on unreliable depth data have to be added. The Layered Depth Video (LDV) format consists of multiple layers, containing information about depth and texture and describing the scene in terms of geometry and color. Specifically, LDV uses one color video and an occlusion

layer, each with its associated depth map. The occlusion layer is characterized by including image content that is occluded by foreground objects in the main layer. Therefore, the idea of LDV is to describe hidden parts of the scene from the view of the reference camera using the additional layers mentioned above. Figures 3.12 and 3.13 represent this concept.

Figure 3.12: Layered depth video concept - before transmission. [27]

Figure 3.13: Layered depth video concept - after transmission. [27]

In this example, the scene is the left image of figure 3.12, where a blue ball is behind a brown ball. The scene is captured and then the foreground layer and the occlusion layer are created. The occlusion information is constructed by warping the neighboring V+D views from the MVD representation. After that, the LDV stream is encoded and transmitted, to finally obtain the center view and render two additional views: from the left and from the right side of the image. In these images the hidden element (the blue ball) is now visible. In figure 3.14, real images representing the different layers of figure 3.12 are presented.

Figure 3.14: Layered depth video before transmission - example with real images. [27]

It is also interesting to observe an example of how the two views are generated. Figure 3.15 shows how the generated images can look: the first picture is the central view with the occlusion layer, and the two images below it are the new views, left and right.

Figure 3.15: Layered depth video after transmission - example with real images. [33]
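The warping-plus-occlusion-filling idea behind this kind of rendering can be sketched in one dimension: main-layer pixels are shifted according to a per-pixel disparity (nearer pixels move more), and the disoccluded positions are filled from the occlusion layer. The image data, the disparities and the simple z-buffer rule are all invented for the example:

```python
# Toy DIBR-style warp in one dimension: pixels of the main layer are shifted
# by a per-pixel disparity derived from depth, and the holes that open up
# behind foreground objects are filled from an occlusion layer.

def render_view(color, disparity, occlusion):
    """Warp a scanline to a virtual view; fill disoccluded samples."""
    out = [None] * len(color)
    depth_buf = [-1] * len(color)        # larger disparity = nearer = wins
    for x, (c, d) in enumerate(zip(color, disparity)):
        tx = x + d                       # target position in the virtual view
        if 0 <= tx < len(out) and d > depth_buf[tx]:
            out[tx], depth_buf[tx] = c, d
    # disocclusions: positions where no source pixel landed
    return [occlusion[x] if v is None else v for x, v in enumerate(out)]

color     = ['B', 'B', 'F', 'F', 'B', 'B']   # B = background, F = foreground
disparity = [0, 0, 2, 2, 0, 0]               # foreground is closer, moves more
occlusion = ['b'] * 6                        # hidden background content

virtual = render_view(color, disparity, occlusion)
print(virtual)
```

The foreground samples move in the virtual view, and the positions they vacate are exactly the ones the occlusion layer was transmitted to fill.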

LDV can be seen as an extension of the V+D format, since that format could be considered the first layer of LDV. Nevertheless, strictly speaking it is derived from MVD. By warping the main layer image onto the other input images, it is possible to generate LDV from MVD: the idea is to determine the parts that are occluded in the main layer image and correspond to the other contributing input images. Once this process is done, all these images are considered residual images, and thus they are transmitted.

Coding standards

The current standardization situation for depth enhanced formats differs from the situation of the standards for video only formats. While coding algorithms are standardized for video only formats, some algorithms for depth enhanced formats are not yet standardized, so more standards have to be developed for these formats. However, two specifications have been elaborated:

MPEG-C part 3 standard (ISO/IEC 23002-3) [11]: already enables V+D, encoding the information into two different streams: video in one and depth in the other. This means video and depth are encoded separately. These two streams are then multiplexed into one stream, frame by frame, together with the depth map parameters. The standard defines the representation of depth maps to be encoded as 2D sequences and the parameters to interpret the depth values at the receiver. On the other hand, the standard does not cover the techniques concerning compression and transport.

H.264/AVC Auxiliary Picture Syntax: also uses the V+D format. In this case, a primary coded picture, the video, is supplemented by an auxiliary coded picture, the depth. The primary coded picture, since it contains all the macroblocks of the picture, is the only one which affects the decoding process. One requirement on primary and auxiliary coded pictures is that both have to contain the same number of macroblocks.
In contrast to MPEG-C part 3, where video and depth are coded independently, here the primary and auxiliary coded pictures are combined into a single source, which is then coded by H.264/AVC to be sent. After transmission, the primary and auxiliary coded pictures are decoded independently at the same time. Apart from these two standards, other extensions exist, specifically MPEG-4 Multiple Auxiliary Component (MAC) and H.264 Scalable Video Coding (SVC). With the first it is possible to encode auxiliary components, so a depth map could be employed. H.264/SVC, which is an extension of H.264/AVC (annex G) [8], provides a compatible base layer and one or more enhancement layers. The base layer has the minimum quality and the enhancement layers represent increased quality, where depth can be an option and be decoded by an SVC decoder. Despite the existence of these standards, as introduced at the beginning of this subsection, more coding algorithms are required for those formats that

are not properly supported, for example MVD. For this reason, and because of the requirements of the market, the Moving Picture Experts Group (MPEG) has initiated the development of a generic 3D video standard. Figure 3.16 illustrates these targets and main ideas:

Figure 3.16: Target of MPEG 3D video coding initiative. [1]

The aim is to support high-quality autostereoscopic displays and to solve problems with varying display types and sizes, adding for example a variable stereo baseline or an adjustment of the depth perception. These are not the only objectives to be achieved; other example objectives include the reduction of the rate requirements and an improved rendering ability.

3.2 Other implementations

In chapter 2 the basic and most common methods and tools used for 3D video signal treatment in the coding process were reviewed. However, these are not the only existing methods, since much research has been done in this field. As a result, other implementations and improvements to known techniques have appeared and can be adopted by some systems. This section presents interesting implementation results of this research, considering factors such as efficiency improvement and computational savings. In addition, real 3D with digital holography is treated in the following subsections.

Computational saving methods

One of the problems in encoding systems is the computational complexity. This is caused by the motion estimation and the disparity estimation, which are

used in MVC when coding each macroblock to provide a high coding efficiency. Some research on the computational complexity of MVC has been done and, as a result, some methods have been proposed to reduce this complexity with a minimal loss of image quality: [12], [13] or [14] are some of them. Since many such methods exist, only one of them, which appears in a recent publication [15], is explained in this subsection:

Early SKIP mode decision: as justified in chapter 2, the prediction modes of a macroblock (MB) in the current view and in its neighboring view are similar due to similarities in the content. This algorithm assumes this fact and takes advantage of it. The first important element is the global disparity vector (GDV), because it helps to locate a suitable MB in the neighboring view. The GDV is measured in units of MB size; however, it is not the exact disparity between both MBs. That is why, to estimate the mode of an MB, the modes of the corresponding MB and its 8 neighboring MBs are taken. Figure 3.17 depicts this concept:

Figure 3.17: MB and its neighboring MBs in the previously coded view. [15]

Then, a weight factor for SKIP mode is necessary to allow discerning when SKIP mode is suitable. For this, the weight factor is compared with a threshold, and when it is larger than the threshold, SKIP mode is considered better; otherwise the MB is assumed to need variable-size motion estimation and disparity estimation. One problem is how to fix the threshold, given that, since video contents are similar between views, the coding modes between views are similar as well. The definition of the weight factor of

SKIP mode for an MB is:

$$W_{SKIP\ mode} = \sum_{i=0}^{N-1} \alpha_i w_i \qquad (3.3)$$

where N is the total number of MBs considered, w_i is the weight of SKIP mode for MB i (MB 0 to 8 in figure 3.17), which is 1 when the corresponding MB is coded in SKIP mode and 0 otherwise, and α_i is the MB weight factor, a parameter that differs depending on the MB. For example, if all α_i were equal, the weight factor would simply be proportional to the number of the nine MBs coded in SKIP mode. An analysis made in [28] shows that the percentage of SKIP mode occurring in B slices is high, between 60% and 90% depending on the video. This means that this percentage of the MBs is usually coded in SKIP mode and, consequently, the computational complexity decreases.

Another interesting method is adaptive early termination, whose strategy is based on a threshold which can be determined by two different types of processes. One consists in using constant values, with the possibility of applying the same threshold independently of the coding conditions. The second, more complex method takes the rate-distortion costs of spatially and temporally neighboring MBs as references and fixes the early termination thresholds for the current MB. Also remarkable are the fast inter-mode size decision, which considers the fact that MBs in homogeneous regions choose large sizes² while MBs with active motion choose small sizes, and the selective intra prediction in inter frames, employed when a new object appears and the motion vector could not be as efficient as using the original data. A possible algorithm could be based on these four approaches, and the experimental results in [15] conclude that the objective of reducing the computational complexity is achieved, always considering that many other methods and algorithms can help the encoding process in complexity terms as well.

Wavelet transformation

The wavelet transform emerged as a useful tool in image and video compression because it offers flexibility when representing non-stationary images.
In addition, the wavelet representation offers a desirable property for video coding applications: a multiresolution expression of a signal. Actually, this multiresolution decomposition by itself provides a scalable bit stream. The wavelet concept, however, requires some mathematical support to be more understandable. Here, the Haar basis in one dimension, which is the simplest wavelet, is introduced. A wavelet decomposition transforms a function into scaling coefficients and detail coefficients. Then, ignoring the coefficients with low value allows lossy

² It refers to mode sizes: 16 x 16, 16 x 8, 8 x 8...

compression of the signal. Assume a number of nested linear function spaces:

$$\nu_0 \subset \nu_1 \subset \nu_2 \subset \ldots \subset \nu_j \qquad (3.4)$$

The dimension of the spaces increases with j. For each space ν_j a set of basis functions, called scaling functions, is defined. When normalized, the following expression generates them:

$$\phi_{j,t}(x) = 2^{j/2}\,\phi\!\left(2^j x - t\right), \quad t = 0, \ldots, 2^j - 1 \qquad (3.5)$$

with

$$\phi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

On the other hand, the wavelet functions span the wavelet spaces, each of which complements ν_j in ν_{j+1}. The inner product between each scaling and wavelet function at the same level is zero. Expression 3.6 gives the wavelet functions:

$$\psi_{j,t}(x) = 2^{j/2}\,\psi\!\left(2^j x - t\right), \quad t = 0, \ldots, 2^j - 1 \qquad (3.6)$$

with

$$\psi(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

Then a hierarchical basis, in which a given function is represented, is made from the scaling and wavelet functions: the wavelet decomposition. Compression can be achieved if the proper scaling and wavelet functions are chosen, which makes it possible to represent the original signal with few coefficients. As in the DCT case, wavelet transforms need complementary processes as well, for example quantization or entropy coding. Wavelet coding techniques can be sorted into different categories. Some of them are:

Spatial-domain motion compensation followed by 2D wavelet transform

Wavelet transform followed by frequency-domain motion compensation

3D wavelet transforms with or without motion estimation

However, one of the most interesting is the 4D wavelet [16], where a 4D matrix of pixels represents a multiview video stream. But wavelet coding is not easy to apply to MVC despite the flexibility it contributes.
The problem is that MVV is high-dimensional (two dimensions for the spatial directions, one dimension for the temporal direction and another one for the view direction) and existing motion-compensated temporal filters are not efficient when exploiting correlations between views, because the redundancies are different: view disparity is not the same as temporal motion.
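One analysis level of the Haar decomposition from equations (3.5) and (3.6) can be sketched directly: normalized pairwise averages give the scaling (approximation) coefficients and normalized pairwise differences give the detail coefficients, and smooth regions produce near-zero details that are cheap to code:

```python
# One level of the Haar wavelet analysis: split a signal into scaling
# coefficients (pairwise averages) and detail coefficients (pairwise
# differences); dropping small details would give lossy compression.
import math

def haar_step(signal):
    """One analysis level; the length of signal must be even."""
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    """Perfectly invert haar_step when no coefficients were discarded."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

x = [4.0, 4.0, 4.0, 4.0, 9.0, 9.0, 1.0, 1.0]
approx, detail = haar_step(x)
print(detail)               # smooth pairs yield zero detail coefficients
rec = haar_inverse(approx, detail)
print([round(v, 6) for v in rec])
```

Recursively applying `haar_step` to the approximation coefficients yields the multiresolution hierarchy described above; the 4D schemes extend this same idea across the spatial, temporal and view directions.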

In addition to the previous problem, some difficulties arise in compressing the 4D wavelet coefficients in an efficient way. 4D wavelet coefficients can be generated with any decomposition structure; for this reason some proposed methods consist in reorganizing these coefficients into 3D data. Nevertheless, by finding a solution to all these issues, as is done in [16], it is possible to construct a suitable 4D-wavelet-based MVC.

Real 3D - Digital holography

Holography is a technique from the branch of optics that uses a laser and its coherent light to construct a hologram which can create a 3D image. Hence, this image behaves as if the object were present, changing as the viewing position changes³, which is called motion parallax. An improvement is the possibility to show videos on a holographic volumetric display. As a consequence, digital holography, where holograms are digitally represented (i.e. they can be processed, analyzed and transmitted electronically), becomes an important topic inside the field of 3D displays. Specific hologram sequence compression techniques are needed, although some investigations have been done, for example using the MPEG-4 part 2 video coding algorithm for the compression of hologram sequences [36]. Digital holography is defined as the technology of acquiring and processing holographic measurement data, typically via a Charge Coupled Device (CCD)⁴ camera or a similar device. This means digital holography treats the data in order to reconstruct the object data from the recorded measurement data. More generally, it can be seen as a 3D technique for capturing real-world objects. Figure 3.18 shows an example of a 3D holography image⁵ and [17] is an interesting demonstration of a 3D video.

Figure 3.18: 3D holography image

³ It should be noted that motion parallax can be introduced without holographic images as well.
⁴ Definition extracted from Wikipedia.
⁵ Extracted from:

Next, the general principles of digital holography are detailed. For more extensive information about holography and digital holography, see [18]. The recording process is illustrated in figure 3.19. The recording medium, a CCD, receives two light waves to record the hologram: one is the plane reference wave, the other the wave reflected from the object. Since both waves arrive at the surface of the CCD, the interference between them is recorded. After that, the resulting hologram is electronically recorded and stored.

Figure 3.19: Recording with digital holography. [18]

When reconstructing, several methods can be chosen, such as the numerical hologram reconstruction, where intensity and phase are calculated. The basis is given by equation 3.7:

$$\Gamma(\xi, \eta) = \frac{i}{\lambda} \iint h(x, y)\, E_R(x, y)\, \frac{\exp\!\left(-i \frac{2\pi}{\lambda} \rho\right)}{\rho}\, dx\, dy \qquad (3.7)$$

where

$$\rho = \sqrt{(x - \xi)^2 + (y - \eta)^2 + d^2} \qquad (3.8)$$

and h(x, y) is the hologram function and ρ the distance between points in the hologram plane and in the reconstruction plane. The scheme in figure 3.20 specifies the geometrical quantities which appear in the above equations.

Figure 3.20: Coordinate system of numerical reconstruction. [18]
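A brute-force discretization of equations (3.7) and (3.8) can be written down directly as a sum over hologram pixels. All the physical parameters and the hologram samples below are invented toy values, and practical reconstructions use FFT-based Fresnel approximations rather than this O(N⁴) sum; the sketch only shows the structure of the computation:

```python
# Direct evaluation of the reconstruction integral (3.7) with the distance
# (3.8) on a tiny grid. Hologram values, reference wave, pixel pitch and the
# reconstruction distance are all hypothetical toy numbers.
import cmath, math

WAVELENGTH = 0.633e-6          # He-Ne laser wavelength in meters
D          = 0.5               # hologram-to-reconstruction distance in meters
PITCH      = 10e-6             # CCD pixel pitch in meters
N          = 8                 # grid size (tiny, for illustration only)

# made-up hologram samples h(x, y) and a unit plane reference wave E_R
h   = [[(i * j) % 3 / 2.0 for j in range(N)] for i in range(N)]
E_R = 1.0 + 0.0j

def gamma(xi, eta):
    """Discretized version of equation (3.7): a sum over hologram pixels."""
    total = 0.0 + 0.0j
    for i in range(N):
        for j in range(N):
            x, y = i * PITCH, j * PITCH
            rho = math.sqrt((x - xi) ** 2 + (y - eta) ** 2 + D ** 2)  # (3.8)
            total += h[i][j] * E_R * cmath.exp(-1j * 2 * math.pi * rho
                                               / WAVELENGTH) / rho
    return (1j / WAVELENGTH) * total * PITCH * PITCH   # dx dy

field = gamma(0.0, 0.0)
intensity = abs(field) ** 2    # what a display of the reconstruction shows
phase = cmath.phase(field)     # also directly available in digital holography
print(intensity >= 0.0)
```

Because Γ is computed as a complex field, both the intensity and the phase of the reconstruction are available, which is precisely what distinguishes numerical reconstruction from classical optical replay.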


Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

5LSH0 Advanced Topics Video & Analysis

5LSH0 Advanced Topics Video & Analysis 1 Multiview 3D video / Outline 2 Advanced Topics Multimedia Video (5LSH0), Module 02 3D Geometry, 3D Multiview Video Coding & Rendering Peter H.N. de With, Sveta Zinger & Y. Morvan ( p.h.n.de.with@tue.nl

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

Video Compression An Introduction

Video Compression An Introduction Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays

Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays Anthony Vetro, Sehoon Yea, Matthias Zwicker, Wojciech Matusik, Hanspeter

More information

Advanced Video Coding: The new H.264 video compression standard

Advanced Video Coding: The new H.264 video compression standard Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission

More information

Part 1 of 4. MARCH

Part 1 of 4. MARCH Presented by Brought to You by Part 1 of 4 MARCH 2004 www.securitysales.com A1 Part1of 4 Essentials of DIGITAL VIDEO COMPRESSION By Bob Wimmer Video Security Consultants cctvbob@aol.com AT A GLANCE Compression

More information

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK Professor Laurence S. Dooley School of Computing and Communications Milton Keynes, UK How many bits required? 2.4Mbytes 84Kbytes 9.8Kbytes 50Kbytes Data Information Data and information are NOT the same!

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

Development and optimization of coding algorithms for mobile 3DTV. Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci

Development and optimization of coding algorithms for mobile 3DTV. Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci Development and optimization of coding algorithms for mobile 3DTV Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci Project No. 216503 Development and optimization of coding algorithms

More information

Tech Note - 05 Surveillance Systems that Work! Calculating Recorded Volume Disk Space

Tech Note - 05 Surveillance Systems that Work! Calculating Recorded Volume Disk Space Tech Note - 05 Surveillance Systems that Work! Surveillance Systems Calculating required storage drive (disk space) capacity is sometimes be a rather tricky business. This Tech Note is written to inform

More information

Module 7 VIDEO CODING AND MOTION ESTIMATION

Module 7 VIDEO CODING AND MOTION ESTIMATION Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five

More information

Interframe coding A video scene captured as a sequence of frames can be efficiently coded by estimating and compensating for motion between frames pri

Interframe coding A video scene captured as a sequence of frames can be efficiently coded by estimating and compensating for motion between frames pri MPEG MPEG video is broken up into a hierarchy of layer From the top level, the first layer is known as the video sequence layer, and is any self contained bitstream, for example a coded movie. The second

More information

Multi-View Video Transmission over the Internet

Multi-View Video Transmission over the Internet Institutionen för Systemteknik Department of Electrical Engineering Master Thesis Multi-View Video Transmission over the Internet By Abdullah Jan Mirza Mahmood Fateh Ahsan LiTH-ISY-EX - - 10/4409 - - SE

More information

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11 INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11 CODING OF MOVING PICTRES AND ASSOCIATED ADIO ISO-IEC/JTC1/SC29/WG11 MPEG 95/ July 1995

More information

Multimedia Technology CHAPTER 4. Video and Animation

Multimedia Technology CHAPTER 4. Video and Animation CHAPTER 4 Video and Animation - Both video and animation give us a sense of motion. They exploit some properties of human eye s ability of viewing pictures. - Motion video is the element of multimedia

More information

Digital video coding systems MPEG-1/2 Video

Digital video coding systems MPEG-1/2 Video Digital video coding systems MPEG-1/2 Video Introduction What is MPEG? Moving Picture Experts Group Standard body for delivery of video and audio. Part of ISO/IEC/JTC1/SC29/WG11 150 companies & research

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro TR2006-035 April 2006 Abstract

More information

View Synthesis Prediction for Rate-Overhead Reduction in FTV

View Synthesis Prediction for Rate-Overhead Reduction in FTV MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com View Synthesis Prediction for Rate-Overhead Reduction in FTV Sehoon Yea, Anthony Vetro TR2008-016 June 2008 Abstract This paper proposes the

More information

3D Video Formats and Coding Standards

3D Video Formats and Coding Standards 3D Video Formats and Coding Standards Fraunhofer Institute for Telecommunications Heinrich-Hertz-Institut Berlin Einsteinufer 37 10587 Berlin Germany +49 30 310 02 0 info@hhi.fraunhofer.de http://www.hhi.fraunhofer.de

More information

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression IMAGE COMPRESSION Image Compression Why? Reducing transportation times Reducing file size A two way event - compression and decompression 1 Compression categories Compression = Image coding Still-image

More information

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A.

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A. An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering Rensselaer

More information

In the name of Allah. the compassionate, the merciful

In the name of Allah. the compassionate, the merciful In the name of Allah the compassionate, the merciful Digital Video Systems S. Kasaei Room: CE 315 Department of Computer Engineering Sharif University of Technology E-Mail: skasaei@sharif.edu Webpage:

More information

Very Low Bit Rate Color Video

Very Low Bit Rate Color Video 1 Very Low Bit Rate Color Video Coding Using Adaptive Subband Vector Quantization with Dynamic Bit Allocation Stathis P. Voukelatos and John J. Soraghan This work was supported by the GEC-Marconi Hirst

More information

Next-Generation 3D Formats with Depth Map Support

Next-Generation 3D Formats with Depth Map Support MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Next-Generation 3D Formats with Depth Map Support Chen, Y.; Vetro, A. TR2014-016 April 2014 Abstract This article reviews the most recent extensions

More information

A real-time SNR scalable transcoder for MPEG-2 video streams

A real-time SNR scalable transcoder for MPEG-2 video streams EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science A real-time SNR scalable transcoder for MPEG-2 video streams by Mohammad Al-khrayshah Supervisors: Prof. J.J. Lukkien Eindhoven

More information

Video Transcoding Architectures and Techniques: An Overview. IEEE Signal Processing Magazine March 2003 Present by Chen-hsiu Huang

Video Transcoding Architectures and Techniques: An Overview. IEEE Signal Processing Magazine March 2003 Present by Chen-hsiu Huang Video Transcoding Architectures and Techniques: An Overview IEEE Signal Processing Magazine March 2003 Present by Chen-hsiu Huang Outline Background & Introduction Bit-rate Reduction Spatial Resolution

More information

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France Video Compression Zafar Javed SHAHID, Marc CHAUMONT and William PUECH Laboratoire LIRMM VOODDO project Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier LIRMM UMR 5506 Université

More information

Depth Estimation for View Synthesis in Multiview Video Coding

Depth Estimation for View Synthesis in Multiview Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Depth Estimation for View Synthesis in Multiview Video Coding Serdar Ince, Emin Martinian, Sehoon Yea, Anthony Vetro TR2007-025 June 2007 Abstract

More information

JPEG 2000 vs. JPEG in MPEG Encoding

JPEG 2000 vs. JPEG in MPEG Encoding JPEG 2000 vs. JPEG in MPEG Encoding V.G. Ruiz, M.F. López, I. García and E.M.T. Hendrix Dept. Computer Architecture and Electronics University of Almería. 04120 Almería. Spain. E-mail: vruiz@ual.es, mflopez@ace.ual.es,

More information

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Yongying Gao and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 email:

More information

About MPEG Compression. More About Long-GOP Video

About MPEG Compression. More About Long-GOP Video About MPEG Compression HD video requires significantly more data than SD video. A single HD video frame can require up to six times more data than an SD frame. To record such large images with such a low

More information

Lecture 14, Video Coding Stereo Video Coding

Lecture 14, Video Coding Stereo Video Coding Lecture 14, Video Coding Stereo Video Coding A further application of the tools we saw (particularly the motion compensation and prediction) is stereo video coding. Stereo video is used for creating a

More information

Megapixel Video for. Part 2 of 4. Brought to You by. Presented by Video Security Consultants

Megapixel Video for. Part 2 of 4. Brought to You by. Presented by Video Security Consultants rought to You by 2009 Video Security Consultants Presented by Part 2 of 4 A1 Part 2 of 4 How to Avert a Compression Depression Illustration by Jerry King While bandwidth is widening, larger video systems

More information

Motion Estimation. Original. enhancement layers. Motion Compensation. Baselayer. Scan-Specific Entropy Coding. Prediction Error.

Motion Estimation. Original. enhancement layers. Motion Compensation. Baselayer. Scan-Specific Entropy Coding. Prediction Error. ON VIDEO SNR SCALABILITY Lisimachos P. Kondi, Faisal Ishtiaq and Aggelos K. Katsaggelos Northwestern University Dept. of Electrical and Computer Engineering 2145 Sheridan Road Evanston, IL 60208 E-Mail:

More information

Video Compression MPEG-4. Market s requirements for Video compression standard

Video Compression MPEG-4. Market s requirements for Video compression standard Video Compression MPEG-4 Catania 10/04/2008 Arcangelo Bruna Market s requirements for Video compression standard Application s dependent Set Top Boxes (High bit rate) Digital Still Cameras (High / mid

More information

Multimedia Standards

Multimedia Standards Multimedia Standards SS 2017 Lecture 5 Prof. Dr.-Ing. Karlheinz Brandenburg Karlheinz.Brandenburg@tu-ilmenau.de Contact: Dipl.-Inf. Thomas Köllmer thomas.koellmer@tu-ilmenau.de 1 Organisational issues

More information

MPEG: It s Need, Evolution and Processing Methods

MPEG: It s Need, Evolution and Processing Methods MPEG: It s Need, Evolution and Processing Methods Ankit Agarwal, Prateeksha Suwalka, Manohar Prajapati ECE DEPARTMENT, Baldev Ram mirdha institute of technology (EC) ITS- 3,EPIP SItapura, Jaipur-302022(India)

More information

Motion Estimation for Video Coding Standards

Motion Estimation for Video Coding Standards Motion Estimation for Video Coding Standards Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Introduction of Motion Estimation The goal of video compression

More information

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology Course Presentation Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology Image Compression Basics Large amount of data in digital images File size

More information

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) 1 Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering

More information

Scalable Multiresolution Video Coding using Subband Decomposition

Scalable Multiresolution Video Coding using Subband Decomposition 1 Scalable Multiresolution Video Coding using Subband Decomposition Ulrich Benzler Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung Universität Hannover Appelstr. 9A, D 30167 Hannover

More information

Developing a Multimedia Toolbox for the Khoros System. Yuh-Lin Chang. Rafael Alonso. Matsushita Information Technology Laboratory

Developing a Multimedia Toolbox for the Khoros System. Yuh-Lin Chang. Rafael Alonso. Matsushita Information Technology Laboratory Developing a Multimedia Toolbox for the Khoros System Yuh-Lin Chang Rafael Alonso Matsushita Information Technology Laboratory Panasonic Technologies, Inc. Two Research Way Princeton, NJ 08540, USA fyuhlin,alonsog@mitl.research.panasonic.com

More information

Compression of Stereo Images using a Huffman-Zip Scheme

Compression of Stereo Images using a Huffman-Zip Scheme Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract

More information

High Efficiency Video Coding. Li Li 2016/10/18

High Efficiency Video Coding. Li Li 2016/10/18 High Efficiency Video Coding Li Li 2016/10/18 Email: lili90th@gmail.com Outline Video coding basics High Efficiency Video Coding Conclusion Digital Video A video is nothing but a number of frames Attributes

More information

A New Data Format for Multiview Video

A New Data Format for Multiview Video A New Data Format for Multiview Video MEHRDAD PANAHPOUR TEHRANI 1 AKIO ISHIKAWA 1 MASASHIRO KAWAKITA 1 NAOMI INOUE 1 TOSHIAKI FUJII 2 This paper proposes a new data forma that can be used for multiview

More information

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING 1 Michal Joachimiak, 2 Kemal Ugur 1 Dept. of Signal Processing, Tampere University of Technology, Tampere, Finland 2 Jani Lainema,

More information

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks 2011 Wireless Advanced On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks S. Colonnese, F. Cuomo, O. Damiano, V. De Pascalis and T. Melodia University of Rome, Sapienza, DIET,

More information

Lecture 6: Compression II. This Week s Schedule

Lecture 6: Compression II. This Week s Schedule Lecture 6: Compression II Reading: book chapter 8, Section 1, 2, 3, 4 Monday This Week s Schedule The concept behind compression Rate distortion theory Image compression via DCT Today Speech compression

More information

Automatic 2D-to-3D Video Conversion Techniques for 3DTV

Automatic 2D-to-3D Video Conversion Techniques for 3DTV Automatic 2D-to-3D Video Conversion Techniques for 3DTV Dr. Lai-Man Po Email: eelmpo@cityu.edu.hk Department of Electronic Engineering City University of Hong Kong Date: 13 April 2010 Content Why 2D-to-3D

More information

Video Quality Analysis for H.264 Based on Human Visual System

Video Quality Analysis for H.264 Based on Human Visual System IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021 ISSN (p): 2278-8719 Vol. 04 Issue 08 (August. 2014) V4 PP 01-07 www.iosrjen.org Subrahmanyam.Ch 1 Dr.D.Venkata Rao 2 Dr.N.Usha Rani 3 1 (Research

More information

Lecture 5: Error Resilience & Scalability

Lecture 5: Error Resilience & Scalability Lecture 5: Error Resilience & Scalability Dr Reji Mathew A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S 010 jzhang@cse.unsw.edu.au Outline Error Resilience Scalability Including slides

More information

Compressed-Domain Video Processing and Transcoding

Compressed-Domain Video Processing and Transcoding Compressed-Domain Video Processing and Transcoding Susie Wee, John Apostolopoulos Mobile & Media Systems Lab HP Labs Stanford EE392J Lecture 2006 Hewlett-Packard Development Company, L.P. The information

More information

MPEG-2 standard and beyond

MPEG-2 standard and beyond Table of Content MPEG-2 standard and beyond O. Le Meur olemeur@irisa.fr Univ. of Rennes 1 http://www.irisa.fr/temics/staff/lemeur/ November 18, 2009 1 Table of Content MPEG-2 standard 1 A brief history

More information

HAMED SARBOLANDI SIMULTANEOUS 2D AND 3D VIDEO RENDERING Master s thesis

HAMED SARBOLANDI SIMULTANEOUS 2D AND 3D VIDEO RENDERING Master s thesis HAMED SARBOLANDI SIMULTANEOUS 2D AND 3D VIDEO RENDERING Master s thesis Examiners: Professor Moncef Gabbouj M.Sc. Payman Aflaki Professor Lauri Sydanheimo Examiners and topic approved by the Faculty Council

More information

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing ĐẠI HỌC QUỐC GIA TP.HỒ CHÍ MINH TRƯỜNG ĐẠI HỌC BÁCH KHOA KHOA ĐIỆN-ĐIỆN TỬ BỘ MÔN KỸ THUẬT ĐIỆN TỬ VIDEO AND IMAGE PROCESSING USING DSP AND PFGA Chapter 3: Video Processing 3.1 Video Formats 3.2 Video

More information

Final report on coding algorithms for mobile 3DTV. Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin

Final report on coding algorithms for mobile 3DTV. Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin Final report on coding algorithms for mobile 3DTV Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin MOBILE3DTV Project No. 216503 Final report on coding algorithms for mobile 3DTV Gerhard

More information

Mesh Based Interpolative Coding (MBIC)

Mesh Based Interpolative Coding (MBIC) Mesh Based Interpolative Coding (MBIC) Eckhart Baum, Joachim Speidel Institut für Nachrichtenübertragung, University of Stuttgart An alternative method to H.6 encoding of moving images at bit rates below

More information

Image and video processing

Image and video processing Image and video processing Digital video Dr. Pengwei Hao Agenda Digital video Video compression Video formats and codecs MPEG Other codecs Web video - 2 - Digital Video Until the arrival of the Pentium

More information

Department of Computer Science University of Western Cape Computer Science Honours Project: Anaglyph Video By: Jihaad Pienaar Supervisors: Mehrdad

Department of Computer Science University of Western Cape Computer Science Honours Project: Anaglyph Video By: Jihaad Pienaar Supervisors: Mehrdad Department of Computer Science University of Western Cape Computer Science Honours Project: Anaglyph Video By: Jihaad Pienaar Supervisors: Mehrdad Ghaziasgar and James Connan Glossary Depth Map Stereo

More information

The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map

The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map Sriram Sethuraman 1 and M. W. Siegel 2 1 David Sarnoff Research Center, Princeton,

More information

The Core Technology of Digital TV

The Core Technology of Digital TV the Japan-Vietnam International Student Seminar on Engineering Science in Hanoi The Core Technology of Digital TV Kosuke SATO Osaka University sato@sys.es.osaka-u.ac.jp November 18-24, 2007 What is compression

More information

Segmentation based coding of depth Information for 3D video

Segmentation based coding of depth Information for 3D video Department of Signal Theory and Communications M.Sc. dissertation Segmentation based coding of depth Information for 3D video Author: Payman Aflaki Beni Advisor: Professor Javier Ruiz Hidalgo Barcelona,

More information

The Scope of Picture and Video Coding Standardization

The Scope of Picture and Video Coding Standardization H.120 H.261 Video Coding Standards MPEG-1 and MPEG-2/H.262 H.263 MPEG-4 H.264 / MPEG-4 AVC Thomas Wiegand: Digital Image Communication Video Coding Standards 1 The Scope of Picture and Video Coding Standardization

More information

New Techniques for Improved Video Coding

New Techniques for Improved Video Coding New Techniques for Improved Video Coding Thomas Wiegand Fraunhofer Institute for Telecommunications Heinrich Hertz Institute Berlin, Germany wiegand@hhi.de Outline Inter-frame Encoder Optimization Texture

More information

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn Basic Video Compression Techniques Chapter 10 10.1 Introduction to Video Compression

More information

3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013

3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3D High-Efficiency Video Coding for Multi-View Video and Depth Data Karsten Müller, Senior Member, IEEE, Heiko Schwarz, Detlev

More information

Low Complexity Multiview Video Coding

Low Complexity Multiview Video Coding Low Complexity Multiview Video Coding Shadan Khattak Faculty of Technology De Montfort University A thesis submitted for the degree of Doctor of Philosophy April 2014 To my family. Abstract 3D video is

More information

Video Compression Standards (II) A/Prof. Jian Zhang

Video Compression Standards (II) A/Prof. Jian Zhang Video Compression Standards (II) A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2009 jzhang@cse.unsw.edu.au Tutorial 2 : Image/video Coding Techniques Basic Transform coding Tutorial

More information

Prof. Feng Liu. Spring /27/2014

Prof. Feng Liu. Spring /27/2014 Prof. Feng Liu Spring 2014 http://www.cs.pdx.edu/~fliu/courses/cs510/ 05/27/2014 Last Time Video Stabilization 2 Today Stereoscopic 3D Human depth perception 3D displays 3 Stereoscopic media Digital Visual

More information

Motion-Compensated Subband Coding. Patrick Waldemar, Michael Rauth and Tor A. Ramstad

Motion-Compensated Subband Coding. Patrick Waldemar, Michael Rauth and Tor A. Ramstad Video Compression by Three-dimensional Motion-Compensated Subband Coding Patrick Waldemar, Michael Rauth and Tor A. Ramstad Department of telecommunications, The Norwegian Institute of Technology, N-7034

More information

Lecture 5: Compression I. This Week s Schedule

Lecture 5: Compression I. This Week s Schedule Lecture 5: Compression I Reading: book chapter 6, section 3 &5 chapter 7, section 1, 2, 3, 4, 8 Today: This Week s Schedule The concept behind compression Rate distortion theory Image compression via DCT

More information

Scalable Extension of HEVC 한종기

Scalable Extension of HEVC 한종기 Scalable Extension of HEVC 한종기 Contents 0. Overview for Scalable Extension of HEVC 1. Requirements and Test Points 2. Coding Gain/Efficiency 3. Complexity 4. System Level Considerations 5. Related Contributions

More information

Representation and coding of 3D video data

Representation and coding of 3D video data Projet PERSEE Schémas Perceptuels et Codage vidéo 2D et 3D n ANR-09-BLAN-0170 Livrable D4.1 17/11/2010 Representation and coding of 3D video data Josselin GAUTIER IRISA Emilie BOSC INSA Luce MORIN INSA

More information

View Generation for Free Viewpoint Video System

View Generation for Free Viewpoint Video System View Generation for Free Viewpoint Video System Gangyi JIANG 1, Liangzhong FAN 2, Mei YU 1, Feng Shao 1 1 Faculty of Information Science and Engineering, Ningbo University, Ningbo, 315211, China 2 Ningbo

More information

Introduction of Video Codec

Introduction of Video Codec Introduction of Video Codec Min-Chun Hu anita_hu@mail.ncku.edu.tw MISLab, R65601, CSIE New Building 3D Augmented Reality and Interactive Sensor Technology, 2015 Fall The Need for Video Compression High-Definition

More information

A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND THREE STEP SEARCH ALGORITHM FOR P-FRAME

A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND THREE STEP SEARCH ALGORITHM FOR P-FRAME A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND THREE STEP SEARCH ALGORITHM FOR P-FRAME Dr. Mohammed H. Al-Jammas 1 and Mrs. Noor N. Hamdoon 2 1 Deputy Dean/College of Electronics

More information

EE 5359 H.264 to VC 1 Transcoding

EE 5359 H.264 to VC 1 Transcoding EE 5359 H.264 to VC 1 Transcoding Vidhya Vijayakumar Multimedia Processing Lab MSEE, University of Texas @ Arlington vidhya.vijayakumar@mavs.uta.edu Guided by Dr.K.R. Rao Goals Goals The goal of this project

More information

Introduction to Video Encoding

Introduction to Video Encoding Introduction to Video Encoding INF5063 23. September 2011 History of MPEG Motion Picture Experts Group MPEG1 work started in 1988, published by ISO in 1993 Part 1 Systems, Part 2 Video, Part 3 Audio, Part

More information

AUDIO AND VIDEO COMMUNICATION MEEC EXERCISES. (with abbreviated solutions) Fernando Pereira

AUDIO AND VIDEO COMMUNICATION MEEC EXERCISES. (with abbreviated solutions) Fernando Pereira AUDIO AND VIDEO COMMUNICATION MEEC EXERCISES (with abbreviated solutions) Fernando Pereira INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Electrotécnica e de Computadores September 2014 1. Photographic

More information

Rate Distortion Optimization in Video Compression

Rate Distortion Optimization in Video Compression Rate Distortion Optimization in Video Compression Xue Tu Dept. of Electrical and Computer Engineering State University of New York at Stony Brook 1. Introduction From Shannon s classic rate distortion

More information

Georgios Tziritas Computer Science Department

Georgios Tziritas Computer Science Department New Video Coding standards MPEG-4, HEVC Georgios Tziritas Computer Science Department http://www.csd.uoc.gr/~tziritas 1 MPEG-4 : introduction Motion Picture Expert Group Publication 1998 (Intern. Standardization

More information

Recent, Current and Future Developments in Video Coding

Recent, Current and Future Developments in Video Coding Recent, Current and Future Developments in Video Coding Jens-Rainer Ohm Inst. of Commun. Engineering Outline Recent and current activities in MPEG Video and JVT Scalable Video Coding Multiview Video Coding

More information

TECHNICAL RESEARCH REPORT

TECHNICAL RESEARCH REPORT TECHNICAL RESEARCH REPORT An Advanced Image Coding Algorithm that Utilizes Shape- Adaptive DCT for Providing Access to Content by R. Haridasan CSHCN T.R. 97-5 (ISR T.R. 97-16) The Center for Satellite

More information

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY 2015 1573 Graph-Based Representation for Multiview Image Geometry Thomas Maugey, Member, IEEE, Antonio Ortega, Fellow Member, IEEE, and Pascal

More information

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy JPEG JPEG Joint Photographic Expert Group Voted as international standard in 1992 Works with color and grayscale images, e.g., satellite, medical,... Motivation: The compression ratio of lossless methods

More information

Compression of Light Field Images using Projective 2-D Warping method and Block matching

Compression of Light Field Images using Projective 2-D Warping method and Block matching Compression of Light Field Images using Projective 2-D Warping method and Block matching A project Report for EE 398A Anand Kamat Tarcar Electrical Engineering Stanford University, CA (anandkt@stanford.edu)

More information

Advanced Encoding Features of the Sencore TXS Transcoder

Advanced Encoding Features of the Sencore TXS Transcoder Advanced Encoding Features of the Sencore TXS Transcoder White Paper November 2011 Page 1 (11) www.sencore.com 1.605.978.4600 Revision 1.0 Document Revision History Date Version Description Author 11/7/2011

More information