Final report on coding algorithms for mobile 3DTV
Gerhard Tech, Karsten Müller, Philipp Merkle, Heribert Brust, Lina Jin



MOBILE3DTV Project No

Abstract: A low-complexity view synthesis algorithm suitable for mobile devices has been developed and implemented. The implemented renderer provides two modes. The first mode enables very fast processing by rounding disparities to integer values, which avoids interpolation. The second mode supports floating-point disparities by interpolating at sub-pixel positions. For both modes, pre-processing filters for the depth data and post-processing filters for the rendered view have been implemented. A method for removing irrelevant information from depth maps in Video plus Depth coding is presented. Irrelevant edges and features in the depth map can be damped while the quality of the rendered view is retained. The processed depth maps can be coded at a reduced rate compared to unaltered data. Coding experiments show gains of up to 0.5 dB for the rendered view at the same bit rate. The integration of the PSNR-HVS metric into the JMVC software for multiview coding is described. A QP-dependent correction factor for the Lagrange multiplier has been determined. The modified rate-distortion optimization process leads to gains of up to 1.6 dB PSNR-HVS using the new video quality metric. A final summary of the stereo video formats and coding methods evaluated in the Mobile3DTV project is given; results and advancements are pointed out.

Keywords: 3DTV, coding algorithms, rendering, depth map filtering, perceptual video coding

Executive Summary

This deliverable has three parts. The first part describes the advances achieved in Video plus Depth coding: a rendering approach supporting sub-pixel accuracy and a filter for removing irrelevant signal parts from depth data. The second part describes a software encoder using a new video quality metric. The third part gives a final summary on coding algorithms for mobile 3DTV.

For the Video plus Depth approach, a low-complexity view synthesis algorithm suitable for mobile devices has been developed and implemented. For fast processing, each row of the synthesized view is rendered sequentially from one line of the corresponding video and depth frame. This minimizes the required memory as well as the number of memory accesses. The implemented renderer provides two modes. The first mode enables very fast processing by rounding disparities to integer values, which avoids interpolation. The second mode supports floating-point disparities by interpolating at sub-pixel positions. For both modes, pre-processing filters for the depth data and post-processing filters for the rendered view have been implemented. The renderer supports different data formats for disparity, e.g. inverse depth data or scaled disparities.

A method for removing irrelevant information from depth maps in Video plus Depth coding is presented. The depth map is filtered in several iterations using a diffusion approach. In each iteration, smoothing is carried out in local sample neighborhoods, considering the distortion introduced into a rendered view. Smoothing is only applied when the rendered view is not affected. Therefore, irrelevant edges and features in the depth map can be damped while the quality of the rendered view is retained. The processed depth maps can be coded at a reduced rate compared to unaltered data. Coding experiments show gains of up to 0.5 dB for the rendered view at the same bit rate. The new filter is adapted to the new renderer. Hence it performs an integrated optimization of the depth data that leads to higher coding gains.

The second part of the deliverable describes the integration of a new video quality metric, the PSNR-HVS, into the JMVC software for multiview coding. For this purpose, the software structure of JMVC has been modified, in particular the distortion classes, the rate-distortion interface and the macroblock encoding class of the encoder. Coding experiments have been carried out to evaluate the gains achieved by the modified rate-distortion process. Constant scaling factors for the Lagrange multiplier used in the rate-distortion optimization have been evaluated. A QP-dependent correction factor for the Lagrange multiplier has been determined for the new video quality metric (NVQM). With the optimized Lagrange multiplier, the rate-distortion optimization process leads to gains of up to 1.6 dB at high bit rates, measured with the new video quality metric, compared to an encoder using the SSD for optimization.

The last part of the deliverable gives a summary of the stereo video formats and coding methods evaluated in the Mobile3DTV project. Results and advancements are pointed out. It can be concluded that, at the current level of technology development, the CSV representation format using MVC and the Video plus Depth format using MPEG-C Part 3 perform best as coding approaches for Mobile3DTV.

MOBILE3DTV D2.6 Final report on coding algorithms for mobile 3DTV

Table of Contents

Introduction
Video plus Depth Coding
View Synthesis for Video plus Depth Coding
Relationship between depth and disparity
Implemented Renderer
Evaluation of results
Conclusion
Reduction of irrelevant information from depth maps
Proposed Method
Evaluation of results
Conclusion and Outlook
Multi View Coding Software Encoder using a new Video Quality Metric
New Video Quality Metric (PSNR-HVS)
Rate-distortion optimization
Rate-distortion optimization using the new VQM
Conclusion and Outlook
Overview of coding algorithms for mobile3dtv
Overview of representation format and coding approaches
Representation Formats
Coding Approaches
Evaluation and advancement of stereo video coding for Mobile3DTV
Subjective Evaluation
Mixed Resolution Stereo representation and coding
Video plus Depth
MVC
Conclusion
Summary
References

1 Introduction

The deliverable has three parts. The first part deals with the advances achieved in Video plus Depth coding: a rendering approach supporting sub-pixel accuracy and a filter for removing irrelevant signal parts from depth data. The second part describes a software encoder using a new video quality metric. The third part gives a final summary on coding algorithms for mobile 3DTV.

For the Video plus Depth approach, a low-complexity view synthesis algorithm suitable for mobile devices has been developed and implemented. For fast processing, each row of the synthesized view is rendered sequentially from one line of the corresponding video and depth frame. This minimizes the required memory as well as the number of memory accesses. The implemented renderer provides two modes. The first mode enables very fast processing by rounding disparities to integer values, which avoids interpolation. The second mode supports floating-point disparities by interpolating at sub-pixel positions. For both modes, pre-processing filters for the depth data and post-processing filters for the rendered view have been implemented. The renderer supports different data formats for disparity, e.g. inverse depth data or scaled disparities. The new rendering approach is presented in section 2.1.

A method for removing irrelevant information from depth maps in Video plus Depth coding is presented in section 2.2. The depth map is filtered in several iterations using a diffusion approach. In each iteration, smoothing is carried out in local sample neighborhoods, considering the distortion introduced to a rendered view. Smoothing is only applied when the rendered view is not affected. Therefore irrelevant edges and features in the depth map can be damped while the quality of the rendered view is retained.

The integration of a new video quality metric, the PSNR-HVS, into the JMVC software for multiview coding is described in section 3. For this, the software structure of JMVC was updated. In particular, the distortion classes, the rate-distortion interface and the macroblock encoding class of the encoder have been modified. Coding experiments have been carried out to evaluate the gains achieved by the modified rate-distortion process. Constant scaling factors for the Lagrange multiplier used in the rate-distortion optimization have been evaluated. A QP-dependent correction factor for the Lagrange multiplier has been determined for the new video quality metric (NVQM).

Section 4 gives an overall project summary of the stereo video formats and coding methods evaluated in Mobile3DTV. Results and advancements are pointed out.

Author acknowledgement: Section 2 on the advancement of the Video plus Depth approach, section 3 on the integration of the new video quality metric and the final overview on the achievements of coding approaches in section 4.2 have been authored by Gerhard Tech. Lina Jin provided the section on the new video quality metric and Philipp Merkle section 4.1 on coding approaches and representation formats. Karsten Müller and Heribert Brust assisted in the overall compilation of the deliverable.

2 Video plus Depth Coding

2.1 View Synthesis for Video plus Depth Coding

For the Video plus Depth approach, a low-complexity view synthesis algorithm suitable for mobile devices has been developed and implemented. For fast processing, each row of the synthesized view is rendered sequentially from one line of the corresponding video and depth frame. This minimizes the required memory as well as the number of memory accesses. The implemented renderer provides two modes: The first mode enables very fast processing by rounding disparities to integer values, which avoids interpolation. The second mode supports floating-point disparities by interpolating at sub-pixel positions. For both modes, pre-processing filters for the depth data and post-processing filters for the rendered view have been implemented. The implemented renderer supports different data formats for disparity, e.g. inverse depth data or scaled disparities.

Relationship between depth and disparity

Fig. 1 Relationship between disparity and depth in parallel pin-hole setup

The implemented renderer supports the synthesis of rectified views from an input view and its depth map. Hence the rendered view is generated as a parallel second view, i.e. as if shot with a camera whose optical axis is parallel to that of the original camera and whose rotation parameters are identical. These constraints simplify the rendering process to a shift of the pixels of the first view by disparities retrieved from the depth map.

The relationship between depth and disparity is depicted in figure 1. The point $P$ at depth $z$ is shot by camera 0. Its image in camera 0 is the point $X_0$ at horizontal position $x_0$. To generate the corresponding point $X_1$ at position $x_1$ in the virtual camera view 1, $X_0$ must be shifted by the disparity $d$. Using

$$\frac{x_0}{f} = \frac{x_P + b/2}{z}, \qquad \frac{x_1}{f} = \frac{x_P - b/2}{z} \qquad (1)$$

with $b$ denoting the virtual camera distance, $f$ denoting the focal length of the camera and $x_P$ denoting the horizontal distance of $P$ from the center of the stereo camera pair, the disparity can be computed as

$$d = x_0 - x_1 = \frac{f \cdot b}{z} \qquad (2)$$

Implemented Renderer

Fig. 2 Basic processing steps of the renderer, dashed steps are optional

The implemented renderer warps the samples of a left view to render a right view. The distance by which the sample positions are shifted is given by a depth map for the left view. Figure 2 shows the basic processing steps of the renderer. Optional steps are marked with dashed lines. The renderer supports differently scaled depth and disparity data. In step (1) these formats are converted to disparities. An optional pre-processing of the disparity data is carried out in step (2). The warping of the input samples is done in step (3). Here two different approaches are possible: a simple integer warping and an interpolated warping using interpolation at sub-pixel positions. In step (4) hole filling is carried out. An optional post-processing can be done in step (5).

Input data formats
The renderer supports scaled depth maps as well as scaled disparity maps. The data must be provided as 8-bit integer YUV data.
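Step (1) of the renderer, the conversion of 8-bit input samples to disparities, can be sketched as follows. This is a minimal illustration, not the report's implementation: it assumes the MPEG-style inverse depth mapping for scaled depth maps and a linear mapping for scaled disparities, and the function and parameter names (`focal_length`, `baseline`, `z_near`, `z_far`, `scale`, `offset`) are chosen here for illustration only.

```python
def disparity_from_scaled_depth(v, focal_length, baseline, z_near, z_far):
    """Disparity for an 8-bit inverse-depth sample v (0 = farthest, 255 = nearest).

    Assumed mapping: 1/z = v/255 * (1/z_near - 1/z_far) + 1/z_far, then d = f*b/z.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_length * baseline * inv_z


def disparity_from_scaled_disparity(v, scale, offset):
    """Disparity for an 8-bit scaled-disparity sample v (assumed linear mapping)."""
    return scale * v + offset


# Hypothetical camera setup: focal length in pixels, baseline and depths in meters.
d_near = disparity_from_scaled_depth(255, 500.0, 0.06, 2.0, 10.0)
d_far = disparity_from_scaled_depth(0, 500.0, 0.06, 2.0, 10.0)
```

With these hypothetical values the nearest depth level maps to a disparity of about 15 pixels and the farthest to about 3 pixels. Changing `baseline` rescales all disparities, which corresponds to a variable virtual camera distance.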

Scaled depth maps
Scaled depth maps are used, for example, by MPEG [1]. The disparity $d$ is obtained from the 8-bit sample value $v$ as

$$d = f \cdot b \cdot \left( \frac{v}{255} \left( \frac{1}{z_{near}} - \frac{1}{z_{far}} \right) + \frac{1}{z_{far}} \right) \qquad (3)$$

with $f$ as focal length of the camera, $b$ as the baseline of the camera pair, and $z_{near}$ and $z_{far}$ as minimal and maximal depth of the depicted scene.

Scaled disparities
Disparities are reconstructed by rescaling the given data using equation (4)

$$d = s \cdot v + o \qquad (4)$$

with $s$ denoting a scaling factor and $o$ representing a disparity offset. Assuming an approximately equally distributed disparity range for all sequences, $s$ and $o$ can be fixed to constants. With this approach an additional transmission of scaling data can be omitted. Moreover, a fast implementation using the constant values is possible. However, with fixed values an optimal usage of the 8-bit depth range is no longer assured. A further possibility, which is also applicable to scaled depth maps, is to keep the virtual baseline $b$ variable. By choosing a suitable $b$, the user of the mobile device could select a subjectively optimal depth impression.

Pre-processing of depth data
Pre-processing of depth maps by low-pass filtering can significantly reduce artifacts in the rendered view [2]. The proposed rendering algorithm supports binomial filtering of the input depth map with a selectable number of taps. The filter is applied separably in horizontal and vertical direction. Note that this filtering is the only operation of the implemented renderer that is not performed in row direction only; hence it increases the number of memory accesses.

Sample warping
The implemented renderer supports two modes to render the right (second) view. One is a fast warping mode without interpolation, the other is an interpolated warping mode using sub-pixel accuracy.

Fast warping
The warping method is presented, for example, in [2]. Equation (5) is used to identify the sample position $x'$ in the right view, which is set to the value of the left-view sample at position $x$ according to equation (6):

$$x' = x - \operatorname{round}(d(x)) \qquad (5)$$

$$s_r(x') = s_l(x) \qquad (6)$$

Here $s_l$ and $s_r$ denote the sample values of the left and the right view. For the case of left foreground object edges ($d(x+1) > d(x)$), occlusion will occur because some samples are shifted backward. Hence the mapping from the left view to the right view is not unique: a background and a foreground value will be mapped to the same position. However, processing the input samples from left to right assures that the foreground sample value is assigned last to $x'$. At right foreground object edges ($d(x+1) < d(x)$), values from the left view are not assigned to all positions of the right view, hence disocclusions occur. To track such holes, a binary map is generated while warping, indicating the positions already filled by left-view samples.

Interpolated warping
An advantage of the fast warping approach is that the samples of one line of the input view can be processed sequentially. Thus the number of memory accesses is minimized. The idea of the interpolated warping mode is to keep this sequential processing and to incorporate an interpolation at sub-pixel positions into the warping process instead of rounding.
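The fast warping mode described above can be sketched for a single row as follows. The sketch assumes the shift direction $x' = x - d$ (rendering a right view from a left view with positive disparities in the foreground) and rounds half values upward; both are assumptions made for illustration, and `fast_warp_row` is a hypothetical name.

```python
import math


def fast_warp_row(samples, disparities):
    """Warp one row of the left view to the right view using integer shifts.

    Returns (warped, filled). filled[x] is the binary map that marks positions
    which received a sample; unfilled positions hold None.
    """
    width = len(samples)
    warped = [None] * width
    filled = [False] * width
    for x in range(width):  # left to right, so foreground samples are written last
        target = x - math.floor(disparities[x] + 0.5)
        if 0 <= target < width:
            warped[target] = samples[x]
            filled[target] = True
    return warped, filled


# A two-sample foreground object (disparity 2) in front of a background (disparity 0).
row = [10, 11, 12, 13, 14, 15]
disp = [0, 0, 0, 2, 2, 0]
warped, filled = fast_warp_row(row, disp)
```

In this example the foreground samples 13 and 14 overwrite the background at positions 1 and 2 (occlusion at the left edge), while positions 3 and 4 stay unfilled (disocclusion at the right edge) and are left to the hole filling step.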

In the first step, the positions $x'_i$ and $x'_{i+1}$ are calculated with sub-pixel accuracy using equation (7):

$$x'_i = x_i - d(x_i) \qquad (7)$$

$x'_i$ and $x'_{i+1}$ are the correct unrounded positions of the shifted samples at $x_i$ and $x_{i+1}$. In the second step, the difference $x'_{i+1} - x'_i$ is evaluated. If it falls below a lower limit, a left foreground edge starts between $x_i$ and $x_{i+1}$ that occludes the background, and it is tested whether the edge value is extrapolated to the left. If it exceeds an upper limit, a right foreground edge ends between $x_i$ and $x_{i+1}$. Here a disocclusion occurs, and it is tested whether the edge value is extrapolated to the right. In all other cases the sample values at the integer positions of the target grid between $x'_i$ and $x'_{i+1}$ are interpolated.

Left foreground edge
In case of a left foreground edge, it is evaluated whether the distance between the shifted edge sample and the previous integer sample position to its left is smaller than a given threshold (equation 8). In this case the edge value is extrapolated to the left (equation 9).

Right foreground edge
In case of a right foreground edge, it is evaluated whether the distance between the shifted edge sample and the next integer sample position to its right is smaller than a given threshold (equation 10). In this case the edge value is extrapolated to the right (equation 11).

Interpolation
All integer sample positions $x$ between $x'_i$ and $x'_{i+1}$ are linearly interpolated using equation (12):

$$s_r(x) = s_l(x_i) + \frac{x - x'_i}{x'_{i+1} - x'_i} \left( s_l(x_{i+1}) - s_l(x_i) \right) \qquad (12)$$

Similar to the fast warping approach, holes in the rendered view are tracked by a binary map indicating the positions already filled by left-view samples. This map is generated during warping.

Hole filling
Holes in the warped view emerge from disocclusions or from rounding of disparity values. For hole filling, a simple, straightforward background extension process is used. Sample positions marked as unfilled in the binary map are filled by extrapolating the value of the background object. When rendering from left to right, the background object is always located to the right of a disocclusion; therefore the value of the next warped sample can be used for filling.

Post-processing
Errors in the depth map can lead to single-pixel errors in the rendered view, known as boundary noise. To correct these pixels, a three-tap median filter in row direction can be applied.
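The hole filling and post-processing steps can be sketched for one row as below. This is a minimal illustration under assumptions: holes are filled with the next filled sample to the right, holes at the right image border (which have no right neighbor) reuse the last filled value on their left, and the median filter leaves the two border samples unchanged. The function names are hypothetical.

```python
def fill_holes_row(warped, filled):
    """Background extension: fill each hole with the next filled sample to its right."""
    out = list(warped)
    next_value = None
    for x in range(len(out) - 1, -1, -1):  # scan right to left
        if filled[x]:
            next_value = out[x]
        else:
            out[x] = next_value
    # Assumption: holes at the right border reuse the nearest value on the left.
    prev_value = None
    for x in range(len(out)):
        if out[x] is None:
            out[x] = prev_value
        else:
            prev_value = out[x]
    return out


def median3_row(row):
    """Three-tap median filter in row direction; border samples are copied."""
    out = list(row)
    for x in range(1, len(row) - 1):
        out[x] = sorted(row[x - 1:x + 2])[1]
    return out


filled_row = fill_holes_row([10, 13, 14, None, None, 15],
                            [True, True, True, False, False, True])
denoised = median3_row([10, 10, 99, 10, 10])
```

Both functions touch only one row at a time, which keeps them compatible with the line-wise processing of the renderer.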

Color planes
The implemented renderer supports direct YUV 4:2:0 processing as described in [2] as well as RGB 4:4:4 processing. However, processing in RGB color space is more complex, since it requires up-sampling of the U and V channels as well as a color space conversion.

Evaluation of results

Effects of pre-filter
Figure 3 shows a comparison between a view synthesized from unprocessed depth data (a,c) and from depth data filtered with the binomial filter (b,d). It can be seen that the pre-processing significantly reduces the artifacts on the right side of the sunshade. However, the 3D impression of the stereo view is affected as well, since the low-pass filtering extends the edges of foreground objects.

Fig. 3 Synthesized views; (a) from unfiltered depth data, (c) detail view; (b) from pre-filtered depth data, (d) detail view
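The separable binomial pre-filter evaluated above can be sketched as follows. The sketch assumes an odd number of taps and handles image borders by repeating the edge sample; neither detail is stated in the text, and the function names are illustrative.

```python
def binomial_kernel(taps):
    """Normalized binomial coefficients, e.g. 3 taps give [0.25, 0.5, 0.25]."""
    row = [1]
    for _ in range(taps - 1):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    total = sum(row)
    return [c / total for c in row]


def filter_1d(line, kernel):
    """Filter one line; samples outside the line repeat the border sample."""
    half = len(kernel) // 2
    last = len(line) - 1
    out = []
    for x in range(len(line)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(x + k - half, 0), last)
            acc += weight * line[idx]
        out.append(acc)
    return out


def binomial_filter(depth, taps=3):
    """Apply the filter horizontally, then vertically (separable filtering)."""
    kernel = binomial_kernel(taps)
    rows = [filter_1d(r, kernel) for r in depth]             # horizontal pass
    cols = [filter_1d(list(c), kernel) for c in zip(*rows)]  # vertical pass
    return [list(r) for r in zip(*cols)]                     # transpose back


smoothed = binomial_filter([[0, 0, 4, 4]], taps=3)
```

The step edge in the example is spread over two samples, which illustrates why the pre-filter reduces rendering artifacts but also extends the edges of foreground objects. The vertical pass is the part that needs access to more than one row.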

Fig. 4 Comparison of fast warping (a,c,e,g) and interpolated warping (b,d,f,h); (a,b): synthesized views of sequence horse; (c,d): detailed cutout; (e,f): histograms of effectively used disparities; (g,h): effectively used disparity maps

Comparison of fast and interpolated warping
Figure 4 depicts a comparison of the fast and the interpolated warping mode. Figures 4 (a,c,e,g) relate to the fast mode and figures 4 (b,d,f,h) to the interpolated mode. The views synthesized with the two warping modes differ in two major respects. One concerns the texture data and is depicted in figure 4 (a-d). The other concerns the depth impression and is indicated in figure 4 (e-h).

Figure 4 (c) shows artifacts originating at the left side of a foreground object's edge. Due to rounding, some samples in the foreground (horse leg) have not been filled with the sample values of the foreground object but rather with values from samples of the background. Note that such artifacts are not holes and cannot be removed by the hole filling process. In the interpolated mode, as depicted in figure 4 (d), these artifacts do not emerge. The reason for this is the continuous interpolation between two warped samples (equation 12).

The second difference between the fast and the interpolated warping mode is the depth impression in the stereo view. In the fast warping mode, disparity values are rounded to the next integer position (equation 5). The rounding enlarges the quantization step size of the disparity values. The histogram of effectively used disparities and the corresponding depth map are shown in figure 4 (e,g). It can be seen that the depth data is quantized to several distinct layers. These layers are also visible in the rendered view. Although the layers have a minor influence on the depth quality if the image content consists of a stack of objects at particular depths, they can be annoying if, e.g., a plane reaches from the foreground to the background. With the interpolated warping mode using sub-pixel accuracy, the layering is reduced to the quantization step size given by the input depth data, as shown in figure 4 (f,h). The advantage of reduced layering is of course only given if the input depth data provides sub-pixel disparities. In these cases PSNR gains of up to 1 dB have been found for some sequences. Sub-pixel disparities are available if the depth estimation for a sequence has been carried out on the full-scale sequence before down-sampling to mobile display size.

Effects of the post-filter
Figure 5 shows a comparison between a rendered view without post-processing (a,c) and a rendered view post-processed with the 3-tap median filter (b,d). The renderer was set to the fast warping mode. As explained above, in this mode some foreground samples are filled with values from background samples. It can be seen that the post-processing significantly reduces these artifacts. A disadvantage of this approach is the decreased sharpness caused by median filtering. However, due to binocular suppression effects, this loss of sharpness is subjectively reduced when watching the stereo sequence.
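The layering effect of the fast warping mode discussed in this section can be illustrated by counting the disparity levels that are effectively used. The sketch below assumes 8-bit scaled disparities with a hypothetical scaling factor of 1/32 and zero offset; these values are chosen for illustration and do not come from the experiments.

```python
import math


def effective_levels(values, scale, offset, rounding):
    """Number of distinct disparities used when rendering the given 8-bit values."""
    levels = set()
    for v in values:
        d = scale * v + offset
        levels.add(math.floor(d + 0.5) if rounding else d)
    return len(levels)


all_values = range(256)
fast_levels = effective_levels(all_values, 1 / 32, 0.0, rounding=True)
interpolated_levels = effective_levels(all_values, 1 / 32, 0.0, rounding=False)
```

Under these assumptions the rounding of the fast mode collapses the 256 input levels to 9 depth layers, while the interpolated mode keeps all 256 levels of the input data.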

Fig. 5 Synthesized views; (a) without post-processing, (c) detail view; (b) post-filtered rendered view, (d) detail view

Conclusion
A renderer suitable for low-complexity rendering on mobile devices has been presented. The renderer supports different input data formats and incorporates two modes for sample warping. The fast warping mode, designed for minimal computational complexity, is suitable for full-pixel accurate disparities. The more complex interpolated warping mode allows rendering with sub-pixel accurate disparities and provides an improved depth impression. Hole filling is carried out using line-wise background pixel filling. Pre-processing filters for the input depth and post-processing filters for the synthesized view have been implemented to reduce artifacts.

2.2 Reduction of irrelevant information from depth maps

This section presents a method for further improvement of the Video plus Depth approach for stereo video. The depth map is optimized with regard to the synthesis of a second view at stereo distance. The basic idea of the proposed method is that some signal parts of a depth map created by a depth estimation algorithm are irrelevant for rendering. Removing or damping these high-frequency parts increases coding efficiency and leads to an improved overall quality. Therefore the proposed algorithm applies a diffusion process to the depth data, considering the distortion introduced to a rendered view. Diffusion filtering has been proposed by Perona and Malik [3] and has already been applied to depth maps ([4],[5]). In contrast to the proposed method, the approaches presented in [4] and [5] use edge information from the video for depth map enhancement. The proposed approach and its individual steps are presented first. An evaluation of the approach is given in section 2.2.2, followed by the conclusion and an outlook.

Proposed Method
The main concept of the proposed approach is the smoothing of the depth map in small steps and multiple iterations. In each iteration, all samples of a frame are processed consecutively. The smoothing applied to a sample is controlled by the error introduced to the rendered view, as depicted in figure 6. Here $x$ and $y$ denote the coordinates of a sample in the frame and $n$ represents the iteration number.

Fig. 6 Iteration steps of the proposed method; subsequent to the calculation of smoothed candidate values, each sample is evaluated to determine if its candidate value is used in the processed depth map.

An iteration of the proposed method starts with the calculation of a depth map of smoothed depth value candidates from the input depth map using a diffusion approach. Subsequently, all samples are processed successively to evaluate the obtained depth value candidates. The order in which the samples are processed has an influence on the filtering result. To minimize this influence, the order is permuted for each iteration.

The depth map representing the current state of the processing is denoted $D_c$. First, $D_c$ is initialized with the input depth map $D_0$. While processing, $D_c(x,y) = D_n(x,y)$ is true for samples at positions that have already been processed in iteration $n$, and $D_c(x,y) = D_{n-1}(x,y)$ is true for samples that have not been processed. At the end of iteration $n$, $D_c(x,y)$ is equal to $D_n(x,y)$ for all $(x,y)$. The decision whether a depth candidate $\tilde{D}_n(x,y)$ at position $(x,y)$ is accepted is based on the error introduced to the view rendered from $D_c$ when changing $D_c(x,y)$ from $D_{n-1}(x,y)$ to $\tilde{D}_n(x,y)$. If the introduced error is below a threshold, the candidate value is accepted. Otherwise, the sample remains unchanged. The iterative filtering process can be terminated when the filter output converges. In the following sections the individual aspects of the proposed method are discussed in detail.

Diffusion Filtering
Smoothing is carried out using an approach similar to the diffusion process proposed by Perona and Malik in [3]. In [3] an image is smoothed by adding its locally weighted Laplacian. A weighting (diffusion) coefficient is determined from the image's gradient. It can be shown that this approach is similar to Gaussian smoothing for a constant diffusion coefficient and multiple iterations. For the proposed method the diffusion process is modified to

$$\tilde{D}_n(x,y) = D_{n-1}(x,y) + \Delta_D \cdot \operatorname{sgn}\left( \nabla^2 D_{n-1}(x,y) \right) \qquad (13)$$

with $\nabla^2$ denoting the 4-nearest-neighbors discrete 2D Laplacian operator and $\Delta_D$ denoting the quantization step size of the depth data. Thus the depth value of a sample converges to the mean of its horizontal and vertical neighbor samples with step size $\Delta_D$. The reason for the modification is the decision step. In this step a large change of depth might be rejected due to a large introduced error in the rendered view, whereas multiple small changes obtained by equation (13) might be allowed. Nevertheless, a smaller change per iteration increases the total number of required iterations.

Error Calculation
Figure 7 depicts the error calculation process. The error introduced by a change of a depth value from $D_c(x,y)$ to $\tilde{D}_n(x,y)$ at sample position $(x,y)$ is estimated as follows:

Create a depth map $D_t$ with

$$D_t(u,v) = \begin{cases} \tilde{D}_n(x,y) & \text{if } (u,v) = (x,y) \\ D_c(u,v) & \text{otherwise} \end{cases} \qquad (14)$$

Hence only the value of the sample under evaluation at position $(x,y)$ is changed, whilst all other depth samples retain their current values.

Render the output view $S_{ref}$ using $V$ and $D_0$. $V$ denotes the input video data. $S_{ref}$ is the reference view. This view has to be rendered only once.

Render the output view $S_t$ using $V$ and $D_t$. Rendering using the altered depth map must be carried out for each sample and iteration. Nevertheless, computational complexity is low, since only the image parts influenced by the sample at position $(x,y)$ must be re-rendered.

Set

$$e = \max_{(u,v)} \left( S_t(u,v) - S_{ref}(u,v) \right)^2 \qquad (15)$$

$e$ is the maximum squared error between the image rendered from processed depth data and the reference image.

The proposed approach uses the sub-pixel accurate rendering method presented in section 2.1. This method shifts the samples of the view using the disparity calculated from the depth values and interpolates the sample values at the positions of the target grid. Disocclusions are filled using a straightforward line-wise extrapolation of the boundary background sample value. Both $S_{ref}$ and $S_t$ are rendered using the coded video. This approach enables a stronger smoothing of the depth data, since details and noise removed from the video data by coding are neglected when calculating the error introduced by the modified depth map.

Fig. 7 Error calculation step of the proposed method; an intermediate depth map is created from the current map and one depth value of the candidate map; the view rendered from this depth map is compared to the reference view

Decision Step
In the decision step the introduced error $e$ is compared to a given threshold $T$. $T$ determines the maximal allowed error for a sample in the rendered view. If $e$ is higher than $T$, the diffusion step is rejected. This is summarized in equation (16), where $D_c$ is the current depth map and $\tilde{D}_n$ holds the candidate values of iteration $n$:

$$D_c(x,y) = \begin{cases} \tilde{D}_n(x,y) & \text{if } e \le T \\ D_c(x,y) & \text{otherwise} \end{cases} \qquad (16)$$

In the scope of these experiments, only the removal of irrelevant information from the depth data is targeted. Hence the threshold is set to $T = 0$. A higher threshold enables stronger smoothing but also leads to an impaired rendered view.

Iterative processing
As stated before, the diffusion of the depth map is performed by processing the samples in succession. The order in which the samples are processed has an influence on the filtering result. To minimize this influence, the order is permuted for each iteration in such a way that the distance between two consecutively processed samples is maximized. The iterative filtering process can be terminated when the filter output converges, e.g. when the difference between the depth maps of two successive iterations is below a threshold. Experiments show that approximately 100 iterations are enough to obtain a good smoothing result.
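The interplay of candidate calculation, error calculation, decision step and iterative processing can be sketched as follows. This is a strongly simplified illustration and not the report's implementation: it works on a single row, treats the depth values directly as integer disparities, uses an integer-warping stand-in renderer instead of the sub-pixel renderer, forms candidates by moving one quantization step in the direction of the sign of a 1-D Laplacian, and permutes the processing order randomly rather than maximizing the distance between consecutive samples. All names are hypothetical.

```python
import math
import random


def render_row(video, disparity):
    """Stand-in renderer: integer warping plus hole filling from the right."""
    width = len(video)
    out = [None] * width
    for x in range(width):
        target = x - math.floor(disparity[x] + 0.5)
        if 0 <= target < width:
            out[target] = video[x]
    fill = 0  # assumed value for holes at the right border
    for x in range(width - 1, -1, -1):
        if out[x] is None:
            out[x] = fill
        else:
            fill = out[x]
    return out


def sign(value):
    return (value > 0) - (value < 0)


def diffuse_row(video, depth, threshold=0, step=1, max_iterations=100, seed=0):
    """Iteratively smooth a depth row while the rendered row stays within threshold."""
    rng = random.Random(seed)
    reference = render_row(video, depth)  # rendered once from the input depth
    current = list(depth)
    order = list(range(len(depth)))
    last = len(depth) - 1
    for _ in range(max_iterations):
        previous = list(current)
        # Candidate values: one step toward the mean of the two row neighbors.
        candidates = []
        for x in range(len(previous)):
            laplacian = previous[max(x - 1, 0)] + previous[min(x + 1, last)] - 2 * previous[x]
            candidates.append(previous[x] + step * sign(laplacian))
        rng.shuffle(order)  # permuted processing order
        for x in order:
            trial = list(current)
            trial[x] = candidates[x]  # change only the sample under evaluation
            rendered = render_row(video, trial)
            error = max((a - b) ** 2 for a, b in zip(rendered, reference))
            if error <= threshold:  # decision step
                current[x] = candidates[x]
        if current == previous:  # converged
            break
    return current


video = [50, 50, 50, 50, 50, 50, 80, 80]
depth = [0, 0, 2, 0, 0, 0, 0, 0]  # isolated depth peak in a homogeneous region
smoothed = diffuse_row(video, depth)
```

With a threshold of zero, every accepted change leaves the rendered row identical to the reference, so the depth peak in the homogeneous region is damped while the view rendered from the result is unchanged. The sketch re-renders the whole row for each sample; re-rendering only the affected image parts, as described above, would reduce the cost.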

Fig. 8 Sequences Champagne Tower (left) and Book Arrival (right); (a), (d): video data; (b), (e): unprocessed depth; (c), (f): processed depth. Note that rendering using the unprocessed and the processed depth provides an identical result, since only irrelevant signal parts have been reduced.

2.2.2 Evaluation of results

Diffusion process
Figure 8 shows results of the proposed diffusion filter. A frame from the sequence Champagne Tower is depicted in figure 8 (a). The sequence is downscaled to a size of 320x240 samples, which is typical, e.g., for mobile 3D TV displays. The corresponding unprocessed depth map is shown in figure 8 (b). It can be seen that the depth in the background is very noisy. Although this noise is irrelevant for rendering and does not affect the rendering process, it leads to higher data rates when compressed with a conventional encoder. The depth map processed with the proposed algorithm is presented in figure 8 (c). Here an error threshold of $T = 0$ has been used, thus rendering with the processed and the unprocessed depth data results in the same synthesized view. Nevertheless, the noise in the background and also on the table in the foreground is removed, while the edges in the depth map that are important for correct rendering are retained.

A region clipped from the sequence Book Arrival can be seen in figure 8 (d). The full-sized sequence has a resolution of 1024x768 pixels. The unprocessed depth data shown in figure 8 (e) is currently used in MPEG exploration experiments [6]. For processing, the threshold has again been set to $T = 0$. It can be seen that irrelevant edges are removed by the proposed method. Figure 8 (f) shows that diffusion has been carried out to the left of foreground edges (marked green) and in regions with homogeneous video texture (marked blue). The reason for this filtering behavior is depicted in figures 9 and 10.

Figure 9 depicts schematically the reason for diffusion in homogeneous texture regions for one row of the input data. Figure 9 (a) shows the video samples and figure 9 (b) their disparity values. For the unprocessed case a depth peak is shown that can be regarded as noise. The shift conducted in the warping process is depicted in a $d$-$x$ space in figure 9 (c). $d$ denotes the disparity and $x$ denotes the horizontal position of a sample. In the warping process samples can move horizontally on lines defined by $x = x_0 - d$, with $x_0$ as original sample position. Note that samples with a positive disparity are in the foreground here. The final rendering result can be seen in figure 9 (d). It consists of the values of the foreground samples. The resulting gaps have been filled by hole filling from neighboring sample values. On the right side of figure 9 the rendering of processed depth data is presented. The peak in the depth data has been smoothed out in figure 9 (f). However, the rendering result shown in figure 9 (h) is the same as for the unprocessed data, and the error determined by equation (15) is zero.

The reason for diffusion to the left of foreground objects is depicted in figure 10. The input data contains an edge in the video data (figure 10 a) as well as in the unprocessed depth data (figure 10 b). The rendering result for the unprocessed data is depicted in figure 10 (d). After applying the diffusion filter, the disparity data has been smoothed next to the left side of the foreground object, as shown in figure 10 (f). However, the sample with the changed disparity value belongs to the background and is occluded. Hence the rendering result shown in figure 10 (h) is the same as for the unprocessed case. Please note that for the Champagne Tower sequence the diffusion in occluded regions has been disabled. The reason for this is described in the next section.
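The behavior at the left side of a foreground object can be reproduced with a small row example. The sketch uses an integer-warping stand-in renderer with hole filling from the right (an assumption made for brevity; the experiments use the sub-pixel renderer), and all sample values are invented for illustration.

```python
import math


def render_row(video, disparity):
    """Stand-in renderer: shift samples by x' = x - d, then fill holes from the right."""
    width = len(video)
    out = [None] * width
    for x in range(width):  # left to right, so foreground samples win
        target = x - math.floor(disparity[x] + 0.5)
        if 0 <= target < width:
            out[target] = video[x]
    fill = 0  # assumed value for holes at the right border
    for x in range(width - 1, -1, -1):
        if out[x] is None:
            out[x] = fill
        else:
            fill = out[x]
    return out


video = [50, 51, 52, 53, 90, 91, 56, 57]  # foreground object at positions 4 and 5
unprocessed = [0, 0, 0, 0, 2, 2, 0, 0]
processed = [0, 0, 0, 1, 2, 2, 0, 0]       # smoothed next to the left foreground edge
changed_foreground = [0, 0, 0, 0, 1, 2, 0, 0]

view_unprocessed = render_row(video, unprocessed)
view_processed = render_row(video, processed)
view_changed = render_row(video, changed_foreground)
```

The background sample at position 3 is occluded by the shifted foreground object in both cases, so changing its disparity leaves the rendered row unchanged. Changing the disparity of a visible foreground sample, in contrast, alters the rendered row and would be rejected in the decision step.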

MOBILE3DTV D2.6 Final report on coding algorithms for mobile 3DTV

Fig. 9 Diffusion in homogeneous regions; rendering the video samples (a), (e) using the unprocessed (b) and processed depth data (f) leads to the same results (d) and (h)

Fig. 10 Diffusion to the left of foreground objects; rendering the video samples (a), (e) using the unprocessed (b) and processed depth data (f) leads to the same results (d) and (h)

Diffusion in occluded regions

As shown before, the values of depth samples belonging to occluded regions are not important as long as the samples stay in the background. Hence a change of these samples will not result in an error, and a strong smoothing of depth data next to the edge of a foreground object occurs. Although this smoothed area does not impair edges in the rendered view for the uncoded case, it was found that impairments can occur after coding. This is caused by the block partitioning applied in the rate-distortion optimization process carried out by the encoder. For an edge smoothed to one side usually a large block size is chosen, while for sharp edges a large block is further subdivided. This effect is depicted in figure 11. In some cases the depth value of important foreground samples is better preserved in a small block in the subsequent transform and quantization steps. To avoid impairment by the changed block partitioning, smoothed sample values can be rejected for all occluded samples in the decision step.

Fig. 11 Unprocessed (a) and processed (b) depth data; samples important for rendering are marked red; for the unprocessed data a smaller block size is chosen

Coding Results

To evaluate the impact of the proposed method on compression efficiency, coding experiments have been carried out. The video and depth data of the sequences Champagne Tower and Book Arrival have been coded using the H.264/AVC Reference Software JM. The encoder has been configured to use the main profile with hierarchical B-pictures, a GOP size of 8 and an intra period of 16. The depth data has been filtered with the proposed approach using the video data coded with a QP of 30 for generation of the rendered reference view. Diffusion in occluded regions was disabled for Champagne Tower but enabled for Book Arrival. Then the processed and unprocessed depth maps have been coded.
The views rendered from the coded video together with the coded unprocessed depth, and from the coded video together with the coded processed depth, have been compared to the view rendered from uncoded and unprocessed video and depth data. The results are depicted in figure 12. Here the PSNR obtained by this comparison is plotted versus the bit rate used for depth data. Note that the maximal PSNR is also limited by the impairment caused by the coded texture. It can be seen that gains up to 0.5 dB can be achieved with the proposed method.
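The comparison described above relies on the PSNR between a rendered view and the reference view rendered from uncoded, unprocessed data. A minimal sketch of this measure for 8-bit luma samples is given below; the function name `psnr` and the use of nested lists for frames are choices of this sketch.

```python
import math


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    squared = [(r - t) ** 2
               for row_r, row_t in zip(reference, test)
               for r, t in zip(row_r, row_t)]
    mse = sum(squared) / len(squared)
    # Identical frames have no error; report infinity instead of dividing by zero.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def gain_db(reference, from_processed_depth, from_unprocessed_depth):
    """PSNR difference between the two rendered views against the same reference."""
    return psnr(reference, from_processed_depth) - psnr(reference, from_unprocessed_depth)
```

A gain read from figure 12 corresponds to `gain_db` evaluated at two operating points with the same depth bit rate.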

Fig. 12 Coding results for sequences Champagne Tower (a) and Book Arrival (b); PSNR Y of the rendered view vs. bit rate of the depth map. The view rendered from uncoded and unprocessed data is used as reference.

2.2.3 Conclusion and Outlook

A diffusion algorithm for the enhancement of depth maps in Video plus Depth coding has been presented. The diffusion process is controlled by the distortion introduced in the rendered view, taking the rendering algorithm and the coded video data into account. Hence only irrelevant high frequency parts are damped. The resulting depth maps can be coded at lower bit rates while providing the same quality in the rendering process. The applicability of the approach has been demonstrated for two sequences. PSNR gains of the rendered view of up to 0.5 dB at the same bit rate have been shown, using a view rendered from uncoded and unprocessed data as reference.

The proposed approach can be advanced in several ways: An optimization and evaluation using original views instead of rendered views as reference promises higher coding gains than presented here, since not only signal parts irrelevant for rendering but also signal parts introducing noise to the rendered view can be reduced. Possible extensions regarding the diffusion process are anisotropic diffusion filtering and diffusion in temporal direction. Moreover an adaptation to Multi View plus Depth data (MVD) is conceivable.
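The control loop summarized above, smoothing a depth sample only when the rendered view stays within an error threshold, can be sketched for a single row as follows. This is an illustration under strong assumptions and not the filter of this report: the smoothing is a plain 3-tap integer mean, the error is the sum of absolute differences of the rendered row, disparities are non-negative integers, and the compact renderer uses a target position of `x - d` with the larger disparity winning. All names are hypothetical.

```python
def render_row(video, disparity):
    """Compact integer-disparity warp with hole filling (assumed conventions)."""
    n = len(video)
    out, best = [None] * n, [-1] * n
    for x in range(n):
        t = x - disparity[x]
        if 0 <= t < n and disparity[x] > best[t]:
            out[t], best[t] = video[x], disparity[x]
    for x in range(n):
        if out[x] is None:
            cands = ([out[i] for i in range(x - 1, -1, -1) if out[i] is not None]
                     or [out[i] for i in range(x + 1, n) if out[i] is not None])
            if cands:
                out[x] = cands[0]
    return out


def diffuse_row(video, disparity, threshold=0, iterations=10):
    """Iteratively smooth disparities, keeping a change only if rendering is unaffected."""
    reference = render_row(video, disparity)
    current = list(disparity)
    for _ in range(iterations):
        changed = False
        for x in range(1, len(current) - 1):
            smoothed = (current[x - 1] + current[x] + current[x + 1]) // 3
            if smoothed == current[x]:
                continue
            trial = current[:x] + [smoothed] + current[x + 1:]
            error = sum(abs(a - b) for a, b in zip(render_row(video, trial), reference))
            if error <= threshold:  # decision step: accept only irrelevant changes
                current, changed = trial, True
        if not changed:
            break
    return current
```

With a homogeneous video row a depth peak is removed completely, while a depth edge that coincides with a texture edge is left untouched because smoothing it would change the rendered row.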

3 Multi View Coding

3.1 Software Encoder using a new Video Quality Metric

This section presents the integration of a new video quality metric (NVQM) into the JMVC Software for Multi View Coding [7]. Section 3.1.1 introduces this metric. In section 3.1.2 the basics of the rate-distortion optimization, as also implemented in the JMVC Software, are discussed. The integration of the NVQM into the JMVC Software is described and evaluated in section 3.1.3. Finally, section 3.1.4 provides the conclusion and gives a suggestion for the integration of a new stereo video quality metric (NSVQM).

3.1.1 New Video Quality Metric (PSNR-HVS)

PSNR-HVS, proposed in [8], is a full reference image quality metric. It supplies an algorithm for computing the PSNR while taking into account the peculiarities of the human visual system (HVS), thus the abbreviation PSNR-HVS. Many studies have confirmed that the HVS is more sensitive to low frequency distortions than to high frequency ones. It is also very sensitive to contrast changes and noise. The PSNR-HVS removes the mean shifting and the contrast stretching. The modified version of PSNR utilizes the decorrelation properties of the block DCT and the effect of individual DCT coefficients on the overall perception. More specifically, the modified PSNR is calculated as:

$$PSNR\text{-}HVS = 10 \log_{10} \left( \frac{255^2}{MSE_H} \right) \quad (17)$$

where $MSE_H$ is calculated taking into account the HVS features as follows:

$$MSE_H = K \sum_{i=1}^{I-7} \sum_{j=1}^{J-7} \sum_{m=1}^{8} \sum_{n=1}^{8} \left( \left( X_{ij}[m,n] - X^{e}_{ij}[m,n] \right) T_c[m,n] \right)^2 \quad (18)$$

Here, $I$ and $J$ denote the image size, $K$ is a normalization factor, $X_{ij}$ are the DCT coefficients of an image block for which the coordinates of its upper left corner are equal to $i$ and $j$, $X^{e}_{ij}$ are the DCT coefficients of the corresponding block in the original image, and $T_c$ is the matrix of correcting factors [8].

PSNR-HVS-M is designed based on PSNR-HVS, taking into account the Contrast Sensitivity Function (CSF) and the between-coefficient contrast masking of DCT basis functions [9]. The model operates with the values of the DCT coefficients of an 8x8 pixel block of an image.
For each DCT coefficient of the block the model allows calculating the maximum distortion that is not visible due to the between-coefficient masking. PSNR-HVS-M assumes that the masking degree of each coefficient depends upon its squared value (power) and on the sensitivity of the human eye to this DCT basis function, determined by means of the Contrast Sensitivity Function (CSF). Several basis functions can jointly mask one or a few other basis functions. Then, their masking effect value depends on the sum of their weighted powers. PSNR-HVS-M reduces the value of contrast masking in accordance with the proposed model [9].

The two metrics have been modified to work for both 8x8 and 4x4 block sizes by adjusting the masking coefficients in the calculation of the MSE. The availability of 4x4 blocks allows using the metric for macroblocks of H.264/MPEG-4 AVC encoders.

3.1.2 Rate-distortion optimization

In this section the rate-distortion optimization carried out by the JMVC Software is introduced. First, an overview of the encoding modes available in the H.264/MPEG-4 AVC standard is given. The rate-distortion optimized selection of one of these modes is then discussed in the parts on rate-distortion optimized mode selection and rate-distortion optimized motion estimation.
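As an illustration of the weighted block-DCT error behind PSNR-HVS (equation (18)), the sketch below computes a mean squared error over DCT coefficient differences that are multiplied by a weight matrix. It is a simplified stand-in and not the metric of [8]: non-overlapping 8x8 blocks are used, and the weight matrix is passed in by the caller because the correcting factors of [8] are not reproduced here. All function names are hypothetical.

```python
import math

N = 8  # block size assumed in this sketch


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)])
    return rows


def dct2(block, c):
    """2-D DCT of a square block, computed as C * block * C^T."""
    n = len(c)
    tmp = [[sum(c[k][x] * block[x][y] for x in range(n)) for y in range(n)] for k in range(n)]
    return [[sum(tmp[k][y] * c[l][y] for y in range(n)) for l in range(n)] for k in range(n)]


def weighted_dct_mse(original, distorted, weights):
    """Mean of ((X - X_e) * T)^2 over all non-overlapping 8x8 blocks."""
    c = dct_matrix()
    total, count = 0.0, 0
    for i in range(0, len(original) - N + 1, N):
        for j in range(0, len(original[0]) - N + 1, N):
            diff = [[distorted[i + m][j + n] - original[i + m][j + n] for n in range(N)]
                    for m in range(N)]
            coeffs = dct2(diff, c)  # the DCT is linear, so the difference is transformed once
            total += sum((coeffs[m][n] * weights[m][n]) ** 2 for m in range(N) for n in range(N))
            count += N * N
    return total / count


def psnr_from_mse(mse, peak=255.0):
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

With all weights set to one the result equals the ordinary MSE, since the orthonormal DCT preserves energy; a perceptual weight matrix then emphasizes or attenuates individual frequencies.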

Macroblock encoding modes and partitions

The H.264/MPEG-4 AVC standard supports different modes to encode a macroblock [10], [11]. These modes provide different options to split the macroblock into partitions and to predict these partitions. Whether a mode is available depends on the slice type, the position of the macroblock in the slice and the selected codec profile.

Partitionings possible for encoding a macroblock in an I-slice are shown in figure 13. Note that the 8x8 modes are only available for the high profile. The prediction is carried out using sample values from the borders of already coded partitions located on the left, top, and top right of the current partition. The four prediction modes supported for the 16x16 partitioning are depicted in figure 14. The direction of the prediction is indicated by the red arrows.

Fig. 13 Macroblock partitioning for Intra coded macroblocks; note that the 8x8 partitioning is only possible in the high profile.

Fig. 14 The four prediction modes for 16x16 partitions.

Fig. 15 The nine prediction modes for 4x4 partitions.

Inter predicted macroblocks are encoded using a motion compensated prediction from one or more reference frames that have been encoded prior to the current frame. To increase the prediction quality the macroblock can be split up into multiple partitions. A motion compensated predictor is estimated for each of the partitions. Possible partitions are depicted in figure 16. The 8x8 partition can be further split into one, two or four sub-partitions. Moreover the H.264/MPEG-4 AVC standard provides a skip mode for macroblocks in P and B slices and a direct mode for macroblocks in B slices. For both modes the motion vectors are inferred from adjacent blocks. In the skip mode no residual data is transmitted.

Fig. 16 Macroblock partitioning for Inter predicted macroblocks

Rate-distortion optimized mode selection

The target of the rate-distortion optimized mode selection is the choice of a macroblock partitioning and prediction mode that globally minimizes the rate given a fixed distortion, or minimizes the distortion given a fixed rate. For this purpose a Lagrangian optimization is commonly used, as for example described in [12]. The Lagrangian optimization targets the minimization of the rate-distortion functional defined as

$$J(M) = D(M) + \lambda_{mode} \cdot R(M) \quad (19)$$

with $M$ denoting a mode under test. $R(M)$ and $D(M)$ represent the rate and the distortion obtained when encoding a macroblock using mode $M$. The rate-distortion optimized selection is carried out by coding a macroblock in all possible modes and finally using the mode providing the minimal cost. The Lagrange multiplier $\lambda_{mode}$ controls the tradeoff between the rate and the distortion. An optimization of $\lambda_{mode}$ leads to a globally optimized rate-distortion characteristic. The Lagrange multiplier found to be optimal when using the sum of squared differences (SSD) between the original data and the encoded macroblock was determined in [12] to be proportional to the squared quantization step size,

$$\lambda_{mode} \propto \Delta^2 \quad (20)$$

with $\Delta$ as quantization step size. Thus the optimal Lagrange multiplier depends on the quantization step size.
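In the JMVC Software for Multiview Coding [7] the Lagrange multiplier $\lambda_{mode}$ is determined as a function of $QP$ (equation (21)), with $QP$ denoting the quantization parameter of the JMVC Software. With the approximation of the quantization step size in terms of $QP$ it can be seen that this multiplier is again proportional to the squared quantization step size, as presented in equation (20).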
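The Lagrangian mode decision and the relation between the multiplier and the quantization step size can be sketched as follows. The concrete expressions are assumptions of this sketch and are not taken from the JMVC Software: `lambda_mode` uses the form 0.85 * 2^((QP - 12) / 3) that is common in H.264 reference encoders, and `q_step` uses the approximation that the step size doubles every six QP values. The candidate modes and their distortion and rate figures are made up for illustration.

```python
def q_step(qp):
    """Approximate quantization step size; doubles every 6 QP steps (assumed form)."""
    return 2.0 ** ((qp - 4) / 6.0)


def lambda_mode(qp):
    """Assumed Lagrange multiplier for SSD-based mode decision."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def select_mode(candidates, lam):
    """candidates: iterable of (name, distortion, rate_bits); returns the minimum-cost entry."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Hypothetical candidates for one macroblock: (mode, SSD, bits).
candidates = [("skip", 900, 1), ("16x16", 400, 40), ("8x8", 250, 120)]
low_qp_choice = select_mode(candidates, lambda_mode(20))[0]
high_qp_choice = select_mode(candidates, lambda_mode(42))[0]
```

Under these assumptions the ratio `lambda_mode(qp) / q_step(qp) ** 2` is the same for every QP, which is the proportionality expressed by equation (20). A larger multiplier shifts the decision towards cheaper modes.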

Rate-distortion optimized motion estimation

When coding a macroblock using inter prediction, motion compensation is carried out to generate an optimal predictor. In the motion compensation process motion vectors are estimated using a rate-distortion optimized approach by minimizing the functional

$$J(m) = D_{SAD}(m) + \lambda_{motion} \cdot R(m) \quad (22)$$

with $R(m)$ denoting the rate needed to code the motion vector $m$. $D_{SAD}(m)$ is the sum of absolute differences (SAD) between the predictor and the partition to encode. When using the SAD, the Lagrange multiplier must be set to [12]:

$$\lambda_{motion} = \sqrt{\lambda_{mode}} \quad (23)$$

3.1.3 Rate-distortion optimization using the new VQM

This section gives a brief overview of how the rate-distortion optimization is implemented in the JMVC Software and what changes had to be carried out to integrate the new video quality metric. The performance of the new video quality metric for Intra coding as well as for Inter and Intra coding is evaluated by coding experiments. Results from these experiments are used to optimize the Lagrange multiplier. Finally a QP dependent correction factor for the Lagrange multiplier is investigated.

Integration into the JMVC Software for Multiview Coding

Overview of the encoding process of the JMVC Software

The hierarchy of MVC encoding modules is depicted in figure 17. In the H.264/AVCEncoderTest class the encoder is initialized and frame buffers are set up. Subsequently the frames of the sequence are processed. In the CreaterH264AVCEncoder class further objects used in the encoding process are initialized. The PicEncoder class initializes the slice header, sets up the Lagrange multiplier depending on the used QP and finally starts the encoding of a frame or a field. The reference frames used by the current frame or field are set in the SliceEncoder class. Moreover the processing of slice groups and macroblocks is started. Single macroblocks are finally encoded using the MbEncoder class.

Fig. 17 Structure of the encoding process in the JMVC Software.
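The rate-distortion optimized motion search of equations (22) and (23) can be sketched in one dimension as follows. This is not code from the JMVC Software: the search is exhaustive over integer displacements, and the rate of a motion vector is approximated by the length of a signed Exp-Golomb code word, which is an assumption of this sketch. All names are hypothetical.

```python
import math


def sad(a, b):
    """Sum of absolute differences between two equally long sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mv_bits(mv):
    """Hypothetical rate proxy: length of a signed Exp-Golomb code word."""
    code_num = 2 * abs(mv) - (1 if mv > 0 else 0)
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def motion_search(block, reference, position, search_range, lambda_mode):
    """Return (mv, cost) minimizing SAD + lambda_motion * bits over the search range."""
    lambda_motion = math.sqrt(lambda_mode)  # multiplier used together with the SAD
    best = None
    for mv in range(-search_range, search_range + 1):
        start = position + mv
        if start < 0 or start + len(block) > len(reference):
            continue  # candidate lies outside the reference row
        cost = sad(block, reference[start:start + len(block)]) + lambda_motion * mv_bits(mv)
        if best is None or cost < best[1]:
            best = (mv, cost)
    return best
```

For a small multiplier the search returns the displacement with the best match; for a very large multiplier the cheaper zero vector can win even though its SAD is higher.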

Structure of rate-distortion optimization in the JMVC Encoder

Fig. 18 Hierarchy of rate-distortion optimization search

The structure of the rate-distortion optimization carried out in the MacroBlockEncoder module is shown in figure 18. The search for the mode that provides the minimum rate-distortion cost is carried out hierarchically. At the highest level the possible partitioning sizes, as given in the description of the macroblock encoding modes and partitions, are tested. The Skip Mode is tested for P-slices only. The Direct Mode is only tested for B-slices. Inter prediction is carried out for P- and B-slices and intra prediction for I-, P- and B-slices. Four or nine prediction modes are tested for Intra coded macroblocks. For each inter coded macroblock partition a search is carried out to find an optimal reference frame and motion vector. For inter coded macroblocks further subdivisions are tested. Moreover a transform size of 8x8 is evaluated for the inter coded macroblocks when using the high profile (EstimateMb8x8Frext). The search process performed in inter coding is depicted in figure 19 and equation (20).

The JMVC Software supports different video quality metrics at different levels of the rate-distortion optimization process. In blocks marked orange or red in figure 18 the sum of squared errors (SSE) is used to determine the distortion. In the calculation of the rate-distortion cost $\lambda_{mode}$ is used. To determine the video quality in the motion estimation process JMVC provides a choice of block difference calculations, i.e. between SAD, SSE, HADAMARD and SAD-YUV for full pixel accurate motion estimation (marked green) and a choice of SAD, SSE and HADAMARD for subpixel accurate motion estimation (marked blue). If the SSE is used in motion estimation, $\lambda_{mode}$ is used for the computation of the rate-distortion costs. Otherwise, $\lambda_{motion}$ (equation (23)) is used.

The computation of the distortion is realized in the XDistortion class of the JMVC Software. This class provides member functions for distortion computation for different block sizes. To choose a VQM, a parameter can be passed to these functions. A pointer to a function implementing the selected VQM for the particular block size is then selected and called. The XRateDistortion class of the JMVC Software provides functions to calculate the rate-distortion cost given a particular distortion and rate. The Lagrange multiplier for mode selection as well as for motion estimation is a member of this class.

Fig. 19 Inter Search of JMVC Encoder

Changes to the JMVC Encoder

Functions to compute the new metric as given in equation (18) for different block sizes have been added to the XDistortion class of the JMVC Software. The PSNR-HVS as well as the PSNR-HVS-M can be used in the rate-distortion optimization. However, the encoder has been optimized for PSNR-HVS. Hence in the following, the new video quality metric refers to the PSNR-HVS. The minimum block size in H.264/AVC is 4x4 samples. This requires a computation of the new distortion metric for 4x4 blocks in the rate-distortion optimization. To have a consistent metric it was chosen to calculate the distortion for a larger block by summing the distortions of its sub-blocks. A new parameter VQMMode has been added to the encoder configuration to decide if and to what extent the new video quality metric is used.
Depending on that parameter the quality metric is chosen at different levels of the hierarchical search process. Three different settings are possible: The first setting is to use the new metric for I-frames only (orange blocks in figure 18). The second setting allows the usage of the new metric for intra and inter mode decision (orange and red blocks in figure 18 and figure 19). The third setting enables the use of the new video quality metric for intra and inter mode decision as well as for motion estimation.

An optimization of the Lagrange multiplier is presented in [12] and gives equation (20). However, this optimization has been carried out for SSE. To find an optimal Lagrange multiplier for the new video quality metric, two additional parameters have been added to the JMVC encoder configuration. These parameters are called LambdaScale and LambdaScaleME. They are members of the RateDistortion class and scale the Lagrange multiplier. The linear scaling has been used based on the assumption that the proportionality between the squared quantization step size and the optimal Lagrange multiplier, as given in equation (20), is still valid for the new video quality metric.
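The three ideas described above, selecting a distortion function by a parameter, summing sub-block distortions for larger blocks, and scaling the Lagrange multiplier, can be sketched as follows. This is not the C++ code of the JMVC Software; the dictionary lookup only mimics the function-pointer selection, and every name except the configuration parameter LambdaScale is hypothetical.

```python
def sse(orig, rec):
    """Sum of squared errors between two blocks given as lists of rows."""
    return sum((o - r) ** 2 for ro, rr in zip(orig, rec) for o, r in zip(ro, rr))


def sad(orig, rec):
    """Sum of absolute differences between two blocks given as lists of rows."""
    return sum(abs(o - r) for ro, rr in zip(orig, rec) for o, r in zip(ro, rr))


METRICS = {"SSE": sse, "SAD": sad}  # a new metric would be registered here


def block_distortion(orig, rec, metric="SSE", sub=4):
    """Distortion of a block as the sum over its sub x sub sub-blocks."""
    func = METRICS[metric]
    total = 0
    for i in range(0, len(orig), sub):
        for j in range(0, len(orig[0]), sub):
            o = [row[j:j + sub] for row in orig[i:i + sub]]
            r = [row[j:j + sub] for row in rec[i:i + sub]]
            total += func(o, r)
    return total


def rd_cost(distortion, rate, lam, lambda_scale=1.0):
    """Rate-distortion cost with a linearly scaled Lagrange multiplier."""
    return distortion + lambda_scale * lam * rate
```

For additive metrics such as SSE the sum over sub-blocks equals the distortion of the whole block. For a block-transform-based metric the summation is what defines the distortion of the larger block.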

Evaluation of the new VQM in the rate-distortion optimization process

The performance of the new video quality metric in the rate-distortion optimization process has been evaluated for the different cases that can be set by the VQMMode parameter. An overview of the setups is given in table 1. NVQM denotes the new video quality metric. To show the gains achieved by optimizing for the new video quality metric, encoding with the conventional distortion metrics (SSE, SAD) has been carried out for reference. The test sequences Car, Horse, Butterfly, Mountain, Soccer2 and Bullinger from the coding test set of the stereo video database [13] have been used. The sequences have been coded with the quantization parameter of the encoder varying from 20 to 42 with a step size of 3. To evaluate the influence of different Lagrange multipliers, the LambdaScale parameter has been set to 0.25, 0.5, 0.75, 1 and 2. For the inter encoding test the period of I-frames has been set to 8 and an IPPP GOP structure has been chosen.

Tab. 1 Encoder setups used for evaluation

Intra mode decision

Results for the intra encoding setup are depicted in figure 20. The sequences have been encoded using I-frames only. The black curve shows results for the reference encoding setup. It can be seen that using the new video quality metric in the rate-distortion optimization process leads to gains for all sequences. Gains increase for higher bit rates. The maximum gain is achieved for the Mountain sequence. An evaluation of the influence of the LambdaScale parameter shows that for high bit rates a low LambdaScale is optimal. In contrast, a high LambdaScale is optimal for low bit rates. Hence the assumption of a linear relationship between the optimal Lagrange multiplier and the multiplier determined by equation (21) does not hold any more.

Intra and Inter mode decision

Figure 21 shows the results for the Inter and Intra configuration as given in table 1.
Note that for this configuration the LambdaScaleME parameter used in motion estimation was set to a value that differs from the LambdaScale parameters. Conclusions here are similar to those found for the Intra mode only. The optimal LambdaScale decreases for increasing bit rates and gains increase for higher bit rates.

Intra and Inter mode decision and motion estimation

Results for the Intra, Inter and Motion Estimation configuration are shown in figure 22. The LambdaScaleME parameter is equal to the LambdaScale parameter. It can be seen that gains decrease when the new video quality metric is used in the motion estimation process. This might be due to the energy of the residual attained with the new video quality metric. Using the new quality metric in the rate-distortion optimized motion estimation leads to a predictor that minimizes the new video quality metric but does not minimize the energy of the residual signal. The additional rate used to encode the residual data might lead to the observed decrease of the overall performance.

Fig. 20 Evaluation of Intra configuration only; average NVQM-Y of both views vs. total bit rate

Fig. 21 Evaluation of Inter and Intra configuration; average NVQM-Y of both views vs. total bit rate

Fig. 22 Evaluation of Inter and Intra and ME configuration; average NVQM-Y of both views vs. total bit rate

Optimization of the Lagrange multiplier

The evaluation of encoding results shows that the assumption of a linear relationship between the optimal Lagrange multiplier and the $\lambda$ attained from equation (21) does not hold. The encoding process cannot be optimized by selecting a constant scale for $\lambda$ independent of the QP. In [12] the relationship between the Lagrange multiplier and the QP is determined by fixing $\lambda$ and encoding macroblocks multiple times with different QPs to find the optimal combination of QP and $\lambda$. This approach could be repeated for the new video quality metric. However, for reasons of simplicity only an optimal correction factor (Lambda Scale) for $\lambda$ is calculated here. For optimal encoder performance $\lambda$ must be corrected by a factor $s(QP)$ depending on the QP value. The final Lagrange multiplier is then

$$\lambda_{NVQM} = s(QP) \cdot \lambda \quad (24)$$

with $\lambda$ from equation (21). The coding experiments provide the rate $R(s, QP)$ and the distortion $D(s, QP)$ for several combinations of the scaling factor $s$ and the $QP$. The optimal relationship between $s$ and $QP$ can now be obtained by evaluation of several rate points. Given the set of combinations of $s$ and $QP$ that produce the rate $R_i$ as

$$S_i = \{ (s, QP) \mid R(s, QP) = R_i \}, \quad (25)$$

the optimal combination at rate point $R_i$ is given by

$$(s, QP)_{opt,i} = \arg\min_{(s, QP) \in S_i} D(s, QP). \quad (26)$$

Hence, $(s, QP)_{opt,i}$ is the combination that minimizes the distortion at rate $R_i$.

Figure 23 depicts the optimization procedure for the sequence Car. To determine the rate and the distortion, coding experiments have been carried out varying the Lambda Scale with a step size of 0.1 and the QP with a step size of two. Intermediate values have been interpolated. Sets of combinations leading to equal rates can be obtained from figure 23 (a). These iso-rate lines are marked black. The combinations minimizing the distortion by maximizing the NVQM can be found in figure 23 (b) on the iso-rate lines and are highlighted by red circles.
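The search along an iso-rate line described by equations (25) and (26) can be sketched as follows. For every tested Lambda Scale the measured (QP, rate, quality) points are interpolated linearly at the target rate, and the scale with the highest interpolated quality is selected. The data layout, the function names and the numbers used for illustration are assumptions of this sketch and are not measurements from this report.

```python
def interpolate_at_rate(points, target_rate):
    """points: list of (qp, rate, quality) with the rate falling as the QP rises.

    Returns (qp, quality) linearly interpolated at target_rate, or None if the
    target rate lies outside the measured range.
    """
    for (qp0, r0, q0), (qp1, r1, q1) in zip(points, points[1:]):
        if r1 <= target_rate <= r0:
            t = 0.0 if r0 == r1 else (r0 - target_rate) / (r0 - r1)
            return qp0 + t * (qp1 - qp0), q0 + t * (q1 - q0)
    return None


def best_combination(measurements, target_rate):
    """measurements: {scale: points}. Returns (scale, qp, quality) with maximal quality."""
    best = None
    for scale, points in measurements.items():
        hit = interpolate_at_rate(points, target_rate)
        if hit is not None and (best is None or hit[1] > best[2]):
            best = (scale, hit[0], hit[1])
    return best


# Made-up measurements for two Lambda Scale values: (QP, rate in kbit/s, NVQM in dB).
measurements = {
    0.5: [(20, 1000, 40.0), (30, 500, 36.0), (40, 200, 30.0)],
    1.0: [(20, 800, 39.0), (30, 400, 35.5), (40, 150, 31.0)],
}
choice_at_600 = best_combination(measurements, 600)
```

Repeating the search for several target rates yields one optimal (Lambda Scale, QP) pair per rate point, which is the relationship plotted for the real data.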

Fig. 23 Optimization of the Lambda Scale; (a), (b); the black lines mark points with equal rate; combinations leading to minimal distortions are marked red

The relationship between the QP and the Lambda Scale for different rates is depicted in figure 24 (b). The blue lines mark combinations leading to the same rate. Again, the combinations maximizing the NVQM from a particular set are marked red. This can be seen from figure 24 (c). Here the NVQM-Y is plotted versus the Lambda Scale. As found before, the optimal Lambda Scale converges to small values for high rates and increases for low rates. Moreover it can be observed that the Lambda Scale has a minor influence on the distortion for low rates. The iso-rate lines run almost horizontally. The spike that can be observed in the NVQM-Y curve at low rates results from this. Due to the small change in the NVQM-Y, noise introduced by the numerical solution of equation (26) has a strong influence on the determined Lambda Scale. However, since the change in distortion is very small, this effect has only a minor influence on the optimization of the Lagrange multiplier. For the sake of completeness the relationship between the QP and the NVQM-Y is depicted in figure 24 (a).

Fig. 24 Optimization of the Lambda Scale; (a) relationship between NVQM and QP; (b) relationship between QP and Lambda Scale; (c) relationship between Lambda Scale and NVQM; the blue lines mark points with equal rate; combinations leading to minimal distortions are marked red

Fig. 25 Relationship between Lambda Scale and QP for all sequences of the test set

The optimal combinations of QP and Lambda Scale have been determined for all sequences of the coding test set. Results can be found in figure 25. The figure shows that the optimal relationship between QP and Lambda Scale is sequence-dependent and varies up to 0.5. However, the optimal Lambda Scale increases for all sequences with increasing QP. An approximation of the relationship depicted in figure 25 is given by equation (27). Using equation (27) together with equation (24) and equation (21) leads to a changed Lagrange multiplier for the NVQM, given by equation (28).

Evaluation of Results

Coding results using the Lagrange multiplier as calculated from equation (28) are depicted in figure 26. Gains compared to the reference can especially be achieved at high rates and range up to 1.6 dB. The correction of the Lagrange multiplier by the QP dependent Lambda Scale makes it possible to maximize the encoder performance compared to a correction with a constant Lambda Scale. This can be seen e.g. for the Horse sequence. Here the scale of 0.25 provides the best results at 1450 kbit/s and a different scale at 600 kbit/s. With the QP dependent Lambda Scale the encoder operates optimally at both rate points. However, due to the sequence dependency of the QP dependent scale, the approximation is not optimal for all sequences. This effect can be observed e.g. for the Mountain sequence at 1600 kbit/s. Here a scale of 0.25 would increase the bit rate. In addition to the evaluation using the NVQM, evaluations using the PSNR have been carried out for comparison to the reference. On average a decrease of the PSNR has been found.
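How a QP dependent correction factor can be derived from optimal (QP, Lambda Scale) pairs and applied to the Lagrange multiplier (equation (24)) is sketched below. The approximation function of this report is not reproduced here. The sketch assumes a straight-line fit purely for illustration, the sample points are placeholders rather than values read from figure 25, and `base_lambda` stands in for the multiplier of equation (21) using a form that is common in H.264 reference encoders.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + offset to a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def base_lambda(qp):
    """Assumed stand-in for the unscaled Lagrange multiplier."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def scaled_lambda(qp, slope, offset):
    """Lagrange multiplier corrected by a QP dependent scale (never negative)."""
    scale = max(slope * qp + offset, 0.0)
    return scale * base_lambda(qp)


# Placeholder (QP, optimal Lambda Scale) pairs: the scale grows with the QP.
placeholder_points = [(20, 0.25), (30, 0.75), (40, 1.25)]
slope, offset = fit_line(placeholder_points)
```

Because the optimal relationship is sequence-dependent, any single fitted curve is a compromise over the test set.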

Fig. 26 Coding results attained with a QP dependent correction factor for the Lagrange multiplier

3.1.4 Conclusion and Outlook

A new video quality metric, the PSNR-HVS, has been integrated into the JMVC Software for Multiview coding. For this purpose the distortion classes, the rate-distortion interface and the Macroblock-Encoding class have been modified. Coding experiments have been carried out to evaluate the gains achieved by the modified rate-distortion optimization. Constant scaling factors for the Lagrange multiplier used in the rate-distortion optimization have been evaluated. A QP dependent correction factor for the Lagrange multiplier has been determined for the NVQM. With the optimized Lagrange multiplier the rate-distortion optimization process leads to gains up to 1.6 dB at high bit rates using the new video quality metric, compared to an encoder using the SSD for optimization. Since the new video quality metric has been designed to emulate the human visual system [8], a subjectively increased video quality can be assumed as well.

The integrated 2D video quality metric is a first step towards an encoder using a stereo video quality metric. The new metric can be used to optimize the quality of the first coded view. The encoder for the second view could use rate-distortion optimization regarding an error calculated from the second view together with the already coded first view. The concept of this approach is depicted in figure 27.

Fig. 27 Concept for an encoder using a new stereo video quality metric (NSVQM)

In the first step the first view is encoded using the new video quality metric. In the second step, the second view is encoded using a new stereo video quality metric. This metric utilizes four inputs to the rate-distortion optimization process: the currently tested macroblock, the original second view, the original first view and the reconstructed first view. Further extensions could include a rate-distortion optimized quantization using the new metric and an optimization targeting larger blocks.

4 Overview of coding algorithms for mobile 3DTV

This section gives an overview of the evaluated coding methods. The evaluations and results of this and the other Mobile3DTV deliverables on stereo video coding are summarized.

4.1 Overview of representation format and coding approaches

Stereo video can be represented in different formats, namely Conventional Stereo Video (CSV), Mixed Resolution Stereo (MRS) and Video plus Depth (V+D). These formats are depicted in figure 28 and are discussed in the part on representation formats. For the stereoscopic representation formats, different standardized coding methods exist. These coding methods are AVC Simulcast, AVC with Stereo SEI-Message, AVC Auxiliary Picture Syntax, MPEG-C part 3 with AVC and MVC. An overview can be found in figure 28 and in the part on coding approaches. They can be applied to the representation formats. However, not every combination is practical. Reasonable combinations are listed in table 2.

Tab. 2 Reasonable combinations of representation formats and coding methods

Representation Formats

The following sections provide a detailed description of commonly used representation formats. An initial graphical overview is given in figure 28, where three representation formats are shown.

Conventional Stereo Video

Stereo video consists of a pair of sequences showing the same scene for the right and the left eye view, as shown in figure 28 (left). Compared to conventional monoscopic video, stereo video has twice the amount of data to be stored or transmitted. Especially for mobile video services with their bandwidth and memory limitations, very efficient compression of stereo video is required to realize 3D instead of conventional 2D video. However, efficient compression of stereo video takes advantage of the fact that the left and the right view of a stereo pair show the same scene from slightly different perspectives and are therefore highly redundant.
For CSV, the representation format equals the display format, such that no conversion processing is required.

Fig. 28 Representation formats and their processing to a common display format: Conventional Stereo Video (left), Mixed Resolution Stereo (middle) and Video+Depth (right)

Mixed Resolution Stereo

A reduction of the transmission rate can be achieved by exploiting the binocular suppression theory [16]. In a stereo sequence where the sharpness of the left eye and right eye view differ (see figure 28, middle), the perceived binocular quality of a stereoscopic sequence was rated close to the sharper view [17], [18]. In contrast, if both views exhibit different amounts of blocking artifacts, the binocular quality of a stereoscopic sequence was rated close to the mean quality of both views. This leads to the assumption that for a stereoscopic sequence in which one view has a reduced resolution (mixed resolution representation, MR), the same subjective quality is perceived as in the full resolution (FR) case. Thus, a lower bit rate at equal quality is achievable for MR. For the conversion of MRS into the 2-view stereoscopic display format, post processing in the form of upsampling is required. Advancements achieved for the Mixed Resolution Stereo representation are described in a separate section.

Video plus Depth Representation

The video plus depth format consists of a conventional monoscopic color video and an associated per pixel depth map (figure 28, right), which can be regarded as a monochromatic, luminance-only video signal. Thus, a lower bit rate is achievable. The depth data is usually generated by depth/disparity estimation from a captured stereo pair. Such algorithms can be highly complex and are still error-prone. The advantage of this format is the possible baseline variation, such that stereo pairs with baselines other than that of the original camera pair can be generated. Of the presented formats, this one requires the most complex conversion from representation to display format. Here, view synthesis is used to generate the second view from the V+D format.
The major challenge is the visual quality of the synthesized view, as rendering artifacts may result in a wrong and thereby annoying 3D impression in case the left and right view are inconsistent. Advancements achieved for the Video plus Depth representation are described in a separate section.
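The baseline variation offered by the Video plus Depth format can be illustrated by the conversion from a stored depth value to a disparity. The sketch below assumes the widely used convention in which an 8-bit depth value quantizes the inverse depth between a near and a far clipping plane, and a parallel camera setup in which the disparity equals focal length times baseline divided by depth. The function and parameter names and the example values are hypothetical.

```python
def depth_value_to_disparity(v, focal_length, baseline, z_near, z_far):
    """Disparity in pixels for an 8-bit inverse-depth value v (assumed convention)."""
    # v = 255 maps to the near plane, v = 0 to the far plane.
    inverse_depth = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_length * baseline * inverse_depth


# Example values (hypothetical): focal length in pixels, distances in metres.
near_disparity = depth_value_to_disparity(255, 1000.0, 0.05, 1.0, 10.0)
far_disparity = depth_value_to_disparity(0, 1000.0, 0.05, 1.0, 10.0)
half_baseline = depth_value_to_disparity(255, 1000.0, 0.025, 1.0, 10.0)
```

Because the disparity scales linearly with the baseline, a stereo pair with a smaller or larger baseline than the original camera pair is obtained by scaling the disparities before view synthesis.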

Coding Approaches

Fig. 29 Coding approaches suitable for Mobile3DTV

AVC Simulcast

A simple coding method for stereo content is H.264/AVC Simulcast. It is specified as the individual application of an H.264/AVC conforming coder to several video sequences in a generic way [11]. H.264/AVC is the latest video coding standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). H.264/AVC has recently become the most widely adopted video coding standard and covers all common video applications ranging from mobile services and videoconferencing to IPTV, HDTV, and HD video storage. For stereo video the overview diagram in figure 29 illustrates the coding procedure of H.264/AVC Simulcast with the left and right view of a stereo pair. The H.264/AVC encoder is applied to each of the two input sequences independently, resulting in two encoded bit streams or transport streams (BS/TS). After transmission over the channel the two streams are decoded independently, resulting in the distorted video sequences of the stereo pair. AVC Simulcast can be applied to all representation formats.

AVC with Stereo SEI-Message

According to the H.264/AVC standard [11], the Stereo video information SEI message is specified as follows: This SEI message provides the decoder with an indication that the entire coded video sequence consists of pairs of pictures forming stereo-view content. It defines six flags to control the mapping of frames or fields of the coded video sequence to the left and right

Development and optimization of coding algorithms for mobile 3DTV. Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci

Development and optimization of coding algorithms for mobile 3DTV. Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci Development and optimization of coding algorithms for mobile 3DTV Gerhard Tech Heribert Brust Karsten Müller Anil Aksay Done Bugdayci Project No. 216503 Development and optimization of coding algorithms

More information

Advanced Video Coding: The new H.264 video compression standard

Advanced Video Coding: The new H.264 video compression standard Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission

More information

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC29/WG11 MPEG2011/N12559 February 2012,

More information

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Course Presentation Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Video Coding Correlation in Video Sequence Spatial correlation Similar pixels seem

More information

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

Implementation and analysis of Directional DCT in H.264

Implementation and analysis of Directional DCT in H.264 Implementation and analysis of Directional DCT in H.264 EE 5359 Multimedia Processing Guidance: Dr K R Rao Priyadarshini Anjanappa UTA ID: 1000730236 priyadarshini.anjanappa@mavs.uta.edu Introduction A

More information

Coding of 3D Videos based on Visual Discomfort

Coding of 3D Videos based on Visual Discomfort Coding of 3D Videos based on Visual Discomfort Dogancan Temel and Ghassan AlRegib School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA, 30332-0250 USA {cantemel, alregib}@gatech.edu

More information

3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013

3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3366 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 9, SEPTEMBER 2013 3D High-Efficiency Video Coding for Multi-View Video and Depth Data Karsten Müller, Senior Member, IEEE, Heiko Schwarz, Detlev

More information

Depth Estimation for View Synthesis in Multiview Video Coding

Depth Estimation for View Synthesis in Multiview Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Depth Estimation for View Synthesis in Multiview Video Coding Serdar Ince, Emin Martinian, Sehoon Yea, Anthony Vetro TR2007-025 June 2007 Abstract

More information

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H. EE 5359 MULTIMEDIA PROCESSING SPRING 2011 Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.264 Under guidance of DR K R RAO DEPARTMENT OF ELECTRICAL ENGINEERING UNIVERSITY

More information

Video Quality Analysis for H.264 Based on Human Visual System

Video Quality Analysis for H.264 Based on Human Visual System IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021 ISSN (p): 2278-8719 Vol. 04 Issue 08 (August. 2014) V4 PP 01-07 www.iosrjen.org Subrahmanyam.Ch 1 Dr.D.Venkata Rao 2 Dr.N.Usha Rani 3 1 (Research

More information

New Techniques for Improved Video Coding

New Techniques for Improved Video Coding New Techniques for Improved Video Coding Thomas Wiegand Fraunhofer Institute for Telecommunications Heinrich Hertz Institute Berlin, Germany wiegand@hhi.de Outline Inter-frame Encoder Optimization Texture

More information

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Efficient MPEG- to H.64/AVC Transcoding in Transform-domain Yeping Su, Jun Xin, Anthony Vetro, Huifang Sun TR005-039 May 005 Abstract In this

More information

Digital Video Processing

Digital Video Processing Video signal is basically any sequence of time varying images. In a digital video, the picture information is digitized both spatially and temporally and the resultant pixel intensities are quantized.

More information

Rate Distortion Optimization in Video Compression

Rate Distortion Optimization in Video Compression Rate Distortion Optimization in Video Compression Xue Tu Dept. of Electrical and Computer Engineering State University of New York at Stony Brook 1. Introduction From Shannon s classic rate distortion

More information

Optimizing the Deblocking Algorithm for. H.264 Decoder Implementation

Optimizing the Deblocking Algorithm for. H.264 Decoder Implementation Optimizing the Deblocking Algorithm for H.264 Decoder Implementation Ken Kin-Hung Lam Abstract In the emerging H.264 video coding standard, a deblocking/loop filter is required for improving the visual

More information

Module 7 VIDEO CODING AND MOTION ESTIMATION

Module 7 VIDEO CODING AND MOTION ESTIMATION Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

LIST OF TABLES. Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46. Table 5.2 Macroblock types 46

LIST OF TABLES. Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46. Table 5.2 Macroblock types 46 LIST OF TABLES TABLE Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46 Table 5.2 Macroblock types 46 Table 5.3 Inverse Scaling Matrix values 48 Table 5.4 Specification of QPC as function

More information

High Efficiency Video Coding. Li Li 2016/10/18

High Efficiency Video Coding. Li Li 2016/10/18 High Efficiency Video Coding Li Li 2016/10/18 Email: lili90th@gmail.com Outline Video coding basics High Efficiency Video Coding Conclusion Digital Video A video is nothing but a number of frames Attributes

More information

Using animation to motivate motion

Using animation to motivate motion Using animation to motivate motion In computer generated animation, we take an object and mathematically render where it will be in the different frames Courtesy: Wikipedia Given the rendered frames (or

More information

CMPT 365 Multimedia Systems. Media Compression - Video

CMPT 365 Multimedia Systems. Media Compression - Video CMPT 365 Multimedia Systems Media Compression - Video Spring 2017 Edited from slides by Dr. Jiangchuan Liu CMPT365 Multimedia Systems 1 Introduction What s video? a time-ordered sequence of frames, i.e.,

More information

Compression of Light Field Images using Projective 2-D Warping method and Block matching

Compression of Light Field Images using Projective 2-D Warping method and Block matching Compression of Light Field Images using Projective 2-D Warping method and Block matching A project Report for EE 398A Anand Kamat Tarcar Electrical Engineering Stanford University, CA (anandkt@stanford.edu)

More information

Homogeneous Transcoding of HEVC for bit rate reduction

Homogeneous Transcoding of HEVC for bit rate reduction Homogeneous of HEVC for bit rate reduction Ninad Gorey Dept. of Electrical Engineering University of Texas at Arlington Arlington 7619, United States ninad.gorey@mavs.uta.edu Dr. K. R. Rao Fellow, IEEE

More information

10.2 Video Compression with Motion Compensation 10.4 H H.263

10.2 Video Compression with Motion Compensation 10.4 H H.263 Chapter 10 Basic Video Compression Techniques 10.11 Introduction to Video Compression 10.2 Video Compression with Motion Compensation 10.3 Search for Motion Vectors 10.4 H.261 10.5 H.263 10.6 Further Exploration

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

2014 Summer School on MPEG/VCEG Video. Video Coding Concept 2014 Summer School on MPEG/VCEG Video 1 Video Coding Concept Outline 2 Introduction Capture and representation of digital video Fundamentals of video coding Summary Outline 3 Introduction Capture and representation

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

5LSH0 Advanced Topics Video & Analysis

5LSH0 Advanced Topics Video & Analysis 1 Multiview 3D video / Outline 2 Advanced Topics Multimedia Video (5LSH0), Module 02 3D Geometry, 3D Multiview Video Coding & Rendering Peter H.N. de With, Sveta Zinger & Y. Morvan ( p.h.n.de.with@tue.nl

More information

Chapter 10. Basic Video Compression Techniques Introduction to Video Compression 10.2 Video Compression with Motion Compensation

Chapter 10. Basic Video Compression Techniques Introduction to Video Compression 10.2 Video Compression with Motion Compensation Chapter 10 Basic Video Compression Techniques 10.1 Introduction to Video Compression 10.2 Video Compression with Motion Compensation 10.3 Search for Motion Vectors 10.4 H.261 10.5 H.263 10.6 Further Exploration

More information

3D Video Processing Algorithms Part I. Sergey Smirnov Atanas Gotchev Sumeet Sen Gerhard Tech Heribert Brust

3D Video Processing Algorithms Part I. Sergey Smirnov Atanas Gotchev Sumeet Sen Gerhard Tech Heribert Brust 3D Video Processing Algorithms Part I Sergey Smirnov Atanas Gotchev Sumeet Sen Gerhard Tech Heribert Brust Project No. 216503 3D Video Processing Algorithms Part I Sergey Smirnov, Atanas Gotchev, Sumeet

More information

View Synthesis Prediction for Rate-Overhead Reduction in FTV

View Synthesis Prediction for Rate-Overhead Reduction in FTV MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com View Synthesis Prediction for Rate-Overhead Reduction in FTV Sehoon Yea, Anthony Vetro TR2008-016 June 2008 Abstract This paper proposes the

More information

An Efficient Mode Selection Algorithm for H.264

An Efficient Mode Selection Algorithm for H.264 An Efficient Mode Selection Algorithm for H.64 Lu Lu 1, Wenhan Wu, and Zhou Wei 3 1 South China University of Technology, Institute of Computer Science, Guangzhou 510640, China lul@scut.edu.cn South China

More information

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY 2015 1573 Graph-Based Representation for Multiview Image Geometry Thomas Maugey, Member, IEEE, Antonio Ortega, Fellow Member, IEEE, and Pascal

More information

IN the early 1980 s, video compression made the leap from

IN the early 1980 s, video compression made the leap from 70 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 1, FEBRUARY 1999 Long-Term Memory Motion-Compensated Prediction Thomas Wiegand, Xiaozheng Zhang, and Bernd Girod, Fellow,

More information

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France Video Compression Zafar Javed SHAHID, Marc CHAUMONT and William PUECH Laboratoire LIRMM VOODDO project Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier LIRMM UMR 5506 Université

More information

CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION

CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION In this chapter we will discuss the process of disparity computation. It plays an important role in our caricature system because all 3D coordinates of nodes

More information

Compression of Stereo Images using a Huffman-Zip Scheme

Compression of Stereo Images using a Huffman-Zip Scheme Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract

More information

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy JPEG JPEG Joint Photographic Expert Group Voted as international standard in 1992 Works with color and grayscale images, e.g., satellite, medical,... Motivation: The compression ratio of lossless methods

More information

JUNSHENG FU A Real-time Rate-distortion Oriented Joint Video Denoising and Compression Algorithm

JUNSHENG FU A Real-time Rate-distortion Oriented Joint Video Denoising and Compression Algorithm JUNSHENG FU A Real-time Rate-distortion Oriented Joint Video Denoising and Compression Algorithm Master of Science Thesis Subject approved in the Department Council meeting on the 23rd of August 2011 Examiners:

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

EXAM SOLUTIONS. Image Processing and Computer Vision Course 2D1421 Monday, 13 th of March 2006,

EXAM SOLUTIONS. Image Processing and Computer Vision Course 2D1421 Monday, 13 th of March 2006, School of Computer Science and Communication, KTH Danica Kragic EXAM SOLUTIONS Image Processing and Computer Vision Course 2D1421 Monday, 13 th of March 2006, 14.00 19.00 Grade table 0-25 U 26-35 3 36-45

More information

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami to MPEG Prof. Pratikgiri Goswami Electronics & Communication Department, Shree Swami Atmanand Saraswati Institute of Technology, Surat. Outline of Topics 1 2 Coding 3 Video Object Representation Outline

More information

Week 14. Video Compression. Ref: Fundamentals of Multimedia

Week 14. Video Compression. Ref: Fundamentals of Multimedia Week 14 Video Compression Ref: Fundamentals of Multimedia Last lecture review Prediction from the previous frame is called forward prediction Prediction from the next frame is called forward prediction

More information

Introduction to Video Encoding

Introduction to Video Encoding Introduction to Video Encoding INF5063 23. September 2011 History of MPEG Motion Picture Experts Group MPEG1 work started in 1988, published by ISO in 1993 Part 1 Systems, Part 2 Video, Part 3 Audio, Part

More information

Lecture 7, Video Coding, Motion Compensation Accuracy

Lecture 7, Video Coding, Motion Compensation Accuracy Lecture 7, Video Coding, Motion Compensation Accuracy Last time we saw several methods to obtain a good motion estimation, with reduced complexity (efficient search), and with the possibility of sub-pixel

More information

Video Compression An Introduction

Video Compression An Introduction Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital

More information

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS Television services in Europe currently broadcast video at a frame rate of 25 Hz. Each frame consists of two interlaced fields, giving a field rate of 50

More information

A New Data Format for Multiview Video

A New Data Format for Multiview Video A New Data Format for Multiview Video MEHRDAD PANAHPOUR TEHRANI 1 AKIO ISHIKAWA 1 MASASHIRO KAWAKITA 1 NAOMI INOUE 1 TOSHIAKI FUJII 2 This paper proposes a new data forma that can be used for multiview

More information

Next-Generation 3D Formats with Depth Map Support

Next-Generation 3D Formats with Depth Map Support MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Next-Generation 3D Formats with Depth Map Support Chen, Y.; Vetro, A. TR2014-016 April 2014 Abstract This article reviews the most recent extensions

More information

Enhanced View Synthesis Prediction for Coding of Non-Coplanar 3D Video Sequences

Enhanced View Synthesis Prediction for Coding of Non-Coplanar 3D Video Sequences Enhanced View Synthesis Prediction for Coding of Non-Coplanar 3D Video Sequences Jens Schneider, Johannes Sauer and Mathias Wien Institut für Nachrichtentechnik, RWTH Aachen University, Germany Abstract

More information

Graph-based representation for multiview images with complex camera configurations

Graph-based representation for multiview images with complex camera configurations Graph-based representation for multiview images with complex camera configurations Xin Su, Thomas Maugey, Christine Guillemot To cite this version: Xin Su, Thomas Maugey, Christine Guillemot. Graph-based

More information

ISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding

ISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding Ali Mohsin Kaittan*1 President of the Association of scientific research and development in Iraq Abstract

More information

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

International Journal of Emerging Technology and Advanced Engineering Website:   (ISSN , Volume 2, Issue 4, April 2012) A Technical Analysis Towards Digital Video Compression Rutika Joshi 1, Rajesh Rai 2, Rajesh Nema 3 1 Student, Electronics and Communication Department, NIIST College, Bhopal, 2,3 Prof., Electronics and

More information

Reduced Frame Quantization in Video Coding

Reduced Frame Quantization in Video Coding Reduced Frame Quantization in Video Coding Tuukka Toivonen and Janne Heikkilä Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P. O. Box 500, FIN-900 University

More information

Lecture 13 Video Coding H.264 / MPEG4 AVC

Lecture 13 Video Coding H.264 / MPEG4 AVC Lecture 13 Video Coding H.264 / MPEG4 AVC Last time we saw the macro block partition of H.264, the integer DCT transform, and the cascade using the DC coefficients with the WHT. H.264 has more interesting

More information

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013 ECE 417 Guest Lecture Video Compression in MPEG-1/2/4 Min-Hsuan Tsai Apr 2, 213 What is MPEG and its standards MPEG stands for Moving Picture Expert Group Develop standards for video/audio compression

More information

LBP-GUIDED DEPTH IMAGE FILTER. Rui Zhong, Ruimin Hu

LBP-GUIDED DEPTH IMAGE FILTER. Rui Zhong, Ruimin Hu LBP-GUIDED DEPTH IMAGE FILTER Rui Zhong, Ruimin Hu National Engineering Research Center for Multimedia Software,School of Computer, Wuhan University,Wuhan, 430072, China zhongrui0824@126.com, hrm1964@163.com

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

Model-Aided Coding: A New Approach to Incorporate Facial Animation into Motion-Compensated Video Coding

Model-Aided Coding: A New Approach to Incorporate Facial Animation into Motion-Compensated Video Coding 344 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 Model-Aided Coding: A New Approach to Incorporate Facial Animation into Motion-Compensated Video Coding Peter

More information

An Improved H.26L Coder Using Lagrangian Coder Control. Summary

An Improved H.26L Coder Using Lagrangian Coder Control. Summary UIT - Secteur de la normalisation des télécommunications ITU - Telecommunication Standardization Sector UIT - Sector de Normalización de las Telecomunicaciones Study Period 2001-2004 Commission d' études

More information

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING 1 Michal Joachimiak, 2 Kemal Ugur 1 Dept. of Signal Processing, Tampere University of Technology, Tampere, Finland 2 Jani Lainema,

More information

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology Course Presentation Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology Image Compression Basics Large amount of data in digital images File size

More information

Video Coding Using Spatially Varying Transform

Video Coding Using Spatially Varying Transform Video Coding Using Spatially Varying Transform Cixun Zhang 1, Kemal Ugur 2, Jani Lainema 2, and Moncef Gabbouj 1 1 Tampere University of Technology, Tampere, Finland {cixun.zhang,moncef.gabbouj}@tut.fi

More information

Motion Estimation for Video Coding Standards

Motion Estimation for Video Coding Standards Motion Estimation for Video Coding Standards Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Introduction of Motion Estimation The goal of video compression

More information

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Chin-Fu Tsao 1, Yu-Hao Chen 1, Jin-Hau Kuo 1, Chia-wei Lin 1, and Ja-Ling Wu 1,2 1 Communication and Multimedia Laboratory,

More information

VHDL Implementation of H.264 Video Coding Standard

VHDL Implementation of H.264 Video Coding Standard International Journal of Reconfigurable and Embedded Systems (IJRES) Vol. 1, No. 3, November 2012, pp. 95~102 ISSN: 2089-4864 95 VHDL Implementation of H.264 Video Coding Standard Jignesh Patel*, Haresh

More information

BLOCK MATCHING-BASED MOTION COMPENSATION WITH ARBITRARY ACCURACY USING ADAPTIVE INTERPOLATION FILTERS

BLOCK MATCHING-BASED MOTION COMPENSATION WITH ARBITRARY ACCURACY USING ADAPTIVE INTERPOLATION FILTERS 4th European Signal Processing Conference (EUSIPCO ), Florence, Italy, September 4-8,, copyright by EURASIP BLOCK MATCHING-BASED MOTION COMPENSATION WITH ARBITRARY ACCURACY USING ADAPTIVE INTERPOLATION

More information

Mesh Based Interpolative Coding (MBIC)

Mesh Based Interpolative Coding (MBIC) Mesh Based Interpolative Coding (MBIC) Eckhart Baum, Joachim Speidel Institut für Nachrichtenübertragung, University of Stuttgart An alternative method to H.6 encoding of moving images at bit rates below

More information

MPEG-4: Simple Profile (SP)

MPEG-4: Simple Profile (SP) MPEG-4: Simple Profile (SP) I-VOP (Intra-coded rectangular VOP, progressive video format) P-VOP (Inter-coded rectangular VOP, progressive video format) Short Header mode (compatibility with H.263 codec)

More information

Redundancy and Correlation: Temporal

Redundancy and Correlation: Temporal Redundancy and Correlation: Temporal Mother and Daughter CIF 352 x 288 Frame 60 Frame 61 Time Copyright 2007 by Lina J. Karam 1 Motion Estimation and Compensation Video is a sequence of frames (images)

More information

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Yongying Gao and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 email:

More information

Fast Mode Decision for H.264/AVC Using Mode Prediction

Fast Mode Decision for H.264/AVC Using Mode Prediction Fast Mode Decision for H.264/AVC Using Mode Prediction Song-Hak Ri and Joern Ostermann Institut fuer Informationsverarbeitung, Appelstr 9A, D-30167 Hannover, Germany ri@tnt.uni-hannover.de ostermann@tnt.uni-hannover.de

More information

CONVERSION OF FREE-VIEWPOINT 3D MULTI-VIEW VIDEO FOR STEREOSCOPIC DISPLAYS

CONVERSION OF FREE-VIEWPOINT 3D MULTI-VIEW VIDEO FOR STEREOSCOPIC DISPLAYS CONVERSION OF FREE-VIEWPOINT 3D MULTI-VIEW VIDEO FOR STEREOSCOPIC DISPLAYS Luat Do 1, Svitlana Zinger 1, and Peter H. N. de With 1,2 1 Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven,

More information

EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM

EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM 1 KALIKI SRI HARSHA REDDY, 2 R.SARAVANAN 1 M.Tech VLSI Design, SASTRA University, Thanjavur, Tamilnadu,

More information

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of

More information

Quality improving techniques in DIBR for free-viewpoint video Do, Q.L.; Zinger, S.; Morvan, Y.; de With, P.H.N.

Quality improving techniques in DIBR for free-viewpoint video Do, Q.L.; Zinger, S.; Morvan, Y.; de With, P.H.N. Quality improving techniques in DIBR for free-viewpoint video Do, Q.L.; Zinger, S.; Morvan, Y.; de With, P.H.N. Published in: Proceedings of the 3DTV Conference : The True Vision - Capture, Transmission

More information

Introduction to Medical Imaging (5XSA0) Module 5

Introduction to Medical Imaging (5XSA0) Module 5 Introduction to Medical Imaging (5XSA0) Module 5 Segmentation Jungong Han, Dirk Farin, Sveta Zinger ( s.zinger@tue.nl ) 1 Outline Introduction Color Segmentation region-growing region-merging watershed

More information

ABSTRACT

ABSTRACT Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 3rd Meeting: Geneva, CH, 17 23 Jan. 2013 Document: JCT3V- C1005_d0 Title: Test Model

More information

The Scope of Picture and Video Coding Standardization

The Scope of Picture and Video Coding Standardization H.120 H.261 Video Coding Standards MPEG-1 and MPEG-2/H.262 H.263 MPEG-4 H.264 / MPEG-4 AVC Thomas Wiegand: Digital Image Communication Video Coding Standards 1 The Scope of Picture and Video Coding Standardization

More information

EE Low Complexity H.264 encoder for mobile applications

EE Low Complexity H.264 encoder for mobile applications EE 5359 Low Complexity H.264 encoder for mobile applications Thejaswini Purushotham Student I.D.: 1000-616 811 Date: February 18,2010 Objective The objective of the project is to implement a low-complexity

More information

Scalable Extension of HEVC 한종기

Scalable Extension of HEVC 한종기 Scalable Extension of HEVC 한종기 Contents 0. Overview for Scalable Extension of HEVC 1. Requirements and Test Points 2. Coding Gain/Efficiency 3. Complexity 4. System Level Considerations 5. Related Contributions

More information

Frequency Band Coding Mode Selection for Key Frames of Wyner-Ziv Video Coding

Frequency Band Coding Mode Selection for Key Frames of Wyner-Ziv Video Coding 2009 11th IEEE International Symposium on Multimedia Frequency Band Coding Mode Selection for Key Frames of Wyner-Ziv Video Coding Ghazaleh R. Esmaili and Pamela C. Cosman Department of Electrical and

More information

Video Codecs. National Chiao Tung University Chun-Jen Tsai 1/5/2015

Video Codecs. National Chiao Tung University Chun-Jen Tsai 1/5/2015 Video Codecs National Chiao Tung University Chun-Jen Tsai 1/5/2015 Video Systems A complete end-to-end video system: A/D color conversion encoder decoder color conversion D/A bitstream YC B C R format

More information

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames Ki-Kit Lai, Yui-Lam Chan, and Wan-Chi Siu Centre for Signal Processing Department of Electronic and Information Engineering

More information

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE 5359 Gaurav Hansda 1000721849 gaurav.hansda@mavs.uta.edu Outline Introduction to H.264 Current algorithms for

More information

Improving Intra Pixel prediction for H.264 video coding

Improving Intra Pixel prediction for H.264 video coding MEE 08:19 Improving Intra Pixel prediction for H.264 video coding Senay Amanuel Negusse This thesis is presented as part of Degree of Master of Science in Electrical Engineering Blekinge Institute of Technology

More information
