Development and optimization of coding algorithms for mobile 3DTV
Gerhard Tech, Heribert Brust, Karsten Müller, Anil Aksay, Done Bugdayci

Project No. 216503

Abstract: Error resilience tools for H.264/AVC are presented. Slice encoding has been implemented in the H.264/MVC reference software JMVC. An evaluation of the new encoder shows that the additional bit rate needed for error resilience can be neglected for error-free channels. For error-prone channels, the new slice mode provides sufficient error resilience. The Mixed Resolution Stereo Coding (MRSC) approach has been evaluated, and the optimal bit rate distribution between the full-resolution view and the down-sampled view has been examined. The Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been developed. The three main features of AMRSC are optimized down-sampling, inter-view prediction and view enhancement using unsharp masking. The suitability of sub-sampling and low pass filtering in combination with inter-view prediction has been evaluated objectively. Further improvements in coding efficiency have been achieved by optimizing the bit rate distribution between the full view and the predicted down-sampled view. AMRSC is compliant with the overall MVC coding strategy in Mobile3DTV. For the subjective evaluation of coding methods, 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. Two codec profiles and a low and a high quality level have been used. For the subjective evaluation of transmission approaches, 24 test stimuli have been generated using Simulcast, Multi View and Video + Depth coding. Half of the 24 sequences have been coded using the new encoder supporting error resilience methods.

Keywords: 3DTV, Error Resilience, Mixed Resolution Coding, Generation of coded test sequences

Executive Summary
This deliverable is tripartite. The first part deals with the new prototype of the software encoder using error resilience tools. The second part describes the examination and development of the Mixed Resolution approach. The optimization of coding approaches for subjective tests is presented in the last part.

H.264/AVC provides several error resilience tools. However, none of them is implemented in JMVC, the reference software for the MVC extension of H.264/AVC. Slice encoding has now been implemented in JMVC: frames are split into smaller data packets that can still be decoded independently in case of losses. The new encoder has been evaluated over an error-free and an error-prone channel. Coding tests show that for error-free channels the additional bit rate needed for error resilience can be neglected and the video quality decreases only slightly. For the error-prone channel, it has been demonstrated that the new slice mode provides sufficient error resilience and leads to a large gain in video quality.

In classical stereo coding, both views have the same resolution. An interesting alternative is the Mixed Resolution Coding approach, which is also evaluated here. The optimal bit rate distribution assigns approximately 30% to 35% of the total bit rate to the down-sampled view. A subjective evaluation of Mixed Resolution and Full Resolution coding shows that coded Mixed Resolution sequences have a better subjective quality than Simulcast coded sequences, owing to fewer coding artifacts. Although this approach yields lower bit rates, the perceived quality may not always be close to that of the full view. Therefore, an Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been developed that goes beyond the Mixed Resolution approach.

The three main features of the AMRSC approach are optimized down-sampling, inter-view prediction and view enhancement using unsharp masking at the receiver side. The suitability of sub-sampling and low pass filtering in combination with inter-view prediction was evaluated objectively. Further improvements in coding efficiency were achieved by optimizing the bit rate distribution between the full view and the predicted down-sampled view. Since an MVC codec will be used for the Mobile3DTV application, it is relevant that the AMRSC approach is now also compliant with MVC.

Test stimuli for subjective evaluations have been generated, and coding results for various stereo video coding approaches, codecs and codec settings are presented. For the subjective evaluation of coding methods, a total of 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. A baseline and a high codec profile were used, and an objective evaluation was carried out at a low and a high quality level. For the subjective evaluation of transmission methods, 24 test stimuli have been generated: the four sequences of the transmission test set have been coded using Simulcast, Multi View and Video + Depth coding. Half of these sequences were coded using the new encoder with error resilience tools.

Table of Contents
1 Introduction
2 Software encoder using error resilience tools
  2.1 Slice Interleaving
  2.2 Modified MVC encoder
  2.3 Modified MVC bit stream assembler
  2.4 Modified MVC decoder
  2.5 Evaluation of MVC encoder using slice mode
    2.5.1 Test Setup
    2.5.2 Coding Results for error free channel
    2.5.3 Coding Results for error prone channel
  2.6 Conclusion
3 Mixed Resolution Coding
  3.1 Optimization of Mixed Resolution Stereo Coding (MRSC)
    3.1.1 Different sampling methods of MRSC
    3.1.2 Objective criteria for bit rate allocation
    3.1.3 Subjective evaluation
  3.2 Advanced Mixed Resolution Coding (AMRSC)
    3.2.1 Interview Prediction
    3.2.2 View Enhancement using unsharp masking
  3.3 Conclusion
4 Optimization of coding approaches for subjective tests
  4.1 Test sequences for subjective evaluation of coding approaches
    4.1.1 Test setup
    4.1.2 Coding Results
    4.1.3 Generated Test Stimuli
    4.1.4 Conclusion
  4.2 Test sequences for transmission studies
    4.2.1 Test setup
    4.2.2 Coding Results
    4.2.3 Generated Test Stimuli
    4.2.4 Conclusion
5 Conclusion

1 Introduction
This deliverable consists of three parts. The first part deals with the new prototype of the software encoder using error resilience tools. The second part describes the examination and development of the Mixed Resolution approach. The optimization of coding approaches for subjective tests is presented in the last part.

H.264/AVC provides several error resilience tools. However, none of them is implemented in JMVC, the reference software for the MVC extension of H.264/AVC. The implementation of slice encoding in JMVC is reported in Section 2, together with coding results obtained with the new encoder. The evaluation has been carried out over an error-free and an error-prone channel. Section 3 presents the Mixed Resolution Coding approach. Different types of sub-sampling one view are compared, the optimum bit rate distribution between both views is investigated, and the quality of Mixed Resolution and Full Resolution coding is evaluated subjectively. An Advanced Mixed Resolution Stereo Coding (AMRSC) approach is then presented, in which the coding of mixed resolution sequences is improved by exploiting inter-view dependencies. Moreover, the suitability of sub-sampling and low pass filtering in combination with inter-view prediction was evaluated objectively. Improvements in coding efficiency were achieved by choosing different QP values for the left and right view. Finally, the enhancement of the subjective quality by a simple unsharp masking algorithm was investigated. Section 4 presents methods for the generation of test stimuli for subjective tests, together with coding results for various stereo video coding approaches, codecs and codec settings. The focus is on the generation of test stimuli with defined bit rates for the subjective comparison of coding approaches as well as transmission methods. The test stimuli generated for the coding test also include sequences of the simple Mixed Resolution approach. Some of the transmission test stimuli were generated with the new encoder using error resilience tools.

2 Software encoder using error resilience tools
The error resilience tools in H.264/AVC are data partitioning, slice interleaving, flexible macroblock (MB) ordering (FMO), SP/SI frames, reference frame selection, intra block refreshing and redundant slices [1]. SP/SI frames and reference frame selection require feedback from the decoder. Data partitioning, slice interleaving, FMO, intra block refreshing and redundant slices are therefore the candidates for use in MVC. However, none of these tools is implemented in JMVC, the reference software for the MVC extension of H.264/AVC. Slice interleaving for error resilience has been integrated into JMVC. In this way, each representation can be coded either with or without slices.

2.1 Slice Interleaving
Figure 2.1: Bit stream syntax of H.264/AVC using fixed-size slices

An H.264/AVC bit stream is composed of network abstraction layer (NAL) units, as shown in Figure 2.1. Each NAL unit carries either video coding layer (VCL) data or non-VCL data. Non-VCL NAL units are small packets with information about the bit stream, such as the sequence parameter set (SPS), the picture parameter set (PPS) or supplemental enhancement information (SEI). SPS and PPS are required, whereas SEI can be skipped. VCL NAL units carry the coded video data. Each of them is a slice containing an integer number of macroblocks: a slice can contain all macroblocks of a frame or only a single macroblock. Figure 2.2 depicts a frame encoded using several slices. Slices are independently decodable as long as their reference frames are available. This is achieved by signalling the slice location in the slice header and by allowing spatial prediction only within the slice. In terms of compression efficiency, a single slice per frame is preferable, since it avoids the overhead of additional slice headers.
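To make the NAL unit structure above concrete, the following Python sketch splits an H.264 Annex B byte stream at its start codes and decodes the one-byte NAL unit header (forbidden_zero_bit, nal_ref_idc, nal_unit_type). The header layout and the listed nal_unit_type values follow the H.264/AVC specification; the sample bytes are made up, the function names are hypothetical, and emulation prevention bytes are ignored because only the first header byte is inspected. It is an illustration, not part of JMVC.

```python
# Illustrative only: split an H.264 Annex B byte stream into NAL units and
# read the first header byte of each. Emulation prevention bytes are not
# removed because only the first header byte is inspected.
NAL_TYPES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    14: "prefix NAL unit (MVC)",
    15: "subset SPS (MVC)",
    20: "coded slice extension (MVC)",
}


def split_annexb(stream: bytes) -> list:
    """Return the NAL units found between 0x000001 start codes."""
    units, start, i = [], None, 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # Drop trailing zero bytes (e.g. the leading 0x00 of a 4-byte start code).
                units.append(stream[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        units.append(stream[start:])
    return units


def parse_header(nal: bytes) -> tuple:
    """Decode forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), nal_unit_type (5 bits)."""
    header = nal[0]
    forbidden_zero_bit = header >> 7
    nal_ref_idc = (header >> 5) & 0x03
    nal_unit_type = header & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


# Made-up stream: SPS, PPS and the first bytes of an IDR slice.
stream = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce" + b"\x00\x00\x01\x65\x88\x84"
for nal in split_annexb(stream):
    _, nri, nut = parse_header(nal)
    print(nut, NAL_TYPES.get(nut, "other"), "nal_ref_idc =", nri)
```

The same split would expose each slice of a sliced frame as a separate VCL NAL unit, which is what makes slices individually packetizable.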

Figure 2.2: Slice encoding of a frame using fixed-size slices

If the size of a NAL unit is larger than the Maximum Transmission Unit (MTU) of the transport medium, the NAL unit is fragmented into smaller packets. In erroneous environments, some of these packets can be lost, which causes the loss of the entire frame, since an incomplete NAL unit cannot be decoded. If, however, a frame is encoded into several slices so that each slice is smaller than the MTU, every packet that arrives at the decoder can be decoded correctly. Figure 2.3 shows the same error pattern applied to a sliced and a non-sliced stream: with slice encoding, some of the slices can still be decoded. The performance of slice encoding depends on the burst length of the errors and on the size of the slices.

Figure 2.3: Sliced and non-sliced encoding in the case of erroneous transmission (legend: bit error, lost packet)

Slice encoding and decoding has been implemented in JMVC by modifying the encoder, the decoder and the bit stream assembler. In the current version, the encoder splits each frame into as many slices as required by the input slice size parameter.

2.2 Modified MVC encoder
Several functions have been modified to integrate slice encoding into the MVC encoder. First, a new function has been added that returns the number of bytes spent so far on the currently encoded slice. Using this function, the slice encoding loop has been modified: instead of coding all MBs of the frame in one pass, the slice size is checked after each MB is encoded. If the current slice size exceeds the allowed slice size, the current slice is finalized and the next MB starts a new slice. A further modification concerns the loop filter functions. Previously, the loop filter operations were applied to all MBs of the frame; they are now applied to the MBs inside the current slice.
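The modified slice encoding loop described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not JMVC code: the per-MB bit counts are hypothetical inputs, slice header bits are ignored, and each slice is assumed to fit into one packet. Because the size check happens after an MB has been coded, a slice can exceed the target by up to one MB. The second (hypothetical) helper shows the benefit under losses: losing one slice packet removes only that slice's MBs.

```python
def pack_slices(mb_bits, max_slice_bytes):
    """Group MB indices into slices: after each MB is coded, the slice is
    closed if its size exceeds the allowed slice size, so the next MB starts
    a new slice. Slice header bits are ignored in this sketch."""
    limit_bits = 8 * max_slice_bytes
    slices, current, current_bits = [], [], 0
    for mb_index, bits in enumerate(mb_bits):
        current.append(mb_index)
        current_bits += bits
        if current_bits > limit_bits:  # size check after each coded MB
            slices.append(current)
            current, current_bits = [], 0
    if current:  # remaining MBs form the last slice of the frame
        slices.append(current)
    return slices


def decodable_mbs(slices, lost_slices):
    """MBs that survive when the packets carrying `lost_slices` are lost.
    Each slice is assumed to fit into one packet (slice size below the MTU)."""
    return [mb for i, s in enumerate(slices) if i not in lost_slices for mb in s]


# Hypothetical frame of 10 MBs, each costing 100 bytes, with 250-byte slices.
slices = pack_slices([800] * 10, max_slice_bytes=250)
print(slices)                      # four slices
print(decodable_mbs(slices, {1}))  # losing slice 1 keeps 7 of 10 MBs
```

Without slicing, the same frame would be one NAL unit fragmented into several packets, and any single lost fragment would make all 10 MBs undecodable.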

2.3 Modified MVC bit stream assembler
The bit stream assembler merges the left and right streams before decoding. This step is necessary because the JMVC decoder expects multiview coded streams in assembled format. Since the decoder uses inter-view prediction references between the two streams, the frames are interleaved so that they alternate between the left and the right view stream, one frame of each view per time instant. The JMVC bit stream assembler assumes that each frame is encoded as a single slice and that there are no losses in the stream. Instead of fixing these limitations in the existing tool, a new application has been written that assembles the left and right streams correctly in the presence of slice encoding and losses.

2.4 Modified MVC decoder
Since the JMVC reference software does not implement slice mode, adding slice mode to the encoder raised the question of whether the decoder also had to be modified. The decoder does not require any modification for slice decoding itself. However, modifications are needed for error handling in case of slice and/or frame losses. Error concealment is not a normative part of H.264/AVC; more information on H.264/AVC error concealment can be found in [1]. These concealment strategies are, however, not included in the MVC software. To handle frame losses, another application is used to insert skip frames, which enables the decoder to decode a lost frame as a copy of the previous frame in the buffer. For slice losses, the MVC decoder itself has been modified. New functions check the decoded MBs of each frame and identify the missing MBs once decoding of the current picture has finished. Missing MBs are copied from the co-located MBs of the nearest decoded picture in the buffer. In addition, the loop filter functions are modified to disable loop filtering for the missing MBs.
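The slice-loss concealment described above (copying co-located MBs from a previously decoded picture) can be sketched with NumPy on a single luma plane. The function name, the per-MB "decoded" mask and the picture sizes are hypothetical; the choice of reference picture is left to the caller, and deblocking is not modelled. Marking every MB as missing reproduces the frame-copy behaviour of the skip-frame approach used for whole-frame losses.

```python
import numpy as np

MB_SIZE = 16  # luma macroblock size in samples


def conceal_missing_mbs(current, reference, decoded):
    """Replace every MB flagged as not decoded with the co-located MB of
    `reference` (a previously decoded picture). Deblocking of the concealed
    MBs is not modelled here."""
    out = current.copy()
    rows, cols = decoded.shape
    for r in range(rows):
        for c in range(cols):
            if not decoded[r, c]:
                y, x = r * MB_SIZE, c * MB_SIZE
                out[y:y + MB_SIZE, x:x + MB_SIZE] = reference[y:y + MB_SIZE, x:x + MB_SIZE]
    return out


# Hypothetical 32x48 luma picture (2x3 MBs) in which MB (1, 2) was lost.
current = np.zeros((32, 48), dtype=np.uint8)
reference = np.full((32, 48), 200, dtype=np.uint8)
decoded = np.ones((2, 3), dtype=bool)
decoded[1, 2] = False
concealed = conceal_missing_mbs(current, reference, decoded)
```

Copying co-located MBs is the simplest temporal concealment; it works well for static content and shows visible misalignment where there is motion.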

2.5 Evaluation of MVC encoder using slice mode
2.5.1 Test Setup
The MVC encoder was evaluated using the four sequences of the test set for transmission studies (see the section on test sequences for transmission studies). To generate test sequences, both the slice argument determining the slice size and the QP were varied. The codec parameters are given in Table 2.1. A slice size of zero means that slice interleaving was disabled.

Table 2.1: Codec settings
  Profile             Baseline
  GOP Size            1 (IPPP)
  Symbol Mode         CAVLC
  Search Range        48
  Intra Period        16
  QP                  24, 28, 32, 36, 40
  Slice Size (byte)   250, 500, 750, 1000, 1250

The performance of the encoder has been evaluated for an error-free and an error-prone channel. To generate distorted sequences, the channel and transmission parameters shown in Table 2.2 have been used. For details, please refer to [2].

Table 2.2: Channel and transmission parameters
  Channel Model             COST207 Typical Urban, 6 taps, fd,max = 24 Hz
  Modulation                16QAM
  Convolutional Code Rate   2/3
  Guard Interval            1/4
  FFT Mode                  8K
  SNR                       17 dB

2.5.2 Coding Results for error free channel
The rate-distortion characteristics of the error resilient encoder for an error-free channel are depicted in Figure 2.4. With decreasing slice size, the performance of the encoder decreases. This can be explained by the increasing overhead introduced by the higher number of data packets. Nevertheless, for a moderate slice size the performance losses are below 0.5 dB and can be neglected.

Figure 2.4: PSNR vs. bit rate for different slice sizes and sequences; error-free channel
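The overhead argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a constant, hypothetical header cost per slice and a hypothetical frame size; it counts header bytes only and ignores the additional loss caused by forbidding spatial prediction across slice boundaries, so it is an illustration of the trend rather than a model of the measured curves.

```python
import math


def slice_overhead_fraction(frame_bytes, slice_bytes, header_bytes):
    """Share of the coded frame spent on per-slice headers.
    slice_bytes == 0 means slice mode is disabled (one slice per frame).
    header_bytes is an assumed constant cost per slice."""
    n_slices = 1 if slice_bytes <= 0 else max(1, math.ceil(frame_bytes / slice_bytes))
    overhead = n_slices * header_bytes
    return overhead / (frame_bytes + overhead)


# Hypothetical numbers: a 5000-byte frame and 6 bytes of header per slice.
for size in (0, 1250, 1000, 750, 500, 250):
    share = slice_overhead_fraction(5000, size, header_bytes=6)
    print(f"slice size {size:5d} B: {100 * share:.2f} % overhead")
```

Halving the slice size roughly doubles the number of slices and therefore the header overhead, which matches the direction of the losses seen in the error-free results.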

2.5.3 Coding Results for error prone channel
The results for the error-prone channel are depicted in Figure 2.5. Each rate-distortion point shown is an average of the rates and distortions of five distorted versions of the bit stream, each hit by a different error pattern. Averaging is required for the evaluation, since coding with different QPs and slice sizes results in bit streams of different lengths, which are hit by the errors at different positions. For a single sequence, the quality can decrease when slice mode is enabled. The reason is that losses which hit uncritical parts of the bit stream generated without slice mode can be shifted to critical parts of the bit stream when it is coded with slice mode enabled. On average, however, slice interleaving should lead to an increased quality. To show this, the evaluation has to be carried out statistically. Here, only five different error patterns have been used per bit stream, so the influence of outliers can still be seen in Figure 2.5 (for example, for RollerBlade1 at a slice size of 250 byte). Nevertheless, the tendency is already visible: a smaller slice size results in better performance. At high rates, the gain obtained with slice mode increases. The reason is that the critical parts of the bit stream become longer, so losses of important data packets are more likely. Slice partitioning effectively mitigates this effect.

Figure 2.5: PSNR vs. bit rate for different slice sizes and sequences; error-prone channel; average over sequences distorted by 5 different error patterns
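Why smaller slices help on a lossy channel can be illustrated with a simplified model. The sketch assumes independent bit errors with a hypothetical bit error rate, treats a packet as lost if any of its bits is in error, and ignores header overhead and temporal error propagation. The real COST207 Typical Urban channel produces bursty errors, so this only illustrates the trend. The Monte Carlo helper averages over a small number of random error patterns, in the spirit of the five patterns used above, and therefore shows similar outlier sensitivity.

```python
import random


def slice_sizes(frame_bytes, slice_bytes):
    """Split a frame into slice payload sizes; slice_bytes <= 0 disables slicing."""
    if slice_bytes <= 0:
        return [frame_bytes]
    full, rest = divmod(frame_bytes, slice_bytes)
    return [slice_bytes] * full + ([rest] if rest else [])


def expected_decodable_fraction(frame_bytes, slice_bytes, ber):
    """A packet is lost if any of its bits is in error (independent bit errors).
    A slice of s bytes survives with probability (1 - ber) ** (8 * s)."""
    sizes = slice_sizes(frame_bytes, slice_bytes)
    return sum(s * (1.0 - ber) ** (8 * s) for s in sizes) / frame_bytes


def simulated_decodable_fraction(frame_bytes, slice_bytes, ber, n_patterns=5, seed=1):
    """Average the decodable fraction over a few random error patterns."""
    rng = random.Random(seed)
    sizes = slice_sizes(frame_bytes, slice_bytes)
    total = 0.0
    for _ in range(n_patterns):
        survived = sum(s for s in sizes if rng.random() < (1.0 - ber) ** (8 * s))
        total += survived / frame_bytes
    return total / n_patterns


# Hypothetical 5000-byte frame at a bit error rate of 1e-5.
for size in (0, 1250, 250):
    print(size,
          round(expected_decodable_fraction(5000, size, 1e-5), 3),
          round(simulated_decodable_fraction(5000, size, 1e-5), 3))
```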

2.6 Conclusion
The prototype of the software encoder using error resilience tools has been presented. Error resilience is achieved by a new slice mode: frames are split into smaller data packets that can still be decoded independently in case of losses. Coding tests show that for error-free channels the additional bit rate needed for error resilience can be neglected and the video quality decreases only slightly. For the error-prone channel, it has been demonstrated that the new slice mode provides sufficient error resilience and leads to a large gain in video quality. Beyond the objective examination carried out here, a subjective evaluation of the encoder using slice mode will be carried out in a large-scale subjective test and reported in the upcoming deliverable D4.3 "Results of quality attributes of coding, transmission and their combinations". Further error protection applied in the lower layers will be reported in the upcoming deliverable D3.4 "Stereo DVB-H broadcasting system with error resilient tools".

3 Mixed Resolution Coding
In this section, the Mixed Resolution Coding approach is presented. Different types of sub-sampling one view are compared. The optimum bit rate distribution between both views is investigated, and the quality of Mixed Resolution and Full Resolution coding is evaluated subjectively. Then the Advanced Mixed Resolution Stereo Coding (AMRSC) approach is presented, in which the coding of mixed resolution sequences is improved by exploiting inter-view dependencies. Moreover, the suitability of sub-sampling and low pass filtering in combination with inter-view prediction was evaluated objectively. Improvements in coding efficiency were achieved by choosing different QP values for the left and right view. Finally, the enhancement of the subjective quality by a simple unsharp masking algorithm was investigated.

3.1 Optimization of Mixed Resolution Stereo Coding (MRSC)
If the sharpness of the two views of a stereoscopic signal differs, the perceived quality is close to that of the sharper view (binocular suppression theory) [3], [4]. In the presence of different amounts of blocking artifacts, however, the binocular quality is rated as the average of both views. This means that it should be possible to transmit a stereoscopic video with one reduced-resolution view (Mixed Resolution representation) at a lower bit rate and still reach the same quality as with the Full Resolution representation.

3.1.1 Different sampling methods of MRSC
Different ways of reducing the sharpness of one view were investigated. Sub-sampling one view in both directions was compared with sub-sampling in only one direction. Another method of reducing the sharpness is low pass filtering and coding at the base resolution. These methods have been compared to Full Resolution coding in an informal subjective test with the sequences Mountain, Diving, Performance and Soccer 2 (Figure 3.1).
Figure 3.1: Test sequences for subjective comparison of Mixed Resolution and Full Resolution coding (left view): (a) Mountain, (b) Diving, (c) Performance, (d) Soccer 2

The sequences have a resolution of 320x240 pixels, a frame rate of 30 frames per second and a length of 240 frames (Mountain, Diving and Performance) and 450 frames (Soccer 2), respectively. For down-sampling and up-sampling, the filters of the JSVM software [5] were applied row-wise and column-wise. These are the non-normative dyadic filter for down-sampling (equation 1) and the normative dyadic filter for up-sampling (equation 2). The test was carried out on a 3.5" display with parallax barrier technology. It has a total resolution of 640x480 pixels and a resolution of 320x480 pixels per view in 3D mode. The sequences were displayed with the stereoscopic player [6], which up-samples vertically by a factor of two. Four different coding types were compared (Table 3.1).

Table 3.1: Tested types of sub-sampling of coded stereoscopic sequences
  Type 1   Left view: coding of the full resolution view
           Right view: coding of the full resolution view
  Type 2   Left view: horizontal sub-sampling by a factor of 2 and coding
           Right view: coding of the full resolution view
  Type 3   Left view: horizontal and vertical sub-sampling by a factor of 2 and coding
           Right view: coding of the full resolution view
  Type 4   Left view: low pass filtering in horizontal and vertical direction and coding of the full resolution view
           Right view: coding of the full resolution view

The total bit rate was the same for all types, and the bit rate distributions (left view vs. right view) were chosen as 1:2, 1:4 and 1:8. For coding, H.264/AVC simulcast with the reference software JM 14.2 was used. Five video coding experts rated the coding types in an informal test, from best to worst, in the following order:
  1. Type 3
  2. Type 2
  3. Type 4
  4. Type 1
This indicates that a Mixed Resolution sequence can be transmitted at the same bit rate with a higher subjective quality. To verify these results, further tests have been carried out to assess the difference between the best Mixed Resolution representation (Type 3) and Full Resolution.
Before further subjective tests could be carried out, the best bit rate distribution had to be found. This was done with objective criteria, as described in the next section.

3.1.2 Objective criteria for bit rate allocation
To optimize the bit rate distribution between the left and right view, the PSNR measure was used. Following the theory that, in the presence of blocking artifacts, the binocular quality of a stereo sequence is the average of the quality of both views [4], a total PSNR was calculated that considers all pixels of both views. To compute it with existing tools, the mean squared error (MSE) was first calculated separately for each view. The total PSNR was then calculated from the mean squared errors MSE_1 and MSE_2 of the two views using the following equations:

$$\mathrm{PSNR}_t = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_t} \qquad (3)$$

$$\mathrm{MSE}_t = \frac{\mathrm{MSE}_1 + \mathrm{MSE}_2}{2} \qquad (4)$$

In the case of Mixed Resolution Coding, the mean squared error of the lower-resolution view was calculated after up-sampling it, with respect to the low pass filtered and up-sampled original view. Both views therefore contribute the same number of pixels, so the mean of the two MSEs in equation (4) equals the MSE over all pixels of both views. Figure 3.2 shows the PSNR of the left view, the PSNR of the right view and the total PSNR for the sequence Mountain, both for Mixed Resolution, in which the left view was coded at half horizontal and half vertical resolution, and for Full Resolution without any low pass filtering. A total bit rate of 400 kbit/s was used. The bit rate distribution was varied over the entire range, from 100% for the left view to 100% for the right view. The curves of both views were then interpolated with cubic splines to calculate the total PSNR at exactly 400 kbit/s total bit rate.

Figure 3.2: PSNR for left view, right view and total PSNR with (a) Mixed Resolution and (b) Full Resolution

The total PSNR curve for Mixed Resolution has its maximum at 30% for the left (sub-sampled) view, and the curve for Full Resolution at 45% for the left (full) view. Moreover, the total PSNR reaches higher values for Mixed Resolution than for Full Resolution. This is because the PSNR of the left view was calculated with respect to the low pass filtered, up-sampled uncoded view. Hence, the total PSNR curves do not take blur into account. However, they follow the binocular quality as long as no difference between Mixed Resolution (with one low pass filtered, down-sampled and up-sampled view) and Full Resolution is visible. It was shown in D2.4 that the difference between Mixed Resolution and Full Resolution decreases with increasing base resolution, increasing viewing distance and decreasing display size. Figure 3.3 shows the total PSNR curves for Mixed Resolution and Full Resolution for different sequences at the following total bit rates: Mountain (425 kbit/s), Diving (236 kbit/s), Performance (1280 kbit/s) and Soccer 2 (350 kbit/s).

Figure 3.3: Total PSNR for Mixed Resolution and Full Resolution for the sequences (a) Mountain, (b) Diving, (c) Performance and (d) Soccer 2

For Full Resolution, the maximum of the total PSNR curve lies around 50% for the left view. For Mixed Resolution, the total PSNR reaches its maximum at 30% (Mountain and Performance), 35% (Soccer 2) and 45% (Diving). This shows that the optimum bit rate distribution between both views is sequence-dependent.
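The search for the best bit rate distribution using equations (3) and (4) can be sketched in plain Python. The rate-distortion points below are made up for illustration and are not measurements from this report. The report interpolates with cubic splines; this sketch substitutes piecewise-linear interpolation of PSNR over bit rate to stay dependency-free, and it assumes 8-bit samples (peak value 255). All function names are hypothetical.

```python
import math

PEAK = 255.0  # assumed 8-bit samples


def psnr_to_mse(psnr):
    return PEAK ** 2 / 10 ** (psnr / 10)


def total_psnr(mse_1, mse_2):
    """Equations (3) and (4): PSNR of the mean MSE of both views."""
    mse_t = (mse_1 + mse_2) / 2
    return 10 * math.log10(PEAK ** 2 / mse_t)


def interp(x, xs, ys):
    """Piecewise-linear interpolation on ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]


def best_split(total_rate, left_rd, right_rd, shares):
    """Return (left share, total PSNR) maximizing the total PSNR."""
    best = None
    for share in shares:
        psnr_left = interp(share * total_rate, *left_rd)
        psnr_right = interp((1 - share) * total_rate, *right_rd)
        psnr_t = total_psnr(psnr_to_mse(psnr_left), psnr_to_mse(psnr_right))
        if best is None or psnr_t > best[1]:
            best = (share, psnr_t)
    return best


# Made-up RD points (kbit/s, dB), not measurements from this report.
LEFT_RD = ([50, 100, 200, 400], [34.0, 37.0, 40.0, 43.0])   # sub-sampled view
RIGHT_RD = ([50, 100, 200, 400], [28.0, 31.0, 34.0, 37.0])  # full-resolution view
SHARES = [i / 100 for i in range(15, 86)]  # keep both rates inside the measured range
share, psnr_t = best_split(400, LEFT_RD, RIGHT_RD, SHARES)
print(f"best left share {share:.0%}, total PSNR {psnr_t:.2f} dB")
```

Averaging in the MSE domain rather than averaging the two PSNR values means that the worse view dominates the total, which is why the optimum tends to shift bits towards the view with the higher distortion.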

Figure 3.4 shows the total PSNR curves and their maxima for the sequence Mountain at different total bit rates. The optimum distribution also depends on the total bit rate.

Figure 3.4: Total PSNR and corresponding maxima for the sequence Mountain at total bit rates of 200 kbit/s, 400 kbit/s, 600 kbit/s, 800 kbit/s and 1000 kbit/s (from bottom to top)

3.1.3 Subjective evaluation
Small-scale subjective tests were carried out to compare Mixed Resolution coding with Full Resolution coding. The test setup was the same as in the subjective tests with uncoded sequences described in D2.4 [5]. The 3.5" and the 32" stereoscopic displays were used. The sequences Mountain, Diving, Performance and Soccer2 were shown to 13 expert viewers in an A-B preference vote in the order AABBAABB. Whether A was the Mixed Resolution sequence and B the Full Resolution sequence, or vice versa, was chosen randomly. Afterwards, the test persons rated whether A or B had the better overall quality. The tested bit rates are shown in Table 3.2.

Table 3.2: Bit rate distribution to left and right view for subjective tests with Mixed Resolution and Full Resolution
  Columns: Sequence | Total bit rate [kbit/s] | Bit rate left view / total bit rate, Mixed Resolution [%] | Bit rate left view / total bit rate, Full Resolution [%]
  Rows: Mountain, Mountain, Diving, Performance, Soccer2, Soccer2

Table 3.3: Result of subjective tests with coded sequences
  Setups: Setup I (3.5" display, h/d = 1/10) and Setup II (32" display, h/d = 1/5)
  Rows (for each setup): Mountain 320 kbit/s, Mountain 425 kbit/s, Diving 236 kbit/s, Performance 1280 kbit/s, Soccer2 260 kbit/s, Soccer2 350 kbit/s, total
  Columns: Mixed Resolution better | No difference | Full Resolution better

The results of these tests are shown in Table 3.3. For both displays, Mixed Resolution has a slightly better binocular quality than Full Resolution. This means that at these relatively low bit rates, the stronger blocking artifacts of Full Resolution are more annoying than the slightly blurrier images of Mixed Resolution. Nevertheless, the performance of the Mixed Resolution approach also depends on the display. In the large-scale subjective evaluation of the coding approaches, the Mixed Resolution approach does not outperform the simulcast approach (see the upcoming Deliverable D4.3 "Results of quality attributes of coding, transmission and their combinations"). This may be related to the advanced NEC display used in the large-scale study. The NEC autostereoscopic 3.5" display with a resolution of 428x240 is based on lenticular sheet technology and provides a much better video quality than the display based on parallax barrier technology. The sharpness difference introduced by mixed resolution seems to be more visible on the NEC display. Nevertheless, the evaluation carried out here shows the potential of the mixed resolution approach. Therefore, an advanced mixed resolution approach has been investigated; it is presented in the next section.

3.2 Advanced Mixed Resolution Coding (AMRSC)
The coding tests in Section 3.1 were carried out without any prediction between the two views. Coding a stereoscopic video with inter-view prediction can save bit rate while maintaining the same quality. It was investigated whether inter-view prediction and Mixed Resolution Coding can be combined to obtain better coding results than either technique alone. The unsharp masking algorithm presented in the subsection on view enhancement is a method for enhancing the subjective quality of Mixed Resolution sequences. It is a simple algorithm that can be applied on a mobile device at low computational cost.
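To illustrate why unsharp masking is cheap enough for a mobile receiver, here is a generic sketch of the technique with NumPy: the image plus a scaled high-pass residual (image minus a blurred copy). The 3x3 box blur and the gain `amount = 0.5` are assumptions chosen for illustration; the filter and parameters of the algorithm used in this project may differ. It would be applied to the up-sampled lower-resolution view at the receiver side.

```python
import numpy as np


def box_blur3(img):
    """3x3 mean filter with edge replication."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def unsharp_mask(img, amount=0.5):
    """Add `amount` times the high-pass residual (image minus blur) back to the image."""
    src = img.astype(np.float64)
    sharpened = src + amount * (src - box_blur3(src))
    return np.clip(np.rint(sharpened), 0, 255).astype(np.uint8)


# Hypothetical 8x8 luma block with a vertical edge between 50 and 150.
block = np.full((8, 8), 50, dtype=np.uint8)
block[:, 4:] = 150
enhanced = unsharp_mask(block)
print(enhanced[4])
```

The edge gains an undershoot on the dark side and an overshoot on the bright side, which increases perceived sharpness; flat areas pass through unchanged. The cost is nine additions and a few multiplications per sample.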
A further enhancement of the Mixed Resolution approach can be obtained with optimized down-sampling algorithms. These methods are not investigated in this deliverable but are presented in the upcoming Deliverable 5.4 "Advanced algorithms for stereo-video preprocessing".

3.2.1 Interview Prediction
It was reported in D2.2 that H.264/MVC (Multiview Video Coding) with inter-view prediction achieves a significantly better rate-distortion performance than simulcast coding. On the other hand, coding a low pass filtered or sub-sampled video requires less bit rate than coding the original video and maintains a high binocular quality due to binocular suppression. It was investigated how Mixed Resolution Coding can be improved by exploiting the inter-view dependencies of both views.

Low Pass Filtering and Sub-sampling
This section describes coding experiments with low pass filtered and sub-sampled views, with and without inter-view prediction. The right view was coded at the base resolution. The left view was coded with four different methods:
Method I: low pass filtering and coding at the base resolution (Left LP).
Method II: low pass filtering, sub-sampling by a factor of two in both directions and coding at the reduced resolution (Left DS).
Method III: the decoded right view was used for inter-view prediction of the low pass filtered left view (Left LP IV).

Method IV: the decoded right view was low-pass filtered, sub-sampled by a factor of two and used for inter-view prediction of the low-pass filtered and sub-sampled left view (Left DS IV) [7].

H.264/MVC was used for all methods. The tested sequences are Hands (251 frames), Snail (189 frames), Horse (140 frames) and Car (235 frames), with a base resolution of 480x272 pixels.

Figure 3.5: Sequences for the coding test with Mixed Resolution and inter-view prediction: (a) Hands, (b) Snail, (c) Horse and (d) Car

The encoder settings are shown in Table 3.4. The QP was varied from 20 to 44. For the methods using inter-view prediction, the same QP was used for the left and right views.

Table 3.4: Encoder settings for Mixed Resolution Coding with inter-view prediction
Encoder implementation    JMVM 7.0
Quantization parameter    20, 22, 24, 26, ..., 44
GOP size                  2
Intra period              16
Symbol mode               CAVLC
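To make Methods II and IV concrete, the pre-processing of the left view can be sketched as a low-pass filter followed by factor-two decimation. This is a minimal pure-Python sketch: the separable [1, 2, 1] / 4 kernel with edge replication is an assumption for illustration, since the filter used in the experiments is not specified here, and all function names are hypothetical.

```python
# Sketch of the Method II / Method IV pre-processing: low-pass filter, then
# sub-sample by a factor of two in both directions. The [1, 2, 1] / 4 kernel
# is an assumption for illustration; the report does not specify its filter.


def lowpass_1d(row):
    """Filter one row with a [1, 2, 1] / 4 kernel, replicating edge samples."""
    n = len(row)
    return [
        (row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4.0
        for i in range(n)
    ]


def lowpass_2d(img):
    """Separable 2-D low-pass: filter every row, then every column."""
    rows = [lowpass_1d(r) for r in img]
    cols = [lowpass_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def downsample2(img):
    """Keep every second sample horizontally and vertically."""
    return [row[::2] for row in img[::2]]


def lp_ds(img):
    """Low-pass filter, then decimate by two (input for Left DS / Left DS IV)."""
    return downsample2(lowpass_2d(img))


if __name__ == "__main__":
    # Image stored as a list of rows: 272 rows of 480 luma samples.
    frame = [[(x + y) % 256 for x in range(480)] for y in range(272)]
    small = lp_ds(frame)
    print(len(small[0]), "x", len(small))  # 240 x 136
```

Applied to a 480x272 frame, the sketch yields the 240x136 resolution used for the sub-sampled view in these experiments. Filtering before decimation suppresses aliasing; a longer filter would trade more blur for less aliasing.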

Figure 3.6: Coding results for Mixed Resolution with inter-view prediction (LP = low-pass filtered, DS = down-sampled, IV = inter-view prediction): (a) Hands, (b) Snail

Figure 3.7: Coding results for Mixed Resolution with inter-view prediction (LP = low-pass filtered, DS = down-sampled, IV = inter-view prediction): (a) Horse and (b) Car

The rate-distortion curves are shown in Figure 3.6 and Figure 3.7. The right (original) view shows the worst performance because it was coded at the base resolution and contains high detail. The low-pass filtered left view (Left LP) performs better because its PSNR was calculated with respect to the low-pass filtered uncoded view. Likewise, the PSNR of the decoded low-pass filtered and sub-sampled view (Left DS) was calculated with respect to the low-pass filtered and sub-sampled uncoded view. For all sequences, coding at a lower resolution leads to a better rate-distortion performance than low-pass filtering alone. With inter-view prediction, the low-pass filtered version (Left LP IV) achieves some gain at low bit rates; at high bit rates nearly no improvement is visible. Inter-view prediction for the low-pass filtered and down-sampled version (Left DS IV) achieves a gain for all sequences at both low and high bit rates. For some sequences (Horse, Car), bit rate savings of up to 70% compared to coding a down-sampled view without inter-view prediction are possible.

Bit rate allocation

In the tests described in the previous section, the best result was achieved by coding a sub-sampled version of the left view with inter-view prediction from the right view, using the same QP for both views. This QP combination is not necessarily the best in terms of overall binocular quality. It was therefore tested how the rate-distortion performance of the left view changes when the QP of the base (right) view is varied. The coder settings of Table 3.4 were used, except that the QP of the right view was varied from 20 to 44 with a step size of 1. The decoded right view was low-pass filtered, sub-sampled and used for inter-view prediction of the left view. For every QP of the decoded right view, the left view was coded with QPs from 20 to 44 with a step size of 4.
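The QP sweep of the bit rate allocation test can be enumerated as a grid, which makes the number of encodings explicit. The sketch assumes that every left-view QP was combined with every right-view QP, which is how the description above reads; the names are hypothetical.

```python
# Enumerate the QP combinations of the bit rate allocation test.
RIGHT_QPS = range(20, 45, 1)  # base (right) view: 20..44, step 1
LEFT_QPS = range(20, 45, 4)   # down-sampled left view: 20..44, step 4


def allocation_grid():
    """Return all (left QP, right QP) pairs, assuming a full cross product."""
    return [(ql, qr) for qr in RIGHT_QPS for ql in LEFT_QPS]


grid = allocation_grid()
print(len(grid))                              # 175 encodings (7 x 25)
print(sum(1 for ql, qr in grid if ql == qr))  # 7 pairs on the same-QP diagonal
```

Only 7 of the 175 points correspond to the same-QP setting of the previous test; the remaining points are what allows an unequal bit rate split between the views.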
Figure 3.8 shows that, starting from the PSNR obtained with the same QP for the left and right views, the left-view PSNR changes when the QP of the right view varies. If the quality of the base view is increased, the left view reaches higher PSNR values at lower left-view-only bit rates. If the quality of the right view is decreased, the left view requires a higher bit rate and has a lower PSNR. The sequence Snail shows some exceptions to this behavior, but only at relatively low bit rates of the right view. Note that this inverse PSNR behavior only occurs because the left-view PSNR is plotted against the left-view-only bit rate. For total PSNR versus total bit rate, the expected behavior occurs, as shown in Figure 3.9 and Figure 3.10.

Figure 3.8: Left-view bit rate vs. left-view PSNR; the down-sampled left view was coded with inter-view prediction from the right view; the violet curve shows results for the same QP for left and right view; the black curves show the variation of rate and distortion with varying QP of the right view: (a) Hands, (b) Snail, (c) Horse and (d) Car

To determine the gain achievable with different QP values compared to the same QP for both views, the two views must be evaluated jointly. This was done by averaging the mean squared errors (MSE) of both views, with the MSE of the left view calculated with respect to the low-pass filtered uncoded left view. Figure 3.9 and Figure 3.10 show the total PSNR versus the total bit rate of both views. The displayed numbers are the QP values of the left (down-sampled) and the right view that yield the highest total PSNR for particular bit rates.
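The joint evaluation can be sketched as follows, assuming 8-bit luma (peak value 255) and an equally weighted mean of the two views' MSEs as stated above. Function names are hypothetical, and images are plain lists of rows.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D luma arrays."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(mse_value, peak=255.0):
    """PSNR in dB for 8-bit video; infinite for a perfect match."""
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse_value)


def total_psnr(left_dec, left_ref_lp, right_dec, right_ref):
    """Joint PSNR of both views from the equally weighted mean of their MSEs.

    left_ref_lp is the low-pass filtered, uncoded left view, so the left-view
    term measures coding loss only, not the detail removed by filtering.
    """
    joint_mse = 0.5 * (mse(left_dec, left_ref_lp) + mse(right_dec, right_ref))
    return psnr(joint_mse)
```

Because the left-view reference is the low-pass filtered view, this total PSNR tracks coding loss only; it does not penalize the blur introduced by down-sampling, which is one reason why subjective tests remain necessary.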

Figure 3.9: Total PSNR versus total bit rate for Mixed Resolution Coding with inter-view prediction, and the QP combinations for left (down-sampled) and right view with the highest PSNR for particular bit rates: (a) Hands and (b) Snail

Figure 3.10: Total PSNR versus total bit rate for Mixed Resolution Coding with inter-view prediction, and the QP combinations for left (down-sampled) and right view with the highest PSNR for particular bit rates: (a) Horse and (b) Car

With this optimization, the QP of the right (base) view is always higher than the QP of the left (down-sampled) view. The bit rate distribution between left and right view is shown in Table 3.5. For all tested QPs the bit rate of the right view is higher than that of the left view. For the sequence Hands the bit rate difference between left and right view is smaller than for the other sequences, because Hands has weaker inter-view dependencies than, for example, Horse. Figure 3.11 and Figure 3.12 compare the optimized QP values for inter-view prediction with inter-view prediction using the same QP for both views. For Car and Horse the rate-distortion curves of both methods are nearly identical. Different QP values yield a small gain for Snail and a significant gain for Hands. With the same QPs, inter-view prediction gains less over simulcast for Hands than for the other sequences; consequently, optimizing the QP values yields a high gain for Hands.

Table 3.5: Coding results for the left (down-sampled) and right view with inter-view prediction for Hands, Snail, Horse and Car (columns: total bit rate [kbit/s], total PSNR [dB], QP left (DS) view, QP right view, bit rate left view [kbit/s], bit rate right view [kbit/s])

Figure 3.11: Total bit rate versus total PSNR for Mixed Resolution coding without inter-view prediction, with inter-view prediction and the same QP, and with inter-view prediction and different QPs for the left and right view: (a) Hands and (b) Snail

Figure 3.12: Total bit rate versus total PSNR for Mixed Resolution coding without inter-view prediction, with inter-view prediction and the same QP, and with inter-view prediction and different QPs for the left and right view: (a) Horse and (b) Car

3.2.2 View Enhancement using unsharp masking

The suitability of unsharp masking filters for enhancing the binocular quality has been evaluated. The filter increases the subjective sharpness of the sub-sampled view, so the approach has the potential both to achieve a sharper overall image and to reduce the sharpness difference between the two views. An advantage of this approach is that the transmission bit rate does not increase. The resolution of the right (base) view was 480x272 pixels and that of the left (sub-sampled) view 240x136 pixels. After decoding and up-sampling, an unsharp masking algorithm was applied to the left view, using the convolution matrix of Eq. (5) on the up-sampled image. In the resulting image the low-frequency components are reduced while the high-frequency components are enhanced, so the algorithm produces a subjectively sharper sequence. The parameter α controls the strength of the unsharp masking. α values of 2 and 4 and QP values of 25, 28, 31, 34 and 37 were tested. The sequences were shown on the 3.5-inch and 32-inch displays (see section 3.1.3) in an informal expert viewing test. The binocular quality of the Mixed Resolution sequences to which the method was applied did not improve in all cases; at medium and low bit rates even a slightly worse quality was observed. As expected, the overall subjective sharpness increased, but the coding artifacts were amplified as well, which decreased the binocular quality. Nevertheless, at high bit rates, which support a quality appropriate for a real-life scenario, the positive effect dominates and the algorithm improves the quality due to the increased overall sharpness (Figure 3.13).
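Since the convolution matrix of Eq. (5) is not reproduced here, the following sketch uses the generic unsharp-masking form out = in + α (in - blur(in)) with a 3x3 binomial blur and clipping to [0, 255]. It is a stand-in for illustration, not necessarily the filter used in the report; only the role of α as the sharpening strength carries over. Function names are hypothetical.

```python
def blur3(img):
    """3x3 binomial blur ([1, 2, 1] / 4 in each direction), edge replication."""
    h, w = len(img), len(img[0])
    k = (1, 2, 1)

    def px(y, x):
        # Clamp coordinates so border samples are replicated.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += k[dy + 1] * k[dx + 1] * px(y + dy, x + dx)
            row.append(acc / 16.0)
        out.append(row)
    return out


def unsharp_mask(img, alpha):
    """Add alpha times the high-pass residual (img - blur) and clip to 8 bits."""
    blurred = blur3(img)
    return [
        [min(255.0, max(0.0, p + alpha * (p - b))) for p, b in zip(r, br)]
        for r, br in zip(img, blurred)
    ]


print(unsharp_mask([[0, 0, 100, 100]], 2))  # [[0.0, 0.0, 150.0, 100.0]]
```

The step-edge example shows the characteristic overshoot next to the edge (100 becomes 150 for α = 2). The same gain applies to any high-frequency coding noise, which is consistent with the amplified artifacts observed at low and medium bit rates.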

Figure 3.13: Part of sequence Horse for QP=25 (a), (c), (e) and QP=37 (b), (d), (f); (a) and (b): sub-sampled decoded left view; (c) and (d): sub-sampled decoded left view with unsharp masking; (e) and (f): decoded left view

3.3 Conclusion

Expert evaluation shows that the subjective quality of coded Mixed Resolution sequences is better than that of simulcast-coded sequences, due to a reduced number of coding artifacts. The optimized bit rate share of the low-resolution view for Mixed Resolution Coding without inter-view prediction was approximately 30% to 35%. Nevertheless, in the large-scale subjective evaluation of the coding approaches, the Mixed Resolution approach does not outperform the simulcast approach (see the upcoming Deliverable D4.3, "Results of quality attributes of coding, transmission and their combinations"). This might be related to the advanced display used in the large-scale study and to the different test methodologies. However, the evaluation carried out in this scope shows the potential of the Mixed Resolution approach. Therefore the Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been investigated. The three main features of AMRSC are optimized down-sampling, inter-view prediction and view enhancement using unsharp masking at the receiver side. Optimized down-sampling is reported in D5.4 ("Advanced algorithms for stereo-video preprocessing") and leads to PSNR gains of up to 1 dB for the down-sampled view. Inter-view prediction significantly improves the rate-distortion performance. The optimized QP combinations of both views show that the base view should be coded with a higher QP than the predicted low-resolution view; the observed QP difference is between 2 and 8 for the tested sequences and settings. Unsharp masking of the low-resolution view can enhance the overall quality, but only at high bit rates; at low or medium bit rates it does not, because sharpening also amplifies coding artifacts. Further potential for the Mixed Resolution approach lies in more advanced content-adaptive sharpening and up-sampling algorithms.
Approaches that use information from the full view to reconstruct the down-sampled view are also conceivable. Higher performance might also result from optimizing AMRSC with a new 3D video quality metric that captures the implications of binocular suppression theory better than the PSNR used in this scope.

4 Optimization of coding approaches for subjective tests

In this section, methods for generating test stimuli for subjective tests are described, and coding results for various stereo video coding approaches, codecs and codec settings are presented. The focus is on generating test stimuli with defined bit rates for subjective comparison (in contrast to the objective coding comparisons in the previous Deliverable D2.2 [8]). The coding approaches to be tested have been chosen based on the results of D2.2. Of the two possible methods for Video plus Depth coding (MPEG-C Part 3 using H.264/AVC, and the H.264 auxiliary picture syntax), MPEG-C Part 3 has been selected because it allows independent bit rate allocation for video and depth. Coding approaches using inter-view prediction are the H.264/AVC Stereo SEI message and H.264/MVC. Of these, H.264/MVC has been chosen, in line with the 3D video community, not for performance reasons but for backward compatibility: a bit stream for 2D presentation can be extracted from it. Beyond the methods examined in D2.2, the simple Mixed Resolution approach was optimized to generate test stimuli for the evaluation of coding methods. Furthermore, the new prototype of the software encoder using slice mode has been used for the coding carried out for the evaluation of transmission approaches. Another difference to D2.2 is the choice of test sequences. The coding test set from D2.1 [9] and a transmission test set have been defined to match the users' needs examined in D4.1 [10]. Short sequences (~10 s) are used for the coding tests; longer sequences with audio (~60 s) have been coded for the transmission tests. Further adjustments concern the spatial and temporal resolution: the video format was adapted to the resolution of the new NEC display, and the frame rate was set to 12.5 or 15 fps.
4.1 Test sequences for subjective evaluation of coding approaches

The subjective evaluation of coding approaches aims at finding the optimal approach for coding stereo video content. Therefore a wide variety of coded sequences has been generated. The next sections describe the test setup and the coding results.

Test setup

For the large-scale evaluation of the four coding approaches Simulcast, Multi View (MVC), Mixed Resolution (MRSC) and Video plus Depth (V+D) coding using MPEG-C Part 3, a set of test stimuli has been generated. The coding approaches have been optimized at rate points with a low and a high video quality. Furthermore, a baseline and a high codec profile have been used. The six sequences from the coding test set of the stereo video database [9] have been used. This leads to a total of 4 (approaches) x 2 (qualities) x 2 (profiles) x 6 (sequences) = 96 test stimuli.

Coding Approaches

H.264/AVC Simulcast
The left and right views are coded as independent streams using H.264/MPEG-4 AVC. Since this method needs no pre- or post-processing before encoding or after decoding, the complexity on the sender and receiver side is low. Redundancy between the views is not exploited. Optimization is carried out by jointly varying the quantization parameter (QP) for the left and right view.

H.264/AVC Multi View Coding (MVC)
H.264/AVC MVC allows inter-view prediction. The left view is used as reference for the right view. Prediction has been enabled for anchor as well as non-anchor frames. No pre- or post-processing is required on the sender or receiver side. Optimization is carried out by jointly varying the QP for the left and right view.

H.264/AVC Mixed Resolution coding (MRSC)
Binocular suppression theory states that the perceived image quality is dominated by the view with the higher spatial resolution [4]. The Mixed Resolution approach exploits this property of human perception by decimating one view before transmission and up-scaling it at the receiver side. This enables a trade-off between spatial sub-sampling and amplitude quantization, at the cost of pre- and post-processing for the re-sampling. For the experiments in this scope, the right view was decimated by a factor of two in horizontal and vertical direction. The simple MRSC approach, without inter-view prediction, optimized down-sampling or unsharp masking, has been used. Optimization is carried out by independently varying the QP for the left and right view.

MPEG-C Part 3 using H.264/AVC (V+D Coding)
MPEG-C Part 3 defines a video plus depth representation of the stereo video content. Depth was estimated from the original left and right views by the HHI Hybrid Recursive Matching (HRM) algorithm. One view and the associated depth signal are coded. At the receiver, the second view is synthesized by depth image based rendering [11]. In most cases, a depth signal can be coded at a fraction of the color bit rate with sufficient quality for view synthesis. Nevertheless, errors in depth estimation and interpolation at disocclusions introduce artifacts into the rendered view. Optimization is carried out by independently varying the QP for video and depth.

Test set
The test set for coding defined in the stereo video database [9] was used to generate the test stimuli.
The sequences are shown in Figure 4.1, and details are presented in Table 4.1. All sequences have a frame rate of 15 frames per second.

Table 4.1: Properties of sequences from the coding test set
Sequence    Genre         Camera movement   Object movement   Structural complexity   Depth complexity   Size in pixels
Horse       Nature        none              low               high                    medium             x240
Bullinger   News          none              low               low                     low                x240
Car         Action        high              low               medium                  high               x240
Mountain    Documentary   medium            low               medium                  low                x240
Butterfly   Animation     none              high              high                    medium             x240
Soccer2     Sports        high              high              medium                  high               x240
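The depth image based rendering step of the V+D approach can be illustrated on a single scanline. The sketch below is a simplified forward warp under stated assumptions: 8-bit depth with 255 as the nearest value, disparity proportional to depth with a hypothetical `max_disparity` parameter, and a z-buffer so that nearer samples overwrite farther ones. It is not the renderer of [11]; it only shows where disocclusion holes come from.

```python
def render_right_scanline(colors, depths, max_disparity):
    """Forward-warp one scanline of the left view into the right view.

    Assumptions for illustration only: 8-bit depth values with 255 = nearest,
    disparity proportional to depth, and content shifting to the left in the
    right view. Returns a list in which None marks a disocclusion hole.
    """
    width = len(colors)
    out = [None] * width
    zbuf = [-1] * width  # depth of the sample currently stored at each position
    for x, (color, z) in enumerate(zip(colors, depths)):
        target = x - round(max_disparity * z / 255)
        if 0 <= target < width and z > zbuf[target]:  # nearer sample wins
            out[target] = color
            zbuf[target] = z
    return out


row = render_right_scanline([10, 20, 30, 40, 50, 60], [0, 0, 255, 255, 0, 0], 2)
print(row)  # [30, 40, None, None, 50, 60]
```

The two `None` positions are background that was hidden behind the foreground object in the left view. A real renderer has to interpolate them, which is where the disocclusion artifacts mentioned above arise.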

Figure 4.1: Sequences of the coding test set: Horse, Bullinger, Car, Mountain, Butterfly, Soccer2
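The stimulus counts of the two studies in this chapter follow from fully crossing their test factors. A short enumeration makes the design matrix explicit; the factor labels are taken from the text, the variable names are hypothetical.

```python
from itertools import product

APPROACHES = ("Simulcast", "MVC", "MRSC", "V+D")
QUALITIES = ("low", "high")
PROFILES = ("baseline", "high")
SEQUENCES = ("Horse", "Bullinger", "Car", "Mountain", "Butterfly", "Soccer2")

# Coding study: every approach x quality x profile x sequence.
coding_stimuli = list(product(APPROACHES, QUALITIES, PROFILES, SEQUENCES))
print(len(coding_stimuli))  # 96

# Transmission study (section 4.2): 3 approaches x 4 sequences x slice mode off/on.
transmission_stimuli = list(product(
    ("Simulcast", "MVC", "V+D"),
    ("RollerBlade1", "RhineValleyMoving", "KnightsQuest", "HeidelbergAlleys"),
    (False, True),
))
print(len(transmission_stimuli))  # 24
```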

Codec Profiles
Coding has been carried out using two codec profiles. The simple baseline profile uses an IPPP structure and CAVLC; the complex high profile enables hierarchical B-frames and CABAC. For the Simulcast, Mixed Resolution and V+D approaches, the AVC reference software JM 14.2 has been used. The MVC stimuli have been coded using the MVC reference software JMVC. Table 4.2 shows the codec settings in detail.

Table 4.2: Codec settings and profiles
Profile        Baseline    High
GOP size       1 (IPPP)    8 (hierarchical B-frames)
Symbol mode    CAVLC       CABAC

High and low quality
The coding approaches have been evaluated at a high and a low quality. Note that it is not useful to define one constant high and one constant low bit rate for all sequences, because the compressibility varies between sequences: a rate sufficient for high quality in one sequence might produce low quality in another. To guarantee comparable low and high qualities for all sequences, a low and a high rate point had to be determined for each sequence individually. To obtain these rate points, the quantization parameter (QP) for simulcast coding was set to 30 for the high quality and to 37 for the low quality. This results in a low and a high bit rate for each sequence of the coding test set. The resulting bit rates are shown in Table 4.3 and have been used as target rates for the other three approaches with the baseline profile.

Table 4.3: Target bit rates in kbit/s for high and low quality, per profile (Baseline, High) and quality (Low, High), for Bullinger, Butterfly, Car, Horse, Mountain and Soccer2

Bit rates for the high profile are also shown in Table 4.3.
They are the rates at which the sequences coded with the high profile and simulcast reach the same PSNR as the sequences coded with the baseline profile and simulcast at QP 37 and QP 30. This guarantees a comparable objective quality for the baseline and high-profile sequences using simulcast. Hence it can be evaluated subjectively whether the different GOP structures of the two profiles influence the subjective quality in a way that is not reflected by the PSNR.
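Finding the high-profile rate that matches a baseline PSNR amounts to reading an RD curve at a given PSNR. A minimal sketch, assuming piecewise-linear interpolation between measured QP points (interpolating in log-rate, as Bjøntegaard-style comparisons do, would be an equally reasonable choice); the curve values in the example are hypothetical.

```python
def rate_at_psnr(curve, target_psnr):
    """Linearly interpolate the bit rate at which an RD curve reaches target_psnr.

    curve: (rate_kbps, psnr_db) pairs, e.g. one per QP; PSNR is assumed to grow
    monotonically with rate, as it does for a QP sweep.
    """
    pts = sorted(curve, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= target_psnr <= q1:
            if q1 == q0:
                return r0
            t = (target_psnr - q0) / (q1 - q0)
            return r0 + t * (r1 - r0)
    raise ValueError("target PSNR lies outside the measured curve")


# Hypothetical high-profile simulcast curve and a hypothetical baseline PSNR.
high_profile_curve = [(40.0, 33.0), (60.0, 35.0), (90.0, 37.0)]
print(rate_at_psnr(high_profile_curve, 36.0))  # 75.0
```

Restricting the search to the measured range avoids extrapolating the curve, which would be unreliable for rate points outside the QP sweep.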

4.1.2 Coding Results

Baseline Profile

Simulcast
Figure 4.2 shows the RD characteristics used for optimizing the simulcast approach. A QP range from 18 to 44 with a step size of one was used for coding. Sequences matching the bit rates defined in Table 4.3 have been taken as test stimuli. The Bullinger sequence is highly compressible due to its constant, low-complexity background and only slightly moving foreground. The Butterfly and Horse sequences both have a high structural complexity and no camera movement. Nevertheless, coding gains for Butterfly are higher than for Horse, because the artificial scene contains no noise and subsequent frames are more similar. In the sequences Mountain, Soccer2 and Car the camera is moving. The strongest camera motion is found in Car; however, the camera only moves forward, so the scene changes rather slowly. This explains the higher gains compared to the Mountain and Soccer2 sequences, in which the camera moves horizontally or vertically.

Figure 4.2: PSNR vs. bit rate of left and right view for simulcast coding (baseline profile)

Multi View Coding
The RD characteristics used for optimizing the MVC approach are shown in Figure 4.3. The sequences have been coded using a QP range from 18 to 44 with a step size of one. Sequences matching the bit rates defined in Table 4.3 have been taken as test stimuli. Compared to simulcast coding, the coding gain increases, while the differences between the sequences remain similar. A high gain is found for the Butterfly sequence, because the similarity of the two artificial views enables an efficient inter-view prediction.

Figure 4.3: PSNR vs. bit rate of left and right view for MVC coding (baseline profile)

Mixed Resolution Stereo Coding
To determine the optimal bit rate distribution between the views for the Mixed Resolution method, the approach suggested in [12] and applied to Mixed Resolution coding earlier in this report was used. The PSNR shown was thus calculated from the average MSE of the full view and the up-sampled low-resolution view. To take binocular suppression theory into account, the down- and up-sampled original view was taken as reference for the up-sampled low-resolution view. Hence the PSNR calculated this way only evaluates the coding quality, not the overall quality. The left view and the down-sampled right view have been coded with QPs from 18 to 44 with a step size of 2. The coding results are shown in Figure 4.4. Each point represents a QP combination for the left and the down-sampled right view; the optimal QP combinations lie on the envelope of these points. Sequences matching the bit rates defined in Table 4.3 and coded with optimal QP combinations have been taken as test stimuli; where necessary, intermediate QP combinations were coded as well.
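Selecting the optimal QP combinations from a cloud of encoded points can be sketched as extracting the rate-distortion Pareto front: a point is kept only if no point with lower or equal rate reaches a higher PSNR. Whether the "envelope" used here is this front or its upper convex hull is not stated, so treat the sketch as one plausible reading; the example points and QP labels are hypothetical.

```python
def rd_envelope(points):
    """Return the points not dominated in the rate-distortion sense.

    points: (rate_kbps, psnr_db, qp_a, qp_b) tuples, one per QP combination.
    Sorting by rate (ties: higher PSNR first) lets one pass keep exactly the
    points whose PSNR exceeds that of every cheaper point.
    """
    front = []
    best_psnr = float("-inf")
    for p in sorted(points, key=lambda p: (p[0], -p[1])):
        if p[1] > best_psnr:
            front.append(p)
            best_psnr = p[1]
    return front


points = [
    (100, 30.0, 30, 36),
    (100, 29.0, 32, 32),
    (150, 29.5, 34, 34),
    (180, 34.0, 26, 36),
    (200, 33.0, 28, 34),
]
print(rd_envelope(points))  # [(100, 30.0, 30, 36), (180, 34.0, 26, 36)]
```

A test stimulus for a given target rate can then be taken from the front point nearest to that rate, or from an additional encoding at an intermediate QP combination, as described above.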

Figure 4.4: PSNR vs. bit rate of left view and down-sampled right view for MRSC coding (baseline profile)

Figure 4.5: PSNR vs. bit rate of left view and depth for V+D coding (baseline profile)

Video + Depth Coding
Coding results for V+D coding are shown in Figure 4.5. The PSNR was calculated from the average MSE of the left view and the rendered right view. The MSE of the rendered right view was calculated taking the right view rendered from uncoded data as reference. With this approach, rendering artifacts already present in the uncoded data are neglected, so the PSNR calculated this way only evaluates the coding quality and not the overall quality. The left view has been coded with QPs from 18 to 44 with a step size of 2. For depth, QPs from 8 to 44 or from 18 to 44, depending on the sequence, have been used with a step size of 2. Each point in Figure 4.5 represents a QP combination for the left view and depth; the optimal QP combinations lie on the envelope of these points. Sequences matching the bit rates defined in Table 4.3 and coded with optimal QP combinations have been taken as test stimuli; where necessary, intermediate QP combinations were coded as well.

High Profile
Figure 4.6 to Figure 4.9 show the coding results for the high profile. The typical characteristics of the sequences are similar to the baseline profile case. For all sequences a high coding gain can be achieved by using the high profile with hierarchical B-pictures and CABAC. A comparison of the high and baseline profiles is presented separately for each sequence in the discussion of the generated test stimuli.

Simulcast
Figure 4.6: PSNR vs. bit rate of left and right view for simulcast coding (high profile)

Multi View Coding
Figure 4.7: PSNR vs. bit rate of left and right view for MVC coding (high profile)

Mixed Resolution Stereo Coding
Figure 4.8: PSNR vs. bit rate of left view and down-sampled right view for MRSC coding (high profile)

Video + Depth Coding
Figure 4.9: PSNR vs. bit rate of left view and depth for V+D coding (high profile)

4.1.3 Generated Test Stimuli

Table 4.4 to Table 4.9 show the PSNRs and bit rate distributions of the resulting test stimuli. The total PSNR was calculated using the MSE of the single left and right views. Note that for MRSC the PSNR of the right view was calculated using the uncoded down- and up-sampled right view as reference; these PSNR values are therefore marked with pluses. For V+D coding the PSNR of the right view was calculated using the right view rendered from uncoded data as reference; these PSNR values are marked with asterisks.

Table 4.4: Properties of test stimuli of sequence Bullinger (methods: Simulcast, MVC, MRSC, V+D; columns: PSNR-Y both views [dB], PSNR-Y left view [dB], PSNR-Y right view [dB], bit rate right / bit rate total; rate points: baseline profile low 74 kbit/s, high 160 kbit/s; high profile low 46 kbit/s, high 99 kbit/s)

Table 4.4 shows the coding results for Bullinger. Inter-view prediction leads to bit rate savings of about 24% for the right view and to significant PSNR gains for MVC. The optimal bit rate share of the down-sampled right view for MR coding ranges from 32% to 38%. Depth can be coded at approximately 20% to 30% of the total bit rate and leads to the best quality of the left view. A comparison of the baseline and the high profile shows that the high profile enables bit rate savings of about 40% while the quality for Simulcast, MRSC and V+D coding remains unchanged.

Table 4.5: Properties of test stimuli of sequence Butterfly (columns as in Table 4.4; rate points: baseline profile low 143 kbit/s, high 318 kbit/s; high profile low 94 kbit/s, high 212 kbit/s)

The results for the Butterfly sequence are shown in Table 4.5. Due to the synthetic character of the sequence and the similarity of both views, inter-view prediction is very efficient: about 50% of the bit rate of the right view can be saved compared to simulcast. The bit rate of the down-sampled right view ranges from 38% to 49% of the total bit rate. The optimal bit rate for depth is about 8% to 29% of the total rate. The bit rates of sequences coded with the high profile are about 33% lower than for sequences coded with the baseline profile. The performance of MVC and MRSC decreases, whereas the performance of V+D increases.

Table 4.6: Properties of test stimuli of sequence Car (columns as in Table 4.4; rate points: baseline profile low 130 kbit/s, high 378 kbit/s; high profile low 112 kbit/s, high 323 kbit/s)

Table 4.6 shows the results for the sequence Car. MVC and MRSC result in bit rate savings of approximately 20% for the right view. Depth can be coded efficiently and needs only about 7% to 16% of the total bit rate. The bit rate of sequences coded with the high profile is about 14% lower than for sequences coded with the baseline profile, while the quality remains approximately equal for all methods. Thus the gain achieved by the more complex coding structure is relatively low.

Table 4.7: Properties of test stimuli of sequence Horse (columns as in Table 4.4; rate points: baseline profile low 160 kbit/s, high 450 kbit/s; high profile low 104 kbit/s, high 284 kbit/s)

The properties of the Horse stimuli are provided in Table 4.7. With MVC and MRSC, the right view takes approximately 30% to 40% of the total bit rate. Depth can be coded at about 9% to 18% of the total bit rate and leads to a high quality of the left view. The high profile enables bit rate savings of 35% compared to the baseline profile at approximately equal quality.

Table 4.8: Properties of test stimuli of sequence Mountain (columns as in Table 4.4; rate points: baseline profile low 104 kbit/s, high 367 kbit/s; high profile low 78 kbit/s, high 208 kbit/s)

Table 4.8 shows the coding results for the Mountain sequence. The bit rate distribution of simulcast coding shows that the right view is slightly less compressible than the left view. MVC achieves gains of up to 1 dB compared with simulcast; nevertheless, inter-view prediction is not efficient at the high rate with the baseline profile. MRSC leads to a bit rate share of about 30% for the down-sampled right view. Depth can be coded at 13% to 19% of the total bit rate. At the low rates, up to 25% of the bit rate can be saved with the high profile at slightly better quality; at the high rate, a saving of 40% is achieved.

Table 4.9: Properties of test stimuli of sequence Soccer2 (columns as in Table 4.4; rate points: baseline profile low 159 kbit/s, high 452 kbit/s; high profile low 134 kbit/s, high 381 kbit/s)

Coding results for the Soccer2 sequence are presented in Table 4.9. MVC reaches PSNR gains of up to 1.5 dB. The bit rate share of the right view ranges from 34% to 43% for MVC and MRSC. Depth can be coded at 10% to 15% of the total bit rate and enables gains of up to 2.7 dB for the left view. Coding with the high profile leads to bit rate savings of 15% at approximately the same quality.

Conclusion
For the subjective evaluation of coding methods, 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. A baseline and a high codec profile were used. An objective evaluation was carried out at a low and a high quality level.

MVC results in a higher PSNR than simulcast. With V+D and Mixed Resolution coding, the PSNR of the left view increases compared to the simulcast approach. Nevertheless, the quality of the rendered right view, of the down-sampled view and the overall quality of both views remain questionable, since rendering artifacts and image distortions introduced by down-sampling cannot be evaluated using the PSNR. The large-scale subjective test is therefore needed; its results are reported in the upcoming Deliverable D4.3, "Results of quality attributes of coding, transmission and their combinations". Coding with the high profile generates sequences at approximately the same quality level, but with bit rate savings of 15% to 50%.

4.2 Test sequences for transmission studies

In addition to the examination of the coding approaches, a study on transmission approaches was carried out. This section deals with the preparation of test stimuli for this study, with the focus on the coding part. Apart from the slice mode, error resilience strategies are discussed in the upcoming Deliverable D3.4, "Stereo DVB-H broadcasting system with error resilient tools".

Test setup
For the transmission studies, coded sequences using Simulcast, Multi View and Video plus Depth coding have been generated. The coding approaches have been optimized at rate points with a high video quality, using the baseline codec profile. The four sequences from the transmission test set of the stereo video database [9] have been used. Furthermore, sequences have been coded with and without the newly implemented slice mode. This leads to a total of 3 (approaches) x 4 (sequences) x 2 (slice mode) = 24 test stimuli.

Coding Approaches
The Simulcast, Multi View and V+D coding approaches as described above for the coding tests have been used. Due to the low performance of the simple Mixed Resolution approach in the subjective coding test (see D4.3), this approach was omitted.
The Advanced Mixed Resolution Stereo Coding (AMRSC) approach was not yet available at the time the test sequences were prepared.

Test set
A test set for transmission studies was defined and will be reported in the next update of the stereo video database [9]. The sequences are shown in Figure. Details are presented in Figure. The sequences RhineValleyMoving, KnightsQuest and HeidelbergAlleys consist of different scenes with varying movement and complexity. All sequences have a length of 60 seconds and are available with audio.

Table 4.10: Properties of sequences from the transmission test set
Sequence | Genre | Camera movement | Object movement | Structural complexity | Depth complexity | Frame rate | Size in pixels
RollerBlade | Sports, User Created | None | High | Medium | Medium | | x240
RhineValleyMoving | Action | High | High | Medium | Low | | x240
KnightsQuest | Animation | Various | Various | Low | Low | | x240
HeidelbergAlleys | Documentary | Low | Low | High | Various | | x240
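The transmission test matrix described under Test setup is a full factorial design: every coding approach is combined with every sequence, once with and once without the slice mode. The sketch below enumerates it with itertools.product; the sequence names follow Table 4.10, while the function name build_stimuli and the record layout are illustrative assumptions, not part of the study's tooling.

```python
from itertools import product

# Factors of the transmission study; "MVC" stands for Multi View coding.
APPROACHES = ("Simulcast", "MVC", "V+D")
SEQUENCES = ("RollerBlade", "RhineValleyMoving", "KnightsQuest", "HeidelbergAlleys")
SLICE_MODE = (False, True)  # coded without / with the new slice mode


def build_stimuli():
    """Return one record per test stimulus (hypothetical layout)."""
    return [
        {"approach": a, "sequence": s, "slice_mode": m}
        for a, s, m in product(APPROACHES, SEQUENCES, SLICE_MODE)
    ]


stimuli = build_stimuli()
print(len(stimuli))                                # 24
print(sum(1 for s in stimuli if s["slice_mode"]))  # 12
```

Half of the 24 stimuli use the slice mode. The same product structure accounts for the 96 stimuli of the coding study: 6 sequences × 4 approaches × 2 profiles × 2 quality levels = 96.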

Figure 4.10: Sequences of the transmission test set (panels: RollerBlade1, RhineValleyMoving, KnightsQuest, HeidelbergAlleys)

Codec Profile
The transmission study has been carried out using the baseline profile shown in Table 4.2. For all approaches, the MVC reference software JMVC has been used. Inter-view prediction was not used for simulcast and V+D coding.

Quality level
The coding approaches have been evaluated at a high quality point. Individual target bit rates for each sequence have been found with the approach described in section. To define a high quality for all sequences of the transmission test set, the quantization parameter (QP) of the codec for simulcast coding was set to 30. This results in a target bit rate for each sequence of the transmission test set. Furthermore, the bit rates should not exceed 600 kbit/s; therefore, a QP of 33 had to be set for the RollerBlade sequence. The resulting bit rates are shown in Table 4.11 and have been used as target rates for the other two approaches and for the slice mode.

Table 4.11: Target bit rates in kbit/s
RollerBlade1 | RhineValleyMoving | KnightsQuest | HeidelbergAlleys
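The QP rule in the Quality level subsection can be read as a small search: start at QP 30 for simulcast coding and, if the resulting bit rate exceeds 600 kbit/s, move to coarser quantization until the cap is met. The sketch below assumes a callable rate_at_qp that would, in practice, wrap a full JMVC simulcast encoding and return its total bit rate; both that callable and the name select_qp are hypothetical. The toy rate model in the example, which halves the rate every 6 QP steps, is only a rough rule of thumb motivated by the H.264 quantizer step size doubling every 6 QP, and its numbers are invented for illustration, not measured.

```python
from typing import Callable


def select_qp(rate_at_qp: Callable[[int], float],
              start_qp: int = 30,
              max_rate_kbps: float = 600.0,
              max_qp: int = 51) -> int:
    """Smallest QP >= start_qp whose simulcast bit rate stays within the cap.

    rate_at_qp is a hypothetical stand-in for an encoder run that returns
    the total bit rate in kbit/s for a given QP.
    """
    for qp in range(start_qp, max_qp + 1):  # H.264 QP range ends at 51
        if rate_at_qp(qp) <= max_rate_kbps:
            return qp
    raise ValueError("no QP in range satisfies the bit rate cap")


def toy_rate(qp: int, rate_at_30: float = 800.0) -> float:
    """Invented rate model: bit rate halves every 6 QP steps."""
    return rate_at_30 * 2.0 ** (-(qp - 30) / 6.0)


print(select_qp(toy_rate))          # 33
print(select_qp(lambda qp: 450.0))  # 30
```

Since every call to rate_at_qp stands for a full encoding pass, scanning upward from QP 30 keeps the number of runs small when the cap is exceeded only slightly; a sequence that already meets the cap at QP 30 needs a single run.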
