A Study on Structural Similarity Based Interframe Video Coding

Project report for EE5359 Multimedia Processing, summer 28, DR. K.R. RAO Att Kruafak, att.kruafak@uta.edu, SID 32488 Submitted draft July 8, 28, final submitted July 3, 28. A Study on Structural Similarity Based Interframe Video Coding Att Kruafak ABSTRACT The goal of the research is to study rate-distortion behavior of the interframe codec designed with novel motion estimation based on structural similarity distortion and the codec with conventional motion estimation based on pixel error distortion (absolute differences). The study from previous literature shows that structural similarity metric provides better image assessment than pixel error based metric (mean square error and peak signal-to-noise ratio). The codec with fixed motion block sizes (6x6), (8x8), and (4x4), will be investigated. Simulation results such as rate-distortions will be measured in terms of both peak signal-to-noise ratio versus bit rate and structural similarity index versus bit rate.. INTRODUCTION In many video coding standards, including H.264/AVC [,2,3], distortion calculation in motion estimation part is based on pixel error metric (for example, absolute differences, MSE). However, pixel error metric is not designed to match with human visual perception. The disadvantages of error metric (MSE) can be found in [4,5,6,7]. Structural similarity (SSIM) index is a new objective image quality metric, introduced in [5] for image quality assessment. The research applies SSIM index for distortion calculation to estimate the motion vector during motion search process. The codec proposed in [8] will be used in the simulation. The reason is that it is less complicated to handle several components inside the codec. Rate-distortion results of different fixed-motion partition sizes will be observed. Perceived reconstructed video frame will be observed both at low bit rates and at high bit rates by varying the quantization parameter. Recently, two papers [9, ] introduced SSIM in motion estimation of H.264 codec. The research uses the codec based on [8]. The codec is modified with single size motion estimation to observe the effect on each motion block size: (6x6), (8x8), and (4x4). SSIM metric is implemented in motion estimation part. MATLAB implementation of SSIM can be obtained from []. Huffman codebooks for P-frames are trained with five test sequences. Simulations on four test video sequences with group of pictures as IPP are performed. Rate-distortion results in terms of SSIM and bit rate are compared with conventional PSNR and bit rate. Low bit rate (low visual quality) and high bit rate (high visual quality) encoding regions will be observed and investigated. The report is organized as follows. Section 2 provides the background on SSIM, video quality assessment, and SSIM used in video codec literature. Section 3 discusses the interframe codec simulation with reference to frame interpolation, motion vector search process, motion block matching calculation, bit stream format, and entropy codebook training. Simulation results are discussed in Section 4. Section 5 provides summary and future works.

2 2. BACKGROUND 2. Definition of Structural Similarity Index This section is based on [5,6]. Structural similarity (SSIM) index [5] is a new objective method to measure image quality between a distorted image and a reference image. The SSIM measures the degradation of structural information based on the assumption that human visual system characteristics are adopted for extracting structural information from an image scene. The authors claim that SSIM has better consistency with perceived image quality than pixel error model (absolute differences and MSE) based on different performance evaluations. The author [5] provides the following observations on nature of natural images and human visual characteristics to implement SSIM locally.. Natural images are spatially non-stationary in general. 2. Image distortions can be space-variant. The distortions may or may not depend on local image statistics. 3. Only a local area in the image can be perceived by humans at one time instance, regarding to HVS. 4. Localized quality measurement of the image provides a spatially varying quality map (called SSIM map) of the image which provides more quality degradation information of the image. SSIM is defined as the product of three local quantities: luminance comparison (function of mean), contrast comparison (function of variance), and structure comparison (function of correlation coefficient and variance) ( ) = ( ) ( ) ( ) α β γ SSIM x, y l x, y c x, y s x, y (2.) 2μ μ + C x y (, ) = 2 2 l x y μ + μ + C x y x y 2, ( ) c x, y 2σ σ + C = σ + σ + C 2 2 x y 2 xy 3, ( ) s x, y 2σ + C = σσ + C x y 3 N μ = x xi N i= N 2 σ ( ) 2 x = xi μ x N i= N σ xy = ( xi μx)( yi μ y ) N i= where, x and y are two local window sizes ( N N) of the same spatial location from two images x is the i th pixel of x, where i =,, N i μ x is the mean of x 2 σ x is the variance of x σ is the covariance of x and y xy α = β = γ = for simplicity

C,, and C are constants to prevent the numerator of each term being close to zero (to C2 3 prevent an unstable measurement). ( ) 2 2 C2 C = KL, C2 = ( K2L), C 3 = 2 K =., K 2 =.3 L = 255 is the dynamic range of pixel values (for 8-bit per pixel component), Properties of SSIM index [5,6],. Symmetry: SSIM ( x, y) = SSIM ( y, x ) 2. Boundedness: SSIM ( xy, ) 3. Unique maximum: SSIM ( xy, ) = if and only if x for all i =, 2,, N ) 3 = y (in discrete representations, xi = yi Mean SSIM (MSSIM) index for a single overall quality value of the entire image, is defined as the average of the SSIM map, M MSSIM x, y = SSIM x, y (2.2) where, image. ( ) ( M j = x j is j th local window (N N) of an image. M is the number of local windows of the SSIM is applied in two areas of this research, i.e., motion block matching distortion calculation, and reconstructed video quality measurement. j j ) 2.2 Video Quality Assessment Using SSIM The research measures the perceived video quality based on the method introduced in [6] with modifications. Local window weighting value [6] is set as. Frame weighting [6] is set as constant for simplicity. In this research, local window size is (x) for all three video components. In calculation, each local block moves pixel by pixel and gives an SSIM value (2.). SSIM of each video component, Y-SSIM, Cb-SSIM, and Cr-SSIM is calculated independently, in (2.3) (2.5). Y-SSIM = F M Y SSIM pj F M p = j = (2.3) Cb-SSIM = F M Cb SSIM pj F M p = j = (2.4) Cr-SSIM = F M Cr SSIM pj F M p = j = (2.5) Y where, SSIM pj Cb, SSIM pj Cr, and SSIM pj are the SSIM index values (2.) (SSIM map) of Y, Cb, and Cr components of the jth local window in the pth video frame. F is the number of video frames. M is the number of local windows.

4 RGB-SSIM (SSIM of a reconstructed true color video), is calculated as following. Reconstructed video sequence in YCbCr 4:2: format is converted to YCbCr 4:4:4. Cb and Cr components are upsampled by 2 and then are convolved with 2-D zero-order hold interpolation filter. RGB components are obtained from YCbCr-to-RGB transform matrix on each YCbCr 4:4:4 ith pixel, in (2.6) [24,25] ri.64.596 yi 6 g i.64.392.83 cb i 28 = b i.64 2.7 cr i 28 (2.6) The block diagram of color format conversion is shown in Fig. 2.. Sampling formats of YCbCr video component, such as, 4:2:, 4:4:4, 4:2:2, 4::, and 4:: (gray scale) used in video coding standards are shown in Fig. 2.2. SSIM of each R, G, and B component are calculated independently by (2.). The research proposes measuring RGB-SSIM by averaging the three components, in (2.7) F M = = F M R G B RGB-SSIM = SSIM pj + SSIM pj + SSIM pj p j 3 3 3 (2.7) R G B where, SSIM pj, SSIM pj, and SSIM pj are SSIM index values (2.) of R, G, and B components of the jth local window in the pth video frame. F is the number of video frames. M is the number of local windows. 2.3 Structural Similarity Index in Video Codec Design Two papers [9,] applied SSIM in JM/H.264 reference software. In [9], the author claims that motion estimation method is improved using distortion based on SSIM (MEBSS, motion estimation method based on structural similarity). Both bit rate and coding time are reduced. The simulation is implemented only at QP =, very low quantization parameter value (high quality, high bit rate). Lagrange cost for motion block matching metric is changed from absolute differences to ( SSIM) with full search motion estimation. SSIM is computed with local block size (4 4) non-overlapped windows. In [], the authors propose more techniques to reduce the coding time in motion estimation called FMEBSS, fast MEBSS. SSIM is used in early termination algorithm (with fixed threshold value) and Lagrange cost where absolute differences is changed to K ( SSIM). K is the author defined multiplier. The simulation is done at QP =, 2, and 3. The codec achieves bit rate and coding time reductions.

5 Y Y R 2 3 4 Cb G 5 6 7 Cr B 8 2 3 4 5 6 7 8 9 4:2: 4:4:4 4:4:4 Figure 2. Color format conversion for YCbCr 4:2: to RGB 4:4:4, conversion block diagram, components of Mobile CIF (288x352)-pixel during conversion process, from the left, YCbCr 4:2:, YCbCr 4:4:4, RGB 4:4:4

6 2 4 6 8 2 4 5 5 2 25 3 35 4 45 5 Y Cb Cr 4:4:4 2 4 6 8 2 4 5 5 2 25 3 35 4 45 5 Y Cb Cr 4:2: 2 4 6 8 2 4 5 5 2 25 3 35 4 45 5 Y Cb Cr 4:2:2 (c) 2 4 6 8 2 4 5 5 2 25 3 35 4 45 5 Y Cb Cr 4:: (d) Figure 2.2 Sampling formats on a block of (4x4) pixels of YCbCr video sequence (left side) and (right side) the corresponding components of Foreman QCIF (44x76), frame, 4:4:4, 4:2:, (c) 4:2:2, (d) 4::

7 3. CODEC SIMULATION FOR INTERFRAME CODING (IPP ) In simulation, the codec is implemented in MATLAB. Intraframe coding part is based on [8]. For interframe coding, the codec uses the following coding tools. Variable size integer transforms are used with {(6x6),(8x8),(4x4)} for luma block, and {(8x8),(4x4),(2x2)} on chroma block. Reference frame interpolation for fractional-pixel motion vector, is used on both luma and chroma reference frames similar to H.264. Fast motion estimation (similar to JM H.264 reference software [2,3]) with UMHexagon motion search for integer pixel resolution and CBFPS motion search for fractional pixel resolution, is used [4,5,6,7]. Huffman entropy encoder is used to obtain the encoded bit stream. The block diagrams of the codec are shown in Figs. 3. 3.2. The codec processes an input CIF video sequence in YCbCr 4:2: format with 8 bits per pixel. Frames are encoded as IPP. The detail of each encoding tools are provided in the next sections. Figure 3. The proposed video encoder block diagram (MB: Macroblock, BSS: Block size selection, pel: pixel)

8 Figure 3.2 The proposed video decoder block diagram 3. Reference Frame Interpolation (H.264-Like) The codec uses quarter-pixel interpolation on luma reference frame and /8-pixel interpolation on chroma reference frame (Fig. 3.). For a luminance reference frame, half-pixel positions are, first, derived. Then, the quarter-pixel positions are derived. For a chroma reference frame, linear interpolation is applied directly to integer-pixel positions to obtain /8-pixel position. Frame boundary reflection is used for the missing pixel interpolation near the frame boundary. 3.. Quarter-pixel interpolation on luminance reference frame Half-pel interpolation is performed first. The luma frame is upsampled by 2. Then, perform convolutions with the half-pel interpolation filer [,, -5,, 2,, 2,, -5,, ]/32 along the row and the column directions [,8,9]. Then, apply rounding and clipping at 255 (maximum pixel value for 8 bpp). For quarter-pixel interpolation, the half-pel frame is upsampling by 2. Then, interpolate by bilinear filter [, ]/2, [,8,9]. The interpolated pixel values are rounding and clipping at 255. Fractional pixels in between integer pixels are shown in Fig. 3.3. + a c + d e f g i k n p q r + + Figure 3.3 Fractional-pixel positions among integer-pixel positions at the four corners, + is an integer-pel position, is a half-pel position, {a,c,d,e,f,g,i,k,n,p,q,r} are the quarter-pel positions

9 3..2 /8-pixel interpolation on chroma reference frame The process for /8-pel interpolation is following. A chroma reference frame is upsampled by 8 (Fig. 3.4). Frame boundary reflection is used for interpolation at the frame boundary. Then, linear interpolation of neighboring integer pixels is performed in (3.), [,8]. A + + + + + + + B + + + + + + + + + + + + a + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + C + + + + + + + D Figure 3.4 Fractional-pixel position of chroma reference frame, {A,B,C,D} are integer-pel position, + is the pixel with upsampling by 8, {a} is a fractional pixel to be interpolated. ( ( dx)( dy) dx( dy) ( dx) dy dxd y ) a = round 8 8 A + 8 B + 8 C + D / 64, (3.) where, a is the fractional pixel to be interpolated. {A,B,C,D} are neighboring integer pixels. d x is pixel distance from the left of the pixel a. d y is pixel distance from the top of the pixel a (in Fig. 3.4, d x is 3, and d y is 2). Figures 3.5 3.7 show the interpolation results of luma and chroma reference frames of Football test sequence.

5 5 5 2 6 7 8 25 9 5 5 2 25 3 35 6 7 8 9 2 2 2 3 4 5 2 3 4 5 6 7 8 2 3 4 5 6 7 32 34 36 38 4 2 2 4 6 8 22 24 26 28 3 32 34 36 2 4 6 8 2 4 65 7 75 8 (c) Figure 3.5 Interpolation on the reconstructed luma frame 5 of Football CIF, QP = 3. Left column is full frame, right column is zoomed, reconstructed frame 5, integer-pel resolution, half-pel interpolation, (c) quarter-pel interpolation

2 4 6 8 2 4 5 5 25 3 35 4 8 9 6 2 4 6 8 8 2 22 24 26 28 3 32 2 4 6 8 2 4 34 65 7 75 8 Figure 3.6 Interpolation on the reconstructed chroma Cb frame 5 of Football CIF, QP = 3. Left column is full frame, right column is zoomed, reconstructed frame 5, integer-pel resolution, eighth-pel interpolation

2 2 4 6 8 2 4 5 5 25 3 35 4 8 9 6 2 4 6 8 8 2 22 24 26 28 3 32 2 4 6 8 2 4 65 7 75 8 Figure 3.7 Interpolation on the reconstructed chroma Cr frame 5 of Football CIF, QP = 3. Left column is full frame, right column is zoomed, reconstructed frame 5, integer-pel resolution, eighth-pel interpolation

3 3.2 Motion Vector Search Method (JM-Like) In block-based motion estimation, motion vector is obtained by searching for a reference luma block that gives the best match with the current block within a finite search window in a luma reference frame. Search point is defined as a position of a search block on a reference frame. The search method of the interframe codec is similar to the search method used in JM reference software (without early termination, for simplicity), that is, UMHexagonS (hybrid unsymmetricalcross multi-hexagon-grid search) for integer-pel motion estimation, and CBFPS (center-biased fractional-pel search) for fractional-pel motion estimation [4,5,6,7]. This is to reduce the number of search points compared to a full search. Search range of the proposed codec is set as [ 32,+32] pixels (with the same search range, full search requires,89 searching points). 3.2. Initializing motion vector mv Pred Before motion search begins, motion vector is predicted from median of motion vectors of neighboring sub-blocks, mv, mv, and mv in (3.2). A B C ([ ] ) mvpred = median round mva, mvb, mvc / 4, (3.2) where {A, B, C, D} are neighboring sub-blocks. Procedure to obtain the predicted motion vector is shown as flow chart in Fig. 3.8. mv Pred is in integer-pel resolution. Examples of neighboring subblocks and macroblocks of different sub-block sizes are shown in Fig. 3.9. Start Get sub-block positions D B C X is a current sub-block A X {A,B,C,D} are neighboring sub-block Sub-block A exists? Yes No mv A =[ ] mv A Sub-block B exists? No Replace mv B with mv A Yes mv B Sub-block C exists? No Sub-block D exists? No Replace mv C with mv A Yes Yes Replace mv C with mv D mv C mv Pred =median[round(mv A,mv B,mv C )/4] Stop Figure 3.8 Flow chart for predicting the initial motion vector from neighboring motion vectors

4 (c) Figure 3.9 Neighboring sub-blocks and macroblocks to predict the initial motion vector before motion search of motion estimation, of different sub-block sizes, (6x6), (8x8), (c) (4x4)

5 Integer-pel search is as follows. Flow chart of integer-pel search is shown in Fig. 3.. Step, search point initializing. Select either motion vector (,) (the same position as the current block) or (median of motion vectors from neighboring macroblocks) that gives mv Pred lower distortion as the starting search. Step 2, asymmetrical cross search consists of 6 horizontal and 8 vertical search points, excluding the center position (Fig 3.) for motion block sizes {(6x6),(8x8)}. Step 3, uneven multi-hexagon-grid search (Fig 3.) for motion block sizes {(6x6),(8x8)}, consists of two steps, Step 3., small full local search covers [±2, ±2] pixels (24 search points). Step 3.2, uneven-6-point-multi-hexagon-grid search consists of four hexagon search (64 search points). Step 4, extended hexagon-based search (EHS) (Fig. 3.(c)) for all motion block sizes {(6x6),(8x8),(4x4)}, consists of two steps, Step 4., hexagon based search (HEXBS) [2] consists of 6 search points with 3 more search points on every extended search. Step 4.2, diamond search (DS) [2] or small hexagon search, consists of 4 search points. Figure 3. Flow chart of integer-pel search

6 5 y (row index) 5-5 - -5-5 - -5 5 5 x (column index) 5 y (row index) 5-5 - -5-5 - -5 5 5 x (column index) 4 y (row pixel index) 2-2 -4-4 -2 2 4 x (column pixel index) (c) Figure 3. UMHexagonS motion search pattern on the integer-pixel luma reference frame (hybrid Unsymmetrical-cross Multi-Hexagon-grid Search) [4], asymmetrical cross search, uneven multi-hexagon-grid search, (c) extended hexagon-based search (EHS)

7 Fractional-pel search [4] is as follows. Flow chart of fractional-pel search is shown in Fig. 3.2. Step, full sub-pel search for motion block size (6x6), consists of 6 search points (Fig. 3.3). Step 2, diamond search (fractional-pel fast search) for motion block sizes {(8x8),(4x4)}, consists of 4 search points with 3 more search points on every extended search (Fig. 3.3). Figure 3.2 Flow chart of fractional-pel search y (row pixel index).5 -.5 y (row pixel index).5 -.5 - - -.5.5 x (column pixel index) - - -.5.5 x (column pixel index) Figure 3.3 CBFPS motion search pattern on a fractional-pixel interpolated luma reference frame (Center-Biased Fractional Pel Search) [4], fractional-pel full search, fractional-pel fast search, diamond search

8 3.3 Block Matching Using SAD And SSIM In block-based motion estimation, motion vector is obtained by searching for a reference luma block (of a reference frame) that gives the best match with the current block (of a current frame) (Fig. 3.4) within a search window. Best match is defined as the minimum distortion between the current block of the current frame and a displaced block in the reference frame within a search window. In the codec simulation, motion vector,, is estimated on a luma mv chroma mv luma block. Motion vector of the corresponding chroma block,, is derived from the luma motion vector in (3.3) for 4:2: sampling format of video sequence. An example of motion compensated frame for (6x6) motion block prediction of Football sequence is shown in Fig. 3.5. mv 2 luma mv chroma =, (3.3) Figure 3.4 Block-matching motion prediction The proposed research considers two types of distortions, the conventional SAD (sum of absolute difference), and SSIM. 3.3. SAD distortion calculation in (3.4) SAD is widely used in the block-based codec. SAD of each block matching is calculated SAD N N N Ref, (3.4) n= m= (,;, ) = ( +, +, ) ( + +, + +, ) D k l i j y k m l n p y k m i l n j p where SAD D N (k,l;i,j) is the distortion value from SAD of a current block at (k,l) position (top-left corner pixel location) and a reference block at (k+i,l+j) position, size (NxN). The displacement vector of a reference block is determined by (i,j). p and p indicate the frame number. A motion vector, mv N (k,l) of a current block is the displacement vector (i,j ) with the minimum distortion value D N (k,l), in (3.5) (3.6). In the simulation, motion block size N is 4, 8, and 6. SAD DN ( k, l) = min DN ( k, l; i, j) ( i, j), (3.5) mv k, l = i, j, (3.6) N ( ) ( )

9 3.3.2 SSIM distortion calculation For SSIM distortion, the distortion is defined in (3.7). The minimum distortion is and the maximum distortion is, D k,;, l i j = SSIM, (3.7) SSIM ( ) N SSIM of a sub-block is calculated based on (2.). Local window size is the same size as sub-block. Simulation is performed on sub-block sizes (6x6), (8x8), and (4x4) individually. A motion vector, mv N (k,l) of a current block is the displacement vector (i 2,j 2 ) with the minimum distortion value D N (k,l), in (3.8) (3.9). ( ) D k, l = min D k, l; i, j, (3.8) N N ( i, j) SSIM ( ) N (, ) (, ) mv k l = i j, (3.9) 2 2

2 Y Cb Cr 2 3 4 5 6 7 8 2 3 4 5 6 7 Figure 3.5 An example of motion estimation for Football CIF 4:2:, QP=3, SSIM-based codebook, SSIM-motion prediction, fixed-(6x6) motion block. From the top row, reconstructed reference frame 5, motion-predicted frame 5, and current frame 5. From left column, Y, Cb, and Cr components

2 3.4 Bit Stream Format Output from the transform encoding of the residual block is mapped into bits by Huffman entropy encoder. The proposed bit stream format for interframe encoding is given in Table 3.. coeffbss (block size selection bits), DC (composite dc coefficients), NZ (non-zero values from run-length encoder), RUN (number of zero values from RLC), and EndOfNZ are the coefficients from the transform encoding block. End of non-zero symbol EndOfNZ (bit ) in the bit stream is to indicate the end of each sub-block. Bit stream format objective is to ensure that the bit stream can be decoded correctly. Table 3. The bit stream format of the codec for a macroblock of P frame Macroblock type Luma P-MB Chroma P-MB Bit stream format mv X mv Xq, mv Y mv Yq, coeffbss, DC DC n, NZ RUN NZ i RUN i EndOfNZ, NZ RUN NZj RUN j EndOfNZ coeffbss, DC DC m, NZ RUN NZ k RUN k EndOfNZ, NZ RUN NZ p RUN p EndOfNZ 3.5 Codebook Training of P-Frame In the simulations, Huffman codebooks are trained with five test sequences (Bus, City (ABC), Crew (NASA), Foreman, and Soccer) (Fig. 3.6). Video test sequences (in YUV raw format) can be obtained from [22]. Motion vectors, dc coefficients and coefficients of different transform sizes from RLC are collected. Histograms for all coefficient types are created. Huffman codebooks are then obtained from Huffman trees of the histograms. Huffman tree method can be found in [23]. List of codebook names of the simulated codec is given in Table 3.2. Two codebook sets are created based on distortion calculation methods, that is, SAD-based codebook (motion prediction based on SAD), and SSIM-based codebook (motion prediction based on SSIM distortion). Figures 3.7 3.22 show Huffman codebooks of different coefficients including the histogram, codebook range, size of the codebook, average code length and its entropy in bits per symbol, for SAD-based and SSIM-based codebooks.

22 True color Luma Cb 2 Cr 4 Cb Cr 6 Cb 8 Cr Cb Cr 2 Cb 4 Cr 2 3 4 5 6 7 8 Figure 3.6 Five CIF sequences for codebook training, from top row, Bus frame 2, City frame 4, Crew frame 5, Foreman frame, and Soccer frame. From left column, show true color, luma, Cb (top), and Cr (bottom) components

23 Table 3.2 Huffman codebook names and their descriptions for P-frame coding Codebook name Component type Description mv Luma Motion vector components both vertical (x or column) and horizontal (y or row) directions DC Luma & chroma DC coefficients for all macroblock types NZ_6x6_P-MB Luma Non-zero values for sub-block size 6x6 NZ_8x8_P-MB Luma & chroma Non-zero values for sub-block size 8x8 NZ_4x4_P-MB Luma & chroma Non-zero values for sub-block size 4x4 NZ_2x2_P-MB Chroma Non-zero values for sub-block size 2x2 RUN_6x6_P-MB Luma Number of zero values for sub-block size 6x6 RUN_8x8_P-MB Luma & chroma Number of zero values for sub-block size 8x8 RUN_4x4_P-MB Luma & chroma Number of zero values for sub-block size 4x4 RUN_2x2_P-MB Chroma Number of zero values for sub-block size 2x2 mv codebook [-288 248] size_36 symbols avg length_5.439 bits, (entropy_5.45 bits) mv count.5-2 - 2 Symbol value -4 [ ] -3 [ ] -2 [ ] - [ ] [ ] [ ] 2 [ ] 3 [ ] 4 [ ] DC codebook [-49 37] size_87 symbols avg length_.694 bits, (entropy_.54 bits) DC (inter) count.5-4 -2 2 Symbol value -4 [ ] -3 [ ] -2 [ ] - [ ] [] [ ] 2 [ ] 3 [ ] 4 [ ] Figure 3.7 Histograms and the corresponding Huffman codebooks (with SAD-based training) of, Motion vector, DC coefficient

24 NZ_(6 6)_P-MB codebk [-43 33] size_74 avg length_.8875 bits, (entropy_.773 bits) NZ_(6 6)_P-MB count.5-4 -2 2 Symbol value -4 [ ] -3 [ ] -2 [ ] - [] [ ] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] NZ_(8 8)_P-MB codebook [-2 24] size_43 avg length_.969 bits, (entropy_.8226 bits) NZ_(8 8)_P-MB count.5-2 - 2 Symbol value -4 [ ] -3 [ ] -2 [ ] - [ ] [] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] NZ_(4 4)_P-MB codebook [-8 8] size_37 avg length_.3638 bits, (entropy_.64 bits) NZ_(4 4)_P-MB count.5 - Symbol value (c) -4 [ ] -3 [ ] -2 [ ] - [ ] [] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] NZ_(2 2)_P-MB codebook [-2 2] size_5 avg length_.4 bits, (entropy_.846 bits) NZ_(2 2)_P-MB count.5-2 [ ] - [ ] [] EndOfNZ [ ] 2 [ ] -5 5 Symbol value (d) Figure 3.8 Histograms and codebooks (SAD-based) of NZ with block sizes, 6x6, 8x8, (c) 4x4, (d) 2x2

25 RUN_(6 6)_P-MB count.5 5 5 2 Symbol value RUN_(6 6)_P-MB codebk [ 232] size_225 avg length_3.9695 bits, (entropy_3.9287 bits) [ ] [ ] 2 [ ] 3 [ ] 4 [ ] 5 [ ] 6 [ ] 7 [ ] 8 [ ] RUN_(8 8)_P-MB codebook [ 62] size_63 avg length_3.522 bits, (entropy_3.465 bits) RUN_(8 8)_P-MB count.5 2 4 6 Symbol value [ ] [ ] 2 [ ] 3 [ ] 4 [ ] 5 [ ] 6 [ ] 7 [ ] 8 [ ] RUN_(4 4)_P-MB codebook [ 4] size_5 avg length_2.65 bits, (entropy_2.554 bits) RUN_(4 4)_P-MB count.5 5 Symbol value (c) [] [ ] 2 [ ] 3 [ ] 4 [ ] 5 [ ] 6 [ ] 7 [ ] 8 [ ] RUN_(2 2)_P-MB count.5-2 3 4 Symbol value (d) RUN_(2 2)_P-MB codebook [ 2] size_3 avg length_.38 bits, (entropy_.97388 bits) [] [ ] 2 [ ] Figure 3.9 Histograms and codebooks (SAD-based) of RUN with block sizes, 6x6, 8x8, (c) 4x4, (d) 2x2

26 mv codebook [-5 36] size_278 symbols avg length_4.883 bits, (entropy_4.7757 bits) mv count.5 - Symbol value -4 [ ] -3 [ ] -2 [ ] - [ ] [ ] [ ] 2 [ ] 3 [ ] 4 [ ] DC (inter) count.5 - -5 5 Symbol value DC codebook [- 76] size_55 symbols avg length_.565 bits, (entropy_.559 bits) -4 [ ] -3 [ ] -2 [ ] - [ ] [] [ ] 2 [ ] 3 [ ] 4 [ ] Figure 3.2 Histograms and the corresponding Huffman codebooks (with SSIM-based training) of, Motion vector, DC coefficient

27 NZ_(6 6)_P-MB codebk [-64 7] size_4 avg length_2.532 bits, (entropy_.9534 bits) NZ_(6 6)_P-MB count.5-5 5 Symbol value -4 [ ] -3 [ ] -2 [ ] - [] [ ] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] NZ_(8 8)_P-MB codebook [-39 5] size_73 avg length_2.534 bits, (entropy_2.2 bits) NZ_(8 8)_P-MB count.5-2 2 4 Symbol value -4 [ ] -3 [ ] -2 [ ] - [ ] [] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] NZ_(4 4)_P-MB codebook [-9 2] size_4 avg length_.425 bits, (entropy_.632 bits) NZ_(4 4)_P-MB count.5-2 Symbol value (c) -4 [ ] -3 [ ] -2 [ ] - [ ] [] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] NZ_(2 2)_P-MB count.5-5 5 Symbol value (d) NZ_(2 2)_P-MB codebook [-2 4] size_7 avg length_.23 bits, (entropy_.8366 bits) -2 [ ] - [ ] [] EndOfNZ [ ] 2 [ ] 3 [ ] 4 [ ] Figure 3.2 Histograms and codebooks (SSIM-based) of NZ with block sizes, 6x6, 8x8, (c) 4x4, (d) 2x2

28 RUN_(6 6)_P-MB count.5 5 5 2 Symbol value RUN_(6 6)_P-MB codebk [ 24] size_227 avg length_3.584 bits, (entropy_3.526 bits) [] [ ] 2 [ ] 3 [ ] 4 [ ] 5 [ ] 6 [ ] 7 [ ] 8 [ ] RUN_(8 8)_P-MB codebook [ 62] size_63 avg length_3.28 bits, (entropy_3.732 bits) RUN_(8 8)_P-MB count.5 2 4 6 Symbol value [] [ ] 2 [ ] 3 [ ] 4 [ ] 5 [ ] 6 [ ] 7 [ ] 8 [ ] RUN_(4 4)_P-MB codebook [ 4] size_5 avg length_2.534 bits, (entropy_2.4595 bits) RUN_(4 4)_P-MB count.5 5 Symbol value (c) [] [ ] 2 [ ] 3 [ ] 4 [ ] 5 [ ] 6 [ ] 7 [ ] 8 [ ] RUN_(2 2)_P-MB count.5-2 3 4 Symbol value (d) RUN_(2 2)_P-MB codebook [ 2] size_3 avg length_.2878 bits, (entropy_.9854 bits) [] [ ] 2 [ ] Figure 3.22 Histograms and codebooks (SSIM-based) of RUN with block sizes, 6x6, 8x8, (c) 4x4, (d) 2x2

4. SIMULATION RESULTS Simulations are performed on four CIF test sequences (Football, Harbour (Demografx), Mobile, and Stefan) [22] (Fig. 4.). The R-D (rate-distortions) in terms of Y-PSNR versus bit rate, RGB-PSNR versus bit rate, Y-SSIM versus bit rate, and RGB-SSIM versus bit rate for four cases are compared. Four combinations of codec with two types of trained codebooks (SAD-based and SSIM-based) and two types of block-matching distortion (SAD-based and SSIM-based) are simulated. Each simulation point of an R-D plot is obtained by changing the QP (quantization parameter). PSNR and SSIM are calculated from the decoded sequence. Bit rate in bits per second is calculated from the encoded bits and the given frame rate (3 frames per second). The results are shown in Figs. 4.2 4.3 for different measurements and for different motion block sizes {(6x6),(8x8),(4x4)}. 5. CONCLUSIONS AND FUTURE WORKS In the simulation, the codec is implemented and trained with SAD-based and SSIMbased motion estimations. Two sets of codebooks for interframe encoding are created. Then, the codec is tested by different sets of test sequences with motion prediction based on SAD and SSIM. The R-D results between the two types of distortions are shown. 5. Future Work on Interframe Coding RDO (rate distortion optimization) is not in interest (with the availability of the supported hardware). The research wants to conduct a clear view of the effect under the low complexity mode Segregate the mv codebook (motion vector) in three, regarding to predicted-block size {6,8,4}. This is for better see the effect. In practice codec, mv codebooks can be combined in as a single codebook with other encoding techniques to reduce the encoded bits. Use local window size same as sub-block size for the calculation of SSIM distortion (in block-matching motion estimation), without moving the local window. Sub-block size (6x6) uses local window size (6x6). Sub-block size (8x8) uses local window size (8x8). Sub-block size (4x4) uses local window size (4x4). Train codebook for P-MB of P-frame with five test sequences. Codebook training is time consuming process. (Similar to finding the best set of motion weighting factors of variable block size motion estimation) Run and compare the performance on another set of four test sequences with SSIMpredicted and SAD-predicted on IPP encoding structure. The evaluation details will be discussed in the following aspects.. Compare two measurements, RGB-PSNR and RGB-SSIM on different test images with the same coding parameters. 2. Compare (RGB-PSNR) two motion predictions, based on SAD and SSIM with the same motion block size and the same codebook training. 3. Compare the performance (RGB-PSNR) between SAD-based codebook and SSIM-based codebook, with the same type of motion prediction (either SAD or SSIM).

3 5.2 Future Work Timing Plan Codebook training: July 3 August 6, 28 Codec run: August 7 August 2, 28 Interpret the results and documentation: August 3 August 6, 28 True color Luma Cb 2 Cr 3 4 Cb 5 Cr 6 7 Cb 8 Cr 9 Cb Cr 2 3 4 5 6 7 8 Figure 4. Four CIF sequences used in the simulation, from top row, Football frame 5, Harbour frame, Mobile frame 8, and Stefan frame 5. From left column, show true color, luminance, Cb (top), and Cr (bottom) chrominance components

3 36 35 34 Y-PSNR in db 34 33 32 3 3 28 SADpred SADcb Football.5 2 2.5 Y-PSNR in db 33 32 3 3 28 27 SADpred SADcb Habour.5 2 2.5 34 35 Y-PSNR in db 32 3 28 26 SADpred SADcb Mobile.5 2 2.5 3 3.5 4 Y-PSNR in db 34 33 32 3 3 28 27 SADpred SADcb Stefan.5 2 2.5 3 (c) (d) Figure 4.2 Y-PSNR and bit rate comparison, motion block size (6x6) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

32.95.98.96 Y-SSIM.9.85 Y-SSIM.94.92.8 SADpred SADcb Football.5 2 2.5.9.88 SADpred SADcb Habour.5 2 2.5.97.96.96 Y-SSIM.94.92 Y-SSIM.95.94.93.9.88 SADpred SADcb Mobile.92.9.9 SADpred SADcb Stefan.5 2 2.5 3 3.5 4.5 2 2.5 3 (c) (d) Figure 4.3 Y-SSIM and bit rate comparison, motion block size (6x6) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

33 32 32 3 3 RGB-PSNR in db 3 28 27 26 25 SADpred SADcb Football.5 2 2.5 RGB-PSNR in db 3 28 27 26 25 SADpred SADcb Habour.5 2 2.5 3 28 3 RGB-PSNR in db 27 26 25 24 23 22 SADpred SADcb Mobile.5 2 2.5 3 3.5 4 RGB-PSNR in db 28 27 26 25 24 SADpred SADcb Stefan.5 2 2.5 3 (c) (d) Figure 4.4 RGB-PSNR and bit rate comparison, motion block size (6x6) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

34.9.96 RGB-SSIM.85.8 RGB-SSIM.94.92.75 SADpred SADcb Football.9.88 SADpred SADcb Habour.5 2 2.5.5 2 2.5.96.94.94 RGB-SSIM.92.9.88 RGB-SSIM.92.9.86.84 SADpred SADcb Mobile.88.86 SADpred SADcb Stefan.5 2 2.5 3 3.5 4.5 2 2.5 3 (c) (d) Figure 4.5 RGB-SSIM and bit rate comparison, motion block size (6x6) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

35 Y-PSNR in db 36 35 34 33 32 3 3 SADpred SADcb Football.5 2 2.5 Y-PSNR in db 34 33 32 3 3 28 27 SADpred SADcb Habour.5 2 2.5 3 Y-PSNR in db 34 33 32 3 3 28 27 26 SADpred SADcb Mobile Y-PSNR in db 35 34 33 32 3 3 28 SADpred SADcb Stefan.5 2 2.5 3 3.5 4.5 2 2.5 3 (c) (d) Figure 4.6 Y-PSNR and bit rate comparison, motion block size (8x8) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

36.98.94.97.92.96.95 Y-SSIM.9.88.86.84 SADpred SADcb Football.5 2 2.5 Y-SSIM.94.93.92.9.9 SADpred SADcb Habour.5 2 2.5 3.98 Y-SSIM.97.96.95.94.93.92.9.9 SADpred SADcb Mobile.5 2 2.5 3 3.5 4 (c) Y-SSIM.97.96.95.94.93.92 SADpred SADcb Stefan.5 2 2.5 3 (d) Figure 4.7 Y-SSIM and bit rate comparison, motion block size (8x8) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

37 32 32 3 3 RGB-PSNR in db 3 28 27 26 SADpred SADcb Football.5 2 2.5 RGB-PSNR in db 3 28 27 26 25 SADpred SADcb Habour.5 2 2.5 3 RGB-PSNR in db 28 27 26 25 24 23 22 SADpred SADcb Mobile.5 2 2.5 3 3.5 4 (c) RGB-PSNR in db 3 3 28 27 26 25 SADpred SADcb Stefan.5 2 2.5 3 (d) Figure 4.8 RGB-PSNR and bit rate comparison, motion block size (8x8) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

38.92.97.9.96.88.95 RGB-SSIM.86.84.82.8.78 SADpred SADcb Football RGB-SSIM.94.93.92.9.9.89 SADpred SADcb Habour.5 2 2.5.5 2 2.5 3.96 RGB-SSIM.94.92.9.88.86 SADpred SADcb Mobile.5 2 2.5 3 3.5 4 (c) RGB-SSIM.95.94.93.92.9.9.89.88 SADpred SADcb Stefan.87.5 2 2.5 3 (d) Figure 4.9 RGB-SSIM and bit rate comparison, motion block size (8x8) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

39 Y-PSNR in db 36 35 34 33 32 3 3 SADpred SADcb Football 2.5 3 3.5 4 4.5 Y-PSNR in db 35 34 33 32 3 3 28 SADpred SADcb Habour 2 2.5 3 3.5 Y-PSNR in db 34 33 32 3 3 28 27 26 SADpred SADcb Mobile 2.5 3 3.5 4 4.5 5 5.5 (c) Y-PSNR in db 35 34 33 32 3 3 28 SADpred SADcb Stefan 2 2.5 3 3.5 4 (d) Figure 4. Y-PSNR and bit rate comparison, motion block size (4x4) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

4 Y-SSIM.94.92.9.88.86 SADpred SADcb Football.84.82 2.5 3 3.5 4 4.5 Y-SSIM.98.97.96.95.94.93.92 SADpred SADcb Habour 2 2.5 3 3.5.98.98.97.97 Y-SSIM.96.95.94 Y-SSIM.96.95.93.92.9 SADpred SADcb Mobile.94.93 SADpred SADcb Stefan 2.5 3 3.5 4 4.5 5 5.5 2 2.5 3 3.5 4 (c) (d) Figure 4. Y-SSIM and bit rate comparison, motion block size (4x4) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

4 3 32 3 3 RGB-PSNR in db 28 27 26 25 SADpred SADcb Football RGB-PSNR in db 3 28 27 26 SADpred SADcb Habour 2.5 3 3.5 4 4.5 2 2.5 3 3.5 RGB-PSNR in db 28 27 26 25 24 23 22 SADpred SADcb Mobile 2.5 3 3.5 4 4.5 5 5.5 (c) RGB-PSNR in db 3 3 28 27 26 25 SADpred SADcb Stefan 2 2.5 3 3.5 4 (d) Figure 4.2 RGB-PSNR and bit rate comparison, motion block size (4x4) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

42 RGB-SSIM.9.88.86.84.82.8.78.76.74 SADpred SADcb Football 2.5 3 3.5 4 4.5 RGB-SSIM.97.96.95.94.93.92.9 SADpred SADcb Habour 2 2.5 3 3.5.96.96.95.94.94 RGB-SSIM.92.9.88.86 SADpred SADcb Mobile RGB-SSIM.93.92.9.9.89.88 SADpred SADcb Stefan 2.5 3 3.5 4 4.5 5 5.5 2 2.5 3 3.5 4 (c) (d) Figure 4.3 RGB-SSIM and bit rate comparison, motion block size (4x4) with four test sequences, Football, Harbour, (c) Mobile, and (d) Stefan

43 REFERENCES [] ITU-T Rec. H.264 and ISO/IEC 4496- (MPEG4-AVC), Advanced Video Coding for Generic Audiovisual Services, March 25. [Online]. Available: http://ecs.itu.ch/cgibin/ebookshop [2] G. Sullivan, T. Wiegand, and H. Yu, Joint draft 6 of New profiles for professional applications amendment to ITUT-Rec. H.264 & ISO/IEC 4496- (Amendment 2 to 25 edition). in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-V24, Jan. 27. [Online]. Available: http://ftp3.itu.ch/av-arch/jvt-site/27 Marrakech/ [3] S. K. Kwon, A. Tamhankar, and K. R. Rao, Overview of H.264/MPEG-4 part, J. Vis. Commun. Image Representation, vol. 7, pp. 86 26, April 26. [4] H. R. Wu and K. R. Rao, Digital Video Image Quality and Perceptual Coding. Boca Raton, FL: CRC Press, 26. [5] Z. Wang et al. Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Processing, vol. 3, pp. 6 62, Apr. 24. [6] Z. Wang, L. Lu, and A.C. Bovik, Video quality assessment based on structural distortion measurement, Signal Processing: Image Communication, vol. 9, pp. 2 32, Feb. 24. [7] Z. Wang and A.C. Bovik. Modern Image Quality Assessment. San Rafael, CA: Morgan & Claypool, 26. [8] A. Kruafak, PhD Comprehensive Exam Proposal on Hybrid video coding design using variable size transform coding and variable size motion estimation, The University of Texas at Arlington, June 27. [9] Z.-Y. Mai et al. A novel motion estimation method based on structural similarity for H.264 inter prediction, IEEE ICASSP 26, pp. 93 96, May 26. [] C.-L. Yang, H.-X. Wang, and L.-M. Po, A novel fast motion estimation algorithm based on SSIM for H.264 video coding, Pacific-Rim Conference on Multimedia, pp. 68 76, Dec. 27. [] Z. Wang. The SSIM Index for Image Quality Assessment, SSIM index Matlab code. [Online]. Available: http://www.ece.uwaterloo.ca/~z7wang/research/ssim/ [2] K. Sühring, Ed., (27) JM 2. Reference Software. [Online]. Available: http://iphome.hhi.de/suehring/tml/ [3] A. M. Tourapis, K. Sühring, and G. Sullivan, Revised H.264/ MPEG-4 AVC reference software manual. in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-Q42, Oct. 25. [Online]. Available: http://ftp3.itu.ch/av-arch/jvt-site/25 Nice/ [4] X. Xu, and Y. He, Comments on motion estimation algorithms in current JM software, in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-Q89, Oct. 25. [Online]. Available: http://ftp3.itu.ch/av-arch/jvt-site/25 Nice/

[5] Z. Chen et al. Fast integer pel and fractional pel motion estimation for JVT. in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-F7, Dec. 22. [Online]. Available: http://ftp3.itu.ch/av-arch/jvt-site/22_2_awaji/ 44 [6] X. Yi et al. Improved and simplified fast motion estimation for JM. in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-P2, July 25. [Online]. Available: http://ftp3.itu.ch/avarch/jvt-site/25_7_poznan/ [7] K.-P. Lim, G. Sullivan, and T. Wiegand, Text description of joint model reference encoding methods and decoding concealment methods. in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-N46, Jan. 25. [Online]. Available: http://ftp3.itu.ch/av-arch/jvtsite/25 HongKong/ [8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression. West Sussex, UK: Wiley, 23. [9] T. Wedi and H. G. Musmann, Motion- and aliasing-compensated prediction for hybrid video coding, IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 577 586, July 23. [2] C. Zhu, X. Lin, and L.-P. Chau, Hexagon-based search pattern for fast block motion estimation, IEEE Trans. Circuits Systs. Video Technol., vol. 2, pp. 349 355, May 22. [2] S. Zhu, and K.-K. Ma, A new diamond search algorithm for fast block-matching motion estimation, IEEE Trans. Image Processing, vol. 9, pp. 287, Feb. 2. [22] Video Library and Tools, Network System Lab, Simon Fraser University. [Online]. Available: http://nsl.cs.sfu.ca/wiki/index.php/video_library_and_tools [23] K. Sayood, Introduction to Data Compression 3rd Edition. San Francisco: Morgan Kaufmann, 26. [24] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB. Upper Saddle River, NJ: Pentice-Hall, 24. [25] W. Burger and M. J. Burge, Digital Image Processing, An Algorithmic Introduction Using Java. New York, NY: Springer, 28. [26] W.-S Kim et al., New video quality metrics in the H.264 reference software. in JVT of ISO/IEC MPEG and ITU-T VCEG, Doc. JVT-AB3, April 28. [Online]. Available: http://ftp3.itu.ch/av-arch/jvt-site/28_7_hannover/