Project Title: Review and Implementation of DWT-based Scalable Video Coding with Scalable Motion Coding
Midterm Report
CS 584 Multimedia Communications
Submitted by: Syed Jawwad Bukhari 2004-03-0028
Contents

About Project
Project Goals
Introduction
    Why Image/Video Compression?
    Error Metrics
    Why Scalable Video Coding?
Background study
    Motion Prediction
    DWT
Video Compression Using DWT
    EZW
    TWAVIX
Project Work
Next Phase
Results
References
About Project

In this project I am reviewing and analyzing scalable video coding schemes with scalable motion coding that use the discrete wavelet transform (DWT) for video compression. To this end I am implementing a scalable video coder based on the DWT, following several well-known approaches described in the literature. In the later part of the project I will turn to scalable motion coding, analyzing in detail the approach described by Boisson et al. [1]. The main purpose of this project is to develop an understanding of wavelet-based video compression and to carry out an analysis of scalable motion coding, so that I can arrive at good estimates of the parameters of a scalable video codec.

This midterm report is organized as follows. First I describe what scalable video coding is and why we need it; then scalable motion coding and its importance within scalable video coding are discussed. Next, background material on the use of wavelets is presented, and finally the work done so far and the proposed implementation scheme are described.

Project Goals

The goal of this project is to explore some well-known DWT-based techniques for scalable video compression. More precisely, the scope of the project is as follows:

- Design and implement a scalable video codec.
- Perform analyses to find a balance between the different parameters involved in a scalable codec.
- Study and implement the scalable motion coding technique described in [1].
Introduction

Why Image/Video Compression?

A video is a sequence of frames played out in such a manner that the viewer gets the illusion of a moving scene. Each frame of a video is an image, and image compression is necessary: each pixel in a color image requires 24 bits (8 bits for each of the three color channels), so with a frame size of 800 by 600 it takes almost 1.4 MB of memory to store one frame, and to display a sequence of frames as video at 30 frames per second we require about 2.5 GB for one minute of video. Image/video compression therefore saves memory as well as bandwidth when the image or video is transferred over a network link.

The Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) are well known to achieve image compression by exploiting the spatial redundancy of the image, and can reach very high compression ratios for natural images. The DWT has been shown to achieve compression ratios well beyond those of the DCT: for high-resolution images it outperforms DCT-based methods while keeping the perceptual quality of the image in an acceptable range. The trade-off in image/video compression is compression ratio versus perceptual quality; at very high compression ratios we usually see degradation in perceptual quality. In recent times, wavelet-based techniques have shown promising results in achieving very high compression within acceptable quality ranges. I will describe this further in the next sections, but first let us look at the error metrics most commonly used to measure image quality.

Error Metrics

Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) are the two error metrics commonly used for measuring image quality. For an M x N image they are defined as:

MSE = (1 / (M*N)) * sum over i = 1..M, j = 1..N of (OrgPixelValue(i, j) - DecodedPixelValue(i, j))^2

PSNR = 20 * log10(255 / sqrt(MSE))
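As a concrete illustration, the two metrics can be computed as follows. This is a minimal sketch in Python/NumPy; the function and variable names are my own and not part of the project code.

```python
import numpy as np

def mse(original, decoded):
    """Mean squared error between two equally sized 8-bit images."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(original, decoded)
    if e == 0:
        return float("inf")
    return 20.0 * np.log10(peak / np.sqrt(e))

# Example: a flat gray 800x600 frame against a version offset by 5 levels.
org = np.full((600, 800), 128, dtype=np.uint8)
dec = np.full((600, 800), 133, dtype=np.uint8)
print(round(psnr(org, dec), 2))  # 20*log10(255/5), i.e. about 34.15 dB
```

Note that PSNR depends only on the MSE and the peak pixel value, which is why it inherits the inverse relation with MSE discussed below.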
Clearly, if the MSE between the original and the decompressed image is high, the quality is of a lower level, while a higher PSNR value means a higher quality of compression/decompression, as the inverse relation between PSNR and MSE shows. A further measure is subjective perceptual quality as judged by human viewers; it is the best of the above for measuring visual quality, but it is costly to evaluate.

Why Scalable Video Coding?

Users on heterogeneous networks request different qualities for the same video. For example, a user with a low-bandwidth connection may require a low-quality version, while a user watching the same video on an HDTV requires much higher quality, and hence much more bandwidth. A mobile user will want the same video at a lower resolution because of the smaller screen size and memory constraints. For a single generic video coder to fulfill the requirements of such a wide range of bit rates and qualities, the coder must be scalable. Moreover, to achieve such a high degree of scalability, it is essential that the motion coding itself offer some scalability. This is discussed in more detail later in the report.
Background study

First we discuss the basics of video compression briefly. To compress a video we usually exploit spatial as well as temporal redundancies. Spatial redundancy can be exploited by conventional image compression methods, while temporal redundancy can be exploited by various approaches; here I describe the use of motion prediction.

Motion Prediction

Motion in a natural scene can be estimated from the previous frame (as well as from future frames): most of the frame content is repeated in consecutive frames to give the effect of smooth motion, so we can estimate the motion occurring in one frame from its neighboring frames. For this we use block-matching algorithms that search for similar small blocks in adjacent frames. Motion vectors represent the displacement that occurred in the current frame with respect to some reference frame. The encoder sends a GOP (group of pictures) in which the first frame is intra-coded (an I-frame), i.e. a frame with no motion vectors, compressed directly by an efficient image compression method; the decoder can then predict the following frames from it with the use of the motion vectors.

There are various approaches to motion estimation and compensation; I will describe the generic motion prediction mechanism. Consider the following GOP:

I1 B2 B3 P4 B5 B6 P7 B8 B9 I10

P-frames are predicted from the previous I- or P-frame, whereas B-frames are predicted from the I- and P-frames on both sides, forward and backward. Block-matching algorithms range from exhaustive search to logarithmic search, and from integer-pixel to sub-pixel search spaces. Experiments have shown that half- and quarter-pixel search techniques achieve significant gains in quality.
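The prediction dependencies of such a GOP can be sketched as follows. This is a hypothetical helper of my own, assuming the GOP string starts and ends with an anchor (I or P) frame; the representation is purely illustrative.

```python
def gop_references(gop):
    """For each frame (display order, 0-based), list the frame indices it is
    predicted from. I-frames are intra-coded (no references); P-frames use
    the previous I/P anchor; B-frames use the anchors on both sides."""
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = []
    for i, t in enumerate(gop):
        if t == "I":
            refs.append([])
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])
        else:  # B-frame: nearest anchor before and after it
            refs.append([max(a for a in anchors if a < i),
                         min(a for a in anchors if a > i)])
    return refs

# GOP from the text, I1 B2 B3 P4 B5 B6 P7 B8 B9 I10 (indices 0..9 here):
print(gop_references("IBBPBBPBBI"))
```

For example, frame P4 (index 3) depends only on I1 (index 0), while B8 and B9 (indices 7 and 8) depend on P7 and I10 (indices 6 and 9).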
Motion vectors are estimated by the motion estimation unit, and the residual between the motion-compensated frame and the original frame is compressed and transmitted.
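An exhaustive-search block matcher along these lines can be sketched as follows (Python/NumPy, illustrative only; the block size, search radius and function names are my own assumptions, not the project's actual parameters):

```python
import numpy as np

def full_search(ref, cur, block=8, radius=4):
    """Exhaustive block matching: for each block x block tile of the current
    frame, find the motion vector (dy, dx) within +/-radius in the reference
    frame that minimises the sum of absolute differences (SAD)."""
    H, W = cur.shape
    vectors = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            tile = cur[by:by + block, bx:bx + block].astype(int)
            best, best_mv = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue  # candidate block falls outside the frame
                    cand = ref[y:y + block, x:x + block].astype(int)
                    sad = np.abs(tile - cand).sum()
                    if best is None or sad < best:
                        best, best_mv = sad, (dy, dx)
            vectors[by // block, bx // block] = best_mv
    return vectors

# Example: if the current frame is the reference shifted down by two rows,
# interior blocks should report the motion vector (-2, 0).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32)).astype(np.uint8)
cur = np.roll(ref, 2, axis=0)
mv = full_search(ref, cur)
print(mv[1, 1])  # [-2  0]
```

Log search and sub-pixel refinement replace the two inner loops with cheaper or finer search patterns, but the SAD criterion stays the same.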
Motion estimation is thus quite useful for exploiting temporal redundancy, at the expense of extra computation.

DWT

The DWT has been found very useful in transform-based image compression. Thanks to its good energy compaction and its correspondence with the human visual system, wavelet-transform methods for image compression are very successful; the Embedded Zerotree Wavelet (EZW) algorithm can provide compression ratios on the order of 100:1 while keeping the perceptual quality of the image in an acceptable range. I will discuss different wavelet-based algorithms in the next section.

Figure 1 shows a simple image compression pipeline using the DWT: the image passes through the wavelet transform, then quantization, then entropy coding, yielding the compressed image. We take the DWT of the whole image; this yields approximation coefficients, on which we can again take the 2-D wavelet transform, repeating for as many levels as required.

Figure 1: Image compression using wavelets (image -> wavelet transform -> quantization -> entropy coding -> compressed image)

Figure 2: One and two levels of the 2-D DWT applied to the cameraman image

In Figure 2, one and two levels of the DWT on the cameraman image are shown. Each application of the DWT divides the image into sub-bands; the top-left corner holds the approximation coefficients, i.e. the LL (low, low) frequency region.
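A one-level 2-D Haar transform, the simplest DWT, can be sketched as below. This is an unnormalized Haar variant for illustration only (orthonormal filters scale by 1/sqrt(2) instead of averaging), and sub-band naming conventions vary between texts.

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar DWT (unnormalized averaging variant).
    Returns four sub-bands (LL, LH, HL, HH), each half the input size
    in each dimension; LL holds the approximation coefficients."""
    x = img.astype(np.float64)
    # 1-D Haar along rows: low-pass = pairwise average, high-pass = half-difference
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # repeat the same split along columns of each result
    LL = (lo[0::2, :] + lo[1::2, :]) / 2.0
    HL = (lo[0::2, :] - lo[1::2, :]) / 2.0
    LH = (hi[0::2, :] + hi[1::2, :]) / 2.0
    HH = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return LL, LH, HL, HH

# A constant image has all of its energy in the LL band.
img = np.full((8, 8), 100.0)
LL, LH, HL, HH = haar2d(img)
print(LL[0, 0], abs(HH).max())  # 100.0 0.0
```

Applying `haar2d` again to the returned LL band gives the two-level decomposition shown in Figure 2.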
Video Compression Using DWT

There are various approaches to video compression. One is to apply the DWT to the intra-coded frame as well as to the residual between the motion-compensated frame and the original frame; in this approach the DWT exploits only spatial redundancy, while block-matching motion prediction handles temporal redundancy. Many variations and techniques have been proposed and are in use within this framework. Another approach is to use 3-D signal transforms to exploit spatial and temporal redundancy jointly, one such method being 3-D DWT-based video compression; I will not review such techniques, as they are outside the scope of the project. A further approach is to perform motion estimation and compensation on the wavelet-filtered image.

Now let us look at how the DWT is applied to an image to achieve compression. Applying the 2-D DWT over multiple levels yields approximation coefficients and detail coefficients. The approximation coefficients are the LL frequencies resulting from low-pass filtering in both the horizontal and vertical directions; most of the image energy is concentrated in them, while the detail coefficients carry the information about sharp edges and the like. Each application of the DWT splits the image into sub-bands of half the size in each dimension, as depicted in Figure 2, where the left image shows the decomposition after one level of DWT and the right one after two levels. Once we have the coefficients, all we need to do is encode them efficiently so that the reconstructed image remains perceptually indistinguishable from the original. The most famous approach that enabled the DWT to achieve very high image compression ratios is Embedded Zerotree Wavelet (EZW) coding. Notably, EZW coding enables progressive image decompression, as we will see while discussing EZW.
There are several widely used improvements on EZW coding, including SPIHT, WDR and ASWDR; we will discuss only EZW.

EZW

Shapiro presented the EZW algorithm in [2]. Embedded coding is an approach to encoding the transformed coefficients that achieves progressive transmission of the compressed
image; hence we can achieve scalability in image compression by sending only those coefficients that are necessary to decode the image at a specific bit rate. Zerotrees allow the coefficients to be coded efficiently in a way that yields such an embedded bit stream. Consider the following matrix, whose entries give the order in which coefficients are scanned and encoded:

 1  2  5  8 17 24 25 32
 3  4  6  7 18 23 26 31
 9 10 13 14 19 22 27 30
12 11 15 16 20 21 28 29
33 34 35 36 49 50 54 55
40 39 38 37 51 53 56 61
41 42 43 44 52 57 60 62
48 47 46 45 58 59 63 64

The embedded coding process used in EZW is also referred to as bit-plane coding. It proceeds in the following five steps [3]:

Step 1: Set an initial threshold such that only the first (largest) coefficient exceeds the threshold and no other does.
Step 2: Halve the threshold.
Step 3: Significance pass. Scan the insignificant values in the scan order shown above. If a value is greater than the threshold, output its sign and set its quantised value to the threshold; otherwise set its quantised value to zero.
Step 4: Refinement pass. Scan the values found significant at higher thresholds. For each, output a zero bit if it lies in the interval from its quantised value to its quantised value plus the threshold; otherwise output a one bit.
Step 5: Repeat steps 2 to 4.

In this way we obtain a bit stream from which the decoder only needs to reproduce the quantised coefficients. Using a quadtree (here, zerotree) structure we gain significant compression from the many zeros.
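The significance/refinement mechanism can be sketched as follows. This is a simplified successive-approximation coder of my own: the zerotree symbols that give EZW its real compression are omitted, the initial threshold is taken as the largest power of two not exceeding the biggest coefficient magnitude, and the bit-stream representation is purely illustrative.

```python
import numpy as np

def bitplane_code(coeffs, num_passes):
    """Simplified bit-plane coding in the spirit of EZW's significance and
    refinement passes. Returns the emitted symbol stream and the quantised
    reconstruction, whose error shrinks by half with every pass."""
    c = np.asarray(coeffs, dtype=float).ravel()
    T = 2.0 ** np.floor(np.log2(np.abs(c).max()))  # initial threshold
    q = np.zeros_like(c)                           # quantised reconstruction
    significant = np.zeros(c.size, dtype=bool)
    bits = []
    for _ in range(num_passes):
        was_significant = significant.copy()
        # significance pass: coefficients crossing the threshold get +/- T
        for i in range(c.size):
            if significant[i]:
                continue
            if abs(c[i]) >= T:
                significant[i] = True
                q[i] = np.sign(c[i]) * T
                bits.append("+" if c[i] >= 0 else "-")
            else:
                bits.append("0")
        # refinement pass: previously significant values gain one more bit
        for i in np.flatnonzero(was_significant):
            if abs(c[i]) - abs(q[i]) >= T:
                q[i] += np.sign(c[i]) * T
                bits.append("1")
            else:
                bits.append("0")
        T /= 2.0
    return "".join(bits), q

stream, q = bitplane_code([57, -29, 14, 6], num_passes=4)
print(stream)  # +000-001+011+011
print(q)       # [ 56. -28.  12.   4.] -- every error below the final threshold
```

Truncating the stream after any pass still decodes to a coarser but valid reconstruction, which is exactly the embedded (progressive) property the text describes; in real EZW the runs of "0" symbols are collapsed by zerotree coding.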
Next I discuss another framework that uses wavelets for video compression.

TWAVIX

TWAVIX [1] stands for "The Wavelet-based Video Coder with Scalability". Figure 3 depicts its architecture: the video first passes through a temporal analysis unit where, on the basis of the GOP format, each frame is either passed to the spatial analysis unit or sent for motion prediction to the motion estimation unit. The rest of the process is essentially JPEG2000 image compression.

Figure 3: TWAVIX architecture

Project Work

The architecture of my video codec is very similar to that of TWAVIX; the slight variations can be seen in the following diagram.

Figure 4: Planned coder
This diagram shows the architecture of a coder that uses EZW coding for image compression. For testing and efficiency comparisons I am implementing both the EZW-based compression and a JPEG2000-like architecture in which the approximation coefficients are quantized and then entropy coded. Its block diagram is the same as the first, except that entropy coding replaces EZW coding.

Figure 5: Planned coder with entropy coding

So far I have completed the implementation of this coder in MATLAB; what remains is to package the code as a separate coder and decoder and to tune some computationally expensive steps.

Next Phase

In the next phase I will set up a server, make the encoder capable of scalable coding of video for different target specifications, and finally perform experiments to fix optimal values of, or find relationships between, the different parameters of scalable video coding.
Results

The following figures show motion compensation results on the test sequence:

- Actual P4 frame; compensated P4 frame; difference between compensated and actual P4 frame
- Actual B2 frame; compensated B2 frame; difference between compensated and actual B2 frame
References

[1] Boisson, Edouard and Guillemot, "Accuracy-scalable motion coding for efficient scalable video compression," 2004.
[2] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Transactions on Signal Processing, 41(12):3445-3462, 1993.
[3] The Transform and Data Compression Handbook.