REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING

Robert L. Stevenson
Laboratory for Image and Signal Processing
Department of Electrical Engineering
University of Notre Dame, Notre Dame, IN 46556
E-Mail: Stevenson.1@nd.edu

ABSTRACT

The compression of digital video data has many applications in the transmission and storage of video sequences. For moderate compression ratios there are many techniques which provide satisfactory performance; for high compression ratios, however, typical compression techniques produce noticeable artifacts in the reconstructed video. This paper proposes a technique for the post-processing of motion-compensated compressed video data. The technique utilizes a stochastic regularization approach which can be realized using a simple and fast iterative computational algorithm. The approach has been applied to the post-processing of color video sequences and yields good results.

1. INTRODUCTION

Source coding of video data has been a very active area of research for many years. The goal, of course, is to reduce the number of bits needed to represent a video sequence while making as few perceptible changes to the data as possible. Many algorithms have been developed which can successfully compress a video sequence to a certain rate with almost no perceptible effects. A problem arises, however, as these compression techniques are pushed beyond their target rate. For high compression ratios most algorithms start to generate artifacts which severely degrade the perceived quality of the video sequence. The type of artifacts generated depends on the compression technique. For motion-compensated block-encoded video sequences, the most noticeable artifact is generally the discontinuities present at block boundaries, due both to the transform coding of intra-frame coded images and to the motion compensation of inter-frame coded images.
This paper extends a technique which we originally proposed for post-processing still image data to the post-processing of video data. It is based on a stochastic framework, where probabilistic models are used both for the noise introduced by the coding and for a "good" image. The restored video sequence is the MAP estimate based on these models. Most previous work in this area has addressed only the post-processing of still image data, and extending these previous ideas is not always straightforward. Previous techniques which have tried to address this issue have various problems which limit their ability to produce high-quality image estimates. Some techniques propose changes in the way the image is coded [4, 2]; this, however, reduces the efficiency of the source coder and thus reduces the compression ratio. Linear estimators [10], while removing some artifacts, usually degrade edge information in the original image. Several techniques try to overcome this smoothing of the edges by first estimating the edge information in the compressed image data [3, 5] or by estimating edge information during an iterative smoothing procedure [8, 9]. This, however, is a very difficult task for very high compression ratios, where the actual edge information is somewhat scrambled.

This paper will first describe a generic model for video compression. For the purpose of the reconstruction algorithm, this model is descriptive enough to cover many compression techniques, such as subband coding, vector quantization, DPCM, and various hybrid techniques which combine some of these methods. It also describes the effects of motion compensation, which is used in many video compression techniques. A decompression algorithm is then described based on a previously proposed image model [6, 7]. The computational algorithm is also briefly described. Experimental results are shown for video data compressed using Intel's Indeo compressor.

(This work was supported by the Intel Corporation.)
It can be seen that the reconstructed image sequences of this new method show a reduction in many of the most noticeable artifacts and thus allow higher compression ratios.

2. DECOMPRESSION ALGORITHM

To decompress the compressed video representation, a MAP technique is proposed. Let the compressed video data be represented by y, while the decompressed full-resolution video sequence is represented by z. For MAP estimation, the decompressed video data estimate ẑ is given by

    ẑ = arg max_z L(z|y)    (1)

where L(·) is the log-likelihood function, L(·) = log Pr(·). Using Bayes' rule,

    ẑ = arg max_z { log [ Pr(y|z) Pr(z) / Pr(y) ] }    (2)
      = arg max_z { log Pr(y|z) + log Pr(z) }.    (3)
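The step from (2) to (3) relies only on Pr(y) being independent of z, so the denominator can be dropped from the maximization. A minimal numerical sketch of this equivalence, using a hypothetical discrete candidate set and made-up probabilities (none of these numbers come from the paper):

```python
import numpy as np

# Candidate decompressed values z (scalars here, purely for illustration).
candidates = np.array([0.0, 0.4, 0.8, 1.2])
prior = np.array([0.1, 0.4, 0.4, 0.1])        # Pr(z), assumed values
likelihood = np.array([0.2, 0.5, 0.2, 0.1])   # Pr(y|z) for the observed y, assumed

# Bayes' rule: posterior proportional to likelihood * prior.
posterior = likelihood * prior
posterior = posterior / posterior.sum()

# Maximizing Pr(z|y) and maximizing log Pr(y|z) + log Pr(z) pick the same z.
assert np.argmax(posterior) == np.argmax(np.log(likelihood) + np.log(prior))
```

The same argument holds unchanged when z ranges over whole video sequences rather than scalars.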
The conditional probability Pr(y|z) is based on the video compression method, while the prior probability Pr(z) is based on the stochastic image model [6, 7].

2.1. Video compression model

In a transform coding compression technique, a unitary transformation H is applied to an original video frame x. The compressed representation y is obtained by applying a quantization Q to the transform coefficients,

    y = Q[Hx].    (4)

In video compression, the quantizer Q often includes a linear difference operation with the previous frame. Without loss of generality, this can be included as part of the nonlinear operator Q. Quantization partitions the transform coefficient space and maps all points in a partition cell to a representative reconstruction point, usually taken as the centroid of the cell. The indices of these cells are transmitted in the compressed representation y. In the standard video decompression method, the reconstructed video is given by

    ẑ = H⁻¹ Q⁻¹[y].    (5)

The inverse quantization maps the indices to the reconstruction points. Quantization may be viewed as a many-to-one operation; that is, many video sequences map into the same compressed representation. The operation of the quantizer is assumed to be noise free: a given video sequence will be compressed to the same compressed representation y every time. The conditional probability for the noise-free quantizer can be described by

    Pr(y|z) = 1 if y = Q[Hz];  0 if y ≠ Q[Hz].    (6)

Therefore the MAP estimate of (3) can be written as a minimization constrained to the space Z = {z : y = Q[Hz]},

    ẑ = arg min_{z∈Z} { −log Pr(z) }.    (7)

2.2. Image model

For a model of a "good" image (i.e., Pr(z)), a non-Gaussian Markov random field (MRF) model is used [6, 7]. This model has been shown to successfully model both the smooth regions and the discontinuities present in images. In this research a special form of the MRF is used which has this very desirable property. This model is characterized by a special form of the Gibbs distribution,

    Pr(x) = (1/Z) exp{ −(1/λ) Σ_{c∈C} ρ_T(d_c^t x) }    (8)

where λ is a scalar constant greater than zero, {d_c} is a collection of linear operators, and the function ρ_T(·) is shown in Figure 1 and is given by

    ρ_T(u) = u²,               |u| ≤ T;
             T² + 2T(|u| − T),  |u| > T.    (9)

Figure 1: ρ_T(x)

Since ρ_T(·) is convex, this particular form of the MRF results in a convex optimization problem when used in the MAP estimation formulation (3). Therefore, such MAP estimates will be unique, stable, and can be computed efficiently. The function ρ_T(·) is known as the Huber minimax function [1], and for that reason this statistical model is called the Huber-Markov random field (HMRF) model. For this distribution, the linear operators d_c provide the mechanism for incorporating what is considered consistent most of the time, while the function ρ_T(·) is the mechanism for allowing some inconsistency. The parameter T controls the amount of inconsistency allowed: ρ_T(·) reduces the importance of the consistency measure when the value of that measure exceeds the threshold T.

For the measure of consistency, the fact that the difference between a pixel and its local neighbors should be small is used; that is, there should be little local variation in the image. For this assumption, an appropriate set of consistency measures is

    {d_c^t z}_{c∈C} = { z_{m,n} − z_{k,l} }_{(k,l)∈N_{m,n}, 1≤m,n≤N},    (10)

where N_{m,n} consists of the eight nearest neighbors of the pixel located at (m, n) and N is the dimension of the image. Across discontinuities this measure is large, but the relative importance of the measure at such a point is reduced because of the use of the Huber function. The MAP estimate can now be written as

    ẑ = arg min_{z∈Z} Σ_{c∈C} V_c(z)    (11)
      = arg min_{z∈Z} Σ_{1≤m,n≤N} Σ_{(k,l)∈N_{m,n}} ρ_T(z_{m,n} − z_{k,l}).    (12)

As a result of the choice of image model [6, 7], this is a convex (but not quadratic) constrained optimization which can be solved using iterative techniques.

2.3.
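The Huber function (9) and the objective (12) are straightforward to compute. The sketch below is a minimal NumPy illustration, assuming a grayscale image array; it counts each neighbor pair once, which differs from the double-counted sum in (12) only by a factor of two:

```python
import numpy as np

def rho_T(u, T):
    """Huber minimax function of (9): quadratic for |u| <= T, linear beyond."""
    a = np.abs(u)
    return np.where(a <= T, a ** 2, T ** 2 + 2 * T * (a - T))

def hmrf_energy(z, T):
    """Sum of rho_T over differences between each pixel and its eight
    nearest neighbors, as in (12); each unordered pair is counted once."""
    e = 0.0
    # 4 of the 8 neighbor directions; the other 4 are the same pairs reversed.
    for dm, dn in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        a = z[max(dm, 0):z.shape[0] + min(dm, 0),
              max(dn, 0):z.shape[1] + min(dn, 0)]
        b = z[max(-dm, 0):z.shape[0] + min(-dm, 0),
              max(-dn, 0):z.shape[1] + min(-dn, 0)]
        e += rho_T(a - b, T).sum()
    return e
```

A smooth ramp has far lower energy than an image with a blocky discontinuity, while the linear tail of ρ_T keeps a genuine edge from being penalized quadratically, which is exactly the edge-preserving behavior described above.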
Reconstruction algorithm

An iterative approach is used to find ẑ in the constrained minimization of (12). An initial estimate z^0 is improved by successive iterations until the difference between z^k and z^{k+1} falls below a given threshold. The rate of convergence of the iteration is affected by the choice of the initial estimate: a better initial estimate results in faster convergence.
The initial estimate used here is formed by the standard decompression,

    z^0 = H⁻¹ Q⁻¹[y].    (13)

Given the estimate at the k-th iteration, z^k, the gradient descent method is used to find the estimate at the next iteration, z^{k+1}. The gradient of Σ_{c∈C} ρ_T(d_c^t z) is used to find the steepest direction b(z^k) towards the minimum,

    b(z^k) = −Σ_{c∈C} ρ'_T(d_c^t z^k) d_c.    (14)

The size of the step α^k is chosen as

    α^k = b(z^k)^t b(z^k) / ( b(z^k)^t [ Σ_{c∈C} ρ''_T(d_c^t z^k) d_c d_c^t ] b(z^k) ).    (15)

Since the updated estimate w^{k+1},

    w^{k+1} = z^k + α^k b(z^k),    (16)

may fall outside the constraint space Z, w^{k+1} is projected onto Z to give the image estimate at the (k+1)-th iteration,

    z^{k+1} = P_Z(w^{k+1}).    (17)

P_Z depends both on the original compressed image y and on the quantization Q which was used to produce it. In projecting the image w^{k+1} onto the constraint space Z, we find the point z^{k+1} ∈ Z for which ||z^{k+1} − w^{k+1}|| is a minimum. If w^{k+1} ∈ Z, then z^{k+1} = w^{k+1} and ||z^{k+1} − w^{k+1}|| = 0. Since H is unitary,

    ||H z^{k+1} − H w^{k+1}|| = ||z^{k+1} − w^{k+1}||,    (18)

and the projection can be carried out in the transform domain. For inter-frame encoded images the motion-compensated block is subtracted off before the projection operator is applied; it is then added back to the projected image point to obtain z^{k+1}. Devising projection operators in the transform domain can be done for both scalar and vector quantizers.

3. EXAMPLES

Figure 2a shows a single intra-coded frame from a teleconferencing test sequence after it has been compressed by Intel's MRV compression standard. The coding parameters were set so that the original 320×240 sequence at 10 frames per second is compressed to 90 kb/s (a compression ratio of 200 to 1). Coding artifacts are very noticeable at this rate; most noticeable are the blocking effects of the coding algorithm. Figure 2b shows the result of the post-processing described in this paper. Only seven iterations of the iterative procedure were executed in order to reduce the coding artifacts.
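The full iteration (13)-(17) can be sketched in a few dozen lines of NumPy. This is a simplified illustration under stated assumptions, not the paper's implementation: H is an arbitrary unitary transform applied separably to a whole frame (a random orthonormal matrix can stand in for the block DCT), Q is a uniform scalar quantizer with step size `step`, and a small fixed step size replaces the optimal step of (15):

```python
import numpy as np

def rho_prime(u, T):
    """Derivative of the Huber function rho_T of (9)."""
    return np.where(np.abs(u) <= T, 2 * u, 2 * T * np.sign(u))

def grad_energy(z, T):
    """Gradient of sum_c rho_T(d_c^t z) for the 8-neighbor differences (10)."""
    g = np.zeros_like(z)
    for dm, dn in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        a = (slice(max(dm, 0), z.shape[0] + min(dm, 0)),
             slice(max(dn, 0), z.shape[1] + min(dn, 0)))
        b = (slice(max(-dm, 0), z.shape[0] + min(-dm, 0)),
             slice(max(-dn, 0), z.shape[1] + min(-dn, 0)))
        r = rho_prime(z[a] - z[b], T)
        g[a] += r       # each pair contributes +/- rho' at its two pixels
        g[b] -= r
    return g

def quantize(x, H, step):
    """y = Q[Hx]: uniform scalar quantization of 2-D transform coefficients."""
    return np.round(H @ x @ H.T / step).astype(int)

def project_Z(w, y, H, step):
    """P_Z of (17): clip each coefficient of Hw into its quantization cell,
    then invert the (unitary) transform -- valid because of (18)."""
    c = H @ w @ H.T
    eps = 1e-6 * step   # stay strictly inside the cell boundaries
    c = np.clip(c, (y - 0.5) * step + eps, (y + 0.5) * step - eps)
    return H.T @ c @ H

def restore(y, H, step, T=1.0, iters=7, alpha=0.05):
    """Projected gradient descent: (13), then repeat (14)-(17).
    A fixed step alpha is used in place of the optimal step of (15)."""
    z = H.T @ (y * step) @ H                 # z^0 = H^-1 Q^-1[y]
    for _ in range(iters):
        w = z - alpha * grad_energy(z, T)    # w^{k+1} = z^k + alpha b(z^k)
        z = project_Z(w, y, H, step)         # z^{k+1} = P_Z(w^{k+1})
    return z
```

After every iteration the estimate still quantizes to the same code y, i.e., it remains in the constraint space Z, while its local variation is smoothed; this is the mechanism by which the post-processed frame stays consistent with the transmitted bitstream.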
Notice that the blocking effects have been completely removed in the post-processed image. This can be seen most easily in the face and background regions.

4. CONCLUSION

The problem of video decompression has been cast as an ill-posed inverse problem, and a stochastic regularization technique has been used to form a well-posed reconstruction algorithm. A statistical model for the image was formulated which incorporates the convex Huber minimax function. The use of the Huber minimax function ρ_T(·) helps to maintain the discontinuities from the original image, which produces high-resolution edge boundaries. Since ρ_T(·) is convex, the resulting multidimensional minimization problem is a constrained convex optimization problem, and efficient computational algorithms can be used in the minimization. The proposed video decompression algorithm produces reconstructed video sequences which greatly reduce the noticeable artifacts that exist when standard techniques are used.

5. REFERENCES

[1] P. J. Huber, Robust Statistics, New York: John Wiley & Sons, 1981.
[2] K. N. Ngan, D. W. Lin, and M. L. Liou, "Enhancement of Image Quality for Low Bit Rate Video Coding," IEEE Transactions on Circuits and Systems, Vol. 38, No. 10, October 1991, pp. 1221-1225.
[3] B. Ramamurthi and A. Gersho, "Nonlinear Space-Variant Post-processing of Block Coded Images," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1258-1268.
[4] H. C. Reeve and J. S. Lim, "Reduction of Blocking Effects in Image Coding," Optical Engineering, Vol. 23, No. 1, January/February 1984, pp. 34-37.
[5] K. Sauer, "Enhancement of Low Bit-Rate Coded Images Using Edge Detection and Estimation," Computer Vision, Graphics, and Image Processing: Graphical Models and Image Processing, Vol. 53, No. 1, January 1991, pp. 52-62.
[6] R. R. Schultz and R. L. Stevenson, "Improved Definition Image Expansion," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, San Francisco, CA, March 23-26, 1992.
[7] R. L. Stevenson and S. M. Schweizer, "Nonlinear Filtering Structure for Image Smoothing in Mixed-Noise Environments," Journal of Mathematical Imaging and Vision, Vol. 2, 1992, pp. 137-154.
[8] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Regularized Reconstruction to Reduce Blocking Artifacts of Block Discrete Cosine Transform Compressed Images," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 6, December 1993, pp. 421-432.
[9] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Projection-Based Spatially Adaptive Reconstruction of Block-Transform Compressed Images," IEEE Transactions on Image Processing, Vol. 4, No. 7, July 1995, pp. 896-908.
[10] A. Zakhor, "Iterative Procedures for Reduction of Blocking Effects in Transform Image Coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 2, No. 1, March 1992, pp. 91-95.
Figure 2: (a) MRV compressed intra-frame coded image (200:1); (b) post-processed image.