Whitepaper submitted to Mozilla Research

Pre- and Post-Processing for Video Compression

Aggelos K. Katsaggelos
AT&T Professor
Department of Electrical Engineering and Computer Science
Northwestern University
Tel: +847-491-7164
Email: aggk@eecs.northwestern.edu
IVPL: http://ivpl.ece.northwestern.edu/

December 13, 2013
Overview

Delivery of high-quality video under bit rate constraints is the primary objective of digital video compression technology. Modern video codecs, including Daala, employ a hybrid coding scheme. In addition to prediction and transform, which form the core of such codecs, extensive research has shown that pre- and post-processing of video signals are of great value in improving codec performance [1]. Daala, a codec under development, uses pre- and post-processing as an integral part of the codec. In doing so it differentiates itself from existing video compression standards, thereby supporting and advancing the Open Web and Web technologies and encouraging the creation of open research and software.

At the Image and Video Processing Laboratory (IVPL) we have long and extensive experience in video compression research and development, and we have contributed to video coding standardization efforts, as evidenced by our many publications, talks, and patents. We propose to apply this knowledge and expertise to improving the current capabilities of Daala. More specifically, we propose a framework for improving end-user video quality under bit rate constraints by (i) applying adaptive video processing techniques, (ii) jointly optimizing the pre- and post-processing components, and (iii) filtering the residual and frequency coefficients for efficient rate control and improved rate-distortion performance.

Proposed Technical Details

We have been advocating a video compression paradigm in which the decoder is an estimator whose objective is to obtain the visually best reconstruction from the available bit stream [1].
Such an objective is expressed by incorporating into the estimation problem prior knowledge about the video characteristics (e.g., spatial and temporal smoothness) and the auxiliary information provided by the bit stream (e.g., motion information, allowable frequency coefficients, etc.), while remaining faithful to the decoded frame intensities. Any post-processing task therefore fits within this paradigm. Similarly, the job of the pre-processor is to prepare the data so that, given knowledge of the post-processor, the best visual quality of the reconstructed video is obtained for a given bit budget. In the following we briefly describe the tasks we propose to undertake.

Adaptive Pre- and Post-Processing

We propose to develop an adaptive recovery system that reduces the artifacts introduced during compression while preserving the image characteristics, building on technology we have already developed. As an example, the following formulation for adaptive de-blocking filtering was used in [2]:

$$\hat{f}_k = \arg\min_{f} \Big\{ \|g_k - f\|^2 + \sum_i \lambda_i \|Q_i f\|^2 + \lambda_T \|f - \hat{f}_{k-1}\|^2 + \Phi(f) \Big\},$$

where $\|g_k - f\|^2$ forces the output to remain close to the decoded frame $g_k$, the summation represents a set of smoothness priors imposed on $f$, with $\lambda_i$ and $Q_i$ denoting the weight and high-pass operator, respectively, $\lambda_T \|f - \hat{f}_{k-1}\|^2$ enforces temporal smoothness in the output with respect to the previous reconstructed frame $\hat{f}_{k-1}$, and $\Phi(f)$ represents the frequency constraints (the DCT coefficients of the estimated frame should agree with those contained in the bit stream). The weighted sum introduces spatial adaptivity, for example by expressing the weights as functions of the visibility of the blocking artifacts [3]. In one such scheme, the weight assigned to each pixel is proportional to the local mean and inversely proportional to the local variance. This reflects the fact that blocking artifacts are more visible in high-intensity flat areas; in such areas the weights are larger, resulting in stronger smoothing. These ideas will be adopted in designing both the adaptive pre- and post-processing filters. Solutions to such optimization problems will be sought using several approaches, including constrained least squares (CLS), projection onto convex sets (POCS), and Bayesian techniques [1], [4].

Joint Design of Pre- and Post-Processing

Pre- and post-processing are outside the scope of video compression standards. Since in many cases one entity develops the encoder while another develops the decoder, pre- and post-processing are designed as independent systems; that is, neither has knowledge of the operations of the other. Developing a new codec such as Daala creates the opportunity to incorporate pre- and post-processing into the overall system and design them optimally. We therefore propose to design the pre- and post-processing components jointly, formulating the joint design as an optimization problem in which the end-to-end expected distortion is minimized. An early example of such a joint design is presented in [5]. In addition, since the pre-processor has complete knowledge of the post-processor, it can also provide useful auxiliary information to the decoder, as shown in Figures 1 and 2.
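A minimal sketch of the adaptive de-blocking formulation described under Adaptive Pre- and Post-Processing is given below. It assumes grayscale frames with nonnegative intensities whose sides are multiples of 8, an orthonormal 8x8 block DCT with a uniform quantizer of step `step`, horizontal and vertical first differences (with periodic borders) as the high-pass operators, and per-pixel weights computed from the local mean and local variance as described in the text. The problem is solved by projected gradient descent, a simple CLS/POCS-style solver chosen here for brevity; it is not the algorithm of [2], and all function names (`deblock`, `local_weights`, `objective`, etc.) are hypothetical.

```python
import numpy as np

N = 8  # block size; an 8x8 block DCT is an assumption of this sketch


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C; a block x is transformed as C @ x @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()


def block_dct(img):
    """Forward block DCT; returns coefficients of shape (rows/N, cols/N, N, N)."""
    h, w = img.shape
    blocks = img.reshape(h // N, N, w // N, N).transpose(0, 2, 1, 3)
    return C @ blocks @ C.T


def block_idct(coeffs):
    """Inverse of block_dct()."""
    hb, wb = coeffs.shape[:2]
    blocks = C.T @ coeffs @ C
    return blocks.transpose(0, 2, 1, 3).reshape(hb * N, wb * N)


def diff(f, axis):
    """High-pass operator Q_i: forward difference with periodic borders (assumption)."""
    return np.roll(f, -1, axis=axis) - f


def diff_t(y, axis):
    """Adjoint Q_i^T of diff()."""
    return np.roll(y, 1, axis=axis) - y


def local_weights(g, r=1, eps=1.0):
    """Per-pixel weights proportional to local mean / local variance, scaled to [0, 1].
    They are large in bright, flat areas, where blocking is most visible."""
    h, w = g.shape
    pad = np.pad(g, r, mode="edge")
    win = np.stack([pad[dy:dy + h, dx:dx + w]
                    for dy in range(2 * r + 1) for dx in range(2 * r + 1)])
    mean = win.mean(axis=0)
    var = np.maximum((win ** 2).mean(axis=0) - mean ** 2, 0.0)
    wts = mean / (var + eps)
    return wts / (wts.max() + 1e-12)


def objective(f, g, wts, lams, f_prev=None, lam_t=0.0):
    """||g - f||^2 + sum_i lam_i ||sqrt(wts) Q_i f||^2 + lam_t ||f - f_prev||^2."""
    j = np.sum((g - f) ** 2)
    j += sum(lam * np.sum(wts * diff(f, ax) ** 2) for lam, ax in zip(lams, (0, 1)))
    if f_prev is not None:
        j += lam_t * np.sum((f - f_prev) ** 2)
    return j


def deblock(g, q_idx, step, lams=(0.5, 0.5), f_prev=None, lam_t=0.0, iters=50):
    """Projected gradient descent on objective(), keeping every DCT coefficient
    inside its quantization cell [(k - 0.5) * step, (k + 0.5) * step]."""
    wts = local_weights(g)
    lo, hi = (q_idx - 0.5) * step, (q_idx + 0.5) * step
    # Step size 1/L, where L bounds the gradient's Lipschitz constant (||Q_i||^2 <= 4).
    lip = 2.0 + 2.0 * lam_t + sum(8.0 * lam * wts.max() for lam in lams)
    f = g.astype(float)
    for _ in range(iters):
        grad = 2.0 * (f - g)
        for lam, ax in zip(lams, (0, 1)):
            grad += 2.0 * lam * diff_t(wts * diff(f, ax), ax)
        if f_prev is not None:
            grad += 2.0 * lam_t * (f - f_prev)
        f = f - grad / lip
        # Exact projection onto the frequency constraint set (C is orthonormal).
        f = block_idct(np.clip(block_dct(f), lo, hi))
    return f


if __name__ == "__main__":
    y, x = np.mgrid[0:32, 0:32]
    orig = 100.0 + 2.0 * x + 1.5 * y          # smooth test frame
    step = 40.0
    q_idx = np.round(block_dct(orig) / step)  # stands in for the bit stream
    g = block_idct(q_idx * step)              # blocky decoded frame
    f = deblock(g, q_idx, step)

    def rms(a):
        return np.sqrt(np.mean((a - orig) ** 2))

    print(f"RMS error decoded: {rms(g):.2f}, de-blocked: {rms(f):.2f}")
```

Because the DCT is orthonormal, clipping each coefficient to its quantization cell is the exact Euclidean projection onto the frequency constraint set, so every iterate stays consistent with the bit stream, and with step size 1/L the objective never increases from the decoded frame.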
Note that in these figures the post-processing component (shaded in blue) uses the auxiliary information provided by the pre-processor as well as the information carried by the bit stream.

Residual and Frequency Filtering

Regarding pre-processing, an important question is where exactly in the encoding scheme to apply it. Naturally, pre-processing can be applied to the input video, that is, directly to the spatial-domain intensities. One drawback of this approach is that it complicates rate control, since iterative parameter selection for pre-processing requires multiple rounds of prediction, which is computationally expensive [6]. Alternatively, we propose to apply pre-processing to the residual signal, either to the displaced frame difference (DFD) in inter-prediction mode or to the spatial residual in intra-prediction mode. This design is exemplified in Figure 2, where pre-processing takes place right before the transform of the DFD. Experimental results in [6] demonstrate that this design is effective. In addition, we propose to filter the quantized frequency coefficients directly in order to achieve better rate-distortion (RD) performance [7], [8].

Figure 1: Block diagram of intra-coding. (Blocks: input, pre-processing, transform, quantization, frequency filtering, entropy coding; entropy decoding, de-quantization, inverse transform, post-processing, output. Inputs to post-processing: auxiliary info, prior info, frequency constraints.)
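The residual-domain placement of pre-processing and the frequency filtering step described under Residual and Frequency Filtering can be sketched for a single 8x8 block as follows. The sketch assumes an orthonormal block DCT, a uniform quantizer, and, purely as a rough rate proxy, the length of a signed Exp-Golomb code for each quantized level; the optional `prefilter` argument is a hypothetical hook marking where a residual pre-processing filter would run, right before the transform of the DFD. The thresholding rule zeroes a level whenever doing so lowers the Lagrangian cost D + lam * R. This greedy per-coefficient rule is a simplification chosen for illustration, not the dynamic-programming algorithm of [7], and all names are hypothetical.

```python
import numpy as np

N = 8  # block size (assumption)


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C; a block x is transformed as C @ x @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()


def se_bits(k):
    """Rough rate proxy (assumption): signed Exp-Golomb code length for level k."""
    m = 2 * k - 1 if k > 0 else -2 * k
    return 2 * (m + 1).bit_length() - 1


def rd_threshold(coeffs, step, lam):
    """Quantize, then zero any level whose removal lowers distortion + lam * rate."""
    levels = np.round(coeffs / step).astype(int)
    out = levels.copy()
    for idx in np.ndindex(levels.shape):
        k = int(levels[idx])
        if k == 0:
            continue
        c = coeffs[idx]
        cost_keep = (c - k * step) ** 2 + lam * se_bits(k)
        cost_zero = c ** 2 + lam * se_bits(0)
        if cost_zero < cost_keep:
            out[idx] = 0
    return out


def encode_residual_block(cur, pred, step, lam, prefilter=None):
    """Residual (DFD or intra residual) -> optional pre-filter -> DCT -> RD filtering."""
    resid = cur - pred
    if prefilter is not None:
        resid = prefilter(resid)  # hypothetical residual pre-processing hook
    coeffs = C @ resid @ C.T
    return rd_threshold(coeffs, step, lam)


def decode_residual_block(levels, pred, step):
    """De-quantize, inverse transform, and add back the prediction."""
    return pred + C.T @ (levels * step) @ C
```

With `lam = 0` the result is plain uniform quantization, and increasing `lam` can only zero more coefficients, because under this rate model a nonzero level always costs more bits than a zero level. In a real encoder `lam` would be tied to the target bit rate, which is how frequency filtering connects to rate control.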
Figure 2: Block diagram of inter-coding. (Blocks: input, residual, pre-processing, transform, quantization, frequency filtering; de-quantization, inverse transform, post-processing. Inputs to post-processing: motion constraints, auxiliary info, prior info, frequency constraints.)

Summary of Objectives

The main objective of this proposal is to develop technology that will advance the development of the Daala codec. We propose to undertake the following technical innovations:
(1) Adaptive pre- and post-processing for improved visual quality
(2) Joint pre-/post-processing optimization
(3) Residual and frequency filtering for efficient rate control and improved RD performance

The primary deliverables of this project will be algorithms, software, and documentation.

Budget

We are requesting support for one post-doctoral researcher for one year. This amounts to $60K for stipend and $12K for benefits. In addition, we are requesting $4K for travel. A $10K departmental overhead is added to this subtotal of $76K, bringing the total request to $86K.
References

[1] C. A. Segall and A. K. Katsaggelos, "Pre- and post-processing algorithms for compressed video enhancement," 34th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, October 2000.
[2] C. A. Segall and A. K. Katsaggelos, "Enhancement of compressed video using visual quality measures," IEEE International Conference on Image Processing (ICIP), September 2000.
[3] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images," IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 6, December 1993.
[4] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Projection-based spatially adaptive reconstruction of block-transform compressed images," IEEE Transactions on Image Processing, vol. 4, no. 7, pp. 896-908, July 1995.
[5] C. A. Segall, "Framework for the post-processing, super-resolution and de-blurring of compressed video," Ph.D. thesis, Northwestern University, Evanston, IL, 2001.
[6] C. A. Segall, P. Karunaratne, and A. K. Katsaggelos, "Pre-processing of compressed digital video," SPIE Conference on Visual Communications and Image Processing, San Jose, CA, January 2001.
[7] K. Ramchandran and M. Vetterli, "Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 700-704.
[8] L. P. Kondi and A. K. Katsaggelos, "An operational rate-distortion optimal single-pass SNR scalable video coder," IEEE Transactions on Image Processing, vol. 10, no. 11, pp. 1613-1620, November 2001.