Video Denoising Algorithm in Sliding 3D DCT domain

Dmytro Rusanovskyy and Karen Egiazarian, ACIVS 2005, Antwerp, Belgium

Dmytro Rusanovskyy and Karen Egiazarian
Institute of Signal Processing, Tampere University of Technology, Finland
E-mails: {FirstName.LastName}@tut.fi

Abstract. This paper considers the problem of denoising video signals corrupted by additive Gaussian noise and proposes a novel 3D DCT-based video denoising algorithm. Video data are filtered locally in sliding (running) 3D windows, or arrays, made of highly correlated spatial layers taken from consecutive frames of the video. These layers are selected by block matching or a similar technique. Within each local window, denoising is performed by hard thresholding the 3D DCT coefficients of the array. The final estimate of each reconstructed pixel is a weighted average of the local estimates from all overlapping windows. Experimental results show that the proposed algorithm is competitive with state-of-the-art video denoising methods in terms of both PSNR and visual quality.

1. Introduction

Digital images and video are nowadays an essential part of everyday life. Imperfect data acquisition instruments, natural phenomena, transmission errors and compression can all degrade the quality of collected data. Noise can significantly affect further processing such as analysis, segmentation, classification and indexing, so denoising is typically applied before any of these image or video processing steps. Here we consider the problem of denoising video corrupted by additive independent white Gaussian noise.

Historically, the first video denoising algorithms operated in the spatial or spatio-temporal domains [1]. Recent research on denoising shows a trend towards transform-based processing techniques. Processing in a transform domain (e.g., the DCT, DFT or wavelet domain) outperforms spatio-temporal methods because these transforms decorrelate the data well and compact its energy.
Wavelet-based video denoising grew out of intensive work on wavelet-based image denoising [3-5], initiated by Donoho's wavelet shrinkage approach [2]. Several multiresolution (wavelet-based) approaches to video denoising have recently been proposed; see, e.g., [6] and [7]. The locally adaptive sliding window DCT (SWDCT) image denoising method [8], [9] is a strong alternative to wavelet-based methods. This paper extends it to SWDCT denoising of video.

The extension is not a straightforward one. Because video contains motion, its data are not stationary in the temporal direction, so two pixels at the same spatial location in consecutive frames may be uncorrelated. At the same time, the DCT approximates the statistically optimal Karhunen-Loeve transform well only for highly correlated data [10]. Local 3D data must therefore be selected carefully across frames before a 3D DCT is applied. One approach is block matching: 2D image blocks in successive frames are put in correspondence by minimizing a cost function (MSE or MAE). Either a full search or any fast block matching scheme can be used.

This paper is organized as follows. Section 2 briefly describes the SWDCT image denoising approach. Section 3 proposes a 3D-SWDCT video denoising algorithm. Section 4 analyzes the denoising performance of the proposed algorithm and compares it with recent wavelet-based video denoising algorithms [6], [7], [13]. Section 5 gives conclusions.

2. Sliding Window DCT Denoising of Images

Sliding window DCT denoising is a well-developed image denoising tool (see, e.g., [8] and [9]). This section briefly describes its basic principles, and Fig. 1 depicts the scheme.

Suppose we wish to recover an unknown image $x(t)$ from noisy observations $y(t) = x(t) + n(t)$, where $t = (t_1, t_2)$ are coordinates in 2D space and $n(t)$ is additive Gaussian noise $N(0, \sigma^2)$ with variance $\sigma^2$.

Fig. 1. A general SW-DCT image denoising scheme [9]. (Diagram: noisy image $y$; local DCT denoising of each block $A_j$ into $\hat{A}_j$; accumulation buffer $x_{buf}$; weighting mask $W$ built from the weights $W_j$; denoised image $\hat{x}$.)

The noisy image $y(t)$ is processed locally in overlapping blocks (windows) $\{A_j\}$. As the window runs over the image, each $A_j$ is filtered separately in the DCT domain to obtain a local estimate $\hat{A}_j$. This involves computing the 2D DCT of $A_j$, thresholding the resulting coefficients and applying an inverse 2D DCT to the result.
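The per-window step just described (forward 2D DCT, hard thresholding, inverse 2D DCT) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes square blocks and an orthonormal DCT-II, and the helper names (`dct_matrix`, `dct2`, `idct2`, `denoise_block`) are hypothetical. It also returns the number of coefficients that survive thresholding, because the SWDCT weight is derived from that count.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C; row u holds the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * u / (2 * n))
                     for t in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Separable forward 2D DCT: Y = C A C^T (block assumed square)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT: A = C^T Y C, valid because C is orthonormal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def denoise_block(block, thr):
    """Hard-threshold the 2D DCT of one window.

    Returns the local estimate and the number of retained coefficients.
    """
    coeffs = dct2(block)
    kept = 0
    for row in coeffs:
        for j, v in enumerate(row):
            if abs(v) > thr:
                kept += 1
            else:
                row[j] = 0.0
    return idct2(coeffs), kept
```

A typical call would be `estimate, kept = denoise_block(window, thr=2 * sigma)`. The threshold can be applied directly in units of $\sigma$ because an orthonormal DCT leaves the variance of white noise unchanged.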
The relevance of each $\hat{A}_j$ is expressed by a weight $W_j$, computed from the properties of the local DCT spectrum: it is the reciprocal of the number of DCT coefficients in the block that remain nonzero after thresholding. The weighted estimates $W_j \hat{A}_j$ are accumulated in the buffer $x_{buf}$, and the weights $W_j$ in the weighting mask $W$. Finally, each denoised pixel $\hat{x}(t)$ is a weighted average of the local estimates of that pixel from all overlapping windows $\hat{A}_j$.

The SWDCT denoising algorithm can be expressed by Eqs. (1)-(4):

$$Y_j(w) = F\{A_j\}, \qquad (1)$$
$$X_j(w) = T\{Y_j(w)\}, \qquad (2)$$
$$\hat{A}_j = F^{-1}\{X_j(w)\}, \qquad (3)$$
$$\hat{x}(t) = x_{buf}(t) / W(t), \qquad (4)$$

where $F\{\cdot\}$ is a separable 2D forward DCT, $F^{-1}\{\cdot\}$ is its inverse, and $w = (w_1, w_2)$ are the coordinates of the 2D DCT coefficients. $T\{\cdot\}$ is the hard thresholding function

$$X_j(w) = \begin{cases} Y_j(w), & |Y_j(w)| > Thr, \\ 0, & \text{otherwise}. \end{cases} \qquad (5)$$

SW-DCT denoising has several tunable parameters, such as the local window size and the sliding steps along the image directions. They can be either user-specified [9] or adapted to local signal statistics [12] to achieve a better performance/complexity trade-off.

3. Video Denoising Based on a 3D DCT

The SW-DCT denoising method is well developed for images. For video, SW-DCT should be performed in 3D space, and exploiting the temporal redundancy of video can improve filtering performance. Suppose SW-DCT operates in the spatial domain of each frame as described above. A 1D sliding DCT can then be applied in the same way along the temporal axis. However, SW-DCT performs significantly better when the transform operates on a highly correlated signal, and pixels along the temporal axis may be uncorrelated because video is dynamic. We therefore propose to perform local 3D DCT denoising on an array $B$ of size $L_h \times L_w \times L_t$. The array is built from correlated 2D blocks $A_{j,k}$ taken from $L_t$ consecutive frames, where $k$ is the index of the current frame. The blocks $A_{j,k}$ are selected by block matching or a similar technique [11]. Either a full search or a fast block matching scheme that minimizes some cost function (MSE or MAD) can be employed.
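The block selector can be illustrated with an exhaustive (full) search that uses MAD as its cost function, one of the options mentioned above. This is a sketch, not the authors' code. Frames are plain nested lists, `radius` is a hypothetical search-range parameter, and the reference position is assumed to lie inside the frame. The experiments in Section 4 use a logarithmic search instead of the full search shown here. Ties are resolved in favor of zero motion, which is a design choice of this sketch only.

```python
def mad(ref, frame, top, left):
    """Mean absolute difference between ref and the same-size block of frame at (top, left)."""
    h, w = len(ref), len(ref[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += abs(ref[i][j] - frame[top + i][left + j])
    return total / (h * w)


def full_search(ref, frame, top, left, radius):
    """Exhaustive block matching in a (2*radius+1)^2 window around (top, left).

    Returns ((dy, dx), cost) for the candidate with the smallest MAD.
    Candidates falling outside the frame are skipped; the zero-motion
    candidate is evaluated first, so it wins ties (strict '<' below).
    """
    h, w = len(ref), len(ref[0])
    rows, cols = len(frame), len(frame[0])
    best = ((0, 0), mad(ref, frame, top, left))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= rows - h and 0 <= x <= cols - w:
                cost = mad(ref, frame, y, x)
                if cost < best[1]:
                    best = ((dy, dx), cost)
    return best
```

To build the array $B$ for a reference block $A_{j,0}$, one would call `full_search` on each following frame in turn and stack the matched blocks.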

Fig. 2. A general block diagram of the proposed video denoising algorithm. (Diagram: $L_t$ frames of the noisy sequence $y$ and of the buffer $x_{buf}$; reference block $A_{j,0}$ and matched blocks $A_{j,k}$ stacked into the $L_h \times L_w \times L_t$ array $B$ and its estimate $\hat{B}$.)

Fig. 2 depicts the general scheme of the proposed video processing algorithm. The noisy sequence $y$ is processed locally in 3D windows of size $L_h \times L_w \times L_t$, and an accumulation buffer $x_{buf}$ holds $L_t$ consecutive frames. The block selector takes a sliding window $A_{j,0}$ in the 0th frame of the buffer as its reference. In each subsequent frame it searches for the block that correlates best with $A_{j,0}$, yielding the matches $\{A_{j,1}, \ldots, A_{j,k}\}$, $k = L_t - 1$. The 2D blocks $A_{j,0}$ and $\{A_{j,1}, \ldots, A_{j,k}\}$ are placed in the buffer $B$.

The block selector may fail to find, in some frame, a subblock $A_{j,k}$ that correlates with $A_{j,0}$. This can happen because of the dynamic nature of video or because of a global scene change. In that case, block selection for the current $A_{j,0}$ is terminated to prevent error propagation. Local 3D DCT denoising is then performed on an array $B$ that is shorter in the temporal direction. In other words, we implement adaptive window size selection in the temporal domain.

The result is a 3D array $B$ filled with $L_t$ highly correlated 2D blocks $\{A_{j,0}, \ldots, A_{j,k}\}$, $k = L_t - 1$, which is then denoised. A locally denoised estimate $\hat{B}$ is obtained by hard thresholding the 3D DCT coefficients of $B$, and this estimate is accumulated in the buffer $x_{buf}$. The number of estimates obtained for a particular $x_{buf}(t)$ and the statistical properties of the local DCT spectra define $W(t)$, as specified in Section 2. Once the current $L_t$ frames have been processed, the sliding window shifts in the temporal direction and frame $L_t + 1$ enters the denoising procedure. These operations (block matching, local denoising of $B$, and accumulation of $\hat{B}$ in $x_{buf}$) are repeated recursively on groups of frames until the last frame of the sequence has been processed. Finally, each pixel $\hat{x}(t)$ is reconstructed by coordinate-wise weighting of $x_{buf}(t)$ with the mask $W(t)$.

The computational complexity of the proposed algorithm depends mostly on the computation of the 3D DCTs and on the block matching performed for every spatial block. Assume that the local 3D DCT has size $L$ in all directions and that the sliding step along both spatial coordinates is $P$. The number of arithmetic operations per output sample for the transform part of the algorithm is then $2\mu\left(1 + \frac{2L}{P^2}\right)$, where $\mu$ is the complexity of the 1D DCT. For $L = 8$ and $P = 4$, this number is $4\mu$. A fast block matching procedure adds a few more operations per output sample.

4. Experimental Results

To evaluate the proposed method, we used several standard CIF and QCIF test video sequences (see Tables 1 and 2). The original sequences were corrupted by additive Gaussian noise with standard deviations of 10, 15 and 20 and then processed with the denoising algorithm proposed in Section 3. In our simulations, the processing buffer $B$ had size 8x8x8, because well-developed software and hardware solutions exist for the 8-point DCT [10]. The buffer $B$ slid over the video data with a step of 2 in both spatial directions and 1 in the temporal direction. Hard thresholding was applied to all 3D DCT coefficients of $B$ to obtain the locally denoised estimate $\hat{B}$, with the threshold $Thr$ in Eq. (5) set to $2\sigma$ [8]. To select highly correlated blocks $A_{j,k}$ from 7 consecutive frames, we used a fast block matching algorithm in the pixel domain, the so-called logarithmic search [11], with the mean absolute difference (MAD) as the cost function. To prevent error propagation, we also applied adaptive window size selection in the temporal domain.
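The local 3D step can be sketched as follows, assuming the matched blocks have already been collected. There may be fewer than $L_t$ of them if block selection stopped early. The sketch applies a forward 3D DCT, hard thresholding with $Thr = 2\sigma$, and an inverse 3D DCT. It then computes a weight equal to the reciprocal of the number of retained coefficients and accumulates the result into $x_{buf}$ and $W$. The function names and the `positions` layout `(frame, top, left)` are hypothetical. The sketch uses an orthonormal DCT, under which white Gaussian noise keeps the variance $\sigma^2$ in every coefficient, so the $2\sigma$ threshold applies directly. The paper does not state its DCT normalization. The guard against a zero coefficient count is also an assumption of this sketch.

```python
import math


def dct1(v, inverse=False):
    """Orthonormal 1D DCT-II of a list, or its inverse (DCT-III); O(n^2)."""
    n = len(v)
    s = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)

    def basis(t, u):
        return math.cos(math.pi * (2 * t + 1) * u / (2 * n))

    if inverse:
        return [sum(s[u] * v[u] * basis(t, u) for u in range(n)) for t in range(n)]
    return [s[u] * sum(v[t] * basis(t, u) for t in range(n)) for u in range(n)]


def dct3(b, inverse=False):
    """Separable 3D DCT of b[k][i][j] (k: frame, i: row, j: column)."""
    lt, lh, lw = len(b), len(b[0]), len(b[0][0])
    out = [[list(row) for row in plane] for plane in b]
    for k in range(lt):                      # horizontal axis (j)
        for i in range(lh):
            out[k][i] = dct1(out[k][i], inverse)
    for k in range(lt):                      # vertical axis (i)
        for j in range(lw):
            col = dct1([out[k][i][j] for i in range(lh)], inverse)
            for i in range(lh):
                out[k][i][j] = col[i]
    for i in range(lh):                      # temporal axis (k)
        for j in range(lw):
            tube = dct1([out[k][i][j] for k in range(lt)], inverse)
            for k in range(lt):
                out[k][i][j] = tube[k]
    return out


def denoise_stack(blocks, thr):
    """Hard-threshold the 3D DCT of a stack of matched blocks.

    Returns (estimate, weight) with weight = 1 / number of retained coefficients.
    """
    coeffs = dct3(blocks)
    kept = 0
    for plane in coeffs:
        for row in plane:
            for j, v in enumerate(row):
                if abs(v) > thr:
                    kept += 1
                else:
                    row[j] = 0.0
    weight = 1.0 / max(kept, 1)  # assumption: guard against an all-zero spectrum
    return dct3(coeffs, inverse=True), weight


def accumulate(x_buf, w_buf, estimate, weight, positions):
    """Add weight * estimate to x_buf and weight to w_buf at each block position."""
    for (f, top, left), plane in zip(positions, estimate):
        for i, row in enumerate(plane):
            for j, v in enumerate(row):
                x_buf[f][top + i][left + j] += weight * v
                w_buf[f][top + i][left + j] += weight


def finalize(x_buf, w_buf):
    """Coordinate-wise x_buf / W; pixels that received no estimate are left at 0."""
    return [[[x / w if w > 0 else 0.0 for x, w in zip(xr, wr)]
             for xr, wr in zip(xp, wp)] for xp, wp in zip(x_buf, w_buf)]
```

This pure-Python version is written for readability. For 8x8x8 arrays, a vectorized or fast DCT would be the practical choice.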
Block selection for a particular $A_{j,0}$ was terminated if no correlated $A_{j,k}$ could be found in the current frame. This improves filtering performance, especially under high motion or a scene change. In addition, a selection algorithm that operates in the pixel domain and relies on MAD or MSE criteria may return an $A_{j,k}$ that correlates with the noise pattern of the reference $A_{j,0}$ rather than with the original video signal. This can become a problem in video fragments with very low signal-to-noise ratios, for example dark flat regions corrupted by heavy noise. To suppress such false motion prediction, we rejected motion vectors (set them to zero) when the MAD of a prediction was lower than a predefined threshold within a distance of $3\sigma$.

Table 1 compares the performance of our algorithm with the results of the wavelet-based denoising schemes of [6] and [7]. The average PSNR values in Table 1 are computed over 40 frames (Salesman) and 52 frames (Tennis and Flower Garden). The recursive WRSTF algorithm [7] needs several frames before it produces representative output. To keep the comparison fair, the first four frames of each processed sequence are therefore excluded from the PSNR calculation.

Table 1. Video denoising, comparative results (average PSNR, dB)

Video      σ    Noisy   Soft3D [6]   WRSTF [7]   3D SWDCT
Tennis     10   28.16   31.86        32.41       33.34
Tennis     15   24.63   29.86        30.12       30.80
Tennis     20   22.15   28.58        28.68       29.52
Salesman   10   28.15   34.85        35.82       37.01
Salesman   15   24.72   33.29        33.91       34.83
Salesman   20   22.35   32.00        32.40       33.29
Flower     10   28.34   30.23        30.80       31.25
Flower     15   24.88   27.71        28.19       28.62
Flower     20   22.44   26.01        26.39       26.80

To compare the proposed algorithm with the results reported in [13], we applied it to the Miss America and Hall video sequences corrupted with additive Gaussian noise (average PSNR of the noisy video: 20 dB). Table 2 shows the results. Tables 1 and 2 show that our algorithm outperforms those of [6], [7] and [13]. Figs. 3 and 4 show examples of denoised frames so that the visual quality of the denoised sequences can be judged subjectively.

Fig. 3. A fragment of the 30th frame of the Salesman video sequence. (a) Noisy (PSNR of fragment 22.29 dB). (b) Denoised with the proposed algorithm (PSNR of fragment 33.06 dB).
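The PSNR figures in this section can be computed with a sketch like the following. It assumes 8-bit video (peak value 255) and takes "average PSNR" to mean the mean of per-frame PSNR values. The paper does not say which averaging it uses, and averaging the MSE first would give a slightly different number. The first four frames are skipped, as described for Table 1, and the function names are hypothetical. As a sanity check, white Gaussian noise with standard deviation $\sigma$ gives an expected PSNR of about $20\log_{10}(255/\sigma)$. That is 28.13, 24.61 and 22.11 dB for $\sigma$ = 10, 15 and 20, close to the "Noisy" column of Table 1. The small differences are plausibly due to clipping to the 8-bit range and sampling variation.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized frames given as nested lists."""
    se = 0.0
    n = 0
    for r_row, e_row in zip(reference, estimate):
        for r, e in zip(r_row, e_row):
            se += (r - e) ** 2
            n += 1
    mse = se / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(ref_frames, est_frames, skip=4):
    """Mean per-frame PSNR, skipping the first `skip` frames (assumed averaging rule)."""
    values = [psnr(r, e) for r, e in zip(ref_frames[skip:], est_frames[skip:])]
    return sum(values) / len(values)
```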

Fig. 4. A fragment of the 30th frame of the Flower video sequence. (a) Noisy (PSNR of fragment 22.41 dB). (b) Denoised with the proposed algorithm (PSNR of fragment 26.39 dB).

Table 2. Video denoising, comparative results (average PSNR, dB)

Video          Noisy   Proposed in [13]   3D SWDCT
Miss America   20      34.1               34.9
Hall           20      29.1               31.8

Detailed information on the developed algorithm, together with video sequences processed by 3D-SWDCT, is available at: http://www.cs.tut.fi/rusanovs/.

5. Conclusions

This paper has considered the problem of denoising video signals corrupted by additive Gaussian noise and proposed a novel 3D DCT-based video denoising algorithm. The local 3D DCT thresholding achieves high filtering performance because the video volume data to be denoised locally are selected carefully. The thresholding operates on a group of highly correlated 2D windows, sliding in the spatial directions, that are selected from a set of sequential frames. A weighted average of the overlapping denoised estimates gives the final denoised video. We tested the proposed algorithm on a group of standard test video sequences corrupted by additive Gaussian noise with a variety of standard deviations. The results show that the proposed algorithm is competitive with wavelet-based video denoising methods in terms of both PSNR and subjective quality.

Acknowledgements

This work was financially supported by the Academy of Finland, Finnish Centre of Excellence Programme 2000-2005, project no. 5202853, and by the EU project NoE FP6-PLT 511568-3DTV. The authors would like to thank V. Zlokolica and I. Selesnick for providing the video sequences processed by their denoising algorithms (presented in Table 1).

References

1. J.C. Brailean, R.P. Kleihorst, S. Efstratiadis, A.K. Katsaggelos, R.L. Lagendijk, "Noise Reduction Filters for Dynamic Image Sequences: A Review," Proceedings of the IEEE, vol. 83, no. 9, Sep. 1995.
2. D.L. Donoho, "De-noising by soft-thresholding," IEEE Trans. on Information Theory, vol. 41, no. 3, pp. 613-627, May 1995.
3. L. Sendur, I.W. Selesnick, "Bivariate Shrinkage Functions for Wavelet-Based Denoising Exploiting Interscale Dependency," IEEE Trans. on Signal Processing, vol. 50, no. 11, pp. 2745-2756, Nov. 2002.
4. R. Coifman, D. Donoho, "Translation-Invariant De-noising," in Lecture Notes in Statistics: Wavelets and Statistics, New York: Springer-Verlag, pp. 125-150, 1995.
5. N. Kingsbury, "Complex Wavelets and Shift Invariance," available at: http://ece-www.colorado.edu/fmeyer/classes/ece-5022/projects/kingsbury1.pdf
6. I.W. Selesnick, K.Y. Li, "Video denoising using 2D and 3D dual-tree complex wavelet transforms," in Wavelet Applications in Signal and Image Processing (Proc. SPIE 5207), San Diego, Aug. 2003.
7. V. Zlokolica, A. Pizurica, W. Philips, "Wavelet Domain Noise-Robust Motion Estimation and Noise Estimation for Video Denoising," First International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale, Arizona, USA, 23-25 January 2005.
8. L. Yaroslavsky, K. Egiazarian, J. Astola, "Transform domain image restoration methods: review, comparison and interpretation," TICSP Series #9, TUT, Tampere, Finland, December 2000, ISBN 952-15-0471-4.
9. R. Öktem, L. Yaroslavsky, K. Egiazarian, "Signal and Image Denoising in Transform Domain and Wavelet Shrinkage: A Comparative Study," in Proc. of EUSIPCO'98, Rhodes, Greece, Sept. 1998.
10. K. Rao, P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, 1990.
11. R.J. Clarke, Digital Compression of Still Images and Video, Academic Press, 1995.
12. K. Egiazarian, V. Katkovnik, H. Öktem, J. Astola, "Transform-based denoising with window size adaptive to unknown smoothness of the signal," in Proc. of First International Workshop on Spectral Techniques and Logic Design for Future Digital Systems (SPECLOG), Tampere, Finland, June 2000, pp. 409-430.
13. N. Gupta, E. Plotkin, M. Swamy, "Bayesian Algorithm for Video Noise Reduction in the Wavelet Domain," IEEE International Symposium on Circuits and Systems (ISCAS 2005), Kobe, Japan, May 23-26, 2005.