Subband coding of image sequences using multiple vector quantizers

Emanuel Martins, Vitor Silva and Luís de Sá
Instituto de Telecomunicações, Departamento de Engenharia Electrotécnica, Pólo II da Universidade de Coimbra, 3030 COIMBRA, PORTUGAL

ABSTRACT

One efficient way to compress digital images is subband coding. Subband coding using vector quantization could be a competitor to DCT-like image compression schemes. In this paper we describe an image sequence compression algorithm based on difference image coding techniques, with block motion compensation, segmentation of the difference image into rectangles using quadtrees, decomposition of the rectangles into subbands and vector quantization of the subbands. The vector quantization scheme uses multiple vector quantizers, which yields a better bit rate allocation. Each subband is quantized by 3 different tree structured vector quantizers (TSVQ) at variable tree depths. The rate-distortion (R-D) curves of all the rectangles are scanned to find the best global R-D combination. The parameters of the best combination are coded and used to quantize the subbands of all the rectangles. The results show a slightly better performance of this scheme in relation to the optimal scalar quantization of the subbands (entropy limit). The coding speed of this VQ scheme is only 3 times slower than a single vector quantization per vector.

Keywords: Image coding, subband coding, vector quantization, multiple quantizers

1. INTRODUCTION

Nowadays there are a number of standards for digital video compression [1,2,3] with very good performance in terms of the quality/rate ratio. They use similar techniques, with only small differences between them: they combine interframe prediction, block transform, scalar quantization, run-length encoding and scalar entropy coding. The video coding system described in this paper differs from these standards because it uses variable size rectangle segmentation, subband coding and vector quantization. The aim of this work was to study vector quantization (VQ) in combination with subband coding.
Subband decomposition has been shown to achieve good visual results in comparison to block transform techniques [4]. VQ is a very efficient method for the compression of voice and image signals [5,6]; in particular, it is very suitable for low bit rate applications, where the desired final visual quality is not too high. However, VQ needs too many resources, such as computational power and memory, for real-time image compression. For practical reasons, vector dimensions greater than 16 are avoided; otherwise the compression system becomes too complex.

In this paper we compare vector quantization (VQ) to scalar quantization (SQ) with regard to the performance of an existing video compression system [7] that uses temporal prediction, subband decomposition, scalar quantization and entropy coding. To make the comparison, the SQ part of the algorithm was replaced by VQ; the remaining parts of the algorithm were left unchanged. In this study VQ is applied to blocks of pels belonging to the subbands, using multiple vector quantizers with variable dimensions (2×1 up to 4×4). To reduce the computational effort of the encoder, quaternary tree structured vector quantization (TSVQ) was employed.

2. THE VIDEO COMPRESSION SYSTEM

The hybrid compression system used in this work combines several techniques of digital image compression. A brief review of these techniques is presented.

2.1. Coding of image differences

Moving images are highly correlated in time. The first-order entropy of a single frame of a movie is greater than the first-order entropy of the difference between consecutive frames (Fig.1).
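The first-order entropy quoted in Fig.1 is the entropy of the histogram of sample values. The following is a minimal Python/NumPy sketch of that measurement, assuming 8-bit frames stored as arrays; the function names are illustrative, and the synthetic frames are not the SPLIT data.

```python
import numpy as np

def first_order_entropy(samples: np.ndarray) -> float:
    """First-order (memoryless) entropy in bits per sample,
    estimated from the histogram of the sample values."""
    _, counts = np.unique(samples.ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def frame_vs_difference_entropy(prev: np.ndarray, curr: np.ndarray) -> tuple[float, float]:
    """Return (H1 of the current frame, H1 of curr - prev).
    The difference is formed in a signed type so negative values are kept."""
    diff = curr.astype(np.int16) - prev.astype(np.int16)
    return first_order_entropy(curr), first_order_entropy(diff)

# Synthetic example: a noisy 8-bit frame and a slightly perturbed copy of it.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noise = rng.integers(-2, 3, size=prev.shape)
curr = np.clip(prev.astype(np.int16) + noise, 0, 255).astype(np.uint8)
h_frame, h_diff = frame_vs_difference_entropy(prev, curr)
print(f"H1(frame) = {h_frame:.2f} bpp, H1(difference) = {h_diff:.2f} bpp")
```

Because the difference only takes a handful of values, its histogram is far more concentrated than that of the frame itself, which is the effect Fig.1 illustrates on real data.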
Fig.1 - First-order entropy (SPLIT movie, 8 bpp). (a) Isolated frame (frame 21), H1 = 6.89 bpp. (b) Difference between consecutive frames (frame 21 - frame 20), H1 = 3.61 bpp.

Fig.2 - Video encoder based on simple prediction and difference coding (difference frame encoder, difference frame decoder and a 1-frame delay, forming a decoder replica in the encoder).

Almost all video compression systems exploit this temporal correlation and use interframe prediction, i.e. they encode the difference between frames instead of isolated frames. Fig.2 shows the interframe prediction scheme used in our encoder. The data to be coded is the difference between the current frame, at the input of the encoder, and the previously coded frame. The previously coded frame is the one expected at the decoder output if no error occurs in the transmission channel.

2.2. Image segmentation, motion estimation and block merging

The system divides the difference image into blocks of 8×8 pels, keeps the moving blocks and drops those with small energy, i.e. the non-moving blocks (normally background). Fig.3a shows an example where only 40% of the total image is to be coded.

Fig.3 - (a) Blocks to be coded, from Fig.1 (in gray). (b) Blocks to be coded after relaxation (black areas are not coded). (c) Block merging, simple example. (d) Result of block merging of Fig.3b (1 block = 8×8 pels).

The moving blocks (those with non-negligible energy) are merged into rectangular regions. To obtain a small number of wide regions, a relaxation algorithm is applied to the blocks resulting from the segmentation stage. This method produces more homogeneous areas after merging and gives better results for subband coding. Fig.3b shows the result of the relaxation procedure. The merging algorithm uses quadtrees to construct the wide rectangles efficiently [8]. Fig.3c shows the result of the block merging algorithm for a simple example.
This procedure reduces the border effects caused by subband quantization and decorrelates the image data better. The system also performs motion estimation as in MPEG: matching blocks have only 8×8 pels, the search area is a window of ±16 pels, and the motion vectors are encoded as in MPEG.

2.3. Subband decomposition

Subband decomposition is one of many methods available to decorrelate data, and it can be seen as a special transform. The most widely used method to decorrelate data in image compression is the DCT (Discrete Cosine Transform), a block transform normally applied to blocks of 8×8 pels. In this system we chose subband coding in order to get good visual results in the decoded images and to avoid the block artifacts that commonly appear in block transform based approaches.
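A single decomposition level of a rectangle, and the pyramid built from it, can be sketched in Python with NumPy. This is a sketch under stated assumptions, not the authors' implementation: the system's filters are optimized length-4 CQF filters [9] whose coefficients are not reproduced here, so the Daubechies length-4 orthogonal pair is used as a stand-in, and periodic extension at the rectangle borders is also an assumption. The filters are applied first horizontally and then vertically, the bands are named with the convention of Fig.4 (1st letter horizontal, 2nd letter vertical), and the 16×16 and 32×32 size rules for the 2nd and 3rd levels are read here as "both sides at least 16 (or 32) pels". All names are illustrative.

```python
import numpy as np

# Stand-in filters (assumption): Daubechies length-4 orthogonal low-pass filter
# and its quadrature-mirror high-pass partner g[n] = (-1)^n h[3 - n].
SQ3 = np.sqrt(3.0)
H_LOW = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2.0))
H_HIGH = np.array([(-1) ** n * H_LOW[3 - n] for n in range(4)])

def analyze_1d(x: np.ndarray, h: np.ndarray, axis: int) -> np.ndarray:
    """y[k] = sum_j h[j] * x[(2k + j) mod n] along `axis`:
    filtering with periodic extension followed by decimation by 2."""
    y = sum(hj * np.roll(x, -j, axis=axis) for j, hj in enumerate(h))
    return np.take(y, np.arange(0, x.shape[axis], 2), axis=axis)

def decompose(band: np.ndarray) -> dict[str, np.ndarray]:
    """One level: filter horizontally (along rows), then vertically (along columns).
    Names: 1st letter = horizontal filter, 2nd letter = vertical filter."""
    low_h = analyze_1d(band, H_LOW, axis=1)
    high_h = analyze_1d(band, H_HIGH, axis=1)
    return {
        "LL": analyze_1d(low_h, H_LOW, axis=0),
        "LH": analyze_1d(low_h, H_HIGH, axis=0),
        "HL": analyze_1d(high_h, H_LOW, axis=0),
        "HH": analyze_1d(high_h, H_HIGH, axis=0),
    }

def pyramid(rect: np.ndarray) -> list[dict[str, np.ndarray]]:
    """1 level, plus a 2nd level if the rectangle is at least 16x16 and a 3rd
    if it is at least 32x32. Returns one dict of bands per level; only the
    last level keeps its LL band (earlier LL bands are decomposed again)."""
    rows, cols = rect.shape
    levels = 1 + int(min(rows, cols) >= 16) + int(min(rows, cols) >= 32)
    out, current = [], rect.astype(np.float64)
    for level in range(levels):
        bands = decompose(current)
        current = bands["LL"]
        if level < levels - 1:
            del bands["LL"]
        out.append(bands)
    return out

rng = np.random.default_rng(1)
rect = rng.standard_normal((32, 48))
levels = pyramid(rect)  # 3 levels; the final LL band is 4x6
```

Because the stand-in filters are orthonormal and the extension is periodic, the total energy of all the bands equals the energy of the rectangle, which is a convenient sanity check; the optimized CQF filters of [9] would change the band contents but not the structure of the pyramid.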
Fig.4 - Subband decomposition of a rectangle. L = low frequency, H = high frequency; 1st letter: horizontal, 2nd letter: vertical. Subband numbers at the three decomposition levels: LL 0, LH 1, HL 2, HH 3; LL 4, LH 5, HL 6, HH 7; LL 8, LH 9, HL 10, HH 11.

Fig.5 - Global diagram of the video encoder. The changed blocks of the difference image are extracted into rectangles, subband decomposed and vector quantized with multiple TSVQ; the vector codes, the mask information overhead and the motion data overhead are transmitted to the decoder. The shaded part is the copy of the decoder in the coder (inverse VQ, inverse decomposition, motion compensation and a delay of 1 image), which produces the current decoded image.

This system uses a pyramid subband decomposition of rectangles, where each rectangle is decomposed into 4 subbands: LL (low-low), containing the low frequencies, and LH (low-high), HL (high-low) and HH (high-high). L and H denote the kind of filtering applied to the 2-D signal. The 2-D filters used in the decomposition are separable, so they can be implemented with two 1-D filters: one low-pass filter and one high-pass filter. The 1-D filters are FIR (Finite Impulse Response) filters of length 4 and are applied first horizontally and then vertically. The filters were designed to obtain optimal performance from the point of view of image coding gain [9]. The LL band is further decomposed into 4 new subbands if the original rectangle is greater than or equal to a block of 16×16 pels. The second LL subband is decomposed again into 4 new subbands if the original rectangle is greater than or equal to a block of 32×32 pels. Fig.4 shows the pyramid subband decomposition scheme used in our system. After subband decomposition, each sample (coefficient) of the subbands is scalar quantized. The quantization of the subbands reduces the bit rate generated by the images but introduces distortion in the reconstructed images. The global diagram of the video encoder is shown in Fig.5.

3. SUBBAND QUANTIZATION

Scalar quantization is used in the majority of image compression systems because it is simple and allows entropy coding without increasing the complexity of the system too much. Vector quantizers are less used because of their complexity, despite the existence of low complexity schemes whose performance is only slightly lower than that of the most effective and complex ones. Within memoryless VQ, the advantage of tree structured quantizers (TSVQ) [6] is that their coding speed is greater than that of full search VQ (FSVQ). There are, of course, VQ techniques with better performance, like ECVQ (Entropy-Constrained Vector Quantization) [10] or EPTSVQ (Entropy-Pruned Tree Structured Vector Quantization) [11]. However, their complexity is prohibitive for image compression systems with medium/high quality.

3.1. Multiple quantizers

Since each subband has its own statistics, we use different quantizers and codebooks for each subband, designed specifically for the subband they are intended for. To overcome some limitations of TSVQ quantizers in terms of variable bit rate over a wide range, we use several quantizers, up to a maximum of 4 per subband, each with a different vector dimension. Naturally, the low-frequency subbands need more bit rate than the high-frequency subbands because they carry more visual information and are more sensitive to quantization distortion.

After decomposing each rectangle into several subbands, the subbands are reorganized to fit in the same rectangle, as in the examples of Figs. 6a, 6c and 6d. These new rectangles of subbands are partitioned into blocks of 4×4 elements, as shown in Fig.6b, and these are the blocks that are vector quantized. In the example of Fig.6b there are blocks distributed over 1, 2 or 4 subbands. Because each block must be quantized with only one codebook, it was necessary to choose which codebook to assign to each block. For a given 4×4 block, we assign the codebook of the subband in which the upper-left pixel of the block lies (see Fig.6b, dotted squares).
In this way, when a 4×4 block spans more than one subband, it is coded with the codebook of the lower-frequency subband (note that low-frequency subbands need more transmission bit rate than high-frequency subbands).
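The upper-left-pixel rule can be sketched as follows, assuming the reorganized rectangle is described by a map that gives the subband number of every element (as in Fig.6b). The names BLOCK, block_codebooks and spans_several_bands, and the example map, are hypothetical and only illustrate the rule.

```python
import numpy as np

BLOCK = 4  # the reorganized rectangle is cut into 4x4 blocks

def block_codebooks(band_map: np.ndarray) -> np.ndarray:
    """Subband (codebook) used for each 4x4 block: the subband that
    contains the block's upper-left pixel."""
    return band_map[::BLOCK, ::BLOCK].copy()

def spans_several_bands(band_map: np.ndarray) -> np.ndarray:
    """True for blocks whose 4x4 area covers more than one subband."""
    n_rows = -(-band_map.shape[0] // BLOCK)  # ceiling division
    n_cols = -(-band_map.shape[1] // BLOCK)
    out = np.zeros((n_rows, n_cols), dtype=bool)
    for i in range(n_rows):
        for j in range(n_cols):
            tile = band_map[i * BLOCK:(i + 1) * BLOCK, j * BLOCK:(j + 1) * BLOCK]
            out[i, j] = np.unique(tile).size > 1
    return out

# Hypothetical map: subband 8 occupies the first 5 columns and subband 9
# the remaining ones, so the second 4x4 block straddles both subbands.
band_map = np.zeros((4, 8), dtype=int)
band_map[:, :5] = 8
band_map[:, 5:] = 9
print(block_codebooks(band_map))  # [[8 8]]
print(spans_several_bands(band_map))
```

The second block is flagged as spanning two subbands, yet it is still coded with a single codebook, the one of subband 8, because its upper-left pixel lies there.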
Fig.6 - Reorganization of one rectangle into subbands and blocks of 4×4 pels. (a) Location of subbands after 2 decompositions. (b) Subband number of each block. (c) Location of subbands after 1 decomposition. (d) Location of subbands after 3 decompositions.

Fig.7 - Partition of one block of 4×4 elements: quantizer 4×4 applied once (worst distortion, low rate); quantizer 4×2 applied twice; quantizer 2×2 applied 4 times; quantizer 2×1 applied 8 times (better quality, high rate).

There are several quantizers for each subband, depending on the desired final image quality. Each quantizer, as a tree, must not occupy more memory than that occupied by 1000-4000 vectors; otherwise the encoder would use too much memory. It was therefore necessary to devise a way of raising the bit rate without increasing the size of the trees. So, each block of 4×4 elements is partitioned into several blocks of lower dimension, depending on the rate/distortion of the block, and these sub-blocks are coded by different quantizers. Fig.7 shows several possibilities of partitioning one block of 4×4 elements. This partition method allows variable bit rates per pel. Larger dimensions, like 4×4 and 4×2, are used for low bit rates with high distortion. Smaller dimensions, like 2×2 and 2×1, are used for high quality at high bit rates.

3.2. Training codebooks

A different quantizer was assigned to each of the 12 subbands and, in each subband, there are several possible sizes for the vectors to be quantized (4×4, 4×2, 2×2 and 2×1). This means that the compression system would have to spend memory maintaining 48 codebooks (12 subbands × 4 vector sizes). To save memory, this number was reduced to less than 16 codebooks, exploiting the fact that a codebook trained on one of the LL subbands can be used to code the other subbands if they have similar statistics (when using the scheme of Fig.4). One test showed that coding subband 4 (LL) with a quantizer trained on subband 4 led to the same result as using a codebook trained on subband 0 (also LL).
Using this assumption, the codebook trained on subband 0 is used to code subbands 0, 4 and 8 (the LL subbands). The same procedure is applied to the other subbands.

To train the codebooks used by the vector quantizers, we had to choose an image sequence (movie) representative of a wide range of images that could appear at the encoder input. We used the LTS/EPFL sequence [12] because it is rich in complex motion, like camera pan, zooming and unzooming. This kind of motion does not normally appear in the sequences used to test video compression standards. The vectors used to train the codebooks were obtained with a dedicated program, a replica of the video encoder without the quantization part; the vectors at the input of the quantization part are those that would be vector quantized. After obtaining the training vectors, each codebook was trained using the LBG algorithm [13] for TSVQ. The designed quaternary trees have 6 levels with 4 vectors per node, which gives a maximum of 4^6 = 4096 vectors per codebook. We chose quaternary trees because of their search speed: binary trees are the fastest, but their quality is not as good as that of quaternary trees, and with more than 4 vectors per branch the coding time would increase and the encoder would become very slow.

3.3. Variable rate quantization

Better results can be achieved by varying the allocation of the available bit rate among the rectangles of the frame to be encoded, since some blocks of the frame need more bit rate than others to reach the same final quality. So, this system uses a compression factor for each rectangle, where a different compression factor means a different bit rate for that rectangle and also a different quantizer and vector size. We have chosen 16 compression factors for each block. This gives 16×12 = 192 parameters to choose (16 compression factors for each of the 12 subbands). The configuration of the encoder (the relation between the compression factor and the vector quantizer used in each subband, and the bit rate used for each vector) was, to a certain extent, heuristic. Choosing all 192 parameters was difficult and is still under study.
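The greedy search of a quaternary TSVQ, and the way a shallower search depth lowers the rate, can be sketched as below. This is a minimal sketch under stated assumptions: squared-error distortion is used to pick the branch, and each level's branch index is assumed to be sent as a fixed 2-bit symbol, since the paper does not say how the vector codes are entropy coded. Node, tsvq_encode, bits_per_pel and toy_tree are hypothetical names, and the 1-D toy tree only stands in for an LBG-trained codebook.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Node:
    codeword: np.ndarray
    children: list["Node"] = field(default_factory=list)  # empty (leaf) or 4 nodes

def tsvq_encode(root: Node, x: np.ndarray, depth: int) -> tuple[list[int], np.ndarray]:
    """Greedy quaternary tree search limited to `depth` levels. Only the 4
    children of the current node are compared at each level, so the cost is
    4 * depth distance computations instead of 4 ** depth for full search."""
    node, path = root, []
    for _ in range(depth):
        if not node.children:
            break
        dists = [float(np.sum((x - c.codeword) ** 2)) for c in node.children]
        k = int(np.argmin(dists))
        path.append(k)
        node = node.children[k]
    return path, node.codeword

def bits_per_pel(depth: int, vec_rows: int, vec_cols: int) -> float:
    """Rate if each level's branch index costs 2 bits (assumption)."""
    return 2 * depth / (vec_rows * vec_cols)

def toy_tree(center: float, spread: float, levels: int) -> Node:
    """Hypothetical 1-D tree with 4 evenly spaced children per node."""
    node = Node(np.array([center]))
    if levels > 0:
        node.children = [toy_tree(center + o * spread, spread / 4, levels - 1)
                         for o in (-1.5, -0.5, 0.5, 1.5)]
    return node

root = toy_tree(0.0, 2.0, levels=2)
path, codeword = tsvq_encode(root, np.array([1.2]), depth=2)  # path == [2, 2], codeword == [1.25]
```

With the 6-level trees described above and fixed 2-bit branch indexes, a full-depth 4×4 vector would cost 12 bits (0.75 bpp), while eight 2×1 vectors at full depth would cost 96 bits (6 bpp). This illustrates how the partitions of Fig.7 and the choice of search depth together can cover a wide range of rates with the same trees.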
The compression factors are attributed to the rectangles after quantizing each rectangle with all 16 compression factors. The distortion and rate are obtained for each factor and rectangle, and an iterative procedure finds a good combination of compression factors for all the rectangles that maximizes the final PSNR of the decoded image. This optimization of the compression factors is only 3 to 4 times slower than a single quantization, instead of 16 times slower. A full, separate quantization of each rectangle for each of the 16 compression factors is not necessary because different compression factors use the same quantizers, only with different search depths in the tree. After choosing the compression factors for all the rectangles, the encoder sends the indexes of the compression factors and the codes of each quantized vector to the decoder. With this information plus the motion vectors, the image mask and the previous decoded image, the decoder can reconstruct the encoded image.

4. RESULTS

We have simulated the video compression system described above. The tests were made on several monochrome image sequences, like SPLIT (6 people moving simultaneously), Miss America and Trevor. All image sequences have CIF resolution (352×288 pels). The results are presented in Figs. 8, 9 and 10 in terms of PSNR (peak signal-to-noise ratio) versus frame number.

Fig.8 - PSNR [dB] versus frame number for the sequence SPLIT coded at 0.3 bpp (26:1). 1st image coded with normal TSVQ of 4×4 pel blocks at 0.3 bpp. Thick line: subband VQ with multiple quantizers (<PSNR> = 32.16 dB). Dashed line: subband SQ with ideal entropy coding (<PSNR> = 31.29 dB).

Fig.9 - PSNR [dB] versus frame number for several image sequences coded at 0.3 bpp (originals at 8 bpp) (26:1). 1st image known by the decoder. Thick line: Miss America. Normal line: Trevor. Dashed line: SPLIT.

Fig.10 - PSNR [dB] versus frame number of the decoded sequence Miss America. 1st image known by the decoder. Thick line: 0.15 bpp (52:1). Dashed line: 0.11 bpp (72:1).

In Fig.8 we compare the performance of the VQ coding algorithm with the SQ coding of the subband coefficients. Fig.8 shows a difference of about 1 dB in PSNR between VQ and SQ (32.16 dB versus 31.29 dB on average), which means that the VQ system offers only a small increase in performance. Fig.9 shows the PSNR results for several sequences (Miss America, Trevor and SPLIT) compressed 26 times (0.3 bpp). Fig.10 shows the PSNR performance of the Miss America sequence at higher compression ratios (52 and 72 times).
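The PSNR values and compression ratios quoted above follow the usual definitions for 8-bit images, PSNR = 10 log10(255^2 / MSE) and ratio = source bpp / coded bpp. The following is a minimal sketch; the function names are illustrative, and mean_psnr assumes that <PSNR> in Fig.8 is the arithmetic mean of the per-frame PSNR values, which the paper does not state explicitly.

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit images)."""
    err = original.astype(np.float64) - decoded.astype(np.float64)
    mse = float(np.mean(err ** 2))
    return float("inf") if mse == 0.0 else float(10.0 * np.log10(peak ** 2 / mse))

def mean_psnr(originals, decodeds) -> float:
    """Average of the per-frame PSNR values over a sequence (assumption for <PSNR>)."""
    return float(np.mean([psnr(o, d) for o, d in zip(originals, decodeds)]))

def compression_ratio(source_bpp: float, coded_bpp: float) -> float:
    """How many times smaller the coded sequence is than the original."""
    return source_bpp / coded_bpp

print(round(compression_ratio(8, 0.3), 1))  # 26.7
```

For example, 8 bpp originals coded at 0.3 bpp give a ratio of about 26.7, which corresponds to the "26 times" quoted for Figs. 8 and 9.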
5. CONCLUSION

We have presented a new video compression method using vector quantization of subbands with multiple quantizers. The encoder includes subband decomposition, image segmentation and motion estimation. The performance of this system using vector quantization is slightly better than that of the similar system using scalar quantization with ideal entropy coding. The system is not yet fully explored because its configuration has many variable parameters.

6. REFERENCES

1. ITU-T Rec. H.261, "Video codec for audiovisual services at p×64 kbit/s", Geneva, 1990; revised at Helsinki, 1993.
2. ISO/IEC 11172-2, "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 2: Video", 1993 (MPEG-1).
3. ISO/IEC 13818-2 | ITU-T Rec. H.262, "Generic coding of moving pictures and associated audio information, Part 2: Video", October 1994 (MPEG-2).
4. J.W. Woods (editor), Subband Image Coding, Kluwer Academic Publishers, 1991.
5. R.M. Gray, "Vector quantization", IEEE ASSP Magazine, pp. 4-29, April 1984.
6. A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, MA, 1992.
7. V. Silva and L. Sá, "Variable block size wavelet video coding", in Proc. of IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pp. 32-35, 1994.
8. V. Silva and L. Sá, "Quadtree based block merging algorithm for sub-band coding", Proc. SPIE - European Symposium on Advanced Networks and Services, 20-24 March 1995.
9. V. Silva and L. Sá, "Analytical optimization of CQF filters", IEEE Trans. on Signal Processing, Vol. 44, No. 6, pp. 1564-1568, June 1996.
10. P.A. Chou, T. Lookabaugh and R.M. Gray, "Entropy-constrained vector quantization", IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 37, No. 1, pp. 31-42, January 1989.
11. E.A. Riskin, T. Lookabaugh, P.A. Chou and R.M. Gray, "Variable rate vector quantization for medical image compression", IEEE Trans. on Medical Imaging, pp. 290-298, September 1990.
12. A. Nicoulin, "The LTS/EPFL video sequence for very low bit rate coding", Swiss Federal Institute of Technology, Lausanne, 1994.
13. Y. Linde, A. Buzo and R.M. Gray, "An algorithm for vector quantizer design", IEEE Trans. on Communications, Vol. 28, No. 1, pp. 84-95, January 1980.