A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

Borko Furht and Pornvit Saksobhavivat
NSF Multimedia Laboratory
Florida Atlantic University, Boca Raton, Florida 33431

ABSTRACT

In this paper, we present a novel technique that can be used for fast similarity-based indexing and retrieval of both image and video databases in distributed environments. We assume that image or video databases are stored in compressed form using standard techniques such as JPEG for images, and M-JPEG or MPEG for videos. The existing techniques proposed in the literature use computationally intensive features and cost functions for content-based image and video retrieval and indexing. The proposed algorithm uses an innovative approach based on histograms of DC coefficients only, and is therefore computationally less expensive than the other approaches.

In the case of a JPEG-compressed image database, the query process is as follows. The user submits a search-by-similarity request by presenting the desired image. The algorithm calculates the DC coefficients of this image and creates the histogram of DC coefficients. Then, the algorithm compares the DC histogram of the submitted image with the DC histograms of the images stored in the database using a histogram similarity metric. The image database can be local or on a remote server. In our experiments, we compared several histogram similarity metrics: weighted Euclidean distance, square difference, and absolute difference. The algorithm then selects and presents to the user the images with the smallest metric values, i.e., those that best match the submitted image.

In the case of a compressed video database, similarity-based indexing and retrieval is more complex. The manipulation of a video database consists of three main operations: (1) partitioning of the video into clips, (2) key frame extraction, and (3) indexing and retrieval of key frames. The proposed algorithm has been applied in all three steps. First, the DC histograms are used to partition each video into clips or camera shots.
Then, in the next phase, the same DC histograms are used to extract key frames and create a database of key frames only. Finally, in the last step, the user submits one or more video frames that he/she is searching for, and the algorithm searches the key-frame database. We implemented the described algorithm for similarity-based retrieval from both image and video databases. The experimental results presented in the paper show that the proposed algorithm can be very efficient for similarity-based search of images and videos in distributed environments, such as the Internet, intranets, or local-area networks.

Keywords: content-based retrieval and indexing, multimedia databases, DC coefficients, histogram of DC coefficients.

1. INTRODUCTION

There are two main approaches to indexing and retrieval of images and videos in multimedia databases: (a) keyword-based indexing and (b) content-based indexing. Keyword-based indexing uses keywords or descriptive text, which is stored together with the images and videos in the databases. Retrieval is performed by matching the query, given in the form of keywords, with the stored keywords. This approach is not satisfactory, because text-based descriptions tend to be incomplete, imprecise, and inconsistent in specifying visual information.

To overcome this problem, recent research has focused on content-based indexing and retrieval techniques [1,2,3,4,5]. This approach allows users to index and retrieve images and videos from databases using visual content (such as prominent regions, color, shape, size, and texture), motion-related information (movement of objects, enlarging or shrinking, and global camera operations), and similarity to example images. Most current techniques proposed in the literature deal with uncompressed multimedia objects (images and videos). Several techniques have been proposed for shot detection and segmentation of compressed video [3,4,5]. These techniques use block comparison metrics, which measure the differences between the DCT coefficients of corresponding blocks in two frames.

In this paper, we present a technique for content-based image and video indexing and retrieval that uses histograms of DC coefficients. We assume that images and videos in multimedia databases are stored in compressed form (JPEG for images, MPEG or M-JPEG for videos). We propose a fast retrieval and indexing algorithm that can be used very efficiently for content-based search on the Internet. The fundamental idea of the new algorithm is to use only the histograms of DC coefficients of the stored JPEG images, or of the I-frames in the case of MPEG or M-JPEG compressed video. The experiments show that the histogram of DC coefficients is a highly distinguishing characteristic of an image and can be used effectively for image and video retrieval and indexing. At the same time, computing the histogram of DC coefficients and the related cost functions turns out to be very fast and does not require computationally intensive algorithms.

2. AN ALGORITHM FOR SIMILARITY-BASED RETRIEVAL OF IMAGES

The JPEG encoding standard for full-color images is based on the DCT. An image is divided into 8x8 blocks, and the pixels of each block are transformed from the spatial to the frequency domain. The transformed 64-point discrete signal is a function of two spatial dimensions, and its components are called spatial frequencies or DCT coefficients.
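As a worked illustration using the standard JPEG definitions (not specific to this paper), the forward 8x8 DCT and its DC term are shown below. Here f(x,y) is the pixel value after JPEG's level shift by 128, so setting u = v = 0 reduces the DC term to a scaled block sum.

```latex
F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16},
\qquad C(0) = \frac{1}{\sqrt{2}}, \quad C(k) = 1 \ (k > 0)

F(0,0) = \frac{1}{4} \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
       = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) = 8 \bar{f}
```

With f(x,y) in [-128, 127], F(0,0) = 8 times the block mean lies in [-1024, 1016], which fits within the 11-bit signed range [-1024, +1023] of DC coefficients.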
The F(0,0) coefficient is called the DC coefficient, and the remaining 63 coefficients are called AC coefficients. For color images represented in YUV or YCbCr format, the DCT is applied to all three components. The proposed algorithm uses only the DC coefficients of the Y (luminance) component. There are two reasons for this decision: (1) the human visual system is more sensitive to Y than to the two chrominance components, and (2) the JPEG and MPEG standards typically sample Y at a higher density than the other two components.

Histogram of DC Coefficients

The pixels of the original Y component in the spatial domain are coded with 8 bits. After the DCT, however, the DC coefficients of the Y component require 11 bits; they lie in the range [-1024, +1023]. The histogram of DC coefficients can now be created. For illustration, the DC histogram is created for the image elephant, which consists of 600x800 pixels. The image contains 75x100 blocks of 8x8 pixels, which gives 7,500 DC coefficients. The histogram of DC coefficients is shown in Figure 1. The number of histogram bins in this example is 2048, which covers all values of DC coefficients in the range [-1,024, +1,023]. However, the histogram of DC coefficients can be reduced to a smaller number of bins, such as 1024, 512, or 256. A histogram with fewer bins requires less computation when the histogram similarity metric is calculated.

Histogram Similarity Metrics

Histogram similarity metrics are used to compare the DC histogram of a given image with the histograms of the compressed images in the database. We analyzed three histogram-comparison metrics: (1) Weighted Euclidean Distance, (2) Square Difference, and (3) Absolute Difference. These three metrics are defined next. Let us denote the jth histogram bin value of a query image as H_Q(j), and the jth histogram bin value of an image in the database as H_D(j). Then, the Weighted Euclidean Distance (WED) metric is defined as

$$WED = \sum_{j=1}^{N} w_j \left[ H_Q(j) - H_D(j) \right]^2$$

where N is the total number of histogram bins, and w_j is the weight of bin j, defined as

$$w_j = \begin{cases} 1/H_Q(j) & \text{if } H_Q(j) > 0 \\ 1 & \text{otherwise} \end{cases}$$

Figure 1. Histogram of DC coefficients for the image elephant.

The Square Difference (SD) metric is defined as

$$SD = \sum_{j=1}^{N} \left[ H_Q(j) - H_D(j) \right]^2$$

and the Absolute Difference (AD) metric as

$$AD = \sum_{j=1}^{N} \left| H_Q(j) - H_D(j) \right|$$

The complexity of all three metrics grows linearly with the number of histogram bins N. Our experiments have shown that the metrics based on 512 bins perform quite well, not much worse than with 2,048 bins.

Example of Similarity-Based Retrieval of an Image Database

In the following example, we compared the efficiency of the three metrics in retrieving compressed images from a small image database. We performed the experiments for different numbers of histogram bins: 2048, 1024, 512, and 256. The user submits a search-by-similarity request by presenting the desired image to the algorithm. The algorithm calculates the DC coefficients of this image. Then, one of the histogram similarity metrics is calculated to compare the DC histogram of the submitted image with the DC histograms of the images stored in the database. Finally, the algorithm presents to the user the set of images with the smallest values of the histogram similarity metric. The whole query process takes only a few seconds. Results of a query-by-similarity are presented in Table 1 and Figure 2. In Figure 2, the algorithm presents the best matches among the compressed images based on the absolute difference metric.

Table 1. Results of Retrieving Image elephant.jpg from the Image Database

IMAGE NAME          WED    IMAGE NAME          SD     IMAGE NAME       AD
Elephant.jpg        0.30   Elephant.jpg        0      Elephant.jpg     0
Elephant3.jpg       650    Elephant3.jpg       9.35   Elephant3.jpg    0.5
Oregeon-sunset.jpg  45     Icefield.jpg        8.8    Elephant.jpg     0.83
Icefield.jpg        508    Oregeon-sunset.jpg  8.99   Flower3.jpg      .04
Icefield.jpg        53     Namess.jpg          9.87   Goat.jpg         .07
Chamber.jpg         546    Icefield.jpg        9.96   Flower7.jpg      .09
Namess.jpg          548    Namess3.jpg         30.    Surf.jpg         .
Porcelain.jpg       568    Namess4.jpg         3.09   Flower6.jpg      .
Woman.jpg           573    Namess6.jpg         3.44   Flower4.jpg      .6
Namess3.jpg         583    Lake-goat.jpg       3.3    Sd5.jpg          .0

The following conclusions can be drawn from these experiments:

All three metrics gave good results in similarity-based retrieval, but the absolute difference metric seems to be the most reliable.
Reducing the number of histogram bins from 2,048 to 1,024 was beneficial. First, it reduced the number of operations needed to calculate the similarity metrics. Second, the smaller number of bins reduced the sensitivity of indexing to quantization noise. However, when the number of bins was further reduced to 512 and 256, the indexing results deteriorated.
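The DC-histogram construction described at the start of this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the luminance plane is a list of rows of 8-bit values whose dimensions are multiples of 8, pixels are level-shifted by 128 as in JPEG, and the DC term is computed from pixels rather than read from the compressed stream (a compressed-domain implementation would instead entropy-decode and dequantize the differentially coded DC terms). The function names `dc_coefficients` and `dc_histogram` are hypothetical.

```python
from typing import List

DC_MIN, DC_MAX = -1024, 1023  # 11-bit signed range of the luminance DC term


def dc_coefficients(y: List[List[int]]) -> List[int]:
    """Return F(0,0) for every 8x8 block of an 8-bit luminance plane.

    Assumes the height and width are multiples of 8. Pixels are level-shifted
    by 128 (as in JPEG), so F(0,0) = (1/8) * sum of the 64 shifted pixels.
    """
    height, width = len(y), len(y[0])
    dcs = []
    for top in range(0, height, 8):
        for left in range(0, width, 8):
            block_sum = sum(
                y[r][c] - 128
                for r in range(top, top + 8)
                for c in range(left, left + 8)
            )
            dcs.append(round(block_sum / 8))
    return dcs


def dc_histogram(dcs: List[int], bins: int = 2048) -> List[int]:
    """Count DC values over [-1024, 1023] using `bins` equal-width bins.

    `bins` must divide 2048 (e.g. 2048, 1024, 512, 256); with fewer bins,
    each bin merges 2048 // bins adjacent DC values.
    """
    if bins <= 0 or 2048 % bins != 0:
        raise ValueError("bins must be a positive divisor of 2048")
    width = 2048 // bins
    hist = [0] * bins
    for dc in dcs:
        dc = min(max(dc, DC_MIN), DC_MAX)  # clamp out-of-range values defensively
        hist[(dc - DC_MIN) // width] += 1
    return hist


if __name__ == "__main__":
    # A 600x800 plane (the size of the elephant image) has 75x100 = 7,500 blocks.
    flat = [[128] * 800 for _ in range(600)]
    dcs = dc_coefficients(flat)
    print(len(dcs))                     # 7500
    print(dc_histogram(dcs)[1024])      # 7500: every flat mid-gray block has DC = 0
    print(len(dc_histogram(dcs, 256)))  # 256
```

Merging adjacent DC values is one simple way to obtain the 1024-, 512-, and 256-bin variants; the text does not specify how the reduced histograms were formed.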

Figure 2. Example of similarity-based retrieval using the DC histogram and the absolute difference metric.
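The three histogram similarity metrics and the query-by-similarity loop can be sketched as follows. This is a minimal sketch, assuming histograms are normalized to sum to 1 so that images of different sizes are comparable (the text does not state whether normalization was used), and assuming the WED weight w_j = 1/H_Q(j) when H_Q(j) > 0 and w_j = 1 otherwise. The names `normalize`, `query_by_similarity`, and `top_k` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Histogram = Sequence[float]


def normalize(hist: Sequence[int]) -> List[float]:
    """Scale a count histogram so its bins sum to 1 (assumed normalization)."""
    total = sum(hist)
    return [v / total for v in hist] if total else [0.0] * len(hist)


def wed(hq: Histogram, hd: Histogram) -> float:
    """Weighted Euclidean distance; w_j = 1/H_Q(j) if H_Q(j) > 0, else 1 (assumed)."""
    total = 0.0
    for q, d in zip(hq, hd):
        w = 1.0 / q if q > 0 else 1.0
        total += w * (q - d) ** 2
    return total


def sd(hq: Histogram, hd: Histogram) -> float:
    """Square difference: sum of squared bin differences."""
    return sum((q - d) ** 2 for q, d in zip(hq, hd))


def ad(hq: Histogram, hd: Histogram) -> float:
    """Absolute difference: sum of absolute bin differences."""
    return sum(abs(q - d) for q, d in zip(hq, hd))


METRICS: Dict[str, Callable[[Histogram, Histogram], float]] = {
    "WED": wed, "SD": sd, "AD": ad,
}


def query_by_similarity(
    hq: Histogram, database: Dict[str, Histogram], metric: str = "AD", top_k: int = 10
) -> List[Tuple[str, float]]:
    """Return the top_k database entries with the smallest metric value."""
    score = METRICS[metric]
    ranked = sorted(
        ((name, score(hq, hd)) for name, hd in database.items()),
        key=lambda item: item[1],
    )
    return ranked[:top_k]


if __name__ == "__main__":
    db = {
        "a.jpg": normalize([2, 2, 0]),
        "b.jpg": normalize([1, 3, 0]),
        "c.jpg": normalize([0, 0, 4]),
    }
    q = normalize([2, 2, 0])
    print(query_by_similarity(q, db, "AD", top_k=2))  # [('a.jpg', 0.0), ('b.jpg', 0.5)]
```

Each metric is a single pass over the bins, so halving the number of bins roughly halves the cost of every comparison.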

3. AN ALGORITHM FOR SIMILARITY-BASED RETRIEVAL OF COMPRESSED VIDEOS

In the case of compressed video databases, the procedure is more complex. The manipulation of a video database consists of three main operations: (1) partitioning of the video into clips, (2) key frame extraction, and (3) indexing and retrieval of key frames. The first two steps are typically performed off-line during the feature extraction phase, while the last step is performed in real time. The proposed algorithm, based on DC histograms, can be applied in all three steps. First, the DC histogram is used to partition each video into clips or camera shots. Then, in the next phase, the same DC histogram is used to extract key frames and create a database of key frames only. Finally, in the last step, the user submits one or more video frames that he/she is searching for. The algorithm is capable of searching through the video database (key frames only) and retrieving the most similar frames or clips.

Video Partitioning

The histogram of DC coefficients can be used successfully to partition video by detecting camera breaks. First, let us consider M-JPEG compressed video, where all frames are I-frames. In this case, we use DC histograms to compare consecutive frames and detect camera breaks. To minimize the computational complexity, the range of DC coefficients is reduced to [-256, +255] by using the following formula:

$$F(0,0) = \frac{1}{32} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)$$

where F(0,0) is the DC coefficient, and f(x,y) is a pixel value of the Y component in an 8x8 block.

To test the similarity of histograms of consecutive frames from the same clip, we performed several experiments with the standard video clips Football, Miss America, and Susie. The results, presented in Figure 3a-c, show two DC histograms for each clip: the histogram of frame 0 and that of frame 8. In all three cases, the histograms of the two frames are almost identical. Then, we compared the DC histograms of different clips. Figure 4 compares the DC histograms of frame 0 for these three clips and shows that the histograms of these three frames are significantly different.
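The reduced-range DC formula can be sketched as below. Because 1/32 = (1/4)(1/8), this value is the JPEG-style DC coefficient divided by 4, so in the compressed domain it could be approximated from the 11-bit DC term by a two-bit right shift. The sketch assumes pixels are level-shifted by 128 before summation (an assumption; without the shift, 8-bit pixel sums divided by 32 would lie in [0, 510] rather than the stated [-256, +255]), uses floor division, and maps the result to 512 histogram bins. The names `reduced_dc` and `frame_dc_histogram` are hypothetical.

```python
from typing import List, Sequence


def reduced_dc(block: Sequence[Sequence[int]]) -> int:
    """F(0,0) = (1/32) * sum of the 64 level-shifted pixels of an 8x8 block.

    With pixels in [0, 255] shifted to [-128, 127], the block sum lies in
    [-8192, 8128], so floor division by 32 gives a value in [-256, 254].
    """
    return sum(p - 128 for row in block for p in row) // 32


def frame_dc_histogram(y: List[List[int]]) -> List[int]:
    """512-bin histogram of reduced DC values for one luminance frame.

    Assumes the frame height and width are multiples of 8.
    """
    hist = [0] * 512
    for top in range(0, len(y), 8):
        for left in range(0, len(y[0]), 8):
            block = [row[left:left + 8] for row in y[top:top + 8]]
            hist[reduced_dc(block) + 256] += 1  # bin 0 holds DC = -256
    return hist


if __name__ == "__main__":
    frame = [[128] * 16 for _ in range(16)]  # four flat mid-gray blocks
    print(frame_dc_histogram(frame)[256])     # 4
```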
In order to detect camera breaks, we define the normalized square difference (NSD) metric:

$$NSD_i = \sum_{j=1}^{N} \frac{\left[ H_i(j) - H_{i-1}(j) \right]^2}{H_i(j)}$$

where NSD_i is the normalized square difference metric for frame i, H_i(j) are the DC histogram values for the ith frame, and j ranges over the possible histogram levels. If the overall difference is greater than a given threshold T1, a camera break is declared.
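Camera-break detection with the NSD metric can be sketched as follows. This is a minimal sketch under stated assumptions: histograms are normalized to sum to 1, and bins with H_i(j) = 0 are divided by 1 instead of 0 (the text does not say how empty bins are handled). The threshold in the example is chosen for the synthetic data, not taken from the experiments, and the names `nsd` and `detect_camera_breaks` are hypothetical.

```python
from typing import List, Sequence


def nsd(h_cur: Sequence[float], h_prev: Sequence[float]) -> float:
    """sum_j (H_i(j) - H_{i-1}(j))^2 / H_i(j); empty current bins divide by 1 (assumed)."""
    total = 0.0
    for cur, prev in zip(h_cur, h_prev):
        total += (cur - prev) ** 2 / (cur if cur > 0 else 1.0)
    return total


def detect_camera_breaks(histograms: List[Sequence[float]], threshold: float) -> List[int]:
    """Return indices i such that NSD between frame i and frame i-1 exceeds threshold.

    Each returned index marks the first frame of a new shot.
    """
    return [
        i for i in range(1, len(histograms))
        if nsd(histograms[i], histograms[i - 1]) > threshold
    ]


if __name__ == "__main__":
    a = [0.5, 0.5, 0.0, 0.0]
    b = [0.0, 0.0, 0.5, 0.5]
    c = [0.25, 0.25, 0.25, 0.25]
    frames = [a, a, a, b, b, c, c]
    print(detect_camera_breaks(frames, threshold=0.2))  # [3, 5]
```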

Figure 3. Histogram of DC coefficients of frames 0 and 7 for video clips: (a) Football, (b) Miss America, and (c) Susie.

Figure 4. DC histogram comparison of frame 0 for three video clips (Miss America, Football, and Susie).

To test the proposed technique, we applied it to a composite video consisting of three clips, each containing 8 frames. The results of the video partitioning experiment are presented in Figure 5. The algorithm correctly detected both camera breaks. The threshold used in the experiment was T1 = 20.

Figure 5. DC histogram comparison technique applied to video partitioning (NSD x 100 [%] versus frame number 1-24, with the threshold and the two camera breaks marked).

For a video database compressed using the MPEG technique, video partitioning uses a two-pass approach [3]. In the first pass, the proposed technique, based on DC histograms, is applied to I-frames only. For example, for an MPEG sequence {IBBPBBPBB}{IBBPBBPBPP}, etc., the algorithm will detect camera breaks that occurred between I-frames. In the second pass, a technique based on motion vectors [7] is applied to locate each camera break within the sequences flagged in the first pass.
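The first pass of the two-pass MPEG scheme can be sketched as below: only the I-frame DC histograms are compared, and each flagged pair of consecutive I-frames bounds a candidate interval for the second, motion-vector-based pass, which is not implemented here. This is a minimal sketch, assuming frame types are given as a string in the order they are listed (display-order versus coding-order reordering is ignored), one normalized DC histogram per I-frame, and empty current bins divided by 1 in the NSD. The name `first_pass_candidates` is hypothetical.

```python
from typing import List, Sequence, Tuple


def nsd(h_cur: Sequence[float], h_prev: Sequence[float]) -> float:
    """Normalized square difference; empty current bins divide by 1 (assumed)."""
    return sum((c - p) ** 2 / (c if c > 0 else 1.0) for c, p in zip(h_cur, h_prev))


def first_pass_candidates(
    frame_types: str, i_frame_hists: List[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (previous I-frame, next I-frame) positions whose DC histograms differ.

    A camera break is suspected somewhere between the two positions; locating it
    exactly is left to the second (motion-vector) pass.
    """
    i_positions = [k for k, t in enumerate(frame_types) if t == "I"]
    if len(i_positions) != len(i_frame_hists):
        raise ValueError("need exactly one histogram per I-frame")
    return [
        (i_positions[k - 1], i_positions[k])
        for k in range(1, len(i_positions))
        if nsd(i_frame_hists[k], i_frame_hists[k - 1]) > threshold
    ]


if __name__ == "__main__":
    types = "IBBPBBPBB" * 3  # three GOPs; I-frames at positions 0, 9, 18
    a = [0.5, 0.5, 0.0, 0.0]
    b = [0.0, 0.0, 0.5, 0.5]
    print(first_pass_candidates(types, [a, a, b], threshold=0.2))  # [(9, 18)]
```

Restricting the first pass to I-frames means only one histogram comparison per GOP, which is what keeps this pass cheap.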

Key Frame Extraction

In the next step, key frames are extracted from the video segments identified in the first step. The DC histogram comparison technique is used in this step as well. However, the similarity metric is now defined as the accumulated difference between the current frame and the previous key frame:

$$NSD_i = \sum_{j=1}^{N} \frac{\left[ H_i(j) - H_{KF}(j) \right]^2}{H_i(j)}$$

where H_KF(j) is the jth histogram bin value of the DC histogram of the previous key frame. The first frame in a video clip is always declared the first key frame. The subsequent frames are then compared to this frame. When the difference becomes greater than the threshold T2, the current frame is declared the next key frame, and the following frames are compared to this new key frame. Figure 6 illustrates the procedure for extracting key frames. Note that the threshold T2 = 10, used for key frame extraction, is smaller than the threshold T1 used for video partitioning. The described process is applied to I-frames only.

Figure 6. DC histogram comparison technique for extraction of key frames (accumulated NSD x 100 [%] versus frame number 1-24, with the threshold and the extracted key frames marked).

In the example in Figure 6, the video clip comprised three sequences, Football, Miss America, and Susie, each consisting of 8 frames. The algorithm extracted four key frames.
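Key-frame extraction against the previous key frame can be sketched as follows. Comparing each frame with the last key frame, rather than with its immediate predecessor, lets gradual change accumulate until it crosses the threshold. This is a minimal sketch, assuming one normalized DC histogram per I-frame of a single clip and that bins with H_i(j) = 0 are divided by 1 (the text does not specify empty-bin handling); the name `extract_key_frames` and the example threshold are hypothetical.

```python
from typing import List, Sequence


def nsd(h_cur: Sequence[float], h_key: Sequence[float]) -> float:
    """sum_j (H_i(j) - H_KF(j))^2 / H_i(j); empty current bins divide by 1 (assumed)."""
    return sum((c - k) ** 2 / (c if c > 0 else 1.0) for c, k in zip(h_cur, h_key))


def extract_key_frames(histograms: List[Sequence[float]], threshold: float) -> List[int]:
    """Indices of key frames within one clip (I-frame histograms in order)."""
    if not histograms:
        return []
    key_frames = [0]  # the first frame of a clip is always a key frame
    for i in range(1, len(histograms)):
        if nsd(histograms[i], histograms[key_frames[-1]]) > threshold:
            key_frames.append(i)  # later frames are compared to this new key frame
    return key_frames


if __name__ == "__main__":
    # Slow drift: consecutive frames differ little, but the change accumulates.
    drift = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
    print(extract_key_frames(drift, threshold=0.2))  # [0, 2]
```

With the same threshold, a consecutive-frame comparison would flag none of these frames, since no neighboring pair differs by more than about 0.11; this is why a lower threshold applied to the accumulated difference still yields distinct key frames within a shot.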

Indexing and Retrieval of Key Frames

Finally, in the last step, the DC histogram technique is applied to similarity-based search of the extracted key frames. The set of key frames extracted in the previous step forms a key-frame database, and the search is performed on key frames only. This step is equivalent to the similarity-based retrieval of image databases described in Section 2. In our experiment, we created a database of key frames and applied the proposed algorithm to retrieve frames that are similar to a given frame. The results are shown in Figure 7.

Figure 7. Example of similarity-based retrieval of key frames using DC histograms.
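Because this step reuses the image-retrieval procedure, the key-frame search can be sketched as a small variation of it that accepts several query frames and groups results by clip. This is a minimal sketch, assuming normalized DC histograms, the absolute difference metric (the one found most reliable in the image experiments), a score equal to the distance to the closest submitted query frame, and key frames identified by hypothetical (clip name, frame index) pairs. The names `search_key_frames` and `rank_clips` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

KeyFrameId = Tuple[str, int]  # (clip name, frame index): hypothetical identifiers


def ad(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Absolute difference between two DC histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def search_key_frames(
    queries: List[Sequence[float]],
    key_frame_db: Dict[KeyFrameId, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[KeyFrameId, float]]:
    """Rank key frames by AD to the closest of the submitted query frames."""
    if not queries:
        raise ValueError("at least one query frame is required")
    scored = [
        (kf, min(ad(q, hist) for q in queries)) for kf, hist in key_frame_db.items()
    ]
    scored.sort(key=lambda item: item[1])  # stable: ties keep database order
    return scored[:top_k]


def rank_clips(results: List[Tuple[KeyFrameId, float]]) -> List[str]:
    """Clip names ordered by their best-matching key frame."""
    clips: List[str] = []
    for (clip, _frame), _score in results:
        if clip not in clips:
            clips.append(clip)
    return clips


if __name__ == "__main__":
    db = {
        ("football", 0): [1.0, 0.0, 0.0],
        ("football", 5): [0.8, 0.2, 0.0],
        ("susie", 0): [0.0, 0.0, 1.0],
    }
    hits = search_key_frames([[1.0, 0.0, 0.0]], db)
    print(rank_clips(hits))  # ['football', 'susie']
```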

4. CONCLUSION

We presented an algorithm for similarity-based indexing and retrieval of image and video databases. The proposed algorithm is based on DC histograms of compressed images and video frames. We analyzed several histogram similarity metrics in order to select the most efficient one. The algorithm has been tested on a small compressed image database as well as on several video sequences. In summary, the proposed algorithm can be very efficient for similarity-based retrieval of images and videos in distributed environments, such as the Internet, intranets, or local-area networks.

REFERENCES

1. H.J. Zhang, S.Y. Tan, S.W. Smoliar, and Y. Gong, Automatic Parsing and Indexing of News Video, Multimedia Systems, Vol. 2, No. 6, pp. 55-64, 1995.
2. E. Ardizzone and M. La Cascia, Automatic Video Database Indexing and Retrieval, Journal of Multimedia Tools and Applications, Vol. 4, pp. 29-56, 1997.
3. B. Furht, S.W. Smoliar, and H.J. Zhang, Video and Image Processing in Multimedia Systems, Kluwer Academic Publishers, Norwell, MA, 1995.
4. F. Arman et al., Content-Based Browsing of Video Sequences, Proc. of ACM Multimedia '94, San Francisco, CA, October 1994.
5. H.J. Zhang et al., Video Parsing Using Compressed Data, Proc. SPIE '94 Symposium on Image and Video Processing, San Jose, CA, pp. 142-149, February 1994.
6. B.-L. Yeo and B. Liu, A Unified Approach to Temporal Segmentation of Motion JPEG and MPEG Compressed Video, Proc. IEEE International Conference on Multimedia Computing and Networking, Washington D.C., pp. 81-88, May 1995.
7. A. Akutsu et al., Video Indexing Using Motion Vectors, Proc. of SPIE '92 Symposium on Communications and Image Processing, Boston, MA, pp. 1522-1530, November 1992.