A Novel Video Retrieval Method Based on Web Community Extraction Using Features of Video Materials


IEICE TRANS. FUNDAMENTALS, VOL.E92-A, NO.8 AUGUST 2009, p. 1961

PAPER  Special Section on Signal Processing

A Novel Video Retrieval Method Based on Web Community Extraction Using Features of Video Materials

Yasutaka HATAKEYAMA a), Student Member, Takahiro OGAWA, Satoshi ASAMIZU, and Miki HASEYAMA, Members

SUMMARY A novel video retrieval method based on Web community extraction using audio, visual, and textual features of video materials is proposed in this paper. In the proposed method, canonical correlation analysis is applied to these three features calculated from video materials and their Web pages, and transformation of each feature into the same variate space becomes possible. The transformed variates are based on the relationships between the visual, audio and textual features of video materials, and the similarity between video materials in the same feature space can be calculated for each feature. Next, the proposed method introduces the obtained similarities of video materials into the link relationship between their Web pages. Furthermore, by performing link analysis of the obtained weighted link relationship, this approach extracts Web communities including similar topics and provides the degree of attribution of video materials in each Web community for each feature. Therefore, by calculating similarities of the degrees of attribution between the Web communities extracted from the three kinds of features, the desired ones are automatically selected. Consequently, by monitoring the degrees of attribution of the obtained Web communities, the proposed method can perform effective video retrieval. Some experimental results obtained by applying the proposed method to video materials obtained from actual Web pages are shown to verify the effectiveness of the proposed method.

key words: video retrieval, canonical correlation analysis, link analysis, Web community extraction

1. Introduction

Due to the recent widespread use of the Internet and of digital recording and storage, there have been many studies on video retrieval.
Video retrieval is generally achieved on the basis of keywords that represent topics of video materials. In this approach, video materials in a database are first annotated manually by keywords. Users then provide keywords that represent the topics of their desired video materials to the video retrieval system, and the corresponding video materials can be retrieved. However, the performance of video retrieval in this approach depends on the accuracy of the manual annotation, and it is difficult to manually annotate all of the video materials in a very large database. Therefore, video materials should be annotated automatically in order to perform effective video retrieval in a very large database.

[Manuscript received December 8, 2008. Manuscript revised February 28, 2009. The authors are with the Graduate School of Information Science and Technology, Hokkaido University, Sapporo-shi, 060-0814 Japan. The author is with the Kushiro National College of Technology, Kushiro-shi, 084-0916 Japan. a) E-mail: hatakeyama@lmd.ist.hokudai.ac.jp  DOI: 10.1587/transfun.E92.A.1961]

Several methods for automatic keyword extraction from the text of Web pages have recently been proposed [1]-[3]. These methods analyze the underlying contents of the text of Web pages and automatically extract typical words from the text. Thus, these methods can automatically annotate video materials on the Web. However, the text of Web pages is provided by users and therefore reflects the intentions of those users. Thus, the extracted keywords are affected by the intentions of users, and the performance of retrieval still depends on their accuracy. In order to solve these problems, alternative approaches known as content-based retrieval approaches have been proposed [4]-[7]. These approaches can realize retrieval on the basis of color, wavelet-based features, shape, and location. By applying these approaches, similar images and video materials can be retrieved on the basis of their visual features, which are automatically extracted from them.
However, existing approaches cannot accurately grasp the semantic concepts of video materials from their low-level features. Semantic annotation methods for extraction of semantic concepts have recently been proposed [8]-[10]. These approaches learn semantic concepts of the video materials from their low-level features. Thus, these approaches can automatically annotate video materials on the Web by their semantic concepts. However, since these approaches can only utilize visual features of video materials, audio features are not effectively utilized for the annotation. Furthermore, these approaches are applied to very large databases. Thus, features that can be obtained from video materials on the Web, such as Web page text and the link relationship between their Web pages, are not effectively utilized. Therefore, these features should be utilized in order to achieve more accurate retrieval of video materials on the Web. In this paper, we propose a novel video retrieval method based on Web community extraction using audio, visual, and textual features of video materials. In the proposed method, the following two novel approaches are introduced into the retrieval scheme of video materials on the Web.

[These three features are collectively called "video features" in this paper. Copyright (c) 2009 The Institute of Electronics, Information and Communication Engineers.]

1. Canonical correlation analysis is applied to three kinds of video features, namely visual and audio features of video materials and textual features obtained from Web pages containing those video materials, and these different

kinds of features can be compared in the same variate space. By applying link analysis to three adjacency matrices obtained from the similarities of each kind of feature, Web communities are extracted using features of video materials.

2. Web communities extracted from different adjacency matrices are combined on the basis of their similarities. Thus, effective results of retrieval can be achieved by using features of video materials.

By introducing the above two novel approaches, the proposed method can obtain results of retrieval based on Web community extraction using features of video materials. The proposed method consists of two procedures: Web community extraction and video retrieval using the obtained Web community. In the first procedure, Web communities of video materials including similar topics are extracted. In order to realize this procedure, canonical correlation analysis [11]-[14] is applied to the video features. Next, the proposed method introduces the obtained similarities of video materials into the link relationship between their Web pages. By performing link analysis of the obtained weighted link relationship, this approach extracts Web communities that include similar topics on the basis of the obtained video features. This scheme also provides the degree of attribution of the video materials in each Web community. In the second procedure, when the user selects one Web community from one of the three video features, the most similar ones in the other two video features are extracted. Specifically, the proposed method calculates the similarities of degrees of attribution of the video materials between the Web communities extracted from different video features, and the desired communities are automatically selected. Consequently, by monitoring the degrees of attribution of the three obtained Web communities, the proposed method retrieves video materials containing similar topics.

This paper is organized as follows. Canonical correlation analysis is explained in Sect. 2. In Sect. 3, a video feature extraction method is presented, and the similarity between two video materials is defined on the basis of variates obtained by applying canonical correlation analysis to the obtained video features. In Sect. 4, a Web community extraction method based on the obtained similarities is explained. Video retrieval using the video features is also explained in this section. In Sect. 5, results of experiments performed by applying the proposed method to video materials obtained from actual Web pages are shown to verify the effectiveness of the proposed method. Concluding remarks are given in Sect. 6.

2. Canonical Correlation Analysis

Canonical correlation analysis is explained in this section. Canonical correlation analysis is generally used to investigate the relationship between several groups of variables, and a linear transformation that maximizes the correlation between these groups of variables can be obtained. By utilizing the obtained transformed variates, different kinds of features can be compared in the same variate space. Specifically, we are given matrices X_i (i = 1, 2, ..., R),

    X_i = [x_i^1, x_i^2, ..., x_i^{P_i}]  (i = 1, 2, ..., R),   (1)

which are N x P_i matrices, respectively. Note that N is the number of data and P_i is the number of variables in each data set. Canonical correlation analysis calculates coefficient vectors w_i maximizing the correlation between the vectors g_i (i = 1, 2, ..., R) as follows:

    g_i = X_i w_i.   (2)

Specifically, by using a vector y whose elements are unknown, we estimate y and X_i w_i (= g_i) that minimize

    Q(y, w_i) = Σ_{i=1}^{R} ||y - X_i w_i||^2   (y'y = 1).   (3)

First, by utilizing the least-squares method, the above equation can be rewritten as follows:

    Q(y, w_i) = Σ_{i=1}^{R} ||y - X_i (X_i' X_i)^{-1} X_i' y||^2.   (4)

Thus,

    Q(y, w_i) = R ||y||^2 - Σ_{i=1}^{R} y' (X_i (X_i' X_i)^{-1} X_i') y,   (5)

where we define

    Q(y) = R ||y||^2 - Σ_{i=1}^{R} y' (X_i (X_i' X_i)^{-1} X_i') y,   (6)
    w_i = (X_i' X_i)^{-1} X_i' y.   (7)

Second, in order to minimize Q(y), the second term of Eq. (6) must be maximized. Then, the vector y can be obtained as the solution of the following eigenvalue problem:

    Σ_{i=1}^{R} (X_i (X_i' X_i)^{-1} X_i') y = λ y.   (8)

Furthermore, from the eigenvectors, which correspond to the N_e (= rank(X_i' X_i)) positive eigenvalues, the coefficient matrix W_i (i = 1, 2, ..., R) is defined as follows:

    W_i = [w_i^1, w_i^2, ..., w_i^k]  (k = 1, 2, ..., N_e),   (9)

where w_i^k represent the eigenvectors and correspond to w_i in Eq. (2). Then, the following equation is obtained:

    W_i' X_i' X_j W_j = diag(μ_1, μ_2, ..., μ_k) = Λ_{i,j},   (10)

where Λ_{i,j} is a correlation matrix whose diagonal elements represent the canonical correlation coefficients μ_k (k = 1, 2, ..., N_e) between g_i (= X_i W_i) and g_j (= X_j W_j). Thus, canonical correlation analysis requires the following procedures. (i) The eigenvalue problem shown in Eq. (8) is solved. (ii) The coefficient matrices W_i (i = 1, 2, ..., R) are calculated by Eqs. (7) and (9). (iii) The correlation matrices Λ_{i,j} (i, j = 1, 2, ..., R) are calculated by Eq. (10). (iv) By calculating W_i and Λ_{i,j} for all X_i (i = 1, 2, ..., R), W_i' x_i^n (n = 1, 2, ..., P_i) and W_j' x_j^m (m = 1, 2, ..., P_j) can be compared in the same variate space. As shown in the above procedures, a linear transformation that maximizes the correlation between R groups of vectors can be obtained by canonical correlation analysis.

3. CCA-Based Similarity Computation between Video Materials

In order to use the proposed video retrieval method, we must calculate the video features of each video material and their similarities based on the variates obtained by applying canonical correlation analysis to the obtained video features. Therefore, we show the method for calculating audio-visual and textual features of video materials in 3.1. In 3.2, we show how canonical correlation analysis is utilized for the obtained features and how the new similarities between two video materials are obtained for each feature.

3.1 Video Feature Extraction

In this section, we show how visual features, audio features and textual features are calculated by the proposed method from the video material f_i (i = 1, 2, ..., N, where N is the number of video materials). In the proposed method, the shot segmentation method [15] is applied to the video materials f_i, and several shots s_i^q (q = 1, 2, ..., M_i, where M_i is the number of shots within the video material f_i) are obtained. Furthermore, visual, audio and textual feature vectors are obtained for each shot s_i^q. Therefore, the proposed method calculates M_i visual feature vectors, M_i audio feature vectors, and M_i textual feature vectors for each video material f_i.
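As an illustrative sketch of the canonical correlation analysis procedure of Sect. 2 (our own minimal example, not the authors' implementation), the following NumPy code forms the sum of projection matrices in Eq. (8) for synthetic data with R = 2 feature groups, takes its top eigenvector y, and recovers the coefficient vectors of Eq. (7); the resulting variates g_i of Eq. (2) are strongly correlated. The data sizes and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: R = 2 feature groups observed for N = 50 items
# (stand-ins for the paper's visual/audio/textual feature matrices X_i).
N, P1, P2 = 50, 4, 3
shared = rng.normal(size=(N, 1))                 # common latent variate
X1 = np.hstack([shared + 0.1 * rng.normal(size=(N, 1)) for _ in range(P1)])
X2 = np.hstack([shared + 0.1 * rng.normal(size=(N, 1)) for _ in range(P2)])
Xs = [X1 - X1.mean(0), X2 - X2.mean(0)]          # center each group

# Eq. (8): sum_i X_i (X_i' X_i)^{-1} X_i' y = lambda y.
# Each summand is the orthogonal projector onto the column space of X_i.
M = sum(X @ np.linalg.pinv(X.T @ X) @ X.T for X in Xs)
eigvals, eigvecs = np.linalg.eigh(M)
y = eigvecs[:, -1]                               # top eigenvector, ||y|| = 1

# Eq. (7): w_i = (X_i' X_i)^{-1} X_i' y, giving variates g_i = X_i w_i (Eq. (2)).
ws = [np.linalg.pinv(X.T @ X) @ X.T @ y for X in Xs]
gs = [X @ w for X, w in zip(Xs, ws)]

# The variates from the two groups should be strongly correlated.
corr = np.corrcoef(gs[0], gs[1])[0, 1]
print(abs(corr))
```

Since each summand of M is an orthogonal projector, the top eigenvalue cannot exceed R; with the strong shared latent variate, the two variates correlate nearly perfectly.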
An overview of video feature extraction is shown in Fig. 1, and the details of the extraction are shown below.

[Fig. 1: Overview of the video feature extraction. In this figure, M_i = 3, and the music class is most included in the clips within shot s_i^3.]

Visual Feature Vector: First, for each frame in the video material f_i, its HSV color histogram with p bins is calculated, and its vector is obtained. Next, the proposed method divides f_i into several shots s_i^q (q = 1, 2, ..., M_i, where M_i is the number of shots within the video material f_i) by the shot segmentation method and calculates the vector median [16], which is obtained from the vectors of the color histograms calculated for the frames in the same shot s_i^q, as its visual feature vector u_i^q (= [v_i^q(1), v_i^q(2), ..., v_i^q(p)]'). As shown in the above procedures, M_i visual feature vectors u_i^q (q = 1, 2, ..., M_i) are obtained for each video material f_i.

Audio Feature Vector: Each shot s_i^q (q in {1, 2, ..., M_i}) in the video material f_i is first divided into clips, the interval of each clip being set to T_c. Next, based on the method in [17], each clip within the shot s_i^q is classified into one of four audio classes: silence, speech, music and noise. The audio class that is most included in the clips within the shot s_i^q is selected. Furthermore, for all of the clips classified into the selected audio class in s_i^q, the averages and standard deviations of the following 11 features are computed: volume, zero-crossing rate, pitch, frequency centroid, frequency bandwidth, sub-band energy ratios (0-630 Hz, 630-1720 Hz, 1720-4400 Hz, 4400-11025 Hz), non-silence ratio and zero ratio. These features are used not only in reference [17] but also in several conventional methods for audio signal classification [18], [19]. Then, by aligning the elements of the obtained features, the audio feature vector a_i^q (= [a_i^q(1), a_i^q(2), ..., a_i^q(22)]') is obtained for each shot s_i^q. Thus, the M_i audio feature vectors a_i^q (q = 1, 2, ..., M_i) are obtained for each video material f_i.
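The visual-feature computation above (a p-bin HSV color histogram per frame, then the vector median over the frames of a shot) can be sketched as follows. This is a hypothetical illustration with synthetic frames: the helper names `hsv_histogram` and `vector_median` are introduced here, and real use would require decoding the video and converting frames to HSV first.

```python
import numpy as np

rng = np.random.default_rng(1)

def hsv_histogram(frame_hsv, bins=(12, 2, 2)):
    """p-bin HSV color histogram of one frame, flattened to a vector.
    frame_hsv: (height, width, 3) array with H, S, V scaled to [0, 1)."""
    hist, _ = np.histogramdd(frame_hsv.reshape(-1, 3),
                             bins=bins, range=((0, 1), (0, 1), (0, 1)))
    return hist.ravel() / frame_hsv.shape[0] / frame_hsv.shape[1]

def vector_median(vectors):
    """Vector median [16]: the member of the input set minimizing the
    summed L2 distance to all the other vectors."""
    V = np.asarray(vectors)
    dists = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2).sum(axis=1)
    return V[np.argmin(dists)]

# Toy "shot": 5 frames of 8x8 synthetic HSV pixels.
frames = [rng.random((8, 8, 3)) for _ in range(5)]
hists = [hsv_histogram(f) for f in frames]       # p = 12 * 2 * 2 = 48 bins
u_q = vector_median(hists)                       # visual feature vector u_i^q
print(u_q.shape)
```

The (12, 2, 2) binning matches the H = 12, S = 2, V = 2 setting used in the experiments of Sect. 5; note that the vector median, unlike a component-wise median, is always one of the actual frame histograms.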
Textual Feature Vector: Suppose that the total number of keywords that appear on the N Web pages containing the video materials f_i (i = 1, 2, ..., N) is K. Then, by applying TF-IDF [20] to the keywords in the Web pages of each video material f_i, weights corresponding to these K keywords are calculated. Furthermore, by aligning the obtained weights, the feature vector ω_i (= [ω_i^1, ω_i^2, ..., ω_i^K]') is obtained for f_i. Note that the dimension of the obtained vector is much larger than the dimensions of the visual and audio features. Thus, we apply principal component analysis (PCA) to ω_i (i = 1, 2, ..., N) and reduce their dimensions as follows. First, from the obtained feature vectors ω_i (i = 1, 2, ..., N), a covariance matrix is obtained as follows:

    D = (1 / (N - 1)) Σ_{i=1}^{N} (ω_i - ω̄)(ω_i - ω̄)',   (11)

where ω̄ is the average vector of ω_i (i = 1, 2, ..., N). Furthermore, by calculating the eigenvectors of the covariance matrix D, calculation of the eigenvector matrix U becomes possible as follows:

    D = U Λ_u^2 U',   (12)

where Λ_u is the eigenvalue matrix. Thus, by using the obtained eigenvector matrix U, a new vector t_i is obtained as follows:

    t_i = U'(ω_i - ω̄)  (i = 1, 2, ..., N).   (13)

The obtained vector t_i is defined as the textual feature vector. For each shot s_i^q within the video material f_i, we assign the same textual feature vector t_i^q (= t_i) (q = 1, 2, ..., M_i). As shown in the above procedures, the visual feature vector u_i^q, the audio feature vector a_i^q and the textual feature vector t_i^q are obtained for each shot s_i^q within the video material f_i.

3.2 Computation of Similarity between Video Materials

In this section, in order to compare different kinds of features in the same feature space, canonical correlation analysis is applied to each kind of feature vector obtained in 3.1, and by utilizing the transformed variates, the proposed method defines similarities of video materials for each kind of feature. First, from the visual feature vectors u_i^q, the audio feature vectors a_i^q and the textual feature vectors t_i^q, which are obtained from the video materials f_i, the following three matrices are obtained by the proposed method:

    X_v = H [u_1^1, u_1^2, ..., u_1^{M_1}, ..., u_N^{M_N}]',   (14)
    X_a = H [a_1^1, a_1^2, ..., a_1^{M_1}, ..., a_N^{M_N}]',   (15)
    X_t = H [t_1^1, t_1^2, ..., t_1^{M_1}, ..., t_N^{M_N}]'.   (16)

In the above equations, H is a centering matrix and is defined as follows:

    H = I - (1/N) 1 1',   (17)

where I is the N x N identity matrix and 1 = [1, 1, ..., 1]' is an N x 1 vector. Next, three kinds of coefficient matrices W_k (k in {v, a, t}) are obtained by applying canonical correlation analysis to the three matrices obtained in Eqs. (14)-(16).

[Fig. 2: Overview of the relationships between the three kinds of features in canonical correlation analysis (k = t, l in {v, a}).]

From the obtained W_k (k in {v, a, t}) and Λ_{l,k} (k, l in {v, a, t}, k ≠ l), whose diagonal elements represent the canonical correlation coefficients between k and l, ν_k^{iq} is calculated as follows:

    ν_k^{iq} = Λ_{l,k} W_k' k_i^q   if k ≠ l,
    ν_k^{iq} = W_k' k_i^q           if k = l,   (k, l in {v, a, t})   (18)

where k_i^q represents u_i^q, a_i^q, or t_i^q, i.e., the visual, audio, or textual feature vector of the shot q in the video material f_i, respectively. Furthermore, an overview of the relationships between the transformed variates from each kind of feature vector is shown in Fig. 2. By using the above equation, all of the features are transformed into new variates; that is, they can be transformed into features based on the relationships between the visual, audio and textual features of video materials. In the proposed method, the similarities of visual, audio, and textual features are defined by the similarity matrices S_v, S_a and S_t, respectively. Note that the (i, j)-th (i = 1, 2, ..., N, j = 1, 2, ..., N) elements of these matrices are respectively defined as follows:

    S_v(i, j) = max_{q,q'} (ν_v^{iq})' ν_v^{jq'} / ( ||ν_v^{iq}|| ||ν_v^{jq'}|| ),   (19)
    S_a(i, j) = max_{q,q'} (ν_a^{iq})' ν_a^{jq'} / ( ||ν_a^{iq}|| ||ν_a^{jq'}|| ),   (20)
    S_t(i, j) = max_{q,q'} (ν_t^{iq})' ν_t^{jq'} / ( ||ν_t^{iq}|| ||ν_t^{jq'}|| ).   (21)

Note that q (in {1, 2, ..., M_i}) and q' (in {1, 2, ..., M_j}) represent each shot in the video materials f_i and f_j, respectively. Thus, the similarities of video materials using canonical correlation analysis are calculated by the following procedures.

(i) By applying canonical correlation analysis to the three matrices X_v, X_a, and X_t, which represent the three kinds of features, the transformation of variates is calculated by Eq. (18). (ii) By utilizing the transformed variates ν_v^{iq}, ν_a^{iq}, and ν_t^{iq}, the similarities of video materials for each kind of feature are calculated by Eqs. (19)-(21).

As shown in the above procedures, by applying canonical correlation analysis to each feature vector, the vectors ν_k^{iq} are obtained as semantic feature vectors. Furthermore, the matrices Λ_{k,l} and W_k are determined in such a way that the sum of ||ν_k^{iq} - ν_l^{iq}||^2 (k, l in {v, a, t}) becomes minimum. In the following section, we perform Web community extraction based on each feature by utilizing the obtained similarity matrices and the link relationship between the Web pages of the video materials. Furthermore, the proposed method, which achieves video retrieval by ranking the video materials in the obtained Web community, is explained.

4. Video Retrieval Method Using Web Community Extraction

An overview of video retrieval using Web community extraction is shown in Fig. 3. In the proposed method, users select a desired Web community from many Web communities, and Web communities that are similar to the selected Web community are automatically selected and combined. Furthermore, by ranking video materials in the combined Web community, effective retrieval using video features can be realized. This section is organized as follows. Web community extraction based on features of video materials is presented in 4.1. In 4.2, new similarity measures between Web communities in different features are obtained. Then, by monitoring the degrees of attribution in the three selected communities, ranking of the video materials that effectively utilizes the audio, visual and textual features and the link relationship can be achieved.
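The similarity computation of Sect. 3.2 (Eqs. (19)-(21)), which the weighting below relies on, reduces to a maximum cosine similarity over all pairs of shot variates of two video materials. A minimal sketch (ours, not the authors' code), with random vectors standing in for the CCA-transformed variates ν:

```python
import numpy as np

rng = np.random.default_rng(2)

def feature_similarity(variates_i, variates_j):
    """Eqs. (19)-(21): similarity of two video materials for one feature,
    i.e. the maximum cosine similarity over all pairs of their shot
    variates nu^{iq}, nu^{jq'}."""
    A = np.asarray(variates_i, dtype=float)   # (M_i, d) variates of f_i
    B = np.asarray(variates_j, dtype=float)   # (M_j, d) variates of f_j
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float((A @ B.T).max())

# Toy example: three videos with a few 4-D shot variates each.
videos = [rng.normal(size=(3, 4)), rng.normal(size=(2, 4)), rng.normal(size=(4, 4))]
N = len(videos)
S = np.array([[feature_similarity(videos[i], videos[j]) for j in range(N)]
              for i in range(N)])              # one of S_v, S_a, S_t
print(np.round(S, 3))
```

Because the maximum over shot pairs includes each shot paired with itself, the diagonal of the resulting similarity matrix is 1, and the matrix is symmetric.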
4.1 Web Community Extraction Based on Features of Video Materials

In the field of Web mining, several link analysis methods have been proposed [21], and sets of Web pages that are similar can be obtained by using the Web community extraction method. Specifically, the eigenvector is calculated from a matrix L'L, where L is the N x N adjacency matrix of the N Web pages, and each element of the obtained eigenvector represents the degree of attribution of each Web page to a target Web community of similar topics. In the proposed method, the similarities S_v, S_a and S_t are utilized for calculating the adjacency matrix, and Web community extraction based on video features is realized. First, by utilizing the similarity matrices obtained from Eqs. (19)-(21) and the adjacency matrix L, the weighted adjacency matrices L_v, L_a, L_t are respectively defined.

[Fig. 3: Overview of the video retrieval using Web community extraction.]

The (i, j)-th (i = 1, 2, ..., N, j = 1, 2, ..., N) elements of L_v, L_a and L_t are respectively defined as follows:

    L_v(i, j) = S_v(i, j) L(i, j),   (22)
    L_a(i, j) = S_a(i, j) L(i, j),   (23)
    L_t(i, j) = S_t(i, j) L(i, j).   (24)

In the proposed method, a Web community is extracted by using the obtained weighted adjacency matrices. Specifically, the eigenvectors e_v^i, e_a^i and e_t^i (i = 1, 2, ..., N) are obtained from the matrices L_v'L_v, L_a'L_a and L_t'L_t, respectively. Then, each element of an obtained eigenvector can be regarded as a degree of attribution to the target Web community for each video material. Therefore, the proposed method can perform Web community extraction based on features of each video material.

4.2 Ranking of Video Materials in Web Communities

By using the link analysis algorithm shown in 4.1, degrees of attribution of video materials for each Web community can be obtained by the proposed method. In our approach, one Web community is manually selected by the user from one of the video features, and the most similar Web communities in the other two video features are automatically selected by our method.
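The weighting and eigenvector steps of Eqs. (22)-(24) and the subsequent community extraction can be illustrated on a toy link graph. The link structure and similarity values below are invented for the example; in the paper they come from crawled Web pages and from Eqs. (19)-(21).

```python
import numpy as np

# Toy setup: N = 6 Web pages whose link adjacency L has two groups,
# {0, 1, 2} and {3, 4, 5} (a stand-in for real Web-page link data).
N = 6
L = np.zeros((N, N))
L[:3, :3] = 1.0
L[3:, 3:] = 1.0

# Stand-in similarity matrix (the output of Eqs. (19)-(21)): videos in the
# first group are highly similar (0.9), those in the second less so (0.4).
S = np.zeros((N, N))
S[:3, :3] = 0.9
S[3:, 3:] = 0.4

Lw = S * L                                    # Eqs. (22)-(24): weighted adjacency

# Web community extraction: eigenvectors of Lw' Lw; each element of an
# eigenvector is the degree of attribution of a page to that community.
eigvals, eigvecs = np.linalg.eigh(Lw.T @ Lw)
community = np.abs(eigvecs[:, -1])            # attribution degrees, top community
top_members = set(np.argsort(community)[-3:]) # pages with highest attribution
print(top_members)                            # the tightly linked group {0, 1, 2}
```

The dominant eigenvector concentrates on the group whose links carry the larger similarity weights, which is exactly how the weighting steers community extraction toward videos with similar features.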
Furthermore, by ranking the video materials in the similar Web communities, the proposed method effectively performs video retrieval based on features of each video material. First, from the N eigenvectors e_k^i (i = 1, 2, ..., N; k in {v, a, t}), the user selects one Web community; that is, one eigenvector corresponding to the Web community is selected. Then the proposed method automatically selects the eigenvectors corresponding to the most similar Web communities in the other two features. Specifically, the similarity between two Web communities is defined as the inner product of their eigenvectors. Then, the two eigenvectors that maximize the similarities to the Web community selected from one of the video features are obtained for the other two video features. In this way, the three eigenvectors e_v, e_a and e_t can be obtained by the proposed method. Note that the obtained eigenvectors correspond to the vectors for the Web communities of similar topics. Therefore, from the obtained eigenvectors e_v, e_a and e_t, the proposed method selects the predetermined number of video materials f_i that maximize the following criterion:

    C_i = max{u_i' e_v, u_i' e_a, u_i' e_t}  (i = 1, 2, ..., N),   (25)

where u_i (i = 1, 2, ..., N) is a standard basis vector whose i-th element is one. The i-th element of the standard basis vector u_i represents the theoretical maximum degree of attribution of video material f_i for a target Web community. Thus, criterion C_i represents the maximum degree of attribution of video material f_i obtained from the three eigenvectors for the target Web community. By utilizing criterion C_i, accurate retrieval using video features can be achieved by the proposed method. Then, the Web community concept is used and applied in the proposed method as follows:

(i) Similarities between the visual, audio and textual features of video materials f_i (i = 1, 2, ..., N) and f_j (j = 1, 2, ..., N) are calculated by Eqs. (19)-(21), respectively.

(ii) The adjacency matrix is weighted by the similarities of the visual, audio and textual features by Eqs. (22)-(24), respectively.

(iii) Link analysis is applied to each weighted adjacency matrix, and three sets of Web communities are obtained using visual, audio and textual features by calculating the eigenvectors e_v^i, e_a^i, and e_t^i of L_v'L_v, L_a'L_a, and L_t'L_t.
(iv) By selecting one Web community from the obtained Web communities, two Web communities that are similar to the selected Web community are automatically selected from the other features.

(v) By combining those Web communities, results of retrieval can be obtained on the basis of the degree of attribution of the video materials for the combined Web community.

Thus, the proposed method can achieve video retrieval based on Web community extraction using features of video materials.

5. Experimental Results

The effectiveness of the proposed video retrieval method is verified in this section. It is difficult to apply the proposed method to all video materials on the Web. In the experiment, in order to obtain a large number of video materials, the keyword "sports" was given as a query to YouTube, and results for the top 1000 video materials were obtained. By selecting a desired Web community, results of retrieval that contain video materials with similar topics can be obtained from video materials including many topics by using the proposed method, and hence we just provide one keyword as a query and obtain a large number of video materials including various topics. We set the number of HSV color histogram bins to 48 (H = 12, S = 2, V = 2). The total number of keywords is 6722.

Table 1  Results of retrieval by the proposed method. The keyword "sports" is given as a query, and the Web community whose topic is "sports festival" is selected.

  rank  overview of topics
  1     Introduction of a sports car
  2     Sports festival
  3     Sports festival
  4     Sports festival
  5     Sports festival
  6     Sporting-goods company commercial
  7     Sports festival
  8     Sports festival
  9     Sports festival
  10    Sports festival

Table 2  Results of retrieval by YouTube. The keyword "sports" is given as a query.

  rank  overview of topics
  1     Sports bloopers
  2     Promotional film of music whose title contains "sports"
  3     Omnibus of various sports
  4     Interviews about sports
  5     Promotional film of a sports car
  6     Sports festival
  7     Sports news program
  8     Sports festival
  9     Introduction of a sports car in an auto show
  10    Sports bloopers
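The ranking criterion of Sect. 4.2 (Eq. (25)) that produces rankings such as Table 1 reduces, since u_i is the i-th standard basis vector, to an element-wise maximum over the three selected eigenvectors followed by sorting. A sketch with invented attribution values:

```python
import numpy as np

# Toy degrees of attribution for N = 5 video materials: one selected
# community eigenvector per feature (invented stand-ins for e_v, e_a, e_t).
e_v = np.array([0.70, 0.10, 0.05, 0.60, 0.20])
e_a = np.array([0.65, 0.15, 0.10, 0.10, 0.55])
e_t = np.array([0.10, 0.72, 0.05, 0.50, 0.15])

# Eq. (25): C_i = max{u_i' e_v, u_i' e_a, u_i' e_t}, i.e. the element-wise
# maximum over the three eigenvectors.
C = np.maximum.reduce([e_v, e_a, e_t])

# Rank video materials by their maximum degree of attribution.
ranking = np.argsort(C)[::-1]
print(ranking[:3])
```

A video material is therefore ranked highly if it belongs strongly to the combined Web community under any one of the three features, not necessarily all three.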
Results of retrieval obtained by applying the proposed method to the obtained video materials and the results of retrieval by YouTube are shown in Table 1 and Table 2, respectively. It can be seen that the proposed method achieves accurate retrieval of video materials whose topics are similar to each other. In the proposed method, canonical correlation analysis is applied to the three kinds of features extracted from the audio-visual signals and the text of the Web pages, which makes it possible to calculate the similarity of semantic features between video materials for each feature. Furthermore, by using the link analysis algorithm based on the obtained similarities and the link relationships of the Web pages containing the video materials, Web communities of similar topics can be extracted for each feature. Consequently, results of retrieval that contain similar video materials sharing a similar topic are obtained by using the proposed method. Retrieval by YouTube (http://www.youtube.com/) seems to provide video materials whose keywords are the same as the given

HATAKEYAMA et al.: A NOVEL VIDEO RETRIEVAL METHOD BASED ON WEB COMMUNITY EXTRACTION

query keywords. Consequently, by giving a keyword such as "sports" that covers various topics as a query, YouTube shows results of retrieval that contain various video materials including the same keyword as the query. Compared to the results obtained by the YouTube approach, it can be seen that the proposed method can realize video retrieval based on a Web community containing video materials of similar topics.

The keyword "animal" was also given as a query to YouTube, and the top 1000 video materials were obtained. We set the number of HSV color histogram bins to 48 (H = 12, S = 2, V = 2). The total number of keywords is 7598. Results of retrieval obtained by applying the proposed method to the obtained video materials and the results of retrieval by YouTube are shown in Table 3 and Table 4, respectively.

Table 3  Results of retrieval by the proposed method. The keyword "animal" is given as the query, and the Web community whose topic is "video games whose titles contain 'animal'" is selected.

  rank  overview of topic
  1     Playing a video game whose title contains "animal"
  2     Playing a video game whose title contains "animal"
  3     Playing a video game whose title contains "animal"
  4     Playing a video game whose title contains "animal"
  5     Educational program about animals
  6     Promotional film of music whose title contains "animal"
  7     Promotional film of music whose title contains "animal"
  8     Playing a video game whose title contains "animal"
  9     Playing a video game whose title contains "animal"
  10    Video about animal experiments

Table 4  Results of retrieval by YouTube. The keyword "animal" is given as the query.

  rank  overview of topic
  1     Educational program whose characters are animal mascots
  2     Promotional film of music whose title contains "animal"
  3     Educational program for children
  4     Funny animal film
  5     Playing a video game whose title contains "animal"
  6     Professional wrestling match
  7     Promotional film of music whose musician's name contains "animal"
  8     Playing a video game whose title contains "animal"
  9     Slide show of animal pictures
  10    Promotional film of music whose title contains "animal"
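The 48-bin HSV color histogram used in both experiments (H = 12, S = 2, V = 2) can be sketched as below; the bin layout and the normalization are our assumptions, since the paper does not specify them.

```python
import colorsys

def hsv_histogram(pixels, h_bins=12, s_bins=2, v_bins=2):
    """48-bin HSV color histogram (H = 12, S = 2, V = 2).

    `pixels` is an iterable of (r, g, b) tuples with values in [0, 255].
    Returns a histogram normalized to sum to 1 (empty input -> zeros).
    """
    hist = [0] * (h_bins * s_bins * v_bins)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)  # quantize each channel
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
        n += 1
    return [c / n for c in hist] if n else hist
```

In the method, one such histogram per frame is aligned into a feature vector before shot segmentation and keyframe selection.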
It can also be seen that the proposed method can obtain results of retrieval based on a Web community containing video materials of similar topics. Thus, by applying the proposed method to a large number of video materials that cover various topics, effective retrieval based on Web community extraction can be realized.

Next, we quantitatively verify the effectiveness of the proposed video retrieval method. We used recall and precision, which are defined as follows:

  Recall = (Num. of correctly retrieved video materials) / (Num. of relevant video materials),   (26)

  Precision = (Num. of correctly retrieved video materials) / (Num. of retrieved video materials).   (27)

Results of retrieval obtained by using the proposed approach and by using two conventional approaches based on the following two similarities are shown in Fig. 4.

Fig. 4  Quantitative evaluation of the proposed method and conventional methods (i) and (ii) based on precision and recall.

Conventional approach (i): Similarities between Web pages are defined on the basis of the text of the Web pages. Specifically, by applying TF-IDF to the keywords in the Web pages, weights corresponding to all keywords are calculated. Furthermore, by aligning the obtained weights, a feature vector is obtained. Similarities between Web pages are then defined as the inner product of the obtained feature vectors, as in [20].

Conventional approach (ii): The HSV color histogram with p bins is calculated for each frame of the video material. Then, by aligning the elements of the obtained HSV color histogram, a feature vector is extracted for each frame. Next, the shots of the video material are obtained by the shot segmentation method, and the vector median of the feature vectors of the frames in each shot is taken as its keyframe. Similarities between the Web pages containing the video materials are defined on the basis of the Histogram Intersection [22] computed over all keyframes of the video materials.

From this figure, we can see that the proposed method keeps higher precision and recall than the conventional methods.
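As a concrete reading of Eqs. (26) and (27), a small helper (function and argument names are ours, not the authors'):

```python
def recall_precision(retrieved, relevant):
    """Recall (Eq. (26)) and precision (Eq. (27)).

    retrieved: list of video-material IDs returned by a system;
    relevant:  set of IDs judged relevant to the query.
    """
    correct = sum(1 for v in retrieved if v in relevant)
    recall = correct / len(relevant) if relevant else 0.0
    precision = correct / len(retrieved) if retrieved else 0.0
    return recall, precision

# e.g. 10 retrieved results of which 6 are among 12 relevant materials:
# recall = 6/12 = 0.5, precision = 6/10 = 0.6
```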
Thus, the results of retrieval contain more of the desired video materials than those of the conventional methods. The conventional methods cannot utilize the different kinds of features obtained from the video materials. In the proposed method, the following two novel approaches are introduced. First, canonical correlation analysis is applied to the three kinds of feature vectors obtained from the video materials on the Web, so that these different kinds of feature vectors can be compared in the same variate space. By applying link analysis to the three adjacency matrices obtained from the similarities of each kind of feature, Web communities are extracted using the features of the video materials. Second, the Web communities extracted from the different adjacency matrices are combined on the basis of their similarities. Thus, by utilizing these novel approaches, the proposed method achieves a noticeable

improvement over the conventional approaches. Therefore, the effectiveness of the proposed approach was quantitatively confirmed by this experiment.

Finally, we show the computation time of the proposed method. The proposed method consists of three computation parts: video feature extraction, Web community extraction, and ranking of the video materials in the Web community. We first show the computation time of video feature extraction on a computer with dual Quad-Core Intel Xeon E5420 2.50 GHz processors and 32 Gbytes of RAM. Using 60-sec video materials, the computation times for the visual and audio features obtained from a video material are shown in Table 5. Assuming that the total number of keywords that appear on the Web pages is 6722, the computation time for textual feature extraction from the text of a Web page is also shown in Table 5.

Table 5  Computation time of video feature extraction. (The audio signal was sampled at 44.1 kHz. The image size was 320 × 240 pixels and the frame rate was 15 fps.)

  visual feature    31 sec
  audio feature     15 sec
  textual feature   0.5 sec

These features only need to be extracted once, before Web community extraction and ranking of the video materials in the Web community, and thus their extraction is not contained in the actual retrieval time. The computation times for Web community extraction and for ranking the video materials in the Web community using 1000 video materials are shown in Table 6.

Table 6  Computation time of Web community extraction and of ranking the video materials in the Web community using 1000 video materials.

  Web community extraction      19.5 sec
  Ranking the video materials   1.1 sec

The Web communities can also be extracted in advance, before the ranking of the video materials in the Web community. Thus, the computation time for Web community extraction is not contained in the actual retrieval time. Furthermore, the computation time for ranking the video materials includes the combination of the Web communities that are similar to each other.
This procedure can be performed before the retrieval process, and it should therefore be possible to further reduce the computation time for ranking the video materials in the Web community.

6. Conclusions

In this paper, we have proposed a novel video retrieval method based on Web community extraction using audiovisual and textual features of video materials. The proposed method consists of two procedures: Web community extraction and video retrieval using the obtained Web communities. In the first procedure, Web communities of video materials covering similar topics are extracted for each feature. In the second procedure, the proposed method calculates the similarities of the degrees of attribution of the video materials between the Web communities extracted from the different video features, and the desired communities are automatically selected. Then, by ranking the video materials in the obtained Web communities, the proposed method can perform video retrieval. Our experimental results verified the superiority of the proposed video retrieval method. Since accurate video retrieval can be achieved by our method, it can be applied to existing applications for video retrieval. However, the proposed method sometimes obtains Web communities that contain video materials whose topics are not similar. Therefore, improvement of the Web community extraction is needed. This will be the subject of subsequent studies.

Acknowledgment

This work was supported under the project "Regional Innovation Creation R&D Programs."

References

[1] J. Kleinberg, "Bursty and hierarchical structure in streams," KDD '02: Proc. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.91-101, ACM, New York, NY, USA, 2002.
[2] T. Nanno, T. Fujiki, Y. Suzuki, and M. Okumura, "Automatically collecting, monitoring, and mining Japanese weblogs," WWW Alt. '04: Proc. 13th International World Wide Web Conference on Alternate Track Papers & Posters, pp.320-321, ACM, New York, NY, USA, 2004.
[3] R. Kumar, J. Novak, P. Raghavan, and A.
Tomkins, "On the bursty evolution of blogspace," WWW '03: Proc. 12th International Conference on World Wide Web, pp.568-576, ACM, New York, NY, USA, 2003.
[4] J.Z. Wang, "SIMPLIcity: A region-based retrieval system for picture libraries and biomedical image databases," MULTIMEDIA '00: Proc. Eighth ACM International Conference on Multimedia, pp.483-484, ACM, New York, NY, USA, 2000.
[5] J. Li, J.Z. Wang, and G. Wiederhold, "IRM: Integrated region matching for image retrieval," MULTIMEDIA '00: Proc. Eighth ACM International Conference on Multimedia, pp.147-156, ACM, New York, NY, USA, 2000.
[6] S. Feng, R. Manmatha, and V. Lavrenko, "Multiple Bernoulli relevance models for image and video annotation," CVPR 2004: Proc. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.2, pp.II-1002-II-1009, June-July 2004.
[7] K. Barnard and D. Forsyth, "Learning the semantics of words and pictures," IEEE International Conference on Computer Vision, vol.2, p.408, 2001.
[8] C.G.M. Snoek, M. Worring, J.C. van Gemert, J.M. Geusebroek, and A.W.M. Smeulders, "The challenge problem for automated detection of 101 semantic concepts in multimedia," MULTIMEDIA '06: Proc. 14th Annual ACM International Conference on Multimedia, pp.421-430, ACM, New York, NY, USA, 2006.
[9] X. Li, D. Wang, J. Li, and B. Zhang, "Video search in concept subspace: A text-like paradigm," CIVR '07: Proc. 6th ACM International Conference on Image and Video Retrieval, pp.603-610, ACM, New York, NY, USA, 2007.
[10] W. Jiang, S.F. Chang, T. Jebara, and A.C. Loui, "Semantic concept classification by joint semi-supervised learning of feature subspaces and support vector machines," European Conference on Computer Vision, pp.270-283, 2008.
[11] H. Hotelling, "Relations between two sets of variates," Biometrika, vol.28, no.3-4, pp.321-377, 1936.
[12] P. Horst, "Relations among m sets of measures," Psychometrika,

vol.26, no.2, pp.129-149, June 1961.
[13] A.A. Nielsen, "Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data," IEEE Trans. Image Process., vol.11, no.3, pp.293-305, March 2002.
[14] H. Zhang, Y. Zhuang, and F. Wu, "Cross-modal correlation learning for clustering on image-audio dataset," MULTIMEDIA '07: Proc. 15th International Conference on Multimedia, pp.273-276, ACM, New York, NY, USA, 2007.
[15] A. Nagasaka and Y. Tanaka, "Automatic video indexing and full-video search for object appearances," Proc. IFIP TC2/WG 2.6 Second Working Conference on Visual Database Systems II, pp.113-127, North-Holland Publishing, Amsterdam, The Netherlands, 1992.
[16] J. Astola, P. Haavisto, and Y. Neuvo, "Vector median filters," Proc. IEEE, vol.78, no.4, pp.678-689, April 1990.
[17] N. Nitanda and M. Haseyama, "Audio-based shot classification for audiovisual indexing using PCA, MGD and fuzzy algorithm," IEICE Trans. Fundamentals, vol.E90-A, no.8, pp.1542-1548, Aug. 2007.
[18] T. Zhang and C.C.J. Kuo, "Audio content analysis for online audiovisual data segmentation and classification," IEEE Trans. Speech Audio Process., vol.9, no.4, pp.441-457, May 2001.
[19] Z. Liu, Y. Wang, and T. Chen, "Audio feature extraction and analysis for scene segmentation and classification," J. VLSI Signal Processing Systems, vol.20, pp.61-79, 1998.
[20] F. Sebastiani, "Machine learning in automated text categorization," ACM Comput. Surv., vol.34, no.1, pp.1-47, 2002.
[21] J.M. Kleinberg, "Authoritative sources in a hyperlinked environment," J. ACM, vol.46, no.5, pp.604-632, 1999.
[22] M.J. Swain and D.H. Ballard, "Color indexing," Int. J. Comput. Vis., vol.7, no.1, pp.11-32, 1991.

Satoshi Asamizu received his B.S. and M.S. degrees in Material Science and Engineering from Muroran Institute of Technology, Japan, in 1994 and 1996, respectively. He received his Ph.D. degree in Electronics and Information Engineering from Hokkaido University, Japan, in 1999.
He is currently an associate professor at the Kushiro National College of Technology. His research interests are in digital image processing and its applications.

Miki Haseyama received her B.S., M.S. and Ph.D. degrees in Electronics from Hokkaido University, Japan, in 1986, 1988 and 1993, respectively. She is currently a professor in the Graduate School of Information Science and Technology, Hokkaido University. Her research interests are in digital signal processing and its applications.

Yasutaka Hatakeyama received his B.S. degree in Electrical and Electronic Engineering from the National Institution for Academic Degrees and University Evaluation, Japan, in 2008. He is currently pursuing an M.S. degree at the Graduate School of Information Science and Technology, Hokkaido University. His research interests include audiovisual processing and Web mining.

Takahiro Ogawa received his B.S., M.S. and Ph.D. degrees in Electronics and Information Engineering from Hokkaido University, Japan, in 2003, 2005 and 2007, respectively. He is currently an assistant professor in the Graduate School of Information Science and Technology, Hokkaido University. His research interests are in digital image processing and its applications.