Robust Shot Boundary Detection from Video Using Dynamic Texture

Sensors & Transducers
2014 by IFSA Publishing, S. L.
http://www.sensorsportal.com

1, 3 Peng Tale, 2 Zhang Wenjun
1 School of Communication & Information Engineering, Shanghai University, Shanghai, 200072, China
2 School of Film and TV Arts & Technology, Shanghai University, Shanghai 200072, China
3 School of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, 235000, China
E-mail: talep@63.com

Received: 4 March 2014 / Accepted: 28 March 2014 / Published: 31 March 2014

Abstract: Video boundary detection is a basic subject in computer vision and is important to video analysis and video understanding. Existing video boundary detection methods are usually effective only for certain types of video data and have relatively low generalization ability. We present a novel shot boundary detection algorithm based on video dynamic texture. Firstly, two adjacent frames are read from a given video and normalized to the same size. Secondly, we divide these frames into sub-domains according to the same standard. Next, we calculate the average gradient direction of each sub-domain and form the dynamic texture. Finally, the dynamic textures of the adjacent frames are compared. We have carried out experiments on different types of video data. The experimental results show that our method has high generalization ability: across different types of video, our algorithm achieves higher average precision and average recall than some existing algorithms. Copyright 2014 IFSA Publishing, S. L.

Keywords: Shot boundary detection, Average gradient direction, Dynamic texture, Texture measure.

1. Introduction

Video is an important form of multimedia data. With the development of the Internet and electronics technology, video data are rapidly increasing on video websites, in libraries, on mobile devices, etc. Within such vast amounts of data, we always hope to find the video data we need. Because of the complexity of video structure, it is difficult to retrieve video from such large databases. Therefore, there is a need for efficient and accurate video retrieval algorithms.
Among these algorithms, an important approach is to break the video into shots and extract a key frame for each shot, so that video retrieval can be converted into image-based retrieval. A shot is a continuous sequence of frames depicting some scene or event; the video structure is shown in Fig. 1.

Article number P_RP_0084

2. Review of Existing Techniques

Shot boundary detection, a fundamental problem of computer vision, has been intensively studied in the past several years. Several methods have been published in the literature to solve the shot boundary detection problem [1-8]. These shot boundary

detection techniques can be classified into four approaches, namely, 1) pixel comparison methods; 2) histogram comparison methods; 3) edge comparison methods; and 4) methods based on machine learning.

Fig. 1. Video structure.

1) Pixel comparison methods are based on the assumption that the pixels of adjacent frames differ. Thus, we may define a threshold and obtain the difference between adjacent frames by calculating their pixel values (grey level, color value, etc.). According to this difference, it can be judged whether a shot boundary occurs. Pixel comparison methods have low computational complexity, but factors such as pixel brightness changes caused by the movement of cameras and video objects, or illumination variation, strongly influence the results and can easily cause false shot boundaries.

2) Histogram comparison methods use the gray histogram to compare adjacent frames. The histogram of a frame quantizes the brightness, gray level or color of each pixel into several levels and counts the number of pixels at each level, so that frames can be compared. On the basis of these statistics, the methods obtain good boundary detection results for video with slowly moving objects and slow camera movement. On the one hand, these methods have low computational complexity; on the other hand, when the light intensity changes or shots contain quick movement, the histogram is distorted and detection errors result.

3) Edge comparison methods rely on the assumption that when a shot change happens, the edges in the new shot are far from those in the previous one. These methods easily detect shots with simple structure.

4) Machine learning methods usually train on frame features to find the shot boundary. Such methods include SVM, K-means, the AdaBoost algorithm, etc.

Texture is an important image feature and annotation method for objects within an image and for the whole image. In this paper we extend the image texture concept to video and form a dynamic texture. Dynamic texture [9] can be used to describe dynamic, time-sequenced image objects such as video shots. In general, for adjacent frames of the same shot, if the objects within a frame move in a local area of the frame, the frame texture feature changes only locally too. Based on this, this paper proposes a new concept of video dynamic texture. The video frame is segmented into sub-domains (image blocks) of the same size. We then calculate the average gradient direction of each sub-domain and form the "video dynamic texture" from these average gradient directions. Then, according to the change of the video dynamic texture between adjoining frames, we can determine whether there is a large change, that is, a boundary, between the adjacent frames.

3. Background and Detection

Texture is a basic feature of the image. Viewed locally, image texture is disordered, whereas viewed globally it is ordered; it is easy to perceive but difficult to define. Video texture is abstract: its definitions differ with the understanding of texture and depend on the specific application. In this paper we treat a group of average gradient directions as the video dynamic texture. The video dynamic texture of color video differs from that of gray video and is more difficult to calculate. Firstly, let us discuss the video dynamic texture of gray video. For a grayscale image block (sub-domain), the average gradient direction of the sub-domain reflects the regional gray intensity. If we segment an image into several sub-domains and calculate the average gradient direction of each sub-domain, we quantitatively obtain the gray distribution. In this work, we take a video clip, read the adjacent frames $f_i$ and $f_{i+1}$, normalize the frame size, and calculate the average gradient direction $\mathrm{Gradient}_{i,j}$ of each sub-domain. The matrices $\mathrm{Matrix}_i$ and $\mathrm{Matrix}_{i+1}$ (average gradient direction matrices) are thus formed from the $\mathrm{Gradient}_{i,j}$ values. The next step is to compare $\mathrm{Matrix}_i$ with $\mathrm{Matrix}_{i+1}$.
The flowchart of the algorithm is shown in Fig. 2.

3.1. Initialization Operation

In a video stream, the clips can come from different cameras and different post-production processes, so frame sizes may be inconsistent among shots. To facilitate comparison, all frames must be the same size. In this paper all frames are normalized to a given size.
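The normalization step can be illustrated with a minimal sketch. The paper does not state which resampling scheme is used, so nearest-neighbour sampling with NumPy is an assumption here, and the function name `normalize_frame` and its parameters are hypothetical.

```python
import numpy as np

def normalize_frame(frame: np.ndarray, height: int, width: int) -> np.ndarray:
    """Resize a frame (H x W or H x W x 3) to height x width.

    Assumption: nearest-neighbour sampling; the paper only says that
    all frames are normalized to a given size.
    """
    src_h, src_w = frame.shape[:2]
    # For each target row/column, pick the source row/column it maps to.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    # Broadcasting the row indices against the column indices selects a grid.
    return frame[rows[:, None], cols]

if __name__ == "__main__":
    frame = np.arange(4 * 6 * 3).reshape(4, 6, 3)
    small = normalize_frame(frame, 2, 3)
    print(small.shape)
```

Any resampling method would serve the same purpose, as long as both adjacent frames end up with identical dimensions before they are divided into sub-domains.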

Fig. 2. Formation of video dynamic texture.

3.2. Correlative Concepts Definition

For a normalized frame $f^*$, we can calculate the gradient of a sub-domain by several methods. Obtaining the sub-domain gradient is more difficult for a color frame than for a gray image. Because color images have more applications, this paper uses color video data as experiment data.

For a gray image, the gradient and the gradient direction can be calculated by Formula 1 to Formula 4. We assume that the gradient of the gray (brightness) function $H(x, y)$ at the point $(x, y)$ is a vector with magnitude and direction. $G_x$ and $G_y$ represent the gradient along the x direction and the y direction, and $H(x, y)$ represents the grey value of pixel $(x, y)$. Then the gradient vector can be represented as follows:

$$G_x = H(x+1, y) - H(x-1, y), \tag{1}$$

$$G_y = H(x, y+1) - H(x, y-1), \tag{2}$$

The gradient value $G(x, y)$ is:

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \tag{3}$$

The gradient direction is:

$$\theta(x, y) = \arctan\big(G_y(x, y) / G_x(x, y)\big), \tag{4}$$

In this paper, the sub-domain gradient is calculated in the three fields (R, G, B) of RGB color space, and the final gradient is obtained by averaging the gradients from the R, G and B fields. The average gradient of a frame sub-domain (FSAGT), the average gradient direction of a frame sub-domain (FSAGTD) and the dynamic texture are defined as follows:

Definition 1. Frame sub-domain average gradient (FSAGT). In RGB color space, for a sub-domain $D_{i,j}$ of a given frame, we assume $D_{i,j} \in R^{L \times L}$ (L is odd). FSAGT is the average gradient value of the sub-domain pixels except for boundary pixels.

$$\mathrm{FSAGT}(I, J)_x = \sum_{k=1}^{L-2} \sum_{z=R,G,B} \big( H_z(x + k + L \cdot i + 1,\, y) - H_z(x + k + L \cdot i - 1,\, y) \big) / 3, \tag{5}$$

$$\mathrm{FSAGT}(I, J)_y = \sum_{k=1}^{L-2} \sum_{z=R,G,B} \big( H_z(x,\, y + k + L \cdot j + 1) - H_z(x,\, y + k + L \cdot j - 1) \big) / 3, \tag{6}$$

where $i, j = 0, 1, 2$.

Definition 2. Frame sub-domain average gradient direction (FSAGTD). In RGB color space, for a sub-domain $D_{i,j}$ of a given frame, we assume $D_{i,j} \in R^{L \times L}$ (L is odd). The FSAGTD of the frame, $\mathrm{Gradient}_{I,J}$, can be described as:

$$\mathrm{Gradient}_{I,J} = \arctan\big(\mathrm{FSAGT}(I, J)_y / \mathrm{FSAGT}(I, J)_x\big), \tag{7}$$

Definition 3. Video dynamic texture. For a given frame $f$, we first normalize it to $f^*$.
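The frame $f^*$ is segmented into $M \times N$ areas. Then we calculate the gradient direction of each sub-domain. At last, a matrix is made up of the $M \times N$ gradient directions $\mathrm{Gradient}_{i,j}$:

$$W = \begin{pmatrix} \mathrm{Gradient}_{0,0} & \mathrm{Gradient}_{0,1} & \cdots & \mathrm{Gradient}_{0,N-1} \\ \mathrm{Gradient}_{1,0} & \mathrm{Gradient}_{1,1} & \cdots & \mathrm{Gradient}_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Gradient}_{M-1,0} & \mathrm{Gradient}_{M-1,1} & \cdots & \mathrm{Gradient}_{M-1,N-1} \end{pmatrix}$$

The matrix $W$ is named the dynamic texture of the frame.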
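Definitions 1 to 3 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' Matlab implementation: x is taken to be the column index, gradients are central differences on the interior pixels of each sub-domain, and `np.arctan2` replaces the plain arctangent of a ratio so that a zero horizontal gradient does not cause a division by zero. The names `dynamic_texture` and `block` are hypothetical.

```python
import numpy as np

def dynamic_texture(frame: np.ndarray, block: int = 3) -> np.ndarray:
    """Return W, the average gradient direction of each block x block sub-domain.

    Assumptions: central differences, x = column index, and arctan2
    in place of arctan(G_y / G_x).
    """
    img = frame.astype(np.float64)
    h, w = img.shape[:2]
    m, n = h // block, w // block          # number of sub-domains per axis
    texture = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            sub = img[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # Central differences on interior pixels; boundary pixels are excluded.
            gx = sub[1:-1, 2:] - sub[1:-1, :-2]
            gy = sub[2:, 1:-1] - sub[:-2, 1:-1]
            # Averaging over interior pixels and colour channels; any common
            # positive scale factor cancels when the direction is taken.
            texture[i, j] = np.arctan2(gy.mean(), gx.mean())
    return texture

if __name__ == "__main__":
    # A frame whose intensity grows from left to right in every channel.
    ramp = np.tile(np.arange(6.0)[None, :, None], (6, 1, 3))
    print(dynamic_texture(ramp))
```

For a frame whose intensity increases only along the horizontal axis, every sub-domain has a zero vertical gradient, so all entries of the returned matrix are zero; a vertical ramp gives a direction of π/2 everywhere.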

4. Dynamic Texture Measure

Image texture features can be measured by coarseness, contrast, line-likeness, regularity, roughness and directionality [10]. Similarity among image textures may be measured with statistical moments of the histogram, the gray level co-occurrence matrix, spectral measures and the fractal dimension. The video dynamic texture in this paper is different from traditional image texture: it is defined by the average gradient direction of frame sub-domains. The texture we define reflects the dynamic global characteristics of video frames and also the local characteristics of the image. For two adjacent frames, the change in gradient direction of the same sub-domain reflects the similarity of the two frames at that sub-domain, and the change in dynamic texture reflects the similarity of the two frames as a whole. For adjacent frames, the dynamic textures can be compared by Formula 8 and Formula 9.

$$\frac{\left| \mathrm{Gradient}^{i+1}_{j,k} - \mathrm{Gradient}^{i}_{j,k} \right|}{\sqrt{\big(\mathrm{Gradient}^{i+1}_{j,k}\big)^2 + \big(\mathrm{Gradient}^{i}_{j,k}\big)^2}} < \delta, \tag{8}$$

$$\Delta W = W_{i+1} - W_i, \tag{9}$$

In Formula 8, $\delta$ is a given empirical decimal. Due to many factors, frames always contain noise. If Formula 8 holds, we consider that the gradient direction of sub-domain $(j, k)$ does not change. Let $N_s$ be the number of matrix elements that do not satisfy Formula 8 for the adjacent frames, that is, the number of sub-domains whose gradient direction changes. For a given shot, if matrix $\Delta W$ satisfies Formula 10, we consider $\Delta W$ sparse, and frame $f_{i+1}$ belongs to the same shot. Otherwise $\Delta W$ is a dense matrix, and frame $f_{i+1}$ does not belong to the same shot.

$$N_s / N_{all} < \varepsilon, \tag{10}$$

where $N_{all}$ is the total number of matrix elements and $\varepsilon$ is a given decimal.

5. Shot Boundary Detection Algorithm Based on Dynamic Texture

The basic steps of our method are as follows:

Step 1. For a given video clip, the algorithm reads two adjacent frames $f_i$ and $f_{i+1}$. In RGB space, the two frames are normalized and divided into sub-domains. Then we calculate their dynamic textures $W_i$ and $W_{i+1}$.

Step 2. According to Formula 9 and Formula 10, the algorithm calculates $\Delta W$ and estimates the sparsity of $\Delta W$.

Step 3.
If $\Delta W$ is dense, we consider $f_{i+1}$ a candidate first frame of a new shot. Otherwise, we set $f_i = f_{i+1}$ and the algorithm returns to Step 1.

Step 4. Detection judgment. For some video, such as video containing a single fluid object, there may be an obvious difference between the dynamic textures of two adjacent frames even though we cannot confirm that the two frames belong to two shots. Following the algorithm of [11], we compare the histograms of the two frames. If the histograms also differ obviously, the two frames belong to two different shots; otherwise, they belong to the same shot.

6. Results and Analysis of Experiment

To evaluate the performance of the algorithm in this paper, we chose 40 video clips of several kinds (news, sports, advertising, movie, etc.) from the Internet to compare several methods. Typical video data are shown in Fig. 3.

Fig. 3. Typical video data.

The experimental platform is the Windows 7 operating system. The computer configuration for this experiment is as follows: Intel Core 2 Quad CPU Q8300, 2 GB memory and Matlab 2010a. We compared our method with two very commonly used methods: one is based on the color histogram, the other on pixel comparison. We implemented and ran all algorithms under the same development environment to allow a fair timing comparison. The comparison among the three algorithms is based on shot average precision (AVS precision) and shot average recall (AVS recall). AVS precision and AVS recall are defined as follows:

$$AVS_{precision} = \frac{1}{n} \sum_{i=1}^{n} \frac{Nt_i}{Na_i} \times 100\%, \tag{11}$$

$$AVS_{recall} = \frac{1}{n} \sum_{i=1}^{n} \frac{Nt_i}{Nt_i + Nf_i} \times 100\%, \tag{12}$$

where $n$ is the number of videos, $Nt_i$ is the number of correctly detected shots of video $i$, $Na_i$ is the number of shots which have been detected, and $Nf_i$ is the number of missed shots.

In the experiments, we set the sub-domain size of the frame to 3 × 3. Why this size? If the size is smaller, the running time of the algorithm increases. If the size is larger, the change of gradient of the frame cannot be described accurately.

6.1. Experimental Data Analysis

Formula 11 and Formula 12 accurately reflect the shot boundary detection results over a set of videos. The experimental results are shown in Table 1 to Table 4. The sports videos contain 723 shots. Our method detected 632 boundaries, of which 583 are correct. As Table 1 shows, the average precision and average recall reached 85.37 % and 85.6 %. As can be seen from Table 1 to Table 4, our method achieves high boundary detection performance for both cut shots and gradual shots, and also has high generalization ability.

Table 1. Experiment result for sports video.

Algorithm          Number of videos   AVS precision (%)   AVS recall (%)
Pixel comparison   10                 83.2                85.09
Color histogram    10                 77.36               84.97
Our method         10                 85.37               85.6

Table 2. Experiment result for movie.

Algorithm          Number of videos   AVS precision (%)   AVS recall (%)
Pixel comparison   10                 86.09               78.36
Color histogram    10                 80.6                82.87
Our method         10                 93.8                88.53

Table 3. Experiment result for news video.

Algorithm          Number of videos   AVS precision (%)   AVS recall (%)
Pixel comparison   10                 89.59               85.46
Color histogram    10                 86.4                86.08
Our method         10                 93                  91.72

Table 4. Experiment result for advertisement.

Algorithm          Number of videos   AVS precision (%)   AVS recall (%)
Pixel comparison   10                 93                  83.4
Color histogram    10                 91.74               83.4
Our method         10                 92.8                84.88

7. Conclusion

For several kinds of video data, this paper presented a novel video boundary detection method based on dynamic texture. We define a novel video dynamic texture based on the gradient direction of frame sub-domains, which reflects both local and global frame changes. The shot boundary detection method is effective not only for abrupt shots but also for gradual shots. We have experimented with sports video, movie, news video and advertisement video, with good results.
The approach also has some shortcomings that need improvement, such as how to select δ and ε. In future work, we will combine it with other information to improve the method and obtain better generalization ability and better detection results.

Acknowledgements

This work was financially supported by the University Science Research Key Project of Anhui, China (Grant No. KJ200A304).

References

[1]. E. Sahouria, A. Zakhor, Content Analysis of Video Using Principal Components, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 8, December 1999, pp. 1290-1298.
[2]. E. Stringa, C. S. Regazzoni, Real-time Video-shot Detection for Scene Surveillance Applications, IEEE Trans. on Image Processing, Vol. 9, No. 1, January 2000, pp. 69-79.
[3]. I. Koprinska, S. Carrato, Video segmentation of MPEG compressed data, in Proceedings of the IEEE Int. Conf. on Electronics, Circuits and Systems, Vol. 2, 1998, pp. 243-246.
[4]. Zhang H. J., Kankanhalli A., Smoliar S., Automatic partitioning of full-motion video, Multimedia Systems, 1, 1, 1993, pp. 10-28.
[5]. Ford Ralph M., Robson Craig, Temple Daniel, et al., Metrics for shot boundary detection in digital video sequences, Multimedia Systems, 8, 1, 2000, pp. 37-46.
[6]. Zhu Xi, Lin Xing-gang, Survey on video temporal segmentation, Chinese Journal of Computer, 27, 8, 2004, pp. 1027-1035.
[7]. Nagasaka A., Tanaka Y., Automatic video indexing and full-video search for object appearances, in Proceedings of the IFIP 2nd Workshop Conference on

Visual Database System II, Budapest, Hungary, 1992, pp. 113-127.
[8]. Zhi-Cheng Zhao, An-Ni Cai, Shot Boundary Detection Algorithm in Compressed Domain Based on Adaboost and Fuzzy Theory, Advances in Natural Computation, 2006, pp. 617-626.
[9]. R. C. Nelson, R. P. Polana, Qualitative Recognition of Motion Using Temporal Texture, CVGIP: Image Understanding, 56, 1, 1992, pp. 78-89.
[10]. Tamura H., Mori S., Yamawaki T., Textural features corresponding to visual perception, IEEE Transactions on Systems, Man and Cybernetics, 8, 6, 1978, pp. 460-473.
[11]. Dugad R., Ratakonda K., Ahuja N., Robust Video Shot Change Detection, in Proceedings of the IEEE Workshop on Multimedia Signal Processing, 1998, pp. 376-381.

2014 Copyright, International Frequency Sensor Association (IFSA) Publishing, S. L. All rights reserved. (http://www.sensorsportal.com)