A Clustering Algorithm for Key Frame Extraction Based on Density Peak

Journal of Computer and Communications, 2018, 6, 118-128
http://www.scirp.org/journal/jcc
ISSN Online: 2327-5227
ISSN Print: 2327-5219

Hong Zhao 1, Tao Wang 1, Xiangyan Zeng 2

1 School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China
2 Department of Mathematics and Computer Science, Fort Valley State University, Fort Valley, GA, USA

How to cite this paper: Zhao, H., Wang, T. and Zeng, X.Y. (2018) A Clustering Algorithm for Key Frame Extraction Based on Density Peak. Journal of Computer and Communications, 6, 118-128. https://doi.org/10.4236/jcc.2018.612012

Received: December 17, 2018
Accepted: December 23, 2018
Published: December 26, 2018

Abstract

Aiming at the problem of video key frame extraction, a density peak clustering algorithm is proposed. It uses the HSV histogram to transform high-dimensional abstract video image data into quantifiable low-dimensional data, which reduces the computational complexity while capturing image features. On this basis, the density peak clustering algorithm is used to cluster these low-dimensional data and find the cluster centers. Combining the clustering results, the final key frames are obtained. A large number of key frame extraction experiments on different types of videos show that the algorithm can extract a different number of key frames depending on the video content, which overcomes the shortcoming of traditional key frame extraction algorithms that can only extract a fixed number of key frames, and that the extracted key frames represent the main content of the video accurately.

Keywords

Key Frame, Clustering Algorithm, HSV Color Histogram

1. Introduction

With the rapid development of multimedia and Internet technology, we are in the era of data explosion. At every moment, a large amount of data such as video, text, images and blogs is generated in all walks of life. Vivid digital video has gradually replaced monotonous text information and has become one of the main ways for people to spread information.
Whether for personalized recommendation or content-based video retrieval, it is difficult to analyze a large amount of video data. Key frames are video pictures that represent the main content of a video simply and effectively; they provide a suitable abstraction and framework for video indexing, browsing and retrieval. The use of key frames greatly reduces the amount of data required in video browsing and provides an organizational framework for dealing with video content [1]. Key frame extraction has been recognized as one of the important research issues in video information retrieval [2].

Clustering is the process of dividing a set of data objects into multiple groups or clusters, so that objects in the same cluster have high similarity and objects in different clusters have very low similarity. From the point of view of pattern recognition, clustering is the discovery of potential patterns in data, helping people to group and classify them to achieve a better understanding of the distribution of data. As a data mining tool, clustering analysis has been widely used in many fields such as biology, information security, business intelligence and web searching. Different clustering algorithms are based on different assumptions and data types, and each clustering algorithm has its limitations and biases. The choice of clustering algorithm often depends on the type of data and the purpose of clustering. For example, some clustering algorithms may work well in one application scenario but not in another. When a clustering algorithm is used to extract video key frames, the frame images with high similarity in the video are clustered into one class, and the cluster centers are taken as the key frames.

Density-based clustering methods classify areas with sufficiently high density into clusters, looking for high-density areas separated by low-density areas, so clusters with arbitrary shape can be easily obtained. The density peak clustering algorithm DPCA (clustering by fast search and find of density peaks) [3] is a new density-based clustering algorithm, which can find clusters with different densities by a visualized method, quickly find the density peak points (i.e. cluster centers) of data sets, and efficiently allot sample points and eliminate outliers [4].

In the field of image processing and image retrieval, how to extract effective features from image content has become the issue of greatest concern.
The color feature is one of the most significant visual features and is widely used in the field of image processing, mainly because color is often closely related to the object or scene contained in the image. In comparison with other visual features, the color feature has less dependence on the size, direction and perspective of the image and also has higher robustness. The earliest example of image retrieval making use of color is a retrieval algorithm based on the global color histogram proposed by Swain and Ballard. The retrieval process based on the color histogram involves the selection of the image color space, the quantization of the color space, the definition of the color histogram and the calculation of the similarity distance between histograms [5].

Several issues are involved in extracting key frames by a clustering algorithm. Firstly, it is necessary to select the appropriate color space to describe the color features. Secondly, a certain quantitative method is needed to express the color feature in vector form. Finally, a criterion has to be defined to measure the similarity between images in color [6]. Image clustering is much more complex, because most image data is high-dimensional and the amount of data is large. All the image data has to be loaded into memory for calculation, so it is not only computationally expensive, but also prone to running out of memory. In view of this, this paper proposes a density peak clustering algorithm which combines the characteristics of the HSV histogram, uses the HSV histogram to simplify calculation, and effectively improves the quality and efficiency of key frame extraction.

2. HSV Histogram Method

2.1. RGB Color Model

According to the tricolor theory, the human eye is more sensitive to red, green and blue, and the majority of colors can be synthesized from different proportions of red, green and blue. The RGB color space is shown in Figure 1. Colored light can be mixed by adding the three primary colors R, G and B in different proportions. For instance, when the three primary components are all zero, they are mixed into black light, and when the three primary components are all at the maximum, they are mixed into white light. Therefore, any color corresponds to a point in the RGB color space.

Figure 1. RGB color space.

The RGB color model is the most commonly used color model in image processing. As far as editing images is concerned, it is the best color model. Its physical meaning is clear and it is suitable for the operation of color kinescopes, but it does not adapt to the visual characteristics of human beings and does not conform to the visual judgment of human eyes on color. For a color, the human eye is most concerned about its chroma, depth and brightness, and combines these three parameters to evaluate the color. People without professional knowledge of color cannot directly judge a color by its RGB value, so the RGB color space is not in line with people's psychological perception of color [7]. In addition, the RGB color space is not uniform, so the visual difference between two colors cannot be expressed directly by the distance between two color points in the color space [8].
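To make the additive mixing concrete, the following minimal sketch converts a few RGB triples, including the all-zero (black) and all-maximum (white) cases mentioned above, into hue, saturation and value. It uses the `colorsys` module of the Python standard library; the use of Python and the helper name `describe` are assumptions made for illustration and are not part of the original experiments.

```python
import colorsys


def describe(r, g, b):
    """Return (hue in degrees, saturation, value) for 8-bit RGB components."""
    # colorsys expects and returns floats in the range [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 360.0, 1), round(s, 3), round(v, 3)


samples = [
    ("black", (0, 0, 0)),         # all three primaries at zero
    ("white", (255, 255, 255)),   # all three primaries at maximum
    ("red", (255, 0, 0)),
    ("yellow", (255, 255, 0)),    # red + green
]
for name, rgb in samples:
    print(name, rgb, "->", describe(*rgb))
```

Black and white differ only in the value component, which is hard to read off from the raw RGB triples but explicit in the converted representation.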

2.2. HSV Color Model

The HSV color space is a color model oriented to visual perception, in which the color perception of the human eye mainly includes three elements: hue, saturation and value [9]. The HSV color model corresponds to a conical subset in the cylindrical coordinate system, as shown in Figure 2. The V-axis represents brightness, the distance from the V-axis represents saturation S, and the angle of rotation around the V-axis represents hue H. The top surface of the cone corresponds to V = 1, and the colors with the maximum brightness and saturation are located on the circumference of the top surface of the cone [10]. Androutsos et al. roughly divided the HSV color space by experiment: the areas with brightness greater than 75% and saturation greater than 20% were bright color areas, the areas with brightness less than 25% were black areas, the areas with brightness greater than 75% and saturation less than 20% were white areas, and the others were color areas [11].

Figure 2. HSV color space.

The HSV model is similar to the painter's method of color matching. By changing the color intensity and depth, different tones of a pure color can be obtained. That is, white is added to a pure color to change the color intensity and black is added to change the color depth. It can be seen that the three elements of hue, saturation and brightness in the HSV color space have a clear structure, are easy to understand and are closely related to the way people perceive color. In order to capture the features of video frames better, this paper uses the HSV color model to carry out the subsequent experimental analysis.

2.3. HSV Histogram

The color histogram is a widely used color feature in image processing, which describes the proportion of different colors in the entire image and does not care about the spatial location of each color [12]. As shown in Figure 3, the gray histogram counts all the pixels in the image and gives a unified picture of the overall gray level. The horizontal axis represents the grayscale value (generally 0-255), and the vertical axis is the number of pixels corresponding to each gray value in the image.
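The gray histogram just described can be sketched in a few lines. This is only an illustration in plain Python; the function name `gray_histogram` and the tiny six-pixel "image" are hypothetical and chosen to keep the counts easy to verify by hand.

```python
def gray_histogram(pixels, levels=256):
    """Count how many pixels take each gray value in 0..levels-1."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1  # position (x, y) of the pixel is deliberately ignored
    return hist


# A hypothetical 2x3 image, flattened into a list of gray values.
image = [0, 0, 255, 128, 128, 128]
hist = gray_histogram(image)
print(hist[0], hist[128], hist[255])  # 2 3 1
```

Because only the counts are kept, two images with the same pixels in a different spatial arrangement produce the same histogram, which is the property noted above.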

Figure 3. Gray histogram.

Choosing the appropriate number of color cells (i.e. bins of the histogram) and the color quantization method depends on the performance and efficiency requirements of the specific application. In general, the more color intervals there are, the stronger the histogram's ability to distinguish colors. However, color histograms with many intervals not only increase the computational burden, but are also not conducive to indexing in large image databases. Moreover, for some applications, the use of a very fine color space partitioning method may not necessarily improve the retrieval effect, especially for those applications that cannot tolerate the omission of relevant images.

In the HSV color model, it is necessary to draw the histogram of its three components (H, S, V) separately, and when there are quite a few colors in the picture, the dimension of each histogram will be higher, so the HSV color space needs to be quantized first. According to the characteristics of the HSV color model, the following treatments are made in this study:

1) Considering the human visual resolution ability, the hue H component is divided into 12 parts, and the saturation S and value V components are each divided into 5 equal parts.

2) Considering the value range of each component and the subjective color perception, the following quantization is performed.

$$
H = \begin{cases}
0, & H \in [346, 15] \\
1, & H \in [16, 45] \\
2, & H \in [46, 75] \\
3, & H \in [76, 105] \\
4, & H \in [106, 135] \\
5, & H \in [136, 165] \\
6, & H \in [166, 195] \\
7, & H \in [196, 225] \\
8, & H \in [226, 255] \\
9, & H \in [256, 285] \\
10, & H \in [286, 315] \\
11, & H \in [316, 345]
\end{cases}
$$

$$
S = \begin{cases}
0, & S \in [0, 0.2] \\
1, & S \in [0.2, 0.4] \\
2, & S \in [0.4, 0.6] \\
3, & S \in [0.6, 0.8] \\
4, & S \in [0.8, 1]
\end{cases}
\qquad
V = \begin{cases}
0, & V \in [0, 0.2] \\
1, & V \in [0.2, 0.4] \\
2, & V \in [0.4, 0.6] \\
3, & V \in [0.6, 0.8] \\
4, & V \in [0.8, 1]
\end{cases}
$$

3) Based on the perceptual characteristics of human eyes to color, that is, the sensitivity of human eyes to the H component is greater than to the S component, and the sensitivity to the S component is greater than to the V component, these three color components are merged into a one-dimensional feature value, as shown in Equation (1).

$$
F = 5H + 3S + 2V \tag{1}
$$

Therefore, the value range of F is 0-75. As shown in Figure 4, a frame of video is converted into a histogram of 76 bins, in which the horizontal axis represents the 76 dimensions of the one-dimensional feature vector F and the vertical axis represents the number of pixels appearing on each dimension in an image.

Figure 4. HSV histogram.

3. Density Peak Clustering Algorithm

The main idea of density clustering is to find high density regions separated by low density regions. DPCA, a density peak clustering algorithm, can use visualization to help find clusters with different densities. It requires that each cluster has a maximum density point as the cluster center, that each cluster center attracts and connects the points with lower density around it, and that different cluster centers are relatively far away from each other [2]. That is, the density peak clustering algorithm is based on two assumptions: 1) the density of cluster centers is greater than that of their neighbors, and 2) the distance between a cluster center and any point of higher density is relatively large. Therefore, there are two main quantities that need to be calculated: the local density $\rho_i$ and the distance from higher density points $\delta_i$.

3.1. Distance Metric

Every data object $x_i$ has 76-dimensional attribute values, which can be expressed as $x_i = \{x_i^1, \ldots, x_i^d, \ldots, x_i^D\}$ (where $D = 76$). The distance between sample points $x_i$ and $x_j$ is calculated by the Euclidean distance, as shown in Equation (2).

$$
\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{d=1}^{D} \left( x_i^d - x_j^d \right)^2} \tag{2}
$$
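The path from a frame to its 76-dimensional vector (the quantization of Section 2.3 and Equation (1)) and the distance of Equation (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses plain Python with `colorsys` instead of OpenCV, the function names are hypothetical, and the treatment of values that fall between or exactly on the printed interval boundaries (for example a hue of 15.5 degrees or a saturation of exactly 0.2) is an assumption.

```python
import colorsys
import math

N_LEVELS = 76  # F = 5H + 3S + 2V ranges over 0..75


def quantize_hsv(h_deg, s, v):
    """Map hue in degrees and s, v in [0, 1] to bins (0..11, 0..4, 0..4)."""
    # Shifting by 14 degrees sends 346..15 to bin 0, 16..45 to bin 1, and so on.
    h_bin = int(((h_deg + 14) % 360) // 30)
    # Assumption: a boundary value such as 0.2 is assigned to the upper bin.
    s_bin = min(int(s * 5), 4)
    v_bin = min(int(v * 5), 4)
    return h_bin, s_bin, v_bin


def color_level(h_bin, s_bin, v_bin):
    """Equation (1): merge the three quantized components into one level F."""
    return 5 * h_bin + 3 * s_bin + 2 * v_bin


def hsv_histogram(rgb_pixels):
    """Convert a frame (iterable of 8-bit RGB triples) into 76 pixel counts."""
    hist = [0] * N_LEVELS
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[color_level(*quantize_hsv(h * 360.0, s, v))] += 1
    return hist


def euclidean(a, b):
    """Equation (2): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Two hypothetical three-pixel frames.
hist_a = hsv_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 0)])
hist_b = hsv_histogram([(255, 0, 0), (0, 0, 255), (0, 0, 255)])
print(round(euclidean(hist_a, hist_b), 3))  # 2.449
```

One property worth keeping in mind when reading the histogram: Equation (1) maps the 12 x 5 x 5 = 300 quantized (H, S, V) combinations onto only 76 levels, so several combinations share a level (for instance (0, 0, 3) and (0, 2, 0) both give F = 6).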

3.2. Local Density

The local density $\rho_i$ of the data object $x_i$ is defined as follows:

$$
\rho_i = \sum_{x_j \in U} \chi\left( \mathrm{dist}(x_i, x_j) - \mathrm{dist}_{cutoff} \right) \tag{3}
$$

where $\chi(x)$ is an indicator function, which is defined as follows:

$$
\chi(x) = \begin{cases}
1, & x \le 0 \\
0, & x > 0
\end{cases}
$$

The $\mathrm{dist}_{cutoff}$ term indicates the cutoff distance, and in literature [2] it is pointed out that the value range of the empirical parameter is $t \in [1\%, 2\%]$. The distances between all pairs of data objects in dataset U are calculated and sorted in ascending order, and $\mathrm{dist}_{cutoff}$ takes the value at position t in this ascending sequence. The local density formula states that the local density $\rho_i$ of each data object $x_i$ is equal to the number of data points whose distance from $x_i$ is less than the cutoff distance $\mathrm{dist}_{cutoff}$.

3.3. Distance from Higher Density Points

The distance $\delta_i$ from higher density points of data object $x_i$ is defined as follows:

$$
\delta_i = \begin{cases}
\max\limits_{x_j \in U} \left( \mathrm{dist}(x_i, x_j) \right), & \rho_i \ge \rho_j,\ \forall x_j \in U \\
\min\limits_{j:\, \rho_j > \rho_i} \left( \mathrm{dist}(x_i, x_j) \right), & \text{otherwise}
\end{cases} \tag{4}
$$

When the local density $\rho_i$ of data object $x_i$ is the global maximum, the relative distance $\delta_i$ is the maximum distance between $x_i$ and any other data object $x_j$. Otherwise, the data objects $x_j$ whose local density is greater than that of $x_i$ are found, and the relative distance $\delta_i$ is the minimum distance between $x_i$ and these data objects $x_j$.

It can be seen that DPCA aims to find data objects with large local density and large relative distance as cluster centers. These cluster centers attract and connect the points with low density around them, and they are relatively far away from each other. The local density $\rho_i$ and relative distance $\delta_i$ of each data object $x_i$ are calculated, and a two-dimensional decision map is generated based on these two attribute values, where the horizontal axis is the local density $\rho$ and the vertical axis is the relative distance $\delta$. Data points in the upper-right corner of the decision map can represent different cluster centers because of their high local density and large relative distance from other clusters.

4. Experimental Results

4.1. Video Frames Processing with HSV Histogram

OpenCV, an open source computer vision library, is used to read a 511-frame test video and convert each frame of the video from the RGB color space to the HSV color space. According to the HSV histogram quantization formula, the quantized values of each channel are calculated and merged into the HSV color level F on one channel according to Equation (1) ($F \in [0, 75]$). Then the number of pixels appearing on each HSV color level F is counted based on the HSV histogram method, and each frame of the video is converted into an HSV histogram, which is expressed as a 76-dimensional feature vector in numerical form, that is $x_i = \{x_i^1, \ldots, x_i^d, \ldots, x_i^D\}$ ($D = 76$).

4.2. DPCA Clustering Process

Each frame of the video has been converted into a 76-dimensional feature vector, so the size of the input data is 511 × 76. The distance between any two sample points is calculated according to the distance metric of Equation (2) and stored in the distance matrix M, which is a 511 × 511 symmetric matrix. The values on the diagonal of matrix M are all zero, $M[i, j]$ corresponds to the distance between data objects $x_i$ and $x_j$, and $M[i, j] = M[j, i]$.

The empirical parameter is $t \in [1\%, 2\%]$, and experiments have been carried out at t = 1% and t = 2% respectively. As described in Section 3.2, $\mathrm{dist}_{cutoff}$ takes the value at position t in the ascending sequence of pairwise distances. Therefore, the larger t is, the larger the cutoff distance $\mathrm{dist}_{cutoff}$ is, and the greater the local density $\rho_i$ of data object $x_i$ is. The experimental results are shown in Table 1.

Next, the local density $\rho_i$ of each data object $x_i$ is calculated using Equation (3), and the relative distance $\delta_i$ of each data object $x_i$ is calculated with Equation (4). Finally, the decision map is generated. Figure 5 and Figure 6 are the decision maps for t = 1% and t = 2% respectively.

Figure 5. Decision maps corresponding to t = 1%.
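The clustering quantities used in this process (the distance matrix M, the cutoff distance, Equation (3) and Equation (4)) can be sketched in plain Python as below. This is a hedged illustration rather than the code used in the experiments: the function names are hypothetical, a point is not counted in its own density (an assumption, made so that isolated points get density 0 as in the decision maps), and the rule for turning the fraction t into a position in the sorted distance list is also an assumption.

```python
import math


def euclidean(a, b):
    """Equation (2)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_matrix(points):
    """Symmetric matrix M with zeros on the diagonal."""
    n = len(points)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = euclidean(points[i], points[j])
    return m


def cutoff_distance(m, t=0.02):
    """Value at position t of the ascending list of pairwise distances."""
    n = len(m)
    pair_dists = sorted(m[i][j] for i in range(n) for j in range(i + 1, n))
    # Assumption: position = round(t * number of pairs), counted from 1.
    pos = max(0, min(len(pair_dists) - 1, round(t * len(pair_dists)) - 1))
    return pair_dists[pos]


def local_density(m, d_c):
    """Equation (3), excluding the point itself (assumption)."""
    n = len(m)
    return [sum(1 for j in range(n) if j != i and m[i][j] <= d_c)
            for i in range(n)]


def relative_distance(m, rho):
    """Equation (4): distance to the nearest point of higher density."""
    n = len(m)
    delta = []
    for i in range(n):
        higher = [m[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser point gets its largest distance instead.
        delta.append(min(higher) if higher else max(m[i]))
    return delta


# Hypothetical 2-D data: two small groups and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (10, 10), (11, 10), (10, 11), (30, 30)]
m = distance_matrix(points)
d_c = 1.2  # fixed by hand: nine points are too few for a 1%-2% position
rho = local_density(m, d_c)
delta = relative_distance(m, rho)
print(rho)  # [4, 1, 1, 1, 1, 2, 1, 1, 0]
```

In this toy data the points (0, 0) and (10, 10) combine a comparatively high density with a large relative distance, which is the pattern the decision map is meant to expose, while the isolated point (30, 30) has density 0.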

Figure 6. Decision maps corresponding to t = 2%.

Table 1. Effect of empirical parameter t on cutoff distance.

t       dist_cutoff
1%      1766.30
1.5%    2419.76
2%      3196.17

When t takes different values, the local density and relative distance of each sample point and the distribution of sample points on the decision maps will be different. In order to make the local density of the data samples large, t = 2% was adopted as the experimental scheme in this study. In Figure 6, sample point 484 in the upper right corner has the maximum local density, which indicates that the number of sample points similar to sample point 484 in dataset U is the largest. In addition, sample points 16 and 306 have large local density and relative distance, so they can also be selected as cluster centers. These three cluster centers can represent different clusters, and their corresponding video frames are also strongly representative. Therefore, frames 16, 306 and 484 of the video are key frames. Next, sample points 128 and 249 have the potential to serve as cluster centers and are also recorded as key frames of the video. Finally, five key frames are obtained. In the decision map, sample points with a density of 0 on the horizontal axis are noise points or outliers, and there are no similar sample points around them. Therefore, these outlying points can be directly ignored when searching for cluster centers. The experimental results are shown in Table 2.
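In the study the cluster centers are read off the decision map by eye. As a hedged illustration of how that reading could be automated, the sketch below ranks candidates by the product of local density and relative distance, a heuristic swapped in here for illustration and not a step described in this paper, and skips points with density 0. The function name is hypothetical; the ρ and δ values are the ones reported in Table 2.

```python
def rank_center_candidates(rho, delta, frame_ids):
    """Order frames by rho * delta, ignoring density-0 points (outliers)."""
    scored = [(r * d, f) for r, d, f in zip(rho, delta, frame_ids) if r > 0]
    scored.sort(reverse=True)
    return [f for _, f in scored]


frames = [16, 128, 249, 306, 484]
rho = [12, 8, 5, 20, 54]
delta = [73045.71, 39697.45, 49962.86, 53948.63, 127402.09]
ranking = rank_center_candidates(rho, delta, frames)
print(ranking)  # [484, 306, 16, 128, 249]
```

With these values the ordering agrees with the discussion above: frame 484 comes first, frames 306 and 16 follow, and frames 128 and 249 are the weakest of the five candidates.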

Table 2. Local density and relative distance of five cluster centers.

Frame   ρ     δ
16      12    73,045.71
128     8     39,697.45
249     5     49,962.86
306     20    53,948.63
484     54    127,402.09

Since this study mainly aims to find the key frames of the video, it is not concerned with which sample points are included in each cluster, so it is only necessary to examine whether each sample point has the potential to be a cluster center. In order to prevent the different scales of the two attributes from influencing the experiment, the local density ρ and relative distance δ can be normalized. The normalized decision map is shown in Figure 7.

Figure 7. Normalized decision maps corresponding to t = 2%.

5. Conclusion

Aiming at the problem of video key frame extraction, this paper proposes a density peak clustering algorithm, which uses the HSV histogram to transform high-dimensional abstract video image data into a quantifiable two-dimensional input matrix. In fact, the video key frame is a relatively subjective concept. Extracting key frames with the density peak clustering algorithm can take the characteristics of the video content into account well. The extracted key frames can better represent the main content of the video; they have low redundancy and good noise resistance, and the algorithm can form clusters with arbitrary shape without the need to set the initial parameters manually.

Acknowledgements

This work was supported by the Natural Science Foundation of China under Grant Nos. 51668043 and 61262016 and the Gansu Natural Science Foundation of China under Grant No. 18JR3RA156.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] University, T. (2012) Key Frame Extraction Using Unsupervised Clustering Based on a Statistical Model. Tsinghua Science & Technology, 10, 169-173.
[2] Cao, C.Q. (2012) Research on Key Frame Extraction in Content-Based Video Retrieval. M.S. Thesis, Taiyuan University of Technology, Taiyuan.
[3] Rodriguez, A. and Laio, A. (2014) Machine Learning. Clustering by Fast Search and Find of Density Peaks. Science, 344, 1492. https://doi.org/10.1126/science.1242072
[4] Zhang, J.Q. and Zhang, H.Y. (2017) Clustering by Fast Search and Find of Density Peaks Based on Manifold Distance. Computer Knowledge & Technology, 13, 179-182.
[5] Jiang, L.C., Shen, G.Q. and Zhang, G.X. (2009) Image Retrieval Algorithm Based on HSV Block Color Histogram. Mechanical and Electrical Engineering, 26, 54-57.
[6] Zhuang, Y.T., Rui, Y., Huang, T.S. and Mehrotra, S. (2002) Adaptive Key Frame Extraction Using Unsupervised Clustering. International Conference on Image Processing.
[7] Wei, B.G. and Li, X.Y. (1999) Research Progress of Color Image Segmentation. Computer Science, 4, 59-62.
[8] Xu, X.U. (1999) A Method of Dominant Colors Extraction and Representation for CBIR Systems. Journal of Computer Aided Design & Computer Graphics, 11, 385-388.
[9] Zhang, Y.J. (2012) Image Engineering: Image Analysis. 2nd Edition, Tsinghua University Press.
[10] Castleman, K.R., Zhu, Z.G. and Lin, X. (2002) Digital Image Processing. 3rd Edition, Electronics Industry Press.
[11] Androutsos, D., Plataniotis, K.N. and Venetsanopoulos, A.N. (1999) A Novel Vector-Based Approach to Color Image Retrieval Using a Vector Angular-Based Distance Measure. Elsevier Science Inc.
[12] Wu, C.Y., Tai, X.Y. and Zhao, J.Y. (2004) Image Retrieval Based on Color Features. Computer Application, 24, 135-137.