Efficient Mean-shift Clustering Using Gaussian KD-Tree


Pacific Graphics 2010, P. Alliez, K. Bala, and K. Zhou (Guest Editors), Volume 29 (2010), Number 7

Chunxia Xiao    Meng Liu
The School of Computer, Wuhan University, Wuhan, 430072, China

Abstract

Mean shift is a popular approach for data clustering; however, the high computational complexity of the mean shift procedure limits its practical application to high-dimensional and large data set clustering. In this paper, we propose an efficient method that allows mean shift clustering to be performed on large data sets containing tens of millions of points at interactive rates. The key to our method is a new scheme for approximating the mean shift procedure using a greatly reduced feature space. This reduced feature space is an adaptive clustering of the original data set, generated by applying an adaptive KD-tree in a high-dimensional affinity space. The proposed method significantly reduces the computational cost while obtaining almost the same clustering results as the standard mean shift procedure. We present several data clustering applications to illustrate the efficiency of the proposed method, including image and video segmentation, and segmentation of static geometry models and time-varying sequences.

Categories and Subject Descriptors (according to ACM CCS): I.4 [Computing methodologies]: Image Processing and Computer Vision - Applications

1. Introduction

Mean shift is a well-established method for data set clustering. It has been widely used in image and video segmentation [CM02] [WTXC], object tracking [CRM03], image denoising [BC04], image and video stylization [DS02] [WXSC04], and video editing [WBC 05]; it has also been extended to geometry segmentation [YLL 05] and 3D reconstruction [WQ04]. Mean shift works by defining a Gaussian kernel density estimate for the underlying data, and clusters together the points that converge to the same mode under a fixed-point iterative scheme.
Although mean shift works well for data clustering and obtains pleasing clustering results, its high computational complexity is one of the main difficulties in applying it to large data sets, especially in situations where interactive or even real-time clustering is preferred. The complexity of the standard mean shift procedure is O(τdn²), where n is the number of data points, τ is the number of iterations of the mean shift clustering procedure, and d is the dimension of each point. The most expensive computation is finding the closest neighborhood of each point in the data space, which is a multidimensional range searching problem. Even using one of the most popular nearest neighbor search methods, the ANN method [AMN 98], given a query point q and ε > 0, a (1 + ε)-approximate nearest neighbor of q is computed in O(c_{d,ε} log n) time, where c_{d,ε} is a factor depending on the dimension d and ε. Therefore, when processing large data sets, the high time complexity leads to serious difficulty. Although many acceleration techniques have been proposed [EDD03, GSM03, YDGD03, WBC 05, CP06, PD07, FK09], further improvements are still desirable in both performance and clustering quality.

In this paper, inspired by the fast high-dimensional filtering method using Gaussian KD-trees [AGDL09], we propose an efficient paradigm for computing the mean shift procedure. Our method is based on the following key observation: since the mean shift procedure clusters points that are similar in feature, and there are many clusters of points which are highly similar in feature, it is wasteful to run the mean shift procedure for each point until it converges to its mode. Thus we first cluster the original point set into clusters based on feature similarity using a KD-tree [AGDL09], and obtain Gaussian-weighted samples of the original data set, which form the reduced feature space approximating the original point set.
© 2010 The Author(s). Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

Then, instead of computing mean shift directly on the original individual points, we compute it on the samples (which are of a much smaller number) to obtain the modes of the sample space. Finally, we find the closest mode for each point based on Gaussian-weighted feature similarity, and construct the final clustering results. As mean shift is performed on a greatly reduced space (typically thousands of times smaller than the original data set), and all stages of our algorithm are data-parallel across queries and can be implemented in CUDA [Buc07], we can cluster large data sets in real time or at interactive rates (for a video with 1.44 × 10^7 pixels in Figure 5). Furthermore, as the sample space is an approximate feature space of the original data set generated using the proposed Gaussian-weighted similarity sampling, our method obtains accurate results comparable with the standard mean shift performed on the original data set. In addition, our method uses only an extremely small fraction of resources, in both time and memory consumption.

This paper is organized as follows. In Section 2, we review related work. Section 3 is the main part of the paper, where we describe the proposed fast mean shift clustering method. In Section 4, we present applications of our method and comparisons with related mean shift acceleration methods, and we conclude in Section 5.

2. Related work

Mean shift was first presented by Fukunaga and Hostetler [FH75], and was further investigated by Cheng [Che95] and Comaniciu and Meer [CM02]. Mean shift is now a popular approach for data set clustering, and has been widely used in image and video segmentation [CM02] [WTXC], object tracking [CRM03], image denoising [BC04], and image and video stylization [DS02] [WXSC04]; it has also been extended to geometry segmentation [YLL 05] and 3D reconstruction [WQ04], and many image and video editing methods build on mean shift clustering as a preprocessing step [WBC 05]. One of the main difficulties in applying mean shift based clustering to large data sets is its computational complexity.
For each Gaussian-weighted average iteration, the complexity of a brute-force computation is quadratic in the number of data points. Several techniques have been developed to increase the speed of mean shift. Comaniciu and Meer [CM02] used axis-aligned box windows; however, this produces many limit points, and adjacent points are merged as a post-process. DeMenthon [DeM02] used a multiscale structure to accelerate video segmentation. Yang et al. [YDDD03] applied the Fast Gauss Transform to speed up the sums of Gaussians in higher dimensions used in the mean shift iteration. This method is effective for Gaussian-weighted averages with a large filtering radius; however, weighted averages over a relatively small radius do not benefit much from it. Georgescu et al. [GSM03] accelerated mean shift by performing fast nearest neighbor search with spatially coherent hash tables. Carreira-Perpiñán [CP06] studied four acceleration strategies and found that the spatial discretization method (using uniform downsampling schemes) performed best. Paris and Durand [PD07] applied a sparse representation of the density function to accelerate mean shift; similar to bilateral filtering [PD06], they first binned the feature points in a coarse regular grid, and then blurred the bin values using a separable Gaussian. The computational complexity and memory scale exponentially with the dimension d. Wang et al. [WLGR07] used a dual tree to speed up mean shift by computing two separate trees, one for the query points and one for the reference points. Compared to the methods of [YDDD03] [PD07], this method maintains a relative error bound on the mean shift iteration at each stage, leading to a more accurate algorithm; however, its performance is much lower than that of [YDDD03] [PD07].
More recently, Freedman and Kisilev [FK09] proposed a sampling technique for the Kernel Density Estimate (KDE); they constructed a compactly represented KDE with much smaller description complexity. This method greatly accelerates the mean shift procedure; however, the accuracy of the mean shift clustering depends on the number of random samples.

Many methods have applied Gaussian KD-trees to accelerate image and video processing. Adams et al. [AGDL09] applied Gaussian KD-trees to accelerate high-dimensional filtering, including the bilateral image filter [TM98], bilateral geometry filtering [JDD03, FDCO03], and image denoising with non-local means [BCM05]. We borrow some ideas from [AGDL09] for adaptive clustering in this paper. Xu et al. [XLJ 09] used a KD-tree to build an adaptive clustering for accelerating affinity-based image and video edit propagation. As an alternative, Xiao et al. [XNT10] used a quadtree-based hierarchical data structure to accelerate edit propagation. KD-trees have been widely used to accelerate graphics rendering [HSA91], and real-time KD-tree construction on graphics hardware has also been proposed [HSHH07] [ZHWG08]. Our method applies Gaussian KD-trees [AGDL09] to build a hierarchy and clustering for the large data set to accelerate the mean shift computation. Compared with Paris and Durand [PD07], whose complexity is exponential in the dimension d of the points, our tree-based mean shift provides excellent performance, as both its runtime and memory consumption scale linearly with the dimension of the points. With the samples generated using similarity-based KD-tree clustering, our method obtains more accurate results than [FK09] when using a similar number of samples.

3. Fast mean shift clustering

We first give a brief review of mean shift, then we describe the proposed fast mean shift clustering method, including

data set clustering preprocessing using a KD-tree, sample feature space computation, mean shift mode computation in the reduced feature space, and mode interpolation. We also present a complexity analysis and the GPU implementation of the proposed algorithm.

3.1. Review of mean shift

Given a point data set {x_i}_{i=1}^n, where x_i ∈ R^d is a d-dimensional feature vector, each point is associated with a bandwidth value h_i > 0. An adaptive nonparametric estimator of this data set is defined as

    f_K(x) = (1/n) Σ_{i=1}^n (1/h_i^d) k( ‖(x − x_i)/h_i‖² )    (1)

where K(x) = c_{k,d} k(‖x‖²) is a kernel function satisfying K(x) ≥ 0 and ∫_{R^d} K(x) dx = 1. Taking the gradient of (1) yields

    ∇f_K(x) = (2c_{k,d}/n) [ Σ_{i=1}^n (1/h_i^{d+2}) g( ‖(x − x_i)/h_i‖² ) ] m(x)    (2)

where g(s) = −k′(s), and m(x) is the so-called mean shift vector

    m(x) = [ Σ_{i=1}^n (x_i/h_i^{d+2}) g(‖(x − x_i)/h_i‖²) ] / [ Σ_{i=1}^n (1/h_i^{d+2}) g(‖(x − x_i)/h_i‖²) ] − x    (3)

Expression (3) shows that, at location x, the weighted average of the data points selected with kernel G is proportional to the normalized density gradient estimate obtained with kernel K. The mean shift vector thus points toward the direction of maximum increase in the density. The following gradient-ascent process with an adaptive step size, run until convergence, constitutes the core of the mean shift clustering procedure:

    x^{j+1} = [ Σ_{i=1}^n (x_i/h_i^{d+2}) g(‖(x^j − x_i)/h_i‖²) ] / [ Σ_{i=1}^n (1/h_i^{d+2}) g(‖(x^j − x_i)/h_i‖²) ],  j = 1, 2, ...    (4)

The starting points of the procedure can be chosen as the points x_i themselves, and the convergence points of the iterative procedure are the modes of the density. All points that converge to the same mode are collected and considered as one cluster. More details are described in [CM02].

3.2. Fast mean shift computation

The weighted average in expression (4) is the most time-consuming computation of mean shift when the number n of data points is large (for example, 10^9 pixels in streaming video). Given an arbitrary point set {x_i}_{i=1}^n with feature vectors of dimension d, a naive computation of the mean shift vector in expression (4) takes O(dn²) time per iteration, as every point interacts with every other point.
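As a concrete illustration, the iteration in expression (4) can be sketched in a few lines. This is a simplified version with a single fixed bandwidth h for all points and brute-force Gaussian weights, not the paper's accelerated method; the function and parameter names are ours:

```python
import numpy as np

def mean_shift_modes(X, h=1.0, tol=1e-5, max_iter=50):
    """Standard mean shift iteration (Eq. 4), fixed bandwidth h.

    Every point repeatedly moves to the Gaussian-weighted average
    of all points until it converges to a mode. O(n^2 d) per sweep,
    which is exactly the cost the paper sets out to avoid.
    """
    modes = X.astype(float).copy()
    for i in range(len(modes)):
        x = modes[i]
        for _ in range(max_iter):
            # g(||x - x_i||^2 / h^2) for the Gaussian kernel
            w = np.exp(-np.sum((X - x) ** 2, axis=1) / h ** 2)
            x_new = (w[:, None] * X).sum(axis=0) / w.sum()
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        modes[i] = x
    return modes
```

Points drawn from two well-separated blobs converge to two distinct modes, one per blob, which is the clustering criterion used throughout the paper.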
A simple way to accelerate the mean shift procedure is to use the weighted average of the closest points of x, with the bandwidth value h set according to the neighborhood size. However, this scheme requires a nearest neighborhood search, which is itself a time-consuming operation for large data sets, especially for data with high-dimensional feature vectors.

To accelerate the weighted average operation of expression (4), instead of computing expression (4) for individual points in the data set, we approximate the original data set by piecewise linear segments in the feature space based on a similarity measure, each represented by a cluster of nearby pixels, with the size of each cluster adapted to the extent of the similar feature region. The generated clusters can be considered as samples of the data set, whose number is much smaller than the number of points. Then, instead of solving the mean shift procedure (4) directly on individual points as done in previous methods, we solve it on the samples based on a Gaussian-weighted average over a neighborhood; finally, we interpolate the clustering results back to the original data set.

We cluster the point data based on a similarity measure between the points, which is defined in the feature space of the input data set. We define the similarity between points using both the spatial locality p and the value v of each point, which together constitute the feature space of the input data set. For example, in the image case, a point x_i is a pixel with position p_i = (x, y) and color value v_i = (r, g, b) (Lab color space). Thus, each point x_i is a five-dimensional feature vector whose axes are x_i = (p_i, v_i) = (x, y, r, g, b). As stated in [AP08] [XLJ 09], the similarity measure (or affinity) between two points can be defined as z_ij = exp(−‖x_i − x_j‖²); the position p and value v can also be weighted by parameters. For video, the feature vector can be expanded to include the frame index t and a motion estimate ς of point x_i, giving the seven-dimensional vector x_i = (p, v, t, ς).
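A minimal sketch of assembling the five-dimensional image feature space described above, with position weighted by σ_p and color by σ_v; the Lab conversion is omitted and the helper name is ours:

```python
import numpy as np

def image_features(img, sigma_p=1.0, sigma_v=1.0):
    """Stack each pixel into a 5-D feature x = (sigma_p*(x, y), sigma_v*(r, g, b)).

    img: H x W x 3 array. The paper works in Lab color space;
    the RGB-to-Lab conversion is left out of this sketch.
    """
    H, W, _ = img.shape
    ys, xs = np.mgrid[0:H, 0:W]                       # pixel coordinates
    pos = np.stack([xs, ys], axis=-1).reshape(-1, 2) * sigma_p
    col = img.reshape(-1, 3) * sigma_v
    return np.concatenate([pos, col], axis=1)         # shape (H*W, 5)
```

The video case is analogous: append the weighted frame index and motion estimate to obtain seven-dimensional vectors.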
For images, we compute the mean shift clustering procedure (4) in the feature space where each point is associated with both spatial locality p and value v.

3.2.1. KD-tree adaptive clustering

We apply a KD-tree to adaptively cluster the data set in the feature space, subdividing finely in regions where the point feature vectors differ, and coarsely in regions where the feature vectors are similar. In image KD-tree clustering, for example, we subdivide homogeneous regions coarsely, while subdividing edge regions finely. Then, by representing each cluster with a sample, we obtain an accurate approximate feature space of the original data set with far fewer samples.

The KD-tree clusters the data set in feature space in a top-down way. Starting from a root cell, the top rectangular

cell represents all points in the data set, and we recursively split a cell into two child cells along a dimension that alternates at successive tree levels. Similar to [AGDL09], each inner cell of the tree T represents a d-dimensional rectangular cell which stores six variables: the dimension T_d along which the tree cuts, the cut value T_cut on that dimension, the bounds of the cell in that dimension T_min and T_max, and pointers to its children T_left and T_right. Leaf nodes contain only a d-dimensional point which is considered as a sample. This sample represents the points contained in the leaf cell; the KD-tree stores m samples {y_j}_{j=1}^m of the original data set {x_i}_{i=1}^n in d dimensions, one sample point per leaf. The samples {y_j}_{j=1}^m constitute the reduced feature space of the original data set.

As illustrated in Figure 1, we adaptively cluster the image: more samples are placed at edge regions, while coarser samples are placed at homogeneous regions. The sampling factor can be changed via the stopping criteria. We use two thresholds as stopping criteria: one is the size of the cell, the other is the variance σ of the similarity measure z_ij between the points in the cell. Using these two thresholds, we can generate different sampling factors for an image while respecting the feature distribution of the data set. Figure 1 shows some clustering results with different sampling factors.

Figure 1: Image sampling using an adaptive KD-tree. (a) Original image (636 × 844), (b) image clustering with 1,050 samples, (c) image clustering with 11,305 samples.
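The adaptive subdivision can be sketched as a recursive split with the two stopping criteria described above. The threshold values, and the use of per-dimension feature variance in place of the affinity variance, are simplifying assumptions of this sketch:

```python
import numpy as np

def kdtree_samples(X, min_size=4, var_thresh=0.05, depth=0):
    """Adaptively cluster points: split a cell along alternating
    dimensions until it is small or its feature variance is below
    var_thresh, then emit one representative sample (the cell mean)
    per leaf. Homogeneous regions terminate early (coarse sampling);
    varied regions split deeper (fine sampling)."""
    if len(X) <= min_size or X.var(axis=0).max() < var_thresh:
        return [X.mean(axis=0)]                    # one sample per leaf
    d = depth % X.shape[1]                         # alternate cut dimension
    cut = np.median(X[:, d])                       # split at the median
    left, right = X[X[:, d] <= cut], X[X[:, d] > cut]
    if len(left) == 0 or len(right) == 0:          # degenerate split: stop
        return [X.mean(axis=0)]
    return (kdtree_samples(left, min_size, var_thresh, depth + 1)
            + kdtree_samples(right, min_size, var_thresh, depth + 1))
```

Two tight feature clusters thus collapse to two leaf samples, however many points each contains, which is the source of the reduced feature space.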
3.2.2. Sample feature space computation

We obtain the samples {y_j}_{j=1}^m in d dimensions, which are adaptive samples of the feature space of the original data set and can be considered an approximate feature space of the original space. To make the approximation more accurate, we scatter each point x_i to the samples {y_j}_{j=1}^m based on the affinity z_ij between the point x_i and sample y_j, and obtain an affinity-based sample space.

Similar to the splatting stage in [AGDL09], we scatter each point x_i to its nearest neighborhood N(x_i) among the samples {y_j}_{j=1}^m (we apply the KD-tree to search the high-dimensional nearest neighborhood N(x_i) for point x_i), and compute an affinity-based sample ȳ_j for each sample y_j. The affinity-based sample ȳ_j can be considered as the weighted similarity average of those feature vectors {x_i} that are most similar to the sample y_j. We compute and sum the affinity z_ij between x_i and each y_j ∈ N(x_i) to obtain the affinity-based sample: ȳ_j = ȳ_j + z_ij x_i, and then ȳ_j is normalized by the sum of the similarities z_ij. The generated ȳ_j is a feature vector of d dimensions. The affinity-based samples {ȳ_j}_{j=1}^m are more accurate samples of the original point set, and will be used in the mean shift clustering procedure and the mode interpolation. For each sample we thus store two vectors: the feature vector y_j and the affinity-based sample ȳ_j.

3.2.3. Mean shift mode computation

After obtaining the affinity-based samples {ȳ_j}_{j=1}^m of the original feature space, instead of computing the mean shift clustering procedure in the original space, we compute it on the reduced feature space {ȳ_j}_{j=1}^m. At each iteration, we find for each sample ȳ_j its nearest neighborhood N(ȳ_j) in the sample space, and perform the following gradient-ascent process with an adaptive step size until convergence:

    u^{k+1} = [ Σ_{y ∈ N(ȳ_j)} y g(‖(u^k − y)/h‖²) ] / [ Σ_{y ∈ N(ȳ_j)} g(‖(u^k − y)/h‖²) ],  k = 1, 2, ...    (5)

Iterating expression (5) is guaranteed to bring u^k to a mode of the reduced feature space. In practice, we find that the mean shift clustering procedure (5) tends to converge in a very small number of steps, typically around 6. As the number of samples {ȳ_j}_{j=1}^m is much smaller than the number of points {x_i}_{i=1}^n, that is m ≪ n, the computational complexity of the mean shift vector computation is reduced significantly.
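The scatter-and-normalize step of Section 3.2.2 can be sketched as follows. A brute-force k-nearest-sample search stands in for the paper's KD-tree query, and the names are illustrative:

```python
import numpy as np

def affinity_samples(X, Y, k=2, h=1.0):
    """Scatter each point x_i to its k nearest samples y_j, accumulating
    Gaussian affinities z_ij = exp(-||x_i - y_j||^2 / h^2), then normalize
    each sample by its summed weights to get the affinity-based samples."""
    acc = np.zeros_like(Y, dtype=float)
    wsum = np.zeros(len(Y))
    for x in X:
        d2 = np.sum((Y - x) ** 2, axis=1)
        nn = np.argsort(d2)[:k]                    # k nearest samples of x
        z = np.exp(-d2[nn] / h ** 2)               # affinity z_ij
        acc[nn] += z[:, None] * x
        wsum[nn] += z
    hit = wsum > 0
    acc[hit] /= wsum[hit, None]                    # normalize by summed z_ij
    return acc
```

Each output row is the similarity-weighted average of the points that splatted onto that sample, pulling the leaf representative toward the local feature distribution.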
Furthermore, the mean shift is performed in the affinity-based reduced feature space, which leads to more accurate modes. We apply the KD-tree to perform the high-dimensional nearest neighbor search at each iteration. After performing the mean shift iterations for each sample ȳ_j, we obtain the modes {z_k}_{k=1}^s of the reduced feature space {ȳ_j}_{j=1}^m, which are the approximate modes of the original data set. All samples converging to the same mode are clustered together.

3.2.4. Mode interpolation

To interpolate the modes computed in the reduced feature space back to the original data set, one naive approach is to find the nearest mode z_l ∈ {z_k}_{k=1}^s for each point x_i. This can be considered a hard clustering. As an alternative, we give a soft clustering method which generates smoother clustering results. This method works by applying weighted interpolation.
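The soft scheme can be sketched as follows, assuming Gaussian affinities to the k nearest samples and a brute-force neighbor query in place of the paper's GPU KD-tree search; the helper name is ours:

```python
import numpy as np

def interpolate_modes(X, Y, modes, k=2, h=1.0):
    """Soft mode interpolation: each point blends the modes u_j of its
    k nearest samples y_j with normalized affinities z_ij, instead of
    hard-assigning the single nearest mode."""
    out = np.empty((len(X), modes.shape[1]))
    for i, x in enumerate(X):
        d2 = np.sum((Y - x) ** 2, axis=1)
        nn = np.argsort(d2)[:k]                    # nearest samples of x
        z = np.exp(-d2[nn] / h ** 2)               # affinities z_ij
        out[i] = (z[:, None] * modes[nn]).sum(axis=0) / z.sum()
    return out
```

When a point lies deep inside one cluster, the affinity to samples of other clusters is negligible, so the soft result coincides with the hard assignment; near cluster boundaries it blends modes, which is what produces the smoother results reported below.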

The mode interpolation works as follows. For each point x_i, we find the nearest samples N(x_i) in {y_j}_{j=1}^m. Each sample y_j ∈ N(x_i) converges to a mode u_j, that is, y_j → u_j. Based on the affinity z_ij between x_i and y_j, with the weights normalized so that Σ_j z_ij = 1, the final mode is computed as x_i → Σ_j z_ij u_j. When we compute the weighted mode interpolation over all samples {y_j}_{j=1}^m, similar to [FK09], we obtain cartoon-like clustering results. Note that in the interpolation stage, the size of N(x_i) is not always the same as that used in the sample space computation stage; in our experiments, we set the neighborhood size between 10 and 20 and obtain satisfying results. As the number of samples is significantly smaller than the number of original points, and we use the GPU-accelerated KD-tree nearest neighborhood search, the search is fast. In addition, since we determine the final mode for x_i based on the weighted similarity, the results are more accurate. Figure 2 shows the results using the two different mode selection methods: nearest-sample based mode selection, and weighted mode interpolation. As illustrated in Figure 2, the weighted mode interpolation gives smoother and more accurate results.

Figure 2: Image segmentation results comparison. (a) The original image, (b) the result using nearest-sample based mode selection, (c) the result using weighted mode interpolation.

3.3. Complexity analysis and GPU implementation

Our algorithm accelerates the mean shift procedure by computing a lower-resolution feature space and then interpolating the clustering result to the original data set. Using a KD-tree structure, we construct a reduced feature space of m feature vectors for the n input d-dimensional data points: {x_i}_{i=1}^n → {y_j}_{j=1}^m, with m ≪ n. Assuming the depth of the Gaussian KD-tree is O(log m), the complexity of tree construction is O(nd log m). Performing the nearest neighborhood search for each of the n input points to scatter values into the tree takes O(n(log m + d)) time. Performing τ iterations of the mean shift clustering procedure in the reduced space of m feature vectors takes O(τm(log m + d)) time. In the last stage, we up-sample the clustering results to the original data set, which takes O(n(log m + d)) time. Recalling that m ≪ n, this results in a total complexity of O(dn log n). Compared with the standard mean shift procedure with complexity O(τdn²), the proposed method is significantly accelerated.

Applying the method presented by [ZHWG08], a KD-tree is efficiently built on the GPU for the input points with high-dimensional feature vectors. Three stages of our algorithm, namely the point scattering, the mean shift clustering procedure in the reduced space, and the mode interpolation, all incorporate high-dimensional nearest neighbor search. As this search can be implemented in parallel on the GPU [AGDL09], our method is fast even when processing large data sets with high-dimensional feature vectors.

We implement the proposed algorithms in CUDA and run them on an NVIDIA GeForce GTX 285 (1GB) graphics card. We observe a typical speedup of 20x over our single-threaded CPU implementation running on a Pentium(R) Dual-Core CPU E5200 @ 2.50GHz with 2GB RAM, which allows our method to be applied in an interactive mean shift clustering framework for moderate-sized data sets.

4. Applications and comparisons

We apply the proposed fast mean shift clustering to the following applications: image segmentation, video segmentation, geometry model segmentation, and animated object segmentation. We also present comparisons in both performance and segmentation quality with the most related methods. Our approach is implemented in C++ on a machine equipped with a Pentium(R) Dual-Core CPU E5200 @ 2.50GHz and 2GB RAM. The GPU acceleration is based on CUDA [http://www.nvidia.com/cuda] and runs on an NVIDIA GeForce GTX 285 (1GB) graphics card.

4.1. Image segmentation

We apply the proposed fast mean shift method to image segmentation.
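Collecting points whose iterations converge to the same mode requires merging converged positions that agree only up to the iteration tolerance; a minimal sketch, where the merge tolerance is an assumption of this sketch:

```python
import numpy as np

def label_modes(modes, tol=0.5):
    """Group converged points into segments: modes within tol of an
    existing cluster center share that center's label; otherwise a
    new center (and label) is created. Merging nearby limit points
    this way is the standard post-process for mean shift clustering."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if np.linalg.norm(m - c) < tol:        # close enough: same mode
                labels.append(idx)
                break
        else:                                      # no match: new cluster
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)
```

For image segmentation the resulting labels, reshaped to the image grid, directly give the segment map.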
All pixels that converge to the same mode are collected together and considered to be the same segment. In the image case, we define the feature vector of a pixel as x_i = (σ_p p_i, σ_c c_i), comprising its position p_i = (x, y) and its color value c_i = (r, g, b) (Lab space), weighted by parameters σ_p and σ_c. Thus, each pixel x_i is a five-dimensional vector.

Figure 3 presents segmentation results generated by applying our fast mean shift method with different sampling factors. As illustrated in Figure 3, there are 6 × 10^6 pixels in the original image. Even with a very high sampling factor such as n/m = 4,096, the segmentation results are still pleasing. With far fewer samples, the image can be clustered at high speed even without GPU acceleration. It takes only 0.958 seconds in total on the CPU to perform mean shift clustering with sampling factor n/m = 1,024.

In Figure 4, we present image segmentation results for images of different sizes, and compare with the standard mean shift method [CM02], the accelerated method of Paris and Durand [PD07], and the compactly represented KDE of Freedman and Kisilev [FK09]. We compare with these methods on both performance and segmentation quality. Compared with [CM02], some weak features may be lost by our method, since they may be merged into salient features during the Gaussian KD-tree clustering; however, the salient features may be better preserved, as illustrated in Figure 4. As shown in Figure 4, given the same sampling factor, our method generates higher quality results than [PD07] and [FK09], especially in regions with weak edges. The complexity of [PD07] depends on the dimension d of the points; when processing high-dimensional data, that method does not show much advantage. Our method, however, is fast even with a low sampling factor and high-dimensional data sets, as shown in Table 1. It takes our method 5.91 seconds to cluster 6.6 × 10^6 pixels on the CPU, while it takes 105.5 seconds using [PD07]. Incorporating the GPU implementation, our method shows an even greater advantage when processing large data sets with high-dimensional feature vectors: it takes less than 0.2 seconds to cluster 6.6 × 10^6 pixels on the GPU.

4.2. Video segmentation

Mean shift clustering can also be used for video segmentation [DeM02]. As streaming video usually contains millions of pixels, practical video segmentation using mean shift clustering depends heavily on the performance of the mean shift procedure. In addition, compared with image data, the dimension of the feature space is higher, which further increases the computational complexity of the mean shift procedure. Thus, it is impracticable to segment long video streams with standard mean shift clustering and no acceleration techniques.
However, using our fast mean shift clustering method, we can perform video segmentation at an interactive rate. We define a feature space {x_i}_{i=1}^n of seven dimensions for the video. The feature vector at each pixel x_i = (σ_p p_i, σ_c c_i, σ_t t_i, σ_ς ς_i) comprises its position p_i (x and y coordinates), color c_i (Lab color vector), time t_i and motion ς_i. These four kinds of features can be weighted by the parameters σ_p, σ_c, σ_t and σ_ς, whose values are defined as constants for all pixels. As illustrated in Figure 5, there are 1.44 × 10^7 pixels in the video, and we first cluster the video with sampling factor 16,348 using the KD-tree. It takes our method 16.23 seconds to perform the mean shift clustering on the CPU, and 1.2 seconds on the GPU; it takes 320.3 seconds on the CPU using [PD07].

Figure 5: Video segmentation. Left column: input video (600 × 480 × 50); from top to bottom, the 1st, 28th and 50th frames. Right column: segmentation results.

4.3. Mesh model segmentation

Similar to image and video segmentation in image processing and analysis, surface segmentation is one of the most important operations in geometry processing and modeling. Yamauchi et al. [YLL 05] applied mean shift clustering to mesh partitioning and produced feature-sensitive mesh segmentations. In Figure 6, we show mesh segmentation using the proposed fast mean shift clustering. We define a feature space {x_i}_{i=1}^n of six dimensions for the mesh model. The feature vector at vertex x_i = (σ_p p_i, σ_v v_i) comprises its position p_i (x, y and z coordinates) and normal v_i (a three-dimensional vector), which can be weighted by the parameters σ_p and σ_v. The values of σ_p and σ_v are defined as global constants for all vertices. In Figure 6, applying this variant of the mean shift procedure to the mesh model, we obtain a patch-type segmentation, and the results are sensitive to the salient surface features. Furthermore, we adapt the hierarchical image segmentation method [PD07] to the mesh model and generate hierarchical segmentation results.
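The six-dimensional mesh feature space just described, together with the final upsampling step (the "modes interpolation" column of Table 1), can be sketched as follows. This is illustrative only: the Gaussian weights are evaluated exactly here, and the most-heavily-weighted-sample assignment is a hypothetical stand-in for the paper's KD-tree-based Gaussian importance weighting.

```python
import numpy as np

def mesh_features(positions, normals, sigma_p, sigma_v):
    """Per-vertex 6-D features: weighted position and weighted normal."""
    return np.hstack([sigma_p * positions, sigma_v * normals])

def interpolate_modes(X, samples, sample_modes, bandwidth=1.0):
    """Upsample modes computed on m samples back to all n points.

    Each point receives the mode of its most heavily Gaussian-weighted
    sample; the paper obtains these weights via the Gaussian KD-tree
    rather than the exact pairwise computation used here.
    """
    d2 = ((X[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return sample_modes[np.argmax(w, axis=1)]
```

Because the mean shift iterations run only on the m samples, the full-resolution cost is reduced to this single weighting pass over the n original points.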
Note that our fast mean shift method is guaranteed to produce segmentation results that capture the meaningful components; no additional computation is needed to obtain the hierarchical results. For a mesh model with 70,994 vertices, it takes our method 0.31 seconds to compute the results. We also give comparison results with [YLL 05]. As shown in Figure 6, our approach generates more convincing results.

Figure 3: Image segmentation using different sampling factors. (a) Original image, (b) n/m = 256, (c) n/m = 1,024, (d) n/m = 4,096, (e) n/m = 16,384.

Table 1: Performance of our method on different kinds of data sets (times in seconds). Note that clustering for the animated object (Horse) is performed on the GPU; all other data sets are processed on the CPU.

Data set   Size             d    n/m     KD-tree construction   Modes computing   Modes interpolation   Total time
bird       564 × 752        5    1024    0.904                  0.024             0.030                 0.958
Obama      1128 × 1504      5    4096    1.013                  0.016             0.108                 1.137
castle     2256 × 3008      5    4096    5.105                  0.088             0.717                 5.910
Video      600 × 480 × 50   7    16348   13.898                 0.084             2.248                 16.23
Mesh       70994            6    512     0.225                  0.020             0.065                 0.310
Horse      30784 × 60       30   4094    1.311                  0.053             0.136                 1.500

Figure 6: Hierarchical decomposition of a static mesh model. (a)-(e) are the results of our proposed hierarchical decomposition method; (f) is the result of [YLL 05].

© 2010 The Author(s)

4.4. Animated geometry object segmentation

The proposed fast mean shift can also be used to accelerate the segmentation of animated objects (geometry surface sequences). Inspired by [LXL ], we first compute an approximately invariant signature vector ξ_i for each vertex of the animated object [LG05]; this signature is local, high-dimensional, and approximately invariant under rigid/scaling transformations of the shape. Both the geometric attributes (vertex position p_i and its normal v_i) and the local signature vector ξ_i of each vertex x_i are weighted by the parameters σ_p, σ_v and σ_ξ, constructing the high-dimensional feature space of the animated object: x_i = (σ_p p_i, σ_v v_i, σ_ξ ξ_i). The vertices of the animated object can then be clustered efficiently using the proposed GPU-accelerated mean shift clustering algorithm. In Figure 7, we give the animated object segmentation results. There are in total 1.6 × 10^6 vertices in the animated object with 50 frames. We use d = 24 dimensions for the signature vector ξ (ξ ∈ R^24) in our implementation. It takes 1.5 seconds on the GPU to complete the mean shift iterations (10 iterations in this example) with sampling factor n/m = 4,096. We also give comparison results with [WB10]. Wuhrer and Brunton [WB10] performed animated object segmentation in the dual space of the mesh model.
They found near-rigid segments whose boundaries are located at regions of large deformation, and assumed that the vertex-to-vertex correspondences of the input meshes were known. As an alternative, our method relies on the local high-dimensional signature vector information for clustering, combined with the proposed fast mean shift clustering technique, which yields more meaningful and temporally coherent decomposed parts at higher speed.

Figure 7: Animated object decomposition result comparison. Top row: our results; bottom row: Wuhrer and Brunton [WB10].

5. Conclusion

In this paper, we propose a new algorithm for accelerating mean shift clustering. Using a KD-tree, we adaptively group the original data set into clusters of similar features, and these clusters constitute the samples of the original data set. We then run the mean shift procedure on the greatly reduced sampled feature space to generate the modes, and finally, using Gaussian importance weights, we upsample the computed modes back to the original data set to obtain the final clustering results. Our algorithm significantly speeds up the computation without sacrificing accuracy. Our method is especially useful for segmenting high-resolution images, long video sequences, and geometry models with large point sets.

Figure 4: Image segmentation comparisons. (a) Original image, (b) image segmentation using standard mean shift (EDISON), (c) using [PD07], (d) using [FK09], (e) using our proposed method.

Acknowledgment

We would like to thank the anonymous reviewers for their valuable comments and insightful suggestions. This work was partly supported by NSFC (No. 60803081), the National High Technology Research and Development Program of China (863 Program) (No. 2008AA121603), the Natural Science Foundation of Hubei Province (2008CDB350), the State Key Lab of CAD&CG (No. A0808), and the Fundamental Research Funds for the Central Universities (6081005).

References

[AGDL09] ADAMS A., GELFAND N., DOLSON J., LEVOY M.: Gaussian kd-trees for fast high-dimensional filtering. ACM Transactions on Graphics (TOG) 28, 3 (2009), 21.

[AMN 98] ARYA S., MOUNT D., NETANYAHU N., SILVERMAN R., WU A.: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM (JACM) 45, 6 (1998), 891-923.

[AP08] AN X., PELLACINI F.: AppProp: all-pairs appearance-space edit propagation. ACM Trans. Graph. 27, 3 (2008), 40.

[BC04] BARASH D., COMANICIU D.: A common framework for nonlinear diffusion, adaptive smoothing, bilateral filtering and mean shift. Image and Vision Computing 22, 1 (2004), 73-81.

[BCM05] BUADES A., COLL B., MOREL J.: A non-local algorithm for image denoising. In CVPR 2005 (2005), pp. 60-65.

[Buc07] BUCK I.: GPU computing: Programming a massively parallel processor. In Proceedings of the International Symposium on Code Generation and Optimization (2007), IEEE Computer Society, p. 17.

[Che95] CHENG Y.: Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 17, 8 (1995), 790-799.

[CM02] COMANICIU D., MEER P.: Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 5 (2002), 603-619.

[CP06] CARREIRA-PERPINÁN M.: Acceleration strategies for Gaussian mean-shift image segmentation. In CVPR (2006), vol. 1.

[CRM03] COMANICIU D., RAMESH V., MEER P.: Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 5 (2003), 564-577.

[DeM02] DEMENTHON D.: Spatio-temporal segmentation of video by hierarchical mean shift analysis. Language 2 (2002).

[DS02] DECARLO D., SANTELLA A.: Stylization and abstraction of photographs. In SIGGRAPH (2002), ACM, pp. 769-776.

[EDD03] ELGAMMAL A., DURAISWAMI R., DAVIS L.: Efficient kernel density estimation using the fast gauss transform with applications to color modeling and tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 11 (2003), 1499-1504.

[FDCO03] FLEISHMAN S., DRORI I., COHEN-OR D.: Bilateral mesh denoising. ACM Transactions on Graphics (TOG) 22, 3 (2003), 950-953.
[FH75] FUKUNAGA K., HOSTETLER L.: The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory 21, 1 (1975), 32-40.

[FK09] FREEDMAN D., KISILEV P.: Fast mean shift by compact density representation.

[GSM03] GEORGESCU B., SHIMSHONI I., MEER P.: Mean shift based clustering in high dimensions: A texture classification example. In ICCV (2003), pp. 456-463.

[HSA91] HANRAHAN P., SALZMAN D., AUPPERLE L.: A rapid hierarchical radiosity algorithm. ACM SIGGRAPH Computer Graphics 25, 4 (1991), 206.

[HSHH07] HORN D., SUGERMAN J., HOUSTON M., HANRAHAN P.: Interactive k-d tree GPU raytracing. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (2007), ACM, p. 174.

[JDD03] JONES T., DURAND F., DESBRUN M.: Non-iterative, feature-preserving mesh smoothing. ACM Transactions on Graphics 22, 3 (2003), 943-949.

[LG05] LI X., GUSKOV I.: Multi-scale features for approximate alignment of point-based surfaces. In SGP (2005), Eurographics Association, p. 217.

[LXL ] LIAO B., XIAO C., LIU M., DONG Z., PENG Q.: Fast hierarchical animated object decomposition using approximately invariant signature. Submitted to The Visual Computer.

[PD06] PARIS S., DURAND F.: A fast approximation of the bilateral filter using a signal processing approach. In ECCV (2006).

[PD07] PARIS S., DURAND F.: A topological approach to hierarchical segmentation using mean shift. In CVPR (2007), pp. 1-8.

[TM98] TOMASI C., MANDUCHI R.: Bilateral filtering for gray and color images.

[WB10] WUHRER S., BRUNTON A.: Segmenting animated objects into near-rigid components. The Visual Computer 26, 2 (2010), 147-155.

[WBC 05] WANG J., BHAT P., COLBURN R., AGRAWALA M., COHEN M.: Video cutout. ACM Transactions on Graphics 24, 3 (2005), 585-594.

[WLGR07] WANG P., LEE D., GRAY A., REHG J.: Fast mean shift with accurate and stable convergence. In Workshop on Artificial Intelligence and Statistics (AISTATS) (2007).

[WQ04] WEI Y., QUAN L.: Region-based progressive stereo matching. In CVPR (2004), vol. 1, pp. 106-113.
[WTXC] WANG J., THIESSON B., XU Y., COHEN M.: Image and video segmentation by anisotropic kernel mean shift. In Computer Vision - ECCV 2004, pp. 238-249.

[WXSC04] WANG J., XU Y., SHUM H., COHEN M.: Video tooning. In ACM SIGGRAPH 2004 Papers (2004), ACM, pp. 574-583.

[XLJ 09] XU K., LI Y., JU T., HU S., LIU T.: Efficient affinity-based edit propagation using KD tree. In ACM SIGGRAPH Asia 2009 Papers (2009), ACM, pp. 1-6.

[XNT10] XIAO C., NIE Y., TANG F.: Efficient edit propagation using hierarchical data structure. IEEE Transactions on Visualization and Computer Graphics (2010).

[YDDD03] YANG C., DURAISWAMI R., DEMENTHON D., DAVIS L.: Mean-shift analysis using quasi-Newton methods. In ICIP (2003), vol. 3, pp. 447-450.

[YDGD03] YANG C., DURAISWAMI R., GUMEROV N., DAVIS L.: Improved fast gauss transform and efficient kernel density estimation. In ICCV (2003), pp. 664-671.

[YLL 05] YAMAUCHI H., LEE S., LEE Y., OHTAKE Y., BELYAEV A., SEIDEL H.: Feature sensitive mesh segmentation with mean shift. In SMI (2005), vol. 243, IEEE.

[ZHWG08] ZHOU K., HOU Q., WANG R., GUO B.: Real-time kd-tree construction on graphics hardware. In ACM SIGGRAPH Asia 2008 Papers (2008), ACM, pp. 1-11.