arXiv: v1 [cs.CR] 22 Apr 2015


Differentially Private k-Means Clustering

Dong Su #, Jianneng Cao, Ninghui Li #, Elisa Bertino #, Hongxia Jin

# Department of Computer Science, Purdue University, {su1, ninghui, bertino}@cs.purdue.edu
Institute for Infocomm Research, Singapore, caojn@i2r.a-star.edu.sg
Samsung Information Systems of America, hongxia.jin@sisa.samsung.com

ABSTRACT

There are two broad approaches for differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the tradeoff of interactive vs. non-interactive approaches and propose a hybrid approach that combines the two, using k-means clustering as an example. In the hybrid approach to differentially private k-means clustering, one first uses a non-interactive mechanism to publish a synopsis of the input dataset, then applies the standard k-means clustering algorithm to learn cluster centroids, and finally uses an interactive approach to further improve these cluster centroids. We analyze the error behavior of both non-interactive and interactive approaches and use this analysis to decide how to allocate the privacy budget between the non-interactive step and the interactive step. Results from extensive experiments support our analysis and demonstrate the effectiveness of our approach.

1. INTRODUCTION

In recent years, a large and growing body of literature has investigated differentially private data analysis. Broadly, this work can be classified into two approaches. The interactive approach aims at developing customized differentially private algorithms for specific data mining tasks. One identifies the queries that need to be answered for the data mining task, analyzes their sensitivity, and then answers them by adding appropriate noise. The non-interactive approach aims at computing, in a differentially private way, a synopsis of the input dataset, which can then be used to generate a synthetic dataset or to directly support various data mining tasks.

An intriguing question is which of the two approaches is better. Given an input dataset D, the desired privacy parameter ε, which we refer to as the privacy budget, and one or more data analysis tasks, should one use the interactive approach or the non-interactive approach? This question is largely open. In general, the non-interactive approach has the advantage that once a synopsis is constructed, many analysis tasks can be conducted on the synopsis. In contrast, using the interactive approach, one is limited to executing the interactive algorithm just once; any additional access to the dataset would violate differential privacy. Therefore, strictly speaking, a dataset can serve only one analyst, and for only one task. (One could divide the privacy budget among multiple analysts and/or multiple tasks, but then the accuracy for each task will suffer.) On the other hand, because the interactive approach is designed specifically for a particular data mining task, one might expect that, under the same privacy budget, it should be able to produce more accurate results than the non-interactive approach.

In this paper we initiate the study of the tradeoff of interactive vs. non-interactive approaches, using k-means clustering as the example. Clustering analysis plays an essential role in data management tasks. Clustering has also been used as a prime example to illustrate the effectiveness of interactive differentially private data analysis [, 11,,, 1,, 0]. There are three state-of-the-art interactive algorithms.
The first is the differentially private version of the Lloyd algorithm [, ], which we call DPLloyd. The second algorithm uses the sample and aggregation framework [] and is implemented in the GUPT system [1], which we call GkM. The third and most recent one, which we call PGkM, uses PrivGene [0], a framework for differentially private model fitting based on genetic algorithms.

To the best of our knowledge, performing k-means clustering using the non-interactive approach has not been explicitly proposed in the literature. In this paper, we propose to combine the following non-interactive differentially private synopsis algorithm with k-means clustering. The dataset is viewed as a set of points over a d-dimensional domain, which is divided into M equal-size cells, and a noisy count is obtained from each cell. A key decision is to choose the parameter M. A larger M value means lower average counts for each cell, and therefore noisy counts are more likely to be dominated by noise. A smaller M value means larger cells, and therefore one has less accurate information about where the points are. We propose a method that sets $M = (N\epsilon/10)^{2d/(2+d)}$, which is derived by extending the analysis in [], which aims to minimize errors when answering rectangular range queries for 2-dimensional data, to the higher-dimensional case. We call the resulting k-means algorithm EUGkM, where EUG stands for Extended Uniform Grid.

We conducted extensive experimental evaluations of these algorithms on six external datasets and on synthetic datasets generated by varying the dimension d and the number of clusters k. The experimental results are quite interesting. GkM was introduced after DPLloyd and was claimed to have an accuracy advantage over DPLloyd, and PGkM was introduced later and compared with GkM. However, we found that DPLloyd is the best method among these three. In the comparison of DPLloyd and GkM in [1], DPLloyd was run using a much larger number of iterations than necessary, and thus performed poorly. In [0], PGkM was compared only with GkM, and not with DPLloyd. More specifically, we found that GkM is by far the worst among all methods. Through experimental analysis of the sources of the errors, we found that it is possible to dramatically improve the accuracy of GkM by choosing smaller partitions in the sample and aggregation framework. After this improvement, GkM becomes competitive with PGkM. However, DPLloyd, the earliest method, is clearly the best performing among the interactive algorithms. Through analysis, we found why DPLloyd outperforms PGkM. The genetic-programming-style PGkM needs more iterations to converge. When these algorithms are made differentially private, the privacy budget is divided among all iterations, so having more iterations means more noise is added in each iteration. Therefore, the more direct DPLloyd outperforms PGkM.

The most intriguing results are those comparing DPLloyd with EUGkM. For most datasets, EUGkM performs much better than DPLloyd. For a few, they perform similarly, and for two datasets DPLloyd outperforms EUGkM. Through further theoretical and empirical analysis, we found that while the performance of both algorithms is greatly affected by the two key parameters k and d, they are affected differently by these two parameters. DPLloyd scales worse when k increases, while EUGkM scales worse when d increases. Again we use analysis to demonstrate why this is the case.

An intriguing question is whether we can further improve DPLloyd. The accuracy of DPLloyd is affected by two key factors: the number of iterations and the choice of initial centroids. In fact, these two are closely related.
If the initially chosen centroids are very good and close to the true centroids, one needs perhaps only one iteration to improve them, and this reduction in the number of iterations means that little noise is added. This leads us to propose a novel hybrid method that combines non-interactive EUGkM with interactive DPLloyd. We first use half the privacy budget to run EUGkM, and then use the centroids output by EUGkM as the initial centroids for one round of DPLloyd. Such a method, however, may not actually outperform EUGkM, especially when the privacy budget ε is small, since then one round of DPLloyd may actually worsen the centroids. We use our error analysis formulas to determine whether there is sufficient privacy budget for such a hybrid approach to outperform EUGkM. We then experimentally validate the effectiveness of the hybrid approach.

The hybrid idea is applicable to general private data analysis tasks that require parameter tuning. In the no-privacy setting, one typically tunes parameters by building models for several parameter values and selecting the one that offers the best utility. Under differential privacy, this kind of parameter tuning procedure does not work well, since the limited privacy budget might be over-divided by trying many different parameters. Chaudhuri et al. [] proposed a method for private parameter tuning that takes advantage of parallel composition. The idea is to build private models with different parameters on separate subsets of the dataset and evaluate the models on a validation set. The best parameter is chosen via the exponential mechanism, with the quality function defined by the evaluation score. However, this approach does not scale well to a large set of candidate parameters, which might leave each data block with a very small number of points and therefore lead to very inaccurate models. Our proposed hybrid approach offers a better solution. We can first publish private synopses of the input data, on which we try a large set of parameters.
Then, we run the interactive private analysis with the selected parameter on the input dataset to get the final result.

In this paper we advance the state of the art in differentially private data mining in several ways. First, we have introduced non-interactive methods for differentially private k-means clustering, which are highly effective and often outperform state-of-the-art interactive methods. Second, we have extensively evaluated three interactive methods and one non-interactive method, and analyzed their strengths and weaknesses. Third, we have developed techniques to analyze the error resulting from both DPLloyd and EUGkM. Finally, we introduce the novel concept of a hybrid approach to differentially private data analysis, which is so far the best approach to k-means clustering. We conjecture that the hybrid approach may prove useful in other analysis tasks as well.

The rest of the paper is organized as follows. In Section 2, we discuss related work. In Section 3, we give preliminary information about differential privacy and k-means clustering. In Section 4, we describe the three existing interactive approaches, DPLloyd, GkM, and PGkM, and one non-interactive approach, EUGkM. In Section 5, we show experimental results comparing the performance of the interactive and non-interactive approaches, and analyze their strengths and weaknesses. In Section 6, we study the error behavior of DPLloyd and EUGkM, introduce the hybrid approach, and compare these with existing algorithms. We conclude in Section 7.

2. RELATED WORK

The notion of differential privacy was developed in a series of papers [, 1,, 1, ]. Several primitives for answering a single query differentially privately have been proposed. Dwork et al. [1] introduced the method of adding Laplacian noise scaled with the sensitivity. McSherry and Talwar [] introduced a more general exponential mechanism. Nissim et al. [] proposed adding noise proportional to local sensitivity. Blum et al.
[] proposed a sublinear query (SuLQ) database model for interactively answering a sublinear number (in the size of the underlying database) of count queries differentially privately. The users (e.g., machine learning algorithms) issue queries and get responses to which Laplace noise has been added. They applied the SuLQ framework to k-means clustering and some other machine learning algorithms. McSherry [] built the PINQ (Privacy INtegrated Queries) system, a programming platform which provides several differentially private primitives to enable data analysts to write privacy-preserving applications. These private primitives include noisy count, noisy sum, noisy average, and the exponential mechanism. The DPLloyd algorithm, which we compare against in this paper, has been implemented using these primitives. Another programming framework with differential privacy support is Airavat, which makes programs using the MapReduce framework differentially private [].

Nissim et al. [, ] proposed the sample and aggregate framework (SAF), and used k-means clustering as a motivating application for SAF. The SAF framework has been implemented in the GUPT system [1] and is evaluated with k-means clustering. This is the GkM algorithm that we compare with in this paper. Dwork [11] suggested applying a geometrically decreasing privacy budget allocation strategy among the iterations of k-means, whereas we use an increasing sequence. A geometrically decreasing sequence causes later rounds to use increasingly less privacy budget, resulting in higher and higher distortion with each new iteration. Zhang et al. [0] proposed a general private model fitting framework based on genetic algorithms. The PGkM approach in this paper is an instantiation of that framework for k-means clustering.

Interactive methods for other data mining tasks have been proposed. McSherry and Mironov [] adapted algorithms producing recommendations from collective user behavior to satisfy differential privacy. Friedman and Schuster [1] made the ID3 decision tree construction algorithm differentially private. Chaudhuri and Monteleoni [] proposed a differentially private logistic regression algorithm. Zhang et al. [1] introduced the functional mechanism, which perturbs an optimization objective to satisfy differential privacy, and applied it to linear regression and logistic regression. Differentially private frequent itemset mining has been studied in [, ]. The tradeoffs of interactive and non-interactive approaches in these domains are interesting future research topics.

Most non-interactive approaches aim at developing solutions to answer histogram or range queries accurately [1,, 1, ]. Dwork et al. [1] calculate the frequency of values and release their distribution differentially privately. Such a method makes the variance of the query result increase linearly with the query size. To address this issue, Xiao et al. [] proposed a wavelet-based method, by which the variance is polylogarithmic in the query size. Hay et al. [1] organize the count queries in a hierarchy, and improve the accuracy by enforcing consistency between the noisy count value of a parent node and those of its children. Cormode et al. [] adapted standard spatial indexing techniques, such as the quadtree and the kd-tree, to decompose the data space differentially privately. Qardaji et al. [] proposed the UG and AG methods for publishing 2-dimensional datasets. Mohammed et al. [] tailored the non-interactive data release to the construction of decision trees. Roth et al.
[] studied the problem of how to release synthetic data differentially privately for any set of count queries specified in advance. They proposed an ε-differentially private mechanism whose error scales only logarithmically with the number of queries being answered. However, it is not computationally efficient (super-polynomial in the data universe size). Subsequent work includes [1, 0, 1,, 1, 1]. One typical work is the private multiplicative weights mechanism [0], which answers count queries interactively and whose error also scales logarithmically with the number of queries seen so far. Its running time is only linear in the data universe size.

3. BACKGROUND

3.1 Differential Privacy

Informally, differential privacy requires that the output of a data analysis mechanism should be approximately the same, even if any single tuple in the input database is arbitrarily added or removed.

DEFINITION 1 (ε-DIFFERENTIAL PRIVACY [, 1]). A randomized mechanism $A$ gives ε-differential privacy if for any pair of neighboring datasets $D$ and $D'$, and any $S \in \mathrm{Range}(A)$,

$$\Pr[A(D) = S] \le e^{\epsilon} \cdot \Pr[A(D') = S].$$

In this paper we consider two datasets $D$ and $D'$ to be neighbors if and only if either $D = D' + t$ or $D' = D + t$, where $D + t$ denotes the dataset resulting from adding the tuple $t$ to the dataset $D$. We use $D \simeq D'$ to denote this. This protects the privacy of any single tuple, because adding or removing any single tuple results in $e^{\epsilon}$-multiplicative-bounded changes in the probability distribution of the output.

Differential privacy is composable in the sense that combining multiple mechanisms that satisfy differential privacy for $\epsilon_1, \ldots, \epsilon_m$ results in a mechanism that satisfies ε-differential privacy for $\epsilon = \sum_i \epsilon_i$. Because of this, we refer to ε as the privacy budget of a privacy-preserving data analysis task. When a task involves multiple steps, each step uses a portion of ε so that the sum of these portions is no more than ε.
There are several approaches for designing mechanisms that satisfy ε-differential privacy, including the Laplace mechanism [1] and the exponential mechanism []. The Laplace mechanism computes a function $g$ on the dataset $D$ by adding to $g(D)$ a random noise, the magnitude of which depends on $GS_g$, the global sensitivity or the $L_1$ sensitivity of $g$. Such a mechanism $A_g$ is given below:

$$A_g(D) = g(D) + \mathsf{Lap}\left(\frac{GS_g}{\epsilon}\right),$$

where $GS_g = \max_{(D,D'):\, D \simeq D'} |g(D) - g(D')|$ and $\Pr[\mathsf{Lap}(\beta) = x] = \frac{1}{2\beta} e^{-|x|/\beta}$.

In the above, $\mathsf{Lap}(\beta)$ denotes a random variable sampled from the Laplace distribution with scale parameter β.

3.2 k-means Clustering Algorithms

The k-means clustering problem is as follows: given a d-dimensional dataset $D = \{x^1, x^2, \ldots, x^N\}$, partition the data points in $D$ into $k$ sets $O = \{O^1, O^2, \ldots, O^k\}$, where $o^j$ denotes the centroid of $O^j$, so that the Normalized Intra-Cluster Variance (NICV)

$$\frac{1}{N} \sum_{j=1}^{k} \sum_{x^l \in O^j} \| x^l - o^j \|^2 \qquad (1)$$

is minimized.

The standard k-means algorithm is Lloyd's algorithm []. The algorithm starts by selecting k points as the initial choices for the centroids. It then tries to improve these centroid choices iteratively until no improvement can be made. In each iteration, one first partitions the data points into k clusters, with each point assigned to the same cluster as its nearest centroid. Then, one updates each centroid to be the center of the data points in its cluster:

$$\forall i \in [1..d]: \quad o^j_i \leftarrow \frac{\sum_{x^l \in O^j} x^l_i}{|O^j|}, \qquad (2)$$

where $j = 1, 2, \ldots, k$, and $x^l_i$ and $o^j_i$ are the i-th dimensions of $x^l$ and $o^j$, respectively. The algorithm continues by alternating between data partition and centroid update until it converges.

4. THE INTERACTIVE AND NON-INTERACTIVE APPROACHES

In this section, we describe interactive approaches and non-interactive approaches to differentially private k-means clustering.

4.1 Interactive Approaches

4.1.1 DPLloyd

A differentially private version of Lloyd's k-means algorithm was first proposed by Blum et al. [] and was later implemented in the PINQ system [], a platform for interactive privacy-preserving data analysis. We call this the DPLloyd approach.
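As a point of reference for the private variants that follow, the non-private Lloyd iteration of Equation (2) and the NICV objective of Equation (1) can be sketched as below. This is a minimal illustration rather than any of the implementations evaluated in the paper. The function names (`sq_dist`, `nicv`, `lloyd_step`, `lloyd`) and the rule that an empty cluster keeps its previous centroid are assumptions made for the sketch.

```python
def sq_dist(x, o):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, o))


def nicv(points, centroids):
    """Equation (1): average squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(x, o) for o in centroids) for x in points) / len(points)


def lloyd_step(points, centroids):
    """One iteration: partition points by nearest centroid, then apply Equation (2)."""
    d = len(points[0])
    sums = [[0.0] * d for _ in centroids]
    counts = [0] * len(centroids)
    for x in points:
        j = min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
        counts[j] += 1
        for i in range(d):
            sums[j][i] += x[i]
    # Assumption of this sketch: an empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def lloyd(points, centroids, max_iter=100):
    """Alternate partition and update until the centroids stop changing."""
    for _ in range(max_iter):
        updated = lloyd_step(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

For example, `lloyd([[0, 0], [0, 2], [10, 0], [10, 2]], [[0, 0], [10, 0]])` moves the two centroids to the middle of each pair of points, and `nicv` then measures the remaining spread.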
DPLloyd differs from the standard Lloyd algorithm in the following ways. First, Laplacian noise is added to the iterative update step in the Lloyd algorithm. Second, the number of iterations needs to be fixed in order to decide how much noise needs to be added in each iteration. Each iteration requires computing the total number of points in a cluster and, for each dimension, the sum of the coordinates of the data points in a cluster. Let t be the number of iterations, and d be the number of dimensions. Then, each tuple is involved in answering dt sum queries and t count queries. To bound the sensitivity of the sum query to a small number r, each dimension is normalized to [−r, r]. Thus, the global sensitivity of DPLloyd is $(dr+1)t$, and each query is answered by adding Laplacian noise $\mathsf{Lap}\left(\frac{(dr+1)t}{\epsilon}\right)$.

There are two issues that greatly impact the accuracy of DPLloyd. The first is the number of iterations. A large number of iterations causes too much noise to be added. A small number of iterations may be insufficient for the algorithm to converge. In [], the number of iterations is fixed in advance to a value that seems to work well for many settings.

The second is the quality of the initial centroids. A poor choice of initial centroids can result in converging to a local optimum that is far from the global optimum, or in not converging after the given number of iterations. While many methods for choosing the initial points have been developed [], these methods were developed without the privacy concern and need access to the dataset. In [], k points chosen uniformly at random from the domain are used as the initial centroids. We have observed empirically that this can perform poorly in some settings, since some randomly chosen initial centroids are close together. We thus introduce an improved method for choosing initial centroids that is similar to the concept of sphere packing. Given a radius a, we randomly generate k centroids one by one such that each new centroid is at distance at least a from each border of the domain and at distance at least 2a from any existing centroid. When a randomly chosen point does not satisfy this condition, we generate another point. When we have failed repeatedly, we conclude that the radius a is too large, and try a smaller radius. We use a binary search to find the maximal value of a such that the process of choosing k centroids succeeds. This process is data independent.
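The noisy update at the heart of DPLloyd can be sketched as follows, assuming every coordinate has already been normalized to [−r, r] and that each count and each per-dimension sum receives independent Laplace noise of scale (dr + 1)t/ε. The names `laplace` and `dplloyd` are hypothetical. Two choices are assumptions of this sketch rather than details taken from [] or the PINQ implementation: a cluster whose noisy count falls below 1 keeps its previous centroid, and updated centroids are clamped to the domain. Both act only on noisy values, so they are post-processing.

```python
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(x, o):
    return sum((a - b) ** 2 for a, b in zip(x, o))


def dplloyd(points, centroids, eps, t, r=1.0, rng=None):
    """Run t noisy Lloyd iterations on points whose coordinates lie in [-r, r]."""
    rng = rng or random.Random()
    d = len(points[0])
    scale = (d * r + 1) * t / eps  # noise scale applied to every count and sum query
    for _ in range(t):
        sums = [[0.0] * d for _ in centroids]
        counts = [0] * len(centroids)
        for x in points:
            j = min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
            counts[j] += 1
            for i in range(d):
                sums[j][i] += x[i]
        updated = []
        for j in range(len(centroids)):
            noisy_count = counts[j] + laplace(scale, rng)
            noisy_sums = [s + laplace(scale, rng) for s in sums[j]]
            if noisy_count < 1.0:
                # Assumption: an unusable noisy count leaves the centroid unchanged.
                updated.append(list(centroids[j]))
                continue
            # Assumption: clamp the noisy centroid back into the domain.
            updated.append([max(-r, min(r, s / noisy_count)) for s in noisy_sums])
        centroids = updated
    return centroids
```

Because the scale grows linearly in t, doubling the number of iterations doubles the noise added to every query, which is the tension between convergence and accuracy described above.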
4.1.2 GkM

The k-means clustering problem was also used to motivate the sample and aggregate framework (SAF) for satisfying differential privacy, which was developed in [, ] and implemented in the GUPT system [1]. Given a dataset D and a function f, SAF first partitions D into l blocks, then evaluates f on each block, and finally privately aggregates the results from all blocks into a single one. Since any single tuple in D falls in one and only one block, adding one tuple can affect at most one block's result, limiting the sensitivity of the aggregation step. Thus one can add less noise in the final step to satisfy differential privacy.

As far as we know, GUPT [1] is the only implementation of SAF. The authors of [1] implemented k-means clustering and used it to illustrate the effectiveness of GUPT. We call this algorithm GkM. Given a dataset D, it first partitions D into l blocks $D_1, D_2, \ldots, D_l$. Then, for each block $D_b$ ($1 \le b \le l$), it calculates its k centroids $o^{b,1}, o^{b,2}, \ldots, o^{b,k}$. Finally, it averages the centroids calculated from all blocks and adds noise. Specifically, the i-th dimension of the j-th aggregated centroid is

$$o^j_i = \frac{1}{l} \sum_{b=1}^{l} o^{b,j}_i + \mathsf{Lap}\left(\frac{2(\max_i - \min_i) \cdot k \cdot d}{l \cdot \epsilon}\right), \qquad (3)$$

where $o^{b,j}_i$ is the i-th dimension of $o^{b,j}$, and $[\min_i, \max_i]$ is the estimated output range of the i-th dimension. One half of the total privacy budget is used to estimate this output range, and the other half is used for adding Laplace noise.

We have found that the implementation downloaded from [0], which uses Equation (3), performed poorly. Analyzing the data closely, we found that $\min_i$ and $\max_i$ often fall outside of the data range, especially for small ε. We slightly modified the code to bound $\min_i$ and $\max_i$ to be within the data domain. This does not affect privacy, and it greatly improved the accuracy. In this paper we use this fixed version. Here a key parameter is the choice of l.
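The aggregation step of GkM can be sketched as follows. The sketch assumes that the per-block centroids have already been computed and are ordered consistently across blocks, and that the output range [min_i, max_i] has already been estimated privately. It further assumes that the half of ε spent on noise is split evenly over the k·d output coordinates, which gives the scale 2(max_i − min_i)kd/(lε). The names `laplace` and `gkm_aggregate` are hypothetical and do not come from the GUPT code base.

```python
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def gkm_aggregate(block_centroids, out_min, out_max, eps, rng=None):
    """Average per-block centroids and add Laplace noise to every coordinate.

    block_centroids[b][j][i] is dimension i of centroid j computed on block b.
    out_min[i] and out_max[i] bound the output range of dimension i.
    """
    rng = rng or random.Random()
    l = len(block_centroids)
    k = len(block_centroids[0])
    d = len(block_centroids[0][0])
    aggregated = []
    for j in range(k):
        centroid = []
        for i in range(d):
            mean = sum(block_centroids[b][j][i] for b in range(l)) / l
            # Assumed noise scale: eps/2 for noise, shared by k*d coordinates,
            # with sensitivity (max_i - min_i) / l for the block average.
            scale = 2.0 * (out_max[i] - out_min[i]) * k * d / (l * eps)
            centroid.append(mean + laplace(scale, rng))
        aggregated.append(centroid)
    return aggregated
```

The scale shrinks as l grows, while each block then holds fewer points, so its own centroids become less reliable; this is the tradeoff that governs the choice of l.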
Intuitively, a larger l results in each block being very small and unable to preserve the cluster information, while a smaller l results in larger noise being added. (Note the inverse dependency on l in Equation (3).) Analysis in [1] suggests setting $l = N^{0.4}$. Our experimental results, however, show that the resulting performance is quite poor. We consider a variant that sets l proportional to N, i.e., with each block containing a fixed number of points, which performs much better than setting $l = N^{0.4}$.

4.1.3 PGkM

PrivGene [0] is a general-purpose differentially private model fitting framework based on genetic algorithms. Given a dataset D and a fitting-score function f(D, θ) that measures how well the parameter θ fits the dataset D, the PrivGene algorithm initializes a candidate set of possible parameters θ and iteratively refines them by mimicking the process of natural evolution. Specifically, in each iteration, PrivGene uses the exponential mechanism [] to privately select from the candidate set m parameters that have the best fitting scores, and generates a new candidate set from the m selected parameters by crossover and mutation. Crossover regards each parameter as an l-dimensional vector. Given two parameter vectors, it randomly selects a number l' such that 0 < l' < l and splits each vector into the first l' dimensions (the upper half) and the remaining l − l' dimensions (the lower half). Then, it swaps the lower halves of the two vectors to generate two child vectors. These vectors are then mutated by adding random noise to one randomly chosen dimension.

In [0], PrivGene is applied to logistic regression, SVM, and k-means clustering. In the case of k-means clustering, the NICV formula in Equation (1), more precisely its non-normalized version, is used as the fitting function f, and the set of k cluster centroids is defined as the parameter θ. Each parameter is a vector of l = kd dimensions. Initially, the candidate set is populated with sets of cluster centroids randomly sampled from the data space, each set containing exactly k centroids.
Then, the algorithm runs iteratively for a number of rounds that grows with (xNε)/m, subject to a fixed minimum, where x and m are set empirically and N is the dataset size.

We call the approach of applying PrivGene to k-means clustering PGkM. It is similar to DPLloyd in that it tries to iteratively improve the centroids. However, rather than maintaining and improving a single set of centroids, PGkM maintains a pool of candidates, uses selection to improve their quality, and uses crossover and mutation to broaden the pool. As with DPLloyd, a key parameter is the number of iterations. With too few iterations, the algorithm may not converge. Too many iterations mean too little privacy budget for each iteration, and the exponential mechanism may not be able to select good candidates.

4.2 Non-interactive Approaches

Interactive approaches such as DPLloyd and GkM suffer from two limitations. First, the purpose of conducting k-means clustering is often to visualize how the data points are partitioned into clusters. The interactive approaches, however, output only the centroids. In the case of DPLloyd, one could also obtain the number of data points in each cluster; however, it cannot provide more detailed information on what shapes the data points in the clusters take. The value of interactive private k-means clustering is thus limited. Second, as the privacy budget is consumed by the interactive method, one cannot perform any other analysis on the dataset; doing so will violate differential privacy.

Non-interactive approaches, which first generate a synopsis of a dataset using a differentially private algorithm and then apply the k-means clustering algorithm on the synopsis, avoid these two limitations. In this paper, we consider the following synopsis method. Given a d-dimensional dataset, one partitions the domain into M equal-width grid cells, and then releases the noisy count in each cell, by adding Laplacian noise to each cell count. The synopsis released is a set of cells, each of which has a rectangular bounding box and a (noisy) count of how many data points are in the bounding box.

The synopsis tells only how many points are in a cell, but not the exact locations of these points. For the purpose of clustering, we treat all points as if they are at the center of the bounding box. In addition, these noisy counts might be negative, non-integer, or both. A straightforward solution is to round the noisy count of a cell to the nearest non-negative integer and replicate the cell center as many times as the rounded count. This approach, however, may introduce a significant systematic bias in the clustering result when many cells in the UG synopsis are empty or close to empty and these cells are not distributed uniformly. In this case, simply turning negative counts to zero can produce a large number of points in those empty areas, which can pull the centroid away from its true position. We take the approach of keeping the noisy count unchanged and adapting the centroid update procedure in k-means to use the cell as a whole. Specifically, given a cell with center c and noisy count ñ, its contribution to the centroid is c · ñ. Using this approach, within one cluster, cells that have negative noisy counts can cancel out the effect of other cells with positive noise. Therefore, we obtain better clustering performance.

For this method, the key parameter is M, the number of cells. When M is large, the average count per cell is low, and the noise will have more impact.
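The grid synopsis and the count-weighted centroid update described above can be sketched as follows. The sketch assumes a domain of [lo, hi] in every dimension and the same number m of equal-width intervals per dimension, so that M = m^d. The names `laplace`, `grid_synopsis`, and `weighted_update` are hypothetical. The rule that a cluster with non-positive total noisy weight keeps its previous centroid is also an assumption of the sketch.

```python
import itertools
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(x, o):
    return sum((a - b) ** 2 for a, b in zip(x, o))


def grid_synopsis(points, m, eps, lo=-1.0, hi=1.0, rng=None):
    """Return (cell center, noisy count) for all m**d cells of a uniform grid."""
    rng = rng or random.Random()
    d = len(points[0])
    width = (hi - lo) / m
    counts = {}
    for x in points:
        # Points on the upper border fall into the last cell.
        idx = tuple(min(m - 1, int((v - lo) / width)) for v in x)
        counts[idx] = counts.get(idx, 0) + 1
    synopsis = []
    for idx in itertools.product(range(m), repeat=d):
        center = [lo + (c + 0.5) * width for c in idx]
        # Every cell, including empty ones, gets Lap(1/eps) noise (sensitivity 1).
        synopsis.append((center, counts.get(idx, 0) + laplace(1.0 / eps, rng)))
    return synopsis


def weighted_update(synopsis, centroids):
    """One Lloyd-style update in which each cell contributes center * noisy count."""
    d = len(centroids[0])
    sums = [[0.0] * d for _ in centroids]
    weights = [0.0] * len(centroids)
    for center, noisy_count in synopsis:
        j = min(range(len(centroids)), key=lambda c: sq_dist(center, centroids[c]))
        weights[j] += noisy_count  # negative counts are kept so that noise can cancel
        for i in range(d):
            sums[j][i] += center[i] * noisy_count
    # Assumption: a cluster with non-positive total weight keeps its centroid.
    return [
        [s / weights[j] for s in sums[j]] if weights[j] > 0 else list(centroids[j])
        for j in range(len(centroids))
    ]
```

Since the synopsis is computed once, `weighted_update` can be repeated with as many sets of initial centroids as desired without spending additional privacy budget.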
When M is small, each cell covers a large area, and treating all points as being at the center may be inaccurate when the points are not uniformly distributed. We now describe two methods of choosing M.

4.2.1 EUGkM

Qardaji et al. [] studied the effectiveness of producing differentially private synopses of 2-dimensional datasets for answering rectangular range counting queries (i.e., how many data points there are in a rectangular range) with high accuracy, and suggested choosing $M = N\epsilon/10$. We now analyze the choice of M for the higher-dimensional case. Let est be the answer to a counting query estimated from the synopsis and act be the actual answer. We use mean squared error (MSE) to measure the accuracy of est with respect to act. That is,

$$\mathrm{MSE}(est) = E\left[(est - act)^2\right] = \mathrm{Var}(est) + (\mathrm{Bias}(est))^2,$$

where Var(est) is the variance of est and Bias(est) is its bias.

There are two error sources when computing est. First, Laplace noise is added to cell counts to satisfy differential privacy. This results in the variance of est. Since counting the points in a cell has sensitivity 1, Laplace noise $\mathsf{Lap}(1/\epsilon)$ is added. Thus, the noisy count has variance $2/\epsilon^2$. Suppose that the given counting query covers an α portion of the total M cells in the data space. Then, $\mathrm{Var}(est) = 2\alpha M/\epsilon^2$.

Second, the given counting query may not fully contain the cells that fall on the border of the query rectangle. To estimate the number of points in the intersection between the query rectangle and the border cells, one assumes that the data are uniformly distributed. This results in the bias of est, which depends on the number of tuples in the border cells. The border of the given query consists of 2d hyper-rectangles, each being (d − 1)-dimensional. The number of cells falling on a hyper-rectangle is on the order of $M^{1-1/d}$. On average, the number of tuples in these cells is on the order of $M^{1-1/d} \cdot \frac{N}{M} = \frac{N}{M^{1/d}}$. Therefore, we estimate the bias of est with respect to one hyper-rectangle to be $\beta \frac{N}{M^{1/d}}$, where $\beta \ge 0$ is a parameter. We thus estimate $(\mathrm{Bias}(est))^2$ to be $2d \left(\beta \frac{N}{M^{1/d}}\right)^2$.

Summing the variance and the squared bias, it follows that

$$\mathrm{MSE}(est) = \frac{2\alpha M}{\epsilon^2} + 2d\beta^2 \frac{N^2}{M^{2/d}}.$$

To minimize the MSE, we set the derivative of the above equation with respect to M to 0. This gives

$$M = \left(\frac{N\epsilon}{\theta}\right)^{\frac{2d}{2+d}}, \qquad (4)$$

where $\theta = \sqrt{\alpha/(2\beta^2)}$. We name the above extended approach EUG (extended uniform gridding approach). We use EUGkM to represent the EUG-based k-means clustering scheme.

5. PERFORMANCE AND ANALYSIS

In this section, we compare and analyze the performance of the five methods introduced in the last section.

5.1 Evaluation Methodology

We experimented with six external datasets and a group of synthetically generated datasets. The first dataset is a 2D synthetic dataset S1 [1], which is a benchmark for studying the performance of clustering schemes. S1 contains 5,000 tuples and 15 Gaussian clusters. The Gowalla dataset contains the user check-in locations from the Gowalla location-based social network, whose users share their check-in times and locations (longitude and latitude). We take all the unique locations, and obtain a 2D dataset of ,01 tuples. We set k = for this dataset. The third dataset is a 1- percentage sample of the road dataset, which was drawn from the 00 TIGER (Topologically Integrated Geographic Encoding and Referencing) dataset []. It contains the GPS coordinates of road intersections in the states of Washington and New Mexico. The fourth is Image [1], a 3D dataset with ,11 RGB vectors. We set k = for it. We also use the well-known Adult dataset [1]. We use its six numerical attributes, and set k = . The last dataset is Lifesci. It contains , records, and each of them consists of the top principal components for a chemistry or biology experiment. As in previous approaches [1, 0], we set k = . Table 1 summarizes the datasets. For all the datasets, we normalize the domain of each attribute to [-, ].

When generating the synthetic datasets, we fix the dataset size to ,000, and vary k and d from to . For each dataset, k well-separated Gaussian clusters of equal size are generated, and sets of initial centroids are generated in the same way as in Section 4.1.1.

Implementations for DPLloyd and GkM were downloaded from [] and [0], respectively.
The source code of PGkM [0] was shared by the authors. We implemented EUGkM.

Table 1: Descriptions of the Datasets. Rows: S1, Gowalla, TIGER, Image, Adult-num, Lifesci, Synthetic. Columns include the number of tuples and the number of blocks l used by GkM and by its variant.

Configuration. Each algorithm outputs k centroids $o = \{o^1, o^2, \ldots, o^k\}$. To evaluate the quality of such an output o, we compute the average squared distance between each data point in D and its nearest centroid in o, and call this the NICV. We note that since both DPLloyd and EUGkM use Lloyd-style iteration, they are affected by the choice of initial centroids. In addition, all algorithms have random noise added somewhere to satisfy differential privacy. To conduct a fair comparison, we need to carefully average out such randomness effects.

GkM and PGkM do not take a set of initial centroids as input. GkM divides the input dataset into multiple blocks, invokes the standard k-means implementation from the SciPy package [] on each block with a different set of initial centroids, and finally aggregates the outputs for all the blocks. We run GkM and PGkM repeatedly and report the average result. For DPLloyd, we generate a fixed collection of sets of initial centroids, run DPLloyd repeatedly on each set, and report the average of all the resulting NICV values as the final evaluation of DPLloyd.

The non-interactive approach (EUGkM) has the advantage that once a synopsis is published, one can run k-means clustering with as many sets of initial centroids as one wants and choose the result that has the best performance relative to the synopsis. In our experiments, given a synopsis, we use the same sets of initial centroids as those generated for the DPLloyd method. For each set, we run clustering and output a set of k centroids. Among all the sets of output centroids, we select the one that has the lowest NICV relative to the synopsis rather than to the original dataset. This ensures that selecting the set of output centroids satisfies differential privacy. We then compute the NICV of this selected set relative to the original dataset, and take it as the resulting NICV with respect to the synopsis. To deal with the randomness introduced by the process of generating the synopsis, we generate several different synopses and take the average of the resulting NICV. As the baseline, we run the standard k-means algorithm [] over the same sets of initial centroids and take the minimum NICV among all the runs.

Experimental Results. Figure 1 reports the results for the external datasets. For these, we vary ε over a range of values and plot the NICV curve for the methods mentioned in Section 4.
This enables us to see how these algorithms perform under different ε. Figure 2 reports the results for the synthetic datasets. For these, we fix ε = and report the difference in NICV between each approach and the baseline. This enables us to see the scalability of these algorithms when k and d increase.

Among the interactive approaches, DPLloyd has the best performance in most cases. Its performance is worse than that of PGM only on the small dataset S1 when the privacy budget ε is smaller than . Comparing DPLloyd and EUGM, we observe that on the four low-dimensional datasets (S1, Gowalla, TIGER and Image), EUGM clearly outperforms DPLloyd at small ε values, and the gap becomes smaller as ε increases. However, on the two high-dimensional datasets (Adult-num and Lifesci), DPLloyd outperforms EUGM for almost all the given privacy budgets. Similar results can also be found in Figure 2. Figure 2 also exhibits the effects of the number of clusters and the number of dimensions. EUGM's performance is more sensitive to the increase in dimension, while DPLloyd gets worse quickly as the number of clusters increases. Below we analyze these algorithms to understand why they perform in this way. In addition, Figure 2 shows how EUGM's performance differs across choices of θ. Setting θ = for EUGM works well in most cases.

. The Analysis of the GM Approach

From Figures 1 and 2, it is clear that GM is always much worse than the others. There are two sources of error for GM. One is that GM aggregates centroids computed from subsets of the data, and this aggregate may be inaccurate even without adding noise. The other is that the noise added according to Equation () may be too large. To tease out the roles played by these two error sources, Figure 3 shows the effect of varying the block size from around N to N. It shows the error from GM, the error from using the aggregation without noise (SAG), and the error from adding noise computed by Equation () to the best known centroids (Noise).
From the figure, it is clear that setting ℓ = N^{0.}, which corresponds to a block size of N^{0.}, is far from optimal: the error of GM is dominated by the noise, and is much higher than the error due to sample-and-aggregation. Indeed, we observe that as the block size decreases, the error of GM keeps decreasing until the block size gets close to k. It seems that even though many individual blocks result in poor centroids, aggregating these relatively poor centroids can result in highly accurate centroids. This effect is most pronounced in the TIGER dataset, which consists of two large clusters. The two centroids computed from each small block can be approximately viewed as one random point chosen from each cluster. When averaging these centroids, one gets very close to the true centroids.

This observation motivated the introduction of the GM-K algorithm, which fixes each block size to be k. Recall that we are to select k centroids from each block. As can be seen from Figures 1 and 2, GM-K becomes competitive with PGM and sometimes significantly outperforms it (e.g., on TIGER and Lifesci), although it still underperforms DPLloyd.

. The Analysis of the PGM Approach

PGM is a stochastic k-means method based on genetic algorithms. A stochastic method converges to the global optimum []. In contrast, DPLloyd is a gradient descent method derived from the standard Lloyd's algorithm [], which may only reach a local optimum. However, PGM is still inferior to DPLloyd in Figure 1. There are two possible reasons.

First, a stochastic approach typically takes a larger number of iterations to converge []. Figure 4 compares Lloyd's algorithm with Gene (i.e., the non-private version of PGM, without differential privacy). For Lloyd, we reuse the initial centroids generated in Section .1. Given a dataset, we run Lloyd on the 0 sets of initial centroids generated for the dataset, and report the average NICV. Generally, Gene overtakes Lloyd as the number of iterations increases and finally converges to the global optimum.
However, Lloyd improves its performance much faster than Gene in the first few iterations, and converges to the global (or a local) optimum more quickly. For example, on the Image dataset, Lloyd reaches the best baseline after three iterations, while Gene needs more than iterations to achieve the same.

The second reason that PGM is inferior to DPLloyd is the low privacy budget allocated to selecting a parameter (i.e., a set of k cluster centroids) from the candidate set. In each iteration PGM selects parameters, and the total number of iterations is at least . Thus, the privacy budget allocated to selecting a single parameter is at most ε/0. Therefore, PGM has reasonable performance only for large ε values.

[Figure 1: The comparison of DPLloyd, EUGM, PGM and GM. Panels: (a) S1, (b) Image, (c) Gowalla, (d) Adult-num, (e) TIGER, (f) Lifesci. x-axis: privacy budget ε in log-scale. y-axis: NICV in log-scale.]

. THE HYBRID APPROACH

Experimental results in Section establish that DPLloyd is the best performing interactive method; however, it still underperforms EUGM. Recall that EUGM publishes a private synopsis of the dataset, and thus enables other analyses to be performed on the dataset, beyond k-means. This means that currently the non-interactive method has a clear advantage over interactive methods, at least for k-means clustering.

An intriguing question is whether EUGM is the best we can do for k-means clustering. In particular, can we further improve DPLloyd? Recall that two key issues greatly affect the accuracy of DPLloyd: the number of iterations and the choice of initial centroids. In fact, these two are closely related. If the initially chosen centroids are very good and close to the true centroids, one needs perhaps only one more iteration to improve them, and this reduction in the number of iterations means that little noise is added. If we had a method to choose really good centroids in a differentially private way, we could use part (e.g., half) of the privacy budget to get those initial centroids, and the remaining privacy budget to run one iteration of DPLloyd to further improve them. In fact, we do have such a method: EUGM.

This leads us to propose a hybrid method that combines the non-interactive EUGM with the interactive DPLloyd. We first use half the privacy budget to run EUGM, and then use the centroids output by EUGM as the initial centroids for one round of DPLloyd. Such a method, however, may not actually outperform EUGM, especially when the privacy budget ε is small, since one round of DPLloyd may then worsen the centroids. Therefore, when ε is small, we should stick to the EUGM method, and only when ε is large enough should we adopt the EUGM+DPLloyd approach.
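The hybrid procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `eugm` is a hypothetical callable standing in for the non-interactive step, `eps_threshold` is a hypothetical cut-off for "ε large enough", and the Laplace scale (d·r + 1)/ε for a single round on data normalized to [-r, r]^d follows the noise model that this section analyzes for DPLloyd. The clipping of centroids and the guard against tiny noisy counts are added assumptions.

```python
import numpy as np

def dplloyd_round(data, centroids, eps, r, rng):
    """One noisy Lloyd update; each sum and count gets Laplace noise."""
    n, d = data.shape
    centroids = np.asarray(centroids, dtype=float)
    scale = (d * r + 1.0) / eps  # assumed sensitivity (d*r + 1), one round
    # Assign every point to its nearest current centroid.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    updated = centroids.copy()
    for j in range(len(centroids)):
        members = data[labels == j]
        noisy_count = len(members) + rng.laplace(0.0, scale)
        noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
        if noisy_count >= 1.0:  # assumption: keep old centroid otherwise
            updated[j] = np.clip(noisy_sum / noisy_count, -r, r)
    return updated

def hybrid(data, eugm, eps, eps_threshold, r, rng):
    """Fall back to EUGM for small eps; otherwise split the budget in half."""
    if eps < eps_threshold:
        return np.asarray(eugm(data, eps, rng), dtype=float)
    initial = eugm(data, eps / 2.0, rng)   # non-interactive half
    return dplloyd_round(data, initial, eps / 2.0, r, rng)  # interactive half
```

With a real EUGM implementation plugged in for `eugm`, the threshold would come from the error analysis rather than being chosen by hand.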
In order to determine what ε is large enough, we analyze how the errors

depend on the various parameters in DPLloyd and in EUGM.

[Figure 2: The heatmap by varying k and d. Panels: (a) DPLloyd, (b) PGM, (c) GM, (d) GM-K, (e) EUGM, (f)-(k) EUGM with different values of θ.]

.1 Error Study of DPLloyd

DPLloyd adds noise to each iteration of updating centroids. To study the error behavior of DPLloyd due to the injected Laplace noise, we focus on analyzing the mean squared error (MSE) between noisy centroids and true centroids in one iteration. Consider one centroid and its update in one iteration. The i-th dimension of the true centroid should be $o_i = S_i / C$, where $C$ is the number of data points in the cluster and $S_i$ is the sum of the i-th dimension coordinates of the data points in the cluster. Consider the noisy centroid $\hat{o}$; its i-th dimension is $\hat{o}_i = \frac{S_i + \Delta S_i}{C + \Delta C}$, where $\Delta C$ is the noise added to the count and $\Delta S_i$ is the noise added to $S_i$. The MSE is thus:

$$\mathrm{MSE}(\hat{o}) = E\left[\sum_{i=1}^{d}\left(\frac{S_i + \Delta S_i}{C + \Delta C} - \frac{S_i}{C}\right)^2\right]$$

Derivation based on the above formula gives the following proposition.
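Before the formal statement, the per-dimension error defined above can be checked numerically. The sketch below is an illustration only: it draws independent Laplace noise for the sum and the count, estimates the squared error of one noisy centroid coordinate by Monte Carlo, and compares it with the first-order approximation Var(ΔS_i)/C² + S_i²·Var(ΔC)/C⁴. The names `noise_scale`, `trials`, and `seed` are hypothetical parameters introduced for this example.

```python
import numpy as np

def simulate_centroid_mse(S, C, noise_scale, trials=200_000, seed=0):
    """Monte Carlo estimate of E[((S + dS) / (C + dC) - S / C)^2]."""
    rng = np.random.default_rng(seed)
    dS = rng.laplace(0.0, noise_scale, size=trials)  # noise on the sum
    dC = rng.laplace(0.0, noise_scale, size=trials)  # noise on the count
    noisy = (S + dS) / (C + dC)
    return float(np.mean((noisy - S / C) ** 2))

def first_order_mse(S, C, noise_scale):
    """First-order approximation; Var(Lap(b)) = 2 * b^2."""
    var = 2.0 * noise_scale ** 2
    return var / C ** 2 + (S ** 2) * var / C ** 4
```

The two values agree closely when the noise is small relative to the cluster size C, which is the regime the approximation assumes.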

[Figure 3: The analysis of the GM approach, comparing SAG, GM, and Noise. Panels: (a) S1, (b) Gowalla, (c) TIGER, (d) Image, (e) Adult-num, (f) Lifesci. x-axis: block size exponent in log-scale. y-axis: NICV in log-scale.]

[Figure 4: The comparison of the convergence rate of the genetic-algorithm-based k-means (Gene) and Lloyd's algorithm. Panels: (a) S1, (b) Gowalla, (c) TIGER, (d) Image, (e) Adult-num, (f) Lifesci. x-axis: number of iterations in log-scale. y-axis: NICV in log-scale.]

PROPOSITION 1. In one round of DPLloyd, the MSE is $\Theta\left(\frac{(kt)^2 d^3}{(N\epsilon)^2}\right)$.

PROOF. Let us first consider the MSE on the i-th dimension.

$$\mathrm{MSE}(\hat{o}_i) = E\left[\left(\frac{S_i + \Delta S_i}{C + \Delta C} - \frac{S_i}{C}\right)^2\right] \approx E\left[\left(\frac{C\,\Delta S_i - S_i\,\Delta C}{C^2}\right)^2\right]$$
$$= \frac{E[(\Delta S_i)^2]}{C^2} + \frac{S_i^2\,E[(\Delta C)^2]}{C^4} - \frac{2 S_i\,E[\Delta S_i\,\Delta C]}{C^3} = \frac{\mathrm{Var}(\Delta S_i)}{C^2} + \frac{S_i^2\,\mathrm{Var}(\Delta C)}{C^4}$$

The last step holds because $\Delta S_i$ and $\Delta C$ are independent zero-mean Laplacian noises, so that

$$E[\Delta S_i\,\Delta C] = 0,$$
$$E[(\Delta S_i)^2] = E[(\Delta S_i)^2] - (E[\Delta S_i])^2 = \mathrm{Var}(\Delta S_i),$$
$$E[(\Delta C)^2] = E[(\Delta C)^2] - (E[\Delta C])^2 = \mathrm{Var}(\Delta C),$$

where $\mathrm{Var}(\Delta S_i)$ and $\mathrm{Var}(\Delta C)$ are the variances of $\Delta S_i$ and $\Delta C$, respectively.

Suppose that on average $\frac{S_i}{rC} = \rho$, where $[-r, r]$ is the range of the i-th dimension. That is, ρ is the normalized i-th coordinate of the cluster's centroid. Furthermore, suppose that each cluster is about the same size, i.e., $C \approx N/k$. Then $\mathrm{MSE}(\hat{o}_i)$ can be approximated as follows:

$$\mathrm{MSE}(\hat{o}_i) \approx \frac{k^2}{N^2}\left(\mathrm{Var}(\Delta S_i) + (\rho r)^2\,\mathrm{Var}(\Delta C)\right)$$

DPLloyd adds Laplace noise $\mathrm{Lap}\left(\frac{(dr+1)t}{\epsilon}\right)$ to each sum/count function. Therefore, both $\mathrm{Var}(\Delta S_i)$ and $\mathrm{Var}(\Delta C)$ are equal to $\frac{2((dr+1)t)^2}{\epsilon^2}$. Substituting into the approximation above, we obtain

$$\mathrm{MSE}(\hat{o}_i) \approx \frac{k^2}{N^2}\left(\mathrm{Var}(\Delta S_i) + (\rho r)^2\,\mathrm{Var}(\Delta C)\right) = 2\left(1 + (\rho r)^2\right)\left(\frac{kt(dr+1)}{N\epsilon}\right)^2.$$

As the noise added to each dimension is independent, the MSE is

$$\mathrm{MSE}(\hat{o}) = \sum_{i=1}^{d}\mathrm{MSE}(\hat{o}_i) \approx 2d\left(1 + (\rho r)^2\right)\left(\frac{kt(dr+1)}{N\epsilon}\right)^2.$$

When r is a small constant, this becomes $\Theta\left(\frac{(kt)^2 d^3}{(N\epsilon)^2}\right)$.

Proposition 1 shows that the distortion of the centroid is proportional to $t^2 k^2 d^3$ and inversely proportional to $(N\epsilon)^2$. At first glance, this analysis seems to conflict with the experimental result in Figure 2(a), where DPLloyd is much less scalable in k than in d. The reason is that the performance of DPLloyd is also affected by the fact that rounds are not enough for it to converge. When k increases, converging takes more time, and it is also more likely that the choice of initial centroids leads to local optima that are far from the global optimum.

. Error Study of EUGM

The non-interactive approach partitions a dataset into a grid of M uniform cells. Then, it releases private synopses for the cells, and runs k-means clustering on the synopses to return the cluster centroids. Similar to the error analysis for DPLloyd, we analyze the MSE. Let $o$ be the true centroid of a cluster, and $\hat{o}$ be its estimator computed by a non-interactive approach. The MSE between $\hat{o}$ and $o$ is composed of two error sources. First, the count in each cell is inaccurate after adding Laplace noise. This results in the variance (i.e., $\mathrm{Var}(\hat{o})$) of $\hat{o}$ around its expectation $E[\hat{o}]$. Second, we no longer have the precise positions of the data points, and only assume that they occur at the center of a cell. Thus, the expectation of $\hat{o}$ is not equal to $o$, resulting in a bias (i.e., $\mathrm{Bias}(\hat{o})$). The MSE is the combination of these two errors.

$$\mathrm{MSE}(\hat{o}) = \mathrm{Var}(\hat{o}) + (\mathrm{Bias}(\hat{o}))^2$$

Analyzing the variance.
We assume that each cluster has a volume that is $\frac{1}{k}$ of the total volume of the data space, and has the shape of a cube. In the d-dimensional case, the width of the cube is $w = \frac{2r}{\sqrt[d]{k}}$. Suppose that the geometric center of the cube is $\tau_i$ (note that this is not the cluster centroid). Let $T$ be the set of cells included in the cluster. For each cell $t \in T$, we use $c_t$ to denote the number of tuples in $t$, $t_i$ to denote the i-th dimension coordinate of the center of cell $t$, and $\nu_t$ to denote the noise added to the cell size. Let $\hat{o}_i$ be the i-th dimension of the noisy centroid. Then, the variance of $\hat{o}_i$ is

$$\mathrm{Var}(\hat{o}_i) = \mathrm{Var}(\hat{o}_i - \tau_i) = \mathrm{Var}\left(\frac{\sum_{t \in T} t_i (c_t + \nu_t)}{\sum_{t \in T}(c_t + \nu_t)} - \tau_i\right) = \mathrm{Var}\left(\frac{\sum_{t \in T}(t_i - \tau_i)(c_t + \nu_t)}{\sum_{t \in T}(c_t + \nu_t)}\right) \approx \frac{1}{C^2}\sum_{t \in T}\left((t_i - \tau_i)^2\,\mathrm{Var}(c_t + \nu_t)\right).$$

In the above, the first step follows because $\tau_i$, as the geometric center of the cube, is a constant. The last step is derived by assuming $\sum_{t \in T}(c_t + \nu_t) \approx C$, that is, the noisy cluster size is approximately equal to the original cluster size $C$.

We can see that within the cube, different cells do not contribute equally to the variance. The closer a cell is to the cube center, the smaller its contribution: the contribution is proportional to the squared distance to the cube center. We thus approximate the variance as follows:

$$\mathrm{Var}(\hat{o}_i) \approx \frac{1}{C^2}\int_{-w/2}^{w/2} x^2 \cdot \frac{M}{(2r)^d} \cdot w^{d-1} \cdot \frac{2}{\epsilon^2}\,dx = \frac{2Mr^2}{3C^2\epsilon^2\left(\sqrt[d]{k}\right)^{2+d}}$$

In the above integral, $x$ in the first term is the distance from a cell center to the cube center (i.e., $t_i - \tau_i$). The second term $\frac{M}{(2r)^d}$ is the number of cells per unit volume, and $w^{d-1}$ is the volume of the (d-1)-dimensional plane that has a distance of $x$ to the cube center. The last term $\frac{2}{\epsilon^2}$ is the variance of the cell size (i.e., $\mathrm{Var}(c_t + \nu_t)$).

Suppose that clusters are of equal size, that is, $C = \frac{N}{k}$. Then, summing over all the dimensions, the variance of the noisy centroid is

$$\mathrm{Var}(\hat{o}) \approx \frac{2Mr^2 d\left(\sqrt[d]{k}\right)^{d-2}}{3N^2\epsilon^2}.$$

The analysis shows that the variance of EUGM is proportional to $\frac{M}{(N\epsilon)^2}$. EUGM sets $M$ to $\left(\frac{N\epsilon}{\theta}\right)^{\frac{2d}{2+d}}$. Plugging it into the equation above, we get that the variance of EUGM is inversely proportional to $(N\epsilon)^{\frac{4}{2+d}}$.

Analyzing the bias.
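Let $x_i$ be the i-th dimension coordinate of a tuple $x$. Then, the bias of $\hat{o}_i$ is

$$\mathrm{Bias}(\hat{o}_i) = E[\hat{o}_i] - o_i = E\left[\frac{\sum_{t \in T} t_i (c_t + \nu_t)}{\sum_{t \in T}(c_t + \nu_t)}\right] - \frac{\sum_{t \in T}\sum_{x \in t} x_i}{\sum_{t \in T} c_t} \approx \frac{\sum_{t \in T}\sum_{x \in t}(t_i - x_i)}{C},$$

where the last step is developed by approximating $\sum_{t \in T}(c_t + \nu_t)$ by the cluster size $C$.

The bias in the above formula depends on the data distribution, and estimating it precisely requires access to the real data. We thus only estimate its upper bound. Let $q_i = t_i - x_i$. The non-interactive approach partitions each dimension into $\sqrt[d]{M}$ intervals of equal length. Hence, $q_i$ falls in the range $\left[-\frac{r}{\sqrt[d]{M}}, \frac{r}{\sqrt[d]{M}}\right]$, and the upper bound of $\mathrm{Bias}(\hat{o}_i)$ is $\frac{r}{\sqrt[d]{M}}$. Summing over all the dimensions, we obtain the upper bound of the squared bias of the noisy centroid:

$$(\mathrm{Bias}(\hat{o}))^2 \le \frac{r^2 d}{\left(\sqrt[d]{M}\right)^2}.$$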
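The trade-off between the two error terms can be made concrete with a short numerical sketch. It assumes a variance term of the form a·M/(Nε)² and a squared-bias bound of the form d·r²/M^(2/d), as in the analysis above; the coefficient `a` is a hypothetical constant that collects every factor not depending on M, N, or ε, and `M_max` is a hypothetical search limit. Setting the derivative of the sum to zero gives M = (2r²(Nε)²/a)^(d/(2+d)), which grows as (Nε)^(2d/(2+d)).

```python
import numpy as np

def variance_term(M, N, eps, a):
    """Variance contribution, growing linearly in the number of cells M."""
    return a * M / (N * eps) ** 2

def squared_bias_bound(M, d, r):
    """Upper bound on the squared bias, shrinking as the grid gets finer."""
    return d * r ** 2 / M ** (2.0 / d)

def best_M_grid(N, eps, d, r, a, M_max=200_000):
    """Brute-force search for the integer M minimizing the error bound."""
    Ms = np.arange(1, M_max + 1, dtype=float)
    total = variance_term(Ms, N, eps, a) + squared_bias_bound(Ms, d, r)
    return int(Ms[np.argmin(total)])

def best_M_closed_form(N, eps, d, r, a):
    """Stationary point of the bound: M = (2 r^2 (N eps)^2 / a)^(d / (2 + d))."""
    return (2.0 * r ** 2 * (N * eps) ** 2 / a) ** (d / (2.0 + d))
```

Comparing `best_M_grid` with `best_M_closed_form` for a few values of N and ε is a quick way to confirm the exponent 2d/(2+d).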

The estimation shows that the upper bound of the squared bias decreases as a function of M. This is consistent with expectation: as M increases, the data space is partitioned into finer-grained cells, so the distance between a tuple in a cell and the cell center decreases on average.

Comparing DPLloyd and EUGM. We now analyze the performance of DPLloyd and EUGM in Figure 1. The analysis of DPLloyd shows that its MSE is inversely proportional to $(N\epsilon)^2$. The MSE of EUGM consists of variance and squared bias. Plugging $M = \left(\frac{N\epsilon}{\theta}\right)^{\frac{2d}{2+d}}$ into the variance estimate and the bias bound, it follows that the MSE of EUGM is inversely proportional to $(N\epsilon)^{\frac{4}{2+d}}$. This explains why the NICV of DPLloyd, which is inversely proportional to $(N\epsilon)^2$, drops much faster than that of EUGM as ε grows. It also explains why DPLloyd performs better on big datasets (e.g., the TIGER dataset).

Because the MSE of EUGM is inversely proportional to $(N\epsilon)^{\frac{4}{2+d}}$, it increases exponentially as a function of d. In contrast, the MSE of DPLloyd has only cubic growth with respect to d. Therefore, in Figure 1, as the dimensionality of the dataset increases, DPLloyd outperforms EUGM. This also explains why, in Figure 2, DPLloyd is more scalable in d than EUGM.

. The Hybrid Approach

Our hybrid approach combines EUGM and DPLloyd. Given a dataset and privacy budget ε, the hybrid approach first checks whether it overtakes both the DPLloyd method and the EUGM method. If this is not the case, the hybrid approach simply falls back to EUGM. Otherwise, the hybrid approach allocates half the privacy budget to EUGM to output a synopsis and find k intermediate centroids that work well for the synopsis. Then, it runs DPLloyd for one iteration, using the remaining half of the privacy budget, to refine these k centroids.

We use MSE to heuristically determine the conditions under which the hybrid approach overtakes both the DPLloyd method and the EUGM method.
Basically, we require that the MSE of the hybrid approach be smaller than those of the other two approaches, since a smaller MSE implies a smaller error in the cluster centroid. From the analysis of DPLloyd, it follows that the MSE of DPLloyd with the full privacy budget is

$$2d\left(1 + (\rho r)^2\right)\left(\frac{kt(dr+1)}{N\epsilon}\right)^2. \quad (11)$$

A precise estimation of the MSE of the EUGM method requires access to the dataset, since the bias depends on the real data distribution. However, we have the approximate variance obtained by setting $M = \left(\frac{N\epsilon}{\theta}\right)^{\frac{2d}{2+d}}$:

$$\frac{2r^2 d\left(\sqrt[d]{k}\right)^{d-2}}{3\,\theta^{\frac{2d}{2+d}}(N\epsilon)^{\frac{4}{2+d}}}. \quad (12)$$

One-iteration DPLloyd with half the privacy budget outputs the final k cluster centroids when it is applied in the hybrid approach. Therefore, we approximate the MSE of the hybrid approach by that of the one-iteration DPLloyd,

$$8d\left(1 + (\rho r)^2\right)\left(\frac{k(dr+1)}{N\epsilon}\right)^2, \quad (13)$$

which is developed by setting t = 1 and the privacy budget to 0.5ε in Formula 11.

Comparing Formulas 11 and 13, it follows that the MSE of the hybrid approach is lower than or equal to that of DPLloyd if

$$t \ge 2. \quad (14)$$

Variance is a lower bound of MSE. Thus, if the MSE of the hybrid approach is equal to or smaller than the variance of the EUGM method, then the hybrid approach is sure to have the lower MSE. Setting Formula 13 smaller than or equal to Formula 12 yields

$$\epsilon \ge \left(\frac{X}{Y}\right)^{\frac{2+d}{2d}}, \quad (15)$$

where

$$X = 8d\left(1 + (\rho r)^2\right)\left(\frac{k(dr+1)}{N}\right)^2 \quad \text{and} \quad Y = \frac{2r^2 d\left(\sqrt[d]{k}\right)^{d-2}}{3\,\theta^{\frac{2d}{2+d}} N^{\frac{4}{2+d}}}.$$

Inequalities 14 and 15 give the conditions for applying the hybrid approach. Inequality 14 is automatically satisfied since DPLloyd runs for t = iterations.

. Experimental results

We now compare the hybrid approach with EUGM and DPLloyd. The configuration for EUGM and DPLloyd is the same as in Section .1. For the hybrid approach, we run EUGM times to output sets of intermediate centroids. Then we run DPLloyd times on each intermediate result. We finally report the average of the 0 NICV values.

Figure gives the results on the six external datasets. On the low-dimensional datasets (S1, Gowalla, TIGER, and Image), the hybrid approach simply falls back to EUGM for small ε values.
When ε increases, both the hybrid approach and EUGM converge to the baseline, with the former performing slightly better. For example, on the Gowalla dataset for ε = 0., the average NICV of the hybrid approach is 0.01 and that of EUGM is 0.01.

On the higher-dimensional datasets (Adult-num and Lifesci), the hybrid approach outperforms the other two approaches in most cases. It is worse than DPLloyd only for a few small ε values, on which it falls back to EUGM. There are two possible reasons. The first is that the MSE analysis assumes that datasets are well clustered and that each cluster has equal size, but the real datasets are skewed. For example, the baseline approach partitions the Adult-num dataset into clusters, in which the biggest cluster contains 1, tuples and the smallest contains ,10 tuples. The second is that we use the variance of EUGM as the lower bound of its MSE. Thus, it is possible that the MSE of the hybrid approach (approximated by the MSE of one-iteration DPLloyd with half the privacy budget) is larger than the variance of EUGM, but actually smaller than its MSE. In such cases, the hybrid approach would give a lower NICV if it did not fall back to EUGM. For example, on the Adult-num dataset for ε = 0.0, the hybrid approach has an NICV of 0.0 when it falls back to EUGM, while its NICV is 0. if it applies EUGM plus one iteration of DPLloyd.

We also evaluate the approaches using the synthetic datasets generated as in Section .1. Figure clearly shows that the hybrid approach is more scalable than EUGM with respect to both k and d. This confirms the effectiveness of the hybrid approach.

Figure presents the runtime of DPLloyd and EUGM on the six external datasets. We follow the same experiment configuration as.

More information

Kinematic Analysis of a Family of 3R Manipulators

Kinematic Analysis of a Family of 3R Manipulators Kinematic Analysis of a Family of R Manipulators Maher Baili, Philippe Wenger an Damien Chablat Institut e Recherche en Communications et Cybernétique e Nantes, UMR C.N.R.S. 6597 1, rue e la Noë, BP 92101,

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

A Plane Tracker for AEC-automation Applications

A Plane Tracker for AEC-automation Applications A Plane Tracker for AEC-automation Applications Chen Feng *, an Vineet R. Kamat Department of Civil an Environmental Engineering, University of Michigan, Ann Arbor, USA * Corresponing author (cforrest@umich.eu)

More information

Fuzzy Clustering in Parallel Universes

Fuzzy Clustering in Parallel Universes Fuzzy Clustering in Parallel Universes Bern Wisweel an Michael R. Berthol ALTANA-Chair for Bioinformatics an Information Mining Department of Computer an Information Science, University of Konstanz 78457

More information

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset Vol.3, Issue.1, Jan-Feb. 013 pp-504-508 ISSN: 49-6645 A PSO Optimize Layere Approach for Parametric Clustering on Weather Dataset Shikha Verma, 1 Kiran Jyoti 1 Stuent, Guru Nanak Dev Engineering College

More information

Divide-and-Conquer Algorithms

Divide-and-Conquer Algorithms Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:

More information

New Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance

New Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance New Version of Davies-Boulin Inex for lustering Valiation Base on ylinrical Distance Juan arlos Roas Thomas Faculta e Informática Universia omplutense e Mari Mari, España correoroas@gmail.com Abstract

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my Calculus I course that I teach here at Lamar University. Despite the fact that these are my class notes, they shoul be accessible to anyone wanting to learn Calculus

More information

Optimal Oblivious Path Selection on the Mesh

Optimal Oblivious Path Selection on the Mesh Optimal Oblivious Path Selection on the Mesh Costas Busch Malik Magon-Ismail Jing Xi Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 280, USA {buschc,magon,xij2}@cs.rpi.eu Abstract

More information

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES OLIVIER BERNARDI AND ÉRIC FUSY Abstract. We present bijections for planar maps with bounaries. In particular, we obtain bijections for triangulations an quarangulations

More information

Modifying ROC Curves to Incorporate Predicted Probabilities

Modifying ROC Curves to Incorporate Predicted Probabilities Moifying ROC Curves to Incorporate Preicte Probabilities Cèsar Ferri DSIC, Universitat Politècnica e València Peter Flach Department of Computer Science, University of Bristol José Hernánez-Orallo DSIC,

More information

Research Article Inviscid Uniform Shear Flow past a Smooth Concave Body

Research Article Inviscid Uniform Shear Flow past a Smooth Concave Body International Engineering Mathematics Volume 04, Article ID 46593, 7 pages http://x.oi.org/0.55/04/46593 Research Article Invisci Uniform Shear Flow past a Smooth Concave Boy Abullah Mura Department of

More information

Short-term prediction of photovoltaic power based on GWPA - BP neural network model

Short-term prediction of photovoltaic power based on GWPA - BP neural network model Short-term preiction of photovoltaic power base on GWPA - BP neural networ moel Jian Di an Shanshan Meng School of orth China Electric Power University, Baoing. China Abstract In recent years, ue to China's

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for

More information

WLAN Indoor Positioning Based on Euclidean Distances and Fuzzy Logic

WLAN Indoor Positioning Based on Euclidean Distances and Fuzzy Logic WLAN Inoor Positioning Base on Eucliean Distances an Fuzzy Logic Anreas TEUBER, Bern EISSFELLER Institute of Geoesy an Navigation, University FAF, Munich, Germany, e-mail: (anreas.teuber, bern.eissfeller)@unibw.e

More information

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 1, NO. 4, APRIL 01 74 Towar Efficient Distribute Algorithms for In-Network Binary Operator Tree Placement in Wireless Sensor Networks Zongqing Lu,

More information

Interior Permanent Magnet Synchronous Motor (IPMSM) Adaptive Genetic Parameter Estimation

Interior Permanent Magnet Synchronous Motor (IPMSM) Adaptive Genetic Parameter Estimation Interior Permanent Magnet Synchronous Motor (IPMSM) Aaptive Genetic Parameter Estimation Java Rezaie, Mehi Gholami, Reza Firouzi, Tohi Alizaeh, Karim Salashoor Abstract - Interior permanent magnet synchronous

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information

William S. Law. Erik K. Antonsson. Engineering Design Research Laboratory. California Institute of Technology. Abstract

William S. Law. Erik K. Antonsson. Engineering Design Research Laboratory. California Institute of Technology. Abstract Optimization Methos for Calculating Design Imprecision y William S. Law Eri K. Antonsson Engineering Design Research Laboratory Division of Engineering an Applie Science California Institute of Technology

More information

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway State Inexe Policy Search by Dynamic Programming Charles DuHaway Yi Gu 5435537 503372 December 4, 2007 Abstract We consier the reinforcement learning problem of simultaneous trajectory-following an obstacle

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

AnyTraffic Labeled Routing

AnyTraffic Labeled Routing AnyTraffic Labele Routing Dimitri Papaimitriou 1, Pero Peroso 2, Davie Careglio 2 1 Alcatel-Lucent Bell, Antwerp, Belgium Email: imitri.papaimitriou@alcatel-lucent.com 2 Universitat Politècnica e Catalunya,

More information

A multiple wavelength unwrapping algorithm for digital fringe profilometry based on spatial shift estimation

A multiple wavelength unwrapping algorithm for digital fringe profilometry based on spatial shift estimation University of Wollongong Research Online Faculty of Engineering an Information Sciences - Papers: Part A Faculty of Engineering an Information Sciences 214 A multiple wavelength unwrapping algorithm for

More information

Estimating Velocity Fields on a Freeway from Low Resolution Video

Estimating Velocity Fields on a Freeway from Low Resolution Video Estimating Velocity Fiels on a Freeway from Low Resolution Vieo Young Cho Department of Statistics University of California, Berkeley Berkeley, CA 94720-3860 Email: young@stat.berkeley.eu John Rice Department

More information

A Revised Simplex Search Procedure for Stochastic Simulation Response Surface Optimization

A Revised Simplex Search Procedure for Stochastic Simulation Response Surface Optimization 272 INFORMS Journal on Computing 0899-1499 100 1204-0272 $05.00 Vol. 12, No. 4, Fall 2000 2000 INFORMS A Revise Simplex Search Proceure for Stochastic Simulation Response Surface Optimization DAVID G.

More information

EXACT SIMULATION OF A BOOLEAN MODEL

EXACT SIMULATION OF A BOOLEAN MODEL Original Research Paper oi:10.5566/ias.v32.p101-105 EXACT SIMULATION OF A BOOLEAN MODEL CHRISTIAN LANTUÉJOULB MinesParisTech 35 rue Saint-Honoré 77305 Fontainebleau France e-mail: christian.lantuejoul@mines-paristech.fr

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Characterizing Decoding Robustness under Parametric Channel Uncertainty

Characterizing Decoding Robustness under Parametric Channel Uncertainty Characterizing Decoing Robustness uner Parametric Channel Uncertainty Jay D. Wierer, Wahee U. Bajwa, Nigel Boston, an Robert D. Nowak Abstract This paper characterizes the robustness of ecoing uner parametric

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks : a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications

More information

Selection Strategies for Initial Positions and Initial Velocities in Multi-optima Particle Swarms

Selection Strategies for Initial Positions and Initial Velocities in Multi-optima Particle Swarms ACM, 2011. This is the author s version of the work. It is poste here by permission of ACM for your personal use. Not for reistribution. The efinitive version was publishe in Proceeings of the 13th Annual

More information

Design and Analysis of Optimization Algorithms Using Computational

Design and Analysis of Optimization Algorithms Using Computational Appl. Num. Anal. Comp. Math., No. 3, 43 433 (4) / DOI./anac.47 Design an Analysis of Optimization Algorithms Using Computational Statistics T. Bartz Beielstein, K.E. Parsopoulos,3, an M.N. Vrahatis,3 Department

More information

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural

More information

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm NASA/CR-1998-208733 ICASE Report No. 98-45 Parallel Directionally Split Solver Base on Reformulation of Pipeline Thomas Algorithm A. Povitsky ICASE, Hampton, Virginia Institute for Computer Applications

More information

Lab work #8. Congestion control

Lab work #8. Congestion control TEORÍA DE REDES DE TELECOMUNICACIONES Grao en Ingeniería Telemática Grao en Ingeniería en Sistemas e Telecomunicación Curso 2015-2016 Lab work #8. Congestion control (1 session) Author: Pablo Pavón Mariño

More information

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs IEEE TRANSACTIONS ON KNOWLEDE AND DATA ENINEERIN, MANUSCRIPT ID Distribute Line raphs: A Universal Technique for Designing DHTs Base on Arbitrary Regular raphs Yiming Zhang an Ling Liu, Senior Member,

More information

Distributed Decomposition Over Hyperspherical Domains

Distributed Decomposition Over Hyperspherical Domains Distribute Decomposition Over Hyperspherical Domains Aron Ahmaia 1, Davi Keyes 1, Davi Melville 2, Alan Rosenbluth 2, Kehan Tian 2 1 Department of Applie Physics an Applie Mathematics, Columbia University,

More information

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics CS 106 Winter 2016 Craig S. Kaplan Moule 01 Processing Recap Topics The basic parts of speech in a Processing program Scope Review of syntax for classes an objects Reaings Your CS 105 notes Learning Processing,

More information

A Cost Model For Nearest Neighbor Search. High-Dimensional Data Space

A Cost Model For Nearest Neighbor Search. High-Dimensional Data Space A Cost Moel For Nearest Neighbor Search in High-Dimensional Data Space Stefan Berchtol University of Munich Germany berchtol@informatikuni-muenchene Daniel A Keim University of Munich Germany keim@informatikuni-muenchene

More information

Study of Network Optimization Method Based on ACL

Study of Network Optimization Method Based on ACL Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 01 01 01 01 01 00 01 01 Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University

More information

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways Ben, Jogs, An Wiggles for Railroa Tracks an Vehicle Guie Ways Louis T. Klauer Jr., PhD, PE. Work Soft 833 Galer Dr. Newtown Square, PA 19073 lklauer@wsof.com Preprint, June 4, 00 Copyright 00 by Louis

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

A Duality Based Approach for Realtime TV-L 1 Optical Flow

A Duality Based Approach for Realtime TV-L 1 Optical Flow A Duality Base Approach for Realtime TV-L 1 Optical Flow C. Zach 1, T. Pock 2, an H. Bischof 2 1 VRVis Research Center 2 Institute for Computer Graphics an Vision, TU Graz Abstract. Variational methos

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks mproving Spatial Reuse of EEE 82.11 Base A Hoc Networks Fengji Ye, Su Yi an Biplab Sikar ECSE Department, Rensselaer Polytechnic nstitute Troy, NY 1218 Abstract n this paper, we evaluate an suggest methos

More information

CONSTRUCTION AND ANALYSIS OF INVERSIONS IN S 2 AND H 2. Arunima Ray. Final Paper, MATH 399. Spring 2008 ABSTRACT

CONSTRUCTION AND ANALYSIS OF INVERSIONS IN S 2 AND H 2. Arunima Ray. Final Paper, MATH 399. Spring 2008 ABSTRACT CONSTUCTION AN ANALYSIS OF INVESIONS IN S AN H Arunima ay Final Paper, MATH 399 Spring 008 ASTACT The construction use to otain inversions in two-imensional Eucliean space was moifie an applie to otain

More information

Optimal Distributed P2P Streaming under Node Degree Bounds

Optimal Distributed P2P Streaming under Node Degree Bounds Optimal Distribute P2P Streaming uner Noe Degree Bouns Shaoquan Zhang, Ziyu Shao, Minghua Chen, an Libin Jiang Department of Information Engineering, The Chinese University of Hong Kong Department of EECS,

More information

A shortest path algorithm in multimodal networks: a case study with time varying costs

A shortest path algorithm in multimodal networks: a case study with time varying costs A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Linus Svärm Petter Stranmark Centre for Mathematical Sciences, Lun University {linus,petter}@maths.lth.se Abstract Shift-map image processing is a new framework base on energy

More information

A Convex Clustering-based Regularizer for Image Segmentation

A Convex Clustering-based Regularizer for Image Segmentation Vision, Moeling, an Visualization (2015) D. Bommes, T. Ritschel an T. Schultz (Es.) A Convex Clustering-base Regularizer for Image Segmentation Benjamin Hell (TU Braunschweig), Marcus Magnor (TU Braunschweig)

More information

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure

More information

Detecting Overlapping Communities from Local Spectral Subspaces

Detecting Overlapping Communities from Local Spectral Subspaces Detecting Overlapping Communities from Local Spectral Subspaces Kun He, Yiwei Sun Huazhong University of Science an Technology Wuhan 430074, China Email: {brooklet60, yiweisun}@hust.eu.cn Davi Binel, John

More information

Exploring Context with Deep Structured models for Semantic Segmentation

Exploring Context with Deep Structured models for Semantic Segmentation 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en Hengel, Ian Rei between an image patch an a large backgroun image region. Explicitly moeling

More information

Handling missing values in kernel methods with application to microbiology data

Handling missing values in kernel methods with application to microbiology data an Machine Learning. Bruges (Belgium), 24-26 April 2013, i6oc.com publ., ISBN 978-2-87419-081-0. Available from http://www.i6oc.com/en/livre/?gcoi=28001100131010. Hanling missing values in kernel methos

More information

Blind Data Classification using Hyper-Dimensional Convex Polytopes

Blind Data Classification using Hyper-Dimensional Convex Polytopes Blin Data Classification using Hyper-Dimensional Convex Polytopes Brent T. McBrie an Gilbert L. Peterson Department of Electrical an Computer Engineering Air Force Institute of Technology 9 Hobson Way

More information

2-connected graphs with small 2-connected dominating sets

2-connected graphs with small 2-connected dominating sets 2-connecte graphs with small 2-connecte ominating sets Yair Caro, Raphael Yuster 1 Department of Mathematics, University of Haifa at Oranim, Tivon 36006, Israel Abstract Let G be a 2-connecte graph. A

More information

Classical Mechanics Examples (Lagrange Multipliers)

Classical Mechanics Examples (Lagrange Multipliers) Classical Mechanics Examples (Lagrange Multipliers) Dipan Kumar Ghosh Physics Department, Inian Institute of Technology Bombay Powai, Mumbai 400076 September 3, 015 1 Introuction We have seen that the

More information

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory Feature Extraction an Rule Classification Algorithm of Digital Mammography base on Rough Set Theory Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Aministrative Science, Quantitative

More information

Open Access Adaptive Image Enhancement Algorithm with Complex Background

Open Access Adaptive Image Enhancement Algorithm with Complex Background Sen Orers for Reprints to reprints@benthamscience.ae 594 The Open Cybernetics & Systemics Journal, 205, 9, 594-600 Open Access Aaptive Image Enhancement Algorithm with Complex Bacgroun Zhang Pai * epartment

More information

Architecture Design of Mobile Access Coordinated Wireless Sensor Networks

Architecture Design of Mobile Access Coordinated Wireless Sensor Networks Architecture Design of Mobile Access Coorinate Wireless Sensor Networks Mai Abelhakim 1 Leonar E. Lightfoot Jian Ren 1 Tongtong Li 1 1 Department of Electrical & Computer Engineering, Michigan State University,

More information

Data Mining: Concepts and Techniques. Chapter 7. Cluster Analysis. Examples of Clustering Applications. What is Cluster Analysis?

Data Mining: Concepts and Techniques. Chapter 7. Cluster Analysis. Examples of Clustering Applications. What is Cluster Analysis? Data Mining: Concepts an Techniques Chapter Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.eu/~hanj Jiawei Han an Micheline Kamber, All rights reserve

More information

Table-based division by small integer constants

Table-based division by small integer constants Table-base ivision by small integer constants Florent e Dinechin, Laurent-Stéphane Diier LIP, Université e Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée Italie, 69364 Lyon Ceex 07 Florent.e.Dinechin@ens-lyon.fr

More information

Bayesian localization microscopy reveals nanoscale podosome dynamics

Bayesian localization microscopy reveals nanoscale podosome dynamics Nature Methos Bayesian localization microscopy reveals nanoscale poosome ynamics Susan Cox, Ewar Rosten, James Monypenny, Tijana Jovanovic-Talisman, Dylan T Burnette, Jennifer Lippincott-Schwartz, Gareth

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information