Stability yields a PTAS for k-median and k-means Clustering


Pranjal Awasthi, Carnegie Mellon University, pawasthi@cs.cmu.edu
Avrim Blum, Carnegie Mellon University, avrim@cs.cmu.edu
Or Sheffet, Carnegie Mellon University, osheffet@cs.cmu.edu

Abstract

We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. [18] show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1 + f(\epsilon))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\alpha$ for some constant $\alpha > 0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon > 0$ we achieve a $(1 + \epsilon)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption. For $k$-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)} (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\alpha)}$.

Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all $(1 + \alpha)$ approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha > 0$ is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target.
From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median we improve the "largeness" condition needed in [4] to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.

1. INTRODUCTION

Clustering is a well-studied task, arising in numerous areas from computer vision to computational biology to distributed computing. Generally speaking, the goal of clustering is to partition $n$ given data objects into $k$ groups that share some commonality. Operationally, clustering is often performed by viewing the data as points in a metric space and then optimizing some natural objective over them. In this paper, we consider two popular such objectives, $k$-median and $k$-means. Both measure a $k$-partition by choosing a special point for each cluster, called the center, and define the cost of a clustering as a function of the distances between the data points and their respective centers. In the $k$-median case, the cost is the sum of the distances of the points to their centers, and in the $k$-means case, the cost is the sum of these distances squared. The $k$-median objective is typically studied for data in a finite metric (a complete weighted graph satisfying the triangle inequality) over the $n$ data points; $k$-means clustering is typically studied for $n$ points in a (finite dimensional) Euclidean space.

Both objectives are known to be NP-hard (we view $k$ as part of the input and not a constant, though even the 2-means problem in Euclidean space was recently shown to be NP-hard [8]). For $k$-median in a finite metric, there is a known $(1+1/e)$-hardness of approximation result [14] and substantial work on approximation algorithms [11], [7], [2], [14], [9], with the best guarantee a $3 + \epsilon$ approximation. For $k$-means in a Euclidean space, there is also a vast literature of approximation algorithms [17], [3], [9], [10], [12], [15], with the best guarantee a constant-factor approximation if polynomial dependence on $k$ and the dimension $d$ is desired [footnote 1]. Ostrovsky et al.
[18] proposed an interesting condition under which one can achieve better $k$-means approximations in time polynomial in $n$ and $k$. They consider $k$-means instances where the optimal $k$-clustering has cost noticeably smaller than the cost of any $(k-1)$-clustering, motivated by the idea that "if a near-optimal $k$-clustering can be achieved by a partition into fewer than $k$ clusters, then that smaller value of $k$ should be used to cluster the data" [18]. Under the assumption that the ratio of the cost of the optimal $(k-1)$-means clustering to the cost of the optimal $k$-means clustering is at least $\max\{100, 1/\epsilon^2\}$, Ostrovsky et al. show that one can obtain a $(1+f(\epsilon))$-approximation for $k$-means in time polynomial in $n$ and $k$, by using a variant of Lloyd's algorithm.

In this paper, we substantially improve on this approximation guarantee. We show that under the much weaker assumption that the ratio of these costs is just at least $(1 + \alpha)$ for some constant $\alpha > 0$, we can achieve a PTAS: namely, $(1 + \epsilon)$-approximate the $k$-means optimum, for any constant $\epsilon > 0$. Our approximation scheme runs in time which is $\mathrm{poly}(n, k)$ and exponential only in $1/\epsilon$ and $1/\alpha$. Thus, we decouple the strength of the assumption from the quality of the conclusion, and in the process allow the assumption to be substantially weaker. For $k$-means clustering we in addition give a randomized algorithm with improved running time $n^{O(1)} (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\alpha)}$.

Footnote 1: If $k$ is constant, then $k$-median in finite metrics can be trivially solved in polynomial time, and there is a PTAS known for $k$-means (and $k$-median) in Euclidean space [16]. There is also a PTAS known for low-dimensional Euclidean spaces (dimension at most $\log\log n$) [1], [12].

Balcan et al. [4], motivated by the fact that objective functions are often just a proxy for the underlying goal

of getting the data clustered correctly, propose clustering instances that satisfy the condition that all $(1 + \alpha)$ approximations to the given objective (e.g., $k$-median or $k$-means) are $\delta$-close, in terms of how points are partitioned, to a target clustering (such as a correct clustering of proteins by function or a correct clustering of images by who is in them). This can be viewed as an assumption made implicitly when considering approximation algorithms for problems of this nature where the true goal is to get close to the target. Balcan et al. show that for any $\alpha$ and $\delta$, given an instance satisfying this property for $k$-median or $k$-means objectives, one can in fact efficiently produce a clustering that is $O(\delta/\alpha)$-close to the target clustering (so, $O(\delta)$-close for any constant $\alpha > 0$), even though obtaining a $1 + \alpha$ approximation to the objective is NP-hard for $\alpha < 1/e$, and remains hard even under this assumption. Thus they show that one can approximate the target even though it is hard to approximate the objective.

One interesting question that has remained is the approximability of the objectives when all target clusters are large compared to $\delta n$, since the hardness of approximation requires allowing small clusters [footnote 2]. Here, we show that for both $k$-median and $k$-means objectives, if all clusters contain more than $\delta n$ points, then for any constant $\alpha > 0$ we can in fact get a PTAS. Thus, we (nearly) resolve the approximability of these objectives under this condition. Note that under this condition, this further implies finding a $\delta$-close clustering (setting $\epsilon = \alpha$). Thus, we also extend the results of Balcan et al. [4] in the case of large clusters and constant $\alpha$ by getting exactly $\delta$-close for both $k$-median and $k$-means objectives. (In [4] this exact closeness was achieved for the $k$-median objective but needed a somewhat larger $O(\delta n(1 + 1/\alpha))$ minimum cluster size requirement.)

Our algorithmic results are achieved by examining implications of a property we call weak deletion-stability that is implied by both the separation condition of Ostrovsky et al.
[18] as well as (when target clusters are large) the stability condition of Balcan et al. [4]. In particular, an instance of $k$-median/$k$-means clustering satisfies weak deletion-stability if in the optimal solution, deleting any of the centers $c_i^*$ and assigning all points in cluster $i$ instead to one of the remaining $k-1$ centers $c_j^*$ results in an increase in the $k$-median/$k$-means cost by an (arbitrarily small) constant factor.

We also show that weak deletion-stability still allows for NP-hard instances and that no FPTAS is possible either (unless P = NP). Thus, our algorithm, whose running time is $(kn)^{\mathrm{poly}(1/\epsilon, 1/\beta)}$, is optimal in the sense that the superpolynomial dependence on $1/\epsilon$ and $1/\beta$ is unavoidable.

After presenting notation and preliminaries in Section 2, in Section 3 we introduce weak deletion-stability and relate it to the stability notions of [18] and [4]. We then define another property of a clustering, being $\beta$-distributed, which, while not so intuitive, we show is implied by weak deletion-stability and will be the actual condition that our algorithms use. We then go on to prove that being $\beta$-distributed suffices to give a PTAS for $k$-median in Section 4. We extend the algorithm to $k$-means clustering in Section 5, where we also introduce a randomized version whose run-time is bounded by $n^3 (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\beta)}$. We conclude with discussion and open problems in Section 6.

Footnote 2: In fact, as shown in [19], the $k$-median algorithm in [4] for the case that clusters are sufficiently large compared to $\delta n(1 + 1/\alpha)$ achieves a better constant-factor approximation. Note that $\delta$ need not be a constant.

2. NOTATION AND PRELIMINARIES

We are given a set $S$ of $n$ points. When discussing $k$-median, we assume the $n$ points reside in a finite metric space, and when discussing $k$-means, we assume they all reside in a finite dimensional Euclidean space. We denote by $d : S \times S \to \mathbb{R}_{\ge 0}$ the distance function. A solution to the $k$-median objective partitions the $n$ points into $k$ disjoint subsets, $C_1, C_2, \ldots, C_k$, and assigns a center $c_i$ to each subset. The $k$-median cost of this partition is then measured by $\sum_{i=1}^{k} \sum_{x \in C_i} d(x, c_i)$.
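As a small illustration of the $k$-median cost just defined, the following Python sketch evaluates the double sum for a given partition and choice of centers. The function name `kmedian_cost`, the representation of points as indices into a precomputed distance matrix, and the toy data are assumptions made for this example only; they are not taken from the paper.

```python
def kmedian_cost(dist, clusters, centers):
    """k-median cost: sum over clusters of distances from each point to its center.

    dist     -- symmetric matrix, dist[a][b] is the distance between points a and b
    clusters -- list of clusters, each a list of point indices
    centers  -- centers[i] is the index of the center chosen for clusters[i]
    """
    return sum(dist[x][c] for cluster, c in zip(clusters, centers) for x in cluster)


# Hypothetical toy instance: six points on a line, indexed 0..5.
coords = [0, 1, 2, 100, 101, 102]
dist = [[abs(a - b) for b in coords] for a in coords]

# Two clusters of three points each, centered at their middle points.
toy_cost = kmedian_cost(dist, clusters=[[0, 1, 2], [3, 4, 5]], centers=[1, 4])
```

Each cluster in the toy instance contributes the distances from its two outer points to the middle point, so the value of `toy_cost` is the sum of four unit distances.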
A solution to the $k$-means objective again gives a $k$-partition of the $n$ data points, but now we may assume it uses the center of mass, $\mu_{C_i} = \frac{1}{|C_i|} \sum_{x \in C_i} x$, as the center of $C_i$. We then measure the $k$-means cost of this clustering by $\sum_{i=1}^{k} \sum_{x \in C_i} d^2(x, \mu_{C_i}) = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_{C_i}\|^2$.

The optimal clustering (with respect to either the $k$-median or the $k$-means objective) is denoted $C^* = \{C_1^*, C_2^*, \ldots, C_k^*\}$, and its cost is denoted $\mathrm{OPT}$. The centers used in the optimal clustering are denoted $\{c_1^*, c_2^*, \ldots, c_k^*\}$. Clearly, given the optimal clustering, we can find the optimal centers (either by brute-force checking all possible points for $k$-median, or by $c_i^* = \mu_{C_i^*}$ for $k$-means). Alternatively, given the optimal centers, we can assign each $x$ to its nearest center, thus obtaining the optimal clustering. Thus, we use $C^*$ to denote both the optimal $k$-partition and the optimal list of $k$ centers. We use $\mathrm{OPT}_i$ to denote the contribution of cluster $i$ to $\mathrm{OPT}$, that is, $\mathrm{OPT}_i = \sum_{x \in C_i^*} d(x, c_i^*)$ in the $k$-median case, or $\mathrm{OPT}_i = \sum_{x \in C_i^*} d^2(x, c_i^*)$ in the $k$-means case.

3. STABILITY PROPERTIES

As mentioned above, our results are achieved by exploiting implications of a stability condition we call weak deletion-stability, and in particular an implication we call being $\beta$-distributed. In this section we define weak deletion-stability and being $\beta$-distributed, relate weak deletion-stability to the conditions of Ostrovsky et al. [18] and Balcan et al. [4], and show that weak deletion-stability implies the clustering is $\beta$-distributed. In Sections 4 and 5 we use the property of being $\beta$-distributed to obtain a PTAS [footnote 3].

Footnote 3: Technically, we could skip the "middleman" of weak deletion-stability and just define the property of being $\beta$-distributed as our main stability notion, but weak deletion-stability is a more intuitive condition.

Definition 3.1. For $\alpha > 0$, a $k$-median/$k$-means instance satisfies $(1+\alpha)$ weak deletion-stability if it has the following property. Let $\{c_1^*, c_2^*, \ldots, c_k^*\}$ denote the centers in the

optimal $k$-median/$k$-means solution. Let $\mathrm{OPT}$ denote the optimal $k$-median/$k$-means cost and let $\mathrm{OPT}^{(i \to j)}$ denote the cost of the clustering obtained by removing $c_i^*$ as a center and assigning all its points instead to $c_j^*$. Then for any $i \ne j$, it holds that $\mathrm{OPT}^{(i \to j)} > (1 + \alpha)\,\mathrm{OPT}$.

We use weak deletion-stability via the following implication, which we call being $\beta$-distributed.

Definition 3.2. For $\beta > 0$, a $k$-median instance is $\beta$-distributed if for any center $c_i^*$ of the optimal clustering and any data point $x \notin C_i^*$, it holds that $d(x, c_i^*) \ge \beta \cdot \frac{\mathrm{OPT}}{|C_i^*|}$. A $k$-means instance is $\beta$-distributed if for any such $c_i^*$ and $x \notin C_i^*$, it holds that $d^2(x, c_i^*) \ge \beta \cdot \frac{\mathrm{OPT}}{|C_i^*|}$.

We prove that $(1 + \alpha)$ weak deletion-stability implies the clustering is $\alpha/2$-distributed for $k$-median ($\alpha/4$-distributed for $k$-means) in Theorem 3.5 below. First, however, we relate weak deletion-stability to the conditions considered in [18] and [4].

A. ORSS-Separability

Ostrovsky, Rabani, Schulman and Swamy [18] define a clustering instance to be $\epsilon$-separated if the optimal $k$-means solution is cheaper than the optimal $(k-1)$-means solution by at least a factor $\epsilon^2$. For a given objective ($k$-means or $k$-median) let us use $\mathrm{OPT}_{(k-1)}$ to denote the cost of the optimal $(k-1)$-clustering. Introducing a parameter $\alpha > 0$, say a clustering instance is $(1 + \alpha)$-ORSS separable if $\frac{\mathrm{OPT}_{(k-1)}}{\mathrm{OPT}} > 1 + \alpha$. If an instance satisfies $(1 + \alpha)$-ORSS separability then all $(k-1)$-clusterings must have cost more than $(1 + \alpha)\,\mathrm{OPT}$, and hence it is immediately evident that the instance will also satisfy $(1 + \alpha)$-weak deletion-stability. Hence we have the following claim:

Claim 3.3. Any $(1 + \alpha)$-ORSS separable $k$-median/$k$-means instance is also $(1 + \alpha)$-weakly deletion-stable.

B. BBG-Stability

Balcan, Blum, and Gupta [4] (see also Balcan and Braverman [5] and Balcan, Röglin, and Teng [6]) consider a notion of stability to approximations motivated by settings in which there exists some (unknown) target clustering $C^{\mathrm{target}}$ we would like to produce. Balcan et al.
[4] define a clustering instance to be $(1 + \alpha, \delta)$ approximation-stable with respect to some objective $\Phi$ (such as $k$-median or $k$-means) if any $k$-partition whose cost under $\Phi$ is at most $(1+\alpha)\,\mathrm{OPT}$ agrees with the target clustering on all but at most $\delta n$ data points. That is, for any $(1 + \alpha)$ approximation $C$ to objective $\Phi$, we have $\min_{\sigma \in S_k} \sum_i |C_i^{\mathrm{target}} \setminus C_{\sigma(i)}| \le \delta n$ (here, $\sigma$ is simply a matching of the indices in the target clustering to those in $C$). In general, $\delta n$ may be larger than the smallest target cluster size, and in that case approximation-stability need not imply weak deletion-stability (not surprisingly, since [4] show that $k$-median and $k$-means remain hard to approximate). However, when all target clusters have size greater than $\delta n$ (note that $\delta$ need not be a constant) then approximation-stability indeed also implies weak deletion-stability, allowing us to get a PTAS (and thereby $\delta$-close to the target) when $\alpha > 0$ is a constant.

Claim 3.4. A $k$-median/$k$-means clustering instance that satisfies $(1 + \alpha, \delta)$ approximation-stability, and in which all clusters in the target clustering have size greater than $\delta n$, also satisfies $(1 + \alpha)$ weak deletion-stability.

Proof: Consider an instance of $k$-median/$k$-means clustering which satisfies $(1 + \alpha, \delta)$ approximation-stability. As before, let $\{c_1^*, c_2^*, \ldots, c_k^*\}$ be the centers in the optimal solution and consider the clustering $C^{(i \to j)}$ obtained by no longer using $c_i^*$ as a center and instead assigning each point from cluster $i$ to $c_j^*$, making the $i$th cluster empty. The distance of this clustering from the target is defined as $\frac{1}{n} \min_{\sigma \in S_k} \sum_{i'} |C_{i'}^{\mathrm{target}} \setminus C^{(i \to j)}_{\sigma(i')}|$. Since $C^{(i \to j)}$ has only $(k-1)$ nonempty clusters, one of the target clusters must map to an empty cluster under any permutation $\sigma$. Since by assumption this target cluster has more than $\delta n$ points, the distance between $C^{\mathrm{target}}$ and $C^{(i \to j)}$ will be greater than $\delta$, and hence by the BBG stability condition, the $k$-median/$k$-means cost of $C^{(i \to j)}$ must be greater than $(1 + \alpha)\,\mathrm{OPT}$.

C. Weak Deletion-Stability implies β-distributed

We show now that weak deletion-stability implies the instance is $\beta$-distributed.

Theorem 3.5.
Any $(1 + \alpha)$-weakly deletion-stable $k$-median instance is $\frac{\alpha}{2}$-distributed. Any $(1+\alpha)$-weakly deletion-stable $k$-means instance is $\frac{\alpha}{4}$-distributed.

Proof: Fix any center in the optimal $k$-clustering, $c_i^*$, and fix any point $p$ that does not belong to the cluster $C_i^*$. Denote by $C_j^*$ the cluster that $p$ is assigned to in the optimal $k$-clustering. Therefore it must hold that $d(p, c_j^*) \le d(p, c_i^*)$. Consider the clustering obtained by deleting $c_i^*$ from the list of centers, and assigning each point in $C_i^*$ to $C_j^*$. Since the instance is $(1 + \alpha)$-weakly deletion-stable, this should increase the cost by at least $\alpha\,\mathrm{OPT}$.

Suppose we are dealing with a $k$-median instance. Each point $x \in C_i^*$ originally pays $d(x, c_i^*)$, and now, assigned to $c_j^*$, it pays $d(x, c_j^*) \le d(x, c_i^*) + d(c_i^*, c_j^*)$. Thus, the new cost of the points in $C_i^*$ is upper bounded by $\sum_{x \in C_i^*} d(x, c_j^*) \le \mathrm{OPT}_i + |C_i^*|\, d(c_i^*, c_j^*)$. As the increase in cost is lower bounded by $\alpha\,\mathrm{OPT}$ and upper bounded by $|C_i^*|\, d(c_i^*, c_j^*)$, we deduce that $d(c_i^*, c_j^*) > \alpha \frac{\mathrm{OPT}}{|C_i^*|}$. Observe that the triangle inequality gives $d(c_i^*, c_j^*) \le d(c_i^*, p) + d(p, c_j^*) \le 2 d(c_i^*, p)$, so we have that $d(c_i^*, p) > \frac{\alpha}{2} \cdot \frac{\mathrm{OPT}}{|C_i^*|}$.

Suppose we are dealing with a Euclidean $k$-means instance. Again, we have created a new clustering by assigning all points in $C_i^*$ to the center $c_j^*$. Thus, the cost of transitioning from the optimal $k$-clustering to this new $(k-1)$-clustering, which is at least $\alpha\,\mathrm{OPT}$, is upper bounded by $\sum_{x \in C_i^*} \left( \|x - c_j^*\|^2 - \|x - c_i^*\|^2 \right)$. As $c_i^* = \mu_{C_i^*}$, it follows that this bound is exactly $\sum_{x \in C_i^*} \|c_i^* - c_j^*\|^2 = |C_i^*|\, d^2(c_i^*, c_j^*)$; see [13] (§2, Theorem 2). It follows that $d^2(c_i^*, c_j^*) > \alpha \frac{\mathrm{OPT}}{|C_i^*|}$. As before, $d^2(c_i^*, c_j^*) \le \left( d(c_i^*, p) + d(p, c_j^*) \right)^2 \le 4 d^2(c_i^*, p)$, so $d^2(c_i^*, p) > \frac{\alpha}{4} \cdot \frac{\mathrm{OPT}}{|C_i^*|}$.

D. NP-hardness under weak deletion-stability

Finally, we would like to point out that NP-hardness of the $k$-median problem is maintained even if we restrict ourselves only to weakly deletion-stable instances. Also, the reduction sketched below uses only integer poly-size distances, and hence rules out the existence of an FPTAS for the problem, unless P = NP. In addition, the reduction can be modified to show that NP-hardness is maintained under the conditions studied in [18] and [4].

Theorem 3.6. For any constant $\alpha > 0$, finding the optimal $k$-median clustering of $(1 + \alpha)$-weakly deletion-stable instances is NP-hard.

Proof: Fix any constant $\alpha > 0$. We give a poly-time reduction from Set-Cover to $(1 + \alpha)$-weakly deletion-stable $k$-median instances. Under standard notation, we assume our input consists of $n$ subsets of a given universe of size $m$, for which we seek a $k$-cover. We reduce such an instance to a $k$-median instance over $m + k(n + 4\alpha m)$ points. We start with the usual reduction of Set-Cover to an instance with $m$ points representing the items of the universe and $n$ points representing all possible sets. Fix an integer $D \ge 1$ to be chosen later. If item $j$ belongs to the $i$th set, fix the distance $d(i, j) = D$; otherwise fix the distance $d(i, j) = D + 1$; and between any two set-points fix the distance to be 1. (The distance between any two item points is the shortest-path distance.)
However, we augment the $n$ set-points with an additional $2mD$ points, setting the distance between all of the $(n + 2mD)$ points to 1. Furthermore, we replicate $k$ copies of these $(n + 2mD)$ augmented set-points, all connected only via the $m$ item points. Observe that each of the $k$ copies of our augmented set-points component contains many points, and all points outside this copy are at distance at least $D$ from it. Therefore, in the optimal $k$-median solution, each center resides in one unique copy of the augmented set-points. Now, if our Set-Cover instance has a $k$-cover, then we can pick the respective $k$ centers and have an optimal solution with cost exactly $k(n + 2mD - 1) + mD$. Otherwise, no $k$ sets cover all $m$ items, so for any $k$ centers, some item-point must have distance $D + 1$ from its center, and so the cost of any $k$-partition is at least $k(n + 2mD - 1) + mD + 1$.

Furthermore, the resulting instance is $(1 + \alpha)$ weakly deletion-stable; in fact, even $(1+\alpha)$ ORSS-separable. In particular, using one center from each copy of the augmented set-points results in a $k$-median solution of cost at most $m(D+1) + k(n + 2mD - 1) < (k+1)(n + 2mD)$; hence, $\mathrm{OPT}$ is at most this quantity. However, in any $(k-1)$-clustering, one of the copies of the augmented set-points must not contain a center, and therefore $\mathrm{OPT}_{(k-1)} \ge \mathrm{OPT} + (n + 2mD)(D - 1)$. Choosing $D = \alpha(k + 1) + 1$ ensures that this cost is at least $(1 + \alpha)\,\mathrm{OPT}$.

4. A PTAS FOR ANY β-DISTRIBUTED k-MEDIAN INSTANCE

We now present the algorithm for finding a $(1 + \epsilon)$-approximation of the $k$-median optimum for $\beta$-distributed instances. First, we comment that using a standard doubling technique, we can assume we approximately know the value of $\mathrm{OPT}$ [footnote 4]. Our algorithm works if instead of $\mathrm{OPT}$ we use a value $v$ such that $\mathrm{OPT} \le v \le (1 + \epsilon/2)\,\mathrm{OPT}$, but for ease of exposition, we assume that the exact value of $\mathrm{OPT}$ is known.

Below, we informally describe the algorithm for a special case of $\beta$-distributed instances in which no cluster dominates the overall cost of the optimal clustering. Specifically, we say a cluster $C_i^*$ in the optimal $k$-median clustering $C^*$ (hereafter also referred to as the target clustering) is cheap if $\mathrm{OPT}_i \le \frac{\beta\epsilon}{32}\,\mathrm{OPT}$; otherwise, we say $C_i^*$ is expensive.
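The cheap/expensive split above can be sketched in a few lines of Python. This is only an illustration under assumed names: `split_cheap_expensive` and the list `cluster_costs` (holding the per-cluster contributions $\mathrm{OPT}_i$, whose sum is taken as $\mathrm{OPT}$) are hypothetical and do not appear in the paper.

```python
def split_cheap_expensive(cluster_costs, beta, eps):
    """Classify optimal clusters by their share of the total cost.

    cluster_costs[i] plays the role of OPT_i; their sum plays the role of OPT.
    A cluster is cheap if OPT_i <= (beta * eps / 32) * OPT, expensive otherwise.
    Returns two lists of cluster indices: (cheap, expensive).
    """
    opt = sum(cluster_costs)
    threshold = beta * eps * opt / 32
    cheap = [i for i, c in enumerate(cluster_costs) if c <= threshold]
    expensive = [i for i, c in enumerate(cluster_costs) if c > threshold]
    return cheap, expensive


# Hypothetical costs: two light clusters and one cluster carrying most of OPT.
cheap, expensive = split_cheap_expensive([1, 1, 98], beta=1, eps=0.5)
```

Since every expensive cluster accounts for more than a $\beta\epsilon/32$ fraction of $\mathrm{OPT}$, the `expensive` list can never hold more than $32/(\beta\epsilon)$ indices, whatever the input.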
Note that in any event, there can be at most a constant number ($\frac{32}{\beta\epsilon}$) of expensive clusters.

Algorithm Intuition: The intuition for our algorithm and for introducing the notion of cheap clusters is the following. Pick some cluster $C_i^*$ in the optimal $k$-median clustering. Since the instance is $\beta$-distributed, any $x \notin C_i^*$ is far from $c_i^*$, namely, $d(x, c_i^*) > \beta \frac{\mathrm{OPT}}{|C_i^*|}$. In contrast, the average distance of $x \in C_i^*$ from $c_i^*$ is $\frac{\mathrm{OPT}_i}{|C_i^*|}$. Thus, if we focus on a cluster whose contribution, $\mathrm{OPT}_i$, is no more than, say, $\frac{\beta}{100}\,\mathrm{OPT}$, we have that $c_i^*$ is 100 times closer, on average, to the points of $C_i^*$ than to the points outside $C_i^*$. Furthermore, using the triangle inequality we have that any two "average" points of $C_i^*$ are at distance at most $\frac{2\beta}{100} \cdot \frac{\mathrm{OPT}}{|C_i^*|}$, while the distance between any such average point and any point outside of $C_i^*$ is at least $\frac{99\beta}{100} \cdot \frac{\mathrm{OPT}}{|C_i^*|}$. So, if we manage to correctly guess the size $s$ of a cheap cluster, we can set a radius $r = \Theta\!\left(\beta \frac{\mathrm{OPT}}{s}\right)$ and collect data points according to the size and intersection of the $r$-balls around them. We note that this use of balls with an inverse relation between size and radius is similar to that in the min-sum clustering algorithm of [5].

Note that in the general case we might have up to $\frac{32}{\beta\epsilon}$ expensive clusters. We handle them by brute-force guessing their centers.

Footnote 4: Instead of doubling from 1, we can alternatively run an off-the-shelf 5-approximation of $\mathrm{OPT}$, which will return a value $v \le 5\,\mathrm{OPT}$.

In Subsection 4-A, we present the algorithm for clustering $\beta$-distributed instances of $k$-median under the assumption that for all the expensive clusters we have made the correct guess for their cluster centers. The algorithm

populates a list $Q$, where each element in this list is a subset of points. Ideally, each subset is contained in some target cluster, yet we might have a few subsets with points from two or more target clusters. The first stage of the algorithm is to add components into $Q$, and the second stage is to find $k$ good components in $Q$ and use these components to retrieve a clustering with low cost. Since we do not have many expensive clusters, we can run the algorithm for all possible guesses for the centers of the expensive clusters and choose the solution which has the minimum cost. The analysis below shows that one such guess will lead to a solution of cost at most $(1+\epsilon)\,\mathrm{OPT}$.

Later, in Section 5, when we deal with $k$-means in Euclidean space, we use sampling techniques, similar to those of Kumar et al. [16] and Ostrovsky et al. [18], to get good substitutes for the centers of the expensive clusters. Note however an important difference between the approach of [16], [18] and ours. While they sample points from all $k$ clusters, we sample points only for the $O(1)$ expensive clusters. As a result, the runtime of the PTAS of [16], [18] has exponential dependence on $k$, while ours has only a polynomial dependence on $k$.

A. Clustering β-distributed Instances

The algorithm is presented in Figure 1. In this section we assume that at the beginning, the list $Q$ is initialized with $Q_{\mathrm{init}}$, which contains the centers of all the expensive clusters. In general, the algorithm will be run several times with $Q_{\mathrm{init}}$ containing different guesses for the centers of the expensive clusters.

Before going into the proof of correctness of the algorithm, we introduce another definition. We define the inner ring of $C_i^*$ as the set $\left\{ x;\ d(x, c_i^*) \le \frac{\beta\,\mathrm{OPT}}{8 |C_i^*|} \right\}$. Note the following fact:

Fact 4.1. If $C_i^*$ is a cheap cluster, then no more than an $\epsilon/4$ fraction of its points reside outside the inner ring. In particular, at least half of a cheap cluster is contained within the inner ring.

Proof: This follows from Markov's inequality. If more than $(\epsilon/4)|C_i^*|$ points are outside of the inner ring, then $\mathrm{OPT}_i > \frac{\epsilon |C_i^*|}{4} \cdot \frac{\beta\,\mathrm{OPT}}{8 |C_i^*|} = \frac{\beta\epsilon}{32}\,\mathrm{OPT}$.
This contradicts the fact that $C_i^*$ is cheap.

Our high level goal is to show that for any cheap cluster $C_i^*$ in the target clustering, we insert a component $T_i$ that is contained within $C_i^*$ and, furthermore, contains only points that are close to $c_i^*$. It will follow from the next claims that the component $T_i$ is the one that contains points from the inner ring of $C_i^*$. We start with the following lemma, which we will utilize a few times.

Lemma 4.2. Let $T$ be any component added to $Q$. Let $s$ be the stage in which we add $T$ to $Q$. Let $C_i^*$ be any cheap cluster such that $s \ge |C_i^*|$. Then (a) $T$ does not contain any point $z$ such that the distance $d(c_i^*, z)$ lies within the range $\left[ \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}, \frac{3\beta\,\mathrm{OPT}}{4 |C_i^*|} \right]$, and (b) $T$ cannot contain both a point $p_1$ such that $d(c_i^*, p_1) < \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}$ and a point $p_2$ such that $d(c_i^*, p_2) > \frac{3\beta\,\mathrm{OPT}}{4 |C_i^*|}$.

1) Initialization Stage: Set $Q \leftarrow Q_{\mathrm{init}}$.
2) Population Stage: For $s = n, n-1, n-2, \ldots, 1$ do:
   a) Set $r = \frac{\beta\,\mathrm{OPT}}{4s}$.
   b) Remove any point $x$ such that $d(x, Q) < 2r$. (Here, $d(x, Q) = \min_{T \in Q;\, y \in T} d(x, y)$.)
   c) For any remaining data point $x$, denote the set of data points whose distance from $x$ is at most $r$ by $B(x, r)$. Connect any two remaining points $a$ and $b$ if: (i) $d(a, b) \le r$, (ii) $|B(a, r)| > \frac{s}{2}$ and (iii) $|B(b, r)| > \frac{s}{2}$.
   d) Let $T$ be a connected component of size $> \frac{s}{2}$. Then:
      i) Add $T$ to $Q$. (That is, $Q \leftarrow Q \cup \{T\}$.)
      ii) Define the set $B(T) = \{x : d(x, y) \le 2r \text{ for some } y \in T\}$. Remove the points of $B(T)$ from the instance.
3) Centers-Retrieving Stage: For any choice of $k$ components $T_1, T_2, \ldots, T_k$ out of $Q$ (we later show that $|Q| < k + O(1/\beta)$):
   a) Find the best center $c_i$ for $T_i \cup B(T_i)$. That is, $c_i = \arg\min_{p \in T_i \cup B(T_i)} \sum_{x \in T_i \cup B(T_i)} d(x, p)$. [note a]
   b) Partition all $n$ points according to the nearest point among the $k$ centers of the current $k$ components.
   c) If a clustering of cost at most $(1+\epsilon)\,\mathrm{OPT}$ is found, output these $k$ centers and halt.
Note a: This can be done before fixing the choice of $k$ components out of $Q$.

Figure 1. The algorithm to obtain a PTAS for β-distributed instances of k-median.

Proof: We prove (a) by contradiction. Assume $T$ contains a point $z$ such that $\frac{\beta\,\mathrm{OPT}}{2 |C_i^*|} \le d(c_i^*, z) \le \frac{3\beta\,\mathrm{OPT}}{4 |C_i^*|}$.
Set $r = \frac{\beta\,\mathrm{OPT}}{4s} \le \frac{\beta\,\mathrm{OPT}}{4 |C_i^*|}$, just as in the stage when $T$ was added to $Q$, and let $p$ be any point in the ball $B(z, r)$. Then by the triangle inequality we have that $d(c_i^*, p) \ge d(c_i^*, z) - d(z, p) \ge \frac{\beta\,\mathrm{OPT}}{4 |C_i^*|}$, and similarly $d(c_i^*, p) \le d(c_i^*, z) + d(z, p) \le \frac{\beta\,\mathrm{OPT}}{|C_i^*|}$. Since our instance is $\beta$-distributed it holds that $p$ belongs to $C_i^*$, and from the definition of the inner ring of $C_i^*$, it holds that $p$ falls outside the inner ring. However, $z$ is added to $T$ because the ball $B(z, r)$ contains more than $s/2 \ge |C_i^*|/2$ many points. So more than half of the points in $C_i^*$ fall outside the inner ring of $C_i^*$, which contradicts Fact 4.1.

Assume now that (b) does not hold. Recall that $T$ is a connected component, so there exists some path $p_1 \to p_2$. Each two consecutive points along this path were connected because

their distance is at most $r = \frac{\beta\,\mathrm{OPT}}{4s} \le \frac{\beta\,\mathrm{OPT}}{4 |C_i^*|}$. As $d(c_i^*, p_1) < \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}$ and $d(c_i^*, p_2) > \frac{3\beta\,\mathrm{OPT}}{4 |C_i^*|}$, there must exist a point $z$ along the path whose distance from $c_i^*$ falls in the range $\left[ \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}, \frac{3\beta\,\mathrm{OPT}}{4 |C_i^*|} \right]$, contradicting (a).

Claim 4.3. Let $C_i^*$ be any cheap cluster in the target clustering. By stage $s = |C_i^*|$, the algorithm adds to $Q$ a component $T$ that contains a point from the inner ring of $C_i^*$.

Proof: Suppose that up to the stage $s = |C_i^*|$ the algorithm has not inserted such a component into $Q$. Now, it is possible that by stage $s$, the algorithm has inserted some component $T$ into $Q$ such that some $x$ in the inner ring of $C_i^*$ is too close to some $y \in T$ (namely, $d(x, y) \le 2r$), thus causing $x$ to be removed from the instance. Assume for now this is not the case. This means that the inner ring of cluster $C_i^*$ still contains more than $|C_i^*|/2$ points. Also observe that all inner ring points are at distance at most $\frac{\beta\,\mathrm{OPT}}{8 |C_i^*|}$ from the center, so every pair of inner ring points has a distance of at most $\frac{\beta\,\mathrm{OPT}}{4 |C_i^*|}$. Hence, when we reach stage $s = |C_i^*|$, any ball of radius $r = \frac{\beta\,\mathrm{OPT}}{4s} = \frac{\beta\,\mathrm{OPT}}{4 |C_i^*|}$ centered at any inner-ring point must contain all other inner-ring points. This means that at stage $s = |C_i^*|$ all inner ring points are connected among themselves, so they form a component (in fact, a clique) of size $> s/2$. Therefore, the algorithm inserts a new component containing all inner ring points.

So, by stage $s = |C_i^*|$, one of two things can happen. Either the algorithm inserts a component that contains some inner ring point into $Q$, or the algorithm removes an inner ring point due to some component $T \in Q$. If the former happens, we are done. So let us prove by contradiction that we cannot have only the latter. Let $s \ge |C_i^*|$ be the stage in which we throw away the first inner ring point of the cluster $C_i^*$. At stage $s$ the algorithm removes this inner ring point $x$ because there exists a point $y$ in some component $T \in Q$ such that $d(x, y) \le 2r = \frac{\beta\,\mathrm{OPT}}{2s}$, and so $d(c_i^*, y) \le d(c_i^*, x) + d(x, y) \le \frac{\beta\,\mathrm{OPT}}{8 |C_i^*|} + \frac{\beta\,\mathrm{OPT}}{2s} \le \frac{5\beta\,\mathrm{OPT}}{8 |C_i^*|}$. This immediately implies that $T$ cannot be the center of an expensive cluster, since any such point will be at a distance at least $\frac{\beta\,\mathrm{OPT}}{|C_i^*|}$ from $c_i^*$.
Let $s' \ge s \ge |C_i^*|$ be the previous stage in which we added the component $T$ to $Q$. As Lemma 4.2 applies to $T$, we deduce that $d(c_i^*, y) < \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}$. Recall that $T$ contains $> s'/2 \ge |C_i^*|/2$ many points, yet, by assumption, contains none of the at least $|C_i^*|/2$ points that reside in the inner ring of $C_i^*$. It follows from Fact 4.1 that some point $w \in T$ must belong to a different cluster $C_j^*$. Since the instance is $\beta$-distributed, we have that $d(c_i^*, w) > \frac{\beta\,\mathrm{OPT}}{|C_i^*|}$. The existence of both $y$ and $w$ in $T$ contradicts part (b) of Lemma 4.2.

We call a component $T \in Q$ good if it contains an inner ring point of some cheap cluster $C_i^*$. A component is called bad if it is not good and is not one of the initial centers present in $Q_{\mathrm{init}}$. We now discuss the properties of good components.

Claim 4.4. Let $T$ be a good component added to $Q$, containing an inner ring point from a cheap cluster $C_i^*$. (By Claim 4.3 we know at least one such $T$ exists.) Then: (a) all points in $T$ are at distance at most $\frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}$ from $c_i^*$, (b) $T \cup B(T)$ is fully contained in $C_i^*$, (c) the entire inner ring of $C_i^*$ is contained in $T \cup B(T)$, and (d) no other component $T' \in Q$, $T' \ne T$, contains an inner ring point from $C_i^*$.

Proof: As we do not know (d) in advance, it might be the case that $Q$ contains many good components, all containing an inner-ring point from the same cluster, $C_i^*$. Out of these (potentially many) components, let $T$ denote the first one inserted into $Q$. Denote the stage in which $T$ was inserted into $Q$ as $s$. Due to the previous claim, we know $s \ge |C_i^*|$, and so Lemma 4.2 applies to $T$. We show (a), (b), (c) and (d) hold for $T$, and deduce that $T$ is the only good component to contain an inner ring point from $C_i^*$.

Part (a) follows immediately from Lemma 4.2. We know $T$ contains some inner ring point $x$ from $C_i^*$, so $d(c_i^*, x) \le \frac{\beta\,\mathrm{OPT}}{8 |C_i^*|} < \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}$, and therefore any $y \in T$ must satisfy $d(c_i^*, y) < \frac{\beta\,\mathrm{OPT}}{2 |C_i^*|}$. Since we now know (a) holds and the instance is $\beta$-distributed, we have that $T \subset C_i^*$, so we only need to show $B(T) \subset C_i^*$. Fix any $y \in B(T)$. The point $y$ is assigned to $B(T)$ (thus removed from the instance) because there exists some point $x \in T$ such that $d(x, y) \le 2r$.
So again, we have that $d(c_i^*, y) \le d(c_i^*, x) + d(x, y) \le \frac{\beta\,\mathrm{OPT}}{|C_i^*|}$, which gives us that $y \in C_i^*$ (since the instance is $\beta$-distributed).

We now prove (c). Because of (b), we deduce that the number of points in $T$ is at most $|C_i^*|$. However, in order for $T$ to be added to $Q$, it must also hold that $|T| > s/2$. It follows that $s < 2|C_i^*|$. Let $x$ be an inner ring point of $C_i^*$ that belongs to $T$. Then the distance between any other inner ring point of $C_i^*$ and $x$ is at most $\frac{\beta\,\mathrm{OPT}}{4 |C_i^*|} < \frac{\beta\,\mathrm{OPT}}{2s} = 2r$. It follows that any inner ring point of $C_i^*$ which is not added to $T$ is assigned to $B(T)$. Thus $T \cup B(T)$ contains all inner-ring points. Finally, observe that (d) follows immediately from the definition of a good component and from (c).

We now show that in addition to having all good components, we cannot have too many bad components.

Claim 4.5. We have less than $16/(3\beta)$ bad components.

Proof: Let $T$ be a bad component, and let $s$ be the stage in which $T$ was inserted into $Q$. Let $y$ be any point in $T$, and let $C^*$ be the cluster to which $y$ belongs in the optimal clustering, with center $c^*$. We show $d(c^*, y) > \frac{3\beta\,\mathrm{OPT}}{8s}$. We divide into cases.

Case 1: $C^*$ is an expensive cluster. Note that we are working under the assumption that $Q_{\mathrm{init}}$ contains the correct centers of the expensive clusters. In particular, $Q_{\mathrm{init}}$ contains $c^*$. Also, the fact that point $y$ was not thrown out in stage $s$ implies that $d(c^*, y) > 2r = \frac{\beta\,\mathrm{OPT}}{2s} > \frac{3\beta\,\mathrm{OPT}}{8s}$.

Case 2: $C^*$ is a cheap cluster and $s \ge |C^*|$. We apply Lemma 4.2, and deduce that either $d(c^*, y) < \frac{\beta\,\mathrm{OPT}}{2 |C^*|}$ or

that $d(c^*, y) > \frac{3\beta\,\mathrm{OPT}}{4 |C^*|}$. As the inner ring of $C^*$ contains $> |C^*|/2$ points and $T$ contains $> s/2 \ge |C^*|/2$ many points, none of which is an inner ring point, some point $w \in T$ does not belong to $C^*$ and hence $d(c^*, w) > \frac{\beta\,\mathrm{OPT}}{|C^*|} > \frac{3\beta\,\mathrm{OPT}}{4 |C^*|}$. Part (b) of Lemma 4.2 assures us that all points in $T$ are also far from $c^*$, that is, $d(c^*, y) > \frac{3\beta\,\mathrm{OPT}}{4 |C^*|} \ge \frac{3\beta\,\mathrm{OPT}}{4s}$.

Case 3: $C^*$ is a cheap cluster and $s < |C^*|$. Using Claim 4.3 we have that some good component containing a point $x$ from the inner ring of $C^*$ was already added to $Q$. So it must hold that $d(x, y) > 2r$, for otherwise we would have removed $y$ from the instance and it could not be added to any $T$. We deduce that $d(c^*, y) \ge d(x, y) - d(c^*, x) \ge \frac{\beta\,\mathrm{OPT}}{2s} - \frac{\beta\,\mathrm{OPT}}{8 |C^*|} > \frac{3\beta\,\mathrm{OPT}}{8s}$.

All points in $T$ have distance $> \frac{3\beta\,\mathrm{OPT}}{8s}$ from their respective centers in the optimal clustering, and recall that $T$ is added to $Q$ because $T$ contains at least $s/2$ many points. Therefore, the contribution of all elements in $T$ to $\mathrm{OPT}$ is at least $\frac{3\beta}{16}\,\mathrm{OPT}$. It follows that we can have no more than $16/(3\beta)$ such bad components.

We can now prove the correctness of our algorithm.

Theorem 4.6. The algorithm outputs a $k$-clustering whose cost is no more than $(1 + \epsilon)\,\mathrm{OPT}$.

Proof: Using Claim 4.4, it follows that there exists some choice of $k$ components, $T_1, \ldots, T_k$, such that we have the center of every expensive cluster and the good component corresponding to every cheap cluster. Fix that choice. We show that for the optimal clustering, replacing the true centers $\{c_1^*, c_2^*, \ldots, c_k^*\}$ with the centers $\{c_1, c_2, \ldots, c_k\}$ that the algorithm outputs increases the cost by at most a $(1+\epsilon)$ factor. This implies that using $\{c_1, c_2, \ldots, c_k\}$ as centers must result in a clustering with cost at most $(1 + \epsilon)\,\mathrm{OPT}$.

Fix any $C_i^*$ in the optimal clustering. Let $\mathrm{OPT}_i$ be the cost of this cluster. If $C_i^*$ is an expensive cluster then we know that its center $c_i^*$ is present in the list of centers chosen. Hence, the cost paid by points in $C_i^*$ will be at most $\mathrm{OPT}_i$. If $C_i^*$ is a cheap cluster then denote by $T$ the good component corresponding to it.
We break the cost of $C_i^*$ into two parts:

$$\mathrm{OPT}_i = \sum_{x \in C_i^*} d(x, c_i^*) = \sum_{x \in T \cup B(T)} d(x, c_i^*) + \sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c_i^*)$$

and compare it to the cost of $C_i^*$ using $c_i$, the point picked by the algorithm to serve as center:

$$\sum_{x \in C_i^*} d(x, c_i) = \sum_{x \in T \cup B(T)} d(x, c_i) + \sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c_i).$$

Now, the first term is exactly the function that is minimized by $c_i$, as $c_i = \arg\min_p \sum_{x \in T \cup B(T)} d(x, p)$. We also know that $c_i^*$, the actual center of $C_i^*$, resides in the inner ring, and therefore, by Claim 4.4, must belong to $T \cup B(T)$. It follows that $\sum_{x \in T \cup B(T)} d(x, c_i) \le \sum_{x \in T \cup B(T)} d(x, c_i^*)$.

We now upper bound the second term, and show that

$$\sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c_i) \le (1 + \epsilon) \sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c_i^*).$$

Any point $x \in C_i^*$ such that $x \notin T \cup B(T)$ must reside outside the inner ring of $C_i^*$. Therefore, $d(x, c_i^*) > \frac{\beta\,\mathrm{OPT}}{8|C_i^*|}$. We show that $d(c_i, c_i^*) \le \epsilon\,\frac{\beta\,\mathrm{OPT}}{8|C_i^*|}$, and thus we have that $d(x, c_i) \le d(x, c_i^*) + d(c_i^*, c_i) \le (1 + \epsilon)\, d(x, c_i^*)$, which gives the required result.

Note that thus far, we have only used the fact that the per-point cost of any cheap cluster is proportional to $\beta\,\mathrm{OPT}/|C_i^*|$. Here is the first (and the only) time we use the fact that this cost is actually at most $(\epsilon/32)\,\beta\,\mathrm{OPT}/|C_i^*|$. Using the Markov inequality, we have that the set of points $\{x;\ d(x, c_i^*) \le \epsilon\beta\,\mathrm{OPT}/(16|C_i^*|)\}$ contains at least half of the points in $C_i^*$, and they all reside in the inner ring, thus belong to $T \cup B(T)$. Assume for the sake of contradiction that $d(c_i, c_i^*) > \epsilon\,\frac{\beta\,\mathrm{OPT}}{8|C_i^*|}$. Then at least half of the points in $C_i^*$ each contribute more than $\epsilon\,\frac{\beta\,\mathrm{OPT}}{16|C_i^*|}$ to the sum $\sum_{x \in T \cup B(T)} d(x, c_i)$. It follows that this sum is more than $\frac{|C_i^*|}{2} \cdot \frac{\epsilon\beta\,\mathrm{OPT}}{16|C_i^*|} = \frac{\epsilon\beta\,\mathrm{OPT}}{32} \ge \mathrm{OPT}_i$. However, $c_i$ is the point that minimizes the sum $\sum_{x \in T \cup B(T)} d(x, p)$, and by using $p = c_i^*$ we have $\sum_{x \in T \cup B(T)} d(x, p) \le \mathrm{OPT}_i$. Contradiction.

B. Runtime analysis

A naive implementation of the second step of the algorithm in Section 4-A takes $O(n^3)$ time (for every $s$ and every point $x$, find how many of the remaining points fall within the ball of radius $r$ around it). Finding $c_i$ for all components takes $O(n^2)$ time, and measuring the cost of the solution using a particular set of $k$ data points as centers takes $O(nk)$ time.
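As an illustration of the last two costs, the following minimal Python sketch picks the best center of a point set by brute force and evaluates the k-median cost of a given set of centers. The names `best_center` and `k_median_cost`, the representation of the metric as a function `d`, and the restriction of candidate centers to the members of the set are assumptions of this sketch, not part of the algorithm's description.

```python
from typing import Callable, Iterable, Sequence, TypeVar

P = TypeVar("P")


def best_center(members: Sequence[P], d: Callable[[P, P], float]) -> P:
    """Return the member p that minimizes the sum of d(x, p) over x in members.

    Assumption of this sketch: candidate centers are the members themselves.
    For a set of m points this costs O(m^2) distance evaluations.
    """
    return min(members, key=lambda p: sum(d(x, p) for x in members))


def k_median_cost(points: Iterable[P], centers: Sequence[P],
                  d: Callable[[P, P], float]) -> float:
    """Sum, over all points, of the distance to the nearest center: O(nk)."""
    return sum(min(d(x, c) for c in centers) for x in points)


if __name__ == "__main__":
    def line_metric(a: float, b: float) -> float:
        # Toy metric on the real line, used only for this demonstration.
        return abs(a - b)

    component = [0, 1, 2, 10]  # stands in for the set T ∪ B(T)
    center = best_center(component, line_metric)
    print(center, k_median_cost(component, [center, 10], line_metric))
```

Since the sets $T \cup B(T)$ of different components are disjoint, running `best_center` on each of them stays within the $O(n^2)$ total stated above.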
Guessing the right $k$ components takes $k^{O(1/\beta)}$ time. Overall, the running time of the algorithm in Figure 1 is $O(n^3 k^{O(1/\beta)})$. The general algorithm, which brute-force guesses the centers of all expensive clusters, makes $n^{O(1/\beta\epsilon)}$ iterations of the given algorithm, so its overall running time is $n^{O(1/\beta\epsilon)} k^{O(1/\beta)}$.

5. A PTAS FOR ANY β-DISTRIBUTED EUCLIDEAN k-MEANS INSTANCE

Analogous to the k-median algorithm, we present an essentially identical algorithm for k-means in Euclidean space. Indeed, the fact that k-means considers distances squared makes upper (or lower) bounding distances a bit more complicated, and requires that we fiddle with the parameters of the algorithm. In addition, the centers $c_i^*$ may not be data points. However, the overall approach remains the same. Roughly speaking, converting the k-median algorithm to the k-means case, we use the same constants, only squared.⁵ As before, we handle expensive clusters by guessing good substitutes for their centers and obtain good components for cheap clusters.

Often, when considering the Euclidean space k-means problem, the dimension of the space plays an important factor. In contrast, here we make no assumptions about the dimension, and our results hold for any $\mathrm{poly}(n)$ dimension. In fact, for ease of exposition, we assume all distances between any two points were computed in advance and are given to our algorithm. Clearly, this only adds $O(n^2 \cdot \dim)$ to our runtime.

⁵ We stress that we made no attempt to optimize the constants.

In addition to the change in parameters, we utilize the following facts that hold for the center of mass in Euclidean space.

Fact 5.1. Let $U$ be a (finite) set of points in a Euclidean space, and let $\mu_U$ denote their center of mass ($\mu_U = \frac{1}{|U|}\sum_{x \in U} x$). Let $A$ be a random subset of $U$, and denote by $\mu_A$ the center of mass of $A$. Then for any $\delta < 1/2$, we have both

$$\Pr\left[\|\mu_U - \mu_A\|^2 > \frac{1}{\delta|A|} \cdot \frac{1}{|U|}\sum_{x \in U}\|x - \mu_U\|^2\right] < \delta \qquad (1)$$

$$\Pr\left[\sum_{x \in U}\|x - \mu_A\|^2 > \left(1 + \frac{1}{\delta|A|}\right)\sum_{x \in U}\|x - \mu_U\|^2\right] < \delta$$

Fact 5.2. Let $U$ be a (finite) set of points in a Euclidean space, and let $A$ and $B$ be a partition of $U$. Denote by $\mu_U$ and $\mu_A$ the center of mass of $U$ and $A$ respectively. Then

$$\|\mu_U - \mu_A\|^2 \le \frac{1}{|U|}\sum_{x \in U}\|x - \mu_U\|^2 \cdot \frac{|B|}{|A|}.$$

Fact 5.2, proven in [18] (Lemma 2.2), allows us to upper bound the distance between the real center of a cluster and the empirical center we get by averaging all points in $T \cup B(T)$ for a good component $T$. Fact 5.1 allows us to handle expensive clusters. Since we cannot brute-force guess a center (as the centers of the clusters are not necessarily data points), we guess a sample of $O(\beta^{-1} + \epsilon^{-1})$ points from every expensive cluster, and use their average as a center. Both properties of Fact 5.1, proven in [13] (§3, Lemmas 1 and 2), assure us that the center is an adequate substitute for the real center and is also close to it. This motivates the approach behind our first algorithm, in which we brute-force traverse all choices of $O(\epsilon^{-1} + \beta^{-1})$ points for any of the expensive clusters.

The second algorithm, whose runtime is $(k \log n)^{\mathrm{poly}(1/\epsilon, 1/\beta)}\, O(n^3)$, replaces brute-force guessing with random sampling. Indeed, if a cluster contains a $\mathrm{poly}(1/k)$ fraction of the points, then by randomly sampling $O(\epsilon^{-1} + \beta^{-1})$ points, the probability that all points belong to the same expensive cluster and, furthermore, that their average can serve as a good empirical center, is at least $1/k^{\mathrm{poly}(1/\epsilon, 1/\beta)}$. In contrast, if we have expensive clusters that contain few points (e.g. an expensive cluster of size $\sqrt{n}$, while $k = \mathrm{poly}(\log n)$), then random sampling is unlikely to find good empirical centers for them.
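The inequality of Fact 5.2 can be checked numerically on a small example. The Python sketch below computes both sides of the inequality for a given partition of $U$ into $A$ and $B$; the helper names `centroid`, `sq_dist` and `fact_5_2_sides` are hypothetical and introduced only for this illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Center of mass of a non-empty list of points of equal dimension."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fact_5_2_sides(A: Sequence[Point], B: Sequence[Point]) -> Tuple[float, float]:
    """Return (lhs, rhs) of Fact 5.2 for the partition U = A ∪ B.

    lhs = ||mu_U - mu_A||^2
    rhs = (1/|U|) * sum_{x in U} ||x - mu_U||^2 * |B| / |A|
    """
    U = list(A) + list(B)
    mu_U, mu_A = centroid(U), centroid(A)
    lhs = sq_dist(mu_U, mu_A)
    avg_sq = sum(sq_dist(x, mu_U) for x in U) / len(U)
    rhs = avg_sq * len(B) / len(A)
    return lhs, rhs


if __name__ == "__main__":
    A = [(0.0, 0.0), (2.0, 0.0)]
    B = [(10.0, 0.0)]
    lhs, rhs = fact_5_2_sides(A, B)
    print(lhs, rhs, lhs <= rhs)
```

In this example the single far point in $B$ pulls $\mu_U$ away from $\mu_A$, yet the left-hand side stays below the right-hand side, as the fact asserts; the bound becomes tighter as $|B|/|A|$ shrinks.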
However, recall that our algorithm collects points and deletes them from our instance. So, it is possible that in the middle of the run we are left with so few points that expensive clusters whose size is small in comparison to the original number of points contain a $\mathrm{poly}(1/k)$ fraction of the remaining points. Indeed, this is the motivation behind our second algorithm. We run the algorithm while interleaving the Population Stage of the algorithm with random sampling. Instead of running $s$ from $n$ to 1, we use

$$\left\{n, \frac{n}{k^2}, \frac{n}{k^4}, \frac{n}{k^6}, \ldots, 1\right\} \qquad (2)$$

as break points. Correspondingly, we define $l_i$ to be the number of expensive clusters whose size is in the range $\left[\frac{n}{k^{2i+2}}, \frac{n}{k^{2i}}\right)$. Whenever $s$ reaches such a break point $\frac{n}{k^{2i}}$, we randomly sample points in order to guess the $l_{i+3}$ centers of the clusters that lie 3 intervals ahead (and so, initially, we guess all centers in the first 3 intervals). We prove that in every interval we are likely to sample good empirical centers. This is a simple corollary of Fact 5.2 along with the following two claims. First, we claim that at the end of each interval, the number of points remaining is at most $\frac{n}{k^{2i+1}}$. Secondly, we also claim that in each interval we do not remove even a single point from a cluster whose size is smaller than $\frac{n}{k^{2i+6}}$. We refer the reader to Appendix A for the algorithms and their analysis.

6. DISCUSSION AND OPEN PROBLEMS

The algorithm we present here for k-median has runtime of poly( 1/β, 1/ε, ), and the algorithm for k-means has runtime $\mathrm{poly}(n, (k \log n)^{1/\epsilon}, (k \log n)^{1/\beta})$.⁶ We comment that it is unlikely that we can obtain an algorithm of runtime poly( 1/ε, 1/β, ). Observe that for any clustering instance and any $k > 1$ we have that $\mathrm{OPT}_{(k-1)} > \left(1 + \frac{1}{n}\right)\mathrm{OPT}_{(k)}$, simply by considering the $k$-clustering that results from taking the optimal $(k-1)$-clustering and setting the point which is the furthest from its center in a cluster of its own (as a new center). Hence, any k-median/k-means instance is β-distributed for $\beta = \Omega(1/n)$. Recall from Section 3-D that the k-median problem restricted only to weakly-stable instances has no FPTAS.
So the fact that our algorithm's runtime has super-polynomial dependence in both $1/\beta$ and $1/\epsilon$ is unavoidable. Nonetheless, one might still hope to do better. In particular, one major runtime expense of our algorithm comes from handling expensive clusters by brute-force guessing or sampling. Can one improve the runtime by doing something more clever for expensive clusters? It is worth noting that for the stability conditions of [4], Voevodski et al. [20] develop an especially efficient implementation with good performance (in terms of both accuracy and speed) on real-world protein sequence datasets.

A different open problem lies in the relation to results of Ostrovsky et al. [18]. Their motivating question was to analyze the performance of Lloyd-type methods over stable instances. Is it possible that weak deletion-stability is sufficient for some version of the k-means heuristic to converge to the optimal clustering?

⁶ When dealing with k-means in a Euclidean space of dimension dim, we need to explicitly compute the distances, so we add $n^2 \cdot \dim$ to the runtime.

Acknowledgements: This work was supported in part by the National Science Foundation under grant CCF

REFERENCES

[1] Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In STOC.
[2] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristic for k-median and facility location problems. In STOC.
[3] Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In STOC.
[4] Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Approximate clustering without the approximation. In SODA.
[5] Maria-Florina Balcan and Mark Braverman. Finding low error clusterings. In COLT.
[6] Maria-Florina Balcan, Heiko Röglin, and Shang-Hua Teng. Agnostic clustering. In ALT.
[7] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. In STOC.
[8] Sanjoy Dasgupta. The hardness of k-means clustering. Technical report, University of California at San Diego.
[9] W. Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In STOC.
[10] Michelle Effros and Leonard J. Schulman. Deterministic clustering with data nets. ECCC, (050).
[11] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. In Journal of Algorithms.
[12] Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In STOC.
[13] Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In Proc. 10th Symp. Comp. Geom.
[14] Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems (extended abstract). In STOC.
[15] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. In Proc. 18th Symp. Comp. Geom.
[16] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In FOCS.
[17] R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric k-clustering. In FOCS.
[18] Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In FOCS.
[19] F. Schalekamp, M. Yu, and A. van Zuylen. Clustering with or without the approximation. In COCOON.
[20] Konstantin Voevodski, Maria Florina Balcan, Heiko Röglin, Shang-Hua Teng, and Yu Xia. Efficient clustering with limited distance information. In Proc. 26th UAI.

APPENDIX

We present the algorithm for a $(1 + \epsilon)$-approximation to the k-means optimum of a β-distributed instance. Much like in Section 4, we call a cluster in the optimal k-means solution cheap if $\mathrm{OPT}_i = \sum_{x \in C_i^*} d^2(x, c_i^*) \le \frac{\beta\epsilon}{4^6}\,\mathrm{OPT}$.

A. Clustering β-distributed Instances of Euclidean k-means

The algorithm is presented in Figure 2. The correctness is proved in a similar fashion to the proof of correctness presented in Section 4. First, observe that by the Markov inequality, for any cheap cluster $C_i^*$, the set $\left\{x;\ d^2(x, c_i^*) > t\,\frac{\beta\,\mathrm{OPT}}{|C_i^*|}\right\}$ cannot contain more than an $\epsilon/(4^6 t)$ fraction of the points in $C_i^*$. It follows that the inner ring of $C_i^*$, the set $\left\{x;\ d^2(x, c_i^*) \le \frac{\beta\,\mathrm{OPT}}{256|C_i^*|}\right\}$, contains at least half of the points of $C_i^*$. As mentioned in Section 5, the algorithm populates the list $Q$ with good components corresponding to cheap clusters. Also from Section 5, we know that for every expensive cluster there exists a sample of $O(\frac{1}{\beta} + \frac{1}{\epsilon})$ data points whose center is a good substitute for the center of the cluster. In the analysis below, we assume that $Q$ has been initialized correctly, with $Q_{\mathrm{init}}$ containing these good substitutes. In general, the algorithm will be run multiple times, for all possible guesses of samples from expensive clusters.

We start with the following lemma, which is similar to Lemma 4.2.

Lemma A.1. Let $T \in Q$ be any component and let $s$ be the stage in which we insert $T$ to $Q$. Let $C_i^*$ be any cheap cluster such that $s \ge |C_i^*|$. Then (a) $T$ does not contain any point $z$ such that the distance $d^2(c_i^*, z)$ lies within the range $\left[\frac{\beta\,\mathrm{OPT}}{16|C_i^*|}, \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}\right]$, and (b) $T$ cannot contain both a point $p_1$ such that $d^2(c_i^*, p_1) \le \frac{\beta\,\mathrm{OPT}}{16|C_i^*|}$ and a point $p_2$ such that $d^2(c_i^*, p_2) > \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$.
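Proof: Assume (a) does not hold. Let $z$ be such a point, and let $B(z, r)$ be the set of all points $p$ such that $d^2(z, p) \le r = \frac{\beta\,\mathrm{OPT}}{64s} \le \frac{\beta\,\mathrm{OPT}}{64|C_i^*|}$. As $d^2(z, c_i^*) \ge \frac{\beta\,\mathrm{OPT}}{16|C_i^*|}$, we have that $d(z, p) \le \frac{1}{2} d(z, c_i^*)$. It follows that $d^2(c_i^*, p) \ge (d(c_i^*, z) - d(z, p))^2 \ge (d(c_i^*, z)/2)^2 \ge \frac{1}{4} \cdot \frac{\beta\,\mathrm{OPT}}{16|C_i^*|} = \frac{\beta\,\mathrm{OPT}}{64|C_i^*|}$. Similarly, $d^2(c_i^*, p) \le (d(c_i^*, z) + d(z, p))^2 \le (3d(c_i^*, z)/2)^2 \le \frac{9\beta\,\mathrm{OPT}}{16|C_i^*|}$. Thus $B(z, r)$ is contained in $C_i^*$, but falls outside the inner ring of $C_i^*$, yet contains more than $s/2 \ge |C_i^*|/2$ many points. Contradiction.

Assume (b) does not hold. Let $p_1$ and $p_2$ be the above-mentioned points. As $T$ is a connected component, it follows that along the path from $p_1$ to $p_2$ there exists a pair of neighboring nodes $x, y$ such that $d^2(x, y) \le r \le \frac{\beta\,\mathrm{OPT}}{64|C_i^*|}$, yet $d^2(c_i^*, x) \le \frac{\beta\,\mathrm{OPT}}{16|C_i^*|}$ while $d^2(c_i^*, y) \ge \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$. However, a simple computation gives that $d^2(c_i^*, y) \le (d(c_i^*, x) + d(x, y))^2 \le \frac{9\beta\,\mathrm{OPT}}{64|C_i^*|}$. Contradiction.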
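The Population Stage of Figure 2, to which the proof above refers, can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not fixed by the text: the value `opt` is taken to be a known (or guessed) value of OPT, the balls $B(x, r)$ are counted over the points still remaining in the instance, large components are handled one at a time within a stage, and `q_init` is a list of already chosen center points. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _large_component(points: Sequence[Point], dense: List[int],
                     r: float, s: int) -> Optional[Set[int]]:
    """Return one connected component of size > s/2 among `dense`, if any.

    Two dense points are adjacent when their squared distance is at most r.
    """
    unseen = set(dense)
    for start in dense:
        if start not in unseen:
            continue
        unseen.discard(start)
        comp, stack = {start}, [start]
        while stack:  # depth-first search over dense points
            a = stack.pop()
            nbrs = [b for b in unseen if sq_dist(points[a], points[b]) <= r]
            for b in nbrs:
                unseen.discard(b)
                comp.add(b)
                stack.append(b)
        if len(comp) > s / 2:
            return comp
    return None


def population_stage(points: Sequence[Point], beta: float, opt: float,
                     q_init: Sequence[Point] = ()) -> List[Tuple[List[int], List[int]]]:
    """Return a list of (T, T ∪ B(T)) pairs, as sorted lists of point indices."""
    alive = set(range(len(points)))
    q_points = list(q_init)  # every point currently represented in Q
    found = []
    for s in range(len(points), 0, -1):
        r = beta * opt / (64 * s)                      # step (a)
        while alive:
            # Step (b): drop points whose squared distance to Q is below 4r.
            alive = {i for i in alive
                     if all(sq_dist(points[i], y) >= 4 * r for y in q_points)}
            # Step (c): a point is dense if its r-ball holds more than s/2 points.
            dense = [i for i in sorted(alive)
                     if sum(sq_dist(points[i], points[j]) <= r for j in alive) > s / 2]
            comp = _large_component(points, dense, r, s)
            if comp is None:
                break
            # Step (d): record T and remove B(T); note that T is a subset of B(T).
            ball = {i for i in alive
                    if any(sq_dist(points[i], points[j]) <= 4 * r for j in comp)}
            found.append((sorted(comp), sorted(ball)))
            q_points.extend(points[j] for j in comp)
            alive -= ball
    return found
```

For two well-separated groups such as `[(0,), (1,), (2,), (100,), (101,), (102,)]` with `beta=1000` and `opt=4`, the sketch returns one component per group. The Centers-Retrieving Stage would then average the points of each returned set $T \cup B(T)$ to obtain the candidate centers.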

1) Initialization Stage: Set $Q \leftarrow Q_{\mathrm{init}}$.
2) Population Stage: For $s = n, n-1, n-2, \ldots, 1$ do:
   a) Set $r = \frac{\beta\,\mathrm{OPT}}{64s}$.
   b) Remove any point $x$ such that $d^2(x, Q) < 4r$. (Here, $d(x, Q) = \min_{T \in Q;\ y \in T} d(x, y)$.)
   c) For any remaining data point $x$, denote the set of data points whose distance squared from $x$ is at most $r$ by $B(x, r)$. Connect any two remaining points $a$ and $b$ if: (i) $d^2(a, b) \le r$, (ii) $|B(a, r)| > \frac{s}{2}$ and (iii) $|B(b, r)| > \frac{s}{2}$.
   d) Let $T$ be a connected component of size $> \frac{s}{2}$. Then:
      i) Add $T$ to $Q$. (That is, $Q \leftarrow Q \cup \{T\}$.)
      ii) Define the set $B(T) = \{x : d^2(x, y) \le 4r \text{ for some } y \in T\}$. Remove the points of $B(T)$ from the instance.
3) Centers-Retrieving Stage: For any choice of $k$ components $T_1, T_2, \ldots, T_k$ out of $Q$:
   a) Find the best center $c_i$ for $T_i \cup B(T_i)$. That is, $c_i = \mu(T_i \cup B(T_i)) = \frac{1}{|T_i \cup B(T_i)|}\sum_{x \in T_i \cup B(T_i)} x$.
   b) Partition all points according to the nearest point among the $k$ centers of the current $k$ components.
   c) If a clustering of cost at most $(1+\epsilon)\mathrm{OPT}$ is found, output these $k$ centers and halt.

Figure 2. A PTAS for β-distributed instances of Euclidean k-means.

Lemma A.1 allows us to give the analogous claims to Claims 4.3 and 4.4. As before, call a component $T$ good if it is contained within some target cluster $C_i^*$ and $T \cup B(T)$ contains all of the inner ring points of $C_i^*$. Otherwise, the component is called bad, provided it is not one of the initial centers present in $Q_{\mathrm{init}}$. We now show that each cheap target cluster will have a single, unique, good component.

Claim A.2. Let $C_i^*$ be any cheap cluster in the target clustering. By stage $s = |C_i^*|$, the algorithm adds to $Q$ a component $T$ that contains a point from the inner ring of $C_i^*$.

Claim A.3. Let $T$ be a good connected component added to $Q$, containing an inner ring point from cluster $C_i^*$. Then: (a) all points in $T$ are of distance squared at most $\frac{\beta\,\mathrm{OPT}}{16|C_i^*|}$ from $c_i^*$, (b) $T \cup B(T)$ is fully contained in $C_i^*$, (c) the entire inner ring of $C_i^*$ is contained in $T \cup B(T)$, and (d) no other component $T' \ne T$ in $Q$ contains an inner ring point from $C_i^*$.

As the proofs of Claims A.2 and A.3 are identical to the proofs of Claims 4.3 and 4.4, we omit them.

Lemma A.4. We do not add to $Q$ more than $1000/\beta$ bad components.
Proof: Consider any bad component $T$ that we add to $Q$, and denote the stage in which we insert $T$ to $Q$ as $s$. So the size of this component is $> \frac{s}{2}$. Let $y$ be an arbitrary point from $T$ which belongs to cluster $C$ in the optimal clustering. Let $c$ be the center of $C$. We show that $d^2(c, y) > \frac{\beta\,\mathrm{OPT}}{500s}$. We divide into cases.

Case 1: $C$ is a cheap cluster and $s \ge |C|$. Recall that $T$ must contain more than $s/2 \ge |C|/2$ points, so it follows that $T$ contains some point $x$ that does not belong to $C$. β-stability gives that this point has distance $d^2(c, x) > \frac{\beta\,\mathrm{OPT}}{|C|}$, and we apply Lemma A.1 to deduce that all points in $T$ are of distance squared at least $\frac{\beta\,\mathrm{OPT}}{4|C|}$ from $c$.

Case 2: $C$ is a cheap cluster and $s < |C|$. In this case we have that the entire inner ring of $C$ already belongs to some $T' \in Q$. Let $x \in T'$ be any inner ring point from $C$. We have that $d^2(c, x) \le \frac{\beta\,\mathrm{OPT}}{256|C|} \le \frac{\beta\,\mathrm{OPT}}{256s}$, while $d^2(x, y) > \frac{\beta\,\mathrm{OPT}}{16s}$. It follows that $d^2(c, y) \ge (3d(x, y)/4)^2 > \frac{\beta\,\mathrm{OPT}}{500s}$.

Case 3: $C$ is an expensive cluster and $s > 2|C|$. We claim that $d^2(c, y) > \frac{\beta\,\mathrm{OPT}}{32|C|}$. If, by contradiction, we have that $d^2(c, y) \le \frac{\beta\,\mathrm{OPT}}{32|C|}$, then we show that the ball $B(y, r)$ contains only points from $C$, yet it must contain more than $s/2 > |C|$ points. This is because each $p \in B(y, r)$ satisfies $d^2(c, p) \le (d(c, y) + d(y, p))^2 \le \left(\sqrt{\frac{\beta\,\mathrm{OPT}}{32|C|}} + \sqrt{\frac{\beta\,\mathrm{OPT}}{16s}}\right)^2 < \frac{\beta\,\mathrm{OPT}}{|C|}$.

Case 4: $C$ is an expensive cluster and $s \le 2|C|$. In this case, from Fact 5.1 we know that $Q_{\mathrm{init}}$ contains a good empirical center $c'$ for the expensive cluster $C$, in the sense that $\|c - c'\|^2 \le \frac{\beta\,\mathrm{OPT}}{512|C|} \le \frac{\beta\,\mathrm{OPT}}{256s}$. Then, similarly to Case 2 above, we have $d^2(y, c) \ge (d(y, c') - d(c, c'))^2 > \frac{\beta\,\mathrm{OPT}}{500s}$.

It follows that every point in $T$ has a large distance from its center. Therefore, the more than $s/2$ points in this component contribute at least $\beta\,\mathrm{OPT}/1000$ to the k-means cost. Hence, we can have no more than $1000/\beta$ such bad components.

We now prove the main theorem.

Theorem A.5. The algorithm outputs a $k$-clustering whose cost is at most $(1 + \epsilon)\mathrm{OPT}$.

Proof: Using Claim A.3, it follows that there exists some choice of $k$ components which has good components for all the cheap clusters and good substitutes for the centers of the expensive clusters.
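Fix that choice and consider a cluster $C_i^*$ with center $c_i^*$. If $C_i^*$ is an expensive cluster then from Section 5 we know that $Q_{\mathrm{init}}$ contains a point $c_i'$ such that $d^2(c_i^*, c_i') \le \frac{\beta\epsilon}{\beta+\epsilon} \cdot \frac{\mathrm{OPT}_i}{|C_i^*|}$. Hence, the cost paid by the points in $C_i^*$ will be at most $(1 + \epsilon)\mathrm{OPT}_i$. If $C_i^*$ is a cheap cluster then denote by $T$ the good component that resides within $C_i^*$. Denote $T \cup B(T)$ by $A$, and $C_i^* \setminus A$ by $B$. Let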


More information
