Stability yields a PTAS for k-median and k-means Clustering
Pranjal Awasthi, Carnegie Mellon University, pawasthi@cs.cmu.edu
Avrim Blum, Carnegie Mellon University, avrim@cs.cmu.edu
Or Sheffet, Carnegie Mellon University, osheffet@cs.cmu.edu

Abstract

We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. [18] show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1+f(\epsilon))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\alpha$ for some constant $\alpha > 0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon > 0$ we achieve a $(1+\epsilon)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption. For $k$-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)} (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\alpha)}$.

Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all $(1+\alpha)$ approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha > 0$ is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target.
From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median we improve the largeness condition needed in [4] to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.

1. INTRODUCTION

Clustering is a well-studied task, arising in numerous areas from computer vision to computational biology to distributed computing. Generally speaking, the goal of clustering is to partition $n$ given data objects into $k$ groups that share some commonality. Operationally, clustering is often performed by viewing the data as points in a metric space and then optimizing some natural objective over them. In this paper, we consider two popular such objectives, $k$-median and $k$-means. Both measure a $k$-partition by choosing a special point for each cluster, called the center, and define the cost of a clustering as a function of the distances between the data points and their respective centers. In the $k$-median case, the cost is the sum of the distances of the points to their centers, and in the $k$-means case, the cost is the sum of these distances squared. The $k$-median objective is typically studied for data in a finite metric (complete weighted graph satisfying triangle inequality) over the $n$ data points; $k$-means clustering is typically studied for $n$ points in a (finite dimensional) Euclidean space.

Both objectives are known to be NP-hard (we view $k$ as part of the input and not a constant, though even the 2-means problem in Euclidean space was recently shown to be NP-hard [8]). For $k$-median in a finite metric, there is a known $(1+1/e)$-hardness of approximation result [14] and substantial work on approximation algorithms [11], [7], [2], [14], [9], with the best guarantee a $3+\epsilon$ approximation. For $k$-means in a Euclidean space, there is also a vast literature of approximation algorithms [17], [3], [9], [10], [12], [15] with the best guarantee a constant-factor approximation if polynomial dependence on $k$ and the dimension $d$ is desired.¹

Ostrovsky et al.
[18] proposed an interesting condition under which one can achieve better $k$-means approximations in time polynomial in $n$ and $k$. They consider $k$-means instances where the optimal $k$-clustering has cost noticeably smaller than the cost of any $(k-1)$-clustering, motivated by the idea that if a near-optimal $k$-clustering can be achieved by a partition into fewer than $k$ clusters, then that smaller value of $k$ should be used to cluster the data [18]. Under the assumption that the ratio of the cost of the optimal $(k-1)$-means clustering to the cost of the optimal $k$-means clustering is at least $\max\{100, 1/\epsilon^2\}$, Ostrovsky et al. show that one can obtain a $(1+f(\epsilon))$-approximation for $k$-means in time polynomial in $n$ and $k$, by using a variant of Lloyd's algorithm.

In this paper, we substantially improve on this approximation guarantee. We show that under the much weaker assumption that the ratio of these costs is just at least $(1+\alpha)$ for some constant $\alpha > 0$, we can achieve a PTAS: namely, $(1+\epsilon)$-approximate the $k$-means optimum, for any constant $\epsilon > 0$. Our approximation scheme runs in time which is $\mathrm{poly}(n, k)$ and exponential only in $1/\epsilon$ and $1/\alpha$. Thus, we decouple the strength of the assumption from the quality of the conclusion, and in the process allow the assumption to be substantially weaker. For $k$-means clustering we in addition give a randomized algorithm with improved running time $n^{O(1)} (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\alpha)}$.

¹ If $k$ is constant, then $k$-median in finite metrics can be trivially solved in polynomial time and there is a PTAS known for $k$-means (and $k$-median) in Euclidean space [16]. There is also a PTAS known for low-dimensional Euclidean spaces (dimension at most $\log \log n$) [1], [12].

Balcan et al. [4], motivated by the fact that objective functions are often just a proxy for the underlying goal
of getting the data clustered correctly, propose clustering instances that satisfy the condition that all $(1+\alpha)$ approximations to the given objective (e.g., $k$-median or $k$-means) are $\delta$-close, in terms of how points are partitioned, to a target clustering (such as a correct clustering of proteins by function or a correct clustering of images by who is in them). This can be viewed as an assumption made implicitly when considering approximation algorithms for problems of this nature where the true goal is to get close to the target. Balcan et al. show that for any $\alpha$ and $\delta$, given an instance satisfying this property for $k$-median or $k$-means objectives, one can in fact efficiently produce a clustering that is $O(\delta/\alpha)$-close to the target clustering (so, $O(\delta)$-close for any constant $\alpha > 0$), even though obtaining a $1+\alpha$ approximation to the objective is NP-hard for $\alpha < 1/e$, and remains hard even under this assumption. Thus they show that one can approximate the target even though it is hard to approximate the objective.

One interesting question that has remained is the approximability of the objectives when all target clusters are large compared to $\delta n$, since the hardness of approximation requires allowing small clusters.² Here, we show that for both $k$-median and $k$-means objectives, if all clusters contain more than $\delta n$ points, then for any constant $\alpha > 0$ we can in fact get a PTAS. Thus, we (nearly) resolve the approximability of these objectives under this condition. Note that under this condition, this further implies finding a $\delta$-close clustering (setting $\epsilon = \alpha$). Thus, we also extend the results of Balcan et al. [4] in the case of large clusters and constant $\alpha$ by getting exactly $\delta$-close for both $k$-median and $k$-means objectives. (In [4] this exact closeness was achieved for the $k$-median objective but needed a somewhat larger $O(\delta n(1 + 1/\alpha))$ minimum cluster size requirement.)

Our algorithmic results are achieved by examining implications of a property we call weak deletion-stability that is implied by both the separation condition of Ostrovsky et al.
[18] as well as (when target clusters are large) the stability condition of Balcan et al. [4]. In particular, an instance of $k$-median/$k$-means clustering satisfies weak deletion-stability if in the optimal solution, deleting any of the centers $c_i^*$ and assigning all points in cluster $i$ instead to one of the remaining $k-1$ centers $c_j^*$, results in an increase in the $k$-median/$k$-means cost by an (arbitrarily small) constant factor. We also show that weak deletion-stability still allows for NP-hard instances and that no FPTAS is possible as well (unless P = NP). Thus, our algorithm, whose running time is $(kn)^{\mathrm{poly}(1/\epsilon, 1/\beta)}$, is optimal in the sense that the superpolynomial dependence on $1/\epsilon$ and $1/\beta$ is unavoidable.

After presenting notation and preliminaries in Section 2, in Section 3 we introduce weak deletion-stability and relate it to the stability notions of [18] and [4]. We then define another property of a clustering being $\beta$-distributed which, while not so intuitive, we show is implied by weak deletion-stability and will be the actual condition that our algorithms will use. We then go on to prove that being $\beta$-distributed suffices to give a PTAS for $k$-median in Section 4. We extend the algorithm to $k$-means clustering in Section 5, where we also introduce a randomized version whose run-time is bounded by $n^3 (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\beta)}$. We conclude with discussion and open problems in Section 6.

² In fact, as shown in [19], the $k$-median algorithm in [4] for the case that clusters are sufficiently large compared to $\delta n(1 + 1/\alpha)$ achieves a better constant-factor approximation. Note that $\delta$ need not be a constant.

2. NOTATION AND PRELIMINARIES

We are given a set $S$ of $n$ points. When discussing $k$-median, we assume the $n$ points reside in a finite metric space, and when discussing $k$-means, we assume they all reside in a finite dimensional Euclidean space. We denote $d : S \times S \to \mathbb{R}_{\ge 0}$ as the distance function. A solution to the $k$-median objective partitions the $n$ points into $k$ disjoint subsets, $C_1, C_2, \ldots, C_k$, and assigns a center $c_i$ for each subset. The $k$-median cost of this partition is then measured by $\sum_{i=1}^{k} \sum_{x \in C_i} d(x, c_i)$.
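As a small illustration of the $k$-median cost just defined, the following Python sketch evaluates it for a given partition and choice of centers. The function name `k_median_cost`, the toy points on a line, and the absolute-difference metric are assumptions made for this example and do not come from the paper.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # type of a data point (hypothetical; any hashable or numeric type)


def k_median_cost(
    clusters: Sequence[Sequence[P]],
    centers: Sequence[P],
    dist: Callable[[P, P], float],
) -> float:
    """Sum, over all clusters, of the distances from each point to its cluster's center."""
    if len(clusters) != len(centers):
        raise ValueError("need exactly one center per cluster")
    return sum(dist(x, c) for cluster, c in zip(clusters, centers) for x in cluster)


# Toy instance (assumed): points on the real line with d(a, b) = |a - b|.
line_metric = lambda a, b: abs(a - b)
example_cost = k_median_cost([[0, 1, 2], [10, 11]], [1, 10], line_metric)
```

Here `example_cost` adds up 1 + 0 + 1 for the first cluster and 0 + 1 for the second.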
A solution to the $k$-means objective again gives a $k$-partition of the $n$ data points, but now we may assume it uses the center of mass, $\mu_{C_i} = \frac{1}{|C_i|} \sum_{x \in C_i} x$, as the center of the cluster $C_i$. We then measure the $k$-means cost of this clustering by $\sum_{i=1}^{k} \sum_{x \in C_i} d^2(x, \mu_{C_i}) = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_{C_i}\|^2$.

The optimal clustering (w.r.t. either the $k$-median or the $k$-means objective) is denoted as $C^* = \{C_1^*, C_2^*, \ldots, C_k^*\}$, and its cost is denoted as $\mathrm{OPT}$. The centers used in the optimal clustering are denoted as $\{c_1^*, c_2^*, \ldots, c_k^*\}$. Clearly, given the optimal clustering, we can find the optimal centers (either by brute-force checking all possible points for $k$-median, or by $c_i^* = \mu_{C_i^*}$ for $k$-means). Alternatively, given the optimal centers, we can assign each $x$ to its nearest center, thus obtaining the optimal clustering. Thus, we use $C^*$ to denote both the optimal $k$-partition, and the optimal list of $k$ centers. We use $\mathrm{OPT}_i$ to denote the contribution of the cluster $i$ to $\mathrm{OPT}$, that is $\mathrm{OPT}_i = \sum_{x \in C_i^*} d(x, c_i^*)$ in the $k$-median case, or $\mathrm{OPT}_i = \sum_{x \in C_i^*} d^2(x, c_i^*)$ in the $k$-means case.

3. STABILITY PROPERTIES

As mentioned above, our results are achieved by exploiting implications of a stability condition we call weak deletion-stability, and in particular an implication we call being $\beta$-distributed. In this section we define weak deletion-stability and being $\beta$-distributed, relate weak deletion-stability to conditions of Ostrovsky et al. [18] and Balcan et al. [4], and show that weak deletion-stability implies the clustering is $\beta$-distributed. In Sections 4 and 5 we use the property of being $\beta$-distributed to obtain a PTAS.³

³ Technically, we could skip the middleman of weak deletion-stability and just define the property of being $\beta$-distributed as our main stability notion, but weak deletion-stability is a more intuitive condition.

Definition 3.1. For $\alpha > 0$, a $k$-median/$k$-means instance satisfies $(1+\alpha)$ weak deletion-stability, if it has the following property. Let $\{c_1^*, c_2^*, \ldots, c_k^*\}$ denote the centers in the
optimal $k$-median/$k$-means solution. Let $\mathrm{OPT}$ denote the optimal $k$-median/$k$-means cost and let $\mathrm{OPT}^{(i \to j)}$ denote the cost of the clustering obtained by removing $c_i^*$ as a center and assigning all its points instead to $c_j^*$. Then for any $i \ne j$, it holds that $\mathrm{OPT}^{(i \to j)} > (1+\alpha)\,\mathrm{OPT}$.

We use weak deletion-stability via the following implication we call being $\beta$-distributed.

Definition 3.2. For $\beta > 0$, a $k$-median instance is $\beta$-distributed if for any center $c_i^*$ of the optimal clustering and any data point $x \notin C_i^*$, it holds that $d(x, c_i^*) \ge \beta\,\frac{\mathrm{OPT}}{|C_i^*|}$. A $k$-means instance is $\beta$-distributed if for any such $c_i^*$ and $x \notin C_i^*$, it holds that $d^2(x, c_i^*) \ge \beta\,\frac{\mathrm{OPT}}{|C_i^*|}$.

We prove that $(1+\alpha)$ weak deletion-stability implies the clustering is $\alpha/2$-distributed for $k$-median ($\alpha/4$-distributed for $k$-means) in Theorem 3.5 below. First, however, we relate weak deletion-stability to the conditions considered in [18] and [4].

A. ORSS-Separability

Ostrovsky, Rabani, Schulman and Swamy [18] define a clustering instance to be $\epsilon$-separated if the optimal $k$-means solution is cheaper than the optimal $(k-1)$-means solution by at least a factor $\epsilon^2$. For a given objective ($k$-means or $k$-median) let us use $\mathrm{OPT}_{(k-1)}$ to denote the cost of the optimal $(k-1)$-clustering. Introducing a parameter $\alpha > 0$, say a clustering instance is $(1+\alpha)$-ORSS separable if $\frac{\mathrm{OPT}_{(k-1)}}{\mathrm{OPT}} > 1 + \alpha$. If an instance satisfies $(1+\alpha)$-ORSS separability then all $(k-1)$-clusterings must have cost more than $(1+\alpha)\,\mathrm{OPT}$ and hence it is immediately evident that the instance will also satisfy $(1+\alpha)$-weak deletion-stability. Hence we have the following claim:

Claim 3.3. Any $(1+\alpha)$-ORSS separable $k$-median/$k$-means instance is also $(1+\alpha)$-weakly deletion-stable.

B. BBG-Stability

Balcan, Blum, and Gupta [4] (see also Balcan and Braverman [5] and Balcan, Röglin, and Teng [6]) consider a notion of stability to approximations motivated by settings in which there exists some (unknown) target clustering $C^{\mathrm{target}}$ we would like to produce. Balcan et al.
[4] define a clustering instance to be $(1+\alpha, \delta)$ approximation-stable with respect to some objective $\Phi$ (such as $k$-median or $k$-means), if any $k$-partition whose cost under $\Phi$ is at most $(1+\alpha)\,\mathrm{OPT}$ agrees with the target clustering on all but at most $\delta n$ data points. That is, for any $(1+\alpha)$ approximation $C$ to objective $\Phi$, we have $\min_{\sigma \in S_k} \sum_i |C_i^{\mathrm{target}} \setminus C_{\sigma(i)}| \le \delta n$ (here, $\sigma$ is simply a matching of the indices in the target clustering to those in $C$). In general, $\delta n$ may be larger than the smallest target cluster size, and in that case approximation-stability need not imply weak deletion-stability (not surprisingly since [4] show that $k$-median and $k$-means remain hard to approximate). However, when all target clusters have size greater than $\delta n$ (note that $\delta$ need not be a constant) then approximation-stability indeed also implies weak deletion-stability, allowing us to get a PTAS (and thereby $\delta$-close to the target) when $\alpha > 0$ is a constant.

Claim 3.4. A $k$-median/$k$-means clustering instance that satisfies $(1+\alpha, \delta)$ approximation-stability, and in which all clusters in the target clustering have size greater than $\delta n$, also satisfies $(1+\alpha)$ weak deletion-stability.

Proof: Consider an instance of $k$-median/$k$-means clustering which satisfies $(1+\alpha, \delta)$ approximation-stability. As before, let $\{c_1^*, c_2^*, \ldots, c_k^*\}$ be the centers in the optimal solution and consider the clustering $C^{(i \to j)}$ obtained by no longer using $c_i^*$ as a center and instead assigning each point from cluster $i$ to $c_j^*$, making the $i$th cluster empty. The distance of this clustering from the target is defined as $\frac{1}{n} \min_{\sigma \in S_k} \sum_{i'} |C_{i'}^{\mathrm{target}} \setminus C^{(i \to j)}_{\sigma(i')}|$. Since $C^{(i \to j)}$ has only $(k-1)$ nonempty clusters, one of the target clusters must map to an empty cluster under any permutation $\sigma$. Since by assumption, this target cluster has more than $\delta n$ points, the distance between $C^{\mathrm{target}}$ and $C^{(i \to j)}$ will be greater than $\delta$ and hence by the BBG stability condition, the $k$-median/$k$-means cost of $C^{(i \to j)}$ must be greater than $(1+\alpha)\,\mathrm{OPT}$.

C. Weak Deletion-Stability implies β-distributed

We show now that weak deletion-stability implies the instance is $\beta$-distributed.

Theorem 3.5.
Any $(1+\alpha)$-weakly deletion-stable $k$-median instance is $\frac{\alpha}{2}$-distributed. Any $(1+\alpha)$-weakly deletion-stable $k$-means instance is $\frac{\alpha}{4}$-distributed.

Proof: Fix any center in the optimal $k$-clustering, $c_i^*$, and fix any point $p$ that does not belong to the $C_i^*$ cluster. Denote by $C_j^*$ the cluster that $p$ is assigned to in the optimal $k$-clustering. Therefore it must hold that $d(p, c_j^*) \le d(p, c_i^*)$. Consider the clustering obtained by deleting $c_i^*$ from the list of centers, and assigning each point in $C_i^*$ to $C_j^*$. Since the instance is $(1+\alpha)$-weakly deletion-stable, this should increase the cost by at least $\alpha\,\mathrm{OPT}$.

Suppose we are dealing with a $k$-median instance. Each point $x \in C_i^*$ originally pays $d(x, c_i^*)$, and now, assigned to $c_j^*$, it pays $d(x, c_j^*) \le d(x, c_i^*) + d(c_i^*, c_j^*)$. Thus, the new cost of the points in $C_i^*$ is upper bounded by $\sum_{x \in C_i^*} d(x, c_j^*) \le \mathrm{OPT}_i + |C_i^*|\, d(c_i^*, c_j^*)$. As the increase in cost is lower bounded by $\alpha\,\mathrm{OPT}$ and upper bounded by $|C_i^*|\, d(c_i^*, c_j^*)$, we deduce that $d(c_i^*, c_j^*) > \alpha\,\frac{\mathrm{OPT}}{|C_i^*|}$. Observe that the triangle inequality gives that $d(c_i^*, c_j^*) \le d(c_i^*, p) + d(p, c_j^*) \le 2 d(c_i^*, p)$, so we have that $d(c_i^*, p) > \frac{\alpha}{2}\,\frac{\mathrm{OPT}}{|C_i^*|}$.

Suppose we are dealing with a Euclidean $k$-means instance. Again, we have created a new clustering by assigning all points in $C_i^*$ to the center $c_j^*$. Thus, the cost of transitioning from the optimal $k$-clustering to this new $(k-1)$-clustering, which is at least $\alpha\,\mathrm{OPT}$, is upper bounded by $\sum_{x \in C_i^*} \left( \|x - c_j^*\|^2 - \|x - c_i^*\|^2 \right)$. As $c_i^* = \mu_{C_i^*}$, it follows that this bound is exactly $\sum_{x \in C_i^*} \|c_i^* - c_j^*\|^2 = |C_i^*|\, d^2(c_i^*, c_j^*)$, see [13] (§2, Theorem 2). It follows that $d^2(c_i^*, c_j^*) > \alpha\,\frac{\mathrm{OPT}}{|C_i^*|}$. As before, $d^2(c_i^*, c_j^*) \le \left( d(c_i^*, p) + d(p, c_j^*) \right)^2 \le 4 d^2(c_i^*, p)$, so $d^2(c_i^*, p) > \frac{\alpha}{4}\,\frac{\mathrm{OPT}}{|C_i^*|}$.

D. NP-hardness under weak deletion-stability

Finally, we would like to point out that NP-hardness of the $k$-median problem is maintained even if we restrict ourselves only to weakly deletion-stable instances. Also, the reduction sketched below uses only integer poly-size distances, and hence rules out the existence of an FPTAS for the problem, unless P = NP. In addition, the reduction can be modified to show that NP-hardness is maintained under the conditions studied in [18] and [4].

Theorem 3.6. For any constant $\alpha > 0$, finding the optimal $k$-median clustering of $(1+\alpha)$-weakly deletion-stable instances is NP-hard.

Proof: Fix any constant $\alpha > 0$. We give a poly-time reduction from Set-Cover to $(1+\alpha)$-weakly deletion-stable $k$-median instances. Under standard notation, we assume our input consists of $n$ subsets of a given universe of size $m$, for which we seek a $k$-cover. We reduce such an instance to a $k$-median instance over $m + k(n + 4\alpha m)$ points. We start with the usual reduction of Set-Cover to an instance with $m$ points representing the items of the universe and $n$ points representing all possible sets. Fix an integer $D \gg 1$ to be chosen later. If $j$ belongs to the $i$th set, fix the distance $d(i, j) = D$, otherwise we fix the distance $d(i, j) = D + 1$, and between any two set-points we fix the distance to be 1. (The distance between any two item points is the shortest-path distance.)
However, we augment the $n$ set-points with additional $2mD$ points, setting the distance between all of the $(n + 2mD)$ points as 1. Furthermore, we replicate $k$ copies of these $(n + 2mD)$ augmented set-points, all connected only via the $m$ item points. Observe that each of the $k$ copies of our augmented set-points components contains many points, and all points outside this copy are of distance $\ge D$ from it. Therefore, in the optimal $k$-median solution, each center resides in one unique copy of the augmented set-points. Now, if our Set-Cover instance has a $k$-cover, then we can pick the respective $k$ centers and have an optimal solution with cost exactly $k(n + 2mD - 1) + mD$. Otherwise, no $k$ sets cover all $m$ items, so for any $k$ centers, some item-point must have distance $D + 1$ from its center, and so the cost of any $k$-partition is $\ge k(n + 2mD - 1) + mD + 1$.

Furthermore, the resulting instance is $(1+\alpha)$ weakly deletion-stable, in fact, even $(1+\alpha)$ ORSS-separable. In particular, using one center from each copy of the augmented set-points results in a $k$-median solution of cost $\le m(D+1) + k(n + 2mD - 1) < (k+1)(n + 2mD)$; hence, $\mathrm{OPT}$ is at most this quantity. However, in any $(k-1)$-clustering, one of the copies of the augmented set-points must not contain a center and therefore $\mathrm{OPT}_{(k-1)} \ge \mathrm{OPT} + (n + 2mD)(D - 1)$. Choosing $D = \alpha(k+1) + 1$ ensures that this cost is at least $(1+\alpha)\,\mathrm{OPT}$.

4. A PTAS FOR ANY β-DISTRIBUTED k-MEDIAN INSTANCE

We now present the algorithm for finding a $(1+\epsilon)$-approximation of the $k$-median optimum for $\beta$-distributed instances. First, we comment that using a standard doubling technique, we can assume we approximately know the value of $\mathrm{OPT}$.⁴ Our algorithm works if instead of $\mathrm{OPT}$ we use a value $v$ s.t. $\mathrm{OPT} \le v \le (1 + \epsilon/2)\,\mathrm{OPT}$, but for ease of exposition, we assume that the exact value of $\mathrm{OPT}$ is known.

Below, we informally describe the algorithm for a special case of $\beta$-distributed instances in which no cluster dominates the overall cost of the optimal clustering. Specifically, we say a cluster $C_i^*$ in the optimal $k$-median clustering $C^*$ (hereafter also referred to as the target clustering) is cheap if $\mathrm{OPT}_i \le \frac{\beta\epsilon\,\mathrm{OPT}}{32}$, otherwise, we say $C_i^*$ is expensive.
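The cheap/expensive distinction can be illustrated with a short Python sketch. It assumes the per-cluster costs $\mathrm{OPT}_i$ of the optimal clustering are handed to us, which is only the case in the analysis and not for the algorithm itself; the function name `split_cheap_expensive` and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_cheap_expensive(
    cluster_costs: Sequence[float], beta: float, eps: float
) -> Tuple[List[int], List[int]]:
    """Return (cheap, expensive) cluster indices.

    A cluster i counts as cheap when OPT_i <= beta * eps * OPT / 32,
    where OPT is taken to be the sum of the given per-cluster costs.
    """
    opt = sum(cluster_costs)
    threshold = beta * eps * opt / 32
    cheap = [i for i, cost in enumerate(cluster_costs) if cost <= threshold]
    expensive = [i for i, cost in enumerate(cluster_costs) if cost > threshold]
    return cheap, expensive


# Assumed toy costs: OPT = 100, so the threshold is 1 * 0.5 * 100 / 32 = 1.5625.
cheap_ids, expensive_ids = split_cheap_expensive([1.0, 1.0, 98.0], beta=1.0, eps=0.5)
```

With these numbers the two clusters of cost 1 fall below the threshold and the cluster of cost 98 lies above it.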
Note that in any event, there can be at most a constant ($\frac{32}{\beta\epsilon}$) number of expensive clusters.

Algorithm Intuition: The intuition for our algorithm and for introducing the notion of cheap clusters is the following. Pick some cluster $C_i^*$ in the optimal $k$-median clustering. Since the instance is $\beta$-distributed, any $x \notin C_i^*$ is far from $c_i^*$, namely, $d(x, c_i^*) > \beta\,\frac{\mathrm{OPT}}{|C_i^*|}$. In contrast, the average distance of $x \in C_i^*$ from $c_i^*$ is $\frac{\mathrm{OPT}_i}{|C_i^*|}$. Thus, if we focus on a cluster whose contribution, $\mathrm{OPT}_i$, is no more than, say, $\frac{\beta}{100}\,\mathrm{OPT}$, we have that $c_i^*$ is 100 times closer, on average, to the points of $C_i^*$ than to the points outside $C_i^*$. Furthermore, using the triangle inequality we have that any two average points of $C_i^*$ are of distance at most $\frac{2\beta}{100}\,\frac{\mathrm{OPT}}{|C_i^*|}$, while the distance between any such average point and any point outside of $C_i^*$ is at least $\frac{99\beta}{100}\,\frac{\mathrm{OPT}}{|C_i^*|}$. So, if we manage to correctly guess the size $s$ of a cheap cluster, we can set a radius $r = \Theta\left(\frac{\beta\,\mathrm{OPT}}{s}\right)$ and collect data points according to the size and intersection of the $r$-balls around them. We note that this use of balls with an inverse relation between size and radius is similar to that in the min-sum clustering algorithm of [5].

Note that in the general case we might have up to $\frac{32}{\beta\epsilon}$ expensive clusters. We handle them by brute-force guessing their centers.

⁴ Instead of doubling from 1, we can alternatively run an off-the-shelf 5-approximation of $\mathrm{OPT}$, which will return a value $v \le 5\,\mathrm{OPT}$.

In Subsection 4-A, we present the algorithm for clustering $\beta$-distributed instances of $k$-median under the assumption that for all the expensive clusters we have made the correct guess for their cluster centers. The algorithm
populates a list $Q$, where each element in this list is a subset of points. Ideally, each subset is contained in some target cluster, yet we might have a few subsets with points from two or more target clusters. The first stage of the algorithm is to add components into $Q$, and the second stage is to find $k$ good components in $Q$, and use these components to retrieve a clustering with low cost. Since we do not have many expensive clusters, we can run the algorithm for all possible guesses for the centers of the expensive clusters and choose the solution which has the minimum cost. The analysis below shows that one such guess will lead to a solution of cost at most $(1+\epsilon)\,\mathrm{OPT}$.

Later, in Section 5, when we deal with $k$-means in Euclidean space, we use sampling techniques, similar to those of Kumar et al. [16] and Ostrovsky et al. [18], to get good substitutes for the centers of the expensive clusters. Note however an important difference between the approach of [16], [18] and ours. While they sample points from all $k$ clusters, we sample points only for the $O(1)$ expensive clusters. As a result, the runtime of the PTAS of [16], [18] has exponential dependence on $k$, while ours has only a polynomial dependence on $k$.

A. Clustering β-distributed Instances

The algorithm is presented in Figure 1. In this section we assume that at the beginning, the list $Q$ is initialized with $Q_{\mathrm{init}}$ which contains the centers of all the expensive clusters. In general, the algorithm will be run several times with $Q_{\mathrm{init}}$ containing different guesses for the centers of the expensive clusters. Before going into the proof of correctness of the algorithm, we introduce another definition. We define the inner ring of $C_i^*$ as the set $\left\{ x;\ d(x, c_i^*) \le \frac{\beta\,\mathrm{OPT}}{8|C_i^*|} \right\}$. Note the following fact:

Fact 4.1. If $C_i^*$ is a cheap cluster, then no more than an $\epsilon/4$ fraction of its points reside outside the inner ring. In particular, at least half of a cheap cluster is contained within the inner ring.

Proof: This follows from Markov's inequality. If more than $(\epsilon/4)|C_i^*|$ points are outside of the inner ring, then $\mathrm{OPT}_i > \frac{\epsilon|C_i^*|}{4} \cdot \frac{\beta\,\mathrm{OPT}}{8|C_i^*|} = \frac{\beta\epsilon\,\mathrm{OPT}}{32}$.
This contradicts the fact that $C_i^*$ is cheap.

Our high-level goal is to show that for any cheap cluster $C_i^*$ in the target clustering, we insert a component $T_i$ that is contained within $C_i^*$, and furthermore, contains only points that are close to $c_i^*$. It will follow from the next claims that the component $T_i$ is the one that contains points from the inner ring of $C_i^*$. We start with the following lemma, which we will utilize a few times.

Lemma 4.2. Let $T$ be any component added to $Q$. Let $s$ be the stage in which we add $T$ to $Q$. Let $C_i^*$ be any cheap cluster s.t. $s \ge |C_i^*|$. Then (a) $T$ does not contain any point $z$ s.t. the distance $d(c_i^*, z)$ lies within the range $\left[ \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}, \frac{3\beta\,\mathrm{OPT}}{4|C_i^*|} \right]$, and (b) $T$ cannot contain both a point $p_1$ s.t. $d(c_i^*, p_1) < \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}$ and a point $p_2$ s.t. $d(c_i^*, p_2) > \frac{3\beta\,\mathrm{OPT}}{4|C_i^*|}$.

1) Initialization Stage: Set $Q \leftarrow Q_{\mathrm{init}}$.
2) Population Stage: For $s = n, n-1, n-2, \ldots, 1$ do:
   a) Set $r = \frac{\beta\,\mathrm{OPT}}{4s}$.
   b) Remove any point $x$ such that $d(x, Q) < 2r$. (Here, $d(x, Q) = \min_{T \in Q;\, y \in T} d(x, y)$.)
   c) For any remaining data point $x$, denote the set of data points whose distance from $x$ is at most $r$ by $B(x, r)$. Connect any two remaining points $a$ and $b$ if: (i) $d(a, b) \le r$, (ii) $|B(a, r)| > \frac{s}{2}$ and (iii) $|B(b, r)| > \frac{s}{2}$.
   d) Let $T$ be a connected component of size $> \frac{s}{2}$. Then:
      i) Add $T$ to $Q$. (That is, $Q \leftarrow Q \cup \{T\}$.)
      ii) Define the set $B(T) = \{x : d(x, y) \le 2r \text{ for some } y \in T\}$. Remove the points of $B(T)$ from the instance.
3) Centers-Retrieving Stage: For any choice of $k$ components $T_1, T_2, \ldots, T_k$ out of $Q$ (we later show that $|Q| < k + O(1/\beta)$):
   a) Find the best center $c_i$ for $T_i \cup B(T_i)$. That is, $c_i = \arg\min_{p \in T_i \cup B(T_i)} \sum_{x \in T_i \cup B(T_i)} d(x, p)$. (This can be done before fixing the choice of $k$ components out of $Q$.)
   b) Partition all $n$ points according to the nearest point among the $k$ centers of the current $k$ components.
   c) If a clustering of cost at most $(1+\epsilon)\,\mathrm{OPT}$ is found, output these $k$ centers and halt.

Figure 1. The algorithm to obtain a PTAS for $\beta$-distributed instances of $k$-median.

Proof: We prove (a) by contradiction. Assume $T$ contains a point $z$ s.t. $\frac{\beta\,\mathrm{OPT}}{2|C_i^*|} \le d(c_i^*, z) \le \frac{3\beta\,\mathrm{OPT}}{4|C_i^*|}$.
Set $r = \frac{\beta\,\mathrm{OPT}}{4s} \le \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$, just as in the stage when $T$ was added to $Q$, and let $p$ be any point in the ball $B(z, r)$. Then by the triangle inequality we have that $d(c_i^*, p) \ge d(c_i^*, z) - d(z, p) \ge \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$, and similarly $d(c_i^*, p) \le d(c_i^*, z) + d(z, p) \le \frac{\beta\,\mathrm{OPT}}{|C_i^*|}$. Since our instance is $\beta$-distributed it holds that $p$ belongs to $C_i^*$, and from the definition of the inner ring of $C_i^*$, it holds that $p$ falls outside the inner ring. However, $z$ is added to $T$ because the ball $B(z, r)$ contains more than $s/2 \ge |C_i^*|/2$ many points. So more than half of the points in $C_i^*$ fall outside the inner ring of $C_i^*$, which contradicts Fact 4.1.

Assume now (b) does not hold. Recall that $T$ is a connected component, so there exists some path $p_1 \to p_2$. Each two consecutive points along this path were connected because
their distance is at most $\frac{\beta\,\mathrm{OPT}}{4s} \le \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$. As $d(c_i^*, p_1) < \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}$ and $d(c_i^*, p_2) > \frac{3\beta\,\mathrm{OPT}}{4|C_i^*|}$, there must exist a point $z$ along the path whose distance from $c_i^*$ falls in the range $\left[ \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}, \frac{3\beta\,\mathrm{OPT}}{4|C_i^*|} \right]$, contradicting (a).

Claim 4.3. Let $C_i^*$ be any cheap cluster in the target clustering. By stage $s = |C_i^*|$, the algorithm adds to $Q$ a component $T$ that contains a point from the inner ring of $C_i^*$.

Proof: Suppose that up to the stage $s = |C_i^*|$ the algorithm has not inserted such a component into $Q$. Now, it is possible that by stage $s$, the algorithm has inserted some component $T'$ to $Q$, s.t. some $x$ in the inner ring of $C_i^*$ is too close to some $y \in T'$ (namely, $d(x, y) \le 2r$), thus causing $x$ to be removed from the instance. Assume for now this is not the case. This means that the inner ring of cluster $C_i^*$ still contains more than $|C_i^*|/2$ points. Also observe that all inner ring points are of distance at most $\frac{\beta\,\mathrm{OPT}}{8|C_i^*|}$ from the center, so every pair of inner ring points has a distance of at most $\frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$. Hence, when we reach stage $s = |C_i^*|$, any ball of radius $r = \frac{\beta\,\mathrm{OPT}}{4s} = \frac{\beta\,\mathrm{OPT}}{4|C_i^*|}$ centered at any inner-ring point must contain all other inner-ring points. This means that at stage $s = |C_i^*|$ all inner ring points are connected among themselves, so they form a component (in fact, a clique) of size $> s/2$. Therefore, the algorithm inserts a new component, containing all inner ring points.

So, by stage $s = |C_i^*|$, one of two things can happen. Either the algorithm inserts a component that contains some inner ring point to $Q$, or the algorithm removes an inner ring point due to some component $T' \in Q$. If the former happens, we are done. So let us prove by contradiction that we cannot have only the latter.

Let $s \ge |C_i^*|$ be the stage in which we throw away the first inner ring point of the cluster $C_i^*$. At stage $s$ the algorithm removes this inner ring point $x$ because there exists a point $y$ in some component $T' \in Q$, s.t. $d(x, y) \le 2r = \frac{\beta\,\mathrm{OPT}}{2s}$, and so $d(c_i^*, y) \le d(c_i^*, x) + d(x, y) \le \frac{\beta\,\mathrm{OPT}}{8|C_i^*|} + \frac{\beta\,\mathrm{OPT}}{2s} \le \frac{5\beta}{8}\,\frac{\mathrm{OPT}}{|C_i^*|}$. This immediately implies that $T'$ cannot be the center of an expensive cluster, since any such point will be at a distance at least $\beta\,\frac{\mathrm{OPT}}{|C_i^*|}$ from $c_i^*$.
Let $s' \ge s \ge |C_i^*|$ be the previous stage in which we added the component $T'$ to $Q$. As Lemma 4.2 applies to $T'$, we deduce that $d(c_i^*, y) < \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}$. Recall that $T'$ contains $> s'/2 \ge |C_i^*|/2$ many points, yet, by assumption, contains none of the $\ge |C_i^*|/2$ points that reside in the inner ring of $C_i^*$. It follows from Fact 4.1 that some point $w \in T'$ must belong to a different cluster $C_j^*$. Since the instance is $\beta$-distributed, we have that $d(c_i^*, w) > \beta\,\frac{\mathrm{OPT}}{|C_i^*|}$. The existence of both $y$ and $w$ in $T'$ contradicts part (b) of Lemma 4.2.

We call a component $T \in Q$ good if it contains an inner ring point of some cheap cluster $C_i^*$. A component is called bad if it is not good and is not one of the initial centers present in $Q_{\mathrm{init}}$. We now discuss the properties of good components.

Claim 4.4. Let $T$ be a good component added to $Q$, containing an inner ring point from a cheap cluster $C_i^*$. (By Claim 4.3 we know at least one such $T$ exists.) Then: (a) all points in $T$ are of distance at most $\frac{\beta\,\mathrm{OPT}}{2|C_i^*|}$ from $c_i^*$, (b) $T \cup B(T)$ is fully contained in $C_i^*$, (c) the entire inner ring of $C_i^*$ is contained in $T \cup B(T)$, and (d) no other component $T' \in Q$, $T' \ne T$, contains an inner ring point from $C_i^*$.

Proof: As we do not know (d) in advance, it might be the case that $Q$ contains many good components, all containing an inner-ring point from the same cluster, $C_i^*$. Out of these (potentially many) components, let $T$ denote the first one inserted to $Q$. Denote the stage in which $T$ was inserted to $Q$ as $s$. Due to the previous claim, we know $s \ge |C_i^*|$, and so Lemma 4.2 applies to $T$. We show (a), (b), (c) and (d) hold for $T$, and deduce that $T$ is the only good component to contain an inner ring point from $C_i^*$.

Part (a) follows immediately from Lemma 4.2. We know $T$ contains some inner ring point $x$ from $C_i^*$, so $d(c_i^*, x) \le \frac{\beta\,\mathrm{OPT}}{8|C_i^*|} < \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}$, so we know that any $y \in T$ must satisfy $d(c_i^*, y) < \frac{\beta\,\mathrm{OPT}}{2|C_i^*|}$. Since we now know (a) holds and the instance is $\beta$-distributed, we have that $T \subseteq C_i^*$, so we only need to show $B(T) \subseteq C_i^*$. Fix any $y \in B(T)$. The point $y$ is assigned to $B(T)$ (thus removed from the instance) because there exists some point $x \in T$ s.t. $d(x, y) \le 2r$.
So again, we have that $d(c_i^*, y) \le d(c_i^*, x) + d(x, y) \le \beta\,\frac{\mathrm{OPT}}{|C_i^*|}$, which gives us that $y \in C_i^*$ (since the instance is $\beta$-distributed).

We now prove (c). Because of (b), we deduce that the number of points in $T$ is at most $|C_i^*|$. However, in order for $T$ to be added to $Q$, it must also hold that $|T| > s/2$. It follows that $s < 2|C_i^*|$. Let $x$ be an inner ring point of $C_i^*$ that belongs to $T$. Then the distance between any other inner ring point of $C_i^*$ and $x$ is at most $\frac{\beta\,\mathrm{OPT}}{4|C_i^*|} < \frac{\beta\,\mathrm{OPT}}{2s} = 2r$. It follows that any inner ring point of $C_i^*$ which is not added to $T$ is assigned to $B(T)$. Thus $T \cup B(T)$ contains all inner-ring points. Finally, observe that (d) follows immediately from the definition of a good component and from (c).

We now show that in addition to having all good components, we cannot have too many bad components.

Claim 4.5. We have fewer than $16/(3\beta)$ bad components.

Proof: Let $T$ be a bad component, and let $s$ be the stage in which $T$ was inserted to $Q$. Let $y$ be any point in $T$, and let $C^*$ be the cluster to which $y$ belongs in the optimal clustering, with center $c^*$. We show $d(c^*, y) > \frac{3\beta\,\mathrm{OPT}}{8s}$. We divide into cases.

Case 1: $C^*$ is an expensive cluster. Note that we are working under the assumption that $Q_{\mathrm{init}}$ contains the correct centers of the expensive clusters. In particular, $Q_{\mathrm{init}}$ contains $c^*$. Also, the fact that point $y$ was not thrown out in stage $s$ implies that $d(c^*, y) > 2r = \frac{\beta\,\mathrm{OPT}}{2s} > \frac{3\beta\,\mathrm{OPT}}{8s}$.

Case 2: $C^*$ is a cheap cluster and $s \ge |C^*|$. We apply Lemma 4.2, and deduce that either $d(c^*, y) < \frac{\beta\,\mathrm{OPT}}{2|C^*|}$ or
that $d(c^*, y) > \frac{3\beta\,\mathrm{OPT}}{4|C^*|}$. As the inner ring of $C^*$ contains $> |C^*|/2$ points and $T$ contains $> s/2 \ge |C^*|/2$ many points, none of which is an inner ring point, some point $w \in T$ does not belong to $C^*$ and hence $d(c^*, w) > \beta\,\frac{\mathrm{OPT}}{|C^*|} > \frac{3\beta\,\mathrm{OPT}}{4|C^*|}$. Part (b) of Lemma 4.2 assures us that all points in $T$ are also far from $c^*$.

Case 3: $C^*$ is a cheap cluster and $s < |C^*|$. Using Claim 4.3 we have that some good component containing a point $x$ from the inner ring of $C^*$ was already added to $Q$. So it must hold that $d(x, y) > 2r$, for otherwise we would have removed $y$ from the instance and it could not be added to any $T$. We deduce that $d(c^*, y) \ge d(x, y) - d(c^*, x) \ge \frac{\beta\,\mathrm{OPT}}{2s} - \frac{\beta\,\mathrm{OPT}}{8|C^*|} > \frac{3\beta\,\mathrm{OPT}}{8s}$.

All points in $T$ have distance $> \frac{3\beta\,\mathrm{OPT}}{8s}$ from their respective centers in the optimal clustering, and recall that $T$ is added to $Q$ because $T$ contains at least $s/2$ many points. Therefore, the contribution of all elements in $T$ to $\mathrm{OPT}$ is at least $\frac{3\beta}{16}\,\mathrm{OPT}$. It follows that we can have no more than $16/(3\beta)$ such bad components.

We can now prove the correctness of our algorithm.

Theorem 4.6. The algorithm outputs a $k$-clustering whose cost is no more than $(1+\epsilon)\,\mathrm{OPT}$.

Proof: Using Claim 4.4, it follows that there exists some choice of $k$ components, $T_1, \ldots, T_k$, such that we have the center of every expensive cluster and the good component corresponding to every cheap cluster $C^*$. Fix that choice. We show that for the optimal clustering, replacing the true centers $\{c_1^*, c_2^*, \ldots, c_k^*\}$ with the centers $\{c_1, c_2, \ldots, c_k\}$ that the algorithm outputs increases the cost by at most a $(1+\epsilon)$ factor. This implies that using $\{c_1, c_2, \ldots, c_k\}$ as centers must result in a clustering with cost at most $(1+\epsilon)\,\mathrm{OPT}$.

Fix any $C_i^*$ in the optimal clustering. Let $\mathrm{OPT}_i$ be the cost of this cluster. If $C_i^*$ is an expensive cluster then we know that its center $c_i^*$ is present in the list of centers chosen. Hence, the cost paid by points in $C_i^*$ will be at most $\mathrm{OPT}_i$. If $C_i^*$ is a cheap cluster then denote by $T$ the good component corresponding to it.
We break the cost of $C_i^*$ into two parts:

$$\mathrm{OPT}_i = \sum_{x \in C_i^*} d(x, c_i^*) = \sum_{x \in T \cup B(T)} d(x, c_i^*) + \sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c_i^*)$$

and compare it to the cost of $C_i^*$ using $c'_i$, the point picked by the algorithm to serve as center:

$$\sum_{x \in C_i^*} d(x, c'_i) = \sum_{x \in T \cup B(T)} d(x, c'_i) + \sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c'_i).$$

Now, the first term is exactly the function that is minimized by $c'_i$, as $c'_i = \arg\min_p \sum_{x \in T \cup B(T)} d(x, p)$. We also know that $c_i^*$, the actual center of $C_i^*$, resides in the inner ring, and therefore, by Claim 4.4, must belong to $T \cup B(T)$. It follows that $\sum_{x \in T \cup B(T)} d(x, c'_i) \le \sum_{x \in T \cup B(T)} d(x, c_i^*)$.

We now upper bound the second term, and show that

$$\sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c'_i) \le (1 + \epsilon) \sum_{x \in C_i^*,\ x \notin T \cup B(T)} d(x, c_i^*).$$

Any point $x \in C_i^*$ such that $x \notin T \cup B(T)$ must reside outside the inner ring of $C_i^*$. Therefore, $d(x, c_i^*) > \beta\,\mathrm{OPT}/(8|C_i^*|)$. We show that $d(c_i^*, c'_i) \le \epsilon\beta\,\mathrm{OPT}/(8|C_i^*|)$, and thus we have that $d(x, c'_i) \le d(x, c_i^*) + d(c_i^*, c'_i) \le (1 + \epsilon)\,d(x, c_i^*)$, which gives the required result.

Note that thus far, we have only used the fact that the cost of any cheap cluster, averaged over its points, is proportional to $\beta\,\mathrm{OPT}/|C_i^*|$. Here is the first (and the only) time we use the fact that this average cost is actually at most $(\epsilon/32)\,\beta\,\mathrm{OPT}/|C_i^*|$. Using the Markov inequality, we have that the set of points $\{x;\ d(x, c_i^*) \le \epsilon\beta\,\mathrm{OPT}/(16|C_i^*|)\}$ contains at least half of the points in $C_i^*$, and they all reside in the inner ring, thus belong to $T \cup B(T)$. Assume for the sake of contradiction that $d(c_i^*, c'_i) > \epsilon\beta\,\mathrm{OPT}/(8|C_i^*|)$. Then at least half of the points in $C_i^*$ each contribute more than $\epsilon\beta\,\mathrm{OPT}/(16|C_i^*|)$ to the sum $\sum_{x \in T \cup B(T)} d(x, c'_i)$. It follows that this sum is more than $\frac{|C_i^*|}{2} \cdot \frac{\epsilon\beta\,\mathrm{OPT}}{16|C_i^*|} = \frac{\epsilon\beta\,\mathrm{OPT}}{32} \ge \mathrm{OPT}_i$. However, $c'_i$ is the point that minimizes the sum $\sum_{x \in T \cup B(T)} d(x, p)$, and by using $p = c_i^*$ we have $\sum_{x \in T \cup B(T)} d(x, p) \le \mathrm{OPT}_i$. Contradiction.

B. Runtime analysis

A naive implementation of the second step of the algorithm in Section 4-A takes $O(n^3)$ time (for every s and every point x, find how many of the remaining points fall within the ball of radius r around it). Finding $c'_i$ for all components takes $O(n^2)$ time, and measuring the cost of the solution using a particular set of k data points as centers takes $O(nk)$ time.
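The two primitives whose cost is counted above, picking the best center for a component and evaluating a candidate set of centers, can be sketched as follows. This is only an illustration under assumptions that are not in the paper: the finite metric is given as a precomputed distance matrix `dist`, points are referred to by index, and the names `best_center` and `clustering_cost` are hypothetical.

```python
from typing import List, Sequence


def best_center(dist: List[List[float]],
                members: Sequence[int],
                candidates: Sequence[int]) -> int:
    """Return the candidate p minimizing the sum of d(x, p) over x in members.

    `members` plays the role of T ∪ B(T); `candidates` is the set of data
    points allowed to serve as a center (an assumption of this sketch).
    """
    return min(candidates, key=lambda p: sum(dist[x][p] for x in members))


def clustering_cost(dist: List[List[float]], centers: Sequence[int]) -> float:
    """k-median cost: every point pays its distance to the nearest center."""
    n = len(dist)
    return sum(min(dist[x][c] for c in centers) for x in range(n))


# Toy instance: five points on a line, with the absolute difference as metric.
coords = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in coords] for a in coords]

center = best_center(dist, members=[0, 1, 2], candidates=[0, 1, 2])  # index 1
cost = clustering_cost(dist, centers=[1, 3])                         # 3
```

Each call to `best_center` scans all candidate and member pairs, and `clustering_cost` scans all point and center pairs, which is where the quadratic and $O(nk)$ terms in the analysis come from.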
Guessing the right k components takes $k^{O(1/\beta)}$ time. Overall, the running time of the algorithm in Figure 1 is $O(n^3 k^{O(1/\beta)})$. The general algorithm that brute-force guesses the centers of all expensive clusters makes $n^{O(1/\beta\epsilon)}$ iterations of the given algorithm, so its overall running time is $n^{O(1/\beta\epsilon)} k^{O(1/\beta)}$.

5. A PTAS FOR ANY β-DISTRIBUTED EUCLIDEAN k-MEANS INSTANCE

Analogous to the k-median algorithm, we present an essentially identical algorithm for k-means in Euclidean space. Indeed, the fact that k-means considers distances squared makes upper (or lower) bounding distances a bit more complicated, and requires that we fiddle with the parameters of the algorithm. In addition, the centers $c_i^*$ may not be data points. However, the overall approach remains the same. Roughly speaking, converting the k-median algorithm to the k-means case, we use the same constants, only squared. (We stress that we made no attempt to optimize the constants.) As before, we handle expensive clusters by guessing good substitutes for their centers and obtain good components for cheap clusters.

Often, when considering the Euclidean space k-means problem, the dimension of the space plays an important factor. In contrast, here we make no assumptions about the dimension, and our results hold for any poly(n) dimension. In fact, for ease of exposition, we assume all distances between any two points were computed in advance and are given to our algorithm. Clearly, this only adds $O(n^2 \cdot \mathrm{dim})$
to our runtime. In addition to the change in parameters, we utilize the following facts that hold for the center of mass in Euclidean space.

Fact 5.1. Let U be a (finite) set of points in a Euclidean space, and let $\mu_U$ denote their center of mass ($\mu_U = \frac{1}{|U|}\sum_{x \in U} x$). Let A be a random subset of U, and denote by $\mu_A$ the center of mass of A. Then for any δ < 1/2, we have both

$$\Pr\left[\ \|\mu_U - \mu_A\|^2 > \frac{1}{\delta |A|} \cdot \frac{1}{|U|} \sum_{x \in U} \|x - \mu_U\|^2\ \right] < \delta \qquad (1)$$

$$\Pr\left[\ \sum_{x \in U} \|x - \mu_A\|^2 > \left(1 + \frac{1}{\delta |A|}\right) \sum_{x \in U} \|x - \mu_U\|^2\ \right] < \delta$$

Fact 5.2. Let U be a (finite) set of points in a Euclidean space, and let A and B be a partition of U. Denote by $\mu_U$ and $\mu_A$ the centers of mass of U and A, respectively. Then

$$\|\mu_U - \mu_A\|^2 \le \frac{1}{|U|} \sum_{x \in U} \|x - \mu_U\|^2 \cdot \frac{|B|}{|A|}.$$

Fact 5.2, proven in [18] (Lemma 2.2), allows us to upper bound the distance between the real center of a cluster and the empirical center we get by averaging all points in $T \cup B(T)$ for a good component T. Fact 5.1 allows us to handle expensive clusters. Since we cannot brute-force guess a center (as the centers of the clusters aren't necessarily data points), we guess a sample of $O(\beta^{-1} + \epsilon^{-1})$ points from every expensive cluster, and use their average as a center. Both properties of Fact 5.1, proven in [13] (§3, Lemmas 1 and 2), assure us that this center is an adequate substitute for the real center and is also close to it.

This motivates the approach behind our first algorithm, in which we brute-force traverse all choices of $O(\epsilon^{-1} + \beta^{-1})$ points for any of the expensive clusters. The second algorithm, whose runtime is $(k \log n)^{\mathrm{poly}(1/\epsilon, 1/\beta)} \cdot O(n^3)$, replaces brute-force guessing with random sampling. Indeed, if a cluster contains a poly(1/k) fraction of the points, then by randomly sampling $O(\epsilon^{-1} + \beta^{-1})$ points, the probability that all points belong to the same expensive cluster and, furthermore, that their average can serve as a good empirical center, is at least $1/k^{\mathrm{poly}(1/\epsilon, 1/\beta)}$. In contrast, if we have expensive clusters that contain few points (e.g. an expensive cluster whose size is small in comparison to n, while k = poly(log n)), then random sampling is unlikely to find good empirical centers for them.
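Fact 5.2 is easy to check numerically on a toy partition. The sketch below computes both sides of the inequality for a given partition (A, B) of U; the helper names `mean`, `sq_dist` and `fact_5_2_sides` are hypothetical, and the example data is made up purely for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Center of mass of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(dim))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fact_5_2_sides(A: List[Point], B: List[Point]) -> Tuple[float, float]:
    """Return (lhs, rhs) of Fact 5.2 for the partition U = A ∪ B.

    lhs = ||mu_U - mu_A||^2
    rhs = (1/|U|) * sum_{x in U} ||x - mu_U||^2 * |B| / |A|
    """
    U = list(A) + list(B)
    mu_U, mu_A = mean(U), mean(A)
    lhs = sq_dist(mu_U, mu_A)
    avg_cost = sum(sq_dist(x, mu_U) for x in U) / len(U)
    rhs = avg_cost * len(B) / len(A)
    return lhs, rhs


# Two nearby points form A; a single far-away point forms B.
lhs, rhs = fact_5_2_sides([(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0)])
# lhs = 9 and rhs = 28/3, so the inequality holds on this example.
```

The bound tightens as |B|/|A| shrinks, which matches how the fact is used: when A already holds most of a cluster, its mean cannot be far from the cluster's true center.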
However, recall that our algorithm collects points and deletes them from our instance. So it is possible that in the middle of the run we are left with so few points that expensive clusters whose size is small in comparison to the original number of points contain a poly(1/k) fraction of the remaining points. Indeed, this is the motivation behind our second algorithm. We run the algorithm while interleaving the Population Stage of the algorithm with random sampling. Instead of running s from n to 1, we use

$$\{n,\ n k^{-2},\ n k^{-4},\ n k^{-6},\ \ldots,\ 1\} \qquad (2)$$

as break points. Correspondingly, we define $l_i$ to be the number of expensive clusters whose size is in the range $[n k^{-2i-2},\ n k^{-2i})$. Whenever s reaches such an $n k^{-2i}$ break point, we randomly sample points in order to guess the $l_{i+3}$ centers of the clusters that lie 3 intervals ahead (and so, initially, we guess all centers in the first 3 intervals). We prove that in every interval we are likely to sample good empirical centers. This is a simple corollary of Fact 5.2 along with the following two claims. First, we claim that at the end of each interval, the number of points remaining is at most $n k^{-2i+1}$. Secondly, we also claim that in each interval we do not remove even a single point from a cluster whose size is smaller than $n k^{-2i-6}$. We refer the reader to Appendix A for the algorithms and their analysis.

6. DISCUSSION AND OPEN PROBLEMS

The algorithm we present here for k-median has runtime of $\mathrm{poly}(n^{1/\beta}, n^{1/\epsilon}, k)$, and the algorithm for k-means has runtime $\mathrm{poly}(n, (k \log n)^{1/\epsilon}, (k \log n)^{1/\beta})$ (see footnote 6). We comment that it is unlikely that we can obtain an algorithm whose runtime is polynomial in 1/ε and 1/β as well as in n and k. Observe that for any clustering instance and any k > 1 we have that $\mathrm{OPT}_{(k-1)} > (1 + \frac{1}{n})\,\mathrm{OPT}_{k}$, simply by considering the k-clustering that results from taking the optimal (k−1)-clustering and setting the point which is the furthest from its center in a cluster of its own (as a new center). Hence, any k-median/k-means instance is β-distributed for $\beta = \Omega(\frac{1}{n})$. Recall from Section 3-D that the k-median problem restricted only to weakly-stable instances has no FPTAS.
So the fact that our algorithm's runtime has super-polynomial dependence on both 1/β and 1/ε is unavoidable. Nonetheless, one might still hope to do better. In particular, one major runtime expense of our algorithm comes from handling expensive clusters by brute-force guessing or sampling. Can one improve the runtime by doing something more clever for expensive clusters? It is worth noting that for the stability conditions of [4], Voevodski et al. [20] develop an especially efficient implementation with good performance (in terms of both accuracy and speed) on real-world protein sequence datasets.

A different open problem lies in the relation to the results of Ostrovsky et al. [18]. Their motivating question was to analyze the performance of Lloyd-type methods over stable instances. Is it possible that weak deletion-stability is sufficient for some version of the k-means heuristic to converge to the optimal clustering?

Footnote 6: When dealing with k-means in a Euclidean space of dimension dim, we need to explicitly compute the distances, so we add $n^2 \cdot \mathrm{dim}$ to the runtime.

Acknowledgements: This work was supported in part by the National Science Foundation under grant CCF.

REFERENCES

[1] Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In STOC.
[2] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristic for k-median and facility location problems. In STOC.
[3] Mihai Bădoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In STOC.
[4] Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Approximate clustering without the approximation. In SODA.
[5] Maria-Florina Balcan and Mark Braverman. Finding low error clusterings. In COLT.
[6] Maria-Florina Balcan, Heiko Röglin, and Shang-Hua Teng. Agnostic clustering. In ALT.
[7] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. In STOC.
[8] Sanjoy Dasgupta. The hardness of k-means clustering. Technical report, University of California at San Diego.
[9] W. Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In STOC.
[10] Michelle Effros and Leonard J. Schulman. Deterministic clustering with data nets. ECCC, (050).
[11] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. In Journal of Algorithms.
[12] Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In STOC.
[13] Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In Proc. 10th Symp. Comp. Geom.
[14] Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems (extended abstract). In STOC.
[15] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. In Proc. 18th Symp. Comp. Geom.
[16] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In FOCS.
[17] R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric k-clustering.
In FOCS.
[18] Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In FOCS.
[19] F. Schalekamp, M. Yu, and A. van Zuylen. Clustering with or without the Approximation. In COCOON.
[20] Konstantin Voevodski, Maria Florina Balcan, Heiko Röglin, Shang-Hua Teng, and Yu Xia. Efficient clustering with limited distance information. In Proc. 26th UAI.

APPENDIX

We present the algorithm for (1 + ε)-approximation to the k-means optimum of a β-distributed instance. Much like in Section 4, we call a cluster in the optimal k-means solution cheap if $\mathrm{OPT}_i = \sum_{x \in C_i^*} d^2(x, c_i^*) \le \beta\epsilon\,\mathrm{OPT}/4^6$.

A. Clustering β-distributed Instances of Euclidean k-means

The algorithm is presented in Figure 2. The correctness is proved in a similar fashion to the proof of correctness presented in Section 4. First, observe that by the Markov inequality, for any cheap cluster $C_i^*$, the set $\{x;\ d^2(x, c_i^*) > t\,\beta\,\mathrm{OPT}/|C_i^*|\}$ cannot contain more than an $\epsilon/(4^6 t)$ fraction of the points in $C_i^*$. It follows that the inner ring of $C_i^*$, the set $\{x;\ d^2(x, c_i^*) \le \beta\,\mathrm{OPT}/(256|C_i^*|)\}$, contains at least half of the points of $C_i^*$. As mentioned in Section 5, the algorithm populates the list Q with good components corresponding to cheap clusters. Also from Section 5, we know that for every expensive cluster, there exists a sample of $O(\frac{1}{\beta} + \frac{1}{\epsilon})$ data points whose center is a good substitute for the center of the cluster. In the analysis below, we assume that Q has been initialized correctly with $Q_{init}$ containing these good substitutes. In general, the algorithm will be run multiple times for all possible guesses of samples from expensive clusters. We start with the following lemma, which is similar to Lemma 4.2.

Lemma A.1. Let $T \in Q$ be any component and let s be the stage in which we insert T into Q. Let $C_i^*$ be any cheap cluster such that $s \ge |C_i^*|$. Then (a) T does not contain any point z such that the distance $d^2(c_i^*, z)$ lies within the range $\left[\beta\,\mathrm{OPT}/(16|C_i^*|),\ \beta\,\mathrm{OPT}/(4|C_i^*|)\right]$, and (b) T cannot contain both a point $p_1$ such that $d^2(c_i^*, p_1) \le \beta\,\mathrm{OPT}/(16|C_i^*|)$ and a point $p_2$ such that $d^2(c_i^*, p_2) > \beta\,\mathrm{OPT}/(4|C_i^*|)$.
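Proof: Assume (a) does not hold. Let z be such a point, and let B(z, r) be the set of all points p such that $d^2(z, p) \le r = \beta\,\mathrm{OPT}/(64s) \le \beta\,\mathrm{OPT}/(64|C_i^*|)$. As $d^2(z, c_i^*) \ge \beta\,\mathrm{OPT}/(16|C_i^*|)$, we have that $d(z, p) \le \frac{1}{2} d(z, c_i^*)$. It follows that $d^2(c_i^*, p) \ge (d(c_i^*, z) - d(z, p))^2 \ge (d(c_i^*, z)/2)^2 \ge \beta\,\mathrm{OPT}/(64|C_i^*|)$. Similarly, $d^2(c_i^*, p) \le (d(c_i^*, z) + d(z, p))^2 \le (3d(c_i^*, z)/2)^2 \le 9\beta\,\mathrm{OPT}/(16|C_i^*|)$. Thus B(z, r) is contained in $C_i^*$ but falls outside the inner ring of $C_i^*$, yet contains more than $s/2 \ge |C_i^*|/2$ points. Contradiction.

Assume (b) does not hold. Let $p_1$ and $p_2$ be the above mentioned points. As T is a connected component, it follows that along the path $p_1 \to p_2$ there exists a pair of neighboring nodes x, y such that $d^2(x, y) \le r \le \beta\,\mathrm{OPT}/(64|C_i^*|)$, yet $d^2(c_i^*, x) \le \beta\,\mathrm{OPT}/(16|C_i^*|)$ while $d^2(c_i^*, y) \ge \beta\,\mathrm{OPT}/(4|C_i^*|)$. However, a simple computation gives that $d^2(c_i^*, y) \le (3d(c_i^*, x)/2)^2 \le 9\beta\,\mathrm{OPT}/(64|C_i^*|)$. Contradiction.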
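The Population Stage of the k-means algorithm (Figure 2), which the lemma above reasons about, can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: it assumes squared distances are precomputed in a matrix `d2`, that the value of OPT is known and passed in as `opt`, and that $Q_{init}$ is empty (so expensive clusters are ignored). It also re-scans for components after each removal within a stage, which is one reading of step (d). The names `population_stage` and `_large_component` are hypothetical.

```python
from typing import List, Optional, Set


def _large_component(d2: List[List[float]], alive: Set[int],
                     r: float, s: int) -> Optional[Set[int]]:
    """Find one connected component of size > s/2 among the dense points.

    A point is dense if its ball B(x, r) holds more than s/2 remaining
    points; two dense points are connected if their squared distance <= r.
    """
    ball = {x: {y for y in alive if d2[x][y] <= r} for x in alive}
    dense = {x for x in alive if len(ball[x]) > s / 2}
    seen: Set[int] = set()
    for start in sorted(dense):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:  # depth-first search restricted to dense points
            u = stack.pop()
            for v in ball[u] & dense:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > s / 2:
            return comp
    return None


def population_stage(d2: List[List[float]], beta: float,
                     opt: float) -> List[Set[int]]:
    """Return the list Q of components found for s = n, n-1, ..., 1."""
    n = len(d2)
    alive = set(range(n))
    Q: List[Set[int]] = []
    for s in range(n, 0, -1):
        r = beta * opt / (64 * s)
        # Step (b): drop points whose squared distance to Q is below 4r.
        covered = set().union(*Q)
        alive = {x for x in alive if all(d2[x][y] >= 4 * r for y in covered)}
        # Steps (c)-(d): add large components and remove B(T).
        while True:
            T = _large_component(d2, alive, r, s)
            if T is None:
                break
            Q.append(T)
            alive -= {x for x in alive if any(d2[x][y] <= 4 * r for y in T)}
    return Q


# Toy instance: two tight groups of three points on a line, far apart.
coords = [0.0, 0.1, 0.2, 100.0, 100.1, 100.2]
d2 = [[(a - b) ** 2 for b in coords] for a in coords]
components = population_stage(d2, beta=1600.0, opt=0.04)
```

On this toy input the radius becomes large enough at stage s = 5 for each group to form a dense component, so `components` holds the two groups; the Centers-Retrieving Stage would then average each component together with its B(T).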
1) Initialization Stage: Set $Q \leftarrow Q_{init}$.
2) Population Stage: For s = n, n − 1, n − 2, ..., 1 do:
   a) Set $r = \beta\,\mathrm{OPT}/(64s)$.
   b) Remove any point x such that $d^2(x, Q) < 4r$. (Here, $d(x, Q) = \min_{T \in Q;\ y \in T} d(x, y)$.)
   c) For any remaining data point x, denote the set of data points whose distance squared from x is at most r by B(x, r). Connect any two remaining points a and b if: (i) $d^2(a, b) \le r$, (ii) $|B(a, r)| > s/2$ and (iii) $|B(b, r)| > s/2$.
   d) Let T be a connected component of size > s/2. Then:
      i) Add T to Q. (That is, $Q \leftarrow Q \cup \{T\}$.)
      ii) Define the set $B(T) = \{x : d^2(x, y) \le 4r \text{ for some } y \in T\}$. Remove the points of B(T) from the instance.
3) Centers-Retrieving Stage: For any choice of k components $T_1, T_2, \ldots, T_k$ out of Q:
   a) Find the best center $c'_i$ for $T_i \cup B(T_i)$. That is, $c'_i = \mu(T_i \cup B(T_i)) = \frac{1}{|T_i \cup B(T_i)|} \sum_{x \in T_i \cup B(T_i)} x$.
   b) Partition all n points according to the nearest point among the k centers of the current k components.
   c) If a clustering of cost at most $(1 + \epsilon)\,\mathrm{OPT}$ is found, output these k centers and halt.

Figure 2. A PTAS for β-distributed instances of Euclidean k-means.

Lemma A.1 allows us to give the analogous claims to Claims 4.3 and 4.4. As before, call a component T good if it is contained within some target cluster $C_i^*$ and $T \cup B(T)$ contains all of the inner ring points of $C_i^*$. Otherwise, the component is called bad, provided it is not one of the initial centers present in $Q_{init}$. We now show that each cheap target cluster will have a single, unique, good component.

Claim A.2. Let $C_i^*$ be any cheap cluster in the target clustering. By stage $s = |C_i^*|$, the algorithm adds to Q a component T that contains a point from the inner ring of $C_i^*$.

Claim A.3. Let T be a good connected component added to Q, containing an inner ring point from cluster $C_i^*$. Then: (a) all points in T are of distance squared at most $\beta\,\mathrm{OPT}/(16|C_i^*|)$ from $c_i^*$, (b) $T \cup B(T)$ is fully contained in $C_i^*$, (c) the entire inner ring of $C_i^*$ is contained in $T \cup B(T)$, and (d) no other component $T' \ne T$ in Q contains an inner ring point from $C_i^*$.

As the proofs of Claims A.2 and A.3 are identical to those of Claims 4.3 and 4.4, we omit them.

Lemma A.4. We do not add to Q more than 1000/β bad components.
Proof: Consider any bad component T that we add to Q, and denote the stage in which we insert T into Q by s. So the size of this component is > s/2. Let y be an arbitrary point from T which belongs to cluster C in the optimal clustering. Let c be the center of C. We show that $d^2(c, y) > \beta\,\mathrm{OPT}/(500s)$. We divide into cases.

Case 1: C is a cheap cluster and $s \ge |C|$. Recall that T must contain more than $s/2 \ge |C|/2$ points, so it follows that T contains some point x that does not belong to C. β-stability gives that this point has distance $d^2(c, x) > \beta\,\mathrm{OPT}/|C|$, and we apply Lemma A.1 to deduce that all points in T are of distance squared at least $\beta\,\mathrm{OPT}/(4|C|)$ from c.

Case 2: C is a cheap cluster and $s < |C|$. In this case we have that the entire inner ring of C already belongs to some $T' \in Q$. Let $x \in T'$ be any inner ring point from C; we have that $d^2(c, x) \le \beta\,\mathrm{OPT}/(256|C|) \le \beta\,\mathrm{OPT}/(256s)$, while $d^2(x, y) > \beta\,\mathrm{OPT}/(16s)$. It follows that $d^2(c, y) \ge (3d(x, y)/4)^2 > \beta\,\mathrm{OPT}/(500s)$.

Case 3: C is an expensive cluster and $s > 2|C|$. We claim that $d^2(c, y) > \beta\,\mathrm{OPT}/(32|C|)$. If, by contradiction, we have that $d^2(c, y) \le \beta\,\mathrm{OPT}/(32|C|)$, then we show that the ball B(y, r) contains only points from C, yet it must contain $s/2 > |C|$ points. This is because each $p \in B(y, r)$ satisfies $d^2(c, p) \le (d(c, y) + d(y, p))^2 \le \left(\sqrt{\beta\,\mathrm{OPT}/(32|C|)} + \sqrt{\beta\,\mathrm{OPT}/(16s)}\right)^2 < \beta\,\mathrm{OPT}/|C|$.

Case 4: C is an expensive cluster and $s \le 2|C|$. In this case, from Fact 5.1 we know that $Q_{init}$ contains a good empirical center $c'$ for the expensive cluster C, in the sense that $\|c - c'\|^2 \le \beta\,\mathrm{OPT}/(512|C|) \le \beta\,\mathrm{OPT}/(256s)$. Then, similarly to Case 2 above, we have $d^2(y, c) \ge (d(y, c') - d(c, c'))^2 > \beta\,\mathrm{OPT}/(500s)$.

It follows that every point in T has a large distance from its center. Therefore, the s/2 points in this component contribute at least $\beta\,\mathrm{OPT}/1000$ to the k-means cost. Hence, we can have no more than 1000/β such bad components.

We now prove the main theorem.

Theorem A.5. The algorithm outputs a k-clustering whose cost is at most $(1 + \epsilon)\,\mathrm{OPT}$.

Proof: Using Claim A.3, it follows that there exists some choice of components which has good components for all the cheap clusters and good substitutes for the centers of the expensive clusters.
Fix that choice and consider a cluster $C_i^*$ with center $c_i^*$. If $C_i^*$ is an expensive cluster then from Section 5 we know that $Q_{init}$ contains a point $c'_i$ such that $d^2(c_i^*, c'_i) \le \frac{\beta\epsilon}{\beta + \epsilon} \cdot \frac{\mathrm{OPT}_i}{|C_i^*|}$. Hence, the cost paid by the points in $C_i^*$ will be at most $(1 + \epsilon)\,\mathrm{OPT}_i$. If $C_i^*$ is a cheap cluster then denote by T the good component that resides within $C_i^*$. Denote $T \cup B(T)$ by A, and $C_i^* \setminus A$ by B. Let
More informationPerhaps the method will give that for every e > U f() > p - 3/+e There is o o-trivial upper boud for f() ad ot eve f() < Z - e. seems to be kow, where
ON MAXIMUM CHORDAL SUBGRAPH * Paul Erdos Mathematical Istitute of the Hugaria Academy of Scieces ad Reu Laskar Clemso Uiversity 1. Let G() deote a udirected graph, with vertices ad V(G) deote the vertex
More informationConvergence results for conditional expectations
Beroulli 11(4), 2005, 737 745 Covergece results for coditioal expectatios IRENE CRIMALDI 1 ad LUCA PRATELLI 2 1 Departmet of Mathematics, Uiversity of Bologa, Piazza di Porta Sa Doato 5, 40126 Bologa,
More informationOn Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract
O Ifiite Groups that are Isomorphic to its Proper Ifiite Subgroup Jaymar Talledo Baliho Abstract Two groups are isomorphic if there exists a isomorphism betwee them Lagrage Theorem states that the order
More informationCS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1
CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()
More informationHow do we evaluate algorithms?
F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:
More informationCSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS)
CSC165H1, Witer 018 Learig Objectives By the ed of this worksheet, you will: Aalyse the ruig time of fuctios cotaiig ested loops. 1. Nested loop variatios. Each of the followig fuctios takes as iput a
More informationLecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions
U.C. Berkeley CS170 : Algorithms Midterm 1 Solutios Lecturers: Sajam Garg ad Prasad Raghavedra Feb 1, 017 Midterm 1 Solutios 1. (4 poits) For the directed graph below, fid all the strogly coected compoets
More information3D Model Retrieval Method Based on Sample Prediction
20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer
More informationRunning Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments
Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The
More informationUniversity of Waterloo Department of Electrical and Computer Engineering ECE 250 Algorithms and Data Structures
Uiversity of Waterloo Departmet of Electrical ad Computer Egieerig ECE 250 Algorithms ad Data Structures Midterm Examiatio ( pages) Istructor: Douglas Harder February 7, 2004 7:30-9:00 Name (last, first)
More informationSome non-existence results on Leech trees
Some o-existece results o Leech trees László A.Székely Hua Wag Yog Zhag Uiversity of South Carolia This paper is dedicated to the memory of Domiique de Cae, who itroduced LAS to Leech trees.. Abstract
More informationLecture 2: Spectra of Graphs
Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad
More informationMatrix Partitions of Split Graphs
Matrix Partitios of Split Graphs Tomás Feder, Pavol Hell, Ore Shklarsky Abstract arxiv:1306.1967v2 [cs.dm] 20 Ju 2013 Matrix partitio problems geeralize a umber of atural graph partitio problems, ad have
More informationThompson s Group F (p + 1) is not Minimally Almost Convex
Thompso s Group F (p + ) is ot Miimally Almost Covex Claire Wladis Thompso s Group F (p + ). A Descriptio of F (p + ) Thompso s group F (p + ) ca be defied as the group of piecewiseliear orietatio-preservig
More informationThe golden search method: Question 1
1. Golde Sectio Search for the Mode of a Fuctio The golde search method: Questio 1 Suppose the last pair of poits at which we have a fuctio evaluatio is x(), y(). The accordig to the method, If f(x())
More informationModule 8-7: Pascal s Triangle and the Binomial Theorem
Module 8-7: Pascal s Triagle ad the Biomial Theorem Gregory V. Bard April 5, 017 A Note about Notatio Just to recall, all of the followig mea the same thig: ( 7 7C 4 C4 7 7C4 5 4 ad they are (all proouced
More informationLecture 18. Optimization in n dimensions
Lecture 8 Optimizatio i dimesios Itroductio We ow cosider the problem of miimizig a sigle scalar fuctio of variables, f x, where x=[ x, x,, x ]T. The D case ca be visualized as fidig the lowest poit of
More informationAlgorithms Chapter 3 Growth of Functions
Algorithms Chapter 3 Growth of Fuctios Istructor: Chig Chi Li 林清池助理教授 chigchi.li@gmail.com Departmet of Computer Sciece ad Egieerig Natioal Taiwa Ocea Uiversity Outlie Asymptotic otatio Stadard otatios
More informationCHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs
CHAPTER IV: GRAPH THEORY Sectio : Itroductio to Graphs Sice this class is called Number-Theoretic ad Discrete Structures, it would be a crime to oly focus o umber theory regardless how woderful those topics
More informationCharacterizing graphs of maximum principal ratio
Characterizig graphs of maximum pricipal ratio Michael Tait ad Josh Tobi November 9, 05 Abstract The pricipal ratio of a coected graph, deoted γg, is the ratio of the maximum ad miimum etries of its first
More informationAnalysis of Algorithms
Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite
More informationCS 683: Advanced Design and Analysis of Algorithms
CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,
More informationAnalysis of Documents Clustering Using Sampled Agglomerative Technique
Aalysis of Documets Clusterig Usig Sampled Agglomerative Techique Omar H. Karam, Ahmed M. Hamad, ad Sheri M. Moussa Abstract I this paper a clusterig algorithm for documets is proposed that adapts a samplig-based
More informationMinimum Spanning Trees
Presetatio for use with the textbook, lgorithm esig ad pplicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 0 Miimum Spaig Trees 0 Goodrich ad Tamassia Miimum Spaig Trees pplicatio: oectig a Network Suppose
More informationSD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters.
SD vs. SD + Oe of the most importat uses of sample statistics is to estimate the correspodig populatio parameters. The mea of a represetative sample is a good estimate of the mea of the populatio that
More informationOn Alliance Partitions and Bisection Width for Planar Graphs
Joural of Graph Algorithms ad Applicatios http://jgaa.ifo/ vol. 17, o. 6, pp. 599 614 (013) DOI: 10.7155/jgaa.00307 O Alliace Partitios ad Bisectio Width for Plaar Graphs Marti Olse 1 Morte Revsbæk 1 AU
More informationMinimum Spanning Trees. Application: Connecting a Network
Miimum Spaig Tree // : Presetatio for use with the textbook, lgorithm esig ad pplicatios, by M. T. oodrich ad R. Tamassia, Wiley, Miimum Spaig Trees oodrich ad Tamassia Miimum Spaig Trees pplicatio: oectig
More informationMathematical Stat I: solutions of homework 1
Mathematical Stat I: solutios of homework Name: Studet Id N:. Suppose we tur over cards simultaeously from two well shuffled decks of ordiary playig cards. We say we obtai a exact match o a particular
More informationExamples and Applications of Binary Search
Toy Gog ITEE Uiersity of Queeslad I the secod lecture last week we studied the biary search algorithm that soles the problem of determiig if a particular alue appears i a sorted list of iteger or ot. We
More informationSolution printed. Do not start the test until instructed to do so! CS 2604 Data Structures Midterm Spring, Instructions:
CS 604 Data Structures Midterm Sprig, 00 VIRG INIA POLYTECHNIC INSTITUTE AND STATE U T PROSI M UNI VERSI TY Istructios: Prit your ame i the space provided below. This examiatio is closed book ad closed
More informationAverage Connectivity and Average Edge-connectivity in Graphs
Average Coectivity ad Average Edge-coectivity i Graphs Jaehoo Kim, Suil O July 1, 01 Abstract Coectivity ad edge-coectivity of a graph measure the difficulty of breakig the graph apart, but they are very
More informationOn Nonblocking Folded-Clos Networks in Computer Communication Environments
O Noblockig Folded-Clos Networks i Computer Commuicatio Eviromets Xi Yua Departmet of Computer Sciece, Florida State Uiversity, Tallahassee, FL 3306 xyua@cs.fsu.edu Abstract Folded-Clos etworks, also referred
More informationWhat are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs
What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure
More informationEvaluation scheme for Tracking in AMI
A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:
More informationOctahedral Graph Scaling
Octahedral Graph Scalig Peter Russell Jauary 1, 2015 Abstract There is presetly o strog iterpretatio for the otio of -vertex graph scalig. This paper presets a ew defiitio for the term i the cotext of
More informationThe Magma Database file formats
The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,
More informationRelationship between augmented eccentric connectivity index and some other graph invariants
Iteratioal Joural of Advaced Mathematical Scieces, () (03) 6-3 Sciece Publishig Corporatio wwwsciecepubcocom/idexphp/ijams Relatioship betwee augmeted eccetric coectivity idex ad some other graph ivariats
More informationc-dominating Sets for Families of Graphs
c-domiatig Sets for Families of Graphs Kelsie Syder Mathematics Uiversity of Mary Washigto April 6, 011 1 Abstract The topic of domiatio i graphs has a rich history, begiig with chess ethusiasts i the
More informationAnalysis of Algorithms
Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms
More informationA Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions
Proceedigs of the 10th WSEAS Iteratioal Coferece o APPLIED MATHEMATICS, Dallas, Texas, USA, November 1-3, 2006 316 A Geeralized Set Theoretic Approach for Time ad Space Complexity Aalysis of Algorithms
More informationSpanning Maximal Planar Subgraphs of Random Graphs
Spaig Maximal Plaar Subgraphs of Radom Graphs 6. Bollobiis* Departmet of Mathematics, Louisiaa State Uiversity, Bato Rouge, LA 70803 A. M. Frieze? Departmet of Mathematics, Caregie-Mello Uiversity, Pittsburgh,
More informationThroughput-Delay Scaling in Wireless Networks with Constant-Size Packets
Throughput-Delay Scalig i Wireless Networks with Costat-Size Packets Abbas El Gamal, James Mamme, Balaji Prabhakar, Devavrat Shah Departmets of EE ad CS Staford Uiversity, CA 94305 Email: {abbas, jmamme,
More information