
Two Topics in Applied Algorithmics

By Patrick R. Morin

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfilment of the requirements for the degree of Master of Computer Science

Ottawa-Carleton Institute for Computer Science
School of Computer Science
Carleton University
Ottawa, Ontario
January 1998

© Copyright 1998, Patrick R. Morin

The undersigned hereby recommend to the Faculty of Graduate Studies and Research acceptance of the thesis, Two Topics in Applied Algorithmics, submitted by Patrick R. Morin

Dr. Evangelos Kranakis (Director, School of Computer Science)
Dr. Jörg-Rüdiger Sack (Thesis Supervisor)

Carleton University
January 1998

Abstract

This thesis examines two largely unrelated problems in applied algorithmics, motivated by the search for efficient geometric algorithms. In the first part of the thesis, we consider the problem of finding efficient parallel algorithms for heterogeneous parallel computers, i.e., parallel computers in which different processors have different computational potential. To this end, we define a formal computational model for heterogeneous systems and develop algorithms for commonly used communication operations. The result is that many existing parallel algorithms which use these communication operations can be adapted to our model with little or no modification.

In the second part of the thesis we consider the problem of geometric models which allow for varying levels of detail. To this end, we extend the progressive mesh representation introduced by Hoppe. The main technical contribution of this part is an efficient scheme for refining only selected regions of a progressive mesh. Using this scheme, we develop important applications in the fields of computational geometry and geographic information systems.

Both parts of the thesis are supported by experimental results which show that our algorithms are of considerable practical relevance.

Acknowledgements

A number of people have been instrumental in the completion of this thesis, and I would like to acknowledge them here.

While writing this thesis I received funding from the Natural Sciences and Engineering Research Council of Canada and Almerco. It goes without saying that this was greatly appreciated.

Jörg Sack has proven to be an excellent supervisor. There are many reasons for this, but the most important of these is his dedication to, and willingness to work for, his students.

Anil Maheshwari has provided input on many aspects of this thesis. It is because of his comments and questions that many of the ideas in this thesis are developed to the extent that they are.

While writing part of this thesis, I was lucky to have been a visitor at the Heinz Nixdorf Institut at the Universität-GH Paderborn. During this time, Ben Juurlink gave comments which significantly improved the presentation of the first part of this thesis.

Finally I would like to thank Silvia Götz for having supported me throughout the writing of this thesis, both as a colleague and close friend. For this I am especially grateful.

Contents

Abstract
Acknowledgements

I Coarse Grained Parallel Computing on Heterogeneous Systems
1 Introduction
    1.1 Comparison with Related Work
2 Coarse Grained Parallel Computing Models
    2.1 Preliminaries
    2.2 The Bulk Synchronous Parallel Model
    2.3 The Coarse Grained Multicomputer Model
    2.4 A Simple Example: Prefix Sums
3 A Heterogeneous Computing Model
    3.1 The Heterogeneous Coarse Grained Multicomputer
    3.2 A Simple Example: Prefix Sums
4 Communication Patterns
    4.1 Random-Sample
    4.2 Random-Assign
    4.3 Linear-Partition
    4.4 PRAM-Simulation
    4.5 Circulate
5 HCGM Algorithms
    Priority Queue Operations
    Lower Envelope
    List Ranking
    Matrix Multiplication
    Empirical Results
6 Conclusions

II Multiresolution Surface Modeling
7 Introduction
8 Survey of Existing Work
    Tree-Based Schemes
    DAG-Based Schemes
    The Progressive Mesh Representation
9 Selective Refinement of Progressive Meshes
    Computing the Region of Influence
    Retrieving the Vertex Splits
    Sorting and Applying the Vertex Splits
    Analysis and Comments
    Extensions and Implementation Notes
    Dealing with Missing Neighbours
    Invalid Triangles in Query Region
    Empirical Results
10 Applications
    10.1 Point Location
    10.2 Isoline Extraction
    Visibility Queries
    External Memory Progressive Meshes
11 Conclusions

Bibliography

A High Probability Bounds
    A.1 The Binomial Distribution
    A.2 Chernoff's Bounds

List of Figures

1.1 Performance of sorting algorithm with slow processors
2.1 An example h-relation
2.2 Computing the Prefix Sums
3.1 An example of an HCGM(m, p, s)
The Lower Envelope of a set of line segments
Example execution of LowerEnvelope algorithm
An example of the list ranking algorithm of Caceres et al.
Partitioning matrices A, B, and C in a 4 processor system
Performance of CGM and HCGM versions of Sample Sort
Performance of CGM and HCGM versions of Floyd-Warshall algorithm
An example of a triangle mesh
A triangle terrain at 3 different levels of resolution (left) and its corresponding shaded images (right)
An example of a tree-based hierarchy
An example of a DAG-based hierarchy
Some possible transformations in a MultiTriangulation
The edge collapse transformation and its inverse, the vertex split
A sequence of vertex splits and its associated dependency graph
An example of the rank and range numbering
An example of the case where $v_l' = v_r$
9.4 An example in which a triangle not in M intersects q
Performance of selective refinement algorithm for medium and large query regions
Performance of selective refinement algorithm for small query regions
Expressing a point location query as a selective refinement query
Expressing an isoline query as a selective refinement query
Visibility on a TIN

Part I

Coarse Grained Parallel Computing on Heterogeneous Systems

Chapter 1

Introduction

In recent years, parallel computing has been increasing in popularity. Individuals with limited budgets can now build workstation clusters from off-the-shelf processing components and interconnection networks [11, 52]. High speed networks are being used to interconnect traditional supercomputers in order to direct large amounts of computing power at Grand Challenge problems [8]. Even traditional supercomputers usually consist of a very fast workstation host connected to a number of slower in-the-box processors.

The three situations above, which cover nearly all modern parallel computing systems, are all potential examples of heterogeneous systems, i.e., systems in which different processors have different computational potential. In the case of workstation clusters, the processing components may differ because the system was grown incrementally and newly added processors are more modern than the originals. The same may be true in the case of supercomputer clusters, or the supercomputers may even have come from different manufacturers. Finally, in the case of traditional supercomputers, it may be beneficial to use the host processor, particularly for sequential portions of computations.

Traditionally, there have been two approaches to dealing with the varying processor speeds in such systems. The first and simplest approach, which we call the ostrich approach, is to simply ignore the differences in processor speeds and use standard parallel algorithms. In many cases, this leads to the slowest processor becoming a bottleneck, and effectively reduces performance to that of a machine in which all processors are equally slow. This can result in decreased performance when slow processors are added to a system. Figure 1.1 shows an example of a sorting algorithm in which the overall performance of a system decreases with the addition of slow processors. The first seven processors are fast processors, while all other processors are slow processors. Note in particular the decrease in performance when the first slow processor is added to the system.

[Figure 1.1: Performance of sorting algorithm with slow processors. Axes: Number of Processors (horizontal), Speed in items/sec (vertical).]

The second approach, which we call the overpartitioning approach, is to break the problem into small subproblems, so that there are many more subproblems than processors, and assign subproblems to processors whenever they become idle, either by having a master processor assign all subproblems, or by having processors request subproblems from other processors when they become idle. This approach also has its disadvantages. Decomposing the problem and merging the solutions to subproblems is not always easy, nor is coordinating the processors, and these tasks have an overhead associated with them. Even worse, because of the high latency of communication networks, many processor cycles are wasted waiting for the network to deliver subproblems. In most cases, a healthy dose of performance testing, algorithm analysis, and common sense is required to determine the optimum subproblem size, and this procedure must be repeated when the system configuration changes.

The approach taken in this thesis is to modify fast parallel algorithms which have been shown to be efficient in homogeneous systems to run efficiently on heterogeneous systems. The class of algorithms we choose as our starting point is the class of coarse grained parallel (CGP) algorithms. Examples of such algorithms include algorithms for the bulk synchronous parallel (BSP) [61], Coarse Grained Multicomputer (CGM) [21], and LogP [15] models of parallel computation. In these models a parallel computer is composed of $p$ processors and is being used to solve a problem of size $n$, where $p \ll n$. The basic communication operation is the h-relation, an all-to-all communication operation in which no processor is the source or destination of more than $h$ words. Algorithms based on these models work in supersteps, where a superstep consists of local computation, followed by global communication (routing an h-relation). The goal of algorithm design is to simultaneously minimize communication and computation.

The heterogeneous networks described above present a problem for standard CGP algorithms, since the slow processors in the network become a bottleneck for the computation. This is due to the fact that CGP algorithms are designed to distribute computation load evenly across processors. However, through careful modification, these algorithms can be made to distribute computation load according to processor speed without sacrificing efficiency.
This approach has the obvious advantage over the ostrich approach that it balances the computation according to processor speed and therefore improves performance (Chapter 5 bears this out with empirical evidence). This approach has two advantages over the overpartitioning approach. The first is that it minimizes the effects of latency (the algorithms described in Chapter 5 perform only a constant number of communication operations). The second is that it doesn't require extensive testing and measurement to determine optimum algorithm parameters. In fact, the only parameters used by the algorithms are the processor speeds.

The main contributions of this part of the thesis are the following:

1. The definition of a parallel computation model called the heterogeneous coarse grained multicomputer (HCGM) which takes into account varying processor speeds. The model is simple enough to be easy to use, accurate enough to allow for the development of efficient algorithms, and portable enough to allow these algorithms to run efficiently on a wide variety of parallel architectures.
2. The identification of a number of communication patterns most commonly used in CGP algorithms and efficient HCGM algorithms for their implementation. These algorithms form the basis for translating existing CGP algorithms into HCGM algorithms.
3. A number of algorithms for the HCGM model. These algorithms are arrived at by describing existing CGM and BSP algorithms in terms of the previously mentioned communication patterns.
4. An implementation of these ideas. The implementation consists of a library of the previously mentioned communication patterns and some algorithms.

1.1 Comparison with Related Work

In order to differentiate this work from other research on heterogeneous parallel computing, this section compares and contrasts this work with previous work in this area.

The topic of data partitioning in heterogeneous systems with simple fixed communication patterns is addressed in [14, 51], and semi-automatic methods of choosing the best partitioning scheme and parameters are described. Methods for the compile time scheduling of various types of parallel loops are described in [12]. The results in this thesis go beyond these in that the problems addressed have much less structure than simple stenciling operations on 2D grids or uniform parallel loops whose communication patterns can be analyzed at compile time. In Chapter 5 algorithms are presented for sorting, median finding, and a number of computational geometry problems.

Methods for dynamic load balancing such as those described in [54, 45, 64] can also be applied to heterogeneous systems. All these methods fall into the overpartitioning strategy category. The advantages of our strategy over such overpartitioning strategies have been described above. These are the minimization of the effects of latency and the simplicity of the algorithm parameters.

In [66] a mathematical model of a network of workstations is described. In [65], the authors describe a stochastic performance prediction methodology for this model based on the task graph of the parallel application. Although this model is an accurate predictor of performance, it is not clear that the model leads to the development of efficient algorithms. In fact, in the matrix multiplication tests described in [65], a 12 processor configuration actually performs worse than a 2 processor configuration. The difference between the model in [66, 65] and the HCGM model is that the HCGM model is not intended to predict exact running times of parallel algorithms on parallel machines. Rather, it is designed to distinguish between "good" and "bad" algorithms, i.e., if the model says that algorithm A is better than algorithm B, then A should perform better than B when implemented. This makes the HCGM model simpler, which in turn leads to a much simpler algorithm analysis procedure.
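The bottleneck effect that motivates this chapter can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not code from the thesis: it assumes that work is perfectly divisible and linear in the number of items (real sorting is not), ignores all communication, and uses hypothetical speeds of 4 for fast processors and 1 for slow ones. The function names are illustrative only.

```python
def even_split_time(n, speeds):
    """Completion time when every processor gets n/p units of work
    (the "ostrich" approach): the slowest processor finishes last."""
    share = n / len(speeds)
    return max(share / s for s in speeds)


def proportional_split_time(n, speeds):
    """Completion time when processor i gets s_i * n / s units of work,
    where s is the sum of all speeds: every processor finishes together."""
    total = sum(speeds)
    return max((s * n / total) / s for s in speeds)


if __name__ == "__main__":
    n = 5600                      # hypothetical number of work units
    fast_only = [4] * 7           # seven fast processors (assumed speed 4)
    with_slow = fast_only + [1]   # add one slow processor (assumed speed 1)

    for label, speeds in (("7 fast", fast_only), ("7 fast + 1 slow", with_slow)):
        print(label,
              "even:", even_split_time(n, speeds),
              "proportional:", proportional_split_time(n, speeds))
```

Under these assumptions, adding the slow processor makes the evenly split computation slower than before, while the speed-proportional split still benefits slightly from the extra processor.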

Chapter 2

Coarse Grained Parallel Computing Models

In this chapter, two coarse grained parallel computing models are described. However, before beginning with a description of these models we suggest some criteria which any coarse grained parallel computing model should strive for.

1. Simplicity. Any model should be simple enough that analyses under the model are simple to derive and understand.
2. Accuracy. The model should be an accurate reflection of real life in that algorithms which are efficient according to the model should perform well when implemented on actual systems.
3. Portability. The model should represent a wide range of parallel architectures.

Clearly, some of these goals are in conflict with each other. On one side is the goal of accuracy, which seemingly requires a complex parameterized model, and on the other side is the goal of simplicity, which requires a model with few parameters. The goal of portability is orthogonal to these two in that it can be achieved by a simple, very abstract model through the model's abstractness, or by a complex, highly parameterized model through the sheer number of parameters. The two models presented in this chapter represent a tradeoff between these two sides which achieves these goals to varying degrees.

2.1 Preliminaries

The two models presented in this chapter have some features and terminology in common. We take the time to introduce these here.

In both these models computation proceeds in supersteps consisting entirely of local computation, or entirely of global communication. These are respectively referred to as computation supersteps and communication supersteps. A parallel computation interleaves the two types of supersteps, and all processors must have completed the previous computation superstep before a communication superstep can proceed.

The complexity of a computation superstep is measured by the maximum amount of computation done by any processor during the superstep. Thus the "slowest"¹ processor dominates the computation time of the superstep. The only communication operation supported by these two models is the h-relation, an all-to-all communication operation in which each processor sends at most $h$ words of data and receives at most $h$ words of data. An example h-relation is illustrated in Figure 2.1.

¹ In this case "slowest" means the processor which takes the longest to complete its computation.

[Figure 2.1: An example h-relation with $h = 4$ and $p = 4$.]

When discussing the performance of algorithms, we will often make use of a coarse grained assumption, $n \gg p$, i.e., the size of the problem is significantly larger than the number of processors. One possible interpretation of "significantly larger" is

$$\frac{n}{p} \ge p .$$

This leads to the following implications, which can be arrived at through simple algebraic manipulation:

$$\frac{n}{p} \ge p \iff p^2 \le n \iff p \le \sqrt{n} \implies \sqrt{n} \le \frac{n}{p} .$$

The main advantage of the coarse grained assumption is that it is often possible to take a problem of size $n$, "compress" it to produce a problem of size $p$, and use the solution of the compressed problem to solve the original problem. The coarse grained assumption helps here since the problem of size $p$ is small enough to be solved sequentially on a single processor. This idea will become more concrete in Section 2.4 where it will be applied to the problem of computing the prefix sums of $n$ elements.

2.2 The Bulk Synchronous Parallel Model

The bulk synchronous parallel (BSP) computer, defined by Valiant in [61], consists of a set of identical processing elements, each with their own (non-shared) local memory, interconnected by a communication network capable of routing an h-relation for any value of $h$. A BSP computer is characterized by three parameters: $p$, the number of processors, $g$, the ratio of processor speed to the bandwidth of the communication network, and $L$, the minimum time between computation supersteps.

The $g$ and $L$ parameters are used to measure the cost of communication of a BSP algorithm. The $g$ parameter penalizes algorithms for their use of communication bandwidth. The $L$ parameter represents network latency, and penalizes algorithms for the number of communication operations used. Routing a single h-relation on a BSP computer incurs a communication cost of $O(gh + L)$.

The performance of a BSP algorithm is given by its communication and computation costs. As an example, consider a BSP algorithm which has $\lambda$ communication and $\lambda$ computation supersteps. Assume that each communication superstep consists of an h-relation with $h = n/p$, and that each computation superstep uses $O(n/p)$ local computation per processor. Then the running time of the algorithm is given by

$$O\left(\lambda\frac{n}{p} + \lambda g\frac{n}{p} + \lambda L\right).$$

The first term in this expression represents the computation time, while the second two represent the communication time.

2.3 The Coarse Grained Multicomputer Model

The coarse grained multicomputer (CGM) model was introduced by Dehne et al. [21], and a number of algorithms have been defined with respect to this model [9, 10, 20, 21, 22, 23]. A coarse grained multicomputer, CGM$(m, p)$, consists of $p$ identical processors, labelled $P_0, \ldots, P_{p-1}$, each with $\Theta(m/p)$ local random access memory. These processors are interconnected by a communication network capable of routing an h-relation with $h = O(m/p)$.

The performance of a CGM algorithm is measured in terms of the amount of local computation performed and the number of supersteps. Both of these quantities can be functions of $n$ and $p$. The following observation (a version of which appears in [10]) relates CGM algorithms to BSP algorithms:

Observation 1. Any CGM$(m, p)$ algorithm which uses $O(\lambda)$ supersteps and $O(T(n, p))$ computation time is also a BSP algorithm with running time $O\left(T(n, p) + \lambda g\frac{m}{p} + \lambda L\right)$.

In the CGM model, algorithms are classified based on $\lambda$, the number of supersteps in the algorithm. The three classes, in order of desirability, are as follows:

1. $\lambda = O(1)$
2. $\lambda = O(f(p))$, with performance degrading as $f$ increases.
3. $\lambda = O(f(n))$, with performance degrading as $f$ increases.
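The cost expressions of Sections 2.2 and 2.3 can be turned into a small calculator. The sketch below is an illustration only: it assumes that all constants hidden by the $O$-notation are 1, and the function names and example parameter values are hypothetical rather than taken from the thesis.

```python
def h_relation_cost(h, g, L):
    """BSP cost of routing one h-relation: g*h + L (constants dropped)."""
    return g * h + L


def bsp_time_of_cgm_algorithm(T, supersteps, m, p, g, L):
    """BSP running time implied by Observation 1 for a CGM(m, p) algorithm.

    T          -- local computation time T(n, p)
    supersteps -- number of supersteps used by the CGM algorithm
    Each superstep may route an h-relation with h = m/p.
    """
    return T + supersteps * h_relation_cost(m / p, g, L)


def satisfies_coarse_grained_assumption(n, p):
    """One interpretation of n >> p used in Section 2.1: n/p >= p."""
    return n >= p * p


if __name__ == "__main__":
    # Hypothetical machine: 4 processors, g = 2, L = 10.
    print(h_relation_cost(h=4, g=2, L=10))
    print(bsp_time_of_cgm_algorithm(T=100, supersteps=3, m=1000, p=4, g=2, L=10))
    print(satisfies_coarse_grained_assumption(n=1000, p=4))
```

Because the latency term is multiplied by the number of supersteps, an algorithm with a constant number of supersteps pays the latency only a constant number of times, which is why the first class in the list above is the most desirable.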

2.4 A Simple Example: Prefix Sums

This section describes a coarse grained parallel algorithm for one of the most common operations in parallel algorithms: computing the prefix sums of $n$ elements. The algorithm is analyzed both in the CGM and BSP models.

The prefix sums of $n$ elements $x_0, x_1, \ldots, x_{n-1}$ are defined as the sequence whose $i$th element, for $0 \le i \le n-1$, is $x_0 \oplus x_1 \oplus \cdots \oplus x_i$, where $\oplus$ is any associative operator.

One method of computing the prefix sums of $n$ elements on a coarse grained parallel computer is to first compute locally the prefix sums of the elements stored at each processor and then merge these values with other processors using the standard EREW-PRAM prefix sums algorithm (see e.g. [41]). Using this approach, one can devise an algorithm that uses $O(\log p)$ supersteps and $O(n/p)$ local computation per processor.

However, a more efficient algorithm can be obtained for a restricted range of parameters by using a coarse grained assumption. The algorithm proceeds as follows, and an example of its execution is given in Figure 2.2.

[Figure 2.2: Computing the prefix sums of 20 elements using 4 processors. Panels: Step 1; Steps 2 and 3; Step 4.]

Prefix-Sum()

1. Each processor locally computes the prefix sums of its $n/p$ input elements.
2. Each processor, $P_i$, broadcasts the total sum of its input elements to all $P_j$, where $i < j \le p-1$, in a single communication superstep.
3. Each processor, $P_i$, computes the sum of the (at most $p-1$) input elements received in Step 2.
4. Each processor computes its final portion of the prefix sums by adding the value computed in Step 3 to each of the values computed in Step 1.

The analysis of this algorithm under the CGM$(n, p)$ model is quite simple. Steps 1 and 4 use $O(n/p)$ computation time. Step 2 can be implemented as an h-relation with $h = p \in O(n/p)$. Step 3 uses $O(p) \subseteq O(n/p)$ computation time. Therefore the algorithm uses $O(n/p)$ computation time and $O(1)$ supersteps on a CGM$(n, p)$.

By applying Observation 1 one can derive a BSP analysis of this algorithm of $O\left(\frac{n}{p} + g\frac{n}{p} + L\right)$. Although this analysis is correct, it is not as tight as possible. A tighter analysis can be obtained as follows: Step 1, Step 3, and Step 4 use $O(n/p)$ local computation per processor. Step 2 is an h-relation with $h = p$, and therefore takes $O(gp + L)$ time, yielding an overall running time of $O\left(\frac{n}{p} + gp + L\right)$ for the entire algorithm.

These analyses point out the relative advantages of these two models. The main advantage of the CGM model is its simplicity; the expressions derived for running times of algorithms are simple to derive and understand. Analyses under the BSP model, on the other hand, are usually more complicated. Often BSP algorithms have restrictions on $g$ and $L$ as well as $n$ and $p$, which makes these algorithms less portable than CGM algorithms. The main advantage of the BSP model is that it rewards algorithms which are bandwidth efficient via the $g$ parameter, whereas the CGM model does not, making the BSP model more accurate, and possibly leading to algorithms which make more efficient use of bandwidth.
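The four steps of the coarse grained prefix sums algorithm can be simulated sequentially to check that they produce the right values. The sketch below is not the thesis implementation: it models the $p$ processors as a list of non-empty Python lists, replaces the communication superstep of Step 2 with ordinary list slicing, and uses a hypothetical function name.

```python
from functools import reduce
from itertools import accumulate
import operator


def cgm_prefix_sums(blocks, op=operator.add):
    """Sequentially simulate the four-step algorithm.

    blocks[i] holds the (non-empty) input of processor P_i;
    op is any associative binary operator.
    """
    # Step 1: every processor computes the prefix sums of its own block.
    local = [list(accumulate(block, op)) for block in blocks]

    # Step 2: P_i sends its block total to every P_j with j > i,
    # so P_j receives the totals of P_0, ..., P_{j-1}.
    totals = [sums[-1] for sums in local]
    received = [totals[:j] for j in range(len(blocks))]

    result = []
    for sums, incoming in zip(local, received):
        if not incoming:          # P_0 receives nothing
            result.append(sums)
            continue
        # Step 3: combine the received totals into a single offset.
        offset = reduce(op, incoming)
        # Step 4: apply the offset on the left of each local prefix sum,
        # which keeps the result correct for non-commutative operators.
        result.append([op(offset, value) for value in sums])
    return result


if __name__ == "__main__":
    print(cgm_prefix_sums([[1, 2, 3], [4, 5], [6]]))
```

Keeping the offset as the left operand is a deliberate choice: the definition only requires the operator to be associative, so the simulation does not assume commutativity.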
From the point of view of machine architecture, the CGM model simplifies the BSP parameters by working under the assumption that the latency, $L$, is very high and the ratio of processor speed to bandwidth, $g$, is very low. Under this assumption, the communication cost is dominated by the latency, and counting supersteps is a reasonable method of determining communication cost.

In the remainder of this thesis, the CGM model will be used as a starting point for a heterogeneous coarse grained parallel computing model, and the resulting model will be used to analyze all algorithms presented. This decision was made in order to simplify the analysis of the algorithms presented so that the discussion can focus on the issues associated with heterogeneous systems rather than on the details of analyzing algorithms in the BSP model. It is worthwhile noting, however, that the modifications introduced to the CGM model to arrive at the heterogeneous CGM model could also be introduced into the BSP model, and the algorithms presented for the heterogeneous CGM model are also algorithms for this heterogeneous BSP model.

Chapter 3

A Heterogeneous Computing Model

This chapter discusses a generalization of the CGM model described in Chapter 2 which takes the presence of heterogeneous processors into account. This model, called the heterogeneous coarse grained multicomputer (HCGM) model, maintains the simplicity of the CGM model while providing a means of modelling the effects of heterogeneous processors such as the anomaly described in Chapter 1.

3.1 The Heterogeneous Coarse Grained Multicomputer

A heterogeneous coarse grained multicomputer HCGM$(m, p, s)$ consists of $p$ possibly heterogeneous processors labelled $P_0, \ldots, P_{p-1}$. The value $s = \sum_{i=0}^{p-1} s_i$ represents the total speed of the parallel machine, where $s_i$ represents the speed of $P_i$ and is an integer. Each processor, $P_i$, can perform $w$ units of work in $w/s_i$ time units. Each processor knows the values of $s_0, \ldots, s_{p-1}$ as well as the value of $s$. For conciseness, we define $s_{\max} = \max\{s_i : 0 \le i \le p-1\}$ and $s_{\min} = \min\{s_i : 0 \le i \le p-1\}$, i.e., $s_{\max}$ and $s_{\min}$ are the speeds of the fastest and slowest processors, respectively. Similarly, we define $P_{\max} = P_{\min\{i : s_i = s_{\max}\}}$ and $P_{\min} = P_{\min\{i : s_i = s_{\min}\}}$.
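The derived quantities just defined are simple to compute from the list of processor speeds. The following sketch is an illustration under stated assumptions, not part of the thesis: the class name `HCGM` and its method names are hypothetical, and the speeds 1, 2, 1, 2 with $m = 1000$ are example values chosen to match the machine shown in Figure 3.1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HCGM:
    """Parameters of an HCGM(m, p, s) machine (hypothetical helper class)."""
    m: int                    # total memory
    speeds: Tuple[int, ...]   # integer speeds s_0, ..., s_{p-1}

    @property
    def p(self) -> int:
        return len(self.speeds)

    @property
    def s(self) -> int:
        """Total speed: the sum of all s_i."""
        return sum(self.speeds)

    @property
    def s_max(self) -> int:
        return max(self.speeds)

    @property
    def s_min(self) -> int:
        return min(self.speeds)

    @property
    def p_max(self) -> int:
        """Index of P_max: the lowest-indexed fastest processor."""
        return self.speeds.index(self.s_max)

    @property
    def p_min(self) -> int:
        """Index of P_min: the lowest-indexed slowest processor."""
        return self.speeds.index(self.s_min)

    def work_time(self, i: int, w: float) -> float:
        """Time for P_i to perform w units of work: w / s_i."""
        return w / self.speeds[i]


if __name__ == "__main__":
    machine = HCGM(m=1000, speeds=(1, 2, 1, 2))
    print(machine.p, machine.s, machine.p_max, machine.p_min)
    print(machine.work_time(1, 10))
```

`tuple.index` returns the first matching position, which mirrors the $\min\{i : \ldots\}$ tie-breaking rule in the definitions of $P_{\max}$ and $P_{\min}$.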

That is, $P_{\max}$ is a representative fastest processor, and $P_{\min}$ is a representative slowest processor. An example of an HCGM$(m, p, s)$ is shown in Figure 3.1.

[Figure 3.1: An example of an HCGM$(m, p, s)$ with $m = 1000$, $p = 4$, $s = 6$, $P_{\max} = P_1$, $s_{\max} = 2$, $P_{\min} = P_0$, and $s_{\min} = 1$. The processor speeds shown are $s_0 = 1$, $s_1 = 2$, $s_2 = 1$, $s_3 = 2$.]

Each processor, $P_i$, in an HCGM$(m, p, s)$ has $\Theta\left(\max\left\{\frac{m}{p}, \frac{s_i}{s}m\right\}\right)$ local memory. The $p$ processors of an HCGM$(m, p, s)$ are interconnected by a network capable of routing any all-to-all communication in which the total amount of data exchanged is $O(m)$. However, these communication operations incur a penalty in computation time. If $P_i$ is the source (resp. destination) of $b$ words of information, then $P_i$ incurs a penalty in computation time of $b/s_i$. This represents the local computation needed to pack (resp. unpack) messages into (resp. from) buffers. For example, the computation time associated with routing an h-relation is $\max\{h/s_i : 0 \le i \le p-1\} = h/s_{\min}$.

Because the memory and communication speeds of the processors are proportional to the processor speeds, HCGM algorithms can take advantage of faster processors by having them process and communicate more data. If, as in the CGM$(m, p)$ model, each processor has only $O(m/p)$ memory, it may not be possible to improve the performance of algorithms and avoid the anomaly described in Chapter 1. Such is the case when the problem size, $n$, is equal to the size of the total memory, $m$.

We assume that the input to an HCGM$(m, p, s)$ algorithm is initially distributed in a load balanced manner, that is, each $P_i$ initially holds $\frac{s_i}{s}n$ input elements. At this point we note that the HCGM$(m, p, s)$ model is equivalent to the CGM$(m, p)$ model when $s_0 = s_1 = \cdots = s_{p-1} = 1$.

Like a CGM algorithm, the performance of an HCGM algorithm is measured in terms of computation time and the number of supersteps. Both of these quantities can be functions of $n$, $p$, $s$, and $s_0, \ldots, s_{p-1}$. However, rather than measuring the amount of local computation, $w$, it is the local computation time, $w/s_i$, that is measured. Ideally, an HCGM$(m, p, s)$ algorithm gives a speedup of $s$ when compared to a uniprocessor machine with unit speed running the fastest sequential algorithm for the same problem. If at all possible, this speedup should be independent of the values of $s_0, \ldots, s_{p-1}$.

One possible approach to obtaining HCGM algorithms directly from BSP and CGM algorithms is to have each processor, $P_i$, simulate $s_i / \gcd(s_0, \ldots, s_{p-1})$ virtual CGM processors, where $\gcd(s_0, \ldots, s_{p-1})$ is the greatest common divisor of $s_0, \ldots, s_{p-1}$. There are at least three problems with this approach.

1. The overhead associated with automatically simulating virtual processors can have a significant negative impact on real running times. These overheads can be avoided by having implementors code the simulation by hand, but this adds complexity to the already difficult task of implementing parallel algorithms.
2. In some cases the number of supersteps in a CGM algorithm is a function of the number of processors, so increasing the number of processors by creating virtual processors increases the number of supersteps.
3. Many coarse grained parallel algorithms work on some version of the coarse grained assumption (see Section 2.1), and introducing virtual processors may violate this assumption.

3.2 A Simple Example: Prefix Sums

Next, we consider the problem of computing the prefix sums of $n$ elements, for which a CGM algorithm was presented in Chapter 2. The section begins by analyzing the algorithm presented in Chapter 2 under the HCGM$(n, p, s)$ model. Following this (disappointing) analysis, a new algorithm is developed especially for the HCGM$(n, p, s)$.

Recall the prefix sums algorithm of Chapter 2. Step 1 of the algorithm, computing the prefix sums locally, takes $O(n/s)$ computation time since each processor $P_i$ initially contains $\frac{s_i}{s}n$ elements. In Step 2 of the algorithm, each processor $P_i$ sends $p-i-1$ elements and receives $i$ elements. This takes $O(p)$ work per processor, and therefore takes $O(\max\{p/s_i : 0 \le i \le p-1\}) = O(p/s_{\min})$ time. Step 3 takes the same amount of local computation time as Step 2, and Step 4 takes the same amount of local computation time as Step 1. Thus the algorithm uses $O\left(\frac{p}{s_{\min}} + \frac{n}{s}\right)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$.

This running time can be improved by having $P_{\max}$ do the bulk of the work in Step 3. This leads to the following algorithm.

HCGM-Prefix-Sum()

1. Each processor locally computes the prefix sums of its $\frac{s_i}{s}n$ input elements.
2. Each processor, $P_i$, sends the total sum of its input elements to $P_{\max}$ in a single communication superstep.
3. $P_{\max}$ computes the prefix sums of the $p$ elements received in Step 2.
4. For $1 \le i \le p-1$, $P_{\max}$ sends the $(i-1)$st element computed in Step 3 to $P_i$ in a single communication superstep.
5. Each processor computes its final portion of the prefix sums by adding the value received in Step 4 to each of the values computed in Step 1.

This algorithm provides our first result for the HCGM model.

Theorem 1. The prefix sums of $n$ elements can be computed on an HCGM$(n, p, s)$, using $O(n/s)$ computation time and $O(1)$ supersteps, provided that $n/p \ge p$.

Proof. In Step 1 and Step 5, each processor $P_i$ does $O\left(\frac{s_i}{s}n\right)$ work and this work can be done in $O(n/s)$ time. In Step 2 and Step 4 each processor uses $O(1)$ computation time except $P_{\max}$, which uses $O(p/s_{\max})$ computation time. Step 3 takes $O(p/s_{\max})$ computation time. Since $p \le n/p$, and by the pigeonhole principle $s/p \le s_{\max}$, we have $p/s_{\max} \le p^2/s \le n/s$, so all steps can be performed in $O(n/s)$ time.

In [25], Ferreira and Ubeda describe an algorithm for computing the medial axis transform using 8 prefix sums operations and $O(n/p)$ local computation on a CGM$(n, p)$. By using the HCGM$(n, p, s)$ version of the prefix sums algorithm described above, this algorithm can be adapted to an HCGM$(n, p, s)$, yielding the following result.

Corollary 1. The medial axis transform of a $\sqrt{n} \times \sqrt{n}$ image can be computed using $O(n/s)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$ with $n/p \ge p$.

Corollary 1 is the first example of a technique which will be used repeatedly in this part of the thesis. Namely, to obtain HCGM algorithms from CGM algorithms, one need only find HCGM algorithms for the primitive communication operations performed by the CGM algorithm. When these operations are replaced in the CGM algorithm by their HCGM counterparts, the resulting algorithm is an HCGM algorithm.
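The five steps of the heterogeneous prefix sums algorithm, together with the load balanced input distribution it assumes, can be checked with a short sequential simulation. This is a sketch under simplifying assumptions rather than the thesis implementation: processors are modelled as Python lists, the total speed is assumed to divide $n$ so that each share is an integer, all $O$-constants are taken to be 1, and every function name is hypothetical.

```python
from itertools import accumulate
import operator


def load_balanced_blocks(xs, speeds):
    """Give P_i the next s_i * n / s consecutive elements (assumes s divides n)."""
    n, s = len(xs), sum(speeds)
    if n % s != 0:
        raise ValueError("this sketch assumes the total speed divides n")
    blocks, start = [], 0
    for speed in speeds:
        size = speed * n // s
        blocks.append(xs[start:start + size])
        start += size
    return blocks


def hcgm_prefix_sums(blocks, op=operator.add):
    """Sequentially simulate the five steps; blocks must be non-empty."""
    p = len(blocks)
    # Step 1: local prefix sums on every processor.
    local = [list(accumulate(block, op)) for block in blocks]
    # Step 2: every processor sends its block total to P_max.
    totals = [sums[-1] for sums in local]
    # Step 3: P_max alone computes the prefix sums of the p totals.
    offsets = list(accumulate(totals, op))
    # Step 4: P_max sends element i-1 of that sequence to P_i, 1 <= i <= p-1.
    # Step 5: each P_i combines the received offset with its local results.
    result = [local[0]]
    for i in range(1, p):
        result.append([op(offsets[i - 1], value) for value in local[i]])
    return result


def computation_time_bounds(n, speeds):
    """Return (time of the Chapter 2 algorithm, time of the algorithm above)
    on an HCGM, using the expressions from the text with constants dropped."""
    p, s = len(speeds), sum(speeds)
    return p / min(speeds) + n / s, p / max(speeds) + n / s


if __name__ == "__main__":
    speeds = [1, 2, 1, 2]             # example speeds, total speed 6
    xs = list(range(1, 13))           # n = 12 input elements
    blocks = load_balanced_blocks(xs, speeds)
    print(blocks)
    print(hcgm_prefix_sums(blocks))
    print(computation_time_bounds(len(xs), speeds))
```

The only difference between the two bounds is which processor absorbs the $O(p)$ work on block totals: the slowest one in the worst case for the original algorithm, and the fastest one by construction here.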

28 Chapter 4 Communication Pattern Thi chapter dicue common communication pattern ued in coare grained parallel algorithm and give their implementation both in the cgm and hcgm model. The motivation for thi i that by implementing hcgm verion of thee pattern, we obtain a number of hcgm algorithm directly from cgm algorithm which ue thee pattern. Thi work alo ha application outide of heterogeneou parallel computing. A i well known in the eld of oftware engineering, the tudy of oftware pattern i a eld in itelf (ee e.g., [28]). By identifying common pattern ued in coare grained parallel algorithm we provide a good tarting point for the development of librarie and framework upporting the implementation of uch algorithm. 4.1 Random-Sample The technique of random ampling i one of the mot ueful tool ued in the deign of randomized parallel algorithm. In random ampling, a random ubet of ize O(r) i choen from the n input value. In the context of coare grained parallel computing, random ampling involve chooing O(r) ample from the input and routing them to a deignated proceor, uually P 0. Thi proceor then typically perform ome computation on thee element and broadcat the reult of thi computation to all proceor. The algorithm 19

for the Random-Sample pattern proceeds as follows:

CGM-Random-Sample(r)
1. Each processor, $P_i$, tosses a biased coin with success probability $r/n$ for each of its input elements.
2. The elements for which the coin toss was successful are routed to $P_0$.

Using Chernoff bounds (see Appendix A), it is easily shown (see, e.g., [2]) that the number of elements which arrive at $P_0$ is $\tilde{O}(r)$ (see Appendix A for a definition of the $\tilde{O}$ notation). Thus, if $r \in O(n/p)$, the computation time used by the Random-Sample pattern is $\tilde{O}(n/p)$ and the number of supersteps is $\tilde{O}(1)$ on a CGM$(n, p)$.

In order to modify the random sample pattern for the HCGM$(n, p, s)$ model we need only change the designated processor to which the samples are routed. Rather than routing the samples to $P_0$, we route the samples to $P_{\max}$. This results in the following implementation:

HCGM-Random-Sample(r)
1. Each processor, $P_i$, tosses a biased coin with success probability $r/n$ for each of its input elements.
2. The elements for which the coin toss was successful are routed to $P_{\max}$.

Theorem 2. The HCGM-Random-Sample(r) algorithm uses $\tilde{O}(n/s)$ computation time and $\tilde{O}(1)$ supersteps on an HCGM$(n, p, s)$, provided that $s_{\max} n / s \ge r \ge 3 \ln n$.

Proof. In Step 1, each processor, $P_i$, must perform $s_i n/s$ coin tosses and can do these in $O(n/s)$ time. In Step 2 each processor, $P_i$, sends at most $s_i n/s$ elements and can do this in $O(n/s)$ time. Let $r'$ be the number of elements received by $P_{\max}$ in Step 2. Then $r'$ is a random variable following the binomial distribution $b(n, r/n)$. Applying Theorem 15,

Equation A.1, we get that

$$\Pr[r' \ge cr] \le \frac{1}{e^{(c-1)^2 r/3}} \le \frac{1}{n^{(c-1)^2}}, \quad \text{for } r \ge 3 \ln n.$$

Therefore, during Step 2 $P_{\max}$ receives $\tilde{O}(r)$ elements and can do this in $\tilde{O}(r/s_{\max}) \subseteq \tilde{O}(n/s)$ time.

4.2 Random-Assign

Random assignment is a tool used in a number of CGP algorithms to achieve load balancing. The idea behind random assignment is to assign elements of the input to processors in a random fashion. In this way, if the work performed on each element is variable, then one expects that all processors will be assigned roughly the same amount of work. This idea is realized in the following procedure.

CGM-Random-Assign()
1. Each processor, $P_i$, randomly assigns each of its elements to one of $p$ buckets, $b_{i,0}, \ldots, b_{i,p-1}$, with equal probability.
2. Each processor, $P_i$, routes the contents of each bucket, $b_{i,j}$, to $P_j$.

Once again, using Chernoff bounds it is not difficult to show that the number of elements routed to any processor is $\tilde{O}(n/p)$ (see, e.g., [2]). Thus, the algorithm uses $\tilde{O}(n/p)$ computation time and $\tilde{O}(1)$ communication rounds.

The HCGM version of the Random-Assign pattern is similar to the CGM version except that each processor, $P_i$, should receive $O(s_i n/s)$ elements in Step 2. To achieve this, we change the probability with which elements are assigned to buckets in Step 1. The modified algorithm works as follows.

HCGM-Random-Assign()
1. Each processor, $P_i$, randomly assigns each of its elements to one of $p$ buckets, $b_{i,0}, \ldots, b_{i,p-1}$. $P_i$ assigns an element to bucket $b_{i,j}$ with probability $s_j/s$.
2. Each processor, $P_i$, routes the contents of each bucket, $b_{i,j}$, to $P_j$.

Theorem 3. The HCGM-Random-Assign() algorithm uses $\tilde{O}((n/s) \log p)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$, with $p \le n$ and $s_{\min} n/s \ge 3 \ln n$.

Proof. Step 1 can be accomplished by performing a binary search on the $p$ buckets for each of the input elements, and can therefore be done in $O((n/s) \log p)$ time. Next we show that the number of elements received in Step 2 is $\tilde{O}(s_i n/s)$ for each processor, $P_i$. Let $n_i$ be the number of elements received by $P_i$ in Step 2. Then clearly $n_i$ is a random variable which follows the binomial distribution $b(n, s_i/s)$. Applying Theorem 15, Equation A.1, we get

$$\Pr\left[n_i \ge c\,\frac{s_i n}{s}\right] \le \frac{1}{e^{(c-1)^2 s_i n/(3s)}} \le \frac{1}{n^{(c-1)^2}}, \quad \text{for } \frac{s_i n}{s} \ge 3 \ln n.$$

Therefore, the probability that any processor, $P_i$, receives more than $c\,s_i n/s$ elements is bounded by

$$\Pr\left[\exists i \text{ s.t. } n_i \ge c\,\frac{s_i n}{s}\right] \le \frac{p}{n^{(c-1)^2}} \le \frac{1}{n^{(c-1)^2 - 1}}, \quad \text{for } p \le n.$$

Therefore Step 2 can be done using $\tilde{O}(n/s)$ computation time and the entire algorithm uses $\tilde{O}((n/s) \log p)$ computation time.

4.3 Linear-Partition

Let $S$ be a set of keys and $\le$ be a relation that defines a partial order on $S$. A linear partition of $S$ is a partitioning of $S$ into $p$ disjoint subsets $S_0, \ldots, S_{p-1}$ such

that $x \le y$ for all $x \in S_i$, $y \in S_j$ and $i < j$. Linear partitioning is one of the most commonly used communication patterns in parallel computing. This is due simply to the fact that sorting is a special case of linear partitioning in which the keys are sorted locally after being partitioned. Here we describe a randomized linear partitioning algorithm based on the sample sort algorithm described in [32]. We assume that the $n$ keys are all distinct since if they are not, they can be made so by, e.g., concatenating their values with their processor number and memory location. The algorithm proceeds as follows.

CGM-Linear-Partition()
1. All processors take a random sample of size $O(r)$, $r$ a constant, using the CGM-Random-Sample algorithm and route the sample keys to $P_0$.
2. $P_0$ sorts the sample keys. Denote these keys by $\mathit{sample}_0, \ldots, \mathit{sample}_{pr-1}$, where $\mathit{sample}_i$ is the sample with rank $i$ in the sorted order.
3. $P_0$ defines $p + 1$ splitters, $\mathit{splitter}_0, \ldots, \mathit{splitter}_p$, where
$$\mathit{splitter}_i = \begin{cases} -\infty & \text{if } i = 0 \\ \mathit{sample}_{ir} & \text{if } 0 < i < p \\ \infty & \text{if } i = p \end{cases}$$
4. $P_0$ broadcasts $\mathit{splitter}_0, \ldots, \mathit{splitter}_p$ to all processors.
5. Each processor, $P_i$, places each of its keys into one of $p$ buckets, where a key $x$ is placed in bucket $b_{ij}$ if and only if $\mathit{splitter}_j \le x < \mathit{splitter}_{j+1}$.
6. Each processor, $P_i$, routes the contents of bucket $b_{ij}$ to $P_j$ for all $i, j$.

That this algorithm produces a valid linear partition is clear since (1) all keys are assigned to exactly one bucket, and hence one processor, and (2) all keys in bucket $i$ are strictly less than all keys in bucket $j$ for all $i < j$. Less clear is the running time of the algorithm, since it is conceivable that some processor receives significantly more

than $O(n/p)$ elements in Step 6. Gerbessiotis and Valiant [32] showed that for properly chosen values of the oversampling ratio, $n$, and $p$ such a situation does not occur.

When adapting this algorithm to the HCGM model, we change the way in which the splitters are chosen. In order to balance the work according to $s_0, \ldots, s_{p-1}$ it is necessary that $O(s_i n/s)$ input keys fall between $\mathit{splitter}_i$ and $\mathit{splitter}_{i+1}$. In order to achieve this, we choose the splitters so that $O(s_i r/s)$ sample keys fall between $\mathit{splitter}_i$ and $\mathit{splitter}_{i+1}$. This leads to the following algorithm.

HCGM-Linear-Partition()
1. All processors take a random sample of size $O(r)$, $r$ to be defined later, using the HCGM-Random-Sample algorithm and route the sample keys to $P_{\max}$.
2. $P_{\max}$ sorts the sample keys. Denote these keys by $\mathit{sample}_0, \ldots, \mathit{sample}_{pr-1}$, where $\mathit{sample}_i$ is the sample with rank $i$ in the sorted order.
3. $P_{\max}$ defines $p + 1$ splitters, $\mathit{splitter}_0, \ldots, \mathit{splitter}_p$, where
$$\mathit{splitter}_i = \begin{cases} -\infty & \text{if } i = 0 \\ \mathit{sample}_{\left\lceil \sum_{j=0}^{i-1} s_j r / s \right\rceil} & \text{if } 0 < i < p \\ \infty & \text{if } i = p \end{cases}$$
4. $P_{\max}$ broadcasts $\mathit{splitter}_0, \ldots, \mathit{splitter}_p$ to all processors.
5. Each processor, $P_i$, places each of its keys into one of $p$ buckets, where a key $x$ is placed in bucket $b_{ij}$ if and only if $\mathit{splitter}_j \le x < \mathit{splitter}_{j+1}$.
6. Each processor, $P_i$, routes the contents of bucket $b_{ij}$ to $P_j$ for all $i, j$.

Theorem 4. The HCGM-Linear-Partition() algorithm uses $\tilde{O}((n/s) \log p)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$, with $r = s_{\max} n/s$, $s_{\min} r/s \ge 2 \ln n$, and $p \le n$.

Proof. By Theorem 2, Step 1 of the algorithm uses $\tilde{O}(n/s)$ computation time. Step 2 of the algorithm uses $\tilde{O}((r/s_{\max}) \log r) \subseteq \tilde{O}((n/s) \log n)$ computation time. Steps 3 and 4 clearly use $\tilde{O}(n/s)$ computation time. Step 5 uses $O((n/s) \log p)$ computation time. Next we consider the possibility that some processor receives too many keys in Step 6.

Let $n_i$ be the number of keys received by $P_i$ in Step 6. Let $r_i$ be the number of samples chosen from the $c\,s_i n/s$ keys following $\mathit{splitter}_i$ in the overall sorted order. Now note that $n_i > c\,s_i n/s$ only if $r_i < s_i r/s$. That is, the number of keys between $\mathit{splitter}_i$ and $\mathit{splitter}_{i+1}$ can only exceed $c\,s_i n/s$ if fewer than $s_i r/s$ samples are chosen from these keys. Since $r_i$ is a random variable that follows the binomial distribution $b(c\,s_i n/s, r/n)$, Theorem 15, Equation A.2 can be applied to get

$$\Pr\left[r_i \le \frac{s_i r}{s}\right] \le \frac{1}{e^{(1-\frac{1}{c})^2 (\frac{r}{n})(c\,s_i n/s)/2}} = \frac{1}{e^{(1-\frac{1}{c})^2 c\,(s_i r/s)/2}} \le \frac{1}{n^{(1-\frac{1}{c})^2 c}}, \quad \text{for } \frac{s_i r}{s} \ge 2 \ln n.$$

Therefore, the probability that any processor, $P_i$, receives more than $c\,s_i n/s$ elements is bounded by

$$\Pr\left[\exists i \text{ s.t. } r_i \le \frac{s_i r}{s}\right] \le \frac{p}{n^{(1-\frac{1}{c})^2 c}} \le \frac{1}{n^{(1-\frac{1}{c})^2 c - 1}}, \quad \text{for } p \le n,$$

and Step 6 can be done using $\tilde{O}(n/s)$ computation time.

Since sorting is a special case of Linear-Partition in which elements are sorted locally after partitioning, we obtain the following corollary.

Corollary 2. Sorting $n$ keys can be done using $\tilde{O}((n/s) \log n)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$, with $r = s_{\max} n/s$, $s_{\min} r/s \ge 2 \ln n$, and $p \le n$.
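The splitter selection in Step 3 of HCGM-Linear-Partition, together with the bucket lookup of Step 5, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `choose_splitters` and `bucket_of` are hypothetical, the processor speeds are passed as a plain list, and the sample index is clamped to the last sample so that the sketch also works for small samples. It is not taken from the PLEDA implementation.

```python
import math
from bisect import bisect_right


def choose_splitters(sorted_sample, speeds):
    """Return p + 1 splitters so that the number of sample keys between
    splitter j and splitter j + 1 is roughly proportional to speeds[j]."""
    total = sum(speeds)
    r = len(sorted_sample)
    splitters = [-math.inf]
    prefix = 0
    for s in speeds[:-1]:
        prefix += s  # sum of the speeds of all faster-indexed buckets so far
        idx = min(r - 1, math.ceil(prefix * r / total))  # clamp: assumption
        splitters.append(sorted_sample[idx])
    splitters.append(math.inf)
    return splitters


def bucket_of(key, splitters):
    """Bucket j satisfies splitters[j] <= key < splitters[j + 1].
    Binary search over the splitters costs O(log p) per key."""
    return bisect_right(splitters, key) - 1
```

With speeds `[1, 1, 2]` and the sorted sample `0, ..., 7`, the splitters fall at sample ranks 2 and 4, so the fastest processor is responsible for half of the key range.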

4.4 PRAM-Simulation

PRAM simulations on the BSP model were first introduced by Valiant in [61], and by Gerbessiotis and Valiant in [32], as a means of obtaining BSP algorithms from PRAM algorithms, and it was shown that if the BSP parameter $g$ is close to unity, the resulting BSP algorithms would be optimal. Unfortunately, this condition is not usually met in practice and the performance of the resulting algorithms is often disappointing. More recently, PRAM algorithms have been revived in the form of clipping [10]. Clipping involves simulating a PRAM algorithm for $O(\log p)$ rounds, stopping the algorithm (clipping it), and completing the computation with a specialized CGP algorithm.

In doing PRAM simulation on a CGM, each processor simulates $n/p$ EREW-PRAM processors and stores $n/p$ data elements. Each round of the simulation consists of a read phase and a write phase. The following algorithm performs one step of an EREW-PRAM simulation on a CGM$(n, p)$:

CGM-PRAM-Simulation()
1. Each processor, $P_i$, formulates $O(n/p)$ read requests and sends each request to the processor holding the element to be read.
2. Each processor, $P_i$, responds to the $O(n/p)$ read requests received in Step 1.
3. Each processor, $P_i$, formulates $O(n/p)$ write requests and sends each request to the processor holding the element to be written.
4. Each processor, $P_i$, responds to the $O(n/p)$ write requests received in Step 3.

Theorem 5. The CGM-PRAM-Simulation() algorithm uses $O(n/p)$ computation time and $O(1)$ supersteps on a CGM$(n, p)$.

Proof. Determining which processor, $P_i$, services a read or write request for memory location $j$ can be done in constant time using the formula $i = \lfloor j/(n/p) \rfloor$. Therefore Step 1 and Step 3 take $O(n/p)$ time. Since an EREW-PRAM is being simulated, no processor receives more than $O(n/p)$ requests in Step 2 and Step 4. Therefore these steps can be done in $O(n/p)$ time, yielding the stated time bound.
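To make the read phase (Steps 1 and 2) concrete, the following Python sketch simulates it sequentially. It assumes that $p$ divides $n$ and that requests are given as lists of global memory indices; the names `owner` and `simulate_read_phase` and the in-memory "inboxes" are hypothetical stand-ins for real message passing.

```python
def owner(j, n, p):
    """Processor that stores memory cell j when each processor holds n // p cells."""
    return j // (n // p)


def simulate_read_phase(memory, requests, p):
    """memory: list of n cells (p is assumed to divide n).
    requests[i]: global indices that processor i wants to read.
    Returns replies[i], the values in the same order as requests[i]."""
    n = len(memory)
    block = n // p
    local = [memory[i * block:(i + 1) * block] for i in range(p)]

    # Step 1: route every request to the processor that owns the cell.
    inbox = [[] for _ in range(p)]
    for i, cells in enumerate(requests):
        for slot, j in enumerate(cells):
            inbox[owner(j, n, p)].append((i, slot, j))

    # Step 2: each owner answers from its local block only.
    replies = [[None] * len(cells) for cells in requests]
    for q in range(p):
        for i, slot, j in inbox[q]:
            replies[i][slot] = local[q][j - q * block]
    return replies
```

The write phase (Steps 3 and 4) has the same shape, with the value travelling in the request instead of in the reply.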

The HCGM$(n, p, s)$ version of the PRAM-Simulation procedure is nearly identical to the CGM version, though it only holds for a restricted range of parameters. The algorithm proceeds as follows.

HCGM-PRAM-Simulation()
1. Each processor, $P_i$, formulates $O(s_i n/s)$ read requests and sends each request to the processor holding the element to be read.
2. Each processor, $P_i$, responds to the $O(s_i n/s)$ read requests received in Step 1.
3. Each processor, $P_i$, formulates $O(s_i n/s)$ write requests and sends each request to the processor holding the element to be written.
4. Each processor, $P_i$, responds to the $O(s_i n/s)$ write requests received in Step 3.

The extra restriction on the range of parameters comes from the fact that the processor, $P_i$, which corresponds to PRAM memory location $j$ cannot be determined using a simple formula as it is in the CGM procedure. A simple workaround is to use binary search to find the correct processor, but this leads to an $O(\log p)$ slowdown. A more efficient method can be obtained using integer sorting.

Theorem 6. The HCGM-PRAM-Simulation() algorithm uses $O(n/s)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$, provided that $s/s_{\min} \le c_1 p^{c_2}$ for some constants $c_1, c_2 > 0$, and $s_{\min} n/s \ge p$.

Proof. Clearly the proof of Theorem 5 extends to this theorem with the exception of finding the processors which service requests. Thus we need only show how this is done. By first sorting the requests locally at each processor, the relevant processors can be determined by (sequentially) scanning the sorted list using $O(s_i n/s + p) = O(s_i n/s)$ work at each processor $P_i$, and this can therefore be done using $O(n/s)$ computation time. Thus we need only consider how to sort the requests.

Fact 1 (Radix Sort [42]). It is possible to sort (sequentially) $k$ integers in the range $[0, l-1]$ using $O\!\left(\frac{\log l}{\log k}\, k\right)$ computation and $O(k)$ memory.

Each processor $P_i$ must sort $s_i n/s$ elements in the range $[0, n-1]$. By substituting $k = s_i n/s$ and $l = n$ in Fact 1, we see that the sorting can be done using

$$\begin{aligned}
W_{\mathrm{sort}} &\in O\!\left(\frac{s_i n}{s} \cdot \frac{\log n}{\log (s_i n/s)}\right) \\
&= O\!\left(\frac{s_i n}{s} \cdot \frac{\log (s_i n/s) + \log (s/s_i)}{\log (s_i n/s)}\right) \\
&= O\!\left(\frac{s_i n}{s} \left(1 + \frac{\log (s/s_i)}{\log (s_i n/s)}\right)\right) \\
&\subseteq O\!\left(\frac{s_i n}{s} \left(1 + \frac{\log c_1 + c_2 \log p}{\log (s_i n/s)}\right)\right), \quad \text{for } s/s_i \le c_1 p^{c_2} \\
&\subseteq O\!\left(\frac{s_i n}{s} \left(1 + \frac{\log c_1 + c_2 \log p}{\log p}\right)\right), \quad \text{for } s_i n/s \ge p \\
&= O\!\left(\frac{s_i n}{s}\right)
\end{aligned}$$

work at each processor, $P_i$, and $O(n/s)$ time.

For an algorithm which uses PRAM simulation to be useful in practice, the simulation must be done as efficiently as possible. If we have the extra restriction that $s/\gcd(s_0, \ldots, s_{p-1}) \in O(s_{\min} n/s)$, we can simulate the CGM-PRAM-Simulation algorithm by having each processor, $P_i$, simulate $s_i/\gcd(s_0, \ldots, s_{p-1})$ CGM processors, thereby reducing the constants in the running time of the HCGM-PRAM-Simulation algorithm.

If we have the weaker restriction that $s/s_{\min} \in O(s_{\min} n/s)$, then the same game can be played by having each processor, $P_i$, simulate $O(s_i/s_{\min})$ CGM processors. Although this leads to an algorithm with running time $O(n/s)$, the big-oh notation hides the fact that the processors are not doing work which is exactly proportional to their speed. This may or may not be acceptable in practice.

In many cases, PRAM algorithms operate on pointer-based data structures such as lists or graphs, and the memory locations accessed by the PRAM processors are dictated by the values of pointers in the data structure. In such cases the data structure can be preprocessed so that the pointers are modified to contain a processor/address pair to allow addressing in constant time. This preprocessing can easily be done in time $O((n/s) \log p)$, and will most likely yield the most efficient simulation in practice.
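The binary-search workaround and the pointer preprocessing mentioned above can be sketched as follows. The sketch assumes that memory is laid out in contiguous blocks whose sizes are proportional to the processor speeds; the helper names `block_starts`, `owner`, and `preprocess_pointers` are hypothetical and the layout rule (integer rounding of prefix sums) is one possible choice, not necessarily the one used in the thesis.

```python
from bisect import bisect_right
from itertools import accumulate


def block_starts(n, speeds):
    """starts[i] is the first global index stored at processor i;
    starts[p] == n. Block sizes are proportional to the speeds."""
    total = sum(speeds)
    prefix = [0] + list(accumulate(speeds))
    return [pre * n // total for pre in prefix]


def owner(j, starts):
    """Processor holding global index j, found by binary search in O(log p)."""
    return bisect_right(starts, j) - 1


def preprocess_pointers(pointers, starts):
    """Rewrite each global index as a (processor, local offset) pair so that
    later accesses need no search. None (a null pointer) is kept as None."""
    result = []
    for j in pointers:
        if j is None:
            result.append(None)
        else:
            q = owner(j, starts)
            result.append((q, j - starts[q]))
    return result
```

Each pointer is translated once at a cost of $O(\log p)$, after which every simulated read or write can address its target processor directly.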

Using the techniques in [61, 32], it is possible to simulate other types of PRAM on the HCGM model. In particular, extensions of the randomized EREW-PRAM and CRCW-PRAM simulations described in [32] to the HCGM model are possible.

4.5 Circulate

Scientific computations often use a very regular communication pattern in which a set of items is "rotated" or "circulated" through the processors in rounds, so that after $p$ rounds, every processor has seen every item. Examples include dense matrix multiplication, in which the rows of the matrix are rotated, and solutions to the so-called n-body problem, in which the $n$ bodies in question are rotated.

The Circulate pattern takes two ordered lists $A$ and $B$ of size $O(n)$ as input. The computation proceeds in $p$ rounds. During each round each processor sends and receives some portion of $B$ of size $n/p$, and performs some computation on its locally stored portions of $A$ and $B$. After the $p$ rounds, each element of $B$ has been stored in the same processor as each element in $A$ during exactly one round. The nature of the computation performed in each round may vary, but the running time must be of the form $O(|A_i| \cdot |B_i| \cdot n^c)$, where $A_i$ (resp. $B_i$) is the sublist of $A$ (resp. $B$) stored at $P_i$. This is captured by the following algorithm.

CGM-Circulate(A, B)
1. Repeat Steps 2 and 3 $p$ times.
2. Each processor $P_i$ performs computation on $A_i$ and $B_i$.
3. Each processor $P_i$ sends $B_i$ to $P_{(i+1) \bmod p}$.

The number of supersteps used by the CGM-Circulate algorithm is clearly $O(p)$. Initially, $A$ and $B$ are distributed evenly among the processors, and so the amount of computation done during each of the $p$ computation supersteps is

$$O\!\left(\frac{|A|}{p} \cdot \frac{|B|}{p} \cdot n^c\right) = O\!\left(\frac{n^{c+2}}{p^2}\right)$$

and the overall computation time is $O(n^{c+2}/p)$.

To implement an HCGM version of the Circulate pattern, we need only change the way in which $A$ and $B$ are distributed among the processors. This leads to the following algorithm:

HCGM-Circulate(A, B)
1. Distribute $A$ and $B$ so that $P_i$ stores $s_i |A|/s$ elements of $A$ and $|B|/p$ elements of $B$.
2. Repeat Steps 3 and 4 $p$ times.
3. Each processor $P_i$ performs computation on $A_i$ and $B_i$.
4. Each processor $P_i$ sends $B_i$ to $P_{(i+1) \bmod p}$.

Theorem 7. The HCGM-Circulate(A, B) algorithm uses $O(n^{c+2}/s)$ computation time and $O(p)$ supersteps on an HCGM$(n, p, s)$, provided that $n/p \ge p$ and $\lfloor s_{\min} n/s \rfloor \ge 1$.

Proof. Redistributing $A$ and $B$ in Step 1 is a straightforward matter using the prefix sum algorithm of Theorem 1, and takes $O(n/(p\,s_{\min}))$ computation time and $O(1)$ supersteps. During each execution of Step 3, the work done by $P_i$ is given by

$$O\!\left(\frac{s_i |A|}{s} \cdot \frac{|B|}{p} \cdot n^c\right) = O\!\left(\frac{s_i\, n^{c+2}}{s\,p}\right)$$

and this work can be done in $O(n^{c+2}/(s\,p))$ time. Therefore, over the $p$ rounds, the total computation time is $O(n^{c+2}/s)$, and this dominates the overall computation time.
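The data movement of HCGM-Circulate can be illustrated with a small sequential simulation in Python. This is a sketch under stated assumptions: the names `split_by_speed` and `circulate` are hypothetical, the per-round computation is supplied by the caller as a function `combine`, and the rotation of list entries stands in for the communication superstep.

```python
def split_by_speed(items, speeds):
    """Cut items into len(speeds) consecutive parts whose sizes are
    (up to rounding) proportional to the given speeds."""
    total = sum(speeds)
    bounds, acc = [0], 0
    for s in speeds:
        acc += s
        bounds.append(acc * len(items) // total)
    return [items[bounds[i]:bounds[i + 1]] for i in range(len(speeds))]


def circulate(A, B, speeds, combine):
    """Simulate HCGM-Circulate: A is split by speed, B is split evenly,
    and the blocks of B travel around the ring for p rounds.
    Returns results[i], the list of combine() outputs of processor i."""
    p = len(speeds)
    A_parts = split_by_speed(A, speeds)   # Step 1: A in proportion to speed
    B_parts = split_by_speed(B, [1] * p)  # Step 1: B in equal blocks
    results = [[] for _ in range(p)]
    for _ in range(p):
        # Computation superstep: every processor works on what it holds.
        for i in range(p):
            results[i].append(combine(A_parts[i], B_parts[i]))
        # Communication superstep: P_i sends its block of B to P_{(i+1) mod p}.
        B_parts = [B_parts[(i - 1) % p] for i in range(p)]
    return results
```

Because the blocks of $B$ have equal size while the parts of $A$ grow with the processor speed, the work per round at each processor is proportional to its speed, which is the property used in the proof of Theorem 7.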

Chapter 5

HCGM Algorithms

This chapter provides a sampling of algorithms for the HCGM model, and describes some empirical results which show that these algorithms work well. These algorithms are arrived at by expressing existing CGM and BSP algorithms in terms of the communication patterns described in Chapter 4. We present algorithms for the following problems:

1. Parallel insertion and deletion operations on a priority queue illustrate the Random-Assign and Linear-Partition patterns.
2. Computing the lower envelope of non-intersecting line segments illustrates the Linear-Partition pattern.
3. List ranking illustrates the PRAM-Simulation pattern.
4. Matrix multiplication illustrates the Circulate pattern.

Rather than give a superficial treatment of a large number of algorithms, we have chosen to examine a few algorithms in detail. A consequence of this approach is that we do not provide an exhaustive list of possible HCGM algorithms based on the communication patterns in Chapter 4. However, after presenting each algorithm, we make note of other BSP and CGM algorithms which could be converted to HCGM algorithms in a similar manner.

5.1 Priority Queue Operations

Priority queues are a fundamental data structure used in a large number of graph and optimization algorithms. A common example is the class of "Branch and Bound" algorithms used in combinatorial optimization. A priority queue, $Q$, dynamically maintains a set of keys belonging to a linear order. In general, priority queues support a number of operations including insertion of keys, deletion of arbitrary keys, deletion of the smallest key, changing the value of a key, and concatenation of two queues. However, for the purposes of this work, we will consider only the following two operations:

1. Insert(k). Insert the key $k$ into $Q$.
2. DeleteMin(). Remove the minimum key, $k_{\min}$, from $Q$.

Various implementations of priority queues exist, each with different running times for different operations. We note, however, that for a comparison-based implementation at least one of the above operations must have a running time of $\Omega(\log n)$, where $n$ is the number of keys in $Q$. Otherwise, sorting $n$ keys could be done in $o(n \log n)$ time by inserting the keys into $Q$ and then removing them in sorted order.

In this section, we consider the problem of performing a batch of $m$ priority queue operations on a priority queue, $Q$, of size $n$. Throughout the following discussion, we will assume that $n > m$. The two operations we wish to support are:

1. MultiInsert($k_0, \ldots, k_{m-1}$, Q). Insert the keys $k_0, \ldots, k_{m-1}$ into $Q$.
2. MultiDelete(m, Q). Delete the $m$ smallest keys from $Q$.

The algorithms presented in this section are a generalization of the algorithms discovered by Baumker et al. [4] and independently by Gerbessiotis and Siniolakis [31]. In these schemes, each processor, $P_i$, maintains a local priority queue $Q_i$ which contains $\tilde{O}(n/p)$ of the keys in the overall priority queue. When inserting keys into the priority queue, the keys are assigned to processors at random. The MultiInsert algorithm is given below:

MultiInsert($k_0, \ldots, k_{m-1}$, Q)
1. All processors use the Random-Assign algorithm to randomly assign the keys to processors.
2. Each processor, $P_i$, inserts the keys received in Step 1 into $Q_i$.

Theorem 8. The algorithm MultiInsert($k_0, \ldots, k_{m-1}$, Q) uses $\tilde{O}((m/s) \log n)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$, provided that $s_{\min} m/s \ge 3 \ln m$.

Proof. By Theorem 3, Step 1 takes $\tilde{O}((m/s) \log p)$ computation time and $\tilde{O}(1)$ supersteps. Furthermore, each processor, $P_i$, receives $\tilde{O}(s_i m/s)$ keys. Therefore, Step 2 takes $\tilde{O}((m/s) \log n)$ computation time and dominates the computation time.

Assigning the keys to processors using the Random-Assign pattern not only ensures load balancing during insertion, but also ensures load balancing during deletion. This is because each local queue, $Q_i$, contains no more than $c\,s_i m/s$ of the $m$ smallest keys, with high probability. Thus, the strategy used when deleting keys is to delete too many keys from $Q_i$ and then reinsert those keys which should not have been deleted. This leads to the following algorithm.

MultiDelete(m, Q)
1. Each processor $P_i$ removes the minimum $c\,s_i m/s$ keys from $Q_i$.
2. All processors globally sort the keys removed in Step 1.
3. The keys with rank less than $m$ in the sorted order are deleted, while the keys with rank at least $m$ in the sorted order are reinserted using the MultiInsert algorithm.

Note that the MultiDelete(m, Q) algorithm is a Monte Carlo algorithm as well as a Las Vegas algorithm. It is possible that some $Q_i$ contains more than $c\,s_i m/s$ of the $m$ smallest keys, in which case the algorithm produces an incorrect result. However, as Theorem 9 shows, this is highly unlikely.

Theorem 9. Let $m_i$ be the number of the $m$ smallest keys which are stored in $Q_i$. Then

$$\Pr\left[m_i \ge c\,\frac{s_i m}{s}\right] \le \frac{1}{m^{(c-1)^2}},$$

provided that $s_i m/s \ge 3 \ln m$.

Proof. Note that any key is assigned to $Q_i$ with probability $s_i/s$ and that this probability is independent of any other key being assigned to $Q_i$. Therefore the value of $m_i$ follows the binomial distribution $b(m, s_i/s)$. Applying Equation A.1, we get that

$$\Pr\left[m_i \ge c\,\frac{s_i m}{s}\right] \le \frac{1}{e^{(c-1)^2 s_i m/(3s)}} \le \frac{1}{m^{(c-1)^2}}.$$

Theorem 10. The algorithm MultiDelete(m, Q) uses $\tilde{O}((m/s) \log n)$ computation time and $O(1)$ supersteps on an HCGM$(n, p, s)$, provided that $r = s_{\max} n/s$, $s_{\min} r/s \ge 2 \ln n$, and $p \le n$.

Proof. In Step 1, each processor $P_i$ deletes $c\,s_i m/s$ items from $Q_i$, and can do this using $O((m/s) \log n)$ computation time. By Corollary 2, Step 2 can be done using $O(1)$ supersteps and $O((m/s) \log m)$ computation time. By Theorem 8, Step 3 can be done using $O(1)$ supersteps and $O((m/s) \log n)$ computation time. Thus, all steps can be completed in the stated resource bounds.

The Monte Carlo MultiDelete(m, Q) algorithm described above can easily be made into a strictly Las Vegas algorithm, with some additional verification. This can be accomplished by ensuring that after Step 2, the minimum item in each $Q_i$ is greater than the key with rank $m$. If this does not hold, then items should be removed from $Q_i$ until it does. Note that this does not affect the analysis of the algorithm since, with high probability, no extra items need to be deleted from any $Q_i$.

A number of BSP and CGM algorithms exist which use the Random-Assign pattern. These include the randomized sorting algorithm of Bader et al. [2], the tree multisearch algorithm of Baumker et al. [5, 6, 7, 3], and the DAG multisearch algorithm of Gerbessiotis and Siniolakis [29, 30].
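The batch operations of this section, including the Las Vegas verification step, can be sketched as a sequential Python simulation. The class name `HeteroPQ`, the use of `heapq` for the local queues, the default oversampling constant `c = 2.0`, and the fixed random seed are all assumptions made for illustration; a local sort stands in for the global sort of Step 2.

```python
import heapq
import math
import random
from bisect import bisect_right
from itertools import accumulate


class HeteroPQ:
    """Sequential stand-in for p local queues, one per processor (hypothetical)."""

    def __init__(self, speeds, c=2.0, seed=0):
        self.speeds = list(speeds)
        self.total = sum(self.speeds)
        self.cum = list(accumulate(self.speeds))  # prefix sums for binary search
        self.queues = [[] for _ in self.speeds]
        self.c = c
        self.rng = random.Random(seed)

    def multi_insert(self, keys):
        last = len(self.queues) - 1
        for k in keys:
            # Random-Assign: queue i is chosen with probability speeds[i] / total.
            i = min(last, bisect_right(self.cum, self.rng.random() * self.total))
            heapq.heappush(self.queues[i], k)

    def multi_delete(self, m):
        removed = []
        # Step 1: every queue gives up its ceil(c * speed * m / total) smallest keys.
        for q, s in zip(self.queues, self.speeds):
            take = min(len(q), math.ceil(self.c * s * m / self.total))
            removed.extend(heapq.heappop(q) for _ in range(take))
        # Step 2: a local sort stands in for the global sort.
        removed.sort()
        # Verification: no queue may still hold a key below the m-th candidate.
        bound = removed[m - 1] if len(removed) >= m else math.inf
        for q in self.queues:
            while q and q[0] < bound:
                removed.append(heapq.heappop(q))
        removed.sort()
        # Step 3: the m smallest keys are the answer; the rest are reinserted.
        self.multi_insert(removed[m:])
        return removed[:m]
```

Thanks to the verification loop, the returned keys are always the $m$ smallest (for distinct keys); the random assignment only affects how much extra work that loop has to do.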

Figure 5.1: The lower envelope of a set of line segments. The portions of the line segments which form the lower envelope are shown in bold.

5.2 Lower Envelope

Computing the lower envelope of non-intersecting line segments is a classic problem in computational geometry. Given a set of non-intersecting line segments in the plane, the lower envelope problem is to determine which portions of these segments are visible to a viewer standing at $(0, -\infty)$ (see Figure 5.1). A number of (more realistic) visibility problems can be reduced to the lower envelope problem through standard geometric transformations. The sequential complexity of the lower envelope problem is $O(n \log n)$. The algorithm we describe relies on the following property of the lower envelope of line segments.

Observation 2 (x-monotonicity). Any lower envelope, $L$, is x-monotone, i.e., any vertical line intersects $L$ at most once.

Dehne, Fabri, and Rau-Chaplin [21, 22] describe a CGM algorithm for the lower envelope problem which uses $O(1)$ supersteps and $O((n \log n)/p)$ computation time. The algorithm we describe is arrived at by simply replacing the global sort operation used in [21, 22] with the Linear-Partition algorithm in Chapter 4.

The algorithm works by first computing $p$ individual lower envelopes. Next, the

plane is partitioned into $p$ vertical slabs, where each slab intersects at most $O(n/p)$ of the segments of the lower envelopes computed above. Finally, the lower envelope of the segments in each slab is computed, and the overall lower envelope is the union of the lower envelopes of all slabs. The algorithm is described in detail below, and its execution is illustrated in Figure 5.2.

LowerEnvelope(S)
1. Each processor $P_i$ computes the lower envelope of its locally stored segments; call this $L_i$.
2. All processors globally perform a linear partition of the $L_i$ computed in Step 1, using the x-coordinate of the right endpoint of each segment as the key.
3. Each processor, $P_i$, determines the vertical line, $l_i$, through the rightmost segment received in Step 2, and broadcasts this line to all other processors.
4. Each processor, $P_i$, sends a segment of $L_i$ to $P_j$ if and only if that segment intersects $l_j$.
5. Each processor, $P_i$, computes the lower envelope of the segments received in Steps 2 and 4.

Theorem 11. The lower envelope of $n$ non-intersecting line segments can be computed using $\tilde{O}(1)$ supersteps and $\tilde{O}((n/s) \log n)$ computation time on an HCGM$(n, p, s)$, provided that $r = s_{\max} n/s$, $s_{\min} r/s \ge 2 \ln n$, and $s_{\min} n/s \ge p$.

Proof. The correctness of the algorithm follows from the correctness of the algorithm in [21, 22]. At the beginning of Step 1 each processor, $P_i$, contains $O(s_i n/s)$ segments and can therefore compute the lower envelope of these segments in $O((n/s) \log n)$ time. By Theorem 4, Step 2 can be done in $\tilde{O}(1)$ supersteps and $\tilde{O}((n/s) \log p)$ computation time. Step 3 consists of routing an h-relation with $h = p$. As noted in [22], during Step 4, each processor sends and receives at most $p \le s_{\min} n/s$ segments. This is due to Observation 2, since each $L_i$ intersects each $l_j$ at most once. At the beginning of

Figure 5.2: Example execution of the LowerEnvelope algorithm (panels: Input, Step 1, Steps 2 and 3, Step 4, Step 5). Line segments are colored by the processor which holds them.

Step 5 each processor, $P_i$, contains $\tilde{O}(s_i n/s)$ segments and can therefore compute the lower envelope of these segments in $\tilde{O}((n/s) \log n)$ time.

Dehne, Fabri, and Rau-Chaplin [22] also discuss an extension of the algorithm to the case of possibly intersecting line segments and show that with a small amount of extra memory, this problem can also be solved. Similar comments hold regarding our generalized version of the algorithm.

The Linear-Partition algorithm is extremely useful in adapting CGM and BSP algorithms to the HCGM model. The only communication operation used by the algorithm in [21, 22] is global sorting. A first step in generalizing almost any CGM or BSP algorithm to the HCGM model is to replace all calls to global sort with calls to Linear-Partition. In some cases, the resulting algorithm is in fact more efficient than the original since global sort is often used when a linear partition will suffice. (This is the case with the LowerEnvelope algorithm if a sequential algorithm is used in Step 5 that does not require sorting.)

5.3 List Ranking

The list ranking problem takes as input a linked list and returns as output the distance of each list element to the last element of the list. The list ranking problem has been studied extensively for the PRAM model and optimal randomized and deterministic algorithms have been devised [59]. The simplest (near-optimal) solution to the list ranking problem uses a recursive doubling technique known as pointer jumping (see, e.g., [41]). In pointer jumping, each element is initially assigned a rank of 1, with the exception of the last list element, which is assigned a rank of 0. Each element then gets assigned the pointer of its successor, and adds to its rank the rank of its successor. After repeating this procedure $\log n$ times, each element has the last list element as its successor and is correctly ranked.

In [10], Caceres et al. describe a CGM algorithm for the list ranking problem. The algorithm begins by finding a $p^2$-ruling set of size $O(n/p)$. This is a subset of the original

list elements such that no two consecutive elements are further than distance $p^2$ apart. At the same time, the algorithm determines for each list element $x$, $\mathit{nextr}(x)$, the first ruling set element which occurs after $x$ in the list, and $\mathit{dist}(x)$, the distance between $x$ and $\mathit{nextr}(x)$. The ruling set elements are then broadcast to all processors and are ranked sequentially. Elements not in the ruling set are then ranked using the formula $\mathit{rank}(x) = \mathit{rank}(\mathit{nextr}(x)) + \mathit{dist}(x)$. This is expressed by the following algorithm. See Figure 5.3 for an example.

Figure 5.3: An example of the list ranking algorithm of Caceres et al. (rows: next, rank, dist, nextr). Ruling set elements are shown in gray.

CGM-List-Rank() [10]
1. All processors compute a $p^2$-ruling set of size $O(n/p)$. At the same time, all processors compute $\mathit{dist}(x)$ and $\mathit{nextr}(x)$ for all list elements $x$. This can be done in $O(\log p)$ supersteps and $O((n/p) \log p)$ computation time [10].
2. All processors perform an all-to-all broadcast of the ruling set elements.
3. Each processor, $P_i$, ranks the ruling set elements.
4. Each processor, $P_i$, ranks the elements it contains which are not in the ruling set using the formula $\mathit{rank}(x) = \mathit{rank}(\mathit{nextr}(x)) + \mathit{dist}(x)$.

Clearly the running time of this algorithm is dominated by Step 1. Therefore the algorithm uses $O((n/p) \log p)$ computation time and $O(\log p)$ supersteps.
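The pointer jumping technique that underlies these list ranking algorithms can be sketched sequentially in Python. The sketch assumes the list is given as a successor array in which the last element points to itself; the function name `list_rank` is hypothetical. Each loop iteration corresponds to one synchronous PRAM round, so all reads use the arrays from the previous round.

```python
def list_rank(succ):
    """succ[x] is the index of the successor of x; the last element of the
    list satisfies succ[x] == x. Returns the distance of each element to
    the last element, computed by pointer jumping."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[x] == x else 1 for x in range(n)]
    # After k rounds every pointer spans up to 2**k links, so about
    # log2(n) rounds suffice; extra rounds leave the result unchanged.
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Synchronous update: build new arrays from the old ones.
        new_rank = [rank[x] + rank[nxt[x]] for x in range(n)]
        new_nxt = [nxt[nxt[x]] for x in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank
```

For the list 3, 0, 4, 1, 2 (in list order), stored as the successor array `[4, 2, 2, 0, 1]`, the function returns `[3, 1, 0, 4, 2]`.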

When adapting this algorithm to the HCGM model there are two problems to overcome. The first is the all-to-all broadcast in Step 2 and the resulting computation in Step 3. This is easily handled by gathering the ruling set elements in $P_{\max}$ and having $P_{\max}$ do the work of ranking these elements sequentially. The second consideration is what to do with the ruling set elements after they have been ranked by $P_{\max}$. They cannot be broadcast, since then some slow processors would receive $O(n/p)$ elements and become bottlenecks for the algorithm. The answer is simply to return the elements to the processors to which they were originally assigned. Afterwards, another $O(\log p)$ rounds of pointer jumping allow all processors to determine the ranks of these elements.

HCGM-List-Rank()
1. All processors compute a $p^2$-ruling set of size $O(n/p)$ using the EREW-PRAM simulation procedure described in [10].
2. All processors gather the ruling set elements into $P_{\max}$.
3. $P_{\max}$ ranks the ruling set elements.
4. $P_{\max}$ sends each (now ranked) ruling set element back to the processor from which it was received.
5. All processors simulate pointer jumping to determine, for each list element $x$, the values $\mathit{nextr}(x)$, $\mathit{dist}(x)$, and $\mathit{rank}(\mathit{nextr}(x))$.
6. Each processor, $P_i$, ranks the elements it contains which are not in the ruling set using the formula $\mathit{rank}(x) = \mathit{rank}(\mathit{nextr}(x)) + \mathit{dist}(x)$.

Theorem 12. The list ranking problem can be solved on an HCGM$(n, p, s)$ using $O((n/s) \log p)$ computation time and $O(\log p)$ supersteps.

Proof. In a preprocessing step, each processor, $P_i$, can preprocess its subarray in $O((n/s) \log p)$ time, so that rather than containing indices, the subarray contains processor/index pairs. In this way, the PRAM simulation in Step 1 and the pointer jumping

in Step 5 can be done in $O((n/s) \log p)$ time. Steps 2 to 4 can clearly be done using $O(1)$ supersteps and $O(n/s)$ computation time.

List ranking is used as a subroutine in a number of parallel tree and graph algorithms. In [10], Caceres et al. describe a number of CGM graph algorithms based on existing PRAM algorithms. Using the PRAM simulation techniques of Chapter 4, most of these algorithms can be adapted to the HCGM model.

5.4 Matrix Multiplication

Matrix multiplication is perhaps one of the most common operations used in large-scale scientific computing. Given two $n \times n$ matrices $A$ and $B$, we define the matrix $C = A \times B$ as

$$C_{i,j} = \sum_{k=0}^{n-1} A_{i,k} B_{k,j}.$$

In this section, we show how to implement matrix multiplication using the Circulate pattern. We assume that the matrix $A$ is partitioned among the processors so that each processor, $P_i$, holds $s_i n/s$ rows of $A$ and $n/p$ columns of $B$. At the completion of the computation, $P_i$ will hold $s_i n/s$ rows of $C$ (see Figure 5.4). We denote the parts of $A$, $B$, and $C$ held by $P_i$ as $A_i$, $B_i$, and $C_i$ respectively.

Figure 5.4: Partitioning matrices A, B, and C in a 4 processor system.

The matrix multiplication algorithm consists of circulating the columns of $B$ among the processors. Note that when $P_i$ receives column $j$ of $B$, it can compute

column j of C_i. Thus, once P_i has seen all columns of B, it will have computed all of C_i. Although by now the operation of the algorithm should be obvious, we include it here for the sake of completeness.

HCGM-Matrix-Multiply(A, B)
1. Apply the HCGM-Circulate algorithm, where the A set consists of the rows of A, and the B set consists of the columns of B. When processor P_i receives some column of B, it computes the corresponding column of C_i.

Theorem 13. The HCGM-Matrix-Multiply(A, B) algorithm computes the matrix C using O( n3 ) computation time and O(p) supersteps on an HCGM(n^2, p, s), provided that n min p, and b nc 1. p

Proof. The correctness of the algorithm follows from the fact that the Circulate pattern ensures that every processor, P_i, sees every column of B exactly once, thereby enabling it to correctly compute C_i. The running time follows from Theorem 7 and from the fact that the work done by P_i during each round is of the form O(|A_i| |B_i| n).

5.5 Empirical Results

The algorithms for the communication patterns described in Chapter 4 and some of the algorithms in this chapter have been implemented as part of the PLEDA library, an ongoing project whose goal is to supply a portable library of efficient parallel data structures and algorithms [47]. This work builds on the LEDA library of sequential data structures and algorithms [49]. The library is written in C++ and uses MPI for message passing.

Timing results are presented for a sorting algorithm, which uses the Random-Sample and Linear-Partition patterns (see Corollary 2), and for a parallel version of the Floyd-Warshall all-pairs shortest path algorithm (see, e.g., [13]), which is based on the Circulate pattern. These results were obtained on a dedicated cluster

of workstations consisting of MHz Pentium processors interconnected by a 100MHz Ethernet switch, running Linux, and using the LAM MPI implementation. In order to simulate slow processors, a crippling process was launched on those processors in order to reduce their effective speed. Crippling processes do nothing but spin in a tight loop performing useless calculations, effectively reducing the speed of the processor to 1/2 its usual speed. For these tests up to 14 processors were used. P_0 through P_6 were run at the regular speed, while P_7 through P_13 were crippled.

Figure 5.5: Performance of CGM and HCGM versions of Sample Sort. (Axes: speed in items/sec against number of processors; series: With Load-Balancing, Without Load-Balancing.)

Figure 5.5 compares the results of using the HCGM Linear-Partition algorithm and then sorting locally against the results obtained by standard Sample Sort [32]. The test sorts a list of 2: integers, using the LEDA implementation of quicksort as the local sorting function. In both cases, the input is initially distributed in a load-balanced manner. It is clear from Figure 5.5 that the HCGM version (labelled "With Load-Balancing") of the algorithm performs much better than the standard version (labelled "Without Load-Balancing") when slow processors are introduced into the system.

Figure 5.6: Performance of CGM and HCGM versions of the Floyd-Warshall algorithm. (Axes: speed in items/sec against number of processors; series: With Load-Balancing, Without Load-Balancing.)

In order to measure the performance of another class of HCGM algorithms we implemented a CGP version of the Floyd-Warshall all-pairs shortest path algorithm which uses the Circulate pattern on the columns of the adjacency matrix. The results of running this test with n = 1: are shown in Figure 5.6. As we would expect, the HCGM version of the algorithm performs much better. With the CGM version it is faster to run the application with 7 fast processors than it is to run it with 7 fast processors and 4 slow processors, while with the HCGM version the performance improves each time a processor is added to the cluster.
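The load-balancing idea behind the matrix multiplication of Section 5.4 can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the names `partition_rows` and `circulate_multiply` are hypothetical, the speeds are plain relative weights, the usual convention C[i][j] = sum over k of A[i][k] * B[k][j] is assumed, and no message passing is performed (each "round" simply hands one column of B to every simulated processor).

```python
def partition_rows(n, speeds):
    """Split rows 0..n-1 into contiguous blocks whose sizes are
    roughly proportional to the (relative) processor speeds."""
    total = sum(speeds)
    bounds, start, acc = [], 0, 0
    for idx, s in enumerate(speeds):
        acc += s
        # The last block always ends at n so that no row is lost to rounding.
        end = n if idx == len(speeds) - 1 else round(n * acc / total)
        bounds.append((start, end))
        start = end
    return bounds


def circulate_multiply(A, B, speeds):
    """Sequentially simulate the circulation of the columns of B.

    Simulated processor i owns the rows of A in its block and fills in
    the same rows of C, one column per round.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    blocks = partition_rows(n, speeds)
    for j in range(n):                          # round j: column j of B arrives
        column = [B[k][j] for k in range(n)]
        for lo, hi in blocks:                   # every processor handles its own rows
            for i in range(lo, hi):
                C[i][j] = sum(A[i][k] * column[k] for k in range(n))
    return C


if __name__ == "__main__":
    print(partition_rows(8, [2, 1, 1]))
    print(circulate_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], [2, 1]))
```

A faster processor receives a proportionally larger block of rows, so all processors finish each round at about the same time; this is the effect measured in the experiments above.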

Chapter 6

Conclusions

This part of the thesis has defined a model for parallel computing on heterogeneous systems, and described a number of algorithms for this model. Although this model is very simple, the empirical results in Chapter 5 suggest that it captures the most important aspects of actual systems, and algorithms which the model predicts as being fast tend to work well in practice.

There are several directions in which this work could be extended. Although the HCGM model appears to be a fairly good model for developing algorithms, it is not (nor is it intended to be) an accurate predictor of exact running times. One could argue that the asymptotic analyses used in this thesis are not strong enough, since in practice processor speeds do not vary by more than constant factors. To this end, it might be useful to define an HBSP model which incorporates the BSP parameters g and L. Tests like those done in [63] could then be performed to determine whether the resulting model is an accurate predictor of performance.

We note that such extensions to the BSP model are not entirely trivial. The question of whether to apply a bandwidth limitation locally, e.g., by charging a cost of g_i for each word sent or received by P_i, or globally, as is done in the HCGM model, is not an easy one. In a recent paper, Adler et al. [1] have shown that a model in which bandwidth is restricted globally has significant theoretical advantages over a model in which bandwidth is restricted locally. Empirical testing is necessary to determine which is more appropriate, and it may be the case that the choice of global versus

local bandwidth restrictions depends on the actual parallel machine being used.

Another direction for future work is a direct comparison of the algorithms described in this thesis with algorithms based on the overpartitioning approach described in Chapter 1. The difficulty with this is that although algorithms which use the overpartitioning approach exist for some problems, most of the algorithms described in Chapter 5 do not have counterparts based on overpartitioning. Thus, regardless of the outcome of such tests, most of the algorithms in Chapter 5 are the only algorithms currently available for such problems.

One advantage which algorithms based on overpartitioning may have over HCGM algorithms is that they can adjust to changing loads in systems in which processors are shared among users. Although the HCGM model could be extended to incorporate dynamically changing processor speeds, the algorithms for this model would have to be very different from those presented in this thesis. This is due to the fact that HCGM algorithms attempt to minimize the number of supersteps, which means maximizing (within reason) the amount of computation performed between communication operations. Therefore, if processor speeds change frequently and drastically, most of the computations will be non-optimal. It seems that the CGP paradigm of algorithm design is simply not suited for a scenario in which processor speeds change dynamically.

The communication patterns of Chapter 4 are also interesting in their own right. Since most existing CGP algorithms can be expressed in terms of these patterns, they are an excellent starting point for a software framework which is to support the development of CGP algorithms. This is the direction being pursued in the PLEDA project [47]. Ongoing work in this area includes the implementation and testing of more algorithms, as well as keeping the list of communication patterns up-to-date as new algorithms are developed which use different patterns.

On a more theoretical note, it may be interesting to develop algorithms for the patterns of Chapter 4 which do not rely on randomization.

Bibliographic Notes

The work described in this part of the thesis has been accepted for publication in the proceedings of the 12th Annual ACM Symposium on Applied Computing (SAC'98) [48].

Part II

Multiresolution Surface Modeling

Chapter 7

Introduction

Triangle meshes are used in a number of fields for modelling the surfaces of real world objects. A triangle mesh is a graph which is isomorphic to a planar triangulation (c.f. [57]), each of whose vertices is assigned a position in R^3. An example of a triangle mesh which models a human head is shown in Figure 7.1. In the field of geographic information systems (GIS), triangle meshes are used to model the surface of the earth. In computer graphics, triangle meshes are used to model the surfaces of objects in virtual worlds. In CAD systems triangle meshes are used to model the surfaces of machine parts.

Often meshes are generated from data obtained using electronic imaging equipment such as laser range finders, remote sensing equipment, and satellite imaging units. As this equipment increases in fidelity, the size of the resulting meshes increases as well. One method of dealing with this dramatic increase in data size has been mesh simplification. Mesh simplification involves reducing the number of vertices in a mesh in an intelligent manner, so that the accuracy of the model is maintained as much as possible.

Mesh simplification is a time consuming process, and in many cases, manual intervention is necessary in order to preserve features of the mesh which may be important, but cannot be recognized by a computer. Additionally, mesh simplification involves simplifying a mesh to a given level of detail (number of vertices or error tolerance).

Figure 7.1: An example of a triangle mesh which models a human head.

For some applications, this level of detail may be inappropriate. If the level of detail is insufficient, then the application may produce poor or even incorrect results. If the level of detail is too high, then the application may become computationally inefficient or even infeasible.

It is this latter problem of mesh simplification that is addressed by the field of multiresolution surface modelling. Essentially, multiresolution surface modelling involves performing mesh simplification using a series of localized simplification operations to produce a sequence of progressively less detailed meshes M^n, M^{n-1}, ..., M^0. While the simplification takes place, the simplification operations are recorded. In this way, the operations can be inverted, and the original mesh can be recreated. In fact, given the mesh M^i, any of the meshes M^n, M^{n-1}, ..., M^0 can be recreated. Thus, the level of detail can be varied according to the requirements of the application. For illustration, Figure 7.2 shows an example of a multiresolution terrain model at 3 different levels of resolution.

In [38], Hoppe defines some desirable features of a multiresolution surface model:

Figure 7.2: A triangle terrain at 3 different levels of resolution (left) and its corresponding shaded images (right).

Level-of-Detail (LOD) Approximation allows for viewing and manipulating the mesh at varying levels of detail (number of vertices).

Progressive transmission allows for the transmission of a mesh across a network, so that the receiver can immediately display an approximation of the mesh, and refine this display as more data is received.

Selective refinement is the ability to view and manipulate part of the mesh at a very high level of detail while the remainder of the mesh remains at a low level of detail.

In this part of the thesis we are interested in multiresolution surface models and their application to other fields, with special emphasis on geographic information systems. The main technical contribution of this portion of the thesis is a solution to the problem of efficient selective refinement of progressive meshes posed by Hoppe in [38].² Using the ideas behind this solution, a number of applications are presented, including point location, elevation queries, visibility queries, and isoline/contour extraction (conversion to contour lines). Empirical results are presented for some of these algorithms which show that they are of considerable practical relevance. Furthermore, we sketch a representation of a multiresolution surface model in external memory.

The remainder of this part of the thesis is organized as follows: Chapter 8 discusses other approaches to multiresolution terrain modelling and reviews the progressive mesh representation upon which our work is based. Chapter 9 describes our solution to the problem of efficient selective refinement in progressive meshes. Chapter 10 discusses some applications of this solution. Finally, Chapter 11 summarizes and outlines directions for future work.

² A selective refinement scheme similar to the one described in Chapter 9 has been independently and concurrently proposed by Hoppe [39].

Chapter 8

Survey of Existing Work

The problem of level of detail approximation in triangulated meshes has received a significant amount of attention in recent years. In this chapter, we give a brief critical survey of this work. We broadly classify these schemes into three categories: (1) the tree-based schemes of De Floriani et al. [17, 19, 18], (2) the DAG-based schemes of Dobrindt and de Berg [16] and Puppo [58], and (3) the Progressive Mesh scheme of Hoppe [38].

8.1 Tree-Based Schemes

The approach to multiresolution modelling taken in [19, 18] is to generate a mesh consisting of nested triangles, and to arrange these triangles into a tree-shaped hierarchy (see Figure 8.1). At the top level of the hierarchy, the mesh consists of a fixed number of very large triangles. To refine (all or part of) the mesh, the hierarchy is searched in a top-down manner, and large (parent) triangles are replaced by groups of small (child) triangles. This process is repeated until a sufficient level of detail has been achieved.

Since the triangles of the hierarchy are arranged as a tree with small fixed degree, this scheme tends to be very efficient in practice. However, it does have some disadvantages. As is already evident in the three level hierarchy of Figure 8.1, triangles at lower levels in the hierarchy tend to become elongated, due to the fact that the long

edges of the low-resolution mesh are also present in the high-resolution mesh. Such long triangles can lead to numerical instabilities in computations, and aliasing effects in graphics applications. Methods which avoid these elongated triangles by splitting long edges are described in [18], but these can lead to vertical faces (discontinuities) in the TIN.

Figure 8.1: An example of a tree-based hierarchy.

8.2 DAG-Based Schemes

Dobrindt and de Berg [16], and De Floriani [17] address the problem of the thin triangles which occur in tree-based schemes by using a very different approach. By repeatedly deleting an independent set of TIN vertices, and retriangulating the resulting holes, a hierarchy of triangulations which has a depth of O(log n) and a total size of O(n) is achieved (see Figure 8.2). In this case, refinement is achieved by replacing a group of triangles from a higher level in the hierarchy with a group of triangles from a lower level in the hierarchy. Using this method, Dobrindt and de Berg show

that the Delaunay triangulation of the data can be maintained at all levels of the hierarchy.

Figure 8.2: An example of a DAG-based hierarchy.

While this approach solves the problem of thin triangles, it tends not to be as efficient as tree-based schemes. The storage requirements of this scheme are much larger since, on average, every vertex which is deleted generates an additional 10 pointers in the hierarchy (see Figure 8.2). Furthermore, although the hierarchy has depth O(log n), there are constant factors hidden in the big-O notation (Dobrindt and de Berg report hierarchies with depth varying between 1.7 log_2 n and 2.4 log_2 n depending on the method used to select vertices for deletion).

Puppo [58] takes the DAG-based approach one step further by describing a general DAG-based framework called the MultiTriangulation. In this scheme, a group of triangles can be replaced by another group of triangles as long as the boundaries of

the two groups match (see Figure 8.3). In a companion paper [26], it is shown that most existing schemes, including those in [16] and [19], can be expressed in terms of the MultiTriangulation (although sometimes the MultiTriangulation is less efficient than the original scheme). Although the MultiTriangulation is quite powerful and expressive, current implementations still seem to exhibit the same high overhead associated with the DAG-based schemes of Dobrindt and de Berg [27].

Figure 8.3: Some possible transformations in a MultiTriangulation.

8.3 The Progressive Mesh Representation

Hoppe's progressive mesh (PM) representation [38] is based on the edge collapse transformation and its inverse, the vertex split. An edge collapse involves collapsing an edge by identifying its two incident vertices, v_1 and v_2, into a single aggregate vertex v. The two adjacent faces (v_1, v_2, v_l) and (v_1, v_2, v_r) vanish in the process. An edge collapse and its inverse vertex split are illustrated in Figure 8.4.

In the progressive mesh representation, a mesh M is represented as a pair, (M^0, vsplits), where M^0 is a coarse-grained mesh and vsplits is a list of vertex splits

which will reproduce the original mesh when applied in order.

Figure 8.4: The edge collapse transformation and its inverse, the vertex split.

A vertex split transformation vsplit(v, v_1, v_2, v_l, v_r, A) splits the aggregate vertex v into two vertices v_1 and v_2 and adds the edges (v_1, v_l), (v_1, v_r), (v_2, v_l), and (v_2, v_r). A stores attribute information for the neighbourhood of the transformation, including, but not limited to, the positions of the two new vertices. We call v the parent of v_1 and v_2, and we call v_1 and v_2 the children of v. We say that a vertex u is an ancestor of a vertex v if u = v or if u is an ancestor of the parent of v. In this case we call v a descendent of u.

We denote by M^i the mesh obtained by performing the first i elements of vsplits on the coarse-grained mesh M^0. The original mesh, M, can be obtained by applying all the elements of vsplits in order, i.e., M^{|vsplits|} = M.

The progressive mesh construction algorithm takes a mesh, M, performs a series of edge collapse operations to get the mesh M^0, and sets vsplits to the list of vertex split operations that invert the edge collapses performed. The order in which the edge collapses are performed is determined by an application-specific fitness function (e.g., minimizing the geometric error). The edges of the mesh are placed in a priority queue based on their fitness, and are removed one at a time as they are collapsed. Edges in the neighbourhood of a collapse may have their priorities updated. A fitness function which considers the geometry, discrete attributes, and scalar attributes of the mesh is described in [38]. Sufficient requirements for an edge collapse to be valid are described in [40].
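The bookkeeping implied by these definitions can be sketched in a few lines of Python. The sketch tracks only vertex identities, not faces or the attribute record A; the names `VSplit`, `parent_map`, `is_ancestor`, and `vertices_of` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VSplit:
    """One vertex split record: v is replaced by v1 and v2 (attributes omitted)."""
    v: int
    v1: int
    v2: int
    vl: int
    vr: int


def parent_map(vsplits):
    """Map every child vertex to its parent, as defined by the split records."""
    parent = {}
    for s in vsplits:
        parent[s.v1] = s.v
        parent[s.v2] = s.v
    return parent


def is_ancestor(u, v, parent):
    """u is an ancestor of v if u == v or u is an ancestor of v's parent."""
    while v is not None:
        if v == u:
            return True
        v = parent.get(v)       # None once a vertex of M^0 has been passed
    return False


def vertices_of(m0_vertices, vsplits, i):
    """Vertex set of M^i: apply the first i vertex splits to the vertices of M^0."""
    active = set(m0_vertices)
    for s in vsplits[:i]:
        active.remove(s.v)
        active.update((s.v1, s.v2))
    return active


if __name__ == "__main__":
    splits = [VSplit(10, 1, 2, 11, 12), VSplit(1, 3, 4, 2, 11)]
    print(sorted(vertices_of({10, 11, 12}, splits, 2)))
    print(is_ancestor(10, 3, parent_map(splits)))
```

Applying all records reproduces the vertex set of the original mesh, which mirrors the identity M^{|vsplits|} = M for this simplified view.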

Chapter 9

Selective Refinement of Progressive Meshes

In Chapter 7, we saw that it may be desirable to work with a mesh at a lower level of resolution than the original data. More generally, we may wish to work with part of the mesh at a high level of resolution, and the rest of the mesh at a lower level of resolution. For example, if the mesh is a terrain representing a city, the user may only be interested in a particular part of the city. Therefore, it is not necessary (and is inefficient) to display the uninteresting parts of the city at a high level of resolution. The selective refinement problem is that of extracting a mesh which is at a high level of resolution in a query region q while leaving the mesh outside of q at a low level of resolution.

In this chapter we are concerned with the selective refinement problem on progressive meshes. In particular, we devise an algorithm to extract all the triangles of M which intersect a query region q while leaving the mesh outside the query region relatively untouched.

In [38], a brute force method of selective refinement is described which examines every vertex split record to determine whether or not it should be split. There are two disadvantages of this algorithm. The first is that the running time is O(|vsplits|) regardless of the number of splits actually performed. The second is that it is difficult to perform an exact selective refinement, that is, to extract all the triangles of M

which intersect a specified query region, q. The reason for this is that there may be vertex splits which lie outside of q, but which affect the triangles within q.

The following sections describe a new efficient method of performing selective refinement in progressive meshes. Our technique uses information about dependencies in the vsplits list to achieve better efficiency. The method is simple, robust, and easy to implement. This method is the main technical contribution of this part of the thesis.

Our approach can be summarized as follows: In a preprocessing phase, we associate with each vertex split record a region of influence outside of which splitting the vertex has no effect. By using the parent/child relationship between vertex split records, these regions of influence can be organized as a forest of rooted binary trees. When the records are organized in this manner it is possible to efficiently identify all the vertex split records whose region of influence intersects a given query region q. Once the relevant vertex split records have been identified, they can be sorted and applied in order.

9.1 Computing the Regions of Influence

We treat the vsplits list as a dependency graph in which a vertex v_1 depends on a vertex v if v is split and one of the resulting vertices is v_1. Observe that the dependency graph is a forest of rooted binary trees whose roots are the vertices of the coarse TIN, M^0, and whose leaves are the vertices of the original mesh, M (see Figure 9.1 for an illustration).

With each node v in the dependency graph we associate an axis-aligned 3-dimensional bounding box, denoted roi(v), which represents the region outside of which this vertex split has no influence. We define roi(v) recursively as follows: if v is a leaf then roi(v) is the smallest box which encloses all the neighbours of v in M, otherwise roi(v) is the smallest box which encloses roi(v_1) and roi(v_2), where v_1 and v_2 are the two children of v. These boxes, i.e., the roi's, can be computed in a bottom-up fashion during the progressive TIN construction procedure. Alternatively, these boxes can be constructed using a post-order traversal of the forest.

Figure 9.1: A sequence of vertex splits and its associated dependency graph.

The following lemma shows us that if we apply to M^0 every vertex split v such that roi(v) intersects q, then all triangles of M which intersect q will be reconstructed.

Lemma 1. Let u be a vertex incident to a triangle in M which intersects the query region q. Then, for all ancestors v of u, roi(v) intersects q.

Proof. The proof is by induction on the level of v in the dependency graph. The base case is when v = u, and is true since roi(u) contains all triangles incident to u. Next, w.l.o.g., suppose v is an internal vertex which was formed by collapsing the edge (v_1, v_2). Note that one of v_1 or v_2 is an ancestor of u, and by the inductive hypothesis one of roi(v_1) or roi(v_2) intersects q. This implies that roi(v) intersects q as well, since roi(v) contains roi(v_1) and roi(v_2).

For completeness, the following pseudocode shows how to construct the roi information. The notation neighb(v) is used to denote the neighbourhood of the vertex v in M, i.e., neighb(v) = {v} ∪ {u : (u, v) ∈ E(M)}. The notation bb(X) is used to denote the bounding box of the set X, i.e., the smallest axis-aligned box which contains X.

Build-ROI()
1  forall v ∈ V(M) do
2      roi(v) ← bb(neighb(v))
3  forall v ∈ V(M^0) do
4      Build-ROI-2(v)
5  return

Build-ROI-2(v)
1  if vsplits[v].v_1 ≠ nil then
2      Build-ROI-2(vsplits[v].v_1)
3      Build-ROI-2(vsplits[v].v_2)
4      roi(v) ← bb(roi(vsplits[v].v_1), roi(vsplits[v].v_2))
5  return
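The Build-ROI procedure translates almost directly into Python. The following is a sketch under assumptions that are not part of the original pseudocode: boxes are stored as a pair of (x, y, z) corner tuples, the forest is given as a `children` dictionary mapping each internal node to its two children, and the traversal uses an explicit stack instead of recursion to avoid deep call chains on large meshes. All function and parameter names are hypothetical.

```python
def bb_points(points):
    """Smallest axis-aligned box enclosing a list of (x, y, z) points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def bb_union(a, b):
    """Smallest axis-aligned box enclosing the boxes a and b."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return lo, hi


def build_roi(roots, children, neighbours, position):
    """Post-order computation of roi(v) for every node of the forest.

    roots      -- vertices of the coarse mesh M^0
    children   -- internal node -> (v1, v2); leaves are absent
    neighbours -- leaf vertex -> list of adjacent vertices in M
    position   -- leaf vertex -> (x, y, z)
    """
    roi = {}
    for root in roots:
        stack = [(root, False)]
        while stack:
            v, children_done = stack.pop()
            if v not in children:                    # leaf: box of v and its neighbours
                pts = [position[v]] + [position[u] for u in neighbours[v]]
                roi[v] = bb_points(pts)
            elif children_done:                      # both children already have boxes
                v1, v2 = children[v]
                roi[v] = bb_union(roi[v1], roi[v2])
            else:                                    # revisit v after its children
                v1, v2 = children[v]
                stack.append((v, True))
                stack.append((v1, False))
                stack.append((v2, False))
    return roi
```

Each node is pushed at most twice, so the traversal itself is linear in the size of the forest, in line with the O(n) bound claimed for Build-ROI.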

Lemma 2. The running time of procedure Build-ROI() is O(n).

Proof. Lines 1 and 2 of Build-ROI() take O(n) time, since each vertex of M is examined once, and each edge is examined twice (once for each endpoint). Lines 3 and 4, including the calls to Build-ROI-2, take O(n) time, since they perform a post-order traversal of the dependency graph, which is a forest of size O(n).

9.2 Retrieving the Vertex Splits

Once the vsplits list is augmented with the roi information, selective refinement can be done in the following manner: For each vertex of M^0 we search the tree rooted at that vertex and retrieve all vertex splits, v, for which roi(v) overlaps the query region, q. It is not necessary to search the children of v if roi(v) does not intersect q, since the roi's of the children of v are contained in roi(v).

Get-Split(q)
1  splits ← ∅
2  forall v ∈ V(M^0) do
3      splits ← splits ∪ Get-Split-2(v, q)
4  return splits

Get-Split-2(v, q)
1  if v = nil then
2      return ∅
3  if q intersects roi(v) then
4      return {v} ∪ Get-Split-2(vsplits[v].v_1, q) ∪ Get-Split-2(vsplits[v].v_2, q)
5  return ∅

Lemma 3. The procedure Get-Split(q) returns exactly the vertex splits v such that roi(v) intersects q.

Proof. Clearly, a vertex split, v, such that roi(v) does not intersect q is never reported. Furthermore, any vertex split which is examined in Line 3 and has a region of influence which intersects q is reported in Line 4 of Get-Split-2. Thus we need only show that all such vertices are examined in Line 3. Note that the only reason a vertex split, v, is not examined in Line 3 is if one of its ancestors, v', had a region of influence that did not intersect q. But in this case, roi(v) cannot intersect q, since roi(v) is contained in roi(v'), which does not intersect q. Therefore, every vertex split which is not examined in Line 3 of Get-Split-2 does not have a region of influence which intersects q. This completes the proof.

Lemma 4. The procedure Get-Split(q) has running time O(|M^0| + k), where k is the number of vertices, v, such that roi(v) intersects q.

Proof. We count the number of calls to the Get-Split-2 procedure, since this clearly bounds the total running time. The procedure is called at most |M^0| times in the Get-Split procedure. The number of recursive calls is at most 3k, since each vertex which is reported results in calling the procedure for at most two unreported vertices.

9.3 Sorting and Applying the Vertex Splits

Once the vsplit records are retrieved they need to be sorted and applied in order. The sorting could be done by a standard sorting algorithm, but this would take O(k log k) time. A more efficient method can be obtained by observing that the order in which the vertex splits can be applied is not necessarily unique. In fact, the only dependencies between vertex splits come from the parent-child relationship on which our data structure is based, and the left-neighbour and right-neighbour relationships. I.e., using the notation in Section 8.3, the vertices v_1 and v_2 are dependent on the vertex v, and v is dependent on the vertices v_l and v_r. Both these relationships can be represented as a directed acyclic graph of size O(k), and thus a feasible order for the vertex splits can be obtained in O(k) time by performing a topological sort (c.f. [13]).
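As an illustration of the linear-time ordering step, here is a minimal topological sort using Kahn's algorithm. It is a generic sketch, not the procedure used in the thesis (the Selective-Refine procedure later in this section handles the dependencies recursively instead); the function name `split_order` and the dictionary-based input format are assumptions.

```python
from collections import deque


def split_order(depends_on):
    """Return an order in which every split appears after the splits it depends on.

    depends_on[v] lists the splits that must be applied before v, e.g. the
    parent of v and the neighbours v_l and v_r of its parent's split record.
    """
    # Collect every node that occurs as a key or as a prerequisite.
    nodes = list(dict.fromkeys(
        list(depends_on) + [u for pres in depends_on.values() for u in pres]))
    indegree = {v: 0 for v in nodes}
    dependents = {v: [] for v in nodes}
    for v, prerequisites in depends_on.items():
        for u in prerequisites:
            dependents[u].append(v)
            indegree[v] += 1

    ready = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in dependents[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if len(order) != len(nodes):
        raise ValueError("dependency graph contains a cycle")
    return order


if __name__ == "__main__":
    # v depends on v_l and v_r; its children v1 and v2 depend on v.
    print(split_order({"v1": ["v"], "v2": ["v"], "v": ["vl", "vr"]}))
```

Every node and every dependency edge is handled a constant number of times, which gives the O(k) bound for a graph of size O(k).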

Figure 9.2: An example of the rank and range numbering.

When splitting a vertex v, it is possible that v_l (or v_r) may not exist, since it was not split. Then the dependency is with the nearest ancestor of v_l which appears in the retrieved vsplit records. This requires that we locate the nearest living ancestor of a vertex. In order to do this we use an in-order numbering of the vertices in the dependency graph, and associate with each node v a number, rank(v), which is the in-order traversal number of v, and an interval, range(v), which contains the ranks of the descendents of v as well as v itself. This information is computed in a precomputation phase. With this information, it is possible to find the nearest living ancestor of v_l by searching the neighbourhood of v for a vertex, v_l', such that rank(v_l) ∈ range(v_l'). See Figure 9.2 for an example of such a numbering.

The following functions produce the rank and range information used for finding ancestors. After these functions are run, one can determine if a vertex u is an ancestor of v by checking if rank(v) ∈ range(u).

Build-Range()
1  next ← 0
2  forall v ∈ V(M^0) do
3      next ← Build-Range-2(v, next)
4  return

Build-Range-2(v, next)
1  marker ← next
2  if vsplits[v].v_1 ≠ nil then
3      next ← Build-Range-2(vsplits[v].v_1, next)
4      rank(v) ← next
5      next ← Build-Range-2(vsplits[v].v_2, next + 1)
6  else
7      rank(v) ← next
8      next ← next + 1
9  range(v) ← [marker, next − 1]
10 return next

Lemma 5. The running time of procedure Build-Range() is O(n).

Proof. The function implements an in-order traversal of the dependency graph, which has size O(n).

The Selective-Refine function is the main entry point for the selective refinement algorithm. It first finds all the vertex splits whose region of influence intersects the query region, and then applies these splits in an order which respects all dependencies.

Selective-Refine(q)
1  splits ← Get-Split(q)
2  forall v ∈ splits do
3      if v ∈ V(M^0) then
4          Apply-Split(v, splits)
5  return

Apply-Split(v, splits)
1  if v ≠ nil and v ∈ splits then
2      Split(v, splits)
3      Apply-Split(vsplits[v].v_1, splits)
4      Apply-Split(vsplits[v].v_2, splits)
5  return

The Split function is what performs a vertex split operation. Lines 1-2 ensure that we don't split a vertex that was already split. Lines 3-6 ensure that the left and right neighbours or their nearest ancestors are present before we do the split. Line 7 actually performs the split as discussed above. The predicate active(v) is true when vertex v appears in the mesh at its current state of refinement and false otherwise. The activeancestor(u, v) function returns the active ancestor of vertex v which is adjacent to vertex u in the current mesh. This is determined by searching the neighbourhood of u for the vertex v' which satisfies rank(v) ∈ range(v').

Split(v, splits)
1  if not active(v) then
2      return
3  while [not active(vsplits[v].v_l)] and [activeancestor(v, vsplits[v].v_l) ∈ splits]
4      Split(activeancestor(v, vsplits[v].v_l), splits)
5  while [not active(vsplits[v].v_r)] and [activeancestor(v, vsplits[v].v_r) ∈ splits]
6      Split(activeancestor(v, vsplits[v].v_r), splits)
7  Perform-Split(vsplits[v])
8  return

Lemma 6. The running time of the Selective-Refine procedure is O(|M^0| + kd), where k is the number of vertices whose region of influence intersects q, and d is the average degree of these vertices.

Proof. The running time of the Get-Split procedure has been shown to be O(|M^0| + k) in Lemma 4. Therefore, we need only prove the running time of the Apply-Split procedure. Note that the number of calls to the Apply-Split procedure is at most

3 times the number of calls to the Split procedure. In turn, the number of calls to the Split procedure is at most 2k, since for each v ∈ splits, Split is called at most once from Apply-Split and once recursively. Disregarding recursive calls, the running time of Apply-Split is constant, and the running time of Split is O(d'), where d' is the degree of the vertex being split. Amortizing this value over all calls to Split we get a running time of O(kd). This completes the proof.

9.4 Analysis and Comments

Finally, we summarize with a theorem describing our selective refinement scheme and comment on other practical aspects of the scheme.

Theorem 14. Given a mesh in the progressive mesh representation, the selective refinement scheme uses O(n) preprocessing time and O(|M^0| + kd) query time, where |M^0| is the size of M^0, k is the number of vertex splits whose region of influence intersects the query region, and d is the average degree of the vertices retrieved.

Proof. The preprocessing resources follow from Lemma 2 and Lemma 5. The query time follows from Lemma 6.

At this point we note that it is also possible to generalize this scheme to perform selective refinement on any mesh M^t ∈ {M^0, ..., M^{|vsplits|}}. In this case, the query region is returned at exactly the level of detail at which it appears in M^t. To achieve this generalization, we simply "prune" the search when a vertex split record is reached whose index is greater than t. In analyzing the running time of the generalized algorithm, the value of k is defined in the same manner as above, but with respect to the vertex split sequence v_1, ..., v_t.

From the pseudocode, it is easy to see that the preprocessing and selective refinement procedures are quite simple, and hide only small constants in the big-O notation (see Section 9.6 for empirical results). Another merit of this scheme is that since it uses only axis-aligned bounding boxes, the search procedure can be implemented using only comparison operations and is therefore not subject to the rounding errors inherent in floating point arithmetic computations.
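The comparison-only search can be made concrete with a short Python sketch of the pruned traversal performed by Get-Split. The representation is an assumption made for illustration: boxes (including the query region q) are pairs of (x, y, z) corner tuples, `children` maps each internal node of the dependency forest to its two children, and `roi` is a precomputed dictionary of boxes. The names `boxes_intersect` and `get_splits` are hypothetical.

```python
def boxes_intersect(a, b):
    """Axis-aligned box overlap test; uses coordinate comparisons only."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def get_splits(roots, children, roi, q):
    """Report every node v whose region of influence roi(v) intersects q.

    A subtree is skipped as soon as its root's box misses q, because the
    boxes of all descendants are contained in that box.
    """
    found = []
    stack = list(roots)
    while stack:
        v = stack.pop()
        if not boxes_intersect(roi[v], q):
            continue                          # prune: nothing below v can intersect q
        found.append(v)
        stack.extend(children.get(v, ()))     # leaves have no children
    return found


if __name__ == "__main__":
    children = {6: (0, 5), 0: (1, 2), 5: (3, 4)}
    roi = {
        6: ((0, 0, 0), (6, 5, 5)),
        0: ((0, 0, 0), (1, 0, 0)),
        5: ((5, 5, 5), (6, 5, 5)),
        1: ((0, 0, 0), (1, 0, 0)),
        2: ((0, 0, 0), (1, 0, 0)),
        3: ((5, 5, 5), (6, 5, 5)),
        4: ((5, 5, 5), (6, 5, 5)),
    }
    print(sorted(get_splits([6], children, roi, ((0, -1, -1), (0, 1, 1)))))
```

Since no arithmetic is performed on the coordinates, the test gives the same answer whether the coordinates are integers or floating point values.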

Figure 9.3: An example of the case where v_l' = v_r'.

9.5 Extensions and Implementation Notes

9.5.1 Dealing with Missing Neighbours

In Section 9.3, it was noted that when splitting a vertex v, one (or both) of the vertices v_l and v_r may not be present in the mesh. In this case we suggest using the nearest living ancestors of v_l and v_r; call these v_l' and v_r', respectively. An important point to note when implementing this is that it may be the case that v_l' = v_r'. In this case, the two resulting triangles (v_1, v_2, v_l) and (v_1, v_2, v_r) are identical, and are only connected to the mesh by a single edge (see Figure 9.3 for an example). Depending on the application, these types of triangles may or may not cause problems. For display purposes, a reasonable solution is to simply flag these special triangles and not render them.

Another approach to solving this problem, which is proposed in [39], is to force splits of v_l' and v_r' and their descendants until v_l and v_r are recreated, at which point the vertex v can be split. This approach seems to work well, and prevents extremely abrupt changes in resolution which may be visually unpleasant. Unfortunately, it may also significantly increase the number of vertices which are split, thereby increasing the running time of the selective refinement algorithm.
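The nearest-living-ancestor rule and the degenerate case v_l' = v_r' can be sketched as follows. This is a minimal Python illustration under assumed data structures: the `parent` map (child vertex to the vertex it was split from) and the `alive` set of vertices currently in the mesh are hypothetical names, not taken from the thesis.

```python
from typing import Dict, Optional, Set, Tuple


def nearest_living_ancestor(v: int,
                            parent: Dict[int, Optional[int]],
                            alive: Set[int]) -> int:
    """Walk up the (assumed) vertex hierarchy until a vertex in the mesh is found."""
    cur: Optional[int] = v
    while cur is not None and cur not in alive:
        cur = parent.get(cur)
    if cur is None:
        raise ValueError(f"vertex {v} has no living ancestor")
    return cur


def resolve_neighbours(v_l: int, v_r: int,
                       parent: Dict[int, Optional[int]],
                       alive: Set[int]) -> Tuple[int, int, bool]:
    """Return (v_l', v_r', degenerate).

    The flag is True when both neighbours resolve to the same vertex, in
    which case the two new triangles coincide and can be flagged so that
    they are not rendered.
    """
    vl = nearest_living_ancestor(v_l, parent, alive)
    vr = nearest_living_ancestor(v_r, parent, alive)
    return vl, vr, vl == vr
```

A vertex that is already in the mesh is its own nearest living ancestor, so the same call covers the ordinary case in which both neighbours are present.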

Figure 9.4: An example in which a triangle not in M intersects q.

9.5.2 Invalid Triangles in Query Region

Although the selective refinement scheme described in this chapter guarantees that all the triangles of M which intersect the query region q are extracted, it does not guarantee that there are no other triangles, not in M, which intersect the query region. Figure 9.4 shows an example in which this occurs. The figure shows two edge collapses, and the resulting mesh with the roi boxes shown as dashed lines. The query region q intersects a triangle which is not in M, but does not intersect any of the roi boxes.¹

For the applications described in Chapter 10 this is not a problem. However, for some applications it may be necessary to avoid this situation. In order to do so, we must ensure that every triangle △abc that is in an intermediate mesh is completely covered by roi(a) ∪ roi(b) ∪ roi(c). In this way, if the triangle intersects q, then so does one of roi(a), roi(b), roi(c), and the triangle will not remain in the selectively refined mesh since one of the vertices of the triangle will have been split.

A simple way to achieve this is to augment the definition of roi so that for a vertex v, roi(v) contains all triangles incident to v. If this approach is combined with Hoppe's method of forcing vertex splits (see Section 9.5.1), then the query region q will not intersect any triangles not in M, since every triangle not in M will be covered by the roi of each of its vertices. Unfortunately, this increases the running time of the selective refinement procedure on two counts. Firstly, since the size of the roi increases, the likelihood that a query region will intersect it also increases, which increases the number of vertex splits which are reported. Secondly, using Hoppe's method of forcing vertex splits results in increased running time as well.

¹ The same situation can occur in the selective refinement scheme described by Hoppe in [39].

9.6 Empirical Results

In order to verify the viability of the bounding box approximation to the region of influence used in our selective refinement algorithm, some empirical tests were performed on triangulated terrains (TINs). Tests were performed on random TINs as well as two real TINs, containing […] and […] points each. Random TINs were generated by choosing uniformly distributed points in the unit square and then computing a Delaunay triangulation of these points. We measured and compared our results on random TINs with those on the two real TINs, with the same number of vertices, and found that the results obtained were almost identical.

Figure 9.5: Performance of the selective refinement algorithm for medium and large query regions. (Plot of splits retrieved / vertices in region against number of vertices (x1000), for large and small windows.)

Figure 9.5 shows the ratio between the number of vertices of the TIN, T, in a query region and the number of vertex splits which are retrieved when selectively refining the region. The test takes a query window of a fixed size, places it at 2500 regularly spaced locations on the TIN, and performs selective refinement at each location. Both small (1/25 of the TIN's surface area) and large (1/4 of the TIN's surface area) query windows were tested. The main result of these tests is that the ratio between the number of splits performed and the number of vertices in the query window converges to a small constant (< 3) as n increases.

Figure 9.6: Performance of the selective refinement algorithm for small query regions. (Plot of maximum and average number of splits performed against number of vertices (x1000).)

Figure 9.6 shows results for a query region consisting of a single point on the surface of the TIN. Again, the query point is placed at 2500 regularly spaced locations on the TIN and selective refinement is performed. Figure 9.6 shows the results for TINs with up to […] vertices, and shows that even the worst-case running time tends to be logarithmic in the size of T. Also of interest are the absolute values in Figure 9.6, since these show that the constants are quite small. In no case does the number of vertex splits performed exceed 70.

Taken together, these two sets of experiments suggest that the running time of the selective refinement algorithm is of the form O(log n + k), where k is the complexity of the TIN in the query region, and n is the number of vertices in the TIN. Thus, at least empirically, the selective refinement procedure exhibits optimal behaviour up to small constant factors.

Chapter 10

Applications

The triangulated irregular network (TIN) is one of the basic models for representing geographical data, in which a triangulated set of points is stored together with its elevations. The TIN is essentially a mesh with the added constraint that no two points on its surface have the same projection on the (x, y) plane. TINs were introduced in 1978 (see [55, 56, 33]) and they are a fundamental data structure in GIS and related areas.

In this section, we describe some applications of our selective refinement scheme, with particular attention paid to applications to TINs. We show that meshes stored in the progressive mesh representation naturally support a number of operations common in both computational geometry and geographic information systems. Because of the small constants involved in the PM representation, the algorithms presented in this section are competitive with existing algorithms which operate on meshes in a standard representation. The advantages of these algorithms over existing algorithms are:

1. these algorithms work on meshes in the PM representation, maintaining all the advantages of the PM representation,

2. these algorithms require little or no additional preprocessing or storage beyond the progressive mesh construction process, making them, in some cases, more efficient than existing solutions, and

3. by using the "pruning" technique described in Section 9.4, all the algorithms presented here can be used to efficiently solve approximate versions of the problem in question, i.e., the problem can be solved on any of the meshes M^0, ..., M^|vsplits|.

Throughout the remainder of this chapter we use T when discussing a TIN or a planar triangulation in the same manner as we have previously used M to denote an arbitrary mesh.

10.1 Point Location

Point location in a triangulation is a well studied problem in computational geometry, and a number of algorithms exist. The point location problem consists of determining, given a planar triangulation, T, the face, f, in which a query point lies. Although theoretically optimal algorithms (O(n) preprocessing time and O(log n) query time) exist for the point location problem, these algorithms have large constants hidden in the big-oh notation which make them less useful in practice. This is evidenced by the fact that most real world implementations use schemes that are less than optimal but which work well in practice [34, 50].

The selective refinement procedure described in Chapter 9 can be used as an efficient method for locating a point in a triangulation. The motivation for this application is twofold: (1) the selective refinement algorithm is fast in practice, and thus should yield a fast point location algorithm, and (2) if a triangulation is already in the progressive mesh representation then point location can be performed without any additional preprocessing.

To perform point location on a progressive mesh, we perform selective refinement in which the query region is a vertical line, namely the vertical line which passes through the query point q (see Figure 10.1). The selective refinement algorithm is run and the triangle in which q lies is found. The running time of this algorithm is clearly O(|M^0| + kd) where k and d are defined as in Chapter 9.
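The two geometric tests involved in such a query can be sketched in Python. This is an illustrative sketch, not the thesis's implementation: it assumes boxes are stored as (minimum corner, maximum corner) pairs and that the triangles extracted around q are available as a plain list of 2D vertex triples, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def vertical_line_hits_box(q: Point2, box: Tuple[Point3, Point3]) -> bool:
    """A vertical line through q meets an axis-aligned box iff q lies in the
    box's projection onto the (x, y) plane. Only comparisons are needed."""
    (xmin, ymin, _), (xmax, ymax, _) = box
    return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax


def orient(a: Point2, b: Point2, c: Point2) -> float:
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def point_in_triangle(q: Point2, a: Point2, b: Point2, c: Point2) -> bool:
    """True if q lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = orient(a, b, q), orient(b, c, q), orient(c, a, q)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def locate(q: Point2,
           triangles: Sequence[Tuple[Point2, Point2, Point2]]) -> Optional[int]:
    """Scan the extracted triangles and return the index of one containing q."""
    for i, (a, b, c) in enumerate(triangles):
        if point_in_triangle(q, a, b, c):
            return i
    return None
```

Unlike the box test, the orientation test uses floating point multiplication, so a robust implementation may need exact or filtered predicates for points that lie very close to an edge.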
If our objective is to perform mostly point location queries in a triangulation, then the fitness function that can be used in the construction algorithm to obtain a progressive mesh is to collapse the edge (v_1, v_2) such that the resulting region of influence is the smallest among all possible edge collapses. A similar heuristic is used in the construction of R-Trees [37].

Figure 10.1: Expressing a point location query as a selective refinement query.

10.2 Isoline Extraction

The problem of isoline or contour line extraction is stated as follows: given a query elevation h, return all triangles of T which occur at elevation h. Given the triangles which occur at elevation h, the polygons and polygonal chains which form the isolines can be found in O(k) time (where k is the number of triangles) by starting at an arbitrary triangle and walking along triangles until a triangle already visited is reached, or the border of T is reached. When this occurs, another unvisited triangle is chosen, and the same procedure is applied.

In [62], van Kreveld presents an algorithm for isoline extraction based on the interval tree [24, 46]. An interval tree stores intervals on the real line and can answer queries of the form: find all intervals which overlap the query point q. By storing the minimum and maximum height of each triangle in an interval tree (along with a pointer to the triangle itself), isoline extraction reduces to interval tree queries.
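The reduction to interval queries can be made concrete with a short Python sketch. It is only an illustration: a linear scan stands in for the interval tree (so it takes O(n) rather than O(log n + k) time), the function names are hypothetical, and the segment computation handles only the generic case in which h does not equal the height of any vertex.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Triangle = Tuple[Point3, Point3, Point3]


def triangles_at_elevation(triangles: Sequence[Triangle], h: float) -> List[int]:
    """Indices of triangles whose height interval [min z, max z] contains h.

    A linear scan is used here in place of an interval tree query.
    """
    hits = []
    for i, tri in enumerate(triangles):
        zs = [p[2] for p in tri]
        if min(zs) <= h <= max(zs):
            hits.append(i)
    return hits


def crossing_segment(tri: Triangle, h: float) -> List[Tuple[float, float]]:
    """Endpoints, in the (x, y) plane, of the isoline segment inside tri.

    Assumes the generic case: h differs from every vertex height, so the
    level plane crosses either two edges of the triangle or none.
    """
    pts = []
    for p, q in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
        if (p[2] - h) * (q[2] - h) < 0:          # endpoints on opposite sides
            s = (h - p[2]) / (q[2] - p[2])       # interpolation parameter in (0, 1)
            pts.append((p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1])))
    return pts
```

Chaining the segments of adjacent triangles, as in the walking procedure described above, then yields the polygons and polygonal chains of the isoline.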

Figure 10.2: Expressing an isoline query as a selective refinement query.

Construction of the interval tree takes O(n log n) time, and queries take O(log n + k) time, giving the preprocessing time and query time, respectively, for isoline extraction on a TIN stored in a standard representation.

For answering isoline queries on a progressive TIN, we simply perform selective refinement in which the query region is a level plane of height h, and report all triangles of T which intersect this plane. An example of this approach is shown in Figure 10.2. The running time of this algorithm is O(|M^0| + kd) where k and d are defined as in Chapter 9. From this, it follows that the edge collapse sequence can be optimized for isoline extraction by always collapsing the edge which produces the smallest interval in the resulting supervertex. This heuristic is similar to that suggested in Section 10.1 for optimizing the edge collapse sequence for point location.

Progressive TINs also support progressive transmission of isolines. To perform progressive transmission of isolines, we note that a vertex split operation can only affect isolines in the neighbourhood of the split vertex, and so maintaining isolines under the vertex split operation is a straightforward localized matter. Therefore,
