* (4.1) A more exact setting will be specified later. The side lengthsj are determined such that

Size: px
Start display at page:

Download "* (4.1) A more exact setting will be specified later. The side lengthsj are determined such that"

Transcription

1 D D Chapter 4 xtensions of the CUB MTOD e present several generalizations of the CUB MTOD In section 41 we analyze the query algorithm GOINGCUB The difference to the CUB MTOD occurs in the case when the cube is empty : instead of calling the bruteforce method the GOINGCUB enlarges the cube with center until it contains data points In Section 42 we extend the algorithm GOING CUB to compute the nearest neighbors to some query point from the data set Next we generalize in Section 43 the expected runtime analysis of our algorithms to other wellbehaved probability distributions In Section 44 we develop extensions of our algorithms which work efficiently in the externalmemory model of computation introduced in [2] Section 45 shows an experimental comparison of implementations of the presented variants of the CUB MTOD e emphasize the factors by which the algorithms are faster than the bruteforce method A part of the results presented in this chapter has been published in [39] 41 The GOINGCUB variant If the cube considered by the CUB MTOD is empty the alternative to determining the nearest neighbor by bruteforce is to consider a larger cube with center and determine the points contained in it The size of the current cube should be increased as long as it contains no points of This searching algorithm is called the GOINGCUB method The first cube around contains an expected number of points is the expected number of points which are contained in the cube considered in the th iteration ie The side length of this cube is determined by the procedure SID_LNGT as described in Section 212 SID_LNGT gets a sequence of parameters # $%' ' + ' ' such that 687:9 A@CBD+ 5?> 41 A more exact setting will be specified later The side lengthsj are determined such that KLM+ > 42 The sequences ' and are strictly monotone increasing The termination of the algorithm is ensured and we define OPQK 6S7T9 BJD UV The following gives a schematic description of the algorithm 53

2 GOING_CUB repeat t:t+1 if : SID_LNGT : SAC_CUB else : BUT_FOC until return SAC_CUB stands for one of the orthogonal range searching procedures SCAN JCT and SCAN_JCT ocedure SID_LNGT was introduced in Section 212 It computes the side lengths of the cubes #%$ The order of dimensions corresponds to the increasing order of the side lengths of a box '+ This order depends only on the query point and is also computed by the SID_LNGT procedure but just only once in the first iteration in time e measure the running time 57689: of the GOINGCUB method by the number of performed comparisons and arithmetic operations For the expected runtime analysis we introduce the discrete random variable that represents the number of points of contained in the cube To enable compactness of formulas we define > Let? be the A random variable representing the number of tests required by an independent call of SAC_CUB to determine the points of contained in the cube The expected runtime of the GOINGCUB method is given by: where $ 57D GCJ 8KFM $ 5B689C: $ 57DF G I $ 57D GCJ 8LKNM I $ 5PO 8LQ G is the expected runtime of the SAC_CUB procedure: $ 5PD GJ 8KFM S T@ 5 VUX'Y[Z\ $? ' 5 B 43 Y'Z where $ is the corresponding constant of the SAC_CUB procedure 5PO 8Q G is the expected runtime of the bruteforce part of the GOINGCUB method: $ 5PO 8LQ G S T 5 ] U ^Y[_% $ ' 5 B 44 Y'_ where is the corresponding $ constant of the bruteforce method The expected runtime 5`DF G required to compute the side lengths of the cubes of dimensions is given by: $ 57DF G L +S T where Y[a is the constant of the SID_LNGT procedure 5 U 'Yab 1032# 54 and the order C 45

3 # # # # Next we present the expected runtime analysis of the SAC_CUB part for each of the 3 searching strategies : SCAN JCT and In the following we first give bounds on $ 5 DN G SCAN_JCT The analysis uses a bound on the probability that the cube and $ 5PO 8LQ G is empty: T U is empty e used for some random point $ ' ] The expected runtime to determine the side lengths of the cubes can be bounded by using 46 for OPQ for and 45: $ L 57DN G Ya 1032# L Yab ] +S +S Ya 1032# Ya Let ' with be discrete random variables such that point e use 1S and T U part as follows: $ 5PO 8LQ G Y[_% e used T S Y _ Y _ Y _ Y[_% 5 T S S S S U 5 ] S S S T T T e bound the expected runtime of the bruteforce U S 5 $ 5 U$ 5 5 ' U ' S# xpected runtime of GOING_CUB when searching with SCAN e choose the sequence such that ] U%$ 5 $+$ e specify the value later By 47 and by $ 57DF G OPQ we obtain: 1032B

4 Since for and we obtain by 48 and 42 the following: $ BO 8LQ G Y _ ] 5 Y _ +S Y[_% ] 'Y[_% 411 To estimate the scanning part we recall the analysis of the SCAN procedure which we presented in Section 212 e focus on the number 5 D K J of performed tests of coordinates In Section 212 we denoted by the box obtained by relaxing the side lengths of the cube to unit intervals with respect to those dimensions different from the first ones in the list For OPQ we consider: $ 5 D K J C C 5 4 C S where 5 D K J is the number of comparisons required by an independent call of SCAN event 5 and the event ] ' are independent for all we obtain C 5 5 C ] Because of which implies: By 46 and 414 we obtain: where $ 5 D K J by If 49 we obtain: C then $ 5 D K J $ 5 D K J the following holds: $ $ 5 D K J +S C C L +S S 5 T $ $ 5 U 5 $ 5 D K J $ 5 D K J $ 5 D K J C C B $ 5 which we obtain by 27 Note that $ 5 D K J Let be the smallest such that S C 5 +S Since the is also bounded Then by 42 and

5 since Now by 42 and 49 we get implies for Thus the inequality $ 5 D K J 5 5 For such that holds and by 417 we get: 7 for we have: OPQ which 418 By and 418 we obtain an upper bound on the expected runtime of the GOING_CUB method: The function obtain $ 5B68L9: $ 5B689C: 7 is monotone increasing in $ on Thus we choose xpected runtime of GOING_CUB when searching with JCT $ Obviously 5 68L9: $ 5 8 G G 7 K where 5 8 G G K is the number of comparisons required by JCT for a cube expected to contain $ points By 35 we have 5 8 G G K 7 $ The aim is to prove that 57689: P X The function P takes its minimum at ' e set Because of the small probability of an empty first cube we choose here the sequence to grow more slowly than in the case of SCAN: By 47 we obtain: ' 4 $ 57DN G 1032# $ In order to bound 57O 8LQ G 5 5 $ Now by 48 we obtain the following bound on 5 O 8Q G : $ 5PO 8LQ G ] we use the following inequalities which we obtain by 42: where Y_ Y[_% 5 +S Y_ ] Y[_% +S Y _ ^Y _ +S OPQ 5 and

6 T By for we have : S This together with 420 implies $ 5PO 8LQ G S Y _ S Y _ For the runtime analysis of the procedure JCT we focus on the number of tests of coordinates and tests of bit array entries In the following we bound $ 5 8 G G K 5 for OPQ Let ' be discrete random variables such that $ e obtain: $ 5 8 G G K [ 5 S S points outside ' 5 ] U [ S Since 5 and ] are independent events for all we have 5 [ 5 ' To enable compactness of formulas we set for all and ] Note that 5 for OPQ Furthermore the side length of the first cube is smaller than the side length of the th cube Thus by 424: points outside ' By 31 and 32 we have S Thus S 5 L points outside ' Thus by 423 and 428 we obtain for $ 5 8 G G K 5 [ OPQ : 58 S [

7 This together with 43 and 421 implies: Summarizing $ 5PD GJ 8KFM $ X 5P68L9: Y[Z 7 Y[Z 7 'Y[Z 7 B where $ thus 5P689C: xpected runtime of GOING_CUB when searching with JCT_SCAN e recall the analysis of the JCT_SCAN procedure presented in Section 312 If the expected number of points in 7 fulfills A then JCT_SCAN consists essentially of the SCANpart since rejecting is done only with respect to one of the dimensions In this case the 7 expected running time of JCT_SCAN is which is also the running time of SCAN 7 if Thus for reasons of simplicity if then we consider the GOINGCUB method to call the searching algorithm SCAN In this case the expected runtime of the method is P Assume $ e prove that 5 689C: when searching is done by JCT_SCAN Observe that the function $ is monotone increasing in e choose the sequence such that $ The expected running times 5`DN G $ and 5PO 8LQ G can be bounded as shown in 419 and 422 respectively To estimate the expected running time required to search the cubes for points we use 43 and prove a bound on 5 8 G _D K J 5 where OPQ 5 8 _D K J is the number of comparisons required by an independent call of JCT_SCAN and is composed of the number of performed comparisons to partition the dimensions and the number of tests of coordinates and tests of bit array entries required in the rejectpart and scanningpart Let be the partition of variables done in the th iteration by JCT_SCAN It is computed to fulfill 327 and 330 and it can be determined with at most comparisons Let be the points which were not rejected in the th iteration Thus contains all points such that $ for all dimensions Only these points are scanned with respect to the dimensions of e have: 5B8 G _D K J C 5 ] 5B8 G G K 5 5 D K J where 5B8 G G K and 5 D K call of and by an independent call of JCT The reject part is given by 5#8 G G K 5 ] are the numbers of comparisons required by an independent respectively points outside

8 T U e recall the arguments used in the analysis of the expected runtime of GOING_CUB with the searching algorithm JCT By 423 and 430 we obtain: By and 433 we obtain: 5 8 G G K points outside 5 5 ] Let? be the number of comparisons required to scan some data point with respect to its dimen sions in Then the scan part of the expected runtime is given by 5 D K J 5 5? 5 which holds since events with respect to different dimensions of the points are independent By arguments similar to those in 412 and 413 we get? 5? Based on the analysis of SCAN property 327 of partition and arguments of 25 we obtain for OPQ :? 5? 435 $ ' where is the geometric mean of the side lengths of the box are the points contained in the box which is obtained by relaxing the side lengths of the cube to unit intervals except those with respect to the dimensions in In analogy to 413 the following inequalities are easily seen to hold: where is some data point Following the arguments of 329 and 334 in Section 312 we obtain for OPQ :? This combined with and 43 implies by using T T U $ 57D GCJ 8LKNM U Y[Z Y[Z 7 7 e summarize the results in the following theorem S Since we chose 60 $ we obtain 5768L9:

9 Theorem 41 The GOING_CUB method finds the nearest neighbor from the set some query point and has the expected asymptotic runtime of the CUB MTOD of data points to In practice GOING_CUB may perform better than the CUB MTOD This consideration is confirmed by the experiments in Section 45 and it is based on the fact that the GOINGCUB starts with a smaller cube Furthermore the CUB MTOD calls the bruteforce method if the considered cube does not contain any points of 61

10 42 NearestNeighbor Search 421 The algorithm The GOINGCUB method can be extended to compute the nearest neighbors of the query point from the point set The extended GOINGCUB method works as follows : the points in the current cube are determined if still less than points have been found so far then a larger cube is considered e use the notations introduced in Section 41 In the th iteration the procedure SID_LNGT is called with the parameter $ # and computes side lengths such that For the first iteration we choose GOING_CUB repeat 4 The following gives a schematic description of the algorithm if then : SID_LNGT C : SAC_CUB 5 else until if then 5 SLCT 5 else return 5 5 ST_UPDAT SLCT Set contains the nearest neighbors to the query point is the set of points found by the end of the which is together with 5 th iteration represents the set of points outside the cube the result of a slightly modified SAC_CUB The iteration terminates if If X the additional 5 nearest neighbors A which we still need are extracted from the set by the SLCTprocedure The procedure SLCT computes the distances of all points of to the query selects the smallest distances and returns as result A the corresponding points from Using a linear time selection algorithm the running time of SLCT Y'Z Y Z for some appropriate constant e provide the set of points outside since we do not want to test in the next iteration the points of which have been found so far e need some small modifications of SAC_CUB which is one of the searching procedures SCAN JCT and JCT_SCAN: 62

11 Since SCAN checks each point of its input set it can simply maintain two index lists for the points inside and for those outside of the current cube JCT gets the set 5 as a bit array where an entry is set to 1 if and only if this represents the index of a point outside After the rejecting phase the bit array contains 1entries exactly for the points in and represents the result e built the bit array by the bitwiseand between 5 and the bitwisecomplement of JCT_SCAN has the bit array 5 as part of the input At the end of the rejecting phase we have a bit array with 1entries exactly for the points to be scanned in this th iteration The scanning phase has as result and bit array is built as described above 7$ The order of dimensions corresponds to the increasing order of the side lengths of a box '+ This order depends only on the query point and is computed by the SID_LNGT procedure only once in the first iteration in time 422 The expected runtime analysis e introduce the discrete random variable? to represent the total of tests of coordinates and tests of bit array entries required by an independent call of SAC_CUB 5 Let be here the discrete $ random variable for the number of points found during the first iterations e have for % and we define % In practice the numbers could be chosen to depend on the number of nearest neighbors which are still to be found thus if 5 then 5 5 The analysis of the algorithm with random variables is involved e consider a fixed sequence that is known at the beginning of the method and which satisfies 41 The termination of the algorithm is ensured and we can define OPQ $ The expected runtime $ 5P68L9: of the GOINGCUB is given by: 5B689: $ 5PDN G I $ 57D GJ 8KFM $ 57D G $ where 57D GCJ 8KFM is the expected runtime of the SAC_CUB procedure: $ PD GJ 8KFM 5 ^Y[Z $? ' $ +S and 57D G is the expected runtime of step SLCT: $ D G 5 +S 'Y[_ Y'Z Y[_ The values and are the corresponding constants $ of the SAC_CUB procedure and of the SLCTprocedure respectively The expected runtime 5 DN G for the computation of the side lengths of the cubes is given by: $ 5PDN G L +S T 5 U Ya 1032 where Y[a is the constant of the SID_LNGT procedure introduced in Section 212 In the following we first bound the probability time of SLCT SID_LNGT and SAC_CUB and then determine the expected running

12 The probability of cube enlargement To bound the probability we use the following theorem see [22] based on the Chernoff Bound Theorem 42 [22] Let be the discrete variable which counts the total number of successes by performing independent Bernoulli trials with the success probability of a trial Then 8 for each OPQ L where is the variance of Taking 8 in Theorem 42 we get the bound: Corollary 41 Let be the discrete variable which counts the total number of successes by performing independent Bernoulli trials with the success probability of a trial Then where OPQ Corollary 42 is the discrete random variable for the number of nearest neighbors found by GO ING_CUB during the first iterations and a constant such that $ where OPQ Then the following holds: 440 oof $ ' Let be the random variable for the number of points of within Thus and for some point $ '+ the probability equals Let Because of we get by Corollary 41: By using L L L ' we obtain the stated inequality e choose to fulfill OPQ with OPQ $ e have By 440 and we obtain: ] where e specify the exact value of later ]

13 # # The total expected runtime of the GOINGCUB method e introduce some preliminary notations and observations The event 5 following 2 disjoint events for any : is the union of the $ S# 5 S# 5 $ 443 where point for any 5 e denote by 5 and by the left event and the right event of the union in 443 respectively e will use the following inequality: S# oposition 41 The expected runtime of step SLCT is S# S# oof e have: $ D G Y[_% 5 $ 5 5 +S Y[_% S S Since variables are independent we obtain by and 445 the following for OPQ : S# 5 S# 5 e used the fact 5 5 If then obviously Altogether we have: $ 5PD G Y[_% Y[_ Y[_% Y[_ In a similar way we bound the expected running time to compute the side lengths of the cubes L and 442 and we obtain: and the order e proceed in analogy to 47 use $ 57DF G 5 ^Ya # 446

14 N In the following we bound the expected runtime $ 5 D GCJ 8KFM for each of the considered searching procedures SCAN JCT and JCT_SCAN OPQ we obtain A If SCAN is used then for iteration $ 5? ' S where $ '+ and is as introduced in Section 212 the box obtained by relaxing the side lengths of the cube to unit intervals with respect to those dimensions different from the first ones in The equation 447 is easily verified by adapting the analysis of SCAN to the fact that only points outside are scanned By using and the fact that the points are drawn independently at random we obtain: S# This implies together with 22 and $ 5PD GJ 8KFM Y[Z +S 5 $ 5 D K J B C where 5 D K J is the total number of tests required by an independent call of SCAN Clearly 5 D K J C $ and by 27 5 D K J C $ for OPQ Note that the upper bound on 5`D GCJ 8KFM procedure is used e choose the smallest possible value of which fulfills 441 that is If then: 7 C 7 $ Thus by using 437 5`D GJ 8KFM $ 57D GJ 8KFM is monotone increasing in for this case when the SCAN can be bounded as follows: 7 7 If we have for big enough and In this case if the smallest such that Since 66 It can be shown that then we have by 442: L L Let be

15 T By 442 we get: $ 5PD GJ 8KFM procedure SCAN is 5 for If L ith this we obtain then the expected runtime of the searching Thus by 446 and oposition 41 the expected runtime of the GOINGCUB method with the searching procedure SCAN is for Notice that our method is only of interest if is significantly smaller then since tests are needed in any case and the bruteforce method cannot be beaten asymptotically by the GOING CUB method if B B e consider the procedure JCT to determine the point sets By 437 we obtain: $ L 5PD GJ 8KFM 5 'Y[Z $ 5B8 G G 5 P K L ' +S where 5 8 G G K is the number of comparisons required by JCT for a cube expected to contain points For the runtime analysis of the procedure JCT we focus on the number of tests of coordinates and tests of bit array entries e will count also the additional bit operations to build bit array The discrete random variables ' are defined as in Section 41 such that $ e obtain: $ 5B8 G G K C 5 ' points outside 5 ' U S Thus $ 57D GCJ 8LKNM Y[Z Y Z $ The essential step is to bound +S S S S S $ S 5 5 for is the event stating that dimension of point lies outside the interval Observing that 5 we obtain: 5 5 $ 5 ] ] L [ OPQ where $ S# 5 451

16 where is the side length of the first cube e also used 444 the fact that the points are drawn independently at random and that in each iteration the side length increases Obviously since 5 5 the inequalities and 427 imply: $ 57D GCJ 8LKNM Y[Z 7 Y[Z 7 7 Thus the expected 7 $ runtime 57689: X JCT is where ] The function P with takes its minimum at $ and is monotone increasing on e choose % If % $ then 5B689: P $ otherwise 5P689C: X T U of the GOINGCUB method with the searching procedure C e consider the procedure JCT_SCAN to determine the point sets based on the results obtained in Section 312 and Section 41 If The analysis is 7 then the GOINGCUB method calls the searching algorithm SCAN as an extremal 4 case of the procedure JCT_SCAN In this case the expected runtime of the method is 7 Now assume To compute partition such that 327 and 330 are fulfilled we need at most comparisons To bound the rejecting part in iteration OPQ we investigate 5#8 G G K 5 5B8 G G K is the number of comparisons required by an independent call of JCT_SCAN In analogy to 432 and 449 and by using 451 we have 5B8 G G K 5 T points outside ' 5 U By using the properties of the partition 5B8 G G K 5 The scanning part in iteration S S 5 S and 336 we get: 5 OPQ is given by: L P where is the box obtained by relaxing the side lengths of the cube to unit intervals except those with respect to the dimensions in and the first dimensions in the list By using the 68

17 same arguments as in 448 we obtain the following bound for the scanning part: S 5 S In analogy to the analysis presented in Section 41 we obtain by the properties of the partition the following: S ` ` which correspond to the dimen ` where is the geometric mean of those side lengths of sions in Since S 5 L for $ Thus 5B68L9: $ we obtain 5 D GCJ 8LKNM where Since in both cases the upper bound on the expected runtime of the GOINGCUB method is monotone increasing in we choose Thus the expected runtime of the GOINGCUB method with the searching procedure JCT_ SCAN is e summarize the results obtained for the GOINGCUB method with the searching procedure JCT_SCAN: Theorem 43 The GOINGCUB method with the searching procedure JCT_SCAN finds the nearest neighbors from the set of points to some query point with an expected asymptotic runtime of ] if and if the points of are drawn independently at random under uniform distribution The method requires storage and 43 Other distributions preprocessing time In the following we present a generalization of the runtime analysis of the discussed methods to other wellbehaved distributions For reasons of clarity we focus here only on the analysis of the CUB MTOD with the orthogonal range searching procedures SCAN JCT and SCAN_JCT Let be the random vector that represents the coordinates of a point and let $ ' be the joint distribution function of the random vector e consider the points of to be drawn independently at random under the following distributions: are independent e have a Independent distribution: The random variables is the distribution function of the random variable b Positivelybounded distributionsee [15]: e say that exists constants such that measure and for? $? S where is positivelybounded over the bounded convex open region if there for all where has Lebesgue The distribution function has the following property:? where? is a region of volume measure 69

18 T U xpected runtime analysis under independent distribution e assume that the points of are drawn independently at random under independent distribution For a given query point we $ e introduce the functions $ S with is con such Since the are continuous and monotone increasing the function tinuous and monotone increasing as well Thus we can determine the side length of the cube S by binary search where $ of the cube L The permutation of dimensions is chosen 452 is the parameter of the algorithm which determines the side length e measure the running time of the CUB MTOD by the number of comparisons of coordinates and bit array entries e omit here the computation of the side length and of the order of dimensions which depends on the complexity of computing the distribution functions equals Since the $ points are drawn independently at random the expected number of points in the $ Let be defined such First we consider the SCAN procedure The expected number $ 5 D K J SCAN procedure is given by: where $ 5 D K J S of tests of coordinates of the are obtained by relaxing the side lengths of the cube to unit intervals with respect to those equals By 25 and by Lemma 22 we obtain the same results as in Section 212: dimensions that are different from the first ones in the list The S $ 5 D K J 70 7

19 U e proceed in analogy to the analysis in Section 214 and we obtain that the CUB MTOD with the searching procedure SCAN performs an expected number The expected number $ 5 8 G G K JCT is given by Since S we obtain by Lemma 23 $ 578 G G K of comparisons of coordinates of comparisons of coordinates and bit array entries of the procedure P The rest of the analysis of the CUB MTOD with the searching procedure JCT is done as discussed in Section 31 e skip the details of the running time analysis of the procedure JCT_SCAN Its expected number of comparisons of coordinates and bit array entries is given in 325 where $ the functions are defined in 452 instead of representing the side lengths of the box '+ The crucial obtained in Section 312 can be expressed as function $ 5B8 G _D K $ observation is that the upper bound on 5 8 G _D K J of see 334 and 335 thus it holds also in the case when we assume independent distribution of the points Thus the CUB MTOD with the searching procedure JCT_SCAN P ] performs an expected number of comparisons of coordinates and bit array entries T xpected runtime analysis under positivelybounded distribution and at random under positivelybounded distribution In the following we use the notations and 5B8 G _D K J as introduced above e consider the procedure JCT to determine the point set e assume wlog $ ' where and is the box where the ' $ '+ S e consider the points of to be drawn independently 5 D K J 5B8 G G K $ represents the side length of $ ' For each point we obtain 453 and is given by 23 The side length is determined by the procedure SID_LNGT as described in Section 213 such 454 S where is a real parameter in $ in dimension of SID_LNGT By 453 and 454 we have for 71 :

20 e consider the order of dimensions to be defined as mentioned in Section 212 to correspond to the increasing order of the side lengths of the box By 22 and Lemma 22 we obtain: where L 0 $ 5 D K J S For the probability of an empty cube we have Note that by 453 we have To obtain a total asymptotic runtime of we guarantee 455 which is satisfied for 7 e choose 7 and we obtain that the expected asymptotic runtime of the CUB $ MTOD with SCAN is if the points are ' drawn independently at random from under positivelybounded distribution e consider the procedure JCT to determine the point set $ The expected number 578 G G K of comparisons of coordinates and bit array entries of the procedure JCT is given by $ 5B8 G G K S S $ $ # e used 454 and Lemma 22 As mentioned above we have the upper bound B of the CUB MTOD with JCT is T X U of is 7 X $ ' if the points are drawn independently at random from distribution Thus the expected runtime A suitable value The expected asymptotic runtime of the CUB MTOD with JCT is under positivelybounded The expected runtimes of the procedures SCAN and JCT can be bounded in terms of the volume of the box and the constants and This can be achieved also in the analysis of the procedure JCT_SCAN by combining the above results e obtain in a straightforward manner that the procedure JCT_SCAN has an expected runtime $ P of T U assuming that the points are ' drawn independently at random from under positivelybounded distribution Theorem 44 The CUB MTOD with one of the orthogonal range searching procedures SCAN JCT and SCAN_JCT has the expected asymptotic runtime given in Theorem 22 Theorem 31 and Theorem 32 respectively if the points of are drawn independently at random under independent distribution or positivelybounded distribution 72

21 5 44 xternalmemory NearestNeighbor Search Many database applications involve massive datasets which do not fit in the main memory of modern machines In such cases the data structures need to be stored on external storage devices such as disks A major performance bottleneck is the InputOutput IO communication between fast internalmemory and slow external storage Minimizing the IO communication is the essential problem considered in the design of externalmemory data structures and algorithms First we define the IO model of computation and then present an externalmemory variant of the CUB MTOD and of the CLL MTOD 441 The xternalmemory Model of Computation Modeling accurately memory and disks systems is a complex task which has been broadly surveyed e shall work in the externalmemory model of computation introduced by Aggarwal and Vitter [2] This standard twolevel disk model focuses on the fact that typical disks read or write large blocks of contiguous data at once in order to amortize the access time over a large amount of data The model has the following parameters [ ]: number of objects in the problem instance number of objects in the problem solution number of objects that can fit into internal memory number of objects per disk block where An IO operation is the reading or writing a block from or into disk respectively see Figure 41 Computation is only performed on objects in internal memory The measures of performance in this model are the number of IOs used to solve a problem as well as the amount of space disk blocks used and the internal memory computation time P M B D Figure 41: An IO moves contiguous elements between disk and main memory More complex multilevel memory models have been considered in the literature where several disks are used in parallel to increase the performance of IO systems e refer to a recent survey by Vitter [64] e primarily want to model the extremely long disk access time relative to the internal running time Therefore we ignore the internal computation time in our algorithmic performance analysis 442 The xternalmemory Data Structure e investigate the nearestneighbor problem in the externalmodel of computation ere an object or item is a coordinate of some data point The number of objects of the problem instance is of some data point 73 or an index

22 e consider the GOINGCUB method introduced in Section 41 and we compute the expected number of IO operations performed by the external variant of this method to determine the nearest neighbor to a query point from the data set e assume that the data points are drawn independently at random under uniform distribution e compare the external GOINGCUB with the external bruteforce method which stores the points with their coordinates successively in contiguous locations of the external memory The external bruteforce method loads the data from the disk and determines the nearest neighbor to the query with IO operations The GOINGCUB method considers cubes of increasing side lengths around the query point until a cube containing points of is found In order to determine the points of a cube expected to contain points the GOINGCUB method calls the orthogonal searching algorithm JCT This algorithm is the only one among the presented searching procedures having an efficient extension in the externalmemory model of computation In the case of the JCT procedure the objects to be tested in the main memory are stored in contiguous locations in disk blocks More precisely the external data structure consists mainly of the arrays 5 introduced in Section 311 For a dimension the entries 5 $ are stored successively in contiguous locations on the disk The th entry is 5 $ where is the th largest coordinate in the dimension and is the index of the corresponding point Additionally the external data structure contains all points of such that all coordinates for some point are stored successively in contiguous locations of the external memory The external data structure is stored in blocks on the disk First we analyze the expected IO complexity of the JCT procedure which is required to determine the set of the points contained in a cube itself expected to contain data points The expected IO complexity of testing coordinates The external JCT procedure consists essentially of reading blocks containing objects 5 $ for each dimension and testing in the main memory whether the corresponding coordinates are still out of the interval $ The objects are read exactly in the same order as the order considered by the internal JCT procedure described in Section 311 : 5 $ F 5 $ for some index and 5 $ F 5 $ for some index see Figure 42 Thus for a given dimension almost all data blocks which are read from the disk contain exclusively relevant data which needs to be tested in the main memory In fact for each dimension there might be at most two read blocks that contain also coordinates which are not of interest The number of read operations required by JCT to test the coordinates of points is bounded by $ points outside points outside S S The additional multiplicative factor 2 comes from the fact that an array entry 5 $ contains 2 objects Let be the side lengths of the intervals By 31 and 32 we get $ points outside P By using the linearity of the expectation we obtain that the expected number of IO transfers performed by JCT in order to make the necessary test of coordinates is bounded by 7 where is the expected number of points of contained in the cube

23 _ T T _ dim j B items B items 0 T j [1] T j [2] T j [l] 1 Figure 42: Almost all blocks contain exclusively relevant data The expected IO complexity of marking rejected points The internal searching algorithm JCT uses a bit array to mark the rejected points in order to determine the points inside the cube A realistic assumption is that this bit array fits in the main memory Under this hypothesis the total IO complexity of the JCT procedure to determine the points of contained in the cube is In case the bit array has to be stored in external memory we proceed as follows to mark the rejected points hile testing coordinates the algorithm fills up a part of the main memory with indices of points which have to be marked as rejected This part of the memory consists of contiguous locations and has size After the fillup phase we determine the blocks of the bit array we need for the marking process and load them into a part of the main memory of size consisting of free contiguous locations The bit entries of interest are set to 1 in the blocks loaded in the main memory If necessary we repeatedly replace those blocks that are not relevant in the current phase anymore These fillup and marking phases are repeated until all coordinates were tested Blocks of the bit array that are already in the memory due to a previous phase can be reused The worst case situation is that the expected number of IO operations required to mark the rejected points is linear in the number of rejecting steps that is This is based on the fact that loaded blocks of the bit array may contain only one entry of interest e propose an alternative procedure to determine the unrejected points Instead of using a bit array we gather the indices of the rejected points during the process of testing coordinates This is done by repeatedly filling up the internal memory sorting the indices such that each index appears only once and writing those indices in contiguous locations of the external memory e end up having blocks each of them containing sorted indices By using the external merge sort algorithm introduced by Aggarwal and Vitter [2] we obtain a list with the indices of the rejected points in increasing order in 1032 _ IOs This list is stored in _ U blocks in the external memory The list is scanned and we determine in T _ U The missing indices are stored in _ U in the cube from the list blocks in the external memory They represent the points contained and are the result of the external JCT procedure IOs the missing indices 75 7

24 Y # # $ _ Y e take a closer look at the above bound 1032 _ 032 of and the base of the logarithm is large in the order of Thus _ For typical values is less 1032 than 4 for all realistic values of and Thus in the above bound the important term is not the term but the term in the denominator of This observation represents an important feature of the external sorting The total IO complexity e consider in the following the realistic assumption that the bit array fits in the main memory In this case the JCT procedure performs expected at most 4 IOs to determine the points of contained in some cube where $ The GOINGCUB method works as described in Section 41 and we choose the sequence such that e specify the value of the parameter later The expected number of points in the cube fulfills where OPQ and OPQ ' Let $ By using 428 and 421 we obtain the following upper bound on the expected number of IOs performed by GOINGCUB during the JCT phase: where S T@ S S S VUX P P points outside ' S S $ P Next we look at the expected number of IO operations performed in the bruteforce part of the external GOINGCUB method If the current cube is the first cube not to be empty then the bruteforce method is called which loads the coordinates of the data points and determines the minimum distance of these points to the query Since the data points contained in are not necessarily stored successively in the external memory the bruteforce method loads each point separately and requires IO operations per loaded point in the worst case e proceed in analogy to 48 and 422 and obtain that the expected number of IOs performed in the bruteforce part of the GOINGCUB is bounded by for an appropriate constant Y Y A realistic assumption is that a data point or a query point fits in the main memory In this case the computation of the side lengths can be done entirely in the main memory since they depend only on the Y 76 Y

25 _ If then by 419 the computation of the side lengths requires 7 IO operations This is based on the fact that as seen in Section 213 the computation of takes steps ach step involves an evaluation of the actual volume which takes IO operations Altogether we obtain the following upper bound on the number of IOs required by the external GOINGCUB to compute the nearest neighbor to the query point : 7 query point and the values % 7 expected _ Since the function takes its minimum at _ we set the parameter to equal _ ' Summarizing we obtain that the external GOINGCUB method finds the nearest neighbor from the _ IO operations data set to the query point in The external GOINGCUB 7 method compares favorably with the external bruteforce method if and which is a realistic assumption in the externalmemory model where the data points do not fit in the main memory 77

26 45 xperiments e present an experimental comparison of implementations of the presented variants of the CUB MTOD and GOINGCUB method e emphasize the factors by which the algorithms are faster than the bruteforce method $ 'N e test our algorithms on random test data e generate data sets in each of size according $ 'C to the uniform distribution For each data set a query point is chosen randomly in For each of the considered settings of and we generate 20 pairs each consisting of a random data set and a random query point Based on those 20 pairs we compute for each considered setting of and the average speedup of our algorithms with respect to the bruteforce method In our first experiments we consider query algorithms that do not need preprocessing Figure 43 shows 3 an experimental comparison for dimension and number of points between 1000 and The value on the vertical scale is the factor by which each algorithm is faster then the bruteforce method The speedup factors of the CUB MTOD and the GOINGCUB method using the SCAN searching procedure are represented by the curves cube and grow respectively The curves adaptcube and adaptgrow illustrate the speedup factors of the methods when the range searching procedure ADAPTIV_SCAN introduced in Section 22 is used The speedup factors of the algorithms measured in practice show a similar behavior to the speedup factor presented in Chapter 2 As expected the ADAPTIV_SCAN improves the query algorithms The observed improvement of the GOINGCUB method over the CUB MTOD is based on the fact that the GOINGCUB method starts with a smaller cube speedup Speedup with respect to bruteforce with dim100 adaptgrow adaptcube grow cube number of points speedup Speedup with respect to bruteforce number of nearest neighbors number of points Figure 43: Comparison of the CUB method and GOINGCUB method with the bruteforce method Figure 44: Comparison for nearestneighbor search of the GOINGCUB method using SCAN with the bruteforce method Figure 44 shows a comparison of the GOINGCUB method for nearestneighbor search with the bruteforce method which computes all distances of the points of to the query point and selects the smallest distances among them The GOINGCUB method performs the orthogonal range searching of the considered cubes by the searching algorithm SCAN The experiments have been carried out for dimension 3 number of points between 1000 and and number of nearest neighbors between and The experimental comparison reflects the speedup factor 7 proven in Section 422 Next we consider the GOINGCUB method with the searching procedures JCT and JCT_SCAN and compare it with the bruteforce method and the GOINGCUB method when using the procedure ADAPTIV_SCAN 78

27 60 50 Speedup with respect to bruteforce with d200 reject_scan reject adaptive_scan Speedup with respect to bruteforce with n10000 reject_scan reject adaptive_scan speedup speedup number of points Figure 45: Speedup factors for fixed dimension dimension Figure 46: Speedup factors for fixed number of points B Figure 45 shows an experimental comparison of the searching algorithms for dimension 3 and sizes of the data set between and Figure 46 shows the experimental comparison 333 of the searching algorithms for and dimension varying between 100 and 500 The value on the vertical scale is the factor by which each algorithm is faster then the bruteforce method Both figures illustrate that for high dimensions the methods using preprocessing dominate clearly the GOING CUB method with the ADAPTIV_SCAN searching procedure Furthermore the JCT_SCAN procedure is better than the procedure JCT as expected from the runtime analysis presented in Section 31 The speedup factor of the GOINGCUB method with the JCT_SCAN procedure is with respect to the bruteforce method Figure 46 points out the dependency of the speedup on the dimension for the methods using preprocessing A speedup factor above 30 is a considerable improvement in practice e conclude with the experimental comparison for nearestneighbor search In Figure 47 the GOINGCUB method using JCT_SCAN is compared with the bruteforce method The experiments 333 have been carried out for number of points dimension between 100 and 500 and number of nearest neighbors between and A growth of the speedup factor with the dimension can be seen ven for big values of the number of nearest neighbors the speedup factor of the GOINGCUB with respect to the bruteforce method is between 10 and 30 79

28 Speedup with respect to bruteforce for n10000 speedup number of nearest neighbors dimension Figure 47: Comparison for nearestneighbor search of the GOINGCUB method using JCT_SCAN with the bruteforce method 80

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems.

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems. 2.3 Designing algorithms There are many ways to design algorithms. Insertion sort uses an incremental approach: having sorted the subarray A[1 j - 1], we insert the single element A[j] into its proper

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols SIAM Journal on Computing to appear From Static to Dynamic Routing: Efficient Transformations of StoreandForward Protocols Christian Scheideler Berthold Vöcking Abstract We investigate how static storeandforward

More information

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1]

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Marc André Tanner May 30, 2014 Abstract This report contains two main sections: In section 1 the cache-oblivious computational

More information

II (Sorting and) Order Statistics

II (Sorting and) Order Statistics II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison

More information

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

PCP and Hardness of Approximation

PCP and Hardness of Approximation PCP and Hardness of Approximation January 30, 2009 Our goal herein is to define and prove basic concepts regarding hardness of approximation. We will state but obviously not prove a PCP theorem as a starting

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Lecture 17. Lower bound for variable-length source codes with error. Coding a sequence of symbols: Rates and scheme (Arithmetic code)

Lecture 17. Lower bound for variable-length source codes with error. Coding a sequence of symbols: Rates and scheme (Arithmetic code) Lecture 17 Agenda for the lecture Lower bound for variable-length source codes with error Coding a sequence of symbols: Rates and scheme (Arithmetic code) Introduction to universal codes 17.1 variable-length

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

An Eternal Domination Problem in Grids

An Eternal Domination Problem in Grids Theory and Applications of Graphs Volume Issue 1 Article 2 2017 An Eternal Domination Problem in Grids William Klostermeyer University of North Florida, klostermeyer@hotmail.com Margaret-Ellen Messinger

More information

CS 6402 DESIGN AND ANALYSIS OF ALGORITHMS QUESTION BANK

CS 6402 DESIGN AND ANALYSIS OF ALGORITHMS QUESTION BANK CS 6402 DESIGN AND ANALYSIS OF ALGORITHMS QUESTION BANK Page 1 UNIT I INTRODUCTION 2 marks 1. Why is the need of studying algorithms? From a practical standpoint, a standard set of algorithms from different

More information

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI

More information

3 Fractional Ramsey Numbers

3 Fractional Ramsey Numbers 27 3 Fractional Ramsey Numbers Since the definition of Ramsey numbers makes use of the clique number of graphs, we may define fractional Ramsey numbers simply by substituting fractional clique number into

More information

22 Elementary Graph Algorithms. There are two standard ways to represent a

22 Elementary Graph Algorithms. There are two standard ways to represent a VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph

More information

Testing Continuous Distributions. Artur Czumaj. DIMAP (Centre for Discrete Maths and it Applications) & Department of Computer Science

Testing Continuous Distributions. Artur Czumaj. DIMAP (Centre for Discrete Maths and it Applications) & Department of Computer Science Testing Continuous Distributions Artur Czumaj DIMAP (Centre for Discrete Maths and it Applications) & Department of Computer Science University of Warwick Joint work with A. Adamaszek & C. Sohler Testing

More information

Lecture 15. Error-free variable length schemes: Shannon-Fano code

Lecture 15. Error-free variable length schemes: Shannon-Fano code Lecture 15 Agenda for the lecture Bounds for L(X) Error-free variable length schemes: Shannon-Fano code 15.1 Optimal length nonsingular code While we do not know L(X), it is easy to specify a nonsingular

More information

Comparing the strength of query types in property testing: The case of testing k-colorability

Comparing the strength of query types in property testing: The case of testing k-colorability Comparing the strength of query types in property testing: The case of testing k-colorability Ido Ben-Eliezer Tali Kaufman Michael Krivelevich Dana Ron Abstract We study the power of four query models

More information

Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors

Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors Emery D. Berger Robert D. Blumofe femery,rdbg@cs.utexas.edu Department of Computer Sciences The University of Texas

More information

ENGI 4892: Data Structures Assignment 2 SOLUTIONS

ENGI 4892: Data Structures Assignment 2 SOLUTIONS ENGI 4892: Data Structures Assignment 2 SOLUTIONS Due May 30 in class at 1:00 Total marks: 50 Notes: You will lose marks for giving only the final answer for any question on this assignment. Some steps

More information

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure

Lecture 6: External Interval Tree (Part II) 3 Making the external interval tree dynamic. 3.1 Dynamizing an underflow structure Lecture 6: External Interval Tree (Part II) Yufei Tao Division of Web Science and Technology Korea Advanced Institute of Science and Technology taoyf@cse.cuhk.edu.hk 3 Making the external interval tree

More information

A step towards the Bermond-Thomassen conjecture about disjoint cycles in digraphs

A step towards the Bermond-Thomassen conjecture about disjoint cycles in digraphs A step towards the Bermond-Thomassen conjecture about disjoint cycles in digraphs Nicolas Lichiardopol Attila Pór Jean-Sébastien Sereni Abstract In 1981, Bermond and Thomassen conjectured that every digraph

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

Dartmouth Computer Science Technical Report TR Chain Match: An Algorithm for Finding a Perfect Matching of a Regular Bipartite Multigraph

Dartmouth Computer Science Technical Report TR Chain Match: An Algorithm for Finding a Perfect Matching of a Regular Bipartite Multigraph Dartmouth Computer Science Technical Report TR2014-753 Chain Match: An Algorithm for Finding a Perfect Matching of a Regular Bipartite Multigraph Stefanie Ostrowski May 28, 2014 Abstract We consider the

More information

COMP Data Structures

COMP Data Structures COMP 2140 - Data Structures Shahin Kamali Topic 5 - Sorting University of Manitoba Based on notes by S. Durocher. COMP 2140 - Data Structures 1 / 55 Overview Review: Insertion Sort Merge Sort Quicksort

More information

Online Stochastic Matching CMSC 858F: Algorithmic Game Theory Fall 2010

Online Stochastic Matching CMSC 858F: Algorithmic Game Theory Fall 2010 Online Stochastic Matching CMSC 858F: Algorithmic Game Theory Fall 2010 Barna Saha, Vahid Liaghat Abstract This summary is mostly based on the work of Saberi et al. [1] on online stochastic matching problem

More information

Partitions and Packings of Complete Geometric Graphs with Plane Spanning Double Stars and Paths

Partitions and Packings of Complete Geometric Graphs with Plane Spanning Double Stars and Paths Partitions and Packings of Complete Geometric Graphs with Plane Spanning Double Stars and Paths Master Thesis Patrick Schnider July 25, 2015 Advisors: Prof. Dr. Emo Welzl, Manuel Wettstein Department of

More information

PANCYCLICITY WHEN EACH CYCLE CONTAINS k CHORDS

PANCYCLICITY WHEN EACH CYCLE CONTAINS k CHORDS Discussiones Mathematicae Graph Theory xx (xxxx) 1 13 doi:10.7151/dmgt.2106 PANCYCLICITY WHEN EACH CYCLE CONTAINS k CHORDS Vladislav Taranchuk Department of Mathematics and Statistics California State

More information

arxiv: v3 [cs.dm] 12 Jun 2014

arxiv: v3 [cs.dm] 12 Jun 2014 On Maximum Differential Coloring of Planar Graphs M. A. Bekos 1, M. Kaufmann 1, S. Kobourov, S. Veeramoni 1 Wilhelm-Schickard-Institut für Informatik - Universität Tübingen, Germany Department of Computer

More information

Multiway Blockwise In-place Merging

Multiway Blockwise In-place Merging Multiway Blockwise In-place Merging Viliam Geffert and Jozef Gajdoš Institute of Computer Science, P.J.Šafárik University, Faculty of Science Jesenná 5, 041 54 Košice, Slovak Republic viliam.geffert@upjs.sk,

More information

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Mathematical and Algorithmic Foundations Linear Programming and Matchings Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis

More information

CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002

CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002 CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002 Name: Instructions: 1. This is a closed book exam. Do not use any notes or books, other than your 8.5-by-11 inch review sheet. Do not confer

More information

JOB SHOP SCHEDULING WITH UNIT LENGTH TASKS

JOB SHOP SCHEDULING WITH UNIT LENGTH TASKS JOB SHOP SCHEDULING WITH UNIT LENGTH TASKS MEIKE AKVELD AND RAPHAEL BERNHARD Abstract. In this paper, we consider a class of scheduling problems that are among the fundamental optimization problems in

More information

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm Computing intersections in a set of line segments: the Bentley-Ottmann algorithm Michiel Smid October 14, 2003 1 Introduction In these notes, we introduce a powerful technique for solving geometric problems.

More information

22 Elementary Graph Algorithms. There are two standard ways to represent a

22 Elementary Graph Algorithms. There are two standard ways to represent a VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph

More information

FINAL EXAM SOLUTIONS

FINAL EXAM SOLUTIONS COMP/MATH 3804 Design and Analysis of Algorithms I Fall 2015 FINAL EXAM SOLUTIONS Question 1 (12%). Modify Euclid s algorithm as follows. function Newclid(a,b) if a

More information

Finding a -regular Supergraph of Minimum Order

Finding a -regular Supergraph of Minimum Order Finding a -regular Supergraph of Minimum Order Hans L. Bodlaender a, Richard B. Tan a,b and Jan van Leeuwen a a Department of Computer Science Utrecht University Padualaan 14, 3584 CH Utrecht The Netherlands

More information

Testing random variables for independence and identity

Testing random variables for independence and identity Testing rom variables for independence identity Tuğkan Batu Eldar Fischer Lance Fortnow Ravi Kumar Ronitt Rubinfeld Patrick White Abstract Given access to independent samples of a distribution over, we

More information

However, this is not always true! For example, this fails if both A and B are closed and unbounded (find an example).

However, this is not always true! For example, this fails if both A and B are closed and unbounded (find an example). 98 CHAPTER 3. PROPERTIES OF CONVEX SETS: A GLIMPSE 3.2 Separation Theorems It seems intuitively rather obvious that if A and B are two nonempty disjoint convex sets in A 2, then there is a line, H, separating

More information

Advanced Algorithms and Data Structures

Advanced Algorithms and Data Structures Advanced Algorithms and Data Structures Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Prerequisites A seven credit unit course Replaced OHJ-2156 Analysis of Algorithms We take things a bit further than

More information

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri 1 Introduction Today, we will introduce a fundamental algorithm design paradigm,

More information

9.5 Equivalence Relations

9.5 Equivalence Relations 9.5 Equivalence Relations You know from your early study of fractions that each fraction has many equivalent forms. For example, 2, 2 4, 3 6, 2, 3 6, 5 30,... are all different ways to represent the same

More information

Comp Online Algorithms

Comp Online Algorithms Comp 7720 - Online Algorithms Notes 4: Bin Packing Shahin Kamalli University of Manitoba - Fall 208 December, 208 Introduction Bin packing is one of the fundamental problems in theory of computer science.

More information

12.1 Formulation of General Perfect Matching

12.1 Formulation of General Perfect Matching CSC5160: Combinatorial Optimization and Approximation Algorithms Topic: Perfect Matching Polytope Date: 22/02/2008 Lecturer: Lap Chi Lau Scribe: Yuk Hei Chan, Ling Ding and Xiaobing Wu In this lecture,

More information

Algorithms Exam TIN093/DIT600

Algorithms Exam TIN093/DIT600 Algorithms Exam TIN093/DIT600 Course: Algorithms Course code: TIN 093 (CTH), DIT 600 (GU) Date, time: 22nd October 2016, 14:00 18:00 Building: M Responsible teacher: Peter Damaschke, Tel. 5405 Examiner:

More information

8 SortinginLinearTime

8 SortinginLinearTime 8 SortinginLinearTime We have now introduced several algorithms that can sort n numbers in O(n lg n) time. Merge sort and heapsort achieve this upper bound in the worst case; quicksort achieves it on average.

More information

Algorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48

Algorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48 Algorithm Analysis (Algorithm Analysis ) Data Structures and Programming Spring 2018 1 / 48 What is an Algorithm? An algorithm is a clearly specified set of instructions to be followed to solve a problem

More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Spring 2019 Alexis Maciel Department of Computer Science Clarkson University Copyright c 2019 Alexis Maciel ii Contents 1 Analysis of Algorithms 1 1.1 Introduction.................................

More information

Lecture 3: Art Gallery Problems and Polygon Triangulation

Lecture 3: Art Gallery Problems and Polygon Triangulation EECS 396/496: Computational Geometry Fall 2017 Lecture 3: Art Gallery Problems and Polygon Triangulation Lecturer: Huck Bennett In this lecture, we study the problem of guarding an art gallery (specified

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Suggested Reading: Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Probabilistic Modelling and Reasoning: The Junction

More information

On median graphs and median grid graphs

On median graphs and median grid graphs On median graphs and median grid graphs Sandi Klavžar 1 Department of Mathematics, PEF, University of Maribor, Koroška cesta 160, 2000 Maribor, Slovenia e-mail: sandi.klavzar@uni-lj.si Riste Škrekovski

More information

The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation

The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation BRICS RS-96-29 L. Arge: The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation BRICS Basic Research in Computer Science The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation Lars

More information

(Refer Slide Time: 1:27)

(Refer Slide Time: 1:27) Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 1 Introduction to Data Structures and Algorithms Welcome to data

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

26 The closest pair problem

26 The closest pair problem The closest pair problem 1 26 The closest pair problem Sweep algorithms solve many kinds of proximity problems efficiently. We present a simple sweep that solves the two-dimensional closest pair problem

More information

Eulerian disjoint paths problem in grid graphs is NP-complete

Eulerian disjoint paths problem in grid graphs is NP-complete Discrete Applied Mathematics 143 (2004) 336 341 Notes Eulerian disjoint paths problem in grid graphs is NP-complete Daniel Marx www.elsevier.com/locate/dam Department of Computer Science and Information

More information

An Optimal Algorithm for the Euclidean Bottleneck Full Steiner Tree Problem

An Optimal Algorithm for the Euclidean Bottleneck Full Steiner Tree Problem An Optimal Algorithm for the Euclidean Bottleneck Full Steiner Tree Problem Ahmad Biniaz Anil Maheshwari Michiel Smid September 30, 2013 Abstract Let P and S be two disjoint sets of n and m points in the

More information

Lecture 1. 1 Notation

Lecture 1. 1 Notation Lecture 1 (The material on mathematical logic is covered in the textbook starting with Chapter 5; however, for the first few lectures, I will be providing some required background topics and will not be

More information

K s,t -saturated bipartite graphs

K s,t -saturated bipartite graphs K s,t -saturated bipartite graphs Wenying Gan Dániel Korándi Benny Sudakov Abstract An n-by-n bipartite graph is H-saturated if the addition of any missing edge between its two parts creates a new copy

More information

THE INSULATION SEQUENCE OF A GRAPH

THE INSULATION SEQUENCE OF A GRAPH THE INSULATION SEQUENCE OF A GRAPH ELENA GRIGORESCU Abstract. In a graph G, a k-insulated set S is a subset of the vertices of G such that every vertex in S is adjacent to at most k vertices in S, and

More information

6 Randomized rounding of semidefinite programs

6 Randomized rounding of semidefinite programs 6 Randomized rounding of semidefinite programs We now turn to a new tool which gives substantially improved performance guarantees for some problems We now show how nonlinear programming relaxations can

More information

Efficient Prefix Computation on Faulty Hypercubes

Efficient Prefix Computation on Faulty Hypercubes JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 17, 1-21 (21) Efficient Prefix Computation on Faulty Hypercubes YU-WEI CHEN AND KUO-LIANG CHUNG + Department of Computer and Information Science Aletheia

More information

COSC 311: ALGORITHMS HW1: SORTING

COSC 311: ALGORITHMS HW1: SORTING COSC 311: ALGORITHMS HW1: SORTIG Solutions 1) Theoretical predictions. Solution: On randomly ordered data, we expect the following ordering: Heapsort = Mergesort = Quicksort (deterministic or randomized)

More information

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version

Lecture Notes: External Interval Tree. 1 External Interval Tree The Static Version Lecture Notes: External Interval Tree Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk This lecture discusses the stabbing problem. Let I be

More information

Algorithms - Ch2 Sorting

Algorithms - Ch2 Sorting Algorithms - Ch2 Sorting (courtesy of Prof.Pecelli with some changes from Prof. Daniels) 1/28/2015 91.404 - Algorithms 1 Algorithm Description Algorithm Description: -Pseudocode see conventions on p. 20-22

More information

Problem. Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all. 1 i j n.

Problem. Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all. 1 i j n. Problem 5. Sorting Simple Sorting, Quicksort, Mergesort Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all 1 i j n. 98 99 Selection Sort

More information

Sparse Hypercube 3-Spanners

Sparse Hypercube 3-Spanners Sparse Hypercube 3-Spanners W. Duckworth and M. Zito Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3052, Australia Department of Computer Science, University of

More information

Plotting run-time graphically. Plotting run-time graphically. CS241 Algorithmics - week 1 review. Prefix Averages - Algorithm #1

Plotting run-time graphically. Plotting run-time graphically. CS241 Algorithmics - week 1 review. Prefix Averages - Algorithm #1 CS241 - week 1 review Special classes of algorithms: logarithmic: O(log n) linear: O(n) quadratic: O(n 2 ) polynomial: O(n k ), k 1 exponential: O(a n ), a > 1 Classifying algorithms is generally done

More information

Parallel Auction Algorithm for Linear Assignment Problem

Parallel Auction Algorithm for Linear Assignment Problem Parallel Auction Algorithm for Linear Assignment Problem Xin Jin 1 Introduction The (linear) assignment problem is one of classic combinatorial optimization problems, first appearing in the studies on

More information

Interleaving Schemes on Circulant Graphs with Two Offsets

Interleaving Schemes on Circulant Graphs with Two Offsets Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical

More information

Searching a Sorted Set of Strings

Searching a Sorted Set of Strings Department of Mathematics and Computer Science January 24, 2017 University of Southern Denmark RF Searching a Sorted Set of Strings Assume we have a set of n strings in RAM, and know their sorted order

More information

Algorithm Efficiency & Sorting. Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms

Algorithm Efficiency & Sorting. Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms Algorithm Efficiency & Sorting Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms Overview Writing programs to solve problem consists of a large number of decisions how to represent

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

EXTREME POINTS AND AFFINE EQUIVALENCE

EXTREME POINTS AND AFFINE EQUIVALENCE EXTREME POINTS AND AFFINE EQUIVALENCE The purpose of this note is to use the notions of extreme points and affine transformations which are studied in the file affine-convex.pdf to prove that certain standard

More information

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols Christian Scheideler Ý Berthold Vöcking Þ Abstract We investigate how static store-and-forward routing algorithms

More information

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

More information

Matching Theory. Figure 1: Is this graph bipartite?

Matching Theory. Figure 1: Is this graph bipartite? Matching Theory 1 Introduction A matching M of a graph is a subset of E such that no two edges in M share a vertex; edges which have this property are called independent edges. A matching M is said to

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

arxiv: v3 [cs.db] 20 Feb 2018

arxiv: v3 [cs.db] 20 Feb 2018 Variance-Optimal Offline and Streaming Stratified Random Sampling Trong Duc Nguyen 1 Ming-Hung Shih 1 Divesh Srivastava 2 Srikanta Tirthapura 1 Bojian Xu 3 arxiv:1801.09039v3 [cs.db] 20 Feb 2018 1 Iowa

More information

For searching and sorting algorithms, this is particularly dependent on the number of data elements.

For searching and sorting algorithms, this is particularly dependent on the number of data elements. Looking up a phone number, accessing a website and checking the definition of a word in a dictionary all involve searching large amounts of data. Searching algorithms all accomplish the same goal finding

More information

The Structure and Properties of Clique Graphs of Regular Graphs

The Structure and Properties of Clique Graphs of Regular Graphs The University of Southern Mississippi The Aquila Digital Community Master's Theses 1-014 The Structure and Properties of Clique Graphs of Regular Graphs Jan Burmeister University of Southern Mississippi

More information

1 Introduction and Review

1 Introduction and Review Figure 1: The torus. 1 Introduction and Review 1.1 Group Actions, Orbit Spaces and What Lies in Between Our story begins with the torus, which we will think of initially as the identification space pictured

More information

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015 CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015 1 Introduction Today, we will introduce a fundamental algorithm design paradigm, Divide-And-Conquer,

More information

Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes

Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes Venkatesan Guruswami Carnegie Mellon University Pittsburgh, PA 53 Email: guruswami@cmu.edu Ray Li Carnegie Mellon University

More information

1 The range query problem

1 The range query problem CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition

More information

Lecture 3: Sorting 1

Lecture 3: Sorting 1 Lecture 3: Sorting 1 Sorting Arranging an unordered collection of elements into monotonically increasing (or decreasing) order. S = a sequence of n elements in arbitrary order After sorting:

More information

Theory and Algorithms Introduction: insertion sort, merge sort

Theory and Algorithms Introduction: insertion sort, merge sort Theory and Algorithms Introduction: insertion sort, merge sort Rafael Ramirez rafael@iua.upf.es Analysis of algorithms The theoretical study of computer-program performance and resource usage. What s also

More information

6. Asymptotics: The Big-O and Other Notations

6. Asymptotics: The Big-O and Other Notations Chapter 7 SEARCHING 1. Introduction, Notation 2. Sequential Search 3. Binary Search 4. Comparison Trees 5. Lower Bounds 6. Asymptotics: The Big-O and Other Notations Outline Transp. 1, Chapter 7, Searching

More information

On the perimeter of k pairwise disjoint convex bodies contained in a convex set in the plane

On the perimeter of k pairwise disjoint convex bodies contained in a convex set in the plane On the perimeter of k pairwise disjoint convex bodies contained in a convex set in the plane Rom Pinchasi August 2, 214 Abstract We prove the following isoperimetric inequality in R 2, conjectured by Glazyrin

More information

CS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem

CS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem CS61: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem Tim Roughgarden February 5, 016 1 The Traveling Salesman Problem (TSP) In this lecture we study a famous computational problem,

More information

Approximate Nearest Line Search in High Dimensions

Approximate Nearest Line Search in High Dimensions Approximate Nearest Line Search in High Dimensions Sepideh Mahabadi MIT mahabadi@mit.edu Abstract We consider the Approximate Nearest Line Search (NLS) problem. Given a set L of N lines in the high dimensional

More information

Which n-venn diagrams can be drawn with convex k-gons?

Which n-venn diagrams can be drawn with convex k-gons? Which n-venn diagrams can be drawn with convex k-gons? Jeremy Carroll Frank Ruskey Mark Weston Abstract We establish a new lower bound for the number of sides required for the component curves of simple

More information

MOST attention in the literature of network codes has

MOST attention in the literature of network codes has 3862 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 8, AUGUST 2010 Efficient Network Code Design for Cyclic Networks Elona Erez, Member, IEEE, and Meir Feder, Fellow, IEEE Abstract This paper introduces

More information

Lecture 9 March 4, 2010

Lecture 9 March 4, 2010 6.851: Advanced Data Structures Spring 010 Dr. André Schulz Lecture 9 March 4, 010 1 Overview Last lecture we defined the Least Common Ancestor (LCA) and Range Min Query (RMQ) problems. Recall that an

More information

Cecil Jones Academy Mathematics Fundamentals

Cecil Jones Academy Mathematics Fundamentals Year 10 Fundamentals Core Knowledge Unit 1 Unit 2 Estimate with powers and roots Calculate with powers and roots Explore the impact of rounding Investigate similar triangles Explore trigonometry in right-angled

More information