Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

Size: px
Start display at page:

Download "Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2"

Transcription

1 This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar Department of Electrical Engineering an Computer Science University of Michigan Ann Arbor, MI USA September 7, A preliminary conense version of this paper appeare as Passing messages in link-boun hypercubes, Hypercube Mutiprocessors 1987, M. Heath, e., pp This research was partially supporte by National Science Founation grant DCR an an Incentives for Excellence awar from Digital Equipment Corp.

2 Abstract Hypercube algorithms are evelope for a variety of communication-intensive tasks such as transposing a matrix, histogramming, one noe sening a (long) message to another, broacasting a message from one noe to all others, each noe broacasting a message to all others, an noes exchanging messages via a fixe permutation. The algorithm for exchanging via a fixe permutation can be viewe as a eterministic analogue of Valiant s ranomize routing. The algorithms are for link-boun hypercubes in which local processing time is ignore, communication time preominates, message heaers are not neee because all noes know the task being performe, an all noes can use all communication links simultaneously. Through systematic use of techniques such as pipelining, batching, variable packet sizes, symmetrizing, an completing, for all problems algorithms are obtaine which achieve a time with an optimal highest-orer term.

3 1 Introuction This paper gives efficient hypercube algorithms for a variety of communication-intensive tasks. The emphasis is on problems where the communication pattern (i.e., which noes are sening information an where it is going) is known in avance by all processors, an all messages are of the same size. This situation is common in SIMD (Single Instruction Multiple Data) or SCMD (Single Coe Multiple Data) applications such as matrix multiplication or inversion, some atabase operations, solving PDEs on a regular gri, image manipulation, an histogramming. By systematic use of a few basic techniques, algorithms are evelope which are significantly faster than the simplest or most common ones, an faster than those publishe previously. Our algorithms show that these techniques are quite powerful, an also show that it is avantageous to be able to use all communication links simultaneously when communication becomes a bottleneck. 1.1 Definition of a Hypercube A -imensional hypercube computer is a istribute-memory multiprocessor consisting of 2 separate processing elements (or noes), linke together in a -imensional binary cube network (see Figure 1). Each noe is given a unique -bit ientification number (henceforth referre to as the noe i..) an two noes linke if an only if their noe i.. s iffer in exactly one bit position. Two noes in a hypercube are sai to be ajacent or neighboring if they share a link. Hypercube networks have some useful properties such as a logarithmic communication iameter an a totally symmetric layout. Because the number of links per noe grows only linearly with the imension, reasonably-size hypercubes (of say, imension 10 or 12) can be practically built with current technology. NCUBE, Intel, Floating Point Systems, Ametek, an Thinking Machines have alreay introuce hypercube computers on the commercial market. 1.2 The Link-Boun Moel This paper uses a moel of hypercubes in which communication time is assume to preominate, an local processing time by the noes can be ignore. We are intereste in problems where extensive communication is require because very long messages are being sent, an where all noes are available to participate in the task an know the communication task being performe. This latter point helps reuce the communication time by insuring that messages o not nee to inclue heaer information such as message estination, message route, packet sequence number, etc. Further, we assume that each noe can utilize all of its communication links simultaneously, where the links between neighboring noes can be use in both irections simultaneously. Thus, a noe in a -imensional hypercube may be hanling 2 messages simultaneously, receiving an sening. This property we call link-boun, as oppose to other possibilities such as processor-boun (in which each noe can o only one operation at a time) or DMA-boun (in which there is an upper boun on the number of messages which can go in an out of a noe at one time). While no hypercube can currently use all of its communication links simultaneously, several manufacturers are trying to attain such ability. The NCUBE machines apparently come the closest [4], an the FPS T-Series machines have noes capable of 4 bi-irectional communications at the same time [3]. The link-boun moel has been stuie in [1, 5, 6, 7, 8, 9, 10, 11, 12, 13]. 1.3 Message-Passing Problems Throughout the paper, various hypercube communication patterns will be looke at. Each will involve sening messages between some known combination of the noes. These messages will be the same length m, but their contents may vary between ifferent pairs of communicating noes. It 1

4 = 0 = 1 = 2 = 3 Figure 1: Some Hypercubes for Small will be assume that messages may be broken own into packets at any time, while existing packets may be either recombine or broken own still further, in orer to facilitate sening, so long as the ultimate estination eventually receives the entire original message. Another key assumption is that a noe receiving a packet must finish receiving it before any of its contents can be utilize. This is sometimes calle the store-an-forwar or packet-switche moel, as oppose to a circuit-switche moel. All existing hypercubes use store-an-forwar. We assume that it takes m + β time for a noe to sen a packet of length m to a neighbor, where represents the transfer rate an β the time for start-up an termination. In real systems it is generally the case that β, an hence it is important to inclue the effect of start-up costs. Also note that the total internal I/O banwith of the computer is 2. In general, an β are treate as constants an the algorithms are analyze as functions of an m. As in [5, 6, 7, 8, 11], we ignore the effects of rouning or truncating. Furthermore, because special cases arise for various relative values of the parameters, the exact formulas will be given only for m sufficiently large. Because we are most intereste in processing long messages, the term containing the highest power of m will be calle the highest-orer term. Some of the problems stuie in this paper, such as broacasting a message from one noe to all others, have been previously consiere by [5, 9, 11]. These papers use a similar moel to the link-boun moel, but give slightly slower algorithms. In particular, [11] consiere a moel where communication between noes can only take place one irection at a time, so where this mae a ifference, the times of their algorithms have been ivie by two so that they can be fairly compare with ours. For most of the problems, several algorithms are evelope, the last being the fastest (an most sophisticate). In all problems, simple arguments show that the best versions of our algorithms have optimal high-orer terms. In some cases we believe that our algorithms are completely optimal, but are unable to prove this because our arguments can boun terms involving, m, an, or terms involving β an, but not sums of terms of ifferent types. 1.4 Notation Several of the patterns consiere here are oriente aroun one noe. For such patterns, it will be assume without loss of generality that noe 0 is the special noe. The set of all noes (or their corresponing i.. s, epening on context) which are istance k from noe 0 will be enote by C k. Note that because of the symmetry of the hypercube, any algorithm written for noe 0 can be 2

5 converte to an ientical algorithm for noe n {1,2,...,2 1} by exclusive-oring all noe i.. s reference in the noe 0 algorithm with n. Some of the algorithms utilize somewhat subtle message routing schemes, for which the following notation will be useful. Let k (n) enote the noe i.. forme by taking the bit-wise exclusive-or of n an 2 k. I.e., it is just n with the kth bit flippe. For example, if = 5, then 1 (01001) = an 3 (01001) = Observe then that the neighbors of noe n are noes 0 (n), 1 (n),..., 1 (n). Let i (n) enote the left circular-shift (rotation) of the -bit representation of n by i bits, where i {0,1,..., 1} an n {0,1,...,2 1}. For example, if = 5, then 2 (01001) = For each n {1,2,...,2 1}, let W n enote the set of all bit positions in the binary representation of n which are 1. That is, W n is the unique set of integers such that n = w W n 2 w. Likewise, let Z n = {0,1,..., 1} \ W n enote the corresponing set of 0-bit positions of n. Define n : Z n W n as follows: min W n if z > maxw n n (z) = min{w W n : w > z} otherwise for each z Z n. I.e., n (z) is the bit position of the first 1 (in left circular orer) after z in n. For example, when = 5, then (1) = 3 an (4) = 0. 2 Sening the Same Message From One Noe to All Others One of the funamental hypercube communication patterns is broacasting, in which one noe has to sen the same message to all the other noes in the hypercube. This problem has been previously examine in [5, 11], but our algorithms are somewhat faster. It is interesting to note that the NCUBE machine has special harware instructions to allow a noe to simultaneously broacast to all of its neighbors, though the current operating system oes not make use of these instructions. Assume noe 0 is to o the broacasting. The most common an straightforwar way to broacast in a hypercube is recursive oubling. Algorithm: BROADCAST 1 (Recursive Doubling) There are stages, numbere 0,1,..., 1. During stage k, noes 0,1,...,2 k 1 sen the message to noes k (0), k (1),..., k (2 k 1), respectively (an concurrently). Analysis of BROADCAST 1 Each of the stages takes m + β time, so the whole algorithm requires time (m + β) = m + β. BROADCAST 1 works on only one imension at a time an thus fails to take avantage of the concurrent rea/write channels of the link-boun hypercube. To alleviate this, it is necessary to symmetrize the algorithm. That is, break up the problem into subproblems an run them simultaneously along separate imensions. 3

6 Algorithm: BROADCAST 2 Symmetrize BROADCAST 1. I.e., break up the message into packets P 0,P 1,...,P 1, each of i size m/. During stage k, noes (0), i (1),..., (2 k 1) sen P i to noes (i+k) mo ( (0)), i (i+k) mo ( i (1)),..., (i+k) mo ( i (2 k 1)), respectively, for each i in {0,1,..., 1}. Notice that packet P 0 reaches each processor at the same stage as the single packet in BROAD- CAST 1, an in general at the en of stage k each packet has been broacast to a k-imensional subcube. The moular packet routing guarantees that no processor is trying to sen two packets along the same link at the same time. Analysis of BROADCAST 2 The same as for BROADCAST 1, only now each stage takes time (m/) + β for a total time of ( m ) + β = m + β. i Although much improve from the previous version, BROADCAST 2 still suffers from the fact that a large percentage of the links are ile most of the time. In fact, half of the links are never use (those which are irecte towars 0). A better algorithm woul be one in which each link is always busy sening ata to a noe which has not yet seen it. This is impossible to fully achieve, but it provies a goal to aim for. The next version attempts this in a systolic fashion, for the message travels across the hypercube in a wave going from one C k to the next, with every noe actively sening along all of its links when the wave hits it. Algorithm: BROADCAST 3A There are +1 stages, numbere 0,1,...,. Break up the message into packets as in BROADCAST 2. During stage k, only the noes in C k actively sen along their their outgoing links, each noe sening exactly one packet along each such link. Every noe in C k receives k istinct packets from C k 1 uring stage k 1 an the remaining k packets from C k+1 uring stage k+1. More specifically, uring stage k, each noe n C k starts out with P w for every w W n. For each such w, it sens P w to noe w (n) C k 1. In aition, for each z Z n, it sens P n(z) to noe z (n) C k+1. Figure 2 shows this process for = 3. By inuction, it is easy to verify that uring stage k 1, a noe n C k receives P w for every w W n, receiving it from noe v (n) C k 1, where v W n is such that n (v) = w. During stage k+1 noe n receives P i from i (n) C k+1 for each i W n. Therefore, at the en of stage k+1, every noe in C k has receive all of the packets. Analysis of BROADCAST 3A Similar to that of BROADCAST 2, only now there are +1 stages an thus, a total time of (+1) ( m ) + β = +1 m + (+1)β. Although slower than the previous broacast, 3A has the special property that only noes in C k are sening in the kth stage. This will be exploite later, but first notice that the only purpose of the last stage is to sen P 0,P 1,...,P 1 from noe 2 1 to noes 0 (2 1), 1 (2 1),..., 1 (2 1), respectively. 4

7 P 1 P P 1 P 1 P 1 P 1 P 0 P 0 P 1 P 2 P 2 P 2 P 0 P P 0 P 2 P 1 P 1 P 0 P 0 P 0 P 2 P 0 P Stage 0 Stage 1 Stage 2 Stage 3 Figure 2: BROADCAST 3A for = 3 Algorithm: BROADCAST 3B Ientical to BROADCAST 3A, but the last stage is eliminate by sening a secon wave right after the first. During stage 1, noe 0 relabels P 0,P 1,P 2,...,P 1 as Q 1,Q 0,Q 1,...,Q 2, respectively, an starts a secon BROADCAST 3A, only this time using Q 0,Q 1,...,Q 1 in place of P 0,P 1,...,P 1, respectively. This secon broacast is run for only 1 stages, after which each noe in C 1 will have receive every packet (an then some). Analysis of BROADCAST 3B The one stage saving results in a final time of ( m ) + β = m + β, the same as for BROADCAST 2. BROADCAST 3 can be significantly spe up through the use of pipelining. Basically, pipelining consists of breaking up a problem into smaller pieces an sening these out as separate waves, one after another, in orer to keep more of the noes busy at the same time. It is a useful tool for speeing up asymmetrical communication algorithms. Algorithm: BROADCAST 4 Pipeline BROADCAST 3. That is, ivie up the message into g groups, numbere 0,1,...,g 1, each of length m/g. Execute g 1 separate BROADCAST 3A s, one for each of the first g 1 groups, followe by a BROADCAST 3B for the last group. These shoul be one concurrently, but staggere one stage apart so that no noe is working on more than one at a time. To be more precise, there are +g 1 stages, numbere 0,1,...,+g 2. During stage k, each noe in C j, k g+1 j k, executes stage j of BROADCAST 3 for group k j. 5

8 Analysis of BROADCAST 4 Each of the +g 1 stages now takes (m/g) + β time, so the whole algorithm requires time (+g 1) ( m ) g + β = (+g 1) g By simple calculus, it can be shown that this time is minimize when which prouces a final time of g = 1 = 1 ( 1) β m 1/2 2 m + (+g 1)β., m + β = 1 m + 2 ( 1)β m 1/2 + ( 1)β 2. For purposes of comparison, the best broacast in [5] has a running time of m + 2 βm 1/2 + β. One last observation. BROADCAST 4 is absolutely optimal if all sens have to use fixe-size packets. To see this, suppose the packet size is s. Then each noe will have to receive at least m/s of these packets an, since only packets can be sent out at a time from noe 0, at least one such packet will have to wait (m/s 1)(s + β) time for a free link before it can leave noe 0. To reach noe 2 1, this packet will then have to travel links, which takes time (s+β), for a total minimum time of ( + ms 1 ) (s + β). But this is just what BROADCAST 4 takes when you let g = m/s. 3 Sening From One Noe to the Opposite Corner Noe Another important communication pattern is when some noe n sens to its opposite corner (o.c.) noe 2 1 n. Obviously, this is similar to, an coul be accomplishe by, broacasting. Although this might appear to waste time, the analysis of BROADCAST 4 shows that that algorithm alreay performs an optimal o.c. sen, if one is limite to fixe-size packets. Which is not to say that variable-size packets will help much, for at least (/)m + β time is neee just to get all of the ata out of noe n. More important, though, is to cut own all of the unnecessary communication. Assume that noe 0 wants to sen to noe 2 1. Algorithm: O.C. SEND 1 There are stages, numbere 0,1,..., 1. During stage k, noe 2 k 1 sens the message to noe k (2 k 1). 6

9 Analysis of O.C. SEND 1 Ientical to BROADCAST 1: (m + β) = m + β. Algorithm: O.C. SEND 2 Symmetrize O.C. SEND 1. I.e., break up the ata into packets P 0,P 1,...,P 1, each of size m/. During stage k, noe (2 i k 1) sens P i to noe (i+k) mo ( (2 i k 1)) for each i {0,1,..., 1}. Analysis of O.C. SEND 2 Ientical to BROADCAST 2: ( m ) + β = m + β. These algorithms reuce the total communication neee in their broacast equivalents consierably. Unfortunately, they on t save any time. In orer to o that, it is necessary to spee up the intermeiate stages. Algorithm: O.C. SEND 3 There are stages, numbere 0,1,..., 1. The packets will vary in size epening on the stage. During stage k, only the noes of C k will be sening, while only noes in C k+1 will be receiving. At the start of stage k, the ata starts out evenly istribute among each of the ( k) noes in Ck. Each such noe then breaks up its ata into k equal-size packets an sens a ifferent one out along each of its k links to C k+1, at which point the message will be evenly istribute among C k+1 s noes. Analysis of O.C. SEND 3 Stage k takes time so the whole algorithm requires where [ 1 k=0 m ( + β = k) ( k) ( 1 k ] ) m + β = ( 1 k [ 1 k=0 ) m + β, 1 ( 1 k ) ] m + β = K 1 m + β, K = k=0 1 ( k). 7

10 To help unerstan K, note that it s first six values are 1,2,2 1 2,22 3,22 3, an For 5, an hence, K ( 1) + 6( 5) ( 1)( 2) < ( 1), K The upshot of all this is that O.C. SEND 3 is faster than BROADCAST 3 by a factor of about /2. Like BROADCAST 3, O.C. SEND 3 also lens itself well to pipelining. Algorithm: O.C. SEND 4 Pipeline O.C. SEND 3. I.e., break up the message into g groups of length m/g an execute g separate O.C. SEND 3 s, one for each group. These shoul be one concurrently, but staggere one stage apart. Because O.C. SEND 3 s stages have varying lengths, there will be no synchronization between the various stages of O.C. SEND 3 being worke on for each of the groups. Since O.C. SEND 3 s first stage, which takes time (m/g) + β, is at least as long as any other of its stages, it sets the rate at which the groups can be starte. Each noe will be able to complete sening the previous group by the time it has finishe receiving the next group. Analysis of O.C. SEND 4 Due to the fact that there is no chance of conflict, the time neee for this algorithm is just the time neee to start the first g 1 O.C. SEND 3 s, as well as all of the time neee for the last one. This works out to be (g 1) ( m ) g + β + K 1 m g + β = g+k 1 1 m + (g+ 1)β, g which is minimize when g = 1 = 1 (K 1 1) β m 1/2 2. This prouces a final time of m + β = 1 m + 2 (K 1 1)β m 1/2 + ( 1)β 2. Notice that this is ientical to the time for BROADCAST 4, except that the coefficient of the m 1/2 term has been reuce by a factor of approximately 1. A final observation about the O.C. SEND algorithms: they only use links in one irection (i.e., towars noe 2 1). Consequently, another o.c. sen coul be one from noe 2 1 to noe 0 concurrently (that is, an opposite corner exchange) since they each woul use ifferent links. 8

11 4 Sening Messages Between Two Arbitrary Noes One of the more interesting questions that can be aske is, given the full use of the link-boun hypercube, what is the fastest possible time for one noe to sen to an arbitrary noe istance n away (1 n )? For simplicity, assume noe 0 is sening to noe 2 n 1. Algorithm: ARBITRARY SEND 1 Use O.C. SEND 4 on the n-imensional subcube containing noes 0,1,...,2 n 1. Analysis of ARBITRARY SEND 1 m + β n = 1 n m + 2 (K n 1 1)β m 1/2 + (n 1)β n 2 n. Like previous algorithms, this approach fails to make use of the tremenous banwith available an is slower than broacasting for all n <. Algorithm: ARBITRARY SEND 2 Use BROADCAST 4, eleting the last 2 n stages if n < 2. These stages were only neee to get the message to noes farther than istance n from noe 0. Analysis of ARBITRARY SEND 2 There are n+g 1 +g 1 stages, which prouce minimum times when g = 0 n 2 2 n 1 = 1 (n+1) β m 1/2 ( 1) m 1/2 β 1 n n. The corresponing times are then m + β = 1 m + 2 (n+1)β m 1/2 + (n+1)β 1 n 2 m + 2 ( 1)β m 1/2 + ( 1)β 0 2 n. 9

12 Closer inspection of BROADCAST 3 reveals that yet another stage can be save when /2 n 2 an a fraction of a stage save when 1 n < /2, but these only reuce the m 1/2 an β coefficients by negligible amounts. What s neee is a way to combine the link utilization of BROADCAST 4 with the efficiency of O.C. SEND 4. With this in min, it is necessary to look at a new communication pattern, the extene o.c. (x.o.c.) sen. It is similar to a regular o.c. sen, except that the sening an receiving noes are connecte to opposite corners of an n-imensional subcube S, an all communication must go through S. Algorithm: X.O.C. SEND 1 Ientical to O.C. SEND 3, except two stages, numbere 1 an n, are ae to get the items into an out of S. Analysis of X.O.C. SEND 1 The first an last stages each take time m + β, so the whole algorithm requires 2(m + β) + K n 1 m + nβ = n (2n + K n 1) n m + (n+2)β. Algorithm: X.O.C. SEND 2 Pipeline X.O.C. SEND 1 as in O.C. SEND 4. There are still g groups of m/g items apiece, only in this case the (m/g) + β time neee to get a group out of the sening noe sets the pace for the rest of the stages. Analysis of X.O.C. SEND 2 Similar to O.C. SEND 2. The total time neee is (g 1) ( m ) g + β + (2n + K n 1) m n g + (n+2)β = (gn + n + K n 1) gn m + (g+n+1)β, which is minimize when g = (n+k n 1 ) m 1/2 nβ with a resulting time of m + 2 (n+k n 1 )β n m 1/2 + (n+1)β. 10

13 Algorithm: ARBITRARY SEND 3 Break up the ata into n+1 packets, n of which contain m/ apiece while the other has the remaining nm/ items. Sen the larger packet via an O.C. SEND 4 through the subcube T containing noes 0 an 2 n 1 as opposite corners. Sen each of the other n packets via an X.O.C. SEND 2 through a ifferent one of the n n-imensional subcubes which both run parallel to T an are istance 1 from T (i.e., the subcubes containing noes 2 n+1 through 2 n+1 +2 n 1, 2 n+2 through 2 n+2 +2 n 1,..., 2 1 through n 1). Analysis of ARBITRARY SEND 3 The n+1 separate sens use ifferent links, so they can all be one concurrently with no conflicts. Each of the X.O.C. SEND 2 s of length m/ takes time m + 2 (n+k n 1 )β m 1/2 + (n+1)β n whereas the O.C. SEND 4 of length nm/ requires m + β n = 1 m + 2 (K n 1 1)β m 1/2 + (n 1)β 2 n Hence, the X.O.C. SEND 2 s will be slower than the O.C. SEND 4 whenever n+k n 1 K n 1 1, n which is true by inspection for 1 n 4 an is false for bigger n since n < K n hols for all n 4. What this all means is that the actual time for ARBITRARY SEND works out to m + β = n = 1 m + 2 (n+k n 1 )β m 1/2 + (n+1)β 1 n 4 an n 1 n. m + 2 (K n 1 1)β m 1/2 + (n 1)β 5 n or n = 2 This compares to the neee by [11]. m + 2 (n+1)β m + 2 ( 1)β m 1/2 + (n+1)β n 1 m 1/2 + ( 1)β n = As was the case with O.C. SEND 4, ARBITRARY SEND 3 is only a slight improvement over broacasting. More importantly, it uses only one link between noes, so an arbitrary exchange is possible between two noes in the same amount of time by using the links in the other irection. 11

14 5 Sening Different Messages From One Noe to Every Other Noe The next communication pattern to be looke at is where one noe nees to sen a ifferent message to each of the other noes. This operation has no stanar name (it was referre to as scatter in [11] an personalize communications in [5]), so we will call it istributing. Its ual operation, collecting, where one noe has to receive a message from each of the other noes, is exactly the same operation, only run in reverse. Hence, it suffices to esign an analyze istributing algorithms. Both of these operations are useful in asymmetrical situations where one noe of the hypercube acts as a master processor an the others as its slaves. The master istributes ifferent ata sets to each of the slaves, which in turn perform computations on them. Then the master collects all of the results. Assume without loss of generality that noe 0 is the istributing noe. The following algorithm makes use of O.C. SEND 2, which gets execute on every subcube containing noe 0. Algorithm: DISTRIBUTE 1 There are stages, numbere 0,1,..., 1. Each noe is sent its corresponing message via a separate O.C. SEND 2 applie to the subcube containing it an noe 0 as opposite corner noes. These 2 1 o.c. sens are run concurrently, with batching, an staggere so that the one to C goes first, followe by the ones to C 1, then C 2, etc. Specifically, uring stage k, the ( j) noes in Cj, 0 j k, o their share of the work for the ( ) ( +j k = k j) O.C. SEND 2 s whose estinations are in C +j k. Analysis of DISTRIBUTE 1 The time for each stage epens on the maximal amount of ata being sent out over a single link. For stage k, C j contains ( ( k j) messages of length m to be sent out evenly along its ) ( j ( j) = 1 ) j links to C j+1. This means that each such link sens a packet of length ( k j) m ( 1 j ), which is maximize when j = 0, for ( ) k i ( 1 ) > i ( ) k (i+1) ( 1 ) i+1 hols for all i {0,1,...,k 1}. Consequently, the time spent by noe 0 uring each stage is longer than noes in any other C j, so the total time for the algorithm is [ 1 k=0 ( ) k m + β The istribute algorithm in [11] ha a time of ] = [ 1 k=0 ( )] k m + β = (2 1) m + β. (2 1)m + β 12

15 an [5] s ha a time strictly greater than ours. The time in [5] is ificult to represent, but has the property that for fixe > 1, the coefficient of m is strictly greater than (2 1)/, but it tens to (2 1)/ as approaches. Note that, as was the case with the o.c. sen algorithms, DISTRIBUTE 1 uses links in only one irection, namely towars noe 2 1. Hence, a DISTRIBUTE 1 from noe 2 1 can be one concurrently using the opposite set of links. Also, for m sufficiently large (epening on the values of, β, an ), it is possible to reuce the β coefficient, which represents the number of stages or waves of ata leaving noe 0, by grouping together some of the waves as they leave noe 0 an then breaking them apart in C 1. For example, using = 3, first a wave containing messages for C 2 an C 3 is sent out, taking 4 3m + β time to leave noe 0. When this wave arrives at the noes of C 1, the portion estine for C 3 is sent, taking 1 6 m + β time, an then the portion estine for C 2 is sent, taking 1 2 m + β time. When the portion estine for C 3 reaches the noes of C 2, it is sent on to C 3, taking 1 3m + β time. Meanwhile, the secon wave of messages sent by noe 0 are those estine for C 1, taking m + β time. Messages for C 1 finish arriving at time 7 3m + 2β, messages for C 2 finish at time 2m + 3β, an messages for C 3 finish at time 11 6 m + 3β. If m/3 > β, then all messages arrive by 7 3m+2β, which has improve upon the coefficient of β. This can be shown to be absolutely optimal. This approach is extene in DISTRIBUTE 2. Algorithm: DISTRIBUTE 2 Fix, let k be the smallest integer such that ( 1)k , let r be such that rk 1 r 1 = 2 1. (Since r may be irrational, in practice one may prefer to use some rational r such that r r < 1.) There will be exactly k waves. Wave k will be those messages estine for C 1, where each link from noe 0 carries a message of size m. Wave k 1 starts from noe 0 with packets of size rm along each link, an will contain all of the messages for C 2 an (for large ) portions of each of the messages for C 3, where the portion is chosen to fill the packet size. Wave k 2 will start with packets of size r 2 m, containing the rest of each of the messages for C 3, plus messages for C 4, plus (for sufficiently large ) portions of messages for C 5. Each wave starts with packets r times larger than the following wave, an contains messages estine for a set of further C i s. When a wave reaches the noes of C 1 it is broken into wavelets, one for each of the estination C i in the wave. These wavelets continue on to their estination, ajusting the packet sizes at each step as in DISTRIBUTE 1, but not subiviing into smaller wavelets. Analysis of DISTRIBUTE 2 Since the banwith from C 1 to C 2 is 1 times the banwith from 0 to C 1, an r < 1, for m sufficiently large each wave can be sent on from C 1 before the next wave arrives. The reason m must be sufficiently large is that the breaking into wavelets introuces aitional β terms, but since r is less than 1 there is a slight bit of extra banwith, which can mask the extra start-up for sufficiently large m. As in DISTRIBUTE 1, it can be shown that, for suffiently large m, all wavelets reach their estination by the time the last wave reaches C 1. Therefore the total time is etermine 13

16 by the time it takes noe 0 to sen all k waves, which is (2 1) m + kβ (2 1) m + log 2 β. This algorithm shows that one cannot obtain a lower boun by simply aing the banwith lower boun, which etermines the optimal coefficient of m, to the start-up lower boun which shows that at least β time is neee to move any message across the hypercube. In general one can only take the maximum of these two components as a lower boun, since operations can be overlappe. 6 Completing Hypercube Algorithms Completing a hypercube operation refers to taking an operation centere aroun one noe an then simultaneously performing it on all of the noes. This prouces highly symmetrical communication patterns which utilize all of the available banwith. The simplest way to complete an operation is just to run 2 single-noe operations concurrently. In terms of algorithms, this amounts to using the same number of stages as the single-noe version. During each stage of the complete algorithm, however, each noe oes all the work necessary for the corresponing stage in all of the single noe algorithms. Link conflicts are resolve by batching. That is, grouping together all of the separate packets that have to be sent along a particular link uring the same stage an sening them as one big packet. This also reuces communication overhea (i.e., β terms) consierably. The best single-noe algorithms to complete are usually the simplest versions which still take avantage of the concurrent link capability. Sophisticate techniques such as pipelining an link balancing aren t necessary because the complete operations are so symmetric. The first operation to be complete will be the broacast. This pattern is useful for various matrix operations as well as vector multiplication. Algorithm: COMPLETE BROADCAST Complete BROADCAST 2. There are stages, numbere 0,1,..., 1, an uring stage k, each noe oes its share of the work for the corresponing stage of BROADCAST 2 for all ( k) noes which are istance k from it. Analysis of COMPLETE BROADCAST During stage k of BROADCAST 2, the total amount of ata being sent out is 2 k m/ = 2 k m, so the corresponing amount being sent out in COMPLETE BROADCAST is 2 2 k m. Due to the overall symmetry of COMPLETE BROADCAST, this outgoing ata will be evenly ivie among all 2 links of the hypercube, so each link ens up sening a packet of size 2 k m/. Therefore, the time for the algorithm is ( 1 k=0 2k m + β ) = (2 1) m + β. For purposes of comparison, [11] prouce an optimal complete broacast (which they referre to as a total exchange ) with a running time of (2 + 2 ) m + β. 14

17 The next operation to be complete will be the o.c. sen. A complete o.c. sen, henceforth referre to as an inversion is another funamental communication pattern useful for reversing the orer of ata which is store by noe i.. an for transposing matrices (to be iscusse later). Algorithm: INVERSION (Complete Opposite Corner Sen) Complete O.C. SEND 2 in exactly the same manner as BROADCAST 2 was in COMPLETE BROADCAST. Analysis of INVERSION During each stage of O.C. SEND 2, a total of packets of size m/ were being sent over separate links. Now there are 2 such packets, but there are also that many links an the symmetry of the algorithm guarantees that no more than one packet will be sent along the same link uring a stage. Thus, the time for INVERSION is ientical to O.C. SEND 2: ( m ) + β = m + β. Now consier the ultimate communication pattern, the complete exchange. This is when every noe wants to sen (as well as receive) a ifferent message to (from) each of the other noes. In other wors, it s the same thing as completing the istributing or collecting operations. The complete exchange turns out to be useful for matrix transpositions as well as ranom communication patterns (both to be iscusse later). In [11], complete exchange was calle multigather/scatter. Algorithm: COMPLETE EXCHANGE Just complete DISTRIBUTE in the same manner that O.C. SEND 1 was complete to prouce INVERSION. There are still stages, numbere 0, 1,..., 1. Analysis of COMPLETE EXCHANGE As in INVERSION s analysis, all that has to be etermine is the amount of ata each noe has to pass along each stage. Basically, every noe starts out with (2 1)m items which have to be sent out to the other noes. During stage k, it starts sening messages to the ( k) noes which are istance k away. No messages reach their proper estinations until the last stage, which means that a total of 2 k j=0 ( ) j messages are being worke on uring stage k. Due to the symmetric pattern of the sens, each of the 2 links thus sens a packet of size ( k j) m = j=0 k j=0 ( 1 ) j j m, so the algorithm nees time 1 k=0 ( k 1 ) j j m + β = j=0 1 1 j=0 k=j ( 1 ) j m + β j 15

18 = 1 j=0 ( ) 1 m + β j compare to the neee by [11] s corresponing algorithm. 2 1 m + β = 2 1 m + β, Finally, sometimes a situation arises where each noe wants to sen to one other noe as well as receive from just one noe. This will be terme a permute sen, although in some ways it is analogous to a complete arbitrary sen. An obvious example is an inversion. Another one is when each noe wants to sen to the next higher-numbere noe (mo 2 ), which can be thought of as a rotation. There are 2! such permutations, so etermining the most efficient algorithm for each one seems neither possible nor practical, though recently some papers have appeare analyzing specific permutations [8, 10]. For arbitrary permutatations, however, a eterministic analogue of Valiant s ranomize routing [12, 13] can be employe. It consists of two complete exchanges: one to isperse all the ata evenly throughout the cube an another to collect it all up at the appropriate estinations. We explicitly use the fact that all noes know the permutation being performe so that estination information nee not be sent with the ata. Algorithm: PERMUTED SEND Each noe breaks up its m items into 2 packets of m/2 items apiece. These packets are istribute throughout the cube via a complete exchange so that each noe has one packet from every noe in the cube. Since the communication pattern is a permutation, this also means that each noe has a ifferent packet to sen to every other noe in the cube. As a result, another complete exchange can be use to route all of the packets to their correct estinations. Analysis of PERMUTED SEND There are two complete exchanges, each involving message lengths of m/2 items, so the time neee for PERMUTED SEND is 2 (2 1 m2 ) + β = m + 2β. 7 Matrix Transposition Transposing a matrix in a hypercube is an interesting communication problem which can make goo use of some of the algorithms evelope so far. It has been previously consiere in [9, 11], but faster algorithms will be evelope here. Suppose you want to transpose an N N matrix M store in a -imensional hypercube, with each noe containing N 2 /2 entries. It is necessary to specify exactly how M is store, where the usual ways are either by rows(columns) or as square submatrices. Storage by rows is the easier of the two, so it will be consiere first. 16

19 7.1 Storage by Rows(Columns) There are many ways of storing by rows(columns), where we assume that N is evenly ivisible by 2. For example, the rows may be store by partitioning the rows into blocks of N/2 consecutive rows, where the assignment of blocks to noes may or may not use a Gray coe. Or it may be that a stripe pattern is use, partitioning the rows into sets of rows 2 apart, again with variations possible on how the sets are mappe onto the noes. However, no matter what metho is use to assign rows to noes (as long as the assignment evenly istributes the ata), to perform transposition each noe must sen exactly N 2 /2 2 entries to each other noe. In other wors, a complete exchange has to be performe with a message length of N 2 /2 2. This takes time 2 1 N β = 2 +1N2 + β, as compare to neee by [11]. 2 +1N2 + β 7.2 Storage by Submatrices When store as submatrices, it is convenient to assume that is even, say = 2c, an that N is evenly ivisible by 2 c. Assume that M has been partitione into submatrices M x,y, x,y {0,1,...,2 c 1}, where M x,y is forme by the intersection of rows with columns xn/2 c + 1 through (x+1)n/2 c yn/2 c + 1 through (y+1)n/2 c. Let G enote any permutation of {0,...,2 c 1}, an assume that M x,y is store in noe G(x)2 c + G(y). Typical choices for G inclue the ientity, in which case this is known as row-major orering, or a Gray coe, in which case ajacent submatrices are store in ajacent noes. No matter what G is use, transposition reuces to the problem of noe a2 c + b exchanging its entries with noe b2 c + a, for all a,b {0,1,...,2 c 1}. We provie an algorithm for this operation. First observe that this is a permute sen with message length N 2 /2. Hence, it can be accomplishe by using PERMUTED SEND in time ( ) N β = N2 + β. This is twice as long as when M is store by rows, yet on average, each item moves only half as far. Consequently, it woul not be unreasonable to expect there to be an algorithm which works in half the time. In fact, such an algorithm oes exist, but escribing an analyzing it requires examining the bit patterns of the noe i.. s. Consier noe a2 c + b, where a,b {0,1,...,2 c 1}. In their base two representations, a an b iffer by say, k bits, an agree on the other c k. Now let S a,b enote the set of all noes whose first c bits of their i.. s iffer from their last c bits in exactly the same k positions that a an b o. Observe that S a,b contains 2 k noes. By themselves, they o not form a proper subcube, but something close to one. The istance between any two noes of S a,b is always even, an if there 17

20 A D C B physical link logical link Figure 3: Logical an Physical Links Between two Noes were links between the noes that are istance two apart, then S a,b plus these new connections woul form a k-imensional hypercube. With these thoughts in min, efine a logical link between two hypercube noes A an B, which are istance two apart, to be the four physical links which connect A an B along the two possible paths of length 2 (see Figure 3). Let C an D enote the intermeiate noes connecting A an B. A logically connecte (l.c.) subcube can then be efine to be a subset of noes whose logical links connect them together in a hypercube network. That sai, S a,b is a l.c. subcube. Furthermore, its intermeiate noe i.. s have the property that their first c bits iffer from their last c bits in precisely k 1 positions. Finally, from the efinition of S a,b, every noe in the hypercube belongs to exactly one such l.c. subcube. Combining these last two statements, it becomes apparent that no two such subcubes can share the same physical link. As a consequence, algorithms can be run concurrently on all of the these l.c. subcubes without the possibility of link conflict. Returning to the original problem, noe a2 c + b can exchange ata with noe b2 c + a by simply performing an o.c. exchange in S a,b. In fact, every noe in S a,b can exchange ata with their corresponing noe for the transposition by performing an inversion in S a,b. This brings up the nee then for an inversion algorithm for l.c. subcubes. Inversion in a logically connecte subcube Sening ata along a logical link is equivalent to oing an o.c. sen in a 2-imensional hypercube. Hence, a stanar sen woul take time 2 m + 2 β 2 m1/2 + β to perform, so a l.c. inversion coul be accomplishe by performing a regular INVERSION using these logical sens. For a k-imensional l.c. subcube, this woul require time k ( ) m 2 k + 2 β m 1/2 + β = 2 k 2 m + 2 kβ 2 m1/2 + kβ. 18

21 As long as the number of packets sent out along each physical link in a logical sen is at least two, however, then there is no point in waiting for all of the incoming packets to arrive before starting to pass them along. That is, after all, the whole iea behin pipelining. Algorithm: L.C. INVERSION Perform a regular INVERSION along the logical links of the l.c. subcube, making sure to pipeline the stages together an breaking the ata up into at least four packets so that incoming packets start arriving no later than when the last of the outgoing packets are being sent out. Analysis of L.C. INVERSION Let p enote the number of packets to be sent out along each outgoing physical link (note that p has to be at least 2). Then the packet size for each sen is m/2kp. With pipelining then, it takes a total of kp sens for each noe to pass along its ata, with the intermeiate noes being one sen behin the regular noes. Therefore, kp + 1 sen stages are neee, so the algorithm runs in time which is minimize when ( ) (kp + 1) 2kp m + β, p = 1 k 2β m1/2. This prouces a final time of 2 m + 2 β 2 m1/2 + β. Observe that this time is the same time as neee by INVERSION for a 2-imensional cube. Also, it is inepenent of k so long as m is large enough to insure p 2. Algorithm: MATRIX TRANSPOSITION (store by submatrices) With L.C. INVERSION, transposing M becomes trivial. Just perform it on every l.c. subcube of M. Analysis of MATRIX TRANSPOSITION Since the l.c. inversions can all be one concurrently without overlap, an all take the same amount of time, the transposition is complete in time N β 2 ( ) N 2 1/2 + β = 2 β 2 +1N N + β, which is nearly the same time neee when M was store by rows. This is approximately half of the 2 N2 + ( 1) time require by [9], where it is assume that β is zero. 19

22 8 Histogramming The same techniques use previously can be applie to the problem of histogramming. We consier a simple variant in in which there are m ifferent bins, each noe starts with a value for each of the bins, an the goal is to fin the sum of the values for each bin. The sum for bins im/2 through (i+1)m/2 1 will be in noe i. (If it is esire that all noes contain all sums, then a complete broacast can be use at the en.) We assume that each value an sum is of unit length. Algorithm: HISTOGRAM First consier the algorithm where the ata is exchange one imension at a time, using recursive halving to ecrease the number of subtotals in each noe. During the first stage, noes in the bottom half of the subcube sen up their values for the secon half of the bins to their neighbors in the upper half, which are concurrently sening own their values for the first half of the bins. Each noe as the values receive to its own, an recursively continues on to the next stage. The final HISTOGRAM algorithm is just the symmetrize version of this simple algorithm. Analysis of HISTOGRAM For the unsymmetric algorithm, stage k, 1 k, takes (/2 k )m + β time, so for the symmetric algorithm it takes (/2 k )m + β time. The total time is (1 2 ) m + β. 9 Optimality of the Algorithms As was mentione earlier, the coefficients of the high-orer terms are the least possible. In all cases, a proof can be given base on a simple counting argument. The easiest such approach is pick a subset S of links, show that the total message loa that must be sent over S is at least some amount a, an conclue that at least one link sens at least a/ S an thus, takes at least time oing so. a S m + β For example, in BROADCAST, O.C. SEND, ARBITRARY SEND, DISTRIBUTE, an HISTOGRAM, let S be the outgoing links of noe 0. Then a is m, m, m, (2 1)m, an (1 2 )m, respectively. In the case of COMPLETE BROADCAST, pick S to be the incoming links to noe 0 an set a to (2 1)m, since noe 0 receives a ifferent message from each other broacast. For INVERSION an COMPLETE EXCHANGE, consier S to be the 2 links connecting noes 0,1,...,2 1 1 in the lower subcube L with noes 2 1,2 1 +1,...,2 1 in the upper subcube U. Then a is 2 m an m, respectively, since every noe in L sens one an 2 1 messages, respectively, to the noes in U (an vice versa). PERMUTED SEND is optimal since it is slower than the special case INVERSION by only an aitive β. Finally, for MATRIX TRANSPOSITION, when the 20

23 matrix is store by rows or columns the problem is just complete exchange, which was shown to be optimal. When the ata is store as submatrices, let S be all 2 links. Then a is N 2 /2 since the total istance travele by all messages, each of size N 2 /2, is 2 1. The lower boun for the permutation reflection, where every noe in L exchanges with its corresponing neighbor in U, has the same high-orer term as oes inversion (by the same argument). This occurs espite the fact that reflection is a fixe-point free permuation with the smallest total message istance, while inversion has the greatest total message istance. Given this, an the fact that PERMUTED SEND shows that all permutations can be route with this highest-orer term, one might guess that all fixe-point free permutations require the same highest-orer term. (If permutations with fixe points are consiere, then the ientity can be complete in zero time.) However, it has been shown that some fixe-point free permutations can be route with a highest-orer term smaller than that of reflection [10], an therefore PERMUTED SEND is only worst-case optimal among fixe-point free permutations. Beyon the highest-orer term, we believe that some of the algorithms herein are absolutely optimal. Unfortunately, we have generally been unable prove this because of the ifficulty in fining goo lower bouns which go beyon the highest-orer term. Such bouns must incorporate both banwith consierations an an accounting of start-up times, but, as was note in Section 5, one cannot simply a these components together to obtain a correct lower boun. We can, however, prove absolute optimality for DISTRIBUTE 2 for any, if m is sufficiently large. Notice that at least one of noe 0 s neighbors must receive at least (2 1)m/ items, an all except perhaps m of these items nee to be forware. Therefore it suffices to show that if a noe 0 is connecte to a noe 1, which in turn is connecte to 1 aitional noes, an if noe 0 starts with m items estine for noe 0, an (2 1)m/ m items estine for the aitional noes, then the time neee is at least the time taken by DISTRIBUTE 2. We assume that we have complete freeom in eciing which aitional noe to eliver a specific item to. Without increasing the time, we can alter any algorithm so that the items estine for noe 1 are the last items sent from noe 0. Suppose the first packet to arrive at noe 1 has size p, an the secon packet has size q, an both are estine for the aitional noes. If p < ( 1)q, then the items cannot finish arriving at the aitional noes until time (p + β) + (q + β) + (q/( 1) + β). By moving some of the items from the secon message to the first, creating new messages with lengths p an q, where p = ( 1)q an p +q = p+q, the messages can finish arriving at the aitional noes at time (p + β) + (q + β) + (q /( 1) + β). Since q < q, this is faster. A similar argument applies if p > ( 1)q, an therefore without increasing the time, we can assume that the first packet is 1 times as long as the secon. This argument can be applie inuctively, showing that we may assume that each packet sent from noe 0 is ( 1) as long as the following one. (Temporarily ignore the fact that this argument oes not apply to the last packets, since some of the items in them are not forware). Suppose noe 0 sens k packets, with sizes x( 1) k 1, x( 1) k 2,...,x, where x is such that the sum of the message sizes is (2 1)m/. The time for noe 1 to receive these messages an sen on the items estine for the aitional noes is at least (2 1) m + kβ + (x m) 1 where the last two terms are inclue only if x > m. For fixe,, an β, an sufficiently large m, this is minimize when k = log 1 [1 + ( 2)(2 1)/], which gives the time taken by DISTRIBUTE 2. To be correct, this argument must be moifie to eal with the sizes of the last packets, since the argument showing each packet must be 1 times as long as the following one assume that all items were estine for the aitional noes. An analysis by cases shows that the same time boun hols. + β, 21

24 10 Conclusion We have shown that link-boun hypercubes can make effective use of all of their communication links to perform some common communication-intensive tasks. Since a lower boun for some of these tasks is the time neee to sen out the ata from an originating noe, such tasks woul take longer on more restricte machines in which noes cannot use all of their communication links at one time. Thus our algorithms provie support for the belief that it is useful to buil machines where all communication links can be use simultaneously. By systematically applying a few techniques such as pipelining, symmetrizing, an completing, we were able to evelop a collection of algorithms giving efficient solutions to a wie range of problems. We concentrate on communication problems that are rather funamental, an have not trie to evelop all of their uses. However, we note that several aitional matrix manipulation problems can be solve by our algorithms. For example, if a matrix is store by rows or columns, then switching between blocke an stripe storage, or rotating by a quarter-turn, are all examples of complete exchange. If a matrix is store via submatrices, an the G function use in the assignment is either the ientity or a reflexive Gray coe, then rotation via quarter-turns or half-turns can be accomplishe by algorithms closely relate to MATRIX TRANSPOSITION. Since the initial announcement of our results in [14] an the submission of this paper, aitional papers have appeare which pursue the use of such techniques for matrix problems [6, 8, 9]. These papers inclue experimental results on Intel an Thinking Machines hypercubes, showing that our techniques o inee result in faster message transmission. Though our algorithms are eterministic, this paper has ties to Valiant s work on ranomize routing [12, 13]. He showe that inivisible unit-length messages in a link-boun hypercube coul be route in Θ() expecte time, no matter what the permutation, by routing each message to a ranom intermeiate estination an then on to its original estination. For long ivisible messages an a known permutation (so that heaer information nee not be attache), PERMUTED SEND eliminates the ranom estination by sening a portion of the message to every processor. Further, in [12] he use four ba examples to empirically show the usefulness of ranomization. One of these is equivalent to matrix transposition for a matrix store as submatrices, an the worst one was inversion. MATRIX TRANSPOSITION an INVERSION show that there are efficient eterministic routing schemes for these permutations. Finally, espite the intense interest in hypercube communication [1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13], still little is known about optimal hypercube performance on communication-intensive tasks such as sorting, routing, ata balancing, atabase operations, an image warping. For example, it is not known if a -imensional hypercube, starting with one item per noe, can sort the items in Θ() worst-case time. Aitional open questions inclue extening analyses to processor-boun an DMA-boun hypercubes, an to problems where the communication pattern is not known in avance an/or the message lengths are not uniform. 22

Lecture 1 September 4, 2013

Lecture 1 September 4, 2013 CS 84r: Incentives an Information in Networks Fall 013 Prof. Yaron Singer Lecture 1 September 4, 013 Scribe: Bo Waggoner 1 Overview In this course we will try to evelop a mathematical unerstaning for the

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

Message Transport With The User Datagram Protocol

Message Transport With The User Datagram Protocol Message Transport With The User Datagram Protocol User Datagram Protocol (UDP) Use During startup For VoIP an some vieo applications Accounts for less than 10% of Internet traffic Blocke by some ISPs Computer

More information

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics CS 106 Winter 2016 Craig S. Kaplan Moule 01 Processing Recap Topics The basic parts of speech in a Processing program Scope Review of syntax for classes an objects Reaings Your CS 105 notes Learning Processing,

More information

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Questions? Post on piazza, or  Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)! EE122 Fall 2013 HW3 Instructions Recor your answers in a file calle hw3.pf. Make sure to write your name an SID at the top of your assignment. For each problem, clearly inicate your final answer, bol an

More information

1 Surprises in high dimensions

1 Surprises in high dimensions 1 Surprises in high imensions Our intuition about space is base on two an three imensions an can often be misleaing in high imensions. It is instructive to analyze the shape an properties of some basic

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

Divide-and-Conquer Algorithms

Divide-and-Conquer Algorithms Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:

More information

Skyline Community Search in Multi-valued Networks

Skyline Community Search in Multi-valued Networks Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h

More information

Figure 1: 2D arm. Figure 2: 2D arm with labelled angles

Figure 1: 2D arm. Figure 2: 2D arm with labelled angles 2D Kinematics Consier a robotic arm. We can sen it commans like, move that joint so it bens at an angle θ. Once we ve set each joint, that s all well an goo. More interesting, though, is the question of

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my Calculus I course that I teach here at Lamar University. Despite the fact that these are my class notes, they shoul be accessible to anyone wanting to learn Calculus

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 01 01 01 01 01 00 01 01 Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University

More information

Coupling the User Interfaces of a Multiuser Program

Coupling the User Interfaces of a Multiuser Program Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have evelope a new moel for coupling the user-interfaces

More information

Optimal Oblivious Path Selection on the Mesh

Optimal Oblivious Path Selection on the Mesh Optimal Oblivious Path Selection on the Mesh Costas Busch Malik Magon-Ismail Jing Xi Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 280, USA {buschc,magon,xij2}@cs.rpi.eu Abstract

More information

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract The Reconstruction of Graphs Dhananay P. Mehenale Sir Parashurambhau College, Tila Roa, Pune-4030, Inia. Abstract In this paper we iscuss reconstruction problems for graphs. We evelop some new ieas lie

More information

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Aitional Divie an Conquer Algorithms Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Divie an Conquer Closest Pair Let s revisit the closest pair problem. Last

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex boies is har Navin Goyal Microsoft Research Inia navingo@microsoftcom Luis Raemacher Georgia Tech lraemac@ccgatecheu Abstract We show that learning a convex boy in R, given ranom samples

More information

Online Appendix to: Generalizing Database Forensics

Online Appendix to: Generalizing Database Forensics Online Appenix to: Generalizing Database Forensics KYRIACOS E. PAVLOU an RICHARD T. SNODGRASS, University of Arizona This appenix presents a step-by-step iscussion of the forensic analysis protocol that

More information

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions.

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions. Overview Operating Systems I Management Provie Services processes files Manage Devices processor memory isk Simple Management One process in memory, using it all each program nees I/O rivers until 96 I/O

More information

6.823 Computer System Architecture. Problem Set #3 Spring 2002

6.823 Computer System Architecture. Problem Set #3 Spring 2002 6.823 Computer System Architecture Problem Set #3 Spring 2002 Stuents are strongly encourage to collaborate in groups of up to three people. A group shoul han in only one copy of the solution to the problem

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Linus Svärm Petter Stranmark Centre for Mathematical Sciences, Lun University {linus,petter}@maths.lth.se Abstract Shift-map image processing is a new framework base on energy

More information

Ad-Hoc Networks Beyond Unit Disk Graphs

Ad-Hoc Networks Beyond Unit Disk Graphs A-Hoc Networks Beyon Unit Disk Graphs Fabian Kuhn, Roger Wattenhofer, Aaron Zollinger Department of Computer Science ETH Zurich 8092 Zurich, Switzerlan {kuhn, wattenhofer, zollinger}@inf.ethz.ch ABSTRACT

More information

Design of Policy-Aware Differentially Private Algorithms

Design of Policy-Aware Differentially Private Algorithms Design of Policy-Aware Differentially Private Algorithms Samuel Haney Due University Durham, NC, USA shaney@cs.ue.eu Ashwin Machanavajjhala Due University Durham, NC, USA ashwin@cs.ue.eu Bolin Ding Microsoft

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

PERFECT ONE-ERROR-CORRECTING CODES ON ITERATED COMPLETE GRAPHS: ENCODING AND DECODING FOR THE SF LABELING

PERFECT ONE-ERROR-CORRECTING CODES ON ITERATED COMPLETE GRAPHS: ENCODING AND DECODING FOR THE SF LABELING PERFECT ONE-ERROR-CORRECTING CODES ON ITERATED COMPLETE GRAPHS: ENCODING AND DECODING FOR THE SF LABELING PAMELA RUSSELL ADVISOR: PAUL CULL OREGON STATE UNIVERSITY ABSTRACT. Birchall an Teor prove that

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for

More information

Solutions to Tutorial 1 (Week 8)

Solutions to Tutorial 1 (Week 8) The University of Syney School of Mathematics an Statistics Solutions to Tutorial 1 (Week 8) MATH2069/2969: Discrete Mathematics an Graph Theory Semester 1, 2018 1. In each part, etermine whether the two

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs IEEE TRANSACTIONS ON KNOWLEDE AND DATA ENINEERIN, MANUSCRIPT ID Distribute Line raphs: A Universal Technique for Designing DHTs Base on Arbitrary Regular raphs Yiming Zhang an Ling Liu, Senior Member,

More information

2-connected graphs with small 2-connected dominating sets

2-connected graphs with small 2-connected dominating sets 2-connecte graphs with small 2-connecte ominating sets Yair Caro, Raphael Yuster 1 Department of Mathematics, University of Haifa at Oranim, Tivon 36006, Israel Abstract Let G be a 2-connecte graph. A

More information

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure

More information

6 Gradient Descent. 6.1 Functions

6 Gradient Descent. 6.1 Functions 6 Graient Descent In this topic we will iscuss optimizing over general functions f. Typically the function is efine f : R! R; that is its omain is multi-imensional (in this case -imensional) an output

More information

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques Politehnica University of Timisoara Mobile Computing, Sensors Network an Embee Systems Laboratory ing Techniques What is testing? ing is the process of emonstrating that errors are not present. The purpose

More information

Module13:Interference-I Lecture 13: Interference-I

Module13:Interference-I Lecture 13: Interference-I Moule3:Interference-I Lecture 3: Interference-I Consier a situation where we superpose two waves. Naively, we woul expect the intensity (energy ensity or flux) of the resultant to be the sum of the iniviual

More information

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES OLIVIER BERNARDI AND ÉRIC FUSY Abstract. We present bijections for planar maps with bounaries. In particular, we obtain bijections for triangulations an quarangulations

More information

Performance Modelling of Necklace Hypercubes

Performance Modelling of Necklace Hypercubes erformance Moelling of ecklace ypercubes. Meraji,,. arbazi-aza,, A. atooghy, IM chool of Computer cience & harif University of Technology, Tehran, Iran {meraji, patooghy}@ce.sharif.eu, aza@ipm.ir a Abstract

More information

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

AnyTraffic Labeled Routing

AnyTraffic Labeled Routing AnyTraffic Labele Routing Dimitri Papaimitriou 1, Pero Peroso 2, Davie Careglio 2 1 Alcatel-Lucent Bell, Antwerp, Belgium Email: imitri.papaimitriou@alcatel-lucent.com 2 Universitat Politècnica e Catalunya,

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks : a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications

More information

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization 1 Offloaing Cellular Traffic through Opportunistic Communications: Analysis an Optimization Vincenzo Sciancalepore, Domenico Giustiniano, Albert Banchs, Anreea Picu arxiv:1405.3548v1 [cs.ni] 14 May 24

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 0 0 0 0 0 0 0 0 on-uniform Sensor Deployment in Mobile Wireless Sensor etworks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University Boca Raton,

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mit.eu 6.854J / 18.415J Avance Algorithms Fall 2008 For inormation about citing these materials or our Terms o Use, visit: http://ocw.mit.eu/terms. 18.415/6.854 Avance Algorithms

More information

A Stochastic Process on the Hypercube with Applications to Peer to Peer Networks

A Stochastic Process on the Hypercube with Applications to Peer to Peer Networks A Stochastic Process on the Hypercube with Applications to Peer to Peer Networs [Extene Abstract] Micah Aler Department of Computer Science, University of Massachusetts, Amherst, MA 0003 460, USA micah@cs.umass.eu

More information

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters Available online at www.scienceirect.com Proceia Engineering 4 (011 ) 34 38 011 International Conference on Avances in Engineering Cluster Center Initialization Metho for K-means Algorithm Over Data Sets

More information

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity Worl Applie Sciences Journal 16 (10): 1387-1392, 2012 ISSN 1818-4952 IDOSI Publications, 2012 A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Base on Gravity Aliasghar Rahmani Hosseinabai,

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science an Center of Simulation of Avance Rockets University of Illinois at Urbana-Champaign

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Questions? Post on piazza, or  Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)! EE122 Fall 2013 HW3 Instructions Recor your answers in a file calle hw3.pf. Make sure to write your name an SID at the top of your assignment. For each problem, clearly inicate your final answer, bol an

More information

Design Principles for Practical Self-Routing Nonblocking Switching Networks with O(N log N) Bit-Complexity

Design Principles for Practical Self-Routing Nonblocking Switching Networks with O(N log N) Bit-Complexity IEEE TRANSACTIONS ON COMPUTERS, VOL. 46, NO. 10, OCTOBER 1997 1 Design Principles for Practical Self-Routing Nonblocking Switching Networks with O(N log N) Bit-Complexity Te H. Szymanski, Member, IEEE

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway State Inexe Policy Search by Dynamic Programming Charles DuHaway Yi Gu 5435537 503372 December 4, 2007 Abstract We consier the reinforcement learning problem of simultaneous trajectory-following an obstacle

More information

Lesson 11 Interference of Light

Lesson 11 Interference of Light Physics 30 Lesson 11 Interference of Light I. Light Wave or Particle? The fact that light carries energy is obvious to anyone who has focuse the sun's rays with a magnifying glass on a piece of paper an

More information

FINDING OPTICAL DISPERSION OF A PRISM WITH APPLICATION OF MINIMUM DEVIATION ANGLE MEASUREMENT METHOD

FINDING OPTICAL DISPERSION OF A PRISM WITH APPLICATION OF MINIMUM DEVIATION ANGLE MEASUREMENT METHOD Warsaw University of Technology Faculty of Physics Physics Laboratory I P Joanna Konwerska-Hrabowska 6 FINDING OPTICAL DISPERSION OF A PRISM WITH APPLICATION OF MINIMUM DEVIATION ANGLE MEASUREMENT METHOD.

More information

MODULE VII. Emerging Technologies

MODULE VII. Emerging Technologies MODULE VII Emerging Technologies Computer Networks an Internets -- Moule 7 1 Spring, 2014 Copyright 2014. All rights reserve. Topics Software Define Networking The Internet Of Things Other trens in networking

More information

Characterizing Decoding Robustness under Parametric Channel Uncertainty

Characterizing Decoding Robustness under Parametric Channel Uncertainty Characterizing Decoing Robustness uner Parametric Channel Uncertainty Jay D. Wierer, Wahee U. Bajwa, Nigel Boston, an Robert D. Nowak Abstract This paper characterizes the robustness of ecoing uner parametric

More information

Proving Vizing s Theorem with Rodin

Proving Vizing s Theorem with Rodin Proving Vizing s Theorem with Roin Joachim Breitner March 25, 2011 Proofs for Vizing s Theorem t to be unwiely unless presente in form a constructive algorithm with a proof of its correctness an termination.

More information

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks sensors Article EDOVE: Energy an Depth Variance-Base Opportunistic Voi Avoiance Scheme for Unerwater Acoustic Sensor Networks Safar Hussain Bouk 1, *, Sye Hassan Ahme 2, Kyung-Joon Park 1 an Yongsoon Eun

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

Lab work #8. Congestion control

Lab work #8. Congestion control TEORÍA DE REDES DE TELECOMUNICACIONES Grao en Ingeniería Telemática Grao en Ingeniería en Sistemas e Telecomunicación Curso 2015-2016 Lab work #8. Congestion control (1 session) Author: Pablo Pavón Mariño

More information

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways Ben, Jogs, An Wiggles for Railroa Tracks an Vehicle Guie Ways Louis T. Klauer Jr., PhD, PE. Work Soft 833 Galer Dr. Newtown Square, PA 19073 lklauer@wsof.com Preprint, June 4, 00 Copyright 00 by Louis

More information

Optimal Routing and Scheduling for Deterministic Delay Tolerant Networks

Optimal Routing and Scheduling for Deterministic Delay Tolerant Networks Optimal Routing an Scheuling for Deterministic Delay Tolerant Networks Davi Hay Dipartimento i Elettronica olitecnico i Torino, Italy Email: hay@tlc.polito.it aolo Giaccone Dipartimento i Elettronica olitecnico

More information

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural

More information

ACE: And/Or-parallel Copying-based Execution of Logic Programs

ACE: And/Or-parallel Copying-based Execution of Logic Programs ACE: An/Or-parallel Copying-base Execution of Logic Programs Gopal GuptaJ Manuel Hermenegilo* Enrico PontelliJ an Vítor Santos Costa' Abstract In this paper we present a novel execution moel for parallel

More information

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing Inexing the Eges A simple an yet efficient approach to high-imensional inexing Beng Chin Ooi Kian-Lee Tan Cui Yu Stephane Bressan Department of Computer Science National University of Singapore 3 Science

More information

Table-based division by small integer constants

Table-based division by small integer constants Table-base ivision by small integer constants Florent e Dinechin, Laurent-Stéphane Diier LIP, Université e Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée Italie, 69364 Lyon Ceex 07 Florent.e.Dinechin@ens-lyon.fr

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks 1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information

A Formal Model and Efficient Traversal Algorithm for Generating Testbenches for Verification of IEEE Standard Floating Point Division

A Formal Model and Efficient Traversal Algorithm for Generating Testbenches for Verification of IEEE Standard Floating Point Division A Formal Moel an Efficient Traversal Algorithm for Generating Testbenches for Verification of IEEE Stanar Floating Point Division Davi W. Matula, Lee D. McFearin Department of Computer Science an Engineering

More information

CS269I: Incentives in Computer Science Lecture #8: Incentives in BGP Routing

CS269I: Incentives in Computer Science Lecture #8: Incentives in BGP Routing CS269I: Incentives in Computer Science Lecture #8: Incentives in BGP Routing Tim Roughgaren October 19, 2016 1 Routing in the Internet Last lecture we talke about elay-base (or selfish ) routing, which

More information

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm NASA/CR-1998-208733 ICASE Report No. 98-45 Parallel Directionally Split Solver Base on Reformulation of Pipeline Thomas Algorithm A. Povitsky ICASE, Hampton, Virginia Institute for Computer Applications

More information

Nearest Neighbor Search using Additive Binary Tree

Nearest Neighbor Search using Additive Binary Tree Nearest Neighbor Search using Aitive Binary Tree Sung-Hyuk Cha an Sargur N. Srihari Center of Excellence for Document Analysis an Recognition State University of New York at Buffalo, U. S. A. E-mail: fscha,sriharig@cear.buffalo.eu

More information

High dimensional Apollonian networks

High dimensional Apollonian networks High imensional Apollonian networks Zhongzhi Zhang Institute of Systems Engineering, Dalian University of Technology, Dalian 11604, Liaoning, China E-mail: lutzzz063@yahoo.com.cn Francesc Comellas Dep.

More information

APPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly

APPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly International Journal "Information Technologies an Knowlege" Vol. / 2007 309 [Project MINERVAEUROPE] Project MINERVAEUROPE: Ministerial Network for Valorising Activities in igitalisation -

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,

More information

Chapter 5 Proposed models for reconstituting/ adapting three stereoscopes

Chapter 5 Proposed models for reconstituting/ adapting three stereoscopes Chapter 5 Propose moels for reconstituting/ aapting three stereoscopes - 89 - 5. Propose moels for reconstituting/aapting three stereoscopes This chapter offers three contributions in the Stereoscopy area,

More information

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost Secure Network Coing for Distribute Secret Sharing with Low Communication Cost Nihar B. Shah, K. V. Rashmi an Kannan Ramchanran, Fellow, IEEE Abstract Shamir s (n,k) threshol secret sharing is an important

More information

Supporting Fully Adaptive Routing in InfiniBand Networks

Supporting Fully Adaptive Routing in InfiniBand Networks XIV JORNADAS DE PARALELISMO - LEGANES, SEPTIEMBRE 200 1 Supporting Fully Aaptive Routing in InfiniBan Networks J.C. Martínez, J. Flich, A. Robles, P. López an J. Duato Resumen InfiniBan is a new stanar

More information

Principles of B-trees

Principles of B-trees CSE465, Fall 2009 February 25 1 Principles of B-trees Noes an binary search Anoe u has size size(u), keys k 1,..., k size(u) 1 chilren c 1,...,c size(u). Binary search property: for i = 1,..., size(u)

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

Problem Paper Atoms Tree. atoms.pas. atoms.cpp. atoms.c. atoms.java. Time limit per test 1 second 1 second 2 seconds. Number of tests

Problem Paper Atoms Tree. atoms.pas. atoms.cpp. atoms.c. atoms.java. Time limit per test 1 second 1 second 2 seconds. Number of tests ! " # %$ & Overview Problem Paper Atoms Tree Shen Tian Harry Wiggins Carl Hultquist Program name paper.exe atoms.exe tree.exe Source name paper.pas atoms.pas tree.pas paper.cpp atoms.cpp tree.cpp paper.c

More information

Physics INTERFERENCE OF LIGHT

Physics INTERFERENCE OF LIGHT Physics INTERFERENCE OF LIGHT Q.1 State the principle of superposition of waves an explain the concept of interference of light. Ans. Principle of superposition of waves : When two or more waves, traveling

More information

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance International Journal of Moern Communication Technologies & Research (IJMCTR) Impact of FTP Application file size an TCP Variants on MANET Protocols Performance Abelmuti Ahme Abbasher Ali, Dr.Amin Babkir

More information

Learning Polynomial Functions. by Feature Construction

Learning Polynomial Functions. by Feature Construction I Proceeings of the Eighth International Workshop on Machine Learning Chicago, Illinois, June 27-29 1991 Learning Polynomial Functions by Feature Construction Richar S. Sutton GTE Laboratories Incorporate

More information

Figure 1: Schematic of an SEM [source: ]

Figure 1: Schematic of an SEM [source:   ] EECI Course: -9 May 1 by R. Sanfelice Hybri Control Systems Eelco van Horssen E.P.v.Horssen@tue.nl Project: Scanning Electron Microscopy Introuction In Scanning Electron Microscopy (SEM) a (bunle) beam

More information

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees Disjoint Multipath Routing in Dual Homing Networks using Colore Trees Preetha Thulasiraman, Srinivasan Ramasubramanian, an Marwan Krunz Department of Electrical an Computer Engineering University of Arizona,

More information

A shortest path algorithm in multimodal networks: a case study with time varying costs

A shortest path algorithm in multimodal networks: a case study with time varying costs A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via

More information

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 1, NO. 4, APRIL 01 74 Towar Efficient Distribute Algorithms for In-Network Binary Operator Tree Placement in Wireless Sensor Networks Zongqing Lu,

More information

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises verview Frequent Pattern Mining comprises Frequent Pattern Mining hristian Borgelt School of omputer Science University of Konstanz Universitätsstraße, Konstanz, Germany christian.borgelt@uni-konstanz.e

More information