The Data Locality of Work Stealing


Umut A. Acar, School of Computer Science, Carnegie Mellon University
Guy E. Blelloch, School of Computer Science, Carnegie Mellon University
Robert D. Blumofe, Department of Computer Science, University of Texas at Austin

Abstract

This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines. We present lower and upper bounds on the number of cache misses using work stealing, and introduce a locality-guided work-stealing algorithm along with experimental validation.

As a lower bound, we show that there is a family of multithreaded computations $G_n$, each member of which requires $\Theta(n)$ total instructions (work), for which when using work stealing the number of cache misses on one processor is constant, while even on two processors the total number of cache misses is $\Omega(n)$. This implies that for general computations there is no useful bound relating multiprocessor to uniprocessor cache misses. For nested-parallel computations, however, we show that on P processors the expected additional number of cache misses beyond those on a single processor is bounded by $O(C \lceil m/s \rceil P T_\infty)$, where m is the execution time of an instruction incurring a cache miss, s is the steal time, C is the size of cache, and $T_\infty$ is the number of nodes on the longest chain of dependences. Based on this we give strong bounds on the total running time of nested-parallel computations using work stealing.

For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional work loads but improves the performance up to 5% over static partitioning under multiprogrammed work loads. Furthermore, the locality-guided work stealing improves the performance of work stealing up to 8%.
1 Introduction

Many of today's parallel applications use sophisticated, adaptive algorithms which are best realized with parallel programming systems that support dynamic, lightweight threads such as Cilk [8], Nesl [5], Hood [], and many others [3, 6, 7,, 3]. The core of these systems is a thread scheduler that balances load among the processes. In addition to a good load balance, however, good data locality is essential in obtaining high performance from modern parallel systems.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. SPAA, Bar Harbor, Maine USA ACM //7...$5.

Several researchers have studied techniques to improve the data locality of multithreaded programs. One class of such techniques is based on software-controlled distribution of data among the local memories of a distributed shared memory system [5,, 6]. Another class of techniques is based on hints supplied by the programmer so that similar tasks might be executed on the same processor [5, 3, 34]. Both these classes of techniques rely on the programmer or compiler to determine the data access patterns in the program, which may be very difficult when the program has complicated data access patterns. Perhaps the earliest class of techniques was to attempt to execute threads that are close in the computation graph on the same processor [, 9,, 3, 6, 8]. The work-stealing algorithm is the most studied of these techniques [9,, 9,, 4, 36, 37]. Blumofe et al. showed that fully-strict computations achieve a provably good data locality [7] when executed with the work-stealing algorithm on a dag-consistent distributed shared memory system.
In recent work, Narlikar showed that work stealing improves the performance of space-efficient multithreaded applications by increasing the data locality [9]. None of this previous work, however, has studied upper or lower bounds on the data locality of multithreaded computations executed on existing hardware-controlled shared memory systems.

In this paper, we present theoretical and experimental results on the data locality of work stealing on hardware-controlled shared memory systems (HSMSs). Our first set of results are upper and lower bounds on the number of cache misses in multithreaded computations executed by the work-stealing algorithm. Let $M_1(C)$ denote the number of cache misses in the uniprocessor execution and $M_P(C)$ denote the number of cache misses in a P-processor execution of a multithreaded computation by the work-stealing algorithm on an HSMS with cache size C. Then, for a multithreaded computation with $T_1$ work (total number of instructions) and $T_\infty$ critical path (longest sequence of dependences), we show the following results for the work-stealing algorithm running on an HSMS.

Lower bounds on the number of cache misses for general computations: We show that there is a family of computations $G_n$ with $T_1 = \Theta(n)$ such that $M_1(C) = 3C$ while even on two processors the number of misses $M_2(C) = \Omega(n)$.

Upper bounds on the number of cache misses for nested-parallel computations: For a nested-parallel computation, we show that $M_P(C) \le M_1(C) + 2C\tau$, where $\tau$ is the number of steals in the P-processor execution. We then show that the expected number of steals is $O(\lceil m/s \rceil P T_\infty)$, where m is the time for a cache miss and s is the time for a steal.

Upper bound on the execution time of nested-parallel computations: We show that the expected execution time of a nested-parallel computation on P processors is $O(T_1(C)/P + m \lceil m/s \rceil C T_\infty + (m+s) T_\infty)$, where $T_1(C)$ is the uniprocessor execution time of the computation including cache misses.

Figure 1: The speedup obtained by three different over-relaxation algorithms. (Plot of speedup against number of processes; curves: linear, work stealing, locality-guided work stealing, static partitioning.)

As in previous work [6, 9], we represent a multithreaded computation as a directed, acyclic graph (dag) of instructions. Each node in the dag represents a single instruction and the edges represent ordering constraints. A nested-parallel computation [5, 6] is a race-free computation that can be represented with a series-parallel dag [33]. Nested-parallel computations include computations consisting of parallel loops and fork and join and any nesting of them. This class includes most computations that can be expressed in Cilk [8], and all computations that can be expressed in Nesl [5]. Our results show that nested-parallel computations have much better locality characteristics under work stealing than do general computations. We also briefly consider another class of computations, computations with futures [, 3, 4,, 5], and show that they can be as bad as general computations.

The second part of our results are on further improving the data locality of multithreaded computations with work stealing. In work stealing, a processor steals a thread from a randomly (with uniform distribution) chosen processor when it runs out of work. In certain applications, such as iterative data-parallel applications, random steals may cause poor data locality. Locality-guided work stealing is a heuristic modification to work stealing that allows a thread to have an affinity for a process. In locality-guided work stealing, when a process obtains work it gives priority to a thread that has affinity for the process. Locality-guided work stealing can be used to implement a number of techniques that researchers suggest to improve data locality.
For example, the programmer can achieve an initial distribution of work among the processes or schedule threads based on hints by appropriately assigning affinities to threads in the computation.

Our preliminary experiments with locality-guided work stealing give encouraging results, showing that for certain applications the performance is very close to that of static partitioning in dedicated mode (i.e. when the user can lock down a fixed number of processors), but does not suffer a performance cliff problem [] in multiprogrammed mode (i.e. when processors might be taken by other users or the OS). Figure 1 shows a graph comparing work stealing, locality-guided work stealing, and static partitioning for a simple over-relaxation algorithm on a 4 processor Sun Ultra Enterprise. The over-relaxation algorithm iterates over a 1-dimensional array performing a 3-point stencil computation on each step. The superlinear speedup for static partitioning and locality-guided work stealing is due to the fact that the data for each run does not fit into the L2 cache of one processor but fits into the collective L2 cache of 6 or more processors. For this benchmark the following can be seen from the graph.

1. Locality-guided work stealing does significantly better than standard work stealing since on each step the cache is pre-warmed with the data it needs.
2. Locality-guided work stealing does approximately as well as static partitioning for up to 4 processes.
3. When trying to schedule more than 4 processes on 4 processors static partitioning has a serious performance drop. The initial drop is due to load imbalance caused by the coarse-grained partitioning. The performance then approaches that of work stealing as the partitioning gets more fine-grained.

We are interested in the performance of work-stealing computations on hardware-controlled shared memory systems (HSMSs). We model an HSMS as a group of identical processors, each of which has its own cache, together with a single shared memory. Each cache contains C blocks and is managed by the memory subsystem automatically.
We allow for a variety of cache organizations and replacement policies, including both direct-mapped and associative caches. We assign a server process to each processor and associate the cache of a processor with the process to which the processor is assigned. One limitation of our work is that we assume that there is no false sharing.

2 Related Work

As mentioned in Section 1, there are three main classes of techniques that researchers have suggested to improve the data locality of multithreaded programs. In the first class, the program data is distributed among the nodes of a distributed shared-memory system by the programmer and a thread in the computation is scheduled on the node that holds the data that the thread accesses [5,, 6]. In the second class, data-locality hints supplied by the programmer are used in thread scheduling [5, 3, 34]. Techniques from both classes are employed in distributed shared memory systems such as COOL and Illinois Concert [5, ] and also used to improve the data locality of sequential programs [3]. However, the first class of techniques does not apply directly to HSMSs, because HSMSs do not allow software-controlled distribution of data among the caches. Furthermore, both classes of techniques rely on the programmer to determine the data access patterns in the application and thus may not be appropriate for applications with complex data-access patterns.

The third class of techniques, which is based on execution of threads that are close in the computation graph on the same process, is applied in many scheduling algorithms including work stealing [, 9, 3, 6, 8, 9]. Blumofe et al. showed bounds on the number of cache misses in a fully-strict computation executed by the work-stealing algorithm under the dag-consistent distributed shared memory of Cilk [7]. Dag consistency is a relaxed memory-consistency model that is employed in the distributed shared-memory implementation of the Cilk language. In a distributed Cilk application, processes maintain the dag consistency by means of the BACKER algorithm.
In [7], Blumofe et al. bound the number of shared-memory cache misses in a distributed Cilk application for caches that are maintained with the LRU replacement policy. They assumed that accesses to the shared memory are distributed uniformly and independently, which is not generally true because threads may concurrently access the same pages by algorithm design. Furthermore, they assumed that processes do not generate steal attempts frequently by making processes do additional page transfers before they attempt to steal from another process.

3 The Model

In this section, we present a graph-theoretic model for multithreaded computations, describe the work-stealing algorithm, define series-parallel and nested-parallel computations and introduce our model of an HSMS (Hardware-controlled Shared-Memory System).

Figure 2: A dag (directed acyclic graph) for a multithreaded computation. Threads are shown as gray rectangles.

As with previous work [6, 9] we represent a multithreaded computation as a directed acyclic graph, a dag, of instructions (see Figure 2). Each node in the dag represents an instruction and the edges represent ordering constraints. There are three types of edges: continuation, spawn, and dependency edges. A thread is a sequential ordering of instructions and the nodes that correspond to the instructions are linked in a chain by continuation edges. A spawn edge represents the creation of a new thread and goes from the node representing the instruction that spawns the new thread to the node representing the first instruction of the new thread. A dependency edge from instruction i of a thread to instruction j of some other thread represents a synchronization between two instructions such that instruction j must be executed after i. We draw spawn edges with thick straight arrows, dependency edges with curly arrows and continuation edges with thick straight arrows throughout this paper. Also we show paths with wavy lines.

For a computation with an associated dag G, we define the computational work, $T_1$, as the number of nodes in G and the critical path, $T_\infty$, as the number of nodes on the longest path of G.

Let u and v be any two nodes in a dag. Then we call u an ancestor of v, and v a descendant of u, if there is a path from u to v. Any node is its own descendant and ancestor. We say that two nodes are relatives if there is a path from one to the other; otherwise we say that the nodes are independent.
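The two quantities just defined can be computed directly from an edge list. The following is a minimal sketch, not part of the original presentation: the function name `work_and_critical_path` and the node and edge encoding are assumptions made for illustration. It counts the nodes (the work) and the number of nodes on the longest path (the critical path) using a topological traversal.

```python
from collections import defaultdict, deque


def work_and_critical_path(nodes, edges):
    """Return (work, critical_path) for a dag.

    work          = number of nodes
    critical_path = number of nodes on the longest path
    `nodes` is a list of hashable labels, `edges` a list of (u, v) pairs.
    Both names are illustrative, not taken from the paper.
    """
    children = defaultdict(list)
    indegree = {u: 0 for u in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: process nodes in topological order.
    ready = deque(u for u in nodes if indegree[u] == 0)
    longest = {u: 1 for u in nodes}  # nodes on the longest path ending at u
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in children[u]:
            longest[v] = max(longest[v], longest[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if visited != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a dag")
    return len(nodes), max(longest.values())


# A diamond: the root r forks into a and b, which join at f.
diamond_nodes = ["r", "a", "b", "f"]
diamond_edges = [("r", "a"), ("r", "b"), ("a", "f"), ("b", "f")]
print(work_and_critical_path(diamond_nodes, diamond_edges))
```

For the diamond the work is 4 and the critical path is 3, since the longest path r, a, f contains three nodes.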
The children of a node are independent because otherwise the edge from the node to one child is redundant. We call a common descendant y of u and v a merger of u and v if the paths from u to y and v to y have only y in common. We define the depth of a node u as the number of edges on the shortest path from the root node to u. We define the least common ancestor of u and v as the ancestor of both u and v with maximum depth. Similarly, we define the greatest common descendant of u and v as the descendant of both u and v with minimum depth. An edge (u, v) is redundant if there is a path between u and v that does not contain the edge (u, v). The transitive reduction of a dag is the dag with all the redundant edges removed. In this paper we are only concerned with the transitive reduction of the computational dags. We also require that the dags have a single node with in-degree 0, the root, and a single node with out-degree 0, the final node.

In a multiprocess execution of a multithreaded computation, independent nodes can execute at the same time. If two independent nodes read or modify the same data, we say that they are RR or WW sharing respectively. If one node is reading and the other is modifying the data we say they are RW sharing. RW or WW sharing can cause data races, and the output of a computation with such races usually depends on the scheduling of nodes. Such races are typically indicative of a bug [8]. We refer to computations that do not have any RW or WW sharing as race-free computations. In this paper we consider only race-free computations.

The work-stealing algorithm is a thread scheduling algorithm for multithreaded computations. The idea of work stealing dates back to the research of Burton and Sleep [] and has been studied extensively since then [, 9, 9,, 4, 36, 37]. In the work-stealing algorithm, each process maintains a pool of ready threads and obtains work from its pool. When a process spawns a new thread the process adds the thread into its pool.
When a process runs out of work and finds its pool empty, it chooses a random process as its victim and tries to steal work from the victim's pool.

In our analysis, we imagine the work-stealing algorithm operating on individual nodes in the computation dag rather than on the threads. Consider a multithreaded computation and its execution by the work-stealing algorithm. We divide the execution into discrete time steps such that at each step, each process is either working on a node, which we call the assigned node, or is trying to steal work. The execution of a node takes 1 time step if the node does not incur a cache miss and m steps otherwise. We say that a node is executed at the time step that a process completes executing the node. The execution time of a computation is the number of time steps that elapse between the time step that a process starts executing the root node and the time step that the final node is executed. The execution schedule specifies the activity of each process at each time step.

During the execution, each process maintains a deque (doubly ended queue) of ready nodes; we call the ends of a deque the top and the bottom. When a node, u, is executed, it enables some other node v if u is the last parent of v that is executed. We call the edge (u, v) an enabling edge and u the designated parent of v. When a process executes a node that enables other nodes, one of the enabled nodes becomes the assigned node and the process pushes the rest onto the bottom of its deque. If no node is enabled, then the process obtains work from its deque by removing a node from the bottom of the deque. If a process finds its deque empty, it becomes a thief and steals from a randomly chosen process, the victim. This is a steal attempt and takes at least s and at most ks time steps for some constant k to complete. A thief process might make multiple steal attempts before succeeding, or might never succeed. When a steal succeeds, the thief process starts working on the stolen node at the step following the completion of the steal. We say that a steal attempt occurs at the step it completes.
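The node-level scheduling rules above can be sketched as a small simulation. This is an illustrative sketch under simplifying assumptions that are not taken from the text: every node takes one step (cache misses are ignored), a steal attempt completes within a single step, the first enabled child in edge order becomes the assigned node, and a thief removes a node from the top of the victim's deque. The function name `work_steal` and its parameters are hypothetical.

```python
import random
from collections import deque


def work_steal(nodes, edges, root, num_procs, seed=0):
    """Simulate node-level work stealing; return (executed, steals).

    `executed` is a list of (process, node) pairs in completion order.
    Simplifications (assumptions of this sketch): unit-time nodes,
    one-step steal attempts, steals taken from the top of the deque.
    """
    rng = random.Random(seed)
    children = {u: [] for u in nodes}
    pending = {u: 0 for u in nodes}  # parents that have not executed yet
    for u, v in edges:
        children[u].append(v)
        pending[v] += 1

    deques = [deque() for _ in range(num_procs)]  # left = top, right = bottom
    assigned = [None] * num_procs
    assigned[0] = root
    executed, steals = [], 0

    while len(executed) < len(nodes):
        if all(a is None for a in assigned) and not any(deques):
            raise ValueError("some node is unreachable from the root")

        # Working processes execute their assigned node.
        for p in range(num_procs):
            u = assigned[p]
            if u is None:
                continue
            executed.append((p, u))
            enabled = []
            for v in children[u]:
                pending[v] -= 1
                if pending[v] == 0:  # u was the last parent of v to execute
                    enabled.append(v)
            if enabled:
                assigned[p] = enabled[0]         # one enabled node is assigned
                deques[p].extend(enabled[1:])    # the rest go on the bottom
            elif deques[p]:
                assigned[p] = deques[p].pop()    # take work from the bottom
            else:
                assigned[p] = None               # out of work: become a thief

        # Idle processes make one steal attempt on a random victim.
        for p in range(num_procs):
            if assigned[p] is None and num_procs > 1:
                victim = rng.choice([q for q in range(num_procs) if q != p])
                if deques[victim]:
                    assigned[p] = deques[victim].popleft()  # steal from the top
                    steals += 1

    return executed, steals


nodes = ["r", "a", "b", "f"]
edges = [("r", "a"), ("r", "b"), ("a", "f"), ("b", "f")]
print(work_steal(nodes, edges, "r", 1))
print(work_steal(nodes, edges, "r", 2))
```

With one process the diamond executes in the order r, a, b, f with no steals. With two processes the second process steals b after the root executes and then also executes the join node f, so the run contains exactly one steal.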
The work-stealing algorithm can be implemented in various ways. We say that an implementation of work stealing is deterministic if, whenever a process enables other nodes, the implementation always chooses the same node as the assigned node for the next step on that process, and the remaining nodes are always placed in the deque in the same order. This must be true for both multiprocess and uniprocess executions. We refer to a deterministic implementation of the work-stealing algorithm together with the HSMS that runs the implementation as a work stealer. For brevity, we refer to an execution of a multithreaded computation with a work stealer as an execution. We define the total work as the number of steps taken by a uniprocess execution, including the cache misses, and denote it by $T_1(C)$, where C is the cache size. We denote the number of cache misses in a P-process execution with C-block caches as $M_P(C)$. We define the cache overhead of a P-process execution as $M_P(C) - M_1(C)$, where $M_1(C)$ is the number of misses in the uniprocess execution on the same work stealer.

We refer to a multithreaded computation for which the transitive reduction of the corresponding dag is series-parallel [33] as a series-parallel computation. A series-parallel dag G(V, E) is a dag with two distinguished vertices, a source $s \in V$ and a sink $t \in V$, and can be defined recursively as follows (see Figure 3).

Base: G consists of a single edge connecting s to t.

Series Composition: G consists of two series-parallel dags $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ with disjoint edge sets such that s is the source of $G_1$, u is the sink of $G_1$ and the source of $G_2$, and t is the sink of $G_2$. Moreover $V_1 \cap V_2 = \{u\}$.

Parallel Composition: The graph consists of two series-parallel dags $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ with disjoint edge sets such that s and t are the source and the sink of both $G_1$ and $G_2$. Moreover $V_1 \cap V_2 = \{s, t\}$.

Figure 3: Illustrates the recursive definition for series-parallel dags. Figure (a) is the base case, figure (b) depicts the serial, and figure (c) depicts the parallel composition.

A nested-parallel computation is a race-free series-parallel computation [6].

We also consider multithreaded computations that use futures [, 3, 4,, 5]. The dag structures of computations with futures are defined elsewhere [4]. This is a superclass of nested-parallel computations, but still much more restrictive than general computations. The work-stealing algorithm for futures is a restricted form of the work-stealing algorithm, where a process starts executing a newly created thread immediately, putting its assigned thread onto its deque.

In our analysis, we consider several cache organizations and replacement policies for an HSMS. We model a cache as a set of (cache) lines, each of which can hold the data belonging to a memory block (a consecutive, typically small, region of memory). One instruction can operate on at most one memory block. We say that an instruction accesses a block or the line that contains the block when the instruction reads or modifies the block. We say that an instruction overwrites a line that contains the block b when the instruction accesses some other block that replaces b in the cache. We say that a cache replacement policy is simple if it satisfies two conditions.
First, the policy is deterministic. Second, whenever the policy decides to overwrite a cache line, l, it makes the decision to overwrite l by only using information pertaining to the accesses that are made after the last access to l. We refer to a cache managed with a simple cache-replacement policy as a simple cache. Simple caches and replacement policies are common in practice. For example, the least-recently used (LRU) replacement policy, direct-mapped caches and set-associative caches where each set is maintained by a simple cache replacement policy are simple.

In regards to the definition of RW or WW sharing, we assume that reads and writes pertain to the whole block. This means we do not allow for false sharing, when two processes accessing different portions of a block invalidate the block in each other's caches. In practice, false sharing is an issue, but can often be avoided by a knowledge of the underlying memory system and appropriately padding the shared data to prevent two processes from accessing different portions of the same block.

Figure 4: The structure for dag of a computation with a large cache overhead.

4 General Computations

In this section, we show that the cache overhead of a multiprocess execution of a general computation and a computation with futures can be large even though the uniprocess execution incurs a small number of misses.

Theorem 1 There is a family of computations $\{G_n : n = kC \text{ for } k \in \mathbb{Z}^+\}$ with O(n) computational work, whose uniprocess execution incurs 3C misses while any 2-process execution of the computation incurs $\Omega(n)$ misses on a work stealer with a cache size of C, assuming that S = O(C), where S is the maximum steal time.

Proof: Figure 4 shows the structure of a dag, $G_{4C}$, for n = 4C. Each node except the root node represents a sequence of C instructions accessing a set of C distinct memory blocks. The root node represents C + S instructions that access C distinct memory blocks. The graph has two symmetric components $L_{4C}$ and $R_{4C}$, which correspond to the left and the right subtrees of the root excluding the leaves.
We partition the nodes in $G_{4C}$ into three classes, such that all nodes in a class access the same memory blocks while nodes from different classes access mutually disjoint sets of memory blocks. The first class contains the root node only, the second class contains all the nodes in $L_{4C}$, and the third class contains the rest of the nodes, which are the nodes in $R_{4C}$ and the leaves of $G_{4C}$. For general n = kC, $G_n$ can be partitioned into $L_n$, $R_n$, the k leaves of $G_n$ and the root similarly. Each of $L_n$ and $R_n$ contains d k e; nodes and has the structure of a complete binary tree with additional k leaves at the lowest level. There is a dependency edge from the leaves of both $L_n$ and $R_n$ to the leaves of $G_n$.

Consider a work stealer that executes the nodes of $G_n$ in the order that they are numbered in a uniprocess execution. In the uniprocess execution, no node in $L_n$ incurs a cache miss except its root node, since all nodes in $L_n$ access the same memory blocks as the root of $L_n$. The same argument holds for $R_n$ and the k leaves of $G_n$. Hence the execution of the nodes in $L_n$, $R_n$, and the leaves causes 2C misses. Since the root node causes C misses, the total number of misses in the uniprocess execution is 3C.

Now, consider a 2-process execution with the same work stealer and call the two processes the first and the second process. At the first time step, the first process starts executing the root node, which enables the root of $R_n$ no later than time step m. Since the second process starts stealing immediately and there are no other processes to steal from, the second process steals and starts working on the root of $R_n$ no later than time step m + S. Hence, the root of $R_n$ executes before the root of $L_n$ and thus all the nodes in $R_n$ execute before the corresponding symmetric nodes in $L_n$. Therefore, for any leaf of $G_n$, the parent that is in $R_n$ executes before the parent in $L_n$. Therefore a leaf node of $G_n$ is executed immediately after its parent in $L_n$ and thus causes C cache misses. Thus, the total number of cache misses is $\Omega(kC) = \Omega(n)$.

Figure 5: The structure for dag of a computation with futures that can incur a large cache overhead.

There exist computations similar to the computation in Figure 4 that generalize Theorem 1 for an arbitrary number of processes by making sure that all the processes but two steal throughout any multiprocess execution. Even in the general case, however, where the average parallelism is higher than the number of processes, Theorem 1 can be generalized with the same bound on the expected number of cache misses by exploiting the symmetry in $G_n$ and by assuming a symmetrically distributed steal time. With a symmetrically distributed steal time, for any $\epsilon$, a steal that takes $\epsilon$ steps more than the mean steal time is equally likely to happen as a steal that takes $\epsilon$ fewer steps than the mean.

Theorem 1 holds for computations with futures as well. Multithreaded computing with futures is a fairly restricted form of multithreaded computing compared to computing with events such as synchronization variables. The graph F in Figure 5 shows the structure of a dag whose 2-process execution causes a large number of cache misses. In a 2-process execution of F, the enabling parents of the leaf nodes in the right subtree of the root are in the left subtree and therefore the execution of each such leaf node causes C misses.

5 Nested-Parallel Computations

In this section, we show that the cache overhead of an execution of a nested-parallel computation with a work stealer is at most twice the product of the number of steals and the cache size. Our proof has two steps. First, we show that the cache overhead is bounded by the product of the cache size and the number of nodes that are executed out of order with respect to the uniprocess execution order. Second, we prove that the number of such out-of-order executions is at most twice the number of steals.

Consider a computation G and its P-process execution, $X_P$, with a work stealer and the uniprocess execution, $X_1$, with the same work stealer. Let v be a node in G and node u be the node that executes immediately before v in $X_1$.
Then we say that v is drifted in $X_P$ if node u is not executed immediately before v by the process that executes v in $X_P$.

Lemma 2 establishes a key property of an execution with simple caches.

Lemma 2 Consider a process with a simple cache of C blocks. Let $X_1$ denote the execution of a sequence of instructions on the process starting with cache state $S_1$ and let $X_2$ denote the execution of the same sequence of instructions starting with cache state $S_2$. Then $X_2$ incurs at most C more misses than $X_1$.

Proof: We construct a one-to-one mapping between the cache lines in $X_1$ and $X_2$ such that an instruction that accesses a line $l_1$ in $X_1$ accesses the entry $l_2$ in $X_2$ if and only if $l_1$ is mapped to $l_2$. Consider $X_1$ and let $l_1$ be a cache line. Let i be the first instruction that accesses or overwrites $l_1$. Let $l_2$ be the cache line that the same instruction accesses or overwrites in $X_2$ and map $l_1$ to $l_2$. Since the caches are simple, an instruction that overwrites $l_1$ in $X_1$ overwrites $l_2$ in $X_2$. Therefore the number of misses that overwrite $l_2$ in $X_2$ is equal to the number of misses that overwrite $l_1$ in $X_1$ after instruction i. Since i itself can cause 1 miss, the number of misses that overwrite $l_2$ in $X_2$ is at most 1 more than the number of misses that overwrite $l_1$ in $X_1$. We construct the mapping for each cache line in $X_1$ in the same way.

Now, let us show that the mapping is one-to-one. For the sake of contradiction, assume that two cache lines, $l_1$ and $l_1'$, in $X_1$ map to the same line in $X_2$. Let $i_1$ and $i_2$ be the first instructions accessing these cache lines in $X_1$ such that $i_1$ is executed before $i_2$. Since $i_1$ and $i_2$ map to the same line in $X_2$ and the caches are simple, $i_2$ accesses the line that $i_1$ accesses in $X_1$, but then $l_1 = l_1'$, a contradiction. Hence, the total number of cache misses in $X_2$ is at most C more than the misses in $X_1$.

Theorem 3 Let D denote the total number of drifted nodes in an execution of a nested-parallel computation with a work stealer on P processes, each of which has a simple cache with C words. Then the cache overhead of the execution is at most CD.
Proof: Let $X_P$ denote the P-process execution and let $X_1$ be the uniprocess execution of the same computation with the same work stealer. We divide the multiprocess computation into D pieces, each of which can incur at most C more misses than in the uniprocess execution. Let u be a drifted node and let q be the process that executes u. Let v be the next drifted node executed on q (or the final node of the computation). Let the ordered set O represent the execution order of all the nodes that are executed after u (u is included) and before v (v is excluded if it is drifted, included otherwise) on q in $X_P$. Then the nodes in O are executed on the same process and in the same order in both $X_1$ and $X_P$.

Now consider the number of cache misses during the execution of the nodes in O in $X_1$ and $X_P$. Since the computation is nested parallel and therefore race free, a process that executes in parallel with q does not cause q to incur cache misses due to sharing. Therefore by Lemma 2, during the execution of the nodes in O the number of cache misses in $X_P$ is at most C more than the number of misses in $X_1$. This bound holds for each of the D sequences of such instructions O corresponding to the D drifted nodes. Since the sequence starting at the root node and ending at the first drifted node incurs the same number of misses in $X_1$ and $X_P$, $X_P$ takes at most CD more misses than $X_1$ and the cache overhead is at most CD.

Lemma 2 (and thus Theorem 3) does not hold for caches that are not simple. For example, consider the execution of a sequence of instructions on a cache with the least-frequently-used replacement policy starting at two cache states. In the first cache state, the blocks that are frequently accessed by the instructions are in the cache with high frequencies, whereas in the second cache state, the blocks that are in the cache are not accessed by the instructions and have low frequencies. The execution with the second cache state, therefore, incurs many more misses than the size of the cache compared to the execution with the first cache state.
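The bound of Lemma 2 can be observed on a small LRU cache, which is a simple cache. The sketch below is illustrative only: the function `lru_misses`, the block labels and the two starting states are assumptions chosen for the example. It replays one access sequence from a favourable and an unfavourable initial cache state and compares the miss counts.

```python
from collections import OrderedDict


def lru_misses(accesses, capacity, initial=()):
    """Count the misses of an LRU cache holding `capacity` blocks.

    `initial` lists the blocks present at the start, least recently
    used first. All names here are illustrative.
    """
    cache = OrderedDict((block, None) for block in initial)
    if len(cache) > capacity:
        raise ValueError("initial state holds more blocks than the cache")
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)         # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)    # evict the least recently used
            cache[block] = None
    return misses


C = 4
accesses = [0, 1, 2, 3] * 5                  # repeatedly touch C blocks

warm = lru_misses(accesses, C, initial=[0, 1, 2, 3])      # blocks cached
cold = lru_misses(accesses, C, initial=[10, 11, 12, 13])  # unrelated blocks

print(warm, cold, cold - warm)
assert cold - warm <= C
```

Starting warm, the sequence incurs no misses; starting with four unrelated blocks, it incurs four misses, one per line that has to be refilled. The difference equals C, the largest gap the lemma permits.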
Now we show that the number of drifted nodes in an execution of a series-parallel computation with a work stealer is at most twice the number of steals. The proof is based on the representation of series-parallel computations as sp-dags. We call a node with out-degree of at least 2 a fork node and partition the nodes of an sp-dag except the root into three categories: join nodes, stable nodes and nomadic nodes. We call a node that has an in-degree of at least 2 a join node and partition all the nodes that have in-degree 1 into two classes: a nomadic node has a parent that is a fork node, and a stable node has a parent that has out-degree 1. The root node has in-degree 0 and it does not belong to any of these categories. Lemma 4 lists two fundamental properties of sp-dags; one can prove both properties by induction on the number of edges in an sp-dag.

Figure 6: Children of s and their merger.

Figure 7: The joint embedding of u and v.

Figure 8: The join node s is the least common ancestor of y and z. Nodes u and v are the children of s.

Lemma 4 Let G be an sp-dag. Then G has the following properties.
1. The least common ancestor of any two nodes in G is unique.
2. The greatest common descendant of any two nodes in G is unique and is equal to their unique merger.

Lemma 5 Let s be a fork node. Then no child of s is a join node.

Proof: Let u and v denote two children of s and suppose u is a join node as in Figure 6. Let t denote some other parent of u and z denote the unique merger of u and v. Then both z and u are mergers for s and t, which is a contradiction of Lemma 4. Hence u is not a join node.

Corollary 6 Only nomadic nodes can be stolen in an execution of a series-parallel computation by the work-stealing algorithm.

Proof: Let u be a stolen node in an execution. Then u is pushed on a deque and thus the enabling parent of u is a fork node. By Lemma 5, u is not a join node and has an incoming degree 1. Therefore u is nomadic.

Consider a series-parallel computation and let G be its sp-dag. Let u and v be two independent nodes in G and let s and t denote their least common ancestor and greatest common descendant respectively, as shown in Figure 7. Let $G_1$ denote the graph that is induced by the relatives of u that are descendants of s and also ancestors of t. Similarly, let $G_2$ denote the graph that is induced by the relatives of v that are descendants of s and ancestors of t. Then we call $G_1$ the embedding of u with respect to v and $G_2$ the embedding of v with respect to u. We call the graph that is the union of $G_1$ and $G_2$ the joint embedding of u and v with source s and sink t.
Now consider an execution of $G$ and let $y$ and $z$ be the children of $s$ such that $y$ is executed before $z$. Then we call $y$ the leader and $z$ the guard of the joint embedding.

Lemma 7 Let $G(V, E)$ be an sp-dag and let $y$ and $z$ be two parents of a join node $t$ in $G$. Let $G_1$ denote the embedding of $y$ with respect to $z$ and $G_2$ denote the embedding of $z$ with respect to $y$. Let $s$ denote the source and $t$ denote the sink of the joint embedding. Then the parents of any node in $G_1$ except for $s$ and $t$ are in $G_1$, and the parents of any node in $G_2$ except for $s$ and $t$ are in $G_2$.

Proof: Since $y$ and $z$ are independent, both $s$ and $t$ are different from $y$ and $z$ (see Figure 8). First, we show that there is no edge that starts at a node in $G_1$ other than $s$ and ends at a node in $G_2$ other than $t$, and vice versa. For the sake of contradiction, assume there is an edge $(m, n)$ such that $m \neq s$ is in $G_1$ and $n \neq t$ is in $G_2$. Then $m$ is the least common ancestor of $y$ and $z$; hence no such $(m, n)$ exists. A similar argument holds when $m$ is in $G_2$ and $n$ is in $G_1$.

Second, we show that there does not exist an edge that originates from a node outside of $G_1$ or $G_2$ and ends at a node in $G_1$ or $G_2$. For the sake of contradiction, let $(w, x)$ be an edge such that $x$ is in $G_1$ and $w$ is not in $G_1$ or $G_2$. Then $x$ is the unique merger for the two children of the least common ancestor of $w$ and $s$, which we denote by $r$. But then $t$ is also a merger for the children of $r$. The children of $r$ are independent and have a unique merger, hence there is no such edge $(w, x)$. A similar argument holds when $x$ is in $G_2$. Therefore we conclude that the parents of any node in $G_1$ except $s$ and $t$ are in $G_1$ and the parents of any node in $G_2$ except $s$ and $t$ are in $G_2$.

Lemma 8 Let $G$ be an sp-dag and let $y$ and $z$ be two parents of a join node $t$ in $G$. Consider the joint embedding of $y$ and $z$ and let $u$ be the guard node of the embedding. Then $y$ and $z$ are executed in the same respective order in a multiprocess execution as they are executed in the uniprocess execution if the guard node $u$ is not stolen.

Proof: Let $s$ be the source, $t$ the sink, and $v$ the leader of the joint embedding.
Since $u$ is not stolen, $v$ is not stolen. Hence, by Lemma 7, before it starts working on $u$, the process that executes $s$ has executed $v$ and all its descendants in the embedding except for $t$. Hence, $z$ is executed before $u$ and $y$ is executed after $u$ as in the uniprocess execution. Therefore, $y$ and $z$ are executed in the same respective order as they execute in the uniprocess execution.

Lemma 9 A nomadic node is drifted in an execution only if it is stolen.

Proof: Let $u$ be a nomadic and drifted node. Then, by Lemma 5, $u$ has a single parent $s$ that enables $u$. If $u$ is the first child of $s$ to execute in the uniprocess execution then $u$ is not drifted in the multiprocess execution. Hence, $u$ is not the first child to execute. Let $v$ be the last child of $s$ that is executed before $u$ in the uniprocess execution. Now, consider the multiprocess execution and let $q$ be the

process that executes $v$. For the sake of contradiction, assume that $u$ is not stolen. Consider the joint embedding of $u$ and $v$ as shown in Figure 8. Since all parents of the nodes in $G_2$ except for $s$ and $t$ are in $G_2$ by Lemma 7, $q$ executes all the nodes in $G_2$ before it executes $u$ and thus, $z$ precedes $u$ on $q$. But then $u$ is not drifted, because $z$ is the node that is executed immediately before $u$ in the uniprocess computation. Hence $u$ is stolen.

Let us define the cover of a join node $t$ in an execution as the set of all the guard nodes of the joint embeddings of all possible pairs of parents of $t$ in the execution. The following lemma shows that a join node is drifted only if a node in its cover is stolen.

Lemma 10 A join node is drifted in an execution only if a node in its cover is stolen in the execution.

Proof: Consider the execution and let $t$ be a join node that is drifted. Assume, for the sake of contradiction, that no node in the cover of $t$, $C(t)$, is stolen. Let $y$ and $z$ be any two parents of $t$ as in Figure 8. Then $y$ and $z$ are executed in the same order as in the uniprocess execution by Lemma 8. But then all parents of $t$ execute in the same order as in the uniprocess execution. Hence, the enabling parent of $t$ in the execution is the same as in the uniprocess execution. Furthermore, the enabling parent of $t$ has out-degree 1, because otherwise $t$ is not a join node by Lemma 5, and thus the process that enables $t$ executes $t$. Therefore, $t$ is not drifted. This is a contradiction, hence a node in the cover of $t$ is stolen.

Figure 9: Nodes $t_1$ and $t_2$ are two join nodes with the common guard $u$.

Lemma 11 The number of drifted nodes in an execution of a series-parallel computation is at most twice the number of steals in the execution.

Proof: We associate each drifted node in the execution with a steal such that no steal has more than 2 drifted nodes associated with it. Consider a drifted node, $u$. Then $u$ is not the root node of the computation and it is not stable either. Hence, $u$ is either a nomadic or a join node. If $u$ is nomadic, then $u$ is stolen by Lemma 9 and we associate $u$ with the steal that steals $u$.
Otherwise, $u$ is a join node and there is a node in its cover $C(u)$ that is stolen by Lemma 10. We associate $u$ with the steal that steals a node in its cover. Now, assume there are more than 2 nodes associated with a steal that steals node $u$. Then there are at least two join nodes $t_1$ and $t_2$ that are associated with $u$. Therefore, node $u$ is in the joint embedding of two parents of $t_1$ and also of $t_2$. Let $x_1$, $y_1$ be these parents of $t_1$ and $x_2$, $y_2$ be the parents of $t_2$, as shown in Figure 9. But then $u$ has a parent that is a fork node and is a join node, which contradicts Lemma 5. Hence no such $u$ exists.

Theorem 12 The cache overhead of an execution of a nested-parallel computation with simple caches is at most twice the product of the number of steals in the execution and the cache size.

Proof: Follows from Theorem 3 and Lemma 11.

6 An Analysis of Nonblocking Work Stealing

The non-blocking implementation of the work-stealing algorithm delivers provably good performance under traditional and multiprogrammed workloads. A description of the implementation and its analysis is presented in []; an experimental evaluation is given in []. In this section, we extend the analysis of the non-blocking work-stealing algorithm for classical workloads and bound the execution time of a nested-parallel computation with a work stealer to include the number of cache misses, the cache-miss penalty and the steal time. First, we bound the number of steal attempts in an execution of a general computation by the work-stealing algorithm. Then we bound the execution time of a nested-parallel computation with a work stealer using results from Section 5. The analysis that we present here is similar to the analysis given in [] and uses the same potential function technique.

We associate a nonnegative potential with nodes in a computation's dag and show that the potential decreases as the execution proceeds. We assume that a node in a computation dag has out-degree at most 2. This is consistent with the assumption that each node represents one instruction.
Consider an execution of a computation with its dag, $G(V, E)$, with the work-stealing algorithm. The execution grows a tree, the enabling tree, that contains each node in the computation and its enabling edge. We define the distance of a node $u \in V$, $d(u)$, as $T_\infty - \mathrm{depth}(u)$, where $\mathrm{depth}(u)$ is the depth of $u$ in the enabling tree of the computation. Intuitively, the distance of a node indicates how far the node is from the end of the computation.

We define the potential function in terms of distances. At any given step $i$, we assign a positive potential to each ready node; all other nodes have 0 potential. A node is ready if it is enabled and not yet executed to completion. Let $u$ denote a ready node at time step $i$. Then we define $\phi_i(u)$, the potential of $u$ at time step $i$, as

$$\phi_i(u) = \begin{cases} 3^{2d(u)-1} & \text{if } u \text{ is assigned;} \\ 3^{2d(u)} & \text{otherwise.} \end{cases}$$

The potential at step $i$, $\Phi_i$, is the sum of the potential of each ready node at step $i$. When an execution begins, the only ready node is the root node, which has distance $T_\infty$ and is assigned to some process, so we start with $\Phi_0 = 3^{2T_\infty - 1}$. As the execution proceeds, nodes that are deeper in the dag become ready and the potential decreases. There are no ready nodes at the end of an execution and the potential is 0.

Let us give a few more definitions that enable us to associate a potential with each process. Let $R_i(q)$ denote the set of ready nodes that are in the deque of process $q$ along with $q$'s assigned node, if any, at the beginning of step $i$. We say that each node $u$ in $R_i(q)$ belongs to process $q$. Then we define the potential of $q$'s deque as

$$\Phi_i(q) = \sum_{u \in R_i(q)} \phi_i(u).$$

In addition, let $A_i$ denote the set of processes whose deque is empty at the beginning of step $i$, and let $D_i$ denote the set of all other processes. We partition the potential $\Phi_i$ into two parts

$$\Phi_i = \Phi_i(A_i) + \Phi_i(D_i),$$

where

$$\Phi_i(A_i) = \sum_{q \in A_i} \Phi_i(q) \quad \text{and} \quad \Phi_i(D_i) = \sum_{q \in D_i} \Phi_i(q),$$

and we analyze the two parts separately.

Lemma 13 lists four basic properties of the potential that we use frequently. The proofs for these properties are given in [] and the listed properties are correct independent of the time that execution of a node or a steal takes. Therefore, we give a short proof sketch.

Lemma 13 The potential function satisfies the following properties.
1. Suppose node $u$ is assigned to a process at step $i$. Then the potential decreases by at least $(2/3)\phi_i(u)$.
2. Suppose a node $u$ is executed at step $i$. Then the potential decreases by at least $(5/9)\phi_i(u)$ at step $i$.
3. Consider any step $i$ and any process $q$ in $D_i$. The topmost node $u$ in $q$'s deque contributes at least $3/4$ of the potential associated with $q$. That is, we have $\phi_i(u) \ge (3/4)\Phi_i(q)$.
4. Suppose a process $p$ chooses process $q$ in $D_i$ as its victim at time step $i$ (a steal attempt of $p$ targeting $q$ occurs at step $i$). Then the potential decreases by at least $(1/2)\Phi_i(q)$ due to the assignment or execution of a node belonging to $q$ at the end of step $i$.

Property 1 follows directly from the definition of the potential function. Property 2 holds because a node enables at most two children with smaller potential, one of which becomes assigned. Specifically, the potential after the execution of node $u$ decreases by at least $\phi_i(u)\left(1 - \frac{1}{3} - \frac{1}{9}\right) = \frac{5}{9}\phi_i(u)$. Property 3 follows from a structural property of the nodes in a deque. The distance of the nodes in a process's deque decreases monotonically from the top of the deque to the bottom. Therefore, the potential in the deque is the sum of geometrically decreasing terms and is dominated by the potential of the top node. The last property holds because when a process chooses process $q$ in $D_i$ as its victim, the node $u$ at the top of $q$'s deque is assigned at the next step. Therefore, the potential decreases by $(2/3)\phi_i(u)$ by property 1. Moreover, $\phi_i(u) \ge (3/4)\Phi_i(q)$ by property 3 and the result follows.

Lemma 16 shows that the potential decreases as a computation proceeds. The proof of Lemma 16 uses the balls-and-bins game bound from Lemma 14.
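The first three properties can be checked with exact arithmetic on small cases. The sketch below is only a numeric illustration, not a proof: the helper names `phi` and `top_share`, the chosen distances, and the example deque contents are assumptions made for this example.

```python
from fractions import Fraction


def phi(distance, assigned):
    """Potential of a ready node: 3^(2d-1) if assigned, 3^(2d) otherwise."""
    return 3 ** (2 * distance - 1) if assigned else 3 ** (2 * distance)


d = 4  # hypothetical distance of node u

# Property 1: assigning u replaces 3^(2d) by 3^(2d-1).
before_assign = phi(d, assigned=False)
drop_assign = Fraction(before_assign - phi(d, assigned=True), before_assign)

# Property 2: executing the assigned node u enables at most two children
# at distance d-1; one becomes assigned, the other is pushed on the deque.
before_exec = phi(d, assigned=True)
after_exec = phi(d - 1, assigned=True) + phi(d - 1, assigned=False)
drop_exec = Fraction(before_exec - after_exec, before_exec)


def top_share(deque_distances, assigned_distance):
    """Fraction of a process's potential held by the top deque node.

    deque_distances is listed from top to bottom and is assumed to be
    strictly decreasing; the assigned node is assumed to be no farther
    from the end of the computation than the bottom deque node.
    """
    total = sum(phi(x, assigned=False) for x in deque_distances)
    total += phi(assigned_distance, assigned=True)
    return Fraction(phi(deque_distances[0], assigned=False), total)


# Property 3: the tight case is a single deque node plus an assigned
# node at the same distance; longer deques only increase the share.
tight = top_share([4], assigned_distance=4)
longer = top_share([4, 3, 2], assigned_distance=2)
```

The tight case shows where the constant $3/4$ comes from: $3^{2d} / (3^{2d} + 3^{2d-1}) = 3/4$.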
Lemma 14 (Balls and Weighted Bins) Suppose that at least $P$ balls are thrown independently and uniformly at random into $P$ bins, where bin $i$ has a weight $W_i$, for $i = 1, \ldots, P$. The total weight is $W = \sum_{i=1}^{P} W_i$. For each bin $i$, define the random variable $X_i$ as

$$X_i = \begin{cases} W_i & \text{if some ball lands in bin } i; \\ 0 & \text{otherwise.} \end{cases}$$

If $X = \sum_{i=1}^{P} X_i$, then for any $\beta$ in the range $0 < \beta < 1$, we have $\Pr\{X \ge \beta W\} > 1 - 1/((1 - \beta)e)$.

This lemma can be proven with an application of Markov's inequality. The proof of a weaker version of this lemma for the case of exactly $P$ throws is similar and given in []. Lemma 14 also follows from the weaker lemma because $X$ does not decrease with more throws.

We now show that whenever $P$ or more steal attempts occur, the potential decreases by a constant fraction of $\Phi_i(D_i)$ with constant probability.

Lemma 15 Consider any step $i$ and any later step $j$ such that at least $P$ steal attempts occur at steps from $i$ (inclusive) to $j$ (exclusive). Then we have

$$\Pr\left\{\Phi_i - \Phi_j \ge \tfrac{1}{4}\Phi_i(D_i)\right\} > \tfrac{1}{4}.$$

Moreover, the potential decrease is because of the execution or assignment of nodes belonging to a process in $D_i$.

Proof: Consider all $P$ processes and $P$ steal attempts that occur at or after step $i$. For each process $q$ in $D_i$, if one or more of the $P$ attempts target $q$ as the victim, then the potential decreases by $(1/2)\Phi_i(q)$ due to the execution or assignment of nodes that belong to $q$ by property 4 in Lemma 13. If we think of each attempt as a ball toss, then we have an instance of the Balls and Weighted Bins Lemma (Lemma 14). For each process $q$ in $D_i$, we assign a weight $W_q = (1/2)\Phi_i(q)$, and for each other process $q$ in $A_i$, we assign a weight $W_q = 0$. The weights sum to $W = (1/2)\Phi_i(D_i)$. Using $\beta = 1/2$ in Lemma 14, we conclude that the potential decreases by at least $\beta W = (1/4)\Phi_i(D_i)$ with probability greater than $1 - 1/((1 - \beta)e) > 1/4$ due to the execution or assignment of nodes that belong to a process in $D_i$.

We now bound the number of steal attempts in a work-stealing computation.

Lemma 16 Consider a $P$-process execution of a multithreaded computation with the work-stealing algorithm.
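The bound in the Balls and Weighted Bins Lemma can be compared with a small simulation. This is a sketch for intuition only: the weights, the trial count, the seed, and the function names `lemma_bound` and `simulate` are arbitrary choices for this example, and an empirical frequency does not prove the lemma.

```python
import math
import random


def lemma_bound(beta):
    """Lower bound on Pr{X >= beta * W} stated in the lemma."""
    return 1 - 1 / ((1 - beta) * math.e)


def simulate(weights, beta, trials, rng):
    """Estimate Pr{X >= beta * W} when P balls are thrown into P bins."""
    num_bins = len(weights)
    total_weight = sum(weights)
    successes = 0
    for _ in range(trials):
        # Bins that receive at least one of the P balls.
        hit = {rng.randrange(num_bins) for _ in range(num_bins)}
        x = sum(weights[i] for i in hit)
        if x >= beta * total_weight:
            successes += 1
    return successes / trials


# Hypothetical weights: zero-weight bins play the role of processes in A_i.
weights = [8, 4, 2, 1, 1, 0, 0, 0]
bound = lemma_bound(0.5)
empirical = simulate(weights, 0.5, trials=20000, rng=random.Random(0))
```

With $\beta = 1/2$ the bound evaluates to $1 - 2/e$, which is slightly above $1/4$; this is the constant used in the proof of Lemma 15.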
Let $T_1$ and $T_\infty$ denote the computational work and the critical path of the computation. Then the expected number of steal attempts in the execution is $O(\lceil m/s \rceil P T_\infty)$. Moreover, for any $\varepsilon > 0$, the number of steal attempts is $O(\lceil m/s \rceil P (T_\infty + \lg(1/\varepsilon)))$ with probability at least $1 - \varepsilon$.

Proof: We analyze the number of steal attempts by breaking the execution into phases of $\lceil m/s \rceil P$ steal attempts. We show that with constant probability, a phase causes the potential to drop by a constant factor. The first phase begins at step $t_1 = 1$ and ends at the first step $t_1'$ such that at least $\lceil m/s \rceil P$ steal attempts occur during the interval of steps $[t_1, t_1']$. The second phase begins at step $t_2 = t_1' + 1$, and so on.

Let us first show that there are at least $m$ steps in a phase. A process has at most 1 outstanding steal attempt at any time and a steal attempt takes at least $s$ steps to complete. Therefore, at most $P$ steal attempts occur in a period of $s$ time steps. Hence a phase of steal attempts takes at least $\lceil (\lceil m/s \rceil P)/P \rceil s \ge m$ time units.

Consider a phase beginning at step $i$, and let $j$ be the step at which the next phase begins. Then $i + m \le j$. We will show that we have $\Pr\{\Phi_j \le (3/4)\Phi_i\} > 1/4$. Recall that the potential can be partitioned as $\Phi_i = \Phi_i(A_i) + \Phi_i(D_i)$. Since the phase contains $\lceil m/s \rceil P$ steal attempts, $\Pr\{\Phi_i - \Phi_j \ge (1/4)\Phi_i(D_i)\} > 1/4$ due to execution or assignment of nodes that belong to a process in $D_i$, by Lemma 15.

Now we show that the potential also drops by a constant fraction of $\Phi_i(A_i)$ due to the execution of assigned nodes that are assigned to the processes in $A_i$. Consider a process, say $q$, in $A_i$. If $q$ does not have an assigned node, then $\Phi_i(q) = 0$. If $q$ has an assigned node $u$, then $\Phi_i(q) = \phi_i(u)$. In this case, process $q$ completes executing node $u$ at step $i + m - 1 < j$ at the latest and the potential drops by at least $(5/9)\phi_i(u)$ by property 2 of Lemma 13. Summing over each process $q$ in $A_i$, we have $\Phi_i - \Phi_j \ge (5/9)\Phi_i(A_i)$. Thus, we have shown that the potential decreases by at least a quarter of $\Phi_i(A_i)$ and of $\Phi_i(D_i)$.
Therefore, no matter how the total potential is distributed over $A_i$ and $D_i$, the total potential decreases by a quarter with probability more than $1/4$, that is, $\Pr\{\Phi_i - \Phi_j \ge (1/4)\Phi_i\} > 1/4$.

We say that a phase is successful if it causes the potential to drop by at least a $1/4$ fraction. A phase is successful with probability at least $1/4$. Since the potential starts at $\Phi_0 = 3^{2T_\infty - 1}$ and ends at 0 (and is always an integer), the number of successful phases is at most $(2T_\infty - 1)\log_{4/3} 3 < 8T_\infty$. The expected number of phases needed to obtain $8T_\infty$ successful phases is at most $32T_\infty$. Thus, the expected number of phases is $O(T_\infty)$, and because each phase contains $\lceil m/s \rceil P$ steal attempts, the expected number of steal attempts is $O(\lceil m/s \rceil P T_\infty)$. The high probability bound follows by an application of the Chernoff bound.
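The counting at the end of this proof can be reproduced numerically. The sketch below makes the constants hidden in the $O(\cdot)$ notation explicit for one hypothetical set of parameters; the function names and the values of $m$, $s$, $P$, and $T_\infty$ are assumptions chosen only for illustration.

```python
import math


def max_successful_phases(t_inf):
    """Upper bound (2*T_inf - 1) * log_{4/3}(3) on successful phases.

    The potential starts at 3^(2*T_inf - 1), is an integer, and each
    successful phase multiplies it by at most 3/4.
    """
    return (2 * t_inf - 1) * math.log(3, 4 / 3)


def expected_steal_attempts(m, s, num_procs, t_inf):
    """Explicit-constant version of the O(ceil(m/s) * P * T_inf) bound.

    Each phase succeeds with probability at least 1/4, so at most
    4 * 8 * T_inf = 32 * T_inf phases are expected, each containing
    ceil(m/s) * P steal attempts.
    """
    expected_phases = 32 * t_inf
    attempts_per_phase = math.ceil(m / s) * num_procs
    return expected_phases * attempts_per_phase


# Hypothetical parameters: miss penalty m, steal time s, P processes.
example = expected_steal_attempts(m=10, s=4, num_procs=8, t_inf=100)
```

For these parameters each phase contains $\lceil 10/4 \rceil \cdot 8 = 24$ steal attempts.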

Theorem 17 Let $M_P(C)$ be the number of cache misses in a $P$-process execution of a nested-parallel computation with a work stealer that has simple caches of $C$ blocks each. Let $M_1(C)$ be the number of cache misses in the uniprocess execution. Then

$$M_P(C) = M_1(C) + O\left(\lceil m/s \rceil C P T_\infty + \lceil m/s \rceil C P \ln(1/\varepsilon)\right)$$

with probability at least $1 - \varepsilon$. The expected number of cache misses is

$$M_1(C) + O\left(\lceil m/s \rceil C P T_\infty\right).$$

Proof: Theorem 12 shows that the cache overhead of a nested-parallel computation is at most twice the product of the number of steals and the cache size. Lemma 16 shows that the number of steal attempts is $O(\lceil m/s \rceil P (T_\infty + \ln(1/\varepsilon)))$ with probability at least $1 - \varepsilon$ and the expected number of steals is $O(\lceil m/s \rceil P T_\infty)$. The number of steals is not greater than the number of steal attempts. Therefore the bounds follow.

Theorem 18 Consider a $P$-process, nested-parallel, work-stealing computation with simple caches of $C$ blocks. Then, for any $\varepsilon > 0$, the execution time is

$$O\left(\frac{T_1(C)}{P} + m \lceil m/s \rceil C (T_\infty + \ln(1/\varepsilon)) + (m + s)(T_\infty + \ln(1/\varepsilon))\right)$$

with probability at least $1 - \varepsilon$. Moreover, the expected running time is

$$O\left(\frac{T_1(C)}{P} + m \lceil m/s \rceil C T_\infty + (m + s) T_\infty\right).$$

Proof: We use an accounting argument to bound the running time. At each step in the computation, each process puts a dollar into one of two buckets that matches its activity at that step. We name the two buckets the work bucket and the steal bucket. A process puts a dollar into the work bucket at a step if it is working on a node in the step. The execution of a node in the dag adds either 1 or $m$ dollars to the work bucket. Similarly, a process puts a dollar into the steal bucket for each step that it spends stealing. Each steal attempt takes $O(s)$ steps. Therefore, each steal adds $O(s)$ dollars to the steal bucket. The number of dollars in the work bucket at the end of execution is at most $O(T_1 + (m - 1) M_P(C))$, which is

$$O\left(T_1(C) + (m - 1) \lceil m/s \rceil C P (T_\infty + \ln(1/\varepsilon'))\right)$$

with probability at least $1 - \varepsilon'$. The total number of dollars in the steal bucket is the total number of steal attempts multiplied by the number of dollars added to the steal bucket for each steal attempt, which is $O(s)$.
Therefore the total number of dollars in the steal bucket is

$$O\left(s \lceil m/s \rceil P (T_\infty + \ln(1/\varepsilon'))\right)$$

with probability at least $1 - \varepsilon'$. Each process adds exactly one dollar to a bucket at each step, so we divide the total number of dollars by $P$ to get the high probability bound in the theorem. A similar argument holds for the expected time bound.

Figure 10: The tree of threads created in a data-parallel work-stealing application.

7 Locality-Guided Work Stealing

The work-stealing algorithm achieves good data locality by executing nodes that are close in the computation graph on the same process. For certain applications, however, regions of the program that access the same data are not close in the computational graph. As an example, consider an application that takes a sequence of steps, each of which operates in parallel over a set or array of values. We will call such an application an iterative data-parallel application. Such an application can be implemented using work stealing by forking a tree of threads on each step, in which each leaf of the tree updates a region of the data (typically disjoint). Figure 10 shows an example of the trees of threads created in two steps. Each node represents a thread and is labeled with the process that executes it. The gray nodes are the leaves. The threads synchronize in the same order as they fork. The first and second steps are structurally identical, and each pair of corresponding gray nodes updates the same region, often using much of the same input data. The dashed rectangle in Figure 10, for example, shows a pair of such gray nodes.

To get good locality for this application, threads that update the same data on different steps ideally should run on the same processor, even though they are not close in the dag. In work stealing, however, this is highly unlikely to happen due to the random steals. Figure 10, for example, shows an execution where all pairs of corresponding gray nodes run on different processes.
In this section, we describe and evaluate locality-guided work stealing, a heuristic modification to work stealing which is designed to allow locality between nodes that are distant in the computational graph. In locality-guided work stealing, each thread can be given an affinity for a process, and when a process obtains work it gives priority to threads with affinity for it. To enable this, in addition to a deque each process maintains a mailbox: a first-in-first-out (FIFO) queue of pointers to threads that have affinity for the process.

There are then two differences between the locality-guided work-stealing and work-stealing algorithms. First, when creating a thread, a process will push the thread onto both the deque, as in normal work stealing, and also onto the tail of the mailbox of the process that the thread has affinity for. Second, a process will first try to obtain work from its mailbox before attempting a steal. Because threads can appear twice, once in a mailbox and once on a deque, there needs to be some form of synchronization between the two copies to make sure the thread is not executed twice.

A number of techniques that have been suggested to improve the data locality of multithreaded programs can be realized by the locality-guided work-stealing algorithm together with an appropriate policy to determine the affinities of threads. For example, an
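The mailbox mechanism described in this section can be sketched as a sequential simulation. This is a minimal illustration under several assumptions, not the paper's implementation: the class and function names are hypothetical, a process is assumed to drain its own deque before consulting its mailbox, and a plain `done` flag stands in for the synchronization that a real concurrent implementation would need between the two copies of a thread.

```python
import random
from collections import deque


class Thread:
    def __init__(self, name, affinity=None):
        self.name = name
        self.affinity = affinity  # index of the preferred process, if any
        self.done = False         # set once some process claims the thread


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.deque = deque()    # right end = bottom (owner), left end = top
        self.mailbox = deque()  # FIFO of threads with affinity for us
        self.executed = []


def claim(thread):
    """Claim a thread; returns False if another copy was already taken."""
    if thread.done:
        return False
    thread.done = True
    return True


def spawn(procs, owner, thread):
    """Push onto the owner's deque and onto the affinity mailbox's tail."""
    owner.deque.append(thread)
    if thread.affinity is not None:
        procs[thread.affinity].mailbox.append(thread)


def obtain_work(procs, me, rng):
    while me.deque:                      # 1. bottom of own deque
        thread = me.deque.pop()
        if claim(thread):
            return thread
    while me.mailbox:                    # 2. head of own mailbox
        thread = me.mailbox.popleft()
        if claim(thread):
            return thread
    others = [p for p in procs if p is not me]
    if not others:
        return None
    victim = rng.choice(others)          # 3. steal from a random victim's top
    while victim.deque:
        thread = victim.deque.popleft()
        if claim(thread):
            return thread
    return None


def run(procs, rng):
    """Round-robin until a full round finds no work (fine for this demo)."""
    progress = True
    while progress:
        progress = False
        for proc in procs:
            thread = obtain_work(procs, proc, rng)
            if thread is not None:
                proc.executed.append(thread.name)
                progress = True


procs = [Process(0), Process(1)]
for k in range(4):
    spawn(procs, procs[0], Thread(f"t{k}", affinity=k % 2))
run(procs, random.Random(0))
```

In this two-process run, process 1 starts with an empty deque and picks up `t1` from its mailbox before it attempts any steal, and the `done` flag ensures that the stale copies left on the deque and in the mailboxes are skipped.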


More information

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial SIMIT 7 Component Type Editor (CTE) Uer manual Siemen Indutrial Edition January 2013 Siemen offer imulation oftware to plan, imulate and optimize plant and machine. The imulation- and optimizationreult

More information

Shortest-Path Routing in Arbitrary Networks

Shortest-Path Routing in Arbitrary Networks Ž. Journal of Algorithm 31, 105131 1999 Article ID jagm.1998.0980, available online at http:www.idealibrary.com on Shortet-Path Routing in Arbitrary Network Friedhelm Meyer auf der Heide and Berthold Vocking

More information

Keywords Cloud Computing, Service Level Agreements (SLA), CloudSim, Monitoring & Controlling SLA Agent, JADE

Keywords Cloud Computing, Service Level Agreements (SLA), CloudSim, Monitoring & Controlling SLA Agent, JADE Volume 5, Iue 8, Augut 2015 ISSN: 2277 128X International Journal of Advanced Reearch in Computer Science and Software Engineering Reearch Paper Available online at: www.ijarce.com Verification of Agent

More information

Increasing Throughput and Reducing Delay in Wireless Sensor Networks Using Interference Alignment

Increasing Throughput and Reducing Delay in Wireless Sensor Networks Using Interference Alignment Int. J. Communication, Network and Sytem Science, 0, 5, 90-97 http://dx.doi.org/0.436/ijcn.0.50 Publihed Online February 0 (http://www.scirp.org/journal/ijcn) Increaing Throughput and Reducing Delay in

More information

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is Today Outline CS 56, Lecture Jared Saia Univerity of New Mexico The path that can be trodden i not the enduring and unchanging Path. The name that can be named i not the enduring and unchanging Name. -

More information

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights Shortet Path Problem CS 6, Lecture Jared Saia Univerity of New Mexico Another intereting problem for graph i that of finding hortet path Aume we are given a weighted directed graph G = (V, E) with two

More information

Universität Augsburg. Institut für Informatik. Approximating Optimal Visual Sensor Placement. E. Hörster, R. Lienhart.

Universität Augsburg. Institut für Informatik. Approximating Optimal Visual Sensor Placement. E. Hörster, R. Lienhart. Univerität Augburg à ÊÇÅÍÆ ËÀǼ Approximating Optimal Viual Senor Placement E. Hörter, R. Lienhart Report 2006-01 Januar 2006 Intitut für Informatik D-86135 Augburg Copyright c E. Hörter, R. Lienhart Intitut

More information

Multi-Target Tracking In Clutter

Multi-Target Tracking In Clutter Multi-Target Tracking In Clutter John N. Sander-Reed, Mary Jo Duncan, W.B. Boucher, W. Michael Dimmler, Shawn O Keefe ABSTRACT A high frame rate (0 Hz), multi-target, video tracker ha been developed and

More information

A Load Balancing Model based on Load-aware for Distributed Controllers. Fengjun Shang, Wenjuan Gong

A Load Balancing Model based on Load-aware for Distributed Controllers. Fengjun Shang, Wenjuan Gong 4th International Conference on Machinery, Material and Computing Technology (ICMMCT 2016) A Load Balancing Model baed on Load-aware for Ditributed Controller Fengjun Shang, Wenjuan Gong College of Compute

More information

Performance of a Robust Filter-based Approach for Contour Detection in Wireless Sensor Networks

Performance of a Robust Filter-based Approach for Contour Detection in Wireless Sensor Networks Performance of a Robut Filter-baed Approach for Contour Detection in Wirele Senor Network Hadi Alati, William A. Armtrong, Jr., and Ai Naipuri Department of Electrical and Computer Engineering The Univerity

More information

New Structural Decomposition Techniques for Constraint Satisfaction Problems

New Structural Decomposition Techniques for Constraint Satisfaction Problems New Structural Decompoition Technique for Contraint Satifaction Problem Yaling Zheng and Berthe Y. Choueiry Contraint Sytem Laboratory Univerity of Nebraka-Lincoln Email: yzheng choueiry@ce.unl.edu Abtract.

More information

Algorithmic Discrete Mathematics 4. Exercise Sheet

Algorithmic Discrete Mathematics 4. Exercise Sheet Algorithmic Dicrete Mathematic. Exercie Sheet Department of Mathematic SS 0 PD Dr. Ulf Lorenz 0. and. May 0 Dipl.-Math. David Meffert Verion of May, 0 Groupwork Exercie G (Shortet path I) (a) Calculate

More information

Stochastic Search and Graph Techniques for MCM Path Planning Christine D. Piatko, Christopher P. Diehl, Paul McNamee, Cheryl Resch and I-Jeng Wang

Stochastic Search and Graph Techniques for MCM Path Planning Christine D. Piatko, Christopher P. Diehl, Paul McNamee, Cheryl Resch and I-Jeng Wang Stochatic Search and Graph Technique for MCM Path Planning Chritine D. Piatko, Chritopher P. Diehl, Paul McNamee, Cheryl Rech and I-Jeng Wang The John Hopkin Univerity Applied Phyic Laboratory, Laurel,

More information

SLA Adaptation for Service Overlay Networks

SLA Adaptation for Service Overlay Networks SLA Adaptation for Service Overlay Network Con Tran 1, Zbigniew Dziong 1, and Michal Pióro 2 1 Department of Electrical Engineering, École de Technologie Supérieure, Univerity of Quebec, Montréal, Canada

More information

Representations and Transformations. Objectives

Representations and Transformations. Objectives Repreentation and Tranformation Objective Derive homogeneou coordinate tranformation matrice Introduce tandard tranformation - Rotation - Tranlation - Scaling - Shear Scalar, Point, Vector Three baic element

More information

Aspects of Formal and Graphical Design of a Bus System

Aspects of Formal and Graphical Design of a Bus System Apect of Formal and Graphical Deign of a Bu Sytem Tiberiu Seceleanu Univerity of Turku, Dpt. of Information Technology Turku, Finland tiberiu.eceleanu@utu.fi Tomi Weterlund Turku Centre for Computer Science

More information

Chapter S:II (continued)

Chapter S:II (continued) Chapter S:II (continued) II. Baic Search Algorithm Sytematic Search Graph Theory Baic State Space Search Depth-Firt Search Backtracking Breadth-Firt Search Uniform-Cot Search AND-OR Graph Baic Depth-Firt

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier a a The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each b c circuit will be decribed in Verilog

More information

Service and Network Management Interworking in Future Wireless Systems

Service and Network Management Interworking in Future Wireless Systems Service and Network Management Interworking in Future Wirele Sytem V. Tountopoulo V. Stavroulaki P. Demeticha N. Mitrou and M. Theologou National Technical Univerity of Athen Department of Electrical Engineering

More information

Maneuverable Relays to Improve Energy Efficiency in Sensor Networks

Maneuverable Relays to Improve Energy Efficiency in Sensor Networks Maneuverable Relay to Improve Energy Efficiency in Senor Network Stephan Eidenbenz, Luka Kroc, Jame P. Smith CCS-5, MS M997; Lo Alamo National Laboratory; Lo Alamo, NM 87545. Email: {eidenben, kroc, jpmith}@lanl.gov

More information

SIMIT 7. Profinet IO Gateway. User Manual

SIMIT 7. Profinet IO Gateway. User Manual SIMIT 7 Profinet IO Gateway Uer Manual Edition January 2013 Siemen offer imulation oftware to plan, imulate and optimize plant and machine. The imulation- and optimizationreult are only non-binding uggetion

More information

A Linear Interpolation-Based Algorithm for Path Planning and Replanning on Girds *

A Linear Interpolation-Based Algorithm for Path Planning and Replanning on Girds * Advance in Linear Algebra & Matrix Theory, 2012, 2, 20-24 http://dx.doi.org/10.4236/alamt.2012.22003 Publihed Online June 2012 (http://www.scirp.org/journal/alamt) A Linear Interpolation-Baed Algorithm

More information

The underigned hereby recommend to the Faculty of Graduate Studie and Reearch aceeptance of the thei, Two Topic in Applied Algorithmic ubmitted by Pat

The underigned hereby recommend to the Faculty of Graduate Studie and Reearch aceeptance of the thei, Two Topic in Applied Algorithmic ubmitted by Pat Two Topic in Applied Algorithmic By Patrick R. Morin A thei ubmitted to the Faculty of Graduate Studie and Reearch in partial fullment of the requirement for the degree of Mater of Computer Science Ottawa-Carleton

More information

Trainable Context Model for Multiscale Segmentation

Trainable Context Model for Multiscale Segmentation Trainable Context Model for Multicale Segmentation Hui Cheng and Charle A. Bouman School of Electrical and Computer Engineering Purdue Univerity Wet Lafayette, IN 47907-1285 {hui, bouman}@ ecn.purdue.edu

More information

A Multi-objective Genetic Algorithm for Reliability Optimization Problem

A Multi-objective Genetic Algorithm for Reliability Optimization Problem International Journal of Performability Engineering, Vol. 5, No. 3, April 2009, pp. 227-234. RAMS Conultant Printed in India A Multi-objective Genetic Algorithm for Reliability Optimization Problem AMAR

More information

Shortest Paths with Single-Point Visibility Constraint

Shortest Paths with Single-Point Visibility Constraint Shortet Path with Single-Point Viibility Contraint Ramtin Khoravi Mohammad Ghodi Department of Computer Engineering Sharif Univerity of Technology Abtract Thi paper tudie the problem of finding a hortet

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each circuit will be decribed in VHL and implemented

More information

ES205 Analysis and Design of Engineering Systems: Lab 1: An Introductory Tutorial: Getting Started with SIMULINK

ES205 Analysis and Design of Engineering Systems: Lab 1: An Introductory Tutorial: Getting Started with SIMULINK ES05 Analyi and Deign of Engineering Sytem: Lab : An Introductory Tutorial: Getting Started with SIMULINK What i SIMULINK? SIMULINK i a oftware package for modeling, imulating, and analyzing dynamic ytem.

More information

Optimal Gossip with Direct Addressing

Optimal Gossip with Direct Addressing Optimal Goip with Direct Addreing Bernhard Haeupler Microoft Reearch 1065 La Avenida, Mountain View Mountain View, CA 94043 haeupler@c.cmu.edu Dahlia Malkhi Microoft Reearch 1065 La Avenida, Mountain View

More information

Analyzing Hydra Historical Statistics Part 2

Analyzing Hydra Historical Statistics Part 2 Analyzing Hydra Hitorical Statitic Part Fabio Maimo Ottaviani EPV Technologie White paper 5 hnode HSM Hitorical Record The hnode i the hierarchical data torage management node and ha to perform all the

More information

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values The norm Package November 15, 2003 Verion 1.0-9 Date 2002/05/06 Title Analyi of multivariate normal dataet with miing value Author Ported to R by Alvaro A. Novo . Original by Joeph

More information

else end while End References

else end while End References 621-630. [RM89] [SK76] Roenfeld, A. and Melter, R. A., Digital geometry, The Mathematical Intelligencer, vol. 11, No. 3, 1989, pp. 69-72. Sklanky, J. and Kibler, D. F., A theory of nonuniformly digitized

More information

Bottom Up parsing. Bottom-up parsing. Steps in a shift-reduce parse. 1. s. 2. np. john. john. john. walks. walks.

Bottom Up parsing. Bottom-up parsing. Steps in a shift-reduce parse. 1. s. 2. np. john. john. john. walks. walks. Paring Technologie Outline Paring Technologie Outline Bottom Up paring Paring Technologie Paring Technologie Bottom-up paring Step in a hift-reduce pare top-down: try to grow a tree down from a category

More information

A METHOD OF REAL-TIME NURBS INTERPOLATION WITH CONFINED CHORD ERROR FOR CNC SYSTEMS

A METHOD OF REAL-TIME NURBS INTERPOLATION WITH CONFINED CHORD ERROR FOR CNC SYSTEMS Vietnam Journal of Science and Technology 55 (5) (017) 650-657 DOI: 10.1565/55-518/55/5/906 A METHOD OF REAL-TIME NURBS INTERPOLATION WITH CONFINED CHORD ERROR FOR CNC SYSTEMS Nguyen Huu Quang *, Banh

More information

CS201: Data Structures and Algorithms. Assignment 2. Version 1d

CS201: Data Structures and Algorithms. Assignment 2. Version 1d CS201: Data Structure and Algorithm Aignment 2 Introduction Verion 1d You will compare the performance of green binary earch tree veru red-black tree by reading in a corpu of text, toring the word and

More information

Engineering Parallel Software with

Engineering Parallel Software with Engineering Parallel Software with Our Pattern Language Profeor Kurt Keutzer and Tim Matton and (Jike Chong), Ekaterina Gonina, Bor-Yiing Su and Michael Anderon, Bryan Catanzaro, Chao-Yue Lai, Mark Murphy,

More information

Markov Random Fields in Image Segmentation

Markov Random Fields in Image Segmentation Preented at SSIP 2011, Szeged, Hungary Markov Random Field in Image Segmentation Zoltan Kato Image Proceing & Computer Graphic Dept. Univerity of Szeged Hungary Zoltan Kato: Markov Random Field in Image

More information

Parallel MATLAB at FSU: Task Computing

Parallel MATLAB at FSU: Task Computing Parallel MATLAB at FSU: Tak John Burkardt Department of Scientific Florida State Univerity... 1:30-2:30 Thurday, 07 April 2011 499 Dirac Science Library... http://people.c.fu.edu/ jburkardt/preentation/...

More information

Uninformed Search Complexity. Informed Search. Search Revisited. Day 2/3 of Search

Uninformed Search Complexity. Informed Search. Search Revisited. Day 2/3 of Search Informed Search ay 2/3 of Search hap. 4, Ruel & Norvig FS IFS US PFS MEM FS IS Uninformed Search omplexity N = Total number of tate = verage number of ucceor (branching factor) L = Length for tart to goal

More information

Brief Announcement: Distributed 3/2-Approximation of the Diameter

Brief Announcement: Distributed 3/2-Approximation of the Diameter Brief Announcement: Ditributed /2-Approximation of the Diameter Preliminary verion of a brief announcement to appear at DISC 14 Stephan Holzer MIT holzer@mit.edu David Peleg Weizmann Intitute david.peleg@weizmann.ac.il

More information

Aalborg Universitet. Published in: Proceedings of the Working Conference on Advanced Visual Interfaces

Aalborg Universitet. Published in: Proceedings of the Working Conference on Advanced Visual Interfaces Aalborg Univeritet Software-Baed Adjutment of Mobile Autotereocopic Graphic Uing Static Parallax Barrier Paprocki, Martin Marko; Krog, Kim Srirat; Kritofferen, Morten Bak; Krau, Martin Publihed in: Proceeding

More information

Nearly Constant Approximation for Data Aggregation Scheduling in Wireless Sensor Networks

Nearly Constant Approximation for Data Aggregation Scheduling in Wireless Sensor Networks Nearly Contant Approximation for Data Aggregation Scheduling in Wirele Senor Network Scott C.-H. Huang, Peng-Jun Wan, Chinh T. Vu, Yinghu Li and France Yao Computer Science Department, City Univerity of

More information

(12) Patent Application Publication (10) Pub. No.: US 2013/ A1. Dhar et al. (43) Pub. Date: Jun. 6, 2013 NY (US) (57) ABSTRACT

(12) Patent Application Publication (10) Pub. No.: US 2013/ A1. Dhar et al. (43) Pub. Date: Jun. 6, 2013 NY (US) (57) ABSTRACT (19) United State US 2013 0145314A1 (12) Patent Application Publication (10) Pub. No.: US 2013/0145314 A1 Dhar et al. (43) Pub. Date: Jun. 6, 2013 (54) SYSTEMAND METHOD FORCHANGEABLE (52) U.S. Cl. FOCUS

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and Thi article appeared in a journal publihed by Elevier. The attached copy i furnihed to the author for internal non-commercial reearch and education ue, including for intruction at the author intitution

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each circuit will be decribed in Verilog and implemented

More information

An Intro to LP and the Simplex Algorithm. Primal Simplex

An Intro to LP and the Simplex Algorithm. Primal Simplex An Intro to LP and the Simplex Algorithm Primal Simplex Linear programming i contrained minimization of a linear objective over a olution pace defined by linear contraint: min cx Ax b l x u A i an m n

More information

Touring a Sequence of Polygons

Touring a Sequence of Polygons Touring a Sequence of Polygon Mohe Dror (1) Alon Efrat (1) Anna Lubiw (2) Joe Mitchell (3) (1) Univerity of Arizona (2) Univerity of Waterloo (3) Stony Brook Univerity Problem: Given a equence of k polygon

More information

The Data Locality of Work Stealing

The Data Locality of Work Stealing The Data Locality of Work Stealing Umut A. Acar umut@cs.cmu.edu School of Computer Science Carnegie Mellon University Guy E. Blelloch blelloch@cs.cmu.edu School of Computer Science Carnegie Mellon University

More information

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder Computer Arithmetic Homework 3 2016 2017 Solution 1 An adder for graphic In a normal ripple carry addition of two poitive number, the carry i the ignal for a reult exceeding the maximum. We ue thi ignal

More information

CENTER-POINT MODEL OF DEFORMABLE SURFACE

CENTER-POINT MODEL OF DEFORMABLE SURFACE CENTER-POINT MODEL OF DEFORMABLE SURFACE Piotr M. Szczypinki Iintitute of Electronic, Technical Univerity of Lodz, Poland Abtract: Key word: Center-point model of deformable urface for egmentation of 3D

More information

Optimal Multi-Robot Path Planning on Graphs: Complete Algorithms and Effective Heuristics

Optimal Multi-Robot Path Planning on Graphs: Complete Algorithms and Effective Heuristics Optimal Multi-Robot Path Planning on Graph: Complete Algorithm and Effective Heuritic Jingjin Yu Steven M. LaValle Abtract arxiv:507.0390v [c.ro] Jul 05 We tudy the problem of optimal multi-robot path

More information

Planning of scooping position and approach path for loading operation by wheel loader

Planning of scooping position and approach path for loading operation by wheel loader 22 nd International Sympoium on Automation and Robotic in Contruction ISARC 25 - September 11-14, 25, Ferrara (Italy) 1 Planning of cooping poition and approach path for loading operation by wheel loader

More information

Testing Structural Properties in Textual Data: Beyond Document Grammars

Testing Structural Properties in Textual Data: Beyond Document Grammars Teting Structural Propertie in Textual Data: Beyond Document Grammar Felix Saaki and Jen Pönninghau Univerity of Bielefeld, Germany Abtract Schema language concentrate on grammatical contraint on document

More information

Gray-level histogram. Intensity (grey-level) transformation, or mapping. Use of intensity transformations:

Gray-level histogram. Intensity (grey-level) transformation, or mapping. Use of intensity transformations: Faculty of Informatic Eötvö Loránd Univerity Budapet, Hungary Lecture : Intenity Tranformation Image enhancement by point proceing Spatial domain and frequency domain method Baic Algorithm for Digital

More information

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1 US 2003O196031A1 (19) United State (12) Patent Application Publication (10) Pub. No.: US 2003/0196031 A1 Chen (43) Pub. Date: Oct. 16, 2003 (54) STORAGE CONTROLLER WITH THE DISK Related U.S. Application

More information

Performance Evaluation of an Advanced Local Search Evolutionary Algorithm

Performance Evaluation of an Advanced Local Search Evolutionary Algorithm Anne Auger and Nikolau Hanen Performance Evaluation of an Advanced Local Search Evolutionary Algorithm Proceeding of the IEEE Congre on Evolutionary Computation, CEC 2005 c IEEE Performance Evaluation

More information

Distribution-based Microdata Anonymization

Distribution-based Microdata Anonymization Ditribution-baed Microdata Anonymization Nick Kouda niverity of Toronto kouda@c.toronto.edu Ting Yu North Carolina State niverity yu@cc.ncu.edu Diveh Srivatava AT&T Lab Reearch diveh@reearch.att.com Qing

More information

Floating Point CORDIC Based Power Operation

Floating Point CORDIC Based Power Operation Floating Point CORDIC Baed Power Operation Kazumi Malhan, Padmaja AVL Electrical and Computer Engineering Department School of Engineering and Computer Science Oakland Univerity, Rocheter, MI e-mail: kmalhan@oakland.edu,

More information

Quadrilaterals. Learning Objectives. Pre-Activity

Quadrilaterals. Learning Objectives. Pre-Activity Section 3.4 Pre-Activity Preparation Quadrilateral Intereting geometric hape and pattern are all around u when we tart looking for them. Examine a row of fencing or the tiling deign at the wimming pool.

More information

Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems

Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems Syracue Univerity SUFAC lectrical ngineering and Computer Science echnical eport College of ngineering and Computer Science 8-1991 Semi-Ditributed Load Balancing for aively Parallel ulticomputer Sytem

More information

Using Partial Evaluation in Distributed Query Evaluation

Using Partial Evaluation in Distributed Query Evaluation A X x Z z R r y Y B Uing Partial Evaluation in Ditributed Query Evaluation Peter Buneman Gao Cong Univerity of Edinburgh Wenfei Fan Univerity of Edinburgh & Bell Laboratorie Anataio Kementietidi Univerity

More information

Sequencing and Counting with the multicost-regular Constraint

Sequencing and Counting with the multicost-regular Constraint Sequencing and Counting with the multicot-regular Contraint Julien Menana and Sophie Demaey École de Mine de Nante, LINA CNRS UMR 6241, F-44307 Nante, France. {julien.menana,ophie.demaey}@emn.fr Abtract.

More information