Reducing Structural Bias in Technology Mapping

Size: px
Start display at page:

Download "Reducing Structural Bias in Technology Mapping"

Transcription

1 Reducing Structural Bias in Technology Maing S. Chatterjee A. Mishchenko R. Brayton Deartment of EECS U. C. Berkeley {satrajit, alanmi, X. Wang T. Kam Strategic CAD Labs Intel Cororation {xinning.wang, ABSTRACT Technology maing based on DAG-covering suffers from the roblem of structural bias: the structure of the maed netlist deends strongly on the subject grah. In this aer we resent a new maer aimed at mitigating structural bias. It is based on a simlified cut-based Boolean matching algorithm, and using the seed afforded by this simlification we exlore two ideas to reduce structural bias. The first, called lossless synthesis, leverages recent advances in structure-based combinational equivalence checking to combine the different networks seen during technology indeendent synthesis into a single network with choices in a scalable manner. We show how cut-based maing extends naturally to handle such networks with choices. The second idea is to combine several library gates into a single gate (called a suergate) in order to make the matching rocess less local. We show how suergates hel address the structural bias roblem, and how they fit naturally into the cut-based Boolean matching scheme. An imlementation based on these ideas significantly outerforms state-of-the-art maers in terms of delay, area and run-time on academic and industrial benchmarks. 1. INTRODUCTION The task of technology maing in standard-cell logic synthesis is to exress a given Boolean function as a network of gates chosen from a given standard-cell library to otimize some objective function such as total area or delay. In these general terms, technology maing is intractable. The roblem is usually simlified by first reresenting the Boolean function as a good initial multi-level network of simle gates called the subject grah. The subject grah is then transformed into a multilevel network of library gates by means of local substitutions. This simlification means that the structure of the subject grah dictates to a large extent the structure of the maed network; this is known as structural bias. In this work we resent a new Boolean technology maer aimed at mitigating the effects of structural bias. At the core of the maer is a simlified Boolean matching algorithm that is faster than structural matching and roduces better results. With the seed afforded by this matching technique, we roose two comlementary techniques to reduce structural bias: lossless synthesis and suergates. Lossless Synthesis. To obtain a good structure for the subject grah a number of technology indeendent synthesis stes are usually erformed. An examle of this is the SIS rugged scrit shown in Figure 1(a). Each ste in the scrit is Design simlify simlify resub fx resub mfs w 22 Technology Maing (a) Conventional Synthesis Design simlify simlify resub fx resub mfs w 22 Technology Maing (b) Lossless Synthesis Figure 1: In lossless synthesis intermediate networks are combined to create a choice network which is then used for maing. heuristic, and the subject grah roduced at the end of the scrit is not necessarily otimal; indeed it is ossible that an intermediate network is better in some resects than the final network. We exlore the idea of combining these intermediate networks into a single subject grah with choices and using that to derive the maed netlist. This is shown schematically in Figure 1(b). Following the work of Lehman et al. [13], the maing is done otimally over all the networks encoded in the subject grah. The maer is not constrained to use any one network, but can ick and choose the best arts of each network. We call this lossless synthesis since no network seen during the synthesis rocess is ever lost. By including the initial network in the choice network, we can be sure that the heuristic logic synthesis oerations never make things worse. Furthermore, as in [13], multile scrits (with ossibly different objectives) could be used to accumulate more choices.

2 f f f q 0 x 1 q a b c d a b c d a b c d (a) The subject grah. (b) An easy match to find. (c) A hard match to find. Figure 2: Simle examle illustrating the utility of suergates. Consider the subject grah shown in (a) where each node reresents a 2-inut AND gate and a dashed edge indicates the resence of an inverter. The match in (b) is easy to find since both inuts of the XOR are resent in the subject grah. In contrast the match in (c) is hard to find, since the MUX inut labeled x is not resent in the subject grah. The match in (c) can be found by combining some gates into a suergate. Note that all the inuts of the suergate are still required to be resent in the subject grah. The challenging art of this roosal is the efficient creation of the choice network. Towards this end we roose to leverage recent advances in combinational equivalence. State-of-the-art equivalence checkers deend on finding functionally equivalent internal oints in the networks being checked in order to reduce the comlexity of the decision rocedure. By using a combination of random simulation and SAT it is ossible to quickly determine equivalent oints in two circuits. These equivalent oints rovide the choices for maing in our roosal. Suergates. A suergate is a single outut combinational network of a few library gates which is treated as a single library gate by our algorithms. The advantage of doing this may not be immediately obvious: one might exect that if a suergate matches at a node, the conventional matching algorithm would return the same result, excet it would match library gate by library gate rather than all the gates at once. This is not true and Figure 2 rovides a simle counter-examle. The subject grah is shown in Figure 2(a). Conventional maing would find the maing consisting of the XOR and AOI gates as shown in Figure 2(b), but would fail to find the maing with the MUX as shown in Figure 2(c). To see this, observe that one of the inuts of the MUX (labeled x in Figure 2(c)) is not resent in the subject grah. Consequently, the MUX by itself is not a valid match for f. In contrast, the inuts of the XOR (nets and d) in Figure 2(b) are both resent in the subject grah; thus making the XOR a valid match at f. Now if we connect the MUX, the XOR, and the inverter together to form a new gate (in the manner shown in Figure 2(c)), it is easily verified that this new gate is a match at f in the subject grah. This new gate is a suergate built from the library gates exressly for the urose of finding better matches. This examle illustrates the main idea behind suergates: Using bigger gates allows the matching rocedure to be less local, and thus less affected by the structure of the subject grah. Furthermore, as this examle illustrates, suergates are useful even with standard cell libraries that are functionally rich. The three ideas resented in this aer the faster matching algorithm, lossless synthesis, and suergates are logically indeendent. However, one of the contributions of this aer is to show how these ideas come together naturally in the context of cut-based Boolean maing. In Section 2 we resent related work. Section 3 has an overview of the maer and details of the matching algorithm. In Section 4 we resent ractical techniques to construct the choice network for lossless synthesis and extend the basic maing to handle choices. In Section 5 we resent details of suergate construction. In Section 6 we resent the results of exeriments designed to determine the runtime quality trade-off of the techniques resented in this aer. We also resent comarisons with other industrial and academic maers. We conclude in Section 7 by ointing out some limitations of these techniques and suggesting future work. Finally, we note that the focus of this aer is on mitigating structural bias, that is, on the logical asects of the technology maing roblem. Therefore to simlify exosition we resent the algorithms in the context of a simle gain-based delay model which does not take caacitive loading into account. Such a model is useful in ractice in a flow that uses technology maing for toology selection, and does sizing and buffering iteratively with lacement. Furthermore, the techniques in this aer continue to be alicable in a more traditional load-based flow, since essentially

3 they are techniques to increase the number of toologies considered during maing, and are orthogonal to the usual techniques of considering loads during technology maing. 2. RELATED WORK The literature on technology maing rovides a sectrum of techniques that trade structural bias for comutational comlexity. The classical structural aroaches, such as tree- and DAG-covering [8, 12], lie at one end of this sectrum. They have relatively short run-times but rovide sub-otimal results since their maing choices are comletely constrained by the given subject grah. Constructive Boolean aroaches [9, 19] lie closer to the other end of the sectrum. Although they do not deend as much on the structure of the subject grah, they are limited by the choice of their (heuristic) decomosition schemes and local view since they decomose one node at a time. These limitations combined with long run-time makes them useful mostly in a re-synthesis flow after a maed network has been obtained by some other means. The aroach described by Lehman et al. [13] lies between these two extremes: a number of different local algebraic decomositions are encoded into the given subject grah as choices. The subject grah itself is constructed by combining the results of different technology maing scrits. The ideas relating to lossless synthesis resented in this aer are variations on this basic aroach. Beyond the obvious difference of using intermediate networks from one synthesis scrit, there are significant algorithmic differences. First, the use of structural equivalence checking to detect choices (as oosed to BDDs) allows for lossless synthesis on large circuits. Second, the extension of Boolean matching to handle choices resented in this aer overcomes the run-time limitations of the structural matching methods used by Lehman et al. Furthermore, there is an imrovement in quality since matching is done directly on general DAGs and comlex multi-fanout gates are handled more naturally. Wavefront maing [21] is a ractical enhancement of [13] that mas the circuit in stages, thus reducing the amount of match data that is stored at one time, though at the cost of otimality. An extension of this algorithm allows for dynamic decomosition based on a artially maed circuit. Comared to the aroach of Lehman et al. it reduces the number of decomositions that need to be exlored. The idea of wavefront can be used in conjunction with the techniques in this aer to reduce the amount of match data that needs to be stored in memory at one time. In addition, this aer uts forth the idea of combining library gates into suergates to obtain a more Boolean matching. By squinting a little, one can view suergates as being an alternate source of choices. The choices induced by suergates come from the library rather than the network. In the literature similar combinations of gates have been used for other uroses such as constructive decomosition [19] and rewiring [2]. We note here that suergates allows libraryaware Boolean decomositions to be exlored as oosed to algebraic decomositions in [13], and we defer the discussion to Section 5.5. The work related to the simlified Boolean matching is resented in Section BOOLEAN MATCHING We introduce the simlified, Boolean matching algorithm in the context of the overall maing flow. The maing rocedure is a Boolean, cut-based one [4, 6, 7] on a DAG using dynamic rogramming [3, 12] to guarantee delay otimality. 3.1 Overview of Boolean Maing The inut to the maing rocedure is an AND-Inverter grah (AIG) [11]. An AIG is a DAG whose nodes reresent either AND gates or rimary inuts(pis). Its edges reresent wires. Inverters are reresented by bubbles on the edges. Given an AIG, the maing is done in 5 stes. Ste 1. Comute k-feasible cuts. A feasible cut of a node N in the AIG is a set of nodes {X i} in the transitive fan-in cone of N such that an arbitrary assignment of values to X i comletely determines the value of N. A feasible cut is redundant if the value of a node in the cut is comletely determined by an assignment of values to the other nodes in the cut. A k-feasible cut is a feasible cut of size at most k that is not redundant. The cut {N} is always a k-feasible cut of node N (for any k) and is called the trivial cut. Let Φ(N) denote the set of k-feasible cuts of node N. If N is a PI, then Φ(N) = {{N}}. If N is a AND node with inuts A and B, then Φ(N) = {{N}} {u v u Φ(A), v Φ(B), u v k} We comute all 5-feasible cuts of every node in the network by the simle bottom-u traversal based on the above recursion. Although in general a grah may have O(n 5 ) 5- feasible cuts, we found that most test-cases have between 20 and 30 5-feasible cuts er node. We restrict our attention to 5-feasible cuts since our exeriments show that the total number of cuts increases very quickly with k. Pruning techniques have to be alied, and the maing results are not significantly better (since the runing is quite arbitrary). Ste 2. Comute truth-tables of cuts. The next ste is to comute the local function of a node in terms of its cut. This is done for every non-trivial k-feasible cut of every node in the network. Given a node N, and a cut {X i} of that node, formal variables are assigned to the each cut node (in no articular order). Using these variables, the functionality of the node is comuted symbolically. We note here that BDDs are not necessary for this comutation: since usually only 5-feasible cuts are considered the symbolic function comutation can be erformed efficiently using 32-bit machine words to reresent truth-tables. In what follows we use the words function and truth-table interchangeably. Ste 3. Boolean Matching. For each node in the network, for every cut, an aroriate gate (if one exists) is chosen from the library. Each gate thus chosen is called a match for the node. Our matching rocedure differs from the traditional aroach and we resent this in greater detail in the following section. Ste 4. Comute best arrival time at each node. Starting from the PIs and working in toological order towards the oututs, the best arrival time is comuted for each node from amongst all its matches. Ste 5. Choose the best cover. In the reverse toological order, the best gate for each rimary outut is cho-

4 sen. Next, the best gates imlementing the inuts of these gates are chosen and so on until all rimary inuts have been reached. q 3.2 Simlified Boolean Matching In the matching ste we wish to choose the aroriate library gate to imlement a given function. The traditional solution is to use NPN-equivalence classes. (Two Boolean functions are NPN-equivalent if one can be transformed into the other by ermuting or negating the inuts or negating the outut.) First the NPN-canonical reresentatives of the library gates are comuted. The gates are stored in a hash table indexed by the reresentatives. During matching the NPN-canonical reresentative is comuted for every cut function. It is used to look u the hash table to find the aroriate gate. However this aroach is slow since it requires the comutation of the NPN-reresentative for every cut function. Furthermore, once the aroriate gate is found, the aroriate variable corresondence must be found between the library gate and the cut function. Since these comutations are done in the inner body of the maer, there has been a lot of research on seeding them u [1, 5, 7, 22]. Our simlified matching rocedure is motivated by the fact that in the maing rocedure described above, the functions have only 5 variables. It is therefore advantageous to recomute all functions obtained by ermuting the inuts to library gates and to add those to the hash table. Thus during the actual matching, there is no need to comute a canonical reresentative. The cut function can be used directly to look u the hash table. In addition to avoiding the NPN-equivalence checking, it also avoids the need to establish inut corresondence after the gate is found. Examle. For an AOI gate having the function (x 1 x 2 +x 3), we recomute all ermutations i.e. (x 1 x 3 +x 2), (x 2 x 1 + x 3), (x 2 x 3 + x 1) etc. Note that both (x 2 x 1 + x 3) and (x 1 x 2 + x 3) are ket although they are the same function. This is because in industrial libraries, the inut-to-outut delays are often different even for functionally symmetric ins. Once again, all functional comutations are done with truth tables. In Section 5.2 on suergate generation we describe this recomutation rocedure in more detail. (The recomutation can be thought of as generation of suergates consisting of single library gates.) So far we have not considered the hase assignment at the inuts. This is because the maing is done dual rail as exlained in the following section on otimal hase selection. The roblem with this simlified matching rocedure is that it is not scalable. Indeed, this would not work if the functions we were interested in had, say, 10 variables. But in the framework of Boolean maing it suffices since we deal with functions having only a few variables. 1 This means that in the worst case (a library gate with 5 inuts, and no symmetries) 120 functions are added to the hash table (instead of 1 as in the NPN-reresentative case). However since each match is now only a hash table looku (versus an NPN-reresentative comutation, looku, and inut corresondence), the matching rocedure runs faster. 1 In Boolean maing, as more variables are used, the cut comutation becomes a bottleneck before the matching does. a x b Figure 3: In order to ma and q otimally for delay, the node x must be maed in both olarities. 3.3 Otimal Phase Selection The maing quality can be imroved by exloiting the additional flexibility of maing each node in both olarities: ositive (as above) and negative. When the final maing is selected (in Ste 5 of Section 3.1) the aroriate olarity is chosen to guarantee the shortest delay on each ath. This may lead to an increase in area since the same node may be required in both olarities, and may have to be dulicated. Examle. Consider the AIG fragment in Figure 3. Node x is required in both olarities. If it is maed in only one olarity then the arrival time at either or q increases by an additional inverter delay. Observe that this dual rail maing means that the inuts of a cut are available in both olarities. Consequently the function of a cut is not recisely determined: it belongs to a class of functions which differ only by comlementation of inuts. This class is called the N-equivalence class. There is greater flexibility since any gate belonging to the N-equivalence class may be used to imlement the cut. The simlified matching rocedure above is extended to work with N-equivalence classes. During library rerocessing, for every ermutation of a gate, the N-equivalence reresentative is comuted. This is defined as the function with the lexicograhically smallest truth table. Similarly during matching, for every node function the N-equivalence reresentative is comuted and used for looking u in the hash table. The alert reader would have noticed that this extension negates some seed benefit of the simlified matching rocedure resented in Section 3.2. However this may be justified with the following arguments. First, comuting the N-equivalence reresentative is less exensive than comuting the NPN-equivalence reresentative since the class is smaller (2 n versus 2 n+1 n!). Second, this is a necessary rice to ay for exloring this larger search sace of decomositions (c.f. the inverter transform in Lehman et al. [13]). In structural maers this search sace is exlored by adding a air of inverters between two nodes, and by adding a wire (with zero cost) to the library. That technique is not well suited for Boolean maing since it significantly increases the number of cuts in the network. 3.4 Using Don t Cares An advantage of the Boolean matching technique outlined above is the ability to handle local satisfiability don t cares during the matching rocess. If the matching is done with k feasible cuts, then for every node, a cut larger than k is constructed. This can be done by a simle deth-first search starting from the node. (For examle if matching is done

5 with 5-feasible cuts, a larger cut of size, say, 10 is considered.) For every k feasible cut of the node, it is ossible to comute the satisfiability don t cares symbolically. This can be done either with exhaustive simulation using truthtables or with BDDs. Once the satisfiability don t cares of a k feasible cut are identified, each don t care minterm can be set to one or zero to obtain a comletely secified cut function. This comletely secified function is used to looku the hash table as before. Thus each k feasible cut gives rise to a set of cut functions when satisfiability don t cares are considered. The main drawback of this method is the exonential blow-u in the number of cut functions since each don t care gives rise to two ossible functions. Thus in ractice only an arbitrary subset of the cut functions can be used for matching. Our reliminary exeriments confirmed that using don t cares in this manner imroved the quality of maing. However, in our alication the increase in quality was not justified by the increase in run-time. 4. LOSSLESS LOGIC SYNTHESIS The idea behind lossless logic synthesis is to remember every network seen during a synthesis flow (or a set of flows) and to select the best arts of each network during technology maing. This is useful for two reasons. First, technology-indeendent synthesis algorithms are usually heuristic, and so there is no guarantee that the final network is otimal. By maing using only the final network, we might miss out on a better result that could be obtained from an intermediate network in the otimization flow. Second, synthesis oerations usually aly to the network as a whole. So a flow to otimize delay might significantly increase area, since the whole network is otimized for delay. By combining such a delay otimized network with another network that has been otimized for area, it is ossible to get the best of both. On the critical ath, the maer can choose from the delay-otimized network, whereas off the critical ath, the maer chooses from the area-otimized network. The main roblem is constructing the choice network efficiently. In Section 4.1 we give an overview of how this is done. In Section 4.2 we extend the Boolean maing rocedure of Section 3.1 to handle choices. 4.1 Constructing the choice network The choice network is constructed from a collection of networks that are functionally equivalent. The key idea is to use recent advances in equivalence checking that are based on identifying functionally equivalent internal oints in the networks being checked. Concetually the rocedure is as follows: one can imagine each network to be decomosed into AND gates and inverters to form an AIG. Now for every node in the network the global function is comuted, say by building BDDs. All those nodes which have the same global function are collected in equivalence classes. Thus, the choice network is an AIG which has multile functionally equivalent oints collected in equivalence classes. However for large circuits comuting global BDDs is not feasible. Note that in the rocedure outlined above, it is not necessary to actually comute BDDs. One can use random simulation to identify otentially equivalent nodes, and then use a SAT engine to verify equivalence and construct the equivalence classes. We have imlemented a ackage called FRAIG (functionally reduced and-inverter grahs) that exoses the same API as a BDD ackage but internally uses simulation and SAT. (The details of this ackage are in the technical reort [18]. A discussion of the issues involved can also be found in recent work on structure-based equivalence checking [10, 11, 14, 15].) Examle. Figure 4 illustrates the creation of a network with choices. Networks 1 and 2 show the subject grahs obtained from two networks that are functionally equivalent, but structurally different. The nodes x 1 and x 2 are functionally equivalent (u to comlementation) in the two subject grahs. They are collected in an equivalence class in the choice network, and an arbitrary member (x 1 in this case) is used as a reresentative for the class in the choice network. Note that there is no choice corresonding to node o since the rocedure described above detects the maximal commonality between the two networks. Algebraic re-writing. A different way to generate choices is by iteratively alying the Λ- and -transformations described by Lehman et al. [13] Given an AIG, we use the associativity of AND to locally re-write the grah (the Λ- transformation), i.e. whenever the structure AND(AND(x1, x2), x3) is seen in the AIG, it is relaced by the equivalent structures AND(AND(x1, x3), x2) and AND(x1, AND(x2, x3)). If this rocess is done until no new AND nodes are created, it is equivalent to identifying the maximal multiinut AND-gates in the AIG and adding all ossible treedecomositions. Similarly, the distributivity of AND over OR (the -transformation) rovides another source of choices. In Section 6 we resent exerimental results that comare the flexibility rovided by these choices with those from lossless synthesis. Note that this leads to a new way of thinking about logic synthesis: one can use arbitrary transformations to re-write the network and create choices. The best combination of these choices is selected during maing. 4.2 Maing with choices The cut-based Boolean maing rocedure of Section 3.1 can be extended naturally to handle equivalence classes of nodes. Only the cut comutation ste needs modification. Given a node N, let N denote the equivalence class it belongs to. Let Φ (N ) denote the set of cuts of the equivalence class N. Then, Φ (N ) = [ N N Φ(N) where, if A and B are the two inuts of N, Φ(N) is given by {{N}} {u v u Φ (A ), v Φ (B ), u v k} This exression for Φ(N) is a slight modification of the one in Section 3.1: The cuts of N are obtained from the cuts of the equivalence classes of its inuts (instead of the cuts of just its inuts). The reader should verify that in the absence of choices (which corresonds to the situation when all equivalence classes have only one node) this comutation is essentially the same as the one resented in Section 3.1. As before, the cut comutation can be done in a bottomu manner from PIs to oututs in a single ass.

6 Network 1 Network 2 Network with Choice o o o x 1 x 1 x 2 x 2 a b c d e = b c o = a + d + e a b c d e a b c d e = b c o = a + (d + e) Figure 4: Examle illustrating the creation of a network with choices. A choice is created when there are two nodes with the same global function (u to comlementation), but different structures. Thus nodes x 1 and x 2 lead to a choice, but node does not ( is structurally the same in both networks). Examle. Consider the comutation of the 3-feasible cuts of the equivalence class {o} in Figure 4. Let x reresent the equivalence class {x 1, x 2}. Now, Φ (x) = Φ(x 1) Φ(x 2) = {{x 1}, {x 2}, {q, r}, {, s}, {q,, e}, {, d, r}, {, d, e}, {b, c, s}} and Φ ({a}) = Φ(a) = {a}. We have Φ ({o}) = Φ({o}) = {{o}} {u v u Φ (a ), v Φ (x 1 ), u v 3} Now since a = {a}, and x 1 = x, we get Φ ({o}) = {{o}, {a, x 1}, {a, x 2}, {a, q, r}, {a,, s}} Observe that the set of cuts of o involves nodes from the two choices x 1 and x 2, i.e. o may be imlemented using either of the two structures. The subsequent stes of the maing rocess (stes 2 5 of Section 3.1) remain unchanged, excet now the oerations are done for equivalence classes of nodes, rather than for individual nodes. 5. SUPERGATES A suergate is a single outut combinational network of a few library gates which is treated as a single library gate by the maing rocedure. As the examle in the introduction shows (see Figure 2), suergates match larger ortions of the subject grah than library gates. This makes the matching more Boolean and less deendent on the structure of the subject grah. This greatly increases the number of matches seen by the maer, and leads to better results. In what follows we use the term simle gates to mean the gates in the original library. 5.1 Use in Maing Suergates require no change to the maing rocedure since they are no different from simle gates for the maer. Suergates are generated in a re-rocessing ste as described in Section 5.2. The suergate library is generated once, stored comactly in a file, and used when technology maing is invoked. The suergates are recomuted only if changes are made to the original library. This is why suergate generation has an additional advantage of reducing the total run-time of maing by re-comuting and re-using the maing information, which deends on the library but not on the network to be maed. After the network is maed using suergates, each suergate in the maed netlist is relaced by its constituent simle gates; thus the final netlist consists of only the library gates. 5.2 Suergate Generation Suergates are generated recursively in a number of rounds. In each round we generate a new set of suergates and comute their functions. These suergates are used in subsequent rounds to generate new suergates. As usual we reresent functions by truth tables. Let k be the maximum number of inuts in the suort of the suergates we consider. For examle when maing with 5-feasible cuts, k would be 5. Let G i be the set of suergates at round i. Let L be the set of simle library gates. Let J k denote {1.. k}, π(x) the set of ermutations of a set X, and g the number of inuts of a simle gate g L. The suergate generation is as follows. Initially, G 0 = {x i i J k } where x i are elementary functions. Each G i+1 is the union of G i and an additional set of functions of the form g(x i1, x i2,.. x in ) where g G i and (i 1, i 2,.. i n) π(y ), Y being a subset of J k of cardinality n = g. The total number of rounds corresonds to the maximum number of logic levels in a suergate, and is an user-secified

7 arameter. If the generation is stoed after the first round (i.e. at G 1), then the set of suergates contains only the library gates with all ermutations of inut variables (c.f. Section 3.2). Examle. Let k = 3, i.e. we are interested in suergates with at most 3 inuts. Let L = {AND, OR}. G 0 = {x 1, x 2, x 3} where the truth table of x 1 is , x 2 is and x 3 is Each G i+1 contains in addition to the functions in G i, functions of the form AND(y 1, y 2) and OR(y 1, y 2) where y 1, y 2 are functions in G i. Thus x 1, AND(x 1, x 2) (whose truth table is ), AND(x 2, x 1), AND(x 1, x 3), etc. are some functions in G 1. Similarly, AND(AND(x 1, x 2), x 3), AND(OR(x 2, x 1), AND(x 1, x 3)), AND(AND(x 1, x 2), AND(x 1, x 3)), are some functions in G Pruning by Dominance Since the above rocedure is exhaustive, a large number of suergates is generated. However, some of the suergates generated above are sub-otimal, i.e. they are dominated by other gates. Examle. Consider the gate AND(AND(x 1, x 2), AND(x 1, x 3)) from the examle above. It has worse delay and area than the functionally equivalent suergate AND(AND(x 1, x 2), x 3). Whenever a new suergate is created, it is checked against existing suergates that imlement the same function (by means of a hash table). If the new suergate is worse in terms of area and delay than an existing one, then it is not added to the set of suergates. Also note that usually in industrial libraries functional symmetry of the underlying gate is not very useful. For examle both AND(x 1, x 2) and AND(x 2, x 1) have to be retained since the in-to-in delays are different in the two cases. 5.4 Pruning by Resource Limits Another technique to reduce the number of suergates is by runing based on resource limits. This is imortant because of a combinatorial exlosion inherent in the above formulation. Examle. If at one oint there are 1000 suergates, and a 4-inut NAND is used as a root gate, there would be suergates to consider. Many of these would be added (since the in-to-in delays are different) and the number of suergates would increase significantly, making the next round imossible to comlete. We exerimented with a number of heuristics to reduce the combinatorial exlosion. The simlest heuristic is to use only small suort gates (say 3) as root gates. A second heuristic to set area and delay limits on the suergates. These limits can be handled efficiently by sorting the suergates and using only those suergates as inuts to the root gate such that the resulting suergate would be within the area and delay limits. The suergate generation technique resented above is rather basic and inefficient because of the bottom-u nature of the generation rocess. Imroving the generation rocess is an interesting research roblem. One idea is to study the cut functions actually encountered during maing, and then to emloy constructive decomosition techniques to generate good suergates for the commonly oc- Mode Delay Area Run-time B B+L B+S B+L+S Table 1: Comarison of various maer modes. B is Baseline, L is Lossless synthesis, S is Suergates. Run-time for B+L and B+L+S does not include the choice generation time. The time required for choice generation is roughly a factor of three of the baseline run-time. curring functions. This leads to the exciting ossibility of a maer that learns from the circuits it rocesses! 5.5 Comarison with Algebraic Re-Writing Recall from Section 4.1 that algebraic re-writing includes the addition of choices by adding alternative decomositions based on associative and distributive transforms in the manner of Lehman et al. [13]. If suergates are generated exhaustively without resource limits, it is ossible to do exact logic synthesis (the entire circuit would ma to a single suergate). In this theoretical setting, maing with suergates is the most general form of logic synthesis. In ractice, as discussed above, only a limited set of suergates can be used. Practically, the search sace exlored by suergates is different from the search sace exlored by re-writing in two ways. First, since suergates are built by combining gates, suergates cature different Boolean decomositions of a function. Therefore they are more general than algebraic re-writing which by definition is limited to algebraic decomositions. Second, since the set of decomositions exlored deends on the library, and the runing techniques described above rune away sub-otimal decomositions, suergates exlore a sace of good decomositions relative to the gates in the library. In summary, the decomositions due to suergates are a targeted though sarse samling of the large sace of Boolean decomositions. In contrast, algebraic re-writing is a dense samling of the smaller sace of algebraic decomositions. 6. EXPERIMENTAL RESULTS The techniques described in this aer have been imlemented in the MVSIS logic synthesis system [17]. The imlementation is freely available. We erformed a number of exeriments to characterize the erformance of the maer in its various modes, and to benchmark it against other state-of-the-art maers. Area recovery. In the exeriments reorted here, delay is the arameter of interest, and area is not directly controlled during maing. However, unnecessary dulication of logic during DAG-maing can lead to oor area. The area of the maed netlist is imroved by making two asses after the maing rocedure described in Section 3.1. Before each ass, the required time is comuted at every node. The nodes are then rocessed in toological order (from inuts to oututs). For nodes having ositive slacks, matches that

8 minimize area without violating required times are chosen. The two asses differ in the metric used to measure area. The first ass uses area-flow [16], a global metric that accounts for fanout. The second ass uses exact area, a local metric that catures the area contribution due to gates in the maximal fanout-free cone of a match. (In exact area, the area cost of a match is the sum of the areas of all gates used exclusively by the match.) Our exeriments show that the combination of these heuristics (in this order) is very effective: ost sizing area is reduced by about 30% without increasing delay. 6.1 Basic Evaluation The first two exeriments were done in an academic setting, using mcnc.genlib and simle load indeendent delay model. The benchmarks chosen were the 15 largest ublicly available ones (listed in Table 2), and they were rerocessed with the scrit shown in Figure 1(a) followed by balancing for delay. For lossless synthesis, the choices were generated using the scheme shown in Figure 1(b). Characterizing the quality run-time trade-off. Table 1 summarizes the relative delay, area and run-times of the maer in its various modes. As might be exected, the fastest run-time is obtained when neither suergates nor lossless synthesis is used (this mode is called baseline), and the best quality (32% imrovement in delay over baseline) is obtained when both techniques are used (the mode is called B+L+S). In addition to these extreme cases, Table 1 also shows the intermediate situations when either lossless synthesis or suergates is used alone giving a range of quality run-time trade-offs. We note that since the absolute run-time of the baseline maer is very small (less than 4 seconds for the largest ublic benchmarks), the order of magnitude increase in run-time when using both lossless synthesis and suergates is accetable for better quality. Exerimental comarison with algebraic re-writing. Since a direct comarison with the imlementation described by Lehman et al. [13] was not ossible, we erformed a simle exeriment to estimate the effect of algebraic re-writing. As er Section 4.1, a number of associative and distributive decomositions were added through local re-writing. Decomositions were iteratively added until the number of nodes in the AIG triled. Comared to baseline, algebraic re-writing led to a 9% reduction in delay (cf. the 32% reduction with B+L+S). When used in conjunction with either lossless synthesis or suergates, re-writing led to a smaller imrovement in delay (about 4% in both cases). When used in conjunction with both suergates and lossless synthesis, there was an imrovement of only 2% in delay. This confirms the analysis in Section 5.5 that in ractice suergates and algebraic re-writing exlore different search saces. In the subsequent exeriments we use the baseline and the B+L+S modes (suergates and lossless synthesis) for comarisons with other maers. Comarison with SIS. Table 2 shows the erformance of the maer on the benchmark circuits in comarison with the maer in SIS (used in delay otimal mode). In the baseline mode the maer runs 5 times faster than the tree maer in SIS [20] and roduces 33% better delay without degrading area. In the B+L+S mode, the maer roduces the best results with 30% reduction in delay over baseline, and 54% over SIS. 6.2 Comarison with Industrial Maers The next set of exeriments are conducted in an industrial setting. The examles are timing-critical combinational blocks extracted from a high-erformance microrocessor design which were otimized for delay during technology indeendent synthesis. After technology maing, buffering and sizing is done searately in accordance with a gain-based flow. As art of the maing, an attemt is made to refer those gates that can drive the estimated fanout loads. (This is done iteratively like area-recovery using fanoutestimation techniques similar to those used in the area-flow algorithm [16]). Table 3 shows a comarison of the maer with two other state-of-the art maers: DAG maer [12] and GrahMa which is an indeendent imlementation of the Lehman- Watanabe maer [13] that uses Boolean matching. Both maers do not have area recovery. Using suergates and choices, the maer outerforms both GrahMa and DAG maer in delay and area and has a significantly shorter run-time. Table 4 shows the erformance of the maer on some larger blocks from the microrocessor, in comarison with DAG maer. Delay reduces by 12% while area (measured after sizing) reduces by 24%. Thus, with larger blocks, the imrovement in area is greater. It was ointed out in [12] that DAG maer can roduce significantly faster circuits comared to the traditional tree maing aroach [8]. However, the area increase for DAG maer sometimes can be quite significant. The significant area reduction by the new maer makes DAG maing aroach much more ractical, esecially when leakage ower consumtion is becoming an increasingly imortant consideration in high-erformance designs. 7. CONCLUSIONS AND FUTURE WORK Our exeriments demonstrate that Boolean maing based on the simlified matching algorithm and otimal hase selection is a better alternative to structural matching since it roduces suerior results with shorter run-time. Suergates and choices fit nicely into this framework and greatly imrove the quality of maing by mitigating structural bias. Furthermore, the intermediate networks seen during technology indeendent synthesis are a useful source of choices for the final maing. Suergates, though generated by brute-force enumeration, imrove the quality of maing, even with industrial libraries. To give a balanced view of the techniques resented, we should oint out their limitations. The exhaustive cut comutation which works very well in baseline mode (when no choices are used) becomes a comutational bottleneck when many choices are added. We have develoed runing heuristics to restrict the number of cuts considered for each node, but extensions of the techniques roosed in [4] to handle choices would be useful. A general limitation of cut-based matching methods is that library gates with many inuts cannot be handled. In ractice this is solved by using a structural matcher just for those gates during the matching hase (as is done in the IBM system [21]). We are currently exloring more seamless techniques to extend Boolean matching to gates with many inuts.

9 Name Delay Area Run-time SIS Baseline B+L+S SIS Baseline B+L+S SIS Baseline B+L+S b b bigkey C C C clma clmb dsi j j j s s s Ratio Table 2: Comarison with SIS on ublic benchmarks. The numbers for Baseline are absolute; those for SIS and C-S are relative to Baseline. Run-time is in seconds on a 1.6GHz Intel lato. Name DAG Maer GrahMa B+L+S Area Delay Area Delay Area Delay ex ex ex ex ex ex ex ex ex ex ex ex avg Table 3: Comarison with the other maers on industrial benchmarks. Name DAG Maer Baseline Area Delay Run-time Area Delay Run-time ex ex ex ex ex ex avg Table 4: Comarison on large industrial circuits.

10 The exhaustive nature of suergate generation (as resented in Section 5.2) is inefficient since (i) the generated functions may not correlate well with the actual cut functions in the circuits, and (ii) the same function may be generated multile times. It would be interesting to exlore methods for guided suergate generation where more comutational effort is invested in finding the suergates for the frequently occurring cut functions. This suggests the ossibility of a maing rocedure that learns from the revious runs how to guide suergate generation. For our current rototye within a gain-based methodology, sizing and buffering are erformed after maing during hysical synthesis. We lan to extend our maer for use in a flow that combines logical and hysical synthesis. [17] MVSIS Grou. MVSIS: Multi-Valued Logic Synthesis System. U. C. Berkeley. htt://www-cad.eecs.berkeley.edu/research/mvsis/ [18] A. Mishchenko, S. Chatterjee and R. Brayton, Fraigs: Integrated verification and synthesis, Tech. Reort., Det. of EECS, U. C. Berkeley, htt://www-cad.eecs.berkeley.edu/research/mvsis/ [19] A. Mishchenko, X. Wang, T. Kam, A new enhanced constructive decomosition and maing algorithm, Proc. DAC 03, [20] E. Sentovich, et al. SIS: A system for sequential circuit synthesis, Tech. Re. UCB/ERI, M92/41, ERL, Det. of EECS, U. C. Berkeley, [21] L. Stok, M. A. Iyer, A. J. Sullivan, Wavefront technology maing, Proc. DATE [22] Wu et al., Efficient Boolean matching algorithm for cell libraries, Proc. ICCD 94, ACKNOWLEDGMENTS The first three authors were suorted in art by C2S2, the MARCO Focus Center for Circuit and System Solution, under MARCO contract 2003-CT-888; and in art by California Micro Program along with our industrial sonsors Cadence, Synlicity, Intel, Magma, and Fujitsu. 9. REFERENCES [1] L. Benini, G. DeMicheli, A survey of Boolean matching techniques for library binding, ACM TODAES, Vol. 2, No. 3, July 1997, [2] C.-W. Chang and M. Marek-Sadowska, Who are the alternative wires in your neighborhood? (Alternative wire identification without search), Proc. GLSVLSI 01, [3] J. Cong and Y. Ding, FlowMa: An otimal technology maing algorithm for delay otimization in looku-table based FPGA designs, IEEE Trans. CAD, Vol. 13(1), 1994, [4] J. Cong, C. Wu and Y. Ding, Cut ranking and runing: Enabling a general and efficient FPGA maing solution, Proc. FPGA 99. [5] D. Debnath and T. Sasao, Fast Boolean matching under variable ermutation using reresentative, Proc. ASP-DAC 99, [6] S. Hassoun and T. Sasao, eds., Logic synthesis and verification, Kluwer 2002, Chater 5, Technology maing, [7] U. Hinsberger and R. Kolla, Boolean matching for large libraries, Proc. DAC 98, [8] K. Keutzer, DAGON: Technology binding and local otimizations by DAG matching, Proc. DAC 87, [9] V. N. Kravets and K. A. Sakallah, Constructive library-aware synthesis using symmetries, Proc. DATE 00, [10] A. Kuehlmann, Dynamic transition relation simlification for bounded roerty checking, Proc. ICCAD 04. [11] A. Kuehlmann, V. Paruthi, F. Krohm, and M. K. Ganai, Robust Boolean reasoning for equivalence checking and functional roerty verification, IEEE Trans. CAD, Vol. 21(12), 2002, [12] Y. Kukimoto, R. K. Brayton, P. Sawkar, Delay-otimal technology maing by DAG covering, Proc. DAC 98, [13] E. Lehman, Y. Watanabe, J. Grodstein, and H. Harkness, Logic decomosition during technology maing, IEEE Trans. CAD, 16(8), 1997, [14] F. Lu, L. Wang, K. Cheng, R. Huang. A circuit SAT solver with signal correlation guided learning, Proc. DATE 03, [15] F. Lu, L. Wang, K. Cheng, J. Moondanos and Z. Hanna, A signal correlation guided ATPG solver and its alications for solving difficult industrial cases, Proc. DAC 03, [16] V. Manohararajah, S. D. Brown, Z. G. Vranesic, Heuristics for area minimization in LUT-based FPGA technology maing, Proc. IWLS 04.

Reducing Structural Bias in Technology Mapping

Reducing Structural Bias in Technology Mapping Reducing Structural Bias in Technology Mapping S. Chatterjee A. Mishchenko R. Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu X. Wang T. Kam Strategic CAD Labs Intel

More information

Shuigeng Zhou. May 18, 2016 School of Computer Science Fudan University

Shuigeng Zhou. May 18, 2016 School of Computer Science Fudan University Query Processing Shuigeng Zhou May 18, 2016 School of Comuter Science Fudan University Overview Outline Measures of Query Cost Selection Oeration Sorting Join Oeration Other Oerations Evaluation of Exressions

More information

A New Enhanced Approach to Technology Mapping

A New Enhanced Approach to Technology Mapping A New Enhanced Approach to Technology Mapping Alan Mishchenko Satrajit Chatterjee Robert Brayton Xinning Wang Timothy Kam Department of EECS Strategic CAD Labs University of California, Berkeley Intel

More information

Submission. Verifying Properties Using Sequential ATPG

Submission. Verifying Properties Using Sequential ATPG Verifying Proerties Using Sequential ATPG Jacob A. Abraham and Vivekananda M. Vedula Comuter Engineering Research Center The University of Texas at Austin Austin, TX 78712 jaa, vivek @cerc.utexas.edu Daniel

More information

Improvements to Technology Mapping for LUT-Based FPGAs

Improvements to Technology Mapping for LUT-Based FPGAs Improvements to Technology Mapping for LUT-Based FPGAs Alan Mishchenko Satrajit Chatterjee Robert Brayton Department of EECS, University of California, Berkeley {alanmi, satrajit, brayton}@eecs.berkeley.edu

More information

AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY

AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY Andrew Lam 1, Steven J.E. Wilton 1, Phili Leong 2, Wayne Luk 3 1 Elec. and Com. Engineering 2 Comuter Science

More information

AUTOMATIC GENERATION OF HIGH THROUGHPUT ENERGY EFFICIENT STREAMING ARCHITECTURES FOR ARBITRARY FIXED PERMUTATIONS. Ren Chen and Viktor K.

AUTOMATIC GENERATION OF HIGH THROUGHPUT ENERGY EFFICIENT STREAMING ARCHITECTURES FOR ARBITRARY FIXED PERMUTATIONS. Ren Chen and Viktor K. inuts er clock cycle Streaming ermutation oututs er clock cycle AUTOMATIC GENERATION OF HIGH THROUGHPUT ENERGY EFFICIENT STREAMING ARCHITECTURES FOR ARBITRARY FIXED PERMUTATIONS Ren Chen and Viktor K.

More information

Hardware-Accelerated Formal Verification

Hardware-Accelerated Formal Verification Hardare-Accelerated Formal Verification Hiroaki Yoshida, Satoshi Morishita 3 Masahiro Fujita,. VLSI Design and Education Center (VDEC), University of Tokyo. CREST, Jaan Science and Technology Agency 3.

More information

Randomized algorithms: Two examples and Yao s Minimax Principle

Randomized algorithms: Two examples and Yao s Minimax Principle Randomized algorithms: Two examles and Yao s Minimax Princile Maximum Satisfiability Consider the roblem Maximum Satisfiability (MAX-SAT). Bring your knowledge u-to-date on the Satisfiability roblem. Maximum

More information

Equality-Based Translation Validator for LLVM

Equality-Based Translation Validator for LLVM Equality-Based Translation Validator for LLVM Michael Ste, Ross Tate, and Sorin Lerner University of California, San Diego {mste,rtate,lerner@cs.ucsd.edu Abstract. We udated our Peggy tool, reviously resented

More information

Sensitivity Analysis for an Optimal Routing Policy in an Ad Hoc Wireless Network

Sensitivity Analysis for an Optimal Routing Policy in an Ad Hoc Wireless Network 1 Sensitivity Analysis for an Otimal Routing Policy in an Ad Hoc Wireless Network Tara Javidi and Demosthenis Teneketzis Deartment of Electrical Engineering and Comuter Science University of Michigan Ann

More information

Improved heuristics for the single machine scheduling problem with linear early and quadratic tardy penalties

Improved heuristics for the single machine scheduling problem with linear early and quadratic tardy penalties Imroved heuristics for the single machine scheduling roblem with linear early and quadratic tardy enalties Jorge M. S. Valente* LIAAD INESC Porto LA, Faculdade de Economia, Universidade do Porto Postal

More information

Factor Cuts. Satrajit Chatterjee Alan Mishchenko Robert Brayton ABSTRACT

Factor Cuts. Satrajit Chatterjee Alan Mishchenko Robert Brayton ABSTRACT Factor Cuts Satrajit Chatterjee Alan Mishchenko Robert Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu ABSTRACT Enumeration of bounded size cuts is an important

More information

A Model-Adaptable MOSFET Parameter Extraction System

A Model-Adaptable MOSFET Parameter Extraction System A Model-Adatable MOSFET Parameter Extraction System Masaki Kondo Hidetoshi Onodera Keikichi Tamaru Deartment of Electronics Faculty of Engineering, Kyoto University Kyoto 66-1, JAPAN Tel: +81-7-73-313

More information

Efficient Parallel Hierarchical Clustering

Efficient Parallel Hierarchical Clustering Efficient Parallel Hierarchical Clustering Manoranjan Dash 1,SimonaPetrutiu, and Peter Scheuermann 1 Deartment of Information Systems, School of Comuter Engineering, Nanyang Technological University, Singaore

More information

Identity-sensitive Points-to Analysis for the Dynamic Behavior of JavaScript Objects

Identity-sensitive Points-to Analysis for the Dynamic Behavior of JavaScript Objects Identity-sensitive Points-to Analysis for the Dynamic Behavior of JavaScrit Objects Shiyi Wei and Barbara G. Ryder Deartment of Comuter Science, Virginia Tech, Blacksburg, VA, USA. {wei,ryder}@cs.vt.edu

More information

Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data

Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data Efficient Processing of To-k Dominating Queries on Multi-Dimensional Data Man Lung Yiu Deartment of Comuter Science Aalborg University DK-922 Aalborg, Denmark mly@cs.aau.dk Nikos Mamoulis Deartment of

More information

Integrating Logic Synthesis, Technology Mapping, and Retiming

Integrating Logic Synthesis, Technology Mapping, and Retiming Integrating Logic Synthesis, Technology Mapping, and Retiming Alan Mishchenko Satrajit Chatterjee Robert Brayton Department of EECS University of California, Berkeley Berkeley, CA 94720 {alanmi, satrajit,

More information

IMS Network Deployment Cost Optimization Based on Flow-Based Traffic Model

IMS Network Deployment Cost Optimization Based on Flow-Based Traffic Model IMS Network Deloyment Cost Otimization Based on Flow-Based Traffic Model Jie Xiao, Changcheng Huang and James Yan Deartment of Systems and Comuter Engineering, Carleton University, Ottawa, Canada {jiexiao,

More information

Multicast in Wormhole-Switched Torus Networks using Edge-Disjoint Spanning Trees 1

Multicast in Wormhole-Switched Torus Networks using Edge-Disjoint Spanning Trees 1 Multicast in Wormhole-Switched Torus Networks using Edge-Disjoint Sanning Trees 1 Honge Wang y and Douglas M. Blough z y Myricom Inc., 325 N. Santa Anita Ave., Arcadia, CA 916, z School of Electrical and

More information

PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS

PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS Kevin Miller, Vivian Lin, and Rui Zhang Grou ID: 5 1. INTRODUCTION The roblem we are trying to solve is redicting future links or recovering missing links

More information

A BICRITERION STEINER TREE PROBLEM ON GRAPH. Mirko VUJO[EVI], Milan STANOJEVI] 1. INTRODUCTION

A BICRITERION STEINER TREE PROBLEM ON GRAPH. Mirko VUJO[EVI], Milan STANOJEVI] 1. INTRODUCTION Yugoslav Journal of Oerations Research (00), umber, 5- A BICRITERIO STEIER TREE PROBLEM O GRAPH Mirko VUJO[EVI], Milan STAOJEVI] Laboratory for Oerational Research, Faculty of Organizational Sciences University

More information

An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2

An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2 An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2 Mingliang Chen 1, Weiyao Lin 1*, Xiaozhen Zheng 2 1 Deartment of Electronic Engineering, Shanghai Jiao Tong University, China

More information

On Resolution Proofs for Combinational Equivalence Checking

On Resolution Proofs for Combinational Equivalence Checking On Resolution Proofs for Combinational Equivalence Checking Satrajit Chatterjee Alan Mishchenko Robert Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu Andreas Kuehlmann

More information

EE678 Application Presentation Content Based Image Retrieval Using Wavelets

EE678 Application Presentation Content Based Image Retrieval Using Wavelets EE678 Alication Presentation Content Based Image Retrieval Using Wavelets Grou Members: Megha Pandey megha@ee. iitb.ac.in 02d07006 Gaurav Boob gb@ee.iitb.ac.in 02d07008 Abstract: We focus here on an effective

More information

Extracting Optimal Paths from Roadmaps for Motion Planning

Extracting Optimal Paths from Roadmaps for Motion Planning Extracting Otimal Paths from Roadmas for Motion Planning Jinsuck Kim Roger A. Pearce Nancy M. Amato Deartment of Comuter Science Texas A&M University College Station, TX 843 jinsuckk,ra231,amato @cs.tamu.edu

More information

OMNI: An Efficient Overlay Multicast. Infrastructure for Real-time Applications

OMNI: An Efficient Overlay Multicast. Infrastructure for Real-time Applications OMNI: An Efficient Overlay Multicast Infrastructure for Real-time Alications Suman Banerjee, Christoher Kommareddy, Koushik Kar, Bobby Bhattacharjee, Samir Khuller Abstract We consider an overlay architecture

More information

Integrating Logic Synthesis, Technology Mapping, and Retiming

Integrating Logic Synthesis, Technology Mapping, and Retiming Integrating Logic Synthesis, Technology Mapping, and Retiming Alan Mishchenko Satrajit Chatterjee Jie-Hong Jiang Robert Brayton Department of Electrical Engineering and Computer Sciences University of

More information

SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation

SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation To aear in IEEE VLSI Test Symosium, 1997 SITFIRE: Scalable arallel Algorithms for Test Set artitioned Fault Simulation Dili Krishnaswamy y Elizabeth M. Rudnick y Janak H. atel y rithviraj Banerjee z y

More information

An Efficient Video Program Delivery algorithm in Tree Networks*

An Efficient Video Program Delivery algorithm in Tree Networks* 3rd International Symosium on Parallel Architectures, Algorithms and Programming An Efficient Video Program Delivery algorithm in Tree Networks* Fenghang Yin 1 Hong Shen 1,2,** 1 Deartment of Comuter Science,

More information

Fast Boolean Matching for Small Practical Functions

Fast Boolean Matching for Small Practical Functions Fast Boolean Matching for Small Practical Functions Zheng Huang Lingli Wang Yakov Nasikovskiy Alan Mishchenko State Key Lab of ASIC and System Computer Science Department Department of EECS Fudan University,

More information

A DEA-bases Approach for Multi-objective Design of Attribute Acceptance Sampling Plans

A DEA-bases Approach for Multi-objective Design of Attribute Acceptance Sampling Plans Available online at htt://ijdea.srbiau.ac.ir Int. J. Data Enveloment Analysis (ISSN 2345-458X) Vol.5, No.2, Year 2017 Article ID IJDEA-00422, 12 ages Research Article International Journal of Data Enveloment

More information

Learning Motion Patterns in Crowded Scenes Using Motion Flow Field

Learning Motion Patterns in Crowded Scenes Using Motion Flow Field Learning Motion Patterns in Crowded Scenes Using Motion Flow Field Min Hu, Saad Ali and Mubarak Shah Comuter Vision Lab, University of Central Florida {mhu,sali,shah}@eecs.ucf.edu Abstract Learning tyical

More information

Space-efficient Region Filling in Raster Graphics

Space-efficient Region Filling in Raster Graphics "The Visual Comuter: An International Journal of Comuter Grahics" (submitted July 13, 1992; revised December 7, 1992; acceted in Aril 16, 1993) Sace-efficient Region Filling in Raster Grahics Dominik Henrich

More information

Leak Detection Modeling and Simulation for Oil Pipeline with Artificial Intelligence Method

Leak Detection Modeling and Simulation for Oil Pipeline with Artificial Intelligence Method ITB J. Eng. Sci. Vol. 39 B, No. 1, 007, 1-19 1 Leak Detection Modeling and Simulation for Oil Pieline with Artificial Intelligence Method Pudjo Sukarno 1, Kuntjoro Adji Sidarto, Amoranto Trisnobudi 3,

More information

Stereo Disparity Estimation in Moment Space

Stereo Disparity Estimation in Moment Space Stereo Disarity Estimation in oment Sace Angeline Pang Faculty of Information Technology, ultimedia University, 63 Cyberjaya, alaysia. angeline.ang@mmu.edu.my R. ukundan Deartment of Comuter Science, University

More information

Representations of Terms Representations of Boolean Networks

Representations of Terms Representations of Boolean Networks Representations of Terms Representations of Boolean Networks Logic Circuits Design Seminars WS2010/2011, Lecture 4 Ing. Petr Fišer, Ph.D. Department of Digital Design Faculty of Information Technology

More information

10. Parallel Methods for Data Sorting

10. Parallel Methods for Data Sorting 10. Parallel Methods for Data Sorting 10. Parallel Methods for Data Sorting... 1 10.1. Parallelizing Princiles... 10.. Scaling Parallel Comutations... 10.3. Bubble Sort...3 10.3.1. Sequential Algorithm...3

More information

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model.

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model. U.C. Berkeley CS273: Parallel and Distributed Theory Lecture 18 Professor Satish Rao Lecturer: Satish Rao Last revised Scribe so far: Satish Rao (following revious lecture notes quite closely. Lecture

More information

A Parallel Algorithm for Constructing Obstacle-Avoiding Rectilinear Steiner Minimal Trees on Multi-Core Systems

A Parallel Algorithm for Constructing Obstacle-Avoiding Rectilinear Steiner Minimal Trees on Multi-Core Systems A Parallel Algorithm for Constructing Obstacle-Avoiding Rectilinear Steiner Minimal Trees on Multi-Core Systems Cheng-Yuan Chang and I-Lun Tseng Deartment of Comuter Science and Engineering Yuan Ze University,

More information

Collective communication: theory, practice, and experience

Collective communication: theory, practice, and experience CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Comutat.: Pract. Exer. 2007; 19:1749 1783 Published online 5 July 2007 in Wiley InterScience (www.interscience.wiley.com)..1206 Collective

More information

Truth Trees. Truth Tree Fundamentals

Truth Trees. Truth Tree Fundamentals Truth Trees 1 True Tree Fundamentals 2 Testing Grous of Statements for Consistency 3 Testing Arguments in Proositional Logic 4 Proving Invalidity in Predicate Logic Answers to Selected Exercises Truth

More information

Cross products. p 2 p. p p1 p2. p 1. Line segments The convex combination of two distinct points p1 ( x1, such that for some real number with 0 1,

Cross products. p 2 p. p p1 p2. p 1. Line segments The convex combination of two distinct points p1 ( x1, such that for some real number with 0 1, CHAPTER 33 Comutational Geometry Is the branch of comuter science that studies algorithms for solving geometric roblems. Has alications in many fields, including comuter grahics robotics, VLSI design comuter

More information

Integrating Logic Synthesis, Technology Mapping, and Retiming

Integrating Logic Synthesis, Technology Mapping, and Retiming Integrating Logic Synthesis, Technology Mapping, and Retiming Alan Mishchenko Satrajit Chatterjee Robert Brayton Department of EECS University of California, Berkeley Berkeley, CA 94720 {alanmi, satrajit,

More information

Lecture 2: Fixed-Radius Near Neighbors and Geometric Basics

Lecture 2: Fixed-Radius Near Neighbors and Geometric Basics structure arises in many alications of geometry. The dual structure, called a Delaunay triangulation also has many interesting roerties. Figure 3: Voronoi diagram and Delaunay triangulation. Search: Geometric

More information

Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming

Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming Weimin Chen, Volker Turau TR-93-047 August, 1993 Abstract This aer introduces a new technique for source-to-source code

More information

A Study of Protocols for Low-Latency Video Transport over the Internet

A Study of Protocols for Low-Latency Video Transport over the Internet A Study of Protocols for Low-Latency Video Transort over the Internet Ciro A. Noronha, Ph.D. Cobalt Digital Santa Clara, CA ciro.noronha@cobaltdigital.com Juliana W. Noronha University of California, Davis

More information

ABC basics (compilation from different articles)

ABC basics (compilation from different articles) 1. AIG construction 2. AIG optimization 3. Technology mapping ABC basics (compilation from different articles) 1. BACKGROUND An And-Inverter Graph (AIG) is a directed acyclic graph (DAG), in which a node

More information

Lecture 8: Orthogonal Range Searching

Lecture 8: Orthogonal Range Searching CPS234 Comutational Geometry Setember 22nd, 2005 Lecture 8: Orthogonal Range Searching Lecturer: Pankaj K. Agarwal Scribe: Mason F. Matthews 8.1 Range Searching The general roblem of range searching is

More information

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism Erlin Yao, Mingyu Chen, Rui Wang, Wenli Zhang, Guangming Tan Key Laboratory of Comuter System and Architecture Institute

More information

Experimental Comparison of Shortest Path Approaches for Timetable Information

Experimental Comparison of Shortest Path Approaches for Timetable Information Exerimental Comarison of Shortest Path roaches for Timetable Information Evangelia Pyrga Frank Schulz Dorothea Wagner Christos Zaroliagis bstract We consider two aroaches that model timetable information

More information

An empirical analysis of loopy belief propagation in three topologies: grids, small-world networks and random graphs

An empirical analysis of loopy belief propagation in three topologies: grids, small-world networks and random graphs An emirical analysis of looy belief roagation in three toologies: grids, small-world networks and random grahs R. Santana, A. Mendiburu and J. A. Lozano Intelligent Systems Grou Deartment of Comuter Science

More information

Image Segmentation Using Topological Persistence

Image Segmentation Using Topological Persistence Image Segmentation Using Toological Persistence David Letscher and Jason Fritts Saint Louis University Deartment of Mathematics and Comuter Science {letscher, jfritts}@slu.edu Abstract. This aer resents

More information

FRAIGs: A Unifying Representation for Logic Synthesis and Verification

FRAIGs: A Unifying Representation for Logic Synthesis and Verification FRAIGs: A Unifying Representation for Logic Synthesis and Verification Alan Mishchenko, Satrajit Chatterjee, Roland Jiang, Robert Brayton Department of EECS, University of California, Berkeley {alanmi,

More information

A Symmetric FHE Scheme Based on Linear Algebra

A Symmetric FHE Scheme Based on Linear Algebra A Symmetric FHE Scheme Based on Linear Algebra Iti Sharma University College of Engineering, Comuter Science Deartment. itisharma.uce@gmail.com Abstract FHE is considered to be Holy Grail of cloud comuting.

More information

Theoretical Analysis of Graphcut Textures

Theoretical Analysis of Graphcut Textures Theoretical Analysis o Grahcut Textures Xuejie Qin Yee-Hong Yang {xu yang}@cs.ualberta.ca Deartment o omuting Science University o Alberta Abstract Since the aer was ublished in SIGGRAPH 2003 the grahcut

More information

Directed File Transfer Scheduling

Directed File Transfer Scheduling Directed File Transfer Scheduling Weizhen Mao Deartment of Comuter Science The College of William and Mary Williamsburg, Virginia 387-8795 wm@cs.wm.edu Abstract The file transfer scheduling roblem was

More information

Model-Based Annotation of Online Handwritten Datasets

Model-Based Annotation of Online Handwritten Datasets Model-Based Annotation of Online Handwritten Datasets Anand Kumar, A. Balasubramanian, Anoo Namboodiri and C.V. Jawahar Center for Visual Information Technology, International Institute of Information

More information

/$ IEEE

/$ IEEE 240 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2007 Improvements to Technology Mapping for LUT-Based FPGAs Alan Mishchenko, Member, IEEE, Satrajit

More information

GEOMETRIC CONSTRAINT SOLVING IN < 2 AND < 3. Department of Computer Sciences, Purdue University. and PAMELA J. VERMEER

GEOMETRIC CONSTRAINT SOLVING IN < 2 AND < 3. Department of Computer Sciences, Purdue University. and PAMELA J. VERMEER GEOMETRIC CONSTRAINT SOLVING IN < AND < 3 CHRISTOPH M. HOFFMANN Deartment of Comuter Sciences, Purdue University West Lafayette, Indiana 47907-1398, USA and PAMELA J. VERMEER Deartment of Comuter Sciences,

More information

Autonomic Physical Database Design - From Indexing to Multidimensional Clustering

Autonomic Physical Database Design - From Indexing to Multidimensional Clustering Autonomic Physical Database Design - From Indexing to Multidimensional Clustering Stehan Baumann, Kai-Uwe Sattler Databases and Information Systems Grou Technische Universität Ilmenau, Ilmenau, Germany

More information

AN INTEGER LINEAR MODEL FOR GENERAL ARC ROUTING PROBLEMS

AN INTEGER LINEAR MODEL FOR GENERAL ARC ROUTING PROBLEMS AN INTEGER LINEAR MODEL FOR GENERAL ARC ROUTING PROBLEMS Philie LACOMME, Christian PRINS, Wahiba RAMDANE-CHERIF Université de Technologie de Troyes, Laboratoire d Otimisation des Systèmes Industriels (LOSI)

More information

On Resolution Proofs for Combinational Equivalence

On Resolution Proofs for Combinational Equivalence 33.4 On Resolution Proofs for Combinational Equivalence Satrajit Chatterjee Alan Mishchenko Robert Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu Andreas Kuehlmann

More information

Collective Communication: Theory, Practice, and Experience. FLAME Working Note #22

Collective Communication: Theory, Practice, and Experience. FLAME Working Note #22 Collective Communication: Theory, Practice, and Exerience FLAME Working Note # Ernie Chan Marcel Heimlich Avi Purkayastha Robert van de Geijn Setember, 6 Abstract We discuss the design and high-erformance

More information

Single character type identification

Single character type identification Single character tye identification Yefeng Zheng*, Changsong Liu, Xiaoqing Ding Deartment of Electronic Engineering, Tsinghua University Beijing 100084, P.R. China ABSTRACT Different character recognition

More information

An Indexing Framework for Structured P2P Systems

An Indexing Framework for Structured P2P Systems An Indexing Framework for Structured P2P Systems Adina Crainiceanu Prakash Linga Ashwin Machanavajjhala Johannes Gehrke Carl Lagoze Jayavel Shanmugasundaram Deartment of Comuter Science, Cornell University

More information

Matlab Virtual Reality Simulations for optimizations and rapid prototyping of flexible lines systems

Matlab Virtual Reality Simulations for optimizations and rapid prototyping of flexible lines systems Matlab Virtual Reality Simulations for otimizations and raid rototying of flexible lines systems VAMVU PETRE, BARBU CAMELIA, POP MARIA Deartment of Automation, Comuters, Electrical Engineering and Energetics

More information

A Novel Iris Segmentation Method for Hand-Held Capture Device

A Novel Iris Segmentation Method for Hand-Held Capture Device A Novel Iris Segmentation Method for Hand-Held Cature Device XiaoFu He and PengFei Shi Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, China {xfhe,

More information

Optimization of Collective Communication Operations in MPICH

Optimization of Collective Communication Operations in MPICH To be ublished in the International Journal of High Performance Comuting Alications, 5. c Sage Publications. Otimization of Collective Communication Oerations in MPICH Rajeev Thakur Rolf Rabenseifner William

More information

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal International Journal of Information and Electronics Engineering, Vol. 1, No. 1, July 011 An Efficient VLSI Architecture for Adative Rank Order Filter for Image Noise Removal M. C Hanumantharaju, M. Ravishankar,

More information

Sensitivity of multi-product two-stage economic lotsizing models and their dependency on change-over and product cost ratio s

Sensitivity of multi-product two-stage economic lotsizing models and their dependency on change-over and product cost ratio s Sensitivity two stage EOQ model 1 Sensitivity of multi-roduct two-stage economic lotsizing models and their deendency on change-over and roduct cost ratio s Frank Van den broecke, El-Houssaine Aghezzaf,

More information

Combinational and Sequential Mapping with Priority Cuts

Combinational and Sequential Mapping with Priority Cuts Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton Department of EECS, University of California, Berkeley {alanmi, smcho, satrajit, brayton@eecs.berkeley.edu

More information

Interactive Image Segmentation

Interactive Image Segmentation Interactive Image Segmentation Fahim Mannan (260 266 294) Abstract This reort resents the roject work done based on Boykov and Jolly s interactive grah cuts based N-D image segmentation algorithm([1]).

More information

A Toolbox for Counter-Example Analysis and Optimization

A Toolbox for Counter-Example Analysis and Optimization A Toolbox for Counter-Example Analysis and Optimization Alan Mishchenko Niklas Een Robert Brayton Department of EECS, University of California, Berkeley {alanmi, een, brayton}@eecs.berkeley.edu Abstract

More information

Using Rational Numbers and Parallel Computing to Efficiently Avoid Round-off Errors on Map Simplification

Using Rational Numbers and Parallel Computing to Efficiently Avoid Round-off Errors on Map Simplification Using Rational Numbers and Parallel Comuting to Efficiently Avoid Round-off Errors on Ma Simlification Maurício G. Grui 1, Salles V. G. de Magalhães 1,2, Marcus V. A. Andrade 1, W. Randolh Franklin 2,

More information

SEARCH ENGINE MANAGEMENT

SEARCH ENGINE MANAGEMENT e-issn 2455 1392 Volume 2 Issue 5, May 2016. 254 259 Scientific Journal Imact Factor : 3.468 htt://www.ijcter.com SEARCH ENGINE MANAGEMENT Abhinav Sinha Kalinga Institute of Industrial Technology, Bhubaneswar,

More information

OPTIMIZATION OF MAPPING ONTO A FLEXIBLE LOW-POWER ELECTRONIC FABRIC ARCHITECTURE

OPTIMIZATION OF MAPPING ONTO A FLEXIBLE LOW-POWER ELECTRONIC FABRIC ARCHITECTURE OPTIMIZATION OF MAPPING ONTO A FLEXIBLE LOW-POWER ELECTRONIC FABRIC ARCHITECTURE by Mustafa Baz B.S., Middle East Technical University, 004 M.S., University of Pittsburgh, 005 Submitted to the Graduate

More information

Parallel Construction of Multidimensional Binary Search Trees. Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka

Parallel Construction of Multidimensional Binary Search Trees. Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka Parallel Construction of Multidimensional Binary Search Trees Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka School of CIS and School of CISE Northeast Parallel Architectures Center Syracuse

More information

Parametric Optimization in WEDM of WC-Co Composite by Neuro-Genetic Technique

Parametric Optimization in WEDM of WC-Co Composite by Neuro-Genetic Technique Parametric Otimization in WEDM of WC-Co Comosite by Neuro-Genetic Technique P. Saha*, P. Saha, and S. K. Pal Abstract The resent work does a multi-objective otimization in wire electro-discharge machining

More information

Distributed Estimation from Relative Measurements in Sensor Networks

Distributed Estimation from Relative Measurements in Sensor Networks Distributed Estimation from Relative Measurements in Sensor Networks #Prabir Barooah and João P. Hesanha Abstract We consider the roblem of estimating vectorvalued variables from noisy relative measurements.

More information

Introduction to Parallel Algorithms

Introduction to Parallel Algorithms CS 1762 Fall, 2011 1 Introduction to Parallel Algorithms Introduction to Parallel Algorithms ECE 1762 Algorithms and Data Structures Fall Semester, 2011 1 Preliminaries Since the early 1990s, there has

More information

A Metaheuristic Scheduler for Time Division Multiplexed Network-on-Chip

A Metaheuristic Scheduler for Time Division Multiplexed Network-on-Chip Downloaded from orbit.dtu.dk on: Jan 25, 2019 A Metaheuristic Scheduler for Time Division Multilexed Network-on-Chi Sørensen, Rasmus Bo; Sarsø, Jens; Pedersen, Mark Ruvald; Højgaard, Jasur Publication

More information

Improving Trust Estimates in Planning Domains with Rare Failure Events

Improving Trust Estimates in Planning Domains with Rare Failure Events Imroving Trust Estimates in Planning Domains with Rare Failure Events Colin M. Potts and Kurt D. Krebsbach Det. of Mathematics and Comuter Science Lawrence University Aleton, Wisconsin 54911 USA {colin.m.otts,

More information

Near-Optimal Routing Lookups with Bounded Worst Case Performance

Near-Optimal Routing Lookups with Bounded Worst Case Performance Near-Otimal Routing Lookus with Bounded Worst Case Performance Pankaj Guta Balaji Prabhakar Stehen Boyd Deartments of Electrical Engineering and Comuter Science Stanford University CA 9430 ankaj@stanfordedu

More information

arxiv: v1 [cs.mm] 18 Jan 2016

arxiv: v1 [cs.mm] 18 Jan 2016 Lossless Intra Coding in with 3-ta Filters Saeed R. Alvar a, Fatih Kamisli a a Deartment of Electrical and Electronics Engineering, Middle East Technical University, Turkey arxiv:1601.04473v1 [cs.mm] 18

More information

Lecture 3: Geometric Algorithms(Convex sets, Divide & Conquer Algo.)

Lecture 3: Geometric Algorithms(Convex sets, Divide & Conquer Algo.) Advanced Algorithms Fall 2015 Lecture 3: Geometric Algorithms(Convex sets, Divide & Conuer Algo.) Faculty: K.R. Chowdhary : Professor of CS Disclaimer: These notes have not been subjected to the usual

More information

Efficient Sequence Generator Mining and its Application in Classification

Efficient Sequence Generator Mining and its Application in Classification Efficient Sequence Generator Mining and its Alication in Classification Chuancong Gao, Jianyong Wang 2, Yukai He 3 and Lizhu Zhou 4 Tsinghua University, Beijing 0084, China {gaocc07, heyk05 3 }@mails.tsinghua.edu.cn,

More information

Generic ILP-Based Approaches for Time-Multiplexed FPGA Partitioning

Generic ILP-Based Approaches for Time-Multiplexed FPGA Partitioning 1266 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 2, NO. 1, OCTOBER 21 Generic ILP-Based Aroaches for Time-Multilexed FPGA Partitioning Guang-Ming Wu, Jai-Ming Lin,

More information

Implementations of Partial Document Ranking Using. Inverted Files. Wai Yee Peter Wong. Dik Lun Lee

Implementations of Partial Document Ranking Using. Inverted Files. Wai Yee Peter Wong. Dik Lun Lee Imlementations of Partial Document Ranking Using Inverted Files Wai Yee Peter Wong Dik Lun Lee Deartment of Comuter and Information Science, Ohio State University, 36 Neil Ave, Columbus, Ohio 4321, U.S.A.

More information

Mitigating the Impact of Decompression Latency in L1 Compressed Data Caches via Prefetching

Mitigating the Impact of Decompression Latency in L1 Compressed Data Caches via Prefetching Mitigating the Imact of Decomression Latency in L1 Comressed Data Caches via Prefetching by Sean Rea A thesis resented to Lakehead University in artial fulfillment of the requirement for the degree of

More information

To appear in IEEE TKDE Title: Efficient Skyline and Top-k Retrieval in Subspaces Keywords: Skyline, Top-k, Subspace, B-tree

To appear in IEEE TKDE Title: Efficient Skyline and Top-k Retrieval in Subspaces Keywords: Skyline, Top-k, Subspace, B-tree To aear in IEEE TKDE Title: Efficient Skyline and To-k Retrieval in Subsaces Keywords: Skyline, To-k, Subsace, B-tree Contact Author: Yufei Tao (taoyf@cse.cuhk.edu.hk) Deartment of Comuter Science and

More information

Fast Distributed Process Creation with the XMOS XS1 Architecture

Fast Distributed Process Creation with the XMOS XS1 Architecture Communicating Process Architectures 20 P.H. Welch et al. (Eds.) IOS Press, 20 c 20 The authors and IOS Press. All rights reserved. Fast Distributed Process Creation with the XMOS XS Architecture James

More information

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing Mikael Taveniku 2,3, Anders Åhlander 1,3, Magnus Jonsson 1 and Bertil Svensson 1,2

More information

CASCH - a Scheduling Algorithm for "High Level"-Synthesis

CASCH - a Scheduling Algorithm for High Level-Synthesis CASCH a Scheduling Algorithm for "High Level"Synthesis P. Gutberlet H. Krämer W. Rosenstiel Comuter Science Research Center at the University of Karlsruhe (FZI) HaidundNeuStr. 1014, 7500 Karlsruhe, F.R.G.

More information

Patterned Wafer Segmentation

Patterned Wafer Segmentation atterned Wafer Segmentation ierrick Bourgeat ab, Fabrice Meriaudeau b, Kenneth W. Tobin a, atrick Gorria b a Oak Ridge National Laboratory,.O.Box 2008, Oak Ridge, TN 37831-6011, USA b Le2i Laboratory Univ.of

More information

Real Time Compression of Triangle Mesh Connectivity

Real Time Compression of Triangle Mesh Connectivity Real Time Comression of Triangle Mesh Connectivity Stefan Gumhold, Wolfgang Straßer WSI/GRIS University of Tübingen Abstract In this aer we introduce a new comressed reresentation for the connectivity

More information

TOPP Probing of Network Links with Large Independent Latencies

TOPP Probing of Network Links with Large Independent Latencies TOPP Probing of Network Links with Large Indeendent Latencies M. Hosseinour, M. J. Tunnicliffe Faculty of Comuting, Information ystems and Mathematics, Kingston University, Kingston-on-Thames, urrey, KT1

More information

Face Recognition Using Legendre Moments

Face Recognition Using Legendre Moments Face Recognition Using Legendre Moments Dr.S.Annadurai 1 A.Saradha Professor & Head of CSE & IT Research scholar in CSE Government College of Technology, Government College of Technology, Coimbatore, Tamilnadu,

More information

A Scalable Parallel Approach for Peptide Identification from Large-scale Mass Spectrometry Data

A Scalable Parallel Approach for Peptide Identification from Large-scale Mass Spectrometry Data 2009 International Conference on Parallel Processing Workshos A Scalable Parallel Aroach for Petide Identification from Large-scale Mass Sectrometry Data Gaurav Kulkarni, Ananth Kalyanaraman School of

More information

Learning Robust Locality Preserving Projection via p-order Minimization

Learning Robust Locality Preserving Projection via p-order Minimization Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Learning Robust Locality Preserving Projection via -Order Minimization Hua Wang, Feiing Nie, Heng Huang Deartment of Electrical

More information