circuit simulation NP-complete? backward retiming forward retiming f(x)

Size: px
Start display at page:

Download "circuit simulation NP-complete? backward retiming forward retiming f(x)"

Transcription

1 Optimal FPGA Mapping and Retiming with Ecient Initial State Comptation Jason Cong and Chang W Department of Compter Science Uniersity of California, Los Angeles, CA 995 Abstract For seqential circits with gien initial states, new eqialent initial states mst be compted for retiming, which nfortnately is NP-hard. In this paper, we present a noel polynomial time algorithm for optimal FPGA mapping with forward retiming to minimize the clock period with ecient initial state comptation. It considers forward retiming, initial state comptation and mapping simltaneosly. Or algorithm enables a new methodology of separating forward retiming from backward retiming. Since we garantee to compte an optimal mapping with forward retiming soltion, backward retiming can be performed as a pre-processing to try to psh ipops to primary inpts with consideration of only initial state comptation. Ths, we can aoid the time-consming iterations between retiming for clock period minimization and initial state comptation. This algorithm compares ery faorably with both conentional approaches of mapping followed by separate retiming and recent approaches of combined mapping with retiming, bt withot consideration of initial state comptation [1], [2]. Or reslts show that or algorithm can redce the clock period by 17.5% oer conentional approaches of separate mapping with retiming. On the other hand, or algorithm can garantee ecient initial state comptation in the mapped and retimed circits with only 2.8% increase in clock period oer the optimal mapping with general retiming soltions, while the initial state comptation for the latter soltions may need prohibitiely long rntime for designs with oer seeral hndred ipops. I. Introdction Retiming is a well known optimization techniqe for seqential circits, originally proposed by Leiserson and Saxe [3], to redce the clock period by repositioning ipops (s). Many research reslts hae been pblished in literatre recently on combining retiming with logic optimization, circit partitioning and technology mapping [4], [5], [6]. For lookp-table (LUT) based FPGA designs, a signicant adancement was made by Pan and Li [1] on optimal simltaneos technology mapping with retiming for clock period minimization. Later on, Cong and W [7], [2] proposed a mch more ecient optimal mapping with retiming algorithm which achiees 1 times speedp oer [1]. Howeer, limited progress has been made on eqialent initial state comptation for retiming. A general retiming procedre sally inoles two kinds i 1 i 2 in f(x) forward retiming circit simlation y = f(i 1 ; i 2 ; ::; i n ) f(x) y backward retiming 9X; s:t: f (X) = y? NP-complete?? f(x) Fig. 1. Retiming and initial state comptation. Each small rectangle represents a. of moements. Forward retiming (FRT) moes s from inpts of gates to their otpts, while backward retiming moing s in the opposite direction. As shown in Figre 1, the eqialent initial state of forward retiming can be compted easily sing circit simlation. The eqialent initial state for backward retiming, howeer, reqires soling a satisability problem which is NP-complete in general. Toati and Brayton [8] proposed an initial state comptation algorithm based on STG (state transition graph) traersal. Their algorithm can compte eqialent initial states nder the condition that there exists a loop from the initial state to itself in the STG of the original circit. Howeer, nless proided externally, to determine the existence of sch a loop needs to search the STG which may hae an exponential nmber of nodes. Althogh representing the reset logic explicitly can garantee the existence of sch a loop and aoid the STG traersal, the extra cost of the reset logic separated from s after retiming can be high. 1 Frthermore, the reset logic may restrict possible retiming of the original circit, ths, redce the potential of retiming optimization. A recent paper by Singhal et al. [9] showed that by retiming the reset logic together with s will not aect the possibility of retiming and there is no extra cost on the area of the circit. Bt their approach is applicable for only forward retiming, bt not general retiming. Seeral heristics for initial state comptation of a gien retiming based on ATPG (atomatic test pattern generation) techniqes [1], [11], [12] hae been proposed recently. For example, L. Stok et al. [11] proposed to redce the initial state comptation eort by minimizing the maximm retiming ale which represents the maximm nmber of s to be moed backward across eery gate. ATPG based jstication was sed to compte eqialent initial states 1 In sal case, the reset logic is bilt in each. Bt we can represent each as a generic pls reset logic. After retiming, howeer, the reset logic may be separated from the generic s and cannot be re-packed into those s. As a reslt, extra LUTs may need to implement the reset logic.? 1

2 for backward retiming. Possible iterations of retiming and initial state comptation may still be needed. Howeer, ATPG itself is also NP-complete [13], [14]. Frthermore, we hae obsered that minimizing the maximm retiming ale as formlated in [11] might not always lead to a simpler initial state comptation problem. Or experience shows that nless a pre forward retiming soltion can be fond, the eort of initial state comptation depends on the nmber of nodes inoled in backward retiming, instead of the maximm retiming ale. Maheshwari and Sapatnekar [12] proposed to compte an pper-bond on each node's retiming ale based on ATPG techniqes sch that the eqialent initial state can be compted for a retiming within this bond. In general, the larger ales of the pper-bonds, the better the retiming soltion. Howeer, to increase one node's retiming pper-bond ale may need to redce another node's retiming pper-bond ale and it is hard to trade o the ales of dierent nodes to achiee the best possible soltion. As a reslt, the athors proposed to perform retiming for many sets of pper-bonds and choose the best possible reslts. Clearly, this can be time-consming in practice. To better nderstand the diclty of the initial state comptation problem and the capability of existing initial state comptation algorithms, we tested the retiming package in SIS [15] which compted initial states for retiming based on the algorithm in [8]. For circit s in the MCNC benchmark site with 135 s and 1293 two-inpt gates, SIS cold not nish the initial state comptation for an optimal retiming after rnning 4 hors on a Sn UL- TRA2 workstation with 256MB memory. We see that the initial state comptation problem being a major obstacle of preenting wide-spread se of the retiming techniqe in practice. One common problem of the existing approaches is lacking consideration of whether an eqialent initial state can be compted when compting a retiming. When failing to nd an eqialent initial state for a retiming, most algorithms need to backtrack to compte another retiming with almost no assrance if an eqialent initial state can be compted for the new retiming. As a reslt, many iterations between retiming and initial state comptation may be needed and the comptation time can be ery long. In order to aoid high time complexity of initial state comptation for general retiming, we propose to perform LUT-based FPGA mapping with forward retiming. Or main contribtion is the deelopment of the rst polynomial time optimal algorithm for FPGA mapping with forward retiming, which has immediate benet of garanteed eqialent initial state comptation in linear time. With this algorithm, we can try to psh s back to primary inpts as mch as possible to enlarge the soltion space of mapping with forward retiming and achiee better reslts. One important adantage of or approach is that we only need to focs on initial state comptation withot considering the eect of retiming on clock period minimization dring the pre-processing of backward retiming. As a reslt, we can still achiee eectie retiming for circit optimization withot time-consming iterations between retiming and initial state comptation, slow STG traersal, or any extra reset logic in existing approaches [12], [1], [16], [8]. Or algorithm compares ery faorably with both conentional approaches of separate mapping with retiming and recent approaches of combined mapping with retiming, bt withot initial state comptation [1], [2]. It is also applicable to FPGA deices with pre-xed initial state settings. 2 The remainder of the paper is organized as follows. Section II presents the problem formlation and denitions. Or algorithm is presented in Section III. A postprocessing for FPGAs with pre-xed initial state settings is presented in Section IV. Or experimental reslts are presented in Section V. Or conclsion and ftre work are presented in Section VI. A preliminary ersion of this work was presented at DAC98 [17]. II. Problem Formlation and Definitions Gien a seqential circit, the technology mapping problem for K-LUT based FPGAs is to constrct an eqialent circit consisting of K-LUTs and ipops (s), sch that the two circits generate the same otpt seqence for any inpt seqence, starting from their indiidal initial states, respectiely. The clock period of a circit is the maximm combinational path delay. In this paper, we shall explore the following problem: Problem 1: Gien a seqential circit with initial state s, nd an eqialent LUT circit with initial state s r and minimm clock period nder forward retiming. As in [1], [2], instead of soling the optimization Problem 1 directly, we shall sole its decision ersion and then se binary search for the minimm clock period. Problem 2: Gien a seqential circit with initial state s and target clock period, nd an eqialent LUT circit with initial state s r and clock period of no more than nder forward retiming. In this paper, seqential circits are represented as retiming graphs [3]. The retiming graph G(V; E; W ) of a seqential circit is a directed graph, where V is the set of nodes, E is the set of edges and W is the set of edge weights. Each node in V represents a gate, a primary inpt (PI) or a primary otpt (PO) in the original circit. Each edge e(; ) in E represents a directed connection from a node to a node. The weight w(e) of an edge e is the nmber of s on the connection represented by the edge. The path weight w(p) of a path p is the sm of all edge weights on the path. Under the nit-delay mode, node delay d() is 1 for eery internal node, or for eery PI or PO. Path delay d(p) of a path p is the sm of all node delays on the path. For simplicity, in this paper, we consider the nit-delay model and synchronos seqential circits with single-phase clock and edge-triggered ipops. 3 Depending 2 For example, the initial state of s on Xilinx XC52 FPGAs can only be set to. 3 For more complex delay models, sch as the fanot-based nominal delay model, the FPGA mapping of combinational circits for delay minimization has already been shown to be NP-hard [18]. The problem will be een more diclt if layot information is considered. We think, howeer, the techniqes presented in this paper can 2

3 on the context, we se G to represent both V and E. For a gien retiming, the retiming ale R() of node is the nmber of s moed backward across. A negatie retiming ale of a node means moing s forward across the node. In the remainder of this paper, a retiming is represented by a set of retiming ales fr() j 8 2 Gg, where R() is for eery PI or PO which means that no s moed across either PI or PO. As in [2], for a target clock period, we dene the edge length (denoted length(e)) of an edge e(; ) to be d() P? w(e). The path length (denoted length(p)) is e2p length(e). In an LUT network M, a node's l-ale l M () is dened to be the maximm path length of all paths from PIs to, which represents the node arrial time after retiming. (Here, we assme that eery node is reachable from at least one PI. In cases that there exist nodes not reachable from any PIs, the l-ales of those nodes are not well dened. Extension to the following theory and or algorithm will be presented in x III-F.) It is not diclt to proe that: Theorem 1: Gien an LUT network M and a target clock period, the clock period of M nder forward retiming is no more than, if and only if the l-ales l M () for eery LUT in M. 4 Proof: ()) Let fr() j 8 2 M; R() g be a forward retiming to achiee. After retiming, eery combinational path has delay of no more than. Ths, for any path p : P I ;, the path length after retiming satises the ineqality that length r (p) = d(p)? w r (p) = d(p)? (w(p) + R()). Since R(), the path length in M before retiming length(p) = length r (p) + R() is also no more than. As a reslt, l M () = maxflength(p) j 8p : P I ; g. (() Since l M (), we perform the forward retiming dened as: if is a PI or PO R() = lm () d e? 1 otherwise. We proe that the retiming will redce the clock period of M to. First, we show that this is a legal retiming sch that the edge weight of eery edge e(; ) after retiming is non-negatie. By denition of the l-ale, we hae that l M ()l M ()+length(e)=l M ()?w(e)+1. Withot lose of generality, we assme both and are LUTs. (Notice that they may also be PIs or POs with delay of. The proof for this case is ery similar and is omitted here.) The retimed edge weight satises the ineqality that w r (e) = lm () lm () w(e)+r()?r() = w(e)+(d e?1)?(d e?1) lm ()?w(e)+1 lm () w(e) + d e? d e. Second, we show that w r (p) 1 for any path p : ; with d(p) > to conclde or proof. For sch a path p, since d(p) is an integer, we hae that d(p) +1 and l M ()+length(p) = be extended for more accrate delay models as a heristic. 4 Note that this reslt is similar to Theorem 3 in [1]. The dierence is that both forward and backward retimings were allowed in [1], while only forward retiming is allowed in or case. Conseqently, Theorem 3 in [1] checks only the l-ales of POs, while we need to check that of eery node. l M () + d(p)? 1? w(p) l M (). Therefore, w r (p) = w(p) + R()? R() w(p) + (d l M () + d(p)? 1? w(p) e? 1)?(d l M() e? 1) = d l M() + d(p)? 1 d l M() + = d l M() = 1: e? d l M() e e + 1? d l M() e e? d l M() e Denition 1: Gien a mapping with forward retiming soltion M and a target clock period, the nmber of s moed forward across LUT from each inpt of, denoted r M (), is called the forward retiming ale of in M. The s-ale is dened to be s M () = l M ()? r M (), where l M () is the l-ale of in M. The s-ale represents the node arrial time before forward retiming. The reason of considering the s-ale is that it is easier to compte than the l-ale becase it does not depend on retiming, bt the latter does. According to Theorem 1, we hae that: Corollary 1: A mapping soltion M has a clock period of no more than a gien nder forward retiming, if and only if s M () + r M () for eery LUT root in M. Denition 2: Gien a circit and a target clock period, if there exists a FRT mapping soltion with clock period of no more than nder forward retiming (called a feasible FRT mapping soltion), the s-label S() for node is dened to be the minimm s M () among all feasible FRT mapping soltions and, the r-label R() is dened to be the minimm r M () among all feasible FRT mapping soltions with s M () = S(). The node label pair is dened to be a 2-tple (S(); R()). According to Corollary 1, we can sole Problem 2 by compting the node label pairs and checking if the ineqality of S() + R() holds for eery node. The minimm clock period min of the optimal mapping soltion can be compted with binary search. After getting min and the corresponding label pairs, we constrct a mapping soltion and perform a postprocessing of forward retiming step to achiee min. Minimizing S() is to minimize the longest path delay from PIs to after mapping, ths, redce the clock period after retiming. Minimizing R() is to psh forward as few s as possible for each node to leae the maximm freedom to sbseqent forward retiming step, becase those s pshed forward can no longer be pshed back. On the other hand, minimizing S() may increase R(), becase we need to constrct larger LUTs which may inclde more s. Or algorithm can simltaneosly minimize both S() and R() to compte optimal FRT mapping soltions. 3

4 As in [19], [1], [2], we assme that the initial circit is K-bonded. (When a circit is not K-bonded, we can se gate decomposition algorithms, sch as balanced tree decomposition [2], DMIG [21] or DOGMA [22], to decompose those gates with more than K fanins beforehand.) Before proceeding to present or algorithm, we rst introdce the denition of K-cts which is widely sed in this paper and some other FPGA mapping algorithms [19], [2], [1]. In a directed graph with one sink and one sorce, a ct (X; X) is a partition of the graph sch that the sink is in X and the sorce is in X. The node ct-set V (X; X) is the set of nodes in X that are connected directly to nodes in X. If j V (X; X) j K, a ct (X; X) is called a K-feasible ct, or a K-ct in short. A ct is a min-ct, if j V (X; X) j is minimal. To determine the existence of a K-ct, one can compte a maximm ow from the sorce to the sink and check whether its ale is larger than K. This process is called K-ct comptation in this paper. III. The TrboMap-frt Algorithm Or algorithm, named TrboMap-frt, is to compte optimal mapping with forward retiming soltions for synchronos seqential circits with gien initial states to minimize the clock period. It performs binary search on the clock period from 1 to an pper-bond compted by existing algorithms, for example, FlowMap [19]. For a gien clock period, a procedre named FRTcheck is sed to decide whether there exists a feasible soltion throgh label comptation. An oeriew of the label comptation procedre can be described as follows. First, we assign an initial lower-bond on the ale of each node label pair. Then, we iteratiely increase those lower-bonds ntil they all conerge to the ales of node label pairs if there exists a feasible soltion, or stop if we conclde there is no feasible soltion. In the remainder of this section, we se n to denote the nmber of nodes in the retiming graph of the original circit. A. Reiew of Expanded Circit for LUT Representation with Backward Retiming i1 i2 a c (a) b c a b 1 c i1 c 1 i2 1 a b 1 c a 1 b 2 i1 c 1 i2 1 a b 1 (b) (c) (d) (e) Fig. 2. Expanded circits for LUT representation with backward retiming. To compte an optimal soltion we need to search all possible LUTs for eery node nder node dplication and forward retiming. Pan and Li [1] proposed to search all possible LUTs rooted at a node which can be formed nder backward retiming on the expanded circit of the node. An c expanded circit E l of node with control nmber l is a directed acyclic graph (DAG) rooted at and constrcted from the original retiming graph G with node replication, sch that all paths from a node in E l to the root hae the same nmber of s. If a replication of a node passes w s before reaching the root in E, l it is denoted as w [1]. The control nmber l is the shortest distance (in terms of the nmber of edges) between the root and each leaf w if is an internal node in G. 5 For example, Figres 2(b) to 2(e) show for expanded circits Ec, Ec 1, Ec 2 and Ec 3 of node c. With backward retiming Pan and Li [1] showed that any K-LUT of a node can be deried from a K-ct of E Kn and any K-ct of E Kn can be sed to derie a K-LUT of. B. Expanded Circit Constrction for LUT Representation with Forward Retiming We also se expanded circits to represent all possible LUTs nder possible node dplication and forward retiming for label pair pdate. In the preios expanded circit constrction method, the one-to-one correspondence between K-LUTs and K-cts are based on the assmption that backward retiming is allowed. Howeer, this is not tre if only forward retiming is allowed. With only forward retiming, a K-ct on E Kn may not always be able to derie a K-LUT of. As shown in Figre 2(d), to pack all the nodes in the dotted box into an LUT we hae to moe backward the on the fanot edge of b 1 to its fanin edges, becase the retiming to psh the forward to the fanot edge of c is illegal as we hae no to moe forward on edge e(a ; c ). Ths, the corresponding K-ct does not correspond to any K-LUT for FRT mapping. In the following, we propose to constrct a smaller expanded circit for eery node based on its maximm forward retiming ale so that eery K-LUT in an FRT mapping soltion corresponds to a K-ct on the expanded circit and, ise ersa. Denition 3: The maximm forward retiming ale frt() of a node is the maximm nmber of s which can be moed forward from the (transitie) fanins of to the otpt of. Lemma 1: For a node in a retiming graph, frt() is the minimm path weight from PIs to, i.e., frt() = minfw(p) j 8 p : P I ; g. Proof: First, frt() minfw(p) j 8 p : P I ; g. If otherwise, assme there is a path p : P I ; and w(p) < f rt(). After moing f rt() s forward across, the new path weight w r (p) = w(p)? frt() <, which means the retiming is illegal. Let m = minfw(p) j 8 p : P I ; g for a node. Clearly, m 1 +w(e) m 2 for any edge e( 1 ; 2 ). We shall show that we can always psh m s forward across for eery node. To proe this, we simply need to perform a forward retiming fr() =?m j 8 2 Gg and show it is legal, i.e., w r (e) for eery edge e. For any edge 5 The control nmber does not apply to any leaf w if is a PI in G. 4

5 e( 1 ; 2 ) 2 G, w r (e) = w(e) + R( 2 )? R( 1 ) = w(e)? m 2 + m 1. So it is a legal forward retiming. Since the edge weight is non-negatie, the maximm forward retiming ales of all the nodes in a retiming graph can be compted in O(n 2 ) time with Dijkstra's shortest path algorithm [23]. Now we dene a set of expanded circits of a node for FRT mapping. The expanded circit F i for a gien pper-bond i of forward retiming ale of a node is a sb-dag of E Kn with root sch that, w is an internal node of F i if and only if w is an internal node of E Kn and w i; w is a leaf of F i if and only if w is either a leaf of or a fanin of an internal node of F i with w > i. For example, the shaded area of Figre 3(b) represents Fc 1 for the circit shown in Figre 3(a), where the heay shaded nodes are leaes of Fc 1. Denition 4: For a K-ct (X; X) of an expanded circit of node, the ct-weight is dened to be w(x; X) = maxfw j 8 w 2 Xg, which represents the minimm forward retiming ale needed to psh all the s inside X to the fanot edge of. Obiosly, the ct-weight of any K-ct of F i is no more E Kn than i. Let s assme that the initial states of s on dierent fanot edges of a node are the same, we hae the following theorem. 6 Theorem 2: Eery K-feasible ct (X; X) of F frt() corresponds to a K-LUT rooted at nder possible node dplication and forward retiming, where all the leaes are in X and the root is in X. On the other hand, any K-LUT in an FRT mapping soltion corresponds to a K-feasible ct of F frt(). Proof: Obiosly, eery K-ct (X; X) of F frt() corresponds to a K-LUT in FRT mapping, becase w(x; X) frt() and we always can psh all the s within X forward to the otpt of. Now we proe that the K-ct (X; X) corresponding to a K-LUT rooted at in an FRT mapping soltion mst be inclded in F frt(), or in other words, it mst be that X F frt(). Clearly w(x; X) frt(), becase we hae to psh all the s within X forward to the fanot of, which means w frt() for any w 2 X. By denition of F frt(), any w 2 X is an internal node of F frt(). On the other hand, any leaf w of F frt() cannot be inclded in X, becase either w > f rt() or w is also a leaf of E Kn sch that w 62 X. (Notice that it is proed in [1] that for any K-ct (X; X) corresponding to a K-LUT, any leaf of E Kn is not contained in X.) This concldes or proof. Let s look at the circit shown in Figre 3(a). Notice that its only dierence with the circit in Figre 2(a) is one extra on edge (i 1 ; a) which makes frt(c) = 1. Now the 3-ct (X; X) shown in Figre 3(b) can form a 3-LUT as shown in Figre 3(c). With the assmption that eery edge has at most one, has O(Kn 2 ) nodes and O(K 2 n 2 ) edges [1]. Clearly, E Kn 6 In case that the initial states of s on dierent fanot edges of a node are not the same, the theorem still holds after a simple ber insertion. The detail will be presented at the end of this section. a i 1 i 2 c (a) F 1 c b i 2 1 c 2 i 2 2 i 1 1 a 1 b 2 a c 1 c (b) b X i 1 2 i 1 1 a 1 c c (c) b LUT Fig. 3. LUT formation for FRT mapping based on the expanded circits. Edge e(c 2 ; b 2 ) shown in (b) does not belong to the expanded circit Fc 1, i.e., b 2 is a leaf node in Fc 1. F frt() has no more than O(Kn 2 ) nodes and O(K 2 n 2 ) edges as it is a sb-dag of E Kn. Practically, howeer, sing the techniqe of ecient K-ct comptation on partial ow networks proposed in [2], the expanded circits needed to be constrcted always hae far less than n nodes and Kn edges for all the benchmarks we hae tested. C. Iteratie Label Comptation For a target clock period, we compte all node label pairs of a circit to decide the existence of a feasible FRT mapping soltion. The algorithm for this operation is named FRTcheck. It assigns a pair of lower-bonds, denoted (s(); r()), on the ale of the label pair for eery node and iteratiely pdates the ales of the lower-bonds ntil they all conerge to the ales of the node label pairs, i.e., s() = S() and r() = R() for eery node. The initial ales of the lower-bonds are (; ) for the PIs and (?1; ) for the other nodes. If the lower-bonds cannot conerge after n 2 iterations (one iteration is the process of pdating the lower-bond of eery node's label pair once), FRTcheck stops the comptation and concldes that no feasible FRT mapping soltion exists for the gien. The psedo-code of the FRTcheck algorithm is shown in Figre 4. In the remainder of this sbsection, we will present the details of the LabelUpdate procedre (line 7 in Figre 4), which is to pdate the lower-bond of the label pair for a single node once based on the crrent lowerbond ales of all nodes. Denition 5: Gien a set of crrent lower-bonds (s(); r()), where s() S(), and r() R() if s() = S(), the ct-height of a K-ct (X; X) in the expanded circit F frt() h(x; X) = of node is max 8 K-ct (X;X) on F frt() fs()? w + 1 j 8 w 2 V (X; X)g: To compte a tighter lower-bond s new () for a node, we rst compte a ale &() = maxfs()? w(e) j 8e(; ) 2 Gg, where G is the retiming graph of the original circit. If &() < s(), we keep s new () to be s() and set r new () to be. If &() s(), we decide whether there exists a K-ct of F frt() with height of no more than &() i 1 2 5

6 FRTcheck(G(V; E; W ); ) 1 assign initial lower-bond (; ) for each PI and (?1; ) for the other nodes 2 conerge FALSE, i 3 sort all nodes from PIs to POs with DFS (depth-rst search) 4 while (not conerge and i n 2 ) do f 5 conerge TRUE 6 for each node from PIs to POs do f 7 (s new (); r new ()) LabelUpdate(; ) 8 if (s new () > s()) then f 9 s() s new () 1 conerge FALSE 11 g 12 g 13 i i g 15 if (conerge) then retrn(true) 16 else retrn(false) Fig. 4. Label comptation for target clock period. LabelUpdate(; ) 1 compte &() 2 if &() < s() then 3 s new () &(); r new () 4 else if does not exist a K-ct with height of no more than &() then 5 s new () &() + 1; r new () 6 else f 7 compte a K-ct with the minimm weight w min and h(x; X) &() 8 if &() + w min then 9 s new () &(); r new () w min 1 else s new () &() + 1; r new () = 11 g Fig. 5. Label pair pdate for a single node. based on the max-ow comptation on F frt(). (In fact, for eciency consideration, we perform the max-ow K-ct with mch smaller size as in [2].) If there does not exist sch a K-ct, we select a ct with X = fg and set s new () = &() + 1. Obiosly, r new () = in this case, becase X(= fg) does not inclde any. If, howeer, there does exist sch a K-ct with height of no more than &(), we compte a K-ct with minimm ctweight (see Denition 4) w min and height of no more than &() by binary searching the ct-weight from to frt() as follows. For a gien w 2 [; frt()], to decide whether there is a K-ct with height of no more than &() and ct-weight of no more than w, we rst constrct an expanded circit F w comptation on a partial ow network of F frt() and then, decide whether there exists a K-ct on F w with height of no more than &(). If there exists sch a K-ct, clearly, it is a K-ct with height of no more than &() and ct-weight of no more than w. Let (X; X) be sch a K-ct with height of &() and ctweight of w min compted aboe. If &()+w min, we pdate the new lower-bond to be (&(); w min ). Otherwise, we pdate the new lower-bond to be (&() + 1; ). Recall that if the minimm height of all K-cts is larger than &(), we also pdate the new lower-bond to be (&()+1; ). This concldes the LabelUpdate procedre. The psedo code is shown in Figre 5. Theorem 3: For a seqential circit which has a feasible FRT mapping soltion for a target clock period, starting from the initial lower-bonds (; ) for PIs and (?1; ) for the other nodes, the ineqalities (1) s() S() and, (2) r() R() when s() = S() hold all the time after any nmber of iterations of label pdate. Proof: See appendix. It means that s() and r() are trly lower-bonds on the ales of S() and R(). Frthermore, we shall proe in the appendix that s() will monotonically increase and both s() and r() will conerge to S() and R(), respectiely, in n 2 iterations, if there exists a feasible soltion for the target clock period. If, on the other hand, both s() and r() conerge to nite ales for all the nodes, we shall show that we can form an FRT mapping soltion with clock period of no more than the target ale in the following sbsection. Conseqently, we conclde that TrboMap-frt can compte an optimal FRT mapping soltion with the minimm clock period in polynomial time. D. Mapping Generation with Forward Retiming and Initial State Comptation After compting the minimm clock period min with binary search and obtaining the label pairs (S(); R()) of all nodes, the last step of or algorithm is to generate the mapping soltion based on the K-cts compted dring the label comptation and perform forward retiming with initial state comptation. First, we get all the LUT roots in the mapping soltion based on the K-cts compted dring the label comptation. Obiosly, all the POs of the original circit are LUT roots. If is an LUT root, all the nodes in the node-ct set of the K-ct of are LUT roots. The complete set of LUT roots can be compted as follows: starting with a rst-in-rst-ot (FIFO) qee inclding all the POs, we repeatedly extract nodes from the head of the qee ntil the qee is empty. For each node extracted from the qee, we mark it as an LUT root and pt all the nodes in its node-ct set to the end of the qee. Second, we create a new eqialent network by connecting the K-feasible cones X of the K-cts (X ; X ) of those LUT roots. Then, we compte a forward retiming for all nodes on the new network as follows: R() = 8 >< >: if is a PI or PO d S() e? 1 if is an LUT root min R() + w if w 2 LUT, where is an LUT root. After retiming all the s within each X will be pshed forward otside the X, becase w r ( w ; ) = w + 6

7 R()? R() = for any w 2 X. Then, we collapse each X and pt it into a K-LUT to form the nal mapping soltion which has a clock period of no more than min. Lemma 2: The retiming dened aboe is a legal forward retiming to achiee clock period min. Proof: First, we proe it is a forward retiming. Let be an LUT root. Since S() min, R() = d S() e? 1 min. If w is an internal node of LUT, we proe that R()?R() and then R(). Since S() + min R() min based on the denition of the node label pair and Corollary 1, we hae that d S()+minR() e 1 ) R() = min d S() e? 1?R(). Frthermore, for any min w 2 LUT we hae that w w(lut ; LUT ) R(). As a reslt, R() = R() + w?r() + R() =. After retiming, eery K-feasible cone X will become a pre combinational block and can be collapsed into a K- LUT. We shall proe that the clock period of the LUT network is no more than min in two steps. 1. The retiming is legal: Sppose LUT rooted at is a fanin of LUT rooted at and the path from to has w s in the original circit, S() S()? min w + 1 and d S() e d S() e?w. As a reslt, the new edge weight between the two LUTs after retiming is w r (LUT ; LUT ) = min min w? R() + R() = w? (d S() S() e? 1) + (d e? 1). min min 2. The retimed circit has a clock period of no more than min : For any path p : LUT ; LUT with delay d(p) > min, we proe that w r (p) 1. Since both d(p) and min are integral, it mst be that d(p) min + 1. Since S() S() + length(p) = S()? min w(p) + d(p)? d(lut ) and d(lut ) = 1, we hae that R() R()? w(p) + d d(p)?1 e R()? w(p) + 1. It means that min w r (p)=w(p) + R()? R()1. In the constrcted network, since the retiming is a forward retiming, the eqialent initial state can be compted with circit simlation. Since each K-inpt node can be simlated in O(K) time, 7 for a K-bonded circit with n gates and O(Kn) edges, the rst step of getting LUT roots can be done in O(Kn) time. There are O(n) LUTs in the constrcted LUT network. Each K-LUT incldes O(Kn) nodes as proed in Theorem 4 in [2]. The total nmber of nodes need to perform forward retiming is O(Kn 2 ). Frthermore, j R() j= O(n). The rntime of forward retiming is O(Kn 2 n K) = O(K 2 n 3 ). Howeer, the nmber of node in each K-LUT is almost bonded by a constant C = O(K) in practice. So the rntime of forward retiming is O(K 3 n 2 ) in practice. Theorem 4: For a seqential circit with n gates, the optimal FRT mapping soltion with the minimm clock period can be compted in O(K 3 n 5 log 2 n) time with O(K 2 n 2 +S) space, where S = O(2 K n) is the space needed to represent the original circit. Proof: See appendix. 7 To be precise, a node can be simlated in O(l) time, where l represents the size of the node, for example, the nmber of literals in the cbe-coer representation or the nmber of nodes in the BDD representation. Notice that Theorem 4 is based on the worst-case reslt that the expanded circits hae O(K 2 n 2 ) edges and we need go throgh n 2 iterations of label pdate. In practice, howeer, the expanded circits hae no more than Kn edges with the ecient K-ct comptation on partial ow networks [2] and the nmber of iterations for each target clock period is always mch less than n (515) for all the examples we hae tested with the DFS ordering of nodes. Practically, or algorithm rns in O(K 2 n 3 log 2 n) time with O(Kn + S) space reqirement, where S = O(2 K n) is the space to represent the original circit. E. Ber Insertion for Incompatible Initial States of Fanot s 1 (a) 1 (b) Fig. 6. Incompatible initial states of fanot ipops. Each small rectangle represents a with a digit in it representing its initial state. In Theorem 2 we assmed that the s on dierent fanot edges of a node had compatible initial states. Let s nmber the s on an edge e(; ) from to. The s on fanot edges of a node hae compatible initial states if for any i, the initial state of the i-th on edge e(; 1 ) is the same as that of the i-th on edge e(; 2 ), where 1 and 2 are two dierent fanots of. If, howeer, there are s with incompatible initial states as shown in Figre 6(a), the otpts of the two s may constitte two inpts to LUT as shown in Figre 6(b). On the other hand, if we psh the two s to the LUT's fanot as shown in Figre 6(c), the two inpts to the LUT can be merged into one. As a reslt, a K-LUT may not correspond to a K-ct. Conseqently, or label comptation may not be optimal. To sole this problem, we simply insert a ber on one fanot edge of as shown in Figre 6(d), before the label comptation. Obiosly, it will change neither any path's delay after mapping, nor the minimm clock period of FRT mapping soltions. In the worse-case, we only need to add O(Kn) bers, becase each edge needs at most one. After mapping, all those bers can be collapsed into LUTs withot increasing either delay or area. Theorem 5: For a gien seqential circit, the minimm clock periods of FRT mapping before and after ber insertion are the same. Proof: Let the minimm clock period of the optimal mapping soltion of the original circit (denoted G) be 1, and the minimm clock period for the circit after ber insertion (denoted G ) be 2. Obiosly, 2 1 becase any mapping soltion of G is also a mapping soltion of G. (c) (d) 1 7

8 (a) original circit (b) after bffer insertion (c) M1 (d) M2 (e) dplication-remoal for M2 Fig. 7. Ber insertion will not change the clock period. Now we proe that 2 1. Let M1 be any mapping soltion of G with clock period of. We shall show that we can constrct a mapping soltion M2 with the same clock period for G. It means that the minimm clock period of an optimal soltion of G is no more than the minimm clock period of an optimal soltion of G, i.e., 2 1. Let e(; ) be one edge in G. It is a isible edge in M1 if is an LUT root in M1, or an inisible edge otherwise. To constrct M2 for G based on M1, we add a ber on e(; ) if we added a ber on the corresponding edge in G. If e(; ) is an inisible edge in M1, the adding of the ber will not change the fnctionality of the LUT in which e(; ) is inclded. If e(; ) is a isible edge in M1, we can pack the ber in the LUT rooted at with possible node dplication as shown in Figre 7(d). Clearly, what we get is an FRT mapping soltion of G and the clock period is, the same as that of M1. This concldes or proof. Ber insertion shall not increase the area of any mapping soltion. First, all the bers can be deleted withot changing the circit behaior. Second, the node dplication cased by the ber insertion shown in Figre 7(d) can be eliminated easily with a postprocessing as shown in Figre 7(e). F. Label Comptation for Circits with Nodes Not Reachable from PIs PI b a c PO PI PO PI Φ=2 Φ=1 (a) original circit (b) mapping withot retiming (c) mapping with forward retiming Fig. 8. Mapping for circits with loops isolated from PIs. Forward retiming is performed for the LUT in the loop of the soltion shown in (c). In Theorem 1, we assmed that eery node is connected to at least one PI. Howeer, there exist circits with some nodes not connected with any PIs. The l-ales of those nodes are not well-dened, and as a reslt, Theorem 1 no longer holds. Notice that Theorem 3 in [1] also sers the same problem and the algorithms proposed in this paper and [1], [2] cannot be directly applied for this kind of circits. Neertheless, the mapping and retiming on those nodes can still aect the clock period of the entire circit PO as shown in Figres 8(b) and (c). We need to consider those nodes in the mapping algorithm. First, we assme that those nodes which are not connected with PIs mst form at least a loop. (Otherwise, they mst hae at least one predecessor withot any inpt and with constant ale. We can either collapse those nodes with constant ale into their fanots or mark them as PIs.) By denition of legal retiming in [3], we know that paths from those loops to POs will neer be critical, becase we can insert as many nmber of s on those paths as we want by moing s in those loops arond. Let s look at the examples shown in Figre 8(a). Wheneer we moe the arond the loop of three inerters once, we will insert one on the edge from inerter c to the AND gate. The only possibility that those loops will increase the clock period is when one of them has positie path length. To detect this case with Theorem 1, we can create a psedo PI and connect it to eery node that has no connection with any real PIs with a new edge. We then pt a ery large nmber of s on those edges. As a reslt, the l-ales of those nodes can still be dened and mst be ery ales, nless they form at least one loop with positie path length. We can modify or algorithm as follows. We rst detect nodes that are not connected with any PIs and then assign their initial lower-bonds to be a ery small ale, bt not?1. (For example, we can assign the initial lower-bond of node labels to be? F for internal nodes and POs and for PIs, respectiely, where F is the total nmber of s on the circit.) The initial lower-bond assignment of the rest of node is the same as presented before, i.e., for PIs and?1 for internal nodes and POs. We then the iteratie label pdate ntil all the lower-bonds conerge to ales no more than the target clock period or any of the lowerbonds exceed. In the rst case, we conclde that is a feasible clock period. In the latter case, we conclde that is infeasible. Clearly, this modication works for mapping with general retiming algorithms proposed in [1], [2] as well, except that in that case we only need to check the labels of the POs. IV. Post-Processing for FPGAs with Fixed Initial State Setting In the preios section, we assme that the target FPGA deice proides both set and reset signals, ths, we can set the initial states of s arbitrarily to be either 1 or. Howeer, there are some FPGA deices proiding only one sch control signal. To map a circit onto sch kinds of FPGA deices, the initial states of all the s need to be the same (either 1 or, depending on which signal, set or reset, is proided on the deices). 8 In this case, we propose to perform a postprocessing of state switching with inerter insertion on the mapping soltion generated by the TrboMap-frt algorithm presented in the preios section. We garantee that the clock period will remain the same. 8 For example, Xilinx XC52 FPGAs proide only reset signal to each. 8

9 Frthermore, in most cases, the nmber of LUTs will also remain the same. Withot lose of generality, we sppose that the target FPGA deice proides only reset signal for each, i.e., the initial state of eery needs to be. Let F 1 be an with initial state of 1 in an FRT mapping soltion. We insert a pair of inerters on the fanin and fanot edges of F 1 and change the initial state of F 1 to be as shown in Figres 9(a), (b) and (c). The inerters can then be packed in LUTs (or IO pads if they hae bilt-in inerters) connected to the as shown in Figres 9(d), (e) and (f). In case that the inerters connected only to s or IO pads which do not hae bilt-in inerters, those inerters need to be implemented with new LUTs. Clearly, we can change the initial state of eery with this operation and the clock period will remain the same, except a few more LUTs might be needed to implement some inerters if they cannot be packed into existing LUTs. Fig (a) (b) (c) 1 (d) (e) (f) Changing the initial state of a with inerter insertion. Theorem 6: The clock period will remain the same after inerter insertion and initial state switching. Proof: Let p be a combinational path ended with either PI/POs or s and the clock period be ( 1) before inerter insertion. We consider the following two cases: 1) If the nmber of LUTs on p is larger than, all the added inerters on p can be either cancelled ot or packed into LUTs on p. As a reslt, neither the nmber of LUTs on p, nor the clock period of the circit will change. 2) If the nmber of LUTs on p is, after inerter insertion and cancellation (of inerter pairs), there is at most one LUT on p. Since the clock period in the original mapping soltion mst be at least one, the delay increase on p from to 1 will not change the clock period as well. V. Experimental Reslts The TrboMap-frt algorithm has been implemented in C langage on Sn workstations and incorporated into the SIS package [15] and the UCLA RASP FPGA Synthesis system [24]. Or rst test set consists of 14 MCNC FSM benchmarks and 4 ISCAS'89 benchmarks. Or second test set consists of 8 larger ISCAS benchmarks with more s. SIS seqential synthesis commands and the DMIG decomposition method [21] are sed to generate the initial circits as shown in Table I. Colmns GATE and list the nmbers of gates and s in each circit, respectiely. Table II shows the comparison of TrboMap-frt with FlowMap-frt and TrboMap [2] on the rst test set. Or experiments were performed on a Sn ULTRA2 workstation with 256MB memory. K was set to be 5. All the mapcircit PI PO GATE bbara bbtas dk dk kirkman ex ex s sse keyb styr sand planet scf s s s s TABLE I Initial circits of MCNC and ISCAS Benchmarks. ping reslts of TrboMap-frt were compted and eried by erify fsm of SIS [15] which performs formal erication for seqential circits, except the 4 largest ones which were eried by circit simlation with inpt seqences of 38 random ectors. FlowMap-frt represents the conentional approach. It rst partitions a seqential circit into a set of combinational sbcircits by ctting at inpts and otpts of all s, then, maps eery sbcircits independently sing the depth-optimal FlowMap algorithm [19]. After merging the mapped LUT sbcircits with the original s, a post-processing of forward retiming for clock period minimization was performed. TrboMap, on the other hand, comptes optimal mapping with general retiming soltions, bt withot consideration of initial states [2]. We sed the initial state comptation tility in SIS [15] which is based on the algorithm in [8] to try to compte an eqialent initial state for eery TrboMap soltion. In Table II, Colmns LUT and list the nmbers of LUTs and s, respectiely, in each mapping soltion. Colmns list the clock periods of the mapping reslts. Colmns CPU list the CPU time in seconds for each algorithm. Those marked with? are examples for which SIS failed to compte eqialent initial states for the TrboMap soltions in 2 hors. Colmn Best lists the best alid soltions (with compted eqialent initial states) by TrboMap and FlowMap-frt. circit PI PO GATE bigkey s s s s s s s TABLE III Eight pipelined test circits from ISCAS Benchmark site. To show the eectieness of or algorithm on larger ex- 9

10 FlowMap-frt TrboMap Best TrboMap-frt circit LUT CPU LUT CPU LUT CPU bbara bbtas dk dk ex ex ? keyb kirkman ? planet ? s sand ? scf ? sse styr ? s ? s ? > s ? > s ? GMEAN % TABLE II Comparison of TrboMap-frt with FlowMap-frt and TrboMap for 5-LUT. \GMEAN" lists the geometric means of the reslts by each approach, respectiely. The rntime was recorded on a Sn ULTRA2 with 256MB memory. Those marked with? are circits that SIS failed to compte initial states for TrboMap soltions. FlowMap-frt TrboMap TrboMap-frt circit LUT CPU LUT CPU LUT CPU bigkey s s s s s s s geo-mean % % TABLE IV Comparison of TrboMap-frt with FlowMap-frt and TrboMap for 5-LUT. \GMEAN" lists the geometric means of the reslts by each approach, respectiely. The rntime was recorded on a Sn ULTRA2 with 512MB memory. No initial state comptation was performed for TrboMap soltions. amples with more s, we selected 8 ISCAS examples with more than 7 s. We rst performed pipeline insertion to frther increase the nmber of s. The pipeline insertion was performed to the extent that the clock period of the original circit (before mapping) cold not be frther redced, i.e., it eqals to the maximm loops' delay-toregister ratio. The sizes of the 8 examples are shown in Table III. The test reslts are shown in Table IV. Becase all the circits hae more than 1 s, we did not try to compte new initial states for TrboMap soltions with SIS, becase it wold take too long CPU time. We noticed that with more s in the circit, TrboMap-frt needs more time to compte optimal soltions. This is becase with more s, frt() is generally larger and the binary search of w min from to frt() (shown in Figre 5) needs longer comptation time. The reslts show that TrboMap-frt can redce the clock period by 17% as compared with FlowMap-frt for both test sets. Comparing with the reslts by TrboMap [2], which represent the minimm possible clock period of mapping with general retiming, the clock period by TrboMap-frt is 3.6% or 2.8% longer, respectiely, for the two test sets. Howeer, there are 1 ot of 18 TrboMap soltions in the rst test set for which SIS concldes no eqialent initial states or cannot nd one de to either large memory reqirement (more than 3MB) or long rntime (longer than 2 hors). If we compare the best alid soltions by TrboMap and FlowMap-frt, TrboMap-frt can still redce the clock period by 8%. Notice that we did not perform the pre-processing of backward retiming for TrboMap-frt in or experiments. We beliee that the clock period by TrboMap-frt wold hae been een closer to that by TrboMap if sch pre-processing had been performed. Howeer, we do not see a compelling reason in doing this as the 1

11 best possible improement cold be ery marginal (only % as shown in or experiments). Or reslts also show that all the three algorithms compte reslts with similar nmbers of LUTs. Bt FlowMap-frt ses 34% fewer s. In general, we think simltaneos mapping with retiming leads to smaller clock period bt tends to se more s. VI. Conclsion and Ftre Work For seqential circits with initial states, we present a new algorithm TrboMap-frt for FPGA mapping with forward retiming and initial state comptation to minimize the clock period. Unlike preios retiming algorithms which compte retiming to minimize the clock period at rst and then try to compte an eqialent initial state, we compte optimal mapping with forward retiming with garanteed eqialent initial state in one step. Or algorithm enables a new methodology of separating forward retiming from backward retiming. Since we garantee to compte an optimal mapping with forward retiming soltion, backward retiming can be performed as a preprocessing step to try to psh ipops to primary inpts as mch as possible with consideration of only initial state comptation. Ths, we can aoid the time-consming iterations between retiming for clock period minimization and initial state comptation. The experimental reslts show that TrboMap-frt is ery ecient and eectie comparing with conentional approaches of separated mapping with retiming. Frthermore, the reslts by TrboMap-frt are ery close to that by the optimal mapping with general retiming algorithm TrboMap [2] with regard to both area and clock period, while many soltions by TrboMap cannot compte initial states. In the ftre we plan to extend or work for library based technology mapping with (forward) retiming for high performance gate array and standard cell designs. A general framework on retiming with mltiple clock designs was proposed recently by Legl et al. [25]. (The retiming proposed in [3] and sed in this and many other papers consider seqential circits with single clock.) We plan to accommodate or approach into their framework as well. VII. Acknowledgements This work is partially spported by National Science Fondation Yong Inestigator Award MIP and grants from Xilinx, Lcent Technologies and Qicktrn Design Systems nder the California MICRO program. References [1] P. Pan and C. L. Li, \Optimal Clock Period FPGA Technology Mapping for Seqential Circits," ACM Transactions on Design Atomation of Electronic Systems, ol. 3, no. 3, 1998, [2] J. Cong and C. W, \An Ecient Algorithm for Performance- Optimal FPGA Technology Mapping with Retiming," IEEE Trans. on Compter-Aided Design of Integrated Circits And Systems, ol. 17, no. 9, pp. 738{748, [3] C. E. Leiserson and J. B. Saxe, \Retiming Synchronos Circitry," Algorithmica, ol. 6, pp. 5{35, [4] G. D. Micheli, \Synchronos Logic Synthesis: Algorithms for Cycle-Time Minimization," IEEE Trans. on Compter-Aided Design of Integrated Circits And Systems, ol. 1, no. 1, pp. 63{73, [5] H. Toati, N. Shenoy, and A. Sangioanni-Vincentelli, \Retiming for Table-Lookp Field-Programmable Gate Arrays," in FPGA'92, 1992, pp. 89{94. [6] S. Malik, K. J. Singh, R. K. Brayton, and A. Sangioanni- Vincentelli, \Performance Optimization of Pipelined Logic Circits Using Peripheral Retiming and Resynthesis," IEEE Trans. on Compter-Aided Design of Integrated Circits And Systems, ol. 12, no. 5, pp. 568{578, [7] J. Cong and C. W, \An Improed Algorithm for Performance Optimal Technology Mapping with Retiming in LUT- Based FPGA Design," in IEEE International Conference on Compter Design, 1996, pp. 572{578. [8] H. Toati and R. K. Brayton, \Compting the Initial States of Retimed Circits," IEEE Trans. on Compter-Aided Design of Integrated Circits And Systems, ol. 12, no. 1, pp. 157{162, [9] V. Singhal, S. Malik, and R. K. Brayton, \The Case for Retiming with Explicit Reset Circitry," in IEEE International Conference on CAD, 1996, pp. 618{625. [1] H. Yotsyanagi, S. Kajihara, and K. Kinoshita, \Retiming for Seqential Circits with a Specied Initial State and Its Application to Testability Enhancement," IEICE Trans. INF. & SYST., ol. E78-D, no. 7, pp. 861{867, Jly [11] G. Een, I. Y. Spillinger, and L. Stok, \Retiming Reisited and Reersed," IEEE Trans. on Compter-Aided Design of Integrated Circits And Systems, ol. 15, no. 3, pp. 348{357, [12] N. Maheshwari and S. S. Sapatnekar, \Minimm Area Retiming with Eqialent Initial States," in IEEE International Conference on CAD, 1997, pp. 216{219. [13] H. Fjiwara and S. Toida, \The Complexity of Falt Detection: An Approach to Design for Testability," in FTCS-12, 1982, pp. 11{18. [14] S. Knd, L. M. Hisman, I. Nair, and V. Iyengar, \A Small Test Generator for Large Designs," in International Test Conference, 1992, pp. 3{4. [15] E. Sentoich, K. Singh, L. Laagno, C. Moon, R. Mrgai, A. Saldanha, H. Saoj, P. Stephan, R. Brayton, and A. Sangioanni- Vincentelli, SIS: A System for Seqential Circit Synthesis, Electronics Research Laboratory, Memorandm No. UCB/ERL M92/41, [16] L. Stok, I. Spillinger, and G. Een, \Improing Initialization throgh Reersed Retiming," in Proc. Ero. Design & Test Conference, 1995, pp. 15{154. [17] J. Cong and C. W, \Optimal FPGA Mapping and Retiming with Ecient Initial State Comptation," in Proc. ACM/IEEE Design Atomation Conference., 1998, pp. 33{335. [18] J. Cong and Y. Ding, \On Nominal Delay Minimization in LUT- Based FPGA Technology Mapping," Integration { the VLSI Jornal, ol. 18, pp. 73{94, [19] J. Cong and Y. Ding, \FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookp-Table Based FPGA Designs," IEEE Trans. on Compter-Aided Design of Integrated Circits And Systems, ol. 13, no. 1, pp. 1{12, [2] R. K. Brayton, R. Rdell, A. L. Sangioanni-Vincentelli, and A. R. Wang, \MIS: A Mltiple-Leel Logic Optimization System," IEEE Tans. on Compter-Aided Desing, ol. 6, no. 6, pp. 162{181, [21] K. C. Chen, J. Cong, Y. Ding, A. B. Kahng, and P. Trajmar, \DAG-Map: Graph-based FPGA Technology Mapping for Delay Optimization," in IEEE Design and Test of Compters, 1992, pp. 7{2. [22] J. Cong and Y.-Y. Hwang, \Strctral Gate Decomposition for Depth-Optimal Technology Mapping in LUTbased FPGA Desings," ACM Transactions on Design Atomation of Electronic Systems, ol. 5, no. 2, 2, [23] T. H. Cormen, C. H. Leiserson, and R. L. Riest, Introdction to Algorithms, The MIT Press, 199. [24] J. Cong, J. Peck, and Y. Ding, \RASP: A General Logic Synthesis System for SRAM-based FPGAs," in Proc. ACM 4th Int'l Symp. on FPGA, 1996, pp. 137{143, xfpga/software/raspsyn.htm. 11

Optimal FPGA Mapping and Retiming with. Jason Cong and Chang Wu. problem which is in general NP-complete.

Optimal FPGA Mapping and Retiming with. Jason Cong and Chang Wu. problem which is in general NP-complete. Optimal FPGA Mapping and Retiming with Ecient Initial State Computation Jason Cong and Chang Wu Department of Computer Science University of California, Los Angeles, CA 90095 Abstract For sequential circuits

More information

The Disciplined Flood Protocol in Sensor Networks

The Disciplined Flood Protocol in Sensor Networks The Disciplined Flood Protocol in Sensor Networks Yong-ri Choi and Mohamed G. Goda Department of Compter Sciences The University of Texas at Astin, U.S.A. fyrchoi, godag@cs.texas.ed Hssein M. Abdel-Wahab

More information

v e v 1 C 2 b) Completely assigned T v a) Partially assigned Tv e T v 2 p k

v e v 1 C 2 b) Completely assigned T v a) Partially assigned Tv e T v 2 p k Approximation Algorithms for a Capacitated Network Design Problem R. Hassin 1? and R. Rai 2?? and F. S. Salman 3??? 1 Department of Statistics and Operations Research, Tel-Ai Uniersity, Tel Ai 69978, Israel.

More information

[1] Hopcroft, J., D. Joseph and S. Whitesides, Movement problems for twodimensional

[1] Hopcroft, J., D. Joseph and S. Whitesides, Movement problems for twodimensional Acknoledgement. The athors thank Bill Lenhart for interesting discssions on the recongration of rlers. References [1] Hopcroft, J., D. Joseph and S. Whitesides, Moement problems for todimensional linkages,

More information

Alliances and Bisection Width for Planar Graphs

Alliances and Bisection Width for Planar Graphs Alliances and Bisection Width for Planar Graphs Martin Olsen 1 and Morten Revsbæk 1 AU Herning Aarhs University, Denmark. martino@hih.a.dk MADAGO, Department of Compter Science Aarhs University, Denmark.

More information

CS 557 Lecture IX. Drexel University Dept. of Computer Science

CS 557 Lecture IX. Drexel University Dept. of Computer Science CS 7 Lectre IX Dreel Uniersity Dept. of Compter Science Fall 00 Shortest Paths Finding the Shortest Paths in a graph arises in many different application: Transportation Problems: Finding the cheapest

More information

Maximal Cliques in Unit Disk Graphs: Polynomial Approximation

Maximal Cliques in Unit Disk Graphs: Polynomial Approximation Maximal Cliqes in Unit Disk Graphs: Polynomial Approximation Rajarshi Gpta, Jean Walrand, Oliier Goldschmidt 2 Department of Electrical Engineering and Compter Science Uniersity of California, Berkeley,

More information

Reconstructing Generalized Staircase Polygons with Uniform Step Length

Reconstructing Generalized Staircase Polygons with Uniform Step Length Jornal of Graph Algorithms and Applications http://jgaa.info/ ol. 22, no. 3, pp. 431 459 (2018) DOI: 10.7155/jgaa.00466 Reconstrcting Generalized Staircase Polygons with Uniform Step Length Nodari Sitchinaa

More information

THE technology mapping and synthesis problem for field

THE technology mapping and synthesis problem for field 738 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 17, NO. 9, SEPTEMBER 1998 An Efficient Algorithm for Performance-Optimal FPGA Technology Mapping with Retiming Jason

More information

Lemma 1 Let the components of, Suppose. Trees. A tree is a graph which is. (a) Connected and. (b) has no cycles (acyclic). (b)

Lemma 1 Let the components of, Suppose. Trees. A tree is a graph which is. (a) Connected and. (b) has no cycles (acyclic). (b) Trees Lemma Let the components of ppose "! be (a) $&%('*)+ - )+ / A tree is a graph which is (b) 0 %(')+ - 3)+ / 6 (a) (a) Connected and (b) has no cycles (acyclic) (b) roof Eery path 8 in which is not

More information

Vertex Guarding in Weak Visibility Polygons

Vertex Guarding in Weak Visibility Polygons Vertex Garding in Weak Visibility Polygons Pritam Bhattacharya, Sbir Kmar Ghosh*, Bodhayan Roy School of Technology and Compter Science Tata Institte of Fndamental Research Mmbai 400005, India arxi:1409.46212

More information

On the Computational Complexity and Effectiveness of N-hub Shortest-Path Routing

On the Computational Complexity and Effectiveness of N-hub Shortest-Path Routing 1 On the Comptational Complexity and Effectiveness of N-hb Shortest-Path Roting Reven Cohen Gabi Nakibli Dept. of Compter Sciences Technion Israel Abstract In this paper we stdy the comptational complexity

More information

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this

More information

Improving Network Connectivity Using Trusted Nodes and Edges

Improving Network Connectivity Using Trusted Nodes and Edges Improing Network Connectiity Using Trsted Nodes and Edges Waseem Abbas, Aron Laszka, Yegeniy Vorobeychik, and Xenofon Kotsokos Abstract Network connectiity is a primary attribte and a characteristic phenomenon

More information

Adaptive Influence Maximization in Microblog under the Competitive Independent Cascade Model

Adaptive Influence Maximization in Microblog under the Competitive Independent Cascade Model International Jornal of Knowledge Engineering, Vol. 1, No. 2, September 215 Adaptie Inflence Maximization in Microblog nder the Competitie Independent Cascade Model Zheng Ding, Kai Ni, and Zhiqiang He

More information

Mobility Control and Its Applications in Mobile Ad Hoc Networks

Mobility Control and Its Applications in Mobile Ad Hoc Networks Mobility Control and Its Applications in Mobile Ad Hoc Netorks Jie W and Fei Dai Department of Compter Science and Engineering Florida Atlantic Uniersity Boca Raton, FL 3331 Abstract Most existing localized

More information

Faster Random Walks By Rewiring Online Social Networks On-The-Fly

Faster Random Walks By Rewiring Online Social Networks On-The-Fly 1 Faster Random Walks By Rewiring Online ocial Networks On-The-Fly Zhojie Zho 1, Nan Zhang 2, Zhigo Gong 3, Gatam Das 4 1,2 Compter cience Department, George Washington Uniersity 1 rexzho@gw.ed 2 nzhang10@gw.ed

More information

Real-Time Robot Path Planning via a Distance-Propagating Dynamic System with Obstacle Clearance

Real-Time Robot Path Planning via a Distance-Propagating Dynamic System with Obstacle Clearance POSTPRINT OF: IEEE TRANS. SYST., MAN, CYBERN., B, 383), 28, 884 893. 1 Real-Time Robot Path Planning ia a Distance-Propagating Dynamic System with Obstacle Clearance Allan R. Willms, Simon X. Yang Member,

More information

Pushing squares around

Pushing squares around Pshing sqares arond Adrian Dmitresc János Pach Ý Abstract We stdy dynamic self-reconfigration of modlar metamorphic systems. We garantee the feasibility of motion planning in a rectanglar model consisting

More information

On Plane Constrained Bounded-Degree Spanners

On Plane Constrained Bounded-Degree Spanners On Plane Constrained Bonded-Degree Spanners Prosenjit Bose 1, Rolf Fagerberg 2, André an Renssen 1, Sander Verdonschot 1 1 School of Compter Science, Carleton Uniersity, Ottaa, Canada. Email: jit@scs.carleton.ca,

More information

Planarity-Preserving Clustering and Embedding for Large Planar Graphs

Planarity-Preserving Clustering and Embedding for Large Planar Graphs Planarity-Presering Clstering and Embedding for Large Planar Graphs Christian A. Dncan, Michael T. Goodrich, and Stephen G. Koboro Center for Geometric Compting The Johns Hopkins Uniersity Baltimore, MD

More information

Faster Random Walks By Rewiring Online Social Networks On-The-Fly

Faster Random Walks By Rewiring Online Social Networks On-The-Fly Faster Random Walks By Rewiring Online ocial Networks On-The-Fly Zhojie Zho 1, Nan Zhang 2, Zhigo Gong 3, Gatam Das 4 1,2 Compter cience Department, George Washington Uniersity 1 rexzho@gw.ed 2 nzhang10@gw.ed

More information

COMPOSITION OF STABLE SET POLYHEDRA

COMPOSITION OF STABLE SET POLYHEDRA COMPOSITION OF STABLE SET POLYHEDRA Benjamin McClosky and Illya V. Hicks Department of Comptational and Applied Mathematics Rice University November 30, 2007 Abstract Barahona and Mahjob fond a defining

More information

Constraint-Driven Communication Synthesis

Constraint-Driven Communication Synthesis Constraint-Drien Commnication Synthesis Alessandro Pinto Goqiang Wang Lca P. Carloni Abstract Constraint-drien Commnication Synthesis enables the atomatic design of the commnication architectre of a complex

More information

Friend of My Friend: Network Formation with Two-Hop Benefit

Friend of My Friend: Network Formation with Two-Hop Benefit Friend of My Friend: Network Formation with Two-Hop Benefit Elliot Ansheleich, Onkar Bhardwaj, and Michael Usher Rensselaer Polytechnic Institte, Troy NY, USA Abstract. How and why people form ties is

More information

On Plane Constrained Bounded-Degree Spanners

On Plane Constrained Bounded-Degree Spanners Algorithmica manscript No. (ill be inserted by the editor) 1 On Plane Constrained Bonded-Degree Spanners 2 3 Prosenjit Bose Rolf Fagerberg André an Renssen Sander Verdonschot 4 5 Receied: date / Accepted:

More information

This chapter is based on the following sources, which are all recommended reading:

This chapter is based on the following sources, which are all recommended reading: Bioinformatics I, WS 09-10, D. Hson, December 7, 2009 105 6 Fast String Matching This chapter is based on the following sorces, which are all recommended reading: 1. An earlier version of this chapter

More information

A sufficient condition for spiral cone beam long object imaging via backprojection

A sufficient condition for spiral cone beam long object imaging via backprojection A sfficient condition for spiral cone beam long object imaging via backprojection K. C. Tam Siemens Corporate Research, Inc., Princeton, NJ, USA Abstract The response of a point object in cone beam spiral

More information

Localized Delaunay Triangulation with Application in Ad Hoc Wireless Networks

Localized Delaunay Triangulation with Application in Ad Hoc Wireless Networks 1 Localized Delanay Trianglation with Application in Ad Hoc Wireless Networks Xiang-Yang Li Gria Călinesc Peng-Jn Wan Y Wang Department of Compter Science, Illinois Institte of Technology, Chicago, IL

More information

Chapter 5 Network Layer

Chapter 5 Network Layer Chapter Network Layer Network layer Physical layer: moe bit seqence between two adjacent nodes Data link: reliable transmission between two adjacent nodes Network: gides packets from the sorce to destination,

More information

Dynamic Maintenance of Majority Information in Constant Time per Update? Gudmund S. Frandsen and Sven Skyum BRICS 1 Department of Computer Science, Un

Dynamic Maintenance of Majority Information in Constant Time per Update? Gudmund S. Frandsen and Sven Skyum BRICS 1 Department of Computer Science, Un Dynamic Maintenance of Majority Information in Constant Time per Update? Gdmnd S. Frandsen and Sven Skym BRICS 1 Department of Compter Science, University of arhs, Ny Mnkegade, DK-8000 arhs C, Denmark

More information

Minimal Edge Addition for Network Controllability

Minimal Edge Addition for Network Controllability This article has been accepted for pblication in a ftre isse of this jornal, bt has not been flly edited. Content may change prior to final pblication. Citation information: DOI 10.1109/TCNS.2018.2814841,

More information

Design and Optimization of Multi-Faceted Reflectarrays for Satellite Applications

Design and Optimization of Multi-Faceted Reflectarrays for Satellite Applications Design and Optimization of Mlti-Faceted Reflectarrays for Satellite Applications Min Zho, Stig B. Sørensen, Peter Meincke, and Erik Jørgensen TICRA, Copenhagen, Denmark ticra@ticra.com Abstract The design

More information

A Unified Energy-Efficient Topology for Unicast and Broadcast

A Unified Energy-Efficient Topology for Unicast and Broadcast A Unified Energy-Efficient Topology for Unicast and Broadcast Xiang-Yang Li Dept. of Compter Science Illinois Institte of Technology, Chicago, IL, USA xli@cs.iit.ed Wen-Zhan Song School of Eng. & Comp.

More information

The single-cycle design from last time

The single-cycle design from last time lticycle path Last time we saw a single-cycle path and control nit for or simple IPS-based instrction set. A mlticycle processor fies some shortcomings in the single-cycle CPU. Faster instrctions are not

More information

Chapter 7 TOPOLOGY CONTROL

Chapter 7 TOPOLOGY CONTROL Chapter TOPOLOGY CONTROL Oeriew Topology Control Gabriel Graph et al. XTC Interference SINR & Schedling Complexity Distribted Compting Grop Mobile Compting Winter 00 / 00 Distribted Compting Grop MOBILE

More information

Evaluating Influence Diagrams

Evaluating Influence Diagrams Evalating Inflence Diagrams Where we ve been and where we re going Mark Crowley Department of Compter Science University of British Colmbia crowley@cs.bc.ca Agst 31, 2004 Abstract In this paper we will

More information

An Extended Fault-Tolerant Link-State Routing Protocol in the Internet

An Extended Fault-Tolerant Link-State Routing Protocol in the Internet An Extended Falt-Tolerant Link-State Roting Protocol in the Internet Jie W, Xiaola Lin, Jiannong Cao z, and Weijia Jia x Department of Compter Science and Engineering Florida Atlantic Uniersit Boca Raton,

More information

Retiming Sequential Circuits with Multiple Register Classes

Retiming Sequential Circuits with Multiple Register Classes Retiming Sequential Circuits with Multiple Register Classes Klaus Eckl Christian Legl Institute of Electronic Design Automation Technical Uniersity of Munich 80290 Munich, Germany KlausEckl@eitumde ChristianLegl@eitumde

More information

Fault Tolerance in Hypercubes

Fault Tolerance in Hypercubes Falt Tolerance in Hypercbes Shobana Balakrishnan, Füsn Özgüner, and Baback A. Izadi Department of Electrical Engineering, The Ohio State University, Colmbs, OH 40, USA Abstract: This paper describes different

More information

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs . FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles,

More information

Mathematical model for storing and effective processing of directed graphs in semistructured data management systems

Mathematical model for storing and effective processing of directed graphs in semistructured data management systems Mathematical model for storing and effectie processing of directed graphs in semistrctred data management MALIKOV A, GULEVSKIY Y, PARKHOMENKO D Information Systems and Technologies Department North Cacass

More information

Constructing Multiple Light Multicast Trees in WDM Optical Networks

Constructing Multiple Light Multicast Trees in WDM Optical Networks Constrcting Mltiple Light Mlticast Trees in WDM Optical Networks Weifa Liang Department of Compter Science Astralian National University Canberra ACT 0200 Astralia wliang@csaneda Abstract Mlticast roting

More information

ABSOLUTE DEFORMATION PROFILE MEASUREMENT IN TUNNELS USING RELATIVE CONVERGENCE MEASUREMENTS

ABSOLUTE DEFORMATION PROFILE MEASUREMENT IN TUNNELS USING RELATIVE CONVERGENCE MEASUREMENTS Proceedings th FIG Symposim on Deformation Measrements Santorini Greece 00. ABSOUTE DEFORMATION PROFIE MEASUREMENT IN TUNNES USING REATIVE CONVERGENCE MEASUREMENTS Mahdi Moosai and Saeid Khazaei Mining

More information

Minimum Spanning Trees Outline: MST

Minimum Spanning Trees Outline: MST Minimm Spanning Trees Otline: MST Minimm Spanning Tree Generic MST Algorithm Krskal s Algorithm (Edge Based) Prim s Algorithm (Vertex Based) Spanning Tree A spanning tree of G is a sbgraph which is tree

More information

Pipelining. Chapter 4

Pipelining. Chapter 4 Pipelining Chapter 4 ake processor rns faster Pipelining is an implementation techniqe in which mltiple instrctions are overlapped in eection Key of making processor fast Pipelining Single cycle path we

More information

(2, 4) Tree Example (2, 4) Tree: Insertion

(2, 4) Tree Example (2, 4) Tree: Insertion Presentation for se with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 B-Trees and External Memory (2, 4) Trees Each internal node has 2 to 4 children:

More information

Mobility Control and Its Applications in Mobile Ad Hoc Networks

Mobility Control and Its Applications in Mobile Ad Hoc Networks Mobility Control and Its Applications in Mobile Ad Hoc Netorks Jie W and Fei Dai, Florida Atlantic Uniersity Abstract Most eisting localized protocols in mobile ad hoc netorks, sch as data commnication

More information

Fixed-Parameter Algorithms for Cluster Vertex Deletion

Fixed-Parameter Algorithms for Cluster Vertex Deletion Fixed-Parameter Algorithms for Clster Vertex Deletion Falk Hüffner Christian Komsieicz Hannes Moser Rolf Niedermeier Institt für Informatik, Friedrich-Schiller-Uniersität Jena, Ernst-Abbe-Platz 2, D-07743

More information

On shortest-path all-optical networks without wavelength conversion requirements

On shortest-path all-optical networks without wavelength conversion requirements Research Collection Working Paper On shortest-path all-optical networks withot waelength conersion reqirements Athor(s): Erlebach, Thomas; Stefanakos, Stamatis Pblication Date: 2002 Permanent Link: https://doi.org/10.3929/ethz-a-004446054

More information

Chapter 6: Pipelining

Chapter 6: Pipelining CSE 322 COPUTER ARCHITECTURE II Chapter 6: Pipelining Chapter 6: Pipelining Febrary 10, 2000 1 Clothes Washing CSE 322 COPUTER ARCHITECTURE II The Assembly Line Accmlate dirty clothes in hamper Place in

More information

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809 PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA Laurent Lemarchand Informatique ubo University{ bp 809 f-29285, Brest { France lemarch@univ-brest.fr ea 2215, D pt ABSTRACT An ecient distributed

More information

Primary and Derived Variables with the Same Accuracy in Interval Finite Elements

Primary and Derived Variables with the Same Accuracy in Interval Finite Elements Primary and Deried Variables with the Same Accracy in Interal inite Elements M. V. Rama Rao R. L. Mllen 2 and R. L. Mhanna 3 Vasai College of Engineering Hyderabad - 5 3 INDIA dr.mrr@gmail.com 2 Case Western

More information

Compatible Class Encoding in Roth-Karp Decomposition for Two-Output LUT Architecture

Compatible Class Encoding in Roth-Karp Decomposition for Two-Output LUT Architecture Compatible Class Encoding in Roth-Karp Decomposition for Two-Output LUT Architecture Juinn-Dar Huang, Jing-Yang Jou and Wen-Zen Shen Department of Electronics Engineering, National Chiao Tung Uniersity,

More information

EXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION

EXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION EXAINATIONS 2010 END OF YEAR COPUTER ORGANIZATION Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. ake sre yor answers are clear and to the point. Calclators and paper foreign langage

More information

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read.

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read. The final path PC 4 Add Reg Shift left 2 Add PCSrc Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtor RegDst ALUSrc em I [5

More information

Complete Information Pursuit Evasion in Polygonal Environments

Complete Information Pursuit Evasion in Polygonal Environments Complete Information Prsit Easion in Polygonal Enironments Kyle Klein and Sbhash Sri Department of Compter Science Uniersity of California Santa Barbara, CA 9306 Abstract Sppose an npredictable eader is

More information

Structural Gate Decomposition for Depth- Optimal Technology Mapping in LUT- Based FPGA Designs

Structural Gate Decomposition for Depth- Optimal Technology Mapping in LUT- Based FPGA Designs Structural Gate Decomposition or Depth- Optimal Technology Mapping in LUT- Based FPGA Designs JASON CONG and YEAN-YOW HWANG Uniersity o Caliornia In this paper we study structural gate decomposition in

More information

Review Multicycle: What is Happening. Controlling The Multicycle Design

Review Multicycle: What is Happening. Controlling The Multicycle Design Review lticycle: What is Happening Reslt Zero Op SrcA SrcB Registers Reg Address emory em Data Sign etend Shift left Sorce A B Ot [-6] [5-] [-6] [5-] [5-] Instrction emory IR RegDst emtoreg IorD em em

More information

Pipelined van Emde Boas Tree: Algorithms, Analysis, and Applications

Pipelined van Emde Boas Tree: Algorithms, Analysis, and Applications This fll text paper was peer reviewed at the direction of IEEE Commnications Society sbject matter experts for pblication in the IEEE INFOCOM 007 proceedings Pipelined van Emde Boas Tree: Algorithms, Analysis,

More information

Review: Computer Organization

Review: Computer Organization Review: Compter Organization Pipelining Chans Y Landry Eample Landry Eample Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 3 mintes A B C D Dryer takes 3 mintes

More information

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Jason Cong and Yean-Yow Hwang Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this paper, we

More information

Prof. Kozyrakis. 1. (10 points) Consider the following fragment of Java code:

Prof. Kozyrakis. 1. (10 points) Consider the following fragment of Java code: EE8 Winter 25 Homework #2 Soltions De Thrsday, Feb 2, 5 P. ( points) Consider the following fragment of Java code: for (i=; i

More information

Optimal Sampling in Compressed Sensing

Optimal Sampling in Compressed Sensing Optimal Sampling in Compressed Sensing Joyita Dtta Introdction Compressed sensing allows s to recover objects reasonably well from highly ndersampled data, in spite of violating the Nyqist criterion. In

More information

ECE250: Algorithms and Data Structures Single Source Shortest Paths Dijkstra s Algorithm

ECE250: Algorithms and Data Structures Single Source Shortest Paths Dijkstra s Algorithm ECE0: Algorithms and Data Strctres Single Sorce Shortest Paths Dijkstra s Algorithm Ladan Tahildari, PEng, SMIEEE Associate Professor Software Technologies Applied Research (STAR) Grop Dept. of Elect.

More information

Tdb: A Source-level Debugger for Dynamically Translated Programs

Tdb: A Source-level Debugger for Dynamically Translated Programs Tdb: A Sorce-level Debgger for Dynamically Translated Programs Naveen Kmar, Brce R. Childers, and Mary Lo Soffa Department of Compter Science University of Pittsbrgh Pittsbrgh, Pennsylvania 15260 {naveen,

More information

Visibility-Graph-based Shortest-Path Geographic Routing in Sensor Networks

Visibility-Graph-based Shortest-Path Geographic Routing in Sensor Networks Athor manscript, pblished in "INFOCOM 2009 (2009)" Visibility-Graph-based Shortest-Path Geographic Roting in Sensor Networks Gang Tan Marin Bertier Anne-Marie Kermarrec INRIA/IRISA, Rennes, France. Email:

More information

An Adaptive Strategy for Maximizing Throughput in MAC layer Wireless Multicast

An Adaptive Strategy for Maximizing Throughput in MAC layer Wireless Multicast University of Pennsylvania ScholarlyCommons Departmental Papers (ESE) Department of Electrical & Systems Engineering May 24 An Adaptive Strategy for Maximizing Throghpt in MAC layer Wireless Mlticast Prasanna

More information

Hardware Design Tips. Outline

Hardware Design Tips. Outline Hardware Design Tips EE 36 University of Hawaii EE 36 Fall 23 University of Hawaii Otline Verilog: some sbleties Simlators Test Benching Implementing the IPS Actally a simplified 6 bit version EE 36 Fall

More information

Multiple Source Shortest Paths in a Genus g Graph

Multiple Source Shortest Paths in a Genus g Graph Mltiple Sorce Shortest Paths in a Gens g Graph Sergio Cabello Erin W. Chambers Abstract We gie an O(g n log n) algorithm to represent the shortest path tree from all the ertices on a single specified face

More information

3D SURFACE RECONSTRUCTION BASED ON COMBINED ANALYSIS OF REFLECTANCE AND POLARISATION PROPERTIES: A LOCAL APPROACH

3D SURFACE RECONSTRUCTION BASED ON COMBINED ANALYSIS OF REFLECTANCE AND POLARISATION PROPERTIES: A LOCAL APPROACH 3D SURFACE RECONSTRUCTION BASED ON COMBINED ANALYSIS OF REFLECTANCE AND POLARISATION PROPERTIES: A LOCAL APPROACH Pablo d Angelo and Christian Wöhler DaimlerChrysler Research and Technology, Machine Perception

More information

Assigning AS Relationships to Satisfy the Gao-Rexford Conditions

Assigning AS Relationships to Satisfy the Gao-Rexford Conditions Assigning AS Relationships to Satisfy the Gao-Rexford Conditions Lca Cittadini, Giseppe Di Battista, Thomas Erlebach, Marizio Patrignani, and Massimo Rimondini Dept. of Compter Science and Atomation, Roma

More information

The extra single-cycle adders

The extra single-cycle adders lticycle Datapath As an added bons, we can eliminate some of the etra hardware from the single-cycle path. We will restrict orselves to sing each fnctional nit once per cycle, jst like before. Bt since

More information

What do we have so far? Multi-Cycle Datapath

What do we have so far? Multi-Cycle Datapath What do we have so far? lti-cycle Datapath CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instrction being processed in datapath How to lower CPI frther? #1 Lec # 8 Spring2 4-11-2 Pipelining pipelining

More information

The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM

The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM Lectre (Wed /5/28) Lab # Hardware De Fri Oct 7 HW #2 IPS programming, de Wed Oct 22 idterm Fri Oct 2 IorD The mlticycle path SrcA Today s objectives: icroprogramming Etending the mlti-cycle path lti-cycle

More information

Object Pose from a Single Image

Object Pose from a Single Image Object Pose from a Single Image How Do We See Objects in Depth? Stereo Use differences between images in or left and right eye How mch is this difference for a car at 00 m? Moe or head sideways Or, the

More information

Augmenting the edge connectivity of planar straight line graphs to three

Augmenting the edge connectivity of planar straight line graphs to three Agmenting the edge connectivity of planar straight line graphs to three Marwan Al-Jbeh Mashhood Ishaqe Kristóf Rédei Diane L. Sovaine Csaba D. Tóth Pavel Valtr Abstract We characterize the planar straight

More information

REPLICATION IN BANDWIDTH-SYMMETRIC BITTORRENT NETWORKS. M. Meulpolder, D.H.J. Epema, H.J. Sips

REPLICATION IN BANDWIDTH-SYMMETRIC BITTORRENT NETWORKS. M. Meulpolder, D.H.J. Epema, H.J. Sips REPLICATION IN BANDWIDTH-SYMMETRIC BITTORRENT NETWORKS M. Melpolder, D.H.J. Epema, H.J. Sips Parallel and Distribted Systems Grop Department of Compter Science, Delft University of Technology, the Netherlands

More information

Nash Convergence of Gradient Dynamics in General-Sum Games. Michael Kearns.

Nash Convergence of Gradient Dynamics in General-Sum Games. Michael Kearns. Convergence of Gradient Dynamics in General-Sm Games Satinder Singh AT&T Labs Florham Park, NJ 7932 bavejaresearch.att.com Michael Kearns AT&T Labs Florham Park, NJ 7932 mkearnsresearch.att.com Yishay

More information

Chapter 4: Network Layer

Chapter 4: Network Layer Chapter 4: Introdction (forarding and roting) Reie of qeeing theor Roting algorithms Link state, Distance Vector Roter design and operation IP: Internet Protocol IP4 (datagram format, addressing, ICMP,

More information

Resolving Linkage Anomalies in Extracted Software System Models

Resolving Linkage Anomalies in Extracted Software System Models Resolving Linkage Anomalies in Extracted Software System Models Jingwei W and Richard C. Holt School of Compter Science University of Waterloo Waterloo, Canada j25w, holt @plg.waterloo.ca Abstract Program

More information

Combinatorial and Geometric Properties of Planar Laman Graphs

Combinatorial and Geometric Properties of Planar Laman Graphs Combinatorial and Geometric Properties of Planar Laman Graphs Stephen Koboro 1, Torsten Ueckerdt 2, and Kein Verbeek 3 1 Department of Compter Science, Uniersity of Arizona 2 Department of Applied Mathematics,

More information

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction Review Friday the 2st of October Real world eamples of pipelining? How does pipelining pp inflence instrction latency? How does pipelining inflence instrction throghpt? What are the three types of hazard

More information

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this report, we

More information

Optimization and Translation of Tableau-Proofs. Abstract: Dierent kinds of logical calculi have dierent advantages and disadvantages.

Optimization and Translation of Tableau-Proofs. Abstract: Dierent kinds of logical calculi have dierent advantages and disadvantages. J. Inform. Process. Cybernet. EIK?? (199?), (formerly: Elektron. Inform.verarb. Kybernet.) Optimization and Translation of Tablea-Proofs into Resoltion By Andreas Wolf, Berlin Abstract: Dierent kinds of

More information

ARRAY ANTENNA DIAGNOSTICS WITH THE 3D RECONSTRUCTION ALGORITHM

ARRAY ANTENNA DIAGNOSTICS WITH THE 3D RECONSTRUCTION ALGORITHM ARRAY ANTENNA DIAGNOSTICS WITH THE 3D RECONSTRUCTION ALGORITHM Cecilia Cappellin 1, Peter Meincke 1, Sergey Pinenko 2, Erik Jørgensen 1 1 TICRA, Læderstræde 34, DK-1201 Copenhagen, Denmark 2 DTU Elektro,

More information

Rectangle-of-influence triangulations

Rectangle-of-influence triangulations CCCG 2016, Vancoer, British Colmbia, Ag 3 5, 2016 Rectangle-of-inflence trianglations Therese Biedl Anna Lbi Saeed Mehrabi Sander Verdonschot 1 Backgrond The concept of rectangle-of-inflence (RI) draings

More information

UPWARD PLANAR DRAWING OF SINGLE SOURCE ACYCLIC DIGRAPHS. MICHAEL D. HUTTON y AND ANNA LUBIW z

UPWARD PLANAR DRAWING OF SINGLE SOURCE ACYCLIC DIGRAPHS. MICHAEL D. HUTTON y AND ANNA LUBIW z UPWARD PLANAR DRAWING OF SINGLE SOURCE ACYCLIC DIGRAPHS MICHAEL D. HUTTON y AND ANNA LUBIW z Abstract. An pward plane drawing of a directed acyclic graph is a plane drawing of the digraph in which each

More information

Towards applications based on measuring the orbital angular momentum of light

Towards applications based on measuring the orbital angular momentum of light CHAPTER 8 Towards applications based on measring the orbital anglar momentm of light Efficient measrement of the orbital anglar momentm (OAM) of light has been a longstanding problem in both classical

More information

PART I: Adding Instructions to the Datapath. (2 nd Edition):

PART I: Adding Instructions to the Datapath. (2 nd Edition): EE57 Instrctor: G. Pvvada ===================================================================== Homework #5b De: check on the blackboard =====================================================================

More information

On Bichromatic Triangle Game

On Bichromatic Triangle Game On Bichromatic Triangle Game Gordana Manić Daniel M. Martin Miloš Stojakoić Agst 16, 2012 Abstract We stdy a combinatorial game called Bichromatic Triangle Game, defined as follows. Two players R and B

More information

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind Pipelining hink of sing machines in landry services Chapter 6 nhancing Performance with Pipelining 6 P 7 8 9 A ime ask A B C ot pipelined Assme 3 min. each task wash, dry, fold, store and that separate

More information

Chapter 4: Network Layer. TDTS06 Computer networks. Chapter 4: Network Layer. Network layer. Two Key Network-Layer Functions

Chapter 4: Network Layer. TDTS06 Computer networks. Chapter 4: Network Layer. Network layer. Two Key Network-Layer Functions Chapter : Netork Laer TDTS06 Compter s Lectre : Netork laer II Roting algorithms Jose M. Peña, jospe@ida.li.se ID/DIT, LiU 009-09- Chapter goals: nderstand principles behind laer serices: laer serice models

More information

CS 153 Design of Operating Systems Spring 18

CS 153 Design of Operating Systems Spring 18 CS 153 Design of Operating Systems Spring 18 Lectre 11: Semaphores Instrctor: Chengy Song Slide contribtions from Nael Ab-Ghazaleh, Harsha Madhyvasta and Zhiyn Qian Last time Worked throgh software implementation

More information

arxiv: v3 [math.co] 7 Sep 2018

arxiv: v3 [math.co] 7 Sep 2018 Cts in matchings of 3-connected cbic graphs Kolja Knaer Petr Valicov arxiv:1712.06143v3 [math.co] 7 Sep 2018 September 10, 2018 Abstract We discss conjectres on Hamiltonicity in cbic graphs (Tait, Barnette,

More information

Master for Co-Simulation Using FMI

Master for Co-Simulation Using FMI Master for Co-Simlation Using FMI Jens Bastian Christoph Claß Ssann Wolf Peter Schneider Franhofer Institte for Integrated Circits IIS / Design Atomation Division EAS Zenerstraße 38, 69 Dresden, Germany

More information

Bias of Higher Order Predictive Interpolation for Sub-pixel Registration

Bias of Higher Order Predictive Interpolation for Sub-pixel Registration Bias of Higher Order Predictive Interpolation for Sb-pixel Registration Donald G Bailey Institte of Information Sciences and Technology Massey University Palmerston North, New Zealand D.G.Bailey@massey.ac.nz

More information

Tri-Edge-Connectivity Augmentation for Planar Straight Line Graphs

Tri-Edge-Connectivity Augmentation for Planar Straight Line Graphs Tri-Edge-Connectivity Agmentation for Planar Straight Line Graphs Marwan Al-Jbeh 1, Mashhood Ishaqe 1, Kristóf Rédei 1, Diane L. Sovaine 1, and Csaba D. Tóth 1,2 1 Department of Compter Science, Tfts University,

More information

Pavlin and Daniel D. Corkill. Department of Computer and Information Science University of Massachusetts Amherst, Massachusetts 01003

Pavlin and Daniel D. Corkill. Department of Computer and Information Science University of Massachusetts Amherst, Massachusetts 01003 From: AAAI-84 Proceedings. Copyright 1984, AAAI (www.aaai.org). All rights reserved. SELECTIVE ABSTRACTION OF AI SYSTEM ACTIVITY Jasmina Pavlin and Daniel D. Corkill Department of Compter and Information

More information

Multiple-Choice Test Chapter Golden Section Search Method Optimization COMPLETE SOLUTION SET

Multiple-Choice Test Chapter Golden Section Search Method Optimization COMPLETE SOLUTION SET Mltiple-Choice Test Chapter 09.0 Golden Section Search Method Optimization COMPLETE SOLUTION SET. Which o the ollowing statements is incorrect regarding the Eqal Interval Search and Golden Section Search

More information