FPGA PLB Architecture Evaluation and Area Optimization Techniques using Boolean Satisfiability


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. X, NO. XX, APRIL

Andrew C. Ling, Member, IEEE, Deshanand P. Singh, Member, IEEE, and Stephen D. Brown, Member, IEEE

Abstract: This work presents a Field-Programmable Gate Array (FPGA) logic synthesis technique based upon Boolean Satisfiability (SAT). It shows how to map any Boolean function into an arbitrary PLB architecture without any custom decomposition techniques. The authors illustrate several useful applications of this technique by showing how it can be used for architecture evaluation and area optimization. When evaluating FPGA architectures, the authors focus on the basic building block of the FPGA, which they refer to as a programmable logic block (PLB). To illustrate the flexibility of their evaluation framework, several unrelated PLB architectures are evaluated in an automated fashion. Furthermore, the authors show that their technique is able to reduce FPGA resource usage by 27% on average in common subcircuits found in digital design.

Index Terms: Design Automation, Field-programmable gate array, Quantified Boolean Satisfiability, Boolean Satisfiability, Logic Synthesis, Resynthesis.

I. INTRODUCTION

Field-Programmable Gate Arrays (FPGAs) are integrated circuits characterized by two distinct features: programmable logic blocks (PLBs) and programmable interconnect structures. An FPGA consists of groups of PLBs known as clusters, which are connected through programmable connection blocks and switch blocks to form a regular array of clusters as shown in Fig. 1. A cluster combined with its associated routing forms a tile. Previous work has shown that grouping PLBs into clusters greatly improves the performance of FPGAs, since the intra-cluster delay is an order of magnitude less than the inter-cluster delay [1].
Manuscript received November 20, 2005; revised November 30, This work was supported by the IEEE. A. Ling is with the University of Toronto. D. Singh is with Altera Corporation. S. Brown is with Altera Corporation and the University of Toronto.

Fig. 1. An illustration of an FPGA consisting of a regular array of clustered PLBs (labeled elements: PLB, cluster, tile, switch block, I/O pad, routing tracks).

Fig. 2. Programmable Logic Block (labeled elements: 2-LUT, configuration bits $L_1$ to $L_4$, wire $z_1$, output $G$).

An example of a PLB is shown in Fig. 2. In this example, the logic block is composed of a 2-input lookup table (2-LUT) that feeds an AND gate. The 2-LUT is capable of implementing any arbitrary Boolean function of 2 variables. If $K$ is the number of inputs to the LUT, the LUT is implemented with a set of $2^K$ static RAM (SRAM) bits that are programmed with the truth-table values of the function to be implemented (4 SRAM bits in Fig. 2). The 2 inputs ($x_1$, $x_2$) feed a multiplexer that selects the appropriate truth-table value from the SRAM bits. In cases where the PLB consists only of LUTs, we will refer to it as a K-LUT architecture.

A. Motivation

In general, many modern PLBs are based on the K-input lookup table (K-LUT). Although the K-LUT is very flexible, it is usually beneficial to add dedicated non-programmable logic to the PLB, such as adders or XOR/AND gates ([2], [3]). These features increase the number of functions that can be implemented by a PLB without the power, speed, and area costs associated with programmable logic. However, since this reduces the flexibility of the PLB, optimally mapping functions to these non-programmable components is difficult. This creates an area penalty which is hard to quantify objectively.

The PLB area usage significantly affects the cost of the final circuit implementation on the FPGA. Although the FPGA silicon area is dominated by the routing interconnect, the cost of implementing a circuit in an FPGA is directly proportional to the PLB capacity of the FPGA [4]. Since FPGAs are sold in a number of pre-fabricated sizes, decreasing the number of PLBs in the final circuit netlist may allow the circuit to be realized in a smaller FPGA, thereby reducing the cost of the design. Reducing the PLB usage also has a more localized effect, since it allows subcircuits to be realized in a smaller number of clusters. This produces a much faster subcircuit, since it reduces the number of inter-cluster connections in these subcircuits, which are known to dominate FPGA delay [1].

Both the PLB architecture and the technology mapper, which converts a gate-level netlist into a netlist of PLBs, have a large impact on the final area of the circuit. Bad PLB designs and poor-quality technology mappers can lead to very costly circuit implementations with poor performance in the FPGA. Thus, it is important to evaluate PLB architectures and develop high-quality technology mappers during FPGA development. In this paper, we present two tools that accomplish both of these goals using a new PLB function mapping approach based on Boolean Satisfiability (SAT). We will illustrate that the main benefit of our technique is its generality: it can be applied to any PLB architecture and requires no custom decomposition techniques. The first tool we present helps quantify the area usage of various PLB architectures. The second tool is a resynthesis technique which is guaranteed to optimally map functions to small subcircuits. During our resynthesis study, we focus on a class of functions known to be non-disjoint, which, as we will show, synthesis tools and technology mappers have great difficulty mapping optimally.
Before we introduce our tools, we present some background on the technology mapping problem in Section II. This is followed by a description of the SAT problem and an explanation of the transformation of PLB function mapping into the SAT problem in Section III. We follow this with a detailed description of our PLB evaluation method and resynthesis technique in Section IV. Finally, we present several results illustrating the generality of our technique in Section V.

II. BACKGROUND

A. Technology Mapping

Technology mapping a circuit description into a netlist of PLBs occurs after logic synthesis. Logic synthesis optimizes a gate-level circuit description through a sequence of technology-independent transformations [5] to improve area and delay. In this work, delay is considered proportional to the depth of a circuit, where the depth of a node is defined as the length of the longest path from the node to a primary input. A primary input is any node in a circuit with no fanin, such as an input pin. The dual of this is a primary output, which is any node in a circuit with no fanout, such as an output pin. Technology mapping takes the optimized gate-level netlist and converts it into a netlist of PLBs. Previous work showed that the depth-optimal technology mapping solution can be obtained in polynomial time using a dynamic programming procedure [6]. Because logic synthesis and technology mapping are performed as separate, disjoint steps, the resulting technology-mapped circuits are often far from optimal. In later sections, we will show methods to resolve this problem through SAT.
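The depth definition used above can be made concrete with a short Python sketch. It assumes the netlist is stored as a dictionary that maps each node to the list of nodes driving it; this layout and the name `node_depths` are hypothetical and chosen only for illustration, not taken from any tool discussed in this paper.

```python
from functools import lru_cache


def node_depths(fanins):
    """Return the depth of every node: the longest path back to a primary input.

    `fanins` maps each node name to the list of nodes that drive it.
    Primary inputs have an empty fanin list. This dictionary layout is an
    assumption made for illustration only.
    """

    @lru_cache(maxsize=None)
    def depth(node):
        drivers = fanins[node]
        if not drivers:
            # A primary input has no fanin, so its depth is 0.
            return 0
        # Otherwise the node sits one level above its deepest driver.
        return 1 + max(depth(d) for d in drivers)

    return {node: depth(node) for node in fanins}


# Small example netlist: f = (a AND b) OR c
netlist = {"a": [], "b": [], "c": [], "and1": ["a", "b"], "f": ["and1", "c"]}
depths = node_depths(netlist)
```

In this example the primary inputs have depth 0, the AND gate has depth 1, and the output `f` has depth 2, which is the quantity a depth-oriented mapper tries to minimize.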

The process of technology mapping is often treated as a covering problem. For example, consider the process of mapping a circuit into LUTs as illustrated in Fig. 3. Fig. 3a illustrates the initial gate-level netlist, Fig. 3b illustrates a possible covering of the initial netlist using LUTs, and Fig. 3c illustrates the LUT netlist produced by the covering. In the mapping given, the gate labeled x is covered by both LUTs and is said to be duplicated. In a duplication-free mapping, each gate in the initial circuit is covered by a single LUT in the mapped circuit [7]. However, the controlled use of duplication can lead to further area savings [8]. In contrast to the depth minimization problem, the area minimization problem was shown to be NP-hard for LUTs of size four and greater ([9], [10]). Thus, solving the area minimization problem in practice requires heuristics.

Fig. 3. Technology mapping as a covering problem: (a) initial netlist, (b) possible covering, (c) LUT mapping. The example circuit has inputs a to e and outputs f and g.

Another way to look at technology mapping is as a cone selection problem. The subcircuits circled in Fig. 3b are examples of cones. Technology mapping seeks to find the best set of cones that can be mapped to the current PLB architecture, where "best" is determined by the optimization goal, such as area, speed, or power. If the FPGA architecture consists solely of K-LUTs, mapping from cones to K-LUTs is a direct process, since any cone with K inputs or fewer can be implemented in a K-LUT. A cone with K inputs or fewer is said to be K-feasible. Thus, to technology map circuits to K-LUTs, the circuit simply has to be decomposed into a set of K-feasible cones. However, if the FPGA architecture consists of generic K-input PLBs, mapping from cones to PLBs is much more difficult, since PLBs cannot implement all possible K-feasible cones. For example, the PLB in Fig. 4 cannot implement a 3-input OR gate. Previous work solved this problem using two main approaches. In the first, a specialized PLB is proposed and a customized mapping algorithm is implemented to map benchmark circuits to the proposed PLB [11]. In the second, functions are decomposed using specialized Boolean matching techniques so that they match the structure of the PLB [12]. A problem with both of these approaches is that they require specific Boolean techniques to map functions to a given PLB architecture. We solve this problem in a general manner using SAT, which makes our technique applicable to any PLB architecture and any Boolean function.

Fig. 4. Example PLB (labeled elements: LUT, configuration bits $L_1$ to $L_4$, wire $z_1$, output $G$).

Although more limited in functionality, PLBs offer speed, area, and power advantages over fully programmable K-LUTs. In general, only a small subset of K-feasible cones will appear in most logic circuits; therefore, as long as a given PLB architecture captures most cones encountered in real circuits, it will be successful in implementing those circuits. In [12], the authors evaluate PLBs based upon the number of functions a given PLB can implement. We adopt a similar measure: we determine the flexibility of a PLB by extracting a set of K-feasible cones from benchmark circuits and determining how many of these cones fit into the PLB, where a high fit percentage is desired. Although we adopt a comparison metric similar to that of [12], no previous work has been general enough to apply to all the PLB architectures we present later.

The PLB flexibility only gives a preliminary estimate of the efficiency of the PLB. To gauge how much area overhead the non-programmable components in the PLB will add to an FPGA, a full area estimate of the FPGA device is necessary. This can be calculated by deriving the number of PLBs required to implement a given circuit, in conjunction with an area estimate of an FPGA tile containing a cluster of PLBs. We present two tools in this paper that perform both of these tasks.

III. BOOLEAN SATISFIABILITY APPLIED TO FPGA FUNCTION MAPPING

The following sections provide a brief overview of Boolean satisfiability and quantified Boolean satisfiability. The informed reader may skip these sections and go on to Section III-B. Boolean Satisfiability (SAT) has gained recent interest, particularly in CAD for digital circuits. The primary reason for this is that several problems that occur in CAD can be represented as a Boolean formula and thus can be solved using SAT.
SAT was the first problem shown to be NP-complete [13, ch. 34] and is formally defined as follows:

Definition 3.1: The Boolean Satisfiability Problem: Given a Boolean formula defined on variables $x_1, x_2, \ldots, x_n$, seek an assignment to these variables such that the Boolean formula evaluates to true. If this is possible, the Boolean formula is said to be satisfiable (SAT); otherwise, it is said to be unsatisfiable (UNSAT).

For ease of readability, we will use the term SAT to refer to both Boolean Satisfiability and satisfiable when the meaning is obvious from the context. SAT solvers are tools that seek to solve the SAT problem. Generally, SAT solvers work on Boolean formulae in Conjunctive Normal Form (CNF, also known as Product-of-Sums). A Boolean formula is in CNF if it consists only of a conjunction of clauses, where each clause is a disjunction of literals and a literal is any variable or its complement. In this form, SAT seeks an assignment of variables such that every clause in the Boolean formula has at least one literal evaluating to true. For example, Fig. 5 gives an illustration of a Boolean formula in CNF and a satisfying assignment.

Fig. 5. An example CNF with a satisfying assignment. The figure shows a three-clause formula $F$ over $x_1, x_2, x_3$ with one literal and one clause annotated, and the satisfying assignment $x_1 = 0$, $x_2 = 1$, $x_3 = 1$.

A. Quantified Boolean Satisfiability

Fig. 6. An example QBF (a three-clause formula $F$ with quantifiers applied to its variables).

SAT is actually a special case of a much more difficult problem called Quantified Boolean Satisfiability (QSAT). Definition 3.1 still holds for QSAT; however, QSAT is the more general problem of determining whether a Quantified Boolean Formula (QBF) is satisfiable or not. A QBF is a Boolean formula where quantifiers are applied to its variables. For example, Fig. 6 shows an example of a QBF in CNF. A Boolean formula is a special case of a QBF in which all the variables have an implicit existential quantifier.

QSAT is known to be PSPACE-complete [14]. Although not formally proven, PSPACE-complete problems are thought to be harder than NP-complete problems, where NP $\subseteq$ PSPACE. An intuitive explanation of this can be given through a simple example. Consider Equation 1, which shows a simple Boolean expression and a possible satisfying assignment. Now consider the same expression with quantifiers added to its variables, shown in Equation 2. The satisfying assignment to the QBF shown in Equation 2 is much more elaborate than its unquantified counterpart. This simple example shows that QSAT must explore a much larger search space to find a satisfying solution when compared against SAT.

Fig. 7. A QSAT complexity example. Equation (1) is a three-clause expression with the satisfying assignment $x_1 = 1$, $x_2 = 1$, $x_3 = 0$; Equation (2) is the same expression with quantifiers added, for which the assignments $x_1 = 1$, $x_2 = 1$, $x_3 = 0$ and $x_1 = 0$, $x_2 = 0$, $x_3 = 0$ are listed.

B. Transforming FPGA Function Mapping to SAT

At its core, mapping digital circuits to FPGA fabric is the process of decomposing a circuit into a set of Boolean functions that map into a netlist of PLBs. In general, mapping Boolean functions into programmable logic is not trivial, since general programmable structures with K inputs, such as PLBs, can only implement a small subset of K-input functions. Interestingly, this problem can be represented as a QBF and solved using QSAT, where a satisfying assignment indicates that the mapping is possible. Furthermore, if the QBF is satisfiable, QSAT will return the programmable configuration necessary to implement the Boolean function in the given programmable structure. We will show later how to transform QSAT into SAT and use SAT solvers to solve the simplified problem.

In order to formulate the Boolean function mapping problem as QSAT, it needs to be formalized as follows:

Problem 3.2: A Boolean function, $F$, with $n$ inputs can be realized in a programmable circuit, $G$, with $m$ inputs and $l$ programmable bits, where $n \le m$, if and only if there exists at least one configuration of the $l$ programmable bits such that $G \equiv F$ for all inputs applied in the same manner to $G$ and $F$.

The QBF representation of Problem 3.2 is shown in Fig. 8. To ensure that the inputs to $F$ and $G$ are applied in the same manner, we represent their inputs by the same variables, $x_1, x_2, \ldots, x_n$. $L_1, L_2, \ldots, L_l$ represent the programmable bits, and $z_1, z_2, \ldots, z_o$ represent any auxiliary variables found in the expression $G \equiv F$. The existence of auxiliary variables will be explained later; it is a side effect of the derivation method we use to construct $G \equiv F$.

$$H = \exists L_1 L_2 \ldots L_l \; \forall x_1 x_2 \ldots x_n \; \exists z_1 z_2 \ldots z_o \; (G \equiv F)$$
Fig. 8. QBF representation of Problem 3.2.

To derive the expression $G \equiv F$, we adapt the circuit characteristic function. For a detailed description of the characteristic function, please refer to [15]. A characteristic function, $\psi$, is a Boolean representation of a digital circuit in CNF, which can be modified to express $G \equiv F$. The characteristic function describes all consistent input, output, and intermediate wire vectors of the digital circuit. For example, consider the OR-gate shown in Fig. 9.
Next to the OR-gate is its characteristic function truth table. The onset of this truth table represents all cubes that are consistent with an OR-gate, such as $x_1 = 0$, $x_2 = 1$, $G = 1$. Using any standard minimization procedure, the OR-gate characteristic function can be derived as shown in Fig. 9.

$$F_{OR} = (x_1 + x_2 + \overline{G})(\overline{x_1} + G)(\overline{x_2} + G)$$
Fig. 9. Deriving a characteristic function for an OR-gate.

Characteristic functions for large circuits can be derived from the conjunction of the characteristic functions of their basic elements. For example, consider Fig. 10, which consists of an OR-gate fed by a 2:1 MUX. To construct the characteristic function of this circuit, we simply conjoin (logical AND) the characteristic functions of the MUX and the OR-gate (the MUX characteristic function can be derived using the technique shown in Fig. 9). The auxiliary variables, $z_i$, in Fig. 8 stem from the intermediate wires found in the configurable circuits. This is seen in Fig. 10 with variable $z_1$. These auxiliary variables provide a logical link between the basic characteristic functions to form a unified expression representing the entire configurable circuit.

$$F_{MUX\text{-}OR} = \underbrace{(x_1 + z_1 + \overline{G})(\overline{x_1} + G)(\overline{z_1} + G)}_{F_{OR}} \cdot F_{MUX} \quad (3)$$
Fig. 10. Deriving characteristic functions from basic elements. $F_{MUX}$ consists of four clauses, each of which contains the MUX output $z_1$.

Using the previous procedure, the characteristic function, which we will refer to as $\psi$, can be found for any configurable circuit. Using $\psi$, we can derive the expression for $G \equiv F$. To represent the equivalence operator, all instances of the variable $G$ in $\psi$ are replaced with the expression representing $F$. This substitution is written $\psi[G/F]$ and will be used in later sections. Going back to the original problem formulation, our goal is to determine whether there exists a configuration of our circuit $G$ such that $G \equiv F$ for all inputs applied to $G$. Thus, as one final step, quantifiers are added to the expression $G \equiv F$ to form a final CNF expression representing Problem 3.2, as shown in Fig. 11, where $L_1 L_2 \ldots L_l$ represent the configuration bits and $x_1 \ldots x_n$ represent the inputs to $G$.

1) Removing Quantifiers on QBF to form SAT: Although function $H$ in Fig. 11 can be solved using QSAT, it is often faster to remove the quantifiers found on a QBF and use SAT solvers.
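Read operationally, $H$ asks whether some setting of the configuration bits makes the circuit agree with $F$ on every input. For very small circuits this can be checked by direct enumeration, as in the following Python sketch. It is an illustration only and not the CNF-based flow of this paper: the PLB model (a 2-LUT on $x_1$ and $x_2$ whose output is ANDed with $x_3$, loosely following the description of Fig. 2), the fixed input ordering, and the names `plb` and `find_config` are all assumptions.

```python
from itertools import product


def plb(config, x1, x2, x3):
    """Hypothetical PLB: a 2-LUT on (x1, x2) whose output is ANDed with x3.

    `config` holds the four LUT bits; entry 2*x1 + x2 is the selected bit.
    """
    lut_out = config[2 * x1 + x2]
    return lut_out & x3


def find_config(f, n_inputs=3, n_bits=4):
    """Return a configuration realizing f in the PLB, or None if none exists.

    Outer loop: "there exists a configuration L".
    Inner check: "for all inputs x, G(L, x) == F(x)".
    """
    for config in product((0, 1), repeat=n_bits):
        if all(plb(config, *x) == f(*x)
               for x in product((0, 1), repeat=n_inputs)):
            return config
    return None


def and3(a, b, c):
    return a & b & c


def or3(a, b, c):
    return a | b | c


and3_config = find_config(and3)  # a configuration exists
or3_config = find_config(or3)    # no configuration exists
```

On this assumed PLB, the 3-input AND is realizable, while the 3-input OR is not, because the AND gate forces the output to 0 whenever $x_3 = 0$. The enumeration visits up to $2^l \cdot 2^n$ combinations, which is why a solver-based formulation is preferable for anything beyond toy sizes.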
In [16], the author presents a method to remove all quantifiers on a QBF to convert the QSAT problem into SAT. We adopt a similar method in this work to remove the quantifiers in Fig. 11. To do this, the unquantified expression $\psi[G/F]$ is first replicated $2^n$ times and the copies are conjoined together, where $n$ is the number of universally quantified variables ($x_1, \ldots, x_n$). This is shown in Equation 4. Next, the universal variables in each replicated expression are replaced with one possible enumeration such that no two replicated expressions have identical enumerations, which is shown in Equation 5. The purpose of this is to explicitly cover all possible values of the universal variables. In addition, the variables bound to the innermost existential quantifier ($z_1 z_2 \ldots z_o$) are replaced with unique variables in each replicated expression. This preserves the meaning of their original existential quantifier. Finally, the remaining existential quantifier on the configuration variables $L_1 L_2 \ldots L_l$ does not have to be shown explicitly, since variables without an explicit quantifier implicitly have an existential quantifier applied to them. The resulting expression can then be passed to a SAT solver, where a satisfying assignment implies that $F$ can be realized in $G$.

$$H = \exists L_1 L_2 \ldots L_l \; \forall x_1 x_2 \ldots x_n \; \exists z_1 z_2 \ldots z_o \; (\psi[G/F])$$
Fig. 11. Characteristic function based representation of Problem 3.2.

$$\Psi = \psi_0 \wedge \psi_1 \wedge \psi_2 \wedge \ldots \wedge \psi_{2^n - 1} \quad (4)$$
$$= \psi_0[x_1/0, x_2/0, \ldots, x_n/0, z_1/z_{o+1}, \ldots, z_o/z_{2o}] \wedge \psi_1[x_1/0, \ldots, x_n/1, z_1/z_{2o+1}, \ldots, z_o/z_{3o}] \wedge \ldots \wedge \psi_{2^n - 1}[x_1/1, \ldots, x_n/1, z_1/z_{(2^n - 1)o+1}, \ldots, z_o/z_{2^n o}] \quad (5)$$
Fig. 12. Removing the quantifiers on the QBF in Fig. 11.

Fig. 13. Permutation example: (a) original function $F(X)$, (b) new function $F(X)$.

2) Permutable Inputs: In addition to having programmable bits that select the function being implemented, the inputs of programmable circuits are usually permutable. This greatly expands the number of functions a programmable circuit can implement. For example, consider the simple 3-input function shown in Fig. 13a. By exchanging the inputs $x_1$ and $x_2$, a new function can be realized as shown in Fig. 13b. In fact, a K-input function can be transformed into at most $K! - 1$ other functions by simply permuting its variables in every possible way. The way to model this flexibility in a digital circuit is to add multiplexers at the inputs as shown in Fig. 14. These virtual multiplexers are extremely versatile in that they can also add restrictions on routing. For example, assume configuration $V_5 V_6 = 11$ would feed input $x_3$ into the XOR-gate ($z_4$). In order to prevent this, the clause $(\overline{V_5} + \overline{V_6})$ is conjoined to the PLB characteristic function.

Fig. 14. Modeling permutable inputs of a programmable circuit (labeled elements: 2-LUT, configuration bits $L_1$ to $L_4$, virtual multiplexer select bits $V_1$ to $V_6$, wires $z_1$ to $z_4$, output $G$).

3) Function Mapping using SAT Example: To give a better understanding of the previously described concepts, an example is provided. Taking the circuit shown in Fig. 14, we wish to determine whether the Boolean function described in Fig. 13a can be realized in it. The first step is to create the expression $\psi[G/F]$. This is done by finding the characteristic function of Fig. 14, as shown in Fig. 15. Notice that the characteristic function $\psi$ is created from the conjunction of the basic characteristic function components that form the configurable circuit. Following the construction of $\psi$, all instances of $G$ need to be replaced by $F$ to create the expression $G \equiv F$, as shown in Fig. 16.

Fig. 15. Characteristic function of the PLB seen in Fig. 14. It is built from five components: $G_{LUT}$ (6), eight clauses over $z_3$, $z_2$, $z_1$ and the LUT bits $L_1$ to $L_4$ (two clauses per LUT bit); $G_{XOR}$ (7), four clauses over $z_1$, $z_4$ and $G$; and $G_{VMUX1}$ (8), $G_{VMUX2}$ (9), $G_{VMUX3}$ (10), six clauses each over the virtual multiplexer select bits and the multiplexer outputs $z_2$, $z_3$ and $z_4$ respectively. These are conjoined as
$$\psi = G_{LUT} \wedge G_{XOR} \wedge G_{VMUX1} \wedge G_{VMUX2} \wedge G_{VMUX3} \quad (11)$$

$$(G \equiv F) = \psi[G/F] \quad (12)$$
Fig. 16. Forming the expression $G \equiv F$.

The expression $\psi[G/F]$ depends on the variables representing all inputs, the output, the configuration bits, and the intermediate wires. In this form, quantifiers can be added to these variables and the final QBF can be solved using a QBF solver, where a satisfying assignment implies that function $F$ can be realized in the configurable circuit. This QBF was shown in Fig. 11, where $x_1 \ldots x_n$, $L_1 L_2 \ldots L_l$, and $z_1 z_2 \ldots z_o$ represent the inputs, configuration bits, and intermediate wire variables respectively. As an extra step, there is the option to remove all quantifiers and solve the final expression using the SAT method shown previously in Fig. 12.

IV. FPGA AREA DRIVEN SAT BASED APPLICATIONS

The primary strength of the SAT technique shown in the previous section is its generality. There are no restrictions on the type of circuit or function that it can represent. We demonstrate several algorithms that use our SAT-based decision process. These applications can be categorized into PLB evaluation [17] and resynthesis. The PLB evaluation algorithm provides a quantitative area assessment of new PLB architectures, while the resynthesis algorithm helps reduce the final area of the circuit implementation in an FPGA.

A. Application to PLB Evaluation

In order to evaluate PLB area efficiency, we take two approaches. First, we develop a tool to characterize the flexibility of a PLB. The metric we use to represent PLB flexibility is a fit percentage, which is the percentage of cones sampled from various circuits that can be realized in a single PLB. Using this metric, PLBs with a high fit percentage are considered more flexible than PLBs with a low fit percentage. This gives a rough indication of how often the non-programmable components are expected to be utilized. Our second approach yields a more conclusive area estimate associated with a given PLB architecture. This approach uses the PLB resource usage required to implement various circuits, together with the FPGA tile area, to derive an overall area estimate. The flow of this process is illustrated in Fig. 17.

Fig. 17. Area estimation flow for a given PLB architecture. A PLB description and a BLIF netlist of primitive gates are passed to PLB technology mapping, which produces a technology-mapped netlist and the PLB usage; the PLB usage is combined with the MWT area of a single FPGA tile to give the MWT area.

As Fig. 17 shows, in order to derive the PLB usage, a generic PLB technology mapper is necessary. Since area is our primary comparison metric, the technology mapper must be competitive with state-of-the-art technology mappers, yet general enough to map to any PLB architecture. We achieve the competitiveness requirement by using the same heuristics as the IMap LUT mapper, which is reported to produce among the best area results of known technology mappers; and we achieve the generality requirement by using our SAT-based function mapping technique to map functions to any PLB architecture. This is shown at the top of Fig. 17, where the PLB circuit description and the netlist are passed to the PLB technology mapper. The technology mapper uses this description to generate the CNF expression representing the PLB, as described previously in Section III-B.

Once the PLB usage is derived, we can estimate how much area a technology-mapped circuit will consume. Since FPGA area is known to be dominated by transistor area, we use minimum width transistor area [18] as our estimate of the overall area taken by the circuit. This is shown at the bottom of Fig. 17. Minimum width transistor area gives a process-independent metric of the number of transistors required to implement a given circuit, where larger transistors are counted as several minimum width transistors.

$$MWT_{device} = \mathrm{CEIL}\left(\frac{USAGE_{PLB}}{PLBs\ per\ TILE}\right) \cdot MWT_{Tile}$$
Fig. 18. Minimum width transistor (MWT) count for the smallest device capable of fitting the given circuit.

The way we estimate total area from PLB usage and tile area is shown in Fig. 18. $MWT_{Tile}$ is the minimum width transistor estimate of a single tile. $USAGE_{PLB}$ is the number of PLBs required to implement the given circuit. The term $\mathrm{CEIL}(USAGE_{PLB} / PLBs\ per\ TILE)$ returns the number of tiles required to implement a given circuit using a particular PLB architecture. Finally, $MWT_{device}$ is the area estimate of the smallest FPGA required to implement the technology-mapped circuit returned by our generic PLB technology mapper. Using this metric, a fair area comparison of various PLB architectures can be made.
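The device-area estimate of Fig. 18 is a one-line calculation; the following Python sketch spells it out. The function name `mwt_device` and the numbers in the example are hypothetical and are not measurements from this paper.

```python
def mwt_device(usage_plb, plbs_per_tile, mwt_tile):
    """Minimum-width-transistor area of the smallest device that fits the circuit.

    usage_plb      -- number of PLBs in the technology-mapped netlist
    plbs_per_tile  -- number of PLBs in one cluster (one tile)
    mwt_tile       -- minimum-width-transistor area of a single tile
    """
    if plbs_per_tile <= 0:
        raise ValueError("plbs_per_tile must be positive")
    # Integer ceiling division: a partially used tile still costs a full tile.
    tiles = -(-usage_plb // plbs_per_tile)
    return tiles * mwt_tile


# Illustrative numbers only: 1234 PLBs, 8 PLBs per tile, 15000 MWT per tile.
area = mwt_device(usage_plb=1234, plbs_per_tile=8, mwt_tile=15000)
```

With these example values, 1234 PLBs occupy 155 tiles (154 full tiles plus one partially used), so the estimate is 155 × 15000 = 2,325,000 minimum width transistors. The ceiling is what makes a PLB architecture with slightly lower usage potentially no cheaper if it does not free a whole tile.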
1) PLB Fit Percentage: Fig. 19 shows a high-level overview of our PLB fit percentage algorithm. As stated previously, PLBs that can capture the functionality of most cones found in real circuits are desired, since their non-programmable components will not be wasted. To help find such PLBs, our tool can be used to return a PLB cone fit percentage, where a high fit percentage is preferred. This fit percentage is found by extracting a set of cones from a list of circuits, then applying our SAT decision step to remove cones that do not fit in the given architecture, as shown in lines 1 and 2 of Fig. 19. By recording the number of cones generated and discarded, a fit percentage for various PLB architectures can be found.

    1 X ← GENERATECONES()
    2 Y ← REMOVENOFITCONES()
    3 FitPercent ← (X − Y)/X
Fig. 19. An overview of the PLB evaluation algorithm.

A version of the algorithm described in [8] is used to generate and store all K-feasible cones in the graph. The K-feasible cones are generated as the graph is traversed in topological order from primary inputs to primary outputs. At every internal node v, new cones are generated by combining the cones at the input nodes.

2) Technology Mapping Using SAT: Our function mapping technique allows us to convert any K-LUT technology mapper into a K-input PLB technology mapper. As stated in Section II-A, technology mapping to LUTs can be considered as a covering problem. The same is true for K-input PLBs; however, because a K-input PLB is not fully programmable, not all K-input cones can fit into the PLB. Thus, when generating cones during the technology mapping phase, cones that do not fit into the given PLB should be discarded. This leaves a set of cones guaranteed to fit into the PLB architecture.

    1 GENERATECONES()
    2 REMOVENOFITCONES()
    3 for i ← 1 upto MaxI
    4     TRAVERSEFWD()
    5     TRAVERSEBWD()
    6 end for
    7 CONESTOPLBS()
Fig. 20. High-level overview of the generic PLB technology mapper algorithm.

We base our work on IMap [19], an iterative K-LUT technology mapping algorithm. For a detailed description of IMap, please refer to [19], which shows that IMap produces among the best area results of any known technology mapper.
Here, we give a brief overview of the algorithm; the basic framework of our technology mapper is presented in Fig. 20. First, a call to GENERATECONES generates a set of K-feasible cones for each node in the graph, where K is the input size of the PLB. Next, a call to REMOVENOFITCONES discards all cones that cannot fit into the PLB architecture. This decision process uses SAT as described in Section III-B. Once a set of valid cones is found, a series of forward and backward graph traversals is started to select the best cover of the graph. The cost of the cover is measured in terms of area and depth. The forward traversal, TRAVERSEFWD, selects a cone for each node, and the backward traversal, TRAVERSEBWD, selects a set of cones to cover the graph. Iteration is beneficial because every backward traversal influences the behavior of the forward traversal that follows it.
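The cone-filtering step shared by Fig. 19 and Fig. 20 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cones are represented by 3-input truth tables, the fit test is a brute-force stand-in for the SAT decision step, and the PLB model (a 2-LUT on the first two inputs ANDed with the third, with a fixed input ordering) as well as the names `fit_percentage` and `fits_lut_and_plb` are hypothetical.

```python
from itertools import product


def fit_percentage(cones, fits):
    """Discard cones that do not fit and report the fit percentage.

    Returns (kept_cones, percent), mirroring X, Y and FitPercent in Fig. 19.
    """
    cones = list(cones)
    kept = [cone for cone in cones if fits(cone)]   # REMOVENOFITCONES
    generated = len(cones)                          # X: cones generated
    discarded = generated - len(kept)               # Y: cones discarded
    percent = 100.0 * (generated - discarded) / generated if generated else 0.0
    return kept, percent


def fits_lut_and_plb(truth_table):
    """Brute-force stand-in for the SAT fit test on a hypothetical PLB.

    The PLB is assumed to compute LUT(a, b) AND c; `truth_table` is indexed
    by 4*a + 2*b + c. Returns True if some LUT configuration matches it.
    """
    inputs = list(product((0, 1), repeat=3))
    for config in product((0, 1), repeat=4):
        if all((config[2 * a + b] & c) == truth_table[4 * a + 2 * b + c]
               for a, b, c in inputs):
            return True
    return False


# Four example cones given as truth tables over inputs (a, b, c).
cones = {
    "and3": (0, 0, 0, 0, 0, 0, 0, 1),
    "or3": (0, 1, 1, 1, 1, 1, 1, 1),
    "pass_c": (0, 1, 0, 1, 0, 1, 0, 1),
    "xor3": (0, 1, 1, 0, 1, 0, 0, 1),
}
kept, percent = fit_percentage(cones.values(), fits_lut_and_plb)
```

In this toy example only the 3-input AND and the function that passes `c` through fit the assumed PLB, giving a fit percentage of 50%. In a mapper, the `kept` list is what the forward and backward traversals would then select from.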

During the forward traversal, the algorithm updates the depth and the area flow for every node and edge encountered. Area flow is a heuristic that estimates the area of the mapping solution below a node or an edge; minimizing it leads to smaller mapping solutions, as described in [19]. The definition of area flow is repeated here for convenience.

$$\mathrm{AreaFlow}(v) = 1 + \sum_{i \in \mathrm{fanin}(C_v)} \mathrm{AreaFlow}(i) \tag{13}$$

$$\mathrm{AreaFlow}(i) = \frac{\mathrm{AreaFlow}(u)}{\mathrm{fanout}(C_u)} \tag{14}$$

Fig. 21. Area flow definition for a node $v$ and an edge $i$. Note that $i$ is an edge that flows from node $u$ to node $v$, $C_v$ is the cone selected to cover $v$, and $\mathrm{fanout}(C_u)$ is the number of fanouts leaving the cone covering $u$.

As Fig. 21 shows, iteration is necessary since area flow is influenced by the covering found in the previous backward traversal ($C_v$). In the first iteration, where no previous backward traversal has occurred, $C_v$ is estimated as the node $v$ itself. Also, $\mathrm{fanout}(C_u)$ must be estimated; it is taken as the weighted average of the values from the previous iterations and is initially estimated as $\mathrm{fanout}(u)$. A detailed description of this procedure can be found in [19].

At each internal node $v$, a cone rooted at $v$ is selected to cover $v$ and some of its predecessors in a mapping solution. The quality of the mapping solution is determined by the cone selection procedure. During area-oriented mapping, on the first mapping iteration, the cone with the lowest area flow is selected. If cones have equal area flow, the cone with the lowest depth is selected. During depth-oriented mapping, the first forward traversal establishes the optimal mapping depth, ODepth, which is then used in subsequent iterations to bound the depth of the cones selected at every node. Using the optimal depth and the height of a node $v$, a bound can be defined on the depth of a cone $C_v$ as follows:

$$\mathrm{depth}(C_v) \le \mathrm{ODepth} - \mathrm{height}(v) \tag{15}$$

The height of a node or cone is defined as the longest path from that node or cone to a primary output of the circuit. Cones that meet the bound are preferred, and among the cones that meet the bound, those with lower area flow are selected. This selection strategy ensures that the mapping solution still achieves the optimal depth while minimizing area.

During the backward traversal, internal nodes of the graph are visited in reverse topological order and a cover of cones is produced. During this traversal, height(v) of every internal node is updated to the height of the cone covering it, for use in Equation (15) in the next forward traversal. If $v$ is found in several cones, the largest height is used. Finally, a call to CONESTOPLBS converts the cones selected by the final backward traversal into PLBs.

3) Generating K-feasible Cones: A version of the algorithm described in [8] is used to generate and store all K-feasible cones in the graph. The K-feasible cones are generated as the graph is traversed in topological order from primary inputs to primary outputs. At every internal node $v$, new cones are generated by combining the cones at the input nodes. The original IMap algorithm combined the cones in every possible way. In our work, to prune the number of cones explored, the cone generation algorithm collapses cones if they have no more than $(K + e)$ distinct inputs in total, i.e., $(l + m - 1) \le (K + e)$, where $l$ and $m$ are the numbers of distinct inputs to the two cones being collapsed into one. As long as $e$ was set to a sufficiently high number (2 in our experiments), this heuristic increased the speed of the cone generation process without significantly impacting the quality of the mapping solution.

B. Resynthesis

In this section, we address a weakness of technology mapping: the technology mapper often fails to find an optimal solution for subcircuits. We consider publicly available state-of-the-art K-LUT technology mappers. As stated in Section II-A, technology mapping is a step that follows gate-level synthesis, and the two steps are largely disjoint. A problem arises when the cost metrics of gate-level synthesis and technology mapping do not coincide. This is explored in detail in Section V. To bridge this gap between synthesis and technology mapping, we introduce a post-technology-mapping step that optimally resynthesizes small subcircuits.

1) Subcircuit Resynthesis: Resynthesizing several subcircuits in a sliding window fashion can reduce the overall LUT count of the entire circuit. Since a subcircuit of LUTs forms a cone, the subcircuit resynthesis problem is the function fitting problem stated previously in Problem 3.2. In this case, the Boolean function is extracted from a subcircuit consisting of X K-input LUTs and is then checked to see whether it fits into a programmable structure containing fewer than X K-input LUTs. This check is performed with our SAT-based technique; Fig. 22 illustrates the process.
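For very small structures, the fit check can be imitated by exhaustive search instead of SAT. The sketch below is a stand-in under stated assumptions, not the SAT formulation of Section III-B: it asks whether a three-input function can be realized by two cascaded 2-LUTs, trying every pair of 4-bit LUT configurations and every choice of which input bypasses the first LUT. The names `lut2` and `fits_two_lut_cascade` are hypothetical, and free permutation of the inputs is assumed.

```python
from itertools import product


def lut2(config, a, b):
    """Evaluate a 2-input LUT whose 4-bit truth table is the integer `config`."""
    return (config >> ((a << 1) | b)) & 1


def fits_two_lut_cascade(f):
    """Brute-force fit check for a cascade LUT2(LUT1(x_i, x_j), x_k).

    `f` is a 3-input Boolean function over 0/1 values. Returns a witness
    (config1, config2, late_input_index) if some configuration realizes f
    on all 8 input vectors, otherwise None.
    """
    vectors = list(product((0, 1), repeat=3))
    for late in range(3):                      # input that feeds the second LUT directly
        early = [i for i in range(3) if i != late]
        for c1, c2 in product(range(16), repeat=2):
            if all(lut2(c2, lut2(c1, x[early[0]], x[early[1]]), x[late]) == f(*x)
                   for x in vectors):
                return c1, c2, late
    return None
```

Under these assumptions, a three-input AND or XOR fits into the cascade, whereas the three-input majority function does not, so a cone computing majority could not be reduced to two 2-LUTs this way. The search grows exponentially with the number of configuration bits, which is why a SAT solver is used for realistic PLBs.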
The original cone in Fig. 22a consists of three 2-LUTs that implement a three-input function. Since only three inputs enter the cone, it may be possible to resynthesize Fig. 22a into Fig. 22b and save one LUT. To determine whether resynthesis from Fig. 22a to Fig. 22b is possible, Fig. 22b is converted into a CNF expression as described in Section III-B, and the function extracted from Fig. 22a is tested using SAT to see if it fits into Fig. 22b. If the expression is satisfiable, resynthesis can proceed.

Fig. 22. Resynthesis of a three-input cone of logic. (a) Original cone. (b) Resynthesized cone.

Unfortunately, resynthesis of subcircuits with more than 6 distinct inputs cannot be used in real-time resynthesis engines due to speed limitations. However, this technique can be used to build a cache of optimal configurations of digital logic blocks. A similar technique is used in [20], where the authors focus on multiplexer transformations. In [20], the authors traverse a technology mapped netlist and identify multiplexers. Once identified, each multiplexer circuit is replaced with its cached optimal configuration. This is a linear time operation with respect to the circuit size, and thus has a negligible impact on the running time. Our tool can help extend this process and find the optimal configuration of several types of subcircuits that technology mapping fails to find.

V. RESULTS

To demonstrate the previously described techniques, we provide several concrete examples here. We first show results for our PLB evaluation method and follow with results for our area optimization techniques. When evaluating PLBs, we show that the main benefit of our technique is its generality. We demonstrate this with three different approaches: we measure the flexibility of several PLB architectures; we explore a large set of hardwired PLB configurations in an automated fashion; and we incorporate routing features into the PLB evaluation. Each approach emphasizes how our technique can be applied to any PLB without any modification to our PLB technology mapper or evaluation framework.

After our PLB evaluation results, we focus on our area optimization technique. Here, we resynthesize several common subcircuits using our sliding window technique. This is followed by a discussion of why synthesis and technology mapping miss the optimal configuration provided by our resynthesis technique.

When running our experiments, we focus on the MCNC benchmark circuit set [21]. The SAT solver used to drive our function mapper was the Chaff solver developed by M. W. Moskewicz et al. [22]. All of our algorithms were built on top of the Berkeley MVSIS project [23].

A. PLB Evaluation

1) Generality of Technique: First, to illustrate the generality of our evaluation algorithm, several unrelated PLB architectures were evaluated. Fig. 23 shows the five different PLB architectures used for evaluation. To derive the fit percentages, approximately 1000 K-input cones were extracted from each circuit sampled, where K was the input size of the PLB. Cones were extracted randomly to generate a large set of unrelated logic functions. Table I summarizes our results. Each column shows a fit percentage per circuit for the respective PLB, and % Fit shows the final fit percentages when considering all the circuits.

Fig. 23. Five PLBs, (a) through (e), used in the evaluation experiments.

TABLE I. PLB EVALUATION RESULTS. Columns: Circuit, PLB (a), (b), (c), (d), (e); final row: % Fit.
Footnote 1: No functions with 8 or more inputs could be found in this circuit.

Note that the cone fit percentage varies widely for all PLBs depending on the circuit. This shows that PLB usefulness depends on the application of the circuit. Interestingly, PLB(b) failed for all circuits except the ALU circuit (C2670). One reason is that PLB(b) uses an XOR gate; XOR gates are rare in most control circuits and are generally used for arithmetic logic. PLB(e) was only able to fit 9-input cones for a few circuits. This was expected since PLB(e) is a simplified version of a commercial PLB primarily used to implement 5-input functions or a 4:1 MUX, and is rarely used as a general 9-input function generator [24]. In order to obtain a more accurate picture of this PLB's functionality, in addition to generating 9-input functions for PLB(e), 6, 7, and 8-input functions were evaluated. This is shown in Table II. As the numbers show, this PLB looks much more useful when considering a wider range of functions.

2) FPGA Architecture Exploration: Here, we demonstrate how to use our SAT-based technology mapper to give a full area comparison between various PLB architectures. The standard architectures that we used for area comparisons were 4-LUT and 5-LUT based FPGAs. Our goal is to demonstrate the generality of our technique by exploring the resource usage of a wide range of possible PLB structures using our technology mapping algorithm. This is followed by incorporating minimum-width transistor (MWT) areas and the routing architecture to get a full comparison.

For all experiments in this section, we optimized our circuits using SIS [25] with script.rugged. These optimized circuits were then passed to our SAT-based technology mapper. Since we wanted to achieve the smallest area results, our technology mapper was tuned to optimize for area while ignoring depth for all circuits.
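The fit percentages reported in Table I follow from simple counting over the sampled cones. As a rough sketch, assuming cones are sampled uniformly without replacement and that the overall % Fit pools the samples of all circuits (the text does not spell out either detail), the bookkeeping might look like this. The names `fit_percentages` and `fits_plb` are hypothetical, with `fits_plb` standing in for the SAT-based fit check.

```python
import random


def fit_percentages(cones_by_circuit, fits_plb, sample_size=1000, seed=0):
    """Per-circuit and pooled fit percentages for one PLB (illustration only).

    cones_by_circuit maps a circuit name to a list of extracted K-input cones.
    Returns (per_circuit, overall); an entry is None when nothing was sampled.
    """
    rng = random.Random(seed)
    per_circuit = {}
    total_fit = 0
    total_sampled = 0
    for name, cones in cones_by_circuit.items():
        # Sample at most sample_size cones from this circuit at random.
        sample = rng.sample(cones, min(sample_size, len(cones)))
        fit = sum(1 for cone in sample if fits_plb(cone))
        per_circuit[name] = 100.0 * fit / len(sample) if sample else None
        total_fit += fit
        total_sampled += len(sample)
    overall = 100.0 * total_fit / total_sampled if total_sampled else None
    return per_circuit, overall
```

Pooling weights each circuit by the number of cones sampled from it; averaging the per-circuit percentages instead would weight every circuit equally.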

TABLE II. THE PERCENTAGE OF CONES THAT FIT INTO FIG. 23(E). Columns: Circuit, 6-input, 7-input, 8-input; final row: Total % Fit.

First, we focus on the highlighted steps shown in Fig. 24. As Fig. 24 illustrates, the PLB description can be modified to explore various PLB architectures.

Fig. 24. Steps in the overall area estimation flow to derive the PLB usage. (Flow blocks: PLB description; BLIF netlist of primitive gates; PLB technology mapping; technology mapped netlist; MWT area of a single FPGA tile; PLB usage; MWT area.)

In order to explore several PLB architectures in an automated fashion, instead of manually creating several PLB structures, we used the PLB shown in Fig. 25 as our PLB. The PLB shown has four distinct inputs and consists of a 3-LUT in conjunction with a 3-input hardwired function. The benefit of this PLB architecture is that there are $2^8 = 256$ possible hardwired functions. Each possible hardwired function can be explored quickly by modeling the hardwired function as a preconfigured 3-LUT that is common to all PLBs in the FPGA.

Fig. 25. 3-input hardwired support function based PLB.

To illustrate the exploration process, we technology mapped one benchmark circuit to all 256 possible PLB hardwired configurations. The results are illustrated in Table III, which summarizes the hardwired SRAM preconfigurations that produced the lowest and highest overhead in terms of PLB usage. Row 4-LUT shows the PLB usage if only 4-LUTs were used. Column Ratio shows the ratio of PLB usage when compared against the 4-LUT architecture. The configurations that produced the lowest PLB usage occurred for the preconfiguration values of 0x00 and 0xFF. These correspond to the PLBs shown in Fig. 26.

TABLE III. SUMMARY OF SRAM PRECONFIGURATIONS THAT PRODUCED LOW AND HIGH PLB USAGE FOR CIRCUIT EX5P. Columns: Bit Mask, PLB Usage, Ratio.

The original intent of this experiment was to show how we could evaluate a wide range of PLBs in an automated fashion. Since this is just an illustrative study, only one benchmark circuit was used. A more conclusive study, however, would include a wide range of benchmark circuits. Having said that, note that the 0x00 and 0xFF configurations model an AND and an OR gate cascade, as seen in Fig. 27. Digital logic contains a large number of AND and OR gates, thus we suspect that other benchmark circuits would yield similar results. Furthermore, the area results we obtain for the PLB models shown in Fig. 27 coincide with industrial findings, where AND and OR gate cascade structures are common in industrial FPGAs such as Altera's Apex20k [2].

Fig. 26. PLB configurations that produced the smallest area results when compared against a simple 4-LUT architecture.

Fig. 27. Equivalent PLB representation using basic gates.

Table III clearly shows that there is an associated PLB usage overhead when removing some programmability from the PLB. However, as long as the increase in PLB usage is amortized by the decrease in silicon area of the non-programmable components, the loss in flexibility may be beneficial. To explore this idea, we focused on the two PLB configurations that produced the lowest area overhead in Table III. Both of these PLB configurations can be realized as a single 4-input PLB using a 2:1 MUX and an SRAM bit, as shown in Fig. 28a, which we refer to as the 4-MUX-PLB. To estimate the area performance of the 4-MUX-PLB architecture, we technology mapped 182 MCNC benchmark circuits to it and recorded the PLB usage for each circuit. Furthermore, since we want to illustrate that our evaluation tool works for any PLB architecture and we know that the cascade structure of the 4-MUX-PLB is similar to industrial 5-input PLBs, we also compared against the 5-LUT architecture using the 5-input PLB shown in Fig. 28b, referred to as the 5-MUX-PLB, for technology mapping. After the PLB usage is recorded, we use those numbers to calculate the MWT area and compare the final results.

Fig. 28. Candidate PLBs used in area experiments. (a) 4-MUX-PLB. (b) 5-MUX-PLB.

A summary of the PLB usage is shown in Table IV, where only the 20 largest of the 182 circuits tested are shown in detail. A geometric mean of the PLB usage ratios is shown for the comparisons against both 4-LUTs and 5-LUTs. Ratio shows the ratio of the PLB architecture against the LUT architecture. GeoMean is the geometric mean of the Ratio for all 182 circuits tested.

TABLE IV. MCNC PLB USAGE SUMMARY AND COMPARISON AGAINST 4-LUT AND 5-LUT BASED FPGA ARCHITECTURES. Columns: Circuit, 4-MUX-PLB, 4-LUTs, Ratio, 5-MUX-PLB, 5-LUTs, Ratio; final row: GeoMean.

Figs. 29 and 30 give a graphical view of the PLB usage overhead for all the circuits, where the overhead is calculated as Overhead = Ratio - 1. The results show that the 4-MUX-PLB has a 20.1% usage overhead when compared against a 4-LUT architecture, and the 5-MUX-PLB has a 10.5% overhead when compared against a 5-LUT architecture. Again, this PLB usage increase may be acceptable if it is amortized by the decrease in MWT area.

Fig. 29. MCNC benchmark circuit PLB usage overhead when comparing the 4-MUX-PLB against the 4-LUT architecture. The geometric mean of the overhead is 20.0%.

Finally, we finish our area estimation demonstration with the steps highlighted in Fig. 31. In these steps, we use the PLB usage counts to find a full area comparison. This requires the minimum width transistor area for an FPGA


SAT-Based Logic Optimization and Resynthesis SAT-Based Logic Optimization and Resynthesis Alan Mishchenko Robert Brayton Jie-Hong Roland Jiang Stephen Jang Department of EECS Department of EE Xilinx Inc. University of California, Berkeley National

More information

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs Harrys Sidiropoulos, Kostas Siozios and Dimitrios Soudris School of Electrical & Computer Engineering National

More information

Unit 4: Formal Verification

Unit 4: Formal Verification Course contents Unit 4: Formal Verification Logic synthesis basics Binary-decision diagram (BDD) Verification Logic optimization Technology mapping Readings Chapter 11 Unit 4 1 Logic Synthesis & Verification

More information

Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization

Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization Zhiru Zhang School of ECE, Cornell University September 29, 2017 @ EPFL A Case Study on Digit Recognition bit6 popcount(bit49 digit)

More information

P Is Not Equal to NP. ScholarlyCommons. University of Pennsylvania. Jon Freeman University of Pennsylvania. October 1989

P Is Not Equal to NP. ScholarlyCommons. University of Pennsylvania. Jon Freeman University of Pennsylvania. October 1989 University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science October 1989 P Is Not Equal to NP Jon Freeman University of Pennsylvania Follow this and

More information

On Resolution Proofs for Combinational Equivalence

On Resolution Proofs for Combinational Equivalence 33.4 On Resolution Proofs for Combinational Equivalence Satrajit Chatterjee Alan Mishchenko Robert Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu Andreas Kuehlmann

More information

ESE535: Electronic Design Automation CNF. Today CNF. 3-SAT Universal. Problem (A+B+/C)*(/B+D)*(C+/A+/E)

ESE535: Electronic Design Automation CNF. Today CNF. 3-SAT Universal. Problem (A+B+/C)*(/B+D)*(C+/A+/E) ESE535: Electronic Design Automation CNF Day 21: April 21, 2008 Modern SAT Solvers ({z}chaff, GRASP,miniSAT) Conjunctive Normal Form Logical AND of a set of clauses Product of sums Clauses: logical OR

More information

Software Engineering 2DA4. Slides 2: Introduction to Logic Circuits

Software Engineering 2DA4. Slides 2: Introduction to Logic Circuits Software Engineering 2DA4 Slides 2: Introduction to Logic Circuits Dr. Ryan Leduc Department of Computing and Software McMaster University Material based on S. Brown and Z. Vranesic, Fundamentals of Digital

More information

Blocked Literals are Universal

Blocked Literals are Universal Blocked Literals are Universal Marijn J.H. Heule 1, Martina Seidl 2, and Armin Biere 2 1 Department of Computer Science, The University of Texas at Austin, USA marijn@cs.utexas.edu 2 Institute for Formal

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

An Efficient Framework of Using Various Decomposition Methods to Synthesize LUT Networks and Its Evaluation

An Efficient Framework of Using Various Decomposition Methods to Synthesize LUT Networks and Its Evaluation An Efficient Framework of Using Various Decomposition Methods to Synthesize LUT Networks and Its Evaluation Shigeru Yamashita Hiroshi Sawada Akira Nagoya NTT Communication Science Laboratories 2-4, Hikaridai,

More information

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS C H A P T E R 6 DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS OUTLINE 6- Binary Addition 6-2 Representing Signed Numbers 6-3 Addition in the 2 s- Complement System 6-4 Subtraction in the 2 s- Complement

More information

Software Optimization Using Hardware Synthesis Techniques Bret Victor,

Software Optimization Using Hardware Synthesis Techniques Bret Victor, EE 219B LOGIC SYNTHESIS, MAY 2000 Software Optimization Using Hardware Synthesis Techniques Bret Victor, bret@eecs.berkeley.edu Abstract Although a myriad of techniques exist in the hardware design domain

More information

EECS 219C: Formal Methods Boolean Satisfiability Solving. Sanjit A. Seshia EECS, UC Berkeley

EECS 219C: Formal Methods Boolean Satisfiability Solving. Sanjit A. Seshia EECS, UC Berkeley EECS 219C: Formal Methods Boolean Satisfiability Solving Sanjit A. Seshia EECS, UC Berkeley The Boolean Satisfiability Problem (SAT) Given: A Boolean formula F(x 1, x 2, x 3,, x n ) Can F evaluate to 1

More information

PAPER Accelerating Boolean Matching Using Bloom Filter

PAPER Accelerating Boolean Matching Using Bloom Filter IEICE TRANS. FUNDAMENTALS, VOL.E93 A, NO.10 OCTOBER 2010 1775 PAPER Accelerating Boolean Matching Using Bloom Filter Chun ZHANG, Member,YuHU, Lingli WANG a),leihe b), and Jiarong TONG, Nonmembers SUMMARY

More information

Combinational and Sequential Mapping with Priority Cuts

Combinational and Sequential Mapping with Priority Cuts Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton Department of EECS, University of California, Berkeley {alanmi, smcho, satrajit, brayton@eecs.berkeley.edu

More information

A Toolbox for Counter-Example Analysis and Optimization

A Toolbox for Counter-Example Analysis and Optimization A Toolbox for Counter-Example Analysis and Optimization Alan Mishchenko Niklas Een Robert Brayton Department of EECS, University of California, Berkeley {alanmi, een, brayton}@eecs.berkeley.edu Abstract

More information

Functional extension of structural logic optimization techniques

Functional extension of structural logic optimization techniques Functional extension of structural logic optimization techniques J. A. Espejo, L. Entrena, E. San Millán, E. Olías Universidad Carlos III de Madrid # e-mail: { ppespejo, entrena, quique, olias}@ing.uc3m.es

More information

Boolean Algebra. BME208 Logic Circuits Yalçın İŞLER

Boolean Algebra. BME208 Logic Circuits Yalçın İŞLER Boolean Algebra BME28 Logic Circuits Yalçın İŞLER islerya@yahoo.com http://me.islerya.com 5 Boolean Algebra /2 A set of elements B There exist at least two elements x, y B s. t. x y Binary operators: +

More information

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs . FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles,

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

Combinational Equivalence Checking Using Incremental SAT Solving, Output Ordering, and Resets

Combinational Equivalence Checking Using Incremental SAT Solving, Output Ordering, and Resets ASP-DAC 2007 Yokohama Combinational Equivalence Checking Using Incremental SAT Solving, Output ing, and Resets Stefan Disch Christoph Scholl Outline Motivation Preliminaries Our Approach Output ing Heuristics

More information

Circuit versus CNF Reasoning for Equivalence Checking

Circuit versus CNF Reasoning for Equivalence Checking Circuit versus CNF Reasoning for Equivalence Checking Armin Biere Institute for Formal Models and Verification Johannes Kepler University Linz, Austria Equivalence Checking Workshop 25 Madonna di Campiglio,

More information

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010 Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:

More information

IMPROVING LOGIC DENSITY THROUGH SYNTHESIS-INSPIRED ARCHITECTURE Jason H. Anderson

IMPROVING LOGIC DENSITY THROUGH SYNTHESIS-INSPIRED ARCHITECTURE Jason H. Anderson IMPROVING LOGIC DENITY THROUGH YNTHEI-INPIRED ARCHITECTURE Jason H. Anderson Dept. of ECE, Univ. of Toronto Toronto, ON Canada email: janders@eecg.toronto.edu ABTRACT We leverage properties of the logic

More information

Binary recursion. Unate functions. If a cover C(f) is unate in xj, x, then f is unate in xj. x

Binary recursion. Unate functions. If a cover C(f) is unate in xj, x, then f is unate in xj. x Binary recursion Unate unctions! Theorem I a cover C() is unate in,, then is unate in.! Theorem I is unate in,, then every prime implicant o is unate in. Why are unate unctions so special?! Special Boolean

More information

Polynomial SAT-Solver Algorithm Explanation

Polynomial SAT-Solver Algorithm Explanation 1 Polynomial SAT-Solver Algorithm Explanation by Matthias Mueller (a.k.a. Louis Coder) louis@louis-coder.com Explanation Version 1.0 - December 1, 2013 Abstract This document describes an algorithm that

More information

Detailed Router for 3D FPGA using Sequential and Simultaneous Approach

Detailed Router for 3D FPGA using Sequential and Simultaneous Approach Detailed Router for 3D FPGA using Sequential and Simultaneous Approach Ashokkumar A, Dr. Niranjan N Chiplunkar, Vinay S Abstract The Auction Based methodology for routing of 3D FPGA (Field Programmable

More information

Majority Logic Representation and Satisfiability

Majority Logic Representation and Satisfiability Majority Logic Representation and Satisfiability Luca Amarú, Pierre-Emmanuel Gaillardon, Giovanni De Micheli Integrated Systems Laboratory (LSI), EPFL, Switzerland Abstract Majority logic is a powerful

More information

Conclusions and Future Work. We introduce a new method for dealing with the shortage of quality benchmark circuits

Conclusions and Future Work. We introduce a new method for dealing with the shortage of quality benchmark circuits Chapter 7 Conclusions and Future Work 7.1 Thesis Summary. In this thesis we make new inroads into the understanding of digital circuits as graphs. We introduce a new method for dealing with the shortage

More information

Learning Techniques for Pseudo-Boolean Solving and Optimization

Learning Techniques for Pseudo-Boolean Solving and Optimization Learning Techniques for Pseudo-Boolean Solving and Optimization José Faustino Fragoso Fremenin dos Santos September 29, 2008 Abstract The extension of conflict-based learning from Propositional Satisfiability

More information

DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs

DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen, Jason Cong Computer Science Department University of California, Los Angeles {demingc, cong}@cs.ucla.edu ABSTRACT

More information

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Jason Cong and Yean-Yow Hwang Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this paper, we

More information

Architecture and Synthesis of. Field-Programmable Gate Arrays with. Hard-wired Connections. Kevin Charles Kenton Chung

Architecture and Synthesis of. Field-Programmable Gate Arrays with. Hard-wired Connections. Kevin Charles Kenton Chung Architecture and Synthesis of Field-Programmable Gate Arrays with Hard-wired Connections by Kevin Charles Kenton Chung A thesis submitted in conformity with the requirements for the Degree of Doctor of

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

Statistical Timing Analysis Using Bounds and Selective Enumeration

Statistical Timing Analysis Using Bounds and Selective Enumeration IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 9, SEPTEMBER 2003 1243 Statistical Timing Analysis Using Bounds and Selective Enumeration Aseem Agarwal, Student

More information

Provably Optimal Test Cube Generation using Quantified Boolean Formula Solving

Provably Optimal Test Cube Generation using Quantified Boolean Formula Solving Provably Optimal Test Cube Generation using Quantified Boolean Formula Solving ASP-DAC 2013 Albert-Ludwigs-Universität Freiburg Matthias Sauer, Sven Reimer, Ilia Polian, Tobias Schubert, Bernd Becker Chair

More information

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809 PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA Laurent Lemarchand Informatique ubo University{ bp 809 f-29285, Brest { France lemarch@univ-brest.fr ea 2215, D pt ABSTRACT An ecient distributed

More information

CS-E3200 Discrete Models and Search

CS-E3200 Discrete Models and Search Shahab Tasharrofi Department of Information and Computer Science, Aalto University Lecture 7: Complete and local search methods for SAT Outline Algorithms for solving Boolean satisfiability problems Complete

More information

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are:

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are: Problem 1: CLD2 Problems. (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are: C 0 = A + BD + C + BD C 1 = A + CD + CD + B C 2 = A + B + C + D C 3 = BD + CD + BCD + BC C 4

More information

A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture

A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture Robert S. French April 5, 1989 Abstract Computational origami is a parallel-processing concept in which

More information

Polynomial Exact-3-SAT Solving Algorithm

Polynomial Exact-3-SAT Solving Algorithm Polynomial Eact-3-SAT Solving Algorithm Louis Coder louis@louis-coder.com December 0 Abstract In this document I want to introduce and eplain an algorithm that determines the solvability state (solvable

More information

IN general setting, a combinatorial network is

IN general setting, a combinatorial network is JOURNAL OF L A TEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 1 Clustering without replication: approximation and inapproximability Zola Donovan, Vahan Mkrtchyan, and K. Subramani, arxiv:1412.4051v1 [cs.ds]

More information

SIS: A System for Sequential Circuit Synthesis

SIS: A System for Sequential Circuit Synthesis SIS: A System for Sequential Circuit Synthesis Electronics Research Laboratory Memorandum No. UCB/ERL M92/41 Ellen M. Sentovich Kanwar Jit Singh Luciano Lavagno Cho Moon Rajeev Murgai Alexander Saldanha

More information

Synthesis and Optimization of Digital Circuits

Synthesis and Optimization of Digital Circuits Synthesis and Optimization of Digital Circuits Dr. Travis Doom Wright State University Computer Science and Engineering Outline Introduction Microelectronics Micro economics What is design? Techniques

More information

Exploring Logic Block Granularity for Regular Fabrics

Exploring Logic Block Granularity for Regular Fabrics 1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu

More information

1 Definition of Reduction

1 Definition of Reduction 1 Definition of Reduction Problem A is reducible, or more technically Turing reducible, to problem B, denoted A B if there a main program M to solve problem A that lacks only a procedure to solve problem

More information