Multi-way Netlist Partitioning into Heterogeneous FPGAs and Minimization of Total Device Cost and Interconnect


Roman Kužnar (1), Franc Brglez (2), Baldomir Zajc (1)

(1) Department of ECE, Tržaška 25, University of Ljubljana, 6 Ljubljana, Slovenia
(2) CBL, Dept. of Elec. & Computer Eng., North Carolina State University, Raleigh, N.C., U.S.A.

Abstract

This paper considers the problem of partitioning a large logic circuit into a collection of subcircuits, each of which is implemented with a device from a specific (FPGA) library. The objective function that we minimize is not only the total cost of the devices used in the partition but also the size of the interconnect between the devices. We introduce the concept of functional replication and a unified cost model for min-cut partitioning with replication. A prototype implementation demonstrates the feasibility of the approach, based on experimental results with a set of large benchmark circuits.

I. Introduction

FPGAs are widely used in many applications [1]. Large designs cannot be implemented with FPGAs unless they are partitioned into smaller subcircuits. Optimizing a set of large design specifications will in general require partitioning into multiple FPGAs of varying sizes and types. Different sizes and types of devices can be combined to reduce the design cost and achieve better performance for the entire design. A survey of partitioning techniques related to physical design problems and a comprehensive list of references on the subject can be found in [2]. Except for [3], [4], none of the recent publications on partitioning, e.g. [5], [6], [7], [8], [9], [10], incorporate sufficient constraints to specifically address the problem of FPGA partitioning.
In this paper we extend the formulation of the partitioning problem in [3]: Find a feasible k-way partition with the minimum cost $\$_k$, where

$$\$_k = \sum_{i=1}^{q} d_i n_i \qquad (1)$$

with $d_i$ representing the unit cost of each device type, $n_i$ the number of devices of type $i$ to be used in the k-way partition, and the number of partitions $k = \sum_{i=1}^{q} n_i$. A partition $P_j$ is called feasible if it fits the size and the terminal constraints of a specific FPGA library. If all FPGA devices in the implementation are of the same type, the partitioning problem is reduced to finding the minimum number $k$ of subsets that all meet the same size and terminal constraints.

An example of a library, from [] and used in [3], is shown in Table I. Each device $D_i$ in the FPGA library is described with five parameters, $D_i = (c_i, t_i, d_i, l_i, u_i)$, representing the number of elementary circuit units contained in the device, the number of terminals, the price, and the lower and upper bounds on the utilization of elementary circuit units. The circuit unit utilization is the ratio of the number of elementary circuit units assigned to a subcircuit which is to be implemented with device $D_i$ to its capacity $c_i$. For Xilinx-based devices, $c_i$ represents the number of configurable logic blocks (CLBs), and $t_i$ represents the number of input-output blocks (IOBs).

Roman Kužnar was supported in part by the Slovenian Ministry of Research and Technology under grant S /535/93. Franc Brglez was supported in part by a grant from the Semiconductor Research Corporation (SRC). Xilinx Inc. provided the XACT toolset to verify routability of each benchmark partition.

TABLE I. A subset of the Xilinx XC3000 device library.
Columns: Device; $c_i$ (CLB); $t_i$ (IOB); $d_i$ (N$); $l_i$; $u_i$; CLB cost $d_i/c_i$.
Rows: XC3020x-x, XC3030x-x, XC3042x-x, XC3064x-x, XC3090x-x. The numeric entries did not survive extraction.

We extend the formulation of the partitioning problem defined above as follows: Find a feasible k-way partition with the minimum cost as defined in (1) and the minimum interconnect between the partitions.
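As a rough illustration of the cost in (1) and of the feasibility test, the following Python sketch evaluates both for a tiny library. The `Device` class, the function names, and every numeric value are hypothetical and are not taken from Table I. Feasibility is assumed here to mean that the terminal count fits the device and that the CLB utilization lies between the lower and upper bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One library entry D_i = (c_i, t_i, d_i, l_i, u_i); values below are made up."""
    name: str
    c: int      # capacity in elementary circuit units (CLBs)
    t: int      # number of terminals (IOBs)
    d: float    # unit price
    l: float    # lower bound on CLB utilization (0..1)
    u: float    # upper bound on CLB utilization (0..1)


def is_feasible(device, clbs, terminals):
    """Assumed feasibility test: terminals fit and utilization is within [l, u]."""
    fits_terminals = terminals <= device.t
    fits_size = device.l * device.c <= clbs <= device.u * device.c
    return fits_terminals and fits_size


def total_cost(counts):
    """Return (k, cost) where k = sum of n_i and cost = sum of d_i * n_i."""
    k = sum(counts.values())
    cost = sum(dev.d * n for dev, n in counts.items())
    return k, cost


# Hypothetical two-device library.
small = Device("small", c=64, t=64, d=100.0, l=0.0, u=0.9)
large = Device("large", c=144, t=96, d=200.0, l=0.5, u=0.9)

print(total_cost({small: 2, large: 1}))   # k and total cost for 2 small + 1 large
print(is_feasible(small, clbs=50, terminals=60))
print(is_feasible(large, clbs=60, terminals=60))
```

The lower utilization bound is what makes a mixed library interesting: a subcircuit that is too small for a large device is rejected there even though it would physically fit.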
By defining $t_{P_j}$ as the number of terminals used in the partition $P_j$, and by taking as a measure of interconnect the average utilization $\tau_k$ of input-output blocks (IOBs) in a given k-way partition, we can compare solutions in this paper directly with the solutions in [3]:

$$\tau_k = \sum_{j=1}^{k} t_{P_j} \Big/ \sum_{i=1}^{q} t_i n_i \qquad (2)$$

Our approach to minimizing the measure of interconnect in (2) is based on introducing module replication at each step of the bipartitioning process as implemented in [3]. As pointed out in [12], [13], [14], replication can reduce the size of the min-cut in a bipartition. The min-cut replication algorithm proposed in [14] is applicable to graphs with no constraints on the sizes of partitions. After technology mapping, the number of inputs of a mapped cell increases relative to the number of output pins, seriously limiting the benefits of traditional replication.

In this paper, we propose an effective approach to reducing the size of the min-cut in a bipartition of a hypergraph. By introducing the concept of functional replication, we significantly increase the potential for reducing the number of nets in the cut set. We show that we can remove not only the nets connected to the output pins of a replicated cell but also the nets connected to cell input pins in a large number of replicated cells. Compared to the results in [3], we report significant reductions in interconnect while also consistently reducing the total device cost.

31st ACM/IEEE Design Automation Conference. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 1994 ACM /94/
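The interconnect measure in (2) is a single ratio, which the following Python sketch computes. The function name and the sample numbers are hypothetical. The denominator is written per partition (the terminal count of the device chosen for each partition), which is assumed to be equivalent to summing $t_i n_i$ over device types.

```python
def iob_utilization(terminals_used, device_terminals):
    """Average IOB utilization of a k-way partition, as in (2).

    terminals_used[j]   -- t_Pj, terminals used by partition P_j
    device_terminals[j] -- terminal count t_i of the device implementing P_j
    """
    if len(terminals_used) != len(device_terminals):
        raise ValueError("one device is required per partition")
    return sum(terminals_used) / sum(device_terminals)


# Hypothetical 3-way partition: two small devices and one larger one.
used = [48, 60, 72]
available = [64, 64, 96]
print(f"average IOB utilization: {iob_utilization(used, available):.3f}")
```

A lower value means fewer signals cross between devices for the same set of devices, which is why the paper uses it to compare interconnect across solutions.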

II. Functional Replication

Following key definitions, we review the role of traditional replication, leading to the concept of functional replication and replication potential. The latter provides the basis to generate a unique distribution for a large set of benchmark circuits, leading to an effective implementation of the replication-based bipartitioning algorithm. We use and extend the notation introduced in [3].

Hypergraph model of a circuit. The circuit partitioning problem addresses implementation of a digital circuit as a collection of subcircuits, each of which can be implemented as a single FPGA. We model the circuit as a hypergraph $H = (\{X; Y\}, E)$, where $X$ and $Y$ denote respectively the interior and terminal node sets, $X \cap Y = \emptyset$, and $E$ is the set of nets. Whenever appropriate and for simplicity, we may interchangeably refer to the interior node sets as cells or modules, and the terminal node sets as I/Os or IOBs. A k-way partition of $H$ implies an assignment of the nodes in $X$ and $Y$ to a set of $k$ non-overlapping hypergraphs $P_j = (\{X_j; Y_j\}, E_j)$, with $X_j \cap Y_j = \emptyset$ for all $j$. When partitioning is performed without replication, each of the interior nodes of the original hypergraph is assigned to the interior node set of exactly one component hypergraph, thus $\bigcup_{j=1}^{k} X_j = X$. When partitioning is performed with replication, some of the interior nodes of the original hypergraph are assigned to the interior node set of more than one component hypergraph, thus $\bigcup_{j=1}^{k} X_j \supset X$. The apparent increase in the size of each partition is expected to be absorbed within each device implementing this partition; the benefit of replication is measured in terms of reduced interconnect between partitions. The question that arises is which and how many of the interior nodes should be replicated to minimize the interconnect.

Traditional replication of a cell. An example of how traditional replication evaluates a move to minimize the size of the cut set is illustrated in Figure 1. The dotted lines inside the cell should be ignored when discussing traditional replication. When the cell $M_i$ is replicated, it is copied from partition $P_k$ to the partition $R_k$. This move permits elimination from the cut of the net connected to the output pin Y. However, during this process an additional net, connecting to the input pin a, has been added to the cut. Consequently, no reduction in the cut set has been achieved and there is no indication why replication should be accepted in this case.

[Figure 1: cell $M_i$ with inputs a, b, c and outputs X, Y, and its replica $M'_i$ on the other side of the cut line.]
Fig. 1. Traditional replication ignores the I/O dependencies.

Functional replication of a cell. The concept of functional replication relies on capturing the cell functional dependency at its outputs with respect to its inputs. We will formally evaluate the potential gain associated with functional replication in the section that follows. Here, we illustrate the concept by way of the example in Figure 1. This time, the dotted lines inside the cell carry the information about the dependency of output pins X and Y with respect to the input pins {a, b, c}. Specifically, we associate with output X an adjacency vector $A_X = [1\,1\,0]^T$, and similarly, with output Y an adjacency vector $A_Y = [0\,1\,1]^T$. Clearly, only the input b is adjacent at both outputs and controls the function of each output. Input a is adjacent at the output X only, and similarly, input c is adjacent at the output Y only. As a consequence, the net in the cut set connecting to pin a in the replicated cell can be removed from the cut.

[Figure 2: a cell with inputs $a_1, \ldots, a_5$ and outputs $X_1 = f_1(a_1, a_2, a_3, a_4)$ and $X_2 = f_2(a_4, a_5)$. Replication potential of this cell: $\psi = 4$.]
Fig. 2. A 2-output cell with the replication potential of 4.

Replication potential of a cell. Functional replication relies on the information about the dependency of cell outputs with respect to cell inputs. Consider the illustration shown in Figure 2.
This illustration is based on the information extracted from a netlist after technology mapping. While the information about the specific functions associated with each of the outputs may be of interest in other applications, we only require two adjacency vectors to assess the replication potential of this cell, one with respect to the output $X_1$, the other with respect to the output $X_2$:

$$A_{X_1} = [1\,1\,1\,1\,0]^T \quad \text{and} \quad A_{X_2} = [0\,0\,0\,1\,1]^T. \qquad (3)$$

We associate a replication potential with each cell by counting all inputs which control only a single output of a cell. Thus, the cell in Figure 1 has a replication potential of 2, and the cell in Figure 2 has a replication potential of 4. The higher the replication potential of the cell, the more nets may be removed from the cut set during cell replication. We will illustrate this concept further in the following section.

There are three binary operations we will perform on the adjacency vectors introduced above as well as on others to be introduced in the following section:

Complementation. For example, given that $A_{X_1} = [1\,1\,1\,1\,0]^T$, then $\overline{A}_{X_1} = [0\,0\,0\,0\,1]^T$.
Logical AND. For example, given $A_{X_1} = [1\,1\,1\,1\,0]^T$ and $A_{X_2} = [0\,0\,0\,1\,1]^T$, we get a product vector $A_{X_1} \wedge A_{X_2} = [0\,0\,0\,1\,0]^T$.
Norm. For example, given $A_{X_2} = [0\,0\,0\,1\,1]^T$, $\|A_{X_2}\| = 2$.
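The three vector operations and the counting rule for the replication potential can be sketched in a few lines of Python. The function names are hypothetical; the adjacency vectors are the ones read off Figures 1 and 2 (an input is marked 1 when the output depends on it). For more than two outputs the sketch assumes that an input counts only if it is adjacent to exactly one output, which is how the prose definition above reads.

```python
def complement(v):
    """Bitwise complement of a 0/1 vector."""
    return tuple(1 - bit for bit in v)


def vec_and(*vectors):
    """Element-wise logical AND of one or more 0/1 vectors."""
    return tuple(int(all(bits)) for bits in zip(*vectors))


def norm(v):
    """Number of ones in a 0/1 vector."""
    return sum(v)


def replication_potential(adjacency):
    """Count inputs that control exactly one output of the cell.

    adjacency -- one 0/1 vector per output, each of length n (number of inputs).
    A single-output cell has potential 0 by definition.
    """
    if len(adjacency) <= 1:
        return 0
    total = 0
    for i, a_i in enumerate(adjacency):
        others = [complement(a_j) for j, a_j in enumerate(adjacency) if j != i]
        total += norm(vec_and(a_i, *others))
    return total


# Figure 1 cell: X depends on (a, b), Y depends on (b, c).
fig1 = [(1, 1, 0), (0, 1, 1)]
# Figure 2 cell: X1 = f1(a1..a4), X2 = f2(a4, a5).
fig2 = [(1, 1, 1, 1, 0), (0, 0, 0, 1, 1)]

print(replication_potential(fig1))  # 2, as stated in the text
print(replication_potential(fig2))  # 4, as stated in the text
```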

Formally, for a cell with $n$ inputs and $m$ outputs $\{X_1, X_2, \ldots, X_m\}$, we find the corresponding set of adjacency vectors $\{A_{X_1}, A_{X_2}, \ldots, A_{X_m}\}$. Then the replication potential $\psi$ is defined and can be evaluated as follows:

$$\psi = \begin{cases} \sum_{i=1}^{m} \Bigl\| \bigwedge_{j=1;\, j \neq i}^{m} \bigl( A_{X_i} \wedge \overline{A}_{X_j} \bigr) \Bigr\| & \text{if } m > 1 \\ 0 & \text{if } m = 1 \end{cases} \qquad (4)$$

In (4), the adjacency vectors are complemented and AND-ed before taking the norm. For example, given the adjacency vectors in (3), the expression (4) evaluates to $\psi = \|[1\,1\,1\,0\,0]^T\| + \|[0\,0\,0\,0\,1]^T\| = 4$. This is also illustrated in Figure 2.

Cell distribution versus replication potential. Let $X$ designate the set of all cells in the circuit before partitioning and let $\psi$ be the replication potential associated with each cell as defined in (4). Then $d_X(\psi)$ is a cell distribution with respect to the cell replication potential $\psi$, namely

$$d_X(\psi) = \ldots \qquad (5)$$

We have evaluated the replication potential of each cell in the benchmark set in [11] and generated the distributions shown in Figure 3. Notably, less than 5% of all cells on average have a single output and, by definition, a replication potential of 0. About …% of the two-output cells have a replication potential of 0 (denoted as 0*). All other remaining cells have a replication potential which is greater than 0. Our experimental results clearly point out that cells which contribute to the largest decrease of the overall interconnect in (2) have $\psi > 0$, provided we use functional rather than traditional replication. In contrast, replication based on the set of all cells where $\psi = 0$ reduces the interconnect only marginally in a few cases.

The maximum cell replication factor, $r_T$, relates to the replication potential distribution in (5) and is defined as

$$r_T = \sum_{\psi = T}^{\infty} d_X(\psi) \qquad (6)$$

The choice of $T = 0$ allows maximum replication of all cells, while $T = \infty$ corresponds to partitioning without replication as formulated in [3]. We call $T$ the threshold replication potential.

[Figure 3: cell distribution versus cell replication potential $\psi$ for circuits c3540, c5315, c6288, c7552, s5378, s9234, s13207, s15850, and one further sequential circuit whose name did not survive extraction; * refers to multi-output cells with $\psi = 0$.]
Fig. 3. Distribution of cells after Xilinx-based technology mapping.

III. Partitioning with Replications: Cost Model

The preceding section introduced the motivation and the concept of functional replication. Fundamental to this concept is the notion of the replication potential, which can be calculated from the respective adjacency vectors associated with the output pins of each cell. In this section we extend the notion of adjacency vectors of a cell and also make binary-valued assignments to the nets connected to the cell and crossing the cut. Our discussion will be guided by the example shown in Figure 4. By inspection we find:

1. moving the single cell across the cut line increases the size of the cut set from 3 to 4, hence the gain of this move equals -1;
2. replicating the cell in a traditional manner increases the size of the cut set from 3 to 5, hence the gain of this move equals -2;
3. replicating the cell functionally, exploiting the knowledge of the input-output dependencies, reduces the size of the cut from 3 to 1, hence the gain of this move equals +2.

[Figure 4: three panels for a two-output cell (outputs $X_1$, $X_2$) at the cut line. Single cell move: gain = -1. Traditional replication: gain = -2. Functional replication: gain = +2.]
Fig. 4. Options to reduce the size of the cutset during a bipartition.

We next introduce a unified formulation to calculate gains in each of the cases illustrated: (1) single move of a cell, (2) traditional replication/unreplication move, and (3) functional replication/unreplication move. Given that a cell under consideration has $n$ inputs and $m$ outputs, we associate the following binary vectors with the cell:

- $m$ I/O adjacency vectors $A_{X_i}$, one associated with each output $X_i$, each vector of size $n$.
- a pair of cutset adjacency vectors, $C^I$ and $C^O$; $C^I$ is of size $n$, $C^O$ is of size $m$. An element $c^I_j \in C^I$ is equal to 1 if the net in the cutset is adjacent to the j-th input pin of the cell. An element $c^O_i \in C^O$ is equal to 1 if the net in the cutset is adjacent to the i-th output pin of the cell. A net state is called cut if the net is in the cutset, otherwise it is called nocut.
- a pair of critical net vectors, $Q^I$ and $Q^O$; $Q^I$ is of size $n$, $Q^O$ is of size $m$. A net is critical if one move changes its state. An element $q^I_j \in Q^I$ is equal to 1 if the net adjacent to the j-th input pin of the cell is critical. An element $q^O_i \in Q^O$ is equal to 1 if the net adjacent to the i-th output pin of the cell is critical.

With respect to Figure 4, these vectors are $A_{X_1}$, $A_{X_2}$, $C^I$, $Q^I$, $C^O$, and $Q^O$; their numerical entries did not survive extraction.

A. Gain of a single move. The gain of a single move can be calculated by counting the number of cut critical nets which are eliminated from the cut set and the number of nocut critical nets which are added to the cut set. Based on our definitions, the gain of a single move, $G_m$, is then:

$$G_m = \bigl( \|C^I \wedge Q^I\| + \|C^O \wedge Q^O\| \bigr) - \bigl( \|\overline{C}^I \wedge Q^I\| + \|\overline{C}^O \wedge Q^O\| \bigr) \qquad (7)$$

For example, using the vectors above, $G_m = \ldots = (2+1) - (3+1) = -1$.

B. Gain of traditional replication. Replication duplicates a cell and moves the replicated cell across the cut line to another partition while the original cell remains in the original partition. According to traditional replication as defined in [13], the replicated cell is identical to the original cell and connects exactly the same nets as the original one. Traditional replication eliminates all output nets from the cut set while adding all input nets to it. Since we know the number of input and output nets, as well as the number of cut nets connecting the cell before replication, the gain of traditional replication, $G_{tr}$, is simply:

$$G_{tr} = \bigl( \|C^I\| + \|C^O\| \bigr) - n. \qquad (8)$$

For example, $G_{tr} = \ldots = (2+1) - 5 = -2$.

C. Gain of functional replication. If the functionality of a logic cell is known, we can exploit this information to leave some of the input and output pins of the original and replicated cell floating, resulting in an additional reduction of the cut set. As shown in Figure 4, we can replicate the cell and leave one output pin and all input pins that control only this output floating. For simplicity of presentation, we next derive a generalized formulation for a cell with two outputs only. Assume that in the original cell the output pin #1 is used while in the replicated cell it is left floating. Similarly, the output pin #2 is used in the replicated cell and left floating in the original cell.
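The single-move gain in (7) and the traditional-replication gain in (8) can be checked numerically with a short Python sketch. The concrete vectors below are an assumption: the Figure 4 values are not legible in this copy, so the sketch uses a 5-input, 2-output cell with two cut input nets, one cut output net, and every net critical, which reproduces the counts quoted in the text. Function names are hypothetical.

```python
def complement(v):
    """Bitwise complement of a 0/1 vector."""
    return tuple(1 - bit for bit in v)


def vec_and(*vectors):
    """Element-wise logical AND of 0/1 vectors."""
    return tuple(int(all(bits)) for bits in zip(*vectors))


def norm(v):
    """Number of ones in a 0/1 vector."""
    return sum(v)


def single_move_gain(c_in, q_in, c_out, q_out):
    """Gain (7): critical cut nets removed minus critical nocut nets added."""
    removed = norm(vec_and(c_in, q_in)) + norm(vec_and(c_out, q_out))
    added = (norm(vec_and(complement(c_in), q_in))
             + norm(vec_and(complement(c_out), q_out)))
    return removed - added


def traditional_replication_gain(c_in, c_out):
    """Gain (8): cut nets at the cell minus the number of inputs n."""
    n = len(c_in)
    return (norm(c_in) + norm(c_out)) - n


# Assumed vectors (not taken from the paper's figure).
C_IN = (0, 0, 0, 1, 1)   # input nets 4 and 5 are in the cut
Q_IN = (1, 1, 1, 1, 1)   # every input net is critical
C_OUT = (0, 1)           # the net on output 2 is in the cut
Q_OUT = (1, 1)           # both output nets are critical

print(single_move_gain(C_IN, Q_IN, C_OUT, Q_OUT))   # (2+1) - (3+1)
print(traditional_replication_gain(C_IN, C_OUT))    # (2+1) - 5
```

Both values are negative for this cell, which is the situation the section describes: neither a plain move nor a traditional replication would be accepted.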
Since only the output pin #2 is used in the replicated cell, all input pins adjacent to pin #1 only can be left floating. Thus, when calculating the gain, only input pins adjacent to the pin #2 need to be considered, and so on. In general, we can write a gain expression for each of the outputs:

$$G_{X_1} = \bigl( \|(C^I \wedge A_{X_1}) \wedge (Q^I \wedge A_{X_1} \wedge \overline{A}_{X_2})\| + (c^O_1 \wedge q^O_1) \bigr) - \bigl( \|(\overline{C}^I \wedge A_{X_1}) \wedge (Q^I \wedge A_{X_1} \wedge \overline{A}_{X_2})\| + (\overline{c}^O_1 \wedge q^O_1) \bigr) \qquad (9)$$

and

$$G_{X_2} = \bigl( \|(C^I \wedge A_{X_2}) \wedge (Q^I \wedge A_{X_2} \wedge \overline{A}_{X_1})\| + (c^O_2 \wedge q^O_2) \bigr) - \bigl( \|(\overline{C}^I \wedge A_{X_2}) \wedge (Q^I \wedge A_{X_2} \wedge \overline{A}_{X_1})\| + (\overline{c}^O_2 \wedge q^O_2) \bigr). \qquad (10)$$

For example, we calculate the gain $G_r$ for the best case, where we use pin #2 in the replicated cell: $G_{X_2} = \ldots = (1+1) - (0+0) = 2$, and similarly, we can calculate $G_{X_1} = \ldots = (0+0) - (3+1) = -4$.

Expressions (9) and (10) are basically an extension of (7), where we use logic operations to eliminate input nets not adjacent to the output pin that remains connected in the replicated cell. The gain of a functional replication, $G_r$, is based on the highest gain associated with a given output. Since only two outputs have been considered in this case, we have:

$$G_r = \max(G_{X_1}, G_{X_2}). \qquad (11)$$

When the replication is performed, the original and the replicated cells are disconnected from some nets. The net state and criticality are updated only for cells which are currently connected to the net. The gain of unreplication is equal to the gain of a move of the original cell to the partition that contains the replicated cell, or vice versa. Here, the gain calculations consider only those cells in the nets which are currently connected. Therefore, there is no need to derive an additional gain equation for unreplication. When the unreplication move is performed, the original and the replicated cell are merged into a single cell.

D. Implementation highlights. The proposed approach to bipartitioning with functional cell replication was implemented as an extension of the traditional F-M heuristic [15]. We measure the cost of bipartition with the objective function as proposed in [3].
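Returning to the functional-replication gain, the per-output expressions and the max rule can be sketched in Python for a two-output cell. Two things here are assumptions rather than facts from the paper: the placement of the complements in the per-output expression (the overbars are not legible in this copy), and the example vectors, which describe a cell like the one in Figure 2 with input nets 4 and 5 and output net 2 in the cut. With these assumptions the sketch reproduces the gains quoted in the text (+2 and -4). All function names are hypothetical.

```python
def complement(v):
    """Bitwise complement of a 0/1 vector."""
    return tuple(1 - bit for bit in v)


def vec_and(*vectors):
    """Element-wise logical AND of 0/1 vectors."""
    return tuple(int(all(bits)) for bits in zip(*vectors))


def norm(v):
    """Number of ones in a 0/1 vector."""
    return sum(v)


def output_gain(i, adjacency, c_in, q_in, c_out, q_out):
    """Gain when output i is the one kept in the replicated cell (two outputs)."""
    a_used = adjacency[i]
    a_other = adjacency[1 - i]
    # Critical inputs that feed only the output kept in the replica.
    mask = vec_and(q_in, a_used, complement(a_other))
    removed = norm(vec_and(vec_and(c_in, a_used), mask)) + (c_out[i] & q_out[i])
    added = (norm(vec_and(vec_and(complement(c_in), a_used), mask))
             + ((1 - c_out[i]) & q_out[i]))
    return removed - added


def functional_replication_gain(adjacency, c_in, q_in, c_out, q_out):
    """Best gain over both outputs of a two-output cell."""
    return max(output_gain(i, adjacency, c_in, q_in, c_out, q_out)
               for i in range(2))


# Assumed example: X1 = f1(a1..a4), X2 = f2(a4, a5); all nets critical.
ADJ = [(1, 1, 1, 1, 0), (0, 0, 0, 1, 1)]
C_IN, Q_IN = (0, 0, 0, 1, 1), (1, 1, 1, 1, 1)
C_OUT, Q_OUT = (0, 1), (1, 1)

print(output_gain(0, ADJ, C_IN, Q_IN, C_OUT, Q_OUT))            # G_X1
print(output_gain(1, ADJ, C_IN, Q_IN, C_OUT, Q_OUT))            # G_X2
print(functional_replication_gain(ADJ, C_IN, Q_IN, C_OUT, Q_OUT))
```

Keeping output 2 in the replica removes its output net and the net on input 5 from the cut, while the shared input 4 stays cut, which matches the reduction of the cut from 3 nets to 1 described for Figure 4.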
Due to space limitations, we omit discussion of the modifications that were implemented to report the results in the following section. These details are reported in [16].

IV. Experimental Results

We exercised the proposed algorithms on a set of benchmarks introduced in [3] and [11]. The characteristics of the benchmark circuits after mapping into the XC3000 family are shown in Table II.

TABLE II. Benchmark circuit characteristics.
Columns: Circuit; #CLBs; #IOBs; #DFF; #NETs; #PINs.
Rows: four combinational (c…) and five sequential (s…) circuits. The circuit names and numeric entries did not survive extraction.

We performed two experiments. First, we bipartitioned all benchmarks into two equal-sized partitions with the objective of minimizing the cut set, completely relaxing the terminal constraints. F-M min-cut was based on an implementation of the original min-cut algorithm in [15]. In F-M min-cut + Func. Repl., we extended the original min-cut algorithm in [15] with functional replication as introduced in this paper. We performed 2 bipartitioning runs for each benchmark circuit, measuring the best and the average size of the cutset. In all experiments, the threshold replication potential $T$ was set to 0, allowing maximum utilization of replications.

TABLE III. Best gains and average gains in the size of the cutset.
Column groups: Circuit; F-M min-cut (Best, Avg.); F-M min-cut + Func. Repl. (Best cut, Avg. cut, each with its reduction in %).
Rows: the nine benchmark circuits and their average. The numeric entries did not survive extraction intact.

Table III shows promising results with functional replication. The reduction of the best cut ranges from 7.7% for circuit c3540 to 62.9% for circuit s…. Averaged over all circuits, the best cut of 2 runs per benchmark circuit resulted in a reduction of 34.6%. The reduction of the average cut ranges from .6% for circuit s5378 to 64.% for circuit s…. Averaged over all circuits, the average cut of 2 runs per benchmark circuit resulted in a reduction of 32.7%. Note that the larger reduction of the cut set is achieved for the set of sequential ISCAS'89 benchmarks, where cells are more clustered. We have every indication that functional replication is effective and consistent for a wide range of circuit sizes and characteristics. The average increase in CPU running cost due to functional replication was 34%. Combining this approach with the techniques in [14], [17] may potentially reduce the size of the cut even further.
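The quantities reported for this first experiment (best cut, average cut, and percentage reduction) are simple to compute once a bipartition is given. The Python sketch below shows one way to do so on a toy hypergraph; the net list, the run results, and the function names are all hypothetical and unrelated to the benchmark data.

```python
from statistics import mean


def cut_size(nets, side):
    """Number of nets whose cells lie on both sides of a bipartition.

    nets -- iterable of nets, each an iterable of cell names
    side -- dict mapping cell name to 0 or 1
    """
    return sum(1 for net in nets if len({side[cell] for cell in net}) > 1)


def reduction_percent(baseline, improved):
    """Relative reduction of a cut size, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy hypergraph with four cells and four nets.
nets = [("a", "b"), ("b", "c", "d"), ("c", "d"), ("a", "d")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
print(cut_size(nets, side))

# Hypothetical cut sizes from repeated runs of two heuristics.
plain_runs = [40, 44, 42]
replicated_runs = [30, 33, 36]
print(reduction_percent(min(plain_runs), min(replicated_runs)))    # best cut
print(reduction_percent(mean(plain_runs), mean(replicated_runs)))  # average cut
```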
In the second experiment, we extended the original min-cut algorithm in [15] with functional replication as introduced in this paper, combined with the k-way partitioning algorithm formulated in (1) and (2) in the introduction of this paper, the main objective being the reduction of the total device cost as well as the interconnect between devices. We limited the circuit expansion due to replication by using different values of the threshold replication potential $T$ as defined in (6). Experimental results in Table IV show that partitioning with replication adds only moderately to the total number of cells. Depending on the value of the threshold potential, the percentage of cells which are replicated ranges from .% to 9.8%. Averaged over all circuits, the percentage of replicated cells ranges only from 3.3% to 5.%. Since each feasible partition must satisfy both the size and the terminal constraints, searching for the feasible partitions may increase the total CPU times for some circuits over the linear-time characteristic of the run without replications. For direct comparisons with [3], the CPU times shown in Tables IV to VII are for the case when 5 feasible partitions per bipartitioning run were generated on a SUN SparcStation +.

TABLE IV. Percentage of replicated cells and CPU cost of 5 runs.
Column groups: Circuit; percentage of replicated cells (%) for four threshold values ending with T = 2 and T = 3; CPU (sec.) for T = 3 and as reported in [3].
Rows: the nine benchmark circuits and their average. The numeric entries did not survive extraction.
Note: T = … includes multi-output cells with ψ = ….

Although we do not limit the replications during each bipartition explicitly, Table V shows that the utilization of FPGA devices did not increase beyond 9% for most of the circuits (except for the circuit s5378 when using threshold potential T > …). Compared with the results in [3], the average utilization of CLBs when using functional replication increased from 77% to at most 83%.
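The role of the threshold $T$ from (6) can be illustrated with a small Python sketch. Since the expression for the cell distribution in (5) is not legible in this copy, the sketch assumes that $d_X(\psi)$ is the fraction of cells whose replication potential equals $\psi$; the function names and the sample potentials are hypothetical.

```python
from collections import Counter


def cell_distribution(potentials):
    """Assumed d_X(psi): fraction of cells having replication potential psi."""
    counts = Counter(potentials)
    total = len(potentials)
    return {psi: count / total for psi, count in counts.items()}


def max_replication_factor(distribution, threshold):
    """r_T as in (6): share of cells with potential at or above the threshold."""
    return sum(frac for psi, frac in distribution.items() if psi >= threshold)


# Hypothetical replication potentials of eight cells.
potentials = [0, 0, 1, 2, 2, 4, 0, 3]
dist = cell_distribution(potentials)

for T in (0, 1, 2, 5):
    print(T, max_replication_factor(dist, T))
```

Raising the threshold shrinks the set of cells that may be replicated, which is how the experiment bounds the circuit expansion without limiting replications inside each bipartition.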
While in Table IV we report for circuit s13207 that 9.8% of cells have been replicated, the average increase in CLB utilization is from 72% for the partitioned circuit without replication to 85% for the partitioned circuit with replication. Glancing ahead, we see that the average IOB utilization was reduced from 88% to 65% for the same circuit.

The final results in this paper are reported with respect to the objective functions formulated in (1) and (2). Table VI reports on the total design cost as defined in (1). The reported results are compared with the results published in [3]. Except for the circuit s15850, we reduced the overall design cost for at least one setting of the threshold potential $T$ while consistently reducing the size of the interconnect, reported in Table VII.

Table VII summarizes results on the IOB utilization as a measure related to the interconnect density between FPGA devices, defined in (2). Compared to [3], we reduced the average IOB utilization for most circuits: typical reductions range from 4.3% to 53.9%. The circuit c5315 proved to be an exceptionally difficult case. Averaging over all circuits, we also achieve an IOB utilization of 67%, compared to 77% as reported in [3]. We conclude that the greater the freedom for functional replication, the greater the reduction of the average IOB count. Our partitioning with replication utilizes different FPGA devices, so while the total costs are comparable with [3], the device distributions are quite different. A more detailed analysis of the current partitioning results, including a report on routability, is in progress.

TABLE V. Average CLB utilization after partitioning.
Column groups: Circuit; utilization in [3] (%); partitioning with functional replication for three threshold values ending with T = 2 and T = 3, each with utilization and increase (%).
Rows: the nine benchmark circuits and their average. The numeric entries did not survive extraction.

TABLE VI. Total design cost after partitioning.
Column groups: Circuit; cost in [3]; partitioning with functional replication for three threshold values ending with T = 2 and T = 3, each with cost and reduction (%).
Rows: the nine benchmark circuits and their average. The numeric entries did not survive extraction.

TABLE VII. Average IOB utilization after partitioning.
Column groups: Circuit; utilization in [3] (%); partitioning with functional replication for three threshold values ending with T = 2 and T = 3, each with utilization and reduction (%).
Rows: the nine benchmark circuits and their average. The numeric entries did not survive extraction.

V. Conclusions

We extended the formulation of the problem of partitioning a large logic circuit into a collection of subcircuits, each of which is implemented with a device from a specific (FPGA) library. The objective function which we minimized was not only the total cost of the devices used but also the size of the interconnect between the devices. We introduced the concept of functional replication with a unified cost model for min-cut partitioning with replication and demonstrated its effectiveness in achieving both objectives.

References

[1] Stephen D. Brown, Robert J. Francis, and Jonathan Rose. Field-Programmable Gate Array. Kluwer Academic Publishers, Boston, 1992.
[2] W. E. Donath. Logic Partitioning. In Physical Design Automation of VLSI Systems, B. Preas and M. Lorenzetti, ed. The Benjamin/Cummings Publisher Company, Menlo Park, California 9425, 1988.
[3] R. Kuznar, F. Brglez, and K. Kozminski. Cost minimization of partitions into multiple devices. In 30th Design Automation Conference, ACM/IEEE, pages 35 32, June 1993.
[4] N.-S. Woo and J. Kim. An Efficient Method of Partitioning Circuits for Multiple-FPGA Implementation. In 30th Design Automation Conference, ACM/IEEE, pages 22 27, June 1993.
[5] L. A. Sanchis. Multiple-way network partitioning. IEEE Transactions on Computers, 38(1):62-81, January 1989.
[6] C. W. Yeh and C. K. Cheng. A general purpose multiple way partitioning algorithm. In Proceedings of the 28th IEEE Design Automation Conference, 1991.
[7] C. J. Alpert and A. B. Kahng. Geometric Embeddings for Faster and Better Multi-Way Netlist Partitioning. In 30th Design Automation Conference, ACM/IEEE, 1993.
[8] P. K. Chan, M. D. F. Schlag, and J. Y. Zien. Spectral K-Way Ratio-Cut Partitioning and Clustering. In 30th Design Automation Conference, ACM/IEEE, June 1993.
[9] J. Cong and M. Smith. A Parallel Bottom-up Clustering Algorithm with Applications to Circuit Partitioning in VLSI Design. In 30th Design Automation Conference, ACM/IEEE, June 1993.
[10] M. Shih and E. S. Kuh. Quadratic Boolean Programming for Performance-Driven System Partitioning. In 30th Design Automation Conference, ACM/IEEE, June 1993.
[11] Benchmark directory pub/benchmark/partitioning93, June 1993. Send e-mail to benchmarks@mcnc.org for details on ftp access.
[12] R. L. Russo, P. H. Odden, and P. K. Wolff. A heuristic procedure for the partitioning and mapping of computer logic graphs. IEEE Transactions on Computers, 20, 1971.
[13] C. Kring and A. R. Newton. A Cell-Replicating Approach to Mincut-Based Circuit Partitioning. In IEEE International Conference on Computer-Aided Design ICCAD-91, pages 2-5, November 1991.
[14] J. Hwang and A. El Gamal. Optimal Replication for Min-Cut Partitioning. In IEEE Int. Conf. on Computer-Aided Design, November 1992.
[15] Charles M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Proceedings of the 19th IEEE Design Automation Conference, pages 175-181, 1982.
[16] R. Kuznar, F. Brglez, and B. Zajc. A Unified Cost Model for K-Way Netlist Partitioning with Replication. Technical report, CBL (CAD Benchmarking Laboratory), Elec. & Comp. Engineering, NCSU, Raleigh, N.C., 1994.
[17] L. Hagen and A. B. Kahng. A New Approach to Effective Circuit Clustering. In IEEE Int. Conf. on Computer-Aided Design, November


More information

Replication for Logic Partitioning

Replication for Logic Partitioning for Logic Partitioning A Project Report Submitted to the Graduate School In Partial Fulfillment of the Requirements for the Degree Master of Science Field of Computer Science and Engineering By Morgan

More information

Wire Type Assignment for FPGA Routing

Wire Type Assignment for FPGA Routing Wire Type Assignment for FPGA Routing Seokjin Lee Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 seokjin@cs.utexas.edu Hua Xiang, D. F. Wong Department

More information

Using Analytical Placement Techniques. Technical University of Munich, Munich, Germany. depends on the initial partitioning.

Using Analytical Placement Techniques. Technical University of Munich, Munich, Germany. depends on the initial partitioning. Partitioning Very Large Circuits Using Analytical Placement Techniques Bernhard M. Riess, Konrad Doll, and Frank M. Johannes Institute of Electronic Design Automation Technical University of Munich, 9

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

Functional extension of structural logic optimization techniques

Functional extension of structural logic optimization techniques Functional extension of structural logic optimization techniques J. A. Espejo, L. Entrena, E. San Millán, E. Olías Universidad Carlos III de Madrid # e-mail: { ppespejo, entrena, quique, olias}@ing.uc3m.es

More information

On Improving Recursive Bipartitioning-Based Placement

On Improving Recursive Bipartitioning-Based Placement Purdue University Purdue e-pubs ECE Technical Reports Electrical and Computer Engineering 12-1-2003 On Improving Recursive Bipartitioning-Based Placement Chen Li Cheng-Kok Koh Follow this and additional

More information

Multi-Resource Aware Partitioning Algorithms for FPGAs with Heterogeneous Resources

Multi-Resource Aware Partitioning Algorithms for FPGAs with Heterogeneous Resources Multi-Resource Aware Partitioning Algorithms for FPGAs with Heterogeneous Resources Navaratnasothie Selvakkumaran Abhishek Ranjan HierDesign Inc Salil Raje HierDesign Inc George Karypis Department of Computer

More information

[HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE

[HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE [HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE International Conference on Computer-Aided Design, pp. 422-427, November 1992. [HaKa92b] L. Hagen and A. B.Kahng,

More information

Global Clustering-Based Performance-Driven Circuit Partitioning

Global Clustering-Based Performance-Driven Circuit Partitioning Global Clustering-Based Performance-Driven Circuit Partitioning Jason Cong University of California at Los Angeles Los Angeles, CA 90095 cong@cs.ucla.edu Chang Wu Aplus Design Technologies, Inc. Los Angeles,

More information

Effects of FPGA Architecture on FPGA Routing

Effects of FPGA Architecture on FPGA Routing Effects of FPGA Architecture on FPGA Routing Stephen Trimberger Xilinx, Inc. 2100 Logic Drive San Jose, CA 95124 USA steve@xilinx.com Abstract Although many traditional Mask Programmed Gate Array (MPGA)

More information

Test Set Compaction Algorithms for Combinational Circuits

Test Set Compaction Algorithms for Combinational Circuits Proceedings of the International Conference on Computer-Aided Design, November 1998 Set Compaction Algorithms for Combinational Circuits Ilker Hamzaoglu and Janak H. Patel Center for Reliable & High-Performance

More information

Circuit Placement: 2000-Caldwell,Kahng,Markov; 2002-Kennings,Markov; 2006-Kennings,Vorwerk

Circuit Placement: 2000-Caldwell,Kahng,Markov; 2002-Kennings,Markov; 2006-Kennings,Vorwerk Circuit Placement: 2000-Caldwell,Kahng,Markov; 2002-Kennings,Markov; 2006-Kennings,Vorwerk Andrew A. Kennings, Univ. of Waterloo, Canada, http://gibbon.uwaterloo.ca/ akenning/ Igor L. Markov, Univ. of

More information

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada, V6T

More information

Evaluation of FPGA Resources for Built-In Self-Test of Programmable Logic Blocks

Evaluation of FPGA Resources for Built-In Self-Test of Programmable Logic Blocks Evaluation of FPGA Resources for Built-In Self-Test of Programmable Logic Blocks Charles Stroud, Ping Chen, Srinivasa Konala, Dept. of Electrical Engineering University of Kentucky and Miron Abramovici

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques

VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques Shantanu Dutt and Wenyong Deng Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 5555, USA

More information

Fault Grading FPGA Interconnect Test Configurations

Fault Grading FPGA Interconnect Test Configurations * Fault Grading FPGA Interconnect Test Configurations Mehdi Baradaran Tahoori Subhasish Mitra* Shahin Toutounchi Edward J. McCluskey Center for Reliable Computing Stanford University http://crc.stanford.edu

More information

IN general setting, a combinatorial network is

IN general setting, a combinatorial network is JOURNAL OF L A TEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 1 Clustering without replication: approximation and inapproximability Zola Donovan, Vahan Mkrtchyan, and K. Subramani, arxiv:1412.4051v1 [cs.ds]

More information

Timing Optimization of FPGA Placements by Logic Replication

Timing Optimization of FPGA Placements by Logic Replication 13.1 Timing Optimization of FPGA Placements by Logic Replication Giancarlo Beraudo ECE Department, University of Illinois at Chicago 851 S. Morgan St., Chicago IL, 60607 gberaudo@ece.uic.edu John Lillis

More information

Boolean Matching for Complex PLBs in LUT-based FPGAs with Application to Architecture Evaluation. Jason Cong and Yean-Yow Hwang

Boolean Matching for Complex PLBs in LUT-based FPGAs with Application to Architecture Evaluation. Jason Cong and Yean-Yow Hwang Boolean Matching for Complex PLBs in LUT-based PAs with Application to Architecture Evaluation Jason Cong and Yean-Yow wang Department of Computer Science University of California, Los Angeles {cong, yeanyow}@cs.ucla.edu

More information

Fast Timing-driven Partitioning-based Placement for Island Style FPGAs

Fast Timing-driven Partitioning-based Placement for Island Style FPGAs .1 Fast Timing-driven Partitioning-based Placement for Island Style FPGAs Pongstorn Maidee Cristinel Ababei Kia Bazargan Electrical and Computer Engineering Department University of Minnesota, Minneapolis,

More information

Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning

Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning 952 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 7, JULY 2003 [13] W. Kunz, D. Stoffel, and P. R. Menon, Logic optimization and equivalence checking by implication

More information

Local Unidirectional Bias for Smooth Cutsize-Delay Tradeoff in Performance-Driven Bipartitioning

Local Unidirectional Bias for Smooth Cutsize-Delay Tradeoff in Performance-Driven Bipartitioning Local Unidirectional Bias for Smooth Cutsize-Delay Tradeoff in Performance-Driven Bipartitioning Andrew B. Kahng CSE and ECE Departments UCSD La Jolla, CA 92093 abk@ucsd.edu Xu Xu CSE Department UCSD La

More information

Improved Algorithms for Hypergraph Bipartitioning

Improved Algorithms for Hypergraph Bipartitioning Improved Algorithms for Hypergraph Bipartitioning Andrew E. Caldwell, Andrew B. Kahng and Igor L. Markov* UCLA Computer Science Dept., Los Angeles, CA 90095-1596 {caldwell,abk,imarkov}@cs.ucla.edu Abstract

More information

VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques

VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques To appear in Proc. IEEE/ACM International Conference on CAD, 996 VLSI Circuit Partitioning by Cluster-Removal Using Iterative Improvement Techniques Shantanu Dutt and Wenyong Deng Department of Electrical

More information

Wojciech P. Maly Department of Electrical and Computer Engineering Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA

Wojciech P. Maly Department of Electrical and Computer Engineering Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA Interconnect Characteristics of 2.5-D System Integration Scheme Yangdong Deng Department of Electrical and Computer Engineering Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA 15213 412-268-5234

More information

Partitioning. Course contents: Readings. Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic. Chapter 7.5.

Partitioning. Course contents: Readings. Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic. Chapter 7.5. Course contents: Partitioning Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic Readings Chapter 7.5 Partitioning 1 Basic Definitions Cell: a logic block used to build larger circuits.

More information

JRoute: A Run-Time Routing API for FPGA Hardware

JRoute: A Run-Time Routing API for FPGA Hardware JRoute: A Run-Time Routing API for FPGA Hardware Eric Keller Xilinx Inc. 2300 55 th Street Boulder, CO 80301 Eric.Keller@xilinx.com Abstract. JRoute is a set of Java classes that provide an application

More information

A Semi-Persistent Clustering Technique for VLSI Circuit Placement

A Semi-Persistent Clustering Technique for VLSI Circuit Placement A Semi-Persistent Clustering Technique for VLSI Circuit Placement Charles Alpert, Andrew Kahng, Gi-Joon Nam, Sherief Reda and Paul Villarrubia IBM Corp. 114 Burnet Road, Austin, Texas, 78758 Department

More information

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi Incorporating the Controller Eects During Register Transfer Level Synthesis Champaka Ramachandran and Fadi J. Kurdahi Department of Electrical & Computer Engineering, University of California, Irvine,

More information

On Computing Minimum Size Prime Implicants

On Computing Minimum Size Prime Implicants On Computing Minimum Size Prime Implicants João P. Marques Silva Cadence European Laboratories / IST-INESC Lisbon, Portugal jpms@inesc.pt Abstract In this paper we describe a new model and algorithm for

More information

Wirelength Estimation based on Rent Exponents of Partitioning and Placement Λ

Wirelength Estimation based on Rent Exponents of Partitioning and Placement Λ Wirelength Estimation based on Rent Exponents of Partitioning and Placement Λ Xiaojian Yang Elaheh Bozorgzadeh Majid Sarrafzadeh Computer Science Department University of California at Los Angeles Los

More information

Combinational Equivalence Checking Using Incremental SAT Solving, Output Ordering, and Resets

Combinational Equivalence Checking Using Incremental SAT Solving, Output Ordering, and Resets ASP-DAC 2007 Yokohama Combinational Equivalence Checking Using Incremental SAT Solving, Output ing, and Resets Stefan Disch Christoph Scholl Outline Motivation Preliminaries Our Approach Output ing Heuristics

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents FPGA Technology Programmable logic Cell (PLC) Mux-based cells Look up table PLA

More information

Reducing Power in an FPGA via Computer-Aided Design

Reducing Power in an FPGA via Computer-Aided Design Reducing Power in an FPGA via Computer-Aided Design Steve Wilton University of British Columbia Power Reduction via CAD How to reduce power dissipation in an FPGA: - Create power-aware CAD tools - Create

More information

Multilevel k-way Hypergraph Partitioning

Multilevel k-way Hypergraph Partitioning _ Multilevel k-way Hypergraph Partitioning George Karypis and Vipin Kumar fkarypis, kumarg@cs.umn.edu Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455 Abstract

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

Hypergraph Partitioning With Fixed Vertices

Hypergraph Partitioning With Fixed Vertices Hypergraph Partitioning With Fixed Vertices Andrew E. Caldwell, Andrew B. Kahng and Igor L. Markov UCLA Computer Science Department, Los Angeles, CA 90095-596 Abstract We empirically assess the implications

More information

Generating Synthetic Benchmark Circuits for Evaluating CAD Tools

Generating Synthetic Benchmark Circuits for Evaluating CAD Tools IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 9, SEPTEMBER 2000 1011 Generating Synthetic Benchmark Circuits for Evaluating CAD Tools Dirk Stroobandt, Member,

More information

Large Scale Circuit Partitioning

Large Scale Circuit Partitioning Large Scale Circuit Partitioning With Loose/Stable Net Removal And Signal Flow Based Clustering Jason Cong Honching Li Sung-Kyu Lim Dongmin Xu UCLA VLSI CAD Lab Toshiyuki Shibuya Fujitsu Lab, LTD Support

More information

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs Vaughn Betz Jonathan Rose Alexander Marquardt

More information

Partitioning With Terminals: A New Problem and New Benchmarks

Partitioning With Terminals: A New Problem and New Benchmarks Partitioning With Terminals: A New Problem and New Benchmarks C. J. Alpert,A.E.Caldwell,A.B.KahngandI.L.Markov UCLA Computer Science Dept., Los Angeles, CA 90095-596 USA IBM Austin Research Laboratory,

More information

THE technology mapping and synthesis problem for field

THE technology mapping and synthesis problem for field 738 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 17, NO. 9, SEPTEMBER 1998 An Efficient Algorithm for Performance-Optimal FPGA Technology Mapping with Retiming Jason

More information

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809 PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA Laurent Lemarchand Informatique ubo University{ bp 809 f-29285, Brest { France lemarch@univ-brest.fr ea 2215, D pt ABSTRACT An ecient distributed

More information

Research Article Accounting for Recent Changes of Gain in Dealing with Ties in Iterative Methods for Circuit Partitioning

Research Article Accounting for Recent Changes of Gain in Dealing with Ties in Iterative Methods for Circuit Partitioning Discrete Dynamics in Nature and Society Volume 25, Article ID 625, 8 pages http://dxdoiorg/55/25/625 Research Article Accounting for Recent Changes of Gain in Dealing with Ties in Iterative Methods for

More information

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Prof. Lei He EE Department, UCLA LHE@ee.ucla.edu Partially supported by NSF. Pathway to Power Efficiency and Variation Tolerance

More information

Device And Architecture Co-Optimization for FPGA Power Reduction

Device And Architecture Co-Optimization for FPGA Power Reduction 54.2 Device And Architecture Co-Optimization for FPGA Power Reduction Lerong Cheng, Phoebe Wong, Fei Li, Yan Lin, and Lei He Electrical Engineering Department University of California, Los Angeles, CA

More information

Geometric Steiner Trees

Geometric Steiner Trees Geometric Steiner Trees From the book: Optimal Interconnection Trees in the Plane By Marcus Brazil and Martin Zachariasen Part 2: Global properties of Euclidean Steiner Trees and GeoSteiner Marcus Brazil

More information

MODULAR PARTITIONING FOR INCREMENTAL COMPILATION

MODULAR PARTITIONING FOR INCREMENTAL COMPILATION MODULAR PARTITIONING FOR INCREMENTAL COMPILATION Mehrdad Eslami Dehkordi, Stephen D. Brown Dept. of Electrical and Computer Engineering University of Toronto, Toronto, Canada email: {eslami,brown}@eecg.utoronto.ca

More information

A Data Parallel Algorithm for Boolean Function Manipulation

A Data Parallel Algorithm for Boolean Function Manipulation A Data Parallel Algorithm for Boolean Function Manipulation S. Gai, M. Rebaudengo, M. Sonza Reorda Politecnico di Torino Dipartimento di Automatica e Informatica Torino, Italy Abstract * This paper describes

More information

Efficient SAT-based Boolean Matching for FPGA Technology Mapping

Efficient SAT-based Boolean Matching for FPGA Technology Mapping Efficient SAT-based Boolean Matching for FPGA Technology Mapping Sean Safarpour, Andreas Veneris Department of Electrical and Computer Engineering University of Toronto Toronto, ON, Canada {sean, veneris}@eecg.toronto.edu

More information

Further Improve Circuit Partitioning Using GBAW Logic Perturbation Techniques

Further Improve Circuit Partitioning Using GBAW Logic Perturbation Techniques IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 11, NO. 3, JUNE 2003 451 Further Improve Circuit Partitioning Using GBAW Logic Perturbation Techniques Yu-Liang Wu, Member, IEEE,

More information

Designing Heterogeneous FPGAs with Multiple SBs *

Designing Heterogeneous FPGAs with Multiple SBs * Designing Heterogeneous FPGAs with Multiple SBs * K. Siozios, S. Mamagkakis, D. Soudris, and A. Thanailakis VLSI Design and Testing Center, Department of Electrical and Computer Engineering, Democritus

More information

On the Relation between SAT and BDDs for Equivalence Checking

On the Relation between SAT and BDDs for Equivalence Checking On the Relation between SAT and BDDs for Equivalence Checking Sherief Reda 1 Rolf Drechsler 2 Alex Orailoglu 1 1 Computer Science & Engineering Department University of California, San Diego La Jolla,

More information

COMPARATIVE STUDY OF CIRCUIT PARTITIONING ALGORITHMS

COMPARATIVE STUDY OF CIRCUIT PARTITIONING ALGORITHMS COMPARATIVE STUDY OF CIRCUIT PARTITIONING ALGORITHMS Zoltan Baruch 1, Octavian Creţ 2, Kalman Pusztai 3 1 PhD, Lecturer, Technical University of Cluj-Napoca, Romania 2 Assistant, Technical University of

More information

Can Recursive Bisection Alone Produce Routable Placements?

Can Recursive Bisection Alone Produce Routable Placements? Supported by Cadence Can Recursive Bisection Alone Produce Routable Placements? Andrew E. Caldwell Andrew B. Kahng Igor L. Markov http://vlsicad.cs.ucla.edu Outline l Routability and the placement context

More information

Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing

Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing Umadevi.S #1, Vigneswaran.T #2 # Assistant Professor [Sr], School of Electronics Engineering, VIT University, Vandalur-

More information

Graphical Analysis. Figure 1. Copyright c 1997 by Awi Federgruen. All rights reserved.

Graphical Analysis. Figure 1. Copyright c 1997 by Awi Federgruen. All rights reserved. Graphical Analysis For problems with 2 variables, we can represent each solution as a point in the plane. The Shelby Shelving model (see the readings book or pp.68-69 of the text) is repeated below for

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Field Programmable Gate Arrays

Field Programmable Gate Arrays Chortle: A Technology Mapping Program for Lookup Table-Based Field Programmable Gate Arrays Robert J. Francis, Jonathan Rose, Kevin Chung Department of Electrical Engineering, University of Toronto, Ontario,

More information

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 1, Feb 2015, 01-07 IIST HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC

More information

Eu = {n1, n2} n1 n2. u n3. Iu = {n4} gain(u) = 2 1 = 1 V 1 V 2. Cutset

Eu = {n1, n2} n1 n2. u n3. Iu = {n4} gain(u) = 2 1 = 1 V 1 V 2. Cutset Shantanu Dutt 1 and Wenyong Deng 2 A Probability-Based Approach to VLSI Circuit Partitioning Department of Electrical Engineering 1 of Minnesota University Minneapolis, Minnesota 55455 LSI Logic Corporation

More information

Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors

Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors RPack: Rability-Driven packing for cluster-based FPGAs E. Bozorgzadeh S. Ogrenci-Memik M. Sarrafzadeh Computer Science Department Department ofece Computer Science Department UCLA Northwestern University

More information

Core-Level Compression Technique Selection and SOC Test Architecture Design 1

Core-Level Compression Technique Selection and SOC Test Architecture Design 1 17th Asian Test Symposium Core-Level Compression Technique Selection and SOC Test Architecture Design 1 Anders Larsson +, Xin Zhang +, Erik Larsson +, and Krishnendu Chakrabarty * + Department of Computer

More information

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Jason Cong and Yean-Yow Hwang Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this paper, we

More information

Hierarchical Partitioning

Hierarchical Partitioning Hierarchical Partitioning Dirk Behrens Klaus Harbich Erich Barke Institute of Microelectronic Systems Department of Electrical Engineering University of Hanover, D-30167 Hanover, Germany E-mail: {behrens,

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION. Jun Wang Carl Tropper. School of Computer Science McGill University Montreal, Quebec, CANADA H3A2A6

COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION. Jun Wang Carl Tropper. School of Computer Science McGill University Montreal, Quebec, CANADA H3A2A6 Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds. COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION Jun Wang Carl

More information

Cell Density-driven Detailed Placement with Displacement Constraint

Cell Density-driven Detailed Placement with Displacement Constraint Cell Density-driven Detailed Placement with Displacement Constraint Wing-Kai Chow, Jian Kuang, Xu He, Wenzan Cai, Evangeline F. Y. Young Department of Computer Science and Engineering The Chinese University

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

A New Algorithm to Create Prime Irredundant Boolean Expressions

A New Algorithm to Create Prime Irredundant Boolean Expressions A New Algorithm to Create Prime Irredundant Boolean Expressions Michel R.C.M. Berkelaar Eindhoven University of technology, P.O. Box 513, NL 5600 MB Eindhoven, The Netherlands Email: michel@es.ele.tue.nl

More information

PARTITIONING COMBINATIONAL CIRCUITS FOR K-LUT BASED FPGA MAPPING

PARTITIONING COMBINATIONAL CIRCUITS FOR K-LUT BASED FPGA MAPPING U.P.B. Sci. Bull., Series C, Vol. 68, No. 2, 2006 PARTITIONING COMBINATIONAL CIRCUITS FOR K-LUT BASED FPGA MAPPING I. I. BUCUR Partiţionarea este o problemă centrală în automatizarea proiectării VLSI vizând

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

SafeChoice: A Novel Approach to Hypergraph Clustering for Wirelength-Driven Placement

SafeChoice: A Novel Approach to Hypergraph Clustering for Wirelength-Driven Placement IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, 2011 1 SafeChoice: A Novel Approach to Hypergraph Clustering for Wirelength-Driven Placement Jackey Z. Yan, Chris Chu and Wai-Kei Mak Abstract This paper presents

More information

On the Complexity of the Channel Routing Problem in the Dogleg-free Multilayer Manhattan Model

On the Complexity of the Channel Routing Problem in the Dogleg-free Multilayer Manhattan Model On the Complexity of the Channel Routing Problem in the Dogleg-free Multilayer Manhattan Model Kornélia Ambrus Somogyi Budapest Tech, email: ambrusne.somogyi.kornelia@nik.bmf.hu András Recski 1 Budapest

More information

A General Sign Bit Error Correction Scheme for Approximate Adders

A General Sign Bit Error Correction Scheme for Approximate Adders A General Sign Bit Error Correction Scheme for Approximate Adders Rui Zhou and Weikang Qian University of Michigan-Shanghai Jiao Tong University Joint Institute Shanghai Jiao Tong University, Shanghai,

More information

Multi-Objective Hypergraph Partitioning Algorithms for Cut and Maximum Subdomain Degree Minimization

Multi-Objective Hypergraph Partitioning Algorithms for Cut and Maximum Subdomain Degree Minimization IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN, VOL XX, NO. XX, 2005 1 Multi-Objective Hypergraph Partitioning Algorithms for Cut and Maximum Subdomain Degree Minimization Navaratnasothie Selvakkumaran and

More information

TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS

TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS Zoltan Baruch E-mail: Zoltan.Baruch@cs.utcluj.ro Octavian Creţ E-mail: Octavian.Cret@cs.utcluj.ro Kalman Pusztai E-mail: Kalman.Pusztai@cs.utcluj.ro Computer

More information

Introduction VLSI PHYSICAL DESIGN AUTOMATION

Introduction VLSI PHYSICAL DESIGN AUTOMATION VLSI PHYSICAL DESIGN AUTOMATION PROF. INDRANIL SENGUPTA DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Introduction Main steps in VLSI physical design 1. Partitioning and Floorplanning l 2. Placement 3.

More information

Factor Cuts. Satrajit Chatterjee Alan Mishchenko Robert Brayton ABSTRACT

Factor Cuts. Satrajit Chatterjee Alan Mishchenko Robert Brayton ABSTRACT Factor Cuts Satrajit Chatterjee Alan Mishchenko Robert Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu ABSTRACT Enumeration of bounded size cuts is an important

More information

An Enhanced Perturbing Algorithm for Floorplan Design Using the O-tree Representation*

An Enhanced Perturbing Algorithm for Floorplan Design Using the O-tree Representation* An Enhanced Perturbing Algorithm for Floorplan Design Using the O-tree Representation* Yingxin Pang Dept.ofCSE Univ. of California, San Diego La Jolla, CA 92093 ypang@cs.ucsd.edu Chung-Kuan Cheng Dept.ofCSE

More information

Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination

Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination Kedarnath J. Balakrishnan and Nur A. Touba Computer Engineering Research Center University of Texas at Austin {kjbala,touba}@ece.utexas.edu

More information

TCG-Based Multi-Bend Bus Driven Floorplanning

TCG-Based Multi-Bend Bus Driven Floorplanning TCG-Based Multi-Bend Bus Driven Floorplanning Tilen Ma Department of CSE The Chinese University of Hong Kong Shatin, N.T. Hong Kong Evangeline F.Y. Young Department of CSE The Chinese University of Hong

More information

Read this before starting!

Read this before starting! Points missed: Student's Name: Total score: /100 points East Tennessee State University Department of Computer and Information Sciences CSCI 2150 (Tarnoff) Computer Organization TEST 1 for Spring Semester,

More information