Cluster-Based Architecture, Timing-Driven Packing and Timing-Driven Placement for FPGAs

by Alexander R. Marquardt

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Department of Electrical and Computer Engineering
University of Toronto

Copyright by Alexander Ronald Marquardt, 1999

Abstract

Cluster-Based Architecture, Timing-Driven Packing and Timing-Driven Placement for FPGAs
Master of Applied Science, 1999
Alexander R. Marquardt
Department of Electrical and Computer Engineering
University of Toronto

As process geometries shrink into the deep-submicron region, interconnect resistance and capacitance account for an increasingly significant portion of the delay of circuits implemented in Field-Programmable Gate Arrays (FPGAs). One way to improve FPGA speed is to employ logic-cluster-based architectures, which have high-speed local connections among groups of logic elements. In this work we show what size of logic cluster results in the best area-speed trade-off. To obtain the best choices for a cluster-based architecture, we use computer-aided design (CAD) tools to experimentally evaluate architectures with different sized logic clusters. As part of this CAD flow, we develop a timing-driven algorithm that packs logic elements into these clusters. In addition, we develop a timing-driven placement algorithm that results in significant improvements in FPGA speed over existing non-timing-driven algorithms.

Acknowledgments

I would like to thank my advisor Jonathan Rose for providing direction, motivation, and advice throughout this work. He has taught me a great deal about FPGA research. I would also like to give thanks to Vaughn Betz. He and I spent many hours discussing FPGA architecture and CAD, and each discussion we had was educational.

I would also like to thank the students in Jonathan's research group: Yaska, Jordan, Khalid, Rob, and Paul. Through our weekly meetings, and through other informal meetings, we have shared many ideas.

I am grateful to my parents for giving me constant support and encouragement throughout my life and always having faith in me.

Table of Contents

CHAPTER 1 Introduction
    Cluster-Based Logic Blocks
    Timing-Driven Packing
    Timing-Driven Placement
    Thesis Organization

CHAPTER 2 Background and Previous Work
    Overview of FPGA Architecture
    Cluster-Based Logic Blocks
    CAD for FPGAs
    Timing Analysis
    Packing Algorithms for Cluster-Based FPGAs
    The VPack Logic Cluster Packing Tool
    RASP
    Placement
    Simulated Annealing
    The VPR Placement Tool (VPlace)
    Timing-Driven Placement
    TimberWolfSC
    PROXI
    Summary

CHAPTER 3 Timing-Driven and Connection-Driven Packing
    Experimental Methodology
    3.2 Timing-Driven Packing: T-VPack
    Timing Analysis and Delay Models
    Timing-Driven Packing Description
    Preliminary Definitions
    Seed Selection and Attraction Function
    Algorithm Analysis
    Computational Complexity
    Connection-Driven Packing: C-VPack
    Attraction Function
    Time Complexity
    Result Quality of T-VPack, C-VPack, and VPack
    Summary

CHAPTER 4 The Effect of Cluster Size on FPGA Speed and Density
    Trade-offs in Cluster-Based FPGAs
    Architecture Modeling
    Area Model
    Delay Model
    Effect of Cluster Size on the Physical Length of FPGA Routing Segments
    Sizing Routing Transistors to Compensate for Different Physical Segment Lengths
    FPGA Architectural Parameters
    Basic Architecture
    Inputs Required vs. Cluster Size
    Routing Architecture
    Flexibility of Logic Block to Routing Interconnect vs. Cluster Size
    Architecture Evaluation Metric: Area-Delay Product
    Speed and Area-Efficiency vs. Cluster Size
    Discussion of Delay vs. Cluster Size Results
    Effect of Cluster Size on Compile Time
    Summary

CHAPTER 5 Timing-Driven Placement
    Introduction
    Timing-Driven Placement: T-VPlace
    Delay Modeling and Cost Function
    Delay Lookup Matrix
    Cost Function
    5.2.2 Algorithm Tuning
    Verification of the Fidelity of the Placement Estimated Critical Path Delay
    Time Complexity
    Results: VPlace vs. T-VPlace
    Summary

CHAPTER 6 Conclusions and Future Work
    Conclusions and Contributions
    Future Work

APPENDIX A MCNC Benchmarks
APPENDIX B VPack and T-VPack Sink Delay Distributions: Size 8 Clusters
APPENDIX C Sink Delay Distributions for the 20 MCNC Benchmark Circuits
    C.1 Placement Estimated Sink Delay Distributions: Size 1 Clusters
    C.2 Low-Stress Sink Delay Distributions: Size 1 Clusters
    C.3 Placement Estimated Sink Delay Distributions: Size 8 Clusters
    C.4 Low-Stress Sink Delay Distributions: Size 8 Clusters

List of Tables

TABLE 3.1 Effects of using tie-breakers, and the recompute timing interval (cluster size = 8)
TABLE 3.2 Comparison of VPack, T-VPack, and C-VPack result quality (cluster size = 8)
TABLE 3.3 Net absorption and inputs used (cluster size = 8)
TABLE 4.1 Important intra-cluster delays in TSMC's 0.35 µm CMOS process
TABLE 4.2 Inputs required for 98% utilization for VPack and T-VPack
TABLE 4.3 Routing area vs. Fc,input for various cluster sizes
TABLE 5.1 Effect of re-timing-analysis in the outer loop
TABLE 5.2 Effect of re-timing-analysis in the inner loop
TABLE 5.3 Effect of Criticality_Exponent with a λ value of
TABLE 5.4 Effect of Criticality_Exponent with a λ value of
TABLE 5.5 Effect of λ with an adaptive Criticality_Exponent of
TABLE 5.6 Post-place-and-route comparison of VPlace and T-VPlace (cluster size = 1)
TABLE 5.7 Post-place-and-route comparison of VPlace and T-VPlace (cluster size = 8)
TABLE 5.8 Post-place-and-route comparison with Xilinx-like architecture (cluster size = 4)
TABLE A.1 MCNC benchmark circuits

List of Figures

FIGURE 1.1 Example logic cluster containing two LUTs [Betz99]
FIGURE 2.1 A generic FPGA [Brow92]
FIGURE 2.2 Logic cluster and basic logic element (BLE)
FIGURE 2.3 CAD flow
FIGURE 2.4 Packing example
FIGURE 2.5 Pseudo-code for VPack [Betz98b, Betz99]
FIGURE 2.6 Pseudo-code of a generic Simulated Annealing-based placer [Betz98b, Betz99]
FIGURE 3.1 Architecture evaluation CAD flow [Betz98b, Betz99]
FIGURE 3.2 Pseudo-code for T-VPack
FIGURE 3.3 Determining BaseBLECrit from connection criticalities
FIGURE 3.4 Example of first criticality tie-breaker
FIGURE 3.5 Example of second criticality tie-breaker
FIGURE 3.6 Post place and route T-VPack alpha trade-off curves
FIGURE 3.7 Post place and route C-VPack alpha trade-off curves
FIGURE 3.8 Why reducing the number of nets in a circuit is good
FIGURE 4.1 Structure and speed paths of a logic cluster
FIGURE 4.2 Effect of cluster size on physical length of routing segments
FIGURE 4.3 Effect of cluster size on tile length
FIGURE 4.4 Inputs required for 98% utilization vs. cluster size
FIGURE 4.5 FPGA with length 4 segments, 50% buffered and 50% pass transistor switches
FIGURE 4.6 Total area vs. cluster size
FIGURE 4.7 Area components vs. cluster size
FIGURE 4.8 Critical path delay vs. cluster size
FIGURE 4.9 Area-delay product vs. cluster size
FIGURE 4.10 Inter-cluster and intra-cluster nets on the critical path
FIGURE 4.11 Breakdown of critical path delay into inter-cluster and intra-cluster components
FIGURE 4.12 Decrease in logical Manhattan distance as cluster size increases
FIGURE 4.13 Variation of circuit compile time with logic cluster size
FIGURE 5.1 Pseudo-code for T-VPlace
FIGURE 5.2 Graph showing fidelity of placement estimated critical path

CHAPTER 1 Introduction

Field-Programmable Gate Arrays (FPGAs) have become one of the most popular implementation media for digital circuits, and since their introduction in 1984, FPGAs have become a multibillion dollar industry. The key to the success of FPGAs is their programmability, which allows any circuit to be instantly realized by appropriately programming an FPGA. FPGAs have some compelling advantages over Standard Cells or Mask-Programmed Gate Arrays (MPGAs): faster time-to-market, lower non-recurring engineering (NRE) costs, and easier debugging. Additionally, FPGAs offer designers the ability to fix errors or to add features to systems that have already been manufactured. FPGAs are also useful for implementing designs that are low volume or are required immediately, since they do not require extensive manufacturing like Standard Cells or MPGAs.

The benefits offered by FPGAs come at a price: FPGAs are at least three times slower, and require at least ten times the area, of MPGAs [Brow92]. This loss in speed is mainly due to the fact that logic in FPGAs is connected via programmable switches, while in Standard Cells or MPGAs, logic is directly connected with metal wires. The programmable switches in FPGAs have high resistance and capacitance compared to the metal wiring in Standard Cells or MPGAs, and therefore reduce circuit speed. Interconnect delay is a larger proportion of circuit delay in FPGAs than it is in MPGAs or Standard Cells, and consequently it is more important to minimize the interconnect delay in FPGAs.

Another important factor affecting circuit delay is the process used in the manufacture of an FPGA. As process geometries shrink into the deep-submicron region, interconnect¹ resistance and capacitance become increasingly significant: smaller processes, which result in improvements in logic speed, do not result in similar improvements in interconnect speed. As a result, each process shrink makes interconnect delay a larger proportion of total circuit delay, and this delay must be minimized to achieve the best possible circuit performance.

The quality of the computer-aided design (CAD) tools used to map circuits into an FPGA and the quality of the FPGA architecture can have a significant impact on the FPGA's performance. Because interconnect delay is an increasingly important factor in the overall performance of an FPGA, it is crucial that FPGA CAD tools and FPGA architectures minimize this delay. Our research focuses on the following two areas:

1. Exploring FPGA logic block architectures to minimize interconnect delay, and
2. Developing CAD tools that minimize interconnect delay.

It is important that FPGA architecture and CAD be studied in concert, since architectural features must be properly utilized by CAD tools to be of any benefit, and CAD tool enhancements cannot be properly evaluated without a good architecture.

In this thesis, we are concerned with improving FPGA performance without sacrificing large amounts of area. To accomplish this we investigate three promising aspects of FPGA architecture and CAD: logic-cluster-based FPGA architectures, timing-driven packing, and timing-driven placement. These three areas are described in the following sections.

¹ Interconnect is the wiring and switches that connect logic elements.

[FIGURE 1.1 Example logic cluster containing two LUTs [Betz99]: logic cluster inputs feed a local interconnect (X-Bar), which drives two BLEs that produce the logic cluster outputs.]

1.1 Cluster-Based Logic Blocks

An important factor affecting the area and speed of an FPGA is the logic block (logic cluster) architecture used within the FPGA. In general a logic cluster consists of one or more basic logic elements (BLEs) connected by fast local interconnect [Betz98b, Betz99], where the BLE (described fully in Section 2.1.1) that we use consists of a 4-LUT and a register. Figure 1.1 shows an example logic cluster consisting of two BLEs and local interconnect.

The size of the logic cluster (the number of BLEs it contains) used in an FPGA architecture can have a dramatic effect on its area and performance. Previous work [Betz98b] demonstrated the effect of cluster size on area efficiency, and speculated that circuit speed would improve as cluster size is increased. As cluster size is increased, two things happen:

1. More critical path connections are able to use the fast local interconnect rather than the slow inter-cluster (between-cluster) interconnect, but this local interconnect becomes slower.
2. More connections are completely absorbed within clusters, so less inter-cluster routing is required (reducing area), but the local interconnect area per cluster grows quadratically (increasing area).

We are concerned with determining the effect of logic cluster size on circuit speed as well as area, and with finding what size of logic cluster has the best area-delay trade-off. To our knowledge no work has been done which simultaneously investigates logic clusters with respect to both area and speed.

1.2 Timing-Driven Packing

To fairly evaluate different size logic clusters with respect to speed, it is important that the CAD tools take advantage of the fast local interconnect within the clusters in order to minimize the critical path delay. A packing algorithm selects how BLEs in a circuit are to be mapped into logic clusters, while a timing-driven packing algorithm attempts to map BLEs along the critical path into the fewest possible clusters so that many critical path connections use fast local interconnect. We give a more formal definition of packing in Chapter 2.

1.3 Timing-Driven Placement

Placement involves selecting the coordinates in the FPGA to which each logic cluster will be mapped. A timing-driven placement algorithm attempts to map logic clusters that are on the critical path into physical locations that are close together, so as to minimize the amount of interconnect through which the critical signal must travel. Previous work [Betz99, Betz98b] has done a good job of considering timing during routing, but it did not consider timing during placement. While there is evidence that timing-driven placement improves speed for standard cells, there has been no clear quantification of how large the improvement is for FPGAs. A goal of this work is to determine what improvements can be obtained with timing-driven placement. Placement is formally defined in Chapter 2.

1.4 Thesis Organization

This thesis is organized as follows. Chapter 2 describes FPGA architecture and CAD, and gives an overview of existing CAD tools. Chapter 3 introduces a new timing-driven logic block packing algorithm. Chapter 4 describes architecture experiments that evaluate different size logic clusters with respect to area and speed. Chapter 5 describes a new timing-driven placement algorithm. Finally, in Chapter 6 we present our conclusions and suggestions for future work.

CHAPTER 2 Background and Previous Work

In this chapter, we first give an overview of FPGA architecture with a focus on logic block architecture. After this we discuss the CAD flow used to map circuits into FPGAs, including an introduction to timing analysis and a detailed review of logic cluster packing, placement, and timing-driven placement.

2.1 Overview of FPGA Architecture

In general, an FPGA consists of logic blocks, I/O blocks, and programmable routing as shown in Figure 2.1. To implement a circuit in an FPGA, each of the logic blocks in the FPGA is appropriately programmed to perform a small portion of the functionality of the desired circuit, and each of the I/O blocks is programmed to be an input pad or an output pad as required by the circuit. These functional portions and I/Os are then connected through the programmable routing.

The logic block used in an FPGA can have a significant impact on the performance of the FPGA, and since we are interested in determining the effects and trade-offs of cluster-based logic blocks, we describe cluster-based logic blocks below.

[FIGURE 2.1 A generic FPGA [Brow92], showing logic blocks, I/O blocks, and programmable routing.]

2.1.1 Cluster-Based Logic Blocks

We are interested in studying logic blocks that consist of a grouping of basic logic elements (BLEs) connected with fast local interconnect. In general, a BLE is a small indivisible unit combining sequential and combinational logic, while the BLE that we study consists of a 4-LUT and a flip-flop as shown in Figure 2.2-b. A logic block combining many BLEs is known as a logic cluster [Betz99, Betz98b]. Examples of logic clusters are the Logic Array Blocks used in Altera's FLEX 6K, FLEX 8K, and FLEX 10K parts [Alte98a], as well as the Configurable Logic Blocks used in the Xilinx 5200 [Xili97] and Virtex [Xili98] parts.

Figure 2.2-a shows the structure of a logic cluster that consists of one or more BLEs and the routing required to connect them together. The clusters that we study are fully connected, meaning that any BLE input can connect to any cluster input or any BLE output. Since the cluster is fully connected, it is possible to bring a net into the cluster on a single cluster input and route this net to many BLEs within the cluster via the local routing. This allows the number of nets brought into the cluster (the number of cluster inputs used) to be less than the total number of BLE inputs within the cluster. Another benefit of fully connected clusters is that CAD tools are simplified, since all BLEs within the cluster are logically equivalent.

[FIGURE 2.2 Logic cluster and basic logic element (BLE). (a) Logic cluster: I inputs and a clock enter the local routing (X-Bar), which feeds BLE #1 through BLE #N, giving N outputs. (b) Basic logic element: inputs feed a 4-LUT followed by a clocked DFF that drives the output.]

A logic cluster consisting of BLEs is described with the following four parameters [Betz99, Betz98b]:

1. The size of (number of inputs to) a LUT, $K$,
2. The number of BLEs in a cluster, $N$,
3. The number of inputs to the cluster for use as inputs by the LUTs, $I$, and
4. The number of clock inputs to a cluster (for use by the registers), $M_{clk}$.

The work of [Betz99, Betz98b] focused on logic clusters in which the LUT size, $K$, is 4 and the number of clock pins on a cluster, $M_{clk}$, is 1; this is the case shown in Figure 2.2. The total number of BLE inputs is $K \cdot N$; however, only $I$ inputs are brought into the cluster. [Betz98b] showed that a good rule of thumb¹ is to design logic clusters with $I = 2N + 2$. It also showed that FPGAs composed of logic clusters of size 1-10 BLEs (with the exception of size 2) have the best area efficiency. That research did not consider the effect of cluster size on circuit speed; however, it speculated that larger cluster sizes would have a positive impact on FPGA performance.

¹ This rule of thumb applies to the case when the LUT size, $K$, is 4. An interesting direction for future research would be to study the interactions between LUT size, $K$, the number of inputs to a cluster, $I$, and the number of BLEs in a cluster, $N$, and determine the best combination of these parameters.
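As a small worked illustration of the cluster parameters above, the following sketch tabulates the total number of BLE inputs, K·N, against the number of cluster inputs suggested by the I = 2N + 2 rule of thumb. The function name `cluster_pin_counts` is hypothetical and not part of any tool described here; the rule is only stated for K = 4, so K is fixed in the sketch.

```python
LUT_SIZE = 4  # K; the I = 2N + 2 rule of thumb is only stated for K = 4


def cluster_pin_counts(n_bles):
    """Return (total BLE inputs K*N, cluster inputs I = 2N + 2) for N BLEs.

    Hypothetical helper for illustration only.
    """
    total_ble_inputs = LUT_SIZE * n_bles   # every LUT input inside the cluster
    cluster_inputs = 2 * n_bles + 2        # inputs actually brought into the cluster
    return total_ble_inputs, cluster_inputs


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        total, inputs = cluster_pin_counts(n)
        print(f"N={n}: K*N={total}, I={inputs}")
```

For N = 8 this gives 32 BLE inputs served by 18 cluster inputs, which shows how a fully connected cluster lets shared nets and locally generated signals reduce the number of pins needed.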

CAD for FPGAs

Figure 2.3 illustrates the CAD flow that is used to evaluate FPGA architectures and CAD algorithms. This CAD flow mirrors the actual CAD flow employed by FPGA and ASIC designers. Each circuit we use is logic-optimized by SIS [Sent92] and then technology-mapped into 4-LUTs by FlowMap [Cong94]. VPack [Betz98b] is then used to group the LUTs and registers into logic clusters¹ of the desired size. Next, we use VPR [Betz98b, Betz99] to place (determine the x, y position of each cluster in the FPGA) and route (connect the wires of) each circuit. VPR's timing-driven router extracts the Elmore delay [Elmo48] of each routed net, and performs a path-based timing analysis to determine the delay of the circuit's critical path. Finally, VPR uses a transistor-based area model [Betz98b, Betz99] to estimate the total layout area required by this FPGA to implement each circuit.

[FIGURE 2.3 CAD flow: Circuit, Logic Optimization, Technology Map to 4-LUTs, Pack BLEs into Logic Clusters (for a given cluster size N), Placement, Routing, Timing and Area Results.]

¹ Following the convention of [Betz98b], our CAD flow shows packing and placement as two separate steps. After packing, we treat a logic cluster as an indivisible unit which is then placed. This division is not always necessary (depending on the CAD flow used), but we impose it in order to simplify the CAD tools. Another approach would be to eliminate packing, and allow the placement algorithm to move LUTs and registers freely between different clusters. This approach to placement would considerably increase the computational complexity of the placement algorithm, but would likely produce better results.

In this section we first describe how timing analysis is used to evaluate a circuit's speed, and how it guides timing-driven algorithms. Then we discuss two packing algorithms, VPack and RASP. After this we discuss placement, give an overview of Simulated Annealing and VPR's placement tool, and discuss several timing-driven placement approaches.

Timing Analysis

Timing analysis [Hitc83] has two main purposes:

1. To determine the final maximum speed that a circuit implementation can achieve.
2. To determine the delay of all the paths and connections in a circuit during placement and routing, and use these as a guide to reduce the total circuit delay.

To perform a timing analysis, we must first represent the circuit as a directed graph. Nodes in the graph represent input and output pins of circuit elements such as LUTs, registers, and I/O pads. Connections¹ between these nodes are modeled with edges in the graph. These edges are annotated with a delay corresponding to the physical delay between the nodes. To determine the delay of the circuit, a breadth-first traversal is performed on the graph starting at the sources (input pads and register outputs). We then compute the arrival time, $T_{arrival}$, at all nodes in the circuit with the following equation:

$T_{arrival}(i) = \max_{j \in fanin(i)} \{ T_{arrival}(j) + delay(j, i) \}$    (2.1)

where node $i$ is the node whose arrival time is being computed, and $delay(j, i)$ is the delay value of the edge joining node $j$ to node $i$. The delay of the circuit is then the maximum arrival time, $D_{max}$, of all nodes in the circuit.

¹ In a graph representation of the circuit we define a connection to be an edge between a net driver and any of its terminals.
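The arrival-time computation of Equation 2.1 can be sketched as follows. This is a minimal illustration rather than the timing analyzer used in VPR: it assumes the circuit graph is acyclic, treats every node without fan-in as a source with arrival time 0, and uses hypothetical node names instead of individual pins. The function name `arrival_times` is likewise an assumption.

```python
from collections import defaultdict, deque


def arrival_times(edges):
    """Compute T_arrival for every node and the circuit delay D_max.

    edges maps (j, i) -> delay(j, i) for each connection from node j to node i.
    Nodes with no fan-in are treated as sources with T_arrival = 0 (assumption).
    """
    fanout = defaultdict(list)
    pending_fanin = defaultdict(int)
    nodes = set()
    for (j, i), delay in edges.items():
        fanout[j].append((i, delay))
        pending_fanin[i] += 1
        nodes.update((j, i))

    t_arrival = {node: 0.0 for node in nodes}
    ready = deque(node for node in nodes if pending_fanin[node] == 0)
    while ready:
        j = ready.popleft()
        for i, delay in fanout[j]:
            # Equation 2.1: keep the maximum over all fan-in connections.
            t_arrival[i] = max(t_arrival[i], t_arrival[j] + delay)
            pending_fanin[i] -= 1
            if pending_fanin[i] == 0:   # all fan-in arrival times are final
                ready.append(i)
    return t_arrival, max(t_arrival.values())


if __name__ == "__main__":
    example = {
        ("in1", "lut_a"): 1.0,
        ("in2", "lut_a"): 2.0,
        ("lut_a", "lut_b"): 1.5,
        ("in2", "lut_b"): 0.5,
        ("lut_b", "out"): 1.0,
    }
    times, d_max = arrival_times(example)
    print(times["lut_b"], d_max)
```

A node is only expanded once all of its fan-in connections have been processed, so each arrival time is final when it is propagated onward; this is the level-by-level ordering that the breadth-first traversal relies on.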

To guide a placement or routing algorithm, it is useful to know how much delay may be added to a connection before the path that the connection is on becomes critical. The amount of delay that may be added to a connection before it becomes critical is called the slack [Hitc83] of that connection. To compute the slack of a connection, we must compute the required arrival time, $T_{required}$, at every node in the circuit. We first set $T_{required}$ at all sinks (output pads and register inputs) to be $D_{max}$. Required arrival time is then propagated backwards starting from the sinks with the following equation:

$T_{required}(i) = \min_{j \in fanout(i)} \{ T_{required}(j) - delay(i, j) \}$    (2.2)

Finally, the slack of a connection $(i, j)$, from node $i$ to the node $j$ that it drives, is defined as:

$Slack(i, j) = T_{required}(j) - T_{arrival}(i) - delay(i, j)$    (2.3)

Packing Algorithms for Cluster-Based FPGAs

A packing algorithm takes a netlist consisting of LUTs and registers and produces a netlist consisting of logic clusters. This involves combining the LUTs and registers into BLEs, and then grouping the BLEs into logic clusters (Figure 2.4). There are two main constraints that packing algorithms must meet:

1. The number of BLEs in a cluster must be less than or equal to the cluster size, $N$.
2. The number of distinct inputs generated outside the cluster and used as inputs to BLEs within the cluster must be less than or equal to the number of cluster inputs, $I$.
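The two packing constraints above can be checked with a few lines of code. The sketch below is an illustration under simplifying assumptions: each BLE is modeled as a pair of (input nets, output net), clock inputs are ignored, and the name `is_legal_cluster` is hypothetical.

```python
def is_legal_cluster(bles, n_max, i_max):
    """Check the two packing constraints for a candidate cluster.

    bles  : list of (input_nets, output_net) pairs, one per BLE (assumed model)
    n_max : cluster size N
    i_max : number of cluster inputs I
    """
    # Constraint 1: at most N BLEs in the cluster.
    if len(bles) > n_max:
        return False
    # Constraint 2: count distinct nets that are used inside the cluster but
    # generated outside it; nets driven by a BLE in the cluster are local.
    produced_inside = {output for _, output in bles}
    external_inputs = set()
    for inputs, _ in bles:
        external_inputs.update(net for net in inputs if net not in produced_inside)
    return len(external_inputs) <= i_max


if __name__ == "__main__":
    candidate = [(("a", "b", "c"), "x"), (("x", "c", "d"), "y")]
    print(is_legal_cluster(candidate, n_max=2, i_max=4))  # a, b, c, d are external
    print(is_legal_cluster(candidate, n_max=2, i_max=3))
```

In the example, net x is generated inside the cluster and net c is shared by both BLEs, so six BLE input pins need only four cluster inputs.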

[FIGURE 2.4 Packing example: a netlist of BLEs (A through H) is packed into a netlist of clusters.]

Altera has an in-house tool [Alte95] that targets cluster-based logic blocks, and Xilinx has an in-house tool targeting the cluster-like logic blocks of the 5200 [Xili97] and Virtex [Xili98] FPGAs; however, to our knowledge, these tools have not been made publicly available. In this section we discuss two publicly available packing algorithms, VPack [Betz98b] and RASP [Cong96].

The VPack Logic Cluster Packing Tool

VPack [Betz98b, Betz99] takes a netlist of LUTs and registers, and produces a netlist of logic clusters. All parameters relating to the logic clustering ($N$, $I$, $K$, $M_{clk}$) are specified at run-time. VPack first groups LUTs and registers into BLEs, and then packs the BLEs into logic clusters. The pseudo-code for the VPack algorithm is given in Figure 2.5 [Betz98b, Betz99].

The VPack algorithm has two optimization goals. The first is to pack each logic cluster to its capacity to minimize the number of clusters needed. The second goal is to minimize the number of inputs to each cluster in order to reduce the number of connections required between clusters.

    Let: UnclusteredBLEs be the set of BLEs not contained in any cluster
         C be the set of BLEs contained in the current cluster
         LogicClusters be the set of clusters (where each cluster is a set of BLEs)

    UnclusteredBLEs = PatternMatchToBLEs (LUTs, Registers);
    LogicClusters = NULL;
    while (UnclusteredBLEs != NULL) {                 /* More BLEs to cluster */
        C = GetBLEwithMostUsedInputs (UnclusteredBLEs);
        while (|C| < N) {                             /* Cluster is not full */
            BestBLE = MaxAttractionLegalBLE (C, UnclusteredBLEs);
            if (BestBLE == NULL)                      /* No BLE can be added to cluster */
                break;
            UnclusteredBLEs = UnclusteredBLEs - BestBLE;
            C = C ∪ BestBLE;
        }
        LogicClusters = LogicClusters ∪ C;
    }

    FIGURE 2.5 Pseudo-code for VPack [Betz98b, Betz99]

VPack uses a greedy algorithm to construct each cluster sequentially. At the start of each clustering operation, VPack selects as a seed the unclustered BLE with the most used inputs, and places this seed into a cluster $C$. VPack then selects a new BLE, $B$, to pack into $C$ based on the attraction that $B$ has to $C$. Attraction is determined by the number of inputs and outputs that $B$ and $C$ have in common:

$Attraction(B) = | Nets(B) \cap Nets(C) |$    (2.4)

BLEs are added to the current cluster until it cannot fit any more, at which point packing begins on a new cluster. The process terminates when there are no more unclustered BLEs left.
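The greedy loop of Figure 2.5 and the attraction function of Equation 2.4 can be sketched in a few dozen lines. This is an illustrative sketch, not the VPack source: it assumes LUTs and registers have already been matched into BLEs, models each BLE as a pair of (input nets, output net), ignores clock inputs, assumes every single BLE fits the input limit on its own, and uses hypothetical function names.

```python
def nets_of(ble):
    """All nets attached to a BLE: its used inputs plus its output."""
    inputs, output = ble
    return set(inputs) | {output}


def external_inputs(cluster):
    """Distinct nets used inside the cluster but generated outside it."""
    produced = {output for _, output in cluster}
    return {net for inputs, _ in cluster for net in inputs if net not in produced}


def pack(bles, n_max, i_max):
    """Greedy packing in the style of Figure 2.5 (sketch under stated assumptions)."""
    unclustered = list(bles)
    clusters = []
    while unclustered:                                    # more BLEs to cluster
        seed = max(unclustered, key=lambda b: len(b[0]))  # most used inputs
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < n_max:                       # cluster is not full
            cluster_nets = set().union(*(nets_of(b) for b in cluster))
            best, best_attraction = None, -1
            for candidate in unclustered:
                if len(external_inputs(cluster + [candidate])) > i_max:
                    continue                              # would exceed I inputs
                attraction = len(nets_of(candidate) & cluster_nets)  # Eq. 2.4
                if attraction > best_attraction:
                    best, best_attraction = candidate, attraction
            if best is None:                              # no legal BLE can be added
                break
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    netlist = [
        (("a", "b", "c", "d"), "w"),
        (("w", "e"), "x"),
        (("f", "g"), "y"),
        (("y", "x"), "z"),
    ]
    for cluster in pack(netlist, n_max=2, i_max=5):
        print([output for _, output in cluster])
```

With these assumptions the example packs the BLEs driving w and x into one cluster and those driving y and z into another, absorbing nets w and y inside their clusters.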

The time complexity of this algorithm is $O(k_{max} \cdot K \cdot n)$, which is a result of the fact that when each BLE is clustered ($n$ BLEs) we must examine all of the nets attached to the BLE ($K$ nets), and we must examine all BLEs that each net fans out to (maximum fanout = $k_{max}$). This results in an execution time of about four seconds to pack the largest MCNC¹ circuit (clma) [Yang91] on a 296 MHz UltraSPARC-II processor.

¹ We give a brief overview of the 20 largest MCNC circuits in Appendix A.

RASP

In [Cong96] the RASP logic block packing tool is described. This tool is capable of mapping circuits represented as a network of LUTs into several different types of logic blocks. The algorithm uses a closeness cost function to weigh the desirability of mapping LUTs into the same logic block. This closeness cost function can be set up to prefer to minimize delay or area, or to maximize routability. The closeness of two LUTs is marked on an edge in a compatibility graph if it is allowable to pack the two LUTs into one logic block. If the LUTs cannot be packed together (i.e. they violate some hard constraint such as the number of inputs or BLEs allowed) then no edge is put into the compatibility graph. The packing step selects LUTs to pack together by performing a maximum weighted matching on the compatibility graph. The complexity of this algorithm is $O(nm)$, where $n$ is the number of LUTs and $m$ is the number of edges in the compatibility graph. With the logic blocks used in our research, the number of edges, $m$, in the compatibility graph is $O(n^2)$, which leads to an algorithm complexity of $O(n^3)$.

Placement

Placement is the process by which a netlist of circuit blocks (I/Os or logic clusters) is mapped into physical locations in an FPGA. The locations that blocks are mapped to can significantly affect the performance of the FPGA. There are three main goals that placement algorithms may attempt to satisfy:

1. To minimize the amount of wiring required, which we refer to as wirelength-driven placement.
2. To balance the wiring density across the FPGA, called routability-driven placement.
3. To minimize the delay of the critical path(s), called timing-driven placement.

Placement algorithms may simultaneously satisfy one or more of these goals.

In the remainder of this section we review the Simulated Annealing algorithm that is commonly applied to placement problems. Then we discuss the Simulated Annealing-based placer built into VPR [Betz98b, Betz99], which we call VPlace. After this we review various timing-driven placement approaches.

Simulated Annealing

The Simulated Annealing placement algorithm mimics the annealing process used to gradually cool molten metal to produce high-quality metal structures [Kirk83]. A Simulated Annealing-based placer initially places logic clusters and I/Os (circuit blocks) randomly into physical locations in an FPGA. Then the placement is iteratively improved by randomly swapping blocks and evaluating the goodness of each swap with a cost function. If the move would result in a reduction in the placement cost, then the move is accepted. If the move would cause an increase in the placement cost, then the move may still be accepted even though it makes the placement worse. The purpose of accepting some bad moves is to prevent the Simulated Annealing-based placer from becoming trapped in a local minimum. The probability of accepting a bad move is given by $e^{-\Delta C / T}$, where $\Delta C$ is the positive change in the cost function that acceptance of the move would result in, and $T$ is a parameter called temperature that controls the likelihood of accepting each move. Initially, a Simulated Annealing-based placer starts at a high temperature, so that almost all moves are accepted; then the temperature is gradually reduced so that the probability of accepting moves that make the placement worse becomes very low. In the final stages of placement only moves that decrease the placement cost are accepted.
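The acceptance rule described above is small enough to state directly in code. The sketch below assumes a strictly positive temperature; the function name `accept_move` and the optional `rng` parameter are hypothetical conveniences for illustration.

```python
import math
import random


def accept_move(delta_cost, temperature, rng=random):
    """Decide whether to accept a move with cost change delta_cost.

    Improving moves are always accepted; worsening moves are accepted with
    probability exp(-delta_cost / temperature). Assumes temperature > 0.
    """
    if delta_cost < 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


if __name__ == "__main__":
    # The same uphill move is likely to be accepted when hot, unlikely when cold.
    for temperature in (10.0, 1.0, 0.1):
        probability = math.exp(-1.0 / temperature)
        print(f"T={temperature}: P(accept delta_cost=1) = {probability:.5f}")
```

For a cost increase of 1, the acceptance probability falls from roughly 0.9 at T = 10 to below 0.0001 at T = 0.1, which is the behavior the cooling schedule exploits.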

    S = RandomPlacement ();
    T = InitialTemperature ();
    R_limit = InitialR_limit ();
    while (ExitCriterion () == False) {              /* Outer loop */
        while (InnerLoopCriterion () == False) {     /* Inner loop */
            S_new = GenerateViaMove (S, R_limit);
            ΔC = Cost (S_new) - Cost (S);
            if (ΔC < 0) {
                S = S_new;                           /* Move is good, accept */
            } else {
                r = random (0, 1);
                if (r < e^(-ΔC/T)) {
                    S = S_new;                       /* Move is bad, accept anyway */
                }
            }
        }                                            /* End inner loop */
        T = UpdateTemp ();
        R_limit = UpdateR_limit ();
    }                                                /* End outer loop */

    FIGURE 2.6 Pseudo-code of a generic Simulated Annealing-based placer [Betz98b, Betz99]

In the final (low temperature) stages of the placement, if all blocks in the FPGA are considered for swapping, most swaps will be rejected because they result in large positive changes in the cost function. To increase the number of accepted moves at low temperatures, only blocks that are close together should be considered for swapping, since local swaps tend to result in relatively small changes in the placement cost. Accordingly, a Simulated Annealing-based placer uses a parameter called $R_{limit}$ ("range limiter") that controls how close together circuit blocks must be to be considered for swapping. Initially, $R_{limit}$ spans the entire FPGA, which means that blocks on opposite sides of the FPGA may be considered for swapping. As the placement proceeds, $R_{limit}$ is decreased, so that in the final stages of placement only blocks that are close together are considered for swapping. In Figure 2.6 we show the pseudo-code for a generic Simulated Annealing-based placer, as presented in [Betz98b, Betz99].
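To make the structure of Figure 2.6 concrete, here is a toy placer that follows the same outer and inner loops with a shrinking range limiter. It is a sketch only: the geometric cooling schedule, the fixed inner-loop length, the exit temperature, and the simple bounding-box half-perimeter cost are all assumptions chosen for brevity, not the schedules or cost function used by VPR, and the names `hpwl` and `anneal` are hypothetical.

```python
import math
import random


def hpwl(placement, nets):
    """Sum of bounding-box half-perimeters over all nets (assumed toy cost)."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(blocks, nets, grid, seed=0):
    """Toy annealer on a grid x grid array; needs len(blocks) <= grid * grid."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    placement = dict(zip(blocks, sites))                 # S = RandomPlacement()
    occupant = {site: block for block, site in placement.items()}
    cost = hpwl(placement, nets)
    temperature, r_limit = 10.0, float(grid)             # assumed initial values
    while temperature > 0.005:                           # assumed exit criterion
        for _ in range(20 * len(blocks)):                # assumed inner-loop length
            a = rng.choice(blocks)
            ax, ay = placement[a]
            reach = max(1, int(r_limit))
            tx = min(grid - 1, max(0, ax + rng.randint(-reach, reach)))
            ty = min(grid - 1, max(0, ay + rng.randint(-reach, reach)))
            if (tx, ty) == (ax, ay):
                continue
            b = occupant.get((tx, ty))                   # None if the site is empty
            placement[a] = (tx, ty)                      # tentatively apply the swap
            if b is not None:
                placement[b] = (ax, ay)
            new_cost = hpwl(placement, nets)
            delta = new_cost - cost
            if delta < 0 or rng.random() < math.exp(-delta / temperature):
                occupant[(tx, ty)] = a                   # accept the move
                if b is not None:
                    occupant[(ax, ay)] = b
                else:
                    del occupant[(ax, ay)]
                cost = new_cost
            else:
                placement[a] = (ax, ay)                  # reject: undo the swap
                if b is not None:
                    placement[b] = (tx, ty)
        temperature *= 0.9                               # T = UpdateTemp()
        r_limit = max(1.0, r_limit * 0.9)                # R_limit = UpdateR_limit()
    return placement, cost


if __name__ == "__main__":
    blocks = ["a", "b", "c", "d", "e", "f"]
    nets = [("a", "b"), ("b", "c", "d"), ("d", "e"), ("e", "f", "a")]
    final_placement, final_cost = anneal(blocks, nets, grid=4)
    print(final_cost, final_placement)
```

Recomputing the full cost after every swap keeps the sketch short; a practical placer would update only the nets attached to the moved blocks.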

The VPR Placement Tool (VPlace)

In this document we refer to the placement algorithm used within VPR as VPlace. VPlace is a Simulated Annealing-based placement algorithm that attempts to minimize the wirelength of the resulting circuit by placing circuit blocks that are on the same net close together. To accomplish this, VPlace uses a bounding-box-based linear congestion cost function [Betz98b, Betz99] to estimate wirelength requirements. The VPlace algorithm follows the format of the pseudo-code shown in Figure 2.6. The linear congestion cost function has the following functional form [Betz98b, Betz99]:

$$\text{Cost}_{\text{linear congestion}} = \sum_{i=1}^{N_{nets}} q(i)\,[\,bb_x(i) + bb_y(i)\,] \qquad (2.5)$$

where there are $N_{nets}$ nets in the circuit. The cost of each net, $i$, is determined by its horizontal span, $bb_x(i)$, and its vertical span, $bb_y(i)$. The $q(i)$ factor compensates for the fact that the bounding-box wirelength model underestimates the wiring necessary to connect nets with more than three terminals. The values used for $q(i)$ were obtained from [Chen94], so that $q(i)$ is set to 1 for nets with 3 or fewer terminals, and it slowly increases to 2.79 for nets with 50 terminals. Beyond 50 terminals, the $q(i)$ function increases linearly at the rate of

$$q(i) = 2.79 + 0.02616\,(\text{Num\_Terminals} - 50). \qquad (2.6)$$

The complexity of this algorithm is $O(n^{4/3})$, where $n$ is the number of blocks in the circuit.

Timing-Driven Placement

Placement algorithms that attempt to minimize the critical path delay of the resulting circuits are called timing-driven. There are different approaches to minimizing critical path delay in timing-driven placement algorithms. One approach, which we call path-based timing-driven placement, computes path delays at every stage of the placement and uses these delays in its cost function. This path-based approach is computationally expensive, since path delays must be continuously re-computed.
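As an illustration of the linear congestion cost of Equation 2.5 described above, the following sketch evaluates it for nets given as lists of terminal coordinates. Several things here are assumptions: the function names, the break point of 50 terminals and the slope of 0.02616 used beyond it, and in particular the straight-line interpolation between 3 and 50 terminals, which merely stands in for the tabulated values of [Chen94] that are not reproduced in this text.

```python
def q_factor(num_terminals: int) -> float:
    """Compensation factor q(i) for a net with the given number of terminals."""
    if num_terminals <= 3:
        return 1.0
    if num_terminals <= 50:
        # ASSUMPTION: linear stand-in for the tabulated values, chosen only so
        # that q rises from 1.0 at 3 terminals to 2.79 at 50 terminals.
        return 1.0 + (2.79 - 1.0) * (num_terminals - 3) / (50 - 3)
    # ASSUMPTION: linear growth beyond 50 terminals with slope 0.02616.
    return 2.79 + 0.02616 * (num_terminals - 50)


def linear_congestion_cost(nets) -> float:
    """Sum of q(i) * (bb_x(i) + bb_y(i)) over all nets.

    Each net is a list of (x, y) terminal locations. The span is taken as
    max - min in each dimension, which is an assumption about how bb_x and
    bb_y are measured.
    """
    total = 0.0
    for terminals in nets:
        xs = [x for x, _ in terminals]
        ys = [y for _, y in terminals]
        span = (max(xs) - min(xs)) + (max(ys) - min(ys))
        total += q_factor(len(terminals)) * span
    return total
```

For a two-terminal net with terminals at (0, 0) and (2, 3) the factor is 1, so the net contributes its bounding-box half-perimeter of 5 to the cost.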
Another approach is connection-based timing-driven placement, which involves performing a path-based timing analysis and assigning slacks to each connection in the circuit. During placement, more attention is then paid to connections with low slack, but the more global view of the complete path delay is not used. It is also possible to combine connection-based and path-based timing-driven placement by periodically performing a full path analysis based on the current placement, and then updating the slacks on individual connections. In this section we discuss the existing timing-driven placement algorithms that are most relevant to our work.

TimberWolfSC

The TimberWolfSC timing-driven placement algorithm for row-based standard cell ICs is presented in [Swar95]. This algorithm uses a Simulated Annealing approach to placement. In this algorithm, net delay is computed as

$$\text{Net Delay} = T_{driver} + R_{driver}\,(C_{net} + C_{gates}) \qquad (2.7)$$

where $T_{driver}$ is the intrinsic delay of the driver, $R_{driver}$ is the resistance of the driver, $C_{net}$ is the estimated capacitance of the net, and $C_{gates}$ is the gate input capacitance of all sinks on the net. The arrival time at the sink of a path is the summation of all of the net delays along that path. This formulation of delay assumes that the driver resistance is much larger than the wiring resistance, so that the wiring resistance can be ignored. Ignoring wiring resistance likely makes these net delays optimistic, especially for circuits implemented in deep-submicron processes where wiring resistance and delay are significant.

The cost function used in this algorithm penalizes any path whose arrival time is greater than the required (user-defined) arrival time with the following:

$$\text{Penalty} = T_{arrival} - T_{required} \qquad (2.8)$$

The total timing penalty $P_t$ is the sum of all critical path penalties.
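The net delay model and path penalty described above for TimberWolfSC can be written out directly. This is a sketch under stated assumptions: the function names are hypothetical, units are arbitrary but must be consistent, and clamping the penalty to zero for paths that meet their required time is our reading of "penalizes any path whose arrival time is greater than the required arrival time".

```python
def net_delay(t_driver: float, r_driver: float, c_net: float, c_gates: float) -> float:
    """Lumped delay of one net: intrinsic driver delay plus R_driver * C_total."""
    return t_driver + r_driver * (c_net + c_gates)


def path_penalty(net_delays, t_required: float) -> float:
    """Penalty of one path: arrival time minus required time, never negative.

    The arrival time is the sum of the net delays along the path.
    """
    t_arrival = sum(net_delays)
    return max(0.0, t_arrival - t_required)


def total_timing_penalty(paths) -> float:
    """Sum of the penalties of all paths, given (net_delays, t_required) pairs."""
    return sum(path_penalty(delays, t_req) for delays, t_req in paths)
```

A path consisting of two nets with delays 2.5 and 1.0 and a required time of 3.0 has an arrival time of 3.5 and therefore a penalty of 0.5, while a path that arrives early contributes nothing.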

$$P_t = \sum_{paths} \text{Penalty} \qquad (2.9)$$

The cost function consists of two terms, a wire length term represented by $W$ and the total timing penalty $P_t$, together with a trade-off variable $\lambda$ that trades off between the two terms:

$$\text{Cost} = W + \lambda\,P_t \qquad (2.10)$$

The authors of [Swar95] found that setting

$$\lambda = 3\,\frac{\overline{\Delta W}}{\overline{\Delta P_t}} \qquad (2.11)$$

gave the best results, where $\overline{\Delta W}$ is the average change in wire length and $\overline{\Delta P_t}$ is the average change in the timing penalty, both measured during the first outer loop iteration of a Simulated Annealing algorithm. This implies that changes in the timing penalty are three times as important as changes in the wire length. The authors presented results for three MCNC standard cell circuits for which timing information was previously available. Compared to the previous results they reduced delay by 28% - 50% at an area cost of between 2.5% and 6%. It is not clear from the paper how the previous timing results were obtained. This algorithm is path-based, so its computational complexity is likely quite high, but it is not revealed in the paper.

PROXI

In [Nag95] a performance-driven simultaneous place and route algorithm (PROXI) is presented. After each placement perturbation in the anneal, a small subset of relevant nets (previously unroutable and newly disturbed nets) is ripped up and rerouted with a fast maze router. As the placement evolves, the critical path is evaluated. The cost function used in this algorithm is

$$\text{Cost} = W_r\,R + W_t\,T \qquad (2.12)$$

where $R$ is the number of unrouted nets and $T$ is the critical path delay. $W_r$ and $W_t$ are weights that are determined adaptively at runtime to normalize the components of the cost function, so that each term contributes equally to the cost. This algorithm is unique in that it performs placement and routing simultaneously; most place and route software does placement first, and then routes the placed circuit. Performing placement and routing in one stage should theoretically give better results than a two-stage (place then route) algorithm; however, it is much more computationally expensive. This algorithm achieves 8% - 15% improvement in delay when compared to the Xilinx XACT5.0 place and route system. It has, however, a significant disadvantage in CPU compile time compared to the XACT5.0 tool, ranging from 6 times for the smallest design (12x12 array) to 11 times for the largest design (16x16 array).

2.3 Summary

In this chapter we presented an overview of FPGA architecture, including a description of cluster-based logic blocks [Betz99, Betz98b]. Then we discussed CAD for FPGAs. This included discussions of timing analysis, packing algorithms, and placement.


CHAPTER 3 Timing-Driven and Connection-Driven Packing

In this chapter we first discuss the experimental methodology that we use to evaluate different CAD algorithms and FPGA architectures. Then we introduce two new packing algorithms that are extensions to the VPack [Betz98b, Betz99] algorithm. The first is a timing-driven packing algorithm that we call T-VPack, and the second is a connection-absorption-driven packing algorithm that we call C-VPack. We then compare the results of both of these algorithms to the results of VPack.

3.1 Experimental Methodology

The CAD flow that we use to evaluate different CAD algorithms and FPGA architectures is the same as in [Betz98b, Betz99], and is given in Figure 3.1. First each circuit is logic-optimized by SIS [Sent92] and technology mapped into 4-LUTs by FlowMap [Cong94]. T-VPack (described in Section 3.2) is then used to group the LUTs and registers into logic clusters of the desired size with the desired number of inputs. Then VPR is used to place and route each circuit. The placement algorithm in VPR is Simulated Annealing-based and optimizes the final placement to minimize the required routing area. The router in VPR is fully timing-driven and attempts to minimize the critical path delay (given the current placement). After placement and routing, we know the estimated area and track width required to implement each circuit and the estimated critical path delay, where area and delay values are computed using the area and delay models described in the next chapter.

[Figure 3.1: flowchart. Inputs: circuit, cluster parameters (N, I, K), routing architecture parameters (Fc, etc.). Steps: logic optimization (SIS); technology mapping to 4-LUTs (FlowMap + FlowPack); packing of FFs and LUTs into logic clusters (T-VPack); placement (VPR); routing (VPR, timing-driven router), repeated with adjusted channel capacities (W) until the minimum number of tracks, Wmin, is determined; routing with W = 1.2 Wmin (VPR, timing-driven router); determination of critical path delay and transistor area to build the FPGA (VPR + TransCount).]

FIGURE 3.1 Architecture evaluation CAD flow [Betz98b, Betz99].

Figure 3.1 shows how VPR computes the minimum number of tracks in which a circuit will route, which we refer to as a high-stress routing. VPR repeatedly routes each circuit with different channel widths (number of tracks per channel), scaling the FPGA's architecture until it finds the minimum number of tracks in which the circuit will route. We define a low-stress routing (as does [Swar98a]) to occur when an FPGA has 20% more routing resources than the minimum required to route a given circuit. We feel that low-stress routings are indicative of how an FPGA will generally be used (it is rare that a user will utilize 100% of all routing and logic resources), so many of our delay results are based on low-stress routings. We also present results that are based on an infinite¹ number of routing resources. These infinite routing results tell us the best possible router-achievable speed of a circuit given the current packing and placement of that circuit. We feel that this is a useful indicator of how well a packing or placement algorithm performs with respect to delay.

1. Infinite routing resource results are delay results from the router when it ignores congestion, i.e. the router is allowed to use a single resource for multiple unrelated connections. This allows the router to allocate the fastest possible resource for every connection in the circuit. See [Betz98b, Betz99] for a detailed description of how the router in VPR works.

By allowing the channel width to vary, and searching for the minimum routable width, we can detect small improvements in FPGA architectures or CAD algorithms that might otherwise go unnoticed. Compare this to mapping a circuit into a fixed-size FPGA, which would only tell us whether the circuit fit or not. A binary result like this makes it difficult to draw conclusions about new architectures or CAD algorithms.

3.2 Timing-Driven Packing: T-VPack

Our timing-driven logic block packing algorithm, T-VPack, attempts not only to pack each logic block to capacity and minimize the number of cluster inputs used, but also to minimize the number of inter-cluster (between cluster) connections on the critical path(s). The local routing within clusters is faster than the general-purpose routing between logic clusters, so reducing the number of inter-cluster connections on the critical path(s) reduces circuit delay. The basic operation of the algorithm is the same as that of the VPack algorithm described in Chapter 2, with a few modifications.

We show the pseudo-code for the T-VPack algorithm in Figure 3.2. T-VPack first performs a timing analysis (defined in Section 2.2.1) to determine the critical path(s) of the circuit. Then T-VPack finds a seed BLE by selecting a BLE on the critical path(s) rather than selecting a BLE with the most used inputs. BLEs are then added to the current cluster

based on the attraction they have to the current cluster, where the attraction function is modified to prefer to absorb connections along the critical path(s). After each cluster is full, packing begins on a new cluster.

    Let: UnclusteredBLEs be the set of BLEs not contained in any cluster
         C be the set of BLEs contained in the current cluster
         LogicClusters be the set of clusters (where each cluster is a set of BLEs)

    UnclusteredBLEs = PatternMatchToBLEs (LUTs, Registers);
    LogicClusters = NULL;
    ComputeCriticalities();
    BLEsSinceLastCriticalityRecompute = 0;
    while (UnclusteredBLEs != NULL) {              /* More BLEs to cluster */
        C = GetMostCriticalBLE (UnclusteredBLEs);
        BLEsSinceLastCriticalityRecompute++;
        while (|C| < N) {                          /* Cluster is not full */
            if (BLEsSinceLastCriticalityRecompute >= RecomputeInterval) {
                ComputeCriticalities();
                BLEsSinceLastCriticalityRecompute = 0;
            }
            BestBLE = MaxAttractionLegalBLE (C, UnclusteredBLEs);
            if (BestBLE == NULL)                   /* No BLE can be added to cluster */
                break;
            UnclusteredBLEs = UnclusteredBLEs - BestBLE;
            C = C ∪ BestBLE;
            BLEsSinceLastCriticalityRecompute++;
        }
        LogicClusters = LogicClusters ∪ C;
    }

FIGURE 3.2 Pseudo-code for T-VPack

In this section we first discuss timing analysis and delay modeling within T-VPack. Then we give details of the algorithm implementation. After this we provide an analysis of T-VPack to see the effect of various parameters within T-VPack. Finally, we analyze the complexity of the algorithm.
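The control flow of Figure 3.2 can be sketched as runnable code. This is only an illustration of the loop structure: the attraction used here (criticality plus the number of nets shared with the current cluster) is a hypothetical placeholder and not the T-VPack attraction function, the legality check is a caller-supplied stub that by default accepts every BLE, and `compute_criticalities` is an assumed callback standing in for the timing analysis.

```python
def t_vpack_sketch(ble_nets, compute_criticalities, cluster_size,
                   recompute_interval, is_legal=lambda cluster, ble: True):
    """Greedy seed-and-grow packing loop (hypothetical sketch).

    ble_nets: dict mapping each BLE name to the set of nets it touches.
    compute_criticalities: callable taking the list of clusters formed so far
        and returning a dict of BLE name -> criticality.
    """
    unclustered = set(ble_nets)
    clusters = []
    crit = compute_criticalities(clusters)
    since_recompute = 0
    while unclustered:                               # More BLEs to cluster.
        # Seed: the most critical unclustered BLE (ties broken by name).
        seed = max(sorted(unclustered), key=lambda b: crit[b])
        unclustered.remove(seed)
        cluster = [seed]
        since_recompute += 1
        while len(cluster) < cluster_size:           # Cluster is not full.
            if since_recompute >= recompute_interval:
                crit = compute_criticalities(clusters + [cluster])
                since_recompute = 0
            cluster_nets = set().union(*(ble_nets[b] for b in cluster))
            candidates = [b for b in sorted(unclustered) if is_legal(cluster, b)]
            if not candidates:                       # No BLE can be added.
                break
            # PLACEHOLDER attraction: criticality + number of shared nets.
            best = max(candidates,
                       key=lambda b: crit[b] + len(ble_nets[b] & cluster_nets))
            unclustered.remove(best)
            cluster.append(best)
            since_recompute += 1
        clusters.append(cluster)
    return clusters
```

A usage example with four BLEs and clusters of size two: with criticalities a = 0.9, b = 0.5, c = 0.4, d = 0.1, BLE a is chosen as the first seed, b joins it because it shares net n1 with a, and the remaining BLEs c and d form the second cluster.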

Timing Analysis and Delay Models

To minimize the number of inter-cluster connections on the critical path(s), T-VPack first needs to determine which connections are on the critical path(s). Accordingly, T-VPack performs a timing analysis to determine the slack of each connection between BLEs. The timing analyzer within T-VPack models three types of delay: the delay through a BLE, or LogicDelay; the connection delay between blocks within the same cluster, or IntraClusterConnectionDelay; and the connection delay between blocks that are in different clusters, or InterClusterConnectionDelay. The delay of a connection between two BLEs in different logic clusters is not known until after a circuit has been placed and routed, so T-VPack approximates the delay between clusters as a constant InterClusterConnectionDelay. Note that this leads to some inaccuracy in T-VPack's estimate of where the critical path(s) lies, so that sometimes T-VPack will be attempting to shorten a path that will not be part of the post-place-and-route critical path(s). The performance of T-VPack is not very sensitive to the exact values chosen for these three delay parameters. Throughout this work we set LogicDelay to 0.1, IntraClusterConnectionDelay to 0.1 and InterClusterConnectionDelay to 1.0.

Note that the timing analysis can be performed as often as the user specifies: a timing analysis can be performed after each BLE is clustered or, at the other end of the spectrum, timing analysis may be done once at the beginning of the algorithm execution and never again. The effect of this recompute interval is discussed later in this chapter.

Timing-Driven Packing Description

After a timing analysis is complete, we are able to begin packing. This section describes how we determine which BLE will be selected as a seed for each cluster, and how BLEs to be added to each cluster are selected.
We first define many sub-equations that are used in selecting a cluster seed and in the attraction function. After these preliminaries, we finally present how we select a cluster seed, and our new attraction function.
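Before those definitions, the timing analysis described under Timing Analysis and Delay Models can be sketched as follows. This is a simplified illustration, not the T-VPack analyzer: it assumes a purely combinational, acyclic network of BLEs (registers that start and end paths are ignored), it treats any connection whose endpoints are not both in the same known cluster as an inter-cluster connection, and the function name `connection_slacks` is hypothetical. The three delay constants are assumed to be 0.1, 0.1 and 1.0.

```python
from collections import defaultdict

LOGIC_DELAY = 0.1                       # Delay through a BLE.
INTRA_CLUSTER_CONNECTION_DELAY = 0.1    # Connection inside one cluster.
INTER_CLUSTER_CONNECTION_DELAY = 1.0    # Connection between clusters.


def connection_slacks(edges, cluster_of):
    """Return ({(src, dst): slack}, critical_path_delay) for a BLE network.

    edges: list of (src, dst) connections between BLEs, forming a DAG.
    cluster_of: dict mapping already-clustered BLEs to a cluster id.
    """
    fanin, fanout, nodes = defaultdict(list), defaultdict(list), set()
    for s, d in edges:
        fanout[s].append(d)
        fanin[d].append(s)
        nodes.update((s, d))

    def conn_delay(s, d):
        cs, cd = cluster_of.get(s), cluster_of.get(d)
        same = cs is not None and cs == cd
        return INTRA_CLUSTER_CONNECTION_DELAY if same else INTER_CLUSTER_CONNECTION_DELAY

    # Topological order (Kahn's algorithm).
    indegree = {n: len(fanin[n]) for n in nodes}
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for d in fanout[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    # Forward pass: arrival time at each BLE output.
    arrival = {}
    for n in order:
        latest_input = max((arrival[s] + conn_delay(s, n) for s in fanin[n]), default=0.0)
        arrival[n] = latest_input + LOGIC_DELAY
    critical_delay = max(arrival.values())

    # Backward pass: required time at each BLE output.
    required = {}
    for n in reversed(order):
        required[n] = min((required[d] - LOGIC_DELAY - conn_delay(n, d) for d in fanout[n]),
                          default=critical_delay)

    slacks = {(s, d): required[d] - LOGIC_DELAY - conn_delay(s, d) - arrival[s]
              for s, d in edges}
    return slacks, critical_delay
```

Connections with zero slack lie on a critical path under this model, so absorbing them into a cluster (turning an inter-cluster delay of 1.0 into an intra-cluster delay of 0.1) shortens the estimated critical path.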

SPEED AND AREA TRADE-OFFS IN CLUSTER-BASED FPGA ARCHITECTURES

SPEED AND AREA TRADE-OFFS IN CLUSTER-BASED FPGA ARCHITECTURES SPEED AND AREA TRADE-OFFS IN CLUSTER-BASED FPGA ARCHITECTURES Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose Right Track CAD Corp. #313-72 Spadina Ave. Toronto, ON, Canada M5S 2T9 {arm, vaughn,

More information

Timing-Driven Placement for FPGAs

Timing-Driven Placement for FPGAs Timing-Driven Placement for FPGAs Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose 1 {arm, vaughn, jayar}@rtrack.com Right Track CAD Corp., Dept. of Electrical and Computer Engineering, 720

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs Vaughn Betz Jonathan Rose Alexander Marquardt

More information

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR An Introduction to FPGA Placement Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR

More information

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface.

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface. Placement Introduction A very important step in physical design cycle. A poor placement requires larger area. Also results in performance degradation. It is the process of arranging a set of modules on

More information

Academic Clustering and Placement Tools for Modern Field-Programmable Gate Array Architectures

Academic Clustering and Placement Tools for Modern Field-Programmable Gate Array Architectures Academic Clustering and Placement Tools for Modern Field-Programmable Gate Array Architectures by Daniele G Paladino A thesis submitted in conformity with the requirements for the degree of Master of Applied

More information

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA A Path Based Algorithm for Timing Driven Logic Replication in FPGA By Giancarlo Beraudo B.S., Politecnico di Torino, Torino, 2001 THESIS Submitted as partial fulfillment of the requirements for the degree

More information

Niyati Shah Department of ECE University of Toronto

Niyati Shah Department of ECE University of Toronto Niyati Shah Department of ECE University of Toronto shahniya@eecg.utoronto.ca Jonathan Rose Department of ECE University of Toronto jayar@eecg.utoronto.ca 1 Involves connecting output pins of logic blocks

More information

Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs. Yingxuan Liu

Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs. Yingxuan Liu Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs by Yingxuan Liu A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department

More information

Using Sparse Crossbars within LUT Clusters

Using Sparse Crossbars within LUT Clusters Using Sparse Crossbars within LUT Clusters Guy Lemieux Dept. of Electrical and Computer Engineering University of Toronto Toronto, Ontario, Canada M5S 3G4 lemieux@eecg.toronto.edu David Lewis Dept. of

More information

Reducing Power in an FPGA via Computer-Aided Design

Reducing Power in an FPGA via Computer-Aided Design Reducing Power in an FPGA via Computer-Aided Design Steve Wilton University of British Columbia Power Reduction via CAD How to reduce power dissipation in an FPGA: - Create power-aware CAD tools - Create

More information

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric. SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, 2007 1 A Low-Power Field-Programmable Gate Array Routing Fabric Mingjie Lin Abbas El Gamal Abstract This paper describes a new FPGA

More information

EN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5)

EN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5) EN2911X: Lecture 13: Design Flow: Physical Synthesis (5) Prof. Sherief Reda Division of Engineering, rown University http://scale.engin.brown.edu Fall 09 Summary of the last few lectures System Specification

More information

Placement Algorithm for FPGA Circuits

Placement Algorithm for FPGA Circuits Placement Algorithm for FPGA Circuits ZOLTAN BARUCH, OCTAVIAN CREŢ, KALMAN PUSZTAI Computer Science Department, Technical University of Cluj-Napoca, 26, Bariţiu St., 3400 Cluj-Napoca, Romania {Zoltan.Baruch,

More information

On pin-to-wire routing in FPGAs. Niyati Shah

On pin-to-wire routing in FPGAs. Niyati Shah On pin-to-wire routing in FPGAs by Niyati Shah A thesis submitted in conformity with the requirements for the degree of Master of Applied Science and Engineering Graduate Department of Electrical & Computer

More information

Chapter 5: ASICs Vs. PLDs

Chapter 5: ASICs Vs. PLDs Chapter 5: ASICs Vs. PLDs 5.1 Introduction A general definition of the term Application Specific Integrated Circuit (ASIC) is virtually every type of chip that is designed to perform a dedicated task.

More information

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Deshanand P. Singh Altera Corporation dsingh@altera.com Terry P. Borer Altera Corporation tborer@altera.com

More information

SYNTHETIC CIRCUIT GENERATION USING CLUSTERING AND ITERATION

SYNTHETIC CIRCUIT GENERATION USING CLUSTERING AND ITERATION SYNTHETIC CIRCUIT GENERATION USING CLUSTERING AND ITERATION Paul D. Kundarewich and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, ON, M5S G4, Canada {kundarew,

More information

CAD Algorithms. Placement and Floorplanning

CAD Algorithms. Placement and Floorplanning CAD Algorithms Placement Mohammad Tehranipoor ECE Department 4 November 2008 1 Placement and Floorplanning Layout maps the structural representation of circuit into a physical representation Physical representation:

More information

OPTIMIZATION OF TRANSISTOR-LEVEL FLOORPLANS FOR FIELD-PROGRAMMABLE GATE ARRAYS. Ryan Fung. Supervisor: Jonathan Rose. April 2002

OPTIMIZATION OF TRANSISTOR-LEVEL FLOORPLANS FOR FIELD-PROGRAMMABLE GATE ARRAYS. Ryan Fung. Supervisor: Jonathan Rose. April 2002 OPTIMIZATION OF TRANSISTOR-LEVEL FLOORPLANS FOR FIELD-PROGRAMMABLE GATE ARRAYS by Ryan Fung Supervisor: Jonathan Rose April 2002 OPTIMIZATION OF TRANSISTOR-LEVEL FLOORPLANS FOR FIELD-PROGRAMMABLE GATE

More information

CAD Flow for FPGAs Introduction

CAD Flow for FPGAs Introduction CAD Flow for FPGAs Introduction What is EDA? o EDA Electronic Design Automation or (CAD) o Methodologies, algorithms and tools, which assist and automatethe design, verification, and testing of electronic

More information

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University Abbas El Gamal Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program Stanford University Chip stacking Vertical interconnect density < 20/mm Wafer Stacking

More information

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.

More information

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

MODULAR PARTITIONING FOR INCREMENTAL COMPILATION

MODULAR PARTITIONING FOR INCREMENTAL COMPILATION MODULAR PARTITIONING FOR INCREMENTAL COMPILATION Mehrdad Eslami Dehkordi, Stephen D. Brown Dept. of Electrical and Computer Engineering University of Toronto, Toronto, Canada email: {eslami,brown}@eecg.utoronto.ca

More information

Exploring Logic Block Granularity for Regular Fabrics

Exploring Logic Block Granularity for Regular Fabrics 1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu

More information

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada, V6T

More information

Advanced FPGA Design Methodologies with Xilinx Vivado

Advanced FPGA Design Methodologies with Xilinx Vivado Advanced FPGA Design Methodologies with Xilinx Vivado Alexander Jäger Computer Architecture Group Heidelberg University, Germany Abstract With shrinking feature sizes in the ASIC manufacturing technology,

More information

Memory Footprint Reduction for FPGA Routing Algorithms

Memory Footprint Reduction for FPGA Routing Algorithms Memory Footprint Reduction for FPGA Routing Algorithms Scott Y.L. Chin, and Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, B.C., Canada email:

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

TROUTE: A Reconfigurability-aware FPGA Router

TROUTE: A Reconfigurability-aware FPGA Router TROUTE: A Reconfigurability-aware FPGA Router Karel Bruneel and Dirk Stroobandt Hardware and Embedded Systems Group, ELIS Dept., Ghent University, Sint-Pietersnieuwstraat 4, B-9000 Gent, Belgium {karel.bruneel;dirk.stroobandt}@ugent.be

More information

L14 - Placement and Routing

L14 - Placement and Routing L14 - Placement and Routing Ajay Joshi Massachusetts Institute of Technology RTL design flow HDL RTL Synthesis manual design Library/ module generators netlist Logic optimization a b 0 1 s d clk q netlist

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

ON THE INTERACTION BETWEEN POWER-AWARE FPGA CAD ALGORITHMS

ON THE INTERACTION BETWEEN POWER-AWARE FPGA CAD ALGORITHMS ON THE INTERACTION BETWEEN POWER-AWARE FPGA CAD ALGORITHMS ABSTRACT As Field-Programmable Gate Array (FPGA) power consumption continues to increase, lower power FPGA circuitry, architectures, and Computer-Aided

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents FPGA Technology Programmable logic Cell (PLC) Mux-based cells Look up table PLA

More information

Architecture Evaluation for

Architecture Evaluation for Architecture Evaluation for Power-efficient FPGAs Fei Li*, Deming Chen +, Lei He*, Jason Cong + * EE Department, UCLA + CS Department, UCLA Partially supported by NSF and SRC Outline Introduction Evaluation

More information

Congestion-Driven Regional Re-clustering for Low-Cost FPGAs

Congestion-Driven Regional Re-clustering for Low-Cost FPGAs Congestion-Driven Regional Re-clustering for Low-Cost FPGAs Darius Chiu, Guy G.F. Lemieux, Steve Wilton Electrical and Computer Engineering, University of British Columbia British Columbia, Canada dariusc@ece.ubc.ca

More information
