Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors

Size: px
Start display at page:

Download "Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors"

Transcription

1 RPack: Rability-Driven packing for cluster-based FPGAs E. Bozorgzadeh S. Ogrenci-Memik M. Sarrafzadeh Computer Science Department Department ofece Computer Science Department UCLA Northwestern University UCLA Los Angeles, CA Evanston, IL Los Angeles, CA Abstract Ring tools consume a signicant portion of the total design time. Considering rability at earlier steps of the CAD ow would both yield better quality and faster design process. In this paper we are presenting a rability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating rability metrics into a cost function. The objective is to minimize this rability cost function. Our cost function is consistently able to indicate improved rability. Our method yields up to 50 % improvement over existing clustering methods in terms of the number of ring tracks required. The average improvement obtained is 16.5 %. Reduction in number of tracks yields reduced ring area. I. Introduction CAD ow for FPGAs contains four major steps which are logic optimization, technology mapping, placement, and ring. Among those, placement and ring are the most time consuming tasks. Due to highly constrained and discrete interconnect structure of current FPGAs, ring is a challenging problem. Most of the time current FPGA rers can not use available ring resources eciently. This leads to a large portion of the ring area to be wasted. Also, depending on the complexity of the particular design ring might require a fairly large amount of time, often several hours to be completed. Hence considering rability at earlier steps of the CAD ow would both yield a better quality result and less design time. FPGAs consist of smaller congurable building blocks called logic blocks or congurable logic blocks (CLBs), which are placed on the FPGA chip either on a two-dimensional array orinrows. There are two kinds of such logic blocks, LUT based blocks and multiplexor based blocks. LUT based logic blocks are more popular. One LUT based logic block may contain multiple LUTs depending on the type of the FPGA. FPGAs with logic blocks containing multiple LUTs are called cluster-based FPGAs. FPGA vendors have dierent logicblock congurations. The structure and granularity of the logic block has a signicant impact on the area-eciency of the FPGA. If the logic block is ne-grained, e.g. contains a single LUT, the circuit to be implemented will be distributed over more number of logic blocks. This has a negative impact on rability, since more blocks need to be interconnected. If several LUTs are clustered into one logic block, signal sharing among LUTs can be exploited. Since the interconnect inside the logic blocks are hardwired local interconnect can be made very fast and eciently. This improves rability and decreases the load on the rer signicantly. On the other hand, it is not feasible to increase the complexity of the logic blocks beyond a certain limit. If the logic blocks become too complex it becomes dicult to fully utilize these blocks leading to waste of logic, hence chip area. As a result there is a trade-o between the complexity of the logic blocks and the area-eciency. Many commercial FPGAs, such as Xilinx, Altera, and Actel FPGAs all include logic blocks that contain several LUTs. A collection of LUTs that are grouped together to be placed in one complex logic block is called a cluster. At the technology mapping stage logic gates are assigned to LUTs. In the following phase which only applies to cluster-based FPGAs several LUTs are collected in one cluster. The task of assigning LUTs to clusters is called packing. If during packing special properties of the interconnect, such as sharing among the pins can be exploited, signicant gains can be obtained in terms of rability. Rability problem is dicult to deal with at early stages of CAD ow, since there are no accurate means available to estimate the interconnect at this level. This problem has not been considered very extensively at the packing stage before. Clustering phase being at a later stage than technology mapping, can bring improvements on the rability, since after technology mapping a more accurate estimation on the interconnect is available. In this paper we are proposing improvements upon the state-of-theart logic packing algorithm called VPack: Logic Block Packing Algorithm [2]. We are introducing a new method to consider rability at the packing stage. We are presenting the eect of our method on the rability bysynthesizing the benchmark circuits through the complete CAD ow. We have technology mapped a given circuit, then applied our rability-driven packing method for clustering, and nally placed and red the circuit. We are presenting the results of the nal ring and show that our method improves the rability signicantly. This paper is organized in several sections. Previous work on rability-driven technology mapping will be presented in Section II. Also algorithms for cluster packing will be discussed. Section III provides denitions and the FPGA architecture we are targeting. The issues related to rability will be discussed in Section IV. Our rability-driven packing method will be presented in Section V. Section VI presents experimental results. In Section VII conclusions and future work are given. II. Previous Work Packing LUTs into clusters is an important design step introduced for cluster-based FPGAs. It can be viewed as a sub-task within the technology mapping stage. Hence we will rst mention contributions made in the technology mapping area. The majority of research devoted to technology mapping has been done with the objective of improving either timing or area-eciency. Compared

2 IV. Rability I inputs Clock Input Multiplexors (a) Basic Block Basic Block N puts Inputs Clock K input LUT FF (b) 2x1 MUX Fig. 1. (a) Internal Structure of a Logic Cluster (b) Basic Logic Block to the amount of these eorts there is little work in the rability driven technology mapping domain [5], [6]. The rability driven technology mapper for LUT based arrays, Rmap, employs a mapping strategy that considers several objectives aecting rability and tries to balance those goals. Those objectives will be discussed in more detail in Section V, where we will also present further objectives. On clustering there is also relatively little work published. VPack is one of the best known packing tools for FPGAs [2]. VPack rstpacks a ip op and a LUT together into one BLE, basic logic element. Then these BLEs are packed into logic clusters with the optimization objectives being to ll each cluster to its capacity and minimize the number of used inputs to each cluster. In [3] a timing-driven packing tool for FPGAs, T-VPack was also proposed. We are introducing new metrics that are used to form a new objective function to evaluate rability. Wehave taken the algorithm of VPack as a basis as we will describe in later sections and have built our own approach upon it. Our new algorithm indeed improves rability compared to VPack. As we will describe our results in Section VI, we are able to improve te ring area by 16.5% on average. III. Architecture Description The FPGA we are targeting is of island-style structure. It contains a square matrix of logic blocks. Between every row and column ring tracks are located. The structure of the basic logic block is illustrated in Figure 1 (b). It contains a K input LUT and one ip op. In our case K is 4. The inputs to the basic block are the inputs of the LUT and the clock input for the ip op. The put is red through a two input multiplexor. The logic cluster is shown in Figure 1 (a). The two-dimensional array of logic blocks is constructed from these clusters. The cluster size, N, is dened as the number of basic blocks contained in the cluster. The cluster takes I inputs that are connected to the LUTs inside basic blocks. Not all 4N basic block inputs are accesible externally. Only I of these are connected to input multiplexors of the cluster. These input multiplexors allow any of the I inputs to be connected to any ofthe4n basic block inputs. Also any put of N basic blocks can be connected to any basic block input through these multiplexors. The cluster contains N put pins connecting each basic block put to one cluster put. One advantage of cluster-based logic blocks is that many nets can be red locally inside a cluster, which make faster interconnect compared to general-purpose interconnect between logic blocks. Since part of the interconnect is internally red, less number of nets are left to be red among the clusters by the rer. Since most of an FPGA's area and delay is due to ring, it is important to consider the rability of the design in earlier stages. Technology mapping, clustering and placement can aect the rability. By performing actual ring, rability could be incorporated into early CAD stages. However, this is a very slow process. Also the actual ring cost is very insensitive and thus is a poor cost function. Therefore we consider rability in early stages by using predictors of rability. By clustering the LUTs, the number of connections between clusters can be reduced. In other words the rability can be increased. Consequently, the ring area can be improved in terms of number of tracks required in each ring channel and switch area. In FPGAs, since the ring tracks in each channel are xed, ring in a congested channel is hard for an FPGA rer. There are several factors aecting rability atdierent CAD ow stages. In technology mapping and clustering stage the following factors can improve the rability. The rst three factors have been mentioned previously in[6]: Ratio of cells used to total cells available Ratio of pins per cell Ratio of pins per net Ratio of used pins to total number of pins of the logic block Number of nets The rst two factors depend on the architecture. Two clusterbased FPGAs with dierent cluster sizes and dierent numbers of input pins are not equivalent in terms of rability. The impact of dierent logic block architectures on FPGA performance has been studied earlier [1]. Increasing either LUT size, K, or cluster size, N, increases the functionality of the logic block. This can lead to a decrease in the total number of logic blocks needed to implement a circuit on FPGA, but the size of the logic block increases with both K and N. The size of the cluster is quadratic in N [2]. In addition and more important, the ring area depends on K and N. It has been experimentally shown that the best area-delay can be obtained by using cluster with size of between 4 and 10 and LUT sizes of 4 to 6 [4]. Betz in [4] showed that the uniform distribution of pins on the logic block perimeter is better than top/bottom pin architecture in terms of rability and area. The remaining three rability factors are algorithmic and relatively architecture independent. These factors can be considered in technology mapping or packing algorithms. Ratio of pins per net indicates the density of high-fan nets in the circuit. Rers spend the major part of their eort for ring high-fan nets. Hence the goal is to make this ratio low. Ratio of used pins to total number of pins of the logic block indicates the trac in and around a logic block. A block with high connection trac is likely to create a congested area on the chip. Similarly high connectivity between logic blocks can lead to congestion. Number of nets is also closely related to rability. As the number of nets increases the more dicult it is for the rer to complete the ring. Naturally the number of nets depends on the circuit. In a cluster-based FPGA local ring inside clusters is not a concern compared to ring the external nets among clusters. In the technology mapping and packing stages the number of external nets can be reduced by inserting as many nets as possible inside the clusters. In this paper, we are focusing on the last three factors and we will actually show that they can aect rability signicantly.

3 Ring Area ( 2 * M * M * W ) Our Cost Function Ring Area tseng diffeq bigkey alu4 dsip s298 frisc benchmarks spla ex1010 s tseng ex5p apex4 s298 apex2 elliptic dsip s38584 spla pdc Advanced Method Cost Simple Method Cost Advanced Method Simple Method Fig. 2. Rability Cost after Packing V. RPack:Rability-Driven Packing for Cluster-Based FPGAs We introduce a new method to pack the logic blocks into clusters while considering rability by satisfying the last three factors mentioned in Section IV In order to satisfy these factors we are using the following objective function: RabilityCost = X Nets N i i (1) where N i denotes the number of nets with i terminals and i is the rability weight of nets with i terminals. A net with more number of terminals is harder to re. The rability weight, i, reects the degree of diculty to re a specic net with i terminals. Studies on wirelength estimation, [7], suggest that the ratio of imin imax should be chosen such that, as the number of terminals increases i should increase. On the other hand, for high-fan nets the rability does not vary signicantly. Therefore the dierence of their corresponding i should be small. Due to this reason the range of values that i can take is restricted. In our approach we dene i to be i =2; 1 i : (2) Our objective function is a weighted sum of nets considering the rability of nets with dierent fan. This objective function reects both the eect of the number of nets and also the nature of the nets exposed to the rer. Our goal is to minimize this objective function. In Figure 2 two approaches for packing are compared and the values of the cost function are presented for dierent benchmark circuits. One approach is a simple packer, and the other approach is a sophisticated packing method. From the graph it can be seen that the cost function dierentiates between the two approaches and takes larger values in the case of the simple method. In Figure 3 the corresponding ring area resulting from both approaches is presented. The ring area is measured in terms of number of ring tracks. We are proposing a packing method that aims to minimize the cost function introduced. The input to our packing algorithm, is a list of LUTs, registers, and connections among the resources. In RPack, similar to VPack [2,4],in the rst stage, a LUT and a register are packed into a basic logic block when possible. After that, the blocks are packed into clusters using a heuristic approach since clustering is an NP-hard problem. In this approach, clusters are constructed sequentially. After choosing a seed for a cluster being constructed, the logic block which gives the highest gain is selected to be added to the current cluster provided that it is a Fig. 3. Ring Results of Simple Method and Advanced Method. M M is array size,wisthenumber of tracks in each ring channel. N4 N1 in0 in1 in2 in3 in4 N2 Basic Logic Block B g ( N1, C, B ) =1 g ( N2, C, B ) = 2 N3 Cluster C being built g ( N3, C, B ) = 2 g ( N4, C, B ) = Gain ( B, C ) = 4 Fig. 4. A Cluster Being Constructed and a Candidate Block, K =4,I=5 legal choice. This means that the number of external inputs do not exceed the number of input pins of the cluster. The algorithm continues adding blocks into one cluster until the cluster is full or no more legal choices can be found. Similarly new clusters are constructed until all the blocks are packed into clusters. The best choice for the seed for each cluster is the unclustered logic block with the most used inputs as mentioned in [2, 4]. Assume that the gain computed for a candidate block B, is the number of inputs and puts it has in common with current cluster C as in [2] Gain(B) =j Nets(B) \ Nets(C) j (3) In Figure 4, a candidate basic block B and cluster C being constructed are shown. According to the denition of the gain function in Equation 3, nets N1, N2, and, N3 have the same contribution to the gain of block B to be added to the cluster C. If these nets are observed closely, their contributions to rability gain of the block are slightly dierent. Block B has three common nets with cluster C: N1, N2 and N3. An input pin of the net N1 is inside the cluster C. By adding the block Btothecluster, another input terminal of the net N1 would be inside the cluster. This leads to a decrease in the number of the terminals of the multi-terminal net N1. The gain obtained by moving B to cluster C corresponding to net N1 using Equation 3 is 1. In fact, the gain

4 function originates from the third rability factor discussed in Section IV, i.e. reducing pins per net. Table 1.Gain of a Candidate Block According to a Single Net pin in K! status in-pin edge put total pin in C gain gain congestion gain! in ! in all in pins in C in! in in! in! rest of pins already in C new net in new net The driving pin of net N2 is the put of the logic block B. There is an input pin of net N2 inside the cluster C. If the put terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net which are located inside the cluster. In such a case, there is no need to use an input pin of the cluster to connect the net N2 to other terminals of the net side the cluster, since an put pin of the cluster can be used for external interconnection. According to the reason above, the contribution of net N2 to block gainis more than just covering an edge of a multi terminal net. Actually, by adding block Bto cluster C, an input pin of the cluster gets free and can be used for another net connection. In Table 1 this is dened as in-pin gain. This increases the probability of acceptance of adding the block to the cluster. In other words, the probability of violating the input constraints of the cluster decreases. Note that each blockput pin is accessible from side and there is no sharing among the put pins. Therefore there would be no put pin constraints for the clusters. According to this observation, saving on put pins does not bring any gain except in one case. Suppose all the input pins of a net are already inside the cluster and the logic block being added to a cluster contains the put pin of the net. Net N3 in Figure 4 is an example of such a case. The put pin of the cluster corresponding to the block driving the net N3 cannot be used by other blocks. This means that there would be no connection from side to this pin since all the terminals of the corresponding net is located inside the cluster. Therefore the number of external connections of the cluster decreases, dened as put congestion gain in Table 1. This yields less congestion among the clusters. In other words, it reduces the number of used pins of a cluster which is the fourth rability factor. Net N4 has no pin in the cluster. The gain of moving logic block B to the cluster would be zero according to the gain function above. However, not only no edge from N4 would be covered, but also one input pin of the cluster would be used for N4. So the gain of moving logic block B to the cluster due to N4 in terms of used pins per cluster is -1. This means N4 has a degrading eect on the rability according to the fourth rability factor. As explained above, by considering just the number of shared inputs and puts as in Equation 3, the packing algorithm cannot dierentiate among the candidate blocks which have dierent impact on rability. All the dierent cases yielding dierent total gains are presented in the table below for one net connected to a candidate block. By incorporating the other rability factors, the gain for each logic block B going into cluster C can be computed as follows: where Gain(B C) = f(nets(b) Nets(C)) X = g(i Nets(C) B) g(i C B) = 8>< >: Nets(B) 1+f in(p (i B) P(i C)) +f o(p (i B) P(i C)) i 2 Nets(C) ;1 T (i B) otherwise f in(p (i B) P(i C)) is dened as the gain obtained in input pins of cluster C as dened in Table 1. Similarly f o(p (i B) P(i C)) is the gain obtained in put congestion. The additional gain of value 1 to the sum of these two gain terms corresponds to the edge gain. T(i, B) returns the type of the pin of Net i connected to basic block B. It returns 0 if the pin is an put pin, and 1 otherwise. P(i, B) is the set of all pins of Net i that are on block B. P(i, C) is the set of pins of Net i connected to cluster C. Nets(C) is the set of nets connected to cluster C. After choosing the best block according to our gain function, the feasibility of adding the block to the cluster is checked. The pseudo code of our approach is shown below. Input: Netlist of LUTs and Registers N = Cluster Size K=LUTSize I = Inputs per Cluster Output: List of Logic Clusters 1 Pack LUTs and Registers together into Basic Blocks 2 while Unclustered Basic Blocks available 3 Find Seed for new Cluster 4 while Cluster is not full 5 Update gains of unclustered Basic Blocks //Candidate blocks 6 Choose Basic Block with highest gain // Pick a candidate block 7 If Candidate is NOT feasible 8 then Go to Step 6 9 Else 10 Remove blockfrom unclustered blocks list 11 Add block to current Cluster 12 end while 13 end while At this pointwe are able to analyze the operation of our algorithm from another point of view. Inserting a whole multi-terminal net in one cluster is practically impossible. In most of the cases the best we can do is to eliminate two-terminal nets. Therefore, reducing an edge from a multi-terminal net should not be considered equivalent to reducing an edge resulting from inserting a twoterminal net inside a cluster. The average gain that a block can take from an n-terminal net i connected to one of its pins, depending on type of the net can be estimated from Table 1 as follows: (4)

5 ( 2 n =2 Gain avg(i n) = (5) n + 2 n;1 3 n n>2 According to Equation 5, the average gain obtained from a twoterminal net is the highest. This implies that the algorithm gives priority topushingatwo-terminal net entirely inside a cluster as compared to reducing a pin of a multi-terminal net. This leads to a decrease in number of nets satisfying the last rability factor. Lemma: The complexity ofrpack iso(m 2 ), where M is the number of basic blocks. Proof: Assume the interconnection among basic blocks is represented by a connectivity graph and each hypernet is represented by a complete graph. If this algorithm is applied on such a connectivity graph, each edge would be visited P at most once. At each visit the gains of the nodes (representing basic blocks) are updated. Therefore the er loop takes O( Nets P (i)2 ), where P (i) isthenumber of pins of net i. Also nding the seed for each cluster takes O(M 2 ) time, where M is the number of basic blocks. Totally the run-time of the algorithm would be O( P Nets P (i)2 + M 2 ). Let P be the total number pins in the circuit. Since P Nets P (i)2 <P 2 and P 2 ( M K )2, the complexity of the algorithm can be expressed as O(M 2 ). 2 VI. Results We are presenting our experimental results in this section. We have implemented our method on top of VPack. The rst set of our experiments consists of comparing our algorithm and VPack in terms of their success in satisfying the rability factors discussed earlier. Number of nets, the average number of pins per cluster and the number of clusters derived from the packing results are reported. Comparing number of nets or just average used pins per cluster does not strongly indicate rability. Since the cost function described in Section V is able to reect the rability quite successfully, it is our comparison metric for rability. Figure 5 presents the rability cost function for the two approaches, VPack and RPack. The gure shows that in all benchmarks RPack yields a smaller cost compared to VPack, indicating enhanced rability. The average improvement in the cost function over all benchmarks is 22.4 %. Although our algorithm gives priority to reducing two-terminal nets the distribution of high-fan nets is also improved. Otherwise, the existing of very high-fan nets would degrade the value of the cost function. In order to compare area, we are reporting the total number of resulting clusters and FPGA array size ( M M).TheFPGA array is the smallest square M M grid into which the resulting logic clusters can t. Array size is a more accurate metric for logic area as compared to total number of clusters. We have set the values of logic architecture parameters according to the discussion in Section IV for best logic utilization. The LUT size is 4 and the cluster size is 8. According to Equation 6 experimentally derived in [1], the input size is set to 18. Finally, a uniform pin distribution is used. I =(N K +1) (6) 2 We ran the 20 largest MCNC benchmarks on VPACK and RPack. The blif input format of each benchmark is obtained by SIS [9] logic minimization and FlowMap [8] technology mapper. The results presented in Table 2 show that our method successfully decreased the number of nets and the average number of used pins per cluster. RPack and VPack use similar number of clusters Our Cost Function alu4 apex4 clma diffeq elliptic ex5p misex3 s298 s38584 spla benchmarks Rpack Cost VPack Cost Fig. 5. Rability Cost after Packing (Our cost function correlates well with the ring results as shown in Table 3) such that the array size resulting from both approaches for almost all benchmarks is same. Even in one case RPack yielded smaller array size. The array size of each benchmark is reported in Table 2. In accordance with average gain estimated in Section V, the results show that the major portion of the decrease in the number of the nets is due to decrease in the number of two-terminal nets. We also observed that the decrease in the average number of used pins per cluster is due to reduced number of put pins. In conclusion, reducing the number of put pins is strongly related to reducing the number of nets. Table 2. Statistics of VPack and RPack.N= 8, K =4, I = 18 Number of Pins Used Number circuit Nets per Cluster of (avg.) Clusters VPack RPack % VPack RPack VPack RPack alu apex apex bigkey clma des dieq dsip elliptic ex ex5p frisc misex pdc s s s seq spla tseng Avg In order to verify that our method meets the objective of

6 improving rability, we synthesized the benchmark circuits through the complete CAD ow to obtain the ring area. We used VPR [2] to place and re the benchmarks. The ring architecture which weusedemploys only the single segment wires which leads to better rability. Subset switch type, in which each track inachannel can be connected to the same track number of the neighboring channels, is used. In addition we have set the fraction of the tracks of each channel to which each logic block input and put pins connect to 1. Table 3. Ring area and array size resulted from RPack and VPack. N=8,K=4,I=1 circuit Array Size Number of Tracks Improvement VPack RPack VPack RPack % alu apex apex bigkey clma des dieq dsip elliptic ex ex5p frisc misex pdc s s s seq spla tseng Average Table 3 summarizes our results after placement and ring of the benchmarks. As shown in Table 3 RPack is able to improve the ring area by decreasing the number of tracks signicantly. The average improvement we obtained is 16.5 %. The number of ring tracks required in each channel is a reliable metric for ring area, since less number of ring tracks does not only mean saving wiring area, but also decreasing the size of the ring switches drastically. In total our approach achieved a big gain in ring area. Such an improvement in ring area decreases the total chip area signicantly, since ring area is typically a large percentage of the total area. We also compared the results of VPack and RPack in terms of ring area and logic area for cluster size 4 and input size 10. By comparing the ring area and logic area for the two dierent cluster sizes, we note that the improvement in ring area is larger for the bigger cluster size. On the other hand, the more ecient logic utilization in cluster size 4 leads to a smaller logic area compared to cluster size 8. The FPGA placement and ring tools play a crucial role in the nal result. The VPR placer and rer are not transparent to the optimization goal of our algorithm. VPR placement algorithm is simulated annealing-based. The cost function is the total bounding box of the nets. In other words, this algorithm is not congestion-driven. Despite the fact that we could not use a suitable placer and rer, we were still able to improve the ring area signicantly. We expect that by combining our packing method with a congestion-driven placer and rer, better results can be obtained. VII. Conclusions and Future Work In this paper we proposed a rability driven cluster packing method for cluster-based FPGAs. Our method is able to improve the rability by decreasing the number of required tracks in the FPGA ring channels. This improvement was achieved by incorporating several rability factors in our packing algorithm. We proposed a new cost function as an indicator for rability before placement. Like VPack, an existing cluster packing tool for FPGAs, RPack is a quite fast heuristic approach. There is even more space for further improvements in RPack, since the time consumed for packing is a very small fraction of the total time consumed to synthesize a circuit through the whole CAD ow. Our results indicate that providing more accurate design decisions at the packing stage even at the expense of spending more time is recommended. Eorts to improve the design performance in early stages such as clustering can yield a big gain in the nal result. References [1] E. Ahmed, J. Rose, \The Eect of LUT and Cluster Size on Deep- Submicron FPGA Performance and Density," ACM/SIGDA International Symposium on FPGA, pp. 3-12, Feb [2] V. Betz, J. Rose, \VPR: A New Packing Placement and Ring Tool for FPGA Research," Int. Workshop on Field-Programmable Logic and Application, pp , [3] A. Marquardt, V. Betz, J. Rose, \Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density," ACM/SIGDA International Symposium on FPGAs, pp.37-46, Feb [4] V. Betz, J. Rose, A. Marquardt, \ Architecture and CAD for Deep- Submicron FPGAs," Kluwer Academic Publishers, [5] N. Bhat, D. Hill, \Rable Technology Mapping for FPGAs," First International Workshop on FPGAs, pp , Feb [6] M. Schlag, J. Kong, P. K. Chan, \Rability-Driven Technology Mapping for Lookup Table-Based FPGAs," IEEE Transactions on CAD, Vol. 13, pp.13-26, Jan [7] A. E. Caldwell, A. B. Kahng, S. Mantik, I. L. Markov, A. Zelikovsky, \ On Wirelength Estimations for Row-Based Placement," International Symposium on Physical Design, [8] J. Cong, Y. Ding, \ FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Design," IEEE Trans. on Computer-Aided Design, Vol.13, No.1, pp. 1-12,Jan [9] E. M. Sentovich, et al, \SIS: A System for Sequential Analysis," Tech. Report No. UCD/ERL M92/41, University of California, Berkeley, [10] S. Yang, \Logic Synthesis and Optimization Benchmarks, Version 3.0," Tech. Report, Microelectronics Center of North Carolina, 1991.

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

SPEED AND AREA TRADE-OFFS IN CLUSTER-BASED FPGA ARCHITECTURES

SPEED AND AREA TRADE-OFFS IN CLUSTER-BASED FPGA ARCHITECTURES SPEED AND AREA TRADE-OFFS IN CLUSTER-BASED FPGA ARCHITECTURES Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose Right Track CAD Corp. #313-72 Spadina Ave. Toronto, ON, Canada M5S 2T9 {arm, vaughn,

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

Logic Block Clustering of Large Designs for Channel-Width Constrained FPGAs

Logic Block Clustering of Large Designs for Channel-Width Constrained FPGAs { Logic Block Clustering of Large Designs for Channel-Width Constrained FPGAs Marvin Tom marvint @ ece.ubc.ca Guy Lemieux lemieux @ ece.ubc.ca Dept of ECE, University of British Columbia, Vancouver, BC,

More information

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design

More information

Designing Heterogeneous FPGAs with Multiple SBs *

Designing Heterogeneous FPGAs with Multiple SBs * Designing Heterogeneous FPGAs with Multiple SBs * K. Siozios, S. Mamagkakis, D. Soudris, and A. Thanailakis VLSI Design and Testing Center, Department of Electrical and Computer Engineering, Democritus

More information

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric. SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, 2007 1 A Low-Power Field-Programmable Gate Array Routing Fabric Mingjie Lin Abbas El Gamal Abstract This paper describes a new FPGA

More information

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada, V6T

More information

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809 PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA Laurent Lemarchand Informatique ubo University{ bp 809 f-29285, Brest { France lemarch@univ-brest.fr ea 2215, D pt ABSTRACT An ecient distributed

More information

Vdd Programmability to Reduce FPGA Interconnect Power

Vdd Programmability to Reduce FPGA Interconnect Power Vdd Programmability to Reduce FPGA Interconnect Power Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA 90095 ABSTRACT Power is an increasingly important

More information

Simultaneous Placement with Clustering and Duplication

Simultaneous Placement with Clustering and Duplication Simultaneous Placement with Clustering and Duplication GANG CHEN Magma Design Automation and JASON CONG UCLA Clustering, duplication, and placement are critical steps in a cluster-based FPGA design flow.

More information

Reducing Power in an FPGA via Computer-Aided Design

Reducing Power in an FPGA via Computer-Aided Design Reducing Power in an FPGA via Computer-Aided Design Steve Wilton University of British Columbia Power Reduction via CAD How to reduce power dissipation in an FPGA: - Create power-aware CAD tools - Create

More information

A Novel Net Weighting Algorithm for Timing-Driven Placement

A Novel Net Weighting Algorithm for Timing-Driven Placement A Novel Net Weighting Algorithm for Timing-Driven Placement Tim (Tianming) Kong Aplus Design Technologies, Inc. 10850 Wilshire Blvd., Suite #370 Los Angeles, CA 90024 Abstract Net weighting for timing-driven

More information

Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction

Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction 44.1 Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA

More information

Timing-Driven Placement for FPGAs

Timing-Driven Placement for FPGAs Timing-Driven Placement for FPGAs Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose 1 {arm, vaughn, jayar}@rtrack.com Right Track CAD Corp., Dept. of Electrical and Computer Engineering, 720

More information

Heterogeneous Technology Mapping for FPGAs with Dual-Port Embedded Memory Arrays

Heterogeneous Technology Mapping for FPGAs with Dual-Port Embedded Memory Arrays Heterogeneous Technology Mapping for FPGAs with Dual-Port Embedded Memory Arrays Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada,

More information

A Routing Approach to Reduce Glitches in Low Power FPGAs

A Routing Approach to Reduce Glitches in Low Power FPGAs A Routing Approach to Reduce Glitches in Low Power FPGAs Quang Dinh, Deming Chen, Martin Wong Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign This research

More information

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs

ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ARCHITECTURE AND CAD FOR DEEP-SUBMICRON FPGAs Vaughn Betz Jonathan Rose Alexander Marquardt

More information

An LP-based Methodology for Improved Timing-Driven Placement

An LP-based Methodology for Improved Timing-Driven Placement An LP-based Methodology for Improved Timing-Driven Placement Qingzhou (Ben) Wang, John Lillis and Shubhankar Sanyal Department of Computer Science University of Illinois at Chicago Chicago, IL 60607 {qwang,

More information

Buffer Design and Assignment for Structured ASIC *

Buffer Design and Assignment for Structured ASIC * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 30, 107-124 (2014) Buffer Design and Assignment for Structured ASIC * Department of Computer Science and Engineering Yuan Ze University Chungli, 320 Taiwan

More information

FPGA Power Reduction Using Configurable Dual-Vdd

FPGA Power Reduction Using Configurable Dual-Vdd FPGA Power Reduction Using Configurable Dual-Vdd 45.1 Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA {feil, ylin, lhe}@ee.ucla.edu ABSTRACT Power

More information

Placement Algorithm for FPGA Circuits

Placement Algorithm for FPGA Circuits Placement Algorithm for FPGA Circuits ZOLTAN BARUCH, OCTAVIAN CREŢ, KALMAN PUSZTAI Computer Science Department, Technical University of Cluj-Napoca, 26, Bariţiu St., 3400 Cluj-Napoca, Romania {Zoltan.Baruch,

More information

Figure 1. PLA-Style Logic Block. P Product terms. I Inputs

Figure 1. PLA-Style Logic Block. P Product terms. I Inputs Technology Mapping for Large Complex PLDs Jason Helge Anderson and Stephen Dean Brown Department of Electrical and Computer Engineering University of Toronto 10 King s College Road Toronto, Ontario, Canada

More information

Congestion-Driven Regional Re-clustering for Low-Cost FPGAs

Congestion-Driven Regional Re-clustering for Low-Cost FPGAs Congestion-Driven Regional Re-clustering for Low-Cost FPGAs Darius Chiu, Guy G.F. Lemieux, Steve Wilton Electrical and Computer Engineering, University of British Columbia British Columbia, Canada dariusc@ece.ubc.ca

More information

ON THE INTERACTION BETWEEN POWER-AWARE FPGA CAD ALGORITHMS

ON THE INTERACTION BETWEEN POWER-AWARE FPGA CAD ALGORITHMS ON THE INTERACTION BETWEEN POWER-AWARE FPGA CAD ALGORITHMS ABSTRACT As Field-Programmable Gate Array (FPGA) power consumption continues to increase, lower power FPGA circuitry, architectures, and Computer-Aided

More information

Fast Timing-driven Partitioning-based Placement for Island Style FPGAs

Fast Timing-driven Partitioning-based Placement for Island Style FPGAs .1 Fast Timing-driven Partitioning-based Placement for Island Style FPGAs Pongstorn Maidee Cristinel Ababei Kia Bazargan Electrical and Computer Engineering Department University of Minnesota, Minneapolis,

More information

Static and Dynamic Memory Footprint Reduction for FPGA Routing Algorithms

Static and Dynamic Memory Footprint Reduction for FPGA Routing Algorithms 18 Static and Dynamic Memory Footprint Reduction for FPGA Routing Algorithms SCOTT Y. L. CHIN and STEVEN J. E. WILTON University of British Columbia This article presents techniques to reduce the static

More information

Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization

Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization Zhiru Zhang School of ECE, Cornell University September 29, 2017 @ EPFL A Case Study on Digit Recognition bit6 popcount(bit49 digit)

More information

Conclusions and Future Work. We introduce a new method for dealing with the shortage of quality benchmark circuits

Conclusions and Future Work. We introduce a new method for dealing with the shortage of quality benchmark circuits Chapter 7 Conclusions and Future Work 7.1 Thesis Summary. In this thesis we make new inroads into the understanding of digital circuits as graphs. We introduce a new method for dealing with the shortage

More information

Statistical Analysis and Design of HARP Routing Pattern FPGAs

Statistical Analysis and Design of HARP Routing Pattern FPGAs Statistical Analysis and Design of HARP Routing Pattern FPGAs Gang Wang Ý, Satish Sivaswamy Þ, Cristinel Ababei Þ, Kia Bazargan Þ, Ryan Kastner Ý and Eli Bozorgzadeh ÝÝ Ý Dept. of ECE Þ ECE Dept. ÝÝ Computer

More information

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this report, we

More information

Timing Optimization of FPGA Placements by Logic Replication

Timing Optimization of FPGA Placements by Logic Replication 13.1 Timing Optimization of FPGA Placements by Logic Replication Giancarlo Beraudo ECE Department, University of Illinois at Chicago 851 S. Morgan St., Chicago IL, 60607 gberaudo@ece.uic.edu John Lillis

More information

Christophe HURIAUX. Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration

Christophe HURIAUX. Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration Mid-term Evaluation March 19 th, 2015 Christophe HURIAUX Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration Accélérateurs matériels reconfigurables embarqués avec reconfiguration

More information

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu

More information

IMPROVING TIMING-DRIVEN FPGA PACKING WITH PHYSICAL INFORMATION

IMPROVING TIMING-DRIVEN FPGA PACKING WITH PHYSICAL INFORMATION IMPROVING TIMING-DRIVEN FPGA PACKING WITH PHYSICAL INFORMATION Doris T. Chen, Kristofer Vorwerk, Andrew Kennings University of Waterloo Waterloo, ON {dtlchen,kpvorwer,akenning}@cheetah.vlsi.uwaterloo.ca

More information

Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs. Yingxuan Liu

Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs. Yingxuan Liu Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs by Yingxuan Liu A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department

More information

An Efficient Chip-level Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction

An Efficient Chip-level Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction An Efficient Chip-level Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction Yan Lin 1, Yu Hu 1, Lei He 1 and Vijay Raghunat 2 Electrical Engineering Dept., UCLA, Los Angeles, CA 1 Purdue

More information

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi Incorporating the Controller Eects During Register Transfer Level Synthesis Champaka Ramachandran and Fadi J. Kurdahi Department of Electrical & Computer Engineering, University of California, Irvine,

More information

An automatic tool flow for the combined implementation of multi-mode circuits

An automatic tool flow for the combined implementation of multi-mode circuits An automatic tool flow for the combined implementation of multi-mode circuits Brahim Al Farisi, Karel Bruneel, João M. P. Cardoso and Dirk Stroobandt Ghent University, ELIS Department Sint-Pietersnieuwstraat

More information

Academic Clustering and Placement Tools for Modern Field-Programmable Gate Array Architectures

Academic Clustering and Placement Tools for Modern Field-Programmable Gate Array Architectures Academic Clustering and Placement Tools for Modern Field-Programmable Gate Array Architectures by Daniele G Paladino A thesis submitted in conformity with the requirements for the degree of Master of Applied

More information

IMPROVING LOGIC DENSITY THROUGH SYNTHESIS-INSPIRED ARCHITECTURE Jason H. Anderson

IMPROVING LOGIC DENSITY THROUGH SYNTHESIS-INSPIRED ARCHITECTURE Jason H. Anderson IMPROVING LOGIC DENITY THROUGH YNTHEI-INPIRED ARCHITECTURE Jason H. Anderson Dept. of ECE, Univ. of Toronto Toronto, ON Canada email: janders@eecg.toronto.edu ABTRACT We leverage properties of the logic

More information

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA A Path Based Algorithm for Timing Driven Logic Replication in FPGA By Giancarlo Beraudo B.S., Politecnico di Torino, Torino, 2001 THESIS Submitted as partial fulfillment of the requirements for the degree

More information

EN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5)

EN2911X: Reconfigurable Computing Lecture 13: Design Flow: Physical Synthesis (5) EN2911X: Lecture 13: Design Flow: Physical Synthesis (5) Prof. Sherief Reda Division of Engineering, rown University http://scale.engin.brown.edu Fall 09 Summary of the last few lectures System Specification

More information

Exploring Logic Block Granularity for Regular Fabrics

Exploring Logic Block Granularity for Regular Fabrics 1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu

More information

Detailed Router for 3D FPGA using Sequential and Simultaneous Approach

Detailed Router for 3D FPGA using Sequential and Simultaneous Approach Detailed Router for 3D FPGA using Sequential and Simultaneous Approach Ashokkumar A, Dr. Niranjan N Chiplunkar, Vinay S Abstract The Auction Based methodology for routing of 3D FPGA (Field Programmable

More information

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this

More information

Variation Aware Routing for Three-Dimensional FPGAs

Variation Aware Routing for Three-Dimensional FPGAs Variation Aware Routing for Three-Dimensional FPGAs Chen Dong, Scott Chilstedt, and Deming Chen Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign {cdong3, chilste1,

More information

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs Harrys Sidiropoulos, Kostas Siozios and Dimitrios Soudris School of Electrical & Computer Engineering National

More information

A New K-Way Partitioning Approach. Bernhard M. Riess, Heiko A. Giselbrecht, and Bernd Wurth. Technical University of Munich, Munich, Germany

A New K-Way Partitioning Approach. Bernhard M. Riess, Heiko A. Giselbrecht, and Bernd Wurth. Technical University of Munich, Munich, Germany A New K-Way Partitioning Approach for Multiple Types of s Bernhard M. Riess, Heiko A. Giselbrecht, and Bernd Wurth Institute of Electronic Design Automation Technical University of Munich, 8090 Munich,

More information

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs . FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles,

More information

Submitted for TAU97 Abstract Many attempts have been made to combine some form of retiming with combinational

Submitted for TAU97 Abstract Many attempts have been made to combine some form of retiming with combinational Experiments in the Iterative Application of Resynthesis and Retiming Soha Hassoun and Carl Ebeling Department of Computer Science and Engineering University ofwashington, Seattle, WA fsoha,ebelingg@cs.washington.edu

More information

AN ACCELERATOR FOR FPGA PLACEMENT

AN ACCELERATOR FOR FPGA PLACEMENT AN ACCELERATOR FOR FPGA PLACEMENT Pritha Banerjee and Susmita Sur-Kolay * Abstract In this paper, we propose a constructive heuristic for initial placement of a given netlist of CLBs on a FPGA, in order

More information

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture Roman Lysecky Department of Electrical and Computer Engineering University of Arizona Dynamic HW/SW Partitioning Initially execute application in software only 5 Partitioned application executes faster

More information

Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays1

Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays1 Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays1 Edmund Lee, Guy Lemieux, Shahriar Mirabbasi University of British Columbia, Vancouver, Canada { eddyl lemieux shahriar } @ ece.ubc.ca

More information

Heterogeneous Technology Mapping for Area Reduction in FPGA s with Embedded Memory Arrays

Heterogeneous Technology Mapping for Area Reduction in FPGA s with Embedded Memory Arrays 56 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 1, JANUARY 2000 Heterogeneous Technology Mapping for Area Reduction in FPGA s with Embedded Memory Arrays

More information

Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation

Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation Keheng Huang Yu Hu Xiaowei Li Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy

More information

FPGA Programmable Logic Block Evaluation using. Quantified Boolean Satisfiability

FPGA Programmable Logic Block Evaluation using. Quantified Boolean Satisfiability FPGA Programmable Logic Block Evaluation using Quantified Boolean Satisfiability Andrew C. Ling, Deshanand P. Singh, and Stephen D. Brown, December 12, 2005 Abstract This paper describes a novel Field

More information

Accelerating FPGA Routing Using Architecture-Adaptive A* Techniques

Accelerating FPGA Routing Using Architecture-Adaptive A* Techniques Accelerating FPGA Routing Using Architecture-Adaptive A* Techniques Akshay Sharma Actel Corporation Mountain View, CA 9443, USA Akshay.Sharma@actel.com Scott Hauck University of Washington Seattle, WA

More information

/97 $ IEEE

/97 $ IEEE NRG: Global and Detailed Placement Majid Sarrafzadeh Maogang Wang Department of Electrical and Computer Engineering, Northwestern University majid@ece.nwu.edu Abstract We present a new approach to the

More information

Cluster-Based Architecture, Timing-Driven Packing and Timing-Driven Placement for FPGAs

Cluster-Based Architecture, Timing-Driven Packing and Timing-Driven Placement for FPGAs Cluster-Based Architecture, Timing-Driven Packing and Timing-Driven Placement for FPGAs by Alexander R. Marquardt A thesis submitted in conformity with the requirements for the degree of Master of Applied

More information

Revisiting Genetic Algorithms for the FPGA Placement Problem

Revisiting Genetic Algorithms for the FPGA Placement Problem Revisiting Genetic Algorithms for the FPGA Placement Problem Peter Jamieson Miami University, Oxford, OH, 45056 Email: jamiespa@muohio.edu Abstract In this work, we present a genetic algorithm framework

More information

Research Article FPGA Interconnect Topologies Exploration

Research Article FPGA Interconnect Topologies Exploration International Journal of Reconfigurable Computing Volume 29, Article ID 259837, 13 pages doi:1.1155/29/259837 Research Article FPGA Interconnect Topologies Exploration Zied Marrakchi, Hayder Mrabet, Umer

More information

Routing Wire Optimization through Generic Synthesis on FPGA Carry Chains

Routing Wire Optimization through Generic Synthesis on FPGA Carry Chains Routing Wire Optimization through Generic Synthesis on FPGA Carry Chains Hadi Parandeh-Afshar hadi.parandehafshar@epfl.ch Philip Brisk philip@cs.ucr.edu Grace Zgheib grace.zgheib@lau.edu.lb Paolo Ienne

More information

Architecture Evaluation for

Architecture Evaluation for Architecture Evaluation for Power-efficient FPGAs Fei Li*, Deming Chen +, Lei He*, Jason Cong + * EE Department, UCLA + CS Department, UCLA Partially supported by NSF and SRC Outline Introduction Evaluation

More information

STATIC PROFILE DRIVEN OPTIMIZATION OF DIGITAL CIRCUITS Srihari Cadambi Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh, Pennsylvania. Contents 1 Introduction 1

More information

CAD Flow for FPGAs Introduction

CAD Flow for FPGAs Introduction CAD Flow for FPGAs Introduction What is EDA? o EDA Electronic Design Automation or (CAD) o Methodologies, algorithms and tools, which assist and automatethe design, verification, and testing of electronic

More information

DYNAMICALLY SHIFTED SCRUBBING FOR FAST FPGA REPAIR. Leonardo P. Santos, Gabriel L. Nazar and Luigi Carro

DYNAMICALLY SHIFTED SCRUBBING FOR FAST FPGA REPAIR. Leonardo P. Santos, Gabriel L. Nazar and Luigi Carro DYNAMICALLY SHIFTED SCRUBBING FOR FAST FPGA REPAIR Leonardo P. Santos, Gabriel L. Nazar and Luigi Carro Instituto de Informática Universidade Federal do Rio Grande do Sul (UFRGS) Porto Alegre, RS - Brazil

More information

Technology Mapping and Packing. FPGAs

Technology Mapping and Packing. FPGAs Technology Mapping and Packing for Coarse-grained, Anti-fuse Based FPGAs Chang Woo Kang, Ali Iranli, and Massoud Pedram University of Southern California Department of Electrical Engineering Los Angeles

More information

Device And Architecture Co-Optimization for FPGA Power Reduction

Device And Architecture Co-Optimization for FPGA Power Reduction 54.2 Device And Architecture Co-Optimization for FPGA Power Reduction Lerong Cheng, Phoebe Wong, Fei Li, Yan Lin, and Lei He Electrical Engineering Department University of California, Los Angeles, CA

More information

TROUTE: A Reconfigurability-aware FPGA Router

TROUTE: A Reconfigurability-aware FPGA Router TROUTE: A Reconfigurability-aware FPGA Router Karel Bruneel and Dirk Stroobandt Hardware and Embedded Systems Group, ELIS Dept., Ghent University, Sint-Pietersnieuwstraat 4, B-9000 Gent, Belgium {karel.bruneel;dirk.stroobandt}@ugent.be

More information

Timing-driven Partitioning-based Placement for Island Style FPGAs

Timing-driven Partitioning-based Placement for Island Style FPGAs IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 24, No. 3, Mar. 2005 1 Timing-driven Partitioning-based Placement for Island Style FPGAs Pongstorn Maidee, Cristinel

More information

Using Sparse Crossbars within LUT Clusters

Using Sparse Crossbars within LUT Clusters Using Sparse Crossbars within LUT Clusters Guy Lemieux Dept. of Electrical and Computer Engineering University of Toronto Toronto, Ontario, Canada M5S 3G4 lemieux@eecg.toronto.edu David Lewis Dept. of

More information

mrfpga: A Novel FPGA Architecture with Memristor-Based Reconfiguration

mrfpga: A Novel FPGA Architecture with Memristor-Based Reconfiguration mrfpga: A Novel FPGA Architecture with Memristor-Based Reconfiguration Jason Cong Bingjun Xiao Department of Computer Science University of California, Los Angeles {cong, xiao}@cs.ucla.edu Abstract In

More information

Faster Placer for Island-style FPGAs

Faster Placer for Island-style FPGAs Faster Placer for Island-style FPGAs Pritha Banerjee and Susmita Sur-Kolay Advanced Computing and Microelectronics Unit Indian Statistical Institute 0 B. T. Road, Kolkata, India email:{pritha r, ssk}@isical.ac.in

More information

Type T1: force false. Type T2: force true. Type T3: complement. Type T4: load

Type T1: force false. Type T2: force true. Type T3: complement. Type T4: load Testability Insertion in Behavioral Descriptions Frank F. Hsu Elizabeth M. Rudnick Janak H. Patel Center for Reliable & High-Performance Computing University of Illinois, Urbana, IL Abstract A new synthesis-for-testability

More information

Energy and Switch Area Optimizations for FPGA Global Routing Architectures

Energy and Switch Area Optimizations for FPGA Global Routing Architectures Energy and Switch Area Optimizations for FPGA Global Routing Architectures YI ZHU, YUANFANG HU, MICHAEL B. TAYLOR, and CHUNG-KUAN CHENG University of California, San Diego Low energy and small switch area

More information

MODULAR PARTITIONING FOR INCREMENTAL COMPILATION

MODULAR PARTITIONING FOR INCREMENTAL COMPILATION MODULAR PARTITIONING FOR INCREMENTAL COMPILATION Mehrdad Eslami Dehkordi, Stephen D. Brown Dept. of Electrical and Computer Engineering University of Toronto, Toronto, Canada email: {eslami,brown}@eecg.utoronto.ca

More information

3. G. G. Lemieux and S. D. Brown, ëa detailed router for allocating wire segments

3. G. G. Lemieux and S. D. Brown, ëa detailed router for allocating wire segments . Xilinx, Inc., The Programmable Logic Data Book, 99.. G. G. Lemieux and S. D. Brown, ëa detailed router for allocating wire segments in æeld-programmable gate arrays," in Proceedings of the ACM Physical

More information

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs Anurag Tiwari and Karen A. Tomko Department of ECECS, University of Cincinnati Cincinnati, OH 45221-0030, USA {atiwari,

More information

Rajarshi Mukherjee and Seda Ogrenci Memik {rajarsh,

Rajarshi Mukherjee and Seda Ogrenci Memik {rajarsh, 1 Realizing Low Power FPGAs: A Design Partitioning Algorithm for Voltage Scaling and A Comparative Evaluation of Voltage Scaling Techniques for FPGAs ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, NORTWESTERN

More information

g a0 1 a0 b 1 (3) (2) (4) (5) (1) 1 i 2

g a0 1 a0 b 1 (3) (2) (4) (5) (1) 1 i 2 A New Retiming-based Technology Mapping Algorithm for LUT-based FPGAs Peichen Pan y and Chih-Chang Lin z y Dept. of ECE, Clarkson University, Potsdam, NY 13699 z Verplex Systems, Inc., San Jose, CA 95112

More information

FPGA PLB Architecture Evaluation and Area Optimization Techniques using Boolean Satisfiability

FPGA PLB Architecture Evaluation and Area Optimization Techniques using Boolean Satisfiability IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. X, NO. XX, APRIL 2005 1 FPGA PLB Architecture Evaluation and Area Optimization Techniques using Boolean Satisfiability

More information

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica A New Register Allocation Scheme for Low Power Data Format Converters Kala Srivatsan, Chaitali Chakrabarti Lori E. Lucke Department of Electrical Engineering Minnetronix, Inc. Arizona State University

More information

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR An Introduction to FPGA Placement Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR

More information

Retiming & Pipelining over Global Interconnects

Retiming & Pipelining over Global Interconnects Retiming & Pipelining over Global Interconnects Jason Cong Computer Science Department University of California, Los Angeles cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Joint work with C. C. Chang,

More information

HARP: Hard-wired Routing Pattern FPGAs

HARP: Hard-wired Routing Pattern FPGAs : Hard-wired Routing Pattern FPGAs Satish Sivaswamy, Gang Wang, Cristinel Ababei, Kia Bazargan, Ryan Kastner and Eli Bozorgzadeh ECE Dept. Dept. of ECE Computer Science Dept. Univ. of Minnesota Univ. of

More information

HYBRID FPGA ARCHITECTURE

HYBRID FPGA ARCHITECTURE HYBRID FPGA ARCHITECTURE Alireza Kaviani and Stephen Brown Department of Electrical and Computer Engineering University of Toronto, Canada Email: kaviani brown@eecg.toronto.edu Abstract This paper 1 proposes

More information

Emerald - An Architecture-Driven Tool Compiler for FPGAs. Darren C. Cronquist and Larry McMurchie. University of Washington.

Emerald - An Architecture-Driven Tool Compiler for FPGAs. Darren C. Cronquist and Larry McMurchie. University of Washington. Emerald - An Architecture-Driven Tool Compiler for FPGAs Darren C. Cronquist and Larry McMurchie Department of Computer Science and Engineering University of Washington Box 352350 Seattle, WA 98195-2350

More information

An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs

An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs ReCoSoC 2013 Qian Zhao Motoki Amagasaki Masahiro Iida Morihiro Kuga Toshinori Sueyoshi (, Japan) Background FPGA IP core development

More information

FPGA Placement by Graph Isomorphism

FPGA Placement by Graph Isomorphism FPGA Placement by Graph Isomorphism Hossein Omidian Savarbaghi Dept. of Computer Eng. Islamic Azad University Science and research branch Tehran, Iran h_omidian_sa@yahoo.com Kia Bazargan Dept. of Elec.

More information

Research Article An Improved Diffusion Based Placement Algorithm for Reducing Interconnect Demand in Congested Regions of FPGAs

Research Article An Improved Diffusion Based Placement Algorithm for Reducing Interconnect Demand in Congested Regions of FPGAs Reconfigurable Computing Volume 205, Article ID 75604, 0 pages http://dx.doi.org/0.55/205/75604 Research Article An Improved Diffusion Based Placement Algorithm for Reducing Interconnect Demand in Congested

More information

Improving Reconfiguration Speed for Dynamic Circuit Specialization using Placement Constraints

Improving Reconfiguration Speed for Dynamic Circuit Specialization using Placement Constraints Improving Reconfiguration Speed for Dynamic Circuit Specialization using Placement Constraints Amit Kulkarni, Tom Davidson, Karel Heyse, and Dirk Stroobandt ELIS department, Computer Systems Lab, Ghent

More information

Improvements to Technology Mapping for LUT-Based FPGAs

Improvements to Technology Mapping for LUT-Based FPGAs Improvements to Technology Mapping for LUT-Based FPGAs Alan Mishchenko Satrajit Chatterjee Robert Brayton Department of EECS, University of California, Berkeley {alanmi, satrajit, brayton}@eecs.berkeley.edu

More information

Runtime and Quality Tradeoffs in FPGA Placement and Routing

Runtime and Quality Tradeoffs in FPGA Placement and Routing Runtime and Quality Tradeoffs in FPGA Placement and Routing Chandra Mulpuri Department of Electrical Engineering University of Washington, Seattle, WA 98195, USA chandi@ee.washington.edu Scott Hauck Department

More information

Integrated Retiming and Placement for Field Programmable Gate Arrays

Integrated Retiming and Placement for Field Programmable Gate Arrays Integrated Retiming and Placement for Field Programmable Gate Arrays Deshanand P. Singh Dept. of Electrical and Computer Engineering University of Toronto Toronto, Canada singhd@eecg.toronto.edu Stephen

More information

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Deshanand P. Singh Altera Corporation dsingh@altera.com Terry P. Borer Altera Corporation tborer@altera.com

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

Challenges of FPGA Physical Design

Challenges of FPGA Physical Design Challenges of FPGA Physical Design Larry McMurchie 1 and Jovanka Ciric Vujkovic 2 1 Principal Engineer, Solutions Group, Synopsys, Inc., Mountain View, CA, USA 2 R&D Manager, Solutions Group, Synopsys,

More information

A Hierarchical Description Language and Packing Algorithm for Heterogenous FPGAs. Jason Luu

A Hierarchical Description Language and Packing Algorithm for Heterogenous FPGAs. Jason Luu A Hierarchical Description Language and Packing Algorithm for Heterogenous FPGAs by Jason Luu A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate

More information

DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs

DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen, Jason Cong Computer Science Department University of California, Los Angeles {demingc, cong}@cs.ucla.edu ABSTRACT

More information