RALP: Reconvergence-Aware Layer Partitioning For 3D FPGAs*


Qingyu Liu 1, Yuchun Ma 1, Yu Wang 2, Wayne Luk 3, Jinian Bian 1
1 Department of Computer Science and Technology, Tsinghua University, Beijing, China
2 Department of Electronic Engineering, Tsinghua University, Beijing, China
3 Imperial College, London, United Kingdom
myc@mail.tsinghua.edu.cn

Abstract In 3D FPGA designs, the circuit elements are distributed among multiple layers, so the partitioning strategy influences the optimization of the entire design. Without layout information, it is difficult to evaluate the effect of a partition before placement. As a prior estimation model, reconvergence has proven effective for estimating wire length before placement in 2D designs. In 3D, however, the traditional method is no longer applicable because the routing architecture changes. In this paper, we propose a novel prior estimator called 3D-reconvergence to evaluate the wire length of netlists in 3D FPGA designs, together with a reconvergence-aware layer partition (RALP) algorithm for 3D FPGA design. Experimental results show that our partitioning approach leads to better physical layout results. Compared with the traditional min-cut based partitioning approach, the design flow with RALP reduces wire length by 7.06% and delay by 4.86% for 2-layer designs, and wire length by 4.71% and delay by 4.73% for 3-layer designs.

Keywords Reconvergence; Partition; 3D FPGA

I. INTRODUCTION
Three-dimensional (3D) ICs have shown significant performance improvements compared with their 2D counterparts. In the FPGA design area, both the industrial and academic communities are now also focusing on 3D FPGA designs. However, although 3D FPGAs compare well with traditional 2D designs in delay, wire length and so on [5][7][8][9], their specific routing architecture with multiple stacked layers also poses challenges for layout optimization.
As shown in Fig. 1(a), the architecture of a traditional FPGA is composed of two main parts: routing resources (SB: switch box; CB: connection box) and logic resources (CLB: configurable logic block). The CLBs are connected to the horizontal and vertical channels through CBs. Connections are switched at the intersections of horizontal and vertical channels by SBs [3]. For a 3D FPGA, which stacks multiple layers as shown in Fig. 1(b), some SBs must be extended to 3D-SBs with through-silicon vias (TSVs) so that connections can be switched from one layer to another [2]. Normally, several TSVs are clustered in each 3D-SB; the architecture shown in [12], for example, uses 20 TSVs per 3D-SB. For a given FPGA chip architecture, the distribution of 3D-SBs is fixed, and normally only some of the SBs on the chip are 3D-SBs because of the fabrication cost. Therefore, the physical layout optimization is seriously restricted by the distribution of 3D-SBs.

Fig. 1. FPGA architecture

* This work is supported by NSFC.

Normally in 3D FPGA designs, a fundamental step called partitioning precedes placement and assigns the blocks to specific layers. Under the constraints of the 3D-SBs, the partition among layers greatly influences the later placement and routing (P&R) stages. Several studies have addressed layout optimization with layer partitioning. TPR [5][6] and 3D MEANDER [7] are the two major synthesis frameworks targeting 3D FPGAs so far. Both contain a layer partition step based on the min-cut approach, which tries to minimize the number of TSVs used. However, as shown in Fig. 1(b), both the number and the distribution of the TSVs clustered in 3D-SBs are fixed in advance in 3D FPGAs. As shown in Fig. 2, K. Siozios [8] quantifies the utilization of fabricated TSVs for the 20 largest MCNC benchmarks and shows that only about 46% of the TSVs are used, while 54% remain vacant.
Therefore, instead of minimizing the number of vertical connections, the partitioning approach should take advantage of the existing TSVs to improve the design. Although the layer partition algorithms proposed in [8] and [9] are not based on the min-cut approach, their partitioning schemes rely on either iterative heuristic optimization or simulated annealing, which may consume much time without guaranteeing the quality of the results. Moreover, most previous approaches focus on the number of vertical connections. In fact, the partition greatly influences placement and routing: even with the same number of vertical connections, different partitions may lead to different layout results. However, because no layout information is available before placement, it is difficult to consider layout optimization during the partitioning stage.

Fig. 2. Utilization of TSVs in 3D FPGAs

In this paper, we propose a novel prior wirelength estimator called 3D-reconvergence, which can be used to evaluate different partitions before placement. The term reconvergence has already been used in 2D FPGA designs to estimate wire length before placement [10][11]. With multiple layers stacked, however, the original definition of reconvergence is no longer suitable for 3D FPGAs. By representing the partitioned netlists with their vertical connections taken into account, we propose a new estimator, 3D-reconvergence. We also propose a reconvergence-aware layer partition algorithm that provides better layer partition results for the subsequent placement and routing. The contributions of this paper are summarized as follows:
A new estimator, 3D-reconvergence, is proposed to reflect the relationship between reconvergence and wire length in 3D FPGA designs.
A novel reconvergence-aware layer partition algorithm (RALP) based on 3D-reconvergence is proposed to provide better layer partition results for P&R in 3D FPGA designs.
RALP fits into the 3D FPGA design flow smoothly, and experimental results show that our approach obtains better results than the traditional partitioning approach by considering wirelength estimation at an early design stage.

II. PROBLEM FORMULATION
Layer partitioning for 3D FPGAs is a physical design process in which a netlist of circuit blocks is mapped onto multiple chip layers. The input netlist is modeled as a hypergraph H(V, E), in which each vertex corresponds to a logic block and each hyperedge corresponds to a net. Layer partitioning contains two steps: partition and assignment. The partition step first partitions the input netlist into multiple parts, and the assignment step then assigns each part to a layer of the 3D FPGA. A good layer partition strategy should favor P&R under the following constraints:
Resource balance: As a general 3D FPGA basically stacks multiple identical 2D FPGA layers together, the resources used in different layers, such as the number of utilized CLBs, should be balanced to minimize the waste of chip area.
TSV constraints: The nets cut in the layer partition step use TSVs to communicate between different layers in P&R. As the fabricated TSVs are fixed in a 3D FPGA, the number of TSVs used by a layer partition result should not exceed the total number of fabricated TSVs.
Unlike previous works, whose objective is to minimize the number of nets being cut, in this paper we introduce an extra objective to improve performance during the partitioning process.

III. 3D-RECONVERGENCE
To evaluate the performance of a partitioning result without detailed layout information, a prior estimator is needed. Reconvergence [10] has been used efficiently to estimate wire length before placement in 2D FPGAs. However, that method is no longer applicable to 3D FPGA design. New techniques are needed to capture the relationship between reconvergence and wire length in 3D FPGAs and to guide the layer partitioning.

A. Reconvergence for 2D FPGA
Reconvergence occurs when multiple fan-outs from a single node in the circuit branch back together at a later point. As shown in Fig. 3, there is a reconvergence between nodes A and F. Based on this idea, [10] first put forward the definition of reconvergence as follows:

Definition 1. R: A reconvergence is defined to originate on a node x if branches of multiple fan-outs of x join later at a node y. The reconvergence is referred to by the name R_xy, where x is the origin and y is the destination of R_xy.

For sequential circuits, the circuit is divided into different sequential levels at the flip-flop boundaries, and the reconvergences of a sequential circuit lie only within the same sequential level. In R, the length of each path p is denoted l(p). The most important characteristic of R is its average path length, which we define as the value of R, namely RV:

$$ RV = \frac{\sum_{p \in R} l(p)}{N} \quad (1) $$

where N is the number of paths in R. [10] uses R to generate test circuits.
[11] uses R to estimate wire length before placement for 2D FPGAs. In [11], the weight of every node is related to the RV of the reconvergence it belongs to, and a net's length is linked to the weights of its nodes. [11] has shown its efficiency: it achieves an average error of around 10% with respect to the semi-perimeter wire length measured from the optimized layout using VPR. This suggests that reconvergence dictates a kind of force exerted by cells on each other. If a circuit contains many reconvergences, cells compete with each other for sites in the layout so as to minimize the wire length during placement. Therefore, a netlist with many complex reconvergences is hard to place.

Since a larger reconvergence value indicates a possibly larger wirelength in the layout, for 3D FPGAs we expect our layer partition algorithm to partition a netlist in a way that reduces and simplifies the reconvergences in each part. However, after partitioning, the cut parts of the netlist are connected by vertical connections, so the sub-netlists on different layers are not completely separated from each other and their placements depend on each other. The traditional method for a planar space is therefore no longer applicable, because it cannot reflect the vertical connections of these partitioned reconvergences. We need new models to estimate the wire length of partitioned reconvergences so as to obtain a layer partition algorithm that favors P&R.

Fig. 3. Different placement results of R_AF

B. 3D-reconvergence
Since vertical connections play an important role in partitions, we need to model them properly in our formulation. The physical locations of the TSVs among layers constrain the layout of the blocks. Cuts on the same reconvergence lead to several TSVs, which should be near each other to favor the later placement and routing. Most placement algorithms for 3D FPGAs place blocks layer by layer. As shown in Fig. 3, R_AF is to be placed on a 2-layer FPGA and layer 0 is placed first. If the TSVs are not close to each other, as in Fig. 3(a), the optimal placement on layer 0 can lead to a bad result for layer 1, with a total wire length of 8. If we cluster the TSVs of R_AF instead, the wire length is shortened from 8 to 6 under the same placement strategy, as shown in Fig. 3(b). Moreover, several TSVs are indeed clustered in each 3D-SB in the FPGA architecture; the 3D FPGA architecture proposed in [12] has 20 TSVs per 3D-SB. Therefore, to represent the cut, we add a virtual node T to the netlist. As shown in Fig. 4, suppose R_AF has been partitioned into two parts with AB and DE cut. A virtual node T is introduced that connects the four nodes accordingly and represents the partition between layers.

Fig. 4. 3D-Reconvergence

Definition 2. Cutting node: If R is partitioned into two parts, we add virtual nets and a virtual node T, called the cutting node, to the netlist, and the original R is turned into two smaller reconvergences R_XT and R_TY. In the real layout, T stands for the TSVs in R that result from the nets cut during partition.

With the cutting node, R_AF is transformed into two smaller reconvergences R_AT and R_TF. We use R_AT and R_TF as a 3D-reconvergence, 3D-R_AF, which represents R_AF after it is partitioned into sub-reconvergences. The definition of 3D-reconvergence is as follows:

Definition 3. 3D-R: A 3D-reconvergence is composed of several 2D reconvergences connected by cutting nodes. Suppose R is cut by cutting node T; then 3D-R = {R_XT, R_TY}. For a reconvergence partitioned into N (N > 2) parts, it is defined recursively.

To evaluate wirelength, the half perimeter of the bounding box of the netlist is commonly used. In 3D designs, the wirelength is always evaluated by the bounding box of the elements projected onto the planar space. With the cutting nodes representing the TSVs in the center of the netlists, we can roughly evaluate the wirelength by the largest bounding box among layers. Therefore, for each individual part on the same layer, we can estimate the wirelength according to its 2D reconvergence, and the value of 3D-R (3D-RV) should be evaluated by the maximal 2D reconvergence value among layers. Suppose 3D-R = {R_XT1, R_T1T2, ..., R_TkY}; then

$$ \text{3D-RV} = \max(RV_{ij}), \quad R_{ij} \in \text{3D-R} \quad (2) $$

Fig. 5. Different partitions lead to different P&R results.

In fact, different cut positions in R form different 3D-Rs and lead to different wire lengths after P&R. Fig. 5 shows an example. R_AF is the same in Fig. 5(a) and Fig. 5(b). However, AB, AF and AD are cut in Fig. 5(a), while BC, AF and DE are cut in Fig. 5(b). This difference in layer partitioning forms different 3D-Rs and leads to different wire lengths after P&R: Fig. 5(a) tends to have a wire length of 6, while Fig. 5(b) tends to have a wire length of 5.
Now we calculate 3D-RV for Fig. 5. For Fig. 5(a), 3D-RV = max(1, 7/3) = 7/3, while for Fig. 5(b) it is max(5/3, 5/3) = 5/3. This corresponds to the fact that Fig. 5(a) has the larger wirelength and supports the validity of 3D-RV.
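The calculation above can be reproduced with a short Python sketch. The text does not list the individual path lengths of the sub-reconvergences in Fig. 5, so the lists below are hypothetical values chosen only so that their averages match the RVs quoted above (1 and 7/3 for Fig. 5(a), 5/3 and 5/3 for Fig. 5(b)). The function names `rv` and `rv_3d` are likewise illustrative and not taken from the paper.

```python
from fractions import Fraction


def rv(path_lengths):
    """RV of one 2D reconvergence: the average length of its paths."""
    return Fraction(sum(path_lengths), len(path_lengths))


def rv_3d(sub_reconvergences):
    """3D-RV: the maximal RV among the sub-reconvergences of a 3D-R."""
    return max(rv(paths) for paths in sub_reconvergences)


# Hypothetical path lengths per sub-reconvergence (assumed, not from the paper),
# picked so that the averages equal the RVs stated in the text.
fig5a = [[1, 1, 1], [2, 2, 3]]   # RVs: 1 and 7/3
fig5b = [[1, 2, 2], [1, 2, 2]]   # RVs: 5/3 and 5/3

print(rv_3d(fig5a))  # 7/3
print(rv_3d(fig5b))  # 5/3
```

Exact fractions are used instead of floats so that the comparison between partitions is not affected by rounding.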

IV. RALP FOR 3D FPGA

A. Layer Partition in RALP
As a 3D-R with a smaller value is likely to lead to a better P&R result, as shown in the section above, the goal of RALP is to use the limited TSV resources to cut reconvergences while minimizing the average 3D-RV of all 3D-reconvergences. As 3D-RV is determined by the maximal 2D RV among layers, within a reconvergence it is optimal to cut edges so that the sub-netlists are balanced. When an edge in a reconvergence is cut, the corresponding path is cut into two parts. Suppose edge E_ij is on path p of R. If E_ij is cut during partition, then according to equations (1) and (2), the length of p used when calculating 3D-RV is no longer l(p) but max{l_XT(p), l_TY(p)}. By cutting E_ij and partitioning p into two parts, we therefore bring a benefit of min{l(p) - l_XT(p), l(p) - l_TY(p)} to path p. From this view, we define CB^R_ij as the cut benefit of E_ij in R:

$$ CB^{R}_{ij} = \begin{cases} 0, & E_{ij} \notin R \\ \min\{\, l(p) - l_{XT}(p),\; l(p) - l_{TY}(p) \,\}, & E_{ij} \in R \end{cases} \quad (3) $$

Based on (3), we can evaluate the effect of cutting each edge and assign each edge of the netlist graph a weight derived from its cut benefit. To reduce the estimated wire length, we then aim to find the cut on the graph with the maximal cut benefit. Similar to TPR, we also use hMETIS [13] for layer partition because of its quality and speed. hMETIS is a software package for partitioning large hypergraphs. The weights of vertices and hyperedges in the hypergraph are its two main parameters. TPR sets the same weight for all vertices, because all CLBs have the same area, and also sets the same weight for all hyperedges during layer partition. In RALP the vertex weights are likewise all the same, but the hyperedge weights differ and are based on cut benefits.
Since hMETIS is based on min-cut, whose goal is to minimize the total cut weight, nets with small weights tend to be cut. To maximize the cut benefits, we therefore assign the reciprocal of the average cut benefit as the edge weight, according to equation (4). Under the limitation that the total number of cuts is no more than the number of TSVs, equation (4) ensures the goal of RALP discussed above. Specifically, the weights are calculated as follows:

$$ W_i = \begin{cases} 1, & N_i = 0 \\ \dfrac{N_i}{\sum_{R} CB^{R}_{i}}, & N_i \neq 0 \end{cases}, \qquad N_i = \sum_{R} \begin{cases} 0, & E_i \notin R \\ 1, & E_i \in R \end{cases} \quad (4) $$

where N_i counts the reconvergences R that contain hyperedge E_i, and CB^R_i is the cut benefit of E_i in R as given by (3).

B. Layer Assignment in RALP
After the nodes are partitioned into groups, it remains to decide which layer is assigned to each group. Normally, the objective of this assignment is to minimize the utilization of TSVs. With this goal, RALP assigns a layer to each group by greedily choosing the best solution among the n! choices for an FPGA with n layers. Most placement algorithms for 3D FPGAs place blocks layer by layer, so after the assignment we still need to determine the sequence of placement. Since the positions of blocks are greatly constrained by TSVs, it is reasonable to place the part with more TSV-related blocks first. As shown in Fig. 5(a), we should place the layer with blocks B, C, D, E and F first.

Definition 4. 3D-R_i (per reconvergence) and 3D-R_i (total): For a given 3D-R, 3D-R_i represents the number of blocks of that 3D-R in part i. The total 3D-R_i represents the total number of blocks of all 3D-reconvergences in part i.

The total 3D-R_i directly reflects the number of blocks related to TSVs in reconvergences. Therefore, RALP also uses 3D-reconvergence to determine the sequence of placement, which depends on the values of the total 3D-R_i: a part with a larger value should be placed earlier.

C. Total Flow of RALP
Similar to hMETIS, RALP partitions a netlist into an arbitrary number of parts using recursive bisection, namely a series of 2-way partitions. Taking a three-layer partition as an example, RALP includes two phases.
First, we perform an unbalanced 2-way partition in which one part has 2/3 of the total number of vertices and the other part has 1/3. Second, we bisect the part with 2/3 of the vertices into two equal-size parts. After these two phases, we obtain three equal-size parts, each of which is 1/3 of the initial netlist.

Fig. 6. RALP flow

RALP can partition a netlist into n parts for an n-layer design and then determine the placement sequence of the layers. The first step in RALP, which is also the most important, constructs a hypergraph from the input netlist and obtains the reconvergence information. Then weights are calculated for the hyperedges according to equation (4), and hMETIS is used to perform the partition. Layer assignment is then performed. Fig. 6 shows the total flow of RALP.
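As a minimal sketch of the weighting step in this flow, the Python below evaluates the cut benefit of equation (3) and the hyperedge weight of equation (4). It reads (4) as W_i = 1 when the hyperedge lies in no reconvergence, and otherwise as N_i divided by the sum of its cut benefits, that is, the reciprocal of the average cut benefit. The function names and sample numbers are hypothetical, and raising an error when the benefits sum to zero is an assumption, since the text does not cover that case.

```python
from fractions import Fraction


def cut_benefit(l_p, l_xt, l_ty):
    """Cut benefit of an edge on path p: min{l(p) - l_XT(p), l(p) - l_TY(p)}."""
    return min(l_p - l_xt, l_p - l_ty)


def edge_weight(benefits):
    """Weight of one hyperedge.

    `benefits` holds one cut-benefit value per reconvergence that contains
    the hyperedge, so len(benefits) plays the role of N_i.
    """
    n_i = len(benefits)
    if n_i == 0:
        return Fraction(1)          # hyperedge belongs to no reconvergence
    total = sum(benefits)
    if total == 0:
        # Assumption: the source does not define the weight in this case.
        raise ValueError("cut benefits sum to zero; weight undefined")
    return Fraction(n_i, total)     # reciprocal of the average cut benefit


# Hypothetical path of length 4; after the cut the two pieces measure 2 and 3
# (each piece includes the virtual edge to the cutting node T).
cb = cut_benefit(4, 2, 3)
print(cb)                     # 1
print(edge_weight([]))        # 1
print(edge_weight([cb, 3]))   # 1/2
```

A larger average cut benefit gives a smaller weight, which is what makes a min-cut partitioner prefer to cut that hyperedge.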

V. EXPERIMENTAL RESULTS
This section evaluates the efficiency of RALP. RALP has been implemented in C++ and tested on Red Hat Enterprise Linux 6 with a 2.30 GHz Intel Core i5-2410M CPU and 1 GB of memory. In the test, a total of 9 circuits from the benchmark given by [6] are evaluated on the optimal 3D FPGA architecture SP(20, 2) [12] (similar to the architecture shown in Fig. 1(b): the distance between two adjacent 3D-SBs in the X-direction is 2, and there are 20 TSVs in every 3D-SB). The P&R algorithms used for the RALP results are the same as in TPR.

A. The estimation effect of 3D-reconvergence
To demonstrate that 3D-RV can be used to evaluate the wire length trend in 3D FPGA design, Table I shows the half perimeter (HP) obtained after placement together with the average 3D-RV obtained before placement for two benchmarks, alu4 and des, for 2-layer and 3-layer designs. For every benchmark, we perform 10 experiments. The HP values are sorted in ascending order. Fig. 7 shows the corresponding data as graphs for the 2-layer cases. From the table and the figure we can see that the 3D-RV value increases with the HP value, which verifies the feasibility of using 3D-reconvergence as a prior wirelength estimator before placement.

B. The Optimization of RALP
As shown in Fig. 6, although RALP runs before placement and routing, the optimization effect of a partitioning approach should be evaluated by the placement and routing results. To show the effect of partitioning on the final layout, we consider the wirelength (WL) and delay at both the placement level and the routing level. To show the advantage of our approach, we also compare our results with TPR [5][6], which uses the min-cut partitioning scheme. To make a fair comparison, the placement and routing tools of the two approaches use the same algorithms and settings. Table II shows the results when the 3D FPGA has 2 layers, and Table III shows the results when the layer number is 3.
From these tables we can see that RALP leads to an average reduction in wire length and delay of about 7.06% and 4.86%, with 7.60% more TSVs used, when the layer number is 2, and an average reduction in wire length and delay of about 4.71% and 4.73%, with 6.74% more TSVs used, when the layer number is 3. These results demonstrate the effectiveness of RALP. As discussed above, the ratio of used TSVs in 3D FPGAs is rather small, so the cost of using more TSVs is negligible compared with the gain from the reduction in delay and wire length.

Although RALP adds an estimation to the partition step, it is still very efficient, and the overhead is small compared with the run time of the later placement and routing stages. Table IV shows the runtime of RALP and of P&R for 2-layer FPGAs. From the results we can see that the runtime of RALP accounts for less than 1% of the total run time of the physical design flow.

VI. CONCLUSION
In this paper, we first show the important role reconvergence can play in 3D FPGA design. To capture the relationship between reconvergence and wire length in the 3D FPGA architecture, a new model called 3D-reconvergence is proposed. Based on 3D-reconvergence, we propose a reconvergence-aware layer partition algorithm for 3D FPGAs. Experimental results show that, compared to the existing 3D FPGA partitioning approach, average wire length and delay are reduced by 7.06% and 4.86% for 2 layers, and by 4.71% and 4.73% for 3 layers, respectively.

VII. REFERENCES
[1] R.S. Patti, Three-dimensional integrated circuits and the future of system-on-chip designs, Proc. IEEE, June 2006, vol. 94, no. 6.
[2] Young-Su Kwon, Payam Lajevardi, Anantha P. Chandrakasan, Frank Honore, Donald E. Troxel, A 3-D FPGA wire resource prediction model validated using a 3-D placement and routing tool, SLIP, San Francisco, April 2005.
[3] J. Rose, S. Brown, Flexibility of interconnection structures for field-programmable gate arrays, IEEE Journal of Solid-State Circuits, 1991, vol. 26.
[4] V. Betz and J. Rose, VPR: a new packing, placement and routing tool for FPGA research, Seventh International Workshop on Field-Programmable Logic and Applications, London, UK, 1997.
[5] C. Ababei, H. Mogal, K. Bazargan, Three-dimensional place and route for FPGAs, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, June 2006, vol. 25, no. 6.
[6]
[7] K. Siozios, A. Bartzas, D. Soudris, Architecture-level exploration of alternative interconnection schemes targeting 3D FPGAs: a software-supported methodology, Int'l Journal of Reconfigurable Computing, 2008.
[8] K. Siozios, D. Soudris, A tabu-based partitioning and layer assignment algorithm for 3-D FPGAs, IEEE Embedded Systems Letters, 2011, vol. 3.
[9] Wentao Sui, Sheqin Dong, Jinian Bian, Wirelength-driven force-directed 3D FPGA placement, Proc. of the 20th Great Lakes Symposium on VLSI, May
[10] M. Hutton, Characterization and parameterized generation of digital circuits, Ph.D. dissertation, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada.
[11] S. Balachandran, D. Bhatia, A prior wirelength and interconnect estimation based on circuit characteristics, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2005, vol. 24.
[12] Chia-I Chen, Bau-Cheng Lee, Juinn-Dar Huang, Architectural exploration of 3D FPGAs towards a better balance between area and delay, DATE, 2011.
[13] G. Karypis, R. Aggarwal, V. Kumar, Shashi Shekhar, Multilevel hypergraph partitioning: applications in VLSI domain, IEEE Transactions on VLSI Systems, 1999, vol. 7.

TABLE I. 3D-RV & HP FOR ALU4 & DES
Benchmarks: ALU4, DES. Architectures: 2-layer, 3-layer. Columns for each benchmark and architecture: 3D-RV, HP, over 10 experiments.

Fig. 7. 3D-RV & HP for 2-layer designs of alu4 and des: (a) alu4, (b) des

TABLE II. EXPERIMENTAL RESULTS FOR 2-LAYER FPGA (TSV)
Columns, for each of TPR and RALP: placement-level result (WL, Delay), routing-level result (WL, Delay), TSV number.
Rows: Alu4, Misex3, Ex5p, Apex4, Apex2, Seq, Des, Ex1010, Pdc, Average.
Improve: placement-level WL [ ]%, placement-level Delay -5.61%, routing-level WL -7.06%, routing-level Delay -4.86%, TSV number +7.60%.

TABLE III. EXPERIMENTAL RESULTS FOR 3-LAYER FPGA
Columns, for each of TPR and RALP: placement-level result (WL, Delay), routing-level result (WL, Delay), TSV number.
Rows: Alu4, Misex3, Ex5p, Apex4, Apex2, Seq, Des, Ex1010, Pdc, Average.
Improve: placement-level WL -8.90%, placement-level Delay -1.86%, routing-level WL -4.71%, routing-level Delay -4.73%, TSV number +6.74%.

TABLE IV. RUNTIME (SECONDS) OF RALP
Columns: ALU4, MISEX3, EX5P, APEX4, APEX2, SEQ, DES, EX1010, PDC.
Rows: RALP, Place, Route.
