
IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL.??, NO.??, JANUARY(?)

Placement and Routing for 3D System-On-Package Designs

Jacob Minz, Eric Wong, Mohit Pathak, and Sung Kyu Lim, Member, IEEE

Abstract: 3D packaging via System-On-Package (SOP) is a viable alternative to System-On-Chip (SOC) to meet the rigorous requirements of today's mixed-signal system integration. In this article, we present the first physical design algorithms for thermal and power supply noise-aware 3D placement and crosstalk-aware 3D global routing. Existing approaches consider the thermal distribution, power supply noise, and crosstalk issues as an afterthought, which may require an expensive cooling scheme, more decoupling capacitors (= decap), and additional routing layers. Our goal is to overcome this problem with our thermal/decap/crosstalk-aware 3D layout automation tools. The traditional design objectives such as performance, area, wirelength, and via cost are considered simultaneously to ensure high-quality results. The related experimental results demonstrate the effectiveness of our approaches.

Index Terms: System-On-Package, mixed-signal CAD, 3D packaging, placement and routing, thermal distribution, power supply noise, crosstalk.

[Fig. 1. Illustration of 3D System-On-Package (top-down view and cross-section view), where A, D, M, and P respectively denote analog chip, digital chip, memory module, and embedded passive blocks.]

I. INTRODUCTION

The semiconductor industry is beginning to question the viability of the System-On-Chip (SOC) approach due to its low-yield and high-cost problems. Recently, 3D packaging via System-On-Package (SOP) [1], [2], [3] has been proposed as an alternative solution to meet the rigorous requirements of today's mixed-signal system integration (footnote 1). SOP is the 3D integration of multiple functions in a miniaturized package, achieved by thin-film embedding.
The 3D SOP concept optimizes ICs for transistors and the package for integration of digital, RF, optical, sensor, and other functions. It accomplishes this by both build-up SOP, similar to IC fabrication, and stacked SOP, similar to parallel board fabrication. The uniqueness of 3D SOP lies in its highly integrated or embedded RF, optical, or digital functional blocks and sensors, in contrast to stacked ICs and stacked packages.

Due to the high complexity of designing large-scale SOP under multiple objectives and constraints, computer-aided design (CAD) tools have become indispensable. Thus, innovative ideas on CAD tools for SOP technology are crucial to fully exploit the potential of this emerging technology. However, there exist very few tools, if any, that handle the complexity of automatic 3D SOP layout generation.

Footnote: Jacob Minz, Eric Wong, Mohit Pathak, and Sung Kyu Lim are with the School of Electrical and Computer Engineering, Georgia Institute of Technology. © 2005 IEEE.

Footnote 1: A special issue on SOP (vol. 27, issue 2, May 2004) provides a comprehensive survey of the state of the art in SOP technology.

This article tackles the three most crucial issues that threaten the reliability of the SOP paradigm: thermal distribution, power supply noise, and crosstalk. First, thermal issues can no longer be ignored in high-performance 3D packages due to higher power densities, among other factors. High temperatures not only require more advanced heat sinks but also degrade circuit performance. Interconnect delay increases with temperature, which degrades circuit timing. If timing deteriorates enough, logic faults can occur. Hence, thermal issues must be considered early in the design process. Second, the continuing trend of reducing power supply voltage has resulted in reduced noise margin, which affects reliability and may even cause functional failures due to spurious transitions.
Active devices in 3D packaging draw a large amount of instantaneous current during switching, which causes simultaneous switching noise (SSN). Third, due to the scaling down of device geometry in deep-submicron technologies, the crosstalk noise between adjacent nets has become a major concern in high-performance packaging design. Increased coupling noise can cause signal delays, logic hazards, and even malfunction of the design. Thus, controlling the level of crosstalk noise in 3D packages is an important task for designers.

However, existing approaches consider these issues as an afterthought, which may require expensive cooling schemes, more decoupling capacitors (= decap), and more routing layers. In addition, many time-consuming iterations are required between full-length thermal/SSN/crosstalk simulation and manual layout repair until a satisfactory result is reached. We note that the placement of modules in 3D SOP design has a huge impact on thermal distribution and SSN, while the routing of the signal nets has a direct impact on crosstalk. Therefore, the goal of this article is to present the

first physical layout algorithm for 3D SOP that combines thermal/decap-aware 3D placement and crosstalk-aware 3D global routing. The traditional design objectives such as performance, area, wirelength, and via costs are considered simultaneously to ensure high-quality results. The related experimental results demonstrate the effectiveness of our approach.

The remainder of this paper is organized as follows. Section II discusses previous works. Section III presents the problem formulation and an overview of our 3D algorithms. Section IV presents the SOP placement algorithm. Section V presents the SOP routing algorithm. Section VI presents the experimental results. We conclude in Section VII.

II. RELATED WORKS

The physical design for 3D integration has recently drawn interest from both academia and industry. Unlike the active physical design research effort in mixed-signal System-On-Chip (SOC) [4], [5] and 3D stacked die technology [6], [7], [8], however, research on 3D System-On-Package (SOP) lags considerably behind due to its short history. The physical design research for SOP has been recently pioneered by the authors [9], [10], [11], [12], [13], [14]. Decoupling capacitor placement has been studied for 2D printed circuit board (PCB) designs [15], [16], [17]. Thermal-aware module placement algorithms have been proposed for PCB designs [18], [19]. Many multi-chip module (MCM) routing algorithms have been proposed in the literature [20], [21], [22], [23], [24], [25]. Works on MCM pin redistribution include [26], [27], [28].

There exist several similarities and differences between MCM and SOP routing. In general, the design objectives and constraints in MCM and SOP routing are quite similar: performance, wirelength, via, layer, and crosstalk optimization are crucial in both.
In addition, the routing process is divided into multiple steps: pin distribution, topology generation, layer assignment, and detailed routing. A notable difference between SOP and MCM routing lies in the fact that there exist multiple device layers in SOP, whereas in MCM there is only one device layer. Therefore, nets in SOP connect pins located in all intermediate layers, and the blocks in each layer behave as obstacles. In MCM, however, all pins are located only at the top layer, and there exist no obstacles except for the wires themselves. This makes the SOP or 3D package routing problem more general than MCM routing.

In SOP designs, some pins in each device layer can be distributed either to the layer above or to the layer below. In addition, some nets that connect blocks in non-neighboring device layers need to penetrate intermediate device layers. Since the blocks in these intermediate layers become obstacles, designers use routing channels in each device layer to insert feed-through vias. Thus, we have to determine which routing channel to use for feed-through vias. These routing channels are also used for intra-layer connection. Thus, after feed-through via insertion is done, we need to decide which remaining channels to use for the intra-layer connection. We then need to finish the connection between the original and distributed pins. In case the pin location is not known for some soft blocks, we need to determine their location either along the boundary or underneath the block during our local routing step.

[Fig. 2. Illustration of the layer structure and routing resource in SOP. Layer stack, top to bottom: placement layer 1, $L_p(1)$; routing interval 1, $RI(1)$, made of top pin distribution layer $L_t(1)$, x-y routing layer $L_r(1)$, and bottom pin distribution layer $L_b(1)$; placement layer 2, $L_p(2)$; routing interval 2, $RI(2)$, made of $L_t(2)$, $L_r(2)$, and $L_b(2)$; placement layer 3, $L_p(3)$. The black and white dots respectively denote the original and redistributed pins. The x denotes a feed-through pin for an x-net to pass through a placement layer using a routing channel. The solid, dotted, and arrowed lines respectively denote signal wires, vias, and feed-through vias.]

III. PROBLEM FORMULATION

A. SOP Layer Structure

The layer structure in multi-layer SOP is illustrated in Figure 2. The placement layers (footnote 2) contain the blocks (such as ICs, embedded passives, opto-electric components, etc.), which from the point of view of physical design are just rectangular blocks with pins along the boundary. The interval between two adjacent placement layers is called the routing interval. A routing interval contains a stack of routing layers sandwiched between pin distribution layers. These layers are actually x-y routing layer pairs so that the rectilinear partial net topologies may be assigned to them. The pin distribution layers in each routing interval are used to evenly distribute pins from the nets that are assigned to this interval. These evenly distributed pins are then connected using the routing layer pairs. Each placement layer consists of a pair of x-y routing layers, so routing is permitted. A feed-through via is used to connect two pin distribution layers from different routing intervals. Thus, the routing channels in each placement layer are used for two purposes: (i) to accommodate feed-through vias, and (ii) to perform local routing, where a limited number of intra-layer connections are made.

In the SOP model, the nets are classified into two categories. The nets that have all their terminals in the same placement layer are called i-nets, while the ones having terminals in different placement layers are called x-nets. The i-nets can be routed in a single routing interval or indeed within the placement layer itself. On the other hand, the x-nets may span more than one routing interval.
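As a rough illustration of the net classification above, the following Python sketch labels each net as an i-net or an x-net from the placement layer of its terminals. The data layout (the `netlist` and `layer_of` dictionaries) and the helper names are assumptions made for this example, as is the reading that an x-net crosses every routing interval lying between its topmost and bottommost placement layers.

```python
from typing import Dict, List, Tuple


def classify_nets(
    netlist: Dict[str, List[str]], layer_of: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Split nets into i-nets (one placement layer) and x-nets (several)."""
    i_nets, x_nets = [], []
    for net, blocks in netlist.items():
        layers = {layer_of[b] for b in blocks}
        if len(layers) == 1:
            i_nets.append(net)
        else:
            x_nets.append(net)
    return i_nets, x_nets


def intervals_spanned(blocks: List[str], layer_of: Dict[str, int]) -> List[int]:
    """Routing intervals RI(i) between the net's extreme placement layers.

    Assumes RI(i) sits between placement layers i and i+1, as in Fig. 2.
    """
    layers = [layer_of[b] for b in blocks]
    return list(range(min(layers), max(layers)))


# Hypothetical three-layer example: blocks A and B on layer 1, C on layer 3.
layer_of = {"A": 1, "B": 1, "C": 3}
netlist = {"n1": ["A", "B"], "n2": ["A", "C"]}
i_nets, x_nets = classify_nets(netlist, layer_of)
```

In this toy case `n1` is an i-net, while `n2` is an x-net that has to pass through both RI(1) and RI(2), which is the situation that requires a feed-through via in placement layer 2.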
We use the Floor Connection Graph (FCG) [29], illustrated in Figure 3, to model the placement layer, where each routing channel becomes an edge and each channel intersection point becomes a node. In addition, each soft block becomes a node, and pin assignment edges are added to this node to connect to all adjacent channels. This model allows us to determine which boundary to use for the pins with unknown location. Each channel edge is associated with (i) via capacity for feed-through vias, and (ii) wire capacity for local routing. In addition, pin assignment edges have pin capacity for each boundary of a soft block.

Footnote 2: We use placement layer and device layer interchangeably.

[Fig. 3. Graph-based modelling of placement layer. (a) a connection between two hard blocks, (b) $b_2$ is a soft block, and the new pin is located on the bottom boundary. Dotted lines denote pin assignment edges.]

The routing and pin distribution layers in each routing interval are modelled with a standard $x \times y \times z$ 3D grid, where each node represents a routing region and each x/y-direction edge represents a horizontal/vertical boundary between regions. Each z-direction edge represents a group of vias that a region can accommodate. Thus, all edges in this 3D grid are associated with a capacity: x and y edges with wire capacity, and z edges with via capacity.

B. SOP Placement Problem

The following are given as the input to our 3D SOP placement problem: (i) a set of blocks $B = \{B_1, B_2, \ldots, B_m\}$ that represent the various active and passive components in the given SOP design, (ii) the width, height, and maximum switching current of each block, (iii) a netlist $NL = \{n_1, n_2, \ldots, n_k\}$ that specifies how the blocks are connected via electrical wires, (iv) $K$, the number of placement layers (= $|L_p|$) in the 3D packaging structure, and (v) the number of power/ground signal layers along with the location of the power/ground pins.

For each net $n$ from a given netlist $NL$, let $wl_n$ denote the wirelength of $n$. The wirelength $wl_n$ is the sum of the Manhattan distances in the x, y, and z directions, where the z-direction distance is the height of the associated vias (footnote 3). Let $w(i)$, $h(i)$, and $A(i)$ respectively denote the width, height, and area of the placement layer $L_p(i)$. Let $A_t = \max_i\{w(i)\} \times \max_j\{h(j)\}$, i.e., the product of the maximum width and the maximum height among all placement layers. Let $A_s = \sum_i A(i)$.
Let $w_{ave}$ and $h_{ave}$ denote the average width and height among all placement layers. Let $A_d = \sum_i \{ |w_{ave} - w(i)| + |h_{ave} - h(i)| \}$, i.e., the sum of the deviations of the dimensions among all placement layers. Lastly, the area objective of 3D SOP placement, denoted $A_{tot}$, is the weighted sum of $A_t$, $A_s$, and $A_d$. Let $T_{tot}$ denote the maximum temperature among all blocks. Let $D_{tot}$ denote the total amount of decoupling capacitance required to suppress the simultaneous switching noise (SSN) under the given tolerance value. Lastly, the goal of the SOP Placement Problem is to find the location of each block in the placement layers such that the following cost function is minimized while the SSN constraint is satisfied (footnote 4):

$$w_1 A_{tot} + w_2 \sum_{n \in NL} wl_n + w_3 D_{tot} + w_4 T_{tot} \tag{1}$$

In addition, decaps are required to be placed adjacent to the blocks that require them.

Footnote 3: We assume that the height of vias connecting two x-y layers in a routing and pin distribution layer pair is one grid unit, whereas the height of feed-through vias that penetrate a placement layer is 6 grid units.

[Fig. 4. Illustration of crosstalk computation between two Steiner trees. The numbers denote the coupling length. (a) crosstalk is 7, (b) crosstalk-free routing.]

C. SOP Routing Problem

For each net $n$ from a given netlist $NL$, let $xt_n$ and $v_n$ respectively denote the amount of crosstalk and the number of vias associated with $n$. Let $cl(n, m)$ denote the coupling length between $n$ and $m$ as illustrated in Figure 4. We define $xt_n$ as follows:

$$xt_n = \sum_{m \in NL,\, m \neq n} \frac{cl(n, m)}{|z(n) - z(m)|}$$

where $z(n)$ denotes the routing layer that contains net $n$. For each net $n$, let $d_n(i)$ denote the Elmore delay [30] at sink $i$. Then, the maximum sink delay of net $n$, denoted $d_n$, is $\max\{d_n(i) \mid i \in n\}$.
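The crosstalk metric $xt_n$ defined above, i.e., the sum over all other nets $m$ of $cl(n, m) / |z(n) - z(m)|$, can be illustrated with a short Python sketch. It assumes the coupling lengths and layer indices are already known and stored in plain dictionaries (hypothetical names). Since the expression divides by the layer separation, the sketch also assumes that any two coupled nets sit on different routing layers, and it raises an error otherwise.

```python
from typing import Dict, FrozenSet


def crosstalk(
    net: str,
    coupling: Dict[FrozenSet[str], float],
    layer: Dict[str, int],
) -> float:
    """xt_n = sum over m != n of cl(n, m) / |z(n) - z(m)|."""
    total = 0.0
    for pair, length in coupling.items():
        if net not in pair or length == 0:
            continue  # pair does not involve this net, or no coupling
        (other,) = pair - {net}
        gap = abs(layer[net] - layer[other])
        if gap == 0:
            # Assumption of this sketch: coupled nets never share a layer.
            raise ValueError(f"{net} and {other} share a routing layer")
        total += length / gap
    return total


# Hypothetical coupling lengths and layer assignment for three nets.
coupling = {frozenset({"n1", "n2"}): 7.0, frozenset({"n1", "n3"}): 4.0}
layer = {"n1": 1, "n2": 2, "n3": 3}
xt_n1 = crosstalk("n1", coupling, layer)  # 7/1 + 4/2
```

Moving `n2` one layer further away from `n1` would halve its contribution, which is the effect a crosstalk-aware layer assignment can exploit.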
The performance objective of SOP global routing is to minimize the following cost function:

$$P_x = \max\{d_n \mid n \in NL\}$$

For each routing interval $i$ in a 3D package with $K$ placement layers, let $L_t(i)$, $L_r(i)$, and $L_b(i)$ respectively denote the top pin distribution layer pairs, routing layer pairs, and bottom pin distribution layer pairs. There exist three kinds of connections in each routing interval $i$: top (connection between $L_p(i)$ and $L_t(i)$), middle (connection between $L_t(i)$ and $L_b(i)$), and bottom (connection between $L_b(i)$ and $L_p(i+1)$). We use the routing layer pairs in $L_t(i)$, $L_r(i)$, and $L_b(i)$ respectively for the top, middle, and bottom connections. We construct the routing grid, denoted $G(i)$, that contains all the distributed pins from $L_t(i)$ and $L_b(i)$ in an $m \times n$ 2D grid. These pins are connected by a graph-based Steiner tree topology generation algorithm (we use an existing heuristic [31] for this purpose). The total number of layers used in a SOP global routing solution is given by:

$$L_{tot} = \sum_{1 \le i \le K} \left( |L_t(i)| + |L_r(i)| + |L_b(i)| \right)$$

Footnote 4: In this weighted cost function, the area term $A_{tot}$ is normalized to $A_{tot}/A_0$, where $A_0$ is the footprint area of the initial solution used for Simulated Annealing. This ensures that the area term stays close to one during the annealing process. The same normalization is applied to the other objectives as well: wirelength, decap, and thermal.
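To make the placement objective concrete, here is a minimal Python sketch of the area term $A_{tot}$ and of the normalized weighted cost of Equation (1) as described in footnote 4. The weight values, the dictionary keys, and the assumption that $A(i) = w(i) \cdot h(i)$ are illustrative choices for this example, not values taken from the article. The sketch also assumes that every objective is nonzero in the initial solution.

```python
from typing import Dict, Sequence


def area_objective(
    widths: Sequence[float],
    heights: Sequence[float],
    weights: Sequence[float] = (1.0, 1.0, 1.0),  # hypothetical weights
) -> float:
    """A_tot = weighted sum of A_t, A_s, and A_d over all placement layers."""
    a_t = max(widths) * max(heights)                   # max width times max height
    a_s = sum(w * h for w, h in zip(widths, heights))  # assumes A(i) = w(i) * h(i)
    w_ave = sum(widths) / len(widths)
    h_ave = sum(heights) / len(heights)
    a_d = sum(abs(w_ave - w) + abs(h_ave - h) for w, h in zip(widths, heights))
    return weights[0] * a_t + weights[1] * a_s + weights[2] * a_d


def placement_cost(
    current: Dict[str, float],
    initial: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Weighted cost where each term is divided by its initial-solution value."""
    return sum(weights[k] * current[k] / initial[k] for k in weights)


# Hypothetical two-layer design: layer sizes 2x3 and 4x1.
initial = {
    "area": area_objective([2, 4], [3, 1]),
    "wirelength": 50.0,
    "decap": 2.0,
    "temp": 80.0,
}
weights = {k: 1.0 for k in initial}
cost_at_start = placement_cost(initial, initial, weights)
```

With this normalization every term equals one for the initial solution, so the starting cost is simply the sum of the weights, and the weights directly express the relative importance of the four objectives.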

[Fig. 5. Overall design flow: module netlist; SOP 3D module placement (simulated annealing, noise/thermal analysis); SOP 3D global routing (1. pin redistribution, 2. topology generation, 3. layer assignment, 4. channel assignment, 5. local routing); 3D SOP layout.]

[Fig. 6. Top: 3D grid of a SOP for thermal modelling, with cell dimensions dx, dy, dz and neighbors xp, xn, yp, yn, zp, zn.]

Section V-D discusses how to compute $L_r(i)$ and Section V-E discusses how to compute $L_t(i)$ and $L_b(i)$. Lastly, the formal definition of the SOP Global Routing Problem is as follows: given a 3D placement and netlist, generate a routing topology for each net $n$, assign $n$ to a set of routing layers, and assign all pins of $n$ to legal locations. All conflicting nets are assigned to different routing layers while satisfying various wire/via capacity constraints. The objective is to minimize the following cost function:

$$w_5 L_{tot} + w_6 P_x + \sum_{n \in NL} \left( w_7\, xt_n + w_8\, wl_n + w_9\, v_n \right) \tag{2}$$

We minimize $|L_r|$ during our layer assignment step and $|L_t| + |L_b|$ during our channel assignment step. $P_x$ is the focus during our topology generation step. Wirelength, via, and crosstalk minimization are addressed in all steps of our global router.

D. Overview of the Approach

An overview of our 3D SOP physical design tool is shown in Figure 5. The input to the flow is a module-level netlist that specifies how the SOP modules, such as digital dies, analog dies, embedded passives, and opto/MEMS modules, are connected. Our 3D module placement is based on the Simulated Annealing approach [32], where we examine many candidate 3D module placement solutions and evaluate each of them using our thermal and power supply noise analyzers. More details are presented in Section IV. Our 3D global routing is decomposed into five steps, where the IO pins from the blocks are first evenly distributed using pin distribution layers to alleviate the congestion problem.
We then construct a routing tree topology for each net and assign the nets to the routing layers to minimize crosstalk. We place the feed-through vias for x-nets and finish the connections in each placement layer. More details are presented in Section V.

IV. SOP PLACEMENT ALGORITHM

A. Overview of the Algorithm

Simulated Annealing is a very popular approach for module placement due to its high-quality solutions and flexibility in handling various constraints. We extend the existing 2D Sequence Pair scheme [33] to represent our 3D module placement solutions. The Simulated Annealing procedure starts with an initial multi-layer placement along with its cost in terms of area, wirelength, decap, and maximum temperature. We then make a random perturbation (move) to the initial solution to generate a new 3D placement solution and measure its cost. We perform 3D SSN analysis to compute the amount of decap needed. The algorithm does a one-time set-up of the thermal matrices. These matrices are used during incremental temperature calculations to evaluate the thermal cost. The thermal modelling and evaluation are explained in Section IV-B. White space insertion (discussed in Section IV-D) is done and the increase in footprint area is estimated. If the new cost is lower than the old one, the solution is accepted; otherwise the new solution is accepted with a probability that depends on the temperature of the annealing schedule. We examine a pre-determined number of candidate solutions at each temperature. The temperature is decreased exponentially, and the annealing process terminates when the freezing temperature is reached. The actual decap allocation is done after optimization using the LP solver discussed in Section IV-D.

B. 3D Thermal Analysis

We use a 3D thermal resistance mesh, as shown in Figure 6, for thermal analysis. Each node models a small volume of the 3D SOP substrate, and each edge denotes the connectivity between two adjacent regions.
This is equivalent to using a discrete approximation of the steady-state thermal equation $k \nabla^2 T = -P$, where $k$ is thermal conductivity, $T$ is temperature, and $P$ is power. This results in the matrix equation $R \cdot P = T$. Since the matrix computation involved in thermal analysis for each candidate 3D placement solution is expensive, we tackle this problem with the following scheme. Assuming that the thermal conductivities of the modules are similar, swapping the module locations does not change the thermal resistance matrix $R$ significantly. This means that matrix $R$ only needs to be computed once in the beginning. To calculate the temperature profile of a new block configuration, the power profile $P$ needs to be updated and then multiplied by $R$. Alternatively, a change in power profile $\Delta P$ can be defined. Multiplying $R$ and $\Delta P$ will give the change in temperature profile $\Delta T$. Adding $\Delta T$ to the old temperature profile will give the new temperature profile. These equations summarize the two methods:

$$T_{new} = R \cdot P_{new}$$

$$\Delta P = P_{new} - P_{old}, \quad \Delta T = R \cdot \Delta P, \quad T_{new} = T_{old} + \Delta T$$

Swapping two blocks usually has a small effect on the power profile, so $\Delta P$ should be sparse. This reduces the number of multiplications used by the second method at the expense of doing extra additions and subtractions.

[Fig. 7. Illustration of 3D power supply modelling. (a) multi-layer power supply network, where multiple sets of device, routing, and power supply layers are stacked together, (b) 3D-grid modelling, where black and gray nodes respectively denote power supply and consumption nodes.]

[Fig. 8. Illustration of SSN calculation. The dominant current source for block A is $s_1$, which is not located in the same layer. The dominant (shortest) path $p_0$ carries $I_A/6$ amount of current, where $I_A$ denotes the current demand of A. The block C draws current from $s_2$ and $s_3$ using $p_1$, $p_2$, and $p_3$ (each of these carries $I_C/3$ amount of current). The resistance of $p_{34}$, the overlap between $p_3$ and $p_4$, contributes to the SSN at B and C.]

3D Decap Allocation (LP formulation):

$$\text{Maximize } S = \sum_{k=1}^{H} \sum_{j \in N_k} \beta_{kj}\, x_k^{(j)} \tag{3}$$

Subject to

$$\sum_{j \in N_k} x_k^{(j)} \le A_k, \quad k = 1, 2, \ldots, H \tag{4}$$

$$\sum_{k=1}^{H} x_k^{(j)} \le S^{(j)}, \quad j = 1, 2, \ldots, M \tag{5}$$

$$x_k^{(j)} \ge 0, \quad \forall k, j \tag{6}$$

C. 3D Power Supply Noise Modelling

We model the P/G network for 3D SOP as a 3D grid graph as shown in Figure 7. The edges in the grid graph have inductive and resistive impedances. The mesh contains power-supply points and connection points, which supply and consume currents. The current distribution for the blocks is inversely proportional to the impedance of the path from the source.
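The statement that current divides in inverse proportion to path impedance can be sketched in a few lines of Python. This is only the textbook current-divider rule applied to impedance magnitudes; the function name and the example values are hypothetical, and an actual P/G analysis would work with complex impedances on the full grid.

```python
from typing import Dict


def split_current(demand: float, path_impedance: Dict[str, float]) -> Dict[str, float]:
    """Share a block's current demand among its supply paths.

    Each path receives a share proportional to 1/impedance, so a path with
    twice the impedance carries half the current.
    """
    admittance = {src: 1.0 / z for src, z in path_impedance.items()}
    total = sum(admittance.values())
    return {src: demand * y / total for src, y in admittance.items()}


# Hypothetical block drawing 6 A through two paths of impedance 1 and 2 ohms.
shares = split_current(6.0, {"s1": 1.0, "s2": 2.0})
```

Here the low-impedance path from `s1` carries twice the current of the path from `s2`, which is why the nearest supply pin tends to become the dominant current source for a block.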
The dominant current source for a block is defined as the voltage source supplying significantly more power to the block than any other neighboring source. The dominant path for a block is the path from the dominant supply to the block causing the largest drop in voltage. It has been shown experimentally in [34] that the shortest path between the dominant current source (nearest Vdd pins) and the block offers highly accurate SSN estimation within reasonable runtime. In our 3D SSN analysis engine, we compute dominant paths for all blocks, which are dynamically updated whenever a new placement solution is evaluated in terms of SSN.

Let $P_k$ be the dominant current path for block $k$. Then $T_k = \{P_j : P_j \cap P_k \neq \emptyset\}$ denotes the set of dominant paths overlapping with $P_k$ ($T_k$ includes $P_k$ itself). Let $P_{jk}$ be the overlapping segments between paths $P_j$ and $P_k$. Let $R_{P_{jk}}$ and $L_{P_{jk}}$ denote the resistance and inductance of $P_{jk}$. After the current paths and their values have been determined for all blocks, the SSN for $B_k$ is given by

$$V^{k}_{noise} = \sum_{P_j \in T_k} \left( i_j R_{P_{jk}} + L_{P_{jk}} \frac{d i_j}{dt} \right)$$

where $i_j$ is the current in the path $P_j$, which is the sum of all currents through this path to various consumers. An illustration is shown in Figure 8. The weights of $i_j$ and its rate of change are the resistive and inductive components of the path. Let $Q^k$ denote the maximum charge drawn from the power supply by block $B_k$. If $\theta = \max(1, V^{k}_{noise} / V^{lim}_{noise})$, where $V^{lim}_{noise}$ is the noise tolerance, the decap allocated to block $B_k$ is given by

$$D^k = (1 - 1/\theta) \frac{Q^k}{V^{lim}_{noise}}, \quad 1 \le k \le M$$

where $M$ denotes the total number of blocks to be placed. Finally, the decap cost is given by $D_{tot} = \sum_{k=1}^{M} D^k$.

[Fig. 9. LP-based 3D decap allocation.]

D. 3D Decoupling Capacitor Placement

In our LP-based 3D decap allocation shown in Figure 9, $x_k^{(j)}$ denotes the amount of decap allocated from white space $k$ to block $j$. The objective is to maximize the utilization of white spaces for decap allocation.
Constraint (4) limits the total allocation of a white space to its total area, where $A_k$ denotes the area of white space $k$, and $N_k$ denotes the neighboring blocks of $k$. Constraint (5) ensures that the total amount of decap allocated to each block does not exceed its demand, where $S^{(j)}$ denotes the demand. Unlike the 2D decap assignment in [34], the neighboring blocks in the 3D case are the adjacent blocks either from the same placement layer or from neighboring layers. The assumption here is that the white space from different layers can be used to allocate decap. To facilitate this, we introduce predetermined parameters $\beta_{kj}$ to control decap allocation to block $j$ from white space module $k$. $\beta_{kj}$ evaluates the usefulness of white space $k$ as a decap for block $j$. The value of $\beta_{kj}$ decreases with increasing distance of the white space from the module that requires decap. Hence, allocations of decap from distant white spaces are avoided.

[Fig. 10. Illustration of 3D decap allocation. (a) 3D placement, (b) X-expansion, (c) XY-expansion, where the darker blocks denote the neighboring blocks of the decap (= white space, ws) inserted. Note that blocks from other layers can utilize the white space for decap insertion.]

Coarse Pin Distribution
1: CP = m × n grid;
2: for (each pin p ∈ NL)
3:     s = slot closest to p and under-utilized;
4:     place p in s;
5: C = restricted multi-level clustering of pins;
6: hgt = height of C;
7: B(hgt) = initial m × n placement at top level;
8: for (i = hgt downto 0)
9:     move clusters in C(i) to optimize cost;
10:    B(i) = new m × n placement at level i;
11:    project B(i) to B(i − 1);
12: return B(0);

The second aspect of the formulation deals with the generation of $A_k$, the optimal area that will accommodate all decap without too much penalty on the final footprint area. The decap budget $S^{(j)}$ for each block is determined through SSN analysis of a particular 3D placement. The decap allocation involves several iterations between white space insertion and LP solving to reach a good solution in terms of both the completeness of the decap allocated and the additional increase in area. The white space is generated by expanding the floorplan in the X and Y directions as illustrated in Figure 10. In a multi-layer placement, however, we have to take additional care to balance the expansion in each layer, so as to minimize the expansion of the total footprint area. The expansion is proportional to the decap demand of each module. In our Sequence Pair-based 3D placement, we modify the horizontal and vertical constraint graphs to expand the placement into the X and Y directions, respectively.
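For reference, the per-block decap estimate from Section IV-C, $D^k = (1 - 1/\theta)\, Q^k / V^{lim}_{noise}$ with $\theta = \max(1, V^{k}_{noise} / V^{lim}_{noise})$, can be written as a small Python helper. The function and argument names are hypothetical, and treating this value as the budget $S^{(j)}$ that drives white space expansion is an assumption of the sketch rather than something the text states explicitly.

```python
from typing import Iterable, Tuple


def decap_budget(v_noise: float, q_max: float, v_lim: float) -> float:
    """Decap needed by one block: (1 - 1/theta) * Q / V_lim.

    theta = max(1, v_noise / v_lim), so a block whose noise is already
    within tolerance gets theta = 1 and needs no decap.
    """
    theta = max(1.0, v_noise / v_lim)
    return (1.0 - 1.0 / theta) * q_max / v_lim


def total_decap(blocks: Iterable[Tuple[float, float]], v_lim: float) -> float:
    """D_tot: sum of the per-block budgets over (v_noise, q_max) pairs."""
    return sum(decap_budget(v, q, v_lim) for v, q in blocks)


# Hypothetical values: tolerance 0.2 V; one quiet block and one noisy block.
quiet = decap_budget(v_noise=0.1, q_max=1.0, v_lim=0.2)
noisy = decap_budget(v_noise=0.4, q_max=2.0, v_lim=0.2)
```

The noisy block has twice the tolerated noise, so half of its charge demand must be supplied by decap, while the quiet block contributes nothing to $D_{tot}$.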
Note that we may have to iterate between white space insertion and allocation if the current expansion does not satisfy all the decap needs. To avoid iterating between LP and white space insertion, we start by adding white spaces generously in the floorplan and then use LP to perform compaction. In our scheme, the XY-expansion is controlled by the $\beta$ parameters discussed earlier, where the existing white spaces have higher ranks than the newly inserted ones. The actual amount of expansion is determined by a parameter $\alpha \ge 1$, where we increase the amount of expansion by a factor of $\alpha$ for each block that needs decap. This ensures that the initial expansion satisfies the LP constraint. Since LP determines the individual $x_k^{(j)}$, we use these assignment values to decide which white space insertions were not necessary and can thus be removed.

As exact decap allocation is a time-consuming process, it is impractical to solve an LP problem for every candidate solution. However, our white space insertion takes $O(n^2)$, which is computationally equivalent to computing block locations using longest path computation for area. Therefore, we can afford white space insertion (and the calculation of area increase due to decap insertion) at every move. We perform the actual LP-based decap allocation at the end of the annealing process as a compaction stage.

[Fig. 11. Pseudocode for coarse pin distribution algorithm.]

V. SOP GLOBAL ROUTING

A. Overview of the Approach

Our 3D global router is divided into the following five steps:
1) pin redistribution: we first determine which set of i-nets and x-net segments are assigned to each routing interval. The pins from these nets are then evenly distributed in the top and bottom pin distribution layers. Our pin redistribution step is further divided into three steps: coarse pin distribution, net distribution, and detailed pin distribution.
2) topology generation: Steiner trees are generated for all nets in each routing interval so that the performance of the routed design is optimized.
3) layer assignment: the routed nets are assigned to a unique routing layer pair so that the total number of layers used is minimized.
4) channel assignment: for each x-net, the location of its feed-through via in the routing channel is determined. We also assign channels and finish the connections for the i-nets that are to be routed in each placement layer.
5) local routing: we finish the connections between the pins now located in the routing channels and the pins on the block boundaries. We also determine the location of pins from soft blocks.

We use an existing max-cut partitioning heuristic [35] for the net distribution in step 1. For step 2, we use an existing RSA/G heuristic [31] to generate the net topologies. In addition, we use congestion-driven rip-up-and-reroute [36] for step 5. Therefore, the focus of this paper is to develop heuristics for the remaining parts of steps 1, 3, and 4.

B. Coarse Pin Distribution

Pseudocode for our coarse pin distribution algorithm is shown in Figure 11. First, we assign all pins in the placement layers to a nearby grid point in CP, an $m \times n$ 2D grid, while trying to balance the number of pins assigned to each grid point (lines 1-4). An illustration is shown in Figure 12. We impose a pin capacity on each grid point so that the pins are evenly distributed in CP. Our approach is to visit the pins in a random order and find the best grid point for each pin. For each pin $p$, the grid point that is closest to the original location of $p$ and has not violated the pin capacity constraint is chosen (line 3). After this process is finished, CP serves as the starting point of our min-cut placement-based algorithm, where each grid point corresponds to a partition. We then iteratively improve the quality of this initial solution via a move-based approach.

[Fig. 12. Illustration of coarse pin distribution. Pins along the external boundary are not shown for simplicity.]

[Fig. 13. Illustration of the gain computation for coarse pin distribution. A net $n$ indicated by the black node is moved from $P_1$ to $P_2$. Then, $g_d(n) = 2$, $g_w = 0$, and $g_b = 1$, where $P_2$, with $deg(P_2) = 3$, becomes the maximum-degree partition.]

In our new heuristic algorithm, the cost function is based on (i) how far the new pin location is from the initial location, (ii) total wirelength, and (iii) how evenly distributed the inter-partition connections are. For each pin $p$, we define the displacement gain, denoted $g_d(p)$, to represent how much the distance between the original and new location is reduced if $p$ is moved to another partition. We define the wirelength gain, denoted $g_w(p)$, to represent how much the length of the nets that contain $p$ (estimated by the half-perimeter of the bounding box) is reduced if $p$ is moved to another partition. For a partition $P$, let $deg(P)$ denote the number of nets that have connections to $P$. Then, the cutsize balance factor is defined to be the difference between the maximum and minimum $deg(P)$ among all partitions. We define the balance gain, denoted $g_b(p)$, to represent how much the cutsize balance factor is reduced.
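The cutsize balance factor and the balance gain can be sketched in Python as follows. The sketch reads $deg(P)$ as the number of nets with at least one pin in partition $P$, which is one plausible interpretation of the definition above; the data layout and function names are assumptions made for this example.

```python
from typing import Dict, List


def degrees(
    nets: Dict[str, List[str]], partition_of: Dict[str, str], partitions: List[str]
) -> Dict[str, int]:
    """deg(P): number of nets with at least one pin in partition P (assumed reading)."""
    deg = {part: 0 for part in partitions}
    for pins in nets.values():
        for part in {partition_of[p] for p in pins}:
            deg[part] += 1
    return deg


def balance_factor(
    nets: Dict[str, List[str]], partition_of: Dict[str, str], partitions: List[str]
) -> int:
    """Difference between the maximum and minimum deg(P) over all partitions."""
    deg = degrees(nets, partition_of, partitions)
    return max(deg.values()) - min(deg.values())


def balance_gain(
    pin: str,
    target: str,
    nets: Dict[str, List[str]],
    partition_of: Dict[str, str],
    partitions: List[str],
) -> int:
    """g_b(p): reduction of the balance factor if `pin` moves to `target`."""
    before = balance_factor(nets, partition_of, partitions)
    moved = dict(partition_of)  # copy so the trial move leaves the input intact
    moved[pin] = target
    return before - balance_factor(nets, moved, partitions)


# Hypothetical instance: two partitions, four pins, three nets.
partitions = ["P1", "P2"]
nets = {"n1": ["a", "b"], "n2": ["c"], "n3": ["d"]}
partition_of = {"a": "P1", "b": "P1", "c": "P1", "d": "P2"}
gain_b = balance_gain("b", "P2", nets, partition_of, partitions)
gain_c = balance_gain("c", "P2", nets, partition_of, partitions)
```

Moving `b` evens out the degrees of the two partitions, whereas moving `c` merely swaps which partition is the larger one, so only the first move has a positive balance gain.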
Our move-based multilevel min-cut partitioning algorithm performs cell moves based on the combined gain function:

$$g(p) = w_1 \cdot g_d(p) + w_2 \cdot g_w(p) + w_3 \cdot g_b(p)$$

Figure 13 shows an illustration of the gain computation. In our multi-level approach, we first perform restricted multi-level clustering (line 5) that preserves the initial $m \times n$ placement result: two pins that are initially in different partitions are not clustered together. At each level of the cluster hierarchy from top to bottom (line 8), we compute the combined gain g(p) for each cluster and perform cluster moves. In order to compute the displacement and balance gain of a group of pins (= cluster), we add the individual displacement and balance gains of all pins in the cluster. When there is no gain at a certain level, we decompose the clusters into the next lower level and perform refinement. This process continues until we obtain a solution at the bottom level (line 12).

C. Detailed Pin Distribution

Detailed Pin Distribution
 1: for (each routing interval i)
 2:     m_i = n_i = total_pin(i);
 3:     DP[i] = m_i × n_i grid;
 4:     for (each pin p ∈ i)
 5:         assign p to DP[i] using CP result;
 6: for (each routing interval i)
 7:     for (each net n ∈ DP[i])
 8:         F_n = displacement force on n;
 9:         obtain new location in DP[i] based on F_n;
10:     L_i = sort nets in DP[i] in lexicographic order;
11:     split L_i and form columns in DP[i];
Fig. 14. Pseudocode for SOP detailed pin distribution algorithm

After coarse pin distribution and net distribution are finished, we know which set of nets is assigned to each routing interval as well as their (evenly distributed) entry/exit points in the pin distribution layers. However, the coarse pin distribution is done based on a 2D grid that merges all placement layers into one. The even pin distribution in this 2D grid offers good enough reference points for net distribution, but it does not consider even pin distribution in each individual routing interval.
In addition, it is also possible that the pin capacity of a routing region in a routing interval is violated. Therefore, the goal of detailed pin distribution is to address these problems in each routing interval so that the subsequent topology generation and layer assignment truly benefit from the even pin distribution. In addition, we use a grid large enough for each routing interval to legalize the pin locations, i.e., each grid point contains only one pin. Since crosstalk minimization is addressed during the prior steps, the major focus of the detailed pin distribution step is on (i) how far the new location is from the original location obtained from coarse pin distribution, and (ii) the total wirelength.

A pseudocode for our detailed pin distribution algorithm is shown in Figure 14. Our force-directed heuristic algorithm encourages all pins from the same net to be placed closer to their center of mass while minimizing the distance between the old and new pin locations. The size of the grid for detailed pin distribution (= DP[i]) is determined for each routing interval so that each pin can be assigned to a unique grid point (lines 1-3). In addition, we project the coarse pin distribution result (= CP) onto this new set of grids (line 5). Note that there still exists overlap among the pins in DP[i] at this point, even though DP is usually finer than CP. In order to remove this overlap, we apply an additional force that slightly pulls each pin towards the center of mass. For each pin p in a net n, the displacement force (line 8) in the x-direction is defined as follows:

$$F_x(p) = \frac{x(m_p) - x(p)}{\mathit{width}(n_p)}$$

where $x(m_p)$ denotes the x-coordinate of M, the center of mass of n, and $\mathit{width}(n_p)$ denotes the width of the bounding box of n. We compute $F_y(p)$ analogously using the y-coordinates. Note that $-1 \le F_x(p), F_y(p) \le 1$. The vector $(F_x(p), F_y(p))$ is then added to $(x_p, y_p)$ (line 9). This minor change to the original pin location helps to remove most of the overlap in DP[i] while not increasing the wirelength too much. We then sort the pins in lexicographic order of their new locations and assign each pin starting from the top-most row in the left-most column (lines 10-11). Due to its simplicity, this deterministic algorithm is quite efficient, and it is effective in reducing the additional wirelength required for pin distribution as well as the total wirelength among all nets, as shown in Section VI.

D. SOP Layer Assignment

For each routing interval i, the routing grid $G_i$ contains all the (redistributed) pins from the top and bottom pin distribution layers ($L_t(i)$ and $L_b(i)$). We generate a Steiner-tree-based routing topology using the RSA/G heuristic [31] to connect these pins. The goal is to minimize the maximum sink delay $D_{max}$ as discussed in Section III-C. The routing layer $L_r(i)$ in each routing interval i consists of several layer pairs, where each pair consists of one layer for horizontal wires and another layer for vertical wires. Thus, we can assign an entire rectilinear routing tree to a routing pair. In addition, two trees that intersect can also be assigned to the same routing pair provided that they do not violate the wire capacity of the routing regions involved. An illustration is shown in Figure 15.

Fig. 15. Illustration of layer assignment

The SOP layer assignment problem is to assign each net to a routing layer pair so that the wire capacity constraint is satisfied and the total number of layer pairs used for all routing intervals is minimized.
Our approach is to perform layer assignment for each routing interval independently. For each routing interval i, we construct a Layer Constraint Graph (LCG) [37], denoted $LCG_i$, as follows: for each net $n \in RI(i)$ we have a node in $LCG_i$. Two nodes $x, y \in LCG_i$ have an edge $e = (x, y)$ between them if net segments $s_x \in x$ and $s_y \in y$ share the same edge in $G_i$, i.e., $s_x$ and $s_y$ share the same boundary of a routing region. Then we use a node coloring algorithm to assign colors to the nodes in $LCG_i$ such that no two nodes sharing an edge are assigned the same color. Let $C_{tot}$ denote the total number of colors used during the node coloring, and let w denote the wire capacity of a routing region boundary. Then, the total number of layer pairs used in this routing interval is computed as follows: $|L_r(i)| = \lceil C_{tot}/w \rceil$. Lastly, a node with color $q \cdot w + r$ ($r < w$) is assigned to layer pair q. Let $h_{max}$ and $v_{max}$ respectively denote the maximum number of wires used among all horizontal and vertical edges in $G_i$. Then, the following is a lower bound on the number of layers used in $L_r(i)$: $|L_r(i)| \ge \max\{h_{max}, v_{max}\}/w$.

SOP Layer Assignment
 1: for (each routing interval i)
 2:     LCG_i = layer constraint graph for i;
 3:     S = sort all nodes in LCG_i;
 4:     color = 0;
 5:     for (node j ∈ S)
 6:         fin[j] = colors used by neighbors of j;
 7:         tmp = lowest used color ∉ fin[j];
 8:         if (tmp = ∅)
 9:             color = color + 1;
10:             j ← color;
11:         else
12:             j ← tmp;
13:     assign a layer pair to each net;
Fig. 16. Pseudocode for SOP layer assignment algorithm

Figure 16 shows the SOP layer assignment algorithm that includes our coloring heuristic. We first sort all nodes in $LCG_i$ in decreasing order of the number of their neighbors (line 3). Let fin[n] denote the set of colors used by the neighbors of n (line 6). We visit the nodes in the sorted order (line 5). If there exists a used color that is not included in fin[n] (line 7), we assign this color to n (line 12).
In case there exist multiple colors that satisfy this condition, we use the lowest color. Otherwise, we introduce a new color and assign it to n (lines 9-10). Lastly, we assign a layer pair to each net based on its color (line 13). In spite of its simplicity, this greedy algorithm provides results that are very close to the lower bound on the total number of layers used, as demonstrated in Section VI.

E. SOP Channel Assignment

Our prior topology generation and layer assignment steps focus on the connections among the distributed pins in the pin distribution layers. After these steps are finished, there remain two kinds of connections: (i) connections between the original and distributed pins, and (ii) feed-through via insertion. For both types of connections, the routing channels in the placement layers are used. During the SOP channel assignment step, each pin in the pin distribution layer is mapped to a routing channel in the neighboring placement layer. In addition, each pin from an x-net that needs to penetrate a placement layer is also mapped to a routing channel in that placement layer. Lastly, we generate a routing topology for each pin-to-channel connection and assign it to a routing layer pair in the pin distribution layer. Each placement layer is modelled with the Floor Connection Graph (FCG) [29] as illustrated in Figure 3.

During the SOP local routing step, we use congestion-driven rip-up-and-reroute [36] to (i) finish the routing for the given set of two-pin connections while satisfying the wire capacity constraint, and (ii) decide the location of the pins for soft blocks along their boundaries. An illustration is shown in Figure 17.
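For reference, the greedy coloring heuristic of Section V-D (Figure 16) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the edge-list input format are hypothetical, colors are numbered from 0 so that color $q \cdot w + r$ maps to layer pair q by integer division, and both the layer-pair count and the lower bound are rounded up to whole layer pairs.

```python
import math


def greedy_layer_assignment(num_nets, conflicts, w):
    """Color the layer constraint graph greedily and map colors to layer pairs.

    conflicts: iterable of (x, y) net pairs whose segments share a
    routing-region boundary. w: wire capacity of a boundary.
    Returns (color per net, layer pair per net, number of layer pairs).
    """
    adj = {v: set() for v in range(num_nets)}
    for x, y in conflicts:
        adj[x].add(y)
        adj[y].add(x)
    # Visit nodes in decreasing order of neighbor count.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color = {}
    used = 0  # number of colors introduced so far
    for v in order:
        forbidden = {color[u] for u in adj[v] if u in color}
        # Lowest already-used color not taken by a neighbor, if any.
        c = next((k for k in range(used) if k not in forbidden), None)
        if c is None:  # otherwise introduce a new color
            c = used
            used += 1
        color[v] = c
    layer_pair = {v: c // w for v, c in color.items()}
    return color, layer_pair, math.ceil(used / w)


def layer_lower_bound(h_max, v_max, w):
    """Lower bound on layer pairs from the busiest horizontal/vertical edge."""
    return math.ceil(max(h_max, v_max) / w)
```

Because up to w different colors share one layer pair, at most w conflicting nets cross any boundary on that pair, which is how the coloring respects the wire capacity.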

Fig. 17. Illustration of (a) channel assignment, (b) local routing

SOP Channel Assignment
 1: for (each placement layer i)
 2:     C(i) = channels in i;
 3:     G(i) = 2D routing grid;
 4:     S = sorted feed-through pins;
 5:     for (each pin p ∈ S)
 6:         best = ∅;
 7:         for (each channel c ∈ C(i))
 8:             compute cost(p, c) and update best;
 9:         T_p = p-to-best routing topology in G(i);
10:         update channel and edge usage in G(i);
11:     repeat line 4-10 for terminal pin set;
12:     perform layer assignment on G(i);
Fig. 18. Pseudocode for SOP channel assignment algorithm

For each placement layer $L_p(i)$, let X(i) denote the set of pins that need to be mapped to a routing channel in $L_p(i)$. X(i) contains pins from $L_b(i-1)$ and $L_t(i)$. The pins in X(i) are grouped into two sets: the terminal pin set $P_t(i)$ for the pins that have terminals in $L_p(i)$, and the feed-through pin set $P_f(i)$ for the pairs of pins that require feed-through vias to penetrate $L_p(i)$. Let C(i) denote the set of routing channels in $L_p(i)$. Each channel $c \in C(i)$ is associated with a pin capacity constraint. The goal of the SOP Channel Assignment Problem is to map each pin $p \in X_i$ to a routing channel $c \in C_i$ for $1 \le i \le K$ and finish the p-to-c connection while satisfying the pin capacity constraint A, i.e., $C_i < A$. For each pair of pins $(p_1, p_2) \in P_f(i)$, we map $p_1$ and $p_2$ to the same channel $c \in C_i$. Let $|L_t(i)|$ denote the number of layers used to finish the connections between pins from $L_p(i)$ and $L_t(i)$, and let $|L_b(i)|$ denote the number of layers used to finish the connections between pins from $L_p(i+1)$ and $L_b(i)$. The objective of SOP channel assignment is to minimize the following cost function:

$$w_5 \sum_{1 \le i \le K} \left( |L_t(i)| + |L_b(i)| \right) + \sum_{n \in NL} \left( w_8 \cdot wl_n + w_9 \cdot v_n \right)$$

A pseudocode for the SOP channel assignment algorithm is shown in Figure 18. We visit each placement layer and assign the feed-through pins and terminal pins to the channels. In addition, an L-shaped routing topology for each pin-to-channel connection is constructed in a 2D grid G(i) (line 3). The signal delay of feed-through vias is larger than that of other types of vias. Since each channel is under a capacity constraint, it is important to assign the feed-through pins to the nearest channels first. Therefore, our strategy is to perform channel assignment for the feed-through pins first to minimize the delay of the x-nets that require feed-through vias. In addition, we give priority to the pins that are included in long nets. Thus, we sort the pins based on wirelength (line 4). Our heuristic algorithm assigns pins to channels based on the cost of mapping: we seek the channel with the best mapping cost for a given pin (lines 6-8). We compute the cost of mapping for a given pin p and a channel c as follows:

$$\mathit{cost}(p, c) = \mathit{room}(c) - \mathit{dist}(p, c) - \mathit{bend}(p, c) - \mathit{cong}(p, c)$$

where room(c) denotes the number of pins c can accommodate until it violates the capacity constraint, dist(p, c) is the Manhattan distance between p and c, bend(p, c) is the number of bends in the connection between p and c, and cong(p, c) denotes the total number of existing connections along the proposed L-shaped route. We choose the channel with the maximum cost(p, c). Note that a channel is represented with a line instead of a point. Thus, dist(p, c) and bend(p, c) are based on the shortest connection between p and any point on c. Upon a pin-to-channel mapping, we update the usage of the channel in C(i) and of the edges in G(i) (line 10). After the channel assignment and topology generation for all pin-to-channel connections are finished, we perform layer assignment using our coloring heuristic presented in Section V-D and compute the total number of layers used.

VI. EXPERIMENTAL RESULTS

We implemented our algorithms in C++/STL and ran our experiments on Linux Beowulf clusters. We tested our algorithms with two sets of benchmarks.
The first set is from the standard GSRC floorplan circuits [38]. The second set, named the GT benchmark, was synthesized from the IBM circuits [39], where we use our multi-level partitioner [40] to divide the gate-level netlist into multiple blocks. The GSRC benchmarks are small to medium-sized in terms of both the number of blocks and nets. (5) The GT benchmarks contain a medium to large number of blocks with dense connections.

(5) The GT benchmark circuits are available for download at our website:

A. SOP Placement Results

The dimension of the area was assumed to be in mm² and the area per unit decap cost was chosen to be 50 mm². The number of placement layers is fixed to four for all experiments. In our experiments, we used the same parameters across the benchmarks. The technology parameters used were 0.01 Ω/µm for wire resistance per unit length, 1 pH/µm for wire inductance per unit length, and 10 fF/µm for wire capacitance per unit length. We report the average runtime for each benchmark measured in seconds. For our thermal analysis we used a grid size of 10 x 10 x 7 in the x, y, and z directions. The number of active layers was four, with three passive layers in between them. The conductivity of the substrate was chosen to be that of silicon (0.1 W/(mm·°C)). The conductivity of the sides of the package was fixed at 0.01 W/(mm·°C). The heat sink was

assumed to be at the top of the package with a conductivity of 0.5 W/(mm·°C), and the conductivity of the bottom of the package was chosen to be 0.2 W/(mm·°C). The power density of the blocks varied from 10 W/mm² to 300 W/mm² depending on the switching current demands. The switching current demands were generated from a formula based on a random number and the size of the block.

Table I shows a comparison among 3D SOP placement algorithms with three different objectives: (1) area/wire-driven ($w_1 = w_2 = 1$ and $w_3 = w_4 = 0$ in Equation (1), Section III-B), (2) decap-driven ($w_1 = w_2 = w_3 = 1$ and $w_4 = 0$), and (3) thermal-driven ($w_1 = w_2 = w_4 = 1$ and $w_3 = 0$). Our baseline is the area/wirelength-driven algorithm. Our decap-driven algorithm achieved a 21% improvement in decap cost over the baseline, with a 7% increase in final area and a 15% increase in wirelength. The temperature decreases by 3%. Our thermal-driven floorplanning achieved a 21% improvement in temperature over the baseline, with a dramatic increase in total area of 76%. Wirelength increased by 28% and the decap requirement increased by 19%. We note that it may be possible to reduce the area and wirelength costs by fine-tuning the parameters.

Table II shows the result of our final 3D placement algorithm that considers all four objectives: area, wirelength, decap, and thermal ($w_1 = w_2 = w_3 = w_4 = 1$ in Equation (1), Section III-B). We note that our area/wire/decap/thermal-driven algorithm improves both decap and temperature over the baseline, by 13% and 9% respectively. There is a reasonable increase in final area and wirelength, by 19% and 15% respectively. We also tested our area/wire/decap algorithm under a thermal constraint. In this case, a candidate solution during the simulated annealing is discarded if its maximum temperature is above a certain threshold (100 °C in our case).
We observe that the area, wirelength, and decap results improve simultaneously at the cost of a slight increase in temperature. The CPU times of the algorithm depend on the number of candidate solutions evaluated. The times reported in the table directly reflect the subset of the configuration space spanned by the algorithm. The constrained algorithm ran much faster, making a small number of moves because fewer moves were accepted.

Figure 19 shows the temperature and decap requirement of each block of the final floorplan of the n300 benchmark. The figure clearly shows that there is little correlation between temperature and decap. This shows that minimizing one objective does not necessarily minimize the other objective, as a positive correlation would indicate. It also means that minimizing both objectives is not mutually exclusive, as a negative correlation would indicate. This matches the block placement results. Table III shows the correlation constants.

Fig. 19. Decap vs. temperature correlation plot for n300 (decap requirement vs. block temperature in °C, with trend line).

TABLE III. Decap vs. temperature correlation constants. Columns: ckt, correlation.

Table IV shows power supply noise simulation results for three 3D placement schemes: non-decap-aware, decap-aware, and decap-aware+decap-placement. The P/G plane structure size is 246 mm × 254 mm, and the top P/G plane pair was modelled using the cavity resonator model [41] and simulated in HSPICE. The placement layer that uses this P/G plane includes 14 active devices. The DC 5V sources are located at four edges in the plane pair, and fourteen current sources exist in the plane pair. As can be seen from Table IV, the SSN for the decap-aware algorithm is lower than that of the non-decap-aware algorithm. The SSN of the noisiest block, blk4, is 1.58V, which is reduced to 1.42V by our decap-aware scheme.
With the insertion of decap, the noise is suppressed to 0.22V. In addition, the total amount of decap required for the non-decap-aware algorithm is 26.7, which is reduced to 19.9 with our decap-aware scheme. The largest amount of decap is used for blk5 (0.50 nF), because of an increase in its SSN after optimization. The numbers show that the SSN was efficiently suppressed and the amount of decap reduced by using our algorithms.

B. SOP Routing Results

Table V shows the characteristics of the GSRC and GT benchmark designs used during routing. In Table VII we compare pin redistribution results. Under the DPD columns, we perform DPD (= detailed pin distribution presented in Section V-C) only: we skip CPD (= coarse pin distribution presented in Section V-B) and assign all i-nets to the routing interval below for net distribution. Under the CPD+DPD columns, we perform CPD using our algorithm, assign all i-nets to the routing interval below for net distribution, and then perform DPD. DPD serves as our baseline, and CPD+DPD demonstrates the impact of our coarse pin distribution. In all cases, we perform detailed pin distribution to legalize the pin locations, i.e., remove overlaps among the pins. We use the following metrics to evaluate our solutions: wirelength

between the original and the new location (dw), total wirelength (wl), crosstalk (xt), and total number of layer pairs used (lyr) in the routing layers. The displacement (dw) and wirelength (wl) results are scaled by 10 6, and the time reported is the average runtime among the GSRC/GT circuits. From the comparison between DPD and CPD+DPD, we note that the displacement result (dw) increases by an average of 50%. However, CPD consistently lowers the total wirelength (wl) by 10% on average and the number of layers (lyr) by 10% on average.

TABLE I. Area/wire-driven vs. decap-driven vs. thermal-driven algorithms. Columns: circuit name and size, followed by area, wire, decap, and temp for each of the area/wire-driven, decap-driven, and thermal-driven algorithms. The final rows report RATIO and TIME.

TABLE II. Thermal/decap-driven and thermal-constraint/decap-driven algorithms (the ratios are relative to the baseline). Columns: circuit name, followed by area, wire, decap, and temp for each of the area/wire/decap/thermal and thermal constraint algorithms. The final rows report RATIO and TIME.

In Table VIII we show our topology generation (RSA/G) and layer assignment (LAYER) results. We used the technology parameters of a 0.13 µm process for the Elmore delay computation. Specifically, a driver resistance of 29.4 kΩ, an input capacitance of 0.050 fF, a unit-length resistance of 0.82 Ω/µm, and a unit-length capacitance of 0.24 fF/µm are used. We report the total wirelength (wl), the Elmore delay of the net with the maximum sink delay (dly), and the lower bound and the actual number of layers used for the top-most routing interval. In general, the GSRC benchmarks have larger delay than the GT benchmarks due to their larger average wirelength.
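To give a feel for how the technology parameters above enter the delay figures, the following Python sketch evaluates the Elmore delay [30] of a single driver-to-sink wire. It is only an illustration under a simplifying assumption: the paper evaluates delays on Steiner trees, whereas this sketch models one unbranched two-pin wire as a lumped driver resistance followed by a distributed RC line and a sink capacitance. The function name and its signature are hypothetical; the default values are the parameters quoted above.

```python
def elmore_wire_delay(length_um,
                      r_drv=29.4e3,      # driver resistance, ohms
                      c_in=0.050e-15,    # sink input capacitance, farads
                      r_unit=0.82,       # wire resistance, ohms per um
                      c_unit=0.24e-15):  # wire capacitance, farads per um
    """Elmore delay in seconds of a two-pin wire of the given length.

    The driver sees the whole wire capacitance plus the sink load; the wire
    resistance sees half of the (distributed) wire capacitance plus the load.
    """
    r_wire = r_unit * length_um
    c_wire = c_unit * length_um
    return r_drv * (c_wire + c_in) + r_wire * (c_wire / 2.0 + c_in)


# A 1000 um wire: about 7.16 ns, dominated by the driver resistance term.
delay_1mm = elmore_wire_delay(1000.0)
```

With these parameters the driver term grows linearly with wire length and dominates for moderate lengths, while the quadratic wire term becomes noticeable only for long wires; this is consistent with circuits of larger average wirelength showing larger delay.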
Our layer assignment algorithm presented in Section V-D achieves results very close to the lower bounds discussed in the same section. For the GT circuits, the layer assignment results are within 10% of the lower bound. For the GSRC circuits, we achieved results equal to the lower bound.

Our channel assignment results are shown in Table IX. The baseline case is where we optimize the wirelength only. We then compare it to our multi-objective channel assignment algorithm that simultaneously optimizes the wirelength, via, and layer usage. Our comparison indicates that the number of layers is consistently and significantly reduced, especially for the bigger GT benchmarks, where an average improvement of 48% is observed. In the case of the second largest benchmark, gt1000, we achieved a 57% improvement. This saving in layer usage comes at the cost of an increase in wirelength and vias. The average increase in wirelength is 12% and 30% for the GSRC and GT benchmarks, respectively. The average increase

in via usage is 63% and 79% for the GSRC and GT benchmarks, respectively. We noted that the channel assignment result is very sensitive to the weighting constants among the objectives used in our cost function. This indicates that the solution space of the channel assignment problem offers many useful tradeoff points.

TABLE IV. Power supply noise simulation results. We report SSN noise for each block placed in the top placement layer. (a) Non-optimized case without any decoupling capacitor. From Table I we need 26.7 decap units to suppress SSN noise. (b) Optimized case without any decoupling capacitor. From Table I we need 19.9 decap units. (c) Optimized case with decoupling capacitors inserted. Columns: blk, no decap, decap aware, decap used.

TABLE V. Benchmark characteristics of GSRC and GT circuits. wcap denotes the wire capacity of each boundary in the routing tiles. CP and DP respectively denote the dimension of the 2D grids used during our coarse and detailed pin distribution steps. Columns: ckts, blk, i-net, x-net, pin, wcap, CP, DP. The DP column reads, in row order: 24x22, 28x26, 34x35, 46x48, 51x50 (GSRC circuits) and 110x101, 138x140, 163x165, 218x219, 225x227 (GT circuits).

Table X reports our local routing results. We report the wirelength, the maximum and average routing demand, and the standard deviation. Our baseline is local routing optimized for wirelength only. We then compare it against our multi-objective local routing algorithm that simultaneously optimizes wirelength and routing demand. In both cases, the same pin demand constraint is imposed. We note that the improvement of our multi-objective algorithm over the baseline is significant, especially for the GSRC circuits: the routing demands were reduced by 33% on average while the wirelength increased by only 10%.
In addition, we reduced the routing demands for the GT benchmarks by 41% on average, with a wirelength increase of 20%. For our biggest benchmark (gt1500), the routing demand reduction is the largest (53%), which comes with the maximum increase in wirelength (23%). This again indicates that the local routing result is very sensitive to the weighting constants among the objectives used in our cost function. The lower standard deviation of our multi-objective algorithm indicates that the routing demand is more evenly distributed (= lower congestion) compared to the wirelength-only case. Lastly, Figure 20 shows a snapshot of a 4-layer SOP with 3 routing intervals for the n200 benchmark.

TABLE VI. Area information of the 4-layer SOP floorplanning for GSRC and GT benchmark designs. The min and max respectively denote the dimension of the minimum and maximum floorplan area, whereas the final column denotes the dimension of the overall footprint. Columns: ckts, min, max, final. The only fully legible row is gt50: min 88x108, max 76x139, final 97x139.

TABLE VII. SOP pin redistribution results using our coarse and detailed pin distribution algorithms. We report the wirelength between the original and the new location (dw), total wirelength (wl), crosstalk (xt), and total number of layer pairs used (lyr) in the routing layers. Columns: ckt, followed by dw, wl, xt, and lyr for each of DPD and CPD+DPD. The final row reports TIME.

VII. CONCLUSIONS AND ONGOING WORKS

In this article, we presented the first physical layout algorithm for 3D SOP designs that includes thermal/decap-aware 3D placement and crosstalk-aware 3D routing. We extended the thermal and SSN models to 3D and used them to guide our 3D module placement. In addition to estimating the amount of decap needed to keep the SSN at the circuit's tolerance level, we also efficiently used nearby white space to allocate decap if necessary.
Our routing process is divided into pin redistribution, topology generation, layer assignment, channel assignment, and local routing steps. Our major objective is to reduce the amount of crosstalk and the number of layers while satisfying various constraints on routing resources. Our ongoing work includes SOP detailed routing.

ACKNOWLEDGMENT

This research is partially supported by grants from the National Science Foundation and the Packaging Research Center at Georgia Institute of Technology.

TABLE VIII. Topology generation and layer assignment results. We report the total wirelength (wl), Elmore delay of the nets with maximum sink delay (dly), and the lower bound (low) and the actual number (lyr) of layers used. Columns: ckt, followed by wl and dly under RSA/G and low and lyr under LAYER. The final row reports TIME.

TABLE IX. SOP channel assignment results. We report the layer usage, wirelength, and via for the baseline (wirelength minimization only) and multi-objective algorithms. Columns: ckt, followed by lyr, wl, and via for each of wl-only and lyr+wl+via. The final row reports TIME.

TABLE X. SOP local routing results for the baseline (wirelength minimization only) and multi-objective algorithms. We report the wirelength (wl), maximum (max) and average (avg) routing demand, as well as the standard deviation (dev). Columns: ckt, followed by wl, max, avg, and dev for each of wl-only and wl+rd. The final row reports TIME.

REFERENCES

[1] R. Tummala and V. Madisetti, "System on chip or system on package?" IEEE Design & Test of Computers.
[2] R. Tummala, "SOP: What is it and why? A new microsystem-integration technology paradigm-Moore's law for system integration of miniaturized convergent systems of the next decade," IEEE Transactions on Advanced Packaging.
[3] R. T. et al., "The SOP for miniaturized, mixed-signal computing, communication, and consumer systems of the next decade," IEEE Transactions on Advanced Packaging.
[4] K. Kundert, H. Chang, D. Jefferies, G. Lamant, E. Malavasi, and F. Sendig, "Design of mixed-signal systems-on-a-chip," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[5] G. Gielen and R. Rutenbar, "Computer-aided design of analog and mixed-signal integrated circuits," Proceedings of the IEEE.
[6] Y. Deng and W. Maly, "Interconnect characteristics of 2.5-D system integration scheme," in Proc. Int. Symp. on Physical Design.
[7] S. Das, A. Chandrakasan, and R. Reif, "Design tools for 3-D integrated circuits," in Proc. Asia and South Pacific Design Automation Conf.
[8] B. Goplen and S. Sapatnekar, "Efficient thermal placement of standard cells in 3D ICs using a force directed approach," in Proc. IEEE Int. Conf. on Computer-Aided Design.
[9] J. Minz and S. K. Lim, "Layer assignment for system on packages," in Proc. Asia and South Pacific Design Automation Conf.
[10] J. Minz, M. Pathak, and S. K. Lim, "Net and pin distribution for 3D package global routing," in Proc. Design, Automation and Test in Europe.
[11] R. Ravichandran, J. Minz, M. Pathak, S. Easwar, and S. K. Lim, "Physical layout automation for system-on-packages," in IEEE Electronic Components and Technology Conference.
[12] J. Minz, S. K. Lim, J. Choi, and M. Swaminathan, "Module placement for power supply noise and wire congestion avoidance in 3D packaging," in Proc. IEEE Electrical Performance of Electronic Packaging.
[13] J. Minz and S. K. Lim, "A global router for system-on-package targeting layer and crosstalk minimization," in Proc. IEEE Electrical Performance of Electronic Packaging.
[14] J. Minz, E. Wong, and S. K. Lim, "Thermal and congestion-aware physical design for 3D system-on-package," in IEEE Electronic Components and Technology Conference.
[15] J. Choi, S. Chun, N. Na, M. Swaminathan, and L. Smith, "A methodology for the placement and optimization of decoupling capacitors for gigahertz systems," in VLSI Design Symposium.
[16] A. Kamo, T. Watanabe, and H. Asai, "An optimization method for placement of decoupling capacitors on printed circuit board," in Proc. IEEE Electrical Performance of Electronic Packaging.
[17] Y. Chen, Z. Chen, and J. Fang, "Optimum placement of decoupling capacitors on packages and printed circuit boards under the guidance of electromagnetic field simulation," in IEEE Electronic Components and Technology Conference.
[18] J. Lee, "Thermal placement algorithm based on heat conduction analogy," IEEE Trans. on Components and Packaging Technologies.
[19] K. Deb, P. Jain, N. Gupta, and H. Maji, "Multiobjective placement of electronic components using evolutionary algorithms," IEEE Trans. on Components and Packaging Technologies.
[20] K. Khoo and J. Cong, "An efficient multilayer MCM router based on four-via routing," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[21] J. D. Cho, K. Liao, S. Raje, and M. Sarrafzadeh, "M2R: Multilayer routing algorithm for high-performance MCM," IEEE Trans. on Circuits and Systems Part I.
[22] W. W. Dai, "Topological routing in SURF: Generating a rubber-band sketch," in Proc. ACM Design Automation Conf., 1991.

IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL.??, NO.??, JANUARY(?) 2005

"Modeling and simulation of core switching noise for ASICs," IEEE Trans. Advanced Packaging, pp. 4-11, 2002.

He was with the Advanced VLSI Design Lab, Indian Institute of Technology, Kharagpur for a year, where he was involved in the design of digital chips.

His areas of interest are physical design automation and algorithms for electronic CAD. Fig. 20. A snapshot of 4-layer SOP with 3 routing interval for n200 benchmark.

, 1997. [24] Y. Cha, C. Rim, and K. Nakajima, A simple and effective greedy multilayer router for MCMs, in Proc. Int. Symp. on Physical Design, 1997. [25] J. Cong and P.

Sarrafzadeh, The pin redistribution problem in multichip modules, in Proc. Int. ASIC Conf., 1991. [27] D. Chang, T. Gonzales, and O.

[23] D. Wang and E. Kuh, "A new timing-driven multilayer MCM/IC routing algorithm," in Proc. IEEE Multi-Chip Module Conf., 1997.
[24] Y. Cha, C. Rim, and K. Nakajima, "A simple and effective greedy multilayer router for MCMs," in Proc. Int. Symp. on Physical Design, 1997.
[25] J. Cong and P. Madden, "Performance driven multi-layer general area routing for PCB/MCM designs," in Proc. ACM Design Automation Conf.
[26] J. Cho and M. Sarrafzadeh, "The pin redistribution problem in multichip modules," in Proc. Int. ASIC Conf., 1991.
[27] D. Chang, T. Gonzales, and O. Ibarra, "A flow based approach to the pin redistribution problem for multi-chip modules," in Proc. Great Lakes Symposium on VLSI.
[28] J. D. Cho, "An optimum pin redistribution for multichip modules," in Proc. IEEE Multi-Chip Module Conf.
[29] J. Cong, "Pin assignment with global routing for general cell design," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[30] W. Elmore, "The transient response of damped linear networks with particular regard to wideband amplifiers," Journal of Applied Physics.
[31] J. Cong, A. B. Kahng, and K.-S. Leung, "Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[32] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science.
[33] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, "Rectangle packing based module placement," in Proc. IEEE Int. Conf. on Computer-Aided Design, 1995.
[34] S. Zhao, C. Koh, and K. Roy, "Decoupling capacitance allocation and its application to power supply noise aware floorplanning," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[35] J. D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S. M. Kang, "Crosstalk-minimum layer assignment," in IEEE Custom Integrated Circuits Conference.
[36] L. McMurchie and C. Ebeling, "Pathfinder: A negotiation based performance-driven router for FPGAs," in ACM Intl Symposium on Field-Programmable Gate Arrays, 1995.
[37] M. Ho, M. Sarrafzadeh, G. Vijayan, and C. Wong, "Layer assignment for multichip modules," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[38] GSRC.
[39] C. J. Alpert, "The ISPD98 circuit benchmark suite," in Proc. Int. Symp. on Physical Design, 1998.
[40] J. Cong and S. K. Lim, "Edge separability based circuit clustering with application to multi-level circuit partitioning," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 3.
[41] N. Na, J. Choi, M. Swaminathan, J. P. Libous, and D. P. O'Connor, "Modeling and simulation of core switching noise for ASICs," IEEE Trans. Advanced Packaging, pp. 4-11, 2002.

Fig. 20. A snapshot of 4-layer SOP with 3 routing interval for n200 benchmark. The routing-tile boundaries with heavy usage are shown with thicker lines.

Jacob Rajkumar Minz (S'05) received his B.Tech degree in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur. He was with the Advanced VLSI Design Lab, Indian Institute of Technology, Kharagpur for a year, where he was involved in the design of digital chips. He joined the School of Electrical and Computer Engineering, Georgia Institute of Technology in August 2002, where he is pursuing research towards his PhD. His areas of interest are physical design automation and algorithms for electronic CAD.

Eric Wong (S'05) received his BS degree in electrical engineering from the State University of New York, Binghamton. He is currently studying for his PhD in electrical and computer engineering at Georgia Institute of Technology. His research interests include physical VLSI design and design automation for VLSI circuits.

Mohit Pathak (S'05) is a Ph.D. candidate and a graduate research assistant in the School of Electrical and Computer Engineering at Georgia Institute of Technology. He received his B.Tech degree in Computer Science and Engineering from Indian Institute of Technology, Kharagpur. He was with Magma Design Automation, India for a year, where he worked as a member of technical staff. His areas of interest are physical design automation and VLSI design.

Sung Kyu Lim (S'94-M'00) received B.S. (1994), M.S. (1997), and Ph.D. (2000) degrees, all from the Computer Science Department of University of California at Los Angeles. He was a post-doctoral scholar at UCLA and a senior engineer at Aplus Design Technologies, Inc. He joined the School of Electrical and Computer Engineering at Georgia Institute of Technology as an assistant professor in August 2001 and the College of Computing as an adjunct assistant professor in September. He is currently the director of the Georgia Tech Computer Aided Design Laboratory. Dr. Lim's research focus is on the physical design automation for 3D circuits, 3D System-On-Packages, microarchitectural physical planning, Field Programmable Analog Arrays, and Quantum Cell Automata. He has been on the advisory board of ACM/SIGDA. He is currently serving on the technical program committees of IEEE International Symposium on Circuits and Systems, ACM Great Lakes Symposium on VLSI, IEEE International Conference on Computer Design, ACM International Symposium on Physical Design, and ACM/IEEE Asia and South Pacific Design Automation Conference. He was awarded the ACM/SIGDA DAC Graduate Scholarship in June 2003.

