SEMICONDUCTOR industry is beginning to question the

Size: px

Start display at page:

Download "SEMICONDUCTOR industry is beginning to question the"

Karin Dean
6 years ago
Views:

1 644 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 Placement and Routing for 3-D System-On-Package Designs Jacob Rajkumar Minz, Student Member, IEEE, Eric Wong, Student Member, IEEE, Mohit Pathak, Student Member, IEEE, and Sung Kyu Lim, Senior Member, IEEE Abstract Three-dimensional (3-D) packaging via system-onpackage (SOP) is a viable alternative to system-on-chip (SOC) to meet the rigorous requirements of today s mixed signal system integration. In this article, we present the first physical design algorithms for thermal and power supply noise-aware 3-D placement and crosstalk-aware 3-D global routing. Existing approaches consider the thermal distribution, power supply noise, and crosstalk issues as an afterthought, which may require an expensive cooling scheme, more decoupling capacitors (=decap), and additional routing layers. Our goal is to overcome this problem with our thermal/decap/crosstalk-aware 3-D layout automation tools. The traditional design objectives such as performance, area, wirelength, and via are considered simultaneously to ensure high quality results. The related experimental results demonstrate the effectiveness of our approaches. Index Terms Crosstalk, mixed-signal CAD, placement and routing, power supply noise, system-on-package (SOP), thermal distribution, three-dimensional (3-D) packaging. I. INTRODUCTION SEMICONDUCTOR industry is beginning to question the viability of system-on-chip (SOC) approach due to its lowyield and high-cost problem. Recently, three-dimensional (3-D) packaging via system-on-package (SOP) [1] [3] has been proposed as an alternative solution to meet the rigorous requirements of today s mixed signal system integration. 1 The SOP is about 3-D integration of multiple functions in a miniaturized package achieved by thin film embedding. The 3-D SOP concept optimizes integrated circuits (ICs) for transistors and the package for integration of digital, RF, optical, sensor and others. It accomplishes this by both build-up SOP, similar to IC fabrication, and by stacked SOP, similar to parallel board fabrication. The uniqueness of 3-D SOP is in the highly integrated or embedded RF, optical or digital functional blocks, and sensors, in contrast to stacked ICs and stacked package. Due to the high complexity in designing large-scale SOP under multiple objectives and constraints, computer-aided design (CAD) tools have become indispensable. Thus, innovative ideas on CAD tools for Manuscript received November 26, 2004; revised July 30, This work was supported by Grants from MARCO C2S2 and NSF CAREER under CCF This work was recommended for publication by Associate Editor C. Amon upon evaluation of the reviewers comments. The authors are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA USA ( limsk@ece.gatech.edu). Color versions of Figs. 6, 7, 10, and 19 are available online at Digital Object Identifier /TCAPT A special issue on SOP (vol. 27, issue 2, May 2004) provides a comprehensive survey of the state-of-the-art in SOP technology. SOP technology are crucial to fully exploit the potential of this new emerging technology. However, there exists very few tools, if not none, that handle the complexity of automatic 3-D SOP layout generation. This article tackles the three most crucial issues that threaten the reliability of SOP paradigm: thermal distribution, power supply noise, and crosstalk. First, thermal issues can no longer be ignored in high performance 3-D packages due to higher power densities and other issues. High temperatures not only require more advanced heat sinks, they also degrade circuit performance. Interconnect delay increases with temperature, which degrades circuit timing. If timing deteriorates enough, logic faults can occur. Hence thermal issues must be considered early-on in the design process. Second, the continuing trend of reducing power supply voltage has resulted in reduced noise margin, which effects reliability and may even cause functional failures due to spurious transitions. Active devices in 3-D packaging draw a large volume of instantaneous current during switching, which causes simultaneous switching noise (SSN). Third, due to the scaling down of device geometry in deep-submicron technologies, the crosstalk noise between adjacent nets has become a major concern in high performance packaging design. Increased coupling noise can cause signal delays, logic hazards and even malfunctioning of the designs. Thus controlling the level of crosstalk noise in 3-D packages chip is an important task for the designers. However, existing approaches consider these issues as an afterthought, which may require expensive cooling schemes, more decoupling capacitors decap, and more routing layers. In addition, many time-consuming iterations are required between full-length thermal/ssn/crosstalk simulation and manual layout repair until the convergence to a satisfactory result. We note that the placement of modules in 3-D SOP design has huge impact on thermal distribution and SSN while the routing of the signal nets has direct impact on crosstalk. Therefore, the goal of this article is to present the first physical layout algorithm for 3-D SOP that combines thermal/decap-aware 3-D placement and crosstalk-aware 3-D global routing algorithm. The traditional design objectives such as performance, area, wirelength, and via costs are considered simultaneously to ensure high quality results. The related experimental results demonstrate the effectiveness of our approach. The remainder of this paper is organized as follows. Section II discusses previous works. Section III presents the problem formulation and an overview of our 3-D algorithms. Section IV presents the SOP placement algorithm. Section V presents the SOP routing algorithm. Section VI presents the experimental results. We conclude in Section VII /$ IEEE

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 645 II. RELATED WORKS The physical design for 3-D integration has drawn interest from both academia and industry recently.

history. The physical design research for SOP has been recently pioneered by the [9] [14].

2 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 645 II. RELATED WORKS The physical design for 3-D integration has drawn interest from both academia and industry recently. Unlike the active physical design research effort in mixed signal SOC [4], [5] and 3-D stacked die technology [6] [8], however, the research for 3-D SOP considerably lacks behind due to its short history. The physical design research for SOP has been recently pioneered by the [9] [14]. Decoupling capacitor placement is done for two-dimensional (2-D) printed circuit board (PCB) designs [15] [17]. Thermal-aware module placement algorithms are proposed for PCB designs [18], [19]. Many multichip module (MCM) routing algorithms have been proposed in the literature [20] [25]. Several works on MCM pin redistribution include [26] [28]. There exists several similarities and differences between MCM and SOP routing. In general, the design objectives and constraints in both MCM and SOP routing are quite similar, where performance, wirelength, via, layer, and crosstalk optimization are crucial. In addition, the routing process is divided into multiple steps: pin distribution, topology generation, layer assignment, and detailed routing. A notable difference between SOP and MCM routing lies in the fact that there exists multiple device layers in SOP, whereas in MCMs there is only one device layer. Therefore, nets are now connecting pins located in all intermediate layers in SOP, and the blocks in each layer behave as obstacles. In MCMs, however, all pins are located only at the top layer, and there exists no obstacles except for the wires themselves. This makes SOP or 3-D package routing problem more general than MCM routing. In SOP designs, some pins in each device layer can be distributed either to the layer above or below. In addition, some nets that connect blocks in nonneighboring device layers need to penetrate intermediate device layers. Since the blocks in these intermediate layers become obstacles, designers use routing channels in each device layer to insert feed-through vias. Thus, we have to determine which routing channel to use for feed-through vias. These routing channels are also used for intra-layer connection. Thus, after feed-through via insertion is done, we need to decide which remaining channels to use for the intra-layer connection. We then need to finish connection between the original and distributed pins. In case the pin location is not known for some soft blocks, we need to determine their location either along the boundary or underneath the block during our local routing step. III. PROBLEM FORMULATION A. SOP Layer Structure The layer structure in multilayer SOP is illustrated in Figs. 1 and 2. The placement layers 2 contain the blocks (such as ICs, embedded passives, opto-electric components, etc), which from the point of view of physical design are just rectangular blocks with pins along the boundary. The interval between two adjacent placement layers is called the routing interval. A routing interval contains a stack of routing layers sandwiched between pin distribution layers. These layers are actually routing 2 We use placement layer and device layer interchangeably. Fig. 1. Illustration of 3-D SOP, where A, D, M, and P, respectively, node analog chip, digital chip, memory module, and embedded passive blocks. Fig. 2. Illustration of the layer structure and routing resource in SOP. The block and white dots respectively denote the original and redistributed pins. The x denotes a feed-through pin for an x-net to pass through a placement layer using a routing channel. The solid, dotted, and arrowed lines, respectively, denote signal wires, vias, and feed-through vias. layer pairs so that the rectilinear partial net topologies may be assigned to them. The pin distribution layers in each routing interval are used to evenly distribute pins from the nets that are assigned to this interval. Then these evenly distributed pins are connected using the routing layer pairs. Each placement layer consists of a pair of routing layers, so routing is permitted. A feed-through via is used to connect two pin distribution layers from different routing intervals. Thus, the routing channels in each placement layer are used for two purposes: i) accommodate feed-through vias and ii) perform local routing, where limited number of intra-layer connections are made. In the SOP model the nets are classified into two categories. The nets which have all their terminals in the same placement layer are called -nets, while the ones having terminals in different placement layers are -nets. The -nets can be routed in a single routing interval or indeed within the placement layer itself. On the other hand, the -nets may span more than one routing interval. We use the floor connection graph (FCG) [29], illustrated in Fig. 3, to model the placement layer, where each routing channel becomes an edge and each channel intersection point becomes a node. In addition, each soft block becomes a node, and pin assignment edges are added to this node to connect to all adjacent channels. This model allows us to determine which boundary to

Dotted lines denote pi assignment edges. use for the pins with unknown location. Each channel edge is associated with i) via capacity for feed-through vias and ii) wire capacity for local routing.

3 646 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 Fig. 3. Graph-based modeling of placement layer: (a) a connection between two hard blocks and (b) b is a soft block, and the new pin is located on the bottom boundary. Dotted lines denote pi assignment edges. use for the pins with unknown location. Each channel edge is associated with i) via capacity for feed-through vias and ii) wire capacity for local routing. In addition, pin assignment edges have pin capacity for each boundary of a soft block. The routing and pin distribution layers in each routing interval are modeled with a standard 3-D grid, where each node represents a routing region and each -direction edge represents each horizontal/vertical boundary among the regions. Each -direction edge represents a group of vias each region can accommodate. Thus, all edges in this 3-D grid are associated with capacity: and edges for wire capacity, and edges for via capacity. B. SOP Placement Problem The following are given as the input to our 3-D SOP placement problem: i) a set of blocks that represent the various active and passive components in the given SOP design, ii) width, height, and maximum switching currents for each block, iii) a netlist that specifies how the blocks are connected via electrical wires, iv), the number of placement layers in the 3-D packaging structure, and v) the number of power/ground signal layers along with the location of the power/ground pins. For each net from a given netlist, let denote the wirelength of. The wirelength is the sum of Manhattan distance in, and directions, where the direction is the height of the associated vias. 3 Let, and, respectively, denote the width, height, and area of the placement layer. Let, i.e., the product of the maximum width and the maximum height among all placement layers. Let. Let and denote the average width and height among all placement layers. Let, i.e., the sum of the deviation of the dimension among all placement layers. Last, the area objective of 3-D SOP placement, denoted, is the weighted sum of, and. Let denote the maximum temperature among all blocks. Let denote the total amount of decoupling capacitance required to suppress the simultaneous switching noise (SSN) 3 We assume that the height of vias connecting two x y layers in a routing and pin distribution layer pair is of one grid, whereas the height of feed-through vias that penetrate a placement layer is six grid. Fig. 4. Illustration of crosstalk computation between two Steiner trees. The numbers denote the coupling length. (a) Crosstalk is 7. (b) Crosstalk-free routing. under the given tolerance value. Last, the goal of the SOP Placement Problem is to find the location of each block in the placement layers such that the following cost function is minimized while SSN constraint is satisfied: 4 In addition, decaps are required to be placed adjacent to the blocks that require them. C. SOP Routing Problem For each net from a given netlist NL, let and, respectively, denote the amount of crosstalk and via associated with. Let denote the coupling length between and as illustrated in Fig. 4. We define as follows: where denote the routing layer that contains net. For each net, let denote the Elmore delay [30] at sink. Then, the maximum sink delay of net, denoted, is. The performance of a SOP global routing is to minimize the following cost function: For each routing interval in a 3-D package with placement layers, let, and, respectively, denote the top pin distribution layer pairs, routing layer pairs, and bottom pin distribution layer pairs. There exists three kinds of connections in each routing interval : top (connection between and ), middle (connection between and ), and bottom (connection between and ). We use the routing layer pairs in, and, respectively, for the top, middle, and bottom connections. We construct the routing grid, denoted, that contains all the distributed pins 4 In this weighted cost function, the area term A is normalized to A =A, where A is the footprint area of the initial solution used for Simulated Annealing. This will ensure the area term stays close to one during the annealing process. The same normalization is done for the other objectives as well wirelength, decap, and thermal. (1)

These pins are connected by a graph-based Steiner tree topology generation algorithm (we use an existing heuristic [31] for this purpose).

4 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 647 Fig. 6. Top: 3-D grid of a SOP for thermal modeling. Fig. 5. Overall design flow. We place the feedthrough-vias for -nets and finish connection in each placement layer. More details are presented in Section V. from and in a 2-D grid. These pins are connected by a graph-based Steiner tree topology generation algorithm (we use an existing heuristic [31] for this purpose). The total layers used in a SOP global routing solution is given by Section V-D discusses how to compute and Section V-E discusses how to compute and. Last, the formal definition of SOP Global Routing Problem is as follows: Given a 3-D placement and netlist, generate a routing topology for each net, assign to a set of routing layers and assign all pins of to legal locations. All conflicting nets are assigned to different routing layers while satisfying various wire/via capacity constraints. The objective is to minimize the following cost function: We minimize during our layer assignment step and during our channel assignment step. is the focus during our topology generation step. The wirelength, via, and crosstalk minimization are addressed in all steps of our global router. D. Overview of the Approach An overview of our 3-D SOP physical design tool is shown in Fig. 5. The input to the flow is a module-level netlist that specifies how the SOP modules such as digital die, analog die, embedded passives, opto/mems modules, etc, are connected. Our 3-D module placement is based on the Simulated Annealing approach [32], where we examine many candidate 3-D module placement solutions and evaluate each of them using our thermal and power supply noise analyzers. More details are presented in Section IV. Our 3-D global routing is decomposed into five steps, where the input/output (IO) pins from the blocks are first evenly distributed using pin distribution layers to alleviate congestion problem. We then construct routing tree topology for all nets and assign them to the routing layers to minimize crosstalk. (2) IV. SOP PLACEMENT ALGORITHM A. Overview of the Algorithm Simulated Annealing is a very popular approach for module placement due to its high quality solutions and flexibility in handling various constraints. We extend the existing 2-D Sequence Pair scheme [33] to represent our 3-D module placement solutions. Simulated Annealing procedure starts with an initial multi-layer placement along with its cost in terms of area, wirelength, decap, and maximum temperature. We then make a random perturbation (move) to the initial solution to generate a new 3-D placement solution and measure its cost. We perform 3-D SSN analysis to compute the amount of decap needed. The algorithm does a one time set-up of the thermal matrices. These matrices are used during incremental temperature calculations to evaluate the thermal cost. The thermal modeling and evaluation is explained in Section IV-B. White space insertion (discussed in Section IV-D) is done and the increase in footprint area is estimated. If the new cost is lower than the old one, the solution is accepted; otherwise the new solution is accepted based on some probability that is dependent on temperature of the annealing schedule. We examine a predetermined number of candidate solutions at each temperature. The temperature is decreased exponentially, and the annealing process terminates when the freezing temperature is reached. The actual decap allocation is done after optimization using LP solver discussed in Section IV-D. B. 3-D Thermal Analysis We use a 3-D thermal resistance mesh, as shown in Fig. 6, for thermal analysis. Each node models a small volume of the 3-D SOP substrate, and each edge denotes the connectivity between two adjacent regions. This is equivalent to using a discrete approximation of the steady state thermal equation, where is thermal conductivity, is temperature, and is power. This results in the matrix equation. Since the matrix computation involved with thermal analysis for each candidate 3-D placement solution is expensive, we

648 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 (a) (b) Fig. 7. Illustration of 3-D power supply modeling.

where black and gray nodes respectively denote power supply and consumption nodes. Fig. 8. Illustration of SSN calculation.

The block C draws current from s and s using p ;p, and p (each of these carries I =3 amount of current). The resistance of p, the overlap between p and p, contributes to the SSN at B and C.

5 648 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 (a) (b) Fig. 7. Illustration of 3-D power supply modeling. (a) Multilayer power supply network, where multiple sets of device, routing, and power supply layers are stacked together. (b) 3-D-grid modeling. where black and gray nodes respectively denote power supply and consumption nodes. Fig. 8. Illustration of SSN calculation. The dominant current source for block A is s, which is not located in the same layer. The dominant (shortest) path p carries I =6 amount of current, where I denotes the current demand of A. The block C draws current from s and s using p ;p, and p (each of these carries I =3 amount of current). The resistance of p, the overlap between p and p, contributes to the SSN at B and C. tackle this problem with the following scheme. Assuming that the thermal conductivity of the modules are similar, swapping the module locations would not change the thermal resistance matrix too much. This means that matrix only needs to be computed once in the beginning. To calculate the temperature profile of a new block configuration, the power profile needs to be updated and then multiplied by. Alternatively, a change in power profile can be defined. Multiplying and will give change in temperature profile. Adding to the old temperature profile will give the new temperature profile. These equations summarize the two methods:. Swapping two blocks usually has a small effect on the power profile, so should be sparse. This reduces the number of multiplications used by the second method at the expense of doing extra additions and subtractions. C. 3-D Power Supply Noise Modeling We model the P/G network for 3-D SOP as a 3-D grid graph as shown in Fig. 7. The edges in the grid-graph have inductive and resistive impedances. The mesh contains power-supply points and connection points, which supply and consume currents. The current distribution for the blocks is inversely proportional to the impedance of path from source. The dominant current source for a block is defined as the voltage source supplying significantly more power to the block than any other neighboring sources. The dominant path for a block is the path from the dominant supply to the block causing the most drop in voltage. It has been shown experimentally in [34] that the shortest path between the dominant current source (nearest Vdd pins) and the block offers highly accurate SSN estimation within reasonable runtime. In our 3-D SSN analysis engine, we compute dominant paths for all blocks which is dynamically updated whenever a new placement solution is evaluated in terms of SSN. Let be a dominant current path for block. Then denotes the set of dominating paths overlapping with ( includes itself). Let be the overlapping segments between path and. Let and denote the resistance and inductance of. Fig. 9. LP-based 3-D decap allocation. After the current paths and their values have been determined for all blocks, the SSN for is given by where is the current in the path, which is the sum of all currents through this path to various consumers. An illustration is shown in Fig. 8. The weight of and its rate of change are the resistive and inductive components of the path. Let denote the maximum charge drawn from the power supply by block. If, where is the noise tolerance, the decap allocated to block is given by where denotes the total number of blocks to be placed. Finally, the decap cost is given by. D. 3-D Decoupling Capacitor Placement In our LP-based 3-D decap allocation shown in Fig. 9, denotes the amount of decap allocated from white space to block. The objective is to maximize the utilization of white spaces for decap allocation. The constraint (??) limits the total allocation of a white space to its total area, where denotes the area of white space, and denotes the neighboring blocks of. The constraint (??) ensures that the total amount of decap allocated to each block does not exceed its demand, where denotes the demand.

Note that blocks from other layers can utilize the white space for decap insertion. Fig. 11. Pseudocode for coarse pin distribution algorithm.

6 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 649 Fig. 10. Illustration of 3-D decap allocation: (a) 3-D placement, (b) X-expansion, and (c) XY -expansion, where the darker blocks denote the neighboring blocks of the decap ( = white space) inserted. Note that blocks from other layers can utilize the white space for decap insertion. Fig. 11. Pseudocode for coarse pin distribution algorithm. Unlike the 2-D decap assignment as done in [34], the neighboring blocks in 3-D case are the adjacent blocks either from the same placement layer or from neighboring layers. The assumption here is that the white space from different layers can be used to allocate decap. In order to facilitate this, we introduce predetermined parameters to control decap allocation to block from white space module. evaluates the usefulness of whitespace to be used as a decap for block. The value of decreases with increasing distance of whitespace from the module that requires decap. Hence allocations of decap from far located white spaces will be avoided. The second aspect of the formulation deals with the generation of, the optimal area that will accommodate all decap without too much penalty on the final footprint area. The decap budget for each block is determined through SSN analysis of a particular 3-D placement. The decap allocation involves several iteration between white space insertion and LP solving to reach a good solution both in terms of completeness of decap allocated and the additional increase in area. The white space is generated by expanding the floorplan in the and direction as illustrated in Fig. 10. In a multilayer placement, however, we have to take additional care to balance the expansion in each layer, so as to minimize the expansion of the total footprint area. The expansion is proportional to the decap demand of each module. In our sequence pair-based 3-D placement, we modify the horizontal and vertical constraint graphs to expand the placement into and directions, respectively. Note that we may have to iterate between white space insertion and allocation if the current expansion does not satisfy all the decap needs. In order to prevent from iterating between LP and white space insertion, we start by adding white spaces generously in the floorplan then use LP to perform compaction. In our scheme, the -expansion is controlled by parameters discussed earlier, where the existing white spaces have higher ranks than the newly inserted ones. The actual amount of expansion is determined by a parameter 1, where we increase the amount of expansion by a factor of for each block that needs decap. This ensures that the initial expansion satisfies the LP constraint. Since LP determines the individual, we use these assignment values to decide which white space insertion was not necessary and thus can be removed. As exact decap allocation is a time-consuming process, it is impractical to solve an LP problem for every candidate solution. However, our white space insertion takes, which is computationally equivalent to computing block location using longest path computation for area. Therefore, we can afford white space insertion (and the calculation of area increase due to decap insertion) at every move. We perform the actual LP-based decap allocation at the end of the annealing process as a compaction stage. V. SOP GLOBAL ROUTING A. Overview of the Approach Our 3-D global router is divided into the following five steps. 1) Pin redistribution: we first determine which set of -nets and -net segments are assigned to each routing interval. The pins from these nets are then evenly distributed in the top and bottom pin distribution layers. Our pin redistribution step is further divided into three steps: coarse pin distribution, net distribution, and detailed pin distribution. 2) Topology generation: Steiner trees are generated for all nets in each routing interval so that the performance of the routed design is optimized. 3) Layer assignment: the routed nets are assigned to a unique routing pair in the routing layer so that the total number of layers used is minimized. 4) Channel assignment: for each -net, its location of feedthrough via in the routing channel is determined. We also assign channels and finish the connections for the i-nets that are to be routed in each placement layer. 5) Local routing: we finish connections between the pins now located in the routing channels and the pins on the block boundaries. We also determine the location of pins from soft blocks. We use an existing max-cut partitioning heuristic [35] for the net distribution in step 1. For step 2, we use an existing RSA/G heuristic [31] to generate the net topologies. In addition, we use the congestion-driven rip-up-and-reroute [36] for step 5. Therefore, the focus of this paper is to develop heuristics for the remaining steps in 1, 3, and 4. B. Coarse Pin Distribution A pseudocode for our coarse pin distribution algorithm is shown in Fig. 11. First, we assign all pins in the placement layers to a nearby grid point in CP, a 2-D grid, while trying to balance the number of pins assigned to each grid point (line

650 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 Fig. 12. Illustration of coarse pin distribution.

Illustration of the gain computation for coarse pin distribution. A net n indicated by the black node is moved from P to P.

We impose pin capacity for each grid point so that the pins are evenly distributed in CP. Our approach is to visit the pins in a random order and find the best grid for each pin.

7 650 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 Fig. 12. Illustration of coarse pin distribution. Pins along the external boundary are not shown for simplicity. Fig. 14. Pseudocode for SOP detailed pin distribution algorithm. Fig. 13. Illustration of the gain computation for coarse pin distribution. A net n indicated by the black node is moved from P to P. Then, g (n) =02; g = 0, and g = 01, where deg(p )=3becomes the maximum-degree partition. 1 4). An illustration is shown in Fig. 12. We impose pin capacity for each grid point so that the pins are evenly distributed in CP. Our approach is to visit the pins in a random order and find the best grid for each pin. For each pin, a grid point that is closest to the original location of and has not violated the pin capacity constraint is chosen (line 3). After this process is finished, CP serves the starting point of our min-cut placement based algorithm, where each grid point corresponds to a partition. We then iteratively improve the quality of this initial solution via move-based approach. In our new heuristic algorithm, our cost function is based on i) how far the new pin location is from the initial location, ii) total wirelength, and iii) how evenly distributed the inter-partition connections are. For each pin, wedefine the displacement gain, denoted, to represent how much the distance between the original and new location is reduced if is moved to another partition. We define the wirelength gain, denoted, to represent how much the length of the nets that contain (estimated by the half-perimeter of the bounding box) is reduced if is moved to another partition. For a partition, let denote the number of nets that have connections to. Then, the cutsize balance factor is defined to be the difference between the maximum and minimum among all partitions. We define the balance gain, denoted, to represent how much the cutsize balance factor is reduced. Our move-based multilevel mincut partitioning algorithm performs cell move based on the combined gain function Fig. 13 shows an illustration of the gain computation. In our multilevel approach, we first perform restricted multilevel clustering (line 5) that preserves the initial placement result, where two pins that are in different partitions initially are not clustered together. At each level of the cluster hierarchy from top to bottom (line 8), we compute the combined gain for each cluster and perform cluster moves. In order to compute the displacement and balance gain of a group of pins ( cluster), we add the individual displacement and balance gain of all pins in this cluster. When there is no gain at a certain level, we decompose the clusters into next lower level and perform refinement. This process continues until we obtain a solution at the bottom level (line 12). C. Detailed Pin Distribution After coarse pin distribution and net distribution are finished, we know which set of nets are assigned to each routing interval as well as their (evenly distributed) entry/exit points in pin distribution layers. However, the coarse pin distribution is done based on the 2-D grid that merged all multiple placement layers into one. The even pin distribution in this 2-D grid offers a good enough reference points for net distribution. But, it does not consider even pin distribution in each individual routing interval. In addition, it is also possible that pin capacity for each routing region in each routing interval may be violated. Therefore, the goal of detailed pin distribution is to address these problems in each routing interval so that the subsequent topology generation and layer assignment truly benefit from this even pin distribution. In addition, we use a grid large enough for each routing interval to legalize pin location, i.e., each grid point contains only one pin. Since the crosstalk minimization are addressed during the prior steps, the major focus of detailed pin distribution step is on i) how far the new location is from the original location obtained from coarse pin distribution and ii) the total wirelength. A pseudocode for our net distribution algorithm is shown in Fig. 14. Our force-directed heuristic algorithm encourages all pins from the same net to be placed closer to the center of mass while minimizing the distance between the old and new pin location. The size of the grids for detailed pin distribution is determined for each routing interval so that each pin can be assigned to a unique grid point (line 1 3). In addition, we project the coarse pin distribution result to this new set of grids (line 5). Note that there still exists overlap among the pins in at this point even though DP is usually finer than CP. In order to remove this overlap, we apply an additional force that slightly pulls each pin toward the center of mass. For each pin in a net, the displacement force (line 8) for -direction is defined as follows:

8 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 651 Fig. 15. Illustration of layer assignment. where denotes the -coordinate of, the center of mass of, and denotes the width of the bounding box of. We compute using the -coordinates. Note that. The vector is then added to (line 9). This minor change on the original pin location helps to remove most of the overlap in while not increasing the wirelength too much. We then sort the pins based on the lexicographic order of new locations and assign each pin starting from the top-most row in the left-most column (line 10 11). Due to its simplicity, this deterministic algorithm is quite efficient and effective in reducing the additional wirelength required for pin distribution as well as the total wirelength among all nets as shown in Section VI. D. SOP Layer Assignment For each routing interval, the routing grid contains all the (redistributed) pins from the top and bottom pin distribution layers ( and ). We generate Steiner-tree based routing topology using the RSA/G heuristic [31] to connect these pins. The goal is to minimize the maximum sink delay as discussed in Section III-C. The routing layer in each routing interval consists of several layer pairs, where each pair consists of one layer for horizontal wires and another layer for vertical wires. Thus, we can assign a entire rectilinear routing tree to a routing pair. In addition, two trees that are intersecting can also be assigned to the same routing pair provided that they do not violated the wire capacity of the routing regions involved. An illustration is shown in Fig. 15. The SOP layer assignment problem is to assign each net to a routing layer pair so that the wire capacity constraint is satisfied and the total number of layer pairs used for all routing intervals is minimized. Our approach is to perform layer assignment for each routing interval independently. For each routing interval, we construct a layer constraint graph (LCG) [37], denoted LCG, as follows: corresponding to each net in we have a node in LCG. Two nodes LCG have an edge between them if a net segment and are sharing the same edge in, i.e., and are sharing the same boundary of a routing region. Then we use a node coloring algorithm to assign colors to the nodes in LCG such that no two nodes sharing an edge are assigned the same color. Let denote the total number of colors used during the node coloring, and let denote the wire capacity of the boundary in routing region. Then, the total number of layer pairs used in this routing interval is computed as follows:. Last, a node with color is assigned to layer pair. Let and, respectively, denote the maximum number of wires used among all horizontal and vertical edges in. Then, the following is a lower-bound Fig. 16. Pseudocode for SOP layer assignment algorithm. on the number of layers used in :. Fig. 16 shows the SOP layer assignment algorithm that includes our coloring heuristic. We first sort all nodes in LCG in a decreasing order of the number of their neighbors (line 3). Let denote the set of colors used by the neighbors of (line 6). We visit the nodes in the sorted order (line 5). In case there exists an used color that is not included in (line 7), we assign this color to (line 12). In case there exist multiple colors that satisfy this condition, we use the lowest color. Otherwise, we introduce a new color and assign it to (line 9 10). Last, we assign a layer pair to each net based on its color (line 13). In spite of its simplicity, this greedy algorithm provides results that are very close to the lower bound on total number of layers used as demonstrated in Section VI. E. Sop Channel Assignment Our prior topology generation and layer assignment steps focus on the connections among the distributed pins in the pin distribution layers. after these steps are finished, there remain two kinds of connections: i) connections between the original and distributed pins and ii) feed-through via insertion. For both types of connections, the routing channels in the placement layers are used. During the SOP channel assignment step, each pin in the pin distribution layer is mapped to a routing channel in the neighboring placement layer. In addition, each pin from an x-net that needs to penetrate a placement layer is also mapped to a routing channel in the placement layer. Last, we generate routing topology for each pin-to-channel connection and assign it to a routing layer pair in the pin distribution layer. Each placement layer is modeled with the floor connection graph (FCG) [29] as illustrated in Fig. 3. During the SOP local routing step, we use the congestion-driven rip-up-and-reroute [36] to i) finish the routing for the given set of two-pin connections while satisfying the wire capacity constraint and ii) decide the location of pins for soft blocks along their boundaries. An illustration is shown in Fig. 17. For each placement layer, let denote the set of pins that need to be mapped to a routing channel in. contains pins from and. The pins in are grouped into two sets: terminal pin set for the pins that have terminals in and feed-through pin set for the pairs of pins that require feed-through vias to penetrate.

652 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 TABLE I AREA/WIRE-DRIVEN VERSUS DECAP-DRIVEN VERSUS THERMAL-DRIVEN ALGORITHMS Fig. 17.

The goal of SOP Channel Assignment Problem is to map each pin to a routing channel for 1 and finish -to- connection while satisfying the pin capacity constraint,

Let denote the number of layers used to finish connection between pins from and, and let denote the number of layers used to finish connection between pins from and.

9 652 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 TABLE I AREA/WIRE-DRIVEN VERSUS DECAP-DRIVEN VERSUS THERMAL-DRIVEN ALGORITHMS Fig. 17. Illustration of (a) channel assignment and (b) local routing. Let denote the set of routing channels in. Each channel is associated with the pin capacity constraint. The goal of SOP Channel Assignment Problem is to map each pin to a routing channel for 1 and finish -to- connection while satisfying the pin capacity constraint, i.e.,. For each pair of pins,we map and to the same channel. Let denote the number of layers used to finish connection between pins from and, and let denote the number of layers used to finish connection between pins from and. The objective of SOP channel assignment is to minimize the following cost function: A pseudocode for SOP channel assignment algorithm is shown in Fig. 18. We visit each placement layer and assign the feed-through pins and terminal pins to the channels. In addition, an L-shaped routing topology for each pin-to-channel is constructed in a 2-D grid (line 3). The signal delay of feed-through vias is larger than that of other types of vias. Since each channel is under a capacity constraint, it is important to assign the feed-through pins to the nearest channels first. Therefore, our strategy is to perform channel assignment for the feed-through pins first to minimize the delay of -nets that require feed-through vias. In addition, we give priority to the Fig. 18. Pseudocode for SOP channel assignment algorithm. pins that are included in long nets. Thus, we sort the pins based on the wirelength (line 4). Our heuristic algorithm assigns pins to channels based on the cost of mapping we seek a channel with the best mapping cost for a given pin (line 6 8). We compute the cost of mapping for a given pin and a channel as follows: cost room dist bend cong where room denotes the number of pins can accommodate until it violates the capacity constraint, dist is the Manhattan distance between and bend is the number of bends in the connection between and, and cong denotes the total number of existing connections along the proposed L-shaped route. We choose the channel with the maximum cost. Note that a channel is represented with a line instead of a point. Thus, distance and bend are based on the shortest connection between and any point on. Upon a pin to channel mapping, we update the usage of channel in and edges in (line 10). After the channel assignment and topology generation for all pin-to-channel connections are finished, we perform layer assignment using our coloring heuristic presented in Section V-D and compute the total number of layers used.

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 653 VI. EXPERIMENTAL RESULTS We implemented our algorithms in C /STL and ran our experiments on Linux Beowulf clusters.

10 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 653 VI. EXPERIMENTAL RESULTS We implemented our algorithms in C /STL and ran our experiments on Linux Beowulf clusters. We tested our algorithms with two sets of benchmarks. The first set is from the standard GSRC floorplan circuits [38]. The second set, named the GT benchmark, was synthesized from the IBM circuits [39], where we use our multilevel partitioner [40] to divide the gate-level netlist into multiple blocks. The GSRC benchmarks are small to medium-sized in terms of both the number of blocks and nets. 5 The GT benchmarks contain medium to large number of blocks with dense connections. A. SOP Placement Results The dimension of the area was assumed to be in mm and area per unit decap cost was chosen to be 50 mm. The number of placement layers is fixed to four for all experiments. In our experiments, we used the same parameters across the benchmarks. The technology parameters used were 0.01 m for wire resistance per unit length, 1 ph/ m for wire inductance per unit length, and 10 ff/ m for wire capacitance per unit length. We report the average runtime for each benchmark measured in seconds. For our thermal analysis we used a grid size of in the and direction. The number of active layers was four with three passive layers in between them. The conductivity of the substrate was chosen to be that of silicon (0.1 W/mm C). The conductivity of the sides of the package was fixed at 0.01 W/mm C. The heat sink was assumed to be at the top of the package with a conductivity of 0.5 W/mm C and the conductivity of the bottom of the package was chosen to be 0.2 W/mm C. The power density of the blocks varied from 10 W/mm^2 to 300 W/mm^2 depending on the switching current demands. The switching current demands were generated using a formula using a random number and the size of the block. Table I shows a comparison among 3-D SOP placement algorithms with three different objectives: 1) area/wire-driven ( 1 and 0 in (1), Section III B), 2) decap-driven ( 1 and 0), and 3) thermal-driven ( 1 and 0). Our baseline algorithm is the area/wirelength-driven algorithm. We achieved a 21% improvement over baseline in decap cost using our decap-driven algorithm, with a 7% increase in final area and 15% increase in wirelength. The temperature decreases by 3%. Our thermal-driven floorplanning achieved a 21% improvement over baseline with a dramatic increase in total area of 76%. Wirelength increased by 28% and decap requirement increases by 19%. We note that it may be possible to reduce area and wirelength costs by fine-tuning parameters. Table I shows the result of our final 3-D placement algorithm that considers all four objectives: area, wirelength, decap, and thermal ( 1 in (1), Section III-B). We note that our area/wire/decap/thermal-driven algorithm achieves an improvement in both decap and temperature over baseline 5 The GT benchmark circuits are available for download at our website: Fig. 19. Decap versus temperature correlation plot for n300. TABLE II DECAP VERSUS TEMPERATURE CORRELATION CONSTANTS by 13% and 9%, respectively. There is a reasonable increase in final area and wirelength by 19% and 15%, respectively. We also tested our area/wire/decap algorithm under a thermal constraint. In this case, all candidate solutions during the simulated annealing are discarded if the maximum temperature is above a certain threshold (100 C in our case). We observe that the area, wirelength, and decap results improve simultaneously at the cost of slight increase in temperature. The CPU times of the algorithm depend on the number of candidate solutions evaluated. The times reported in the table directly reflect the subset of the configuration space spanned by the algorithm. The constrained algorithm ran much faster, making a small number of moves because fewer moves were accepted. Fig. 19 shows the temperature and decap requirement of each block of the final floorplan of the n300 benchmark. The figure clearly shows that there is little correlation between temperature and decap. This shows that minimizing one objective does not necessarily minimize the other objective as a positive correlation would indicate. It also means that minimizing both objectives is not mutually exclusive, as a negative correlation would indicate. This matches with the block placement results. Table II shows the correlation constants. Table III shows power supply noise simulation results for three 3-D placement schemes-no-decap aware, decap-aware, and decap-aware decap-placement. The P/G plane structure size is 246 mm 254 mm, and the top P/G plane pair was modeled using cavity resonator model [41] and simulated in HSPICE. The placement layer that uses this P/G plan includes 14 active devices. The dc 5 V sources are located at four edges in the plane pair and fourteen current sources exist in the plane pair. As can be seen from the Table III, the SSN for the

11 654 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 TABLE III POWER SUPPLY NOISE SIMULATION RESULTS. WE REPORT SSN NOISE FOR EACH BLOCK PLACED IN THE TOP PLACEMENT LAYER. (a) NONOPTIMIZED CASE WITHOUT ANY DECOUPLING CAPACITOR. FROM TABLE I WE NEED 26.7 DECAP UNITS TO SUPPRESS SSN NOISE. (b) OPTIMIZED CASE WITHOUT ANY DECOUPLING CAPACITOR. FROM TABLE I WE NEED 19.9 DECAP UNITS. (c) OPTIMIZED CASE WITH DECOUPLING CAPACITORS INSERTED TABLE VI SOP PIN REDISTRIBUTION RESULTS USING OUR COARSE AND DETAILED PIN DISTRIBUTION ALGORITHMS. WE REPORT THE WIRELENGTH BETWEEN THE ORIGINAL AND THE NEW LOCATION (dw), TOTAL WIRELENGTH (wl), CROSSTALK (xt) AND TOTAL NUMBER OF LAYER PAIRS USED (1 yr) IN THE ROUTING LAYERS TABLE IV BENCHMARK CHARACTERISTICS OF GSRC AND GT CIRCUITS.WCAP DENOTES THE WIRE CAPACITY OFEACH BOUNDARY IN THE ROUTING TILES.CP AND DP RESPECTIVELY DENOTE THE DIMENSION OF 2-D GRIDS USED DURING OUR COARSE AND DETAILED PIN DISTRIBUTION STEPS TABLE VII TOPOLOGY GENERATION AND LAYER ASSIGNMENT RESULTS. WE REPORT THE TOTAL WIRELENGTH (WL), ELMORE DELAY OF THE NETS WITH MAXIMUM SINK DELAY (dly), AND THE LOWER BOUND (low) AND THE ACTUAL NUMBER (1 yr) OF LAYERS USED TABLE V AREA INFORMATION OF THE 4-LAYER SOP FLOORPLANNING FOR GSRC AND GT BENCHMARK DESIGNS. THE MIN AND MAX RESPECTIVELY DENOTE THE DIMENSION OF THE MINIMUM AND MAXIMUM FLOORPLAN AREA, WHEREAS THE FINAL COLUMN DENOTES THE DIMENSION OF THE OVERALL FOOTPRINT decap-aware algorithm is lower compared to nondecap-aware algorithm. The SSN of the noisiest block blk4 is 1.58 V, which is reduced to 1.42 V by our decap-aware scheme. With the insertion of decap, the noise is suppressed to 0.22 V. In addition, the total amount of decap required for nondecap-aware algorithm is 26.7, which is reduced to 19.9 with our decap-aware scheme. The largest amount of decap is used for blk5 (0.50 nf), because of an increase in its SSN after optimization. The numbers show that the SSN was efficiently suppressed and the amount of decap reduced by using our algorithms. B. SOP Routing Results Table IV shows the characteristics of the GSRC and GT benchmark designs used during routing (see Table V). In Table VI we compare pin redistribution results. Under the DPD columns, we perform DPD ( detailed pin distribution presented in Section V-C) only, where we skip CPD ( coarse pin distribution presented in Section V-B) and assign all -nets to the routing interval below for net distribution. Under the CPD DPD, we perform CPD using the algorithm, assign all -nets to the routing interval below for net distribution, and perform DPD. DPD serves as our baseline, where CPD+DPD respectively demonstrate the impact of our coarse pin distribution. In all cases, we perform detailed pin distribution to legalize the pin location, i.e., remove overlaps among the pins. We use the following metrics to evaluate our solutions: wirelength between the original and the new location (dw), total wirelength (wl), crosstalk (xt) and total number of layer pairs used (lyr) in the routing layers. The displacement (dw) and wirelength (wl) results are scaled by 10, and the time reported is the average runtime among the GSRC/GT circuits. From the comparison between DPD and CPD DPD, we note that the displacement result (dw) increases by an average of 50%. However, CPD lowers the total wirelength (wl) consistently by 10% on average and the number of layers (lyr) by 10% on average. In Table VII we show our topology generation (RSA/G) and layer assignment (LAYER) results. We used the technology parameters for process for Elmore delay computation. Specifically, the driver resistance of 29.4 k, input capacitance of ff, unit-length resistance of 0.82 m and

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 655 TABLE VIII SOP CHANNEL ASSIGNMENT RESULTS.

MINIMIZATION ONLY) AND MULTI-OBJECTIVE ALGORITHMS. WE REPORT THE WIRELENGTH (wl), MAXIMUM (max) AND AVERAGE (avg) ROUTING DEMAND AS WELL AS THE STANDARD DEVIATION (dev) unit-length capacitance of 0.

We report the total wirelength (wl), Elmore delay of the nets with maximum sink delay (dly), and the lower bound and the actual number of layers used for the top-most routing interval.

12 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 655 TABLE VIII SOP CHANNEL ASSIGNMENT RESULTS. WE REPORT THE LAYER USAGE, WIRELENGTH, AND VIA FOR THE BASELINE (WIRELENGTH MINIMIZATION ONLY) AND MULTI-OBJECTIVE ALGORITHMS TABLE IX SOP LOCAL ROUTING RESULTS FOR THE BASELINE (WIRELENGTH MINIMIZATION ONLY) AND MULTI-OBJECTIVE ALGORITHMS. WE REPORT THE WIRELENGTH (wl), MAXIMUM (max) AND AVERAGE (avg) ROUTING DEMAND AS WELL AS THE STANDARD DEVIATION (dev) unit-length capacitance of 0.24 ff/ m are used. We report the total wirelength (wl), Elmore delay of the nets with maximum sink delay (dly), and the lower bound and the actual number of layers used for the top-most routing interval. In general, GSRC benchmarks have bigger delay than GT benchmarks due to the larger average wirelength. Our layer assignment algorithm presented in Section V-D is able to achieve results very close to the lower bounds discussed in Section V-D. For the GT circuits, the layer assignment results are within 10% of the lower bound. For the GSRC circuits, we were able to achieve the results equal to the lower bound. Our channel assignment results are shown in Table VIII. The baseline case is where we optimize the wirelength only. We then compare it to our multi-objective channel assignment algorithm that simultaneously optimizes the wirelength, via, and layer usage. Our comparison indicates that the number of layers is consistently and significantly reduced especially for the bigger GT benchmarks, where an average improvement of 48% is observed. In case of the second largest benchmark gt1000, we achieved 57% improvement. This saving on the layer usage comes at the cost of increase in wirelength and vias. The average increase in wirelength is 12% and 30% for GSRC and GT benchmarks, respectively. The average increase in via usage is 63% and 79% for GSRC and GT benchmarks, respectively. We noted that the channel assignment result is very sensitive to the weighting constants among the objectives used in our cost function. This indicates that the solution space of the channel assignment problem offers many useful tradeoff points. Table IX reports our local routing results. We report the wirelength, maximum and average routing demand as well as the standard deviation. Our baseline is the local routing optimized for wirelength only. We then compare it against our multi-objective local routing algorithm that simultaneously optimizes wirelength and routing demands. In both cases, the same pin demand constraint is imposed. We note that the improvement of our multi-objective algorithm over the baseline is significant, especially for GSRC circuits the routing demands were reduced by 33% on average while the wirelength increased by only 10%. In addition, we reduced the routing demands for the GT benchmarks by 41% on average, with wirelength increase by 20%. In our biggest benchmarks (gt1500), our routing demand reduction Fig. 20. Snapshot of four-layer SOP with three routing interval for n200 benchmark. The routing-tile boundaries with heavy usage are shown with thicker lines. is the largest (53%), which comes with the maximum increase in wirelength (23%). This again indicates that the local routing result is very sensitive to the weighting constants among the objectives used in our cost function. The lower standard deviation of our multi-objective algorithm indicates that the routing demand is more evenly distributed ( lower congestion) compared to the wirelength-only case. Last, Fig. 20 shows a snapshot of a four-layer SOP with 3 routing interval for n200 benchmark. VII. CONCLUSION In this article, we presented the first physical layout algorithm for 3-D SOP designs that includes thermal/decap-aware 3-D placement and crosstalk-aware 3-D routing algorithm. We extended the thermal and SSN models to 3-D and used them to guide our 3-D module placement. In addition to estimating the amount of decap needed to keep the SSN at the circuits tolerance level, we also efficiently used near-by white spaces to allocate decap if necessary. Our routing process is divided into pin redistribution, topology generation, layer assignment, channel assignment, and local routing steps. Our major objective is to reduce the amount of crosstalk and layers while satisfying various constraints on routing resource. Our ongoing work includes SOP detailed routing.

656 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 REFERENCES [1] R. Tummala and V. Madisetti, System on chip or system on package?

A new microsystem-integration technology paradigm-moore s law for system integration of miniaturized convergent systems of the next decade, IEEE Trans. Adv. Packag., vol. 27, no. 2, pp.

13 656 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 REFERENCES [1] R. Tummala and V. Madisetti, System on chip or system on package?, IEEE Design Test Comput., vol. 16, no. 2, pp , Apr./Jun [2] R. Tummala, SOP: What is it and why? A new microsystem-integration technology paradigm-moore s law for system integration of miniaturized convergent systems of the next decade, IEEE Trans. Adv. Packag., vol. 27, no. 2, pp , May [3] R. R. Tummala et al., The SOP for miniaturized, mixed-signal computing, communication, and consumer systems of the next decade, IEEE Trans. Adv. Packag., vol. 27, no. 2, pp , May [4] K. Kundert, H. Chang, D. Jefferies, G. Lamant, E. Malavasi, and F. Sendig, Design of mixed-signal systems-on-a-chip, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 19, no. 12, pp , Dec [5] G. Gielen and R. Rutenbar, Computer-aided design of analog and mixed-signal integrated circuits, Proc. IEEE, vol. 88, no. 12, pp , Dec [6] Y. Deng and W. Maly, Interconnect characteristics of 2.5-D system integration scheme, in Proc. Int. Symp. Phys. Design, 2001, p [7] S. Das, A. Chandrakasan, and R. Reif, Design tools for 3-D integrated circuits, in Proc. Asia South Pacific Design Automat. Conf., 2003, pp [8] B. Goplen and S. Sapatnekar, Efficient thermal placement of standard cells in 3-D IC s using a force directed approach, in Proc. IEEE Int. Conf. Comput.-Aided Design, Nov. 2003, pp [9] J. Minz and S. K. Lim, Layer assignment for system on packages, in Proc. Asia South Pacific Design Automat. Conf., Jan. 2004, pp [10] J. Minz, M. Pathak, and S. K. Lim, Net and pin distribution for 3-D package global routing, in Proc. Design, Automat. Test Europe, Feb. 2004, pp [11] R. Ravichandran, J. Minz, M. Pathak, S. Easwar, and S. K. Lim, Physical layout automation for system-on-packages, in Proc. IEEE Electron. Comp. Technol. Conf., vol. 1, Jun. 2004, pp [12] J. Minz, S. K. Lim, J. Choi, and M. Swaminathan, Module placement for power supply noise and wire congestion avoidance in 3-D packaging, in Proc. IEEE Electr. Perform. Electron. Packag., 2004, pp [13] J. Minz and S. K. Lim, A global router for system-on-package targeting layer and crosstalk minimization, in Proc. IEEE Electr. Perform. Electron. Packag., 2004, pp [14] J. Minz, E. Wong, and S. K. Lim, Thermal and congestion-aware physical design for 3-D system-on-pacakge, in Proc. 55th IEEE Electron. Comp. Technol. Conf., May 31 Jun , pp [15] J. Choi, S. Chun, N. Na, M. Swaminathan, and L. Smith, A methodology for the placement and optimization of decoupling capacitors for gigahertz systems, in Proc. VLSI Design Symp., 2000, pp [16] A. Kamo, T. Watanabe, and H. Asai, An optimization method for placement of decoupling capacitors on printed circuit board, in Proc. IEEE Electr. Perform. Electron. Packag., 2000, pp [17] Y. Chen, Z. Chen, and J. Fang, Optimum placement of decoupling capacitors on packages and printed circuit boards under the guidance of electromagnetic field simulation, in Proc. IEEE Electron. Comp. Technol. Conf., Orlando, FL, May 1996, pp [18] J. Lee, Thermal placement algorithm based on heat conduction analogy, IEEE Trans. Compon. Packag. Technol., vol. 26, no. 2, pp , Jun [19] K. Deb, P. Jain, N. Gupta, and H. Maji, Multiobjective placement of electronic components using evolutionary algorithms, IEEE Trans. Compon. Packag. Technol., vol. 27, no. 3, pp , Sep [20] K. Khoo and J. Cong, An efficient multilayer MCM router based on four-via routing, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 14, no. 10, pp , Oct [21] J. D. Cho, K. Liao, S. Raje, and M. Sarrafzadeh, M2R: Multilayer routing algorithm for high-performance MCMs, IEEE Trans. Circuits. Syst. I, vol. 41, no. 4, pp , Apr [22] W. W. Dai, Topological routing in surf: Generating a rubberband sketch, in Proc. ACM Design Automat. Conf., 1991, pp [23] D. Wang and E. Kuh, A new timing-driven multilayer MCM/IC routing algorithm, in Proc. IEEE Multi-Chip Module Conf., 1997, pp [24] Y. Cha, C. Rim, and K. Nakajima, A simple and effective greedy multilayer router for MCMs, in Proc. Int. Symp. Physical Design, 1997, pp [25] J. Cong and P. Madden, Performance driven multi-layer general area routing for PCB/MCM designs, in Proc. ACM Design Automat. Conf., Mar. 1998, pp [26] J. Cho and M. Sarrafzadeh, The pin redistribution problem in multichip modules, Math. Programm. Spec. Iss. Appl. Discrete Programm. Comput. Sci., pp , [27] D. Chang, T. Gonzales, and O. Ibarra, A flow based approach to the pin redistribution problem for multi-chip modules, in Proc. Great Lakes Symp. VLSI, 1994, p [28] J. D. Cho, An optimum pin redistribution for multichip modules, in Proc. IEEE Multi-Chip Module Conf. (MCMC 96), 1996, p [29] J. Cong, Pin assignment with global routing for general cell design, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 10, no. 11, pp , Nov [30] W. Elmore, The transient response of damped linear networks with particular regard to wideband amplifiers, J. Appl. Phys., pp , [31] J. Cong, A. B. Kahng, and K.-S. Leung, Efficient algorithms for the minimum shortest path steiner arborescence problem with applications to VLSI physical design, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 17, no. 1, pp , Jan [32] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science, pp , [33] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, Rectangle packing based module placement, in Proc. IEEE Int. Conf. Computer-Aided Design, 1995, pp [34] S. Zhao, C. Koh, and K. Roy, Decoupling capacitance allocation and its application to power supply noise aware floorplanning, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 1, pp , Jan [35] J. D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S. M. Kang, Crosstalk-minimum layer assignment, in Proc. IEEE Custom Integr. Circuits Conf., San Diego, CA, 1993, pp [36] L. McMurchie and C. Ebeling, Pathfinder: A negotiation based performance-driven router for FPGAs, in Proc. ACM Int. Symp. Field-Programmable Gate Arrays, 1995, pp [37] M. Ho, M. Sarrafzadeh, G. Vijayan, and C. Wong, Layer assignment for multichip modules, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 9, no. 12, pp , Dec [38] GSRC, Floorplan Benchmark Suites, Tech. Rep., Online. Available: [39] C. J. Alpert, The ISPD98 circuit benchmark suite, in Proc. Int. Symp. Phys. Design, 1998, pp [40] J. Cong and S. K. Lim, Edge separability based circuit clustering with application to multi-level circuit partitioning, IEEE Trans. Comput.- Aided Design Integr. Circuits Syst., vol. 23, no. 3, pp , Mar [41] N. Na, J. Choi, M. Swaminathan, J. P. Libous, and D. P. O Connor, Modeling and simulation of core switching noise for asics, IEEE Trans. Adv. Packag., vol. 25, no. 1, pp. 4 11, Feb Jacob Rajkumar Minz (S 05) received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, Kharagpur, in 2001 and is currently pursuing the Ph.D. degree at the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta. He was with the Advanced VLSI Design Lab, Indian Institute of Technology, Kharagpur, for one year, where he was involved in the design of digital chips. His areas of interest are physical design automation and algorithms for electronic CAD. Eric Wong (S 05) received the B.S. degree in electrical engineering from the State University of New York at Binghamton in 2004 and is currently pursuing the Ph.D. degree in electrical and computer engineering at the Georgia Institute of Technology, Atlanta. His research interests include physical VLSI design and design automation for VLSI circuits.

degree in the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta. He is a Graduate Research Assistant at Georgia Tech.

14 MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 657 Mohit Pathak (S 05) received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, Kharagpur, in 2004 and is currently pursuing the Ph.D. degree in the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta. He is a Graduate Research Assistant at Georgia Tech. He was with Magma Design Automation, India for one year, where he worked as a Member of Technical Staff. His areas of interest are physical design automation and VLSI design. Sung Kyu Lim (S 94 M 00 SM 05) received the B.S., M.S., and Ph.D. degrees from the Computer Science Department, University of California, Los Angeles (UCLA), in 1994, 1997, and 2000, respectively. From 2000 to 2001, he was a Post-Doctoral Scholar at UCLA, and a Senior Engineer at Aplus Design Technologies, Inc. He joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, as an Assistant Professor in August 2001 and the College of Computing as an Adjunct Assistant Professor in September He served as a Guest Editor for ACM Transactions on Design Automation of Electronic Systems. His research focus is on the physical design automation for 3-D circuits, 3-D system-on-packages, microarchitectural physical planning, field programmable analog arrays, and quantum cell automata. Dr. Lim received the Design Automation Conference Graduate Scholarship in 2003 and the National Science Foundation Faculty Early Career Development (CAREER) Award in He has been on the Advisory Board of the ACM Special Interest Group on Design Automation since He has served the Technical Program Committee of several ACM and IEEE conferences on electronic design automation.

SEMICONDUCTOR industry is beginning to question the

SEMICONDUCTOR industry is beginning to question the IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL.??, NO.??, JANUARY(?) 2005 1 Placement and Routing for 3D System-On-Package Designs Jacob Minz, Eric Wong, Mohit Pathak, and Sung Kyu Lim,