High-Density Integration of Functional Modules Using Monolithic 3D-IC Technology


Shreepad Panth¹, Kambiz Samadi², Yang Du², and Sung Kyu Lim¹
¹Dept. of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
²Qualcomm Research, San Diego, CA

Abstract
Three-dimensional integrated circuits (3D-ICs) have emerged as a promising solution for continued device scaling. They can be realized using through-silicon vias (TSVs), or through monolithic integration using monolithic inter-tier vias (MIVs), an emerging alternative that provides much higher via densities. In this paper, we provide a framework for floorplanning existing IP blocks into 3D-ICs using MIVs. We take the floorplanning solution all the way through place-and-route and report post-layout metrics for area, wirelength, timing, and power consumption. Results show that TSV-based 3D designs outperform 2D designs in wirelength by up to 14%, but only in large-scale circuits. MIV-based 3D designs, however, offer an average wirelength improvement of 33% for a wide range of benchmark circuits. We also show that while TSV-based 3D cannot improve performance and power unless the TSV capacitance is reduced, MIV-based 3D offers significant reductions of up to 33% in the longest path delay and 35% in the inter-block net power.

I. INTRODUCTION
Three-dimensional integrated circuits (3D-ICs) have emerged as a promising solution to extend the scaling trajectory predicted by Moore's Law. Currently, through-silicon vias (TSVs) enable 3D-ICs by allowing vertical stacking of multiple dies that are fabricated separately. However, the quality of TSV-based 3D-ICs strongly depends on the TSV dimensions and parasitics, and such designs are limited to memory-on-logic or large logic-on-logic designs with a relatively small number of global interconnects [1], [2].

Fig. 1. A sample monolithic 3D technology with three metal layers per tier.
An emerging alternative to TSV-based 3D is monolithic 3D, which enables orders of magnitude higher integration density than TSV-based technology due to the extremely small size of the monolithic inter-tier vias (MIVs). Monolithic 3D integration fabricates two or more tiers of devices sequentially, instead of bonding previously fabricated dies using micro-bumps and TSVs. Figure 1 shows a typical monolithic 3D structure with three metal layers per tier. The two device tiers are connected by inter-tier vias, which are essentially the same size as intra-tier vias. To fabricate the top device tier, a low-thermal-budget process must be applied to prevent damage to the back-end-of-line (BEOL) of the underlying tier. Several monolithic 3D integration processes have been developed: CEA/LETI [3], [4] has developed a sequential integration flow based on a low-temperature bonding process, and Samsung [5] has developed an S3 technology for a 3-tier SRAM cell using a low-thermal TFT process. Overall, MIVs provide better electrical characteristics (i.e., lower parasitics and less electrical coupling) than TSVs, and also enable higher integration densities due to their small size.

In this paper, we propose an efficient 3D design space exploration framework (i.e., 3D floorplanning) that accounts for the differing characteristics of TSV-based and monolithic 3D integration technologies. Since re-designing existing logic, memory, and IP blocks for 3D incurs significant design overhead and cost, near-term 3D-ICs will focus on reusing existing blocks [6], [7], [8]. Our floorplanning framework therefore uses different optimization objectives based on the physical characteristics of TSVs and MIVs. To the best of our knowledge, this paper is the first to provide a 3D floorplanning framework specifically for monolithic 3D integration technology.
To further enhance the applicability of our approach, we integrate our proposed 3D block-level floorplanning framework with existing commercial place-and-route (P&R) tools to assess solution quality more accurately. We use four testcases of varying complexity, ranging from 33K to 1.7M gates in 45nm technology, to show the impact of our proposed methodology. The contributions of our work are listed below.
• We propose and develop an integrated 3D block-level floorplanning framework with appropriate objective functions for both TSV-based and monolithic 3D, enabling efficient 3D-IC design space exploration.
• In addition to using simulated annealing for floorplanning, we propose a post-floorplan refinement (PFPR) heuristic that achieves an average reduction of 6.13% in inter-block wirelength with respect to the initial floorplan.
• We propose a methodology for MIV planning that relies on custom scripts and existing commercial P&R tools.
• We develop a methodology that takes the obtained 3D floorplan all the way through place-and-route and reports post-layout timing and power numbers.

II. RELATED WORK
Monolithic design for high-performance ICs was presented in [9]. That work presented two design styles: one in which PMOS and NMOS devices are fabricated on separate layers, and another in which standard cells have both PMOS and NMOS devices on the same tier. The authors also presented a placement algorithm that fully utilizes high-density MIVs. Although their stackup is similar to ours, their study was carried out at the gate level, and the placement algorithm is not applicable to block-level monolithic 3D-ICs. Design for monolithic 3D SRAM was carried out in [10]. The authors provided different design styles for the SRAM cell, assuming separate PMOS and NMOS tiers, and compared them with respect to static noise margin, write margin, and data retention voltage.
While several prior works add TSVs at the gate level or core level, only a few consider adding TSVs into existing whitespace blocks at the floorplanning stage. Simultaneous buffering and TSV planning was carried out in [11], but the authors reported inaccurate 3D HPWL and timing metrics. An improved algorithm was presented in [7], but the same inaccurate HPWL metric was used. Results based on an improved BB-HPWL metric were presented in [8], and the most accurate HPWL metric, based on subnets, was presented in [6]. However, none of these papers compared the quality of their engine with that of a commercially available tool, or took the obtained floorplans through place-and-route and reported post-layout numbers. These shortcomings are overcome in this paper, and the numbers we report are therefore more accurate. To the best of our knowledge, this is the first work to fully exploit the high density offered by monolithic 3D integration, use a validated floorplanner to perform block-level monolithic 3D design, and compare post-layout 3D wirelength, timing, and power numbers with those of a commercial tool.

III. 3D FLOORPLANNING WITH MONOLITHIC INTER-TIER VIAS
A. Problem Formulation and Overview
A general form of the 3D floorplanning problem can be stated as follows: given the number of desired tiers and a set of blocks with their widths and heights, determine the (x, y, z) locations of each block and of all MIVs/TSVs.

Fig. 2. The design flow to obtain a 3D floorplan with TSV/MIV insertion: center-to-center based annealing, update with pin locations, and annealing-based refinement; then TSV planning for TSV-based 3D, or, for monolithic 3D, creating Verilog and DEF files with pins, routing with Encounter, and extracting MIV locations and connectivity; finally, creating a Verilog/DEF file for each die. Each step uses existing work, a custom program, or Cadence Encounter.

The overall design flow is shown in Figure 2. We first perform floorplanning to determine the locations of all blocks, assuming the pins are placed at block centers. Once the block locations are determined, we update the pin locations and perform a refinement step (PFPR) to further minimize wirelength. Depending on whether we are dealing with TSVs or MIVs, we use different via planning engines. Finally, we create a separate Verilog file for each die/tier with the corresponding connectivity information, and a design exchange format (DEF) file with the locations of the blocks and TSVs/MIVs. Each of these steps is explained in the following subsections.

B. Floorplanning Engine
In this step, we take the description of all blocks together with the connectivity information and generate an output floorplan that minimizes a cost function that depends on whether we are using TSV-based or monolithic 3D. We use a simulated annealing engine similar to [6], maintaining a separate sequence pair for each die. We perform the following moves during annealing: (1) change the aspect ratio of a block (or rotate it, in the case of hard blocks), (2) swap two blocks in the positive sequence, the negative sequence, or both, and (3) move or swap two blocks between a pair of dies/tiers. In TSV-based 3D, we need to control the number of TSVs because of their significant silicon area. Hence, the TSV-based 3D cost function is

$$C_{TSV} = \alpha \cdot WL + \beta \cdot A + \gamma \cdot N_{TSV} \qquad (1)$$

where $WL$ represents the inter-block wirelength, $A$ the chip area, $N_{TSV}$ the number of TSVs, and $\alpha$, $\beta$, $\gamma$ are weighting coefficients. In monolithic 3D, however, the MIV size is negligible, so we do not need to constrain the number of MIVs, which opens up the possibility for further optimization. The monolithic 3D cost function is

$$C_{MIV} = \alpha \cdot WL + \beta \cdot A \qquad (2)$$

Considering block-pin locations during floorplanning would require an extra step to compute the physical location of every block-pin. Since the number of block-pins is quite large, this would lead to a large runtime overhead. We instead propose a post-floorplanning refinement (PFPR) step that considers pin locations once block locations have been determined.

C. Post-Floorplan Refinement (PFPR)
After we determine the relative locations of all blocks, we update the blocks with their pin locations. Each block has 8 possible orientations: 0°, 90°, 180°, 270°, and their flipped counterparts. Without changing the relative locations of the blocks in the floorplan, each block can take only four of these orientations. For example, if the pins are at the center of a block, 0° and 180° (and likewise 90° and 270°) and their flipped counterparts all give the same result. However, if the pins are placed along the periphery, each of the four orientations gives a different wirelength. The goal of this step is to determine the orientation of each block such that wirelength is minimized. To do this, we use simulated annealing in which the only allowed move is a change of block orientation among the four allowed options. No sequence pair is necessary, as the relative locations of the blocks do not change. Furthermore, wirelength can be computed incrementally, since only one block changes at a time.

D. MIV Planning Algorithm
Once we obtain the 3D floorplan, we need to insert TSVs, or MIVs in the case of monolithic 3D, to connect blocks on different tiers. Since TSVs are large (around 5μm to 10μm) and the dies may not have enough whitespace, a whitespace manipulation step is required. We use an existing TSV planner [6] that constructs a 3D rectilinear Steiner tree (RST) from a rectilinear Steiner minimum tree (RSMT), and then moves TSVs to nearby whitespace based on a network-flow formulation. If there is insufficient whitespace, we insert whitespace at the desired locations. In monolithic 3D, however, MIVs are very small (around 70nm), so we can safely assume that whitespace is always available for MIV insertion. In this case, we can use existing obstacle-avoiding routers to perform MIV insertion.
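The orientation-only annealing of PFPR (Section III-C) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the four footprint-preserving orientations are identity (N), 180° rotation (S), and the two mirrors (FN, FS), uses half-perimeter wirelength (HPWL) as the wirelength measure, and the names `Block`, `refine_orientations`, and the cooling schedule are hypothetical. Only the nets incident to the moved block are re-evaluated, which is the incremental update the text describes.

```python
import math
import random
from dataclasses import dataclass, field

# Assumed footprint-preserving orientations: identity, 180-degree rotation,
# mirror about the vertical axis (FN), mirror about the horizontal axis (FS).
ORIENTATIONS = ("N", "S", "FN", "FS")


@dataclass
class Block:
    x: float  # lower-left corner, fixed by the floorplan
    y: float
    w: float
    h: float
    pins: dict = field(default_factory=dict)  # pin -> (dx, dy) in orientation N


def pin_location(block, pin, orient):
    dx, dy = block.pins[pin]
    if orient in ("S", "FN"):  # horizontal flip component
        dx = block.w - dx
    if orient in ("S", "FS"):  # vertical flip component
        dy = block.h - dy
    return block.x + dx, block.y + dy


def net_hpwl(net, blocks, orients):
    pts = [pin_location(blocks[b], p, orients[b]) for b, p in net]
    xs = [px for px, _ in pts]
    ys = [py for _, py in pts]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def refine_orientations(blocks, nets, iters=2000, t0=10.0, cooling=0.995, seed=0):
    """Simulated annealing whose only move is re-orienting one block."""
    rng = random.Random(seed)
    orients = {name: "N" for name in blocks}
    nets_of = {name: [] for name in blocks}  # block -> indices of incident nets
    for i, net in enumerate(nets):
        for b in {b for b, _ in net}:
            nets_of[b].append(i)
    cost = [net_hpwl(net, blocks, orients) for net in nets]
    total = sum(cost)
    best_total, best = total, dict(orients)
    names, t = list(blocks), t0
    for _ in range(iters):
        b = rng.choice(names)
        old = orients[b]
        orients[b] = rng.choice([o for o in ORIENTATIONS if o != old])
        # Incremental update: only nets incident to b can change.
        new_cost = {i: net_hpwl(nets[i], blocks, orients) for i in nets_of[b]}
        delta = sum(c - cost[i] for i, c in new_cost.items())
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            for i, c in new_cost.items():
                cost[i] = c
            total += delta
            if total < best_total:
                best_total, best = total, dict(orients)
        else:
            orients[b] = old  # reject: restore the previous orientation
        t *= cooling
    return best, best_total


if __name__ == "__main__":
    # Hypothetical example: two 10x10 blocks 10 units apart, with pins on
    # their outer edges; flipping both brings the pins face to face.
    blocks = {
        "A": Block(0, 0, 10, 10, {"p": (0, 5)}),
        "B": Block(20, 0, 10, 10, {"q": (10, 5)}),
    }
    nets = [[("A", "p"), ("B", "q")]]
    print(refine_orientations(blocks, nets))
```

Because block positions and footprints are fixed, the search space per block is only four states, so the annealer here mainly serves to escape poor combinations when many blocks share nets; the best solution seen is returned rather than the final one.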
We use the IC router in Cadence SOC Encounter; since it is limited to 15 metal layers, we use three metal layers to represent each tier, for the MIV planning stage only. This allows us to represent up to 5 tiers. For example, if a block is on tier 2, we use metal layer 4 to place its block-pins, and metal layers 5 and 6 to represent inter-block routing on that tier. Vias between metal 6 and 7 represent MIVs between tiers 2 and 3. Our choice of the number of metal layers is justified because we only route the inter-block nets in our block-level monolithic 3D designs, and they are routed in the top 2 or 3 metal layers of each tier.

Algorithm 1: MIV Planning Algorithm
Input: locations of all blocks in B, block orientations, block-pin locations, and connectivity information
Output: number, location, and connectivity information of MIVs
1   for n ← 1 to N_net do
2       add connectivity information into a Verilog file;
3   end
4   for i ← 1 to |B| do
5       for p ← 1 to N_pin(b_i) do
6           add pin physical location (x_p(b_i), y_p(b_i), l(b_i)) in the DEF;
7       end
8       add routing blockage for b_i on its assigned layer l(b_i);
9   end
10  read the above Verilog and DEF files into SOC Encounter;
11  route the design and save the routed DEF file;
12  read the routed DEF file and reconstruct the routing graphs;
13  extract corresponding subnets in each die/tier from the routing graphs;
14  create Verilog file for each die/tier with subnet connectivity;
15  create DEF file for each die/tier with MIV locations;

TABLE I
DESIGN STATISTICS FOR ALL BENCHMARKS (… marks unavailable digits)
Design | # Gates | # Blocks | # Inter-block nets | Intra-block WL (μm) | Target period (ns)
des_perf | 33,… | … | … | … | …
cf_rca_16 | … | … | …,135 | 1,210,… | …
cf_fft_256_8 | … | … | …,402 | 4,490,… | …
mult | …,639,… | … | …,471 | 12,354,… | …

Our MIV planning heuristic starts by creating a netlist that contains the connectivity information of the pins of all 3D nets, as shown in Lines 1 to 3 of Algorithm 1, where N_net denotes the total number of 3D nets. We then create a DEF file that contains the physical location of every pin of each block; x_p(b_i) and y_p(b_i) denote the x and y coordinates of pin p of block b_i, respectively, and l(b_i) denotes the metal layer to which block b_i is assigned. In addition, we add routing blockages for each block to account for (1) the fact that MIVs cannot be placed within blocks and (2) the internal wiring of each block (Lines 4 to 9). Next, we give the Verilog and DEF files to SOC Encounter to route all 3D nets simultaneously (Lines 10 and 11).
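Lines 12 and 13 of Algorithm 1 (reconstructing the routing graph and splitting each 3D net into per-tier subnets) can be sketched with a union-find pass. This is a hypothetical illustration, not the authors' script: it assumes the routed net has already been parsed from the DEF into nodes with a metal layer and edges for wires and vias, and it uses the three-layers-per-tier mapping described above, so any via whose ends fall on different tiers is treated as an MIV. The names `tier_of` and `extract_subnets` are illustrative.

```python
from collections import defaultdict

LAYERS_PER_TIER = 3  # planning stack: pin layer + two routing layers per tier


def tier_of(layer):
    """Metal layers 1-3 -> tier 1, 4-6 -> tier 2, and so on."""
    return (layer - 1) // LAYERS_PER_TIER + 1


def extract_subnets(nodes, edges):
    """Split one routed 3D net into per-tier subnets and MIVs.

    nodes: node id -> (x, y, metal_layer) for pins, wire ends and via ends.
    edges: (u, v) pairs; a wire joins nodes on one layer, a via joins
    adjacent layers. A via whose two ends lie on different tiers is an MIV.
    Returns (subnets, mivs): tier -> list of node-id sets, and a list of
    (x, y, lower_tier) tuples.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    mivs = []
    for u, v in edges:
        tu, tv = tier_of(nodes[u][2]), tier_of(nodes[v][2])
        if tu != tv:
            x, y, _ = nodes[u]
            mivs.append((x, y, min(tu, tv)))  # cut here: separates subnets
        else:
            parent[find(u)] = find(v)  # same tier: merge into one subnet

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    subnets = defaultdict(list)
    for members in groups.values():
        layer = nodes[next(iter(members))][2]
        subnets[tier_of(layer)].append(members)
    return dict(subnets), mivs


if __name__ == "__main__":
    # Hypothetical net from a tier-1 block pin (layer 1) to a tier-2 block
    # pin (layer 4); the via from layer 3 to layer 4 crosses the tier boundary.
    nodes = {
        "a": (0, 0, 1), "b": (0, 0, 2), "c": (5, 0, 2), "d": (5, 0, 3),
        "e": (5, 0, 4), "f": (5, 0, 5), "g": (9, 0, 5), "h": (9, 0, 4),
    }
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
             ("e", "f"), ("f", "g"), ("g", "h")]
    print(extract_subnets(nodes, edges))
```

Each subnet becomes a net in that tier's Verilog file, and each MIV becomes a top-level connection between two subnets, which is what Lines 14 and 15 write out.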
Owing to the small size of MIVs, routing all 3D nets simultaneously avoids possible congestion issues. Once we obtain the routed DEF, we trace the routing topology to determine (1) which MIV belongs to which net and (2) which block-pin each MIV connects to (Lines 12 and 13). Finally, we generate the Verilog and DEF files for each tier (Lines 14 and 15), which contain the block and MIV locations.

IV. EVALUATION
A. Experimental Setup
All code and scripts are implemented in C/C++ and Python, and all experiments are carried out on a 2.5 GHz 64-bit Linux system. The 45nm Nangate open-source standard cell library is used in our experiments. The TSV diameter, landing pad size, pitch, and thickness are assumed to be 6μm, 7μm, 10μm, and 50μm, respectively. The MIV diameter, pitch, and thickness are 0.07μm, 0.28μm, and 0.31μm, respectively. The TSV resistance and capacitance are 50mΩ and 122fF, respectively; these parasitics are measured values taken from [12]. The MIV resistance and capacitance are similar to those of local vias, at 4Ω and 1fF, respectively. The monolithic structure is similar to that of Figure 1, except that we use six metal layers per tier.

Fig. 3. Our design flow used to get post-layout simulation results.

We consider four benchmarks in this work, whose statistics are shown in Table I. The first three are taken from the OpenCores benchmark suite [13], and the fourth is a custom-built 256-bit integer multiplier. This multiplier is built out of 256x4-bit multiplier blocks and 512-bit adder blocks arranged into an adder tree. Each multiplier block has 3 pipeline stages, and each adder block has 4 pipeline stages. The design flow used to obtain all results is shown in Figure 3. It consists of two main steps: block design, and top-level design and analysis.

1) Block Design: We begin by designing each block separately in Cadence SOC Encounter.
The netlist for each block is obtained by grouping modules bottom-up along the hierarchy until they reach a certain area threshold. Timing constraints for each block depend on the overall system frequency and are determined by context characterization. Each block is then placed, routed, and timing-optimized in SOC Encounter. This step finalizes the pin locations within each block. We choose four blocks at random from the cf_rca_16 testcase and show their layouts in Figure 4.

2) Top-level Design and Analysis: We perform floorplanning using the methodology described in Section III-B. Three floorplanning methodologies are considered. The first two, (1) TSV-based 3D (TSV) and (2) monolithic 3D (MIV), have already been described. The third, MIV_TF, uses the same floorplan output as the TSV case (before whitespace insertion), but uses the MIV-planning engine instead of the TSV-planning engine. This compares the quality of the two via-planning methodologies starting from the same floorplan. MIV_TF can use more MIVs than the TSV case uses TSVs, because, owing to their small size, multi-pin nets may use far more MIVs. Sample floorplanning layouts and 2-die implementations of cf_rca_16 are shown in Figure 4.

We next route each die separately in SOC Encounter and perform parasitic extraction to obtain the SPEF file for each die. In addition, we create a top-level Verilog file with the interconnections between dies, and a top-level SPEF file with the TSV/MIV parasitics. All netlist and parasitic information is then fed into Synopsys PrimeTime to obtain true 3D timing and power numbers.

Fig. 4. Some sample layouts for the cf_rca_16 testcase, along with select block designs, and zoomed-in shots of TSVs and MIVs.

B. Experimental Results and Discussions
1) Floorplanner Validation: We run our floorplanner in 2D mode and compare it with the results obtained from wirelength-driven floorplanning in Cadence Encounter. The Encounter footprint area is obtained by gradually increasing the area and running floorplanning until no block overlap is observed. The results are summarized in Table II.

TABLE II
A COMPARISON OF THE PERFORMANCE OF OUR FLOORPLANNER AND CADENCE ENCOUNTER (normalized values in parentheses; … marks unavailable values)
Design | Footprint, Encounter (mm²) | Footprint, Ours (mm²) | Inter-block WL, Encounter (m) | Inter-block WL, Ours (m)
des_perf | 0.0655 (1.00) | 0.0604 (0.92) | 0.352 (1.00) | 0.356 (1.01)
cf_rca_16 | 0.445 (1.00) | 0.413 (0.93) | 0.361 (1.00) | 0.368 (1.02)
cf_fft_256_8 | … (1.00) | … (0.68) | … (1.00) | … (1.06)
mult | … (1.00) | … (0.94) | … (1.00) | … (1.05)

The large area reduction in the cf_fft_256_8 design is due to the fact that Cadence Encounter repeatedly produces module overlaps when given a smaller area. This is presumably due to a bug in the legalization stage of SOC Encounter. Encounter can still provide wirelength comparable to that of our floorplanner, however, because this particular testcase is only locally connected, and each block communicates with only one or two neighbors. As seen from Table II, our floorplanner produces results comparable to those of SOC Encounter.

2) Comparison of 2D versus 3D: In this section, we compare the wirelength, timing, and top-level net power of the 2D and 3D cases of all designs. The clock period assumed for total negative slack (TNS) and power calculation is taken from Table I. The components of net power are explained in Figure 5. We have intra-block nets and inter-block nets. The inter-block net power is further split into three components: (1) the intra-block component (OBN-Int.), (2) the inter-block component (OBN-Top), and (3) the pin component of the loading cell (OBN-Pin). At the block level, the only component of net power that can be optimized is OBN-Top. Furthermore, since we do not have a true 3D timing optimization engine, we report pre-optimization timing and power numbers. The results for all designs are summarized in Table III.

Fig. 5. Various components of net power reported in this paper.
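To make the OBN components concrete, the sketch below attributes switching power to each capacitive part of an inter-block net using the common dynamic-power expression P = ½·α·C·V²·f. This is an illustrative assumption, not the PrimeTime power model used for the reported numbers; the activity, supply voltage, frequency, and capacitances are hypothetical, and only the 122fF (TSV) and 1fF (MIV) via capacitances come from Section IV-A. Setting the OBN-Top capacitance to zero mimics the "ideal interconnections" reference case.

```python
from dataclasses import dataclass


@dataclass
class InterBlockNet:
    activity: float  # toggle rate per clock cycle (hypothetical)
    c_int_ff: float  # wire capacitance inside the blocks (OBN-Int.)
    c_top_ff: float  # top-level wire + TSV/MIV capacitance (OBN-Top)
    c_pin_ff: float  # input-pin capacitance of the loading cells (OBN-Pin)


def switching_power_mw(activity, c_ff, vdd, f_hz):
    """P = 1/2 * alpha * C * V^2 * f, with C in fF and the result in mW."""
    return 0.5 * activity * (c_ff * 1e-15) * vdd ** 2 * f_hz * 1e3


def obn_power_split(nets, vdd, f_hz, ideal_top=False):
    """Sum each OBN component over all inter-block nets.

    ideal_top=True zeroes the OBN-Top capacitance, mirroring the
    'ideal interconnections' reference case."""
    split = {"OBN-Int.": 0.0, "OBN-Top": 0.0, "OBN-Pin": 0.0}
    for n in nets:
        c_top = 0.0 if ideal_top else n.c_top_ff
        split["OBN-Int."] += switching_power_mw(n.activity, n.c_int_ff, vdd, f_hz)
        split["OBN-Top"] += switching_power_mw(n.activity, c_top, vdd, f_hz)
        split["OBN-Pin"] += switching_power_mw(n.activity, n.c_pin_ff, vdd, f_hz)
    return split


if __name__ == "__main__":
    # The same hypothetical net with a 100 fF top-level wire plus either
    # one TSV (122 fF) or one MIV (1 fF).
    tsv_net = InterBlockNet(0.1, 20.0, 100.0 + 122.0, 5.0)
    miv_net = InterBlockNet(0.1, 20.0, 100.0 + 1.0, 5.0)
    for name, net in (("TSV", tsv_net), ("MIV", miv_net)):
        print(name, obn_power_split([net], vdd=1.0, f_hz=1e9))
```

Since only the OBN-Top term depends on the floorplan and via type, this split shows why a single TSV can more than double a short net's controllable power while an MIV barely changes it.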
From Table III, we see that monolithic 3D gives a significant advantage in inter-block wirelength. The total wirelength reduction depends on the ratio of inter-block to intra-block wirelength, and therefore varies from circuit to circuit. TSV-based 3D, however, does not give any wirelength improvement for the small design des_perf, and we start to see small improvements in the cf_rca_16 and cf_fft_256_8 testcases. For the largest design, however, we see no improvement, mainly because nets must travel a large distance to the nearest whitespace block to place a TSV. Also, as expected, MIV_TF gives better wirelength than the TSV-based method, but worse than the MIV case.

With respect to timing and net power, the MIV case improves the longest path delay (LPD), the total negative slack (TNS), and the top-level net power. The timing of MIV_TF is sometimes better than that of MIV, since wirelength-driven floorplanning does not guarantee the best timing. In the benchmarks considered, except for the 2-die case of cf_fft_256_8, the TSV case does not give any timing or power improvement over 2D. This is because the large 122fF TSV capacitance is equivalent to more than 700μm of Metal 10 wire in the 45nm technology, so a significant number of such long wires is required before TSVs yield a sensible reduction. In general, the reduction in top-level net power for MIV follows the reduction in top-level net wirelength. The only exception is mult: here, our 2D design has 43% more power than the Encounter design, with only 5% more wirelength. This is because power consumption depends on the wirelength distribution, and our floorplanner produces solutions in which the longer nets have higher switching activity.
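The 700μm comparison is a capacitance ratio, and the arithmetic can be made explicit. The sketch below uses only the numbers stated in the text (122fF per TSV, 1fF per MIV, and "more than 700μm" of Metal 10 wire), so the per-micron wire capacitance it derives is an upper bound implied by that statement, not a value taken from the Nangate library. At that bound, one MIV is worth about 5.7μm of the same wire, i.e. 1/122 of a TSV.

```python
C_TSV_FF = 122.0          # TSV capacitance (Section IV-A)
C_MIV_FF = 1.0            # MIV capacitance (Section IV-A)
TSV_EQUIV_LEN_UM = 700.0  # "more than 700um of Metal 10 wire"


def max_wire_cap_ff_per_um(c_via_ff, min_equiv_len_um):
    """Upper bound on wire capacitance per um implied by
    'one via is worth more than min_equiv_len_um of wire'."""
    return c_via_ff / min_equiv_len_um


def equivalent_wire_len_um(c_via_ff, c_wire_ff_per_um):
    """Length of wire whose capacitance equals one via's capacitance."""
    return c_via_ff / c_wire_ff_per_um


c_bound = max_wire_cap_ff_per_um(C_TSV_FF, TSV_EQUIV_LEN_UM)
print(f"Metal 10 capacitance <= {c_bound:.3f} fF/um")  # 0.174
print(f"one MIV ~ {equivalent_wire_len_um(C_MIV_FF, c_bound):.2f} um of wire")  # 5.74
```

The ratio, not the absolute wire capacitance, is what matters here: whatever the true per-micron value, a TSV loads a net 122 times more than an MIV.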
Overall, we conclude that monolithic 3D can provide significant benefits over 2D even for small designs, whereas TSV-based 3D is suitable for designs with a large number of long interconnections or for memory-on-logic stacking applications; for logic-on-logic stacking, improvements will be observed only with smaller TSV parasitics.

3) Power benefit of monolithic 3D: We provide a detailed pre-optimization power breakdown of all four testcases in Table IV, with the legend explained in Figure 5. We compare 2D with MIV-based 3D, and also provide a reference case with ideal interconnections. This ideal case does not correspond to any real physical scenario, but represents the theoretical minimum power consumption at the block level. Its values are obtained by setting the parasitics of the OBN-Top nets to zero in PrimeTime.

TABLE III
A COMPARISON OF WIRELENGTH, TIMING AND TOP NET POWER OF 2D VERSUS 3D (normalized values in parentheses; … marks unavailable digits)
Config | Footprint (μm × μm) | #MIV/#TSV | Inter-block routed WL (μm) | Total routed WL (μm) | LPD (ns) | TNS (ns) | OBN-Top power (mW)
des_perf
  2D Encounter | 256x… | n/a | …,805 (1.00) | 563,293 (1.00) | 1.65 (1.00) | … (1.00) | … (1.00)
  2D Ours | 251x… | n/a | …,489 (1.01) | 566,977 (1.01) | 1.73 (1.05) | … (1.21) | … (1.06)
  MIV 2 dies | 146x… | … | …,678 (0.76) | 478,166 (0.85) | 1.44 (0.87) | … (0.58) | 8.55 (0.76)
  MIV 3 dies | 127x… | … | …,240 (0.63) | 432,728 (0.77) | 1.23 (0.74) | … (0.35) | 7.29 (0.65)
  MIV 4 dies | 111x… | … | …,868 (0.58) | 415,356 (0.74) | 1.10 (0.67) | … (0.17) | 6.41 (0.57)
  TSV 2 dies | 215x… | … | …,092 (1.34) | 683,580 (1.21) | 2.18 (1.32) | … (1.92) | … (1.88)
  TSV 3 dies | 320x… | … | …,267 (1.46) | 725,755 (1.29) | 2.46 (1.49) | … (3.33) | … (2.69)
  TSV 4 dies | 359x… | … | …,739 (2.08) | 945,227 (1.68) | 4.09 (2.48) | … (4.37) | … (4.28)
  MIV_TF 2 dies | 213x… | … | …,823 (1.05) | 581,311 (1.03) | 2.06 (1.25) | … (1.35) | … (1.13)
  MIV_TF 3 dies | 211x… | … | …,226 (1.00) | 563,714 (1.00) | 1.65 (1.00) | … (1.2) | … (1.04)
  MIV_TF 4 dies | 186x… | … | …,356 (0.68) | … (0.80) | 1.25 (0.75) | … (0.40) | 7.29 (0.65)
cf_rca_16
  2D Encounter | 667x… | n/a | …,673 (1.00) | 1,572,291 (1.00) | 1.85 (1.00) | -2,… (1.00) | 4.71 (1.00)
  2D Ours | 555x… | n/a | …,542 (1.02) | 1,578,160 (1.00) | 1.75 (0.95) | -2,… (0.78) | 4.73 (1.00)
  MIV 2 dies | 416x… | … | …,156 (0.80) | 1,499,774 (0.95) | 1.73 (0.94) | -1,… (0.71) | 3.74 (0.79)
  MIV 3 dies | 367x… | … | …,910 (0.71) | 1,466,258 (0.93) | 1.72 (0.93) | -1,… (0.63) | 3.61 (0.77)
  MIV 4 dies | 273x… | … | …,583 (0.67) | 1,451,201 (0.92) | 1.69 (0.92) | -1,… (0.57) | 3.37 (0.72)
  TSV 2 dies | 484x… | … | …,347 (1.07) | 1,564,965 (1.00) | 2.48 (1.34) | -11,093 (4.02) | 7.49 (1.59)
  TSV 3 dies | 377x… | … | …,425 (1.11) | 1,612,043 (1.03) | 3.23 (1.75) | -16,074 (5.82) | … (2.44)
  TSV 4 dies | 350x… | … | …,090 (0.95) | 1,555,708 (0.99) | 3.63 (1.97) | -18,825 (6.81) | 13.4 (2.85)
  MIV_TF 2 dies | 438x… | … | …,631 (0.89) | 1,534,249 (0.98) | 1.79 (0.97) | -2,463.8 (0.89) | 4.12 (0.87)
  MIV_TF 3 dies | 375x… | … | …,093 (0.78) | 1,491,711 (0.95) | 1.65 (0.90) | -1,… (0.49) | 3.7 (0.79)
  MIV_TF 4 dies | 317x… | … | …,092 (0.73) | 1,473,710 (0.94) | 1.66 (0.90) | … (0.45) | 3.45 (0.73)
cf_fft_256_8
  2D Encounter | 1,300x1,… | n/a | …,674 (1.00) | 4,904,487 (1.00) | 2.18 (1.00) | -22,308 (1.00) | 7.7 (1.00)
  2D Ours | 1,142x… | n/a | …,933 (1.06) | 4,927,746 (1.00) | 2.12 (0.97) | -11,388 (0.51) | 8.2 (1.06)
  MIV 2 dies | 819x… | … | …,787 (0.64) | 4,754,600 (0.97) | 1.96 (0.90) | -3,618 (0.16) | 5.3 (0.69)
  MIV 3 dies | 581x… | … | …,256 (0.61) | 4,745,069 (0.97) | 1.9 (0.87) | -4,447 (0.20) | 5.06 (0.66)
  MIV 4 dies | 595x… | … | …,049 (0.65) | 4,759,862 (0.97) | 1.85 (0.85) | -4,023 (0.18) | 5.29 (0.69)
  TSV 2 dies | 679x… | … | …,166 (0.89) | 4,859,979 (0.99) | 2.1 (0.96) | -14,655 (0.66) | 9.22 (1.20)
  TSV 3 dies | 653x… | … | …,592 (0.86) | 4,848,405 (0.99) | 2.47 (1.13) | -34,950 (1.57) | 11.1 (1.44)
  TSV 4 dies | 584x… | … | …,216 (1.02) | 4,913,029 (1.00) | 3.22 (1.48) | -67,602 (3.03) | … (2.16)
  MIV_TF 2 dies | 675x… | … | …,887 (0.87) | 4,848,700 (0.99) | 1.87 (0.86) | -6,314 (0.28) | 6.82 (0.89)
  MIV_TF 3 dies | 649x… | … | …,045 (0.82) | 4,829,858 (0.98) | 1.74 (0.80) | -1,358 (0.06) | 6.24 (0.81)
  MIV_TF 4 dies | 578x… | … | …,465 (0.75) | 4,801,278 (0.98) | 1.85 (0.85) | -1,626 (0.07) | 5.74 (0.75)
mult
  2D Encounter | 2,280x2,… | n/a | …,089,968 (1.00) | 29,444,308 (1.00) | 1.12 (1.00) | … (1.00) | … (1.00)
  2D Ours | 2,144x2,… | n/a | …,870,346 (1.05) | 30,224,686 (1.03) | 1.27 (1.14) | … (1.17) | … (1.43)
  MIV 2 dies | 1,506x1,… | …,513 | 13,815,376 (0.81) | 26,169,716 (0.89) | 1.17 (1.05) | … (1.16) | … (1.01)
  MIV 3 dies | 1,286x1,… | …,682 | 11,392,196 (0.67) | 23,746,536 (0.81) | 0.95 (0.85) | … (0.62) | 125 (0.87)
  MIV 4 dies | 1,177x1,… | …,994 | 10,116,222 (0.59) | 22,470,562 (0.76) | 0.97 (0.87) | … (0.60) | … (0.77)
  TSV 2 dies | 1,608x1,… | …,683 | 18,825,744 (1.10) | 31,180,084 (1.06) | 1.76 (1.58) | … (2.04) | … (2.11)
  TSV 3 dies | 1,508x1,… | …,599 | 21,184,404 (1.24) | 33,538,744 (1.14) | 2.02 (1.8) | … (3.87) | … (2.58)
  TSV 4 dies | 1,240x1,… | …,232 | 20,890,062 (1.22) | 33,244,402 (1.13) | 2.45 (2.19) | … (4.37) | … (2.61)
  MIV_TF 2 dies | 1,601x1,… | …,162 | 16,127,948 (0.94) | 28,482,288 (0.97) | 1.06 (0.95) | … (0.95) | … (1.30)
  MIV_TF 3 dies | 1,501x1,… | … | …,560,50 (0.89) | 27,610,390 (0.94) | 0.99 (0.88) | … (0.86) | … (1.25)
  MIV_TF 4 dies | 1,182x1,… | …,260 | 15,1246,51 (0.89) | 27,478,991 (0.93) | 1.12 (1.00) | … (1.07) | … (1.30)

Fig. 6. Timing slack histograms comparing 2D and MIV-based 3D (2 dies) for the FFT benchmark. Negative slacks are shown in red, and positive slacks in green.
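The average reductions quoted in the discussion of power and timing (51% in TNS, 3.1% in total power) can be recovered from the normalized MIV entries of Table III and Table IV. The sketch assumes an unweighted mean over the four benchmarks and the 2-, 3-, and 4-die configurations; the averaging method is not stated in the text, and `average_reduction` is a hypothetical helper name.

```python
from statistics import mean

# Normalized MIV entries (2, 3 and 4 dies), relative to the 2D Encounter
# baseline of each benchmark.
MIV_TNS = {  # TNS column of Table III
    "des_perf": [0.58, 0.35, 0.17],
    "cf_rca_16": [0.71, 0.63, 0.57],
    "cf_fft_256_8": [0.16, 0.20, 0.18],
    "mult": [1.16, 0.62, 0.60],
}
MIV_TOTAL_POWER = {  # Total column of Table IV
    "des_perf": [0.95, 0.93, 0.91],
    "cf_rca_16": [0.99, 0.99, 0.99],
    "cf_fft_256_8": [0.99, 0.99, 0.99],
    "mult": [0.98, 0.97, 0.95],
}


def average_reduction(table):
    """1 - unweighted mean of all normalized entries in the table."""
    return 1.0 - mean(v for row in table.values() for v in row)


print(f"TNS: {average_reduction(MIV_TNS):.1%}")                  # TNS: 50.6%
print(f"total power: {average_reduction(MIV_TOTAL_POWER):.1%}")  # total power: 3.1%
```

An unweighted mean treats a 2-die des_perf result the same as a 4-die mult result; weighting by design size would give different, larger-design-dominated averages.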
With a reduction in the wirelength of top-level nets, we expect a reduction in the following power components: (1) the inter-block component of inter-block nets (OBN-Top), and (2) the switching power of the standard cells driving inter-block nets. From Table IV, we see that even theoretically, only a 10% average reduction in total power consumption is possible, and the reduction is larger for designs with relatively more inter-block nets. We also see that MIV-based 3D gives a 3.1% average reduction in total power consumption across our four testcases. If we consider the parameter being optimized by floorplanning, i.e., OBN-Top, we see that monolithic 3D yields a large reduction in power consumption. The reduction in driving-cell power is present in all testcases, but is most noticeable in mult, which has a huge number of driving cells.

Since we do not have a true 3D timing optimization tool, we cannot compare post-optimization numbers directly. However, we can predict the trend from the TNS reduction (Table III) and the timing slack histograms (shown for the cf_fft_256_8 testcase in Figure 6). Due to the average reduction of 51% in TNS, fewer buffer insertions and less cell upsizing will be required to meet timing. Also, since the entire slack histograms are shifted to the right, techniques such as timing slack redistribution or multi-Vth design can be employed to achieve further power benefits.

TABLE IV
A DETAILED SPLIT-UP OF THE POWER FOR 2D AND MONOLITHIC 3D (IN mW; normalized values in parentheses; … marks unavailable values)
Config | Std. Cell | Leakage | IBN | OBN-Pin | OBN-Int. | OBN-Top | Total
des_perf
  Ideal interconnections | … | … | … | … | … | (-) | 50.1 (0.80)
  2D Encounter | … | … | … | … | … | … (1.00) | 62.5 (1.00)
  2D Ours | … | … | … | … | … | … (1.06) | 63.3 (1.01)
  MIV 2 dies | … | … | … | … | … | … (0.76) | 59.5 (0.95)
  MIV 3 dies | … | … | … | … | … | … (0.65) | 58.1 (0.93)
  MIV 4 dies | … | … | … | … | … | … (0.57) | 57.1 (0.91)
cf_rca_16
  Ideal interconnections | … | … | … | … | … | (-) | … (0.97)
  2D Encounter | … | … | … | … | … | … (1.00) | … (1.00)
  2D Ours | … | … | … | … | … | … (1.00) | … (1.00)
  MIV 2 dies | … | … | … | … | … | … (0.79) | … (0.99)
  MIV 3 dies | … | … | … | … | … | … (0.77) | … (0.99)
  MIV 4 dies | … | … | … | … | … | … (0.72) | … (0.99)
cf_fft_256_8
  Ideal interconnections | … | … | … | … | … | (-) | … (0.98)
  2D Encounter | … | … | … | … | … | … (1.00) | … (1.00)
  2D Ours | … | … | … | … | … | … (1.06) | … (1.00)
  MIV 2 dies | … | … | … | … | … | … (0.69) | … (0.99)
  MIV 3 dies | … | … | … | … | … | … (0.66) | … (0.99)
  MIV 4 dies | … | … | … | … | … | … (0.69) | … (0.99)
mult
  Ideal interconnections | … | … | … | … | … | (-) | … (0.84)
  2D Encounter | … | … | … | … | … | … (1.00) | … (1.00)
  2D Ours | … | … | … | … | … | … (1.43) | … (1.04)
  MIV 2 dies | … | … | … | … | … | … (1.01) | … (0.98)
  MIV 3 dies | … | … | … | … | … | … (0.87) | … (0.97)
  MIV 4 dies | … | … | … | … | … | … (0.77) | … (0.95)

C. Design Guidelines for Block-Level MIV-Based 3D
We consider two possible scenarios: timing-critical and power-critical designs. For timing-critical designs, we have shown that MIV-based 3D can give a significant reduction in longest path delay as well as in total negative slack. Larger reductions in delay will be seen for designs with combinational paths through blocks. For power-critical designs, we have shown that MIV-based 3D gives a significant reduction in inter-block net power and, depending on the number of inter-block nets, significant savings in the power of the cells driving inter-block nets. Further power reduction can be achieved in one of several ways: (1) re-designing the blocks to downsize inter-block drivers, (2) voltage scaling of the 3D system, which will shift the entire timing distribution back to that of the 2D case, and (3) multi-Vth optimization, which will require fewer low-Vth cells to meet timing, reducing device power.

V. CONCLUSIONS
In this paper, we provided a floorplanning framework for monolithic 3D-ICs, and a methodology to obtain post-layout wirelength, timing, and power numbers for block-level 3D-ICs. We demonstrated that monolithic inter-tier via (MIV)-based 3D-ICs can achieve up to 42% reduction in wirelength compared with 2D ICs. In addition, we compared our monolithic 3D designs to through-silicon-via (TSV)-based 3D-IC designs in terms of area, wirelength, power, and performance. We observed that TSV-based 3D is only beneficial if either the TSV capacitance scales down or the circuit has a large number of long wires. We also showed that, due to a significant reduction in total negative slack and an increase in positive slack, MIV-based 3D-ICs require less timing optimization. Moreover, with the application of advanced methods such as multi-Vth design, further power reduction is possible.

REFERENCES
[1] K. Yang, D. H. Kim, and S. K. Lim, "Design Quality Tradeoff Studies for 3D ICs Built with Nano-scale TSVs and Devices," in Proc. Int. Symp. on Quality Electronic Design, 2012.
[2] X. Dong, J. Zhao, and Y. Xie, "Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3D-ICs," in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2010.
[3] P. Batude et al., "Advances in 3D CMOS Sequential Integration," in Proc. IEEE Int. Electron Devices Meeting, 2009.
[4] O. Thomas et al., "Compact 6T SRAM cell with robust read/write stabilizing design in 45nm Monolithic 3D IC technology," in Proc. IEEE Int. Conf. on Integrated Circuit Design and Tech., 2009.
[5] S.-M. Jung, H. Lim, K. Kwak, and K. Kim, "500-MHz DDR High-Performance 72-Mb 3-D SRAM Fabricated With Laser-Induced Epitaxial c-Si Growth Technology for a Stand-Alone and Embedded Memory Application," in IEEE Trans. on Electron Devices, 2010.
[6] D. H. Kim, R. O. Topaloglu, and S. K. Lim, "Block-Level 3D IC Design with Through-Silicon-Via Planning," in Proc. Asia and South Pacific Design Aut. Conf., 2012.
[7] M. Tsai, T. Wang, and T. Hwang, "Through-Silicon Via Planning in 3-D Floorplanning," in IEEE Trans. on VLSI Systems, 2011.
[8] J. Knechtel, I. Markov, and J. Lienig, "Assembling 2-D Blocks Into 3-D Chips," in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2012.
[9] S. Bobba et al., "CELONCEL: Effective design technique for 3-D monolithic integration targeting high performance integrated circuits," in Proc. Asia and South Pacific Design Aut. Conf., 2011.
[10] C. Liu and S. K. Lim, "Ultra-High Density 3D SRAM Cell Designs for Monolithic 3D Integration," in Proc. IEEE Int. Interconnect Technology Conference.
[11] H. Xu, D. Sheqin, M. Yuchun, and H. Xianlong, "Simultaneous buffer and interlayer via planning for 3D floorplanning," in Proc. Int. Symp. on Quality Electronic Design, 2009.
[12] X. Wu et al., "Electrical Characterization for Inter-tier Connections and Timing Analysis for 3-D ICs," in IEEE Trans. on VLSI Systems, 2012.
[13]
