Calibrating Achievable Design
GSRC Annual Review, June 9, 2002
Wayne Dai, Andrew Kahng, Tsu-Jae King, Wojciech Maly, Igor Markov, Herman Schmit, Dennis Sylvester
DUSD(Labs)

Calibrating Achievable Design (C.A.D.) Theme

GTX / Living Roadmap: Where to Focus?
  What is the benefit of low-k?
  Achievable global signaling quality?
  Optimal memory integration and architecture?
  http://vlsicad.ucsd.edu/gtx

CAD-IP Reuse: Faster and Better R&D
  Industry-compatible, open-source, back-end flows
  Remote execution autograding infrastructure
  http://vlsicad.eecs.umich.edu/bk
  (VLSI design education, common data model, ...)

METRICS: Measure & Improve
  Survey of design metrics, design project metrics
  Clock speed, front-end acceptance, tool noise, ...
  Industry deployment experience
  http://vlsicad.ucsd.edu/metrics

Implementation Platform for Memory and Logic Integration
Wayne Dai
June 9, 2002
DUSD(Labs)

Outline
  Challenges and opportunities for System-in-a-Package (SiP)
  SiP implementation platform for memory/logic integration
  Configurable area-IO memory architecture
  SiP performance analysis and modeling based on GTX framework
  Concluding remarks

Messages from ITRS
  Package cost increases 5% each year.
  Pin count per packaged IC increases 8% to 11% each year; cost per pin falls 5% each year.
  Inter-chip signal integrity issues will become more challenging.
  In 2002, the chip-to-board clock frequency is 400 MHz for cost-performance systems and 800 MHz for high-performance systems.
  Package size cannot shrink because of the fanout problem.
  Moore's law is good for silicon, but not good for the board.
  System-on-a-Chip is not always a good idea: cost penalty, complexity of design and verification, difficulty of integrating different technologies.

The Y Chart of System Design
  [Figure: Y chart with functional, architectural, and physical domains, linked by synthesis and implementation; blocks labeled Flash, uP, DRAM]
  Platform-based design methodology is the only solution to deliver complex embedded systems in a limited design time.
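
The ITRS pin-count and cost-per-pin trends quoted above can be combined into a rough yearly growth figure for the pin-related part of package cost. The sketch below is illustrative arithmetic only. It assumes that this cost scales as pin count times cost per pin, which the slide does not state, and the function name is hypothetical.

```python
def pin_cost_growth(pin_growth, cost_per_pin_change):
    """Yearly growth factor of (pin count x cost per pin).

    Both arguments are fractional yearly changes, e.g. 0.08 for +8%
    and -0.05 for -5%. The multiplicative model is an assumption.
    """
    return (1.0 + pin_growth) * (1.0 + cost_per_pin_change)


# Trends quoted on the slide: pins +8% to +11% per year, cost per pin -5% per year.
low = pin_cost_growth(0.08, -0.05)
high = pin_cost_growth(0.11, -0.05)

print(f"net pin cost growth: {(low - 1) * 100:.2f}% to {(high - 1) * 100:.2f}% per year")
```

Under this assumption the net growth lands between roughly 3% and 5% per year, the same order as the 5% yearly package cost increase quoted on the slide.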

System-in-a-Package Implementation Platform
  Chip-on-Chip
  Chip-Laminate-Chip
  DRAM and graphics chip integration
  A giant chip rather than a miniaturized circuit board: preserving the on-chip electrical environment

Chip-Laminate-Chip Technology
  [Figure: Chip-Laminate-Chip (CLC) architecture, showing the logic side and memory side: logic die, area-IO DRAM, decoupling capacitor, laminate, BGA balls]
  Characteristics:
    Maximum off-chip delay << IO buffer delay (3.5 ns)
    Signal round-trip time < rise time (500 ps)
    Inter-chip skew < board skew (500 ps)
    No terminating resistors required
    Smaller IO buffer size and minimized ESD protection

                                         CLC Technology   Conventional Package
  Maximum variation of interface delay   40 ps            500 ps
  Interface data rate                    500 MHz DDR      266 MHz DDR
  Power consumption per pin              7.6 mW           19 mW
  Source: SyChip Inc.
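
The CLC versus conventional-package figures above can be put on a per-bit basis. The sketch below is illustrative arithmetic only. It assumes that DDR transfers two bits per pin per clock cycle and that the quoted power per pin was measured at the quoted data rate, which the source does not state. The function names are hypothetical.

```python
def ddr_bits_per_second(clock_hz):
    """Per-pin bit rate of a DDR interface: two transfers per clock cycle."""
    return 2.0 * clock_hz


def energy_per_bit_pj(power_mw, clock_hz):
    """Energy per transferred bit in picojoules for one pin.

    Assumes power_mw is the pin power while running at clock_hz (an assumption).
    """
    watts = power_mw * 1e-3
    return watts / ddr_bits_per_second(clock_hz) * 1e12


# Values from the SyChip comparison table.
clc_pj = energy_per_bit_pj(7.6, 500e6)
conventional_pj = energy_per_bit_pj(19.0, 266e6)

print(f"CLC:          {ddr_bits_per_second(500e6) / 1e6:.0f} Mb/s per pin, {clc_pj:.1f} pJ/bit")
print(f"Conventional: {ddr_bits_per_second(266e6) / 1e6:.0f} Mb/s per pin, {conventional_pj:.1f} pJ/bit")
```

Under these assumptions the CLC interface moves about twice as many bits per pin at a fraction of the energy per bit.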

Single-Package Computer
  A high-performance system
    Server CPU (700 MHz, 2 MB L2 cache)
    Graphics chip and north bridge
    266 MHz DDR SDRAM
    Performance is limited by the memory access time.
    The power consumption of the CPU is over 30 W in active mode.
  A low-cost system
    500-700 MHz integrated core logic (integrated CPU, north bridge, and graphics chip)
    400-500 MHz DDR SDRAM
    CLC BGA package
    Better performance is achieved by balancing the core logic and memory access speeds.
  [Figure: a 700 MHz CPU with L2 cache (up to 2 MB), north bridge and graphics chip, and 266 MHz DDR SDRAM, compared with an integrated CPU/north bridge/graphics core and 500 MHz DDR SDRAM]

Issues Addressed
  What is the most cost-effective implementation platform for memory and logic integration: embedded DRAM, SiP, or PCB? What are the trade-offs?
  What is the maximum bandwidth achievable by SiP?
  What is the maximum IO speed? How should the IO design take advantage of this platform?
  How should the memory architecture be re-optimized for this platform?

Issues Addressed (continued)
  What is the routability of IO redistribution?
  What is the optimal power/ground structure on the laminate?
  What is the optimal clock structure on the laminate?
  What is the model of junction temperature in a SiP module?
  Etc.

IO Issues in System-in-a-Package
  Integration with conventional logic and memory chips cannot fully realize the potential of SiP.

  IO topology
    Conventional IO: periphery IO for wire bonding
    SiP: area-array IO for flip-chip assembly
    Problems: long rerouting wires and redundant parasitic load
  IO drive capability
    Conventional IO: drives the large capacitance caused by wire bonding
    SiP: capacitance could be one order of magnitude less than with wire bonding
    Problems: extra chip area, delay, and power consumption
  ESD protection
    Conventional IO: designed for the interface with the outside world
    SiP: interconnect stays inside the package, no breakdown voltage accumulation
    Problems: extra chip area and power consumption

Area-IO Is the Solution!
  Flip-chip technology preserves the on-chip electrical environment for SiP.
  ESD protection can be minimized for intra-package IOs.
  Design-specific IOs are desired for optimal driving strength.
  Area-IO architecture provides rich power/ground pads for better signal integrity.
  [Figure: conventional IO (logic and buffer, ESD protection circuit, pad) compared with area-IO (logic and buffer, pad)]

Configurable Memory Architecture
  Different architectures require different memory organizations.
  Memory organization for n-bit-serial processors
    short word-width (1-8)
    large number of words
    large number of banks
  Memory organization for microprocessors
    medium word-width (16-64)
    medium number of words
    multiple banks
  Memory organization for graphics processors
    long word-width (512-1K)
    small number of words
    single bank

Configurable Memory Architecture
  Commercial memory cannot provide high-bandwidth communication within a small chip/board area.
  Embedded memory does not have the flexibility to change the memory organization for different programming models.
  Configurable memory for System-in-a-Package (SiP) provides the opportunity to make one memory chip meet the requirements of different architectures.
  Memory organization can be programmed for different architectures (n-bit-serial processors, microprocessors, graphics processors).
  Word-width ranges from 8 to 1K.
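
The trade-off between word-width and number of words described above follows from a fixed total capacity. The sketch below is a minimal illustration, not the configuration scheme of the actual chip. The total capacity is a hypothetical value chosen for the example, and the function name is made up.

```python
def words_for_width(total_bits, word_width):
    """Number of addressable words when a fixed capacity is organized
    with the given word-width (in bits)."""
    if word_width <= 0 or total_bits % word_width != 0:
        raise ValueError("word_width must be positive and divide total_bits evenly")
    return total_bits // word_width


# Hypothetical capacity for illustration only (512 Kbit).
TOTAL_BITS = 512 * 1024

# Word-widths spanning the 8 to 1K range mentioned on the slide.
for width in (8, 64, 512, 1024):
    print(f"word-width {width:4d} bits -> {words_for_width(TOTAL_BITS, width):6d} words")
```

Short words give many addresses (the bit-serial case), while 512-bit to 1K-bit words give few addresses and a very wide data path (the graphics case).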

Design Case: Configurable Area-IO SRAM
  Gives users the flexibility to program the memory for different applications.
  15 configuration modes.
  Consists of 16 x 32k SRAMs with a configuration control circuit.
  Area-IO cells are distributed all around the chip.
  Easy to migrate to a Multi-DRAM-Module.
  [Figure: final layout of the aSRAM, 3.85 mm x 6.80 mm, showing area-IO cells and configuration logic. Top level: 3.34M transistors, 570 area-IOs.]

Area-IO vs. Peripheral-IO
  Area-IO architecture significantly reduces the parasitic capacitance of IO redistribution.
  [Figure: histogram of number of nets (0 to 300) versus capacitance (0.1 to 1.5 pF), comparing area-IO rerouting with peripheral-IO rerouting]

DRAM Performance Analysis
  Analyze DRAM delay/area/power based on architectural parameters (size, IO width, address width, etc.) and technological parameters (feature size, transistor size, cell capacitance, etc.).
  Predict design feasibility based on the SiP platform.
  Compare different DRAM architectures and implementations.
  Enable designers to analyze DRAM cost and performance without an actual physical implementation.

Modeled DRAM Architecture
  [Figure: modeled DRAM block diagram: subarrays with wordlines (WL), bitlines (BL), and sense amplifiers; row predecoder, row decoder and WL driver; column predecoder and column decoder; datapath predecoder; output multiplexer; address and data buses]

Wordline Timing
  [Figure: wordline equivalent circuit]
  $T_{\mathrm{bootstrap}} = K_{\mathrm{bootstrap}} \, R_{\mathrm{eq}} \, C_{\mathrm{eq}} / 2$
  $K_{\mathrm{bootstrap}}$ is a process-dependent constant.
  Wordline delay is proportional to wordline length.

Sense Amplifier Timing
  [Figure: sensing time vs. bitline capacitance (SPICE simulation result)]
  $T_{\mathrm{senseamp}} = K_{\mathrm{senseamp}} \, C_{\mathrm{bitline}}$
  $K_{\mathrm{senseamp}}$ is a process-dependent constant.
  Bitline delay is proportional to bitline capacitance.
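
The two timing models above are simple enough to evaluate directly. The sketch below encodes them as written on the slides. The function names and all numeric values are hypothetical placeholders, since the slides give no process constants.

```python
def wordline_delay(k_bootstrap, r_eq, c_eq):
    """Wordline delay: T = K_bootstrap * R_eq * C_eq / 2.

    r_eq in ohms, c_eq in farads, k_bootstrap dimensionless; result in seconds.
    """
    return k_bootstrap * r_eq * c_eq / 2.0


def senseamp_delay(k_senseamp, c_bitline):
    """Sensing time: T = K_senseamp * C_bitline.

    c_bitline in farads; k_senseamp in seconds per farad; result in seconds.
    """
    return k_senseamp * c_bitline


# Placeholder values for illustration only; not taken from the source.
K_BOOTSTRAP = 2.0          # process-dependent constant (assumed)
R_EQ = 1.0e3               # equivalent wordline resistance, ohms (assumed)
C_EQ = 1.0e-12             # equivalent wordline capacitance, farads (assumed)
K_SENSEAMP = 1.0e4         # process-dependent constant, s/F (assumed)
C_BITLINE = 200e-15        # bitline capacitance, farads (assumed)

t_wl = wordline_delay(K_BOOTSTRAP, R_EQ, C_EQ)
t_sa = senseamp_delay(K_SENSEAMP, C_BITLINE)
print(f"wordline delay: {t_wl * 1e9:.2f} ns, sensing time: {t_sa * 1e9:.2f} ns")
```

Halving the bitline capacitance, for example by splitting the cell array, halves the sensing time in this model.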

DRAM Core Area Analysis
  Compare the core area of embedded DRAM (eDRAM) and aDRAM for SiP.
  aDRAM for SiP has an area-IO architecture with various bit-widths.
  Assume ASIC technology for eDRAM and conventional DRAM technology for aDRAM.
  [Figure: chip area comparison of eDRAM and aDRAM (64 Mb); series: eDRAM, aDRAM with 256-bit IO, aDRAM with 512-bit IO; y-axis: chip size (um sqr), 0 to 70; x-axis: year, 1999 to 2003]
  The area overhead of IO circuitry is not significant.

Implications from Our Study
  DRAM performance can be improved by dividing DRAM cell array into smaller self-contained s.
  Additional IOs can be implemented with an area-array architecture.
  With rich area-IO, it is possible to minimize or even remove the column decoding circuit to improve timing.
  With the SiP implementation platform, memory (DRAM/SRAM) architecture should be reoptimized for the better electrical environment.

Routability Analysis for IO Rerouting
  Given package size and number of pins, what is the maximum pin pitch?
  Given number of pins, what is the minimum package size?
  Given package size, what is the maximum total number of pins?
  [Figure: octilinear routing and all-angle routing examples]

Power/Ground Analysis for SiP
  How many P/G pins are needed?
  Where to place decoupling capacitors? On-chip? On-card? On-board?
  How much decoupling capacitance?
    Too little: noisy power supplies
    Too much: unpredictable LC resonance, increased die area

  Power/Ground Distribution Structure
                     Planes      Grid Mesh Planes   Cross Traces
    Resistive drops  Very low    Low                Medium
    Inductive drops  Low         Medium             High
    # Layers         High        Medium             Low
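
The three routability questions above have simple geometric bounds when routing constraints are ignored. The sketch below assumes a full, uniform square pin array on a square package and does not model escape routing, so the results are optimistic limits rather than a routability analysis. All function names are hypothetical.

```python
import math


def grid_side(num_pins):
    """Smallest k such that a k x k array holds num_pins pins."""
    if num_pins < 1:
        raise ValueError("num_pins must be at least 1")
    return math.isqrt(num_pins - 1) + 1  # integer ceil(sqrt(num_pins))


def max_pin_pitch(package_side_mm, num_pins):
    """Largest uniform pitch that fits num_pins on a square package."""
    return package_side_mm / grid_side(num_pins)


def min_package_side(num_pins, pitch_mm):
    """Smallest square package side for num_pins at the given pitch."""
    return grid_side(num_pins) * pitch_mm


def max_total_pins(package_side_mm, pitch_mm):
    """Most pins a full square array can hold at the given pitch."""
    per_side = int(package_side_mm // pitch_mm)
    return per_side * per_side


# Example values are illustrative, not from the source.
print(max_pin_pitch(27.0, 729))      # pitch in mm for a 27 mm package, 729 pins
print(min_package_side(729, 1.0))    # side in mm for 729 pins at 1.0 mm pitch
print(max_total_pins(27.0, 1.0))     # pin count for a 27 mm package at 1.0 mm pitch
```

A real analysis would tighten these bounds by accounting for routing style (octilinear or all-angle), layer count, and wire width and spacing.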

Power/Ground Analysis for SiP
  On the IC, hybrid full-wave techniques are applied to different types of P/G structures.
  [Figure: P/G structure in chip; field computed with the MEI method]

               IMET, by iteration                  MEI      MoM
               1       2       3       4
    Inversion  1.6 s   3.2 s   4.8 s   6.4 s       14.4 s   -
    Total      2.7 s   5.1 s   7.4 s   9.8 s       16.9 s   57.4 s

  In the package, EM fields are decomposed into two modes (J. Fang, UCSC):
    Strip-line mode fields propagate along metal traces.
    Parallel-plate mode fields propagate between adjacent planes.
  Three to four orders of magnitude faster than ASTAP.

    Mesh Density   ASTAP on IBM 3090 Mainframe   Decomposition Method on IBM R/6000-350 Workstation   Ratio of CPU Times
    30 x 30        1 m 55.29 s                   0.18 s                                               640
    42 x 42        5 m 42.73 s                   0.35 s                                               980
    60 x 60        19 m 30.88 s                  0.74 s                                               1582

  [Figure: signal trace between two planes at package level; a pulse propagates down the via and onto the trace]

Thermal Analysis for SiP
  Junction temperature should be estimated at an early design stage.
  A simplified thermal model can provide relatively accurate results for early analysis.
  Detailed thermal simulation with numerical methods can be applied to obtain accurate junction temperatures.
  [Figure: simplified thermal model for a SiP module with one logic die and two DRAM dice]
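
An early junction temperature estimate of the kind described above can be made with a lumped thermal-resistance network. The sketch below is one common simplification and is not the model shown on the slide. It assumes that every die has its own junction-to-package resistance and that all dice share a single package-to-ambient resistance. All names and numeric values are hypothetical.

```python
def junction_temperatures(t_ambient_c, theta_pa, dice):
    """Steady-state junction temperatures for dice sharing one package.

    t_ambient_c: ambient temperature in degrees C
    theta_pa:    shared package-to-ambient thermal resistance in C/W
    dice:        list of (power_w, theta_jp) pairs, where theta_jp is the
                 junction-to-package thermal resistance of that die in C/W
    """
    total_power = sum(power for power, _ in dice)
    # All dissipated power flows through the shared package-to-ambient path.
    t_package = t_ambient_c + theta_pa * total_power
    # Each die rises above the package temperature through its own resistance.
    return [t_package + power * theta_jp for power, theta_jp in dice]


# One logic die and two DRAM dice; all numbers are illustrative assumptions.
dice = [(3.0, 5.0), (0.5, 8.0), (0.5, 8.0)]
temps = junction_temperatures(25.0, 10.0, dice)
for name, t in zip(("logic", "DRAM 0", "DRAM 1"), temps):
    print(f"{name}: {t:.1f} C")
```

Because the package-to-ambient path is shared, the power of the logic die raises the DRAM junction temperatures as well, which is why the dice cannot be analyzed in isolation.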

Concluding Remarks
  System-on-a-Chip should be generalized to System-in-a-Package (SiP).
  SiP provides new opportunities for gigascale integration.
  SiP brings cost-effective alternatives to embedded DRAM.
  Area-IO opens up a new paradigm for trading off on-chip interconnect against on-package interconnect.
  Configurable memory enables a single memory chip to meet the requirements of various applications.
  Early analysis of cost/performance and design feasibility is highly desirable for the SiP implementation platform.