Calibrating Achievable Design GSRC Annual Review June 9, 2002

Wayne Dai, Andrew Kahng, Tsu-Jae King, Wojciech Maly, Igor Markov, Herman Schmit, Dennis Sylvester
DUSD(Labs)

Calibrating Achievable Design (C.A.D.) Theme

GTX / Living Roadmap: Where to Focus?
- What is the benefit of low-k?
- Achievable global signaling quality?
- Optimal memory integration and architecture?
- http://vlsicad.ucsd.edu/gtx

CAD-IP Reuse: Faster and Better R&D
- Industry-compatible, open-source back-end flows
- Remote-execution autograding infrastructure
- http://vlsicad.eecs.umich.edu/bk (VLSI design education, common data model, ...)

METRICS: Measure & Improve
- Survey of design metrics and design project metrics (clock speed, front-end acceptance, tool noise, ...)
- Industry deployment experience
- http://vlsicad.ucsd.edu/metrics

Implementation Platform for Memory and Logic Integration
Wayne Dai, June 9, 2002
DUSD(Labs)

Outline
- Challenges and opportunities for System-in-a-Package (SiP)
- SiP implementation platform for memory/logic integration
- Configurable area-IO memory architecture
- SiP performance analysis and modeling based on the GTX framework
- Concluding remarks

Messages from ITRS
- Package cost increases 5% each year.
- Pin count per packaged IC increases 8% to 11% each year, while cost per pin falls 5% each year.
- Inter-chip signal integrity issues will become more challenging.
- In 2002, the chip-to-board clock frequency is 400 MHz for cost-performance systems and 800 MHz for high-performance systems.
- Package size cannot shrink because of the fanout problem.
- Moore's law is good for silicon, but not good for the board.

System-on-a-Chip is not always a good idea:
- Cost penalty
- Complexity of design and verification
- Difficulty of integrating different technologies

The Y Chart of System Design
[Figure: Y chart spanning the architectural, functional, and physical domains. It shows Flash, µP, and DRAM blocks, synthesis and implementation steps, and a link labeled "Missing".]
Platform-based design methodology is the only solution to deliver complex embedded systems in a limited design time.
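As a rough consistency check on the ITRS figures, the sketch below assumes that package cost is simply pin count times cost per pin. That assumption is ours, not the roadmap's. Under it, 8% to 11% more pins at 5% lower cost per pin gives a yearly package-cost increase of about 2.6% to 5.5% (1.08 × 0.95 = 1.026 and 1.11 × 0.95 = 1.0545). That range contains the quoted 5%.

```python
# Minimal sketch: combine the two ITRS per-pin trends into a yearly package-cost
# change. Assumption (not stated on the slide): package cost = pins * cost_per_pin.


def package_cost_growth(pin_growth: float, cost_per_pin_change: float) -> float:
    """Fractional yearly change in package cost.

    pin_growth: yearly pin-count increase, e.g. 0.08 for +8%.
    cost_per_pin_change: yearly cost-per-pin change, e.g. -0.05 for -5%.
    """
    return (1.0 + pin_growth) * (1.0 + cost_per_pin_change) - 1.0


if __name__ == "__main__":
    for pin_growth in (0.08, 0.11):
        change = package_cost_growth(pin_growth, -0.05)
        print(f"pins {pin_growth:+.0%}/yr -> package cost {change:+.1%}/yr")
```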

System-in-a-Package Implementation Platform
- Chip-on-Chip
- Chip-Laminate-Chip (DRAM and graphic chip integration)
- A giant chip rather than a miniaturized circuit board: preserving the on-chip electrical environment

Chip-Laminate-Chip Technology
[Figure: Chip-Laminate-Chip (CLC) architecture. Logic and DRAM dies with area IO are mounted on a laminate, with decoupling capacitors and BGA balls.]
Characteristics:
- Maximum off-chip delay << IO buffer delay (3.5 ns)
- Signal round-trip time < rise time (500 ps)
- Inter-chip skew < board skew (500 ps)
- No terminating resistors required
- Smaller IO buffer size and minimized ESD protection

Technology                              CLC           Conventional package
Maximum variation of interface delay    40 ps         500 ps
Interface data rate                     500 MHz DDR   266 MHz DDR
Power consumption per pin               7.6 mW        19 mW
Source: SyChip Inc.
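Two of the CLC characteristics can be checked with first-order arithmetic: the round-trip criterion for skipping termination, and the per-pin power figures. The sketch below is illustrative only. It rests on three assumptions that do not come from the slide:
- an effective dielectric constant of 4.0;
- the hypothetical trace lengths used in the example;
- reading "500 MHz DDR" and "266 MHz DDR" as 500 and 266 million transfers per second per pin.

```python
# Sketch of two first-order checks behind the CLC slide.
# Assumptions (not from the slide): signals travel at c / sqrt(eps_eff) with a
# hypothetical eps_eff of 4.0, and the DDR figures are transfers per second per pin.
import math

C_MM_PER_PS = 0.299792458  # speed of light in mm per picosecond


def round_trip_ps(trace_mm: float, eps_eff: float = 4.0) -> float:
    """Round-trip propagation time of a trace, in picoseconds."""
    velocity = C_MM_PER_PS / math.sqrt(eps_eff)  # mm per ps
    return 2.0 * trace_mm / velocity


def needs_termination(trace_mm: float, rise_time_ps: float = 500.0,
                      eps_eff: float = 4.0) -> bool:
    """True when the round trip is not shorter than the rise time.

    Rule of thumb from the slide: if the round trip is shorter than the rise
    time, no terminating resistors are required.
    """
    return round_trip_ps(trace_mm, eps_eff) >= rise_time_ps


def energy_per_bit_pj(power_mw: float, rate_mtps: float) -> float:
    """Energy per transferred bit (pJ) from per-pin power and transfer rate."""
    return power_mw * 1e-3 / (rate_mtps * 1e6) * 1e12


if __name__ == "__main__":
    for length in (10.0, 100.0):  # hypothetical in-package vs. board trace, mm
        verdict = "terminate" if needs_termination(length) else "no termination"
        print(f"{length:.0f} mm: {round_trip_ps(length):.0f} ps round trip, {verdict}")
    print(f"CLC: {energy_per_bit_pj(7.6, 500):.1f} pJ/bit")
    print(f"Conventional: {energy_per_bit_pj(19.0, 266):.1f} pJ/bit")
```

Under these assumptions:
- a 10 mm in-package trace has a round trip of about 133 ps, well under the 500 ps rise time;
- a 100 mm board trace takes about 1.3 ns;
- the table's per-pin figures work out to roughly 15 pJ/bit for CLC versus 71 pJ/bit for the conventional package.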

Single-Package Computer
A high-performance system:
- Server CPU (700 MHz, 2 MB L2 cache)
- Graphic chip & north bridge
- 266 MHz DDR SDRAM
- The performance is limited by the memory access time.
- The power consumption of the CPU is over 30 W in active mode.
A low-cost system:
- 500-700 MHz integrated core logic (integrated CPU, north bridge, and graphic chip)
- 400-500 MHz DDR SDRAM
- CLC BGA package
- Better performance achieved by balancing the core logic and memory access speed
[Figure: a 700 MHz CPU with up to 2 MB L2 cache, a separate north bridge and graphic chip, and 266 MHz DDR SDRAM, versus an integrated CPU/north bridge/graphic chip core with 500 MHz DDR SDRAM]

Issues Addressed
- What is the most cost-effective implementation platform for memory and logic integration: embedded DRAM, SiP, or PCB? What are the trade-offs?
- What is the maximum bandwidth achievable by SiP?
- What is the maximum IO speed? How should the IO design take advantage of this platform?
- How should the memory architecture be re-optimized for this platform?
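The "balancing" argument can be made concrete by comparing peak memory bandwidth and core cycles per memory transfer for the two systems. This is a sketch under two assumptions that are not on the slide:
- a 64-bit memory bus;
- DDR figures read as millions of transfers per second.

```python
# Sketch comparing the two single-package systems above.
# Assumptions (not on the slide): a 64-bit memory bus, and "266 MHz DDR" /
# "500 MHz DDR" read as 266 and 500 million transfers per second.


def peak_bandwidth_gb_s(transfer_rate_mtps: float, bus_bits: int = 64) -> float:
    """Peak memory bandwidth in GB/s for a given transfer rate and bus width."""
    return transfer_rate_mtps * 1e6 * bus_bits / 8 / 1e9


def core_to_memory_ratio(core_mhz: float, transfer_rate_mtps: float) -> float:
    """Core clock cycles per memory transfer (lower means less waiting per transfer)."""
    return core_mhz / transfer_rate_mtps


if __name__ == "__main__":
    systems = {
        "high-performance (700 MHz CPU, 266 MHz DDR)": (700.0, 266.0),
        "low-cost SiP (700 MHz core, 500 MHz DDR)": (700.0, 500.0),
    }
    for name, (core, mem) in systems.items():
        print(f"{name}: {peak_bandwidth_gb_s(mem):.2f} GB/s peak, "
              f"{core_to_memory_ratio(core, mem):.2f} core cycles per transfer")
```

With these assumptions:
- the high-performance system's core runs about 2.6 cycles per memory transfer;
- the integrated core at the same clock runs 1.4 cycles per transfer;
- peak bandwidth nearly doubles, from about 2.1 GB/s to 4.0 GB/s.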

Issues Addressed (continued)
- What is the routability of IO redistribution?
- What will be the optimal power/ground structure on the laminate?
- What will be the optimal clock structure on the laminate?
- What is the model of junction temperature in a SiP module?
- Etc.

IO Issues in System-in-a-Package
Integration with conventional logic and memory chips cannot fully realize the potential of SiP.

IO topology
  Conventional IO: periphery IO for wire bonding
  SiP: area-array IO for flip-chip assembly
  Problem: long rerouting wires and redundant parasitic load
IO drive capability
  Conventional IO: drives the large capacitance caused by wire bonding
  SiP: capacitance could be one order of magnitude less than with wire bonding
  Problem: extra chip area, delay, and power consumption
ESD protection
  Conventional IO: designed for the interface with the outside world
  SiP: interconnect stays inside the package, with no breakdown-voltage accumulation
  Problem: extra chip area and power consumption

Area-IO Is the Solution!
- Flip-chip technology preserves the on-chip electrical environment for SiP.
- ESD protection can be minimized for intra-package IOs.
- Design-specific IOs are desired for optimal driving strength.
- Area-IO architecture provides rich power/ground pads for better signal integrity.
[Figure: conventional IO (logic & buffer, ESD protection circuit, pad) versus area-IO (logic & buffer, pad)]
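The drive-capability row can be illustrated with the standard dynamic switching-power estimate P = α·C·V²·f for one IO pin. In the sketch below, only the 10× capacitance ratio echoes the slide. The 10 pF and 1 pF loads, the 2.5 V supply, the 266 MHz toggle rate and the activity factor of 0.5 are hypothetical. The estimate also covers only the load itself, not driver or ESD overhead.

```python
# Sketch: dynamic switching power of one IO pin, P = activity * C * V^2 * f.
# All numbers are hypothetical; only the 10x capacitance ratio echoes the slide.


def io_switching_power_mw(c_load_pf: float, vdd_v: float, freq_mhz: float,
                          activity: float = 0.5) -> float:
    """Average power (mW) spent charging and discharging the pin's load."""
    return activity * (c_load_pf * 1e-12) * vdd_v ** 2 * (freq_mhz * 1e6) * 1e3


if __name__ == "__main__":
    wire_bond = io_switching_power_mw(10.0, 2.5, 266.0)  # hypothetical 10 pF load
    area_io = io_switching_power_mw(1.0, 2.5, 266.0)     # one order of magnitude less
    print(f"wire-bond load: {wire_bond:.2f} mW, area-IO load: {area_io:.2f} mW")
```

Because the load power is linear in C, a tenfold drop in pin capacitance cuts it tenfold. It also lets the driver itself be made smaller.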

Configurable Memory Architecture
Different architectures require different memory organizations.
- Memory organization for n-bit-serial processors: short word width (1-8), large number of words, large number of banks
- Memory organization for microprocessors: medium word width (16-64), medium number of words, multiple banks
- Memory organization for graphics processors: long word width (512-1K), small number of words, single bank

Configurable Memory Architecture (continued)
- Commercial memory cannot provide high-bandwidth communication with small chip/board area.
- Embedded memory does not have the flexibility to change the memory organization for different programming models.
- Configurable memory for System-in-a-Package (SiP) makes it possible for one memory chip to meet the requirements of different architectures.
- Memory organization can be programmed for different architectures (n-bit-serial processors, microprocessors, graphics processors). Word width ranges from 8 to 1K.
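A configurable memory of fixed capacity trades word width against word count. The sketch below enumerates power-of-two organizations from 8 to 1K bits wide and tags each width with the processor class the slide associates with it. The 512 Kbit capacity and the power-of-two restriction are hypothetical choices for illustration. The slide does not specify how widths are chosen.

```python
# Sketch: a memory of fixed capacity exposes different (word width, word count)
# organizations. The 512 Kbit capacity and power-of-two widths are hypothetical;
# the width classes follow the ranges listed on the slide.
from typing import List, Tuple


def organizations(total_bits: int, min_width: int = 8,
                  max_width: int = 1024) -> List[Tuple[int, int]]:
    """Return (word_width, number_of_words) pairs for power-of-two widths."""
    result = []
    width = min_width
    while width <= max_width:
        if total_bits % width == 0:
            result.append((width, total_bits // width))
        width *= 2
    return result


def target_processor(word_width: int) -> str:
    """Map a word width to the processor class the slide associates with it."""
    if word_width <= 8:
        return "n-bit-serial processor"
    if 16 <= word_width <= 64:
        return "microprocessor"
    if 512 <= word_width <= 1024:
        return "graphics processor"
    return "intermediate"


if __name__ == "__main__":
    for width, words in organizations(512 * 1024):
        print(f"{width:5d} bits x {words:6d} words -> {target_processor(width)}")
```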

Design Case: Configurable Area-IO SRAM
- Gives users the flexibility to program the memory for different applications.
- 15 configuration modes.
- Consists of 16 × 32K SRAMs with a configuration control circuit.
- Area-IO cells are distributed all around the chip.
- Easy to migrate to a multi-DRAM module.
[Figure: final layout with area-IO configuration logic and aSRAM blocks, 3.85 mm × 6.80 mm; top level: 3.34M transistors, 570 area-IOs]
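The die dimensions and IO count in this design case show why area IO scales better than peripheral IO. The sketch compares the pad pitch needed when the 570 pads are spread uniformly over the 3.85 mm × 6.80 mm die with the pitch needed when the same pads sit in a single ring along the edge. Corner keep-outs, staggered pad rows and separate power pads are ignored; these are simplifying assumptions, not details from the layout.

```python
# Sketch: pad pitch for the 570 IOs of the 3.85 mm x 6.80 mm test chip when the
# pads are spread over the whole die (area IO) versus lined up in one row along
# the die edge (peripheral IO). Corner keep-outs, staggered rows, and separate
# power pads are ignored (simplifying assumptions).
import math


def area_array_pitch_um(width_mm: float, height_mm: float, n_io: int) -> float:
    """Pitch (um) when each pad gets an equal square share of the die area."""
    return math.sqrt(width_mm * height_mm / n_io) * 1000.0


def peripheral_pitch_um(width_mm: float, height_mm: float, n_io: int) -> float:
    """Pitch (um) when all pads share a single ring along the die perimeter."""
    return 2.0 * (width_mm + height_mm) / n_io * 1000.0


if __name__ == "__main__":
    w, h, n = 3.85, 6.80, 570
    print(f"area-IO pitch:    {area_array_pitch_um(w, h, n):.0f} um")
    print(f"peripheral pitch: {peripheral_pitch_um(w, h, n):.0f} um")
```

Under these assumptions the area array allows a pitch of about 214 µm, while a single peripheral ring would need about 37 µm.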

Area-IO vs. Peripheral-IO
Area-IO architecture significantly reduces the parasitic capacitance of IO redistribution.
[Figure: histogram of the number of nets versus rerouting capacitance (pF), for peripheral-IO rerouting and area-IO rerouting]

DRAM Performance Analysis
- Analyze DRAM delay/area/power based on architectural parameters (size, IO width, address width, etc.) and technological parameters (feature size, transistor size, cell capacitance, etc.).
- Predict design feasibility on the SiP platform.
- Compare different DRAM architectures and implementations.
- Enable designers to analyze DRAM cost and performance without an actual physical implementation.

Modeled DRAM Architecture
[Figure: subarrays with sense amplifiers, wordlines (WL) and bitlines (BL); row decoder & WL driver; row predecoder; column decoder; column predecoder; datapath predecoder; address and data buses; output multiplexer]

Wordline Timing
[Figure: wordline equivalent circuit]
$T_{bootstrap} = K_{bootstrap} R_{eq} C_{eq} / 2$, where $K_{bootstrap}$ is a process-dependent constant.
Wordline delay is proportional to wordline length.

Sense Amplifier Timing
[Figure: sensing time vs. bitline capacitance (SPICE simulation result)]
$T_{senseamp} = K_{senseamp} C_{bitline}$, where $K_{senseamp}$ is a process-dependent constant.
Bitline delay is proportional to bitline capacitance.
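The two timing formulas translate directly into code. The sketch below evaluates them and shows one consequence: halving the number of cells on a bitline (a shorter subarray) lowers $C_{bitline}$ and therefore the sensing time. Several pieces are hypothetical rather than extracted from the slides:
- the linear bitline-capacitance model;
- the constants and capacitances in the example;
- the unit convention (kΩ × fF = ps).

```python
# Sketch of the two timing formulas above. Units: resistance in kOhm and
# capacitance in fF give time in ps (1 kOhm x 1 fF = 1 ps). The constants and
# capacitances used below are hypothetical, not extracted from the slides.


def wordline_delay_ps(k_bootstrap: float, r_eq_kohm: float, c_eq_ff: float) -> float:
    """T_bootstrap = K_bootstrap * R_eq * C_eq / 2."""
    return k_bootstrap * r_eq_kohm * c_eq_ff / 2.0


def sense_delay_ps(k_senseamp_ps_per_ff: float, c_bitline_ff: float) -> float:
    """T_senseamp = K_senseamp * C_bitline."""
    return k_senseamp_ps_per_ff * c_bitline_ff


def bitline_capacitance_ff(cells_per_bitline: int, c_per_cell_ff: float,
                           c_fixed_ff: float = 0.0) -> float:
    """Hypothetical bitline model: per-cell loading plus a fixed part."""
    return cells_per_bitline * c_per_cell_ff + c_fixed_ff


if __name__ == "__main__":
    print("wordline:", wordline_delay_ps(1.5, 2.0, 400.0), "ps")
    for cells in (512, 256):  # halving the subarray height
        c_bl = bitline_capacitance_ff(cells, 0.5, 20.0)
        print(cells, "cells/bitline:", sense_delay_ps(5.0, c_bl), "ps")
```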

DRAM Core Area Analysis
- Compare the core area of embedded DRAM (eDRAM) and area-IO DRAM (aDRAM) for SiP.
- aDRAM for SiP has an area-IO architecture with various bit widths.
- Assume ASIC technology for eDRAM and conventional DRAM technology for aDRAM.
[Figure: chip area comparison of 64 Mb eDRAM and aDRAM with 256-bit and 512-bit IO, 1999-2003; y-axis: chip size]
- Area overhead of IO circuitry is not significant.

Implications from Our Study
- DRAM performance can be improved by dividing the DRAM cell array into smaller self-contained subarrays.
- Additional IOs can be implemented with an area-array architecture.
- With rich area-IO, it is possible to minimize or even remove the column decoding circuit to improve timing.
- On the SiP implementation platform, memory (DRAM/SRAM) architecture should be re-optimized for the better electrical environment.
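The claim that rich area-IO can shrink or remove column decoding follows from counting how many sensed bits share each IO line. The sketch below computes the column multiplexing ratio and the number of column address bits for several IO widths. The 2048-bit row is a hypothetical value, and power-of-two ratios are assumed.

```python
# Sketch: with wider IO, fewer bits of a sensed row share each IO line, so the
# column multiplexer and its decoder shrink. The 2048-bit row is hypothetical.


def column_mux_ratio(row_bits: int, io_width: int) -> int:
    """Number of sensed bits multiplexed onto each IO line."""
    if io_width <= 0 or row_bits % io_width:
        raise ValueError("io_width must evenly divide row_bits")
    return row_bits // io_width


def column_address_bits(row_bits: int, io_width: int) -> int:
    """Column address bits to decode (0 means no column decoding at all)."""
    ratio = column_mux_ratio(row_bits, io_width)
    if ratio & (ratio - 1):
        raise ValueError("mux ratio must be a power of two")
    return ratio.bit_length() - 1


if __name__ == "__main__":
    for io in (256, 512, 2048):
        print(f"{io:4d}-bit IO: {column_mux_ratio(2048, io)}:1 mux, "
              f"{column_address_bits(2048, io)} column address bits")
```

With IO as wide as the row, the multiplexer degenerates to 1:1 and no column address needs decoding.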

Routability Analysis for IO Rerouting
- Given package size and number of pins, what is the maximum pin pitch?
- Given the number of pins, what is the minimum package size?
- Given package size, what is the maximum total number of pins?
[Figure: octilinear routing vs. all-angle routing]

Power/Ground Analysis for SiP
- How many P/G pins are needed?
- Where should decoupling capacitors be placed: on-chip, on-card, or on-board?
- How much decoupling capacitance? Too little: noisy power supplies. Too much: unpredictable LC resonance and increased die area.

Power/Ground Distribution Structure
                  Planes     Grid (mesh planes)   Cross traces
Resistive drops   Very low   Low                  Medium
Inductive drops   Low        Medium               High
# Layers          High       Medium               Low
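First-order answers to the three package questions, and a first cut at "how much decoupling capacitance", can be written down in a few lines. The sketch rests on assumptions of ours:
- The pins form a full square array with one pin per pitch × pitch cell. This ignores escape routing, which is what the real routability analysis addresses.
- Decoupling is sized as C = I·Δt/ΔV.
- The capacitor sees a single lumped loop inductance L, which sets an LC resonance at f = 1/(2π√(LC)).

All example numbers are hypothetical.

```python
# Sketch of first-order answers to the routability and decoupling questions.
# Assumptions: a full square pin array with one pin per pitch x pitch cell (escape
# routing ignored), decap sized as C = I * dt / dV, and one lumped loop inductance.
import math


def max_pin_pitch_mm(package_mm: float, n_pins: int) -> float:
    """Largest pitch that fits n_pins in a square array on the package."""
    return package_mm / math.ceil(math.sqrt(n_pins))


def min_package_mm(n_pins: int, pitch_mm: float) -> float:
    """Smallest square package side that holds n_pins at the given pitch."""
    return math.ceil(math.sqrt(n_pins)) * pitch_mm


def max_pins(package_mm: float, pitch_mm: float) -> int:
    """Most pins a square package side can hold at the given pitch."""
    return math.floor(package_mm / pitch_mm) ** 2


def decap_nf(step_current_a: float, step_time_ns: float, droop_mv: float) -> float:
    """Capacitance (nF) that supplies a current step with at most droop_mv of droop."""
    return step_current_a * step_time_ns / droop_mv * 1e3


def resonance_mhz(loop_inductance_nh: float, c_nf: float) -> float:
    """LC resonance frequency (MHz) of the decap with its loop inductance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(loop_inductance_nh * 1e-9 * c_nf * 1e-9)) / 1e6


if __name__ == "__main__":
    print("max pins, 35 mm at 1.27 mm pitch:", max_pins(35.0, 1.27))
    print("min package for 729 pins:", round(min_package_mm(729, 1.27), 2), "mm")
    c = decap_nf(2.0, 1.0, 50.0)  # hypothetical 2 A step over 1 ns, 50 mV budget
    print(f"decap: {c:.1f} nF, resonance with 0.5 nH: {resonance_mhz(0.5, c):.1f} MHz")
```

Adding capacitance lowers the resonance frequency rather than eliminating it, which is one reason "too much" decoupling can be a problem.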

Power/Ground Analysis for SiP
On the IC, hybrid full-wave techniques are applied to different types of P/G structures.
[Figure: P/G structure in chip; field computed with the MEI method]

CPU time     MEI (1 iter.)  MEI (2 iter.)  MEI (3 iter.)  MEI (4 iter.)  IMET     MoM
Inversion    1.6 s          3.2 s          4.8 s          6.4 s          14.4 s
Total        2.7 s          5.1 s          7.4 s          9.8 s          16.9 s   57.4 s

In the package, EM fields are decomposed into two modes (J. Fang, UCSC):
- Strip-line mode fields propagate along metal traces.
- Parallel-plate mode fields propagate between adjacent planes.
The decomposition method is three to four orders of magnitude faster than ASTAP.
[Figure: signal trace between two planes at package level; a pulse propagates down the via and onto the trace]

Mesh density   ASTAP (IBM 3090 mainframe)   Decomposition method (IBM RS/6000-350 workstation)   Ratio of CPU times
30 × 30        1 m 55.29 s                  0.18 s                                                640
42 × 42        5 m 42.73 s                  0.35 s                                                980
60 × 60        19 m 30.88 s                 0.74 s                                                1582

Thermal Analysis for SiP
- Junction temperature should be estimated at an early design stage.
- A simplified thermal model can provide relatively accurate results for early analysis.
- Detailed thermal simulation with numerical methods can be applied to obtain an accurate junction temperature.
[Figure: simplified thermal model for a one-logic, two-DRAM SiP module]
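A simplified early-stage thermal estimate for a one-logic, two-DRAM module can be written as a lumped resistor network. In this network every die heats a shared package node, and each die adds its own junction-to-package rise. This is a generic first-order model, not the specific model in the figure. The network shape, powers and thermal resistances below are hypothetical, and lateral coupling between dies is ignored.

```python
# Sketch of a simplified lumped thermal model for a one-logic, two-DRAM module:
# every die heats a shared package node through theta_pa (package to ambient),
# and each die adds its own theta_jp (junction to package). The network shape,
# powers, and thermal resistances are hypothetical, not taken from the slide.
from typing import Dict, Tuple


def junction_temps_c(t_ambient_c: float, theta_pa_c_per_w: float,
                     dies: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """dies maps name -> (power_w, theta_jp_c_per_w); returns junction temps in C."""
    total_power = sum(power for power, _ in dies.values())
    t_package = t_ambient_c + theta_pa_c_per_w * total_power
    return {name: t_package + theta_jp * power
            for name, (power, theta_jp) in dies.items()}


if __name__ == "__main__":
    module = {"logic": (5.0, 2.0), "dram0": (0.5, 4.0), "dram1": (0.5, 4.0)}
    for name, tj in junction_temps_c(25.0, 10.0, module).items():
        print(f"{name}: {tj:.1f} C")
```

In this example:
- the shared package node sits at 85 °C;
- the logic junction sits at 95 °C;
- each DRAM sits at 87 °C, heated mostly by the logic die's power through the package.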

Concluding Remarks
- System-on-a-Chip should be generalized to System-in-a-Package (SiP).
- SiP provides new opportunities for gigascale integration.
- SiP offers cost-effective alternatives to embedded DRAM.
- Area-IO opens up a new paradigm for trading off on-chip interconnect against on-package interconnect.
- Configurable memory enables a single memory chip to meet the requirements of various applications.
- Early analysis of cost/performance and design feasibility is highly desirable for the SiP implementation platform.