edram to the Rescue Why edram 1/3 Area 1/5 Power SER 2-3 Fit/Mbit vs 2k-5k for SRAM Smaller is faster What s Next?

Similar documents
Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.

Teaching the Old Dog some New Tricks!

ENEE 759H, Spring 2005 Memory Systems: Architecture and

MEMORIES. Memories. EEC 116, B. Baas 3

Calibrating Achievable Design GSRC Annual Review June 9, 2002

Lecture-14 (Memory Hierarchy) CS422-Spring

The Memory Hierarchy 1

Memory in Digital Systems

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 10: Three-Dimensional (3D) Integration

Unleashing the Power of Embedded DRAM

3D Integration & Packaging Challenges with through-silicon-vias (TSV)

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

A 5.2 GHz Microprocessor Chip for the IBM zenterprise

Moore s s Law, 40 years and Counting

BREAKING THE MEMORY WALL

A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context.

A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

Memory in Digital Systems

Organization. 5.1 Semiconductor Main Memory. William Stallings Computer Organization and Architecture 6th Edition

Deep Sub-Micron Cache Design

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

Congestion-Aware Power Grid. and CMOS Decoupling Capacitors. Pingqiang Zhou Karthikk Sridharan Sachin S. Sapatnekar

The Processor That Don't Cost a Thing

CS 152 Computer Architecture and Engineering

Memory in Digital Systems

FABRICATION TECHNOLOGIES

A Write-Back-Free 2T1D Embedded. a Dual-Row-Access Low Power Mode.

Lecture 20: Package, Power, and I/O

William Stallings Computer Organization and Architecture 6th Edition. Chapter 5 Internal Memory

EEM 486: Computer Architecture. Lecture 9. Memory

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

SRC 3D Summit. Bob Patti, CTO

Three DIMENSIONAL-CHIPS

On GPU Bus Power Reduction with 3D IC Technologies

3D Integration: New Opportunities for Speed, Power and Performance. Robert Patti, CTO

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

+1 (479)

memories The world of Memories DEEP SUBMICRON CMOS DESIGN 10. Memories

Monolithic 3D Integration using Standard Fab & Standard Transistors. Zvi Or-Bach CEO MonolithIC 3D Inc.

Memory Classification revisited. Slide 3

COMPUTER ARCHITECTURES

Chapter 0 Introduction

CSE502: Computer Architecture CSE 502: Computer Architecture

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University

Introduction to memory system :from device to system

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

Stacked Silicon Interconnect Technology (SSIT)

Power Technology For a Smarter Future

Near-Threshold Computing: Reclaiming Moore s Law

TechSearch International, Inc.

10. Interconnects in CMOS Technology

3D SYSTEM INTEGRATION TECHNOLOGY CHOICES AND CHALLENGE ERIC BEYNE, ANTONIO LA MANNA

Lost in the Bermuda Triangle: Energy, Complexity, and Performance. Dennis Abts Cray Inc.

Computer Architecture

Mark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness

3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape

Macro in a Generic Logic Process with No Boosted Supplies

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN

Computers: Inside and Out

Pushing the Boundaries of Moore's Law to Transition from FPGA to All Programmable Platform Ivo Bolsens, SVP & CTO Xilinx ISPD, March 2017

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013

Consideration for Advancing Technology in Computer System Packaging. Dale Becker, Ph.D. IBM Corporation, Poughkeepsie, NY

Lecture 14. Advanced Technologies on SRAM. Fundamentals of SRAM State-of-the-Art SRAM Performance FinFET-based SRAM Issues SRAM Alternatives

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Memory Technology March 15, 2001

Open Innovation with Power8

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)

Magnetic core memory (1951) cm 2 ( bit)

An Overview of Standard Cell Based Digital VLSI Design

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly)

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

POWER7: IBM's Next Generation Server Processor

Systems-on-a-Chip (SoCs)

3D-IC is Now Real: Wide-IO is Driving 3D-IC TSV. Samta Bansal and Marc Greenberg, Cadence EDPS Monterey, CA April 5-6, 2012

POWER7+ TM IBM IBM Corporation

CMOS Logic Circuit Design Link( リンク ): センター教官講義ノートの下 CMOS 論理回路設計

Concept of Memory. The memory of computer is broadly categories into two categories:

Japanese two Samurai semiconductor ventures succeeded in near 3D-IC but failed the business, why? and what's left?

Additional Slides for Lecture 17. EE 271 Lecture 17

ECE 485/585 Microprocessor System Design

Current status of SOI / MPU and ASIC development for space

Internal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved.

Technology Platform Segmentation

Industry Trends in 3D and Advanced Packaging

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Packaging of Selected Advanced Logic in 2x and 1x nodes. 1 I TechInsights

TABLE OF CONTENTS III. Section 1. Executive Summary

The communication bottleneck

Non-destructive, High-resolution Fault Imaging for Package Failure Analysis. with 3D X-ray Microscopy. Application Note

BRIDGING THE GLOBE WITH INNOVATIVE TECHNOLOGY

CS 201 The Memory Hierarchy. Gerson Robboy Portland State University

IMEC CORE CMOS P. MARCHAL

DRAM with Boosted 3T Gain Cell, PVT-tracking Read Reference Bias

ECE 5745 Complex Digital ASIC Design Topic 7: Packaging, Power Distribution, Clocking, and I/O

Advanced 1 Transistor DRAM Cells

! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR

Key Challenges in High-End Server Interconnects. Hubert Harrer

Transcription:

edram to the Rescue Why edram 1/3 Area 1/5 Power SER 2-3 Fit/Mbit vs 2k-5k for SRAM Smaller is faster What s Next? 1

Integrating DRAM and Logic Integrate with Logic without impacting logic Performance, Reliability or Yields The Deep Trench process is intrinsically logic friendly Capacitor fabricated first No perturbation of the remaining logic flow Significant Process & Test knowhow in DRAMs using deep trench technology Over six generations, embedded DRAM has adapted to logic technology resulting In simpler processes and significantly higher cell performance 2

Technology Innovation Development of SOI DRAM cell Technology: Use the Buried oxide to simplify the process & reduce parasitics half the cost of bulk edram Array passgate W Ct BOX Deep trench capacitors Cu M1 Bitline STI Scale the pass transistor for higher performance 800 700 Design Address retention through concurrent refresh Vt(mV) 600 500 400 300 200 100 Conventional Logic-like 90 nm Ldes 3 Ultra short bitlines with direct sense architecture 0 0.01 0.1 1 Ldes(um)

edram Performance Advantage? 4

ISSCC 04 Paper 27.3

CPU ISSCC 04 CPU Bus Logic L3 Tag Chip size 432mm 2 245mm 2 (43% smaller) CPU with DRAM (4x denser/2x slower) CPU Bus Logic L3 L3 Tag *14mm L3 *23mm L3 Cache * 39% decrease in wire delay to furthest L3-cache subarray L3 Latency (normalized to 20 cycles) L3-Tag L3-Cache Wire Delay Total SRAM 5 cycles 5 cycles 10 cycles 20 cycles edram 5 cycles 10 cycles 6 cycles 21 cycles

edram Size/Latency Advantage Delay (ns) 3 2.8 2.6 2.4 2.2 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 45nm edram vs. SRAM Latency edram Total Latency SRAM Total Latency edram Wire/Repeater Delay SRAM Wire/Repeater Delay 1Mb 4Mb 8Mb 16Mb 32Mb 64Mb Memory Block Size Built With 1Mb Macros edram has lower latency!

edram in IBM systems edram used on the MCM till 65 nm in p systems Power 4,5, 6 MCM P5 CPU (SOI) edram integrated with power PC in BlueGene usic ASICs flow In 45 nm we integrated edram with the processor in SOI 36MB edram L3 cache (Bulk) 8

Integrating large amounts of low latency memory is huge challenge for modern multi-core processor design Mem Ctrl Local SMP Links Remote SMP + I/O Links L3 Cache and Chip Interconnect Mem Ctrl P7 die Hot Chips 2009

Deep Trench Capacitor Advantage VCS Grid 1 st Stage Sense Amp Global Deep Trench Decoupling Capacitors DT decoupling capacitance: 200fF/μm 2 o Total cap of 300pF o 20X efficiency of MOSCAP 128 X 128 Decode DT connected to VCS grid o Grid on Metal 4 (low R) No performance impact o 200mV charge up in < 150ps 2 nd Stage Sense DIO CONTROL DIO 10

Stacked DRAMs to increase capacity without increasing power I/O ckts drive considerable power requirements These do not need to e duplicated on multiple DRAMs Master Slave approach to share I/Os, PLLs etc Enabled by TSVs Must compete with conventional scaled 3D chip Kang et al, ISSCC 2009

Getting Power in and out of high-end Processors is the challenge 150 200 W versus a few watts Need to deliver about 200 A to die uniformly Perimeter Wire Bond Inadequate High Power Die needs uniform power delivery across the Die (Grid) Need to heat sink the die

Structural Schematic of 3D integration of cache and processor Why TSVs TSVs P M Heat Sink Processor Die Thinned memory Die with TSVs P die: 11 level build Die-die interconnect @~50 μm pitch M die: 6LM, TSVs, lots of decaps C4s @186 μm pitch Laminate TSV: Through Silicon Via C4s: bumps