Interconnect Challenges in a Many-Core Compute Environment. Jerry Bautista, PhD, General Manager, New Business Initiatives, Intel Technology and Manufacturing Group

Agenda: Microprocessor general trends, Implications, Tradeoffs, Summary

Microprocessor Evolution, 1985 to 2009:

                    1985 (Intel386)   2009 (Nehalem)
Transistor count    280 thousand      731 million
Frequency           16 MHz            > 3.6 GHz
Core count          1                 4
Cache size          None              8 MB
I/O peak BW         64 MB/s           50 GB/s
Adaptive circuits   None              Sleep mode, Turbo mode, power gating, adaptive frequency clocking
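The growth rate in this table can be sanity-checked with a small calculation. A minimal sketch (not from the deck; the 24-year span and the two transistor counts above are its only inputs):

```python
import math

def doubling_period_years(n0: float, n1: float, years: float) -> float:
    """Years per 2x growth, assuming smooth exponential scaling from n0 to n1."""
    return years * math.log(2) / math.log(n1 / n0)

# Intel386 (1985): 280 thousand transistors; Nehalem (2009): 731 million
period = doubling_period_years(280e3, 731e6, 2009 - 1985)
print(f"transistor count doubled roughly every {period:.1f} years")
```

The result, a doubling roughly every two years, matches the familiar Moore's law cadence.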

System Integration: the progression runs from discrete components, to 2-D integration (SoC), to 3-D integration, combining high-density memory, high-speed memory, low-power logic, high-performance logic, power regulators, sensors, radio, and photonics. System integration is needed for performance, power, and form factor; the challenge is to integrate an ever-wider range of heterogeneous elements.

A Tera-scale Platform Vision: many cores with per-core caches and special-purpose engines, joined by a scalable on-die interconnect fabric to shared last-level caches, integrated I/O devices, and integrated memory controllers; off-die interconnect provides high-bandwidth memory, I/O, and socket-to-socket links. Multi-core today becomes many-core tomorrow.

Visual Computing: 3D and more. It must look real, act real, and feel real. Looks real = life-like images; acts real = physics and behavioral intelligence.

Platform Performance Demands.
Servers: 10x more work. MMORPG software supports a limited number of clients per server today (WoW: 2500; Second Life: 160).
Clients: 3x CPU, 20x GPU. Approximate utilization by application: 2D websites, 20% CPU / 0-1% GPU; Google Earth, 50% CPU / 10-15% GPU; Second Life, 70% CPU / 35-75% GPU.
Network: 100x bandwidth. [Chart: maximum bandwidth in KB/s over time, cached vs. uncached, limited by server-to-client bandwidth.]
Sources: WoW data (www.warcraftrealms.com), Second Life data (CTO-CTO meeting and www.secondlife.com), and Intel measurements.

Research Shows Applications Scale Well. [Chart: parallel speedup vs. core count, up to 64 cores.] Workloads measured: production fluid, production face, production cloth, game fluid, game rigid body, game cloth, marching cubes, sports video analysis, video cast indexing, home video editing, text indexing, ray tracing, foreground estimation, human body tracker, portfolio management, and their geometric mean. Many applications can be parallelized across a large number of cores with almost 1:1 speedup.
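Near-linear curves like these only appear when the serial fraction of the work is tiny; Amdahl's law makes that explicit. A minimal sketch of the law itself (not data from the study):

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup on `cores` when `serial_fraction` of the work cannot parallelize."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Even 1% serial work visibly bends the curve at 64 cores.
for s in (0.0, 0.01, 0.05):
    print(f"serial fraction {s:.0%}: speedup on 64 cores = {amdahl_speedup(64, s):.1f}x")
```

At zero serial work the speedup is a perfect 64x; at just 5% serial work it drops below 16x, which is why the workloads above must be almost entirely parallel.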

Teraflops Research Processor. [Die photo: 21.72 mm x 12.64 mm, with I/O areas, PLL, and TAP; a single tile is 1.5 mm x 2.0 mm.]
Technology: 65 nm, 1 poly, 8 metal (Cu). Transistors: 100 million (full chip), 1.2 million (tile). Die area: 275 mm2 (full chip), 3 mm2 (tile). C4 bumps: 8390.
Goals: deliver tera-scale performance - a single-precision TFLOP at desktop power, a 5 GHz frequency target, bisection bandwidth on the order of terabits/s, and link bandwidth in the hundreds of GB/s. Prototype two key technologies - an on-die interconnect fabric and 3D stacked memory. Develop a scalable design methodology - a tiled design approach, mesochronous clocking, and power-aware capability.
1 TeraFLOP in a < 60 W envelope; 100 million transistors; 80 tiles; 275 mm2.
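The headline numbers imply notable efficiency figures. A back-of-envelope calculation using the slide's round numbers (treated as exact here, though the power envelope is an upper bound):

```python
tflops, watts, tiles = 1.0, 60.0, 80  # from the slide: 1 TFLOP in < 60 W, 80 tiles

gflops_per_watt = tflops * 1000 / watts   # a lower bound, since power is < 60 W
gflops_per_tile = tflops * 1000 / tiles

print(f"> {gflops_per_watt:.1f} GFLOPS/W, {gflops_per_tile:.1f} GFLOPS per tile")
```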

Desktop Platform Bandwidth Roadmap - driven by 2x performance every 2 years, heading toward tera-scale computing at 100s of GB/s. [Chart: effective CPU core frequency and FSB/memory effective bandwidth, 1996-2014, log scale from 100 MB/s to 100 GB/s; milestones include the Pentium III FSB, 64-bit EDO, 64-bit SDRAM, the Pentium 4 FSB, 2-channel RDRAM, 128-bit DDR, 128-bit DDR2, 192-bit DDR2/3, and CSI. Trend lines: effective CPU core frequency 2x/21 months, memory effective bandwidth 2x/24 months, FSB effective bandwidth 2x/27 months.] Staying on-trend puts the memory bandwidth target at 25-40 GB/s for the 2009-2011 time frame - tera-scale implies a 10x increase beyond that!
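The trend lines in the chart are simple doubling curves, so extrapolating one is a one-liner. A minimal sketch (illustrative inputs; the 25 GB/s starting point is the low end of the slide's 2009-2011 target):

```python
def project_bandwidth(bw_gb_s: float, years_ahead: float, doubling_months: float) -> float:
    """Extrapolate a bandwidth figure that doubles every `doubling_months` months."""
    return bw_gb_s * 2 ** (years_ahead * 12 / doubling_months)

# 25 GB/s doubling every 24 months, projected 6 years out (three doublings)
print(f"{project_bandwidth(25, 6, 24):.0f} GB/s")
```

Even on-trend growth reaches only 200 GB/s six years out, which is why tera-scale's 10x jump cannot come from trend-following alone.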

Future CPUs: tera-scale performance demands 1s of TB/s at the CPU, 100s of GB/s to DIMMs, 10s of GB/s to bulk memory, and 1s of GB/s to bulk storage or to another box, shelf, or rack. TeraFLOPS implies terabytes/sec of bandwidth.
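The "TeraFLOPS implies terabytes/sec" claim follows directly from operational intensity. A minimal sketch with assumed bytes-per-FLOP figures (these vary widely by workload and are not from the deck):

```python
def required_bandwidth_tb_s(tflops: float, bytes_per_flop: float) -> float:
    """Memory bandwidth (TB/s) needed to sustain `tflops` at a given bytes/FLOP."""
    return tflops * bytes_per_flop

# Assumed operational intensities: 0.5-1.0 bytes moved per FLOP
for bpf in (0.5, 1.0):
    print(f"1 TFLOP at {bpf} B/FLOP needs {required_bandwidth_tb_s(1.0, bpf):.1f} TB/s")
```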

IO Link Attributes:

Attribute       Box-Box                  Package-Package           Chip-Chip                           On-Die
Distance        Long (~10 m)             Medium (~10s of cm)       Medium (~1 cm)                      Short (~1 mm)
Speed           Low (10 Gb/s)            Medium (10-40 Gb/s)       High (40 Gb/s)                      High (40+ Gb/s)
Power           High (10s of mW/Gbps)    Medium (<5 mW/Gbps)       Low (<1 mW/Gbps)                    Low (<1 mW/Gbps)
Temperature     Low (<70 C)              Medium (up to ~80 C)      High (up to ~100 C)                 High (up to ~100 C)
Form factor     Large (cms)              Medium (~1 cm)            Small (a few mm)                    Small (a few mm)
Physical layer  Optical or electrical std  Hybrid and/or optical   Electrical, 3D stack or die array   On-die metal layers
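The power row explains why aggressive integration matters: the same aggregate bandwidth costs an order of magnitude more energy at box-to-box distances than chip-to-chip. A minimal sketch using the table's rough per-Gbps budgets (the 1 TB/s aggregate is an illustrative assumption):

```python
def link_power_w(total_gbps: float, mw_per_gbps: float) -> float:
    """Aggregate I/O power (W) for a link at a given signaling cost per Gbps."""
    return total_gbps * mw_per_gbps / 1000.0

# 1 TB/s aggregate bandwidth = 8000 Gb/s, at each tier's power budget
for tier, cost in [("box-box (~10 mW/Gbps)", 10.0),
                   ("package-package (<5 mW/Gbps)", 5.0),
                   ("chip-chip (<1 mW/Gbps)", 1.0)]:
    print(f"{tier}: {link_power_w(8000, cost):.0f} W")
```

At box-to-box energy costs, 1 TB/s alone would exceed the Teraflops chip's entire 60 W compute budget.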

Stacking Memory with the CPU: 256 KB SRAM per core, 4x C4 bump density, 8490 through-silicon vias. [Cross-section: the Polaris CPU sits on the memory die with Cu bumps at a denser-than-C4 pitch; through-silicon vias carry signals through the memory to the package at C4 pitch.]

BBUL-W Approach: wafer-level thin-film processes are used to interconnect dies; the dies are reconstituted into a molded reconfigured wafer*; the singulated BBUL-W package is then connected to an interposer. *Ref.: M. Brunnbauer et al., "An embedded device technology based on a molded reconfigured wafer," in Proc. ECTC, p. 547, May 2006.

Silicon Interposer. Refs.: M. Sunohara et al., "Silicon interposer with TSVs (through silicon vias) and fine multilayer wiring," in Proc. ECTC, p. 847, May 2008; K. Kumagai et al., "A silicon interposer BGA package with Cu-filled TSV and multi-layer Cu-plating interconnect," in Proc. ECTC, p. 571, May 2008.

Optical Interconnect on the CPU Platform: adoption progresses from box-box, to package-package, to chip-chip (memory-CPU, CPU-CPU), to on-die/on-package - each step more difficult and longer term.

Optical InfiniBand* Product. Parameters: copper-cable replacement; 4 channels in each direction at up to 5 Gb/s per channel for the DDR product (10 Gb/s per channel for the QDR product). Product description: two 4-channel transceivers coupled with 12-channel 62.5 or 50 um fiber; a plastic lens for optical coupling; 1x4 array VCSEL and PIN; 4-channel TX and RX ICs designed by Intel and fabbed by TSMC; dies are attached directly onto a PCB, with wire bonding from chip to chip and from chip to PCB. *No longer an Intel product.

Summary: Parallelism in many-core platforms is key to delivering continued performance and fine-grained power management. Future applications stress the entire interconnect stack, from on-die to the broader network. New multi-chip packaging approaches are required to address more aggressive integration. The optical-to-electrical transition point is shrinking, now approaching sub-1 m, but remains highly dependent on the specific application and topology. The entire HW/SW stack is impacted by the move to parallelism, requiring broad ecosystem enablement.