New Dimensions in Microarchitecture Harnessing 3D Integration Technologies

New Dimensions in Microarchitecture Harnessing 3D Integration Technologies
Kerry Bernstein, IBM T.J. Watson Research Center, Yorktown Heights, NY
DARPA Microsystems Technology Symposium, San Jose, California, 6 March 2007
Image: Escher Envy, courtesy of David Bryant

Report Documentation Page (Standard Form 298, Rev. 8-98; Form Approved OMB No. 0704-0188)
- Report date: 06 MAR 2007
- Report type: N/A
- Title and subtitle: New Dimensions in Microarchitecture Harnessing 3D Integration Technologies
- Performing organization: IBM Corporation
- Distribution/availability statement: Approved for public release, distribution unlimited
- Supplementary notes: DARPA Microsystems Technology Symposium held in San Jose, California on March 5-7, 2007. Presentations. The original document contains color images.
- Security classification of report, abstract, and this page: unclassified
- Limitation of abstract: UU
- Number of pages: 12

IBM Research
Server Trends
- Frequency no longer increasing
- Logic speed scaled faster than memory bus
  - (Processor clocks / Bus clock) consumes bandwidth
- More speculation; attempts to prefetch
  - Wrong guesses increase miss traffic
- Shortening linesize limited by directory as cache size grows
  - But doubling linesize doubles bus occupancy
- Cores / die increasing each generation
  - Multiplies off-chip bus transactions by N / 2*Sqrt(2)
- More threads per core, and increase in virtualization
  - Multiplies off-chip bus transactions by N
- Processors / SMP increasing
  - Aggravates queueing throughout the system
March 6, 2007. © 2005 IBM Corporation

Without more bandwidth at low latencies, core counts on chip will saturate
[Figure: Performance (A.U.) and Number of Cores per Processor vs. Year. Curves: System Perf, Device Technology Performance, Interconnect Technology Performance. Annotations: 2D Core Bandwidth Limit, Max Core Count In 2D, Max Core Count In 3D?]
From New Dimensions in Microarchitecture, K. Bernstein, MICRO-39, 12/06
3D extends transfer of performance from the device to the core level

Chip-Package Technology Gap
[Figure: Design rule pitch (µm, log scale) vs. year. Series: Package/Flip-chip and Wire bond (2005 ITRS, 2001 ITRS); Multi-level wiring at Global (minimum), Intermediate, and Local (metal 1 wiring) levels (ITRS 2005, ITRS 2001). Annotation: Technology Gap.]
Comparison of trends adapted from 2001 ITRS road map to data points from 2005 ITRS road map
Technology gap in the design rule between on-chip wiring and packaging interconnects

POWER Series FP Performance
[Figure: POWER Series Architectural Perf Contributions. SpecFP (Thousands) for POWER3, POWER4, POWER4+, POWER5, POWER5+. Labeled contributions: Bandwidth, Superscalar, Clock Gating, Speculative Execution, Branch Prediction, Multi-core Processor, Deep Pipelining, Out-of-Order Execution, Register Renaming, 64 Bit Dataflow, Symmetric Multi-threading (P5). Annotations: Processor Perf; Data Delivery Transaction Rate Dependence.]

Components of Processor Performance
From ISCA 06 Keynote address by Phil Emma, IBM
[Figure, block diagram: Uniprocessor Core (I Unit, E Unit, L1 Cache) and Cache, Memory Hierarchy (L2 Cache, etc.). L1 Size limited by Cycle time.]
[Figure, plot: CPI vs. Miss Rate. Components: Inf. Cache Perf. (E Busy, E Not Busy) and Finite Cache Effect. Slope = Miss Penalty (Cycles Per Miss).]
MIPS = f / CPI
Delay is sequentially determined by a) ideal processor, b) access to local cache, and c) refill of cache
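The relation on this slide can be sketched numerically. This is a minimal Python sketch, assuming the linear form CPI = (infinite-cache CPI) + (misses per instruction) x (miss penalty) implied by the "Slope = Miss Penalty" annotation. The function names and sample numbers are illustrative assumptions, not values from the talk.

```python
def finite_cache_cpi(cpi_infinite, misses_per_instruction, miss_penalty_cycles):
    """CPI with a finite cache: infinite-cache CPI plus the finite cache effect."""
    finite_cache_effect = misses_per_instruction * miss_penalty_cycles
    return cpi_infinite + finite_cache_effect


def mips(frequency_mhz, cpi):
    """MIPS = f / CPI, with f given in MHz."""
    return frequency_mhz / cpi


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not data from the slides).
    for misses in (0.0, 0.01, 0.02, 0.04):
        cpi = finite_cache_cpi(cpi_infinite=1.0,
                               misses_per_instruction=misses,
                               miss_penalty_cycles=100)
        print(f"misses/instr={misses:.2f}  CPI={cpi:.2f}  MIPS={mips(2000, cpi):.0f}")
```

Under these assumptions, each additional miss per instruction costs one full miss penalty in CPI, so MIPS falls off quickly once the finite cache effect exceeds the infinite-cache CPI.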

Queueing Effects vs. Log Miss Rate
From ISCA 06 Keynote address by Phil Emma, IBM
[Figure: Relative Performance and Bus Utilization (0 to 1) vs. Log2 (Instructions per Miss), from 10 down to -3. One Relative Performance curve and one Bus Utilization curve for each of TE=128, TE=32, TE=8, TE=2. Annotation: CPI = 3.63.]
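A rough way to see how bus utilization and performance interact as misses become more frequent is sketched below. It is a sketch under stated assumptions, not the model behind the plotted curves: it assumes an M/M/1 waiting-time formula for the bus, treats bus occupancy per miss as a single parameter (TE on the slide presumably denotes the trailing edge, which is not modeled separately here), and uses hypothetical names and numbers throughout.

```python
def solve_bus_utilization(cpi_infinite, misses_per_instruction,
                          leading_edge_cycles, bus_occupancy_cycles,
                          iterations=60):
    """Find the self-consistent bus utilization rho and the resulting CPI.

    Assumed model (not from the slides):
      CPI(rho) = cpi_infinite + m * (leading_edge + S * rho / (1 - rho))
      rho      = m * S / CPI(rho)
    where m = misses per instruction and S = bus occupancy per miss.
    """
    demand = misses_per_instruction * bus_occupancy_cycles  # bus cycles per instruction

    def cpi_at(rho):
        queue_wait = bus_occupancy_cycles * rho / (1.0 - rho)  # M/M/1 wait (assumed)
        return cpi_infinite + misses_per_instruction * (leading_edge_cycles + queue_wait)

    # rho * CPI(rho) increases with rho, so bisection finds the unique root.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mid * cpi_at(mid) < demand:
            lo = mid
        else:
            hi = mid
    rho = (lo + hi) / 2.0
    return rho, cpi_at(rho)


if __name__ == "__main__":
    # Sweep log2(instructions per miss) from 10 down to 0 (illustrative inputs).
    for log2_ipm in range(10, -1, -2):
        m = 2.0 ** -log2_ipm
        rho, cpi = solve_bus_utilization(1.0, m, 100, 32)
        no_queue_cpi = 1.0 + m * 100
        print(f"log2(instr/miss)={log2_ipm:2d}  utilization={rho:.3f}  "
              f"relative perf={no_queue_cpi / cpi:.3f}")
```

Here relative performance is taken as the ratio of the CPI without queueing to the CPI with queueing, which is one plausible reading of the plot's vertical axis rather than a confirmed definition.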

What Is Bandwidth Used For?
In a computer, it is mostly for handling cache misses.
From ISCA 06 Keynote address by Phil Emma, IBM
[Figure: timeline of a cache miss showing Bus Events and Processor Events against Time. Sequence: Miss, Access (Leading Edge), First Data, Rest of Cache Line (Trailing Edge), Last Data.]
Miss Penalty = Leading Edge + Effects(Trailing Edge)
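The miss-penalty decomposition can be made concrete with a small calculation. The following is a minimal sketch, assuming the trailing edge is the time to transfer the rest of the cache line over the bus after the first data beat, and that only some fraction of it stalls the processor. The `trailing_edge_exposure` knob and all sample values are hypothetical, introduced only for illustration.

```python
import math


def trailing_edge_cycles(line_bytes, bus_bytes_per_beat, proc_clocks_per_bus_clock):
    """Processor cycles between first data and last data of a cache line."""
    beats = math.ceil(line_bytes / bus_bytes_per_beat)
    return (beats - 1) * proc_clocks_per_bus_clock


def miss_penalty(leading_edge_cycles, trailing_cycles, trailing_edge_exposure=0.5):
    """Leading edge plus the exposed share of the trailing edge.

    trailing_edge_exposure is an assumed stand-in for Effects(Trailing Edge):
    0.0 means the trailing edge is fully hidden, 1.0 means fully exposed.
    """
    return leading_edge_cycles + trailing_edge_exposure * trailing_cycles


if __name__ == "__main__":
    # Illustrative inputs: 16-byte bus beats, 4 processor clocks per bus clock.
    for line_bytes in (64, 128, 256):
        te = trailing_edge_cycles(line_bytes, 16, 4)
        print(f"line={line_bytes}B  trailing edge={te} cycles  "
              f"miss penalty={miss_penalty(100, te):.0f} cycles")
```

With these inputs, doubling the line size roughly doubles the trailing edge, which is the same mechanism by which a longer line occupies the bus for longer.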

3D - Bandwidth and Latency
From New Dimensions in Microarchitecture, K. Bernstein, MICRO-39, 12/06
Processor load trade-off between I/O Bandwidth, Bus Latency.
- For generic workloads, uni-processor perf saturates bandwidth benefit, becomes latency-limited.
- As core counts increase, I/O Bandwidth becomes increasingly important
[Figure: Bandwidth and Latency Boundaries, General Purpose Processor Loads. Arch Perf, TpCC (Norm) vs. Bandwidth, GB/Sec (Norm), for Single Core, Double Core, Quad Core. Regions: Bandwidth limited, Latency limited.]
3D opportunity for improving High Perf Compute thruput in sustaining a higher number of cores per chip
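The two regimes named on this slide can be illustrated with a simple bound model. This is a sketch under an assumed model, not the analysis behind the figure: throughput is taken as the smaller of a latency-limited ceiling that scales with core count and a bandwidth-limited ceiling that scales with I/O bandwidth. All names and normalized values are hypothetical.

```python
def throughput(cores, bandwidth, per_core_latency_limit=1.0, traffic_per_unit_work=1.0):
    """Normalized throughput as the minimum of two ceilings (assumed model).

    latency bound:   each core can sustain at most per_core_latency_limit
    bandwidth bound: the bus can feed at most bandwidth / traffic_per_unit_work
    """
    latency_bound = cores * per_core_latency_limit
    bandwidth_bound = bandwidth / traffic_per_unit_work
    return min(latency_bound, bandwidth_bound)


def regime(cores, bandwidth, per_core_latency_limit=1.0, traffic_per_unit_work=1.0):
    """Name the ceiling that is currently binding."""
    if cores * per_core_latency_limit <= bandwidth / traffic_per_unit_work:
        return "latency limited"
    return "bandwidth limited"


if __name__ == "__main__":
    for cores in (1, 2, 4):
        row = [f"{throughput(cores, bw):.1f}" for bw in (1.0, 2.0, 3.0, 4.0)]
        print(f"{cores} core(s): perf at bandwidth 1..4 = {row}")
```

In this model a single core stops benefiting from added bandwidth almost immediately, while a quad core keeps gaining until the bandwidth ceiling reaches four times the per-core limit, which matches the qualitative point that bandwidth matters more as core counts rise.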

Low Vdd Technology and Parallelism
Energy optimum for fixed performance as function of Vdd, VT and effectiveness of parallelism
- α determines the device (circuit) overhead to maintain constant performance through parallelism
- α = 1: no overhead; half the speed, double the devices
- α > 1: increasing overhead; passive power becomes dominant
[Figure: Energy per Transaction at Constant Performance (2 Stages, Activity=1%). Total Energy / Transaction (0 to 1) vs. Vdd (0 to 1). Curves: α = 2, α = 1.2, α = 1.]
[Equation, layout lost in extraction: a relation among P, d/d_o, and N/N_o with exponent 1/α, where N = number of ckts, d = ckt delay, N_o = number of ckts at 1V, d_o = ckt delay at 1V.]
From 3D Integration Special Topic Session, W. Haensch, ISSCC 07, 2/07

Two Classes of 3DI Processes at IBM
- SOI-3D: SOI top layer. Advantage: Smallest 3D vias
- Bulk-3D: Bulk top layer. Advantage: Broader foundry compatibility
[Figure: cross-sections of the SOI wafer and Bulk wafer stacks. Dimension labels as extracted: ~.2 um, ~6 um, ~1.4-2. um, ~5 um.]

Summary
- µP architecture tricks to avoid atomistic, QM scaling boundaries overwhelm present interconnect technologies
- Integration into Z-plane again postpones interconnect-related limitations to extending classic scaling.
- No aspect of architecture or technology remains 2D, so why even view chips as being monolithic anymore?
- Transaction retirement rate dependence on data delivery is increasing: dependence on µP performance and CMOS device speed is decreasing
- 3D Integration improves storage density and access to that storage
- The last remaining opportunity in CMOS to save power is in delivery of data rather than in its generation.