Multi-Core Microprocessor Chips: Motivation & Challenges

Similar documents
Lecture 2: Performance

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004

INTEL MULTI-CORE FACTS, FIGURES AND DECODER RING

From Dual Core to Multi-Core, is everybody ready?

Moore s s Law, 40 years and Counting

Lecture 1: CS/ECE 3810 Introduction

Intel Enterprise Processors Technology

Microprocessor Trends and Implications for the Future

Fundamentals of Quantitative Design and Analysis

Outline Marquette University

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Copyright 2012, Elsevier Inc. All rights reserved.

MICROPROCESSOR TECHNOLOGY

Advanced Computer Architecture (CS620)

Introduction. Summary. Why computer architecture? Technology trends Cost issues

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Parallel Software 2.0

VLSI Design Automation

Microelettronica. J. M. Rabaey, "Digital integrated circuits: a design perspective" EE141 Microelettronica

How many cores are too many cores? Dr. Avi Mendelson, Intel - Mobile Processors Architecture group

Computer Architecture!

CS/EE 6810: Computer Architecture

Transistors and Wires

ECE 486/586. Computer Architecture. Lecture # 2

HPC Technology Trends

Lecture 1: Introduction

Performance of computer systems

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Architecture at HP: Two Decades of Innovation

FABRICATION TECHNOLOGIES

VLSI Design Automation

EE586 VLSI Design. Partha Pande School of EECS Washington State University

2009 International Solid-State Circuits Conference Intel Paper Highlights

Computer Architecture

Inside Intel Core Microarchitecture

Computer Architecture s Changing Definition

Announcements. Advanced Digital Integrated Circuits. No office hour next Monday. Lecture 2: Scaling Trends

Computer Architecture

CS758: Multicore Programming

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

Power dissipation! The VLSI Interconnect Challenge. Interconnect is the crux of the problem. Interconnect is the crux of the problem.

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

CAD for VLSI. Debdeep Mukhopadhyay IIT Madras

EECS4201 Computer Architecture

Chapter 2: Computer-System Structures. Hmm this looks like a Computer System?

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Uniprocessor Computer Architecture Example: Cray T3E

HISTORY OF MICROPROCESSORS

VLSI Design Automation. Maurizio Palesi

Chap. 4 Multiprocessors and Thread-Level Parallelism

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

ECE520 VLSI Design. Lecture 1: Introduction to VLSI Technology. Payman Zarkesh-Ha

Calendar Description

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Lecture 21: Parallelism ILP to Multicores. Parallel Processing 101

Intel SSD Data center evolution

Meet the Increased Demands on Your Infrastructure with Dell and Intel. ServerWatchTM Executive Brief

High performance, power-efficient DSPs based on the TI C64x

Chapter 1: Introduction to the Microprocessor and Computer 1 1 A HISTORICAL BACKGROUND

Understanding Dual-processors, Hyper-Threading Technology, and Multicore Systems

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

Chapter 1: Fundamentals of Quantitative Design and Analysis

Microprocessor and DSP Technologies for the Nanoscale Era

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America

Computer Architecture = CS/ECE 552: Introduction to Computer Architecture. 552 In Context. Why Study Computer Architecture?

1.13 Historical Perspectives and References

ECE 588/688 Advanced Computer Architecture II

Multicore and Parallel Processing

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

Intel Corporation Silicon Technology Review

3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape

EE241 - Spring 2004 Advanced Digital Integrated Circuits

ECE 2162 Intro & Trends. Jun Yang Fall 2009

CSE502: Computer Architecture CSE 502: Computer Architecture

ECE/CS 552: Introduction To Computer Architecture 1

(ii) Why are we going to multi-core chips to find performance? Because we have to.

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Last Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)

A Peek at the Future Intel s Technology Roadmap. Jesse Treger Datacenter Strategic Planning October/November 2012

Why Parallel Architecture

Technology Trends Presentation For Power Symposium

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Practical Information

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS

Ubiquitous Location: challenges and opportunities of enabling all-day, everywhere location for all mobile platforms

Memory Systems IRAM. Principle of IRAM

Computer Architecture

Computer Architecture!

INTEL MULTI-CORE PROCESSOR ARCHITECTURE DEVELOPMENT A

Intel Many Integrated Core (MIC) Architecture

ECE 588/688 Advanced Computer Architecture II

Transcription:

Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005 Intel Corporation.

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

40 Years of Moore s s Law 10,000 1,000 100 Itanium 2 221M in 2002 410M in 2003 Montecito 1.7 Billion Tulsa 1.3 Billion Million Transistors 10 1 Pentium Pro 486 Pentium proc Pentium 4 0.1 286 386 0.01 0.001 8085 8086 4004 8008 8080 70 80 90 00 10 More than 1 Billion Transistors in 2005!

First Billion Transistor Dual Core Chip 1MB L2I 2 Way Multi-threading threading Dualcore 2x12MB L3 Caches Arbiter 1.72 Billion Transistors

Continuation of Moore s s Law Process Name P856 P858 Px60 P1262 P1264 P1266 P1268 P1270 1st Production 1997 1999 2001 2003 2005 2007 2009 2011 Process Generation Wafer Size (mm) Inter-connect Metal layers 5 6 6 7 8 8-9? 9-10? 10? Channel Gate dielectric Gate electrode Lithography 0.25µm 0.18µm 0.13µm Si Si Si 90 nm 65 nm 45 nm 32 nm 22 nm 200 200 200/300 300 300 300 300 300 Al SiO 2 Al SiO 2 Cu SiO 2 Cu Strained Si SiO 2 Cu Cu Strained Strained Si Si SiO 2 Poly- silicon Poly- silicon Poly- silicon Poly- silicon Poly- silicon High-k Metal Cu? Strained Si High-k Metal 248 nm 248 nm 248 nm 193 nm 193 nm 193 nm EUV 13.4 nm Strained Si High-k Metal EUV 13.4 nm R R Source: Intel Subject to change

Power Will be the Limiter 1000 100 10 Million Transistors 1 0.1 0.01 0.001 8086 8080 Pentium 4 proc 386 Pentium proc 1 Billion Transistors 1970 1980 1990 2000 2010 2020 >1B transistor integration capacity will exist 100000 MHz 10000 1000 100 10 1 0.1 8086 8080 386 Pentium 4 proc Pentium proc 15-30 GHz 1970 1980 1990 2000 2010 2020 Clock frequency can increase 1000000 100000 Pentium 4 proc 10000 1000 MIPS 100 386 Pentium proc 10 8086 1 0.1 8080 0.01 1 TIPS 1970 1980 1990 2000 2010 2020 Power (Watts) 1000 100 10 1 0.1 8086 8080 386 Pentium 4 proc Pentium proc 1000's of Watts? 1970 1980 1990 2000 2010 2020 Applications will demand TIPS performance But the Power Challenge: Highest performance in the power envelope

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

Design Challenges Memory latency not scaling as fast as processor speed Power growing non-linearly with single thread performance Designer productivity lagging design complexity Ability to validate and test complex design Keeping up with new process technology every two years

Long Latency DRAM Accesses: Needs Latency Tolerant Techniques Peak Instructions during DRAM Access 1400 1200 1000 800 600 400 200 0 Pentium Processor 66 MHz Pentium-Pro Pro Processor 200 MHz Pentium III Processor 1100 MHz Pentium 4 Processor 2 GHz Future CPUs

DRAM Latency Tolerance Continue building even larger caches Every semiconductor process generation provides opportunity to double cache size Cache becomes larger part of die Hide multiple threads of execution behind memory latency Intel implemented simultaneous multi- threading in 2000 Implement multi-core products as Moore s s Law allows

From 180 nm to 130 nm Core Core Bus Core Bus 3MB L3 Cache 180 nm 6MB L3 Cache 130 nm Bus 9MB L3 Cache Cache becomes a larger part of the Die Area. Dual core does not double die size.

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

Single-stream Performance vs Costs Relative power/die size/perf vs 486 25 20 15 10 5 0 SPECInt SPECFP Die Size Power 486 Pentium Processor Pentium III Processor Pentium 4 Processor

Situational Analysis With Each Process Generation transistor density doubles Frequency has increased by ~1.5X; ~1.3x in future Vcc has scaled by about ~0.8x; ~0.9x in future Capacitance has scaled by 0.7x Total power may not scale down due to increased leakage Instruction Level Parallelism harder to find Increasing single-stream stream performance often requires non-linear increase in design complexity Many server applications are inherently parallel Parallelism exists in multimedia applications Multi-tasking tasking usage models becoming popular

Processor Power

Design Complexity and Productivity factors Huge transistor budgets stress ability to design and verify complex chips Multi-core fits well with increasing transistor budgets Multi-core design addresses density/designer gap

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

Iron Law of Performance Execution Time is the product of Path Length Cycles Per Instruction (CPI) Cycle Time CPI is the sum of infinite-cache core cpi miss rate * effective memory latency Bad (good) news is that performance does not scale up (down) linearly with frequency

The Magic of Voltage Scaling Power = Capacitance * Voltage 2 * Frequency Frequency α Voltage in region of interest Power increases as the cube of Frequency Good news is that voltage scaling works 10% reduction in voltage yields 10% reduction in frequency 30% reduction in power less than 10% reduction in performance

Simple Dual Core Example Assume Single Core processor at 100W 80W for core, 20W for cache and I/O 50% die are is core Dual core within same power envelop 20W for I/O and cache 40W per core Die size increases by 50% Reduce voltage by 21% to reduce core power to 40W Frequency reduces by ~20% Single thread perf reduces by ~15% Throughput increases by 70-80%

Possible Improvements Develop new power efficient core E.g. extensive clock gating Big power savings with little or no performance loss Design a smaller core with lower performance Area and power savings much greater than performance loss Use larger number of cores Adjust frequency and power of each core with load factor Inactive cores can be put in sleep mode Maintain overall die power constant

Intel s s Next Generation Micro-Architecture NetBurst µarch Mobile µarch + New Innovations Next Generation µarch

Server Products Whitefield 2 to 4 highly efficient cores Large/shared and scalable L2 cache 40% reduction in TDP power Server *Ts Woodcrest Greater compute density Lower energy consumption Greater application responsiveness Improved virtualization, manageability, and security Enabling

Advancing Performance Today SPECint Rate Performance Q1 06 2H 06 Single Core +88% (estimate) Dual-Core +52% (estimate) Lower Power Dual-Core Irwindale Lindenhurst Dempsey Bensley Woodcrest Bensley Source: Intel Corporation. Estimated SPECint_rate2000 performance. Projections and technical specifications are based on internal analysis and subject to change All dates and products specified are for planning purposes only and are subject to change.

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

Possible Evolution Transistor density doubles with each process generation New generation enables complex new core Possible alternative design point Double the cache capacity in same area Double the number of processor cores Frequency improves with process technology Core Core Core Core Core Core Core Cache 2 x Cache 4 x Cache 90 nm 65 nm 45 nm

Multi-Core Everywhere 2005 2006 2007 SERVER Shipping >85% ~100% DESKTOP PERFORMANCE Shipping >70% >90% MOBILE PERFORMANCE Shipping >70% >90%

CMP Challenges How much Thread Level Parallelism is there in most workloads? Ability to generate code with lots of threads & performance scaling Thread synchronization Operating systems for parallel machines Single thread performance tradeoff Power limitations On-chip interconnect/cache infrastructure Memory and I/O bandwidth required

Intel s s Software Tools and Support Thread Checker Thread Profiler Solutions, Blueprints, Sizing/Scaling Guides Math Kernel Libraries Performance Primitives Driver Optimization Labs Drivers Compilers Solution Services Developer Services VTune Analyzers Software College Early Access Programs

How Many Cores? Where does the doubling stop? Driven by software issues Today Microsoft Windows supports only 64 threads! How many applications scale to 64 threads? How well does performance scale with thread count?

Xeon Server & Workstation Roadmap Enterprise MP 64-bit Xeon MP 8M Dual-Core Xeon 7000 (Paxville MP) Truland & OEM Platforms Tulsa Dual-Core Whitefield Dunnington Reidland Volume DP 64-bit Xeon Dual-Core 2M Xeon 2x2M (Paxville DP ) Lindenhurst Dual-Core Xeon 5000 (Dempsey) Bensley Woodcrest Dual-Core Future Processor Future Platform Power Optimized DP 64-bit Xeon LV 2M Lindenhurst Sossaman Lindenhurst Future Processor Future Platform 2005 2006 Future

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

Looking Beyond CMP How far do we push the number of general purpose cores? Is there are role for application specific engines? Programming model for heterogeneous cores

Improving Power Efficiency >3% performance for 1% in power penalty: Specialized MIPS Performance *CMP = Symmetric General Purpose (GP) cores 2 CMP CMP* 3 CMP ~1% performance for 1% in power 4 CMP penalty: parallelism One processor Conventional design 1% performance for 3% in power Power

Application Specific Engines Can achieve better power efficiency than general purpose cores Simpler design due to targeted application and lack of support for full operating system Challenge Needs to support high volume application Reconfigurable? Graphics and Multimedia engines are good candidates

Agenda Semiconductor Technology Evolution Design Challenges Why Multi-Core Processor Chips? Power/Performance Trade-Offs CMP Directions Beyond CMP Summary 2005, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

Summary One billion transistors in 2005 Chip Level Multiprocessing and large caches can exploit Moore s s Law Amount of parallelism in future microprocessor systems will increase Heterogeneous cores may emerge eventually Need applications and tools that can exploit parallelism Design challenges and software issues remain Collaborate, Innovate, Lead!

Closing Thought Don t t be encumbered by past history, go off and do something wonderful. - Robert Noyce Intel Co-founder