POWER7: IBM's Next Generation Server Processor

Similar documents
POWER7: IBM's Next Generation Server Processor

POWER7+ TM IBM IBM Corporation

POWER9 Announcement. Martin Bušek IBM Server Solution Sales Specialist

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

Power 7. Dan Christiani Kyle Wieschowski

Power Technology For a Smarter Future

IBM's POWER5 Micro Processor Design and Methodology

Open Innovation with Power8

POWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX

Puey Wei Tan. Danny Lee. IBM zenterprise 196

1. PowerPC 970MP Overview

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

AMD Opteron 4200 Series Processor

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors

POWER6 Processor and Systems

Eric Schwarz. IBM Accelerators. July 11, IBM Corporation

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

Exam Questions C

A 5.2 GHz Microprocessor Chip for the IBM zenterprise

Bridging High Performance and Low Power in the era of Big Data and Heterogeneous Computing

HP solutions for mission critical SQL Server Data Management environments

HW Trends and Architectures

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle

1 Copyright 2013 Oracle and/or its affiliates. All rights reserved.

Jeff Stuecheli, PhD IBM Power Systems IBM Systems & Technology Group Development International Business Machines Corporation 1

Six-Core AMD Opteron Processor

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation

FUJITSU SPARC ENTERPRISE SERVER FAMILY MID-RANGE AND HIGH- END MODELS ARCHITECTURE FLEXIBLE, MAINFRAME- CLASS COMPUTE POWER

ECE 571 Advanced Microprocessor-Based Design Lecture 24

PowerPC 740 and 750

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

EPYC VIDEO CUG 2018 MAY 2018

Sun Fire V880 System Architecture. Sun Microsystems Product & Technology Group SE

The PowerEdge M830 blade server

A 1.5GHz Third Generation Itanium Processor

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server

Concurrent High Performance Processor design: From Logic to PD in Parallel

Advanced Multimedia Architecture Prof. Cristina Silvano June 2011 Amir Hossein ASHOURI

POWER8 for DB2 and SAP

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

IBM Power 755 server. High performance compute node for scalable clusters using InfiniBand architecture interconnect products.

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Intel Enterprise Processors Technology

Best Practices for Setting BIOS Parameters for Performance

Overview of the SPARC Enterprise Servers

IBM POWER6 microarchitecture

IBM POWER7 Systems Express Blades Quick Reference Guide November 2011

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

IBM POWER5 CHIP: A DUAL-CORE MULTITHREADED PROCESSOR

OPENSPARC T1 OVERVIEW

The Future of SPARC64 TM

What SMT can do for You. John Hague, IBM Consultant Oct 06

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

p5 520 server Robust entry system designed for the on demand world Highlights

IBM System x3850 M2 servers feature hypervisor capability

IBM i テクニカル ワークショップ IBM i and the Future. Tim Rowe. IBM i Architect Application Development Systems Management.

SAP HANA x IBM POWER8 to empower your business transformation. PETER LEE Distinguished Engineer Systems Hardware, IBM Greater China Group

A Preliminary evalua.on of OpenPOWER through op.mizing stencil based algorithms

IBM Power AC922 Server

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017

AMD Opteron Processors In the Cloud

eslim SV Xeon 1U Server

IBM Power Advanced Compute (AC) AC922 Server

eslim SV Xeon 2U Server

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

All About the Cell Processor

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Reference Architecture Microsoft Exchange 2013 on Dell PowerEdge R730xd 2500 Mailboxes

Computer system energy management. Charles Lefurgy

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

IBM _` p5 570 servers

Multi-Threading. Last Time: Dynamic Scheduling. Recall: Throughput and multiple threads. This Time: Throughput Computing

These slides contain projections or other forward-looking statements within the meaning of Section 27A of the Securities Act of 1933, as amended, and

Hybrid Memory Cube (HMC)

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

IBM System p5 550 and 550Q Express servers

IBM System zenterprise 196: Memory Subsystem Overview

IBM POWER4: a 64-bit Architecture and a new Technology to form Systems

POWER3: Next Generation 64-bit PowerPC Processor Design

IBM EXAM QUESTIONS & ANSWERS

Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones

Blue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft

IBM Power Facts and Features IBM Power Systems, IBM PureFlex and Power Blades

eslim SV Dual and Quad-Core Xeon Server Dual and Quad-Core Server Computing Leader!! ESLIM KOREA INC.

Advanced Parallel Programming I

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

Roadrunner. By Diana Lleva Julissa Campos Justina Tandar

Crusoe Processor Model TM5800

Mainstream Computer System Components

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Next Generation Technology from Intel Intel Pentium 4 Processor

This Material Was All Drawn From Intel Documents

AMD HyperTransport Technology-Based System Architecture

Transcription:

Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002

Outline POWER Processor History POWER7 TM Motivation Design Overview Summary 2

20+ Years of POWER Processors 65nm 45nm Next Gen. 1.0um POWER1 -AMERICA s -Cobra A10-64 bit.72um RSC Muskie A35 RS64I Apache BiCMOS.5um.6um -601 RS64IV Sstar RS64III Pulsar RS64II North Star.5um.35um POWER2 TM P2SC.35um -603.5um.18um.25um.35um.25um 604e.22um POWER3 TM -630 180nm POWER4 TM -Dual Core 130nm POWER5 TM -SMT POWER6 TM -Ultra High Frequency POWER7 -Multi-core Major POWER Innovation -1990 RISC Architecture -1994 SMP -1995 Out of Order Execution -1996 64 Bit Enterprise Architecture -1997 Hardware Multi-Threading -2001 Dual Core Processors -2001 Large System Scaling -2001 Shared Caches -2003 On Chip Memory Control -2003 SMT -2006 Ultra High Frequency -2006 Dual Scope Coherence Mgmt -2006 Decimal Float/VSX -2006 Processor Recovery/Sparing -2009 Balanced Multi-core Processor -2009 On Chip EDRAM 1990 1995 2000 2005 2010 3 * Dates represent approximate processor power-on dates, not system availability

POWER7 Processor Chip 567mm 2 Technology: 45nm lithography, Cu, SOI, edram 1.2B transistors Equivalent function of 2.7B edram efficiency Eight processor cores 12 execution units per core 4 Way SMT per core 32 Threads per chip 256KB L2 per core 32MB on chip edram shared L3 Dual DDR3 Memory Controllers 100GB/s Memory bandwidth per chip sustained Scalability up to 32 Sockets 360GB/s SMP bandwidth/chip 20,000 coherent operations in flight Advanced pre-fetching Data and Instruction Binary Compatibility with POWER6 4 * Statements regarding SMP servers do not imply that IBM will introduce a system with this capability.

POWER7 Design Principles: Multiple optimization Points 10000 1000 Traditional Performance View Balanced Design Multiple optimization points Improved energy efficiency RAS improvements Improved Thread Performance Dynamic allocation of resources Shared L3 100 10 1 Thread Core Socket 32 Chip System Increased Core parallelism 4 Way SMT Aggressive out of order execution 10000 Balanced View 32 Chip System POWER6 Extreme Increase in Socket Throughput Continued growth in socket bandwidth Balanced core, cache, memory improvements System Scalable interconnect Reduced coherence traffic System Thruput 1000 100 10 Socket Core Thread POWER7 5 * Statements regarding SMP servers do not imply that IBM will introduce a system with this capability. 1 Single Thread Performance Graphs for illustration purposes only (Not actual data)

POWER7 Design Principles: Flexibility and Adaptability Cores: 8, 6, and 4-core offerings with up to 32MB of L3 Cache Dynamically turn cores on and off, reallocating energy Dynamically vary individual core frequencies, reallocating energy Dynamically enable and disable up to 4 threads per core Memory Subsystem: Full 8 channel or reduced 4 channel configurations System Topologies: Standard, half-width, and double-width SMP busses supported Multiple System Packages 2/ 4s Blades and Racks Single Chip Organic High-End and Mid-Range Single Chip Glass Ceramic Compute I ntensive Quad-chip MCM 6 1 Memory Controller 3 4B local links 2 Memory Controllers 3 8B local links 2 8B Remote links 8 Memory Controllers 3 16B local links (on MCM) * Statements regarding SMP servers do not imply that IBM will introduce a system with this capability.

POWER7: Core Execution Units 2 Fixed point units 2 Load store units 4 Double precision floating point 1 Vector unit 1 Branch 1 Condition register 1 Decimal floating point unit 6 Wide dispatch/8 Wide Issue Recovery Function Distributed 1,2,4 Way SMT Support Out of Order Execution 32KB I-Cache 32KB D-Cache 256KB L2 Tightly coupled to core ISU FXU Add Boxes IFU CRU/BRU DFU 256KB L2 LSU VSX FPU 7

POWER7: Performance Estimates POWER7 Continues Tradition of Excellent Scalability Core Performance Core performance increased by: Re-pipelined execution units Reduced L1 cache latency Tightly coupled L2 cache Additional execution units More flexible execution units Increased pipeline utilization with SMT4 and aggressive out of order execution Floating Pt. Integer Commercial POWER6 SMT2 POWER7 SMT4 Chip Performance Improved Greater then 4X: High performance on chip interconnect Improved cache utilization Dual high speed integrated memory controllers System Advanced SMP links will provide near linear scaling for larger POWER7 systems. Chip Performance Floating Pt. Integer Commercial POWER6 POWER7 SMT4 8 * Performance estimates relate to processor only and should not be used to estimate projected server performance.

Energy Management: Architected Idle Modes Two Design Points Chosen for Technology 4 PowerPC Architected States Nap (optimized for wake-up time) Turn off clocks to execution units Reduce frequency to core Caches and TLB remain coherent Fast wake-up Sleep (optimized for power reduction) Purge caches and TLB Turn off clocks to full core and caches Reduce voltage to V-retention Leakage current reduced substantially Voltage ramps-up on wake up No core re-initialization required Wake-Up Latency Doze Nap RV Winkle Power gate Sleep Energy Reduction 9

Adaptive Energy Management: Energy Scale TM Chip FO4 Tuned for Optimal Performance/Watt in Technology DVFS (Dynamic Voltage and Frequency Slewing) -50% to +10% frequency slew independent per core Frequency and voltage adjusted based on: Work load and utilization. On board activity monitors Turbo-Mode Up to 10% frequency boost Leverages excess energy capacity from: Non worst case work loads Idle cores Processor and Memory Energy Usage can be independently Balanced. Real time hardware performance monitors used. On board power proxy logic estimates power Power Capping Support Allows budgeting of power to different parts of system AC Power 100 SPECPower: Mean System Power per Load Level 80 Vmin 60 40 Load Level (%) 20 0 10

POWER7: Reliability and Availability Features Dynamic Oscillator Failover OSC0 OSC1 Fabric Interface Fabric Bus Interface to other Chips and Nodes ECC protected Node hot add /repair BUF BUF BUF BUF Core Recovery Leverage speculative execution resources to enable recovery Error detected in GPRs FPRs VSR, flushed and retried Stacked latches to improve SER Alternate Processor Recovery Partition isolation for core checkstops 64 Byte ECC on Memory Corrects full chip kill on X8 dimms Spare X8 devices implemented Dual memory chip failures do not cause outage Selective memory mirror capability to recover partition from dimm failures HW assisted scrubbing SUE handling Dynamic sparing on channel interface PowerVM Hypervisor protected from full dimm failures 11 X8 Dimms IO Hub PCI Bridge PCI Adapter L3 edram ECC protected SUE handling Line delete Spare rows and columns GX IO Bus ECC protected Hot add InfiniBand Interface Redundant paths * Statements regarding SMP servers do not imply that IBM will introduce a system with this capability.

Summary Power Systems continue strong 7th Generation Power chip: Balanced Multi-Core design EDRAM technology SMT4 Greater then 4X performance in same power envelope as previous generation Scales to 32 socket, 1024 threads balanced system Building block for peta-scale PERCS project Power7 High Volume Card POWER7 Systems Running in Lab AIX, IBM i, Linux all operational 12 * Statements regarding SMP servers do not imply that IBM will introduce a system with this capability.