PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors

Similar documents
1. PowerPC 970MP Overview

PowerPC 740 and 750

Cell Programming Tips & Techniques

Power 7. Dan Christiani Kyle Wieschowski

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

POWER3: Next Generation 64-bit PowerPC Processor Design

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE. 64-Bit PowerPC 970 Targets Entry-Level Servers and Desktops

Introduction to PCI Express Positioning Information

Optimizing Data Sharing and Address Translation for the Cell BE Heterogeneous CMP

XT Node Architecture

POWER7: IBM's Next Generation Server Processor

POWER9 Announcement. Martin Bušek IBM Server Solution Sales Specialist

Digital Semiconductor Alpha Microprocessor Product Brief

IBM's POWER5 Micro Processor Design and Methodology

Eric Schwarz. IBM Accelerators. July 11, IBM Corporation

Intel released new technology call P6P

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

POWER7: IBM's Next Generation Server Processor

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

IBM "Broadway" 512Mb GDDR3 Qimonda

The network interface configuration property screens can be accessed by double clicking the network icon in the Windows Control Panel.

Itanium 2 Processor Microarchitecture Overview

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

Spring 2011 Prof. Hyesoon Kim

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Multi-Threading. Last Time: Dynamic Scheduling. Recall: Throughput and multiple threads. This Time: Throughput Computing

PowerPC 620 Case Study

Developing Code for Cell - Mailboxes

CS 152 Computer Architecture and Engineering

The ARM10 Family of Advanced Microprocessor Cores

Power Technology For a Smarter Future

IBM POWER6 microarchitecture

Portland State University ECE 588/688. Cray-1 and Cray T3E

CS 152 Computer Architecture and Engineering

The PowerPC RISC Family Microprocessor

Microarchitecture Overview. Performance

Pipelining and Vector Processing

Parallel Computing: Parallel Architectures Jin, Hai

620 Fills Out PowerPC Product Line

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

Hands-on - DMA Transfer Using Control Block

EECS 322 Computer Architecture Superpipline and the Cache

Microarchitecture Overview. Performance

EC 513 Computer Architecture

Case Study IBM PowerPC 620

SA-1500: A 300 MHz RISC CPU with Attached Media Processor*

All About the Cell Processor

Limitations of Scalar Pipelines

A brief History of INTEL and Motorola Microprocessors Part 1

SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server

Hello World! Course Code: L2T2H1-10 Cell Ecosystem Solutions Enablement. Systems and Technology Group

Crusoe Processor Model TM5800

POWER6 Processor and Systems

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Portland State University ECE 588/688. IBM Power4 System Microarchitecture

Low-Power, High-Performance Architecture of the PWRficient Processor Family

Exercise Euler Particle System Simulation

IBM PowerPC Enablement Kit: ChipBench-SLD: System Level Analysis and Design Tool Suite. Power.org, September 2005

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

What SMT can do for You. John Hague, IBM Consultant Oct 06

Family 15h Models 00h-0Fh AMD FX -Series Processor Product Data Sheet

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation

CS425 Computer Systems Architecture

Chapter 06: Instruction Pipelining and Parallel Processing. Lesson 14: Example of the Pipelined CISC and RISC Processors

Cell Broadband Engine Architecture. Version 1.0

SH4 RISC Microprocessor for Multimedia

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

Intel Enterprise Processors Technology

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

Vector IRAM: A Microprocessor Architecture for Media Processing

Introducing the Superscalar Version 5 ColdFire Core

CS 152 Computer Architecture and Engineering

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

Family 15h Models 00h-0Fh AMD Opteron Processor Product Data Sheet

Jim Keller. Digital Equipment Corp. Hudson MA

SH-5: A First 64-bit SuperH Core with Multimedia Extension. Fumio Arakawa Hitachi, Ltd.

IBM Cell Processor. Gilbert Hendry Mark Kretschmann

HXRHPPC Processor Rad Hard Microprocessor

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

UltraSparc-3 Aims at MP Servers

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

Basic Computer Architecture

MIPS R4300I Microprocessor. Technical Backgrounder-Preliminary

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

The University of Texas at Austin

Cell Broadband Engine Architecture. Version 1.02

The CoreConnect Bus Architecture

Lecture 8: RISC & Parallel Computers. Parallel computers

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

MPC740 Microprocessor Overview Floating-point unit (FPU) Branch processing unit (BPU) System register unit (SRU) Load/store unit (LSU) Two integer uni

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor)

Vector Engine Processor of SX-Aurora TSUBASA

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy

This Material Was All Drawn From Intel Documents

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Transcription:

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors Peter Sandon Senior PowerPC Processor Architect IBM Microelectronics All information in these materials is subject to change without notice. All information is provided on an as is basis, without any warranty of any kind. Any performance data contained herein is preliminary and subject to change. Copyright IBM Corporation 2002. All rights reserved.

PowerPC 970 Design Objectives and Overview Leverage architectural advantages of 64-bit POWER4 TM for new generation of PowerPC processor Provide high performance general purpose processing through advanced superscalar design with multiple, pipelined execution units Enhance multimedia, graphics and data movement through hardware implementation of a SIMD processing facility Support the bandwidth demands of a highly superscalar and SIMD enhanced core through a high speed processor bus

IBM PowerPC High Performance Roadmap 64-Bit Microprocessors PPC 970 1.4 1.8 GHz 32-Bit Microprocessors PPC 750 300-500MHz PPC 750CXe 300-600MHz PPC 750FX 500-1000MHz Target frequencies are subject to change without notice.

Based on POWER4 Architecture Winner MPR analyst s choice, 2001 POWER4 design goals Balanced system/bus throughput design SMP optimization Native 32-bit compatibility High frequency POWER4 1.3 GHz CPU Shared L2 cache L3 Cntl 1.3 GHz CPU Distributed Switch PPC 970 implementation SIMD enhanced Lower power Smaller die Single processor core SIMD 1.8 GHz CPU L2 cache BIU PPC 970

PPC 970 / POWER4 Size Comparison

PPC 970 Features Instruction pipe 64KB L1 Inst cache, direct mapped 32 entry I buffer 8 instructions fetch / cycle Branch prediction Highly accurate dynamic prediction Dispatch, issue 1 group (4 + branch) / cycle Up to 20 active groups Up to 8 issue / cycle Over 200 instructions in flight Data pipe 32 KB L1 Data cache, 2-way sa 32 x 64b GPR, FPR 32 x 128b VRF 512KB L2 cache, 8-way sa 8 data prefetch streams L1 I IFU IDU ISU FPR FPU1 FPU2 BIU L2 TLB/SLB I ERAT D ERAT GPR FXU1 FXU2 L1 D LSU1 LSU2 VRF VALU PERM

PPC 970 Features (cont.) Memory management 64 entry SLB, fully associative 256 x 4-way TLB 64 x 2-way Inst and Data ERATs Supports 42-bit real addresses BIU L2 Execution 2 Load/store units 64b for GPR, FPR 128b for VRF 2 Fixed point units 2 IEEE floating point units Single-, double-precision 2 SIMD sub-units VALU 2 integer, float subunits VPERM permute Branch unit Condition register unit L1 I IFU IDU ISU FPR FPU1 FPU2 TLB/SLB I ERAT D ERAT GPR FXU1 FXU2 L1 D LSU1 LSU2 VRF VALU PERM

PowerPC 970 Die Overview

PPC 970 Pipeline Pipeline depths 9 fetch, decode stages 5 to 13 out of order execute stages 2-3 dispatch, completion stages Pipeline width Up to 8 fetched per cycle Up to 5 dispatched per cycle Up to 8 issued per cycle 12 execution units Up to 8 L1 D cache misses Up to 5 completed per cycle Fetch, decode Dispatch Execute Complete Branch prediction Up to 2 branches per cycle 3 16K x 1 BHTs Link stack, count cache L1 L2 X1 X2 BRCR F1 F2 VXVCVF VP

64-bit Processing - 32-bit Compatibility 64-bit advantages Driven by need to address larger memory spaces Performance advantage for data intensive applications Enable new 64-bit solutions Native 64-bit mode 64-bit fixed point processing 64-bit effective addresses 42-bit real addresses Segment lookaside buffer caches segment table entries Native 32-bit mode High word of all effective addresses are cleared 32-bit PPC application code supported First 16 entries of SLB are used as segment registers

SIMD/Vector Engine Features 162 specialized SIMD instructions 128-bit data paths 4-way SIMD single precision floating point (8 FP ops/cycle) 4-way, 8-way, 16-way SIMD fixed point operations Two execution units Permute unit 16-entry issue queue Permute, splat, merge ALU 3 subunits 20-entry issue queue Simple, complex fixed point Floating point Dispatch group Issue Q0 Permute VRF Simple fixed Complex fixed Issue Q1 Float

High Bandwidth Processor Bus Features Two unidirectional busses 32-bit read, 32-bit write Point-to-point Source synchronous Elastic interface Allows multiple cycle wire delays between chips Hardware deskews bit lines at POR Bus protocol Address, control and data multiplexing Sideband signals for snoop and ACK Pipelined transactions Out of order data Coherency and sharing via snooping Processor synchronization for SMP Up to 900 MHz bit rate achieves up to 6.4 GB/s useable bandwidth sys_clk EI receiver 32 EI driver Clk dist Clk dist EI driver 32 EI receiver PPC 970 Companion chip

PPC 970 Performance SPECint2000 937 @ 1.8 GHz* SPECfp2000 1051 @ 1.8 GHz* Dhrystone MIPS 5220 @ 1.8 GHz* 2.9 DMIPS / MHz Additional Performance Peak scalar GFLOPS = 7.2 Peak SIMD GFLOPS = 14.4 RC5 : 18M keys/sec* 1200 1000 800 600 400 200 0 Estimated SPEC2000 Performance SPECint2000 SPECfp2000 PPC 970 @ 1.0 GHz POWER4 @ 1.0 GHz PPC 970 @ 1.8 GHz *All results are estimated performance; subject to change without notice

PowerPC 970 Parametrics Target Frequencies Architecture Performance* Caches Voltages Typical Power Dissipation* Package Technology Target Schedule* 1.4 to 1.8 GHz 64-bit PowerPC, 32-bit compatible 937 SPECint2000 @ 1.8 GHz 1051 SPECfp2000 @ 1.8 GHz 5220 DMIPS @ 1.8 GHz (2.9 DMIPS/MHz) 64KB, I cache, w/parity 32KB, D cache, w/parity 512KB, L2 cache, w/ecc 1.3V core logic and I/Os 42W @ 1.8 GHz, 1.3v 19W @ 1.2 GHz, 1.1v 25x25mm CBGA 576 pins on 1mm pitch (161 signals) 0.13mm, CMOS w/ SOI 8 levels of copper interconnect Samples 2Q 2003, Production 2H 2003 *Estimation only; subject to change without notice. Target frequencies are subject to change without notice. Any performance data contained herein is preliminary and subject to change.

Conclusion The IBM PowerPC 970 design constitutes An advanced 64-bit processor Derived from the POWER4 core Enhanced by a SIMD/Vector engine With a high bandwidth memory bus To achieve high performance on compute and bandwidth intensive applications ibm.com/powerpc IBM, the IBM logo, POWER4, PowerPC, the PowerPC logo, and the PowerPC Architecture are trademarks of International Business Machines Corporation.

(c) Copyright International Business Machines Corporation 2002. All Rights Reserved. Printed in the United Sates October 2002 The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both. IBM IBM Logo PowerPC PowerPC Logo POWER4 PowerPC 750 PowerPC 750CX PowerPC 750CXe PowerPC 750FX PowerPC 970 Other company, product and service names may be trademarks or service marks of others. All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary. While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document. IBM Microelectronics Division 1580 Route 52, Bldg. 504 Hopewell Junction, NY 12533-6351 The IBM home page is http://www.ibm.com The IBM Microelectronics Division home page is http://www.chips.ibm.com