Intel Enterprise Processors Technology

Similar documents
Intel released new technology call P6P

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Next Generation Technology from Intel Intel Pentium 4 Processor

Intel Enterprise Solutions

Basic Computer Architecture

The AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA

Agenda. What is the Itanium Architecture? Terminology What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance

Multi-Core Microprocessor Chips: Motivation & Challenges

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation

Simultaneous Multithreading on Pentium 4

Exploring the Effects of Hyperthreading on Scientific Applications

Computer Architecture. Introduction. Lynn Choi Korea University

Inside Intel Core Microarchitecture

Fundamentals of Computer Design

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Advanced Processor Architecture

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

AMD Opteron 4200 Series Processor

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

This Material Was All Drawn From Intel Documents

eslim SV Xeon 1U Server

HP s Performance Oriented Datacenter

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

eslim SV Xeon 2U Server

Agenda. Sun s x Sun s x86 Strategy. 2. Sun s x86 Product Portfolio. 3. Virtualization < 1 >

Modular Platforms Market Trends & Platform Requirements Presentation for IEEE Backplane Ethernet Study Group Meeting. Gopal Hegde, Intel Corporation

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

MICROPROCESSOR TECHNOLOGY

HP ProLiant BL35p Server Blade

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

Fundamentals of Computers Design

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Philippe Thierry Sr Staff Engineer Intel Corp.

HP ProLiant blade planning and deployment

Update for New Implementations. As new implementations of the Itanium architecture

Ultimate Workstation Performance

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Outline Marquette University

Agenda. What is Ryzen? History. Features. Zen Architecture. SenseMI Technology. Master Software. Benchmarks

Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage

Why Parallel Architecture

Key Measures of InfiniBand Performance in the Data Center. Driving Metrics for End User Benefits

FAST FORWARD TO YOUR <NEXT> CREATION

What Transitioning from 32-bit to 64-bit x86 Computing Means Today

Node Hardware. Performance Convergence

Parallel Computer Architecture

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information.

Storage Systems Market Analysis Dec 04

Systems Design and Programming. Instructor: Chintan Patel

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Sun N1: Storage Virtualization and Oracle

SUPERMICRO, VEXATA AND INTEL ENABLING NEW LEVELS PERFORMANCE AND EFFICIENCY FOR REAL-TIME DATA ANALYTICS FOR SQL DATA WAREHOUSE DEPLOYMENTS

HW Trends and Architectures

How to write powerful parallel Applications

Adaptive Scientific Software Libraries

Introduction to Microprocessor

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

The Mont-Blanc approach towards Exascale

Ken Kroeker. Partner Technology Access Center e Services Partner Division

The Future of Computing: AMD Vision

IBM POWER4: a 64-bit Architecture and a new Technology to form Systems

The Optimal CPU and Interconnect for an HPC Cluster

Hyperthreading Technology

AMD Opteron Processors In the Cloud

Godson Processor and its Application in High Performance Computers

CS425 Computer Systems Architecture

GPUs and GPGPUs. Greg Blanton John T. Lubia

IBM _` p5 570 servers

Page 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400)

PowerEdge 3250 Features and Performance Report

VIA ProSavageDDR KM266 Chipset

Advances of parallel computing. Kirill Bogachev May 2016

Head to Head with Dell & IBM: How ProLiant Wins

Intel Core Microarchitecture

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

Innovate. Integrate. Innovate. Integrate.

Introducing Sandy Bridge

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Data Storage World - Tokyo December 16, 2004 SAN Technology Update

Pentium 4 Processor Block Diagram

Agenda. Pentium III Processor New Features Pentium 4 Processor New Features. IA-32 Architecture. Sunil Saxena Principal Engineer Intel Corporation

Itanium 2 Processor Microarchitecture Overview

Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations

Data Storage World - Tokyo December 16, 2004 SAN Technology Update

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming

Intel Many Integrated Core (MIC) Architecture

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

New 130nm Itanium 2 Processors for 2003

Intel Workstation Technology

Transcription:

Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology Summary *Third party brands and names are the property of their respective owners 2 Page 1 1

Server Shipments 1996 2001 Thousands of Systems 1,400 1,200 1,000 Non-IA 800 600 400 200 Architecture (IA) 0 Q196 Q396 Q197 Q397 Q198 Q398 Q199 Q399 Q100 Q300 Q101 Source: IDC Tracker Q2 01 *Third party brands and names are the property of their respective owners 3 Architecture vs. On-line Transactions Top TPC-C* Results By Performance 1999 1 2 3 4 5 6 7 8 9 10 2001 Decision Support Top TPC-H* Results By Performance 100GB 300GB 2001 1 2 1 2 1000GB 1 2 3000GB 1 2 Internet Commerce Top TPC-W* Results By Performance 2001 10,000 Items 1 2 3 100,000 Items 1 2 3 *Third party brands and names are the property of their respective owners *Source: www.tpc.org 4 Page 2 2

Flexibility for Enterprise and Scientific Environments Web Application Database & Servers Servers HPC Servers Scale Up Mainframes 16-way 8-way 4-way 2-way Scale Out *Third *Other party names brands and names brands are are the property of their respective owners owners. 5 Introducing the new Xeon Processor 6 Page 3 3

Xeon Processor MP Features for Multi-Processor Server Platforms Integrated Three Level Cache 1MB or 512KB Level 3 Cache + Level 2 Advanced Transfer Cache 256KB 400MHz System Bus NetBurst Microarchitecture Hyper-Threading Technology ServerWorks* Chipset with DDR 200 Memory and PCI-X support Speeds up to 1.60 GHz 108 million transistors 144 New SSE-2 Instructions Designed for 4-way, 8-way and above Multi-Processor Servers System Manageability Bus *Third party brands and names are the property of their respective owners 7 NetBurst Micro-architecture Delivering new Server capabilities 400 MHz System Bus Rapid Execution Engine Higher throughput when accessing memory and I/O devices for improved server headroom and scalability 2x clock speed for integer computations providing increased performance for web and database servers Higher clock speeds and greater throughput for server workloads resulting in higher transaction rates and faster response times New instructions improve response times for media servers, secure transactions and next generation web services Hyper Pipelined Technology Streaming SIMD Extensions 2 NetBurst micro-architecture enables higher transaction rates and and faster response times for for new new capabilities *Third party brands and names are the property of their respective owners 8 Page 4 4

Xeon Processor MP Integrated Three-Level Cache Architecture Level 1 Execution Trace Cache 12K u-ops Provides fast access to decoded micro-op instructions maximizing pipeline throughput High-bandwidth path to memory increasing throughput for large server workloads Integrated Level 3 Cache 1MB or 512KB for MP only Rapid Execution Engine Level 1 Data Cache 8KB Provides innovative cache access techniques reducing access latency for Rapid Execution Engine L2/L3 Cache Control Tightly synchronized with L1 Data Cache and Rapid Execution Engine improving access times Level 2 Advanced Transfer Cache 256KB for MP 512KB for DP Innovative, three-level cache architecture designed to to meet the needs of of high-end server applications *Third party brands and names are the property of their respective owners 9 Hyper-Threading Technology P6 Microarchitecture NetBurst Microarchitecture Dual Processor Hyper-Threading Enabled Dual Processor Processor Execution Resources Processor Execution Resources Processor Execution Resources Processor Execution Resources System Bus System Bus Hyper-Threading Technology enables multi-threaded server software to execute tasks in parallel within each processor Duplicates architectural state allowing 1 physical processor to appear as 2 logical processors to software (operating system and applications) One set of shared execution resources (caches, FP, ALU, dispatch, etc.) Today s threaded server software is compatible with Hyper-Threading Industry s s first simultaneous multi-threading technology on on a general purpose microprocessor *Third party brands and names are the property of their respective owners 10 Page 5 5

Historical Performance for MP $70.00 $60.00 $50.00 $40.00 $/Performance 4P Performance ~3X performance increase Cost per transaction drops 70000 60000 50000 40000 $30.00 30000 $20.00 20000 $10.00 10000 $0.00 Q397 Q398 Q399 Q300 Q301 Q302 Source: IDC Server Tracker Q3 01 & projection IDEAS Competitive Profiles (4P Example based on Compaq ProLiant series) 0 Significant installed base of of platforms 98-00 results in in 2-3X performance increase *Third party brands and names are the property of their respective owners 11 Introducing the next generation Itanium Processor 12 Page 6 6

McKinley Update McKinley on schedule for release in mid-2002 Sampling to OEMs since Feb 01 Pre-production pilot systems underway with end users McKinley builds on and extends Itanium architecture Improved data speed and throughput Additional execution resources for higher levels of parallelism Compatible with Itanium-based software Estimate McKinley to deliver ~1.5-2X performance increase over Itanium-based systems *Third party brands and names are the property of their respective owners 13 EPIC Architecture Features Explicitly Parallel Instruction Computing Performance through parallelism Focused on maximizing instructions executed in parallel Multiple execution units and issue ports in parallel 2 bundles (up to 6 Instructions) dispatched every cycle Massive on-chip resources 128 general registers, 128 floating point registers 64 predicate registers, 8 branch registers Provides compiler flexibility to exploit parallelism Efficient management engines (register stack engine) Scalable Modular, able to seamlessly add execution resources, issue ports Architecture designed to maximize synergy of compiler and hardware *Third party brands and names are the property of their respective owners 14 Page 7 7

Building Out the Itanium Architecture Itanium Processor 4 MB L3 on board, 96k L2, 32k L1 on -die Pipeline Stages 2.1 GB/s 64 bits wide 266 MHz 10 1 2 3 4 5 6 7 8 9 328 on-board Registers System bus Issue Ports McKinley 6.4 GB/s 128 bits wide 400 MHz 3 MB L3, 256k L2, 32k L1 all on-die 1 2 3 4 5 6 7 8 9 1011 8 328 on-board Registers 3X increase System bus bandwidth Large on-die cache, reduced latency Additional Issue ports 4 Integer, 3 Branch 2 FP, 2 SIMD 800 MHz 6 Instructions / Cycle 2 Load or 2 Store 6 Integer, 3 Branch 2 FP, 1 SIMD 1 GHz 6 Instructions / Cycle 2 Load & 2 Store McKinley delivers performance through: Bandwidth and cache improvements Micro-architecture enhancements Increased frequency and compatible with Itanium processor software Additional Execution units Increased Core frequency McKinley 221 million transistors total 25 million in CPU core *Third party brands and names are the property of their respective owners 15 Micro-Architecture Comparison Sun UltraSparc * III 2.4 System Bus Bandwidth GB/s 96K L1* 14 On-die Cache McKinley 8 6.4 GB/s L3 = 3 MB, L2 = 256K; L1 = 16K + 16K 1 2 3 4 96 Registers Issue Ports On-die Registers 1 2 3 4 5 6 7 8 9 1011 328 Registers 2 Integer 1 Branch 2 FP/VIS 1 Load/Store 900 MHz 4 instructions / Cycle Architecture Execution Units Core Frequency Instructions / Clk *8 MB L2 External Cache Source: www.sun.com, The SPARC Architecture Manual (Prentice Hall) 6 Integer, 3 Branch 2 FP, 1 SIMD 2 Load, 2 Store 1 GHz 6 Instructions / Cycle EPIC Architecture Other names and brands may be claimed as the property of others. *Third party brands and names are the property of their respective owners 16 Page 8 8

Enterprise Roadmap System Back-End/ Mid-Tier Server (4P-8P+) Q1 02 Itanium processor 460 Chipset 4M L3 /.18u Xeon processor MP 3 rd Party Chipsets 1.60 GHz / 1M il3 /.18u Q2 02 2H 02 2003 McKinley 870 Chipset 1 GHz / 3M /.18u Gallatin.13u Madison 6M.13u Performance /Volume DP Server Xeon processor (Prestonia) E7500 Chipset / 3 rd Party Chipsets 2.20 GHz / 512K /.13u Deerfield 3M.13u Nocona Ultra Dense Low Voltage Pentium III processor 440GX Chipset (UP) / 3 rd Party Chipset (DP) 800MHz / 512K /.13u Banias High End Workstation Itanium processor 460 Chipset 4M L3 /.18 McKinley 870 Chipset 1 GHz /3M/.18u Madison / Deerfield Mainstream Workstation Xeon processor (Prestonia) 860 Chipset 2.20 GHz / 512K /.13u Nocona All products, dates, and figures are preliminary, for planning purposes only and are subject to change *Third party brands and names are the property of their respective owners 17 Interconnect Technology Page 9 9

Interconnect Summary Inter-Facility Inter-System Intra-System Intra-Board Site to Site Data Center to Data Center Box to Box Blade to Blade Line Cards Chip to Chip/ Add in Card Ethernet InfiniBand 3GIO *Third party brands and names are the property of their respective owners 19 Evolving Interconnects InfiniBand* Ethernet 3GIO* Chip to Chip Interconnect Blade to Blade Box to Box Site to Site High Performance (RDMA) IPC/Clustering Within Data Center Networking SAN/Storage Shared I/O Protocols Memory, Message SRP, DAFS, RNDIS, IPoIB, Raw, VI, SDP TCP/IP Memory, PCI transparent *Third party brands and names are the property of their respective owners 20 Page 10 10

Enterprise Network 2003/4 JBOD Departmental SAN Switch SMB/ Departmental Switch NAS Dedicated Smart Array Server/Storage IPC/Cluster NAS (DAFS) MAN/WAN Internet Enterprise Backbone Router HI-end DAS Rack SAN SAN Switch/Router Complex Non-clustered servers NAS Data Center Legacy (FC) SAN Smart Arrays 10/100/1000 Ethernet 10G Ethernet S-ATA InfiniBand* architecture Fibre Channel *Third party brands and names are the property of their respective owners 21 Thank You 22 Page 11 11