Next Generation Technology from Intel Intel Pentium 4 Processor

Similar documents
Pentium 4 Processor Block Diagram

High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs

Agenda. Pentium III Processor New Features Pentium 4 Processor New Features. IA-32 Architecture. Sunil Saxena Principal Engineer Intel Corporation

Inside Intel Core Microarchitecture

Exploring the Effects of Hyperthreading on Scientific Applications

This Material Was All Drawn From Intel Documents

Intel released new technology call P6P

EC 513 Computer Architecture

Inside Intel Core Microarchitecture: Setting New Standards for Energy-Efficient Performance

Superscalar Processors

Intel Enterprise Processors Technology

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

Intel Core Microarchitecture

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

Basic Computer Architecture

Pentium IV-XEON. Computer architectures M

XT Node Architecture

Intel instruc,on set architecture-32 bit (IA-32)

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

How to write powerful parallel Applications

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

Superscalar Machines. Characteristics of superscalar processors

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

1. PowerPC 970MP Overview

Superscalar Processors

Processing Unit CS206T

Introduction to Microprocessor

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

CS 426 Parallel Computing. Parallel Computing Platforms

A Key Theme of CIS 371: Parallelism. CIS 371 Computer Organization and Design. Readings. This Unit: (In-Order) Superscalar Pipelines

Hardware-Based Speculation

Computer Science 146. Computer Architecture

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

COMPUTER ORGANIZATION AND DESI

Page 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400)

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Jim Keller. Digital Equipment Corp. Hudson MA

Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon

Intel Architecture for Software Developers

Microarchitecture Overview. Performance

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

Getting CPI under 1: Outline

EITF20: Computer Architecture Part4.1.1: Cache - 2

Superscalar Organization

Chapter 5. Introduction ARM Cortex series

Microarchitecture Overview. Performance

Unit 8: Superscalar Pipelines

The Pentium II/III Processor Compiler on a Chip

Hyperthreading 3/25/2008. Hyperthreading. ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.

Pipelining. CSC Friday, November 6, 2015

Itanium 2 Processor Microarchitecture Overview

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

Pipelining and Vector Processing

Processor Design Pipelined Processor. Hung-Wei Tseng

Communications and Computer Engineering II: Lecturer : Tsuyoshi Isshiki

Computer Architecture and Data Manipulation. Von Neumann Architecture

Chapter 06: Instruction Pipelining and Parallel Processing. Lesson 14: Example of the Pipelined CISC and RISC Processors

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

PowerPC 740 and 750

Handout 2 ILP: Part B

Multiple Instruction Issue. Superscalars

All About the Cell Processor

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale

Advanced Computer Architecture

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

Processor (IV) - advanced ILP. Hwansoo Han

November 7, 2014 Prediction

Processors. Young W. Lim. May 12, 2016

Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview

Advanced Computer Architecture

" # " $ % & ' ( ) * + $ " % '* + * ' "

The Microarchitecture Level

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

The World s First Seventh-Generation x86 Processor: Delivering the Ultimate Performance for Cutting-Edge Software Applications

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

CS146 Computer Architecture. Fall Midterm Exam

Hardware-based Speculation

CS425 Computer Systems Architecture

IA-32 Architecture COE 205. Computer Organization and Assembly Language. Computer Engineering Department

Programmazione Avanzata

Like scalar processor Processes individual data items Item may be single integer or floating point number. - 1 of 15 - Superscalar Architectures

A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures

ECE 571 Advanced Microprocessor-Based Design Lecture 4

Processors. Young W. Lim. May 9, 2016

The Processor: Instruction-Level Parallelism

Limitations of Scalar Pipelines

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CS 152, Spring 2012 Section 8

IA-32 Intel Architecture Optimization

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

Transcription:

Next Generation Technology from Intel Intel Pentium 4 Processor 1

The Intel Pentium 4 Processor Platform Intel s highest performance processor for desktop PCs Targeted at consumer enthusiasts and business power users at introduction All new IA-32 micro-architecture designed to deliver leadership performance Performance for the visual Internet Designed for Where the Internet Is Is Going 2

Intel Pentium 4 Processor Design Goals Deliver world class, end user appreciable performance across both existing and emerging applications and usage models Internet, imaging, streaming video, speech, 3D and multimedia Multi-tasking user environments Deliver performance headroom and scalability for the future Base micro-architecture must deliver both performance and frequency scalability well into the future Micro-architecture that that will will Drive Performance Leadership for for the the Next Several Years 3

Introducing the Foundation for the Intel Pentium 4 Processor Performance P6 Micro-Architecture P5 Micro-Architecture Intel NetBurst Micro-Architecture Today Multimedia Visual Internet Visual Computing 486 Micro-architecture Windows* Time 4

The Intel Pentium III Processor Family Brought Us... Advanced Transfer Cache Streaming SIMD Extensions 133 MH System bus 5

The Intel Pentium 4 Processor Intel NetBurst Micro-Architecture 400 MH System Bus Advanced Dynamic Execution Rapid Execution Engine 42 million transistors Advanced Transfer Cache Hyper Pipelined Technology Streaming SIMD Extensions 2 Execution Trace Cache Enhanced Floating Point / Multi-Media 6

Rapid Execution Engine Arithmetic Logic Units (ALUs( ALUs) ) run at twice the core frequency Executes certain instructions in 1/2 core clock tick Results in higher execution throughput and reduced latency of execution Integer Instructions Executing at at 2X core frequency 7

Hyper Pipelined Technology Intel Pentium 4 processor doubles the pipeline depth to 20 stages Significantly increases processor performance and frequency capability 1 2 3 4 5 Prefetch Decode Decode Exec Wrtbck P5 Micro-Architecture 233MH 1 2 3 4 5 6 7 8 9 10 Fetch Fetch Decode Decode Decode Rename ROB Rd Rdy/Sch P6 Micro-Architecture Dispatch Exec 1 GH Today Intro at t1.4 GH.18µ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Intel NetBurst TM Micro-Architecture 8

Advanced Dynamic Execution Very deep, out-of-order speculative execution engine Keeps the Execution units executing instructions Up to 126 instructions in flight - 3x P6 Up to 48 loads and 24 stores in pipeline 2x P6 Enhanced branch prediction capability Keeps the processor executing to the correct program flow Reduces the mis-prediction penalty associated with deeper pipelines Advanced branch prediction algorithm 4K entry branch target array - 8x P6 Keeps the Correct Instructions Executing 9

Revolutionary New Cache Subsystem Advanced Level 1 Execution Trace Cache Caches decoded instructions (~12K micro-ops) Removes decoder latency from main execution loop Caches the path of program execution flow Integrates taken branches into single line Makes more efficient use of cache memory Level 2 Advanced Transfer Cache Full Speed, 256 KB Delivers ~45 GB/sec data throughput( @1.4GH) Bandwidth / performance increases with core frequency Optimies Data Transfer to the Core 10

Streaming SIMD Extension 2 (SSE2) SSE2 Extends MMX and SSE technology with the addition of 144 new instructions 128-bit SIMD integer arithmetic 128-bit SIMD double precision floating point Cache and memory management operations Delivers Performance Increases Across Broad Range of of Applications 11

400 MH System Bus 3.2 GB/sec data transfer rate 3x bandwidth of Pentium III processor system bus 400 MH quad pumped off 100MH clock Split-transaction, deeply pipelined 128-byte lines with 64-byte accesses ( 32-byte lines on P6) Makes better use of the system bus bandwidth P6 Bus 400 MH System Bus Highest Bandwidth Desktop Bus Bus 3.2 3.2 GB/sec 12

Intel Pentium 4 Processor Intel NetBurst Micro-architecture Summary Foundation for next generation Intel IA-32 processors Hyper pipelined for industry leading performance scalability Exceptional performance advantages from architectural innovations Hyper Pipeline Technology Rapid Execution Engine 400 MH System bus Execution Trace Cache Intel s Next Generation Micro-architecture 13

Beyond the Silicon Intel Pentium 4 Processor Usage Models Consumer Usage Model Categories Enhanced streaming media Real-time video encoding Video/photo editing 3D visualiation Video-as-input HDTV SW decode Speech recognition Voice over IP Excels where users need and and recognie performance most Business Usage Model Categories e-business Knowledge Management Data Mining/Visualiation Communications Telephony Security 14

Intel Pentium 4 Processor Summary Next Generation NetBurst micro-architecture Designed for the visual internet Headroom for the future Designed for where the Internet is is going 15