Poulson: An 8 Core 32 nm Next Generation I ntel I tanium. Processor. Steve Undy I ntel I NTEL CONFI DENTI AL

Size: px
Start display at page:

Download "Poulson: An 8 Core 32 nm Next Generation I ntel I tanium. Processor. Steve Undy I ntel I NTEL CONFI DENTI AL"

Transcription

1 Poulson: An 8 Core 32 nm Next Generation I ntel I tanium Processor Steve Undy I ntel 1 I NTEL CONFI DENTI AL

2 This slide MUST be used with any slides rem oved from this presentation Legal Disclaim er I NFORMATI ON I N THI S DOCUMENT I S PROVI DED I N CONNECTI ON WI TH I NTEL PRODUCTS. NO LI CENSE, EXPRESS OR I MPLI ED, BY ESTOPPEL OR OTHERWI SE, TO ANY I NTELLECTUAL PROPERTY RI GHTS I S GRANTED BY THI S DOCUMENT. EXCEPT AS PROVI DED I N I NTEL S TERMS AND CONDI TI ONS OF SALE FOR SUCH PRODUCTS, I NTEL ASSUMES NO LI ABI LI TY WHATSOEVER, AND I NTEL DI SCLAI MS ANY EXPRESS OR I MPLI ED WARRANTY, RELATI NG TO SALE AND/ OR USE OF I NTEL PRODUCTS I NCLUDI NG LI ABI LI TY OR WARRANTI ES RELATI NG TO FI TNESS FOR A PARTI CULAR PURPOSE, MERCHANTABI LI TY, OR I NFRI NGEMENT OF ANY PATENT, COPYRI GHT OR OTHER I NTELLECTUAL PROPERTY RI GHT. I NTEL PRODUCTS ARE NOT I NTENDED FOR USE I N MEDI CAL, LI FE SAVI NG, OR LI FE SUSTAI NI NG APPLI CATI ONS. I ntel m ay m ake changes to specifications and product descriptions at any tim e, without notice. Software and workloads used in perform ance tests m ay have been optim ized for perform ance only on I ntel m icroprocessors. Perform ance tests, such as SYSm ark and MobileMark, are m easured using specific com puter system s, com ponents, software, operations and functions. Any change to any of those factors m ay cause the results to vary. You should consult other inform ation and perform ance tests to assist you in fully evaluating your contem plated purchases, including the perform ance of that product when com bined with other products. Configurations: [ describe config + what test used + who did testing]. For m ore inform ation go to http: / / / perform ance I ntel does not control or audit the design or im plem entation of third party benchm arks or Web sites referenced in this docum ent. I ntel encourages all of its custom ers to visit the referenced Web sites or others where sim ilar perform ance benchm arks are reported and confirm whether the referenced benchm arks are accurate and reflect perform ance of system s available for purchase. I ntel processor num bers are not a m easure of perform ance. Processor num bers differentiate features within each processor fam ily, not across different processor fam ilies. See / products/ processor_num ber for details. I ntel, processors, chipsets, and desktop boards m ay contain design defects or errors known as errata, which m ay cause the product to deviate from published specifications. Current characterized errata are available on request. The code nam es Poulson and Tukwila presented in this docum ent are only for use by I ntel to identify products, technologies, or services in developm ent, that have not been m ade com m ercially available to the public, i.e., announced, launched or shipped. They are not "com m ercial" nam es for products or services and are not intended to function as tradem arks. I nt el, I ntel I tanium, I tanium, I ntel Xeon, I ntel Core m icroarchitecture, and the I ntel logo are tradem arks or registered tradem arks of I ntel Corporation or its subsidiaries in the United States and other countries. Copyright 2011 I nt el Corporat ion. All right s reserved. 2

3 Agenda Design Space Overview Parallelism Data I ntegrity Power Conclusion 3

4 Design Space: Mission Critical Perform ance Both single-thread speed and total throughput are im portant Goal: > 2x generat ional perform ance im provem ent Scalabilit y Glueless up to 8 sockets Support larger topologies via node controllers from system vendors Reliabilit y Redundancy and self- correct ion Error detection and recovery via hardware and firm ware Com patibility Socket com patible with I tanium 9300 Processor Series (Tukwila) Com m on platform ingredients with Xeon E7 Processor Series 4

5 Poulson Overview Tukw ila Poulson Process 65nm 32nm Devices 2.05 Billion 3.1 Billion Area 700 m m m m 2 Power ( m ax TDP) 185W 170W I tanium Cores 4 8 Last Level Cache Size 24MB 32MB I ntel QuickPath I nterconnect Links 4 full + 2 half 4 full + 2 half I ntel QPI Link Speed 4.8GT/ s 6.4GT/ s I ntel Scalable Mem ory I nterface 4 4 Links I ntel SMI Link Speed 4.8GT/ s 6.4GT/ s 5

6 Poulson Overview QPI Core 0 Core 1 QPI Shared Last Level Cache QPI QPI Core 6 Core 7 8 I tanium Cores 32MB Last Level Cache Directory Cache Core 2 ½ QPI Syst em Logic Shared Last Level Cache Directory Cache Core 5 Core 3 Core 4 SMI SMI ½ QPI 4 full-width and 2 half-width I ntel QPI links 2 Direct ory caches 2 I ntegrated m em ory controllers with 2 I ntel SMI links each 6

7 Exam ple 8 - Socket Topology Full I ntel QPI Mem ory I / O I / O Hub Mem ory Mem ory Hub Mem ory Half I ntel QPI I ntel SMI Poulson Poulson Poulson Poulson Poulson Poulson Poulson Poulson Mem ory I / O Mem ory Mem ory I / O Mem ory Hub Hub 7

8 Poulson Core Tukw ila Poulson 8 Devices 108 m illion 89 m illion Area 69 m m 2 20 m m 2 Process scaled TDP Process scaled frequency 1.0 > 1.2 Threads I nstruction Q size 48 96x2 Max inst ruct ion issue/ cycle 6 12 Pipeline stages Pipeline hazard resolution I nterlock+ Stall Replay I nteger RF size I nteger RF ports 12R 8W 12R 12W RF protection Parity SECDED All- new I tanium core

9 I tanium New I nstructions I nteger operations m py4 m pyshl4 clz Thread Cont rol hint@priority Expanded Data Access Hints m ov dahr Expanded Software Prefetch lfetch.count 9

10 Exploiting Parallelism on All Levels I nst ruct ion Level Parallelism Mem ory Parallelism Thread Parallelism Multi- core Parallelism 10 Multi- level parallelism exposed to softw are control

11 I nstruction Parallelism Front- end Pipeline Flush Main Pipeline Replay Replay I PG FET FDC REN DEC REG EXE DET WRB WB2 I PG: Address Gen FET: Array Access FDC: Early Decode REN: Regist er Renam e Fetch width: 32B/ cycle Mult i- level branch Predict ion Renam ing: 128 logical to 185 physical registers I ndependent thread dom ain I BD/ Q I BD: I nstr Buffer/ Dispersal DEC: I nstr Decode REG: Register Read EXE: Execut e DET: Except ion Det ect WRB: Writ e- back WB2: Com m it Q holds 96 instructions I n-order issue I ndependent thread dom ain OZQ MLD Pipeline Concurrency via 3 decoupled pipelines MLD: Mid-level data cache OZQ: Architectural m em ory ordering queue OZQ holds 16 m em ory ops OOO issue I ndependent thread dom ain 11

12 I nstruction Parallelism Front- end Back- end MLD I PG FET FDC REN 32B 6 I nstructions MQ I Q Mem ory I nteger OZQ Com pilers precisely control instruction dispersal AQ FQ BQ NopQ ALU Float ing Point Branch NOP w ide issue under softw are control 12

13 Mem ory Parallelism Avoiding Hazards Expanded Soft ware Hint s Provides control over allocation, speculation and prefetch policies Multi-line software prefetch Reduced Speculat ion Cost s Spont aneous Deferrals on speculat ive loads Deferred load can transform into a prefetch to cache and/ or TLB New Hardware Prefet cher I nto FLD and/ or MLD Adaptive based on FLD/ MLD m iss patterns Reduced Dat a Hazards Expanded store to consum er bypassing in FLD and MLD Single-cycle MLD to FLD line transfers Pipeline bottlenecks rem oved 13

14 Mem ory Parallelism I ncreasing Throughput More pending m em ory operations in each core I ncreased from 48 to 64 operations in MLD I ncreased from 44 to 80 operations in Ring interface Ring Cache 8 I ndependent LLC pipelines and controllers per socket 2 caching agents per socket to generate I ntel QPI transactions I ncreased I ntel QPI link bandwidth for m ulti- socket throughput 14 Mem ory Subsystem I m provem ents Hom e agent in-flight transaction tracker increased by 50% MC scheduler algorithm changes focused on perform ance and power efficiency I ntel SMI link speed increased I m proved queuing and B/ W

15 Thread Parallelism I ntel Hyper-Threading Technology with Dual Dom ain Multithreading support Front-end and m ain pipelines are independently threaded Pipeline-specific thread switch m echanism s More per thread resources I nstruction queues Data TLBs Hardware page walker handles concurrent walks to both threads I nstructions provide software hints to thread switch hardware Attention to Thread Fairness Hardware t o ident ify unfair behavior Measures instruction retirem ent and data cache resource usage Firm ware configurable responses to unfairness By biasing towards victim thread I ncreased throughput via im proved threading 15

16 Multi- core and Multi- socket Parallelism 700GB/ s (aggregate) Core 3 LLC LLC Core 4 Core 2 LLC LLC Core 5 10 port Rout er 128GB/ s QPI x6 I ntel QPI HAx2 MCx2 45GB/ s I ntel SMI Core 1 LLC LLC Core 6 Core 0 LLC LLC Core 7 16

17 Reliability, Fault Tolerance and Recoverability I ntel I nstruction Replay Technology Main pipeline redesigned for error handling Enabled hardware and firm ware recovery of correct able errors I ncreased Error Detection and Correction Arrays MLI, MLD, LLC tag and Directory cache SECDED LLC data DECTED Register files (GR and FR) SECDED Logic path error detection in FP ALU via residues Key paths are protected end to end with invalidation and replay Reworked Error Det ect ion, Response and Report ing Fram ework Large increase in error detection and response capabilities I ncreased hardware responses and recovery I ntel Cache Safe Technology: lockout of failing cache locations 17

18 Pow er Aw are Design Replay- based Pipeline Enabled fully gated clocks Elim inated slow and power-wasting global pipeline stalls Aggressive Clock Gat ing Substantial reductions in both idle, typical and m ax power Rem oved Dynam ic Logic from FPU Low- Leakage FETs 18 Digit al Power Predict ion Measures, weights and averages key core state signals Beyond architectural state uses data patterns Estim ation error decreased from 35% to 2% 6 0 % TDP, 7 0 % idle C dyn reduction over process scaled Tukw ila

19 Poulson Status Well into post-si validation Booted and being tested on m ultiple operating system s Running in m any topologies On track for 2012 shipm ents 19

20 Conclusion All new core design 12- wide issue I ntel Hyper-Threading Technology with Dual Dom ain Multithreading I tanium New I nstructions and policies for fine-grain software control of hardware parallelism I ntel I nstruction Replay Technology for frequency, power and fault tolerance Core power efficiency: > 3x better than Tukwila 1 Ring- based design Large last level caches High bandwidth system interface Outstanding data reliability Enhanced error detection I m proved error recovery Better RAS integration 1 I nternal lab m easurem ent 20

21

22 Glossary DECTED DTB FI T FLB Double Error Correction, Triple Error Detection Second Level Data TLB Failure in Thousands First Level Branch Cache MLD MLI OZQ Mid Level Data Cache Mid Level I nstruction Cache Ordering czar Queue: architectural m em ory ordering QPI I ntel QuickPath I nterface FLD FLI HA I LP First Level Dat a Cache First Level I nstruction Cache Hom e Agent: I ntel QPI agent responsible for m anaging m em ory I nst ruct ion Level Parallelism I PF I tanium Processor Fam ily LLC MC Last Level Cache Mem ory Cont roller RAS RF SECDED Reliability, Availability and Serviceability Register File Single Error Correction, Double Error Detection SMI I ntel Scalable Mem ory I nt erconnect TD P TLB Therm al Design Power Translat ion Look- aside Buffer 22

Transitioning the I ntel Next. ( Nehalem and W estm ere) into the. Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009

Transitioning the I ntel Next. ( Nehalem and W estm ere) into the. Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009 Transitioning the I ntel Next Generation Microarchitectures ( Nehalem and W estm ere) into the Mainstream Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009 1 Agenda Next Generation

More information

AMD and OpenCL. Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc.

AMD and OpenCL. Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc. AMD and OpenCL Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc. Overview Brief overview of ATI Radeon HD 4870 architecture and operat ion Spreadsheet perform ance analysis Tips & tricks

More information

I ntel. QuickPath I nterconnect Overview. presented by Robert Safranek. Contributors: Gurbir Singh, Robert Maddox

I ntel. QuickPath I nterconnect Overview. presented by Robert Safranek. Contributors: Gurbir Singh, Robert Maddox Overview presented by Robert Safranek Contributors: Gurbir Singh, Robert Maddox Legal Disclaim er INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Forging a Future in Memory: New Technologies, New Markets, New Applications

Forging a Future in Memory: New Technologies, New Markets, New Applications Forging a Future in Memory: New Technologies, New Markets, New Applications Ed Doller V.P. Chief Memory Systems Architect Non-Volatile Memory Seminar Hot Chips Conference August 22, 2010 Memorial Auditorium

More information

Quadrics QsNet II : A network for Supercomputing Applications

Quadrics QsNet II : A network for Supercomputing Applications Quadrics QsNet II : A network for Supercomputing Applications David Addison, Jon Beecroft, David Hewson, Moray McLaren (Quadrics Ltd.), Fabrizio Petrini (LANL) You ve bought your super computer You ve

More information

Intel Architecture for Software Developers

Intel Architecture for Software Developers Intel Architecture for Software Developers 1 Agenda Introduction Processor Architecture Basics Intel Architecture Intel Core and Intel Xeon Intel Atom Intel Xeon Phi Coprocessor Use Cases for Software

More information

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation Intel Atom Processor Based Platform Technologies Intelligent Systems Group Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Itanium 2 Processor Microarchitecture Overview

Itanium 2 Processor Microarchitecture Overview Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation Next Generation Itanium Processor Overview Gary Hammond Principal Architect Enterprise Platform Group Corporation August 27-30, 2001 Sam Naffziger Lead Circuit Architect Microprocessor Technology Lab HP

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

The Transition to PCI Express* for Client SSDs

The Transition to PCI Express* for Client SSDs The Transition to PCI Express* for Client SSDs Amber Huffman Senior Principal Engineer Intel Santa Clara, CA 1 *Other names and brands may be claimed as the property of others. Legal Notices and Disclaimers

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

IBM's POWER5 Micro Processor Design and Methodology

IBM's POWER5 Micro Processor Design and Methodology IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Reliability & Flow Control

Reliability & Flow Control Read 7.E Reliability & Flow Control Prof. ina Kat abi Some slides are from lectures by Nick Mckeown, Ion Stoica, Frans Kaashoek, Hari Balakrishnan, and Sam Madden 1 Previous Lecture How the link layer

More information

The Alpha and Microprocessors:

The Alpha and Microprocessors: The Alpha 21364 and 21464 icroprocessors: Continuing the Performance Lead Beyond Y2K Shubu ukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury,

More information

Solid-State Drive System Optimizations In Data Center Applications

Solid-State Drive System Optimizations In Data Center Applications Solid-State Drive System Optimizations In Data Center Applications Tahmid Rahman Senior Technical Marketing Engineer Non Volatile Memory Solutions Group Intel Corporation Flash Memory Summit 2011 Santa

More information

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false. CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in

More information

Lecture 16: Checkpointed Processors. Department of Electrical Engineering Stanford University

Lecture 16: Checkpointed Processors. Department of Electrical Engineering Stanford University Lecture 16: Checkpointed Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 18-1 Announcements Reading for today: class notes Your main focus:

More information

1. PowerPC 970MP Overview

1. PowerPC 970MP Overview 1. The IBM PowerPC 970MP reduced instruction set computer (RISC) microprocessor is an implementation of the PowerPC Architecture. This chapter provides an overview of the features of the 970MP microprocessor

More information

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing Accelerating HPC (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing SAAHPC, Knoxville, July 13, 2010 Legal Disclaimer Intel may make changes to specifications and product

More information

Advanced Com puter Architecture: s1/ 2005

Advanced Com puter Architecture: s1/ 2005 Advanced Com puter Architecture: s1/ 2005 Project Presen tation David Mirabito Handling branches through context fking Currently: b eq Hypothetical instruction stream (operands rem oved) Currently: b eq

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations? Brown University School of Engineering ENGN 164 Design of Computing Systems Professor Sherief Reda Homework 07. 140 points. Due Date: Monday May 12th in B&H 349 1. [30 points] Consider the non-pipelined

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB)

More information

COSC4201 Instruction Level Parallelism Dynamic Scheduling

COSC4201 Instruction Level Parallelism Dynamic Scheduling COSC4201 Instruction Level Parallelism Dynamic Scheduling Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Outline Data dependence and hazards Exposing parallelism

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 12: Hardware Assisted Software ILP and IA64/Itanium Case Study Lecture Outline Review of Global Scheduling,

More information

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle M7: Next Generation SPARC Hotchips 26 August 12, 2014 Stephen Phillips Senior Director, SPARC Architecture Oracle Safe Harbor Statement The following is intended to outline our general product direction.

More information

" # " $ % & ' ( ) * + $ " % '* + * ' "

 #  $ % & ' ( ) * + $  % '* + * ' ! )! # & ) * + * + * & *,+,- Update Instruction Address IA Instruction Fetch IF Instruction Decode ID Execute EX Memory Access ME Writeback Results WB Program Counter Instruction Register Register File

More information

Jomar Silva Technical Evangelist

Jomar Silva Technical Evangelist Jomar Silva Technical Evangelist Agenda Introduction Intel Graphics Performance Analyzers: what is it, where do I get it, and how do I use it? Intel GPA with VR What devices can I use Intel GPA with and

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Advanced Parallel Programming I

Advanced Parallel Programming I Advanced Parallel Programming I Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2016 22.09.2016 1 Levels of Parallelism RISC Software GmbH Johannes Kepler University

More information

OpenCL *, Heterogeneous Computing, and the CPU

OpenCL *, Heterogeneous Computing, and the CPU OpenCL *, Heterogeneous Computing, and the CPU Dr. Tim Mattson Visual Applications Research lab Intel Corporation * 3 rd party nam es are the property of their ow ners I ntel Corporation, 2009 Agenda Heterogeneous

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

Modern CPU Architectures

Modern CPU Architectures Modern CPU Architectures Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2014 16.04.2014 1 Motivation for Parallelism I CPU History RISC Software GmbH Johannes

More information

Static Multiple-Issue Processors: VLIW Approach

Static Multiple-Issue Processors: VLIW Approach Static Multiple-Issue Processors: VLIW Approach Instructor: Prof. Cristina Silvano, email: cristina.silvano@polimi.it Teaching Assistant: Dr. Giovanni Agosta, email: agosta@acm.org Dipartimento di Elettronica,

More information

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007, Chapter 3 (CONT II) Instructor: Josep Torrellas CS433 Copyright J. Torrellas 1999,2001,2002,2007, 2013 1 Hardware-Based Speculation (Section 3.6) In multiple issue processors, stalls due to branches would

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by

More information

Intel Xeon Phi Coprocessor Performance Analysis

Intel Xeon Phi Coprocessor Performance Analysis Intel Xeon Phi Coprocessor Performance Analysis Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO

More information

Intel Xeon Phi coprocessor (codename Knights Corner) George Chrysos Senior Principal Engineer Hot Chips, August 28, 2012

Intel Xeon Phi coprocessor (codename Knights Corner) George Chrysos Senior Principal Engineer Hot Chips, August 28, 2012 Intel Xeon Phi coprocessor (codename Knights Corner) George Chrysos Senior Principal Engineer Hot Chips, August 28, 2012 Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

Kaisen Lin and Michael Conley

Kaisen Lin and Michael Conley Kaisen Lin and Michael Conley Simultaneous Multithreading Instructions from multiple threads run simultaneously on superscalar processor More instruction fetching and register state Commercialized! DEC

More information

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Review and Advanced d Concepts Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Pipelining Review PC IF/ID ID/EX EX/M

More information

Pentium IV-XEON. Computer architectures M

Pentium IV-XEON. Computer architectures M Pentium IV-XEON Computer architectures M 1 Pentium IV block scheme 4 32 bytes parallel Four access ports to the EU 2 Pentium IV block scheme Address Generation Unit BTB Branch Target Buffer I-TLB Instruction

More information

ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories

ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-06 1 Processor and L1 Cache Interface

More information

/ : Computer Architecture and Design Fall Final Exam December 4, Name: ID #:

/ : Computer Architecture and Design Fall Final Exam December 4, Name: ID #: 16.482 / 16.561: Computer Architecture and Design Fall 2014 Final Exam December 4, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic

More information

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Huiyang Zhou School of Computer Science University of Central Florida New Challenges in Billion-Transistor Processor Era

More information

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

Transient Fault Detection and Reducing Transient Error Rate. Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof.

Transient Fault Detection and Reducing Transient Error Rate. Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof. Transient Fault Detection and Reducing Transient Error Rate Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof. Steven Swanson Outline Motivation What are transient faults? Hardware Fault Detection

More information

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L11: Speculative Execution I Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab3 due today 2 1 Overview Branch penalties limit performance

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last

More information

Portland State University ECE 588/688. Cray-1 and Cray T3E

Portland State University ECE 588/688. Cray-1 and Cray T3E Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector

More information

Intel and the Future of Consumer Electronics. Shahrokh Shahidzadeh Sr. Principal Technologist

Intel and the Future of Consumer Electronics. Shahrokh Shahidzadeh Sr. Principal Technologist 1 Intel and the Future of Consumer Electronics Shahrokh Shahidzadeh Sr. Principal Technologist Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Exploring the Effects of Hyperthreading on Scientific Applications

Exploring the Effects of Hyperthreading on Scientific Applications Exploring the Effects of Hyperthreading on Scientific Applications by Kent Milfeld milfeld@tacc.utexas.edu edu Kent Milfeld, Chona Guiang, Avijit Purkayastha, Jay Boisseau TEXAS ADVANCED COMPUTING CENTER

More information

Intel s Architecture for NFV

Intel s Architecture for NFV Intel s Architecture for NFV Evolution from specialized technology to mainstream programming Net Futures 2015 Network applications Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Intel Many Integrated Core (MIC) Architecture

Intel Many Integrated Core (MIC) Architecture Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Pipelining and Vector Processing Chapter 8 S. Dandamudi Outline Basic concepts Handling resource conflicts Data hazards Handling branches Performance enhancements Example implementations Pentium PowerPC

More information

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers Collecting Important OpenCL*-related Metrics with Intel GPA System Analyzer Introduction Intel SDK for OpenCL* Applications

More information

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 Professor: Sherief Reda School of Engineering, Brown University 1. [from Debois et al. 30 points] Consider the non-pipelined implementation of

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

E0-243: Computer Architecture

E0-243: Computer Architecture E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation

More information

Processor Architecture

Processor Architecture Processor Architecture Advanced Dynamic Scheduling Techniques M. Schölzel Content Tomasulo with speculative execution Introducing superscalarity into the instruction pipeline Multithreading Content Tomasulo

More information

Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance

Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance By Robert E Larsen NVM Cache Product Line Manager Intel Corporation August 2008 1 Legal Disclaimer INFORMATION IN THIS

More information

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Dealing With Control Hazards Simplest solution to stall pipeline until branch is resolved and target address is calculated

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Thread Level Parallelism (TLP) CS425 - Vassilis Papaefstathiou 1 Multiple Issue CPI = CPI IDEAL + Stalls STRUC + Stalls RAW + Stalls WAR + Stalls WAW + Stalls

More information

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327281-001US

More information

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model. Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/

More information

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Hazards (data/name/control) RAW, WAR, WAW hazards Different types

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

Lecture 9: Multiple Issue (Superscalar and VLIW)

Lecture 9: Multiple Issue (Superscalar and VLIW) Lecture 9: Multiple Issue (Superscalar and VLIW) Iakovos Mavroidis Computer Science Department University of Crete Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019 250P: Computer Systems Architecture Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019 The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Branch prediction and instr

More information

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo Last time Virtual address caches Virtually-indexed, physically-tagged cache design

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group Simultaneous Multi-threading Implementation in POWER5 -- IBM's Next Generation POWER Microprocessor Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group Outline Motivation Background Threading Fundamentals

More information

Hyperthreading 3/25/2008. Hyperthreading. ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.

Hyperthreading 3/25/2008. Hyperthreading. ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01. Hyperthreading ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.pdf Hyperthreading is a design that makes everybody concerned believe that they are actually using

More information