Superscalar Processors

Size: px
Start display at page:

Download "Superscalar Processors"

Transcription

1 Superscalar Processors Superscalar Processor Multiple Independent Instruction Pipelines; each with multiple stages Instruction-Level Parallelism determine dependencies between nearby instructions o input of one instruction depends upon the output of a preceding instruction locate nearby independent instructions o issue & complete instructions in an order different than specified in the code stream uses branch prediction methods rather than delayed branches RISC or CICS Super Pipelined Processor Pipeline stages can be segmented into n distinct non-overlapping parts each of which can execute in of a clock cycle Limitations of Instruction-Level Parallelism True Data Dependency (Flow Dependency, Write-After-Read [WAR] Dependency) o Second Instruction requires data produced by First Instruction Procedural Dependency o Branch Instruction Instructions following either Branches-Taken Branches-NotTaken have a procedural-dependency on the branch o Variable-Length Instructions Partial Decoding is required prior to the fetching of the subsequent instruction, i.e., computing PC value Resource Conflicts o Memory, cache, buses, register ports, file ports, functional units access Output Dependency o See In-Order Issue Out-of-Order Completion below Anti-dependency o See Out-of-Order Issue Out-of-Order Completion below Instruction-Level Parallelism Instructions in sequence are independent instructions can be executed in parallel by overlapping Degree of Instruction-Level Parallelism depends upon o Frequencies of True Data Dependencies & Procedural Dependencies in the code which is dependent upon the Instruction Set Architecture & the Application o Operational Latency the time until the result of an instruction is available for use in a subsequent instruction, i.e., Execution Completion Time Machine Parallelism Number of instructions that can be fetched and executed simultaneously, i.e., number of parallel pipelines Sophistication, i.e., speed, of the mechanisms that the processor uses to locate independent instructions

2 Instruction Issue Policy Instruction Issue o process of initiating instruction execution in the processor s functional units o occurs when instruction moves from the decode stage to the first execute stage of the pipeline Instruction Issue Policy o protocol used to issue instructions o looking ahead of the current point of execution to locate instructions that can be placed into the pipeline & executed Types of Orderings o Order in which instructions are fetched o Order in which instructions are executed o Order in which instructions update register contents & memory locations Instruction Issue Policy Categories o In-Order Issue In-Order Completion superscalar pipeline specifications (example) fetch/decode two instructions per time period digital arithmetic functional units (2) floating-point arithmetic functional unit (1) write-back pipeline stages (2) code fragment ( six instructions ) constraints I1 requires two cycles to complete I3 & I4 conflict for the same functional unit I5 depends upon a value produced by I4 I5 & I6 conflict for a functional unit Procedure fetch next two instructions into the decode stage issuing of instructions must wait until current instructions have passed the decode pipeline stages conflict for a functional unit issuing of instructions must temporary halt functional unit requires more than one cycle to generate a result issuing of instructions must temporary halt (figure 14.4 page 531)

3 o In-Order Issue Out-of-Order Completion Procedure fetch next two instructions into the decode stage number of instructions in execute stages maximum degree of machine parallelism across all functional units instruction issuing stalled by resource conflict data dependency procedural dependency output dependency, i.e., write-after-write (WAW) dependency code fragment ( four instructions ) I1: R3 R3 op R5 I2: R4 R3 + 1 I3: R3 R5 + 1 I4: R7 R3 op R4 Constraints I1 must execute before I2 I1 produces R3 contents required by I2 I3 must execute before I4 I3 produces R3 contents required by I4 (RAW) I3 must complete after I1 I3 must write R3 after I1 writes R3 (WAW) conflict for a functional unit issuing of instructions must temporary halt functional unit requires more than one cycle to generate a result issuing of instructions must temporary halt issuing an instruction must stall if its result might later be overridden by an instruction issued earlier which takes longer to complete (WAW) Instruction Issue Logic more complex interrupts & exceptions handling o resumption some instructions which logically follow the interrupted instruction may have already completed and perhaps must not be executed again

4 o Out-of-Order Issue Out-of-Order Completion In-Order Issue decode instructions up to the point of dependency; cannot look ahead of dependency to subsequent instructions independent of those in the pipeline that could be usefully introduced into the pipeline Out-of-Order Issue decouple the decode stages from the execute stages of the pipeline Instruction Window Buffer decoded instructions are placed in the Instruction Window Buffer Instruction Window Buffer full instruction fetching & decoding must temporarily halt functional unit becomes available instruction from the Instruction Window Buffer may be issued provided it needs the particular functional unit available no conflicts or dependencies block the instruction processor has lookahead capability allowing it to identify independent instructions that can be brought into the execute stage Procedure fetch next two instructions into the decode stage during each cycle, subject to the buffer size, two instructions move from the decode stage to the Instruction Window Buffer ( instruction) Instruction Window Buffer processor has sufficient information to decide when it can be issued more instructions are available for issuing reducing probability of pipeline stage stall number of instructions in execute stages maximum degree of machine parallelism across all functional units instruction issuing stalled by resource conflict data dependency procedural dependency antidependency, read-after-write dependency (RAW), i.e., second instruction destroys a value that is required by the first instruction code fragment ( four instructions ) I1: R3 R3 op R5 I2: R4 R3 + 1 I3: R3 R5 + 1 I4: R7 R3 op R4 I3 cannot complete execution before I2 begins execution and has fetched its operands I3 updates R3 which is a source register for I2 (RAW) Reorder Buffer & Tomasulo s Algorithm Appendix I

5 Register Renaming execution sequence & data flow RAW data dependencies & resource conflicts out-of-order instruction issuing or completion WAW & WAR dependencies values in registers no longer reflect sequence of values dictated by the program flow antidependencies & output dependencies storage conflicts, i.e., multiple instructions are completing for the same registers pipeline constraints compiler register optimization techniques maximize register conflicts registers are allocated dynamically by the processor hardware; these registers are associated with the values needed by instructions at certain points in time instruction with a register as a destination operand executes new register is allocated for that value subsequent instructions that access that value as a source operand in that register must undergo a renaming process, i.e., the register references must be revised to refer to the registers containing the needed values the same original register references in several different instructions may refer to different actual registers if different values are indicated code fragment ( four instructions ) I1: R3 b R3 a op R5 a I2: R4 b R3 b + 1 I3: R3 c R5 a + 1 I4: R7 b R3 c op R4 b register allocation protocol Rx a is allocated before Rx b Rx b is allocated before Rx c allocation made for a particular logical register subsequent instruction references to that logical register as a source operand are made to refer to the most recently allocated hardware register in the program sequence of instructions Creation of R3 c in I3 avoids RAW dependency in I2 OUTPUT dependency on I1 Does not interfere w2ith the correct value being accessed by I4 hence with renaming I3 can be issued immediately Without register renaming, I3 cannot be issued until I1 is complete and I2 is issued Scoreboarding Appendix I Machine Parallelism page 535 figure 14.5

6 Branch Prediction Intel fetched both o next sequential instruction o speculatively fetched the branch target instruction o two pipeline stages between prefetch & execution two cycle delay when branch is taken RISC machines o delayed branch strategy o calculate the single instruction that follows the branch instruction o if necessary, fetch a new instruction stream superscalar machines o static branch prediction always predict branch not taken always predict branch taken o dynamic branch prediction based on branch history analysis branch based on recent history branch based on long term history branch based on a weighted historical computation Superscalar Execution Static Program linear sequence of instructions Execution Stage Processor performs the execution stage of each instruction in the order of the true data dependencies and hardware resource availability Instruction Fetch Process With Branch Prediction Window of Execution dynamic stream of instructions artificial dependencies removed instructions are structured according to their true data dependencies Committing or Retiring the Instruction Instructions are conceptually placed back into the original linear sequence and the results are recorded parallel, multiple pipelines instructions complete in an order different from the static program branch prediction & speculative execution instructions may complete execution & be abandoned permanent storage & program-visible registers cannot be updated immediately results must be held in some sort of temporary storage until it is determined that the sequential model would have executed the instruction Superscalar Implementation (Hardware) Instruction Fetch Strategies o Simultaneously Fetching Multiple Instructions o Predicting the Outcomes of, and Fetching Beyond, Conditional Branch Instructions o Requires the use of Multiple Pipeline Fetch & Decode Stages o Requires the use of Branch Prediction Logic True Dependencies Logic o Logical Register Assignments o Tracking the Register Assignments to where they are needed during execution Mechanisms for Issuing Multiple Instructions in Parallel Resources for Parallel Execution of Multiple Instructions o Multiple Functional Units o Memory Hierarchies capable of simultaneously servicing Multiple Memory References Mechanisms for Committing the Process State in the Original Sequential Order

7 Intel 80xxx traditional CICS non-pipelined machine pipelined processor reduced integer operations average latency from two to four cycles to one cycle executed a single instruction per cycle Pentium implemented a limited superscalar system two separate integer execution units Pentium Pro implemented a full featured superscalar system Pentium 4 Operational Protocol o fetch instructions from memory in static program order o translate each instruction into one or more micro-operations o execute the micro-ops in a superscalar pipeline organization, i.e., micro-ops can be executed out of order o results of each micro-op execution are submitted to the register set in the order of the original program flow Architecture o outer CICS shell Front End Micro-Op Generation Branch Target Buffer (BTB) branch prediction static branch prediction determines the next effective program counter value Instruction Translation Lookaside Buffer (I-TLB) Translates the linear instruction pointer address into a physical address required to access the L2 cache L2 Cache is feed sequential static code from the original program each cache line fetch in L2 contains the next sequential instruction L1 Instruction Trace Cache contains micro-op code page figures 14.7 & 14.8 Fetch/Decode Unit, with the aid of the BTB & I-TLB, retrieves instructions, using in-order issue, unless it is altered by branch prediction by the BTB & I-TLB, from the L2 cache (static code) in 64 byte packets and feeds the L1 instruction cache, i.e., Trace Cache scans the bytes to determine the instruction boundaries translates each instruction into one to four micro-ops each micro-op is a 118 bit RISC instruction micro-op (RISC) are stored in the L1 Trace Cache processor operates from the Trace Cache cache miss In-Order Front End feeds new instructions to the Trace Cache

8 o inner RISC core pipeline 20 stages selected instructions use a longer pipeline Trace Cache Next Instruction Pointer BTB caches the history of recent executions of branch instructions dynamic branch prediction strategy based on the recent history if a particular branch instruction is in the BTB then its branch history is used to predict the current branch strategy (dynamic branch prediction) after the branch execution, the BTB entry is updated with the recent results if a particular branch instruction is not in the BTB then an entry is made; an older entry is deleted if necessary static branch prediction algorithm IP-relative backward conditional branches (loops) predict that branch is taken IP-relative forward conditional branches predict that branch is not taken Branch addresses that are not IP-relative, predict taken if branch is a return predict not taken otherwise four-way set-associative cache with 512 lines tag is the branch address branch destination address for last time branch was taken 4 bit history field to indicate the last 16 times with the specified branch Yeh s Algorithm used for dynamic branch prediction Trace Cache Fetch Trace Cache assembles the decoded micro-ops into program-ordered sequences, i.e., traces Micro-ops are fetched sequentially from the Trace Cache subject to branch prediction logic Instructions requiring more than four micro-ops are transferred to the microcode ROM unit which is a microprogrammed control unit; after the ROM finishes the sequencing of the current micro-ops, fetching resumes from the Trace Cache Drive pipeline delivers micro-ops from the Trace Cache to the Rename/Allocator module Allocate Stage (allocates resources) resource required by micro-op arriving during a clock cycle is unavailable allocator stalls the pipeline three micro-ops arrive at Allocator during each clock cycle allocates a reorder buffer (ROB) entry which tracks the completion status of a selected micro-op which is currently in process 126 micro-ops could be in process at any time allocates an integer or floating-point register entry for the result data possibly allocates a load or store buffer used to track one of the 48 loads or 24 stores in the pipeline allocates an entry in one of the two micro-op queues in front of the instruction schedulers

9 Reorder Buffer (ROB) circular buffer holds a maximum of 126 micro-ops contains 128 hardware registers buffer entry fields o State Scheduled for Execution Dispatched for Execution Completed Execution Ready for Retirement o Memory Address address of instruction that generated the micro-op o Micro-op o Alias Register If the micro-op references one of the 16 architectural registers, then the entry redirects tat reference to one of the 128 hardware registers micro-ops enter ROB in-order micro-ops are dispatched from the ROB to the Dispatch/Execute unit out-of-order dispatch criterion appropriate execution unit is available all necessary data items required by micro-code are available micro-ops are retired from the ROB in-order micro-ops are retired oldest first after each micro-op are designated as ready for retirement Register Renaming Architectural Registers (16) available to programmers Floating point registers (8) EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP Physical Registers (128) available to hardware renaming : Architectural Registers Physical Registers removes false dependencies caused by a limited number of Architectural Registers preserves true dependencies (RAW) Micro-Op Queuing page 540, figure 14.9f After Resource Allocation & Register Renaming micro-ops are placed in one of two micro-op queues Micro-ops Queue for micro-ops requiring memory operations Micro-ops Queue for micro-ops that do not require memory operations each queue, individually, obeys FIFO access between queues there is no access discipline

10 Micro-Op Scheduling and Dispatching Schedulers look for micro-ops with Status indicating that they have all operands Execution unit is available for the indicated micro-op micro-op is dispatched More than one micro-op is available for an available execution unit, micro-ops are dispatched in FIFO order Maximum of 6 micro-ops can be dispatched in a single cycle Ports attach Schedulers to Execution Units (4) Port 1 handles Simple Integer Operations & Branch Mispredictions Port 0 handles other Integer & Floating Point Instructions MMX Execution Units are allocated between Port 0 and Port 1 Ports 2 and Port 3 are used for memory loads and Stores Integer & Floating Point Execution Units Integer Register Files Floating Point Register Files source for pending operations by execution units o o o Execution Units retrieve information from the Register Files and the L1 Data Cache Separate pipeline stage is used to compute Flags which are input to branch instructions Subsequent pipeline stage compares the actual branch result with the prediction If prediction is wrong, there are micro-ops in various stages of the pipeline that must be removed; then the proper branch destination is provided to the Branch Predictor during a Drive Stage, thus restarting the pipeline from a new target address

A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle.

A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle. CS 320 Ch. 16 SuperScalar Machines A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle. A superpipelined machine is one in which a

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors William Stallings Computer Organization and Architecture 8 th Edition Chapter 14 Instruction Level Parallelism and Superscalar Processors What is Superscalar? Common instructions (arithmetic, load/store,

More information

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise CSCI 4717/5717 Computer Architecture Topic: Instruction Level Parallelism Reading: Stallings, Chapter 14 What is Superscalar? A machine designed to improve the performance of the execution of scalar instructions.

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

Pentium IV-XEON. Computer architectures M

Pentium IV-XEON. Computer architectures M Pentium IV-XEON Computer architectures M 1 Pentium IV block scheme 4 32 bytes parallel Four access ports to the EU 2 Pentium IV block scheme Address Generation Unit BTB Branch Target Buffer I-TLB Instruction

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro)

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers) physical register file that is the same size as the architectural registers

More information

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Portland State University ECE 587/687. The Microarchitecture of Superscalar Processors

Portland State University ECE 587/687. The Microarchitecture of Superscalar Processors Portland State University ECE 587/687 The Microarchitecture of Superscalar Processors Copyright by Alaa Alameldeen and Haitham Akkary 2011 Program Representation An application is written as a program,

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

Instruction Level Parallelism (ILP)

Instruction Level Parallelism (ILP) 1 / 26 Instruction Level Parallelism (ILP) ILP: The simultaneous execution of multiple instructions from a program. While pipelining is a form of ILP, the general application of ILP goes much further into

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

The Microarchitecture of Superscalar Processors

The Microarchitecture of Superscalar Processors The Microarchitecture of Superscalar Processors James E. Smith Department of Electrical and Computer Engineering 1415 Johnson Drive Madison, WI 53706 ph: (608)-265-5737 fax:(608)-262-1267 email: jes@ece.wisc.edu

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I

More information

Processing Unit CS206T

Processing Unit CS206T Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct

More information

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections ) Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

Announcements. EE382A Lecture 6: Register Renaming. Lecture 6 Outline. Dynamic Branch Prediction Using History. 1. Branch Prediction (epilog)

Announcements. EE382A Lecture 6: Register Renaming. Lecture 6 Outline. Dynamic Branch Prediction Using History. 1. Branch Prediction (epilog) Announcements EE382A Lecture 6: Register Renaming Project proposal due on Wed 10/14 2-3 pages submitted through email List the group members Describe the topic including why it is important and your thesis

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

Dynamic Scheduling. CSE471 Susan Eggers 1

Dynamic Scheduling. CSE471 Susan Eggers 1 Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip

More information

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar Complex Pipelining COE 501 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Diversified Pipeline Detecting

More information

The Microarchitecture of Superscalar Processors

The Microarchitecture of Superscalar Processors The Microarchitecture of Superscalar Processors James E. Smith Department of Electrical and Computer Engineering 1415 Johnson Drive Madison, WI 53706 ph: (608)-265-5737 fax:(608)-262-1267 email: jes@ece.wisc.edu

More information

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency Superscalar Processors Ch 13 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction 1 New dependency for superscalar case? (8) Name dependency (nimiriippuvuus) two use the same

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

TECH. CH14 Instruction Level Parallelism and Superscalar Processors. What is Superscalar? Why Superscalar? General Superscalar Organization

TECH. CH14 Instruction Level Parallelism and Superscalar Processors. What is Superscalar? Why Superscalar? General Superscalar Organization CH14 Instruction Level Parallelism and Superscalar Processors Decode and issue more and one instruction at a time Executing more than one instruction at a time More than one Execution Unit What is Superscalar?

More information

Chapter. Out of order Execution

Chapter. Out of order Execution Chapter Long EX Instruction stages We have assumed that all stages. There is a problem with the EX stage multiply (MUL) takes more time than ADD MUL ADD We can clearly delay the execution of the ADD until

More information

Hardware and Software Architecture. Chapter 2

Hardware and Software Architecture. Chapter 2 Hardware and Software Architecture Chapter 2 1 Basic Components The x86 processor communicates with main memory and I/O devices via buses Data bus for transferring data Address bus for the address of a

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation

More information

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011 5-740/8-740 Computer Architecture Lecture 0: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Fall 20, 0/3/20 Review: Solutions to Enable Precise Exceptions Reorder buffer History buffer

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Case Study IBM PowerPC 620

Case Study IBM PowerPC 620 Case Study IBM PowerPC 620 year shipped: 1995 allowing out-of-order execution (dynamic scheduling) and in-order commit (hardware speculation). using a reorder buffer to track when instruction can commit,

More information

Basic concepts UNIT III PIPELINING. Data hazards. Instruction hazards. Influence on instruction sets. Data path and control considerations

Basic concepts UNIT III PIPELINING. Data hazards. Instruction hazards. Influence on instruction sets. Data path and control considerations UNIT III PIPELINING Basic concepts Data hazards Instruction hazards Influence on instruction sets Data path and control considerations Performance considerations Exception handling Basic Concepts It is

More information

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Tomasulo

More information

EECC551 Exam Review 4 questions out of 6 questions

EECC551 Exam Review 4 questions out of 6 questions EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

Processor: Superscalars Dynamic Scheduling

Processor: Superscalars Dynamic Scheduling Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),

More information

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

Pentium 4 Processor Block Diagram

Pentium 4 Processor Block Diagram FP FP Pentium 4 Processor Block Diagram FP move FP store FMul FAdd MMX SSE 3.2 GB/s 3.2 GB/s L D-Cache and D-TLB Store Load edulers Integer Integer & I-TLB ucode Netburst TM Micro-architecture Pipeline

More information

The P6 Architecture: Background Information for Developers

The P6 Architecture: Background Information for Developers The P6 Architecture: Background Information for Developers 1995, Intel Corporation P6 CPU Overview CPU Dynamic Execution 133MHz core frequency 8K L1 caches APIC AA AA A AA A A A AA AA AA AA A A AA AA A

More information

EITF20: Computer Architecture Part3.2.1: Pipeline - 3

EITF20: Computer Architecture Part3.2.1: Pipeline - 3 EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function William Stallings Computer Organization and Architecture 8 th Edition Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

E0-243: Computer Architecture

E0-243: Computer Architecture E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation

More information

Chapter 06: Instruction Pipelining and Parallel Processing

Chapter 06: Instruction Pipelining and Parallel Processing Chapter 06: Instruction Pipelining and Parallel Processing Lesson 09: Superscalar Processors and Parallel Computer Systems Objective To understand parallel pipelines and multiple execution units Instruction

More information

Like scalar processor Processes individual data items Item may be single integer or floating point number. - 1 of 15 - Superscalar Architectures

Like scalar processor Processes individual data items Item may be single integer or floating point number. - 1 of 15 - Superscalar Architectures Superscalar Architectures Have looked at examined basic architecture concepts Starting with simple machines Introduced concepts underlying RISC machines From characteristics of RISC instructions Found

More information

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L11: Speculative Execution I Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab3 due today 2 1 Overview Branch penalties limit performance

More information

Super Scalar. Kalyan Basu March 21,

Super Scalar. Kalyan Basu March 21, Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

Lecture 8: Branch Prediction, Dynamic ILP. Topics: static speculation and branch prediction (Sections )

Lecture 8: Branch Prediction, Dynamic ILP. Topics: static speculation and branch prediction (Sections ) Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections 2.3-2.6) 1 Correlating Predictors Basic branch prediction: maintain a 2-bit saturating counter for each

More information

Static vs. Dynamic Scheduling

Static vs. Dynamic Scheduling Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor

More information

Precise Exceptions and Out-of-Order Execution. Samira Khan

Precise Exceptions and Out-of-Order Execution. Samira Khan Precise Exceptions and Out-of-Order Execution Samira Khan Multi-Cycle Execution Not all instructions take the same amount of time for execution Idea: Have multiple different functional units that take

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 09

More information

Lecture 9: Superscalar processing

Lecture 9: Superscalar processing Lecture 9 Superscalar processors Superscalar- processing Stallings: Ch 14 Instruction dependences Register renaming Pentium / PowerPC Goal Concurrent execution of scalar instructions Several independent

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

References EE457. Out of Order (OoO) Execution. Instruction Scheduling (Re-ordering of instructions)

References EE457. Out of Order (OoO) Execution. Instruction Scheduling (Re-ordering of instructions) EE457 Out of Order (OoO) Execution Introduction to Dynamic Scheduling of Instructions (The Tomasulo Algorithm) By Gandhi Puvvada References EE557 Textbook Prof Dubois EE557 Classnotes Prof Annavaram s

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections ) Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections 2.3-2.6) 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB) Branch prediction and instr fetch R1 R1+R2 R2 R1+R3 BEQZ R2 R3

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation M. Sonza Reorda Politecnico di Torino Dipartimento di Automatica e Informatica 1 Introduction Hardware-based speculation is a technique for reducing the effects of control dependences

More information

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , ) Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections 3.4-3.5, 3.8-3.14) 1 1-Bit Prediction For each branch, keep track of what happened last time and use

More information

EEC 581 Computer Architecture. Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW)

EEC 581 Computer Architecture. Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW) 1 EEC 581 Computer Architecture Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW) Chansu Yu Electrical and Computer Engineering Cleveland State University Overview

More information

University of Toronto Faculty of Applied Science and Engineering

University of Toronto Faculty of Applied Science and Engineering Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science

More information

RISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard.

RISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard. COMP 212 Computer Organization & Architecture Pipeline Re-Cap Pipeline is ILP -Instruction Level Parallelism COMP 212 Fall 2008 Lecture 12 RISC & Superscalar Divide instruction cycles into stages, overlapped

More information

ILP: Instruction Level Parallelism

ILP: Instruction Level Parallelism ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 18-447 Computer Architecture Lecture 13: State Maintenance and Recovery Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

EEC 581 Computer Architecture. Lec 7 Instruction Level Parallelism (2.6 Hardware-based Speculation and 2.7 Static Scheduling/VLIW)

EEC 581 Computer Architecture. Lec 7 Instruction Level Parallelism (2.6 Hardware-based Speculation and 2.7 Static Scheduling/VLIW) EEC 581 Computer Architecture Lec 7 Instruction Level Parallelism (2.6 Hardware-based Speculation and 2.7 Static Scheduling/VLIW) Chansu Yu Electrical and Computer Engineering Cleveland State University

More information

Current Microprocessors. Efficient Utilization of Hardware Blocks. Efficient Utilization of Hardware Blocks. Pipeline

Current Microprocessors. Efficient Utilization of Hardware Blocks. Efficient Utilization of Hardware Blocks. Pipeline Current Microprocessors Pipeline Efficient Utilization of Hardware Blocks Execution steps for an instruction:.send instruction address ().Instruction Fetch ().Store instruction ().Decode Instruction, fetch

More information

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines 6.823, L15--1 Branch Prediction & Speculative Execution Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L15--2 Branch Penalties in Modern Pipelines UltraSPARC-III

More information

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 18-447 Computer Architecture Lecture 14: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Dynamic Instruction Scheduling with Branch Prediction

More information

The basic structure of a MIPS floating-point unit

The basic structure of a MIPS floating-point unit Tomasulo s scheme The algorithm based on the idea of reservation station The reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from

More information

Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R A case study in modern microarchitecture.

Module 5: MIPS R10000: A Case Study Lecture 9: MIPS R10000: A Case Study MIPS R A case study in modern microarchitecture. Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R10000 A case study in modern microarchitecture Overview Stage 1: Fetch Stage 2: Decode/Rename Branch prediction Branch

More information

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model. Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Introduction Sixth-Generation Superscalar Superpipelined Architecture - Dual 7-stage integer pipelines - High performance

More information

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple

More information

Tutorial 11. Final Exam Review

Tutorial 11. Final Exam Review Tutorial 11 Final Exam Review Introduction Instruction Set Architecture: contract between programmer and designers (e.g.: IA-32, IA-64, X86-64) Computer organization: describe the functional units, cache

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

Organisasi Sistem Komputer

Organisasi Sistem Komputer LOGO Organisasi Sistem Komputer OSK 11 Superscalar Pendidikan Teknik Elektronika FT UNY What is Superscalar? Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any a performance

More information

ECE 505 Computer Architecture

ECE 505 Computer Architecture ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems

More information