Advanced Computer Architecture
|
|
- Gary Rice
- 6 years ago
- Views:
Transcription
1 Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata
2 What is a Processors Architecture Instruction Set Architecture (ISA) Describes class of ISA, addressing and access to a memory, operands a operations, controls the flow of instructions, encoding ISA Design and Organization Defines an interface (or boundary) between software and hardware Describes the high-level aspect of the processors design. Defines logical blocks or units that processors contain (ALU, FPU, General-purpose register, MMU, Controller, ) Hardware Includes detailed logic design of the processors the low level aspect (expected process technology) 2
3 Evolution of the Microprocessor Architecture - ISA CISC and RISC architecture Two critical performance techniques Today CISC instructions are usually translated internally on RISC-like instructions in the most x86 architectures Instruction Level Parallelism (ILP) pipelining and multiple instruction issue later Cache memory usage from the simple to the hierarchical cache memory with sophistical organization and optimization Growth of processor performance today Using ILP, Cache memory and specialized processor units have brought ~50% performance growth per year for 16 years since 2002 Today it is ~20% per year in the uniprocessor performance The future growth is focused on the usage of TLP and DLP techniques 3
4 Evolution of the Microprocessor Architecture - Design Design and Organization From simple processors with single ALU to complex multicore processors in these days CPU off-load on peripheral devices (for example graphic or network cards) Integration of peripheral devices on chip (northbridge, graphic card) Hardware evolution The first fully electronic digital computing device - Atanasoff Berry Computer, which was made based on vacuum tube. The modern process technology 32nm respectively 22nm, which are used in productions today. 4
5 A Reference RISC processor (1) A RISC architecture will be used to illustrate the basic concepts. The introduced ideas are of course applicable to the other processor architectures. A typical RISC processor 32-bit fixed format instruction 32-bit general-purpose registers Memory access only via Load and Store instructions with the single addressing mode (based + displacement) Simple branch conditions Delayed branch 5
6 A Reference RISC processor (2) Instruction types Register-Register 5 stage of the RISC instruction Register-Immediate Branch Jump/Call Instruction fetch cycle (IF) Instruction decode/register fetch cycle (ID) Execution/effective address cycle (EX) Memory access (MEM) Write-back cycle (WB) Every stage of the instruction uses no shared resources of the processor (expect for MEM and IF) 6
7 5 Stages of MIPS Datapath 7
8 Sequential Unpipeline Instruction Execution It was used in the first computers. Processor architecture model is called sub-scalar architecture. Instructions are executed one by one without any overlap in sequential order. The same processing time for each instruction stage is not needed Some parts of the processor may not be used in some clock cycles. Amount of time, which is needed to execute the whole process, is sum of execution time of all instructions. Tc=T.N.CPI Tc - Total execution time, T - Clock period, N - Total number of instructions, CPI - Clock per instruction It is still used in very simple computers (like simple calculators) Q: How many cycles require execution of these two instructions? 8
9 Instruction Pipeline Execution The overlap of the execution time of instructions brings better usage of the processor units. Scalar processor architecture model It does not reduce the execution time of the instruction It increases the CPU instruction throughput Some modifications of hardware architecture for pipeline execution are needed Bypassing and/or short-circuiting Pipeline registers Pipeline overhead Setup time on pipeline registers Q: Will it work correctly? 9
10 MIPS Pipeline Datapath 10
11 Pipeline Timing, Throughput and Latency Pipeline Timing Unpipeline execution Pipeline execution Same execution time for each stage Balanced pipeline Pipeline Throughput Time of each stage can be different Count of completed instructions per second Pipeline Latency How long it takes to execute single instruction in pipeline. 11
12 Pipeline Hazards Hazards prevent the execution of the next instruction in its designated clock cycle. They reduce speedup of pipeline execution There are three classes of hazards Structural hazards: arise from resource conflicts. Hardware may not support all combinations of instruction in overlapped execution. Data hazards: Instruction depends on result of previous instruction, which is not available when it is needed. Control hazards: May arise when the program counter is changed by the flow control instruction (branch). 12
13 Structural Hazards Example 1: Memory access of instruction fetch and memory access of data, on the processor with single memory access channel. Instruction OR can not be loaded in cycle 4 due to structural hazard (there is not gate for access into the memory for getting a13 new instruction)
14 Structural Hazards Possible solution of structural hazard from Example 1 Separate instruction and data memory (brings higher cost of CPU) Stall the pipeline for one cycle (decrease pipeline execution speedup) 14
15 Data Hazard RAW data hazard on register R1 Hazard has impact on instructions sub, add and or. Result from the first instruction add is not available in time. 15
16 Three Generic Classes of Data Hazards Read After Write (RAW) Following instruction tries to use a result from previous instruction, which is not available in that moment add r1,r2,r3 sub r6,r1,r4 Impact of this type of the hazard can be reduced with hardware technique called forwarding. The ALU result from EX/MEM and MEM/WB pipeline registers is fed back to the ALU inputs. Hardware must be able to detect RAW hazards. Called a dependency 16
17 Three Generic Classes of Data Hazards Write After Read (WAR) The following instruction writes its result into the register before the previous instruction can read the original value. mul r2,r1,r3 add r1,r4,r5 The hardware may not allow occurrence of this data hazard It can not happen in this simple MIPS 5 stage pipeline It is common for processors with complex pipelining Called anti-dependence 17
18 Three Generic Classes of Data Hazards Write After Write (WAW) Following instruction writes its own result into the register before previous one. sub r1,r2,r3 add r1,r4,r3 The hardware may not allow occurrence of this data hazards It can not happen in this simple MIPS 5 stage pipeline Called output dependence Q: Why the last two types of data hazards can not happen in this type of the processor? 18
19 HW Changes for Forwarding Forwarding paths Modify the multiplexors to select where ALU operands should come from Logic for determining sources of ALU operands in ID stage 19
20 Unsolved Data Hazards Not all types of the data hazards can be solved with usage of forwarding technique The data from the load (lw) instruction is available after the fourth clock cycle. Even forwarding technique can not deliver it into the ALU in the end of third clock cycle.20
21 How To Avoid Load Hazards Hardware assisted solution Hardware must be able to recognize the load hazards The pipeline is stalled when the load hazard occurs Software scheduling by compiling an application The order of the instructions can be modified with help of independent instructions during compilation into acceptable sequence without load hazards 21
22 Control Hazards (Branch Hazards) It can cause a greater performance loss then the data hazards Branch Hazards Change of PC may or may not be other than plus 4, when a branch is executed 22
23 Branch Stall Impact Branch penalty for 5 stage MIPS pipeline is 3 clock cycle Minimization of branch penalty with HW modification Branch target is known at the end of execute/addresscalculation stage (3rd cycle clock) In the instruction decode stage Determines if the branch is taken or not Calculates its target address Branch penalty is minimized but still exists Branch prediction schemes Simple compile-time schemes Dynamic branch prediction 23
24 Reducing Pipeline Branch Penalties with a Simple HW Modification Early branch determination Target of the branch is known in the ID stage of a branch instruction 24
25 Reducing Pipeline Branch Penalties with the Static Scheduling Strategy Stalls the pipeline until the branch destination is known The simplest scheme to handle branches Pipeline is to be freezed or flushed wasting of CPU time Waste CPU time in case of 5 stage MIPS pipeline. Next instruction can not be fetched until branch destination is known Predicted branch not taken Next instruction is fetched as if the branch was not executed PC = PC + 4 Instruction must be removed from the pipeline if the branch is taken Processor state can not change until branch outcome is known Approximately 47% MIPS branches are not taken Predicted branch taken Inverse alternative to the previous case 25
26 Reducing Pipeline Branch Penalties with the Static Scheduling Strategy Delayed branch Branch takes place after a following instruction Delay slot is amount of clock cycles which are needed to obtain branch target address The instructions following branch in the delay slot are executed whether the branch is taken or not branch instruction sequential successor1 sequential successor2 sequential successorn branch target if taken branch delay slot of length n Q: What is a length of the branch delay slot in 5 stage MIPS pipeline with early branch determination? 26
27 Delayed Branch Where do we get instructions to fill branch delay slot? Canceling instructions from branch delay slot Second two cases may require canceling of the instructions from delay slot Instruction is executed but write-back must be disabled until the result from the branch is known. 27
28 Dynamic Branch Prediction The simple branch prediction schemes are a branchprediction buffer or branch history table A small cache indexed by the lower bits of branch address with limited number of entry contains a bit that says whether the branch was taken or not Prediction may not be correct. Another branch with same lover bits of address can modify the prediction bit A branch condition cannot be predicted Local scope of the prediction bit 2-bit prediction schemes are often used 28
29 Two Bits Dynamic Branch Prediction The scope of the predictors is local (per branch prediction) Mis-prediction rate for 2-bits branch predictor with 4k entries is from 1% to 18% 29
30 Advanced Branch Prediction Schemes Correlating Branch Predictors (Two-level adaptive predictors) Takes a look at the recent behavior of other branches at first Behavior of the predicted branch has lower priority Tournament Predictors: Adaptively Combining Local and Global Predictors Usage of multiple predictors Local base, Global base and combination of both bases on a selector Effective at medium size of prediction cache (8k 32k bits) 30
31 Extending the MIPS Pipeline to Handle Multi-cycle Operations 4 separate functional units Integer units (1 clock) Brings the possibility of WAW, RAW data and structural hazards. FP and integer multiplier (7 clock) FP adder (4 clock) Integer ALU, load, store, branches FP add, subtract, conversion FP and integer divider Unpipeline (24 clock) 31
32 Structural Hazards and Multi-cycle Operation Extension The probability of the structural hazards increases with the presence of unpipelined units Only one instruction can be in EX stage on these units (issuing problem) The number of register writes required in one cycle can be larger than 1, because of varying running time of instructions Possible solution is to implement multiple write gateways or separate integer and floating-point registers a simple shift register, that indicates when already-issued instructions will use register file and stall new instruction issue a mechanism to stall conflict instruction when it tries to enter MEM or WB stage 32
33 Data Hazards and Multi-cycle Operation Extension Stalls for RAW will be more frequent (varying running time of instructions) WAW hazards are possible Instructions do not reach WB stage in order in which they were issued The processor does not known anything about the length of the pipeline by issuing of an instruction A simple solution: if an instruction wants to write into the same register as an instruction already issued, it is stalled. 33
34 Super-scalar processors Issuing one instruction every clock cycle is not enough for effective usage of every function units of the processor Multiple instruction are issued, decoded and forwarded to their execution processor must have multiple function units for each pipeline stage multiple read/write gateways into the register set have to be present processor must have multiple read gateways into instruction cache (or memory) 34
35 Super-scalar processors A lot of independent instruction have to be present in the code of an application Issued instructions can not contain any dependency to avoid data hazards Independent instruction can be scheduled statically by the compilation or dynamically with help of the hardware before their execution 35
36 Superscalar processors Types of superscalar processors Superscalar with dynamic issue structure Some instructions can be executed out-of-order. No speculation during execution Not used today Superscalar with dynamic issue and speculative scheduling Out-of-order execution with speculation Pentium 4, Core, Core2,.., IBM, Power5 VLIW/LIW Static issued instructions with static scheduling. Hazard detection and other preparation in compilation time Embedded space EPIC (primarily static issued instructions with most static scheduling) Hazard detection in compilation time Intel Itanium 36
37 Advantage of Dynamic Scheduling Allows the code which was compiled for one pipeline to run efficiently on a different pipeline It simplifies the compiler Compiled code is not platform depended Optimization for efficient execution is not necessary It can handle dependencies that were not known at compile time. 37
38 Dynamic Scheduling The simple idea of dynamic scheduling is to allow out-of-order execution of instructions after their dynamic issue. Out-of-order execution implies out-of-order completion and it introduces the possibility of WAR and WAW hazards. Both these hazards are avoided by the use of register renaming. (will be discussed in future chapter) To allow out-of-order execution it is essential to split the DI stage of the simple five-stage pipeline into two stages Issue (II) decodes instruction and check for structural hazards - all in-order Read operands (RO) wait until no data hazards, then read operands for all out-of-order waiting instructions 38
39 Dynamic Scheduling Issue window represents count of instructions which can be forwarded to their out-of-order execution without being completed Out-of-order completion have to be tended with score board table or today more usually with help of thomasulo's algorithm 39
40 Static scheduling at the compilation time VLIW/LIW or EPIC processors schedule the instruction at the compilation time. Advantage at compilation time scheduling Compiler has code of whole application to find independent instructions (instructions without dependencies) The Long instruction word with most static scheduled instruction can be created very effectively Effective usage of all processor units (the best case) Disadvantage Not all potential hazards states can by solved in the compilation time Compiler detects potential hazards and the hardware help is necessary for their solving The processor has only limited ability to solve hazards problems 40
41 Conclusion Introduction into the computer architecture description with help of ISA Matter of unpipelined instruction execution on processors with independent 5 stage execution Pipeline instruction execution on the reference processors with the simple 5 stage MIPS pipeline Classes of the hazards and their possible solutions Extension of 5 stage MIPS pipeline for multi-cycle instruction execution Introduction into super-scalar processors and dynamic and static scheduling of independent instructions 41
42 Literature John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach (4th Edition) Paul H. J. Kelly, Advanced Computer Architecture Lecture notes 332 Andrew S. Tanenbaum, Operating Systems: Design and Implementation 42
COSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy
More informationMinimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline
Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationWhat is Pipelining? RISC remainder (our assumptions)
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationThe Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.
The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationExploitation of instruction level parallelism
Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationPipelining. Maurizio Palesi
* Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationAppendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1
Appendix C Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure C.2 The pipeline can be thought of as a series of data paths shifted in time. This shows
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationECE 505 Computer Architecture
ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationPage 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the
More informationControl Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.
Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationImprove performance by increasing instruction throughput
Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationPage # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationCOSC4201 Pipelining. Prof. Mokhtar Aboelaze York University
COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC
More informationInstruction Level Parallelism
Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches
More informationPipelining: Hazards Ver. Jan 14, 2014
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining: Hazards Ver. Jan 14, 2014 Marco D. Santambrogio: marco.santambrogio@polimi.it Simone Campanoni:
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationECE 587 Advanced Computer Architecture I
ECE 587 Advanced Computer Architecture I Instructor: Alaa Alameldeen alaa@ece.pdx.edu Spring 2015 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2015 1 When and Where? When:
More informationSlide Set 7. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng
Slide Set 7 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017 ENCM 501 W17 Lectures: Slide
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationProcessor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)
More informationAdvanced issues in pipelining
Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationECEC 355: Pipelining
ECEC 355: Pipelining November 8, 2007 What is Pipelining Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline is similar in concept to an assembly
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationCPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts
CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The
More informationEECC551 Exam Review 4 questions out of 6 questions
EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving
More informationInstruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction
Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches
More informationLimiting The Data Hazards by Combining The Forwarding with Delay Slots Operations to Improve Dynamic Branch Prediction in Superscalar Processor
Limiting The Data Hazards by Combining The Forwarding with Delay Slots Operations to Improve Dynamic Branch Prediction in Superscalar Processor S.A.Hadoud and A.M.Mosbah Azzaituna University, Tarhuna Libya
More informationLecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )
Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationIF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4
12 1 CMPE110 Fall 2006 A. Di Blas 110 Fall 2006 CMPE pipeline concepts Advanced ffl ILP ffl Deep pipeline ffl Static multiple issue ffl Loop unrolling ffl VLIW ffl Dynamic multiple issue Textbook Edition:
More informationCOMPUTER ORGANIZATION AND DESIGN
ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some
More informationProcessor Architecture
Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationPipelining and Vector Processing
Pipelining and Vector Processing Chapter 8 S. Dandamudi Outline Basic concepts Handling resource conflicts Data hazards Handling branches Performance enhancements Example implementations Pentium PowerPC
More informationLecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)
Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining
More informationEN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design
EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationHY425 Lecture 05: Branch Prediction
HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware
More informationPage 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationPipeline Review. Review
Pipeline Review Review Covered in EECS2021 (was CSE2021) Just a reminder of pipeline and hazards If you need more details, review 2021 materials 1 The basic MIPS Processor Pipeline 2 Performance of pipelining
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationInstruction Level Parallelism
Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic
More informationECE 571 Advanced Microprocessor-Based Design Lecture 4
ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted
More informationBasic Pipelining Concepts
Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationMIPS ISA AND PIPELINING OVERVIEW Appendix A and C
1 MIPS ISA AND PIPELINING OVERVIEW Appendix A and C OUTLINE Review of MIPS ISA Review on Pipelining 2 READING ASSIGNMENT ReadAppendixA ReadAppendixC 3 THEMIPS ISA (A.9) First MIPS in 1985 General-purpose
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationEECC551 Review. Dynamic Hardware-Based Speculation
EECC551 Review Recent Trends in Computer Design. Computer Performance Measures. Instruction Pipelining. Branch Prediction. Instruction-Level Parallelism (ILP). Loop-Level Parallelism (LLP). Dynamic Pipeline
More informationRecall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationPipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics
More informationArchitectural Performance. Superscalar Processing. 740 October 31, i486 Pipeline. Pipeline Stage Details. Page 1
Superscalar Processing 740 October 31, 2012 Evolution of Intel Processor Pipelines 486, Pentium, Pentium Pro Superscalar Processor Design Speculative Execution Register Renaming Branch Prediction Architectural
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationLecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2
Lecture 5: Instruction Pipelining Basic concepts Pipeline hazards Branch handling and prediction Zebo Peng, IDA, LiTH Sequential execution of an N-stage task: 3 N Task 3 N Task Production time: N time
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationPipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!
Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!
More informationImproving Performance: Pipelining
Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More information