Vorlesung / Course IN2075: Mikroprozessoren / Microprocessors

1 Vorlesung / Course IN2075: Mikroprozessoren / Microprocessors. Pipelining. 20 Nov 2017, Carsten Trinitis (LRR), Technische Universität München

2 µP Techniques: Pipelining etc.

3 Processor Organisation. Tasks of a processor: the input is a stream of instructions taken from the ISA. The processor executes these instructions and modifies the stream in case of control instructions. This continuous loop is the instruction cycle.

4 Simple Instruction Cycle
Fetch Instruction (FI): read the instruction from main memory.
Decode Instruction (DI): decode it to determine the requested action.
Fetch Data (FD): get the data required for the requested action.
Process Data (PD): perform the requested data processing.
Write Data (WD): store the result of the processed data.
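The five steps can be mimicked in software. The following C sketch is a hypothetical toy interpreter (the opcodes, the Instr layout and run() are illustrative assumptions, not part of the lecture): each pass through its loop performs FI, DI, FD, PD and WD in turn, and the JNZ case shows how a control instruction modifies the instruction stream.

```c
#include <stddef.h>

/* Hypothetical toy ISA (not from the slides): three register operands. */
typedef enum { OP_ADD, OP_SUB, OP_JNZ, OP_HALT } Opcode;

typedef struct {
    Opcode op;
    int dst;   /* destination register; for JNZ: jump target address */
    int src1;  /* first source register; for JNZ: register tested for != 0 */
    int src2;  /* second source register (ignored by JNZ) */
} Instr;

/* Executes the program until HALT and returns the number of instructions
   executed. Each loop iteration is one pass through the instruction cycle. */
size_t run(const Instr *program, int *reg) {
    size_t pc = 0, executed = 0;
    for (;;) {
        Instr in = program[pc];            /* FI: read instruction from memory */
        if (in.op == OP_HALT)              /* DI: decode the requested action */
            return executed;
        int a = reg[in.src1];              /* FD: get the operands */
        int b = reg[in.src2];
        executed++;
        switch (in.op) {
        case OP_ADD: reg[in.dst] = a + b; pc++; break;  /* PD + WD */
        case OP_SUB: reg[in.dst] = a - b; pc++; break;  /* PD + WD */
        case OP_JNZ:                       /* control instruction: modify the stream */
            pc = (a != 0) ? (size_t)in.dst : pc + 1;
            break;
        default:
            return executed;               /* unreachable for valid programs */
        }
    }
}
```

For example, the program ADD r0,r0,r1; SUB r2,r2,r3; JNZ r2 to address 0; HALT with r1 = 7, r2 = 3, r3 = 1 leaves 21 in r0 after executing 9 instructions.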

5 Comment. The instruction cycle shown is simplified and some concepts are missing. The main omission is interrupts/exceptions. An interrupt is an external, asynchronous interruption of the processor: the processor stops executing the current instructions and starts the interrupt routine, an asynchronous change in control flow. The exact number of steps and the tasks of each step vary between processors (see also later).

6 Running a Processor
A true stream of instructions (let's assume no branches for now), one step per cycle:
I1  FI DI FD PD WD
I2                 FI DI FD PD WD
I3                                FI DI FD PD WD
Each step is carried out by a different unit. During one cycle only one unit is active; the others are idle.

7 Optimisation
I1  FI DI FD PD WD
I2     FI DI FD PD WD
I3        FI DI FD PD WD
I4           FI DI FD PD WD
I5              FI DI FD PD WD
I6                 FI DI FD PD WD
I7                    FI DI FD PD WD
Each instruction enters the first pipeline stage one cycle after its predecessor, making use of otherwise unused units. This concept is called pipelining; think of assembly lines.

8 Basic Principle. Task: D operation steps have to be executed on each of N items. Sequential approach: D × N time steps are required. Pipelining approach (overlapped execution): D workers, each of the D steps performed by a different worker; approx. N time steps are required (exactly D + N - 1). Prerequisites (simplification): each step needs the same amount of time, and each item needs the same types of operation (at least in the beginning).

9 Instruction Pipelining. Problems: not all phases are of equal length, and some instructions need additional phases. Hence pipelining is very well suited for RISC systems (fast instructions with little variance); however, CISC systems also use pipelining. Saving in execution time (optimal case), example with 5 stages and 10 instructions: without pipelining 5 × 10 = 50 cycles; with pipelining 10 + 4 = 14 cycles.
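The cycle counts in this example follow from a simple formula: with D stages, one cycle per stage and no stalls, the first instruction needs D cycles to fill the pipeline and every further instruction completes one cycle later, giving D + N - 1 instead of D × N. A minimal C sketch under exactly these idealised assumptions (function names are illustrative):

```c
/* Idealised model: every stage takes one cycle and no stalls occur. */

/* Without pipelining, each of the n instructions passes through all d stages alone. */
unsigned long cycles_sequential(unsigned long d, unsigned long n) {
    return d * n;
}

/* With pipelining: d cycles to fill the pipeline, then one result per cycle. */
unsigned long cycles_pipelined(unsigned long d, unsigned long n) {
    return n == 0 ? 0 : d + (n - 1);
}
```

cycles_pipelined(5, 10) gives 14 and cycles_sequential(5, 10) gives 50, matching the slide; for large N the speedup D × N / (D + N - 1) approaches D.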

10 Comparison. Pipelining (overlapped execution): D workers, each of the D steps is performed by a different worker; approx. N time steps required. Parallel execution: M workers, each worker performs all D steps on N/M items; (N × D)/M time required. Parallel execution and pipelining can be (and usually are) combined.

11 Pipelining in Computer Science. Basic operations: arithmetic (e.g. FMUL, FDIV), instruction/data load. Instruction stream: fetch, decode, execute, writeback, ... Vector operations, e.g. c = a + αb: load, mult, load, add, store. Example: the Cray-1 vector computer.

12 Pipelining in Computer Science. Communicating processes: parallel threads on different cores, CPUs or machines, where the output of one thread serves as input for the next (example: IBM Cell). Special-purpose devices: digital signal processing systems, network switches (routers), graphics accelerators (rendering pipeline). Our focus: pipelining of basic operations and of the instruction stream.

13 CL: combinational logic; R: register. The maximum clock frequency depends on the longest path through the CL.

14 The CL is divided into CL1 and CL2, with a register in between. The longest path in each part is now shorter (half as long?), so the clock frequency can be increased (doubled?).

15 Effects of pipelining with pipeline depth D. Throughput: multiplied by D (at best, usually less). Latency: unchanged (at best, usually worse). Performance depends on both throughput and latency!
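The question marks on slide 14 and the "usually worse" on slide 15 can be made concrete with a simple timing model. The sketch assumes (names and numbers are hypothetical) that every pipeline register adds a fixed overhead (setup plus clock-to-output time) and that the clock period is set by the slowest stage:

```c
#include <stddef.h>

/* Hypothetical timing model; all delays in picoseconds. */
typedef struct {
    unsigned period_ps;   /* clock period: slowest stage + register overhead */
    unsigned latency_ps;  /* time for one item to pass through all d stages */
} PipeTiming;

PipeTiming pipe_timing(const unsigned *stage_ps, size_t d, unsigned reg_overhead_ps) {
    unsigned slowest = 0;
    for (size_t i = 0; i < d; i++)
        if (stage_ps[i] > slowest)
            slowest = stage_ps[i];
    PipeTiming t;
    t.period_ps = slowest + reg_overhead_ps;
    t.latency_ps = (unsigned)d * t.period_ps;  /* every stage takes a full period */
    return t;
}
```

With a 1000 ps CL and 50 ps register overhead, a single stage gives a 1050 ps period. An even split into 500 + 500 ps gives 550 ps (throughput about 1.9 times higher, latency 1100 ps, slightly worse); an uneven 600 + 400 ps split gives only 650 ps with a latency of 1300 ps.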

16 Increasing the Pipeline. The benefit grows with pipeline length: a system with more units has more units working on active instructions. Approach: split phases further and add more pipeline stages. Example: Fetch Data becomes Calculate Operands & Fetch Operands. Similar splits are possible for other stages.

17 Instruction Stream Pipelining

18 Different Types of Instructions. Most instructions need 1 cycle for execution (ADD, AND, MUL, ...). Exceptions: FPU instructions, e.g. on the Pentium Pro FADD takes 3, FMUL 5, FDIV 18 and FSQRT 29 cycles. Load: at least two cycles (generate address, read data); on a cache miss, up to thousands! Store: stores are put into a queue, and the memory access is carried out by a dedicated unit.

19 Problems with Pipelining. So far we assumed optimal pipelining: each instruction is independent, and the units can operate concurrently. In reality, three types of conflicts can occur:
Data conflicts: one instruction needs data from a previous one.
Resource conflicts: two instructions need the same resources at the same time (occurs in more complex hardware).
Control conflicts: wrong instructions are executed in the pipeline.

20 Data Conflicts. Again three types: read after write (RAW), write after read (WAR), write after write (WAW).

21 Data Conflicts (RAW). One instruction works on the data written by the previous instruction. Example:
ADD R2,R1,R2 // R2=R1+R2
MUL R3,R2,R3 // R3=R2*R3
Without pipelining there is no problem: ADD passes FI, DI, FD, PD and WD (writing Register 2) before MUL starts its FI, so MUL reads the new value of R2.

22 Data Conflicts (RAW). The scenario changes with pipelining: MUL starts one cycle after ADD, so its FD stage reads Register 2 before the WD stage of ADD has written it. MUL produces a wrong result, based on the old value in R2. This is the result of a data dependency.

23 Data Conflicts (WAR). Example:
1 MOV AND DEC r0 [r1], r2 // combined move and post-decrement of r2
2 ADD r2 r5, r6
Instruction 1 reads r2 in a late pipeline stage (after the move); instruction 2 changes r2 in an early pipeline stage. Instruction 1 therefore uses the new value of r2 instead of the old one.

24 Data Conflicts (WAW). Example:
1 FSQRT r0 r1
2 ADD r0 r1,r2
Execution of FSQRT takes much longer than ADD, so ADD writes r0 first and FSQRT overwrites it later. Hence the square root, instead of the sum, ends up in r0!
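All three conflict types can be detected by comparing register fields, which is what the hardware comparators mentioned on the following slides do. A minimal C sketch, assuming a simplified encoding with one destination and up to two source registers per instruction (RegUse and classify() are hypothetical names):

```c
#include <stdbool.h>

/* Simplified instruction view; -1 marks an unused register slot. */
typedef struct { int dst, src1, src2; } RegUse;

enum { HAZ_RAW = 1, HAZ_WAR = 2, HAZ_WAW = 4 };

/* True if instruction `in` reads register r. */
static bool reads(RegUse in, int r) {
    return r >= 0 && (in.src1 == r || in.src2 == r);
}

/* Returns a bit mask of the dependencies of `later` on `earlier`
   (both given in program order). */
int classify(RegUse earlier, RegUse later) {
    int h = 0;
    if (reads(later, earlier.dst)) h |= HAZ_RAW;   /* later reads what earlier writes */
    if (reads(earlier, later.dst)) h |= HAZ_WAR;   /* later overwrites what earlier reads */
    if (earlier.dst >= 0 && earlier.dst == later.dst)
        h |= HAZ_WAW;                              /* both write the same register */
    return h;
}
```

ADD R2,R1,R2 followed by MUL R3,R2,R3 is classified as RAW, FSQRT r0,r1 followed by ADD r0,r1,r2 as WAW, and a store that reads r2 followed by ADD r2,r5,r6 as WAR. The WAR case is simplified here: the slide's post-decrementing move also writes r2, which this encoding does not capture.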

25 Solving Data Conflicts
Pipeline interlocking
Bypassing
Software solutions: detection of data conflicts by the compiler; appropriate code structuring to avoid conflicts.
Hardware solutions (automatic reordering): the hardware detects dependencies dynamically and handles the conflicts. This is more complex, but no code change is required (old code still runs) and no complex compiler work is needed.

26 User/Compiler-based Re-ordering. Example:
ADD r0 r1,r2
SUB r3 r0,r5
AND r6 r7,r8
Reordered (the independent AND moved between the dependent pair):
ADD r0 r1,r2
AND r6 r7,r8
SUB r3 r0,r5
Alternatively, insert NOPs:
ADD r0 r1,r2
NOP
SUB r3 r0,r5
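Compiler-side NOP insertion can be sketched as a single pass over the code. The sketch assumes a hypothetical parameter gap, the number of slots that must separate a producer from a dependent consumer (the slide's example corresponds to gap = 1); it checks the last gap emitted slots and pads with just enough NOPs. Reordering is not attempted, since that needs a scheduler:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: register numbers, -1 = unused; is_nop marks a NOP. */
typedef struct { bool is_nop; int dst, src1, src2; } Ins;

/* True if `consumer` reads the register written by `producer`. */
static bool depends(Ins producer, Ins consumer) {
    return !producer.is_nop && !consumer.is_nop && producer.dst >= 0 &&
           (consumer.src1 == producer.dst || consumer.src2 == producer.dst);
}

/* Copies n instructions from `in` to `out`, inserting NOPs so that every
   dependent instruction is at least gap + 1 slots after its producer.
   `out` must hold n * (gap + 1) entries. Returns the output length. */
size_t insert_nops(const Ins *in, size_t n, size_t gap, Ins *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t need = 0;
        /* out[len - k] would end up k slots before in[i] */
        for (size_t k = 1; k <= gap && k <= len; k++)
            if (depends(out[len - k], in[i]) && gap + 1 - k > need)
                need = gap + 1 - k;
        for (size_t j = 0; j < need; j++)
            out[len++] = (Ins){ true, -1, -1, -1 };
        out[len++] = in[i];
    }
    return len;
}
```

For ADD r0,r1,r2; SUB r3,r0,r5; AND r6,r7,r8 this emits one NOP before SUB, while the reordered sequence ADD, AND, SUB needs none.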

27 Pipeline Interlocking. Detect the conflict (hardware comparator for operand fields). Execution of the early pipeline stages is stopped (pipeline stall) until the late stages have been executed; then operation continues. In effect, NOPs are inserted automatically.

28 Pipeline Interlocking

29 Pipeline Interlocking

30 Bypassing. Detect the conflict (hardware comparator for operand fields) and pass the output of a late pipeline stage directly to an earlier stage. Advantage: the pipeline can run at full speed! Usually implemented for all instructions requiring only one execution phase.
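The benefit of bypassing can be quantified for the five-stage FI DI FD PD WD pipeline. The sketch rests on assumptions not stated on the slides: operands are read in FD, results are written in WD, a register written in one cycle can only be read in the next, and the bypass forwards the PD result to the PD input of a following instruction that needs only one execution phase (function names are hypothetical):

```c
/* Stage numbers of the slide's five-stage pipeline. */
enum { FI, DI, FD, PD, WD };

/* Stall cycles for a consumer that enters the pipeline `distance` cycles
   after its producer (distance >= 1), with interlocking only:
   the consumer's FD must come after the producer's WD. */
int stalls_interlock(int distance) {
    int s = (WD + 1) - (FD + distance);
    return s > 0 ? s : 0;
}

/* With bypassing: the consumer's PD only has to come after the producer's PD. */
int stalls_bypass(int distance) {
    int s = (PD + 1) - (PD + distance);
    return s > 0 ? s : 0;
}
```

For back-to-back dependent instructions (distance 1), interlocking costs 2 stall cycles under these assumptions, while bypassing costs none.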

31 Bypassing

32 Bypassing

33 Structural Conflicts. Definition: two pipeline stages want to use the same circuit at the same time. Example:
LOAD r0 [r1]
ADD r5 r2,r3
If LOAD requires one more cycle than ADD, both instructions want to write back in the same cycle.

34 Structural Conflicts. Solutions: by the programmer/compiler; pipeline interlocking; resource multiplication (e.g. virtual registers, see superscalarity).

35 Control Conflicts. Pipelining requires a known instruction stream: instructions need to be started before the previous one has finished. Problem: control-flow instructions, e.g. conditional branches. The target, i.e. where the next instruction is, is not known until after the PD phase. Until then, many other instructions may already have been started and partially executed.

36 Control Conflicts. Timeline: JNZ target is followed in the pipeline by instructions that may be useless; the new instruction stream can only start after the branch has been executed. The pipeline must then be restarted: the intermediate instructions must be aborted and their results dropped.

37 Control Conflicts. Problem: executing a branch/jump changes the program counter. Assumption: this is done in the n-th pipeline stage. Consequence: n - 1 instructions after the jump are already in the pipeline, and these n - 1 instructions must be cancelled (pipeline flush). Every fourth to sixth instruction is a jump!
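The cost of these flushes can be estimated with a simple average. Assuming an otherwise ideal pipeline with one instruction per cycle, a fraction f of instructions being jumps, and every jump flushing n - 1 instructions (no prediction, no delay slots), the average number of cycles per instruction is 1 + f(n - 1). A minimal sketch (function name hypothetical):

```c
/* Average cycles per instruction for an ideal pipeline (CPI = 1) in which a
   fraction `jump_fraction` of instructions are jumps resolved in stage n,
   each flushing n - 1 already fetched instructions. Requires n >= 1. */
double cpi_with_flushes(double jump_fraction, unsigned n) {
    return 1.0 + jump_fraction * (double)(n - 1);
}
```

With every fourth instruction a jump (f = 0.25) and n = 4, this gives 1.75 cycles per instruction, i.e. 75 % more time than the ideal pipeline.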

38 Control Conflicts: Solutions. Reduce n (execute the jump early in the pipeline); use delay slots; branch prediction; avoid branches.

39 Avoiding Branches by Predication. Example: ARM architecture. Every instruction has a 4-bit condition code (equal, not equal, unsigned higher or same, unsigned lower, ..., always). Conditional execution: an instruction is only executed if its condition holds. This avoids many jumps resulting from if-statements. However, an instruction still takes time even if its condition is false, so predication cannot be used for long if-blocks or to avoid the jumps belonging to loops.

40 ARM: Condition Flags & Codes. Consider a simple fragment of C code:
for (i = 10; i != 0; i--) { do_something(); }
A standard compiler would yield:
    mov r4, #10
loop_label:
    bl do_something
    sub r4, r4, #1
    cmp r4, #0
    bne loop_label

41 ARM: Condition Flags & Codes. On an ARM architecture, this can be shortened to:
    mov r4, #10
loop_label:
    bl do_something
    subs r4, r4, #1
    bne loop_label
The s suffix causes the instruction (in this case sub) to update the flags itself based on its result, so the separate cmp is no longer needed. An example will be provided with the exercises!

42 Conditional Execution on ARM. Implemented with a 4-bit condition code selector. One of the 4-bit codes is reserved as an "escape code" to specify certain unconditional instructions; however, nearly all common instructions are conditional. Example: compute the greatest common divisor (GCD) of two integers with the Euclidean algorithm.

43 Euclidean Algorithm. Find the greatest common divisor (GCD) of two given numbers a and b. Basic algorithm:
if (a == 0) return b;
while (b != 0) {
    if (a > b) a = a - b;
    else b = b - a;
}
return a;
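A runnable C version of this subtraction-based algorithm, written for unsigned inputs (the function name gcd_sub is illustrative). The two branches of the if-statement correspond to the predicated SUBGT and SUBLE instructions of the ARM version:

```c
/* Subtraction-based Euclidean algorithm for unsigned integers. */
unsigned gcd_sub(unsigned a, unsigned b) {
    if (a == 0)
        return b;
    while (b != 0) {
        if (a > b)
            a = a - b;   /* corresponds to SUBGT */
        else
            b = b - a;   /* corresponds to SUBLE; a == b makes b zero and ends the loop */
    }
    return a;
}
```

gcd_sub(12, 18) returns 6 and gcd_sub(0, 5) returns 5.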

44 Euclidean Algorithm on ARM. C code:
while (b != 0)
    if (a > b) a = a - b;
    else b = b - a;
In ARM assembly language:
loop:
    CMP   Ra, Rb      ; set condition: "NE" if (a != b),
                      ; "GT" if (a > b), "LE" if (a <= b)
    SUBGT Ra, Ra, Rb  ; if "GT", a = a - b
    SUBLE Rb, Rb, Ra  ; if "LE", b = b - a
    CMP   Rb, #0
    BNE   loop        ; if "NE" (b != 0), loop

45 Versions of ADD on ARM. Unconditional versions of ADD: ADD, ADDS (or ADDAL, ADDALS). Conditional versions of ADD: ADDEQ, ADDEQS, ADDNE, ADDNES, ADDCS, ADDCSS, ADDCC, ADDCCS, ADDMI, ADDMIS, ADDPL, ADDPLS, ADDVS, ADDVSS, ADDVC, ADDVCS, ADDHI, ADDHIS, ADDLS, ADDLSS, ADDGE, ADDGES, ADDLT, ADDLTS, ADDGT, ADDGTS, ADDLE, ADDLES.

46 Versions of ADD on ARM: condition codes
Code       Meaning (for cmp or subs)                   Flags tested
EQ         Equal                                       Z==1
NE         Not equal                                   Z==0
CS or HS   Unsigned higher or same (or carry set)      C==1
CC or LO   Unsigned lower (or carry clear)             C==0
MI         Negative ("minus")                          N==1
PL         Positive or zero ("plus")                   N==0
VS         Signed overflow ("V set")                   V==1
VC         No signed overflow ("V clear")              V==0
HI         Unsigned higher                             (C==1) && (Z==0)
LS         Unsigned lower or same                      (C==0) || (Z==1)
GE         Signed greater than or equal                N==V
LT         Signed less than                            N!=V
GT         Signed greater than                         (Z==0) && (N==V)
LE         Signed less than or equal                   (Z==1) || (N!=V)
AL         Always (suffix usually omitted)             none
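The table can be expressed directly in C. In this sketch, Flags, flags_cmp() and cond_holds() are hypothetical helpers, not an ARM API: flags_cmp() computes N, Z, C and V the way CMP a, b does (including ARM's convention that C is set when the subtraction does not borrow), and cond_holds() evaluates a condition code against them as listed in the table:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* Flags as set by CMP a, b (a SUBS whose result is discarded). */
Flags flags_cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) & 1;                      /* result negative */
    f.z = (r == 0);                           /* result zero */
    f.c = (a >= b);                           /* ARM: carry set means no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;    /* signed overflow */
    return f;
}

typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } Cond;

/* True if an instruction with condition `cond` would execute. */
bool cond_holds(Cond cond, Flags f) {
    switch (cond) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;                      /* also HS */
    case CC: return !f.c;                     /* also LO */
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    case AL: return true;
    }
    return false;
}
```

For example, comparing 0xFFFFFFFF with 1 satisfies both LT (as signed values, -1 < 1) and HI (as unsigned values), which is why separate signed and unsigned codes exist.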

47 Delay Slots. Delay slots are the n - 1 instructions following a branch. On some architectures these are executed even if the jump is taken. Example (assuming n == 2):
Conventional ISA:
loop: LOAD r2 [r1]
      ADD r0 r0,r2
      DEC r1
      JZ loop
ISA with delay slots:
loop: LOAD r2 [r1]
      DEC r1
      JZ loop
      ADD r0 r0,r2   ; delay slot, executed whether or not the jump is taken
If no independent instruction is found, a NOP must be inserted. Used e.g. in AMD Am29000 microprocessors (only works for small n).

48 Branch Prediction. Problem: conditional jumps are executed at a late stage in the pipeline. Solution: predict whether the branch is taken; the pipeline must be flushed if the prediction was wrong. Two approaches: static branch prediction (does not depend on earlier branches) and dynamic branch prediction (depends on the branch history).

49 Branch Prediction. Simplest case: always assume no branch. Better: always assume branch (about 2/3 of all branches are taken!). Even better: backward branch, assume taken; forward branch, assume not taken; unconditional branch, assume taken. Alternative: the compiler gives a hint whether the branch is taken or not, so every conditional branch needs two opcodes.
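The static "backward taken, forward not taken" rule needs no history at all, only the branch address and its target. A minimal C sketch (names hypothetical; the compiler-hint variant is not modelled, and a branch to itself is treated as backward):

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BR_CONDITIONAL, BR_UNCONDITIONAL } BranchKind;

/* Static prediction: no branch history is consulted. */
bool predict_static(BranchKind kind, uint32_t branch_addr, uint32_t target_addr) {
    if (kind == BR_UNCONDITIONAL)
        return true;                        /* always taken */
    return target_addr <= branch_addr;      /* backward (typically a loop): taken */
}
```

A loop-closing branch at 0x100 jumping back to 0x80 is predicted taken; a conditional forward branch to 0x140 is predicted not taken.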

50 Dynamic Branch Prediction. Idea: remember whether the branch at address XXX was taken in the past or not.
loop: LOAD r2 [r1]
      ADD r0 r0,r2
      DEC r1
XXX:  JZ loop
The branch is only mispredicted in the last iteration!

51 Dynamic Branch Prediction. Branch prediction cache with N entries. Tag: address of the jump. Entry: one bit (taken or not taken), plus the bits needed for LRU replacement. The cache stores the N most recently encountered jumps.

53 Dynamic Branch Prediction. A more sophisticated approach is needed, e.g. for the branch sequence TNTNTNTNTNT (T = taken, N = not taken): with one bit per entry, no branch is predicted correctly! Solution: 2 bits per entry are required.
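The slide's 2-bit state diagram is not preserved in this transcript. As one common 2-bit scheme, the sketch below uses a saturating counter (states 0 to 3, predict taken in states 2 and 3) and compares it with the 1-bit scheme; the outcome strings and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Both functions count correct predictions for a single branch over an
   outcome string such as "TNTN" ('T' = taken, 'N' = not taken). */

/* 1-bit scheme: predict whatever the branch did last time. */
size_t correct_1bit(const char *outcomes, bool initially_taken) {
    bool last = initially_taken;
    size_t correct = 0;
    for (const char *p = outcomes; *p; p++) {
        bool taken = (*p == 'T');
        if (last == taken)
            correct++;
        last = taken;
    }
    return correct;
}

/* 2-bit saturating counter: count up on taken, down on not taken,
   saturating at 0 and 3; states 2 and 3 predict taken. */
size_t correct_2bit(const char *outcomes, unsigned initial_state) {
    unsigned s = initial_state > 3 ? 3 : initial_state;
    size_t correct = 0;
    for (const char *p = outcomes; *p; p++) {
        bool taken = (*p == 'T');
        if ((s >= 2) == taken)
            correct++;
        if (taken && s < 3)
            s++;
        else if (!taken && s > 0)
            s--;
    }
    return correct;
}
```

For TNTNTNTNTNT, the 1-bit predictor starting at "not taken" gets 0 of 11 right, whereas the saturating counter starting in state 3 gets 6 of 11 right. For a loop branch such as TTTTTTTTTN, the 1-bit predictor starting at "taken" misses only the last iteration, as on slide 50. The counter's benefit depends on its starting state: started in state 1, it also misses every branch of the alternating sequence.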

54 Branch Target Cache. Motivation: branch prediction alone is not sufficient, because determining the branch target (e.g. loading it from memory) takes too long. This is even worse for indirect jumps and returns. Solution: a branch target cache, which maps the locations of branches to their branch targets. IFETCH stage:
if (current IP matches a cache tag) load the cached entry into the pipeline
else load the next instruction (IP+1) into the pipeline
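The IFETCH rule can be sketched as a small direct-mapped table. The sketch assumes (hypothetically) 16 entries indexed by the low bits of the IP, the full IP stored as the tag, and instruction-granular addresses so that the fall-through is IP+1 as on the slide:

```c
#include <stdbool.h>
#include <stdint.h>

#define BTC_ENTRIES 16u   /* hypothetical size; must be a power of two */

typedef struct { bool valid; uint32_t tag; uint32_t target; } BtcEntry;
typedef struct { BtcEntry e[BTC_ENTRIES]; } Btc;

/* IFETCH: returns the address of the next instruction to fetch. */
uint32_t btc_next_ip(const Btc *btc, uint32_t ip) {
    const BtcEntry *ent = &btc->e[ip & (BTC_ENTRIES - 1)];
    return (ent->valid && ent->tag == ip) ? ent->target : ip + 1;
}

/* After a taken branch has been resolved, record (or overwrite) its target. */
void btc_update(Btc *btc, uint32_t ip, uint32_t target) {
    BtcEntry *ent = &btc->e[ip & (BTC_ENTRIES - 1)];
    ent->valid = true;
    ent->tag = ip;
    ent->target = target;
}
```

After btc_update(&btc, 40, 7), fetching at IP 40 yields 7, while IP 56, which maps to the same entry but has a different tag, still falls through to 57.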

55 Branch Target Cache

56 Branch Target Cache

57 Branch Target Cache. Problem: the branch target cache does not work well for RET instructions. Solution (as introduced by Cyrix): CALL stores the return address not only on the stack but also in a branch stack within the control unit. As long as this branch stack does not overflow, all returns can be correctly predicted.
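The branch stack for returns can be sketched as a small circular buffer. The depth of 8 entries and all names are hypothetical; the sketch overwrites the oldest entry when full, which reproduces the limitation above: after an overflow, the deepest returns are mispredicted.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 8u   /* hypothetical depth */

typedef struct { uint32_t addr[RS_DEPTH]; unsigned top; } ReturnStack;

/* CALL: push the return address (in addition to the memory stack).
   When full, the oldest entry is silently overwritten. */
void rs_call(ReturnStack *rs, uint32_t return_addr) {
    rs->addr[rs->top % RS_DEPTH] = return_addr;
    rs->top++;
}

/* RET: predict the most recently pushed return address.
   Returns false if the stack is empty (no prediction available). */
bool rs_predict_ret(ReturnStack *rs, uint32_t *predicted) {
    if (rs->top == 0)
        return false;
    rs->top--;
    *predicted = rs->addr[rs->top % RS_DEPTH];
    return true;
}
```

After 10 nested calls with return addresses 0 to 9, the first eight returns are predicted correctly (9 down to 2), but the ninth return is predicted as 9 instead of 1.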


More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

CPU Structure and Function

CPU Structure and Function Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com http://www.yildiz.edu.tr/~naydin CPU Structure and Function 1 2 CPU Structure Registers

More information

Structure of Computer Systems

Structure of Computer Systems 288 between this new matrix and the initial collision matrix M A, because the original forbidden latencies for functional unit A still have to be considered in later initiations. Figure 5.37. State diagram

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Writing ARM Assembly. Steven R. Bagley

Writing ARM Assembly. Steven R. Bagley Writing ARM Assembly Steven R. Bagley Hello World B main hello DEFB Hello World\n\0 goodbye DEFB Goodbye Universe\n\0 ALIGN main ADR R0, hello ; put address of hello string in R0 SWI 3 ; print it out ADR

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1... Instruction-set Design Issues: what is the format(s) Opcode Dest. Operand Source Operand 1... 1) Which instructions to include: How many? Complexity - simple ADD R1, R2, R3 complex e.g., VAX MATCHC substrlength,

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

STEVEN R. BAGLEY ARM: PROCESSING DATA

STEVEN R. BAGLEY ARM: PROCESSING DATA STEVEN R. BAGLEY ARM: PROCESSING DATA INTRODUCTION CPU gets instructions from the computer s memory Each instruction is encoded as a binary pattern (an opcode) Assembly language developed as a human readable

More information

Chapter 9. Pipelining Design Techniques

Chapter 9. Pipelining Design Techniques Chapter 9 Pipelining Design Techniques 9.1 General Concepts Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

Chapter 12. CPU Structure and Function. Yonsei University

Chapter 12. CPU Structure and Function. Yonsei University Chapter 12 CPU Structure and Function Contents Processor organization Register organization Instruction cycle Instruction pipelining The Pentium processor The PowerPC processor 12-2 CPU Structures Processor

More information

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest

More information

CS252 Graduate Computer Architecture Midterm 1 Solutions

CS252 Graduate Computer Architecture Midterm 1 Solutions CS252 Graduate Computer Architecture Midterm 1 Solutions Part A: Branch Prediction (22 Points) Consider a fetch pipeline based on the UltraSparc-III processor (as seen in Lecture 5). In this part, we evaluate

More information

Preventing Stalls: 1

Preventing Stalls: 1 Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

CS152 Computer Architecture and Engineering. Complex Pipelines

CS152 Computer Architecture and Engineering. Complex Pipelines CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the

More information

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?) Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static

More information

Real instruction set architectures. Part 2: a representative sample

Real instruction set architectures. Part 2: a representative sample Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length

More information

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

SISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version:

SISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version: SISTEMI EMBEDDED Computer Organization Pipelining Federico Baronti Last version: 20160518 Basic Concept of Pipelining Circuit technology and hardware arrangement influence the speed of execution for programs

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Control Hazards. Prediction

Control Hazards. Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Pipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard.

Pipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard. Calcolatori Elettronici e Sistemi Operativi Pipeline issues Hazards Pipeline issues Data hazard Control hazard Structural hazard Pipeline hazard: RaW Pipeline hazard: RaW 5 6 7 8 9 5 6 7 8 9 : add R,R,R

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC

More information

Pipeline Architecture RISC

Pipeline Architecture RISC Pipeline Architecture RISC Independent tasks with independent hardware serial No repetitions during the process pipelined Pipelined vs Serial Processing Instruction Machine Cycle Every instruction must

More information

Chapter. Out of order Execution

Chapter. Out of order Execution Chapter Long EX Instruction stages We have assumed that all stages. There is a problem with the EX stage multiply (MUL) takes more time than ADD MUL ADD We can clearly delay the execution of the ADD until

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2 Lecture 5: Instruction Pipelining Basic concepts Pipeline hazards Branch handling and prediction Zebo Peng, IDA, LiTH Sequential execution of an N-stage task: 3 N Task 3 N Task Production time: N time

More information

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar Complex Pipelining COE 501 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Diversified Pipeline Detecting

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Chapter 8 Pipelining and Vector Processing 8 1 If the pipeline stages are heterogeneous, the slowest stage determines the flow rate of the entire pipeline. This leads to other stages idling. 8 2 Pipeline

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information