Vorlesung / Course IN2075: Mikroprozessoren / Microprocessors
1 Vorlesung / Course IN2075: Mikroprozessoren / Microprocessors Pipelining 20 Nov 2017 Carsten Trinitis (LRR)
2 µP Techniques: Pipelining etc.
3 Processor Organisation Tasks of a processor Input: Stream of instructions Instructions taken from ISA Execute instructions Modify stream in case of control instructions Continuous loop Instruction Cycle
4 Simple Instruction Cycle Fetch Instruction Decode Instruction Fetch Data Process Data Write Data Read the instruction from main memory Decode to query the requested action Get the data required for the requested action Perform the requested data processing Store the result of the processed data
5 Comment Instruction Cycle Simplified Some concepts missing Main point: Interrupts / Exception Interrupts External, asynchronous interruption of the processor Processor stops execution on current instructions Starts with interrupt routine Asynchronous change in control flow Exact number and tasks of each step varies (See also later)
6 Running a Processor A true stream of instructions. Let's assume no branches for now. [Diagram: three instructions executed one after another along the time axis, each passing through FI DI FD PD WD, one stage per cycle] Each step is carried out by a different unit. During one cycle only one unit is activated. Others are idle.
7 Optimisation [Diagram: seven instructions along the time axis, each passing through FI DI FD PD WD and each starting one cycle (one pipeline stage) after the previous one] Make use of otherwise unused units. Concept is called Pipelining. Think of assembly lines.
8 Basic Principle Task: D operation steps have to be executed on N items each. Sequential approach: D · N time steps required. Pipelining approach (overlapped execution): D workers: each of the D steps performed by a different worker; approx. N time steps required. Prerequisites (simplification): Each step needs the same amount of time. Each item needs the same types of operation (at least in the beginning).
9 Instruction Pipelining Problems: Not all phases are of equal length Some instructions need additional phases Hence, very well suited for RISC systems Fast instructions with little variance However, also CISC Systems use pipelining Saving in Execution Time (in optimal case) Example: 5 stages / 10 Instructions Without pipelining: 5*10=50 cycles With pipelining: 10+4 = 14 cycles
10 Comparison Pipelining (overlapped execution): D workers: each of the D steps is performed by a different worker; approx. N time steps required. Parallel execution: M workers: each worker performs all D steps on N/M items; (N · D)/M time required. Parallel execution and pipelining can be (and usually are) combined. N M
11 Pipelining in Computer Science Basic operations: Arithmetics (e.g. FMUL, FDIV) Instruction/data load Instruction stream Fetch, Decode, Execute, Writeback,... Vector operations c = a + αb load, mult, load, add, store Example: Cray-1, vector computer
12 Pipelining in Computer Science Communicating processes Parallel threads on different cores, CPUs, machines Output of thread serves as input for next thread Example: IBM Cell Special purpose devices Digital signalling systems Network switches (routers) Graphic accelerators rendering pipeline Our focus: pipelining on basic operations and the instruction stream.
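The cycle counts quoted on the slides above can be reproduced with a small Python sketch. The function names are made up for illustration, and the model assumes the idealised case from the slides (all steps equally long, no conflicts). The parallel variant rounds N/M up, which is an added assumption for the case where N is not a multiple of M.

```python
import math


def sequential_cycles(depth, items):
    """Sequential approach: every item passes through all D steps alone."""
    return depth * items


def pipelined_cycles(depth, items):
    """Pipelining: the first item needs D cycles, each further item one more."""
    if items == 0:
        return 0
    return items + depth - 1


def parallel_cycles(depth, items, workers):
    """Parallel execution: each of M workers performs all D steps on N/M items."""
    return depth * math.ceil(items / workers)


if __name__ == "__main__":
    # Example from the slides: 5 stages, 10 instructions.
    print(sequential_cycles(5, 10))   # 5 * 10
    print(pipelined_cycles(5, 10))    # 10 + 4
    print(parallel_cycles(5, 10, 5))  # five workers, two items each
```

For large N the pipelined count approaches N, which is the "approx. N time steps" stated on the slide.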
13 CL: Combinatorial logic R: Register Max. clock-frequency depends on longest path in CL
14 2 CL is divided into CL1 and CL2 Longest path in each CL is now shorter (half as long?) Clock frequency can be increased (doubled?)
15 Effects of pipelining Pipeline depth: D Throughput: times D (at best, usually worse) Latency: unchanged (at best, usually worse) Performance depends on both throughput and latency!
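Why throughput grows by a factor of D only "at best" can be illustrated with a rough timing model in Python. This is a sketch under assumptions that are not on the slides: the combinatorial logic splits into D perfectly balanced parts, and every pipeline register adds a fixed delay (`register_delay_ns`, a hypothetical parameter).

```python
def pipeline_timing(logic_delay_ns, depth, register_delay_ns=0.0):
    """Return (cycle time, latency, throughput) for a balanced pipeline.

    Assumes the longest path of the logic is cut into `depth` equal parts
    and each stage pays `register_delay_ns` for its pipeline register.
    """
    cycle = logic_delay_ns / depth + register_delay_ns
    latency = cycle * depth        # time for one item to pass all stages
    throughput = 1.0 / cycle       # items completed per nanosecond
    return cycle, latency, throughput


if __name__ == "__main__":
    for depth in (1, 2, 4):
        print(depth, pipeline_timing(10.0, depth, register_delay_ns=1.0))
```

With zero register delay the latency stays constant and the throughput scales with D. Any non-zero register delay makes the latency grow with D and keeps the throughput gain below D.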
16 Increasing the Pipeline Growing benefit with growing pipeline length System with more units More units work on active instructions Approach: Split phases further Add more pipeline stages Example: Fetch Data Calculate Operands & Fetch Operands Similar things possible with other stages
17 Instruction Stream Pipelining
18 Different Types of Instructions Most instructions need 1 cycle for execution (ADD, AND, MUL,...) Exceptions: FPU instructions, e.g. Pentium Pro: FADD 3, FMUL 5, FDIV 18, FSQRT 29 Load: At least two: generate address, read data Cache miss: up to thousands! Store: Stores are put into a queue Memory access is carried out by a dedicated unit
19 Problems with Pipelining So far: Optimal pipelining. Each instruction independent. Units can operate concurrently. In reality: Three types of conflicts can occur. Data Conflicts: One instruction needs data from a previous one. Resource Conflicts: Two instructions need the same resources at the same time; occurs in more complex hardware. Control Conflicts: Wrong instructions executed in the pipeline.
20 Data Conflicts Again three types: Read after write (RAW) Write after read (WAR) Write after write (WAW)
21 Data Conflicts (RAW) One instruction works on the data written by the previous instruction. Example:
  ADD R2,R1,R2 // R2=R1+R2
  MUL R3,R2,R3 // R3=R2*R3
Without pipelining, no problem. [Diagram: along the time axis, ADD R2,R1,R2 completes FI DI FD PD WD before MUL R3,R2,R3 starts; Register 2 is marked]
22 Data Conflicts (RAW) Scenario changes with pipelining. [Diagram: ADD R2,R1,R2 and MUL R3,R2,R3 overlap in the pipeline, each passing through FI DI FD PD WD; Register 2 is marked] Produces wrong result: based on old value in R2. Result of a data dependency.
23 Data Conflicts (WAR) Example:
  1: MOV AND DEC r0 ← [r1], r2 // (post-decrement of r2)
  2: ADD r2 ← r5, r6
Instruction 1 reads r2 in a late pipeline stage (after the move). Instruction 2 changes r2 in an early pipeline stage. Instruction 1 then uses the new value.
24 Data Conflicts (WAW) Example:
  1: FSQRT r0 ← r1
  2: ADD r0 ← r1, r2
Execution of FSQRT takes much longer than ADD. Hence, the square root is stored in r0 instead of the sum!
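The three kinds of data conflict can be told apart by comparing the registers two instructions read and write. The Python sketch below does this for a pair of instructions. The `Instr` record and the function name are illustrative, and each instruction is simplified to one destination register. The result only lists potential conflicts; whether one actually produces a wrong result depends on the pipeline timing.

```python
from collections import namedtuple

# Simplified instruction: a name, one destination register, source registers.
Instr = namedtuple("Instr", ["name", "dest", "sources"])


def data_conflicts(first, second):
    """Return the potential conflicts of `second` with the earlier `first`."""
    conflicts = set()
    if first.dest is not None and first.dest in second.sources:
        conflicts.add("RAW")  # second reads what first writes
    if second.dest is not None and second.dest in first.sources:
        conflicts.add("WAR")  # second overwrites what first still reads
    if first.dest is not None and first.dest == second.dest:
        conflicts.add("WAW")  # both write the same register
    return conflicts


if __name__ == "__main__":
    add = Instr("ADD", "R2", ("R1", "R2"))
    mul = Instr("MUL", "R3", ("R2", "R3"))
    print(data_conflicts(add, mul))      # the RAW example from the slides

    fsqrt = Instr("FSQRT", "r0", ("r1",))
    add2 = Instr("ADD", "r0", ("r1", "r2"))
    print(data_conflicts(fsqrt, add2))   # the WAW example from the slides
```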
25 Solving Data Conflicts Pipeline Interlocking Bypassing Software Solutions Detection of Data Conflicts by the Compiler Appropriate code structuring to avoid conflicts Hardware Solutions (automatic reordering) Hardware detects dependencies dynamically Treatment of conflicts Latter solution more complex, but No code change required (old codes still run) No complex compiler work
26 User/Compiler-based
Re-ordering. Example:
  before:
    ADD r0 ← r1, r2
    SUB r3 ← r0, r5
    AND r6 ← r7, r8
  after:
    ADD r0 ← r1, r2
    AND r6 ← r7, r8
    SUB r3 ← r0, r5
Insert NOPs. Example:
  before:
    ADD r0 ← r1, r2
    SUB r3 ← r0, r5
  after:
    ADD r0 ← r1, r2
    NOP
    SUB r3 ← r0, r5
27 Pipeline Interlocking Detect conflict (hardware comparator for operand fields) Execution of early pipeline stages is stopped (pipeline stall) Wait until late stages are executed Continue operation NOPs are inserted automatically
28 Pipeline Interlocking
29 Pipeline Interlocking
30 Bypassing Detect conflict (hardware comparator for operand fields). Directly pass output of a late pipeline stage to an earlier stage + Pipeline can run at full speed! Usually implemented for all instructions requiring only one execution phase.
31 Bypassing
32 Bypassing
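How many stall cycles interlocking inserts, and why bypassing can avoid them, can be estimated with a simple Python sketch. The stage numbering follows the five-stage cycle used earlier (FI=1, DI=2, FD=3, PD=4, WD=5). The function name and the timing rule are assumptions made for illustration: a value produced in stage s becomes usable in the following cycle.

```python
def stall_cycles(distance, result_ready_stage, operand_needed_stage):
    """Stall cycles needed by a consumer issued `distance` cycles after
    its producer (distance 1 = the directly following instruction).

    The producer's result is usable one cycle after it has passed
    `result_ready_stage`; the consumer needs it on entering
    `operand_needed_stage`.
    """
    return max(0, result_ready_stage - operand_needed_stage + 1 - distance)


if __name__ == "__main__":
    # Interlocking without bypass: result written in WD (5), read in FD (3).
    for d in (1, 2, 3):
        print("interlock, distance", d, "->", stall_cycles(d, 5, 3))
    # Bypassing: output of PD (4) is fed straight back into PD (4).
    print("bypass, distance 1 ->", stall_cycles(1, 4, 4))
```

Under these assumptions a directly dependent instruction stalls for two cycles with interlocking alone and for none with bypassing, which matches the statement that the pipeline can then run at full speed.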
33 Structural Conflicts Definition: Two pipeline stages want to use the same circuit at the same time. Example:
  LOAD r0 ← [r1]
  ADD r5 ← r2, r3
If LOAD requires one more cycle than ADD, both operations want to write back within the same cycle.
34 Structural Conflicts Solutions: By programmer / compiler Pipeline interlocking Resource multiplication (e.g. virtual registers, see superscalarity)
35 Control Conflicts Pipelining requires a known instruction stream. Instructions need to be started before the previous one has been finished. Problem: control flow instructions, e.g. conditional branches. The target, i.e. the next instruction, is not known until after the PD phase. Until then, many other instructions may already have been started and partially executed.
36 Control Conflicts [Diagram: JNZ target passes through FI DI FD PD WD; the instructions started directly after it may be useless; the new instruction stream starts after the branch has been executed] After the branch has been executed, the pipeline must be restarted. Intermediate instructions must be aborted. Their results must be dropped.
37 Control Conflicts Problem: Execution of a branch/jump: change program counter. Assumption: this is done in the n-th pipeline stage. Consequence: n - 1 instructions after the jump are already in the pipeline; n - 1 instructions must be cancelled (pipeline flush). Every fourth to sixth instruction is a jump!
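The cost of these flushes can be estimated with a back-of-the-envelope calculation, sketched here in Python. The function name is illustrative and the model is pessimistic on purpose: it assumes every jump costs the full n - 1 cancelled instructions and that nothing else stalls the pipeline.

```python
def effective_cpi(jump_fraction, resolve_stage, base_cpi=1.0):
    """Average cycles per instruction if every jump flushes n - 1 slots.

    jump_fraction: share of instructions that are jumps (e.g. 1/4 .. 1/6)
    resolve_stage: pipeline stage n in which the program counter is changed
    """
    return base_cpi + jump_fraction * (resolve_stage - 1)


if __name__ == "__main__":
    for fraction in (1 / 4, 1 / 5, 1 / 6):
        print(round(fraction, 3), effective_cpi(fraction, resolve_stage=5))
```

If every fourth instruction is a jump resolved in stage 5, this model doubles the cycles per instruction, which motivates the countermeasures on the following slides.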
38 Control Conflicts: Solutions Reduce n: execute jump early in the pipeline Use delay slots Branch prediction Avoid branches
39 Avoiding Branches by Predication Example: ARM architecture. Every instruction has a 4-bit condition code: Equal, not equal, unsigned higher or same, unsigned lower, ..., always. Conditional execution: instruction is only executed if condition holds true. Avoids many jumps resulting from if-statements. Instruction requires time even if condition is not true. Cannot be used for: long if-blocks; avoiding jumps belonging to loops.
40 ARM: Condition Flags & Codes Consider a simple fragment of C code:
  for (i = 10; i != 0; i--) { do_something(); }
A standard compiler would yield:
  mov r4, #10
loop_label:
  bl do_something
  sub r4, r4, #1
  cmp r4, #0
  bne loop_label
41 ARM: Condition Flags & Codes On an ARM architecture:
  mov r4, #10
loop_label:
  bl do_something
  subs r4, r4, #1
  bne loop_label
The s suffix causes the instruction (in this case sub) to update the flags itself based on its result. Example will be provided with exercises!
42 Conditional Execution on ARM Implemented with a 4-bit condition code selector. One of the four-bit codes is reserved as an "escape code" to specify certain unconditional instructions. However, nearly all common instructions are conditional. Example: Compute greatest common divisor (GCD) of two integers through Euclidean Algorithm
43 Euclidean Algorithm Find the Greatest Common Divisor (GCD) of two given numbers a and b. Basic algorithm:
  if (a == 0) return b;
  else
    while (b != 0)
      if (a > b) a = a - b;
      else b = b - a;
  return a;
44 Euclidean Algorithm on ARM
C code:
  while (b != 0)
    if (a > b) a = a - b;
    else b = b - a;
In ARM assembly language:
loop:
  CMP Ra, Rb        ; set condition: "NE" if (a != b), "GT" if (a > b), "LT" if (a < b)
  SUBGT Ra, Ra, Rb  ; if "GT", a = a - b
  SUBLE Rb, Rb, Ra  ; if "LE", b = b - a
  CMP Rb, #0
  BNE loop          ; if "NE" then loop
45 Versions of ADD on ARM Unconditional versions of ADD: ADD, ADDS (or ADDAL, ADDALS). Conditional versions of ADD: ADDEQ ADDEQS ADDNE ADDNES ADDCS ADDCSS ADDCC ADDCCS ADDMI ADDMIS ADDPL ADDPLS ADDVS ADDVSS ADDVC ADDVCS ADDHI ADDHIS ADDLS ADDLSS ADDGE ADDGES ADDLT ADDLTS ADDGT ADDGTS ADDLE ADDLES
46 Versions of ADD on ARM (Technische Universität München)
Code | Meaning (for cmp or subs) | Flags tested
EQ | Equal. | Z==1
NE | Not equal. | Z==0
CS or HS | Unsigned higher or same (or carry set). | C==1
CC or LO | Unsigned lower (or carry clear). | C==0
MI | Negative ("minus"). | N==1
PL | Positive or zero ("plus"). | N==0
VS | Signed overflow ("V set"). | V==1
VC | No signed overflow ("V clear"). | V==0
HI | Unsigned higher. | (C==1) && (Z==0)
LS | Unsigned lower or same. | (C==0) || (Z==1)
GE | Signed greater than or equal. | N==V
LT | Signed less than. | N!=V
GT | Signed greater than. | (Z==0) && (N==V)
LE | Signed less than or equal. | (Z==1) || (N!=V)
AL (omitted) | always |
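The predicated loop can be mirrored in Python to check the algorithm. In this sketch the comparison is evaluated once per iteration and both "conditional" subtractions depend on that single result, the way SUBGT and SUBLE both depend on the flags set by one CMP. The function name is illustrative, and the inputs are assumed to be non-negative integers.

```python
def gcd_subtract(a, b):
    """Subtraction-based Euclidean algorithm for non-negative integers."""
    if a == 0:
        return b
    while b != 0:
        greater = a > b        # corresponds to CMP Ra, Rb
        if greater:            # SUBGT Ra, Ra, Rb
            a = a - b
        if not greater:        # SUBLE Rb, Rb, Ra
            b = b - a
    return a


if __name__ == "__main__":
    print(gcd_subtract(48, 18))
```

Neither subtraction changes the stored comparison result, so exactly one of the two takes effect in each iteration, without any branch inside the loop body.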
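The flag tests in the table can be tried out with a small Python model. The helper names are made up for illustration. `cmp_flags` follows the ARM convention for subtraction, where C is set when no borrow occurs (a >= b as unsigned numbers). A 32-bit register width is assumed.

```python
def cmp_flags(a, b, bits=32):
    """Flags N, Z, C, V as set by an ARM-style compare (computes a - b)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = (a - b) & mask
    sign = bits - 1
    n = result >> sign
    z = int(result == 0)
    c = int(a >= b)  # carry set means "no borrow" on ARM
    # Signed overflow: operand signs differ and the result sign differs from a.
    v = int((a >> sign) != (b >> sign) and n != (a >> sign))
    return {"N": n, "Z": z, "C": c, "V": v}


# Condition code -> test on the flags, as listed in the table.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "HS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "LO": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}


def condition_holds(code, flags):
    """True if an instruction with this condition code would execute."""
    return CONDITIONS[code](flags)


if __name__ == "__main__":
    flags = cmp_flags(3, 5)
    print(flags, [code for code in CONDITIONS if condition_holds(code, flags)])
```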
47 Delay Slots Delay slots are the n - 1 instructions following a branch. On some architectures these are executed, even if the jump is taken. Example (assuming n == 2):
Conventional ISA:
loop:
  LOAD r2 ← [r1]
  ADD r0 ← r0, r2
  DEC r1
  JZ loop
ISA with delay slots:
loop:
  LOAD r2 ← [r1]
  DEC r1
  JZ loop
  ADD r0 ← r0, r2
If no independent instruction is found, a NOP must be inserted. Used e.g. in AM29000 microprocessors (only works for small n).
48 Branch Prediction Problem: conditional jumps are executed at a later stage in the pipeline Solution: predict whether branch is taken pipeline must be flushed if prediction was wrong. Two approaches: Static branch prediction (does not depend on earlier branches) Dynamic branch prediction (depends on branch history)
49 Branch Prediction Simplest case: always assume no branch. Better: always assume branch (about 2/3 of all branches are taken!). Even better: Backward branch: assume branch. Forward branch: assume no branch. Unconditional branch: assume branch. Alternative: Compiler gives hint whether branch is taken or not. Every conditional branch needs two opcodes.
50 Dynamic Branch Prediction Idea: remember if branch at address XXX was taken in the past or not.
loop:
  LOAD r2 ← [r1]
  ADD r0 ← r0, r2
  DEC r1
XXX: JZ loop
Branch only mispredicted in last iteration!
51 Dynamic Branch Prediction Branch prediction cache: N entries Tag: address of jump Entry: one bit (taken or not taken) Bits needed for LRU Stores the N most recently encountered jumps
52 Dynamic Branch Prediction More sophisticated approach: Assume branch sequence: TNTNTNTNTNT No branch predicted correctly! Solution (2 bits per entry required):
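The effect of the alternating sequence on a one-bit predictor, and the improvement from two bits per entry, can be checked with a short Python simulation. The two-bit scheme below is a saturating counter, a common choice that is assumed here; the state diagram on the original slide is not part of this transcript and may differ. Function names and initial states are illustrative.

```python
def correct_one_bit(outcomes, initially_taken=False):
    """Count correct predictions of a 1-bit 'same as last time' predictor."""
    state = initially_taken
    correct = 0
    for outcome in outcomes:
        taken = outcome == "T"
        if state == taken:
            correct += 1
        state = taken          # remember only the most recent outcome
    return correct


def correct_two_bit(outcomes, counter=0):
    """Count correct predictions of a 2-bit saturating counter (0..3).

    Counter values 2 and 3 predict 'taken', 0 and 1 predict 'not taken'.
    """
    correct = 0
    for outcome in outcomes:
        taken = outcome == "T"
        if (counter >= 2) == taken:
            correct += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct


if __name__ == "__main__":
    alternating = "TNTNTNTNTNT"
    print(correct_one_bit(alternating), "of", len(alternating))
    print(correct_two_bit(alternating), "of", len(alternating))

    # A loop branch taken nine times and then not taken, executed three times.
    loop = ("T" * 9 + "N") * 3
    print(correct_one_bit(loop, initially_taken=True), "of", len(loop))
    print(correct_two_bit(loop, counter=3), "of", len(loop))
```

On the alternating sequence the one-bit predictor, starting from "not taken", is wrong every time, while the counter gets the not-taken branches right. On the repeated loop the counter mispredicts once per loop execution instead of twice.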
54 Branch Target Cache Motivation: Branch prediction is not sufficient. Determining branch target (e.g. load from memory) takes too long! Even worse for indirect jumps and returns. Solution: Branch target cache. Maps location of branches to branch targets. IFETCH stage:
  if <current IP matches cache tag>: load cache entry into pipeline
  else: load next (IP+1) into pipeline
55 Branch Target Cache
56 Branch Target Cache
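The lookup performed in the IFETCH stage can be sketched in Python as a mapping from branch addresses to targets. The class and method names are hypothetical, and the sketch uses an unbounded dictionary, whereas a real branch target cache has a fixed number of entries and a replacement policy.

```python
class BranchTargetCache:
    """Maps the address of a branch to the target it jumped to last time."""

    def __init__(self):
        self.targets = {}

    def next_fetch_address(self, ip):
        """IFETCH: use the cached target on a hit, else fall through to IP+1."""
        return self.targets.get(ip, ip + 1)

    def record_taken_branch(self, ip, target):
        """Called once a taken branch at `ip` has been resolved."""
        self.targets[ip] = target


if __name__ == "__main__":
    cache = BranchTargetCache()
    print(cache.next_fetch_address(100))  # miss: next sequential address
    cache.record_taken_branch(100, 40)
    print(cache.next_fetch_address(100))  # hit: cached branch target
```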
57 Branch Target Cache Problem: Branch target cache does not work well for RET instructions. Solution (as introduced by Cyrix): CALL stores the address not only on the stack... but also in a branch stack (within the control unit). As long as the branch stack does not overflow... all returns can be correctly predicted.
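The branch stack for returns can be sketched in Python as a small bounded stack. The class name, the capacity parameter and the overflow policy (the oldest entry is dropped) are assumptions made for illustration; the slides do not specify how overflow is handled.

```python
class ReturnStack:
    """Bounded stack of return addresses kept inside the control unit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def on_call(self, return_address):
        """CALL: remember where the matching RET will jump back to."""
        if len(self.entries) == self.capacity:
            self.entries.pop(0)  # overflow: the oldest entry is lost
        self.entries.append(return_address)

    def predict_return(self):
        """RET: predicted target, or None if nothing is left to predict."""
        return self.entries.pop() if self.entries else None


if __name__ == "__main__":
    stack = ReturnStack(capacity=2)
    for address in (10, 20, 30):      # three nested calls, only two fit
        stack.on_call(address)
    print(stack.predict_return(), stack.predict_return(), stack.predict_return())
```

With three nested calls and room for two entries, the two innermost returns are predicted and the outermost one is not, which is the overflow case mentioned on the slide.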
More informationPreventing Stalls: 1
Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationCS152 Computer Architecture and Engineering. Complex Pipelines
CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the
More informationEECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)
Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static
More informationReal instruction set architectures. Part 2: a representative sample
Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length
More informationComputer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović
Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationSISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version:
SISTEMI EMBEDDED Computer Organization Pipelining Federico Baronti Last version: 20160518 Basic Concept of Pipelining Circuit technology and hardware arrangement influence the speed of execution for programs
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationControl Hazards. Prediction
Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationPipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard.
Calcolatori Elettronici e Sistemi Operativi Pipeline issues Hazards Pipeline issues Data hazard Control hazard Structural hazard Pipeline hazard: RaW Pipeline hazard: RaW 5 6 7 8 9 5 6 7 8 9 : add R,R,R
More informationCS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming
CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationCOSC4201 Pipelining. Prof. Mokhtar Aboelaze York University
COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC
More informationPipeline Architecture RISC
Pipeline Architecture RISC Independent tasks with independent hardware serial No repetitions during the process pipelined Pipelined vs Serial Processing Instruction Machine Cycle Every instruction must
More informationChapter. Out of order Execution
Chapter Long EX Instruction stages We have assumed that all stages. There is a problem with the EX stage multiply (MUL) takes more time than ADD MUL ADD We can clearly delay the execution of the ADD until
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationInstruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction
Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches
More informationAdvanced issues in pipelining
Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationOrange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction
More informationLecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2
Lecture 5: Instruction Pipelining Basic concepts Pipeline hazards Branch handling and prediction Zebo Peng, IDA, LiTH Sequential execution of an N-stage task: 3 N Task 3 N Task Production time: N time
More informationComplex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar
Complex Pipelining COE 501 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Diversified Pipeline Detecting
More informationPipelining and Vector Processing
Chapter 8 Pipelining and Vector Processing 8 1 If the pipeline stages are heterogeneous, the slowest stage determines the flow rate of the entire pipeline. This leads to other stages idling. 8 2 Pipeline
More informationAdvanced processor designs
Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The
More information