EECS 470 Lecture 4. Pipelining & Hazards II. Fall 2018 Jon Beaumont
1 Pipelining & Hazards II. Fall 2018. Jon Beaumont. Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin. Slide 1
2 Class Question: Which of the following best explains why pipelining results in speedup? a) Instructions are executed with shorter latency b) Clock period is reduced c) More instructions are executed at the same time d) Magnets Slide 2
3 Announcements and Readings. Reminder: Lab #2 due Thursday before lab; get checked off during GSI/IA OH. Verilog assignment #2 due Tuesday 9/25; submit to autograder by 11:59p. HW #1 due Tuesday 9/18 (tomorrow); submit through Gradescope by 11:59p. Arm at career fair. Slide 3
4 Last Time Baseline processor discussion Review 5-stage pipeline from EECS 370 Slide 4
5 Today: Hazards. Detection. Resolution: software (avoidance) and hardware (stalling, forwarding). Slide 5
6 Hazards: Instruction pipelines are not ideal, i.e., instructions in different stages can have dependencies. Suppose add 1 2 3 is followed by a nand that reads register 3: a RAW hazard! [Timing diagram, t0 to t5: the nand stalls behind the add.] Why does this discourage deep pipelines? Slide 6
7 Terminology. Dependency: anything that requires the result of a previous instruction. Pipeline Hazards: anything that might cause a delay for a given implementation; must ensure program dependences are not violated. Hazard Resolution: Static Method: performed at compile time in software; Dynamic Method: performed at run time using hardware. Pipeline Interlock: hardware mechanisms for dynamic hazard resolution; must detect and enforce dependences at run time. Slide 7
8 Types of Dependencies and Hazards. Data Dependence (both memory and register): True dependence (RAW): instruction must wait for all required input operands. Anti-dependence (WAR): later write must not clobber a still-pending earlier read. Output dependence (WAW): earlier write must not clobber an already-completed later write. The last two are also called name dependences. Control Dependence (aka Procedural Dependence): conditional branches may change instruction sequence; instructions after a conditional branch depend on its outcome. Structural Hazard: any limitation of hardware resources. Slide 8
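9 Hazard Distance: Necessary Conditions for Data Hazards. Instruction i is followed by instruction j; X and Y are the pipeline stages in which the register read and register write take place. WAW hazard: i: r_k <- _ (reg write), j: r_k <- _ (reg write). WAR hazard: i: _ <- r_k (reg read), j: r_k <- _ (reg write). RAW hazard: i: r_k <- _ (reg write), j: _ <- r_k (reg read). dist(i,j) <= dist(X,Y) => hazard!! dist(i,j) > dist(X,Y) => safe. Slide 9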
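The distance test on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the set-based instruction encoding, and the value dist(X,Y) = 2 used in the note below are assumptions, not something the slides specify.

```python
def dependence_kinds(i_reads, i_writes, j_reads, j_writes):
    """Classify register dependences from an earlier instruction i to a later one j.

    Each argument is a set of register numbers (a hypothetical encoding).
    """
    kinds = set()
    if i_writes & j_reads:
        kinds.add("RAW")   # j reads what i writes
    if i_reads & j_writes:
        kinds.add("WAR")   # j overwrites what i still has to read
    if i_writes & j_writes:
        kinds.add("WAW")   # both write the same register
    return kinds


def is_hazard(dist_ij, dist_xy):
    """A dependence only becomes a hazard when the two instructions are close enough."""
    return dist_ij <= dist_xy
```

For an add writing register 3 followed by a nand reading it, `dependence_kinds({1, 2}, {3}, {3, 4}, {5})` reports a RAW dependence. If dist(X,Y) were 2, back-to-back instructions would be a hazard (`is_hazard(1, 2)`), while two intervening instructions would make the pair safe (`is_hazard(3, 2)` is false).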
10 Handling Data Hazards. Avoidance (static): make sure there are no hazards in the code. Detect and Stall (dynamic): stall until earlier instructions finish. Detect and Forward (dynamic): get correct value from elsewhere in pipeline. Slide 10
11 Handling Data Hazards: Avoidance. Programmer/compiler must know implementation details. Insert nops between dependent instructions: add 1 2 3 (write R3 in cycle 5); nop; nop; nand (read R3 in cycle 6). Slide 11
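As a rough sketch of what a compiler pass for static avoidance might do, the Python below pads a program with nops until every consumer is far enough from its producer. The tuple encoding `(op, sources, dest)` and the default `min_distance=3` (chosen so the slide's example needs exactly two nops) are assumptions made for illustration.

```python
NOP = ("nop", (), None)


def insert_nops(program, min_distance=3):
    """Return a copy of program with nops inserted between dependent instructions.

    program: list of (op, sources, dest) tuples; dest is None if nothing is written.
    min_distance: smallest safe distance between a producer and its consumer
                  (implementation specific; 3 is an assumed value).
    """
    out = []
    for op, srcs, dest in program:
        needed = 0
        # Walk backwards over instructions already emitted, nearest first.
        for back, (_, _, prev_dest) in enumerate(reversed(out), start=1):
            if back >= min_distance:
                break  # anything further away is already safe
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, srcs, dest))
    return out
```

Applied to `[("add", (1, 2), 3), ("nand", (3, 4), 5)]`, the result is add, nop, nop, nand. A different implementation would need a different `min_distance`, which is the binary-compatibility problem with this approach.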
12 Problems with Avoidance. Binary compatibility: new implementations may require more nops. Code size: higher instruction cache footprint; longer binary load times; worse in machines that execute multiple instructions per cycle (Intel Itanium: 25-40% of instructions are nops). Slower execution: CPI = 1, but many instructions are nops. How to handle non-deterministic latencies, e.g., cache misses and floating-point? Slide 12
13 Handling Data Hazards: Detect & Stall. Detection: compare rega & regb with DestReg of preceding insn(s), using n-bit comparators. Stall: do not advance pipeline register for Fetch/Decode; pass nop to Execute. Slide 13
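The detection and stall rules above might be modeled as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, both rega and regb are treated as sources for every instruction (a real decoder would check the opcode), and only the destinations held in ID/EX and EX/Mem are compared.

```python
def must_stall(reg_a, reg_b, older_dests):
    """True if the instruction in decode reads a register an older instruction will write.

    older_dests: destination registers of the older in-flight instructions,
                 nearest first (None for a noop or an instruction that writes nothing).
    """
    return any(d is not None and d in (reg_a, reg_b) for d in older_dests)


def stall_cycles(reg_a, reg_b, older_dests):
    """Count how many cycles the instruction in decode is held."""
    window = list(older_dests)
    cycles = 0
    while must_stall(reg_a, reg_b, window):
        # Fetch/Decode do not advance; a noop enters Execute and older work moves on.
        window = [None] + window[:-1]
        cycles += 1
    return cycles
```

With an add writing register 3 directly ahead of a nand that reads registers 3 and 4, `stall_cycles(3, 4, [3, None])` gives two stall cycles, one while the add is in ID/EX and one while it is in EX/Mem.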
14 [Figure: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, WB) with PC, instruction memory, register file (R0-R7), ALU, data memory, and the IF/ID, ID/EX, EX/Mem, Mem/WB pipeline registers carrying op and dest fields.] Slide 14
15 [Figure: the same datapath, repeated.] Slide 15
16 [Figure: end of cycle 1. add 1 2 3 is in IF/ID.] Slide 16
17 [Figure: end of cycle 2. nand is in IF/ID; add is in ID/EX.] Slide 17
18 [Figure: first half of cycle 3. Hazard detection compares the nand's source register 3 with the add's destination register 3.] Slide 18
19 [Figure: hazard detected. Comparators check rega and regb against the destination fields held in the pipeline registers.] Slide 19
20 [Figure: hazard detected; comparator detail.] Slide 20
21 [Figure: first half of cycle 3. Hazard on register 3; the enable (en) signals hold the PC and IF/ID.] Slide 21
22 [Figure: end of cycle 3. nand is still in IF/ID; a noop is in ID/EX; add is in EX/Mem.] Slide 22
23 [Figure: first half of cycle 4. Hazard on register 3 again; the PC and IF/ID are held.] Slide 23
24 [Figure: end of cycle 4. nand is still in IF/ID; noops are in ID/EX and EX/Mem; add is in Mem/WB.] Slide 24
25 [Figure: first half of cycle 5. No hazard.] Slide 25
26 [Figure: end of cycle 5. The next add is in IF/ID; nand is in ID/EX; noops are in EX/Mem and Mem/WB.] Slide 26
27 Problems with Detect & Stall. CPI increases on every hazard. Are these stalls necessary? Not always! The new value for R3 is in the EX/Mem register; reroute the result to the nand. This is called forwarding or bypassing. Slide 27
28 Handling Data Hazards: Detect & Forward. Detection: same as detect and stall, but each possible hazard requires different forwarding paths. Forward: add data paths for all possible sources; add mux in front of ALU to select source. Bypassing logic is often a critical path in wide-issue machines; the number of paths grows quadratically with machine width. Slide 28
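The mux in front of the ALU can be pictured as a priority select over the possible sources. The sketch below assumes just two forwarding sources (EX/Mem and Mem/WB) and a hypothetical `(dest, value)` encoding for each pipeline register; the datapath in these slides may have more paths, and loads need extra handling because their data is not available until after the memory stage.

```python
def select_operand(src_reg, regfile_value, ex_mem=None, mem_wb=None):
    """Pick the value an ALU input should use for source register src_reg.

    ex_mem, mem_wb: (dest, value) of the instruction held in that pipeline
                    register, or None if it writes no register.
    """
    if ex_mem is not None and ex_mem[0] == src_reg:
        return ex_mem[1]      # youngest producer wins
    if mem_wb is not None and mem_wb[0] == src_reg:
        return mem_wb[1]      # older producer, not yet written back
    return regfile_value      # no hazard: the register file is up to date
```

Checking the younger producer first matters when two in-flight instructions write the same register: the consumer must see the most recent value in program order.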
29 Sample Code. Reminder: run the following code on a pipelined datapath. nand: reg 5 = reg 3 ~& reg 4. add: reg 7 = reg 6 + reg 3. lw: reg 6 = Mem[reg 3 + offset]. sw: Mem[reg 6 + offset] = reg 2. Slide 29
30 [Figure: first half of cycle 3. nand is in decode with a hazard on register 3; the pipeline registers carry forwarding (fwd) fields.] Slide 30
31 [Figure: end of cycle 3. The next add is in IF/ID; nand is in ID/EX, tagged H1; the first add is in EX/Mem.] Slide 31
32 [Figure: first half of cycle 4. The add in decode raises a new hazard.] Slide 32
33 [Figure: end of cycle 4. lw is in IF/ID; add (tagged H2), nand (tagged H1) and the first add are in ID/EX, EX/Mem and Mem/WB.] Slide 33
34 [Figure: first half of cycle 5. lw is in decode; no hazard.] Slide 34
35 [Figure: end of cycle 5. sw is in IF/ID; lw, add (H2) and nand (H1) are in ID/EX, EX/Mem and Mem/WB.] Slide 35
36 [Figure: first half of cycle 6. sw is in decode with a hazard on register 6, the lw's destination; the enable (en) signals stall the pipeline.] Slide 36
37 [Figure: end of cycle 6. sw is still in IF/ID; a noop is in ID/EX; lw is in EX/Mem; add (H2) is in Mem/WB.] Slide 37
38 [Figure: first half of cycle 7. The hazard on register 6 is detected again for the sw.] Slide 38
39 [Figure: end of cycle 7. sw (tagged H3), noop and lw are in ID/EX, EX/Mem and Mem/WB.] Slide 39
40 [Figure: first half of cycle 8.] Slide 40
41 [Figure: end of cycle 8. sw is in EX/Mem; the noop is in Mem/WB.] Slide 41
42 Control Hazards. Example: a beq followed by a sub. [Timing diagram, t0 to t5: both instructions pass through F, D, E, M, W; the sub fetched behind the beq is squashed.] Slide 42
43 Handling Control Hazards. Avoidance (static): no branches? Convert branches to predication; control dependence becomes data dependence. Detect and Stall (dynamic): stop fetch until branch resolves. Speculate and Squash (dynamic): keep going past branch, throw away instructions if wrong. Slide 43
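44 Avoidance: if-conversion. Source: if (a == b) { x++; y = n / d; } With a branch: sub t1 <- a, b; jnz t1, PC+2; add x <- x, #1; div y <- n, d. With predication: sub t1 <- a, b; add(t1) x <- x, #1; div(t1) y <- n, d. With conditional moves: sub t1 <- a, b; add t2 <- x, #1; div t3 <- n, d; cmov(t1) x <- t2; cmov(t1) y <- t3. Slide 44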
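To see that the conditional-move form computes the same result as the branch form, the two can be written side by side in Python. This is an illustrative model rather than real machine code: the function names are made up, Python's `//` stands in for the integer divide, and the conditional moves are modeled with conditional expressions.

```python
def with_branch(a, b, x, y, n, d):
    """Branch version: the add and div run only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def with_cmov(a, b, x, y, n, d):
    """If-converted version: compute everything, then select the results."""
    t1 = a - b                   # sub t1 <- a, b
    t2 = x + 1                   # add t2 <- x, #1 (always executed)
    t3 = n // d                  # div t3 <- n, d (always executed)
    x = t2 if t1 == 0 else x     # conditional move into x
    y = t3 if t1 == 0 else y     # conditional move into y
    return x, y
```

The control dependence on `a == b` has become a data dependence on `t1`. One visible cost in this model is that the divide now runs even when its result is discarded, so `with_cmov` raises `ZeroDivisionError` for `d == 0` in cases where `with_branch` would not.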
45 Handling Control Hazards: Detect & Stall. Detection: in decode, check if opcode is branch or jump. Stall: hold next instruction in Fetch; pass noop to Decode. Slide 45
46 Problems with Detect & Stall. CPI increases on every branch. Are these stalls necessary? Not always! Branch is only taken half the time. Assume branch is NOT taken: keep fetching, treat branch as noop. If wrong, make sure bad instructions don't complete. Slide 46
47 Handling Control Hazards: Speculate & Squash. Speculate: assume branch is not taken. Squash: overwrite opcodes in Fetch, Decode, Execute with noop; pass target to Fetch. Slide 47
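A rough way to compare stalling with speculate-and-squash is to fold the branch penalty into CPI. The model below is a simplification with assumed inputs: a base CPI of 1, the same penalty for a stall and for a squash, no other hazards, and a hypothetical branch frequency. The 3-cycle penalty in the note mirrors the three squashed stages, and the 50% taken rate is the figure quoted on the previous slide.

```python
def cpi_stall(branch_frac, penalty):
    """Every branch pays the full penalty while fetch waits for it to resolve."""
    return 1.0 + branch_frac * penalty


def cpi_squash(branch_frac, taken_frac, penalty):
    """Predict not-taken: only taken branches pay the penalty."""
    return 1.0 + branch_frac * taken_frac * penalty
```

If 20% of instructions were branches (an assumed figure), `cpi_stall(0.2, 3)` comes to roughly 1.6 while `cpi_squash(0.2, 0.5, 3)` comes to roughly 1.3, so speculation removes half of the branch overhead in this model.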
48 [Figure: speculate-and-squash datapath with beq, sub, add and nand in flight; noops can be written into IF/ID, ID/EX and EX/Mem.] Slide 48
49 Problems with Speculate & Squash. Always assumes branch is not taken. Can we do better? Yes: predict branch direction and target! Why is this possible? Program behavior repeats. More on branch prediction to come... Slide 49
50 Pipeline Hazard Checklist. Memory Data Dependences: Output Dependence (WAW), Anti Dependence (WAR), True Data Dependence (RAW). Register Data Dependences: Output Dependence (WAW), Anti Dependence (WAR), True Data Dependence (RAW). Control Dependences. Slide 50
51 Instruction Level Parallelism Slide 51
52 Limitations of Scalar Pipelines. Upper bound on scalar pipeline throughput: limited by IPC = 1 (Flynn Bottleneck). Inefficient unification into single pipeline: long latency for each instruction. Performance lost due to rigid in-order pipeline: unnecessary stalls. Slide 52
53 Architectures for Instruction-Level Parallelism Slide 53
54 Superscalar Machine Slide 54
55 What is the real problem? CPI of in-order pipelines degrades very sharply if the machine parallelism is increased beyond a certain point, i.e., when N x M approaches the average distance between dependent instructions. Forwarding is no longer effective. Pipeline may never be full due to frequent dependency stalls! Slide 55
56 ILP: Instruction-Level Parallelism. ILP is a measure of the amount of inter-dependencies between instructions. Average ILP = no. of instructions / no. of cycles required. code1: ILP = 1, i.e., must execute serially. code2: ILP = 3, i.e., can execute at the same time. code1: r1 <- r2 + 1; r3 <- r1 / 17; r4 <- r0 - r3. code2: r1 <- r2 + 1; r3 <- r9 / 17; r4 <- r0 - r10. Slide 56
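The definition of average ILP can be tried out with a small dataflow model. The sketch below assumes one cycle per instruction, unlimited hardware, and only true (RAW) dependences; the `(dest, sources)` encoding and the register names in the note are illustrative choices that mirror the dependence pattern of the slide's two snippets.

```python
def average_ilp(instrs):
    """Instructions divided by cycles, when each runs as soon as its inputs are ready.

    instrs: list of (dest, sources) pairs in program order.
    """
    if not instrs:
        return 0.0
    ready = {}        # register -> cycle after which its newest value is available
    last_cycle = 0
    for dest, srcs in instrs:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + 1            # one cycle per instruction (assumption)
        ready[dest] = finish
        last_cycle = max(last_cycle, finish)
    return len(instrs) / last_cycle
```

A chain in which each instruction consumes the previous result, such as `[("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]`, needs three cycles and gives an ILP of 1. Three independent instructions, such as `[("r1", ["r2"]), ("r3", ["r9"]), ("r4", ["r0", "r10"])]`, fit in one cycle and give an ILP of 3.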
57 Purported Limits on ILP. Weiss and Smith [1984]: 1.58; Sohi and Vajapeyam [1987]: 1.81; Tjaden and Flynn [1970]: 1.86; Tjaden and Flynn [1973]: 1.96; Uht [1986]: 2.00; Smith et al. [1989]: 2.00; Jouppi and Wall [1988]: 2.40; Johnson [1991]: 2.50; Acosta et al. [1986]: 2.79; Wedig [1982]: 3.00; Butler et al. [1991]: 5.8; Melvin and Patt [1991]: 6; Wall [1991]: 7; Kuck et al. [1972]: 8; Riseman and Foster [1972]: 51; Nicolau and Fisher [1984]: 90. Slide 57
58 Scope of ILP Analysis. ILP = 1: r1 <- r2 + 1; r3 <- r1 / 17; r4 <- r0 - r3. ILP = 3: r11 <- r12 + 1; r13 <- r19 / 17; r14 <- r0 - r20. Taken together: ILP = 1.5, or more accurately 2.0. Out-of-order execution exposes more ILP. Slide 58
59 The Problem With In-Order Pipelines. addf f0,f1,f2: F D E E E W. mulf f2,f3,f2: F D d* d* E* E* E* E* E* W. subf f0,f1,f4: F p* p* D E E E W. What's happening in cycle 4? mulf stalls due to RAW hazard: OK, this is a fundamental problem. subf stalls due to pipeline hazard. Why? subf can't proceed into D because mulf is there. That is the only reason, and it isn't a fundamental one. Why can't subf go into D in cycle 4 and E in cycle 5? Slide 59
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)
Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case
More informationEECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)
Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static
More informationCPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor
Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationLecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S
Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching
More informationFour Steps of Speculative Tomasulo cycle 0
HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly
More informationProcessor (II) - pipelining. Hwansoo Han
Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationPipelined Processor Design
Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationThis Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods
10-1 Dynamic Scheduling 10-1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods Not yet complete. (Material below may
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationCMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture Instruction Level Parallelism Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson /
More informationChapter 06: Instruction Pipelining and Parallel Processing
Chapter 06: Instruction Pipelining and Parallel Processing Lesson 09: Superscalar Processors and Parallel Computer Systems Objective To understand parallel pipelines and multiple execution units Instruction
More informationChapter 4. The Processor
Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationThe Processor: Improving the performance - Control Hazards
The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More information