CS 152 Computer Architecture and Engineering

1 CS 152 Computer Architecture and Engineering, Lecture 6: Superpipelining + Branch Prediction. John Lazzaro (not a prof - John is always OK). TA: Eric Love. www-inst.eecs.berkeley.edu/~cs152/ CS 152: L6: Superpipelining + Branch Prediction, UC Regents Spring 2014 UCB

2 Today: First advanced processor lecture. Super-pipelining: Beyond 5 stages. Short Break. Branch prediction: Can we escape control hazards in long CPU pipelines?

3 From Appendix C: Filling the branch delay slot

4 Superpipelining. CS L9: Advanced Processors I, UC Regents Fall 2008 UCB

5 5-Stage Pipeline: A point of departure. $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$. Pipeline: IM, Reg, ALU, DM, Reg. At best, the 5-stage pipeline executes one instruction per clock, with a clock period determined by the slowest stage. This best case assumes: perfect caching; a processor with no multi-cycle instructions (ex: multiply with an accumulate register); all delay slots (branch, load) filled.

6 Superpipelining: Add more stages. $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$ (slide annotations on the equation: "Today!" and "Also, power!"). Goal: Reduce critical path by adding more pipeline stages. Example: 8-stage ARM XScale: extra IF, ID, data cache stages. Difficulties: Added penalties for load delays and branch misses. Ultimate limiter: as logic delay goes to 0, flip-flop clk-to-q and setup times remain.
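
A rough way to see the "ultimate limiter" is to model the clock period as the logic delay split evenly across stages, plus a fixed flip-flop overhead per stage. The sketch below assumes perfectly balanced stages; the function name and every delay value (in picoseconds) are made up for illustration and do not come from the lecture.

```python
def clock_period_ps(total_logic_ps, stages, clk_to_q_ps, setup_ps):
    """Clock period when total_logic_ps of logic is split evenly over `stages`."""
    return total_logic_ps / stages + clk_to_q_ps + setup_ps

# Hypothetical delays, for illustration only.
LOGIC_PS, CLK_TO_Q_PS, SETUP_PS = 4000.0, 60.0, 40.0

for n in (1, 5, 8, 20, 1000):
    period = clock_period_ps(LOGIC_PS, n, CLK_TO_Q_PS, SETUP_PS)
    print(f"{n:5d} stages: {period:8.1f} ps per cycle")
```

However many stages are added, the period in this model never drops below clk-to-q plus setup.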

7 5 Stage vs. 8 Stage. 5-stage pipeline: IF, ID+RF, EX, MEM, WB. In the 8-stage pipeline: IF now takes 2 stages (pipelined I-cache). ID and RF each get a stage. ALU split over 3 stages. MEM takes 2 stages (pipelined D-cache). Note: Some stages now overlap, some instructions take extra stages.

8 Superpipelining techniques... Split ALU and decode logic over several pipeline stages. Pipeline memory: Use more banks of smaller arrays, add pipeline stages between decoders, muxes. Remove rarely-used forwarding networks that are on critical path (creates stalls, affects CPI). Pipeline the wires of frequently used forwarding networks. Also: Clocking tricks (example: use positive-edge AND negative-edge flip-flops).

9 Recall: IBM Power Timing Closure. "Pipeline engineering" happens here: about 1/3 of project schedule. From "The circuit and physical design of the POWER4 microprocessor", IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.

10 Pipelining a 256 byte instruction memory. Fully combinational (and slow). Only read behavior shown. A7-A0: 8-bit read address. Three address bits (A7, A6, A5) feed a DEMUX that drives the output enables (OE) of the registers, which have tri-state Q outputs. Each register holds 32 bytes (256 bits): Byte 0-31, and so on. Three more address bits (A4, A3, A2) feed a MUX that selects the data output, which is 32 bits (D0-D31), i.e. 4 bytes. Can we add two pipeline stages?
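
The read path of this memory can be modeled in a few lines. This is a minimal sketch, assuming A7-A5 select the register and A4-A2 select the 32-bit word within it; the little-endian byte order, the test contents, and the names `read_word` and `memory` are illustrative assumptions rather than details from the slide.

```python
REG_BYTES = 32   # each register holds 32 bytes (256 bits)
NUM_REGS = 8     # 8 registers x 32 bytes = 256 bytes

def read_word(registers, addr):
    """Return the 32-bit word selected by an 8-bit read address."""
    reg_select = (addr >> 5) & 0b111    # assumed: A7-A5 enable one register (DEMUX -> OE)
    word_select = (addr >> 2) & 0b111   # assumed: A4-A2 pick one of 8 words (MUX)
    row = registers[reg_select]         # the register whose tri-state outputs are enabled
    chunk = row[4 * word_select: 4 * word_select + 4]
    return int.from_bytes(chunk, "little")  # byte order is an assumption

# Fill byte i of the memory with the value i so reads are easy to check.
memory = [bytes(range(r * REG_BYTES, (r + 1) * REG_BYTES)) for r in range(NUM_REGS)]
print(hex(read_word(memory, 0x24)))
```

In this model the two selection steps (register enable, then word mux) are the natural places where pipeline registers could be inserted.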

11 On a chip: Registers become SRAM cells. Architects specify number of rows and columns. Word and bit lines slow down as array grows larger! [Figure: SRAM array. An address decoder (A0-A3) drives word lines Word 0 through Word 15; each column has a write driver and precharger at the top (Din 0-3, WrEn, Precharge), a stack of SRAM cells, and a sense amp at the bottom (Dout 0-3); parallel data I/O lines.] Add muxes here to select subset of bits. How could we pipeline this memory? See last slide. Q: Which is longer: word line or bit line?

12 RISC CPU: 5.85 million devices; 0.65 million devices.

13 IC processes are optimized for small SRAM cells. From Marvell ARM CPU paper: 90% of the 6.5 million transistors, and 60% of the chip area, are devoted to cache memories. Implication? SRAM is 6X as dense as logic.
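
The 6X figure follows from the two percentages alone. A small sketch of the arithmetic, with `density_ratio` as an illustrative helper name:

```python
def density_ratio(frac_transistors, frac_area):
    """Transistors-per-area inside a region, relative to the rest of the chip."""
    inside = frac_transistors / frac_area
    outside = (1 - frac_transistors) / (1 - frac_area)
    return inside / outside

# Cache holds 90% of the transistors in 60% of the area.
ratio = density_ratio(0.90, 0.60)
print(f"SRAM is about {ratio:.1f}x as dense as logic")
```

That is, (0.90 / 0.60) divided by (0.10 / 0.40) is 1.5 / 0.25.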

14 RAM Compilers. On average, 30% of a modern logic chip is SRAM, which is generated by RAM compilers. Compile-time parameters set number of bits, aspect ratio, ports, etc. [Figure: 45-degree image of 22 nm tri-gate LVC SRAM bitcell.] [Figure: 22 nm HDC and LVC tri-gate SRAM bitcells.] Paper excerpt shown on the slide (partly cut off): the 22 nm process technology includes both a high-density 6T SRAM bitcell (HDC) and a low-voltage 6T SRAM bitcell (LVC) to support tradeoffs in performance and minimum operating voltage across application requirements. CS 250 L1: Fab/Design Interface, UC Regents Fall 2013 UCB

15 ALU: Pipelining Unsigned Multiply. [Figure: worked binary example with rows labeled Multiplicand, Multiplier, and Partial products.] Facts to remember: m bits x n bits = m+n bit product. Binary makes it easy: 0 => place 0 (0 x multiplicand); 1 => place a copy (1 x multiplicand).
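
The "place 0 or place a copy" rule can be written out directly. This is a minimal sketch in software, not the hardware datapath; the function names and the 4-bit operands in the usage line are illustrative choices.

```python
def partial_products(multiplicand, multiplier, n_bits):
    """One partial product per multiplier bit, already shifted into position."""
    products = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        # 1 => place a copy of the multiplicand, 0 => place 0.
        products.append((multiplicand << i) if bit else 0)
    return products

def multiply(multiplicand, multiplier, n_bits):
    """Unsigned product as the sum of the partial products."""
    return sum(partial_products(multiplicand, multiplier, n_bits))

print(bin(multiply(0b1000, 0b1001, 4)))
```

With m-bit and n-bit operands the sum always fits in m+n bits, which is why the product register needs that width.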

16 Building Block: Full-Adder Variant. 1-bit signals: x, y, z, s, Cin, Cout. x: one bit of multiplicand. y: one bit of the running sum. z: one bit of multiplier. If z = 1, {Cout, s} <= x + y + Cin. If z = 0, {Cout, s} <= y + Cin. ({Cout, s} is Verilog for a 2-bit entity; <= is a Verilog assign.)

17 Put it together: Array computes P = A x B. [Figure: array of full-adder variants. Multiplicand bits A3-A0 enter on the x inputs, multiplier bits B0-B3 on the z inputs, running sums on the y inputs; Cout chains run along each row; product bits P7-P0 emerge at the array edge.] Fully combinational implementation is slow! To pipeline array: Place registers between adder stages (in green). Add registers to delay selected A and B bits (not shown).
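
As a bit-level illustration, the sketch below builds a 4 x 4 unsigned multiplier from the full-adder variant, one row of cells per multiplier bit. It is a software model under assumed wiring (ripple carry within a row, running sum shifted one position between rows), not a transcription of the slide's figure; `cell` and `array_multiply` are illustrative names.

```python
def cell(x, y, z, cin):
    """Full-adder variant: adds x only when the multiplier bit z is 1."""
    total = (x if z else 0) + y + cin
    return total >> 1, total & 1          # (Cout, s)

def array_multiply(a, b, n=4):
    """Unsigned n-bit x n-bit multiply built from rows of cells."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    running = [0] * n                     # running sum bits, LSB first
    product_bits = []
    for z in b_bits:                      # one adder row per multiplier bit
        carry, row = 0, []
        for j in range(n):
            carry, s = cell(a_bits[j], running[j], z, carry)
            row.append(s)
        product_bits.append(row[0])       # lowest bit of each row is final
        running = row[1:] + [carry]       # the rest feeds the next row, shifted
    product_bits.extend(running)          # upper half of the product
    return sum(bit << i for i, bit in enumerate(product_bits))

print(array_multiply(13, 11))
```

In this model a pipeline register between adder stages corresponds to latching `running` and `product_bits` at a row boundary, with the not-yet-used A and B bits delayed alongside them.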

18 Adding pipeline stages is not enough... MIPS R4000: Simple 8-stage pipeline. 2-cycle load delay, 3-cycle branch delay. Branch stalls are the main reason why pipeline CPI > 1. (Appendix C, Figure C.52)
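
To get a feel for how these delays raise CPI, here is a back-of-the-envelope sketch. It assumes every taken branch pays the full 3-cycle branch delay and every load followed by a dependent instruction pays the full 2-cycle load delay, with nothing hidden by delay slots or scheduling. The instruction-mix fractions are hypothetical, not measurements from the lecture or the textbook.

```python
def pipeline_cpi(branch_frac, taken_frac, branch_delay,
                 load_frac, load_use_frac, load_delay):
    """Ideal CPI of 1 plus average stall cycles per instruction."""
    branch_stalls = branch_frac * taken_frac * branch_delay
    load_stalls = load_frac * load_use_frac * load_delay
    return 1.0 + branch_stalls + load_stalls

# Hypothetical mix: 20% branches (60% taken), 25% loads (40% used immediately).
cpi = pipeline_cpi(branch_frac=0.20, taken_frac=0.60, branch_delay=3,
                   load_frac=0.25, load_use_frac=0.40, load_delay=2)
print(f"CPI = {cpi:.2f}")
```

With this assumed mix the branch term is larger than the load term, which matches the slide's point about branch stalls.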

19 Branch Prediction

20 Add pipeline stages, reduce clock period (ARM XScale: 8 stages). $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$. Q. Could adding pipeline stages hurt the CPI for an application? A. Yes, due to these problems. CPI problem: taken branches cause longer stalls. Possible solution: branch prediction, loop unrolling. CPI problem: cache misses take more clock cycles. Possible solution: larger caches, add prefetch opcodes to ISA.

21 Recall: Control hazards... [Figure: 5-stage datapath, IF (Fetch), ID (Decode), EX (ALU), MEM, WB, with the PC, a +0x4 adder, and the I-Cache/Instr Mem.] We avoid stalling by (1) adding a branch delay slot, and (2) adding a comparator to the ID stage. If we add more early stages, we must stall. Sample program (ISA w/o branch delay slot): I1: BEQ R4,R3,25. I2: AND R6,R5,R4. I3: SUB R1,R9,R8. Timing (t1 to t8): I1: IF (t1), ID (t2), EX (t3), MEM (t4), WB (t5). I2: IF (t2), ID (t3). I3: IF (t3). The EX stage computes if the branch is taken. If branch is taken, these instructions (I2, I3) MUST NOT complete!

22 Solution: Branch prediction... [Figure: 5-stage datapath, IF (Fetch), ID (Decode), EX (ALU), MEM, WB, with a Branch Predictor next to the PC.] We update the PC based on the outputs of the branch predictor. If it is perfect, pipe stays full! Dynamic Predictors: a cache of branch history. Branch Predictor input: What PC? Predictions: A control instr? Taken or Not Taken? If taken, where to? Timing (t1 to t8): I1: IF (t1), ID (t2), EX (t3), MEM (t4), WB (t5). I2: IF (t2), ID (t3). I3: IF (t3). The EX stage computes if the branch is taken. If we predicted incorrectly, these instructions (I2, I3) MUST NOT complete!

23 Branch predictors cache branch history. Address of branch instruction (BNEZ R1 Loop): 0b0110[...]0010. Branch Target Buffer (BTB): each entry holds a 30-bit address tag (0b0110[...]) and a target address (PC of Loop); the tags are compared (=) with the 30 address bits to produce Hit, and the matching entry supplies the "Taken" address. Branch History Table (BHT): 2 state bits per entry, giving Taken or Not Taken. At EX stage, update BTB/BHT, kill instructions, if necessary. Drawn as fully associative to focus on the essentials. In real designs, always direct-mapped.

24 Branch predictor: direct-mapped version. Address of BNEZ instruction (BNEZ R1 Loop): 0b011[..]010[..]100. Part of the address indexes the 4096 BTB/BHT entries. Branch Target Buffer (BTB): the indexed entry's 18-bit address tag (0b011[...]01) is compared (=) with 18 bits of the address to produce Hit, and the entry supplies the target address (PC of Loop) as the "Taken" address. Branch History Table (BHT): Taken or Not Taken. Must check prediction, kill instruction if needed. Update BHT/BTB for next time, once true behavior known. [..]% accurate. As in real life... direct-mapped.
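
A direct-mapped BTB lookup can be sketched as below. The split is an assumption consistent with the slide's numbers: a 32-bit PC with 4-byte instructions leaves a 30-bit word address, 4096 entries need 12 index bits, and the remaining 18 bits form the tag. The class and method names are illustrative, and the BHT state bits are left out to keep the sketch short.

```python
INDEX_BITS = 12                   # 4096 entries
NUM_ENTRIES = 1 << INDEX_BITS

class DirectMappedBTB:
    def __init__(self):
        self.tags = [None] * NUM_ENTRIES
        self.targets = [0] * NUM_ENTRIES

    @staticmethod
    def _split(pc):
        word_addr = pc >> 2                     # drop byte offset: 30-bit word address
        index = word_addr & (NUM_ENTRIES - 1)   # low 12 bits pick the entry
        tag = word_addr >> INDEX_BITS           # remaining 18 bits are the tag
        return index, tag

    def lookup(self, pc):
        """Return the predicted target on a hit, or None on a miss."""
        index, tag = self._split(pc)
        return self.targets[index] if self.tags[index] == tag else None

    def update(self, pc, target):
        """Record the resolved target once the true behavior is known."""
        index, tag = self._split(pc)
        self.tags[index] = tag
        self.targets[index] = target

btb = DirectMappedBTB()
btb.update(0x1000, 0x0F00)
print(btb.lookup(0x1000), btb.lookup(0x5000))
```

Two branches whose addresses share an index evict each other, which is the cost of avoiding the fully associative comparators.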

25 Simple ("2-bit") Branch History Table Entry. The entry is two flip-flops (D Q). Prediction bit: prediction for next branch (1 = take, 0 = not take). Initialize to 0. "Last prediction correct" bit: was last prediction correct? (1 = yes, 0 = no). Initialize to 1. After we check prediction... Prediction bit: flip if prediction is not correct and the "last prediction correct" bit is 0. "Last prediction correct" bit: set to 1 if prediction bit was correct; set to 0 if prediction bit was incorrect; set to 1 if prediction bit flips. We do not change the prediction the first time it is incorrect. Why? Example: ADDI R4,R0,11; loop: SUBI R4,R4,1; BNE R4,R0,loop. This branch is taken 10 times, then not taken once (end of loop). The next time we enter the loop, we would like to predict "take" the first time through.
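
The update rules can be checked by simulating one entry on the loop branch. This is a minimal sketch following the rules as stated on the slide; the class and function names are illustrative.

```python
class TwoBitEntry:
    def __init__(self):
        self.predict_taken = False   # prediction bit, initialized to 0
        self.last_correct = True     # "last prediction correct" bit, initialized to 1

    def update(self, taken):
        """Apply the update rules; return True if the prediction was right."""
        correct = (self.predict_taken == taken)
        if correct:
            self.last_correct = True
        elif self.last_correct:
            self.last_correct = False                     # first miss: keep prediction
        else:
            self.predict_taken = not self.predict_taken   # second miss in a row: flip
            self.last_correct = True
        return correct

def count_mispredictions(outcomes):
    entry = TwoBitEntry()
    return sum(0 if entry.update(taken) else 1 for taken in outcomes)

loop_pass = [True] * 10 + [False]    # taken 10 times, then not taken once
print(count_mispredictions(loop_pass * 3), "mispredictions in", 3 * len(loop_pass), "branches")
```

After the warm-up pass, the single not-taken outcome at loop exit costs one misprediction but leaves the prediction at "take", so re-entering the loop predicts correctly.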

26 4096-entry 2-bit predictor: 80-90% accurate (Figure C.19).

27 Branch Prediction: Trust, but verify... [Figure: datapath with Instr Fetch, Decode & Reg Fetch, and Execute stages: PC, +4, Predicted PC logic, I-Cache, IR registers, RegFile, Ext, ALU.] Instr Fetch: the Branch Predictor and BTB take the PC (What PC?) and supply predictions (A branch instr? Taken or Not Taken? If taken, where to?). Decode & Reg Fetch: Note instruction type and branch target. Pass to next stage. Execute: the ALU produces Branch Taken/Not Taken. Check all predictions. Take actions if needed (kill instructions, update predictor). Prediction info is passed down the pipeline with the instruction.

28 Flowchart control for dynamic branch prediction. Figure

29 Spatial Predictors. [Figure: C code snippet containing branches b1, b2, b3, and the same code after compilation.] We want to predict this branch (b3, a BEQZ). Can b1 and b2 help us predict it? Idea: Devote hardware to four 2-bit predictors for BEQZ branch. P1: Use if b1 and b2 not taken. P2: Use if b1 taken, b2 not taken. P3: Use if b1 not taken, b2 taken. P4: Use if b1 and b2 taken. Track the current taken/not-taken status of b1 and b2, and use it to choose from P1... P4 for BEQZ... How?

30 Branch History Register: Tracks global history. [Figure: datapath as on the "Trust, but verify" slide, with a Branch History Register added at the Execute stage.] Branch History Register: a shift register (two D flip-flops with write enables, WE) fed by Branch Taken/Not Taken. Holds taken/not-taken status of last 2 branches. We choose which predictor to use (and update) based on the Branch History Register.

31 Spatial branch predictor (BTB, tag not shown). Branch address: 0b0110[...] (BEQZ R3 L3). Map PC to index into four Branch History Tables, P1, P2, P3, P4, each with 2 state bits per entry. The Branch History Register holds the outcomes of the (aa==2) branch and the (bb==2) branch, and drives a mux to choose which branch predictor supplies Taken or Not Taken for the (aa != bb) branch. Detects patterns in code. Yeh and Patt. [..]% accurate.
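
The selection scheme can be sketched as four tables of 2-bit entries indexed by PC, with a 2-bit global history register choosing the table. This is a simplified model under assumptions: the PC-to-index mapping (word address modulo the table size) is a guess, the entry update rule reuses the simple "2-bit" scheme from earlier in the lecture, and all names are illustrative.

```python
class TwoBitEntry:
    def __init__(self):
        self.predict_taken = False   # prediction bit, initialized to 0
        self.last_correct = True     # "last prediction correct" bit, initialized to 1

    def update(self, taken):
        if self.predict_taken == taken:
            self.last_correct = True
        elif self.last_correct:
            self.last_correct = False                     # first miss: keep prediction
        else:
            self.predict_taken = not self.predict_taken   # second miss in a row: flip
            self.last_correct = True

class SpatialPredictor:
    def __init__(self, entries_per_table=1024, history_bits=2):
        self.entries = entries_per_table
        self.history_mask = (1 << history_bits) - 1
        self.history = 0   # branch history register, most recent outcome in bit 0
        self.tables = [[TwoBitEntry() for _ in range(entries_per_table)]
                       for _ in range(1 << history_bits)]

    def _entry(self, pc):
        index = (pc >> 2) % self.entries          # assumed PC-to-index mapping
        return self.tables[self.history][index]   # history acts as the mux select

    def predict(self, pc):
        return self._entry(pc).predict_taken

    def update(self, pc, taken):
        self._entry(pc).update(taken)             # train the predictor that was used
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

predictor = SpatialPredictor()
hits = []
for i in range(30):
    taken = (i % 2 == 0)                          # alternating taken / not taken
    hits.append(predictor.predict(0x400) == taken)
    predictor.update(0x400, taken)
print(sum(hits), "of", len(hits), "predicted correctly")
```

An alternating branch defeats a single 2-bit entry, but here each history pattern trains its own entry, so the predictor settles after a few iterations.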

32 Performance. One BHT (4096 entries). Spatial (4 BHTs, each with 1024 entries): 95% accurate. (4096 vs 1024? Fair comparison, matches total # of bits.)

33 Predict function returns by stacking call info. [Figure: block diagram with program counter, branch history tables, branch prediction, return stack, target cache, alternate.]
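
A return stack can be sketched as a small bounded stack of return addresses. This is a minimal model, assuming 4-byte instructions, no branch delay slot, and a made-up depth of 8; the class and method names are illustrative.

```python
class ReturnAddressStack:
    def __init__(self, depth=8):           # depth is a hypothetical choice
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc, instr_bytes=4):
        """Push the address of the instruction after the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)              # overflow: drop the oldest entry
        self.stack.append(call_pc + instr_bytes)

    def predict_return(self):
        """Pop the predicted return target, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x100)                         # outer call
ras.on_call(0x200)                         # nested call
print(hex(ras.predict_return()), hex(ras.predict_return()))
```

A BTB alone predicts the same target every time for a given return instruction; the stack instead tracks which call site is currently active.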

34 Hardware limits to superpipelining? [Plot: CPU clock periods, in FO4 delays. Legend: intel 386, intel 486, intel pentium, intel pentium 2, intel pentium 3, intel pentium 4, intel itanium, Alpha, Sparc, SuperSparc, Sparc64, Mips, HP PA, Power PC, AMD K6, AMD K7, AMD x86-64.] FO4: How many fanout-of-4 inverter delays in the clock period. Historical limit: about 12 FO4s. Annotated points: MIPS [..] stages; Pentium Pro 10 stages; Pentium 4 20 stages. Power wall: Intel Core Duo has 14 stages. Thanks to Francois Labonte, Stanford. CS 250 L3: Timing, UC Regents Fall 2013 UCB

35 CPU DB: Recording Microprocessor History. With this open database, you can mine microprocessor trends over the past 40 years. Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz, Stanford University. [Plot: FO4 delays per cycle for processor designs; y-axis: FO4 / cycle.] FO4 delay per cycle is roughly proportional to the amount of computation completed per cycle.

36 On Tuesday: We turn our focus to memory system design... Have a good weekend!


More information

Lecture 8: Compiling for ILP and Branch Prediction. Advanced pipelining and instruction level parallelism

Lecture 8: Compiling for ILP and Branch Prediction. Advanced pipelining and instruction level parallelism Lecture 8: Compiling for ILP and Branch Prediction Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 Advanced pipelining and instruction level parallelism

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 15 Cache II 2005-3-8 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last Time: Locality

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

Branch prediction ( 3.3) Dynamic Branch Prediction

Branch prediction ( 3.3) Dynamic Branch Prediction prediction ( 3.3) Static branch prediction (built into the architecture) The default is to assume that branches are not taken May have a design which predicts that branches are taken It is reasonable to

More information

6.823 Computer System Architecture Datapath for DLX Problem Set #2

6.823 Computer System Architecture Datapath for DLX Problem Set #2 6.823 Computer System Architecture Datapath for DLX Problem Set #2 Spring 2002 Students are allowed to collaborate in groups of up to 3 people. A group hands in only one copy of the solution to a problem

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 8: Advanced Out-of-Order Superscalar Designs Part II

CS252 Spring 2017 Graduate Computer Architecture. Lecture 8: Advanced Out-of-Order Superscalar Designs Part II CS252 Spring 2017 Graduate Computer Architecture Lecture 8: Advanced Out-of-Order Superscalar Designs Part II Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Super Scalar. Kalyan Basu March 21,

Super Scalar. Kalyan Basu March 21, Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis6627 Powerpoint Lecture Notes from John Hennessy

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

Static Branch Prediction

Static Branch Prediction Static Branch Prediction Branch prediction schemes can be classified into static and dynamic schemes. Static methods are usually carried out by the compiler. They are static because the prediction is already

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 26 -- Midterm II Review Session 2014-4-29 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152

More information

CS 152 Computer Architecture and Engineering. Lecture 5 - Pipelining II (Branches, Exceptions)

CS 152 Computer Architecture and Engineering. Lecture 5 - Pipelining II (Branches, Exceptions) CS 152 Computer Architecture and Engineering Lecture 5 - Pipelining II (Branches, Exceptions) John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw

More information

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences

More information

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer. Prelim today Starts

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last

More information

Chapter 1. Computer Abstractions and Technology. Lesson 3: Understanding Performance

Chapter 1. Computer Abstractions and Technology. Lesson 3: Understanding Performance Chapter 1 Computer Abstractions and Technology Lesson 3: Understanding Performance Manufacturing ICs 1.7 Real Stuff: The AMD Opteron X4 Yield: proportion of working dies per wafer Chapter 1 Computer Abstractions

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

EE 3170 Microcontroller Applications

EE 3170 Microcontroller Applications EE 3170 Microcontroller Applications Lecture 4 : Processors, Computers, and Controllers - 1.2 (reading assignment), 1.3-1.5 Based on slides for ECE3170 by Profs. Kieckhafer, Davis, Tan, and Cischke Outline

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31 4.16 Exercises 419 Exercise 4.11 In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor

More information

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop

More information

Topic #6. Processor Design

Topic #6. Processor Design Topic #6 Processor Design Major Goals! To present the single-cycle implementation and to develop the student's understanding of combinational and clocked sequential circuits and the relationship between

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information