Calcolatori Elettronici II: Computer Architecture


1 Calcolatori Elettronici II (A.A. 9/): Computer Architecture

Processor: architecture (control unit and datapath), pipeline issues and high-performance solutions.
Memory hierarchy: cache (L1, L2, L3, ...), central memory, mass memory, backup storage; why and how.
I/O: polling, interrupt, DMA.

CPU, Von Neumann: control unit (CTRL) and datapath (DP) share a single memory holding both instructions and data.
CPU, Harvard: CTRL and DP access separate instruction and data memories.

2 CPU pipeline issues

CTRL unit: instruction fetch, instruction decode, branch prediction, hazard management.
Datapath: register file; functional units (adder, multiplier, divider, logic functions, ...).
Pipeline: static or dynamic; ILP (instruction-level parallelism); speculative execution.

Hazards (cause pipeline stalls):
Data hazard (RaR, RaW, WaR, WaW): data forwarding, register renaming.
Control hazard: branch prediction, conditional execution.
Structural hazard.

Conditional execution. Pseudocode: R0 = R1 - R2; if (R0 < 0) R0 = 0.
Using branches:
SUB R0, R1, R2 ; R0 <- R1-R2
BPL LABEL ; result is negative?
MOV R0, #0 ; R0 <- 0
LABEL: ...
Using conditional execution:
SUB R0, R1, R2 ; R0 <- R1-R2
MOVMI R0, #0 ; R0 <- 0 if result is negative

Register renaming. Static renaming is done by the compiler, which must take branches and subroutines into account. Example (pipeline IF ID OF EX ME WB):
DIVF F6, F2, F8
SUBF F8, F2, F4 ; WaR on F8 with DIVF
ADDF F6, F2, F8 ; WaW on F6 with DIVF, RaW on F8 (data forwarding)
After renaming F8 and F6 to spare registers S and T:
DIVF S, F2, F8
SUBF T, F2, F4
ADDF F6, F2, T
Dynamic register renaming: reservation stations, Tomasulo's algorithm (IBM 360/91 FP unit).
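The renaming idea above can be sketched in a few lines: give every destination a fresh physical name and redirect later reads through a rename map, so WaR and WaW hazards disappear while true RaW dependences survive. This is a minimal illustrative model (instruction tuples and the `p0, p1, ...` names are assumptions, not the slides' notation):

```python
# Minimal sketch of register renaming: every instruction's destination gets a
# fresh physical name, and later reads are redirected through a rename map.
# This removes WaR and WaW hazards; true RaW dependences are preserved.

def rename(instrs, arch_regs):
    rename_map = {r: r for r in arch_regs}    # architectural -> current name
    fresh = iter(f"p{i}" for i in range(1000))
    out = []
    for op, dst, *srcs in instrs:
        srcs = [rename_map[s] for s in srcs]  # read through the current map
        new = next(fresh)                     # fresh destination kills WaR/WaW
        rename_map[dst] = new
        out.append((op, new, *srcs))
    return out

prog = [("DIVF", "F6", "F2", "F8"),
        ("SUBF", "F8", "F2", "F4"),   # WaR on F8, WaW on F6 below
        ("ADDF", "F6", "F2", "F8")]   # RaW: reads SUBF's renamed result
renamed = rename(prog, ["F2", "F4", "F6", "F8"])
for i in renamed:
    print(i)
```

The ADDF now reads the SUBF's new name instead of the architectural F8, so the DIVF can finish in any order.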

3 Branch prediction

Static:
delay slots to fill (programmer/compiler) [MIPS]
a prediction bit in the instruction
always predict not taken (do not insert in the BTB if correctly predicted)
always predict taken (insert in the BTB)
predict taken if DEST < PC (backward branches, i.e. loops)

Dynamic: history-based prediction. A 2-bit predictor has four states (T,T), (N,T), (T,N), (N,N); the prediction P flips only after two consecutive misses, so a single anomalous outcome does not change it.

Branch Target Buffer (BTB): each entry holds TAG, STAT (prediction state), DEST (target address). The branch address is split into TAG and IDX; IDX selects the entry and TAG is compared. The BTB can be organized in n ways (set associative).
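The 2-bit scheme above can be simulated directly; this is a sketch where the counter plays the role of a BTB entry's STAT field (the loop pattern and initial state are illustrative assumptions):

```python
# Sketch of a 2-bit saturating-counter predictor: states 0,1 predict not
# taken, states 2,3 predict taken; the prediction flips only after two
# consecutive mispredictions.

def simulate(outcomes, state=0):
    hits = 0
    for taken in outcomes:
        predict_taken = state >= 2
        hits += (predict_taken == taken)
        # saturate the counter toward the observed outcome
        state = min(3, state + 1) if taken else max(0, state - 1)
    return hits

# A loop branch: taken nine times, then falls through once.
pattern = [True] * 9 + [False]
print(simulate(pattern, state=3), "of", len(pattern))  # 9 of 10
```

Starting from the strongly-taken state, only the final fall-through mispredicts; a 1-bit predictor would mispredict twice per loop execution (last and first iteration).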

4 Conditional execution, speculative execution, superscalar

Conditional execution: the instruction is fetched but executed only if a condition is true (ARM: ADDEQ R0, R1, R2).
Speculative execution: both branch paths are executed; wrong results are discarded.

Superscalar: multiple pipelines (replicated IF ID OF EX ME WB, or a shared front end IF ID+OF feeding EX1 ... EXn) complete more than one instruction per cycle.

HW multithreading: the pipelines are fed with instructions from different threads (Pipe 1: thread A, thread C, ...; Pipe 2: thread B, thread D, ...), so a stall in one thread does not idle the machine.

5 In-order execution

In-order start, in-order end: slow instructions cause stalls even with no hazards (e.g. addf takes 5 cycles, mov takes 1 cycle).
In-order start, out-of-order end, in-order write-back: reorder buffer.
In-order start, out-of-order end and write-back: history buffer.

Reservation Shift Register (RSR). Each row holds FU (functional unit used), Rd (destination register), V (valid), PC (program counter). An instruction that requires k cycles is inserted in row k; all positions before k are marked as used, so nothing issued later can complete earlier. At each cycle, data in the RSR shift up by one row.

Example: 0: mul (multi-cycle); 4: mov (1 cycle); 8: addf (multi-cycle). The mul reserves the rows below it, so the mov must wait for a free slot: in-order start, in-order end.

6 RSR with a ReOrder Buffer (ROB)

The RSR row gains a ROBptr field (pointer to the ROB entry); each ROB entry holds Rd, C (completed), RES (result).

An instruction that requires k cycles is inserted in row k of the RSR; only that row must be free, so short instructions can overtake long ones. At the same time an ROB entry is allocated (not entirely filled); the ROB is a circular buffer. At each cycle, data in the RSR shift up one row. When an instruction exits the RSR, its result is written into its ROB entry (C set, RES filled). When an instruction exits from the head of the ROB, its result is written to the destination.

Example: 0: mul (multi-cycle); 4: mov (1 cycle); 8: addf (multi-cycle).

In-order execution: in-order start, out-of-order end, in-order write-back.
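The RSR's shifting behavior can be sketched with a toy model. Assumptions not in the slides: one issue attempt per cycle, write-back in the cycle an instruction leaves row 1, and only the target row is checked at issue (the out-of-order-end variant used with the ROB; marking all lower rows as used would instead force in-order end):

```python
# Toy model of a Reservation Shift Register: an instruction needing k cycles
# is placed in row k (stalling if that row is busy); each cycle every row
# shifts up one position, and whatever leaves row 1 writes back that cycle.

def run(program):
    # program: list of (name, latency_cycles), issued in order, one per cycle
    rsr = {}                  # row -> instruction name
    completions = []
    pending = list(program)
    cycle = 0
    while pending or rsr:
        cycle += 1
        if 1 in rsr:                          # row 1 completes this cycle
            completions.append((cycle, rsr.pop(1)))
        rsr = {row - 1: name for row, name in rsr.items()}   # shift up
        if pending:
            name, k = pending[0]
            if k not in rsr:                  # row k free: issue
                rsr[k] = name
                pending.pop(0)
    return completions

print(run([("mul", 3), ("mov", 1), ("addf", 5)]))
# [(3, 'mov'), (4, 'mul'), (8, 'addf')]
```

The 1-cycle mov, issued after the 3-cycle mul, completes before it: out-of-order end, which is why the ROB is needed to restore program order at write-back.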

7 ReOrder Buffer: worked example

Instructions: 0: addf (multi-cycle); 4: mov (1 cycle); 8: mul (multi-cycle). Four snapshots show the RSR rows shifting up while ROB entries fill between the head and tail pointers: the addf entry is allocated first, then the mov entry, then the mul entry; the mov completes (C set, RES written) while the addf is still executing.

8 ReOrder Buffer: worked example (continued)

The mov and mul complete out of order, but entries leave the ROB only from the head. The instruction in ROB(0) can exit as soon as it is completed: first the addf writes its result to its F register, then the mov writes to its R register, then the mul. Write-back therefore happens in program order even though execution ended out of order.
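The retire discipline described above, complete in any order, retire only from the head, can be sketched as follows (the entry fields and the dict-based register file are illustrative assumptions):

```python
from collections import deque

# Sketch of a reorder buffer: entries are allocated in program order; results
# arrive (complete) in any order; entries retire (write the register file)
# strictly from the head, so write-back stays in program order.

class ROB:
    def __init__(self):
        self.buf = deque()            # program order; head at the left
    def allocate(self, dest):
        entry = {"dest": dest, "done": False, "res": None}
        self.buf.append(entry)
        return entry
    def complete(self, entry, result):
        entry["done"], entry["res"] = True, result
    def retire(self, regfile):
        # drain completed entries from the head only
        while self.buf and self.buf[0]["done"]:
            e = self.buf.popleft()
            regfile[e["dest"]] = e["res"]

rob, regs = ROB(), {}
e1 = rob.allocate("F1")               # slow addf
e2 = rob.allocate("R1")               # fast mov
rob.complete(e2, 7)                   # mov finishes first (out of order)...
rob.retire(regs)
print(regs)                           # ...but cannot retire past the addf
rob.complete(e1, 2.5)
rob.retire(regs)
print(regs)
```

After the first retire the register file is untouched; once the older addf completes, both results drain in program order.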

9 Very Long Instruction Word (VLIW); History Buffer; memory hierarchy

VLIW: one instruction carries one operation per functional unit (op1 Rd1 Rsa1 Rsb1 | op2 Rd2 Rsa2 Rsb2 | ... | opn Rdn Rsan Rsbn, feeding FU1 ... FUn). Parallelism is explicit in the instructions; control is simplified; the compiler is complex; high CPU/memory bandwidth is required.

History Buffer (HB). RSR rows hold FU, V, HBptr; each HB entry holds Rd and OLD (the old destination value). An instruction that requires k cycles is inserted in row k of the RSR; an HB entry is filled, saving the current value of the destination. The HB is a circular buffer. At each cycle, data in the RSR shift up one row; when an instruction exits the RSR, its result is written directly to the destination. Until the instruction is in the HB, the old value can be restored if needed (interrupt, exception, mispredicted branch). The RSR+HB allows faster write-back: in-order start, out-of-order end and write-back.

Memory hierarchy. CPU/memory speed mismatch: fast memory has high cost (area/energy/$), so it is affordable only in small sizes, while large memories take many more cycles per access. Programs make many accesses to small areas.

Program characteristics: predictability, structure, linear data structures, sequential flow.
Principle of locality (locality of reference):
Temporal locality: an accessed memory location is likely to be accessed again in the near future.
Spatial locality: if the program accesses memory location X, it is probable that it will access locations X±1, X±2, ..., X±n (n small).

10 Memory hierarchy levels; memory structure

Small, expensive, fast at the top: CPU registers, caches (one or more levels, on- and off-chip). Then RAM, mass storage (HDD, flash), backups (tape): big, cheap, slow.

A register is built from edge-triggered D flip-flops (D in, CLK, LOAD enable, Q out); the LOAD signal, gated with the clock or through a feedback MUX, keeps the stored value when inactive.

Memory structure: the address enters a row decoder that drives one row of the memory array; bitlines are precharged. On a read, sense amplifiers restore the levels and a column MUX selects the data out; on a write, data in is driven onto the bitlines.

11 SRAM and DRAM cells; memory timing

SRAM cell: six transistors between VDD and GND; two cross-coupled inverters (P1/N1, P2/N2) store the bit, and two access transistors (N3, N4) connect it to the bitlines BL and BL_b under control of the wordline WL.
1. Precharge the bitlines to VDD/2. On a read, the cell develops a small differential (VDD/2+delta vs VDD/2-delta) detected by the sense amplifier; on a write, the bitlines are kept driven to the values to store.
2. Address the wordlines.

DRAM cell: one transistor plus one capacitor (BL, WL). Small; destructive read, so data must be restored after each read; needs periodic refresh.

Memories: a ROM is characterized by its access time. An SRAM has a read access time and, for writes: address setup time (address stable before -WR), data setup time (data stable before -WR), address hold time (address stable after -WR).

12 DRAM timing; hierarchy lookup

DRAM: multiplexed address, sent in two phases (ROW, then COL).
RAS time (row address setup time): ROW stable before the -RAS signal.
Row address hold time: ROW stable after -RAS.
CAS time (column address setup time) and column address hold time: the same for COL and -CAS.
RAS access time: time between the -RAS signal and data ready (or CAS access time, measured from -CAS).
RAS/CAS precharge time: time between two accesses.

Memory hierarchy lookup: the CPU emits ADDRESS = i (0 <= i <= N-1); the request travels through REGS, L1, L2, L3, RAM, HDD, TAPE. On a miss at one level, the same address is presented to the next level.

13 Hit, miss, and access cost

On a hit the level returns the data (i found at that level). On a miss the request goes to the next level, paying the miss penalty MP (time/energy):
<Access> = Access_cache + MR x MP (MR = miss rate), counted as time or as energy.

Read hit: return the data. Read miss: read from the next levels; a whole line is read (exploits spatial locality).
Write hit: write-through or write-back. Write miss: write-allocate or write-no-allocate.

Associative memory: addressed by content; implemented with CAM (content addressable memories) or with standard memories plus control logic.
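The access formula above applies recursively down the hierarchy: the miss penalty of one level is the average access time of the next. A small sketch (all latencies and miss rates here are made-up numbers, not from the slides):

```python
# Sketch of <Access> = Access_cache + MR * MP applied level by level:
# the miss penalty of a level is the average access time of the level below.

def amat(levels, memory_time):
    # levels: list of (hit_time, miss_rate), outermost (L1) first
    t = memory_time
    for hit_time, miss_rate in reversed(levels):
        t = hit_time + miss_rate * t
    return t

# Hypothetical L1 (1 cycle, 5% misses) and L2 (8 cycles, 20% local miss
# rate) in front of a 100-cycle main memory.
print(amat([(1, 0.05), (8, 0.20)], memory_time=100))
```

With these numbers L2 costs 8 + 0.2·100 = 28 cycles on average, so L1 averages 1 + 0.05·28 = 2.4 cycles: a small, fast level filters most of the traffic away from the slow one.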

14 Cache organizations

Fully associative: each line holds V (valid), TAG, and data; the address is split into TAG and DSP (displacement in the line); every stored TAG is compared in parallel (COMP -> HIT/MISS). Lsize = 2^#DSP (line or block size).

Direct-mapped: the address is split into TAG, IDX, DSP; IDX selects one line, a single comparator checks the TAG, a MUX selects the word. Lines = 2^#IDX (number of lines or blocks); Size = Lines x Lsize; actual size = Size + (TAG+V) x Lines.

Set-associative: n direct-mapped ways accessed in parallel; #TAG = #ADDRESS - #IDX - #DSP; HIT = H1 + H2 + ... + Hn (OR of the per-way hits); Lines = 2^#IDX per way; Size = nways x Lines x Lsize; the actual size adds (TAG+V) bits per line. nways = 1 gives a direct-mapped cache; Lines = 1 gives a fully associative cache.
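The TAG | IDX | DSP split above is just bit slicing; a sketch (the cache geometry and addresses are illustrative):

```python
# Sketch of splitting an address into TAG | IDX | DSP for a direct-mapped
# cache: DSP selects the byte in the line, IDX selects the line, and the
# remaining high bits form the TAG stored beside it.

def split(addr, lines, lsize):
    dsp_bits = lsize.bit_length() - 1     # lsize = 2**#DSP
    idx_bits = lines.bit_length() - 1     # lines = 2**#IDX
    dsp = addr & (lsize - 1)
    idx = (addr >> dsp_bits) & (lines - 1)
    tag = addr >> (dsp_bits + idx_bits)
    return tag, idx, dsp

# 64-line cache with 16-byte lines: two addresses exactly one cache size
# (1 KiB) apart share IDX and DSP and differ only in TAG, so in a
# direct-mapped cache they conflict for the same line.
print(split(0x1234, lines=64, lsize=16))             # (4, 35, 4)
print(split(0x1234 + 64 * 16, lines=64, lsize=16))   # (5, 35, 4)
```

This is exactly the conflict-miss mechanism: same IDX, different TAG.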

15 Replacement policies; the 3-C's miss model

Replacement: LRU (counters or shift registers, nways x Lines of them), pseudo-LRU, FIFO, random.
LRU with a stack of registers per line (reg0 ... reg3): insert the last accessed way on top, shifting the other values down.
LRU with counters per line (way0 ... way3): reset the counter of the last accessed way and increment the counters that were below the modified one.

Pseudo-LRU: 4 ways, 3 bits (B0, B1, B2) per line, arranged as a tree. B0 selects the pair holding the victim, B1 or B2 the way within the pair:
(B0,B1,B2) = 00x -> replace way 0; = 01x -> replace way 1; = 1x0 -> replace way 2; = 1x1 -> replace way 3.
On each access the traversed bits are complemented (B = not B) so that they point away from the way just used.

Misses (the 3-C's model):
Compulsory: cold-start miss.
Capacity: a miss that would also occur in a fully associative cache of the same size.
Conflict (collision): a miss that would not occur in a fully associative cache. Fully associative caches have no conflict misses; too many conflicts cause thrashing. Note that conflict misses can avoid capacity misses.
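The tree pseudo-LRU scheme can be sketched in a few lines. The update convention below (set the bits on the access path to point at the *other* side) is one common interpretation of the "B = not B" rule and is assumed here:

```python
# Sketch of 4-way tree pseudo-LRU with three bits (B0, B1, B2) per set:
# B0 selects the pair containing the victim, B1/B2 the way within a pair;
# each access updates the bits on its path to point away from the way used.

def victim(bits):
    b0, b1, b2 = bits
    if b0 == 0:
        return 0 if b1 == 0 else 1    # (B0,B1) = 00 -> way 0, 01 -> way 1
    return 2 if b2 == 0 else 3        # (B0,B2) = 10 -> way 2, 11 -> way 3

def touch(bits, way):
    b0, b1, b2 = bits
    if way in (0, 1):
        b0, b1 = 1, 1 - way           # point at the other pair / other way
    else:
        b0, b2 = 0, 3 - way
    return [b0, b1, b2]

bits = [0, 0, 0]
for way in (0, 2, 1):                 # access ways 0, 2, 1
    bits = touch(bits, way)
print(victim(bits))                   # 3: the only way never accessed
```

Three bits approximate the LRU order that exact LRU would need more state to track; the approximation can occasionally pick a non-LRU victim, which is the price of the cheap encoding.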

16 Conflict misses can avoid capacity misses: an example

Repeated, sequential accesses from 0 to B (B+1 bytes); cache size B bytes, LS = 4.

1. Associative cache (LRU), size B, LS = 4: access to 0: miss (compulsory), insert the whole line (addresses 0,1,2,3); accesses to 1,2,3: hit; access to 4: miss (compulsory), insert line (4,5,6,7); ... access to B: miss (capacity), replace the line (0,1,2,3). On the next pass, access to 0: miss (capacity), replace line (4,5,6,7); accesses to 1,2,3: hit; access to 4: miss (capacity); ... Every line reference misses on every pass: MR ~ 0.25 (MR = (B/4 + 1)/(B+1)).

2. Direct-mapped cache, size B, LS = 4: the first pass behaves the same (compulsory misses), and the access to B misses, replacing the line (0,1,2,3) that shares its index. On the next pass, access to 0: miss (conflict), replace the line containing B; accesses to 1,2,3: hit; access to 4: hit; ... Only the two lines sharing index 0 evict each other, 2 misses per pass: after N further passes MR = (B/4 + 1 + 2N) / ((N+1)(B+1)), which tends to 2/(B+1).

Rules of thumb: MR(direct-mapped, size N) ~ MR(2-way, size N/2); miss rate falls roughly as the square root of the size. Enlarging Lsize decreases MR but increases MP. [SPEC92]

Stack distance: given the program memory references addr1, addr2, ..., addrn, push each reference on a stack (removing it from the stack if already present). The stack distance of reference R is its position in the stack (if present) or infinite (if not present).
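The sweep example can be checked by simulation. This sketch models both organizations on the repeated 0..B byte sweep (the concrete B = 64 is an assumption chosen to keep the run small):

```python
# Sketch reproducing the sweep example: repeated sequential byte accesses
# 0..B over a cache of size B bytes with 4-byte lines. The fully associative
# LRU cache thrashes (every line reference misses), while the direct-mapped
# cache only keeps conflicting on the two lines that share index 0.

def misses(cache_size, lsize, passes, assoc):
    nlines = cache_size // lsize
    fa_lru, dm, miss = [], {}, 0          # fa_lru: LRU order, MRU at the end
    for _ in range(passes):
        for addr in range(cache_size + 1):        # bytes 0..B inclusive
            line = addr // lsize
            if assoc == "full":
                if line in fa_lru:
                    fa_lru.remove(line)           # hit: move to MRU
                else:
                    miss += 1
                    if len(fa_lru) == nlines:
                        fa_lru.pop(0)             # evict the LRU line
                fa_lru.append(line)
            else:                                 # direct-mapped
                idx = line % nlines
                if dm.get(idx) != line:
                    miss += 1
                    dm[idx] = line
    return miss

B = 64
print(misses(B, 4, passes=10, assoc="full"))   # (B/4 + 1) misses per pass
print(misses(B, 4, passes=10, assoc="dm"))     # first pass + 2 per pass
```

With B = 64 the associative cache misses 17 times on every pass (170 total), while the direct-mapped cache misses 17 times on the first pass and only twice per pass afterwards (35 total): its conflict evictions happen to protect the lines an LRU policy would have thrown away.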

17 Stack distance and hit probability

Example: for the reference stream 0, 4, 8, 0, 4, 8 the second reference to 8 finds two distinct references (0 and 4) above it in the stack, so SD(8) = 2.

Hit probability (hypothesis: uniform distribution of cache-line accesses over the sets). With D = stack distance, L = Lines, W = nways, a reference hits if fewer than W of the D intervening distinct lines map to its set:

P_HIT(D) = sum for a = 0 .. W-1 of C(D,a) (1/L)^a ((L-1)/L)^(D-a)

Direct-mapped cache (W = 1): P_HIT = ((L-1)/L)^D.
D = 0 (two consecutive references): P_HIT = 1.
D = 1 (access sequence: addr, other, addr): miss iff other has replaced addr, so P_MISS = 1/L and P_HIT = 1 - 1/L = (L-1)/L.
General D (addr, other1, ..., otherD, addr): hit iff no other_i has evicted addr, P_HIT = [P_HIT(D=1)]^D = [(L-1)/L]^D.

Multi-level caches. Inclusive: data in L1 are also in L2, in L3, ... Exclusive: data are in L1, or in L2, or ... (only one level). Real hierarchies are mainly inclusive (intermediate). Victim cache.

I/O (next topics): special instructions, memory mapped; polling, interrupt, DMA.
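Both the stack-distance bookkeeping and the hit-probability formula above can be sketched directly (the reference stream matches the example; treating a first reference as infinite distance is encoded as None):

```python
from math import comb

# Sketch of stack distance and the hit-probability model: a reference at
# stack distance D hits in a W-way cache with L sets when fewer than W of
# the D intervening distinct lines fall in its set (lines assumed to map to
# sets uniformly at random).

def stack_distances(refs):
    stack, out = [], []
    for r in refs:
        if r in stack:
            d = len(stack) - 1 - stack.index(r)   # distinct refs above r
            stack.remove(r)
        else:
            d = None                              # first reference: infinite
        stack.append(r)                           # r becomes the stack top
        out.append(d)
    return out

def p_hit(D, L, W):
    return sum(comb(D, a) * (1 / L) ** a * (1 - 1 / L) ** (D - a)
               for a in range(W))

print(stack_distances([0, 4, 8, 0, 4, 8]))   # SD = 2 on each re-reference
print(p_hit(3, L=16, W=1))                   # direct-mapped: ((L-1)/L)**D
```

Note that with W = 1 the sum collapses to the direct-mapped formula ((L-1)/L)^D, and with D < W the probability is 1: too few intervening lines to fill the set.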

18 I/O

Devices are used by reading and writing their internal registers. Do not cache data coming from HW.

Special instructions (in, out): a separate I/O address space; an I/O enable line distinguishes the access from a memory one.
Memory mapped: device registers appear at memory addresses (e.g. two consecutive addresses map to registers R0 and R1 of a device).

Polling: the device exposes status signals to check (e.g. READY, DEVREADY). With the status register mapped in memory, bits (1:0) = (READY, DEVREADY), and the data register at the next address:

MOV R1, #STATUS_ADDR ; status register (memory mapped)
L: LDR R0, [R1] ; read status (hw register)
TST R0, #1 ; data is ready?
BEQ L ; no: read again
MOV R1, #DATA_ADDR ; data register
LDR R0, [R1] ; read data (hw register)

Polling is very simple, but CPU time is wasted.

Interrupt: program the device for the data transfer, execute something else, get the data when the device sends a signal (interrupt). Interrupts have a priority.
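The busy-wait loop above can be mimicked with a fake device model; everything here (the Device class, the ready_after threshold) is an illustrative assumption standing in for real memory-mapped hardware:

```python
# Sketch of the polling loop against a simulated memory-mapped device:
# bit 0 of the status register is READY; the CPU spins on it, then reads
# the data register. The device model is illustrative, not real hardware.

class Device:
    def __init__(self, data, ready_after):
        self._data, self._polls = data, 0
        self._ready_after = ready_after       # status reads before READY
    def read_status(self):
        self._polls += 1
        return 1 if self._polls > self._ready_after else 0
    def read_data(self):
        return self._data

def poll(dev):
    while (dev.read_status() & 1) == 0:       # TST + BEQ: spin until READY
        pass                                  # CPU time wasted here
    return dev.read_data()

dev = Device(data=0x5A, ready_after=3)
print(hex(poll(dev)), "after", dev._polls, "status reads")
```

Every iteration of the spin loop is a CPU cycle doing no useful work, which is the cost that interrupts and DMA eliminate.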

19 Interrupt lines

Wiring options: one interrupt line per device into the CPU; many devices through an interrupt controller (HW1..HW4 direct, HW5..HW7 via INT CTRL); or a single shared line with daisy-chained acknowledge (ack propagates HW1 -> HW2 -> HW3). In all cases the device signals an interrupt and uses its internal registers to show that it is waiting to be served; the CPU reads the HW registers to find the devices to handle.

Maskable Interrupt (IRQ): the CPU can ignore it; there are instructions to mask/unmask interrupts.
Non Maskable Interrupt (NMI): always received; reserved for critical events (parity errors, power off).

Level-triggered interrupt: the line is kept high until the interrupt is handled. If the line is shared, all interrupts must be served: scan devices until a requesting one is found, handle its interrupt, then check the interrupt line again.
Edge-triggered interrupt: signaled by a pulse. If the line is shared, check all devices (more pulses can be merged); if masked, an interrupt can be lost, so a latch records the pulses.
Message-signaled interrupts.

20 Interrupt handling

1. Finish the current instruction. 2. Save the flags (not always) and the return address. 3. Signal that the interrupt is being handled. 4. Find the handling routine (it can depend on the interrupt line). 5. Jump to the routine: mask interrupts, access the device, unmask interrupts, handle the data transfer.

Precise interrupt: the PC is saved in a known position; all instructions up to the current one are executed; the current instruction is in a known state; all instructions after the current one are not executed, or their results are discarded. Otherwise the interrupt is imprecise.

Example: interrupts in the PC/AT. The Intel 8259A programmable interrupt controller holds IMR (Interrupt Mask Register), IRR (Interrupt Request Register), ISR (Interrupt Service Register); two 8259As are cascaded. Sequence: 1. 8259A: INTR = 1. 2. CPU: INTA pulse. 3. CPU: second INTA pulse. 4. 8259A: interrupt vector on the data bus (8 bits). 5. CPU: jump to the service routine (depends on the vector received).

Example: interrupt assignments in the PC/AT.
Master 8259: IRQ0 system timer; IRQ1 keyboard controller; IRQ2 cascade to slave 8259; IRQ3 serial ports (COM2 and COM4); IRQ4 serial ports (COM1 and COM3); IRQ5 parallel port LPT2; IRQ6 floppy disk controller; IRQ7 parallel port LPT1.
Slave 8259: IRQ8 real-time clock (RTC); IRQ12 mouse controller; IRQ13 math coprocessor; IRQ14 hard disk controller; IRQ15 hard disk controller.

21 DMA

The processor writes the device registers to set up the transfer: memory pointer, data size, transfer type. The device then reads/writes data in memory with its own rate and latency, and sends an interrupt when done.

Comparison. Polling: simple, but computationally expensive. Interrupt: the CPU transfers the data from device to memory, one interrupt for each data word. DMA: one interrupt for each data block; the I/O device must be able to act as bus master.


More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Super Scalar. Kalyan Basu March 21,

Super Scalar. Kalyan Basu March 21, Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build

More information

Processor: Superscalars Dynamic Scheduling

Processor: Superscalars Dynamic Scheduling Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units 6823, L14--1 Complex Pipelining: Out-of-order Execution & Register Renaming Laboratory for Computer Science MIT http://wwwcsglcsmitedu/6823 Multiple Function Units 6823, L14--2 ALU Mem IF ID Issue WB Fadd

More information

Summary of Computer Architecture

Summary of Computer Architecture Summary of Computer Architecture Summary CHAP 1: INTRODUCTION Structure Top Level Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

E0-243: Computer Architecture

E0-243: Computer Architecture E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation

More information

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem

More information

CS152 Computer Architecture and Engineering. Complex Pipelines

CS152 Computer Architecture and Engineering. Complex Pipelines CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the

More information

Current Microprocessors. Efficient Utilization of Hardware Blocks. Efficient Utilization of Hardware Blocks. Pipeline

Current Microprocessors. Efficient Utilization of Hardware Blocks. Efficient Utilization of Hardware Blocks. Pipeline Current Microprocessors Pipeline Efficient Utilization of Hardware Blocks Execution steps for an instruction:.send instruction address ().Instruction Fetch ().Store instruction ().Decode Instruction, fetch

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

EECC551 Exam Review 4 questions out of 6 questions

EECC551 Exam Review 4 questions out of 6 questions EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

Execution/Effective address

Execution/Effective address Pipelined RC 69 Pipelined RC Instruction Fetch IR mem[pc] NPC PC+4 Instruction Decode/Operands fetch A Regs[rs]; B regs[rt]; Imm sign extended immediate field Execution/Effective address Memory Ref ALUOutput

More information

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation CAD for VLSI 2 Pro ject - Superscalar Processor Implementation 1 Superscalar Processor Ob jective: The main objective is to implement a superscalar pipelined processor using Verilog HDL. This project may

More information

Lecture 9 Pipeline and Cache

Lecture 9 Pipeline and Cache Lecture 9 Pipeline and Cache Peng Liu liupeng@zju.edu.cn 1 What makes it easy Pipelining Review all instructions are the same length just a few instruction formats memory operands appear only in loads

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue

More information

PC Interrupt Structure and 8259 DMA Controllers

PC Interrupt Structure and 8259 DMA Controllers ELEC 379 : DESIGN OF DIGITAL AND MICROCOMPUTER SYSTEMS 1998/99 WINTER SESSION, TERM 2 PC Interrupt Structure and 8259 DMA Controllers This lecture covers the use of interrupts and the vectored interrupt

More information

Topics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards

Topics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards Computer Organization CS 231-01 Improving Performance Dr. William H. Robinson November 8, 2004 Topics Money's only important when you don't have any. Sting Cache Scoreboarding http://eecs.vanderbilt.edu/courses/cs231/

More information

CS 2410 Mid term (fall 2018)

CS 2410 Mid term (fall 2018) CS 2410 Mid term (fall 2018) Name: Question 1 (6+6+3=15 points): Consider two machines, the first being a 5-stage operating at 1ns clock and the second is a 12-stage operating at 0.7ns clock. Due to data

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Final Review Shuai Wang Department of Computer Science and Technology Nanjing University Computer Architecture Computer architecture, like other architecture, is the art

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

Mo Money, No Problems: Caches #2...

Mo Money, No Problems: Caches #2... Mo Money, No Problems: Caches #2... 1 Reminder: Cache Terms... Cache: A small and fast memory used to increase the performance of accessing a big and slow memory Uses temporal locality: The tendency to

More information

Floating Point/Multicycle Pipelining in DLX

Floating Point/Multicycle Pipelining in DLX Floating Point/Multicycle Pipelining in DLX Completion of DLX EX stage floating point arithmetic operations in one or two cycles is impractical since it requires: A much longer CPU clock cycle, and/or

More information

Memory Organization MEMORY ORGANIZATION. Memory Hierarchy. Main Memory. Auxiliary Memory. Associative Memory. Cache Memory.

Memory Organization MEMORY ORGANIZATION. Memory Hierarchy. Main Memory. Auxiliary Memory. Associative Memory. Cache Memory. MEMORY ORGANIZATION Memory Hierarchy Main Memory Auxiliary Memory Associative Memory Cache Memory Virtual Memory MEMORY HIERARCHY Memory Hierarchy Memory Hierarchy is to obtain the highest possible access

More information

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions Lecture 7: Pipelining Contd. Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 More pipelining complications: Interrupts and Exceptions Hard to handle in pipelined

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University

Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University Lecture 4: Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee282 Lecture 4-1 Announcements HW1 is out (handout and online) Due on 10/15

More information

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model. Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution

More information

CS 152 Computer Architecture and Engineering. Lecture 6 - Memory

CS 152 Computer Architecture and Engineering. Lecture 6 - Memory CS 152 Computer Architecture and Engineering Lecture 6 - Memory Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs152!

More information

Memory Hierarchy and Caches

Memory Hierarchy and Caches Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline

More information