Lecture 9 Pipeline and Cache

Size: px
Start display at page:

Download "Lecture 9 Pipeline and Cache"

Transcription

1 Lecture 9 Pipeline and Cache Peng Liu liupeng@zju.edu.cn 1

2 What makes it easy Pipelining Review all instructions are the same length just a few instruction formats memory operands appear only in loads and stores Hazards limit performance Structural: need more HW resources Data: need forwarding, compiler scheduling Data hazards must be handled carefully MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load) 2

3 MIPS Pipeline Data / Control Paths A (fast) 1 PCSrc 0 IF/ID Control ID/EX EX EX/MEM MEM PC 4 Instruction Memory Read Address Add RegWrite Read Addr 1 Register Read Read Addr 2Data 1 File Write Addr Read Data 2 Write Data Sign 16 Extend 32 Shift left 2 ALUSrc Add ALU ALU cntrl ALUOp Branch Address Write Data Data Memory Read Data MemWrite MemRead MEM/WB MemtoReg 1 0 WB RegDst 3

4 MIPS Pipeline Data / Control Paths (debug) 1 0 ID/EX EX/MEM PCSrc MEM/WB IF/ID Instr EX MEM WB Control Instr Control Instr Control PC 4 Instruction Memory Read Address Add RegWrite Read Addr 1 Register Read Read Addr 2Data 1 File Write Addr Read Data 2 Write Data Sign 16 Extend 32 Shift left 2 ALUSrc RegDst Add ALU ALU cntrl ALUOp Branch Address Write Data Data Memory Read Data MemWrite MemRead MemtoReg 1 0 4

5 MIPS Pipeline Control (pipelined debug) PC Instruction Memory Read Address Add IF/ID EX Control RegWrite Read Addr 1 Register Read Read Addr 2Data 1 File Write Addr Read Data 2 Write Data Sign 16 Extend 32 ID/EX Instr 0 MEM Control Shift left 2 ALUSrc 0 1 Add ALU ALU cntrl ALUOp EX/MEM Instr Branch Address WB Control Write Data Data Memory Read Data MemWrite MemRead PCSrc MEM/WB Instr MemtoReg RegDst 5

6 Control Hazards When the flow of instruction addresses is not what the pipeline expects; incurred by change of flow instructions Conditional branches (beq, bne) Unconditional branches (j) Possible solutions Stall Move decision point earlier in the pipeline Predict Delay decision (requires compiler support) Control hazards occur less frequently than data hazards; there is nothing as effective against control hazards as forwarding is for data hazards 6

7 The Last Type of Pipeline Hazards: Control Hazards Just like a data hazard on the program counter (PC) Cannot fetch the next instruction if I don t know its address (PC) Only worse: we use the PC at the very beginning of the pipeline Makes forwarding very difficult Calculating the next PC for MIPS: Most instructions: PC + 4 Can be calculated in IF stage and forward to next instructions Branches Must first compare registers and calculate PC+4+offset Must fetch, decode, and execute instruciton Jump Must calculate new PC based on offset Must fetch and decode instruction 7

8 1 0 Datapath Branch and Jump Hardware 1 0 Shift left 2 Jump ID/EX EX/MEM PCSrc IF/ID Control PC 4 Instruction Memory Read Address Add PC+4[31-28] Read Addr 1 Register Read Read Addr 2Data 1 File Write Addr Read Data 2 Write Data 16 Sign 32 Extend Shift left Add ALU ALU cntrl Branch Address Data Memory Write Data Read Data MEM/WB Forward Unit 8

9 Jumps Incur One Stall Jumps not decoded until ID, so one stall is needed I n s t r. O r d e r j stall lw and ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg Fortunately, jumps are very infrequent only 2% of the SPECint instruction mix 9

10 Review: Branches Incur Three Stalls I n s t r. O r d e r beq stall stall stall lw ALU IM Reg DM Reg ALU Can fix branch hazard by waiting stall but affects throughput IM Reg DM Reg and ALU IM Reg DM 10

11 Moving Branch Decisions Earlier in Pipe Move the branch decision hardware back to the EX stage Reduces the number of stall cycles to two Adds an and gate and a 2x1 mux to the EX timing path Add hardware to compute the branch target address and evaluate the branch decision to the ID stage Reduces the number of stall cycles to one (like with jumps) Computing branch target address can be done in parallel with RegFile read (done for all instructions only used when needed) Comparing the registers can t be done until after RegFile read, so comparing and updating the PC adds a comparator, an and gate, and a 3x1 mux to the ID timing path Need forwarding hardware in ID stage For longer pipelines, decision points are later in the pipeline, incurring more stalls, so we need a better solution 11

12 Early Branch Forwarding Issues Bypass of source operands from the EX/MEM if (IDcontrol.Branch and (EX/MEM.RegisterRd!= 0) and (EX/MEM.RegisterRd = IF/ID.RegisterRs)) ForwardC = 1 if (IDcontrol.Branch and (EX/MEM.RegisterRd!= 0) and (EX/MEM.RegisterRd = IF/ID.RegisterRt)) ForwardD = 1 Forwards the result from the second previous instr. to either input of the Compare MEM/WB forwarding is taken care of by the normal RegFile write before read operation If the instruction immediately before the branch produces one of the branch compare source operands, then a stall will be required since the EX stage ALU operation is occurring at the same time as the ID stage branch compare operation 12

13 1 0 Supporting ID Stage Branches PCSrc IF/ID Hazard Unit Control 0 Branch 1 0 ID/EX EX/MEM PC 4 Add Instruction Memory Read 0 Address IF.Flush Shift left 2 Read Addr 1 RegFile Read Addr 2 Read Data 1 Write Addr ReadData 2 Write Data 16 Sign Extend Add 32 Compare 1 0 ALU ALU cntrl Data Memory Read Data Address Write Data MEM/WB 1 0 Forward Unit 0 1 Forward Unit 13

14 Branch Prediction Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome 1. Predict not taken always predict branches will not be taken, continue to fetch from the sequential instruction stream, only when branch is taken does the pipeline stall If taken, flush instructions in the pipeline after the branch in IF, ID, and EX if branch logic in MEM three stalls in IF if branch logic in ID one stall ensure that those flushed instructions haven t changed machine state automatic in the MIPS pipeline since machine state changing operations are at the tail end of the pipeline (MemWrite or RegWrite) restart the pipeline at the branch destination 14

15 Flushing with Misprediction (Not Taken) I n s t r. O r d e r 4 beq $1,$2,2 8 flush sub $4,$1,$5 16 and $6,$1,$7 20 or r8,$1,$9 ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg To flush the IF stage instruction, add a IF.Flush control line that zeros the instruction field of the IF/ID pipeline register (transforming it into a noop) 15

16 Branch Prediction, con t Resolve branch hazards by statically assuming a given outcome and proceeding 2. Predict taken always predict branches will be taken Predict taken always incurs a stall (if branch destination hardware has been moved to the ID stage) As the branch penalty increases (for deeper pipelines), a simple static prediction scheme will hurt performance With more hardware, possible to try to predict branch behavior dynamically during program execution 3. Dynamic branch prediction predict branches at run-time using run-time information 16

17 Dynamic Branch Prediction A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains a bit that tells whether the branch was taken the last time it was execute Bit may predict incorrectly (may be from a different branch with the same low order PC bits, or may be a wrong prediction for this branch) but the doesn t affect correctness, just performance If the prediction is wrong, flush the incorrect instructions in pipeline, restart the pipeline with the right instructions, and invert the prediction bit The BHT predicts when a branch is taken, but does not tell where its taken to! A branch target buffer (BTB) in the IF stage can cache the branch target address (or!even! the branch target instruction) so that a stall can be avoided 17

18 1-bit Prediction Accuracy 1-bit predictor in loop is incorrect twice when not taken Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the Loop: 1 st loop instr bottom of the loop code 2 nd loop instr 1. First time through the loop, the predictor. mispredicts the branch since the branch is taken. back to the top of the loop; invert prediction bit. (predict_bit = 1) last loop instr 2. As long as branch is taken (looping), prediction is correct bne $1,$2,Loop fall out instr 3. Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken falling out of the loop; invert prediction bit (predict_bit = 0) For 10 times through the loop we have a 80% prediction accuracy for a branch that is taken 90% of the time 18

19 2-bit Predictors A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed right 9 times wrong on loop Taken fall out 1 Taken 0 Predict Taken Predict Not Taken Not taken Taken right on 1 st iteration Not taken Taken Predict Taken 1 Not taken 0 Predict Not Taken Not taken Loop: 1 st loop instr 2 nd loop instr... last loop instr bne $1,$2,Loop fall out instr 19

20 Delayed Decision First, move the branch decision hardware and target address calculation to the ID pipeline stage A delayed branch always executes the next sequential instruction the branch takes effect after that next instruction MIPS software moves an instruction to immediately after the branch that is not affected by the branch (a safe instruction) thereby hiding the branch delay As processor go to deeper pipelines and multiple issue, the branch delay grows and need more than one delay slot. Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches Growth in available transistors has made dynamic approaches relatively cheaper 20

21 Scheduling Branch Delay Slots A. From before branch B. From branch target C. From fall through add $1,$2,$3 if $2=0 then delay slot sub $4,$5,$6 add $1,$2,$3 if $1=0 then delay slot A is the best choice, fills delay slot & reduces instruction count (IC) In B, the sub instruction may need to be copied, increasing IC In B and C, must be okay to execute sub when branch fails add $1,$2,$3 if $1=0 then delay slot sub $4,$5,$6 becomes becomes becomes add $1,$2,$3 if $2=0 then if $1=0 then add $1,$2,$3 add $1,$2,$3 if $1=0 then sub $4,$5,$6 sub $4,$5,$6 21

22 Compiler Optimizations & Data Hazards Compilers rearrange code to try to avoid load-use stalls Also known as: filling the load delay slot Idea: find an independent instruction to space apart load & use When successful, no stall is introduced dynamically Independent instruction from before or after the load-use No data dependence to the load-use instructions No control dependence to the load-use instructions No exception issues When can t fill the slot Leave it empty, hardware will introduce the stall In original MIPS (no HW stalls), a nop was needed in the slot 22

23 Code Scheduling to Avoid Stalls Recorder code to avoid use of load result in the next instruction C code for A = B + E; C = B + F; 23

24 Stall on branches Control Hazard Solutions Wait until you know the answer (branch CPI becomes 3) Control inserted in ID stage of the pipeline Predict not-taken Assume branch is not taken and continue with PC + 4 If branch is actually not taken, all is good Otherwise, nullify misfetched instructions and restart from PC+4+offset Predict taken Assume branch is taken More complex, cause you also need to predict PC+4+offset Still need to nullify instructions if assumption is wrong Most machines do some type of predictions (taken, not-taken) 24

25 Exceptions and Interrupts Unexpected events requiring changes in flow of control Different ISAs use the terms differently Exception Arises within the CPU E.g., undefined opcode, overflow, syscall, Interrupt E.g., from an external I/O controller (network card) Dealing with them without sacrificing performance is hard 25

26 Handling Exceptions Something bad happens to an instruction Need to stop the machine Fix up the problem (with privileged code like the operating system) Start the machine up again MIPS: exceptions managed by a System Control Coprocessor (CP0) Save PC of offending (or interrupted) instruction In MIPS: Exception Program Counter (EPC) Save indication of the problem In MIPS: Cause register We ll assume 1-bit for the examples 0 for undefined opcode, 1 for overflow Jump to handler code at hardwired address (e.g., 8000_0180) 26

27 An Alternate Mechanism Vectored Interrupts Handler address determined by the cause Avoids the software overhead of selecting the write code to run based on cause register Example: Undefined opcode: C000_0000 Overflow: C000_0020 C000_0040 Instructions either Deal with the interrupt, or Jump to real handler 27

28 Handler Actions Read cause, and transfer to relevant handler A software switch statement May be necessary even if vectored interrupts are available Determine action required If restartable Take corrective action Use EPC to return to program Otherwise Terminate program (e.g. segfault, ) Report error using EPC, cause, 28

29 Precise Exceptions Definition: precise exceptions All previous instruction had completed The faulting instruction was not started None of the next instructions were started No changes to the architecture state (registers, memory) Why are precise exceptions desirable by OS developers? With a single cycle machine, precise exceptions are easy Why? 29

30 Exceptions in a Pipeline Another form of control hazard Consider overflow on add in EX stage Add $1, $2, $1 Prevent $1 from being clobbered by add Complete previous instructions Nullify subsequent instructions Set Cause and EPC register values Transfer control to handler Similar to mispredicted branch Uses much of the same hardware 30

31 Nullifying Instructions Nullify = turn an instruction into a nop ( or pipeline bubble) Reset its RegWrite and MemWrite signals Does not affect the sate of the system Resets its branch and jump signals Does not cause unexpected flow control Mark that is should not raise any exceptions of its own Does not cause unexpected flow control Let if flow down the pipeline 31

32 Dealing with Multiple Exceptions Pipelining overlaps multiple instructions Could have multiple exceptions at once Simple approach: deal with exception from earliest instruction Flush subsequent instructions Necessary for precise exceptions In more complex pipelines Multiple instructions issued per cycle Out-of-order completion Maintaining precise exceptions is difficult! 32

33 Imprecise Exceptions Actually available in several processors Just stop pipeline and save pipeline state Including exception causes Let the handler work out Which instruction(s) had exceptions Which to complete or flush May require manual completion Simplifies hardware, but more complex handler software Not feasible for complex multiple-issue out-of-order pipelines 33

34 Beyond the 5-sstage Pipeline Convert transistors to performance Use transistors to Exploit parallelism Or create it (speculate) Processor generations Simple machine Reuse hardware Pipelined Separate hardware for each stage Super-scalar Multiple port mems, function units Out-of-order Mega-ports, complex scheduling Speculation Multi-core processors Each design has more logic to accomplish same task (but faster) 34

35 Memory System Outline Memory systems basics Memory technologies SRAM, DRAM, Disk Principle of locality Spatial and temporal locality Memory Hierarchies Exploiting locality 35

36 Random Access Memory(RAM) Dynamic Random Access Memory (DRAM) High density, low power, cheap, but slow Dynamic since data must be refreshed regularly ( leaky buckets ) Contents are lost when power is lost Static Random Access Memory (SRAM) Lower density (about 1/10 density of DRAM), higher cost Static since data is held without refresh if power is on Fast access time, often 2 to 10 times faster than DRAM Flash memory Holds contents without power Data written in blocks, generally slow Very cheap 36

37 Principle of locality Programs Have Locality Programs access a relatively small portion of the address space at any given time Can tell what memory locations a program will reference in the future by looking at what it referenced recently in the past Two Types of Locality Temporal Locality if an item has been referenced recently, it will tend to be referenced again soon Spatial Locality if an item has been referenced recently, nearby items will tend to be referenced soon Nearby refers to memory addresses 37

38 Spatial Locality Locality Examples Likely to reference data near recent references Example: for (i=0; i<n; i++) a[i] = Temporal Locality Likely to reference same data that was referenced recently Example: for (i=0; i<n; i++) a[i] = f(a[i-1]); 38

39 Taking Advantages of Locality Memory hierarchy: Store everything on disk Or Flash in some systems Copy recently accessed (and nearby) data to smaller DRAM memory DRAM is called main memory Copy more recently accessed (and nearby) data to smaller SRAM memory Cache memory attached or close to CPU 39

40 Typical Memory Hierarchy: Everything is a Cache for Something Else Access time Capacity Managed by 1 cycle ~500B Software/compiler 1-3 cycles ~64KB hardware 5-10 cycles 1-10MB hardware ~100 cycles ~10GB Software/OS cycles ~100GB Software/OS 40

41 Caches A cache is an interim storage component Functions as a buffer for larger, slower storage components Exploits principles of locality Provide as much inexpensive storage space as possible Offer access speed equivalent to the fastest memory For data in the cache Key is to have the right data cached Computer systems often use multiple caches Cache ideas are not limited to hardware designers Example: Web caches widely used on the Internet 41

42 Memory Hierarchy Levels Block (aka line): unit of copying May be multiple words If accessed data is present in upper level Hit: access satisfied by upper level Hit ratio: hits/accesses If accessed data is absent Miss: block copied from lower level Time taken: miss penalty Miss ratio: misses/accesses= 1 - hit ratio Then data supplied to CPU from upper level 42

43 Average Memory Access Times Need to define an average access time Since some will be fast and some slow Access time = hit time + miss rate x miss penalty The hope is that the hit time will be low and the miss rate low since the miss penalty is so much larger than the hit time Average Memory Access Time (AMAT) Formula can be applied to any level of the hierarchy Access time for that level Can be generalized for the entire hierarchy Average access time that the processor sees for a reference 43

44 How Processor Handles a Miss Assume that cache access occurs in 1 cycle Hit is great, and basic pipeline is fine CPI penalty = miss rate x miss penalty A miss stalls the pipeline (for a instruction or data miss) Stall the pipeline (you don t have the data it needs) Send the address that missed to the memory Instruct main memory to perform a read and wait When access completes, return the data to the processor Restart the instruction 44

45 How to Build A Cache? Big question: locating data in the cache I need to map a large address space into a small memory How do I do that? Can build full associative lookup in hardware, but complex Need to find a simple but effective solution Two common techniques: Direct Mapped Set Associative Further questions Block size (crucial for spatial locality) Replacement policy (crucial for temporal locality) Write policy (writes are always more complex) 45

46 Terminology Block Minimum unit of information transfer between levels of the hierarchy Block addressing varies by technology at each level Blocks are moved one level at a time Hit Data appears in a block in lower numbered level Hit rate Percent of accesses found Hit time Time to access at lower nuumbered level Hit time = Cache access time + Time to determine hit/miss Miss Data was not in lower numbered level and had to be fetched from a higher numbered level Miss rate Percent of misses (1 Hit rate) Miss penalty Overhead in getting data from a higher numbered level Miss penalty = higher level access time + Time to deliver to lower level + Cache replacement/forward to processor time Miss penalty is usually much larger than the hit time 46

47 Direct Mapped Cache Location in cache determined by (main) memory address Direct mapped: only one choice (Block address) modulo (#Blocks in cache) 47

48 Tags and Valid Bits How do we know which particular block is stored in a cache location? Store block address as well as the data Actually, only need the high-order bits Called the tag What if there is no data in a location? Valid bit: 1 = present, 0 = not present Initially 0 48

49 Cache Example 8-blocks, 1 word/block, direct mapped Initial state Index V Tag Data 000 N 001 N 010 N 011 N 100 N 101 N 110 N 111 N 49

50 Cache Example Compulsory/Cold Miss Word addr Binary addr Hit/Miss Cache block Miss 110 Index V Tag Data 000 N 001 N 010 N 011 N 100 N 101 N 110 Y 10 Mem[10110] 111 N 50

51 Cache Example Compulsory/Cold Miss Word addr Binary addr Hit/Miss Cache block Miss 010 Index V Tag Data 000 N 001 N 010 Y 11 Mem[11010] 011 N 100 N 101 N 110 Y 10 Mem[10110] 111 N 51

52 Cache Example Word addr Binary addr Hit/Miss Cache block Hit Hit 010 Hit Index V Tag Data 000 N 001 N 010 Y 11 Mem[11010] 011 N 100 N 101 N 110 Y 10 Mem[10110] 111 N 52

53 Cache Example Word addr Binary addr Hit/Miss Cache block Miss Miss Hit 000 Index V Tag Data 000 Y 10 Mem[10000] 001 N 010 Y 11 Mem[11010] 011 Y 00 Mem[00011] 100 N 101 N 110 Y 10 Mem[10110] 111 N 53

54 Cache Example Replacement Word addr Binary addr Hit/Miss Cache block Miss 010 Index V Tag Data 000 Y 10 Mem[10000] 001 N 010 Y 10 Mem[10010] 011 Y 00 Mem[00011] 100 N 101 N 110 Y 10 Mem[10110] 111 N 54

55 Cache Organization & Access Assumptions 32-bit address 4 Kbyte cache 1024 blocks, 1word/block Steps Use index to read V, tag from cache Compare read tag with tag from address If match, return data & hit signal Otherwise, return miss 55

56 Cache Block Example Assume a 2 n byte direct mapped cache with 2 m byte blocks Byte select The lower m bits Cache index The lower (n-m) bits of the memory address Cache tag The upper (32-n) bits of the memory address 56

57 Block Sizes Larger block sizes take advantage of spatial locality Also incurs larger miss penalty since it takes longer to transfer the block into the cache Large block can also increase the average time or the miss rate Tradeoff in selecting block size Average Access Time = Hit Time *(1-MR) + Miss Penalty * MR 57

58 Direct Mapped Problems: Conflict misses Two blocks that are used concurrently and map to same index Only one can fit in the cache, regardless of cache size No flexibility in placing 2 nd block elsewhere Thrashing If accesses alternate, one block will replace the other before reuse No benefit from caching Conflicts & thrashing can happen quite often 58

59 Fully Associate Cache Opposite extreme in that I has no cache index to hash Use any available entry to store memory elements No conflict misses, only capacity misses Must compare cache tags of all entries to find the desired one 59

60 Acknowledgements These slides contain material from courses: UCB CS152 Stanford EE108B 60

The Processor: Improving the performance - Control Hazards

The Processor: Improving the performance - Control Hazards The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

Chapter 4 The Processor 1. Chapter 4B. The Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always

More information

CENG 3420 Lecture 06: Pipeline

CENG 3420 Lecture 06: Pipeline CENG 3420 Lecture 06: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L06.1 Spring 2019 Outline q Pipeline Motivations q Pipeline Hazards q Exceptions q Background: Flip-Flop Control Signals CENG3420 L06.2

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

CSEE 3827: Fundamentals of Computer Systems

CSEE 3827: Fundamentals of Computer Systems CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University The Processor (3) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

EIE/ENE 334 Microprocessors

EIE/ENE 334 Microprocessors EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009 ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control 6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD Pipelined CPU Break execution into

More information

MIPS An ISA for Pipelining

MIPS An ISA for Pipelining Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12a Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

CSE 2021: Computer Organization

CSE 2021: Computer Organization CSE 2021: Computer Organization Lecture-12 Caches-1 The basics of caches Shakil M. Khan Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ... CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

CS 251, Winter 2019, Assignment % of course mark

CS 251, Winter 2019, Assignment % of course mark CS 251, Winter 2019, Assignment 5.1.1 3% of course mark Due Wednesday, March 27th, 5:30PM Lates accepted until 1:00pm March 28th with a 15% penalty 1. (10 points) The code sequence below executes on a

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

CS 251, Winter 2018, Assignment % of course mark

CS 251, Winter 2018, Assignment % of course mark CS 251, Winter 2018, Assignment 5.0.4 3% of course mark Due Wednesday, March 21st, 4:30PM Lates accepted until 10:00am March 22nd with a 15% penalty 1. (10 points) The code sequence below executes on a

More information

Chapter 4. The Processor. Jiang Jiang

Chapter 4. The Processor. Jiang Jiang Chapter 4 The Processor Jiang Jiang jiangjiang@ic.sjtu.edu.cn [Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2008, MK] Chapter 4 The Processor 2 Introduction CPU performance

More information

DEE 1053 Computer Organization Lecture 6: Pipelining

DEE 1053 Computer Organization Lecture 6: Pipelining Dept. Electronics Engineering, National Chiao Tung University DEE 1053 Computer Organization Lecture 6: Pipelining Dr. Tian-Sheuan Chang tschang@twins.ee.nctu.edu.tw Dept. Electronics Engineering National

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike

More information

Processor (II) - pipelining. Hwansoo Han

Processor (II) - pipelining. Hwansoo Han Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are

More information

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos Unresolved data hazards 81 Unresolved data hazards Arithmetic instructions following a load, and reading the register updated by the load: if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy

Memory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Chapter 5. Memory Technology

Chapter 5. Memory Technology Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong 1 EE 457 Unit 8 Exceptions What Happens When Things Go Wrong 2 What are Exceptions? Exceptions are rare events triggered by the hardware and forcing the processor to execute a software handler HW Interrupts

More information

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27

More information

ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization. Caches ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates

More information

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University Prof. Mi Lu TA: Ehsan Rohani Laboratory Exercise #8 Dynamic Branch Prediction

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

COSC121: Computer Systems. ISA and Performance

COSC121: Computer Systems. ISA and Performance COSC121: Computer Systems. ISA and Performance Jeremy Bolton, PhD Assistant Teaching Professor Constructed using materials: - Patt and Patel Introduction to Computing Systems (2nd) - Patterson and Hennessy

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

Virtual memory. Hung-Wei Tseng

Virtual memory. Hung-Wei Tseng Virtual memory Hung-Wei Tseng Why virtual memory How VM works VM and cache Outline 2 Virtual memory 3 Scenario I An application is design on machine A with memory size X. Can we safely execute the same

More information

Virtual memory. Hung-Wei Tseng

Virtual memory. Hung-Wei Tseng Virtual memory Hung-Wei Tseng Why virtual memory How VM works VM and cache Outline 4 Virtual memory 5 Scenario I An application is design on machine A with memory size X. Can we safely execute the same

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Recall. ISA? Instruction Fetch Instruction Decode Operand Fetch Execute Result Store Next Instruction Instruction Format or Encoding how is it decoded? Location of operands and

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

Processor Design Pipelined Processor (II) Hung-Wei Tseng

Processor Design Pipelined Processor (II) Hung-Wei Tseng Processor Design Pipelined Processor (II) Hung-Wei Tseng Recap: Pipelining Break up the logic with pipeline registers into pipeline stages Each pipeline registers is clocked Each pipeline stage takes one

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor? Chapter 4 The Processor 2 Introduction We will learn How the ISA determines many aspects

More information

Memory Hierarchy: Caches, Virtual Memory

Memory Hierarchy: Caches, Virtual Memory Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories

More information

EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts

EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts Prof. Sherief Reda School of Engineering Brown University S. Reda EN2910A FALL'15 1 Classical concepts (prerequisite) 1. Instruction

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Components of a Computer Processor

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27) Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining

More information

Computer Architecture CS372 Exam 3

Computer Architecture CS372 Exam 3 Name: Computer Architecture CS372 Exam 3 This exam has 7 pages. Please make sure you have all of them. Write your name on this page and initials on every other page now. You may only use the green card

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

Complex Pipelines and Branch Prediction

Complex Pipelines and Branch Prediction Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle

More information

Memory. Principle of Locality. It is impossible to have memory that is both. We create an illusion for the programmer. Employ memory hierarchy

Memory. Principle of Locality. It is impossible to have memory that is both. We create an illusion for the programmer. Employ memory hierarchy Datorarkitektur och operativsystem Lecture 7 Memory It is impossible to have memory that is both Unlimited (large in capacity) And fast 5.1 Intr roduction We create an illusion for the programmer Before

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information