ECE331: Hardware Organization and Design


Phoebe Gibbs
5 years ago

Lecture 27: Midterm 2 Review
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Midterm 2 Review

Midterm will cover:
- Section 1.6: Processor Performance
- Sections 4.5-4.8: Pipelining, Data and Control Hazard methods
- Sections 5.1-5.…: Memory Hierarchy & Caches

Processor Performance

CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

→ used to make comparisons between different CPUs

ECE331: Midterm 2 Review
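The CPU time equation can be checked with a few lines of Python. This is a minimal sketch: the function name `cpu_time` and the two machine configurations are hypothetical, chosen only to show how the three factors combine when comparing CPUs.

```python
def cpu_time(instruction_count, cpi, clock_period_s):
    """CPU time = instructions/program * cycles/instruction * seconds/cycle."""
    return instruction_count * cpi * clock_period_s


# Hypothetical machines running the same program (illustrative numbers only).
time_a = cpu_time(instruction_count=1_000_000, cpi=2.0, clock_period_s=1e-9)
time_b = cpu_time(instruction_count=1_000_000, cpi=1.5, clock_period_s=1.25e-9)

# The machine with the shorter CPU time is the faster one for this program.
faster = "A" if time_a < time_b else "B"
print(time_a, time_b, faster)
```

Machine B has the slower clock but still wins here because its lower CPI outweighs the longer cycle, which is why all three factors have to be compared together.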

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C.
For example: A = unconflicted instruction, B = jump instruction, C = beq instruction.

[Table: CPI for class, IC in sequence 1, IC in sequence 2, for classes A, B, C and Total; IC → instruction count]

- Sequence 1: IC = 5, Clock Cycles = 10, Avg. CPI = 10/5 = 2.0
- Sequence 2: IC = 6, Clock Cycles = 9, Avg. CPI = 9/6 = 1.5
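The average CPI calculation can be sketched as follows. The per-class CPI values and instruction counts below are assumptions: the table entries did not survive in this copy of the slide, so the numbers were picked to reproduce the totals it states (IC = 5 with 10 cycles, IC = 6 with 9 cycles).

```python
# ASSUMED per-class values (not recovered from the slide); they reproduce its totals.
cpi_per_class = {"A": 1, "B": 2, "C": 3}
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts, cpi_table):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    ic = sum(counts.values())
    cycles = sum(n * cpi_table[cls] for cls, n in counts.items())
    return ic, cycles, cycles / ic


results = {name: summarize(counts, cpi_per_class) for name, counts in sequences.items()}
for name, (ic, cycles, avg_cpi) in results.items():
    print(f"Sequence {name}: IC = {ic}, cycles = {cycles}, avg CPI = {avg_cpi}")
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone is not a reliable performance measure.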

Datapath With Control (no jumps)
[Figure: single-cycle datapath with control; instruction fields op, rs, rt, rd, addr/func]

Pipelining

Pipelining versus single-cycle datapath:
- Breaks the task up into smaller chunks
- Length of any one instruction remains the same
- Inefficiencies at startup and shutdown of pipeline

Hazards:
- Structural Hazards
- Data Hazards
- Control Hazards

Code Scheduling to Avoid Stalls

Reorder code to avoid use of a load result in the next instruction.
C code for A = B + E; C = B + F;

Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    (stall)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    (stall)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
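The 13-cycle and 11-cycle totals can be reproduced with a small counting sketch. It assumes a 5-stage pipeline with forwarding, where the only stall is one cycle when an instruction uses the result of the `lw` immediately before it. The tuple encoding `(op, destination, sources)` and the function name `count_cycles` are hypothetical helpers, not part of the lecture.

```python
def count_cycles(program, stages=5):
    """Cycles = instructions + pipeline fill (stages - 1) + load-use stalls."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # Load-use hazard: the loaded value is needed by the very next instruction.
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return len(program) + (stages - 1) + stalls


original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

reordered = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

original_cycles = count_cycles(original)
reordered_cycles = count_cycles(reordered)
print(original_cycles, reordered_cycles)
```

Moving the third `lw` up separates each load from its first use by at least one instruction, so both stalls disappear.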

Datapath with Forwarding Hardware
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, Control, and a Forward Unit]

Control Hazard
Fix branch hazard by waiting: introduce stalls.
[Figure: pipeline diagram, instructions in order: beq, stall, stall, stall, lw, Inst 3]

Reducing branch penalty through HW design
See Figures 4.65 and 4.66 in the book.

Branch Prediction

Easiest: static prediction
- Always taken, always not taken
- Opcode based
- Displacement based (forward not taken, backward taken)
- Compiler directed (branch likely, branch not likely)

Dynamic prediction: prediction per branch in program
- 1-bit predictor: remember last taken/not taken per branch
- Use a branch-history table (BHT) with 1-bit entries
- Use part of the PC (low-order bits) to index the table
- Multiple branches may share the same bit
- Invert the bit if the prediction is wrong

[Figure: Branch PC indexes the BHT, entries Predictor 0, Predictor 1, ..., Predictor 127]
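The 1-bit dynamic predictor can be sketched in Python as below. The 128-entry table matches the figure (Predictor 0 to Predictor 127). The class name, the initial "not taken" state, and dropping the two byte-offset bits of a word-aligned PC before indexing are assumptions made for this illustration.

```python
class OneBitPredictor:
    """Branch-history table with one bit per entry (True = predict taken)."""

    def __init__(self, entries=128):
        self.entries = entries
        # ASSUMPTION: every entry starts out predicting "not taken".
        self.table = [False] * entries

    def _index(self, pc):
        # ASSUMPTION: instructions are word aligned, so skip the 2 byte-offset
        # bits and use the next low-order PC bits as the table index.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        i = self._index(pc)
        if self.table[i] != taken:
            self.table[i] = not self.table[i]  # invert the bit on a wrong prediction


def count_mispredictions(predictor, pc, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop branch that is taken three times and then falls through, run twice.
loop_outcomes = [True, True, True, False] * 2
mispredictions = count_mispredictions(OneBitPredictor(), 0x400000, loop_outcomes)
print(mispredictions)

# Two branches whose PCs differ by 128 words share the same table entry.
shared = OneBitPredictor()
shares_entry = shared._index(0x400000) == shared._index(0x400000 + 128 * 4)
```

The loop branch is mispredicted twice per pass (once on loop entry, once on loop exit), which is the known weakness of a 1-bit scheme.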

Memory Hierarchy Pyramid
[Figure: pyramid with the Processor (CPU) at the top, followed by Level 1, Level 2, Level 3, ..., Level n; transfer datapath: bus; horizontal axis: size of memory at each level]
- Decreasing distance from CPU: decreasing access time (memory latency)
- Increasing distance from CPU: decreasing cost / MB


Basic Philosophy for Cache
- Move data into smaller, faster memory
- Operate on it
- Move it back to larger, cheaper memory
- How do we keep track of whether it changed?
- What if we run out of space in the smaller, faster memory?

Important concepts: latency, bandwidth

Memory Hierarchy: Terminology
- Hit: data appears in the upper level (in Block X)
- Hit Rate: the fraction of memory accesses found in the upper level
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
- Miss Rate = 1 - (Hit Rate)
- Hit Time: time to access the upper level, which consists of the time to determine hit/miss + the upper-level access time
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Note: Hit Time << Miss Penalty

[Figure: the processor exchanges data with the upper level (Block X); the upper level exchanges blocks with the lower level (Block Y)]

Accessing Data in a Direct-Mapped Cache

Three types of events:
- cache hit: the cache block is valid and contains the proper address, so read the desired word
- cache miss: nothing is in the cache in the appropriate block, so fetch from memory
- cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory

Cache access procedure:
(1) Use the index bits to select a cache block
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits
(3) If they match, use the offset to read out the word/byte
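The three events and the access procedure above can be sketched as a tiny simulator. It only tracks valid bits and tags, not the data itself. The class name, the 4-block by 4-byte geometry, and the address trace are hypothetical choices for illustration.

```python
class DirectMappedCache:
    """Tracks only valid bits and tags, which is enough to classify accesses."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.valid = [False] * num_blocks
        self.tags = [None] * num_blocks

    def access(self, address):
        block_number = address // self.block_size  # strip the byte offset
        index = block_number % self.num_blocks     # (1) index selects the block
        tag = block_number // self.num_blocks
        # (2) valid bit set and tags match -> (3) the word can be read out
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        event = "miss, block replacement" if self.valid[index] else "miss"
        # Fetch from memory: install the new block in the selected slot.
        self.valid[index] = True
        self.tags[index] = tag
        return event


cache = DirectMappedCache(num_blocks=4, block_size=4)
events = [cache.access(addr) for addr in (13, 14, 29, 13)]
print(events)
```

Addresses 13 and 29 map to the same cache block with different tags, so they keep evicting each other even though the rest of the cache is empty.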

Selecting Part of a Block (block size > 1 byte)

If block size > 1, the rightmost bits of the index are really the offset within the indexed block.

Address fields: TAG | INDEX | OFFSET
- Tag: to check if we have the correct block
- Index: to select a block in the cache
- Offset: byte offset within the block

Example: block size of 8 bytes; select byte 4 (or the 2nd word).
[Figure: memory address split into tag, cache index, and byte offset]
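Splitting an address into the three fields comes down to shifts and masks. The sketch below assumes power-of-two sizes; the 4-block cache and the address `0b110100` are hypothetical, since only the 8-byte block size and byte 4 come from the example.

```python
def split_address(address, block_size, num_blocks):
    """Return (tag, index, offset) for a direct-mapped cache.

    Assumes block_size and num_blocks are powers of two.
    """
    offset_bits = block_size.bit_length() - 1  # log2(block_size)
    index_bits = num_blocks.bit_length() - 1   # log2(num_blocks)
    offset = address & (block_size - 1)
    index = (address >> offset_bits) & (num_blocks - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# 8-byte blocks give a 3-bit offset; the 4-block cache is an assumed size.
# Address 0b1_10_100: tag = 1, index = 0b10, offset = 0b100 (byte 4).
tag, index, offset = split_address(0b110100, block_size=8, num_blocks=4)
print(tag, index, offset)
```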

Measuring Cache Performance

Components of CPU time:
- Program execution cycles (includes cache hit time)
- Memory stall cycles (mainly from cache misses)

With simplifying assumptions:

Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
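The two forms of the stall-cycle equation give the same answer, which a short sketch can confirm. All numbers and function names below are hypothetical and serve only to exercise the formulas.

```python
def stall_cycles_from_accesses(memory_accesses, miss_rate, miss_penalty):
    """Memory stall cycles = memory accesses * miss rate * miss penalty."""
    return memory_accesses * miss_rate * miss_penalty


def stall_cycles_from_instructions(instructions, misses_per_instruction, miss_penalty):
    """Memory stall cycles = instructions * misses/instruction * miss penalty."""
    return instructions * misses_per_instruction * miss_penalty


# Hypothetical program: 1000 instructions, 1.5 memory accesses per instruction,
# 4% miss rate, 100-cycle miss penalty.
instructions = 1000
accesses_per_instruction = 1.5
miss_rate = 0.04
miss_penalty = 100

by_accesses = stall_cycles_from_accesses(
    instructions * accesses_per_instruction, miss_rate, miss_penalty
)
by_instructions = stall_cycles_from_instructions(
    instructions, accesses_per_instruction * miss_rate, miss_penalty
)
print(by_accesses, by_instructions)
```

The link between the two forms is that misses per instruction equals memory accesses per instruction times the miss rate.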

Average Access Time

Hit time is also important for performance.

Average memory access time (AMAT):

AMAT = Hit rate × Hit time + Miss rate × Miss penalty
AMAT ≈ Hit time + Miss rate × Miss penalty

since the hit rate is approximately equal to one.

Example: CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%

AMAT = 1 + 0.05 × 20 = 2 cycles = 2 ns (2 cycles per instruction)
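The example can be recomputed with both forms of the AMAT expression. This is a small sketch; the function names are hypothetical, and the inputs are the ones given in the example.

```python
def amat_weighted(hit_rate, hit_time, miss_penalty):
    """AMAT = hit rate * hit time + miss rate * miss penalty."""
    miss_rate = 1 - hit_rate
    return hit_rate * hit_time + miss_rate * miss_penalty


def amat_approx(hit_time, miss_rate, miss_penalty):
    """AMAT ~ hit time + miss rate * miss penalty (hit rate taken as about 1)."""
    return hit_time + miss_rate * miss_penalty


clock_ns = 1.0  # 1 ns clock
approx_cycles = amat_approx(hit_time=1, miss_rate=0.05, miss_penalty=20)
weighted_cycles = amat_weighted(hit_rate=0.95, hit_time=1, miss_penalty=20)
approx_ns = approx_cycles * clock_ns
print(approx_cycles, weighted_cycles, approx_ns)
```

The approximate form yields the 2 cycles (2 ns) quoted in the example; the weighted form comes out slightly lower because it scales the hit time by the 95% hit rate.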

Set Associative Cache: Addressing

From the main memory address: TAG | INDEX/Set # | OFFSET
- Tag: to check if we have the correct block anywhere in the set
- Index: to select a set in the cache
- Offset: byte offset

Example: main memory address 13 (001101) with 16 bytes of cache arranged in 4 blocks of 4 bytes each
- Direct mapped: tag 00, index 11, offset 01
- 2-way associative: tag 001, index 1, offset 01
- 4-way associative: tag 0011, index -, offset 01

Notice: the size of the tag grows as associativity increases.
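The field widths in this example follow directly from the cache geometry, as the sketch below shows. The helper names are hypothetical, and the 6-bit address width is taken from the binary form 001101 used in the example.

```python
def field_widths(cache_bytes, block_bytes, ways, address_bits):
    """Return (tag_bits, index_bits, offset_bits); sizes must be powers of two."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_bits(address, cache_bytes, block_bytes, ways, address_bits):
    """Return the (tag, index, offset) fields as binary strings."""
    tag_bits, index_bits, _ = field_widths(cache_bytes, block_bytes, ways, address_bits)
    bits = format(address, f"0{address_bits}b")
    return bits[:tag_bits], bits[tag_bits:tag_bits + index_bits], bits[tag_bits + index_bits:]


# Address 13 in a 16-byte cache with 4-byte blocks, for 1, 2 and 4 ways.
fields = {ways: split_bits(13, 16, 4, ways, address_bits=6) for ways in (1, 2, 4)}
for ways, (tag, index, offset) in fields.items():
    print(f"{ways}-way: tag {tag}, index {index or '-'}, offset {offset}")
```

Each doubling of associativity halves the number of sets, so one bit moves from the index to the tag; in the fully associative (4-way) case the index field is empty.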

4-Way Set Associative Cache Organization

Allow a block anywhere in a set.

Advantages:
- Better hit rate

Disadvantages:
- More tag bits
- More hardware
- Higher access time

[Figure: a four-way set-associative cache, block size = 4 bytes, cache size = 4096 bytes]
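A lookup in the 4096-byte, 4-way cache from the figure can be sketched as below. The geometry (4-byte blocks, 4 ways, hence 256 sets) comes from the figure caption. The class name, the address trace, and the least-recently-used replacement policy are assumptions; the slide does not name a replacement policy.

```python
class SetAssociativeCache:
    """Tag-only model: each set holds up to `ways` tags."""

    def __init__(self, cache_bytes=4096, block_bytes=4, ways=4):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = cache_bytes // block_bytes // ways
        # Each set is a list of tags, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss."""
        block_number = address // self.block_bytes
        index = block_number % self.num_sets
        tag = block_number // self.num_sets
        tags = self.sets[index]
        if tag in tags:           # the tag is compared against every way in the set
            tags.remove(tag)
            tags.append(tag)      # mark as most recently used
            return True
        if len(tags) == self.ways:
            tags.pop(0)           # ASSUMED policy: evict the least recently used block
        tags.append(tag)
        return False


cache = SetAssociativeCache()
num_sets = cache.num_sets
# Addresses 1024 bytes apart all map to set 0 in this geometry.
trace = [0, 1024, 2048, 3072, 0, 4096, 1024]
hits = [cache.access(addr) for addr in trace]
print(num_sets, hits)
```

Four conflicting blocks coexist in one set, so the second access to address 0 hits; the fifth distinct block then forces an eviction. Searching every way of the set on each access is the extra hardware cost listed among the disadvantages.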
