ECE331: Hardware Organization and Design

Audra Allen
5 years ago

Lecture 24: Cache Performance Analysis
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Overview
- Last time: associative caches
- How do we calculate cache hit rate? What about processor performance?
- Need to consider the cache replacement policy
  - How to select what to remove from an associative cache
  - Much research has gone into selecting the right choices
  - Research is ongoing for chips with multiple processors
- Need to know how to calculate these values for the exam

Data valid, tag OK, so read at the offset and return word d.
[Figure: cache entry with fields labeled Valid, Tag, and Index; data words a, b, c, d occupy byte offsets 0x0-3, 0x4-7, 0x8-b, 0xc-f.]

Two-way Set Associative Cache
- Two direct-mapped caches operate in parallel
- Cache Index selects a set from the cache (a set includes 2 blocks)
- The two tags in the set are compared in parallel
- Data is selected based on the tag result
[Figure: two banks of Valid, Cache Tag, and Cache Data entries addressed by Cache Index; two tag comparators drive Sel0 and Sel1 of a mux that outputs the Cache Block, and the comparator results are ORed to produce Hit.]

Associativity Example
- Compare 4-block caches: direct mapped, 2-way set associative, fully associative
- Block access sequence: 0, 8, 0, 6, 8

Direct mapped
Note: sets are shown horizontally in this table

Block address  Cache index  Hit/miss  Cache content after access
                                      Index 0  Index 1  Index 2  Index 3
0              0            miss      Mem[0]   -        -        -
8              0            miss      Mem[8]   -        -        -
0              0            miss      Mem[0]   -        -        -
6              2            miss      Mem[0]   -        Mem[6]   -
8              0            miss      Mem[8]   -        Mem[6]   -

Associativity Example

2-way set associative
Note: sets are shown horizontally in this table

Block address  Cache index  Hit/miss  Cache content after access
                                      Set 0             Set 1
0              0            miss      Mem[0]  -         -       -
8              0            miss      Mem[0]  Mem[8]    -       -
0              0            hit       Mem[0]  Mem[8]    -       -
6              0            miss      Mem[0]  Mem[6]    -       -
8              0            miss      Mem[8]  Mem[6]    -       -

Fully associative

Block address  Hit/miss  Cache content after access
0              miss      Mem[0]  -       -       -
8              miss      Mem[0]  Mem[8]  -       -
0              hit       Mem[0]  Mem[8]  -       -
6              miss      Mem[0]  Mem[8]  Mem[6]  -
8              hit       Mem[0]  Mem[8]  Mem[6]  -
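The three tables can be reproduced with a small simulator. The sketch below is illustrative only: the function name `simulate` and its parameters are made up for this example, and it assumes LRU replacement and a set index of block address modulo the number of sets, which is consistent with the tables.

```python
from collections import OrderedDict

def simulate(block_addresses, num_blocks, ways):
    """Return a 'hit'/'miss' label per access, assuming LRU replacement."""
    num_sets = num_blocks // ways
    # One OrderedDict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in block_addresses:
        s = sets[addr % num_sets]
        if addr in s:
            s.move_to_end(addr)        # mark as most recently used
            results.append("hit")
        else:
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used block
            s[addr] = True
            results.append("miss")
    return results

sequence = [0, 8, 0, 6, 8]
direct = simulate(sequence, num_blocks=4, ways=1)   # direct mapped
two_way = simulate(sequence, num_blocks=4, ways=2)  # 2-way set associative
fully = simulate(sequence, num_blocks=4, ways=4)    # fully associative
print(direct)
print(two_way)
print(fully)
```

Setting `ways=1` gives the direct-mapped case and `ways=num_blocks` the fully associative case, so a single routine covers all three organizations.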

Block Size and Miss Penalty
- As block size increases, the cost of a miss also increases
- Miss penalty: time to fetch the block from the next lower level of the hierarchy and load it into the cache
- With very large blocks, the increase in miss penalty overwhelms the decrease in miss rate
- Average access time can be minimized if the memory system is designed well
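The trade-off is commonly summarized by the average memory access time (AMAT) formula from Patterson & Hennessy. The slide does not state it explicitly, so it is included here as background; the numbers in the second line are hypothetical and chosen only for illustration.

```latex
\[
\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}
\]
% Hypothetical values: hit time 1 cycle, miss rate 5%, miss penalty 20 cycles
\[
\text{AMAT} = 1 + 0.05 \times 20 = 2 \text{ cycles}
\]
```

A larger block raises the miss penalty term while, up to a point, lowering the miss rate term, which is why an intermediate block size tends to minimize the product.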

Miss Rate Versus Block Size
[Figure: miss rate (0% to 40%) plotted against block size (bytes), one curve per total cache size: 1 KB, 8 KB, 16 KB, 64 KB, 256 KB.]

Writing to the Cache and Block Replacement
- Need to keep the cache consistent with memory
  - Write to cache and memory simultaneously: write-through
  - Or write to the cache and mark the block as dirty; it must eventually be copied back to memory: write-back
- Need to make space in the cache for a new entry. Which line should be evicted?
  - Ideal: the line with the longest time until its next access
  - Least recently used: complicated
  - Random selection: simple
  - Effect on hit rate is relatively small

Replacement Policy
- Direct mapped: no choice
- Set associative
  - Prefer a non-valid entry, if there is one
  - Otherwise, choose among the entries in the set
- Least recently used (LRU)
  - Choose the entry unused for the longest time
  - Simple for 2-way, manageable for 4-way, too hard beyond that
- Random
  - Gives approximately the same performance as LRU for high associativity

Replacement Policy
- Direct-mapped cache: easy, since there is only one candidate block to replace
- Fully associative and set-associative caches: two strategies
  - Random
  - Least recently used (LRU): replace the block that has gone unaccessed for the longest time (principle of temporal locality)
[Figure: two-way set-associative cache; Cache Index selects a set from two banks of Valid, Cache Tag, and Cache Data entries, two tag comparators drive Sel0 and Sel1 of a mux that outputs the Cache Block, their results are ORed to produce Hit, and a Reference field is attached to the set.]
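For a 2-way set, LRU needs only one bit per set, which is why it is called simple in that case. The sketch below is a hypothetical model, not the hardware shown in the figure: the class name `TwoWaySet` and the string tags are made up for illustration.

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with a single LRU bit."""

    def __init__(self):
        self.tags = [None, None]  # None marks a non-valid entry
        self.lru = 0              # index of the least recently used way

    def access(self, tag):
        """Return True on a hit, False on a miss (filling or replacing a way)."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            # Prefer a non-valid entry; otherwise evict the LRU way.
            way = self.tags.index(None) if None in self.tags else self.lru
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way        # the other way is now least recently used
        return hit

s = TwoWaySet()
trace = ["A", "B", "A", "C", "B"]
hits = [s.access(t) for t in trace]
print(hits, s.tags)
```

With more than two ways a single bit no longer captures the full recency order, which is one reason exact LRU becomes harder as associativity grows.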

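Measuring Cache Performance

Without cache misses:
$$\text{CPU time} = \text{Execution cycles} \times \text{Clock cycle time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

With cache misses:
$$\text{CPU time} = (\text{Execution cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}$$

$$\text{Memory stall cycles} = \text{Memory accesses} \times \text{Miss rate} \times \text{Miss penalty} = \text{Instruction count} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}$$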

Example

Question: The cache miss penalty is 50 cycles, and all instructions take 2.0 cycles without memory stalls. Assume a cache miss rate of 2% and 1.33 (why?) memory references per instruction. What is the impact of the cache?

Answer: with instruction count IC and cycle time τ,
$$\text{CPU time} = IC \times \left(\text{CPI} + \frac{\text{Memory stall cycles}}{\text{Instruction}}\right) \times \tau$$

Performance including cache misses:
$$\text{CPU time} = IC \times (2.0 + (1.33 \times 0.02 \times 50)) \times \tau = IC \times 3.33 \times \tau$$

For a perfect cache that never misses:
$$\text{CPU time} = IC \times 2.0 \times \tau$$

Hence, including the memory hierarchy stretches CPU time by a factor of 3.33 / 2.0 ≈ 1.67.
Without a memory hierarchy, however, the CPI would increase to 2.0 + 50 × 1.33 = 68.5, a factor of over 30 times longer.
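The arithmetic in this example can be checked with a few lines of code. The helper name `cpi_with_stalls` is made up for this sketch, and the "no memory hierarchy" case is modeled as a miss rate of 1.0, meaning every reference pays the full 50-cycle penalty.

```python
def cpi_with_stalls(base_cpi, refs_per_instr, miss_rate, miss_penalty):
    """Effective CPI = base CPI + memory stall cycles per instruction."""
    return base_cpi + refs_per_instr * miss_rate * miss_penalty

base_cpi = 2.0
real = cpi_with_stalls(base_cpi, 1.33, 0.02, 50)     # cache with 2% miss rate
no_cache = cpi_with_stalls(base_cpi, 1.33, 1.0, 50)  # every reference misses
slowdown = real / base_cpi                           # versus a perfect cache

print(f"CPI with cache misses: {real:.2f}")
print(f"Slowdown vs. perfect cache: {slowdown:.3f}")
print(f"CPI without a cache: {no_cache:.1f}")
```

Since IC and τ appear in every CPU time expression, they cancel in the ratios, so comparing CPI values is enough.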

Problem 1
The following is a sequence of address references given as byte addresses:
1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17
Assuming a direct-mapped cache with 16 one-byte blocks that is initially empty, label each reference in the list as a hit or a miss.

Reference  Modulo 16  Hit or Miss
1          1          Miss
4          4          Miss
8          8          Miss
5          5          Miss
20         4          Miss
17         1          Miss
19         3          Miss
56         8          Miss
9          9          Miss
11         11         Miss
4          4          Miss
43         11         Miss
5          5          Hit
6          6          Miss
9          9          Hit
17         1          Hit
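A minimal sketch of how the Problem 1 answers can be checked mechanically. The function name `direct_mapped` is made up for this example; because each block holds one byte, the cache line is modeled as simply remembering which address it currently holds.

```python
def direct_mapped(refs, num_blocks):
    """Label each byte reference as a hit or miss for one-byte blocks."""
    cache = {}  # cache index -> address currently stored there
    labels = []
    for addr in refs:
        index = addr % num_blocks
        if cache.get(index) == addr:
            labels.append("Hit")
        else:
            cache[index] = addr  # a miss replaces whatever was in the line
            labels.append("Miss")
    return labels

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
labels = direct_mapped(refs, 16)
for addr, label in zip(refs, labels):
    print(addr, addr % 16, label)
```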

Problem 2
The following is a sequence of address references given as byte addresses:
1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17
Assuming a direct-mapped cache with 4 blocks of 4 bytes each that is initially empty, label each reference in the list as a hit or a miss.
[Figure: address fields: Block Address, Byte Address]

Problem 2
[Figure: cache contents by Block, with columns Byte0, Byte1, Byte2, Byte3]

Reference  Hit/Miss
1          Miss
4          Miss
8          Miss
5          Hit
20         Miss
17         Miss
19         Hit
56         Miss
9          Miss
11         Hit
4          Miss
43         Miss
5          Hit
6          Hit
9          Miss
17         Hit

(Block Address and Byte Address columns: values not recoverable.)
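The block and byte fields for each reference can be derived from the byte address. The sketch below assumes the cache index is (address div 4) mod 4 and the byte offset is address mod 4; the function name `split` and the constants are illustrative, and this interpretation of the slide's column headings is an assumption.

```python
BLOCK_BYTES = 4  # bytes per block
NUM_BLOCKS = 4   # blocks in the direct-mapped cache

def split(addr):
    """Split a byte address into (tag, cache index, byte offset)."""
    offset = addr % BLOCK_BYTES
    block = addr // BLOCK_BYTES  # memory block number
    return block // NUM_BLOCKS, block % NUM_BLOCKS, offset

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
stored = [None] * NUM_BLOCKS  # tag held by each cache block, None if empty
labels = []
for addr in refs:
    tag, index, offset = split(addr)
    if stored[index] == tag:
        labels.append("Hit")
    else:
        stored[index] = tag   # fetch the whole 4-byte block on a miss
        labels.append("Miss")
    print(addr, index, offset, labels[-1])
```

Compared with one-byte blocks, references such as 5 and 6 now hit because they share a block with address 4, which illustrates spatial locality.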

Problem 3
The following is a sequence of address references given as byte addresses:
1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17
Assuming a 2-way set-associative cache with 16 one-byte blocks that is initially empty, label each reference in the list as a hit or a miss.
[Figure: address fields: Set Address, Byte Address]

Problem 3

Reference  Set Address (modulo 8)  Hit or Miss
1          1                       Miss
4          4                       Miss
8          0                       Miss
5          5                       Miss
20         4                       Miss
17         1                       Miss
19         3                       Miss
56         0                       Miss
9          1                       Miss
11         3                       Miss
4          4                       Hit
43         3                       Miss
5          5                       Hit
6          6                       Miss
9          1                       Hit
17         1                       Hit

[Figure: cache contents by Set Address]
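The Problem 3 labels are consistent with LRU replacement within each set. The slide does not name the policy for this problem, so LRU is an assumption in the sketch below, and the function name `set_associative` is made up for illustration.

```python
def set_associative(refs, num_sets, ways):
    """Label references for one-byte blocks, assuming LRU within each set."""
    sets = [[] for _ in range(num_sets)]  # each list is ordered LRU -> MRU
    labels = []
    for addr in refs:
        lines = sets[addr % num_sets]
        if addr in lines:
            lines.remove(addr)            # will be re-appended as MRU
            labels.append("Hit")
        else:
            if len(lines) == ways:
                lines.pop(0)              # evict the least recently used
            labels.append("Miss")
        lines.append(addr)
    return labels, sets

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
labels, final_sets = set_associative(refs, num_sets=8, ways=2)
for addr, label in zip(refs, labels):
    print(addr, addr % 8, label)
```

The second access to address 4 hits here because address 20 can share set 4 with it, whereas a direct-mapped cache of the same size evicts 4 when 20 arrives.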

Problem 4
2-way set associative, four bytes per block, 4-word cache
[Figure: cache contents for Set # 0 and 1, with columns Block, Byte0, Byte1, Byte2, Byte3]

Reference  Set Address (shift right 2, modulo 2)  Hit or Miss
1          0                                      Miss
4          1                                      Miss
8          0                                      Miss
5          1                                      Hit
20         1                                      Miss
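The "shift right 2, modulo 2" step maps directly onto bit operations. In the sketch below the constant and function names are illustrative. The hit check uses a plain set of resident blocks, which is valid only because no cache set receives more than two distinct blocks in these five references, so nothing is evicted.

```python
OFFSET_BITS = 2  # 4 bytes per block -> 2 offset bits
SET_BITS = 1     # 4 blocks / 2 ways = 2 sets -> 1 set-index bit

def set_index(addr):
    """Drop the byte offset, then keep the low set-index bits."""
    return (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)

refs = [1, 4, 8, 5, 20]
indices = [set_index(a) for a in refs]

resident = set()  # memory block numbers currently cached (no evictions here)
labels = []
for a in refs:
    block = a >> OFFSET_BITS
    labels.append("Hit" if block in resident else "Miss")
    resident.add(block)

print(indices)
print(labels)
```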

Summary
- Today: cache performance
- Need to understand cache replacement strategy
  - Least recently used
  - Random
- Determine CPI given cache misses
- Determine miss rates for a cache
- Associative caches require more hardware
  - More comparison hardware
  - More difficult to replace blocks
