ECE 2300 Digital Logic & Computer Organization. More Caches


Marilynn Floyd
5 years ago

1 ECE 2300 Digital Logic & Computer Organization
Spring 2018
More Caches

2 Announcements
Prelim 2 stats: High: 79.5 (out of 80), Mean: 65.9, Median: 68
Prelab 5(C) deadline extended to Saturday 3pm
No further extension (aka slip days) allowed

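3 Hexadecimal Notation (used in HW7)
Often convenient to write binary (base-2) numbers as hexadecimal (base-16) numbers
  Fewer digits: 4 bits per hex digit
  Less error prone: easy to misread a long string of 1's and 0's (such as a memory address)

Binary | Hex | Decimal    Binary | Hex | Decimal
0000   | 0   | 0          1000   | 8   | 8
0001   | 1   | 1          1001   | 9   | 9
0010   | 2   | 2          1010   | A   | 10
0011   | 3   | 3          1011   | B   | 11
0100   | 4   | 4          1100   | C   | 12
0101   | 5   | 5          1101   | D   | 13
0110   | 6   | 6          1110   | E   | 14
0111   | 7   | 7          1111   | F   | 15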

4 Converting from Binary to Hex
Every group of four bits is a hex digit
Start grouping from the right-hand side
For example, the hex digits A 8 F 4 D 7 correspond to the bit groups 1010 1000 1111 0100 1101 0111
This is not a new machine representation; it is just a compact way to write the number
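The grouping rule can be sketched in a few lines of Python. This is a minimal illustration. It assumes the input is a string of '0' and '1' characters, and the function name `binary_to_hex` is hypothetical. The input is padded on the left so that grouping effectively starts from the right-hand side.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a string of '0'/'1' characters to hex, 4 bits per hex digit."""
    digits = "0123456789ABCDEF"
    # Pad on the left to a multiple of 4 bits, so groups are formed from the right.
    bits = "0" * ((-len(bits)) % 4) + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each 4-bit group maps to exactly one hex digit.
    return "".join(digits[int(group, 2)] for group in groups)


print(binary_to_hex("101010001111010011010111"))  # A8F4D7
print(binary_to_hex("110101"))                    # 35 (padded to 0011 0101)
```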

5 Cache Basics: True or False?
Cache is usually implemented in DRAM?
Memory block size is larger than the cache block size?
Memory block address is shorter than the memory address?
Direct mapped cache allows a memory block to have more than one cache location?

6 Example: DM Cache Address Breakdown
Assuming 16-bit memory addresses, how many bits are associated with the tag, index, and offset of the following configurations for a direct mapped cache?
(a) 16 blocks, 4 bytes per block
    Byte offset: 2 bits; Index: 4 bits; Tag: 10 bits
(b) 32 blocks, 8 bytes per block
    Byte offset: 3 bits; Index: 5 bits; Tag: 8 bits
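The arithmetic behind these answers can be sketched in Python. This is a minimal illustration with a hypothetical function name. It assumes the block count and block size are powers of two, so `bit_length() - 1` is an exact log2.
- The byte offset is log2(bytes per block).
- The index is log2(number of blocks), because a direct mapped cache has one block per index.
- The tag is whatever address bits remain.

```python
from typing import Tuple


def dm_breakdown(addr_bits: int, num_blocks: int, block_bytes: int) -> Tuple[int, int, int]:
    """Return (tag, index, offset) bit counts for a direct mapped cache."""
    offset = block_bytes.bit_length() - 1  # log2(bytes per block)
    index = num_blocks.bit_length() - 1    # log2(number of blocks)
    tag = addr_bits - index - offset       # remaining high-order address bits
    return tag, index, offset


print(dm_breakdown(16, 16, 4))  # (10, 4, 2)  -> configuration (a)
print(dm_breakdown(16, 32, 8))  # (8, 5, 3)   -> configuration (b)
```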

7 Block Placement in DM Cache
Direct mapped cache: each memory block maps to exactly one cache block
Mapping conflicts may increase the miss rate
[Figure: memory with 8 blocks (Block 0 to Block 7) mapped onto a direct mapped cache with 4 blocks (Block 0 to Block 3)]
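The mapping in the figure can be sketched as a modulo rule. This assumes the usual convention that the cache index is the memory block address mod the number of cache blocks, which equals its low-order bits. Under that rule, blocks 0 and 4 (and likewise 1 and 5, and so on) compete for the same cache block. Those pairs are the mapping conflicts mentioned above.

```python
NUM_CACHE_BLOCKS = 4  # direct mapped cache with 4 blocks
NUM_MEM_BLOCKS = 8    # memory with 8 blocks

# Assumed rule: each memory block has exactly one possible cache location,
# cache index = memory block address mod number of cache blocks.
mapping = {blk: blk % NUM_CACHE_BLOCKS for blk in range(NUM_MEM_BLOCKS)}
print(mapping)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
```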

8 More Flexible Block Placement
K-way set associative cache: each memory block maps to one set, which contains K blocks
A block can be stored anywhere in its set
[Figure: memory with 8 blocks (Block 0 to Block 7) mapped onto a 2-way set associative cache with 4 blocks, organized as Set 0 and Set 1, each with Way 0 and Way 1]
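For the 2-way, 4-block cache in the figure, the placement rule can be sketched as follows. It assumes the set is chosen as the block address mod the number of sets, and the helper name `candidate_slots` is hypothetical. Each memory block maps to one set but has K = 2 candidate slots inside it.

```python
from typing import List, Tuple

NUM_CACHE_BLOCKS = 4
K = 2                             # ways per set
NUM_SETS = NUM_CACHE_BLOCKS // K  # 2 sets


def candidate_slots(block_addr: int) -> List[Tuple[int, int]]:
    """Return the (set, way) slots where a memory block may be placed."""
    s = block_addr % NUM_SETS              # the block maps to exactly one set...
    return [(s, way) for way in range(K)]  # ...but may occupy any way in it


print(candidate_slots(5))  # [(1, 0), (1, 1)]
```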

9 Associative Caches
K-way set associative
  Index bits determine which set to address
  Each set contains K entries (ways)
  All ways in the selected set are searched in parallel
  K comparators (more expensive than direct mapped)
An extreme case: fully associative
  A block can go in any cache location
  Only one set, so no index bits are needed
  All entries are searched in parallel
  One comparator per entry (most expensive)
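A software model of the lookup may help make the search concrete. This is a minimal sketch of the tag store only, and the class and method names are hypothetical. In hardware the K tag comparisons happen simultaneously, while the loop below checks them one at a time. A fully associative cache is the special case with a single set, so the index is always 0.

```python
from typing import List, Optional, Tuple


class SetAssociativeCache:
    """Tag store of a K-way set associative cache (data array omitted)."""

    def __init__(self, num_sets: int, ways: int) -> None:
        # Each entry is (valid, tag); every entry starts out invalid.
        self.sets: List[List[Tuple[bool, int]]] = [
            [(False, 0)] * ways for _ in range(num_sets)
        ]

    def lookup(self, index: int, tag: int) -> Optional[int]:
        """Return the way that hits in set `index`, or None on a miss.

        Hardware compares all K tags at once with K comparators; this loop is
        only a sequential software model of that parallel search.
        """
        for way, (valid, stored_tag) in enumerate(self.sets[index]):
            if valid and stored_tag == tag:
                return way
        return None


cache = SetAssociativeCache(num_sets=2, ways=2)
cache.sets[1][1] = (True, 0b101)  # hypothetical resident block
print(cache.lookup(1, 0b101))     # 1
print(cache.lookup(0, 0b101))     # None (same tag, different set)
```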

10 Address Translation for Associative Caches
Breakdown of an n-bit memory address for cache use:
  tag: $n - i - b$ bits | index: $i$ bits | byte offset: $b$ bits
Parameters for a K-way set associative cache
  Size of each cache block is $2^b$ bytes
  Number of sets is $2^i$
  Number of blocks is $K \cdot 2^i$
  Total cache size is $K \cdot 2^{b+i}$ bytes
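The parameter formulas can be evaluated directly, as in the sketch below. The function name is hypothetical.
- The first usage line takes the 4-way cache with 256 sets (i = 8). Its 32-bit address and 16-byte blocks (b = 4) are assumptions chosen only for illustration.
- The second usage line matches the 2-way example with 6-bit addresses, 2 sets and 4-byte blocks.

```python
def cache_params(n: int, k: int, i: int, b: int) -> dict:
    """Derive K-way set associative cache parameters from the address split.

    n: address bits, k: ways per set, i: index bits, b: byte offset bits.
    """
    return {
        "tag_bits": n - i - b,
        "block_bytes": 2 ** b,
        "num_sets": 2 ** i,
        "num_blocks": k * 2 ** i,
        "total_bytes": k * 2 ** (b + i),
    }


print(cache_params(n=32, k=4, i=8, b=4))
# {'tag_bits': 20, 'block_bytes': 16, 'num_sets': 256, 'num_blocks': 1024, 'total_bytes': 16384}
print(cache_params(n=6, k=2, i=1, b=2))
# {'tag_bits': 3, 'block_bytes': 4, 'num_sets': 2, 'num_blocks': 4, 'total_bytes': 16}
```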

11 4-way Set Associative Cache
Index bits address one cache set: 256 sets (4 ways per set, 1024 blocks)
All 4 ways within the selected cache set are searched in parallel

12 2-way Set Associative Example
Size of each block is 4 bytes
Cache holds 4 blocks, 2-way set associative: 2 sets, 2 ways
Memory holds 16 blocks (64 bytes), so a memory address is 6 bits
Memory address: 3 tag bits | 1 index bit | 2 byte offset bits
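The 6-bit address can be split into its fields with shifts and masks, as sketched below. This assumes the layout from slide 10: the offset occupies the low-order bits, the index the next bit, and the tag the high-order bits. The sample address `0b101110` is hypothetical.

```python
from typing import Tuple

TAG_BITS, INDEX_BITS, OFFSET_BITS = 3, 1, 2  # 6-bit address in this example


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, set index, byte offset) of a 6-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # low 2 bits
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # next bit picks the set
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # high 3 bits
    return tag, index, offset


print(split_address(0b101110))  # (5, 1, 2): tag 101, set 1, byte 2 of the block
```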

13-23 2-way Set Associative Example (step-by-step figure showing processor registers, cache, and memory)
Load sequence: R1 <= M[…], R2 <= M[…], R3 <= M[…], R2 <= M[…], R1 <= M[…], R1 <= M[…]
Frames 13-20 are labeled miss; frames 21-23 are labeled hit

24 Spectrum of Associativity
A K-way set associative cache with N blocks
  Number of cache sets: $S = N / K$
  Number of index bits: $\log_2(S)$
When K = N: fully associative cache
  ONE cache set → zero index bits
When K = 1 (one-way): direct mapped cache
  N cache sets
Increasing the associativity
  Typically improves the hit rate (fewer conflicts)
  But increases the hit time (takes longer to search)

25 Spectrum of Associativity
For a cache with 8 blocks
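Applying $S = N / K$ and $\log_2(S)$ to an 8-block cache gives every power-of-two associativity from direct mapped to fully associative. The sketch below enumerates them, with a hypothetical function name. It assumes N is a power of two, so `bit_length() - 1` gives log2 exactly.

```python
def spectrum(num_blocks: int) -> list:
    """List (K, number of sets, index bits) for each power-of-two associativity K."""
    rows = []
    k = 1
    while k <= num_blocks:
        sets = num_blocks // k              # S = N / K
        index_bits = sets.bit_length() - 1  # log2(S); 0 when fully associative
        rows.append((k, sets, index_bits))
        k *= 2
    return rows


for k, sets, index_bits in spectrum(8):
    print(f"{k}-way: {sets} set(s), {index_bits} index bit(s)")
```

The first row (1-way, 8 sets) is the direct mapped case. The last row (8-way, 1 set, 0 index bits) is the fully associative case.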

26 Exercise: Set Associative Cache Address Breakdown
Assuming 16-bit addresses, how many bits are associated with the tag, index, and offset of the following cache configurations?
(a) 16 blocks, 16 bytes per block, 4-way set associative
    Byte offset: 4 bits; Index: 2 bits; Tag: 10 bits
(b) 16 blocks, 16 bytes per block, fully associative
    Byte offset: 4 bits; Index: 0 bits; Tag: 12 bits
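Compared with the direct mapped case, adding associativity changes only the index calculation. The index now selects one of N / K sets instead of one of N blocks. A minimal sketch follows, with a hypothetical function name and power-of-two sizes assumed. Setting `ways` equal to the block count gives a single set and therefore zero index bits.

```python
from typing import Tuple


def breakdown(addr_bits: int, num_blocks: int, block_bytes: int, ways: int) -> Tuple[int, int, int]:
    """Return (tag, index, offset) bit counts for a K-way set associative cache."""
    offset = block_bytes.bit_length() - 1  # log2(bytes per block)
    num_sets = num_blocks // ways          # S = N / K
    index = num_sets.bit_length() - 1      # log2(S); 0 for fully associative
    tag = addr_bits - index - offset
    return tag, index, offset


print(breakdown(16, 16, 16, 4))   # (10, 2, 4)  -> 4-way set associative
print(breakdown(16, 16, 16, 16))  # (12, 0, 4)  -> fully associative
```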

27 Miss Classification
Compulsory (cold) misses
  Caused by the first access to a memory block
Capacity misses
  Occur because the cache might not be big enough to hold the active set of memory blocks needed during program execution
Conflict misses
  Occur in a direct mapped or set associative cache when multiple memory blocks compete for the same set due to the inflexibility of block placement
  Would not occur in a fully associative cache

28-29 Misses vs. Associativity Example
Compare different caches
  Capacity: 4 blocks
  Direct mapped, 2-way set associative, fully associative
  Block address sequence: 0, 8, 0, 6, 8 (in decimal)

Direct mapped
Block address | Cache index | Hit/miss        | Cache contents after access
0             | 0           | miss (cold)     | Block 0: Mem[0]
8             | 0           | miss (cold)     | Block 0: Mem[8]
0             | 0           | miss (conflict) | Block 0: Mem[0]
6             | 2           | miss (cold)     | Block 0: Mem[0]; Block 2: Mem[6]
8             | 0           | miss (conflict) | Block 0: Mem[8]; Block 2: Mem[6]

30-31 Misses vs. Associativity Example

2-way set associative
Block address | Cache index (set) | Hit/miss        | Cache contents after access
0             | 0                 | miss (cold)     | Set 0: Mem[0]
8             | 0                 | miss (cold)     | Set 0: Mem[0], Mem[8]
0             | 0                 | hit             | Set 0: Mem[0], Mem[8]
6             | 0                 | miss (cold)     | Set 0: Mem[0], Mem[6]
8             | 0                 | miss (conflict) | Set 0: Mem[8], Mem[6]

Fully associative
Block address | Hit/miss    | Cache contents after access
0             | miss (cold) | Mem[0]
8             | miss (cold) | Mem[0], Mem[8]
0             | hit         | Mem[0], Mem[8]
6             | miss (cold) | Mem[0], Mem[8], Mem[6]
8             | hit         | Mem[0], Mem[8], Mem[6]
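The hit/miss columns and miss types of the three tables can be reproduced with a small simulator. This is a sketch that assumes three things:
- LRU replacement within each set, which matches the 2-way table.
- The set is selected as block address mod the number of sets.
- The usual 3C rule for telling conflict misses from capacity misses. A repeat miss counts as a conflict miss if a fully associative LRU cache with the same number of blocks would have hit, and as a capacity miss otherwise.

The function name `classify_misses` is hypothetical.

```python
from collections import OrderedDict
from typing import List


def classify_misses(trace: List[int], num_blocks: int, ways: int) -> List[str]:
    """Label each block access as 'hit', 'cold', 'conflict', or 'capacity'."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: oldest entry first
    full: OrderedDict = OrderedDict()                # fully associative LRU reference
    seen = set()                                     # blocks accessed so far
    labels = []
    for blk in trace:
        cache_set = sets[blk % num_sets]
        full_hit = blk in full
        if blk in cache_set:
            cache_set.move_to_end(blk)           # mark as most recently used
            labels.append("hit")
        else:
            if blk not in seen:
                labels.append("cold")            # first access to this block
            elif full_hit:
                labels.append("conflict")        # fully associative would have hit
            else:
                labels.append("capacity")
            if len(cache_set) == ways:
                cache_set.popitem(last=False)    # evict the LRU block of this set
            cache_set[blk] = True
        # Keep the fully associative reference cache in step with the trace.
        if full_hit:
            full.move_to_end(blk)
        else:
            if len(full) == num_blocks:
                full.popitem(last=False)
            full[blk] = True
        seen.add(blk)
    return labels


trace = [0, 8, 0, 6, 8]
print(classify_misses(trace, num_blocks=4, ways=1))  # direct mapped
print(classify_misses(trace, num_blocks=4, ways=2))  # 2-way set associative
print(classify_misses(trace, num_blocks=4, ways=4))  # fully associative
```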

32 Block Replacement Policy
Direct mapped: no choice
Set associative and fully associative
  Pick a non-valid entry, if there is one
  Otherwise, choose among the entries in the set
    Least recently used (LRU): choose the one unused for the longest time
      Requires extra bits to order the blocks
      High overhead beyond 4-way set associative
    Random: similar performance to LRU for high associativity

33 LRU Replacement Example
Fully associative cache with 4 blocks; (X) = LRU age (2 bits in this case; 0 = most recently used)
Block address | Hit/miss        | Cache contents after access
0             | miss (cold)     | Mem[0] (0)
4             | miss (cold)     | Mem[0] (1)  Mem[4] (0)
2             | miss (cold)     | Mem[0] (2)  Mem[4] (1)  Mem[2] (0)
6             | miss (cold)     | Mem[0] (3)  Mem[4] (2)  Mem[2] (1)  Mem[6] (0)
8             | miss (cold)     | Mem[8] (0)  Mem[4] (3)  Mem[2] (2)  Mem[6] (1)
0             | miss (capacity) | Mem[8] (1)  Mem[0] (0)  Mem[2] (3)  Mem[6] (2)
4             | miss (capacity) | Mem[8] (2)  Mem[0] (1)  Mem[4] (0)  Mem[6] (3)
2             | miss (capacity) | Mem[8] (3)  Mem[0] (2)  Mem[4] (1)  Mem[2] (0)
6             | miss (capacity) | Mem[6] (0)  Mem[0] (3)  Mem[4] (2)  Mem[2] (1)
8             | miss (capacity) | Mem[6] (1)  Mem[8] (0)  Mem[4] (3)  Mem[2] (2)
2             | hit             | Mem[6] (2)  Mem[8] (1)  Mem[4] (3)  Mem[2] (0)
6             | hit             | Mem[6] (0)  Mem[8] (2)  Mem[4] (3)  Mem[2] (1)
2             | hit             | Mem[6] (1)  Mem[8] (2)  Mem[4] (3)  Mem[2] (0)
0             | miss (capacity) | Mem[6] (2)  Mem[8] (3)  Mem[0] (0)  Mem[2] (1)
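The age-counter bookkeeping in this table can be reproduced with the sketch below. The function name is hypothetical. It assumes the slots are listed in the same left-to-right order as the table's contents column and follows three rules:
- On a hit, the accessed slot's age resets to 0, and every valid slot that was more recently used than it ages by one.
- On a miss, a non-valid slot is filled first, and every valid slot ages by one.
- When no slot is free, the slot with the largest age (the LRU block) is replaced.

```python
def lru_age_sim(trace, num_blocks=4):
    """Simulate a fully associative cache that keeps an LRU age per slot.

    Age 0 marks the most recently used slot and num_blocks - 1 the least
    recently used one, so 4 slots need 2 age bits.
    Returns one (outcome, slots, ages) record per access.
    """
    slots = [None] * num_blocks  # block address held by each slot; None = invalid
    ages = [0] * num_blocks
    history = []
    for blk in trace:
        if blk in slots:
            outcome = "hit"
            pos = slots.index(blk)
            old_age = ages[pos]
        else:
            outcome = "miss"
            if None in slots:
                pos = slots.index(None)      # fill a non-valid entry first
                old_age = num_blocks         # so every valid slot ages by one
            else:
                pos = ages.index(max(ages))  # replace the least recently used slot
                old_age = ages[pos]
            slots[pos] = blk
        # Slots used more recently than the accessed one age by one step.
        for j in range(num_blocks):
            if j != pos and slots[j] is not None and ages[j] < old_age:
                ages[j] += 1
        ages[pos] = 0
        history.append((outcome, list(slots), list(ages)))
    return history


trace = [0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0]
for blk, (outcome, slots, ages) in zip(trace, lru_age_sim(trace)):
    print(blk, outcome, [f"Mem[{b}] ({a})" for b, a in zip(slots, ages) if b is not None])
```

Because an age is bumped only when it is smaller than the accessed slot's old age, the ages always remain a permutation of 0 to 3 once all four slots are valid. This is why 2 bits per slot suffice here.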

34 Before Next Class
H&H 7.5.5, 8.2
Next Time
More Caches
Measuring Performance
