Review: Computer Organization
Cache
Chansu Yu

Caches: The Basic Idea
A cache is a smaller set of storage locations that stores a subset of the information held in a larger set.
Typically, an SRAM cache for DRAM main memory.
[Figure: Processor - Cache - DRAM main memory]
Goal: decrease the average time for data access.
Use: look in the cache for data; look in the larger storage only if the data is not found in the cache.
Invisible to the programmer.
There are multiple ways to organize a cache; we will see several.
For simplicity, assume all memory accesses are word-sized.
c.yu@csuohio.edu
Terminology
The cache holds code/data that will probably be accessed later.
A later access finds the data in the cache: hit. The data is not found in the cache: miss.
Hit time: cache access time.
Miss penalty: memory access time.
On a miss, bring the data into the cache, since it will probably be accessed again later.
Bring the following data along with it (a whole block, or line).
To make room, throw away the oldest block in the cache (replacement).

Caches: Flowchart
Memory reference -> block in cache?
yes (hit): read the bytes from the block in the cache.
no (miss): read the block from memory into the cache, then extract the desired bytes.
-> Return the data to the CPU.
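The flowchart can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes a hypothetical `BLOCK_SIZE` of 4 bytes, a byte-addressed memory held in a `bytes` object, and a cache modelled as a dict from block number to block contents with unlimited capacity, so no replacement is needed. The names `read_byte` and `stats` are made up for this sketch.

```python
# Hypothetical block (line) size in bytes; the slides do not fix a value here.
BLOCK_SIZE = 4


def read_byte(address, cache, memory, stats):
    """Follow the read flowchart for one byte and return it to the 'CPU'."""
    block_no, offset = divmod(address, BLOCK_SIZE)
    if block_no in cache:
        # Block in cache? yes (hit): read the bytes from the cached block.
        stats["hits"] += 1
    else:
        # No (miss): read the whole block from memory into the cache.
        stats["misses"] += 1
        start = block_no * BLOCK_SIZE
        cache[block_no] = memory[start:start + BLOCK_SIZE]
    # Extract the desired byte and return it.
    return cache[block_no][offset]


memory = bytes(range(64))            # 64 bytes whose values equal their addresses
cache = {}                           # block number -> block contents (unbounded)
stats = {"hits": 0, "misses": 0}
values = [read_byte(a, cache, memory, stats) for a in [0, 1, 2, 3, 4, 0]]
print(values, stats)
```

Because a miss brings in the whole block, the reads of addresses 1 to 3 hit after the miss on address 0.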
Hits vs. Misses
Read hits: this is what we want!
Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart.
Write hits: either replace the data in both the cache and memory (write-through), or write the data only into the cache (write-back to memory later).
Write misses: read the entire block into the cache, then write the word.

Cache Performance
Describing performance:
Hit time = time to access the cache on a hit. Usually, read hit time = write hit time. SRAM caches: a few cycles.
Miss penalty = additional time needed on a miss. Usually, read miss penalty write miss penalty. DRAM main memory: s-s cycles.
Hit ratio = #hits / #accesses
Miss ratio = #misses / #accesses
Measuring performance: lots of benchmarking and simulation.
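Putting these definitions together, a commonly used summary metric is the average memory access time, AMAT = hit time + miss ratio × miss penalty, since every access pays the hit time and only the misses pay the additional penalty. The small Python sketch below computes it; the access counts and cycle times are hypothetical numbers chosen for illustration, not measurements.

```python
def hit_ratio(hits, accesses):
    return hits / accesses


def miss_ratio(misses, accesses):
    return misses / accesses


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time,
    and the fraction that misses also pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical workload: 1000 accesses, 950 hits, 50 misses,
# 1-cycle hit time and a 100-cycle miss penalty.
hr = hit_ratio(950, 1000)
mr = miss_ratio(50, 1000)
print(hr, mr, amat(1, mr, 100))
```

Even a 5% miss ratio makes the average access six times slower than a hit in this example, which is why reducing the miss ratio matters.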
Cache Block Placement
Two issues: How do we know whether a data item is in the cache? If it is, how do we find it?
Our first example (with decimal numbers, which is not the actual case):
Block size is bytes of data.
Memory has bytes ( blocks).
Cache has bytes ( blocks).
"Direct mapped"

Direct-mapped Block Placement
[Figure: Direct-mapped block placement, memory blocks mapped onto cache blocks]
Decoding
Address fields: tag | block index | offset
Offset: the byte within the block.
Block index: which cache block to check.
Tag: identifies the original memory block.
Given the memory address:
1. Extract the block index.
2. Check whether that cache block holds the requested memory block. For this, each cache entry remembers the tag. If the stored tag matches the first digit of the memory address: HIT.
3. Extract the byte within the block using the offset.

Direct-mapped Block Placement
[Figure: Direct-mapped block placement, memory blocks mapped onto cache blocks]
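The decoding steps for the decimal example can be sketched as below. The parameters are assumptions: 10-byte blocks and a 10-block direct-mapped cache, chosen so that a three-digit address splits into tag (first digit), block index (second digit), and offset (third digit) as the slide describes. `decode`, `lookup`, and the `tags` list are hypothetical names; `None` simply marks a cache block that holds nothing yet.

```python
# Assumed parameters for the decimal example (not given in the surviving text).
BLOCK_SIZE = 10
NUM_CACHE_BLOCKS = 10


def decode(address):
    """Split a decimal address into (tag, block index, offset)."""
    memory_block, offset = divmod(address, BLOCK_SIZE)
    tag, index = divmod(memory_block, NUM_CACHE_BLOCKS)
    return tag, index, offset


def lookup(address, tags):
    """HIT (True) if the tag stored at the block index matches the address's tag."""
    tag, index, _ = decode(address)
    return tags[index] == tag


tags = [None] * NUM_CACHE_BLOCKS
tags[6] = 3   # pretend cache block 6 currently holds memory bytes 360 to 369
print(decode(365), lookup(365, tags), lookup(565, tags))
```

Address 565 uses the same block index as 365, so it competes for the same cache block but fails the tag check.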
Exercise: Make connections!
[Figure: Direct-mapped cache and memory blocks]

Exercise: Find them!
[Figure: Direct-mapped cache and memory blocks]
Decoding
Given the memory address:
1. Extract the block index.
2. Check whether that cache block holds the requested memory block. Since the stored tag differs from the first digit of the memory address: MISS.
So, what happens when a miss occurs? (see pages -)
The cache block currently holds a different memory block.
Read the requested memory block and replace the contents of that cache block.
Change the stored tag to the new tag (the first digit of the address).

Decoding
Given the memory address:
1. Extract the block index.
2. Check whether that cache block holds the requested memory block. The stored tag equals the first digit of the address, and therefore: HIT.
However, the two may be equal only because the cache is initially empty (the tag still holds its initial value).
Is this really a HIT? No! => We need a valid bit!
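A sketch of miss handling and of the valid bit, again using assumed decimal parameters (10 cache blocks of 10 bytes). Tags are initialized to 0 to mimic an empty cache whose tag storage starts at some arbitrary value; with `use_valid_bit=False`, the first access to a memory block whose tag is 0 is reported as a false hit. The class and its parameters are hypothetical, and data storage is omitted.

```python
class DirectMappedCache:
    """Direct-mapped cache that tracks only tags and valid bits (no data)."""

    def __init__(self, num_blocks, block_size, use_valid_bit=True):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.use_valid_bit = use_valid_bit
        self.valid = [False] * num_blocks
        # Tag storage starts at 0, standing in for whatever an empty cache contains.
        self.tags = [0] * num_blocks

    def access(self, address):
        block = address // self.block_size
        index = block % self.num_blocks     # which cache block to check
        tag = block // self.num_blocks      # identifies the memory block
        tag_match = self.tags[index] == tag
        if tag_match and (self.valid[index] or not self.use_valid_bit):
            return "HIT"
        # Miss: fetch the memory block (not modelled), replace the cache
        # block's contents, and record the new tag as valid.
        self.tags[index] = tag
        self.valid[index] = True
        return "MISS"


# Assumed decimal parameters: 10 cache blocks of 10 bytes each.
with_valid = DirectMappedCache(10, 10)
without_valid = DirectMappedCache(10, 10, use_valid_bit=False)
# Address 65 has tag 0, which equals the initial tag value.
r1 = (with_valid.access(65), without_valid.access(65))
# Address 365 maps to the same cache block with tag 3: a miss, then a hit.
r2 = (with_valid.access(365), with_valid.access(365))
print(r1, r2)
```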
Exercise: What is invalid?
[Figure: Direct-mapped cache with Valid and Tag columns]
Tag: one digit (in fact, a few bits are enough in this example).
Valid: one bit.

Handling Writes
Write strategy (see pages -)
On a write hit:
Write-through: data is written to both the cache and memory.
Write-back: data is written only to the cache; the modified (dirty) block is written to memory when it is replaced. This requires a dirty bit for each block (more complexity). Better performance, but a consistency (semantic) problem: memory may hold stale data until the block is written back.
On a write miss:
Write allocate (fetch-on-write): paired with write-back?
No write allocate (write-around): paired with write-through?
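The write policies can be compared with a small simulation. This is a sketch under simplifying assumptions: a direct-mapped cache, a stream of write-only accesses, write-through paired with no write allocate, and write-back paired with write allocate (the pairings the slide asks about). It counts word writes to memory for write-through but whole-block write-backs for write-back, and dirty blocks still in the cache at the end are not flushed. `simulate_writes` is a hypothetical helper.

```python
def simulate_writes(addresses, num_blocks, block_size, policy):
    """Return (memory writes, block fetches) for a stream of word writes.

    "write-through": every write goes to memory; misses do not allocate.
    "write-back": writes go only to the cache; misses allocate (fetch-on-write);
    a dirty block is written to memory when it is replaced.
    """
    valid = [False] * num_blocks
    dirty = [False] * num_blocks
    tags = [0] * num_blocks
    mem_writes = block_fetches = 0
    for address in addresses:
        block = address // block_size
        index, tag = block % num_blocks, block // num_blocks
        hit = valid[index] and tags[index] == tag
        if policy == "write-through":
            # Data always goes to memory; a miss does not bring the block in.
            mem_writes += 1
        elif policy == "write-back":
            if not hit:
                if valid[index] and dirty[index]:
                    mem_writes += 1          # write back the dirty victim block
                block_fetches += 1           # write allocate: fetch the block
                valid[index], tags[index], dirty[index] = True, tag, False
            dirty[index] = True              # the write goes only into the cache
        else:
            raise ValueError(f"unknown policy: {policy}")
    return mem_writes, block_fetches


stream = [0, 0, 0, 0, 16]   # four writes to one block, then a conflicting block
wt = simulate_writes(stream, 4, 4, "write-through")
wb = simulate_writes(stream, 4, 4, "write-back")
print(wt, wb)
```

Repeated writes to the same block cost one memory write each under write-through, but under write-back they are absorbed by the cache until the block is evicted.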
Single Word Cache Block: Exercise
Address size? (# bits for the memory address?)
Block size? (# bits for the offset?)
# blocks? (# bits for the block index?)
[Figure: Direct-mapped cache with single-word blocks; the address (showing bit positions) is split into tag, index, and byte offset; each entry holds Valid, Tag, and Data; the tag comparison produces Hit]
Memory address to cache address:
# cache blocks in direct-mapped? (# bits for the block index?)
Which (how many) memory blocks are candidates for cache block #?
Tag size?
How is the memory address decomposed?

Exercise: Multiword Cache Block
[Figure: Direct-mapped cache with multiword blocks (K entries); the address (showing bit positions) is split into tag, index, block offset, and byte offset; each entry holds V, Tag, and Data; a Mux selects the word; Hit]
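The exercise questions reduce to a few base-2 logarithms. The sketch below assumes byte addressing, power-of-two sizes, and hypothetical parameters (32-bit addresses, 4-byte words, and either 1024 one-word blocks or 4096 four-word blocks); `log2_exact` and `direct_mapped_fields` are names made up for this illustration.

```python
def log2_exact(n):
    """Return log2(n) for a power of two n (cache parameters are powers of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def direct_mapped_fields(address_bits, word_bytes, words_per_block, num_blocks):
    """Field widths, in bits, of an address for a direct-mapped cache."""
    byte_offset = log2_exact(word_bytes)        # byte within the word
    block_offset = log2_exact(words_per_block)  # word within the block
    index = log2_exact(num_blocks)              # which cache block
    tag = address_bits - index - block_offset - byte_offset
    return {"tag": tag, "index": index,
            "block_offset": block_offset, "byte_offset": byte_offset}


# Hypothetical parameters: 32-bit byte addresses and 4-byte words.
print(direct_mapped_fields(32, 4, 1, 1024))   # single-word blocks
print(direct_mapped_fields(32, 4, 4, 4096))   # four-word blocks
```

Every bit not used for the index or the offsets must be kept as tag, so 2 to the power of the tag width memory blocks compete for each cache block.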
Other Block Placement Policies
Direct-mapped: a memory block is cached in only one position in the cache.
Set-associative (n-way): a memory block is cached in one of n positions in the cache.
Fully associative: a memory block is cached in any position in the cache.
Do they decrease the miss ratio?

Direct-mapped Block Placement
[Figure: Direct-mapped block placement; memory blocks (labeled in hexadecimal, up to F) mapped onto cache blocks with Valid bits]
Memory address: tag | block # | offset in the block
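The three placement policies differ only in how many cache positions a memory block may occupy, which the following sketch makes explicit. It assumes a hypothetical 8-block cache, that the set number is the memory block number modulo the number of sets, and that the ways of set s are numbered consecutively; `candidate_positions` is an illustrative helper.

```python
def candidate_positions(memory_block, num_cache_blocks, ways):
    """Cache positions where a memory block may be placed.

    ways == 1 gives direct-mapped; ways == num_cache_blocks gives fully
    associative; anything in between gives n-way set-associative.
    """
    if num_cache_blocks % ways:
        raise ValueError("ways must divide the number of cache blocks")
    num_sets = num_cache_blocks // ways
    s = memory_block % num_sets            # set chosen by the low-order bits
    return list(range(s * ways, s * ways + ways))


# Hypothetical 8-block cache, memory block 12.
for ways in (1, 2, 8):
    print(ways, candidate_positions(12, 8, ways))
```

More candidate positions mean fewer forced evictions between blocks that map to the same place, at the cost of more tag comparisons per lookup.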
(N-Way) Set-associative Block Placement
[Figure: Set-associative block placement; memory blocks (labeled in hexadecimal) mapped onto a cache organized into four sets, with Valid bits]
Memory address: tag (???) | block # (???) | offset in the block

Fully-associative Block Placement
[Figure: Fully associative block placement; memory blocks (labeled in hexadecimal) mapped onto any cache block, with Valid bits]
Memory address: tag (???) | block # (???) | offset in the block
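To see how the address fields change with the placement policy, the sketch below computes the field widths for a cache of fixed size as associativity grows. The parameters (32-bit byte addresses, 16-byte blocks, 8 cache blocks) are assumptions for illustration; with `ways` equal to the number of blocks the cache is fully associative and the index field disappears.

```python
def log2_exact(n):
    """Return log2(n) for a power of two n."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def set_associative_fields(address_bits, block_bytes, num_blocks, ways):
    """Return (tag, index, offset) widths in bits for an n-way set-associative cache."""
    if num_blocks % ways:
        raise ValueError("ways must divide the number of blocks")
    offset = log2_exact(block_bytes)
    num_sets = num_blocks // ways
    index = log2_exact(num_sets)          # 0 bits when fully associative (one set)
    tag = address_bits - index - offset
    return tag, index, offset


# Hypothetical cache: 32-bit addresses, 16-byte blocks, 8 blocks in total.
for ways in (1, 2, 4, 8):
    print(ways, set_associative_fields(32, 16, 8, ways))
```

Each doubling of associativity halves the number of sets, so one bit moves from the index field to the tag field.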
Block Identification (cont'd)
N-way set-associative:
There are N blocks to compare; the N tag comparisons are done in parallel.
Block # to set #: choose the low-order bits of the block number as the set # (address = tag + set # + offset).
Consecutive blocks map to different sets, so there are fewer conflicts in the cache, especially in the presence of spatial locality.
Same cache size with higher associativity: #blocks / set? Index size, tag size?

[Figure: N-way set-associative cache; the CPU address (block #) is split into tag | set # | offset; the set # indexes into the tag and data arrays of every way; the per-way tag comparisons are ORed to produce hit, and a MUX selects the data]
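The lookup can be sketched in Python as below. Hardware performs the N tag comparisons in parallel and ORs the results; this software sketch performs the same N comparisons one after another in a loop. The geometry (4 sets, 2 ways, 16-byte blocks) and the class and method names are hypothetical, and data storage is omitted.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0


class SetAssociativeCache:
    """N-way set-associative cache that tracks only valid bits and tags."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        self.sets = [[Line() for _ in range(ways)] for _ in range(num_sets)]

    def split(self, address):
        """Split an address into (tag, set #, offset)."""
        block, offset = divmod(address, self.block_size)
        # Low-order bits of the block number select the set.
        tag, set_no = divmod(block, self.num_sets)
        return tag, set_no, offset

    def lookup(self, address):
        """Return the way that hits, or None on a miss."""
        tag, set_no, _ = self.split(address)
        # Hardware compares all N tags at once; here they are checked in turn.
        for way, line in enumerate(self.sets[set_no]):
            if line.valid and line.tag == tag:
                return way
        return None

    def fill(self, address, way):
        """Place the block containing address into the given way of its set."""
        tag, set_no, _ = self.split(address)
        self.sets[set_no][way] = Line(valid=True, tag=tag)


cache = SetAssociativeCache(num_sets=4, ways=2, block_size=16)
# Consecutive blocks (addresses 0, 16, 32, 48) land in different sets.
set_numbers = [cache.split(a)[1] for a in (0, 16, 32, 48)]
cache.fill(0, way=0)    # memory block 0 -> set 0, way 0
cache.fill(64, way=1)   # memory block 4 -> also set 0, kept in way 1
print(set_numbers, cache.lookup(0), cache.lookup(64), cache.lookup(128))
```

Memory blocks 0 and 4 would evict each other in a direct-mapped cache with four blocks, but here they coexist in the two ways of set 0.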
An Example
[Figure: Set-associative cache; the address is split into tag, index, and offset; the index selects one entry (V, Tag, Data) in each way; the tag comparisons produce Hit, and a multiplexor selects the Data]
How many ways (x-way set associative)? Number of blocks? Block size? Tag field size? Cache size?

Replacement Policies
In a direct-mapped cache, no replacement policy is necessary.
In a set-associative cache, an important question is: which block in the set should be replaced (see page )?
Least recently used (LRU): the most commonly used scheme.
How do we keep track of the usage of blocks?
A single bit suffices in the case of a two-way set-associative cache.
See Section . for the higher-associativity case.
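LRU within one set can be sketched with Python's `collections.OrderedDict`, which can move a key to the end with `move_to_end` and remove the oldest key with `popitem(last=False)`. This is an illustrative software model, not how hardware tracks usage; for a two-way set, the ordering it maintains carries the same information as the single bit mentioned above. `LRUSet` is a hypothetical name.

```python
from collections import OrderedDict


class LRUSet:
    """One set of an N-way set-associative cache with LRU replacement (tags only)."""

    def __init__(self, ways):
        self.ways = ways
        # Keys are tags, ordered from least recently used to most recently used.
        self.lines = OrderedDict()

    def access(self, tag):
        """Return ("hit", None) or ("miss", evicted_tag_or_None)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)                  # mark as most recently used
            return "hit", None
        victim = None
        if len(self.lines) == self.ways:
            victim, _ = self.lines.popitem(last=False)   # evict the LRU block
        self.lines[tag] = None
        return "miss", victim


s = LRUSet(ways=2)
results = [s.access(t) for t in (1, 2, 1, 3, 2)]
print(results)
```

Touching tag 1 just before tag 3 arrives makes tag 2 the least recently used, so tag 2 is evicted rather than the older tag 1.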