CACHE ARCHITECTURE. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah




2 Overview
- Announcement
  - Mar. 14th: Homework 4 release (due on Mar. 27th)
- This lecture
  - Cache addressing and lookup
  - Cache optimizations
    - Techniques to improve miss rate
    - Replacement policies
    - Write policies


3 Processor Cache
- Occupies a large fraction of die area in modern microprocessors
- Example: 20MB of cache (Source: Intel Core i7)

4 Cache Hierarchy
- Example three-level cache organization

  Level                        Size     Latency
  L1 (instruction and data)    32 KB    1 cycle
  L2                           256 KB   10 cycles
  L3                           4 MB     30 cycles
  Off-chip memory              8 GB     ~300 cycles

- Application: instructions and data
  1. Where to put the application?
  2. Who decides?
     a. software (scratchpad)
     b. hardware (caches)

5 Principle of Locality
- Memory references exhibit localized accesses
- Types of locality
  - spatial: probability of access to A+δ at time t+ε is highest when δ → 0
  - temporal: probability of accessing A+ε at time t+δ is highest when δ → 0
- Key idea: store local data in fast cache levels
- Example:
    for (i=0; i<1000; ++i) { sum = sum + a[i]; }
  - temporal: sum is referenced on every iteration
  - spatial: a[i] walks through neighboring addresses
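To make the spatial-locality claim concrete, the following is a minimal sketch that counts how many distinct cache blocks the loop's a[i] references touch. The block size, element size, and base address are assumptions chosen for illustration; none of them come from the slide.

```python
BLOCK_SIZE = 64   # bytes per cache block (assumed)
ELEM_SIZE = 4     # bytes per array element (assumed 32-bit ints)
BASE = 0x1000     # hypothetical, block-aligned start address of a[]

def blocks_touched(n_elems, base=BASE, elem_size=ELEM_SIZE, block_size=BLOCK_SIZE):
    """Return the set of block numbers referenced by a[0] .. a[n_elems-1]."""
    return {(base + i * elem_size) // block_size for i in range(n_elems)}

n = 1000
first_touches = len(blocks_touched(n))
# Every access that is not the first touch of its block benefits from
# spatial locality: a neighbor already brought the block in.
print(first_touches, n - first_touches)
```

Under these assumptions only a small fraction of the 1000 references are first touches of a block; the rest land in a block that a neighboring element already fetched.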

6 Cache Terminology
- Block (cache line): unit of data access
- Hit: accessed data found at current level
  - hit rate: fraction of accesses that find the data
  - hit time: time to access data on a hit
- Miss: accessed data NOT found at current level
  - miss rate: 1 - hit rate
  - miss penalty: time to get block from lower level
- hit time << miss penalty

7 Cache Performance
- Average Memory Access Time (AMAT)

  Outcome   Rate    Access Time
  Hit       $r_h$   $t_h$
  Miss      $r_m$   $t_h + t_p$

  $r_h = 1 - r_m$
  $AMAT = r_h t_h + r_m (t_h + t_p)$
  $AMAT = t_h + r_m t_p$

- Example: hit rate is 90%, hit time is 2 cycles, and accessing the lower level takes 200 cycles. Find the average memory access time.
  $AMAT = 2 + 0.1 \times 200 = 22$ cycles
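The two AMAT forms above can be checked numerically. This is a small sketch; the function names are illustrative and not part of the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = t_h + r_m * t_p (all times in cycles)."""
    return hit_time + miss_rate * miss_penalty

def amat_weighted(hit_time, miss_rate, miss_penalty):
    """Same quantity written as r_h * t_h + r_m * (t_h + t_p)."""
    hit_rate = 1.0 - miss_rate
    return hit_rate * hit_time + miss_rate * (hit_time + miss_penalty)

# Slide example: 90% hit rate, 2-cycle hit time, 200-cycle lower level.
print(amat(2, 0.10, 200))
print(amat_weighted(2, 0.10, 200))
```

Both functions should agree up to floating-point rounding, since the second form simplifies to the first when $r_h = 1 - r_m$.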

8 Example Problem
- Assume that the miss rate for instructions is 5%, the miss rate for data is 8%, the data references per instruction is 40%, and the miss penalty is 20 cycles. Find performance relative to a perfect cache with no misses.
  - misses/instruction = $0.05 + 0.08 \times 0.4 = 0.082$
  - Assuming hit time = 1:
    - $AMAT = 1 + 0.082 \times 20 = 2.64$
    - Relative performance = $1/2.64 \approx 0.38$
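The arithmetic in this example can be reproduced with a short sketch. The function and parameter names are hypothetical, and the model follows the slide's assumption that each instruction costs one hit-time cycle plus its share of miss penalties.

```python
def misses_per_instruction(inst_miss_rate, data_miss_rate, data_refs_per_inst):
    """One instruction fetch plus data_refs_per_inst data references."""
    return inst_miss_rate + data_miss_rate * data_refs_per_inst

def relative_performance(inst_miss_rate, data_miss_rate, data_refs_per_inst,
                         miss_penalty, hit_time=1):
    """Performance relative to a perfect cache that always takes hit_time."""
    mpi = misses_per_instruction(inst_miss_rate, data_miss_rate, data_refs_per_inst)
    time_with_misses = hit_time + mpi * miss_penalty
    return hit_time / time_with_misses

print(misses_per_instruction(0.05, 0.08, 0.4))
print(relative_performance(0.05, 0.08, 0.4, miss_penalty=20))
```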

9 Summary: Cache Performance
- Bridging the processor-memory performance gap: Core, Level-1, Level-2, Main Memory
- Main memory access time: 300 cycles
- Two-level cache
  - L1: 2 cycles hit time; 60% hit rate
  - L2: 20 cycles hit time; 70% hit rate
- What is the average memory access time?
  $AMAT = t_{h1} + r_{m1} t_{p1}$
  $t_{p1} = t_{h2} + r_{m2} t_{p2} = 20 + 0.3 \times 300 = 110$
  $AMAT = 2 + 0.4 \times 110 = 46$ cycles
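The recursion "the miss penalty of one level is the AMAT of the next" extends to any number of levels. A minimal sketch, with a hypothetical function name and a list-of-tuples input format chosen for illustration:

```python
def multilevel_amat(levels, memory_time):
    """levels: list of (hit_time, hit_rate) pairs ordered from L1 outward.

    Works from the last cache level back to L1: the penalty seen by a
    level is the average access time of everything below it.
    """
    penalty = memory_time
    for hit_time, hit_rate in reversed(levels):
        penalty = hit_time + (1.0 - hit_rate) * penalty
    return penalty

# Slide example: L1 = (2 cycles, 60%), L2 = (20 cycles, 70%), memory = 300 cycles.
print(multilevel_amat([(2, 0.6), (20, 0.7)], 300))
```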

10 Cache Addressing
- Instead of specifying a cache address, we specify a main memory address
- Simplest: direct-mapped cache
  - Note: each memory address maps to a single cache location determined by modulo hashing
- How to exactly specify which blocks are in the cache?
[Figure: memory blocks mapping onto cache locations]

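11 Direct-Mapped Lookup
- Address fields: tag, index, byte offset
- Byte offset: to select the requested byte
- Tag: to maintain the address
- Valid flag (v): whether content is meaningful
- Data and tag are always accessed
[Figure: the index selects one entry; its stored tag is compared (=) with the address tag to produce the hit signal while the data is read out]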
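The lookup can be modeled in a few lines. This is a sketch under stated assumptions: only tags and valid flags are tracked (no data), a miss always fills the entry, and the class and method names are made up for illustration.

```python
class DirectMappedCache:
    def __init__(self, num_blocks, block_size):
        # num_blocks and block_size are assumed to be powers of two.
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.valid = [False] * num_blocks
        self.tags = [0] * num_blocks

    def access(self, addr):
        """Return True on a hit; on a miss, install the block and return False."""
        block_addr = addr // self.block_size   # drop the byte offset
        index = block_addr % self.num_blocks   # modulo hashing picks the entry
        tag = block_addr // self.num_blocks    # remaining upper bits
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit

cache = DirectMappedCache(num_blocks=4, block_size=16)
results = [cache.access(a) for a in (0, 4, 64, 0)]
print(results)  # [False, True, False, False]
```

Addresses 0 and 4 share a block, so the second access hits. Address 64 maps to the same entry as address 0 with a different tag, so it evicts that block and the final access to 0 misses again.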


13 Example Problem
- Find the size of tag, index, and offset bits for an 8MB, direct-mapped L3 cache with 64B cache blocks. Assume that the processor can address up to 4GB of main memory.
  - 4GB = $2^{32}$ B → address bits = 32
  - 64B = $2^{6}$ B → byte offset bits = 6
  - 8MB/64B = $2^{17}$ → index bits = 17
  - tag bits = 32 - 17 - 6 = 9
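The same bit counting can be written as a small helper. This is a sketch with a hypothetical function name; it assumes every size is a power of two and treats a direct-mapped cache as the one-way case.

```python
def address_fields(capacity, block_size, ways, addr_bits):
    """Return (tag_bits, index_bits, offset_bits) for a cache.

    capacity and block_size are in bytes and assumed to be powers of two.
    """
    num_sets = capacity // (block_size * ways)
    offset_bits = block_size.bit_length() - 1   # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

MB = 2 ** 20
# 8MB direct-mapped cache, 64B blocks, 32-bit addresses (4GB).
print(address_fields(8 * MB, 64, ways=1, addr_bits=32))  # (9, 17, 6)
```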

14 Cache Optimizations
- How to improve cache performance?
  $AMAT = t_h + r_m t_p$
- Reduce hit time ($t_h$)
  - Memory technology, critical access path
- Improve hit rate ($1 - r_m$)
  - Size, associativity, placement/replacement policies
- Reduce miss penalty ($t_p$)
  - Multi-level caches, data prefetching

15 Set Associative Caches
- Improve cache hit rate by allowing a memory location to be placed in more than one cache block
  - N-way set associative cache
  - Fully associative
- For fixed capacity, higher associativity typically leads to higher hit rates
  - more places to simultaneously map cache lines
  - 8-way SA close to FA in practice
- Example:
    for (i=0; i<10000; i++) { a++; b++; }
[Figure: memory locations a and b mapping into a cache with way 0 and way 1]


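17 n-way Set Associative Lookup
- Index into cache sets
- Multiple tag comparisons
- Multiple data reads
- Special cases
  - Direct mapped: single-block sets
  - Fully associative: single-set cache
[Figure: address split into tag, index, and byte offset; each way's valid flag and tag are compared (=) in parallel, the results are ORed into the hit signal, and a mux selects the data]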
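The a++/b++ loop from the set-associative slide shows why extra ways help. The sketch below models an n-way lookup and replays that loop's references. LRU replacement, the addresses of a and b, and the cache sizes are all assumptions for illustration; the slides do not specify them here.

```python
from collections import OrderedDict

class SetAssociativeCache:
    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One ordered dict of tags per set; order tracks recency (LRU first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill the set (evicting LRU if full)."""
        block_addr = addr // self.block_size
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        lines = self.sets[index]
        if tag in lines:                 # compare against every tag in the set
            lines.move_to_end(tag)       # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used tag
        lines[tag] = True
        return False

def count_misses(cache, trace):
    return sum(0 if cache.access(addr) else 1 for addr in trace)

A, B = 0x0000, 0x0400                    # hypothetical addresses, 1 KB apart
trace = [A, B] * 10000                   # one reference to a, then b, per iteration

direct_mapped = SetAssociativeCache(num_sets=16, ways=1, block_size=64)
two_way = SetAssociativeCache(num_sets=8, ways=2, block_size=64)
print(count_misses(direct_mapped, trace))  # 20000
print(count_misses(two_way, trace))        # 2
```

Both caches hold 1 KB. In the direct-mapped case a and b map to the same entry and evict each other on every reference; with two ways they share a set and only the first touches miss.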


19 Example Problem
- Find the size of tag, index, and offset bits for a 4MB, 4-way set associative cache with 32B cache blocks. Assume that the processor can address up to 4GB of main memory.
  - 4GB = $2^{32}$ B → address bits = 32
  - 32B = $2^{5}$ B → byte offset bits = 5
  - 4MB/(4 × 32B) = $2^{15}$ → index bits = 15
  - tag bits = 32 - 15 - 5 = 12
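With these field widths, any 32-bit address can be split using shifts and masks. A minimal sketch; the function name and the sample address are arbitrary choices for illustration.

```python
OFFSET_BITS = 5    # 32B blocks
INDEX_BITS = 15    # 2**15 sets
TAG_BITS = 12      # 32 - 15 - 5

def split_address(addr):
    """Return (tag, index, offset) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))  # 0x123 0x22b3 0x18
```

Reassembling the three fields, (tag << 20) | (index << 5) | offset, gives back the original address, which is a quick sanity check on the widths.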

20 Cache Miss Classifications
- Start by measuring miss rate with an ideal cache
  1. ideal is fully associative and infinite capacity
  2. then reduce capacity to size of interest
  3. then reduce associativity to degree of interest
- Each step exposes one class of misses
  1. Cold (compulsory)
     - cold start: first access to block
     - how to improve: large blocks, prefetching
  2. Capacity
     - cache is smaller than the program data
     - how to improve: large cache
  3. Conflict
     - set size is smaller than mapped mem. locations
     - how to improve: large cache, more associativity
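The three-step procedure can be sketched as a small trace classifier. It runs the cache of interest next to a fully associative cache of equal capacity and labels each miss. LRU replacement and all function names are assumptions for illustration; the trace is given as block numbers rather than byte addresses.

```python
from collections import OrderedDict

def make_cache(num_sets, ways):
    return {"num_sets": num_sets, "ways": ways,
            "sets": [OrderedDict() for _ in range(num_sets)]}

def access(cache, block):
    """LRU lookup on a block number; returns True on a hit."""
    lines = cache["sets"][block % cache["num_sets"]]
    tag = block // cache["num_sets"]
    if tag in lines:
        lines.move_to_end(tag)
        return True
    if len(lines) >= cache["ways"]:
        lines.popitem(last=False)
    lines[tag] = True
    return False

def classify_misses(block_trace, num_blocks, ways):
    real = make_cache(num_blocks // ways, ways)   # cache of interest
    fully_assoc = make_cache(1, num_blocks)       # same capacity, one set
    seen = set()                                  # stands in for the infinite cache
    counts = {"cold": 0, "capacity": 0, "conflict": 0}
    for block in block_trace:
        real_hit = access(real, block)
        fa_hit = access(fully_assoc, block)
        if not real_hit:
            if block not in seen:
                counts["cold"] += 1        # misses even with infinite capacity
            elif not fa_hit:
                counts["capacity"] += 1    # misses once capacity is limited
            else:
                counts["conflict"] += 1    # misses only due to limited associativity
        seen.add(block)
    return counts

# Four-block direct-mapped cache; blocks 0 and 4 collide on the same entry.
print(classify_misses([0, 4, 0, 4], num_blocks=4, ways=1))
```

In this trace the first two misses are cold, and the last two are conflict misses because a fully associative cache of the same size would have held both blocks.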
