Chapter 7: Large and Fast: Exploiting Memory Hierarchy
1 Chapter 7: Large and Fast: Exploiting Memory Hierarchy
2 Basic Memory Requirements
Users/programmers demand: a large computer memory with very fast access.
Technology limitations: a large computer memory has relatively slow access; a small computer memory has relatively fast access.
So how do you build a large computer memory with fast access? Computer Architecture CS
3 Computer Memory Use-Case Scenarios
If a memory item is referenced, it will most likely be referenced again soon (temporal locality), and its neighbors will tend to be referenced soon (spatial locality).
Basic philosophy: employ the basic requirements, the technology limitations, and the stated use-case scenarios to architect the memory.
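Both locality principles show up in ordinary code. The Python sketch below is my own illustration (not from the slides): summing a matrix row by row reuses the accumulator on every step (temporal locality) and touches neighboring elements in memory order (spatial locality), so each cache block fetched on a miss serves the next several accesses.

```python
def row_major_sum(matrix):
    total = 0
    for row in matrix:          # temporal locality: 'total' is reused every step
        for value in row:       # spatial locality: neighbors accessed in order
            total += value
    return total

grid = [[1, 2], [3, 4]]
print(row_major_sum(grid))      # 10
```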
4 Memory Hierarchy Design Philosophy
Build a hierarchy of memories with fast access close to the CPU, and employ temporal locality and spatial locality in the design.
(Figure: Level 1 sits closest to the CPU, down through Level n; speed increases toward the CPU, while access time and the size of the memory at each level increase with distance from the CPU.)
5 Memory Speed Technology Trend
Technology / Access time / $ per GB in 2004:
SRAM: 0.5-5 ns, $4,000-$10,000
DRAM: 50-70 ns, $100-$200
Magnetic disk: 5,000,000-20,000,000 ns, $0.50-$2
6 Three-Level Computer Memory Hierarchy
CPU -> Cache (SRAM): smallest memory and fastest.
Main memory (DRAM).
Virtual memory on magnetic disk (stores data): biggest and slowest.
The fastest memory is closest to the CPU.
7 Memory Hierarchy: Upper & Lower Levels
If the data requested by the CPU appears in a cache block: hit. If it is not in the cache: miss.
Memory performance:
hit rate = hits / memory accesses
miss rate = 1 - hit rate
hit time = time to access the cache and transfer the data
miss penalty = time to access the lower level + time to transfer the block into the upper level + time to deliver the data to the CPU
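These quantities combine into the average memory access time. A minimal Python sketch (the function name `amat` and the example numbers are my own, not from the slides):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# e.g. 1 ns hit time, 5% miss rate, 100 ns miss penalty
print(amat(1.0, 0.05, 100.0))   # 6.0 (ns)
```

Note how even a small miss rate dominates the average when the miss penalty is large, which is why the hierarchy tries so hard to keep misses rare.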
8 Basics of Caches
A simple cache example: before the request, the cache holds X1, X2, ..., Xn-1. The CPU requests Xn, but Xn is not in the cache, so a trip to memory is needed: Xn is copied into the cache from memory and returned to the CPU. After the request, the cache holds X1, ..., Xn.
How do we know if a data item is in the cache, and how do we find it?
9 Cache Structure: Direct Mapped
Each memory location is mapped directly to a unique location in the cache.
Example mapping scheme: cache index = (block address) modulo (number of cache blocks in the cache)
Example: a cache with 8 entries of one word each (8 = 2^3).
The total number of entries in a direct-mapped cache must be a power of 2.
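The mapping scheme can be stated directly in code. A minimal sketch (my own, assuming the 8-entry example above):

```python
def cache_index(block_address, num_blocks):
    """Direct-mapped placement: index = block address modulo number of blocks."""
    return block_address % num_blocks

# In an 8-block cache, memory blocks 1, 9, 17, 25, ... all map to index 1,
# which is exactly the many-to-one mapping the next slide discusses.
print([cache_index(b, 8) for b in (1, 9, 17, 25)])   # [1, 1, 1, 1]
```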
10 Basics of Cache
Cache mapping is many memory words to one cache location. How do we know whether the data in the cache corresponds to a requested word? Add a set of tags to the cache; each tag contains the upper bits of the word address, the ones not used in the indexing.
How do we know if a cache block contains valid information? Cache contents are invalid (empty) during CPU initialization. Solution: add valid bits to indicate a valid address.
11 Accessing a Direct-Mapped Cache
Assume a direct-mapped cache with 8 one-word blocks. Requests:
Decimal Ref Address / Binary Ref Address / Hit or Miss
22 / 10110 / miss -- assigned cache block 110 (fig. 7.6b)
26 / 11010 / miss (fig. 7.6c)
22 / 10110 / hit
26 / 11010 / hit
16 / 10000 / miss (fig. 7.6d)
3 / 00011 / miss (fig. 7.6e)
16 / 10000 / hit
18 / 10010 / miss (fig. 7.6f)
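This whole table can be replayed with a few lines of Python. The simulator below is my own sketch of the scheme described above (8 one-word blocks, index = address mod 8, tag = remaining upper bits), assuming the trace 22, 26, 22, 26, 16, 3, 16, 18:

```python
def simulate_direct(addresses, num_blocks=8):
    """Direct-mapped cache with one-word blocks; returns hit/miss per access."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    outcome = []
    for addr in addresses:
        index = addr % num_blocks     # low bits select the block
        tag = addr // num_blocks      # upper bits identify the address
        if valid[index] and tags[index] == tag:
            outcome.append("hit")
        else:
            outcome.append("miss")    # copy the word from memory, update tag
            valid[index], tags[index] = True, tag
    return outcome

trace = [22, 26, 22, 26, 16, 3, 16, 18]
print(simulate_direct(trace))
# ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

The final miss on address 18 happens because 18 and 26 share index 010 but have different tags, so 18 evicts 26's block.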
12 Accessing a Cache
State of the cache at initialization: every index has valid bit 0; the cache is empty.
13 Accessing a Cache
Request #1: post memory reference 10110 (22). The CPU encounters a miss; the cache copies Memory[10110] into block index 110, sets the tag to 10, and sets the valid bit to 1.
14 Accessing a Cache
Request #2: post memory reference 11010 (26). The CPU encounters a miss; the cache copies Memory[11010] into block index 010, tag 11, valid bit 1.
15 Accessing a Cache
Request #3: post memory reference 10110 (22). The CPU encounters a hit: block 110 is valid and its tag matches.
16 Accessing a Cache
Request #4: post memory reference 11010 (26). The CPU encounters a hit.
17 Accessing a Cache
Request #5: post memory reference 10000 (16). The CPU encounters a miss; the cache copies Memory[10000] into block index 000, tag 10, valid bit 1.
18 Accessing a Cache
Request #6: post memory reference 00011 (3). The CPU encounters a miss; the cache copies Memory[00011] into block index 011, tag 00, valid bit 1.
19 Accessing a Cache
Request #7: post memory reference 10000 (16). The CPU encounters a hit.
20 Accessing a Cache
Request #8: post memory reference 10010 (18). The CPU encounters a miss; block index 010 currently holds tag 11 (address 26), so the cache replaces it with Memory[10010] and sets the tag to 10.
21 MIPS Direct-Mapped Cache
32-bit byte reference address (showing bit positions):
Bits 1-0: byte offset, ignored for indexing (not significant).
Bits 11-2: cache index (10 bits).
Bits 31-12: tag field (20 bits).
On a CPU access, the index selects one cache entry; the stored tag is compared (=) against the address tag, and Hit is asserted when they match and the valid bit is set.
Cache size: 2^10 blocks, 1 word per block.
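Under the bit layout above (2-bit byte offset, 10-bit index, 20-bit tag), the field extraction can be sketched in Python (the function name and the example address are my own):

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, index, byte_offset)."""
    byte_offset = addr & 0b11          # bits 1-0
    index = (addr >> 2) & 0x3FF        # bits 11-2: 10 bits -> 1024 blocks
    tag = addr >> 12                   # bits 31-12: 20 bits
    return tag, index, byte_offset

print(split_address(0x1234))   # (1, 141, 0)
```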
22 Direct-Mapped Cache: Cache Size
Let's assume the index field occupies n bits, so the cache holds 2^n blocks, and the offset field occupies m bits, so the block size is 2^m words = 2^(m+5) bits (a word is 32 = 2^5 bits).
For a 32-bit byte address: number of bits for the tag field = 32 - (n + m + 2).
Size of cache = 2^n x (block size + tag size + valid size)
= 2^n x (2^(m+5) + [32 - (n + m + 2)] + 1)
= 2^n x (2^m x 32 + [32 - (n + m + 2)] + 1)
Cache size = 2^n x (2^m x 32 + 31 - n - m)
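The final formula is easy to check numerically. A small Python sketch (my own) for a direct-mapped cache with 2^n blocks of 2^m 32-bit words, using 32-bit byte addresses:

```python
def cache_bits(n, m):
    """Total storage bits of a direct-mapped cache: data + tag + valid."""
    data = 2**m * 32                   # block size in bits
    tag = 32 - (n + m + 2)             # address bits not used for index/offset
    valid = 1
    return 2**n * (data + tag + valid)

# 1024 one-word blocks (n=10, m=0): 1024 * (32 + 20 + 1) bits
print(cache_bits(10, 0))   # 54272
```

So a cache holding 4 KB of data actually costs about 6.6 KB of storage once tags and valid bits are counted.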
23 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
The cache has 4 blocks (index/block 00, 01, 10, 11), each holding four data words.
How do we map a memory address to a direct-mapped cache with four-word blocks?
24 Four-Word Blocks & Total Size 16 Words
Consider the following memory reference requests (decimal / binary word addresses): 22 (10110), 20 (10100), 26 (11010), 23 (10111), 28 (11100), 16 (10000), 3 (00011), 16 (10000), 18 (10010).
Show the direct-mapped cache contents at each stage.
25 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Initial content: all four blocks (00, 01, 10, 11) have valid bit 0; the cache is empty.
26 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #1: post memory reference address 22 (10110). Word offset (trailing 2 bits): 22 mod 4 = 2 = 10. Index (next 2 bits): 01. Tag (remaining upper bits): 1, which differs from the cache content, and the block is not valid. Miss. Transfer Con(20) through Con(23) into block 01, then set the tag to 1 and the valid bit to 1.
27 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #2: post memory reference address 20 (10100). Word offset: 20 mod 4 = 0 = 00. Index: 01. Tag: 1; the tag and valid bit of block 01 are already set to 1. Hit.
28 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #3: post memory reference address 26 (11010). Word offset: 26 mod 4 = 2 = 10. Index: 10. Tag: 1; block 10 is not valid. Miss. Transfer Con(24) through Con(27) into block 10, then set the tag to 1 and the valid bit to 1.
29 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #4: post memory reference address 23 (10111). Word offset: 23 mod 4 = 3 = 11. Index: 01. Tag: 1, already set; valid bit 1. Hit.
30 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #5: post memory reference address 28 (11100). Word offset: 28 mod 4 = 0 = 00. Index: 11. Tag: 1; block 11 is not valid. Miss. Transfer Con(28) through Con(31) into block 11, then set the tag to 1 and the valid bit to 1.
31 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #6: post memory reference address 16 (10000). Word offset: 16 mod 4 = 0 = 00. Index: 00. Tag: 1; block 00 is not valid. Miss. Transfer Con(16) through Con(19) into block 00, then set the tag to 1 and the valid bit to 1.
32 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #7: post memory reference address 3 (00011). Word offset: 3 mod 4 = 3 = 11. Index: 00. Tag: 0, but block 00's tag is set to 1. Miss. Transfer Con(0) through Con(3) into block 00, then set the tag to 0 and the valid bit to 1.
33 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #8: post memory reference address 16 (10000). Word offset: 16 mod 4 = 0 = 00. Index: 00. Tag: 1, but block 00's tag is now set to 0. Miss. Transfer Con(16) through Con(19) into block 00, then set the tag to 1 and the valid bit to 1.
34 Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #9: post memory reference address 18 (10010). Word offset: 18 mod 4 = 2 = 10. Index: 00. Tag: 1, already set; valid bit 1. Hit.
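The nine-request walkthrough can be replayed with a small simulator. This Python sketch of a four-word-block direct-mapped cache is my own, assuming the trace 22, 20, 26, 23, 28, 16, 3, 16, 18 from the preceding slides:

```python
def simulate_multiword(addresses, num_blocks=4, words_per_block=4):
    """Direct-mapped cache with multi-word blocks; returns hit/miss per access."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    outcome = []
    for addr in addresses:
        block_addr = addr // words_per_block   # strip the word offset
        index = block_addr % num_blocks
        tag = block_addr // num_blocks
        if valid[index] and tags[index] == tag:
            outcome.append("hit")
        else:
            outcome.append("miss")             # fetch the whole 4-word block
            valid[index], tags[index] = True, tag
    return outcome

trace = [22, 20, 26, 23, 28, 16, 3, 16, 18]
print(simulate_multiword(trace))
# ['miss', 'hit', 'miss', 'hit', 'miss', 'miss', 'miss', 'miss', 'hit']
```

Note how spatial locality pays off: the miss on 22 also brings in 20 and 23, turning those accesses into hits, while 3 and 16 thrash block 00 because they share an index with different tags.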
35 Handling Hits and Misses (Control Unit)
Read hits: business as usual -- this is what we want!
Read misses: stall the CPU; the control unit fetches the block from memory, delivers it to the cache, and restarts the access.
Write hits: either replace the data in both the cache and memory (write-through), or write the data only into the cache and write it back to memory later (write-back).
Write misses: read the entire block into the cache, then write the word.
36 Other Cache Structures: Reducing Cache Misses
Fully associative cache: each block in memory can be placed anywhere in the cache.
Set-associative cache: each block in memory can be placed in a fixed number of locations within a set. In a 2-way set-associative cache, each set has 2 elements (cache locations).
37 Set-Associative Cache Structure (2-Way)
The cache locations are organized as sets (e.g., sets 0 through 3), each containing two elements. Each block address can be placed in one of the 2 elements within its assigned set.
38 Set-Associative Cache Structure (2-Way)
1. How do we map a block address to a unique set within the cache? Bit-selection algorithm: set = (block address) mod (number of sets within the cache).
2. How do we identify the memory address within a set? Associate each element of the set with a tag: the set address + the tag contents + the valid bit together identify the block address.
39 Set-Associative Cache Structure (2-Way)
Each set holds two elements (Element 1 and Element 2), each with its own valid bit, tag, and data.
40 Set-Associative Cache Structure (4-Way)
Each set holds four elements (Element 1 through Element 4). Each block address can be placed in one of the 4 elements within a given set. All tags within a set are searched in parallel.
41 Two-Way Set-Associative Cache
Show the cache contents after each request. Order of requests (decimal / binary reference addresses): 0 (00000), 8 (01000), 0 (00000), 6 (00110), 8 (01000).
42 Two-Way Set-Associative Cache: State of Cache at Initialization
We will assume:
1. The set-associative cache has 2 sets (2^1), with two elements per set.
2. One-word cache blocks and 5-bit word addresses.
Number of tag bits = (max bits to represent the address) - (bits to represent the set) = 5 - 1 = 4.
Both elements of both sets start with valid bit 0.
43 Two-Way Set-Associative Cache Contents
Request #1: post memory reference address 0 (00000). Locate set: 0 mod 2 = 0. Search the tags in set 0 for 0000 (the upper bits): the tag does not exist in set 0. Miss. Transfer the data into set 0, element 1 (CO[0]), then set the tag to 0000 and the valid bit to 1.
44 Two-Way Set-Associative Cache Contents
Request #2: post memory reference address 8 (01000). Locate set: 8 mod 2 = 0. Search the tags in set 0 for 0100: the tag does not exist in set 0. Miss. Transfer the data into set 0, element 2 (CO[8]), then set the tag to 0100 and the valid bit to 1.
45 Two-Way Set-Associative Cache Contents
Request #3: post memory reference address 0 (00000). Locate set: 0 mod 2 = 0. Search the tag bits in set 0 (parallel algorithm) for 0000: the tag exists in set 0. Hit. Send the cache contents for memory address 0 to the CPU.
46 Two-Way Set-Associative Cache Contents
Request #4: post memory reference address 6 (00110). Locate set: 6 mod 2 = 0. Search the tags in set 0 for 0011: the tag does not exist in set 0. Miss. Cache replacement algorithm: replace the least recently used block (CO[8], in element 2), transfer the data into that element, then set the tag to 0011 and the valid bit to 1. Set 0 now holds CO[0] and CO[6].
47 Two-Way Set-Associative Cache Contents
Request #5: post memory reference address 8 (01000). Locate set: 8 mod 2 = 0. Search the tags in set 0 for 0100: the tag does not exist in set 0. Miss. Cache replacement algorithm: replace the least recently used block (CO[0], in element 1), transfer the data into that element, then set the tag to 0100 and the valid bit to 1. Set 0 now holds CO[8] and CO[6].
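Requests #1-#5 can be replayed with a small two-way set-associative simulator with LRU replacement. The Python sketch below is my own, assuming the setup above (2 sets, one-word blocks, set = address mod 2):

```python
def simulate_two_way(addresses, num_sets=2):
    """2-way set-associative cache with LRU; returns hit/miss per access."""
    sets = [[] for _ in range(num_sets)]   # each set: up to 2 tags, LRU first
    outcome = []
    for addr in addresses:
        s = addr % num_sets
        tag = addr // num_sets
        if tag in sets[s]:
            outcome.append("hit")
            sets[s].remove(tag)            # will re-append as most recently used
        else:
            outcome.append("miss")
            if len(sets[s]) == 2:
                sets[s].pop(0)             # evict the least recently used tag
        sets[s].append(tag)                # mark this tag most recently used
    return outcome

print(simulate_two_way([0, 8, 0, 6, 8]))
# ['miss', 'miss', 'hit', 'miss', 'miss']
```

Addresses 0, 8, and 6 all compete for set 0, so even with two ways this short trace keeps evicting: associativity reduces conflict misses but cannot eliminate them when three blocks share one set.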
48 Set-Associative Cache Block Replacement Algorithms
Least Recently Used (LRU): the block replaced is the one that has been unused for the longest time. Requires tracking the usage of each element in a set; expensive with increased associativity (m-way set-associative with very large m).
Other policies: Most Recently Used (MRU), Least Frequently Used (LFU), Most Frequently Used (MFU), First In First Out (FIFO).
49 Let's Summarize: M-Way Set-Associative Caches
1-way set-associative cache (M=1): direct mapped.
2-way set-associative cache (M=2): cache block replacement, e.g., LRU.
4-way set-associative cache (M=4): cache block replacement, e.g., LRU.
Fully associative cache: a block can be placed in any location in the cache; all entries in the cache must be searched in response to a cache request (expensive).
50 Worked Example
The cache has a total size of 16 words configured in 4 blocks: there are 4 words in each block. 4 blocks -> 2 address bits for the blocks; therefore the cache index is 2 bits wide, and each memory address decomposes into TAG + INDEX + OFFSET (2 bits).
For the reference sequence, the outcomes in order were: miss, hit, hit, miss, miss, hit, miss, miss, miss, hit. The misses load, in turn, the blocks containing words {3,2,1,0}, {23,22,21,20}, {19,18,17,16}, {27,26,25,24}, {51,50,49,48}, and {3,2,1,0} again.
Final cache contents by cache index (block): 00: 3,2,1,0; 01: 23,22,21,20; 10: 27,26,25,24; 11: not set.
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 12 Mahadevan Gomathisankaran March 4, 2010 03/04/2010 Lecture 12 CSCE 4610/5610 1 Discussion: Assignment 2 03/04/2010 Lecture 12 CSCE 4610/5610 2 Increasing Fetch
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2014 Lecture 14 LAST TIME! Examined several memory technologies: SRAM volatile memory cells built from transistors! Fast to use, larger memory cells (6+ transistors
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationThe Memory Hierarchy Cache, Main Memory, and Virtual Memory
The Memory Hierarchy Cache, Main Memory, and Virtual Memory Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University The Simple View of Memory The simplest view
More informationCray XE6 Performance Workshop
Cray XE6 Performance Workshop Mark Bull David Henty EPCC, University of Edinburgh Overview Why caches are needed How caches work Cache design and performance. 2 1 The memory speed gap Moore s Law: processors
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationMemory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality
Caching Chapter 7 Basics (7.,7.2) Cache Writes (7.2 - p 483-485) configurations (7.2 p 487-49) Performance (7.3) Associative caches (7.3 p 496-54) Multilevel caches (7.3 p 55-5) Tech SRAM (logic) SRAM
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016
Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss
More informationAssignment 1 due Mon (Feb 4pm
Announcements Assignment 1 due Mon (Feb 19) @ 4pm Next week: no classes Inf3 Computer Architecture - 2017-2018 1 The Memory Gap 1.2x-1.5x 1.07x H&P 5/e, Fig. 2.2 Memory subsystem design increasingly important!
More informationThe levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms
The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested
More informationCaching Basics. Memory Hierarchies
Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationMemory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S.
Memory Hierarchy Lecture notes from MKP, H. H. Lee and S. Yalamanchili Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 Reading (2) 1 SRAM: Value is stored on a pair of inerting gates Very fast but
More informationThe Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.
The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we
More informationThe check bits are in bit numbers 8, 4, 2, and 1.
The University of Western Australia Department of Electrical and Electronic Engineering Computer Architecture 219 (Tutorial 8) 1. [Stallings 2000] Suppose an 8-bit data word is stored in memory is 11000010.
More informationTrying to design a simple yet efficient L1 cache. Jean-François Nguyen
Trying to design a simple yet efficient L1 cache Jean-François Nguyen 1 Background Minerva is a 32-bit RISC-V soft CPU It is described in plain Python using nmigen FPGA-friendly Designed for reasonable
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Components of a Computer Processor
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationMemory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky
Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data
More information14:332:331. Week 13 Basics of Cache
14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006
More informationModern Computer Architecture
Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap
More informationHomework 6. BTW, This is your last homework. Assigned today, Tuesday, April 10 Due time: 11:59PM on Monday, April 23. CSCI 402: Computer Architectures
Homework 6 BTW, This is your last homework 5.1.1-5.1.3 5.2.1-5.2.2 5.3.1-5.3.5 5.4.1-5.4.2 5.6.1-5.6.5 5.12.1 Assigned today, Tuesday, April 10 Due time: 11:59PM on Monday, April 23 1 CSCI 402: Computer
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms
More informationCache Optimization. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Cache Optimization Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Cache Misses On cache hit CPU proceeds normally On cache miss Stall the CPU pipeline
More informationMemory hierarchy and cache
Memory hierarchy and cache QUIZ EASY 1). What is used to design Cache? a). SRAM b). DRAM c). Blend of both d). None. 2). What is the Hierarchy of memory? a). Processor, Registers, Cache, Tape, Main memory,
More informationCPE 631 Lecture 04: CPU Caches
Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR
More informationCMPT 300 Introduction to Operating Systems
CMPT 300 Introduction to Operating Systems Cache 0 Acknowledgement: some slides are taken from CS61C course material at UC Berkeley Agenda Memory Hierarchy Direct Mapped Caches Cache Performance Set Associative
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major
More informationEastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy.
Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture CACHE MEMORY Introduction Computer memory is organized into a hierarchy. At the highest
More information