CSCI 402: Computer Architectures. Fengguang Song, Department of Computer & Information Science, IUPUI.

CSCI 402: Computer Architectures
Memory Hierarchy (2)

Recall
- What is the memory hierarchy? Where is each level located?
- Each level's speed, size, and cost
- Principle of locality
- Memory technologies (SRAM, DRAM, NVM, disk, 3D memories): the basic idea of each
- The formula for average disk access time

Look more into main memory
The CPU has an integrated (on-chip) memory controller. It connects to the RAM through a memory interface that is 64 bits wide.

4/3/18

Revisit: Double Data Rate (DDR) DRAM
25.6 GB/s is the best we can get from one memory module. Can we make the bandwidth higher? Build a multi-channel memory system.

A single channel still supports only 64 bits.

To enable a dual-channel architecture, you need:
- A memory controller that supports dual-channel operation (all current CPUs do)
- Two or four (an even number of) memory modules, where each pair of modules is identical
- The memory modules installed in the correct memory sockets on the motherboard
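The 25.6 GB/s figure follows from peak bandwidth = transfer rate x bus width (in bytes) x number of channels. The sketch below assumes a DDR4-3200 module (3200 million transfers per second) on the 64-bit interface; the module speed is an assumption for illustration, since the slides do not name one. It computes theoretical peak numbers only, not sustained bandwidth.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: int,
                        bus_width_bits: int = 64,
                        channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8   # 64-bit interface -> 8 bytes per transfer
    bytes_per_s = mega_transfers_per_s * 10**6 * bytes_per_transfer * channels
    return bytes_per_s / 10**9


# ASSUMPTION: a DDR4-3200 module (3200 MT/s), chosen for illustration only.
single = peak_bandwidth_gb_s(3200)
dual = peak_bandwidth_gb_s(3200, channels=2)
print(f"single channel: {single:.1f} GB/s")
print(f"dual channel:   {dual:.1f} GB/s")
```

Each channel has its own 64-bit interface, so adding a second channel doubles the theoretical peak (25.6 GB/s to 51.2 GB/s under this assumption), which is why modules are installed in pairs.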

Next: Look at Cache
We have talked about main memory. Next:
- Basics of cache
- Measuring and improving cache performance

(Figure: one AMD CPU chip with 8 cores)

What is Cache?
Cache is the level of the memory hierarchy closest to the CPU (except for the registers).
We'll start with a simple cache, where each row is a cache block (or cache line).
Given a list of memory accesses X1, X2, ..., X(n-1), what about Xn?
- How do we know whether the data (Xn) is present?
- Where do we find Xn?

Direct Mapped Cache
The simplest cache is a direct-mapped cache. Given a block, its location in the cache is decided by its address:
Block index = (Block address) modulo (#Blocks in cache)
- The figure shows 8 cache lines (3-bit index) and 32 blocks in main memory (5-bit block address).
- If #Blocks is a power of 2, the block index is simply the low-order bits of the block address.

Tags and Valid Bits
But this is an n:1 mapping from memory to cache.
Q: How do we know which particular block is stored in a cache line?
Solution: store the memory address as well as the data itself. Actually, we only need to store the high-order bits (the tag). The tag identifies whether a cache line holds the requested block.
Also, what if there is no data in a location?
Valid bit: 1 = present, 0 = not present. Initially 0.
Caching is the most important example of the idea of prediction. Again, it relies on the principle of locality.
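The index/tag split and the valid-bit check can be sketched as follows, using the figure's 8-line cache and 5-bit block addresses. The function names are hypothetical, chosen for illustration; the code only models the lookup logic, not real hardware.

```python
NUM_LINES = 8    # 8 cache lines, as in the figure
INDEX_BITS = 3   # log2(8)


def split_block_address(block_addr: int) -> tuple[int, int]:
    """Return (tag, index) for a block address in a direct-mapped cache."""
    index = block_addr % NUM_LINES   # Block index = block address modulo #blocks
    tag = block_addr // NUM_LINES    # the remaining high-order bits
    # Because NUM_LINES is a power of 2, modulo == keep the low-order bits.
    assert index == block_addr & (NUM_LINES - 1)
    assert tag == block_addr >> INDEX_BITS
    return tag, index


def is_hit(valid: list[bool], tags: list[int], block_addr: int) -> bool:
    """A hit needs the valid bit set AND the stored tag equal to the address tag."""
    tag, index = split_block_address(block_addr)
    return valid[index] and tags[index] == tag


# Four different memory blocks that all compete for cache line 001 or 101:
for addr in (0b00001, 0b01001, 0b10001, 0b11101):
    tag, index = split_block_address(addr)
    print(f"block {addr:05b} -> index {index:03b}, tag {tag:02b}")
```

With all valid bits initially 0, every first lookup is a miss regardless of the tag values stored.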

Cache Example
Cache: 8 blocks, 1 word/block, direct mapped
Block address: 5 bits = 2-bit tag + 3-bit cache line index (same as in the previous figure)

Initial state:
Index (3 bits)  V  Tag (2 bits)  Data (32 bits)
000             N
001             N
010             N
011             N
100             N
101             N
110             N
111             N

Block addr  Binary addr  Hit/miss  Cache block
22          10110        Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Block addr  Binary addr  Hit/miss  Cache block
26          11010        Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Block addr  Binary addr  Hit/miss  Cache block
22          10110        Hit       110
26          11010        Hit       010

(Cache contents are unchanged.)

Block addr  Binary addr  Hit/miss  Cache block
16          10000        Miss      000
3           00011        Miss      011
16          10000        Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Next access (cache contents as above):
Block addr  Binary addr  Hit/miss  Cache block
18          10010        ????      ????

Block addr  Binary addr  Hit/miss  Cache block
18          10010        Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Line 010 held tag 11, so the access with tag 10 misses and Mem[10010] replaces Mem[11010].

The CPU sends memory addresses in bytes, e.g., a (32-bit) pointer. Then how do we map a byte address (or a pointer) to a specific cache line?
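The block-address trace in the cache example can be replayed with a small simulator. This is a sketch that tracks only valid bits and tags (no data); the trace is read off the Mem[...] entries in the tables above, and the function name is hypothetical.

```python
def simulate_direct_mapped(block_addrs, num_lines=8):
    """Replay block addresses through a direct-mapped cache; return hit/miss per access."""
    valid = [False] * num_lines   # all valid bits start at 0
    tags = [0] * num_lines
    results = []
    for addr in block_addrs:
        index = addr % num_lines  # low-order 3 bits
        tag = addr // num_lines   # high-order 2 bits
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:
            # Miss: fetch the block, overwriting whatever this line held before.
            valid[index] = True
            tags[index] = tag
            results.append("miss")
    return results


trace = [0b10110, 0b11010, 0b10110, 0b11010, 0b10000, 0b00011, 0b10000, 0b10010]
for addr, result in zip(trace, simulate_direct_mapped(trace)):
    print(f"{addr:2d} ({addr:05b}): {result}")
```

The last access is the interesting one: 10010 and 11010 share index 010 but have different tags, so the newer block evicts the older one.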

Address Subdivision
(Figure: a byte address split into tag, block index, and byte offset fields)
Hardware information: cache size = 1024 blocks, block size = 4 bytes. Assuming 32-bit addresses, that is a 2-bit byte offset, a 10-bit index, and a 20-bit tag.

How about a larger block size (16 bytes per block)?
A direct-mapped cache has 64 blocks, 16 bytes per block.
Q: To what block number does byte address 1200 map?
1) Block address = ⌊1200/16⌋ = 75
2) Cache block number = 75 modulo 64 = 11

Tag (22 bits) | Index (6 bits) | Offset (4 bits)
- Offset: selects one of 16 = 2^4 bytes within the given block
- Index: selects one of 64 = 2^6 blocks
- Tag: the remaining 32 - 6 - 4 = 22 bits of the address
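The two-step mapping (divide by the block size, then take the result modulo the number of blocks) can be done with shifts and masks when both sizes are powers of 2. A minimal sketch assuming 32-bit addresses; the function name and its defaults (the 64-block, 16-byte cache above) are hypothetical.

```python
def split_byte_address(addr: int, num_blocks: int = 64, block_size: int = 16,
                       addr_bits: int = 32) -> dict:
    """Split a byte address into tag / index / offset for a direct-mapped cache.

    Assumes num_blocks and block_size are powers of 2.
    """
    assert (num_blocks & (num_blocks - 1)) == 0 and (block_size & (block_size - 1)) == 0
    offset_bits = block_size.bit_length() - 1    # log2(block_size)
    index_bits = num_blocks.bit_length() - 1     # log2(num_blocks)
    block_addr = addr >> offset_bits             # same as floor(addr / block_size)
    return {
        "block_addr": block_addr,
        "offset": addr & (block_size - 1),       # low-order offset_bits bits
        "index": block_addr & (num_blocks - 1),  # same as block_addr mod num_blocks
        "tag": block_addr >> index_bits,         # remaining high-order bits
        "tag_bits": addr_bits - index_bits - offset_bits,
    }


print(split_byte_address(1200))
print(split_byte_address(1200, num_blocks=1024, block_size=4))
```

For address 1200 this gives block address 75, index 11, offset 0, and tag 1, with a 22-bit tag field; the second call uses the 1024-block, 4-byte configuration and yields a 20-bit tag.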

Direct-Mapped Cache Example (32 bytes per block)
Given a 4 KB cache where each cache block is 32 bytes (4 KB = 2^12 bytes, 32 = 2^5):
- How many blocks are in the cache? 2^12 bytes / 2^5 bytes per block = 2^7 = 128 blocks
- How many bits are needed for the block index? log2(128) = 7 bits
- How many bits are needed for the offset that selects a byte in a block? 5 bits
- Assuming 32-bit addresses, how many bits are left over? These are the tag bits: 32 - 7 - 5 = 20 bits
Address layout: 20-bit tag | 7-bit index | 5-bit offset

Cache Overhead
In the previous example, 4 KB is the size visible to outside users. Let's look at the total space and the overhead.
Each cache block contains:
- 1 valid bit
- a 20-bit tag
- 32 bytes of data (i.e., 256 bits)
Total size per cache block: 1 + 20 + 256 = 277 bits
Thus, the total cache size in hardware, including overhead storage, is:
277 bits x 128 blocks = 35,456 bits = 4,432 bytes ≈ 4.33 KB
Cache overhead: 21 bits x 128 blocks = 2,688 bits = 336 bytes ≈ 0.33 KB for valid bits and tags
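The same arithmetic can be packaged as a small calculator. This is a sketch assuming power-of-2 sizes, 32-bit addresses, and one valid bit per block (no dirty or other status bits); the function name is hypothetical.

```python
def cache_geometry(cache_bytes: int, block_bytes: int, addr_bits: int = 32) -> dict:
    """Field widths and storage cost of a direct-mapped cache (power-of-2 sizes assumed)."""
    num_blocks = cache_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1     # log2(num_blocks)
    offset_bits = block_bytes.bit_length() - 1   # log2(block_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_block = 1 + tag_bits + 8 * block_bytes   # valid bit + tag + data
    overhead_bits = (1 + tag_bits) * num_blocks       # valid bits + tags only
    return {
        "num_blocks": num_blocks,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
        "tag_bits": tag_bits,
        "bits_per_block": bits_per_block,
        "total_bytes": bits_per_block * num_blocks // 8,
        "overhead_bytes": overhead_bits // 8,
    }


g = cache_geometry(4 * 1024, 32)
print(g)
print(f"total = {g['total_bytes'] / 1024:.2f} KB, overhead = {g['overhead_bytes'] / 1024:.2f} KB")
```

Changing `block_bytes` shows the trade-off: larger blocks mean fewer tags to store, so the relative overhead shrinks.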

Cache Access Examples
Consider a direct-mapped cache with 8 blocks and 2-byte blocks.
Address subdivision: 1 bit for the offset (displacement), 3 bits for the block index, and the rest for the tag:
4-bit tag | 3-bit index | 1-bit offset
Consider a stream of memory reads to the following byte addresses:
3, 13, 1, 0, 5, 1, 4, 32, 33, 1
The corresponding cache block indices, (byteaddr / 2) % 8 with integer division:
1, 6, 0, 0, 2, 0, 2, 0, 0, 0
Their tags, (byteaddr / 2) / 8 with integer division: 2 for 32 and 33; 0 for all the others
Question: How many cache hits and how many cache misses?

Initial state:
Index (3 bits)  V  Tag (4 bits)  Actual data (2 bytes)
000 (0)         N
001 (1)         N
010 (2)         N
011 (3)         N
100 (4)         N
101 (5)         N
110 (6)         N
111 (7)         N
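One way to check an answer to the question is to replay the byte-address stream. This is a minimal sketch that tracks only valid bits and tags, using the integer-division formulas given for the index and tag; the function name is hypothetical.

```python
def simulate_byte_reads(byte_addrs, num_lines=8, block_size=2):
    """Replay byte addresses through a direct-mapped cache (tags and valid bits only)."""
    valid = [False] * num_lines
    tags = [0] * num_lines
    outcomes = []
    for addr in byte_addrs:
        block_addr = addr // block_size   # drop the 1-bit offset
        index = block_addr % num_lines    # (byteaddr / 2) % 8
        tag = block_addr // num_lines     # (byteaddr / 2) / 8
        if valid[index] and tags[index] == tag:
            outcomes.append("hit")
        else:
            # Miss: load the block into this line, evicting whatever was there.
            valid[index], tags[index] = True, tag
            outcomes.append("miss")
    return outcomes


reads = [3, 13, 1, 0, 5, 1, 4, 32, 33, 1]
outcomes = simulate_byte_reads(reads)
for addr, result in zip(reads, outcomes):
    print(f"addr {addr:2d}: {result}")
print("hits:", outcomes.count("hit"), "misses:", outcomes.count("miss"))
```

Byte addresses 0 and 1 share one 2-byte block, so reading 0 right after 1 hits; the final read of 1 misses because 32 and 33 (tag 2) evicted that block from line 0.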
