ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization, Spring 2017: Caches

Announcements: HW7 will be posted tonight. Lab sessions resume next week.

Course Content: Binary numbers and logic gates; Boolean algebra and combinational logic; Sequential logic and state machines; Binary arithmetic; Memories; Instruction set architecture; Processor organization; Caches and virtual memory; Input/output; Advanced topics.

Review: Pipelined Microprocessor. [Figure: the five-stage pipelined datapath, with PC/jump logic, instruction RAM, decoder, register file (SA, SB, DR, D_in), ALU, and data RAM, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers; control signals include LD, MB, F, MW, and MD.]

Example: Data Hazards w/ Forwarding. Assume HW forwarding and NO delay slot for load. Identify all data hazards in the following instruction sequence by circling each source register that is read before the updated value is written back:

    LW   , 0()
    ADDI , , 1
    BEQ  , , X
    SW   , 4()
    X:

We Need Fast and Large Memory. [Figure: the five-stage pipeline (IF, ID, EX, MEM, WB) with instruction RAM, register file, ALU, and data RAM; cycle time ~300ps-2ns (~3GHz-500MHz).] DRAM: slow (10-50 ns for a read or write) but cheap (1 transistor + capacitor per bit cell). SRAM: fast (100s of ps to a few ns for a read/write) but expensive (6 transistors per bit cell).

Using Caches in the Pipeline. [Figure: the same pipeline with the instruction RAM replaced by an instruction cache (SRAM) and the data RAM by a data cache (SRAM), both backed by main memory (DRAM).]

Cache: a small SRAM memory that permits rapid access to a subset of instructions or data. If the data is in the cache (cache hit), we retrieve it without slowing down the pipeline. If the data is not in the cache (cache miss), we retrieve it from main memory (a penalty is incurred in accessing the DRAM). The hit rate is the fraction of memory accesses found in the cache; the miss rate is (1 - hit rate).

Memory Access with Cache. Average memory access time with a cache: hit time + miss rate * miss penalty. Example: main memory access time = 50ns, cache hit time = 2ns, miss rate = 10%. Average memory access time w/o cache = 50ns; average memory access time w/ cache = 2 + 0.1*50 = 7ns.
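
That arithmetic in a minimal C sketch (the 2ns, 10%, and 50ns figures are the slide's example values):

    #include <stdio.h>

    int main(void) {
        double hit_time     = 2.0;   /* ns: cache hit time */
        double miss_rate    = 0.10;  /* 10% of accesses miss */
        double miss_penalty = 50.0;  /* ns: main memory access time */

        /* average memory access time = hit time + miss rate * miss penalty */
        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.1f ns\n", amat);  /* prints 7.0 */
        return 0;
    }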

Why Caches Work: Principle of Locality. Temporal locality: if memory location X is accessed, it is likely to be accessed again in the near future; caches exploit temporal locality by keeping a referenced instruction or data item in the cache. Spatial locality: if memory location X is accessed, locations near X are likely to be accessed in the near future; caches exploit spatial locality by bringing a whole block of instructions or data into the cache on a miss.
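
A small illustrative loop (the array and its size are made up for the example) showing both kinds of locality:

    #include <stdio.h>

    int main(void) {
        int a[1024];
        int sum = 0;

        for (int i = 0; i < 1024; i++) {
            a[i] = i;
            /* spatial locality: a[i] sits next to a[i-1], so the miss that
               fetched this block makes the next few accesses hits */
            sum += a[i];
            /* temporal locality: sum and i are touched on every iteration,
               so they stay resident in the cache */
        }
        printf("%d\n", sum);
        return 0;
    }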

Some Important Terms. A cache is partitioned into blocks. Each cache block (or cache line) typically contains multiple bytes of data; a whole block is read or written during a data transfer between the cache and main memory. Each cache block is associated with a tag and a valid bit. Tag: a unique ID to differentiate between the different memory blocks that may be mapped into the same cache block. Valid bit: indicates whether the data in a cache block is valid (1) or not (0).
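
These terms map directly onto a data structure; a sketch in C, with the 4-byte block size borrowed from the direct mapped example later in the lecture:

    #include <stdint.h>
    #include <stdbool.h>

    #define BLOCK_SIZE 4  /* bytes per cache block (example value) */

    struct cache_line {
        bool     valid;             /* valid bit: block holds usable data? */
        uint32_t tag;               /* identifies which memory block is here */
        uint8_t  data[BLOCK_SIZE];  /* the cached bytes themselves */
    };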

Direct Mapped Cache Concepts. A given memory block is mapped to one and only one cache block. Example: a cache with 8 blocks, with block addresses in decimal; here we assume the main memory is 4 times larger than the cache.

    Cache block   Memory blocks
    0             0, 8, 16, 24
    1             1, 9, 17, 25
    2             2, 10, 18, 26
    3             3, 11, 19, 27
    4             4, 12, 20, 28
    5             5, 13, 21, 29
    6             6, 14, 22, 30
    7             7, 15, 23, 31

Direct Mapped (DM) Cache Concepts. Same example, with block addresses in binary: the cache has 8 blocks and the main memory has 32 blocks, so 4 different memory blocks may be mapped to the same cache location. [Figure: the low-order 3 bits of a 5-bit memory block address select the cache block, e.g. 00000, 01000, 10000, and 11000 all map to cache block 000.]
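
A sketch of that mapping rule in C (using the slide's 8-block cache; for power-of-two sizes the modulo is just the low-order index bits):

    #include <stdio.h>

    #define NUM_CACHE_BLOCKS 8  /* the slide's example cache */

    int main(void) {
        for (unsigned mem_block = 0; mem_block < 32; mem_block++) {
            /* cache block = memory block mod number of cache blocks,
               i.e. the low log2(8) = 3 bits of the block address */
            unsigned cache_block = mem_block % NUM_CACHE_BLOCKS;
            printf("memory block %2u -> cache block %u\n",
                   mem_block, cache_block);
        }
        return 0;
    }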

DM Cache Organization. [Figure: block diagram of a direct mapped cache: valid, tag, and data arrays indexed by the index bits, with a comparator matching the stored tag against the address tag to produce hit/miss.]

Address Translation for a DM Cache. Breakdown of an n-bit memory address for cache use: (n - i - b) tag bits, i index bits, and b byte offset bits. DM cache parameters: the size of each cache block is 2^b bytes (cache block and cache line are synonymous); the number of blocks is 2^i; the total cache size is 2^b * 2^i = 2^(b+i) bytes.
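
A sketch of the field extraction in C (b = i = 2 matches the example that follows; the address value is made up):

    #include <stdint.h>
    #include <stdio.h>

    #define B 2  /* byte offset bits: 2^2 = 4-byte blocks */
    #define I 2  /* index bits: 2^2 = 4 cache blocks */

    int main(void) {
        uint32_t addr = 0x2C;  /* hypothetical 6-bit address 101100 */

        uint32_t offset = addr & ((1u << B) - 1);         /* low b bits */
        uint32_t index  = (addr >> B) & ((1u << I) - 1);  /* next i bits */
        uint32_t tag    = addr >> (B + I);                /* remaining bits */

        printf("tag=%u index=%u offset=%u\n", tag, index, offset);
        return 0;
    }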

Reading a DM Cache. Use the index bits to retrieve the tag, data, and valid bit. Compare the tag from the address with the retrieved tag. If valid and the tags match (hit), select the desired data using the byte offset. Otherwise (miss): bring the memory block into the cache (and set the valid bit), store the tag from the address with the block, and select the desired data using the byte offset.
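
A minimal C sketch of that read path, reusing the cache_line struct and field definitions above (main_memory stands in for DRAM and the block fill is a simple memcpy; these are assumptions for illustration, not the lecture's implementation):

    #include <string.h>

    #define NUM_BLOCKS (1u << I)

    static struct cache_line cache[NUM_BLOCKS];
    extern uint8_t main_memory[];  /* backing DRAM, assumed elsewhere */

    uint8_t cache_read(uint32_t addr) {
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t index  = (addr >> B) & (NUM_BLOCKS - 1);
        uint32_t tag    = addr >> (B + I);
        struct cache_line *line = &cache[index];

        if (!(line->valid && line->tag == tag)) {   /* miss */
            uint32_t block_start = addr & ~(uint32_t)(BLOCK_SIZE - 1);
            memcpy(line->data, &main_memory[block_start], BLOCK_SIZE);
            line->tag   = tag;   /* remember which block lives here */
            line->valid = true;  /* set the valid bit */
        }
        return line->data[offset];  /* select the byte with the offset */
    }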

Writing a DM Cache. Use the index bits to retrieve the tag and valid bit. Compare the tag from the address with the retrieved tag. If valid and the tags match (hit), write the data into the cache location. Otherwise (miss), one option is: bring the memory block into the cache (and set the valid bit), store the tag from the address with the block, then write the data into the cache location.
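
The corresponding write path as a C sketch of that miss option (commonly called write-allocate; again reusing the declarations above, with propagation back to main memory deliberately left out):

    void cache_write(uint32_t addr, uint8_t value) {
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t index  = (addr >> B) & (NUM_BLOCKS - 1);
        uint32_t tag    = addr >> (B + I);
        struct cache_line *line = &cache[index];

        if (!(line->valid && line->tag == tag)) {   /* miss: allocate */
            uint32_t block_start = addr & ~(uint32_t)(BLOCK_SIZE - 1);
            memcpy(line->data, &main_memory[block_start], BLOCK_SIZE);
            line->tag   = tag;
            line->valid = true;
        }
        line->data[offset] = value;  /* write into the cache location */
        /* updating main memory (e.g. write-through) is not shown */
    }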

Direct Mapped Cache Example. The size of each block is 4 bytes; the cache holds 4 blocks; the main memory holds 16 blocks; a memory address has 6 bits: 2 tag bits, 2 index bits, and 2 byte offset bits. [Figure: the cache's 4 lines and the 16 memory blocks, with binary block addresses.]

Direct Mapped Cache Example (walkthrough). [Figure: a step-by-step trace of a sequence of four loads (R <= M[...]) against this 4-block cache, starting with all valid bits at 0. Each step decodes the address into tag, index, and offset and marks the access as a hit or miss; on a miss the indexed line is filled from main memory and its valid bit and tag are set. The early accesses miss and bring in memory words holding the values 14 and 17; a later access to an already-cached block hits.]

Doubling the Block Size. The size of each block is now 8 bytes; the cache holds 2 blocks; the main memory holds 8 blocks; a memory address still has 6 bits: 2 tag bits, 1 index bit, and 3 byte offset bits. [Figure: the 2-line cache with its V, tag, and data fields.]

Doubling the Block Size (walkthrough). [Figure: the same four-load sequence traced against the 2-block, 8-byte-block cache. Each 8-byte block now holds two data words (14 with 15, and 16 with 17), so an access that missed with 4-byte blocks now hits because its neighbor was fetched in the same block, illustrating spatial locality; the remaining accesses are traced the same way, marked hit or miss as the valid, tag, and data fields are updated.]

Block Size Considerations. Larger blocks may reduce the miss rate due to spatial locality. But in a fixed-size cache: larger blocks mean fewer of them, which increases the miss rate due to conflicts; and data fetched along with the requested data may never be used. Larger blocks also increase the miss penalty, since it takes longer to transfer a larger block from memory.
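
A small numeric sketch of the tradeoff (all four numbers below are invented for illustration): a lower miss rate does not automatically win if the miss penalty grows.

    #include <stdio.h>

    /* average memory access time, as defined earlier in the lecture */
    static double amat(double hit, double miss_rate, double penalty) {
        return hit + miss_rate * penalty;
    }

    int main(void) {
        /* hypothetical: doubling the block size trims the miss rate
           but lengthens the block transfer time from DRAM */
        printf("small blocks: %.2f ns\n", amat(2.0, 0.10, 50.0)); /* 7.00 */
        printf("large blocks: %.2f ns\n", amat(2.0, 0.08, 70.0)); /* 7.60 */
        return 0;
    }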

Next Time: More Caches