Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng


1 Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018

2 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 2/71 Contents Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

3 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 3/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

4 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 4/71 Must-haves for caches Here are the two essential requirements for a cache: Correctness: A cache must never feed a garbage instruction or data item to a processor core. Speed, when there is a cache hit: Pipeline designs assume that instructions and data memory items will be ready within a fixed number of clock cycles, such as 1 cycle for the 5-stage pipeline we've just studied, or perhaps 2 to 4 cycles for the deeper 10- to 15-stage pipelines in current commercial processors.

5 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 5/71 Goals in cache design In comparing designs that are correct and sufficiently fast, these goals are important: Low miss rate for most programs: misses can't be entirely avoided, but obviously it's good to make them infrequent. Relatively low chip area: smaller is better. Relatively low energy use per clock cycle: why is this important both for computers running on batteries and for plugged-in computers? There are tradeoffs here! Most ways to reduce miss rate require more transistors (so more chip area) and more energy per clock cycle.

6 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 6/71 Consequence of the requirement for speed The decisions about where to look in a cache, and whether or not there was a hit, must both be made very fast. The choices are: look in one place only, and do a simple comparison of bit patterns to determine hit or miss; or look in multiple places at the same time, and do multiple parallel comparisons to determine hit or miss. Serial solutions (check in one place, then in another, maybe then yet another, etc.) are too slow.

7 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 7/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

8 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 8/71 Direct-mapped caches A direct-mapped cache uses a look-in-one-place-only strategy to detect hits or misses. Let's look at an example for MIPS with room for 1024 instructions (if it's an I-cache) or 1024 data words (if it's a D-cache). Note: 1024 = 2^10. How is cache capacity defined? What is the capacity of this example cache in KB? Our cache will be a small memory array, with ten-bit addresses. The dimensions will be 1024 rows × 53 columns. We'll see shortly why 1024 × 32 won't work.

9 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 9/71 How addresses are split for our example cache Here is a 32-bit main memory address split into three pieces: the search tag in bits 31 to 12, the set bits in bits 11 to 2, and the byte offset in bits 1 to 0. Byte offset: In our simple example that allows only word accesses, these bits don't get used. They would matter in a cache that supported instructions like LB, LBU, and SB. Set bits: These ten bits are used as an address into the cache. Many textbooks use the word index for what Harris and Harris call the set bits. Your instructor is likely to slip from time to time and say index when he means set bits! Search tag: We'll soon see how these 20 bits get used.
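To make the field boundaries concrete, here is a minimal C sketch (my own illustration, not part of the slides) that pulls the three fields out of a 32-bit address with shifts and masks:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t addr = 0x00403570u;                 /* example address from slide 10 */
        uint32_t byte_offset = addr & 0x3u;          /* bits 1..0 */
        uint32_t set_bits    = (addr >> 2) & 0x3FFu; /* bits 11..2, 10 bits */
        uint32_t search_tag  = addr >> 12;           /* bits 31..12, 20 bits */
        printf("tag=0x%05x set=%u offset=%u\n",
               (unsigned)search_tag, (unsigned)set_bits, (unsigned)byte_offset);
        return 0;  /* prints: tag=0x00403 set=348 offset=0 */
    }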

10 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 10/71 An example address split for our example cache Suppose that 0x0040_3570 is the main memory address of some instruction, and our example cache is an I-cache. To try to find this instruction in the I-cache, the main memory address is split into search tag 0000 0000 0100 0000 0011, set bits 0101011100, and byte offset 00. How many other addresses of main memory words would generate the same set bit pattern of 0101011100?

11 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 11/71 Memory cell organization in our example cache Our cache is organized as 1024 sets. Each set needs 53 1-bit SRAM cells: 1 SRAM cell for the V-bit, 20 SRAM cells for the stored tag, and 32 SRAM cells for the cached instruction or data word. A valid bit, or V-bit, indicates whether its set contains valid information: 1 means YES and 0 means NO. The meaning of the stored tags will be explained by example.

12 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 12/71 Hit detection logic in the example direct-mapped cache [Schematic: the 10 set bits from the main memory address drive a 10-to-1024 decoder, which selects one set; the selected V-bit and the selected entry of the 1024 × 20 stored tag array are fed, along with the 20-bit search tag from the main memory address, into a 20-bit equality comparator. Active wires, SRAM cells, and logic are shown in RED.] The hit signal is 1 for a Hit and 0 for a Miss: for a hit, the V-bit in the selected set must be 1 AND the stored tag in the selected set must match the search tag.
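The hit condition itself fits in a few lines of C. This is a hedged software sketch of the hardware above; the array names v_bit and stored_tag are invented stand-ins for the SRAM contents:

    #include <stdbool.h>
    #include <stdint.h>

    #define NSETS 1024

    static bool     v_bit[NSETS];      /* one valid bit per set */
    static uint32_t stored_tag[NSETS]; /* one 20-bit stored tag per set */

    static bool is_hit(uint32_t addr) {
        uint32_t set = (addr >> 2) & 0x3FFu;  /* set bits pick one row */
        uint32_t tag = addr >> 12;            /* 20-bit search tag */
        return v_bit[set] && stored_tag[set] == tag;  /* V = 1 AND tags match */
    }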

13 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 13/71 Tracing an example instruction fetch Let's continue to suppose that our example cache is an I-cache. Suppose also that a program has been running for a while, so many of the program's instructions are in the I-cache. Finally, suppose that the processor core now wants the instruction at address 0x0040_3570. As seen a few slides back, the address is split into search tag, set bits, and byte offset. Set 0101011100 (base two) in the cache is checked: the decoder selects one 20-bit stored tag to copy into the 20-bit equality comparator, as shown on slide 12.

14 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 14/71 Tracing an example instruction fetch, continued Set 0101011100 (base two) is set 348 (base ten). This fact is irrelevant to the digital hardware, but handy for human discussion. How is it determined whether there is a hit or a miss? What happens if there is a hit? What happens if there is a miss? What is a possibly unfortunate side effect of a miss if the V-bit in the selected set is already 1 when the miss is detected?

15 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 15/71 Textbook example, pages 484 and 485 Harris and Harris present an example of a tiny direct-mapped cache with 8-word (so 32-byte) capacity. (It's too small to be practical, but the size makes it possible to draw a schematic without using a lot of ellipses.) The schematic on page 484 shows that the cache has 8 sets. Using that information, let's show how main memory addresses should be split into byte offset, set bits, and search tag for this cache. Highly recommended: Please read Example 8.8 on page 485 for a brief and clear example of how a data cache is helpful when a program has good temporal locality of reference.

16 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 16/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

17 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 17/71 The set-bit conflict problem in direct-mapped caches for (i = 0; i < n; i++) { x = foo(i); bar(x); } Let's consider again the example 1024-word direct-mapped I-cache presented in slides 8 to 14. Suppose instructions for foo start at 0x0040_3570. Suppose instructions for bar start at 0x0041_3570. What is going to happen in the I-cache?
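A quick check in C shows why this loop is bad news; this small program (my illustration, using the two start addresses assumed above) extracts the set bits of each:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t set_bits(uint32_t addr) { return (addr >> 2) & 0x3FFu; }

    int main(void) {
        printf("foo starts in set %u\n", (unsigned)set_bits(0x00403570u)); /* 348 */
        printf("bar starts in set %u\n", (unsigned)set_bits(0x00413570u)); /* 348 */
        return 0;  /* same set bits, different tags: the calls keep evicting each other */
    }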

18 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 18/71 Set-bit conflicts, continued What could make the for-loop example even worse? Can you think of conflict examples for a direct-mapped D-cache?

19 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 19/71 The general problem of set-bit conflicts A conflict in a direct-mapped cache occurs when two or more frequently-accessed instructions, or two or more frequently-accessed data items, share the same set bits in an I-cache or a D-cache. Conflicts can cause a high miss rate, and therefore many lost clock cycles, even in a situation where a cache is not close to full of frequently-accessed items. Misses due to conflicts can be reduced significantly, but not totally eliminated, by the use of set-associative caches.

20 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 20/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

21 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 21/71 Set-associative caches Instead of starting with a definition for set-associative cache, let's jump straight into an example. Suppose we would like the same 1024-word capacity as in our previous direct-mapped example, but with a 2-way set-associative organization. It turns out that for this new design, main memory addresses should be split like this: the search tag in bits 31 to 11 (21 bits), the set bits in bits 10 to 2 (9 bits), and the byte offset in bits 1 to 0.

22 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 22/71 The next slide shows a schematic of the hit detection and read logic for our example 1024-word two-way set-associative cache. Let's look at the schematic and answer some questions: Why is 9 the right number of set bits? What do the = boxes do? What is the purpose of the AND and OR gates? What does the mux do in the case of a hit? What does the mux do in the case of a miss? What will happen in the selected set in the case of a miss?

23 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 23/71 [Schematic: a main memory address supplies a 21-bit search tag and 9 set bits (with byte offset 00). A decoder selects one of 512 sets; each set holds, for way 1 and way 0, a V-bit, a 21-bit tag, and a data/instruction word. Two equality comparators produce Hit 1 and Hit 0, which are combined into a single hit signal (1 for hit, 0 for miss) and used to steer a mux that makes the data/instruction available to the core.]

24 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 24/71 Notes about slide 23 I designed the circuit and schematic to be very similar to Figure 8.9 on page 486 of Harris and Harris; please take a close look at that figure. I drew in the decoder to be as clear as possible about how one set is inspected for a hit, while all other sets are ignored. Obviously, a capacity of 1024 words (my example) versus a capacity of 8 words (textbook example) affects the number of sets, the number of set bits, and the number of bits in tags.

25 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 25/71 Logic details we WON'T study for ANY cache designs We won't look at wiring and logic for communication of data between the cache and the processor core in the case of a memory write. We won't look at wiring and logic for communication of addresses and data-or-instructions between the cache and main memory. However, we will look at important concepts related to writing to caches and main memory.

26 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 26/71 Review: Set-bit conflicts in a direct-mapped cache This is from slide 17: for (i = 0; i < n; i++) { x = foo(i); bar(x); } It's possible, by bad luck, that instruction addresses from foo generate the same set bits as instruction addresses from bar. That's not a problem in a two-way set-associative cache.

27 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 27/71 If foo needs to use sets 348 to 353, and bar also needs to use sets 348 to 353, that can be resolved without conflict: [Figure: in each of sets 348 to 353, way 1 holds bar instructions and tags while way 0 holds foo instructions and tags; sets 0 to 347 and 354 to 511 are not involved.] What kind of conflict problem in the for loop would a 2-way cache be unable to solve?

28 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 28/71 How many ways in an N-way set-associative cache? In the example we've just studied, N = 2. In a lot of textbook and lecture examples, N = 2, simply because with N > 2 it gets hard to fit diagrams into pages and slides! However, many caches in modern processor chips are 4-way, 8-way, or 16-way set-associative.

29 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 29/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

30 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 30/71 Fully-associative caches A fully-associative cache can be thought of as an extreme case of a set-associative cache, in which there is only one set. Every lookup in a fully-associative cache requires a parallel, simultaneous check of all of the V-bits and tags in the cache. This uses a lot of energy, and makes fully-associative design a poor choice for medium-size and large caches. Some very small memory systems do use fully-associative lookup. See the textbook for a little more discussion of fully-associative caches.

31 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 31/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

32 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 32/71 Multi-word blocks DRAM latency is the length of time needed for a DRAM system to start sending data in response to a read request. This is typically on the order of hundreds of processor core clock cycles. DRAM bandwidth is the rate at which a DRAM system can transmit data from sequential addresses once transmission has started. DRAM bandwidth is much less of a problem than DRAM latency. For example, in a typical laptop with DRAM on two DIMMs, 64 bytes (so, 512 bits) of instructions or data can be transmitted in about 8 core clock cycles. Conclusion: It makes much more sense to read DRAM in many-word bursts than it does to read DRAM in single-word accesses.
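A rough worked example (the 200-cycle latency is my assumed number for illustration, not a figure from the slides): with 200 cycles of latency and 8 cycles to stream 64 bytes, fetching one 4-byte word at a time costs about 200 cycles per word, while fetching a 16-word block costs about 200 + 8 = 208 cycles in total, or about 13 cycles per word.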

33 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 33/71 1024-word direct-mapped cache with 16-word blocks A schematic for the read logic in this kind of cache is shown on the next slide. For each V-bit and tag there will be 16 instruction words (if it's an I-cache) or 16 data words (if it's a D-cache). For this cache it turns out to be necessary to split the main memory address into four pieces: the search tag in bits 31 to 12, the set bits in bits 11 to 6, the block offset in bits 5 to 2, and the byte offset in bits 1 to 0.
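Here is a hedged C sketch (mine, not from the slides) of the four-way split for this cache:

    #include <stdint.h>

    /* For addr = 0x0040_3570: tag = 0x00403, set = 21, block offset = 12. */
    static void split(uint32_t addr, uint32_t *tag, uint32_t *set,
                      uint32_t *block_off, uint32_t *byte_off) {
        *byte_off  = addr & 0x3u;          /* bits 1..0 */
        *block_off = (addr >> 2) & 0xFu;   /* bits 5..2, picks 1 of 16 words */
        *set       = (addr >> 6) & 0x3Fu;  /* bits 11..6, picks 1 of 64 sets */
        *tag       = addr >> 12;           /* bits 31..12, 20 bits */
    }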

34 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 34/71 [Schematic: a main memory address supplies a 20-bit search tag, 6 set bits, and a 4-bit block offset (with byte offset 00). A decoder selects one of 64 sets; each set holds a V-bit, a tag, and 16 data/instruction words per block. An equality comparator produces the Hit signal, and the block offset steers a multiplexer that makes one data/instruction word available to the core.]

35 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 35/71 For the cache on the previous slide, let's answer some questions. Q1: Is hit detection really any different from hit detection in a direct-mapped cache with one-word blocks? Q2: What are the roles of the block offset and the 16:1 32-bit bus multiplexer? If we use this design as an I-cache, and try a fetch with the previous example instruction address of 0x0040_3570, the address would be split into tag 0000 0000 0100 0000 0011, set bits 010101, block offset 1100, and byte offset 00. Q3: How is a hit detected for the above address, and in the case of a hit, how does the correct instruction get fed to the processor core?

36 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 36/71 Example instruction fetch in the cache of slide 34: Miss In set 21 either the V-bit is 0 or the stored tag doesn't match the tag coming from the instruction address. The whole 16-word block needs to be replaced! Why? Words with addresses 0x0040_3540, 0x0040_3544, ..., 0x0040_357c get copied from main memory into set 21 in the cache. Why are those the right addresses to use? What other updates happen in set 21? What else happens?
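One way to see why those are the right addresses: clearing the low six bits of the miss address gives the base of the 64-byte block, and the block's words sit at 4-byte steps above the base. A C sketch, with an invented read_main_memory stand-in:

    #include <stdint.h>

    static uint32_t read_main_memory(uint32_t addr) { return addr; } /* invented stub */

    static void fill_block(uint32_t miss_addr, uint32_t block[16]) {
        uint32_t base = miss_addr & ~0x3Fu;  /* 0x0040_3570 -> 0x0040_3540 */
        for (int w = 0; w < 16; w++)         /* 0x0040_3540, 3544, ..., 357c */
            block[w] = read_main_memory(base + 4u * (uint32_t)w);
    }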

37 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 37/71 16 word addresses with common tags and indexes The 16 word addresses of the block are, in hex: 0x0040_3540, 0x0040_3544, 0x0040_3548, 0x0040_354c, 0x0040_3550, 0x0040_3554, 0x0040_3558, 0x0040_355c, 0x0040_3560, 0x0040_3564, 0x0040_3568, 0x0040_356c, 0x0040_3570, 0x0040_3574, 0x0040_3578, 0x0040_357c. Written in base two, all 16 share the tag 0000 0000 0100 0000 0011 and the set bits 010101; they differ only in the block offset, bits 5 to 2.

38 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 38/71 Why use multi-word blocks? As seen a few slides back, reading 64 bytes (16 4-byte words) from consecutive DRAM addresses does not take much more time than reading one 4-byte word. Because most programs have good spatial locality of reference, it pays to read many adjacent words at once when copying instructions or data from main memory to a cache. 64 bytes is a very common block size in current computers. Much smaller block sizes (2 or 4 words) might be used to make a point in classroom examples or lab exercises.

39 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 39/71 I-cache spatial locality example Suppose a MIPS program calls procedure bob for the first time. bob is a leaf procedure with 27 instructions. Suppose every one of those instructions is fetched at least once. The instruction cache is organized like the cache on slide 34. How many misses will there be in the I-cache while the procedure runs?

40 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 40/71 D-cache spatial locality example Suppose that i, n and sum are of type int, in GPRs, and a is of type int*, also in a GPR. for (i = 0; i < n; i++) { sum += a[i]; } The data cache is organized like the cache on slide 34. Assume that none of the array elements read by the loop are in the D-cache when the loop starts. Roughly how many misses will there be in the D-cache while the loop runs?
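One worked estimate, assuming (my assumptions, not stated on the slide) that a is 64-byte aligned and n is large: each miss brings a whole 16-int block into the D-cache, and the next 15 loads then hit in that block, so the loop causes roughly n/16 misses.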

41 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 41/71 Summary of cache examples in this slide set and textbook Section 8.3 The example 1024-word caches we've seen in this slide set were, in order: direct-mapped with 1-word blocks; set-associative with 1-word blocks; direct-mapped with multi-word blocks. Section 8.3 of the textbook presents example 8-word caches in exactly the same order. A capacity of 8 words is far too small to be useful, but does allow textbook authors to draw diagrams showing entire caches.

42 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 42/71 Caches in current processor chips Current processor chips have caches that are set-associative and have multi-word blocks. Q1: Why might textbooks and lecture slides not show a set-associative cache with multi-word blocks? Q2: What are the capacity, associativity, and block size of the L1 (level one) D-caches in an Intel Core i7 chip?

43 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 43/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

44 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 44/71 Replacement policies The term replacement policy is a general name for a method of deciding where to place new instructions or data when they are brought into a cache after a miss. First, an important review item: if a cache has multi-word blocks, replacement must bring an entire block into the cache, so that all of the updated block is consistent with the new tag for the block. Second, a question: in a direct-mapped cache, is replacement a complicated problem with several different reasonable solutions?

45 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 45/71 Replacement in set-associative caches A block of data or instructions will be brought into a cache as a result of a cache miss. The set bits of the address that caused the miss dictate which set must receive the new block. But in an N-way set-associative cache, any one of the N blocks in the selected set could receive the new block. Is it possible for the cache to make a perfect choice about which of N blocks to replace?

46 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 46/71 LRU replacement in 2-way set-associative caches A replacement strategy that works well for 2-way set-associative caches is called LRU, for least-recently-used. An extra SRAM bit, called U, can be added to each set, to indicate which of the two blocks in the set has been less recently used. This is shown on the next slide. Questions, related to the next slide: Q1: How does a hit in set 233, way 0 affect the U bit in set 233? Q2: How does a hit in set 42, way 1 affect the U bit in set 42? Q3: If there is a miss in set 98, where in the cache does the new data or instruction go?
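As a preview of the answers, here is a tiny C sketch of the U-bit bookkeeping; the convention that U names the less recently used way is my assumption about the figure:

    #define NSETS 512

    static unsigned u_bit[NSETS];  /* u_bit[s] = way that is LRU in set s */

    static void note_hit(unsigned set, unsigned way) {
        u_bit[set] = 1u - way;     /* the other way is now least recently used */
    }

    static unsigned victim_way(unsigned set) {
        return u_bit[set];         /* on a miss, replace the LRU way */
    }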

47 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 47/71 This is the cache of slide 23, with one extra 1-bit SRAM cell per set: the U-bit. [Schematic: identical to slide 23 (21-bit search tag, 9 set bits, decoder selecting one of 512 sets, V-bits, tags, and data/instruction words for ways 1 and 0, two hit comparators, and the output mux), except that each set also stores a U-bit.]

48 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 48/71 More about LRU replacement What is the rationale for LRU replacement? LRU replacement is easy to implement in an N-way set-associative cache if N = 2. Is it easy to implement when N > 2? Why or why not? If not, what might be a reasonable alternative?

49 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 49/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

50 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 50/71 Introduction to cache write policies So far, lecture and textbook discussion about caches has focused on two very similar problems: For instruction fetch in an I-cache, how is a hit detected, and what should happen if there is a miss? When a load instruction searches for data in a D-cache, how is a hit detected, and what should happen if there is a miss? A really important topic, not covered in depth in the textbook, is: How does a D-cache support STORE instructions?

51 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 51/71 Quick Review: One level of cache, no address translation [Diagram: the processor core (control, PC, GPRs, ALUs, etc.) sends physical addresses to the I-cache and D-cache; the I-cache returns instructions and the D-cache exchanges data with the core; the caches exchange physical addresses and instructions or data with main memory.]

52 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 52/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

53 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 53/71 Write buffers A write buffer is a critical component of a system for handling writes in a memory system that has caches. In a simple system, like the one shown on the next slide, a write buffer contains a collection of data addresses and associated data items that are waiting to be copied out from a D-cache to main memory. Depending on the D-cache design, data items in the write buffer could be as narrow as a single byte, or as wide as an entire D-cache block.

54 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 54/71 This is a slight enhancement to the diagram on slide 51. Note that data bound from the core to main memory must pass through the write buffer. [Diagram: as on slide 51, but a write buffer now sits between the D-cache and main memory; the paths into and out of the write buffer are bwb bits wide, where bwb stands for block width in bits.]

55 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 55/71 Attention: The core and the caches are fast! Main memory is relatively slow. A write buffer allows the core and the D-cache to keep running while multiple chunks of data accumulate in the write buffer, waiting for the time-consuming trip to main memory. If the write buffer becomes completely full, the core may have to stall until at least some of the data in the write buffer has moved out to main memory.

56 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 56/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

57 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 57/71 Writethrough and writeback policies Writethrough and writeback are names given to kinds of policies for making sure that D-caches properly transmit data to main memory in response to store instructions. There are many variations of writethrough policies. There are many variations of writeback policies. In ENCM 369 in 2018, we'll just briefly present the basics of each kind of policy.

58 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 58/71 [Diagram: the system of slide 54 again: core, I-cache, D-cache, write buffer, and main memory.] The key difference between writethrough and writeback is in the rules for deciding when an address and some data must be put in the write buffer.

59 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 59/71 Writethrough Here's a simple example of a writethrough policy: Every store instruction sends an address and some data to the write buffer! So every write goes through the D-cache, regardless of whether there is a hit or a miss in the D-cache. In the case of a D-cache write miss, replacement of a block is similar to what happens in response to a read miss.

60 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 60/71 Writeback Now, here's a simple example of a writeback policy: If a store instruction hits in the D-cache, the data for the store is written into the appropriate D-cache block but not to the write buffer! A block in the D-cache containing fresh data that hasn't yet been put in the write buffer is called a dirty block. Data in a dirty block will go to the write buffer when that block is evicted from the D-cache.
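To contrast the two policies side by side, here is a toy C model of a one-block D-cache with 1-word blocks. Everything here is invented for illustration; a real cache is hardware, and a real writethrough miss would also replace the block, which this toy omits:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static bool     valid, dirty;             /* status bits of the single block */
    static uint32_t cached_tag, cached_word;

    static bool hit(uint32_t addr) { return valid && cached_tag == addr >> 2; }

    static void store_writethrough(uint32_t addr, uint32_t data) {
        if (hit(addr)) cached_word = data;    /* update the cached copy on a hit */
        printf("write buffer <- 0x%08x\n", (unsigned)addr); /* every store, hit or miss */
    }

    static void store_writeback(uint32_t addr, uint32_t data) {
        if (hit(addr)) {
            cached_word = data;               /* update the cached copy... */
            dirty = true;                     /* ...and only mark it dirty */
        } else {
            if (valid && dirty)               /* dirty victim must reach memory */
                printf("write buffer <- 0x%08x\n", (unsigned)(cached_tag << 2));
            cached_tag = addr >> 2;           /* bring in the new block */
            cached_word = data;
            valid = true; dirty = true;       /* a freshly written block is dirty */
        }
    }

    int main(void) {
        store_writethrough(0x1000u, 1u); /* buffer traffic on every store */
        store_writeback(0x1000u, 2u);    /* miss: block filled, no buffer traffic */
        store_writeback(0x1000u, 3u);    /* hit: still no buffer traffic */
        store_writeback(0x2000u, 4u);    /* evicts the dirty block -> buffer traffic */
        return 0;
    }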

61 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 61/71 Block status bits in writeback caches In earlier cache examples, which were I-caches, a block could have one of two statuses: valid or invalid, as indicated by the V-bit. In a writeback D-cache, an extra status bit is added to each block, called the dirty bit, or D-bit. A block can have any one of three statuses: invalid (V = 0), valid-and-clean (V = 1, D = 0), or valid-and-dirty (V = 1, D = 1). For the organization of slide 58: A valid, clean block is a perfect reflection of main memory contents; a valid, dirty block has fresh data that isn't yet in main memory.

62 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 62/71 The next slide shows the addition of D-bits to the 1024-word direct-mapped cache with 1-word blocks, to enable the cache to function as a writeback cache. The words "to replacement logic" indicate that some details have been left out: a miss in a set with D = 1 and V = 1 will require copying a data address and a data word to the write buffer.

63 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 63/71 [Schematic: the direct-mapped cache of slide 12, extended for writeback. A main memory address supplies a 20-bit search tag and 10 set bits (with byte offset 00); a decoder selects one of 1024 sets, each holding a D-bit, a V-bit, a 20-bit tag, and a data word. An equality comparator produces the Hit signal (1 for hit, 0 for miss), the selected data word is made available to the core, and the selected D-bit and tag feed the replacement logic.]

64 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 64/71 As noted previously, caches in modern processor chips tend both to be set-associative and to have multi-word blocks. The next slide shows a 1024-word 2-way set-associative D-cache with 2-word blocks. Note that each 2-word block has one D-bit, one V-bit, and one stored tag. The U-bit in each set helps with an LRU replacement policy.

65 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 65/71 [Schematic: a main memory address supplies a 21-bit search tag, 8 set bits, and a 1-bit block offset (with byte offset 00). A decoder selects one of 256 sets; each set holds, for way 1 and way 0, a D-bit, a V-bit, a tag, and two data words (word 1 and word 0), plus a U-bit. Two equality comparators produce Hit 1 and Hit 0 (1 for hit, 0 for miss), and muxes controlled by the hit signals and the block offset make one data word available to the core.]

66 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 66/71 Writethrough is simpler but writeback is more efficient Most current data caches use some kind of writeback policy. Imagine that a program is making very frequent updates to elements of a small array. With a writeback cache these updates leave the bus between cache and main memory idle, which saves energy (and would make bus bandwidth available for other cores in a multi-core system); in comparison to a writethrough cache, the risk that the write buffer could become full is also reduced.

67 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 67/71 Outline of Slide Set 9 Introduction to cache design Direct-mapped caches The Set-Bit Conflict Problem in Direct-Mapped Caches Set-Associative Caches Fully-Associative Caches Multi-Word Blocks Replacement Policies Introduction to cache write policies Write buffers Writethrough and writeback policies Multi-level cache systems

68 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 68/71 Multi-level cache systems All the cache designs we have looked at so far are single-level: a miss in the I-cache or D-cache requires access to main memory. Most current cache systems are multi-level. Slide 70 shows the simplest reasonable two-level arrangement. Cache level numbering starts with L1 (level one) closest to the core. Note that the L2 cache is unified: unlike the L1 caches, it holds a mix of instructions and data.

69 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 69/71 Review yet again: One level of cache, no address translation [Diagram: as on slide 54: the core sends physical addresses to the I-cache and D-cache, instructions and data flow back to the core, and the D-cache sends writes through the write buffer to main memory.]

70 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 70/71 Two levels of cache, no address translation [Diagram: the core exchanges physical addresses, instructions, and data with the L1 I-cache and L1 D-cache; the L1 D-cache has a write buffer. Below them, a unified L2 cache, with its own write buffer, exchanges physical addresses and instructions or data with the L1 caches above it and with main memory below it.]

71 ENCM 369 Winter 2018 Section 01 Slide Set 9 slide 71/71 Design goals for multi-level cache systems Key fact: Small SRAM arrays are faster than medium-size SRAM arrays, because cells in smaller arrays drive shorter wires. L1 caches are small, so they can be fast enough to take care of hits from the core in roughly 1 to 3 clock cycles. L2 caches are bigger and slower than L1 caches, but much smaller and much faster than the DRAM used for main memory. If a memory access misses in L1 but hits in L2, the time lost by the core is much, much less than the time lost in a miss that involves access to main memory.
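A rough worked example of why this matters (all numbers are my assumptions for illustration): if an L1 hit takes 2 cycles, an L2 hit takes 12 cycles, main memory takes 200 cycles, and the miss rates are 5% in L1 and 20% in L2, then the average memory access time is about 2 + 0.05 × (12 + 0.20 × 200) = 4.6 cycles, versus 2 + 0.05 × 200 = 12 cycles with no L2 cache.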


More information

Caches. Han Wang CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes)

Caches. Han Wang CS 3410, Spring 2012 Computer Science Cornell University. See P&H 5.1, 5.2 (except writes) Caches Han Wang CS 3410, Spring 2012 Computer Science Cornell University See P&H 5.1, 5.2 (except writes) This week: Announcements PA2 Work-in-progress submission Next six weeks: Two labs and two projects

More information

Slide Set 5. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 5. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 5 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2018 ENCM 369 Winter 2018 Section

More information

Announcement. Computer Architecture (CSC-3501) Lecture 20 (08 April 2008) Chapter 6 Objectives. 6.1 Introduction. 6.

Announcement. Computer Architecture (CSC-3501) Lecture 20 (08 April 2008) Chapter 6 Objectives. 6.1 Introduction. 6. Announcement Computer Architecture (CSC-350) Lecture 0 (08 April 008) Seung-Jong Park (Jay) http://www.csc.lsu.edu/~sjpark Chapter 6 Objectives 6. Introduction Master the concepts of hierarchical memory

More information

Winter 2009 FINAL EXAMINATION Location: Engineering A Block, Room 201 Saturday, April 25 noon to 3:00pm

Winter 2009 FINAL EXAMINATION Location: Engineering A Block, Room 201 Saturday, April 25 noon to 3:00pm University of Calgary Department of Electrical and Computer Engineering ENCM 369: Computer Organization Lecture Instructors: S. A. Norman (L01), N. R. Bartley (L02) Winter 2009 FINAL EXAMINATION Location:

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required

More information

Slide Set 11. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Slide Set 11. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng Slide Set 11 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section

More information

ECE Lab 8. Logic Design for a Direct-Mapped Cache. To understand the function and design of a direct-mapped memory cache.

ECE Lab 8. Logic Design for a Direct-Mapped Cache. To understand the function and design of a direct-mapped memory cache. ECE 201 - Lab 8 Logic Design for a Direct-Mapped Cache PURPOSE To understand the function and design of a direct-mapped memory cache. EQUIPMENT Simulation Software REQUIREMENTS Electronic copy of your

More information

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27

More information

See also cache study guide. Contents Memory and Caches. Supplement to material in section 5.2. Includes notation presented in class.

See also cache study guide. Contents Memory and Caches. Supplement to material in section 5.2. Includes notation presented in class. 13 1 Memory and Caches 13 1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13 1 LSU EE 4720 Lecture Transparency. Formatted 14:51, 28

More information

CISC 360. Cache Memories Exercises Dec 3, 2009

CISC 360. Cache Memories Exercises Dec 3, 2009 Topics ν CISC 36 Cache Memories Exercises Dec 3, 29 Review of cache memory mapping Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. ν Hold frequently

More information

Lecture 17: Memory Hierarchy: Cache Design

Lecture 17: Memory Hierarchy: Cache Design S 09 L17-1 18-447 Lecture 17: Memory Hierarchy: Cache Design James C. Hoe Dept of ECE, CMU March 24, 2009 Announcements: Project 3 is due Midterm 2 is coming Handouts: Practice Midterm 2 solutions The

More information

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies 1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of

More information

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 14 Caches III Lecturer SOE Dan Garcia Google Glass may be one vision of the future of post-pc interfaces augmented reality with video

More information

Review : Pipelining. Memory Hierarchy

Review : Pipelining. Memory Hierarchy CS61C L11 Caches (1) CS61CL : Machine Structures Review : Pipelining The Big Picture Lecture #11 Caches 2009-07-29 Jeremy Huddleston!! Pipeline challenge is hazards "! Forwarding helps w/many data hazards

More information

Digital Logic & Computer Design CS Professor Dan Moldovan Spring Copyright 2007 Elsevier 8-<1>

Digital Logic & Computer Design CS Professor Dan Moldovan Spring Copyright 2007 Elsevier 8-<1> Digital Logic & Computer Design CS 4341 Professor Dan Moldovan Spring 21 Copyright 27 Elsevier 8- Chapter 8 :: Memory Systems Digital Design and Computer Architecture David Money Harris and Sarah L.

More information

Computer Architecture Memory hierarchies and caches

Computer Architecture Memory hierarchies and caches Computer Architecture Memory hierarchies and caches S Coudert and R Pacalet January 23, 2019 Outline Introduction Localities principles Direct-mapped caches Increasing block size Set-associative caches

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Computer Architecture V Fall Practice Exam Questions

Computer Architecture V Fall Practice Exam Questions Computer Architecture V22.0436 Fall 2002 Practice Exam Questions These are practice exam questions for the material covered since the mid-term exam. Please note that the final exam is cumulative. See the

More information

13-1 Memory and Caches

13-1 Memory and Caches 13-1 Memory and Caches 13-1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13-1 EE 4720 Lecture Transparency. Formatted 13:15, 9 December

More information