Caches

indicates problems that have been selected for discussion in section, time permitting.

Problem 1. The diagram above illustrates a blocked, direct-mapped cache for a computer that uses 32-bit data words and 32-bit byte addresses.

A. What is the maximum number of words of data from main memory that can be stored in the cache at any one time?

Maximum number of data words from main memory = (16 lines)(4 words/line) = 64 words

B. How many bits of the address are used to select which line of the cache is accessed?

With 16 cache lines, 4 bits of the address are required to select which line of the cache is accessed.

C. How many bits wide is the tag field?

Bits in the tag field = (32 address bits) - (4 bits to select line) - (4 bits to select word/byte) = 24 bits

D. Briefly explain the purpose of the one-bit V field associated with each cache line.

The tag and data fields of the cache will always have values in them, so the V bit is used to denote whether these values are valid, i.e., consistent with what is in memory. Typically the V bit for each line in the cache is set to "0" when the machine is reset or the cache is flushed.

E. Assume that memory location 0x2045C was present in the cache. Using the row and column labels from the figure, in what cache location(s) could we find the data from that memory location? What would the value(s) of the tag field(s) have to be for the cache row(s) in which the data appears?

The cache uses ADDR[7:4] to determine where data from a particular address will be stored in the cache. Thus, location 0x2045C will be stored in line 5 of the cache. The tag field should contain the upper 24 bits of the address, i.e., 0x000204. Note that the bottom 4 bits of the address (0xC) determine which word and byte of the cache line is being referenced.

F. Can data from locations 0x12368 and 0x322FF8 be present in the cache at the same time? What about data from locations 0x and 0x1034? Explain.

Location 0x12368 will be stored in line 6 of the cache. Location 0x322FF8 will be stored in line F of the cache. Since the lines differ, both locations can be cached at the same time. However, locations 0x and 0x1034 both would be stored in line 3 of the cache, so they could not both be present in the cache at the same time.

G. When an access causes a cache miss, how many words need to be fetched from memory to fill the appropriate cache location(s) and satisfy the request?

There are 4 words in each line of the cache, and since we have only one valid bit for the whole line, all 4 words must hold valid values. So to fill a cache line on a cache miss, all 4 words have to be fetched from main memory.
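As a cross-check, the line/tag arithmetic in parts E and F can be mechanized. The following Python sketch (not part of the original handout) splits a byte address using this cache's geometry: 16-byte lines, so ADDR[3:0] selects the byte within a line and ADDR[7:4] selects the line.

    # Geometry of the Problem 1 cache: 16 lines of four 4-byte words.
    OFFSET_BITS = 4   # 4 words/line * 4 bytes/word = 16 bytes per line
    INDEX_BITS = 4    # 16 lines

    def split(addr):
        """Return (tag, line, offset) for a 32-bit byte address."""
        offset = addr & ((1 << OFFSET_BITS) - 1)
        line = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, line, offset

    for a in (0x2045C, 0x12368, 0x322FF8, 0x1034):
        tag, line, offset = split(a)
        print(f"0x{a:05X}: line 0x{line:X}, tag 0x{tag:06X}, offset 0x{offset:X}")

Running it confirms the answers above: 0x2045C lands in line 5 with tag 0x000204, 0x12368 and 0x322FF8 land in lines 6 and F, and 0x1034 lands in line 3.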

Problem 2. Cache multiple choice:

A. If a cache access requires one clock cycle and handling cache misses stalls the processor for an additional five cycles, which of the following cache hit rates comes closest to achieving an average memory access time of 2 cycles? (A) 75% (B) 80% (C) 83% (D) 86% (E) 98%

(B). 2-cycle average access time = (1 cycle for the cache) + (1 - hit rate)(5 cycles of stall) => hit rate = 80%

B. LRU is an effective cache replacement strategy primarily because programs (A) exhibit locality of reference (B) usually have small working sets (C) read data much more frequently than write data (D) can generate addresses that collide in the cache

(A). Locality implies that the probability of accessing a location decreases as the time since the last access increases. By choosing to replace locations that haven't been used for the longest time, the least-recently-used replacement strategy should, in theory, be replacing locations that have the lowest probability of being accessed in the future.

C. If increasing the associativity of a cache improves performance, it is primarily because programs (A) exhibit locality of reference (B) usually have small working sets (C) read data much more frequently than write data (D) can generate addresses that collide in the cache

(D). Increasing cache associativity means that there are more cache locations in which a given memory word can reside, so replacements due to cache collisions (multiple addresses mapping to the same cache location) should be reduced.
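The hit-rate calculation in part A generalizes to any hit time and miss penalty. A minimal sketch (mine, not the handout's) solves the average-access-time equation for the required hit rate:

    # AMAT = hit_time + (1 - h) * miss_penalty; solve for the hit rate h
    # that achieves a target AMAT.
    def required_hit_rate(target_amat, hit_time=1, miss_penalty=5):
        return 1 - (target_amat - hit_time) / miss_penalty

    print(required_hit_rate(2))  # 0.8, i.e. the 80% hit rate of part A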

D. If increasing the block size of a cache improves performance, it is primarily because programs (A) exhibit locality of reference (B) usually have small working sets (C) read data much more frequently than write data (D) can generate addresses that collide in the cache

(A). Increased block size means that more words are fetched when filling a cache line after a miss on a particular location. If this leads to increased performance, then the nearby words in the block must have been accessed by the program later on, i.e., the program is exhibiting locality.

E. A fully-associative cache using an LRU replacement policy always has a better hit rate than a direct-mapped cache with the same total data capacity. (A) true (B) false

False. Suppose both caches hold N words and consider a program that repeatedly accesses locations 0 through N (a total of N+1 words). The direct-mapped cache will map locations 0 and N into the same cache line and words 1 through N-1 into separate cache lines. So in the steady state, the program will miss twice (on locations 0 and N) each time through the loop. Now the fully-associative case: when the program first accesses word N, the FA cache will replace word 0 (the least-recently-used location). The next access is to location 0, and the FA cache will replace word 1, etc. So the FA cache is always choosing to replace the word the program is about to access, leading to a 0% hit rate!

F. Consider the following program:

    integer A[1000];
    for i = 1 to 1000
      for j = 1 to 1000
        A[i] = A[i] + 1

When the above program is compiled with all compiler optimizations turned off and run on a processor with a 1K byte direct-mapped write-back data cache with 4-word cache blocks, what is the approximate data cache miss rate? (Assume integers are one word long and a word is 4 bytes.) (A) 0.0125% (B) 0.05% (C) 0.1% (D) 5% (E) 12.5%

(A). Considering only the data accesses, the program performs 1,000,000 reads and 1,000,000 writes. Since the cache has 4-word blocks, each miss brings 4 words of the array into the cache, so accesses to the next 3 array locations won't cause a miss. Since the cache is write-back, writes happen directly into the cache without causing any memory accesses until the word is replaced. So altogether there are 250 misses (caused by reads of A[0], A[4], A[8], ...), for a miss rate of 250/2,000,000 = 0.0125%.
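Part F's miss count can also be checked by brute force. The sketch below (mine, not from the handout) replays the access stream against a model of the cache; it assumes the array starts at a block-aligned address 0:

    # Replay part F's accesses against a 1 KB direct-mapped cache with 4-word
    # (16-byte) blocks and count misses. A write-back cache services write
    # hits without touching memory, so reads and writes are modeled alike.
    NUM_LINES = 1024 // 16          # 64 lines of 16 bytes
    tags = [None] * NUM_LINES       # tag stored in each line (None = invalid)

    misses = accesses = 0
    for i in range(1000):
        for j in range(1000):
            for _ in range(2):      # one read and one write of A[i]
                accesses += 1
                addr = 4 * i                    # byte address of A[i]
                line = (addr // 16) % NUM_LINES
                tag = addr // (16 * NUM_LINES)
                if tags[line] != tag:           # miss: fill the line
                    misses += 1
                    tags[line] = tag

    print(misses, accesses, f"{100 * misses / accesses:.4f}%")
    # prints: 250 2000000 0.0125%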

G. In a non-pipelined single-cycle-per-instruction processor with an instruction cache, the average instruction cache miss rate is 5%. It takes 8 clock cycles to fetch a cache line from the main memory. Disregarding data cache misses, what is the approximate average CPI (cycles per instruction)? (A) 0.45 (B) (C) 1.4 (D) 1.8 (E) 2.22

(C). CPI = (1 cycle/instruction) + (0.05 misses/instruction)(8 cycles/miss) = 1.4

H. Consider an 8-line one-word-block direct-mapped cache initialized to all zeroes where the following sequence of word addresses is accessed: 1, 4, 5, 20, 9, 19, 4, 5, 6, and 9. Which of the following tables reflects the final tag bits of the cache?

First map the addresses to cache line numbers and tags, where line number = address mod 8 and tag = floor(address / 8):

    address:  1   4   5  20   9  19   4   5   6   9
    line #:   1   4   5   4   1   3   4   5   6   1
    tag:      0   0   0   2   1   2   0   0   0   1

So, figure (E) represents the final tag bits of the cache.

I. Consider the following partitioning of a CPU's 32-bit address output, which is fed to a direct-mapped write-back cache (diagram not reproduced: 15-bit tag, 13-bit index, 4-bit byte offset): What is the memory requirement for data, tag and status bits of the cache?

(A) 8 K bits (B) 42 K bits (C) 392 K bits (D) 1,160 K bits (E) 3,200 K bits

(D). The tag is 15 bits, the cache index is 13 bits, and the byte offset is 4 bits (i.e., 16 bytes/block). So there are 2^13 = 8192 cache lines. Each cache line requires:

    15 tag bits
    1 valid bit
    1 dirty bit (since this is a write-back cache)
    128 data bits (16 bytes/cache line)
    ---
    145 bits per cache line

Total storage required = 8192 * 145 bits = 1,160 K bits.

Problem 3. A student has miswired the address lines going to the memory of an unpipelined BETA. The wires in question carry a 30-bit word address to the memory subsystem, and the hapless student has in fact reversed the order of all 30 address bits. Much to his surprise, the machine continues to work perfectly.

A. Explain why the miswiring doesn't affect the operation of the machine.

Since the Beta reverses the order of the 30-bit address in the same manner for each memory access, the Beta will use the same reversed address to access a particular memory location for both stores and loads. Thus, the operation of the machine is not affected.

B. The student now replaces the memory in his miswired BETA with a supposedly higher-performance unit that contains both a fast direct-mapped cache and the same memory as before. The reversed wiring still exists between the BETA and this new unit. To his surprise, the new unit does not significantly improve the performance of his machine. In desperation, the student then fixes the reversal of his address lines and the machine's performance improves tremendously. Explain why this happens.

Caches take advantage of locality of reference by reading in an entire block of related data at one time, thereby reducing main-memory accesses. Reversing the order of the 30 address bits disrupts the locality of the memory addresses: the low-order bits that would normally place related data close to one another become the high-order bits, so related data is spread out through main memory. This reduction in locality reduces cache performance significantly. When the student fixes the address-line reversal, locality of the memory addresses is restored, and the cache can perform as intended.
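To see part B concretely, here is a small sketch (mine, not from the handout) that reverses a 30-bit word address the way the miswiring does and shows what happens to consecutive addresses:

    # Reverse the 30 bits of a word address, as the miswired Beta does.
    def reverse30(addr):
        out = 0
        for _ in range(30):
            out = (out << 1) | (addr & 1)
            addr >>= 1
        return out

    for a in range(4):   # four consecutive word addresses
        print(f"{a:2d} -> 0x{reverse30(a):08X}")
    # 0 -> 0x00000000, 1 -> 0x20000000, 2 -> 0x10000000, 3 -> 0x30000000

Consecutive words, which would normally share a cache block, end up 2^28 or more words apart, so each block fetch brings in nothing the program will use next.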

Problem 4. For this problem, assume that you have a processor with a cache connected to main memory via a bus. A successful cache access by the processor (a hit) takes 1 cycle. After an unsuccessful cache access (a miss), an entire cache block must be fetched from main memory over the bus. The fetch is not initiated until the cycle following the miss. A bus transaction consists of one cycle to send the address to memory, four cycles of idle time for main-memory access, and then one cycle to transfer each word in the block from main memory to the cache. Assume that the processor continues execution only after the last word of the block has arrived. In other words, if the block size is B words (at 32 bits/word), a cache miss will cost an additional 5 + B cycles, or 6 + B cycles in total counting the cycle in which the miss is detected. A table (not reproduced here) gives the average cache miss rates of a 1-Mbyte cache for various block sizes.

A. Write an expression for the average memory access time for a 1-Mbyte cache and a B-word block size (in terms of the miss ratio m and B).

Average access time = (1 - m)(1 cycle) + (m)(6 + B cycles) = 1 + m(5 + B) cycles
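Since the miss-rate table did not survive transcription, here is a sketch (mine) of how parts B-D would be evaluated once the miss rates m(B) are known; the miss_rate values below are illustrative placeholders, not the handout's numbers:

    # Evaluate average access time 1 + m(B) * (overhead + B) for candidate
    # block sizes. The miss rates here are hypothetical placeholders.
    miss_rate = {1: 0.034, 4: 0.012, 16: 0.006, 64: 0.004}

    def amat(B, m, overhead=5):
        return 1 + m * (overhead + B)

    for B, m in miss_rate.items():
        print(B, amat(B, m))
    # Part C adds 3 contention cycles to every transaction: amat(B, m, overhead=8).
    # Part D quarters the transfer time (min. 1 cycle): 1 + m * (5 + max(1, B // 4)).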

B. What block size yields the best average memory access time?

C. If bus contention adds three cycles to the main-memory access time, which block size yields the best average memory access time?

D. If the bus width is quadrupled to 128 bits, reducing the time spent in the transfer portion of a bus transaction to 25% of its previous value, what is the optimal block size? Assume that a minimum of one transfer cycle is needed, and don't include the contention cycles introduced in part (C).

Problem 5. The following four cache designs, C1 through C4, are proposed for the Beta. All use LRU replacement where applicable (e.g., within each set of a set-associative cache).

A. Which cache would you expect to take the most chip area (hence cost)?

Cache C4 would most likely take up the most chip area because it is fully associative, thereby requiring a comparator for each cache line, and because it has the most data word capacity.

B. Which cache is likely to perform worst in a benchmark involving repeated cycling through an array of 6K integers? Explain.

C2 would likely have the worst performance on a benchmark involving repeated cycling through an array of 6K integers, since it is the only cache listed with less than 6K words of data capacity.

C. It is observed that one of the caches performs very poorly in a particular benchmark which repeatedly copies one 1000-word array to another. Moving one of the arrays seems to cure the problem. Which cache is most likely to exhibit this behavior? Explain.

This behavior is most likely exhibited by cache C3, because it is a direct-mapped cache, which has only one location in which to put any particular address. If the lower address bits (used to choose the cache line) of the two arrays overlap, poor performance will be observed as the copy loop's accesses to the two arrays repeatedly evict each other. Moving one array so that the lower bits of its addresses no longer overlap the other's cures the problem.

D. Recall that we say cache A dominates cache B if for every input pattern, A caches every location cached by B. Identify every pair (A, B) of caches from the above list where A dominates B. Explain your reasoning.

So long as we are not using a random replacement strategy, it is always possible to come up with a benchmark that will make a particular type of cache miss on every data access. Thus, we cannot say that one particular type of cache always dominates another type of cache. However, we can compare two caches of the same type: both C4 and C1 are fully associative caches with the same replacement strategy, and C4 dominates C1 since C4 has the greater data word capacity.

Problem 6. The data sheet for a particular byte-addressable 32-bit microprocessor reads as follows: The CPU produces a 32-bit virtual address for both data and instruction fetches. There are two caches: one is used when fetching instructions; the other is used for data accesses. Both caches are virtually addressed. The instruction cache is two-way set-associative with a total of 2^12 bytes of data storage, with 32-byte blocks. The data cache is two-way set-associative with a total of 2^13 bytes of data storage, with 32-byte blocks.

A. How many bits long is the tag field in each line of the instruction cache?

There are 32 = 2^5 bytes per block. The cache has 2^12 total bytes and is 2-way set-associative, so each way holds 2^11 bytes and thus 2^11/2^5 = 2^6 cache lines. So the address is partitioned by the cache as follows:

    [4:0]   = 5 address bits for selecting the byte/word within a block
    [10:5]  = 6 address bits for selecting the cache line
    [31:11] = 21 address bits of tag field

B. How many address bits are used to choose which line is accessed in the data cache?

There are 32 = 2^5 bytes per block. The cache has 2^13 total bytes and is 2-way set-associative, so each way holds 2^12 bytes and thus 2^12/2^5 = 2^7 cache lines. So the address is partitioned by the cache as follows:

    [4:0]   = 5 address bits for selecting the byte/word within a block
    [11:5]  = 7 address bits for selecting the cache line
    [31:12] = 20 address bits of tag field

C. Which of the following instruction addresses would never collide in the instruction cache with an instruction stored at location 0x0ACE6004? (A) 0x0BAD6004 (B) 0x0C81C81C (C) 0x (D) 0x0ACE6838 (E) 0xFACE6004 (F) 0x0CEDE008

Collisions happen when instruction addresses map to the same cache line. Referring to the answer for part A, address bits [10:5] are used to determine the cache line, so location 0x0ACE6004 maps to cache line 0. Only (D) 0x0ACE6838 maps to a different cache line and hence could never collide in the instruction cache with location 0x0ACE6004.

D. What is the number of instructions in the largest instruction loop that could be executed with a 100% instruction cache hit rate during all but the first time through the loop?

The instruction cache holds 2^12 bytes, or 2^10 = 1024 instructions. So a loop of 1024 instructions would just fit into the cache.
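The partitioning arithmetic in parts A and B follows one formula; here is a small sketch of it (my own generalization, not the handout's):

    # Address partition for a set-associative cache: offset, index, tag bits.
    from math import log2

    def partition(total_bytes, ways, block_bytes, addr_bits=32):
        offset_bits = int(log2(block_bytes))
        lines_per_way = total_bytes // ways // block_bytes
        index_bits = int(log2(lines_per_way))
        tag_bits = addr_bits - index_bits - offset_bits
        return offset_bits, index_bits, tag_bits

    print(partition(2**12, 2, 32))  # I-cache: (5, 6, 21)
    print(partition(2**13, 2, 32))  # D-cache: (5, 7, 20)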

Problem 7. The following questions ask you to evaluate alternative cache designs using patterns of memory references taken from running programs. Each of the caches under consideration has a total capacity of 8 (4-byte) words, with one word stored in each cache line. The cache designs under consideration are:

DM: a direct-mapped cache.
S2: a 2-way set-associative cache with a least-recently-used replacement policy.
FA: a fully-associative cache with a least-recently-used replacement policy.

The questions below present a sequence of addresses for memory reads. You should assume the sequences repeat from the start whenever you see "...". Keep in mind that byte addressing is used; addresses of consecutive words in memory differ by 4. Each question asks which cache(s) give the best hit rate for the sequence. Answer by considering the steady-state hit rate, i.e., the percentage of memory references that hit in the cache after the sequence has been repeated many times.

A. Which cache(s) have the best hit rate for the sequence 0, 16, 4, 36, ...?

DM: locations 4 and 36 collide, so each iteration has 2 hits, 2 misses.
S2: 100% hit rate. 0 and 16 map to the same cache line, as do 4 and 36, but since the cache is 2-way associative they don't collide.
FA: 100% hit rate. The cache is only half filled by this loop.

B. Which cache(s) have the best hit rate for the sequence 0, 4, 8, 12, 16, 20, 24, 28, 32, ...?

DM: locations 0 and 32 collide, so each iteration has 7 hits, 2 misses.
S2: locations 0, 16 and 32 all map to the same cache line. The LRU replacement strategy replaces 0 when accessing 32, 16 when accessing 0, 32 when accessing 16, etc., so each iteration has 6 hits, 3 misses.
FA: 0% hit rate in the steady state, since the LRU replacement strategy throws out each location just before it's accessed by the loop!

C. Which cache(s) have the best hit rate for the sequence 0, 4, 8, 12, 16, 20, 24, 28, 32, 28, 24, 20, 16, 12, 8, 4, ...?

All caches perform the same: locations 0 and 32 trade places in the caches, so each iteration has 14 hits and 2 misses.

D. Which cache(s) have the best hit rate for the sequence 0, 4, 8, 12, 32, 36, 40, 44, 16, ...?

DM: 32 collides with 0, 36 with 4, 40 with 8, 44 with 12, so each iteration has only 1 hit and 8 misses.
S2: locations 0, 16 and 32 trade places in the cache, so each iteration has 6 hits and 3 misses.
FA: 0 hits, since LRU throws out each location just before it's accessed by the loop.
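All four parts can be replayed with a small simulator. The sketch below (mine, not the handout's) models an 8-word, one-word-per-line cache with LRU replacement and reports steady-state hit rates; DM, S2, and FA correspond to ways = 1, 2, and 8:

    # Steady-state hit rate for the 8-word, 1-word-per-line caches of Problem 7.
    def steady_hit_rate(seq, num_lines=8, ways=1, reps=50):
        sets = num_lines // ways
        cache = [[] for _ in range(sets)]   # each set: LRU-ordered word addresses
        hits = total = 0
        for rep in range(reps):
            for addr in seq:
                s = (addr // 4) % sets      # word index mod number of sets
                way = cache[s]
                if addr in way:
                    way.remove(addr)        # hit: move to MRU position below
                    if rep >= reps // 2:    # only count the steady state
                        hits += 1
                elif len(way) == ways:
                    way.pop(0)              # miss with full set: evict LRU entry
                way.append(addr)
                if rep >= reps // 2:
                    total += 1
        return hits / total

    for ways in (1, 2, 8):                  # DM, S2, FA
        print(ways, steady_hit_rate([0, 16, 4, 36], ways=ways))
    # prints 0.5, 1.0, 1.0 for part A, matching the analysis above

Feeding in the other sequences reproduces the hit rates claimed in parts B through D.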

E. Assume that a cache access takes 1 cycle and a memory access takes 4 cycles. If a memory access is initiated only after the cache has missed, what is the maximum miss rate we can tolerate before use of the cache actually slows down accesses?

If accesses always go directly to memory, each access takes 4 cycles. Using the cache, the average number of cycles per access is 1 + (miss rate)(4). So if the miss rate is larger than 75%, the average number of cycles per access is more than 4.

Problem 8. Ben Bitdiddle has been exploring various cache designs for use with the Beta processor. He is considering only caches with one word (4 bytes) per line. He is interested in the following cache designs:

C1: 2-way set-associative, LRU replacement, 256 total data words (128 sets of 2 words each).
C2: 2-way set-associative, random replacement, 256 total data words (128 sets of 2 words each).
C3: 2-way set-associative, LRU replacement, 512 total data words (256 sets of 2 words each).
C4: 4-way set-associative, LRU replacement, 512 total data words (128 sets of 4 words each).
C5: Fully associative, LRU replacement, 512 total data words.

In order to help his analysis, Ben is trying to identify cases where one cache dominates another in terms of cache hits. Ben considers that cache A dominates cache B if, given identical strings of memory references, every memory reference that gives a cache hit using B is also a hit using A. Thus if A dominates B, A will give at least as high a hit rate as B for every program. In each of the following pairs of caches, deduce whether the first dominates the second:

A. C1 dominates C2

False. C1 has a 0% hit rate for 0, 256, 512, 0, 256, 512, ..., but C2 might do slightly better because it chooses which line of the set to replace at random.

B. C2 dominates C1

No. C1 has a 100% hit rate for 0, 256, 0, 256, ..., but C2 might have an occasional miss.

C. C3 dominates C1

Yes. C3 differs only in having a higher capacity than C1.

D. C3 dominates C2

No. As we saw in (A), there are access patterns where LRU gets a 0% hit rate and random replacement may do slightly better, independently of the sizes of the caches.

E. C4 dominates C3

No. C4 has a 0% hit rate on 0, 128, 256, 384, 512, 0, ..., since all accesses map to the same cache set and LRU throws out the location just about to be accessed. In C3, 128 and 384 map to a different cache set than 0, 256 and 512, so C3 manages a 40% hit rate in the steady state.

F. C4 dominates C2

No, for the same reason as in (A) and (D).

G. C5 dominates C1

No. Consider the following access pattern: 0, then accesses to 512 uncached locations whose addresses don't map to cache line 0 of C1, then 0 again, and so on. C5 will replace location 0 on the 513th access and hence miss when 0 is accessed next. C1 will still have location 0 in the cache when it's accessed again by the loop.

H. Averaged over a wide range of typical application programs, which of the above caches would you expect to yield the highest hit rate?

In general, larger caches are better, and fully-associative caches are better than set-associative caches, so C5 should have the highest hit rate.

Problem 9. Adverbs Unlimited is considering a computer system based loosely on the Beta. Five designs have been proposed, each of them similar to the Beta except for a cache between the 32-bit processor data bus and the main-memory subsystem. Like the Beta, each machine deals with 32-bit main-memory addresses, for a total address space of 2^32 bytes. The machines' caches differ only in the parameters of associativity, size, and write policy. The block size for each cache is 1 word (4 bytes).

    Model        Associativity    Total data size (bytes)    Write policy
    DEFINATELY   4-way            2^16                       write-back
    CERTAINLY    direct-mapped    2^16                       write-back
    HOPEFULLY    4-way            2^10                       write-through
    PERHAPS      2-way            2^10                       write-back
    DOUBTFULLY   direct-mapped    2^10                       write-back

A. How many bits are required for the tag portion of each cache line for each of the architectures? How many bits of comparator are needed? How many bits of SRAM are needed altogether (including tag fields, valid, and dirty bits)?

DEFINATELY: 2^16 bytes / 4 ways = 2^14 bytes/subcache => 2^12 cache lines/subcache => 32 - 14 = 18 tag bits => 18 * 4 = 72 bits of comparator => total SRAM bits = 4*(8*2^14 data bits + 2^12*(18 tag + 1 valid + 1 dirty))

CERTAINLY: 2^16 bytes / 1 way = 2^16 bytes/subcache => 2^14 cache lines => 32 - 16 = 16 tag bits => 16 bits of comparator => total SRAM bits = 8*2^16 data bits + 2^14*(16 tag + 1 valid + 1 dirty)

HOPEFULLY: 2^10 bytes / 4 ways = 2^8 bytes/subcache => 2^6 cache lines/subcache => 32 - 8 = 24 tag bits => 24 * 4 = 96 bits of comparator => total SRAM bits = 4*(8*2^8 data bits + 2^6*(24 tag + 1 valid))

PERHAPS: 2^10 bytes / 2 ways = 2^9 bytes/subcache => 2^7 cache lines/subcache => 32 - 9 = 23 tag bits => 23 * 2 = 46 bits of comparator => total SRAM bits = 2*(8*2^9 data bits + 2^7*(23 tag + 1 valid + 1 dirty))

DOUBTFULLY: 2^10 bytes / 1 way = 2^10 bytes/subcache => 2^8 cache lines => 32 - 10 = 22 tag bits => 22 bits of comparator => total SRAM bits = 8*2^10 data bits + 2^8*(22 tag + 1 valid + 1 dirty)
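These per-model computations mechanize nicely; a sketch (mine, not the handout's) that reproduces the arithmetic above for all five models:

    # Tag bits, comparator bits, and total SRAM bits for Problem 9's caches.
    # Block size is 1 word (4 bytes); write-through caches need no dirty bit.
    def cache_bits(total_bytes, ways, write_back, addr_bits=32):
        lines = total_bytes // ways // 4        # lines per direct-mapped subcache
        index_bits = lines.bit_length() - 1     # log2(lines)
        tag = addr_bits - index_bits - 2        # 2 offset bits for 4-byte words
        status = 2 if write_back else 1         # valid (+ dirty)
        sram = ways * (8 * (total_bytes // ways) + lines * (tag + status))
        return tag, tag * ways, sram

    for name, ways, size, wb in [("DEFINATELY", 4, 2**16, True),
                                 ("CERTAINLY", 1, 2**16, True),
                                 ("HOPEFULLY", 4, 2**10, False),
                                 ("PERHAPS", 2, 2**10, True),
                                 ("DOUBTFULLY", 1, 2**10, True)]:
        print(name, cache_bits(size, ways, wb))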

B. Address lines from the CPU are designated A31, ..., A1, A0, where A0 is the low-order address bit. Which of these CPU address lines are used as address inputs to the SRAMs of the cache in the PERHAPS model?

PERHAPS is a 2-way set-associative cache with a total of 2^10 bytes, so each direct-mapped subcache contains 2^9 bytes. With a block size of 1 word (4 bytes), address bits [8:2], i.e., lines A8 through A2, would be used as the index into the 32-bit-wide SRAM.

C. Suppose that address lines A2 and A9 were inadvertently interchanged in the cable between the DOUBTFULLY CPU and its cache. Which, if any, of the following statements best describes the effect(s) of this change, assuming that other hardware and software remain unmodified?

A. The machine would no longer work.
B. The machine would continue to work as before.
C. The machine would continue to work, but at a reduced performance level.
D. The machine would continue to work, at an improved performance level.

(B). Address bits A2 through A9 are used as the cache index; interchanging two of them has no effect other than to change where in the SRAM each cache line is stored, i.e., all the same locations are cached, they just happen to be stored in different cache SRAM locations than one might have expected.

Problem 10. You are designing a controller for a tiny cache that is fully associative but has only three words in it. The cache has an LRU replacement policy. A reference record module (RRM) monitors references to the cache and always outputs the binary value 1, 2, or 3 on two output signals to indicate the least recently used cache entry. The RRM has two signal inputs, which can encode the number 0 (meaning no cache reference is occurring) or 1, 2, or 3 (indicating a reference to the corresponding word in the cache).

A. What hit ratio will this cache achieve if faced with a repeating string of references to the following addresses: 100, 200, 104, 204, 200?

Here's what happens (cache contents listed from most to least recently used):

access 100: miss; cache contains 100, ---, ---
access 200: miss; cache contains 200, 100, ---
access 104: miss; cache contains 104, 200, 100
access 204: miss; cache contains 204, 104, 200
access 200: hit; cache contains 200, 204, 104

access 100: miss; cache contains 100, 200, 204
access 200: hit; cache contains 200, 100, 204
access 104: miss; cache contains 104, 200, 100
access 204: miss; cache contains 204, 104, 200
access 200: hit; cache contains 200, 204, 104

So in the steady state, location 200 stays in the cache and all other locations get replaced. The hit rate is thus 2/5, or 40%.

B. The RRM can be implemented as a finite-state machine. How many states does the RRM need to have? Why?

There are 3! = 6 ways to list the three locations in order of use, so the RRM needs 6 states, one for each possible ordering.

C. How many state bits does the RRM need to have?

We can encode six states using 3 state bits.

D. Draw a state-transition diagram for the RRM.
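The diagram itself is not reproduced in this transcription, but the transition function it encodes can be sketched (mine, not the handout's): each state is an ordering of the three entries from most to least recently used, and a reference moves the referenced entry to the front.

    # Behavioral sketch of the RRM: states are orderings of {1,2,3} from most-
    # to least-recently used; the LRU output is the last element of the state.
    from itertools import permutations

    transitions = {}
    for state in permutations((1, 2, 3)):
        for ref in (1, 2, 3):                   # input 0 leaves the state alone
            rest = tuple(x for x in state if x != ref)
            transitions[(state, ref)] = (ref,) + rest

    print(len({s for s, _ in transitions}))     # 6 states, as in part B
    print(transitions[((1, 2, 3), 3)])          # referencing 3 -> (3, 1, 2)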

E. Consider building an RRM for a 15-word fully-associative cache. Write a mathematical expression for the number of bits in the ROM required in a ROM-and-register implementation of this RRM. (You need not calculate the numerical answer.)

There are 15! possible states, so we would need ceiling(log2(15!)) = 41 state bits. So the ROM would have 2^41 locations of 41 bits each, for a total of approximately 90 trillion bits.

F. Is it feasible to build the 15-word RRM above using a ROM and register in today's technology? Explain why or why not.

90 trillion bits is a bit much even for today's technology. In a 0.18µ technology, a single transistor pulldown in a ROM might require (0.2µ)(0.5µ) = 0.1µ², so our ROM would require about 9 square meters of silicon!
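Part E's numbers are quick to verify (a minimal sketch, mine):

    # Verify part E: state bits for 15! orderings and the resulting ROM size.
    from math import ceil, factorial, log2

    state_bits = ceil(log2(factorial(15)))      # 41
    rom_bits = 2**state_bits * state_bits       # ~9.0e13, i.e. ~90 trillion
    print(state_bits, rom_bits, rom_bits * 0.1e-12, "m^2 at 0.1 um^2 per bit")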
