1 Slide Set 5 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017
2 ENCM 501 W17 Lectures: Slide Set 5 slide 2/37 Contents Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
3 ENCM 501 W17 Lectures: Slide Set 5 slide 3/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
4 ENCM 501 W17 Lectures: Slide Set 5 slide 4/37 Review: Example computer with only one level of cache and no virtual memory CORE L1 I-CACHE L1 D-CACHE DRAM CONTROLLER DRAM MODULES We're looking at this simple system because it helps us to think about cache design and performance issues while avoiding the complexity of real systems like the Intel i7 shown in textbook Figure 2.21.
5 ENCM 501 W17 Lectures: Slide Set 5 slide 5/37 Review: Example data cache organization [Sketch: an 8-bit index feeds an 8-to-256 decoder that selects one of sets 0 through 255; each set holds two ways. Per block: 1 valid bit and 1 dirty bit (block status), one 18-bit stored tag, and one 64-byte (512-bit) data block. Per set: 1 LRU bit (set status).] This could fit within our simple example hierarchy, but is also not much different from some current L1 D-cache designs.
6 ENCM 501 W17 Lectures: Slide Set 5 slide 6/37 Valid bits in caches going 1 → 0 I hope it's obvious why V bits for blocks go 0 → 1. But why might V bits go 1 → 0? In other words, why does it sometimes make sense to invalidate one or more cache blocks? Here are two big reasons. (There are likely some other good reasons.) DMA: direct memory access. Instruction writes by O/S kernels and by programs that write their own instructions. Let's make some notes about each of these reasons.
7 ENCM 501 W17 Lectures: Slide Set 5 slide 7/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
8 ENCM 501 W17 Lectures: Slide Set 5 slide 8/37 Storage cells in the example cache Data blocks are implemented with SRAM cells. The design tradeoffs for the cell design relate to speed, chip area, and energy use per read or write. Tags and status bits might be SRAM cells or might be CAM (content-addressable memory) cells. A CMOS CAM cell uses the same 6-transistor structure as an SRAM cell for reads, writes, and holding a stored bit for long periods of time during which there are neither reads nor writes. A CAM cell also has 3 or 4 extra transistors that help in determining whether the bit pattern in a group of CAM cells (e.g., a stored tag) matches some other bit pattern (e.g., a search tag).
9 ENCM 501 W17 Lectures: Slide Set 5 slide 9/37 CAM cells organized to make a J-bit stored tag [Sketch: a row of J CAM cells sharing wordline WL_i and matchline MATCH_i, with each cell connected to its own bitline pair, BL_{J-1} down to BL_0.] For reads or writes, the wordline and bitlines play the same roles they do in a row of an SRAM array. To check for a match, the wordline is held LOW, and the search tag is applied to the bitlines. If every search tag bit matches the corresponding stored tag bit, the matchline stays HIGH; if there is even a single mismatch, the matchline goes LOW.
10 ENCM 501 W17 Lectures: Slide Set 5 slide 10/37 CAM cells versus SRAM cells for stored tags With CAM cells, tag comparisons can be done in place. With SRAM cells, stored tags would have to be read via bitlines to comparator circuits outside the tag array, which is a slower process. (Schematics of caches in the textbook tend to show tag comparison done outside of tag arrays, but that is likely done to show that a comparison is needed, not to indicate physical design.) CAM cells are larger than SRAM cells. But the total area needed for CAM-cell tag arrays will still be much smaller than the total area needed for SRAM data blocks. Would it make sense to use CAM cells for V (valid) bits?
11 ENCM 501 W17 Lectures: Slide Set 5 slide 11/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
12 ENCM 501 W17 Lectures: Slide Set 5 slide 12/37 Cache line is a synonym for cache block Hennessy and Patterson are fairly consistent in their use of the term cache block, and a lot of other literature uses that term too. However, the term cache line, which means the same thing, is also in wide use. So in other literature, you may read things like... In a 4-way set-associative cache, an index finds a set containing 4 cache lines. In a direct-mapped cache, there is one cache line per index. A cache miss, even if it is for access to a single byte, will result in the transfer of an entire cache line.
13 ENCM 501 W17 Lectures: Slide Set 5 slide 13/37 Direct-mapped caches A direct-mapped cache can be thought of as a special case of a set-associative cache, in which there is only one way. For a given cache capacity, a direct-mapped cache is easier to build than an N-way set-associative cache with N ≥ 2: no logic is required to find the correct way for data transfer after a hit is detected; no logic is needed to decide which block in a set to replace in handling a miss. Direct-mapped caches may also be faster and more energy-efficient. However, direct-mapped caches are vulnerable to index conflicts (sometimes called index collisions).
14 ENCM 501 W17 Lectures: Slide Set 5 slide 14/37 Let's think about changing the design of our example 32 KB 2-way set-associative data cache to a direct-mapped organization. If the capacity stays at 32 KB and the block size remains at 64 bytes, how does the change in organization affect the number of sets, the widths of indexes and tags, and the number of memory cells needed?
15 ENCM 501 W17 Lectures: Slide Set 5 slide 15/37 Data cache index conflict example Consider this sketch of a C function:

    int g1, g2, g3, g4;

    void func(int *x, int n)
    {
        int loc[10], k;
        while ( /* condition */ ) {
            /* make accesses to g1, g2, g3, g4 and loc */
        }
    }

What will happen in the following scenario? The addresses of g1 to g4 are 0x0804fff0 to 0x0804fffc; the address of loc[0] is 0xbfbdfff0.
16 ENCM 501 W17 Lectures: Slide Set 5 slide 16/37 Instruction cache index conflict example Suppose a program spends much of its time in a loop within function f...

    void f(double *x, double *y, int n)
    {
        int k;
        for (k = 0; k < n; k++)
            y[k] = g(x[k]) + h(x[k]);
    }

Suppose that g and h are small, simple functions that don't call other functions. What kind of bad luck could cause huge numbers of misses in a direct-mapped instruction cache?
17 ENCM 501 W17 Lectures: Slide Set 5 slide 17/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
18 ENCM 501 W17 Lectures: Slide Set 5 slide 18/37 Motivation for set-associative caches (1) Qualitatively: In our example of data cache index conflicts, the conflicts go away if the cache is changed from direct-mapped to 2-way set-associative. In our example of instruction cache index conflicts, in the worst case, the conflicts go away if the cache is changed from direct-mapped to 4-way set-associative. Quantitatively, see Figure B.8 on page B-24 of the textbook. Conflict misses are a big problem in direct-mapped caches. Moving from direct-mapped to 2-way to 4-way to 8-way reduces the conflict miss rate at each step.
19 ENCM 501 W17 Lectures: Slide Set 5 slide 19/37 Motivation for set-associative caches (2) Detailed studies show that 4-way set-associativity is good enough to eliminate almost all conflict misses. But many practical cache designs are 8- or even 16-way set-associative. There must be reasons for this other than the desire to avoid conflict misses. We'll come back to this question later.
20 ENCM 501 W17 Lectures: Slide Set 5 slide 20/37 Replacement strategies in set-associative caches Let N be the number of ways. With N = 2, LRU replacement is easy to implement: a single bit in each set can track which block should be replaced on a miss in that set. Exact LRU replacement is harder to implement with N > 2: LRU status bits would have to somehow encode a list of least-to-most-recent accesses within a set. However, the choice of replacement strategy among various reasonable options seems to have very little effect on miss rate; see Figure B.4 on page B-10 of the textbook. So we're not going to study cache block replacement strategy in detail in ENCM 501.
21 ENCM 501 W17 Lectures: Slide Set 5 slide 21/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
22 ENCM 501 W17 Lectures: Slide Set 5 slide 22/37 Fully-associative caches A fully-associative cache can be thought of as an N-way set-associative cache in which N is equal to the number of blocks. In this way of thinking, how many sets are there in a fully-associative cache? What is the width of an index? For energy use, how would hit detection in a fully-associative cache compare with hit detection in a direct-mapped or (small N) set-associative cache with the same capacity?
23 ENCM 501 W17 Lectures: Slide Set 5 slide 23/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
24 ENCM 501 W17 Lectures: Slide Set 5 slide 24/37 Different options for handling writes See Q4: What Happens on a Write? on pages B-10 to B-12 of the textbook for details. There is not much I can put into lecture slides to improve on the clarity of that material. Note that the cache access examples presented at the end of Slide Set 4 assume a write-back policy, using write allocate in the case of a write miss.
25 ENCM 501 W17 Lectures: Slide Set 5 slide 25/37 Write buffers: Location and purpose A question that could arise from recent lecture material: In a system with one level of cache, is a write buffer part of the L1 D-cache or part of the DRAM controller? Partial answer: Neither the textbook nor Web search results are perfectly clear about this. But it's probably simpler to think of the write buffer as part of the L1 D-cache. Regardless, it's more important to understand the purpose of a write buffer: to securely hold pending main memory updates while minimizing the need for the processor core to stall.
26 ENCM 501 W17 Lectures: Slide Set 5 slide 26/37 Write buffers are hardware queues In general, a write buffer can be thought of as a collection of write buffer entries, in which each entry is a triple: (address, data size, data) The key idea is that a write buffer holds multiple pending writes, in case there is: a flurry of cache misses, each causing replacement of a dirty block in a write-back cache; a simple flurry of writes to a write-through cache; some other situation that generates writes to main memory faster than main memory can receive them. A processor should only stall on a write if a write buffer is full (has as many pending writes as it can hold), which should be an unusual event.
27 ENCM 501 W17 Lectures: Slide Set 5 slide 27/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
28 ENCM 501 W17 Lectures: Slide Set 5 slide 28/37 Multi-level caches A common configuration: A core has an L1 I-cache and an L1 D-cache, each with 32 KB capacity, and a unified L2 cache with 256 KB capacity. So there is a total of 320 KB of cache capacity for this core. Why is it not better to use the chip area for a 160 KB L1 I-cache and a 160 KB L1 D-cache? Give two reasons, one related to SRAM array design issues, and another related to memory use patterns of software programs. The same ideas apply to systems with 3 levels of cache. Other complicated design concerns come up when multiple cores share access to a single L2 or L3 cache.
29 ENCM 501 W17 Lectures: Slide Set 5 slide 29/37 AMAT for a 2-level cache system A formula from the textbook: AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty) What is AMAT if there are no cache misses at all? What is AMAT if there are some L1 misses but no L2 misses? What are design criteria for the L1 and L2 caches, given that the goal is to minimize AMAT?
30 ENCM 501 W17 Lectures: Slide Set 5 slide 30/37 This definition (not from the textbook!) is incorrect: For a system with two levels of caches, the L2 hit rate of a program is the number of L2 hits divided by the total number of memory accesses. What is a correct definition for L2 hit rate, compatible with the formula for AMAT?
31 ENCM 501 W17 Lectures: Slide Set 5 slide 31/37 L2 cache design tradeoffs An L1 cache must keep up with a processor core. That is a challenge to a circuit design team but keeps the problem simple: If a design is too slow, it fails. For an L2 cache, the tradeoffs are more complex: Increasing capacity improves L2 miss rate but makes L2 hit time, chip area and (probably) energy use worse. Decreasing capacity improves L2 hit time, chip area, and (probably) energy use, but makes L2 miss rate worse. Suppose L1 hit time = 1 cycle, L1 miss rate = 0.020, L2 miss penalty = 100 cycles. Which is better, considering AMAT only, not chip area or energy? (a) L2 hit time = 10 cycles, L2 miss rate = 0.50 (b) L2 hit time = 12 cycles, L2 miss rate = 0.40
32 ENCM 501 W17 Lectures: Slide Set 5 slide 32/37 3 C's of cache misses: compulsory, capacity, conflict It's useful to think about the causes of cache misses. Compulsory misses (sometimes called cold misses) happen on access to instructions or data that have never been in a cache. Examples include: instruction fetches in a program that has just been copied from disk to main memory; data reads of information that has just been copied from a disk controller or network interface to main memory. Compulsory misses would happen even if a cache had the same capacity as the main memory the cache was supposed to mirror.
33 ENCM 501 W17 Lectures: Slide Set 5 slide 33/37 Capacity misses This kind of miss arises because a cache is not big enough to contain all the instructions and/or data a program accesses while it runs. Capacity misses for a program can be counted by simulating a program run with a fully-associative cache of some fixed capacity. Since instruction and data blocks can be placed anywhere within a fully-associative cache, it's reasonable to assume that any miss on access to a previously accessed instruction or data item in a fully-associative cache occurs because the cache is not big enough. Why is this a good but not perfect approximation?
34 ENCM 501 W17 Lectures: Slide Set 5 slide 34/37 Conflict misses Conflict misses (also called collision misses) occur in direct-mapped and N-way set-associative caches because too many accesses to memory generate a common index. In the absence of a 4th kind of miss (coherence misses, which can happen when multiple processors share access to a memory system) we can write: conflict misses = total misses − compulsory misses − capacity misses The main idea behind increasing set-associativity is to reduce conflict misses without the time and energy problems of a fully-associative cache.
35 ENCM 501 W17 Lectures: Slide Set 5 slide 35/37 3 C's: Data from experiments Textbook Figure B.8 has a lot of data; it's unreasonable to try to jam all of that data into a few lecture slides. So here's a subset of the data, for 8 KB capacity. N is the degree of associativity, and miss rates are in misses per thousand accesses. [Table: compulsory, capacity, and conflict miss rates for each value of N.] This is real data from practical applications. It is worthwhile to study the table to see what general patterns emerge.
36 ENCM 501 W17 Lectures: Slide Set 5 slide 36/37 Outline of Slide Set 5 Review of example from Slide Set 4 SRAM cells and CAM cells Direct-mapped caches More about set-associative cache organization Fully-associative caches Options for handling writes to caches Multi-level caches Caches and Virtual Memory
37 ENCM 501 W17 Lectures: Slide Set 5 slide 37/37 Caches and Virtual Memory Both are essential systems to support applications running on modern operating systems. As mentioned in Slide Set 4, it really helps to keep in mind what problems are solved by caches and what very different problems are solved by virtual memory. Caches are an impressive engineering workaround for difficult facts about relative latencies of memory arrays. Virtual memory (VM) is a concept, a great design idea, that solves a wide range of problems for computer systems in which multiple applications are sharing resources. Slide Set 6 will be about virtual memory and interactions between cache systems and virtual memory systems.
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More information3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationEN1640: Design of Computing Systems Topic 06: Memory System
EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring
More informationMemory Hierarchy: Caches, Virtual Memory
Memory Hierarchy: Caches, Virtual Memory Readings: 5.1-5.4, 5.8 Big memories are slow Computer Fast memories are small Processor Memory Devices Control Input Datapath Output Need to get fast, big memories
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/16/17 Fall 2017 - Lecture #15 1 Outline
More informationLecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program
More informationVirtual Memory. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Virtual Memory Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Precise Definition of Virtual Memory Virtual memory is a mechanism for translating logical
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationPage 1. Goals for Today" TLB organization" CS162 Operating Systems and Systems Programming Lecture 11. Page Allocation and Replacement"
Goals for Today" CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement" Finish discussion on TLBs! Page Replacement Policies! FIFO, LRU! Clock Algorithm!! Working Set/Thrashing!
More informationCache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris
More informationCMPT 300 Introduction to Operating Systems
CMPT 300 Introduction to Operating Systems Cache 0 Acknowledgement: some slides are taken from CS61C course material at UC Berkeley Agenda Memory Hierarchy Direct Mapped Caches Cache Performance Set Associative
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983
More informationCS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches
CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single
More informationCaches Part 1. Instructor: Sören Schwertfeger. School of Information Science and Technology SIST
CS 110 Computer Architecture Caches Part 1 Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 08: Caches III Shuai Wang Department of Computer Science and Technology Nanjing University Improve Cache Performance Average memory access time (AMAT): AMAT =
More informationThe Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):
The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/24/16 Fall 2016 - Lecture #16 1 Software
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationLocality. Cache. Direct Mapped Cache. Direct Mapped Cache
Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be
More informationSlides for Lecture 6
Slides for Lecture 6 ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary 28 January,
More information13-1 Memory and Caches
13-1 Memory and Caches 13-1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13-1 EE 4720 Lecture Transparency. Formatted 13:15, 9 December
More informationMemory Hierarchy. Mehran Rezaei
Memory Hierarchy Mehran Rezaei What types of memory do we have? Registers Cache (Static RAM) Main Memory (Dynamic RAM) Disk (Magnetic Disk) Option : Build It Out of Fast SRAM About 5- ns access Decoders
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Components of a Computer Processor
More informationEECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements
EECS15 - Digital Design Lecture 11 SRAM (II), Caches September 29, 211 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http//www-inst.eecs.berkeley.edu/~cs15 Fall
More informationCaches Concepts Review
Caches Concepts Review What is a block address? Why not bring just what is needed by the processor? What is a set associative cache? Write-through? Write-back? Then we ll see: Block allocation policy on
More informationPage 1. Review: Address Segmentation " Review: Address Segmentation " Review: Address Segmentation "
Review Address Segmentation " CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 23, 2011! Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! 1111 0000" 1110 000" Seg #"
More informationSee also cache study guide. Contents Memory and Caches. Supplement to material in section 5.2. Includes notation presented in class.
13 1 Memory and Caches 13 1 See also cache study guide. Contents Supplement to material in section 5.2. Includes notation presented in class. 13 1 EE 4720 Lecture Transparency. Formatted 9:11, 22 April
More informationCSE 153 Design of Operating Systems
CSE 53 Design of Operating Systems Winter 28 Lecture 6: Paging/Virtual Memory () Some slides modified from originals by Dave O hallaron Today Address spaces VM as a tool for caching VM as a tool for memory
More informationReview : Pipelining. Memory Hierarchy
CS61C L11 Caches (1) CS61CL : Machine Structures Review : Pipelining The Big Picture Lecture #11 Caches 2009-07-29 Jeremy Huddleston!! Pipeline challenge is hazards "! Forwarding helps w/many data hazards
More informationCMPSC 311- Introduction to Systems Programming Module: Caching
CMPSC 311- Introduction to Systems Programming Module: Caching Professor Patrick McDaniel Fall 2016 Reminder: Memory Hierarchy L0: Registers CPU registers hold words retrieved from L1 cache Smaller, faster,
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationCS162 Operating Systems and Systems Programming Lecture 14. Caching and Demand Paging
CS162 Operating Systems and Systems Programming Lecture 14 Caching and Demand Paging October 17, 2007 Prof. John Kubiatowicz http://inst.eecs.berkeley.edu/~cs162 Review: Hierarchy of a Modern Computer
More information10/16/17. Outline. Outline. Typical Memory Hierarchy. Adding Cache to Computer. Key Cache Concepts
// CS C: Great Ideas in Computer Architecture (Machine Structures) s Part Instructors: Krste Asanović & Randy H Katz http://insteecsberkeleyedu/~csc/ Organization and Principles Write Back vs Write Through
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.10, 5.15; Also, 5.13 & 5.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.15; Also, 5.13 & 5.17 Writing to caches: policies, performance Cache tradeoffs and
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationCS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement"
CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement" October 3, 2012 Ion Stoica http://inst.eecs.berkeley.edu/~cs162 Lecture 9 Followup: Inverted Page Table" With
More information