Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches
- Adrian Price
Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging
6.823, L8--1
Asanovic
Laboratory for Computer Science
M.I.T.

Highly-Associative Caches (6.823, L8--2)
- For high associativity, use content-addressable memory (CAM) for tags
- Used in low-power microprocessors, e.g., StrongARM is 32-way set-associative (higher hit rates at lower energy than 2-4 way set-associative RAM tags)
- Comparator per tag requires more transistors (~double area per tag bit)
[Figure: the address is split into tag (t), set index (i), and block offset (b). Each set (Set 0, Set 1, ..., Set i) holds several <Tag, Data Block> entries, and every tag has its own =? comparator. Only one set is enabled, and only the hit data is accessed. Outputs: Hit? and Data.]
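The tag / set index / offset split in the figure above can be sketched in a few lines of Python. This is only an illustration under assumed parameters: the names `Line`, `split_address`, and `lookup`, and the field widths used in the usage note, are hypothetical and do not come from the slides. The loop over ways stands in for the parallel tag comparison that a CAM performs in hardware.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One cache entry: a valid bit and a tag (data block omitted)."""
    valid: bool = False
    tag: int = 0


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def lookup(sets, addr, offset_bits, index_bits):
    """Return the way that hits in the selected set, or None on a miss."""
    tag, index, _ = split_address(addr, offset_bits, index_bits)
    # Only the selected set is searched; a CAM would compare all its tags at once.
    for way, line in enumerate(sets[index]):
        if line.valid and line.tag == tag:
            return way
    return None
```

For example, with 32-byte blocks (5 offset bits) and 4 sets (2 index bits), `split_address(0x1234, 5, 2)` yields tag 36, set 1, offset 20.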
Replacement Policy (6.823, L8--3)
In an associative cache, which block from a set should be evicted when the set becomes full?
- Random
- Least Recently Used (LRU)
  - LRU cache state must be updated on every access
  - true implementation only feasible for small sets (2-way easy)
  - pseudo-LRU binary tree often used for 4-8 way, e.g., 3 state bits for 4-way pseudo-LRU
    [Figure: binary tree of state bits over way3, way2, way1, way0]
- First In, First Out (FIFO), aka Round-Robin
  - used in highly associative caches
This is a second-order effect. How often does replacement happen?

CPU-Cache Interaction (Simple 5-stage pipeline) (6.823, L8--4)
[Figure: 5-stage pipeline datapath. The PC (with PCen and a 0x4 adder) addresses the primary instruction cache (addr, inst, hit?), which feeds the IR, with a nop as the alternative input. Stage D: decode, register fetch. Stage E: A, B, MD1, ALU. Stage M: Y, MD2, primary data cache (we, addr, wdata, rdata, hit?). Stage R. Both caches connect to memory control; cache refill data comes from lower levels of the memory hierarchy.]
- Stall entire CPU on data cache miss
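The 3-bit tree pseudo-LRU mentioned on the replacement slide can be sketched as follows. The class name `TreePLRU4` and the bit encoding (each bit points toward the side to evict next) are assumptions chosen for illustration; real designs may encode the bits differently.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set, using 3 state bits.

    Assumed encoding: each bit points at the side to evict next.
    """

    def __init__(self):
        self.root = 0   # 0: victim is in ways 0-1, 1: victim is in ways 2-3
        self.left = 0   # 0: victim is way 0, 1: victim is way 1
        self.right = 0  # 0: victim is way 2, 1: victim is way 3

    def touch(self, way):
        """Record an access: make the bits on the path point away from `way`."""
        if way < 2:
            self.root = 1
            self.left = 1 - way
        else:
            self.root = 0
            self.right = 3 - way

    def victim(self):
        """Follow the bits from the root to pick the way to evict."""
        if self.root == 0:
            return self.left
        return 2 + self.right
```

After touching ways 0, 1, 2, 3 in order, `victim()` returns way 0, which matches true LRU. Touching way 0 again then gives way 2, whereas true LRU would pick way 1: the tree only approximates LRU, which is the price of keeping 3 bits instead of a full ordering.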
Write Policy (6.823, L8--5)
Cache hit:
- write through: write both cache and memory
  - generally higher traffic but simplifies cache coherence
- write back: write cache only (memory is written only when the entry is evicted)
  - a dirty bit per block can further reduce the traffic
Cache miss:
- no write allocate: only write to main memory
- write allocate (aka fetch on write): fetch block into cache
Common combinations:
- write through and no write allocate
- write back with write allocate

Managing Cache Writes (6.823, L8--6)
- In a direct-mapped cache, can we write the cache data RAM in the same cycle as the cache tag RAM read?
- In a highly-associative cache with CAM tags, can we write cache data in the same cycle as the tag CAM search?
Pipelining Cache Writes (6.823, L8--7)
Possible solutions:
- Writes take two cycles in memory stage: one cycle for tag check plus one cycle for data write if hit
- Design data RAM that can perform read and write in one cycle, restore old value after tag miss
- Hold write data for store in single buffer ahead of cache, write cache data during next store's tag check
  - Need to bypass from write buffer if read matches write buffer tag
- Use CAM tags: data write only enabled if hit

Cache Performance (6.823, L8--8)
Average memory access time = Hit time + Miss rate x Miss penalty
To improve performance:
- reduce the hit time
- reduce the miss rate (e.g., larger cache)
- reduce the miss penalty (e.g., L2 cache)
First-order effect: the size and the hit time
=> design the largest primary cache without slowing down the clock or adding pipeline stages
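As a worked instance of the average memory access time formula above, the short sketch below evaluates it for a single-level cache and then for a two-level hierarchy, where the L1 miss penalty is itself the access time of the L2. All cycle counts and miss rates are made-up illustrative values, and the function name `amat` is hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed numbers: 1-cycle L1 hit, 5% L1 miss rate, 20-cycle miss penalty.
single_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)

# Adding an L2 (assumed 10-cycle hit, 20% local miss rate, 100-cycle memory):
# the L1 miss penalty becomes the average access time of the L2.
l1_miss_penalty = amat(hit_time=10, miss_rate=0.20, miss_penalty=100)
two_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=l1_miss_penalty)

print(f"single level: {single_level:.2f} cycles")
print(f"two level:    {two_level:.2f} cycles")
```

With these assumed numbers the single-level case works out to about 2 cycles and the two-level case to about 2.5 cycles per access.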
Causes for Cache Misses (6.823, L8--9)
- Compulsory: first-reference (aka cold-start) misses
  - misses that would occur even with infinite cache
- Capacity: cache is too small to hold all data needed by the program
  - misses that would occur even under perfect placement and replacement policy
- Conflict: misses that occur because of collisions due to block-placement strategy
  - misses that would not occur with full associativity
Determining the type of a miss requires running program traces on a cache simulator.

Effect of Cache Parameters on Performance (6.823, L8--10)
Larger cache size
  + reduces capacity and conflict misses
  - hit time may increase
Larger block size
  + spatial locality reduces compulsory misses and capacity reload misses
  - fewer blocks may increase conflict miss rate
  - larger blocks may increase miss penalty
Higher associativity
  + reduces conflict misses (up to around 4-8 way)
  - hit time may increase
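The remark that classifying misses requires a cache simulator can be made concrete with a small sketch. It runs a trace of block addresses through a direct-mapped cache and, alongside it, a fully associative cache of the same capacity. Two simplifying assumptions are made here and are not from the slides: the fully associative cache uses LRU as a stand-in for "perfect placement and replacement", and the trace contains block numbers rather than byte addresses. The function name `classify_misses` is hypothetical.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks):
    """Count hits and compulsory/capacity/conflict misses of a direct-mapped cache."""
    seen = set()                 # blocks referenced at least once
    fully_assoc = OrderedDict()  # same-capacity LRU cache, oldest entry first
    direct_mapped = {}           # set index -> resident block
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Reference model: fully associative LRU cache.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_blocks:
                fully_assoc.popitem(last=False)
            fully_assoc[block] = True

        # Cache under study: direct-mapped.
        index = block % num_blocks
        dm_hit = direct_mapped.get(index) == block
        direct_mapped[index] = block

        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1  # first reference ever
        elif not fa_hit:
            counts["capacity"] += 1    # would miss even with full associativity
        else:
            counts["conflict"] += 1    # full associativity would have hit
        seen.add(block)
    return counts
```

For instance, with 4 blocks the trace `[0, 4, 0, 4]` produces two compulsory misses followed by two conflict misses, because blocks 0 and 4 collide in the same direct-mapped set while a fully associative cache would hold both.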
Reducing Hit Time (6.823, L8--11)
- On-chip versus off-chip caches
- Pipelining write tag check and data update for single-cycle write hits
- Sum-addressed caches (UltraSPARC-III)
  - Can evaluate A+B=C equality faster than adding A and B!
  - In register+offset addressing mode, no need to perform addition before cache access
  - Tag compare unit performs A+B=C operation
- Pseudo-associative caches (way-predicting caches)
  - Guess which way will hit, look only there first
  - Check other ways sequentially on miss
  - Combine direct-mapped hit time, associative miss rate
  - Larger hit time if way prediction poor

Techniques to Further Reduce Cache Miss Rate (6.823, L8--12)
- Hardware prefetching of instructions and data
  - Can remove compulsory misses
  - Stream buffers predict sequential or strided accesses
  - Speculative fetches can add useless memory traffic
- Software prefetching
  - Software prefetch requires extending the ISA with register and cache prefetch instructions
  - Takes extra instruction issue slots
  - Software tuned to one hardware implementation
- Other software techniques
  - placement (array padding) to reduce cache conflicts
  - blocking transformations (reuse data once in cache) to reduce capacity misses
Reducing Miss Penalty (6.823, L8--13)
- Give priority to read misses over writes and write-backs
  - queue writes in write buffer, let reads overtake writes to memory
  - must check write buffer for match on read address
- Multi-level caches
  - reduced latency on a primary cache miss
- Fetch critical word in block first (aka wrap-around refill)
  - restart the processor as soon as needed word arrives
- Sub-block placement, fetch only part of a block on miss
  - small tag overhead from large blocks, but low miss penalty from small sub-blocks
  - need an extra valid bit per sub-block
  [Figure: block layout: V, Tag, then V + sub-block 0, V + sub-block 1, V + sub-block 2, V + sub-block 3]
- Victim caches
  - hold recently evicted blocks nearby to reduce conflict miss penalty for low-associativity caches
- Non-blocking caches to reduce stalls on cache misses

Memory Management (6.823, L8--14)
The Fifties:
- Absolute addresses
- Dynamic address translation
The Sixties:
- Paged memory systems and TLBs
- Atlas demand paging
Modern virtual memory systems (next lecture)
Types of Names for Memory Locations (6.823, L8--15)
[Figure: machine language address -> ISA -> virtual address -> address mapping -> physical address -> physical memory (DRAM)]
- Machine language address: as specified in machine code
- Virtual address: ISA specifies translation of machine code address into virtual address of program variable (may involve segment registers etc.)
- Physical address: operating system specifies mapping of virtual address into name for a physical memory location (i.e., actual address signals going to DRAM chips)

Absolute Addresses (6.823, L8--16)
EDSAC, early 50s
effective address = physical memory address
- Only one program ran at a time, with unrestricted access to the entire machine (RAM + I/O devices)
- Addresses in a program depended upon where the program was to be loaded in memory
- But it was more convenient for programmers to write location-independent subroutines
=> Led to the development of loaders and linkers to statically relocate and link programs
Dynamic Address Translation (6.823, L8--17)
Motivation: in the early machines, I/O operations were slow and each word transferred involved the CPU
- Higher throughput if CPU and I/O of two or more programs were overlapped => multiprogramming
- Location-independent programs: programming and storage management ease => need for a base register
- Protection: independent programs should not affect each other inadvertently => need for a bound register
[Figure: program1 and program2 occupying separate regions of physical memory]

User versus Kernel (6.823, L8--18)
With multiprogramming came a move away from programming the bare machine
- users should be protected from each other (and from program bugs)
- users need to share resources (e.g., CPU time, memory, disk)
Hardware support evolves to support the OS
- device interrupts to support multiprogramming (can't enforce that users' software polls for each other's devices)
- protected state and privileged execution modes to run the OS => exceptions to catch protection violations
Purely software schemes to manage protection
- high-level language programming only (no assembly code)
- trusted compiler
- software bugs cause security loopholes and system crashes
Simple Base and Bound Translation (6.823, L8--19)
[Figure: a Load X in the program address space supplies an effective address. The effective address register is checked against the bound register (a failed check raises a bound violation) and added to the base register to form the physical address within the current segment in main memory.]
Base and bound registers are only visible/accessible when the processor is running in kernel mode (aka supervisor mode)

Separate Areas for Program and Data (6.823, L8--20)
[Figure: two base/bound pairs. Data accesses (Load X) check the effective address register against the data bound register and add the data base register to reach the data segment in main memory. Instruction fetches check the program counter against the program bound register and add the program base register to reach the program segment. Either check can raise a bound violation.]
- This permitted sharing of program segments
- Used today on Cray vector supercomputers
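The base-and-bound check in the figures above amounts to one comparison and one addition. The sketch below is a hypothetical software model: the names `BoundViolation` and `translate_base_bound` are invented for illustration, and it assumes the bound register holds the segment length, so valid effective addresses run from 0 up to (but not including) the bound.

```python
class BoundViolation(Exception):
    """Raised when an effective address falls outside the segment."""


def translate_base_bound(effective_addr, base, bound):
    """Map an effective address to a physical address using base and bound.

    Assumes `bound` is the segment length in addressable units.
    """
    if not 0 <= effective_addr < bound:
        raise BoundViolation(f"address {effective_addr:#x} outside bound {bound:#x}")
    return base + effective_addr


# A segment of 0x1000 units loaded at physical address 0x4000 (made-up values).
physical = translate_base_bound(0x100, base=0x4000, bound=0x1000)
```

Relocating the program only requires changing `base`; the program's own addresses stay the same, which is what makes it location independent.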
Memory Fragmentation (6.823, L8--21)
[Figure: three snapshots of memory, each with the OS space plus the regions listed]
  Initially:                user 1 (16 K), user 2 (24 K), free (24 K), user 3 (32 K), free (24 K)
  Users 4 and 5 arrive:     user 1 (16 K), user 2 (24 K), user 4 (16 K), free (8 K), user 3 (32 K), user 5 (24 K)
  Users 2 and 3 leave:      user 1 (16 K), free (24 K), user 4 (16 K), free (40 K), user 5 (24 K)
As users come and go, the storage is fragmented. Therefore, at some stage programs have to be moved around to compact the storage ("burping" the memory).

Paged Memory Systems
To reduce fragmentation:
- A processor-generated address can be interpreted as a pair <page number, offset>
- A page table contains the physical address of the base of each page
- Fixed-length pages plus indirection through the page table relax the contiguous allocation requirement
[Figure: address space of User-1, split into pages, mapped through the page table of User-1 to scattered locations in physical memory]
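A minimal sketch of the <page number, offset> interpretation described above, assuming 4 KB pages and a page table stored as a list of physical base addresses. The page size, the function name `translate_paged`, and the table contents are illustrative assumptions only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate_paged(vaddr, page_table):
    """Translate a virtual address via a table of physical page base addresses."""
    page_number, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[page_number] + offset


# Hypothetical page table: consecutive virtual pages live in scattered frames.
page_table_user1 = [0x8000, 0x3000, 0x5000]

physical = translate_paged(0x1004, page_table_user1)  # page 1, offset 4
```

Because each page is looked up independently, the three pages need not be contiguous in physical memory, which is how paging sidesteps the compaction problem shown in the fragmentation figure.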
Private Address Space per User (6.823, L8--23)
[Figure: Users 1, 2, and 3 each issue the same virtual address VA1, which is mapped through that user's own page table to a different page in physical memory. Physical memory also holds OS pages and free pages.]
- Each user has a page table
- The page table contains an entry for each user page
- Where should page tables reside?

Where Should Page Tables Reside? (6.823, L8--24)
Space required by the page tables grows with the size of the address space, the number of users, ... (and shrinks as the page size grows)
=> The space requirement is large: too expensive to keep in registers
Special registers just for the current user:
- need new management instructions
- affects the context-switching time
- may not be feasible for large page tables
Main memory:
- needs one reference to retrieve the page base address and another to access the data word
=> doubles the number of memory references!
Page Tables in Physical Memory (6.823, L8--25)
[Figure: physical memory holds the page table of User 1 and the page table of User 2 alongside the users' pages. VA1 of User 1 and VA1 of User 2 are each translated through the corresponding table.]

Translation Lookaside Buffers (6.823, L8--26)
Caching the address translation
[Figure: the virtual address is split into VPN (virtual page number) and offset. The VPN is matched against the tag of each TLB entry (fields: V, R, W, D, tag, PPN). On a hit, the PPN (physical page number) is concatenated with the offset to form the physical address.]
- TLB speeds up the address translation (IBM, late 60s)
- TLB keeps the <VPN, PPN> mappings for the recently accessed pages
- TLB also keeps additional information about each page, e.g., read/write, dirty
- Usually the TLB is per process and flushed on a context switch
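The effect of caching <VPN, PPN> pairs can be illustrated with a small software model. Everything here is a sketch under assumptions: the class name `TLB`, the 4 KB page size, the 4-entry capacity, and the LRU replacement are illustrative choices, and a Python dict stands in for the page table kept in main memory. The counter `page_table_reads` tracks how often the extra memory reference is needed.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class TLB:
    """Tiny fully associative TLB model with LRU replacement."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # VPN -> PPN; stands in for the in-memory table
        self.capacity = capacity
        self.entries = OrderedDict()   # VPN -> PPN, least recently used first
        self.page_table_reads = 0      # extra memory references caused by TLB misses

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # TLB hit: no page-table access needed.
            self.entries.move_to_end(vpn)
            ppn = self.entries[vpn]
        else:
            # TLB miss: read the page table, then cache the mapping.
            self.page_table_reads += 1
            ppn = self.page_table[vpn]
            if len(self.entries) == self.capacity:
                self.entries.popitem(last=False)
            self.entries[vpn] = ppn
        return ppn * PAGE_SIZE + offset

    def flush(self):
        """Drop all mappings, as on a context switch."""
        self.entries.clear()
```

Two accesses to the same page cost only one page-table read; after `flush()`, the next access to that page pays for the page-table read again.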
A Problem in the Early Sixties (6.823, L8--27)
- There were many applications whose data could not fit in the main memory, e.g., payroll
- Paged memory systems reduced fragmentation but still required the whole program to be resident in the main memory
- Programmers moved the data back and forth from the secondary store by overlaying it repeatedly on the primary store: tricky programming!

Manual Overlays (6.823, L8--28)
[Figure: Ferranti Mercury, 1956: central store with 40k bits of main memory and 640k bits of drum]
Assuming an instruction can address all the storage on the drum:
- Method 1: programmer keeps track of addresses in the main memory and initiates an I/O transfer when required
- Method 2: automatic initiation of I/O transfers by software address translation (Brookner's interpretive coding, 1960)
Method 1 proved too difficult for users and method 2 too slow!
Demand Paging: Atlas (6.823, L8--29)
"A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor." Tom Kilburn
[Figure: primary (central) memory of 32 pages, 512 words/page; secondary memory (drum) of 32x6 pages]
- User sees 32 x 6 x 512 words of storage
- Primary memory as a cache for secondary memory

Hardware Organization of Atlas (6.823, L8--30)
[Figure: the effective address goes through initial address decode to one of the stores below. 48-bit words, 512-word pages. There is 1 PAR (Page Address Register) per page frame, PARs 0 to 31, each holding <effective PN, status>.]
  Fixed (ROM):   16 pages,   0.4 ~1 µsec   system code (not swapped)
  Subsidiary:    2 pages,    1.4 µsec      system data (not swapped)
  Main:          32 pages,   1.4 µsec
  Drum (4):      192 pages
  Tape:          8 decks,    88 µsec/word
Compare the effective page address against all 32 PARs:
- match => normal access
- no match => page fault; the state of the partially executed instruction was saved
Atlas Demand Paging Scheme (6.823, L8--31)
On a page fault:
- input transfer into a free page is initiated
- the PAR is updated
- if no free page is left, a page is selected to be replaced (based on usage)
- the replaced page is written on the drum
  - to minimize the drum latency effect, the first empty page on the drum was selected
- the page table is updated to point to the new location of the page on the drum

Caching vs Demand Paging (6.823, L8--32)
[Figure: caching places a cache between the CPU and primary memory; demand paging places primary memory between the CPU and secondary memory]
  Caching                          Demand paging
  cache entry                      page frame
  cache block (~32 bytes)          page (~4K bytes)
  cache miss (1% to 20%)           page miss (<0.001%)
  cache hit (~1 cycle)             page hit (~100 cycles)
  cache miss (~10 cycles)          page miss (~5M cycles)
  a miss is handled in hardware    a miss is handled mostly in software
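The page-fault steps listed for Atlas can be mirrored in a small simulation. This is a loose sketch, not a model of the actual Atlas hardware or supervisor: the class name `DemandPager` is invented, plain LRU is assumed as the "based on usage" replacement rule, and the drum is reduced to a set of page numbers.

```python
from collections import OrderedDict


class DemandPager:
    """Toy demand-paging model: primary memory acts as a cache for the drum."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.resident = OrderedDict()  # page -> frame, least recently used first
        self.on_drum = set()           # pages whose current copy is on the drum
        self.faults = 0

    def access(self, page):
        """Return the frame holding `page`, faulting it in if necessary."""
        if page in self.resident:
            self.resident.move_to_end(page)  # record usage
            return self.resident[page]

        # Page fault: find a frame, evicting a page if none is free.
        self.faults += 1
        if self.free_frames:
            frame = self.free_frames.pop(0)
        else:
            victim, frame = self.resident.popitem(last=False)  # assumed LRU choice
            self.on_drum.add(victim)  # replaced page is written to the drum
        self.on_drum.discard(page)    # input transfer brings the page into the frame
        self.resident[page] = frame   # analogous to updating the PAR for this frame
        return frame
```

With two frames, the access sequence 0, 1, 0, 2 causes three faults, and the third fault evicts page 1 because page 0 was used more recently.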
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
More informationMemory management. Requirements. Relocation: program loading. Terms. Relocation. Protection. Sharing. Logical organization. Physical organization
Requirements Relocation Memory management ability to change process image position Protection ability to avoid unwanted memory accesses Sharing ability to share memory portions among processes Logical
More informationVirtual Memory. Virtual Memory
Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical
More informationECE 4750 Computer Architecture, Fall 2017 T03 Fundamental Memory Concepts
ECE 4750 Computer Architecture, Fall 2017 T03 Fundamental Memory Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-09-26-15-52 1 Memory/Library Analogy 2 1.1. Three
More informationAnnouncements. ! Previous lecture. Caches. Inf3 Computer Architecture
Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for
More informationCS 61C: Great Ideas in Computer Architecture. Lecture 23: Virtual Memory. Bernhard Boser & Randy Katz
CS 61C: Great Ideas in Computer Architecture Lecture 23: Virtual Memory Bernhard Boser & Randy Katz http://inst.eecs.berkeley.edu/~cs61c Agenda Virtual Memory Paged Physical Memory Swap Space Page Faults
More informationCaching Basics. Memory Hierarchies
Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby
More informationAgenda. EE 260: Introduction to Digital Design Memory. Naive Register File. Agenda. Memory Arrays: SRAM. Memory Arrays: Register File
EE 260: Introduction to Digital Design Technology Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa 2 Technology Naive Register File Write Read clk Decoder Read Write 3 4 Arrays:
More informationVirtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili
Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Intro to Virtual Memory
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Intro to Virtual Memory Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/ 1 Agenda Multiprogramming/time-sharing
More informationMemory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"
Memory Management! Goals of this Lecture! Help you learn about:" The memory hierarchy" Spatial and temporal locality of reference" Caching, at multiple levels" Virtual memory" and thereby " How the hardware
More informationCS 61C: Great Ideas in Computer Architecture. Virtual Memory
CS 61C: Great Ideas in Computer Architecture Virtual Memory Instructor: Justin Hsia 7/30/2012 Summer 2012 Lecture #24 1 Review of Last Lecture (1/2) Multiple instruction issue increases max speedup, but
More informationMEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming
MEMORY HIERARCHY BASICS B649 Parallel Architectures and Programming BASICS Why Do We Need Caches? 3 Overview 4 Terminology cache virtual memory memory stall cycles direct mapped valid bit block address
More informationCS450/550 Operating Systems
CS450/550 Operating Systems Lecture 4 memory Palden Lama Department of Computer Science CS450/550 Memory.1 Review: Summary of Chapter 3 Deadlocks and its modeling Deadlock detection Deadlock recovery Deadlock
More informationa process may be swapped in and out of main memory such that it occupies different regions
Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationChapter 8 & Chapter 9 Main Memory & Virtual Memory
Chapter 8 & Chapter 9 Main Memory & Virtual Memory 1. Various ways of organizing memory hardware. 2. Memory-management techniques: 1. Paging 2. Segmentation. Introduction Memory consists of a large array
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationComputer Architecture. Memory Hierarchy. Lynn Choi Korea University
Computer Architecture Memory Hierarchy Lynn Choi Korea University Memory Hierarchy Motivated by Principles of Locality Speed vs. Size vs. Cost tradeoff Locality principle Temporal Locality: reference to
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Organization Prof. Michel A. Kinsy The course has 4 modules Module 1 Instruction Set Architecture (ISA) Simple Pipelining and Hazards Module 2 Superscalar Architectures
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationMain Memory (Part I)
Main Memory (Part I) Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Main Memory 1393/8/5 1 / 47 Motivation and Background Amir
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache performance 4 Cache
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanovic & Vladimir Stojanovic hfp://inst.eecs.berkeley.edu/~cs61c/ Parallel Requests Assigned to computer
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Components of a Computer Processor
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanovic & Vladimir Stojanovic hcp://inst.eecs.berkeley.edu/~cs61c/ So$ware Parallel Requests Assigned
More information