Topics to be covered. EEC 581 Computer Architecture. Virtual Memory. Memory Hierarchy Design (II)


EEC 581 Computer Architecture
Memory Hierarchy Design (II)
Department of Electrical Engineering and Computer Science, Cleveland State University

Topics to be covered
Cache Penalty Reduction Techniques
  Victim cache
  Assist cache
  Non-blocking cache
  Data Prefetch mechanism
Virtual Memory

3Cs Absolute Miss Rate (SPEC92)
[Figure: miss rate vs. Cache Size (KB) for 1-way, 2-way, 4-way, and 8-way caches, split into Compulsory, Capacity, and Conflict misses]
Compulsory misses are a tiny fraction of the overall misses
Capacity misses reduce with increasing sizes
Conflict misses reduce with increasing associativity

2:1 Cache Rule
Miss rate of a direct-mapped (DM) cache of size X ~= miss rate of a 2-way set-associative (SA) cache of size X/2
[Figure: miss rate vs. Cache Size (KB) for 1-way, 2-way, 4-way, and 8-way caches, split into Compulsory, Capacity, and Conflict misses]

3Cs Relative Miss Rate
[Figure: share of all misses (0% to 100%) vs. Cache Size (KB) for 1-way, 2-way, 4-way, and 8-way caches, split into Compulsory, Capacity, and Conflict misses]
Caveat: fixed block size

Victim Caching [Jouppi 90]
[Figure: Victim Cache Organization: Processor, L1, Victim Cache (VC), Memory]
Victim cache (VC)
  A small, fully associative structure
  Effective in direct-mapped caches
Whenever a line is displaced from L1 cache, it is loaded into VC
Processor checks both L1 and VC simultaneously
Swap data between VC and L1 if L1 misses and VC hits
When data has to be evicted from VC, it is written back to memory
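The victim-cache behaviour listed above can be sketched in a few lines of C. This is a minimal illustration, not the design from [Jouppi 90]: the sizes `L1_SETS` and `VC_ENTRIES`, the FIFO replacement in the victim cache, and all type and function names are assumptions made for the example. Lines are identified by their line address, and the sequential L1-then-VC check stands in for the simultaneous lookup done in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_SETS    4   /* hypothetical direct-mapped L1 size, in lines */
#define VC_ENTRIES 2   /* hypothetical victim cache size, in lines */

typedef enum { HIT_L1, HIT_VC, MISS } result_t;

typedef struct {
    bool     l1_valid[L1_SETS];
    uint32_t l1_line[L1_SETS];    /* full line address, used as the tag */
    bool     vc_valid[VC_ENTRIES];
    uint32_t vc_line[VC_ENTRIES];
    unsigned vc_next;             /* FIFO replacement pointer (assumption) */
} cache_t;

/* Access one cache line and report where it was found. */
result_t access_line(cache_t *c, uint32_t line)
{
    unsigned set = line % L1_SETS;

    if (c->l1_valid[set] && c->l1_line[set] == line)
        return HIT_L1;

    /* L1 miss: search the fully associative victim cache. */
    for (unsigned i = 0; i < VC_ENTRIES; i++) {
        if (c->vc_valid[i] && c->vc_line[i] == line) {
            /* VC hit: swap the VC entry with the conflicting L1 line. */
            c->vc_line[i]   = c->l1_line[set];
            c->vc_valid[i]  = c->l1_valid[set];
            c->l1_line[set] = line;
            c->l1_valid[set] = true;
            return HIT_VC;
        }
    }

    /* Miss in both: the displaced L1 line moves into the victim cache.
     * Whatever it overwrites there would be written back to memory. */
    if (c->l1_valid[set]) {
        c->vc_line[c->vc_next]  = c->l1_line[set];
        c->vc_valid[c->vc_next] = true;
        c->vc_next = (c->vc_next + 1u) % VC_ENTRIES;
    }
    c->l1_line[set]  = line;
    c->l1_valid[set] = true;
    return MISS;
}
```

With these sizes, lines 0 and 4 map to the same L1 set, so alternating between them produces victim-cache hits instead of repeated conflict misses.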

[Figure: % of Conflict Misses Removed, for Dcache and Icache]

Assist Cache [Chan et al. 96]
[Figure: Assist Cache Organization: Processor, L1, Assist Cache (AC), Memory]
Assist Cache (on-chip) avoids thrashing in main (off-chip) L1 cache (both run at full speed)
  64 x 32-byte fully associative CAM
Data enters Assist Cache when miss (FIFO replacement policy in Assist Cache)
Data conditionally moved to L1 or back to memory during eviction
  Flush back to memory when brought in by Spatial locality hint instructions
  Reduce pollution

Multi-lateral Cache Architecture
[Figure: A Fully Connected Multi-Lateral Cache Architecture: Processor Core, caches A and B, Memory]
Most cache architectures can be generalized into this form

Cache Architecture Taxonomy
[Figure: the general description (Processor, caches A and B, Memory) specialized into a single-level cache, a two-level cache, a victim cache, an assist cache, and NTS and PCS caches]

Non-blocking (Lockup-Free) Cache [Kroft 81]
Prevents the pipeline from stalling due to cache misses (continues to provide hits to other lines while servicing a miss on one or more lines)
Uses Miss Status Handling Registers (MSHRs)
  Track cache misses, allocating one entry per cache miss (called fill buffer in Intel P6 proliferations)
  A new cache miss checks against the MSHRs
  The pipeline stalls at a cache miss only when the MSHRs are full
  Carefully choose the number of MSHR entries to match the sustainable bus bandwidth

Bus Utilization (MSHR = 2)
[Figure: memory bus utilization over time for misses m1 to m5, showing the lead-off latency, the initiation interval, 4 data chunks per miss, bus idle and data transfer periods, and a stall due to insufficient MSHRs]
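The MSHR bookkeeping described above can be sketched as follows. This is a simplified model under stated assumptions: `NUM_MSHR` is set to 2 to mirror the bus-utilization case, a second miss to a line that is already outstanding is merged into the existing entry (the slide only says that a new miss "checks against" the MSHRs), and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_MSHR 2   /* mirrors the "MSHR = 2" case; an assumption */

typedef enum { MSHR_ALLOCATED, MSHR_MERGED, MSHR_STALL } mshr_result_t;

typedef struct {
    bool     busy[NUM_MSHR];
    uint32_t line[NUM_MSHR];   /* line address of the outstanding miss */
} mshr_file_t;

/* Called on a cache miss. The pipeline stalls only when every MSHR is
 * busy and none of them already tracks this line. */
mshr_result_t mshr_on_miss(mshr_file_t *m, uint32_t line)
{
    int free_slot = -1;

    for (int i = 0; i < NUM_MSHR; i++) {
        if (m->busy[i] && m->line[i] == line)
            return MSHR_MERGED;            /* miss already outstanding */
        if (!m->busy[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return MSHR_STALL;                 /* MSHRs are full */

    m->busy[free_slot] = true;
    m->line[free_slot] = line;
    return MSHR_ALLOCATED;
}

/* Called when memory returns the line: release the matching entry. */
void mshr_on_fill(mshr_file_t *m, uint32_t line)
{
    for (int i = 0; i < NUM_MSHR; i++)
        if (m->busy[i] && m->line[i] == line)
            m->busy[i] = false;
}
```

With two entries, a third distinct miss stalls until one of the earlier fills returns, which corresponds to the stall marked in the MSHR = 2 timeline.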

Bus Utilization (MSHR = 4)
[Figure: memory bus utilization over time, showing a stall, bus idle periods, and data transfer periods]

Sample question
What is the major difference between the CDC 6600's Scoreboarding algorithm and the IBM 360/91's Tomasulo algorithm? (One sentence)
Why did the IBM 360/91 implement Tomasulo's algorithm only in the floating-point unit and not in the integer unit? (One sentence)
What are the two main functions of a ReOrder Buffer (ROB)?

Sample question
What is the major difference between the CDC 6600's Scoreboarding algorithm and the IBM 360/91's Tomasulo algorithm? (One sentence)
  Tomasulo's algorithm does register renaming.
Why did the IBM 360/91 implement Tomasulo's algorithm only in the floating-point unit and not in the integer unit? (One sentence)
  Due to the long latency of the FPU. There are only 4 registers in the FPU.
What are the two main functions of a ReOrder Buffer (ROB)?
  To support i) precise exceptions and ii) branch misprediction recovery

Sample question
What is the main responsibility of the Load Store Queue?
Given 4 architectural registers (R0 to R3) and 16 physical registers (T0 to T15). The current RAT content is indicated in the leftmost table below. Note that the physical registers are allocated in circular numbering order (i.e., T0, T1, to T15, then back to T0). Assume the renaming logic can rename 4 instructions per clock cycle. For the following instruction sequence, fill in the RAT content one cycle later. (The destination register of arithmetic instructions is on the left-hand side.)
[Tables: current RAT, instruction sequence, and RAT (after 1 cycle)]

Sample question
What is the main responsibility of the Load Store Queue?
  To perform memory address disambiguation and maintain memory ordering.
Given 4 architectural registers (R0 to R3) and 16 physical registers (T0 to T15). The current RAT content is indicated in the leftmost table below. Note that the physical registers are allocated in circular numbering order (i.e., T0, T1, to T15, then back to T0). Assume the renaming logic can rename 4 instructions per clock cycle. For the following instruction sequence, fill in the RAT content one cycle later. (The destination register of arithmetic instructions is on the left-hand side.)
[Tables: current RAT, instruction sequence, and RAT (after 1 cycle)]

Sample question
Caches and main memory are sub-divided into multiple banks in order to allow parallel access. What is an alternative way of allowing parallel access?
What is a cache called that allows multiple cache misses to be outstanding to main memory at the same time, so that the pipeline is not stalled?
While cache misses are outstanding to main memory, what is the structure that keeps bookkeeping information about the outstanding cache misses? This structure often augments the cache.

Sample question
Caches and main memory are sub-divided into multiple banks in order to allow parallel access. What is an alternative way of allowing parallel access?
  Multiporting, duplicating
What is a cache called that allows multiple cache misses to be outstanding to main memory at the same time, so that the pipeline is not stalled?
  Non-blocking (or lockup-free)
While cache misses are outstanding to main memory, what is the structure that keeps bookkeeping information about the outstanding cache misses? This structure often augments the cache.
  Miss status handling registers (MSHRs)

Sample question
Consider a processor with separate instruction and data caches (and no L2 cache). We are focusing on improving the data cache performance since our instruction cache achieves a 100% hit rate with various optimizations. The data cache is 4 KB, direct-mapped, and has single-cycle access latency. The processor supports a 64-bit virtual address space, 8 KB pages, and no more than 16 GB of physical memory. The cache block size is 32 bytes. The data cache is virtually indexed and physically tagged. Assume that the data TLB hit rate is 100%. The miss rate of the data cache is measured to be 10%. The miss penalty is 20 cycles.
Compute the average memory access latency (in number of cycles) for data accesses.
To improve the overall memory access latency, we decided to introduce a victim cache. It is fully associative and has eight entries. Its access latency is one cycle. To save power and energy, we access the victim cache only after we detect a miss in the data cache. The victim cache hit rate is measured to be 50% (i.e., the probability of finding data in the victim cache given that the data cache doesn't have it). Further, only after we detect a miss in the victim cache do we start miss handling.
Compute the average memory access latency for data accesses.
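One way to work the last question is to apply AMAT = hit time + miss rate x miss penalty twice. The sketch below is one reading of the problem, not an official solution: it assumes a victim-cache hit costs only the extra one-cycle lookup (no additional swap penalty), and the function names are made up for the example.

```c
/* Average memory access time: hit time + miss rate * miss penalty. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

/* Victim cache probed only after an L1 miss. An L1 miss then costs the
 * victim-cache lookup plus, on a victim-cache miss, the full penalty. */
double amat_with_victim(double l1_hit_time, double l1_miss_rate,
                        double vc_time, double vc_hit_rate,
                        double miss_penalty)
{
    double l1_miss_cost = vc_time + (1.0 - vc_hit_rate) * miss_penalty;
    return amat(l1_hit_time, l1_miss_rate, l1_miss_cost);
}
```

Under these assumptions, `amat(1, 0.10, 20)` evaluates 1 + 0.10 x 20 = 3 cycles, and `amat_with_victim(1, 0.10, 1, 0.50, 20)` evaluates 1 + 0.10 x (1 + 0.50 x 20) = 2.1 cycles.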

Prefetch (Data/Instruction)
Predict what data will be needed in the future
Pollution vs. latency reduction
  If you correctly predict the data that will be required in the future, you reduce latency. If you mispredict, you bring in unwanted data and pollute the cache
Questions that determine the effectiveness
  When to initiate prefetch? (Timeliness)
  Which lines to prefetch?
  How big a line to prefetch? (Note that the cache mechanism already performs prefetching.)
  What to replace?
Software (data) prefetching vs. hardware prefetching

Software-controlled Prefetching
Use instructions
  Existing instruction
    Alpha's load to r31 (hardwired to 0)
  Specialized instructions and hints
    Intel's SSE: prefetchnta, prefetcht0/t1/t2
    MIPS32: PREF
    PowerPC: dcbt (data cache block touch), dcbtst (data cache block touch for store)
Compiler or hand inserted prefetch instructions

Alpha
The Alpha architecture supports data prefetch via load instructions with a destination of register R31 or F31.
  LDBU, LDF, LDG, LDL, LDT, LDWU: Normal cache line prefetches.
  LDS: Prefetch with modify intent; sets the dirty and modified bits.
  LDQ: Prefetch, evict next; no temporal locality.
The Alpha architecture also defines the following instructions.
  FETCH: Prefetch Data
  FETCH_M: Prefetch Data, Modify Intent

PowerPC
  dcbt: Data Cache Block Touch
  dcbtst: Data Cache Block Touch for Store

Intel SSE
The SSE prefetch instruction has the following variants:
  prefetcht0: Temporal data; prefetch data into all cache levels.
  prefetcht1: Temporal with respect to first level cache; prefetch data in all cache levels except 0th cache level.
  prefetcht2: Temporal with respect to second level cache; prefetch data in all cache levels, except 0th and 1st cache levels.
  prefetchnta: Non-temporal with respect to all cache levels; prefetch data into non-temporal cache structure, with minimal cache pollution.

Software-controlled Prefetching

  /* prefetch() stands for a prefetch instruction or compiler intrinsic */
  for (i = 0; i < N; i++) {
      prefetch(&a[i+1]);
      prefetch(&b[i+1]);
      sop = sop + a[i]*b[i];
  }

  /* unroll loop 4 times */
  for (i = 0; i < N-4; i += 4) {
      prefetch(&a[i+4]);
      prefetch(&b[i+4]);
      sop = sop + a[i]*b[i];
      sop = sop + a[i+1]*b[i+1];
      sop = sop + a[i+2]*b[i+2];
      sop = sop + a[i+3]*b[i+3];
  }
  /* last four elements, no prefetch (assumes N is a multiple of 4) */
  sop = sop + a[N-4]*b[N-4];
  sop = sop + a[N-3]*b[N-3];
  sop = sop + a[N-2]*b[N-2];
  sop = sop + a[N-1]*b[N-1];

Prefetch latency <= computational time

Hardware-based Prefetching
Sequential prefetching
  Prefetch on miss
  Tagged prefetch
Both techniques are based on One Block Lookahead (OBL) prefetch: prefetch line (L+1) when line L is accessed, based on some criteria

Sequential Prefetching
Prefetch on miss
  Initiate prefetch (L+1) whenever an access to L results in a miss
  Alpha does this for instructions (prefetched instructions are stored in a separate structure called stream buffer)
Tagged prefetch
  Idea: Whenever there is a first use of a line (demand fetched or previously prefetched line), prefetch the next one
  One additional Tag bit for each cache line
  Tag the prefetched, not-yet-used line (Tag = 1)
  Tag bit = 0: the line is demand fetched, or a prefetched line is referenced for the first time
  Prefetch (L+1) only if Tag bit = 1 on L

Sequential Prefetching
[Figure: prefetch-on-miss when accessing contiguous lines: accesses alternate between a miss (demand fetched line) and a hit (prefetched line)]
[Figure: tagged prefetch when accessing contiguous lines, with each line's tag bit shown: one initial miss, then hits on prefetched lines]
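The difference between the two OBL schemes shows up clearly in a toy simulation of a sequential scan. The sketch below is idealized on purpose: the cache never evicts, a prefetch completes before the next access, and `MAX_LINES` and `count_misses` are names invented for this example.

```c
#include <stdbool.h>

#define MAX_LINES 1024   /* hypothetical bound on simulated line numbers */

/* Count demand misses when lines 0..n-1 are accessed in order.
 * tagged == false: prefetch-on-miss; tagged == true: tagged prefetch. */
int count_misses(int n, bool tagged)
{
    bool present[MAX_LINES + 1] = { false };
    bool tag[MAX_LINES + 1] = { false };
    int misses = 0;

    if (n > MAX_LINES)
        n = MAX_LINES;

    for (int line = 0; line < n; line++) {
        bool trigger = false;

        if (!present[line]) {
            /* Demand miss: fetch the line (tag = 0) and look ahead. */
            misses++;
            present[line] = true;
            tag[line] = false;
            trigger = true;
        } else if (tagged && tag[line]) {
            /* First use of a prefetched line: clear tag, look ahead. */
            tag[line] = false;
            trigger = true;
        }

        if (trigger && !present[line + 1]) {
            present[line + 1] = true;   /* prefetch line L+1 */
            tag[line + 1] = true;       /* prefetched, not yet used */
        }
    }
    return misses;
}
```

For a scan of 8 contiguous lines, prefetch-on-miss misses on every other line, while tagged prefetch misses only on the first line, matching the miss/hit patterns in the two figures.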

Virtual Memory
Virtual memory: separation of logical memory from physical memory.
  Only a part of the program needs to be in memory for execution. Hence, logical address space can be much larger than physical address space.
  Allows address spaces to be shared by several processes (or threads).
  Allows for more efficient process creation.
Virtual memory can be implemented via:
  Demand paging
  Demand segmentation
Main memory is like a cache to the hard disc!

Virtual Address
The concept of a virtual (or logical) address space that is bound to a separate physical address space is central to memory management
  Virtual address: generated by the CPU
  Physical address: seen by the memory
Virtual and physical addresses are the same in compile-time and load-time address-binding schemes; virtual and physical addresses differ in execution-time address-binding schemes

Advantages of Virtual Memory
Translation:
  Program can be given a consistent view of memory, even though physical memory is scrambled
  Only the most important part of the program ("Working Set") must be in physical memory.
  Contiguous structures (like stacks) use only as much physical memory as necessary, yet can grow later.
Protection:
  Different threads (or processes) protected from each other.
  Different pages can be given special behavior (Read Only, Invisible to user programs, etc.).
  Kernel data protected from User programs
  Very important for protection from malicious programs => Far more viruses under Microsoft Windows
Sharing:
  Can map the same physical page to multiple users ("Shared memory")

Use of Virtual Memory
[Figure: address spaces of Process A and Process B, each with stack, Shared Libs, heap, Static data, and code; the Shared Libs regions map to a shared page]

Virtual vs. Physical Address Space
[Figure: virtual memory pages A, B, C, D at virtual addresses starting from 0 in a 4G space; some pages map to 4k frames of Main Memory at physical addresses from 0 to 28k, the rest reside on Disk]

Paging
Divide physical memory into fixed-size blocks (e.g., 4KB) called frames
Divide logical memory into blocks of same size (4KB) called pages
To run a program of size n pages, need to find n free frames and load program
Set up a page table to map page addresses to frame addresses (operating system sets up the page table)

Page Table and Address Translation
[Figure: the virtual address is split into Virtual page number (VPN) and Page offset; the Page Table in Main Memory maps the VPN to a Physical page # (PPN); PPN plus Page offset form the Physical address]
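The translation step in the figure can be written out as a short C function. This is a sketch under simplifying assumptions: a flat, single-level page table, 32-bit addresses, the 4KB page size used as the example above, and hypothetical names (`pte_t`, `translate`).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                   /* 4KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical page table entry: a valid bit and a physical page number. */
typedef struct {
    uint32_t ppn;
    int      valid;
} pte_t;

/* Translate va through a flat page table with n_pages entries.
 * Returns 0 and stores the physical address in *pa on success,
 * or -1 if the page is unmapped (this would raise a page fault). */
int translate(const pte_t *table, size_t n_pages, uint32_t va, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_SHIFT;        /* virtual page number */
    uint32_t offset = va & (PAGE_SIZE - 1u);   /* page offset, unchanged */

    if (vpn >= n_pages || !table[vpn].valid)
        return -1;

    *pa = (table[vpn].ppn << PAGE_SHIFT) | offset;
    return 0;
}
```

For example, if virtual page 1 maps to frame 5, virtual address 0x1ABC translates to physical address 0x5ABC: only the page number changes, the offset 0xABC is carried over.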

Page Table Structure Examples
One-to-one mapping, space?
  Large pages: internal fragmentation (similar to having large line sizes in caches)
  Small pages: page table size issues
Multi-level Paging
Inverted Page Table
Example: 64-bit address space, 4 KB pages (12 bits), 512 MB (29 bits) RAM
  Number of pages = 2^64 / 2^12 = 2^52 (the page table has as many entries)
  Each entry is ~4 bytes, so the size of the page table is 2^54 bytes = 16 petabytes!
  Can't fit the page table in the 512 MB RAM!

Multi-level (Hierarchical) Page Table
Divide the virtual address into multiple levels
Level 1 is stored in the Main memory
[Figure: the virtual address is split into P1, P2, and Page offset; P1 indexes the Level 1 page directory (pointer array), P2 indexes the Level 2 page table (stores PPN); PPN plus Page offset form the physical address]
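The arithmetic behind the 64-bit example above, written out step by step. The last two lines are not on the slide; they are added here only for scale, using the same 4 KB page size and 512 MB of RAM.

```latex
\begin{align*}
\text{virtual pages} &= \frac{2^{64}}{2^{12}} = 2^{52} \\
\text{flat page table size} &= 2^{52} \times 2^{2}\ \text{B} = 2^{54}\ \text{B}
    = 16 \times 2^{50}\ \text{B} = 16\ \text{PiB} \\
\frac{\text{flat page table size}}{\text{RAM size}} &= \frac{2^{54}}{2^{29}} = 2^{25} \\
\text{physical frames} &= \frac{2^{29}}{2^{12}} = 2^{17}
\end{align*}
```

So the flat table is about 33 million times larger than the RAM it is supposed to live in, whereas a structure with one entry per physical frame needs only 2^17 entries.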

Inverted Page Table
One entry for each real page of memory
Shared by all active processes
Entry consists of the virtual address of the page stored in that real memory location, with Process ID information
Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs

Linear Inverted Page Table
Contain entries (size of physical memory) in a linear array
Need to traverse the array sequentially to find a match
Can be time consuming
[Figure: Linear Inverted Page Table: a virtual address with PID = 8, VPN = 0x2AA70, and Offset is compared against the (PID, VPN) pair stored at each PPN index; the match at index 0x120D gives PPN = 0x120D, which is combined with the Offset to form the Physical Address]
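The sequential search through a linear inverted page table can be sketched as below. The entry layout and the names `ipt_entry_t` and `ipt_lookup` are assumptions for illustration; the key property from the slide is that the index of the matching entry is itself the physical page number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: which (PID, VPN) currently occupies this frame. */
typedef struct {
    uint32_t pid;
    uint64_t vpn;
    int      valid;
} ipt_entry_t;

/* Scan the table, one entry per physical frame. The index of the
 * matching entry is the PPN. Returns -1 if no frame holds the page. */
long ipt_lookup(const ipt_entry_t *ipt, size_t n_frames,
                uint32_t pid, uint64_t vpn)
{
    for (size_t ppn = 0; ppn < n_frames; ppn++) {
        if (ipt[ppn].valid && ipt[ppn].pid == pid && ipt[ppn].vpn == vpn)
            return (long)ppn;
    }
    return -1;
}
```

The loop makes the cost visible: a lookup is linear in the number of physical frames, which is what the hashed variant is meant to avoid.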

Hashed Inverted Page Table
Use hash table to limit the search to smaller number of page-table entries
[Figure: PID = 8 and VPN = 0x2AA70 are hashed into the Hash anchor table, which points into the inverted page table; entries holding PID, VPN, and a Next pointer are followed along the chain until the match at PPN 0x120D]

Fast Address Translation
How often does address translation occur?
Where is the page table kept?
Keep translation in the hardware
Use Translation Lookaside Buffer (TLB)
  Instruction-TLB & Data-TLB
  Essentially a cache (tag array = VPN, data array = PPN)
  Small (32 to 256 entries are typical)
  Typically fully associative (implemented as a content addressable memory, CAM) or highly associative to minimize conflicts
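A TLB viewed as a small fully associative cache of VPN-to-PPN translations can be modeled as follows. This is a software sketch only: a real CAM compares all entries in parallel rather than in a loop, and the 32-entry size, FIFO replacement, and all names here are assumptions for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 32   /* low end of the typical 32 to 256 range */

typedef struct {
    bool     valid;
    uint64_t vpn;   /* tag */
    uint64_t ppn;   /* data */
} tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_ENTRIES];
    unsigned    next;            /* FIFO replacement pointer (assumption) */
} tlb_t;

/* Fully associative lookup: any entry may hold any VPN. */
bool tlb_lookup(const tlb_t *tlb, uint64_t vpn, uint64_t *ppn)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (tlb->e[i].valid && tlb->e[i].vpn == vpn) {
            *ppn = tlb->e[i].ppn;
            return true;          /* TLB hit */
        }
    }
    return false;                 /* TLB miss: walk the page table */
}

/* Install a translation after a miss, replacing the oldest entry. */
void tlb_insert(tlb_t *tlb, uint64_t vpn, uint64_t ppn)
{
    tlb->e[tlb->next] = (tlb_entry_t){ true, vpn, ppn };
    tlb->next = (tlb->next + 1u) % TLB_ENTRIES;
}
```

On a miss, the caller would obtain the PPN from the page table and call `tlb_insert` so that the next access to the same page hits.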

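Example: Alpha data TLB
[Figure: the virtual address supplies VPN <35> and page offset <13>, together with an Address Space Number <8>; each TLB entry holds ASN <8>, Prot <4>, V <1>, Tag <35>, and PPN <31>; a mux selects the matching PPN, which is combined with the offset into a 44-bit physical address]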

TLB and Caches
Several Design Alternatives
  VIVT: Virtually-indexed Virtually-tagged Cache
  VIPT: Virtually-indexed Physically-tagged Cache
  PIVT: Physically-indexed Virtually-tagged Cache
    Not obviously useful; the R6000 is the only design that used this.
  PIPT: Physically-indexed Physically-tagged Cache

Virtually-Indexed Virtually-Tagged (VIVT)
[Figure: the Processor Core sends the VA to the VIVT Cache; a hit returns the cache line; a miss goes through the TLB to Main Memory]
Fast cache access
Only require address translation when going to memory (miss)
Issues?

VIVT Cache Issues - Aliasing
Homonym
  Same VA maps to different PAs
  Occurs when there is a context switch
  Solutions
    Include process id (PID) in cache, or
    Flush cache upon context switches
Synonym (also a problem in VIPT)
  Different VAs map to the same PA
  Occurs when data is shared by multiple processes
  Duplicated cache line in VIPT cache and VIVT$ w/ PID
  Data is inconsistent due to duplicated locations
  Solution
    Can Write-through solve the problem?
    Flush cache upon context switch
    If (index+offset) < page offset, can the problem be solved? (discussed later in VIPT)

Physically-Indexed Physically-Tagged (PIPT)
[Figure: the Processor Core sends the VA to the TLB; the resulting PA accesses the PIPT Cache; a hit returns the cache line; a miss goes to Main Memory]
Slower, always translate address before accessing memory
Simpler for data coherence

Virtually-Indexed Physically-Tagged (VIPT)
[Figure: the Processor Core sends the VA to the TLB and to the VIPT Cache in parallel; the PA from the TLB is used for the tag check; a hit returns the cache line; a miss goes to Main Memory]
Gains the benefits of both VIVT and PIPT
Parallel access to TLB and VIPT cache
No Homonym
How about Synonym?

Deal w/ Synonym in VIPT Cache
[Figure: Process A (VPN A) and Process B (VPN B) point to the same location within a page, with VPN A != VPN B; each forms its own Index into the Tag array and Data array]
How to eliminate duplication?
  Make cache Index A == Index B?

Synonym in VIPT Cache
[Figure: the virtual address split two ways: VPN | Page Offset, and Cache Tag | Set Index | Line Offset; "a" marks the set-index bits that fall inside the VPN]
If two VPNs do not differ in "a", then there is no synonym problem, since they will be indexed to the same set of a VIPT cache
Implies # of sets cannot be too big
Max number of sets = page size / cache line size
  Ex: 4KB page, 32B line, max sets = 128
A complicated solution in MIPS R10000

R10000's Solution to Synonym
32KB 2-Way Virtually-Indexed L1
[Figure: address fields labeled VPN, 12 bit, 10 bit, and 4-bit; a = VPN[1:0] is stored as part of the L2 cache tag]
Direct-Mapped Physical L2
  L2 is Inclusive of L1
  VPN[1:0] is appended to the tag of L2
Given two virtual addresses VA1 and VA2 that differ in VPN[1:0] and both map to the same physical address PA
  Suppose VA1 is accessed first, so blocks are allocated in L1 & L2
  What happens when VA2 is referenced?
  1. VA2 indexes to a different block in L1 and misses
  2. VA2 translates to PA and goes to the same block as VA1 in L2
  3. Tag comparison fails (since VPN[1:0] of VA1 != VPN[1:0] of VA2)
  4. Treated just like an L2 conflict miss: VA1's entry in L1 is ejected (or written back if dirty) due to the inclusion policy
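The sizing rule above can be checked numerically. The sketch below assumes all sizes are powers of two; the helper names (`log2u`, `max_alias_free_sets`, `alias_bits`) are made up for the example. `alias_bits` returns how many set-index bits come from the VPN, i.e. the width of "a".

```c
#include <stdint.h>

/* Integer log2 for powers of two. */
static unsigned log2u(uint64_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Largest number of sets whose index fits entirely in the page offset. */
uint64_t max_alias_free_sets(uint64_t page_size, uint64_t line_size)
{
    return page_size / line_size;
}

/* Number of index bits taken from the VPN ("a" on the slide).
 * Zero means synonyms always index the same set. */
unsigned alias_bits(uint64_t cache_size, unsigned ways, uint64_t page_size)
{
    uint64_t way_size = cache_size / ways;   /* bytes covered by index + offset */

    if (way_size <= page_size)
        return 0;
    return log2u(way_size) - log2u(page_size);
}
```

With a 4KB page and 32B lines, `max_alias_free_sets` gives 128, as in the example. A 32KB 2-way L1 with 4KB pages gives `alias_bits` = 2, consistent with a = VPN[1:0]. A 4KB direct-mapped cache with 8KB pages gives 0, so no synonym handling is needed in that configuration.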

Deal w/ Synonym in MIPS R10000
[Figure: VA1 and VA2 share the same page offset but carry index bits a1 and a2, with a2 != a1; the VA2 access misses in the L1 VIPT cache, is translated by the TLB, and reaches the L2 PIPT Cache through the physical index, where the stored entry holds a1 next to the Phy. Tag and data]

Deal w/ Synonym in MIPS R10000
[Figure: after the miss is handled, data returns from the L2 PIPT Cache, the L2 entry holds a2 next to the Phy. Tag and data, and only one copy is present in L1]


CS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

Virtual Memory. Motivation:

Virtual Memory. Motivation: Virtual Memory Motivation:! Each process would like to see its own, full, address space! Clearly impossible to provide full physical memory for all processes! Processes may define a large address space

More information

Processes and Virtual Memory Concepts

Processes and Virtual Memory Concepts Processes and Virtual Memory Concepts Brad Karp UCL Computer Science CS 37 8 th February 28 (lecture notes derived from material from Phil Gibbons, Dave O Hallaron, and Randy Bryant) Today Processes Virtual

More information

Improving Cache Performance. Reducing Misses. How To Reduce Misses? 3Cs Absolute Miss Rate. 1. Reduce the miss rate, Classifying Misses: 3 Cs

Improving Cache Performance. Reducing Misses. How To Reduce Misses? 3Cs Absolute Miss Rate. 1. Reduce the miss rate, Classifying Misses: 3 Cs Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Misses Classifying Misses: 3 Cs! Compulsory The first access to a block is

More information

Virtual to physical address translation

Virtual to physical address translation Virtual to physical address translation Virtual memory with paging Page table per process Page table entry includes present bit frame number modify bit flags for protection and sharing. Page tables can

More information

Memory Hierarchy Requirements. Three Advantages of Virtual Memory

Memory Hierarchy Requirements. Three Advantages of Virtual Memory CS61C L12 Virtual (1) CS61CL : Machine Structures Lecture #12 Virtual 2009-08-03 Jeremy Huddleston Review!! Cache design choices: "! Size of cache: speed v. capacity "! size (i.e., cache aspect ratio)

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 18: Virtual Memory Lecture Outline Review of Main Memory Virtual Memory Simple Interleaving Cycle

More information

Another View of the Memory Hierarchy. Lecture #25 Virtual Memory I Memory Hierarchy Requirements. Memory Hierarchy Requirements

Another View of the Memory Hierarchy. Lecture #25 Virtual Memory I Memory Hierarchy Requirements. Memory Hierarchy Requirements CS61C L25 Virtual I (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #25 Virtual I 27-8-7 Scott Beamer, Instructor Another View of the Hierarchy Thus far{ Next: Virtual { Regs Instr.

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

Portland State University ECE 587/687. Caches and Memory-Level Parallelism Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each

More information

Address Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Address Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Address Translation Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics How to reduce the size of page tables? How to reduce the time for

More information

ECE4680 Computer Organization and Architecture. Virtual Memory

ECE4680 Computer Organization and Architecture. Virtual Memory ECE468 Computer Organization and Architecture Virtual Memory If I can see it and I can touch it, it s real. If I can t see it but I can touch it, it s invisible. If I can see it but I can t touch it, it

More information

Caching Basics. Memory Hierarchies

Caching Basics. Memory Hierarchies Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby

More information

Advanced cache optimizations. ECE 154B Dmitri Strukov

Advanced cache optimizations. ECE 154B Dmitri Strukov Advanced cache optimizations ECE 154B Dmitri Strukov Advanced Cache Optimization 1) Way prediction 2) Victim cache 3) Critical word first and early restart 4) Merging write buffer 5) Nonblocking cache

More information

Memory: Page Table Structure. CSSE 332 Operating Systems Rose-Hulman Institute of Technology

Memory: Page Table Structure. CSSE 332 Operating Systems Rose-Hulman Institute of Technology Memory: Page Table Structure CSSE 332 Operating Systems Rose-Hulman Institute of Technology General address transla+on CPU virtual address data cache MMU Physical address Global memory Memory management

More information

Memory Management. Dr. Yingwu Zhu

Memory Management. Dr. Yingwu Zhu Memory Management Dr. Yingwu Zhu Big picture Main memory is a resource A process/thread is being executing, the instructions & data must be in memory Assumption: Main memory is infinite Allocation of memory

More information

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Carnegie Mellon Virtual Memory: Concepts 5-23: Introduction to Computer Systems 7 th Lecture, October 24, 27 Instructor: Randy Bryant 2 Hmmm, How Does This Work?! Process Process 2 Process n Solution:

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

ECE 411 Exam 1 Practice Problems

ECE 411 Exam 1 Practice Problems ECE 411 Exam 1 Practice Problems Topics Single-Cycle vs Multi-Cycle ISA Tradeoffs Performance Memory Hierarchy Caches (including interactions with VM) 1.) Suppose a single cycle design uses a clock period

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

CSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management

CSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management CSE 120 July 18, 2006 Day 5 Memory Instructor: Neil Rhodes Translation Lookaside Buffer (TLB) Implemented in Hardware Cache to map virtual page numbers to page frame Associative memory: HW looks up in

More information

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 48 Computer Organization 5. The Basics of Cache Chansu Yu Caches: The Basic Idea A smaller set of storage locations storing a subset of information from a larger set (memory) Unlike registers or memory,

More information

10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache

10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache Classifying Misses: 3C Model (Hill) Divide cache misses into three categories Compulsory (cold): never seen this address before Would miss even in infinite cache Capacity: miss caused because cache is

More information

VIRTUAL MEMORY II. Jo, Heeseung

VIRTUAL MEMORY II. Jo, Heeseung VIRTUAL MEMORY II Jo, Heeseung TODAY'S TOPICS How to reduce the size of page tables? How to reduce the time for address translation? 2 PAGE TABLES Space overhead of page tables The size of the page table

More information

Handout 4 Memory Hierarchy

Handout 4 Memory Hierarchy Handout 4 Memory Hierarchy Outline Memory hierarchy Locality Cache design Virtual address spaces Page table layout TLB design options (MMU Sub-system) Conclusion 2012/11/7 2 Since 1980, CPU has outpaced

More information

Virtual Memory: From Address Translation to Demand Paging

Virtual Memory: From Address Translation to Demand Paging Constructive Computer Architecture Virtual Memory: From Address Translation to Demand Paging Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 9, 2015

More information

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar

More information

Cache performance Outline

Cache performance Outline Cache performance 1 Outline Metrics Performance characterization Cache optimization techniques 2 Page 1 Cache Performance metrics (1) Miss rate: Neglects cycle time implications Average memory access time

More information

CPS 104 Computer Organization and Programming Lecture 20: Virtual Memory

CPS 104 Computer Organization and Programming Lecture 20: Virtual Memory CPS 104 Computer Organization and Programming Lecture 20: Virtual Nov. 10, 1999 Dietolf (Dee) Ramm http://www.cs.duke.edu/~dr/cps104.html CPS 104 Lecture 20.1 Outline of Today s Lecture O Virtual. 6 Paged

More information

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested

More information

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming MEMORY HIERARCHY BASICS B649 Parallel Architectures and Programming BASICS Why Do We Need Caches? 3 Overview 4 Terminology cache virtual memory memory stall cycles direct mapped valid bit block address

More information

Lecture-18 (Cache Optimizations) CS422-Spring

Lecture-18 (Cache Optimizations) CS422-Spring Lecture-18 (Cache Optimizations) CS422-Spring 2018 Biswa@CSE-IITK Compiler Optimizations Loop interchange Merging Loop fusion Blocking Refer H&P: You need it for PA3 and PA4 too. CS422: Spring 2018 Biswabandan

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2017

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2017 CIS 5512 - Operating Systems Memory Management Cache Professor Qiang Zeng Fall 2017 Previous class What is logical address? Who use it? Describes a location in the logical memory address space Compiler

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

COSC 6385 Computer Architecture - Memory Hierarchy Design (III)

COSC 6385 Computer Architecture - Memory Hierarchy Design (III) COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses

More information

Lecture 7 - Memory Hierarchy-II

Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

CS 153 Design of Operating Systems Winter 2016

CS 153 Design of Operating Systems Winter 2016 CS 153 Design of Operating Systems Winter 2016 Lecture 16: Memory Management and Paging Announcement Homework 2 is out To be posted on ilearn today Due in a week (the end of Feb 19 th ). 2 Recap: Fixed

More information

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018 CIS 3207 - Operating Systems Memory Management Cache and Demand Paging Professor Qiang Zeng Spring 2018 Process switch Upon process switch what is updated in order to assist address translation? Contiguous

More information

Basic Memory Management

Basic Memory Management Basic Memory Management CS 256/456 Dept. of Computer Science, University of Rochester 10/15/14 CSC 2/456 1 Basic Memory Management Program must be brought into memory and placed within a process for it

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

MEMORY: SWAPPING. Shivaram Venkataraman CS 537, Spring 2019

MEMORY: SWAPPING. Shivaram Venkataraman CS 537, Spring 2019 MEMORY: SWAPPING Shivaram Venkataraman CS 537, Spring 2019 ADMINISTRIVIA - Project 2b is out. Due Feb 27 th, 11:59 - Project 1b grades are out Lessons from p2a? 1. Start early! 2. Sketch out a design?

More information

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14 98:23 Intro to Computer Organization Lecture 4 Virtual Memory 98:23 Introduction to Computer Organization Lecture 4 Instructor: Nicole Hynes nicole.hynes@rutgers.edu Credits: Several slides courtesy of

More information

Basic Memory Management. Basic Memory Management. Address Binding. Running a user program. Operating Systems 10/14/2018 CSC 256/456 1

Basic Memory Management. Basic Memory Management. Address Binding. Running a user program. Operating Systems 10/14/2018 CSC 256/456 1 Basic Memory Management Program must be brought into memory and placed within a process for it to be run Basic Memory Management CS 256/456 Dept. of Computer Science, University of Rochester Mono-programming

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

Introduction to OpenMP. Lecture 10: Caches

Introduction to OpenMP. Lecture 10: Caches Introduction to OpenMP Lecture 10: Caches Overview Why caches are needed How caches work Cache design and performance. The memory speed gap Moore s Law: processors speed doubles every 18 months. True for

More information

CS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours

CS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours CS433 Final Exam Prof Josep Torrellas December 12, 2006 Time: 2 hours Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 6 Questions. Please budget your time. 3. Calculators

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

Computer Systems. Virtual Memory. Han, Hwansoo

Computer Systems. Virtual Memory. Han, Hwansoo Computer Systems Virtual Memory Han, Hwansoo A System Using Physical Addressing CPU Physical address (PA) 4 Main memory : : 2: 3: 4: 5: 6: 7: 8:... M-: Data word Used in simple systems like embedded microcontrollers

More information

Memories. CPE480/CS480/EE480, Spring Hank Dietz.

Memories. CPE480/CS480/EE480, Spring Hank Dietz. Memories CPE480/CS480/EE480, Spring 2018 Hank Dietz http://aggregate.org/ee480 What we want, what we have What we want: Unlimited memory space Fast, constant, access time (UMA: Uniform Memory Access) What

More information