ECE 463/521: Spring 2005
Project 1: Data-Cache System Design
Due: Wednesday, February 23, 2005, 11:00 PM


Project rules

1. All students are encouraged to work in teams of two, using pair programming. Pair programming means that two people sit at the same workstation and write the code collaboratively. To find a partner, post a message on the message board.
2. ECE 521 students will have additional parts to do. If a 463 student pairs with a 521 student, the pair will have to meet the 521 requirements.
3. You may not work with the same partner on more than one project this semester.
4. You must register your partnership by posting on the Pair-Programming Partners message board, under the topic Project 1.
5. Sharing of code between teams will be considered cheating, and will be penalized in accordance with the Academic Integrity policy.
6. It is acceptable for you to compare your results with other groups to help debug your program. It is not acceptable to collaborate on the final experiments.
7. You must do all your work in C, C++, or Java. C++ and Java are encouraged because they enable straightforward code reuse and division of labor.
8. Homework will be submitted over the Wolfware Submit system and run in the Eos/Unity environment.

Project description

This project will study instruction caches, and the performance impact of varying the cache line size and using different compilation parameters for code to be run through the cache. ECE 521 students will also simulate the translation-lookaside buffer (TLB) for the system. You will write a trace-driven simulator, which inputs a trace (a sequence of references) from a dynamic instruction stream to simulate hardware operations.

(Figure: a trace file is the Input to the I-cache simulator, which produces the Output statistics.)

Input: Trace file

The simulator reads a trace file that records instructions in the following format (a short parsing sketch in C++ appears at the end of this section):

    <start address>:<instruction length 1><instruction length 2>...<instruction length m>
    <start address>:<instruction length 1><instruction length 2>...<instruction length m>

All input is in hex. The start address is the address of the first instruction in a sequence, in hexadecimal. Each instruction length is a hex digit that tells how many bytes an instruction occupies. A single line of the trace file represents instructions that are executed sequentially, without any jumps. Every time there is a jump (an unconditional jump or a taken branch), a new line is used in the trace file. This format allows us to represent a lot of instructions in a relatively small trace file. Example:

    00abcdef:8476A58
    00abcfee:4853C4B84
    00abcdf0:5
    00b03540:8C3D4
    ...

Simulator: Your task

Specification of simulator

Cache simulation capabilities

a. The simulator models a memory hierarchy with an L1 instruction cache and an optional victim cache (the system can be configured with or without a victim cache). Tip: If you are using C++ or Java, note that the victim cache code is quite different from the L1 cache code, so it is best to implement a different class for each.

b. L1 cache description:
   o SIZE: total bytes of data storage
   o ASSOC: the associativity of the cache (ASSOC = 1 is a direct-mapped cache)
   o BLOCKSIZE: the number of bytes in a block
   o LRU replacement policy

There are a few constraints on the above parameters: (i) BLOCKSIZE is a power of two, and (ii) the number of sets is a power of two. Note that ASSOC (and, therefore, SIZE) need not be a power of two. The number of sets is determined by the following equation:

    # sets = SIZE / (BLOCKSIZE x ASSOC)

You may assume that a miss to the L1 cache that has to be satisfied from memory takes as long to process as MEM_DELAY hits (where MEM_DELAY is a constant derived as explained in the section on AAT calculation, below).
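To make the trace format and the set calculation concrete, here is a minimal, self-contained C++ sketch. It is illustrative only, not required code; the names TraceLine and parseTraceLine are invented for this example.

    // Parse one trace line of the form <start address>:<len><len>..., where
    // each <len> is a hex digit giving an instruction's size in bytes.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    struct TraceLine {
        uint32_t start;              // address of the first instruction
        std::vector<int> lengths;    // instruction sizes in bytes
    };

    bool parseTraceLine(const std::string& line, TraceLine& out) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) return false;
        out.start = static_cast<uint32_t>(
            std::stoul(line.substr(0, colon), nullptr, 16));
        out.lengths.clear();
        for (char c : line.substr(colon + 1)) {
            int v = (c >= '0' && c <= '9') ? c - '0'
                  : (c >= 'a' && c <= 'f') ? c - 'a' + 10
                  : (c >= 'A' && c <= 'F') ? c - 'A' + 10 : -1;
            if (v > 0) out.lengths.push_back(v);   // skip anything non-hex
        }
        return true;
    }

    int main() {
        TraceLine t;
        if (parseTraceLine("00abcdef:8476A58", t)) {
            // Each instruction occupies lengths[i] consecutive bytes starting
            // at the running address; every byte touched maps to some block.
            uint32_t addr = t.start;
            for (int len : t.lengths) {
                std::cout << std::hex << addr << std::dec
                          << " (" << len << " bytes)\n";
                addr += len;
            }
        }
        // Number of sets, from the equation above (16 KB, 4-way, 64-B blocks).
        int SIZE = 16384, ASSOC = 4, BLOCKSIZE = 64;
        std::cout << "# sets = " << SIZE / (BLOCKSIZE * ASSOC) << "\n";  // 64
    }

Because each line is a jump-free run, the simulator simply advances the address by each instruction's length until the line is exhausted.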

c. Victim cache description (a sketch of the miss-handling bookkeeping appears after this list):
   o The victim cache is fully associative and uses the LRU replacement policy.
   o The number of victim cache entries, V, is adjustable.
   o L1 cache miss / victim cache hit: If there is a miss in the L1 cache and a hit in the victim cache (say for block X), block X is placed in the L1 cache. Then, the block that X replaced in the L1 cache (say block Y, which we call the victim block) is placed in the victim cache. Block Y goes into the victim cache entry where block X was found, replacing block X in the victim cache. That is, the two caches swap blocks X and Y. This also means that a given block will never reside in both caches at the same time. A special case is when there is no victim block from the L1 cache (which occurs when there are invalid entries in the L1 cache). In this case, block X is simply invalidated in the victim cache, instead of being replaced by a victim block.
   o L1 cache miss / victim cache miss: If there is a miss in the L1 cache and a miss in the victim cache (say for block X), block X is placed in the L1 cache. Then, the block that X replaced in the L1 cache (say block Y, the victim block) is placed in the victim cache. We cannot perform a swap, as we did above, because block X was not found in the victim cache. Instead, block Y replaces the LRU block in the victim cache. In the special case that there is no victim block from the L1 cache, the victim cache does nothing.
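The following self-contained sketch shows one way to organize the two miss paths just described. It models only the victim-cache side of the transaction; the class shape, names, and the smoke test are illustrative assumptions, not the required design.

    // Minimal sketch of the two L1-miss paths described above. The victim
    // cache is fully associative with true LRU; "lastUse" stamps give recency.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Entry { uint32_t tag = 0; bool valid = false; uint64_t lastUse = 0; };

    class VictimCache {
        std::vector<Entry> e_;
        uint64_t clock_ = 0;
    public:
        explicit VictimCache(int v) : e_(v) {}
        int find(uint32_t tag) {                  // entry index, or -1 if absent
            for (int i = 0; i < (int)e_.size(); ++i)
                if (e_[i].valid && e_[i].tag == tag) return i;
            return -1;
        }
        void put(int idx, uint32_t tag) {         // overwrite a specific entry
            e_[idx] = {tag, true, ++clock_};
        }
        void putLRU(uint32_t tag) {               // evict least recently used
            int lru = 0;
            for (int i = 1; i < (int)e_.size(); ++i) {
                if (!e_[i].valid) { lru = i; break; }
                if (e_[i].lastUse < e_[lru].lastUse) lru = i;
            }
            put(lru, tag);
        }
        void invalidate(int idx) { e_[idx].valid = false; }
    };

    // On an L1 miss for block x: l1Victim is the block Y evicted from L1 to
    // make room for x (hasVictim == false when an invalid L1 way was filled).
    void onL1Miss(VictimCache& vc, uint32_t x, bool hasVictim, uint32_t l1Victim) {
        int idx = vc.find(x);
        if (idx >= 0) {                            // L1 miss / VC hit: swap
            if (hasVictim) vc.put(idx, l1Victim);  // Y takes X's old VC entry
            else           vc.invalidate(idx);     // no victim: just drop X
        } else {                                   // L1 miss / VC miss
            if (hasVictim) vc.putLRU(l1Victim);    // Y replaces the VC's LRU block
            // no victim block: the victim cache does nothing
        }
    }

    int main() {                                   // tiny smoke test
        VictimCache vc(4);
        onL1Miss(vc, /*x=*/100, /*hasVictim=*/true, /*l1Victim=*/7);
        onL1Miss(vc, /*x=*/7,   /*hasVictim=*/true, /*l1Victim=*/100);
        std::cout << (vc.find(100) >= 0) << "\n";  // 1: block 100 swapped into VC
    }

Note that the swap guarantees a block never resides in both caches at once: whichever cache gives up a block receives the other's.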

e. Your simulator should be capable of prefetching. If prefetching is turned on for a particular run, when a reference to a block, say block i, causes a miss, block i is fetched, but immediately after block i is in the cache, block i+1 is also fetched. This means that another line needs to be replaced from the cache. Note that this line will always come from the next set after the set into which line i was fetched. The prefetched line then becomes the MRU line of its set.

Since it takes as long to process a miss as to process MEM_DELAY hits, when a block is prefetched, you should schedule the prefetch to begin as soon as the cache miss that triggered it has been processed. That is, if the current time is t when a miss is encountered, the processor will be stalled until time t+MEM_DELAY waiting for the miss to be serviced. Then a prefetch will begin at time t+MEM_DELAY and finish at time t+2*MEM_DELAY. Once a prefetch has begun, the block that is being prefetched is neither valid nor invalid, but in transition. That is, if it is referenced, it does not cause another miss, but the processor can't continue right away either. Rather, the processor is stalled until the prefetch has finished.

Here are some thoughts on how to implement this. Instead of a valid/invalid bit for each cache line, we now need a variable that can take on the values VALID, INVALID, and IN TRANSITION. When a prefetch begins, the line into which the block is being prefetched changes state to IN TRANSITION.

What happens if a block is referenced while it is IN TRANSITION? The processor must stall till the prefetch finishes. So, the simulator can have a variable called, e.g., time_prefetch_done. Whenever a prefetch occurs, this variable is set to the time when it will be finished (e.g., if MEM_DELAY = 20 in the example above, it will finish at t+20). Now, if a block that is IN TRANSITION is referenced, a stall must occur until that time. So in the simulator, we can simply set current_time = time_prefetch_done. This takes care of handling the stall. (Note that we can get by with a single time_prefetch_done variable, since only one block can be in the process of being prefetched at a time.)

Of course, before we handle a prefetch-induced stall, we need to be careful that the block is still IN TRANSITION. If the current time is after the time that the prefetch finishes, we don't need to stall (indeed, we had better not stall!); we just need to set the block's status to VALID. An easy way to implement references to blocks IN TRANSITION is to set current_time = max(current_time+1, time_prefetch_done) and set the block's status to VALID; the max covers both the stall case and the no-stall case.

There's another special case we need to consider: what happens when a miss occurs while a prefetch is in progress? We can't start to process the miss until the prefetch is finished. (Actually, high-performance processors use split-transaction buses, which allow transfers to memory to be overlapped, but in our first project, we will opt for the simpler approach of assuming that only one transfer is active at a time.) This means that if the current time is t, the miss will not finish being processed at time t+20, but rather at time_prefetch_done+20. Thus, prefetching makes it possible to have stalls that are almost twice as long as the stalls without prefetching. We hope that this doesn't happen very often, and that the cost is outweighed by having prefetched blocks made available without any processor stalls. (A sketch of these timing rules follows.)
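Here is a minimal sketch of these timing rules, using MEM_DELAY = 20 as in the example. The state names and time_prefetch_done mirror the text; the function name and the two-line demo are illustrative.

    // Sketch of the prefetch timing rules above. One outstanding prefetch at
    // a time, so a single time_prefetch_done variable suffices.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>

    enum class State { INVALID, VALID, IN_TRANSITION };

    const uint64_t MEM_DELAY = 20;
    uint64_t current_time = 0;
    uint64_t time_prefetch_done = 0;

    // Reference to a block whose line state is 's'.
    void reference(State& s) {
        if (s == State::IN_TRANSITION) {
            // Stall until the prefetch completes; if it already has, just
            // advance one cycle for the hit. max() covers both cases.
            current_time = std::max(current_time + 1, time_prefetch_done);
            s = State::VALID;
        } else if (s == State::VALID) {
            current_time += 1;                     // ordinary hit
        } else {
            // Miss: cannot start until any in-flight prefetch has finished.
            uint64_t start = std::max(current_time, time_prefetch_done);
            current_time = start + MEM_DELAY;      // demand fetch of block i
            s = State::VALID;
            // Prefetch of block i+1 begins now and finishes MEM_DELAY later;
            // the i+1 line (not modeled here) would become IN_TRANSITION.
            time_prefetch_done = current_time + MEM_DELAY;
        }
    }

    int main() {
        State line = State::INVALID;
        reference(line);                           // miss at t=0: done at 20,
        std::cout << current_time << "\n";         // prefetch finishes at 40
        State pf = State::IN_TRANSITION;
        reference(pf);                             // stalls until t=40
        std::cout << current_time << "\n";
    }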

f. Assume that no instruction is ever written to during the course of our simulations. Thus, you will not have to implement a dirty bit.

i. Five parameters completely specify the system: SIZE, ASSOC, BLOCKSIZE, PREFETCH, and V. The variable V is the size of the victim cache. If V = 0, the victim cache is disabled.

j. The size of an address is 32 bits.

TLB simulation capabilities

ECE 521 students will also simulate the TLB of the system. The page size of the system should be a parameter that, for a particular run, will be set either to 4K or 8K bytes. You may assume that physical memory consists of 256 MB (how large does this make the page-frame number?).

k. TLB description:
   o ENTRIES: number of TLB entries
   o T_ASSOC: the associativity of the TLB (T_ASSOC = 1 means a direct-mapped TLB)
   o LRU replacement policy

Each TLB entry consists of a valid bit, a page number, and a page-frame number. A real architecture would also maintain a write-protect bit for each page, but that is not necessary in our simulation.

l. Address translation. Assume that the addresses in the trace file are virtual addresses. They must be translated to physical addresses before the instructions are placed in the I-cache. Translation proceeds as follows. The address is separated into a page number and a displacement. The page number is looked up in the TLB. If it is present, it is translated to a page-frame number, which is concatenated with the displacement to form the physical address. Then the physical address is looked up in the cache. If the page number is not present in the TLB, an entry must be made for it. In a real architecture, this would require looking it up in the page table. However, to simplify matters, we will simply assign the page-frame number by applying the following function to the page number: discard the 8 most significant bits of the address, as well as the bits that constitute the displacement, and then take the ones-complement of the remaining bits. This becomes the page-frame number. An entry is then made for this page in the TLB. The entry consists of a valid bit (set to true), the page number, and the page-frame number. This entry is placed in the proper place in the TLB, and may replace a pre-existing TLB entry. The cost of a TLB miss is 2 eight-byte memory reads. (A sketch of this translation function follows.)
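As a concreteness check on the page-frame function: with 4 KB pages the displacement is the low 12 bits, discarding the 8 most significant bits leaves the middle 12 bits, and their ones-complement becomes the page-frame number. A sketch (4 KB pages assumed for brevity; the TLB lookup itself is omitted; names are illustrative):

    // Sketch of the simplified translation described above. PAGE_SIZE is a
    // run parameter (4096 or 8192); here it is a constant for brevity.
    #include <cstdint>
    #include <iostream>

    const uint32_t PAGE_SIZE = 4096;                 // or 8192
    const uint32_t OFFSET_BITS = (PAGE_SIZE == 4096) ? 12 : 13;

    // Page-frame assignment on a TLB miss: discard the 8 most significant
    // bits of the address and the displacement bits, then take the
    // ones-complement of what remains.
    uint32_t assignFrame(uint32_t vaddr) {
        uint32_t middleBits = 32 - 8 - OFFSET_BITS;  // 12 bits for 4K pages
        uint32_t mask = (1u << middleBits) - 1;
        uint32_t field = (vaddr >> OFFSET_BITS) & mask;
        return (~field) & mask;                      // ones-complement, same width
    }

    uint32_t translate(uint32_t vaddr) {
        uint32_t page = vaddr >> OFFSET_BITS;        // would be looked up in TLB
        (void)page;                                  // TLB bookkeeping omitted
        uint32_t frame = assignFrame(vaddr);         // ...used on a TLB miss
        uint32_t disp = vaddr & (PAGE_SIZE - 1);
        return (frame << OFFSET_BITS) | disp;        // concatenate frame:disp
    }

    int main() {
        // 0x00abcdef: middle field 0xabc, complement 0x543, disp 0xdef
        std::cout << std::hex << translate(0x00abcdef) << "\n";  // 543def
    }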

Project Specification

Model 1: L1 cache (both 463 and 521 students should complete this part)

Requirements
1. The parameters, such as cache size, block size, associativity (n >= 1), and prefetch, are adjustable in your simulator.
2. LRU is the only replacement policy.

Output of your simulator

In overview, this is the kind of information you will be collecting for each of the cache systems. For more complete information, refer to the specific lists in the descriptions of each type of cache system, below.

1. Number of lines fetched into the cache.
2. Average number of bytes from each line that are referenced by the processor while in the cache (see below).
3. Number of cache reads.
4. Number of read misses.
5. Cache miss rate = cache read misses / cache reads.

For item 2 above, keep a bit-vector for each line brought into the cache. The bit-vector contains one bit for each byte in the cache line. Each time a line is brought into the cache, the bit-vector is initialized to all 0s. Each time a byte is referenced by the processor, the corresponding bit in the bit-vector is turned on. (Note that a single instruction is usually several bytes in length, so in simulating the execution of an instruction, it may be necessary to turn on several bits of the bit-vector.) When a line is replaced in the cache, record the number of 1-bits in the bit-vector; call this number the number of active bytes. At the end of the simulation, go through each valid line in the cache and record the number of active bytes (it might be easier to achieve this just by invalidating all the lines in the cache when the simulation ends). Then calculate

    FBU = (sum of active bytes) / (# cache misses x BLOCKSIZE)

(A sketch of this bookkeeping appears at the end of this section.)

Data analysis: Use the data collected from your experiments to analyze the relationship among the miss rate, AAT (average access time; refer to the AAT calculation and the CACTI tool), cache size, block size, and the other cache parameters (preferably with tables and graphics). Find the two data-cache systems with the best AAT for each trace file.
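A minimal sketch of the active-byte bookkeeping (illustrative names; std::vector<bool> stands in for the bit-vector):

    // Sketch of the per-line byte-usage statistic and FBU described above.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    const int BLOCKSIZE = 64;
    uint64_t totalActiveBytes = 0;       // summed as lines leave the cache
    uint64_t cacheMisses = 0;            // == number of lines fetched

    struct Line {
        std::vector<bool> used;          // one bit per byte in the line
        Line() : used(BLOCKSIZE, false) {}
    };

    void onFill(Line& l) {               // line brought into the cache
        l.used.assign(BLOCKSIZE, false);
        ++cacheMisses;
    }

    void onByteReference(Line& l, uint32_t offsetInLine) {
        l.used[offsetInLine] = true;     // an n-byte instruction sets n bits
    }

    void onEvict(const Line& l) {        // also applied to every valid line
        for (bool b : l.used)            // when the simulation ends
            if (b) ++totalActiveBytes;
    }

    int main() {
        Line l;
        onFill(l);
        for (uint32_t i = 0; i < 8; ++i) onByteReference(l, i);  // one 8-byte inst
        onEvict(l);
        double fbu = double(totalActiveBytes) / (cacheMisses * BLOCKSIZE);
        std::cout << fbu << "\n";        // 8/64 = 0.125
    }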

Model 2: L1 cache + victim cache (for both 463 and 521 students)

Requirements
1. The cache size, block size, cache associativity, and prefetching policy of the L1 and victim cache (fully associative) are adjustable in your simulator.
2. LRU is the only replacement policy.

Output of your simulator:
1. Number of lines fetched into the L1 cache.
2. Average number of bytes from each line that are referenced by the processor while in the cache.
3. Number of L1 cache reads.
4. Number of L1 read misses.
5. L1 cache miss rate = L1 cache read misses / L1 cache reads.
6. Number of VC reads.
7. Number of VC read misses.
8. VC miss rate = VC read misses / L1 read misses.

Data analysis

Take the two best combinations of parameters from Model 1, and vary the size of the victim cache. Find the two data-cache systems with the best AAT for each trace file. Describe why you think the best-performing system outperforms the others.

Model 3: All of the above + TLB

Requirements
1. The cache size, block size, cache associativity, and prefetching policy of the L1 and victim cache (fully associative) are adjustable in your simulator.
2. The number of entries and the associativity of the TLB are adjustable in your simulator.
3. LRU is the only replacement policy.

Output of your simulator:

All of the outputs listed in Model 1 and Model 2, whichever applies to a particular run, as well as
1. Number of TLB reads (should be the same as the number of L1 cache reads).
2. Number of TLB misses.
3. TLB miss rate = TLB misses / TLB reads.

Data analysis

Use the data collected from your experiments to analyze the relationship among miss rate, AAT (average access time), cache size, block size, number of TLB entries, and the other cache parameters (ideally, with tables and graphics). Find the two data-cache systems with the best AAT for each trace file.

Overall analysis

From an architectural perspective, analyze and compare the characteristics of the three models, using the results from your experiments.

CACTI tool

The CACTI tool is used to calculate the access time of a cache based on its configuration (size, associativity, etc.).

Installing CACTI

Log into your Unity home directory in Unix (Solaris, etc.). CACTI does not run on Linux.

Using CACTI

We've compiled cacti; just type its name to run it. The executable is in /afs/eos/courses/ece/ece521/common/www/homework/projects/1/cacti (or ...).

Using our function (simplest)

The simplest way to use CACTI is to call the function that we have provided. Simply copy the function cacti_get_at from the file callcacti.c in the same directory, and paste it into your code. It has three parameters: cache size in bytes, block size in bytes, and associativity. (For a TLB, use 8 (bytes) as the block size.) In calling this function, use an associativity of 1 to denote a direct-mapped cache and an associativity of -1 to denote a fully associative cache. For example, the function might be called like this:

    cacti_get_at(16384, 128, 4)

You are not required to use the cacti_get_at function. You may call CACTI directly, as follows.

Calling CACTI directly (more flexibility)

In case you need to use your own code to call cacti, here's what you need to know:

CACTI syntax

    cacti <csize> <bsize> <assoc> <tech>

    csize - size of cache in bytes (e.g., 16384)
    bsize - block size of cache in bytes (e.g., 32)
    assoc - associativity of cache (e.g., 2, 4, DM, or FA)
            For direct-mapped caches, use 1 or DM
            For set-associative caches, use a number n for an n-way associative cache
            For fully associative caches, use FA
    tech  - technology size in microns (we use 0.18um)

Example:

    eos% cacti 16384 32 2 0.18um

will return the following timing and power analysis for a 16K 2-way set-associative cache with a 32-byte block size at a 0.18 µm feature (technology) size:

    Technology Size: 0.18um
    Vdd: 1.7V
    Access Time (ns): ...        <-- this is what we need
    Cycle Time (wave pipelined) (ns): ...
    Power (nJ): ...
    Wire scale from mux drivers to data output: ...

CACTI doesn't allow the number of sets to be too small. To simulate a fully associative cache, you have to use the FA option. Example:

    eos% cacti <csize> <bsize> FA 0.18um

will then return

    ...
    Access Time (ns): ...
    ...
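If you write your own code to call CACTI, one possible approach (a sketch, not a provided API: the path comes from the directory named above, and the parsing assumes the access-time line begins with the label shown in the sample output) is to run the binary with popen and scan its output:

    // Sketch: shell out to the cacti binary (Unix-specific popen) and scan
    // its output for the access time. The path and the exact label text are
    // assumptions; adjust them to match what cacti actually prints on Eos.
    #include <cstdio>
    #include <sstream>
    #include <string>

    double cactiAccessTime(int csize, int bsize, const std::string& assoc) {
        std::ostringstream cmd;
        cmd << "/afs/eos/courses/ece/ece521/common/www/homework/projects/1/cacti "
            << csize << " " << bsize << " " << assoc << " 0.18um";
        FILE* p = popen(cmd.str().c_str(), "r");
        if (!p) return -1.0;
        char buf[512];
        double ns = -1.0;
        while (fgets(buf, sizeof buf, p)) {
            std::string line(buf);
            std::size_t at = line.find("Access Time (ns):");
            if (at != std::string::npos)
                ns = std::stod(line.substr(at + 17));   // value follows the label
        }
        pclose(p);
        return ns;
    }

    int main() {
        // e.g., a 16 KB, 2-way cache with 32-byte blocks:
        std::printf("%f\n", cactiAccessTime(16384, 32, "2"));
    }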

Limits of parameters

No matter whether you use our function or your own code to call CACTI, there are some limits on the parameters:

    Parameter             Minimum     Maximum
    Cache size (bytes)      ...         ...
    Block size* (bytes)     ...         ...
    Associativity           ...         ..., or 2^25 (using option FA)
    Number of sets          ...         ...

    * Block size must be a power of 2.

AAT calculation

The access times of the L1 cache, victim cache, and TLB can be determined using the CACTI tool. Below, T_L1, T_TLB, and T_V refer to the L1 cache, TLB, and victim cache access times (in ns), respectively. Set the processor cycle time to be max(T_TLB, T_L1, T_V).

For the L1 cache:

    AAT = T_L1 + MissRate_L1 x MissPenalty_L1

It takes 20 cycles to fetch the first 8 bytes of a block from memory, and the remaining bytes are transferred at a rate of 8 bytes/cycle. Assuming the L1 cache has the longest access time of any cache in the system (and thus the clock period in ns is equal to T_L1), and its line length is 64 bytes,

    Memory-to-L1 transfer time = 20 + (64 - 8)/8 = 27 cycles
    MissPenalty_L1 = MEM_DELAY = 27 cycles

For the L1 + victim cache:

    AAT = T_L1 + MissRate_L1 x V_DELAY + MissRate_V x MEM_DELAY

Transferring a block from the victim cache to the L1 cache occurs at a rate of 8 bytes/cycle. The block must be completely transferred before the processor resumes. The V-to-L1 transfer time is

    V_DELAY = BLOCKSIZE / 8 cycles

and the full miss penalty is

    MissPenalty_L1 = T_V + (1 - MissRate_V) x (BLOCKSIZE / 8) x T_L1 + MissRate_V x MissPenalty_V

For the L1 cache + TLB: Assume that the TLB lookup is done in parallel with the cache access, so there is no impact on access time, except in the case of a TLB miss. Take the TLB miss rate and multiply it by the time to service a TLB miss (which is the time to fetch 8 bytes from main memory). Then proceed as before with the AAT calculation. (A sketch combining these formulas follows.)
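A sketch that puts these formulas together. It assumes, as above, that T_L1 is the clock period; it also takes the victim-cache miss penalty to be the memory delay, which is an assumption, and the function names are illustrative.

    // Sketch: AAT from the formulas above, with delays kept in cycles and
    // converted to ns by the clock period (assumed equal to T_L1).
    #include <iostream>

    int memDelay(int blocksize) {
        // 20 cycles for the first 8 bytes, then 8 bytes/cycle for the rest.
        return 20 + (blocksize - 8) / 8;
    }

    // L1 only: AAT = T_L1 + MissRate_L1 x MissPenalty_L1.
    double aatL1(double tL1, double missRateL1, int blocksize) {
        return tL1 + missRateL1 * memDelay(blocksize) * tL1;
    }

    // L1 + victim cache, using MissPenalty_L1 =
    //   T_V + (1 - MissRate_V) x (BLOCKSIZE/8) x T_L1 + MissRate_V x MissPenalty_V.
    double aatL1VC(double tL1, double tV, double missRateL1, double missRateV,
                   int blocksize) {
        double missPenaltyV = memDelay(blocksize) * tL1;    // assumed: fetch from memory
        double missPenaltyL1 = tV + (1 - missRateV) * (blocksize / 8.0) * tL1
                             + missRateV * missPenaltyV;
        return tL1 + missRateL1 * missPenaltyL1;
    }

    int main() {
        // 64-byte blocks: MEM_DELAY = 20 + 56/8 = 27 cycles, as in the text.
        std::cout << memDelay(64) << "\n";                  // 27
        std::cout << aatL1(1.5, 0.05, 64) << "\n";          // example numbers only
    }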

Program interface requirements

To assure that your code can be run and graded easily,

1. Your code must be able to run in the Unix environment on the Eos/Unity system.
2. A makefile must be provided.
3. A make will create an executable file named icache.
4. The program must be executable from the command line, e.g., for a run of the simulator without a TLB,

       eos% icache <tracefile> <SIZE> <ASSOC> <BLOCKSIZE> <PREFETCH> <V>

   and, for a run with a TLB,

       eos% icache <tracefile> <SIZE> <ASSOC> <BLOCKSIZE> <PREFETCH> <V> <ENTRIES> <T_ASSOC>

   Example for no TLB:

       eos% icache trace1.txt 16384 2 128 <PREFETCH> <V> <return>

   The first parameter after the trace file is the cache size, in bytes (16384); the second parameter is the associativity (2); the third parameter is the block size (128); the fourth parameter is whether prefetching is in use (0 = no, 1 = yes). For a victim cache, there is a fifth parameter, V, the size of the victim cache in number of lines. (A sketch of this argument handling appears at the end of this handout.)

What to hand in

1. Makefile
2. Source code
3. Project report (a Microsoft Word .doc file is recommended)

* No executable file (including cacti) is needed; just submit your source code and makefile. Make sure you use zip to compress all your source code and report into a single zip file named project1.zip. Inform us if your file is larger than 1 MB.

Grading

  0%: You do not hand in (submit electronically) anything by the due date.
+10%: Your Makefile works, and creates three simulators (executable files).
+10%: Your simulator can read trace files from the command line and has the proper interface for parameter settings.
+50%: Your simulator produces the correct output.
+30%: You have done a good analysis of the experiments.
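A sketch of the argument handling required above (referenced from interface requirement 4); only the parameter names come from the spec, the rest is illustrative.

    // Sketch of command-line handling for the interface above: 6 arguments
    // without a TLB, 8 with a TLB (argc also counts the program name).
    #include <cstdlib>
    #include <iostream>

    int main(int argc, char* argv[]) {
        if (argc != 7 && argc != 9) {
            std::cerr << "usage: icache <tracefile> <SIZE> <ASSOC> <BLOCKSIZE>"
                         " <PREFETCH> <V> [<ENTRIES> <T_ASSOC>]\n";
            return 1;
        }
        const char* tracefile = argv[1];
        int SIZE      = std::atoi(argv[2]);
        int ASSOC     = std::atoi(argv[3]);
        int BLOCKSIZE = std::atoi(argv[4]);
        int PREFETCH  = std::atoi(argv[5]);   // 0 = no, 1 = yes
        int V         = std::atoi(argv[6]);   // 0 disables the victim cache
        bool haveTLB = (argc == 9);
        int ENTRIES  = haveTLB ? std::atoi(argv[7]) : 0;
        int T_ASSOC  = haveTLB ? std::atoi(argv[8]) : 0;

        std::cout << "trace=" << tracefile << " SIZE=" << SIZE
                  << " ASSOC=" << ASSOC << " BLOCKSIZE=" << BLOCKSIZE
                  << " PREFETCH=" << PREFETCH << " V=" << V;
        if (haveTLB)
            std::cout << " ENTRIES=" << ENTRIES << " T_ASSOC=" << T_ASSOC;
        std::cout << "\n";
        // ...construct the caches (and TLB, if requested) and run the trace.
    }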
