The levels of a memory hierarchy


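1. The levels of a memory hierarchy

| Level | Typical size | Access time |
|---|---|---|
| CPU registers | 500 bytes | 0.25 ns |
| Cache | 1 MB | 1 ns |
| Main memory (via the memory bus) | 4 GB | 20 ns |
| External memory (via the I/O bus) | 500 GB | 5 ms |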

2. Some useful definitions
When the CPU finds a requested data item in the cache, it is called a cache hit. When the CPU does not find a data item in the cache, it is called a cache miss. On a miss, a fixed-size collection of data containing the requested word, called a block, is retrieved from main memory and placed in the cache. Temporal locality tells us that we are likely to need this word again in the near future, so it is useful to place it in the cache. Because of spatial locality, there is a high probability that the other data in the block will be needed soon.
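A minimal sketch of both kinds of locality, assuming an idealized cache that never evicts anything (so only the first touch of a block misses) and a hypothetical block of 4 words. A sequential scan misses once per block (spatial locality), and a second pass over the same words hits every time (temporal locality). The names `count_misses`, `WORDS_PER_BLOCK`, and `MAX_BLOCKS` are illustrative.

```c
#include <stdbool.h>

#define WORDS_PER_BLOCK 4   /* hypothetical block size, in words */
#define MAX_BLOCKS 1024     /* assumed large enough that nothing is ever evicted */

/* Counts misses for a sequence of word addresses in an idealized cache
   that keeps every block it has fetched. Addresses must be below
   MAX_BLOCKS * WORDS_PER_BLOCK. */
int count_misses(const int *word_addrs, int n) {
    bool present[MAX_BLOCKS] = {false};
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int block = word_addrs[i] / WORDS_PER_BLOCK;  /* block address */
        if (!present[block]) {    /* miss: the whole block is fetched */
            present[block] = true;
            misses++;
        }                         /* otherwise a hit on an already-fetched block */
    }
    return misses;
}
```

Scanning words 0 to 15 gives 4 misses and 12 hits; scanning them a second time adds no further misses.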

3. Cache performance review
The equation for CPU execution time can be rewritten as follows:
$$\text{CPU execution time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}$$
where memory stall cycles is the number of cycles during which the CPU is stalled waiting for a memory access:
$$\text{Memory stall cycles} = \text{Number of misses} \times \text{Miss penalty} = \text{IC} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty} = \text{IC} \times \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty}$$
Miss rate is the fraction of cache accesses that result in a miss (it can differ for reads and writes; here we use an average miss rate).

4. An example I
Question: Consider a computer with CPI = 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
Answer: First compute the performance for the computer that always hits:
$$\text{CPU execution time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle} = (\text{IC} \times \text{CPI} + 0) \times \text{Clock cycle} = \text{IC} \times 1.0 \times \text{Clock cycle}$$
Now compute the performance for the computer with a real cache. Each instruction makes one instruction fetch plus 0.5 data accesses, so:
$$\text{Memory stall cycles} = \text{IC} \times \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty} = \text{IC} \times (1 + 0.5) \times 0.02 \times 25 = \text{IC} \times 0.75$$

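5. An example II
Then
$$\text{CPU execution time}_{\text{cache}} = (\text{IC} \times 1.0 + \text{IC} \times 0.75) \times \text{Clock cycle} = 1.75 \times \text{IC} \times \text{Clock cycle}$$
The performance ratio is the inverse of the ratio of the execution times:
$$\frac{\text{CPU execution time}_{\text{cache}}}{\text{CPU execution time}} = \frac{1.75 \times \text{IC} \times \text{Clock cycle}}{1.0 \times \text{IC} \times \text{Clock cycle}} = 1.75$$
The computer with no cache misses is 1.75 times faster.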
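The example can be checked numerically with a small sketch that expresses CPU time per instruction in clock cycles (CPU execution time divided by IC × clock cycle time). The function names are illustrative, not from the slides.

```c
/* Memory stall cycles per instruction
   = (memory accesses / instruction) * miss rate * miss penalty. */
double stall_cycles_per_instr(double accesses_per_instr, double miss_rate,
                              double miss_penalty) {
    return accesses_per_instr * miss_rate * miss_penalty;
}

/* Effective CPI: execution CPI plus memory stall cycles per instruction.
   Since IC and clock cycle time are the same for both computers, the ratio
   of effective CPIs equals the ratio of CPU execution times. */
double effective_cpi(double cpi_exec, double accesses_per_instr,
                     double miss_rate, double miss_penalty) {
    return cpi_exec + stall_cycles_per_instr(accesses_per_instr, miss_rate,
                                             miss_penalty);
}
```

With one instruction fetch plus 0.5 data accesses per instruction, `effective_cpi(1.0, 1.5, 0.02, 25.0)` is 1.75, and dividing by the perfect-cache CPI of 1.0 gives the 1.75 speedup.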

6. Misses per instruction
Instead of miss rate, you can use misses per instruction; the two measures are equivalent:
$$\frac{\text{Misses}}{\text{Instruction}} = \frac{\text{Miss rate} \times \text{Memory accesses}}{\text{Instruction count}} = \text{Miss rate} \times \frac{\text{Memory accesses}}{\text{Instruction}}$$
For our example, misses per instruction = 0.02 * 1.5 = 0.030. Misses per instruction are also often reported as misses per 1000 instructions (in our example, 30 misses per 1000 instructions).
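The conversion in both directions is a one-liner each; a minimal sketch with illustrative function names:

```c
/* Misses per instruction = miss rate * memory accesses per instruction. */
double misses_per_instr(double miss_rate, double accesses_per_instr) {
    return miss_rate * accesses_per_instr;
}

/* The same quantity in the common "misses per 1000 instructions" form. */
double misses_per_1000_instr(double miss_rate, double accesses_per_instr) {
    return 1000.0 * misses_per_instr(miss_rate, accesses_per_instr);
}

/* Inverse conversion: misses per 1000 instructions back to a miss rate. */
double miss_rate_from_mpki(double mpki, double accesses_per_instr) {
    return mpki / 1000.0 / accesses_per_instr;
}
```

For the example, a 2% miss rate with 1.5 accesses per instruction gives 0.030 misses per instruction, or 30 per 1000 instructions.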

7. Where can a block be placed in a cache?
If each block has only one place it can appear in the cache, the cache is said to be direct mapped. That place is usually
(Block address) MOD (Number of blocks in cache)
If a block can be placed anywhere in the cache, the cache is said to be fully associative.
If a block can be placed in a restricted set of places in the cache, the cache is set associative. A set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is usually chosen by bit selection; that is,
(Block address) MOD (Number of sets in cache)
If there are n blocks in a set, the cache placement is called n-way set associative.
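The two MOD formulas can be sketched directly. As an illustration (numbers chosen here, not from the slides), block address 12 in a cache of 8 block frames goes to frame 4 when direct mapped, to set 0 when 2-way set associative (4 sets), and to the single set of a fully associative cache, which is just the n = 8 case.

```c
/* Frame in a direct-mapped cache: (block address) MOD (number of blocks). */
unsigned direct_mapped_frame(unsigned block_addr, unsigned num_blocks) {
    return block_addr % num_blocks;
}

/* Set in an n-way set-associative cache: (block address) MOD (number of sets),
   where number of sets = number of blocks / n. The block may then occupy any
   of the n frames in that set. Fully associative is the case n = num_blocks
   (a single set); direct mapped is n = 1. */
unsigned set_index(unsigned block_addr, unsigned num_blocks, unsigned ways) {
    unsigned num_sets = num_blocks / ways;
    return block_addr % num_sets;
}
```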

8. How is a block found if it is in the cache?
Caches have an address tag on each block frame that gives the block address. The tag of every cache block that might contain the desired information is checked to see if it matches the block address from the CPU. The address is divided into fields:
| Tag | Index | Block offset |
where the tag and index together form the block address. The index selects the set, the tag is compared against all blocks in that set, and the block offset is the address of the desired data within the block. Fully associative caches have no index field.
There must also be a way to know that a cache block does not hold valid information. The most common procedure is to add a valid bit to the tag to say whether or not this entry contains a valid address. If the bit is not set, there cannot be a match on this address.
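A minimal sketch of the lookup, assuming a hypothetical geometry of 64-byte blocks, 256 sets, and 2 ways (both sizes powers of two, so each field is a plain bit range). The index selects the set, the tag is compared against every way, and a clear valid bit prevents a match. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 6                   /* log2(64-byte block), hypothetical */
#define INDEX_BITS  8                   /* log2(256 sets), hypothetical */
#define NUM_SETS    (1u << INDEX_BITS)
#define WAYS        2

struct line { bool valid; uint32_t tag; };
struct line cache[NUM_SETS][WAYS];      /* static storage: all valid bits start clear */

uint32_t block_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
uint32_t index_of(uint32_t addr)     { return (addr >> OFFSET_BITS) & (NUM_SETS - 1); }
uint32_t tag_of(uint32_t addr)       { return addr >> (OFFSET_BITS + INDEX_BITS); }

/* Hit only if some way in the selected set is valid and holds a matching tag. */
bool lookup(uint32_t addr) {
    struct line *set = cache[index_of(addr)];
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag_of(addr))
            return true;
    return false;
}
```

For address 0x12345678 this geometry gives offset 0x38, index 0x59, and tag 0x48D1; the lookup misses until a valid line with that tag is installed in set 0x59.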

9. Which block should be replaced on a cache miss?
Generally, three different strategies are used:
Random: to spread allocation uniformly, candidate blocks are randomly selected (sometimes a pseudorandom strategy is used to get reproducible behavior).
Least recently used (LRU): to reduce the chance of throwing out information that will be needed soon, accesses to blocks are recorded. Relying on the past to predict the future, the block replaced is the one that has been unused for the longest time.
First in, first out (FIFO): because LRU can be complicated to calculate, FIFO approximates LRU by replacing the oldest block rather than the least recently used one.
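The difference between LRU and FIFO shows up when a block is re-referenced. A minimal sketch simulating one 2-way set (the associativity is a hypothetical choice, and random replacement is omitted): with the block trace A B A C A, LRU keeps A after its second use and misses 3 times, while FIFO evicts A as the oldest block and misses 4 times.

```c
#include <string.h>

#define WAYS 2   /* hypothetical associativity of the single simulated set */

enum policy { LRU, FIFO };

/* Simulates one cache set; frames[0] is always the next victim.
   For LRU it is the least recently used block, for FIFO the oldest.
   Returns the number of misses for a trace of block addresses. */
int set_misses(const int *trace, int n, enum policy p) {
    int frames[WAYS];
    int used = 0, misses = 0;
    for (int t = 0; t < n; t++) {
        int b = trace[t], pos = -1;
        for (int i = 0; i < used; i++)
            if (frames[i] == b) { pos = i; break; }
        if (pos >= 0) {                      /* hit */
            if (p == LRU) {                  /* move b to the most recently used end */
                memmove(&frames[pos], &frames[pos + 1],
                        (size_t)(used - pos - 1) * sizeof frames[0]);
                frames[used - 1] = b;
            }                                /* FIFO: a hit leaves the order unchanged */
            continue;
        }
        misses++;
        if (used == WAYS) {                  /* set full: evict frames[0] */
            memmove(&frames[0], &frames[1], (WAYS - 1) * sizeof frames[0]);
            used--;
        }
        frames[used++] = b;                  /* newest block goes at the end */
    }
    return misses;
}
```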

10. What happens on a write?
Writes account for about 21% of data cache traffic. Reads pose no problem: the block can be read from the cache at the same time that the tag is read. Writes are harder: first, we cannot modify a block until the tag has been checked to see if the address is a hit; second, the processor specifies the size of the write, so only that portion of the block may be changed. Additionally, we need to solve the problem of cache coherence. Two different write strategies are used:
Write through: the information is written both to the block in the cache and to the block in the lower-level memory.
Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.

11. What happens on a write?
Multiprocessors and I/O want write back for processor caches to reduce memory traffic, and write through to keep the cache consistent with lower levels of the memory hierarchy. When the CPU must wait for writes to complete during write through, the CPU is said to write stall (write stalls can be reduced by introducing write buffers). Since the data are not needed on a write, there are two options on a write miss:
Write allocate: the block is allocated on a write miss, followed by the write-hit actions above.
No-write allocate: in this apparently unusual alternative, write misses do not affect the cache; the block is modified only in the lower-level memory.
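The traffic difference between the two write strategies can be sketched with a tiny direct-mapped cache that counts writes sent to the lower level. Assumptions: 4 block frames (hypothetical), write allocate for both policies to keep the comparison simple, and dirty blocks still resident at the end of the trace are not counted. Four stores to the same block cost four memory writes under write through, but only one write back when a later read evicts the dirty block.

```c
#include <stdbool.h>

#define FRAMES 4   /* hypothetical direct-mapped cache with 4 block frames */

enum write_policy { WRITE_THROUGH, WRITE_BACK };

struct access { char op; int block; };   /* op is 'R' or 'W' */

/* Counts writes sent to the lower-level memory for a trace of accesses. */
int lower_level_writes(const struct access *trace, int n, enum write_policy p) {
    int tag[FRAMES];                     /* full block address kept as the tag, for simplicity */
    bool valid[FRAMES] = {false}, dirty[FRAMES] = {false};
    int writes = 0;
    for (int t = 0; t < n; t++) {
        int f = trace[t].block % FRAMES;
        if (!valid[f] || tag[f] != trace[t].block) {  /* miss */
            if (valid[f] && dirty[f])
                writes++;                /* write back the modified victim */
            valid[f] = true;             /* write allocate: fetch the block */
            tag[f] = trace[t].block;
            dirty[f] = false;
        }
        if (trace[t].op == 'W') {
            if (p == WRITE_THROUGH)
                writes++;                /* every store also updates memory */
            else
                dirty[f] = true;         /* only mark the cached copy as modified */
        }
    }
    return writes;
}
```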

12. Cache performance: the miss rate approach
Miss rate is independent of the speed of the hardware but, like instruction count, it can be misleading. A better measure of memory hierarchy performance is the average memory access time:
$$\text{Average memory access time} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$
where hit time is the time to hit in the cache. The components of average access time can be measured either in absolute time or in clock cycles. It is still an indirect measure of performance.

13. An example: question
Which has the lower miss rate: a 16 KB instruction cache with a 16 KB data cache, or a 32 KB unified cache? Misses per 1000 instructions are 43.3 for the 32 KB unified cache, and 3.82 and 40.9 for the 16 KB instruction and data caches, respectively; about 74% of memory references are instruction references. Assume that 36% of the instructions are data transfers, a hit takes 1 clock cycle, and the miss penalty is 100 clock cycles. A load or store hit takes 1 extra clock cycle on a unified cache if there is only one cache port to satisfy two simultaneous requests. What is the average memory access time in each case? Additionally, assume write-through caches with a write buffer and ignore stalls due to the write buffer.

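14. An example: solution
Let's convert misses per 1000 instructions into miss rates using
$$\text{Miss rate} = \frac{\text{Misses per 1000 instructions} / 1000}{\text{Memory accesses per instruction}}$$
Every instruction makes exactly one instruction fetch, so the miss rate for the 16 KB instruction cache is 3.82/1000/1.00 = 0.004. Since 36% of instructions are data transfers, the miss rate for the 16 KB data cache is 40.9/1000/0.36 = 0.114. The unified cache sees both kinds of access, so its miss rate is 43.3/1000/(1.00 + 0.36) = 0.0318. The overall miss rate for the split caches is (74% * 0.004) + (26% * 0.114) = 0.0326. So a 32 KB unified cache has a slightly lower effective miss rate than two 16 KB caches.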

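15. An example: solution
The average memory access time formula can be divided into instruction and data accesses:
$$\text{Average memory access time} = \%\,\text{instructions} \times (\text{Hit time} + \text{Instruction miss rate} \times \text{Miss penalty}) + \%\,\text{data} \times (\text{Hit time} + \text{Data miss rate} \times \text{Miss penalty})$$
For the unified cache, each data access pays the extra clock cycle of hit time:
$$\text{Average memory access time}_{\text{unified}} = 74\% \times (1 + 0.0318 \times 100) + 26\% \times (1 + 1 + 0.0318 \times 100) = 4.44$$
$$\text{Average memory access time}_{\text{split}} = 74\% \times (1 + 0.004 \times 100) + 26\% \times (1 + 0.114 \times 100) \approx 4.24$$
(the split result uses the unrounded miss rates 0.00382 and 0.1136). Thus the split caches have a better average memory access time, despite their slightly higher miss rate.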
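The whole example, from misses per 1000 instructions to both average memory access times, fits in a short sketch. The numbers are the ones given in the question; the function names are illustrative.

```c
/* Average memory access time = hit time + miss rate * miss penalty. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

/* Convert misses per 1000 instructions into a miss rate. */
double rate(double mpki, double accesses_per_instr) {
    return mpki / 1000.0 / accesses_per_instr;
}

/* Split caches: 1 instruction fetch and 0.36 data accesses per instruction,
   74% / 26% instruction / data references, 1-cycle hit, 100-cycle penalty. */
double split_amat(void) {
    double i_rate = rate(3.82, 1.0), d_rate = rate(40.9, 0.36);
    return 0.74 * amat(1.0, i_rate, 100.0) + 0.26 * amat(1.0, d_rate, 100.0);
}

/* Unified cache: one miss rate for all accesses, plus 1 extra hit cycle for
   data accesses because the single port is shared with instruction fetch. */
double unified_amat(void) {
    double u_rate = rate(43.3, 1.0 + 0.36);
    return 0.74 * amat(1.0, u_rate, 100.0)
         + 0.26 * amat(1.0 + 1.0, u_rate, 100.0);
}
```

`split_amat()` evaluates to about 4.24 and `unified_amat()` to about 4.44 clock cycles, even though the unified miss rate (about 0.0318) is the lower one.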

16. Impact of caches on performance I
Question: Consider an in-order execution computer. Assume the cache miss penalty is 100 clock cycles and all instructions normally take 1.0 clock cycles. Assume the average miss rate is 2%, there is an average of 1.5 memory references per instruction, and the average number of cache misses per 1000 instructions is 30. What is the impact on performance when the behavior of the cache is included? Calculate the impact using both misses per instruction and miss rate.

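17. Impact of caches on performance II
$$\text{CPU time} = \text{IC} \times \left(\text{CPI}_{\text{execution}} + \frac{\text{Memory stall clock cycles}}{\text{Instruction}}\right) \times \text{Clock cycle time}$$
The performance, including cache misses, is
$$\text{CPU time}_{\text{with cache}} = \text{IC} \times (1.0 + (30/1000 \times 100)) \times \text{Clock cycle time} = \text{IC} \times 4.00 \times \text{Clock cycle time}$$
The performance using miss rate is
$$\text{CPU time} = \text{IC} \times \left(\text{CPI}_{\text{execution}} + \text{Miss rate} \times \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss penalty}\right) \times \text{Clock cycle time}$$
$$\text{CPU time}_{\text{with cache}} = \text{IC} \times (1.0 + (1.5 \times 2\% \times 100)) \times \text{Clock cycle time} = \text{IC} \times 4.00 \times \text{Clock cycle time}$$
The CPU time increases fourfold, with CPI rising from 1.00 for a perfect cache to 4.00 with a cache that can miss.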
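Both routes to the CPI of 4.00 can be written as one-line functions; a minimal sketch with illustrative names:

```c
/* CPI including memory stalls, from misses per 1000 instructions. */
double cpi_from_mpki(double cpi_exec, double mpki, double miss_penalty) {
    return cpi_exec + (mpki / 1000.0) * miss_penalty;
}

/* CPI including memory stalls, from miss rate and references per instruction. */
double cpi_from_miss_rate(double cpi_exec, double refs_per_instr,
                          double miss_rate, double miss_penalty) {
    return cpi_exec + refs_per_instr * miss_rate * miss_penalty;
}
```

`cpi_from_mpki(1.0, 30.0, 100.0)` and `cpi_from_miss_rate(1.0, 1.5, 0.02, 100.0)` both give 4.0, because 30 misses per 1000 instructions is the same quantity as 1.5 references × 2% miss rate.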

18. Impact of cache misses on performance
Cache misses have a double-barreled impact on a CPU with a low CPI and a fast clock:
The lower the CPI_execution, the higher the relative impact of a fixed number of cache miss clock cycles.
When calculating CPI, the cache miss penalty is measured in CPU clock cycles per miss. Therefore, even if the memory hierarchies of two computers are identical, the CPU with the higher clock rate has a larger number of clock cycles per miss and hence a higher memory portion of CPI.
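Both effects can be seen by treating the miss penalty as a fixed latency in nanoseconds. The numbers used below (50 ns memory latency, 0.03 misses per instruction, 1 GHz versus 2 GHz, CPI_execution of 1.0 versus 0.5) are hypothetical. At 1 GHz memory stalls are 60% of CPI; doubling the clock to 2 GHz doubles the penalty in cycles and raises that share to 75%; halving CPI_execution at 1 GHz also raises it to 75%.

```c
/* Fraction of CPI spent on memory stalls when the miss penalty is a fixed
   latency in nanoseconds, so it costs more clock cycles on a faster clock. */
double memory_fraction_of_cpi(double cpi_exec, double misses_per_instr,
                              double latency_ns, double clock_ghz) {
    double penalty_cycles = latency_ns * clock_ghz;   /* ns * (cycles per ns) */
    double stall_cpi = misses_per_instr * penalty_cycles;
    return stall_cpi / (cpi_exec + stall_cpi);
}
```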

19. Reducing cache miss penalty
Technology trends have improved the speed of processors faster than that of DRAMs, making the relative cost of miss penalties increase over time. One way to reduce the miss penalty is to add another level of cache between the original cache and main memory; in a sense, this makes the cache both faster and larger. The first-level cache can be small enough to match the clock cycle time of the fast CPU, while the second-level cache can be large enough to capture many accesses that would otherwise go to main memory.

20. Two-level caches
Let's define the average memory access time for a two-level cache (L1 and L2 refer to the first and second levels of cache, respectively):
$$\text{Average memory access time} = \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times \text{Miss penalty}_{L1}$$
where
$$\text{Miss penalty}_{L1} = \text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2}$$
Then
$$\text{Average memory access time} = \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times (\text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2})$$
Local miss rate: the number of misses in a cache divided by the total number of memory accesses to this cache.
Global miss rate: the number of misses in the cache divided by the total number of memory accesses generated by the CPU.
$$\text{Average memory stalls per instruction} = \text{Misses per instruction}_{L1} \times \text{Hit time}_{L2} + \text{Misses per instruction}_{L2} \times \text{Miss penalty}_{L2}$$

21. An example
Question: Suppose that in 1000 memory references there are 40 misses in the first-level cache and 20 misses in the second-level cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 100 clock cycles, the hit time of L2 is 10 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.5 memory references per instruction. What are the average memory access time and the average stall cycles per instruction? Ignore the impact of writes.
Answer: The miss rate for the first-level cache is 40/1000 (4%). The local miss rate for the second-level cache is 20/40 (50%). The global miss rate of the second-level cache is 20/1000 (2%). Thus
$$\text{Average memory access time} = \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times (\text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2}) = 1 + 4\% \times (10 + 50\% \times 100) = 1 + 4\% \times 60 = 3.4 \text{ clock cycles}$$

22. An example (continued)
How many misses do we get per instruction? With 1.5 memory references per instruction, 1000 references correspond to about 667 instructions, so we multiply the misses by 1.5 to get the number of misses per 1000 instructions. For L1 we get 40 * 1.5 = 60 misses, and for L2 20 * 1.5 = 30 misses. Assuming that misses are equally distributed between instructions and data:
$$\text{Average memory stalls per instruction} = \text{Misses per instruction}_{L1} \times \text{Hit time}_{L2} + \text{Misses per instruction}_{L2} \times \text{Miss penalty}_{L2} = (60/1000) \times 10 + (30/1000) \times 100 = 3.6 \text{ clock cycles}$$
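The two-level formulas, applied to the example's counts, as a minimal sketch with illustrative names:

```c
/* AMAT = Hit time L1 + Miss rate L1 * (Hit time L2 + Local miss rate L2 * Miss penalty L2) */
double two_level_amat(double hit_l1, double miss_rate_l1, double hit_l2,
                      double local_miss_rate_l2, double miss_penalty_l2) {
    return hit_l1 + miss_rate_l1 * (hit_l2 + local_miss_rate_l2 * miss_penalty_l2);
}

/* Stalls per instruction = Misses/instr L1 * Hit time L2
                          + Misses/instr L2 * Miss penalty L2 */
double stalls_per_instr(double misses_per_instr_l1, double hit_l2,
                        double misses_per_instr_l2, double miss_penalty_l2) {
    return misses_per_instr_l1 * hit_l2 + misses_per_instr_l2 * miss_penalty_l2;
}
```

With 40 L1 misses and 20 L2 misses per 1000 references, `two_level_amat(1, 0.04, 10, 0.5, 100)` gives 3.4 cycles; scaling by 1.5 references per instruction gives 0.06 and 0.03 misses per instruction, and `stalls_per_instr(0.06, 10, 0.03, 100)` gives 3.6 cycles.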

23. The next example
Question: What is the impact of second-level cache associativity on its miss penalty, given:
Hit time of a direct-mapped L2 = 10 clock cycles;
Two-way set associativity increases the hit time by 0.1 clock cycles, to 10.1 clock cycles;
Local miss rate for the direct-mapped L2 = 25%;
Local miss rate for the two-way set-associative L2 = 20%;
Miss penalty of L2 = 100 clock cycles.
Answer: For a direct-mapped second-level cache, the first-level cache miss penalty is
$$\text{Miss penalty}_{\text{1-way L2}} = 10 + 25\% \times 100 = 35.0 \text{ clock cycles}$$
Adding the cost of associativity increases the hit time by only 0.1 clock cycles, so the first-level miss penalty with a two-way L2 cache is
$$\text{Miss penalty}_{\text{2-way L2}} = 10.1 + 20\% \times 100 = 30.1 \text{ clock cycles}$$
In practice, the second-level hit time must be an integral number of clock cycles, so the result is either
$$\text{Miss penalty}_{\text{2-way L2}} = 10 + 20\% \times 100 = 30.0 \text{ clock cycles} \quad \text{or} \quad \text{Miss penalty}_{\text{2-way L2}} = 11 + 20\% \times 100 = 31.0 \text{ clock cycles}$$
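The L1 miss penalty for each L2 option follows from one formula; a minimal sketch with an illustrative name:

```c
/* L1 miss penalty = L2 hit time + L2 local miss rate * L2 miss penalty. */
double l1_miss_penalty(double hit_l2, double local_miss_rate_l2,
                       double miss_penalty_l2) {
    return hit_l2 + local_miss_rate_l2 * miss_penalty_l2;
}
```

The direct-mapped L2 gives `l1_miss_penalty(10, 0.25, 100)` = 35.0 cycles; the two-way L2 gives 30.1 with a 10.1-cycle hit time, and 30.0 or 31.0 once that hit time is rounded to 10 or 11 whole cycles.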

24. Three categories of misses
Compulsory: the very first access to a block cannot be in the cache, so the block must be brought into the cache (these are also called cold-start misses).
Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur because blocks are discarded and later retrieved.
Conflict: if the block placement strategy is set associative or direct mapped, conflict misses will occur because a block may be discarded and later retrieved if too many blocks map to its set.
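One common way to sort the misses of a real cache into these categories is to compare it against reference models: a miss is compulsory if the block was never referenced before; otherwise it is a capacity miss if a fully associative LRU cache of the same size would also miss, and a conflict miss if that fully associative cache would have hit. A minimal sketch of that approach for a direct-mapped cache; the 8-block size, the address limit, and the names are hypothetical, and the method itself is an assumption rather than something stated on the slide.

```c
#include <stdbool.h>

#define NUM_BLOCKS 8     /* hypothetical cache capacity, in blocks */
#define MAX_ADDR   1024  /* block addresses in the trace must be below this */

struct three_c { int compulsory, capacity, conflict; };

struct three_c classify(const int *trace, int n) {
    struct three_c c = {0, 0, 0};
    bool seen[MAX_ADDR] = {false};
    int dm[NUM_BLOCKS];                   /* direct-mapped frames */
    bool dm_valid[NUM_BLOCKS] = {false};
    int fa[NUM_BLOCKS], fa_used = 0;      /* fully associative LRU, fa[0] = LRU */

    for (int t = 0; t < n; t++) {
        int b = trace[t];

        /* Reference the fully associative LRU model. */
        int pos = -1;
        for (int i = 0; i < fa_used; i++)
            if (fa[i] == b) { pos = i; break; }
        bool fa_hit = pos >= 0;
        if (!fa_hit)
            pos = (fa_used == NUM_BLOCKS) ? 0 : fa_used++;  /* evict LRU or use a free slot */
        for (int i = pos; i < fa_used - 1; i++)            /* move b to the MRU end */
            fa[i] = fa[i + 1];
        fa[fa_used - 1] = b;

        /* Reference the real direct-mapped cache. */
        int f = b % NUM_BLOCKS;
        bool dm_hit = dm_valid[f] && dm[f] == b;
        dm_valid[f] = true;
        dm[f] = b;

        if (!dm_hit) {
            if (!seen[b])     c.compulsory++;
            else if (!fa_hit) c.capacity++;
            else              c.conflict++;
        }
        seen[b] = true;
    }
    return c;
}
```

For the trace 0, 8, 0, 8 the result is 2 compulsory and 2 conflict misses (blocks 0 and 8 fight over frame 0); cycling through blocks 0 to 8 twice gives 9 compulsory and 2 capacity misses, because nine blocks do not fit in eight frames.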

25. Reducing miss rate: the classical approaches
The simplest way to reduce miss rate is to increase the block size.
Question: Assume the memory system takes 80 clock cycles of overhead and then delivers 16 bytes every 2 clock cycles. Thus it can supply 16 bytes in 82 clock cycles, 32 bytes in 84 clock cycles, and so on. Calculate the average memory access time for different cache and block sizes.
Answer: Average memory access time = Hit time + Miss rate * Miss penalty
If we assume the hit time is 1 clock cycle independent of block size, then for a 16-byte block in a 4 KB cache we get 1 + (8.57% * 82) = 8.027 clock cycles, and for a 32-byte block in a 256 KB cache we get 1 + (0.7% * 84) = 1.588 clock cycles. Repeating this calculation for each combination, we can choose the block size with the smallest average memory access time for each cache size (for example, 32 bytes for a 4 KB cache and 64 bytes for larger caches).
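The miss penalty model (80 cycles of overhead, then 2 cycles per 16 bytes) and the resulting average memory access time can be sketched as below. Only the two miss rates quoted in the answer are used; other cache and block size combinations would need their own measured miss rates.

```c
/* Miss penalty: 80 cycles of overhead, then 2 cycles per 16 bytes delivered. */
double miss_penalty(int block_bytes) {
    return 80.0 + 2.0 * (block_bytes / 16);
}

/* Average memory access time with a 1-cycle hit time for every block size. */
double amat_block(int block_bytes, double miss_rate) {
    return 1.0 + miss_rate * miss_penalty(block_bytes);
}
```

`miss_penalty` gives 82, 84, and 88 cycles for 16-, 32-, and 64-byte blocks; `amat_block(16, 0.0857)` is about 8.027 and `amat_block(32, 0.007)` is 1.588.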

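26. Reducing miss rate: compiler optimizations
Loop interchange: C stores arrays in row-major order, so in the original loop nest the inner loop strides through memory 100 words at a time, while the interchanged version accesses consecutive words (assuming x is declared as double x[5000][100]):

```c
for (j = 0; j < 100; j = j + 1)         /* before: stride of 100 words */
    for (i = 0; i < 5000; i = i + 1)
        x[i][j] = 2 * x[i][j];

for (i = 0; i < 5000; i = i + 1)        /* after: sequential accesses */
    for (j = 0; j < 100; j = j + 1)
        x[i][j] = 2 * x[i][j];
```

Blocking: for example, matrix multiplication.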
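Blocking is only named on the slide. A minimal sketch of the usual blocked (tiled) matrix multiplication, assuming a hypothetical N = 64 and blocking factor BLK = 16: instead of streaming whole rows and columns through the cache, it works on BLK × BLK submatrices so they can be reused while still resident. Both versions compute the same product x = y × z.

```c
#define N   64   /* hypothetical matrix dimension */
#define BLK 16   /* hypothetical blocking factor */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Straightforward x = y * z: the inner loop reads z column by column. */
void matmul_naive(double x[N][N], double y[N][N], double z[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double r = 0;
            for (int k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked x = y * z: each (jj, kk) pass touches only a BLK-wide strip of y
   and a BLK x BLK tile of z, and adds a partial sum into x. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 0.0;                  /* partial sums accumulate into x */
    for (int jj = 0; jj < N; jj += BLK)
        for (int kk = 0; kk < N; kk += BLK)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + BLK, N); j++) {
                    double r = 0;
                    for (int k = kk; k < min_int(kk + BLK, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}
```

BLK would in practice be chosen so that the tiles being reused fit in the cache; 16 here is only a placeholder.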

27. Organization of main memory to improve performance
Performance measures of main memory emphasize both latency and bandwidth (the number of bytes read or written per unit of time). Assume the performance of the basic memory organization is:
4 clock cycles to send the address;
56 clock cycles for the access time per word;
4 clock cycles to send a word of data.
Given a cache block of 4 words and a word of 8 bytes, the miss penalty is 4 * (4 + 56 + 4) = 256 clock cycles, with a memory bandwidth of 1/8 byte per clock cycle (32 bytes in 256 clock cycles).

28. Organization of main memory to improve performance
Wider main memory: first-level caches are often organized with a physical width of 1 word because most CPU accesses are that size. Doubling the width of the cache and the memory doubles the memory bandwidth. With a memory width of two words, the miss penalty in our example would drop from 256 clock cycles to 128 (we need half as many memory accesses), and the bandwidth rises to 1/4 byte per clock cycle.
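The miss penalty and bandwidth for the basic and the wider memory can be computed from the three timing parameters. A minimal sketch with illustrative names, assuming the block size is a multiple of the memory width:

```c
/* Miss penalty when a block of block_words words is fetched from a memory
   width_words words wide; each access pays address + access + transfer time. */
int miss_penalty_cycles(int block_words, int width_words,
                        int addr_cycles, int access_cycles, int xfer_cycles) {
    int accesses = block_words / width_words;
    return accesses * (addr_cycles + access_cycles + xfer_cycles);
}

/* Memory bandwidth in bytes per clock cycle during a block fetch. */
double bytes_per_cycle(int block_words, int word_bytes, int penalty_cycles) {
    return (double)(block_words * word_bytes) / penalty_cycles;
}
```

For a 4-word block of 8-byte words, `miss_penalty_cycles(4, 1, 4, 56, 4)` is 256 cycles (1/8 byte per cycle), and a two-word-wide memory halves it to 128 cycles (1/4 byte per cycle).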

29. Organization of main memory to improve performance
Simple interleaved memory
Question: What can interleaving and wide memory buy? Consider the following description of a computer and its cache performance: block size = 1 word, memory bus width = 1 word, miss rate = 3%, memory accesses per instruction = 1.2, cache miss penalty = 64 cycles, average cycles per instruction (ignoring cache misses) = 2. If we change the block size to 2 words, the miss rate falls to 2%, and a 4-word block has a miss rate of 1.2%. What is the improvement in performance of interleaving two ways and four ways versus doubling the width of memory and the bus, assuming the access times from the previous example?

30. Organization of main memory to improve performance
Answer: The CPI for the base computer using 1-word blocks is 2 + (1.2 * 3% * 64) = 4.3.
Increasing the block size to 2 words gives the following options:
64-bit bus and memory, no interleaving = 2 + (1.2 * 2% * 2 * 64) = 5.07
64-bit bus and memory, interleaving = 2 + (1.2 * 2% * (4 + 56 + 8)) = 3.63
128-bit bus and memory, no interleaving = 2 + (1.2 * 2% * 1 * 64) = 3.54
Thus, doubling the block size slows down the straightforward implementation (5.07 versus 4.3).
If we increase the block size to four words, we obtain the following:
64-bit bus and memory, no interleaving = 2 + (1.2 * 1.2% * 4 * 64) = 5.69
64-bit bus and memory, interleaving = 2 + (1.2 * 1.2% * (4 + 56 + 16)) = 3.09
128-bit bus and memory, no interleaving = 2 + (1.2 * 1.2% * 2 * 64) = 3.84
Again, the larger block hurts performance for the simple case, although the interleaved 64-bit memory is now the fastest.
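The whole comparison reduces to one CPI formula and three miss penalty models. A minimal sketch, assuming one memory bank per word of the block for the interleaved case (two-way interleaving for 2-word blocks, four-way for 4-word blocks); the names are illustrative.

```c
/* CPI = 2 + 1.2 memory accesses per instruction * miss rate * miss penalty. */
double cpi(double miss_rate, double penalty) {
    return 2.0 + 1.2 * miss_rate * penalty;
}

/* Miss penalties for a block of b 64-bit words, with 4 cycles to send the
   address, 56 cycles of access time, and 4 cycles to transfer one word. */
double penalty_no_interleave(int b) { return b * 64.0; }        /* one full access per word */
double penalty_interleaved(int b)   { return 4 + 56 + 4.0 * b; } /* banks accessed in parallel */
double penalty_wide(int b)          { return (b / 2) * 64.0; }  /* 128-bit: two words per access */
```

For example, `cpi(0.03, penalty_no_interleave(1))` is the 4.3 baseline, and `cpi(0.012, penalty_interleaved(4))` is about 3.09, the fastest 4-word option.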

31. Virtual memory
Virtual memory divides physical memory into blocks and allocates them to different processes. Virtual memory controls two levels of the memory hierarchy (DRAM main memory and magnetic disks). It provides protection while letting processes share memory, automatically manages the memory hierarchy, and simplifies loading a program for execution: the program can be placed anywhere in physical memory or on disk by changing the mapping between them (relocation). There are two classes of virtual memory: paging, with fixed-size blocks (a power of 2 in size), and segmentation, with variable-size blocks.

32. Some useful definitions
Page or segment is used for a memory block. Page fault or address fault is used for a miss. The CPU uses virtual addresses that are translated by a combination of hardware and software into physical addresses, which access main memory. This process is called memory mapping or address translation.

33. Paging versus segmentation

| | Page | Segment |
|---|---|---|
| Words per address | One | Two (segment and offset) |
| Programmer visible? | Invisible to application programmer | May be visible to application programmer |
| Replacing a block | Trivial (all blocks are the same size) | Hard (must find contiguous, variable-size, unused portion of main memory) |
| Memory use inefficiency | Internal fragmentation (unused portion of page) | External fragmentation (unused pieces of main memory) |
| Efficient disk traffic | Yes (adjust page size to balance access time and transfer time) | Not always (small segments may transfer just a few bytes) |

34. Where can a block be placed in main memory?
The miss penalty for virtual memory involves access to a rotating magnetic storage device and is therefore quite high (1,000,000 to 10,000,000 clock cycles). So, given the choice between a lower miss rate and a simpler placement algorithm, the lower miss rate is selected because of the high miss penalty. Generally, operating systems allow blocks to be placed anywhere in main memory (fully associative).

35. How is a block found if it is in main memory?
[Figure: virtual address (virtual page/segment number, offset) → page/segment table → page/segment in main memory]
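A minimal sketch of paged address translation, assuming hypothetical 4 KB pages and a tiny 16-entry page table: the virtual page number indexes the table, the entry supplies the physical page frame, and the page offset is carried over unchanged. An invalid entry corresponds to a page fault. The names `struct pte` and `translate` are illustrative, not a real OS interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12   /* hypothetical 4 KB pages */
#define NUM_PAGES 16   /* hypothetical tiny virtual address space */

struct pte { bool valid; uint32_t frame; };   /* page table entry */

/* Returns false on a page fault (invalid entry or page number out of range);
   otherwise stores the physical address in *paddr. */
bool translate(const struct pte table[NUM_PAGES], uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_BITS;                  /* virtual page number */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);  /* unchanged by translation */
    if (vpn >= NUM_PAGES || !table[vpn].valid)
        return false;                                   /* OS must bring the page in */
    *paddr = (table[vpn].frame << PAGE_BITS) | offset;
    return true;
}
```

If virtual page 3 maps to physical frame 7, virtual address 0x3ABC translates to 0x7ABC, while an access to unmapped page 4 faults.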

36. Which block should be replaced on a virtual memory miss?
Almost all operating systems try to replace the least recently used (LRU) block because, if the past predicts the future, that is the one least likely to be needed. To help with this, a use bit (or reference bit) is provided, which is logically set whenever a page is accessed. The operating system periodically clears the use bits and later records them, so it can determine which pages were used during a particular time period.

37. What happens on a write?
The level below main memory contains rotating magnetic disks that take millions of clock cycles to access. So operating systems avoid writing through main memory to disk on every store by the CPU; the write strategy is always write back. A dirty bit allows blocks to be written to disk only if they have been altered since being read from the disk.

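38. Translation lookaside buffer (TLB)
[Figure: TLB organization. The virtual address is divided into address space number, virtual page number, and page offset; each TLB entry holds ASN, Prot, V, Tag, and Physical address fields; a 128:1 multiplexor selects the physical address.]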


More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

Memory Hierarchy: Motivation

Memory Hierarchy: Motivation Memory Hierarchy: Motivation The gap between CPU performance and main memory speed has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

ADMIN. SI232 Set #18: Caching Finale and Virtual Reality (Chapter 7) Down the home stretch. Split Caches. Final Exam Monday May 1 (first exam day)

ADMIN. SI232 Set #18: Caching Finale and Virtual Reality (Chapter 7) Down the home stretch. Split Caches. Final Exam Monday May 1 (first exam day) ADMIN SI232 Set #8: Caching Finale and Virtual Reality (Chapter 7) Ethics Discussion & Reading Quiz Wed April 2 Reading posted online Reading finish Chapter 7 Sections 7.4 (skip 53-536), 7.5, 7.7, 7.8

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor)

Chapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) Chapter 7-1 Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授 V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) 臺大電機吳安宇教授 - 計算機結構 1 Outline 7.1 Introduction 7.2 The Basics of Caches

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Systems Programming and Computer Architecture ( ) Timothy Roscoe

Systems Programming and Computer Architecture ( ) Timothy Roscoe Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture

More information

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)

More information

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

Introduction to OpenMP. Lecture 10: Caches

Introduction to OpenMP. Lecture 10: Caches Introduction to OpenMP Lecture 10: Caches Overview Why caches are needed How caches work Cache design and performance. The memory speed gap Moore s Law: processors speed doubles every 18 months. True for

More information

Memory Hierarchy and Caches

Memory Hierarchy and Caches Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information

12 Cache-Organization 1

12 Cache-Organization 1 12 Cache-Organization 1 Caches Memory, 64M, 500 cycles L1 cache 64K, 1 cycles 1-5% misses L2 cache 4M, 10 cycles 10-20% misses L3 cache 16M, 20 cycles Memory, 256MB, 500 cycles 2 Improving Miss Penalty

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

5 Memory-Hierarchy. Design

5 Memory-Hierarchy. Design 5 Memory-Hierarchy Design Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available.... We are... forced to recognize the possibility

More information

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review Memories: Review Chapter 7 Large and Fast: Exploiting Hierarchy DRAM (Dynamic Random Access ): value is stored as a charge on capacitor that must be periodically refreshed, which is why it is called dynamic

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information

Question?! Processor comparison!

Question?! Processor comparison! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!

More information

1. Creates the illusion of an address space much larger than the physical memory

1. Creates the illusion of an address space much larger than the physical memory Virtual memory Main Memory Disk I P D L1 L2 M Goals Physical address space Virtual address space 1. Creates the illusion of an address space much larger than the physical memory 2. Make provisions for

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

Logical Diagram of a Set-associative Cache Accessing a Cache

Logical Diagram of a Set-associative Cache Accessing a Cache Introduction Memory Hierarchy Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year Levels of memory with different sizes & speeds close to

More information

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

Introduction. Memory Hierarchy

Introduction. Memory Hierarchy Introduction Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year 1 Memory Hierarchy Levels of memory with different sizes & speeds close to

More information

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program

More information

COSC 6385 Computer Architecture. - Memory Hierarchies (II)

COSC 6385 Computer Architecture. - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

Memory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality

Memory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality Caching Chapter 7 Basics (7.,7.2) Cache Writes (7.2 - p 483-485) configurations (7.2 p 487-49) Performance (7.3) Associative caches (7.3 p 496-54) Multilevel caches (7.3 p 55-5) Tech SRAM (logic) SRAM

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

Memory Hierarchy. Mehran Rezaei

Memory Hierarchy. Mehran Rezaei Memory Hierarchy Mehran Rezaei What types of memory do we have? Registers Cache (Static RAM) Main Memory (Dynamic RAM) Disk (Magnetic Disk) Option : Build It Out of Fast SRAM About 5- ns access Decoders

More information

Cray XE6 Performance Workshop

Cray XE6 Performance Workshop Cray XE6 Performance Workshop Mark Bull David Henty EPCC, University of Edinburgh Overview Why caches are needed How caches work Cache design and performance. 2 1 The memory speed gap Moore s Law: processors

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies 1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Lecture 11 Cache. Peng Liu.

Lecture 11 Cache. Peng Liu. Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative

More information

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality Administrative EEC 7 Computer Architecture Fall 5 Improving Cache Performance Problem #6 is posted Last set of homework You should be able to answer each of them in -5 min Quiz on Wednesday (/7) Chapter

More information

Chapter 5. Topics in Memory Hierachy. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ.

Chapter 5. Topics in Memory Hierachy. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. Computer Architectures Chapter 5 Tien-Fu Chen National Chung Cheng Univ. Chap5-0 Topics in Memory Hierachy! Memory Hierachy Features: temporal & spatial locality Common: Faster -> more expensive -> smaller!

More information

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory

Memory systems. Memory technology. Memory technology Memory hierarchy Virtual memory Memory systems Memory technology Memory hierarchy Virtual memory Memory technology DRAM Dynamic Random Access Memory bits are represented by an electric charge in a small capacitor charge leaks away, need

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568/668 Part 11 Memory Hierarchy - I Israel Koren ECE568/Koren Part.11.1 ECE568/Koren Part.11.2 Ideal Memory

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: John Wawrzynek & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/ Typical Memory Hierarchy Datapath On-Chip

More information

Memory Technologies. Technology Trends

Memory Technologies. Technology Trends . 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 48 Computer Organization 5. The Basics of Cache Chansu Yu Caches: The Basic Idea A smaller set of storage locations storing a subset of information from a larger set (memory) Unlike registers or memory,

More information

a process may be swapped in and out of main memory such that it occupies different regions

a process may be swapped in and out of main memory such that it occupies different regions Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically

More information

Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!)

Basic Memory Hierarchy Principles. Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!) Basic Memory Hierarchy Principles Appendix C (Not all will be covered by the lecture; studying the textbook is recommended!) Cache memory idea Use a small faster memory, a cache memory, to store recently

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information