Cache memories

A cache is a small, fast memory which is transparent to the processor. The cache duplicates information that is in main memory.

With each data block in the cache, there is associated an identifier or tag. This allows the cache to be content addressable (e.g., like translation lookaside buffers).

[Figure: a key presented to the cache is compared against the stored tags; a match selects the associated information.]

A replacement rule must be used to purge entries from the cache when the space is needed for new information.

Caches are smaller and faster than main memory. Backing, or secondary, storage, on the other hand, is larger and slower.

[Figure: speed differences between cache memory and virtual memory, across cache, main memory, and secondary storage.] Main memory is about 10-50 times slower than a cache, but secondary memory is 1000 times slower than main memory.

A cache miss is the term analogous to a page fault. Cache misses must be handled much more quickly than page faults. Thus, they are handled in hardware.

Caches can be organized according to four different strategies:

  Direct
  Fully associative
  Set associative
  Sectored

A cache implements several different policies for retrieving and storing information, one in each of the following categories:

  Fetch policy determines when information is loaded into the cache.
  Replacement policy determines what information is purged when space is needed for a new entry.
  Write policy determines how soon information in the cache is written to lower levels in the memory hierarchy.

Cache memory organization

Information is moved into and out of the cache in blocks. When a block is in the cache, it occupies a cache line.

Blocks are usually larger than one byte (or word), to take advantage of locality in programs, and because memory may be organized so that it can overlap transfers of several words at a time. The block size is the same as the line size of the cache.

A placement policy determines where a particular block can be placed when it goes into the cache. E.g., is a block of memory eligible to be placed in any line in the cache, or is it restricted to a single line? In our examples, we assume the following configuration.
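The design choices above can be summarized as a configuration record. Here is a minimal sketch (ours, not from the notes); the specific policy names in each enum are common examples rather than an exhaustive list:

enum placement   { DIRECT, FULLY_ASSOCIATIVE, SET_ASSOCIATIVE, SECTORED };
enum fetch       { FETCH_ON_DEMAND, PREFETCH };
enum replacement { LRU, FIFO, RANDOM_CHOICE };
enum write_pol   { WRITE_THROUGH, WRITE_BACK };

struct cache_config {
    enum placement   placement;    /* where a block may go           */
    enum fetch       fetch;        /* when a block is loaded         */
    enum replacement replacement;  /* what is purged to make room    */
    enum write_pol   write;        /* when writes reach lower levels */
    unsigned line_size;            /* bytes per line (= block size)  */
    unsigned num_lines;
};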

The cache contains 2048 words, with 16 words per line. Thus it has 128 lines.

Main memory is made up of 256K words, or 16384 blocks. Thus an address consists of 18 bits: a 14-bit block number and a 4-bit word-within-block field.

We want to structure the cache to achieve a high hit ratio.

  Hit: the referenced information is in the cache.
  Miss: the referenced information is not in the cache; it must be read in from main memory.

  Hit ratio = Number of hits / Total number of references

As noted above, there are four different placement policies.

Direct

Only 1 choice of where to place a block: block i → line i mod 128.

Each line has its own tag associated with it. When the line is in use, the tag contains the high-order seven bits of the main-memory address of the block.
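To make the address arithmetic concrete, here is a minimal sketch (ours, not from the notes) of how the 18-bit word address of this example machine splits into tag, line, and word fields under direct mapping; the field widths follow the 7/7/4 breakdown described below.

#include <stdio.h>
#include <stdint.h>

/* Example machine from the notes: 256K-word main memory (18-bit word
   address), 2048-word cache, 16 words per line => 128 lines.
   Direct mapping: tag = 7 bits, line = 7 bits, word = 4 bits. */
#define WORD_BITS 4               /* 16 words per line      */
#define LINE_BITS 7               /* 128 lines in the cache */

int main(void) {
    uint32_t addr = 0x2A5F3 & 0x3FFFF;   /* an arbitrary 18-bit address */

    uint32_t word  = addr & ((1u << WORD_BITS) - 1);
    uint32_t line  = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    uint32_t tag   = addr >> (WORD_BITS + LINE_BITS);
    uint32_t block = addr >> WORD_BITS;  /* 14-bit block number */

    /* Direct mapping places block i in line i mod 128. */
    printf("block %u -> line %u (tag %u, word %u)\n", block, line, tag, word);
    return 0;
}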

[Figure: a direct-mapped cache of 128 lines, each with a 7-bit tag, mapped onto a main memory of 16384 blocks. Main-memory address fields: tag (7 bits), line (7 bits), word (4 bits).]

To search for a word in the cache,

1. Determine what line to look in (easy; just select bits 10-4 of the address).
2. Compare the leading seven bits (bits 17-11) of the address with the tag of the line. If it matches, the block is in the cache.
3. Select the desired word from the line.

Advantages: Fast lookup (only one comparison needed). Cheap hardware (no associative comparison). Easy to decide where to place a block (there is only one choice, so no replacement algorithm is needed).

Disadvantage: Contention for lines.
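The three lookup steps translate almost directly into code. The following is a compilable sketch of a direct-mapped lookup over the 128-line cache described above; the struct and function names are ours, not from the notes.

#include <stdbool.h>
#include <stdint.h>

#define NLINES         128
#define WORDS_PER_LINE 16

struct line {
    bool     valid;
    uint32_t tag;                      /* high-order 7 bits of the address */
    uint32_t data[WORDS_PER_LINE];
};

static struct line cache[NLINES];

/* Returns true on a hit and stores the word in *out. */
bool lookup(uint32_t addr, uint32_t *out) {
    uint32_t word = addr & 0xF;            /* bits 3-0            */
    uint32_t line = (addr >> 4) & 0x7F;    /* step 1: bits 10-4   */
    uint32_t tag  = addr >> 11;            /* step 2: bits 17-11  */

    if (cache[line].valid && cache[line].tag == tag) {
        *out = cache[line].data[word];     /* step 3              */
        return true;
    }
    return false;     /* miss: block must be fetched from main memory */
}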

Fully associative

A full 128 choices of where to place a block: block i → any free (or purgeable) cache line.

[Figure: a fully associative cache of 128 lines, each with a 14-bit tag, mapped onto a main memory of 16384 blocks. Main-memory address fields: tag (14 bits), word (4 bits).]

Each line has its own tag associated with it. When the line is in use, the tag contains the high-order fourteen bits of the main-memory address of the block.

To search for a word in the cache,

1. Simultaneously compare the leading 14 bits (bits 17-4) of the address with the tags of all lines. If any one matches, the block is in the cache.
2. Select the desired word from the line.

Advantages: Minimal contention for lines. Wide variety of replacement algorithms feasible.
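In hardware, the 128 tag comparisons happen simultaneously; in software we can only model that with a loop. A compilable sketch (names are ours) of the fully associative lookup:

#include <stdbool.h>
#include <stdint.h>

#define NLINES 128

struct fa_line {
    bool     valid;
    uint32_t tag;        /* high-order 14 bits of the address */
    uint32_t data[16];
};

static struct fa_line cache[NLINES];

/* Models the parallel tag match: in hardware all 128 comparators
   fire at once; here we scan sequentially. */
bool fa_lookup(uint32_t addr, uint32_t *out) {
    uint32_t word = addr & 0xF;
    uint32_t tag  = addr >> 4;           /* bits 17-4 */

    for (int i = 0; i < NLINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data[word];
            return true;
        }
    }
    return false;   /* miss: any free (or purgeable) line may receive the block */
}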

Disadvantage: The most expensive of all organizations, due to the high cost of associative-comparison hardware.

A flowchart of cache operation: The process of searching a fully associative cache is very similar to using a direct-mapped cache. Let us consider them in detail.

[Flowchart: The virtual address is split into a page number and a byte within the page, and the TLB is searched.
  On a TLB hit, the replacement status of the TLB entries is updated.
  On a TLB miss, a TLB victim is selected to be replaced, the virtual address is translated to a physical address, and the new (virtual, physical) address pair is entered in the TLB.
Either way, the physical address yields a block number and a byte within the block, and the tags of the cache lines are searched.
  On a cache hit, the replacement status of the cache entries is updated and the block is fetched from the cache.
  On a cache miss, a cache victim is selected to be replaced, the block is fetched from main memory, and the new block is stored in the cache.
Finally, the desired bytes are selected from the block and sent to the processor.]

Note that this diagram assumes a separate TLB.

Which steps would be different if the cache were directly mapped?
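The flowchart's control flow can also be read as code. Below is a runnable toy sketch (ours): a single-entry TLB and a single-line cache, with an identity page table standing in for translation. Every name and size here is an illustrative assumption, not hardware from the notes; only the sequence of decisions mirrors the flowchart.

#include <stdint.h>
#include <stdio.h>

/* Toy single-entry TLB and single-line cache: just enough state to
   make the flowchart's control flow executable. */
static uint32_t tlb_page = UINT32_MAX, tlb_frame;
static uint32_t line_block = UINT32_MAX, line_data;

static uint32_t translate(uint32_t page) { return page; }  /* toy identity page table */
static uint32_t fetch_from_memory(uint32_t block) { return block * 2; }  /* toy memory */

uint32_t access_memory(uint32_t vaddr) {
    uint32_t page   = vaddr >> 12;            /* assume 4 KB pages      */
    uint32_t offset = vaddr & 0xFFF;

    /* Search the TLB. */
    uint32_t frame;
    if (tlb_page == page) {
        frame = tlb_frame;                    /* TLB hit                */
    } else {
        frame = translate(page);              /* translate virt -> phys */
        tlb_page = page; tlb_frame = frame;   /* replace the TLB victim */
    }

    uint32_t block = ((frame << 12) | offset) >> 4;   /* 16-byte blocks */

    /* Search the cache tags. */
    if (line_block != block) {                /* cache miss             */
        line_data  = fetch_from_memory(block);
        line_block = block;                   /* store the new block    */
    }
    return line_data;                         /* send data to processor */
}

int main(void) {
    printf("%u\n", access_memory(0x1234));    /* TLB miss, cache miss   */
    printf("%u\n", access_memory(0x1238));    /* same page, same block: hits */
    return 0;
}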

Set associative

1 < n < 128 choices of where to place a block. A compromise between the direct and fully associative strategies.

The cache is divided into s sets, where s is a power of 2: block i → any line in set i mod s.

Each line has its own tag associated with it. When the line is in use, the tag contains the high-order eight bits of the main-memory address of the block. (The next six bits can be derived from the set number.)

[Figure: a two-way set-associative cache: 64 sets of two lines each, each line with an 8-bit tag, mapped onto a main memory of 16384 blocks. Main-memory address fields: tag (8 bits), set (6 bits), word (4 bits).]

To search for a word in the cache,

1. Select the proper set (i mod s).
2. Simultaneously compare the leading 8 bits (bits 17-10) of the address with the tags of all lines in the set. If any one matches, the block is in the cache. At the same time, the (first bytes/words of the) lines are also being read out, so they will be accessible at the end of the cycle.
3. If a match is found, gate the data from the proper block to the cache-output buffer.
4. Select the desired word from the line.

[Figure: the desired block number is compared against each line's tag in the set in parallel (one comparator per line); the matching comparator selects which line's data is gated to the cache output-data buffer.]

All reads from the cache occur as early as possible, to allow maximum time for the comparison to take place. Which line to use is decided late, after the data have reached high-speed registers, so the processor can receive the data fast.

To attain maximum speed in accessing data, we would like to start searching the cache at the same time we are looking up the page number in the TLB.
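A compilable sketch of the set-associative lookup, under the 64-set, two-way configuration pictured above (structure and names are ours):

#include <stdbool.h>
#include <stdint.h>

#define NSETS 64
#define WAYS   2

struct sa_line {
    bool     valid;
    uint32_t tag;        /* high-order 8 bits of the address */
    uint32_t data[16];
};

static struct sa_line cache[NSETS][WAYS];

bool sa_lookup(uint32_t addr, uint32_t *out) {
    uint32_t word = addr & 0xF;            /* bits 3-0   */
    uint32_t set  = (addr >> 4) & 0x3F;    /* bits 9-4   */
    uint32_t tag  = addr >> 10;            /* bits 17-10 */

    /* In hardware both ways are compared (and read out) at once;
       the loop stands in for the parallel comparators. */
    for (int w = 0; w < WAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            *out = cache[set][w].data[word];  /* gate data to the output buffer */
            return true;
        }
    }
    return false;
}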

When the bit-selection method is used, both can be done at once if the page number is disjoint from the set number. This means that the number of bits k in the set number, plus the number of bits j which determine the byte (or word) within a line, must be no more than the number of bits d in the displacement field.

[Figure: the address divided into page number and displacement; the set-number field (k bits) and byte-within-block field (j bits) must fit inside the d-bit displacement.]

We want k + j ≤ d. (If the page size is 2^d, then there will be d bits in the displacement field.)

Factors influencing line lengths:

  Long lines → higher hit ratios.
  Long lines → less memory devoted to tags.
  Long lines → longer memory transactions (undesirable in a multiprocessor).
  Long lines → more write-backs (explained below).

For most machines, line sizes between 16 and 64 bytes perform best.

If there are b lines per set, the cache is said to be b-way set associative. The logic to compare 2, 4, or 8 tags simultaneously can be made quite fast. But as b increases beyond that, cycle time starts to climb, and the higher cycle time begins to offset the increased associativity.
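As a worked example of the k + j ≤ d condition (the numbers are ours, chosen for illustration): with 4 KB pages, d = 12. A cache with 64-byte lines has j = 6, so the set number may use at most k = 12 - 6 = 6 bits, i.e. at most 64 sets. The sketch below checks the condition for a given configuration:

#include <stdio.h>

/* Can the cache lookup start in parallel with the TLB lookup?
   Bit selection works iff set bits (k) + byte-within-line bits (j)
   fit inside the page-displacement field (d), i.e. k + j <= d.
   All parameter values here are illustrative. */
int overlap_ok(unsigned page_size, unsigned line_size, unsigned nsets) {
    unsigned d = 0, j = 0, k = 0;
    while ((1u << d) < page_size) d++;   /* d = log2(page size)      */
    while ((1u << j) < line_size) j++;   /* j = log2(line size)      */
    while ((1u << k) < nsets)     k++;   /* k = log2(number of sets) */
    return k + j <= d;
}

int main(void) {
    /* 4 KB pages, 64-byte lines, 64 sets: 6 + 6 = 12 <= 12. OK.     */
    printf("%d\n", overlap_ok(4096, 64, 64));    /* prints 1 */
    /* 4 KB pages, 64-byte lines, 128 sets: 7 + 6 = 13 > 12. Not OK. */
    printf("%d\n", overlap_ok(4096, 64, 128));   /* prints 0 */
    return 0;
}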

Almost all caches built today are either direct mapped, or 2- or 4-way set-associative.

As cache size is increased, does a high degree of set associativity become more important?

A high degree of set associativity gives more freedom in deciding where to place a block in the cache. So it tends to decrease the chance of a conflict requiring a purge.

A large cache also tends to decrease the chance that two blocks will map to the same line. So it, too, tends to decrease the chance of a conflict requiring a purge.

Consider a pathological case: a cache with 128 lines and a program that references every 128th block. Each reference causes a miss in a direct-mapped cache. Increasing the set associativity increases the hit ratio (see the simulation sketch below). Increasing the cache size also increases the hit ratio.

So as cache size is increased, high associativity becomes less important.

Sectored

With a sectored cache, main memory is partitioned into sectors, each containing several blocks. The cache is partitioned into sector frames, each containing several lines. (The number of lines per sector frame = the number of blocks per sector.)

When block b of a new sector c is brought in, it is brought into line b within some sector frame f, and the rest of the lines in sector frame f are marked invalid. Thus, if there are S sector frames, there are S choices of where to place a block.

There are 128 lines in the cache. Assume there are 16 blocks per sector. Then there are only 8 sector frames in the cache.
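Returning to the pathological case above: the simplest concrete instance (our choice of workload) is a program that alternates between two blocks 128 apart, which collide in a direct-mapped cache but coexist in one set of a 2-way set-associative cache of the same size. A runnable simulation:

#include <stdio.h>
#include <stdint.h>

#define NLINES 128

/* Alternating references to blocks 0 and 128: both map to line 0 when
   direct mapped, but fit together in set 0 when 2-way set associative. */
int main(void) {
    int32_t dm[NLINES];                 /* direct-mapped: tag per line */
    int32_t sa[NLINES / 2][2];          /* 2-way: 64 sets, 2 tags each */
    for (int i = 0; i < NLINES; i++) dm[i] = -1;
    for (int s = 0; s < NLINES / 2; s++) sa[s][0] = sa[s][1] = -1;

    int dm_miss = 0, sa_miss = 0;
    for (int r = 0; r < 1000; r++) {
        int32_t block = (r % 2) ? 128 : 0;

        /* Direct mapped: block -> line block mod 128. */
        int line = block % NLINES;
        if (dm[line] != block) { dm_miss++; dm[line] = block; }

        /* 2-way set associative: block -> set block mod 64, either way. */
        int set = block % (NLINES / 2);
        if (sa[set][0] != block && sa[set][1] != block) {
            sa_miss++;
            sa[set][1] = sa[set][0];    /* evict the older way */
            sa[set][0] = block;
        }
    }
    printf("direct-mapped misses: %d\n", dm_miss);   /* 1000: every reference */
    printf("2-way misses:         %d\n", sa_miss);   /* 2: cold misses only   */
    return 0;
}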

Thus, when a frame is in use, its tag contains the high-order 10 bits of the main-memory address of the block.

Since there are relatively few tags, all tags can be searched simultaneously.

[Figure: a sectored cache of 8 sector frames, each holding 16 lines with a shared 10-bit tag and a valid bit per line, mapped onto a main memory of 1024 sectors. Main-memory address fields: sector/tag (10 bits), block (4 bits), word (4 bits).]

Sectored caches were popular in the early days of mainframes. They have made a comeback recently, with the increasing popularity of on-chip caches, where space is severely limited.

Diminishing returns: As the cache size grows larger, it becomes more and more difficult to improve performance by increasing the size of the cache:
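A compilable sketch (ours) of what happens on a miss to a block of a new sector: the block is loaded into its line within some sector frame, and the frame's other lines are invalidated.

#include <stdbool.h>
#include <stdint.h>

#define NFRAMES           8    /* sector frames in the cache */
#define BLOCKS_PER_SECTOR 16   /* = lines per sector frame   */

struct frame {
    uint32_t sector;                      /* 10-bit sector tag    */
    bool     valid[BLOCKS_PER_SECTOR];    /* one valid bit / line */
    uint32_t data[BLOCKS_PER_SECTOR][16];
};

static struct frame cache[NFRAMES];

/* Bring block b of new sector c into the cache; f is the frame chosen
   by the replacement policy (any of the S frames is eligible). */
void sector_miss(uint32_t c, uint32_t b, int f,
                 const uint32_t block_data[16]) {
    cache[f].sector = c;
    for (int i = 0; i < BLOCKS_PER_SECTOR; i++)
        cache[f].valid[i] = false;        /* invalidate the rest of the frame */
    for (int i = 0; i < 16; i++)
        cache[f].data[b][i] = block_data[i];
    cache[f].valid[b] = true;             /* only block b is valid so far */
}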

[Figure: miss ratio vs. cache size; the curve flattens as the cache grows.]

The 30% rule: Doubling cache size reduces the number of misses by 30%.

Cache memory need not grow linearly with physical memory. (Program locality doesn't grow linearly with physical memory.)

Modern processor architecture uses two, or even three, levels of caches. The first level captures most of the hits. The second level is still much faster than main memory.

Performance is similar to what would be obtained with a cache as fast as the first level. Cost is similar to that of the second level.

[Figure: the memory hierarchy, from first-level cache to second-level cache to main memory, fastest at the top.]
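To see why a two-level hierarchy performs close to the first level while costing close to the second, consider the average memory-access time. A small worked calculation; the hit times and miss rates below are illustrative assumptions, not figures from the notes:

#include <stdio.h>

/* Average memory-access time for a two-level cache:
   AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)
   All numbers are illustrative assumptions. */
int main(void) {
    double t_l1  = 1.0,  m_l1 = 0.05;  /* 1-cycle L1, 5% miss rate   */
    double t_l2  = 10.0, m_l2 = 0.20;  /* 10-cycle L2, 20% miss rate */
    double t_mem = 100.0;              /* 100-cycle main memory      */

    double amat = t_l1 + m_l1 * (t_l2 + m_l2 * t_mem);
    /* = 1 + 0.05 * (10 + 0.20 * 100) = 1 + 0.05 * 30 = 2.5 cycles:
       close to the 1-cycle L1, even though memory costs 100 cycles. */
    printf("AMAT = %.2f cycles\n", amat);
    return 0;
}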