CS 320 Ch 4 Cache Memory

The author lists eight classifications for memory systems: location, capacity, unit of transfer, access method (there are four: sequential, direct, random, and associative), performance (there are three measures: access time, cycle time, and transfer rate), physical type (there are four: semiconductor, magnetic, optical, and magneto-optical), physical characteristics (such as packaging, volatility, and erasability), and organization. Memory systems get progressively faster each year, and memory speed needs to be matched to CPU speed. If the memory is too slow it becomes a bottleneck; if it is too fast (which is unlikely) you are wasting your money. In general, the faster the access time, the higher the cost for a given technology. In general, the larger the memory, the greater the cost for a given technology. In general, the larger the memory, the slower the access time; this is due to address decoding.

The memory hierarchy shows that faster, smaller, more expensive, more frequently used (and volatile) memory sits at the top. The principle of locality of reference says that the next location you need is very likely to be close to the location you just used; this follows from the von Neumann architecture, which stores the program sequentially. Cache memory is added to most desktop computer systems to provide high speed at low cost: not the highest speed and not the lowest cost, but a reasonable tradeoff. Data is transferred between the CPU and the cache in words; data is transferred between the cache and main memory in blocks. Mapping is the process used to translate a main memory address to a cache address. Mapping is necessary because the cache is smaller than main memory and has a different address space. Main memory is organized into blocks, each of which contains a number of words. Cache is organized into cache lines (or simply lines). A line is an area of cache that stores a block of memory copied from main memory. A line also stores a tag, which allows us to identify which block it holds.
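As a rough illustration of the line-plus-tag idea (a sketch, not from the text; the 4-byte block size and the valid bit are assumptions chosen to match the examples that follow), a cache line can be modeled like this:

```c
#include <stdint.h>

#define BLOCK_SIZE 4   /* bytes per block, matching the examples below */

/* One cache line: the tag identifies which memory block is stored,
 * the valid bit says whether the line holds anything at all, and
 * the data array is a copy of one block from main memory. */
struct cache_line {
    uint32_t tag;
    int      valid;
    uint8_t  data[BLOCK_SIZE];
};
```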

The author lists six design parameters for cache systems. Cache size: cache is fast and expensive, and making it larger does not significantly improve efficiency. Mapping function: the mapping function tells us how a block of memory gets mapped into the cache address space. Replacement algorithm: the replacement algorithm tells us what to throw away (replace) when the cache is full. Write policy: data can be in both the cache and main memory, so when you write to the cache the write policy determines how main memory gets updated. Line size: the line size tells us how many words of memory go into a line; since the cache size is limited, a larger line size means fewer lines and vice versa. Number of caches: it is now common to place a cache on the CPU and a second or third cache nearby; three caches are not unusual.

Figure 4.5 shows a flow chart for accessing a cache memory. There are three mapping functions currently in use: direct, associative, and set associative. Direct mapping: in direct mapping, blocks of main memory are mapped into the cache such that each block has a unique cache line to which it maps. The main memory address is divided into three fields called the tag, line, and word. The word specifies the word (or byte) within the block or line. The line specifies the line number in the cache where the block is to be stored. The tag is the most significant address bits. For a given address, the tag is checked against the tag stored at that line number to verify that the data is in the cache.

How direct mapping works for a cache memory: for a system with a 64K cache, 16M main memory, and a block size of 4 bytes, how is the 24 bit address divided into its component parts? (In this case a byte is a word.) 16M => 24 bit address. 64K cache / 4 bytes per block = 16K lines = 14 bit line number. 4 bytes per block means 2 bits for the word. The remaining 8 bits become the tag. Result: 8 bit tag, 14 bit line, 2 bit word. Since multiple blocks in main memory map to the same cache line, when the CPU accesses data that is not in the cache, the block currently in that line gets replaced. The advantages of direct mapping are simplicity and low cost. The disadvantage of direct mapping is a potentially high miss ratio when successively accessed data maps to the same line.
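That field split can be sketched in C as follows (illustrative only; the address value is arbitrary, and the 8/14/2 widths are the ones just derived):

```c
#include <stdint.h>
#include <stdio.h>

/* Field widths from the worked example: 8-bit tag, 14-bit line,
 * 2-bit word for a 24-bit address. */
#define WORD_BITS 2
#define LINE_BITS 14

int main(void) {
    uint32_t addr = 0x123456;   /* arbitrary 24-bit address, for illustration */
    unsigned word = addr & ((1u << WORD_BITS) - 1);
    unsigned line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    unsigned tag  = addr >> (WORD_BITS + LINE_BITS);
    printf("tag=0x%02X line=0x%04X word=%u\n", tag, line, word);
    return 0;
}
```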

Associative mapping: for associative mapping, a memory block can go into any line in the cache. The address is divided into only two parts, the tag and the word. Any memory block can go to any line, so it is not necessary to specify a line. When a main memory address is generated, the tag is compared against every tag in the cache to determine whether the block is present. If the tag is present, the word bits are used to locate the particular word needed. A figure in the text illustrates how associative mapping works.

For a system with a 64K cache, 16M main memory, and a block size of 4 bytes, how is the 24 bit address divided into its component parts for an associative mapped cache? Since there are 4 bytes per line we need 2 bits for the word; the remaining 22 bits are the tag. The advantage of associative mapping is that any block in memory can go to any line in the cache, so it potentially gives a higher hit ratio. The disadvantage of associative mapping is the expensive and complex circuitry needed to compare the tag against all tags in the cache in parallel. (Associative memory: with associative memory you have the data and you want to know whether that data is in the memory. To make this fast enough, the associative lookup must be done in hardware.) Set associative mapping: in set associative mapping the cache is divided into sets, each of which contains several lines. Each memory address specifies a fixed set, but the block can go into any line within that set. This is a compromise between direct mapping and fully associative mapping. The main memory address is divided into three fields called the tag, set, and word. The word specifies the word (or byte) within the block.

The set specifies the set of lines in the cache where the particular memory block can be stored. The tag is the most significant address bits. For a given address, the tag is checked against all of the tags in the set to verify that the data is in the cache. For a system with a 64K cache, 16M main memory, and a block size of 4 bytes, how is the 24 bit address divided into its component parts if the cache is two-way set associative? 64K cache / 4 bytes per block = 16K lines. With two lines per set there are 8K sets = 13 bit set number. 4 bytes in a block means 2 bits for the word. The remaining 9 bits become the tag. Result: 9 bit tag, 13 bit set, 2 bit word. If the memory/cache system in the above problem is 4-way set associative instead of 2-way set associative, how does the address get divided into its component parts? 64K cache / 4 bytes per block = 16K lines. With four lines per set there are 4K sets = 12 bit set number. 4 bytes in a block means 2 bits for the word. The remaining 10 bits become the tag. Result: 10 bit tag, 12 bit set, 2 bit word. The advantage of set associative mapping is that it retains the low cost of a direct mapped cache while achieving a hit ratio close to that of a fully associative cache. The disadvantage of a set associative cache is that it is slightly higher in cost and more complex than a direct mapped cache. A figure in the text shows the relationship between associativity, cache size, and hit ratio.
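The same kind of field split can be sketched for the set associative cases (illustrative only; the 13-bit and 12-bit set widths are the values derived above, and the address is arbitrary):

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 24-bit address into tag/set/word fields.
 * The word field is 2 bits (4-byte blocks); the set width
 * depends on the associativity. */
static void split(uint32_t addr, unsigned set_bits) {
    unsigned word_bits = 2;
    unsigned word = addr & ((1u << word_bits) - 1);
    unsigned set  = (addr >> word_bits) & ((1u << set_bits) - 1);
    unsigned tag  = addr >> (word_bits + set_bits);
    printf("tag=0x%X set=0x%X word=%u\n", tag, set, word);
}

int main(void) {
    split(0x123456, 13);   /* 2-way:  9-bit tag, 13-bit set, 2-bit word */
    split(0x123456, 12);   /* 4-way: 10-bit tag, 12-bit set, 2-bit word */
    return 0;
}
```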

Replacement algorithms: a replacement algorithm determines what gets replaced in the cache when it is full. For a direct mapped cache there is no choice: every block has a fixed location in the cache. The replacement algorithms currently in use are LRU (least recently used), random (pick an arbitrary line to discard), FIFO (first in first out), and LFU (least frequently used). LRU replaces the block that has been in the cache longest without being referenced. LFU replaces the block that has had the fewest references. In terms of additional hardware, LRU needs some bits to track recency of use, while LFU needs a counter to track frequency of use. FIFO replaces the block that has been in the cache the longest. Surprisingly, random replacement is nearly as efficient as the algorithms based on usage. Write policy: the write policy governs how memory is written to in the presence of a cache. The author discusses two commonly used write policies, write through and write back. With write through, every write to the cache also writes simultaneously to main memory. The advantage of write through is that main memory is always up to date and the policy is easy to implement; the disadvantage is that it may generate a lot of bus traffic. About 15% of instructions involve write operations, but this may go as high as 50% in some applications. With write back, writes to the cache do not cause an automatic write to main memory; memory is written only when the cache line is replaced. Write back reduces bus traffic, but memory may be out of date if it is used by multiple sources. This technique also requires some additional hardware in the form of an update (dirty) bit in the cache.
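A minimal sketch of LRU victim selection within one set (the per-line timestamp and the 4-way set size are illustrative assumptions; real hardware typically approximates LRU with a few status bits rather than full timestamps):

```c
#include <stdint.h>

#define WAYS 4   /* lines per set; illustrative */

struct line {
    uint32_t tag;
    int      valid;
    uint64_t last_used;   /* time of most recent reference (assumed counter) */
};

/* Pick the victim for LRU replacement: an invalid line if one exists,
 * otherwise the line whose last reference is oldest. */
static int lru_victim(const struct line set[WAYS]) {
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid)
            return i;                               /* empty line: use it first */
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    }
    return victim;
}
```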

Cache coherency: in systems where memory can be changed by more than one source, this term refers to whether a cached copy of a location is still valid. The cache may be out of date if memory is changed by an outside source, by way of DMA for example. Non-cacheable memory: some portion of main memory is not allowed to go into the cache; in this way the data can be shared and changed without fear of cache coherency problems. There is no optimal value for the line size; line sizes of 8 to 32 bytes are most frequently used as empirically near optimal. Larger lines reduce the number of lines that fit in the cache, meaning that blocks will have to be replaced more often. Cache systems may be multilevel: this typically involves two caches, one on the CPU chip and a second off chip, referred to as the L1 and L2 caches. A cache may also be split, meaning there is a separate area for instructions and a separate area for data. This is typical since instruction accesses and data accesses do not share much locality of reference. The disadvantages of a split cache are that there is less room for instructions and it is slightly more complicated to implement. The Pentium 4 has three on-board caches: two L1 caches, one for instructions and one for data, and one unified L2 cache. An L3 cache is available only on high-end versions.

The P4's L2 cache is organized as 8-way set associative with a line size of 128 bytes and an overall size of 256K. The two L1 caches on the P4 are organized as 4-way set associative, with a line size of 64 bytes and a total of 8K. From Ch 14, the L1 instruction cache is a trace cache with a different organization that supports out-of-order instruction execution. Note that the P4 L1 instruction cache comes after the instruction decoder and thus holds decoded micro-operations; these are more RISC-like and support the superscalar architecture of the P4. The P4 data cache uses write back; the instruction cache needs no write policy. The ARM processors mostly use a logical cache, with the exception of the ARM11, which uses a physical cache. The difference between the two is that a logical cache caches virtual addresses while a physical cache caches physical addresses.
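As a quick check of those numbers (a sketch using only the figures quoted above for the L2 cache), the geometry works out to 2048 lines grouped into 256 sets, which implies an 8-bit set field and a 7-bit byte offset:

```c
#include <stdio.h>

int main(void) {
    /* Figures quoted above for the P4 L2 cache */
    unsigned cache_bytes = 256 * 1024;   /* 256K total      */
    unsigned line_bytes  = 128;          /* 128-byte lines  */
    unsigned ways        = 8;            /* 8-way set assoc */

    unsigned lines = cache_bytes / line_bytes;   /* 2048 lines */
    unsigned sets  = lines / ways;               /* 256 sets   */
    printf("lines=%u sets=%u\n", lines, sets);
    return 0;
}
```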