
Chapter 6 Memory

Chapter 6 Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how that performance is measured. Master the concepts of cache memory, virtual memory, memory segmentation, and paging.

6.1 Introduction Memory lies at the center of the stored-program computer. In the previous chapter, we studied the ways in which memory is accessed by various ISAs. In this chapter, we focus on memory organization. A clear understanding of these ideas is essential for the analysis of system performance.

6.2 Types of Memory There are two kinds of main memory: random access memory (RAM) and read-only memory (ROM). There are two types of RAM: static RAM (SRAM), built from flip-flops, which is very fast and used for cache memory; and dynamic RAM (DRAM), built from capacitors that require periodic refresh, which is slower but simpler in design and cheaper. Common DRAM variants include SDRAM (Synchronous DRAM), DRDRAM (Direct Rambus DRAM), and DDR DRAM (Double Data Rate DRAM).

6.2 Types of Memory ROM is used to store permanent data that persists even while the system is turned off (it is non-volatile). Types of ROM include ROM, PROM, EPROM, EEPROM, and flash memory.

6.3 The Memory Hierarchy This storage organization can be thought of as a pyramid, with small, fast storage (registers, cache) at the top and large, slow storage (disk) at the bottom.

6.3 The Memory Hierarchy To access a particular piece of data, the CPU first sends a request to its nearest memory, usually cache. If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk. Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.

6.3 The Memory Hierarchy To access a particular piece of data, the CPU sends a request to the memory. [Figure: the address in the Memory Address Register is decoded to select a word (n bits per word); data for reads and writes passes through the Memory Buffer Register.]

6.3 The Memory Hierarchy If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk. [Figure: blocks of data are transferred between the upper (higher) level, closer to the processor, and the lower level.]

6.3 The Memory Hierarchy Once the data is located, the data and a number of its nearby data elements are fetched into cache memory. [Figure: the same two-level diagram, showing a block being transferred from the lower level to the upper level.]

6.3 The Memory Hierarchy Locality of reference (the principle of locality): the same value or related storage locations are frequently accessed. There are three forms of locality: temporal locality, where recently accessed data elements tend to be accessed again; spatial locality, where accesses tend to cluster around nearby addresses; and sequential locality, where instructions tend to be accessed sequentially.
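Spatial locality can be illustrated with a small sketch (not from the text; the block size and access patterns are invented for illustration). A toy cache fetches 8-word blocks, so sequential accesses hit on the neighbors brought in with each miss, while a stride of one block misses every time:

```python
# Toy model of spatial locality: a cache fetches 8-word blocks, so
# sequential accesses reuse the 7 neighbors fetched with each miss,
# while a stride of 8 words gets no spatial reuse at all.

BLOCK_SIZE = 8  # words per block (illustrative)

def count_misses(addresses):
    """Count block fetches for a sequence of word addresses."""
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block not in cached_blocks:
            misses += 1
            cached_blocks.add(block)  # fetching brings in the whole block
    return misses

sequential = list(range(64))       # 64 consecutive words
strided = list(range(0, 512, 8))   # 64 words, one per block

print(count_misses(sequential))  # 8 misses: one per 8-word block
print(count_misses(strided))     # 64 misses: every access fetches a block
```

Both traces touch 64 words, but the sequential one needs only 8 block fetches; this is exactly the payoff of spatial locality.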

6.3 The Memory Hierarchy [Figure: a short assembly program stored at consecutive addresses 100–10C (Load, Add, Store, Jns, Load, Subt, Skipcond, Jump, Halt, plus their operands), illustrating sequential and spatial locality.]

6.3 The Memory Hierarchy This leads us to some definitions. A hit is when data is found at a given memory level. A miss is when it is not found. The hit rate is the percentage of accesses in which data is found at a given memory level; the miss rate is the percentage in which it is not. Miss rate = 1 - hit rate. The hit time is the time required to access data at a given memory level. The miss penalty is the time required to process a miss, including the time it takes to replace a block of memory plus the time it takes to deliver the data to the processor.

6.3 The Memory Hierarchy [Figure: the two-level processor/upper level/lower level diagram, annotated with the terms hit, miss, hit rate, miss rate (= 1 - hit rate), hit time, and miss penalty.]

6.4 Cache Memory The simplest cache mapping scheme is direct mapped cache. In a direct mapped cache consisting of N blocks, block X of main memory maps to cache block Y = X mod N. Thus, if we have 10 blocks of cache, cache block 7 may hold main-memory blocks 7, 17, 27, 37, and so on. Once a block of memory is copied into its slot in cache, a valid bit is set for the cache block to let the system know that the block contains valid data. What could happen without a valid bit?
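As a quick sketch (not from the text), the placement rule Y = X mod N can be written directly:

```python
# The direct-mapped placement rule: with N cache blocks,
# main-memory block X maps to cache block X mod N.

N = 10  # blocks of cache, as in the 10-block example above

def cache_block(memory_block, n_blocks=N):
    return memory_block % n_blocks

# Main-memory blocks 7, 17, 27, 37 all map to cache block 7:
print([cache_block(x) for x in (7, 17, 27, 37)])  # [7, 7, 7, 7]
```

Because many memory blocks share one cache slot, the cache must also record which memory block currently occupies the slot; that is the job of the tag field introduced below.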

6.4 Cache Memory [Figure: a step-by-step example with a four-block cache and one-word blocks, showing main-memory blocks 0–7 being mapped into cache blocks 0–3.]

6.4 Cache Memory Mapping: block size is one word of data; the cache has 8 entries; the cache entry is the memory address modulo the number of blocks in the cache.

6.4 Cache Memory [Figure: an 8-entry direct-mapped cache (indices 000–111); memory addresses ending in 001 (00001, 01001, 10001, 11001) map to cache entry 001, and addresses ending in 101 (00101, 01101, 10101, 11101) map to entry 101, i.e., each address maps to the entry given by the address modulo 8.]

6.4 Cache Memory Problem: how do we know whether the requested word is in the cache? Solution: a tag field, added to each data word in the cache. It contains address information identifying where the data belongs, and needs to contain only the upper portion of the address.

6.4 Cache Memory Problem: how do we know whether the block holds valid information? Solution: a valid bit, added to each entry in the cache. 1 means the data is valid; 0 means it is invalid.

Direct Mapped Cache Example Draw the cache after inserting the following blocks: 10110, 11010, 10000, 00011, 10010. Initial cache:

Index  V  Tag  Data
000    0
001    0
010    0
011    0
100    0
101    0
110    0
111    0

Direct Mapped Cache After 10110:

Index  V  Tag  Data
000    0
001    0
010    0
011    0
100    0
101    0
110    1  10   Memory(10110)
111    0

After 11010:

Index  V  Tag  Data
000    0
001    0
010    1  11   Memory(11010)
011    0
100    0
101    0
110    1  10   Memory(10110)
111    0

Direct Mapped Cache After 10000:

Index  V  Tag  Data
000    1  10   Memory(10000)
001    0
010    1  11   Memory(11010)
011    0
100    0
101    0
110    1  10   Memory(10110)
111    0

After 00011:

Index  V  Tag  Data
000    1  10   Memory(10000)
001    0
010    1  11   Memory(11010)
011    1  00   Memory(00011)
100    0
101    0
110    1  10   Memory(10110)
111    0

Direct Mapped Cache After 10010 (which evicts 11010, since both share index 010):

Index  V  Tag  Data
000    1  10   Memory(10000)
001    0
010    1  10   Memory(10010)
011    1  00   Memory(00011)
100    0
101    0
110    1  10   Memory(10110)
111    0
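The worked example above can be replayed with a minimal direct-mapped cache sketch (Python used for illustration; the representation of entries is invented): 5-bit block addresses, 8 entries, index = low 3 bits, tag = high 2 bits.

```python
# A minimal direct-mapped cache that replays the worked example:
# 5-bit block addresses, 8 cache entries, index = low 3 bits,
# tag = high 2 bits. Each entry is a (valid, tag, data) tuple.

def insert(cache, address):
    index = address & 0b111   # low 3 bits select the entry
    tag = address >> 3        # high 2 bits are stored as the tag
    cache[index] = (1, tag, f"Memory({address:05b})")

cache = {i: (0, None, None) for i in range(8)}  # all entries invalid
for addr in (0b10110, 0b11010, 0b10000, 0b00011, 0b10010):
    insert(cache, addr)

# 10010 evicted 11010: both map to index 010 but carry different tags.
print(cache[0b010])  # (1, 2, 'Memory(10010)')
```

The final dictionary matches the last table above entry for entry, including the eviction at index 010.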

Direct Mapped Cache Check: which of the references 10001, 00110, 10010, 11111, 01110 hit in this cache?

Index  V  Tag  Data
000    1  10   5120
001    0
010    1  10   124A
011    1  00   74B0
100    0
101    0
110    1  10   381F
111    0

Direct Mapped Cache A schematic of what the cache looks like: block 0 contains multiple words from main memory, identified with the tag 00000000; block 1 contains words identified with the tag 11110101. The other two blocks are not valid.

Direct Mapped Cache The size of each field into which a memory address is divided depends on the size of the cache. Suppose our memory consists of 2^14 words, the cache has 16 = 2^4 blocks, and each block holds 8 = 2^3 words. Thus memory is divided into 2^14 / 2^3 = 2^11 blocks. For our field sizes, we need 4 bits for the block, 3 bits for the word, and the tag is what is left over: 14 - 4 - 3 = 7 bits.

Direct Mapped Cache As an example, suppose a program generates the address 1AA. In 14-bit binary, this number is 00000110101010. The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block.

Direct Mapped Cache If the program subsequently generates the address 1AB, it will find the data it is looking for in block 0101, word 011. However, if the program instead generates the address 3AB, the block loaded for address 1AA will be evicted from the cache and replaced by the block associated with the 3AB reference, because both addresses map to the same cache block.
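These address splits can be checked with a short sketch (field widths taken from the running example; Python used for illustration):

```python
# Splitting a 14-bit address into tag / block / word fields
# (7 / 4 / 3 bits), following the layout in the running example.

def split_address(addr, block_bits=4, word_bits=3):
    word = addr & ((1 << word_bits) - 1)                   # low 3 bits
    block = (addr >> word_bits) & ((1 << block_bits) - 1)  # next 4 bits
    tag = addr >> (word_bits + block_bits)                 # remaining 7 bits
    return tag, block, word

tag, block, word = split_address(0x1AA)
print(f"{tag:07b} {block:04b} {word:03b}")  # 0000011 0101 010

# 1AB shares 1AA's tag and block (a hit in the already-loaded block);
# 3AB shares only the block field, so it evicts 1AA's block.
print(split_address(0x1AB)[:2] == split_address(0x1AA)[:2])  # True
print(split_address(0x3AB)[1] == split_address(0x1AA)[1])    # True
```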

Fully Associative Cache Instead of placing memory blocks in specific cache locations based on memory address, we could allow a block to go anywhere in cache. In this way, cache would have to fill up before any blocks are replaced. This is how fully associative cache works. A memory address is partitioned into only two fields: the tag and the word.

Fully Associative Cache [Figure: fully associative cache organization; any main-memory block can occupy any cache block.]

Fully Associative Cache Suppose we have 14-bit memory addresses and a cache with 16 blocks, each block holding 8 words. The field format of a memory reference is an 11-bit tag and a 3-bit word field. When the cache is searched, all tags are searched in parallel to retrieve the data quickly. This requires special, costly hardware.

Fully Associative Cache There is no fixed mapping: each memory block can be placed in any cache block. The block that is replaced is called the victim block. Which block should be replaced? There are a number of ways to pick a victim, described next.

Replacement Policy An optimal replacement policy would be able to look into the future to see which blocks won't be needed for the longest period of time. Although it is impossible to implement an optimal replacement algorithm, it is instructive to use it as a benchmark for assessing the efficiency of other schemes.

Replacement Policy A least recently used (LRU) algorithm keeps track of the last time each block was accessed and evicts the block that has been unused for the longest period of time. The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.

Replacement Policy First-in, first-out (FIFO) is a popular cache replacement policy. In FIFO, the block that has been in the cache the longest is replaced, regardless of when it was last used.

Replacement Policy A random replacement policy does what its name implies: it picks a block at random and replaces it with a new block. Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes.
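The difference between LRU and FIFO shows up whenever a resident block is reused. The sketch below (invented for illustration, not from the text) traces both policies on a 4-block fully associative cache:

```python
from collections import OrderedDict, deque

# Victim selection for a 4-block fully associative cache.
# Block contents are omitted; we only track which blocks are resident.

def lru_trace(accesses, capacity=4):
    """Return the victims evicted by a least-recently-used policy."""
    cache, victims = OrderedDict(), []
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)                   # refresh recency on a hit
        else:
            if len(cache) == capacity:
                victim, _ = cache.popitem(last=False)  # least recently used
                victims.append(victim)
            cache[block] = True
    return victims

def fifo_trace(accesses, capacity=4):
    """Return the victims evicted by a first-in, first-out policy."""
    cache, order, victims = set(), deque(), []
    for block in accesses:
        if block not in cache:
            if len(cache) == capacity:
                victim = order.popleft()               # oldest arrival
                cache.discard(victim)
                victims.append(victim)
            cache.add(block)
            order.append(block)
    return victims

trace = [1, 2, 3, 4, 1, 5]  # block 1 is reused just before 5 arrives
print(lru_trace(trace))   # [2]  -- LRU spares the recently reused block 1
print(fifo_trace(trace))  # [1]  -- FIFO evicts block 1, the oldest arrival
```

Note how LRU's bookkeeping (the recency reordering on every hit) is exactly the access history the slide warns about; FIFO needs only the arrival order.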

Cache Performance The performance of hierarchical memory is measured by its effective access time (EAT). EAT is a weighted average that takes into account the hit ratio and the relative access times of successive levels of memory. The EAT for a two-level memory is given by: EAT = H × Access_C + (1 − H) × Access_MM, where H is the cache hit rate, and Access_C and Access_MM are the access times for cache and main memory, respectively.

Cache Performance For example, consider a system with a main memory access time of 200 ns supported by a cache having a 10 ns access time and a hit rate of 99%. The EAT is: 0.99(10 ns) + 0.01(200 ns) = 9.9 ns + 2 ns = 11.9 ns. This equation for determining the effective access time can be extended to any number of memory levels, as we will see in later sections.
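The two-level EAT formula is easy to check numerically; a minimal sketch against the example above:

```python
# Two-level effective access time:
# EAT = H * Access_C + (1 - H) * Access_MM

def eat(hit_rate, cache_ns, memory_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * memory_ns

# The example: 99% hit rate, 10 ns cache, 200 ns main memory.
print(round(eat(0.99, 10, 200), 1))  # 11.9
```

Even with a 99% hit rate, the 200 ns miss path contributes 2 ns of the 11.9 ns average, which is why the miss penalty dominates tuning efforts.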

Hits vs. Misses Read hits: this is what we want! Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, and restart the access. Write hits: either update the data in both cache and memory (write-through), or write the data only into the cache and update memory later (write-back). Write misses: read the entire block into the cache, then write the word.

Handling Cache Misses The action taken on a cache miss depends on whether the miss occurs on an instruction access or a data access. Data access: stall the processor until the memory responds with the data. Instruction access: the contents of the instruction register are invalid, so fetch the instruction from the lower-level memory into the cache.

Write-Through Approach Idea: always write data into both memory and cache. Steps: 1. Index the cache (bits 15–2 of the address). 2. Write the tag, data, and valid bit into the cache. 3. Write the word to main memory using the entire address. Advantage: a simple algorithm. Disadvantage: poor performance, since writing into main memory slows down the machine.

Write-Back Approach Handles writes by updating values only in the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced. Steps: 1. Index the cache. 2. Write the tag, data, and valid bit into the cache. 3. Write the modified block to memory only when it is replaced. Advantage: improved performance. Disadvantage: a more complex algorithm.

Cache Write Policy Cache replacement policies must also take into account dirty blocks: blocks that have been updated while they were in the cache. Dirty blocks must be written back to memory. A write policy determines how this will be done. There are two types of write policies: write-through and write-back. Write-through updates cache and main memory simultaneously on every write.

6.4 Cache Memory Write-back (also called copyback) updates memory only when the block is selected for replacement. The disadvantage of write-through is that memory must be updated with each cache write, which slows down the access time on updates. This slowdown is usually negligible, because the majority of accesses tend to be reads, not writes. The advantage of write-back is that memory traffic is minimized; its disadvantage is that memory does not always agree with the value in cache, causing problems in systems with many concurrent users.
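The memory-traffic difference between the two policies can be counted with a rough sketch (the two-block cache, FIFO eviction, and write trace are invented for illustration, not from the text):

```python
# Counting main-memory writes for the same write trace under the
# two policies. Write-back uses a tiny 2-block cache with FIFO
# eviction; a dirty block is flushed once, when it is replaced
# (or at the end, for blocks still resident).

def write_through(writes):
    """Every cache write also writes main memory."""
    return len(writes)

def write_back(writes, capacity=2):
    """Memory is written only when a dirty block is replaced."""
    cache, dirty, mem_writes = [], set(), 0
    for block in writes:
        if block not in cache:
            if len(cache) == capacity:
                victim = cache.pop(0)      # FIFO eviction for simplicity
                if victim in dirty:
                    mem_writes += 1        # flush the dirty victim
                    dirty.discard(victim)
            cache.append(block)
        dirty.add(block)                   # the write dirties the block
    return mem_writes + len(dirty)         # final flush of resident dirty blocks

writes = ["A", "A", "A", "B", "A"]  # repeated writes to block A
print(write_through(writes))  # 5 memory writes, one per cache write
print(write_back(writes))     # 2: blocks A and B are each flushed once
```

The repeated writes to block A cost write-through three extra memory accesses; write-back coalesces them into a single flush, which is the traffic saving the slide describes.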

6.6 A Real-World Example The Pentium architecture supports both paging and segmentation, and they can be used in various combinations, including unsegmented unpaged, segmented unpaged, and unsegmented paged. The processor supports two levels of cache (L1 and L2), both having a block size of 32 bytes. The L1 cache is next to the processor, and the L2 cache sits between the processor and memory. The L1 cache is in two parts: an instruction cache (I-cache) and a data cache (D-cache). The next slide shows this organization schematically.

6.6 A Real-World Example [Figure: the Pentium cache organization, with split L1 instruction and data caches and an L2 cache between the processor and main memory.]

Chapter 6 Conclusion Computer memory is organized in a hierarchy, with the smallest, fastest memory at the top and the largest, slowest memory at the bottom. Cache memory gives faster access to main memory, while virtual memory uses disk storage to give the illusion of having a large main memory. Cache maps blocks of main memory to blocks of cache memory. Virtual memory maps page frames to virtual pages. There are three general types of cache: direct mapped, fully associative, and set associative.

Chapter 6 Conclusion With fully associative and set associative cache, as well as with virtual memory, replacement policies must be established. Replacement policies include LRU, FIFO, and LFU. These policies must also take into account what to do with dirty blocks.