CPSC 352. Chapter 7: Memory. Computer Organization. Principles of Computer Architecture by M. Murdocca and V. Heuring


7-1 CPSC 352 Computer Organization

7-2 Chapter Contents: 7.1 The Memory Hierarchy, 7.2 Random Access Memory, 7.3 Chip Organization, 7.4 Commercial Memory Modules, 7.5 Read-Only Memory, 7.6 Cache Memory, 7.7 Virtual Memory, 7.8 Advanced Topics, 7.9 Case Study: Rambus Memory, 7.10 Case Study: The Intel Pentium Memory System

7-3 The Memory Hierarchy. From fast and expensive at the top to slow and inexpensive at the bottom (performance and cost per bit both increase toward the top): registers, cache, main memory, secondary storage (disks), off-line storage (tape).

7-4 Functional Behavior of a RAM Cell. [Figure: a clocked D latch (D, Q, CLK) with Select and Read control lines and a single bidirectional Data In/Out connection.]

7-5 Simplified RAM Chip Pinout. [Figure: a memory chip with address inputs A0–A(m−1), data lines D0–D(w−1), write enable (WR), and chip select (CS).]

7-6 A Four-Word Memory with Four Bits per Word in a 2D Organization. [Figure: address lines A0–A1 drive a 2-to-4 decoder that selects one of Word 0–Word 3; WR and Chip Select (CS) are common to all four words; data enters on D0–D3 and leaves on Q0–Q3.]

7-7 A Simplified Representation of the Four-Word by Four-Bit RAM. [Figure: a 4×4 RAM block with data inputs D0–D3, address inputs A0–A1, WR and CS controls, and data outputs Q0–Q3.]

7-8 2-1/2D Organization of a 64-Word by One-Bit RAM. [Figure: address bits A0–A2 feed a row decoder and A3–A5 feed a column decoder (MUX/DEMUX); the column data path is two bits wide, one bit for data and one for select; Read/Write Control and CLK gate the selected stored bit onto Data In/Out.]

7-9 Two Four-Word by Four-Bit RAMs Are Used in Creating a Four-Word by Eight-Bit RAM. [Figure: the two 4×4 RAMs share CS, WR, and address lines A0–A1; one holds data bits D0–D3/Q0–Q3 and the other D4–D7/Q4–Q7, widening the word to eight bits.]

7-10 Two Four-Word by Four-Bit RAMs Make up an Eight-Word by Four-Bit RAM. [Figure: address bit A2 drives a 1-to-2 decoder that enables one of the two 4×4 RAMs through its CS input; A0–A1, WR, D0–D3, and Q0–Q3 are shared, doubling the number of words.]

7-11 Single-In-Line Memory Module. Adapted from Texas Instruments, MOS Memory: Commercial and Military Specifications Data Book, Texas Instruments, Literature Response Center, P.O. Box 72228, Denver, Colorado. Pin nomenclature: A0–A9 address inputs, CAS column-address strobe, DQ1–DQ8 data in/data out, NC no connection, RAS row-address strobe, Vcc 5-V supply, Vss ground, W write enable. [Figure: 30-pin SIMM pinout.]

7-12 A ROM Stores Four Four-Bit Words. [Figure: address lines A0–A1 feed a 2-to-4 decoder with an Enable input; each decoded location drives its stored four-bit word onto outputs Q0–Q3.]

7-13 A Lookup Table (LUT) Implements an Eight-Bit ALU. [Figure: a ROM whose address lines carry Operand A, Operand B, and a two-bit function select; the select bits choose among Add, Subtract, Multiply, and Divide, and the result appears on outputs Q0–Q7.]
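The LUT idea can be sketched in software. This scaled-down version uses 2-bit operands so the table stays small (a full 8-bit version would need a 2^18-entry table); the address encoding and the divide-by-zero convention are illustrative assumptions, not the book's.

```python
# Sketch of the LUT-as-ALU idea, scaled down to 2-bit operands.
# The "address" concatenates function select, operand a, and operand b;
# the table maps every possible address to a precomputed result,
# exactly as a ROM would be programmed.
WIDTH = 2                                  # operand width in bits (hypothetical)
MASK = (1 << WIDTH) - 1
FUNCS = [lambda a, b: (a + b) & MASK,      # 00: add (modulo 2^WIDTH)
         lambda a, b: (a - b) & MASK,      # 01: subtract
         lambda a, b: (a * b) & MASK,      # 10: multiply
         lambda a, b: a // b if b else 0]  # 11: divide (0 on divide-by-zero)

# Build the lookup table once.
lut = {}
for f in range(4):
    for a in range(1 << WIDTH):
        for b in range(1 << WIDTH):
            address = (f << (2 * WIDTH)) | (a << WIDTH) | b
            lut[address] = FUNCS[f](a, b)

def alu(f, a, b):
    """Evaluate by table lookup only -- no arithmetic at 'run time'."""
    return lut[(f << (2 * WIDTH)) | (a << WIDTH) | b]

print(alu(0, 2, 3))  # add: (2 + 3) mod 4 = 1
print(alu(2, 3, 2))  # multiply: (3 * 2) mod 4 = 2
```

Once the table is built, every operation costs one memory access, which is the whole point of the ROM implementation.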

7-14 Placement of Cache in a Computer System. [Figure: without cache, a 400 MHz CPU accesses a 10 MHz main memory over a 66 MHz bus; with cache, a fast cache sits between the CPU and the bus.] The locality principle: a recently referenced memory location is likely to be referenced again (temporal locality), and a neighbor of a recently referenced memory location is likely to be referenced (spatial locality).

7-15 An Associative Mapping Scheme for a Cache Memory. [Figure: the cache has 2^14 slots, each holding a valid bit, a dirty bit, a 27-bit tag, and a 32-word block; any of main memory's 2^27 blocks may be placed in any slot.]

7-16 Associative Mapping Example. Consider how an access to memory location (A35F4)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots. The address is split into a 27-bit tag and a 5-bit word field. If the addressed word is in the cache, it will be found in word (4)₁₆ of a slot that has tag (5AF8)₁₆, which is made up of the 27 most significant bits of the address. If the addressed word is not in the cache, then the block corresponding to tag field (5AF8)₁₆ is brought into an available slot in the cache from the main memory, and the memory reference is then satisfied from the cache.

7-17 Replacement Policies. When there are no available slots in which to place a block, a replacement policy is invoked; it governs the choice of which slot is freed up for the new block. Replacement policies are used for associative and set-associative mapping schemes, and also for virtual memory. Common policies: least recently used (LRU), first-in/first-out (FIFO), least frequently used (LFU), random, and optimal (used for analysis only: look backward in time and reverse-engineer the best possible strategy for a particular sequence of memory references).

7-18 A Direct Mapping Scheme for Cache Memory. [Figure: each of the cache's 2^14 slots has a valid bit, a dirty bit, a 13-bit tag, and a 32-word block; main memory block j can be placed only in slot j mod 2^14, so blocks j, j + 2^14, j + 2·2^14, … compete for the same slot.]

7-19 Direct Mapping Example. For a direct-mapped cache, each main memory block can be mapped to only one slot, but each slot can receive more than one block. Consider how an access to memory location (A35F4)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots. The address is split into a 13-bit tag, a 14-bit slot field, and a 5-bit word field. If the addressed word is in the cache, it will be found in word (4)₁₆ of slot (2F8)₁₆, which will have a tag of (46)₁₆.

7-20 A Set-Associative Mapping Scheme for a Cache Memory. [Figure: the cache's 2^14 slots are grouped into 2^13 sets of two slots each; every slot carries a valid bit, a dirty bit, a 14-bit tag, and a 32-word block; a main memory block may go into either slot of exactly one set.]

7-21 Set-Associative Mapping Example. Consider how an access to memory location (A35F4)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, there are two blocks per set, and the cache consists of 2^14 slots. The leftmost 14 bits of the address form the tag field, followed by 13 bits for the set field, followed by five bits for the word field.
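The three mapping schemes differ only in how the 32-bit word address is interpreted. A sketch of the field splits, assuming 32-word blocks, 2^14 slots, and 2^13 two-slot sets as in the examples; the address itself is a made-up value for illustration.

```python
# Split one 32-bit word address three ways:
#   associative:      27-bit tag | 5-bit word
#   direct-mapped:    13-bit tag | 14-bit slot | 5-bit word
#   2-way set-assoc.: 14-bit tag | 13-bit set  | 5-bit word
WORD_BITS = 5    # 2^5 = 32 words per block
SLOT_BITS = 14   # 2^14 cache slots
SET_BITS = 13    # 2^13 sets of two slots each

def fields(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    block = addr >> WORD_BITS        # 27-bit block number = associative tag
    return {
        "associative": {"tag": block, "word": word},
        "direct": {"tag": block >> SLOT_BITS,
                   "slot": block & ((1 << SLOT_BITS) - 1),
                   "word": word},
        "set_associative": {"tag": block >> SET_BITS,
                            "set": block & ((1 << SET_BITS) - 1),
                            "word": word},
    }

f = fields(0x12345678)                # hypothetical address
print(hex(f["associative"]["tag"]))   # 0x91a2b3
print(hex(f["direct"]["slot"]))       # 0x22b3
```

A useful sanity check on any split is that reassembling the fields reproduces the original address (tag << 19 | slot << 5 | word for the direct-mapped case).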

7-22 Cache Read and Write Policies.
Cache read, data in the cache: forward the word to the CPU.
Cache read, data not in the cache: Load Through (forward the word as the cache line is filled), or fill the cache line first and then forward the word.
Cache write, data in the cache: Write Through (write data to both cache and main memory), or Write Back (write data to the cache only, deferring the main memory write until the block is flushed).
Cache write, data not in the cache: Write Allocate (bring the line into the cache, then update it), or Write No-Allocate (update main memory only).

7-23 Hit Ratios and Effective Access Times. The hit ratio h is the fraction of accesses that are satisfied by the cache. For a single-level cache, effective access time = h × T_cache + (1 − h) × T_miss, where T_cache is the cache access time and T_miss is the time to satisfy a miss from main memory. For a multi-level cache the same weighting is applied at each level, with the miss term of one level replaced by the effective access time of the level below it.

7-24 Direct Mapped Cache Example. Compute the hit ratio and effective access time for a program that executes from memory locations 48 to 95, and then loops 10 times from 15 to 31. The direct-mapped cache has four 16-word slots, a hit time of 80 ns, and a miss time of 2500 ns. Load-through is used, and the cache is initially empty. [Figure: main memory blocks 0–5 cover locations 0–15, 16–31, 32–47, 48–63, 64–79, and 80–95; block j maps to cache slot j mod 4.]

7-25 Table of Events for the Example Program:
Event    | Location | Time                    | Comment
1 miss   | 48       | 2500 ns                 | Memory block 3 to cache slot 3
15 hits  | 49–63    | 80 ns × 15 = 1200 ns    |
1 miss   | 64       | 2500 ns                 | Memory block 4 to cache slot 0
15 hits  | 65–79    | 80 ns × 15 = 1200 ns    |
1 miss   | 80       | 2500 ns                 | Memory block 5 to cache slot 1
15 hits  | 81–95    | 80 ns × 15 = 1200 ns    |
1 miss   | 15       | 2500 ns                 | Memory block 0 to cache slot 0
1 miss   | 16       | 2500 ns                 | Memory block 1 to cache slot 1
15 hits  | 17–31    | 80 ns × 15 = 1200 ns    |
9 hits   | 15       | 80 ns × 9 = 720 ns      | Last nine iterations of loop
144 hits | 16–31    | 80 ns × 144 = 11,520 ns | Last nine iterations of loop
Total hits = 213. Total misses = 5.

7-26 Calculation of Hit Ratio and Effective Access Time for the Example Program. There are 218 accesses in all (48 in the first pass plus 10 × 17 in the loop). Hit ratio = 213/218 ≈ 0.977. Effective access time = (213 × 80 ns + 5 × 2500 ns) / 218 = 29,540 ns / 218 ≈ 135.5 ns.
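The bookkeeping in the table can be checked with a short simulation. This sketch assumes word addresses and the hit/miss times given in the example:

```python
# Simulate the direct-mapped cache from the example:
# four slots, 16 words per block, block j maps to slot j mod 4.
def simulate(accesses, num_slots=4, block_size=16):
    slots = {}                        # slot index -> block number currently cached
    hits = misses = 0
    for addr in accesses:
        block = addr // block_size
        slot = block % num_slots
        if slots.get(slot) == block:
            hits += 1
        else:
            misses += 1
            slots[slot] = block       # fetch block into its slot
    return hits, misses

# Locations 48..95, then 10 loop passes over 15..31.
accesses = list(range(48, 96)) + 10 * list(range(15, 32))
hits, misses = simulate(accesses)
print(hits, misses)                   # 213 5

hit_ratio = hits / (hits + misses)
eat = (hits * 80 + misses * 2500) / (hits + misses)   # load-through assumed
print(round(hit_ratio, 3), round(eat, 1))             # 0.977 135.5
```

The simulation reproduces the slide's totals, including the two conflict misses when the loop first touches blocks 0 and 1, whose slots still hold blocks 4 and 5.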

7-27 Neat Little LRU Algorithm. A sequence is shown for the Neat Little LRU Algorithm for a cache with four slots. Main memory blocks are accessed in the sequence 0, 2, 3, 1, 5, 4. [Figure: a 4×4 matrix with one row and one column per cache slot; on each access the referenced slot's row is set to all 1s and then its column is cleared, so the least recently used slot is always the row containing all 0s.]
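A sketch of the matrix scheme, using a hypothetical access order (the update rule is the one described above: set the row, then clear the column):

```python
# Matrix ("Neat Little LRU") scheme for a 4-slot cache.
def touch(matrix, i):
    """Record a reference to slot i: set row i to 1s, then clear column i."""
    n = len(matrix)
    for j in range(n):
        matrix[i][j] = 1
    for r in range(n):
        matrix[r][i] = 0

def lru_slot(matrix):
    """The least recently used slot is the row with the fewest 1s (all 0s)."""
    return min(range(len(matrix)), key=lambda r: sum(matrix[r]))

n = 4
m = [[0] * n for _ in range(n)]
for slot in [0, 1, 2, 3, 0]:   # hypothetical access order
    touch(m, slot)
print(lru_slot(m))             # 1 -- slot 1 is least recently used
```

After every slot has been touched at least once, the row sums are all distinct (0 through n−1), so the matrix encodes the full recency order, not just the LRU victim.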

7-28 Overlays. A partition graph for a program with a main routine and three subroutines. [Figure: the compiled program (Main Routine plus Subroutines A, B, and C) is larger than physical memory; the main routine stays resident in partition #0, and the partition graph shows which subroutines may take turns occupying partition #1.]

7-29 Virtual Memory. Virtual memory is stored in a hard disk image. The physical memory holds a small number of virtual pages in physical page frames. [Figure: a mapping between virtual and physical memory with 1024-word pages; virtual pages 0–7 cover virtual addresses 0–1023, 1024–2047, 2048–3071, 3072–4095, 4096–5119, 5120–6143, 6144–7167, and 7168–8191, while physical page frames 0–3 cover physical addresses 0–1023, 1024–2047, 2048–3071, and 3072–4095.]

7-30 Page Table. The page table maps between virtual memory and physical memory. Each entry (one per virtual page, pages 0–7) holds a present bit, a page frame number, and a disk address. Present bit = 0: page is not in physical memory; present bit = 1: page is in physical memory.

7-31 Using the Page Table. A virtual address is translated into a physical address: the page number (the high-order bits of the virtual address) selects a page table entry, the entry supplies the page frame number, and the offset within the page is carried over unchanged into the physical address.
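A minimal sketch of this translation, assuming 1024-word pages as in the figure; the page table contents here are hypothetical.

```python
# Translate a virtual address via a page table: 8 virtual pages of
# 1024 words, 4 physical page frames. Entry = (present bit, frame).
PAGE_SIZE = 1024
page_table = {
    0: (1, 2),        # page 0 resident in frame 2 (hypothetical)
    1: (0, None),     # page 1 on disk -> access would page-fault
    2: (1, 0),        # page 2 resident in frame 0
    3: (1, 3),        # page 3 resident in frame 3
}

def translate(vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)   # split page number / offset
    present, frame = page_table.get(page, (0, None))
    if not present:
        raise LookupError(f"page fault on page {page}")
    return frame * PAGE_SIZE + offset         # offset carried over unchanged

print(translate(2100))   # page 2, offset 52 -> frame 0 -> physical address 52
```

On a real machine the fault case would trap to the operating system, which brings the page in from disk and retries the access.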

7-32 Using the Page Table (cont'd). The configuration of a page table changes as a program executes. Initially, the page table is empty; successive page faults then fill it in. [Figure: four snapshots of the page table after successive page faults; in the final configuration, four pages are in physical memory.]

7-33 Segmentation. A segmented memory allows two users to share the same word processor code, with different data spaces. [Figure: segment #0 holds the word processor code (execute only); segment #1 is the data space for one user and segment #2 the data space for the other (each read/write by its own user); the address spaces contain used, free, and unused regions.]

7-34 Fragmentation. (a) Free area of memory after initialization; (b) after fragmentation; (c) after coalescing. [Figure: memory holding the operating system, programs A, B, and C, free areas, a dead zone, and I/O space; fragmentation scatters small free areas between the programs, and coalescing merges adjacent free areas back into larger ones.]

7-35 Translation Lookaside Buffer. An example TLB holds 8 entries for a system with 32 virtual pages and 16 page frames. Each entry holds a valid bit, a virtual page number, and the corresponding physical page number.
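A sketch of a small TLB in front of the page table. The page-to-frame mapping and the FIFO eviction choice are illustrative assumptions, not details from the slide:

```python
# An 8-entry TLB for a system with 32 virtual pages and 16 frames.
# A hit returns the frame immediately; a miss walks the page table
# and installs the translation, evicting the oldest entry if full.
from collections import OrderedDict

TLB_SIZE = 8
tlb = OrderedDict()                              # virtual page -> frame
page_table = {vp: vp % 16 for vp in range(32)}   # hypothetical mapping

def lookup(vpage):
    if vpage in tlb:
        return tlb[vpage], "TLB hit"
    frame = page_table[vpage]        # TLB miss: walk the page table
    if len(tlb) >= TLB_SIZE:
        tlb.popitem(last=False)      # evict the oldest entry (FIFO)
    tlb[vpage] = frame
    return frame, "TLB miss"

print(lookup(5))   # (5, 'TLB miss')
print(lookup(5))   # (5, 'TLB hit')
```

The point of the TLB is exactly this asymmetry: the second access to the same page skips the page table walk entirely.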

7-36 3-Variable Decoder. A conventional decoder is not extensible to large sizes because each added address line doubles the number of logic gates that every address line must drive. [Figure: a 3-to-8 decoder with inputs a0–a2 and outputs d0–d7.]

7-37 Tree Decoder, 3 Variables. A tree decoder is more easily extended to large sizes because fan-in and fan-out are managed by adding deeper levels, with fan-out buffers between levels. [Figure: two equivalent tree decoders (a) and (b) for inputs a0–a2, producing outputs d0–d7.]

7-38 Tree Decoding One Level at a Time. [Figure: a decoding tree for a 16-word random access memory, resolving one address bit per level (levels 0–3).]

7-39 Content Addressable Memory Addressing. Relationships between random access memory and content addressable memory. [Figure: on the left, a RAM in which each 32-bit address selects a 32-bit value; on the right, a CAM whose entries are records made up of several fields (Field1, Field2, Field3) and are selected by matching field contents rather than by address.]
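Content-addressable matching can be sketched as a comparand-and-mask search, the access mode the next slide's CAM hardware broadcasts to every cell in parallel. The cell contents here are made-up 8-bit records:

```python
# Content-addressable lookup: a comparand and a mask select which bits
# must match, and every matching cell responds (here, by index).
cells = [0x49, 0x19, 0x74, 0x49, 0x2B]   # hypothetical CAM contents

def cam_match(comparand, mask):
    """Return indices of cells equal to the comparand on the masked bits."""
    return [i for i, v in enumerate(cells)
            if (v & mask) == (comparand & mask)]

print(cam_match(0x49, 0xFF))  # exact match        -> [0, 3]
print(cam_match(0x09, 0x0F))  # low nibble = 9     -> [0, 1, 3]
```

In hardware each cell performs its masked comparison simultaneously, so the search takes one cycle regardless of how many cells there are; this loop is only a functional model.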

7-40 Overview of CAM. [Figure: a central control unit broadcasts a comparand and a mask to an array of cells (Cell 0 through Cell 4095), each with a tag line T; information and commands flow between central control and a data gathering device.] Source: Foster, C. C., Content Addressable Parallel Processors, Van Nostrand Reinhold Company, 1976.

7-41 Addressing Subtrees for a CAM. [Figure: a tree of channels fanning out to the cells; data is four bits per channel and control is one bit per channel.]

7-42 Block Diagram of Dual-Read RAM. [Figure: a RAM with two read ports, A and B, each with its own address inputs A0–A9 and its own 8-bit data outputs over the shared word array; a common write path (Data In, WR, CS) updates the array.]

7-43 Rambus Memory Rambus technology on the Nintendo 64 motherboard (top left and bottom right) enables cost savings over the conventional Sega Saturn motherboard design (bottom left). (Photo source: Rambus, Inc.)

7-44 The Intel Pentium Memory System. [Figure: on the Pentium processor chip, the CPU is served by a level 1 instruction cache (8 KB, with its own TLB) and a level 1 data cache (8 KB, with its own TLB); an off-chip level 2 cache of up to 512 KB sits between the processor and a main memory of up to 8 GB.]