Chapter 4 Main Memory

Similar documents
William Stallings Computer Organization and Architecture 6th Edition. Chapter 5 Internal Memory

Organization. 5.1 Semiconductor Main Memory. William Stallings Computer Organization and Architecture 6th Edition

Computer Organization. 8th Edition. Chapter 5 Internal Memory

Unit 2. Chapter 4 Cache Memory

Chapter 5 Internal Memory

William Stallings Computer Organization and Architecture 8th Edition. Cache Memory

Internal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved.

Overview IN this chapter we will study. William Stallings Computer Organization and Architecture 6th Edition

Chapter 5. Internal Memory. Yonsei University

Computer & Microprocessor Architecture HCA103

Characteristics. Microprocessor Design & Organisation HCA2102. Unit of Transfer. Location. Memory Hierarchy Diagram

Basic Organization Memory Cell Operation. CSCI 4717 Computer Architecture. ROM Uses. Random Access Memory. Semiconductor Memory Types

William Stallings Computer Organization and Architecture 8th Edition. Chapter 5 Internal Memory

TK2123: COMPUTER ORGANISATION & ARCHITECTURE. CPU and Memory (2)

Semiconductor Memory Types Microprocessor Design & Organisation HCA2102

Contents. Memory System Overview Cache Memory. Internal Memory. Virtual Memory. Memory Hierarchy. Registers In CPU Internal or Main memory

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Memory Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Characteristics of Memory Location wrt Motherboard. CSCI 4717 Computer Architecture. Characteristics of Memory Capacity Addressable Units

WEEK 7. Chapter 4. Cache Memory Pearson Education, Inc., Hoboken, NJ. All rights reserved.

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (4 th Week)

Semiconductor Memory Types. Computer & Microprocessor Architecture HCA103. Memory Cell Operation. Semiconductor Memory.

Computer Organization and Assembly Language (CS-506)

Module 5a: Introduction To Memory System (MAIN MEMORY)

Concept of Memory. The memory of computer is broadly categories into two categories:

Memory and Disk Systems

Memory Overview. Overview - Memory Types 2/17/16. Curtis Nelson Walla Walla University

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy.

Chapter 4. Cache Memory. Yonsei University

Memory Hierarchy Recall the von Neumann bottleneck - single, relatively slow path between the CPU and main memory.

COMPUTER ARCHITECTURE AND ORGANIZATION

Advanced Parallel Architecture Lesson 4 bis. Annalisa Massini /2015

Advanced Parallel Architecture Lesson 4. Annalisa Massini /2015

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (5 th Week)

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction

CS 320 February 2, 2018 Ch 5 Memory

Computer Organization. Chapter 12: Memory organization

Chapter 6 Objectives

UNIT-V MEMORY ORGANIZATION

COSC 243. Memory and Storage Systems. Lecture 10 Memory and Storage Systems. COSC 243 (Computer Architecture)

k -bit address bus n-bit data bus Control lines ( R W, MFC, etc.)

Chapter 2: Fundamentals of a microprocessor based system

Chapter 8 Memory Basics

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

Computer Organization (Autonomous)

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Memory classification:- Topics covered:- types,organization and working

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang

Memory. Objectives. Introduction. 6.2 Types of Memory

CREATED BY M BILAL & Arslan Ahmad Shaad Visit:

Large and Fast: Exploiting Memory Hierarchy

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

Memory Hierarchy. Cache Memory. Virtual Memory

COMP3221: Microprocessors and. and Embedded Systems. Overview. Lecture 23: Memory Systems (I)

Lecture 18: Memory Systems. Spring 2018 Jason Tang

The Memory System. Components of the Memory System. Problems with the Memory System. A Solution

ECSE-2610 Computer Components & Operations (COCO)

Internal Memory Cache Stallings: Ch 4, Ch 5 Key Characteristics Locality Cache Main Memory

Microcontroller Systems. ELET 3232 Topic 11: General Memory Interfacing

Overview. EE 4504 Computer Organization. Historically, the limiting factor in a computer s performance has been memory access time

UNIT:4 MEMORY ORGANIZATION

Memory memories memory

SAE5C Computer Organization and Architecture. Unit : I - V

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 13

Design and Implementation of an AHB SRAM Memory Controller

EE414 Embedded Systems Ch 5. Memory Part 2/2

Technology in Action

1. Explain in detail memory classification.[summer-2016, Summer-2015]

THE MICROCOMPUTER SYSTEM CHAPTER - 2

Cache Memory Part 1. Cache Memory Part 1 1

Introduction read-only memory random access memory

chapter 8 The Memory System Chapter Objectives

MEMORY. Objectives. L10 Memory

Memory Hierarchy and Cache Ch 4-5

ELCT 912: Advanced Embedded Systems

Lecture-7 Characteristics of Memory: In the broad sense, a microcomputer memory system can be logically divided into three groups: 1) Processor

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

Summer 2003 Lecture 18 07/09/03

Computer Memory.

Cache memory. Lecture 4. Principles, structure, mapping

a) Memory management unit b) CPU c) PCI d) None of the mentioned

ECE 485/585 Microprocessor System Design

Memory Hierarchy and Cache Ch 4-5. Memory Hierarchy Main Memory Cache Implementation

DIGITAL SYSTEM FUNDAMENTALS (ECE421) DIGITAL ELECTRONICS FUNDAMENTAL (ECE422)

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

Computers Are Your Future

COMPUTER ARCHITECTURE

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

COMP2121: Microprocessors and Interfacing. Introduction to Microprocessors

7/28/ Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc.

Design with Microprocessors

The Memory Hierarchy Part I

Computer Systems Organization

Physical characteristics (such as packaging, volatility, and erasability Organization.

Computer Organization & Architecture M. A, El-dosuky

MEMORY BHARAT SCHOOL OF BANKING- VELLORE

Semiconductor Memories: RAMs and ROMs

8051 INTERFACING TO EXTERNAL MEMORY

Announcement. Computer Architecture (CSC-3501) Lecture 20 (08 April 2008) Chapter 6 Objectives. 6.1 Introduction. 6.

Transcription:

Chapter 4 Main Memory

Course Outcome (CO) - CO2 Describe the architecture and organization of computer systems Program Outcome (PO) PO1 Apply knowledge of mathematics, science and engineering fundamentals to the solution of complex electrical / electronic engineering problems L01-Knowledge in specific area-content

Internal Register Cache Main memory External Peripheral storage devices Disk Tape

Internal In bytes or words (8 bits, 16 bits, 32 bits) External In bytes 256MB, 8GB

Internal Usually governed by data bus width Number of electrical lines (A) into and out of the memory module May be equal to the word length Determines the number of addressable locations (N) 2^A = N External Usually a block which is much larger than a word

Sequential Access in a linear sequence Start at the beginning and read through in order Access time depends on location of data and previous location e.g. Tape

Direct Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location e.g. Hard disk, floppy disk

Random Individual location has a unique address and can be identified exactly Access time is independent of location or previous access e.g. RAM

Associative Individual location has its own addressing mechanism Data is located by a comparison with contents of a portion of the store data Access time is independent of location or previous access e.g. Cache

Access time (Time taken to perform a read or write operation) Time between presenting the address and getting the valid data or data are written Memory Cycle time Time may be required for the memory to recover before next access Cycle time is access + recovery Transfer Rate Rate at which data can be moved/transferred

Semiconductor RAM Magnetic Disk & Tape Optical CD & DVD

Volatility Erasable

Concerns with internal memory (random-access memory) Physical arrangement of bits into words

How much? Capacity How fast? Time is money How expensive? Capacity Access time Cost

Various technologies are used to implement memory systems Faster access time, greater cost per bit Greater capacity, smaller cost per bit Greater capacity, slower access time

It is possible to build a computer which uses only static RAM (see later) This would be very fast This would need no cache How can you cache cache? This would cost a lot!

Decreasing cost per bit Increasing capacity Increasing access time Decreasing frequency of access of the memory by the processor

Suppose there are two levels of memory Level 1 access time of 0.01μs Level 2 access time of 0.1 μs

Two-level of memory to reduce average access time works if Decreasing frequency of access of the memory by the processor is valid Known as locality of reference During the course of the execution of a program, memory references (instructions and data) tend to cluster Program typically contains iterative loops or subroutine

Memory Type Category Erasure Write Mechanism Volatility Random-access memory (RAM) Read-write memory Electrically, byte-level Electrically Volatile Read-only memory (ROM) Programmable ROM (PROM) Read-only memory Not possible Masks Erasable PROM (EPROM) UV light, chiplevel Nonvolatile Electrically Erasable PROM (EEPROM) Read-mostly memory Electrically, byte-level Electrically Flash memory Electrically, block-level

DYNAMIC RAM Bits stored as charge on capacitors Charges leak (discharged) Need refresh circuits Need refreshing even when powered

Address line active when bit read or written Transistor acts as switch Write Voltage to bit line Read High for 1 low for 0 Then signal address line Transfers charge to capacitor Address line selected transistor turns on Charge from capacitor fed via bit line to sense amplifier Compares with reference value to determine 0 or 1 Capacitor charge must be restored

Bits stored as on/off switches (flip flop) Data is hold as long as power is supplied

Transistor arrangement gives stable logic state State 1 C 1 high, C 2 low T 1 T 4 off, T 2 T 3 on State 0 C 2 high, C 1 low T 2 T 3 off, T 1 T 4 on Address line is used to open/close T 5 and T 6 Write: desired bit value is applied to line B and /B Read: value is on line B

Simpler construction Smaller per bit More dense and less expensive Slower Requires refresh circuitry For large memory requirement (main memory) Complex construction Larger per bit More expensive Faster For cache memory

Permanent storage Nonvolatile Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables

Written during manufacture Very expensive for small runs Programmable (once) PROM Needs special equipment to program Read mostly Erasable Programmable (EPROM) Erased by UV Electrically Erasable (EEPROM) Takes much longer to write than read Flash memory Erase whole memory electrically

16-Mbit DRAM with 4 bits read/write at a time Organized as 4 square arrays of 2048 by 2048 elements The elements are connected by both horizontal (row) and vertical (column) lines 11 address lines are required to select one of 2048 rows and fed into a row decoder The decoder activates a single one of the 2048 outputs depending on the bit pattern on the 11 input lines 11 address lines select one of 2048 columns of 4 bits per column 4 data lines are used for the input and output of 4 bits to and from a data buffer First, 11 address signals are passed to the chip to define the row, then the 11 address signals are presented for the column address. Reduces number of address pins Multiplex row address and column address 11 pins to address (2 11 =2048) Adding one more pin doubles range of values so x4 capacity

8-bit word 18-bit address is needed

Hard Failure Permanent (physical) defect Soft Error Random, non-destructive No permanent damage to memory Cause by power supply problems Detected using Hamming error correcting code

Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with system clock, CPU knows when data will be ready CPU does not have to wait, it can do something else Burst mode allows SDRAM to set up stream of data and fire it out in block DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)

Developed and adopted by Intel for Pentium & Itanium Main competitor to SDRAM Vertical package all pins on one side Data exchange over 28 wires < cm long Bus addresses up to 320 RDRAM chips at 1.6Gbps Asynchronous block protocol 480ns access time Then 1.6 Gbps

Chapter 4.2 Cache Memory

Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module

A relatively large and slow main memory together with a smaller, faster cache memory.

CPU requests contents of memory location Check cache for this data If present, get from cache (fast) If not present, read required block from main memory to cache Then deliver from cache to CPU Cache includes tags to identify which block of main memory is in each cache line

Addressing Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches

Where does cache sit? Between processor and virtual memory management unit Between MMU and main memory Logical cache (virtual cache) stores data using virtual addresses Processor accesses cache directly, not through physical cache Cache access faster, before MMU address translation Virtual addresses use same address space for different applications Must flush cache on each context switch Physical cache stores data using main memory physical addresses

Cost The overall average cost per bit must close to that of main memory Cache is expensive Size Overall average access time is close to that of the cache alone Too large tend to be slightly slower than small cache Checking cache for data takes time

Processor Type Year of Introduction L1 cache L2 cache L3 cache IBM 360/85 Mainframe 1968 16 to 32 KB PDP-11/70 Minicomputer 1975 1 KB VAX 11/780 Minicomputer 1978 16 KB IBM 3033 Mainframe 1978 64 KB IBM 3090 Mainframe 1985 128 to 256 KB Intel 80486 PC 1989 8 KB Pentium PC 1993 8 KB/8 KB 256 to 512 KB PowerPC 601 PC 1993 32 KB PowerPC 620 PC 1996 32 KB/32 KB PowerPC G4 PC/server 1999 32 KB/32 KB 256 KB to 1 MB 2 MB IBM S/390 G4 Mainframe 1997 32 KB 256 KB 2 MB IBM S/390 G6 Mainframe 1999 256 KB 8 MB Pentium 4 PC/server 2000 8 KB/8 KB 256 KB IBM SP High-end server/ supercomputer 2000 64 KB/32 KB 8 MB CRAY MTAb Supercomputer 2000 8 KB 2 MB Itanium PC/server 2001 16 KB/16 KB 96 KB 4 MB SGI Origin 2001 High-end server 2001 32 KB/32 KB 4 MB Itanium 2 PC/server 2002 32 KB 256 KB 6 MB IBM POWER5 High-end server 2003 64 KB 1.9 MB 36 MB CRAY XD-1 Supercomputer 2004 64 KB/64 KB 1MB

Direct Associative Set associative

Cache of 64 kbyte Blocks of 4 bytes each Cache is organized as 16k (2 14 ) lines of 4 bytes Main memory consists of 16 Mbytes Addressable by a 24-bit address (2 24 =16M) 4M blocks of 4 bytes each

Each block of main memory maps to only one cache line i.e. if a block is in cache, it must be in one specific place Address is in two parts Least Significant w bits identify unique word Most Significant s bits specify one memory block The MSBs are split into a cache line field r and a tag of s-r (most significant)

Tag s-r Line r Word w 8 14 2 24 bit address 2 bit word identifier (4 byte block) 22 bit block identifier 8-bit tag (=22-14) 14-bit line No two blocks in the same line have the same Tag field Check contents of cache by finding line and checking Tag

Cache line Main Memory blocks held 0 0, m, 2m, 3m 2s-m 1 1,m+1, 2m+1 2s-m+1 m-1 m-1, 2m-1,3m-1 2s-1

Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2s+ w/2w = 2s Number of lines in cache = m = 2r Size of tag = (s r) bits

Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

Lower miss penalty Remember what was discarded Already fetched Use again with little penalty Fully associative 4 to 16 cache lines Between direct mapped L1 cache and next memory level

A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line s tag is examined for a match Cache searching gets expensive

Tag 22 bit Word 2 bit 22 bit tag stored with each 32 bit block of data Compare tag field with tag entry in cache to check for hit Least significant 2 bits of address identify which 16 bit word is required from 32 bit data block e.g. Address line Tag Data Cache FFFFFC FFFFFC 24682468 3FFF

Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2s+ w/2w = 2s Number of lines in cache = undetermined Size of tag = s bits

Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set e.g. Block B can be in any line of set i e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set

13 bit set number Block number in main memory is modulo 2 13 000000, 00A000, 00B000, 00C000 map to same set

Tag 9 bit Set 13 bit Word 2 bit Use set field to determine cache set to look in Compare tag field to see if we have a hit e.g Address Tag Data Set number 1FF 7FFC 1FF 12345678 1FFF 001 7FFC 001 11223344 1FFF

Address length = (s + w) bits Number of addressable units = 2s+w words or bytes Block size = line size = 2w words or bytes Number of blocks in main memory = 2d Number of lines in set = k Number of sets = v = 2d Number of lines in cache = kv = k * 2d Size of tag = (s d) bits

Significant up to at least 64kB for 2-way Difference between 2-way and 4-way at 4kB much less than 4kB to 8kB Cache complexity increases with associativity Not justified against increasing cache to 8kB or 16kB Above 32kB gives no improvement (simulation results)

Hit ratio 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 1k direct 2-way 4-way 8-way 16-way 2k 4k 8k 16k Cache size (bytes) 32k 64k 128k 256k 512k 1M

No choice Each block only maps to one line Replace that line

Hardware implemented algorithm (speed) Least Recently used (LRU) e.g. in 2 way set associative Which of the 2 block is lru? First in first out (FIFO) replace block that has been in cache longest Least frequently used (LFU) replace block which has had fewest hits Random

An 8-bit microprocessor has an on-chip 8-KBytes two-way setassociative cache. The line size of the cache is four (4) bytes. The microprocessor is connected to a byte addressable main memory of 2 MBytes. Determine the: 1. number of lines in the cache 2. number of sets in the cache 3. format of the main memory addresses 4. result of tag comparison (i.e. cache miss or cache hit) for the given section of cache in Figure 1 when the microprocessor issue an address of 19E68D (hex). State a reason why. 5. new content of line X in the given section of cache in Figure 2 if the line is selected for content replacement after a cache miss occur when the microprocessor issued an address of 0A4B4D (hex). State the new tag value of line X and the set number line X belongs to.

Figure 1 Figure 2