Memory systems: Memory technology, Memory hierarchy, Virtual memory


Memory technology

DRAM (Dynamic Random Access Memory)
- bits are represented by an electric charge in a small capacitor
- the charge leaks away, so the cells need to be refreshed at regular intervals
- reading the memory also discharges the capacitors
- DRAM has better price/performance than SRAM; it also achieves higher densities, needs less power and dissipates less heat

SRAM (Static Random Access Memory)
- based on gates; a bit is stored in 6 transistors
- no refreshing; retains data as long as it has power
- SRAM provides higher speed and is used for high-performance memories: caches, video memory, ...

Access time and cycle time

- Memory access time is the time it takes to read or write a memory location
- Memory cycle time is the minimum time between two successive memory references
  - repeated accesses cannot be done immediately after each other, because the memory has to be refreshed after an access
- Example: a DRAM with a 50 ns access time and a 100 ns cycle time

Memory banks and interleaving

- The memory is organized as a number of banks; each bank consists of a separate memory device
- Interleaving: consecutive memory accesses address different banks
  - while one bank is refreshed, another bank can be accessed
  - accesses and refreshing can be overlapped
- Gives a continuous flow of data from memory
- Example: in a 4-way interleaved memory, words 0, 4, 8, ... are in bank 0, words 1, 5, 9, ... in bank 1, and so on (see the sketch below)
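A minimal sketch (not from the original slides) of the low-order interleaving just described; the 4-bank configuration is an assumption:

    #include <stdio.h>

    #define NBANKS 4   /* assumed 4-way interleaving */

    /* Low-order interleaving: consecutive word addresses fall in
     * consecutive banks, so sequential accesses to one bank can be
     * overlapped with refresh cycles in the others. */
    static unsigned bank_of(unsigned word_addr)     { return word_addr % NBANKS; }
    static unsigned row_in_bank(unsigned word_addr) { return word_addr / NBANKS; }

    int main(void) {
        for (unsigned w = 0; w < 8; w++)
            printf("word %u -> bank %u, row %u\n", w, bank_of(w), row_in_bank(w));
        return 0;
    }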

Dynamic RAM technology

Fast page mode DRAM
- improves access to memory in sequentially located addresses (cache lines)
- the entire address does not have to be transmitted to the memory for each access, only the least significant bits

EDO RAM (Extended Data Out RAM)
- very similar to fast page mode RAM

SDRAM (Synchronous DRAM)
- CPU and memory are synchronized by an external clock
- consecutive data is output synchronously on a clock pulse
- the memory chips are divided into two independent cell banks, which allows interleaving
- PC66, PC100, PC133 SDRAM, etc.
- 133 MHz * 64 bits / 8 bits = 1064 MB/s peak bandwidth
- typical efficiency approx. 75 % = 800 MB/s

DDR SDRAM (Double Data Rate SDRAM)
- memory architecture chosen by AMD
- synchronous DRAM; the memory chips perform accesses on both the rising and falling edges of the clock
- a memory with a 133 MHz clock operates effectively at 266 MHz
- 64-bit data bus
- 133 MHz * 2 transfers/clock * 64 bits / 8 bits = 2128 MB/s peak bandwidth (see the sketch below)
- typical efficiency approx. 65 % ≈ 1380 MB/s
- 184-pin DIMMs
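The peak-bandwidth figures above all follow one formula: clock rate × transfers per clock × bus width in bytes. A small check of the arithmetic:

    #include <stdio.h>

    /* peak bandwidth in MB/s = MHz * transfers/clock * (bus width in bits) / 8 */
    static double peak_mb_s(double mhz, int transfers_per_clock, int bus_bits) {
        return mhz * transfers_per_clock * bus_bits / 8.0;
    }

    int main(void) {
        printf("PC133 SDRAM: %.0f MB/s\n", peak_mb_s(133, 1, 64)); /* 1064 */
        printf("DDR-266:     %.0f MB/s\n", peak_mb_s(133, 2, 64)); /* 2128 */
        return 0;
    }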

Dynamic RAM technology (cont.)

Direct RAMBUS
- proprietary technology of Rambus Inc.; the memory architecture chosen by Intel
- a new, fast DRAM architecture running at 400 MHz, operating on both the rising and falling edges of the clock
- transfers data over a narrow 16-bit bus (the Direct Rambus Channel)
- multiple memory banks use pipelining to send four 16-bit packets at a time (64-bit memory accesses)
- 400 MHz * 2 transfers/clock * 16 bits / 8 bits = 1600 MB/s peak bandwidth
- typical efficiency approx. 60 % ≈ 960 MB/s
- RIMMs: similar to DIMMs, but with a different pin count (184 vs. 168), covered by an aluminium heat spreader

Memory hierarchy

Hierarchical memory organisation: registers (next to the ALU), cache memory, main memory (RAM), and disk. Data moves between the levels in units of increasing size: single elements between registers and cache, cache lines between cache and main memory, and pages between main memory and disk (virtual memory). From small, fast and expensive to large, slow and cheap.

Example: memory access times on a 500 MHz Alpha 21164:
- registers: 2 ns
- L1 (on-chip): 4 ns
- L2 (on-chip): 5 ns
- L3 (off-chip): 30 ns
- memory: 220 ns

Registers

- Small, very fast memory storage located close to the ALU
- Implemented by static RAM; operate at the same speed as instruction execution
- The IA-32 ISA defines 8 general-purpose 32-bit registers + special-purpose registers: EIP, EFLAGS, 6 segment registers + 8 80-bit floating-point registers and 6 special-purpose registers + 8 64-bit MMX registers, aliased to the FP registers
- The general-purpose registers are used by the processor for operand evaluation; they store intermediate values in expression evaluation
- Example: x = G*… + A/M - W*M
- Optimizing compilers make efficient use of registers for expression evaluation

Caches

- Small, fast memory located between the processor and main memory; implemented by static RAM; stores a subset of the memory
- Separate caches for instructions and data (Harvard memory architecture) make it possible to fetch instructions and operands simultaneously
- Data stored in a faster level of the hierarchy is also stored in the slower level below it
- Strategies to maintain coherence between cache and memory:
  - write-through: data is immediately written to memory when it is updated in the cache
  - write-back: data is written to memory only when a modified cache line is replaced in the cache

Cache lines

- The unit of data transferred between RAM and cache is called a cache line; it consists of N consecutive memory locations
- When we access a memory location, the whole memory block containing it is copied into the cache
- A cache replacement policy defines how old data in the cache is replaced with new data; it tries to keep frequently used data in the cache
- Typical cache line sizes range from 128 bits to 512 bits
- For each memory access, the computer first checks whether the cache line containing the memory location is in the cache
  - if not (a cache miss), the line is brought in
  - the cache has to decide which old cache line to throw out, e.g. with the LRU (least recently used) algorithm

Cache organization

- A cache mapping defines how memory locations are placed into the cache, i.e. a mapping of memory addresses to cache lines
- Each cache line records which memory addresses it represents with a tag; the tag is used to find out which memory addresses are stored in a cache line
- The cache is much smaller than RAM, so two memory blocks can be mapped to the same cache line
- Think of memory as being divided into blocks of the size of a cache line

Direct mapped caches

- A memory block can be placed in exactly one cache line
- The mapping is (block address) MOD (nr. of lines in cache) (see the sketch below)
- Easy to find out whether a memory address is in the cache or not: check the tag in the single cache line where it is supposed to be

Fully associative caches

- A memory block can be placed in any cache line
- Cannot calculate in which cache line a memory block should be placed; have to search through all cache lines for one containing the tag we are looking for
- Associative memory searches all cache lines simultaneously for a matching tag
- Associative caches are small and expensive
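A sketch (not from the original slides) of how a direct-mapped lookup splits an address; the 32-byte line size and 1024-line cache are assumed values:

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 32u    /* assumed cache line size in bytes */
    #define NUM_LINES 1024u  /* assumed number of lines in the cache */

    int main(void) {
        uint32_t addr   = 0x12345678;
        uint32_t block  = addr / LINE_SIZE;   /* block address */
        uint32_t offset = addr % LINE_SIZE;   /* byte within the cache line */
        uint32_t index  = block % NUM_LINES;  /* (block address) MOD (nr. of lines) */
        uint32_t tag    = block / NUM_LINES;  /* stored in the line to identify the block */
        printf("offset=%u index=%u tag=0x%x\n",
               (unsigned)offset, (unsigned)index, (unsigned)tag);
        return 0;
    }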

Set associative caches

- A memory block can be placed in a restricted set of cache lines: the block is first mapped onto a set, and then placed into one of the lines in that set
- The mapping is (block address) MOD (nr. of sets)
- If each set contains N cache lines, the placement is called N-way set associative
- Can compute in which set a block is placed, and then only have to do an associative search within that small set

Cache misses

- Assume an L1 cache access time of 1 ns, an L2 access time of 5 ns and a memory access time of 50 ns; if 80% of references hit in L1, 15% hit in L2 and 5% go to memory, the average memory access time is 0.8*1 + 0.15*5 + 0.05*50 = 4.05 ns (see the sketch below)
- Caches are based on the principle of locality:
  - spatial locality: we access data located near each other (sequential access)
  - temporal locality: we make repeated accesses to the same data
- There are three different reasons for cache misses: compulsory, capacity and conflict misses
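A check of the average-access-time arithmetic above (the hit rates and access times are the example values, assumptions rather than measurements):

    #include <stdio.h>

    int main(void) {
        /* fraction of references satisfied at each level * its access time */
        double avg = 0.80 * 1.0     /* L1 hits     */
                   + 0.15 * 5.0     /* L2 hits     */
                   + 0.05 * 50.0;   /* main memory */
        printf("average memory access time = %.2f ns\n", avg); /* 4.05 */
        return 0;
    }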

Compulsory cache misses

- Also called cold start misses or first reference misses: the first access to a block of memory always causes a cache miss, when the line is brought into the cache
- Can increase the cache line size
  - this increases the cache miss penalty
  - it also increases conflict misses, because the cache contains fewer lines
- Can use prefetching: bring in the next contiguous cache line at the same time
  - some processors also have a prefetch instruction, which the compiler can insert into the code (see the sketch below)
  - works for contiguous memory accesses, not for random access patterns

Capacity cache misses

- The cache cannot hold all of the memory referenced by the program: capacity misses occur when cache lines are replaced because the cache is full, and later have to be brought in again
- Capacity misses can be overcome by increasing the cache size
- Can also modify the data structures and the algorithm to improve spatial and temporal locality: compiler optimizations and high-level code optimization techniques
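One way a compiler (or programmer) can express prefetching is GCC/Clang's __builtin_prefetch hint; the prefetch distance of 8 elements here is an arbitrary assumption:

    #include <stddef.h>

    /* Sum an array while prefetching ahead of the current element.
     * __builtin_prefetch is only a hint and is ignored on targets
     * without a prefetch instruction. */
    double sum(const double *x, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + 8 < n)
                __builtin_prefetch(&x[i + 8]); /* assumed prefetch distance */
            s += x[i];
        }
        return s;
    }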

Conflict cache misses

- In direct mapped and set associative caches, many memory blocks can map to the same cache line
  - a cache line may have to be thrown out because some other line needs its place in the cache, and the same line may have to be brought in again immediately afterwards
- Conflict misses can be overcome by using higher associativity, e.g. 4-way set associative instead of 2-way
- Can also try to avoid conflict misses in the program design
- 2:1 cache rule of thumb: a direct mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2

Cache thrashing

- The cache repeatedly throws out a cache line that is needed in the next memory access; this can occur with direct mapped and 2-way set associative caches
- Assume a direct mapped cache with a cache line size of 32 bytes (= 8 words)
- Two arrays X and Y are located contiguously, a cache size apart (X at address b, Y at b+cachesize), so that (address of X) MOD (nr. of lines in cache) = (address of Y) MOD (nr. of lines in cache)
- X[0] and Y[0] are mapped to the same cache line: when one is brought into the cache, the other is thrown out, which causes cache thrashing if we access both arrays sequentially

Example of cache thrashing

- In the first iteration, the reference to X[0] causes a compulsory cache miss: the cache line containing X[0]..X[3] is brought in and X[0] is placed in a register
- The cache line containing Y[0]..Y[3] is brought in; it maps to the same line and replaces X[0]..X[3]; Y[0] is placed in a register
- X[0] and Y[0] are added and Y[0] is stored
- In the next iteration, the cache line containing X[1] has to be brought in again: conflict cache misses occur in all iterations

    #define SIZE (16*1024)   /* 16K elements */
    double X[SIZE], Y[SIZE];
    ...
    for (i = 0; i < SIZE; i++) {
        Y[i] = X[i] + Y[i];
    }

Cache thrashing (cont.)

- Can also get cache thrashing in 2-way set associative caches
- With three consecutive arrays, each a cache size long, X[k], Y[k] and Z[k] all map to the same set; the set size is two, so one of them will always be thrown out in each iteration
- Can be avoided by padding the arrays: insert a dummy array of the cache line size between the arrays, so that X[0], Y[0] and Z[0] map to different cache lines (see the sketch below)
- Thrashing may be a problem when the array size is a power of two

    #define SIZE (16*1024)   /* 16K elements */
    double X[SIZE], Y[SIZE], Z[SIZE];
    ...
    for (i = 0; i < SIZE; i++) {
        Z[i] = X[i]*Y[i] + Z[i];
    }
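A minimal sketch of the padding fix, assuming the 32-byte cache line from the example above; putting the arrays in one struct fixes their relative layout:

    #include <stddef.h>

    #define SIZE (16*1024)              /* 16K elements */
    #define PAD  (32 / sizeof(double))  /* one assumed 32-byte cache line */

    /* The pad members shift Y and Z by one cache line each, so that
     * X[i], Y[i] and Z[i] no longer map to the same cache line or set. */
    struct arrays {
        double X[SIZE];
        double pad1[PAD];
        double Y[SIZE];
        double pad2[PAD];
        double Z[SIZE];
    };

    void combine(struct arrays *a) {
        for (size_t i = 0; i < SIZE; i++)
            a->Z[i] = a->X[i] * a->Y[i] + a->Z[i];
    }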

Virtual memory

- Decouples the addresses used by the program (virtual addresses) from physical addresses
  - the program uses a large contiguous address space
  - the actual memory blocks may be located anywhere in physical memory
  - some memory blocks may also be on secondary storage
- Memory is divided into pages; the page size can range from 512 bytes to 4 MB
- Virtual addresses are translated to physical addresses using a page table
- (figure: contiguous virtual pages A, B, C, D mapped to scattered page frames in physical memory)

Page tables

- A page table stores the mapping of virtual to physical addresses
- It is indexed by the virtual page number: one entry per page in the virtual address space
- A virtual address is split into a page number, which is translated through the page table, and an offset within the page, which is unchanged (see the sketch below)
- Page tables are usually large and are themselves stored in virtual memory
  - then two virtual-to-physical translations are needed to find a physical address
- Use a translation lookaside buffer (TLB) as a cache for address translations
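A sketch of the translation just described, with an assumed 4 KB page size and a single-level table; a real page-table entry would also carry valid, dirty and protection bits:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u   /* assumed 4 KB pages */

    /* page_table[vpn] holds a physical frame number */
    static uint32_t page_table[1024];

    static uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr / PAGE_SIZE;  /* virtual page number: indexes the table */
        uint32_t offset = vaddr % PAGE_SIZE;  /* offset within the page: unchanged */
        return page_table[vpn] * PAGE_SIZE + offset;
    }

    int main(void) {
        page_table[2] = 7;  /* map virtual page 2 to physical frame 7 */
        printf("0x%x\n", (unsigned)translate(2 * PAGE_SIZE + 0x10)); /* 7*4096 + 0x10 */
        return 0;
    }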

Translation lookaside buffer

- A cache memory for address translations
  - the tag field holds a part of the virtual address
  - the data field holds the physical page frame number
  - there are also status bits: valid, use, dirty
- Implemented by an associative cache memory (see the sketch below)
- The TLB is limited in size; virtual addresses not found in the TLB cause a TLB miss
- Repeated TLB misses cause very bad performance, just as repeated cache misses do
- Good cache behaviour usually implies good TLB behaviour
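A minimal sketch of the lookup, assuming a fully associative, 64-entry TLB with 4 KB pages; hardware searches all entries in parallel, while software has to loop:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096u  /* assumed page size */
    #define TLB_SIZE  64u    /* assumed number of TLB entries */

    /* tag = virtual page number, data = physical frame number */
    struct tlb_entry { uint32_t vpn; uint32_t frame; bool valid; };
    static struct tlb_entry tlb[TLB_SIZE];

    bool tlb_lookup(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn = vaddr / PAGE_SIZE;
        for (uint32_t i = 0; i < TLB_SIZE; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *paddr = tlb[i].frame * PAGE_SIZE + vaddr % PAGE_SIZE;
                return true;   /* TLB hit */
            }
        }
        return false;          /* TLB miss: must walk the page table */
    }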