CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory


Transcription:

CS650 Computer Architecture
Lecture 9: Memory Hierarchy - Main Memory
Andrew Sohn, Computer Science Department, New Jersey Institute of Technology
Lecture 9: Main Memory 9-/ /6/ A. Sohn

Memory Cycle Time
[Figure: memory controller (chipset) connected to a DIMM]

6MB Dual Inline Memory Module
[Figure: a DIMM built from eight x8 DRAM chips (through Chip7) plus a Parity/ECC chip, attached to the data bus and a multiplexed address bus]

5MB with 6MB DIMMs
[Figure: several DIMMs sharing the data bus and the multiplexed address bus]

Current DRAM Technology
DRAM fast page mode (burst) allows repeated accesses to the row buffer without paying another row access time. Such an access is called a page hit.
Synchronous DRAM (SDRAM) synchronizes itself with the timing of the system clock. This lets the memory controller know the exact clock cycle in which the requested data will be ready. SDRAM chips take advantage of interleaving and burst mode functions. CPU clocks run at GHz rates, whereas memory clocks run at MHz rates.
Double Data Rate SDRAM (DDR SDRAM) allows the memory chip to perform transactions on both the rising and falling edges of the clock cycle, so the effective data rate is twice the clock rate.
Direct Rambus DRAM (DRDRAM) transfers data at a high clock rate over a narrow 16-bit bus called a Direct Rambus Channel. The high rate is achieved through double clocking, which allows operations to occur on both the rising and falling edges of the clock cycle. It appears that the current market is moving against Rambus.

DRAM Cell
A DRAM cell consists of a transistor and a capacitor.
[Figure: DRAM cell with row access and column access]
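The double-data-rate idea above can be put into a few lines of arithmetic. This is a minimal sketch rather than anything from the slides: the function names, the 100 MHz clock, and the 64-bit bus width are illustrative assumptions.

```python
def effective_transfer_rate_mhz(clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Million transfers per second for a given clock; DDR makes 2 transfers per clock."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_mb_per_s(clock_mhz: float, bus_width_bits: int,
                            transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    rate = effective_transfer_rate_mhz(clock_mhz, transfers_per_clock)
    return rate * bus_width_bits / 8


if __name__ == "__main__":
    # Hypothetical example values, not taken from the slides.
    print(effective_transfer_rate_mhz(100))      # DDR at a 100 MHz clock
    print(effective_transfer_rate_mhz(100, 1))   # single data rate, for comparison
    print(peak_bandwidth_mb_per_s(100, 64))      # DDR on an assumed 64-bit bus
```

Passing `transfers_per_clock=1` models a single-data-rate SDRAM, which makes the factor-of-two difference explicit.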

DRAM Array
Each cell consists of a transistor and a capacitor.
[Figure: DRAM array with row access and column access]

Sample Kb Memory
[Figure: address, row decoder, Kb memory array, sense amplifiers, I/O gating, column decoder]

Kb with Address Multiplexing
[Figure: address, address register, row address, column address counter (burst), row decoder, Kb memory array, sense amplifiers, I/O gating, column decoder]

[Figure: banked organization: address, address register, row address, bank control, one row decoder per bank, four banks, sense amplifiers, column address, I/O gating, column decoder]

Partitioning of Address Space
High Order Word Interleaving
  Consecutive words are stored in the same memory bank.
  High order bits are used to select a bank.
Low Order Word Interleaving
  Consecutive words are stored across different memory banks.
  Low order bits are used to select a bank.
Grouped Low Order Interleaving
  Consecutive words are stored across different banks of a group.
  High order bits are used to select a group. Low order bits are used to select the bank within a group.
Pros and Cons

High Order Word Interleaving
  load f,(r)
  mult f,f,f
  store (r),f
  addi r,r,-#8
  bne r,r,loop
[Figure: address bits from MSB to LSB and the placement of words in four banks]

Low Order Word Interleaving
  load f,(r)
  mult f,f,f
  store (r),f
  addi r,r,-#8
  bne r,r,loop
[Figure: address bits from MSB to LSB and the placement of words in four banks]

Grouped Low Order Interleaving
  load f,(r)
  mult f,f,f
  store (r),f
  addi r,r,-#8
  bne r,r,loop
[Figure: address bits from MSB to LSB and the placement of words in four banks]
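The three address partitionings described above differ only in which address bits pick the bank. The sketch below maps a word address to a (bank, offset-within-bank) pair under each scheme. The function names and the memory size (4 banks of 8 words, or 2 groups of 2 banks) are assumptions chosen for illustration, not values from the slides.

```python
def high_order_bank(word_addr: int, words_per_bank: int) -> tuple[int, int]:
    """High-order interleaving: the upper address bits select the bank."""
    return word_addr // words_per_bank, word_addr % words_per_bank


def low_order_bank(word_addr: int, num_banks: int) -> tuple[int, int]:
    """Low-order interleaving: the lower address bits select the bank."""
    return word_addr % num_banks, word_addr // num_banks


def grouped_low_order_bank(word_addr: int, banks_per_group: int,
                           words_per_bank: int) -> tuple[int, int]:
    """Grouped low-order interleaving: high bits pick the group,
    low bits pick the bank inside that group."""
    words_per_group = banks_per_group * words_per_bank
    group = word_addr // words_per_group
    within_group = word_addr % words_per_group
    bank_in_group = within_group % banks_per_group
    offset = within_group // banks_per_group
    return group * banks_per_group + bank_in_group, offset


if __name__ == "__main__":
    # Hypothetical memory: 32 words in 4 banks of 8 words each.
    for addr in range(6):
        print(addr,
              high_order_bank(addr, 8)[0],
              low_order_bank(addr, 4)[0],
              grouped_low_order_bank(addr, 2, 8)[0])
```

For a loop that walks consecutive words, such as the load/store loop on the slides, the high-order mapping keeps hitting the same bank, while the low-order mapping spreads successive accesses over all banks so their access times can overlap.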

Memory Latency
[Figure: address, address register, row address, column address, row decoder, Kb memory array, sense amplifiers, I/O gating, column decoder]

Memory Latency for 6-bit Line
First word (8 bytes) - 9 cycles
  CPU address to chipset
  Memory controller at chipset to DRAM
  Row address strobe (RAS), reading and charging the row
  Column address strobe (CAS) to get the column to the DRAM output buffer
  From output buffer to CPU through chipset
Second word (8 bytes) - cycle
Third word (8 bytes) - cycle
Fourth word (8 bytes) - cycle
For CPU @ GHz, DRAM @ MHz, the total latency for reading a 6-bit L cache line
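The line-fill latency above has a simple shape: the first word pays the full access path, and each later word of the burst costs a fixed number of extra cycles. A minimal sketch of that arithmetic follows. The function names and every number in the example (9 and 1 memory cycles, 4 words per line, a 1000 MHz CPU, a 100 MHz DRAM) are illustrative assumptions, not confirmed values from the slides.

```python
def line_fill_memory_cycles(first_word_cycles: int, next_word_cycles: int,
                            words_per_line: int) -> int:
    """Memory-clock cycles to fetch a whole line: the first word pays the full
    access latency, each remaining word pays only the burst cost."""
    return first_word_cycles + (words_per_line - 1) * next_word_cycles


def to_cpu_cycles(memory_cycles: int, cpu_mhz: float, memory_mhz: float) -> float:
    """Convert memory-clock cycles into CPU-clock cycles using the clock ratio."""
    return memory_cycles * cpu_mhz / memory_mhz


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    mem_cycles = line_fill_memory_cycles(first_word_cycles=9,
                                         next_word_cycles=1,
                                         words_per_line=4)
    print(mem_cycles)
    print(to_cpu_cycles(mem_cycles, cpu_mhz=1000, memory_mhz=100))
```

The conversion step is what makes a modest number of memory cycles expensive: every memory cycle costs as many CPU cycles as the ratio between the two clocks.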

Memory Latency
[Figure: memory controller (chipset) connected to a DIMM]

Cache Miss Penalty
[Figure: three ways to connect cache and memory: a one-word-wide memory, a wide memory behind a multiplexer, and an interleaved memory with four one-word-wide banks]
Assuming
  5 clocks to send address to memory
  5 clocks to access memory
  5 clocks to send a word to cache

Cache Miss Penalty
5 + x5 + x5
[Figure: cache and memory connected by a one-word-wide path; the block is transferred one word at a time]

Cache Miss Penalty
5 + x5 + 5
[Figure: cache, multiplexer, and a wide memory that delivers several words per access]

Cache Miss Penalty
5 + x5 + x5
[Figure: cache and an interleaved memory of four banks, each one word wide]

Improving Cache Performance
1. Reduce cache miss rate (number of cache misses)
2. Reduce cache miss penalty
3. Reduce the time to hit in the cache
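The three cache miss penalty expressions shown earlier for the one-word-wide, wide, and interleaved memories share the same three terms: send the address, access memory, transfer words. The sketch below writes each one as a function so the organizations can be compared. The function names and all numbers in the example (4 words per block, 5/15/5 clocks) are hypothetical and are not taken from the slides. The wide-memory case assumes the memory and the bus are as wide as a whole block.

```python
def penalty_one_word_wide(send_addr: int, access: int, transfer: int, words: int) -> int:
    """Narrow memory: the address is sent once, then every word needs its own
    memory access and its own transfer to the cache."""
    return send_addr + words * access + words * transfer


def penalty_wide(send_addr: int, access: int, transfer: int) -> int:
    """Wide memory (assumed block-wide): one access and one transfer
    deliver the whole block."""
    return send_addr + access + transfer


def penalty_interleaved(send_addr: int, access: int, transfer: int, words: int) -> int:
    """Interleaved banks: all banks are accessed in parallel (one access time),
    then the words are sent to the cache one at a time."""
    return send_addr + access + words * transfer


if __name__ == "__main__":
    # Hypothetical timing: 5 clocks for the address, 15 per access, 5 per word sent.
    a, m, t, n = 5, 15, 5, 4
    print(penalty_one_word_wide(a, m, t, n))
    print(penalty_wide(a, m, t))
    print(penalty_interleaved(a, m, t, n))
```

Interleaving lands between the other two designs: it hides the repeated access time like the wide memory does, but keeps the narrow bus, so it still pays one transfer per word.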

Reducing Cache Miss Rate
Reducing the number of cache misses
1. Larger block size: lowers compulsory misses but increases miss penalty
2. Higher associativity: increases hit time and clock cycle
3. Yet another cache: victim cache?
4. Pseudo-associative caches
5. Hardware prefetching of instructions and data: two blocks are prefetched, one in the cache and the other in a buffer
6. Compiler-controlled prefetching
7. Compiler optimization

Reducing Cache Miss Penalty
1. Giving priority to read misses over writes
2. Sub-block placement for reduced miss penalty
3. Early restart and critical word first
4. Nonblocking caches to reduce stalls
5. Second-level caches

Reducing Hit Time
1. Small and simple caches
2. Avoiding address translation during indexing
3. Pipelining writes for fast write hits

Alpha Memory Hierarchy
word = 6 bits
block = words = bytes = 6 bits
block offset = 5 bits => byte addressable
High bits are used to select a word out of words in a block
ITLB = entries (pages)
Instruction cache = 8 (index) x 5 (block size) = B
Data cache = 8 (index) x 5 (block size) = B
L2 cache = 9 (index) x 5 (block size) = B
Main memory = x 5 (block size) = B
Disk = (# of pages) x (page size) = B
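The cache sizes above follow the pattern (number of index entries) x (block size). A minimal sketch of that calculation is given below, assuming the "8 (index)" and "5 (block size)" entries are bit counts, that is, exponents of two, which matches the stated 5-bit block offset. The function names, the direct-mapped default, and the sample address are illustrative assumptions.

```python
def cache_capacity_bytes(index_bits: int, block_offset_bits: int, ways: int = 1) -> int:
    """Capacity = ways x 2^index_bits sets x 2^block_offset_bits bytes per block."""
    return ways * (1 << index_bits) * (1 << block_offset_bits)


def split_address(addr: int, index_bits: int, block_offset_bits: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, block offset) fields."""
    offset = addr & ((1 << block_offset_bits) - 1)
    index = (addr >> block_offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (block_offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # Assumed reading of the slide: 8 index bits and a 5-bit block offset.
    print(cache_capacity_bytes(8, 5))
    # Hypothetical byte address, chosen only to show the field split.
    print(split_address(0x12345, 8, 5))
```

Under that reading, an 8-bit index with 5-bit block offsets gives 2^13 bytes for a direct-mapped cache, and the same two bit counts determine where the index and offset fields sit inside an address.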