COSC 6385 Computer Architecture - Memory Hierarchies (III)

Edgar Gabriel, Spring 2014

Memory Technology
Performance metrics:
- Latency: problems are largely handled through caches
- Bandwidth: the main concern for main memory
- Access time: the time between a read request and when the desired word arrives
- Cycle time: the minimum time between unrelated requests to memory
DRAM is mostly used for main memory; SRAM is used for caches
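The distinction between access time and cycle time can be sketched with a small calculation. The numbers below are illustrative (roughly the 2012 row of the DRAM table later in these notes); the 8-byte bus width is an assumption, not a value from the slides.

```python
# Sketch: cycle time, not access time, bounds sustained bandwidth,
# because it limits how often unrelated requests can be issued.
# Assumed values: 8-byte bus, 31 ns cycle time (illustrative only).

def peak_bandwidth_bytes_per_s(bus_width_bytes, cycle_time_ns):
    """One bus-width transfer per memory cycle."""
    return bus_width_bytes / (cycle_time_ns * 1e-9)

bw = peak_bandwidth_bytes_per_s(8, 31)   # 8-byte bus, 31 ns cycle
print(f"{bw / 1e6:.0f} MB/s")            # ~258 MB/s
```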

Memory Technology
Static Random Access Memory (SRAM):
- Requires only low power to retain its bits
- Requires 6 transistors per bit
Dynamic Random Access Memory (DRAM):
- One transistor per bit
- Must be re-written after being read
- Must be refreshed periodically (~ every 8 ms)
  - Refresh can be done for an entire row simultaneously
  - The memory system is unavailable for an entire memory cycle (access time + cycle time)
- Address lines are multiplexed:
  - Upper half of the address: row access strobe (RAS)
  - Lower half of the address: column access strobe (CAS)
Source: http://www.eng.utah.edu/~cs7810/pres/11-7810-12.pdf
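The RAS/CAS multiplexing above can be sketched as splitting one address into two halves that are sent over the same pins. The 24-bit address with a 12/12 row/column split is an assumed geometry for illustration.

```python
# Sketch of DRAM address multiplexing: the same pins carry the row
# address (latched with RAS) and then the column address (latched
# with CAS). A 24-bit address split 12/12 is an assumed geometry.

COL_BITS = 12   # lower half of the address -> CAS

def split_address(addr):
    row = addr >> COL_BITS               # upper half, sent first (RAS)
    col = addr & ((1 << COL_BITS) - 1)   # lower half, sent second (CAS)
    return row, col

row, col = split_address(0xABC123)
print(hex(row), hex(col))   # 0xabc 0x123
```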

Memory Technology
Amdahl's rule of thumb: memory capacity should grow linearly with processor speed
Memory capacity and speed have not kept pace with processors:
- Capacity has increased at ~55% per year
- RAS cycle time has improved at only ~5% per year

Year   Chip size   Slowest RAS (ns)   Fastest RAS (ns)   CAS (ns)   Cycle time (ns)
1980   64 Kbit     180                150                75         250
1989   4 Mbit      100                80                 20         165
1998   128 Mbit    70                 50                 10         100
2010   4 Gbit      36                 28                 1          37
2012   8 Gbit      30                 24                 0.5        31

DRAM optimizations
Dual Inline Memory Module (DIMM):
- Module containing 4-16 DRAM chips
Double data rate (DDR):
- Transfers data on both the rising and the falling edge of the DRAM clock signal
- Doubles the peak data rate
Synchronous DRAM (SDRAM):
- Added a clock to the DRAM interface, removing the need to synchronize with the controller clock on every transfer
- Contains a register to hold the number of bytes requested
- Up to 8 transfers of 16 bits each can be served without sending a new address (burst mode)
- Burst mode often supports critical-word-first
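The DDR doubling can be made concrete with a peak-rate calculation: transfers happen on both clock edges, so the peak rate is 2 x clock x interface width. The 400 MHz clock and 16-bit interface are illustrative values taken from elsewhere in these notes.

```python
# Sketch: why DDR doubles the peak data rate relative to single data
# rate (SDR) at the same clock. Assumed values: 400 MHz, 16-bit width.

def peak_rate_mb_per_s(clock_mhz, width_bits, ddr=True):
    transfers_per_cycle = 2 if ddr else 1   # both edges vs. one edge
    return clock_mhz * transfers_per_cycle * (width_bits // 8)

print(peak_rate_mb_per_s(400, 16, ddr=False))  # 800 MB/s (SDR)
print(peak_rate_mb_per_s(400, 16, ddr=True))   # 1600 MB/s (DDR)
```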

DRAM optimizations (II)
- Multiple accesses to the same row
- Wider interfaces:
  - DDR offered a 4-bit transfer mode
  - DDR2 and DDR3 offer a 16-bit transfer mode
- Multiple banks on each DRAM device:
  - 2-8 banks in DDR3
  - Requires adding another segment to the address: bank number, row address, column address

DDR DRAMs and DIMMs
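The extra address segment that multiple banks require can be sketched as a three-way decomposition. The 3/14/10 bit split (8 banks) is an assumed DDR3-like geometry, not a value from the slides.

```python
# Sketch: with multiple banks, the address decomposes into
# bank number | row address | column address.
# Assumed geometry: 3 bank bits (8 banks), 14 row bits, 10 column bits.

BANK_BITS, ROW_BITS, COL_BITS = 3, 14, 10

def decompose(addr):
    col  = addr & ((1 << COL_BITS) - 1)
    row  = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col

bank, row, col = decompose(0b101_10000000000001_0000000011)
print(bank, row, col)   # 5 8193 3
```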

Memory Technology
DDR generations:
- DDR2: lower power (2.5 V -> 1.8 V), higher clock rates (266 MHz, 333 MHz, 400 MHz)
- DDR3: 1.5 V, 800 MHz
- DDR4: 1-1.2 V, 1600 MHz
- GDDR5 is graphics memory based on DDR3

Memory Optimizations
Graphics memory:
- Achieves 2-5x the bandwidth per DRAM of DDR3
  - Wider interfaces (32 bits vs. 16 bits)
  - Higher clock rate
  - Possible because the chips are attached by soldering instead of via socketed DIMM modules
Reducing power in SDRAMs:
- Lower voltage
- Low-power mode (ignores the clock, but continues to refresh)

Flash Memory
- A type of Electrically Erasable Programmable Read-Only Memory (EEPROM)
- Holds its contents without power
- Cheaper than SDRAM, more expensive than disk
- Slower than SDRAM, faster than disk
- Must be erased (in blocks) before being overwritten
- Limited number of write cycles

Memory Dependability
Electronic circuits are susceptible to cosmic rays
For SDRAM:
- Soft errors: dynamic errors
  - Detected and fixed by error-correcting codes (ECC)
  - One parity bit per ~8 data bits
- Hard errors: permanent hardware errors
  - DIMMs often contain spare rows to replace defective rows
- Chipkill: a RAID-like error recovery technique
  - The failure of an entire chip can be handled
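The simplest building block of the ECC scheme mentioned above, one parity bit protecting 8 data bits, can be sketched directly. Real ECC DIMMs use stronger SECDED codes; plain parity only detects a single flipped bit, it cannot correct it.

```python
# Sketch: one even-parity bit over 8 data bits. The stored parity bit
# makes the total number of 1s even; a single bit flip is detected
# (but not corrected) because the recomputed parity no longer matches.

def parity_bit(byte):
    return bin(byte).count("1") % 2

def detect_error(byte, stored_parity):
    return parity_bit(byte) != stored_parity

data = 0b1011_0010                  # 4 ones -> even parity bit is 0
p = parity_bit(data)
print(p)                            # 0
print(detect_error(data ^ 0b1, p))  # True: a single-bit flip is caught
```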

Intel Core i7 memory hierarchy
- 48-bit virtual address
- 36-bit physical address
- 2-level TLB caches
- 4 KB page size = 2^12 bytes
- Virtual address layout: bits 47-12 form the page frame, bits 11-0 the page offset

Characteristic    Instruction TLB   Data TLB      Second-level TLB
Size              128 entries       64 entries    512 entries
Associativity     4-way             4-way         4-way
Access latency    1 cycle           1 cycle       6 cycles
Miss penalty      7 cycles          7 cycles      100s of cycles

Intel Core i7 memory hierarchy (II)

Characteristic      L1 Instruction   L1 Data    L2          L3
Size                32 KB            32 KB      256 KB      2 MB per core
Associativity       4-way            8-way      8-way       16-way
Access latency      4 cycles         4 cycles   10 cycles   35 cycles
No. of index bits   7                6          9           13*
*Assuming a 4-core processor

- L1 and L2 are separate per core; L3 is shared among all cores
- L1 caches are virtually indexed but physically tagged
- L2 and L3 are physically indexed and tagged
- Non-blocking caches
- Merging write buffer for the L1 caches
- L3 is inclusive of the L1 and L2 caches
- Cache block size: 64 bytes => 6 bits required for the block offset
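The "No. of index bits" row in the table follows from the other rows: index bits = log2(cache size / (associativity x block size)). The calculation below reproduces it, using the footnote's 4-core assumption for the shared L3 (4 x 2 MB = 8 MB).

```python
# Sketch: deriving the i7 index-bit counts from size, associativity,
# and the 64-byte block size given in the text.

BLOCK = 64  # bytes

def index_bits(size_bytes, assoc, block=BLOCK):
    sets = size_bytes // (assoc * block)
    return sets.bit_length() - 1   # log2, exact for powers of two

print(index_bits(32 * 1024, 4))         # L1 I: 7
print(index_bits(32 * 1024, 8))         # L1 D: 6
print(index_bits(256 * 1024, 8))        # L2:   9
print(index_bits(4 * 2 * 1024**2, 16))  # L3:  13 (4 cores x 2 MB)
```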

Accessing the Instruction TLB
Given a PC (= virtual address):
- Send the page frame of the virtual address to the instruction TLB to retrieve the physical address
- The TLB provides the physical address if found, and checks for access violations
- If not found in the instruction TLB, the 2nd-level TLB is checked
- If not found in the 2nd-level TLB, the operating system has to perform the translation
  - The full page table can be very large and might itself be swapped out to disk -> another translation step is required to load the corresponding part of the page table

Accessing the Instruction Cache
To identify an address in the L1 instruction cache:
- Index field of the virtual address: 7 bits + 2 bits from the block offset (the i7 always loads 16 bytes per instruction request)
- Cache tag from the physical address: 23 bits = 36 bits physical address - 7 bits index - 6 bits block offset
The 2nd-level cache is physically indexed and physically tagged:
- The 36-bit physical address is decomposed into a 6-bit block offset, a 9-bit index, and a 21-bit tag
- Block address = [ Tag | Index ]; full address = [ Tag | Index | Block offset ]
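The L2 decomposition above can be sketched as bit-field extraction on a 36-bit physical address. The 21/9/6 split comes straight from the text; the example address is made up.

```python
# Sketch: splitting the i7's 36-bit physical address into the L2
# cache's tag, index, and block-offset fields (21 + 9 + 6 = 36 bits).

TAG_BITS, INDEX_BITS, OFFSET_BITS = 21, 9, 6

def l2_fields(paddr):
    offset = paddr & ((1 << OFFSET_BITS) - 1)
    index  = (paddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = paddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = l2_fields(0x123456789)  # a made-up 36-bit address
print(tag, index, offset)
```

Reassembling the three fields with shifts in the opposite order recovers the original address, which is a convenient sanity check on the split.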
