CS 152 Computer Architecture and Engineering. Lecture 6 - Memory


Krste Asanovic
Electrical Engineering and Computer Sciences, University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

Last time in Lecture 5
- Control hazards (branches, interrupts) are most difficult to handle as they change which instruction should be executed next
- Speculation commonly used to reduce effect of control hazards (predict sequential fetch, predict no exceptions)
- Branch delay slots make control hazard visible to software
- Precise exceptions: stop cleanly on one instruction, all previous instructions completed, no following instructions have changed architectural state
- To implement precise exceptions in pipeline, shift faulting instructions down pipeline to commit point, where exceptions are handled in program order

CPU-Memory Bottleneck
[Figure: CPU connected to Memory]
Performance of high-speed computers is usually limited by memory bandwidth and latency
- Latency (time for a single access): memory access time >> processor cycle time
- Bandwidth (number of accesses per unit time): if fraction m of instructions access memory
  => 1+m memory references / instruction
  => CPI = 1 requires 1+m memory refs / cycle (assuming MIPS RISC ISA)

Core Memory
- Core memory was first large-scale reliable main memory, invented by Forrester in late 40s/early 50s at MIT for Whirlwind project
- Bits stored as magnetization polarity on small ferrite cores threaded onto 2-dimensional grid of wires
- Coincident current pulses on X and Y wires would write cell and also sense original state (destructive reads)
- Robust, non-volatile storage
- Used on space shuttle computers until recently
- Cores threaded onto wires by hand (25 billion a year at peak production)
- Core access time ~ 1µs
[Photo: DEC PDP-8/E board, 4K words x 12 bits (1968)]
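The bandwidth requirement on the CPU-Memory Bottleneck slide can be checked with a few lines of Python. This is only a sketch: the function name and the 30% load/store fraction are illustrative assumptions, not values from the lecture.

```python
def memory_refs_per_cycle(m: float, cpi: float = 1.0) -> float:
    """Memory references needed per cycle.

    Each instruction needs one instruction fetch plus, for a fraction m
    of instructions, one data access: 1 + m references per instruction.
    """
    refs_per_instruction = 1.0 + m
    return refs_per_instruction / cpi


# Hypothetical instruction mix: 30% of instructions are loads or stores.
print(memory_refs_per_cycle(0.3))       # demand at CPI = 1
print(memory_refs_per_cycle(0.3, 2.0))  # a slower pipeline (CPI = 2) halves the demand
```

The point of the slide is the first case: sustaining CPI = 1 means the memory system must deliver more than one reference every cycle.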

Semiconductor Memory, DRAM
- Semiconductor memory began to be competitive in early 1970s
  - Intel formed to exploit market for semiconductor memory
- First commercial DRAM was Intel 1103
  - 1Kbit of storage on single chip
  - charge on a capacitor used to hold value
- Semiconductor memory quickly replaced core in 70s

One Transistor Dynamic RAM
[Figure: 1-T DRAM cell. Schematic labels: word line, bit line, access transistor, storage capacitor (FET gate, trench, stack), V_REF. Cross-section labels: TiN top electrode (V_REF), Ta2O5 dielectric, poly word line, W bottom electrode, access transistor]

DRAM Architecture
[Figure: 2-dimensional array of memory cells (one bit each). Of the N+M address bits, N go to the Row Address Decoder, which drives word lines Row 1 to Row 2^N; M go to the Column Decoder & Sense Amplifiers, which select among bit lines Col. 1 to Col. 2^M; data D]
- Bits stored in 2-dimensional arrays on chip
- Modern chips have around 4 logical banks on each chip
  - each logical bank physically implemented as many smaller arrays

DRAM Packaging
[Figure: DRAM chip with ~7 clock and control signals, ~12 address lines (multiplexed row/column address), and a data bus (4b, 8b, 16b, 32b)]
- DIMM (Dual Inline Memory Module) contains multiple chips with clock/control/address signals connected in parallel (sometimes need buffers to drive signals to all chips)
- Data pins work together to return wide word (e.g., 64-bit data bus using 16x4-bit parts)

DRAM Operation
Three steps in read/write access to a given bank:
- Row access (RAS)
  - decode row address, enable addressed row (often multiple Kb in row)
  - bitlines share charge with storage cell
  - small change in voltage detected by sense amplifiers which latch whole row of bits
  - sense amplifiers drive bitlines full rail to recharge storage cells
- Column access (CAS)
  - decode column address to select small number of sense amplifier latches (4, 8, 16, or 32 bits depending on DRAM package)
  - on read, send latched bits out to chip pins
  - on write, change sense amplifier latches which then charge storage cells to required value
  - can perform multiple column accesses on same row without another row access (burst mode)
- Precharge
  - charges bit lines to known value, required before next row access
Each step has a latency of around 15-20ns in modern DRAMs.
Various DRAM standards (DDR, RDRAM) have different ways of encoding the signals for transmission to the DRAM, but all share same core architecture.

Double-Data Rate (DDR2) DRAM
[Figure: timing diagram from the Micron 256Mb DDR2 SDRAM datasheet, showing Row, Column, Precharge, and Row commands on a 200MHz clock with a 400Mb/s data rate]
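The three DRAM access steps can be turned into a rough latency model. The sketch below assumes an open-row policy (the row stays latched in the sense amplifiers until a different row is needed) and uses 15ns per step from the low end of the range quoted on the slide; the class name and the policy are assumptions for illustration, not something the lecture prescribes.

```python
STEP_NS = 15.0  # assumed latency per step (slide gives roughly 15-20ns)


class DramBank:
    """Toy latency model of one DRAM bank under an assumed open-row policy."""

    def __init__(self, t_row=STEP_NS, t_col=STEP_NS, t_pre=STEP_NS):
        self.t_row, self.t_col, self.t_pre = t_row, t_col, t_pre
        self.open_row = None  # row currently held in the sense amplifiers

    def access(self, row: int) -> float:
        """Return the latency in ns of one column access to `row`."""
        if self.open_row == row:
            return self.t_col              # row already latched: column access only
        latency = 0.0
        if self.open_row is not None:
            latency += self.t_pre          # precharge before the next row access
        latency += self.t_row + self.t_col  # row access, then column access
        self.open_row = row
        return latency


bank = DramBank()
for row in (3, 3, 7):
    print(row, bank.access(row))
```

With these assumed timings, a repeated access to the open row costs one step, a first access costs two, and switching rows costs all three, which is why burst accesses within a row are so much cheaper.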

Processor-DRAM Gap (latency)
[Figure: performance (log scale, 1 to 1000) versus time, 1980 to 2000. CPU curve: µProc 60%/year ("Moore's Law"). DRAM curve: 7%/year. Processor-Memory Performance Gap grows 50% / year]
Four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during time for one memory access!

Typical Memory Reference Patterns
[Figure: address versus time, showing instruction fetches, stack accesses, and data accesses]
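Both numbers on the Processor-DRAM Gap slide follow from simple arithmetic, sketched here in Python. The function names are illustrative; the inputs (four-issue, 2GHz, 100ns, 60%/year, 7%/year) are the ones quoted on the slide.

```python
def instructions_during_access(issue_width: int, clock_ghz: int, dram_ns: int) -> int:
    """Instruction slots that pass while one DRAM access is outstanding."""
    cycles = clock_ghz * dram_ns  # GHz x ns = cycles
    return issue_width * cycles


def gap_growth_per_year(cpu_rate: float, dram_rate: float) -> float:
    """Yearly growth of the CPU/DRAM performance ratio."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate) - 1.0


print(instructions_during_access(4, 2, 100))       # 4 x 200 cycles = 800
print(round(gap_growth_per_year(0.60, 0.07), 3))   # roughly 0.5, i.e. about 50% per year
```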

Common Predictable Patterns
Two predictable properties of memory references:
- Temporal Locality: If a location is referenced it is likely to be referenced again in the near future.
- Spatial Locality: If a location is referenced it is likely that locations near it will be referenced in the near future.

Memory Reference Patterns
[Figure: memory address versus time (one dot per access), with regions of spatial locality and temporal locality marked]
Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

Multilevel Memory
Strategy: Reduce average latency using small, fast memories called caches.
Caches are a mechanism to reduce memory latency based on the empirical observation that the patterns of memory references made by a processor are often highly predictable:

    PC
     96
    100   loop: ADD r2, r1, r1
    104         SUBI r3, r3, #1
    108         BNEZ r3, loop
    112

Memory Hierarchy
[Figure: CPU <-> A: Small, Fast Memory (RF, SRAM), holds frequently used data <-> B: Big, Slow Memory (DRAM)]
- capacity: Register << SRAM << DRAM (why?)
- latency: Register << SRAM << DRAM (why?)
- bandwidth: on-chip >> off-chip (why?)
On a data access:
- hit (data ∈ fast memory) => low latency access
- miss (data ∉ fast memory) => long latency access (DRAM)
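The hit/miss cases above combine into an average latency. The slides do not give a formula here; the sketch below uses the standard average memory access time expression (hit time plus miss rate times miss penalty), with a hypothetical 1ns fast memory and the 100ns DRAM figure quoted earlier in the lecture.

```python
def average_access_time(hit_time_ns: float, miss_penalty_ns: float, hit_rate: float) -> float:
    """Average latency: every access pays the hit time, misses also pay the penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed numbers: 1ns fast memory, 100ns extra for a DRAM access.
for hit_rate in (0.0, 0.90, 0.99):
    print(hit_rate, round(average_access_time(1.0, 100.0, hit_rate), 2))
```

Even with these made-up numbers the trend is the point: the average latency approaches that of the small, fast memory only when most references hit, which is what locality makes possible.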

Relative Memory Cell Sizes
[Figure: on-chip SRAM in logic chip versus DRAM on memory chip; from Foss, "Implementing Application-Specific Memory", ISSCC 1996]

Management of Memory Hierarchy
- Small/fast storage, e.g., registers
  - Address usually specified in instruction
  - Generally implemented directly as a register file
    - but hardware might do things behind software's back, e.g., stack management, register renaming
- Large/slower storage, e.g., memory
  - Address usually computed from values in register
  - Generally implemented as a cache hierarchy
    - hardware decides what is kept in fast memory
    - but software may provide hints, e.g., don't cache or prefetch

CS152 Administrivia
- Quiz 1 Thursday in class (306 Soda)
  - Lectures 1-5, closed book, no calculators or computers
- Krste, special office hours, Wednesday 2/11, 2-3pm, 579 Soda Hall (Par Lab)
- Scott special office hours, Wednesday 2/11, 4-5pm, 711 Soda Hall
- Next week lecture 2/17 back in 320 Soda

Caches
Caches exploit both types of predictability:
- Exploit temporal locality by remembering the contents of recently accessed locations.
- Exploit spatial locality by fetching blocks of data around recently accessed locations.

Inside a Cache
[Figure: Processor <-> CACHE <-> Main Memory, with address and data paths on each side. Each cache line holds an address tag and a data block made of several data bytes, e.g. a copy of main memory location 100 and a copy of main memory location 101 under tag 100. Other example tags shown: 304, 6848, 416]

Cache Algorithm (Read)
Look at Processor Address, search cache tags to find match. Then either:
- Found in cache, a.k.a. HIT: return copy of data from cache
- Not in cache, a.k.a. MISS: read block of data from Main Memory, wait, then return data to processor and update cache
Q: Which line do we replace?
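The read algorithm above can be sketched as a tiny software model. Everything named here (TinyCache, the 4-byte block size, the two-line capacity) is a hypothetical choice for illustration, and the victim selection is a deliberate placeholder, since the slide leaves the replacement question open.

```python
BLOCK_SIZE = 4  # bytes per block (assumed for the example)


class TinyCache:
    """Toy cache model: any block may occupy any line."""

    def __init__(self, memory: bytes, num_lines: int = 2):
        self.memory = memory
        self.num_lines = num_lines
        self.lines = {}  # tag (block number) -> block data, in fill order
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        tag, offset = divmod(address, BLOCK_SIZE)
        if tag in self.lines:
            self.hits += 1                       # HIT: data comes from the cache
        else:
            self.misses += 1                     # MISS: fetch block from main memory
            if len(self.lines) >= self.num_lines:
                victim = next(iter(self.lines))  # placeholder: evict oldest-filled line
                del self.lines[victim]
            start = tag * BLOCK_SIZE
            self.lines[tag] = self.memory[start:start + BLOCK_SIZE]
        return self.lines[tag][offset]


cache = TinyCache(bytes(range(64)))
values = [cache.read(a) for a in (5, 6, 40, 12, 5)]
print(values, cache.hits, cache.misses)
```

The second read (address 6) hits because address 5 already brought in its whole block; the final read of address 5 misses again because its block was evicted in the meantime.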

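Placement Policy
[Figure: memory with block numbers 0 to 31; an 8-block cache shown as one set (fully associative), as sets 0 to 3 (2-way set associative), and as blocks 0 to 7 (direct mapped)]
Memory block 12 can be placed:
- Fully associative: anywhere
- (2-way) set associative: anywhere in set 0 (12 mod 4)
- Direct mapped: only into block 4 (12 mod 8)

Direct-Mapped Cache
[Figure: address split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects one of 2^k lines, each holding a valid bit V, a Tag, and a Data Block. The stored tag is compared (=) with the address tag to produce HIT; the block offset selects the Data Word or Byte]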
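The placement rules and the tag/index/offset split can be reproduced in a few lines. This is a sketch: the function names are made up, and numbering cache lines as set_index * ways + way is an assumed convention used only to list the candidate lines.

```python
def candidate_lines(block_number: int, num_lines: int, ways: int) -> list:
    """Cache lines that may hold a given memory block.

    ways = 1 is direct mapped, ways = num_lines is fully associative.
    """
    num_sets = num_lines // ways
    set_index = block_number % num_sets
    return list(range(set_index * ways, (set_index + 1) * ways))


def split_address(address: int, k: int, b: int) -> tuple:
    """Split an address into (tag, index, offset) with k index bits and b offset bits."""
    offset = address & ((1 << b) - 1)
    index = (address >> b) & ((1 << k) - 1)
    tag = address >> (b + k)
    return tag, index, offset


# Block 12 in an 8-line cache, as in the Placement Policy slide.
print(candidate_lines(12, 8, ways=1))  # direct mapped: block 4 only
print(candidate_lines(12, 8, ways=2))  # 2-way: the two lines of set 0
print(candidate_lines(12, 8, ways=8))  # fully associative: any line
print(split_address(0x1234, k=4, b=4))
```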

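Direct Map Address Selection
higher-order vs. lower-order address bits
[Figure: direct-mapped cache with the Index (k bits) taken from the higher-order address bits, followed by the Tag (t bits) and the Block Offset (b bits). 2^k lines, each with V, Tag, and Data Block; tag compare (=) produces HIT; Data Word or Byte]

2-Way Set-Associative Cache
[Figure: address split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects a set of two lines, each with V, Tag, and Data Block; both tags are compared (=) in parallel; the matching way supplies the Data Word or Byte; HIT]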

Fully Associative Cache
[Figure: address split into Tag (t bits) and Block Offset (b bits). Every line (V, Tag, Data Block) has its own comparator (=); the matching line supplies the Data Word or Byte; HIT]

Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
- Random
- Least Recently Used (LRU)
  - LRU cache state must be updated on every access
  - true implementation only feasible for small sets (2-way)
  - pseudo-LRU binary tree often used for 4-8 way
- First In, First Out (FIFO) a.k.a. Round-Robin
  - used in highly associative caches
- Not Least Recently Used (NLRU)
  - FIFO with exception for most recently used block or blocks
This is a second-order effect. Why?
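To see how two of these policies differ, the sketch below counts hits for a trace of block tags that all map to one set. The function name and the trace are hypothetical; only the LRU and FIFO rules themselves come from the slide.

```python
from collections import OrderedDict


def count_hits(trace, ways: int, policy: str) -> int:
    """Count hits for accesses to a single set under 'LRU' or 'FIFO' replacement."""
    if policy not in ("LRU", "FIFO"):
        raise ValueError("policy must be 'LRU' or 'FIFO'")
    lines = OrderedDict()  # block tags, oldest entry first
    hits = 0
    for tag in trace:
        if tag in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(tag)      # LRU state is updated on every access
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the entry at the front
            lines[tag] = True
    return hits


trace = [1, 2, 1, 3, 1, 4, 1]  # made-up trace that keeps reusing block 1
print(count_hits(trace, ways=2, policy="LRU"))   # 3
print(count_hits(trace, ways=2, policy="FIFO"))  # 2
```

On this trace LRU keeps the frequently reused block resident while FIFO evicts it once it becomes the oldest fill. Note that the only difference in the code is the extra state update on a hit, which is the cost the slide attributes to LRU.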

Acknowledgements
These slides contain material developed and copyright by:
- Arvind (MIT)
- Krste Asanovic (MIT/UCB)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- David Patterson (UCB)
MIT material derived from course 6.823
UCB material derived from course CS252