CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II


CS 152 Computer Architecture and Engineering
Lecture 7 - Memory Hierarchy-II
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

Last time in Lecture 6
Dynamic RAM (DRAM) is the main form of main memory storage in use today. It holds values on small capacitors, which need refreshing (hence "dynamic"), and access is a slow multi-step process: precharge, read row, read column. Static RAM (SRAM) is faster but more expensive, and is used to build on-chip memory for caches. Caches exploit two forms of predictability in memory reference streams: temporal locality (the same location is likely to be accessed again soon) and spatial locality (neighboring locations are likely to be accessed soon). A cache holds a small set of values in fast memory (SRAM) close to the processor. We need a search scheme to find values in the cache, and a replacement policy to make space for newly accessed locations.

2/17/2009 CS152-Spring'09
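To make the locality argument concrete, here is a toy illustration (mine, not from the slides) of why a sequential array scan benefits from spatial locality: with multi-word cache lines, only the first access to each line misses.

```python
def scan_misses(num_words, words_per_line):
    """Count cold misses for a sequential scan, one access per word."""
    lines_seen = set()
    misses = 0
    for addr in range(num_words):
        line = addr // words_per_line  # neighboring words share a line
        if line not in lines_seen:
            misses += 1
            lines_seen.add(line)
    return misses

# 1024 sequential word accesses with 8-word lines: 1024/8 = 128 misses,
# i.e. an 87.5% hit rate from spatial locality alone.
print(scan_misses(1024, 8))
```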

Placement Policy
Where in an 8-block cache can memory block 12 (of blocks 0-31) be placed?
- Fully associative: anywhere
- (2-way) set associative: anywhere in set 0 (12 mod 4)
- Direct mapped: only into block 4 (12 mod 8)

Direct-Mapped Cache
[Diagram: the address is split into tag, index (k bits), and block offset (b bits); the index selects one of 2^k cache lines, each holding a valid bit (V), a tag, and a data block; a tag match on a valid line signals HIT, and the offset selects the word or byte.]
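The placement rule on this slide can be written down directly: with S sets, block B may only be placed in set B mod S. A minimal sketch (the function name is mine):

```python
def candidate_set(block, num_blocks, ways):
    """Return the set index a memory block maps to, given the total number
    of cache blocks and the associativity (ways per set)."""
    num_sets = num_blocks // ways
    return block % num_sets

# Direct-mapped: 8 sets of 1 block -> block 12 goes only in block 12 mod 8 = 4
print(candidate_set(12, 8, 1))  # 4
# 2-way set-associative: 4 sets -> block 12 goes anywhere in set 12 mod 4 = 0
print(candidate_set(12, 8, 2))  # 0
# Fully associative: a single set -> the block can go anywhere
print(candidate_set(12, 8, 8))  # 0
```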

2-Way Set-Associative Cache
[Diagram: the index selects a set containing two ways; both ways' tags are compared against the address tag in parallel, and a match in either way signals HIT and selects that way's data block (word or byte via the offset).]

Fully Associative Cache
[Diagram: there is no index field; every cache line's tag is compared against the address tag simultaneously, and any match on a valid line signals HIT. The address holds only a tag and a block offset (b bits).]
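The address fields used in these diagrams can be extracted with shifts and masks. A sketch, with illustrative field widths that are my assumption rather than the lecture's:

```python
def split_address(addr, k, b):
    """Split a byte address into (tag, index, offset) for a cache with
    2^k sets and 2^b-byte lines, as in the lookup diagrams."""
    offset = addr & ((1 << b) - 1)          # low b bits select the byte
    index = (addr >> b) & ((1 << k) - 1)    # next k bits select the set
    tag = addr >> (k + b)                   # remaining high bits are the tag
    return tag, index, offset

# Example: 64-byte lines (b=6), 128 sets (k=7)
print(split_address(0x12345, 7, 6))  # (9, 13, 5)
```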

Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
- Random
- Least Recently Used (LRU): LRU cache state must be updated on every access; a true implementation is only feasible for small sets (2-way), and a pseudo-LRU binary tree is often used for 4-8 ways
- First In, First Out (FIFO), a.k.a. Round-Robin: used in highly associative caches
- Not Least Recently Used (NLRU): FIFO with an exception for the most recently used block or blocks
This is a second-order effect. Why? Replacement only happens on misses.

Block Size and Spatial Locality
A block is the unit of transfer between the cache and memory (e.g., a 4-word block Word0 Word1 Word2 Word3, with b=2). The CPU address is split into a block address (32-b bits) and an offset (b bits); 2^b = block size, a.k.a. line size (in bytes). Larger block size has distinct hardware advantages: less tag overhead, and it exploits fast burst transfers from DRAM and over wide busses. What are the disadvantages of increasing block size? Fewer blocks => more conflicts. It can also waste bandwidth.
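The LRU policy above can be sketched for a single set with an ordered dictionary as the recency stack. This is an illustrative software model, not a hardware description; the per-access bookkeeping in `access` is exactly why the slide says true LRU is only practical for small sets.

```python
from collections import OrderedDict

class LRUSet:
    """One set of an associative cache with true LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # ordered by recency, oldest first

    def access(self, tag):
        """Return True on a hit; on a miss, insert the block, evicting
        the least recently used one if the set is full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # update recency on EVERY access
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False

s = LRUSet(2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)  # [False, False, True, False, False]: C evicts B, then B evicts A
```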

CPU-Cache Interaction (5-stage pipeline)
[Diagram: the PC indexes the primary instruction cache; on a hit the instruction flows into the IR for decode and register fetch, then through the ALU to the primary data cache. The entire CPU stalls on a data cache miss while the cache controller refills from lower levels of the memory hierarchy.]

Improving Cache Performance
Average memory access time = Hit time + Miss rate x Miss penalty
To improve performance: reduce the hit time, reduce the miss rate, or reduce the miss penalty.
What is the simplest design strategy? The biggest cache that doesn't increase hit time past 1-2 cycles (approx. 8-32KB in modern technology). [Design issues are more complex with out-of-order superscalar processors.]
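Plugging illustrative numbers (my assumptions, not the lecture's) into the formula above shows how the three levers trade off:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# 1-cycle hit, 5% miss rate, 40-cycle miss penalty -> 3.0 cycles on average
print(amat(1, 0.05, 40))
# Halving the miss rate helps more here than halving the hit time:
print(amat(1, 0.025, 40))   # 2.0
print(amat(0.5, 0.05, 40))  # 2.5
```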

Serial- versus Parallel Cache and Memory Access
Let α be the HIT RATIO: the fraction of references found in the cache; 1 - α is the MISS RATIO: the remaining references.
Serial search (processor accesses the cache first, then main memory only on a miss): average access time = t_cache + (1 - α) t_mem.
Parallel search (processor accesses cache and main memory simultaneously): average access time = α t_cache + (1 - α) t_mem.
The savings are usually small, since t_mem >> t_cache and the hit ratio α is high. Parallel access requires high bandwidth on the memory path, and the complexity of handling parallel paths can slow the cache.

Causes for Cache Misses
- Compulsory: first reference to a block (a.k.a. cold-start misses) - misses that would occur even with an infinite cache
- Capacity: the cache is too small to hold all the data needed by the program - misses that would occur even under a perfect replacement policy
- Conflict: misses that occur because of collisions due to the block-placement strategy - misses that would not occur with full associativity

Effect of Cache Parameters on Performance
- Larger cache size: + reduces capacity and conflict misses; - hit time will increase
- Higher associativity: + reduces conflict misses; - may increase hit time
- Larger block size: + reduces compulsory and capacity (reload) misses; - increases conflict misses and miss penalty

Write Policy Choices
On a cache hit:
- write-through: write both cache and memory; generally higher traffic, but simplifies cache coherence
- write-back: write cache only (memory is written only when the entry is evicted); a dirty bit per block can further reduce the traffic
On a cache miss:
- no write allocate: only write to main memory
- write allocate (a.k.a. fetch on write): fetch the block into the cache
Common combinations: write-through with no write allocate, and write-back with write allocate.
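A sketch (not from the slides) contrasting the memory traffic of the two common combinations, counting words sent to memory for a stream of word writes that all hit a resident line:

```python
def write_through_traffic(write_addrs):
    """Write-through, no write allocate: every write goes to memory."""
    return len(write_addrs)

def write_back_traffic(write_addrs, line_words=4):
    """Write-back with write allocate: each dirty line is written back to
    memory once, at eviction (simplified: no capacity evictions here)."""
    dirty_lines = {addr // line_words for addr in write_addrs}
    return len(dirty_lines) * line_words

# Eight writes that repeatedly touch the same 4-word line:
stream = [0, 1, 2, 3, 0, 1, 2, 3]
print(write_through_traffic(stream))  # 8 words sent to memory
print(write_back_traffic(stream))     # 4 words (one line writeback)
```

This illustrates the slide's point: the dirty bit lets write-back coalesce repeated writes to a line into a single writeback, at the cost of more complex coherence.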

Write Performance
[Diagram: as in the read path, the index (k bits) selects one of 2^k lines and the tag comparison signals HIT, but the data write is gated by a write enable (WE), so the tag check must complete before the data array is written.]

Reducing Write Hit Time
Problem: writes take two cycles in the memory stage, one cycle for the tag check plus one cycle for the data write if hit.
Solutions:
- Design a data RAM that can perform read and write in one cycle; restore the old value after a tag miss
- Fully-associative (CAM tag) caches: the word line is only enabled if hit
- Pipelined writes: hold the write data for a store in a single buffer ahead of the cache, and write the cache data during the next store's tag check

CS152 Administrivia

Pipelining Writes
[Diagram: the address and store data from the CPU feed the tag and data arrays; store data is held in a delayed write buffer (with its delayed write address) and a mux selects between the load path and the buffered store on a subsequent access.]
Data from a store hit is written into the data portion of the cache during the tag access of the subsequent store.

Write Buffer to Reduce Read Miss Penalty
[Diagram: the CPU and register file access the L1 data cache; evicted dirty lines (for a write-back cache) or all writes (for a write-through cache) pass through a write buffer on their way to the unified L2 cache.]
The processor is not stalled on writes, and read misses can go ahead of writes to main memory.
Problem: the write buffer may hold the updated value of a location needed by a read miss.
- Simple scheme: on a read miss, wait for the write buffer to go empty
- Faster scheme: check the write buffer addresses against the read miss address; if there is no match, allow the read miss to go ahead of the writes; else, return the value in the write buffer

Block-level Optimizations
Tags are too large, i.e., too much overhead. Simple solution: larger blocks, but the miss penalty could be large.
Sub-block placement (a.k.a. sector cache):
- A valid bit is added to units smaller than the full block, called sub-blocks
- Only read a sub-block on a miss
- If a tag matches, is the word in the cache?
[Diagram: tags 100, 300, 204, each with per-sub-block valid bits: 1 1 1 1 / 1 1 0 0 / 0 1 0 1.]
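The "faster scheme" above can be sketched as an address match against the pending writes; when two buffered writes hit the same address, the newest one must win. This is an illustrative software model (entry format and names are mine):

```python
def read_miss_check(addr, write_buffer):
    """write_buffer is a list of (address, data) pairs, oldest first.
    Return the buffered data if a pending write matches the read miss
    address, else None (the read miss may bypass the buffered writes)."""
    for pending_addr, data in reversed(write_buffer):  # newest match wins
        if pending_addr == addr:
            return data
    return None

wb = [(0x100, 7), (0x200, 9), (0x100, 8)]  # two pending writes to 0x100
print(read_miss_check(0x100, wb))  # 8: the newest buffered value
print(read_miss_check(0x300, wb))  # None: read can go ahead of the writes
```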

Set-Associative RAM-Tag Cache
[Diagram: the index and offset select a set; each way has a status/tag RAM and a data RAM, with a tag comparator (=?) per way.]
Not energy-efficient: a tag and data word is read from every way.
Two-phase approach: first read the tags, then read data only from the selected way. More energy-efficient, but doubles latency in L1. OK for L2 and above - why?

Multilevel Caches
A memory cannot be both large and fast, so cache sizes increase at each level: CPU -> L1$ -> L2$ -> DRAM.
- Local miss rate = misses in cache / accesses to cache
- Global miss rate = misses in cache / CPU memory accesses
- Misses per instruction = misses in cache / number of instructions
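Worked numbers for the local/global distinction, with made-up counts: suppose the CPU makes 1000 memory accesses, L1 misses 50 of them, and L2 (which sees only those 50) misses 10.

```python
cpu_accesses = 1000
l1_misses, l2_misses = 50, 10

l1_local = l1_misses / cpu_accesses   # L1 is accessed by every reference
l2_local = l2_misses / l1_misses      # L2 is accessed only on L1 misses
l2_global = l2_misses / cpu_accesses  # L2 misses per CPU access

print(l1_local, l2_local, l2_global)  # 0.05 0.2 0.01
# The L2 global miss rate is the product of the local miss rates:
# 0.05 * 0.2 = 0.01, i.e. only 1% of references go all the way to DRAM.
```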

A Typical Memory Hierarchy c. 2008
[Diagram: CPU with a multiported register file; split L1 instruction & data primary caches (on-chip SRAM); a large unified secondary L2 cache (on-chip SRAM); multiple interleaved memory banks (off-chip DRAM).]

Presence of L2 Influences L1 Design
- Use a smaller L1 if there is also an L2: trade an increased L1 miss rate for reduced L1 hit time and reduced L1 miss penalty; this also reduces average access energy
- Use a simpler write-through L1 with an on-chip L2:
  - the write-back L2 cache absorbs write traffic, so it doesn't go off-chip
  - at most one L1 miss request per L1 access (no dirty victim write-back), which simplifies pipeline control
  - simplifies coherence issues
  - simplifies error recovery in L1 (can use just parity bits in L1 and reload from L2 when a parity error is detected on an L1 read)

Inclusion Policy
- Inclusive multilevel cache: the inner cache holds copies of data in the outer cache; an external access need only check the outer cache; the most common case
- Exclusive multilevel caches: the inner cache may hold data not in the outer cache; lines are swapped between inner/outer caches on a miss; used in the AMD Athlon, with a 64KB primary and a 256KB secondary cache
Why choose one type or the other?

Itanium-2 On-Chip Caches (Intel/HP, 2002)
- Level 1: 16KB, 4-way set-associative, 64B line, quad-port (2 load + 2 store), single-cycle latency
- Level 2: 256KB, 4-way set-associative, 128B line, quad-port (4 load or 4 store), five-cycle latency
- Level 3: 3MB, 12-way set-associative, 128B line, single 32B port, twelve-cycle latency

Acknowledgements
These slides contain material developed and copyright by: Arvind (MIT), Krste Asanovic (MIT/UCB), Joel Emer (Intel/MIT), James Hoe (CMU), John Kubiatowicz (UCB), and David Patterson (UCB). MIT material derived from course 6.823; UCB material derived from course CS252.