CS 61C: Great Ideas in Computer Architecture: Direct-Mapped Caches

CS 61C: Great Ideas in Computer Architecture: Direct-Mapped Caches
9/27/12
Instructors: Krste Asanovic, Randy H. Katz
http://inst.eecs.berkeley.edu/~cs61c/fa12

New-School Machine Structures (It's a bit more complicated!)
- Parallel Requests: assigned to a computer, e.g., Search "Katz" (Warehouse Scale Computer)
- Parallel Threads: assigned to a core, e.g., Lookup, Ads
- Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
- Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1, A2+B2, A3+B3)
- Hardware descriptions: all gates @ one time
- Programming languages harness parallelism and achieve high performance
- Levels of hardware: Warehouse Scale Computer, Computer, Smart Phone, Core, Input/Output, Instruction Unit(s), Functional Unit(s), Logic Gates
- Today's Lecture: Memory

Big Idea: Memory Hierarchy
- Levels in the memory hierarchy: Level 1 (inner) through Level n (outer)
- Increasing distance from the processor, decreasing speed
- Size of memory grows at each level
- As we move to outer levels, the latency goes up and the price per bit goes down. Why?

Library Analogy
- Writing a report based on books on reserve, e.g., works of J.D. Salinger
- Go to the library to get a reserved book and place it on a desk in the library
- If you need more, check them out and keep them on the desk
- But don't return earlier books, since you might need them
- You hope this collection of ~10 books on the desk is enough to write the report, despite 10 being only 0.00001% of the books in UC Berkeley libraries

Principle of Locality
- Programs access a small portion of the address space at any instant of time
- What program structures lead to locality in instruction accesses?

Cache Philosophy
- Programmer-invisible hardware mechanism that gives the illusion of the speed of the fastest memory with the size of the largest memory
- Works fine even if the programmer has no idea what a cache is
- However, performance-oriented programmers today sometimes reverse engineer the cache design to design data structures that match the cache; we'll do that in Project 3 (see the sketch below)
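To make the cache philosophy concrete, here is a small C sketch (not from the slides) of the kind of data-structure-vs-cache interaction Project 3 explores: the same array sum computed in row-major and column-major order. The array size N and the function names are illustrative assumptions; only the row-major version visits consecutive addresses, so it benefits from spatial locality.

    #include <stdio.h>

    #define N 1024

    /* Hypothetical demo: same sum, two traversal orders.
     * Row-major touches consecutive addresses (good spatial locality);
     * column-major strides across rows and reuses cache blocks poorly. */
    static int a[N][N];

    long sum_row_major(void) {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];      /* consecutive words in memory */
        return sum;
    }

    long sum_col_major(void) {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];      /* stride of N*sizeof(int) bytes */
        return sum;
    }

    int main(void) {
        printf("%ld %ld\n", sum_row_major(), sum_col_major());
        return 0;
    }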

Memory Access without Cache
- Load word instruction: lw $t0, 0($t1); $t1 contains 1022ten, Memory[1022] = 99
1. Processor issues address 1022ten to Memory
2. Memory reads the word at address 1022ten (99)
3. Memory sends 99 to the Processor
4. Processor loads 99 into register $t0

Memory Access with Cache
- Load word instruction: lw $t0, 0($t1); $t1 contains 1022ten, Memory[1022] = 99
- With a cache (similar to a hash):
1. Processor issues address 1022ten to the Cache
2. Cache checks to see if it has a copy of the data at address 1022ten
   2a. If it finds a match (Hit): cache reads 99 and sends it to the Processor
   2b. No match (Miss): cache sends address 1022 to Memory
       I. Memory reads 99 at address 1022ten
       II. Memory sends 99 to the Cache
       III. Cache replaces a word with the new 99
       IV. Cache sends 99 to the Processor
3. Processor loads 99 into register $t0

Cache Tags
- Need a way to tell whether the cache has a copy of a memory location, so it can decide on a hit or a miss
- On a cache miss, put the memory address of the block in the "tag address" of the cache block
- 1022 is placed in the tag, next to the data from memory (99); the other tag/data pairs in the figure are left over from earlier instructions

Anatomy of a 16-Byte Cache, 4-Byte Blocks
- Operations: 1. Cache Hit  2. Cache Miss  3. Refill cache from memory
- The cache needs Address Tags to decide whether the Processor Address is a Hit or a Miss
- Compares all 4 tags

Cache Replacement
- Suppose the processor now requests a location whose address doesn't match any cache block tag, so the cache must evict one resident block to make room
- Which block to evict?
- Replace the victim with the new memory block at the requested address

Cache Block Must be Aligned in Memory
- Word blocks are aligned, so the binary address of all words in the cache always ends in 00two
- How to take advantage of this to save hardware and energy?
- Don't need to compare the last 2 bits of the byte address (the comparator can be narrower)
- Don't need to store the last 2 bits of the byte address in the Tag (the Tag can be narrower)
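A minimal C sketch of the 16-byte, 4-byte-block cache described above, assuming a fully associative lookup that compares all 4 tags and neither stores nor compares the two low-order byte-offset bits. The struct names, the example address 1020, and the refill-into-block-0 choice are illustrative assumptions, not the lecture's design.

    #include <stdint.h>
    #include <stdio.h>

    #define NBLOCKS 4    /* 16-byte cache, 4-byte blocks */

    struct block { int valid; uint32_t tag; uint32_t data; };
    static struct block cache[NBLOCKS];

    /* Returns 1 on hit, 0 on miss. The stored tag is the byte address with
     * the low 2 offset bits stripped, as on the slide. */
    int lookup(uint32_t addr, uint32_t *data_out) {
        uint32_t tag = addr >> 2;               /* drop byte-offset bits */
        for (int i = 0; i < NBLOCKS; i++) {     /* compare all 4 tags */
            if (cache[i].valid && cache[i].tag == tag) {
                *data_out = cache[i].data;      /* hit */
                return 1;
            }
        }
        return 0;                               /* miss: refill from memory */
    }

    int main(void) {
        uint32_t v;
        uint32_t addr = 1020;                   /* hypothetical word-aligned address */
        if (!lookup(addr, &v))
            cache[0] = (struct block){ 1, addr >> 2, 99 };  /* refill on miss */
        if (lookup(addr, &v))
            printf("hit, data = %u\n", (unsigned)v);
        return 0;
    }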

Anatomy of a 32-Byte Cache, 8-Byte Blocks
- Blocks must be aligned in pairs, otherwise you could get the same word twice in the cache
- Tags only hold even-numbered word addresses; the last 3 bits of the address are always 000two
- Tags and comparators can be narrower
- Can get a hit for either word in the block

Big Idea: Locality
- Temporal Locality (locality in time): go back to the same book on the desktop multiple times; if a memory location is referenced, it will tend to be referenced again soon
- Spatial Locality (locality in space): when you go to the book shelf, pick up multiple books on J.D. Salinger, since the library stores related books together; if a memory location is referenced, locations with nearby addresses will tend to be referenced soon

Principle of Locality
- Programs access a small portion of the address space at any instant of time
- What program structures lead to temporal and spatial locality in instruction accesses? In data accesses?

Common Cache Optimizations
- Reduce tag overhead by having larger blocks, e.g., 2 words, 4 words, 8 words
- Separate caches for instructions and data: double the bandwidth, don't interfere with each other
- Bigger caches (but the access time could grow beyond one clock cycle if too big)
- Divide the cache into multiple sets and only search inside one set => saves comparators and energy
- If there are as many sets as blocks, then only 1 comparator is needed (aka "Direct-Mapped" cache), but this may increase the miss rate

Hardware Cost of Cache
- Need to compare every tag to the processor address; comparators are expensive
- Optimization: 2 sets => ½ the comparators; 1 address bit selects which set (Set 0 or Set 1)

Processor Address Fields Used by the Cache Controller (see the field-extraction sketch below)
- Block Offset: byte address within the block
- Set Index: selects which set
- Tag: remaining portion of the processor address
- Size of Index = log2(number of sets)
- Size of Tag = address size - Size of Index - log2(number of bytes/block)
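The field sizes above can be computed directly. The following C sketch assumes an example geometry (64 sets, 16-byte blocks) that is not from the slides, and extracts the Tag, Set Index, and Block Offset from a 32-bit address.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed example geometry, not from the slides. */
    #define BLOCK_BYTES 16u     /* => offset = log2(16) = 4 bits */
    #define NUM_SETS    64u     /* => index  = log2(64) = 6 bits */

    static unsigned log2u(unsigned x) {   /* x must be a power of two */
        unsigned n = 0;
        while (x > 1) { x >>= 1; n++; }
        return n;
    }

    int main(void) {
        unsigned offset_bits = log2u(BLOCK_BYTES);
        unsigned index_bits  = log2u(NUM_SETS);
        unsigned tag_bits    = 32 - index_bits - offset_bits;  /* remaining bits */

        uint32_t addr   = 0x1234ABCD;                     /* arbitrary example */
        uint32_t offset = addr & (BLOCK_BYTES - 1);
        uint32_t index  = (addr >> offset_bits) & (NUM_SETS - 1);
        uint32_t tag    = addr >> (offset_bits + index_bits);

        printf("tag=%u bits, index=%u bits, offset=%u bits\n",
               tag_bits, index_bits, offset_bits);
        printf("addr 0x%08X -> tag 0x%X, set %u, offset %u\n",
               (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }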

What Is the Limit to the Number of Sets?
- Can save more comparators if there are more than 2 sets
- Limit: as many sets as blocks => only needs one comparator!
- Called the "Direct-Mapped" design

One More Detail: Valid Bit
- When a new program starts, the cache does not hold valid information for this program
- Need an indicator of whether this tag entry is valid for this program
- Add a valid bit to the cache tag entry: 0 => cache miss, even if by chance address = tag; 1 => cache hit, if processor address = tag

Direct-Mapped Cache Example
- One-word blocks, cache size = 1K words (or 4KB)
- The 32-bit address is split into a Tag (upper bits), a 10-bit Index that selects one of the 1024 entries, and a 2-bit byte offset
- The Valid bit ensures something useful is in the cache for this index
- A comparator compares the stored Tag with the upper part of the address to see if it is a Hit; on a Hit, read the data from the cache instead of memory
- What kind of locality are we taking advantage of?

Administrivia
- Lab #5: MIPS Assembly
- HW #4 (of six), due Sunday
- Project 2a: MIPS Emulator, due Sunday
- Midterm, two weeks from yesterday

Agenda
- Memory Hierarchy Analogy
- Memory Hierarchy Overview
- Administrivia
- Caches: Fully Associative, N-Way Set Associative, Direct Mapped
- Cache Performance
- Multilevel Caches

Cache Terms (used in the sketch below)
- Hit rate: fraction of accesses that hit in the cache
- Miss rate: 1 - Hit rate
- Miss penalty: time to replace a block from the lower level of the memory hierarchy into the cache
- Hit time: time to access the cache (including the tag comparison)
- Abbreviation: $ = cache (a Berkeley innovation!)
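Tying the direct-mapped example to the terms above, here is a minimal C sketch of a 1K-entry, one-word-block, direct-mapped cache with a valid bit per entry, counting hits to form a hit rate. The function name and the access pattern in main are illustrative assumptions, not part of the lecture.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NUM_BLOCKS 1024u   /* 1K one-word blocks => 10-bit index */

    struct line { int valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_BLOCKS];
    static long hits, accesses;

    uint32_t access_word(uint32_t addr, uint32_t memory_value) {
        uint32_t index = (addr >> 2) & (NUM_BLOCKS - 1);  /* address bits 11..2 */
        uint32_t tag   = addr >> 12;                      /* address bits 31..12 */
        accesses++;
        if (cache[index].valid && cache[index].tag == tag) {
            hits++;                                       /* hit: use cached word */
            return cache[index].data;
        }
        cache[index].valid = 1;                           /* miss: refill this line */
        cache[index].tag   = tag;
        cache[index].data  = memory_value;
        return memory_value;
    }

    int main(void) {
        memset(cache, 0, sizeof cache);                   /* all valid bits start at 0 */
        access_word(1020, 99);                            /* cold miss */
        access_word(1020, 99);                            /* temporal locality: hit */
        printf("hit rate = %.2f\n", (double)hits / accesses);
        return 0;
    }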

Mapping a 6-bit Memory Address
- Bits 5-4: which memory block is in the cache block (the tag); bits 3-2: block within the cache (the index); bits 1-0: byte offset within the block
- In this example, the block size is 4 bytes / 1 word (it could be multi-word); memory and cache blocks are the same size, the unit of transfer between memory and cache
- # memory blocks >> # cache blocks: 16 memory blocks / 16 words / 64 bytes / 6 bits to address all bytes; 4 cache blocks, 4 bytes (1 word) per block; 4 memory blocks map to each cache block
- Byte within block: low-order two bits, ignore! (nothing smaller than a block)
- Memory block to cache block, aka index: middle two bits
- Which memory block is in a given cache block, aka tag: top two bits

Caching: A Simple First Example
- Cache: 4 entries (index 00, 01, 10, 11), each with a Valid bit, a Tag, and Data
- Main Memory: one-word blocks with addresses 0000xx through 1111xx; the two low-order bits (xx) define the byte in the block (32-bit words)
- Q: Is the memory block in the cache? Compare the cache tag to the high-order 2 memory address bits to tell whether the memory block is in the cache (provided the valid bit is set)
- Q: Where in the cache is the memory block? Use the next 2 low-order memory address bits, the index, to determine which cache block (i.e., modulo the number of blocks in the cache)

Multiword-Block Direct-Mapped Cache
- Four words/block, cache size = 1K words
- The address is split into a Tag, an 8-bit Index (entries 0 through 255), a 2-bit word offset within the block, and a 2-bit byte offset
- What kind of locality are we taking advantage of?

Names for Each Organization
- Fully Associative: block can go anywhere; the first design in this lecture; note: no Index field, but one comparator per block
- Direct Mapped: block goes in exactly one place; note: only 1 comparator; number of sets = number of blocks
- N-way Set Associative: N places for a block; number of sets = number of blocks / N; Fully Associative: N = number of blocks; Direct Mapped: N = 1

Range of Set-Associative Caches (see the sketch below)
- For a fixed-size cache, each factor-of-2 increase in associativity doubles the number of blocks per set (i.e., the number of "ways") and halves the number of sets; it decreases the size of the index by 1 bit and increases the size of the tag by 1 bit
- More associativity = more ways
- Note: IBM persists in calling sets "ways" and ways "sets". They're wrong.
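A short C sketch of the set-associativity trade-off just described: for a fixed-size cache it prints how the number of sets, index bits, and tag bits change as the number of ways doubles. The 4 KB cache size and 16-byte block size are assumed example parameters, not from the slides.

    #include <stdio.h>

    #define CACHE_BYTES (4 * 1024)
    #define BLOCK_BYTES 16
    #define ADDR_BITS   32

    static unsigned log2u(unsigned x) { unsigned n = 0; while (x > 1) { x >>= 1; n++; } return n; }

    int main(void) {
        unsigned blocks = CACHE_BYTES / BLOCK_BYTES;        /* 256 blocks total */
        unsigned offset_bits = log2u(BLOCK_BYTES);
        for (unsigned ways = 1; ways <= blocks; ways *= 2) {
            unsigned sets = blocks / ways;                  /* doubling ways halves sets */
            unsigned index_bits = log2u(sets);
            unsigned tag_bits = ADDR_BITS - index_bits - offset_bits;
            printf("%4u-way: %4u sets, index %2u bits, tag %2u bits\n",
                   ways, sets, index_bits, tag_bits);
        }
        return 0;
    }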

Peer Question: For S sets, N ways, B blocks, which statements hold?
A) The cache has B tags
B) The cache needs N comparators
C) B = N x S
D) Size of Index = log2(S)
Answer choices: A only; A and B only; A, B, and C only; ...

Typical Memory Hierarchy
- On-chip components: Control, Datapath, RegFile, Instruction Cache, Data Cache
- Second-Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk or Flash)
- Speed (cycles): ½'s, 1's, 10's, 100's, 1,000,000's
- Size (bytes): 100's, 10K's, M's, G's, T's
- Cost/bit: highest at the top, lowest at the bottom
- The principle of locality plus the memory hierarchy present the programmer with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology

Review So Far
- Principle of Locality, for libraries and for computer memory
- Hierarchy of memories (speed/size/cost per bit) to exploit locality
- Cache: copy of data from a lower level in the memory hierarchy
- Direct Mapped: find a block in the cache using the Index field, and a Valid bit for a Hit
- Larger caches reduce miss rate via temporal and spatial locality, but can increase hit time
- Multilevel caches help with the miss penalty
- AMAT helps balance hit time, miss rate, and miss penalty
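The review mentions AMAT (average memory access time), which balances the three cache terms as AMAT = hit time + miss rate x miss penalty. A tiny C sketch with assumed illustration numbers (not from this lecture):

    #include <stdio.h>

    int main(void) {
        /* Assumed example values, not from the slides. */
        double hit_time     = 1.0;     /* cycles to access the cache */
        double miss_rate    = 0.05;    /* 5% of accesses miss */
        double miss_penalty = 100.0;   /* cycles to refill from main memory */

        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.1f cycles\n", amat);   /* 1 + 0.05 * 100 = 6 cycles */
        return 0;
    }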