ECE 485/585 Microprocessor System Design


Microprocessor System Design
Lecture 8: Principle of Locality, Cache Architecture, Cache Replacement Policies
Zeshan Chishti
Electrical and Computer Engineering Dept, Maseeh College of Engineering and Computer Science
Source: Lecture based on materials provided by Mark F.

Cache Topics

Cache Basics
    Memory vs. Processor Performance
    The Memory Hierarchy: Registers, SRAM, DRAM, Disk
    Spatial and Temporal Locality
    Cache Hits and Misses
Cache Organization
    Direct Mapped Caches
    Two-Way, Four-Way Caches
    Fully Associative (N-Way) Caches
    Sector-Mapped Caches
Cache Line Replacement Algorithms
Cache Performance and Performance Improvements
Cache Coherence
Intel Cache Evolution
Multicore Caches
Cache Design Issues

The Problem: The Memory Wall

[Chart, from Hennessy & Patterson, Computer Architecture: A Quantitative Approach (4th edition): relative performance gains by year, log scale from 1 to 100,000. CPU performance has grown far faster than memory performance, and the gap widens every year.]

Memory System Design Tradeoffs

A big challenge in memory system design is to provide a sufficiently large memory capacity, with reasonable speed, at an affordable cost.

SRAM: complex basic cell circuit => fast access, but high cost per bit.
DRAM: simpler basic cell circuit => lower cost per bit, but slower than SRAM.
Flash memory and magnetic disks: DRAMs provide more storage than SRAM but less than what is necessary; disks provide a large amount of storage, but are much slower than DRAMs.

No single memory technology can provide both large capacity and fast speed at an affordable cost.

A Solution: Memory Hierarchy

[Diagram, from Hennessy & Patterson, Computer Architecture: A Quantitative Approach (4th edition): Processor (datapath, control, registers) -> On-Chip Cache -> Second-Level Cache (SRAM) -> Third-Level Cache (SRAM) -> Main Memory (DRAM) -> Secondary Storage (Disk) -> Tertiary Storage (Tape). Registers hold intermediate results; the caches hold cached DRAM contents (instructions and data); main memory backs paging; disk holds the file system (and cached files); tape holds archive backups.]

Intel Pentium 4 3.2 GHz Server

Component    Access Speed (time for data to be returned)
Registers    1 cycle    = 0.3 nanoseconds
L1 Cache     3 cycles   = 1 nanosecond
L2 Cache     20 cycles  = 7 nanoseconds
L3 Cache     40 cycles  = 13 nanoseconds
Memory       300 cycles = 100 nanoseconds
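The nanosecond column is just cycles divided by the clock rate; the slide's figures appear to be rounded. A quick sanity check in C (my sketch, not from the slides):

    #include <stdio.h>

    int main(void)
    {
        double ghz = 3.2;                        /* clock rate from the slide */
        int cycles[] = { 1, 3, 20, 40, 300 };
        for (int i = 0; i < 5; i++)              /* ns = cycles / GHz */
            printf("%3d cycles = %.2f ns\n", cycles[i], cycles[i] / ghz);
        return 0;
    }

This prints 0.31, 0.94, 6.25, 12.50, and 93.75 ns, consistent with the rounded values above.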

How is the Hierarchy Managed?

Registers <-> Memory: by the compiler (and the programmer)
Cache <-> Memory: by hardware
Memory <-> Disk: by the operating system (virtual memory: paging) and by the programmer (file system)

Principle of Locality

Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while other instructions are executed relatively infrequently. These frequently executed instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly. This is called locality of reference.

Temporal locality of reference: a recently executed instruction is likely to be executed again very soon; recently accessed data is likely to be accessed again very soon.

Spatial locality of reference: instructions and data with addresses close to a recently accessed instruction or datum are likely to be accessed soon.

A cache is designed to take advantage of both types of locality of reference, as the sketch below illustrates.
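As a concrete illustration (my example, not from the slides), a C loop that sums a matrix exhibits both kinds of locality:

    #include <stdio.h>

    #define N 1024

    static double a[N][N];         /* example data; contents don't matter */

    int main(void)
    {
        double sum = 0.0;          /* sum and the loop counters are reused on
                                      every iteration: temporal locality */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];    /* consecutive j touch adjacent addresses,
                                      since C arrays are row-major: spatial
                                      locality */
        printf("%f\n", sum);
        return 0;
    }

Swapping the subscripts to a[j][i] would stride through memory N*sizeof(double) bytes at a time, defeating spatial locality and typically running noticeably slower.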

Use of a Cache Memory

[Diagram: Processor <-> Cache <-> Main memory]

A cache is a small but fast SRAM inserted between the processor and main memory. Data in a cache is organized at the granularity of cache blocks. When the processor issues a request for a memory address, an entire block (e.g., 64 bytes) is transferred from main memory to the cache. Later references to the same address can be serviced by the cache (temporal locality), and references to other addresses in that block can also be serviced by the cache (spatial locality). Higher locality => more requests serviced by the cache.

Caching: Student Advising Analogy

Thousands of student folders, indexed by 9-digit student ID, located up the stairs and down the hall: a long walk.
Space for 100 file folders at my desk, located at my side: short access time.

Cache Organization: How is the Cache Laid Out?

The cache is made up of a number of cache lines (sometimes called blocks). Data is hauled into the cache from memory in chunks (which may be smaller than a line). If the CPU requests 4 bytes of data, the cache gets the entire line (32/64/128 bytes): spatial locality says you're likely to need that data anyway, so you incur the cost only once rather than each time the CPU needs a piece of the data.

Example: the Pentium 4 Xeon's Level 1 data cache contains 8K bytes, and the cache lines are each 64 bytes. This gives 8192 bytes / 64 bytes = 128 cache lines, as the sketch below confirms.
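The line count is simple arithmetic; a minimal C sanity check using the slide's own numbers:

    #include <stdio.h>

    int main(void)
    {
        /* Pentium 4 Xeon L1 data cache figures from the slide */
        unsigned cache_bytes = 8 * 1024;    /* 8 KB total capacity */
        unsigned line_bytes  = 64;          /* 64-byte cache lines */

        printf("lines = %u\n", cache_bytes / line_bytes);  /* lines = 128 */
        return 0;
    }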

Simple Direct Mapped Cache

[Diagram: a direct mapped cache of 16 cache lines ("sets"), numbered 0-15, each holding data; bits [3:0] of a 32-bit address select the set.]

Use the least significant 4 bits of the address to determine which slot to cache the data in. But 2^28 different addresses could have their data cached in the same spot.

Simple Direct Mapped Cache (cont'd)

[Diagram: the 32-bit address split into a tag (bits [31:4]) and an index (bits [3:0]); each of the 16 sets holds a valid bit, a tag, and data.]

Need to store the tag to be sure the data is for this address and not another. (Only need to store the address minus the index bits: 28 bits in this example.)
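A hedged C sketch of that tag/index split (the variable names and example address are mine, not from the slides):

    #include <stdint.h>
    #include <stdio.h>

    /* For the 16-set direct mapped cache above: index = bits [3:0],
       tag = bits [31:4]. This toy cache has no block-offset field; a
       real 64-byte-line cache would carve one out of the low bits first. */
    int main(void)
    {
        uint32_t addr  = 0xDEADBEEF;   /* arbitrary example address */
        uint32_t index = addr & 0xF;   /* low 4 bits select set 0..15 */
        uint32_t tag   = addr >> 4;    /* remaining 28 bits stored as tag */
        printf("addr=%08X index=%u tag=%07X\n",
               (unsigned)addr, (unsigned)index, (unsigned)tag);
        return 0;
    }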

Cache Hits and Misses

When the processor needs to access some data, that data may or may not be found in the cache.

If the data is found in the cache, it is called a cache hit.
Read hit: the processor reads the data from the cache and does not need to go to memory.
Write hit: the cache holds a replica of the contents of main memory, so both the cache and main memory need to be updated.

If the data is not found in the cache, it is called a cache miss. The block containing the data is transferred from memory to the cache; after the block is transferred, the desired data is forwarded to the processor. Alternatively, the desired data may be forwarded to the processor as soon as it arrives, without waiting for the entire cache block to be transferred. This is called load-through or critical word first.

Cache Behavior: Reads

if Valid bit clear:                      /* slot empty -> cache miss */
    stall CPU
    read cache line from memory
    set Valid bit
    write Tag bits
    deliver data to CPU
else:                                    /* slot occupied */
    if Tag bits match:                   /* cache hit! */
        deliver data to CPU
    else:                                /* occupied by another -> cache miss */
        stall CPU
        cast out existing cache line (the victim)
        read cache line from memory
        write Tag bits
        deliver data to CPU
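The policy above maps almost line for line onto code. Below is a hedged C model of a read lookup for the 16-set direct mapped cache from the earlier slides (the structure and names are mine; the stall and the memory traffic are stubbed out):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 16

    struct line {
        bool     valid;
        uint32_t tag;              /* data storage omitted for brevity */
    };

    static struct line cache[NUM_SETS];

    static void fill_from_memory(struct line *l, uint32_t addr)
    {
        (void)l; (void)addr;       /* stub: line fill not modeled */
    }

    /* Returns true on a hit, false on a miss; follows the slide's policy. */
    bool cache_read(uint32_t addr)
    {
        uint32_t index = addr & 0xF;
        uint32_t tag   = addr >> 4;
        struct line *l = &cache[index];

        if (!l->valid) {               /* slot empty: cache miss */
            fill_from_memory(l, addr); /* stall CPU, read line from memory */
            l->valid = true;           /* set Valid bit */
            l->tag   = tag;            /* write Tag bits */
            return false;              /* ...then deliver data to CPU */
        }
        if (l->tag == tag)             /* tags match: cache hit! */
            return true;               /* deliver data to CPU */

        fill_from_memory(l, addr);     /* occupied by another: cast out the
                                          victim, read the new line */
        l->tag = tag;                  /* write Tag bits */
        return false;
    }

    int main(void)
    {
        cache_read(0x1004);                  /* cold miss: fills set 4 */
        return cache_read(0x1004) ? 0 : 1;   /* same address again: hit */
    }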

Cache Behavior: Writes

Policy decision for all writes:

Write Through: replace the data in both the cache and memory. Requires a write buffer to be effective, which allows the CPU to continue without waiting for DRAM.

Write Back: replace the data in the cache only. Requires the addition of a dirty bit in the tag/valid memory. The line is written back to memory when a cache flush is performed, or when the line becomes a victim and is cast out.

Policy decision for write misses:

Write Allocate: place the data into the cache.

Write No Allocate (or Write Around): don't place the data in the cache. The philosophy is that successive writes (without an intervening read) are unlikely. This saves not only the cache line fill for the requested cache line, but also avoids the possibility of casting out a line that is more likely to be used later.

Write Buffer for Write-Through

A write buffer is needed between the cache and memory when using a write-through policy, to avoid having the processor wait. The processor writes data into the cache and the write buffer; the memory controller then writes the contents of the buffer to memory.

The write buffer is just a FIFO (Intel: posted write buffer, PWB). A small depth suffices because the store frequency is much less than 1/(DRAM write cycle time).

[Diagram: Processor -> Cache; Processor -> Write Buffer -> DRAM]
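Functionally the write buffer is a small ring-buffer FIFO. A minimal standalone C sketch (the depth and interface are illustrative assumptions, not Intel's PWB design):

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_DEPTH 4                 /* small depth, as on the slide */

    struct wb_entry { uint32_t addr, data; };

    static struct wb_entry buf[WB_DEPTH];
    static unsigned head, tail, count;

    /* CPU side: post a write; returns false (stall) if the buffer is full. */
    bool wb_post(uint32_t addr, uint32_t data)
    {
        if (count == WB_DEPTH) return false;
        buf[tail] = (struct wb_entry){ addr, data };
        tail = (tail + 1) % WB_DEPTH;
        count++;
        return true;
    }

    /* Memory-controller side: drain one entry to DRAM when it is idle. */
    bool wb_drain(struct wb_entry *out)
    {
        if (count == 0) return false;
        *out = buf[head];
        head = (head + 1) % WB_DEPTH;
        count--;
        return true;
    }

    int main(void)
    {
        struct wb_entry e;
        wb_post(0x1000, 0xAB);          /* CPU posts a write, keeps going */
        return wb_drain(&e) ? 0 : 1;    /* controller drains it to DRAM */
    }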

Cache Behavior: Writes (cont'd)

if Valid bit set:                        /* slot occupied */
    if Tag bits match:                   /* cache hit! */
        write data to cache
        write data to memory             /* write through */
          - or -
        set dirty bit for cache line     /* write back */
    else:                                /* occupied by another */
        stall CPU
        cast out existing cache line (the victim)
        read cache line from memory      /* why? see next slide */
        write Tag bits
        write data to cache
        write data to memory - or - set dirty bit    /* write through or write back */
else:                                    /* slot empty */
    stall CPU
    read cache line from memory          /* assumes write allocate; why? see next slide */
    write Tag bits
    set Valid bit
    write data to cache
    write data to memory - or - set dirty bit        /* write through or write back */
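Continuing in the style of the earlier read model, a standalone hedged C sketch of this write policy; it assumes write allocate and selects write-through vs. write-back with a flag (my structuring, not the slides'):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 16

    struct line {
        bool     valid;
        bool     dirty;                /* used only under write back */
        uint32_t tag;
    };

    static struct line cache[NUM_SETS];
    static const bool write_through = false;    /* false = write back */

    static void fill_from_memory(struct line *l, uint32_t addr)
    {
        (void)l; (void)addr;           /* stub: line fill not modeled */
    }
    static void write_to_memory(uint32_t addr, uint32_t data)
    {
        (void)addr; (void)data;        /* stub: memory write not modeled */
    }
    static void write_back_victim(struct line *l)
    {
        (void)l;                       /* stub: victim write-back not modeled */
    }

    void cache_write(uint32_t addr, uint32_t data)
    {
        uint32_t index = addr & 0xF;
        uint32_t tag   = addr >> 4;
        struct line *l = &cache[index];

        if (!l->valid || l->tag != tag) {      /* miss: slot empty or conflict */
            if (l->valid && l->dirty)
                write_back_victim(l);          /* cast out dirty victim first */
            fill_from_memory(l, addr);         /* write allocate: read the line */
            l->valid = true;
            l->dirty = false;
            l->tag   = tag;
        }
        /* hit (or just-filled line): update the cache, then... */
        if (write_through)
            write_to_memory(addr, data);       /* ...update memory too */
        else
            l->dirty = true;                   /* ...or just set the dirty bit */
    }

    int main(void)
    {
        cache_write(0x2008, 42);   /* write miss: allocate, then mark dirty */
        cache_write(0x2008, 43);   /* write hit: dirty bit stays set */
        return 0;
    }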

Why read a cache line for a write?

[Diagram: the bytes being written by the CPU cover only part of a cache line, which has a single dirty bit, valid bit, and tag for the whole line.]

The data being written by the CPU is smaller than a cache line, and the write misses in the cache. Since there is only a single valid bit and one set of tag bits for the entire line, a subsequent read operation must be able to find valid data for the rest of the cache line; hence the whole line is read from memory before the write is merged in.

Casting Out a Victim

What happens to the victim depends upon the write policy:

Write Through: the data in the cache isn't the only current copy (memory is up to date), so just overwrite the victim cache line with the new cache line (changing the tag bits).

Write Back: must check the dirty bit to see if the victim cache line was modified; if so, the victim cache line must be written back to memory.

This can lead to interesting behavior:

A CPU read can cause a memory write followed by a read: write back the dirty cache line (the victim), then read the new cache line.

A CPU write can also cause a memory write followed by a read: write back the dirty cache line (the victim), then read the new cache line into which the data will be written in the cache.

What if there is a cache miss and the cache location at that cache index is already occupied? This is called a cache conflict, or collision.

Action: cast out the existing entry (the victim) and replace it with the new entry. But what if we need that earlier entry again? Thrashing.

Solution: N-way set associative caches, which simultaneously hold in the cache two (or more) lines that would have been forced to share the same place in a direct mapped cache.

Cache Organization: How Does the Cache Manage the Cache Lines?

Associativity describes how data is stored in the cache.

Direct Mapped (associativity = 1): each set has a single line; if the data is in the cache, there's only one place it could be.
N-Way Set Associative: each set contains N lines; there are N places ("ways") where the line could be.
Fully Associative: all cache lines share the same possible places.

A lookup sketch for the N-way case follows.
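For an N-way cache, the index selects a set rather than a single line, and the tag must then be compared against all N ways. A small hedged C sketch with invented parameters (hardware would do the comparisons in parallel):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 8
    #define NUM_WAYS 2                 /* a 2-way set associative cache */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_SETS][NUM_WAYS];

    /* The index selects a set; the tag is compared against every way. */
    bool lookup(uint32_t addr)
    {
        uint32_t index = addr % NUM_SETS;
        uint32_t tag   = addr / NUM_SETS;

        for (int way = 0; way < NUM_WAYS; way++)
            if (cache[index][way].valid && cache[index][way].tag == tag)
                return true;           /* hit in this way */
        return false;                  /* miss: a replacement policy (the
                                          next topic) picks a way to evict */
    }

    int main(void)
    {
        return lookup(0x1234) ? 1 : 0; /* cold cache: always a miss */
    }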