CMSC 411 Computer Systems Architecture, Lecture 8: Basic Pipelining, cont., & Memory Hierarchy

Alan Sussman, als@cs.umd.edu
CMSC 411 - 8 (some from Patterson, Sussman, others)

Administrivia
- Homework # returned today; solutions posted on the password-protected web page
- Homework # posted, due next Tuesday
- Read Appendix B of H&P
- First project, on basic pipelining, posted by tomorrow

R4000 pipeline performance
Four major causes of pipeline stalls, measured on the SPEC92 benchmarks (five integer and five FP programs), assuming a perfect cache:
- Load stalls: from using a load result 1 or 2 cycles after the load
- Branch stalls: 2 cycles on every taken branch, or an empty branch delay slot
- FP result stalls: RAW hazards for an FP operand
- FP structural stalls: conflicts for functional units in the FP pipeline
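One way to read these four stall categories is as additive terms in the pipeline CPI. The following sketch is not taken from the slides. It assumes an ideal CPI of 1 for a single-issue pipeline and measures each category in stall cycles per instruction. Because a perfect cache is assumed, the formula has no memory-stall term.

```latex
\mathrm{CPI}_{\text{pipeline}} = 1
  + \mathrm{CPI}_{\text{load}}
  + \mathrm{CPI}_{\text{branch}}
  + \mathrm{CPI}_{\text{FP result}}
  + \mathrm{CPI}_{\text{FP structural}}
```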

Dynamically scheduled pipelines
We'll cover this, and the scoreboard technique, in Unit 5.

Pitfalls
- Unexpected hazards do occur: for example, when a branch is taken before a previous instruction finishes.
- Extensive pipelining can slow a machine down, or lead to worse cost-performance: more complex hardware can cause a longer clock cycle, killing the benefits of more pipelining.

Pitfalls (cont.)
- A poor compiler can make a good machine look bad. Compiler writers need to understand the architecture in order to
  » optimize efficiently and
  » avoid hazards.
- It is better to eliminate useless instructions than to make them run faster.

MEMORY HIERARCHY

Levels of the Memory Hierarchy

Level           Capacity     Access time              Cost
CPU registers   100s bytes   <10s ns
Cache           KBytes       nanoseconds              1-0.1 cents/bit
Main memory     GBytes       100s of ns (to 500 ns)   $.0001-.00001 cents/bit
Disk            TBytes       10 ms (10,000,000 ns)    10^-5 - 10^-6 cents/bit
Tape            infinite     sec-min                  10^-8 cents/bit

Staging/transfer unit between adjacent levels (upper levels are faster, lower levels are larger):
- Registers <-> cache: instruction operands, managed by program/compiler, 1-8 bytes
- Cache <-> main memory: blocks, managed by the cache controller, 8-128 bytes
- Main memory <-> disk: pages, managed by the OS, 512-4K bytes
- Disk <-> tape: files, managed by user/operator, Mbytes

The Principle of Locality
Programs access a relatively small portion of the address space during any short period of time.

Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse).
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access).

Over the last 15-20 years, hardware has relied on locality to improve overall performance. Locality is a property of programs that is exploited in machine design.

[Figure: memory address (one dot per access) plotted against time, marking regions of spatial locality, temporal locality, and bad locality behavior. Programs with locality cache well. Source: Donald J. Hatfield, Jeanette Gerald, "Program Restructuring for Virtual Memory," IBM Systems Journal 10(3): 168-192 (1971).]

Issues to consider
- How big should the fastest memory (cache memory) be?
- How do we decide what to put in cache memory?
- If the cache is full, how do we decide what to remove?
- How do we find something in cache?
- How do we handle writes?
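A small simulation makes spatial locality concrete. The sketch below models a hypothetical direct-mapped cache with 8 blocks of 16 bytes and counts misses for two traversals of a 16x16 array of 4-byte words stored in row-major order. The cache size, the array shape, and the helper names (`count_misses`, `element_addr`) are illustrative assumptions, not part of the lecture.

```python
def count_misses(addresses, num_blocks=8, block_size=16):
    """Count misses for a byte-address trace in a small direct-mapped cache.

    Hypothetical parameters: num_blocks cache blocks of block_size bytes each.
    """
    tags = [None] * num_blocks  # tag held by each cache block; None means invalid
    misses = 0
    for addr in addresses:
        block_addr = addr // block_size
        index = block_addr % num_blocks
        tag = block_addr // num_blocks
        if tags[index] != tag:  # miss: fetch the block, replacing the old one
            misses += 1
            tags[index] = tag
    return misses


ROWS = COLS = 16
WORD = 4  # bytes per array element


def element_addr(r, c):
    """Byte address of a[r][c], assuming row-major layout starting at address 0."""
    return WORD * (r * COLS + c)


row_major = [element_addr(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [element_addr(r, c) for c in range(COLS) for r in range(ROWS)]

print("row-major misses:", count_misses(row_major))
print("column-major misses:", count_misses(col_major))
```

With these parameters, the row-major walk misses once per 16-byte block (64 of 256 accesses). The column-major walk jumps 64 bytes per step and keeps evicting its own blocks, so every one of its 256 accesses misses.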

First, there is main memory. Jargon:
- frame address: which page?
- block number: which cache block?
- contents: the data

Then add a cache. Jargon: each address of a memory location is partitioned into
- block address
  » tag
  » index
- block offset
(Fig. 5.5)

How does cache memory work?
The following slides discuss:
- what cache memory is
- three organizations for cache memory
  » direct mapped
  » set associative
  » fully associative
- how the bookkeeping is done

Important note: all addresses shown are in octal. Addresses in the book are usually decimal.

What is cache memory? Main memory first
Main memory is divided into (cache) blocks. Each block contains many words (32-256 common now).
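The partition into tag, index, and block offset is simple bit slicing when the block size and the number of sets are powers of two. In the sketch below, the function name `split_address` and its parameters are hypothetical. The example assumes 8-byte blocks and 8 sets, so each field of an octal address is exactly one octal digit (3 bits). This matches the slides' use of octal addresses.

```python
def split_address(addr: int, block_size: int, num_sets: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, block offset).

    Assumes block_size and num_sets are powers of two.
    """
    for n in (block_size, num_sets):
        if n <= 0 or n & (n - 1):
            raise ValueError("block_size and num_sets must be powers of two")
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)                  # low bits: byte within block
    index = (addr >> offset_bits) & (num_sets - 1)    # next bits: which set
    tag = addr >> (offset_bits + index_bits)          # remaining high bits
    return tag, index, offset


tag, index, offset = split_address(0o1234, block_size=8, num_sets=8)
print(oct(tag), index, offset)   # 0o12 3 4
```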

Main memory (cont.)
Blocks are grouped into frames (pages), 3 frames in this picture. Blocks are addressed by their frame number and their block number within the frame.
[Figure: main memory drawn as frames, with blocks numbered 0-7 within each frame.]

Cache memory
Cache has many, MANY fewer blocks than main memory, each with
- a block number,
- a memory address,
- data,
- a valid bit,
- a dirty bit.
Initially, all the valid bits are set to zero.
[Figure: an 8-block cache (blocks 0-7) showing, for each block, the memory address held, the data, the valid bit, and the dirty bit.]
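The bookkeeping listed above can be written as a small record per cache block. This sketch assumes 8 blocks per frame, so that a two-digit octal block address reads as a frame digit followed by a block-within-frame digit. The names `BLOCKS_PER_FRAME`, `frame_and_block`, and `CacheBlock` are hypothetical.

```python
from dataclasses import dataclass

BLOCKS_PER_FRAME = 8  # assumption: one octal digit of block number per frame


def frame_and_block(block_addr: int) -> tuple[int, int]:
    """Return (frame number, block number within the frame)."""
    return divmod(block_addr, BLOCKS_PER_FRAME)


@dataclass
class CacheBlock:
    """Bookkeeping for one cache block, as listed on the slide."""
    memory_address: int = 0   # which memory block is held here
    data: bytes = b""
    valid: bool = False       # initially all valid bits are zero
    dirty: bool = False       # set when the cached copy has been written


cache = [CacheBlock() for _ in range(8)]   # hypothetical 8-block cache
print(frame_and_block(0o53))               # (5, 3)
```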

Where can a block be placed?
Block 12 placed in an 8-block cache: fully associative, direct mapped, 2-way set associative.
- Fully associative: anywhere in the cache
- Direct mapped: cache block (12 mod 8) = 4
- 2-way set associative: set (12 mod 4) = 0
[Figure: memory blocks and the three 8-block cache organizations, showing where block 12 may go in each.]

Cache memory (cont.)
Suppose we want to load block 4 (octal) from memory into cache. There are three ways to organize the cache:
- direct mapped
- set associative
- fully associative
[Figure: the 8-block cache (blocks 0-7) with the memory addresses it currently holds.]

Direct mapped cache
In a direct mapped cache, block 4 can only be put in the cache block with address 4.
[Figure: the cache before the load.]
After the load, the contents look like this:
[Figure: the cache with block 4 loaded into cache block 4.]
So the cache will no longer hold the block with memory address 74.
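The three placement rules differ only in the number of ways per set. Direct mapped is 1 way, fully associative is as many ways as there are cache blocks, and set associative is anything in between. In the sketch below, the function name `candidate_blocks` is hypothetical. It also assumes a drawing convention in which each set occupies consecutive cache blocks. The code reproduces the block 12 example.

```python
def candidate_blocks(block_addr: int, num_blocks: int, ways: int) -> list[int]:
    """Cache block numbers where a memory block may be placed.

    ways=1 is direct mapped, ways=num_blocks is fully associative, and
    anything in between is set associative. Sets are assumed to be laid
    out as consecutive groups of `ways` cache blocks.
    """
    if num_blocks % ways:
        raise ValueError("num_blocks must be a multiple of ways")
    num_sets = num_blocks // ways
    s = block_addr % num_sets          # set = block address mod number of sets
    return list(range(s * ways, (s + 1) * ways))


print(candidate_blocks(12, 8, ways=1))   # [4]  direct mapped: 12 mod 8 = 4
print(candidate_blocks(12, 8, ways=2))   # [0, 1]  2-way: set 12 mod 4 = 0
print(candidate_blocks(12, 8, ways=8))   # [0, 1, 2, 3, 4, 5, 6, 7]  fully associative
```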

Set associative cache
In a set associative cache, each memory block can be put in any of a set of possible blocks in the cache. For example, if we divide the cache into 4 sets, block 4 can be put in any block in Set 0 (since the last two bits of 4 octal are zero).
[Figure: an 8-block cache divided into Sets 0-3 of two blocks each, before the load.]
So after loading the block, cache memory might look like this:
[Figure: the same cache with block 4 loaded into one of the two blocks of Set 0.]

Set associative cache (cont.)
Note that the last two bits of the memory block's address always match the set number, so they do not need to be stored. This part of the address is called the index. The higher-order bits are stored, and are called the tag. In these pictures, both index and tag are shown.

Set associative cache replacement
Which entry in the set should be replaced? Three common choices:
- Replace an eligible random block.
- Replace the least recently used (LRU) block.
  » LRU can be hard to keep track of, so it is often only approximated.
- Replace the oldest eligible block (First In, First Out, or FIFO).

Recall: an address is partitioned into the block address (tag, index) and the block offset.
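The three replacement choices can be compared on a tiny trace. The model below tracks tags only, with no data. Each set is an `OrderedDict` whose order records recency under LRU or arrival under FIFO. This is a sketch of true LRU, not of the approximations real hardware often uses. The class name, the `run` helper, and the trace are hypothetical.

```python
import random
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of a set-associative cache with a chosen replacement policy."""

    def __init__(self, num_sets: int, ways: int, policy: str = "LRU", seed: int = 0):
        if policy not in ("LRU", "FIFO", "RANDOM"):
            raise ValueError("policy must be LRU, FIFO, or RANDOM")
        self.num_sets, self.ways, self.policy = num_sets, ways, policy
        # One ordered dict of tags per set; the oldest / least recent tag comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.rng = random.Random(seed)
        self.misses = 0

    def access(self, block_addr: int) -> bool:
        """Access a memory block; return True on a hit."""
        index, tag = block_addr % self.num_sets, block_addr // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            if self.policy == "LRU":
                entries.move_to_end(tag)        # now the most recently used
            return True
        self.misses += 1
        if len(entries) == self.ways:            # set is full: choose a victim
            if self.policy == "RANDOM":
                victim = self.rng.choice(list(entries))
            else:
                victim = next(iter(entries))     # least recently used (LRU) or oldest (FIFO)
            del entries[victim]
        entries[tag] = None
        return False


def run(trace, num_sets, ways, policy):
    """Return the miss count for a trace of block addresses."""
    cache = SetAssociativeCache(num_sets, ways, policy)
    for block in trace:
        cache.access(block)
    return cache.misses


trace = [0, 4, 0, 8, 0]   # block addresses that all map to set 0 of a 4-set cache
for policy in ("LRU", "FIFO", "RANDOM"):
    print(policy, run(trace, num_sets=4, ways=2, policy=policy))
```

On this trace, LRU keeps block 0 because it was just reused, giving 3 misses. FIFO evicts block 0 because it arrived first, giving 4 misses. RANDOM gives 3 or 4 depending on which block it evicts.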

Data cache replacement example
Data cache misses per 1000 instructions on SPEC2000, by cache size, associativity, and replacement policy:

          Two-way                  Four-way                 Eight-way
Size      LRU    Random  FIFO      LRU    Random  FIFO      LRU    Random  FIFO
16KB      114.1  117.3   115.5     111.7  115.1   113.3     109.0  111.8   110.4
64KB      103.4  104.3   103.9     102.4  102.3   103.1      99.7  100.5   100.3
256KB      92.2   92.1    92.5      92.1   92.1    92.5      92.1   92.1    92.5

Fully associative cache
In a fully associative cache, memory blocks may be stored anywhere.
[Figure: the 8-block cache before the load.]
So block 4 might be put in the first available block, i.e., one with valid = 0.

Fully associative cache (cont.)
With this result:
[Figure: the cache after block 4 is loaded into the first block whose valid bit was 0.]
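The fully associative placement rule, "put the block in the first block with valid = 0", is a linear scan over the valid bits. In this sketch, the function name `place_fully_associative` and the starting cache state are hypothetical. Because a fully associative cache has no index field, the whole block address is stored as the tag. When no block is invalid, the sketch returns `None`. At that point, a replacement policy such as LRU, random, or FIFO would have to pick a victim.

```python
from typing import List, Optional


def place_fully_associative(valid: List[bool], tags: List[Optional[int]],
                            block_addr: int) -> Optional[int]:
    """Place a memory block in the first cache block whose valid bit is 0.

    Returns the cache block used, or None if every block is already valid.
    """
    for i, v in enumerate(valid):
        if not v:
            valid[i] = True
            tags[i] = block_addr   # no index field: the whole block address is the tag
            return i
    return None


valid = [True, True, False, False, True, False, False, False]   # hypothetical state
tags = [0o53, 0o74, None, None, 0o77, None, None, None]
print(place_fully_associative(valid, tags, 0o4))   # 2
```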