ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization, Spring 2017: Caches

Announcements: HW7 will be posted tonight. Lab sessions resume next week.

Course Content: Binary numbers and logic gates; Boolean algebra and combinational logic; Sequential logic and state machines; Binary arithmetic; Memories; Instruction set architecture; Processor organization; Caches and virtual memory; Input/output; Advanced topics.

Review: Pipelined Microprocessor. [Figure: the five-stage pipelined datapath, with PC/jump logic, instruction RAM, decoder, register file (SA, SB, DR, D_in), ALU, and data RAM, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers; control signals include LD, MB, F, MW, and MD.]

Example: Data Hazards w/ Forwarding. Assume HW forwarding and NO delay slot for load. Identify all data hazards in the following instruction sequence by circling each source register that is read before the updated value is written back:

    LW   , 0()
    ADDI , , 1
    BEQ  , , X
    SW   , 4()
    X:

We Need Fast and Large Memory. [Figure: the five-stage pipeline (IF, ID, EX, MEM, WB) with instruction RAM, register file, ALU, and data RAM; cycle time ~300ps-2ns (~3GHz-500MHz).] DRAM: slow (10-50 ns for a read or write) but cheap (1 transistor + capacitor per bit cell). SRAM: fast (100s of ps to a few ns for a read/write) but expensive (6 transistors per bit cell).

Using Caches in the Pipeline. [Figure: the same pipeline with the instruction RAM replaced by an instruction cache (SRAM) and the data RAM by a data cache (SRAM), both backed by main memory (DRAM).]

Cache: a small SRAM memory that permits rapid access to a subset of instructions or data. If the data is in the cache (cache hit), we retrieve it without slowing down the pipeline. If the data is not in the cache (cache miss), we retrieve it from main memory (a penalty is incurred in accessing the DRAM). The hit rate is the fraction of memory accesses found in the cache; the miss rate is (1 - hit rate).

Memory Access with Cache. Average memory access time with a cache: hit time + miss rate * miss penalty. Example: main memory access time = 50ns, cache hit time = 2ns, miss rate = 10%. Average memory access time w/o cache = 50ns; average memory access time w/ cache = 2 + 0.1*50 = 7ns.
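
That arithmetic in a minimal C sketch (the 2ns, 10%, and 50ns figures are the slide's example values):

    #include <stdio.h>

    int main(void) {
        double hit_time     = 2.0;   /* ns: cache hit time */
        double miss_rate    = 0.10;  /* 10% of accesses miss */
        double miss_penalty = 50.0;  /* ns: main memory access time */

        /* average memory access time = hit time + miss rate * miss penalty */
        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.1f ns\n", amat);  /* prints 7.0 */
        return 0;
    }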

Why Caches Work: Principle of Locality. Temporal locality: if memory location X is accessed, it is likely to be accessed again in the near future; caches exploit temporal locality by keeping a referenced instruction or data item in the cache. Spatial locality: if memory location X is accessed, locations near X are likely to be accessed in the near future; caches exploit spatial locality by bringing a whole block of instructions or data into the cache on a miss.
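
A small illustrative loop (the array and its size are made up for the example) showing both kinds of locality:

    #include <stdio.h>

    int main(void) {
        int a[1024];
        int sum = 0;

        for (int i = 0; i < 1024; i++) {
            a[i] = i;
            /* spatial locality: a[i] sits next to a[i-1], so the miss that
               fetched this block makes the next few accesses hits */
            sum += a[i];
            /* temporal locality: sum and i are touched on every iteration,
               so they stay resident in the cache */
        }
        printf("%d\n", sum);
        return 0;
    }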

Some Important Terms. A cache is partitioned into blocks. Each cache block (or cache line) typically contains multiple bytes of data; a whole block is read or written during a data transfer between the cache and main memory. Each cache block is associated with a tag and a valid bit. Tag: a unique ID to differentiate between the different memory blocks that may be mapped into the same cache block. Valid bit: indicates whether the data in a cache block is valid (1) or not (0).
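
These terms map directly onto a data structure; a sketch in C, with the 4-byte block size borrowed from the direct mapped example later in the lecture:

    #include <stdint.h>
    #include <stdbool.h>

    #define BLOCK_SIZE 4  /* bytes per cache block (example value) */

    struct cache_line {
        bool     valid;             /* valid bit: block holds usable data? */
        uint32_t tag;               /* identifies which memory block is here */
        uint8_t  data[BLOCK_SIZE];  /* the cached bytes themselves */
    };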

Direct Mapped Cache Concepts. A given memory block is mapped to one and only one cache block. Example: a cache with 8 blocks, with block addresses in decimal; here we assume the main memory is 4 times larger than the cache.

    Cache block   Memory blocks
    0             0, 8, 16, 24
    1             1, 9, 17, 25
    2             2, 10, 18, 26
    3             3, 11, 19, 27
    4             4, 12, 20, 28
    5             5, 13, 21, 29
    6             6, 14, 22, 30
    7             7, 15, 23, 31

Direct Mapped (DM) Cache Concepts. Same example, with block addresses in binary: the cache has 8 blocks and the main memory has 32 blocks, so 4 different memory blocks may be mapped to the same cache location. [Figure: the low-order 3 bits of a 5-bit memory block address select the cache block, e.g. 00000, 01000, 10000, and 11000 all map to cache block 000.]
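
A sketch of that mapping rule in C (using the slide's 8-block cache; for power-of-two sizes the modulo is just the low-order index bits):

    #include <stdio.h>

    #define NUM_CACHE_BLOCKS 8  /* the slide's example cache */

    int main(void) {
        for (unsigned mem_block = 0; mem_block < 32; mem_block++) {
            /* cache block = memory block mod number of cache blocks,
               i.e. the low log2(8) = 3 bits of the block address */
            unsigned cache_block = mem_block % NUM_CACHE_BLOCKS;
            printf("memory block %2u -> cache block %u\n",
                   mem_block, cache_block);
        }
        return 0;
    }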

DM Cache Organization. [Figure: block diagram of a direct mapped cache: valid, tag, and data arrays indexed by the index bits, with a comparator matching the stored tag against the address tag to produce hit/miss.]

Address Translation for a DM Cache. Breakdown of an n-bit memory address for cache use: (n - i - b) tag bits, i index bits, and b byte offset bits. DM cache parameters: the size of each cache block is 2^b bytes (cache block and cache line are synonymous); the number of blocks is 2^i; the total cache size is 2^b * 2^i = 2^(b+i) bytes.
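
A sketch of the field extraction in C (b = i = 2 matches the example that follows; the address value is made up):

    #include <stdint.h>
    #include <stdio.h>

    #define B 2  /* byte offset bits: 2^2 = 4-byte blocks */
    #define I 2  /* index bits: 2^2 = 4 cache blocks */

    int main(void) {
        uint32_t addr = 0x2C;  /* hypothetical 6-bit address 101100 */

        uint32_t offset = addr & ((1u << B) - 1);         /* low b bits */
        uint32_t index  = (addr >> B) & ((1u << I) - 1);  /* next i bits */
        uint32_t tag    = addr >> (B + I);                /* remaining bits */

        printf("tag=%u index=%u offset=%u\n", tag, index, offset);
        return 0;
    }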

Reading a DM Cache. Use the index bits to retrieve the tag, data, and valid bit. Compare the tag from the address with the retrieved tag. If valid and the tags match (hit), select the desired data using the byte offset. Otherwise (miss): bring the memory block into the cache (and set the valid bit), store the tag from the address with the block, and select the desired data using the byte offset.
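
A minimal C sketch of that read path, reusing the cache_line struct and field definitions above (main_memory stands in for DRAM and the block fill is a simple memcpy; these are assumptions for illustration, not the lecture's implementation):

    #include <string.h>

    #define NUM_BLOCKS (1u << I)

    static struct cache_line cache[NUM_BLOCKS];
    extern uint8_t main_memory[];  /* backing DRAM, assumed elsewhere */

    uint8_t cache_read(uint32_t addr) {
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t index  = (addr >> B) & (NUM_BLOCKS - 1);
        uint32_t tag    = addr >> (B + I);
        struct cache_line *line = &cache[index];

        if (!(line->valid && line->tag == tag)) {   /* miss */
            uint32_t block_start = addr & ~(uint32_t)(BLOCK_SIZE - 1);
            memcpy(line->data, &main_memory[block_start], BLOCK_SIZE);
            line->tag   = tag;   /* remember which block lives here */
            line->valid = true;  /* set the valid bit */
        }
        return line->data[offset];  /* select the byte with the offset */
    }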

Writing a DM Cache. Use the index bits to retrieve the tag and valid bit. Compare the tag from the address with the retrieved tag. If valid and the tags match (hit), write the data into the cache location. Otherwise (miss), one option is: bring the memory block into the cache (and set the valid bit), store the tag from the address with the block, then write the data into the cache location.
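
The corresponding write path as a C sketch of that miss option (commonly called write-allocate; again reusing the declarations above, with propagation back to main memory deliberately left out):

    void cache_write(uint32_t addr, uint8_t value) {
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t index  = (addr >> B) & (NUM_BLOCKS - 1);
        uint32_t tag    = addr >> (B + I);
        struct cache_line *line = &cache[index];

        if (!(line->valid && line->tag == tag)) {   /* miss: allocate */
            uint32_t block_start = addr & ~(uint32_t)(BLOCK_SIZE - 1);
            memcpy(line->data, &main_memory[block_start], BLOCK_SIZE);
            line->tag   = tag;
            line->valid = true;
        }
        line->data[offset] = value;  /* write into the cache location */
        /* updating main memory (e.g. write-through) is not shown */
    }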

Direct Mapped Cache Example. The size of each block is 4 bytes; the cache holds 4 blocks; the main memory holds 16 blocks; a memory address has 6 bits: 2 tag bits, 2 index bits, and 2 byte offset bits. [Figure: the cache's 4 lines and the 16 memory blocks, with binary block addresses.]

Direct Mapped Cache Example (walkthrough). [Figure: a step-by-step trace of a sequence of four loads (R <= M[...]) against this 4-block cache, starting with all valid bits at 0. Each step decodes the address into tag, index, and offset and marks the access as a hit or miss; on a miss the indexed line is filled from main memory and its valid bit and tag are set. The early accesses miss and bring in memory words holding the values 14 and 17; a later access to an already-cached block hits.]

Doubling the Block Size. The size of each block is now 8 bytes; the cache holds 2 blocks; the main memory holds 8 blocks; a memory address still has 6 bits: 2 tag bits, 1 index bit, and 3 byte offset bits. [Figure: the 2-line cache with its V, tag, and data fields.]

Doubling the Block Size (walkthrough). [Figure: the same four-load sequence traced against the 2-block, 8-byte-block cache. Each 8-byte block now holds two data words (14 with 15, and 16 with 17), so an access that missed with 4-byte blocks now hits because its neighbor was fetched in the same block, illustrating spatial locality; the remaining accesses are traced the same way, marked hit or miss as the valid, tag, and data fields are updated.]

Block Size Considerations. Larger blocks may reduce the miss rate due to spatial locality. But in a fixed-size cache: larger blocks mean fewer of them, which increases the miss rate due to conflicts; and data fetched along with the requested data may never be used. Larger blocks also increase the miss penalty, since it takes longer to transfer a larger block from memory.
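
A small numeric sketch of the tradeoff (all four numbers below are invented for illustration): a lower miss rate does not automatically win if the miss penalty grows.

    #include <stdio.h>

    /* average memory access time, as defined earlier in the lecture */
    static double amat(double hit, double miss_rate, double penalty) {
        return hit + miss_rate * penalty;
    }

    int main(void) {
        /* hypothetical: doubling the block size trims the miss rate
           but lengthens the block transfer time from DRAM */
        printf("small blocks: %.2f ns\n", amat(2.0, 0.10, 50.0)); /* 7.00 */
        printf("large blocks: %.2f ns\n", amat(2.0, 0.08, 70.0)); /* 7.60 */
        return 0;
    }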

Next Time: More Caches