Problem: Processor- Memory Bo<leneck

Similar documents
Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )

Systems Programming and Computer Architecture ( ) Timothy Roscoe

Problem: Processor Memory BoJleneck

Computer Systems CSE 410 Autumn Memory Organiza:on and Caches

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example

CS241 Computer Organization Spring Principle of Locality

+ Random-Access Memory (RAM)

CS3350B Computer Architecture

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Read-only memory (ROM): programmed during production Programmable ROM (PROM): can be programmed once SRAM (Static RAM)

CS 240 Stage 3 Abstractions for Practical Systems

How to Write Fast Numerical Code

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING LECTURE 26, SPRING 2013

Giving credit where credit is due

Random-Access Memory (RAM) CISC 360. The Memory Hierarchy Nov 24, Conventional DRAM Organization. SRAM vs DRAM Summary.

CISC 360. The Memory Hierarchy Nov 13, 2008

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING LECTURE 26, FALL 2012

Random-Access Memory (RAM) Lecture 13 The Memory Hierarchy. Conventional DRAM Organization. SRAM vs DRAM Summary. Topics. d x w DRAM: Key features

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

How to Write Fast Numerical Code

Denison University. Cache Memories. CS-281: Introduction to Computer Systems. Instructor: Thomas C. Bressoud

Random-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics

Random-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy

Key features. ! RAM is packaged as a chip.! Basic storage unit is a cell (one bit per cell).! Multiple RAM chips form a memory.

The Memory Hierarchy Sept 29, 2006

General Cache Mechanics. Memory Hierarchy: Cache. Cache Hit. Cache Miss CPU. Cache. Memory CPU CPU. Cache. Cache. Memory. Memory

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6.

CMPSC 311- Introduction to Systems Programming Module: Caching

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory

Random Access Memory (RAM)

CMPSC 311- Introduction to Systems Programming Module: Caching

The Memory Hierarchy /18-213/15-513: Introduction to Computer Systems 11 th Lecture, October 3, Today s Instructor: Phil Gibbons

Caches Spring 2016 May 4: Caches 1

F28HS Hardware-Software Interface: Systems Programming

Computer Systems. Memory Hierarchy. Han, Hwansoo

211: Computer Architecture Summer 2016

The Memory Hierarchy / : Introduction to Computer Systems 10 th Lecture, Feb 12, 2015

Cache Memories /18-213/15-513: Introduction to Computer Systems 12 th Lecture, October 5, Today s Instructor: Phil Gibbons

CS 201 The Memory Hierarchy. Gerson Robboy Portland State University

This lecture. Virtual Memory. Virtual memory (VM) CS Instructor: Sanjeev Se(a

The. Memory Hierarchy. Chapter 6

NEXT SET OF SLIDES FROM DENNIS FREY S FALL 2011 CMSC313.

Foundations of Computer Systems

Memory hierarchies: caches and their impact on the running time

Memory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"

Memory Management! Goals of this Lecture!

Today. The Memory Hierarchy. Random-Access Memory (RAM) Nonvolatile Memories. Traditional Bus Structure Connecting CPU and Memory

Memory Management. Goals of this Lecture. Motivation for Memory Hierarchy

Carnegie Mellon. Cache Memories

Today. Cache Memories. General Cache Concept. General Cache Organization (S, E, B) Cache Memories. Example Memory Hierarchy Smaller, faster,

Chapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg

Memory Hierarchy. Announcement. Computer system model. Reference

Memory Hierarchy. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska

Caches Concepts Review

Memory Hierarchy. Cache Memory Organization and Access. General Cache Concept. Example Memory Hierarchy Smaller, faster,

Today. The Memory Hierarchy. Byte Oriented Memory Organization. Simple Memory Addressing Modes

Advanced Memory Organizations

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14

Agenda Cache memory organization and operation Chapter 6 Performance impact of caches Cache Memories

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance

CS 153 Design of Operating Systems

Caches II CSE 351 Spring #include <yoda.h> int is_try(int do_flag) { return!(do_flag!do_flag); }

Lecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University

Carnegie Mellon. Carnegie Mellon

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Today Cache memory organization and operation Performance impact of caches

Cache Memory and Performance

ECE232: Hardware Organization and Design

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

CS 33. Caches. CS33 Intro to Computer Systems XVIII 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Recap: Machine Organization

The University of Adelaide, School of Computer Science 13 September 2018

Computer Organization: A Programmer's Perspective

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

Carnegie Mellon. Cache Memories. Computer Architecture. Instructor: Norbert Lu1enberger. based on the book by Randy Bryant and Dave O Hallaron

Introduction to OpenMP. Lecture 10: Caches

Memory hierarchy Outline

Announcements. Outline

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Cache memories are small, fast SRAM based memories managed automatically in hardware.

CPUs. Caching: The Basic Idea. Cache : MainMemory :: Window : Caches. Memory management. CPU performance. 1. Door 2. Bigger Door 3. The Great Outdoors

Cray XE6 Performance Workshop

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CISC 360. Cache Memories Nov 25, 2008

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

Cache Memories October 8, 2007

The Memory Hierarchy 10/25/16

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

EN1640: Design of Computing Systems Topic 06: Memory System

Cache Memories. EL2010 Organisasi dan Arsitektur Sistem Komputer Sekolah Teknik Elektro dan Informatika ITB 2010

Caches II. CSE 351 Autumn Instructor: Justin Hsia

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

Transcription:

Storage Hierarchy Instructor: Sanjeev Se(a 1 Problem: Processor- Bo<leneck Processor performance doubled about every 18 months CPU Reg Bus bandwidth evolved much slower Main Core 2 Duo: Can process at least 256 Bytes/cycle (1 SSE two operand add and mult) Core 2 Duo: Bandwidth 2 Bytes/cycle Latency 100 cycles Solu,on: Caches 2 1

Cache DefiniNon: Computer memory with short access Nme used for the storage of frequently or recently used instrucnons or data 3 General Cache Mechanics Cache 84 9 10 14 3 Smaller, faster, more expensive memory caches a subset of the blocks 10 4 Data is copied in block- sized transfer units 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Larger, slower, cheaper memory viewed as parnnoned into blocks 4 2

General Cache Concepts: Hit Cache Request: 14 8 9 14 3 Data in block b is needed Block b is in cache: Hit! 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 5 General Cache Concepts: Miss Cache Request: 12 8 12 9 14 3 Data in block b is needed Block b is not in cache: Miss! 12 Request: 12 Block b is fetched from memory 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Block b is stored in cache Placement policy: determines where b goes Replacement policy: determines which block gets evicted (vic(m) 6 3

Cache Performance Metrics Miss Rate Frac(on of memory references not found in cache (misses / accesses) = 1 hit rate Typical numbers (in percentages): 3-10% for L1 can be quite small (e.g., < 1%) for L2, depending on size, etc. Hit Time Time to deliver a line in the cache to the processor includes (me to determine whether the line is in the cache Typical numbers: 1-2 clock cycle for L1 5-20 clock cycles for L2 Miss Penalty Addi(onal (me required because of a miss typically 50-200 cycles for main memory (Trend: increasing!) 7 Lets think about those numbers Huge difference between a hit and a miss Could be 100x, if just L1 and main memory Would you believe 99% hits is twice as good as 97%? Consider: cache hit (me of 1 cycle miss penalty of 100 cycles Average access (me: 97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles 99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles This is why miss rate is used instead of hit rate 8 4

Types of Cache Misses Cold (compulsory) miss Occurs on first access to a block Conflict miss Most hardware caches limit blocks to a small subset (some(mes a singleton) of the available cache slots e.g., block i must be placed in slot (i mod 4) Conflict misses occur when the cache is large enough, but mul(ple data objects all map to the same slot e.g., referencing blocks 0, 8, 0, 8,... would miss every (me Capacity miss Occurs when the set of ac(ve cache blocks (working set) is larger than the cache 9 Why Caches Work Locality: Programs tend to use data and instrucnons with addresses near or equal to those they have used recently Temporal locality: Recently referenced items are likely to be referenced again in the near future block SpaNal locality: Items with nearby addresses tend to be referenced close together in (me block 10 5

Example: Locality? sum = 0; for (i = 0; i < n; i++) sum += a[i]; return sum; Data: Temporal: sum referenced in each itera(on Spa(al: array a[] accessed in stride- 1 pacern InstrucNons: Temporal: cycle through loop repeatedly Spa(al: reference instruc(ons in sequence Being able to assess the locality of code is a crucial skill for a programmer 11 Locality Example #1 int sum_array_rows(int a[m][n]) { int i, j, sum = 0; } for (i = 0; i < M; i++) for (j = 0; j < N; j++) sum += a[i][j]; return sum; 12 6

Locality Example #2 int sum_array_cols(int a[m][n]) { int i, j, sum = 0; } for (j = 0; j < N; j++) for (i = 0; i < M; i++) sum += a[i][j]; return sum; 13 Locality Example #3 int sum_array_3d(int a[m][n][n]) { int i, j, k, sum = 0; } for (i = 0; i < M; i++) for (j = 0; j < N; j++) for (k = 0; k < N; k++) sum += a[k][i][j]; return sum; How can it be fixed? 14 7

Hierarchies Some fundamental and enduring propernes of hardware and socware systems: Faster storage technologies almost always cost more per byte and have lower capacity The gaps between memory technology speeds are widening True of registers DRAM, DRAM disk, etc. Well- wricen programs tend to exhibit good locality These propernes complement each other beaunfully They suggest an approach for organizing memory and storage systems known as a memory hierarchy 15 An Example Hierarchy L0: registers CPU registers hold words retrieved from L1 cache Smaller, faster, costlier per byte Larger, slower, cheaper per byte L4: L3: L2: L1: on- chip L1 cache (SRAM) off- chip L2 cache (SRAM) main memory (DRAM) local secondary storage (local disks) L1 cache holds cache lines retrieved from L2 cache L2 cache holds cache lines retrieved from main memory Main memory holds disk blocks retrieved from local disks Local disks hold files retrieved from disks on remote network servers L5: remote secondary storage (tapes, distributed file systems, Web servers) 16 8

Examples of Caching in the Hierarchy Cache Type What is Cached? Where is it Cached? Latency (cycles) Managed By Registers 4- byte words CPU core 0 Compiler TLB Address translanons On- Chip TLB 0 Hardware L1 cache 64- bytes block On- Chip L1 1 Hardware L2 cache 64- bytes block Off- Chip L2 10 Hardware Virtual 4- KB page Main memory 100 Hardware+OS Buffer cache Parts of files Main memory 100 OS Network buffer cache Parts of files Local disk 10,000,000 AFS/NFS client Browser cache Web pages Local disk 10,000,000 Web browser Web cache Web pages Remote server disks 1,000,000,000 Web proxy server 17 Hierarchy: Core 2 Duo L1/L2 cache: 64 B blocks Not drawn to scale CPU Reg Throughput: Latency: L1 I- cache 32 KB L1 D- cache ~4 MB L2 unified cache ~4 GB ~500 GB Main 16 B/cycle 8 B/cycle 2 B/cycle 1 B/30 cycles 3 cycles 14 cycles 100 cycles millions Disk 18 9