Replacement policies for shared caches on symmetric multicores: a programmer-centric point of view


Slide 1: Replacement policies for shared caches on symmetric multicores: a programmer-centric point of view
Pierre Michaud, INRIA
HiPEAC '11, January 26, 2011

Slide 2: Outline
- Self-performance contract
- A proposal for making the performance of sequential programs executing on symmetric multicores more deterministic
- Implications for the microarchitecture
- Focus on bus bandwidth and the shared cache

Slide 3: Context / assumptions
- General-purpose symmetric multicore, e.g., Intel Nehalem
- Multiprogrammed environment
- Sequential programming is still predominant; even parallelizable applications have sequential parts
- Some programs require a minimum performance level

Slide 4: The performance-concerned programmer
- Application performance depends on the platform, the programming language, the programmer's skill, ...
- (Figure: application performance vs. programming effort (~ time), showing a minimum acceptable performance and a maximum tolerable effort.)

Slide 5: Parallel vs. sequential
- The performance-concerned programmer may not necessarily want to write parallel programs
- (Figure: application performance vs. programming effort, with one curve for parallel and one for sequential programming.)

Slide 6: Example
- Sam invents a new video decompression method

Slide 7: Non-deterministic performance?

Slide 8: Programming a (soft) real-time application
- (Figure: application performance vs. programming effort, with best-case and worst-case performance curves; non-determinism requires more effort.)

Slide 9: Resource sharing in multicores
- Caches, bandwidth, power, temperature, ...
- Performance depends on which applications run simultaneously
- The shared cache is a major source of non-determinism: slowdowns of up to 2x have been observed on an Intel Nehalem
- The operating system cannot solve the problem, except by preventing applications from running simultaneously

Slide 10: The bandwidth paradox
- Higher bus bandwidth allows more blocks to be evicted from the last-level shared cache in a given time
- Increasing bandwidth may therefore decrease the performance of some applications
- Not a healthy situation!

Slide 11: Known solutions
- Do nothing, ignore the problem (current situation)
- Private caches; but bus bandwidth is still shared
- Programmable quotas (not implemented so far): the shared cache and bus bandwidth are partitioned on demand; the operating system is supposed to do the partitioning; the developer of an application has no guarantee that the application can get more than 1/N of a resource on an N-core machine

Slide 12: Proposal: the self-performance contract
- Symmetric run: copies of the application run simultaneously on all cores and use the same inputs
- Self-performance: the performance measured for one instance of the application under a symmetric run
- The OS provides a selfperf utility that the programmer can use to do symmetric runs and measure self-performance
- Self-performance contract: the microarchitect tries to keep the actual performance greater than or equal to the self-performance
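
The selfperf utility is not specified beyond this slide. Below is a minimal sketch of what a symmetric run could look like on Linux, assuming the application is an ordinary executable passed on the command line; the core pinning, the timing of the whole run as a proxy for one instance's performance, and all names are illustrative assumptions, not part of the slides.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    int n = (int)sysconf(_SC_NPROCESSORS_ONLN);   /* one instance per logical core */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        if (fork() == 0) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(i, &set);                      /* pin instance i to core i */
            sched_setaffinity(0, sizeof(set), &set);
            execv(argv[1], &argv[1]);              /* every instance runs the same program and inputs */
            _exit(127);
        }
    }
    for (int i = 0; i < n; i++)
        wait(NULL);                                /* wait for all instances */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double sec = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    /* Since all instances are identical, the run time of the symmetric run
       approximates the run time of one instance, i.e., its self-performance. */
    printf("symmetric run: %d instances of %s, %.2f s\n", n, argv[1], sec);
    return 0;
}
```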

Slide 13: Rationale
- Defines a performance target in isolation; no need to make assumptions about other applications
- With N cores, this gives 1/N of the shared resources to the application
- Simple for the programmer: no need to know internal microarchitecture details, just be aware of the self-performance contract
- Programmers who are not concerned can still measure performance as usual

Slide 14: Implications for the microarchitecture

Slide 15: Static partitioning?
- Static partitioning = private resources
- A possible way to implement the self-performance contract, but static partitioning of bus bandwidth is quite inefficient
- Shared resources provide higher average performance, especially when running fewer threads than there are cores

Slide 16: Dealing with shared resources
- How can shared resources be managed so as to implement the self-performance contract without harming throughput?
- Threads needing less than 1/N of a shared resource get what they need
- Threads needing more than 1/N of a shared resource get at least 1/N
- If some threads need less than their fair share, the surplus is allotted to the other threads
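
As a concrete reading of these three rules, here is a small max-min style allocation sketch: a thread needing less than the current fair share gets exactly what it needs, and the capacity it leaves over is redistributed among the remaining threads. The slides only state the desired outcome; the progressive-filling scheme and all names below are assumptions of this sketch.

```c
#include <stdio.h>

#define MAXT 16   /* assumed upper bound on the number of threads */

/* demand[i]: how much of the resource thread i would use if alone;
   alloc[i]:  resulting allocation; capacity: total amount of the shared resource. */
void fair_allocate(const double demand[], double alloc[], int n, double capacity)
{
    int done[MAXT] = {0};
    int remaining = n;
    while (remaining > 0) {
        double share = capacity / remaining;        /* current fair share */
        int progressed = 0;
        for (int i = 0; i < n; i++) {
            if (!done[i] && demand[i] <= share) {   /* needs less than its share */
                alloc[i] = demand[i];               /* ...so it gets what it needs */
                capacity -= demand[i];
                done[i] = 1;
                remaining--;
                progressed = 1;
            }
        }
        if (!progressed) {                          /* everyone left needs more than the share */
            for (int i = 0; i < n; i++)
                if (!done[i])
                    alloc[i] = share;               /* at least 1/N of the original capacity */
            break;
        }
    }
}

int main(void)
{
    double demand[] = {0.10, 0.40, 0.70, 0.90};     /* fractions of the resource */
    double alloc[4];
    fair_allocate(demand, alloc, 4, 1.0);
    for (int i = 0; i < 4; i++)
        printf("thread %d: demand %.2f -> allocation %.2f\n", i, demand[i], alloc[i]);
    return 0;
}
```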

Slide 17: Bus bandwidth
- Self-performance is generally higher than what would be obtained by a static partitioning of bus bandwidth, because of bursts of last-level cache misses
- (Figure: LLC misses per cycle over time for a symmetric run with threads slightly out of sync, compared with the available bus bandwidth.)

Slide 18: Thread arbitration policy
- Self-performance requires fair arbitration between threads that takes the burstiness of memory requests into account
- Fair policies have been proposed with programmable quotas; the self-performance contract allows simpler implementations
- A simple policy that works well for bus access: with N threads, keep N counters, one per thread (e.g., 4-bit counters); give priority to the thread with the smallest counter value; add N-1 to the selected thread's counter and subtract 1 from all the other counters
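
A minimal software model of the counter scheme above. The slide gives the update rule (priority to the smallest counter, +N-1 to the winner, -1 to all the others); restricting the choice to threads with a pending request and saturating the small counters are assumptions of this sketch.

```c
#define N    4      /* number of threads / cores (assumed) */
#define CMAX 15     /* 4-bit saturating counters (saturation handling is an assumption) */

static int cnt[N];  /* one counter per thread, initially 0 */

/* Grant the bus to one of the threads with a pending request (pending[i] != 0).
   Returns the selected thread, or -1 if no request is pending. */
int arbitrate(const int pending[N])
{
    int sel = -1;
    for (int i = 0; i < N; i++)
        if (pending[i] && (sel < 0 || cnt[i] < cnt[sel]))
            sel = i;                               /* priority to the smallest counter */
    if (sel < 0)
        return -1;
    for (int i = 0; i < N; i++) {
        if (i == sel)
            cnt[i] = (cnt[i] + N - 1 > CMAX) ? CMAX : cnt[i] + N - 1;  /* +N-1 to the winner */
        else
            cnt[i] = (cnt[i] > 0) ? cnt[i] - 1 : 0;                    /* -1 to all the others */
    }
    return sel;
}
```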

Slide 19: Shared cache
- The last-level cache is the shared resource most critical for the self-performance contract
- The application should ask for large pages, or the OS should implement superpages or do some page coloring, so that the mapping to cache sets is as deterministic as possible
- Thread-oblivious replacement policies (e.g., LRU) are incompatible with the self-performance contract
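
Why the set mapping is non-deterministic with small pages: the set index of a large LLC uses physical address bits above the 4 KB page offset, so it depends on which physical frames the OS happens to allocate. The cache geometry below (8 MB, 16-way, 64-byte lines, hence 8192 sets) is an illustrative assumption, not taken from the slides.

```c
#include <stdint.h>

/* Assumed LLC geometry: 8 MB, 16-way, 64-byte lines -> 8192 sets.
   Set index = physical address bits [6..18]. With 4 KB pages, only bits
   [6..11] come from the page offset; bits [12..18] come from the physical
   frame number, so the sets a page maps to depend on OS frame allocation
   unless superpages covering all index bits are used, or the OS colors
   pages on bits [12..18]. */
static inline unsigned llc_set(uint64_t paddr)
{
    return (unsigned)((paddr >> 6) & 0x1FFF);   /* 13 index bits, 8192 sets */
}
```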

Slide 20: Replacement policy
- The replacement policy should partition each cache set equally among the threads competing for that set
- Proposal: the SAR B2 policy; the underlying policy can be LRU, CLOCK, NRU, DIP, DRRIP, ...
- Upon a miss, pick a random block in the cache set; the thread that owns it is the random thread
- B2 rule: find which of the random thread and the missing thread has more blocks in the cache set
- Victim selection: victimize a block from that thread
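
A minimal software sketch of this victim selection. The per-block TID array is introduced on the next slide; breaking ties in favor of the random thread and using the underlying policy's ranking to choose among the selected thread's blocks are assumptions of this sketch, not details stated on the slide.

```c
#include <stdlib.h>

#define ASSOC 16        /* set associativity (assumed) */

/* tid[w]  : thread ID of the block in way w of the selected set
   order[w]: rank given by the underlying policy (LRU, NRU, DRRIP, ...);
             a higher rank means a better eviction candidate (assumption) */
int sar_b2_victim(const unsigned char tid[ASSOC], const int order[ASSOC],
                  unsigned missing_tid)
{
    int r = rand() % ASSOC;                 /* pick a random block in the set */
    unsigned random_tid = tid[r];           /* its owner is the random thread */

    /* B2 rule: which of the random thread and the missing thread
       has more blocks in this set? */
    int count_m = 0, count_r = 0;
    for (int w = 0; w < ASSOC; w++) {
        if (tid[w] == missing_tid) count_m++;
        if (tid[w] == random_tid)  count_r++;
    }
    unsigned victim_tid = (count_m > count_r) ? missing_tid : random_tid;

    /* Victim selection: evict that thread's best candidate
       according to the underlying replacement policy. */
    int victim = -1;
    for (int w = 0; w < ASSOC; w++)
        if (tid[w] == victim_tid && (victim < 0 || order[w] > order[victim]))
            victim = w;
    return victim;
}
```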

Slide 21: Thread ID
- A thread ID (TID) must be stored along with each cached block
- It is a purely microarchitectural TID: it affects performance only
- N logical cores require a log2(N)-bit TID
- Unused cores (i.e., cores not currently running a thread) mean that some TIDs are inactive
- Reclaim rule: if the randomly picked block's TID is inactive, victimize that block

Slide 22: Remarks
- Random block selection ensures convergence to a fair partitioning
- The fair partitioning of a cache set may change after an increase or decrease in the number of active threads
- If an active thread needs less than its fair share, the other active threads can share the surplus

Slide 23: Hardware cost
- Stored TIDs: for example, with 8 cores and 64-byte blocks, the storage overhead is below 0.6%
- The underlying replacement policy may require extra storage: CLOCK requires one clock hand per core and per set, so prefer NRU and NRU-based policies (e.g., DRRIP)
- Some logic for the B2 rule and victim selection: the last-level cache miss latency is hundreds of clock cycles, so some sequential logic can be used
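
A quick check of the < 0.6% figure, counting the TID bits against the 64-byte data block only (including tag and state bits in the denominator would only lower the ratio):

```latex
\lceil \log_2 8 \rceil = 3 \ \text{TID bits per block}, \qquad
\frac{3}{64 \times 8} = \frac{3}{512} \approx 0.59\,\% < 0.6\,\%
```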

Slide 24: B2 rule, sequential implementation
- Read the TIDs of the blocks in the set one at a time (e.g., through a shift register)
- Compare each TID with the missing block's TID, M: on a match, increment a counter
- Compare each TID with the randomly picked block's TID, R: on a match, decrement the counter
- The sign of the final counter value is used for victim selection
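
The same B2 comparison written as the single signed counter this slide describes; the loop below is just a software rendering of the serial datapath, with the function name and argument types as assumptions.

```c
/* One pass over the set's TIDs, as in the serial datapath of slide 24:
   +1 when a block belongs to the missing thread (TID == M),
   -1 when it belongs to the randomly picked thread (TID == R).
   A positive final value means the missing thread has more blocks in the set;
   otherwise the random thread has at least as many. */
int b2_sign(const unsigned char tid[], int assoc, unsigned M, unsigned R)
{
    int c = 0;
    for (int w = 0; w < assoc; w++) {
        if (tid[w] == M) c++;
        if (tid[w] == R) c--;
    }
    return c;   /* sign used for victim selection */
}
```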

Slide 25: Conclusion
- It is possible to make the performance of sequential applications much more deterministic
- This requires modifying the shared-cache replacement policy and the bus arbitration
- Reasonable hardware cost
- Parallel programs? No obvious solution so far; non-determinism is inherent to some parallel programs
