Scheduling, part 2. Don Porter CSE 506


Logical Diagram
[Figure: course map of OS components across the user, kernel, and hardware layers (binary formats, memory allocators, threads, system calls, RCU, file system, networking, sync, memory management, device drivers, CPU scheduler, interrupts, disk, net, consistency). Today's lecture: CPU scheduling.]

Last time
• Scheduling overview, key trade-offs, etc.
• O(1) scheduler: the older Linux scheduler
• Today: Completely Fair Scheduler (CFS), the new hotness
• Other advanced scheduling issues
  • Real-time scheduling
  • Kernel preemption
  • Priority laundering
    • Security attack trick developed at Stony Brook

Fair Scheduling
• Simple idea: 50 tasks, each should get 2% of CPU time
• Do we really want this?
  • What about priorities?
  • Interactive vs. batch jobs?
  • CPU topologies?
  • Per-user fairness?
    • Alice has one task and Bob has 49; why should Bob get 98% of CPU time?
  • Etc.?

Editorial
• Real issue: O(1) scheduler bookkeeping is complicated
  • Heuristics for various issues make it more complicated
  • Heuristics can end up working at cross-purposes
• Software engineering observation:
  • Kernel developers better understood scheduling issues and workload characteristics, so they could make more informed design choices
• Elegance: structure (and complexity) of the solution matches the problem

CFS idea
• Back to a simple list of tasks (conceptually)
• Ordered by how much time they've had
  • Least time to most time
• Always pick the neediest task to run
  • Until it is no longer neediest
  • Then re-insert old task in the timeline
  • Schedule the new neediest

CFS Example
Timeline: 5 10 15 22 26
• List sorted by how many ticks each task has had
• Schedule the neediest task (the one with 5 ticks)

CFS Example
Remaining timeline: 10 15 22 26; the task that just ran now has 11 ticks
• Once no longer the neediest, put it back on the list
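The two example slides can be replayed in a few lines. This is only a sketch of the conceptual list version, assuming the task runs one tick at a time until its count exceeds the next task's; `run_neediest` is a made-up helper name, not anything from the kernel.

```python
import bisect

def run_neediest(timeline):
    """Run the front (neediest) task until it is no longer the neediest.

    `timeline` is a sorted list of per-task tick counts (left = neediest).
    Returns a new sorted timeline with the task re-inserted.
    """
    timeline = list(timeline)           # work on a copy
    ticks = timeline.pop(0)             # neediest task sits at the front
    if not timeline:
        ticks += 1                      # only one task: it just keeps running
    # Assumption: "no longer neediest" means strictly more ticks than the next task.
    while timeline and ticks <= timeline[0]:
        ticks += 1
    bisect.insort(timeline, ticks)      # put it back in sorted position
    return timeline

example = run_neediest([5, 10, 15, 22, 26])
print(example)  # [10, 11, 15, 22, 26]
```

The task that started with 5 ticks stops at 11, one past the next task's 10, which matches the numbers on the slide.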

But lists are inefficient
• Duh! That's why we really use a tree
  • Red-black tree: 9/10 Linux developers recommend it
• log(n) time for:
  • Picking next task (i.e., search for left-most task)
  • Putting the task back when it is done (i.e., insertion)
  • Remember: n is total number of tasks on system

Details
• Global virtual clock: ticks at a fraction of real time
  • The fraction is 1 / (total number of tasks)
• Each task counts how many clock ticks it has had
• Example: 4 tasks
  • Global vclock ticks once every 4 real ticks
  • Each task scheduled for one real tick; advances local clock by one tick
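A quick way to check the 4-task example: count real ticks, advance the global virtual clock by 1/n per real tick, and hand each real tick to one task in turn. This is a toy model under those assumptions (equal priorities, perfect rotation); `simulate` is a hypothetical name.

```python
from fractions import Fraction

def simulate(num_tasks, real_ticks):
    """Return (global virtual clock, per-task local clocks) after `real_ticks`."""
    local = [0] * num_tasks
    global_vclock = Fraction(0)
    for t in range(real_ticks):
        local[t % num_tasks] += 1                 # this task gets one real tick
        global_vclock += Fraction(1, num_tasks)   # global clock moves 1/n
    return global_vclock, local

vclock, local = simulate(4, 8)
print(vclock, local)  # 2 [2, 2, 2, 2]
```

After 8 real ticks the global clock reads 2 and every task's local clock also reads 2, so no task has a deficit.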

More details
• Task's ticks make key in RB-tree
  • Fewest tick count gets serviced first
• No more runqueues
  • Just a single tree-structured timeline
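To see what "ticks as the key" buys you, here is a sketch of a timeline keyed by tick count. Python's standard library has no red-black tree, so this uses `heapq` as a stand-in: it also gives O(log n) insert and remove-minimum, but unlike a tree it does not keep the whole timeline in sorted order. The `Timeline` class and task names are illustrative only.

```python
import heapq

class Timeline:
    """Runnable tasks keyed by tick count (heap stand-in for the RB-tree)."""

    def __init__(self):
        self._heap = []

    def insert(self, ticks, name):
        heapq.heappush(self._heap, (ticks, name))   # O(log n)

    def pick_next(self):
        return heapq.heappop(self._heap)            # smallest key, O(log n)

timeline = Timeline()
for ticks, name in [(15, "c"), (5, "a"), (26, "e"), (10, "b"), (22, "d")]:
    timeline.insert(ticks, name)

ticks, name = timeline.pick_next()      # (5, "a"): fewest ticks goes first
timeline.insert(ticks + 6, name)        # it ran 6 ticks; re-insert at 11
order = [timeline.pick_next()[1] for _ in range(5)]
print(order)  # ['b', 'a', 'c', 'd', 'e']
```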

CFS Example (more realistic)
[Figure residue: Global Ticks: 12, 13; tree node values 10, 4, 12, 5, 1, 8]
• Tasks sorted by ticks executed
• One global tick per n ticks
  • n == number of tasks (5)
• 4 ticks for first task
• Reinsert into list
• 1 tick to new first task
• Increment global clock

Edge case 1
• What about a new task?
  • If task ticks start at zero, doesn't it get to unfairly run for a long time?
• Strategies:
  • Could initialize to current time (start at right)
  • Could get half of parent's deficit
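The two strategies can be written down as one-liners. This is one reading of the slide, not the kernel's actual rule: "deficit" is taken to mean the global virtual clock minus the task's own tick count, and both function names are made up for illustration.

```python
def start_at_current_time(global_vclock):
    """Strategy 1: new task starts with no deficit (right end of the timeline)."""
    return global_vclock

def half_parent_deficit(global_vclock, parent_ticks):
    """Strategy 2 (assumed reading): child starts with half the parent's deficit."""
    parent_deficit = global_vclock - parent_ticks
    return global_vclock - parent_deficit / 2

print(start_at_current_time(100))    # 100
print(half_parent_deficit(100, 80))  # 90.0
```

With a global clock of 100 and a parent at 80 ticks, the parent's deficit is 20, so under this reading the child starts at 90 rather than at 0.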

What happened to priorities?
• Priorities let me be deliberately unfair
  • This is a useful feature
• In CFS, priorities weigh the length of a task's tick
• Example:
  • For a high-priority task, a virtual, task-local tick may last for 10 actual clock ticks
  • For a low-priority task, a virtual, task-local tick may only last for 1 actual clock tick
  • Note: the 10:1 ratio is a made-up example. See code for real weights.
• Result: Higher-priority tasks run longer, low-priority tasks make some progress
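The made-up 10:1 example is easy to simulate: always run the task with the smallest virtual time, and charge it 1/weight virtual ticks per real tick. The weights here are the slide's invented numbers, not the kernel's real table, and `simulate_weighted` is a hypothetical helper.

```python
from fractions import Fraction

def simulate_weighted(weights, real_ticks):
    """weights[name] = real ticks per virtual tick. Returns real ticks per task."""
    vtime = {name: Fraction(0) for name in weights}
    real = {name: 0 for name in weights}
    for _ in range(real_ticks):
        # Neediest task = smallest virtual time (ties broken by name).
        name = min(vtime, key=lambda n: (vtime[n], n))
        real[name] += 1
        vtime[name] += Fraction(1, weights[name])
    return real

shares = simulate_weighted({"hi": 10, "lo": 1}, 110)
print(shares)  # {'hi': 100, 'lo': 10}
```

Over 110 real ticks the high-priority task gets 100 and the low-priority task gets 10: deliberately unfair, but the low-priority task still makes progress.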

Interactive latency
• Recall: GUI programs are I/O bound
  • We want them to be responsive to user input
  • Need to be scheduled as soon as input is available
  • Will only run for a short time

GUI program strategy
• Just like O(1) scheduler, CFS takes blocked programs out of the RB-tree of runnable processes
• Virtual clock continues ticking while tasks are blocked
  • Increasingly large deficit between task and global vclock
• When a GUI task is runnable, it generally goes to the front
  • Dramatically lower vclock value than CPU-bound jobs
  • Reminder: front is left side of tree
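A small sketch of why the GUI task lands at the front: while it is blocked it is out of the runnable set and accumulates no ticks, so on wake-up it has the smallest count. Task names and tick counts are invented for the example.

```python
def pick_neediest(ticks, runnable):
    """Return the runnable task with the fewest ticks (ties broken by name)."""
    return min(runnable, key=lambda name: (ticks[name], name))

ticks = {"gui": 3, "cpu1": 3, "cpu2": 3}
runnable = {"cpu1", "cpu2"}              # gui is blocked waiting for input

for _ in range(20):                      # 20 real ticks pass while gui sleeps
    ticks[pick_neediest(ticks, runnable)] += 1

runnable.add("gui")                      # input arrives, gui becomes runnable
next_task = pick_neediest(ticks, runnable)
print(next_task, ticks)  # gui {'gui': 3, 'cpu1': 13, 'cpu2': 13}
```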

Other refinements
• Per group or user scheduling
  • Real to virtual tick ratio becomes a function of the number of both global and user's/group's tasks
• Unclear how CPU topologies are addressed

Recap: Ticks galore!
• Real time is measured by a timer device, which ticks at a certain frequency by raising a timer interrupt
• A process's virtual tick is some number of real ticks
  • We implement priorities, per-user fairness, etc. by tuning this ratio
• The global tick counter is used to keep track of the maximum possible virtual ticks a process has had
  • Used to calculate one's deficit

CFS Summary
• Simple idea: logically a queue of runnable tasks, ordered by who has had the least CPU time
• Implemented with a tree for fast lookup, reinsertion
• Global clock counts virtual ticks
• Priorities and other features/tweaks implemented by playing games with length of a virtual tick
  • Virtual ticks vary in wall-clock length per-process

Real-time scheduling
• Different model: need to do a modest amount of work by a deadline
• Example:
  • Audio application needs to deliver a frame every nth of a second
  • Too many or too few frames are unpleasant to hear

Strawman
• If I know it takes n ticks to process a frame of audio, just schedule my application n ticks before the deadline
• Problems?
  • Hard to accurately estimate n
    • Interrupts
    • Cache misses
    • Disk accesses
    • Variable execution time depending on inputs

Hard problem
• Gets even worse with multiple applications + deadlines
  • May not be able to meet all deadlines
• Interactions through shared data structures worsen variability
  • Block on locks held by other tasks
  • Cached file system data gets evicted
• Optional reading (interesting): Nemesis, an OS without shared caches to improve real-time scheduling

Simple hack
• Create a highest-priority scheduling class for real-time processes
  • SCHED_RR (RR == round robin)
• RR tasks fairly divide CPU time amongst themselves
  • Pray that it is enough to meet deadlines
  • If so, other tasks share the left-overs
• Assumption: like GUI programs, RR tasks will spend most of their time blocked on I/O
  • Latency is the key concern
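From user space, a process asks for this class with the `sched_setscheduler` system call, which Python exposes on Linux as `os.sched_setscheduler`. The sketch below only tries the request and reports whether it worked; it normally fails without root or the right capability. The function name `try_sched_rr` and the priority value 1 are arbitrary choices for illustration.

```python
import os

def try_sched_rr(priority=1):
    """Try to move the calling process into SCHED_RR; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False                       # call not available on this platform
    old_policy = os.sched_getscheduler(0)  # pid 0 means "this process"
    old_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError:
        return False                       # typically EPERM: needs privileges
    try:
        return os.sched_getscheduler(0) == os.SCHED_RR
    finally:
        # Restore the previous policy so the demo has no lasting effect.
        os.sched_setscheduler(0, old_policy, old_param)

print("got SCHED_RR:", try_sched_rr())
```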

Next issue: Kernel time
• Should time spent in the OS count against an application's time slice?
  • Yes: Time in a system call is work on behalf of that task
  • No: Time in an interrupt handler may be completing I/O for another task

Timeslices + syscalls
• System call times vary
• Context switches generally at system call boundary
  • Can also context switch on blocking I/O operations
• If a time slice expires inside of a system call:
  • Task gets rest of system call for free
    • Steals from next task
  • Potentially delays interactive/real-time task until finished

Idea: Kernel Preemption
• Why not preempt system calls just like user code?
• Well, because it is harder, duh!
• Why?
  • May hold a lock that other tasks need to make progress
  • May be in a sequence of HW config options that assumes it won't be interrupted
• General strategy: allow fragile code to disable preemption
  • Cf: Interrupt handlers can disable interrupts if needed

Kernel Preemption
• Implementation: actually not too bad
  • Essentially, it is transparently disabled with any locks held
  • A few other places disabled by hand
• Result: UI programs a bit more responsive
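One way to picture "transparently disabled with any locks held" is a per-task counter: taking a lock bumps it, releasing a lock drops it, and a pending reschedule only happens once it is back to zero. This is a toy user-space model of that idea, not kernel code; the class and method names are illustrative.

```python
class PreemptState:
    """Toy model: preemption is allowed only while the counter is zero."""

    def __init__(self):
        self.count = 0              # number of reasons preemption is disabled
        self.need_resched = False   # a time slice expired while disabled

    def preempt_disable(self):
        self.count += 1

    def preempt_enable(self):
        self.count -= 1
        return self._maybe_preempt()

    def lock(self):
        self.preempt_disable()      # holding a lock implies no preemption

    def unlock(self):
        return self.preempt_enable()

    def timer_tick(self):
        self.need_resched = True    # time slice is up
        return self._maybe_preempt()

    def _maybe_preempt(self):
        if self.count == 0 and self.need_resched:
            self.need_resched = False
            return True             # a context switch would happen here
        return False

state = PreemptState()
state.lock()
deferred = state.timer_tick()   # False: lock held, preemption deferred
switched = state.unlock()       # True: lock dropped, pending switch happens
print(deferred, switched)       # False True
```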

Priority Laundering
• Some attacks are based on race conditions for OS resources (e.g., symbolic links)
  • Generally, these are privilege-escalation attacks against administrative utilities (e.g., passwd)
• Can only be exploited if attacker controls scheduling
  • Ensure that victim is descheduled after a given system call (not explained today)
  • Ensure that attacker always gets to run after the victim

Problem rephrased
• At some arbitrary point in the future, I want to be sure task X is at the front of the scheduler queue
  • But no sooner
  • And I have some CPU-intensive work I also need to do
• Suggestions?

Dump work on your kids
• Strategy:
  • Create a child process to do all the work
  • And a pipe
• Parent attacker spends all of its time blocked on the pipe
  • Looks I/O bound, so it gets a priority boost!
• Just before the right point in the attack, child puts a byte in the pipe
  • Parent uses short sleep intervals for fine-grained timing
  • Parent stays at the front of the scheduler queue
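The scheduling side of this trick is just a parent blocked on a pipe while a child burns CPU. The sketch below shows only that process structure (Unix only, since it uses `fork`); the workload is a placeholder loop, and `parent_waits_on_pipe` is a made-up name.

```python
import os

def parent_waits_on_pipe(work_items=100_000):
    """Child does the CPU-bound work; parent sleeps on a pipe until signaled."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: all the CPU-intensive work happens here.
        os.close(read_fd)
        total = 0
        for i in range(work_items):
            total += i * i              # placeholder workload
        os.write(write_fd, b"!")        # wake the parent at the chosen moment
        os._exit(0)
    # Parent: blocked in read(), so it uses almost no CPU and looks I/O bound.
    os.close(write_fd)
    signal_byte = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)                  # reap the child
    return signal_byte

print(parent_waits_on_pipe())  # b'!'
```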

SBU Pride
• This trick was developed as part of a larger work on exploiting race conditions at SBU
  • By Rob Johnson and SPLAT lab students
  • An optional reading, if you are interested
• Something for the old tool box

Summary
• Understand:
  • Completely Fair Scheduler (CFS)
  • Real-time scheduling issues
  • Kernel preemption
  • Priority laundering