
Scheduling
Don Porter
CSE 506

Housekeeping
- Paper reading assigned for next Tuesday

[Logical diagram of the kernel (today's lecture: switching to CPU scheduling): binary formats, memory allocators, threads, system calls, RCU, file system, networking, sync, device drivers, interrupts, consistency, hardware, and the user/kernel boundary.]

Lecture goals
- Understand low-level building blocks of a scheduler
- Understand competing policy goals
- Understand the O(1) scheduler; CFS next lecture
- Familiarity with standard Unix scheduling APIs

Undergrad review
- What is cooperative multitasking?
  - Processes voluntarily yield the CPU when they are done
- What is preemptive multitasking?
  - The OS only lets tasks run for a limited time, then forcibly context switches the CPU
- Pros/cons?
  - Cooperative gives applications more control -- so much that one task can hog the CPU forever
  - Preemptive gives the OS more control, at the cost of more overhead and complexity

Where can we preempt a process?
- In other words, what are the logical points at which the OS can regain control of the CPU?
- System calls
  - Before
  - During (more next time on this)
  - After
- Interrupts
  - A timer interrupt ensures a maximum time slice

(Linux) Terminology
- mm_struct represents an address space in the kernel
- task represents a thread in the kernel
- A task points to 0 or 1 mm_structs
  - Kernel threads just borrow the previous task's mm, as they only execute in the kernel address space
- Many tasks can point to the same mm_struct
  - Multi-threading
- Quantum: CPU timeslice

Outline
- Policy goals
- Low-level mechanisms
- O(1) scheduler
- CPU topologies
- Scheduling interfaces

Policy goals
- No perfect solution
  - Fairness: everything gets a fair share of the CPU
  - Real-time deadlines: CPU time before a deadline is more valuable than time after
  - Latency vs. throughput: timeslice length matters!
    - GUI programs should feel responsive
    - CPU-bound jobs want long timeslices for better throughput
  - User priorities
    - Virus scanning is nice, but I don't want it slowing things down

Optimizing multiple variables
- Like memory allocation, this is best-effort
  - Some workloads prefer some scheduling strategies
- Nonetheless, some solutions are generally better than others

Context switching
- What is it?
  - Swap out the address space and running thread
- Address space:
  - Need to change page tables
  - Update the cr3 register on x86
  - Simplified by the convention that the kernel is at the same address range in all processes
  - What would be hard about mapping the kernel in different places?

Other context switching tasks
- Swap out other register state
  - Segments, debugging registers, MMX, etc.
- If descheduling a process for the last time, reclaim its memory
- Switch thread stacks

Switching threads
- Programming abstraction:
    /* Do some work */
    schedule();  /* Something else runs */
    /* Do more work */

How to switch stacks?
- Store register state on the stack in a well-defined format
- Carefully update stack registers to the new stack
  - Tricky: can't use stack-based storage for this step!

Example
[Diagram: ebp and esp point into Thread 1's (prev) saved registers on its stack; after the switch they point into Thread 2's (next) saved registers.]

Weird code to write
    /* eax is next->thread_info->esp */
    /* push general-purpose regs */
    push ebp
    mov esp, eax
    pop ebp
    /* pop other regs */
- Inside schedule(), you end up with code like:
    switch_to(me, next, &last);
    /* possibly clean up last */
- Where does last come from?
  - Output of switch_to
  - Written on my stack by the previous thread (not me)!

How to code this?
- Pick a register (say ebx); before the context switch, this is a pointer to last's location on the stack
- Pick a second register (say eax) to store the pointer to the currently running task (me)
- Make sure to push ebx after eax
- After switching stacks:
    pop ebx        /* eax still points to old task */
    mov (ebx), eax /* store eax at the location ebx points to */
    pop eax        /* Update eax to new task */

Outline
- Policy goals
- Low-level mechanisms
- O(1) scheduler
- CPU topologies
- Scheduling interfaces

Strawman scheduler
- Organize all processes as a simple list
- In schedule():
  - Pick the first one on the list to run next
  - Put the suspended task at the end of the list
- Problem?
  - Only allows round-robin scheduling
  - Can't prioritize tasks

Even straw-ier man
- Naive approach to priorities:
  - Scan the entire list on each run
  - Or periodically reshuffle the list
- Problems:
  - Forking -- where does the child go?
  - What if you only use part of your quantum?
    - E.g., blocking on I/O

O(1) scheduler
- Goal: decide who to run next, independent of the number of processes in the system
- Still maintain the ability to prioritize tasks, handle partially unused quanta, etc.

O(1) Bookkeeping
- runqueue: a list of runnable processes
  - Blocked processes are not on any runqueue
  - A runqueue belongs to a specific CPU
- Each task is on exactly one runqueue
  - A task is only scheduled on its runqueue's CPU unless migrated
- 2 x 40 x #CPUs runqueues
  - 40 dynamic priority levels (more later)
  - 2 sets of runqueues -- one active and one expired

O(1) Data Structures
[Diagram: two arrays of per-priority runqueues, one labeled "active" and one labeled "expired".]

O(1) Intuition
- Take the first task off the lowest-numbered runqueue in the active set
  - Confusingly: a lower priority value means higher priority
- When done, put it on the appropriate runqueue in the expired set
- Once the active set is completely empty, swap which set of runqueues is active and expired
- Constant time, since there is a fixed number of queues to check; only take the first item from a non-empty queue

O(1) Example
[Diagram: active and expired runqueue sets. Pick the first, highest-priority task to run; move it to the expired queue when its quantum expires.]

Blocked Tasks
- What if a program blocks on I/O, say for the disk?
  - It still has part of its quantum left
  - It's not runnable, so don't waste time putting it on the active or expired runqueues
- We need a wait queue associated with each blockable event
  - Disk, lock, pipe, network socket, etc.

Blocking Example
[Diagram: a process blocks on the disk and goes on the disk wait queue.]

Blocked Tasks, cont.
- A blocked task is moved to a wait queue until the expected event happens
  - No longer on any active or expired queue!
- Disk example:
  - After the I/O completes, the interrupt handler moves the task back to the active runqueue

Time slice tracking
- If a process blocks and then becomes runnable, how do we know how much time it had left?
- Each task tracks the ticks left in its time_slice field
  - On each clock tick: current->time_slice--
- If the time slice goes to zero, move the task to the expired queue
  - Refill its time slice
  - Schedule someone else
- An unblocked task can use the balance of its time slice
- Forking splits the remaining time slice between parent and child

More on priorities
- 100 = highest priority
- 139 = lowest priority
- 120 = base priority
- nice value: user-specified adjustment to the base priority
  - Selfish (not nice) = -20 (I want to go first)
  - Really nice = +19 (I will go last)

Base time slice

    time = (140 - prio) * 20 ms   if prio < 120
           (140 - prio) * 5 ms    if prio >= 120

- Higher-priority tasks get longer time slices
  - And run first

Goal: Responsive UIs
- Most GUI programs are I/O bound on the user
  - Unlikely to use an entire time slice
- Users get annoyed when they type a key and it takes a long time to appear
- Idea: give UI programs a priority boost
  - Go to the front of the line, run briefly, block on I/O again
- Which ones are the UI programs?

Idea: Infer from sleep time
- By definition, I/O-bound applications spend most of their time waiting on I/O
- We can monitor I/O wait time and infer which programs are GUI (and disk intensive)
  - Give these applications a priority boost
- Note that this behavior can be dynamic
  - Ex: a GUI configures DVD ripping, then it is CPU-bound
  - Scheduling should match program phases

Dynamic priority

    dynamic priority = max(100, min(static priority - bonus + 5, 139))

- Bonus is calculated based on sleep time
- Dynamic priority determines a task's runqueue
- This is a heuristic to balance the competing goals of CPU throughput and latency in dealing with infrequent I/O
  - May not be optimal

Dynamic Priority in the O(1) Scheduler
- Important: the runqueue a process goes in is determined by the dynamic priority, not the static priority
  - Dynamic priority is mostly determined by time spent waiting, to boost UI responsiveness
- Nice values influence static priority
  - No matter how nice you are (or aren't), you can't boost your dynamic priority without blocking on a wait queue!

Rebalancing tasks
- As described, once a task ends up in one CPU's runqueue, it stays on that CPU forever

Rebalancing
[Diagram: CPU 0's runqueue is empty while CPU 1's is full -- CPU 1 needs more work moved off of it!]

Rebalancing tasks
- What if all the processes on CPU 0 exit, and all of the processes on CPU 1 fork more children?
- We need to periodically rebalance
- Balance overheads against benefits
  - Figuring out where to move tasks isn't free

Idea: Idle CPUs rebalance
- If a CPU is out of runnable tasks, it should take load from busy CPUs
  - Busy CPUs shouldn't lose time finding idle CPUs to take their work, if possible
- There may not be any idle CPUs
  - Overhead to figure out whether other idle CPUs exist
  - Just have busy CPUs rebalance much less frequently

Average load
- How do we measure how busy a CPU is?
  - Average number of runnable tasks over time
  - Available in /proc/loadavg

Rebalancing strategy
- Read the loadavg of each CPU
- Find the one with the highest loadavg
- (Hand waving) Figure out how many tasks we could take
  - If worth it, lock that CPU's runqueues and take them
  - If not, try again later

Locking note
- If CPU A locks CPU B's runqueue to take some work:
  - CPU B must lock its own runqueues even in the common case that no one is rebalancing
  - Cf. Hoard and per-CPU heaps
- Idiosyncrasy: runqueue locks can be acquired by one task and released by another
  - Usually this would indicate a bug!

Why not rebalance?
- Intuition: things may run slower on another CPU
- Why might this happen?
  - NUMA (Non-Uniform Memory Access)
  - Hyper-threading
  - Multi-core cache behavior
- Vs. Symmetric Multi-Processor (SMP): performance on all CPUs is basically the same

[Diagram: an SMP node, where CPU0-CPU3 all share one memory, vs. NUMA nodes, where each pair of CPUs sits near its own memory. On SMP all CPUs are similar and equally close to memory; on NUMA you want to keep execution near its memory, so migration costs are higher.]

Scheduling Domains
- General abstraction for CPU topology
- Tree of CPUs
  - Each leaf node contains a group of "close" CPUs
- When an idle CPU rebalances, it starts at a leaf node and works up to the root
  - Most rebalancing happens within the leaf
  - Higher threshold to rebalance across a parent

SMP Scheduling Domain
[Diagram: a single flat domain containing CPU0-CPU3 -- all CPUs equivalent!]

NUMA Scheduling Domains
[Diagram: two NUMA domains of two CPUs each. CPU0 starts rebalancing in its own domain first; there is a higher threshold to move work from a sibling or parent domain.]

Hyper-threading
- Precursor to multi-core
  - A few more transistors than Intel knew what to do with, but not enough to build a second core on a chip yet
- Duplicate architectural state (registers, etc.), but not execution resources (ALU, floating point, etc.)
- OS view: 2 logical CPUs
- CPU view: a pipeline bubble in one logical CPU can be filled with operations from the other, yielding higher utilization

Hyper-threaded scheduling
- Imagine 2 hyper-threaded CPUs
  - 4 logical CPUs, but only 2 CPUs' worth of power
- Suppose I have 2 tasks
  - They will do much better on 2 different physical CPUs than sharing one physical CPU
  - They will also contend for space in the cache
    - Less of a problem for threads in the same program. Why?

NUMA + Hyperthreading Scheduling Domains
[Diagram: logical CPUs CPU0-CPU7 grouped in pairs; each physical CPU is a sched domain, and the physical CPUs are nested inside two NUMA domains.]

Multi-core
- More levels of caches
- Migration among CPUs sharing a cache is preferable. Why?
  - More likely to keep data in cache
- Scheduling domains based on shared caches
  - E.g., cores on the same chip are in one domain

Outline
- Policy goals
- Low-level mechanisms
- O(1) scheduler
- CPU topologies
- Scheduling interfaces

Setting priorities
- setpriority(which, who, niceval) and getpriority()
  - which: process, process group, or user id
  - who: PID, PGID, or UID
  - niceval: -20 to +19 (recall earlier)
- nice(niceval)
  - Historical interface (backwards compatible)
  - Equivalent to: setpriority(PRIO_PROCESS, getpid(), niceval)

Scheduler Affinity
- sched_setaffinity and sched_getaffinity
  - Can specify a bitmap of CPUs on which a task can be scheduled
    - Better not be 0!
  - Useful for benchmarking: ensure each thread runs on a dedicated CPU

yield
- Moves a runnable task to the expired runqueue
  - Unless real-time (more later); then just move it to the end of the active runqueue
- Several other real-time-related APIs

Summary
- Understand competing scheduling goals
- Understand how context switching is implemented
- Understand the O(1) scheduler + rebalancing
- Understand various CPU topologies and scheduling domains
- Scheduling system calls