An introduction to checkpointing for scientific applications


1 UCL/CISM An introduction to checkpointing for scientific applications November 2016 CISM/CÉCI training session

2 What is checkpointing?

3 Without checkpointing: $./count 1 2 3^C $./count With checkpointing: $./count 1 2 3^C $./count 4 5 6

4 Checkpointing: 'saving' a computation so that it can be resumed later (rather than started again). Without checkpointing: $./count 1 2 3^C $./count 1 With checkpointing: $./count 1 2 3^C $./count 4

5 Why do we need checkpointing?

6 Goals of checkpointing in HPC: 1. Fit in time constraints 2. Debugging, monitoring 3. Cope with NODE_FAILs 4. Gang scheduling and preemption

7 The idea: Save the program state (values in variables, open files, position in the code, ...) every time a checkpoint is encountered (signal or event, ...), and restart from there upon an (un)planned stop rather than bootstrap again from scratch (starting loops at iteration 0, creating tmp files, ...).

8 The key questions... Transparency for developer: Do I need to write a lot of additional code? Portability to other systems: Can I stop on one system and restart on another? Size of state to save: How many GB of disk does it require? Checkpointing overhead: How many FLOPs lost to ensure checkpointing?

9 Who's in charge of all that? Transparency for developer Portability to other systems Size of state to save Checkpointing overhead the application itself a library the compiler a run-time the OS the hardware

10 Today's agenda: How to make your program checkpoint-able -> concepts and examples -> recipes (design patterns) and signals -> Slurm -> parallel checkpointing

11 So you can play On hmem: ~dfr/checkpointing.tgz

12 1 Making a program checkpoint-able by saving its state every iteration and looking for a state file on startup.

13 Without checkpointing: $./count 1 2 3^C $./count With checkpointing: $./count 1 2 3^C $./count 4 5 6

14 Without checkpointing: $./count 1 2 3^C $./count With checkpointing: $./count 1 2 3^C $./count 4 5 6

15 The general recipe 1. Look for a state file (name can be hardcoded, or, better, passed as parameter) 2. If found, then restore state (initialize all variables with the content of the state file) Else, bootstrap (create initial state) 3. Periodically save the state In the previous example: The state is just an integer Periodically means at each iteration

16 So you can play 1. Translate 'count' in your favorite language 2. Adapt it to enable checkpointing

17 Python recipe
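
A minimal sketch of the general recipe applied to the 'count' example, not the code from the original slide. The function names, the JSON file format and the `interrupt_after` parameter (which only simulates a ^C) are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved counter, or None when no state file exists."""
    try:
        with open(path) as f:
            return json.load(f)["i"]
    except FileNotFoundError:
        return None


def save_state(path, i):
    """Write the counter to a temporary file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"i": i}, f)
    # Rename over the old file so a reader never sees a half-written state.
    os.replace(tmp, path)


def count(path, stop, interrupt_after=None):
    """Count up to `stop`, saving the state after every iteration.

    `interrupt_after` is a hypothetical hook that simulates an interruption.
    Returns the list of values produced by this run.
    """
    last = load_state(path)                    # 1. look for a state file
    start = 1 if last is None else last + 1    # 2. restore, else bootstrap
    produced = []
    for i in range(start, stop + 1):
        produced.append(i)
        save_state(path, i)                    # 3. periodically save the state
        if interrupt_after is not None and i == interrupt_after:
            break
    return produced
```

Calling `count("count.state", 6, interrupt_after=3)` and then `count("count.state", 6)` mirrors the two terminal sessions shown earlier: the second call picks up at 4. Writing to a temporary file first is a design choice, so that an interruption during the save leaves the previous checkpoint intact.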

18 R recipe

19 Octave recipe

20 Fortran recipe

21 C recipe

22 Java recipe

23 2 Using UNIX signals to reduce overhead : do not save the state at each iteration -- wait for the signal.

24 UNIX processes can receive 'signals' from the user, the OS, or another process

25 UNIX processes can receive 'signals' from the user, the OS, or another process ^C ^D kill -9 kill ^Z fg, bg


28 UNIX processes can receive 'signals' with an associated default action

29 UNIX processes can receive 'signals' and handle ('trap') them

30 The general recipe 1. Register a signal handler (a function that will modify a global variable when receiving a signal) 2. Test the value of the global variable periodically (at a moment when the state is consistent and easy to recreate) 3. If the value indicates so, save state to disk (and optionally gracefully stop) In the previous example: The state is just an integer Periodically means at each iteration

31 So you can play Adapt your program to handle signals

32 Useful links C: Fortran: Python: Octave: R: Java:

33 Previous C recipe

34 C signal recipe

35 C signal recipe

36 Fortran signal recipe

37 Fortran signal recipe

38 Java signal recipe

39 Python signal recipe
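
A minimal sketch of the signal recipe, not the code from the original slide. The choice of SIGUSR1, the plain-text state file and the `on_iteration` hook are assumptions made for illustration; the handler only sets a flag, and the main loop does the saving at a point where the state is consistent.

```python
import os
import signal

checkpoint_requested = False  # global flag, set only by the handler


def request_checkpoint(signum, frame):
    """Signal handler: do no I/O here, just record the request."""
    global checkpoint_requested
    checkpoint_requested = True


def install_handler(signum=signal.SIGUSR1):
    """1. Register the handler (call this from the main thread)."""
    signal.signal(signum, request_checkpoint)


def run(path, stop, on_iteration=None):
    """Count up to `stop`, writing the state file only when a signal was seen.

    `on_iteration` is a hypothetical hook used for demonstration.
    Returns the last value reached.
    """
    global checkpoint_requested
    i = 0
    if os.path.exists(path):            # restore state if a checkpoint exists
        with open(path) as f:
            i = int(f.read())
    while i < stop:
        i += 1
        if on_iteration is not None:
            on_iteration(i)
        if checkpoint_requested:        # 2. test the flag between iterations
            with open(path, "w") as f:  # 3. save the state to disk ...
                f.write(str(i))
            checkpoint_requested = False
            break                       # ... and stop gracefully
    return i
```

In a real program one would call `install_handler()` once at startup, then `run(...)`, and trigger the checkpoint from outside with `kill -USR1 <pid>`. Keeping the handler down to a single assignment avoids doing file I/O at an arbitrary point of the computation.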

40 Octave signal recipe

41 R signal recipe

42 3 Use Slurm signaling abilities to manage checkpoint-able software in Slurm scripts on the clusters.

43 scancel is used to send signals to jobs

44 Example: use scancel --signal USR1 $SLURM_JOB_ID to force state dump for reviewing/debugging Python signal recipe

45 Use --signal to have Slurm send signals automatically before the end of the allocation

46 Set non-zero return code when stopping because of a received signal Fortran signal recipe

47 Then you can have your job re-queued automatically

48 Set a non-zero exit code C: exit(1) Fortran: stop 1 Octave: exit( 1 ) R: quit( status=1 ) Python: sys.exit( 1 ) Java: System.exit( 1 )

49 Or chain the jobs...

50 Using a signal-based watchdog to re-queue the job just before it is killed

51 4 Parallel programs are better checkpointed after a global synchronization.

52 In the fork-join model, checkpoint after a join and before a fork Checkpoint here Easily ensure state consistency Allows restarting with a different number of threads
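
A minimal sketch of checkpointing between a join and the next fork, using Python threads as a stand-in for any fork-join runtime. The `step` function, the JSON checkpoint layout and the parameter names are assumptions made for illustration.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def step(x):
    """Hypothetical unit of work applied to each element in parallel."""
    return x + 1


def run(path, data, n_steps, n_workers):
    """Apply `step` to every element `n_steps` times.

    A checkpoint is written after each join, when no worker is running
    and the whole state lives in a single place.
    """
    done = 0
    if os.path.exists(path):                   # restore if a checkpoint exists
        with open(path) as f:
            saved = json.load(f)
        done, data = saved["done"], saved["data"]
    while done < n_steps:
        # Fork: hand the elements to the workers.
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # Join: list() waits until every result is available.
            data = list(pool.map(step, data))
        done += 1
        with open(path, "w") as f:             # checkpoint here
            json.dump({"done": done, "data": data}, f)
    return data
```

Because the checkpoint holds only the joined data and the step counter, nothing in it depends on how many workers produced it, so a restart can use a different `n_workers` than the interrupted run.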


56 5 Use programs and libraries that enable other programs with checkpoint/restart capabilities.

57 Such a program needs to: 1. Access the process' memory (the c/r program forks itself as the process, or uses a kernel module) 2. Access the processor state at any moment (it uses signals to interrupt the process and provoke storage of the registers on the stack) 3. Track the state-changing actions (fork, exec, system, etc.) (wrap standard library functions with LD_PRELOAD'ed custom functions) 4. Inject checkpointing code in the program (LD_PRELOAD a library with signal handlers)


59 LD_PRELOAD magic

60 LD_PRELOAD magic


62 UCL/CISM Summary, Wrap-up and Conclusions. October 2014 CISM/CÉCI training session

63 Never again force your users to click 'Discard'...

64 Make initializations conditional; save minimal reconstructible state periodically; save full workspace upon signal; checkpoint after a synchronization.

65 So you can play: Adapt your own program for checkpoint/restart. What is the minimal reconstructible state? What file format for the checkpoint? What frequency/what signal? What start strategy: look for checkpoint file vs command-line parameter, etc.? What initialization should I modify? What files should I re-open?


Signals: Management and Implementation. Sanjiv K. Bhatia Univ. of Missouri St. Louis Signals: Management and Implementation Sanjiv K. Bhatia Univ. of Missouri St. Louis sanjiv@aryabhat.umsl.edu http://www.cs.umsl.edu/~sanjiv Signals Mechanism to notify processes of asynchronous events

More information

Processes and Threads. Processes and Threads. Processes (2) Processes (1)

Processes and Threads. Processes and Threads. Processes (2) Processes (1) Processes and Threads (Topic 2-1) 2 홍성수 Processes and Threads Question: What is a process and why is it useful? Why? With many things happening at once in a system, need some way of separating them all

More information

Concurrent Programming. Copyright 2017 by Robert M. Dondero, Ph.D. Princeton University

Concurrent Programming. Copyright 2017 by Robert M. Dondero, Ph.D. Princeton University Concurrent Programming Copyright 2017 by Robert M. Dondero, Ph.D. Princeton University 1 Objectives You will learn/review: What a process is How to fork and wait for processes What a thread is How to spawn

More information

VEOS high level design. Revision 2.1 NEC

VEOS high level design. Revision 2.1 NEC high level design Revision 2.1 NEC Table of contents About this document What is Components Process management Memory management System call Signal User mode DMA and communication register Feature list

More information

What is a Process? Processes and Process Management Details for running a program

What is a Process? Processes and Process Management Details for running a program 1 What is a Process? Program to Process OS Structure, Processes & Process Management Don Porter Portions courtesy Emmett Witchel! A process is a program during execution. Ø Program = static file (image)

More information

Announcement. Exercise #2 will be out today. Due date is next Monday

Announcement. Exercise #2 will be out today. Due date is next Monday Announcement Exercise #2 will be out today Due date is next Monday Major OS Developments 2 Evolution of Operating Systems Generations include: Serial Processing Simple Batch Systems Multiprogrammed Batch

More information

Computer System Overview

Computer System Overview Computer System Overview Introduction A computer system consists of hardware system programs application programs 2 Operating System Provides a set of services to system users (collection of service programs)

More information

The cow and Zaphod... Virtual Memory #2 Feb. 21, 2007

The cow and Zaphod... Virtual Memory #2 Feb. 21, 2007 15-410...The cow and Zaphod... Virtual Memory #2 Feb. 21, 2007 Dave Eckhardt Bruce Maggs 1 L16_VM2 Wean Synchronization Watch for exam e-mail Please answer promptly Computer Club demo night Thursday (2/22)

More information

Process Description and Control. Chapter 3

Process Description and Control. Chapter 3 Process Description and Control Chapter 3 Major Requirements of an Operating System Interleave the execution of many processes to maximize processor utilization while providing reasonable response time

More information

Checkpointing with DMTCP and MVAPICH2 for Supercomputing. Kapil Arya. Mesosphere, Inc. & Northeastern University

Checkpointing with DMTCP and MVAPICH2 for Supercomputing. Kapil Arya. Mesosphere, Inc. & Northeastern University MVAPICH Users Group 2016 Kapil Arya Checkpointing with DMTCP and MVAPICH2 for Supercomputing Kapil Arya Mesosphere, Inc. & Northeastern University DMTCP Developer Apache Mesos Committer kapil@mesosphere.io

More information

Lecture 4: Memory Management & The Programming Interface

Lecture 4: Memory Management & The Programming Interface CS 422/522 Design & Implementation of Operating Systems Lecture 4: Memory Management & The Programming Interface Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken

More information

Each terminal window has a process group associated with it this defines the current foreground process group. Keyboard-generated signals are sent to

Each terminal window has a process group associated with it this defines the current foreground process group. Keyboard-generated signals are sent to Each terminal window has a process group associated with it this defines the current foreground process group. Keyboard-generated signals are sent to all processes in the current window s process group.

More information

Chapter 4: Multithreaded Programming

Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013! Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Threading Issues Operating System Examples

More information

Sample Questions. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

Sample Questions. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic) Sample Questions Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Sample Questions 1393/8/10 1 / 29 Question 1 Suppose a thread

More information

The Kernel Abstraction. Chapter 2 OSPP Part I

The Kernel Abstraction. Chapter 2 OSPP Part I The Kernel Abstraction Chapter 2 OSPP Part I Kernel The software component that controls the hardware directly, and implements the core privileged OS functions. Modern hardware has features that allow

More information

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of Computer Science & Engineering

More information

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song CSCE 313 Introduction to Computer Systems Instructor: Dezhen Song Programs, Processes, and Threads Programs and Processes Threads Programs, Processes, and Threads Programs and Processes Threads Processes

More information

Lecture 1 Introduction (Chapter 1 of Textbook)

Lecture 1 Introduction (Chapter 1 of Textbook) Bilkent University Department of Computer Engineering CS342 Operating Systems Lecture 1 Introduction (Chapter 1 of Textbook) Dr. İbrahim Körpeoğlu http://www.cs.bilkent.edu.tr/~korpe 1 References The slides

More information

Protection and System Calls. Otto J. Anshus

Protection and System Calls. Otto J. Anshus Protection and System Calls Otto J. Anshus Protection Issues CPU protection Prevent a user from using the CPU for too long Throughput of jobs, and response time to events (incl. user interactive response

More information

ENGR 3950U / CSCI 3020U Midterm Exam SOLUTIONS, Fall 2012 SOLUTIONS

ENGR 3950U / CSCI 3020U Midterm Exam SOLUTIONS, Fall 2012 SOLUTIONS SOLUTIONS ENGR 3950U / CSCI 3020U (Operating Systems) Midterm Exam October 23, 2012, Duration: 80 Minutes (10 pages, 12 questions, 100 Marks) Instructor: Dr. Kamran Sartipi Question 1 (Computer Systgem)

More information

MPI History. MPI versions MPI-2 MPICH2

MPI History. MPI versions MPI-2 MPICH2 MPI versions MPI History Standardization started (1992) MPI-1 completed (1.0) (May 1994) Clarifications (1.1) (June 1995) MPI-2 (started: 1995, finished: 1997) MPI-2 book 1999 MPICH 1.2.4 partial implemention

More information

CSCE 313: Intro to Computer Systems

CSCE 313: Intro to Computer Systems CSCE 313 Introduction to Computer Systems Instructor: Dr. Guofei Gu http://courses.cse.tamu.edu/guofei/csce313/ Programs, Processes, and Threads Programs and Processes Threads 1 Programs, Processes, and

More information