Threads
CSCE 313-200 Introduction to Computer Systems, Spring 2019
Dmitri Loguinov, Texas A&M University
January 29, 2019

Updates
- Quiz on Thursday: System Programming Tutorial (pay attention to exercises)
  - Pointers, VS debugging tools/strategies, APIs
  - Common Microsoft data types
  - The last two lectures (OS concepts, processes)
- Common issues in hw1p1:
  - Not waiting for CC.exe to exit
  - Printing room with %X instead of %llx
  - Not handling CC errors in ResponseCC::status
- Make sure to check for API errors
  - Catches bugs sooner, simplifies debugging

Chapter 4: Roadmap
4.1 Processes and threads
4.2 SMP
4.3 Micro-kernels
4.4 Windows threads
4.5 Solaris threads
4.6 Linux threads

Part II:
- Chapter 3: Processes
- Chapter 4: Threads
- Chapter 5: Concurrency
- Chapter 6: Deadlocks

Motivation
- Why parallelize a single program? Two main reasons:
  - Take advantage of multi-core CPU capacity
  - Perform many concurrent blocking operations quickly
- While non-blocking I/O helps with the second issue, it doesn't solve the first one
  - It also makes code more complex
- Code suitable for parallelization:
  - CPU-intensive computation: a non-threaded process can never utilize more than one core
  - Many blocking operations: network connections, pipes, user events

Taxonomy
- Why not fork a new process then? Two main issues:
  - Frequent process context switch is expensive
  - Data sharing may be inefficient (i.e., through the kernel) and possibly tedious to program
- Thus, there is a need for a simpler/faster concurrency model that uses threads
- A thread is a dispatchable unit of work within a process
- Four combinations:
  - Single process, single thread (MS-DOS)
  - Multiple processes, single thread each (old Unix)
  - Single process, multiple threads (user library over a uniprogramming OS)
  - Multiple processes, multiple threads (modern OSes)

How to Implement Threads
- Historically, threads didn't exist in multi-tasked OSes
  - Users wrote special libraries (e.g., pthreads) to emulate threads
  - The OS scheduled the process, then the library scheduled threads
  [figure: a process containing a thread library in user mode, with only the process scheduler in the kernel; old Unix]
- Benefits of User-Level Threads (ULT):
  - Thread switch completely in user mode (i.e., fast)
  - Control over the scheduler and its policy
  - Portability of code (no dependency on OS APIs)
- Problems:
  - When kernel APIs block, the entire process is blocked
  - No ability to run concurrently on multiple CPUs
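To make the ULT idea concrete, below is a minimal sketch using Windows fibers, the closest user-level threading primitive on Windows (this example is illustrative and not from the slides). Both fibers live inside one kernel thread, so every switch is a pure user-mode register/stack swap; the flip side is exactly the drawback listed above: if either fiber blocks in a kernel API, the whole process stalls.

    #include <windows.h>
    #include <stdio.h>

    LPVOID mainFiber, workerFiber;        // fiber handles: user-level "threads"

    void __stdcall WorkerFiber (LPVOID param) {
        printf ("worker: step 1\n");
        SwitchToFiber (mainFiber);        // yield back; no kernel scheduler involved
        printf ("worker: step 2\n");
        SwitchToFiber (mainFiber);        // fibers must yield explicitly (cooperative)
    }

    void main (void) {
        mainFiber = ConvertThreadToFiber (NULL);           // current thread becomes a fiber
        workerFiber = CreateFiber (0, WorkerFiber, NULL);  // default stack size
        SwitchToFiber (workerFiber);      // user-mode switch: save/restore registers + stack
        printf ("main: back in control\n");
        SwitchToFiber (workerFiber);      // worker finishes its second step
        DeleteFiber (workerFiber);
    }

Note there is no preemption here: scheduling policy is entirely up to the program, which is the "control over the scheduler" benefit of ULT.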

How to Implement Threads
- Later, OSes became thread-aware and offered Kernel-Level Threads (KLT)
  - Another term is Light-Weight Processes (LWP)
- Benefits of KLT:
  - Multi-CPU usage by the same program, non-blocking I/O
- Drawbacks compared to ULT:
  - Requires a kernel-mode switch after each slice or whenever the scheduler runs
  [figure: a process whose threads are scheduled by a thread scheduler inside the kernel; Windows/Linux]

Performance
- How expensive is a context switch?
- Traditional numbers suggest a ULT switch is 10x faster than KLT, which is 5x faster than a process switch
  - A Windows benchmark agrees with the last ratio
  - ULT is rarely used on Windows, so no performance results are readily available

  Old VAX Unix (delay in microsec):
  Operation             ULT    KLT    Process
  Create                 34    948     11,300
  Event wait + switch    37    441      1,840

  AMD Phenom II X6 2.8 GHz (delay in microsec):
  Event wait + switch    2.2
  Switch                 0.44

- While these latencies are small, they do increase as the # of threads/processes in the ready state rises
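A rough way to reproduce the "event wait + switch" figure on your own machine is a ping-pong between two kernel threads over a pair of auto-reset events; each round trip costs two waits plus two switches. This is a sketch under that assumption, not the benchmark the slide used; both threads are pinned to one CPU so each ping forces a real context switch.

    #include <windows.h>
    #include <stdio.h>

    #define ROUNDS 100000
    HANDLE evA, evB;                      // auto-reset events for the ping-pong

    DWORD __stdcall Ponger (LPVOID p) {
        for (int i = 0; i < ROUNDS; i++) {
            WaitForSingleObject (evA, INFINITE);   // wait for ping
            SetEvent (evB);                        // pong back
        }
        return 0;
    }

    void main (void) {
        LARGE_INTEGER freq, start, end;
        evA = CreateEvent (NULL, FALSE, FALSE, NULL);  // auto-reset, initially unsignaled
        evB = CreateEvent (NULL, FALSE, FALSE, NULL);
        HANDLE t = CreateThread (NULL, 0, Ponger, NULL, 0, NULL);
        SetThreadAffinityMask (GetCurrentThread (), 1);  // pin both threads to CPU 0
        SetThreadAffinityMask (t, 1);                    // so each ping forces a switch

        QueryPerformanceFrequency (&freq);
        QueryPerformanceCounter (&start);
        for (int i = 0; i < ROUNDS; i++) {
            SetEvent (evA);                        // ping
            WaitForSingleObject (evB, INFINITE);   // wait for pong
        }
        QueryPerformanceCounter (&end);

        double usec = (end.QuadPart - start.QuadPart) * 1e6 / freq.QuadPart;
        printf ("event wait + switch: %.2f microsec\n", usec / ROUNDS / 2);
        WaitForSingleObject (t, INFINITE);
        CloseHandle (t);
    }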

Kernel Threads
- Difference from the single-threaded model: threads have separate stacks and an execution context called the Thread Control Block (TCB), but share all virtual memory
  [figure: single-threaded process image (PCB, user stack, kernel stack, program code, data, shared memory) vs. multi-threaded process image (one PCB, program code, data, and shared memory; per-thread TCB 1..N, user stacks, and kernel stacks)]

Kernel Threads
- The OS still enforces separation between processes
- However, threads are not protected from each other
  - A buffer overflow in one thread may wipe out data of other threads in the same process
- The process owns:
  - Virtual address space and shared memory
  - Security attributes of all objects (e.g., open files)
- Threads own:
  - A TCB that includes thread state (e.g., blocked, running, ready), thread context (registers), scheduler priorities and auxiliary info, pending wait events
  - Execution stacks (user and kernel)
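You can see this ownership split directly: in the sketch below (illustrative, not from the slides), each thread prints the address of a stack local, which differs per thread because each thread owns its own user stack, and the address of a global, which is identical in every thread because the address space belongs to the process.

    #include <windows.h>
    #include <stdio.h>

    int shared = 0;                       // in the process image: one copy for all threads

    DWORD __stdcall Worker (LPVOID p) {
        int local = 0;                    // on this thread's user stack: one copy per thread
        printf ("thread %lu: &local = %p, &shared = %p\n",
                GetCurrentThreadId (), &local, &shared);
        return 0;
    }

    void main (void) {
        HANDLE t [2];
        for (int i = 0; i < 2; i++)
            t[i] = CreateThread (NULL, 0, Worker, NULL, 0, NULL);
        WaitForMultipleObjects (2, t, TRUE, INFINITE);   // wait for both threads
        for (int i = 0; i < 2; i++)
            CloseHandle (t[i]);
    }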

Using Threads
In Windows:

    HANDLE WINAPI CreateThread (
        __in_opt  LPSECURITY_ATTRIBUTES  lpThreadAttributes,
        __in      SIZE_T                 dwStackSize,
        __in      LPTHREAD_START_ROUTINE lpStartAddress,
        __in_opt  LPVOID                 lpParameter,
        __in      DWORD                  dwCreationFlags,
        __out_opt LPDWORD                lpThreadId );

    typedef DWORD (__stdcall *LPTHREAD_START_ROUTINE) (
        [in] LPVOID lpThreadParameter );

- Security = NULL, stackSize = 0 (default), flags = 0
- Must provide the address of a start function
  - The thread executes from that address
  - The current thread continues as normal
- Definition of a thread function:

    DWORD __stdcall MyThread (LPVOID lpThreadParameter);

Using Threads

    #include <windows.h>
    #include <stdio.h>

    class MyExample {
    public:
        int count;
        void Run (int threadID);
    };

    class ThreadParams {
    public:
        MyExample* me;
        int threadID;
    };

    DWORD __stdcall ThreadStarter (LPVOID p) {
        ThreadParams *t = (ThreadParams*) p;
        t->me->Run (t->threadID);
        return 0;
    }

    // thread body (shown on a later slide)
    void MyExample::Run (int threadID) {
        Sleep (100);
        count ++;
        printf ("Thread %d finished\n", threadID);
    }

    #define THREADS_TO_RUN 100

    void main (void) {
        HANDLE thread [THREADS_TO_RUN];   // stores thread handles
        ThreadParams t [THREADS_TO_RUN];  // parameters passed to threads
        MyExample me;
        me.count = 0;
        for (int i = 0; i < THREADS_TO_RUN; i++) {  // start a bunch of threads
            t[i].threadID = i;            // assign seq # to this thread
            t[i].me = &me;                // must pass a pointer to shared variables/classes
            // run thread with default stack size
            if ((thread [i] = CreateThread (NULL, 0, ThreadStarter, t + i, 0, NULL)) == NULL) {
                printf ("failed to create thread %d, error %d\n", i, GetLastError ());
                exit (-1);
            }
        }
        for (int i = 0; i < THREADS_TO_RUN; i++) {  // now hang here waiting for threads to quit
            WaitForSingleObject (thread [i], INFINITE);
            CloseHandle (thread [i]);
        }
        printf ("result = %d\n", me.count);
    }

Using Threads
- Try to encapsulate all functionality inside your class member functions
- Local variables are never shared (they stay on the thread's stack):

    void MyExample::Run (int threadID) {
        int a = 4;             // local: each thread gets its own copy
        Sleep (100);
        a += 70;
    }

- Global and static variables are shared between threads, but they are considered bad style and thus not recommended:

    int b = 3;                 // global: one copy shared by all threads

    void MyExample::Run (int threadID) {
        static int a = 4;      // static: also one shared copy
        a += 70;
        b += 70;
    }

- Heap-allocated blocks are normally not shared unless you provide a common pointer to multiple threads and they dereference it (see the sketch below)
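A small sketch of the heap rule (names are illustrative): a block a thread allocates itself stays private simply because nobody else holds its pointer, while a block whose pointer main hands to every thread is shared by all of them.

    #include <windows.h>
    #include <stdio.h>

    DWORD __stdcall Worker (LPVOID p) {
        int *sharedBuf = (int*) p;        // same heap block in every thread
        int *privateBuf = new int [16];   // visible only through privateBuf,
                                          // which no other thread holds
        privateBuf [0] = 1;
        sharedBuf [0] ++;                 // all threads touch the same memory (racy!)
        delete [] privateBuf;
        return 0;
    }

    void main (void) {
        int *buf = new int [16] ();       // one heap block, zero-initialized
        HANDLE t [4];
        for (int i = 0; i < 4; i++)       // hand the SAME pointer to all threads
            t[i] = CreateThread (NULL, 0, Worker, buf, 0, NULL);
        WaitForMultipleObjects (4, t, TRUE, INFINITE);
        printf ("sharedBuf[0] = %d\n", buf [0]);   // unsynchronized: may be less than 4
        for (int i = 0; i < 4; i++)
            CloseHandle (t[i]);
        delete [] buf;
    }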

Using Threads
- Thread execution is non-deterministic
  - Threads can be interrupted at any time
  - Speed of execution may differ by any factor

    void MyExample::Run (int threadID) {
        Sleep (100);
        count ++;
        printf ("Thread %d finished\n", threadID);
    }

- Make sure each thread gets its own copy of ThreadParams to avoid problems like this:

    ThreadParams t;                       // BUG: one copy reused for all threads
    t.me = &me;
    for (int i = 0; i < THREADS_TO_RUN; i++) {  // start a bunch of threads
        t.threadID = i;                   // assign # to this thread
        if ((thread [i] = CreateThread (NULL, 0, ThreadStarter, &t, 0, NULL)) == NULL) {
            printf ("failed to create thread %d, error %d\n", i, GetLastError ());
            exit (-1);
        }
    }
    // all threads may get their threadID = THREADS_TO_RUN - 1
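Note that count++ in Run above is itself a race: it is a read-modify-write on a shared variable, so concurrent threads can lose updates. Synchronization is the topic of Chapter 5, but as a preview, a minimal fix using the Windows interlocked API is sketched below; it assumes count is redeclared as a LONG so it can be passed to InterlockedIncrement.

    #include <windows.h>
    #include <stdio.h>

    class MyExample {
    public:
        volatile LONG count;              // assumption: LONG instead of int,
                                          // so InterlockedIncrement applies
        void Run (int threadID);
    };

    void MyExample::Run (int threadID) {
        Sleep (100);
        InterlockedIncrement (&count);    // atomic read-modify-write: no lost updates
        printf ("Thread %d finished\n", threadID);
    }

With this change, the final printout in the earlier example is always THREADS_TO_RUN.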

Chapter 4: Roadmap
4.1 Processes and threads
4.2 SMP
4.3 Micro-kernels
4.4 Windows threads
4.5 Solaris threads
4.6 Linux threads

SMP
- SMP (Symmetric Multi-Processing)
  - Consists of multiple CPUs connected by a bus (e.g., HyperTransport in AMD)
  - Each CPU contains multiple cores and a dedicated memory controller
  [figure: CPU 1 (cores 1-2, memory controller, RAM bank 1) and CPU 2 (cores 3-4, memory controller, RAM bank 2)]
- SMP benefits:
  - Performance
  - Availability (e.g., failure of some CPUs does not have to crash the system)
  - Scalability (e.g., more CPUs can be added to an existing motherboard if it supports them)
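Before deciding how many threads to spawn, a program can ask how many logical processors the SMP machine exposes; a minimal sketch using GetSystemInfo:

    #include <windows.h>
    #include <stdio.h>

    void main (void) {
        SYSTEM_INFO si;
        GetSystemInfo (&si);              // fills in architecture, page size, CPU count
        printf ("logical processors: %lu\n", si.dwNumberOfProcessors);
    }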

SMP
- CPU clock speed no longer scales due to insurmountable heat problems
  - Scaling the number of cores is much easier at this stage
- Consumer-grade computers today:
  - Intel Xeon with 24 cores, quad-CPU configurations (96 cores per motherboard), Intel Xeon Phi expansion card with 60 cores
  - CUDA (NVIDIA Titan) video cards with 5000+ cores
- Evolution of computer architecture: sequential computers had a single CPU (traditional 1940s-1950s mainframes)
  [taxonomy tree: Execution splits into Sequential and Parallel]

SMP
- Notation: S = single, M = multiple; I = instruction, D = data
- Level 1:
  - SISD: single core, no internal parallelism
  - SIMD: single core, can run the same instruction on multiple RAM locations in parallel (e.g., video cards, SSE, MMX, AVX); see the sketch below
  - MIMD: different instructions on different data (i.e., multiple cores)
  - MISD: rarely implemented
  [taxonomy tree: Sequential execution -> SISD; Parallel execution -> SIMD, MIMD]
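To make the SIMD row concrete, a minimal sketch using SSE intrinsics (plain C++ with xmmintrin.h, not from the slides): one instruction adds four packed floats at once.

    #include <xmmintrin.h>                // SSE intrinsics
    #include <stdio.h>

    void main (void) {
        __m128 a = _mm_set_ps (4, 3, 2, 1);    // four floats in one 128-bit register
        __m128 b = _mm_set_ps (40, 30, 20, 10);
        __m128 c = _mm_add_ps (a, b);          // ONE instruction, FOUR additions (SIMD)
        float out [4];
        _mm_storeu_ps (out, c);                // store the packed result back to memory
        printf ("%g %g %g %g\n", out[0], out[1], out[2], out[3]);   // 11 22 33 44
    }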

SMP
- Level 2 (within MIMD):
  - Shared memory: single motherboard
  - Distributed memory: multiple computers
- Level 3:
  - Master/slave: the OS runs on a dedicated core, programs run everywhere else
  - SMP: the OS and programs share all cores (modern computers and kernels; this course)
  - Clusters: racks of servers, possibly geographically distributed in datacenters
  [taxonomy tree: MIMD -> Shared memory (Master/slave, SMP) and Distributed memory (Clusters)]

Wrap-up
- Cache coherence issues drastically affect consistency and performance when multiple threads modify the same RAM location
  [figure: CPU 1 and CPU 2, each with two cores, per-core L1 and L2 caches, a shared L3 cache, a memory controller, and RAM banks 1-2]
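One practical consequence of the diagram above is false sharing: two threads writing different variables that happen to sit in the same cache line force the coherence protocol to bounce that line between the cores' L1/L2 caches. A sketch of the usual fix (illustrative, assuming 64-byte lines, typical for x86): pad each per-thread counter onto its own line.

    #include <windows.h>

    #define CACHE_LINE 64

    struct alignas(CACHE_LINE) PaddedCounter {   // each counter gets its own cache line
        volatile LONG value;
        char pad [CACHE_LINE - sizeof(LONG)];    // keep neighbors out of this line
    };

    PaddedCounter counters [2];   // without padding, both counters would share one
                                  // line and every increment would invalidate the
                                  // other core's cached copy

    DWORD __stdcall Worker (LPVOID p) {
        int id = (int)(INT_PTR) p;
        for (int i = 0; i < 10000000; i++)
            counters [id].value ++;              // private line: no coherence traffic
        return 0;
    }

    void main (void) {
        HANDLE t [2];
        for (int i = 0; i < 2; i++)
            t[i] = CreateThread (NULL, 0, Worker, (LPVOID)(INT_PTR) i, 0, NULL);
        WaitForMultipleObjects (2, t, TRUE, INFINITE);
        for (int i = 0; i < 2; i++)
            CloseHandle (t[i]);
    }

Timing this with and without the pad member is an easy way to observe the coherence cost on a multi-core machine.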