PRACE Autumn School Basic Programming Models
|
|
- Angelica McDowell
- 5 years ago
- Views:
Transcription
1 PRACE Autumn School 2010 Basic Programming Models
2 Basic Programming Models - Outline Introduction Key concepts Architectures Programming models Programming languages Compilers Operating system & libraries APIs 2
3 Introduction Basic programming environment #include <stdio.h> int main(int argc, char * argv[]) { printf ( Hello world\n ); return 0; } Compiler $ gcc -o hello hello.c $ hello (binary file) Execution environment (OS & libraries) Hardware $./hello hello world $ 3
4 Architecture Single processor model Core Cache hierarchy Memory 4
5 A problem The number of transistors in a chip keeps increasing but power consumption and heat dissipation have reached a limit Intel Core i7 Processor Extreme Edition 32 nm. 6 cores / 12 threads 3.33 Ghz. Clock speed 12 Mb. cache cannot increase!!! Solution: keep same clock rates and use the area to colocate several processors (cores) in the same die Use parallelism to increase overall throughput 5
6 Parallelism Serial architectures already had a lot of parallelism ILP Instruction-Level Parallelism Pipelined and superscalar processors SIMD Single-Instruction Multiple-Data SSE4, AVX, Altivec... The focus is now TLP Thread-Level Parallelism Also exploited in SMP nodes with single core processors Requires to rewrite applications to take advantage of multiple hardware threads/cores 6
7 A thread... simply defined as an independent execution context Appear at all levels: Application, libraries user-level, OS, hardware Before multicore processors, each processor had only one hardware thread User-level OS Hardware 7
8 Simultaneous multithreading Single processor model with SMT Seen as two hardware threads from the OS point of view Q: how to spawn work to the second SMT? 8
9 Multicore and SMT Multi-core model (with SMT) Seen as two hardware threads from the OS point of view No difference with respect to SMT (see /proc/cpuinfo) Q: Differences between SMT and multi-core? 9
10 Multicore SMP machines Multi-chip model and memory interconnect Coherence protocol Hardware ensures data coherency and consistency At a cache line granularity Invalidations, false sharing... Q: Interfaces? 10
11 NUMA NUMA Non-Uniform Memory Access Access to memory addresses is not uniform Memory locality and migration are important for performance Q: Interfaces? 11
12 Accelerators Heterogeneity GPUs, FPGAs Accelerators in general 12
13 Cluster of multicore SMP machines Cluster model distributed memory Cluster Interconnect Q: How data is transfered among memories in different nodes? Q: Interfaces? 13
14 Basic Programming Models - Outline Introduction Key concepts Architectures Programming models Programming languages Compilers Operating system & libraries APIs 14
15 Programming Models Shared memory Automatic data accesses No need to express communication up to cores Message passing Distributed memory Programmer is responsible... through expressing communication up to nodes Cluster Interconnect 15
16 Shared memory Memory is accessed (read and written) from all processor cores CPU1 x=n x= x=n CPU2 Communication and synchronization happen through shared memory 16
17 Shared memory No other writes to x Cache coherence protocol write read on hardware thread P W x, N R x time N write (on P1) read (on P2) write (on P1) write (on P2) No other writes to x W R x, N x time N No other writes to x W W x, N x, M time Never read location x as M first and then N 17
18 Distributed memory Message passing Each processor has its own memory cannot be accessed directly from other processors x=n CPU1 x= CPU2 x= x=... Communication and synchronization happen through explicit messages 18
19 Performance measurement Speed-up (S) Expresses how much faster (or slower) the parallel execution is T s T(p) Execution time of the sequential version of the program Execution time of the parallel version running on P processors S(p) = T s T(p) 19
20 Performance measurement Efficiency (E) Measures how well we are using the machine resources E(p) = p S(p) 20
21 Performance measurement Scalability curve Expresses how good the efficiency is as we increase the number of processors Speed-up Superlinear Ideal Acceptable Poor scalability Number of processors 21
22 Amdahl's Law (from Gene Amdahl) Determines which is the maximum speed-up we can expect Represents the impact of the sequential part of a program in its overall scalability f Fraction of program that is sequential (cannot be parallelized) S max (p) = f f p f = f = f = % serial f = % serial!! 22
23 Comparison 100 to 1000 cores Amdahl's Law f = f = f = % serial f = % serial!! 23
24 Reality is usually worse Scalability curve degraded Overhead Communication & synchronization between threads Speed-up Conflicts Superlinear Ideal Acceptable Degraded performance Number of processors 24
25 Sources of overhead Management of parallelism Cost of creating, and joining parallelism Communication Cache coherence and/or application messaging Synchronization Locks, barriers... depending on their use and implementation Load imbalance Program has not enough work to keep all resources busy, or The distribution of work results in some processors receiving more work to do than others 25
26 Sources of overhead They mainly cause an increase in the serial portion of the program With the corresponding loose of scalability Their impact depends on architecture runtime system application 26
27 Programming Languages and Interfaces C, Fortran Pthreads, Message Passing Interface (MPI) OpenMP Global address space approaches UPC, Coarray Fortran, X10, Chapel, Fortress 27
28 Identifying parallelism Two main sources Functional decomposition (task parallelism) Which parts of the application can run in parallel? Data decomposition (data parallelism) Which operations on data can be performed in parallel Loops are usually a good source of parallelism 28
29 Choosing the right granularity Should we parallelize big or small portions of the application? Ideally, we should choose the coarser grain Lower overall overhead Usually less communication and synchronization... but it can lead to load imbalance Using finer grain Larger overall overhead, communication and synchronization... but much better load balance is possible Large applications use multi-level parallelization With the possibility of using several programming models MPI+OpenMP 29
30 After parallelization... solve the problems Correctness Incorrect parallelization, race conditions, deadlock... Performance Load imbalance False sharing Bad locality management Finding the source of problems is intrinsically harder than in sequential programs Use as much help as possible from support tools 30
31 True sharing and false sharing True sharing: two processors access the same memory location, and the hardware will transfer the data from CPU1's cache to CPU2's cache CPU1 x=n x= x=n CPU2 False sharing: two processors access different (close) memory locations, in the same cache line CPU1 x0=n x0= x1= x2= x3= x4= x4=m CPU2 cache line moving from CPU1's cache to CPU2's cache 31
32 Compilers Different languages require different compilers gcc -o hello hello.c g++ -o hello hello.c gfortran -o hello hello.f gfortran free-form -o hello hello.f mpicc -o hello hello.c mpif90 -o hello hello.f90 32
33 Detailed compilation flow Source code Headers Preprocessor Compiler Other Object files Libraries Assembler Object file Linker Executable file 33
34 Some compiler options Optimization -O -O3 Debugging -g -ggdb Profiling -pg Preprocess only -E Go to object file -c Add include search directory -I<directory> Add libraries search directory -L<directory> Add libraries to link with -l<library-short-name> -lpthread indicates to link with libpthread.so library -lm link to the math library (use also #include <math.h>) Support OpenMP -fopenmp 34
35 Basic debugging A debugger allows to examine the execution of a program Start your program (run) gdb hello run Specifying any parameter that might affect its behavior run -i -t 10 Stops your program on critical errors (segm. fault) Stop your program on specified conditions Examine the program when stopped Change registers/variables/memory in your program Solve small bugs an keep going 35
36 Debugging a program Give attention to the compiler flags -g to generate debugging information -O0 to get accurate information about your program -O, -O3 can cause inaccuracies Invoke the debugger with your program gdb <program-to-debug> Run the program run <usual arguments provided to the program> Ctrl-C will interrupt execution and gdb will regain control 36
37 Debugging a program Attaching to a running process gdb -pid PID Setting breakpoints break function break file:function break <linenumber> break file:<line number> break... if i>5 Setting watchpoints watch i==100 37
38 Execution environment Applications Runtime libraries Operating system Hardware Operating System Task Thread PC+SP Thread Task PC+SP PC+SP Process - application - runtime libraries Hardware 38
39 OS support Generic structure of the address space Code Data BSS Shared libraries Main stack PC2 PC0 PC1 SP3 SP1 PC3 SP2 Data contains constant-initialized data BSS grows with malloc Other memory areas can be mapped with mmap Thread stacks are allocated at thread creation Main stack grows automatically up to ulimit -s (e.g. 8 Mb.) SP0 39
40 Runtime libraries C support library glibc UNIX system calls open/close/read/write/fork/exec... Buffered IO printf, fprintf, fread, fwrite, fopen... Sockets TCP and UDP communications MPI builds on sockets or other communication infrastructures... Pthreads Creation, termination Attributes Scheduling, priorities, binding... OpenMP builds on top of pthreads 40
41 Static and dynamic libraries Non-shared (static) The executable has the code of all functions Uses more memory space It cannot take advantage of new library versions Dynamic (shared) The executable contains only its own code It links the libraries dynamically Only one resident copy of the library Installing a new version will affect all programs 41
42 Pthreads Creation of a pthread Allocates a thread descriptor Allocates stack and builds stack frame Function, argument Creates a kernel-level thread if needed Sharing process resources #include <pthread.h> int pthread_create( pthread_t * thread, pthread_attr_t * attr, void * (* start_routine) (void *), void * arg ); 42
43 Pthreads Example: creation of a pthread void main () { int res; pthread_t th;... res = pthread_create (&th, NULL, func, argument);... } void * func (void * argument) { printf ( argument %d\n, (int) argument); } th 43
44 Pthreads types of parallelism Fork/join Fork / join n x pthread_create Join Gets the termination code of the pthreads n x pthread_join 44
45 Pthreads types of parallelism Unstructured - detach Unstructured pthread_create + pthread_detach pthread_exit Detach The application does not need to get the pthreads termination code 45
46 Pthreads - interface pthread_exit (code) saves the termination code in the pthread structure pthread_join (pth, &status) retrieves the pthread termination code pthread_detach (pth) marks the pthread structure to be freed when the pthread terminates The pthread descriptor can be released after Pthread_exit and pthread_join Pthread_detach and pthread_exit 46
47 Pthreads - main difficulties Code outlining Programmer has to identify parallelism, outline regions in functions and define the interfaces Synchronization Locking Conditional variables In heterogeneous multicores (Cell/B.E., GPUs...) Split the program in different files for compilation 47
48 MPI Based on messaging added to the application, and the MPI library msg_send, msg_rcv Compiler driver mpicc -o hello hello.c takes care of adding -I<mpi-include> and -lmpi Runtime system The MPI library 48
49 OpenMP Based on directives on top of C/C++/Fortran Compiler driver gcc -fopenmp -lgomp -lpthreads added automatically by the gcc compiler Runtime system Based on Pthreads, not seen by programmer 49
50 Next steps on this track MPI Monday afternoon to Wednesday OpenMP Thursday and Friday 50
Threaded Programming. Lecture 9: Alternatives to OpenMP
Threaded Programming Lecture 9: Alternatives to OpenMP What s wrong with OpenMP? OpenMP is designed for programs where you want a fixed number of threads, and you always want the threads to be consuming
More informationCS 3305 Intro to Threads. Lecture 6
CS 3305 Intro to Threads Lecture 6 Introduction Multiple applications run concurrently! This means that there are multiple processes running on a computer Introduction Applications often need to perform
More informationCS333 Intro to Operating Systems. Jonathan Walpole
CS333 Intro to Operating Systems Jonathan Walpole Threads & Concurrency 2 Threads Processes have the following components: - an address space - a collection of operating system state - a CPU context or
More informationCS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University
CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 Process creation in UNIX All processes have a unique process id getpid(),
More informationCS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University
CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 The Process Concept 2 The Process Concept Process a program in execution
More informationCS510 Operating System Foundations. Jonathan Walpole
CS510 Operating System Foundations Jonathan Walpole The Process Concept 2 The Process Concept Process a program in execution Program - description of how to perform an activity instructions and static
More informationpthreads CS449 Fall 2017
pthreads CS449 Fall 2017 POSIX Portable Operating System Interface Standard interface between OS and program UNIX-derived OSes mostly follow POSIX Linux, macos, Android, etc. Windows requires separate
More informationThreads. Threads (continued)
Threads A thread is an alternative model of program execution A process creates a thread through a system call Thread operates within process context Use of threads effectively splits the process state
More informationPOSIX threads CS 241. February 17, Copyright University of Illinois CS 241 Staff
POSIX threads CS 241 February 17, 2012 Copyright University of Illinois CS 241 Staff 1 Recall: Why threads over processes? Creating a new process can be expensive Time A call into the operating system
More informationParallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially
More informationCS510 Operating System Foundations. Jonathan Walpole
CS510 Operating System Foundations Jonathan Walpole Threads & Concurrency 2 Why Use Threads? Utilize multiple CPU s concurrently Low cost communication via shared memory Overlap computation and blocking
More informationThread. Disclaimer: some slides are adopted from the book authors slides with permission 1
Thread Disclaimer: some slides are adopted from the book authors slides with permission 1 IPC Shared memory Recap share a memory region between processes read or write to the shared memory region fast
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 7 Threads Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ How many processes can a core
More informationCPSC 341 OS & Networks. Threads. Dr. Yingwu Zhu
CPSC 341 OS & Networks Threads Dr. Yingwu Zhu Processes Recall that a process includes many things An address space (defining all the code and data pages) OS resources (e.g., open files) and accounting
More informationC Grundlagen - Threads
Michael Strassberger saremox@linux.com Proseminar C Grundlagen Fachbereich Informatik Fakultaet fuer Mathematik, Informatik und Naturwissenschaften Universitaet Hamburg 3. Juli 2014 Table of Contents 1
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More information30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy
Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Why serial is not enough Computing architectures Parallel paradigms Message Passing Interface How
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 8 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ How many partners can we cave for project:
More informationCS/CoE 1541 Final exam (Fall 2017). This is the cumulative final exam given in the Fall of Question 1 (12 points): was on Chapter 4
CS/CoE 1541 Final exam (Fall 2017). Name: This is the cumulative final exam given in the Fall of 2017. Question 1 (12 points): was on Chapter 4 Question 2 (13 points): was on Chapter 4 For Exam 2, you
More informationMulticore and Multiprocessor Systems: Part I
Chapter 3 Multicore and Multiprocessor Systems: Part I Max Planck Institute Magdeburg Jens Saak, Scientific Computing II 44/337 Symmetric Multiprocessing Definition (Symmetric Multiprocessing (SMP)) The
More informationCSCI4430 Data Communication and Computer Networks. Pthread Programming. ZHANG, Mi Jan. 26, 2017
CSCI4430 Data Communication and Computer Networks Pthread Programming ZHANG, Mi Jan. 26, 2017 Outline Introduction What is Multi-thread Programming Why to use Multi-thread Programming Basic Pthread Programming
More informationEPL372 Lab Exercise 2: Threads and pthreads. Εργαστήριο 2. Πέτρος Παναγή
EPL372 Lab Exercise 2: Threads and pthreads Εργαστήριο 2 Πέτρος Παναγή 1 Threads Vs Processes 2 Process A process is created by the operating system, and requires a fair amount of "overhead". Processes
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationMultithreaded Programming
Multithreaded Programming The slides do not contain all the information and cannot be treated as a study material for Operating System. Please refer the text book for exams. September 4, 2014 Topics Overview
More informationParallelism Marco Serafini
Parallelism Marco Serafini COMPSCI 590S Lecture 3 Announcements Reviews First paper posted on website Review due by this Wednesday 11 PM (hard deadline) Data Science Career Mixer (save the date!) November
More informationApplication Programming
Multicore Application Programming For Windows, Linux, and Oracle Solaris Darryl Gove AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationThreads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits
CS307 What is a thread? Threads A thread is a basic unit of CPU utilization contains a thread ID, a program counter, a register set, and a stack shares with other threads belonging to the same process
More informationCSci 4061 Introduction to Operating Systems. (Threads-POSIX)
CSci 4061 Introduction to Operating Systems (Threads-POSIX) How do I program them? General Thread Operations Create/Fork Allocate memory for stack, perform bookkeeping Parent thread creates child threads
More informationWhat is concurrency? Concurrency. What is parallelism? concurrency vs parallelism. Concurrency: (the illusion of) happening at the same time.
What is concurrency? Concurrency Johan Montelius KTH 2017 Concurrency: (the illusion of) happening at the same time. A property of the programing model. Why would we want to do things concurrently? What
More informationConcurrency. Johan Montelius KTH
Concurrency Johan Montelius KTH 2017 1 / 32 What is concurrency? 2 / 32 What is concurrency? Concurrency: (the illusion of) happening at the same time. 2 / 32 What is concurrency? Concurrency: (the illusion
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationProgramming with Shared Memory. Nguyễn Quang Hùng
Programming with Shared Memory Nguyễn Quang Hùng Outline Introduction Shared memory multiprocessors Constructs for specifying parallelism Creating concurrent processes Threads Sharing data Creating shared
More informationPROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec
PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationCS 261 Fall Mike Lam, Professor. Threads
CS 261 Fall 2017 Mike Lam, Professor Threads Parallel computing Goal: concurrent or parallel computing Take advantage of multiple hardware units to solve multiple problems simultaneously Motivations: Maintain
More informationThreads. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Threads Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3052: Introduction to Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu) Concurrency
More informationWarm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective?
Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective? CS 470 Spring 2018 POSIX Mike Lam, Professor Multithreading & Pthreads MIMD
More informationLecture 4 Threads. (chapter 4)
Bilkent University Department of Computer Engineering CS342 Operating Systems Lecture 4 Threads (chapter 4) Dr. İbrahim Körpeoğlu http://www.cs.bilkent.edu.tr/~korpe 1 References The slides here are adapted/modified
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2019 Lecture 7 Threads Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ Amdahls law example: Person
More informationProcesses. Johan Montelius KTH
Processes Johan Montelius KTH 2017 1 / 47 A process What is a process?... a computation a program i.e. a sequence of operations a set of data structures a set of registers means to interact with other
More informationProcesses, Threads, SMP, and Microkernels
Processes, Threads, SMP, and Microkernels Slides are mainly taken from «Operating Systems: Internals and Design Principles, 6/E William Stallings (Chapter 4). Some materials and figures are obtained from
More informationA Thread is an independent stream of instructions that can be schedule to run as such by the OS. Think of a thread as a procedure that runs
A Thread is an independent stream of instructions that can be schedule to run as such by the OS. Think of a thread as a procedure that runs independently from its main program. Multi-threaded programs
More informationAn Introduction to Parallel Programming
An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe
More informationA process. the stack
A process Processes Johan Montelius What is a process?... a computation KTH 2017 a program i.e. a sequence of operations a set of data structures a set of registers means to interact with other processes
More informationShared Memory: Virtual Shared Memory, Threads & OpenMP
Shared Memory: Virtual Shared Memory, Threads & OpenMP Eugen Betke University of Hamburg Department Informatik Scientific Computing 09.01.2012 Agenda 1 Introduction Architectures of Memory Systems 2 Virtual
More informationThreads Tuesday, September 28, :37 AM
Threads_and_fabrics Page 1 Threads Tuesday, September 28, 2004 10:37 AM Threads A process includes an execution context containing Memory map PC and register values. Switching between memory maps can take
More informationChapter 4 Concurrent Programming
Chapter 4 Concurrent Programming 4.1. Introduction to Parallel Computing In the early days, most computers have only one processing element, known as the Central Processing Unit (CPU). Due to this hardware
More informationCSE 333 SECTION 9. Threads
CSE 333 SECTION 9 Threads HW4 How s HW4 going? Any Questions? Threads Sequential execution of a program. Contained within a process. Multiple threads can exist within the same process. Every process starts
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2019 Lecture 6 Processes Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ Fork( ) causes a branch
More informationCMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on
More informationTrends and Challenges in Multicore Programming
Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores
More informationIntroduction to pthreads
CS 220: Introduction to Parallel Computing Introduction to pthreads Lecture 25 Threads In computing, a thread is the smallest schedulable unit of execution Your operating system has a scheduler that decides
More informationParallel Hybrid Computing Stéphane Bihan, CAPS
Parallel Hybrid Computing Stéphane Bihan, CAPS Introduction Main stream applications will rely on new multicore / manycore architectures It is about performance not parallelism Various heterogeneous hardware
More informationBarbara Chapman, Gabriele Jost, Ruud van der Pas
Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology
More informationCSE 306/506 Operating Systems Threads. YoungMin Kwon
CSE 306/506 Operating Systems Threads YoungMin Kwon Processes and Threads Two characteristics of a process Resource ownership Virtual address space (program, data, stack, PCB ) Main memory, I/O devices,
More informationParallelism. CS6787 Lecture 8 Fall 2017
Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does
More informationLecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the
More informationConcurrency, Thread. Dongkun Shin, SKKU
Concurrency, Thread 1 Thread Classic view a single point of execution within a program a single PC where instructions are being fetched from and executed), Multi-threaded program Has more than one point
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationPOSIX PTHREADS PROGRAMMING
POSIX PTHREADS PROGRAMMING Download the exercise code at http://www-micrel.deis.unibo.it/~capotondi/pthreads.zip Alessandro Capotondi alessandro.capotondi(@)unibo.it Hardware Software Design of Embedded
More informationCS420: Operating Systems
Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing
More informationCS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco
CS 326 Operating Systems C Programming Greg Benson Department of Computer Science University of San Francisco Why C? Fast (good optimizing compilers) Not too high-level (Java, Python, Lisp) Not too low-level
More informationThreads. studykorner.org
Threads Thread Subpart of a process Basic unit of CPU utilization Smallest set of programmed instructions, can be managed independently by OS No independent existence (process dependent) Light Weight Process
More informationOpenMP threading: parallel regions. Paolo Burgio
OpenMP threading: parallel regions Paolo Burgio paolo.burgio@unimore.it Outline Expressing parallelism Understanding parallel threads Memory Data management Data clauses Synchronization Barriers, locks,
More informationECE 598 Advanced Operating Systems Lecture 23
ECE 598 Advanced Operating Systems Lecture 23 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 April 2016 Don t forget HW#9 Midterm next Thursday Announcements 1 Process States
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationHPCSE - I. «Introduction to multithreading» Panos Hadjidoukas
HPCSE - I «Introduction to multithreading» Panos Hadjidoukas 1 Processes and Threads POSIX Threads API Outline Thread management Synchronization with mutexes Deadlock and thread safety 2 Terminology -
More informationClaude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique
Got 2 seconds Sequential 84 seconds Expected 84/84 = 1 second!?! Got 25 seconds MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Séminaire MATHEMATIQUES
More informationCISC2200 Threads Spring 2015
CISC2200 Threads Spring 2015 Process We learn the concept of process A program in execution A process owns some resources A process executes a program => execution state, PC, We learn that bash creates
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationCS533 Concepts of Operating Systems. Jonathan Walpole
CS533 Concepts of Operating Systems Jonathan Walpole Introduction to Threads and Concurrency Why is Concurrency Important? Why study threads and concurrent programming in an OS class? What is a thread?
More informationMultiprocessors 2007/2008
Multiprocessors 2007/2008 Abstractions of parallel machines Johan Lukkien 1 Overview Problem context Abstraction Operating system support Language / middleware support 2 Parallel processing Scope: several
More informationMoore s Law. Multicore Programming. Vendor Solution. Power Density. Parallelism and Performance MIT Lecture 11 1.
Moore s Law 1000000 Intel CPU Introductions 6.172 Performance Engineering of Software Systems Lecture 11 Multicore Programming Charles E. Leiserson 100000 10000 1000 100 10 Clock Speed (MHz) Transistors
More information12:00 13:20, December 14 (Monday), 2009 # (even student id)
Final Exam 12:00 13:20, December 14 (Monday), 2009 #330110 (odd student id) #330118 (even student id) Scope: Everything Closed-book exam Final exam scores will be posted in the lecture homepage 1 Parallel
More informationConcurrent Programming
Concurrent Programming CS 485G-006: Systems Programming Lectures 32 33: 18 20 Apr 2016 1 Concurrent Programming is Hard! The human mind tends to be sequential The notion of time is often misleading Thinking
More informationHigh Performance Computing Lecture 21. Matthew Jacob Indian Institute of Science
High Performance Computing Lecture 21 Matthew Jacob Indian Institute of Science Semaphore Examples Semaphores can do more than mutex locks Example: Consider our concurrent program where process P1 reads
More information4.8 Summary. Practice Exercises
Practice Exercises 191 structures of the parent process. A new task is also created when the clone() system call is made. However, rather than copying all data structures, the new task points to the data
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationCS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control
Processes & Threads Concurrent Programs Process = Address space + one thread of control Concurrent program = multiple threads of control Multiple single-threaded processes Multi-threaded process 2 1 Concurrent
More informationCS-345 Operating Systems. Tutorial 2: Grocer-Client Threads, Shared Memory, Synchronization
CS-345 Operating Systems Tutorial 2: Grocer-Client Threads, Shared Memory, Synchronization Threads A thread is a lightweight process A thread exists within a process and uses the process resources. It
More informationParallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops
Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationMulti-core Architectures. Dr. Yingwu Zhu
Multi-core Architectures Dr. Yingwu Zhu What is parallel computing? Using multiple processors in parallel to solve problems more quickly than with a single processor Examples of parallel computing A cluster
More informationPCS - Part Two: Multiprocessor Architectures
PCS - Part Two: Multiprocessor Architectures Institute of Computer Engineering University of Lübeck, Germany Baltic Summer School, Tartu 2008 Part 2 - Contents Multiprocessor Systems Symmetrical Multiprocessors
More informationHybrid MPI/OpenMP parallelization. Recall: MPI uses processes for parallelism. Each process has its own, separate address space.
Hybrid MPI/OpenMP parallelization Recall: MPI uses processes for parallelism. Each process has its own, separate address space. Thread parallelism (such as OpenMP or Pthreads) can provide additional parallelism
More information45-year CPU Evolution: 1 Law -2 Equations
4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there
More informationCS 475: Parallel Programming Introduction
CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University 1. Introduction 2. System Structures 3. Process Concept 4. Multithreaded Programming
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationLecture 10 Midterm review
Lecture 10 Midterm review Announcements The midterm is on Tue Feb 9 th in class 4Bring photo ID 4You may bring a single sheet of notebook sized paper 8x10 inches with notes on both sides (A4 OK) 4You may
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationCOSC 6374 Parallel Computation. Analytical Modeling of Parallel Programs (I) Edgar Gabriel Fall Execution Time
COSC 6374 Parallel Computation Analytical Modeling of Parallel Programs (I) Edgar Gabriel Fall 2015 Execution Time Serial runtime T s : time elapsed between beginning and the end of the execution of a
More informationMultiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism
Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,
More informationCPU Scheduling (Part II)
CPU Scheduling (Part II) Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) CPU Scheduling 1393/7/28 1 / 58 Motivation Amir H. Payberah
More informationCarnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition
Carnegie Mellon 1 Concurrent Programming 15-213: Introduction to Computer Systems 23 rd Lecture, Nov. 13, 2018 2 Concurrent Programming is Hard! The human mind tends to be sequential The notion of time
More information