Parallelism Marco Serafini

Size: px
Start display at page:

Download "Parallelism Marco Serafini"

Transcription

1 Parallelism Marco Serafini COMPSCI 590S Lecture 3

2 Announcements Reviews First paper posted on website Review due by this Wednesday 11 PM (hard deadline) Data Science Career Mixer (save the date!) November 5, 4-7 pm Campus Center Auditorium Recruiting and industry engagement event 2

3 Why multi-core architectures? 3

4 Multi-Cores We have talked about multi-core architectures Why do we actually use multi-cores? Why not a single core? 4

5 Maximum Clock Rate is Stagnating Two major laws are collapsing Moore s law Dennard scaling Source: 5

6 Moore s Law Density of transistors in an integrated circuit doubles every two years. Smaller à changes propagate faster Exponential axis So far so good, but the trend is slowing down and it won t last for long (Intel s prediction: until 2021 unless new technologies arise) [1] [1] 6

7 Dennard Scaling Reducing transistor size does not increase power density à power consumption proportional to chip area Stopped holding around 2006 Assumptions break when physical system close to limit Post-Dennard-scaling world of today Huge cooling and power consumption issues If we kept the same clock frequency trends, today a CPU would have the power density of a nuclear reactor 7

8 Heat Dissipation Problem Large datacenters consume energy like large cities Cooling is the main cost factor Columbia River valley (2006) Luleå (2015) 8

9 Where is Luleå? 9

10 Possible Solutions Dynamic Voltage and Frequency Scaling (DVFS) E.g. Intel s TurboBoost Only works under low load Use part of the chip for coprocessors (e.g. graphics) Lower power consumption Limited number of generic functionalities to offload 10

11 More Solutions Multicores Replace 1 powerful core with multiple weaker cores on a chip SIMD Single Instruction Multiple Data A massive number of cores with reduced flexibility FPGAs Dedicated hardware designed for a specific task 11

12 Multi-Core processors Idea: scale computational power linearly Instead of a single 5 GHz core, 2 * 2.5 GHz cores Scale heat dissipation linearly k cores have ~ k times the heat dissipation of a single core Increasing frequency of a single core by k times creates superlinear heat dissipation increase 12

13 Memory Bandwidth Bottleneck Cores compete for the same main memory bus Caches help in two ways They reduce latency (as we have discussed) They also increase throughput by avoiding bus contention 13

14 How to Leverage Multicores Run multiple tasks in parallel Multiprocessing Multithreading E.g. PCs have many parallel background apps OS, music, antivirus, web browser, How to parallelize one app is not trivial Embarrassingly parallel tasks Can be run by multiple threads No coordination 14

15 SIMD Processors Single Instruction Multiple Data (SIMD) processors Example Graphical Processing Units (GPUs) Intel Phi coprocessors Q: Possible SIMD snippets for i in [0,n-1] do v[i] = v[i] * pi for i in [0,n-1] do if v[i] < 0.01 then v[i] = 0 15

16 Automatic Parallelization? Holy grail in the multi-processor era Approaches Programming languages Systems with APIs that help express parallelism Efficient coordination mechanisms 16

17 Processes vs. Threads 17

18 Processes & Threads We have discussed that multicores is the future How to make use of parallelism? OS/PL support for parallel programming Processes Threads 18

19 Processes vs. Threads Process: separate memory space Thread: shared memory space (except stack) Processes Threads Heap not shared shared Global variables not shared shared Local variables (Stack) not shared not shared Code shared shared File handles not shared shared 19

20 Parallel Programming Shared memory Threads Access same memory locations (in heap & global variables) Message-Passing Processes Explicit communication: message-passing 20

21 Shared Memory

22 Shared Memory Example void main (){ x = 12; // assume that x is a global variable t = new ThreadX(); t.start(); // starts thread t y = 12/x; System.out.println(y); t.join(); // wait until t completes } class ThreadX extends Thread{ void run (){ x = 0; } } This is pseudo-java in C++: pthread_create pthread_join Question: What is printed as output? 22

23 Desired: Atomicity Thread a foo() Thread b foo() void foo (){ x = 0; x = 1; y = 1/x; } foo should be atomic, in the sense of indivisible (ancient Greek) DESIRED Thread a Thread b POSSIBLE Thread a Thread b x = 0 x = 0 x = 1 x = 1 y = 1 time x = 0 x = 1 y = 1 happensbefore changes become visible y = 1/0 x = 0 23

24 Race Condition Non-deterministic access to shared variables Correctness requires specific sequence of accesses But we cannot rely on it because of non-determinism! Solutions Enforce a specific order using synchronization Enforce a sequence of happen-before relationships Locks, mutexes, semaphores: threads block each other Lock-free algorithms: threads do not wait for each other Hard to implement correctly! Typical programmer uses locks Java has optimized data structures with thread-safety, e.g., ConcurrentHashMap 24

25 Locks Thread a l.lock() foo() l.unlock() Thread b l.lock() foo() l.unlock() void foo (){ x = 0; x ++; y = 1/x; } We use a lock variable l and use it to synchronize Equivalent: declare void synchronized foo() Impossible now Thread a Thread b Possible Thread a Thread b x = 0 x = 1 x = 0 l.lock() foo() l.unlock() l.lock() - waits l.lock() - acquires foo() l.unlock() time 25

26 Deadlock Thread a l1.lock() l2.lock() foo() l1.unlock() l2.unlock() Thread b l2.lock() l1.lock() foo() l2.unlock() l1.unlock() Question: What can go wrong? 26

27 Requirements for a Deadlock Mutual exclusion: resources (locks) held and nonshareable Hold and wait: hold a resource and request another No preemption: can unlock only when holding Circular wait: chain of threads waiting each other Question: Simple solution? All threads acquire locks in same order 27

28 Notify / Wait Thread a synchronized(o){ o.wait(); foo(); } Thread b synchronized(o){ foo(); o.notify(); } Thread a o.wait() Thread a waits o.wait() foo() Thread b foo() o.notify() Notify on an object sends a signal that activates other threads waiting on that object This code guarantees that Thread b executes foo before Thread a 28

29 What About Cache Coherency? Cache coherency ensures atomicity for Single instructions Single cache lines In reality Different variables may reside on different cache lines A variable may be accessed across multiple instructions Single high-level instructions may compile to multiple low-level ones Example: a++ in C may compile to load (a, r0); r0 = r0 + 1; store(r0, a) That s why we need locks Main lesson learned from cache coherency discussion: you should partition data 29

30 Challenges with Multi-Threading Correctness Heisenbugs: Non-deterministic bugs that appear only in certain conditions. Hard to reproduce à Hard to debug Performance Understanding concurrency bottlenecks is hard! Waiting time does not show up in profilers (only CPU time) Load-balance Make sure all cores work all the time and do not wait 30

31 Critical Path t1 t1 t2 t3 start multiple threads one step each Coordination (barrier) makes load balancing harder Critical path: Maximum sequential path (thread t1, 10 steps) 9 extra steps t1 t1 wait for all threads to complete (barrier) 31

32 Message Passing

33 Message Passing Processes communicate by exchanging messages Sockets: Communication endpoints On a network: UDP sockets, TCP sockets Internal to a node: Inter-Process Communication (IPC) Different technologies but similar abstractions 33

34 Building a Message Serialization Message content stored at random locations in RAM They need to be packed into a byte array to be sent Deserialization Receive the byte array Rebuild the original variable Pointers do not make sense anymore across nodes! 34

35 Example: Serializing a Binary Tree 10 Question: How to serialize it? Possible solution DFS Mark null pointers with -1 null 5 null null 12 null How to deserialize? 35

36 Threads + Message Passing Client-server model Client sends requests Server computes replies and sends them back Threads often used to hide latency Each client request is handled by a thread The request might wait for resources (e.g. I/O) Other threads execute other requests in the meanwhile 36

37 Processes in Different Languages Java (interpreted) The Java Virtual Machine (interpreter) is a process Creating a new process entails creating a new JVM ProcessBuilder C/C++ (compiled) OS-specific details of how processes can be generated Typical command: fork() Creates a child process, which executes instruction after fork() Child process is a full copy of the parent More on forking later 37

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Multithreading Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to large compute clusters Can perform multiple

More information

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on

More information

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University CS 571 Operating Systems Midterm Review Angelos Stavrou, George Mason University Class Midterm: Grading 2 Grading Midterm: 25% Theory Part 60% (1h 30m) Programming Part 40% (1h) Theory Part (Closed Books):

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Parallelization and Synchronization. CS165 Section 8

Parallelization and Synchronization. CS165 Section 8 Parallelization and Synchronization CS165 Section 8 Multiprocessing & the Multicore Era Single-core performance stagnates (breakdown of Dennard scaling) Moore s law continues use additional transistors

More information

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011 Synchronization CS61, Lecture 18 Prof. Stephen Chong November 3, 2011 Announcements Assignment 5 Tell us your group by Sunday Nov 6 Due Thursday Nov 17 Talks of interest in next two days Towards Predictable,

More information

CS 261 Fall Mike Lam, Professor. Threads

CS 261 Fall Mike Lam, Professor. Threads CS 261 Fall 2017 Mike Lam, Professor Threads Parallel computing Goal: concurrent or parallel computing Take advantage of multiple hardware units to solve multiple problems simultaneously Motivations: Maintain

More information

THREADS & CONCURRENCY

THREADS & CONCURRENCY 27/04/2018 Sorry for the delay in getting slides for today 2 Another reason for the delay: Yesterday: 63 posts on the course Piazza yesterday. A7: If you received 100 for correctness (perhaps minus a late

More information

High Performance Computing Course Notes Shared Memory Parallel Programming

High Performance Computing Course Notes Shared Memory Parallel Programming High Performance Computing Course Notes 2009-2010 2010 Shared Memory Parallel Programming Techniques Multiprocessing User space multithreading Operating system-supported (or kernel) multithreading Distributed

More information

Kernel Synchronization I. Changwoo Min

Kernel Synchronization I. Changwoo Min 1 Kernel Synchronization I Changwoo Min 2 Summary of last lectures Tools: building, exploring, and debugging Linux kernel Core kernel infrastructure syscall, module, kernel data structures Process management

More information

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions CMSC 330: Organization of Programming Languages Multithreaded Programming Patterns in Java CMSC 330 2 Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to

More information

Parallelism and Concurrency. COS 326 David Walker Princeton University

Parallelism and Concurrency. COS 326 David Walker Princeton University Parallelism and Concurrency COS 326 David Walker Princeton University Parallelism What is it? Today's technology trends. How can we take advantage of it? Why is it so much harder to program? Some preliminary

More information

High Performance Computing Lecture 21. Matthew Jacob Indian Institute of Science

High Performance Computing Lecture 21. Matthew Jacob Indian Institute of Science High Performance Computing Lecture 21 Matthew Jacob Indian Institute of Science Semaphore Examples Semaphores can do more than mutex locks Example: Consider our concurrent program where process P1 reads

More information

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019 CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?

More information

PRACE Autumn School Basic Programming Models

PRACE Autumn School Basic Programming Models PRACE Autumn School 2010 Basic Programming Models Basic Programming Models - Outline Introduction Key concepts Architectures Programming models Programming languages Compilers Operating system & libraries

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 The Process Concept 2 The Process Concept Process a program in execution

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio Fall 2017 1 Outline Inter-Process Communication (20) Threads

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 Process creation in UNIX All processes have a unique process id getpid(),

More information

Concurrency: State Models & Design Patterns

Concurrency: State Models & Design Patterns Concurrency: State Models & Design Patterns Practical Session Week 02 1 / 13 Exercises 01 Discussion Exercise 01 - Task 1 a) Do recent central processing units (CPUs) of desktop PCs support concurrency?

More information

Application Programming

Application Programming Multicore Application Programming For Windows, Linux, and Oracle Solaris Darryl Gove AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris

More information

Parallel Computing Concepts. CSInParallel Project

Parallel Computing Concepts. CSInParallel Project Parallel Computing Concepts CSInParallel Project July 26, 2012 CONTENTS 1 Introduction 1 1.1 Motivation................................................ 1 1.2 Some pairs of terms...........................................

More information

Computer Architecture Crash course

Computer Architecture Crash course Computer Architecture Crash course Frédéric Haziza Department of Computer Systems Uppsala University Summer 2008 Conclusions The multicore era is already here cost of parallelism is dropping

More information

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 4 - Concurrency and Synchronization Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Mutual exclusion Hardware solutions Semaphores IPC: Message passing

More information

EECS 482 Introduction to Operating Systems

EECS 482 Introduction to Operating Systems EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci Slides by: Harsha V. Madhyastha http://knowyourmeme.com/memes/mind-blown 2 Recap: Processes Hardware interface: app1+app2+app3 CPU +

More information

CSE 374 Programming Concepts & Tools

CSE 374 Programming Concepts & Tools CSE 374 Programming Concepts & Tools Hal Perkins Fall 2017 Lecture 22 Shared-Memory Concurrency 1 Administrivia HW7 due Thursday night, 11 pm (+ late days if you still have any & want to use them) Course

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing This document consists of two parts. The first part introduces basic concepts and issues that apply generally in discussions of parallel computing. The second part consists

More information

Threads Tuesday, September 28, :37 AM

Threads Tuesday, September 28, :37 AM Threads_and_fabrics Page 1 Threads Tuesday, September 28, 2004 10:37 AM Threads A process includes an execution context containing Memory map PC and register values. Switching between memory maps can take

More information

Concurrent Programming with Threads: Why you should care deeply

Concurrent Programming with Threads: Why you should care deeply Concurrent Programming with Threads: Why you should care deeply Don Porter Portions courtesy Emmett Witchel Performance (vs. VAX-11/780) Uniprocessor Performance Not Scaling 10000 20% /year 1000 52% /year

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29 Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions

More information

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8. Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than

More information

FYS Data acquisition & control. Introduction. Spring 2018 Lecture #1. Reading: RWI (Real World Instrumentation) Chapter 1.

FYS Data acquisition & control. Introduction. Spring 2018 Lecture #1. Reading: RWI (Real World Instrumentation) Chapter 1. FYS3240-4240 Data acquisition & control Introduction Spring 2018 Lecture #1 Reading: RWI (Real World Instrumentation) Chapter 1. Bekkeng 14.01.2018 Topics Instrumentation: Data acquisition and control

More information

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today

More information

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform

More information

Multiprocessor Systems. COMP s1

Multiprocessor Systems. COMP s1 Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve

More information

Principles of Software Construction: Objects, Design, and Concurrency. The Perils of Concurrency Can't live with it. Cant live without it.

Principles of Software Construction: Objects, Design, and Concurrency. The Perils of Concurrency Can't live with it. Cant live without it. Principles of Software Construction: Objects, Design, and Concurrency The Perils of Concurrency Can't live with it. Cant live without it. Spring 2014 Charlie Garrod Christian Kästner School of Computer

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved

More information

Systems software design. Processes, threads and operating system resources

Systems software design. Processes, threads and operating system resources Systems software design Processes, threads and operating system resources Who are we? Krzysztof Kąkol Software Developer Jarosław Świniarski Software Developer Presentation based on materials prepared

More information

UNIT:2. Process Management

UNIT:2. Process Management 1 UNIT:2 Process Management SYLLABUS 2.1 Process and Process management i. Process model overview ii. Programmers view of process iii. Process states 2.2 Process and Processor Scheduling i Scheduling Criteria

More information

Computation Abstractions. Processes vs. Threads. So, What Is a Thread? CMSC 433 Programming Language Technologies and Paradigms Spring 2007

Computation Abstractions. Processes vs. Threads. So, What Is a Thread? CMSC 433 Programming Language Technologies and Paradigms Spring 2007 CMSC 433 Programming Language Technologies and Paradigms Spring 2007 Threads and Synchronization May 8, 2007 Computation Abstractions t1 t1 t4 t2 t1 t2 t5 t3 p1 p2 p3 p4 CPU 1 CPU 2 A computer Processes

More information

THREAD LEVEL PARALLELISM

THREAD LEVEL PARALLELISM THREAD LEVEL PARALLELISM Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 4 is due on Dec. 11 th This lecture

More information

COMP 530: Operating Systems Concurrent Programming with Threads: Why you should care deeply

COMP 530: Operating Systems Concurrent Programming with Threads: Why you should care deeply Concurrent Programming with Threads: Why you should care deeply Don Porter Portions courtesy Emmett Witchel 1 Uniprocessor Performance Not Scaling Performance (vs. VAX-11/780) 10000 1000 100 10 1 20% /year

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018 CS 31: Intro to Systems Threading & Parallel Applications Kevin Webb Swarthmore College November 27, 2018 Reading Quiz Making Programs Run Faster We all like how fast computers are In the old days (1980

More information

Top500 Supercomputer list

Top500 Supercomputer list Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity

More information

Parallelism. CS6787 Lecture 8 Fall 2017

Parallelism. CS6787 Lecture 8 Fall 2017 Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does

More information

CMSC 132: Object-Oriented Programming II

CMSC 132: Object-Oriented Programming II CMSC 132: Object-Oriented Programming II Synchronization in Java Department of Computer Science University of Maryland, College Park Multithreading Overview Motivation & background Threads Creating Java

More information

CS3733: Operating Systems

CS3733: Operating Systems Outline CS3733: Operating Systems Topics: Synchronization, Critical Sections and Semaphores (SGG Chapter 6) Instructor: Dr. Tongping Liu 1 Memory Model of Multithreaded Programs Synchronization for coordinated

More information

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture

More information

THREADS & CONCURRENCY

THREADS & CONCURRENCY 4/26/16 Announcements BRING YOUR CORNELL ID TO THE PRELIM. 2 You need it to get in THREADS & CONCURRENCY Prelim 2 is next Tonight BRING YOUR CORNELL ID! A7 is due Thursday. Our Heap.java: on Piazza (A7

More information

THREADS: (abstract CPUs)

THREADS: (abstract CPUs) CS 61 Scribe Notes (November 29, 2012) Mu, Nagler, Strominger TODAY: Threads, Synchronization - Pset 5! AT LONG LAST! Adversarial network pong handling dropped packets, server delays, overloads with connection

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

The University of Texas at Arlington

The University of Texas at Arlington The University of Texas at Arlington Lecture 10: Threading and Parallel Programming Constraints CSE 5343/4342 Embedded d Systems II Objectives: Lab 3: Windows Threads (win32 threading API) Convert serial

More information

Operating Systems, Fall Lecture 9, Tiina Niklander 1

Operating Systems, Fall Lecture 9, Tiina Niklander 1 Multiprocessor Systems Multiple processor systems Ch 8.1 8.3 1 Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor,

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

Programmable NICs. Lecture 14, Computer Networks (198:552)

Programmable NICs. Lecture 14, Computer Networks (198:552) Programmable NICs Lecture 14, Computer Networks (198:552) Network Interface Cards (NICs) The physical interface between a machine and the wire Life of a transmitted packet Userspace application NIC Transport

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS3350B Computer Architecture Winter 2015 Lecture 7.2: Multicore TLP (1) Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson & Hennessy,

More information

Threads & Concurrency

Threads & Concurrency 2 Due date of A7 Due d About A5-A6 We have changed the due date of A7 Friday, 28 April. Threads & Concurrency But the last date to submit A7 remains the same: 29 April. We make the last date be 29 April

More information

Parallel Computing. Prof. Marco Bertini

Parallel Computing. Prof. Marco Bertini Parallel Computing Prof. Marco Bertini Modern CPUs Historical trends in CPU performance From Data processing in exascale class computer systems, C. Moore http://www.lanl.gov/orgs/hpc/salishan/salishan2011/3moore.pdf

More information

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing

More information

CMSC 330: Organization of Programming Languages. Concurrency & Multiprocessing

CMSC 330: Organization of Programming Languages. Concurrency & Multiprocessing CMSC 330: Organization of Programming Languages Concurrency & Multiprocessing Multiprocessing Multiprocessing: The use of multiple parallel computations We have entered an era of multiple cores... Hyperthreading

More information

Processor speed. Concurrency Structure and Interpretation of Computer Programs. Multiple processors. Processor speed. Mike Phillips <mpp>

Processor speed. Concurrency Structure and Interpretation of Computer Programs. Multiple processors. Processor speed. Mike Phillips <mpp> Processor speed 6.037 - Structure and Interpretation of Computer Programs Mike Phillips Massachusetts Institute of Technology http://en.wikipedia.org/wiki/file:transistor_count_and_moore%27s_law_-

More information

Test and Verification Solutions

Test and Verification Solutions Test and Verification Solutions Have you noticed? Even the Hardware is changing Mike Bartley, TVS 1 Agenda The rise of multicore computing The types of distributed computing Where is this stuff used? Why

More information

Threads & Concurrency

Threads & Concurrency Threads & Concurrency Lecture 24 CS2110 Spring 2017 Due date of A7 Due d About A5-A6 2 We have changed the due date of A7 Friday, 28 April. But the last date to submit A7 remains the same: 29 April. We

More information

Chapter 7: Deadlocks. Operating System Concepts 9 th Edition

Chapter 7: Deadlocks. Operating System Concepts 9 th Edition Chapter 7: Deadlocks Silberschatz, Galvin and Gagne 2013 Chapter 7: Deadlocks System Model Deadlock Characterization Methods for Handling Deadlocks Deadlock Prevention Deadlock Avoidance Deadlock Detection

More information

! Readings! ! Room-level, on-chip! vs.!

! Readings! ! Room-level, on-chip! vs.! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads

More information

INF 212 ANALYSIS OF PROG. LANGS CONCURRENCY. Instructors: Crista Lopes Copyright Instructors.

INF 212 ANALYSIS OF PROG. LANGS CONCURRENCY. Instructors: Crista Lopes Copyright Instructors. INF 212 ANALYSIS OF PROG. LANGS CONCURRENCY Instructors: Crista Lopes Copyright Instructors. Basics Concurrent Programming More than one thing at a time Examples: Network server handling hundreds of clients

More information

Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)"

Lecture 28 Multicore, Multithread Suggested reading: (H&P Chapter 7.4) Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)" 1" Processor components" Multicore processors and programming" Processor comparison" CSE 30321 - Lecture 01 - vs." Goal: Explain

More information

CS510 Advanced Topics in Concurrency. Jonathan Walpole

CS510 Advanced Topics in Concurrency. Jonathan Walpole CS510 Advanced Topics in Concurrency Jonathan Walpole Threads Cannot Be Implemented as a Library Reasoning About Programs What are the valid outcomes for this program? Is it valid for both r1 and r2 to

More information

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem?

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem? What is the Race Condition? And what is its solution? Race Condition: Where several processes access and manipulate the same data concurrently and the outcome of the execution depends on the particular

More information

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1 Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

Lec 26: Parallel Processing. Announcements

Lec 26: Parallel Processing. Announcements Lec 26: Parallel Processing Kavita Bala CS 341, Fall 28 Computer Science Cornell University Announcements Pizza party Tuesday Dec 2, 6:3-9: Location: TBA Final project (parallel ray tracer) out next week

More information

CEC 450 Real-Time Systems

CEC 450 Real-Time Systems CEC 450 Real-Time Systems Lecture 6 Accounting for I/O Latency September 28, 2015 Sam Siewert A Service Release and Response C i WCET Input/Output Latency Interference Time Response Time = Time Actuation

More information

Parallel Computing. Parallel Computing. Hwansoo Han

Parallel Computing. Parallel Computing. Hwansoo Han Parallel Computing Parallel Computing Hwansoo Han What is Parallel Computing? Software with multiple threads Parallel vs. concurrent Parallel computing executes multiple threads at the same time on multiple

More information

Dr Markus Hagenbuchner CSCI319. Distributed Systems Chapter 3 - Processes

Dr Markus Hagenbuchner CSCI319. Distributed Systems Chapter 3 - Processes Dr Markus Hagenbuchner markus@uow.edu.au CSCI319 Distributed Systems Chapter 3 - Processes CSCI319 Chapter 3 Page: 1 Processes Lecture notes based on the textbook by Tannenbaum Study objectives: 1. Understand

More information

The University of Texas at Arlington

The University of Texas at Arlington The University of Texas at Arlington Lecture 6: Threading and Parallel Programming Constraints CSE 5343/4342 Embedded Systems II Based heavily on slides by Dr. Roger Walker More Task Decomposition: Dependence

More information

Lecture 27 Programming parallel hardware" Suggested reading:" (see next slide)"

Lecture 27 Programming parallel hardware Suggested reading: (see next slide) Lecture 27 Programming parallel hardware" Suggested reading:" (see next slide)" 1" Suggested Readings" Readings" H&P: Chapter 7 especially 7.1-7.8" Introduction to Parallel Computing" https://computing.llnl.gov/tutorials/parallel_comp/"

More information

COMP 322: Fundamentals of Parallel Programming

COMP 322: Fundamentals of Parallel Programming COMP 322: Fundamentals of Parallel Programming! Lecture 1: The What and Why of Parallel Programming; Task Creation & Termination (async, finish) Vivek Sarkar Department of Computer Science, Rice University

More information

Parallelism. Parallel Hardware. Introduction to Computer Systems

Parallelism. Parallel Hardware. Introduction to Computer Systems Parallelism We have been discussing the abstractions and implementations that make up an individual computer system in considerable detail up to this point. Our model has been a largely sequential one,

More information

Multiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message pas

Multiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message pas Multiple processor systems 1 Multiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor, access

More information

Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5

Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5 Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5 Multiple Processes OS design is concerned with the management of processes and threads: Multiprogramming Multiprocessing Distributed processing

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

Cloud Computing CS

Cloud Computing CS Cloud Computing CS 15-319 Programming Models- Part I Lecture 4, Jan 25, 2012 Majd F. Sakr and Mohammad Hammoud Today Last 3 sessions Administrivia and Introduction to Cloud Computing Introduction to Cloud

More information

CS 3305 Intro to Threads. Lecture 6

CS 3305 Intro to Threads. Lecture 6 CS 3305 Intro to Threads Lecture 6 Introduction Multiple applications run concurrently! This means that there are multiple processes running on a computer Introduction Applications often need to perform

More information

Process Synchronisation (contd.) Deadlock. Operating Systems. Spring CS5212

Process Synchronisation (contd.) Deadlock. Operating Systems. Spring CS5212 Operating Systems Spring 2009-2010 Outline Process Synchronisation (contd.) 1 Process Synchronisation (contd.) 2 Announcements Presentations: will be held on last teaching week during lectures make a 20-minute

More information

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II)

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II) SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II) Shan He School for Computational Science University of Birmingham Module 06-19321: SSC Outline Outline of Topics

More information

CS 475: Parallel Programming Introduction

CS 475: Parallel Programming Introduction CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.

More information

Parallel Programming: Background Information

Parallel Programming: Background Information 1 Parallel Programming: Background Information Mike Bailey mjb@cs.oregonstate.edu parallel.background.pptx Three Reasons to Study Parallel Programming 2 1. Increase performance: do more work in the same

More information

Parallel Programming: Background Information

Parallel Programming: Background Information 1 Parallel Programming: Background Information Mike Bailey mjb@cs.oregonstate.edu parallel.background.pptx Three Reasons to Study Parallel Programming 2 1. Increase performance: do more work in the same

More information

Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective?

Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective? Warm-up question (CS 261 review) What is the primary difference between processes and threads from a developer s perspective? CS 470 Spring 2018 POSIX Mike Lam, Professor Multithreading & Pthreads MIMD

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Other consistency models

Other consistency models Last time: Symmetric multiprocessing (SMP) Lecture 25: Synchronization primitives Computer Architecture and Systems Programming (252-0061-00) CPU 0 CPU 1 CPU 2 CPU 3 Timothy Roscoe Herbstsemester 2012

More information

CS 3723 Operating Systems: Final Review

CS 3723 Operating Systems: Final Review CS 3723 Operating Systems: Final Review Instructor: Dr. Tongping Liu Lecture Outline High-level synchronization structure: Monitor Pthread mutex Conditional variables Barrier Threading Issues 1 2 Monitors

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information