Multi-core Parallelization in Clojure - a Case Study

1 Multi-core Parallelization in Clojure - a Case Study
Johann M. Kraus and Hans A. Kestler
AG Bioinformatics and Systems Biology, Institute of Neural Information Processing, University of Ulm

2 Outline
1. Concepts of parallel programming
2. Short introduction to Clojure
3. Multi-core parallel K-means - the case study
4. Analysis and results
5. Summary

3 Parallel Programming
Definition: Parallel programming is a form of programming where many calculations are performed simultaneously.
Physical constraints prevent frequency scaling of processors
This led to an increasing interest in parallel hardware and parallel programming
Multi-core hardware is standard on desktop computers
Parallel software can use this hardware to its full capacity

4 Large problems are divided into smaller ones and the subproblems are solved simultaneously
Speedup S is limited by the fraction of parallelizable code P
Amdahl's law, with N the number of processors: $S = \frac{1}{(1 - P) + \frac{P}{N}}$
[Figure: Amdahl's law. Speedup versus number of processors for parallelizable fractions P = 0.95, 0.90, 0.75, and 0.50.]
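
Amdahl's law gives the speedup S = 1 / ((1 - P) + P / N) for a parallelizable fraction P on N processors. The following is a minimal sketch of that arithmetic; the function names `amdahl-speedup` and `amdahl-limit` and the choice of 8 processors are illustrative assumptions and do not come from the slides.

```clojure
;; Hypothetical helper names; only the formula itself is Amdahl's law.
(defn amdahl-speedup
  "Speedup for parallelizable fraction p (between 0 and 1) on n processors."
  [p n]
  (/ 1.0 (+ (- 1.0 p) (/ p n))))

(defn amdahl-limit
  "Upper bound of the speedup as n grows without limit: 1 / (1 - p)."
  [p]
  (/ 1.0 (- 1.0 p)))

;; Speedup on 8 processors (an arbitrary example value) and the limit
;; for the four fractions plotted on the slide.
(doseq [p [0.95 0.90 0.75 0.50]]
  (println p (amdahl-speedup p 8) (amdahl-limit p)))
```

Even with 95 % parallelizable code the speedup can never exceed 20, which is why the serial fraction dominates on many-core hardware.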

5 Concepts of Parallel Programming
Explicit vs. implicit parallelization
Explicitly define communication and synchronization details for each task:
  MPI
  Java Threads
Functional programming allows implicit parallelization:
  Parallel processing of functions
  Functions are free of side effects
  Data is immutable
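
As an illustration of implicit parallelization, the sketch below applies a side-effect-free function to immutable data once with `map` and once with `pmap`, Clojure's parallel variant of `map`. The function `slow-square` and the input range are made-up stand-ins and are not taken from the case study.

```clojure
;; slow-square is a hypothetical stand-in for an expensive computation
;; whose result depends only on its argument.
(defn slow-square [x]
  (Thread/sleep 10) ; simulate work
  (* x x))

(def xs (range 32))

;; The sequential and the parallel version differ only in map vs. pmap.
;; No threads, locks, or messages are written by hand.
(def sequential (doall (map slow-square xs)))
(def parallel   (doall (pmap slow-square xs)))

(println (= sequential parallel))
```

Because the function does not modify shared state, the order in which the calls run cannot change the result. In a standalone script, calling `(shutdown-agents)` at the very end lets the JVM exit promptly, since `pmap` runs on a background thread pool.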

6 Distributed vs. local hardware
Master-slave parallelization (e.g. Message Passing Interface)
Shared memory parallelization (e.g. Open Multi-Processing)
[Figure: left, a master exchanging messages with Slaves 0 to 4 (send data, send result); right, CPUs 0 to 4 attached to one shared memory (read, write).]

7 Thread programming
Threads are refinements of a process that share the same memory and can be processed separately and simultaneously
Available in many languages, e.g. PThreads (C), Java Threads (Java), OpenMP Threads (C, Fortran)
Execution of threads is handled by a scheduler that manages the available processing time
Communication between threads is faster than communication between processes
Invoking threads is also faster than fork/join of processes
[Figure: thread life cycle with the states new, runnable, running, waiting, and terminated and the transitions start, schedule, block, awake, and end.]

8 Concurrency control via locking and synchronizing
Concurrency control ensures that threads can access shared memory without violating data integrity
The most popular approach to concurrency is locking and synchronizing

    public class Counter {
        private int value = 0;

        // synchronized: only one thread at a time may run incr() on an instance
        public synchronized void incr() {
            value = value + 1;
        }
    }

    Counter counter = new Counter();
    counter.incr();

Problems might occur when using too many locks, too few locks, wrong locks, or locks in the wrong order
Using locks can be fatally error-prone, e.g. deadlocks

9 Concurrency control via transactional memory
Transactional memory offers a flexible alternative to lock-based concurrency control
Functionality is analogous to controlling simultaneous access to database management systems
Transactions ensure properties:
  Atomicity: Either all changes of a transaction occur or none do
  Consistency: Only valid changes are committed
  Isolation: No transaction sees the effect of other transactions
  Durability: Changes from transactions will be persistent

10 Software transactional memory maps transactional memory to concurrency control in parallel programming
[Figure: sequence diagram over time with Transaction 0, Data, and Transaction 1. Each transaction gets data and sends modified data under the guard [consistent data]; one transaction gets the data a second time before it sends its modification.]

11 Clojure
Functional programming language hosted on the JVM
Extends the code-as-data paradigm to maps and vectors
Based on immutable data structures
Provides built-in concurrency support via software transactional memory
Completely symbiotic with Java, e.g. easy access to Java libraries
Platform independent

12 Java interaction

    (import '(cern.jet.random.sampling RandomSamplingAssistant))

    ;; Sample k of the integers 0..n-1 with the COLT library.
    (defn sample [n k]
      (seq (. RandomSamplingAssistant (sampleArray k (int-array (range n))))))

Dynamic typing and multi-methods
An object is defined as the sum of what it can do (methods), rather than the sum of what it is (type hierarchy)
Add type hints to speed up code

    ;; Element-wise sum of two double arrays; the type hints avoid reflection.
    ;; Older Clojure versions write the hint as #^doubles.
    (defn da+ [^doubles as ^doubles bs]
      (amap as i ret (+ (aget as i) (aget bs i))))

13 Transactional references and STM
Transactional references ensure safe coordinated synchronous changes to mutable storage locations
Are bound to a single storage location for their lifetime
Only allow mutation of that location to occur within transactions
Available operations are ref-set, alter, and commute
No explicit locking is required

    (def counter (ref 0))
    (dosync (alter counter inc))
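
To show coordinated changes under contention, here is a small sketch in which 100 concurrent tasks update two refs inside one transaction, one with `alter` and one with `commute`. The names `members`, `record!`, and `tasks` are hypothetical; only `ref`, `dosync`, `alter`, `commute`, and `future` are Clojure's own.

```clojure
(def counter (ref 0))
(def members (ref #{}))

;; Both refs change together or not at all.
;; alter: the transaction retries if counter was changed concurrently.
;; commute: adding to a set gives the same result in any order, so the
;; STM may apply it without forcing a retry on that ref.
(defn record! [x]
  (dosync
    (alter counter inc)
    (commute members conj x)))

;; Start 100 concurrent updates and wait until all have committed.
(def tasks (doall (map #(future (record! %)) (range 100))))
(doseq [t tasks] @t)

(println @counter (count @members))
```

No update is lost although no lock appears in the code: conflicting transactions are detected and re-run by the STM.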

14 Agents
Agents allow independent asynchronous change of mutable locations
Are bound to a single storage location for their lifetime
Only allow mutation of that location to a new state to occur as a result of an action
Actions are functions that are asynchronously applied to the state of an Agent
The return value of an action becomes the new state of the Agent
Agents are integrated with the STM

    (def counter (agent 0))
    (send counter inc)
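
The sketch below illustrates two of the points above: actions are applied asynchronously, and a `send` issued inside a transaction is held back until the transaction commits. The names `total` and `log` are made up for this example.

```clojure
(def counter (agent 0))

;; send returns immediately; each action runs later on a pool thread.
(dotimes [_ 1000]
  (send counter inc))

;; await blocks until the actions sent so far from this thread are done.
(await counter)
(println @counter)

;; Integration with the STM: the send below is dispatched only if and
;; when the surrounding transaction commits, so a retried transaction
;; does not send the action twice.
(def total (ref 0))
(def log (agent []))

(dosync
  (alter total + 5)
  (send log conj :added-5))

(await log)
(println @total @log)
```

In a standalone script, `(shutdown-agents)` at the end stops the agent thread pool so that the JVM can exit.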

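15 Cluster analysis
Given a data set X, compute a partition of X into k disjoint clusters C_1, ..., C_k such that:
(1) $\bigcup_{i=1}^{k} C_i = X$
(2) $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$
How many clusters are in the data set?
[Figure: two example partitions, labeled 3 cluster and 9 cluster.]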

16 Cluster algorithms
For all possible partitions evaluate the objective function f and search the optimum.
The cardinality of the set of all possible partitions is given by Stirling numbers of the second kind:
$S_N^{(k)} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} i^N$
[Figure: runtime (nanoseconds) plotted against the number of data points and the number of clusters.]
Cluster algorithms provide a heuristic for this search:
  Partitional clustering (K-means, Neural gas, SOM, Fuzzy C-means, ...)
  Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...)
  Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...)
  Model-based clustering, Biclustering, Semi-supervised clustering
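
To get a feeling for how fast the number of partitions grows, the Stirling number of the second kind, S(N, k) = (1/k!) Σ_{i=0..k} (-1)^(k-i) C(k, i) i^N, can be evaluated directly. The following is a sketch with arbitrary-precision integers; the helper names `binomial`, `pow`, and `stirling2` are assumptions made for this illustration.

```clojure
(defn binomial
  "Binomial coefficient C(n, r), computed with BigInt arithmetic."
  [n r]
  (reduce (fn [acc i] (quot (* acc (- n i)) (inc i)))
          1N
          (range r)))

(defn pow
  "base raised to the non-negative integer power e, as a BigInt."
  [base e]
  (reduce * 1N (repeat e base)))

(defn stirling2
  "Number of partitions of n items into k non-empty clusters."
  [n k]
  (let [terms     (for [i (range (inc k))]
                    (* (if (even? (- k i)) 1 -1)
                       (binomial k i)
                       (pow i n)))
        factorial (reduce * 1N (range 1 (inc k)))]
    (quot (reduce + terms) factorial)))

;; Partitions of 10 items into 3 clusters, then 100 items into 5 clusters.
(println (stirling2 10 3))
(println (stirling2 100 5))
```

Already for 100 data points and 5 clusters the count has dozens of digits, which is why exhaustive search is replaced by heuristics.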

17 K-means algorithm

    Function KMeans
      Input:  X = {x_1, ..., x_n}  (Data to be clustered)
              k                    (Number of clusters)
      Output: C = {c_1, ..., c_k}  (Cluster centroids)
              m: X -> C            (Cluster assignments)
      Initialize C (e.g. random selection from X)
      While C has changed
        For each x_i in X
          m(x_i) = argmin_j distance(x_i, c_j)
        End
        For each c_j in C
          c_j = centroid({x_i | m(x_i) = j})
        End
      End
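
The K-means loop can be written as a short sequential Clojure sketch. This is not the authors' multi-core implementation: all function names are illustrative, squared Euclidean distance is assumed as the distance, the initial centroids are passed in explicitly instead of being drawn at random so that the run is reproducible, and an empty cluster simply keeps its previous centroid.

```clojure
(defn sq-dist
  "Squared Euclidean distance between two points (vectors of numbers)."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest
  "Index of the centroid closest to point x."
  [centroids x]
  (apply min-key #(sq-dist x (nth centroids %)) (range (count centroids))))

(defn centroid
  "Coordinate-wise mean of a non-empty collection of points."
  [points]
  (let [n (double (count points))]
    (apply mapv (fn [& coords] (/ (reduce + coords) n)) points)))

(defn update-centroids
  [centroids points assignment]
  (let [groups (group-by first (map vector assignment points))]
    (vec (map-indexed
           (fn [j c]
             (if-let [members (seq (map second (groups j)))]
               (centroid members)
               c)) ; assumption: an empty cluster keeps its centroid
           centroids))))

(defn k-means
  "Repeat assignment and centroid update until the centroids stop changing."
  [points initial-centroids]
  (loop [centroids initial-centroids]
    (let [assignment (mapv #(nearest centroids %) points)
          updated    (update-centroids centroids points assignment)]
      (if (= updated centroids)
        {:centroids centroids :assignment assignment}
        (recur updated)))))

;; Two well-separated groups of three points each.
(def points [[0.0 0.0] [0.0 1.0] [1.0 0.0]
             [10.0 10.0] [10.0 11.0] [11.0 10.0]])
(def result (k-means points [[0.0 0.0] [10.0 10.0]]))
(println result)
```

The assignment step treats every data point independently, which is the part a multi-core version can distribute over workers.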

18 Cluster Validation
Evaluation requires repeated runs of clustering, e.g.:
  Resampled data sets
  Different parameters
MCA index: mean proportion of samples being consistent over different clusterings
$\mathrm{MCA} = \frac{1}{n} \max_{\pi} \sum_{i=1}^{k} \left| A_i \cap B_{\pi(i)} \right|$
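
One way to read the MCA index: match the k clusters of clustering A to the k clusters of clustering B by a permutation π so that as many samples as possible fall into matched clusters, then divide that count by the number of samples n. The sketch below follows this reading under stated assumptions: both clusterings are given as vectors of integer labels 0..k-1, the function names are hypothetical, and all k! permutations are tried, which is only practical for small k.

```clojure
(defn permutations
  "All orderings of a collection of distinct items."
  [xs]
  (if (empty? xs)
    [[]]
    (for [x xs
          p (permutations (remove #(= % x) xs))]
      (cons x p))))

(defn mca
  "a and b hold one cluster label (0..k-1) per sample.
  Returns the best achievable proportion of consistently assigned samples."
  [a b k]
  (let [n         (count a)
        ;; counts maps [i j] to the number of samples labeled i in a and j in b
        counts    (frequencies (map vector a b))
        agreement (fn [perm]
                    (reduce + (map-indexed
                                (fn [i j] (get counts [i j] 0))
                                perm)))]
    (/ (apply max (map agreement (permutations (range k))))
       (double n))))

;; Identical partitions with renamed labels agree completely.
(println (mca [0 0 1 1 2 2] [1 1 2 2 0 0] 3))
;; One of four samples is assigned differently.
(println (mca [0 0 1 1] [0 1 1 1] 2))
```

For larger k the maximization over π is an assignment problem, so a dedicated assignment algorithm would replace the brute-force search.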

19 Estimation of the expected value of a validation index
Random label: randomly assign each item to one of the k clusters
Random partition: choose a random partition
Random prototype: assign each item to its nearest prototype
[Figure: mean MCA index (mean value from 100 runs); x-axis: cluster.]

20 Multi-core K-means with Clojure
Split the data set into smaller pieces that are handled by agents
Each cluster is represented by an agent
Add a commutative list of cluster members within a transactional reference to accelerate the centroid update step
[Figure: Data Agents 0 to n read from Cluster Agents 0 to k and write to Member Refs 0 to k.]

21 [Figure: two panels. Simultaneous read: Data Agents 0 to n read from Cluster Agents 0 to k. Simultaneous write: Data Agents 0 to n write to the Member Refs.]

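22 read: (nearest-cluster), write: (commute) (assoc)

    ;; nearest-cluster, MemberRefs, and DataAgents are defined elsewhere.

    ;; Assign one data point: read the nearest cluster, then add the point
    ;; to that cluster's member ref. commute lets concurrent writers proceed
    ;; without forcing each other to retry.
    (defn update-datapoint [datapoint]
      (let [newass (nearest-cluster datapoint)]
        (dosync (commute (nth MemberRefs newass) conj (:data datapoint)))
        (assoc datapoint :assignment newass)))

    ;; Agent action: mapv is eager, so all points are updated inside the action.
    (defn update-dataagent [datapoints]
      (mapv update-datapoint datapoints))

    ;; Send the update action to every data agent; doseq forces the sends.
    (defn assignment []
      (doseq [a DataAgents]
        (send a update-dataagent)))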

23 Benchmark results
Large data sets (artificial): each data point is sampled from N(0,1)
Summary for 10 runs of K-means
[Figure: runtime of ParaKMeans, K-means R, and McKmeans in two panels: 100 dimensions, 20 clusters (runtime in seconds) and 200 dimensions, 20 clusters (runtime in minutes).]

24 [Figure: two panels. Number of computer cores used: runtime (seconds) versus number of computer cores. Number of data agents used: runtime (seconds) versus number of data agents.]

25 Large data sets with cluster structure
Data sampled from a multivariate normal distribution
samples, 200/500 dimensions, 10/20 clusters
[Figure: runtime (seconds) of K-means R and McKmeans by number of samples / number of clusters.]

26 Accuracy compared to the known grouping of data
Measured with the MCA index
Red bars indicate the random-prototype baseline
[Figure: four panels showing the MCA index of McKmeans and K-means R.]

27 Real-world data set
Microarray data (radiation-induced changes in human gene expression)
samples (genes) and 465 features (profiles)
[Figure: runtime (seconds) of K-means R and McKmeans for 2, 5, 10, and 20 clusters.]
Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:

28 Application to Cluster Number Estimation
Repeated clustering with different subsets of data
Repeated for different numbers of clusters k
The most stable clustering is produced for the real cluster number
Jackknife resampling
Evaluation with MCA index
Data set: samples, 100 features, 3 clusters
10 runs per cluster number
minutes on dual-quad core 3.2 GHz
[Figure: MCA index versus number of clusters.]

29 Java GUI

    (import '(javax.swing JFrame JLabel JTextField JButton)
            '(java.awt.event ActionListener)
            '(java.awt GridLayout))

    (let [frame        (new JFrame "Hello, World!")
          hello-button (new JButton "Say hello")
          hello-label  (new JLabel)]
      ;; proxy creates an ActionListener instance that sets the label text.
      (. hello-button
         (addActionListener
           (proxy [ActionListener] []
             (actionPerformed [evt]
               (. hello-label (setText "Hello, World!"))))))
      (doto frame
        (.setLayout (new GridLayout))
        (.add hello-button)
        (.add hello-label)
        (.setSize 300 80) ; size values are assumed; the original arguments are missing
        (.setVisible true)))

31 Summary
Writing parallel programs usually requires careful software design and a deep knowledge of thread-safe programming
Concurrency control via transactional memory circumvents problems of lock-based concurrency strategies
Immutable data structures play a key role in software transactional memory
Clojure combines Lisp, Java and a powerful STM system
This enables fast parallelization of algorithms, even for rapid prototyping
Our simulations show a good performance of the parallelized code

32 Thank you for your attention.

33 Statistical computing library
Clojure-based statistical computing
R-like semantics
COLT library for numerical computation
JFreeChart library for graphics


More information

Chapter 5: Threads. Overview Multithreading Models Threading Issues Pthreads Windows XP Threads Linux Threads Java Threads

Chapter 5: Threads. Overview Multithreading Models Threading Issues Pthreads Windows XP Threads Linux Threads Java Threads Chapter 5: Threads Overview Multithreading Models Threading Issues Pthreads Windows XP Threads Linux Threads Java Threads 5.1 Silberschatz, Galvin and Gagne 2003 More About Processes A process encapsulates

More information

Concurrency in Object Oriented Programs 1. Object-Oriented Software Development COMP4001 CSE UNSW Sydney Lecturer: John Potter

Concurrency in Object Oriented Programs 1. Object-Oriented Software Development COMP4001 CSE UNSW Sydney Lecturer: John Potter Concurrency in Object Oriented Programs 1 Object-Oriented Software Development COMP4001 CSE UNSW Sydney Lecturer: John Potter Outline Concurrency: the Future of Computing Java Concurrency Thread Safety

More information

Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems

Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems V Brazilian Symposium on Computing Systems Engineering Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems Alessandro Trindade, Hussama Ismail, and Lucas Cordeiro Foz

More information

Principles of Software Construction: Objects, Design, and Concurrency. The Perils of Concurrency Can't live with it. Cant live without it.

Principles of Software Construction: Objects, Design, and Concurrency. The Perils of Concurrency Can't live with it. Cant live without it. Principles of Software Construction: Objects, Design, and Concurrency The Perils of Concurrency Can't live with it. Cant live without it. Spring 2014 Charlie Garrod Christian Kästner School of Computer

More information

Unit #8: Shared-Memory Parallelism and Concurrency

Unit #8: Shared-Memory Parallelism and Concurrency Unit #8: Shared-Memory Parallelism and Concurrency CPSC 221: Algorithms and Data Structures Will Evans and Jan Manuch 2016W1 Unit Outline History and Motivation Parallelism versus Concurrency Counting

More information

Performance Analysis with Periscope

Performance Analysis with Periscope Performance Analysis with Periscope M. Gerndt, V. Petkov, Y. Oleynik, S. Benedict Technische Universität München periscope@lrr.in.tum.de October 2010 Outline Motivation Periscope overview Periscope performance

More information

Introduction to Parallel Programming. Tuesday, April 17, 12

Introduction to Parallel Programming. Tuesday, April 17, 12 Introduction to Parallel Programming 1 Overview Parallel programming allows the user to use multiple cpus concurrently Reasons for parallel execution: shorten execution time by spreading the computational

More information

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman) CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI

More information

Problems with Concurrency

Problems with Concurrency with Concurrency February 14, 2012 1 / 27 s with concurrency race conditions deadlocks GUI source of s non-determinism deterministic execution model interleavings 2 / 27 General ideas Shared variable Shared

More information

Clojure: Enemy of the State

Clojure: Enemy of the State Clojure: Enemy of the State * * Not actually an enemy of the state, or state in general. :) Alex Miller @puredanger Roadmap Values vs objects Collections Sequences Generic data interfaces Identity and

More information

Acknowledgments. Amdahl s Law. Contents. Programming with MPI Parallel programming. 1 speedup = (1 P )+ P N. Type to enter text

Acknowledgments. Amdahl s Law. Contents. Programming with MPI Parallel programming. 1 speedup = (1 P )+ P N. Type to enter text Acknowledgments Programming with MPI Parallel ming Jan Thorbecke Type to enter text This course is partly based on the MPI courses developed by Rolf Rabenseifner at the High-Performance Computing-Center

More information

Advanced concurrent programming in Java Shared objects

Advanced concurrent programming in Java Shared objects Advanced concurrent programming in Java Shared objects Mehmet Ali Arslan 21.10.13 Visibility To see(m) or not to see(m)... 2 There is more to synchronization than just atomicity or critical sessions. Memory

More information

Questions from last time

Questions from last time Questions from last time Pthreads vs regular thread? Pthreads are POSIX-standard threads (1995). There exist earlier and newer standards (C++11). Pthread is probably most common. Pthread API: about a 100

More information

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Performance Analysis Instructor: Haidar M. Harmanani Spring 2018 Outline Performance scalability Analytical performance measures Amdahl

More information

Semi-supervised learning

Semi-supervised learning Semi-supervised Learning COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Overview 2 Semi-supervised learning Semi-supervised classification Semi-supervised clustering Semi-supervised

More information

Allows program to be incrementally parallelized

Allows program to be incrementally parallelized Basic OpenMP What is OpenMP An open standard for shared memory programming in C/C+ + and Fortran supported by Intel, Gnu, Microsoft, Apple, IBM, HP and others Compiler directives and library support OpenMP

More information

Shared Memory Programming with OpenMP

Shared Memory Programming with OpenMP Shared Memory Programming with OpenMP (An UHeM Training) Süha Tuna Informatics Institute, Istanbul Technical University February 12th, 2016 2 Outline - I Shared Memory Systems Threaded Programming Model

More information

The Problem with Threads

The Problem with Threads The Problem with Threads Author Edward A Lee Presented by - Varun Notibala Dept of Computer & Information Sciences University of Delaware Threads Thread : single sequential flow of control Model for concurrent

More information

Favoring Isolated Mutability The Actor Model of Concurrency. CSCI 5828: Foundations of Software Engineering Lecture 24 04/11/2012

Favoring Isolated Mutability The Actor Model of Concurrency. CSCI 5828: Foundations of Software Engineering Lecture 24 04/11/2012 Favoring Isolated Mutability The Actor Model of Concurrency CSCI 5828: Foundations of Software Engineering Lecture 24 04/11/2012 1 Goals Review the material in Chapter 8 of the Concurrency textbook that

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical

More information

Writing Parallel Programs COMP360

Writing Parallel Programs COMP360 Writing Parallel Programs COMP360 We stand at the threshold of a many core world. The hardware community is ready to cross this threshold. The parallel software community is not. Tim Mattson principal

More information

Concurrency: State Models & Design Patterns

Concurrency: State Models & Design Patterns Concurrency: State Models & Design Patterns Practical Session Week 02 1 / 13 Exercises 01 Discussion Exercise 01 - Task 1 a) Do recent central processing units (CPUs) of desktop PCs support concurrency?

More information

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information

Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Multithreading Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to large compute clusters Can perform multiple

More information

ET International HPC Runtime Software. ET International Rishi Khan SC 11. Copyright 2011 ET International, Inc.

ET International HPC Runtime Software. ET International Rishi Khan SC 11. Copyright 2011 ET International, Inc. HPC Runtime Software Rishi Khan SC 11 Current Programming Models Shared Memory Multiprocessing OpenMP fork/join model Pthreads Arbitrary SMP parallelism (but hard to program/ debug) Cilk Work Stealing

More information

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT

More information

A Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co-

A Comparison of Unified Parallel C, Titanium and Co-Array Fortran. The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Shaun Lindsay CS425 A Comparison of Unified Parallel C, Titanium and Co-Array Fortran The purpose of this paper is to compare Unified Parallel C, Titanium and Co- Array Fortran s methods of parallelism

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 7 Threads Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ How many processes can a core

More information

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially

More information

Software Architecture

Software Architecture Software Architecture Lecture 5 Call-Return Systems Rob Pettit George Mason University last class data flow data flow styles batch sequential pipe & filter process control! process control! looping structure

More information

CPL 2016, week 10. Clojure functional core. Oleg Batrashev. April 11, Institute of Computer Science, Tartu, Estonia

CPL 2016, week 10. Clojure functional core. Oleg Batrashev. April 11, Institute of Computer Science, Tartu, Estonia CPL 2016, week 10 Clojure functional core Oleg Batrashev Institute of Computer Science, Tartu, Estonia April 11, 2016 Overview Today Clojure language core Next weeks Immutable data structures Clojure simple

More information

Chair of Software Engineering. Java and C# in depth. Carlo A. Furia, Marco Piccioni, Bertrand Meyer. Java: concurrency

Chair of Software Engineering. Java and C# in depth. Carlo A. Furia, Marco Piccioni, Bertrand Meyer. Java: concurrency Chair of Software Engineering Carlo A. Furia, Marco Piccioni, Bertrand Meyer Java: concurrency Outline Java threads thread implementation sleep, interrupt, and join threads that return values Thread synchronization

More information

Experiences with an SMP Implementation for X10 based on the Java Concurrency Utilities (Extended Abstract)

Experiences with an SMP Implementation for X10 based on the Java Concurrency Utilities (Extended Abstract) Experiences with an SMP Implementation for X1 based on the Java Concurrency Utilities (Extended Abstract) Rajkishore Barik IBM India Research Lab. rajbarik@in.ibm.com Allan Kielstra IBM Toronto Laboratory

More information