Parallel Programming


1 Parallel Programming 7. Data Parallelism Christoph von Praun 07-1

2 (1) Parallel algorithm structure design space Organization by Data: (1.1) Geometric Decomposition, (1.2) Recursive Data. Organization by Tasks: (1.3) Task Parallelism, (1.4) Divide and Conquer. Organization by Data Flow: (1.5) Pipeline. 07-2

3 (1.1) Geometric decomposition Context: The application operates on a large data structure with multiple data items. The operation on each data item has a regular access pattern with clear dependencies. The application is typically data-intensive, with little computation per data item. 07-3

4 Example: Heat transfer. (Figure: temperature distribution on a plate, color legend from normal to hot, evolving over n simulation steps.) 07-4

5 Stencil (= schema according to which T is updated): the new value of a cell T is the average of its four neighbors t1..t4, T_new = (t1_old + t2_old + t3_old + t4_old) / 4. Iterate until |T_new - T_old| < ε.
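To make the update rule concrete, here is a minimal sequential Java sketch of one Jacobi-style sweep with the four-neighbor average and the convergence test; class and array names are illustrative, not taken from the slides.

    class Jacobi {
        // One sweep: every interior cell becomes the average of its four neighbors.
        // Returns the largest change, so the caller can iterate until it drops below epsilon.
        static double sweep(double[][] told, double[][] tnew) {
            double maxDiff = 0.0;
            for (int r = 1; r < told.length - 1; r++)
                for (int c = 1; c < told[0].length - 1; c++) {
                    tnew[r][c] = (told[r-1][c] + told[r+1][c] + told[r][c-1] + told[r][c+1]) / 4.0;
                    maxDiff = Math.max(maxDiff, Math.abs(tnew[r][c] - told[r][c]));
                }
            return maxDiff; // caller repeats (swapping told/tnew) while maxDiff >= epsilon
        }
    }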

6 Forces Data decomposition: different parts of the data structure are assigned to different activities. Which granularity and topology? A naive decomposition may not be ideal. Scheduling: coordination is required if the operation of one activity requires input from data belonging to another activity. 07-6

7 Solution (template) Partition the data into chunks, one chunk per activity. Activities must access their chunks plus their inputs efficiently (this may require explicit communication if the data is distributed). Each activity updates only its own chunk.

8 Use data copies ("ghost cells") to reduce dependencies across different chunks (old/new schema): the red activity keeps a copy of t2_old; the red and blue activities exchange data in lockstep. T_new = (t1_old + t2_old + t3_old + t4_old) / 4

9 For certain stencils: avoid dependencies by alternating updates of red and black elements. Activities operate in lockstep (all activities update red, then all activities update black, ...). 07-9

10 Computations that proceed in lockstep are best described following the SPMD (Single Program Multiple Data) style:
  val clk = Clock.make();
  for (c in chunks) {
    // each chunk is processed by a separate activity
    async clocked (clk) {
      <update red in chunk c>
      clk.next();
      <update black in chunk c>
    }
  }
  clk.drop();
The same (single) program is run by different activities on multiple chunks / data items.
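As an illustrative Java analogue of this lockstep structure (using threads and a CyclicBarrier in place of X10 clocks; the number of chunks and the print statements are placeholders for real stencil work):

    import java.util.concurrent.CyclicBarrier;

    class RedBlackLockstep {
        public static void main(String[] args) {
            int nChunks = 4;                                  // assumed number of chunks
            CyclicBarrier barrier = new CyclicBarrier(nChunks);
            for (int c = 0; c < nChunks; c++) {
                final int chunk = c;
                new Thread(() -> {
                    try {
                        System.out.println("update red in chunk " + chunk);   // placeholder for stencil work
                        barrier.await();                                       // corresponds to clk.next()
                        System.out.println("update black in chunk " + chunk);
                        barrier.await();
                    } catch (Exception e) { throw new RuntimeException(e); }
                }).start();
            }
        }
    }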

11 Consequences Data decomposition (chunk processed by an activity) must be chosen wisely according to caching and in-memory layout of data structures. Patterns in Category (2) 07-11

12 Which decomposition is preferable?
  (a) for (r in rows) async { for (c in columns) <update array at (r,c)> }
  (b) for (c in columns) async { for (r in rows) <update array at (r,c)> }
07-12

13 ... it depends on the memory layout. A cache line holds variables at consecutive addresses. Row major: offset = row*ncols + col (programming languages: C, X10). Column major: offset = col*nrows + row (programming languages: Fortran, Matlab). 07-13
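A small Java illustration of the row-major offset formula above and why traversal order matters for cache lines; the array sizes are arbitrary:

    class RowMajor {
        public static void main(String[] args) {
            int nrows = 4, ncols = 8;
            double[] a = new double[nrows * ncols];       // 2D array flattened row-major, as in C/X10

            // Row-wise traversal: consecutive columns are adjacent in memory (same cache line).
            for (int r = 0; r < nrows; r++)
                for (int c = 0; c < ncols; c++)
                    a[r * ncols + c] = r + 0.1 * c;       // stride-1 accesses: cache friendly

            // Column-wise traversal touches addresses ncols apart: far fewer hits per cache line.
            double sum = 0;
            for (int c = 0; c < ncols; c++)
                for (int r = 0; r < nrows; r++)
                    sum += a[r * ncols + c];
            System.out.println(sum);
        }
    }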

14 Sequential traversal in X10 (row major):
  val region = (0..NROWS) * (0..NCOLS);
  val arr = new Array[Double](region, 0.0);
  for ([r,c] in region)   // r, c: Int
    <update arr(r,c)>
  for (p in region)       // p: Point{rank=2}
    <update arr(p)>
07-14

15 Parallel traversal in X10 (row major):
  val region = (0..NROWS) * (0..NCOLS);
  val arr = new Array[Double](region, 0.0);
  for ([r] in region.projection(0)) async
    for ([c] in region.projection(1))
      <update arr(r,c)>
07-15

16 Consequences (cont.) Explicit exchange of data at synchronization points: this may require explicit communication on platforms without shared memory (e.g. MPI), or a copy operation in shared memory (from real cells to ghost cells). 07-16

17 Abstract model: mesh. Implementation: array with ghost cells. (Figure: local memories of the blue and red activities; at the chunk boundary, each activity copies its real cells into the other activity's ghost cells.) 07-17

18 Why ghost cells? To facilitate data independence on shared memory machines. To aggregate communication in distributed memory systems and enable computation on local memory. 07-18

19 Consequences (cont.) Activities operate in lockstep; performance depends on the (dynamic) frequency of synchronization. Frequent synchronization is a sign of frequent dependences and hence little parallelism.

20 Lockstep computation: data exchange (ghost cell update) at synchronization points.
  val clk = Clock.make();
  for (c in chunks) {
    async clocked (clk) {
      <initialize ghost cells>
      clk.next();
      while (!done) {
        <read local data and ghost cells, update local data>
        clk.next();
        <update ghost cells>
        clk.next();
      }
    }
  }
  clk.drop();
07-20

21 Lockstep computation (diagram: two activities side by side; each executes <init GC>, then alternates <local stencil computation> and <update GC>, with clk.next() at each synchronization point). 07-21

22 Further examples Dense linear algebra computations, e.g. solvers for systems of linear equations (LINPACK, the measure of floating-point performance for the TOP 500 supercomputer list) and matrix multiplication. 07-22

23 Matrix multiplication C = A * B, where A is (n x m), B is (m x k), and C is (n x k). Naive parallel algorithm: each element c(i,j) is computed by an activity; the activity reads row i of A and column j of B.

24 (Figure: cache lines touched when reading a row of A and a column of B.) Challenge: the computation of c(i,j) requires 2m read operations; these 2m variables typically fall on many different cache lines (row major); reading a line from memory into the cache incurs a significant delay (cache miss). 07-24
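A minimal Java sketch of the naive data-parallel matrix multiplication; for illustration it uses one parallel task per row of C rather than per element, but the dimensions follow the slide (A is n x m, B is m x k):

    import java.util.stream.IntStream;

    class MatMul {
        // C = A * B with A: n x m, B: m x k, C: n x k
        static double[][] multiply(double[][] a, double[][] b) {
            int n = a.length, m = b.length, k = b[0].length;
            double[][] c = new double[n][k];
            IntStream.range(0, n).parallel().forEach(i -> {   // rows of C are independent
                for (int j = 0; j < k; j++) {
                    double s = 0;
                    for (int l = 0; l < m; l++)
                        s += a[i][l] * b[l][j];               // reads row i of A and column j of B
                    c[i][j] = s;
                }
            });
            return c;
        }
    }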

25 Further Examples (cont.) Finite element methods (structured grids), e.g. simulation of electromagnetism, fluid dynamics, heat transfer (PDE solvers) Simulation of airflow and temperature in data center rack with different component layout: Source: [2] 07-25

26 Image processing, e.g. Gaussian image blurring: a per-pixel stencil computation; the value of a pixel is a weighted average of neighboring pixels. (Figure: original image and blurred versions with 5px and 20px radius.) 07-26
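As a sketch of such a per-pixel stencil, a simple blur in Java; the kernel here is uniform rather than Gaussian and the radius is arbitrary, so this only illustrates the access pattern:

    class Blur {
        // Averages each pixel with its neighbors within the given radius (borders clamped).
        static double[][] blur(double[][] img, int radius) {
            int h = img.length, w = img[0].length;
            double[][] out = new double[h][w];
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    double sum = 0; int count = 0;
                    for (int dy = -radius; dy <= radius; dy++)
                        for (int dx = -radius; dx <= radius; dx++) {
                            int yy = Math.min(h - 1, Math.max(0, y + dy));
                            int xx = Math.min(w - 1, Math.max(0, x + dx));
                            sum += img[yy][xx]; count++;
                        }
                    out[y][x] = sum / count;   // uniform average of the neighborhood
                }
            return out;
        }
    }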

27 Limits on parallelism? Conceptually: most data-parallel algorithms are embarrassingly parallel: no dependencies among tasks (e.g. matrix multiplication), no or few synchronization points, lots of / arbitrary parallelism, perfect scaling. In practice: limitations due to the implementation and the physics of the machine... Source: [3] 07-27

28 Scaling of data parallel problems Strong scaling: fix the overall problem/data size, vary the number of computational resources. Do additional computational resources shorten the solution of a fixed-size problem? Sometimes called scale-up. Very challenging: the amount of computation per activity decreases, there is less opportunity for data reuse within an activity (caching), and very efficient coordination and sharing between computational resources is required. Source: [3] 07-28

29 Scaling of data parallel problems Weak scaling: fixed problem size per computational unit, varying number of computational units. Can a larger problem be computed in the same time with additional computational resources? Sometimes called scale-out. Less challenging than strong scaling. Examples: cluster computing, blade centers, many Google applications. Source: [3] 07-29
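To make the two notions concrete, a small worked example with assumed (illustrative) numbers: strong scaling asks, for a fixed problem, whether T(p) drops as p grows; if T(1) = 100 s and T(8) = 20 s, the speedup is T(1)/T(8) = 5 and the parallel efficiency 5/8 ≈ 63%. Weak scaling keeps the problem size per unit fixed; if the run time stays near 100 s while going from 1 to 8 units (now solving an 8x larger problem), the weak-scaling efficiency T(1)/T(8) is close to 100%.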

30 (1) Parallel algorithm structure design space Organization by Data (1.1) Data Parallelism (Geometric Decomposition) (1.2) Recursive Data Organization by Tasks (1.3) Task Parallelism (1.4) Divide and Conquer Organization by Data Flow (1.5) Pipeline 07-30

31 (1.2) Recursive data Context: like (1.1), but the data structure is recursive: lists, trees, graphs. Operations are sometimes recursive, sometimes seem inherently sequential.

32 Example: Reduction Data structure: List of values 3, 5, 17, 3, 6, 8, 12, 13 Operation: compute sum of values 07-32

33 Sequential algorithm: (((((((3+5)+17)+3)+6)+8)+12)+13) (figure: one addition after another over time) 07-33

34 Sequential algorithm
  def sum(arr: Array[Int]{rank==1}): Int {
    var sum: Int = 0;
    for (i in arr) sum += arr(i);
    return sum;
  }
07-34

35 Parallel algorithm: pair-wise summation ((3+5)+(17+3))+((6+8)+(12+13)) = 67 (figure: reduction tree evaluated over time). We just changed the evaluation order of the sequential program; a simple change of schedule enables / increases parallelism.

36 Parallel algorithm
  def sum(arr: Array[Int]{rank==1}): Int {
    return pairwise(arr, arr.region.min(0), arr.region.max(0));
  }
  def pairwise(arr: Array[Int]{rank==1}, lo: Int, hi: Int): Int {
    if (lo == hi) return arr(lo);
    else {
      val lsum = Future.make(() => pairwise(arr, lo, lo + (hi-lo)/2));
      val rsum = Future.make(() => pairwise(arr, lo + (hi-lo)/2 + 1, hi));
      return lsum.force() + rsum.force();
    }
  }
07-36

37 Semantics of X10 future:
  S1;
  val v1: Future[T] = Future.make(E1);
  S2;
  val v2: T = v1.force();
A feasible execution: 1) spawn async evaluation of expression E1, 2) force the future and claim the result. (Diagram: happens-before order relating S1, the async evaluation of E1, S2, and the force.) 07-37
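For comparison, a rough Java analogue of the create/force pair (illustrative Java using CompletableFuture, not the X10 API; e1() stands in for the expression E1):

    import java.util.concurrent.CompletableFuture;

    class FutureDemo {
        static int e1() { return 42; }            // stands in for the expression E1

        public static void main(String[] args) {
            // S1: ...
            CompletableFuture<Integer> v1 = CompletableFuture.supplyAsync(FutureDemo::e1); // spawn async evaluation
            // S2: may overlap with the evaluation of E1
            int v2 = v1.join();                   // "force": wait for and claim the result
            System.out.println(v2);
        }
    }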

38 Parallel algorithm
  import x10.util.concurrent.Future;
  def sum(arr: Array[Int]{rank==1}): Int {
    return pairwise(arr, arr.region.min(0), arr.region.max(0));
  }
  def pairwise(arr: Array[Int]{rank==1}, lo: Int, hi: Int): Int {
    if (lo == hi) return arr(lo);
    else {
      // concurrent recursive descent
      val lsum = Future.make(() => pairwise(arr, lo, lo + (hi-lo)/2));
      val rsum = Future.make(() => pairwise(arr, lo + (hi-lo)/2 + 1, hi));
      // block until results are available
      return lsum.force() + rsum.force();
    }
  }
07-38

39 The algorithm follows the divide and conquer pattern, which is always possible for recursive operations and a natural opportunity for parallelization. 07-39

40 Consequences The problem and its solution must be cast into a recursive form: this sometimes incurs additional cost that must be traded off against the performance improvement due to parallelization. In the example: additional variables for temporary results and a chain of recursive calls. A recursive formulation may not be intuitive to read. 07-40

41 The amount of computation in the recursive descent must be significant to offset the cost of communication and synchronization. Example: the sequential sum may be faster for arrays smaller than some threshold (1000 elements in the code on the next slide); for larger arrays, use the recursive, parallel algorithm.

42 Parallel algorithm
  import x10.util.concurrent.Future;
  val THRESHOLD = 1000;
  def par_sum(arr: Array[Int]{rank==1}): Int {
    return pairwise(arr, arr.region.min(0), arr.region.max(0));
  }
  def pairwise(arr: Array[Int]{rank==1}, lo: Int, hi: Int): Int {
    if (hi-lo < THRESHOLD) return seq_sum(arr, lo, hi);
    else {
      val lsum = Future.make(() => pairwise(arr, lo, lo + (hi-lo)/2));
      val rsum = Future.make(() => pairwise(arr, lo + (hi-lo)/2 + 1, hi));
      return lsum.force() + rsum.force();
    }
  }
  def seq_sum(arr: Array[Int]{rank==1}, lo: Int, hi: Int): Int {
    var sum: Int = 0;
    for ((i): Point{rank==1} in [lo..hi]) sum += arr(i);
    return sum;
  }
07-42
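The same divide-and-conquer structure expressed with Java's fork/join framework, as an illustrative analogue (class name and cutoff chosen here; the cutoff mirrors the slide's THRESHOLD):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class PairwiseSum extends RecursiveTask<Long> {
        static final int THRESHOLD = 1000;   // assumed cutoff, as in the X10 version above
        final int[] arr; final int lo, hi;   // sums arr[lo..hi] inclusive

        PairwiseSum(int[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo < THRESHOLD) {                  // small range: sequential sum
                long s = 0;
                for (int i = lo; i <= hi; i++) s += arr[i];
                return s;
            }
            int mid = lo + (hi - lo) / 2;
            PairwiseSum left = new PairwiseSum(arr, lo, mid);
            PairwiseSum right = new PairwiseSum(arr, mid + 1, hi);
            left.fork();                                // concurrent recursive descent
            long r = right.compute();
            return r + left.join();                     // block until the left result is available
        }
    }

    // usage: long sum = new ForkJoinPool().invoke(new PairwiseSum(data, 0, data.length - 1));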

43 Another example: Prefix sum (= scan) of a list. Data structure: list of values 3, 5, 17, 3, 6, 8, 12, 13. Operation: compute the partial sum of the first up to the current value in the list: 3, 5, 17, 3, 6, 8, 12, 13 → 3, 8, 25, 28, 34, 42, 54, 67.

44 Sequential algorithm
  def prefix_sum(arr: Array[Int]{rank==1},
                 res: Array[Int]{rank==1 && self.region == arr.region}) {
    for ((i): Point{rank==1} in arr) {
      if (i == 0) res(i) = arr(i);
      else res(i) = res(i-1) + arr(i);
    }
  }
07-44

45 Prefix sum is more difficult to parallelize than sum because all values (res(i), i<k) in the sequential solution are required to compute res(k)

46 Parallel prefix sum (figure: the input values form leaf nodes labeled 0:0, 1:1, ..., 7:7; a node labeled i:j holds the sum of elements i through j).

47 Parallel prefix sum (figure, step 1: each element is combined with its left neighbor, producing nodes 0:1, 1:2, ..., 6:7; legend: copy, add, complete, temporary). 07-47

48 Parallel prefix sum (figure, step 2: combining with nodes two positions to the left yields the complete prefixes 0:2 and 0:3 and the temporaries 1:4, ..., 4:7). 07-48

49 Parallel prefix sum (figure, step 3: combining with nodes four positions to the left completes the remaining prefixes 0:4, ..., 0:7). 07-49
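The picture corresponds to the classic log-step scan. A compact Java sketch that mimics the parallel rounds sequentially (the combining distance doubles each round; a real implementation would update all elements of a round in parallel):

    import java.util.Arrays;

    class PrefixSum {
        // Log-step inclusive scan; each round's inner loop could run in parallel over i.
        static int[] scan(int[] in) {
            int[] cur = in.clone();
            for (int dist = 1; dist < cur.length; dist *= 2) {
                int[] next = cur.clone();                 // "copy" step from the figure
                for (int i = dist; i < cur.length; i++)   // "add" step: combine with element dist to the left
                    next[i] = cur[i] + cur[i - dist];
                cur = next;
            }
            return cur;                                   // cur[k] == sum of in[0..k]
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(scan(new int[]{3, 5, 17, 3, 6, 8, 12, 13})));
            // prints [3, 8, 25, 28, 34, 42, 54, 67]
        }
    }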

50 Sources [1] Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill: Patterns for Parallel Programming. Addison Wesley. [2] Future Facilities: [3] Maged Michael, Jose Moreira, Doron Shiloach, Robert Wisniewski: "Scale-up x Scale-out: A Case Study using Nutch/Lucene". Parallel and Distributed Processing Symposium (IPDPS).

51 This work is licensed under a Creative Commons Attribution- ShareAlike 3.0 License. You are free: to Share to copy, distribute and transmit the work to Remix to adapt the work Under the following conditions: Attribution. You must attribute the work to The Art of Multiprocessor Programming (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights
