Towards a Domain-Specific Language for Patterns-Oriented Parallel Programming

Dalvan Griebler, Luiz Gustavo Fernandes
Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS
Programa de Pós-Graduação em Ciência da Computação - PPGCC
Grupo de Modelagem de Aplicações Paralelas - GMAP
Brazilian Symposium on Programming Languages - SBLP, October 2013

Summary
1. Introduction
2. Patterns-Oriented Parallel Programming (POPP)
3. DSL-POPP
   - Compilation Process
   - Programming Interface and Implementation
   - Levels of parallelism
4. Results
   - Implementation Example of the DSL-POPP
   - Tests Scenario
   - Performance of DSL-POPP
5. Conclusions
6. References

Introduction
- Skeletons/Patterns ([1], [2], [3])
- Programming interfaces: FastFlow [4], Muesli [5], SkeTo [6], Skandium [7], eskel [8], P3L [9], Lithium [10], Muskel [11] and Skil [12]
- Main goals of DSL-POPP [13]:
  - Reduce programming effort without compromising performance
  - Patterns-Oriented Parallel Programming
  - Abstract the details of pattern implementation
  - Offer different levels of parallelism
- Paper contributions:
  - We propose the POPP model
  - We introduce DSL-POPP
  - We present a case study based on an image processing algorithm

Patterns-Oriented Parallel Programming (POPP)
[Figure: POPP model. A main routine and subroutines 1..n, each made of code blocks 1..n.]
[Figure: Master/Slave - Pipeline. Master/Slave pattern code blocks and Pipeline pattern code blocks, shown in the main routine and in subroutines 1..n.]
Legend: M, S: Master/Slave (main routine); m, s: master/slave (subroutine); P: pipeline stage (main routine); p: pipeline stage (subroutine).
[Figure: Combination of Patterns (hybrid patterns). The main routine is a pipeline with stages P1, P2, ..., Pn; subroutines 1 and 2 are master/slave (m, s1..sn) and subroutine n is a pipeline (p1..pn).]

DSL-POPP: Compilation Process
[Figure: Compilation process. DSL-POPP source code (for example $PipelinePattern and @Stage(){ ... }) enters the DSL-POPP precompiler system, whose labelled parts are the pattern tree, syntactic/semantic analysis, and source-to-source transformation. The transformed code includes pthread.h and smmpi.h and calls SMMPI_send(), SMMPI_recv(), pthread_create() and pthread_join(). The GCC compiler then produces the binary code.]
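The figure names pthread_create() and pthread_join() among the calls that appear after the source-to-source transformation. As a rough illustration only, the C sketch below shows the create/join shape such code could take. It is not the actual DSL-POPP output: worker_arg, double_range and run_block are hypothetical names, and the doubling loop merely stands in for a user code block.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread argument; not part of DSL-POPP. */
typedef struct {
    int *data;   /* shared buffer */
    int begin;   /* first element handled by this thread */
    int end;     /* one past the last element */
} worker_arg;

/* Stand-in for a user code block: doubles every element in the slice. */
static void *double_range(void *p)
{
    worker_arg *a = p;
    for (int i = a->begin; i < a->end; i++)
        a->data[i] *= 2;
    return NULL;
}

/* Creates num_th threads over contiguous slices of data, then joins them.
 * Returns 0 on success and -1 if allocation or thread creation failed. */
int run_block(int *data, int n, int num_th)
{
    assert(data != NULL && n >= 0 && num_th > 0);
    pthread_t *tid = malloc((size_t)num_th * sizeof *tid);
    worker_arg *arg = malloc((size_t)num_th * sizeof *arg);
    if (tid == NULL || arg == NULL) {
        free(tid);
        free(arg);
        return -1;
    }

    int created = 0, status = 0;
    for (int t = 0; t < num_th; t++) {
        arg[t].data = data;
        arg[t].begin = (int)((long)n * t / num_th);
        arg[t].end = (int)((long)n * (t + 1) / num_th);
        if (pthread_create(&tid[t], NULL, double_range, &arg[t]) != 0) {
            status = -1;   /* stop creating, but still join what started */
            break;
        }
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);

    free(tid);
    free(arg);
    return status;
}
```

Calling run_block(v, 10, 3) on an array holding 0..9 leaves it holding 0, 2, 4, ..., 18. The point of a DSL in this style is that the programmer writes only the body of the code block, while the thread bookkeeping is generated.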

DSL-POPP: Programming Interface and Implementation
[Figure: Syntax and logical structure of the DSL-POPP.]
(a) Pipeline
- Syntax (only partly recoverable from the slide): $PipelinePattern with a Pipeline Block containing Stage Blocks; the parameters shown are int num_th, void* buffer, int buf_size.
- Logical structure: create threads; thread 0, thread 1 and thread 2 each run their work; join threads.
(b) Master/Slave
- Syntax (only partly recoverable from the slide): $MasterSlavePattern with a Master Block containing a Slave Block; the parameters shown are num_th, void* buffer, int buf_size, const POPP_LB_Policy.
- Logical structure: create threads; threads 0..n run Work 0.0 ... Work 0.n through Work n.0 ... Work n.n; join threads.
Policies for load balancing: POPP_LB_STATIC; POPP_LB_DYNAMIC; POPP_LB_COST.
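The slide names the load balancing policies but does not define them. The C sketch below assumes the common reading: a static policy fixes each thread's range of tasks up front, while a dynamic policy lets each slave claim the next task on demand. Everything here (task_queue, queue_init, queue_take, static_range) is a hypothetical name for illustration and not the DSL-POPP runtime; the cost-based policy is not sketched.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared queue: slaves pull the next task index on demand. */
typedef struct {
    pthread_mutex_t lock;
    int next;    /* next unclaimed task */
    int total;   /* number of tasks */
} task_queue;

void queue_init(task_queue *q, int total)
{
    assert(q != NULL && total >= 0);
    pthread_mutex_init(&q->lock, NULL);
    q->next = 0;
    q->total = total;
}

/* Dynamic policy (assumed meaning): returns the next task index,
 * or -1 once every task has been claimed. Safe to call from many threads. */
int queue_take(task_queue *q)
{
    pthread_mutex_lock(&q->lock);
    int task = (q->next < q->total) ? q->next++ : -1;
    pthread_mutex_unlock(&q->lock);
    return task;
}

/* Static policy (assumed meaning): thread t of num_th always gets the
 * fixed half-open range [*begin, *end) of the total tasks. */
void static_range(int total, int num_th, int t, int *begin, int *end)
{
    assert(total >= 0 && num_th > 0 && t >= 0 && t < num_th);
    *begin = (int)((long)total * t / num_th);
    *end = (int)((long)total * (t + 1) / num_th);
}
```

With 40 tasks and 3 threads, static_range assigns [0, 13), [13, 26) and [26, 40). The dynamic variant costs one lock acquisition per task but adapts when tasks take uneven time.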

DSL-POPP: Levels of parallelism
[Figure: Overview of thread graph in DSL-POPP. Panels: a) Pipeline - Pipeline; b) Pipeline - Master/Slave; c) Master/Slave - Master/Slave; d) Master/Slave - Pipeline. Legend: control threads (master); first level active threads; second level active threads.]

Results: Implementation Example of the DSL-POPP
[Figure: Overview of DSL-POPP Image Processing Algorithm Implementation. A list of images (IM1 ... IM40) with 3000x2550 resolution; each image (IM) goes through the Prewitt, Sobel and Roberts filters, with Split blocks dividing the image into parts 1..n.]
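The slides do not show the filter code itself. For orientation, here is a minimal C sketch of one of the three named filters, the Sobel operator, using its standard 3x3 kernels on an 8-bit grayscale image. The function name sobel_rows and the row-range parameters are assumptions: they illustrate how one part of a split image could be handed to a worker, not how DSL-POPP actually does it.

```c
#include <assert.h>
#include <stdlib.h>

/* Applies the 3x3 Sobel operator to rows [row_begin, row_end) of a
 * width x height grayscale image. Border pixels are set to 0. The gradient
 * magnitude is approximated as |Gx| + |Gy| and clamped to 255. */
void sobel_rows(const unsigned char *src, unsigned char *dst,
                int width, int height, int row_begin, int row_end)
{
    assert(src != NULL && dst != NULL && width >= 3 && height >= 3);
    assert(0 <= row_begin && row_begin <= row_end && row_end <= height);

    for (int y = row_begin; y < row_end; y++) {
        for (int x = 0; x < width; x++) {
            if (y == 0 || y == height - 1 || x == 0 || x == width - 1) {
                dst[y * width + x] = 0;   /* no full neighbourhood here */
                continue;
            }
            const unsigned char *p = src + y * width + x;
            /* Horizontal and vertical Sobel kernels. */
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];
            int mag = abs(gx) + abs(gy);
            dst[y * width + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
    }
}
```

Because each output pixel reads only the source image, disjoint row ranges can be processed by different threads without synchronisation, which is what makes this kind of filter a natural fit for splitting an image into parts.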

Results: Tests Scenario
[Figure: Test-1. List of images (IM1 ... IM40) with 3000x2550 resolution; the Prewitt, Sobel and Roberts filters each run as a Master/Slave block that splits the image (IM) into parts 1..n.]

[Figure: Test-2. Pipeline whose stages are the Prewitt, Sobel and Roberts filters; the images (IM1 ... IM40) stream through the stages.]
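Test-2 arranges the three filters as pipeline stages. The C sketch below shows one way a three-stage pipeline can be wired with POSIX threads: each stage is a thread, and neighbouring stages hand item indices to each other through a one-slot channel. This is an illustrative assumption, not the DSL-POPP implementation; channel, stage_arg, stage_main and run_pipeline are hypothetical names, and adding a constant stands in for applying a filter.

```c
#include <assert.h>
#include <pthread.h>

#define END_OF_STREAM (-1)

/* Hypothetical one-slot channel between two neighbouring stages. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int full, value;
} channel;

static void channel_init(channel *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->changed, NULL);
    c->full = 0;
}

static void channel_put(channel *c, int v)
{
    pthread_mutex_lock(&c->lock);
    while (c->full)                       /* wait until the slot is free */
        pthread_cond_wait(&c->changed, &c->lock);
    c->value = v;
    c->full = 1;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

static int channel_get(channel *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->full)                      /* wait until an item arrives */
        pthread_cond_wait(&c->changed, &c->lock);
    int v = c->value;
    c->full = 0;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return v;
}

typedef struct {
    channel *in;   /* NULL for the first stage */
    channel *out;  /* NULL for the last stage */
    int *data;     /* items; exactly one stage owns an item at a time */
    int n;         /* item count, used by the first stage only */
    int add;       /* stand-in for the stage's real work */
} stage_arg;

static void *stage_main(void *p)
{
    stage_arg *s = p;
    for (int i = 0; ; i++) {
        int idx = s->in ? channel_get(s->in)
                        : (i < s->n ? i : END_OF_STREAM);
        if (idx == END_OF_STREAM)
            break;
        s->data[idx] += s->add;
        if (s->out)
            channel_put(s->out, idx);
    }
    if (s->out)
        channel_put(s->out, END_OF_STREAM);   /* tell the next stage to stop */
    return NULL;
}

/* Runs every item through three stages. Returns 0, or -1 if a thread
 * could not be created. */
int run_pipeline(int *data, int n)
{
    assert(data != NULL && n >= 0);
    channel c[2];
    channel_init(&c[0]);
    channel_init(&c[1]);
    stage_arg a[3] = {
        { NULL,  &c[0], data, n, 1 },
        { &c[0], &c[1], data, n, 10 },
        { &c[1], NULL,  data, n, 100 },
    };
    pthread_t tid[3];
    int status = 0, t;
    for (t = 2; t >= 0; t--) {            /* start consumers first */
        if (pthread_create(&tid[t], NULL, stage_main, &a[t]) != 0) {
            status = -1;
            if (t < 2)                    /* unblock stages already running */
                channel_put(&c[t], END_OF_STREAM);
            break;
        }
    }
    for (int j = t + 1; j < 3; j++)
        pthread_join(tid[j], NULL);
    return status;
}
```

While the last stage works on item i, the middle stage can already work on item i+1 and the first on item i+2, which is where a pipeline gets its parallelism. A one-slot channel keeps the sketch short; a longer queue would smooth out stages of uneven cost.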

[Figure: Test-3.1 and Test-3.2. Master/Slave blocks with Split (parts 1..n) for the Prewitt, Sobel and Roberts filters.]

[Figure: Test-4. Pipeline with Prewitt, Sobel and Roberts stages; within the stages, Master/Slave blocks split the image (IM) into parts 1..n.]

[Figure: Test-5. Prewitt, Sobel and Roberts filters with a Master/Slave Split of the image (IM) into parts 1..n.]

Results: Performance of DSL-POPP
[Figure: six plots, one per test. X-axis: number of threads. Y-axes: speedup and efficiency. Series: speedup, ideal speedup, efficiency.]
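The plots report speedup and efficiency against the number of threads, but the slides do not state the formulas. The C sketch below assumes the usual definitions: speedup is sequential time divided by parallel time, efficiency is speedup divided by the number of threads, and ideal speedup equals the number of threads. The function names are hypothetical.

```c
#include <assert.h>

/* Speedup under the usual definition: T_sequential / T_parallel. */
double speedup(double t_seq, double t_par)
{
    assert(t_seq > 0.0 && t_par > 0.0);
    return t_seq / t_par;
}

/* Efficiency as a fraction in (0, 1] for ideal or sub-ideal scaling:
 * speedup divided by the number of threads. */
double efficiency(double t_seq, double t_par, int threads)
{
    assert(threads > 0);
    return speedup(t_seq, t_par) / threads;
}
```

For example, a run that takes 100 s sequentially and 25 s on 8 threads has a speedup of 4 and an efficiency of 0.5 under these definitions. These timings are invented for illustration and are not results from the slides.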

Conclusions
About this paper:
- Hides low-level parallel programming primitives
- Patterns may be easily nested or combined
- Good performance for the image processing application
- Different parallel implementations were tested
Future work:
- Include other parallel patterns
- Investigate optimized techniques for code generation
- Effort evaluation

References
[1] Mattson G. T., Sanders A. B., and Massingill L. B. Patterns for Parallel Programming. Addison-Wesley, Boston, USA.
[2] Intel and McCool D. M. Structured Parallel Programming with Deterministic Patterns. In HotPar - 2nd USENIX Workshop on Hot Topics in Parallelism, pages 1-6, Berkeley, CA, June 2010.
[3] Catanzaro R. and Keutzer K. Parallel Computing with Patterns and Frameworks. XRDS: Crossroads, The ACM Magazine for Students, 17(1):22-27, 2010.
[4] Aldinucci M., Danelutto M., Kilpatrick P., and Torquati M. FastFlow: High-Level and Efficient Streaming on Multi-core. In Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 3. Wiley, Boston, USA, 2013.
[5] Ciechanowicz P. and Kuchen H. Enhancing Muesli's Data Parallel Skeletons for Multi-core Computer Architectures. In High Performance Computing and Communications (HPCC), 2010 12th IEEE International Conference on, pages 108-113, Melbourne, Australia, September 2010.
[6] Karasawa Y. and Iwasaki H. A Parallel Skeleton Library for Multi-core Clusters. In Parallel Processing, ICPP '09. International Conference on, pages 84-91, Vienna, Austria, September 2009.
[7] Leyton M. and Piquer J. M. Skandium: Multi-core Programming with Algorithmic Skeletons. In Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on, Pisa, Italy, February 2010.
[8] Benoit A., Cole M., Gilmore S., and Hillston J. Flexible Skeletal Programming with eskel. In Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Lisboa, Portugal, September.
[9] Bacci B., Danelutto M., Orlando S., Pelagatti S., and Vanneschi M. P3L: A Structured High-Level Parallel Language, and its Structured Support. Concurrency: Practice and Experience, 7(3), 1995.
[10] Aldinucci M., Danelutto M., and Teti P. An Advanced Environment Supporting Structured Parallel Programming in Java. Future Gener. Comput. Syst., 19(5):611-626.
[11] Aldinucci M., Danelutto M., and Kilpatrick P. Skeletons for Multi/Many-core Systems. In Parallel Computing: From Multicores and GPU's to Petascale (Proc. of PARCO 2009, Lyon, France), Lyon, France, September.
[12] Botorog G. H. and Kuchen H. Skil: An Imperative Language with Algorithmic Skeletons for Efficient Distributed Programming. In High Performance Distributed Computing, 1996, Proceedings of 5th IEEE International Symposium on, Syracuse, NY, USA, August.
[13] Griebler D. J. Proposta de uma Linguagem Específica de Domínio de Programação Paralela Orientada a Padrões Paralelos: um Estudo de Caso Baseado no Padrão Mestre/Escravo para Arquiteturas Multi-Core. Master's thesis, PUCRS, 2012.


Parallelism paradigms

Parallelism paradigms Parallelism paradigms Intro part of course in Parallel Image Analysis Elias Rudberg elias.rudberg@it.uu.se March 23, 2011 Outline 1 Parallelization strategies 2 Shared memory 3 Distributed memory 4 Parallelization

More information

[8] GEBREMEDHIN, A. H.; MANNE, F.; MANNE, G. F. ; OPENMP, P. Scalable parallel graph coloring algorithms, 2000.

[8] GEBREMEDHIN, A. H.; MANNE, F.; MANNE, G. F. ; OPENMP, P. Scalable parallel graph coloring algorithms, 2000. 9 Bibliography [1] CECKA, C.; LEW, A. J. ; DARVE, E. International Journal for Numerical Methods in Engineering. Assembly of finite element methods on graphics processors, journal, 2010. [2] CELES, W.;

More information

Performance Testing from UML Models with Resource Descriptions *

Performance Testing from UML Models with Resource Descriptions * Performance Testing from UML Models with Resource Descriptions * Flávio M. de Oliveira 1, Rômulo da S. Menna 1, Hugo V. Vieira 1, Duncan D.A. Ruiz 1 1 Faculdade de Informática Pontifícia Universidade Católica

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 19 January 2017 Outline for Today Threaded programming

More information

Communication Library to Overlap Computation and Communication for OpenCL Application

Communication Library to Overlap Computation and Communication for OpenCL Application Communication Library to Overlap Computation and Communication for OpenCL Application Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura Univ.Tokyo What is today s talk about? Heterogeneous Computing System

More information

[4] ANDREWS, G. R. Foundations of Multithreaded, Parallel, and Distributed Programming. 1. ed. Boston: Addison Wesley, p.

[4] ANDREWS, G. R. Foundations of Multithreaded, Parallel, and Distributed Programming. 1. ed. Boston: Addison Wesley, p. Bibliografia [1] ADELSON, E. H.; BERGEN, J. R. The Plenoptic Function and the Elements of Early Vision. In: LANDY M.; MOVSHON A. Computational Model of Visual Processing. Cambridge, Massachusetts: The

More information

Introduction to FastFlow programming

Introduction to FastFlow programming Introduction to FastFlow programming SPM lecture, November 2016 Massimo Torquati Computer Science Department, University of Pisa - Italy Objectives Have a good idea of the FastFlow

More information

Threads. Threads (continued)

Threads. Threads (continued) Threads A thread is an alternative model of program execution A process creates a thread through a system call Thread operates within process context Use of threads effectively splits the process state

More information

Automatic mapping of ASSIST applications using process algebra

Automatic mapping of ASSIST applications using process algebra Automatic mapping of ASSIST applications using process algebra Marco Aldinucci Dept. of Computer Science, University of Pisa Largo B. Pontecorvo 3, Pisa I-56127, Italy and Anne Benoit LIP, Ecole Normale

More information

Remote and Partial Reconfiguration of FPGAs: Tools and Trends

Remote and Partial Reconfiguration of FPGAs: Tools and Trends Remote and Partial Reconfiguration of FPGAs: Tools and Trends Daniel Mesquita, Fernando Moraes, José palma, Leandro Moller, Ney Calazans Laboratoire de Informatique, de Robotique et de Microéletronique

More information

CS333 Intro to Operating Systems. Jonathan Walpole

CS333 Intro to Operating Systems. Jonathan Walpole CS333 Intro to Operating Systems Jonathan Walpole Threads & Concurrency 2 Threads Processes have the following components: - an address space - a collection of operating system state - a CPU context or

More information

Foundation of Parallel Computing- Term project report

Foundation of Parallel Computing- Term project report Foundation of Parallel Computing- Term project report Shobhit Dutia Shreyas Jayanna Anirudh S N (snd7555@rit.edu) (sj7316@rit.edu) (asn5467@rit.edu) 1. Overview: Graphs are a set of connections between

More information

1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008

1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008 1 of 6 Lecture 7: March 4 CISC 879 Software Support for Multicore Architectures Spring 2008 Lecture 7: March 4, 2008 Lecturer: Lori Pollock Scribe: Navreet Virk Open MP Programming Topics covered 1. Introduction

More information

Skeleton programming environments Muesli (1)

Skeleton programming environments Muesli (1) Skeleton programming environments Muesli (1) Patrizio Dazzi ISTI - CNR Pisa Research Campus mail: patrizio.dazzi@isti.cnr.it Master Degree (Laurea Magistrale) in Computer Science and Networking Academic

More information

arxiv: v1 [cs.dc] 16 Jun 2016

arxiv: v1 [cs.dc] 16 Jun 2016 Proc. of the 9th Intl Symposium on High-Level Parallel Programming and Applications (HLPP) July 4-5 2016, Muenster, Germany A Comparison of Big Data Frameworks on a Layered Dataflow Model Claudia Misale

More information

Subset Sum Problem Parallel Solution

Subset Sum Problem Parallel Solution Subset Sum Problem Parallel Solution Project Report Harshit Shah hrs8207@rit.edu Rochester Institute of Technology, NY, USA 1. Overview Subset sum problem is NP-complete problem which can be solved in

More information

Introduction to FastFlow programming

Introduction to FastFlow programming Introduction to FastFlow programming SPM lecture, November 2016 Massimo Torquati Computer Science Department, University of Pisa - Italy Data Parallel Computations In data parallel

More information

Contention-Aware Scheduling of Parallel Code for Heterogeneous Systems

Contention-Aware Scheduling of Parallel Code for Heterogeneous Systems Contention-Aware Scheduling of Parallel Code for Heterogeneous Systems Chris Gregg Jeff S. Brantley Kim Hazelwood Department of Computer Science, University of Virginia Abstract A typical consumer desktop

More information

A brief introduction to OpenMP

A brief introduction to OpenMP A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

DryadLINQ. by Yuan Yu et al., OSDI 08. Ilias Giechaskiel. January 28, Cambridge University, R212

DryadLINQ. by Yuan Yu et al., OSDI 08. Ilias Giechaskiel. January 28, Cambridge University, R212 DryadLINQ by Yuan Yu et al., OSDI 08 Ilias Giechaskiel Cambridge University, R212 ig305@cam.ac.uk January 28, 2014 Conclusions Takeaway Messages SQL cannot express iteration Unsuitable for machine learning,

More information

A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming

A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming Shigeyuki Sato and Hideya Iwasaki Department of Computer Science The University of Electro-Communications sato@ipl.cs.uec.ac.jp,

More information

Programming with Shared Memory. Nguyễn Quang Hùng

Programming with Shared Memory. Nguyễn Quang Hùng Programming with Shared Memory Nguyễn Quang Hùng Outline Introduction Shared memory multiprocessors Constructs for specifying parallelism Creating concurrent processes Threads Sharing data Creating shared

More information

Multiprocessors 2007/2008

Multiprocessors 2007/2008 Multiprocessors 2007/2008 Abstractions of parallel machines Johan Lukkien 1 Overview Problem context Abstraction Operating system support Language / middleware support 2 Parallel processing Scope: several

More information

Principles of Parallel Algorithm Design: Concurrency and Mapping

Principles of Parallel Algorithm Design: Concurrency and Mapping Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 28 August 2018 Last Thursday Introduction

More information

Black-Box Program Specialization

Black-Box Program Specialization Published in Technical Report 17/99, Department of Software Engineering and Computer Science, University of Karlskrona/Ronneby: Proceedings of WCOP 99 Black-Box Program Specialization Ulrik Pagh Schultz

More information

Formalizing OO Frameworks and Framework Instantiation

Formalizing OO Frameworks and Framework Instantiation Formalizing OO Frameworks and Framework Instantiation Christiano de O. Braga, Marcus Felipe M. C. da Fontoura, Edward H. Hæusler, and Carlos José P. de Lucena Departamento de Informática, Pontifícia Universidade

More information

Management in Distributed Systems: A Semi-formal Approach

Management in Distributed Systems: A Semi-formal Approach Management in Distributed Systems: A Semi-formal Approach Marco Aldinucci 1, Marco Danelutto 1, and Peter Kilpatrick 2 1 Department of Computer Science, University of Pisa {aldinuc,marcod}@di.unipi.it

More information

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Alexey Paznikov Saint Petersburg Electrotechnical University

More information

Skeletor: A DSL for Describing Type-based Specifications of Parallel Skeletons

Skeletor: A DSL for Describing Type-based Specifications of Parallel Skeletons Skeletor: A DSL for Describing Type-based Specifications of Parallel Skeletons David Castro Kevin Hammond School of Computer Science, University of St Andrews, St Andrews, UK. dc84@st-andrews.ac.uk, kh@cs.st-andrews.ac.uk

More information

Ordered Read Write Locks for Multicores and Accelarators

Ordered Read Write Locks for Multicores and Accelarators Ordered Read Write Locks for Multicores and Accelarators INRIA & ICube Strasbourg, France mariem.saied@inria.fr ORWL, Ordered Read-Write Locks An inter-task synchronization model for data-oriented parallel

More information