Towards a Domain-Specific Language for Patterns-Oriented Parallel Programming

Dalvan Griebler, Luiz Gustavo Fernandes
Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS
Programa de Pós-Graduação em Ciência da Computação - PPGCC
Grupo de Modelagem de Aplicações Paralelas - GMAP
Brazilian Symposium on Programming Languages - SBLP
October 2013

Summary

1. Introduction
2. Patterns-Oriented Parallel Programming (POPP)
3. DSL-POPP
   - Compilation Process
   - Programming Interface and Implementation
   - Levels of parallelism
4. Results
   - Implementation Example of the DSL-POPP
   - Tests Scenario
   - Performance of DSL-POPP
5. Conclusions
6. References

Introduction

Skeletons/Patterns ([1], [2], [3])
Programming Interfaces (FastFlow [4], Muesli [5], SkeTo [6], Skandium [7], eSkel [8], P3L [9], Lithium [10], Muskel [11] and Skil [12])

Main goals of DSL-POPP [13]:
- Reduce programming effort without compromising performance
- Patterns-Oriented Parallel Programming
- Abstract the details of pattern implementation
- Offer different levels of parallelism

Paper contributions:
- We propose the POPP model
- We introduce DSL-POPP
- We present a case study based on an image processing algorithm

Patterns-Oriented Parallel Programming (POPP)

[Figure: POPP model. A main routine and subroutines 1..n, each made of code blocks 1..n.]
[Figure: Master/Slave - Pipeline. Master/Slave pattern code blocks and Pipeline pattern code blocks, shown for the main routine and for subroutines 1..n.]
Legend: M, S: Master/Slave (main routine); m, s: master/slave (subroutine); P: pipeline stage (main routine); p: pipeline stage (subroutine).

Hybrid patterns
[Figure: Combination of Patterns. A main routine (pipeline) with stages P1, P2, ..., Pn; subroutine 1 (master/slave), subroutine 2 (master/slave), ..., subroutine n (pipeline).]

DSL-POPP: Compilation Process

[Figure: Compilation process. Elements shown: DSL-POPP source code (for example $PipelinePattern with @Stage(){ ... }); the DSL-POPP precompiler system, comprising syntactic/semantic analysis, a pattern tree, and a source-to-source transformation; the generated code (include pthread.h, include smmpi.h, with calls to SMMPI_send(), SMMPI_recv(), pthread_create(), pthread_join()); the GCC compiler; and the resulting binary code.]
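To make the source-to-source step more concrete, here is a hand-written sketch of the kind of C code a two-stage pipeline could be lowered to. It is not the precompiler's actual output. Only pthread_create() and pthread_join() come from the figure; queue_t, pipeline_t, stage_produce, stage_consume, and run_pipeline are hypothetical names, and a mutex/condition-variable queue stands in for SMMPI_send()/SMMPI_recv(), whose signatures are not given here.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 4          /* hypothetical buffer capacity between stages */
#define END_OF_STREAM (-1)   /* hypothetical marker that closes the stream */

/* Bounded queue connecting two pipeline stages. */
typedef struct {
    int items[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

typedef struct {
    queue_t *q;
    int n_items;   /* number of items emitted by stage 1 */
    long sum;      /* result accumulated by stage 2 */
} pipeline_t;

static void queue_push(queue_t *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)            /* block while the buffer is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int queue_pop(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* block while the buffer is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

/* Stage 1: emits 1..n_items, then the end-of-stream marker. */
static void *stage_produce(void *arg) {
    pipeline_t *p = arg;
    for (int i = 1; i <= p->n_items; i++)
        queue_push(p->q, i);
    queue_push(p->q, END_OF_STREAM);
    return NULL;
}

/* Stage 2: squares each received item and accumulates the total. */
static void *stage_consume(void *arg) {
    pipeline_t *p = arg;
    int v;
    while ((v = queue_pop(p->q)) != END_OF_STREAM)
        p->sum += (long)v * v;
    return NULL;
}

/* Creates one thread per stage, joins both, and returns the result. */
long run_pipeline(int n_items) {
    assert(n_items >= 0);
    queue_t q;
    q.head = q.tail = q.count = 0;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    pipeline_t p = { &q, n_items, 0 };
    pthread_t t1, t2;
    int rc = pthread_create(&t1, NULL, stage_produce, &p);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, stage_consume, &p);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return p.sum;
}
```

Calling run_pipeline(3) streams 1, 2, 3 through the two stages and returns the sum of their squares. The point of the sketch is that the thread creation, synchronization, and joining shown here are what a pattern-level annotation is meant to hide from the programmer.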

DSL-POPP: Programming Interface and Implementation

[Figure: Syntax and logical structure of the DSL-POPP.
(a) Pipeline: a $PipelinePattern routine containing a Pipeline block and Stage blocks; the stage signatures include num_th, void* buffer, and int buf_size. Threads are created, the work of each stage runs on its own thread (thread 0, thread 1, thread 2), and the threads are then joined.
(b) Master/Slave: a $MasterSlavePattern routine containing a Master block and a Slave block; the slave signature includes num_th, void* buffer, int buf_size, and const POPP_LB_Policy. Threads 0..n are created, the work items (Work 0.0 ... Work n.n) run on them, and the threads are then joined.]

Policies for Load Balancing: POPP_LB_STATIC; POPP_LB_DYNAMIC; POPP_LB_COST.
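The slide names the load-balancing policies without defining them. The sketch below assumes the conventional meanings: a static policy fixes one contiguous block per slave before the run, and a dynamic policy lets each slave claim the next unprocessed item from a shared counter. The cost policy is left out because its semantics are not described here. All identifiers (lb_policy, LB_STATIC, LB_DYNAMIC, job_t, master_slave_double) are hypothetical stand-ins, not the DSL-POPP interface.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TH 64   /* hypothetical upper bound on slave threads */

typedef enum { LB_STATIC, LB_DYNAMIC } lb_policy;   /* illustrative only */

typedef struct {
    const int *in;          /* input buffer shared by all slaves */
    int *out;               /* output buffer, one slot per item */
    int n;                  /* number of work items */
    int num_th;             /* number of slave threads */
    lb_policy policy;
    int next;               /* next unclaimed index (dynamic policy) */
    pthread_mutex_t lock;   /* protects next */
} job_t;

typedef struct { job_t *job; int id; } slave_arg;

/* The per-item work; doubling stands in for real computation. */
static void work(job_t *j, int i) { j->out[i] = j->in[i] * 2; }

static void *slave(void *arg) {
    slave_arg *a = arg;
    job_t *j = a->job;
    if (j->policy == LB_STATIC) {
        /* Static: slave id owns the index range [begin, end). */
        int begin = (int)((long)j->n * a->id / j->num_th);
        int end = (int)((long)j->n * (a->id + 1) / j->num_th);
        for (int i = begin; i < end; i++)
            work(j, i);
    } else {
        /* Dynamic: claim one index at a time until none remain. */
        for (;;) {
            pthread_mutex_lock(&j->lock);
            int i = j->next++;
            pthread_mutex_unlock(&j->lock);
            if (i >= j->n)
                break;
            work(j, i);
        }
    }
    return NULL;
}

/* Master: creates the slaves, waits for all of them, then returns. */
void master_slave_double(const int *in, int *out, int n,
                         int num_th, lb_policy policy) {
    assert(n >= 0 && num_th > 0 && num_th <= MAX_TH);
    job_t job;
    job.in = in;
    job.out = out;
    job.n = n;
    job.num_th = num_th;
    job.policy = policy;
    job.next = 0;
    pthread_mutex_init(&job.lock, NULL);

    pthread_t th[MAX_TH];
    slave_arg args[MAX_TH];
    for (int t = 0; t < num_th; t++) {
        args[t].job = &job;
        args[t].id = t;
        int rc = pthread_create(&th[t], NULL, slave, &args[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < num_th; t++)
        pthread_join(th[t], NULL);
    pthread_mutex_destroy(&job.lock);
}
```

Both policies produce the same output; they differ only in how items are assigned. Static assignment has no locking overhead but can leave threads idle when items have uneven cost, which is the case a dynamic policy is usually chosen for.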

DSL-POPP: Levels of parallelism

[Figure: Overview of thread graph in DSL-POPP. Panels: a) Pipeline - Pipeline; b) Pipeline - Master/Slave; c) Master/Slave - Master/Slave; d) Master/Slave - Pipeline. Legend: control threads (master); first level active threads; second level active threads.]

Results: Implementation Example of the DSL-POPP

[Figure: Overview of DSL-POPP Image Processing Algorithm Implementation. Input: a list of 40 images (IM1 ... IM40) with 3000x2550 resolution. Each image passes through three filters: Prewitt, Sobel, and Roberts. In the second view of the figure, each image is split into parts 1..n at each filter.]
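The filter code itself is not shown on the slides. As an illustration of the kind of work each part of a split image involves, the sketch below applies the standard 3x3 Sobel operator to a range of rows of an 8-bit grayscale image. The function name sobel_rows, the row-range parameters, the |gx| + |gy| magnitude approximation, and the choice to write 0 to border pixels are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Applies the 3x3 Sobel operator to rows [row_begin, row_end) of a
 * width x height grayscale image stored row by row. Border pixels are
 * set to 0. Because each call writes only its own rows of dst, disjoint
 * row ranges can be processed independently, one range per slave. */
void sobel_rows(const unsigned char *src, unsigned char *dst,
                int width, int height, int row_begin, int row_end) {
    assert(src && dst && width > 0 && height > 0);
    assert(0 <= row_begin && row_begin <= row_end && row_end <= height);

    for (int y = row_begin; y < row_end; y++) {
        for (int x = 0; x < width; x++) {
            if (y == 0 || y == height - 1 || x == 0 || x == width - 1) {
                dst[y * width + x] = 0;   /* no full 3x3 neighborhood */
                continue;
            }
            const unsigned char *p = src + y * width + x;
            /* Horizontal gradient: kernel [-1 0 1; -2 0 2; -1 0 1]. */
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1] + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            /* Vertical gradient: kernel [-1 -2 -1; 0 0 0; 1 2 1]. */
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width] + p[width + 1];
            int mag = abs(gx) + abs(gy);  /* cheap magnitude approximation */
            dst[y * width + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
    }
}
```

Splitting an image into n parts then amounts to calling sobel_rows once per part with disjoint row ranges. Rows next to a range boundary still read their neighbors from src, which is safe because src is never modified.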

Results: Tests Scenario

Input: a list of 40 images (IM1 ... IM40) with 3000x2550 resolution, filtered with Prewitt, Sobel, and Roberts.

[Figure: Test-1. A Master/Slave block at each of the three filters; each image is split into parts 1..n.]

[Figure: Test-2. A Pipeline over the list of images (3000x2550 resolution); whole images flow through the Prewitt, Sobel, and Roberts stages.]

[Figure: Test-3.1 and Test-3.2. Four Master/Slave blocks with Split (parts 1..n) over the list of images (3000x2550 resolution) and the Prewitt, Sobel, and Roberts filters.]

[Figure: Test-4. A Pipeline over the list of images (3000x2550 resolution) with the Prewitt, Sobel, and Roberts stages; each stage contains a Master/Slave block that splits the image into parts 1..n.]

[Figure: Test-5. A single Master/Slave block with Split (parts 1..n) over the list of images (3000x2550 resolution) and the Prewitt, Sobel, and Roberts filters.]

Results: Performance of DSL-POPP

[Figure: six panels, each labeled with a test. Every panel plots speedup, ideal speedup, and efficiency against the number of threads.]
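The plots report speedup and efficiency, but the formulas are not stated on the slide. The sketch below uses the standard definitions, which is an assumption about what was measured: speedup is sequential time divided by parallel time, efficiency is speedup divided by the number of threads, and ideal speedup equals the number of threads. The function names are hypothetical.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run is than the sequential one. */
double speedup(double t_sequential, double t_parallel) {
    assert(t_sequential > 0.0 && t_parallel > 0.0);
    return t_sequential / t_parallel;
}

/* Ideal (linear) speedup for a given number of threads. */
double ideal_speedup(int num_threads) {
    assert(num_threads > 0);
    return (double)num_threads;
}

/* Efficiency: fraction of the ideal speedup actually achieved (1.0 = ideal). */
double efficiency(double t_sequential, double t_parallel, int num_threads) {
    return speedup(t_sequential, t_parallel) / ideal_speedup(num_threads);
}
```

For instance, a run that takes 12 seconds sequentially and 3 seconds on 8 threads has a speedup of 4 and an efficiency of 0.5. These numbers are purely illustrative and are not measurements from the presentation.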

Conclusions

About this paper:
- DSL-POPP hides low-level parallel programming primitives
- Patterns may be easily nested or combined
- Good performance for the image processing application
- Different parallel implementation tests were performed

Future work:
- Include other parallel patterns
- Investigate optimized techniques for code generation
- Evaluate programming effort

References

[1] Mattson G. T., Sanders A. B., and Massingill L. B. Patterns for Parallel Programming. Addison-Wesley, Boston, USA.
[2] Intel and McCool D. M. Structured Parallel Programming with Deterministic Patterns. In HotPar - 2nd USENIX Workshop on Hot Topics in Parallelism, pages 1-6, Berkeley, CA, June 2010.
[3] Catanzaro R. and Keutzer K. Parallel Computing with Patterns and Frameworks. XRDS: Crossroads, The ACM Magazine for Students, 17(1):22-27, 2010.
[4] Aldinucci M., Danelutto M., Kilpatrick P., and Torquati M. FastFlow: High-Level and Efficient Streaming on Multi-core. In Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 13. Wiley, Boston, USA, 2013.
[5] Ciechanowicz P. and Kuchen H. Enhancing Muesli's Data Parallel Skeletons for Multi-core Computer Architectures. In High Performance Computing and Communications (HPCC), 2010 12th IEEE International Conference on, pages 108-113, Melbourne, Australia, September 2010.
[6] Karasawa Y. and Iwasaki H. A Parallel Skeleton Library for Multi-core Clusters. In Parallel Processing, ICPP '09. International Conference on, pages 84-91, Vienna, Austria, September 2009.
[7] Leyton M. and Piquer J.M. Skandium: Multi-core Programming with Algorithmic Skeletons. In Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on, Pisa, Italy, February 2010.
[8] Benoit A., Cole M., Gilmore S., and Hillston J. Flexible Skeletal Programming with eSkel. In Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Lisboa, Portugal, September.
[9] Bacci B., Danelutto M., Orlando S., Pelagatti S., and Vanneschi M. P3L: A Structured High-Level Parallel Language, and its Structured Support. Concurrency: Practice and Experience, 7(3), 1995.
[10] Aldinucci M., Danelutto M., and Teti P. An Advanced Environment Supporting Structured Parallel Programming in Java. Future Gener. Comput. Syst., 19(5):611-626.
[11] Aldinucci M., Danelutto M., and Kilpatrick P. Skeletons for Multi/Many-core Systems. In Parallel Computing: From Multicores and GPU's to Petascale (Proc. of PARCO 2009, Lyon, France), Lyon, France, September 2009.
[12] Botorog G.H. and Kuchen H. Skil: An Imperative Language with Algorithmic Skeletons for Efficient Distributed Programming. In High Performance Distributed Computing, 1996, Proceedings of 5th IEEE International Symposium on, Syracuse, NY, USA, August 1996.
[13] Griebler D. J. Proposta de uma Linguagem Específica de Domínio de Programação Paralela Orientada a Padrões Paralelos: um Estudo de Caso Baseado no Padrão Mestre/Escravo para Arquiteturas Multi-Core. Master's thesis, PUCRS, 2012.
