Parallelism Inherent in the Wavefront Algorithm. Gavin J. Pringle

1 Parallelism Inherent in the Wavefront Algorithm Gavin J. Pringle

2 The Benchmark code
Particle transport code using the wavefront algorithm
Primarily used for benchmarking
Coded in Fortran 90 and MPI
Scales to thousands of cores for large problems
Over 90% of time in one kernel at the heart of the computation

3 Serial Algorithm Outline
Outer iteration
  Loop over energy groups
    Inner iteration
      Loop over sweeps
        Loop over cells in z direction
          Loop over cells in y direction
            Loop over cells in x direction
              Loop over angles (the only independent loop!)
                work (90% of time spent here)
              End loop over angles
            End loop over cells in x direction
          End loop over cells in y direction
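
A minimal Fortran 90 sketch of this loop nest is given below; all names (n_groups, n_sweeps, nx, ny, nz, nang, work_on_cell) are illustrative placeholders rather than identifiers from the benchmark:

! Minimal serial sketch of the loop nest above.  All names
! (n_groups, n_sweeps, nx, ny, nz, nang, work_on_cell) are
! illustrative placeholders, not identifiers from the benchmark.
program serial_sweep
  implicit none
  integer, parameter :: n_groups = 8, n_sweeps = 8
  integer, parameter :: nx = 16, ny = 16, nz = 16, nang = 8
  integer :: g, s, i, j, k, a

  do g = 1, n_groups                    ! outer iteration: energy groups
    do s = 1, n_sweeps                  ! inner iteration: sweeps
      do k = 1, nz                      ! cells in z direction
        do j = 1, ny                    ! cells in y direction
          do i = 1, nx                  ! cells in x direction
            do a = 1, nang              ! angles: the only independent loop
              call work_on_cell(g, s, i, j, k, a)   ! ~90% of the run time
            end do
          end do
        end do
      end do
    end do
  end do

contains

  subroutine work_on_cell(g, s, i, j, k, a)
    integer, intent(in) :: g, s, i, j, k, a
    ! placeholder for the transport kernel
  end subroutine work_on_cell

end program serial_sweep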

4 Close up of parallelised loops over cells
Loop over cells in z direction
  Possible MPI_Recv communications
  Loop over cells in y direction
    Loop over cells in x direction
      Loop over angles (number of angles too small for MPI)
        work
      End loop over angles
    End loop over cells in x direction
  End loop over cells in y direction
  Possible MPI_Ssend communications
End loop over cells in z direction
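
A hedged sketch of this per-z-plane pattern in Fortran 90 with MPI follows; the neighbour ranks, halo arrays and message sizes are assumptions for illustration, not the benchmark's actual variables:

! Sketch of the per-z-plane pattern: receive upstream data, compute the
! local x-y tile, then send downstream.  Neighbour ranks, halo arrays
! and message sizes are illustrative assumptions only.
subroutine sweep_plane_mpi(from_top, from_right, to_left, to_bottom, &
                           nx, ny, nz, nang, comm)
  use mpi
  implicit none
  integer, intent(in) :: from_top, from_right, to_left, to_bottom
  integer, intent(in) :: nx, ny, nz, nang, comm
  double precision :: halo_x(nx), halo_y(ny)
  integer :: i, j, k, a, ierr, status(MPI_STATUS_SIZE)

  halo_x = 0.0d0
  halo_y = 0.0d0

  do k = 1, nz                               ! loop over cells in z direction
    ! possible MPI_Recv communications from upstream neighbours
    if (from_top /= MPI_PROC_NULL) then
      call MPI_Recv(halo_x, nx, MPI_DOUBLE_PRECISION, from_top, 1, comm, status, ierr)
    end if
    if (from_right /= MPI_PROC_NULL) then
      call MPI_Recv(halo_y, ny, MPI_DOUBLE_PRECISION, from_right, 2, comm, status, ierr)
    end if

    do j = 1, ny                             ! cells in y direction
      do i = 1, nx                           ! cells in x direction
        do a = 1, nang                       ! too few angles to use MPI here
          ! work on cell (i, j, k) for angle a
        end do
      end do
    end do

    ! possible MPI_Ssend communications to downstream neighbours
    if (to_bottom /= MPI_PROC_NULL) then
      call MPI_Ssend(halo_x, nx, MPI_DOUBLE_PRECISION, to_bottom, 1, comm, ierr)
    end if
    if (to_left /= MPI_PROC_NULL) then
      call MPI_Ssend(halo_y, ny, MPI_DOUBLE_PRECISION, to_left, 2, comm, ierr)
    end if
  end do
end subroutine sweep_plane_mpi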

5 MPI decomposition
The MPI decomposition is a 2D decomposition of the front x-y face.
[Figure: the front x-y face divided among 4 MPI tasks.]
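
One plausible way to build such a 2D decomposition and obtain the four neighbour ranks is sketched below; this is a guess at a typical setup, not the benchmark's code. Boundary neighbours come back as MPI_PROC_NULL, so the "possible" sends and receives simply do not occur at the domain edge:

! Possible setup for the 2D decomposition of the x-y face (sketch only).
! With disp = +1 in dimension 0 and -1 in dimension 1, the sweep shown
! here runs from top to bottom and from right to left.
program cart2d
  use mpi
  implicit none
  integer :: ierr, nprocs, comm2d
  integer :: dims(2), from_top, to_bottom, from_right, to_left
  logical :: periods(2)

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  dims = 0
  call MPI_Dims_create(nprocs, 2, dims, ierr)        ! e.g. 4 tasks -> 2x2 grid
  periods = .false.                                  ! non-periodic domain
  call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, .true., comm2d, ierr)

  call MPI_Cart_shift(comm2d, 0,  1, from_top,   to_bottom, ierr)
  call MPI_Cart_shift(comm2d, 1, -1, from_right, to_left,   ierr)

  call MPI_Finalize(ierr)
end program cart2d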

6 Diagram of dependencies
This diagram shows the domain of one MPI task.
A cell cannot be processed until all cells upstream have been processed.
[Figure: MPI data arrives from the top and the right, and is sent to the left and the bottom.]

7 Sweep order: 3D diagonal slices
Cells of the same colour are independent and may be processed in parallel once preceding slices are complete.
[Figure: the same domain coloured by diagonal slice, again with MPI data from the top and right and to the left and bottom.]

8 Slice shapes (6x6x6)
The slices start as increasing triangles, transform into hexagons, then shrink again as decreasing (flipped) triangles.
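
Cells on the same diagonal slice share a constant i + j + k, so a 6x6x6 domain has 3*6 - 2 = 16 slices. The small program below (illustrative only) counts the cells per slice and reproduces the triangle, hexagon, triangle pattern:

! Cells in slice s satisfy i + j + k = s + 2, so a 6x6x6 domain has
! 3*6 - 2 = 16 slices.  The counts grow as triangles (1, 3, 6, 10, 15, 21),
! pass through hexagons (25, 27, 27, 25) and shrink again.
program slice_sizes
  implicit none
  integer, parameter :: n = 6
  integer :: i, j, k, s, ncells

  do s = 1, 3*n - 2
    ncells = 0
    do k = 1, n
      do j = 1, n
        do i = 1, n
          if (i + j + k == s + 2) ncells = ncells + 1
        end do
      end do
    end do
    print '(a, i2, a, i3, a)', 'slice ', s, ': ', ncells, ' independent cells'
  end do
end program slice_sizes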

9-24 Slices 1 to 16
[Figures: one slide per slice, sweeping from slice 1 at the cell nearest the viewer, moving down and away from the viewer, to slice 16 at the point furthest from the viewer.]

25 Close up of parallelised loops over cells using MPI
Loop over cells in z direction
  Possible MPI_Recv communications
  Loop over cells in y direction
    Loop over cells in x direction
      Loop over angles (number of angles too small for MPI)
        work
      End loop over angles
    End loop over cells in x direction
  End loop over cells in y direction
  Possible MPI_Ssend communications
End loop over cells in z direction

26 Close up of parallelised loops over cells using MPI and OpenMP
Loop over slices
  Possible MPI_Recv communications
  OMP PARALLEL DO
  Loop over cells in each slice
    OMP PARALLEL DO
    Loop over angles
      work
    End loop over angles
    OMP END PARALLEL DO
  End loop over cells in each slice
  OMP END PARALLEL DO
  Possible MPI_Ssend communications
End loop over slices
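
A minimal sketch of the threaded inner loops, assuming the cells of the current slice have been gathered into index lists; cell_i, cell_j, cell_k and the work placeholder are illustrative names, not the benchmark's:

! Threaded loop over the cells of one diagonal slice; the cells are
! independent, so the OpenMP parallel do is safe.  Index lists and the
! work placeholder are illustrative assumptions.
subroutine process_slice(cell_i, cell_j, cell_k, ncells, nang)
  implicit none
  integer, intent(in) :: ncells, nang
  integer, intent(in) :: cell_i(ncells), cell_j(ncells), cell_k(ncells)
  integer :: c, a

  !$omp parallel do private(a)
  do c = 1, ncells                 ! cells in this slice are independent
    do a = 1, nang                 ! the angle loop could also be threaded
      ! work on cell (cell_i(c), cell_j(c), cell_k(c)) for angle a
    end do
  end do
  !$omp end parallel do
end subroutine process_slice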

27 Parallel Algorithm Outline
Outer iteration
  Loop over energy groups
    Inner iteration
      Loop over sweeps
        Loop over slices
          Possible MPI_Recv communications
          OMP PARALLEL DO
          Loop over cells in each slice
            OMP PARALLEL DO
            Loop over angles
              work
            End loop over angles
            Etc.

28 Decoupling interdependent energy group calculations
Initially, each energy group calculation used a previous energy group's results as input.
Decoupling the energy groups has two outcomes:
  Execution time is greatly increased
  Energy groups are now independent and can be parallelised
This situation is often seen in HPC:
  Modern algorithms can be inherently serial
  An older version may be parallelisable

29 Task Farm Summary
If all the tasks take the same time to compute
  Block distribution of tasks, or
  Cyclic distribution of tasks ("dealing cards")
Else if the tasks have different execution times
  If the length of each task is unknown in advance
    Cyclic distribution of tasks
  Else
    Order tasks: longest first, shortest last
    Cyclic distribution of tasks
  Endif
Endif
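
As a concrete illustration of the cyclic ("dealing cards") distribution, the sketch below assigns energy groups to MPI tasks in round-robin order; the group count and the print statement are arbitrary assumptions:

! Illustrative cyclic ("dealing cards") distribution of energy groups
! over MPI tasks; the group count is a placeholder.
program cyclic_farm
  use mpi
  implicit none
  integer, parameter :: n_groups = 32
  integer :: rank, nprocs, ierr, g

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  do g = 1, n_groups
    if (mod(g - 1, nprocs) == rank) then
      print '(a, i3, a, i3)', 'rank ', rank, ' computes energy group ', g
    end if
  end do

  call MPI_Finalize(ierr)
end program cyclic_farm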

30 Final Parallel Algorithm Outline
Outer iteration
  MPI task farm of energy groups
    Inner iteration
      Loop over sweeps
        Loop over slices
          Possible MPI_Recv communications
          OMP PARALLEL DO
          Loop over cells in each slice
            OMP PARALLEL DO
            Loop over angles
              work
            End loop over angles
            Etc.

31 Conclusion
Other wavefront codes have the loops in a different order:
  The loop over energy groups can occur within the loops over cells, and might then be parallelised with OpenMP
  It must first be decoupled

32 Thank you Any questions?
