Parallelism Inherent in the Wavefront Algorithm. Gavin J. Pringle
|
|
- Christopher Chambers
- 5 years ago
- Views:
Transcription
1 Parallelism Inherent in the Wavefront Algorithm Gavin J. Pringle
2 The Benchmark code Particle transport code using wavefront algorithm Primarily used for benchmarking Coded in Fortran 90 and MPI Scales to thousands of cores for large problems Over 90% of time in one kernel at the heart of the computation 2
3 Serial Algorithm Outline Outer iteration Loop over energy groups Inner iteration Loop over sweeps Loop over cells in z direction Loop over cells in y direction Loop over cells in x direction Loop over angles (only independent loop!) work (90% of time spent here) End loop over angles End loop over cells in x direction End loop over cells in y direction
4 Close up of parallelised loops over cells Loop over cells in z direction Possible MPI_Recv communications Loop over cells in y direction Loop over cells in x direction Loop over angles (number of angles too small for MPI) work End loop over angles End loop over cells in x direction End loop over cells in y direction Possible MPI_Ssend communcations End loop over cells in z direction
5 MPI 2D decomposition is 2D decomposition of front x-y face. Figure shows 4 MPI tasks k l j 5
6 Diagram of dependicies This diagram shows the domain of one MPI task MPI data FromTop A cell cannot be processed until all cells upstream have been processed. MPI data ToLeft MPI data FromRight 6 MPI data ToBottom
7 Sweep order: 3D diagonal slices MPI data FromTop MPI data ToLeft MPI data FromRight Cells of the same colour are independent and may be processed in parallel once preceding slices are complete. 7 MPI data ToBottom
8 Slice shapes (6x6x6) Increasing triangles Then transforming Hexagons Then decreasing (flipped) triangles 8
9 Slice 1 Cell nearest the viewer 9
10 Slice 2 Moving down away from viewer 10
11 11 Slice 3
12 12 Slice 4
13 13 Slice 5
14 14 Slice 6
15 15 Slice 7
16 16 Slice 8
17 17 Slice 9
18 18 Slice 10
19 19 Slice 11
20 20 Slice 12
21 21 Slice 13
22 22 Slice 14
23 23 Slice 15
24 Slice 16 Point furthest from viewer 24
25 Close up of parallelised loops over cells using MPI Loop over cells in z direction Possible MPI_Recv communications Loop over cells in y direction Loop over cells in x direction Loop over angles (number of angles too small for MPI) work End loop over angles End loop over cells in x direction End loop over cells in y direction Possible MPI_Ssend communcations End loop over cells in z direction
26 Loop over slices Possible MPI_Recv communications OMP DO PARALLEL Loop over cells in each slice OMP DO PARALLEL Loop over angles work End loop over angles OMP END DO PARALLEL End Loop over cells in each slice OMP END DO PARALLEL Possible MPI_Ssend communcations End loop over slices Close up of parallelised loops over cells using MPI and OpenMP
27 Parallel Algorithm Outline Outer iteration Loop over energy groups Inner iteration Loop over sweeps Loop over slices Possible MPI_Recv communications OMP DO PARALLEL Loop over cells in each slice OMP DO PARALLEL Loop over angles work End loop over angles Etc
28 Decoupling inter-dependant energy group calculations Initially, each energy group calculation used a previous energy groups results as input Decoupling the energy groups has two outcomes Execution time is greatly increased Energy Groups are now independent and can be parallelised Often seen in HPC Modern algorithms can be inherently serial An older version may be parallelisable
29 TaskFarm Summary If all the tasks take the same time to compute Block distribution of tasks Cyclic distribution of tasks dealing cards else if all tasks have different execution times If length of tasks are unknown in advance else endif Endif Cyclic distribution of tasks Order tasks: longest first, shortest last Cyclic distribution of tasks
30 Final Parallel Algorithm Outline Outer iteration MPI Task Farm of energy groups Inner iteration Loop over sweeps Loop over slices Possible MPI_Recv communications OMP DO PARALLEL Loop over cells in each slice OMP DO PARALLEL Loop over angles work End loop over angles Etc
31 Other wavefront codes have the loops in a different order Conclusion Loop over energy groups can occur within loops over cells and might be parallelised with OpenMP Must be decoupled
32 Thank you Any questions?
HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA
HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh
More informationParallel Programming with MPI and OpenMP
Parallel Programming with MPI and OpenMP Michael J. Quinn Chapter 6 Floyd s Algorithm Chapter Objectives Creating 2-D arrays Thinking about grain size Introducing point-to-point communications Reading
More informationSweep3D analysis. Jesús Labarta, Judit Gimenez CEPBA-UPC
Sweep3D analysis Jesús Labarta, Judit Gimenez CEPBA-UPC Objective & index Objective: Describe the analysis and improvements in the Sweep3D code using Paraver Compare MPI and OpenMP versions Index The algorithm
More informationMPI Programming. Henrik R. Nagel Scientific Computing IT Division
1 MPI Programming Henrik R. Nagel Scientific Computing IT Division 2 Outline Introduction Finite Difference Method Finite Element Method LU Factorization SOR Method Monte Carlo Method Molecular Dynamics
More informationPerformance Study of the MPI and MPI-CH Communication Libraries on the IBM SP
Performance Study of the MPI and MPI-CH Communication Libraries on the IBM SP Ewa Deelman and Rajive Bagrodia UCLA Computer Science Department deelman@cs.ucla.edu, rajive@cs.ucla.edu http://pcl.cs.ucla.edu
More informationOptimising the Mantevo benchmark suite for multi- and many-core architectures
Optimising the Mantevo benchmark suite for multi- and many-core architectures Simon McIntosh-Smith Department of Computer Science University of Bristol 1 Bristol's rich heritage in HPC The University of
More informationParallel I/O. and split communicators. David Henty, Fiona Ried, Gavin J. Pringle
Parallel I/O and split communicators David Henty, Fiona Ried, Gavin J. Pringle Dr Gavin J. Pringle Applications Consultant gavin@epcc.ed.ac.uk +44 131 650 6709 4x4 array on 2x2 Process Grid Parallel IO
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationLooPo: Automatic Loop Parallelization
LooPo: Automatic Loop Parallelization Michael Claßen Fakultät für Informatik und Mathematik Düsseldorf, November 27 th 2008 Model-Based Loop Transformations model-based approach: map source code to an
More informationProblem Solving with Python Challenges 2 Scratch to Python
Problem Solving with Python Challenges 2 Scratch to Python Contents 1 Drawing a triangle... 1 2 Generalising our program to draw regular polygons in Scratch and Python... 2 2.1 Drawing a square... 2 2.2
More informationCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 18. Combining MPI and OpenMP
Chapter 18 Combining MPI and OpenMP Outline Advantages of using both MPI and OpenMP Case Study: Conjugate gradient method Case Study: Jacobi method C+MPI vs. C+MPI+OpenMP Interconnection Network P P P
More informationAll-Pairs Shortest Paths - Floyd s Algorithm
All-Pairs Shortest Paths - Floyd s Algorithm Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 31, 2011 CPD (DEI / IST) Parallel
More informationSection 7.2 Volume: The Disk Method
Section 7. Volume: The Disk Method White Board Challenge Find the volume of the following cylinder: No Calculator 6 ft 1 ft V 3 1 108 339.9 ft 3 White Board Challenge Calculate the volume V of the solid
More informationLaplace Exercise Solution Review
Laplace Exercise Solution Review John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2017 Finished? If you have finished, we can review a few principles that you have inevitably
More informationGenerating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory Roshan Dathathri Thejas Ramashekar Chandan Reddy Uday Bondhugula Department of Computer Science and Automation
More informationComputational Geometry
Lecture 12: Lecture 12: Motivation: Terrains by interpolation To build a model of the terrain surface, we can start with a number of sample points where we know the height. Lecture 12: Motivation: Terrains
More informationDistributed Memory Programming With MPI (4)
Distributed Memory Programming With MPI (4) 2014 Spring Jinkyu Jeong (jinkyu@skku.edu) 1 Roadmap Hello World in MPI program Basic APIs of MPI Example program The Trapezoidal Rule in MPI. Collective communication.
More informationFractals exercise. Investigating task farms and load imbalance
Fractals exercise Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationParallelizing Loops. Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna.
Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ Copyright 2017, 2018 Moreno Marzolla, Università di Bologna, Italy (http://www.moreno.marzolla.name/teaching/hpc/)
More informationOptimisation of LESsCOAL for largescale high-fidelity simulation of coal pyrolysis and combustion
Optimisation of LESsCOAL for largescale high-fidelity simulation of coal pyrolysis and combustion Kaidi Wan 1, Jun Xia 2, Neelofer Banglawala 3, Zhihua Wang 1, Kefa Cen 1 1. Zhejiang University, Hangzhou,
More informationCommunication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures
Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures Rolf Rabenseifner rabenseifner@hlrs.de Gerhard Wellein gerhard.wellein@rrze.uni-erlangen.de University of Stuttgart
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter May 20-22, 2013, IHPC 2013, Iowa City 2,500 BC: Military Invents Parallelism Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter June 7, 2012, IHPC 2012, Iowa City Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to parallelise an existing code 4. Advanced
More informationParallel Programming Libraries and implementations
Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More information1ACE Exercise 6. Name Date Class. 6. a. Draw ALL the lines of symmetry on Shape 1 and Shape 2 below. HINT Refer back to Problem 1.
1ACE Exercise 6 Investigation 1 6. a. Draw ALL the lines of symmetry on Shape 1 and Shape 2 below. HINT Refer to Problem 1.2 for an explanation of lines of symmetry. Shape 1 Shape 2 b. Do these shapes
More informationIntro to Parallel Computing
Outline Intro to Parallel Computing Remi Lehe Lawrence Berkeley National Laboratory Modern parallel architectures Parallelization between nodes: MPI Parallelization within one node: OpenMP Why use parallel
More informationMatrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2013 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
More informationLecture 11 Combinatorial Planning: In the Plane
CS 460/560 Introduction to Computational Robotics Fall 2017, Rutgers University Lecture 11 Combinatorial Planning: In the Plane Instructor: Jingjin Yu Outline Convex shapes, revisited Combinatorial planning
More informationOptimised all-to-all communication on multicore architectures applied to FFTs with pencil decomposition
Optimised all-to-all communication on multicore architectures applied to FFTs with pencil decomposition CUG 2018, Stockholm Andreas Jocksch, Matthias Kraushaar (CSCS), David Daverio (University of Cambridge,
More informationOverpartioning with the Rice dhpf Compiler
Overpartioning with the Rice dhpf Compiler Strategies for Achieving High Performance in High Performance Fortran Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/hug00overpartioning.pdf
More informationPerformance Analysis and Optimization of a Deterministic Radiation Transport Code on the Cray SV1
Performance Analysis and Optimization of a Deterministic Radiation Transport Code on the Cray SV1 Peter Cebull Advisory Engineer May 20, 2004 Outline Background Description of Attila Initial analysis and
More informationMatrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
More informationecse08-10: Optimal parallelisation in CASTEP
ecse08-10: Optimal parallelisation in CASTEP Arjen, Tamerus University of Cambridge at748@cam.ac.uk Phil, Hasnip University of York phil.hasnip@york.ac.uk July 31, 2017 Abstract We describe an improved
More information2011 James S. Rickards Fall Invitational Geometry Team Round QUESTION 1
QUESTION 1 In the diagram above, 1 and 5 are supplementary and 2 = 6. If 1 = 34 and 2 = 55, find 3 + 4 + 5 + 6. QUESTION 2 A = The sum of the degrees of the interior angles of a regular pentagon B = The
More informationUsing Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh
Using Java for Scientific Computing Mark Bul EPCC, University of Edinburgh markb@epcc.ed.ac.uk Java and Scientific Computing? Benefits of Java for Scientific Computing Portability Network centricity Software
More informationCOMP528: Multi-core and Multi-Processor Computing
COMP528: Multi-core and Multi-Processor Computing Dr Michael K Bane, G14, Computer Science, University of Liverpool m.k.bane@liverpool.ac.uk https://cgi.csc.liv.ac.uk/~mkbane/comp528 2X So far Why and
More informationDr. Ilia Bermous, the Australian Bureau of Meteorology. Acknowledgements to Dr. Martyn Corden (Intel), Dr. Zhang Zhang (Intel), Dr. Martin Dix (CSIRO)
Performance, accuracy and bit-reproducibility aspects in handling transcendental functions with Cray and Intel compilers for the Met Office Unified Model Dr. Ilia Bermous, the Australian Bureau of Meteorology
More informationPoint-to-Point Synchronisation on Shared Memory Architectures
Point-to-Point Synchronisation on Shared Memory Architectures J. Mark Bull and Carwyn Ball EPCC, The King s Buildings, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, Scotland, U.K. email:
More informationLearning outcomes. COMPSCI 101 Principles of Programming. Drawing 2D shapes using Characters. Printing a Row of characters
Learning outcomes At the end of this lecture, students should be able to draw 2D shapes using characters draw 2D shapes on a Canvas COMPSCI 101 Principles of Programming Lecture 25 - Using the Canvas widget
More informationParallelization of DQMC Simulations for Strongly Correlated Electron Systems
Parallelization of DQMC Simulations for Strongly Correlated Electron Systems Che-Rung Lee Dept. of Computer Science National Tsing-Hua University Taiwan joint work with I-Hsin Chung (IBM Research), Zhaojun
More informationParallel Programming. Libraries and Implementations
Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More information2018 AIME I Problems
2018 AIME I Problems Problem 1 Let be the number of ordered pairs of integers, with and such that the polynomial x + x + can be factored into the product of two (not necessarily distinct) linear factors
More informationParticle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA
Particle-in-Cell Simulations on Modern Computing Platforms Viktor K. Decyk and Tajendra V. Singh UCLA Outline of Presentation Abstraction of future computer hardware PIC on GPUs OpenCL and Cuda Fortran
More informationMPI Casestudy: Parallel Image Processing
MPI Casestudy: Parallel Image Processing David Henty 1 Introduction The aim of this exercise is to write a complete MPI parallel program that does a very basic form of image processing. We will start by
More information=1, and F n. for all n " 2. The recursive definition of Fibonacci numbers immediately gives us a recursive algorithm for computing them:
Wavefront Pattern I. Problem Data elements are laid out as multidimensional grids representing a logical plane or space. The dependency between the elements, often formulated by dynamic programming, results
More informationUsing Co-Array Fortran to Enhance the Scalability of the EPIC Code
Using Co-Array Fortran to Enhance the Scalability of the EPIC Code Jef Dawson Army High Performance Computing Research Center, Network Computing Services, Inc. ABSTRACT: Supercomputing users continually
More informationPython for Development of OpenMP and CUDA Kernels for Multidimensional Data
Python for Development of OpenMP and CUDA Kernels for Multidimensional Data Zane W. Bell 1, Greg G. Davidson 2, Ed D Azevedo 3, Thomas M. Evans 2, Wayne Joubert 4, John K. Munro, Jr. 5, Dilip R. Patlolla
More informationSNAP Performance Benchmark and Profiling. April 2014
SNAP Performance Benchmark and Profiling April 2014 Note The following research was performed under the HPC Advisory Council activities Participating vendors: HP, Mellanox For more information on the supporting
More informationKey Learning for Grade 3
Key Learning for Grade 3 The Ontario Curriculum: Mathematics (2005) Number Sense and Numeration Read, represent, compare and order whole numbers to 1000, and use concrete materials to investigate fractions
More informationFractals. Investigating task farms and load imbalance
Fractals Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationThe Power of Streams on the SRC MAP. Wim Bohm Colorado State University. RSS!2006 Copyright 2006 SRC Computers, Inc. ALL RIGHTS RESERVED.
The Power of Streams on the SRC MAP Wim Bohm Colorado State University RSS!2006 Copyright 2006 SRC Computers, Inc. ALL RIGHTS RSRV. MAP C Pure C runs on the MAP Generated code: circuits Basic blocks in
More informationDirective-based Programming for Highly-scalable Nodes
Directive-based Programming for Highly-scalable Nodes Doug Miles Michael Wolfe PGI Compilers & Tools NVIDIA Cray User Group Meeting May 2016 Talk Outline Increasingly Parallel Nodes Exposing Parallelism
More informationESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report
ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new
More informationA4. Intro to Parallel Computing
Self-Consistent Simulations of Beam and Plasma Systems Steven M. Lund, Jean-Luc Vay, Rémi Lehe and Daniel Winklehner Colorado State U., Ft. Collins, CO, 13-17 June, 2016 A4. Intro to Parallel Computing
More information4-8 Notes. Warm-Up. Discovering the Rule for Finding the Total Degrees in Polygons. Triangles
Date: Learning Goals: How do you find the measure of the sum of interior and exterior angles in a polygon? How do you find the measures of the angles in a regular polygon? Warm-Up 1. What is similar about
More informationMochalskyy Serhiy 1, Dirk Wünderlich 1, Benjamin Ruf 1, Peter Franzen 1 Ursel Fanz 1 and Tiberiu Minea 2
3D Numerical Simulations of the Negative Ion Extraction Using Realistic Plasma Parameters, Geometry of the Extraction Aperture and Full 3D Magnetic Field Map 1, Dirk Wünderlich 1, Benjamin Ruf 1, Peter
More informationMeasuring Triangles. 1 cm 2. 1 cm. 1 cm
3 Measuring Triangles You can find the area of a figure by drawing it on a grid (or covering it with a transparent grid) and counting squares, but this can be very time consuming. In Investigation 1, you
More informationLesson 18: Slicing on an Angle
Student Outcomes Students describe polygonal regions that result from slicing a right rectangular prism or pyramid by a plane that is not necessarily parallel or perpendicular to a base. Lesson Notes In
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for
More informationCFD exercise. Regular domain decomposition
CFD exercise Regular domain decomposition Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationPLC Papers Created For:
PLC Papers Created For: Year 10 Topic Practice Papers: Polygons Polygons 1 Grade 4 Look at the shapes below A B C Shape A, B and C are polygons Write down the mathematical name for each of the polygons
More informationUnderstanding Dynamic Parallelism
Understanding Dynamic Parallelism Know your code and know yourself Presenter: Mark O Connor, VP Product Management Agenda Introduction and Background Fixing a Dynamic Parallelism Bug Understanding Dynamic
More informationIntroduction to Parallel Computing
Introduction to Parallel Computing Introduction to Parallel Computing with MPI and OpenMP P. Ramieri Segrate, November 2016 Course agenda Tuesday, 22 November 2016 9.30-11.00 01 - Introduction to parallel
More informationParallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially
More informationSTUDYING OPENMP WITH VAMPIR
STUDYING OPENMP WITH VAMPIR Case Studies Sparse Matrix Vector Multiplication Load Imbalances November 15, 2017 Studying OpenMP with Vampir 2 Sparse Matrix Vector Multiplication y 1 a 11 a n1 x 1 = y m
More informationParallel Programming Patterns Overview and Concepts
Parallel Programming Patterns Overview and Concepts Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationLaplace Exercise Solution Review
Laplace Exercise Solution Review John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2018 Finished? If you have finished, we can review a few principles that you have inevitably
More informationPERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015
PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability
More informationAn Empirical Study of Computation-Intensive Loops for Identifying and Classifying Loop Kernels
An Empirical Study of Computation-Intensive Loops for Identifying and Classifying Loop Kernels Masatomo Hashimoto Masaaki Terai Toshiyuki Maeda Kazuo Minami 26/04/2017 ICPE2017 1 Agenda Performance engineering
More informationExperiences with Achieving Portability across Heterogeneous Architectures
Experiences with Achieving Portability across Heterogeneous Architectures Lukasz G. Szafaryn +, Todd Gamblin ++, Bronis R. de Supinski ++ and Kevin Skadron + + University of Virginia ++ Lawrence Livermore
More informationWhatcom County Math Championship 2018 Geometry -4 th Grade
Whatcom County Math Championship 2018 Geometry -4 th Grade 1. How many diagonals does a hexagon have? 2. At 7:00 pm, what is the degree measure of the angle formed between the minute and the hour hand?
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationDynamic Loop Parallelisation
Dynamic Loop Parallelisation Adrian Jackson and Orestis Agathokleous EPCC, The University of Edinburgh, Kings Buildings, Mayfield Road, Edinburgh, EH9 3JZ, UK arxiv:1205.2367v1 [cs.pl] 10 May 2012 Abstract
More informationMPI Profile (mpip) on IRS BenchMark Application
MPI Profile (mpip) on IRS BenchMark Application MPI with ZRAD8 -np=2 -np=6 -np=1 Call App% MPI% App% MPI% App% MPI% App% MPI% App% MPI% any.62 3.94.79 28.22 26.23 73. 1.72 1.67 1.46 6.28.12 1.1.24 8.44
More informationBring your application to a new era:
Bring your application to a new era: learning by example how to parallelize and optimize for Intel Xeon processor and Intel Xeon Phi TM coprocessor Manel Fernández, Roger Philp, Richard Paul Bayncore Ltd.
More informationEFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI
EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI 1 Akshay N. Panajwar, 2 Prof.M.A.Shah Department of Computer Science and Engineering, Walchand College of Engineering,
More informationParallel Query Optimisation
Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation
More informationIntroduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/
Introduction to MPI Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Distributed Memory Programming (MPI) Message Passing Model Initializing and terminating programs Point to point
More informationMixed Mode MPI / OpenMP Programming
Mixed Mode MPI / OpenMP Programming L.A. Smith Edinburgh Parallel Computing Centre, Edinburgh, EH9 3JZ 1 Introduction Shared memory architectures are gradually becoming more prominent in the HPC market,
More informationWallpaper Groups and Statistical Geometry
Wallpaper Groups and Statistical Geometry John Shier Abstract. The arrangement of regions filled by statistical geometry into arrays with specific symmetry properties is studied for the space groups p2mm,
More informationIntroducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method
Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System
More informationScientific Programming, Analysis, and Visualization with Python. Mteor 227 Fall 2017
Scientific Programming, Analysis, and Visualization with Python Mteor 227 Fall 2017 Python The Big Picture Interpreted General purpose, high-level Dynamically type Multi-paradigm Object-oriented Functional
More informationParallel Programming. Marc Snir U. of Illinois at Urbana-Champaign & Argonne National Lab
Parallel Programming Marc Snir U. of Illinois at Urbana-Champaign & Argonne National Lab Summing n numbers for(i=1; i++; i
More informationVocabulary: Looking For Pythagoras
Vocabulary: Looking For Pythagoras Concept Finding areas of squares and other figures by subdividing or enclosing: These strategies for finding areas were developed in Covering and Surrounding. Students
More informationOpenMP on the FDSM software distributed shared memory. Hiroya Matsuba Yutaka Ishikawa
OpenMP on the FDSM software distributed shared memory Hiroya Matsuba Yutaka Ishikawa 1 2 Software DSM OpenMP programs usually run on the shared memory computers OpenMP programs work on the distributed
More informationApproximate Nearest Neighbor Problem: Improving Query Time CS468, 10/9/2006
Approximate Nearest Neighbor Problem: Improving Query Time CS468, 10/9/2006 Outline Reducing the constant from O ( ɛ d) to O ( ɛ (d 1)/2) in uery time Need to know ɛ ahead of time Preprocessing time and
More informationSimilar Triangles. Triangles are similar if corresponding angles are equal. Now take a look at the sides of the two triangles shown in Figure 3.
Imagine you are having a picnic at the lake. You look out across the lake and see a sailboat off in the distance the sail is a tiny triangle to the naked eye. For a closer look, you pick up the binoculars
More informationSupercomputing in Plain English Exercise #6: MPI Point to Point
Supercomputing in Plain English Exercise #6: MPI Point to Point In this exercise, we ll use the same conventions and commands as in Exercises #1, #2, #3, #4 and #5. You should refer back to the Exercise
More informationActivity 1 Look at the pattern on the number line and find the missing numbers. Model. (b) (c) (a) (b) (c) (d)
Lesson Look at the pattern on the number line and find the missing numbers. Model (a) (b) (c) 9 Answers: (a) (b) (c) (a) (b) (c) (a) (b) (c) (a) (b) (c) 00 00. (a) (b) 00. (c) 0 0 (a) (b) (c) Use the number
More informationAn innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ.
An innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ. Of Pisa Italy 29/02/2012, Nuremberg, Germany ARTEMIS ARTEMIS Joint Joint Undertaking
More informationFebruary Regional Geometry Team: Question #1
February Regional Geometry Team: Question #1 A = area of an equilateral triangle with a side length of 4. B = area of a square with a side length of 3. C = area of a regular hexagon with a side length
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions
More informationLESSON SUMMARY. Properties of Shapes
LESSON SUMMARY CXC CSEC MATHEMATICS UNIT Seven: Geometry Lesson 13 Properties of Shapes Textbook: Mathematics, A Complete Course by Raymond Toolsie, Volume 1 and 2. (Some helpful exercises and page numbers
More informationDetection and Analysis of Iterative Behavior in Parallel Applications
Detection and Analysis of Iterative Behavior in Parallel Applications Karl Fürlinger and Shirley Moore Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University
More informationOpenMP for next generation heterogeneous clusters
OpenMP for next generation heterogeneous clusters Jens Breitbart Research Group Programming Languages / Methodologies, Universität Kassel, jbreitbart@uni-kassel.de Abstract The last years have seen great
More informationTransformation, tessellation and symmetry line symmetry
Transformation, tessellation and symmetry line symmetry Reflective or line symmetry describes mirror image, when one half of a shape or picture matches the other exactly. The middle line that divides the
More informationVerification of the Hexagonal Ray Tracing Module and the CMFD Acceleration in ntracer
KNS 2017 Autumn Gyeongju Verification of the Hexagonal Ray Tracing Module and the CMFD Acceleration in ntracer October 27, 2017 Seongchan Kim, Changhyun Lim, Young Suk Ban and Han Gyu Joo * Reactor Physics
More informationUNIT 1 SIMILARITY, CONGRUENCE, AND PROOFS Lesson 4: Exploring Congruence Instruction
Prerequisite Skills This lesson requires the use of the following skills: constructing perpendicular bisectors copying a segment copying an angle Introduction Think about trying to move a drop of water
More informationScalable Critical Path Analysis for Hybrid MPI-CUDA Applications
Center for Information Services and High Performance Computing (ZIH) Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications The Fourth International Workshop on Accelerators and Hybrid Exascale
More information