Fractals exercise. Investigating task farms and load imbalance

1 Fractals exercise Investigating task farms and load imbalance

2 Reusing this material. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means you are free to copy and redistribute the material and adapt and build on the material under the following terms: You must give appropriate credit, provide a link to the license and indicate if changes were made. If you adapt or build on the material you must distribute your work under the same license as the original. Note that this presentation contains images owned by others. Please seek their permission before reusing these images.

3 Aims. Explore how the granularity of tasks affects performance: there is a trade-off between the amount of parallelism (number of parallel tasks) and the amount of communication (size of tasks). Consider issues surrounding load balance: remember that the runtime of the code is determined by the slowest-running task, so we want work to be as evenly distributed as possible. The exercise introduces a Load Imbalance Factor (LIF), which indicates how much faster your code could run if the load were evenly distributed.

4 What are fractals? Ideas behind the Mandelbrot and Julia sets

5 The Mandelbrot Set. The Mandelbrot set is defined through repeated iteration of the complex function (with $i = \sqrt{-1}$) $Z_n = Z_{n-1}^2 + C$, with the initial condition $Z_0 = 0$. The number $C = x_0 + i y_0$ belongs to the Mandelbrot set if $|Z_n|$ remains bounded, i.e. does not diverge. Writing $Z_n = x_n + i y_n$ gives $Z_n^2 = x_n^2 - y_n^2 + 2 i x_n y_n$ and $|Z_n|^2 = x_n^2 + y_n^2$.

6 The Mandelbrot Set cont. Separating out the real and imaginary parts, $Z_n = Z_n^r + i Z_n^i$, gives: $Z_n^r = x_{n-1}^2 - y_{n-1}^2 + x_0$ and $Z_n^i = 2 x_{n-1} y_{n-1} + y_0$. Take the threshold value as $|Z|^2 \ge 4.0$. Set the maximum number of iterations to $N_{max}$, and assume that a point which has not exceeded the threshold after $N_{max}$ iterations does not diverge at higher iteration counts.
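The real and imaginary update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the supplied exercise code: the function name `mandelbrot_iterations`, the default `n_max=255`, and the convention of testing the threshold before each update are all assumptions made for this example.

```python
def mandelbrot_iterations(x0, y0, n_max=255, threshold=4.0):
    """Count iterations of Z_n = Z_{n-1}^2 + C, with C = x0 + i*y0 and Z_0 = 0.

    Returns the first n at which |Z_n|^2 >= threshold, or n_max if the
    threshold is never reached (the point is then assumed not to diverge).
    The counting convention is an assumption made for this sketch.
    """
    x, y = 0.0, 0.0  # Z_0 = 0
    for n in range(n_max):
        if x * x + y * y >= threshold:
            return n
        # Real part: x^2 - y^2 + x0. Imaginary part: 2xy + y0.
        # The tuple assignment uses the previous x and y on the right-hand side.
        x, y = x * x - y * y + x0, 2.0 * x * y + y0
    return n_max
```

For example, `mandelbrot_iterations(0.0, 0.0)` returns `n_max` because Z stays at zero, while a point far outside the set such as `(2.0, 0.0)` reaches the threshold after a single update.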

7 The Julia Set. Similar algorithm to the Mandelbrot set. Recall that for the Mandelbrot set: $Z_n = Z_{n-1}^2 + C$, $Z_0 = 0$, $C = x_0 + i y_0$. There are an infinite number of Julia sets, parameterised by a complex number $C$: $Z_n = Z_{n-1}^2 + C$, $Z_0 = x_0 + i y_0$; for example, C = i 0.156
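The difference from the Mandelbrot iteration is only in which quantity varies across the grid: here the grid point is the starting value and C is held fixed. A minimal sketch, assuming the same threshold of 4.0 and using Python's built-in complex type; the function name `julia_iterations` and the sample value of `c` in the usage note are hypothetical choices for illustration.

```python
def julia_iterations(x0, y0, c, n_max=255, threshold=4.0):
    """Count iterations of Z_n = Z_{n-1}^2 + C starting from Z_0 = x0 + i*y0.

    c is the fixed complex parameter that selects one particular Julia set.
    Returns the first n at which |Z_n|^2 >= threshold, or n_max otherwise.
    """
    z = complex(x0, y0)  # the grid point is the initial condition
    for n in range(n_max):
        if z.real * z.real + z.imag * z.imag >= threshold:
            return n
        z = z * z + c
    return n_max
```

A call such as `julia_iterations(0.1, 0.2, complex(-0.4, 0.6))` evaluates one grid point; the parameter value there is arbitrary and only shows the calling convention.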

8 Visualisation. To visualise a Mandelbrot/Julia set: represent the complex plane as a 2D grid, where complex numbers correspond to points (x, y) on the grid; calculate the number of iterations N for the series to diverge (exceed the threshold) for each point on the grid; if it does not diverge, set N = $N_{max}$; convert the value of N to a colour and plot this on the grid.
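The visualisation steps can be sketched as follows for the Mandelbrot case. This is an illustrative sketch under stated assumptions: the plotted region, the grid resolution, the function names, and the use of text characters in place of colours are all choices made here, not details of the supplied code.

```python
def iteration_grid(nx, ny, x_min=-2.0, x_max=1.0, y_min=-1.5, y_max=1.5, n_max=50):
    """Return a ny-by-nx list of iteration counts, one per grid point.

    Each grid cell is sampled at its centre. The region bounds are
    assumptions chosen to frame the Mandelbrot set.
    """
    grid = []
    for j in range(ny):
        y0 = y_min + (j + 0.5) * (y_max - y_min) / ny
        row = []
        for i in range(nx):
            x0 = x_min + (i + 0.5) * (x_max - x_min) / nx
            c = complex(x0, y0)
            z = 0j
            n = 0
            # Iterate until the threshold |Z|^2 >= 4.0 is met or n_max is reached.
            while n < n_max and (z.real * z.real + z.imag * z.imag) < 4.0:
                z = z * z + c
                n += 1
            row.append(n)
        grid.append(row)
    return grid


def to_ascii(grid, n_max=50, palette=" .:-=+*#%@"):
    """Map each count N to a character (a stand-in for a colour map)."""
    lines = []
    for row in reversed(grid):  # emit the largest y first so the image is upright
        lines.append("".join(palette[(n * (len(palette) - 1)) // n_max] for n in row))
    return "\n".join(lines)
```

Calling `print(to_ascii(iteration_grid(60, 24)))` draws a rough text picture; points that never exceed the threshold get the last palette character. Pass the same `n_max` to both functions so the palette index stays in range.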

9 Mandelbrot Set. [Figure: the Mandelbrot set, with regions labelled "Very slow to compute" and "Very quick to compute".]

10 Parallel implementation How do we parallelise computation of these fractals?

11 Parallelisation. The values at each coordinate depend only on the previous values at that coordinate, so we can decompose the 2D grid into equally sized blocks, with no communication between blocks needed. We don't know in advance how much work is needed, because the number of iterations varies across the blocks, so work is dynamically assigned to workers as they become available. Implementation: split the grid into blocks, where each block corresponds to a task; the master process hands out tasks to worker processes; workers return completed tasks to the master.
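The effect of dynamic assignment can be explored with a small simulation before running anything in parallel. The sketch below is an assumption-laden model, not the supplied task farm: it treats each task's cost as a known number (for example its iteration count), ignores communication time, and hands the next task to whichever worker becomes free first. The name `simulate_task_farm` is hypothetical.

```python
import heapq


def simulate_task_farm(task_costs, n_workers):
    """Simulate a master handing out tasks, in order, to the first idle worker.

    task_costs: cost of each task in hand-out order (e.g. iteration counts).
    Returns the total load accumulated by each worker.
    """
    # Priority queue of (time at which the worker becomes idle, worker id).
    idle_at = [(0, w) for w in range(n_workers)]
    heapq.heapify(idle_at)
    loads = [0] * n_workers
    for cost in task_costs:
        t, w = heapq.heappop(idle_at)  # the worker that is free earliest
        loads[w] += cost
        heapq.heappush(idle_at, (t + cost, w))
    return loads
```

With costs `[5, 1, 1, 1, 1, 1]` and two workers, one worker takes the expensive task while the other works through the five cheap ones, so both finish with a load of 5. A static split into two halves of three tasks would instead give loads of 7 and 3.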

12 Example: Parallelisation on 4 CPUs. [Diagram: a master CPU and worker CPUs working on a grid of tasks, with axes x and y.] In the diagram, colour represents which worker did the task and the number gives the task id; tasks scan from left to right, moving upwards.
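The task ordering described for the diagram (left to right, then moving upwards) corresponds to a simple mapping from task id to block position. The sketch below assumes task ids start at 0 in the bottom-left block and that all blocks are square with the same size; both are assumptions for illustration, as is the function name.

```python
def task_to_block(task_id, blocks_per_row, block_size):
    """Return the (x, y) grid offset of the lower-left corner of a task's block.

    Assumes ids start at 0 in the bottom-left block, increase left to right,
    and then continue on the next row up.
    """
    row, col = divmod(task_id, blocks_per_row)
    return col * block_size, row * block_size
```

For a grid split into 4 blocks per row of 100 points each, task 5 sits in the second column of the second row, so `task_to_block(5, 4, 100)` gives `(100, 100)`.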

13 Parallelisation cont. In the supplied code, shading represents the worker; here we have added the worker id as a number by hand. For example, a task farm run on 5 CPUs has 1 master and 4 workers, with total number of tasks = 16.

14 Notes about the exercise

15 Exercise. You are supplied with source code etc. Compile and run on the machine, visualise the results and quantify the performance results. For a fixed number of workers: improve the load balance by increasing the number of tasks (decreasing their size); compute the LIF to estimate the minimum achievable runtime. Is this minimum ever reached?
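The slides do not spell out the LIF formula. A common definition, assumed here, is the maximum worker load divided by the average worker load, so that a perfectly balanced run has LIF = 1 and dividing the measured runtime by the LIF estimates the runtime with an even distribution. The function names below are hypothetical.

```python
def load_imbalance_factor(worker_loads):
    """LIF = maximum worker load / average worker load (assumed definition)."""
    average = sum(worker_loads) / len(worker_loads)
    return max(worker_loads) / average


def estimated_min_runtime(measured_runtime, lif):
    """Estimate the runtime if the same total load were evenly distributed."""
    return measured_runtime / lif
```

For worker loads of `[30, 10, 10, 10]` the average is 15 and the maximum is 30, giving an LIF of 2.0; a run that took 8.0 seconds could then take about 4.0 seconds with perfect balance. The estimate ignores task-farm overheads such as communication with the master, which is one reason the minimum may not be reached in practice.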

16 Exercise outcomes What do the timings tell us about HPC machines?

17 Example results (fixed number of workers)

18 Results cont.

19 16 workers and 16 tasks.
-----Workload Summary (number of iterations)-----
Total Number of Workers: 16
Total Number of Tasks: 16
Total Worker Load:
Average Worker Load:
Maximum Worker Load:
Minimum Worker Load:
Time taken by 16 workers was (secs)
Load Imbalance Factor:

20 16 workers and 64 tasks.
-----Workload Summary (number of iterations)-----
Total Number of Workers: 16
Total Number of Tasks: 64
Total Worker Load:
Average Worker Load:
Maximum Worker Load:
Minimum Worker Load:
Time taken by 16 workers was (secs)
Load Imbalance Factor:

21 Key points to take away: TASK FARMS. Also known as the master/worker pattern. Allows a master process to distribute work to a set of worker processes. Can be used for other types of tasks, but this complicates the situation and other patterns may be more suitable. The master process is responsible for creating, distributing and gathering the individual jobs. Load balance can be improved by using more tasks than workers, at the cost of some overhead. Load imbalance adversely affects performance, especially as the number of processors increases.

22 Key points to take away: TASKS. Units of work. They vary in size and do not have to be of consistent execution time. If execution times are known, this can help with load balancing. QUEUES. The master generates a pool of tasks and puts them in a queue. Workers are assigned a task from the queue when idle.
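The queue-based hand-out can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration of the pattern only: threads stand in for worker processes, the function name `run_task_farm` and its return values are hypothetical, and in standard CPython threads do not speed up CPU-bound work, so a real task farm would use separate processes.

```python
import queue
import threading


def run_task_farm(tasks, n_workers, work):
    """Put all tasks in a queue; each idle worker takes the next one.

    Returns (results, done_by): results maps task id to work(task), and
    done_by maps task id to the id of the worker that ran it.
    """
    task_queue = queue.Queue()
    for task_id, task in enumerate(tasks):
        task_queue.put((task_id, task))  # the master fills the pool of tasks

    results = {}
    done_by = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task_id, task = task_queue.get_nowait()
            except queue.Empty:
                return  # no tasks left, so this worker stops
            value = work(task)
            with lock:  # protect the shared result dictionaries
                results[task_id] = value
                done_by[task_id] = worker_id

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, done_by
```

Which worker runs which task depends on timing and can differ between runs, but every task is completed exactly once, so `results` is the same each time.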

23 Key points to take away: LOAD BALANCING. How a system determines how work or tasks are distributed across workers (processes or threads). Successful load balancing avoids idle processes and overloaded single cores. Poor load balancing leads to under-utilised cores, reducing performance.

24 Key points to take away: COST. Increasingly important. Finite budgets require optimal use of the resources requested. Load balancing is just one method of ensuring optimal usage and avoiding wasted resources. More power and resources do not necessarily mean improved performance. Always ask: is it necessary to run this on 4000 cores, or could it be run on 2000 more efficiently?


More information

Parallel Constraint Programming (and why it is hard... ) Ciaran McCreesh and Patrick Prosser

Parallel Constraint Programming (and why it is hard... ) Ciaran McCreesh and Patrick Prosser Parallel Constraint Programming (and why it is hard... ) This Week s Lectures Search and Discrepancies Parallel Constraint Programming Why? Some failed attempts A little bit of theory and some very simple

More information

Overloading, abstract classes, and inheritance

Overloading, abstract classes, and inheritance Overloading, abstract classes, and inheritance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-ncsa/4.0/deed.en_us

More information

Fun with Fractals Saturday Morning Math Group

Fun with Fractals Saturday Morning Math Group Fun with Fractals Saturday Morning Math Group Alistair Windsor Fractals Fractals are amazingly complicated patterns often produced by very simple processes. We will look at two different types of fractals

More information

Multiprocessor and Real- Time Scheduling. Chapter 10

Multiprocessor and Real- Time Scheduling. Chapter 10 Multiprocessor and Real- Time Scheduling Chapter 10 Classifications of Multiprocessor Loosely coupled multiprocessor each processor has its own memory and I/O channels Functionally specialized processors

More information

Welcome! Virtual tutorial starts at 15:00 GMT. Please leave feedback afterwards at:

Welcome! Virtual tutorial starts at 15:00 GMT. Please leave feedback afterwards at: Welcome! Virtual tutorial starts at 15:00 GMT Please leave feedback afterwards at: www.archer.ac.uk/training/feedback/online-course-feedback.php Introduction to Version Control (part 1) ARCHER Virtual

More information

Real-time grid computing for financial applications

Real-time grid computing for financial applications CNR-INFM Democritos and EGRID project E-mail: cozzini@democritos.it Riccardo di Meo, Ezio Corso EGRID project ICTP E-mail: {dimeo,ecorso}@egrid.it We describe the porting of a test case financial application

More information

ò mm_struct represents an address space in kernel ò task represents a thread in the kernel ò A task points to 0 or 1 mm_structs

ò mm_struct represents an address space in kernel ò task represents a thread in the kernel ò A task points to 0 or 1 mm_structs Last time We went through the high-level theory of scheduling algorithms Scheduling Today: View into how Linux makes its scheduling decisions Don Porter CSE 306 Lecture goals Understand low-level building

More information

Fortran classes and data visibility

Fortran classes and data visibility Fortran classes and data visibility Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-ncsa/4.0/deed.en_us

More information

PowerVR Series5. Architecture Guide for Developers

PowerVR Series5. Architecture Guide for Developers Public Imagination Technologies PowerVR Series5 Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Scheduling. Don Porter CSE 306

Scheduling. Don Porter CSE 306 Scheduling Don Porter CSE 306 Last time ò We went through the high-level theory of scheduling algorithms ò Today: View into how Linux makes its scheduling decisions Lecture goals ò Understand low-level

More information

Parallelizing Compilers

Parallelizing Compilers CSc 553 Principles of Compilation 36 : Parallelizing Compilers I Parallelizing Compilers Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2011 Christian Collberg Parallelizing

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Introduction to Algorithms December 8, 2004 Massachusetts Institute of Technology 6.046J/18.410J Professors Piotr Indyk and Charles E. Leiserson Handout 34 Problem Set 9 Solutions Reading: Chapters 32.1

More information

Computable Sets. KR Chowdhary Professor & Head.

Computable Sets. KR Chowdhary Professor & Head. Computable Sets KR Chowdhary Professor & Head Email: kr.chowdhary@acm.org Department of Computer Science and Engineering MBM Engineering College, Jodhpur March 19, 2013 kr chowdhary Comp-Set 1/ 1 Computable

More information

Molecular Dynamics Simulations with Julia

Molecular Dynamics Simulations with Julia Emily Crabb 6.338/18.337 Final Project Molecular Dynamics Simulations with Julia I. Project Overview This project consists of one serial and several parallel versions of a molecular dynamics simulation

More information

MATLAB Distributed Computing Server (MDCS) Training

MATLAB Distributed Computing Server (MDCS) Training MATLAB Distributed Computing Server (MDCS) Training Artemis HPC Integration and Parallel Computing with MATLAB Dr Hayim Dar hayim.dar@sydney.edu.au Dr Nathaniel Butterworth nathaniel.butterworth@sydney.edu.au

More information

Execution Strategy and Runtime Support for Regular and Irregular Applications on Emerging Parallel Architectures

Execution Strategy and Runtime Support for Regular and Irregular Applications on Emerging Parallel Architectures Execution Strategy and Runtime Support for Regular and Irregular Applications on Emerging Parallel Architectures Xin Huo Advisor: Gagan Agrawal Motivation - Architecture Challenges on GPU architecture

More information

Hybrid Implementation of 3D Kirchhoff Migration

Hybrid Implementation of 3D Kirchhoff Migration Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation

More information

Reusing this material

Reusing this material Modules Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-ncsa/4.0/deed.en_us

More information

Parallelization Strategy

Parallelization Strategy COSC 6374 Parallel Computation Algorithm structure Spring 2008 Parallelization Strategy Finding Concurrency Structure the problem to expose exploitable concurrency Algorithm Structure Supporting Structure

More information

Adaptively Mapping Parallelism Based on System Workload Using Machine Learning

Adaptively Mapping Parallelism Based on System Workload Using Machine Learning Adaptively Mapping Parallelism Based on System Workload Using Machine Learning Dominik Grewe E H U N I V E R S I T Y T O H F R G E D I N B U Master of Science Computer Science School of Informatics University

More information

A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche

A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche Hélène Renard 19th International Heterogeneity in Computing Workshop April 19, 2010 École polytechnique universitaire

More information

This exam is open book / open notes. No electronic devices are permitted.

This exam is open book / open notes. No electronic devices are permitted. SENG 310 Midterm February 2011 Total Marks: / 40 Name Solutions Student # This exam is open book / open notes. No electronic devices are permitted. Part I: Short Answer Questions ( / 12 points) 1. Explain

More information