Fractals. Investigating task farms and load imbalance


Reusing this material

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

This means you are free to copy and redistribute the material and adapt and build on the material under the following terms:
- You must give appropriate credit, provide a link to the license and indicate if changes were made.
- If you adapt or build on the material you must distribute your work under the same license as the original.

Note that this presentation contains images owned by others. Please seek their permission before reusing these images.

http://www.archer.ac.uk support@archer.ac.uk

The Mandelbrot Set

The Mandelbrot Set is the set of complex numbers C for which repeated iteration of the complex function (where i^2 = -1):

    Z_n = Z_{n-1}^2 + C,  with initial condition Z_0 = 0

remains bounded, i.e. does not diverge. Here C = x_0 + i*y_0.

Writing Z_n = x_n + i*y_n:

    Z_n^2 = (x_n^2 - y_n^2) + 2i*x_n*y_n,    |Z_n|^2 = x_n^2 + y_n^2

The Mandelbrot Set cont.

Separating out the real and imaginary parts, Z_n = Z_n^r + i*Z_n^i, gives:

    Z_n^r = x_{n-1}^2 - y_{n-1}^2 + x_0
    Z_n^i = 2*x_{n-1}*y_{n-1} + y_0

Take the threshold value as |Z|^2 > 4.0, and set the maximum number of iterations to N_max, assuming that Z does not diverge at higher values of N_max.

The Julia Set

Similar algorithm to the Mandelbrot Set; recall:

    Z_n = Z_{n-1}^2 + C,  C = x_0 + i*y_0,  Z_0 = 0

There are an infinite number of Julia sets, parameterised by a complex number C:

    Z_n = Z_{n-1}^2 + C,  Z_0 = x_0 + i*y_0

for example, C = -0.8 + 0.156i.
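The escape-time iteration described on these slides can be sketched in a few lines of Python. The supplied exercise code is not shown here, so this sketch (including the function name escape_iterations) is illustrative only. For the Mandelbrot set, Z_0 = 0 and C is the grid point; for a Julia set, Z_0 is the grid point and C is fixed.

```python
def escape_iterations(c, z0=0j, n_max=256, threshold=4.0):
    """Return the iteration count N at which |Z|^2 exceeds the threshold,
    or n_max if the series stays bounded (i.e. the point is in the set)."""
    z = z0
    for n in range(n_max):
        # Diverged once |Z|^2 exceeds the threshold value of 4.0
        if (z.real * z.real + z.imag * z.imag) > threshold:
            return n
        z = z * z + c
    return n_max

# Mandelbrot: 0 + 0i never diverges, 2 + 0i diverges almost at once.
print(escape_iterations(0j))        # 256 (= n_max)
print(escape_iterations(2 + 0j))    # 2

# Julia set with C = -0.8 + 0.156i: Z_0 is the grid point instead.
print(escape_iterations(complex(-0.8, 0.156), z0=complex(0.3, 0.5)))
```

The same function serves both fractals because the iteration formula is identical; only the roles of C and Z_0 swap.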

Visualisation

To visualise a Mandelbrot/Julia set:
- Represent the complex plane as a 2D grid, where complex numbers correspond to points (x, y) on the grid.
- For each point on the grid, calculate the number of iterations N for the series to diverge (exceed the threshold); if it does not diverge, N = N_max.
- Convert the value of N to a colour and plot this on the grid.
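The steps above can be sketched as a grid sweep that maps N to a "colour" (here a character, for a text rendering). The grid bounds, size and character palette are illustrative choices, not values from the exercise.

```python
def escape_iterations(c, n_max=40, threshold=4.0):
    """Iteration count N at which |Z|^2 exceeds the threshold, else n_max."""
    z = 0j
    for n in range(n_max):
        if (z.real * z.real + z.imag * z.imag) > threshold:
            return n
        z = z * z + c
    return n_max

def render(width=60, height=20, n_max=40):
    """ASCII rendering of the Mandelbrot set over [-2, 1] x [-1.2, 1.2]."""
    chars = " .:-=+*#%@"   # low N (fast divergence) -> light, high N -> dark
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 3.0 * i / (width - 1)
            n = escape_iterations(complex(x, y), n_max)
            # Scale N onto the palette; points in the set get the darkest char
            row += chars[min(n * len(chars) // n_max, len(chars) - 1)]
        rows.append(row)
    return "\n".join(rows)

print(render())
```

Swapping the character palette for a real colour map and writing the grid to an image file gives the usual fractal plots.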

Mandelbrot Set

[Image of the Mandelbrot set: some regions are very slow to compute, others very quick.]

Parallelisation

- Values at each coordinate depend only on the previous values at that coordinate, so we can decompose the 2D grid into equally sized blocks: no communications between blocks are needed.
- We don't know in advance how much work each block needs: the number of iterations varies across the blocks, so work is dynamically assigned to workers as they become available.

Implementation:
- Split the grid into blocks: each block corresponds to a task.
- The master process hands out tasks to worker processes.
- Workers return completed tasks to the master.
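The master/worker scheme above can be sketched with a shared task queue. Here threads stand in for the worker processes of the exercise's message-passing implementation; this is a minimal illustration, not the supplied code, and the names (task_farm, work_fn) are hypothetical.

```python
import queue
import threading

def task_farm(tasks, n_workers, work_fn):
    """Dynamically hand out (task_id, payload) tasks to n_workers workers."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)                        # master fills the pool of tasks
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, payload = q.get_nowait()   # idle worker grabs next task
            except queue.Empty:
                return                  # no tasks left: worker retires
            r = work_fn(payload)
            with lock:
                results[task_id] = r    # completed task returned to master

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Toy use: 16 "blocks" of varying cost handed out to 4 workers.
out = task_farm([(i, i) for i in range(16)], 4, lambda x: x * x)
print(sorted(out.items())[:3])   # [(0, 0), (1, 1), (2, 4)]
```

Because workers pull from the queue only when idle, a worker that lands on a cheap block simply takes the next one, which is exactly how the dynamic assignment evens out the load.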

Example: Parallelisation on 4 CPUs

[Diagram: master on CPU 1; workers on CPUs 2, 3 and 4. The grid is divided into tasks numbered 1-9, scanning from left to right and moving upwards. Colour represents which worker did each task; the number gives the task id.]

Parallelisation cont.

[Diagram: a 4x4 grid of 16 tasks from the supplied code; shading represents the worker, with the worker id (1-4) added as a number by hand. Example: task farm run on 5 CPUs = 1 master + 4 workers; total number of tasks = 16.]

Exercise

You are supplied with source code etc.
- Compile and run on ARCHER; visualise the results.
- Quantify the performance results.
- For a fixed number of workers, improve the load balance by increasing the number of tasks (decreasing their size).
- Compute the Load Imbalance Factor (LIF) to estimate the minimum achievable runtime. Is this minimum ever reached?

Fractals Outcomes

Example results: fixed number of workers

Results cont.

16 workers and 16 tasks

----- Workload Summary (number of iterations) -----
Total Number of Workers: 16
Total Number of Tasks:   16
Total Worker Load:       498023053
Average Worker Load:     31126440
Maximum Worker Load:     156694685
Minimum Worker Load:     62822
Time taken by 16 workers was 1.929219 (secs)
Load Imbalance Factor:   5.034134

16 workers and 64 tasks

----- Workload Summary (number of iterations) -----
Total Number of Workers: 16
Total Number of Tasks:   64
Total Worker Load:       498023053
Average Worker Load:     31126440
Maximum Worker Load:     46743511
Minimum Worker Load:     10968369
Time taken by 16 workers was 0.586923 (secs)
Load Imbalance Factor:   1.501730
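The Load Imbalance Factor in the summaries above is the maximum worker load divided by the average worker load. Since the runtime is set by the most heavily loaded worker, time / LIF estimates the minimum achievable runtime. A short check against the printed numbers (the helper name is illustrative):

```python
def load_imbalance_factor(total_load, max_load, n_workers):
    """LIF = maximum worker load / average worker load (1.0 = perfect balance)."""
    average = total_load / n_workers
    return max_load / average

# 16 workers, 16 tasks (first summary)
lif_16 = load_imbalance_factor(498023053, 156694685, 16)
print(round(lif_16, 6))              # 5.034134

# 16 workers, 64 tasks (second summary)
lif_64 = load_imbalance_factor(498023053, 46743511, 16)
print(round(lif_64, 6))              # 1.50173

# Estimated minimum runtime for the 64-task run
print(round(0.586923 / lif_64, 3))   # 0.391 secs
```

Quadrupling the number of tasks cuts the LIF from about 5.0 to about 1.5 here, which is why the measured runtime drops from 1.93 s to 0.59 s.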

Key points to take away

TASK FARMS
- Also known as the master/worker pattern.
- Allow a master process to distribute work to a set of worker processes.
- Can be used for other types of task, but this complicates the situation and other patterns may be more suitable.
- The master process is responsible for creating, distributing and gathering the individual jobs.
- Load balance can be improved by using more tasks than workers, at the cost of some overhead.
- Load imbalance adversely affects performance, especially as the number of processors increases.

Key points to take away

TASKS
- Units of work.
- Vary in size; they do not have to have consistent execution times.
- If execution times are known, this can help with load balancing.

QUEUES
- The master generates a pool of tasks and puts them in a queue.
- Workers are assigned a task from the queue when idle.

Key points to take away

LOAD BALANCING
- How a system distributes work or tasks across workers (processes or threads).
- Successful load balancing avoids idle processes and overloaded cores.
- Poor load balancing leads to under-utilised cores, reducing performance.

Key points to take away

COST
- Increasingly important: finite budgets require optimal use of the resources requested.
- Load balancing is just one method of ensuring optimal usage and avoiding wasted resources.
- More power and resources do not necessarily mean improved performance.
- Always ask: is it necessary to run this on 4000 cores, or could it run more efficiently on 2000?