A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche

Size: px
Start display at page:

Download "A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche"

Transcription

1 A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche Hélène Renard 19th International Heterogeneity in Computing Workshop April 19, 2010 École polytechnique universitaire de Nice-Sophia Antipolis i3s Université Bordeaux 1 LaBRI

2 Abdou Guermouche SimGrid vs Real-Life 2 Plan of presentation 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

3 Framework Plan of presentation Abdou Guermouche SimGrid vs Real-Life 3 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

4 Abdou Guermouche SimGrid vs Real-Life 4 Framework Plan of presentation Data redistribution algorithms 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

5 Abdou Guermouche SimGrid vs Real-Life 5 Framework Data redistribution algorithms Data redistribution algorithms : context Target platforms: distributed heterogeneous platforms (network of workstations, clusters of clusters, grids, etc.) 1. Various sources of load imbalance : application requirements / platform. 2. The data must be redistributed to achieve a better load balancing. 3. No discussion of the mechanism of load balancing we consider it as given.

6 Abdou Guermouche SimGrid vs Real-Life 6 Framework Data redistribution algorithms Data redistribution algorithms : context The algorithm operates on a wide array of rectangular sample data: The array is split in vertical slices; This geometric constraint recommends that processors must be organized as a virtual ring: Each processor only communicates twice (once with each neighbor). x i 1,j x i,j 1 x i,j x i,j+1 x i+1,j P i 1 P i P i+1 Figure: Communication scheme.

7 Abdou Guermouche SimGrid vs Real-Life 7 Framework Data redistribution algorithms Redistribution problem for heterogeneous bidirectional rings Definition A redistribution is light if each processor initially owns all data that it will send during the execution of the algorithm. Minimize τ subject to S i,i+1 0 S i,i 1 0 S i,i+1 + S i,i 1 S i+1,i S i 1,i = δ i S i,i+1 c i,i+1 + S i,i 1 c i,i 1 τ S i+1,i c i+1,i + S i 1,i c i 1,i τ 1 i n 1 i n 1 i n 1 i n 1 i n (1) To lead to... We can use the solution of System 1 safely.

8 Abdou Guermouche SimGrid vs Real-Life 8 Framework Plan of presentation Heat propagation 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

9 Abdou Guermouche SimGrid vs Real-Life 9 Laplace equation Framework Heat propagation Context A metal plate to which is applied a source of heat from the edges. The heat will spread within plate. The temperature at the edges is kept constant, the heat distribution in the plate tends to a stationary state. Heat source Laplace equation : 2 f x f y 2 = 0 Heat source

10 Abdou Guermouche SimGrid vs Real-Life 10 Laplace equation : Framework Heat propagation 2 f x + 2 f 2 y = 0 2 Resolution : 1. Approximating the solution discretization grid n 2 points Heat source Heat source 2. Using finite differences on the Laplace equation, this is equivalent to iteratively solve the following equation: 4x i,j (x i 1,j + x i+1,j + x i,j 1 + x i,j+1 ) = 0

11 Abdou Guermouche SimGrid vs Real-Life 11 Laplace equation: Framework Heat propagation 2 f x + 2 f 2 y = 0 2 xi,j 1 xi 1,j xi,j xi,j+1 Same pattern of communication as the ring of processors xi+1,j Pi 1 Pi Pi+1 Figure: Communication scheme. Communication only with immediate neighbors. 3. Solving a linear system Jacobi, since it is of the form: Ax = b, with A and x as... x 1,1... x 1, = b x n,n 1 x n,n

12 Real-life and simulation Plan of presentation Abdou Guermouche SimGrid vs Real-Life Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

13 Abdou Guermouche SimGrid vs Real-Life 13 Real-life and simulation Plan of presentation 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

14 Abdou Guermouche SimGrid vs Real-Life 14 Real-life and simulation Goal : Compare the behavior of algorithms for load balancing and data redistribution on two different platforms : Grid 5000 SimGrid Figure: SimGrid Figure: Grid 5000

15 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Figure: Experimental scheme: the master and the workers. This organization is used in both the simulated and real-life context. The difference comes from the monitor which is given by SimGrid in the simulated context.

16 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Figure: Experimental scheme: the master and the workers. Master: Gather the results of the measurements. Call the redistribution algorithms when needed.

17 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Figure: Experimental scheme: the master and the workers. Monitor: Modify (using wrekavoc) the characteristics of the platform.

18 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Slaves: Figure: Experimental scheme: the master and the workers. Do all the computations and communications. Exchange data for redistribution according to the results of the master.

19 Abdou Guermouche SimGrid vs Real-Life 16 Real-life and simulation Plan of presentation Wrekavoc 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

20 Abdou Guermouche SimGrid vs Real-Life 17 Real-life and simulation Wrekavoc Wrekavoc, in the center of both platforms 1. in our context, Wrekavoc is used to control CPU and network capabilities; of randomly chosen resources; in order to study the behavior of the application. Modify CPU speed Modify Memory available Node Daemon Modify Network bandwith & Latency Figure: Wrekavoc in pictures

21 Abdou Guermouche SimGrid vs Real-Life 18 Real-life and simulation Wrekavoc 1. Real and simulated execution: Retrieve through measurements: processor speed network latency inbound bandwidth Differences: Real execution: the modification of the characteristics of the platform are done using wrekavoc, Simulated execution: the modification of the characteristics of the platfrom is a built-in functionnality of SimGrid.

22 Experimental results Plan of presentation Abdou Guermouche SimGrid vs Real-Life Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

23 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: one site platform. Abdou Guermouche SimGrid vs Real-Life 20

24 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: two sites platform. Abdou Guermouche SimGrid vs Real-Life 21

25 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: five sites platform. Abdou Guermouche SimGrid vs Real-Life 22

26 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: two sites platform. Each iteration is three time more costly than a regular one. Abdou Guermouche SimGrid vs Real-Life 23

27 Conclusion Plan of presentation Abdou Guermouche SimGrid vs Real-Life Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion

28 Abdou Guermouche SimGrid vs Real-Life 25 Conclusion Conclusion 1. Two versions of the same application: the propagation of heat Simulated implementation on top of SimGrid. Real-life implementation running on the Grid 5000 platform. Using wrekavoc to control the characteristics of the platform. Use the same platform characteristics over time in the two contexts. 2. The observed behavior for the simulated case is very close to that of a real execution. 3. A first step for validation of SimGrid in the context of complex applications.

Modalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010

Modalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010 A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche and Hélène Renard, LaBRI/Univ Bordeaux 1 I3S/École polytechnique universitaire de Nice-Sophia Antipolis May

More information

Centralized versus distributed schedulers for multiple bag-of-task applications

Centralized versus distributed schedulers for multiple bag-of-task applications Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.

More information

Centralized versus distributed schedulers for multiple bag-of-task applications

Centralized versus distributed schedulers for multiple bag-of-task applications Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.

More information

HPC Algorithms and Applications

HPC Algorithms and Applications HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear

More information

An introduction to mesh generation Part IV : elliptic meshing

An introduction to mesh generation Part IV : elliptic meshing Elliptic An introduction to mesh generation Part IV : elliptic meshing Department of Civil Engineering, Université catholique de Louvain, Belgium Elliptic Curvilinear Meshes Basic concept A curvilinear

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Accurate emulation of CPU performance

Accurate emulation of CPU performance Accurate emulation of CPU performance Tomasz Buchert 1 Lucas Nussbaum 2 Jens Gustedt 1 1 INRIA Nancy Grand Est 2 LORIA / Nancy - Université Validation of distributed systems Approaches: Theoretical approach

More information

Joe Wingbermuehle, (A paper written under the guidance of Prof. Raj Jain)

Joe Wingbermuehle, (A paper written under the guidance of Prof. Raj Jain) 1 of 11 5/4/2011 4:49 PM Joe Wingbermuehle, wingbej@wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download The Auto-Pipe system allows one to evaluate various resource mappings and topologies

More information

Comparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2

Comparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2 Comparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2 Jingwei Zhu March 19, 2014 Instructor: Surya Pratap Vanka 1 Project Description The purpose of this

More information

Parallel FEM Computation and Multilevel Graph Partitioning Xing Cai

Parallel FEM Computation and Multilevel Graph Partitioning Xing Cai Parallel FEM Computation and Multilevel Graph Partitioning Xing Cai Simula Research Laboratory Overview Parallel FEM computation how? Graph partitioning why? The multilevel approach to GP A numerical example

More information

Parallel Computing. Parallel Algorithm Design

Parallel Computing. Parallel Algorithm Design Parallel Computing Parallel Algorithm Design Task/Channel Model Parallel computation = set of tasks Task Program Local memory Collection of I/O ports Tasks interact by sending messages through channels

More information

Computational Fluid Dynamics (CFD) using Graphics Processing Units

Computational Fluid Dynamics (CFD) using Graphics Processing Units Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores

More information

f xx + f yy = F (x, y)

f xx + f yy = F (x, y) Application of the 2D finite element method to Laplace (Poisson) equation; f xx + f yy = F (x, y) M. R. Hadizadeh Computer Club, Department of Physics and Astronomy, Ohio University 4 Nov. 2013 Domain

More information

Fractals. Investigating task farms and load imbalance

Fractals. Investigating task farms and load imbalance Fractals Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA

More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information

University of Innsbruck. Topology Aware Data Organisation for Large Scale Simulations

University of Innsbruck. Topology Aware Data Organisation for Large Scale Simulations University of Innsbruck Institute of Computer Science Research Group DPS (Distributed and Parallel Systems) Topology Aware Data Organisation for Large Scale Simulations Master Thesis Supervisor: Herbert

More information

Fractals exercise. Investigating task farms and load imbalance

Fractals exercise. Investigating task farms and load imbalance Fractals exercise Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Breaking the memory barrier (for finite difference modeling)

Breaking the memory barrier (for finite difference modeling) Breaking the memory barrier (for finite difference modeling) Jon Marius Venstad Norwegian University of Science and Technology (NTNU) Department of Petroleum Engineering & Applied Geophysics E-mail: venstad@gmail.com

More information

Parallelization Strategy

Parallelization Strategy COSC 335 Software Design Parallel Design Patterns (II) Spring 2008 Parallelization Strategy Finding Concurrency Structure the problem to expose exploitable concurrency Algorithm Structure Supporting Structure

More information

Parallel FDTD Solver with Static and Dynamic Load Balancing

Parallel FDTD Solver with Static and Dynamic Load Balancing Parallel FDTD Solver with Static and Dynamic Load Balancing Gleb Balykov Lomonosov Moscow State University, Moscow, 119991, Russia balykov.gleb@yandex.ru Abstract. Finite-difference time-domain method

More information

Parallel Programming Patterns Overview and Concepts

Parallel Programming Patterns Overview and Concepts Parallel Programming Patterns Overview and Concepts Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.

More information

Matrix multiplication

Matrix multiplication Matrix multiplication Standard serial algorithm: procedure MAT_VECT (A, x, y) begin for i := 0 to n - 1 do begin y[i] := 0 for j := 0 to n - 1 do y[i] := y[i] + A[i, j] * x [j] end end MAT_VECT Complexity:

More information

Lecture 27: Board Notes: Parallel Programming Examples

Lecture 27: Board Notes: Parallel Programming Examples Lecture 27: Board Notes: Parallel Programming Examples Part A: Consider the following binary search algorithm (a classic divide and conquer algorithm) that searches for a value X in a sorted N-element

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

1.2 Numerical Solutions of Flow Problems

1.2 Numerical Solutions of Flow Problems 1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian

More information

Spectral Compression of Mesh Geometry

Spectral Compression of Mesh Geometry Spectral Compression of Mesh Geometry Zachi Karni, Craig Gotsman SIGGRAPH 2000 1 Introduction Thus far, topology coding drove geometry coding. Geometric data contains far more information (15 vs. 3 bits/vertex).

More information

Overlapping Ring Monitoring Algorithm in TIPC

Overlapping Ring Monitoring Algorithm in TIPC Overlapping Ring Monitoring Algorithm in TIPC Jon Maloy, Ericsson Canada Inc. Montreal April 7th 2017 PURPOSE When a cluster node becomes unresponsive due to crash, reboot or lost connectivity we want

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

Hardware-Software Codesign

Hardware-Software Codesign Hardware-Software Codesign 4. System Partitioning Lothar Thiele 4-1 System Design specification system synthesis estimation SW-compilation intellectual prop. code instruction set HW-synthesis intellectual

More information

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3 6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require

More information

Parallel Poisson Solver in Fortran

Parallel Poisson Solver in Fortran Parallel Poisson Solver in Fortran Nilas Mandrup Hansen, Ask Hjorth Larsen January 19, 1 1 Introduction In this assignment the D Poisson problem (Eq.1) is to be solved in either C/C++ or FORTRAN, first

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Synchronous Computations

Synchronous Computations Synchronous Computations Material based on B. Wilkinson et al., PARALLEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers c 2002-2004 R. Leduc Introduction We

More information

Parallel Hybrid Monte Carlo Algorithms for Matrix Computations

Parallel Hybrid Monte Carlo Algorithms for Matrix Computations Parallel Hybrid Monte Carlo Algorithms for Matrix Computations V. Alexandrov 1, E. Atanassov 2, I. Dimov 2, S.Branford 1, A. Thandavan 1 and C. Weihrauch 1 1 Department of Computer Science, University

More information

Large Mesh Deformation Using the Volumetric Graph Laplacian

Large Mesh Deformation Using the Volumetric Graph Laplacian Large Mesh Deformation Using the Volumetric Graph Laplacian Kun Zhou1 Jin Huang2 John Snyder3 Xinguo Liu1 Hujun Bao2 Baining Guo1 Heung-Yeung Shum1 1 Microsoft Research Asia 2 Zhejiang University 3 Microsoft

More information

MPI Programming. Henrik R. Nagel Scientific Computing IT Division

MPI Programming. Henrik R. Nagel Scientific Computing IT Division 1 MPI Programming Henrik R. Nagel Scientific Computing IT Division 2 Outline Introduction Finite Difference Method Finite Element Method LU Factorization SOR Method Monte Carlo Method Molecular Dynamics

More information

CSCE 5160 Parallel Processing. CSCE 5160 Parallel Processing

CSCE 5160 Parallel Processing. CSCE 5160 Parallel Processing HW #9 10., 10.3, 10.7 Due April 17 { } Review Completing Graph Algorithms Maximal Independent Set Johnson s shortest path algorithm using adjacency lists Q= V; for all v in Q l[v] = infinity; l[s] = 0;

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Optimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH

Optimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH Optimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH Carsten Clauss, Martin Pöppe, Thomas Bemmerl carsten@lfbs.rwth-aachen.de http://www.mp-mpich.de Lehrstuhl für Betriebssysteme

More information

Parallel Greedy Matching Algorithms

Parallel Greedy Matching Algorithms Parallel Greedy Matching Algorithms Fredrik Manne Department of Informatics University of Bergen, Norway Rob Bisseling, University of Utrecht Md. Mostofa Patwary, University of Bergen 1 Outline Background

More information

Distributed Fine-Grained Node Localization in Ad-Hoc Networks. A Scalable Location Service for Geographic Ad Hoc Routing. Presented by An Nguyen

Distributed Fine-Grained Node Localization in Ad-Hoc Networks. A Scalable Location Service for Geographic Ad Hoc Routing. Presented by An Nguyen Distributed Fine-Grained Node Localization in Ad-Hoc Networks A Scalable Location Service for Geographic Ad Hoc Routing Presented by An Nguyen Distributed Fine-Grained Node Localization in Ad-Hoc Networks

More information

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop

More information

Event-driven computing

Event-driven computing Event-driven computing Andrew Brown Southampton adb@ecs.soton.ac.uk Simon Moore Cambridge simon.moore@cl.cam.ac.uk David Thomas Imperial College d.thomas1@imperial.ac.uk Andrey Mokhov Newcastle andrey.mokhov@newcastle.ac.uk

More information

Monte Carlo Methods; Combinatorial Search

Monte Carlo Methods; Combinatorial Search Monte Carlo Methods; Combinatorial Search Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 22, 2012 CPD (DEI / IST) Parallel and

More information

Parallel Systems. Project topics

Parallel Systems. Project topics Parallel Systems Project topics 2016-2017 1. Scheduling Scheduling is a common problem which however is NP-complete, so that we are never sure about the optimality of the solution. Parallelisation is a

More information

by Mahender Reddy Concept To Reality / Summer 2006

by Mahender Reddy Concept To Reality / Summer 2006 by Mahender Reddy Demand for higher extrusion rates, increased product quality and lower energy consumption have prompted plants to use various methods to determine optimum process conditions and die designs.

More information

AMath 483/583 Lecture 21 May 13, 2011

AMath 483/583 Lecture 21 May 13, 2011 AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versions of Jacobi iteration Gauss-Seidel and SOR iterative methods Next week: More MPI Debugging and totalview GPU computing Read: Class notes

More information

Data mining with sparse grids

Data mining with sparse grids Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks

More information

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Outline of Presentation MPEG-2 to H.264 Transcoding Need for a multiprocessor

More information

Parallel Algorithm Design. CS595, Fall 2010

Parallel Algorithm Design. CS595, Fall 2010 Parallel Algorithm Design CS595, Fall 2010 1 Programming Models The programming model o determines the basic concepts of the parallel implementation and o abstracts from the hardware as well as from the

More information

Modeling and Simulation of Hybrid Systems

Modeling and Simulation of Hybrid Systems Modeling and Simulation of Hybrid Systems Luka Stanisic Samuel Thibault Arnaud Legrand Brice Videau Jean-François Méhaut CNRS/Inria/University of Grenoble, France University of Bordeaux/Inria, France JointLab

More information

Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems

Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems Gamal Attiya and Yskandar Hamam Groupe ESIEE Paris, Lab. A 2 SI Cité Descartes, BP 99, 93162 Noisy-Le-Grand, FRANCE {attiyag,hamamy}@esiee.fr

More information

Parallel Pencil Beam Redefinition Algorithm. Paul Alderson Mark Wright Amit Jain Robert Boyd

Parallel Pencil Beam Redefinition Algorithm. Paul Alderson Mark Wright Amit Jain Robert Boyd Parallel Pencil Beam Redefinition Algorithm Paul Alderson Mark Wright Amit Jain Robert Boyd Problem Definition Radiation Therapy Pencil Beam Redefinition Algorithm (PBRA) calculates radiation dose distributions.

More information

AMath 483/583 Lecture 24. Notes: Notes: Steady state diffusion. Notes: Finite difference method. Outline:

AMath 483/583 Lecture 24. Notes: Notes: Steady state diffusion. Notes: Finite difference method. Outline: AMath 483/583 Lecture 24 Outline: Heat equation and discretization OpenMP and MPI for iterative methods Jacobi, Gauss-Seidel, SOR Notes and Sample codes: Class notes: Linear algebra software $UWHPSC/codes/openmp/jacobi1d_omp1.f90

More information

Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism

Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism 1 Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism Takeshi Nanri (Kyushu Univ. and JST CREST, Japan) 16 Aug, 2016 4th Annual MVAPICH Users Group Meeting 2 Background

More information

Overview of MUMPS (A multifrontal Massively Parallel Solver)

Overview of MUMPS (A multifrontal Massively Parallel Solver) Overview of MUMPS (A multifrontal Massively Parallel Solver) E. Agullo, P. Amestoy, A. Buttari, P. Combes, A. Guermouche, J.-Y. L Excellent, T. Slavova, B. Uçar CERFACS, ENSEEIHT-IRIT, INRIA (LIP and LaBRI),

More information

Fluent User Services Center

Fluent User Services Center Solver Settings 5-1 Using the Solver Setting Solver Parameters Convergence Definition Monitoring Stability Accelerating Convergence Accuracy Grid Independence Adaption Appendix: Background Finite Volume

More information

10th August Part One: Introduction to Parallel Computing

10th August Part One: Introduction to Parallel Computing Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer

More information

AMath 483/583 Lecture 24

AMath 483/583 Lecture 24 AMath 483/583 Lecture 24 Outline: Heat equation and discretization OpenMP and MPI for iterative methods Jacobi, Gauss-Seidel, SOR Notes and Sample codes: Class notes: Linear algebra software $UWHPSC/codes/openmp/jacobi1d_omp1.f90

More information

Scheduling Tasks Sharing Files from Distributed Repositories

Scheduling Tasks Sharing Files from Distributed Repositories from Distributed Repositories Arnaud Giersch 1, Yves Robert 2 and Frédéric Vivien 2 1 ICPS/LSIIT, University Louis Pasteur, Strasbourg, France 2 École normale supérieure de Lyon, France September 1, 2004

More information

Contents. I The Basic Framework for Stationary Problems 1

Contents. I The Basic Framework for Stationary Problems 1 page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other

More information

Unit I Reading Graphical Methods

Unit I Reading Graphical Methods Unit I Reading Graphical Methods One of the most effective tools for the visual evaluation of data is a graph. The investigator is usually interested in a quantitative graph that shows the relationship

More information

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing Designing Parallel Programs This review was developed from Introduction to Parallel Computing Author: Blaise Barney, Lawrence Livermore National Laboratory references: https://computing.llnl.gov/tutorials/parallel_comp/#whatis

More information

Design of Parallel Programs Algoritmi e Calcolo Parallelo. Daniele Loiacono

Design of Parallel Programs Algoritmi e Calcolo Parallelo. Daniele Loiacono Design of Parallel Programs Algoritmi e Calcolo Parallelo Web: home.dei.polimi.it/loiacono Email: loiacono@elet.polimi.it References q The material in this set of slide is taken from two tutorials by Blaise

More information

A Decoupled Scheduling Approach for the GrADS Program Development Environment. DCSL Ahmed Amin

A Decoupled Scheduling Approach for the GrADS Program Development Environment. DCSL Ahmed Amin A Decoupled Scheduling Approach for the GrADS Program Development Environment DCSL Ahmed Amin Outline Introduction Related Work Scheduling Architecture Scheduling Algorithm Testbench Results Conclusions

More information

ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS

ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS NURNABI MEHERUL ALAM M.Sc. (Agricultural Statistics), Roll No. I.A.S.R.I, Library Avenue, New Delhi- Chairperson: Dr. P.K. Batra Abstract: Block designs

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

More information

Complexity results for throughput and latency optimization of replicated and data-parallel workflows

Complexity results for throughput and latency optimization of replicated and data-parallel workflows Complexity results for throughput and latency optimization of replicated and data-parallel workflows Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon June 2007 Anne.Benoit@ens-lyon.fr

More information

A Technique for improving the scheduling of network communicating processes in MOSIX

A Technique for improving the scheduling of network communicating processes in MOSIX A Technique for improving the scheduling of network communicating processes in MOSIX Rengakrishnan Subramanian Masters Report, Final Defense Guidance by Prof. Dan Andresen Agenda MOSIX Network communicating

More information

Smoothing & Fairing. Mario Botsch

Smoothing & Fairing. Mario Botsch Smoothing & Fairing Mario Botsch Motivation Filter out high frequency noise Desbrun, Meyer, Schroeder, Barr: Implicit Fairing of Irregular Meshes using Diffusion and Curvature Flow, SIGGRAPH 99 2 Motivation

More information

Non supervised classification of individual electricity curves

Non supervised classification of individual electricity curves Non supervised classification of individual electricity curves Jairo Cugliari 17 février 2014 Joint work with Benjamin Auder (LMO, Université Paris-Sud) Outline 1 Motivation 2 Functional clustering 3 Parallel

More information

The Overall SHRIMP Project. Related Work (from last section) Paper Goals. Bill Kramer April 17, 2002

The Overall SHRIMP Project. Related Work (from last section) Paper Goals. Bill Kramer April 17, 2002 CS 258 Reading Assignment 16 Discussion Design Choice in the SHRIMP System: An Empirical Study Bill Kramer April 17, 2002 # The Overall SHRIMP Project The SHRIMP (Scalable High-performance Really Inexpensive

More information

Hybrid MPI + OpenMP Approach to Improve the Scalability of a Phase-Field-Crystal Code

Hybrid MPI + OpenMP Approach to Improve the Scalability of a Phase-Field-Crystal Code Hybrid MPI + OpenMP Approach to Improve the Scalability of a Phase-Field-Crystal Code Reuben D. Budiardja reubendb@utk.edu ECSS Symposium March 19 th, 2013 Project Background Project Team (University of

More information

RT 3D FDTD Simulation of LF and MF Room Acoustics

RT 3D FDTD Simulation of LF and MF Room Acoustics RT 3D FDTD Simulation of LF and MF Room Acoustics ANDREA EMANUELE GRECO Id. 749612 andreaemanuele.greco@mail.polimi.it ADVANCED COMPUTER ARCHITECTURES (A.A. 2010/11) Prof.Ing. Cristina Silvano Dr.Ing.

More information

Virtual Network Embedding with Substrate Support for Parallelization

Virtual Network Embedding with Substrate Support for Parallelization Virtual Network Embedding with Substrate Support for Parallelization Sheng Zhang, Nanjing University, P.R. China Jie Wu, Temple University, USA Sanglu Lu, Nanjing University, P.R. China Presenter: Pouya

More information

New Challenges In Dynamic Load Balancing

New Challenges In Dynamic Load Balancing New Challenges In Dynamic Load Balancing Karen D. Devine, et al. Presentation by Nam Ma & J. Anthony Toghia What is load balancing? Assignment of work to processors Goal: maximize parallel performance

More information

High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters

High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014

More information

Application Example Running on Top of GPI-Space Integrating D/C

Application Example Running on Top of GPI-Space Integrating D/C Application Example Running on Top of GPI-Space Integrating D/C Tiberiu Rotaru Fraunhofer ITWM This project is funded from the European Union s Horizon 2020 Research and Innovation programme under Grant

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 15 Numerically solve a 2D boundary value problem Example:

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

ANSYS FLUENT. Lecture 3. Basic Overview of Using the FLUENT User Interface L3-1. Customer Training Material

ANSYS FLUENT. Lecture 3. Basic Overview of Using the FLUENT User Interface L3-1. Customer Training Material Lecture 3 Basic Overview of Using the FLUENT User Interface Introduction to ANSYS FLUENT L3-1 Parallel Processing FLUENT can readily be run across many processors in parallel. This will greatly speed up

More information

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static

More information

A Box-Consistency Contraction Operator Based on extremal Functions

A Box-Consistency Contraction Operator Based on extremal Functions A Box-Consistency Contraction Operator Based on extremal Functions Gilles Trombettoni, Yves Papegay, Gilles Chabert, Odile Pourtallier INRIA, Université de Nice-Sophia, France October 2008, El Paso, Texas

More information

Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17

Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17 Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17 11/27/2017 Background Many developers choose OpenMP in hopes of having a single source code that runs effectively anywhere (performance

More information

Geometric Modeling Assignment 3: Discrete Differential Quantities

Geometric Modeling Assignment 3: Discrete Differential Quantities Geometric Modeling Assignment : Discrete Differential Quantities Acknowledgements: Julian Panetta, Olga Diamanti Assignment (Optional) Topic: Discrete Differential Quantities with libigl Vertex Normals,

More information

Introduction to parallel Computing

Introduction to parallel Computing Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts

More information

Germán Llort

Germán Llort Germán Llort gllort@bsc.es >10k processes + long runs = large traces Blind tracing is not an option Profilers also start presenting issues Can you even store the data? How patient are you? IPDPS - Atlanta,

More information

Chapter 3 : Topology basics

Chapter 3 : Topology basics 1 Chapter 3 : Topology basics What is the network topology Nomenclature Traffic pattern Performance Packaging cost Case study: the SGI Origin 2000 2 Network topology (1) It corresponds to the static arrangement

More information

M.E.J. Newman: Models of the Small World

M.E.J. Newman: Models of the Small World A Review Adaptive Informatics Research Centre Helsinki University of Technology November 7, 2007 Vocabulary N number of nodes of the graph l average distance between nodes D diameter of the graph d is

More information

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC Lecture 9: Group Communication Operations Shantanu Dutt ECE Dept. UIC Acknowledgement Adapted from Chapter 4 slides of the text, by A. Grama w/ a few changes, augmentations and corrections Topic Overview

More information

Transactions on Information and Communications Technologies vol 3, 1993 WIT Press, ISSN

Transactions on Information and Communications Technologies vol 3, 1993 WIT Press,   ISSN The implementation of a general purpose FORTRAN harness for an arbitrary network of transputers for computational fluid dynamics J. Mushtaq, A.J. Davies D.J. Morgan ABSTRACT Many Computational Fluid Dynamics

More information

Multigrid Pattern. I. Problem. II. Driving Forces. III. Solution

Multigrid Pattern. I. Problem. II. Driving Forces. III. Solution Multigrid Pattern I. Problem Problem domain is decomposed into a set of geometric grids, where each element participates in a local computation followed by data exchanges with adjacent neighbors. The grids

More information

On Cluster Resource Allocation for Multiple Parallel Task Graphs

On Cluster Resource Allocation for Multiple Parallel Task Graphs On Cluster Resource Allocation for Multiple Parallel Task Graphs Henri Casanova Frédéric Desprez Frédéric Suter University of Hawai i at Manoa INRIA - LIP - ENS Lyon IN2P3 Computing Center, CNRS / IN2P3

More information

Graph Partitioning for Scalable Distributed Graph Computations

Graph Partitioning for Scalable Distributed Graph Computations Graph Partitioning for Scalable Distributed Graph Computations Aydın Buluç ABuluc@lbl.gov Kamesh Madduri madduri@cse.psu.edu 10 th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 3/3/08 CAP5510 1 Gene g Probe 1 Probe 2 Probe N 3/3/08 CAP5510

More information

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017 Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Takuma

More information

The Finite Element Method

The Finite Element Method The Finite Element Method A Practical Course G. R. Liu and S. S. Quek Chapter 1: Computational modeling An overview 1 CONTENTS INTRODUCTION PHYSICAL PROBLEMS IN ENGINEERING COMPUTATIONAL MODELLING USING

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD

More information

1.7.1 Laplacian Smoothing

1.7.1 Laplacian Smoothing 1.7.1 Laplacian Smoothing 320491: Advanced Graphics - Chapter 1 434 Theory Minimize energy functional total curvature estimate by polynomial-fitting non-linear (very slow!) 320491: Advanced Graphics -

More information