Modalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010
1 A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard; LaBRI/Univ Bordeaux 1; I3S/École polytechnique universitaire de Nice-Sophia Antipolis. May 5, 2010, Modalis.
2 Plan of presentation (Hélène Renard, SimGrid vs Real-Life)
1. Framework: data redistribution algorithms; heat propagation
2. Real-life and simulation: scheduling & heat; Wrekavoc
3. Experimental results
4. Conclusion and future works
5 Framework: Data redistribution algorithms: context
Target platforms: distributed heterogeneous platforms (networks of workstations, clusters of clusters, grids, etc.)
1. Various sources of load imbalance: application requirements and platform characteristics.
2. The data must be redistributed to achieve better load balancing.
3. The load-balancing mechanism itself is not discussed; we consider it as given.
6 Framework: Data redistribution algorithms: context
The algorithm operates on a large rectangular array of data samples:
- The array is split into vertical slices.
- This geometric constraint suggests organizing the processors as a virtual ring.
- Each processor communicates only twice (once with each neighbor).
Figure: Communication scheme (point $x_{i,j}$ with its neighbors $x_{i-1,j}$, $x_{i+1,j}$, $x_{i,j-1}$, $x_{i,j+1}$; processors $P_{i-1}$, $P_i$, $P_{i+1}$).
7 Framework: Data redistribution algorithms: Redistribution problem for heterogeneous bidirectional rings
Definition: A redistribution is light if each processor initially owns all the data that it will send during the execution of the algorithm.
Minimize $\tau$ subject to, for all $1 \le i \le n$:
\[
\begin{aligned}
S_{i,i+1} &\ge 0 \\
S_{i,i-1} &\ge 0 \\
S_{i,i+1} + S_{i,i-1} - S_{i+1,i} - S_{i-1,i} &= \delta_i \\
S_{i,i+1}\, c_{i,i+1} + S_{i,i-1}\, c_{i,i-1} &\le \tau \\
S_{i+1,i}\, c_{i+1,i} + S_{i-1,i}\, c_{i-1,i} &\le \tau
\end{aligned}
\qquad (1)
\]
To lead to... We can use the solution of System (1) safely.
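As an illustration of System (1), the following Python sketch checks a candidate redistribution on a ring and computes the smallest $\tau$ that satisfies the send and receive constraints. It does not solve the minimization problem. The function names, the 0-based indexing, and the sample volumes and costs are assumptions made for this example; they do not come from the slides.

```python
def imbalances(send_next, send_prev):
    """delta_i = data sent by processor i minus data it receives.

    send_next[i] plays the role of S_{i,i+1}, send_prev[i] of S_{i,i-1};
    indices wrap around because the processors form a ring.
    """
    n = len(send_next)
    return [
        send_next[i] + send_prev[i]
        - send_prev[(i + 1) % n]   # S_{i+1,i}: received from the successor
        - send_next[(i - 1) % n]   # S_{i-1,i}: received from the predecessor
        for i in range(n)
    ]


def redistribution_time(send_next, send_prev, cost_next, cost_prev):
    """Smallest tau allowed by the send and receive constraints of System (1).

    cost_next[i] plays the role of c_{i,i+1}, cost_prev[i] of c_{i,i-1}.
    """
    if any(v < 0 for v in send_next + send_prev):
        raise ValueError("volumes S must be non-negative")
    n = len(send_next)
    tau = 0
    for i in range(n):
        succ, pred = (i + 1) % n, (i - 1) % n
        send_time = send_next[i] * cost_next[i] + send_prev[i] * cost_prev[i]
        recv_time = (send_prev[succ] * cost_prev[succ]
                     + send_next[pred] * cost_next[pred])
        tau = max(tau, send_time, recv_time)
    return tau


# Hypothetical 3-processor ring: processor 0 is overloaded by 3 data items.
send_next = [2, 0, 0]
send_prev = [1, 0, 0]
cost_next = [1, 2, 1]
cost_prev = [1, 1, 3]
print(imbalances(send_next, send_prev))
print(redistribution_time(send_next, send_prev, cost_next, cost_prev))
```

In this made-up instance the imbalances sum to zero, as they must for any redistribution, and the bottleneck is the sending time of the overloaded processor.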
9 Framework: Heat propagation: Context
A heat source is applied to the edges of a metal plate, and the heat spreads within the plate. Since the temperature at the edges is kept constant, the heat distribution in the plate tends to a stationary state.
Laplace equation: $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$
Figure: metal plate with heat sources on its edges.
10 Framework: Heat propagation
Laplace equation: $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$
Resolution:
1. Approximate the solution on a discretization grid of $n^2$ points.
2. Using finite differences on the Laplace equation, this is equivalent to iteratively solving the following equation: $4x_{i,j} - (x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1}) = 0$
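The finite-difference equation above says that each interior point equals the mean of its four neighbors, which leads directly to a Jacobi-style sweep. The Python sketch below is only an illustration on a single machine (the application described in these slides is written in C and distributed). The function names, the tolerance, and the small sample plate are assumptions.

```python
def jacobi_step(grid):
    """Return a new grid where each interior point is the mean of its 4 neighbors.

    Boundary rows and columns hold the fixed edge temperatures and are copied
    unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return new


def solve(grid, tolerance=1e-6, max_iterations=10_000):
    """Repeat Jacobi sweeps until the largest point-wise change is below tolerance."""
    for _ in range(max_iterations):
        new = jacobi_step(grid)
        change = max(abs(a - b)
                     for old_row, new_row in zip(grid, new)
                     for a, b in zip(old_row, new_row))
        grid = new
        if change < tolerance:
            break
    return grid


# Hypothetical plate: the top edge is heated to 100, the other edges stay at 0.
plate = [[100.0, 100.0, 100.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
print(solve(plate)[1][1])
```

With a single interior point the stationary value is reached after one sweep; larger grids need many sweeps, which is what makes the per-iteration time worth measuring.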
11 Framework: Heat propagation
Laplace equation: $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$
Figure: Communication scheme (point $x_{i,j}$ with its neighbors $x_{i-1,j}$, $x_{i+1,j}$, $x_{i,j-1}$, $x_{i,j+1}$; processors $P_{i-1}$, $P_i$, $P_{i+1}$). Same pattern of communication as the ring of processors: communication only with immediate neighbors.
3. Solve a linear system with the Jacobi method, since it is of the form $Ax = b$, with $A$ the finite-difference matrix and $x = (x_{1,1}, x_{1,2}, \dots, x_{n,n-1}, x_{n,n})^T$.
12 Framework: Heat propagation
Laplace equation: $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$
Enrichment of the matrix: the vector $b$ is zero except at the points neighboring the heat sources (on the lower and upper edges).
$A x = b$, with $x = (x_{1,1}, x_{1,2}, x_{1,3}, \dots, x_{5,4}, x_{5,5})^T$ and $b$ containing the source terms $t_1, t_2, \dots, t_9, t_{10}$ and zeros elsewhere.
15 Real-life and simulation: Scheduling & heat
Figure: Data redistribution (processors P1 to P5). Figure: Heat propagation.
16 Real-life and simulation: Scheduling & heat: Ring history
The plate is too large! We split it among a set of processors. The communication pattern follows a ring organization.
Figure: plate with heat sources, split among the processors, with re-injection.
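One way to picture the split is to cut the columns of the plate into contiguous vertical slices, one per processor of the ring. The sketch below sizes each slice in proportion to a processor speed; this proportional rule, the function name, and the sample speeds are assumptions for illustration, since the slides do not state how the initial slices are sized.

```python
def split_columns(n_columns, speeds):
    """Cut columns 0..n_columns-1 into contiguous slices, one per processor.

    Slice widths are proportional to the given speeds (an assumed rule);
    leftover columns go to the processors with the largest fractional share.
    Returns a list of (start, end) pairs with end exclusive.
    """
    total = sum(speeds)
    quotas = [n_columns * s / total for s in speeds]
    counts = [int(q) for q in quotas]
    leftover = n_columns - sum(counts)
    # Sort processor indices by decreasing fractional part (stable on ties).
    by_fraction = sorted(range(len(speeds)), key=lambda i: counts[i] - quotas[i])
    for i in by_fraction[:leftover]:
        counts[i] += 1
    slices, start = [], 0
    for width in counts:
        slices.append((start, start + width))
        start += width
    return slices


# Hypothetical ring of 3 processors, the last one twice as fast as the others.
print(split_columns(10, [1, 1, 2]))
```

Because the slices are contiguous, each processor only needs the border columns of its two neighbors, which matches the ring communication pattern.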
18 Real-life and simulation
Goal: compare the behavior of load-balancing and data redistribution algorithms on two different platforms: Grid 5000 and SimGrid.
Figures: SimGrid; Grid 5000.
19 Real-life and simulation: The corresponding code and two algorithms
The corresponding code: written in C, with standard UNIX sockets for communication and the XDR layer for interoperable communications between heterogeneous machines. No MPI for the communication layer.
Algorithm 1: Iterative scheme.
while end not detected do
    if I am master then
        if modulo(current iteration number, interval) == 0 then
            Wait for state information from all workers;
            Use the algorithm to build redistribution information;
            Send redistribution information to each worker.
        end
    else
        Exchange data with neighbors;
        Update local data and process current iteration;
        if modulo(current iteration number, interval) == 0 then
            Perform benchmarks to get the new characteristics of my processor and my network links;
            Send my new state to master;
            Wait for redistribution information from master;
            Apply the redistribution algorithm according to the decision of master.
        end
    end
end
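The effect of the iterative scheme can be illustrated with a toy, single-process model in Python. It is not the authors' C code: the time of an iteration is modeled as the time of the slowest worker, and the master's redistribution is replaced by a simple split proportional to the reported speeds, which stands in for the ring redistribution algorithm. All names and numbers are assumptions.

```python
def run(loads, speeds, iterations, interval):
    """Toy model of the master/worker loop; returns the time of each iteration.

    loads[i]  : amount of data held by worker i
    speeds[i] : data items worker i processes per time unit (as benchmarked)
    Every `interval` iterations the master rebuilds the distribution.
    """
    loads = list(loads)
    times = []
    for iteration in range(1, iterations + 1):
        # Workers: the iteration lasts as long as the slowest worker needs.
        times.append(max(load / speed for load, speed in zip(loads, speeds)))
        if iteration % interval == 0:
            # Master: gather states, then redistribute. Assumed rule: loads
            # proportional to speeds (a stand-in for the real algorithm).
            total_load = sum(loads)
            total_speed = sum(speeds)
            loads = [total_load * speed / total_speed for speed in speeds]
    return times


# Hypothetical platform: two workers, the second four times faster.
print(run(loads=[50, 50], speeds=[1, 4], iterations=4, interval=2))
```

In this model the iteration time drops once the first redistribution has been applied, which is the kind of per-iteration curve compared in the experimental results.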
20 Real-life and simulation: The corresponding code and two algorithms
Algorithm 2: Monitor scheme.
while end not detected do
    if I need to modify the platform then
        Pick a random number of resources to degrade;
        for each selected resource do
            Pick a random degradation factor from the interval [40; 100]
        end
        Apply the modification of the characteristics of the platform using Wrekavoc;
        Generate the corresponding simulated platform.
    end
end
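The random choices made by the monitor can be sketched in Python as follows. The sketch only draws the resources and their degradation factors; it does not call Wrekavoc or generate a SimGrid platform. The function name, the resource names, the use of real-valued factors, and the rule that at least one resource is selected are assumptions, and the unit of the interval [40; 100] is not stated in the slides.

```python
import random


def pick_degradations(resources, rng):
    """Pick a random subset of resources and a degradation factor for each.

    Factors are drawn uniformly from [40, 100], the interval given in
    Algorithm 2. Returns a dict mapping resource name to factor.
    """
    count = rng.randint(1, len(resources))   # assumed: at least one resource
    selected = rng.sample(resources, count)
    return {name: rng.uniform(40, 100) for name in selected}


# Hypothetical node names; a seeded generator makes the draw repeatable, so the
# same modifications can be replayed on the real and the simulated platform.
rng = random.Random(2010)
for name, factor in pick_degradations(["node-1", "node-2", "node-3"], rng).items():
    print(f"{name}: degradation factor {factor:.1f}")
```

Seeding the generator is a design choice of this sketch: it is one simple way to apply the same platform characteristics over time in both contexts.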
21 Real-life and simulation: The master and the workers
Figure: Experimental scheme: the master and the workers (Clusters 1 to 3, a monitor, and a master node acting as scheduler; links for regular communications, redistribution/state information, and monitoring).
This organization is used in both the simulated and the real-life contexts. The difference comes from the monitor, which is provided by SimGrid in the simulated context.
22 Master: gathers the results of the measurements and calls the redistribution algorithms when needed.
23 Monitor: modifies (using Wrekavoc) the characteristics of the platform.
24 Workers: do all the computations and communications, and exchange data for redistribution according to the decisions of the master.
26 Real-life and simulation: Wrekavoc
Wrekavoc, at the center of both platforms
In our context, Wrekavoc is used to control the CPU and network capabilities of randomly chosen resources in order to study the behavior of the application.
Figure: Wrekavoc in pictures (a daemon on each node modifies the CPU speed, the available memory, and the network bandwidth and latency).
27 Real-life and simulation: Wrekavoc
Real and simulated execution: the following are retrieved through measurements: processor speed, network latency, inbound bandwidth.
Differences:
- Real execution: the characteristics of the platform are modified using Wrekavoc.
- Simulated execution: the modification of the characteristics of the platform is a built-in functionality of SimGrid.
29 Experimental results
Table: Description of the experimental platforms (columns: Bordeaux, Grenoble, Lille, Lyon, Nancy, Orsay, Rennes, Sophia, Toulouse, total; one row per platform).
Grid 5000:
- A highly reconfigurable, controllable and monitorable experimental platform.
- Three different sets of results.
- Neither the master nor the monitor is counted.
SimGrid:
- Version 3.3.2.
- Simulacrum tool to generate the XML description of the platform.
- Provides the theoretical characteristics of the platform.
30 Experimental results
Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: one site platform. Panels: (a) no platform variation; (b) with platform variation (3 platform variations, once every 29 iterations). Axes: iteration number, time for iteration (in seconds). Curves: Real Life, SimGrid.
31 Experimental results
Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: two sites platform. Panels: (a) no platform variation; (b) with platform variation (3 platform variations, once every 29 iterations). Axes: iteration number, time for iteration (in seconds). Curves: Real Life, SimGrid.
32 Experimental results
Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: five sites platform. Panels: (a) no platform variation; (b) with platform variation (3 platform variations, once every 29 iterations). Axes: iteration number, time for iteration (in seconds). Curves: Real Life, SimGrid.
33 Experimental results
Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: two sites platform. Each iteration is three times more costly than a regular one. Panels: (a) no platform variation; (b) with platform variation (3 platform variations, once every 29 iterations). Axes: iteration number, time for iteration (in seconds). Curves: Real Life, SimGrid.
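The figures compare two per-iteration time series. One simple way to quantify how close the simulated curve is to the real one is the relative error per iteration, sketched below in Python. This metric is an assumption of the sketch, not something reported in the slides, and the sample timings are made up for illustration only.

```python
def relative_errors(real_times, simulated_times):
    """Per-iteration relative error |simulated - real| / real."""
    if len(real_times) != len(simulated_times):
        raise ValueError("both series must cover the same iterations")
    return [abs(sim - real) / real
            for real, sim in zip(real_times, simulated_times)]


def mean_relative_error(real_times, simulated_times):
    """Average of the per-iteration relative errors."""
    errors = relative_errors(real_times, simulated_times)
    return sum(errors) / len(errors)


# Made-up timings in seconds (NOT measurements from the presentation).
real = [10.0, 20.0, 40.0]
simulated = [11.0, 19.0, 40.0]
print(relative_errors(real, simulated))
print(mean_relative_error(real, simulated))
```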
35 Conclusion and future works: Conclusion
1. Two versions of the same application (the propagation of heat):
- a simulated implementation on top of SimGrid;
- a real-life implementation running on the Grid 5000 platform, using Wrekavoc to control the characteristics of the platform;
- the same platform characteristics over time are used in the two contexts.
2. The observed behavior in the simulated case is very close to that of a real execution.
3. A first step toward the validation of SimGrid in the context of complex applications.
36 Conclusion and future works: Future works
1. SimGrid vs Grid 5000: tightly coupled application where network models have to be, in general, accurate. (1 paper)
See below:
2. Local update: check whether it is profitable to replace a processor in the ring with a processor that does not belong to it. (2 papers or more)
3. Global change: new solution from scratch. (2 papers or more)
More informationAdvanced Computer Architecture Lab 3 Scalability of the Gauss-Seidel Algorithm
Advanced Computer Architecture Lab 3 Scalability of the Gauss-Seidel Algorithm Andreas Sandberg 1 Introduction The purpose of this lab is to: apply what you have learned so
More informationDistributed Scheduling. Distributed Scheduling
CMSC 621, Advanced Operating Systems. Fall 2003 Distributed Scheduling Dr. Kalpakis Distributed Scheduling System performance can be improved by distributed load among heavy and light loaded nodes of the
More informationCPS343 Parallel and High Performance Computing Project 1 Spring 2018
CPS343 Parallel and High Performance Computing Project 1 Spring 2018 Assignment Write a program using OpenMP to compute the estimate of the dominant eigenvalue of a matrix Due: Wednesday March 21 The program
More informationEvent-driven computing
Event-driven computing Andrew Brown Southampton adb@ecs.soton.ac.uk Simon Moore Cambridge simon.moore@cl.cam.ac.uk David Thomas Imperial College d.thomas1@imperial.ac.uk Andrey Mokhov Newcastle andrey.mokhov@newcastle.ac.uk
More informationNumerical Methods in Physics Lecture 2 Interpolation
Numerical Methods in Physics Pat Scott Department of Physics, Imperial College November 8, 2016 Slides available from http://astro.ic.ac.uk/pscott/ course-webpage-numerical-methods-201617 Outline The problem
More informationCHAO YANG. Early Experience on Optimizations of Application Codes on the Sunway TaihuLight Supercomputer
CHAO YANG Dr. Chao Yang is a full professor at the Laboratory of Parallel Software and Computational Sciences, Institute of Software, Chinese Academy Sciences. His research interests include numerical
More informationSILECS Super Infrastructure for Large-scale Experimental Computer Science
Super Infrastructure for Large-scale Experimental Computer Science Serge Fdida (UPMC) Frédéric Desprez (Inria) Christian Perez (Inria) INRIA, CNRS, RENATER, CEA, CPU, CDEFI, IMT, Sorbonne Universite, Universite
More informationAn Overview of Parallel Computing
An Overview of Parallel Computing Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS2101 Plan 1 Hardware 2 Types of Parallelism 3 Concurrency Platforms: Three Examples Cilk CUDA
More informationEfficient AMG on Hybrid GPU Clusters. ScicomP Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann. Fraunhofer SCAI
Efficient AMG on Hybrid GPU Clusters ScicomP 2012 Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann Fraunhofer SCAI Illustration: Darin McInnis Motivation Sparse iterative solvers benefit from
More informationMaster-Worker pattern
COSC 6397 Big Data Analytics Master Worker Programming Pattern Edgar Gabriel Fall 2018 Master-Worker pattern General idea: distribute the work among a number of processes Two logically different entities:
More informationParallel Algorithm Design. CS595, Fall 2010
Parallel Algorithm Design CS595, Fall 2010 1 Programming Models The programming model o determines the basic concepts of the parallel implementation and o abstracts from the hardware as well as from the
More informationCMSoft Case Study: Fast Simulation of Heat Transfer using C# and OpenCL
CMSoft Case Study: Fast Simulation of Heat Transfer using C# and OpenCL Simulation of heat transfer in a solid plate with Dirichlet conditions via finite differences using GPU acceleration Time evolution
More informationA Parallel Solver for Laplacian Matrices. Tristan Konolige (me) and Jed Brown
A Parallel Solver for Laplacian Matrices Tristan Konolige (me) and Jed Brown Graph Laplacian Matrices Covered by other speakers (hopefully) Useful in a variety of areas Graphs are getting very big Facebook
More informationAn Overview of Mathematics 6
An Overview of Mathematics 6 Number (N) read, write, represent, and describe numbers greater than one million and less than one-thousandth using symbols, expressions, expanded notation, decimal notation,
More informationAdvanced Parallel Programming
Sebastian von Alfthan Jussi Enkovaara Pekka Manninen Advanced Parallel Programming February 15-17, 2016 PRACE Advanced Training Center CSC IT Center for Science Ltd, Finland All material (C) 2011-2016
More informationRollback-Recovery Protocols for Send-Deterministic Applications. Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir and Franck Cappello
Rollback-Recovery Protocols for Send-Deterministic Applications Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir and Franck Cappello Fault Tolerance in HPC Systems is Mandatory Resiliency is
More information10th August Part One: Introduction to Parallel Computing
Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer
More informationEfficient Grid Resource Selection for a CEM Application
Efficient Grid Resource Selection for a CEM Application Cristian KLEIN 15 June 2009 Under the supervision of: Eddy CARON, Christian PÉREZ Location: GRAAL, LIP, École Normale Supérieure de Lyon Abstract
More informationHardware-Software Codesign
Hardware-Software Codesign 4. System Partitioning Lothar Thiele 4-1 System Design specification system synthesis estimation SW-compilation intellectual prop. code instruction set HW-synthesis intellectual
More informationPower-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters
Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters Gregor von Laszewski, Lizhe Wang, Andrew J. Younge, Xi He Service Oriented Cyberinfrastructure Lab Rochester Institute of Technology,
More informationEdge and local feature detection - 2. Importance of edge detection in computer vision
Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature
More informationReal-time grid computing for financial applications
CNR-INFM Democritos and EGRID project E-mail: cozzini@democritos.it Riccardo di Meo, Ezio Corso EGRID project ICTP E-mail: {dimeo,ecorso}@egrid.it We describe the porting of a test case financial application
More informationParallelizing a Monte Carlo simulation of the Ising model in 3D
Parallelizing a Monte Carlo simulation of the Ising model in 3D Morten Diesen, Erik Waltersson 2nd November 24 Contents 1 Introduction 2 2 Description of the Physical Model 2 3 Programs 3 3.1 Outline of
More informationMultiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message pas
Multiple processor systems 1 Multiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor, access
More informationA MODELING METHOD OF CURING DEFORMATION FOR CFRP COMPOSITE STIFFENED PANEL WANG Yang 1, GAO Jubin 1 BO Ma 1 LIU Chuanjun 1
21 st International Conference on Composite Materials Xi an, 20-25 th August 2017 A MODELING METHOD OF CURING DEFORMATION FOR CFRP COMPOSITE STIFFENED PANEL WANG Yang 1, GAO Jubin 1 BO Ma 1 LIU Chuanjun
More informationOperating Systems, Fall Lecture 9, Tiina Niklander 1
Multiprocessor Systems Multiple processor systems Ch 8.1 8.3 1 Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor,
More informationLoad Balancing in Distributed System through Task Migration
Load Balancing in Distributed System through Task Migration Santosh Kumar Maurya 1 Subharti Institute of Technology & Engineering Meerut India Email- santoshranu@yahoo.com Khaleel Ahmad 2 Assistant Professor
More informationCSE 262 Lecture 13. Communication overlap Continued Heterogeneous processing
CSE 262 Lecture 13 Communication overlap Continued Heterogeneous processing Final presentations Announcements Friday March 13 th, 10:00 AM to 1:00PM Note time change. Room 3217, CSE Building (EBU3B) Scott
More information3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA
3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires
More information