A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche
|
|
- Gwendoline Peters
- 5 years ago
- Views:
Transcription
1 A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche Hélène Renard 19th International Heterogeneity in Computing Workshop April 19, 2010 École polytechnique universitaire de Nice-Sophia Antipolis i3s Université Bordeaux 1 LaBRI
2 Abdou Guermouche SimGrid vs Real-Life 2 Plan of presentation 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
3 Framework Plan of presentation Abdou Guermouche SimGrid vs Real-Life 3 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
4 Abdou Guermouche SimGrid vs Real-Life 4 Framework Plan of presentation Data redistribution algorithms 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
5 Abdou Guermouche SimGrid vs Real-Life 5 Framework Data redistribution algorithms Data redistribution algorithms : context Target platforms: distributed heterogeneous platforms (network of workstations, clusters of clusters, grids, etc.) 1. Various sources of load imbalance : application requirements / platform. 2. The data must be redistributed to achieve a better load balancing. 3. No discussion of the mechanism of load balancing we consider it as given.
6 Abdou Guermouche SimGrid vs Real-Life 6 Framework Data redistribution algorithms Data redistribution algorithms : context The algorithm operates on a wide array of rectangular sample data: The array is split in vertical slices; This geometric constraint recommends that processors must be organized as a virtual ring: Each processor only communicates twice (once with each neighbor). x i 1,j x i,j 1 x i,j x i,j+1 x i+1,j P i 1 P i P i+1 Figure: Communication scheme.
7 Abdou Guermouche SimGrid vs Real-Life 7 Framework Data redistribution algorithms Redistribution problem for heterogeneous bidirectional rings Definition A redistribution is light if each processor initially owns all data that it will send during the execution of the algorithm. Minimize τ subject to S i,i+1 0 S i,i 1 0 S i,i+1 + S i,i 1 S i+1,i S i 1,i = δ i S i,i+1 c i,i+1 + S i,i 1 c i,i 1 τ S i+1,i c i+1,i + S i 1,i c i 1,i τ 1 i n 1 i n 1 i n 1 i n 1 i n (1) To lead to... We can use the solution of System 1 safely.
8 Abdou Guermouche SimGrid vs Real-Life 8 Framework Plan of presentation Heat propagation 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
9 Abdou Guermouche SimGrid vs Real-Life 9 Laplace equation Framework Heat propagation Context A metal plate to which is applied a source of heat from the edges. The heat will spread within plate. The temperature at the edges is kept constant, the heat distribution in the plate tends to a stationary state. Heat source Laplace equation : 2 f x f y 2 = 0 Heat source
10 Abdou Guermouche SimGrid vs Real-Life 10 Laplace equation : Framework Heat propagation 2 f x + 2 f 2 y = 0 2 Resolution : 1. Approximating the solution discretization grid n 2 points Heat source Heat source 2. Using finite differences on the Laplace equation, this is equivalent to iteratively solve the following equation: 4x i,j (x i 1,j + x i+1,j + x i,j 1 + x i,j+1 ) = 0
11 Abdou Guermouche SimGrid vs Real-Life 11 Laplace equation: Framework Heat propagation 2 f x + 2 f 2 y = 0 2 xi,j 1 xi 1,j xi,j xi,j+1 Same pattern of communication as the ring of processors xi+1,j Pi 1 Pi Pi+1 Figure: Communication scheme. Communication only with immediate neighbors. 3. Solving a linear system Jacobi, since it is of the form: Ax = b, with A and x as... x 1,1... x 1, = b x n,n 1 x n,n
12 Real-life and simulation Plan of presentation Abdou Guermouche SimGrid vs Real-Life Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
13 Abdou Guermouche SimGrid vs Real-Life 13 Real-life and simulation Plan of presentation 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
14 Abdou Guermouche SimGrid vs Real-Life 14 Real-life and simulation Goal : Compare the behavior of algorithms for load balancing and data redistribution on two different platforms : Grid 5000 SimGrid Figure: SimGrid Figure: Grid 5000
15 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Figure: Experimental scheme: the master and the workers. This organization is used in both the simulated and real-life context. The difference comes from the monitor which is given by SimGrid in the simulated context.
16 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Figure: Experimental scheme: the master and the workers. Master: Gather the results of the measurements. Call the redistribution algorithms when needed.
17 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Figure: Experimental scheme: the master and the workers. Monitor: Modify (using wrekavoc) the characteristics of the platform.
18 Abdou Guermouche SimGrid vs Real-Life 15 Real-life and simulation The master and the workers Cluster 1... Cluster 3 Monitor Master node (scheduler)... regular communications redistribution/state information monitoring... Cluster 2 Slaves: Figure: Experimental scheme: the master and the workers. Do all the computations and communications. Exchange data for redistribution according to the results of the master.
19 Abdou Guermouche SimGrid vs Real-Life 16 Real-life and simulation Plan of presentation Wrekavoc 1. Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
20 Abdou Guermouche SimGrid vs Real-Life 17 Real-life and simulation Wrekavoc Wrekavoc, in the center of both platforms 1. in our context, Wrekavoc is used to control CPU and network capabilities; of randomly chosen resources; in order to study the behavior of the application. Modify CPU speed Modify Memory available Node Daemon Modify Network bandwith & Latency Figure: Wrekavoc in pictures
21 Abdou Guermouche SimGrid vs Real-Life 18 Real-life and simulation Wrekavoc 1. Real and simulated execution: Retrieve through measurements: processor speed network latency inbound bandwidth Differences: Real execution: the modification of the characteristics of the platform are done using wrekavoc, Simulated execution: the modification of the characteristics of the platfrom is a built-in functionnality of SimGrid.
22 Experimental results Plan of presentation Abdou Guermouche SimGrid vs Real-Life Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
23 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: one site platform. Abdou Guermouche SimGrid vs Real-Life 20
24 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: two sites platform. Abdou Guermouche SimGrid vs Real-Life 21
25 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: five sites platform. Abdou Guermouche SimGrid vs Real-Life 22
26 Experimental results Time for iteration (in seconds) Real-life Simgrid Iteration number Time for iteration (in seconds) Real-life Simgrid Iteration number (a) No platform variation. (b) With platform variation (3 platform variations, once every 29 iterations). Figure: Time needed (in seconds) for each iteration on the real-life and the simulated platform: two sites platform. Each iteration is three time more costly than a regular one. Abdou Guermouche SimGrid vs Real-Life 23
27 Conclusion Plan of presentation Abdou Guermouche SimGrid vs Real-Life Framework Data redistribution algorithms Heat propagation 2. Real-life and simulation Wrekavoc 3. Experimental results 4. Conclusion
28 Abdou Guermouche SimGrid vs Real-Life 25 Conclusion Conclusion 1. Two versions of the same application: the propagation of heat Simulated implementation on top of SimGrid. Real-life implementation running on the Grid 5000 platform. Using wrekavoc to control the characteristics of the platform. Use the same platform characteristics over time in the two contexts. 2. The observed behavior for the simulated case is very close to that of a real execution. 3. A first step for validation of SimGrid in the context of complex applications.
Modalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010
A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche and Hélène Renard, LaBRI/Univ Bordeaux 1 I3S/École polytechnique universitaire de Nice-Sophia Antipolis May
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationHPC Algorithms and Applications
HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear
More informationAn introduction to mesh generation Part IV : elliptic meshing
Elliptic An introduction to mesh generation Part IV : elliptic meshing Department of Civil Engineering, Université catholique de Louvain, Belgium Elliptic Curvilinear Meshes Basic concept A curvilinear
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationAccurate emulation of CPU performance
Accurate emulation of CPU performance Tomasz Buchert 1 Lucas Nussbaum 2 Jens Gustedt 1 1 INRIA Nancy Grand Est 2 LORIA / Nancy - Université Validation of distributed systems Approaches: Theoretical approach
More informationJoe Wingbermuehle, (A paper written under the guidance of Prof. Raj Jain)
1 of 11 5/4/2011 4:49 PM Joe Wingbermuehle, wingbej@wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download The Auto-Pipe system allows one to evaluate various resource mappings and topologies
More informationComparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2
Comparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2 Jingwei Zhu March 19, 2014 Instructor: Surya Pratap Vanka 1 Project Description The purpose of this
More informationParallel FEM Computation and Multilevel Graph Partitioning Xing Cai
Parallel FEM Computation and Multilevel Graph Partitioning Xing Cai Simula Research Laboratory Overview Parallel FEM computation how? Graph partitioning why? The multilevel approach to GP A numerical example
More informationParallel Computing. Parallel Algorithm Design
Parallel Computing Parallel Algorithm Design Task/Channel Model Parallel computation = set of tasks Task Program Local memory Collection of I/O ports Tasks interact by sending messages through channels
More informationComputational Fluid Dynamics (CFD) using Graphics Processing Units
Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores
More informationf xx + f yy = F (x, y)
Application of the 2D finite element method to Laplace (Poisson) equation; f xx + f yy = F (x, y) M. R. Hadizadeh Computer Club, Department of Physics and Astronomy, Ohio University 4 Nov. 2013 Domain
More informationFractals. Investigating task farms and load imbalance
Fractals Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationA TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE
A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA
More informationFigure (5) Kohonen Self-Organized Map
2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;
More informationUniversity of Innsbruck. Topology Aware Data Organisation for Large Scale Simulations
University of Innsbruck Institute of Computer Science Research Group DPS (Distributed and Parallel Systems) Topology Aware Data Organisation for Large Scale Simulations Master Thesis Supervisor: Herbert
More informationFractals exercise. Investigating task farms and load imbalance
Fractals exercise Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationBreaking the memory barrier (for finite difference modeling)
Breaking the memory barrier (for finite difference modeling) Jon Marius Venstad Norwegian University of Science and Technology (NTNU) Department of Petroleum Engineering & Applied Geophysics E-mail: venstad@gmail.com
More informationParallelization Strategy
COSC 335 Software Design Parallel Design Patterns (II) Spring 2008 Parallelization Strategy Finding Concurrency Structure the problem to expose exploitable concurrency Algorithm Structure Supporting Structure
More informationParallel FDTD Solver with Static and Dynamic Load Balancing
Parallel FDTD Solver with Static and Dynamic Load Balancing Gleb Balykov Lomonosov Moscow State University, Moscow, 119991, Russia balykov.gleb@yandex.ru Abstract. Finite-difference time-domain method
More informationParallel Programming Patterns Overview and Concepts
Parallel Programming Patterns Overview and Concepts Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationMatrix multiplication
Matrix multiplication Standard serial algorithm: procedure MAT_VECT (A, x, y) begin for i := 0 to n - 1 do begin y[i] := 0 for j := 0 to n - 1 do y[i] := y[i] + A[i, j] * x [j] end end MAT_VECT Complexity:
More informationLecture 27: Board Notes: Parallel Programming Examples
Lecture 27: Board Notes: Parallel Programming Examples Part A: Consider the following binary search algorithm (a classic divide and conquer algorithm) that searches for a value X in a sorted N-element
More informationContents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet
Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage
More information1.2 Numerical Solutions of Flow Problems
1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian
More informationSpectral Compression of Mesh Geometry
Spectral Compression of Mesh Geometry Zachi Karni, Craig Gotsman SIGGRAPH 2000 1 Introduction Thus far, topology coding drove geometry coding. Geometric data contains far more information (15 vs. 3 bits/vertex).
More informationOverlapping Ring Monitoring Algorithm in TIPC
Overlapping Ring Monitoring Algorithm in TIPC Jon Maloy, Ericsson Canada Inc. Montreal April 7th 2017 PURPOSE When a cluster node becomes unresponsive due to crash, reboot or lost connectivity we want
More informationOptimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,
More informationHardware-Software Codesign
Hardware-Software Codesign 4. System Partitioning Lothar Thiele 4-1 System Design specification system synthesis estimation SW-compilation intellectual prop. code instruction set HW-synthesis intellectual
More information1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3
6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require
More informationParallel Poisson Solver in Fortran
Parallel Poisson Solver in Fortran Nilas Mandrup Hansen, Ask Hjorth Larsen January 19, 1 1 Introduction In this assignment the D Poisson problem (Eq.1) is to be solved in either C/C++ or FORTRAN, first
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationSynchronous Computations
Synchronous Computations Material based on B. Wilkinson et al., PARALLEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers c 2002-2004 R. Leduc Introduction We
More informationParallel Hybrid Monte Carlo Algorithms for Matrix Computations
Parallel Hybrid Monte Carlo Algorithms for Matrix Computations V. Alexandrov 1, E. Atanassov 2, I. Dimov 2, S.Branford 1, A. Thandavan 1 and C. Weihrauch 1 1 Department of Computer Science, University
More informationLarge Mesh Deformation Using the Volumetric Graph Laplacian
Large Mesh Deformation Using the Volumetric Graph Laplacian Kun Zhou1 Jin Huang2 John Snyder3 Xinguo Liu1 Hujun Bao2 Baining Guo1 Heung-Yeung Shum1 1 Microsoft Research Asia 2 Zhejiang University 3 Microsoft
More informationMPI Programming. Henrik R. Nagel Scientific Computing IT Division
1 MPI Programming Henrik R. Nagel Scientific Computing IT Division 2 Outline Introduction Finite Difference Method Finite Element Method LU Factorization SOR Method Monte Carlo Method Molecular Dynamics
More informationCSCE 5160 Parallel Processing. CSCE 5160 Parallel Processing
HW #9 10., 10.3, 10.7 Due April 17 { } Review Completing Graph Algorithms Maximal Independent Set Johnson s shortest path algorithm using adjacency lists Q= V; for all v in Q l[v] = infinity; l[s] = 0;
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationOptimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH
Optimising MPI Applications for Heterogeneous Coupled Clusters with MetaMPICH Carsten Clauss, Martin Pöppe, Thomas Bemmerl carsten@lfbs.rwth-aachen.de http://www.mp-mpich.de Lehrstuhl für Betriebssysteme
More informationParallel Greedy Matching Algorithms
Parallel Greedy Matching Algorithms Fredrik Manne Department of Informatics University of Bergen, Norway Rob Bisseling, University of Utrecht Md. Mostofa Patwary, University of Bergen 1 Outline Background
More informationDistributed Fine-Grained Node Localization in Ad-Hoc Networks. A Scalable Location Service for Geographic Ad Hoc Routing. Presented by An Nguyen
Distributed Fine-Grained Node Localization in Ad-Hoc Networks A Scalable Location Service for Geographic Ad Hoc Routing Presented by An Nguyen Distributed Fine-Grained Node Localization in Ad-Hoc Networks
More informationVery fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards
Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop
More informationEvent-driven computing
Event-driven computing Andrew Brown Southampton adb@ecs.soton.ac.uk Simon Moore Cambridge simon.moore@cl.cam.ac.uk David Thomas Imperial College d.thomas1@imperial.ac.uk Andrey Mokhov Newcastle andrey.mokhov@newcastle.ac.uk
More informationMonte Carlo Methods; Combinatorial Search
Monte Carlo Methods; Combinatorial Search Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 22, 2012 CPD (DEI / IST) Parallel and
More informationParallel Systems. Project topics
Parallel Systems Project topics 2016-2017 1. Scheduling Scheduling is a common problem which however is NP-complete, so that we are never sure about the optimality of the solution. Parallelisation is a
More informationby Mahender Reddy Concept To Reality / Summer 2006
by Mahender Reddy Demand for higher extrusion rates, increased product quality and lower energy consumption have prompted plants to use various methods to determine optimum process conditions and die designs.
More informationAMath 483/583 Lecture 21 May 13, 2011
AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versions of Jacobi iteration Gauss-Seidel and SOR iterative methods Next week: More MPI Debugging and totalview GPU computing Read: Class notes
More informationData mining with sparse grids
Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks
More informationScalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India
Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Outline of Presentation MPEG-2 to H.264 Transcoding Need for a multiprocessor
More informationParallel Algorithm Design. CS595, Fall 2010
Parallel Algorithm Design CS595, Fall 2010 1 Programming Models The programming model o determines the basic concepts of the parallel implementation and o abstracts from the hardware as well as from the
More informationModeling and Simulation of Hybrid Systems
Modeling and Simulation of Hybrid Systems Luka Stanisic Samuel Thibault Arnaud Legrand Brice Videau Jean-François Méhaut CNRS/Inria/University of Grenoble, France University of Bordeaux/Inria, France JointLab
More informationTask Allocation for Minimizing Programs Completion Time in Multicomputer Systems
Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems Gamal Attiya and Yskandar Hamam Groupe ESIEE Paris, Lab. A 2 SI Cité Descartes, BP 99, 93162 Noisy-Le-Grand, FRANCE {attiyag,hamamy}@esiee.fr
More informationParallel Pencil Beam Redefinition Algorithm. Paul Alderson Mark Wright Amit Jain Robert Boyd
Parallel Pencil Beam Redefinition Algorithm Paul Alderson Mark Wright Amit Jain Robert Boyd Problem Definition Radiation Therapy Pencil Beam Redefinition Algorithm (PBRA) calculates radiation dose distributions.
More informationAMath 483/583 Lecture 24. Notes: Notes: Steady state diffusion. Notes: Finite difference method. Outline:
AMath 483/583 Lecture 24 Outline: Heat equation and discretization OpenMP and MPI for iterative methods Jacobi, Gauss-Seidel, SOR Notes and Sample codes: Class notes: Linear algebra software $UWHPSC/codes/openmp/jacobi1d_omp1.f90
More informationRuntime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism
1 Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism Takeshi Nanri (Kyushu Univ. and JST CREST, Japan) 16 Aug, 2016 4th Annual MVAPICH Users Group Meeting 2 Background
More informationOverview of MUMPS (A multifrontal Massively Parallel Solver)
Overview of MUMPS (A multifrontal Massively Parallel Solver) E. Agullo, P. Amestoy, A. Buttari, P. Combes, A. Guermouche, J.-Y. L Excellent, T. Slavova, B. Uçar CERFACS, ENSEEIHT-IRIT, INRIA (LIP and LaBRI),
More informationFluent User Services Center
Solver Settings 5-1 Using the Solver Setting Solver Parameters Convergence Definition Monitoring Stability Accelerating Convergence Accuracy Grid Independence Adaption Appendix: Background Finite Volume
More information10th August Part One: Introduction to Parallel Computing
Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer
More informationAMath 483/583 Lecture 24
AMath 483/583 Lecture 24 Outline: Heat equation and discretization OpenMP and MPI for iterative methods Jacobi, Gauss-Seidel, SOR Notes and Sample codes: Class notes: Linear algebra software $UWHPSC/codes/openmp/jacobi1d_omp1.f90
More informationScheduling Tasks Sharing Files from Distributed Repositories
from Distributed Repositories Arnaud Giersch 1, Yves Robert 2 and Frédéric Vivien 2 1 ICPS/LSIIT, University Louis Pasteur, Strasbourg, France 2 École normale supérieure de Lyon, France September 1, 2004
More informationContents. I The Basic Framework for Stationary Problems 1
page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other
More informationUnit I Reading Graphical Methods
Unit I Reading Graphical Methods One of the most effective tools for the visual evaluation of data is a graph. The investigator is usually interested in a quantitative graph that shows the relationship
More informationDesigning Parallel Programs. This review was developed from Introduction to Parallel Computing
Designing Parallel Programs This review was developed from Introduction to Parallel Computing Author: Blaise Barney, Lawrence Livermore National Laboratory references: https://computing.llnl.gov/tutorials/parallel_comp/#whatis
More informationDesign of Parallel Programs Algoritmi e Calcolo Parallelo. Daniele Loiacono
Design of Parallel Programs Algoritmi e Calcolo Parallelo Web: home.dei.polimi.it/loiacono Email: loiacono@elet.polimi.it References q The material in this set of slide is taken from two tutorials by Blaise
More informationA Decoupled Scheduling Approach for the GrADS Program Development Environment. DCSL Ahmed Amin
A Decoupled Scheduling Approach for the GrADS Program Development Environment DCSL Ahmed Amin Outline Introduction Related Work Scheduling Architecture Scheduling Algorithm Testbench Results Conclusions
More informationON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS
ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS NURNABI MEHERUL ALAM M.Sc. (Agricultural Statistics), Roll No. I.A.S.R.I, Library Avenue, New Delhi- Chairperson: Dr. P.K. Batra Abstract: Block designs
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationComplexity results for throughput and latency optimization of replicated and data-parallel workflows
Complexity results for throughput and latency optimization of replicated and data-parallel workflows Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon June 2007 Anne.Benoit@ens-lyon.fr
More informationA Technique for improving the scheduling of network communicating processes in MOSIX
A Technique for improving the scheduling of network communicating processes in MOSIX Rengakrishnan Subramanian Masters Report, Final Defense Guidance by Prof. Dan Andresen Agenda MOSIX Network communicating
More informationSmoothing & Fairing. Mario Botsch
Smoothing & Fairing Mario Botsch Motivation Filter out high frequency noise Desbrun, Meyer, Schroeder, Barr: Implicit Fairing of Irregular Meshes using Diffusion and Curvature Flow, SIGGRAPH 99 2 Motivation
More informationNon supervised classification of individual electricity curves
Non supervised classification of individual electricity curves Jairo Cugliari 17 février 2014 Joint work with Benjamin Auder (LMO, Université Paris-Sud) Outline 1 Motivation 2 Functional clustering 3 Parallel
More informationThe Overall SHRIMP Project. Related Work (from last section) Paper Goals. Bill Kramer April 17, 2002
CS 258 Reading Assignment 16 Discussion Design Choice in the SHRIMP System: An Empirical Study Bill Kramer April 17, 2002 # The Overall SHRIMP Project The SHRIMP (Scalable High-performance Really Inexpensive
More informationHybrid MPI + OpenMP Approach to Improve the Scalability of a Phase-Field-Crystal Code
Hybrid MPI + OpenMP Approach to Improve the Scalability of a Phase-Field-Crystal Code Reuben D. Budiardja reubendb@utk.edu ECSS Symposium March 19 th, 2013 Project Background Project Team (University of
More informationRT 3D FDTD Simulation of LF and MF Room Acoustics
RT 3D FDTD Simulation of LF and MF Room Acoustics ANDREA EMANUELE GRECO Id. 749612 andreaemanuele.greco@mail.polimi.it ADVANCED COMPUTER ARCHITECTURES (A.A. 2010/11) Prof.Ing. Cristina Silvano Dr.Ing.
More informationVirtual Network Embedding with Substrate Support for Parallelization
Virtual Network Embedding with Substrate Support for Parallelization Sheng Zhang, Nanjing University, P.R. China Jie Wu, Temple University, USA Sanglu Lu, Nanjing University, P.R. China Presenter: Pouya
More informationNew Challenges In Dynamic Load Balancing
New Challenges In Dynamic Load Balancing Karen D. Devine, et al. Presentation by Nam Ma & J. Anthony Toghia What is load balancing? Assignment of work to processors Goal: maximize parallel performance
More informationHigh Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters
SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014
More informationApplication Example Running on Top of GPI-Space Integrating D/C
Application Example Running on Top of GPI-Space Integrating D/C Tiberiu Rotaru Fraunhofer ITWM This project is funded from the European Union s Horizon 2020 Research and Innovation programme under Grant
More informationHigh Performance Computing: Tools and Applications
High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 15 Numerically solve a 2D boundary value problem Example:
More informationParallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)
Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication
More informationANSYS FLUENT. Lecture 3. Basic Overview of Using the FLUENT User Interface L3-1. Customer Training Material
Lecture 3 Basic Overview of Using the FLUENT User Interface Introduction to ANSYS FLUENT L3-1 Parallel Processing FLUENT can readily be run across many processors in parallel. This will greatly speed up
More informationGraph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen
Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static
More informationA Box-Consistency Contraction Operator Based on extremal Functions
A Box-Consistency Contraction Operator Based on extremal Functions Gilles Trombettoni, Yves Papegay, Gilles Chabert, Odile Pourtallier INRIA, Université de Nice-Sophia, France October 2008, El Paso, Texas
More informationPortability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17
Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17 11/27/2017 Background Many developers choose OpenMP in hopes of having a single source code that runs effectively anywhere (performance
More informationGeometric Modeling Assignment 3: Discrete Differential Quantities
Geometric Modeling Assignment : Discrete Differential Quantities Acknowledgements: Julian Panetta, Olga Diamanti Assignment (Optional) Topic: Discrete Differential Quantities with libigl Vertex Normals,
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More informationGermán Llort
Germán Llort gllort@bsc.es >10k processes + long runs = large traces Blind tracing is not an option Profilers also start presenting issues Can you even store the data? How patient are you? IPDPS - Atlanta,
More informationChapter 3 : Topology basics
1 Chapter 3 : Topology basics What is the network topology Nomenclature Traffic pattern Performance Packaging cost Case study: the SGI Origin 2000 2 Network topology (1) It corresponds to the static arrangement
More informationM.E.J. Newman: Models of the Small World
A Review Adaptive Informatics Research Centre Helsinki University of Technology November 7, 2007 Vocabulary N number of nodes of the graph l average distance between nodes D diameter of the graph d is
More informationLecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC
Lecture 9: Group Communication Operations Shantanu Dutt ECE Dept. UIC Acknowledgement Adapted from Chapter 4 slides of the text, by A. Grama w/ a few changes, augmentations and corrections Topic Overview
More informationTransactions on Information and Communications Technologies vol 3, 1993 WIT Press, ISSN
The implementation of a general purpose FORTRAN harness for an arbitrary network of transputers for computational fluid dynamics J. Mushtaq, A.J. Davies D.J. Morgan ABSTRACT Many Computational Fluid Dynamics
More informationMultigrid Pattern. I. Problem. II. Driving Forces. III. Solution
Multigrid Pattern I. Problem Problem domain is decomposed into a set of geometric grids, where each element participates in a local computation followed by data exchanges with adjacent neighbors. The grids
More informationOn Cluster Resource Allocation for Multiple Parallel Task Graphs
On Cluster Resource Allocation for Multiple Parallel Task Graphs Henri Casanova Frédéric Desprez Frédéric Suter University of Hawai i at Manoa INRIA - LIP - ENS Lyon IN2P3 Computing Center, CNRS / IN2P3
More informationGraph Partitioning for Scalable Distributed Graph Computations
Graph Partitioning for Scalable Distributed Graph Computations Aydın Buluç ABuluc@lbl.gov Kamesh Madduri madduri@cse.psu.edu 10 th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 3/3/08 CAP5510 1 Gene g Probe 1 Probe 2 Probe N 3/3/08 CAP5510
More informationImplicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC
Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017 Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Takuma
More informationThe Finite Element Method
The Finite Element Method A Practical Course G. R. Liu and S. S. Quek Chapter 1: Computational modeling An overview 1 CONTENTS INTRODUCTION PHYSICAL PROBLEMS IN ENGINEERING COMPUTATIONAL MODELLING USING
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More information1.7.1 Laplacian Smoothing
1.7.1 Laplacian Smoothing 320491: Advanced Graphics - Chapter 1 434 Theory Minimize energy functional total curvature estimate by polynomial-fitting non-linear (very slow!) 320491: Advanced Graphics -
More information