Load Balanced Parallel Simulated Annealing on a Cluster of SMP Nodes
|
|
- Loraine McCarthy
- 6 years ago
- Views:
Transcription
1 Load Balanced Parallel Simulated Annealing on a Cluster of SMP Nodes Agnieszka Debudaj-Grabysz 1 and Rolf Rabenseifner 2 1 Silesian University of Technology, Gliwice, Poland; 2 High-Performance Computing Center (HLRS), Stuttgart, Germany EuroPar, Dresden, August 30,
2 OUTLINE 1. The algorithm of simulated annealing 2. Hybrid communication method (HC) nesting OpenMP in MPI The reference method The method with a single data exchange 4. Outer level load balancing 5. Inner level load balancing 6. Vehicle routing problem with time windows (VRPTW) an example of a bi-criterion optimization problem 7. Experimental results EuroPar, Dresden, August 30, /18
3 THE GOAL AND ALGORITHM OF SIMULATED ANNEALING Finding the state of minimal (maximal) value of the cost function Begin generate initial solution End Starting at Eagle accepted not accepted e.g., to go into the deepest crater on the moon No End conditions met? Yes Trials Really descending Select a new solution from neighborhood Next position on a suitable footpath Is the new solution better? Accept current solution EuroPar, Dresden, August 30, /18 Yes No deeper one independent dependent Take a random number sometimes climbing back on the crater edge to get a chance to find a The sequence of trials forms a chain Yes No Trial Ceiling moves down in line with decreasing parameter called temperature Is the number < ceiling? The factor sometimes is reduced over the execution time
4 THE HYBRID COMMUNICATION METHOD Nesting OpenMP in MPI Hybrid character of the hardware Parallelization uses two levels SMP nodes CPUs shared memory Node interconnect INNER PARALLELIZATION Only up to a few threads are used Communication overhead is minimal relative to that between nodes Uses OpenMP OUTER PARALLELIZATION Several SMP nodes Communication overhead is not negligible Uses MPI EuroPar, Dresden, August 30, /18
5 THE REFERENCE HYBRID COMMUNICATION METHOD Outer parallelization for communication between nodes Each Each chain chain is is divided divided into into subchainchains by by the the factor factor of of the the sub- number number of of nodes nodes The The process process of of generating generating every every consecutive consecutive chain chain is is performed performed without without communication communication between between nodes nodes At At the the end, end, the the best best solution solution is is picked picked up up as as the the final final one one chain 1 chain 2 chain 3 Node interconnect The The maximal maximal number number of of chain n engaged engaged nodes nodes is is limited limited by by reasonable reasonable shortening shortening of of the the chain A final solution chain length length EuroPar, Dresden, August 30, /18
6 Thread 1 Thread 2 Thread 3 Thread 4 Agnieszka Debudaj-Grabysz, Rolf Rabenseifner Load Balanced Parallel Simulated Annealing... THE REFERENCE HYBRID COMMUNICATION METHOD Inner parallelization for communication within nodes SMP-node Performing CPU-intensive trials in a parallel region Fast comparison of the costfunction in a serial OpenMP master section Select one solution common for all threads from all accepted ones for(l=0;l<nooftempred;l++) EuroPar, Dresden, August 30, /18 { #pragma omp parallel private(ii) { for(ii=0;ii<epochlengthpernode; ii=ii+setoftrialsize) { #pragma omp for schedule(static) for(i=0;i<setoftrialsize;i++) perform_trial(); #pragma omp barrier #pragma omp master {select one solution common for all threads... } #pragma omp barrier } /*end for (ii=0;...) */ } /*end of parallel region*/ {reduce the temperature... } } /*end for(l=0;...)*/
7 THE METHOD WITH A SINGLE DATA EXCHANGE The idea Incorporating one data exchange after elapsing a percent of the specified time limit (e.g. 50%, 70%) During the exchange of the data the best solution is selected and mandated for all processes The idea gives the possibility of: Heavy exploration of the search space during the first phase, i.e., a few (but only a few) paths can reach the area of the global minimum. Improvement of the best path during the second phase by all working processes Many small (instead groups of of only astronauts again a few) are the looking moon independently for the deepest crater and hopefully, at least one group (=SMP node) is finding it Improvement of the best path during the second phase by all working processes (instead of only a few) George A short W. time Bush before (Jan 14, returning 2004): Using to earth, the Crew all groups Exploration are concentrated Vehicle, we will undertake extended to human the deepest missions crater to the found moon up as to early now. as 2015,... The last minutes, they all try --> to Jan. find 2004 the --> deepest Jan. 14 location in that crater! EuroPar, Dresden, August 30, /18
8 time [s] Agnieszka Debudaj-Grabysz, Rolf Rabenseifner Load Balanced Parallel Simulated Annealing... OUTER LEVEL LOAD BALANCING The times for generating 8 sub-chains based on an example run idle times stopping and communicating after a fixed number of temperature levels number of temperature levels time [s] Optimization: stopping and communicating after a fixed amount of time number of temperature levels EuroPar, Dresden, August 30, /18
9 INNER LEVEL LOAD BALANCING The First Optimization Step Each single trial needs extremely different compute time Therefore with always 5 trials per thread: Better load balancing EuroPar, Dresden, August 30, /18
10 time [ms ] Agnieszka Debudaj-Grabysz, Rolf Rabenseifner Load Balanced Parallel Simulated Annealing... INNER LEVEL LOAD BALANCING The First Optimization Step the average fastest trial After After the the optimization: optimization: 5 trials trials per per a thread thread the average value 112% of the average s e parate trials time [ms ] ms *) Before Before the the optimization: optimization: 1 trial trial per per a thread thread 51% of the average the average value the average slowest trial clus te re d trials 624 ms *) *) Execution time for 4x5 trials on 4 threads EuroPar, Dresden, August 30, /18
11 INNER LEVEL LOAD BALANCING The Second Optimization Step Redefinition of a trial: Finding a new valid solution S in the neighborhood of S the most time consuming function Allowing a trial to abort this loop without result causes better load balancing Average trial time more dominated by minimal trial size Absolute maximal trial time significantly reduced EuroPar, Dresden, August 30, /18
12 INNER LEVEL LOAD BALANCING The Second Optimization Step time [ms ] Clustered Clustered trials trials the average value after the redefinition be fore the re de finition 112% of the average afte r the re de finition 1.0 time [ms ] ms *) 273 ms *) 51% of the average be fore the re de finition Separate Separate trials trials 66% of the average the average value after the redefinition 624 ms *) afte r the re de finition 212 ms *) 29% of the average EuroPar, Dresden, August 30, 2006 *) Execution time for 12/18 4x5 trials on 4 threads
13 INNER LEVEL LOAD BALANCING The Third Optimization Step The The easiest easiest solution: solution: for(l=0;l<nooftempred;l++) { for(ii=0;ii<epochlengthpernode; ii=ii+setoftrialsize) { #pragma omp parallel for schedule(static) for(i=0;i<setoftrialsize;i++) perform_trial(); #pragma omp barrier /*end of parallel region*/ {select one solution common for all threads... } } /*end for (ii=0;...) */ {reduce the temperature... } } /*end for(l=0;...)*/ 30-40ms * 0.16ms * ~ for(l=0;l<nooftempred;l++) EuroPar, Dresden, August 30, /18 ~20 Third Third optimization: Reducing the the fork-join overhead The The optimized optimized solution: solution: { #pragma omp parallel private(ii) { for(ii=0;ii<epochlengthpernode; ii=ii+setoftrialsize) { #pragma omp for schedule(static) for(i=0;i<setoftrialsize;i++) perform_trial(); #pragma omp barrier #pragma omp master {select one solution common for all threads... } #pragma omp barrier } /*end for (ii=0;...) */ } /*end of parallel region*/ {reduce the temperature... } } /*end for(l=0;...)*/ *System: NEC TX7 16x1.5Ghz Intel ItaniumII CPUs, 6MB L3 Cache
14 VEHICLE ROUTING PROBLEM WITH TIME WINDOWS Goal Goal Minimizing Minimizing number number of of route route legs legs Minimizing Minimizing of of the the total total travel travel distance distance Constraints Constraints Customer Customer and and warehouse warehouse time time windows windows Vehicle Vehicle capacity capacity Duration Duration of of servicing servicing a single single customer customer Customer s Customer s demand demand Customer Warehouse EuroPar, Dresden, August 30, /18
15 EXPERIMENTAL RESULTS* THE FIRST OPTIMIZATION GOAL** The number of final solutions with the minimal number of route legs (i.e., good solutions), generated within 100 runs number of "good" solutions per 100 runs SEQ HC4, i.e., OMP=4*** HC2, i.e., OMP= number of processors Presented values are the averages over the values obtained for 4 data files from Solomon s benchmark set: R108, R111, RC105, RC108. Constraints: Execution time x number of CPUs = constant * Experiments carried out on NEC Xeon EM64T Cluster ** The previous work *** Emulated usage of 4 OMP threads, based on results of tests of OMP parallelization carried out on NEC TX-7 system EuroPar, Dresden, August 30, /18
16 EXPERIMENTAL RESULTS THE SECOND OPTIMIZATION GOAL The relative distance from the value obtained by a sequential algorithm the distance from a sequential result better results than for the SEQUENTIAL method the sequential result number of processors HC4 HC4-0.9 HC4-0.7 HC4-0.5 Presented values are the averages over the values obtained for 4 data files from Solomon s benchmark set: R108, R111, RC105, RC108. Constraints: Execution time x number of CPUs = constant better results than for the reference method Outer level communication after 100%, 90%, 70% and 50% of the total execution time EuroPar, Dresden, August 30, /18
17 EXPERIMENTAL RESULTS THE SECOND OPTIMIZATION GOAL R108 R111 RC105 RC108 total distance total distance total distance total distance seq HC4 HC4-0.9 HC4-0.7 HC number of processors number of processors number of processors EuroPar, Dresden, August 30, /18 number of processors
18 Acknowledgements This project was supported by HPC-Europa. The research stay at HLRS Jan. Feb was granted by HPC-Europa. HPC-Europa Pan-European Research Infrastructure on High Performance Computing at HLRS, Stuttgart SARA, Amsterdam idris, Paris epcc, Edinburgh CEPBA, Barcelona CINECA, Bologna Research Grants for Computational Science Access to HPC Scientific Collaboration Technical Support All travel and living expense fully reimbursed! 3 13 weeks Poster outside Applications are welcome at any level, from postgraduate researchers to senior profs. Currently, high acceptance rate EuroPar, Dresden, August 30, /18
Acknowledgments. Amdahl s Law. Contents. Programming with MPI Parallel programming. 1 speedup = (1 P )+ P N. Type to enter text
Acknowledgments Programming with MPI Parallel ming Jan Thorbecke Type to enter text This course is partly based on the MPI courses developed by Rolf Rabenseifner at the High-Performance Computing-Center
More informationCommunication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures
Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures Rolf Rabenseifner rabenseifner@hlrs.de Gerhard Wellein gerhard.wellein@rrze.uni-erlangen.de University of Stuttgart
More informationAgenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP
More informationA Lightweight OpenMP Runtime
Alexandre Eichenberger - Kevin O Brien 6/26/ A Lightweight OpenMP Runtime -- OpenMP for Exascale Architectures -- T.J. Watson, IBM Research Goals Thread-rich computing environments are becoming more prevalent
More informationBinding Nested OpenMP Programs on Hierarchical Memory Architectures
Binding Nested OpenMP Programs on Hierarchical Memory Architectures Dirk Schmidl, Christian Terboven, Dieter an Mey, and Martin Bücker {schmidl, terboven, anmey}@rz.rwth-aachen.de buecker@sc.rwth-aachen.de
More informationKommunikations- und Optimierungsaspekte paralleler Programmiermodelle auf hybriden HPC-Plattformen
Kommunikations- und Optimierungsaspekte paralleler Programmiermodelle auf hybriden HPC-Plattformen Rolf Rabenseifner rabenseifner@hlrs.de Universität Stuttgart, Höchstleistungsrechenzentrum Stuttgart (HLRS)
More informationShared Memory Parallel Programming. Shared Memory Systems Introduction to OpenMP
Shared Memory Parallel Programming Shared Memory Systems Introduction to OpenMP Parallel Architectures Distributed Memory Machine (DMP) Shared Memory Machine (SMP) DMP Multicomputer Architecture SMP Multiprocessor
More informationLecture 16: Recapitulations. Lecture 16: Recapitulations p. 1
Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationOpenMP and more Deadlock 2/16/18
OpenMP and more Deadlock 2/16/18 Administrivia HW due Tuesday Cache simulator (direct-mapped and FIFO) Steps to using threads for parallelism Move code for thread into a function Create a struct to hold
More informationEfficient Clustering and Scheduling for Task-Graph based Parallelization
Center for Information Services and High Performance Computing TU Dresden Efficient Clustering and Scheduling for Task-Graph based Parallelization Marc Hartung 02. February 2015 E-Mail: marc.hartung@tu-dresden.de
More informationAnalyzing Cache Bandwidth on the Intel Core 2 Architecture
John von Neumann Institute for Computing Analyzing Cache Bandwidth on the Intel Core 2 Architecture Robert Schöne, Wolfgang E. Nagel, Stefan Pflüger published in Parallel Computing: Architectures, Algorithms
More informationParallel Programming
Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationHybrid Implementation of 3D Kirchhoff Migration
Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation
More informationComparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster G. Jost*, H. Jin*, D. an Mey**,F. Hatay*** *NASA Ames Research Center **Center for Computing and Communication, University of
More informationAUTOMATIC SMT THREADING
AUTOMATIC SMT THREADING FOR OPENMP APPLICATIONS ON THE INTEL XEON PHI CO-PROCESSOR WIM HEIRMAN 1,2 TREVOR E. CARLSON 1 KENZO VAN CRAEYNEST 1 IBRAHIM HUR 2 AAMER JALEEL 2 LIEVEN EECKHOUT 1 1 GHENT UNIVERSITY
More informationHybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes
Parallel Programming on lusters of Multi-ore SMP Nodes Rolf Rabenseifner, High Performance omputing enter Stuttgart (HLRS) Georg Hager, Erlangen Regional omputing enter (RRZE) Gabriele Jost, Texas Advanced
More informationBarbara Chapman, Gabriele Jost, Ruud van der Pas
Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationA Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004
A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into
More informationOpenMP on the FDSM software distributed shared memory. Hiroya Matsuba Yutaka Ishikawa
OpenMP on the FDSM software distributed shared memory Hiroya Matsuba Yutaka Ishikawa 1 2 Software DSM OpenMP programs usually run on the shared memory computers OpenMP programs work on the distributed
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationOptimization of MPI Applications Rolf Rabenseifner
Optimization of MPI Applications Rolf Rabenseifner University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Optimization of MPI Applications Slide 1 Optimization and Standardization
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationANALYSIS OF CMAQ PERFORMANCE ON QUAD-CORE PROCESSORS. George Delic* HiPERiSM Consulting, LLC, P.O. Box 569, Chapel Hill, NC 27514, USA
ANALYSIS OF CMAQ PERFORMANCE ON QUAD-CORE PROCESSORS George Delic* HiPERiSM Consulting, LLC, P.O. Box 569, Chapel Hill, NC 75, USA. INTRODUCTION CMAQ offers a choice of three gas chemistry solvers: Euler-Backward
More informationMultithreading in C with OpenMP
Multithreading in C with OpenMP ICS432 - Spring 2017 Concurrent and High-Performance Programming Henri Casanova (henric@hawaii.edu) Pthreads are good and bad! Multi-threaded programming in C with Pthreads
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationConvey Vector Personalities FPGA Acceleration with an OpenMP-like programming effort?
Convey Vector Personalities FPGA Acceleration with an OpenMP-like programming effort? Björn Meyer, Jörn Schumacher, Christian Plessl, Jens Förstner University of Paderborn, Germany 2 ), - 4 * 4 + - 6-4.
More informationGenerating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory Roshan Dathathri Thejas Ramashekar Chandan Reddy Uday Bondhugula Department of Computer Science and Automation
More informationCPS343 Parallel and High Performance Computing Project 1 Spring 2018
CPS343 Parallel and High Performance Computing Project 1 Spring 2018 Assignment Write a program using OpenMP to compute the estimate of the dominant eigenvalue of a matrix Due: Wednesday March 21 The program
More informationParallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops
Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, Altair Compute
More informationAdaptive Power Profiling for Many-Core HPC Architectures
Adaptive Power Profiling for Many-Core HPC Architectures Jaimie Kelley, Christopher Stewart The Ohio State University Devesh Tiwari, Saurabh Gupta Oak Ridge National Laboratory State-of-the-Art Schedulers
More informationHybrid MPI and OpenMP Parallel Programming
Hybrid MPI and OpenMP Parallel Programming MPI + OpenMP and other models on clusters of SMP nodes Hybrid Parallel Programming Slide 1 Höchstleistungsrechenzentrum Stuttgart Rolf Rabenseifner High-Performance
More informationNUMA-aware OpenMP Programming
NUMA-aware OpenMP Programming Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group schmidl@itc.rwth-aachen.de Christian Terboven IT Center, RWTH Aachen University Deputy lead of the HPC
More informationOpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.
OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 15, 2010 José Monteiro (DEI / IST) Parallel and Distributed Computing
More informationA common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.
More informationProgramming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2007 Prof. Robert van Engelen Overview Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading HPC Fall 2007 2 Parallel
More informationPerformance Evaluation with the HPCC Benchmarks as a Guide on the Way to Peta Scale Systems
Performance Evaluation with the HPCC Benchmarks as a Guide on the Way to Peta Scale Systems Rolf Rabenseifner, Michael M. Resch, Sunil Tiyyagura, Panagiotis Adamidis rabenseifner@hlrs.de resch@hlrs.de
More informationMPI & OpenMP Mixed Hybrid Programming
MPI & OpenMP Mixed Hybrid Programming Berk ONAT İTÜ Bilişim Enstitüsü 22 Haziran 2012 Outline Introduc/on Share & Distributed Memory Programming MPI & OpenMP Advantages/Disadvantages MPI vs. OpenMP Why
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationConcurrent Programming with OpenMP
Concurrent Programming with OpenMP Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 11, 2012 CPD (DEI / IST) Parallel and Distributed
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationOpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.
OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 16, 2011 CPD (DEI / IST) Parallel and Distributed Computing 18
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More informationEuropean exascale applications workshop, Edinburgh, 19th/20th April 2018 Asynchronous Execution in DLR's CFD Solvers
European exascale applications workshop, Edinburgh, 19th/20th April 2018 Asynchronous Execution in DLR's CFD Solvers Thomas Gerhold Institute of Software Methods for Product Virtualization, Dresden DLR
More informationLecture 14: Mixed MPI-OpenMP programming. Lecture 14: Mixed MPI-OpenMP programming p. 1
Lecture 14: Mixed MPI-OpenMP programming Lecture 14: Mixed MPI-OpenMP programming p. 1 Overview Motivations for mixed MPI-OpenMP programming Advantages and disadvantages The example of the Jacobi method
More informationProgramming with Shared Memory PART II. HPC Fall 2012 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2012 Prof. Robert van Engelen Overview Sequential consistency Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading
More informationLoop Modifications to Enhance Data-Parallel Performance
Loop Modifications to Enhance Data-Parallel Performance Abstract In data-parallel applications, the same independent
More informationParallel Programming. OpenMP Parallel programming for multiprocessors for loops
Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory
More informationOutline 1 Motivation 2 Theory of a non-blocking benchmark 3 The benchmark and results 4 Future work
Using Non-blocking Operations in HPC to Reduce Execution Times David Buettner, Julian Kunkel, Thomas Ludwig Euro PVM/MPI September 8th, 2009 Outline 1 Motivation 2 Theory of a non-blocking benchmark 3
More informationHPC IN EUROPE. Organisation of public HPC resources
HPC IN EUROPE Organisation of public HPC resources Context Focus on publicly-funded HPC resources provided primarily to enable scientific research and development at European universities and other publicly-funded
More informationLab: Scientific Computing Tsunami-Simulation
Lab: Scientific Computing Tsunami-Simulation Session 4: Optimization and OMP Sebastian Rettenberger, Michael Bader 23.11.15 Session 4: Optimization and OMP, 23.11.15 1 Department of Informatics V Linux-Cluster
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationAltair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015
Altair OptiStruct 13.0 Performance Benchmark and Profiling May 2015 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationCrossing the Architectural Barrier: Evaluating Representative Regions of Parallel HPC Applications
Crossing the Architectural Barrier: Evaluating Representative Regions of Parallel HPC Applications Alexandra Ferrerón (University of Zaragoza), Radhika Jagtap, Sascha Bischoff, Roxana Rușitoru (ARM) Senior
More informationNew Features in LS-DYNA HYBRID Version
11 th International LS-DYNA Users Conference Computing Technology New Features in LS-DYNA HYBRID Version Nick Meng 1, Jason Wang 2, Satish Pathy 2 1 Intel Corporation, Software and Services Group 2 Livermore
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More informationAn Introduction to OpenMP
An Introduction to OpenMP U N C L A S S I F I E D Slide 1 What Is OpenMP? OpenMP Is: An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
More informationOn the scalability of tracing mechanisms 1
On the scalability of tracing mechanisms 1 Felix Freitag, Jordi Caubet, Jesus Labarta Departament d Arquitectura de Computadors (DAC) European Center for Parallelism of Barcelona (CEPBA) Universitat Politècnica
More information<Insert Picture Here> OpenMP on Solaris
1 OpenMP on Solaris Wenlong Zhang Senior Sales Consultant Agenda What s OpenMP Why OpenMP OpenMP on Solaris 3 What s OpenMP Why OpenMP OpenMP on Solaris
More informationCMAQ PARALLEL PERFORMANCE WITH MPI AND OPENMP**
CMAQ 5.2.1 PARALLEL PERFORMANCE WITH MPI AND OPENMP** George Delic* HiPERiSM Consulting, LLC, P.O. Box 569, Chapel Hill, NC 27514, USA 1. INTRODUCTION This presentation reports on implementation of the
More informationPowernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems
Powernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems Thomas Ilsche, Marcus Hähnel, Robert Schöne, Mario Bielert, and Daniel Hackenberg Technische Universität Dresden Observation
More informationOut-of-Order Parallel Simulation of SystemC Models. G. Liu, T. Schmidt, R. Dömer (CECS) A. Dingankar, D. Kirkpatrick (Intel Corp.)
Out-of-Order Simulation of s using Intel MIC Architecture G. Liu, T. Schmidt, R. Dömer (CECS) A. Dingankar, D. Kirkpatrick (Intel Corp.) Speaker: Rainer Dömer doemer@uci.edu Center for Embedded Computer
More informationDepartment of Informatics V. HPC-Lab. Session 2: OpenMP M. Bader, A. Breuer. Alex Breuer
HPC-Lab Session 2: OpenMP M. Bader, A. Breuer Meetings Date Schedule 10/13/14 Kickoff 10/20/14 Q&A 10/27/14 Presentation 1 11/03/14 H. Bast, Intel 11/10/14 Presentation 2 12/01/14 Presentation 3 12/08/14
More informationParallel Computing in Combinatorial Optimization
Parallel Computing in Combinatorial Optimization Bernard Gendron Université de Montréal gendron@iro.umontreal.ca Course Outline Objective: provide an overview of the current research on the design of parallel
More informationCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 18. Combining MPI and OpenMP
Chapter 18 Combining MPI and OpenMP Outline Advantages of using both MPI and OpenMP Case Study: Conjugate gradient method Case Study: Jacobi method C+MPI vs. C+MPI+OpenMP Interconnection Network P P P
More informationAllows program to be incrementally parallelized
Basic OpenMP What is OpenMP An open standard for shared memory programming in C/C+ + and Fortran supported by Intel, Gnu, Microsoft, Apple, IBM, HP and others Compiler directives and library support OpenMP
More informationShared Memory Parallelism - OpenMP
Shared Memory Parallelism - OpenMP Sathish Vadhiyar Credits/Sources: OpenMP C/C++ standard (openmp.org) OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openmp/#introduction) OpenMP sc99 tutorial
More informationIMPLEMENTATION OF THE. Alexander J. Yee University of Illinois Urbana-Champaign
SINGLE-TRANSPOSE IMPLEMENTATION OF THE OUT-OF-ORDER 3D-FFT Alexander J. Yee University of Illinois Urbana-Champaign The Problem FFTs are extremely memory-intensive. Completely bound by memory access. Memory
More informationAdvanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs
Advanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs Nikolaos Drosinos and Nectarios Koziris National Technical University of Athens School of Electrical
More informationUvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP
Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel
More informationTuning Alya with READEX for Energy-Efficiency
Tuning Alya with READEX for Energy-Efficiency Venkatesh Kannan 1, Ricard Borrell 2, Myles Doyle 1, Guillaume Houzeaux 2 1 Irish Centre for High-End Computing (ICHEC) 2 Barcelona Supercomputing Centre (BSC)
More informationAn innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ.
An innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ. Of Pisa Italy 29/02/2012, Nuremberg, Germany ARTEMIS ARTEMIS Joint Joint Undertaking
More informationIntroduction to MPI. EAS 520 High Performance Scientific Computing. University of Massachusetts Dartmouth. Spring 2014
Introduction to MPI EAS 520 High Performance Scientific Computing University of Massachusetts Dartmouth Spring 2014 References This presentation is almost an exact copy of Dartmouth College's Introduction
More informationApplying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems
V Brazilian Symposium on Computing Systems Engineering Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems Alessandro Trindade, Hussama Ismail, and Lucas Cordeiro Foz
More informationPerformance Tuning and OpenMP
Performance Tuning and OpenMP mueller@hlrs.de University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Outline Motivation Performance Basics General Performance Issues and
More informationParallel Computing Why & How?
Parallel Computing Why & How? Xing Cai Simula Research Laboratory Dept. of Informatics, University of Oslo Winter School on Parallel Computing Geilo January 20 25, 2008 Outline 1 Motivation 2 Parallel
More informationOpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means
High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview
More informationLecture 13. Shared memory: Architecture and programming
Lecture 13 Shared memory: Architecture and programming Announcements Special guest lecture on Parallel Programming Language Uniform Parallel C Thursday 11/2, 2:00 to 3:20 PM EBU3B 1202 See www.cse.ucsd.edu/classes/fa06/cse260/lectures/lec13
More informationDistributed Systems CS /640
Distributed Systems CS 15-440/640 Programming Models Borrowed and adapted from our good friends at CMU-Doha, Qatar Majd F. Sakr, Mohammad Hammoud andvinay Kolar 1 Objectives Discussion on Programming Models
More informationParallelism paradigms
Parallelism paradigms Intro part of course in Parallel Image Analysis Elias Rudberg elias.rudberg@it.uu.se March 23, 2011 Outline 1 Parallelization strategies 2 Shared memory 3 Distributed memory 4 Parallelization
More informationHPC SERVICE PROVISION FOR THE UK
HPC SERVICE PROVISION FOR THE UK 5 SEPTEMBER 2016 Dr Alan D Simpson ARCHER CSE Director EPCC Technical Director Overview Tiers of HPC Tier 0 PRACE Tier 1 ARCHER DiRAC Tier 2 EPCC Oxford Cambridge UCL Tiers
More informationCompiling for GPUs. Adarsh Yoga Madhav Ramesh
Compiling for GPUs Adarsh Yoga Madhav Ramesh Agenda Introduction to GPUs Compute Unified Device Architecture (CUDA) Control Structure Optimization Technique for GPGPU Compiler Framework for Automatic Translation
More informationSingle-Points of Performance
Single-Points of Performance Mellanox Technologies Inc. 29 Stender Way, Santa Clara, CA 9554 Tel: 48-97-34 Fax: 48-97-343 http://www.mellanox.com High-performance computations are rapidly becoming a critical
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions
More informationObjective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers.
CS 612 Software Design for High-performance Architectures 1 computers. CS 412 is desirable but not high-performance essential. Course Organization Lecturer:Paul Stodghill, stodghil@cs.cornell.edu, Rhodes
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationParallelization, OpenMP
~ Parallelization, OpenMP Scientific Computing Winter 2016/2017 Lecture 26 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de made wit pandoc 1 / 18 Why parallelization? Computers became faster and faster
More informationVýpočetní zdroje IT4Innovations a PRACE pro využití ve vědě a výzkumu
Výpočetní zdroje IT4Innovations a PRACE pro využití ve vědě a výzkumu Filip Staněk Seminář gridového počítání 2011, MetaCentrum, Brno, 7. 11. 2011 Introduction I Project objectives: to establish a centre
More informationHPC Practical Course Part 3.1 Open Multi-Processing (OpenMP)
HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP) V. Akishina, I. Kisel, G. Kozlov, I. Kulakov, M. Pugach, M. Zyzak Goethe University of Frankfurt am Main 2015 Task Parallelism Parallelization
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationTowards OpenMP for Java
Towards OpenMP for Java Mark Bull and Martin Westhead EPCC, University of Edinburgh, UK Mark Kambites Dept. of Mathematics, University of York, UK Jan Obdrzalek Masaryk University, Brno, Czech Rebublic
More information