Design Approach for a Generic and Scalable Framework for Parallel FMU Simulations
Center for Information Services and High Performance Computing, TU Dresden
Martin Flehmig, Marc Hartung, Marcus Walther
Linköping, 02. February
martin.flehmig@tu-dresden.de

Outline
1. Introduction and Motivation
2. Coupled Simulations Using FMI
3. Design Approach
4. Summary and Outlook

HPC-OM

Tasks within HPC-OM
The three main tasks/goals within the HPC-OM project are:
1. Domain-independent parallelization of Modelica simulations.
   - Equation-based parallelization using a task graph.
   - Algorithms for task merging and clustering.
   - Implementation of various schedulers.
   - Code generation for OpenMP, PThreads and Intel TBB.
   - Exploiting repeated structures and vectorization in Modelica.
2. Parallel time integration.
3. Coupling of interactive simulations and an HPC system.
4. High speedups.

Task Graph Parallelization ...
... is promising, but current restrictions are:
- Model-dependent benefit, because
  - tasks are too lightweight,
  - one large equation system thwarts the whole parallel computation.
- OpenModelica has some issues regarding large models.

How to Achieve High Speedups?
Idea: Build the simulation from several FMUs to obtain multiple levels of parallelism:
- Task graph parallelization with FMUs as nodes.
- Use the task graph parallelization generated by OMC in each FMU.
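
The outer level of this idea, a task graph with FMUs as nodes, can be illustrated with a small sketch. It is not taken from the slides: the function name `parallel_levels`, the FMU names and the dictionary layout are hypothetical, and Python is used only for brevity. The sketch groups FMUs into levels by their data dependencies, so that all FMUs within one level could be evaluated concurrently.

```python
from collections import defaultdict


def parallel_levels(dependencies):
    """Group FMUs into levels; FMUs in the same level have no mutual dependency.

    `dependencies` maps an FMU name to the names of the FMUs whose outputs
    it needs (an assumed, simplified description of the coupling).
    """
    nodes = set(dependencies)
    for preds in dependencies.values():
        nodes.update(preds)

    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for node, preds in dependencies.items():
        for pred in preds:
            successors[pred].append(node)
            indegree[node] += 1

    # Level-wise variant of Kahn's topological sort.
    level = sorted(n for n in nodes if indegree[n] == 0)
    levels, done = [], 0
    while level:
        levels.append(level)
        done += len(level)
        ready = []
        for n in level:
            for s in successors[n]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        level = sorted(ready)

    if done != len(nodes):
        raise ValueError("dependency cycle between FMUs")
    return levels
```

For a hypothetical coupling in which `gearbox` and `cooling` both depend on `engine`, and `vehicle` depends on both, the two middle FMUs end up in the same level. Mutually coupled FMUs form a cycle and would need a different treatment than this simple levelling.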

FMI/FMU - Short Recap
- Can bring together sub-models from different authoring tools (language, libraries, solvers, etc.).
- Models can be used in already existing production codes and frameworks.
- Protects intellectual property.
Single-threaded simulations using coupled FMUs have limitations:
- Suitable for real-time applications?
- Large and complex models have long simulation execution times and high memory consumption.
Therefore, FMU simulations exploiting today's multi-core hardware are needed.

Synchronisation Step Approach
- Synchronisation of FMUs happens after predefined time intervals.
- At every synchronisation point, the numerical error is calculated.
- In between, inputs are extrapolated for single solver steps.
[Figure: timelines of FMU 1 and FMU 2 with synchronisation points T0 to T3, labelled "error control" and "extrapolation".]
Advantages:
+ No communication between synchronisation steps.
+ Clear and straightforward implementation possible.
Drawbacks:
- FMUs have to be reverted to the last synchronisation point, or the simulation even has to be rerun.
- Communication leads to delays at synchronisation points.
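
The two ingredients of this approach, extrapolating an input between synchronisation points and checking the error once the next point is reached, can be sketched as follows. The slides do not specify the extrapolation order or the error measure; linear extrapolation and an absolute tolerance are assumptions made here for illustration, and the function names are hypothetical.

```python
def extrapolate(t_prev, u_prev, t_last, u_last, t):
    """Linearly extrapolate an input from the two latest synchronisation points."""
    slope = (u_last - u_prev) / (t_last - t_prev)
    return u_last + slope * (t - t_last)


def accept_interval(u_extrapolated, u_actual, tol):
    """Error control at a synchronisation point.

    Compares the value that was extrapolated for the synchronisation time with
    the value the producing FMU actually delivered. A rejected interval means
    the consuming FMU has to be reverted to the last synchronisation point.
    """
    return abs(u_extrapolated - u_actual) <= tol
```

With samples `(0.0, 1.0)` and `(1.0, 2.0)`, `extrapolate(0.0, 1.0, 1.0, 2.0, 1.5)` gives `2.5`; if the producing FMU later reports `2.0` for that time and the tolerance is `0.2`, the interval is rejected.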

Generic Approach - Idea
- Values of every valid solver step are communicated.
- Dependent FMUs take the most recent values as inputs.
- Solvers can base the error estimation on well-founded data.
- Less interference with the solver step size.
[Figure: timelines of FMU 1 and FMU 2, labelled "error control and extra-/interpolation".]

Generic Approach - Aspects and Challenges
Aspects
Advantages:
+ Just single solver steps need to be rerun.
+ Replaces unsafe extrapolation with interpolation.
+ Direct error handling, i.e.,
  - a change of numerical behaviour is treated on occurrence,
  - the solver step size depends only on input values, not on synchronisation points.
Drawbacks:
- Increasing communication effort.
- Sophisticated implementation with complex data structures.
Challenges
- Asynchronous communication is needed.
  The field of parallel computing provides several solutions.
- FMUs need to be smartly distributed on the system for high efficiency.
  Knowledge transfer from task scheduling.
- Adaptable to different simulation setups.
  ! Requires an interchangeable modular structure of the system and its components.
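
The first aspect, that only a single solver step is rerun when the error is too large, can be illustrated with a generic adaptive step. This is a textbook sketch under stated assumptions, not the solver used in the framework: the error is estimated from the difference between an explicit Euler step and a Heun step, the state is scalar, and the step size is simply halved on rejection. In a coupled setting, `f` would read its inputs from the most recent values of the connected FMUs.

```python
def adaptive_step(f, t, x, h, tol, h_min=1e-9):
    """Advance x' = f(t, x) by one accepted step.

    On rejection only this step is repeated with half the step size;
    nothing before time t is recomputed.
    """
    while True:
        k1 = f(t, x)
        euler = x + h * k1                  # first-order prediction
        k2 = f(t + h, euler)
        heun = x + 0.5 * h * (k1 + k2)      # second-order result
        error = abs(heun - euler)           # embedded error estimate
        if error <= tol or h <= h_min:
            return t + h, heun, h
        h *= 0.5                            # reject and retry this step only
```

For `f = lambda t, x: -x`, starting at `t = 0`, `x = 1` with `h = 1.0` and `tol = 0.01`, the step is rejected three times and accepted with `h = 0.125`.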

FMU Simulation Framework
The final goal is a generic and scalable framework for coupled FMU simulations.
[Figure: "Framework for Parallel FMU Simulation", with the labels OpenMP, C++, Error control/handling, Simulation Data Management, MPI, FMI, Initialization, I/O, Communication, Solver, Generic, User friendliness, Adaptive step sizes, Scalable.]

DataHistory
Challenge: How to provide input/output data for asynchronous simulation?
Answer: Use buffers!
- Saves relevant state values of FMUs.
- Input values are written remotely.
- Interface for accessing input values on local storage.
[Figure: FMU 1 and FMU 2 connected through a buffer.]
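
A buffer of this kind can be sketched as a time-stamped history for one input signal. The class below only borrows the name DataHistory from the slide; its methods (`write`, `read`, `flush_before`) and the choice of linear interpolation are assumptions for illustration, and the actual framework is described as C++ rather than Python.

```python
import bisect


class DataHistory:
    """Time-stamped buffer for one input signal of an FMU (hypothetical interface)."""

    def __init__(self):
        self._times = []
        self._values = []

    def __len__(self):
        return len(self._times)

    def write(self, t, value):
        """Store a sample; intended to be called by the producing side."""
        if self._times and t <= self._times[-1]:
            raise ValueError("samples must arrive with increasing time")
        self._times.append(t)
        self._values.append(value)

    def read(self, t):
        """Interpolate inside the stored range, extrapolate linearly outside it."""
        n = len(self._times)
        if n == 0:
            raise LookupError("no data available yet")
        if n == 1:
            return self._values[0]
        # Pick the pair of neighbouring samples closest to t.
        i = min(max(bisect.bisect_right(self._times, t), 1), n - 1)
        t0, t1 = self._times[i - 1], self._times[i]
        v0, v1 = self._values[i - 1], self._values[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    def flush_before(self, t):
        """Drop samples that are no longer needed, keeping one at or before t."""
        i = bisect.bisect_right(self._times, t) - 1
        if i > 0:
            del self._times[:i]
            del self._values[:i]
```

The consuming solver only calls `read(t)` on local storage, so it does not need to know whether the value was interpolated between two received samples or extrapolated beyond the newest one.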

DataManager
Challenge: How to achieve efficient communication and handling of parallel FMU simulation?
Answer: Use a manager!
- Initiates shared and distributed memory writes for input values.
- Writes result data to file.
- Controls DataHistory, e.g., flushes unnecessary data.
- Interpolation/extrapolation of input data.
- Hides communication and data flow from solvers.
[Figure: FMU 1 and FMU 2 connected through a buffer.]
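
How a manager can hide the data flow from the solvers is sketched below. Only the name DataManager comes from the slide; the methods `connect`, `publish`, `latest` and `flush` are hypothetical, plain Python lists stand in for the buffers, an in-memory list stands in for the result file, and a local append stands in for what would be a shared memory or distributed memory (e.g. MPI) write.

```python
from collections import defaultdict


class DataManager:
    """Routes outputs of producing FMUs to input buffers of consuming FMUs."""

    def __init__(self):
        self._connections = defaultdict(list)  # (fmu, output) -> [(fmu, input)]
        self._buffers = defaultdict(list)      # (fmu, input) -> [(t, value)]
        self.results = []                      # stand-in for the result file

    def connect(self, source, target):
        """Declare that `source` feeds `target`."""
        self._connections[source].append(target)

    def publish(self, source, t, value):
        """Called after a valid solver step; the solver never sees the targets."""
        self.results.append((source, t, value))
        for target in self._connections[source]:
            # Stand-in for a shared or distributed memory write.
            self._buffers[target].append((t, value))

    def latest(self, target):
        """Most recent sample available for an input."""
        return self._buffers[target][-1]

    def flush(self, target, t_keep):
        """Discard samples older than t_keep, but always keep the newest one."""
        samples = self._buffers[target]
        kept = [s for s in samples if s[0] >= t_keep]
        self._buffers[target] = kept or samples[-1:]
```

A solver would call `publish(("fmu1", "y"), t, value)` after each accepted step and `latest(("fmu2", "u"))` when it needs an input, without knowing where the connected FMU runs.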

Summary
Presented an approach for efficient asynchronous parallel simulation of coupled FMUs.
Key features and main challenges have been identified:
- Use task graph parallelized FMUs generated from OpenModelica.
- Use model exchange FMUs in order to obtain asynchronous simulation.
! Need scalable data structures and communication.
! Need localized buffers to provide input values for FMUs.

Outlook
What has to be done?
- Finish implementation.
- Show scalability by performing a large simulation with numerous FMUs.
- Find a fancy name for this piece of software - suggestions are welcome.
- Make a release available.

HPC-OM
Thank you for your attention.

FMU Input Extrapolation
[Figure: timelines of FMU 1 and FMU 2 with synchronisation points T0 to T3, labelled "error control" and "extrapolation".]
- Error control of input values only at synchronization points.
- In between, extrapolated inputs are based only on the previous interval.
- In several cases, FMUs need to be set back to the last synchronization point.

FMU Input Extrapolation
[Figure: timelines of FMU 1 and FMU 2, labelled "error control and extra-/interpolation".]
- Error control of input values based on the most recent values.
- The solver can directly check the error after every step, leading to higher numerical stability.
- Less interference with the solver step size.
- More dynamic error handling possible.
Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationChapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin
More informationParallel programming models. Main weapons
Parallel programming models Von Neumann machine model: A processor and it s memory program = list of stored instructions Processor loads program (reads from memory), decodes, executes instructions (basic
More informationMiddleware and Interprocess Communication
Middleware and Interprocess Communication Reading Coulouris (5 th Edition): 41 4.1, 42 4.2, 46 4.6 Tanenbaum (2 nd Edition): 4.3 Spring 2015 CS432: Distributed Systems 2 Middleware Outline Introduction
More informationGenerating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory Roshan Dathathri Thejas Ramashekar Chandan Reddy Uday Bondhugula Department of Computer Science and Automation
More informationVAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW
VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW 8th VI-HPS Tuning Workshop at RWTH Aachen September, 2011 Tobias Hilbrich and Joachim Protze Slides by: Andreas Knüpfer, Jens Doleschal, ZIH, Technische Universität
More informationScalable, Hybrid-Parallel Multiscale Methods using DUNE
MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE R. Milk S. Kaulmann M. Ohlberger December 1st 2014 Outline MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 2 /28 Abstraction
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationParallel Execution of Functional Mock-up Units in Buildings Modeling
ORNL/TM-2016/173 Parallel Execution of Functional Mock-up Units in Buildings Modeling Ozgur Ozmen James J. Nutaro Joshua R. New Approved for public release. Distribution is unlimited. June 30, 2016 DOCUMENT
More informationMultiprocessor scheduling
Chapter 10 Multiprocessor scheduling When a computer system contains multiple processors, a few new issues arise. Multiprocessor systems can be categorized into the following: Loosely coupled or distributed.
More informationCS377P Programming for Performance Multicore Performance Multithreading
CS377P Programming for Performance Multicore Performance Multithreading Sreepathi Pai UTCS October 14, 2015 Outline 1 Multiprocessor Systems 2 Programming Models for Multicore 3 Multithreading and POSIX
More informationLast Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications
Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#12: External Sorting (R&G, Ch13) Static Hashing Extendible Hashing Linear Hashing Hashing
More informationELP. Effektive Laufzeitunterstützung für zukünftige Programmierstandards. Speaker: Tim Cramer, RWTH Aachen University
ELP Effektive Laufzeitunterstützung für zukünftige Programmierstandards Agenda ELP Project Goals ELP Achievements Remaining Steps ELP Project Goals Goals of ELP: Improve programmer productivity By influencing
More informationTechniques to improve the scalability of Checkpoint-Restart
Techniques to improve the scalability of Checkpoint-Restart Bogdan Nicolae Exascale Systems Group IBM Research Ireland 1 Outline A few words about the lab and team Challenges of Exascale A case for Checkpoint-Restart
More informationProgramming Models for Multi- Threading. Brian Marshall, Advanced Research Computing
Programming Models for Multi- Threading Brian Marshall, Advanced Research Computing Why Do Parallel Computing? Limits of single CPU computing performance available memory I/O rates Parallel computing allows
More informationEquation based parallelization of Modelica models
Marcus Walther Volker Waurich Christian Schubert Dr.-Ing. Ines Gubsch Dresden University of Technology {marcus.walther, volker.waurich, christian.schubert, ines.gubsch@tu-dresden.de Abstract In order to
More information