High Level Mathematical Modeling and Parallel/GRID Computing with Modelica. Peter Fritzson


PELAB Programming Environment Laboratory
Department of Computer and Information Science, Linköping University, SE 58-32 Linköping, Sweden
{petfr}@ida.liu.se
Parallel Programming Workshop Linköping 2006 June 3

Outline
  Complex Systems and High Performance Simulations
  Introduction to Modelica
  Overview of OpenModelica Environment
  Automatic fine-grained parallelization
  Explicit model-based parallelization
  Explicit parallel programming

Examples of Complex Systems
  Robotics
  Automotive
  Aircraft
  Living organisms
  Power plants
  Heavy Vehicles

Objectives
  Convenient high level mathematical modeling and simulation of complex systems
  High performance computation using parallel computers

Towards High-Level Parallel Modeling and Simulation
  Simulations are time-consuming
  Moore's Law (since 1965): the number of devices per chip area doubles every 18 months
  CPU clock rate also doubled every 18 months until 2003; then: heat and power issues, limited ILP, ...
  Superscalar technology has reached its limits; only (thread-level) parallelism can increase throughput substantially
  The consequence: chip multiprocessors (+ clusters), multi-core, PIM, ... (for general-purpose computing)
  Need parallel programming/modeling
    Automatic parallelization
    Explicit parallel modeling and parallel programming
  [Figure: Single-processor Performance Scaling. Log2 speedup versus year (1984 to 2020) and process generation (90 nm to 22 nm); throughput increase 55%/year; curves for device speed, clock rate, pipelining, parallelism, RISC ILP, and RISC/CISC CPI. Source: Doug Burger, UT Austin 2005]

Three Approaches to Parallel Computation in Mathematical Modeling with Modelica
  Automatic fine-grained parallelization of mathematical models (ModPar)
  Coarse-grained explicit parallelization using components (GridModelica)
  Explicit parallel programming constructs in Modelica (NestStepModelica)

Background
  Modelica: the Next Generation Modeling Language

Stored Scientific and Engineering Knowledge
  Model knowledge is stored in books and human minds, which computers cannot access
  "The change of motion is proportional to the motive force impressed" (Newton)

The Form: Equations
  Equations were used in the third millennium B.C.
  Equality sign was introduced by Robert Recorde in 1557
  Newton still wrote text (Principia, vol. 1, 1686): "The change of motion is proportional to the motive force impressed"
  CSSL (1967) introduced a special form of "equation": variable = expression
    v = INTEG(F)/m
  Programming languages usually do not allow equations!

Modelica: The Next Generation Modeling Language
  Declarative language
    Equations and mathematical functions allow acausal modeling, high level specification, increased correctness
  Multi-domain modeling
    Combine electrical, mechanical, thermodynamic, hydraulic, biological, control, event, real-time, etc.
  Everything is a class
    Strongly typed object-oriented language with a general class concept, Java & MATLAB-like syntax
  Visual component programming
    Hierarchical system architecture capabilities
  Efficient, non-proprietary
    Efficiency comparable to C; advanced equation compilation, e.g. 300 000 equations, ~50 000 lines on standard PC

Modelica Language Properties
  Declarative and Object-Oriented
  Equation-based; continuous and discrete equations
  Parallel process modeling of real-time applications, according to synchronous data flow principle
  Functions with algorithms without global side-effects (but local data updates allowed)
  Type system inspired by Abadi/Cardelli
  Everything is a class: Real, Integer, models, functions, packages, parameterized classes, ...

Object Oriented Mathematical Modeling with Modelica
  The static declarative structure of a mathematical model is emphasized
  OO is primarily used as a structuring concept
  OO is not viewed as dynamic object creation and sending messages
  Dynamic model properties are expressed in a declarative way through equations
  Acausal classes support better reuse of modeling and design knowledge than traditional classes

Brief Modelica History
  First Modelica design group meeting in fall 1996
    International group of people with expert knowledge in both language design and physical modeling
    Industry and academia
  Modelica versions
    1.0 released September 1997
    2.0 released March 2002
    Latest version, 2.2, released March 2005
  Modelica Association established 2000
    Open, non-profit organization

Modelica Conferences
  The 1st International Modelica conference, October 2000
  The 2nd International Modelica conference, March 18-19, 2002
  The 3rd International Modelica conference, November 5-6, 2003 in Linköping, Sweden
  The 4th International Modelica conference, March 6-7, 2005 in Hamburg, Germany
  The 5th International Modelica conference, September 4-5, 2006 in Vienna, Austria

Modelica Class Libraries for Reuse: Visual View
  [Figure: icon views of several Modelica class libraries, including electrical, drive train, translational, and 3D mechanical components]

Modelica Model Example: Industry Robot
  [Figure: hierarchical robot model with axes axis1 to axis6; each axis contains controller (r3control), motor (r3motor), and drive (r3drive) components]
  Equations from the joint model shown on the slide:
    Srel = n*transpose(n) + (identity(3) - n*transpose(n))*cos(q) - skew(n)*sin(q);
    wrela = n*qd;
    zrela = n*qdd;
    Sb = Sa*transpose(Srel);
    r0b = r0a;
    vb = Srel*va;
    wb = Srel*(wa + wrela);
    ab = Srel*aa;
    zb = Srel*(za + zrela + cross(wa, wrela));
  Courtesy of ABB Corp. Research and of Martin Otter, DLR

Modelica Model Example: GTX Gas Turbine Power Cutoff Mechanism
  Courtesy of Siemens Industrial Turbomachinery AB

Graphical Modeling / Visual Programming View

Multi-Domain (Electro-Mechanical) Modelica DCMotor Model
  A DC motor can be thought of as an electrical circuit which also contains an electromechanical component.

    model DCMotor
      Resistor R(R=100);
      Inductor L(L=100);
      VsourceDC DC(f=10);
      Ground G;
      ElectroMechanicalElement EM(k=10, J=10, b=2);
      Inertia load;
    equation
      connect(DC.p, R.n);
      connect(R.p, L.n);
      connect(L.p, EM.n);
      connect(EM.p, DC.n);
      connect(DC.n, G.p);
      connect(EM.flange, load.flange);
    end DCMotor;

  [Figure: circuit diagram with components R, L, DC, G, EM, and load]

Corresponding DCMotor Model Equations
  The following equations are automatically derived from the Modelica model (load component not included).
  [The equation listing was an image on the original slide and is not part of this transcript.]
  Automatic transformation to ODE or DAE for simulation.
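Since the slide's equation listing did not survive transcription, here is a hedged partial sketch of what flattening produces for the resistor `R`, the inductor `L`, and the single connection `connect(DC.p, R.n)`. It assumes the `TwoPin`, `Resistor`, and `Inductor` classes and the connector semantics presented later in these slides. It is not the slide's original listing, and the equations for `DC`, `G`, `EM`, and the remaining connections are omitted.

```latex
\begin{align*}
% Resistor R: TwoPin equations plus the constitutive equation
\mathtt{R.v} &= \mathtt{R.p.v} - \mathtt{R.n.v}, &
0 &= \mathtt{R.p.i} + \mathtt{R.n.i}, \\
\mathtt{R.i} &= \mathtt{R.p.i}, &
\mathtt{R.R}\cdot\mathtt{R.i} &= \mathtt{R.v}, \\
% Inductor L: same structure, with a differential constitutive equation
\mathtt{L.v} &= \mathtt{L.p.v} - \mathtt{L.n.v}, &
0 &= \mathtt{L.p.i} + \mathtt{L.n.i}, \\
\mathtt{L.i} &= \mathtt{L.p.i}, &
\mathtt{L.L}\cdot\frac{d\,\mathtt{L.i}}{dt} &= \mathtt{L.v}, \\
% connect(DC.p, R.n): equal potentials, flow variables sum to zero
\mathtt{DC.p.v} &= \mathtt{R.n.v}, &
\mathtt{DC.p.i} + \mathtt{R.n.i} &= 0.
\end{align*}
```

Each `connect` statement contributes one equality per potential variable and one sum-to-zero equation per flow variable. The flat model is therefore a set of algebraic and differential equations, which the tool then sorts into an ODE or DAE.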

The Translation Process
  Modelica Graphical Editor / Modelica Textual Editor: produce the Modelica model and Modelica source code
  Translator: Modelica model to flat model
  Analyzer: flat model to sorted equations
  Optimizer: sorted equations to optimized sorted equations
  Code generator: optimized sorted equations to C code
  C Compiler: C code to executable
  Simulation: runs the executable

Modelica Acausal Modeling
  What is acausal modeling/design?
  Why does it increase reuse?
    The acausality makes Modelica library classes more reusable than traditional classes containing assignment statements, where the input-output causality is fixed.
  Example: a resistor equation
    R*i = v;
  can be used in three ways:
    i := v/R;
    v := R*i;
    R := v/i;
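A minimal sketch of the acausal idea in standard Modelica: both models below contain the same equation `R*i = v`, and only the choice of which quantity is a known parameter differs. The model names and numeric values are illustrative assumptions, not taken from the slides.

```modelica
// Illustrative sketch; model names and parameter values are assumptions.
model OhmSolveForCurrent "v and R known: the tool solves R*i = v for i"
  parameter Real R = 100 "resistance";
  parameter Real v = 10 "voltage";
  Real i "current (the unknown)";
equation
  R*i = v;
end OhmSolveForCurrent;

model OhmSolveForVoltage "i and R known: the same equation is solved for v"
  parameter Real R = 100 "resistance";
  parameter Real i = 0.1 "current";
  Real v "voltage (the unknown)";
equation
  R*i = v;
end OhmSolveForVoltage;
```

With these values the first model gives i = 0.1 and the second v = 10. In an assignment-based language the two cases would need two different statements, whereas here the translator chooses the causality from the context.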

Visual Model Design Using Connector Classes, Components and Connections
  Connector class:
    connector Pin
      Voltage v;
      flow Current i;
    end Pin;
  The keyword flow indicates that the currents of connected pins sum to zero.
  A connect statement in Modelica, connect(Pin1, Pin2), corresponds to:
    Pin1.v = Pin2.v
    Pin1.i + Pin2.i = 0
  [Figure: connection between Pin1 and Pin2]

Common Component Structure as SuperClass
    model TwoPin "Superclass of elements with two electrical pins"
      Pin p, n;
      Voltage v;
      Current i;
    equation
      v = p.v - n.v;
      0 = p.i + n.i;
      i = p.i;
    end TwoPin;

Electrical Components Reuse TwoPin SuperClass
    model Resistor "Ideal electrical resistor"
      extends TwoPin;
      parameter Real R "Resistance";
    equation
      R*i = v;
    end Resistor;

    model Inductor "Ideal electrical inductor"
      extends TwoPin;
      parameter Real L "Inductance";
    equation
      L*der(i) = v;
    end Inductor;

Discrete-time vs. Continuous-time
  Continuous-time variables: change at any point in time
  Discrete-time variables: only change at certain points in time (events)
  [Figure: continuous variable y and discrete variable z plotted over time, with event 1, event 2, and event 3 marked]
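Returning to the electrical component classes, the sketch below puts `Pin`, `TwoPin`, and `Resistor` together into a small self-contained package that a Modelica tool can simulate. The `ConstantVoltage`, `Ground`, and `Divider` classes, the package name, and all numeric values are assumptions added for illustration, and `Real` is used in place of the `Voltage` and `Current` types to keep the example free of library dependencies.

```modelica
// Illustrative sketch; ConstantVoltage, Ground, Divider and all values are assumptions.
package CircuitSketch
  connector Pin
    Real v "potential at the pin";
    flow Real i "current flowing into the component";
  end Pin;

  partial model TwoPin "Superclass of elements with two electrical pins"
    Pin p, n;
    Real v "voltage drop from p to n";
    Real i "current from p to n";
  equation
    v = p.v - n.v;
    0 = p.i + n.i;
    i = p.i;
  end TwoPin;

  model Resistor "Ideal electrical resistor"
    extends TwoPin;
    parameter Real R = 1 "resistance";
  equation
    R*i = v;
  end Resistor;

  model ConstantVoltage "Ideal constant voltage source"
    extends TwoPin;
    parameter Real V = 1 "source voltage";
  equation
    v = V;
  end ConstantVoltage;

  model Ground "Reference potential"
    Pin p;
  equation
    p.v = 0;
  end Ground;

  model Divider "Two equal resistors in series across a 10 V source"
    ConstantVoltage src(V = 10);
    Resistor r1(R = 100);
    Resistor r2(R = 100);
    Ground g;
  equation
    connect(src.p, r1.p);
    connect(r1.n, r2.p);
    connect(r2.n, src.n);
    connect(src.n, g.p);
  end Divider;
end CircuitSketch;
```

Simulating `CircuitSketch.Divider` should give `r2.v = 5`, since the two equal resistors split the source voltage. Only `Resistor` carries a constitutive equation of its own. The pin bookkeeping is inherited from `TwoPin`, which is the reuse the slide title refers to.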

Modelica Discrete and Hybrid Properties
  Discrete event semantics based on conditional equations (if conditions ..., when conditions ..., etc.)
  Discrete event model follows the synchronous data flow principle: events take no time
  Efficient hybrid modeling and simulation possible
  Can handle most discrete formalisms: FSA, DEVS, Petri Nets, State Charts, Queuing models, ...

Recent Book, 2004
  Principles of Object-Oriented Modeling and Simulation with Modelica 2.1
  Wiley-IEEE Press
  940 pages
  Book web page: www.mathcore.com/drmodelica

The OpenModelica Environment

OpenModelica Environment
  The goals of the OpenModelica project are to:
    Create a complete Modelica modeling, compilation and simulation environment.
    Provide free software distributed in binary and source code form.
    Provide a modeling and simulation environment for research, teaching, and industrial purposes.
    Provide a reference implementation of Modelica in Modelica.
  http://www.ida.liu.se/projects/openmodelica

OpenModelica End-Users vs. Developers
  OpenModelica End-Users
    People who use OpenModelica for modeling and simulation
  OpenModelica Developers
    People who develop/contribute to parts in the OpenModelica environment, including the OpenModelica compiler

OpenModelica Environment Architecture
  Eclipse Plugin Editor/Browser
  Emacs Editor/Browser
  Interactive session handler
  Graphical Model Editor/Browser
  Textual Model Editor
  DrModelica OMNotebook Model Editor
  Execution
  Modelica Compiler
  Modelica Debugger

OpenModelica Client-Server Architecture
  Server: main program including compiler, interpreter, etc.; callable via CORBA API
    Modules: Parse, SCode, Inst, Ceval, Interactive
    Interfaces: untyped API, typed checked command API, system, plot, etc.
  Client: Graphic Model Editor
  Client: OMShell Interactive Session Handler
  Client: Eclipse Plugin MDT

Interactive Session Handler on dcmotor Example
  (Session handler called OMShell: OpenModelica Shell)

    >> simulate(dcmotor, startTime=0.0, stopTime=10.0)
    >> plot({load.w, load.phi})

    model dcmotor
      Modelica.Electrical.Analog.Basic.Resistor r(R=10);
      Modelica.Electrical.Analog.Basic.Inductor i;
      Modelica.Electrical.Analog.Basic.EMF emf;
      Modelica.Mechanics.Rotational.Inertia load;
      Modelica.Electrical.Analog.Basic.Ground g;
      Modelica.Electrical.Analog.Sources.ConstantVoltage v;
    equation
      connect(v.p, r.p);
      connect(v.n, g.p);
      connect(r.n, i.p);
      connect(i.n, emf.p);
      connect(emf.n, g.p);
      connect(emf.flange_b, load.flange_a);
    end dcmotor;

Event Handling by OpenModelica: BouncingBall

    >> simulate(BouncingBall, stopTime=3.0);
    >> plot({h, flying});

    model BouncingBall
      parameter Real e=0.7 "coefficient of restitution";
      parameter Real g=9.8 "gravity acceleration";
      Real h(start=1) "height of ball";
      Real v "velocity of ball";
      Boolean flying(start=true) "true, if ball is flying";
      Boolean impact;
      Real v_new;
    equation
      impact = h <= 0.0;
      der(v) = if flying then -g else 0;
      der(h) = v;
      when {h <= 0.0 and v <= 0.0, impact} then
        v_new = if edge(impact) then -e*pre(v) else 0;
        flying = v_new > 0;
        reinit(v, v_new);
      end when;
    end BouncingBall;

OpenModelica Eclipse Plugin in Action: Browsing and Building the OpenModelica Compiler
  Browsing of Modelica/MetaModelica packages, classes, functions
  Automatic building of executables
  Separate compilation
  Syntax highlighting
  Code completion, code query support for developers
  Automatic indentation

Graphic Modelica Model Editor
  From MathCore; runs on Windows, Linux
  Runs together with OpenModelica
  Free for university usage

OpenModelica OMNotebook Electronic Notebook with DrModelica
  Primarily for teaching
  Interactive electronic book
  Platform independent
  OMNotebook does not need Mathematica

Interactive Contents in DrModelica
  Contains examples and exercises from the Modelica book

Cells with both Text and Graphics

OpenModelica MetaModelica Compiler
  Supports extended subset of Modelica
  Used for development of the OpenModelica compiler in Modelica
  Some MetaModelica language properties:
    Modelica syntax and base semantics
    Pattern matching (named/positional)
    Local equations (local within expression)
    Recursive tree data structures
    Lists and tuples
    Garbage collection of heap-allocated data
    Arrays (with local update as in standard Modelica)
    Polymorphic functions
    Function parameters to functions
    Simple builtin exception (failure) handling mechanism

Parallelism in Modelica

Integrating Parallelism and Mathematical Models: Three Approaches
  Automatic Parallelization of Mathematical Models (ModPar)
    Parallelism over the method.
    Parallelism over time.
    Parallelism over the model equation system ...
    ... with fine-grained task scheduling [Peter Aronsson's PhD thesis, June 4, 2006]
  Coarse-Grained Explicit Parallelization Using Components
    Programmer structures the application into computational components using strongly-typed communication interfaces.
    Co-simulation, Transmission-Line Modeling (TLM)
    Our GridModelica project
  Explicit Parallel Programming
    Providing general, easy-to-use explicit parallel programming constructs within the algorithmic part of the modeling language.
    NestStepModelica

Automatic Parallelization of Mathematical Models (ModPar) (with Peter Aronsson)
  Parallelism over the method.
  Parallelism over time.
  Parallelism over the model equation system ...
  ... with fine-grained task scheduling [Peter Aronsson's PhD thesis, June 4, 2006]

Modelica Simulations
  Simulation = solution of (hybrid) DAEs from models:
    $g(\dot{X}, X, Y, t) = 0$
    $h(X, Y, t) = 0$
  In each step of the numerical solver: calculate $\dot{X}$ in g (and Y in h)
  Parallelization approach: perform the calculation of $\dot{X}$ in parallel
  Called parallelization over the system.

Automatic Generation of Parallel Code from Modelica Equation-Based Models
  [Figure: clustered task graph]
  [Figure: speedup versus number of processors (2, 4, 8, 16) for the Thermofluid Pipe application]

Multiprocessor Scheduling Problem
  Given
    Task graph G = (V, E, τ, c)
    A fixed number of processors P1, ..., PN
  Find for each task
    A processor assignment (or several) and a starting time
  such that
    Overall execution time is minimized
  Problem: known scheduling algorithms perform badly on very fine-grained task graphs.
  Solution: increase granularity by merging tasks.

Clustering vs. Merging
  [Figure: a clustered task graph compared with the merged task graph obtained from it; merging combines tasks such as 2, 4, 5 and 3, 6 into single tasks]
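To make the effect of merging on a task graph G = (V, E, τ, c) concrete, the sketch below computes the parallel time (critical path length) of a small graph before and after one merge. It is a simplified illustration, not the graph-rewriting algorithm from the slides. It assumes that tasks are numbered in topological order, that every task may run on its own processor, and that a negative entry in the matrix `c` means "no edge". The package, function, and variable names are hypothetical.

```modelica
// Illustrative sketch only; names, encoding, and example values are assumptions.
package TaskGraphSketch
  function parallelTime "Critical path length of a task graph"
    input Real tau[:] "execution cost of each task, in topological order";
    input Real c[size(tau, 1), size(tau, 1)]
      "c[i,j] >= 0: communication cost of edge i -> j; negative: no edge";
    output Real pt "parallel time with one processor per task";
  protected
    Real tlevel[size(tau, 1)] "longest path from an entry task to the start of each task";
  algorithm
    pt := 0;
    for j in 1:size(tau, 1) loop
      tlevel[j] := 0;
      for i in 1:j - 1 loop
        if c[i, j] >= 0 then
          tlevel[j] := max(tlevel[j], tlevel[i] + tau[i] + c[i, j]);
        end if;
      end for;
      pt := max(pt, tlevel[j] + tau[j]);
    end for;
  end parallelTime;

  model MergeDemo "Chain of three unit-cost tasks with communication cost 5"
    constant Real tau[3] = {1, 1, 1};
    // Edges 1 -> 2 and 2 -> 3, each with communication cost 5
    constant Real cBefore[3, 3] = [-1, 5, -1; -1, -1, 5; -1, -1, -1];
    // Tasks 1 and 2 merged onto one processor: the edge 1 -> 2 costs nothing
    constant Real cAfter[3, 3] = [-1, 0, -1; -1, -1, 5; -1, -1, -1];
    Real ptBefore = parallelTime(tau, cBefore);
    Real ptAfter = parallelTime(tau, cAfter);
  end MergeDemo;
end TaskGraphSketch;
```

For this chain, `ptBefore` evaluates to 13 and `ptAfter` to 8. Removing the communication on a merged edge lowers the top level (tlevel) of every downstream task, which is the quantity a merging criterion tries to reduce. On graphs with real parallelism a merge can also serialize independent tasks, a cost this sketch does not model.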

Fine-Grained Automatic Parallelization: Summary
  A task merging algorithm using a graph rewrite system has been proposed and used
    Improved patterns to increase applicability
    Can easily be integrated in existing scheduling tools
    Successfully merges tasks considering
      Bandwidth & latency
      Task duplication
    Merging criterion: decrease Parallel Time (PT) by decreasing tlevel
  Tested on examples from simulation code
  Speedup (on current examples) e.g. up to 4.5

GridModelica: Coarse-Grained Component-Level Parallel Simulation (with Kaj Nyström)
  Very large system of equations with computational models from several domains:
    Mechanical domain
    Electrical domain
    Chemical domain

Outline of a Partitioned Model
  [Figure: a model partitioned into four subsystems connected by TLM links; each subsystem has its own solver (Dassl, Lsode2, Euler, LAPACK) and its own step size]

Transmission Line Modeling
  Connections in Modelica are ideal, and information exchange takes no time.
  In the real world, this is not the case.
  The idea: let us use this time delay to decouple a model into smaller pieces.
    Instead of one large equation system, we get two (or more).
    The solution of each system depends only on the previous solutions of neighbouring systems.
  Result: many smaller systems which can be solved independently.

Transmission Line Modeling (2)
  c1, c2 are the TLM parameters
  Ttlm is the information propagation time
  Zf is the implicit impedance
  [Figure: transmission line element with v1, i1 and c1 at one end and v2, i2 and c2 at the other]

Advantages
  Partitioning at model level using drag and drop
  No knowledge of parallelization techniques needed
  Reduces model stiffness
  Domain independent (the Modelica way!)
  Separate solvers and settings possible
  ... and of course, better performance
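Returning to the TLM parameters c1, c2, Ttlm, and Zf: the transcript does not include the coupling equations, so the block below gives one commonly used formulation of a lossless TLM element as a hedged sketch. It assumes that the currents i1 and i2 are counted positive into the line; the original slide's figure may use a different sign convention.

```latex
\begin{align*}
% Wave variables: each end only sees delayed information from the other end
c_1(t) &= v_2(t - T_{tlm}) + Z_f\, i_2(t - T_{tlm}) \\
c_2(t) &= v_1(t - T_{tlm}) + Z_f\, i_1(t - T_{tlm}) \\
% Local boundary equations at each end of the line
v_1(t) &= Z_f\, i_1(t) + c_1(t) \\
v_2(t) &= Z_f\, i_2(t) + c_2(t)
\end{align*}
```

Because $c_1$ and $c_2$ depend only on values from $T_{tlm}$ earlier, the subsystem on each side can advance over an interval of length $T_{tlm}$ without waiting for the other. This delay is what allows the subsystems to be solved independently, each with its own solver and step size.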

Small Example Application
  [Figure: example model with a link to SubSystem2]

NestStepModelica: Explicit Parallel Programming in Modelica (with Christoph Kessler and Mattias Eriksson)
  Introduce a simple explicit parallel programming model in Modelica algorithmic code
  BSP: Bulk Synchronous Parallel Programming (master-slave)
  NestStep: BSP + nested parallelism
  NestStepModelica: Modelica + NestStep runtime

BSP: Bulk Synchronous Parallelism
  BSP model (Bulk-Synchronous Parallelism) [Valiant 1990]
  BSP computer: abstract MIMD parallel computer (p, L, g)
    Group of p processors / threads (SPMD)
    Message passing
    Barrier synchronisation, overhead: L
    Communication data rate: g
  BSP program: sequence of supersteps
    $t(\text{prog}) = \sum_{\text{step}} t(\text{step})$
  Superstep:
    Max. computation time per processor: w
    Max. communication volume per processor: h
    $t(\text{step}) = w + h \cdot g + L$

NestStep
  BSP + shared variables / arrays + nested parallelism
  Set of extensions to existing languages
    NestStep-Java [K. 99]
    NestStep-C [K. 00, K. 04]
    NestStep-Modelica
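As a worked instance of the BSP cost model above, the following arithmetic uses made-up machine and program parameters. The values w = 1000, h = 50, g = 4, L = 200 (all in the same time unit) and the count of 10 identical supersteps are assumptions chosen only to illustrate the formula.

```latex
\begin{align*}
t(\text{step}) &= w + h \cdot g + L = 1000 + 50 \cdot 4 + 200 = 1400 \\
t(\text{prog}) &= \sum_{\text{step}} t(\text{step}) = 10 \cdot 1400 = 14000
\end{align*}
```

In this example, communication and synchronisation together account for 400 of the 1400 time units per superstep. Programs with little computation per superstep are therefore dominated by the $h \cdot g + L$ term, which is one reason to keep supersteps coarse.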

NestStep-Modelica: step statement
  Shared variables:

    Integer x ( mem = "shared" );
    ...
    step();
      ... = ... x ...
      ...
      x = ...
    endstep();

  Invariants:
    Superstep synchronicity: all processors of an (active) group work on the same superstep at the same time.
    Superstep memory consistency: at entry to and exit from a superstep, all copies of a shared variable have the same value on all processors of the group. Within a superstep, only the local value is visible.

Implementation Issues
  Spanning tree over all processors of the group
  [Figure: spanning tree over processors P0 to P7]
  2 phases:
    combine: communication upwards
    commit: communication downwards
  Barrier synchronisation included ;-)
  Binomial trees better than d-ary trees [Sohl 06]

Example: Parallel Prefix Sums
  $\text{prefix}_i = \sum_{j=1}^{i-1} a_j$ for $i = 1, \dots, N$
  Example: a = { 1, 3, 1, 4 }, prefix = { 0, 1, 4, 5 }, sum = 9

    // Sequential in linear time:
    sum := 0;
    for i in 1:N loop
      prefix[i] := sum;
      sum := sum + a[i];
    end for;

Parallel Prefix Sums in NestStepModelica (1)

    function parprefix "Compute prefix sums in parallel"
      input Integer[:] arr ( mem="shared", distr="block" );
      output Integer[size(arr,1)] arr ( mem="shared", distr="block" ) := arr;
    protected
      parameter Integer p = nprocessors();
      Integer Ndp = N div p;       // Assume p divides N for simplicity
      Integer[Ndp] prefix;         // Local prefix array
      Integer mypresum;            // Local prefix offset for this processor
      Integer sum ( mem="shared" ); // Shared, consistent at superstep boundary
      Integer i, j;
    algorithm
      ... // the parallel code comes here, see next slide
    end parprefix;
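The sequential loop from the prefix sums example can be packaged as a function in standard Modelica, without any NestStep extensions. This is a minimal sketch: the function name `prefixSums` and the output name `total` are illustrative choices, with `total` used in place of `sum` to avoid shadowing the built-in `sum` operator.

```modelica
// Illustrative sketch in standard Modelica; the names are assumptions.
function prefixSums "Exclusive prefix sums of an integer array, sequential"
  input Integer a[:] "input values";
  output Integer prefix[size(a, 1)] "prefix[i] = a[1] + ... + a[i-1]";
  output Integer total "sum of all elements";
algorithm
  total := 0;
  for i in 1:size(a, 1) loop
    prefix[i] := total;      // sum of the elements before position i
    total := total + a[i];
  end for;
end prefixSums;
```

Calling `(prefix, total) := prefixSums({1, 3, 1, 4})` reproduces the example from the slide: `prefix = {0, 1, 4, 5}` and `total = 9`.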

Parallel Prefix Sums in NestStepModelica (2)
  [Figure: array a block-distributed over processors P0 and P1, showing each processor's local prefix array, local sum, and mypresum, and the shared sum]

    algorithm
      j := 1;                   // In Modelica, array indices start at 1
      step();                   // Start of BSP superstep
        for i in arr loop       // Iterate over local elements
          prefix[j] := sum;
          sum := sum + arr[i];
          j := j + 1;
        end for;
      endstepreduce ( result=sum, op=operators.plus, prefixvar=mypresum );
      endstep();                // End of BSP superstep
      j := 1;
      step();
        for i in arr loop       // Put prefix sum results into arr
          arr[i] := prefix[j] + mypresum;
          j := j + 1;
        end for;
      endstep();
    end parprefix;

Speedup for Parallel Prefix Sums
  [Figure: measured speedup on the Xeon cluster Monolith, NSC Linköping [Sohl 06]]
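The two-superstep structure of the parallel algorithm can be traced in standard Modelica by emulating the p processors with an outer loop. The sketch below is a sequential emulation for checking the logic, not NestStepModelica code and not a parallel implementation. The function name `blockPrefixSums` and its local variables are hypothetical, and, as on the slide, p is assumed to divide the array length.

```modelica
// Sequential emulation of the block-wise algorithm; all names are assumptions.
function blockPrefixSums "Exclusive prefix sums computed block by block"
  input Integer a[:] "input values, length divisible by p";
  input Integer p "number of emulated processors";
  output Integer result[size(a, 1)] "exclusive prefix sums of a";
protected
  Integer ndp = div(size(a, 1), p) "block length per processor";
  Integer localSum[p] "sum of each processor's block";
  Integer offset[p] "sum of all blocks to the left (mypresum on the slide)";
  Integer s;
algorithm
  // Superstep 1: every processor computes prefix sums of its own block
  for k in 1:p loop
    s := 0;
    for j in 1:ndp loop
      result[(k - 1)*ndp + j] := s;
      s := s + a[(k - 1)*ndp + j];
    end for;
    localSum[k] := s;
  end for;
  // Reduction with prefix: what the combine phase at the superstep end delivers
  s := 0;
  for k in 1:p loop
    offset[k] := s;
    s := s + localSum[k];
  end for;
  // Superstep 2: every processor adds its offset to its local results
  for k in 1:p loop
    for j in 1:ndp loop
      result[(k - 1)*ndp + j] := result[(k - 1)*ndp + j] + offset[k];
    end for;
  end for;
end blockPrefixSums;
```

For `a = {1, 3, 1, 4}` and `p = 2`, the local results after the first superstep are `{0, 1}` and `{0, 1}` with block sums 4 and 5, the offsets are 0 and 4, and the final result is `{0, 1, 4, 5}`. The two inner loops over `k` are independent across processors; only the middle reduction requires communication.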

Nested BSP-Parallelism
  Parallelism-creating constructs can be nested:
    statically: e.g. nested parallel loops
    dynamically: e.g. parallel divide-and-conquer computations
    generates massive parallelism
  In NestStep: split the group (= fork a parallel process)

    neststep( nsubgroups = 2, mysubgroup = ... );
      ...
      if (thisgroup().gid==0) foo(); else bar();
      ...
    endneststep();

Summary NestStepModelica
  NestStep run-time
    Shared-memory programming on message-passing systems
    Alternative to OpenMP and UPC
    Simple, deterministic memory consistency and synchronization model
    Structured parallelism
    Run-time system on top of MPI on Linux clusters [Sohl 06]:
      scalable implementation, up to 30x faster than OpenMP on Cluster-DSM with 32 processors
  NestStepModelica
    Expose explicit parallelism at the language level
    Encapsulated in the imperative parts of Modelica code
    Front-end (NestStep-Modelica to C + NestStep RTS) as Modelica-Metamodel extension, under development
  Plans for further targets:
    CC-NUMA
    Chip multiprocessors, e.g. IBM CELL

Conclusions
  Fine-grained automatic parallelization of models
    Now operational and gives speedups. Could give more for larger applications and if the solver were not a single-processor bottleneck.
  Coarse-grained manual component-oriented model parallelization on the GRID (GridModelica)
    Preliminary prototype started running.
  Explicit parallel programming in Modelica algorithmic code (NestStepModelica)
    Run-time system gives speedups compared to OpenMP.
    Integration with OpenModelica compiler under way.

Contact
  www.ida.liu.se/projects/openmodelica: download OpenModelica and DrModelica
  www.mathcore.com/drmodelica: book web page, download book chapter, DrModelica
  www.modelica.org: Modelica Association
  peter.fritzson@ida.liu.se.com
  OpenModelica@ida.liu.se