Parallel Maximum Likelihood Fitting Using MPI
1 Parallel Maximum Likelihood Fitting Using MPI
Brian Meadows, U. Cincinnati, and David Aston, SLAC
RooFit Workshop, SLAC, Dec 6th, 2007
2 What is MPI?
- Message Passing Interface: a standard defined for passing messages between processors (CPUs).
- Communications interface to Fortran, C or C++ (maybe others).
- Definitions apply across different platforms (can mix Unix, Mac, etc.).
- Parallelization of code is explicit: recognized and defined by users.
- Memory can be:
  - shared between CPUs,
  - distributed among CPUs, or
  - a hybrid of these.
- The number of CPUs allowed is not pre-defined, but is fixed in any one application.
- The required number of CPUs is defined by the user at job startup and does not undergo runtime optimization.
RooFit Workshop, SLAC, Dec 6th, 2007. B. Meadows, U. Cincinnati
3 How Efficient is MPI?
- The best you can do is speed up a job by a factor equal to the number of physical CPUs involved.
- Factors limiting this:
  - poor synchronization between CPUs due to unbalanced loads,
  - sections of code that cannot be vectorized,
  - signalling delays.
- NOTE: it is possible to request more CPUs than physically exist. This will produce some overhead in processing, though!
4 Running MPI
- Run the program with
      mpirun -np N <job>
  which submits N identical jobs to the system. (You can also specify IP addresses for distributed CPUs.)
- The OS in each machine allocates physical CPUs dynamically as usual.
- Each job:
  - is given an ID (0 to N-1) which it can access,
  - needs to be in an identical environment to the others.
- Users can use this ID to label a main job ("JOB0", for example) and the remaining "satellite" jobs.
5 Fitting with MPI
- For a fit, each job should be structured to be able to run the parts it is required to do:
  - any set-up (read in events, etc.),
  - the parts that are vectorized (e.g. its group of events or parameters).
- One job needs to be identified as the main one ("JOB0") and must do everything, farming out groups of events or parameters to the others.
- Each satellite job must send results ("signals") back to JOB0 when done with its group, then await a return signal from JOB0 telling it when to start again.
6 How MPI Runs
[Figure: scatter-gather running. Labels: mpirun, Start, Scatter, Gather, Wait; CPU 0, CPU 1, CPU 2, ...]
7 Ways to Implement MPI in Maximum Likelihood Fitting
- Two main alternatives:
  A. Vectorize FCN, which evaluates $f(x) = -2\sum \ln W$
  B. Vectorize MINUIT (which finds the best parameters)
- Alternative A has been used in previous BaBar analyses, e.g. the mixing analysis of $D^0 \to K^+\pi^-$.
- Alternative B is reported here (done by DYAEB and tested by BTM).
- An advantage of B over A is that the vectorization is implemented outside a user's code.
- Vectorizing FCN may not be efficient if an integral is computed on each call, unless the integral evaluation is also vectorized.
8 Vectorize FCN
- The log-likelihood always includes a sum, $f(x) = -2\sum_{i=1}^{n} \ln W_i$, where n = number of events or bins.
- Vectorize computation of the sum in 2 steps ("Scatter-Gather"):
  - Scatter: divide up the events (or bins) among the CPUs. Each CPU k computes the partial sum $f_k = -2\sum_{i \in \text{group } k} \ln W_i$ over its own group.
  - Gather: re-combine the N CPUs, $f = \sum_{k=0}^{N-1} f_k$.
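The scatter-gather sum described above can be sketched in a few lines of free-form Fortran. This is a minimal illustration, not the code used in the BaBar analyses: the toy weights `w(i)`, the contiguous block partition, the program name `fcn_sum`, and the use of `MPI_ALLREDUCE` (a reduction that replaces an explicit gather followed by a sum) are all assumptions made for the example.

```fortran
! Sketch only: splits f = -2*sum(ln W_i) over the MPI ranks and recombines it.
! The array w and the block partition are hypothetical stand-ins for real events.
program fcn_sum
  use mpi
  implicit none
  integer, parameter :: n = 1000        ! number of events (illustrative)
  double precision :: w(n), fpart, ftot
  integer :: ierr, rank, nprocs, i, nper, i1, i2

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)    ! which one am I?
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)  ! how many ranks?

  ! Every rank builds the same toy weights W_i in (0,1].
  do i = 1, n
     w(i) = dble(i) / dble(n)
  end do

  ! Scatter: a contiguous block of events for this rank.
  nper = (n - 1) / nprocs + 1
  i1 = 1 + nper * rank
  i2 = min(n, i1 + nper - 1)

  ! Partial sum over this rank's block (empty if i1 > i2).
  fpart = 0.0d0
  do i = i1, i2
     fpart = fpart - 2.0d0 * log(w(i))
  end do

  ! Gather: add up the partial sums; every rank receives the total.
  call MPI_ALLREDUCE(fpart, ftot, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'f = ', ftot
  call MPI_FINALIZE(ierr)
end program fcn_sum
```

Compiled with an MPI compiler wrapper (for example `mpif90`) and launched under `mpirun`, each rank sums only its own block. The reduction leaves the full value of f on every rank, so all of them can continue with the same number.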
9 Vectorize FCN
- Computation of the integral also needs to be vectorized. This is usually a sum (over bins), so it can be done in a similar way.
- Main advantage of this method: assuming function evaluation dominates CPU cycles, your gain coefficient is close to 1.0, independent of the number of CPUs or parameters.
- Main disadvantage: it requires that the user code each application appropriately.
10 Vectorize MINUIT
- Several algorithms in MINUIT:
  A. MIGRAD (variable metric algorithm): finds the local minimum and the error matrix at that point
  B. SIMPLEX (Nelder-Mead method): direct search method that uses no derivatives
  C. SEEK (MC method): random search, virtually obsolete
- Most often used is MIGRAD, so focus on that. It is easily vectorized, but results may not be at highest efficiency.
11 One iteration in MIGRAD
- Compute function and gradient at the current position.
- Use the current curvature metric to compute a step.
- Take the (large) step.
- Compute function and gradient there, then (cubic) interpolate back to the local minimum (may need to iterate).
- If satisfactory, improve the curvature metric.
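In the usual variable-metric notation, the steps above can be written compactly. The symbols are assumptions introduced for illustration and are not defined in the slides: $g$ is the gradient of $f$ at the current point $x$, $V$ is the current estimate of the inverse second-derivative matrix (proportional to the error matrix), and $\lambda$ is the fraction of the step chosen by the interpolation.

```latex
\begin{align}
  \delta   &= -V\, g(x)
           && \text{(step from the current curvature metric)} \\
  x'       &= x + \delta
           && \text{(trial point; } f \text{ and } g \text{ are evaluated here)} \\
  x_{\min} &\approx x + \lambda\, \delta
           && \text{(interpolated minimum along } \delta \text{)}
\end{align}
```

In schemes of this type, $V$ is then updated from the change in gradient between the old and new points.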
12 One iteration in MIGRAD
- Most of the time is spent in computing the gradient.
- Numerical evaluation of the gradient requires 2 FCN calls per parameter.
- Vectorize this computation in two steps ("Scatter-Gather"):
  - Scatter: divide up the parameters ($x_i$) among the CPUs. Each CPU computes the derivatives for its own group of parameters.
  - Gather: re-combine the N CPUs.
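A symmetric finite difference is one common way to obtain a derivative from two function calls, and it makes the cost of the gradient explicit. The step sizes $d_i$ and unit vectors $\hat{e}_i$ are notation introduced here; the slides do not spell out the exact differencing scheme.

```latex
\begin{equation}
  g_i = \frac{\partial f}{\partial x_i}
      \approx \frac{f(x + d_i \hat{e}_i) - f(x - d_i \hat{e}_i)}{2\, d_i},
  \qquad i = 1, \dots, \mathrm{NPAR}
\end{equation}
```

Each $g_i$ needs two FCN calls that do not depend on the calls for any other parameter. A full gradient therefore costs $2\,\mathrm{NPAR}$ calls, which can be shared among the CPUs at roughly $2\,\mathrm{NPAR}/\mathrm{NCPU}$ calls each.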
13 Vectorize MIGRAD
- This is less efficient the smaller the number of parameters.
- Works well if NPAR is comparable to the number of CPUs.
- Gain ~ NCPU*(NPAR + 2) / (NPAR + 2*NCPU); max. gain = NCPU.
- For 105 parameters a factor 3.7 was gained with 4 CPUs.
[Figure: Gain / Max. Gain (0% to 120%) versus number of CPUs.]
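Dividing the numerator and denominator of the gain formula by NCPU shows its structure, and inserting the quoted numbers gives a quick check. Reading the constant 2 as the part of an iteration that is not distributed is an interpretation, not a statement from the slides.

```latex
\begin{align}
  \text{Gain} &\approx
    \frac{\mathrm{NCPU}\,(\mathrm{NPAR}+2)}{\mathrm{NPAR}+2\,\mathrm{NCPU}}
    = \frac{\mathrm{NPAR}+2}{\mathrm{NPAR}/\mathrm{NCPU}+2} \\
  \mathrm{NPAR}=105,\ \mathrm{NCPU}=4:\quad
  \text{Gain} &\approx \frac{4 \times 107}{105 + 8}
    = \frac{428}{113} \approx 3.79,
  \qquad \frac{\text{Gain}}{\text{Max. Gain}} \approx 0.95
\end{align}
```

The estimate of about 3.79 is close to the observed factor of 3.7. As NPAR grows with NCPU held fixed, the gain tends to its maximum, NCPU.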
14 Initialization of MPI
      Program FIT_Kpipi
C
C-    Maximum likelihood fit of D -> Kpipi Dalitz plot.
C
      Implicit none
      Save
      external fcn
      include 'mpif.h'
C     Declarations assumed here; they are not shown on the slide.
      integer MPIerr, MPIrank, MPIprocs, MPIflag
      MPIerr= 0
      MPIrank= 0
      MPIprocs= 1
      MPIflag= 1
C     Initialize MPI
      call MPI_INIT(MPIerr)
C     Which one am I?
      call MPI_COMM_RANK(MPI_COMM_WORLD, MPIrank, MPIerr)
C     Get number of CPUs
      call MPI_COMM_SIZE(MPI_COMM_WORLD, MPIprocs, MPIerr)
C     ... call MINUIT, etc. ...
      call MPI_FINALIZE(MPIerr)
      End
15 Use of Scatter-Gather Mechanism in MNDERI (Fortran)
C     Distribute the parameters from proc 0 to everyone
   33 call MPI_BCAST(X, NPAR+1, MPI_DOUBLE_PRECISION, 0,
     A     MPI_COMM_WORLD, MPIerr)
C     Use scatter-gather mechanism to compute subset of derivatives
C     in each process:
      nperproc= (NPAR-1)/MPIprocs + 1
      iproc1= 1+nperproc*MPIrank
      iproc2= MIN(NPAR,iproc1+nperproc-1)
      call MPI_SCATTER(GRD, nperproc, MPI_DOUBLE_PRECISION,
     A     GRD(iproc1), nperproc, MPI_DOUBLE_PRECISION,
     A     0, MPI_COMM_WORLD, MPIerr)
C
C     Loop over variable parameters
      DO 60 i=iproc1,iproc2
C        ... compute G(I) ...
   60 CONTINUE
C
C     Wait until everyone is done:
      call MPI_GATHER(GRD(iproc1), nperproc, MPI_DOUBLE_PRECISION,
     A     GRD, nperproc, MPI_DOUBLE_PRECISION,
     A     0, MPI_COMM_WORLD, MPIerr)
C     Everyone but proc 0 goes back to await the next set of parameters
      If (MPIrank.ne.0) GO TO 33
C     Continue computation (CPU 0 only)
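The index arithmetic in the listing above is easy to check in isolation. The following serial sketch makes no MPI calls; the program name `partition_demo` and the fixed values of 105 parameters and 4 processes are chosen only for illustration. It prints the block of parameters that each rank would handle.

```fortran
! Sketch only: reproduces the nperproc / iproc1 / iproc2 arithmetic serially.
! npar and nprocs are fixed illustrative values, not taken from a real job.
program partition_demo
  implicit none
  integer, parameter :: npar = 105, nprocs = 4
  integer :: rank, nperproc, iproc1, iproc2

  ! Block size, rounded up so that nprocs blocks cover all parameters.
  nperproc = (npar - 1) / nprocs + 1
  print *, 'nperproc =', nperproc, ' padded length =', nperproc * nprocs

  ! Loop over the rank IDs that MPI would assign (0 to nprocs-1).
  do rank = 0, nprocs - 1
     iproc1 = 1 + nperproc * rank
     iproc2 = min(npar, iproc1 + nperproc - 1)
     print *, 'rank', rank, ': parameters', iproc1, 'to', iproc2
  end do
end program partition_demo
```

With these values nperproc is 27, so ranks 0 to 2 take 27 parameters each (1 to 27, 28 to 54, 55 to 81) and rank 3 takes the remaining 24 (82 to 105). Since the scatter and gather calls move nperproc elements per rank, the buffer is treated as 4 x 27 = 108 elements long, three more than NPAR, and it needs to be dimensioned accordingly.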