Task farming on Blue Gene
Fiona J. L. Reid

July 3, 2006

Abstract

In this paper we investigate how to implement a trivial task farm on the EPCC eserver Blue Gene/L system, BlueSky. This is achieved by adding a small number of MPI calls to an existing serial code. We illustrate the method using example codes and demonstrate it to be successful by application to a real user code.
Contents

1 Introduction
2 IBM eserver Blue Gene
3 Implementing a trivial task farm on Blue Gene
  3.1 Encapsulate the serial code with MPI calls
4 Test cases - ClockModel code
5 Conclusions
6 Appendix
  6.1 Fortran 90 version of the serial test code
  6.2 Fortran 90 version of the serial test code with MPI calls added
  6.3 C version of the serial test code
  6.4 C version of the serial test code with MPI calls added
1 Introduction

Many serial codes are limited by the total CPU time that they require to run. Often the individual tasks are actually independent of one another and can therefore potentially be run simultaneously (in parallel) on different processors. This approach can greatly reduce the actual time required to obtain a scientific result. For example, consider a code which takes 1 hour to execute and requires 1000 runs to obtain a reliable solution. On a single processor this would require 1000 hours (42 days) of continuous runs. The same result could be obtained in just 1 hour if all the runs can be performed simultaneously using 1000 processors. Distributing the separate runs across many processors in such a way is known as task farming.

Trivial task farming (or job farming) is one of the most common forms of parallelism available. It relies on being able to decompose your problem into a number of identical but independent serial tasks. Essentially, each processor (or node) runs its own copy of the serial code with its own input file(s) and output file(s). There is no communication required between the processes. The trivial task farming method is particularly suited to examining large independent parameter spaces or large independent datasets. Provided all tasks complete at the same time there will be no load imbalance and linear scaling will be obtained. Trivial task farming can be very efficient and on many systems is relatively easy to implement.

For example, a Monte Carlo simulation would be a good candidate for the trivial task farm approach. In a Monte Carlo simulation the same model is typically run many times (with slightly different start points). This allows statistically significant summaries of the overall model behaviour to be built up. As each model takes approximately the same length of time to run, linear scaling will be attainable.
The main advantages and disadvantages of the trivial task farm approach are given below:

Advantages

- Generally easy to implement (on some systems it can be carried out via the batch system directly, e.g. lomond, or via a task farm harness, e.g. HPCx)
- Can be very efficient, provided tasks take the same length of time
- Linear scaling can be achieved
- Existing serial code can be used with minimal modification - in fact, in some situations no modifications to the serial code are required whatsoever
- No communication overheads
- User may not require detailed knowledge of MPI techniques

Disadvantages

- If tasks take different amounts of time then execution time will be governed by the slowest process
- Data/parameter space must be truly independent
- Not ideal for problems requiring communication between processes
- May restrict future code development - e.g. problem size will be limited to that which can fit on a single processor
2 IBM eserver Blue Gene

BlueSky is an IBM eserver Blue Gene/L system consisting of a single cabinet containing 1024 compute chips (nodes). Each compute node consists of a dual-core 700 MHz PowerPC 440 processor with 512 MB of RAM. A compute node can operate in two modes: Coprocessor (CO) mode or Virtual-Node (VN) mode. In Coprocessor mode one core handles communication whilst the other handles computation, with 512 MB main memory available to the compute core. The idea behind this is that it is possible for the programmer to overlap communications and computations and thus obtain optimal performance. In Virtual-Node mode both cores are used simultaneously for computation, with 256 MB main memory available to each core.

In addition to the compute nodes there are also dedicated I/O nodes. The BlueSky service is a relatively I/O rich system and is configured with one I/O node for every eight compute nodes. The compute nodes run a lightweight Linux-derived compute node kernel (CNK). The kernel offers only very limited functionality. The I/O nodes run a full Linux kernel. The rationale is to keep the compute nodes as uninterrupted by the operating system as possible by outsourcing the usual operating system tasks to dedicated additional hardware. For example, on BlueSky the compute nodes (in CO mode) can access 508 MB of the total 512 MB main memory, i.e. the CNK requires 4 MB. By comparison, a single 16-processor node of the HPCx [1, 2] system has 32 GB main memory; however, only 26.9 GB can be accessed by user code, with the rest being required by the operating system.

Finally, there are four front-end nodes which provide the user interface to BlueSky. The front-end nodes consist of an IBM eserver BladeCenter JS20 with 4 blades. The front-end nodes run SUSE Linux and can be used for editing, compilation and job submission. Further details of the BlueSky system can be found at [3].
For the purposes of performing a trivial task farm, users can think of the system as either up to 1024 processors each with 512 MB main memory (CO mode) or up to 2048 processors each with 256 MB main memory (VN mode).

3 Implementing a trivial task farm on Blue Gene

Ideally we would like to run multiple copies (one copy per processor) of a serial code simultaneously, with each copy capable of accessing its own input/output file(s). On many high performance computing (HPC) systems (e.g. lomond [4, 5], various Linux clusters) the batch system can be used to execute multiple serial executables simultaneously, with each running on a different processor. Unfortunately, this is not possible on either the HPCx or Blue Gene systems. Both HPCx and Blue Gene use the IBM scheduling software, LoadLeveler, which does not allow more than one executable to be run simultaneously [6]. On the HPCx system this problem was overcome by using a task farm harness code which allows users to run multiple copies of a serial code with different input/output files without any modification to the serial code. Essentially, the task farm harness code consists of an MPI wrapper code which invokes the serial executable by using the system() function/subroutine, e.g.
in Fortran:

   call system("./serialexename")

or in C:

   int retcode;
   retcode = system("./serialexename");

would run the serial executable serialexename. Due to the reduced operating system installed on the compute nodes, the Blue Gene system does not allow calls to system on the compute nodes (backend). This means that the task farm harness code cannot be used and therefore another method of invoking the serial code must be found. As a result, all of the methods considered in this paper will require some modifications to the serial code.

In testing the different methods of implementing a trivial task farm on Blue Gene we make the following assumptions:

1. The user has an existing serial code which runs on a single Blue Gene node
2. The memory requirements of the serial code do not exceed 512 MB (CO mode) or 256 MB (VN mode)
3. The serial code can have both input and output file(s) or parameter sets
4. The file unit numbers (Fortran) are declared as variables within the serial code. If the file unit numbers are hard-wired then the serial code should be amended and tested prior to the addition of any MPI calls.

To simulate such a problem a simple test code has been written. The test code performs some simple statistical computations on an input dataset. The input data set consists of a vector of data of length nmax. The output file contains the statistics (mean and standard deviation) as computed from the input data. The full source for the test code is given in the Appendix.

Several different approaches to implementing a task farm on Blue Gene are investigated:

1. Encapsulate the serial code with MPI calls
2. Place the serial code inside a function and call this function from an MPI template code

Both these approaches require very careful consideration of how file I/O is handled.

3.1 Encapsulate the serial code with MPI calls

For this approach an existing serial code is encapsulated with MPI calls. The encapsulated code will then be able to run on any number of processors.
Input/output files require careful consideration. Essentially the procedure for a Fortran code is as follows:

Add

   include "mpif.h"

directly after the implicit none statement.
Add the following block of code directly after all type declarations:

   ! MPI related declarations
   integer :: errcode, rank

   ! New declarations to handle input/output files
   character (len=4) :: dir_id

   ! Initialise MPI
   call MPI_INIT(errcode)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, errcode)

Add

   call MPI_FINALIZE(errcode)

directly before the end program statement.

These changes will enable a copy of the serial code to run on each processor simultaneously. However, the input and output file names still require further consideration. Without further modification each processor will attempt to open the file taskfarm_data.dat and will attempt to write output to the file taskfarm_results.output. Clearly, this would result in the same input file being read in by all processors when in fact the user may wish a different file to be read in on each processor. Additionally, as all processors will attempt to write to the same output file, output generated on one processor could potentially be over-written by another processor. Therefore, some method of distinguishing which files are read from/written to by each processor is required.

Probably the simplest way to achieve this is to place the input/output files in directories which are labelled in accordance with their MPI rank (or label the files in accordance with their MPI rank). The procedure for doing this for a Fortran code is as follows. We use the MPI rank to define a character variable, dir_id, which will be used to determine the directory which contains the input/output files for a particular process, e.g.

   write(dir_id,'(i4.4)') rank

This statement should be executed after the call to MPI_INIT and before any file open statements. As each MPI process has its own copy of the input/output file unit numbers, iounit_in and iounit_out, we do not need to change the file unit numbers (or file pointers for C/C++ codes); we only need to change the file names.
We then modify all references to the input/output file names within the code so that they are preceded by "dir"//dir_id// with an additional / directly before the original file name, e.g.

   open(unit=iounit_in, file = "dir"//dir_id//"/taskfarm_data.dat")

Finally, before running the code you will need to ensure that the correct number of dir???? directories have been created and that the relevant input files are placed inside these directories. For example, if 4 processors are used, then the following directories need to be created prior to running the code: dir0000, dir0001, dir0002 and dir0003. The relevant input file(s) also need to be copied/moved into the relevant directory. A Unix shell script could be used to achieve this. It may also be possible to achieve this via the LoadLeveler batch script.

A full version of the modified code is contained in the Appendix. It should be noted that the main body of the serial code remains completely unchanged. With the exception of the file name specific modifications, all modifications occur at the beginning and end of the code.

If the user doesn't want to place the input/output files into separate directories then the input and output filenames can simply be appended with the rank as follows:

   open(unit=iounit_in, file = "taskfarm_data"//dir_id//".dat")

C or C++ codes can also be treated in a similar manner. A simple C example (testcode_serial.c) and a corresponding modified code containing the required MPI calls (testcode_serial_MPIwrapper.c), which allow the code to be run on several processors, are included in the Appendix for reference.

4 Test cases - ClockModel code

To investigate the ease of applying this approach we have tested the method described in this paper on a real user supplied code. The code is a serial C code which models the biological clock of plants and was supplied by Professor Andrew Millar, University of Edinburgh. The serial code exists as a number of source files (*.c) and a single header file (*.h) to which all the source files refer. The serial code reads in a number of input files (up to 4) and writes to 3 output files. One of these output files is used for both input and output as the code executes (e.g. a solution is written out and subsequently re-read). The input and output files are opened from a number of different source files and therefore careful consideration of variable scope is required. The procedure used to implement a trivial task farm on the ClockModel code was very similar to that described in Section 3.1.
The only additional complication arose from the fact that the input/output files are opened from both the main program and other functions outwith the scope of the main program. This means that the character variable (char dir_id[5] in the sample code, allowing for the null terminator) used to control the output directory needs to be in global scope. This can be achieved by specifying this variable as an external variable within the header file (e.g. extern char dir_id[]) and then defining the variable prior to main(), e.g.

   #include "headerfile.h"

   char dir_id[5];

   int main(int argc, char* argv[])
   ...

We have successfully tested the trivial task farm on BlueSky by verifying that the same (or similar) results are obtained on all processors when using identical input files on each processor. Due to the nature of the code identical results cannot be obtained, as a random number generator is used.
5 Conclusions

Trivial task farming of a serial code can be performed relatively easily on the BlueSky machine, allowing users to utilise a large number of processors simultaneously. Minimal modification to an existing serial code is required and no detailed MPI knowledge is needed. The method has been tested successfully on a real user application.

Acknowledgements

We would like to acknowledge the following for their support and assistance: Mark Bull and Joachim Hein.

References

[1] User's Guide to the HPCx Service (Version 2.02)
[2] HPCx web page
[3] User Guide to EPCC's BlueGene/L Service (Version 1.0), bgapps/userguide/bguser/bguser.html
[4] Introduction to the University of Edinburgh HPC Service (Version 3.00)
[5] Lomond web page
[6] S. Kannan, P. Mayes, M. Roberts, D. Brelsford, and J. F. Skovira (2001). Workload Management with LoadLeveler, IBM Corp, SG
6 Appendix

6.1 Fortran 90 version of the serial test code

! Serial code which is used to test various ways of performing a trivial
! task farm on Blue Gene. The code reads in a vector of data from the input
! file with unit iounit_in and writes out the mean and standard deviation to
! the output file with unit iounit_out.
! This test code attempts to simulate a typical user code that may be
! appropriate for trivial task farming.

program testcode_serial
  implicit none

  integer, parameter :: nmax = 10
  real, dimension(nmax) :: adata
  real :: stddev = 0.0, mean = 0.0
  integer :: i
  integer :: iounit_in, iounit_out

  iounit_in = 10    ! Input data
  iounit_out = 11   ! Output data

  ! Open input and output data files
  open(unit=iounit_in, file = "taskfarm_data.dat")
  open(unit=iounit_out, file = "taskfarm_results.output")

  ! Read in input data from file with unit number iounit_in
  do i = 1, nmax
     read(iounit_in,*,err=100) adata(i)
  end do
100 continue
  write(*,*) "total number of points read from file = ", i-1

  ! Close input file
  close(iounit_in)

  ! Compute mean and standard deviation
  mean = sum(adata)/nmax
  do i = 1, nmax
     stddev = stddev + (adata(i) - mean)**2.0d0
  end do
  stddev = sqrt(stddev/nmax)

  ! Write results to output file with unit number iounit_out
  write(iounit_out,101) "standard deviation = ", stddev, " mean = ", mean
101 format(a21,x,f10.3,a8,x,f10.3)

  ! Close output file
  close(iounit_out)

end program testcode_serial

6.2 Fortran 90 version of the serial test code with MPI calls added

! Serial code with MPI calls inserted which will be used to perform a trivial
! task farm on Blue Gene. The code reads in a vector of data from the input
! file with unit iounit_in and writes out the mean and standard deviation to
! the output file with unit iounit_out.
! This test code attempts to simulate a typical user code that may be
! appropriate for trivial task farming and includes the necessary MPI calls
! in order to run the code on multiple processors.

program testcode_serial_mpiwrapper
  implicit none
  include "mpif.h"

  integer, parameter :: nmax = 10
  real, dimension(nmax) :: adata
  real :: stddev = 0.0, mean = 0.0
  integer :: i
  integer :: iounit_in, iounit_out

  ! MPI related declarations
  integer :: errcode, rank

  ! New declarations to handle input/output files
  character (len=4) :: dir_id

  iounit_in = 10    ! Input data
  iounit_out = 11   ! Output data

  ! Initialise MPI
  call MPI_INIT(errcode)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, errcode)

  ! Use the rank to define the directory name
  write(dir_id,'(i4.4)') rank

  ! Open input and output data files in directory dir****. The value
  ! of "****" is determined by the rank of the process
  open(unit=iounit_in, file = "dir"//dir_id//"/taskfarm_data.dat")
  open(unit=iounit_out, file = "dir"//dir_id//"/taskfarm_results.output")

  ! Read in input data from file with unit number iounit_in
  do i = 1, nmax
     read(iounit_in,*,err=100) adata(i)
  end do
100 continue
  write(*,*) "total number of points read from file = ", i-1

  ! Close input file
  close(iounit_in)

  ! Compute mean and standard deviation
  mean = sum(adata)/nmax
  do i = 1, nmax
     stddev = stddev + (adata(i) - mean)**2.0d0
  end do
  stddev = sqrt(stddev/nmax)

  ! Write results to output file with unit number iounit_out
  write(iounit_out,101) "standard deviation = ", stddev, " mean = ", mean
101 format(a21,x,f10.3,a8,x,f10.3)

  ! Close output file
  close(iounit_out)

  ! Finalise MPI
  call MPI_FINALIZE(errcode)

end program testcode_serial_mpiwrapper

6.3 C version of the serial test code

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#define nmax 10

int main()
{
    float adata[nmax];
    float stddev = 0.0, mean = 0.0;
    int i;
    int count = 0;
    char fnamein[100], fnameout[100];
    FILE *fpin, *fpout;

    /* Open input and output files */
    sprintf(fnamein, "taskfarm_data.dat");
    if (NULL == (fpin = fopen(fnamein, "r"))) {
        fprintf(stderr, "Cannot open <%s>\n", fnamein);
        exit(-1);
    }
    sprintf(fnameout, "taskfarm_results.output");
    if (NULL == (fpout = fopen(fnameout, "w"))) {
        fprintf(stderr, "Cannot open <%s>\n", fnameout);
        exit(-1);
    }

    /* Read in input data from file with file pointer fpin */
    for (i = 0; i < nmax; i++) {
        fscanf(fpin, "%f", &adata[i]);
        count = count + 1;
    }
    printf("total number of points read from file = %d \n", count);

    /* Close the input file */
    fclose(fpin);

    /* Compute mean and standard deviation */
    for (i = 0; i < nmax; i++) {
        mean = mean + adata[i];
    }
    mean = mean/nmax;

    for (i = 0; i < nmax; i++) {
        stddev = stddev + (adata[i] - mean)*(adata[i] - mean);
    }
    stddev = sqrt(stddev/nmax);

    /* Write results to output file */
    fprintf(fpout, "standard deviation =%8.3f, mean = %8.3f \n", stddev, mean);

    /* Close output file */
    fclose(fpout);

    return 0;
}

6.4 C version of the serial test code with MPI calls added

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <mpi.h>
#define nmax 10

int main(int argc, char *argv[])
{
    float adata[nmax];
    float stddev = 0.0, mean = 0.0;
    int i;
    int count = 0;
    char fnamein[100], fnameout[100];
    FILE *fpin, *fpout;

    /* MPI related declarations */
    int rank;

    /* New declarations to handle input/output files */
    char dir_id[5];

    /* Initialise MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Create character variable dir_id to control input/output directory */
    sprintf(dir_id, "%4.4d", rank);

    /* Open input and output files */
    sprintf(fnamein, "dir%s/taskfarm_data.dat", dir_id);
    if (NULL == (fpin = fopen(fnamein, "r"))) {
        fprintf(stderr, "Cannot open <%s>\n", fnamein);
        exit(-1);
    }

    sprintf(fnameout, "dir%s/taskfarm_results.output", dir_id);
    if (NULL == (fpout = fopen(fnameout, "w"))) {
        fprintf(stderr, "Cannot open <%s>\n", fnameout);
        exit(-1);
    }

    /* Read in input data from file with file pointer fpin */
    for (i = 0; i < nmax; i++) {
        fscanf(fpin, "%f", &adata[i]);
        count = count + 1;
    }
    printf("total number of points read from file = %d \n", count);
    /* Close the input file */
    fclose(fpin);

    /* Compute mean and standard deviation */
    for (i = 0; i < nmax; i++) {
        mean = mean + adata[i];
    }
    mean = mean/nmax;

    for (i = 0; i < nmax; i++) {
        stddev = stddev + (adata[i] - mean)*(adata[i] - mean);
    }
    stddev = sqrt(stddev/nmax);

    /* Write results to output file */
    fprintf(fpout, "standard deviation =%8.3f, mean = %8.3f \n", stddev, mean);

    /* Close output file */
    fclose(fpout);

    /* Finalise MPI */
    MPI_Finalize();

    return 0;
}
More informationUNIVERSITY OF NEBRASKA AT OMAHA Computer Science 4500/8506 Operating Systems Fall Programming Assignment 1 (updated 9/16/2017)
UNIVERSITY OF NEBRASKA AT OMAHA Computer Science 4500/8506 Operating Systems Fall 2017 Programming Assignment 1 (updated 9/16/2017) Introduction The purpose of this programming assignment is to give you
More informationOpenMP and MPI. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.
OpenMP and MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico November 16, 2011 CPD (DEI / IST) Parallel and Distributed Computing 18
More informationMPI and comparison of models Lecture 23, cs262a. Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018
MPI and comparison of models Lecture 23, cs262a Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018 MPI MPI - Message Passing Interface Library standard defined by a committee of vendors, implementers,
More informationProgramming with MPI. Pedro Velho
Programming with MPI Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage - Who might be interested in those applications?
More informationCSE 374 Programming Concepts & Tools
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2017 Lecture 8 C: Miscellanea Control, Declarations, Preprocessor, printf/scanf 1 The story so far The low-level execution model of a process (one
More informationTHE UNIVERSITY OF WESTERN ONTARIO. COMPUTER SCIENCE 211a FINAL EXAMINATION 17 DECEMBER HOURS
Computer Science 211a Final Examination 17 December 2002 Page 1 of 17 THE UNIVERSITY OF WESTERN ONTARIO LONDON CANADA COMPUTER SCIENCE 211a FINAL EXAMINATION 17 DECEMBER 2002 3 HOURS NAME: STUDENT NUMBER:
More informationHolland Computing Center Kickstart MPI Intro
Holland Computing Center Kickstart 2016 MPI Intro Message Passing Interface (MPI) MPI is a specification for message passing library that is standardized by MPI Forum Multiple vendor-specific implementations:
More informationMPI 1. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 1 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationParallel Computing: Overview
Parallel Computing: Overview Jemmy Hu SHARCNET University of Waterloo March 1, 2007 Contents What is Parallel Computing? Why use Parallel Computing? Flynn's Classical Taxonomy Parallel Computer Memory
More informationEarly experience with Blue Gene/P. Jonathan Follows IBM United Kingdom Limited HPCx Annual Seminar 26th. November 2007
Early experience with Blue Gene/P Jonathan Follows IBM United Kingdom Limited HPCx Annual Seminar 26th. November 2007 Agenda System components The Daresbury BG/P and BG/L racks How to use the system Some
More informationParallel Programming Overview
Parallel Programming Overview Introduction to High Performance Computing 2019 Dr Christian Terboven 1 Agenda n Our Support Offerings n Programming concepts and models for Cluster Node Core Accelerator
More informationSHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008
SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem
More informationProgramming with MPI on GridRS. Dr. Márcio Castro e Dr. Pedro Velho
Programming with MPI on GridRS Dr. Márcio Castro e Dr. Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage -
More informationLesson 1. MPI runs on distributed memory systems, shared memory systems, or hybrid systems.
The goals of this lesson are: understanding the MPI programming model managing the MPI environment handling errors point-to-point communication 1. The MPI Environment Lesson 1 MPI (Message Passing Interface)
More informationPGAS: Partitioned Global Address Space
.... PGAS: Partitioned Global Address Space presenter: Qingpeng Niu January 26, 2012 presenter: Qingpeng Niu : PGAS: Partitioned Global Address Space 1 Outline presenter: Qingpeng Niu : PGAS: Partitioned
More informationParallel Performance of the XL Fortran random_number Intrinsic Function on Seaborg
LBNL-XXXXX Parallel Performance of the XL Fortran random_number Intrinsic Function on Seaborg Richard A. Gerber User Services Group, NERSC Division July 2003 This work was supported by the Director, Office
More informationBIL 104E Introduction to Scientific and Engineering Computing. Lecture 14
BIL 104E Introduction to Scientific and Engineering Computing Lecture 14 Because each C program starts at its main() function, information is usually passed to the main() function via command-line arguments.
More informationFaculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering
Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Course Number EE 8218 011 Section Number 01 Course Title Parallel Computing
More informationIntroduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc.
Introduction to MPI SHARCNET MPI Lecture Series: Part I of II Paul Preney, OCT, M.Sc., B.Ed., B.Sc. preney@sharcnet.ca School of Computer Science University of Windsor Windsor, Ontario, Canada Copyright
More informationBuilding Library Components That Can Use Any MPI Implementation
Building Library Components That Can Use Any MPI Implementation William Gropp Mathematics and Computer Science Division Argonne National Laboratory Argonne, IL gropp@mcs.anl.gov http://www.mcs.anl.gov/~gropp
More informationFINAL TERM EXAMINATION SPRING 2010 CS304- OBJECT ORIENTED PROGRAMMING
FINAL TERM EXAMINATION SPRING 2010 CS304- OBJECT ORIENTED PROGRAMMING Question No: 1 ( Marks: 1 ) - Please choose one Classes like TwoDimensionalShape and ThreeDimensionalShape would normally be concrete,
More informationPROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18
PROCESS VIRTUAL MEMORY CS124 Operating Systems Winter 2015-2016, Lecture 18 2 Programs and Memory Programs perform many interactions with memory Accessing variables stored at specific memory locations
More informationParallel Programming. Libraries and Implementations
Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationFile IO and command line input CSE 2451
File IO and command line input CSE 2451 File functions Open/Close files fopen() open a stream for a file fclose() closes a stream One character at a time: fgetc() similar to getchar() fputc() similar to
More informationOverview Interactive Data Language Design of parallel IDL on a grid Design of IDL clients for Web/Grid Service Status Conclusions
GRIDL: High-Performance and Distributed Interactive Data Language Svetlana Shasharina, Ovsei Volberg, Peter Stoltz and Seth Veitzer Tech-X Corporation HPDC 2005, July 25, 2005 Poster Overview Interactive
More informationProgramming Techniques for Supercomputers. HPC RRZE University Erlangen-Nürnberg Sommersemester 2018
Programming Techniques for Supercomputers HPC Services @ RRZE University Erlangen-Nürnberg Sommersemester 2018 Outline Login to RRZE s Emmy cluster Basic environment Some guidelines First Assignment 2
More informationIBM PSSC Montpellier Customer Center. Content
Content IBM PSSC Montpellier Customer Center Standard Tools Compiler Options GDB IBM System Blue Gene/P Specifics Core Files + addr2line Coreprocessor Supported Commercial Software TotalView Debugger Allinea
More informationCS Operating Systems Lab 3: UNIX Processes
CS 346 - Operating Systems Lab 3: UNIX Processes Due: February 15 Purpose: In this lab you will become familiar with UNIX processes. In particular you will examine processes with the ps command and terminate
More informationNIOS CPU Based Embedded Computer System on Programmable Chip
1 Objectives NIOS CPU Based Embedded Computer System on Programmable Chip EE8205: Embedded Computer Systems This lab has been constructed to introduce the development of dedicated embedded system based
More informationContents. Chapter 1 Overview of the JavaScript C Engine...1. Chapter 2 JavaScript API Reference...23
Contents Chapter 1 Overview of the JavaScript C Engine...1 Supported Versions of JavaScript...1 How Do You Use the Engine?...2 How Does the Engine Relate to Applications?...2 Building the Engine...6 What
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Outline of the workshop 2 Practical Introduction to Message-Passing Interface (MPI) Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@mcgill.ca Theoretical / practical introduction Parallelizing your
More informationAssignment 3 MPI Tutorial Compiling and Executing MPI programs
Assignment 3 MPI Tutorial Compiling and Executing MPI programs B. Wilkinson: Modification date: February 11, 2016. This assignment is a tutorial to learn how to execute MPI programs and explore their characteristics.
More informationCompute Cluster Server Lab 2: Carrying out Jobs under Microsoft Compute Cluster Server 2003
Compute Cluster Server Lab 2: Carrying out Jobs under Microsoft Compute Cluster Server 2003 Compute Cluster Server Lab 2: Carrying out Jobs under Microsoft Compute Cluster Server 20031 Lab Objective...1
More informationHello, World! in C. Johann Myrkraverk Oskarsson October 23, The Quintessential Example Program 1. I Printing Text 2. II The Main Function 3
Hello, World! in C Johann Myrkraverk Oskarsson October 23, 2018 Contents 1 The Quintessential Example Program 1 I Printing Text 2 II The Main Function 3 III The Header Files 4 IV Compiling and Running
More informationSupercomputing in Plain English
Supercomputing in Plain English An Introduction to High Performance Computing Part VI: Distributed Multiprocessing Henry Neeman, Director The Desert Islands Analogy Distributed Parallelism MPI Outline
More informationACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009
ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 What is ACEnet? Shared resource......for research computing... physics, chemistry, oceanography, biology, math, engineering,
More informationFractals exercise. Investigating task farms and load imbalance
Fractals exercise Investigating task farms and load imbalance Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationName :. Roll No. :... Invigilator s Signature : INTRODUCTION TO PROGRAMMING. Time Allotted : 3 Hours Full Marks : 70
Name :. Roll No. :..... Invigilator s Signature :.. 2011 INTRODUCTION TO PROGRAMMING Time Allotted : 3 Hours Full Marks : 70 The figures in the margin indicate full marks. Candidates are required to give
More informationMPI Mechanic. December Provided by ClusterWorld for Jeff Squyres cw.squyres.com.
December 2003 Provided by ClusterWorld for Jeff Squyres cw.squyres.com www.clusterworld.com Copyright 2004 ClusterWorld, All Rights Reserved For individual private use only. Not to be reproduced or distributed
More informationPCAP Assignment I. 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail.
PCAP Assignment I 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail. The multicore CPUs are designed to maximize the execution speed
More informationSharpen Exercise: Using HPC resources and running parallel applications
Sharpen Exercise: Using HPC resources and running parallel applications Contents 1 Aims 2 2 Introduction 2 3 Instructions 3 3.1 Log into ARCHER frontend nodes and run commands.... 3 3.2 Download and extract
More informationC Compilation Model. Comp-206 : Introduction to Software Systems Lecture 9. Alexandre Denault Computer Science McGill University Fall 2006
C Compilation Model Comp-206 : Introduction to Software Systems Lecture 9 Alexandre Denault Computer Science McGill University Fall 2006 Midterm Date: Thursday, October 19th, 2006 Time: from 16h00 to 17h30
More informationCSE 303 Midterm Exam
CSE 303 Midterm Exam October 29, 2008 Name Sample Solution The exam is closed book, except that you may have a single page of hand written notes for reference. If you don t remember the details of how
More informationPROGRAMMAZIONE I A.A. 2017/2018
PROGRAMMAZIONE I A.A. 2017/2018 FUNCTIONS INTRODUCTION AND MAIN All the instructions of a C program are contained in functions. üc is a procedural language üeach function performs a certain task A special
More informationPreview from Notesale.co.uk Page 6 of 52
Binary System: The information, which it is stored or manipulated by the computer memory it will be done in binary mode. RAM: This is also called as real memory, physical memory or simply memory. In order
More informationKilling Zombies, Working, Sleeping, and Spawning Children
Killing Zombies, Working, Sleeping, and Spawning Children CS 333 Prof. Karavanic (c) 2015 Karen L. Karavanic 1 The Process Model The OS loads program code and starts each job. Then it cleans up afterwards,
More informationCpSc 1010, Fall 2014 Lab 10: Command-Line Parameters (Week of 10/27/2014)
CpSc 1010, Fall 2014 Lab 10: Command-Line Parameters (Week of 10/27/2014) Goals Demonstrate proficiency in the use of the switch construct and in processing parameter data passed to a program via the command
More informationMessage Passing Interface (MPI)
CS 220: Introduction to Parallel Computing Message Passing Interface (MPI) Lecture 13 Today s Schedule Parallel Computing Background Diving in: MPI The Jetson cluster 3/7/18 CS 220: Parallel Computing
More informationDocker task in HPC Pack
Docker task in HPC Pack We introduced docker task in HPC Pack 2016 Update1. To use this feature, set the environment variable CCP_DOCKER_IMAGE of a task so that it could be run in a docker container on
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationIntroduction to Parallel Programming Message Passing Interface Practical Session Part I
Introduction to Parallel Programming Message Passing Interface Practical Session Part I T. Streit, H.-J. Pflug streit@rz.rwth-aachen.de October 28, 2008 1 1. Examples We provide codes of the theoretical
More informationLecture 7: Distributed memory
Lecture 7: Distributed memory David Bindel 15 Feb 2010 Logistics HW 1 due Wednesday: See wiki for notes on: Bottom-up strategy and debugging Matrix allocation issues Using SSE and alignment comments Timing
More informationCS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco
CS 326 Operating Systems C Programming Greg Benson Department of Computer Science University of San Francisco Why C? Fast (good optimizing compilers) Not too high-level (Java, Python, Lisp) Not too low-level
More informationCS342 - Spring 2019 Project #3 Synchronization and Deadlocks
CS342 - Spring 2019 Project #3 Synchronization and Deadlocks Assigned: April 2, 2019. Due date: April 21, 2019, 23:55. Objectives Practice multi-threaded programming. Practice synchronization: mutex and
More informationNAG Library Function Document nag_dtr_load (f16qgc)
1 Purpose NAG Library Function Document nag_dtr_load () nag_dtr_load () initializes a real triangular matrix. 2 Specification #include #include void nag_dtr_load (Nag_OrderType order,
More information