Advanced and Parallel Python
1 Advanced and Parallel Python
December 1st, By: Bart Oldeman and Pier-Luc St-Onge
2 Financial Partners
3 Setup for the workshop
1. Get a user ID and password paper (provided in class):
   ##: usernm XXXXXXXXXX **********
2. Access to local computer (replace ## and with appropriate values, is provided in class):
   a. User name: csuser##
   b.
3. HTTPS connection to Colosse (replace **********):
   a.
   b. User name: usernm
   c. Password: **********
   d. If requested:
      i. click Start Server button, set walltime 8
4 Select Modules, Change Notebook Kernel
In the Software tab, select:
- compilers/llvm/3.7.1
- compilers/gcc/4.8.5
Open notebooks/01-stack.ipynb
File -> Save and Checkpoint
5 Import Examples and Exercises
In case the cq-formation-advanced-python folder is not in your home directory, open a Terminal and type:
    module load apps/git/  # If on Colosse
    git clone -b ulaval \
    cd cq-formation-advanced-python
6 Outline
Revisiting the Scientific Python Stack
- Why (and What) is Python?
- Accelerating Python code: PyPy and Numpy
- Using C code from Python code
- Finding Bottlenecks - Profiling code
- Compiling Python Code Using Cython and Numba
Parallelizing Python Programs
- Parallel Programming Concepts
- The multiprocessing Module
- MPI for Python (mpi4py)
7 The Scientific Python stack
8 Scientific Python stack
In the introductory workshop we looked at:
- Python itself
- NumPy, for numerical array objects
- SciPy, for higher-level routines
- IPython, an advanced Python shell
- Matplotlib, for plotting
On top of that we introduce some new components, for example:
- Cython, for speed and interfacing
- mpi4py, for using MPI in Python
9 Speeding up Python programs
10 Speeding up Python
Central example: approx_pi.c / approx_pi.py:

    // approx_pi.c
    double approx_pi(int intervals)
    {
        double pi = 0.0;
        int i;
        for (i = 0; i < intervals; i++) {
            pi += (4 - ((i % 2) * 8)) / (double)(2 * i + 1);
        }
        return pi;
    }

    # approx_pi.py
    def approx_pi(intervals):
        pi = 0.0
        for i in range(intervals):
            pi += (4-8 * (i % 2)) / (float)(2 * i + 1)
        return pi
11 Speeding up Python
Compile:
    $ gcc -O2 pi_collect.c approx_pi.c -o pi_collect
    $ ./pi_collect
    Time = 0.88 sec
Python run (example on Guillimin):
    $ module load iomkl/2015b Python/3.5.0
    $ python pi_collect.py approx_pi
The compiled C code runs about 75 times faster than the Python code (0.88 vs. 66 seconds with intervals = ).
Note that approx_pi is the module to import for pi_collect.py.
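The slides do not show pi_collect.py itself. The following is only a minimal sketch of what such a timing driver could look like, assuming it imports a module by name and calls its approx_pi function; the helper names timed and load_approx_pi are hypothetical and not taken from the workshop material.

```python
import importlib
import time


def approx_pi(intervals):
    """Pure-Python reference: the same series as approx_pi.py."""
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi


def timed(func, intervals):
    """Call func(intervals) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(intervals)
    return result, time.perf_counter() - start


def load_approx_pi(module_name):
    """Hypothetical helper: import a module by name, return its approx_pi."""
    return importlib.import_module(module_name).approx_pi


# Time the pure-Python version defined above.
result, seconds = timed(approx_pi, 100000)
print("pi ~ %.6f, time = %.3f sec" % (result, seconds))
```

With a compiled or vectorized variant on the import path, the same measurement would read timed(load_approx_pi("approx_pi_numpy"), 100000), so every implementation is timed through one code path.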
12 Speeding up Python
How to speed up: two approaches
1. Make Python go faster
   a. Use the PyPy just-in-time compiler
   b. Use Numpy with vectorized code
   c. Use Cython
2. Call C code from Python
   a. Manually
   b. Use SWIG
   c. Use Ctypes
   d. Use Cython
   e. ...
13 Speeding up Python using PyPy
How to speed up: use PyPy:
    $ module add pypy/
    $ pypy3 pi_collect.py approx_pi
gives 2.2 seconds (30 times faster).
An alternative to PyPy is Numba (not installed on Guillimin).
14 Speeding up with numpy
How to speed up: use vectorized code:

    from __future__ import division  # only needed for Python 2.x
    import numpy

    def approx_pi(intervals):
        pi1 = 4/numpy.arange(1, intervals*2, 4)   # positive terms: 4/1, 4/5, 4/9, ...
        pi2 = -4/numpy.arange(3, intervals*2, 4)  # negative terms: -4/3, -4/7, ...
        return numpy.sum(pi1) + numpy.sum(pi2)

    $ python3 pi_collect.py approx_pi_numpy
gives 1.4 seconds (47 times faster).
Drawback: extra memory use.
How to speed up: Cython: see later
15 Interfacing with C/C++/Fortran
16 Interfacing with C and C++
There are at least 14 different ways to do it:
- By hand using the Python API (*)
- Pyrex
- Cython (**)
- SWIG (*)
- SIP
- Boost.Python
- PyCXX
- CTypes (*)
- Py++
- f2py (*)
- PyD
- Interrogate
- Robin
- Pybind11
(*) Quick introduction
(**) Most popular now, more thorough introduction
17 Using the Python API
Pros: no extra dependencies
Cons: a lot of boilerplate code, which can change between Python versions

    /* Example of wrapping approx_pi() with the Python-C-API. */
    #include <Python.h>
    #include "approx_pi.h"

    /* wrapped approx_pi() */
    static PyObject* approx_pi_func(PyObject* self, PyObject* args)
    {
        int value;
        double answer;

        /* parse input: Python int to C int; if parsing fails (returns 0),
         * an appropriate Python exception will have been set, and the
         * function simply returns NULL */
        if (!PyArg_ParseTuple(args, "i", &value))
            return NULL;

        answer = approx_pi(value);

        /* construct the output from approx_pi: C double to Python float */
        return Py_BuildValue("d", answer);
    }
18 Using the Python API

    /* define functions in module */
    static PyMethodDef PiMethods[] = {
        {"approx_pi", approx_pi_func, METH_VARARGS, "approximate Pi"},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef PiModule = {
        PyModuleDef_HEAD_INIT, "approx_pi_pyapi", NULL, -1, PiMethods,
        NULL, NULL, NULL, NULL
    };

    /* module initialization: must return the new module object */
    PyMODINIT_FUNC PyInit_approx_pi_pyapi(void)
    {
        return PyModule_Create(&PiModule);
    }

Compile using
    $ python3 setup_approx_pi_pyapi.py build_ext --inplace
with this setup script:

    from distutils.core import setup, Extension

    # define the extension module
    module = Extension('approx_pi_pyapi',
                       sources=['approx_pi_pyapi.c', 'approx_pi.c'])
    setup(ext_modules=[module])  # run the setup
19 Using CTypes
Pros: the ctypes package is in Python by default, pure Python solution
Cons: wrapped code must be in a shared library, interface not fast
First compile approx_pi_ctypes.so:
    $ gcc -fpic -shared -O2 approx_pi.c -o approx_pi_ctypes.so

    # approx_pi_ctypes.py
    """ Example of wrapping approx_pi using ctypes. """
    import ctypes

    # find and load the library
    approx_pi_dll = ctypes.cdll.LoadLibrary('./approx_pi_ctypes.so')
    approx_pi_dll.approx_pi.argtypes = [ctypes.c_int]  # set the argument type
    approx_pi_dll.approx_pi.restype = ctypes.c_double  # set the return type

    def approx_pi(arg):
        ''' Wrapper for approx_pi '''
        return approx_pi_dll.approx_pi(arg)
20 Using SWIG
Mature solution
Wrapper file is autogenerated from interface file.

    /* approx_pi_swig.i */
    /* Example of wrapping approx_pi using SWIG. */
    %module approx_pi_swig
    %{
    /* the resulting C file should be built as a python extension */
    #define SWIG_FILE_WITH_INIT
    /* Includes the header in the wrapper code */
    #include "approx_pi.h"
    %}
    /* Parse the header file to generate wrappers */
    %include "approx_pi.h"
21 Using SWIG
Use distutils as before (python3 setup_approx_pi_swig.py build_ext --inplace) but mention the interface file in the setup script.

    from distutils.core import setup, Extension

    # the extension name and interface file match %module approx_pi_swig
    approx_pi_module = Extension("_approx_pi_swig",
                                 sources=["approx_pi.c", "approx_pi_swig.i"])
    setup(ext_modules=[approx_pi_module])

This generates three files: approx_pi_swig.py, approx_pi_swig_wrap.c, and _approx_pi_swig*.so
22 Using f2py
Fortran version: approx_pi.f90

    subroutine approx_pi(intervals, pi)
      integer, intent(in) :: intervals
      double precision, intent(out) :: pi
      integer i
      pi = 0
      do i = 0, intervals - 1
        pi = pi + (4 - (mod(i,2) * 8)) / dble(2 * i + 1)
      enddo
    end subroutine approx_pi

Compile using
    f2py3 -c -m approx_pi_f2py approx_pi.f90
Then do
    python3 pi_collect.py approx_pi_f2py
23 Cython
24 Cython
- Cython compiles from Python (with extensions) to C.
- Based on Pyrex.
- Goals: faster execution (especially with those extensions) and easier interoperability with other C code.
- Cython files use the .pyx extension.
25 Cython
Example: approx_pi_cython1.pyx (same as approx_pi.py)

    def approx_pi(intervals):
        pi = 0.0
        for i in range(intervals):
            pi += (4-8 * (i % 2)) / (float)(2 * i + 1)
        return pi

Executing python3 setup_cython.py build_ext --inplace with this setup_cython.py:

    from distutils.core import setup
    from Cython.Build import cythonize

    setup(ext_modules = cythonize("*.pyx"))

turns all .pyx files into .c files and .so modules.
Run python3 pi_collect.py approx_pi_cython seconds: the C code uses only Python objects.
26 Cython: declare variables
Need to declare variables using cdef to make it fast.
Example: approx_pi_cython2.pyx

    def approx_pi(int intervals):
        cdef double pi
        cdef int i
        pi = 0.0
        for i in range(intervals):
            pi += (4-8 * (i % 2)) / (float)(2 * i + 1)
        return pi

Execute python3 setup_cython.py build_ext --inplace
Run python3 pi_collect.py approx_pi_cython seconds: almost as fast as native C.
27 Cython: division
Inspecting approx_pi_cython2.c, we found that it uses __Pyx_mod_long(__pyx_v_i, 2) instead of a plain __pyx_v_i % 2. This is because in C, -1 % 10 = -1, but in Python, -1 % 10 = 9.
Here we can ignore this difference and tell Cython to use C behaviour by adding this line at the top of the .pyx file:
    # cython: cdivision=True
Execute python3 setup_cython.py build_ext --inplace
Check that approx_pi_cython3.c uses %.
Run python3 pi_collect.py approx_pi_cython seconds: the same as native C.
Note: use Cython in IPython/Jupyter with %load_ext Cython (%load_ext cythonmagic in old IPython versions) and %%cython in a cell.
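The difference between the two modulo conventions can be checked from plain Python. This is a small sketch, not part of the workshop files; c_mod is a hypothetical helper that imitates the C99 rule (the remainder takes the sign of the dividend).

```python
def c_mod(a, b):
    """Remainder as C99 computes a % b: the result takes the sign of a."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


print(-1 % 10)        # Python convention: 9
print(c_mod(-1, 10))  # C convention: -1

# The loop index in approx_pi is never negative, and for non-negative
# operands both conventions agree, so switching to C division is safe there.
same_for_loop_index = all(i % 2 == c_mod(i, 2) for i in range(1000))
print(same_for_loop_index)  # True
```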
28 Cython: wrapping C code
Last but not least: interfacing with C code:

    # approx_pi_cython4.pyx
    cdef extern from "approx_pi.h":
        # C name: approx_pi, Cython name: c_approx_pi
        double c_approx_pi "approx_pi" (int intervals)

    def approx_pi(int intervals):
        return c_approx_pi(intervals)

Plus special setup_cython4.py script:

    from distutils.core import setup, Extension
    from Cython.Distutils import build_ext

    setup(cmdclass={'build_ext': build_ext},
          ext_modules=[Extension("approx_pi_cython4",
                                 sources=["approx_pi_cython4.pyx", "approx_pi.c"])])

Execute python3 setup_cython4.py build_ext --inplace
Run python3 pi_collect.py approx_pi_cython4
29 Parallel Programming Concepts
30 Vocabulary
Serial tasks
- Any task that cannot be split into two simultaneous sequences of actions
- Examples: starting a process, reading a file, any communication between two processes
Parallel tasks
- Data parallelism: same action applied on different data. Could be serial tasks done in parallel.
- Process parallelism: one action on one set of data. Action split in multiple processes or threads.
Data partitioning: rectangles or blocks
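As an illustration of block partitioning applied to the approx_pi series, here is a minimal sketch. The helpers block_ranges and partial_pi are hypothetical names introduced only for this example: each block of loop indices could be given to a different worker, and the partial sums are added at the end.

```python
def block_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) blocks."""
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` blocks get one additional item.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partial_pi(start, stop):
    """Sum the terms start..stop-1 of the series used by approx_pi."""
    return sum((4 - 8 * (i % 2)) / (2 * i + 1) for i in range(start, stop))


blocks = block_ranges(10, 4)
print(blocks)  # [(0, 3), (3, 6), (6, 8), (8, 10)]

# Same action (partial_pi) on different data (one block per worker).
total = sum(partial_pi(a, b) for a, b in blocks)
print(total)
```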
31 Parallel tasks
Parallel efficiency (scaling)
- Amdahl's law: how long does it take to compute a task with an infinite number of processors?
- Gustafson's law: what size of problem can we solve in a given time with N processors?
Shared memory
- Multiple threads share the same memory space in a single process: full read and write access.
Distributed memory
- Each process has its own memory space
- Information is sent and received by messages
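Both laws can be written as one-line formulas. The sketch below uses the standard textbook forms, with the parallel fraction of the work called parallel_fraction (a name chosen here, not taken from the slides).

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Fixed problem size: speedup = 1 / (serial + parallel / N)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)


def amdahl_limit(parallel_fraction):
    """Speedup with an infinite number of processors: 1 / serial."""
    return 1.0 / (1.0 - parallel_fraction)


def gustafson_speedup(parallel_fraction, n_procs):
    """Fixed run time, growing problem: speedup = serial + parallel * N."""
    return (1.0 - parallel_fraction) + parallel_fraction * n_procs


# Example: 90 % of the work is parallel, 10 processors.
print(amdahl_speedup(0.9, 10))     # about 5.26
print(amdahl_limit(0.9))           # about 10
print(gustafson_speedup(0.9, 10))  # about 9.1
```

Under Amdahl's law the serial 10 % caps the speedup at about 10 no matter how many processors are added, while Gustafson's law assumes the parallel part of the problem grows with N.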
32 Distributed Memory Model
[Figure: Process 1 and Process 2, connected by a network, each hold their own A(10): different variables!]
33 Serial Code Parallelization
Implicit Parallelization - minimum work for you
- Threaded libraries (MKL, ACML, GOTO, etc.)
- Compiler directives (OpenMP)
- Good for desktops and shared memory machines
Explicit Parallelization - work is required!
- You tell what should be done on what CPU
- Solution for distributed clusters (shared nothing!)
Hybrid Parallelization - work is required!
- Mix of implicit and explicit parallelization
- Vectorization and parallel CPU instructions
- Good for accelerators (CUDA, OpenCL, etc.)
34 The multiprocessing Module
35 The multiprocessing Module
- Because of the implementation of CPython (the global interpreter lock, GIL), only one thread at a time can execute Python code
- This avoids common issues with the shared memory model: race conditions, ...
- There is a threading module, but it is not recommended for speeding up CPU-bound Python code
- Solution: the multiprocessing module!
36 Pool of Workers
For embarrassingly parallel tasks, the Pool class allows the creation of worker processes. Each process will compute different data.
Warning: only works in a script!

    from multiprocessing import Pool

    def prod(values):
        return values[0] * values[1]

    if __name__ == '__main__':
        N = 12
        values = [(i + 1, N - i) for i in range(0, N)]
        print(values)
        workers = Pool(processes=4)
        results = workers.map(prod, values)
        print(results)
37 Pool of Workers
Run: python script.py
What happens with 4 workers: [figure not preserved in the transcription]
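One way to observe how the work is spread over 4 workers is to return the process ID together with each result. This is a sketch only: prod_with_pid and main are hypothetical names, and which worker receives which items can differ from run to run.

```python
import os
from multiprocessing import Pool


def prod_with_pid(values):
    """Return the PID of the worker together with the product it computed."""
    return os.getpid(), values[0] * values[1]


def main():
    N = 12
    values = [(i + 1, N - i) for i in range(N)]
    with Pool(processes=4) as workers:
        tagged = workers.map(prod_with_pid, values)
    # map() returns results in input order, whichever worker computed them.
    for (a, b), (pid, result) in zip(values, tagged):
        print("worker %d: %d * %d = %d" % (pid, a, b, result))
    return [result for _, result in tagged]


if __name__ == '__main__':
    main()
```

As with the other Pool examples, this should be saved to a file and run as a script.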
38 Pool of Workers
Asynchronous map calls can be used in order to do something else in the main process. The map_async() method returns an AsyncResult object which can wait until all workers are done.

    from multiprocessing import Pool
    import time

    def prod(values):
        time.sleep(1)
        return values[0] * values[1]

    if __name__ == '__main__':
        N = 12
        values = [(i + 1, N - i) for i in range(0, N)]
        print(values)
        workers = Pool(processes=4)
        results = workers.map_async(prod, values)
        print('waiting...')
        print(results.get(timeout=10))
39 Pool of Workers
Asynchronous map calls can use a callback function. The main process then has to wait: first close access to the workers, then join the pool of workers.

    from multiprocessing import Pool

    def prod(values):  # same as on the previous slides
        return values[0] * values[1]

    def printres(results):
        print(results)

    if __name__ == '__main__':
        N = 12
        values = [(i + 1, N - i) for i in range(0, N)]
        print(values)
        workers = Pool(processes=4)
        results = workers.map_async(prod, values, callback=printres)
        print('waiting...')
        workers.close()
        workers.join()
40 Pool of Workers
class Pool([processes[,...]])
- processes: number of worker processes. If None, processes=multiprocessing.cpu_count()
Methods:
- map(func, iterable[,...]): returns results
- map_async(func, iterable[,...]): returns an AsyncResult object
- close(): closes access to worker processes
- join(): waits for all workers to exit. Must call close() before.
41 Pool of Workers
class AsyncResult
Methods:
- get([timeout]): blocking, gets results as soon as they are available. In case of error, get() re-raises the exception.
- wait([timeout]): blocking, waits until the call is done
- ready(): non-blocking, returns a boolean indicating if the call has completed.
- successful(): non-blocking, returns a boolean indicating if the call has succeeded.
42 Exercise - Baby Genomic
Edit baby-genomic.py
- Use a pool of 4 workers
- Use the asynchronous map function
- Provide a callback function that will print results at the end
Tip: use the edproxy() function in order to call the real editdistance() function.
Run: time -p python baby-genomic.py
43 The Process class
The Process class: manually spawn and control each process
    Process(target=fct, args=(arg1,arg2)).start()
Communication channels:
- The Pipe class: to communicate between two processes, one sends data, one receives data
- The Queue class: a shared pipe managed with locks and semaphores, one puts data, one gets data
Synchronization:
- The Lock class: one acquires lock, one releases lock
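A minimal sketch of Process and Queue used together, reusing the product example from the Pool slides. The function names worker and run are hypothetical; each process puts its (rank, result) pair on a shared queue and the parent collects them.

```python
from multiprocessing import Process, Queue


def worker(rank, values, queue):
    """Compute one product and put (rank, result) on the shared queue."""
    queue.put((rank, values[0] * values[1]))


def run(pairs):
    """Spawn one process per pair and collect the results in rank order."""
    queue = Queue()
    procs = [Process(target=worker, args=(rank, pair, queue))
             for rank, pair in enumerate(pairs)]
    for p in procs:
        p.start()
    # Get one item per process before joining, so that no child is left
    # blocked while flushing its data to the queue.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[rank] for rank in range(len(pairs))]


if __name__ == '__main__':
    print(run([(1, 4), (2, 3), (3, 2), (4, 1)]))
```

Results arrive on the queue in whatever order the processes finish, which is why each one is tagged with its rank.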
44 MPI for Python (mpi4py)
45 MPI for Python
The mpi4py package provides bindings from Python to MPI (Message Passing Interface).
MPI functions are then available in Python but with some simplifications:
- MPI_Init() and MPI_Finalize() are done automatically
- The bindings can auto-detect many values that need to be specified as explicit parameters in the C and Fortran bindings.
Example:
    dest = 1; tag = 54321;
    MPI_Send( &matrix, count, MPI_INT, dest, tag, MPI_COMM_WORLD )
becomes
    MPI.COMM_WORLD.Send(matrix, dest=1, tag=54321)
46 MPI for Python
Import as
    from mpi4py import MPI
Then often use
    comm = MPI.COMM_WORLD
Two variations for most functions:
a. all lowercase, e.g. comm.recv()
   - works on general Python objects, using pickle (can be slow)
   - received object (value) returned:
         matrix = comm.recv(source=0, tag=MPI.ANY_TAG)
b. capitalized, e.g. comm.Recv()
   - works fast on numpy arrays & other buffers
   - received object given as parameter:
         comm.Recv(matrix, source=0, tag=MPI.ANY_TAG)
   - Specify [matrix, MPI.INT], or [data, count, MPI.INT] if autodetection fails.
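The cost difference between the two variations comes from serialization. The sketch below does not use mpi4py or MPI at all; it only uses the standard library to contrast what pickling a general Python object involves with what a buffer object already exposes, assuming array.array as a stand-in for a NumPy array.

```python
import pickle
from array import array

data = array('i', range(1000))  # typed buffer, stand-in for a NumPy array
as_list = list(data)            # the same values as a general Python object

# Lowercase-style path: the object is serialized to bytes on one side
# and rebuilt from those bytes on the other side.
pickled = pickle.dumps(as_list)
restored = pickle.loads(pickled)

# Capitalized-style path: a buffer already exposes its raw memory together
# with its element type, so no serialization step is needed.
view = memoryview(data)

print(len(pickled), "bytes after pickling")
print(view.nbytes, "bytes of raw buffer, element format", view.format)
```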
47 Conclusions
Main techniques covered:
- Speeding up: PyPy, Numba, CTypes, Cython
- Parallel programming: multiprocessing, mpi4py
Useful links: erfacing_with_c.html
48 Questions?
Calcul Quebec support team:
Specific site support teams:
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Practical Introduction to Message-Passing Interface (MPI) October 1st, 2015 By: Pier-Luc St-Onge Partners and Sponsors 2 Setup for the workshop 1. Get a user ID and password paper (provided in class):
More informationIntroduction to Python for Scientific Computing
1 Introduction to Python for Scientific Computing http://tinyurl.com/cq-intro-python-20151022 By: Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@calculquebec.ca, Bart.Oldeman@mcgill.ca Partners and
More informationC - extensions. only a small part of application benefits from compiled code
C - EXTENSIONS C - extensions Some times there are time critical parts of code which would benefit from compiled language 90/10 rule: 90 % of time is spent in 10 % of code only a small part of application
More informationInterfacing With Other Programming Languages Using Cython
Lab 19 Interfacing With Other Programming Languages Using Cython Lab Objective: Learn to interface with object files using Cython. This lab should be worked through on a machine that has already been configured
More informationExtensions in C and Fortran
Extensions in C and Fortran Why? C and Fortran are compiled languages Source code is translated to machine instructons by the compiler before you run. Ex: gfortran -o mycode mycode.f90 gcc -o mycode mycode.c
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Outline of the workshop 2 Practical Introduction to Message-Passing Interface (MPI) Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@mcgill.ca Theoretical / practical introduction Parallelizing your
More informationPYTHON IS SLOW. Make it faster with C. Ben Shaw
PYTHON IS SLOW Make it faster with C Ben Shaw It s OK that Python isn t fast, you can write your slow functions in C! Everyone TABLE OF CONTENTS C Module vs C Types TABLE OF CONTENTS C Module vs C Types
More informationPython Optimization and Integration
[Software Development] Python Optimization and Integration Davide Balzarotti Eurecom Sophia Antipolis, France 1 When Python is not Enough Python is great for rapid application development Many famous examples...
More informationPython Scripting for Computational Science
Hans Petter Langtangen Python Scripting for Computational Science Third Edition With 62 Figures 43 Springer Table of Contents 1 Introduction... 1 1.1 Scripting versus Traditional Programming... 1 1.1.1
More informationPython, C, C++, and Fortran Relationship Status: It s Not That Complicated. Philip Semanchuk
Python, C, C++, and Fortran Relationship Status: It s Not That Complicated Philip Semanchuk (philip@pyspoken.com) This presentation is part of a talk I gave at PyData Carolinas 2016. This presentation
More informationPython Scripting for Computational Science
Hans Petter Langtangen Python Scripting for Computational Science Third Edition With 62 Figures Sprin ger Table of Contents 1 Introduction 1 1.1 Scripting versus Traditional Programming 1 1.1.1 Why Scripting
More informationHigh Performance Python Micha Gorelick and Ian Ozsvald
High Performance Python Micha Gorelick and Ian Ozsvald Beijing Cambridge Farnham Koln Sebastopol Tokyo O'REILLY 0 Table of Contents Preface ix 1. Understanding Performant Python 1 The Fundamental Computer
More informationMessage Passing Interface
MPSoC Architectures MPI Alberto Bosio, Associate Professor UM Microelectronic Departement bosio@lirmm.fr Message Passing Interface API for distributed-memory programming parallel code that runs across
More informationMixed language programming
Mixed language programming Simon Funke 1,2 Ola Skavhaug 3 Joakim Sundnes 1,2 Hans Petter Langtangen 1,2 Center for Biomedical Computing, Simula Research Laboratory 1 Dept. of Informatics, University of
More informationHigh Performance Computing with Python
High Performance Computing with Python Pawel Pomorski SHARCNET University of Waterloo ppomorsk@sharcnet.ca April 29,2015 Outline Speeding up Python code with NumPy Speeding up Python code with Cython Using
More informationMixed language programming with NumPy arrays
Mixed language programming with NumPy arrays Simon Funke 1,2 Ola Skavhaug 3 Joakim Sundnes 1,2 Hans Petter Langtangen 1,2 Center for Biomedical Computing, Simula Research Laboratory 1 Dept. of Informatics,
More informationScientific Computing Using. Atriya Sen
Scientific Computing Using Atriya Sen Broad Outline Part I, in which I discuss several aspects of the Python programming language Part II, in which I talk about some Python modules for scientific computing
More informationRunning Cython. overview hello world with Cython. experimental setup adding type declarations cdef functions & calling external functions
Running Cython 1 Getting Started with Cython overview hello world with Cython 2 Numerical Integration experimental setup adding type declarations cdef functions & calling external functions 3 Using Cython
More informationDiffusion processes in complex networks
Diffusion processes in complex networks Digression - parallel computing in Python Janusz Szwabiński Outlook: Multiprocessing Parallel computing in IPython MPI for Python Cython and OpenMP Python and OpenCL
More informationCython. April 2008 Brian Blais
Cython O p t i m i z a t i o n i n P y t h o n April 2008 Brian Blais Rule #1 of Optimization Premature optimization is the root of all evil - Donald Knuth What is Cython/Pyrex? Python to C/Python-API
More informationSpeeding up Python. Antonio Gómez-Iglesias April 17th, 2015
Speeding up Python Antonio Gómez-Iglesias agomez@tacc.utexas.edu April 17th, 2015 Why Python is nice, easy, development is fast However, Python is slow The bottlenecks can be rewritten: SWIG Boost.Python
More informationHolland Computing Center Kickstart MPI Intro
Holland Computing Center Kickstart 2016 MPI Intro Message Passing Interface (MPI) MPI is a specification for message passing library that is standardized by MPI Forum Multiple vendor-specific implementations:
More informationRobot Vision Systems Lecture 8: Python wrappers in OpenCV
Robot Vision Systems Lecture 8: Python wrappers in OpenCV Michael Felsberg michael.felsberg@liu.se Why Python Wrappers Assume a small library based on OpenCV Python interface for Testing Distribution Prototyping
More informationmultiprocessing and mpi4py
multiprocessing and mpi4py 02-03 May 2012 ARPA PIEMONTE m.cestari@cineca.it Bibliography multiprocessing http://docs.python.org/library/multiprocessing.html http://www.doughellmann.com/pymotw/multiprocessi
More informationProgramming Scalable Systems with MPI. Clemens Grelck, University of Amsterdam
Clemens Grelck University of Amsterdam UvA / SurfSARA High Performance Computing and Big Data Course June 2014 Parallel Programming with Compiler Directives: OpenMP Message Passing Gentle Introduction
More informationLECTURE 7: STUDENT REQUESTED TOPICS
1 LECTURE 7: STUDENT REQUESTED TOPICS Introduction to Scientific Python, CME 193 Feb. 20, 2014 Please download today s exercises from: web.stanford.edu/~ermartin/teaching/cme193-winter15 Eileen Martin
More informationCNRS ANF PYTHON Packaging & Life Cycle
CNRS ANF PYTHON Packaging & Life Cycle Marc Poinot Numerical Simulation Dept. Outline Package management with Python Concepts Software life cycle Package services Pragmatic approach Practical works Source
More informationCIS192 Python Programming
CIS192 Python Programming Graphical User Interfaces Robert Rand University of Pennsylvania December 03, 2015 Robert Rand (University of Pennsylvania) CIS 192 December 03, 2015 1 / 21 Outline 1 Performance
More informationCS4961 Parallel Programming. Lecture 16: Introduction to Message Passing 11/3/11. Administrative. Mary Hall November 3, 2011.
CS4961 Parallel Programming Lecture 16: Introduction to Message Passing Administrative Next programming assignment due on Monday, Nov. 7 at midnight Need to define teams and have initial conversation with
More informationAstronomical Data Analysis with Python
Astronomical Data Analysis with Python Lecture 8 Yogesh Wadadekar NCRA-TIFR July August 2010 Yogesh Wadadekar (NCRA-TIFR) Topical course 1 / 27 Slides available at: http://www.ncra.tifr.res.in/ yogesh/python_course_2010/
More informationScientific Computing with Python and CUDA
Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 1 / 55 Inhalt 1 A
More informationComputer Architecture
Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 2 Part I Programming
More informationIntroduction to the Julia language. Marc Fuentes - SED Bordeaux
Introduction to the Julia language Marc Fuentes - SED Bordeaux Outline 1 motivations Outline 1 motivations 2 Julia as a numerical language Outline 1 motivations 2 Julia as a numerical language 3 types
More informationSession 12: Introduction to MPI (4PY) October 9 th 2018, Alexander Peyser (Lena Oden)
Session 12: Introduction to MPI (4PY) October 9 th 2018, Alexander Peyser (Lena Oden) Overview Introduction Basic concepts mpirun Hello world Wrapping numpy arrays Common Pitfalls Introduction MPI: de
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationAdvanced Message-Passing Interface (MPI)
Outline of the workshop 2 Advanced Message-Passing Interface (MPI) Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@mcgill.ca Morning: Advanced MPI Revision More on Collectives More on Point-to-Point
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationAn introduction to scientific programming with. Session 5: Extreme Python
An introduction to scientific programming with Session 5: Extreme Python Managing your environment Efficiently handling large datasets Optimising your code Squeezing out extra speed Writing robust code
More informationParallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially
More informationBlue Waters Programming Environment
December 3, 2013 Blue Waters Programming Environment Blue Waters User Workshop December 3, 2013 Science and Engineering Applications Support Documentation on Portal 2 All of this information is Available
More informationPython where we can, C ++ where we must
Python where we can, C ++ where we must Source: http://xkcd.com/353/ Guy K. Kloss Python where we can,c++ where we must 1/28 Python where we can, C ++ where we must Guy K. Kloss BarCamp Auckland 2007 15
More informationSession 12: Introduction to MPI (4PY) October 10 th 2017, Lena Oden
Session 12: Introduction to MPI (4PY) October 10 th 2017, Lena Oden Overview Introduction Basic concepts mpirun Hello world Wrapping numpy arrays Common Pittfals Introduction MPI de facto standard for
More informationParallelism paradigms
Parallelism paradigms Intro part of course in Parallel Image Analysis Elias Rudberg elias.rudberg@it.uu.se March 23, 2011 Outline 1 Parallelization strategies 2 Shared memory 3 Distributed memory 4 Parallelization
More informationSpeeding up Python using Cython
Speeding up Python using Cython Rolf Boomgaarden Thiemo Gries Florian Letsch Universität Hamburg November 28th, 2013 What is Cython? Compiler, compiles Python-like code to C-code Code is still executed
More informationRunning Cython and Vectorization
Running Cython and Vectorization 1 Getting Started with Cython overview hello world with Cython 2 Numerical Integration experimental setup adding type declarations cdef functions & calling external functions
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationGuillimin HPC Users Meeting December 14, 2017
Guillimin HPC Users Meeting December 14, 2017 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Please be kind to your fellow user meeting attendees Limit
More informationExceptions in Python. AMath 483/583 Lecture 27 May 27, Exceptions in Python. Exceptions in Python
AMath 483/583 Lecture 27 May 27, 2011 Today: Python exception handling Python plus Fortran: f2py Next week: More Python plus Fortran Visualization Parallel IPython Read: Class notes and references If you
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationHigh Performance Computing with Python
High Performance Computing with Python Pawel Pomorski SHARCNET University of Waterloo ppomorsk@sharcnet.ca March 15,2017 Outline Speeding up Python code with NumPy Speeding up Python code with Cython Speeding
More informationmultiprocessing HPC Python R. Todd Evans January 23, 2015
multiprocessing HPC Python R. Todd Evans rtevans@tacc.utexas.edu January 23, 2015 What is Multiprocessing Process-based parallelism Not threading! Threads are light-weight execution units within a process
More informationChip Multiprocessors COMP Lecture 9 - OpenMP & MPI
Chip Multiprocessors COMP35112 Lecture 9 - OpenMP & MPI Graham Riley 14 February 2018 1 Today s Lecture Dividing work to be done in parallel between threads in Java (as you are doing in the labs) is rather
More informationConcurrency, Thread. Dongkun Shin, SKKU
Concurrency, Thread 1 Thread Classic view a single point of execution within a program a single PC where instructions are being fetched from and executed), Multi-threaded program Has more than one point
More informationParallel Programming Libraries and implementations
Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
Bag of Tasks Parallelism Timothy H. Kaiser, Ph.D. tkaiser@mines.edu Examples at: http://hpc.mines.edu/examples/ To just get mpi4py examples: mkdir examples cd examples curl http://hpc.mines.edu/examples/examples/mpi/mpi4py/mpi4py.tgz
More informationAdvanced MPI. Andrew Emerson
Advanced MPI Andrew Emerson (a.emerson@cineca.it) Agenda 1. One sided Communications (MPI-2) 2. Dynamic processes (MPI-2) 3. Profiling MPI and tracing 4. MPI-I/O 5. MPI-3 22/02/2017 Advanced MPI 2 One
More informationThe Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing
The Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Parallelism Decompose the execution into several tasks according to the work to be done: Function/Task
More informationAn introduction to scientific programming with. Session 5: Extreme Python
An introduction to scientific programming with Session 5: Extreme Python PyTables For creating, storing and analysing datasets from simple, small tables to complex, huge datasets standard HDF5 file format
More informationMPI 1. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 1 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationIntel Distribution for Python* и Intel Performance Libraries
Intel Distribution for Python* и Intel Performance Libraries 1 Motivation * L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29 ** RedMonk
More informationPROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec
PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization
More informationHigh Performance Computing Course Notes Message Passing Programming I
High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works
More informationUsing jupyter notebooks on Blue Waters. Roland Haas (NCSA / University of Illinois)
Using jupyter notebooks on Blue Waters https://goo.gl/4eb7qw Roland Haas (NCSA / University of Illinois) Email: rhaas@ncsa.illinois.edu Jupyter notebooks 2/18 interactive, browser based interface to Python
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationSupercomputing in Plain English Exercise #6: MPI Point to Point
Supercomputing in Plain English Exercise #6: MPI Point to Point In this exercise, we ll use the same conventions and commands as in Exercises #1, #2, #3, #4 and #5. You should refer back to the Exercise
More informationOpenPIV Documentation
OpenPIV Documentation Release 0.0.1 OpenPIV group Jun 20, 2018 Contents 1 Contents: 3 1.1 Installation instruction.......................................... 3 1.2 Information for developers and contributors...............................
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationLecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality)
COMP 322: Fundamentals of Parallel Programming Lecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality) Mack Joyner and Zoran Budimlić {mjoyner,
More informationHPC Workshop University of Kentucky May 9, 2007 May 10, 2007
HPC Workshop University of Kentucky May 9, 2007 May 10, 2007 Part 3 Parallel Programming Parallel Programming Concepts Amdahl s Law Parallel Programming Models Tools Compiler (Intel) Math Libraries (Intel)
More informationOverview: The OpenMP Programming Model
Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationCMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)
CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI
More informationShared Memory programming paradigm: openmp
IPM School of Physics Workshop on High Performance Computing - HPC08 Shared Memory programming paradigm: openmp Luca Heltai Stefano Cozzini SISSA - Democritos/INFM
More informationOperating Systems, Assignment 2 Threads and Synchronization
Operating Systems, Assignment 2 Threads and Synchronization Responsible TA's: Zohar and Matan Assignment overview The assignment consists of the following parts: 1) Kernel-level threads package 2) Synchronization
More information