Advanced and Parallel Python


1 Advanced and Parallel Python December 1st, By: Bart Oldeman and Pier-Luc St-Onge 1

2 Financial Partners 2

3 Setup for the workshop
1. Get a user ID and password paper (provided in class): ##: usernm XXXXXXXXXX **********
2. Access to local computer (replace ## and with appropriate values, is provided in class):
   a. User name: csuser##
   b.
3. HTTPS connection to Colosse (replace **********):
   a.
   b. User name: usernm
   c. Password: **********
   d. If requested:
      i. click Start Server button, set walltime 8

4 Select Modules
Change Notebook Kernel
In the Software tab, select:
    compilers/llvm/3.7.1
    compilers/gcc/4.8.5
Open notebooks/01-stack.ipynb
File -> Save and Checkpoint

5 Import Examples and Exercises
In case the cq-formation-advanced-python folder is not in your home directory, open a Terminal and type:
    module load apps/git/  # If on Colosse
    git clone -b ulaval \
    cd cq-formation-advanced-python

6 Outline
Revisiting the Scientific Python Stack
Why (and What) is Python?
Accelerating Python code: PyPy and Numpy
Using C code from Python code
Finding Bottlenecks - Profiling code
Compiling Python Code Using Cython and Numba
Parallelizing Python Programs
    Parallel Programming Concepts
    The multiprocessing Module
    MPI for Python (mpi4py)

7 The Scientific Python stack 7

8 Scientific Python stack
In the introductory workshop we looked at:
    Python itself
    Numpy, for numerical array objects
    Scipy, for higher level routines
    IPython, an advanced Python shell
    Matplotlib, for plotting
On top of that we introduce some new components, for example:
    Cython, for speed and interfacing
    mpi4py, for using MPI in Python

9 Speeding up Python programs 9

10 Speeding up Python
Central example: approx_pi.c / approx_pi.py:

    // approx_pi.c
    double approx_pi(int intervals)
    {
        double pi = 0.0;
        int i;
        for (i = 0; i < intervals; i++) {
            pi += (4 - ((i % 2) * 8)) / (double)(2 * i + 1);
        }
        return pi;
    }

    # approx_pi.py
    def approx_pi(intervals):
        pi = 0.0
        for i in range(intervals):
            pi += (4 - 8 * (i % 2)) / (float)(2 * i + 1)
        return pi

11 Speeding up Python
Compile:
    $ gcc -O2 pi_collect.c approx_pi.c -o pi_collect
    $ ./pi_collect
    Time = 0.88 sec
Python run (example on Guillimin):
    $ module load iomkl/2015b Python/3.5.0
    $ python pi_collect.py approx_pi
The compiled C code runs about 75 times faster than the Python code (0.88 vs. 66 seconds with intervals = ). Note that approx_pi is the module to import for pi_collect.py.
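The timings on this slide come from a driver that imports a module and times its approx_pi() function. The source of pi_collect.py is not shown in the slides, so the following is only a minimal sketch of such a measurement; the helper name timed() and the interval count are assumptions made for illustration.

```python
import time


def approx_pi(intervals):
    """Leibniz series for pi, same formula as approx_pi.py on the slides."""
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi


def timed(func, *args):
    """Hypothetical helper: call func(*args), return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # The interval count is an arbitrary choice for this sketch.
    value, seconds = timed(approx_pi, 1000000)
    print("pi ~ %.6f, Time = %.2f sec" % (value, seconds))
```

The same timed() helper can wrap any of the faster variants, which makes the speedup ratios easy to reproduce on your own machine.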

12 Speeding up Python
How to speed up: two approaches
1. Make Python go faster
   a. Use the PyPy just-in-time compiler
   b. Use Numpy with vectorized code
   c. Use Cython
2. Call C code from Python
   a. Manually
   b. Use SWIG
   c. Use Ctypes
   d. Use Cython
   e. ...

13 Speeding up Python using PyPy
How to speed up: use PyPy:
    $ module add pypy/
    $ pypy3 pi_collect.py approx_pi
gives 2.2 seconds (30 times faster).
An alternative to PyPy is Numba (not installed on Guillimin).

14 Speeding up with numpy
How to speed up: use vectorized code:

    from __future__ import division  # only needed for Python 2.x
    import numpy

    def approx_pi(intervals):
        pi1 = 4/numpy.arange(1, intervals*2, 4)    # positive terms: 4/1, 4/5, 4/9, ...
        pi2 = -4/numpy.arange(3, intervals*2, 4)   # negative terms: -4/3, -4/7, ...
        return numpy.sum(pi1) + numpy.sum(pi2)

    $ python3 pi_collect.py approx_pi_numpy

gives 1.4 seconds (47 times faster). Drawback: extra memory use.
How to speed up: Cython: see later
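To check that the vectorized version computes the same series as the loop, both can be run side by side. This is a small sketch assuming numpy is installed; the function names approx_pi_loop and approx_pi_numpy are chosen here for illustration.

```python
import numpy


def approx_pi_loop(intervals):
    """Reference implementation: one Python-level iteration per term."""
    pi = 0.0
    for i in range(intervals):
        pi += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return pi


def approx_pi_numpy(intervals):
    """Vectorized implementation: two temporary arrays, no Python loop."""
    positive = 4 / numpy.arange(1, intervals * 2, 4)   # denominators 1, 5, 9, ...
    negative = -4 / numpy.arange(3, intervals * 2, 4)  # denominators 3, 7, 11, ...
    return numpy.sum(positive) + numpy.sum(negative)


if __name__ == "__main__":
    n = 100001
    print(approx_pi_loop(n), approx_pi_numpy(n))
```

The two temporary arrays together hold one element per interval, which is the extra memory use mentioned as a drawback.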

15 Interfacing with C/C++/Fortran 15

16 Interfacing with C and C++
There are at least 14 different ways to do it:
    By hand using the Python API (*)
    Pyrex
    Cython (**)
    SWIG (*)
    SIP
    Boost.Python
    PyCXX
    CTypes (*)
    Py++
    f2py (*)
    PyD
    Interrogate
    Robin
    Pybind11
(*) Quick introduction
(**) Most popular now, more thorough introduction

17 Using the Python API
Pros: no extra dependencies
Cons: a lot of boilerplate code, which can change between Python versions

    /* Example of wrapping approx_pi() with the Python-C-API. */
    #include <Python.h>
    #include "approx_pi.h"

    /* wrapped approx_pi() */
    static PyObject* approx_pi_func(PyObject* self, PyObject* args)
    {
        int value;
        double answer;

        /* parse input, from Python int to C int */
        if (!PyArg_ParseTuple(args, "i", &value))
            return NULL;
        /* if the above function returns false, an appropriate Python exception
         * will have been set, and the function simply returns NULL */

        answer = approx_pi(value);

        /* construct the output from approx_pi, from C double to Python float */
        return Py_BuildValue("d", answer);
    }

18 Using the Python API

    /* define functions in module */
    static PyMethodDef PiMethods[] = {
        {"approx_pi", approx_pi_func, METH_VARARGS, "approximate Pi"},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef PiModule = {
        PyModuleDef_HEAD_INIT, "approx_pi_pyapi", NULL, -1, PiMethods,
        NULL, NULL, NULL, NULL
    };

    /* module initialization: the new module object must be returned */
    PyMODINIT_FUNC PyInit_approx_pi_pyapi(void)
    {
        return PyModule_Create(&PiModule);
    }

Compile using
    $ python3 setup_approx_pi_pyapi.py build_ext --inplace
with setup_approx_pi_pyapi.py:

    from distutils.core import setup, Extension
    # define the extension module
    module = Extension('approx_pi_pyapi',
                       sources=['approx_pi_pyapi.c', 'approx_pi.c'])
    setup(ext_modules=[module])  # run the setup

19 Using CTypes
Pros: the ctypes package is in Python by default, pure Python solution
Cons: wrapped code in shared lib, interface not fast
First compile approx_pi_ctypes.so:
    $ gcc -fpic -shared -O2 approx_pi.c -o approx_pi_ctypes.so

    # approx_pi_ctypes.py
    """ Example of wrapping approx_pi using ctypes. """
    import ctypes

    # find and load the library
    approx_pi_dll = ctypes.cdll.LoadLibrary('./approx_pi_ctypes.so')
    # set the argument type
    approx_pi_dll.approx_pi.argtypes = [ctypes.c_int]
    # set the return type
    approx_pi_dll.approx_pi.restype = ctypes.c_double

    def approx_pi(arg):
        ''' Wrapper for approx_pi '''
        return approx_pi_dll.approx_pi(arg)
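The argtypes/restype pattern can be tried without compiling anything by wrapping a function from the system C math library instead of approx_pi. This is a sketch that assumes a POSIX system (Linux or macOS); the wrapper name c_pow is hypothetical.

```python
import ctypes
import ctypes.util

# Locate the C math library. On POSIX, if find_library() returns None,
# CDLL(None) falls back to the symbols already loaded into the interpreter.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Without these two lines ctypes would assume int arguments and an int result.
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double


def c_pow(base, exponent):
    """Thin Python wrapper around the C function pow(double, double)."""
    return libm.pow(base, exponent)


if __name__ == "__main__":
    print(c_pow(2.0, 10.0))
```

Forgetting restype is a common mistake: the call still succeeds, but the double is misread as an int and the result is garbage.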

20 Using SWIG
Mature solution
Wrapper file is autogenerated from interface file.

    /* approx_pi_swig.i */
    /* Example of wrapping approx_pi using SWIG. */
    %module approx_pi_swig
    %{
    /* the resulting C file should be built as a python extension */
    #define SWIG_FILE_WITH_INIT
    /* Includes the header in the wrapper code */
    #include "approx_pi.h"
    %}
    /* Parse the header file to generate wrappers */
    %include "approx_pi.h"

21 Using SWIG
Use distutils as before (python3 setup_approx_pi_swig.py build_ext --inplace) but mention the interface file in the setup script.

    from distutils.core import setup, Extension
    approx_pi_module = Extension("_approx_pi_swig",
                                 sources=["approx_pi.c", "approx_pi_swig.i"])
    setup(ext_modules=[approx_pi_module])

This generates three files: approx_pi_swig.py, approx_pi_swig_wrap.c, and _approx_pi_swig*.so

22 Using f2py
Fortran version: approx_pi.f90

    subroutine approx_pi(intervals, pi)
      integer, intent(in) :: intervals
      double precision, intent(out) :: pi
      integer i
      pi = 0
      do i = 0, intervals - 1
        pi = pi + (4 - (mod(i,2) * 8)) / dble(2 * i + 1)
      enddo
    end subroutine approx_pi

Compile using
    f2py3 -c -m approx_pi_f2py approx_pi.f90
Then do
    python3 pi_collect.py approx_pi_f2py

23 Cython 23

24 Cython
Cython compiles from Python (with extensions) to C. Based on Pyrex.
Goals: faster execution (especially with those extensions) and easier interoperability with other C code.
Cython files use the .pyx extension.

25 Cython
Example: approx_pi_cython1.pyx (same as approx_pi.py)

    def approx_pi(intervals):
        pi = 0.0
        for i in range(intervals):
            pi += (4 - 8 * (i % 2)) / (float)(2 * i + 1)
        return pi

Executing python3 setup_cython.py build_ext --inplace with setup_cython.py containing

    from distutils.core import setup
    from Cython.Build import cythonize
    setup(ext_modules = cythonize("*.pyx"))

turns all .pyx files into .c files and .so modules.
Run python3 pi_collect.py approx_pi_cython seconds: the C code uses only Python objects.

26 Cython: declare variables
Need to declare variables using cdef to make it fast
Example: approx_pi_cython2.pyx

    def approx_pi(int intervals):
        cdef double pi
        cdef int i
        pi = 0.0
        for i in range(intervals):
            pi += (4 - 8 * (i % 2)) / (float)(2 * i + 1)
        return pi

Execute python3 setup_cython.py build_ext --inplace
Run python3 pi_collect.py approx_pi_cython seconds: almost as fast as native C.

27 Cython: division
Inspecting approx_pi_cython2.c we found it uses __Pyx_mod_long(__pyx_v_i, 2) instead of a plain __pyx_v_i % 2. This is because in C, -1%10=-1 but in Python, -1%10=9. Since i is never negative here, we can ignore this difference and tell Cython to use C behaviour, by adding this line at the top of the .pyx file:
    # cython: cdivision=True
Execute python3 setup_cython.py build_ext --inplace
Check that approx_pi_cython3.c uses %.
Run python3 pi_collect.py approx_pi_cython seconds: the same as native C.
Note: use Cython in IPython/Jupyter using %load_ext Cython and %%cython in a cell.
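The difference between the two remainder conventions can be checked in plain Python. In this sketch, c_remainder() is a hypothetical helper that mimics C's truncating % by way of math.fmod.

```python
import math


def c_remainder(a, b):
    """Integer remainder that truncates toward zero, as C's % operator does."""
    return int(math.fmod(a, b))


if __name__ == "__main__":
    print("Python: -1 % 10 =", -1 % 10)
    print("C-like: -1 % 10 =", c_remainder(-1, 10))
    # For non-negative operands the two conventions agree, which is why
    # switching to C division is safe for the loop index i in approx_pi.
    print(all(i % 2 == c_remainder(i, 2) for i in range(1000)))
```

Python's result takes the sign of the divisor, while C's takes the sign of the dividend; the extra check Cython inserts exists only to reconcile the two for negative operands.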

28 Cython: wrapping C code
Last but not least: interfacing with C code:

    # approx_pi_cython4.pyx
    cdef extern from "approx_pi.h":
        # C name: approx_pi, Cython name: c_approx_pi
        double c_approx_pi "approx_pi" (int intervals)

    def approx_pi(int intervals):
        return c_approx_pi(intervals)

Plus special setup_cython4.py script

    from distutils.core import setup, Extension
    from Cython.Distutils import build_ext
    setup(cmdclass={'build_ext': build_ext},
          ext_modules=[Extension("approx_pi_cython4",
                                 sources=["approx_pi_cython4.pyx", "approx_pi.c"])])

Execute python3 setup_cython4.py build_ext --inplace
Run python3 pi_collect.py approx_pi_cython4

29 Parallel Programming Concepts 29

30 Vocabulary
Serial tasks
    Any task that cannot be split in two simultaneous sequences of actions
    Examples: starting a process, reading a file, any communication between two processes
Parallel tasks
    Data parallelism: same action applied on different data. Could be serial tasks done in parallel.
    Process parallelism: one action on one set of data. Action split in multiple processes or threads.
    Data partitioning: rectangles or blocks
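As an illustration of data partitioning into blocks, the index range of the approx_pi loop can be cut into contiguous chunks, one per worker. This is a sketch only: block_ranges() and partial_pi() are hypothetical helpers, and the blocks are summed serially here, although each one could be handed to a separate process.

```python
def block_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) blocks.

    The first n_items % n_workers blocks get one extra item each, so
    block sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def partial_pi(start, stop):
    """Sum only the terms start..stop-1 of the approx_pi series."""
    total = 0.0
    for i in range(start, stop):
        total += (4 - 8 * (i % 2)) / float(2 * i + 1)
    return total


if __name__ == "__main__":
    blocks = block_ranges(10, 4)
    print(blocks)
    # Same action (partial_pi) applied to different data (each block).
    print(sum(partial_pi(a, b) for a, b in block_ranges(100000, 4)))
```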

31 Parallel tasks
Parallel efficiency (scaling)
    Amdahl's law: how long does it take to compute a task with an infinite number of processors?
    Gustafson's law: what size of problem can we solve in a given time with N processors?
Shared memory
    Multiple threads share the same memory space in a single process: full read and write access.
Distributed memory
    Each process has its own memory space
    Information is sent and received by messages
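The two scaling laws can be written down as short functions, using their standard textbook forms. The formulas are not given on the slide, and the function names and the 90 % parallel fraction are assumptions of this sketch.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Amdahl's law: speedup of a fixed-size problem on n_procs processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def amdahl_limit(parallel_fraction):
    """Speedup with infinitely many processors: only the serial part remains."""
    return 1.0 / (1.0 - parallel_fraction)


def gustafson_speedup(parallel_fraction, n_procs):
    """Gustafson's law: scaled speedup when the problem grows with n_procs."""
    return (1.0 - parallel_fraction) + parallel_fraction * n_procs


if __name__ == "__main__":
    p = 0.9  # assumed: 90 % of the run time can be parallelized
    for n in (1, 4, 16, 64):
        print(n, amdahl_speedup(p, n), gustafson_speedup(p, n))
    print("Amdahl limit:", amdahl_limit(p))
```

With a fixed problem size the serial 10 % caps the speedup near 10, whereas letting the problem grow with the processor count keeps the scaled speedup close to linear.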

32 Distributed Memory Model
[Figure: Process 1 and Process 2 each hold their own array A(10) and are connected through the network. Different variables!]

33 Serial Code Parallelization
Implicit Parallelization - minimum work for you
    Threaded libraries (MKL, ACML, GOTO, etc.)
    Compiler directives (OpenMP)
    Good for desktops and shared memory machines
Explicit Parallelization - work is required!
    You tell what should be done on what CPU
    Solution for distributed clusters (shared nothing!)
Hybrid Parallelization - work is required!
    Mix of implicit and explicit parallelization
    Vectorization and parallel CPU instructions
    Good for accelerators (CUDA, OpenCL, etc.)

34 The multiprocessing Module 34

35 The multiprocessing Module
Because of the implementation of CPython, only one thread at a time can execute Python code
This avoids common issues with the shared memory model: race conditions, ...
There is a threading module, but because of this limitation it does not speed up CPU-bound Python code
Solution: the multiprocessing module!

36 Pool of Workers
For embarrassingly parallel tasks, the Pool class allows the creation of worker processes. Each process will compute different data.
Warning: only works in a script!

    from multiprocessing import Pool

    def prod(values):
        return values[0] * values[1]

    if __name__ == '__main__':
        N = 12
        values = [(i + 1, N - i) for i in range(0, N)]
        print(values)
        workers = Pool(processes=4)
        results = workers.map(prod, values)
        print(results)

37 Pool of Workers Run: python script.py What happens with 4 workers: 37

38 Pool of Workers
Asynchronous map calls can be used in order to do something else in the main process. The map_async() method returns an AsyncResult object which can wait until all workers are done.

    from multiprocessing import Pool
    import time

    def prod(values):
        time.sleep(1)
        return values[0] * values[1]

    if __name__ == '__main__':
        N = 12
        values = [(i + 1, N - i) for i in range(0, N)]
        print(values)
        workers = Pool(processes=4)
        results = workers.map_async(prod, values)
        print('waiting...')
        print(results.get(timeout=10))

39 Pool of Workers
Asynchronous map calls can use a callback function. Then, the main thread has to wait by first closing the access to workers, and by joining the pool of workers.

    # Pool and prod() are imported and defined as on the previous slide

    def printres(results):
        print(results)

    if __name__ == '__main__':
        N = 12
        values = [(i + 1, N - i) for i in range(0, N)]
        print(values)
        workers = Pool(processes=4)
        results = workers.map_async(prod, values, callback=printres)
        print('waiting...')
        workers.close()
        workers.join()

40 Pool of Workers
class Pool([processes[,...]])
    processes: number of worker processes. If None, processes=multiprocessing.cpu_count()
Methods:
    map(func, iterable[,...]): returns results
    map_async(func, iterable[,...]): returns an AsyncResult object
    close(): closes access to worker processes
    join(): waits for all workers to exit. Must call close() before.

41 Pool of Workers
class AsyncResult
Methods:
    get([timeout]): blocking, gets results as soon as they are available. In case of error, get() re-raises the exception.
    wait([timeout]): blocking, waits until the call is done
    ready(): non-blocking, returns a boolean indicating if the call has completed.
    successful(): non-blocking, returns a boolean indicating if the call has succeeded.

42 Exercise - Baby Genomic
Edit baby-genomic.py
    Use a pool of 4 workers
    Use the asynchronous map function
    Provide a callback function that will print results at the end
Tip: use the edproxy() function in order to call the real editdistance() function.
Run: time -p python baby-genomic.py

43 The Process class
The Process class: manually spawn and control each process
    Process(target=fct, args=(arg1,arg2)).start()
Communication channels:
    The Pipe class: to communicate between two processes, one sends data, one receives data
    The Queue class: a shared pipe managed with locks and semaphores, one puts data, one gets data
Synchronization:
    The Lock class: one acquires lock, one releases lock
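A minimal sketch of the Process and Queue classes working together: each worker process puts its results on a shared queue, and the parent collects them. The names square_worker() and run_workers() are hypothetical, and preferring the fork start method where it exists is an assumption made so that the sketch also runs outside a script file on POSIX systems.

```python
import multiprocessing as mp


def square_worker(numbers, queue):
    """Runs in a child process: put one (n, n*n) pair per input on the queue."""
    for n in numbers:
        queue.put((n, n * n))


def run_workers(chunks):
    """Start one process per chunk and gather all results from a shared Queue."""
    # Assumption of this sketch: use "fork" if the platform offers it,
    # otherwise fall back to the platform default start method.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)

    queue = ctx.Queue()
    procs = [ctx.Process(target=square_worker, args=(chunk, queue))
             for chunk in chunks]
    for proc in procs:
        proc.start()

    # Drain the queue before joining, so no child blocks on a full pipe.
    expected = sum(len(chunk) for chunk in chunks)
    results = dict(queue.get(timeout=10) for _ in range(expected))

    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    print(run_workers([[1, 2, 3], [4, 5, 6]]))
```

Results arrive in whatever order the workers produce them, which is why each item carries its input value and the parent rebuilds a dictionary instead of relying on arrival order.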

44 MPI for Python (mpi4py) 44

45 MPI for Python
The mpi4py package provides bindings from Python to MPI (Message Passing Interface).
MPI functions are then available in Python but with some simplifications:
    MPI_Init() and MPI_Finalize() are done automatically
    The bindings can auto-detect many values that need to be specified as explicit parameters in the C and Fortran bindings.
Example:
    dest = 1; tag = 54321;
    MPI_Send( &matrix, count, MPI_INT, dest, tag, MPI_COMM_WORLD )
becomes
    MPI.COMM_WORLD.Send(matrix, dest=1, tag=54321)

46 MPI for Python
Import as
    from mpi4py import MPI
Then often use
    comm = MPI.COMM_WORLD
Two variations for most functions:
a. all lowercase, e.g. comm.recv()
   works on general Python objects, using pickle (can be slow)
   received object (value) returned:
       matrix = comm.recv(source=0, tag=MPI.ANY_TAG)
b. capitalized, e.g. comm.Recv()
   works fast on numpy arrays & other buffers
   received object given as parameter:
       comm.Recv(matrix, source=0, tag=MPI.ANY_TAG)
   Specify [matrix, MPI.INT], or [data, count, MPI.INT] if autodetection fails.

47 Conclusions
Main techniques covered:
    Speeding up: PyPy, Numba, CTypes, Cython
    Parallel programming: multiprocessing, mpi4py
Useful links: erfacing_with_c.html

48 Questions? Calcul Quebec support team: Specific site support teams: 48



More information

Cython: Stop writing native Python extensions in C

Cython: Stop writing native Python extensions in C Python extensions March 29, 2016 cython.org programming language similar to Python static typing from C/C++ compiler from Cython language to C/C++ to Python extension module or to standalone apps* feels

More information

Running Cython and Vectorization

Running Cython and Vectorization Running Cython and Vectorization 1 Getting Started with Cython overview hello world with Cython 2 Numerical Integration experimental setup adding type declarations cdef functions & calling external functions

More information

Python for Earth Scientists

Python for Earth Scientists Python for Earth Scientists Andrew Walker andrew.walker@bris.ac.uk Python is: A dynamic, interpreted programming language. Python is: A dynamic, interpreted programming language. Data Source code Object

More information

Implementation of Parallelization

Implementation of Parallelization Implementation of Parallelization OpenMP, PThreads and MPI Jascha Schewtschenko Institute of Cosmology and Gravitation, University of Portsmouth May 9, 2018 JAS (ICG, Portsmouth) Implementation of Parallelization

More information

Advanced MPI. Andrew Emerson

Advanced MPI. Andrew Emerson Advanced MPI Andrew Emerson (a.emerson@cineca.it) Agenda 1. One sided Communications (MPI-2) 2. Dynamic processes (MPI-2) 3. Profiling MPI and tracing 4. MPI-I/O 5. MPI-3 11/12/2015 Advanced MPI 2 One

More information

Allinea DDT Debugger. Dan Mazur, McGill HPC March 5,

Allinea DDT Debugger. Dan Mazur, McGill HPC  March 5, Allinea DDT Debugger Dan Mazur, McGill HPC daniel.mazur@mcgill.ca guillimin@calculquebec.ca March 5, 2015 1 Outline Introduction and motivation Guillimin login and DDT configuration Compiling for a debugger

More information

MPI: the Message Passing Interface

MPI: the Message Passing Interface 15 Parallel Programming with MPI Lab Objective: In the world of parallel computing, MPI is the most widespread and standardized message passing library. As such, it is used in the majority of parallel

More information

Parallel Programming. Libraries and Implementations

Parallel Programming. Libraries and Implementations Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Mixed Python/C programming with Cython September /14. Mixed Python/C programming with Cython Ben Dudson, 22nd September 2017

Mixed Python/C programming with Cython September /14. Mixed Python/C programming with Cython Ben Dudson, 22nd September 2017 Mixed Python/C programming with Cython September 2017 1/14 Mixed Python/C programming with Cython Ben Dudson, 22nd September 2017 Mixed Python/C programming with Cython September 2017 2/14 Cython http://cython.org/

More information

Introduction to Scientific Python, CME 193 Jan. 9, web.stanford.edu/~ermartin/teaching/cme193-winter15

Introduction to Scientific Python, CME 193 Jan. 9, web.stanford.edu/~ermartin/teaching/cme193-winter15 1 LECTURE 1: INTRO Introduction to Scientific Python, CME 193 Jan. 9, 2014 web.stanford.edu/~ermartin/teaching/cme193-winter15 Eileen Martin Some slides are from Sven Schmit s Fall 14 slides 2 Course Details

More information

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008 SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem

More information

Introduction to parallel computing concepts and technics

Introduction to parallel computing concepts and technics Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing

More information

Introduction to parallel Computing

Introduction to parallel Computing Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts

More information

PCAP Assignment I. 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail.

PCAP Assignment I. 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail. PCAP Assignment I 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail. The multicore CPUs are designed to maximize the execution speed

More information

Getting along and working together. Fortran-Python Interoperability Jacob Wilkins

Getting along and working together. Fortran-Python Interoperability Jacob Wilkins Getting along and working together Fortran-Python Interoperability Jacob Wilkins Fortran AND Python working together? Fortran-Python May 2017 2/19 Two very different philosophies Two very different code-styles

More information

Administrivia. HW1 due Oct 4. Lectures now being recorded. I ll post URLs when available. Discussing Readings on Monday.

Administrivia. HW1 due Oct 4. Lectures now being recorded. I ll post URLs when available. Discussing Readings on Monday. Administrivia HW1 due Oct 4. Lectures now being recorded. I ll post URLs when available. Discussing Readings on Monday. Keep posting discussion on Piazza Python Multiprocessing Topics today: Multiprocessing

More information

ECE 574 Cluster Computing Lecture 13

ECE 574 Cluster Computing Lecture 13 ECE 574 Cluster Computing Lecture 13 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 15 October 2015 Announcements Homework #3 and #4 Grades out soon Homework #5 will be posted

More information

CUDA GPGPU Workshop 2012

CUDA GPGPU Workshop 2012 CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline

More information

Python is awesome. awesomeness C/gcc C++/gcc Java 6 Go/6g Haskell/GHC Scala Lisp SBCL C#/Mono OCaml Python

Python is awesome. awesomeness C/gcc C++/gcc Java 6 Go/6g Haskell/GHC Scala Lisp SBCL C#/Mono OCaml Python Python is awesome 30 28.07 awesomeness 22.5 15 7.5 0 2.95 3.35 2.16 2.22 2.67 1.55 2.01 1.07 1.19 C/gcc C++/gcc Java 6 Go/6g Haskell/GHC Scala Lisp SBCL C#/Mono OCaml Python A benchmark http://geetduggal.wordpress.com/2010/11/25/speed-up-your-python-unladen-vs-shedskin-vs-pypy-vs-c/

More information

Introduction to Scientific Computing with Python, part two.

Introduction to Scientific Computing with Python, part two. Introduction to Scientific Computing with Python, part two. M. Emmett Department of Mathematics University of North Carolina at Chapel Hill June 20 2012 The Zen of Python zen of python... fire up python

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Le Yan Scientific computing consultant User services group High Performance Computing @ LSU Goals Acquaint users with the concept of shared memory parallelism Acquaint users with

More information

The Python interpreter

The Python interpreter The Python interpreter Daniel Winklehner, Remi Lehe US Particle Accelerator School (USPAS) Summer Session Self-Consistent Simulations of Beam and Plasma Systems S. M. Lund, J.-L. Vay, D. Bruhwiler, R.

More information

Guillimin HPC Users Meeting July 14, 2016

Guillimin HPC Users Meeting July 14, 2016 Guillimin HPC Users Meeting July 14, 2016 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Outline Compute Canada News System Status Software Updates Training

More information

Bag of Tasks Parallelism. Timothy H. Kaiser, Ph.D.

Bag of Tasks Parallelism. Timothy H. Kaiser, Ph.D. Bag of Tasks Parallelism Timothy H. Kaiser, Ph.D. tkaiser@mines.edu Examples at: http://hpc.mines.edu/examples/ To just get mpi4py examples: mkdir examples cd examples curl http://hpc.mines.edu/examples/examples/mpi/mpi4py/mpi4py.tgz

More information

Advanced MPI. Andrew Emerson

Advanced MPI. Andrew Emerson Advanced MPI Andrew Emerson (a.emerson@cineca.it) Agenda 1. One sided Communications (MPI-2) 2. Dynamic processes (MPI-2) 3. Profiling MPI and tracing 4. MPI-I/O 5. MPI-3 22/02/2017 Advanced MPI 2 One

More information

The Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing

The Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing The Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Parallelism Decompose the execution into several tasks according to the work to be done: Function/Task

More information

An introduction to scientific programming with. Session 5: Extreme Python

An introduction to scientific programming with. Session 5: Extreme Python An introduction to scientific programming with Session 5: Extreme Python PyTables For creating, storing and analysing datasets from simple, small tables to complex, huge datasets standard HDF5 file format

More information

MPI 1. CSCI 4850/5850 High-Performance Computing Spring 2018

MPI 1. CSCI 4850/5850 High-Performance Computing Spring 2018 MPI 1 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

Intel Distribution for Python* и Intel Performance Libraries

Intel Distribution for Python* и Intel Performance Libraries Intel Distribution for Python* и Intel Performance Libraries 1 Motivation * L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29 ** RedMonk

More information

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization

More information

High Performance Computing Course Notes Message Passing Programming I

High Performance Computing Course Notes Message Passing Programming I High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works

More information

Using jupyter notebooks on Blue Waters. Roland Haas (NCSA / University of Illinois)

Using jupyter notebooks on Blue Waters.   Roland Haas (NCSA / University of Illinois) Using jupyter notebooks on Blue Waters https://goo.gl/4eb7qw Roland Haas (NCSA / University of Illinois) Email: rhaas@ncsa.illinois.edu Jupyter notebooks 2/18 interactive, browser based interface to Python

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

Supercomputing in Plain English Exercise #6: MPI Point to Point

Supercomputing in Plain English Exercise #6: MPI Point to Point Supercomputing in Plain English Exercise #6: MPI Point to Point In this exercise, we ll use the same conventions and commands as in Exercises #1, #2, #3, #4 and #5. You should refer back to the Exercise

More information

OpenPIV Documentation

OpenPIV Documentation OpenPIV Documentation Release 0.0.1 OpenPIV group Jun 20, 2018 Contents 1 Contents: 3 1.1 Installation instruction.......................................... 3 1.2 Information for developers and contributors...............................

More information

MPI MESSAGE PASSING INTERFACE

MPI MESSAGE PASSING INTERFACE MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions

More information

Lecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality)

Lecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality) COMP 322: Fundamentals of Parallel Programming Lecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality) Mack Joyner and Zoran Budimlić {mjoyner,

More information

HPC Workshop University of Kentucky May 9, 2007 May 10, 2007

HPC Workshop University of Kentucky May 9, 2007 May 10, 2007 HPC Workshop University of Kentucky May 9, 2007 May 10, 2007 Part 3 Parallel Programming Parallel Programming Concepts Amdahl s Law Parallel Programming Models Tools Compiler (Intel) Math Libraries (Intel)

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)

CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman) CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI

More information

Shared Memory programming paradigm: openmp

Shared Memory programming paradigm: openmp IPM School of Physics Workshop on High Performance Computing - HPC08 Shared Memory programming paradigm: openmp Luca Heltai Stefano Cozzini SISSA - Democritos/INFM

More information

Operating Systems, Assignment 2 Threads and Synchronization

Operating Systems, Assignment 2 Threads and Synchronization Operating Systems, Assignment 2 Threads and Synchronization Responsible TA's: Zohar and Matan Assignment overview The assignment consists of the following parts: 1) Kernel-level threads package 2) Synchronization

More information