Guillimin HPC Users Meeting January 13, 2017



1 Guillimin HPC Users Meeting January 13, 2017 McGill University / Calcul Québec / Compute Canada Montréal, QC Canada

2 Please be kind to your fellow user meeting attendees
- Limit to two slices of pizza per person to start, please.
- And please recycle your pop cans. Thank you!

3 Outline
- Compute Canada News
- System Status
- Software Updates
- Training News
- Special Topic: Multithreaded C++ Programming with TBB

4 Compute Canada News
- 2017 Resource Allocation Competitions
  - Scientific reviews are under way
  - Announcement of awards: early March 2017
  - Implementation of awards: mid-April 2017
- Compute Canada MSI 2.0: $69M for operations (not for hardware)
  ial-government-funding-announced-some-canadas-leading-national-research

5 Storage and Infiniband Status
- The GPFS file system has been more stable since early December:
  - We are closely monitoring the status of all Infiniband links and modules
  - We reseated and replaced faulty network cables and modules
- Infiniband Leaf Module Reset, Tuesday Jan 17, 9:00: this is a faulty part of the HB core switch affecting 18 (now offline) nodes; the reset should be safe
- Made the system more resilient: no more local DNS lookups via Ethernet, and fixed scripts, so that failures stay localized and do not spread to the whole system
- Guillimin core elements are nearly 6 years old

6 Storage Status: Space Management
- /gs is nearly full: 95% used, 184 TB free (as of Jan. 12)
- For better space management, we continue to migrate cold data from disk to tape
  - Metadata remains on disk
  - Users can still access their files through the usual methods, but with increased latency
- Storage space is a precious resource: manage it wisely!
  - Delete temporary files, compress large files that are not frequently accessed, and tar many smaller files into collections

7 New Software Installations
Please use "module spider modulename" for load instructions.
- tbb/ (Intel Threading Building Blocks)
- PETSc/3.7.3-Python (PDE solvers)
- SuperLU/5.1.1 (direct sparse matrix equation solver)
- FIAT/ Python (finite elements)
- sympy/0.7.6-python (symbolic computation)
- GROMACS/5.1.1-cuda hybrid (molecular dynamics, with GPU support)
- GROMACS/5.1.1-hybrid (same without GPU)

8 Training News
- All upcoming events: calculquebec.eventbrite.ca
  - Jan. 26 (TBC): Introduction to ARC (McGill)
  - Feb. 7: Analyse et visualisation de données en Python (U. Laval)
- Recently completed: ---
- All materials from previous workshops are available online: wiki.calculquebec.ca/w/formations/en
- All user meeting presentations are also available online

9 User Feedback and Discussion
- Questions? Comments?
- We value your feedback. Contact us at:
- Guillimin Operational News for Users
- Status Pages (all CQ systems)
- Follow us on Twitter

10 Multithreaded C++ Programming with TBB January 13, 2017 McGill University / Calcul Québec / Compute Canada Montréal, QC Canada

11 Outline
- What is TBB?
- Why use TBB?
- Problems in Multithreading
- Design Patterns and TBB
- Parallel For
- Parallel Reduce
- Split and Join - Dynamic Scheduling
- TBB Examples
  - Mandelbrot Set
  - Approximation of pi

12 What is TBB?
- Intel Threading Building Blocks (TBB) is a C++ library for scalable data-parallel programming
- Task-based programming:
  - The user specifies the workload (tasks)
  - TBB manages threads efficiently and spreads the workload among them
- It provides class, function and data type templates
  - Much like the C++ Standard Template Library (STL)
  - Highly concurrent container classes for parallel access
- Solutions to common problems (design patterns for parallel programming) are implemented and optimized by TBB:
  - Reusable algorithms for user-specific types of data

13 Why Use TBB?
- Benefits:
  - C++ library that works with any compiler on an x86 computer (Intel or AMD processors)
  - Portable (Linux, Windows, OS X)
  - We can focus on parallel tasks instead of managing low-level threads and splitting the workload
  - Collection of optimized algorithms that solve multiple problems in multithreading (see next page)
  - Specific actions are defined in classes instead of procedural code, which allows object-oriented code
- How about OpenMP and POSIX threads?
  - TBB does not solve every problem, so it can be combined with other threading packages

14 Problems in Multithreading
- Parallelizing Simple Loops
- Parallelizing Complex Loops
  - While-loops, pipelines
- Parallelizing Data Flow and Dependence Graphs
- Exceptions and Cancellation for threads
- Containers for parallel computing
- Mutual Exclusion and Atomic Operations
- Benchmarking, measuring the performance
- Memory Allocation, avoiding false sharing
- Scheduling tasks to threads
- Other Design Patterns (see next page)

15 Design Patterns and TBB
TBB documentation shows how to implement:
- Agglomeration: how to split the data area/volume
- Elementwise: independent computation on each item
- Odd-Even Communication: alternate between 2 partitions
- Wavefront: using results from previous iterations
- Reduction: associative reduction operation
- Divide and Conquer: subtasks, example: quicksort
- GUI Thread: waiting for results
- Non-Preemptive Priorities: choose next task
- Local Serializer: parallel threads of serial tasks
- Fenced Data Transfer: synchronization
- Lazy Initialization: initialize when needed
- Reference Counting: deleting an object no longer used
- Compare and Swap Loop: atomic compare+swap
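To make the last pattern in the list concrete, here is a minimal sketch of a compare-and-swap loop. It is written with the standard C++ `std::atomic` rather than TBB's own atomic types, and the function name `atomic_store_max` is hypothetical; it only illustrates the retry idea behind the pattern.

```cpp
#include <atomic>

// Atomically raise `target` to at least `candidate` with a compare-and-swap loop.
// Returns true if this call changed the stored value.
// Assumption: standard std::atomic is used here as a stand-in for TBB's atomics.
inline bool atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    // Retry until the stored value is already large enough,
    // or our exchange succeeds without interference from another thread.
    while (observed < candidate) {
        if (target.compare_exchange_weak(observed, candidate)) {
            return true;  // we installed the new maximum
        }
        // On failure, `observed` was refreshed with the current value;
        // the loop condition re-checks it before trying again.
    }
    return false;  // a value >= candidate was already stored
}
```

Several threads can call this function on the same variable without a mutex: a thread that loses the race simply re-reads the value and tries again.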

16 Parallel For
- Used for independent data and results:
    parallel_for(range, workerobject);
- The workerobject gets duplicated by TBB (by using the copy constructor) and is automatically assigned to a thread.
- Typically, for the workerobject:
  - The constructor gets constant parameters for the loop
  - The task is done by the class operator():
      void Worker::operator()(range) const

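17 Parallel For
    #include <tbb/blocked_range.h>
    #include <tbb/parallel_for.h>
    using namespace tbb;

    class Worker {
    private:
        int *buffer;  // output array shared by all copies of the worker
    public:
        Worker(int *buff) : buffer(buff) {}

        // Called by TBB on each subrange; declared const, as parallel_for requires
        void operator()(const blocked_range<size_t>& r) const {
            for (size_t i = r.begin(); i < r.end(); i++)
                buffer[i] = static_cast<int>(i);
        }
    };

    // In the calling code (N and some_buffer are defined elsewhere):
    parallel_for(blocked_range<size_t>(0, N), Worker(some_buffer));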

18 Parallel Reduce
- Used when a reduction operation can be applied to two consecutive subtasks or subranges (associative operation):
    parallel_reduce(range, workerobject);
- TBB splits the workerobject by using a special split constructor:
    Worker(Worker &w, split) : buf(w.buf), sum(0) {}
- The work is still done in Worker::operator()
- A child workerobject joins back to its parent with:
    void join(Worker &child) { sum += child.sum; }
- The final result is in the initial workerobject:
    double getsum() const { return sum; }

19 Split and Join - Dynamic Scheduling
Global range of [0, ), construction of thread
> 1.1 : will be responsible of range [500000, )
> : will be responsible of range [750000, )
...
1 < : [0, ) <-- [375000, )
< : [679442, ) <-- [679564, )
< : [679198, ) <-- [679442, )
< : [678710, ) <-- [679198, )
< : [675781, ) <-- [678710, )
1.1 < : [500000, ) <-- [675781, )
1.1 < : [500000, ) <-- [679687, )
1.1 < : [500000, ) <-- [687500, )
1.1 < : [500000, ) <-- [750000, )
1 < : [0, ) <-- [500000, )
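The trace above shows workers being split off for upper subranges and later joined back into their parents. The following toy model imitates that split-then-join order in a single thread. It is not how TBB schedules work (TBB splits dynamically, depending on which threads are idle), and all names (`PartialSum`, `reduce_range`, `sum_by_splitting`) are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reduction worker; this is not a TBB class.
struct PartialSum {
    const std::vector<int>* data;  // shared, read-only input
    long long sum;                 // partial result owned by this worker

    explicit PartialSum(const std::vector<int>& d) : data(&d), sum(0) {}

    // "Split": the child shares the input but starts with an empty sum.
    PartialSum split() const { return PartialSum(*data); }

    // Process the half-open range [begin, end).
    void work(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) sum += (*data)[i];
    }

    // "Join": fold the child's result back into the parent.
    void join(const PartialSum& child) { sum += child.sum; }
};

// Halve [begin, end) until a piece has at most `grain` elements (grain >= 1).
void reduce_range(PartialSum& body, std::size_t begin, std::size_t end,
                  std::size_t grain) {
    if (end - begin <= grain) {
        body.work(begin, end);
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    PartialSum child = body.split();        // child takes the upper half
    reduce_range(body, begin, mid, grain);  // parent keeps the lower half
    reduce_range(child, mid, end, grain);
    body.join(child);                       // upper half merges into lower half
}

long long sum_by_splitting(const std::vector<int>& v, std::size_t grain) {
    if (grain == 0) grain = 1;  // guard against endless splitting
    PartialSum root(v);
    reduce_range(root, 0, v.size(), grain);
    return root.sum;
}
```

In this model the splits are fixed halves; in the trace above the join points (for example 679442 or 675781) are irregular because they reflect how far each worker had progressed when the work was redistributed.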

20 TBB Examples
Demo files in: /software/workshop/tbb-demo
- Mandelbrot set - a fractal
  - Image of 1920*1080 pixels, where each pixel's coordinates are converted to a complex number $c = x + iy$
  - $z_0 = 0$, $z_{n+1} = z_n^2 + c$
  - If $|z_n|$ remains less than 2.0 after 1024 iterations, $c$ is part of the Mandelbrot set
  - Using parallel_for(blocked_range2d<T>, ...)
- Approximation of Pi
  - Taylor expansion of 4*arctan(1): $4 \sum_{i=0}^{n} \frac{(-1)^i}{2i + 1}$
  - Using parallel_reduce(blocked_range<T>, ...)
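Both demos reduce to a small numerical kernel that is applied over a range. The serial sketch below shows only those kernels; it is not the code in /software/workshop/tbb-demo, and the function names `in_mandelbrot` and `leibniz_pi` are hypothetical. In a TBB version, the first would be called per pixel inside a parallel_for body and the second would become the loop of a parallel_reduce body.

```cpp
#include <complex>
#include <cstddef>

// Returns true if |z_n| stays below 2 for max_iter iterations of
// z_{n+1} = z_n^2 + c, starting from z_0 = 0.
bool in_mandelbrot(std::complex<double> c, int max_iter = 1024) {
    std::complex<double> z(0.0, 0.0);
    for (int n = 0; n < max_iter; ++n) {
        z = z * z + c;
        // std::norm is the squared magnitude, so |z| >= 2 means norm >= 4.
        if (std::norm(z) >= 4.0) return false;
    }
    return true;
}

// Partial sum 4 * sum_{i=0}^{n} (-1)^i / (2i + 1), which converges to pi.
double leibniz_pi(std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i <= n; ++i) {
        double term = 1.0 / (2.0 * static_cast<double>(i) + 1.0);
        sum += (i % 2 == 0) ? term : -term;  // alternate the sign
    }
    return 4.0 * sum;
}
```

The series converges slowly (the error after n terms is below 4/(2n+3)), which is what makes it a convenient workload for a parallel reduction over a large range.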

21 TBB Official Documentation
- Intel Website:
  - Introduction to TBB
  - Developer Guide: to learn by topics and case studies
  - Developer Reference: all the details about TBB tools
- TBB Website:
  - Classes with Doxygen: dex.html
