Parallel Computing with Matlab and R


Parallel Computing with Matlab and R
scsc@duke.edu
https://wiki.duke.edu/display/scsc
Tom Milledge tm103@duke.edu

Overview
- Running Matlab and R interactively and in batch mode
- Introduction to parallel computing
- Running Matlab and R as array jobs
- Using the Matlab Parallel Computing Toolbox
- Using Rmpi and snow
- Using GPUs as accelerators for Matlab and R

Running Matlab Interactively
Run Matlab with the qrsh command:

  head4 % qrsh
  stat-n03 % /opt/matlabr14/bin/matlab -nojvm -nodisplay

or:

  head4 % qrsh
  stat-n03 % /opt/matlab2007a/bin/matlab -nojvm -nodisplay

or:

  head2 % qrsh -l centos5
  core-n75 % /opt/matlabr2009a/bin/matlab -nojvm -nodisplay

Using Matlab with SGE

  #!/bin/tcsh
  #
  #$ -S /bin/tcsh -cwd
  #$ -o matlab.out -j y
  #$ -l centos5
  /opt/matlabr2009a/bin/matlab -nojvm -nodisplay -r my_program

The '-r' option tells Matlab to run an m-function immediately instead of presenting an interactive prompt. The script file is named "my_program.m" (with a ".m" extension), but the Matlab executable is called with "my_program" (without the ".m" extension).

A simple example

  % simple Matlab script
  % do work here
  A = eye(5,5);
  x = (1:5)';
  y = A*x;
  % leaving off the semicolon outputs y to the screen,
  % where it is captured by SGE and sent to the -o file
  y'
  quit

Your Matlab script must call the 'quit' command at the end of its processing or else it will run forever!

About R
About R (http://www.r-project.org/):
- R is an Open Source (GPL) programming environment for statistical analysis and graphics, similar to S, and among the most widely used.
- Provides good support for both users and developers.
- Highly extensible via dynamically loadable add-on packages.
- Originally developed by Robert Gentleman and Ross Ihaka.

Running R Interactively
Run R with the qrsh command:

(version 2.2.1)
  head4 % qrsh
  stat-n03 % R --vanilla

or (version 2.7.1):
  head4 % qrsh
  stat-n03 % /opt/r271/bin/R --vanilla

or (version 2.9.2):
  head2 % qrsh -l highprio
  core-n75 % R --vanilla

Using R with SGE

  #!/bin/tcsh
  #
  #$ -S /bin/tcsh -cwd
  #$ -o R.out -j y
  R CMD BATCH My_R_program results.Rout

The CMD BATCH options tell R to run an R program immediately instead of presenting an interactive prompt. R.out captures the screen output; results.Rout receives the R program's output.

Job Parallelism
- You have a large number of unconnected jobs
- "Pool of Work" or "Bag of Tasks" model
- Parameter space studies
- Many data sets, all of which need to be processed by the same algorithm
- No communication or interaction is needed between Job-#i and Job-#j
- This is often the most efficient form of parallelism possible!
  - *IF* you can fit the jobs onto individual machines (memory could be a limiting factor)
  - *AND* it will still take X hours for one job; it's just that you will get back 10 or 100 results every X hours

Sequential Parallel Programming
- For Job Parallelism (pool of work model), you may not have to write "real" parallel code at all
- E.g. you may have (or can generate) 1000 input files, each of which has to be processed on a single CPU
- What input files do you need? What output files are produced?
  - make sure to name files appropriately to avoid overwriting them
  - no keyboard input, no screen output (or use redirection)
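The pool-of-work pattern above can be sketched in plain shell: generate one input file per task, and derive each output name from its input name so runs never overwrite one another. The file names and the stand-in processing step (a word count) are hypothetical; on the cluster each loop body would instead be one submitted job.

```shell
#!/bin/bash
# Bag-of-tasks sketch: one input file per job, output names derived
# from input names to avoid overwriting (names are illustrative).
mkdir -p inputs outputs
for i in 1 2 3; do
    echo "parameter set $i" > inputs/run_$i.in
done
for f in inputs/*.in; do
    base=$(basename "$f" .in)
    # each of these lines could instead be a separately qsub'd job
    wc -w < "$f" > outputs/$base.out
done
```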

Iterative Parallelism
- Iterative parallelism means breaking up a problem where there are large iterative processes or loops
- E.g. for loops in C, do loops in Fortran
- Large matrix-vector problems
- Example: Poisson's equation:

  for I = 1 to N
    for J = 1 to N
      v_new(I,J) = 0.25*( v(I+1,J) + v(I-1,J) + v(I,J+1) + v(I,J-1) )
    next J
  next I

- If N is large, then there is a lot of parallel work that can be done
- Note that v_new(I,J) requires only information at v(I±1,J) and v(I,J±1)
- So work on row I=1 is independent of rows I = {3, 4, 5, ...}

Submitting MATLAB or R programs as Array Jobs
- A script that is to be run multiple times
- The only difference between each run is a single environment variable, $SGE_TASK_ID
- Pre-compute N different input files, or input directories

  #!/bin/csh
  #
  #$ -cwd
  #$ -t 1-1000
  cd dir.$SGE_TASK_ID
  matlab -nojvm -nodisplay -r my_program

- This will run the script 1000 times, first with $SGE_TASK_ID=1, then with $SGE_TASK_ID=2, etc.
- SGE will start as many of the tasks as it can, as soon as it can
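Outside the scheduler, you can see what the "-t" option does by simulating it locally: SGE runs the same script body once per task, exporting a different $SGE_TASK_ID each time. The directory and file names below are illustrative.

```shell
#!/bin/bash
# Local simulation of what "#$ -t 1-3" does: the same script body runs
# once per task, with a different $SGE_TASK_ID exported each time.
for SGE_TASK_ID in 1 2 3; do
    export SGE_TASK_ID
    mkdir -p dir.$SGE_TASK_ID
    # each task works inside its own pre-made directory
    ( cd dir.$SGE_TASK_ID && echo "task $SGE_TASK_ID ran here" > result.txt )
done
```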

Parallel Computing with MATLAB
[Slide diagram: desktop MATLAB session with toolboxes and blocksets ("Development & Testing") dispatching work to a pool of MATLAB workers]
- Run four local workers with a Parallel Computing Toolbox license
- Easily experiment with explicit parallelism on multicore machines

Parallel Computing Toolbox
- Rapidly develop parallel applications on a local computer
- Take full advantage of desktop power
- A separate computer cluster is not required

Parallel for-loops

  parfor i = 1:n
    % do something with i
  end

- Mix task-parallel and serial code in the same function
- Run loops on a pool of Matlab resources
- Iterations must be order-independent

Parallel for-loop example

  clear A
  d = 0; i = 0;
  parfor i = 1:400000000
    d = i*2;
    A(i) = d;
  end
  A
  d
  i

Parallel R Options
- Rmpi offers access to numerous functions from the MPI API, as well as a number of R-specific extensions.
- The snow (Simple Network of Workstations) package provides an abstraction layer that hides the communication details.

Sample parallel R program using snow:

  rm(list = ls())
  library("snow")
  library("rsprng")
  ### create a cluster
  clusterEvalQ(cl, ...
  clusterExport(cl, "epsilona")
  clusterExport(cl, "epsilonw")
  clusterExport(cl, "SIMULATION")
  clusterExport(cl, "SIM.PATH")
  clusterEvalQ(cl, print(ls()))
  # Do job
  ...
  ### must always do at the end
  stopCluster(cl)

Parallel R SGE script

  #!/bin/bash
  #
  #$ -S /bin/bash -cwd
  #$ -l arch=lx26-amd64
  #$ -l highprio
  #$ -pe high 10
  /usr/bin/lamboot -H -s $TMPDIR/machines
  /usr/lib64/R/library/snow/RMPISNOW CMD BATCH cl_simulation.R results.Rout
  /usr/bin/lamhalt -H

Blue Devil Grid GPU cluster
- The BDGPU cluster is a shared set of machines provided by the University, each with one or more Nvidia GT-200 series GPUs.
- Not a "Beowulf" cluster, just a collection of x86-64 Linux boxes
- Machines have no keyboards and no monitors; you must use ssh
- There is a front-end node, bdgpu-login-01.oit.duke.edu, for compilation, job submission, and debugging
- 17 GPU compute nodes

Machine list:

  Nodes                  CPU            #cores  CPU speed  Mem   GPU (cores, speed, memory)
  bdgpu-login-01         Phenom II 940  4       3.0 GHz    4 GB  GTX 260 (216, 1.24 GHz, 896 MB)
  bdgpu-n01 - bdgpu-n10  Phenom 9350e   4       2.0 GHz    4 GB  GTX 275 (240, 1.48 GHz, 896 MB)
  bdgpu-n11              Athlon II 240  2       2.9 GHz    4 GB  GTX 275 (240, 1.48 GHz, 896 MB)
  bdgpu-n12              Sempron 140    1       2.7 GHz    4 GB  Tesla C1060 (240, 1.30 GHz, 4 GB)
  bdgpu-n13 - bdgpu-n17  Athlon II 620  4       2.6 GHz    4 GB  Tesla C1060 (240, 1.30 GHz, 4 GB)

BDGPU Filesystems
- Home directory (/afs): the campus Andrew File System (AFS); it can also be mounted directly on your workstation or accessed via a browser: https://webdav.webfiles.duke.edu/~yournetid
- Scratch directory (/bdscratch): NFS-mounted RAID 0 partition
  - temporary file storage during job execution; not archival
  - create your own subdirectory, copy over files, delete when done:

    mkdir /bdscratch/tm103/job1_scratch
    cp ~/job1/* /bdscratch/tm103/job1_scratch
    cd /bdscratch/tm103/job1_scratch
    qsub submit_script
    (job completes)
    rm -fr /bdscratch/tm103/job1_scratch

- Applications directory (/opt): all cluster-installed applications: https://wiki.duke.edu/display/scsc/bdgrid+installed+applications

BDGRID Installed Applications

  Bioinformatics
    GPU-HMMER 0.92   /opt/bin/                  http://mpihmmer.org/
  Math Library
    BLAS 3.0-37      /opt/lib64/libblas.so.3    http://www.netlib.org/blas/
    GPUmat 0.24      /opt/gpumat                http://gp-you.org/
    Lapack 3.0-37    /opt/lib64/liblapack.so.3  http://www.netlib.org/lapack/
  Math/Statistics
    R 2.10           /opt/bin/R                 http://www.r-project.org/
    Matlab R2009b    /opt/bin/matlab            http://www.mathworks.com/
  Molecular Dynamics
    VMD 1.8.7        /opt/vmd/bin/vmd           http://www.ks.uiuc.edu/research/vmd
    AMBER 10         /opt/amber10               http://ambermd.org/
  Miscellaneous
    SQLite 3.3.6     /usr/bin/sqlite3           http://www.sqlite.org/
    Boost 1.34.1     /usr/include/boost         http://www.boost.org/
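The scratch-directory steps above can be wrapped into one job script. This is a sketch, not the cluster's documented template: the demo input file, the stand-in computation (tr), and the fallback to the shell PID when SGE's $JOB_ID is unset are all assumptions made so the script runs locally.

```shell
#!/bin/bash
# Sketch of a per-job scratch workflow (file names and the computation are
# hypothetical; under SGE the scheduler sets $JOB_ID, locally we use $$).
echo "scratch demo" > input.dat                 # demo input (hypothetical)
JOB_ID=${JOB_ID:-$$}
SCRATCH="${SCRATCH_BASE:-/tmp}/scratch_job_$JOB_ID"  # e.g. /bdscratch/$USER/...
SUBMITDIR=$PWD
mkdir -p "$SCRATCH"
cp input.dat "$SCRATCH"/                        # stage input files in
cd "$SCRATCH"
tr 'a-z' 'A-Z' < input.dat > output.dat         # stand-in for the real job
cp output.dat "$SUBMITDIR"/                     # copy results back first
cd "$SUBMITDIR"
rm -rf "$SCRATCH"                               # scratch is not archival
```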

Interactive access - Graphical
- Linux: connect with ssh -X netid@bdgpu-node-number
- Windows: connect using X-Win32 (download from www.oit.duke.edu)
- Mac: connect with X11 (free from Apple)

GPUmat: CUDA plug-in for Matlab
- GPU computational power can be easily accessed from MATLAB without any GPU knowledge.
- MATLAB code is directly executed on the GPU.
- GPUmat speeds up MATLAB functions by using the GPU multiprocessor architecture.
- Existing MATLAB code can be ported and executed on GPUs with few modifications.
- GPU resources are accessed using the MATLAB scripting language; the rapid code prototyping capability of the scripting language is combined with fast code execution on the GPU.
- The most important MATLAB functions are currently implemented.
- GPUmat can be used as a Source Development Kit to create missing functions and to extend the library functionality.
- Supports real/complex, single/double precision data types.

GPUmat example
Allows standard MATLAB code to run on GPUs. Execution is transparent to the user:

  A = GPUsingle(rand(100)); % A is in GPU memory
  B = GPUdouble(rand(100)); % B is in GPU memory
  C = A+B;                  % executed on GPU
  D = fft(C);               % executed on GPU

  A = single(rand(100));    % A is in CPU memory
  B = double(rand(100));    % B is in CPU memory
  C = A+B;                  % executed on CPU
  D = fft(C);               % executed on CPU

Porting existing Matlab code to GPUmat
- Convert Matlab variables to GPU variables (except scalars)
- The easiest way is to use GPUsingle or GPUdouble initialized with the existing Matlab variable:

  Ah = [0:10:1000];   % Ah is on the CPU
  A = GPUsingle(Ah);  % A is on the GPU, single precision
  B = GPUdouble(Ah);  % B is on the GPU, double precision

- The above code can be written more efficiently using the colon function:

  A = colon(0, 10, 1000, GPUsingle); % A is on the GPU
  B = colon(0, 10, 1000, GPUdouble); % B is on the GPU

- Matlab scalars are automatically converted into GPU variables.