CS 470 Spring Mike Lam, Professor. Performance Analysis

CS 470 Spring 2018, Mike Lam, Professor: Performance Analysis

Performance analysis: Why do we parallelize our programs?

So that they run faster!

Performance analysis: How do we evaluate whether we've done a good job parallelizing a program?

Two approaches: asymptotic analysis (e.g., the distributed sum) and empirical analysis.

Empirical analysis issues: How do you measure time-to-solution accurately (CPU cycles, OS clock "ticks", wall time, etc.)? How do you compare across systems with differing CPUs, memories, OSes, etc.? How do you compare against the original? A 1-core run of the parallel version will likely be slower than the serial original. How do you assess scalability: does performance improve as you add cores? How do you quantify the improvement? Is there a limit to how far we can improve performance?

Experimental methods: Measure wall time for specific code regions of interest, ignoring startup and I/O time if not relevant. Make sure you have a high-resolution timer: /usr/bin/time -v for whole programs, gettimeofday() from sys/time.h for Pthreads, omp_get_wtime() for OpenMP, and MPI_Wtime() for MPI. Use barriers if necessary to make sure all threads/processes have finished before you stop a timer.

Experimental methods: Control for variance. Do all experiments on the same machine or cluster. Run a maximum of one thread per core and one job per node; our cluster can support 8 threads per node (or 16 with hyperthreading, but this is not recommended). Run multiple trials and use the minimum time, to avoid OS interference or noise. Track variance to measure system noise: if your variance is low, or your slowest and fastest times are relatively close, the differences between trials are probably just noise.

Empirical analysis: $T_s$ = serial time, $T_p$ = parallel time, $p$ = number of processes. Speedup $S = \frac{T_s}{T_p}$ should increase as $p$ grows. Efficiency $E = \frac{S}{p} = \frac{T_s}{p\,T_p}$ usually decreases as $p$ grows.

Empirical analysis (continued): let $r$ = serial fraction of the original program. Then $T_p = (1-r)\frac{T_s}{p} + r\,T_s$, so $S = \frac{T_s}{T_p} = \frac{T_s}{(1-r)\frac{T_s}{p} + r\,T_s}$.

Amdahl's Law: $S \to \frac{1}{r}$ as $p$ increases.
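The limit follows by dividing the numerator and denominator of the speedup formula by $T_s$ and letting $p$ grow; the $(1-r)/p$ term vanishes, leaving only the serial fraction:

$$S = \frac{T_s}{(1-r)\frac{T_s}{p} + r\,T_s} = \frac{1}{\frac{1-r}{p} + r} \;\longrightarrow\; \frac{1}{r} \quad \text{as } p \to \infty$$

For example, with $r = 0.1$ and $p = 8$: $S = 1/(0.9/8 + 0.1) = 1/0.2125 \approx 4.71$, less than half of the $10\times$ limit.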

Amdahl's Law: with $p$ = number of processors and $r$ = serial fraction of the program, $S = \frac{T_s}{(1-r)\frac{T_s}{p} + r\,T_s} \to \frac{1}{r}$ as $p$ increases. For $r$ = 50%, speedup is limited to 2x; $r$ = 25%, 4x; $r$ = 10%, 10x; $r$ = 5%, 20x. Speedup is bounded by the inverse of the serial fraction.

Scaling: Generally, we don't care about any particular $T_p$, or about how it compares to $T_s$ (except as a sanity check). What matters more is how $T_p$, $S$, and $E$ change as $p$ increases and/or as the problem size increases, similar to asymptotic analysis in CS 240. In general, a program is scalable if $E$ remains fixed as $p$ and the problem size increase at fixed rates. The most common presentation is a graph with the measured quantity on the y-axis vs. $p$ on a logarithmic x-axis.

Scaling: Strong scaling: as $p$ increases, $T_p$ decreases. With linear speedup, time falls at the same rate that $p$ grows (2x processes, half the time); sublinear speedup is most common, and superlinear speedup is exceedingly rare. Weak scaling: as $p$ increases AND the problem size increases proportionally, $T_p$ stays roughly the same.
[Figure: example strong-scaling and weak-scaling plots, each with a "good" and a "bad" curve]

Scaling: Alternatively, strong scaling means we can keep the efficiency fixed without increasing the problem size, and weak scaling means we can keep the efficiency fixed by increasing the problem size at the same rate as the process/thread count. Recall that efficiency $E = \frac{S}{p} = \frac{T_s}{p\,T_p}$ usually decreases as $p$ grows.
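Combining the efficiency definition with the serial-fraction model $T_p = (1-r)\frac{T_s}{p} + r\,T_s$ gives a compact way to see why efficiency drops under strong scaling. This assumes the serial fraction $r$ does not change with $p$, the same assumption behind Amdahl's Law:

$$E = \frac{S}{p} = \frac{T_s}{p\,T_p} = \frac{1}{p\left(\frac{1-r}{p} + r\right)} = \frac{1}{(1-r) + r\,p}$$

For any $r > 0$ the denominator grows with $p$, so $E$ falls at a fixed problem size; with $r = 0.05$, for instance, $E \approx 0.74$ at $p = 8$ and $E \approx 0.24$ at $p = 64$. Holding $E$ fixed therefore requires something else to change, which is one way to read weak scaling: growing the problem size can help if it shrinks the effective serial fraction.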

Cluster access: Detailed instructions are online at w3.cs.jmu.edu/lam2mo/cs470/cluster.html. Connect to the login node via SSH (hostname: login.cluster.cs.jmu.edu; username/password: your e-ID and password). Recommended conveniences: set up public/private key access from stu, set up .ssh/config entries, and install Spack for access to more software.

Cluster access: things to play with:
"squeue" or "watch squeue" to see jobs.
"srun <command>" to run an interactive job: use -n <p> to launch p processes and -N <n> to request n nodes (defaults to p/8); the given <command> will run in every process.
"salloc <command>" to run an interactive MPI job: use -n <p> to launch p MPI processes.
Examples:
    srun hostname
    srun -n 4 hostname
    srun -n 16 hostname
    srun -N 4 hostname
    srun sleep 5
    srun -N 2 sleep 5
    salloc -n 1 mpirun /shared/mpi-pi/mpipi
    salloc -n 2 mpirun /shared/mpi-pi/mpipi
    salloc -n 4 mpirun /shared/mpi-pi/mpipi
    salloc -n 8 mpirun /shared/mpi-pi/mpipi
    salloc -n 16 mpirun /shared/mpi-pi/mpipi
    (etc.)
What's the max n?

Job management: SLURM (Simple Linux Utility for Resource Management) is a piece of system software outside the OS (a.k.a. middleware) that handles job submission and scheduling on our cluster. An interactive job takes control of your terminal; run it with srun or salloc. You may interact with it (provide standard input, etc.), but you also have to wait for it to finish, similar to a foreground shell job. A batch job runs in the background without interaction: create a shell script and run it with sbatch. Its output goes to a file (named slurm-<jobid>.out by default); use squeue to check whether it has finished.

Batch jobs: To run a batch job on the cluster, create a shell script and run it with sbatch. Bash example:

    #!/bin/bash
    #
    #SBATCH --job-name=hostname
    #SBATCH --nodes=1
    #SBATCH --ntasks=1

    <your commands go here>

Running experiments: common experimentation patterns in Bash:

    # run 5 times
    for i in $(seq 1 5); do
        <cmd>
    done

    # run common thread counts
    for t in <thread counts>; do
        OMP_NUM_THREADS=$t <cmd>
    done
