Scalability issues : HPC Applications & Performance Tools


1 Scalability issues : HPC Applications & Performance Tools
Chiranjib Sur, High Performance Computing, Systems and Technology Group, India Systems and Technology Lab (chiranjib.sur@in.ibm.com)

2 Top 500 : Some statistics
[Figure: Top500 top application domains, by number of systems and by performance; values shown: 35.63%, 33.99%, 34%, 16.4%]
Source :

3 Laboratory astrophysics - computational snapshot
- Laboratory Astrophysics: multi-phased, multi-level
- Massive computation
- Computational challenge!! Massive Parallelism Required

4 Scalability challenges - different aspects
Scalable High Performance Computing: High Throughput, Sustained Performance
- OS & Parallel Environment
- Interconnects
- Compilers, Optimization & Debuggers
- Scalable parallel File System
- Performance Analysis tools - Single place to go!
- Hardware Architecture, Threading, I/O
- Parallel Language
- Parallel Algorithm

5 High PERFORMANCE or High THROUGHPUT
Amdahl's law
- If the serial component remains proportionately equal, there is no inherent speedup!
- Parallel / serial split: 70% / 30% stays 70% / 30%
- Parallel component is 50x, max speed up is 3.25x
Gustafson's law
- If the serial component shrinks in size as the problem scales, there is opportunity for speedup!
- Parallel / serial split: 70% / 30% becomes 95% / 5%
- Parallel component is 50x, max speed up is 18.26x
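The two laws on this slide can be checked numerically. Below is a minimal Python sketch, assuming the textbook forms of Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the machine). The function names are illustrative and not from the slides, and the slide does not state how its quoted maxima were derived, so the values computed here need not match them exactly.

```python
def amdahl_speedup(parallel_fraction, n):
    """Fixed-size speedup: the serial part keeps its share of the work."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def gustafson_speedup(parallel_fraction, n):
    """Scaled speedup: the parallel part grows with n, the serial part does not."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n


if __name__ == "__main__":
    n = 50  # the "50x" parallel component from the slide
    for parallel_fraction in (0.70, 0.95):
        print(
            f"parallel={parallel_fraction:.2f} "
            f"amdahl={amdahl_speedup(parallel_fraction, n):.2f} "
            f"gustafson={gustafson_speedup(parallel_fraction, n):.2f}"
        )
```

With a 30% serial share the textbook Amdahl bound stays close to 3x no matter how large n gets, while shrinking the serial share to 5% lifts it to roughly 14x at n = 50; that gap is the point the slide is making.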

6 High PERFORMANCE or High THROUGHPUT
Parametrization of Scalability
- Tp = parallel execution time
- Ts = serial execution time
- TOh = Overhead
$T_p = \frac{T_s}{p} + T_{Oh}(p)$
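The parametrization Tp = Ts / p + TOh(p) can be explored with a small model. The Python sketch below assumes p is the number of processes and uses a hypothetical linear overhead term purely for illustration; the slides do not specify the form of TOh, and all names here are made up for the example.

```python
def parallel_time(t_serial, p, overhead):
    """Tp = Ts / p + TOh(p), with the overhead passed in as a function of p."""
    return t_serial / p + overhead(p)


def linear_overhead(p, cost_per_extra_proc=1.0):
    """Hypothetical overhead model: a fixed cost for every process beyond the first."""
    return cost_per_extra_proc * (p - 1)


def best_proc_count(t_serial, overhead, candidates):
    """Return the candidate process count with the smallest modelled Tp."""
    return min(candidates, key=lambda p: parallel_time(t_serial, p, overhead))


if __name__ == "__main__":
    t_serial = 100.0  # illustrative serial run time, arbitrary units
    for p in (1, 2, 4, 8, 16, 32, 64):
        print(p, round(parallel_time(t_serial, p, linear_overhead), 2))
    print("best p:", best_proc_count(t_serial, linear_overhead, range(1, 65)))
```

Under this assumed overhead the modelled run time falls at first and then rises again once the overhead term dominates, which is why adding processes eventually stops paying off.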

7 Scalability - algorithm / programming languages
Parallel algorithm
- Most legacy codes are not designed to work in parallel
- Mostly not designed to exploit modern-day HPC architectures
Parallel languages
- Legacy codes contain language (version) specific syntax (e.g. dynamic memory in FORTRAN 77)
- Old codes need major revision to use modern features, e.g. handling of large arrays
- Not so easy to rewrite old codes using new languages like X10, UPC etc.

8-10 Legacy code Algorithm - a Case Study
[Figures: three case-study slides; content not recoverable]

11 Scalability - computing platform
Hardware - Scaling OUT or Scaling UP?
Courtesy : Thomas Dunning,

12 Scalability - computing platform
Hardware - what to look for? / how to look for?
Hardware Thread Management
- Usage of multiple lightweight concurrent threads
- Less switching overhead
- Addressing the issue of instruction and memory latency
On-Chip Shared Memory
- Efficient management of cache
- Efficient thread communication / cooperation within blocks
Threading - Random Access to Global Memory
- Any thread can read/write any location(s)
- Sync with the system software
- Monolithic thread vs blocks (smaller in size) of threads

13 Scalability - computing platform
Hardware - what to look for? / how to look for?
[Figure: plot over optimization levels O1, O2, O3, O4; axis label "Opt level"]


14 Scalability - system software
[Diagram: software stack from APPLICATION down to hardware, split into User Space and Kernel Space]
- APPLICATION: Fortran (77, 95), C, C++, OpenMP
- Tools: HPCS Toolkit, Eclipse Tools, Eclipse PTP Framework, Parallel Debugger
- Cluster and job management: xCAT, LL / Resource Mgr (Pre-emption, C/R), POE Runtime
- Programming models and libraries: MPI, SHMEM, UPC, CAF, GSM, MASS, ESSL, Parallel ESSL
- Communication: LAPI (Reliable FIFO, RDMA, Striping, Failover/Recovery, Checkpoint/Restart, Pre-emption, User Space Statistics, Multi-Protocol, Scalability), SOCKETS (UDP, TCP), GSM Infrastructure, PNSD / NRT, Debug/Comm Infrastructure
- Storage: GPFS, NSD - VDISK
- Lower layers: HAL (AIX & Linux), Verbs (AIX & Linux), Multi-Link, Superpkt, NM, HCP, DD, HYP, IP, IF_LS
- Operating Systems: AIX / Linux
- Network Adapter(s): HFI, IB Network(s)
- Hardware Platforms: pseries / xseries

15 Scalability - System Software stack
Compilers (
- Five distinct optimization levels + many additional options
- Code generation and tuning for specific hardware chipsets
- Interprocedural optimization and inlining using IPA
- Profile-directed feedback (PDF) optimization
- User-directed optimization with directives and source-level intrinsic functions
- Optimization of OpenMP programs and auto-parallelization capabilities to exploit SMP systems
- Automatic parallelization of calculations using vector machine instructions and high-performance mathematical libraries
OS and Parallel Environment

16 Scalability - System Software stack
[Figure: Mflops/Sec versus Opt level]

17 Scalability - System Software stack
Parallel Environment - what next?
- Memory - Using Remote Direct Memory Access (RDMA)
- Interconnects - RDMA with proper interconnect
- Parallel tuned library - Customized
%2Fcom.ibm.cluster.pe432.opuse1.doc%2Fam102_scalaperf.html
- Data intensive / Task intensive computing
- Combining Massive Data parallelism and instruction level parallelism - heterogeneous model?
- Next generation MPI 3..?

18 The Computing cycle

19 The Performance Pie
Performance Dimensions
- CPU Performance
- MPI Performance
- Threading Performance
- I/O Performance


20 Scalability - Performance Tools
- What is this tool all about? - More in the next few sessions
- What can we do with a tool like this?
- Which programming languages? - FORTRAN, C, C++
- Which platforms can we use? - Entire range of the IBM HPC hardware portfolio
- Which operating systems? - AIX & Linux
- What do we mean by Scalable Tools?

21 Performance analysis in a nutshell
IBM HPC Toolkit
- HPM - Hardware Performance Monitoring
- MPI - Profiling MPI calls
- OpenMP - Profiling OpenMP directives
- MIO - I/O analysis and optimization
- Visualization - Eclipse Plug-in, PeekPerf, Xprof

22 Scalability - Performance tools
[Figure: NPB Fourier Transform - Class A; execution time versus No of procs for non-instrumented (NonInst) and instrumented (Inst) runs]

23 Scalability case studies : Timing and overhead
[Figure: Timing - ft.a; Exec time (2), Initialization time (4) and Overhead (4) versus No of procs]
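The quantities behind timing plots of this kind can be derived from raw run times. The Python sketch below is only an illustration: the function names and the timings in the usage section are hypothetical, since the measured ft.a values did not survive in this transcript.

```python
def instrumentation_overhead(t_noninst, t_inst):
    """Relative slowdown of the instrumented run compared with the plain run."""
    return (t_inst - t_noninst) / t_noninst


def speedup_and_efficiency(times_by_procs):
    """Map each process count to (speedup, efficiency) relative to the smallest run."""
    base_p = min(times_by_procs)
    base_t = times_by_procs[base_p]
    result = {}
    for p, t in sorted(times_by_procs.items()):
        speedup = base_t / t
        efficiency = speedup * base_p / p  # 1.0 means perfect scaling
        result[p] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT the measurements from the slides.
    print(f"overhead: {instrumentation_overhead(50.0, 55.0):.1%}")
    for p, (s, e) in speedup_and_efficiency({1: 40.0, 2: 20.0, 4: 12.5}).items():
        print(f"procs={p} speedup={s:.2f} efficiency={e:.2f}")
```

Tracking the overhead ratio per process count shows whether the instrumentation itself scales, which is the question the Inst versus NonInst comparison addresses.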

24 Scalability case studies : MPI communication
[Figure: MPI All-to-All communication - ft.a; data transfer (bytes) versus No of procs]
[Figure: Average Communication time (MPI) - ft.a; time (s) versus No of procs]

25 Scalability case studies : Hardware & I/O
[Figure: No. of page faults without I/O - ft.a; page faults versus No of procs]
[Figure: Context switch - ft.a; context switches versus No of procs]

26-27 Summary : Performance analysis and next...
- What can we do now?
- What do we need?
- What are we planning to do?

28 Next few talks..
Today / Tomorrow

29 The team working on performance
IBM: Dave, Aditya, John, Pidad, Praful, Servesh, Chiranjib
