Parallel computing on the Grid and application porting examples
Roberto Alfieri - Parma University & INFN Italy
Computer.it Meeting - Parma, 21/06/2011
Current MPI support in Grid (gLite middleware)

  Issue                                          Solution         Description
  Multiple CPU request                           JDL file         CPUNumber=4
  Multiple MPI version/flavour support           mpi-start tool   openmpi, mpich, mpich2
  Get the MPI machinefile from the job manager   mpi-start tool   Automatic management (PBS, LSF, SGE)
  File distribution among nodes                  mpi-start tool   Automatic management (home shared / not shared)

Example JDL:
    CPUNumber = 4;
    Executable = "mpi-start";
    Arguments = "my-mpi-prog OPENMPI";
    InputSandbox = "my-mpi-prog";
MPI-start

mpi-start (developed by HLRS Stuttgart) is a set of scripts that eases the execution of MPI programs by providing a unique and stable interface to the middleware. Its adoption in the Grid follows the EGEE MPI-WG recommendations: http://www.grid.ie/mpi/wiki/

mpi-start frameworks:
  1) Batch schedulers - PBS, LSF, SGE, ...
  2) MPI implementations - Open MPI, MPICH2, ...
  3) Workflow control - file distribution (if needed), user's pre/post execution scripts

mpi-start maintainer web page: https://devel.ifca.es/mpi-start
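As a rough illustration of this "unique and stable interface" idea, the sketch below shows how a job script can drive mpi-start through its environment variables instead of calling mpirun directly. The variable names are assumed from the mpi-start documentation, and the program and hook names are placeholders, not taken from these slides:

    #!/bin/bash
    # Minimal sketch of driving mpi-start via its environment interface.
    # mpi-start detects the batch system (PBS/LSF/SGE), builds the machinefile
    # and invokes the selected MPI implementation for us.
    export I2G_MPI_TYPE=openmpi                 # MPI flavour to use
    export I2G_MPI_APPLICATION=./my-mpi-prog    # user program (placeholder name)
    export I2G_MPI_APPLICATION_ARGS=""          # program arguments, if any
    export I2G_MPI_PRE_RUN_HOOK=./pre-hook.sh   # optional user pre-execution script
    mpi-start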
Memory architecture on multicore processors

In modern multicore processors the memory architecture is NUMA. CPU/memory affinity is the ability to bind a process to a specific CPU and memory bank.

[Figure: NUMA node diagram showing processes and the three communication paths measured below: (1) intra-socket, (2) intra-board, (3) InfiniBand]

Measured network performance (using NetPIPE):
  Comm. type          Latency    Max bandwidth
  1  Intra-socket     640 ns     14 GBytes/s
  2  Intra-board      820 ns     12 GBytes/s
  3  InfiniBand       3300 ns    11 GBytes/s

Memory performance (peak):
  Memory type         Latency    Max bandwidth
  L3 cache            35 ns      -
  DDR3                50 ns      32 GBytes/s
  NUMA (HT or QPI)    90 ns      11 GBytes/s
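For readers who want to inspect the NUMA layout and experiment with affinity on a worker node themselves, a minimal sketch with standard Linux tools follows (it assumes numactl and taskset are installed; the program name my-prog is illustrative):

    # Inspect the NUMA topology of the node
    numactl --hardware                 # NUMA nodes, their CPUs and memory sizes
    numactl --show                     # current CPU/memory binding of this shell
    # Pin a process and its memory allocations to NUMA node 0
    numactl --cpunodebind=0 --membind=0 ./my-prog
    # Alternatively, pin the process to an explicit set of cores
    taskset -c 0-3 ./my-prog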
Forthcoming MPI support (EMI middleware)

  Issue                                             Solution                  Description
  Multiple CPU request with granularity selection   JDL file (new features)   WholeNodes=true, HostNumber=2, SMPGranularity=8
  OpenMP support                                    mpi-start                 MPI_USE_OMP=1; -pnode, -pcore, -psocket
  CPU affinity support                              mpi-start                 MPI_USE_AFFINITY=1

Example JDL:
    #CPUNumber = 4;
    Executable = "mpi-start";
    Arguments = "-t openmpi -pcore -d MPI_USE_OMP=1 -- my-prog";
    InputSandbox = "my-prog";
    WholeNodes = true;
    HostNumber = 2;
    SMPGranularity = 8;
Parallel job examples

1 MPI rank per node:
    Executable = "mpi-start";
    Arguments = "-t openmpi -- my-mpi-prog";
    InputSandbox = "my-mpi-prog";
    CPUNumber = 2;
    SMPGranularity = 1;

2 whole nodes, 1 MPI rank per core:
    Executable = "mpi-start";
    Arguments = "-t openmpi -- my-mpi-prog";
    InputSandbox = "my-mpi-prog";
    WholeNodes = true;
    HostNumber = 2;
    SMPGranularity = 8;

Multi-thread job:
    Executable = "my-openmp-prog";
    Arguments = "";
    InputSandbox = "my-openmp-prog";
    WholeNodes = true;
    HostNumber = 1;
    SMPGranularity = 8;

1 MPI rank per socket, 4 threads per MPI rank:
    Executable = "mpi-start";
    Arguments = "-t openmpi -d MPI_USE_OMP=1 -d MPI_USE_AFFINITY=1 -psocket -- my-prog";
    InputSandbox = "my-prog";
    WholeNodes = true;
    HostNumber = 2;
    SMPGranularity = 8;
Parallel scientific applications

Widely used application examples:
  Name          Description                Language    MPI   Multi-thread                 GPU
  Einstein tk   relativistic astrophysics  C/C++/F90   YES   YES                          NO
  Q-Espresso    electronic structure       F90         YES   experimental, with OpenMP    Beta (May 2011)
  NAMD          molecular dynamics         C++         YES   YES                          YES
  Chroma        lattice field theory       C++         YES   NO                           YES
  Gromacs       molecular dynamics         C           YES   since v4.5, with pthreads    YES (Aug 2010)

Since the INFN-PARMA grid parallel cluster (8 WNs with 8 cores each, PBS, Open MPI) supports a preliminary version of the new JDL syntax (WholeNodes attributes) and of the new mpi-start tool (OpenMP and affinity support), it has been possible to start porting some widely used parallel applications to the Grid.
Porting example: Quantum ESPRESSO

QUANTUM ESPRESSO (http://www.quantum-espresso.org/) is an integrated suite of computer codes for electronic-structure calculations and materials modelling. Maintenance and further development are promoted by the DEMOCRITOS National Simulation Center of IOM-CNR under the terms of the GNU General Public License.

Parallel execution: Quantum ESPRESSO implements several parallelization levels with MPI. Typical execution command on PC clusters:
    mpirun -np 16 $BINDIR/pw.x < file.in > file.out

Explicit OpenMP is a very recent addition, still at an experimental stage. For execution with OpenMP on N threads:
    env OMP_NUM_THREADS=N $BINDIR/pw.x < file.in > file.out

Hybrid (MPI/OpenMP) parallelization should be set up carefully to prevent conflicts between ranks and threads:
    env OMP_NUM_THREADS=N mpirun -np 8 $BINDIR/pw.x < file.in > file.out
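One practical way to avoid such conflicts is to size the thread count so that ranks x threads matches the cores actually available. A minimal sketch follows; the rank and node counts are illustrative choices, not values from these slides:

    #!/bin/bash
    # Keep MPI ranks x OpenMP threads equal to the physical cores per node.
    CORES_PER_NODE=$(nproc)                  # e.g. 8 on the nodes described earlier
    RANKS_PER_NODE=2                         # e.g. one MPI rank per socket (assumption)
    NODES=2                                  # illustrative allocation size
    export OMP_NUM_THREADS=$((CORES_PER_NODE / RANKS_PER_NODE))
    mpirun -np $((NODES * RANKS_PER_NODE)) $BINDIR/pw.x < file.in > file.out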
Running QE on the Grid

espresso.jdl:
    Executable = "espresso.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"espresso.sh", "file.in"};
    OutputSandbox = {"std.err", "std.out", "file.out"};
    WholeNodes = true;
    HostNumber = 2;
    SMPGranularity = 8;
    #Requirements = (other.GlueCEUniqueID == "cream-ce.pr.infn.it:8443/cream-pbs-parallel");

The WholeNodes, HostNumber and SMPGranularity attributes select the granularity of the allocation; in the script below, -t openmpi selects the MPI flavour, while -psocket and the -d options control the process/thread distribution and affinity.

espresso.sh:
    #!/bin/bash
    # install
    wget http://qe-forge.org/frs/download.php/159/espresso-4.3.1.tar.gz
    tar xzvf espresso-4.3.1.tar.gz
    ESPRESSO=$PWD
    cd $ESPRESSO/espresso-4.3.1
    ./configure              # add --enable-openmp for hybrid runs
    make all                 # or whatever you need
    cd $ESPRESSO
    # execute
    PW=$ESPRESSO/espresso-4.3.1/bin/pw.x
    mpi-start -t openmpi -psocket -d MPI_USE_OMP=1 -d MPI_USE_AFFINITY=1 -- $PW < file.in > file.out
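As a hedged reminder of how such a job is typically handed to the gLite workload management system (the file name jobid.txt is just an example, not part of the slides):

    # Submit the job, delegating a proxy automatically, and keep the job id
    glite-wms-job-submit -a -o jobid.txt espresso.jdl
    # Poll the status until the job is Done
    glite-wms-job-status -i jobid.txt
    # Retrieve the OutputSandbox (std.out, std.err, file.out)
    glite-wms-job-output -i jobid.txt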
Porting example: Einstein Toolkit

The Einstein Toolkit (http://einsteintoolkit.org/) is open software for relativistic astrophysics with mixed MPI and OpenMP parallelization.

Test case: evolution of a stable general-relativistic TOV star. Hydrodynamical simulation of a perfect fluid coupled to the full Einstein equations (dynamical space-time) on a 3-dimensional grid with 5 levels of refinement, spanning an octant of radius 177 km with a maximum resolution within the star of 740 m.

The newly supported mpi-start options provide a flexible way to mix OpenMP threads and MPI processes; the best performing combination depends on the hardware characteristics of the node:

  Placement               MPI ranks   OMP threads   Time (s)
  mpi-start -pcore ...    8           1             760
  mpi-start -psocket      2           4             730
  mpi-start -pnode        1           8             922

Thanks to R. De Pietri (Parma Univ.)
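For reference, the three placements in the table correspond to command lines of roughly the following form; the executable name einstein-exe and parameter file tov.par are placeholders, and the exact options used for the measurements above are not stated on the slide:

    # 8 ranks x 1 thread per node (one rank per core)
    mpi-start -t openmpi -pcore   -d MPI_USE_OMP=1 -d MPI_USE_AFFINITY=1 -- ./einstein-exe tov.par
    # 2 ranks x 4 threads per node (one rank per socket)
    mpi-start -t openmpi -psocket -d MPI_USE_OMP=1 -d MPI_USE_AFFINITY=1 -- ./einstein-exe tov.par
    # 1 rank x 8 threads per node (one rank per node)
    mpi-start -t openmpi -pnode   -d MPI_USE_OMP=1 -d MPI_USE_AFFINITY=1 -- ./einstein-exe tov.par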
Porting example: NAMD

NAMD (http://www.ks.uiuc.edu/research/namd/) is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.

Parallel execution: NAMD uses the Charm++ communication layer to launch namd2 processes. SMP builds of Charm++ can use multiple threads either on a single node or across the network. Typical parallel execution command:
    charmrun namd2 +p16 +ppn 8 <configfile>
For MPI-based SMP builds:
    mpiexec -np 16 namd2 +ppn 8 <configfile>

Since NAMD uses its own syntax to start parallel executions, the execution script must be customized to adapt this application to the Grid environment.
Running parallel NAMD on the Grid

namd.jdl:
    Executable = "namd.sh";
    InputSandbox = {"namd.sh", "NAMD_2.8_Source.tar.gz", "configfile"};
    WholeNodes = true;
    HostNumber = 2;
    SMPGranularity = 8;

namd.sh:
    #!/bin/bash
    # install
    tar xzf NAMD_2.8_Source.tar.gz
    make
    # execute: build the Charm++ nodelist from the batch-system hostfile
    NODES=($(cat $PBS_HOSTFILE))
    NODELIST=namd2.nodelist
    echo "group main" > $NODELIST
    for node in "${NODES[@]}"; do
        echo "host $node" >> $NODELIST
    done
    NUMPROCS=$((2 * ${#NODES[@]}))
    charmrun namd2 +p$NUMPROCS ++nodelist $NODELIST configfile
Thank you for your attention!