CALMIP : HIGH PERFORMANCE COMPUTING
Nicolas.renon@univ-tlse3.fr - Emmanuel.courcelle@inp-toulouse.fr
CALMIP (UMS 3667), Espace Clément Ader
www.calmip.univ-toulouse.fr
CALMIP : Toulouse University Computing Center
- Started in 1994: 17 labs share computing resources
- Supported by the University of Toulouse (6 universities + CNRS)
- Purposes:
  - Promote High Performance Computing
    - Training in parallel computing and code optimisation
    - Exchange experiences (Thematic Days)
  - Access to a competitive computing system
    - Purchase a system
    - Achieve performance / "easy to use" / stability
  - Support users
    - Basics
    - Developing parallel code
CALMIP : a medium/meso-size computing center
- The HPC pyramid: European centers O(10) PF, national centers O(1) PF, mesocentres O(100) TF, labs
- Mesocentre CALMIP:
  - Propinquity (close to its users)
  - Production (reliable resources)
  - Multiple scientific topics
TF = TeraFlop/s (10^12 flop/s); PF = PetaFlop/s (10^15 flop/s); Flop/s = floating-point operations per second
HPC in Europe : PRACE (Partnership for Advanced Computing in Europe)
Map of major European systems (figure): ARCHER 1.3 PFlop/s, CURIE 9 PFlop/s, JUQUEEN 5 PFlop/s, HAZEL HEN 5 PFlop/s, SUPERMUC 3 PFlop/s, Piz Daint 19 PFlop/s, MareNostrum 6.2 PFlop/s, MARCONI 6 PFlop/s; other PRACE members shown include Sweden, Finland, the Netherlands, the Czech Republic and Poland.
PF = PetaFlop/s (10^15 flop/s)
CALMIP : CPU-hour needs, 2010-2017
Chart: evolution of requested CPU hours (Heures CPU, left axis) and number of projects (nombre de projets, right axis) per year (Années), 2010-2017; total requests grew from roughly 10 million CPU hours in 2010 to over 100 million in 2017.
CALMIP : CPU-hour needs per scientific topic (45+ labs)
- Fluid Mechanics: 30%
- Matter Science: 24%
- Universe Science: 16%
- Quantum Chemistry: 12%
- Life Science: 10%
- Theoretical Physics: 4%
- Engineering Science: 3%
- Numerical Algorithms and Methods: 1%
CALMIP COMPUTING CAPACITY EVOLUTION : Supercomputer evolution 2004-2014
- SOLEIL1 (2004): 68 CPU, 136 GB RAM, 400 GFlop/s
- SOLEIL2 (2007): 512 CPU, 512 GB RAM, 1.5 TFlop/s (x3.75)
- HYPERION (2009): 2,912 CPU, 14 TB RAM, 33 TFlop/s (x22), #223 @ TOP500
- HYPERION+ (2011): 3,500 CPU, 14 TB RAM, 38.5 TFlop/s
- EOS (2014): 12,240 CPU, 39 TB RAM, 274 TFlop/s (x7), #183 @ TOP500
TOP 500 List
Chart: performance development of the TOP500 #1 system, roughly following Moore's law: #1 in 2011 = 10 PFlop/s (12 MW), #1 in 2017 = 93 PFlop/s (15 MW).
Shared Memory Machine (or SMP : Symmetric Multi-Processing)
- All cores access a single shared memory through the memory interconnect
- Application parallelism: OpenMP (multithreading)
- Limited number of cores
  Example (Fortran):
  !$OMP PARALLEL DO
  do i = 1, n
     a(i) = 92290. + real(i)
  end do
  !$OMP END PARALLEL DO

Distributed Memory Machine
- Each node has its own memory and cores; nodes communicate over an interconnect
- Application parallelism: MPI (Message Passing Interface)
- More programming complexity, but an almost unlimited number of cores
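As a bridge to the hands-on part, here is a minimal sketch of compiling and running the OpenMP loop above from a login shell; the file name loop_example.f90, the executable name and the choice of gfortran are illustrative assumptions, not the CALMIP-prescribed toolchain:
  # Compile the Fortran/OpenMP example (file name and compiler are placeholders)
  gfortran -fopenmp loop_example.f90 -o loop_example.exe
  # Pick the number of threads, then run on a shared-memory node
  export OMP_NUM_THREADS=5
  ./loop_example.exe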
CALMIP & Atomic and Molecular Computation : codes in Matter Science and Quantum Chemistry
- 4HE-DFT (OpenMP)
- LAMMPS (MPI, GPU)
- ADF (MPI)
- ABINIT (MPI)
- AMBER (MPI, GPU)
- bigdft (MPI, GPU)
- DeMon (demon-nano) (MPI)
- DIRAC (MPI)
- CPMD (MPI+OpenMP)
- CP2K (MPI+OpenMP)
- GAMESS (MPI)
- GAUSSIAN (OpenMP)
- GROMACS (MPI+OpenMP, GPU)
- MOLCAS
- MOLPRO
- NWCHEM (MPI)
- NAMD (MPI, GPU)
- ORCA (MPI)
- QMC=chem (custom)
- QUANTUM ESPRESSO (MPI+OpenMP)
- SIESTA (MPI)
- VASP (MPI+OpenMP, GPU)
- WIEN2K (MPI, custom)
Today's focus: 4HE-DFT
HIGH PERFORMANCE COMPUTING : EOS COMPUTING SYSTEM
- Login (front-end) nodes: 4 x (20 cores, 128 GB RAM)
- Distributed cluster BULLx DLC: 12,240 cores, 612 nodes; Intel Ivy Bridge 2.8 GHz 10-core processors, 64 GB RAM per node; interconnect: InfiniBand FDR
- Large-memory node: 128 cores, 2 TB RAM; Intel Haswell-EX 2.2 GHz, 16 cores
- Remote visualisation solution: 2 nodes (20 cores, 128 GB RAM) with Nvidia Quadro 6000 cards
(See the associated web page.)
Supercomputer EOS : Architecture
- Compute nodes (SMP) linked by an InfiniBand FDR interconnect
- Each compute node: 64 GB of shared memory, 20 cores
- One Intel Ivy Bridge processor (socket) = 10 cores and 25 MB of cache; two sockets per node
HIGH PERFORMANCE COMPUTING - HANDS ON : CONNECTION TO A FRONT-END NODE
- Connection via Secure Shell (ssh)
- Linux / macOS: ssh -X {login}@eos
- Windows: ssh client with an X server (PuTTY/Xming, MobaXterm)
- Front-end nodes: 4 x (20 cores, 128 GB RAM)
(See the associated web page.)
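For illustration, a typical session from Linux or macOS could look as follows; the full host name and the login myuser are placeholders, not the official connection parameters:
  # Open an SSH connection with X11 forwarding to a front-end node (host and login are placeholders)
  ssh -X myuser@eos.calmip.univ-toulouse.fr
  # Check which front-end node you landed on
  hostname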
HIGH PERFORMANCE COMPUTING - LAUNCHING COMPUTATIONS : THE BATCH SCHEDULER
- Connect to a front-end node: ssh -X {login}@eos (front-end nodes: 4 x (20 cores, 128 GB RAM))
- Launch computations through the batch scheduler (SLURM): sbatch mon_job
- Jobs run on the distributed-memory cluster BULLx DLC (12,240 cores, 612 nodes): Intel Ivy Bridge 2.8 GHz 10-core processors, 64 GB RAM per node, InfiniBand FDR interconnect, full fat-tree topology
HIGH PERFORMANCE COMPUTING - SLURM COMMANDS : BASICS
- Launch a job: sbatch mon_job
- Stop a job: scancel $SLURM_JOBID
- List the jobs of $USER: squeue -u $USER
(See the associated web page.)
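A minimal end-to-end session with these commands might look like the following; the script name mon_job and the job ID 123456 are placeholders:
  sbatch mon_job      # submit; SLURM answers with something like "Submitted batch job 123456"
  squeue -u $USER     # list your pending and running jobs
  scancel 123456      # cancel a job, using the ID reported by sbatch or squeue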
HIGH PERFORMANCE COMPUTING - LAUNCHING AN OPENMP CODE
Shared-memory parallelism (multithreading):
  #!/bin/bash
  #SBATCH --job-name=script_utilisationeos
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=5        # specify the number of cores (example: 5)
  #SBATCH --time=0-01:00:00
  export OMP_NUM_THREADS=5         # OpenMP variable: number of threads (should match --cpus-per-task above)
  srun ./mon_appli.exe
(See the associated web page.)
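For a distributed-memory (MPI) code such as those listed earlier, the job script is similar but requests several tasks instead of several threads; the sketch below is an illustrative assumption (the executable name mon_appli_mpi.exe and the node/task counts are chosen for the example), not an official CALMIP template:
  #!/bin/bash
  #SBATCH --job-name=exemple_mpi
  #SBATCH --nodes=2                # two compute nodes
  #SBATCH --ntasks-per-node=20     # one MPI task per core (20 cores per EOS node)
  #SBATCH --time=0-01:00:00
  srun ./mon_appli_mpi.exe         # srun launches the 40 MPI tasks across the two nodes
Here --cpus-per-task keeps its default value of 1, since each MPI task is single-threaded.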
MESOCENTER CALMIP