HMEM and Lemaitre2: First bricks of the CÉCI's infrastructure

1 HMEM and Lemaitre2: First bricks of the CÉCI's infrastructure - CÉCI: What we want - Cluster HMEM - Cluster Lemaitre2 - Comparison - What next? - Support and training - Conclusions

2 CÉCI: What we want
CÉCI: Consortium des Equipements de Calcul Intensif en FWB (ULB, ULG, UMONS, FUNDP and UCL), supported by the F.R.S.-FNRS
Infrastructures:
- Several sites with high performance parallel systems: clusters with many nodes (many cores) and a high-throughput, low-latency interconnect (InfiniBand)
- Dedicated sites for specific needs:
  - High memory systems: large amount of memory per node, large amount of memory per core
  - Adapted platform for very long processes (up to 30 days)
  - GPU accelerators
  - => ...
Support and training:
- Training organized once a year for all the researchers in FWB
- User support available locally (close to researchers)
HMEM Lemaitre2

3 HMEM: the processor
50% UCL, 50% CÉCI
AMD Opteron 6100 series (Magny-Cours):
- 12 cores, 45 nm SOI
- L1: 64 KB D + 64 KB I
- L2: 512 KB (per core)
- L3: 12 MB (per socket)
- 4 HT3 links, 25.6 GB/s
- 4-channel DDR3 memory controller, 34.15 GB/s @ 1066 MHz
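The memory bandwidth quoted on the processor slides can be reproduced as channels times bus width times transfer rate. The following is a minimal sketch, assuming the standard 64-bit (8-byte) DDR3 channel width; the DDR3-1333 rate used for the Xeon 5600 of Lemaitre2 is an assumption (the slides only quote the 32 GB/s result), and the function name is illustrative.

```python
def peak_memory_bandwidth_gbs(channels, mega_transfers_per_sec, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bytes_per_transfer=8 assumes a 64-bit DDR3 channel.
    """
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# Opteron 6100: 4 channels at 1066 MT/s (figures from the slide)
opteron_gbs = peak_memory_bandwidth_gbs(channels=4, mega_transfers_per_sec=1066.67)

# Xeon 5600: 3 channels; DDR3-1333 is an assumption, not stated on the slides
xeon_gbs = peak_memory_bandwidth_gbs(channels=3, mega_transfers_per_sec=1333.33)

print(f"Opteron 6100 peak: {opteron_gbs:.2f} GB/s")
print(f"Xeon 5600 peak:    {xeon_gbs:.2f} GB/s")
```

Under these assumptions the Opteron value comes out at about 34.13 GB/s, close to the 34.15 GB/s on the slide, and the Xeon value at about 32 GB/s. These are theoretical peaks per socket, not measured throughput.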

4 HMEM: the cluster
50% UCL, 50% CÉCI
- 816 CPUs (Opteron GHz), IB QDR (40 Gbps)
- 2 nodes with 512 GB RAM, 7 nodes with 256 GB and 8 nodes with 128 GB
- 11 TB /home, 5 TB /workdir, 30 TB /scratch
[Diagram: working nodes connected by InfiniBand QDR and GbE]
- 2 x 512 GB RAM, 3.2 TB /scratch
- 7 x 256 GB RAM, 1.7 TB /scratch
- 7 x 128 GB RAM, 1.7 TB /scratch
- 1 x 128 GB RAM, 0.7 TB /scratch (UCL/ELIC)
FHGFS: 30 TB, sharing the local disks of the working nodes (WN)
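The node inventory above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions: "816 CPUs" is read as 816 cores spread evenly over the nodes, and the /scratch sizes are read as per-node local disk; neither is stated explicitly on the slide.

```python
# (node_count, ram_gb_per_node, scratch_tb_per_node), as listed on the slide.
# Reading the /scratch figures as per-node sizes is an assumption.
NODE_GROUPS = [
    (2, 512, 3.2),
    (7, 256, 1.7),
    (7, 128, 1.7),
    (1, 128, 0.7),  # the UCL/ELIC node
]
TOTAL_CORES = 816  # "816 CPUs" interpreted as cores (assumption)

total_nodes = sum(count for count, _, _ in NODE_GROUPS)
total_ram_gb = sum(count * ram for count, ram, _ in NODE_GROUPS)
total_scratch_tb = sum(count * scratch for count, _, scratch in NODE_GROUPS)
cores_per_node = TOTAL_CORES // total_nodes  # assumes identical core counts

print(f"nodes: {total_nodes}, cores per node: {cores_per_node}")
print(f"total RAM: {total_ram_gb} GB, aggregated scratch: {total_scratch_tb:.1f} TB")
for count, ram, _ in NODE_GROUPS:
    print(f"{count} x {ram} GB node(s): {ram / cores_per_node:.2f} GB per core")
```

With these readings the groups add up to 17 nodes of 48 cores each, 3840 GB of RAM, and about 30.9 TB of local disk, which is consistent with the 30 TB FHGFS /scratch built from the working nodes' local disks.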

5 Lemaitre2: the processor
100% CÉCI
CISM
[Diagram: two sockets, each with cores C0-C5, 12 MB L3 and three DDR3 channels Ch0-Ch2 (up to 32 GB/s); QPI 25.6 GB/s; 5520 chipset]
Intel Xeon 5600 series (Westmere-EP):
- 6 cores, 32 nm
- L1: 32 KB D + 32 KB I
- L2: 256 KB (per core)
- L3: 12 MB (per socket)
- 2 QPI links, 25.6 GB/s
- 3-channel DDR3 memory controller, 32 GB/s
CECI Day 2012

6 Lemaitre2: the cluster
CISM
- 112 nodes, 1344 cores (Intel Xeon X GHz), IB QDR (40 Gbps)
- 4 GB/core
- 40 TB /home and 120 TB /scratch
[Diagram: access and management servers and working nodes connected by InfiniBand QDR and GbE; Home: 40 TB and Scratch: Lustre, 120 TB, each on servers with 12 disks]
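The per-node figures for Lemaitre2 follow directly from the totals on this slide. A minimal sketch, assuming all nodes are identical and using the 6 cores per socket given on the processor slide; the variable names are illustrative.

```python
NODES = 112
TOTAL_CORES = 1344
RAM_GB_PER_CORE = 4
CORES_PER_SOCKET = 6  # Xeon 5600 (Westmere-EP), from the processor slide

# Assumes every node has the same configuration
cores_per_node = TOTAL_CORES // NODES
sockets_per_node = cores_per_node // CORES_PER_SOCKET
ram_gb_per_node = cores_per_node * RAM_GB_PER_CORE
total_ram_gb = TOTAL_CORES * RAM_GB_PER_CORE

print(f"{cores_per_node} cores and {sockets_per_node} sockets per node")
print(f"{ram_gb_per_node} GB RAM per node, {total_ram_gb} GB in total")
```

Under that assumption each node has 12 cores on 2 sockets and 48 GB of RAM, for 5376 GB across the cluster.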

7 Comparison

8 % Utilization

9 What next? 2012: Installation of at UCL : New infrastructures at ULB, ULG, UMONS and FUNDP

10 Access
- Centralized identification system: => Access to any cluster with the same login
- Same resource manager system everywhere (SLURM): => Same submission script everywhere
- CÉCI's infrastructures are in a restricted area: => Access restricted to the identified university networks
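To illustrate what "same submission script everywhere" can mean in practice, here is a small sketch that renders a generic SLURM batch script from a few parameters. The `#SBATCH` options used (`--job-name`, `--ntasks`, `--mem-per-cpu`, `--time`) and `srun` are standard SLURM; the helper name, the resource values and `./my_program` are hypothetical and not taken from the slides.

```python
def render_slurm_script(job_name, ntasks, mem_per_cpu_mb, walltime, command):
    """Return the text of a SLURM batch script.

    walltime uses SLURM's days-hours:minutes:seconds format;
    --mem-per-cpu is given in megabytes (SLURM's default unit).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}",
        f"#SBATCH --time={walltime}",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 12 tasks, 4 GB per core, one hour
script = render_slurm_script(
    job_name="demo",
    ntasks=12,
    mem_per_cpu_mb=4096,
    walltime="0-01:00:00",
    command="./my_program",
)
print(script)
```

The resulting text would be saved to a file and submitted with `sbatch`. Because the script requests resources rather than naming machines, the same file could in principle be submitted on any cluster running the same resource manager, with only the requested amounts adjusted to the hardware.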

11 Training session organised by the CISM

12 Training session organised by the CISM Last session (October 2011)

13 Conclusions
- Two clusters, HMEM and Lemaitre2, have been installed and successfully shared within the Consortium CÉCI. => Average usage already very high
- 2012: New infrastructures at ULB, ULG, FUNDP and UMONS. => Up to 6000 cores expected for 2013
- Centralized identification system and use of the same resource manager system everywhere (SLURM).
- Training is organized once a year for researchers coming from all the universities of the FWB. Support is provided locally.
- Maintaining such high-end infrastructures and providing user support and training require dedicated human resources. => Thanks to the local administrators and FNRS logisticians who make it possible.

14 Thanks
