DELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS)


DELIVERABLE D5.5 Report on ICARUS visualization cluster installation John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) 02 May 2011

NextMuSE: Next generation Multi-mechanics Simulation Environment

Cluster configuration

The original EIGER visualization and analysis cluster (installed in April 2010) includes 19 nodes based on the dual-socket six-core AMD Istanbul Opteron 2427 processor running at 2.2 GHz. Four nodes are reserved for specific tasks: one for login, one for administration and two for file system IO routing. This leaves 15 compute nodes, to which we have now (March 2011) added an extension of four nodes based on the dual-socket 12-core AMD Magny-Cours Opteron 6174 running at 2.2 GHz. Standard nodes offer 24 GB of main system memory, whereas fat (memory) nodes and extension nodes offer 48 GB per node. We therefore get a total of 276 cores and 664 GB of memory.

In addition to the CPUs, every node hosts two GPU cards, GeForce or Tesla. The latest nodes come with Fermi cards providing 448 CUDA cores each, with either 3 or 6 GB of onboard memory. More details are given in Table 1.

For the high-speed network interconnect, the cluster relies on a dedicated InfiniBand QDR fabric, able to support both parallel MPI traffic and parallel file system traffic to the IO nodes. In addition, a commodity 10 GbE LAN provides interactive login access and home, project and application file sharing among the cluster nodes. A standard 1 GbE administration network is also reserved for cluster management purposes.

Altair PBS Professional V 10.2 is the main batch queuing system installed and supported on the cluster. A CSCS user project has been created which allows external partners to access the cluster. The accounting system has a partition reserved for NextMuSE so that CPU hours consumed by (external) NextMuSE users can be automatically recorded.

Node configuration (extension)

Extension nodes are dual-socket nodes with 48 GB of memory. As shown in Figure 1, one socket has 32 GB of memory whereas the other has 16 GB; NUMA effects must therefore be considered when using more memory than a single socket provides, i.e. 32 or 16 GB. Nevertheless each core comes with an L1 cache of 64 KB, an L2 cache of 512 KB and a shared L3 cache of 10 MB (2x6 MB but only 10 MB visible).

Figure 1: Magny-Cours Node Topology
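The core total quoted above follows directly from the node counts, and the uneven memory split on the extension nodes can be checked against a job's memory request. The following is a minimal sketch, not a CSCS tool: the function names are hypothetical, and the NUMA check ignores memory used by the operating system, so real limits will be somewhat lower.

```python
# Values taken from the text: 15 Istanbul compute nodes (2 sockets x 6 cores)
# and 4 Magny-Cours extension nodes (2 sockets x 12 cores).
ISTANBUL_CORES_PER_NODE = 2 * 6
MAGNY_COURS_CORES_PER_NODE = 2 * 12

# Memory attached to each socket of an extension node (Figure 1), in GB.
EXTENSION_SOCKET_MEMORY_GB = (32, 16)


def total_compute_cores(istanbul_nodes=15, magny_cours_nodes=4):
    """Total number of cores available on the compute nodes."""
    return (istanbul_nodes * ISTANBUL_CORES_PER_NODE
            + magny_cours_nodes * MAGNY_COURS_CORES_PER_NODE)


def numa_placement(request_gb, socket_memory_gb=EXTENSION_SOCKET_MEMORY_GB):
    """Classify a memory request against per-socket capacity (hypothetical helper).

    Ignores operating system overhead, so treat the thresholds as upper bounds.
    """
    if request_gb <= min(socket_memory_gb):
        return "fits on either socket"
    if request_gb <= max(socket_memory_gb):
        return "fits only on the larger socket"
    if request_gb <= sum(socket_memory_gb):
        return "spans both sockets"
    return "exceeds node memory"


if __name__ == "__main__":
    print("compute cores:", total_compute_cores())
    for request in (8, 20, 40, 64):
        print(f"{request} GB request: {numa_placement(request)}")
```

A request classified as "spans both sockets" is the case where remote memory accesses, and therefore NUMA effects, become unavoidable.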

Node name  Node type  CPU type         Cores/node  Sockets/node  Memory/node  CPU frequency  GPU type  GPUs/node
eiger160   login      AMD Istanbul     …           …             … GB         2.2 GHz        Matrox    1
eiger170   admin      AMD Istanbul     …           …             … GB         2.2 GHz        Matrox    1
eiger180   gpfs       AMD Istanbul     …           …             … GB         2.2 GHz        Matrox    1
eiger181   gpfs       AMD Istanbul     …           …             … GB         2.2 GHz        Matrox    1
eiger200   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger201   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger202   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger203   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger204   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger205   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger206   vis        AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger207   visfat     AMD Magny-Cours  …           …             … GB         2.2 GHz        M…        …
eiger208   visfat     AMD Magny-Cours  …           …             … GB         2.2 GHz        M…        …
eiger209   visfat     AMD Magny-Cours  …           …             … GB         2.2 GHz        C…        …
eiger210   visfat     AMD Magny-Cours  …           …             … GB         2.2 GHz        C…        …
eiger220   visfat     AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger221   visfat     AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger222   visfat     AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger223   visfat     AMD Istanbul     …           …             … GB         2.2 GHz        GTX …     …
eiger240   a.d.n.     AMD Istanbul     …           …             … GB         2.2 GHz        S…        …
eiger241   a.d.n.     AMD Istanbul     …           …             … GB         2.2 GHz        S…        …
eiger242   a.d.n.     AMD Istanbul     …           …             … GB         2.2 GHz        C…        …
eiger243   a.d.n.     AMD Istanbul     …           …             … GB         2.2 GHz        C…        …

Table 1: Eiger System Configuration with newly installed NextMuSE Extension

MPI configuration

The default MPI distribution installed on the system is MVAPICH2, which provides a good and reliable implementation of MPI over InfiniBand (more details are available from the MVAPICH2 project). Benchmarks of the measured bandwidth and latency between eiger nodes are presented below for inter-node and intra-node communication. Note that an additional kernel module is used for one-copy intra-node message passing, optimizing the performance for this type of configuration.

Between two nodes with an InfiniBand QDR 4X link, the theoretical bandwidth is expected to be 4 GB/s; the achieved bandwidth appears to be only 3 GB/s. Note also that since these measurements were made, the system has been continually updated and the achievable bandwidth should be slightly higher, though on the newly installed nodes bandwidth is slightly lower due to the internal hardware configuration.

[Figure: Inter-node Two Sided Operations (OFA-IB-Nemesis)]

[Figure: Intra-node Two Sided Operations (KNEM)]
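The 4 GB/s theoretical figure can be derived from the QDR 4X link parameters: four lanes signalling at 10 Gbit/s each, with 8b/10b encoding leaving 80% of the signalling rate for data. The sketch below reproduces that arithmetic and the resulting link efficiency; the function name is illustrative, and the 3 GB/s value is the measurement quoted in the text, not a new result.

```python
def ib_data_rate_gbytes(lanes=4, lane_signal_gbits=10.0, encoding_efficiency=8 / 10):
    """Peak data rate of an InfiniBand link in GB/s.

    Defaults describe a QDR 4X link: 4 lanes at 10 Gbit/s signalling each,
    with 8b/10b encoding (8 data bits carried per 10 bits on the wire).
    """
    data_gbits = lanes * lane_signal_gbits * encoding_efficiency
    return data_gbits / 8.0  # 8 bits per byte


if __name__ == "__main__":
    theoretical = ib_data_rate_gbytes()
    measured = 3.0  # GB/s, achieved bandwidth reported in the text
    efficiency = measured / theoretical
    print(f"theoretical: {theoretical:.1f} GB/s")
    print(f"measured:    {measured:.1f} GB/s ({efficiency:.0%} of peak)")
```

On these numbers the measured bandwidth corresponds to about three quarters of the theoretical peak.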

Remote access configuration

Remote access to the cluster is provided either by SSH through the main CSCS front-end machine ELA, or by remote desktop viewer solutions such as TurboVNC or TigerVNC, which allow the use of OpenGL applications (e.g. ParaView) at a reliable frame rate. An example of the connection procedure using the TurboVNC software is available at the following address: http://user.cscs.…n_access_procedure/index.html. Below is a screenshot of what any NextMuSE partner should be able to get:

[Screenshot]

Launching parallel ParaView server jobs

Additional information on how to configure ParaView to launch reverse-connection jobs for HPC visualization is available via the pv-meshless wiki. pv-meshless is a ParaView plugin developed by CSCS which forms the main host for the SPH analysis modules developed within the NextMuSE project.
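To illustrate what a reverse-connection job looks like under PBS Professional, the sketch below generates a batch script in which the ParaView server connects back to a waiting client. It is an assumption-laden example, not the CSCS procedure: the client host name, the job name, the resource request and the use of `mpiexec` as launcher are all hypothetical, and the actual settings for the cluster are those described on the pv-meshless wiki. The `pvserver` options `--reverse-connection`, `--client-host` and `--server-port` are standard ParaView options.

```python
def pvserver_job_script(client_host, nodes=2, cores_per_node=12,
                        port=11111, walltime="01:00:00"):
    """Build a PBS Professional job script that starts pvserver in reverse-connection mode.

    client_host is the machine running the ParaView client, which must already
    be listening for a reverse connection on the given port.
    """
    ranks = nodes * cores_per_node
    lines = [
        "#!/bin/bash",
        "#PBS -N pvserver",  # hypothetical job name
        f"#PBS -l select={nodes}:ncpus={cores_per_node}:mpiprocs={cores_per_node}",
        f"#PBS -l walltime={walltime}",
        "cd $PBS_O_WORKDIR",
        # Launcher is an assumption; the site may use a different MPI launcher.
        f"mpiexec -n {ranks} pvserver --reverse-connection "
        f"--client-host={client_host} --server-port={port}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical client machine; replace with the host running the ParaView client.
    print(pvserver_job_script("my-workstation.example.org"))
```

The generated text would be saved to a file and submitted with `qsub` once the client is waiting for the connection.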
