Using the IAC Chimera Cluster
Ángel de Vicente (Tel.: x5387)
SIE de Investigación y Enseñanza
Chimera overview
Beowulf-type cluster
Chimera: a monstrous creature made of the parts of multiple animals.
Mailing list: beowulf@iac.es
Web page: http://chimera
Course on Advanced Programming and Parallel Computing (June 11-25)
Schematic View
Hardware Details
Nodes:
1 master node (EM64T)
16 old i686 nodes: 32 Xeon processors at 2.80 GHz (chi32)
16 new EM64T nodes: 32 Xeon processors at 3.20 GHz (chi64)
RAM: 98 GB (master: 2 GB + chi32: 32 GB + chi64: 64 GB)
Disk: ~5 TB (master: 280 GB + chi32: 480 GB + chi64: 4.5 TB)
Network: two independent Gigabit networks (one for user applications, one for administration, NFS, etc.)
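As a rough guide when choosing between partitions, the memory per processor follows from the figures above. This sketch assumes the RAM is spread evenly over each partition's processors, which the slide does not state explicitly; the helper name gb_per_proc is hypothetical.

```bash
#!/usr/bin/env bash
# Average memory per processor, assuming RAM is spread evenly
# over the processors of a partition (an assumption, not a measured value).
gb_per_proc() {
  awk -v gb="$1" -v procs="$2" 'BEGIN { printf "%.1f\n", gb / procs }'
}

echo "chi32: $(gb_per_proc 32 32) GB per processor"   # 32 GB over 32 Xeons
echo "chi64: $(gb_per_proc 64 32) GB per processor"   # 64 GB over 32 Xeons
```

Under that assumption chi32 offers about 1 GB per processor and chi64 about 2 GB, so memory-hungry jobs fit more comfortably on chi64.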
Disk space
User available space:
(all) /home (NFS from master): 50 GB
      /scratch (NFS from master): 195 GB
(chi32) /local_scratch (local): 20 GB per node
(chi64) /mnt/pvfs2 (PVFS2 on chi64): 3.9 TB
Quotas on /home are still to be implemented, as is automatic deletion of files in the other partitions.
PVFS2 Introduction
Stripes data across disks (the disks of the chi64 nodes in Chimera).
Larger files can be created, and the potential bandwidth is increased.
Multiple user interfaces: MPI-IO support and a traditional Linux file system interface.
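Striping means that consecutive chunks of a file are dealt out to different servers in turn. The sketch below illustrates a simple round-robin layout; the 64 KiB strip size and the 16 I/O servers (one per chi64 node) are assumptions for illustration, not values read from Chimera's PVFS2 configuration, and stripe_location is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Round-robin striping sketch. Both parameters are assumptions.
STRIP_SIZE=65536   # bytes per strip (assumed)
NSERVERS=16        # number of I/O servers (assumed: one per chi64 node)

# stripe_location OFFSET -> prints "SERVER LOCAL_OFFSET" for a byte offset in the file
stripe_location() {
  local offset=$1
  local strip=$(( offset / STRIP_SIZE ))    # which strip the byte falls in
  local server=$(( strip % NSERVERS ))      # strips are dealt out in turn
  local round=$(( strip / NSERVERS ))       # full rounds before this strip
  echo "$server $(( round * STRIP_SIZE + offset % STRIP_SIZE ))"
}

stripe_location 0          # first strip, first server
stripe_location 100000     # second strip, second server
stripe_location 1048576    # 17th strip wraps back to server 0
```

Because a large sequential transfer touches every server, the disks of all servers can work in parallel.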
PVFS2 Example
With MPI-IO (60 processors):
                    /scratch (NFS)   /mnt/pvfs2 (PVFS2)
Write bandwidth     24 MB/s          892 MB/s
Read bandwidth      116 MB/s         482 MB/s
Traditional Linux file system (1 processor):
                    local disk   /scratch (NFS)   /mnt/pvfs2 (PVFS2)
Write 900 MB        14.77 s      43.942 s         11.779 s
Read 900 MB (wc)    6.401 s      10.007 s         45.942 s
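The single-processor timings can be turned into bandwidths for comparison with the MPI-IO figures by dividing 900 MB by the elapsed time. This is plain arithmetic on the timings above; the helper name mbps is hypothetical.

```bash
#!/usr/bin/env bash
# mbps MEGABYTES SECONDS -> bandwidth in MB/s, one decimal place
mbps() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# Write 900 MB: local disk, /scratch (NFS), /mnt/pvfs2 (PVFS2)
for t in 14.77 43.942 11.779; do echo "write: $(mbps 900 "$t") MB/s"; done
# Read 900 MB (wc): local disk, /scratch (NFS), /mnt/pvfs2 (PVFS2)
for t in 6.401 10.007 45.942; do echo "read:  $(mbps 900 "$t") MB/s"; done
```

This gives 60.9, 20.5 and 76.4 MB/s for the writes and 140.6, 89.9 and 19.6 MB/s for the reads, so in this single-process test PVFS2 was the fastest for writing but the slowest for reading.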
Modules package
Dynamic modification of a user's environment: PATH, MANPATH, etc.
Shared and/or private modulefiles.
Useful for managing different versions of applications.
Very simple to use: module help, module avail, module list, module load, module unload
Put module commands in .bashrc to get a common environment.
Useful for dealing with chi32 vs. chi64.
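A ~/.bashrc fragment along these lines loads the same set of modules at every login and picks the _32 versions when the shell runs in the 32-bit environment. The module names mpich and fftw and the helper versioned_name are hypothetical placeholders (module avail shows what is actually installed), and the sketch assumes that in the 32-bit environment uname -m reports a machine other than x86_64, as setarch i686 or linux32 would make it do.

```bash
# ~/.bashrc fragment (sketch). mpich and fftw are hypothetical module names.
MY_MODULES="mpich fftw"

# versioned_name BASE [MACHINE] -> BASE in a 64-bit shell, BASE_32 otherwise.
# Assumes the 32-bit environment makes `uname -m` report e.g. i686.
versioned_name() {
  case "${2:-$(uname -m)}" in
    x86_64) echo "$1" ;;
    *)      echo "${1}_32" ;;
  esac
}

# Only call `module` if the Modules package is initialised in this shell.
if type module >/dev/null 2>&1; then
  for m in $MY_MODULES; do
    module load "$(versioned_name "$m")"
  done
fi
```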
Compiling code
Code compiled for 64 bits can only run on chi64.
Code compiled for 32 bits can run on chi32, chi64 or chimera (chi32 + chi64).
By default you log in to a 64-bit environment (check this by running uname -a).
Modules are 64-bit by default; 32-bit versions end with _32.
The bitness of the environment and of the loaded modules should match.
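The last rule can be checked automatically. The Modules package records the loaded modules in the colon-separated LOADEDMODULES variable, so a sketch like the one below can flag entries whose _32 suffix disagrees with the environment. The helper names are hypothetical, and env_bits assumes that the 32-bit environment makes uname -m report an i?86 machine.

```bash
#!/usr/bin/env bash
# env_bits [MACHINE] -> 64 or 32 for the current (or given) machine type.
# Assumes the 32-bit environment makes `uname -m` report an i?86 machine.
env_bits() {
  case "${1:-$(uname -m)}" in
    x86_64) echo 64 ;;
    i?86)   echo 32 ;;
    *)      echo unknown ;;
  esac
}

# mismatched_modules BITS LOADED -> print loaded modules of the wrong bitness.
# LOADED is a colon-separated list; 32-bit modules end in _32 (slide convention).
mismatched_modules() {
  local bits=$1 m
  local IFS=:
  for m in $2; do
    case "$bits:$m" in
      64:*_32) echo "$m" ;;   # 32-bit module loaded in a 64-bit environment
      32:*_32) ;;             # 32-bit module in a 32-bit environment: fine
      32:*)    echo "$m" ;;   # 64-bit module loaded in a 32-bit environment
    esac
  done
}
```

Running mismatched_modules "$(env_bits)" "$LOADEDMODULES" before compiling prints nothing when environment and modules agree.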
Compiling code (2)
Compiling example for 64 bits:
[angelv@chimera sieminar]$ mpicc -o cpi_64 cpi.c
[angelv@chimera sieminar]$ file cpi_64
cpi_64: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
Compiling example for 32 bits:
env32 puts us into a 32-bit environment.
[angelv@chimera sieminar]$ module list   (verify that the 32-bit versions are loaded)
[angelv@chimera sieminar]$ mpicc -o cpi_32 cpi.c
[angelv@chimera sieminar]$ file cpi_32
cpi_32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped
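The bitness that file reports comes from the ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. A small function reading that byte with od can, for example, warn before a 64-bit binary is submitted to chi32. It is a sketch with a hypothetical name, elf_bits, not a replacement for file.

```bash
#!/usr/bin/env bash
# elf_bits FILE -> prints 32 or 64, read from the ELF header's EI_CLASS byte
elf_bits() {
  local magic class
  # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "$1: not an ELF file" >&2
    return 1
  fi
  # The byte at offset 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
  class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
  case "$class" in
    1) echo 32 ;;
    2) echo 64 ;;
    *) echo "$1: unknown ELF class $class" >&2; return 1 ;;
  esac
}
```

With the binaries above, elf_bits cpi_64 should print 64 and elf_bits cpi_32 should print 32.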
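Submitting jobs to the cluster
Chimera's queueing system:
Torque: resource manager
Maui: scheduler
Basic Maui/Torque commands: showq, qsub, checkjob, canceljob
qsub needs a submission file:
[angelv@chimera sieminar]$ cat submit-cpi
#!/bin/sh
# $PBS_NODEFILE has one line per allocated processor: count them
NP=$(wc -l "$PBS_NODEFILE" | awk '{print $1}')
# Jobs start in $HOME; move to the directory qsub was run from
cd "$PBS_O_WORKDIR"
# Start one MPI process per allocated processor
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./cpi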
Submitting jobs to the cluster (2)
With qsub you specify the number of nodes required, the time required, the bitness of the nodes required (through the queue), etc.
Example submissions:
To chi64 (default): qsub -l nodes=4:ppn=2,walltime=03:00:00 submit-cpi
To chi32: qsub -l nodes=4:ppn=2 -q chi32 submit-cpi
To chimera: qsub -l nodes=4:ppn=2 -q chimera submit-cpi
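The walltime request is written as hours:minutes:seconds. Converting it to seconds makes it easy to compare a request with the scheduler's limits; for instance, 3.5 days is 84:00:00. This sketch handles only the HH:MM:SS form (Torque also accepts other forms, such as a plain number of seconds), and walltime_seconds is a hypothetical helper.

```bash
#!/usr/bin/env bash
# walltime_seconds HH:MM:SS -> total seconds (only this form is handled)
walltime_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10, so fields such as "08" are not parsed as octal
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_seconds 03:00:00   # the chi64 example request
walltime_seconds 84:00:00   # 3.5 days
```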
Scheduling policies
Current policies (NOT FIFO; see /usr/local/maui/maui.cfg):
Time in queue
Expansion factor
Backfilling
Number of requested processors
Fairshare
Maximum time for a job: 3.5 days for 128 processors.
Usage of Beoiac (the old cluster): 54.18% over the last 2 years.
The early bird catches the worm!
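Two of these factors are easy to illustrate. Maui defines the expansion factor as (queue time + requested walltime) / requested walltime, so a short job's factor, and with it its priority, grows faster while it waits. Backfilling lets a lower-priority job start early if it does not delay the job at the head of the queue; can_backfill below is a simplified, hypothetical version of that check (the real scheduler considers more than processor counts and times).

```bash
#!/usr/bin/env bash
# xfactor QUEUED_SECONDS WALLTIME_SECONDS -> expansion factor, two decimals
xfactor() {
  awk -v q="$1" -v w="$2" 'BEGIN { printf "%.2f\n", (q + w) / w }'
}

# can_backfill FREE_PROCS REQ_PROCS WALLTIME_S SECONDS_UNTIL_RESERVATION
# Simplified rule: the job fits in the idle processors and finishes before
# the highest-priority job's reserved start time.
can_backfill() {
  if [ "$2" -le "$1" ] && [ "$3" -le "$4" ]; then
    echo yes
  else
    echo no
  fi
}

xfactor 3600 3600      # 1 h waiting, 1 h job  -> 2.00
xfactor 3600 36000     # 1 h waiting, 10 h job -> 1.10
can_backfill 8 4 3600 7200    # yes: fits, ends before the reservation
can_backfill 8 4 10800 7200   # no: would still be running at the reservation
```

Both effects reward short, realistic walltime requests, which tend to get a job started sooner than asking for the maximum.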
Monitoring
Graphical view of the scheduling status (same output as showq, but perhaps easier to interpret): http://chimera/cgi-bin/mauistatus.pl
Graphical view of different cluster metrics (are your allocated nodes really doing something?): http://chimera/ganglia/
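To know which hosts to look at in Ganglia, a job script can print its node list. Torque writes one line per allocated processor to $PBS_NODEFILE (so nodes=4:ppn=2 gives 8 lines), which is why wc -l yields the process count in the submission file; sort -u reduces it to the distinct hosts. From the login node, qstat -n also lists a job's nodes. The helper names and the node names in the test data are hypothetical.

```bash
#!/usr/bin/env bash
# job_nodes NODEFILE -> distinct host names, one per line
job_nodes() { sort -u "$1"; }

# job_slots NODEFILE -> number of processor slots (one line per slot)
job_slots() { wc -l < "$1" | tr -d ' '; }

# Inside a job script, for example:
#   echo "Running on: $(job_nodes "$PBS_NODEFILE" | tr '\n' ' ')"
```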
Other resources at the IAC
Condor system (~180 machines, ideal for parameter studies).
Future CALP node (512 nodes, 20% exclusive to the IAC).
References
Beowulf.org (http://www.beowulf.org)
Chimera@wikipedia (http://en.wikipedia.org/wiki/chimera_%28mythology%29)
IAC mailing list (http://listas.iac.es/mailman/listinfo/beowulf)
Chimera IAC web page (http://chimera/)
IAC Course on Parallel Comp. (http://goya/sie/forum/viewtopic.php?t=141)
PVFS2 (http://www.pvfs.org)
Modules package (http://modules.sourceforge.net)
Maui (http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php)
Torque (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
Condor IAC web page (http://www.iac.es/sieinvens/sinfin/condor/)