Guillimin HPC Users Meeting. Bart Oldeman

June 19, 2014
Bart Oldeman (bart.oldeman@mcgill.ca)
McGill University / Calcul Québec / Compute Canada, Montréal, QC, Canada

Outline
- Compute Canada News
- Upcoming Maintenance Downtime in August
- Storage System News
- Scheduler Updates and Demonstration
- Software and User Environment Updates
- Training News
- New Visualization and Collaboration Environment

Compute Canada News
- Compute Canada SPARC (Sustainable Planning for Advanced Research Computing): a consultation process with the research community to build a national plan for advanced computing, data storage, and archiving requirements
- Targeted for the CFI's planned renewal of Compute Canada infrastructure, as well as funding for domain-specific data projects
- Consultations (white papers, workshops) over the summer to prepare a preliminary plan for November 2014
- Renewal plan due April 2015
- Notices of Intent for domain proposals due January 2015
- More info: www.computecanada.ca

Maintenance Downtime
- Guillimin maintenance downtime: August 4-7
- Maintenance outage to the data centre cooling distribution system
- Will require stoppage of all logins, data access, and batch job activities
- Further information regarding the planned maintenance downtime will be distributed by the middle of July

Storage System News
GPFS Stability Issues - Update
- Regular occurrences of GPFS instability on nodes due to node expels
- Typical impact: interruption or halt of writing from jobs
- Investigation with the GPFS and IB support teams ongoing with critical priority
- Latest actions (June 11): second update to the IB network tunings on all nodes; additional increase in the receive queue size for IP-over-IB communications across the much larger-scale IB fabric
- We continue to observe a significant decrease in the number of node expels (~1-2 every few days - a major stability improvement)
- Additional investigations under way to further improve performance

Storage System News
Reminder: Upcoming Activities
- Online expansion of /gs to its full target size (~2.9 PB)
- Tape archive (backup) and Hierarchical Storage Management (HSM) integration
- Migration of the scratch policy to use HSM rules for identification and cleanup (in progress)
- Analyzing characteristics of file system contents to identify suitable HSM migration policies (in progress)
- Access to tape for targeted backups (in progress)

Scheduler Update
- In general, improved overall stability and performance
- A few outstanding issues under review with Adaptive Computing
- Testing of an update to Torque 4.2.8 in the development environment is in progress
- Recall: April 10 - qsub for job submission enabled
- Default PATH settings updated to include the Torque commands (qsub, qstat, ...)
- Much faster response for submissions and queries compared to the Moab commands (msub, canceljob, ...)
- qsub submission filter: qsub -A <RAPid> is required for proper accounting and priority assignment (will be relaxed later); see the example below
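A hedged example of submitting and querying with the Torque commands (the script name is a placeholder; substitute your own RAPid and job ID):

  # Submit with qsub, specifying your RAPid for proper accounting and priority
  qsub -A <RAPid> myjob.pbs

  # Query jobs with the Torque commands rather than the Moab equivalents
  qstat -u $USER
  qstat -f <JOB_ID>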

Scheduler Update
- Job submission documentation updated: www.hpc.mcgill.ca -> Documentation -> Submitting Your Job
- With the migration to CentOS 6, nodes are moved to the new scheduler
- In the default queue, the chosen node depends on the pmem (memory) PBS parameter or on a node feature (e.g. m256g, m512g, ...); a sketch of a job script follows below
- Internal routing for short jobs in the default queue
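A minimal default-queue job script sketch (the resource values, node feature, and program name are illustrative assumptions, not site recommendations):

  #!/bin/bash
  #PBS -A <RAPid>
  #PBS -N default_queue_example
  # Node selection in the default queue follows the pmem request...
  #PBS -l nodes=1:ppn=4,pmem=2700m,walltime=06:00:00
  # ...or an explicit node feature, e.g. (commented out here):
  ##PBS -l nodes=1:ppn=4:m256g,pmem=2700m,walltime=06:00:00

  cd $PBS_O_WORKDIR
  ./my_program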

Scheduler Update
Default queue routing (illustrative submission lines follow below):
- Serial jobs (nodes=1:ppn=n, n < 12): new - routed to sw2
- Parallel jobs (new: higher walltime boundary):
  - default if procs > 12 or nodes > 1 (jobs which need to communicate over IB)
  - default if procs = 12 or nodes=1:ppn=12
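To illustrate the routing boundaries above (a sketch only; the pmem and walltime values are placeholders):

  # Serial job: single node, fewer than 12 cores
  #PBS -l nodes=1:ppn=4,pmem=2700m,walltime=12:00:00

  # Whole-node job: procs = 12 (nodes=1:ppn=12)
  #PBS -l nodes=1:ppn=12,pmem=2700m,walltime=12:00:00

  # Multi-node parallel job: procs > 12, communicating over IB
  #PBS -l nodes=2:ppn=12,pmem=2700m,walltime=12:00:00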

Scheduler Update
Extra large memory nodes (XLM2)
- Alternative to ScaleMP (currently offline; to be reimaged to CentOS 6 next week)
- Some nodes are reserved by CFI grant holders; others get 12 hours only
- Example PBS submission lines:
  #PBS -l nodes=1:ppn=16,pmem=11700m,walltime=10:00:00      (any XLM2 node)
  #PBS -l nodes=1:ppn=16,pmem=11700m,walltime=1:00:00:00    (non-CFI nodes only)
  #PBS -l nodes=1:ppn=1,pmem=31700m,walltime=10:00:00       (serial on m512g/m1024g)
  #PBS -l nodes=1:ppn=16,pmem=31700m,walltime=10:00:00      (16 cores on m512g/m1024g)
  #PBS -l nodes=1:ppn=16,pmem=31700m,walltime=1:00:00:00    (16 cores on m512g: non-CFI)
  #PBS -l nodes=4:ppn=16,pmem=31700m,walltime=1:00:00:00    (all cores on m512g nodes)
  #PBS -l nodes=1:ppn=32:m1024g,pmem=31700m                 (specific node type, IF you are the CFI holder)
  #PBS -l nodes=1:ppn=16:m256g,pmem=15700m                  (specific node type)

Scheduler Update
Examples: why is my job not running yet?
  checkjob -v JOB_ID            (can also use -v -v, etc.)
  showq
  showq -i -v
  showq -r -v
  showq -w class=<queue_name>
  showq -w class=hb
  showq -w class=hbplus
  showq -w class=hb -r
  showq -w class=hbplus -r
  showq -w class=hb -i

Software Update
New Installations (see the module example below)
- petsc/3.4.4-openmpi-1.6.3-{gcc,intel}
- h5py for python/2.7.3
GPU Updates
- Driver update completed: NVIDIA-Linux-x86_64-331.67
- Update to /etc/bashrc on GPU nodes to allow for correct operation of the NVIDIA Profiler
MDCS and Matlab Update
- April 22 - license manager migrated to CentOS 6
- Now supports releases up to 2014a
- Includes an update to the standard Matlab license for McGill users (access restricted due to MathWorks license requirements)
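A hedged sketch of using the new installations through the module system (any prerequisite compiler/MPI modules and the exact module names may differ; check 'module avail' on Guillimin):

  # Load the GCC build of the new PETSc installation (name taken from the list above)
  module load petsc/3.4.4-openmpi-1.6.3-gcc

  # h5py was installed for python/2.7.3; a quick check that it imports
  module load python/2.7.3
  python -c 'import h5py; print(h5py.version.version)'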

Software Update
Compiler Updates / Additions to come
- Intel 14.0.2: license manager migration required to support newer Intel installations
- Long-term: project to standardize modules across Calcul Québec
Others in progress
- MIO2/1.0: modular I/O library from IBM Research
- IOBUFF from Calcul Québec

Training News
See "Training" at www.hpc.mcgill.ca for our full calendar of training and workshops planned for 2014, and to register.
Upcoming:
- July 10 - MapReduce and Hadoop for Big Data
- August 17 - Scientific Visualization Tools
Recently Completed:
- June 5 - Advanced OpenMP
- May 22 - Introduction to the Xeon Phi

New Visualization & Collaboration Environment
- Located at the McGill HPC Centre at ETS (Peel and Notre-Dame O.)
- Polycom Group 700 HD series multi-point conferencing unit
- Two 55" LED LCD screens and one 65" LED LCD screen
- Crestron AirMedia for wireless connectivity
- Room capacity of 10-15 people
- The room is available for video-conferencing or data visualization
- Contact us at guillimin@calculquebec.ca to access this resource

Other Developments
- Work has started on the summer upgrade of the network link from the data centre at ETS to the McGill core network
- Upgrade from 10 Gbps to 40 Gbps
- Will include a 10 Gbps connection to the Calcul Québec router
- The upgrade will enable support for projects requiring additional dedicated network bandwidth in and out of the data centre
- Testing to be completed in July; in production by the end of August

User Feedback and Discussion
Questions? Comments? We value your feedback.
- Guillimin Operational News for Users
- Follow us on Twitter: http://twitter.com/mcgillhpc