Virtualization of the ATLAS Tier-2/3 environment on the HPC cluster NEMO
Ulrike Schnoor (CERN)
Anton Gamel, Felix Bührer, Benjamin Rottler, Markus Schumacher (University of Freiburg)
February 02, 2018, Pre-GDB Meeting
Using HPC resources via virtualization
- HPC resource: NEMO at the University of Freiburg, used to extend the local Tier-3 resources (Black Forest Grid = BFG)
- Job types: currently mainly local ATLAS analysis and simulation jobs, but easily extendable to any ATLAS jobs
- Setup: full virtualization of the environment, embedded into the existing OpenStack-Torque/Moab infrastructure in a way that is
  - based on demand
  - fully automated
  - transparent for the user
bwForCluster NEMO (HPC center)
- Shared by three communities in Baden-Württemberg: Elementary Particle Physics, Neuroscience, Microsystems Engineering
- 752 worker nodes, each with
  - 2 × 10 cores
  - 128 GB RAM
  - 100 Gbit/s Omni-Path
  - 240 GB local SSD
- 500 TB workspace (BeeGFS)
- TOP500: ranked 214 in June 2016, 389 in June 2017
- In operation since July 2016
- Hybrid of HPC and cloud approach: OpenStack orchestrates bare-metal jobs and virtual machines in parallel
Virtualization of the ATLAS infrastructure on NEMO
Ingredients:
- OpenStack: management framework that runs both virtual machines and bare-metal jobs on NEMO
- Hypervisor: KVM
- User interface: BFG login nodes
- Access to CVMFS and Frontier via the BFG squid
- Scheduler: Slurm (front-end for users at the BFG), Torque/Moab (back-end for VMs on NEMO)
- Scheduling for dynamic allocation of VMs: ROCED
- VM image (SL6, CentOS 7)
- Access to storage: dCache client, local BeeGFS
- Access to software: CVMFS client
Virtual machine image tool chain
- Requirements: Scientific Linux 6; the CernVM image uses a modified kernel and is therefore not suitable
- Setup:
  - Packer (www.packer.io) for automated image generation (see the sketch below)
  - Basis: SL6 ISO; output: VM template image (qcow2)
  - Contextualization with Puppet: install software and services (e.g. CVMFS client), user management etc. with the BFG Puppet server → identical and modularized setup
- Important updates? Generate a new VM image.
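As a rough illustration of the automated image generation, here is a minimal Python sketch that drives a Packer build. The template name sl6-worker.json is a hypothetical placeholder, not the actual BFG template; contextualization is assumed to happen later via the BFG Puppet server.

```python
#!/usr/bin/env python
"""Hedged sketch: drive an automated Packer build of the SL6 worker VM image.

Assumes a Packer template 'sl6-worker.json' (hypothetical name) that starts
from the SL6 ISO and produces a qcow2 template image; contextualization
(CVMFS client, user management, ...) is left to the BFG Puppet server.
"""
import subprocess
import sys

TEMPLATE = "sl6-worker.json"  # hypothetical template name


def build_image(template):
    # Validate the template first, then build; both are standard packer subcommands.
    subprocess.check_call(["packer", "validate", template])
    subprocess.check_call(["packer", "build", template])


if __name__ == "__main__":
    try:
        build_image(TEMPLATE)
    except subprocess.CalledProcessError as err:
        sys.exit("Image build failed: %s" % err)
```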
Scheduling with Slurm Elastic Computing
- Slurm Elastic Computing: resume and suspend machines on demand, with adaptable resume/suspend functions and timeouts (a sketch of a resume script follows below)
- Challenges:
  - The three-layer system of Slurm, Torque/Moab, and OpenStack allows almost no transmission/propagation of error messages
  - Not intended for non-permanent resources (queue in Moab): timeouts not sufficiently adaptable
- Solution: an intermediate layer such as ROCED
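For illustration, a minimal sketch of what a Slurm ResumeProgram for elastic nodes could look like. Slurm passes a hostlist expression to this script; the mapping of node names to stopped OpenStack instances and the direct use of the OpenStack CLI are assumptions for the sketch, whereas on NEMO the actual VM start goes through the Torque/Moab back-end.

```python
#!/usr/bin/env python
"""Hedged sketch of a Slurm ResumeProgram for elastic (on-demand) nodes.

Slurm calls this script with a hostlist expression (e.g. 'vm-[001-004]').
Assumption for illustration: each Slurm node name corresponds to a stopped
OpenStack instance of the same name; on NEMO the real start path goes
through Torque/Moab rather than the OpenStack CLI directly.
"""
import subprocess
import sys


def expand_hostlist(hostlist):
    # 'scontrol show hostnames' expands a Slurm hostlist into individual names.
    out = subprocess.check_output(["scontrol", "show", "hostnames", hostlist])
    return out.decode().split()


def main():
    if len(sys.argv) != 2:
        sys.exit("usage: resume_program.py <hostlist>")
    for node in expand_hostlist(sys.argv[1]):
        # Start the (hypothetical) pre-defined VM instance backing this node.
        subprocess.check_call(["openstack", "server", "start", node])


if __name__ == "__main__":
    main()
```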
ROCED: Responsive On-Demand Cloud-enabled Deployment
- Tool developed by CMS colleagues at KIT (Karlsruhe): https://github.com/roced-scheduler/roced
- Monitors the demand in a batch system and dynamically manages virtual machines accordingly
- Python code with a modular structure that adapts to different schedulers, VM types, clouds etc.
- Integration and Requirement Adapters modified for the BFG/Slurm setup: in production (a conceptual sketch of a requirement adapter follows below)
- Components (from the ROCED architecture diagram):
  - Requirement Adapters supply information about needed compute nodes, e.g. queue size (HTCondor, Torque, Grid Engine, Slurm)
  - The ROCED Core / Broker decides which machines to boot or shut down
  - Site Adapters boot machines on various cloud computing sites (hybrid HPC cluster, commercial providers, OpenStack)
  - Integration Adapters integrate booted compute nodes into the existing batch server (HTCondor, Torque, Grid Engine, Slurm)
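To illustrate the idea behind a Requirement Adapter for Slurm, here is a conceptual Python sketch that derives the number of needed VMs from the pending queue. It does not implement the actual ROCED adapter interface, and the value CORES_PER_VM = 20 is an assumption based on the NEMO node size.

```python
#!/usr/bin/env python
"""Hedged sketch of the idea behind a ROCED Requirement Adapter for Slurm.

Counts the CPUs requested by pending jobs via squeue and derives how many
worker VMs would be needed. The real ROCED adapters implement a specific
Python interface; this is only a conceptual illustration.
"""
import subprocess

CORES_PER_VM = 20  # assumption: one VM spans a full NEMO node (2 x 10 cores)


def pending_cpus():
    # '-t PD' selects pending jobs, '%C' prints the number of CPUs requested.
    out = subprocess.check_output(["squeue", "-h", "-t", "PD", "-o", "%C"])
    return sum(int(line) for line in out.decode().split())


def needed_machines():
    cpus = pending_cpus()
    # Round up to full VMs.
    return (cpus + CORES_PER_VM - 1) // CORES_PER_VM


if __name__ == "__main__":
    print("Pending demand corresponds to %d VM(s)" % needed_machines())
```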
Summary and Outlook
- The Slurm Elastic Computing setup can be used, but it is very fragile and leads to many job failures
- Using ROCED instead of Slurm Elastic Computing: non-elastic Slurm together with ROCED; Requirement Adapter and Integration Adapter implementations for Slurm and the BFG are in place
- Future possibilities:
  - Use of containers
  - CVMFS-provided images instead of home-brewed images built with Packer
The Team
- Anton Gamel, Felix Bührer, Benjamin Rottler, Ulrike Schnoor, Markus Schumacher
- Contacts in the computing center (HPC team): Michael Janczyk, Bernd Wiebelt, Dirk von Suchodoletz
- Formerly also: Konrad Meier
Backup
The Black Forest Grid (BFG)
- Tier-2 and Tier-3 site of the WLCG, in operation since 2005
- CPU: 260 nodes with 4700 cores in total (HT); several generations of worker node hardware
- Storage: dCache, 1.35 PB (grid); Lustre parallel storage, 180 TB (local users)
- Local users from physics, biodynamics, and many other groups
- Future: exclusively Tier-2 and Tier-3 for the WLCG
Baden-Württemberg HPC
- bwHPC-C5 project: initiative in Baden-Württemberg for a common framework for HPC resources at the universities in BW, co-financed by the DFG
- bwForClusters: federated approach in which the user group is defined by research field, not by affiliation
- Freiburg: bwForCluster for Elementary Particle Physics, Neuroscience, and Microsystems Engineering: NEMO
How to run ATLAS jobs on NEMO?
- OS: ATLAS currently needs Scientific Linux 6; NEMO runs CentOS 7
- Software: CVMFS (CernVM File System), the basis for all experiment-specific software, is not installed on NEMO
- Storage: AFS is not available on NEMO
- → Virtualize the environment: the virtual machine image and the orchestration/scheduling setup can be used both by local jobs and by grid jobs
Timeouts in Slurm
- The elasticity of the Slurm Elastic Computing module can be influenced with several timeout parameters
- Main issue: ResumeTimeout
  - should be long in order to accommodate the wait in the Moab queue
  - should be short in order to restart quickly if the VM start fails
- Other problem: VMs often stay in COMPLETING (after the job is terminated, before turning IDLE) for a long time (a monitoring sketch follows below)
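As a small illustration of how the COMPLETING problem could at least be detected, here is a hedged Python sketch of a watchdog that polls sinfo and flags nodes stuck in COMPLETING. The threshold and polling interval are illustrative assumptions, not values used in the actual setup.

```python
#!/usr/bin/env python
"""Hedged sketch: watch for nodes that stay in COMPLETING for too long.

Polls 'sinfo' and reports nodes whose state has been 'completing' longer
than MAX_COMPLETING seconds, so an operator (or a cleanup hook) can step in.
"""
import subprocess
import time

MAX_COMPLETING = 600   # assumption: 10 minutes is already suspicious
POLL_INTERVAL = 60     # assumption: poll once per minute


def node_states():
    # '-N' lists one line per node, '%N %T' prints node name and state.
    out = subprocess.check_output(["sinfo", "-h", "-N", "-o", "%N %T"])
    return dict(line.split() for line in out.decode().splitlines() if line)


def watch():
    since = {}  # node -> timestamp when first seen in COMPLETING
    while True:
        now = time.time()
        for node, state in node_states().items():
            if state.startswith("completing"):
                since.setdefault(node, now)
                if now - since[node] > MAX_COMPLETING:
                    print("WARNING: %s stuck in COMPLETING for %.0f s"
                          % (node, now - since[node]))
            else:
                since.pop(node, None)
        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    watch()
```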