Transient Compute ARC as Cloud Front-End
Transcription
1 Digital Infrastructures for Research, 11:30, Cracow, 30-minute slot. AEC Albert Einstein Center for Fundamental Physics. Transient Compute ARC as Cloud Front-End. Sigve Haug, AEC-LHEP, University of Bern
2 University of Bern compute resources (overview diagram): Cloud Cluster, Commodity Clusters, CSCS HPC Cluster
3 Big Data Science Challenge - For example the ATLAS experiment at CERN: CPU needs grow drastically - But the budget is close to flat - What to do? (Illustration only)
4 Consolidate wherever possible - A new type of service provider for HPC may be part of the answer: NREN and commercial cloud providers - An NREN, as a non-profit business dedicated to science, is particularly interesting. Is it an alternative to traditional HPC providers and private clusters, i.e. more cost effective? - The Swiss example: the Swiss NREN offers IaaS on OpenStack
5 New player - usable for large science? (Overview diagram: Clouds, Commodity Clusters, CSCS Super HPC Clusters)
6 Important incentives for this exercise - Federal infrastructure funding for free trials on the NREN IaaS (SWITCHengines), so let's put big science onto it - Some nice tools make it easy: ARC and ElastiCluster - May be an alternative to buying own hardware, fighting for HPC allocations, or dealing with rigid central batch clusters and policies - Is it cheaper for science?
7 The test case - the ATLAS experiment: a particle physics experiment at the Large Hadron Collider (LHC) at CERN in Geneva. It investigates the smallest particles in the universe, dark matter and the big bang, and has a lifetime of about four decades. (Image: the underground LHC tunnel at CERN in Geneva.) x PB per year, hundreds of thousands of CPU cores running permanently, using hundreds of distributed sites on several continents.
8 Compute workflow (input flow, processing step, MB/event, t per event in s): Theory, Generation, Detector Simulation, Digitization (3.6 MB/event), Reconstruction, Analysis (0.4 MB/event); the detector saves about 1000 events per second. So far only the Generation and Detector Simulation steps are run on SWITCHengines - moderate I/O
9 How? - SWITCHengines - Got an account on the IaaS SWITCHengines with some quota - Made an instance for ElastiCluster (Ubuntu) - Made an instance for ATLAS with CentOS, mounted /cvmfs, installed some stuff to make ATLAS run, and made a snapshot (image); see the preparation sketch below - Fired up a SLURM cluster with 304 cores from that image (30 min). UZH S3IT: Service and Support for ScienceIT - Riccardo Murri, Sergio Maffioletti
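The slide does not spell out how the ATLAS worker image was prepared; the following is a minimal sketch of what mounting /cvmfs on a CentOS instance typically involves. The package installation step, repository list and proxy setting are assumptions, not taken from the slide:

# Sketch only - assumed preparation of the ATLAS worker image before snapshotting
yum install -y cvmfs                   # CernVM-FS client (assumes the CERN yum repository is configured)
cat > /etc/cvmfs/default.local <<'EOF'
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_HTTP_PROXY=DIRECT
EOF
cvmfs_config setup
cvmfs_config probe atlas.cern.ch       # verify that /cvmfs/atlas.cern.ch mounts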
10 How: ElastiCluster in action - Get a cluster in 30 minutes

(elasticluster)ubuntu@elasticluster-nei:~$ tail .elasticluster/config
security_group=mpi_test
image_id=92cf2dc2-547c-4ab6-8d4f-9a383a4cf6e6
flavor=nei-ch-8cpu-16gb_ram
frontend_nodes=1
compute_nodes=38
image_userdata=
ssh_to=frontend
network_ids=c9e33fb0-5adf-4c81-97a6-a6eba639d0b1

(elasticluster)ubuntu@elasticluster-nei:~$ elasticluster start slurm -n ATLAS
(elasticluster)ubuntu@elasticluster-nei:~$ elasticluster list
The following clusters have been started. Please note that there's no guarantee that they are fully configured:
ATLAS
name: ATLAS
template: slurm
- frontend nodes: 1
- compute nodes: 38
(elasticluster)ubuntu@elasticluster-nei:~$
(elasticluster)ubuntu@elasticluster-nei:~$ elasticluster resize -t slurm -a 5:compute ATLAS
(elasticluster)ubuntu@elasticluster-nei:~$ elasticluster stop ATLAS
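For context, the tail shown above is only the end of the cluster section; a complete ElastiCluster configuration also needs cloud, login and setup sections. Below is a minimal sketch under assumed section names and placeholder credentials - only the values already visible on the slide are real, and key names may differ between ElastiCluster versions:

# placeholder credentials - replace with real SWITCHengines/OpenStack values
[cloud/switchengines]
provider=openstack
auth_url=https://keystone.example.ch:5000/v2.0
username=OS_USERNAME
password=OS_PASSWORD
project_name=OS_TENANT_NAME

[login/ubuntu]
image_user=ubuntu
user_key_name=elasticluster
user_key_private=~/.ssh/id_rsa
user_key_public=~/.ssh/id_rsa.pub

[setup/slurm]
provider=ansible
frontend_groups=slurm_master
compute_groups=slurm_worker

# values below are the ones shown on the slide
[cluster/slurm]
cloud=switchengines
login=ubuntu
setup=slurm
security_group=mpi_test
image_id=92cf2dc2-547c-4ab6-8d4f-9a383a4cf6e6
flavor=nei-ch-8cpu-16gb_ram
frontend_nodes=1
compute_nodes=38
ssh_to=frontend
network_ids=c9e33fb0-5adf-4c81-97a6-a6eba639d0b1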
11 How (cont.) - ARC - Cloned our ARC HPC VM front-end for the Cray (at the home institute, not in the IaaS) - sshfs-mounted /home/atlas from SWITCHengines and activated our ssh back-end (a small wrapper around the standard ARC SLURM back-end; a configuration sketch follows below) - Registered the front-end in the ATLAS production system - Started running. (Diagram labels: OpenStack, Volunteer Computing, Test, Cray)
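The slide only names the pieces; the following is a hedged sketch of how the front-end might be wired up. The arc.conf fragment assumes ARC 5-style option names (they differ in later ARC releases), <cloud-frontend-IP> is a placeholder for the address elided on the slides, and the wrapper path is taken from the additional material:

# sshfs-mount the cloud cluster's /home/atlas on the ARC front-end (full command in the additional material)
sshfs atlas@<cloud-frontend-IP>:/home/atlas/ /home/atlas/ -o reconnect -o allow_other ...

# arc.conf fragment: point the ARC SLURM back-end at the ssh wrapper directory
# instead of local Slurm binaries (assumed option names)
[common]
lrms="SLURM"
slurm_bin_path="/opt/sshslurm"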
12 Sketch of the solution (OpenStack) - The compute cluster has become transient (a 30-minute affair); the ARC front-end is persistent
13 Performance - The CPU return has become good (around 90%); compared to a fixed-quota model, fully elastic pay-per-usage looks interesting
14 Swiss NREN prices - 20% discount for heavy usage like in this case - A 1000-core cluster with 2 GB RAM per core costs about 70 kCHF per year - Similar to dedicated cluster operation by CSCS (the Swiss Supercomputing Centre) - Not competitive with subsidised (power, manpower) in-house solutions
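As a rough sanity check on that figure (assuming all 1000 cores are used around the clock for a full year): 70 000 CHF / (1000 cores × 8760 h) ≈ 0.008 CHF per core-hour, i.e. under one centime per core-hour before any heavy-usage discount.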
15 Wrap up - connecting cloud to grid - Firing up application-dedicated O(1000)-core clusters with ElastiCluster on an OpenStack IaaS within an hour is possible - Hooking such a cluster to a remote ARC front-end works well for the tested LHC tasks - Performance is sufficient and very stable - The cluster back-end becomes transient, i.e. it can be reinstalled on the time scale of changing a disk drive - In the Swiss example, pricing has become competitive with other outsourcing alternatives - Compute becomes transient
16 University of Bern compute resources (overview diagram): Cloud Cluster, Commodity Clusters, CSCS HPC Cluster
17 Additional Material
18 ARC Bern ssh back-end

[root@ce04 ~]# ll /opt/sshslurm/
total 8
drwxr-xr-x. 2 root root 4096 Dec 18 17:50 config
lrwxrwxrwx. 1 root root    8 Apr  sacct -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sacctmgr -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  salloc -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sattach -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sbatch -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sbcast -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  scancel -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  scontrol -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sdiag -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sinfo -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sprio -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  squeue -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sreport -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  srun -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sshare -> sshslurm
-rwxr-xr-x. 1 root root  604 Nov  sshslurm
lrwxrwxrwx. 1 root root    8 Apr  sstat -> sshslurm
lrwxrwxrwx. 1 root root    8 Apr  strigger -> sshslurm
[root@ce04 ~]#

[root@ce04 ~]# cat /opt/sshslurm/config/sshslurm-config
SSHSLURM_HOST="atlas@ "
SSH_CMDLINE="/opt/openssh-6.6/bin/ssh -o "ControlPath=~/.ssh/controlmaster-%r@%h:%p" -o "ControlMaster=auto" -o "ControlPersist=2h" -o "ServerAliveInterval=120" -i /opt/sshslurm/config/id_rsa.$(whoami)"
SCP_CMDLINE="/opt/openssh-6.6/bin/scp -o "ControlPath=~/.ssh/controlmaster-%r@%h:%p" -o "ControlMaster=auto" -o "ControlPersist=2h" -o "ServerAliveInterval=120" -i /opt/sshslurm/config/id_rsa.$(whoami)"
REMOTE_SLURM_PATH="/usr/bin"
REMOTE_TEMP_PATH="/tmp"
[root@ce04 ~]#
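The symlink farm above can be (re)created with a one-off loop like the following (illustrative only; the slide shows just the resulting listing):

cd /opt/sshslurm
for cmd in sacct sacctmgr salloc sattach sbatch sbcast scancel scontrol \
           sdiag sinfo sprio squeue sreport srun sshare sstat strigger; do
    ln -sf sshslurm "$cmd"    # every Slurm client command resolves to the ssh wrapper
done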
19 ARC Bern ssh back-end

[root@ce04 ~]# cat /opt/sshslurm/sshslurm
#!/bin/bash
# config
source /opt/sshslurm/config/sshslurm-config
SBINARY=$(basename "$0")        # which Slurm command was invoked, via the symlink name
SARGS=""
for token in "$@"; do
    SARGS="$SARGS '$token'"
    # echo $SARGS
done
echo $(date) - $SBINARY "$SARGS" >> /tmp/sshslurm.log
# sbatch: copy the batch script to the remote cluster first, then submit it there
if [[ "$SBINARY" == "sbatch" && "$1" != "" ]]; then
    SARGS=$REMOTE_TEMP_PATH/$(basename "$1")
    $SCP_CMDLINE -q "$1" "$SSHSLURM_HOST:$SARGS"
    $SSH_CMDLINE $SSHSLURM_HOST -- [ -d "$PWD" ] \&\& cd "$PWD"\; $REMOTE_SLURM_PATH/$SBINARY "$SARGS" \&\& rm -f "$SARGS"
    exit $?
fi
# all other Slurm commands are simply executed on the remote cluster over ssh
$SSH_CMDLINE $SSHSLURM_HOST -- [ -d "$PWD" ] \&\& cd "$PWD"\; $REMOTE_SLURM_PATH/$SBINARY "$SARGS"
exit $?

[root@ce04 ~]# sshfs atlas@ :/home/atlas/ /home/atlas/ -o reconnect -o allow_other -o workaround=rename -o idmap=file -o uidfile=/opt/sshslurm/config/sshfs-cloud.uidmap -o gidfile=/opt/sshslurm/config/sshfs-cloud.gidmap -o nomap=ignore -o ServerAliveInterval=30 -o ServerAliveCountMax=2 -o IdentityFile=/opt/sshslurm/config/id_rsa.root -s -o nonempty
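For illustration (an invocation is not shown on the slide): with this directory configured as the Slurm binary path on the ARC front-end, an ordinary query such as

[root@ce04 ~]# /opt/sshslurm/squeue -h -o "%i %T"

is logged to /tmp/sshslurm.log and executed via ssh as /usr/bin/squeue on the cloud cluster's front-end, so the standard ARC SLURM back-end needs no further changes.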
21 Running in true pilot mode - APF: ATLAS Pilot Factory - aCT: ATLAS Control Tower - PanDA: Workload Manager. There are no I/O restrictions on SWITCH, so it makes sense to let the worker nodes (WN) do the I/O.