Teraflops of Jupyter: A Notebook Based Analysis Portal at BNL


1 Teraflops of Jupyter: A Notebook Based Analysis Portal at BNL. Ofer Rind, HEPiX Spring 2018, Madison, WI, May 17, 2018. In collaboration with: Doug Benjamin, Costin Caramarcu, Zhihua Dong, Will Strecker-Kellogg, Thomas Throwe

2 BNL SDCC serves an increasingly diverse, multi-disciplinary user community: RHIC Tier-0, US ATLAS Tier-1 and Tier-3, Belle II Tier-1, Neutrino, Astro, LQCD, CFN, and more. Large HTC infrastructure accessed via HTCondor (plus experiment-specific job management layers). Growing HPC infrastructure, currently with two production clusters accessed via Slurm. Limited interactive resources accessed via ssh gateways.

3 Interactive Data Analysis: a wish list for running effective, interactive data analysis in an era of large-scale computing with complex software stacks. Lower the barrier to entry for using data analysis resources at BNL. Minimize or eliminate software setup and installation. Need flexible, easy-to-follow examples/tutorials. Simple way to document and share results and code (reproducibility, adaptability). Straightforward way to make use of software methods and ecosystems being developed in non-HEP communities (e.g. machine learning), and make our resources more easily available to non-HEP communities.

4 Data Analysis as a Service: Jupyter notebooks (IPython) provide a flexible, standardized, platform-independent interface through a web browser. No local software to install. Many language extensions (kernels) and tools available. Easy to share, reproduce and document results, and to create tutorials. From the facility point of view: can we implement this by leveraging existing resources? We would prefer to avoid building new dedicated infrastructure, such as a specialized cluster (cf. CERN SWAN).

5 Some terminology. Jupyter notebook: a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. JupyterLab: the next-generation web-based user interface.

6 (screenshot)

7 Some terminology. JupyterHub: a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.

8 Current Test Setup at BNL: JupyterHub servers deployed on RHEV with an Anaconda3 install, in varying environments/networks depending on function (HTC/HTCondor or HPC/Slurm). Access via ssh tunnel through the firewall to the JupyterHub HTTPS proxy. Kerberos authentication to the JupyterHub server; the transparent setup leverages the PAM stack, and an OAuth implementation is in progress (a sketch of the auth piece follows below). Jira, Confluence, etc. for documentation. A temporary two-node Slurm reservation on the Institutional Cluster (Broadwell/NVIDIA) is used for testing. How do we connect users to the batch resources?
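A minimal sketch of the authentication piece described above (PAMAuthenticator is JupyterHub's default; the hostnames and port in the tunnel comment are placeholders, not the actual BNL endpoints):

# jupyterhub_config.py (sketch): authenticate against the local PAM stack,
# which is what makes the Kerberos login transparent to the hub.
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'

# Client side: until OAuth removes the need for tunneling, users reach the
# hub's HTTPS proxy through an ssh tunnel along the lines of (placeholders):
#   ssh -N -L 8443:jupyterhub.internal:8443 user@ssh-gateway.example.gov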

9 One approach: slurm_magic. Execute the usual CLI batch commands through the notebook interface; easily adapted for HTCondor as well. But not entirely satisfying, since one could just open a terminal to do this anyway (see the sketch below).
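For illustration, a notebook session using slurm_magic might look like the sketch below; this assumes the NERSC slurm-magic extension is installed and provides its usual %squeue line magic and %%sbatch cell magic, and the tiny job script and username are purely hypothetical.

# Cell 1: load the extension
%load_ext slurm_magic

# Cell 2: submit an inline job script via the cell magic
%%sbatch
#!/bin/bash
#SBATCH --partition=long --time=10:00
srun hostname

# Cell 3: check the queue ('myusername' is a placeholder)
%squeue -u myusername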

10 (screenshot)

11 More useful approach: the HTCondor Python API. Provide access to distributed computing through familiar APIs (Python's threading, multiprocessing, asyncio, etc.): "I'd like to submit and manage a job or cluster of jobs" (a sketch follows below).
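As a sketch of what the "submit and manage a job" case looks like through the HTCondor Python bindings (the executable, arguments and file names are placeholders, and keyword details vary slightly between binding versions):

import htcondor

# Describe a trivial job; every value here is a placeholder.
sub = htcondor.Submit({
    "executable": "/bin/echo",
    "arguments": "hello from the batch system",
    "output": "hello.$(ClusterId).out",
    "error": "hello.$(ClusterId).err",
    "log": "hello.log",
})

schedd = htcondor.Schedd()            # talk to the local schedd
with schedd.transaction() as txn:     # open a submit transaction
    cluster_id = sub.queue(txn)       # queue one job and record its cluster id

# Poll the schedd for the job's status (1=idle, 2=running, 4=completed).
for ad in schedd.query('ClusterId == %d' % cluster_id,
                       ['ClusterId', 'ProcId', 'JobStatus']):
    print(ad['ClusterId'], ad['ProcId'], ad['JobStatus'])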

12 More useful approach: the HTCondor Python API. At a higher level, abstract away the batch job layer: "I'd like to run over a dataset". Serialize the function, ship it off to jobs, serialize the output, gather the results (the generic pattern is sketched below). Early stage of development: see Will Strecker-Kellogg for details.
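Everything below, including the function name, is a hypothetical sketch of the map-style pattern just described (serialize the function, ship it to jobs, gather serialized results), not the actual tool under development; for clarity the "jobs" run locally in the same process.

import cloudpickle

def run_over_dataset(func, chunks):
    # Serialize the analysis function once, as it would be shipped to jobs.
    payload = cloudpickle.dumps(func)
    results = []
    for chunk in chunks:
        # On a real cluster each chunk would go to its own batch job; here we
        # just round-trip through the serialized form to mimic that step.
        remote_func = cloudpickle.loads(payload)
        results.append(remote_func(chunk))
    return results

# Toy usage: "run over a dataset" split into two chunks.
chunks = [list(range(0, 5)), list(range(5, 10))]
print(run_over_dataset(sum, chunks))   # -> [10, 35]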

13 Yet another approach: batchspawner. Spawn the IPython notebook itself within a single-node batch job. The notebook can be spawned locally or onto the batch system, with the connection established back to the hub and to the browser through the HTTP proxy. batchspawner.py has hooks for Slurm, HTCondor, Torque, etc. Also wrapspawner.py, which allows selection between multiple profile setups at startup (github.com/jupyterhub/wrapspawner). An Anaconda installation for batch users lives on a shared GPFS volume.

14 jupyterhub_config.py: two alternative configurations shown side by side on the slide, a Slurm spawner and an HTCondor + Profiles spawner.

#
# BatchSpawner (Spawner) configuration
# Using Slurm to spawn the user's notebook onto the IC
#
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

#
# BatchSpawnerBase configuration
# These simply set parameters used in the job script template below
#
c.BatchSpawnerBase.req_nprocs = '1'
c.BatchSpawnerBase.req_partition = 'long'
c.BatchSpawnerBase.req_runtime = '120:00'
c.BatchSpawnerBase.req_account = 'pq302951'

c.SlurmSpawner.batch_script = '''#!/bin/sh
#SBATCH --partition={partition}
#SBATCH --time={runtime}
#SBATCH --job-name=spawner-jupyterhub
#SBATCH --workdir={homedir}
#SBATCH --export={keepvars}
#SBATCH --get-user-env=l
#SBATCH --account={account}
#SBATCH --reservation=racf_32
#SBATCH {options}
{cmd}
'''

#
# ProfilesSpawner configuration
#
# List of profiles to offer for selection. Signature is:
#   List(Tuple(Unicode, Unicode, Type(Spawner), Dict))
# corresponding to profile display name, unique key, Spawner class,
# dictionary of spawner config options.
#
# The first three values will be exposed in the input_template as
# {display}, {key}, and {type}
#
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
c.ProfilesSpawner.profiles = [
    ("Local server", 'local', 'jupyterhub.spawner.LocalProcessSpawner', {'ip': ' '}),
    ('Condor Shared Queue', 'CondorDefault', 'batchspawner.CondorSpawner',
     dict(req_nprocs='20', req_memory='18g', req_options='+job_type = "jupyter"')),
]

c.CondorSpawner.batch_script = '''
Executable = /bin/sh
RequestMemory = {memory}
RequestCpus = {nprocs}
Arguments = \"-c 'export PATH=/u0b/software/anaconda3/bin:$PATH; exec {cmd}'\"
Remote_Initialdir = {homedir}
Output = {homedir}/.jupyterhub.$(ClusterId).condor.out
Error = {homedir}/.jupyterhub.$(ClusterId).condor.err
ShouldTransferFiles = False
GetEnv = True
PeriodicRemove = (JobStatus == 1 && NumJobStarts > 1)
{options}
Queue
'''

15 24 Architecture (diagram, built up progressively across these slides): the user's browser authenticates (Kerberos, OAuth, SSO) to the JupyterHub server, which fronts the notebooks with its HTTP proxy and starts them through a Spawner. With a LocalProcessSpawner the notebook runs next to the hub and reaches the compute cluster through slurm_magic or the HTCondor API, i.e. sbatch / condor_submit to the batch scheduler; with BatchSpawner the notebook itself is submitted to the batch scheduler (sbatch / condor_submit) and runs on the compute cluster. In both cases the notebooks and jobs see shared storage resources (GPFS, dCache, BNL Box).

25 32 The Interface (series of screenshots).

33 Example: ML Applications for ATLAS (TensorFlow/Keras)

34 Example: ML Applications for ATLAS (TensorFlow/Keras). TensorFlow startup log from a notebook session (abridged; timestamps and the per-device memory allocation figures were lost in transcription): the CPU supports SSE4.1, SSE4.2, AVX, AVX2, and FMA instructions that this TensorFlow binary was not compiled to use; devices 0 and 1 are found, each a Tesla P100-PCIE-16GB (compute capability 6.0, 15.89 GiB total / 15.60 GiB free memory, PCI bus IDs 0000:07:00.0 and 0000:81:00.0); TensorFlow devices /device:GPU:0 and /device:GPU:1 are created.
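For context, a minimal tf.keras snippet of the kind run in these notebooks might look like the sketch below; the toy data and two-layer model are placeholders and have nothing to do with the actual ATLAS analysis.

import numpy as np
from tensorflow import keras   # tf.keras ships with TensorFlow

# Toy binary-classification inputs standing in for real analysis features.
x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# On a GPU-enabled TensorFlow build this runs on the Tesla P100s reported
# in the device log above; otherwise it silently falls back to the CPU.
model.fit(x, y, epochs=2, batch_size=128, verbose=1)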

35 Open Issues. Authentication (implementing OAuth); eliminate the need for tunneling. Allocation of resources: interactive users won't be patient with batch-system latency, so how do we handle scheduling on an oversubscribed cluster? What are appropriate time/resource limits on the notebooks? How do we handle idle notebooks taking up job slots (one possible mitigation is sketched below)? External connectivity requirements (especially on HPC). Management of the software environment (is Anaconda the way to go?). Who are the users? HEP inreach and outreach (e.g. notebooks are already heavily used at NSLS-II).
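On the idle-notebook question, one commonly used mitigation is an idle-culling service registered with the hub; the sketch below assumes the cull_idle_servers.py example script distributed with JupyterHub is available next to the config, and the one-hour timeout is illustrative rather than a facility policy.

# jupyterhub_config.py (sketch): stop single-user servers idle for an hour.
c.JupyterHub.services = [
    {
        "name": "cull-idle",
        "admin": True,   # needs admin rights to stop other users' servers
        "command": ["python3", "cull_idle_servers.py", "--timeout=3600"],
    }
]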

36 Conclusions. We have begun to overlay a flexible, Jupyter-notebook-based analysis portal atop existing batch resources at BNL, providing the tools that users want for the new ways they work now. Technical and policy issues remain. We are looking for users from the diverse communities we serve, and for other interested admins/developers, to collaborate with.

37 Let's discuss. Must a question have an answer? Can't there be another way? Would you like to talk about it?
