Clouds in High Energy Physics

Randall Sobie, University of Victoria (IPP/Victoria)

Overview
Clouds are an integral part of our HEP computing infrastructure, used primarily as Infrastructure-as-a-Service (IaaS). Our use is wide ranging and diverse:
- CERN Agile Infrastructure
- Tier-1 computing at centres such as BNL, FNAL and RAL
- Tier-2 computing around the world
- Expanding use of HEP clouds, private clouds and commercial clouds

Motivation
A wide range of reasons for using clouds:
- Easier management of existing infrastructure: separation of application and system administration, simpler allocation of resources, and leveraging software development
- Opportunistic computing: non-HEP computing centres and commercial cloud resources

Types of cloud resources
Cloud computing in HEP typically provides 5-20% of the processing of current projects.
- Dedicated clouds (owned by HEP), operated as virtual clusters
- Opportunistic clouds (private and commercial)

Cloud deployments
- Traditional bare-metal
- Static cloud (e.g. BaBar LTDA, HLT clouds)
- Standalone/private cloud (e.g. PNNL, NorduGrid)
- Distributed clouds (e.g. UK, Canada, Australia, INFN clouds)
- Bare-metal or in-house cloud with an external cloud (e.g. CERN, BNL)

OpenStack clouds at CERN
In production:
- 4 clouds
- >230K cores
- >8,000 hypervisors
More than 90% of CERN's compute resources are now virtualized and delivered on top of OpenStack. A further 42K cores are to be installed in the next few months, subject to funding.
(D. Giordano, WLCG Workshop, 9/10/2016)
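To make the IaaS layer concrete, the following is a minimal sketch, not CERN's actual tooling, of booting a single batch worker VM through the openstacksdk client; the cloud profile, image, flavor and network names are placeholder assumptions.

```python
# Minimal sketch (not CERN's tooling): boot one batch worker VM via openstacksdk.
# The "my-openstack" profile, image, flavor and network names are placeholders.
import openstack

conn = openstack.connect(cloud="my-openstack")        # credentials from clouds.yaml

image = conn.compute.find_image("cc7-batch-worker")   # hypothetical worker image
flavor = conn.compute.find_flavor("m1.large")
network = conn.network.find_network("internal")

server = conn.compute.create_server(
    name="batch-worker-001",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)   # "ACTIVE" once the hypervisor has started the VM
```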

Clouds at RAL
- 892 cores utilizing a Ceph storage backend; 3 alternating racks of CPU and storage nodes
- Tier-1 services now running on cloud VMs
- Engaging with various European cloud projects (e.g. DataCloud)
Batch work on the cloud:
- For ~1.5 years the RAL HTCondor batch system has made opportunistic use of unused cloud resources
- The HTCondor condor_rooster daemon is used to provision VMs
- Running jobs from all 4 LHC experiments and many non-LHC experiments
S3 and Swift storage:
- Storing Docker images for container orchestration via Swift
- OpenStack service under development; available to LHC VOs next year
(Architecture sketch: ARC/CREAM CEs submit to a condor_schedd; the central manager runs the condor_collector and condor_negotiator; condor_rooster reacts to offline-machine ClassAds to provision virtual worker nodes running condor_startd, which are drained alongside the bare-metal worker nodes when no longer needed.)
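As an illustration of the Swift image store mentioned above, here is a hedged sketch, not RAL's actual setup, of staging a saved Docker image tarball into Swift object storage with openstacksdk; the cloud profile, container and object names are assumptions.

```python
# Illustrative only: push a `docker save` tarball into Swift object storage,
# roughly how a site might stage images for container orchestration.
# The "ral-swift" profile and container/object names are assumptions.
import openstack

conn = openstack.connect(cloud="ral-swift")

conn.object_store.create_container(name="docker-images")
with open("worker-node.tar", "rb") as image_tar:
    conn.object_store.upload_object(
        container="docker-images",
        name="worker-node.tar",
        data=image_tar.read(),
    )
```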

FNAL/CMS on AWS (January/February 2016): Tier-1 cloud bursting onto EC2
BNL/ATLAS on AWS (September 2015)
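The cloud-bursting pattern can be sketched as follows; this is a hedged illustration rather than the actual glideinWMS/HEPCloud or BNL machinery, and the AMI ID, instance type, burst size and pool address are placeholder assumptions.

```python
# Hedged sketch of cloud bursting: request EC2 spot instances whose user-data
# joins them to an existing HTCondor pool. AMI, type, count and pool host are
# placeholders, not the experiments' production configuration.
import boto3

USER_DATA = """#!/bin/bash
# hypothetical contextualization: point condor_startd at the home pool
echo 'CONDOR_HOST = pool.example.org' >> /etc/condor/condor_config.local
systemctl restart condor
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                 # placeholder worker-node AMI
    InstanceType="m4.xlarge",
    MinCount=1,
    MaxCount=100,                                    # size of the burst
    UserData=USER_DATA,
    InstanceMarketOptions={"MarketType": "spot"},    # spot market to reduce cost
)
print([i["InstanceId"] for i in response["Instances"]])
```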

Special purpose clouds
BaBar Long Term Data Access (LTDA) system: preserves the data and analysis capability of BaBar, which stopped data taking in 2008.

High-level trigger farms of the LHC experiments (large, multi-10k core systems): virtual machines are booted during no-beam periods.
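A control loop for such an HLT cloud might look roughly like the sketch below; this is not the experiments' actual controller, and beam_is_on() is a hypothetical stand-in for whatever accelerator-state source a site would really query.

```python
# Sketch only: boot opportunistic VMs on the trigger farm while there is no
# beam and stop them before data taking resumes. beam_is_on() and the
# "hlt-cloud" profile are hypothetical.
import time
import openstack

def beam_is_on() -> bool:
    """Hypothetical placeholder for an LHC machine-state query."""
    raise NotImplementedError

conn = openstack.connect(cloud="hlt-cloud")

while True:
    workers = [s for s in conn.compute.servers() if s.name.startswith("hlt-worker")]
    if beam_is_on():
        for s in workers:
            if s.status == "ACTIVE":
                conn.compute.stop_server(s)    # give the cores back to the trigger
    else:
        for s in workers:
            if s.status == "SHUTOFF":
                conn.compute.start_server(s)   # no beam: run offline workloads
    time.sleep(300)
```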

Examples of Tier-2 cloud deployments

SWITCHengines: Swiss NREN commercial cloud

Australian Belle II grid site:
- A single CREAM CE services the ATLAS Tier-2 (Torque) and the Belle II site (Dynamic Torque)
- Australia-ATLAS Tier-2: TORQUE + Maui, ~14,000 HEPSpec (~1,400 cores)
- LCG.Melbourne.au (Belle II): TORQUE + Maui; the CREAM CE distributes jobs via SSH and Dynamic Torque controls VMs on the Research Cloud (currently 700 cores)

Why a private cloud?
- Chosen for flexibility and efficient use of compute resources for services
- Provides easy load-balancing and availability features
- Provides templating features: easy re-use of templates to test and instantiate new server instances, and non-systems staff can provision their own instances of services
- Software Defined Networking is more malleable than physical networking and encourages better networking practices, including security

UK / GridPP
- Clouds at HEP institutions (Oxford/Imperial); the ECDF cloud in Edinburgh has recently been made available to HEP
- UK Vacuum deployment
- Commercial cloud: DataCentred OpenStack cloud

Italy / INFN
- Private OpenStack cloud (Padova-Legnaro) called CLOUD AREA PADOVANA
- ~25 user groups/projects; CMS production

PNNL / Washington
- Private OpenStack cloud for the Belle II project (KEK) and other local users

Canada
- Distributed cloud system for ATLAS and Belle II
- 10-15 clouds, HTCondor/CloudScheduler, 4000-5000 cores
(Workflow sketch: the user submits a job to HTCondor; CloudScheduler starts VMs on the compute clouds using images from a VM image repository, and HTCondor starts jobs on the booted VMs.)
- ATLAS jobs on the cloud for the CA system: 10 clouds, 4300 cores
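The HTCondor/CloudScheduler pattern described above can be approximated by a loop like the following; this is a simplified sketch, not the real cloudscheduler code, and the image and flavor names, VM cap and slots-per-VM figure are assumptions.

```python
# Simplified sketch of the HTCondor/CloudScheduler idea (not the real code):
# count idle jobs in the HTCondor queue and boot worker VMs, up to a cap,
# to absorb them. Image/flavor names, MAX_VMS and SLOTS_PER_VM are assumptions.
import htcondor
import openstack

MAX_VMS = 100
SLOTS_PER_VM = 8

schedd = htcondor.Schedd()
idle_jobs = schedd.query("JobStatus == 1", ["ClusterId"])   # idle jobs only

conn = openstack.connect(cloud="compute-cloud")             # assumed profile
running = [s for s in conn.compute.servers() if s.name.startswith("atlas-worker")]

wanted = min(MAX_VMS, (len(idle_jobs) + SLOTS_PER_VM - 1) // SLOTS_PER_VM)
image = conn.compute.find_image("atlas-worker-image")       # placeholder image
flavor = conn.compute.find_flavor("c8-16gb")                # placeholder flavor
for i in range(len(running), wanted):
    conn.compute.create_server(
        name=f"atlas-worker-{i:03d}",
        image_id=image.id,
        flavor_id=flavor.id,
    )
```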

Job scheduling / VM provisioning
A variety of methods exist for running HEP workloads on clouds:
- VM-DIRAC (LHCb and Belle II)
- Vac/Vcycle (UK)
- HTCondor/CloudScheduler (Canada)
- HTCondor/GlideinWMS (FNAL), HTCondor/VM (PNNL), HTCondor/APF (BNL)
- Dynamic Torque (Australia)
- Cloud Area Padovana (INFN)
- ARC (NorduGrid)
Each method has its own merits and was often designed to integrate clouds into an existing infrastructure (e.g. local, WLCG and experiment).

Commercial clouds
- Amazon EC2 and Microsoft Azure: short-term multi-10k core tests and long-term production at the 1K scale
- GCE: evaluated, but no production use
- Other commercial OpenStack clouds: DataCentred (UK), SWITCHengines (Switzerland)
- CERN commercial cloud procurement

Network connectivity
- The Amazon and Microsoft clouds are connected to the research networks in North America (probably GCE as well)
- Trans-border or trans-ocean traffic can be an issue and has become an important discussion topic at the LHCONE meetings
- For private opportunistic clouds, traffic flows over the research network but not the LHCONE network

CPU benchmarks
- New suite of fast benchmarks from the HEPiX Benchmark Working Group
- The available suite includes the fast HS (LHCb) and Whetstone benchmarks
- Results are written to an ElasticSearch DB
- Benchmarks are run in the pilot job or during the boot of the VM

Data storage
- Data are written to local storage on the node and then transferred to a selected SE
- The UK group has done some work integrating their object store with ATLAS
- BNL is using S3 storage on EC2 as a Tier-2 SE
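To show the reporting side of the benchmark workflow, here is a sketch that times a trivial CPU loop as a stand-in for the HEPiX fast benchmarks and writes the score to ElasticSearch; the endpoint, index name and toy metric are illustrative assumptions, not the working group's actual suite or schema.

```python
# Sketch of benchmark reporting from a pilot or at VM boot: time a toy CPU loop
# (a stand-in for the HEPiX fast benchmarks) and index the score in
# ElasticSearch. Endpoint, index name and the metric itself are assumptions.
import math
import socket
import time
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

def toy_cpu_score(iterations: int = 5_000_000) -> float:
    """Return iterations per second of a simple floating-point loop."""
    start = time.perf_counter()
    x = 0.0
    for i in range(1, iterations):
        x += math.sqrt(i) * math.sin(i)
    return iterations / (time.perf_counter() - start)

es = Elasticsearch("http://es.example.org:9200")     # assumed endpoint
es.index(
    index="cloud-benchmarks",                        # assumed index name
    document={
        "host": socket.gethostname(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "toy_score": toy_cpu_score(),
    },
)
```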

Monitoring
- Cloud or site level: cloud/system monitors such as Sensu, Munin, RabbitMQ and MongoDB
- Application level: application monitors such as PanDA
- Benchmarks and accounting: ElasticSearch DB
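As a small example of the system-monitoring plumbing, the sketch below publishes a per-VM heartbeat to RabbitMQ with the pika client, the kind of feed a Sensu/ElasticSearch stack could consume; the broker host and queue name are assumptions.

```python
# Purely illustrative: a VM-side heartbeat sent to RabbitMQ. The broker host
# and queue name are assumptions, not a specific site's monitoring schema.
import json
import os
import socket

import pika

payload = json.dumps({
    "host": socket.gethostname(),
    "load1": os.getloadavg()[0],       # 1-minute load average
})

connection = pika.BlockingConnection(pika.ConnectionParameters("mq.example.org"))
channel = connection.channel()
channel.queue_declare(queue="cloud-heartbeats")
channel.basic_publish(exchange="", routing_key="cloud-heartbeats", body=payload)
connection.close()
```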

Summary: Clouds in HEP
- Growing, diverse use of clouds
- Typically integrated into an existing infrastructure
- Seen as a way to better manage multi-user resources
Opportunistic research clouds:
- An easy way to utilize clouds at non-HEP research computing facilities
- No requirement for on-site application specialists or complex software
Commercial clouds:
- EC2/Azure as well as other OpenStack clouds
- Trans-border network connectivity challenges