Scalability / Data / Tasks


Meeting Scalability Requirements with Large Data and Complex Tasks: Adapting Existing Technologies and Best Practices in Slovenia
Jan Jona Javoršek, Jožef Stefan Institute, jona.javorsek@ijs.si
SLING, Slovenian Initiative for National Grid
http://www.ijs.si/ http://www.sling.si/

Historical systems: Zuse Z 23, CDC Cyber 74, CONVEX C3860

SLING connected centres (cores): Arctur* 1024, Arnes 4400, Atos* 3000, CIPKeBiP 990, SiGNET 4200, UNG 120, R4* 1800, NSC 1800.
8 sites, > 18,000 cores (> 11,000 ARC-active), > 1 PB disk, > 4 million jobs/year; HPC, GPGPU, chroot; > 80% of Slovenian capacity.
Candidate centres (cores): Meteo 2200, CI 2000, ME 1050.

SLING users: Arnes NREN users, cluster owners*, projects*, individual researchers, university professors, student groups (*not always ARC).

Use cases: particle physics (ATLAS, Pierre Auger), theoretical physics, meteorological/geophysical modelling, fluid dynamics, reactor physics simulations. [Image: Pierre Auger Observatory]

Use cases: life sciences, mostly computational (bio-)chemistry and genomics; IJS users (biology, chemistry, knowledge technologies); collaboration with EMBL; diagnostic genomics; ELIXIR.

Use cases: knowledge technologies, modelling for different fields, genetic algorithms, big/web data analysis, advanced computational linguistic models, CLARIN.si.

Steam explosion moment

Power distribution for the Krško NPP reactor: parallel Monte Carlo simulation of neutron transport, F-8 department.

Innovation? Batch system, virtualisation, network?

ARC and LRMS (batch system)

ARC Computing Element

ARC user accounts
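As a concrete illustration of the chain in the three slides above (ARC CE on top of the local batch system, X.509/VOMS-based user accounts), the following is a minimal sketch using the standard NorduGrid ARC client tools; the CE hostname, VO name and job description values are placeholders, not actual SLING settings.

    # Minimal sketch (placeholders, not SLING's real configuration): obtain a
    # VOMS proxy, then submit a small job to an ARC Computing Element, which
    # hands it to the local batch system (SLURM, Torque, ...).
    import subprocess
    import tempfile

    XRSL = """&(executable="run.sh")
     (jobName="sling-demo")
     (stdout="out.txt")(stderr="err.txt")
     (cpuTime="60 minutes")
     (memory=2000)"""

    def submit(ce="arc-ce.example.si", vo="example.vo"):
        # 1. User account side: create a VOMS proxy from the grid certificate.
        subprocess.run(["arcproxy", "-S", vo], check=True)
        # 2. Write the xRSL job description; run.sh is expected to sit in the
        #    current directory and is uploaded together with the job.
        with tempfile.NamedTemporaryFile("w", suffix=".xrsl", delete=False) as f:
            f.write(XRSL)
            xrsl_path = f.name
        # 3. Submit to the chosen CE; arcsub prints the job ID on success.
        subprocess.run(["arcsub", "-c", ce, xrsl_path], check=True)

    if __name__ == "__main__":
        submit()

Job status and output retrieval then go through arcstat and arcget with the returned job ID.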

Mix'n'match...: CVMFS, Salt, CERN Agile model, Keystone, OpenCL, Ceph, Globus, NorduGrid ARC, gLite, Cinder, PKI, VOMS, Torque, dCache, OpenMP, SLURM, CUDA, OpenStack, GridFTP, Glance, OpenNebula, oVirt, Puppet, science portals, VRC.

Software deployment and virtualization: admin install, Environment Modules, compile job, Run Time Environments, install job, CHROOTs, shared disk, shared image, containers (Docker, Shifter).
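To make the options on this slide concrete, here is an illustrative wrapper, with invented repository and image names, that prefers centrally distributed software on a shared CVMFS mount and falls back to a container when the mount is absent; SLING's actual setup may differ.

    # Illustrative only: repository path and image name are invented.
    import os
    import subprocess

    CVMFS_REPO = "/cvmfs/sw.example.org"      # hypothetical shared-image repository
    CONTAINER_IMAGE = "example/analysis:1.0"  # hypothetical Docker image

    def run_payload(cmd):
        if os.path.isdir(CVMFS_REPO):
            # Shared image: the software is already visible on every worker
            # node, so put it on PATH and run the payload directly.
            env = dict(os.environ)
            env["PATH"] = CVMFS_REPO + "/bin:" + env["PATH"]
            subprocess.run(cmd, env=env, check=True)
        else:
            # Container: ship the software stack with the job instead
            # (Docker shown here; Shifter plays a similar role on HPC systems).
            subprocess.run(["docker", "run", "--rm",
                            "-v", os.getcwd() + ":/work", "-w", "/work",
                            CONTAINER_IMAGE] + cmd, check=True)

    if __name__ == "__main__":
        run_payload(["./run_analysis.sh"])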

Storage: basic support; short-term / local storage, medium-term storage, long-term storage.
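Medium-term grid storage is typically reached from jobs through data staging in the job description; the fragment below is only a sketch with placeholder storage-element URLs, not real SLING endpoints.

    # Sketch of ARC data staging attributes in an xRSL job description
    # (placeholder URLs): inputFiles are fetched into the job's session
    # directory before it starts, outputFiles are uploaded to a storage
    # element when it finishes; an empty destination keeps the file on the
    # CE for later retrieval with arcget.
    STAGING_XRSL = """
     (inputFiles=("data.root" "gsiftp://se.example.si/demo/data.root"))
     (outputFiles=("result.root" "gsiftp://se.example.si/demo/result.root")
                  ("job.log" ""))
    """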

User-facing issues: batch / ARC interface / PKI / VOMS; software installations and use; submission delays, error reporting and debugging; MPI scalability difficulties; understanding of job and cluster topology; GPGPU use.

Groups and projects: job and task management scalability; data management, task managers; storage and throughput, hardware and cluster setup; opportunistic resource use; resource optimization, innovative job models.

ATLAS as an example: ~100 distributed sites, ~250k cores in constant use, 200 PB of storage space, 1M jobs/day, 2 PB of data transferred per day between computing sites. Sites include WLCG grid sites, HPCs, clouds and volunteer computing.
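For a sense of scale, the quoted numbers translate into the following rough averages (simple arithmetic on the figures above):

    # Back-of-the-envelope figures derived from the numbers on this slide.
    cores       = 250_000       # cores in use at any time
    jobs_day    = 1_000_000     # jobs per day
    data_day_pb = 2             # PB moved between sites per day

    jobs_per_core_day = jobs_day / cores            # ~4 jobs per core per day
    avg_job_hours     = 24 / jobs_per_core_day      # ~6 hours per job slot
    avg_rate_gbit_s   = data_day_pb * 8e6 / 86_400  # ~185 Gbit/s sustained

    print(f"{jobs_per_core_day:.1f} jobs/core/day, "
          f"~{avg_job_hours:.0f} h per job, "
          f"~{avg_rate_gbit_s:.0f} Gbit/s average transfer rate")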

aCT: ARC Control Tower. Components: aCT submitter, status checker, fetcher (app verification), cleaner. [Diagram: an external job provider feeds the app config and app engine; the ARC config and ARC engine submit to ARC CEs at Sites 1-3, each fronting a cluster; the app table and ARC table live in a DB (Oracle/MySQL).]
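The component list maps onto a simple control loop; the sketch below is only a schematic illustration of that architecture, with invented class, method and table names, and is not actual aCT code.

    # Schematic illustration of the control-tower loop: jobs arrive from an
    # external provider into an application table, the submitter pushes them
    # to ARC CEs, the status checker polls them, the fetcher retrieves
    # finished output and the cleaner removes stale records.
    import time

    class ControlTower:
        def __init__(self, db, arc_client, sites):
            self.db = db           # app table + ARC table (e.g. MySQL/Oracle)
            self.arc = arc_client  # wrapper around the ARC client tools
            self.sites = sites     # list of ARC CE endpoints

        def submit_new(self):
            for job in self.db.fetch(state="new"):
                ce = min(self.sites, key=self.arc.queue_length)  # least loaded CE
                job.arc_id = self.arc.submit(ce, job.description)
                self.db.update(job, state="submitted")

        def check_status(self):
            for job in self.db.fetch(state="submitted"):
                self.db.update(job, state=self.arc.status(job.arc_id))

        def fetch_finished(self):
            for job in self.db.fetch(state="finished"):
                self.arc.fetch(job.arc_id, dest=job.output_dir)
                self.db.update(job, state="done")

        def clean(self):
            for job in self.db.fetch(state="done", older_than="7d"):
                self.arc.clean(job.arc_id)
                self.db.remove(job)

        def run(self, interval=60):
            while True:
                self.submit_new()
                self.check_status()
                self.fetch_finished()
                self.clean()
                time.sleep(interval)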

Opportunistic resource use: grid clusters, HPC clusters, private computers, public (commercial) clouds, microjobs.

ATLAS scaling, 2010: planned data distribution; jobs go to data; multi-hop data flows; poor T2 networking across regions; ~20 AOD copies distributed worldwide.

ATLAS scaling, 2010 vs 2013:
2010: planned data distribution; jobs go to data; multi-hop data flows; poor T2 networking across regions; ~20 AOD copies distributed worldwide.
2013: planned & dynamic data distribution; jobs go to data & data to free sites; direct data flows for most T2s; many T2s connected at 10 Gb/s; 4 AOD copies distributed worldwide.

Social component: accessibility beyond large projects, long-term funding, perception of public clouds, not-invented-here syndrome, users with no Unix experience, sustainability pressure.

People involved: Andrej Filipčič (JSI), Barbara Krašovec (Arnes, JSI), Dejan Lesjak (JSI), Janez Srakar (JSI), Jan Jona Javoršek (JSI), plus 4 site administrators. National initiative: http://www.sling.si/

Thanks! Questions?

New computing centre: 200 m², slightly off the main site; new network installation; water cooling; not enough power on-site yet; housing Pikolit, NSC and parts of other clusters; interesting issues on cost sharing...

New cluster: grid + HPC; GPGPU: 16 x K80; NorduGrid ARC + SLURM; considering EGI. Users: IJS departments, related research, supported EU infrastructures. NSC cluster in numbers: ~1800 cores, ~35 TB scratch, ~35 TB storage, ~8 TB RAM.
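As a closing illustration, the new GPGPU nodes might be used through SLURM along the following lines; the resource values are placeholders and do not reflect the actual NSC partition or scheduler configuration.

    # Illustrative sketch of a GPU batch job on a SLURM-managed cluster
    # (values are placeholders, not the real NSC setup).
    import subprocess
    import textwrap

    SBATCH_SCRIPT = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=gpu-demo
        #SBATCH --ntasks=1
        #SBATCH --cpus-per-task=4
        #SBATCH --mem=16G
        #SBATCH --gres=gpu:1          # request one GPU (K80s on this cluster)
        #SBATCH --time=02:00:00
        srun ./gpu_benchmark
        """)

    def submit():
        # Write the batch script and hand it to the scheduler.
        with open("gpu_job.sh", "w") as f:
            f.write(SBATCH_SCRIPT)
        subprocess.run(["sbatch", "gpu_job.sh"], check=True)

    if __name__ == "__main__":
        submit()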