PROOF-Condor integration for ATLAS

PROOF-Condor integration for ATLAS
G. Ganis, J. Iwaszkiewicz, F. Rademakers (CERN / PH-SFT)
M. Livny, B. Mellado, Neng Xu, Sau Lan Wu (University of Wisconsin)
Condor Week, Madison, 29 Apr - 2 May 2008

Outline
- HEP end-user analysis with PROOF
- Why PROOF on Condor
- Proof-of-concept using COD
- The ATLAS model
- Summary

The Large Hadron Collider (LHC)
- p-p collisions at 14 TeV
- Start: end 2008 / beginning 2009
- 4 experiments: ATLAS, CMS, ALICE, LHCb

The LHC Data Challenge
- The LHC generates 40 x 10^6 collisions / s; trigger rate ~100 Hz; ~10 PB/year for all experiments
- E.g. ATLAS: 3.2 PB/year of raw data, ~1 PB/year of ESD, plus about the same from simulations
- Analysis at Tier-2 / Tier-3 centers: O(100) cores
- End-user analysis is a continuous refinement cycle: implement the algorithm, run over the data set, make improvements
- Reading O(1) PB at 50 MB/s takes ~230 days! Using parallelism is the only way out (see the worked estimate below)
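
A quick back-of-the-envelope check of the 230-day figure, assuming 1 PB = 10^15 bytes and a sustained read rate of 50 MB/s:

```latex
\frac{10^{15}\,\mathrm{B}}{50 \times 10^{6}\,\mathrm{B/s}}
  = 2 \times 10^{7}\,\mathrm{s}
  \approx \frac{2 \times 10^{7}}{86\,400}\,\mathrm{days}
  \approx 230\ \mathrm{days}
```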

ROOT: the analysis package
- C++ framework providing tools for:
  - Storage: optimized for HEP data
  - Visualization: 2D, 3D, event display, ...
  - Statistics, math functions, fitting, ...
  - Abstract interfaces: VirtualMC, ...
- Puts together what was PAW, ZEBRA, CERNLIB and more
- How does ROOT address the problem of data processing?
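
As a flavor of the toolkit, a minimal standalone ROOT macro (not from the slides; file and histogram names are illustrative) that fills, fits and stores a histogram could look like this:

```cpp
// minimal_root.C -- run with: root -l -q minimal_root.C
// Illustrative sketch only: fills a histogram with Gaussian random
// numbers, fits it, and writes it to a ROOT file.
#include "TH1F.h"
#include "TFile.h"

void minimal_root()
{
   TH1F h("h", "Example distribution;x;entries", 100, -5., 5.);
   h.FillRandom("gaus", 10000);   // fill from the built-in "gaus" function
   h.Fit("gaus", "Q");            // quiet Gaussian fit
   TFile f("example.root", "RECREATE");
   h.Write();                     // ROOT's optimized HEP storage
   f.Close();
}
```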

The ROOT data model: Trees & Selectors
- [Diagram: a tree of branches and leaves, chained over files 1..n, with the event loop reading only the needed branches]
- A selector steers the event loop:
  - Begin(): create histograms, define the output list
  - Process(): preselection and analysis, called for each event
  - Terminate(): final analysis (fitting, ...)
- The selector's output is a list of objects (histograms, ...)
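
A bare-bones selector skeleton for this model (class and histogram names are illustrative; in practice the skeleton is generated with TTree::MakeSelector()):

```cpp
// MySelector: minimal TSelector sketch.
#include "TSelector.h"
#include "TH1F.h"

class MySelector : public TSelector {
public:
   TH1F *fHist = nullptr;

   void Begin(TTree *) override {
      // runs on the client before processing starts
   }
   void SlaveBegin(TTree *) override {
      // runs on each worker: create histograms and register them
      fHist = new TH1F("h", "example", 100, 0., 100.);
      fOutput->Add(fHist);
   }
   Bool_t Process(Long64_t entry) override {
      // called once per event: read only the branches you need,
      // apply the preselection and fill the histograms
      return kTRUE;
   }
   void Terminate() override {
      // runs on the client after merging: final analysis (fitting, ...)
   }
};
```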

Batch-oriented approach
- [Diagram: file catalog, job splitter, batch scheduler and queue, CPUs, storage, merger]
- The analysis task is split into N stand-alone batch sub-jobs
- Job submission is sequential
- The sub-job outputs are collected and merged into a single output
- The analysis is finished only when the last sub-job has finished

Batch-oriented approach (2)
- Works well for frozen algorithms: reconstruction, creation of AOD, nano-ESD, ...
- Not very practical for algorithm refinement
- The work-around:
  - Reduce the data to a (temporary) compact format
  - Refine the algorithm on the compact format
  - Possibly adjust the compact format

PROOF
- Transparent, scalable extension of the ROOT shell
- 3-tier client-master-workers architecture
- Flexible master tier:
  - Adapts to heterogeneous configurations
  - Dilutes the load of the reduction (merging) phase
- The PROOF daemon is a plug-in (protocol) to SCALLA (xrootd):
  - Data and PROOF access with the same daemon
  - Local storage pool
  - Coordinator functionality on the master: global view of the PROOF activities

PROOF architecture
- [Diagram: a PROOF-enabled facility spanning several geographical domains, each with a sub-master, workers and mass storage (MSS); the client sends commands and scripts to the top master and receives the list of output objects (histograms, ...)]
- Network performance: less important between client and top master, VERY important within a domain
- Optimize for data locality or for high-bandwidth access to the data servers

SCALLA (xrootd): Scalable Cluster Architecture for Low Latency Access
- Two building blocks:
  - xrootd: server for low-latency, high-bandwidth data access
  - cmsd (olbd): server to build scalable xrootd clusters
- Multi-threaded, extensible architecture:
  - The top layer controls threading, memory and protocols
  - A protocol is a generic interface to services; any number of protocols can run at the same time
- Special focus on fault tolerance
- Developed by SLAC/INFN for BaBar; growing interest from the LHC collaborations
- http://xrootd.slac.stanford.edu
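
From the ROOT side, reading data served by xrootd only requires a root:// URL; a small sketch (host, path and tree name are placeholders):

```cpp
// Illustrative sketch: open a remote file over the xrootd protocol
// and attach to a tree inside it.
#include "TFile.h"
#include "TTree.h"

void read_remote()
{
   TFile *f = TFile::Open("root://xrootd.example.org//data/sample.root");
   if (!f || f->IsZombie()) return;   // basic error handling
   TTree *t = nullptr;
   f->GetObject("events", t);         // look up the tree by name
   if (t) t->Print();                 // show its branch structure
   f->Close();
}
```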

The PROOF approach
- [Diagram: file catalog, PROOF cluster with master, scheduler, CPUs and storage]
- A PROOF query consists of a data file list and a selector (myselector.C)
- Feedback and the merged final output are returned to the client
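
In practice a query is a plain chain->Process() call, with the chain redirected to a PROOF cluster first (master URL, file names and selector name below are placeholders):

```cpp
// Illustrative sketch: run a selector over a chain of files on a PROOF cluster.
#include "TChain.h"
#include "TProof.h"

void run_query()
{
   TProof::Open("proof-master.example.org");   // connect client -> master

   TChain chain("events");                     // tree name inside the files
   chain.Add("root://xrootd.example.org//data/file1.root");
   chain.Add("root://xrootd.example.org//data/file2.root");

   chain.SetProof();                           // route processing to PROOF
   chain.Process("MySelector.C+");             // compile and run the selector
}
```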

PROOF features
- Dynamic use of resources: pull architecture, workers ask for work when idle
- Real-time feedback: a set of objects can be sent to the client at a tunable frequency
- Package manager: upload and enable the code needed by the job
- The cluster is perceived as an extension of the local shell; many clusters can be controlled from the same shell
- Automatic splitting and merging
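
The package manager, for instance, is driven from the same ROOT session; a sketch (the package name is a placeholder; a PAR file is a tarball of the analysis code with a small build/setup script):

```cpp
// Illustrative sketch: ship and enable analysis code on all workers,
// then set a parameter visible to the selector.
#include "TProof.h"

void setup_session()
{
   TProof *p = TProof::Open("proof-master.example.org");
   if (!p) return;

   p->UploadPackage("MyAnalysis.par");   // send the PAR archive to the cluster
   p->EnablePackage("MyAnalysis");       // build/enable it on every worker
   p->SetParameter("PT_CUT", 20.0);      // tunable parameter for the selector
   p->ShowPackages();                    // list what is available on the cluster
}
```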

Interest in PROOF
- Started as a joint MIT (PHOBOS) / CERN (ALICE) project; currently developed by CERN (PH-SFT) with contributions from GSI Darmstadt
- Used by PHOBOS for end-user analysis since 2003
- At the LHC:
  - ALICE: official requirement for analysis facilities
  - ATLAS, CMS: testing analysis models based on SCALLA for data serving and PROOF
  - LHCb: started work to adapt PyROOT (Python ROOT) to PROOF

Why PROOF and Condor?
- End-user interactive analysis is chaotic: typically intensive for limited periods of time
- The average load on a dedicated PROOF pool may therefore be low
- The ALICE express line at their CERN Analysis Facility accepts this: ~50 users for ~500 cores, giving fast response time for prompt quality-control analysis
- But in general a dedicated pool is not affordable
- Can we increase the average load while keeping the advantages of PROOF?

PROOF and Condor
- Use Condor to share the available resources between batch-like activities and PROOF
- Get the CPUs by suspending / preempting running jobs
  - This needs to free all resources (not only CPU); simple renicing could be sufficient for CPU-intensive jobs
- Use the CPUs interactively, then resume the batch jobs after the session is finished

The first PROOF + COD model
- Developed for PHOBOS analysis
- Based on Condor Computing-on-Demand (COD)
- COD requests are submitted either directly by users or by their master session
- The 'proofd' daemon is started during 'activate'
- The user starts a PROOF session on the allocated machines

The first PROOF + COD model (2)
- [Diagram: a Condor pool running normal production Condor jobs; (1) COD requests go to the Condor master, (2) PROOF queries go to the PROOF master; data is read from centralized storage servers (NFS, (x)rootd, dCache, CASTOR), creating a heavy I/O load]

The first PROOF + COD model (3)
- Pros:
  - It worked: used by PHOBOS to manage their resources at RCF / BNL
  - Successful proof of concept
- Cons:
  - Release of COD claims is the users' responsibility ('-lease' not used)
  - Startup scaling issues with a large number of nodes; needs a better 'activate' strategy
  - Potential problems with many users: COD does not affect the Condor priority system
  - Reading data from a central storage system may cause heavy traffic and may be very inefficient

Towards an improved model: ideas
- Use standard suspension / resume instead of COD, bringing the Condor priority system into the game
- Exploit local storage to optimize data access:
  - Temporarily upload data files onto the pool
  - ALICE experience shows this is more efficient when the same data are used by many groups
- Exploit the global view of the system provided by the SCALLA-based PROOF connection layer:
  - Choose the machines on which to start workers taking into account the exact location of the data files

Some remarks
- If pre-installation is not available, all we need is ROOT, either from a shared file system or from a tarball (~100 MB, from the Web or shipped); an installation script is available
- If local storage is not available or too small, use remote access to the files; asynchronous read-ahead was recently introduced in XROOTD

The ATLAS Wisconsin model
- Designed for a large-scale analysis facility with a PROOF pool and > 100 users
- Limited and structured storage (RAM, SSD pool, HD pool), with D-Cache, CASTOR, xrootdfs or DDM (Grid) as backup source
- Issues:
  - Efficient scheduling of a large number of users
  - Data staging in and out of the pool

The ATLAS Wisconsin model (2)
- Synchronize job running with dataset availability:
  - Dedicated file-staging daemons running under Condor
  - Database with dataset availability information (a dataset is a named collection of files)
- Control the total number of running PROOF sessions using slots:
  - A pool of high-priority slots for PROOF
  - A pool of slots for background jobs
- The PROOF scheduler can enforce external requirements, e.g. experiment-specific policies

The ATLAS Wisconsin model (3)
- [Diagram: PROOF pool also running normal production Condor jobs; a PROOF coordinator holding history, load and policy information; a file stager in front of D-Cache, CASTOR, xrootdfs or DDM (Grid)]
- Flow: (1) the user starts a PROOF query; (2) the query is put on hold; (3) a Condor job is submitted for it; (4) the file stager makes sure the data are in xrootd; (5) the Condor job starts; (6) the query is resumed

The ATLAS Wisconsin model (4)
- Users issue PROOF processing queries in the normal way
- The PROOF master:
  - Puts the query on hold
  - Creates a Condor job for each query, with dataset readiness as a requirement ClassAd
  - Submits the job to the Condor scheduler
- The Condor scheduler puts the job in the Held state
- The file stager daemon checks regularly for new dataset requests:
  - Collects the dataset requirements and arranges the movement of files
  - Releases a processing job when its required datasets are ready
- The Condor scheduler runs the job on PROOF, resuming the query put on hold at the first step (a sketch of the stager-side release step follows)
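
A minimal sketch of what the stager-side release step could look like, assuming a hypothetical custom ClassAd attribute RequiredDataset on the held jobs and a stubbed dataset_is_staged() lookup (neither is specified in the slides; the standard condor_release tool does the actual release):

```cpp
// Illustrative sketch only -- not the actual Wisconsin stager code.
#include <cstdlib>
#include <string>
#include <thread>
#include <chrono>

// Stub: in the real system this would consult the dataset database and the
// xrootd pool to see whether all files of the dataset are staged in.
static bool dataset_is_staged(const std::string & /*dataset*/) { return false; }

// Release every held Condor job whose requirement names this dataset,
// using condor_release with a constraint on the (hypothetical) attribute.
static void release_ready_jobs(const std::string &dataset)
{
   if (!dataset_is_staged(dataset)) return;   // keep the jobs held
   std::string cmd = "condor_release -constraint "
                     "'RequiredDataset == \"" + dataset + "\"'";
   std::system(cmd.c_str());
}

int main()
{
   // Poll loop of the stager daemon; the pending list would come from the
   // dataset database shown on the next slide.
   const std::string pending[] = { "mc08.017506.pythiab_bbmu6mu4x.evgen.e306" };
   for (;;) {
      for (const auto &ds : pending)
         release_ready_jobs(ds);
      std::this_thread::sleep_for(std::chrono::minutes(1));
   }
}
```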

The ATLAS Wisconsin model (5): Condor scheduler for PROOF
- [Diagram: Condor master, collector and scheduler (scheduling service); Condor starter running the PROOF jobs]
- Job slots for PROOF sessions (slot1@pcuw104 ... slot4@pcuw104): these slots can be used to limit the total number of running PROOF sessions
- Job slots for file stage-in (slot5@pcuw104 ... slot10@pcuw104): these can run in the background

File stager
- Database of datasets (dataset name, number of requests, number of files, date of last request, status, comment), e.g.:
  - mc08.017506.pythiab_bbmu6mu4x.evgen.e306 | 2 requests | 50 files | 2008/2/20 | waiting
  - mc08.017506.pythiab_bbmu6mu4x.evgen.e306 | 1 request | 50 files | 2008/2/20 | waiting
  - mc08.017506.pythiab_bbmu6mu4x.evgen.e306 | 1 request | 500 files | 2008/2/20 | waiting
- PROOF batch jobs list (name, input dataset, number of files), e.g.:
  - Job1 | mc08.017506.pythiab_bbmu6mu4x.evgen.e306 | 500
  - Job2 | mc08.017506.pythiab_bbmu6mu6x.evgen.e306 | 400
  - Job3 | mc08.0175068.pythiab_bbmu6mu4x.evgen.e306 | 30
  - Job4 | mc08.017506.pythiab_bbmu6mu4x.evgen | 50
  - Job5 | mc08.017888.pythiab_bbmu6mu4x.evgen.e306 | 100
  - Job6 | mc08.017506.pythiab_bbmu6mu4x.evgen.e306 | 120
- Condor jobs are set to Held by default; the stage server releases them once their dataset is staged into the PROOF / xrootd pool
- Dataset stage-in also has a priority, which depends on the number of requests, the number of files, the waiting time, etc.
- Stage daemons read the requirements, check the file status, stage in the files from D-Cache, CASTOR, xrootdfs or DDM (Grid), and release the jobs

Open issues
- Condor:
  - Full understanding of 'slot' handling
  - How to enforce the dependency on dataset readiness: job hold / release, or direct synchronization using a ClassAd?
  - Startup performance issues
  - Suspension / preemption
- Dataset management:
  - Error handling during file movement
  - Dataset lifetime

Open issues (2)
- PROOF:
  - The main missing ingredient was support for on-hold query submission and preempt / resume; a prototype is under test
  - Query preemption / restart via signal: if Condor needs to preempt only part of the workers, doing it via PROOF may allow processing to continue on the reduced set of workers
  - Start the PROOF servers via Condor to fully control the session

Summary
- PROOF-Condor integration makes it possible to share a cluster optimally between batch and interactive usage
- A basic model based on COD has been available since 2003
- The recent PROOF-xrootd integration allows the design of an alternative model, addressing the case of concurrent multi-user analysis of large amounts of HEP data
- A working prototype is under development