Distributed production managers meeting. Armando Fella, on behalf of the Italian distributed computing group. SuperB workshop, Frascati, 1-4 December 2009.

Distributed Computing human network
Sites and contacts:
- CNAF
- Caltech: Frank Porter, Piti Ongmongkolkul
- SLAC: Steffen Luiz, Wei Yang
- McGill: Steven Robertson
- Queen Mary: Adrian Bevan
- RAL: Fergus Wilson
- LAL and Lyon: Nicolas Arnaud
Italian group:
- Bari: Giacinto Donvito, Vincenzo Spinoso
- Legnaro - Padova: Gaetano Maron, Alberto Crescente
- Napoli: Silvio Pardi
- Ferrara: Giovanni Fontana, Marco Ronzano
- Pisa: Alberto Ciampa, Enrico Mazzoni, Dario Fabiani
Email list: superb-grid-mng@lists.infn.it

Site setup procedure
Wiki reference page: http://mailman.fe.infn.it/superbwiki/index.php/how_to_grid/site_setup
Three main setup steps:
A) The Grid site enables the VO on each Grid element.
B) Data handling solution validation: LCG-Utils (a minimal example is sketched below).
C) Software installation procedure.
For each step the wiki provides a simple test section (http://mailman.fe.infn.it/superbwiki/index.php/how_to_grid/site_setup/tests) and a job based test (http://mailman.fe.infn.it/superbwiki/index.php/how_to_grid/site_setup/job).
Please send an email to the superb-grid-mng@lists.infn.it list reporting the site setup status (A, B or C), so that progress can be tracked and the next step, distributed model validation, can be planned.
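
A minimal sketch of what the step B check could look like from a User Interface holding a valid VOMS proxy, driving the standard lcg-utils commands from Python. The VO name (superbvo.org), the Storage Element hostname and the LFN path are placeholders, not values taken from the wiki:

    #!/usr/bin/env python
    # Sketch of the LCG-Utils round trip used for the step B check.
    # VO name, SE host and LFN path are placeholders.
    import os
    import subprocess
    import sys

    VO = "superbvo.org"                   # assumed VO name
    SE = "storm-fe.cr.cnaf.infn.it"       # assumed Storage Element hostname
    LFN = "lfn:/grid/%s/tests/setup_check.txt" % VO

    def run(cmd):
        print("+ " + " ".join(cmd))
        if subprocess.call(cmd) != 0:
            sys.exit("command failed: " + cmd[0])

    # create a small local file to copy and register
    local = "/tmp/setup_check.txt"
    open(local, "w").write("site setup test\n")

    run(["lcg-cr", "--vo", VO, "-d", SE, "-l", LFN, "file://" + local])   # copy to the SE and register in the catalogue
    run(["lcg-cp", "--vo", VO, LFN, "file:///tmp/setup_check_back.txt"])  # copy it back
    run(["lcg-del", "--vo", VO, "-a", LFN])                               # remove all replicas and the catalogue entry
    os.remove(local)
    print("LCG-Utils round trip OK")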

Production design proposals
The next MC productions will exploit distributed computing resources; two distributed system designs are proposed:
- February production: installation at the remote sites of the tools used at CNAF during the Nov '09 test production: the web production interface, communication with a local bookkeeping DB, data handling via direct access to the file system.
- April '10 production: a fully Grid compliant system: job management entirely based on GANGA, Grid Security Infrastructure (GSI) authentication, communication with the central bookkeeping DB, data handling via lcg-utils.
Site resource exploitation during official production should be regulated:
- the production managers will have priority on system utilization
- a submission policy should be agreed with the user community

Full Grid integrated production design
AMGA is a possible future metadata manager solution.

Full Grid integrated production workflow
- The job input files are transferred via LCG-Utils to the Storage Elements of the involved sites.
- Job submission is performed with GANGA from a User Interface at CNAF (see the sketch below).
- The WMS routes the jobs to the matched sites.
- The job is scheduled by the site Computing Element onto a Worker Node.
- At run time the job accesses the DB for initialization and status updates, retrieves its input files from the local Storage Element, and transfers its output to the CNAF Storage Element.
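
As an illustration of the submission step, a minimal GANGA sketch meant to be typed into the GANGA interactive shell (GPI) of the time on the CNAF User Interface; the wrapper script name and its argument are hypothetical, not the actual production configuration:

    # Sketch of a GANGA submission through the gLite WMS (LCG backend).
    # The wrapper script and its argument are hypothetical placeholders.
    j = Job()
    j.name = 'superb-fastsim-test'
    j.application = Executable(exe=File('run_fastsim.sh'), args=['run001'])  # shipped in the input sandbox
    j.backend = LCG()          # submission goes through the WMS, which matches a site
    j.submit()

    # later on, poll the WMS-reported state of the job
    print(j.status)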

Full Grid integrated production timeline

February production design description
Production system components to be installed at the sites:
- Bookkeeping MySQL DB, with phpMyAdmin as management web interface (optional)
- Apache web server
- PHP 5 scripting language + Apache PHP module
- Web production and monitor interface software (submission to Grid and to batch)
Requirements:
- One server host, accessible from the worker nodes, on which the above components are installed (prod.server@your.site); in the Grid case it should be a User Interface (a job-side example is sketched below).
- Disk space to hold the input/output production files. The disk space should be readable/writable from the WNs, and the output files should be copied back to the CNAF central storage at the end of the production.
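
Since the bookkeeping MySQL DB must be reachable from the worker nodes, a job-side status update could look like the following sketch; the host name, credentials, table and column names are invented for illustration:

    # Sketch of a job-side status update against the site-local bookkeeping MySQL DB.
    # Host, credentials, table and column names are hypothetical.
    import MySQLdb

    conn = MySQLdb.connect(host="prod.server.your.site",    # placeholder production host
                           user="sbkuser", passwd="secret",
                           db="superb_bookkeeping")
    cur = conn.cursor()

    def update_job_status(job_id, status):
        # record the current state of a production job (e.g. 'running', 'done')
        cur.execute("UPDATE jobs SET status = %s, last_update = NOW() WHERE job_id = %s",
                    (status, job_id))
        conn.commit()

    update_job_status(42, "running")
    # ... the simulation runs here ...
    update_job_status(42, "done")
    conn.close()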

February production procedure proposal
1) Background files production at CNAF: the background files to be used as input for the Fast Simulation are produced at CNAF.
2) Background files transfer to the sites: the background files are transferred to a file system accessible from the WNs.
3) Production at the sites: the simulated data are produced through the web user interface and stored on an accessible disk space.
4) Output files transfer to CNAF central storage: the job output files are transferred back to CNAF.

Backup solution
1) The site contacts transfer the job input files from CNAF to the site storage.
2) CNAF creates the submission scripts and updates the DB; the production system is customized with the site-specific setup (storage paths, submission launch, etc.).
3) The site contacts launch the scripts (i.e. submit the jobs).
4) The site contacts transfer the output files back to CNAF.

Discussion layout
- Site resources available/required: CPU, disk, production host
- Software related issues: installation procedure, Bruno compatibility
- Requirement step-by-step revision: walking through the wiki tree looking for open issues

Site resource availability/required
Computational resources available, per access system:
- via Grid computing: CPU Grid LCG, CPU Grid OSG, CPU Grid WestGrid
- via direct batch system: CPU Condor, CPU LSF, CPU PBS, CPU batch x
- Total requirement: 2000/2500 cores used over two weeks, 500/700 of them from CNAF.
Disk space availability, per access system:
- NFS-like, direct Linux FS access: NFS, GPFS, GFS, AFS, Lustre
- Data handling system managed: dCache, xrootd, SAM
Notes:
- NFS-like, direct Linux FS access is the recommended solution.
- For the February production an SRM interface is not needed.
- Background files (job input) occupancy: ~410 GB --> more than 500 GB required.
- Output files occupancy: the more, the better.

Software related issues
- Bruno software compatibility, the CNAF example ---> frontend machine configuration: bbr-serv08
- Wiki tour

Requirement step-by-step revision
Wiki tour

Conclusion
- The February distributed solution is a hybrid solution looking toward the fully distributed (Grid compliant) solution.
- Coordination/cooperation with the involved site contacts is a key element.
- The time needed to complete the February production depends strictly on the number of sites ready at the production start time.

BACKUP Slides

Production tool status
Grid submission:
- GANGA based job submission test
- job data handling tasks successfully tested: LFC name space creation, output transfer via lcg-utils (see the sketch below)
- job submission on a remote site with data handling successfully tested
- job output merger solution successfully tested
Production tools development (FastSim):
- web production manager tool: used in the successful Nov '09 production
- web monitor tool: the jobs update the SBK DB state during run time
See Luca Tomassetti's presentation.
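
A minimal sketch of the two data handling tasks mentioned above (LFC name space creation and output registration via lcg-utils), driven from Python; the LFC host, VO name, Storage Element and paths are placeholders, and the job output file is assumed to already exist locally:

    # Sketch of the tested data handling tasks: LFC name space creation and
    # output registration via lcg-utils. All endpoints and paths are placeholders.
    import os
    import subprocess
    import sys

    os.environ["LFC_HOST"] = "lfc-sb.cr.cnaf.infn.it"   # assumed LFC endpoint
    VO = "superbvo.org"                                  # assumed VO name
    DIR = "/grid/%s/prod/2010_02/run001" % VO            # hypothetical name space
    SE = "storm-fe.cr.cnaf.infn.it"                      # assumed CNAF Storage Element

    def run(cmd):
        print("+ " + " ".join(cmd))
        if subprocess.call(cmd) != 0:
            sys.exit("command failed: " + cmd[0])

    run(["lfc-mkdir", "-p", DIR])                        # create the LFC directory tree
    run(["lcg-cr", "--vo", VO, "-d", SE,                 # copy the output to the SE and
         "-l", "lfn:%s/output_001.root" % DIR,           # register it under the new LFN
         "file:///tmp/output_001.root"])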

What is missing for the full Grid DCM
Grid job communication with the bookkeeping DB in a WAN scenario:
- the job should be able to authenticate itself to the DB service: GSI compliant authentication via proxy validation
- creation of a RESTful communication service: the Frontier and AMGA systems are under study (a sketch follows below)
Grid job management:
- retry policy, resubmission procedures and error recovery in the Grid environment
- Grid based monitor system
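
For the RESTful, proxy-authenticated bookkeeping communication still to be designed, a sketch of what a status update could look like from the job side; the service host, port and resource path are hypothetical, and a real deployment would also load the grid CA directory for server verification and may need special handling of the proxy certificate chain:

    # Sketch of a REST-style status update authenticated with the user's X.509 proxy.
    # The service host, port and resource path are hypothetical.
    import os
    import ssl

    try:
        from http.client import HTTPSConnection   # Python 3
    except ImportError:
        from httplib import HTTPSConnection       # Python 2, as on 2009-era UIs

    proxy = os.environ.get("X509_USER_PROXY", "/tmp/x509up_u%d" % os.getuid())

    ctx = ssl.create_default_context()             # server verification via the default CA store
    ctx.load_cert_chain(proxy)                     # the proxy file holds both certificate and key

    conn = HTTPSConnection("sbkserver.cr.cnaf.infn.it", 8443, context=ctx)  # hypothetical endpoint
    conn.request("PUT", "/bookkeeping/jobs/42/status",
                 "status=running",
                 {"Content-Type": "application/x-www-form-urlencoded"})
    resp = conn.getresponse()
    print(resp.status)
    print(resp.read())
    conn.close()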