Grid Computing. Olivier Dadoun, LAL Orsay. Introduction & Parachute method. Socle 2006, Clermont-Ferrand (@LAL Orsay)


Grid Computing: Introduction & Parachute method
Socle 2006, Clermont-Ferrand (@LAL Orsay)
Olivier Dadoun, LAL Orsay (dadoun@lal.in2p3.fr, www.dadoun.net)
October 2006

Contents
Preamble
Introduction to Grid Computing
Authentication & authorization
Job submission examples
Parachute method
Conclusion

Preamble
One of our goals is to evaluate the background in the detector due to backscattered secondaries from the disrupted beam, the Compton events and the pair losses along the extraction line.
This is very CPU-time consuming. The tool is BDSIM, based on Geant4: running BDSIM for 500k disrupted-beam particles takes one week with 160 jobs (~60 days on a single 2.8 GHz Intel CPU with 2 GB), and batch jobs wait a long time in the queue.
So I decided to use the Grid (with a Parachute).

Introduction
Definition: allow scientists from multiple domains to use, share, and manage geographically distributed resources transparently.
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high computational capabilities." (The Grid, I. Foster and C. Kesselman, 1998)
The name's origin: by analogy with the power grid, a computational grid should be easy to use, hiding the complex internal processes.
An organization of people from different institutions with common goals, sharing computational resources to achieve those goals, is a Virtual Organization (a VO, from the Grid point of view).

Major European Grid Projects (European funded)
European DataGrid
CrossGrid
DataTAG
DEISA
LHC Computing Grid
EGEE

Infrastructure: LCG / EGEE
Enabling Grids for E-sciencE (EGEE): provide and manage a European grid infrastructure to support researchers from many disciplines (biomedical applications, Earth science, computational chemistry and high-energy physics).
LHC Computing Grid (LCG): prepare, deploy, and operate the computing environment to allow physicists to analyze the data from the LHC detectors.
LCG and EGEE have similar aims: LCG is a worldwide collaboration (one field); EGEE is a European grid (many fields).

LCG/EGEE Production Service
> 200 sites, > 20k CPUs, > 13 PB

Virtual Organization
"A set of individuals and/or institutions defined by such sharing rules is what we call a virtual organization." (I. Foster, C. Kesselman, S. Tuecke, 2000)
A VO represents a collaboration defined by:
people from different institutions with common goals;
shared computational resources to achieve those goals;
the same data, the same analysis rules, the same access rights.
The ILC and CALICE VOs already exist and are in use: why not use them?

ILC @ Grid
The ILC and CALICE VOs are hosted at DESY; CALICE is supported by DESY and Imperial College (IC).
Registration to ILC and CALICE is managed by LCG (http://lcg-registrar.cern.ch); ILC has become a so-called global VO in EGEE.
ILC is currently supported by ~10 UKI sites, LAL, DESY, ... (04/04/2006: 27 CEs, 3500 CPUs, 42 TB, 6 RBs).
The CALICE test-beam data were moved between DESY and IC using Grid tools (GridFTP, SRM, LFC).
Source: Andreas Gellrich, DESY, ILC Meeting, Cambridge, 04.04.2006 (http://grid.desy.de/talks/)
NB: CCIN2P3 has supported the ILC and CALICE VOs since last summer; to use the grid there, copy your .globus directory into your home and set up the grid environment with the lcg_env.sh(.csh) file in $THRONG_DIR.
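For illustration, a minimal sketch of the CCIN2P3 setup described in the NB above (the source host of the .globus directory is an assumption; adapt it to wherever your certificate lives):

# Minimal CCIN2P3 setup sketch; "my_ui_host" is an illustrative assumption
scp -r my_ui_host:~/.globus ~/      # copy your certificate directory into your home
source $THRONG_DIR/lcg_env.sh       # or lcg_env.csh for (t)csh users
grid-proxy-init                     # you should now be able to create a proxy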

Resource Broker Schematics
[Diagram: the user (certificate, ssh) submits a JDL job from the User Interface; the Resource Broker dispatches it to a Computing Element (worker nodes, NFS); data are transferred to/from a Storage Element (disks); the output is returned to the user.]

What do we need?
A User Interface (UI) account.
1. Authentication (i.e. who are you?): Certificate Authorities (CA), an electronic X.509 certificate; the user generates a time-limited proxy.
2. Authorization (i.e. what can you do?): done by the Virtual Organization (VO).
Both rely on a Public Key Infrastructure and use the Grid Security Infrastructure (GSI) from Globus.

Authentication & authorization (1)
1. Personal certificate: https://igc.services.cnrs.fr/grid-fr
For Mac OS X users: don't use Safari; for any OS I suggest Firefox (at least for the grid sites).

Authentication & authorization (2)
3. Export, convert and install your certificate.
4. VO registration: https://lcg-registrar.cern.ch/cgi-bin/register/account.pl

Proxy and MyProxy
Create a proxy: grid-proxy-init (12 h lifetime by default)
lx2/dadoun % grid-proxy-init
Your identity: /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=Olivier Dadoun
Enter GRID pass phrase for this identity:
Creating proxy... Done
Your proxy is valid until: Sat Sep 23 02:25:33 2006
Delete a proxy: grid-proxy-destroy
Information on your proxy: grid-proxy-info
If you need a longer-lived proxy, use a proxy server (<host_name> is the MyProxy server name):
myproxy-init -d -s <host_name>
myproxy-info -d -s <host_name>
myproxy-destroy -d -s <host_name>
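For illustration, a minimal sketch combining the two (the server name and the lifetimes are assumptions, not values from this talk):

# Minimal sketch; the MyProxy host and the lifetimes are illustrative assumptions
grid-proxy-init -valid 48:00                   # local proxy valid 48 h instead of the default 12 h
myproxy-init -d -s myproxy.example.org -c 168  # store a one-week credential on the MyProxy server
myproxy-info -d -s myproxy.example.org         # check what is stored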

Hello World submission, level 0 (1)
JDL file HelloWord_level0.jdl:
Executable = "/bin/echo";
Arguments = "HelloWorld";
StdError = "hello.err";
StdOutput = "hello.out";
OutputSandbox = {"hello.out", "hello.err"};
lx2/dadoun % edg-job-submit --vo ilc -o out HelloWord_level0.jdl
Selected Virtual Organisation name (from --vo option): ilc
Connecting to host grid09.lal.in2p3.fr, port 7772
Logging to host grid09.lal.in2p3.fr, port 9002
================= edg-job-submit Success ====================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobid) is:
- https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
The edg_jobid has been saved in the following file:
/users/delphi/dadoun/datagridtutorial/test/out
======================================================

Hello World submission, level 0 (2)
lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
Current Status:  Done (Success)
Exit code:       0
Status Reason:   Job terminated successfully
Destination:     ce02.esc.qmul.ac.uk:2119/jobmanager-lcgpbs-lcg2_long
reached on:      Wed Sep 20 10:14:42 2006
*************************************************************
The job ran successfully when the status is Done (Success) with exit code 0.
NB: if the exit code is != 0 there was a problem at run time; the stderr can help to debug.
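Since the status has to be checked by hand, it can also be polled in a loop; a minimal sketch, assuming the job identifier was saved with -o out as above:

# Minimal polling sketch, using the job id file "out" written by the -o option above
while ! edg-job-status --noint -i out | grep -q "Done"; do
  sleep 60                                            # check once a minute (Done may be Success or Failed)
done
edg-job-get-output --noint -i out --dir $PWD/joboutput   # retrieve the output sandbox when finished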

Hello World submission, level 0 (3)
lx2/dadoun % edg-job-get-output https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
Retrieving files from host: grid09.lal.in2p3.fr ( for https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
have been successfully retrieved and stored in the directory:
/users/delphi//dadoun/joboutput/dadoun_ma4eskm9sxt85bjb4onvdg
*********************************************************************************
We can check that HelloWorld is on the stdout and that the stderr is empty (exit code 0).

Hello World submission, level 1 (1)
JDL with an InputSandbox (HelloWord.jdl):
Executable = "HelloWord.sh";
StdOutput = "hello.out";
StdError = "hello.err";
InputSandbox = {"HelloWord.sh"};
OutputSandbox = {"hello.out", "hello.err"};
HelloWord.sh:
#!/bin/bash
echo HelloWord
NB: the InputSandbox cannot exceed a few MB.
lx2/dadoun % edg-job-submit --vo ilc -o out HelloWord.jdl
Selected Virtual Organisation name (from --vo option): ilc
Connecting to host grid09.lal.in2p3.fr, port 7772
Logging to host grid09.lal.in2p3.fr, port 9002
============edg-job-submit Success ==============================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier edg_jobid is:
https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
The edg_jobid has been saved in the following file:
/users/delphi/dadoun/datagridtutorial/test/out
=============================================================

Hello World submission, level 1 (2)
lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
Current Status:  Scheduled
Status Reason:   Job successfully submitted to Globus
Destination:     fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-ilc
reached on:      Mon Sep 18 15:07:47 2006
*************************************************************
lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
Current Status:  Running
Status Reason:   Job successfully submitted to Globus
Destination:     fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-ilc
reached on:      Mon Sep 18 15:11:25 2006
*************************************************************

Hello World submission, level 1 (3)
lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
Current Status:  Done (Success)
Exit code:       0
Status Reason:   Job terminated successfully
Destination:     fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-ilc
reached on:      Mon Sep 18 15:13:48 2006
*************************************************************
lx2/dadoun % edg-job-get-output https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
Retrieving files from host: grid09.lal.in2p3.fr ( for https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
have been successfully retrieved and stored in the directory:
/users/delphi//dadoun/joboutput/dadoun_3fpxyrq8cbcdxokz-qjnig
*********************************************************************************

LCG commands (LHC Computing Grid)
Configure it:
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=grid-lfc.desy.de
Useful commands:
List a file or directory: lfc-ls /grid/ilc
Copy a file to an SE (for the ilc VO): lcg-cr --vo ilc file:`pwd`/your_file -l lfn:/path/you_file
Copy a file from an SE to the UI:
1. you need its Globally Unique IDentifier (GUID): lcg-lg --vo ilc lfn:/your_path/file
2. lcg-cp --vo ilc GUID file:`pwd`/file
Erase a file from the SE:
1. you need its Site File Name (sfn): lcg-lr --vo ilc GUID
2. lcg-del --vo ilc sfn:sfn
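Putting these together, a minimal round-trip sketch (the LFN path and file names are illustrative assumptions, not the ones used in production):

# Minimal round-trip sketch; the LFN and file names are illustrative
lcg-cr --vo ilc file:`pwd`/result.root -l lfn:/grid/ilc/demo/result.root   # register a local file on an SE
GUID=`lcg-lg --vo ilc lfn:/grid/ilc/demo/result.root`                      # look up its GUID
lcg-cp --vo ilc $GUID file:`pwd`/result_copy.root                          # copy it back to the UI
SFN=`lcg-lr --vo ilc $GUID`                                                # list the replica (sfn)
lcg-del --vo ilc $SFN                                                      # erase that replica from the SE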

Underlying Technology
The relative CPU, storage, and network capabilities shape the computing architecture.
Physics data flow continuously onto the grid at up to 1 GB/s (~one DVD every 5 s); with optical fibre we expect 10 GB/s (~2 DVDs/s).
Data transfer is no longer a challenge.

Parachute method for BDSIM: how I use Geant4 on the grid
1. Compile and run on an interactive SL machine (CCIN2P3).
2. Copy the binary and its associated libraries to lx2.
3. Copy all the libraries needed by BDSIM to lx2 (for BDSIM: Geant4, CLHEP, ROOT, ...).
4. Define all the environment variables and run BDSIM on lx2.
5. Copy everything to the Storage Element.
6. Write all the scripts needed to run BDSIM on the grid (see the sketch below).
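As an illustration of steps 2 to 5, a minimal sketch of bundling the binary with its libraries and registering the tar ball on an SE ($BDSIM_DIR, the library-copy step and the LFN are assumptions, not the exact ones used here):

# Illustrative sketch only; $BDSIM_DIR, the copied libraries and the LFN are assumptions
mkdir -p parachute/bin parachute/lib
cp $BDSIM_DIR/bin/bdsim parachute/bin/
ldd $BDSIM_DIR/bin/bdsim | awk '/=> \// {print $3}' | xargs -I{} cp {} parachute/lib/   # Geant4, CLHEP, ROOT, ...
tar czf parachute.tar.gz parachute
lcg-cr --vo ilc file:`pwd`/parachute.tar.gz -l lfn:/grid/ilc/demo/parachute.tar.gz      # put the tar ball on the SE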

A few words on how to run BDSIM
1. Gmad file: detector and extraction-line descriptions (also some Geant4 flags, e.g. thresholdCutCharged).
2. Input bunch file: output from the GuineaPig simulation.
One gmad file corresponds to one input bunch file (when you change the bunch file you also need to change the gmad file); a loop generating one job per bunch file is sketched below.
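For illustration, a minimal sketch of generating and submitting one JDL per GuineaPig bunch file (the file naming, the wrapper script run_bdsim.sh and the gmad-templating step are assumptions, not the actual production scripts):

# Illustrative sketch: one JDL per bunch file; names and the sed step are assumptions
for bunch in bunch_*.dat; do
  tag=${bunch%.dat}
  sed "s/__BUNCHFILE__/$bunch/" template.gmad > $tag.gmad     # gmad file matched to this bunch file
  cat > $tag.jdl <<EOF
Executable    = "run_bdsim.sh";
Arguments     = "$tag";
InputSandbox  = {"run_bdsim.sh", "$tag.gmad", "$bunch"};
StdOutput     = "$tag.out";
StdError      = "$tag.err";
OutputSandbox = {"$tag.out", "$tag.err"};
EOF
  edg-job-submit --vo ilc -o jobids $tag.jdl
done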

Parachute method for BDSIM: how I use Geant4 on the grid
[Workflow diagram: the tar ball (Geant4, CLHEP, ROOT) sits on the SE; from the UI (SL3 @ LAL), bash scripts produce n JDLs whose InputSandbox carries the shell script (how to run BDSIM), the gmad file and the GuineaPig file; the RB sends the jobs to Computing Elements, where the workers install the libraries and the sandbox files, run the shell script and copy the ROOT output to the SE; the ROOT files are then fetched back from the SE.]
NB: the GuineaPig files are also produced on the GRID (SEED is now an argument of the program; Cécile, François and Guy) and stored on the SE.
A sketch of such a worker-node script is given below.
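A minimal sketch of the kind of worker-node script shipped in the InputSandbox (the LFNs, file names and the bdsim command-line options are illustrative assumptions, not the exact 2006 production script):

#!/bin/bash
# Illustrative worker-node sketch; LFNs, names and the bdsim options are assumptions
tag=$1                                                            # e.g. bunch_0042, passed via Arguments
lcg-cp --vo ilc lfn:/grid/ilc/demo/parachute.tar.gz file:`pwd`/parachute.tar.gz
tar xzf parachute.tar.gz
export LD_LIBRARY_PATH=`pwd`/parachute/lib:$LD_LIBRARY_PATH       # Geant4, CLHEP, ROOT libraries
./parachute/bin/bdsim --file=$tag.gmad --outfile=$tag             # run BDSIM on this bunch file
lcg-cr --vo ilc file:`pwd`/$tag.root -l lfn:/grid/ilc/demo/output/$tag.root   # ship the ROOT output to the SE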

Gains and problems with the Parachute
Gains:
1. No disk-space problem to store my data.
2. At least a factor of 10 compared to the CCALI clusters (where most of the time is spent in the queue).
Problems:
1. Lost jobs: a job may hang in Waiting status with no recovery when some problem arises at the RB level.
2. Expired-proxy problem (still under investigation, ~10%).
3. Crashes for unknown reasons (a few percent).

Conclusions and prospects
Parachute method: > 95% of jobs successful for GuineaPig, 85% for BDSIM.
Note: in the context of GRIF I also used XtremWeb (Oleg Lodygensky, LAL) for GuineaPig production; it still needs to be tested with BDSIM.
Maybe we need a single VO (ILC & CALICE joined), with the common software for both VOs installed: at least Geant4, CLHEP and ROOT.
I would like to thank Charles Loomis (LAL) for useful discussions.