Grid Computing. Olivier Dadoun LAL, Orsay Introduction & Parachute method. APC-Grid February 2007

Grid Computing: Introduction & Parachute method
APC-Grid, February 2007
Olivier Dadoun, LAL, Orsay
http://flc-mdi.lal.in2p3.fr
dadoun@lal.in2p3.fr
www.dadoun.net
October 2006

Contents
- Machine Detector Interface (MDI) purpose
- Introduction to Grid computing
- Authentication & authorization
- Very simple job submission examples
- Parachute method
- Conclusion
All the examples can be found at: http://dadoun.net/informatique/apcgridexamples.tar.gz

The ILC project: an $e^+e^-$ linear collider
- Polarized $e^+e^-$ collisions at $\sqrt{s} = 500$ GeV to 1 TeV
- 2820 bunches of $2\times10^{10}$ particles, at 5 Hz
- $\sigma_y \sim 5.7$ nm $\sim 1\%\,\sigma_x$
- $L \sim 2\times10^{34}$ cm$^{-2}$ s$^{-1}$
[Figure: detector layout, with labels Solenoid, Calorimeters, TPC, Vertex detector]

Machine Detector Interface purpose
- Depending on the beam parameter set, the post-collision beam can be strongly degraded (beamstrahlung photons).
- We need to extract the beam and transport it to the dump (10-20 MW) with minimal losses.
- In any case there will be some losses along the extraction line (> 300 m), which cause:
  - damage to the beam magnets, especially the superconducting (SC) magnets
  - background generation
- One of our goals: evaluate the particles backscattered into the detector region, using the BDSIM toolkit (Geant4-based).
NB: 4 detector concepts x 3 (or 4) extraction lines.

Why do I use the grid?
- Geant4-based programs (Mokka, BDSIM) are CPU-time consuming.
- Running BDSIM for 500K disrupted beam particles with a 50 m extraction line takes one week (human time) on BQS at CC-IN2P3, with 160 jobs, and the jobs wait a long time in the queue.
- So I decided to use the Grid, under the ILC Virtual Organization.

Introduction
Definition: allow scientists from multiple domains to use, share, and manage geographically distributed resources transparently.
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities." The Grid, I. Foster and C. Kesselman, 1998
The name's origin: by analogy with the electric power grid, a computational grid should be easy to use and hide its complex internal processes.
From the Grid point of view, a Virtual Organization (VO) is an organization of people from different institutions with common goals who share computational resources to achieve those goals.

Major European Grid Projects
European funded:
- European DataGrid
- CrossGrid
- DataTAG
- DEISA
- LHC Computing Grid
- EGEE

Infrastructure: LCG / EGEE
- Enabling Grids for E-sciencE (EGEE): provide and manage a European grid infrastructure to support researchers from many disciplines (biomedical applications, earth science, computational chemistry and high-energy physics).
- LHC Computing Grid (LCG): prepare, deploy, and operate the computing environment that allows physicists to analyze the data from the LHC detectors.
LCG and EGEE have similar aims:
- LCG: worldwide collaboration (one field)
- EGEE: European grid (many fields)

LCG/EGEE Production Service
- > 200 sites
- > 20 kCPU
- > 13 PB

Virtual Organization
"A set of individuals and/or institutions defined by such sharing rules is what we call a virtual organization." I. Foster, C. Kesselman, S. Tuecke (2000)
A VO represents a collaboration defined by:
- people from different institutions with common goals
- shared computational resources to achieve those goals
- the same data
- the same analysis rules
- the same access rights
The ILC VO is currently supported by ~10 UKI sites, LAL, DESY, ... (as of 04/04/2006: 27 CEs, 3500 CPUs, 42 TB, 6 RBs)

Schematics
[Figure: job submission flow. Labels: User Interface (ssh, cert), job submission (JDL), Resource Broker, Computing Element (workers, NFS), Storage Element (disks), data transfer, output]

What do we need?
A User Interface (UI) account, plus:
1. Authentication (i.e. who are you?)
   - Certificate Authorities (CA) and an X.509 electronic certificate (cert.)
   - The user generates a time-limited proxy
2. Authorization (i.e. what can you do?)
   - Done by the Virtual Organization (VO)
Both rely on a Public Key Infrastructure and use the Grid Security Infrastructure (GSI) from Globus.

Authentication & authorization (1)
1. Request a personal certificate: https://igc.services.cnrs.fr/grid-fr
2. You will receive your certificate.
For Mac OS X users:
- don't use Safari (I suggest Firefox, at least for the grid site)
- don't use the mail client provided by default

Authentication & authorization (2)
3. Export, convert and install your certificate. Copy your cert.p12 to your UI machine (into the ~/.globus folder), then run:
    openssl pkcs12 -in cert.p12 -clcerts -nokeys -out usercert.pem
    openssl pkcs12 -in cert.p12 -nocerts -out userkey.pem
4. VO registration: https://lcg-registrar.cern.ch/cgi-bin/register/account.pl
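The conversion in step 3 can be scripted. The following is only a sketch: it builds the two openssl command lines shown above as argument lists and executes nothing. The file modes are an assumption that is not on the slide (Globus tools generally refuse a private key that other users can read), and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def pkcs12_conversion_commands(p12_file, globus_dir):
    """Build the two openssl invocations as argument lists (nothing is executed)."""
    globus = PurePosixPath(globus_dir)
    extract_cert = ["openssl", "pkcs12", "-in", str(p12_file), "-clcerts", "-nokeys",
                    "-out", str(globus / "usercert.pem")]
    extract_key = ["openssl", "pkcs12", "-in", str(p12_file), "-nocerts",
                   "-out", str(globus / "userkey.pem")]
    return [extract_cert, extract_key]


# Assumption (not on the slide): the private key is readable by its owner only.
SUGGESTED_MODES = {"usercert.pem": 0o644, "userkey.pem": 0o400}

for command in pkcs12_conversion_commands("cert.p12", "~/.globus"):
    print(" ".join(command))
for name, mode in SUGGESTED_MODES.items():
    print(f"chmod {mode:o} ~/.globus/{name}")
```

The lists can be passed to subprocess.run on a machine where openssl is installed; openssl then prompts for the import password and the new pass phrase.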

Proxy and MyProxy
Create a proxy: grid-proxy-init (12 h lifetime by default)
    lx2/dadoun % grid-proxy-init
    Your identity: /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=Olivier Dadoun
    Enter GRID pass phrase for this identity:
    Creating proxy... Done
    Your proxy is valid until: Sat Sep 23 02:25:33 2006
Delete a proxy: grid-proxy-destroy
Information on your proxy: grid-proxy-info
If you need a longer-lived proxy, use a proxy server (<host_name> is the name of the proxy server):
    myproxy-init -d -s <host_name>
    myproxy-info -d -s <host_name>
    myproxy-destroy -d -s <host_name>
Since December 2006 each VO uses voms-proxy-init.
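Since the proxy is time-limited, a script may want to know how long it still has before submitting a long job. A minimal sketch, assuming the "valid until" line has exactly the format shown in the transcript above and that the timestamp and the reference time are in the same time zone; the function names are hypothetical.

```python
from datetime import datetime, timedelta

PREFIX = "Your proxy is valid until:"


def proxy_expiry(output):
    """Return the expiry time found in grid-proxy-init output, or None."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            stamp = line[len(PREFIX):].strip()
            return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return None


def time_left(output, now):
    """Return the remaining proxy lifetime as a timedelta, or None if not found."""
    expiry = proxy_expiry(output)
    return None if expiry is None else expiry - now


sample = "Creating proxy... Done\nYour proxy is valid until: Sat Sep 23 02:25:33 2006"
print(time_left(sample, datetime(2006, 9, 22, 14, 25, 33)))  # 12:00:00
```

With the default lifetime the result is 12 hours right after creation, as in the example; a job expected to run longer than the remaining time is a candidate for the proxy-server route.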

Hello World submission, level 0 (1)
JDL file (myfirstjdl.jdl):
    Executable = "/bin/ls";
    Arguments = "-rtla";
    StdError = "first.err";
    StdOutput = "first.out";
    OutputSandbox = {"first.out", "first.err"};
Submission:
    lx2/dadoun % edg-job-submit --vo ilc -o out myfirstjdl.jdl
    Selected Virtual Organisation name (from --vo option): ilc
    Connecting to host grid09.lal.in2p3.fr, port 7772
    Logging to host grid09.lal.in2p3.fr, port 9002
    ================= edg-job-submit Success ====================
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier (edg_jobid) is:
    - https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
    The edg_jobid has been saved in the following file:
    /users/delphi/dadoun/datagridtutorial/apc/level0/output
    ======================================================
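When many similar jobs are needed, the JDL can be generated rather than typed. The sketch below is illustrative only: it renders a Python dict as "Name = value;" lines with quoted strings and brace-delimited lists, which is the subset of JDL used in this example. It does not escape quotes inside values, and the helper names are hypothetical.

```python
def jdl_value(value):
    """Format a Python value as a JDL literal: a quoted string or a {list}."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render a dict of JDL attributes, one 'Name = value;' line each."""
    return "\n".join(f"{name} = {jdl_value(value)};"
                     for name, value in attributes.items())


level0 = {
    "Executable": "/bin/ls",
    "Arguments": "-rtla",
    "StdOutput": "first.out",
    "StdError": "first.err",
    "OutputSandbox": ["first.out", "first.err"],
}
print(render_jdl(level0))
```

Writing the returned text to a .jdl file gives something that can be passed to edg-job-submit in the same way as a hand-written file.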

Hello World submission, level 0 (2)
    lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
    Current Status: Done (Success)
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: ce02.esc.qmul.ac.uk:2119/jobmanager-lcgpbs-lcg2_long
    reached on: Wed Sep 20 10:14:42 2006
    *************************************************************
The job ran successfully when the status is Done (Success) and the exit code is 0.
NB: if the exit code != 0, the job had a problem while running; stderr can help to debug.
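The success criterion (status Done (Success) and exit code 0) can be checked automatically. A minimal sketch, assuming the status text has one "Field: value" pair per line as in the transcript above; the function names are hypothetical and nothing here calls edg-job-status itself.

```python
WANTED = ("Current Status", "Exit code", "Status Reason")


def parse_status(output):
    """Extract the fields of interest from edg-job-status text into a dict."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() in WANTED:
            fields[key.strip()] = value.strip()
    return fields


def job_succeeded(output):
    """Apply the rule: Done (Success) and exit code 0."""
    fields = parse_status(output)
    return (fields.get("Current Status") == "Done (Success)"
            and fields.get("Exit code") == "0")


sample = """BOOKKEEPING INFORMATION:
Status info for the Job : https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
"""
print(parse_status(sample)["Status Reason"])
print(job_succeeded(sample))
```

A job that is still Running has no exit code line, so job_succeeded returns False for it rather than raising an error.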

Hello World submission, level 0 (3)
    lx2/dadoun % edg-job-get-output https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
    Retrieving files from host: grid09.lal.in2p3.fr ( for https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg )
    *********************************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    - https://grid09.lal.in2p3.fr:9000/ma4eskm9sxt85bjb4onvdg
    have been successfully retrieved and stored in the directory:
    /users/delphi//dadoun/joboutput/dadoun_ma4eskm9sxt85bjb4onvdg
    *********************************************************************************
We can check that the expected output is in the stdout file (first.out). The first.err file is empty (exit code 0).

Comment on the output in first.out
    -rw-r--r-- 1 ilcsgm ilc    0 Feb 26 16:01 .maradona.https_3a_2f_2fgrid rb1.desy.de_3a9000_2ffgoiajqghqqecn1_5fjkly3q.output
    drwxr-xr-x 3 ilcsgm ilc 4096 Feb 26 16:01 ..
    -rw-r--r-- 1 ilcsgm ilc  807 Feb 26 16:01 .BrokerInfo
    -rw-r--r-- 1 ilcsgm ilc    0 Feb 26 16:01 first.out
    -rw-r--r-- 1 ilcsgm ilc    0 Feb 26 16:01 first.err
    drwxr-xr-x 2 ilcsgm ilc 4096 Feb 26 16:01 .
Don't erase the hidden files in your future scripts (rm .* is not a good idea).
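One way to follow this advice in a job script is to clean up by explicit rule instead of by wildcard. The sketch below is only an illustration: it removes regular files from a working directory while skipping every hidden file (such as .BrokerInfo) and any name in a keep list. The function name and the file scratch.tmp are hypothetical.

```python
import os
import tempfile


def remove_scratch_files(directory, keep=()):
    """Delete visible regular files not listed in keep; return the names removed."""
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(".") or name in keep or not os.path.isfile(path):
            continue  # hidden middleware files and kept outputs are left alone
        os.remove(path)
        removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as workdir:
    for name in (".BrokerInfo", "first.out", "first.err", "scratch.tmp"):
        open(os.path.join(workdir, name), "w").close()
    print(remove_scratch_files(workdir, keep=("first.out", "first.err")))
    # ['scratch.tmp']
    print(sorted(os.listdir(workdir)))
    # ['.BrokerInfo', 'first.err', 'first.out']
```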

Hello World submission, level 1 (1)
JDL with an InputSandbox:
    Executable = "HelloWorld.sh";
    StdOutput = "hello.out";
    StdError = "hello.err";
    InputSandbox = {"HelloWorld.sh"};
    OutputSandbox = {"hello.out", "hello.err"};
The script HelloWorld.sh:
    #!/bin/bash
    echo "Hello World :)"
The InputSandbox cannot exceed a few MB.
Submission:
    lx2/dadoun % edg-job-submit --vo ilc -o out HelloWord.jdl
    Selected Virtual Organisation name (from --vo option): ilc
    Connecting to host grid09.lal.in2p3.fr, port 7772
    Logging to host grid09.lal.in2p3.fr, port 9002
    ============ edg-job-submit Success ==============================
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier edg_jobid is:
    https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    The edg_jobid has been saved in the following file:
    /users/delphi/dadoun/datagridtutorial/test/out
    =============================================================
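A JDL whose sandboxes do not match its other attributes fails only after the job has travelled to a site, so a local check before submission can save a round trip. The sketch below assumes the JDL is available as a Python dict and that sandbox entries are bare file names; the function name and the mismatching example are hypothetical.

```python
def sandbox_problems(jdl):
    """Return a list of inconsistencies between a JDL's attributes and its sandboxes."""
    problems = []
    inputs = set(jdl.get("InputSandbox", []))
    outputs = set(jdl.get("OutputSandbox", []))
    executable = jdl.get("Executable", "")
    # An absolute path such as /bin/ls is expected to exist on the worker node;
    # anything else has to travel with the job in the InputSandbox.
    if executable and not executable.startswith("/") and executable not in inputs:
        problems.append(f"Executable {executable!r} is not in InputSandbox")
    for attribute in ("StdOutput", "StdError"):
        name = jdl.get(attribute)
        if name and name not in outputs:
            problems.append(f"{attribute} {name!r} is not in OutputSandbox")
    return problems


consistent = {
    "Executable": "HelloWorld.sh",
    "StdOutput": "hello.out",
    "StdError": "hello.err",
    "InputSandbox": ["HelloWorld.sh"],
    "OutputSandbox": ["hello.out", "hello.err"],
}
mismatch = dict(consistent, InputSandbox=["OtherScript.sh"], OutputSandbox=["std.out"])

print(sandbox_problems(consistent))  # []
for problem in sandbox_problems(mismatch):
    print(problem)
```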

Hello World submission, level 1 (2)
    lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    Current Status: Scheduled
    Status Reason: Job successfully submitted to Globus
    Destination: fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-ilc
    reached on: Mon Sep 18 15:07:47 2006
    *************************************************************
    lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    Current Status: Running
    Status Reason: Job successfully submitted to Globus
    Destination: fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-ilc
    reached on: Mon Sep 18 15:11:25 2006
    *************************************************************

Hello World submission, level 1 (3)
    lx2/dadoun % edg-job-status https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    Current Status: Done (Success)
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-ilc
    reached on: Mon Sep 18 15:13:48 2006
    *************************************************************
    lx2/dadoun % edg-job-get-output https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    Retrieving files from host: grid09.lal.in2p3.fr ( for https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig )
    *********************************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    - https://grid09.lal.in2p3.fr:9000/3fpxyrq8cbcdxokz-qjnig
    have been successfully retrieved and stored in the directory:
    /users/delphi//dadoun/joboutput/dadoun_3fpxyrq8cbcdxokz-qjnig
    *********************************************************************************
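The status of this job moved from Scheduled to Running to Done (Success) over about six minutes, and checking by hand does not scale to many jobs. The sketch below polls a status function until a final state is reached. It is an illustration under assumptions: the status source is injected as a plain function (here a replay of the states seen above, not a real call to edg-job-status), and the final states other than Done are not taken from these slides.

```python
import time

# Assumption: final states besides "Done ...", not shown in the slides.
OTHER_FINAL_STATES = {"Aborted", "Cancelled", "Cleared"}


def is_final(status):
    """A job is finished once it is Done (whatever the outcome) or was stopped."""
    return status.startswith("Done") or status in OTHER_FINAL_STATES


def wait_for_job(get_status, poll_seconds=60, max_polls=100, sleep=time.sleep):
    """Poll get_status() until a final state; return the distinct states seen in order."""
    seen = []
    for _ in range(max_polls):
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if is_final(status):
            return seen
        sleep(poll_seconds)
    last = seen[-1] if seen else "unknown"
    raise TimeoutError(f"job still {last!r} after {max_polls} polls")


# Replay of the sequence from the transcripts, without touching the grid.
replay = iter(["Scheduled", "Scheduled", "Running", "Done (Success)"])
print(wait_for_job(lambda: next(replay), sleep=lambda seconds: None))
# ['Scheduled', 'Running', 'Done (Success)']
```

The max_polls limit matters in practice: a job that never leaves a waiting state would otherwise keep the loop alive indefinitely.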

LCG commands (LHC Computing Grid)
Configure it:
    export LCG_CATALOG_TYPE=lfc
    export LFC_HOST=grid-lfc.desy.de
Useful commands
List a file or directory:
    lfc-ls /grid/ilc
Copy a file onto the SE (for the ilc VO):
    lcg-cr --vo ilc file:`pwd`/your_file -l lfn:/path/your_file
Copy a file from the SE to the UI:
1. You need the Globally Unique IDentifier (GUID):
    lcg-lg --vo ilc lfn:/your_path/file
2. Copy the file:
    lcg-cp --vo ilc GUID file:`pwd`/file
Erase a file from the SE:
1. You need the Site File Name (sfn):
    lcg-lr --vo ilc GUID
2. Delete the file:
    lcg-del --vo ilc sfn:sfn
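The upload and the two-step download can be wrapped in small helpers so that scripts do not rebuild the command strings each time. This is only a sketch: it assembles the command lines shown above as argument lists and runs nothing, the GUID is passed through exactly as lcg-lg returned it, and the logical file name and GUID in the example are hypothetical placeholders.

```python
import os


def upload_command(vo, local_path, lfn):
    """lcg-cr: copy a local file to the SE and register it under a logical file name."""
    return ["lcg-cr", "--vo", vo, "file:" + os.path.abspath(local_path), "-l", "lfn:" + lfn]


def guid_lookup_command(vo, lfn):
    """lcg-lg: step 1 of a download, look up the GUID of a logical file name."""
    return ["lcg-lg", "--vo", vo, "lfn:" + lfn]


def download_command(vo, guid, local_path):
    """lcg-cp: step 2 of a download, copy the file identified by its GUID."""
    return ["lcg-cp", "--vo", vo, guid, "file:" + os.path.abspath(local_path)]


# Hypothetical names, for illustration only.
lfn = "/grid/ilc/example/output.root"
print(" ".join(upload_command("ilc", "/tmp/output.root", lfn)))
print(" ".join(guid_lookup_command("ilc", lfn)))
print(" ".join(download_command("ilc", "GUID_FROM_LCG_LG", "/tmp/copy.root")))
```

os.path.abspath plays the role of the `pwd` in the commands above: the file: argument is always an absolute path.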

Underlying Technology
- The relative capabilities of CPU, storage, and network shape the computing architecture.
- Physics data: a continuous flux of up to 1 GB/s onto the grid (about one DVD every 5 s).
- With optical fiber we expect 10 GB/s (about 2 DVDs per second).
- Data transfer should no longer be a challenge.
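The DVD comparisons can be checked with a short calculation. The sketch assumes a single-layer DVD of 4.7 GB, a figure that is not stated on the slide.

```python
DVD_GB = 4.7  # assumption: single-layer DVD capacity in GB


def seconds_per_dvd(rate_gb_per_s):
    """Time needed to transfer one DVD's worth of data at the given rate."""
    return DVD_GB / rate_gb_per_s


def dvds_per_second(rate_gb_per_s):
    """Number of DVDs' worth of data transferred each second at the given rate."""
    return rate_gb_per_s / DVD_GB


print(round(seconds_per_dvd(1.0), 1))   # 4.7
print(round(dvds_per_second(10.0), 1))  # 2.1
```

Both results agree with the rounded figures quoted above: about 5 s per DVD at 1 GB/s, and about 2 DVDs per second at 10 GB/s.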

Parachute method using ROOT
1. Compile and run myhisto on an interactive SL machine.
2. Copy all the libraries and headers needed by myhisto to your UI (essentially the ROOT libraries).
3. Define all the environment variables and run myhisto on your UI.
4. Copy everything (as a tar ball of the folder) onto your SE.
5. Write the script needed to run myhisto on the grid and to copy the output onto your SE.

Parachute method using ROOT
[Figure: workflow between the UI (SL3 @ LAL), the RB, a Computing Element and the SE. Labels: tar ball with the needed ROOT files on the SE; InputSandbox shell script (how to run myhisto) sent through the RB; on the Computing Element, install the tar ball, execute myhisto, copy the ROOT output to the SE; get the ROOT file back on the UI]
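The shell script sent in the InputSandbox has to do three things on the Computing Element: fetch and unpack the tar ball, run myhisto against the bundled libraries, and copy the output to the SE. The sketch below generates such a script as text. It is an illustration under stated assumptions, not the script used for these slides: the tar ball is assumed to be gzip-compressed and to unpack into ./myhisto and ./lib, and the function name, GUID, file names and logical file name are hypothetical.

```python
def parachute_job_script(tarball_guid, tarball_name, output_file, output_lfn, vo="ilc"):
    """Return the text of a bash wrapper that runs myhisto from a tar ball on the SE."""
    lines = [
        "#!/bin/bash",
        "# 1. Fetch and unpack the tar ball holding myhisto and the ROOT libraries.",
        f"lcg-cp --vo {vo} {tarball_guid} file:`pwd`/{tarball_name}",
        f"tar -xzf {tarball_name}",
        "# 2. Point the loader at the bundled libraries (assumed layout: ./lib) and run.",
        "export LD_LIBRARY_PATH=`pwd`/lib:$LD_LIBRARY_PATH",
        "./myhisto",
        "# 3. Copy the ROOT output to the SE and register it in the catalog.",
        f"lcg-cr --vo {vo} file:`pwd`/{output_file} -l lfn:{output_lfn}",
    ]
    return "\n".join(lines) + "\n"


script = parachute_job_script(
    tarball_guid="GUID_OF_TARBALL",          # hypothetical, as returned by lcg-lg
    tarball_name="myhisto_bundle.tar.gz",    # hypothetical
    output_file="histo.root",                # hypothetical
    output_lfn="/grid/ilc/example/histo.root",
)
print(script)
```

Because the job carries everything it needs, it does not depend on ROOT being installed at the site where it lands, which is the point of the method.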

Gains and problems with the Parachute method
Gains:
1. No disk space problem to store my data
2. At least a factor of 10 compared to BQS (where most of the time is spent in the queue)
Problems:
1. Lost jobs (waiting, no recovery): a job may hang in the Waiting status when some problem arises at the RB level
2. Proxy expired (10%). Understood: confusion between grid-proxy-init and voms-proxy-init in the RB
3. Crashes for unknown reasons: a few percent

Conclusions
Parachute method:
- > 95% successful jobs for simple jobs (self-contained programs)
- 85% successful jobs for huge simulations using Geant4, CLHEP, ROOT, ...
Note: in the context of GRIF I also used XtremWeb (on Linux and Mac OS X machines; my software also needs to be installed on Windows, wmshare).
I would like to thank Charles Loomis (LAL) for useful discussions.
Use the Grid!