International Collaboration to Extend and Advance Grid Education. gLite WMS: Workload Management System


International Collaboration to Extend and Advance Grid Education
gLite WMS: Workload Management System
Marco Pappalardo, Consorzio COMETA & INFN Catania, Italy
ITIS Ferraris, Acireale, Tutorial GRID per gli Insegnanti (Grid Tutorial for Teachers), 11.05.2007
INFSO-SSA-26637

Contents
- Workload Management System (WMS) components and services
  - User Interface (UI)
  - Resource Broker (RB)
  - Logging and Bookkeeping (LB)
  - Computing Element (CE)
- Job Description Language (JDL)
  - JDL document
  - Sandbox attachments
  - Types of jobs
  - Types of requests

Overview of the Architecture

Architecture overview
[Diagram of the Workload Management System components. Components shown: User Interface, Network Server (Resource Broker), Information Index, LCG File Catalogue, Logging & Bookkeeping, Computing Element, Storage Element. Labelled flows: authentication & authorization, job submit event, Input Sandbox, Output Sandbox, job status, replicas info, publish.]

Components and Services
- User Interface (UI): the terminal used to access all grid facilities, including the WMS
  - Command Line Interface (CLI)
  - Grid portals (such as GENIUS)
- Network Server (NS) / Resource Broker (RB)
  - The WMS access point
  - Dispatches jobs across computing resources
  - Implements a scheduling algorithm
- Logging and Bookkeeping (LB)
  - Keeps track of every WMS status change and action
- Computing Element (CE)
  - The actual computing resource
  - It is an interface: the user does not see what lies behind it

More on Computing Elements
- Logically, a CE is a queue of pending jobs
- Physically, it is a farm of Worker Nodes (WNs)
- It sits on top of a Local Resource Management System (LRMS): PBS, Condor, LSF (CCS in the future?)
- It exposes a common interface that is independent of the underlying LRMS

Job Life Cycle (1/4)
- The grid user describes a job in a Job Description Language (JDL) document. Input files (the Input Sandbox) can be attached to the JDL document.
- The grid user submits the JDL job using the CLI and waits for a reply.
- The Resource Broker receives and stores the JDL document together with the attached input files.
- The newly generated job ID is sent back to the user, who uses it to refer to that job unambiguously from then on.

Job Life Cycle (2/4)
- The Resource Broker runs a dedicated algorithm (MatchMaking) and selects a Computing Element according to best-fit rules.
- The job is handed to the chosen CE together with the Input Sandbox.
- The Computing Element accepts the job and queues it.
- The job starts executing on the Local Resource Management System (LRMS).

Job Life Cycle (3/4)
- When the job terminates, the output it produced is sent back to the Resource Broker.
- The Resource Broker receives the results and the Output Sandbox and stores them in its local repository.
- At the same time, the Computing Element notifies the Logging & Bookkeeping service.
- The job output is now available on the Resource Broker.

Job Life Cycle (4/4)
- The user queries the L&B for the status of his/her jobs ("Job status? Terminated") and finds that the job has terminated.
- The user retrieves the Output Sandbox from the Resource Broker.
- The Resource Broker clears all information it no longer needs from its repository.
- The job life cycle has ended (successfully or not).

Job State Machine
- Submitted: the job has been created on the UI but not yet sent to the Resource Broker
- Waiting: the job is being processed by the Resource Broker
- Ready: the job has been processed but not yet sent to the chosen CE
- Scheduled: the job is queued on the CE and is waiting to be executed
- Running: the job is running on the Computing Element
- Done: the job has finished executing
- Aborted: the job has been aborted by the WMS
- Cancelled: the job has been cancelled by the user
- Cleared: the job has terminated and the output has been retrieved
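The state list above can be read as a small state machine. The following Python sketch encodes it; note that the allowed transitions are an assumption inferred from the state descriptions on this slide, not an official WMS specification, and the names `TRANSITIONS` and `is_valid_history` are purely illustrative.

```python
# Transition table: an ASSUMPTION inferred from the state descriptions,
# not taken from the WMS specification.
TRANSITIONS = {
    "Submitted": {"Waiting", "Aborted", "Cancelled"},
    "Waiting": {"Ready", "Aborted", "Cancelled"},
    "Ready": {"Scheduled", "Aborted", "Cancelled"},
    "Scheduled": {"Running", "Aborted", "Cancelled"},
    "Running": {"Done", "Aborted", "Cancelled"},
    "Done": {"Cleared"},
    "Aborted": set(),
    "Cancelled": set(),
    "Cleared": set(),
}


def is_valid_history(states):
    """Return True if the history starts at Submitted and every step is allowed."""
    if not states or states[0] != "Submitted":
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


happy_path = ["Submitted", "Waiting", "Ready", "Scheduled",
              "Running", "Done", "Cleared"]
print(is_valid_history(happy_path))                  # True
print(is_valid_history(["Submitted", "Running"]))    # False
```

A table of sets keeps the rules easy to adjust if the real service allows transitions that this sketch leaves out.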

Matchmaking Algorithm
The Matchmaking Algorithm (within the RB):
- Decides where to dispatch jobs across resources.
- Uses the Information System as its resource discovery system.
- First phase: Selection
  - The algorithm chooses which Computing Elements are suitable for executing a given job.
  - The Requirements JDL attribute is evaluated for every candidate CE.
- Second phase: Ranking
  - A fitness function (the Rank JDL attribute) is evaluated over the suitable CEs.
  - The CE that maximizes this function is chosen.
- The job is submitted to the selected CE.
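The two phases can be sketched in a few lines of Python. This is only an illustration of the select-then-rank idea, not the Resource Broker's actual implementation: the CE records, their field names, and the `matchmake` function are hypothetical, and the requirement and rank functions simply mimic the kind of expressions a JDL file would carry.

```python
def matchmake(candidates, requirements, rank):
    """Phase 1: keep CEs that satisfy requirements. Phase 2: pick the highest rank."""
    suitable = [ce for ce in candidates if requirements(ce)]
    if not suitable:
        return None  # no CE matches the job
    return max(suitable, key=rank)


# Hypothetical CE records standing in for Information System data.
ces = [
    {"id": "ce-a", "lrms": "PBS", "total_cpus": 16, "max_running": 10, "running": 9},
    {"id": "ce-b", "lrms": "PBS", "total_cpus": 8, "max_running": 20, "running": 5},
    {"id": "ce-c", "lrms": "LSF", "total_cpus": 64, "max_running": 50, "running": 0},
]


def requirements(ce):
    # Boolean expression: PBS sites with more than 2 CPUs.
    return ce["lrms"] == "PBS" and ce["total_cpus"] > 2


def rank(ce):
    # Fitness function: number of free job slots.
    return ce["max_running"] - ce["running"]


best = matchmake(ces, requirements, rank)
print(best["id"])  # ce-b
```

Here `ce-c` has the most free slots but is discarded in the selection phase, which shows why the order of the two phases matters.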

Job Description Language (JDL)

Job Description Language
The Job Description Language (JDL):
- is a language for describing jobs
- consists mainly of a collection of attribute-value pairs
- allows files to be attached

Types of jobs
- Normal: a batch executable
- Interactive: requires interaction with the user
- MPICH: needs the Message Passing Interface (MPI) installed on the computing resource
- Partitionable: can be partitioned into several sub-jobs. Deprecated.
- Checkpointable: execution can be marked at specific positions in the code (checkpoints) and resumed later. Deprecated.
- Parametric: a job whose JDL contains parametric attributes (e.g. Arguments, StdInput, etc.)

Types of requests
The JDL allows the following types of requests to be described:
- Job: a simple job (default)
- DAG: a Directed Acyclic Graph of dependent jobs
- Collection: a set of independent jobs

JDL format
- A JDL file consists of statements of the form
      Attribute = expression;
  each terminated by a semicolon.
- An expression can span several lines; only its last line is terminated by a semicolon.
- A comment line must begin with a sharp character (#) or a double slash (//).
- A comment spanning multiple lines can be written by enclosing the text between /* and */.
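To make the format rules concrete, here is a minimal Python sketch that splits a JDL text into attribute/expression pairs. It is not the real JDL parser: the function name `parse_jdl` is illustrative, expressions are kept as raw strings rather than evaluated, and the sketch assumes that neither `;` nor `/* */` ever appears inside a quoted string.

```python
import re


def parse_jdl(text):
    """Split JDL text into {attribute: raw expression string}."""
    # Drop /* ... */ block comments (assumed never to occur inside strings).
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Drop comment lines, i.e. lines beginning with '#' or '//'.
    lines = [ln for ln in text.splitlines()
             if not ln.lstrip().startswith(("#", "//"))]
    attributes = {}
    # Each statement ends at a semicolon (assumed never to occur inside strings).
    for statement in "\n".join(lines).split(";"):
        if not statement.strip():
            continue
        name, _, expression = statement.partition("=")
        # Collapse the whitespace of multi-line expressions.
        attributes[name.strip()] = " ".join(expression.split())
    return attributes


sample = '''
# a line comment
// another line comment
/* a block
   comment */
Type = "Job";
OutputSandbox = {"stderr.log",
                 "stdout.log"};
'''
parsed = parse_jdl(sample)
print(parsed["Type"])           # "Job"
print(parsed["OutputSandbox"])  # {"stderr.log", "stdout.log"}
```

Splitting on the first `=` only means that comparison operators such as `==` inside a Requirements expression stay part of the expression.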

Type
The Type attribute is a string representing the type of the request described by the JDL, e.g.:
    Type = "Job";
Possible values are:
- Job
- DAG
- Collection
The value of this attribute is case insensitive. If the attribute is not specified in the JDL description, the WMS sets it to Job.
Default: Job

JobType
The JobType attribute is a string representing the type of the job described by the JDL, e.g.:
    JobType = "Interactive";
Possible values are:
- Normal
- Interactive
- MPICH
- Checkpointable
- Partitionable
- Parametric
This attribute is meaningful only when the Type attribute equals Job. The value of this attribute is case insensitive.
Default: Normal

Example of a JDL file
scriptls.jdl:
    VirtualOrganisation = "gilda";
    Executable = "ls.sh";          // this will run on the endpoint
    StdError = "stderr.log";       // redirect stderr to this file
    StdOutput = "stdout.log";      // redirect stdout to this file
    InputSandbox = "ls.sh";        // attach this file to the JDL
    OutputSandbox =                // these files will be the output
        {"stderr.log", "stdout.log"};  // for this job
ls.sh:
    #!/bin/sh
    /bin/ls    # simply executes ls on the final computing resource

JDL Attributes
This is a far from exhaustive list of JDL attributes:

    Attribute               Meaning
    Type                    Job or DAG
    JobType                 Interactive, MPICH, ...
    Executable              The command line to execute
    Arguments               String used as argument for the executable
    StdInput                A file attached as standard input
    StdOutput, StdError     Files to which standard output and standard error are redirected
    InputSandbox            List of files attached to the JDL
    OutputSandbox           List of files to be retrieved as output
    Requirements            Boolean expression to select suitable CEs
    Rank                    Fitness function to evaluate over candidate CEs
    RetryCount              Maximum number of retries (matchmaking is repeated) if the job fails
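The attributes listed above can also be generated programmatically. The following Python sketch renders a dictionary as JDL text; the helper names `jdl_value` and `render_jdl` are hypothetical, and the sketch deliberately covers only strings, numbers and lists, so expression-valued attributes such as Requirements and Rank would need separate handling.

```python
def jdl_value(value):
    """Format a Python value as a JDL literal: numbers bare, lists in braces, strings quoted."""
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render one 'Attribute = expression;' statement per dictionary entry."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


job = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "ls.sh",
    "StdOutput": "stdout.log",
    "StdError": "stderr.log",
    "InputSandbox": ["ls.sh"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "RetryCount": 7,
}
print(render_jdl(job))
```

The last two lines of the printed text are `OutputSandbox = {"stderr.log", "stdout.log"};` and `RetryCount = 7;`.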

Requirements and Rank
Requirements = <logic expression>
- An expression that uses C-like operators. It represents the job's requirements on resources.
- The Requirements expression can contain attributes that describe the CE in the Information System (IS); these are prefixed with "other.", e.g.:
      Requirements = (other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 2);
Rank = <floating point expression>
- A fitness function that uses C-like operators to select the best CE.
- The Rank expression can contain attributes that describe the CE in the IS; these are prefixed with "other.", e.g.:
      Rank = other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs;

Command Line Interface

[glite|edg]-job-* commands
- edg-job-list-match: gets the list of CEs that satisfy the requirements for executing the job
- edg-job-submit: submits the job to the Resource Broker and returns the newly generated job ID
- edg-job-status: retrieves the status of the given job
- edg-job-cancel: cancels a submitted job
- edg-job-get-output: retrieves the output, only if the job has terminated

Example: hostname.jdl (i)
    $> cat hostname.jdl
    Type = "Job";
    JobType = "Normal";                   // it is a standard job
    Executable = "/bin/sh";               // the executable to run
    Arguments = "start_hostname.sh";      // and its argument
    StdError = "stderr.log";              // redirect standard error and
    StdOutput = "stdout.log";             // standard output to these files
    InputSandbox = "start_hostname.sh";   // attach this file to the JDL document
    OutputSandbox = {"stderr.log", "stdout.log"};  // these files have to be retrieved when the job terminates
    RetryCount = 7;                       // if the job fails, retry at most 7 times

    $> cat start_hostname.sh
    #!/bin/sh
    sleep 5
    hostname -f

Example: hostname.jdl
    $> edg-job-submit -o jobid hostname.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb.ct.infn.it, port 7772
    Logging to host glite-rb.ct.infn.it, port 9002
    ================== glite-job-submit Success ==============================
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
    The job identifier has been saved in the following file:
    /home/fscibi/glite/other/jobid
    =====================================================================

    $> cat jobid
    ###Submitted Job Ids###
    https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
Notes:
- The URL after "Your job identifier is:" is the newly generated job ID.
- The job ID is saved to the file "jobid" because of the option -o jobid.
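A script that follows up on submitted jobs first has to read the identifiers back from the file written by the -o option. The Python sketch below does this for the file content shown on the slide; the function name `read_job_ids` is illustrative, and the only format assumption is what the slide shows: a header line starting with `#`, followed by one job ID per line.

```python
def read_job_ids(text):
    """Return the job IDs found in the text of a job-ID file, skipping '#' header lines."""
    job_ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            job_ids.append(line)
    return job_ids


# Content of the jobid file as shown on the slide.
jobid_file = """###Submitted Job Ids###
https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
"""
ids = read_job_ids(jobid_file)
print(len(ids))  # 1
print(ids[0])    # https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
```

In practice the text would come from reading the file itself, for example with `open("jobid").read()`.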

Example: hostname.jdl
    $> edg-job-status -i jobid
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
    Current Status: Done (Success)
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: grid004.iucc.ac.il:2119/jobmanager-lcgpbs-short
    Submitted: Mon Apr 3 12:27:28 2006 CEST
    *************************************************************
Notes:
- Current Status shows that the job has terminated.
- Destination is the Computing Element where the job executed.

Example: hostname.jdl (v)
    $> edg-job-status -v 3 -i jobid
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
    Current Status: Cleared
    Status Reason: user retrieved output sandbox
    Destination: grid004.iucc.ac.il:2119/jobmanager-lcgpbs-short
    Submitted: Mon Apr 3 12:27:28 2006 CEST
    ---
    - stateentertimes =
        Submitted : Mon Apr 3 12:27:28 2006 CEST
        Waiting   : Mon Apr 3 12:27:37 2006 CEST
        Ready     : Mon Apr 3 12:27:42 2006 CEST
        Scheduled : Mon Apr 3 12:28:01 2006 CEST
        Running   : Mon Apr 3 12:28:55 2006 CEST
        Done      : Mon Apr 3 12:30:37 2006 CEST
        Cleared   : Mon Apr 3 15:36:39 2006 CEST
        Aborted   : ---
        Cancelled : ---
        Unknown   : ---
The stateentertimes block shows how the job status changed over time.
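The state entry times in this output are enough to work out how long the job spent in each state. The Python sketch below does the arithmetic with the timestamps from the slide; the trailing "CEST" is dropped before parsing because all timestamps share the same time zone, and the names `STATE_ENTER_TIMES` and `time_in_each_state` are illustrative.

```python
from datetime import datetime

# Entry times copied from the slide, with the trailing "CEST" removed.
STATE_ENTER_TIMES = {
    "Submitted": "Mon Apr 3 12:27:28 2006",
    "Waiting": "Mon Apr 3 12:27:37 2006",
    "Ready": "Mon Apr 3 12:27:42 2006",
    "Scheduled": "Mon Apr 3 12:28:01 2006",
    "Running": "Mon Apr 3 12:28:55 2006",
    "Done": "Mon Apr 3 12:30:37 2006",
    "Cleared": "Mon Apr 3 15:36:39 2006",
}
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def time_in_each_state(enter_times):
    """Seconds spent in each state: entry time of the next state minus its own."""
    names = list(enter_times)
    stamps = [datetime.strptime(enter_times[n], TIME_FORMAT) for n in names]
    return {
        name: int((later - earlier).total_seconds())
        for name, earlier, later in zip(names, stamps, stamps[1:])
    }


durations = time_in_each_state(STATE_ENTER_TIMES)
print(durations["Scheduled"])  # 54    seconds queued on the CE
print(durations["Running"])    # 102   seconds of execution
print(durations["Done"])       # 11162 seconds until the user fetched the output
```

The job ran for under two minutes, while its output stayed on the Resource Broker for just over three hours before being retrieved.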

Example: hostname.jdl (vi)
    $> edg-job-get-output -i jobid
    Retrieving files from host: glite-rb.ct.infn.it ( for https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a )
    *********************************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    - https://glite-rb.ct.infn.it:9000/lb6lihd93s7vyz1rvbcp8a
    have been successfully retrieved and stored in the directory:
    /tmp/glite/glite-ui/fscibi_lb6lihd93s7vyz1rvbcp8a
    *********************************************************************************
The directory shown is the default location where retrieved output is stored. To specify a different directory:
    $> edg-job-get-output -i jobid --dir <dirname>

Example: sphere.jdl (i)
    $> cat sphere.jdl
    #author: giuseppe.larocca@ct.infn.it
    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/sh";
    MyProxyServer = "lxshare0207.cern.ch";
    StdOutput = "sphere.out";
    StdError = "sphere.err";
    InputSandbox = {"start_sphere.sh","sphere1.pov","sphere1.ini"};
    OutputSandbox = {"sphere.out","sphere.err","final_sphere.gif"};
    RetryCount = 7;
    Arguments = "start_sphere.sh";
    Requirements = Member("POVRAY-3.5",other.GlueHostApplicationSoftwareRunTimeEnvironment);
The Requirements expression selects only the CEs where POVRAY is installed.

Example: sphere.jdl (ii)
    $> edg-job-list-match sphere.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb.ct.infn.it, port 7772
    ***************************************************************************
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*
    dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-infinite
    dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-long
    dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-short
    egee008.cnaf.infn.it:2119/blah-pbs-infinite
    egee008.cnaf.infn.it:2119/blah-pbs-long
    egee008.cnaf.infn.it:2119/blah-pbs-short
    fenrir.uniandes.edu.co:2119/blah-pbs-infinite
    fenrir.uniandes.edu.co:2119/blah-pbs-long
    ***************************************************************************
The CEs listed are those where POVRAY is installed.
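The CE identifiers in this listing follow a visible pattern of host, port and a path that ends in a queue name. The Python sketch below splits them apart and counts the matching queues per host. The pattern is only an assumption read off the identifiers on this slide (in particular, that the queue name is the text after the last hyphen), and the function name `parse_ce_id` is illustrative.

```python
from collections import Counter


def parse_ce_id(ce_id):
    """Split 'host:port/manager-queue' into its parts (format assumed from the slide)."""
    endpoint, _, path = ce_id.partition("/")
    host, _, port = endpoint.partition(":")
    # Assumption: the queue name is whatever follows the last hyphen.
    manager, _, queue = path.rpartition("-")
    return {"host": host, "port": int(port), "manager": manager, "queue": queue}


# CE identifiers copied from the edg-job-list-match output.
ce_ids = [
    "dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-infinite",
    "dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-long",
    "dgt01.ui.savba.sk:2119/jobmanager-lcgpbs-short",
    "egee008.cnaf.infn.it:2119/blah-pbs-infinite",
    "egee008.cnaf.infn.it:2119/blah-pbs-long",
    "egee008.cnaf.infn.it:2119/blah-pbs-short",
    "fenrir.uniandes.edu.co:2119/blah-pbs-infinite",
    "fenrir.uniandes.edu.co:2119/blah-pbs-long",
]
queues_per_host = Counter(parse_ce_id(ce_id)["host"] for ce_id in ce_ids)
print(parse_ce_id(ce_ids[0])["queue"])            # infinite
print(queues_per_host["fenrir.uniandes.edu.co"])  # 2
```

Grouping by host makes it clear that the eight matches correspond to only three sites, each exposing several queues.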

Example: sphere.jdl (iii)
    $> edg-job-submit -o jobid sphere.jdl
    Connecting to host glite-rb.ct.infn.it, port 7772
    Logging to host glite-rb.ct.infn.it, port 9002
    ====================== glite-job-submit Success =========================
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb.ct.infn.it:9000/jsowaze9kfzs4tdg1tairq
    The job identifier has been saved in the following file:
    /home/fscibi/glite/other/jobid
    ====================================================================

Example: pds2jpg-asar-demo.jdl (i)
    $> cat pds2jpg-asar-demo.jdl
    [ ]
    VirtualOrganisation = "gilda";
    Executable = "/bin/bash";
    Arguments = "pds2jpg_asar_install.sh ASA_APG_1PXPDE20020819_093043_000000152008_00394_02452_0000";
    StdOutput = "pds2jpg_asar.out";
    StdError = "pds2jpg_asar.err";
    OutputSandbox = {
        "ASA_APG_1PXPDE20020819_093043_000000152008_00394_02452_0000-b1.jpg",
        "ENVISAT_Product_courtesy_of_European_Space_Agency",
        "pds2jpg_asar.out",
        "pds2jpg_asar.err"
    };
    RetryCount = 3;
    JobType = "normal";
    Type = "Job";
    InputSandbox = {"./pds2jpg_asar_install.sh","./beam20.tar.gz"};
    rank = (-other.GlueCEStateEstimatedResponseTime);
    requirements = (other.GlueCEStateStatus == "Production");
Notes:
- The installer script is attached to the JDL through the InputSandbox.
- rank is the fitness function used to select the best CE.

Example: pds2jpg-asar-demo.jdl (ii)
    $> cat pds2jpg_asar_install.sh
    echo Staging Input Data \(Courtesy of European Space Agency\);
    #edg-rm --vo=gilda copyfile lfn:$1.N1 file://$PWD/$1.N1;
    lcg-cp --vo=gilda lfn:$1.N1 file://$PWD/$1.N1;
    echo Staging Application;
    gunzip beam20.tar.gz;
    tar xvf beam20.tar;
    cd beam-2.0/bin
    echo Starting Application;
    ./pds2jpg-ASAR-run.sh $1;
    mv $1-b*.jpg ../..
    cd ../..
    rm -fr beam-2.0;
    rm -fr $PWD/$1.N1;
    rm -fr $PWD/beam20.tar;
    echo Input ENVISAT Product courtesy of European Space Agency
    touch ENVISAT_Product_courtesy_of_European_Space_Agency
    echo No Output Packaging;
    echo Done!;

Questions