Architecture of the WMS

Similar documents
Problemi di schedulazione distribuita su Grid

Advanced Job Submission on the Grid

International Collaboration to Extend and Advance Grid Education. glite WMS Workload Management System

LCG-2 and glite Architecture and components

How to use computing resources at Grid

CREAM-WMS Integration

Parallel Computing in EGI

DataGrid. Document identifier: Date: 24/11/2003. Work package: Partner: Document status. Deliverable identifier:

DataGrid. Document identifier: Date: 28/10/2003. Work package: Partner: Document status. Deliverable identifier:

DataGrid. Document identifier: Date: 16/06/2003. Work package: Partner: Document status. Deliverable identifier:

J O B D E S C R I P T I O N L A N G U A G E A T T R I B U T E S S P E C I F I C A T I O N

Gergely Sipos MTA SZTAKI

Grid Computing. Olivier Dadoun LAL, Orsay. Introduction & Parachute method. Socle 2006 Clermont-Ferrand Orsay)

A Practical Approach for a Workflow Management System

glite Advanced Job Management

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

glite Grid Services Overview

glite Middleware Usage

GRID COMPANION GUIDE

Grid Computing. Olivier Dadoun LAL, Orsay Introduction & Parachute method. APC-Grid February 2007

WMS overview and Proposal for Job Status

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

DataGrid D EFINITION OF ARCHITECTURE, TECHNICAL PLAN AND EVALUATION CRITERIA FOR SCHEDULING, RESOURCE MANAGEMENT, SECURITY AND JOB DESCRIPTION

Future of Grid parallel exploitation

The glite middleware. Presented by John White EGEE-II JRA1 Dep. Manager On behalf of JRA1 Enabling Grids for E-sciencE

AGATA Analysis on the GRID

Troubleshooting Grid authentication from the client side

Grid Compute Resources and Job Management

The glite middleware. Ariel Garcia KIT

Programming the Grid with glite *

Programming the Grid with glite

WMS Application Program Interface: How to integrate them in your code

Job submission and management through web services: the experience with the CREAM service

Ganga The Job Submission Tool. WeiLong Ueng

On the employment of LCG GRID middleware

Multi-thread and Mpi usage in GRID Roberto Alfieri - Parma University & INFN, Gr.Coll. di Parma

glite/egee in Practice

A Login Shell interface for INFN-GRID

EGEE. Grid Middleware. Date: June 20, 2006

Grid Compute Resources and Grid Job Management

Parallel Job Support in the Spanish NGI! Enol Fernández del Cas/llo Ins/tuto de Física de Cantabria (IFCA) Spain

Interconnect EGEE and CNGRID e-infrastructures

Grid Documentation Documentation

Grid Infrastructure For Collaborative High Performance Scientific Computing

E UFORIA G RID I NFRASTRUCTURE S TATUS R EPORT

Dr. Giuliano Taffoni INAF - OATS

Introduction to Grid Infrastructures

EUROPEAN MIDDLEWARE INITIATIVE

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

DIRAC Documentation. Release integration. DIRAC Project. 09:29 20/05/2016 UTC

Gridbus Portlets -- USER GUIDE -- GRIDBUS PORTLETS 1 1. GETTING STARTED 2 2. AUTHENTICATION 3 3. WORKING WITH PROJECTS 4

CERN LCG. LCG Short Demo. Markus Schulz. FZK 30 September 2003

MPI SUPPORT ON THE GRID. Kiril Dichev, Sven Stork, Rainer Keller. Enol Fernández

ISTITUTO NAZIONALE DI FISICA NUCLEARE Sezione di Padova

Workload Management. Stefano Lacaprara. CMS Physics Week, FNAL, 12/16 April Department of Physics INFN and University of Padova

DataGrid TECHNICAL PLAN AND EVALUATION CRITERIA FOR THE RESOURCE CO- ALLOCATION FRAMEWORK AND MECHANISMS FOR PARALLEL JOB PARTITIONING

DIRAC pilot framework and the DIRAC Workload Management System

R-GMA (Relational Grid Monitoring Architecture) for monitoring applications

DataGrid EDG-BROKERINFO USER GUIDE. Document identifier: Date: 06/08/2003. Work package: Document status: Deliverable identifier:

A unified user experience for MPI jobs in EMI

Parallel computing on the Grid

Troubleshooting Grid authentication from the client side

QosCosGrid Middleware

Scalable Computing: Practice and Experience Volume 10, Number 4, pp

Overview of WMS/LB API

GROWL Scripts and Web Services

Bookkeeping and submission tools prototype. L. Tomassetti on behalf of distributed computing group

Overview of HEP software & LCG from the openlab perspective

Tutorial 4: Condor. John Watt, National e-science Centre

AN INTRODUCTION TO WORKFLOWS WITH DAGMAN

glideinwms Training Glidein Internals How they work and why by Igor Sfiligoi, Jeff Dost (UCSD) glideinwms Training Glidein internals 1

Monitoring the Usage of the ZEUS Analysis Grid

Adding Support For a New Resource Manager

Tutorial for CMS Users: Data Analysis on the Grid with CRAB

PoS(ACAT)020. Status and evolution of CRAB. Fabio Farina University and INFN Milano-Bicocca S. Lacaprara INFN Legnaro

DSWR User Guide. In effect from January 29 th,, BCLDB Direct Sales Web Reporting User Guide Page 1

History of SURAgrid Deployment

DiskBoss DATA MANAGEMENT

MyProxy Server Installation

SyncBreeze FILE SYNCHRONIZATION. User Manual. Version Dec Flexense Ltd.

Architecture Proposal

A Virtual Observatory for Pulsar Astronomy

Multiple Broker Support by Grid Portals* Extended Abstract

CHARON System Framework for Applications and Jobs Management in Grid Environment

Table of Contents. Table of Contents Job Manager for remote execution of QuantumATK scripts. A single remote machine

Monitoring System for the GRID Monte Carlo Mass Production in the H1 Experiment at DESY

Lesson 6: Portlet for job submission

HTCONDOR USER TUTORIAL. Greg Thain Center for High Throughput Computing University of Wisconsin Madison

Monitoring tools in EGEE

AliEn Resource Brokers

DiskBoss DATA MANAGEMENT

Improved 3G Bridge scalability to support desktop grid executions

Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, November 2008

A Grid Software for Virtual Eye Surgery Based on Globus 4 and glite (DRAFT)

Introduction. creating job-definition files into structured directories etc.

CE+WN+siteBDII Installation and configuration

FREE SCIENTIFIC COMPUTING

M. Roehrig, Sandia National Laboratories. Philipp Wieder, Research Centre Jülich Nov 2002

WLCG Lightweight Sites

Condor Local File System Sandbox Requirements Document

Transcription:

Architecture of the WMS Dr. Giuliano Taffoni INFORMATION SYSTEMS UNIT

Outline This presentation will cover the following arguments: Overview of WMS Architecture Job Description Language Overview WMProxy overview New! = New in glite 3.0 2

First Part Architecture of the glite WMS 3

4 Scientific Instruments and sensors in the Grid, ICTP Trieste, 24.04.2007 Workload Manager Services

Workload Manager Services User request 4

Workload Manager Services User request 4

Workload Manager Services User request WMS traslator 4

Workload Manager Services User request WMS traslator 4

Workload Manager Services User request WMS traslator 4

Workload Manager Services User request WMS traslator 4

Workload Manager Services User request WMS traslator grid resour ce 4

WMS Objectives The Workload Management System (WMS) comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources. 5 The purpose of the Workload Manager (WM) is accept and satisfy requests for job management coming from its clients meaning of the submission request is to pass the responsibility of the job to the WM. WM will pass the job to an appropriate CE for execution taking into account requirements and the preferences

WMS Architecture New! New! New! Job management requests (submission, cancellation) expressed via a Job Description Language (JDL) 6

WMS Architecture New! New! New! Job Finds management appropriate requests CE for (submission, each cancellation) request, taking expressed into account job via requests a Job Description and preferences, Grid Language status, utilization (JDL) policies on resources 7

WMS Architecture New! New! New! Keeps submission Job management requests requests (submission, cancellation) Requests are expressed kept via a Job for a Description while if Language no resources (JDL) are immediately available 8

WMS Architecture New! New! New! Repository Keeps submission of resource Job management requests information requests available (submission, to matchmaker cancellation) Requests are expressed kept via Updated a Job for a Description via while notifications if Language no resources and/or (JDL) active are immediately polling on available resources 9

WMS Architecture New! New! New! Performs the actual job submission and monitoring 10

NS and WMProxy The Network Server (NS) is a generic network daemon that provides support for the job control functionality. It is responsible for accepting incoming requests from the WMS-UI (e.g. job submission, job removal), which, if valid, are then passed to the Workload Manager. The Workload Manager Proxy (WMProxy) is a service providing access to WMS functionality through a Web Services based interface. Besides being New! 11

WMS Information Supermarket ISM represents one of the most notable New improvements! in the WM The ISM basically consists of a repository of resource information that is available in read only mode to the matchmaking engine the update is the result of 12

WMS Task Queue The Task Queue represents the second New! most notable improvement in the WM internal design possibility to keep a submission request for a while if no resources are immediately available that match the job requirements technique used by the AliEn and Condor systems Non-matching requests will be retried either periodically eager scheduling approach 13

WMS Job Submission Services WMS components handling the job during its lifetime and performs the submission Job Adapter (JA) is responsible for making the final touches to the JDL expression for a job, before it is passed to CondorC for the actual submission creating the job wrapper script that creates the appropriate execution environment in the CE worker node transfer of the input and of the output sandboxes CondorC responsible for performing the actual job management operations DAGMan meta-scheduler 14 job submission, job removal New! purpose is to navigate the graph determine which nodes are free of dependencies follow the execution of the corresponding jobs

Log Monitor (LM) is responsible for Scientific Instruments and sensors in the Grid, ICTP Trieste, 24.04.2007 WMS Job Submission Services watching the CondorC log file intercepting interesting events concerning active jobs Proxy Renewal Service is responsible for assuring that, for all the lifetime of a job, a valid user proxy exists within the WMS MyProxy Server is contacted in order to renew the user's credential Logging & Bookkeeping (LB) is responsible for Storing events generated by the various components of the WMS Delivering to the user information about the job s status 15

WMS Job Submission Services Jobs State Machine (1/9) 16

WMS Job Submission Services Submitted job is entered by the user to the User Interface but not yet transferred Jobs to Network State Server Machine (1/9) for processing 16

WMS Job Submission Services Jobs State Machine (2/9) 17

WMS Job Submission Services Waiting job accepted by NS and waiting for Workload Manager processing or being processed by WMHelper modules. Jobs State Machine (2/9) 17

WMS Job Submission Services Jobs State Machine (3/9) 18

WMS Job Submission Services Ready job processed by Jobs State Machine (3/9) W M b u t n o t y e t transferred to the CE (local batch system queue). 18

WMS Job Submission Services Jobs State Machine (4/9) 19

WMS Job Submission Services Jobs State Machine (4/9) Scheduled job waiting in the queue on the CE. 19

WMS Job Submission Services Jobs State Machine (5/9) 20

WMS Job Submission Services Jobs State Machine (5/9) Running job is running. 20

WMS Job Submission Services Jobs State Machine (6/9) 21

WMS Job Submission Services Jobs State Machine (6/9) Done job exited or considered to be in a terminal state by CondorC (e.g., submission to CE has failed in an unrecoverable way). 21

WMS Job Submission Services Jobs State Machine (7/9) 22

WMS Job Submission Services Aborted job processing Jobs State Machine (7/9) was aborted by WMS (waiting in the WM queue or CE for too long, expiration of user credentials). 22

WMS Job Submission Services Jobs State Machine (8/9) 23

WMS Job Submission Services Jobs State Machine (8/9) Cancelled job has been successfully canceled on user request. 23

WMS Job Submission Services Jobs State Machine (9/9) 24

WMS Job Submission Services Jobs State Machine (9/9) Cleared output sandbox was transferred to the user or removed due to the timeout. 24

25 Scientific Instruments and sensors in the Grid, ICTP Trieste, 24.04.2007 Service architecture

Service architecture User interface possible operations 25

Service architecture Find the list of resources suitable to run a specific job User interface possible operations 25

Service architecture Find the list of resources suitable to run a specific job User interface possible operations Submit a job/dag for execution on a remote Computing Element 25

Service architecture Find the list of resources suitable to run a specific job User interface possible operations Submit a job/dag for execution on a remote Computing Element Check the status of a submitted job/dag 25

Service architecture Find the list of resources suitable to run a specific job User interface possible operations Submit a job/dag for execution on a remote Computing Element Check the status of a submitted job/dag Cancel one or more submitted jobs/dags 25

Service architecture Find the list of resources suitable to run a specific job Submit a job/dag for execution on a remote Computing Element User interface possible operations Retrieve checkpoint states of a submitted checkpointable job Check the status of a submitted job/dag Cancel one or more submitted jobs/dags 25

Service architecture Find the list of resources suitable to run a specific job Submit a job/dag for execution on a remote Computing Element User interface possible operations Retrieve the output files of a completed job/dag (output sandbox) Retrieve checkpoint states of a submitted checkpointable job Check the status of a submitted job/dag Cancel one or more submitted jobs/dags 25

Service architecture Find the list of resources suitable to run a specific job Submit a job/dag for execution on a remote Computing Element Check the status of a submitted job/dag Cancel one or more submitted jobs/dags User interface possible operations Retrieve the output files of a completed job/dag (output sandbox) Retrieve and display bookkeeping information about submitted jobs/ DAGs Retrieve checkpoint states of a submitted checkpointable job 25

Service architecture Find the list of resources suitable to run a specific job Submit a job/dag for execution on a remote Computing Element Check the status of a submitted job/dag Cancel one or more submitted jobs/dags User interface possible operations Retrieve the output files of a completed job/dag (output sandbox) Retrieve and display bookkeeping information about submitted jobs/ DAGs Retrieve and display logging information about submitted jobs/dags Retrieve checkpoint states of a submitted checkpointable job 25

Service architecture Find the list of resources suitable to run a specific job Submit a job/dag for execution on a remote Computing Element User interface possible operations Retrieve the output files of a completed job/dag (output sandbox) Retrieve checkpoint states of a submitted checkpointable job Check the status of a submitted job/dag Cancel one or more submitted jobs/dags Retrieve and display bookkeeping information about submitted jobs/ DAGs Retrieve and display logging information about submitted jobs/dags Start a local listener for an interactive job 25

Command Line Interface The most relevant commands to interact with the WMS (NS): edg-job-submit <jdl_file> edg-job-list-match <jdl_file> edg-job-status <job_id> edg-job-get-output <job_id> edg-job-cancel <job_id> In glite 3.0: 26

Command Line Interface Command Line Interface Job Submission Perform the job submission to the Grid. $ edg-job-submit [options] <jdl_file> $ glite-job-submit [options] <jdl_file> where <jdl file> is a file containing the job description, usually with extension.jdl. 27

Command Line Interface Command Line Interface If the request has been correctly submitted this is the tipical output that you can get: edg-job-submit test.jdl ====================glite-job-submit Success ===================== The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job identifier (edg_jobid) is: - https://lxshare0234.cern.ch:9000/ribubkffkhnsq6cjiluy8q ============================================================== In case of failure, an error message will be displayed instead, and an exit status different form zero will be retured. 28

Command Line Interface Command Line Interface It is possible to see which CEs are eligible to run a job specified by a given JDL file using the command edg-job-list-match test.jdl Connecting to host lxshare0380.cern.ch, port 7772 Selected Virtual Organisation name (from UI conf file): dteam ********************************************************************* COMPUTING ELEMENT IDs LIST The following CE(s) matching your job requirements have been found: adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite adc0015.cern.ch:2119/jobmanager-lcgpbs-long adc0015.cern.ch:2119/jobmanager-lcgpbs-short ********************************************************************** 29

Command Line Interface Command Line Interface After a job is submitted, it is possible to see its status using the glite-job-status command. edg-job-status https://lxshare0234.cern.ch:9000/x-ehtxfdlxxsoidvls0l0w ************************************************************* BOOKKEEPING INFORMATION: Printing status info for the Job: https://lxshare0234.cern.ch:9000/x-ehtxfdlxxsoidvls0l0w Current Status: Scheduled Status Reason: unavailable Destination: lxshare0277.cern.ch:2119/jobmanager-pbs-infinite reached on: Fri Aug 1 12:21:35 2003 ************************************************************* 30

Command Line Interface Command Line Interface A job can be canceled before it ends using the command glite-job-cancel. edg-job-cancel https://lxshare0234.cern.ch:9000/dae162is6estca0vqhvkog Are you sure you want to remove specified job(s)? [y/n]n :y =================== glite-job-cancel Success==================== The cancellation request has been successfully submitted for the following job(s) - https://lxshare0234.cern.ch:9000/dae162is6estca0vqhvkog =========================================================== 31

Command Line Interface After the job has finished Command (it reaches the Line DONE Interface status), its output can be copied to the UI edg-job-get-output https://lxshare0234.cern.ch:9000/snpegp1ymjcns22yf5pflg Retrieving files from host lxshare0234.cern.ch ***************************************************************** JOB GET OUTPUT OUTCOME Output sandbox files for the job: - https://lxshare0234.cern.ch:9000/snpegp1ymjcns22yf5pflg have been successfully retrieved and stored in the directory: /tmp/joboutput/snpegp1ymjcns22yf5pflg ***************************************************************** By default, the output is stored under /tmp, but it is possible to specify in which directory to save the output using the - -dir <path name> option. 32

Second Part Job Description Language 33

Job Description Language The JDL is used in glite to specify the job s characteristics and constrains, which are used during the match-making process to select the best resources that satisfy job s requirements. 34

Job Description Language The JDL syntax consists on statements like: Attribute = value; Comments must be preceded by a sharp character ( # ) or have to follow the C++ syntax 35 WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

Job Description Language In a JDL, some attributes are mandatory while others are optional. An essential JDL is the following: Executable = test.sh ; StdOutput = std.out ; StdError = std.err ; InputSandbox = { test.sh }; OutputSandbox = { std.out, std.err }; If needed, arguments to the executable can be passed: Arguments = Hello World! ; 36

If the argument contains quoted strings, the quotes must be escaped with a backslash e.g. Arguments = \ Hello World!\ 10 ; Special characters such as &,, >, < are only allowed if specified inside a quoted 37

Workload Manager Service The JDL allows the description of the following request types supported by the WMS: Job: a simple application DAG: a direct acyclic graph of dependent jobs With WMSProxy Collection: a set of independent jobs With WMSProxy 38

Jobs The Workload Management System currently supports the following types for Jobs : Normal a simple batch, a set of commands to be processed as single unit Interactive a job whose standard streams are forwarded to the submitting client MPICH a parallel application using MPICH-P4 implementation of MPI Partitionable a job which is composed by a set of independent steps/ iterations Checkpointable a job able to save its state Parametric a job where one or more of its attributes are parameterized Support for MPI and parametric jobs is only available when the submission to the WMS is done through the WMProxy service 39

Jobs The Workload Management System currently supports the a following set of independent types subjobs, each one taking care for Jobs : of a step or of a sub-set of steps, and Normal which a can simple be batch, a set of commands to be processed as single executed unit in parallel Interactive a job whose standard streams are forwarded to the submitting client MPICH a parallel application using MPICH-P4 implementation of MPI Partitionable a job which is composed by a set of independent steps/ iterations Checkpointable a job able to save its state Parametric a job where one or more of its attributes are parameterized Support for MPI and parametric jobs is only available when the submission to the WMS is done through the WMProxy service 39

Jobs The Workload Management System currently supports the a following set of independent types subjobs, each one taking care for Jobs : of a step or of a sub-set of steps, and Normal which a can simple be batch, a set of commands to be processed as single executed unit in parallel Interactive a job whose standard streams are forwarded the job to execution the can be submitting client suspended and resumed later, MPICH a parallel application using MPICH-P4 implementation starting from the of MPI same point whe it was first stopped Partitionable a job which is composed by a set of independent steps/ iterations Checkpointable a job able to save its state Parametric a job where one or more of its attributes are parameterized Support for MPI and parametric jobs is only available when the submission to the WMS is done through the WMProxy service 39

JDL: Relevant Attributes JobType (optional) Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric Or combination of them Checkpointable, Interactive Checkpointable, MPI E.g. JobType = Interactive ; JobType = { Interactive, Checkpointable }; 40

Executable (mandatory) This is a string representing the executable/ command name. The user can specify an executable which is already on the remote CE Executable = { /opt/egeode/gct/egeode.sh }; The user can provide a local executable name, which will be staged from the UI to the WN. Executable = { egeode.sh }; InputSandbox = { /home/larocca/egeode/ egeode.sh }; 41

Arguments (optional) This is a string containing all the job command line arguments. E.g.: If your executable sum has to be started as: $ sum N1 N2 out result.out 42

Environment (optional) List of environment settings needed by the job to run properly E.g. Environment = { JAVA_HOME=/usr/java/ j2sdk1.4.2_08 }; InputSandbox (optional) List of files on the UI local disk needed by the job for proper running The listed files will be automatically staged to the remote resource E.g. InputSandbox ={ myscript.sh, /tmp/cc.sh }; 43

OutputSandbox (optional) List of files, generated by the job, which have to be retrieved from the CE E.g. OutputSandbox ={ std.out, std.err, image.png }; 44

Requirements (optional) Job requirements on computing resources Specified using attributes of resources published in the Information Service If not specified, default value defined in UI configuration file is considered Default. Requirements = other.gluecestatestatus == "Production ; Requirements=other.GlueCEUniqueID == adc006.cern.ch:2119/ jobmanager-pbs-infinite Requirements=Member( ALICE-3.07.01, other.gluehostapplicationsoftwareruntimeenvironment); 45

References JDL Attributes http://server11.infn.it/workload-grid/docs/ DataGrid-01-TEN-0142-0_2.pdf 46 https://edms.cern.ch/document/590869/1 http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/ api_doc/wms_jdl/index.html LCG-2 User Guide Manual Series https://edms.cern.ch/file/454439/lcg-2- UserGuide.html

Third Part Workload Manager Proxy 47

WMProxy (Workload Manager Proxy) is a new service providing access to the glite Workload Management System (WMS) functionality through a simple Web Services based interface. has been designed to handle a large number of requests for job submission glite 1.5 => ~180 secs for 500 jobs goal is to get in the short term to ~60 secs for 1000 jobs it provides additional features such as bulk submission and the support for shared and compressed sandboxes for compound jobs. It s the natural replacement of the NS in the passage to the SOA approach. 48

Support for new types strongly relies on newly developed JDL converters and on the DAG submission support all JDL conversions are performed on the server a single submission for several jobs All new request types can be monitored and controlled through a single handle (the request id) each sub-jobs can be however followed-up and controlled independently through its own id Smarter WMS client commands/api allow submission of DAGs, collections and parametric jobs exploiting the concept of shared sandbox allow automatic generation and submission of collections and DAGs from sets of JDL files located in user specified directories on the UI 49

WMProxy C++ client commands The commands to interact with WMProxy Service are: glite-wms-job-submit <jdl_file> glite-wms-job-list-match <jdl_file> glite-wms-job-cancel <job_ids> In our examples: glite-wms-job-* are edg-job-* 50

References 51 glite 3.0 User Guide https://edms.cern.ch/file/722398/1.1/glite-3-userguide.pdf R-GMA overview page http://www.r-gma.org/ GLUE Schema http://infnforge.cnaf.infn.it/glueinfomodel/ JDL attributes specification for WM proxy https://edms.cern.ch/document/590869/1 WMProxy quickstart http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/ wmproxy_client_quickstart.shtml WMS user guides https://edms.cern.ch/document/572489/1

52 Scientific Instruments and sensors in the Grid, ICTP Trieste, 24.04.2007 Questions