Resource Allocation in Computational Grids
1 Resource Allocation in Computational Grids. Riccardo Murri, Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich. Nov. 23, 21
2 Scheduling on a cluster. [Diagram: a user connects over the internet via ssh username@server to the batch system server, which controls compute nodes 1..N over a local 1Gb/s ethernet network.] All job requests are sent to a central server. The server decides which job runs where and when. Grid resource allocation R. Murri, Large Scale Computing Infrastructures, Nov. 23, 21
3 where: resource allocation model. Computing resources are defined by a structured set of attributes (key=value pairs). SGE's default configuration defines 53 such attributes: number of available cores/CPUs; total size of RAM/swap; current load average; etc. A node is eligible for running a job iff the node attributes are compatible with the job resource requirements. (Other batch systems are similar.)
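The eligibility rule can be sketched in a few lines of Python. This is only an illustration of the idea, not SGE's actual matching code, and the attribute names are invented for the example:

```python
# A node is eligible iff every job requirement (a predicate over one
# node attribute) is satisfied. Attribute names are hypothetical.

def eligible(node_attrs, job_reqs):
    """Return True iff every (key, predicate) requirement holds."""
    return all(key in node_attrs and pred(node_attrs[key])
               for key, pred in job_reqs.items())

node = {"num_proc": 8, "mem_total_gb": 32, "load_avg": 0.7}
job = {"num_proc": lambda v: v >= 4,      # need at least 4 cores
       "mem_total_gb": lambda v: v >= 16}  # and at least 16 GB RAM

print(eligible(node, job))  # True: the node satisfies both requirements
```

A requirement on an attribute the node does not advertise simply never matches, which is the same behavior the scratch-space example later in these slides runs into.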
4 when: scheduling policy. There are usually more jobs than the system can handle concurrently. (Even more so in the high-throughput computing cases we are interested in.) So job requests must be prioritized. Prioritization of requests is a matter of the local scheduling policy. (And this differs greatly among batch systems and among sites.)
5 (Hidden) assumptions. 1. The scheduling server has complete knowledge of the nodes: local networks have low latency (average RTT 0.3 ms on 1Gb/s ethernet) and the status information is a small packet. 2. The server has complete control over the nodes: a compute node will immediately execute a job when told to by the server.
6 How does this extend to Grid computing? By definition of a Grid: 1. It is geographically distributed: high-latency links (hence resource status may not be up-to-date); the network is easily partitioned and nodes disconnected (hence resources have a dynamic nature: they may come and go). 2. Resources come from multiple control domains: prioritization is a matter of local policy! AuthZ and other issues may prevent execution at all.
7 The Globus/ARC model. [Diagram: the arcsub/arcstat/arcget client talks over the internet to several independent batch system servers, each controlling compute nodes 1..N over its own local 1Gb/s ethernet network.] An infrastructure is a set of independent clusters. The client host selects one cluster and submits a job there, then periodically polls for status information.
8 Issues in the Globus/ARC approach? 1. How to select a good execution site? 2. How to gather the required information from the sites? 3. Based on the same information, two clients can arrive at the same scheduling decision, hence they can flood a site with jobs. 4. Actual job start times are unpredictable, as scheduling is ultimately a local decision. 5. Client polling increases the load linearly with the number of jobs.
9 The MDS InfoSystem, I. [Diagram: as in the previous slide, but each cluster's batch system server now runs a GRIS, which the arcsub/arcstat/arcget client queries over the internet.] The Globus Monitoring and Discovery Service.
10 The MDS InfoSystem, II. A specialized service provides information about site status. Each site reports its information to a local database (GRIS). Each GRIS registers with a global indexing service (GIIS). The client talks to the GIIS to get the list of sites, and then queries each GRIS for the site-specific information.
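The two-step lookup can be modeled with plain data structures (contents invented for illustration; the real GIIS and GRIS services speak LDAP):

```python
# Step 1: the GIIS only knows *which* sites exist.
# Step 2: each site's GRIS holds the site-specific details.
giis = ["grisA", "grisB"]  # global index: the registered sites
gris = {"grisA": {"cpus": 224, "queued": 4},
        "grisB": {"cpus": 64,  "queued": 0}}

def discover():
    """Ask the index for the site list, then query every site."""
    return {site: gris[site] for site in giis}

print(discover()["grisB"]["cpus"])  # 64
```

Note that the client must contact every GRIS itself; this is exactly why the cost of gathering information grows with the number of sites.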
11 LDAP. The protocol underlying MDS is called LDAP. LDAP allows remote read/write access to a distributed database (the X.500 directory system), with a flexible authentication and authorization scheme. LDAP makes the assumption that most accesses are reads, so LDAP servers are optimized for infrequent writes. Reference: A. S. Tanenbaum, Computer Networks, ISBN
12 LDAP schemas. Entries in an LDAP database are sets of key/value pairs. (Keys need not be unique; equivalently, a key can map to multiple values.) An LDAP schema specifies the names of the allowed keys and the types of the corresponding values. Each entry declares a set of schemas it conforms to; every attribute in an LDAP entry must be defined in some schema.
13 X.500/LDAP Directories. Entries are organized into a tree structure (DIT). (So LDAP queries return subtrees, as opposed to the flat sets of rows of an RDBMS query.) Each entry is uniquely identified by a Distinguished Name (DN). The DN of an entry is formed by prefixing one or more attribute values to the parent entry's DN. LDAP accesses might result in referrals, which redirect the client to another entry at a remote server.
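DN construction can be sketched directly from this rule. The helper below is hypothetical (not part of any LDAP library); the attribute names come from the ARC example on the next slide:

```python
# Build a child's DN by prefixing its naming attribute(s) (the RDN)
# to the parent's DN, comma-separated.

def child_dn(parent_dn, **naming_attrs):
    rdn = "+".join(f"{k}={v}" for k, v in naming_attrs.items())
    return f"{rdn},{parent_dn}" if parent_dn else rdn

root = child_dn("", o="grid")
vo = child_dn(root, **{"mds-vo-name": "switzerland"})
cluster = child_dn(vo, **{"nordugrid-cluster-name": "gordias.unige.ch"})
print(cluster)
# nordugrid-cluster-name=gordias.unige.ch,mds-vo-name=switzerland,o=grid
```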
14 Example. This is how the ARC MDS represents information about a cluster queue in LDAP:
# all.q, gordias.unige.ch, Switzerland, grid
dn: nordugrid-queue-name=all.q,nordugrid-cluster-name=gordias.unige.ch,mds-vo-name=switzerland,o=grid
objectclass: Mds
objectclass: nordugrid-queue
nordugrid-queue-name: all.q
nordugrid-queue-status: active
nordugrid-queue-comment: sge default queue
nordugrid-queue-homogeneity: TRUE
nordugrid-queue-nodecpu: Xeon 2800 MHz
nordugrid-queue-nodememory: 2048
nordugrid-queue-architecture: x86_64
nordugrid-queue-opsys: ScientificLinux-5.5
nordugrid-queue-totalcpus: 224
nordugrid-queue-gridqueued: 0
nordugrid-queue-prelrmsqueued: 4
nordugrid-queue-gridrunning: 0
nordugrid-queue-running: 0
nordugrid-queue-maxrunning: 136
nordugrid-queue-localqueued: 4
15 Based on the information in the previous slide, can you decide whether to send a job that requires 200GB of scratch space to this cluster?
16 The MDS cluster model. Exactly: there's no way to make that decision. ARC (and Globus) only provide CPU/RAM/architecture information. In addition, they assume clusters are organized into homogeneous queues, which might not be the case. This is just an example of a more general problem: what information do we need about a remote cluster, and how do we represent it? Reference: B. Kónya, The ARC Information System, infosys.pdf
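The point is easy to see by parsing the queue entry above into a dictionary (multi-valued keys collect into lists; only a few attributes are reproduced here) and then looking for a scratch-space attribute, which the schema simply does not define:

```python
LDIF = """\
nordugrid-queue-name: all.q
nordugrid-queue-totalcpus: 224
nordugrid-queue-nodememory: 2048
nordugrid-queue-architecture: x86_64
"""

def parse_ldif(text):
    """Parse 'key: value' lines into a dict of value lists."""
    entry = {}
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        entry.setdefault(key, []).append(value)
    return entry

entry = parse_ldif(LDIF)
print(entry.get("nordugrid-queue-nodememory"))    # ['2048'] -- RAM is there
print(entry.get("nordugrid-queue-scratchspace"))  # None -- no such attribute
```

(The attribute name `nordugrid-queue-scratchspace` is made up to illustrate the gap: no attribute in the queue entry carries scratch-space information.)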
17 MDS performance. The complete LDAP tree of the SMSCG grid counts over entries. A full dump of the SMSCG infosystem tree requires about 30 seconds. So: 1. information is several seconds old (on average); 2. it does not make sense to refresh information more often than this. By default, ARC refreshes the infosystem every 60 seconds.
18 Supported and unsupported use cases, I. Pre-installed application: OK. The ARC InfoSys has a generic mechanism ("run time environments") for providing information about installed software. So you can select only sites that provide the application you want. (And the information provided in the InfoSys is usually enough to make a good guess about the overall performance.)
19 Supported and unsupported use cases, II. Single-thread CPU-intensive native binary: OK. However, the binary must not require unusual dynamic libraries, and it cannot use CPU-specific features (there is no information on the CPU model, so you cannot broker on that).
20 Supported and unsupported use cases, III. Java/Python/Ruby/R script: requires brokering based on a large number of support libraries/packages; if the dependencies are not there, the program cannot run. In theory, run time environments solve this issue. In practice, there is always less information available than would be useful, and providing all the information that would be useful is too much work. Ultimately, it relies on convention and good practice.
21 Supported and unsupported use cases, IV. Code benchmarking: FAIL. Benchmarking code requires running all cases under the same conditions. There is just no way to guarantee that with the federation-of-clusters model: e.g., the site batch scheduler may run two jobs on compute nodes with different CPUs.
22 Supported and unsupported use cases, V. Parallel jobs: FAIL. You can request a certain number of CPUs, but you have no information and no control over: CPU/thread allocation (all slots in a single large SMP machine? slots distributed evenly across nodes?); communication mechanism (which MPI library is used? which transport fabric?). (In theory, this can be solved by a careful choice of run time environments. In practice, it means that everybody has to agree on how to represent that information, so it just replicates the schema problem.)
23 ARC: Pros and Cons. Pros: very simple to deploy, easy to extend; system and code complexity still manageable. Cons: the burden for scaling up is on each site, but not all sites have the required know-how/resources; the complexity of managing large collections of jobs is on the client software side; the fixed infosystem schema does not accommodate certain use cases.
24 The glite approach. [Diagram: the glite job submit client talks to the WMS, which queries a top BDII; each site runs a site BDII and a batch system server controlling compute nodes 1..N over a local 1Gb/s ethernet network.] Reference: content/article/51-generaltechdocs/57-archoverview
25 The glite WMS. Server-centric architecture: all jobs are submitted to the WMS server. The WMS inspects the Grid status, makes the scheduling decision and submits jobs to sites. The WMS also monitors jobs as they run, and fetches back the output when a job is done. The client polls the WMS, and when a job is done gets the output from the WMS.
26 The glite infosystem, I. Hierarchical architecture, based on LDAP: 1. each Grid element runs its own LDAP server (resource BDII) providing information on the software status and capabilities; 2. a site BDII polls the local element servers and aggregates the information into a site view; 3. a top BDII polls the site BDIIs and aggregates the information into a global view. Each step requires processing the collected entries and creating a new LDIF tree based on the new information.
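The three-level aggregation can be sketched as follows (data and naming scheme invented for illustration; real BDIIs exchange LDIF over LDAP):

```python
# resource BDIIs: one per Grid element, keyed here by "element.site"
resource_bdiis = {
    "ce01.siteA": {"state": "Production", "free_slots": 12},
    "se01.siteA": {"state": "Production", "free_tb": 40},
    "ce01.siteB": {"state": "Draining",   "free_slots": 0},
}

def site_view(site):
    """A site BDII polls the resource BDIIs of its own elements."""
    return {name: info for name, info in resource_bdiis.items()
            if name.endswith(site)}

def top_view(sites):
    """The top BDII polls every site BDII and merges the results."""
    view = {}
    for s in sites:
        view.update(site_view(s))
    return view

grid = top_view(["siteA", "siteB"])
print(len(grid))  # 3 entries in the global view
```

Each level rebuilds its view from whatever it managed to poll, which is why a slow or unreachable site directly delays the top-level refresh.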
27 The glite infosystem, II. The CREAM computing element at CSCS has 43 entries in its resource BDII; listing them takes 0.5 seconds. The CSCS site BDII has 191 entries; listing them takes 0.5 seconds. The CERN top BDII has > entries, collected from circa 200 sites; listing them all takes over 2 minutes.
28 The GLUE schema. The glite information system represents system status based on the GLUE schema. (Version 1.3 is currently being phased out in favor of v. 2.0.) It is a comprehensive and complex schema: 1. aimed at interoperability among Grid providers; 2. an attempt to cover every feature supported by the major middlewares and production infrastructures (esp. HEP); 3. heavy use of cross-entry references. It can accommodate the scratch space example, but there's still no way of figuring out whether (and how) a job can request 16 cores on the same physical node.
29 Comparison with ARC's InfoSystem. ARC stores information about jobs and users in the infosystem: this results in a relatively large number of entries, so the ARC infosys cannot scale to a large high-throughput infrastructure. However, glite's BDII puts a large load on the top BDII: it must handle the load from all clients, and it must be able to poll all site BDIIs in a fixed time, so it must cope with network timeouts, slow sites, etc.
30 glite WMS: Pros and Cons. Pros: global view of the Grid, so it could take better meta-scheduling decisions; can support aggregate job types (e.g., workflows); aggregates the monitoring operations, so it reduces the load on sites. Cons: the WMS is a single point of failure; clients still use a polling mechanism, so the WMS must sustain the load; it is an extremely complex piece of software running on a single machine: very hard to scale up! It relies on an infosystem to take sensible decisions (the fixed schema/representation problem).
31 Condor. [Diagram: condor_submit sends jobs through a condor_agent to the condor_master; a condor_resource runs in front of each cluster's batch system server and its compute nodes 1..N on a local 1Gb/s ethernet network.]
32 Condor overview. Agents (client-side software) and Resources (cluster-side software) advertise their requests and capabilities to the Condor Master. The Master performs match-making between the Agents' requests and the Resources' offerings. An Agent sends its computational job directly to the matching Resource. Reference: Thain, D., Tannenbaum, T. and Livny, M. (2005): Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice and Experience, 17.
33 What is matchmaking?
34 Matchmaking, I. Condor uses the same idea, except the schema is not fixed. Agents and Resources report their requests and offers using the ClassAd format (an enriched key=value format). There is no prescribed schema, hence a Resource is free to advertise any interesting feature it has, and to represent it in any way that fits the key=value model.
35 Matchmaking, II. 1. Agents specify a Requirements constraint: a boolean expression that can use any value from the Agent's own (self) ClassAd or the Resource's (other). 2a. Resources whose offered ClassAd does not satisfy the Requirements constraint are discarded. 2b. Conversely, if the Agent's ClassAd does not satisfy the Resource's Requirements, the Resource is discarded. 3. The surviving Resources are sorted according to the value of the Rank expression in the Agent's ClassAd, and their list is returned to the Agent.
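Steps 1–3 can be sketched in Python. This is a toy model, not Condor's implementation: ClassAds are plain dicts, `Requirements` is a predicate over (self, other), `Rank` is a scoring function, and all attribute names are invented:

```python
def matchmake(agent, resources):
    # steps 2a/2b: keep only resources where *both* Requirements hold
    mutual = [r for r in resources
              if agent["Requirements"](agent, r)
              and r["Requirements"](r, agent)]
    # step 3: sort survivors by the agent's Rank, best first
    return sorted(mutual, key=lambda r: agent["Rank"](agent, r),
                  reverse=True)

agent = {
    "Owner": "alice",
    "Requirements": lambda self, other: other["Arch"] == "x86_64",
    "Rank": lambda self, other: other["Memory"],  # prefer more RAM
}
big = {"Name": "big", "Arch": "x86_64", "Memory": 64,
       "Requirements": lambda self, other: other["Owner"] != "rival"}
small = {"Name": "small", "Arch": "x86_64", "Memory": 8,
         "Requirements": lambda self, other: True}
arm = {"Name": "arm", "Arch": "aarch64", "Memory": 128,
       "Requirements": lambda self, other: True}

ranked = matchmake(agent, [small, arm, big])
print([r["Name"] for r in ranked])  # ['big', 'small'] -- arm discarded
```

Note the symmetry: the `arm` resource is discarded by the agent's Requirements, and a resource's own Requirements can equally discard an agent.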
36 Example: Job ClassAd. Select 64-bit Linux hosts, and sort them preferring hosts with larger memory and CPU speed:
Requirements = Arch == "x86_64" && OpSys == "LINUX"
Rank = TARGET.Memory + TARGET.Mips
Agent ClassAds play a role similar to job descriptions in ARC/gLite: they specify the compatibility/resource requests. Reference: Condor manual, §4.1, "Condor's ClassAd"
37 Example: Resource ClassAd. A complex access policy, giving priority to users from the owner's research group, then other "friend" users, and then the rest:
Friend = Owner == "tannenba"
ResearchGroup = (Owner == "jbasney" || Owner == "raman")
Trusted = Owner != "rival"
Requirements = Trusted && ( ResearchGroup || LoadAvg < 0.3 && KeyboardIdle > 15*60 )
Rank = Friend + ResearchGroup*10
Resource ClassAds specify an access/usage policy for the resource.
38 ClassAd wrap-up. ClassAds provide an extensible mechanism for describing resources and requirements: 1. a set of standard ClassAd values is provided by Condor itself; 2. new values can be defined by the user (both client- and server-side). How can you submit a job that requires 200GB of local scratch space? Or 16 cores in a single node? Providing the right attributes for the match is now an organizational problem, not a technical one.
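Continuing the toy model from the matchmaking sketch: a site can advertise a custom attribute (here the invented name `ScratchGB`) and jobs can match on it, while a site that never advertises it simply never satisfies the requirement:

```python
def satisfies(job_req, resource_ad):
    """Evaluate one job requirement against a resource's ad."""
    return job_req(resource_ad)

# Hypothetical requirement: at least 200 GB of local scratch space.
needs_scratch = lambda ad: ad.get("ScratchGB", 0) >= 200

site_with = {"Name": "siteA", "ScratchGB": 500}
site_without = {"Name": "siteB"}  # attribute never advertised

print(satisfies(needs_scratch, site_with))     # True
print(satisfies(needs_scratch, site_without))  # False
```

The mechanism works; the hard part is getting every site to agree on advertising `ScratchGB` (or whatever the attribute is called) in the first place.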
39 All these job management systems are based on a push model (you send the job to an execution cluster). Is there, conversely, a pull model?
40 References
1. Foster, I. (2002): What is the Grid? A Three Point Checklist. Grid Today, July 20, 2002.
2. Thain, D., Tannenbaum, T. and Livny, M. (2005): Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice and Experience, 17. DOI: /cpe
3. Kónya, B. (20): The ARC Information System, infosys.pdf
4. Cecchi, M. et al. (2009): The gLite Workload Management System. Lecture Notes in Computer Science, 5529/2009.
5. Andreozzi, S. et al. (2009): GLUE Specification v. 2.0
41 Average ping RTT to some SMSCG clusters. [Table of per-cluster times in ms, for: idgc3grid.uzh.ch, hera.wsl.ch, arc.lcg.cscs.ch, smscg.epfl.ch, gordias.unige.ch.]
42 Time to retrieve a single LDAP entry. [Table of per-cluster query and connect times in ms, for: smscg.epfl.ch, gordias.unige.ch, idgc3grid.uzh.ch, arc.lcg.cscs.ch, hera.wsl.ch.]
More informationMERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced
MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced Sarvani Chadalapaka HPC Administrator University of California
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationGrid Compute Resources and Job Management
Grid Compute Resources and Job Management How do we access the grid? Command line with tools that you'll use Specialised applications Ex: Write a program to process images that sends data to run on the
More informationMI-PDB, MIE-PDB: Advanced Database Systems
MI-PDB, MIE-PDB: Advanced Database Systems http://www.ksi.mff.cuni.cz/~svoboda/courses/2015-2-mie-pdb/ Lecture 10: MapReduce, Hadoop 26. 4. 2016 Lecturer: Martin Svoboda svoboda@ksi.mff.cuni.cz Author:
More informationScheduling Large Parametric Modelling Experiments on a Distributed Meta-computer
Scheduling Large Parametric Modelling Experiments on a Distributed Meta-computer David Abramson and Jon Giddy Department of Digital Systems, CRC for Distributed Systems Technology Monash University, Gehrmann
More informationGeant4 on Azure using Docker containers
http://www.geant4.org Geant4 on Azure using Docker containers Andrea Dotti (adotti@slac.stanford.edu) ; SD/EPP/Computing 1 Outlook Motivation/overview Docker + G4 Azure + G4 Conclusions 2 Motivation/overview
More informationA SEMANTIC MATCHMAKER SERVICE ON THE GRID
DERI DIGITAL ENTERPRISE RESEARCH INSTITUTE A SEMANTIC MATCHMAKER SERVICE ON THE GRID Andreas Harth Yu He Hongsuda Tangmunarunkit Stefan Decker Carl Kesselman DERI TECHNICAL REPORT 2004-05-18 MAY 2004 DERI
More informationMultiple Broker Support by Grid Portals* Extended Abstract
1. Introduction Multiple Broker Support by Grid Portals* Extended Abstract Attila Kertesz 1,3, Zoltan Farkas 1,4, Peter Kacsuk 1,4, Tamas Kiss 2,4 1 MTA SZTAKI Computer and Automation Research Institute
More informationCHAPTER 2 LITERATURE REVIEW AND BACKGROUND
8 CHAPTER 2 LITERATURE REVIEW AND BACKGROUND 2.1 LITERATURE REVIEW Several researches have been carried out in Grid Resource Management and some of the existing research works closely related to this thesis
More informationScheduling Jobs onto Intel Xeon Phi using PBS Professional
Scheduling Jobs onto Intel Xeon Phi using PBS Professional Scott Suchyta 1 1 Altair Engineering Inc., 1820 Big Beaver Road, Troy, MI 48083, USA Abstract As new hardware and technology arrives, it is imperative
More informationPoS(EGICF12-EMITC2)074
The ARC Information System: overview of a GLUE2 compliant production system Lund University E-mail: florido.paganelli@hep.lu.se Balázs Kónya Lund University E-mail: balazs.konya@hep.lu.se Oxana Smirnova
More informationTowards sustainability: An interoperability outline for a Regional ARC based infrastructure in the WLCG and EGEE infrastructures
Journal of Physics: Conference Series Towards sustainability: An interoperability outline for a Regional ARC based infrastructure in the WLCG and EGEE infrastructures To cite this article: L Field et al
More informationARC integration for CMS
ARC integration for CMS ARC integration for CMS Erik Edelmann 2, Laurence Field 3, Jaime Frey 4, Michael Grønager 2, Kalle Happonen 1, Daniel Johansson 2, Josva Kleist 2, Jukka Klem 1, Jesper Koivumäki
More informationIntroduction to Grid Computing
Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able
More informationDIRAC pilot framework and the DIRAC Workload Management System
Journal of Physics: Conference Series DIRAC pilot framework and the DIRAC Workload Management System To cite this article: Adrian Casajus et al 2010 J. Phys.: Conf. Ser. 219 062049 View the article online
More informationWhat s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018
What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018 Todd Tannenbaum Center for High Throughput Computing Department of Computer Sciences University of Wisconsin-Madison
More informationMONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT
The Monte Carlo Method: Versatility Unbounded in a Dynamic Computing World Chattanooga, Tennessee, April 17-21, 2005, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2005) MONTE CARLO SIMULATION
More informationDISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Modified by: Dr. Ramzi Saifan Definition of a Distributed System (1) A distributed
More informationPROOF-Condor integration for ATLAS
PROOF-Condor integration for ATLAS G. Ganis,, J. Iwaszkiewicz, F. Rademakers CERN / PH-SFT M. Livny, B. Mellado, Neng Xu,, Sau Lan Wu University Of Wisconsin Condor Week, Madison, 29 Apr 2 May 2008 Outline
More informationThe NorduGrid Architecture and Middleware for Scientific Applications
The NorduGrid Architecture and Middleware for Scientific Applications O. Smirnova 1, P. Eerola 1,T.Ekelöf 2, M. Ellert 2, J.R. Hansen 3, A. Konstantinov 4,B.Kónya 1, J.L. Nielsen 3, F. Ould-Saada 5, and
More informationHEP replica management
Primary actor Goal in context Scope Level Stakeholders and interests Precondition Minimal guarantees Success guarantees Trigger Technology and data variations Priority Releases Response time Frequency
More informationOutline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems
Distributed Systems Outline Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems What Is A Distributed System? A collection of independent computers that appears
More informationJyotheswar Kuricheti
Jyotheswar Kuricheti 1 Agenda: 1. Performance Tuning Overview 2. Identify Bottlenecks 3. Optimizing at different levels : Target Source Mapping Session System 2 3 Performance Tuning Overview: 4 What is
More informationPoS(EGICF12-EMITC2)004
: bridging the Grid and Cloud worlds Riccardo Murri GC3: Grid Computing Competence Center University of Zurich E-mail: riccardo.murri@gmail.com GC3: Grid Computing Competence Center University of Zurich
More informationMOHA: Many-Task Computing Framework on Hadoop
Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction
More informationAdaptive Cluster Computing using JavaSpaces
Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of
More informationHigh Performance Computing Course Notes Grid Computing I
High Performance Computing Course Notes 2008-2009 2009 Grid Computing I Resource Demands Even as computer power, data storage, and communication continue to improve exponentially, resource capacities are
More informationGustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2
Chapter 1: Distributed Information Systems Gustavo Alonso Computer Science Department Swiss Federal Institute of Technology (ETHZ) alonso@inf.ethz.ch http://www.iks.inf.ethz.ch/ Contents - Chapter 1 Design
More informationIvane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU
Ivane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU Grid cluster at the Institute of High Energy Physics of TSU Authors: Arnold Shakhbatyan Prof. Zurab Modebadze Co-authors:
More informationOBTAINING AN ACCOUNT:
HPC Usage Policies The IIA High Performance Computing (HPC) System is managed by the Computer Management Committee. The User Policies here were developed by the Committee. The user policies below aim to
More informationFirst evaluation of the Globus GRAM Service. Massimo Sgaravatto INFN Padova
First evaluation of the Globus GRAM Service Massimo Sgaravatto INFN Padova massimo.sgaravatto@pd.infn.it Draft version release 1.0.5 20 June 2000 1 Introduction...... 3 2 Running jobs... 3 2.1 Usage examples.
More informationLS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton
More informationThe ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015
The ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015 Overview Architecture Performance LJSFi Overview LJSFi is an acronym of Light
More informationChapter 1: Distributed Information Systems
Chapter 1: Distributed Information Systems Contents - Chapter 1 Design of an information system Layers and tiers Bottom up design Top down design Architecture of an information system One tier Two tier
More informationUnderstanding StoRM: from introduction to internals
Understanding StoRM: from introduction to internals 13 November 2007 Outline Storage Resource Manager The StoRM service StoRM components and internals Deployment configuration Authorization and ACLs Conclusions.
More informationIntroduction to Grid Infrastructures
Introduction to Grid Infrastructures Stefano Cozzini 1 and Alessandro Costantini 2 1 CNR-INFM DEMOCRITOS National Simulation Center, Trieste, Italy 2 Department of Chemistry, Università di Perugia, Perugia,
More informationVirtualizing a Batch. University Grid Center
Virtualizing a Batch Queuing System at a University Grid Center Volker Büge (1,2), Yves Kemp (1), Günter Quast (1), Oliver Oberst (1), Marcel Kunze (2) (1) University of Karlsruhe (2) Forschungszentrum
More informationO. Pospishnyi. National Technical University of Ukraine Kyiv Polytechnic Institute, 37, Peremohy Ave., Kyiv Ukraine
INFORMATION SCIENCE AND INFORMATION SYSTEMS RECEIVED 15.01.2014 ACCEPTED 02.02.2014 PUBLISHED 04.02.2014 DOI: 10.15550/ASJ.2014.02.009 GRID RESOURCE ONTOLOGY: A KEYSTONE OF SEMANTIC GRID INFORMATION SERVICE
More informationHistory of SURAgrid Deployment
All Hands Meeting: May 20, 2013 History of SURAgrid Deployment Steve Johnson Texas A&M University Copyright 2013, Steve Johnson, All Rights Reserved. Original Deployment Each job would send entire R binary
More informationGergely Sipos MTA SZTAKI
Application development on EGEE with P-GRADE Portal Gergely Sipos MTA SZTAKI sipos@sztaki.hu EGEE Training and Induction EGEE Application Porting Support www.lpds.sztaki.hu/gasuc www.portal.p-grade.hu
More informationTutorial 4: Condor. John Watt, National e-science Centre
Tutorial 4: Condor John Watt, National e-science Centre Tutorials Timetable Week Day/Time Topic Staff 3 Fri 11am Introduction to Globus J.W. 4 Fri 11am Globus Development J.W. 5 Fri 11am Globus Development
More informationFault tolerance based on the Publishsubscribe Paradigm for the BonjourGrid Middleware
University of Paris XIII INSTITUT GALILEE Laboratoire d Informatique de Paris Nord (LIPN) Université of Tunis École Supérieure des Sciences et Tehniques de Tunis Unité de Recherche UTIC Fault tolerance
More informationGFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures
GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,
More informationArchitectural challenges for building a low latency, scalable multi-tenant data warehouse
Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationA Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers
A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented
More information