The Data Warehouse in a Distributed Utility Environment


Charles A. Milligan
Distinguished Engineer, Sun Microsystems
Charles.milligan@sun.com

Abstract

Utility provisioning, Grid resource management, instant copy kiosks, and network transfers provide an exciting new paradigm for data warehouse functions. Grid technologies are fast becoming THE approach for utility computing. This enables on-demand provisioning to be based on the utility model, where resource use is based on requirements specified in a contract. In this paper, a contract-based workflow management system for on-demand use of a Grid-enabled data warehouse is proposed. Its flexible scheduling algorithms allow for selective application of rule systems in order to balance the requirement to minimize execution cost against the need to meet deadlines for providing or processing information.

1. Introduction

The utility model [8] is being suggested as the next provisioning approach capable of supporting diverse e-Business processes and applications over local or global webs. The basic definition of a utility is that users subscribe for services. These services can be consumed via a demand pull model or a leased connection model. In the demand pull model, the users pay only for what is used or consumed. In the leased connection model, the connection is what is leased, independent of the connection time or the resources consumed.

It has become easier to make utility-based services available by utilizing Grid infrastructures (both Grid computing for data analysis and Grid Storage for data repositories). A Grid [4] provides an infrastructure that enables users to identify capabilities and structure services transparently over a network infrastructure that has emerging standards for consistency, sharing, security, and global reach.

Data warehousing environments have focused suites of processes with applications that use data from, or derive metadata from, the warehouse. Such suites serve stovepipe domains such as genomics or astrophysics and require workflow processing in which software is executed under metadata control that identifies data dependencies.
In the recent past, a number of Grid workflow management systems with scheduling algorithms have been developed (e.g., see the list in Yu [9]) which facilitate workflow application execution on Grids and minimize execution time. Scheduling workflows based on a user contract has only recently been addressed in Grid workflow management systems (Yu [9]).

For a utility model, pricing requires access to the provider metadata as well as the user contract. Typically, service providers charge higher prices for higher levels of contracted service. However, contracts generally identify a schedule of requirements, and often there is no advantage in completing the requested set of tasks earlier than specified. Users generally prefer to receive less expensive services with lower performance as long as the level is sufficient to meet their requirements.

Given this motivation, the particular contract form known as Quality of Service (QoS) will be the basis for directing workflow management. QoS resolves specifications into at least two sets: those for which minimization is preferable (typified by time, cost, error rate, interruption, variance, etc.), and those for which maximization is preferable (typified by transaction rate, data rate, availability, etc.). QoS optimization works to globally minimize the former across the active population (e.g., execution time) while also attempting to maximize the latter (e.g., transaction rate). Unfortunately, these efforts often work against each other, because minimizing time means utilizing the fastest resources, which can cause queuing that in turn actually slows the overall transaction rate.

In this paper, basic QoS-based workflow management requirements are presented, utilizing a novel workflow scheduling method that involves resource selection and data cloning via instant copy. The objective of the proposed scheduling algorithm is to develop a workflow schedule that minimizes the execution time and cost yet meets the performance constraints imposed by the user in a multi-user environment that requires multiple simultaneous accesses to warehouse resources.
In order to solve scheduling problems efficiently for large-scale workflows, workflow tasks are partitioned and the workflow execution schedule is generated based on the optimal schedules of the task partitions. This requires that a data partitioning schema is also imposed that mimics the task partitioning. A deadline assignment strategy is also developed to distribute the overall deadline over each partition. An optimal schedule that can be dynamically modified is derived by modeling the branch partition as a Markov Decision Process (MDP) [7], which is shown in this paper to be effective for modeling these decision problems.

0-7695-2507-5/06/$20.00 (C) 2006 IEEE

2. QoS contracts for task Management

QoS-based management in service environments impacts all levels (specification, discovery and scheduling). In this paper, the term service means utility-based execution of tasks to retrieve or analyze data or metadata from the distributed warehouse. The architecture of a typical QoS-based workflow management system is shown in Figure 1. The components of the workflow management system are discussed below.

2.1. Workflow Specification

The QoS-based workflow management system allows users to specify their requirements, along with the descriptions of tasks and their dependencies, using the workflow specification. In general, QoS constraints express the preferences of users and are essential for efficient resource allocation. We categorize workflow QoS constraints into task-level and data access constraints.

[Figure 1. QoS contract model. Labeled elements: QoS-based contract analysis system; specification; discovery; task and data planning; contract negotiation; reservation/pre-allocation; workflow execution scheduling; execution engine; execution monitor; contract violation; feedback; task kiosk transfer; specific instance of use; resource directories; service directories; analysis service; metadata/data; distributed data warehouse. RSP: Warehouse and Resource Service Provider.]

At the task level, as illustrated in Table 1, QoS constraints are specified with the corresponding tasks. In this scenario, two QoS constraints, e.g., cost and performance, are specified with each task. In contrast, QoS constraints at the workflow level are given for the entire workflow execution. In the example shown in Table 2, the workflow execution is required to be completed before date-time +/- delta at the minimum cost, but the order specifies priority of deadline over cost.

Table 1. Task-level QoS specification.
  Constraints:  Time; Duration; Dependencies; Contingencies; Cost
  Aspirations:  Priorities; Analysis depth

Table 2. Job-level specification.
  Constraints:  Time boundary for entire task set (finish by date/time +/- delta); Dependencies on other workflows; Cost (minimize)
  Aspirations:  Quality; Priorities

Using systems (e.g., actual users interfacing through a user interface, or applications running on client systems) may need to specify two sets of QoS metadata. First, there are constraints, such as deadline and budget, for the overall workflow processing. Second, there are requirements on specific tasks identified as potential bottlenecks in the overall flow.

2.2. Data and Resource Discovery and Allocation

After submission of the workflow specification, the workflow system needs to discover the appropriate services for processing the tasks and the appropriate data repository declensions. In a complex workflow, different tasks require different types of services, metadata, and data. In a utility model, the computing Grid resources, even for the same type of service, are deployed by different providers at different times and are distributed across multiple administrative domains.

However, in the Grid Storage models being developed today, the metadata and data resources are in specific locations. These locations must be identified and then mapped into virtual service space offerings to allow for multiple simultaneous uses. This is done by utilizing an instant copy approach such as Snapshot [1]. The snapshot is taken once the data is identified for a particular execution service, and the two are mated. Thus, if the same data is required for multiple simultaneous executions (common in compute Grid domains), each task will receive parallel access to the exact same data without having to resolve any data sharing problems.

In addition, every service has its own local policy for different users, such as authorization and pricing. The workflow system should be able to query a distributed information service, such as a Grid compute resource directory and a Grid data warehouse directory, and generate a list of available services and concomitant data for every task for the user of the workflow.
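The discovery step can be pictured as a filter over directory entries. The following is a minimal sketch, assuming a flat in-memory directory; the `ServiceOffer` record, its field names, and the `discover` function are hypothetical illustrations and not part of any Grid directory API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceOffer:
    # One directory entry; every field name here is an illustrative assumption.
    provider: str
    task_type: str       # kind of task the service can process
    price: float         # price for delivering the service at the offered QoS level
    exec_time: float     # estimated processing time
    data_clone_id: str   # snapshot or data mart the service would read from


def discover(task_types, directory):
    """Return, for each workflow task, the directory offers able to serve it."""
    candidates = {}
    for task, task_type in task_types.items():
        candidates[task] = [o for o in directory if o.task_type == task_type]
    return candidates


directory = [
    ServiceOffer("provider_a", "align", 4.0, 10.0, "clone-1"),
    ServiceOffer("provider_b", "align", 2.5, 18.0, "clone-2"),
    ServiceOffer("provider_a", "report", 1.0, 3.0, "clone-1"),
]
tasks = {"T1": "align", "T2": "report", "T3": "archive"}
offers = discover(tasks, directory)
```

A task with an empty candidate list (here `T3`) is one for which no published service was found, so the planner would have to reject it or compose a service from more fundamental ones.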
Admittedly, data and services come and go on Grids, so the information service is constantly in a state of flux driven by discovery of missing resources and announcement of new arrivals.

In a utility model of Grid-enabled resources, the attributes of services for processing the same task can be quite diverse. Different service providers can each offer levels of service different from all the rest. One service provider can also offer various levels for satisfying different sets of requirements. The pricing for the services is usually closely related to the level of resource consumption provided, but can be rather flat for the various instances of data provided. However, some users may have priority in terms of service order, execution time and price that can be demanded from certain service providers. In addition, service providers may adjust the service price based on peak and off-peak periods in order to enhance the utilization of their resources. The ability of a provider of data resources to anticipate access requirements and create instant data marts and data mart clones via instant copy (Milligan [6]) is analogous to the availability of alternative compute nodes being brought on line for peak hour processing.

Knowledge of the contract details for all available services is the key to scheduling workflow tasks efficiently. Such knowledge initially comes from the Workflow Management System (WMS) sending a request to the services directory management systems. The request indicates the task parameters, the user of the workflow and the estimated execution period, and identifies data domains. On receiving the request, the directory services reply with the contract parameters of the service they can offer (e.g., processing speed, available storage space and free memory, data locations such as a clone ID or data mart ID) and the corresponding price for delivering the service at the specified QoS level.

2.3. Workflow Scheduling

Workflow scheduling consists of two phases. The first maps the data requirements to the task elements. This effort then initiates the creation of data clones where advised, by making instant data marts via a snapshot technique. The second focuses on mapping and managing the execution of individual workflow tasks onto services found in the directory, and from there onto specific resources. For the utility model of the Grid, the scheduling decisions made during workflow scheduling must be guided by the QoS constraints.
There are three major steps in workflow scheduling: planning, pre-allocation, and execution flow control. Flow control is then monitored and exceptions are noted, which may require run-time rescheduling.

Workflow planning selects a published service provider for every task in the workflow. If no service can be found, the task must be rejected or a service must be constructed as a composite of several more fundamental services. This service construction process is beyond the scope of this paper. The selection must then be placed in the context of a schedule before workflow execution can be initiated. A model is constructed from the schedule, and the results of the model must satisfy the QoS constraints. The planner's decision making for workflow execution needs to reference the entire workflow according to the QoS parameters of services obtained from QoS requests. In general, mapping tasks onto distributed services is difficult for a variety of reasons (e.g., coordinating data locations with compute locations, the transient nature of distributed services, independent queuing efforts). The workflow planner will generally produce only a sub-optimal initial schedule in order to balance the requirements.

The pre-allocation function is required in order to guarantee that services meet the QoS requirements [2]. It is important to workflow scheduling, especially for workflow execution that is lengthy or complicated. Workflow management systems need to reserve the services selected by the planner in advance to ensure their availability. The time slots for advance reservation of services can be generated based on a multitude of potential service configurations and possible start times for the workflow execution. The earliest start time of a task depends on the possible completion time of its parent tasks. If a task has more than one predecessor, the start time is the latest completion time of its predecessors. If we consider communication overhead, the task start time is the latest completion time of the parent tasks plus the communication time.
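The start-time rule just stated can be sketched in a few lines. This is a minimal illustration, assuming a small hand-made DAG with known durations; the function name `earliest_times` and the dictionaries are hypothetical, and the communication time is charged per parent-child edge as one reasonable reading of the rule.

```python
from graphlib import TopologicalSorter


def earliest_times(parents, duration, comm_time):
    """Compute earliest start and finish times for every task in a DAG.

    parents:   task -> list of parent tasks
    duration:  task -> processing time on the selected service
    comm_time: (parent, child) -> communication/setup time, 0 if absent
    """
    start, finish = {}, {}
    # static_order() yields each task only after all of its parents.
    for task in TopologicalSorter(parents).static_order():
        ready = [finish[p] + comm_time.get((p, task), 0.0)
                 for p in parents.get(task, ())]
        # A task starts when its latest parent (plus transfer) is done.
        start[task] = max(ready, default=0.0)
        finish[task] = start[task] + duration[task]
    return start, finish


parents = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
duration = {"T1": 2.0, "T2": 5.0, "T3": 1.0, "T4": 3.0}
comm_time = {("T1", "T2"): 1.0, ("T3", "T4"): 4.0}
start, finish = earliest_times(parents, duration, comm_time)
```

In this example `T4` waits for `T2` (ready at 8.0) rather than `T3` (ready at 7.0 after its transfer), so the reservation slot for `T4` would begin at 8.0.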
The start time and the communication time include the identified administrative tasks, such as creating instant copies of data elements or mapping out instant data marts. Instant data marts are created (Milligan [6]) using an instant copy mechanism and then may or may not require physical movement of the data to an optimal location for the continued processing of the workflow. Time slots of desired services requested by the planning function may not be available when the workflow system makes the reservations. As a consequence, the scheduling and planning become piecewise iterative so that acquisition can be made on altered schedules.

The utility computing model requires a QoS guarantee and needs to meet service commitments. However, Grid environments are noted for flexibility of membership, and service nodes may disappear during any phase of operation. Thus there is a possibility that services may violate the contract between the workflow system and the service provider. This comes about because of service failure (resources disappear or break) and service delay due to other service consumers with higher priority rescheduling and taking resources away. Therefore, the workflow scheduler must be able to adapt and update the schedule based on resource dynamics.

Some resources are principal and cannot be substituted (e.g., access to specific hardware with unique functionality). Contention for such resources will inevitably result in delay that may violate the QoS requirements. The scheduler in this instance must then work on the basis of minimizing the overall cost of penalties associated with such violations. A QoS monitor is required in the system to monitor the agreed performance and inform the planner of any changes. In addition to dealing with contract violations, the QoS monitor can use rescheduling to handle unforeseen unavailability of scheduled services at the time of a task execution.

Some resources are general and can be substituted with calculable effects (e.g., a processor is a processor, but not all run at the same execution rate).
Such resources must then be pooled and utilization meted out on a prioritization schedule to best meet QoS requirements. In the case of the data and metadata resources, the ability to clone data (Milligan [6]) and make individual task kiosks available for multiple processes run in parallel makes the data scheduling effort very straightforward. Some nodes or resources in a networked system may not sign up for reservation of services; in this case service availability can only be known at run-time.

In the utility model, the actual allocation of services is not under the control of the workflow management system. The actual commitment for service provision is based on the Service Execution Contract (SEC) between the workflow management system and service providers. An SEC is a contract that specifies the minimum expectations and obligations that exist between consumers and providers [3]. SEC parameters for workflow tasks are the QoS requirements of task processing. They include performance objectives such as earliest start time and latest completion time, cost for a rate of consumption, and an availability-of-elements schedule. Availability of elements is especially important in relation to instant kiosk management, where data is not only cloned via an instant copy technique but must also be delivered to a specific node in the network for analysis or processing.

Penalty clauses for service level violation are also required in an SEC to enforce service level guarantees. The penalty levels for service execution violation may vary for different workflow tasks. The penalty levels for workflow task processing should be based on the degree of impact on the whole workflow execution rather than on a single service execution.

3. A QoS-based Workflow Scheduling

Data availability processing time, data analysis processing time, and analysis execution cost are typical QoS constraints for executing workflows on utility systems. Users normally request execution at the lowest possible cost within a required timeframe. Following is a workflow scheduling methodology and algorithm that allows the workflow management system to minimize the cost while meeting the deadline.
The model of a workflow is taken from known art (Yu [9]) and repeated here for clarity with accompanying examples. The best model for workflow applications is a Directed Acyclic Graph (DAG). $\Gamma$ represents the finite set of tasks $T_i$ ($1 \le i \le n$), and $\Lambda$ represents the set of directed arcs of the form $(T_i, T_j)$, where $T_i$ is identified as a parent task of $T_j$, and $T_j$ is the child task of $T_i$. We assume that a child task cannot be executed until all of its parent tasks are completed. Let $D$ be the time constraint (deadline) specified by the users for workflow execution. Then the workflow application can be described as a tuple $\Omega(\Gamma, \Lambda, D)$. In such a workflow graph, a task which does not have any parent task is an entry task, denoted $T_{entry}$, and a task which does not have any child task is an exit task, denoted $T_{exit}$.

Let $m$ be the total number of services available. A set of services $S_i^j$ ($1 \le i \le n$, $1 \le j \le m_i$, $m_i \le m$) is capable of executing the task $T_i$, but only one service can be assigned for the execution of a task. Services have varied processing capability delivered at different prices. In general, the service price tends to be directly proportional to the performance capabilities of the resources, which makes it inversely proportional to the time required for service. Denote $t_i^j$ as the sum of the processing time and data transmission time, and $c_i^j$ as the sum of the service price and data transmission cost, for processing $T_i$ on service $S_i^j$. The scheduling problem is to map every $T_i$ onto some $S_i^j$ so as to achieve minimum execution cost and complete the workflow execution within the deadline $D$. The cloning of data resources and the assurance that task-specific kiosks are available must be inserted in the workflow as instances of tasks.

The scheduling problem is solved by following a straightforward methodology. First, one must discover available services and then request QoS parameters for these services, organized by task.
Then the workflow tasks must be grouped into task partitions, with the overall timing constraints scheduled into every task partition. The next step is to generate an optimized schedule based on local optimization within each partition. Finally, start workflow execution and monitor it, calling for rescheduling when the initial schedule is violated at run-time.

Workflow tasks can be categorized as setup, synchronization, or execution. A setup task could be the instant copy of the data required for the execution task, which includes kiosk wrappers and possibly data movement to the execution locale. A synchronization task is defined as a task which has more than one parent or child task. Figure 2 is provided courtesy of Yu [10]. In Figure 2a, $T_1$, $T_{10}$ and $T_{14}$ are synchronization tasks. Other tasks, which have only one parent task and one child task, are simple tasks. In the example, $T_2$ to $T_9$ and $T_{11}$ to $T_{13}$ are execution tasks. The setup tasks can be either synchronization or execution tasks but are always parent to an execution task.
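The categorization above suggests a simple way to group tasks: treat every task that is not simple as its own partition, and chain consecutive simple tasks into one partition. The sketch below is a minimal illustration on a hypothetical eight-task DAG (not the graph of Figure 2); the function name `partition` is an assumption, and entry or exit tasks are treated like synchronization tasks for simplicity.

```python
def partition(children):
    """Split a DAG into non-simple tasks and chains of simple tasks.

    children: task -> list of child tasks.
    A simple task has exactly one parent and exactly one child.
    """
    tasks = set(children) | {c for cs in children.values() for c in cs}
    parents = {t: [] for t in tasks}
    for t, cs in children.items():
        for c in cs:
            parents[c].append(t)

    def is_simple(t):
        return len(parents[t]) == 1 and len(children.get(t, [])) == 1

    # Assumption: anything that is not simple forms a partition by itself.
    sync = sorted(t for t in tasks if not is_simple(t))

    chains = []
    for t in sorted(tasks):
        # A chain starts at a simple task whose parent is not simple.
        if is_simple(t) and not is_simple(parents[t][0]):
            chain = [t]
            nxt = children[t][0]
            while is_simple(nxt):
                chain.append(nxt)
                nxt = children[nxt][0]
            chains.append(chain)
    return sync, chains


# Hypothetical workflow: 1 fans out, 5 joins and fans out again, 8 joins.
children = {1: [2, 4], 2: [3], 3: [5], 4: [5], 5: [6, 7], 6: [8], 7: [8], 8: []}
sync_tasks, chains = partition(children)
```

Here tasks 1, 5 and 8 come out as single-task partitions, and the remaining tasks form the chains `[2, 3]`, `[4]`, `[6]` and `[7]`.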

[Figure 2. Workflow task partition: (a) before partitioning; (b) after partitioning. The figure marks execution tasks, synchronization tasks, and branches among tasks $T_1$ to $T_{14}$.]

Let a branch be a set of simple tasks that are executed sequentially between two synchronization tasks. For example, the branches in Figure 2b are $\{T_2, T_3, T_4\}$, $\{T_5, T_6\}$, $\{T_7\}$, $\{T_8, T_9\}$, $\{T_{11}\}$ and $\{T_{12}, T_{13}\}$. Then partition the workflow tasks into independent branches $B_i$ ($1 \le i \le k$) and synchronization tasks $Y_i$ ($1 \le i \le l$), where $k$ and $l$ are the total numbers of branches and synchronization tasks in the workflow respectively. Let $V$ be a set of nodes in a DAG corresponding to a set of task partitions $V_i$ ($1 \le i \le k + l$). Let $E$ be the set of directed edges of the form $(V_i, V_j)$, where $V_i$ is a parent task partition of $V_j$ and $V_j$ is a child task partition of $V_i$. Then a task partition graph is denoted $G(V, E, D)$. A simple path (referred to as a path) in $G$ is a sequence of task partitions such that there is a directed edge from every task partition in the path to its child, and none of the vertices (task partitions) in the path is repeated.

A task partition $V_i$ has four attributes: start time ($st[V_i]$), deadline ($dl[V_i]$), expected execution time ($eet[V_i]$), and minimum execution time ($met[V_i]$). The earliest start time of $V_i$ is the earliest time the first task in it can be executed, and it can be computed from its parent partitions:

$$st[V_i] = \max_{V_j \in P_i} dl[V_j],$$

where $P_i$ is the set of parent task partitions of $V_i$. The minimum execution time of $V_i$ is

$$met[V_i] = \sum_{T_x \in V_i} \min_{1 \le y \le m_x} t_x^y.$$

The attributes are related as:

$$eet[V_i] = dl[V_i] - st[V_i].$$

After workflow task partitioning, distribute the overall deadline between the partitions. This is similar to the technique known as critical path analysis but has some distinct aspects. The latter tends to focus on execution bottlenecks, while the deadline approach focuses on monitoring the rate of approach to the bottleneck situations. The deadline assigned to any partition is a sub-deadline of the overall deadline. The following deadline assignment policies are extant.

A synchronization task cannot be executed until all tasks in its parent task partitions are completed. Thus, instead of waiting for other independent paths to be completed, a path capable of being finished earlier can be executed on less expensive services. However, these must be scheduled without compromising the required deadlines. For example, data can be sent further away for processing on free computers, but the cumulative sub-deadline of any independent path between two synchronization tasks must be coordinated and equal to the overall deadline.

One must assure that every task partition is computed within its assigned deadline so that the whole workflow execution can satisfy the user's required deadline.

Assigned sub-deadlines must be greater than or equal to the expected minimum processing time of the corresponding task partition. If the assigned sub-deadline is less than the minimum processing time of a task partition, the expected execution time will exceed the capability that its execution services are rated to handle.

The execution times of tasks, such as data movement for the task-oriented kiosks and the subsequent data analysis task, vary. The overall deadline is divided over task partitions in proportion to their minimum data movement or processing time. Some tasks may need only milliseconds to complete, and others may need more than an hour. Thus, the deadline distribution for a task partition should be based on its calculated data movement or execution time. Since there are multiple possible elapsed times for every task, use the minimum processing time to distribute the deadline.

Deadline assignment policies can be implemented on the task partition graph by combining breadth-first and depth-first search algorithms with critical path analysis to compute the start times, proportions and sub-deadlines of every task partition.

The planning stage is to generate an optimized schedule for advance reservation and run-time execution.
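The proportional deadline distribution policy can be illustrated on a single path of task partitions. This is a minimal sketch under that simplifying assumption: it does not perform the graph traversals needed for a full partition graph, and the function name `assign_sub_deadlines` and the sample values are hypothetical.

```python
def assign_sub_deadlines(path, met, deadline, start=0.0):
    """Divide a deadline over one path of partitions in proportion to met.

    path:     ordered list of partition names along one path
    met:      partition -> minimum execution time
    deadline: time by which the last partition must finish
    Returns partition -> {"st", "dl", "eet"}, with eet = dl - st.
    """
    total = sum(met[v] for v in path)
    span = deadline - start
    if span < total:
        # No assignment can give every partition at least its minimum time.
        raise ValueError("deadline is shorter than the minimum execution time")
    schedule = {}
    st = start
    for v in path:
        eet = span * met[v] / total        # proportional share of the span
        schedule[v] = {"st": st, "dl": st + eet, "eet": eet}
        st += eet                          # child starts at the parent's deadline
    return schedule


path = ["V1", "V2", "V3"]
met = {"V1": 1.0, "V2": 3.0, "V3": 2.0}
plan = assign_sub_deadlines(path, met, deadline=12.0)
```

With a deadline of 12 and minimum times 1, 3 and 2, the partitions receive expected execution times of 2, 6 and 4, so each sub-deadline is at least the partition's minimum processing time and the last sub-deadline equals the overall deadline.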
The execution time of analysis processing is estimated on the basis of the analysis runtime code and the power of the processing resources. The data movement task timings are estimated from the size of the data to be moved and the congestion and relative speed of the network elements used for the transfer. The schedule allocates every workflow task to a selected service such that the tasks can meet the user's deadline at low execution cost.

Workflow scheduling is done by dividing the entire problem into constituent task partition scheduling problems. Once each task partition has its own sub-deadline, a local optimal schedule can be found for each task partition. If each local schedule guarantees that its task execution can be completed within its sub-deadline, the whole workflow will be completed within the overall deadline. Similarly, the cost minimization solution for each task partition leads to an optimized cost solution for the entire workflow. Therefore, an optimized workflow schedule can easily be constructed from all the local optimal schedules.

There are two types of task partitions: synchronization task and branch partition. For a synchronization task, the scheduler considers only one task when deciding the service for executing that task. The objective function for scheduling a synchronization task is to find the solution to the single gating. The optimal decision is to select the cheapest service that can process that task within the assigned sub-deadline.

If there are multiple tasks, the scheduler needs to decide which service should execute each child task after the completion of its parent task. The optimal decision is to minimize the total execution cost of the branch and complete the branch tasks within the assigned sub-deadline. The objective function for scheduling a branch can be achieved by modeling the problem as a Markov Decision Process (MDP) [7], which has been shown to be effective for solving sequential decision problems.

The MDP problem is to find an optimal policy π* for all possible states. A policy is a mapping from an initial state to a desired state, i.e., an action definition. The optimal action for each state is chosen not on the immediate utility of the action but on its expected utility, which is the sum of all the immediate utilities obtained as a result of the decisions made in transiting from this state to a terminal state. The value associated with each state represents the expected utility of this state in the MDP. This value is calculated recursively from the values of successor states. The value of a state s is:

$$U(s) = \min_{a \in A_s} \{\, u(s, a, s') + U(s') \,\}$$

The best action for state s is:

$$\pi^*(s) = \arg\min_{a \in A_s} \{\, u(s, a, s') + U(s') \,\}$$

Here a ranges over the actions A_s available in state s, and s' is the state reached by taking action a. The optimal policy can be computed with a standard dynamic programming algorithm such as policy iteration or value iteration (we have used value iteration here). The optimal policy indicates the best services that should be assigned to execute branch tasks under a specific sub-deadline.

In order to complete workflows and satisfy users' requirements, run-time rescheduling is required to adapt to dynamic situations such as variation in the availability of services due to failures. Network congestion and orthogonal traffic can often disrupt the planned arrival of data required for analysis via the instant kiosk data mover.
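The value recursion for branch scheduling can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: transitions are deterministic, times are integers, the immediate utility u(s, a, s') is taken to be the price of the chosen service, and the function name `schedule_branch` and the example services are hypothetical.

```python
import math


def schedule_branch(services, sub_deadline):
    """Value iteration for a branch of sequential tasks.

    services[i] is a list of (time, cost) options for task i (integer times).
    State (i, t): task i is next and t time units of the sub-deadline remain.
    Returns (minimum total cost, policy), where policy[(i, t)] is the index
    of the service to choose in that state.
    """
    n = len(services)
    states = [(i, t) for i in range(n + 1) for t in range(sub_deadline + 1)]
    # Terminal states (all tasks done) cost nothing; the rest start at infinity.
    U = {s: (0.0 if s[0] == n else math.inf) for s in states}
    policy = {}
    changed = True
    while changed:  # sweep over all states until no value changes
        changed = False
        for (i, t) in states:
            if i == n:
                continue
            best, best_a = math.inf, None
            for a, (time, cost) in enumerate(services[i]):
                if time <= t:  # service fits in the remaining time
                    value = cost + U[(i + 1, t - time)]
                    if value < best:
                        best, best_a = value, a
            if best < U[(i, t)]:
                U[(i, t)] = best
                policy[(i, t)] = best_a
                changed = True
    return U[(0, sub_deadline)], policy


# Two sequential tasks, each with a fast/expensive and a slow/cheap service.
services = [[(2, 10.0), (4, 4.0)], [(3, 8.0), (5, 3.0)]]
cost, policy = schedule_branch(services, sub_deadline=7)
print(cost, policy[(0, 7)], policy[(1, 3)])  # 12.0 1 0
```

In this example the cheapest feasible plan runs the first task on its slow service and the second on its fast service. A state whose value stays infinite has no feasible assignment within the remaining sub-deadline.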
The key idea of an effective rescheduling policy for handling an unexpected situation is to adjust sub-deadlines and re-compute optimal schedules for the unexpected task delay level by level. The motivation for the level-by-level task partition approach is to reschedule the minimum number of task partitions. For example, if the execution of one task partition is delayed, its child task partitions are examined. If the delay can be accommodated by the child task partitions, rescheduling will not affect the lower levels. Otherwise, the rest of the delay is passed on to their successors until the total delay has been distributed.

The rescheduling algorithm for a synchronization task delay has several steps. First, adjust the start time of the child task partitions to be the actual completion time of the delayed synchronization task. Then, check whether the new deadlines of the child task partitions can be achieved by comparing them with their minimum processing times. If they are achievable, the planner generates new optimal schedules for the tasks in the child task partitions based on the new expected execution times, and rescheduling stops. Otherwise, new sub-deadlines are assigned by using the minimum processing time as the expected execution time, and then new schedules are generated. When the delay cannot be accommodated by the first-level child partitions, the lower-level child partitions are put into the queue for further rescheduling.

For branch task rescheduling, if a branch task execution is delayed, the optimal schedule for the next branch task after the delayed task can still be obtained from the initial MDP result, according to its current remaining sub-deadline. The other unexecuted partitions will not be affected as long as the delay does not exceed the minimum processing time of the remaining unexecuted tasks in the branch.

In addition to handling task execution delay, the level-by-level task partition based approach can also be applied to managing other dynamic situations such as service unavailability and service policy change.

4. Performance Evaluation

The performance of the QoS-based workflow scheduling algorithm described in Section 3 has been evaluated through simulation using the FlexSim Toolkit [5]. We conducted several experiments by simulating the structure of a data archive scenario in a particle physics simulation environment (see Figure 3). Every task in the workflow requires a certain type of service for processing. The workflow system first discovers available services for every task via Service Directories and then queries the services to obtain their data transfer capabilities or processing time and price. The elapsed time of a task on a service depends on the complexity of the task and the combined capability of the resources used for service provision. Generally, services with lower processing time are delivered at a higher price because resources with higher performance capabilities are observed to be more expensive. For example, the cost of 128 PC-quality Sun servers acting independently is two orders of magnitude less than that of a single enterprise-class Sun server, but the performance is only one order of magnitude less.

The main distinction between the two competing systems modeled was how the data warehouse was connected to the systems: a local network under direct system control versus a wide area network under Grid control. The evaluation was to ascertain the time constraints and execution costs. The former indicates whether the schedule produced by the scheduling approach meets the required deadline, while the latter indicates how much it costs to schedule the workflow tasks on the simulated service Grid.

Figure 3. A workflow for particle physics simulation. (Diagram labels: host system access to data; RAID 3+ tape group of 8+3 cartridges; local site; data mirror; temporary disk copy space; high-level virtual device definition; data encryption; permanent snapshot copy space; HSM; online data space that must be moved to the nearline space; network transmission; Site #2; Site #3.)

Figure 4 compares the systems using the four scheduling approaches for transferring the simulation results to the archive and calling back portions for specific analysis. The first approach uses dedicated and captive resources on a local network, where the analysis tasks are necessarily somewhat serialized. The second uses shared resources on a local network, with instant kiosk managing the sharing so that tasks can be parallelized. The third uses shared resources on a wide network, with instant kiosk managing the sharing but without kiosk data movement to the execution locale. The fourth uses shared resources on a wide network, with instant kiosk managing the sharing and with kiosk data movement to the execution locale scheduled as additional tasks.

Table 3. Execution time and cost by system approach.

System Approach                  Execution Time   Execution Cost
Local/Dedicated                  24 hr            base
Local/Shared                     13.5 hr          base + 11%
Wide/shared/stationary kiosk     17 hr            base - 28%
Wide/shared/transferred kiosk    11.5 hr          base - 23%

We can see from Table 3 that the fastest execution at the least cost was elusive. However, the wide network and sharing approach is universally better than the dedicated system approach. This is because the particular network simulated had a wide variety of different processing capabilities with a broad range of costs, while the local system consisted primarily of expensive mainframe processors. The local time of 24 hours was the normalized work done at that site in a single day. Some improvement was made in the amount of work accomplished when the resources in the data warehouse were shared, but this required additional processing to handle the instant kiosk preparations.
The local system is clearly somewhat I/O bound, hence the noticeable advantage of the data sharing approach. The wide network allowed a broader range of sharing because a large set of much less expensive processors was made available. This allowed the greatest amount of parallel analysis work to be done within the bounds of the network's capability to transfer the data to the analysis locations. Pre-staging the instant kiosks to the analysis locale prior to actual use allowed the processing to speed up further but increased the overall system resource use costs because of multiple kiosk residence in remote locations.

5. Conclusion and Future Work

Managing comprehensive data analysis of subsets of the data in a warehouse over a Grid network is a new operating paradigm. Workflow management on utility service Grids has also not been addressed in existing Grid workflow systems. In this paper, we presented a QoS-based workflow management system. We proposed a novel QoS-based workflow scheduling approach that utilizes instant copy to make on-demand data kiosks for task processing, which minimizes the cost of execution while meeting the deadline. We also described task partitioning and overall deadline assignment for optimized execution planning and efficient run-time rescheduling. We have utilized a Markov Decision Process approach to schedule sequential workflow task execution. The prototyped system uses run-time rescheduling to handle service agreement violations. In future work, we will further enhance our system to evaluate more dynamic scenarios such as auction bid pricing for underutilized resources.

References

[1] Belsan, J.S., Milligan, C.A., O'Brien, J.T., Rudeseal, G.A., "Data Record Copy System for a Disk Drive Array Data Storage Subsystem", US patent 5,410,667, filed April 17, 1992.
[2] Benkner, S., Brandic, I., Engelbrecht, G., Schmidt, R., "VGE - A Service-Oriented Grid Environment for On-Demand Supercomputing", Fifth IEEE/ACM International Workshop on Grid Computing (Grid 2004), Pittsburgh, PA, USA, November 2004.
[3] Buco, M.J., et al., "Utility computing SLA management based upon business objectives", IBM Systems Journal, Vol. 43(1):159-178, 2004.
[4] Buyya, R., Abramson, D., and Giddy, J., "Nimrod-G: An architecture for a resource management and scheduling system in a global computational Grid", Proceedings of the 4th International Conference and Exposition in High Performance Computing in Asia-Pacific Region, Beijing, China, May 17, 2000.
[5] Lubow, L., and Milligan, C., "FlexSim: A system for simulation and visualization of distributed resources", The Simulation Journal, January 2002.
[6] Milligan, C., and Hodge, L., "Managing Virtual Data Marts with Meta-pointer Tables", Proceedings of the Hawaii International Conference on Systems Science HICSS2003, January 2003.
[7] Sutton, R.S., and Barto, A.G., Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[8] Thickins, G., "Utility Computing: The Next New IT Model", Darwin Magazine, April 2003.
[9] Yu, J., et al., "QoS based scheduling of workflow applications on service Grids", IEEE Transactions on Magnetics, summer 2005.
[10] Yu, J., Venugopal, S., and Buyya, R., "A Market-Oriented Grid Directory Service for Publication and Discovery of Grid Service Providers and their Services", Journal of Supercomputing, Kluwer Academic Publishers, USA, 2005.