DESIGN AND OVERHEAD ANALYSIS OF WORKFLOWS IN GRID


I J D M S C L Volume 6, No. 1, January-June 2015

S. JAMUNA 1, K. REKHA 2, AND R. KANTHAVEL 3

ABSTRACT

Grid workflow execution is usually approached as a pure best-effort scheduling problem that maps the activities onto the Grid processors using optimization or local matchmaking heuristics, so that the overall execution time is minimized. Even though such heuristics often deliver effective results, execution in dynamic and unpredictable Grid environments is prone to severe performance losses, which must be understood in order to minimize the completion time or to use high-performance resources efficiently. In this paper, a new systematic approach is developed for understanding the sources of the performance losses that occur when executing dynamic workflows in Grid environments.

Keywords: Distributed system, Grid workflow, Grid performance, Monitoring

1. INTRODUCTION

Workflows are usually modeled as a graph of activities using graphical composition tools. A scheduler service, supported by a Broker or information service, is used to optimize the use of resources. The scheduled activities are then handed to the Grid engine for execution in the Grid environment. This traditional approach does not fully consider the dynamic and coarse-grained nature of Grid environments, which introduces overheads such as large latencies (several seconds), unpredictable queuing delays, sudden unavailability of existing resources, and external loads on Grid resources. A pure scheduling strategy alone is therefore not sufficient for achieving good execution times in Grid environments. We must understand what really happens during the execution of workflows. Performance analysis tools are important for understanding the behaviour of a workflow and the reasons for its performance losses, so that the run-time performance of the Grid environment can be improved.
In this paper, we propose a systematic approach for developing a performance analysis tool for heterogeneous and distributed Grid environments. A theoretical reference parameter called the Ideal Execution Time is developed to provide a bound for the lowest execution time in Grid environments. Thereafter, we present a novel fine-grained classification of non-overlapping workflow overheads covering middleware, data transfer, loss of parallelism, and activity-specific overheads, which add to the ideal time to produce the measured Grid execution time of the workflow. We develop a performance tool based on event correlation, which reduces the large number of resource-specific messages to a smaller and more meaningful set of messages. We classify the overheads that occur when executing in a Grid environment and validate the classification using a scalability experiment with a real-time application. Finally, we describe a performance analysis tool that analyzes these overheads through event correlation, using a distributed matrix application.

1,2 Pre-final year (Computer Science and Engineering), Government College of Engineering, Tirunelveli
3 Asst. Professor, Dept. of Computer Science and Engineering, Government College of Engineering, Tirunelveli

2. MODEL

In this section, we formally define a workflow application based on a restricted directed graph-based structure that represents the result of our experience in modeling and executing several real-world scientific workflows. This model provided us with a good basis for the clear definition of many of the overheads. We model a scientific workflow as a directed graph $W = (Nodes, C\text{-}edges, D\text{-}edges)$, where

$$Nodes = \{N_1, \dots, N_n\} \qquad (1)$$

is the set of activities,

$$C\text{-}edges = \bigcup_{i=1}^{n-1} (N_i, N_{i+1}) \cup \bigcup_{j,k} (N_j, N_k) \qquad (2)$$

is the set of control flow dependencies, and

$$D\text{-}edges = \bigcup_{N_s, N_d \in Nodes} (N_s, N_d, D\text{-}port) \qquad (3)$$

is the set of data flow dependencies.

An activity can be of one of the following types, representing workflow regions:
1. Computational activity.
2. Parallel region PAR, consisting of a set of independent activities.
3. Subworkflow.

We consider the Grid as an aggregation of heterogeneous Grid sites. A Grid site consists of a number of compute and storage systems that share the same local security, network, and resource management policies.

Our experimental Grid environment consists of homogeneous Grid computers within a Grid site, which are capable of providing their services to remote client computers. Each parallel computer is considered a single computing resource and is managed by a local resource management system implemented on the Grid Broker computer.
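The graph model above can be made concrete with a small data structure. The following is a minimal sketch, assuming activities are identified by name; the class `Workflow`, its method names, and the example port name `matrix` are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Activity types named in the model; the string labels are assumptions.
ACTIVITY_TYPES = ("computational", "parallel_region", "subworkflow")


@dataclass
class Workflow:
    """Directed graph W = (Nodes, C-edges, D-edges)."""
    nodes: dict = field(default_factory=dict)   # activity name -> activity type
    c_edges: set = field(default_factory=set)   # (source, destination)
    d_edges: set = field(default_factory=set)   # (source, destination, data port)

    def add_activity(self, name, kind="computational"):
        if kind not in ACTIVITY_TYPES:
            raise ValueError(f"unknown activity type: {kind}")
        self.nodes[name] = kind

    def add_control_edge(self, src, dst):
        self._check(src, dst)
        self.c_edges.add((src, dst))

    def add_data_edge(self, src, dst, port):
        self._check(src, dst)
        self.d_edges.add((src, dst, port))

    def successors(self, name):
        """Activities that directly follow `name` in the control flow."""
        return sorted(dst for src, dst in self.c_edges if src == name)

    def _check(self, *names):
        for n in names:
            if n not in self.nodes:
                raise KeyError(f"unknown activity: {n}")


def example_workflow():
    """A three-step workflow: N1, then a parallel region, then N3."""
    w = Workflow()
    w.add_activity("N1")
    w.add_activity("PAR", kind="parallel_region")
    w.add_activity("N3")
    w.add_control_edge("N1", "PAR")
    w.add_control_edge("PAR", "N3")
    w.add_data_edge("N1", "N3", "matrix")  # data dependency through a named port
    return w
```

Keeping control flow and data flow in separate edge sets mirrors the model, in which data dependencies are later treated as a source of overhead rather than as part of the ideal time.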

Workflow activities are typically legacy codes that can be remotely accessed and instantiated through back-end application providers, which in our environment are called Grid Brokers. Data transfer activities are performed by the client systems using Java socket calls. In this execution model, we consider only the work performed by the workflow computational activities as useful execution time. The time required for any other task is considered a temporal overhead. To measure and compute the temporal overheads, a distributed performance analysis service collects online events generated by sensors distributed across the Grid sites, which report on the availability and operation of the middleware services.

Figure 1: Example Grid System

In our model, we use N clients and M resource systems. The resource systems run dedicated services for the clients. A client system that wants a service sends its request, along with the required data in a specified format, to the resources via a Broker. The Broker is responsible for scheduling and monitoring, and acts as the information system of this Grid model. The Broker maintains a dynamic repository that contains availability details and binding information for the resource systems.
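The Broker's dynamic repository can be pictured as a small lookup table. The sketch below is only an illustration under stated assumptions: the class name `BrokerRepository`, its methods, and the use of a host and port pair as "binding information" are hypothetical and are not described in the paper.

```python
class BrokerRepository:
    """Hypothetical in-memory repository kept by the Broker."""

    def __init__(self):
        # resource id -> {"host": str, "port": int, "available": bool}
        self._entries = {}

    def register(self, resource_id, host, port, available=True):
        """Record the binding information of a resource system."""
        self._entries[resource_id] = {
            "host": host,
            "port": port,
            "available": available,
        }

    def set_available(self, resource_id, available):
        """Update availability, for example after a monitoring event."""
        self._entries[resource_id]["available"] = available

    def available_resources(self):
        """Resource ids that can currently accept client requests."""
        return sorted(r for r, e in self._entries.items() if e["available"])

    def binding(self, resource_id):
        """Return the (host, port) pair a client request is forwarded to."""
        entry = self._entries[resource_id]
        return entry["host"], entry["port"]


repo = BrokerRepository()
repo.register("R1", "10.0.0.11", 5000)
repo.register("R2", "10.0.0.12", 5000)
repo.set_available("R2", False)  # R2 became unavailable
```

A scheduler built on top of such a table would choose only among `repo.available_resources()` when mapping client requests to resource systems.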

We formally define an event generated by a distinct sensor through a set of attributes $A = \{a_1, \dots, a_n\}$. We represent an event as a quadruple $e = \langle s, t, id, A \rangle$, where $s$ is the sensor that triggers the event, $t$ is the timestamp, $id$ is the unique ID of the event, and $A$ is the set of private attributes that characterize the event. For example, if a sensor called engine associated with the Broker submits a job at time instant $t$ with the event ID 13, the following event is raised: $e = \langle engine, t, 13, \{ipadd, out\} \rangle$. Similarly, the following event is created by a client to indicate that it sends data to the Broker: $e = \langle client, t, eventid, \{ipadd, out, broker\} \rangle$.

3. WORKFLOW TEMPORAL OVERHEADS

To systematically understand the overheads that occur when executing scientific workflows in Grid environments, we model the real execution time $T_W$ of a workflow $W$ as the sum of an ideal execution time $T_W^{ideal}$ and a set of temporal overheads $O_W$ that originate from various sources:

$$T_W = T_W^{ideal} + O_W \qquad (4)$$

Describing the sources of the overheads, classifying them in a detailed hierarchy, and measuring them in a systematic manner are the scope of our analysis.

3.1. Ideal Workflow Execution Time

We introduce a so-called ideal execution time that defines a theoretical lower bound for the Grid workflow execution time, assuming the following ideal premises for our analysis:
1. Zero network latency.
2. Infinite network bandwidth.
3. An optimal workflow schedule.
4. A sufficiently large number of processors to simultaneously execute all activities of parallel regions (that is, one separate processor per activity).

We define the ideal execution time of a computational activity $N$ scheduled on a processor $P$ as the fastest wall-clock execution time $T_N^P$ measured on all processors available on the Grid that have the same architecture as $P$:

$$T_N^{ideal} = \min_{\forall P \in Grid} \{T_N^P\} \qquad (5)$$
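As a small worked illustration of Eq. (5), the sketch below picks the fastest measured wall-clock time among processors of the same architecture as the one the activity was scheduled on. The function name, the input layout, and the sample measurements are assumptions made for illustration only.

```python
def ideal_activity_time(measurements, scheduled_on):
    """Ideal execution time of one activity, following Eq. (5).

    measurements: dict mapping processor name -> (architecture, seconds),
                  the wall-clock times measured for this activity.
    scheduled_on: the processor P the activity was scheduled on.
    Only processors with the same architecture as P are considered.
    """
    architecture = measurements[scheduled_on][0]
    return min(
        seconds
        for arch, seconds in measurements.values()
        if arch == architecture
    )


# Hypothetical measurements for a single activity N.
measured = {
    "P1": ("x86", 12.0),
    "P2": ("x86", 9.5),
    "P3": ("sparc", 7.0),
}
ideal_on_p1 = ideal_activity_time(measured, "P1")  # compares P1 and P2 only
ideal_on_p3 = ideal_activity_time(measured, "P3")  # P3 is alone in its class
```

Restricting the minimum to one architecture keeps the reference time comparable with the measured run, which is what allows the difference between the two to be read as overhead.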

The general technique we use to measure $T_N^P$ is to submit the executable to the Broker and collect the wall-clock time from the job's standard error stream. We ignore data dependencies in the ideal time, since we consider them an overhead: $D\text{-}edges = \emptyset$. The ideal execution time of a parallel region PAR is the fastest ideal execution time of all its activities:

$$T_{PAR}^{ideal} = \min_{\forall N \in PAR} \{T_N^{ideal}\} \qquad (6)$$

Finally, the ideal execution time of a workflow $W$ is the sum of the ideal execution times of all its activities in $Nodes$:

$$T_W^{ideal} = \sum_{\forall N \in Nodes} T_N^{ideal} \qquad (7)$$

We define the total overhead of a workflow $W$ as the difference between its measured Grid execution time $T_W$ and the ideal time:

$$O_W = T_W - T_W^{ideal} \qquad (8)$$

3.2. Overhead Classification

To understand the losses of performance in scientific workflows, we split the total overhead into a hierarchy based on four main categories: middleware, data transfer, loss of parallelism, and activity related.

Middleware Overhead

The middleware overhead is due to the work performed by the middleware services to support the proper execution and completion of the workflow. We divide it further into several components based on the service functionality.

Resource brokerage: This represents the time required by the Resource Broker to query the information service and to provide the Scheduler with the processors and activity deployments needed to execute the application. Additionally, this overhead has an important latency component (a few seconds), mostly due to mutual host authentication. This service latency is a common overhead component present in all our middleware services.

Performance prediction: This represents the time to provide forecast information about the execution time of individual activities on the Grid sites indicated by the Scheduler, for example, using a polynomial fitting heuristic based on historical or training data.
Scheduling: This represents the time to appropriately map the workflow activities onto the Grid resources, which includes the following two sub-overheads: (1) the scheduling algorithm, which represents the time required to compute a schedule, and (2) rescheduling, which represents the time needed to make a new scheduling decision, for example, because of a performance contract violation or because the workflow changes its runtime structure.

Execution management: This represents the time to coordinate the execution of workflow activities based on the control and data flow dependencies, which we divide into two categories:

Control of parallelism: This represents the time required to fork a set of activities at the beginning of a parallel region or to join them at the end.

Job preparation: This is an overhead that we split into three categories, corresponding to the time for (1) partitioning and simplifying the workflow such that multiple activities scheduled on the same site are aggregated into one composite activity and executed as one single job submission; (2) archiving, compressing, or uncompressing multiple files that have to be transferred between data-dependent activities or partitions scheduled on different sites, in order to reduce latencies and increase bandwidth utilization; and (3) creating the remote directory structures required to execute activities that wrap existing legacy applications with peculiar requirements.

Data Transfer Overhead

The data transfer overhead is generated by any kind of data transfer, including input/output file staging between the local computer and the remote Grid site, database access (for example, to store and access historical execution data), third-party file transfer for large data dependencies, user input (that is, when user interaction is required), imbalanced parallel data transfer (for example, due to different data sizes or network bandwidth), and lost transfers upon workflow rescheduling or rollback. Presently, all these overheads include any network traffic or interruptions, since, unlike processors, the wide area network is a shared resource over which we have no control using our current sensors.

Loss of Parallelism Overhead

The loss of parallelism overhead is introduced by parallel regions and has various causes.
Serialized block: This occurs when, as a result of a scheduling decision, $n$ independent activities (for example, part of a parallel region) are scheduled onto the same Grid site consisting of $p$ processors, where $p < n$. In this case, the Scheduler introduces so-called runtime schedule dependencies that prohibit two activities from running simultaneously on the same processor. We call the set of activities of a parallel region PAR that are serialized through runtime schedule dependencies a serialized block:

$$B = \{N_1, \dots, N_n \in PAR \mid S(N_1) = \dots = S(N_n)\}$$
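The grouping into serialized blocks can be sketched as follows. This is an illustrative sketch, assuming the schedule is given as a mapping from activity to processor and that activities on one processor run in the order listed; the function names and the five-activity example are hypothetical.

```python
from collections import defaultdict


def serialized_blocks(schedule):
    """Group the activities of a parallel region by scheduled processor.

    schedule: dict mapping activity -> processor, listed in execution order.
    Returns a dict mapping processor -> list of activities (one block each).
    """
    blocks = defaultdict(list)
    for activity, processor in schedule.items():
        blocks[processor].append(activity)
    return dict(blocks)


def runtime_schedule_dependencies(blocks):
    """Chain consecutive activities of each block so they cannot overlap."""
    dependencies = []
    for activities in blocks.values():
        dependencies.extend(zip(activities, activities[1:]))
    return dependencies


# Hypothetical parallel region: five activities on two processors.
schedule = {"N1": "P1", "N2": "P2", "N3": "P1", "N4": "P2", "N5": "P1"}
blocks = serialized_blocks(schedule)
dependencies = runtime_schedule_dependencies(blocks)
```

Each added dependency forces one activity to wait for its predecessor on the same processor, which is exactly the parallelism that the serialized block overhead accounts for.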

In this definition, $S(N)$ denotes the schedule of activity $N$. The execution time of a serialized block is the sum of the execution times of the serialized activities:

$$T_B = \sum_{\forall N \in B} T_N \qquad (9)$$

For example, assume a parallel region consisting of the seven parallel activities shown in Figure 2,

Figure 2: Parallel Region Overhead

$$PAR = \bigcup_{i=1}^{7} N_i \qquad (10)$$

such that $S(N_1) = S(N_4) = S(N_7) = P_1$, $S(N_2) = S(N_5) = P_2$, and $S(N_3) = S(N_6) = P_3$ (where $P_1$ and $P_2$ are part of a dual shared-memory computer). Since the number of parallel activities exceeds the number of processors available, the Scheduler generates three serialized blocks by enhancing the workflow with the following runtime schedule dependencies: $\{(N_1, N_4), (N_4, N_7), (N_2, N_5), (N_3, N_6)\}$.

Load imbalance: This occurs in the context of a parallel region when some computational activities finish faster than others and leave the allocated processors idle. We define the load imbalance overhead of a parallel region PAR as the difference between the maximum serialized block end time and the theoretical ideal execution time of the parallel region on the set of processors.

Replicated work: This is a technique often applied when performing the same computation redundantly on multiple sites in parallel is faster than computing it on the fastest Grid site and then broadcasting the data. Since replicated jobs are never synchronized and therefore do not produce load imbalance, we calculate the replicated job overhead of a set of identical activities PAR as the ideal time for computing the entire replicated work (that is, the harmonic mean of the execution times of all replicated jobs divided by the number of jobs) minus the end time of the fastest job, which represents the only useful computation:

$$O_{RJ}^{PAR} = \frac{1}{\sum_{\forall N \in PAR} \frac{1}{T_N}} - \min_{\forall N \in PAR} \{T_N\} \qquad (11)$$

Activity Overhead

The activity overhead comprises all overheads that are internal to individual activities, for example, those extensively addressed in previous performance analysis research for parallel programs (such as communication, load imbalance, cache misses, and synchronization). In addition, we define two new Grid-relevant overheads:

1. The external load of a computational activity $N$ on a processor $P$, which we compute as the difference between the currently measured wall-clock time and the ideal execution time of the activity:

$$O_{EL} = T_N^P - T_N^{ideal} \qquad (12)$$

2. Lost computation, an important overhead that commonly occurs due to the re-execution of a computational activity (for example, in case of failures, rollback to the last checkpoint, or when new and substantially more powerful Grid sites become available), which we define as the time difference between the earliest $e_{started}$ event and the latest $e_{failed}$ event reported by the Broker.

Region-Based Overheads

Since every workflow activity produces its own set of overheads, an obvious question is how to aggregate these overheads from individual activities up to parallel regions and to the overall workflow level. Let $O_N$ denote an arbitrary overhead of an activity $N$. We define the same temporal overhead for an enclosing parallel region, $O_{PAR}$, as the sum of the contributed overheads $O_N^{CON}$ of all individual activities $N \in PAR$. The contributed overhead of an activity to an enclosing parallel region is the temporal overhead $O_N$ averaged across the processors used in executing the parallel region (that is, as indicated by the Scheduler, which is not necessarily the full available Grid), where each processor $P$ is weighted with its relative speed (the execution time of $N$ on processor $P$):

$$O_{PAR} = \sum_{\forall N \in PAR} O_N^{CON}, \quad \text{where} \quad O_N^{CON} = \frac{O_N}{\sum_{\forall P \in Grid} T_N^P} \qquad (13)$$

and $T_N^P$ denotes the wall-clock execution time of activity $N$ on processor $P$.
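The two Grid-relevant activity overheads defined above, external load (Eq. 12) and lost computation, can be computed directly from measured times and Broker events. The sketch below is illustrative only: the `Event` class follows the quadruple of sensor, timestamp, ID, and attributes, but storing the event type under an attribute key named `type`, and the sample timestamps, are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Event quadruple <sensor, timestamp, id, attributes>."""
    sensor: str
    timestamp: float
    event_id: int
    attributes: dict = field(default_factory=dict)


def external_load(measured_time, ideal_time):
    """Eq. (12): measured wall-clock time minus ideal execution time."""
    return measured_time - ideal_time


def lost_computation(events):
    """Latest 'failed' timestamp minus earliest 'started' timestamp.

    Returns 0.0 when the activity never failed or never started.
    """
    started = [e.timestamp for e in events if e.attributes.get("type") == "started"]
    failed = [e.timestamp for e in events if e.attributes.get("type") == "failed"]
    if not started or not failed:
        return 0.0
    return max(failed) - min(started)


# Hypothetical event trace of one activity that failed twice.
trace = [
    Event("broker", 10.0, 1, {"type": "started"}),
    Event("broker", 25.0, 2, {"type": "failed"}),
    Event("broker", 30.0, 3, {"type": "started"}),
    Event("broker", 42.0, 4, {"type": "failed"}),
]
lost = lost_computation(trace)
load = external_load(measured_time=14.0, ideal_time=9.5)
```

Both values are plain time differences, so they can be added to the other non-overlapping overheads of the same activity when aggregating to the region or workflow level.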

Similar to the workflow ideal execution time, we compute the value of any overhead, such as middleware, load imbalance, serialization, data transfer, external load, or replicated work, for any workflow region by summing the overhead values computed for the underlying computational activities, parallel regions, and subworkflows:

$$O_W = \sum_{\forall N \in Nodes} O_N \qquad (14)$$

Unidentified Overheads

Since we carefully designed the overheads as non-overlapping, we can compute the total identified overhead $O_W^{identified}$ as the sum of all measured overheads for the entire workflow. We call the difference between the theoretical total overhead $O_W$ and the total identified overhead the unidentified overhead:

$$O_W^{unidentified} = O_W - O_W^{identified} \qquad (15)$$

Minimizing the unidentified overhead is one important goal of our analysis effort. A high unidentified overhead indicates that the analysis is unsatisfactory and that further effort is required to identify new sources of overhead in the workflow execution.

Normalized Metrics

Normalized metrics are used to quantify the importance of an overhead with respect to the entire workflow execution. We define the severity of an overhead $O$ as the value of the overhead relative to the Grid execution time:

$$SEV_O = \frac{O}{T_W} \qquad (16)$$

We define the workflow speedup as the ratio between the fastest single-site workflow execution time $T_{seq}^M$ and the actual Grid execution time $T_W$:

$$S = \frac{\min_{\forall M \in Grid} \{T_{seq}^M\}}{T_W} \qquad (17)$$

Further, we define the workflow efficiency as the speedup normalized against the number of Grid sites used, where each site $M$ is weighted with the speedup of the corresponding single-site execution time:

$$E = \frac{S}{\sum_{\forall M \in Grid} S_M}, \quad \text{where} \quad S_M = \frac{\min_{\forall M' \in Grid} \{T_{seq}^{M'}\}}{T_{seq}^M} \qquad (18)$$

The efficiency formula therefore becomes

$$E = \frac{1}{T_W \cdot \sum_{\forall M \in Grid} \left(T_{seq}^M\right)^{-1}} \qquad (19)$$

The fastest Grid site has a weight of one, whereas the slowest Grid site has the smallest weight, closest to zero.
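The normalized metrics (Eqs. 16 to 19) and the unidentified overhead (Eq. 15) reduce to a few arithmetic operations. The sketch below assumes that measured times are available as plain numbers; the function names and the sample values are illustrative and not taken from the paper's experiments.

```python
import math


def unidentified_overhead(t_w, t_ideal, identified):
    """Eq. (15): total overhead (T_W - T_W^ideal) minus identified overheads."""
    return (t_w - t_ideal) - sum(identified.values())


def severity(overhead, t_w):
    """Eq. (16): overhead relative to the Grid execution time."""
    return overhead / t_w


def speedup(seq_times, t_w):
    """Eq. (17): fastest single-site time over the Grid execution time."""
    return min(seq_times.values()) / t_w


def efficiency(seq_times, t_w):
    """Eq. (18): speedup divided by the sum of per-site weights S_M."""
    fastest = min(seq_times.values())
    weight_sum = sum(fastest / t for t in seq_times.values())
    return speedup(seq_times, t_w) / weight_sum


def efficiency_closed_form(seq_times, t_w):
    """Eq. (19): the same efficiency after cancelling the fastest time."""
    return 1.0 / (t_w * sum(1.0 / t for t in seq_times.values()))


# Hypothetical measurements, in seconds.
t_w, t_ideal = 80.0, 50.0
identified = {"middleware": 12.0, "data_transfer": 10.0, "loss_of_parallelism": 5.0}
seq_times = {"M1": 100.0, "M2": 200.0}  # single-site execution times

remaining = unidentified_overhead(t_w, t_ideal, identified)
middleware_severity = severity(identified["middleware"], t_w)
s = speedup(seq_times, t_w)
e = efficiency(seq_times, t_w)
```

Computing the efficiency through both Eq. (18) and Eq. (19) is a cheap consistency check: the two functions should agree up to floating-point rounding for any set of single-site times.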

5. CONCLUSION

In this paper, we presented an analysis model consisting of a theoretical ideal execution time and a detailed hierarchy of overheads to help application developers understand the various sources of bottlenecks that affect the execution of workflows in a heterogeneous Grid execution model. We also adjusted well-known normalized metrics from parallel processing to the Grid computing scope, including overhead severity, speedup, and efficiency, which are invaluable parameters to consider before scheduling when the efficient use of resources is an important issue. In the future, we plan to develop solutions for overcoming these overheads, which affect the execution time of the application.

REFERENCES

[1] Radu Prodan and Thomas Fahringer, Overhead Analysis of Scientific Workflows in Grid Environments, IEEE Trans. on Parallel and Distributed Systems, 19, No. 3, March 2008.
[2] K. Czajkowski et al., Grid Information Services for Distributed Resource Sharing, Proc. 10th IEEE Int'l Symp. High Performance Distributed Computing (HPDC).
[3] R. Vaarandi, SEC - A Lightweight Event Correlation Tool, Proc. Workshop IP Operations and Management (IPOM).
[4] Herbert Schildt, Java: The Complete Reference.
