A novel task scheduling algorithm based on dynamic critical path and effective duplication for pervasive computing environment


WIRELESS COMMUNICATIONS AND MOBILE COMPUTING
Wirel. Commun. Mob. Comput. 2010; 10:
Published online 12 December 2008 in Wiley Online Library (wileyonlinelibrary.com).717

Junzhou Luo, Fang Dong, Jiuxin Cao and Aibo Song
School of Computer Science and Engineering, Southeast University, Nanjing, P.R. China

Summary

Task scheduling is a key issue in pervasive computing systems, because it determines how effectively massive heterogeneous resources are utilized and how transparently computing capability is provided to upper-layer applications. Previously proposed priority and duplication based task scheduling algorithms that can be applied in pervasive computing environments usually have the following limitations: the critical path cannot be calculated accurately, because the effect of resource availability is neglected during scheduling; and, in the duplication based resource allocation stage, unrestricted duplication can have negative effects on the final schedule length (SL). To solve these problems, this paper presents a novel task scheduling algorithm based on dynamic critical path (DCP) and effective duplication, called DCPED. DCPED introduces a more accurate DCP calculation method that takes resource availability into account. It also proposes an effective task duplication strategy that eliminates ineffective duplications and optimizes the schedule result by using a space compression technique and a dynamic critical path length (DCPL) based evaluation technique, respectively. Simulation results show that DCPED significantly outperforms previous algorithms in the NSL and speedup metrics. It is especially effective at utilizing computing resources and at scheduling fine-grain, large-scale workflow applications in pervasive computing systems.

Copyright 2008 John Wiley & Sons, Ltd.
KEY WORDS: pervasive computing; task scheduling; dynamic critical path; effective task duplication

1. Introduction

With the spread of the Internet, many critical resources and computing devices, such as handheld and wearable computers, wireless LANs, and devices that sense and control appliances, are increasingly shared over networks. Meanwhile, as computing applications penetrate everyday life, people expect to use information services anytime and anywhere without impediment. Pervasive computing has emerged accordingly. Its central idea is to provide transparent computing ability to users wherever and whenever they wish, thus improving human experience and quality of life without explicit awareness of the underlying communications and computing technologies [1].

Correspondence to: Fang Dong, School of Computer Science and Engineering, Southeast University (Si Pai Lou 2#), Nanjing, P.R. China. E-mail: fdong@seu.edu.cn

Several pervasive computing projects have emerged at major universities, including project Aura [2] at Carnegie Mellon University, Endeavour [3] at the University of California at Berkeley (UC Berkeley) and Oxygen [4] at the Massachusetts Institute of Technology (MIT). Although each of these projects addresses a different mixture of issues in pervasive computing, their common objective is to provide transparent computing capability to different people for executing users' complex tasks, without requiring them to be concerned with the underlying technologies.

The architecture of pervasive computing can be divided into two main layers: the interaction layer and the computing layer. The interaction layer realizes the interaction between users and the pervasive environment, obtains relevant contexts and transmits users' requests from pervasive devices (PDA, sensor, etc.) to the lower computing platform. The computing layer provides transparent computing capability to the upper layer, either to execute users' jobs or to respond and adjust the pervasive system according to the changing context. It is therefore very important to organize the massive computing resources located in a large-scale heterogeneous environment so that they provide uniform, transparent computing ability to users, and the task scheduling strategy is accordingly of great importance in a pervasive computing system.

In most cases, a pervasive computing application (or job) consists of several atomic computation sub-tasks constrained by certain processes (e.g. sequential order or parallel order). For example, in a pervasive learning system, a recommendation mechanism is necessary to provide users with appropriate education resources selected from massive education resources [5], and the main process is shown in Figure 1.
From Figure 1 we can deduce that pervasive applications (or jobs) can generally be denoted as a workflow (or task-flow) represented by a directed acyclic graph (DAG), in which nodes represent application tasks and edges represent inter-task data dependencies. The objective of task scheduling is to map these tasks onto different heterogeneous resources and order their executions so that the precedence relations between tasks are satisfied and a minimum overall completion time is obtained. However, unlike in traditional parallel or distributed computing environments, computing resources in a pervasive environment are located in a wide area network, and the transmission media and communication modes between them vary. Thus not only the computing capabilities but also the bandwidths of these computing resources are heterogeneous. Most traditional task scheduling strategies assume a homogeneous environment in which every node has the same computing capability and communication bandwidths are either neglected or treated homogeneously. At present, therefore, no mature technique fully achieves the goal of task scheduling in a large-scale pervasive computing environment. How to achieve effective task scheduling thus becomes a key challenge in pervasive computing environments, and it is closely linked with system performance.

In a broad sense, the scheduling problem exists in two forms in pervasive computing systems: compile-time scheduling and run-time scheduling. In compile-time scheduling, task profiling and analytical benchmarking tools play an important role in providing the right estimation of costs on tasks and links prior to scheduling [7]. Sometimes the performance estimate cannot be obtained exactly, because of the dynamic character of resources in a pervasive computing environment.
By contrast, run-time scheduling is more likely to make realistic decisions, because it considers the most up-to-date schedule information. On the other hand, this approach neglects the structure of the application graph and always chooses the most appropriate solution for a single task. It is also undermined by the unavoidable overheads of iterative real-time information retrieval and of multi-step task scheduling and assignment, so the whole application may not be scheduled optimally. Therefore, compile-time scheduling is considered in our work: it not only avoids the run-time scheduling overheads but also exploits the overall structure information of applications effectively. Hence it can afford to use sophisticated scheduling heuristics to generate much better scheduling results. In addition, to address the problem of inexact estimation, the compile-time scheduling technique can be used to generate a good initial schedule, and rescheduling techniques may then be used at run time to improve it and adapt to the dynamic character of pervasive computing systems. In this paper, we do not focus on the rescheduling technique.

Fig. 1. The main process of resource recommendation.

Because of its importance, the compile-time scheduling problem has been extensively studied and many heuristic algorithms have been proposed in the literature [6–22]. Generally, these approaches are either priority-based scheduling [6,8–10] or duplication-based scheduling [7,11–19]. Priority-based scheduling is a two-step approach. In the first step, priorities are assigned to tasks and the task with the highest priority is chosen for scheduling. In the second step, the most suitable resource, usually the one giving the earliest finish time, is selected to execute this task. Duplication-based techniques, on the other hand, have been found to be quite effective for message-passing systems [18].
The main idea of task duplication is to duplicate the more critical tasks redundantly onto some resources in order to minimize communication costs among dependent tasks and thereby reduce the schedule length (SL). In most cases, the combination of task duplication with priority-based scheduling clearly outperforms the other approaches [7,11–19]; the duplication phase is used as an additional part of the second stage of the priority-based algorithm.

In recent years, several priority and duplication based heterogeneous scheduling algorithms have been proposed (such as HLD [7], LDBS [11] and HCPFD [14]), but several problems remain. In the task selection stage, to handle the fact that the critical path may change dynamically during scheduling, Kwok et al. [10] used the dynamic critical path (DCP) mechanism to select tasks for scheduling in an environment with an unbounded number of homogeneous resources. In a pervasive computing environment, however, the resources are heterogeneous and their number is bounded, so the critical path may also be affected by the availability of resources. Previous research has paid little attention to this issue. If these algorithms are applied to pervasive computing systems directly, the critical path cannot be obtained accurately during scheduling.

In the duplication based resource allocation stage, unrestricted duplication can have negative effects on the final SL. Previous duplication-based algorithms neglected the trade-off between the benefit of reducing inter-task communication costs and the cost of occupying the available time of a resource, which produces many ineffective duplications. Therefore, in order to reduce the overall SL, some duplications should be restricted.
In Reference [17], by restricting the duplication conditions, the SD algorithm avoids ineffective task duplications to some extent. However, it is only applicable to homogeneous environments, and its simple strategy may both hinder further optimization and have negative effects on the scheduling of subsequent tasks.

To overcome these problems, this paper proposes a novel task scheduling algorithm based on the dynamic critical path and effective duplication (DCPED). In the task selection stage of this algorithm, a more accurate critical path calculation method is introduced, in which the traditional tlevel and blevel [6] (the sum of mean computation and communication costs along the longest directed path from the concerned task to the entry task and to the exit task, respectively) are extended to account for the effect of resource availability. A task selection strategy that recursively traverses the unscheduled predecessors is also presented. In the duplication based resource allocation stage, all ineffective duplications are divided into two types: forbidden duplication and redundant duplication. A new duplication strategy is then proposed to eliminate these ineffective duplications by using a space compression technique and a dynamic critical path length (DCPL) based evaluation technique, respectively, while making as many effective duplications as possible in order to minimize the final schedule length.

The main contributions of our work are:
1. Introduce a novel DCP calculation method that considers the effect of resource available time.
2. Present a task selection strategy that recursively traverses the unscheduled predecessors.
3. Divide all ineffective duplications into two types, forbidden duplication and redundant duplication, and propose a new duplication strategy that eliminates them and thereby leads to many more effective duplications.

The remainder of this paper is organized as follows. In the next section, we provide a taxonomy of task scheduling algorithms and the related work. In Section 3, we describe the schedule model and the related terminology in detail. Section 4 elaborates and analyses our novel heterogeneous task scheduling algorithm, DCPED. Section 5 presents the simulation results and performance analysis. The summary of the research and the direction of future work are given in Section 6.

2. Related Work

The well-known work on scheduling precedence-constrained task graphs is based on simple yet effective priority and duplication-based heuristics such as heterogeneous earliest finish time (HEFT) [6], the dynamic critical path algorithm (DCP) [10], heterogeneous critical parents with fast duplicator (HCPFD) [14], selective duplication (SD) [17] and levelized duplication based scheduling (LDBS) [11]. Of these, the first two are purely priority-based heuristics and the remaining three combine priority and duplication.

The main difference between priority-based heuristics lies in their strategies for selecting the candidate task for scheduling. Some algorithms select the candidate task with the largest blevel, while others attempt to select the candidate task by following the critical path. More specifically, the HEFT algorithm (O(pt^2), where p denotes the number of resources and t denotes the number of tasks in the DAG) assigns priorities to the tasks on the basis of their blevel values, and the task with the highest priority is scheduled on the processor that completes its execution at the earliest time.
However, this approach does not consider the critical path of the DAG and therefore cannot obtain the proper priority. DCP (O(t^3)), a dynamic priority scheduling heuristic for homogeneous systems, selects the candidate task by recalculating the current DCP at each schedule step, so a more realistic priority can be obtained for each task. As mentioned above, however, this approach neglects resource availability and therefore still cannot obtain accurate task priorities.

The main difference between the priority and duplication combined heuristics lies in their strategies for duplicating tasks. To reduce the start time of tasks, some algorithms duplicate only the critical immediate predecessor, while others attempt to duplicate all immediate predecessors. More specifically, the HCPFD algorithm (O(pt^2)) assigns priorities to tasks on the basis of the static critical path, and only the critical immediate predecessor of a task is considered for duplication. Therefore, although it has relatively low complexity, it cannot duplicate effectively. The SD algorithm (O(pt^2 d_max + e), where d_max is the maximum number of immediate predecessors of a task) also assigns priorities to tasks on the basis of the static critical path. In the duplication based resource allocation stage, it proposes a duplication condition to eliminate ineffective duplication: a task is duplicated only if ECT(T_can) can be improved. However, this simple approach cannot eliminate ineffective duplications completely, and multi-level predecessors are not considered for duplication. In addition, there is another algorithm called HLD [7], which is the extension of SD to heterogeneous environments. LDBS (O(p^3 e t^3)) is a dynamic priority and duplication based scheduling heuristic in which tasks are scheduled level by level, starting from the top. In the task selection stage, the candidate task with the earliest start time is selected.
In the duplication based resource allocation stage, all immediate predecessors are considered for duplication on each resource in order to minimize the completion time of the candidate task. However, because of the levelized approach adopted by both LDBS algorithms, the priority is localized to a particular precedence level, which may not reflect the true priority for scheduling a task. In addition, its duplication strategy is not very effective and has higher complexity. The characteristics of these well-known scheduling algorithms are summarized in Figure 2.

Fig. 2. Some priority and duplication-based scheduling algorithms and their characteristics.

This analysis indicates that, in the task selection process, the correct task priority cannot be obtained, because the availability of heterogeneous resources is neglected when calculating the critical path of the DAG. In the duplication process, only the immediate predecessors are considered for duplication, and the question of how to eliminate ineffective duplications is ignored. Consequently, there is a strong need for an algorithm that accurately calculates the critical path of the DAG to select the proper candidate task, and that exploits the benefits of duplication while eliminating ineffective duplications.

3. Scheduling Problem Formulation

In general, the scheduling system model in a pervasive computing environment consists of a large number of application programs submitted by users and a target computing resource environment. Many important pervasive computing applications fall into the category of workflow (or task-flow) applications, for example Gauss elimination [6] and mean value analysis [10]. Instead of being a single large component that performs the whole workload, a workflow application consists of several interacting tasks that need to be executed in a certain partial order for the application as a whole to execute successfully [22]. In most cases, an application can be represented by a DAG, as a triple G = (T, R, C), in which vertices represent tasks and edges represent inter-task data dependencies. Here T is the set of t partitioned tasks; R represents a partial order on the task set T such that if T_i R T_j, then task T_i must complete its execution before T_j can start; and C is a t × t matrix of communication data, where C_{i,j} is the amount of data that must be transmitted from task T_i to T_j if these two tasks are assigned to different resources. Without loss of generality, we assume that a DAG has only one entry task (without any parents) and one exit task (without any successors). Figure 3 gives the task graph structures of several simple workflow applications.

The target computing resource environment consists of a set P of p heterogeneous resources connected in a fully connected topology, in which all inter-resource communications are assumed to be performed without contention.
B is a p × p bandwidth matrix, where B_{m,n} is the bandwidth between P_m and P_n. Communication links between resources are assumed to be contention free and to allow concurrent computation and communication. Further, the communication overhead between two tasks scheduled on the same resource can be ignored. The objective of scheduling is to allocate each task in the task graph to a resource and assign it a start time so that the SL, or makespan, is minimized. It is defined in Equation (6). In addition, the parameters used in the algorithm are described mathematically in Figure 4.

Fig. 3. The task graph structures of different workflow applications.

Fig. 4. Mathematical description of parameters.

4. Dynamic Critical Path and Effective Duplication-based Task Scheduling Algorithm

Like every priority and duplication-based scheduling algorithm, the DCPED algorithm consists of two stages: the task selection stage and the task duplication-based resource allocation stage. In the first stage of DCPED, a novel and much more accurate DCP calculation method is proposed. It takes the available time of resources into account and extends the traditional tlevel and blevel. At each schedule step, the current DCPL is calculated and the candidate task (T_can) is identified by using a recursion-based selection strategy. In the second stage of DCPED, all ineffective duplications are divided into two kinds: forbidden duplication and redundant duplication. Both kinds are eliminated, while as many effective duplications as possible are made, by using space compression and DCPL evaluation techniques. Then T_can is assigned to the resource with the minimum DCPL in order to obtain a better final SL.

4.1. Task Selection

In the DAG scheduling problem, the critical path length [10] potentially determines the SL. Therefore, in order to minimize the final SL, the tasks located on the critical path should be assigned higher priority. During scheduling, however, the critical path of the task graph may change dynamically. In a homogeneous system, the main reason is that the communication cost between two tasks can be neglected when they are scheduled onto the same resource. To solve this problem, Kwok et al. [10] used the DCP mechanism to select tasks for scheduling in an environment with an unbounded number of homogeneous resources. In Reference [10], after each task has been scheduled, the current DCP is recalculated and the candidate task is then selected under a certain strategy. In a pervasive computing system, however, which is an environment with a bounded number of heterogeneous resources, the critical path may change dynamically for two more reasons:
1. As scheduling proceeds, the mean values of the communication and execution costs of unscheduled tasks, which were used to calculate the former critical path, are gradually replaced by determinate values.
2. The earliest start time of certain tasks may be restricted by the availability of resources, and as scheduling proceeds, the workload assigned to a given resource may gradually increase.
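As a point of reference for the two reasons above, the static levels computed from mean costs can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common convention that tlevel excludes the task's own cost while blevel includes it, and the four-task graph, the cost values and the function names are hypothetical.

```python
from collections import defaultdict


def topological_order(tasks, edges):
    """Kahn's algorithm; `edges` maps (u, v) to a mean communication cost."""
    indegree = {t: 0 for t in tasks}
    successors = defaultdict(list)
    for (u, v) in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = [t for t in tasks if indegree[t] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order


def static_levels(tasks, edges, mean_cost):
    """Return (tlevel, blevel) computed from mean costs only.

    Assumed convention: tlevel excludes the task's own cost,
    blevel includes it.
    """
    order = topological_order(tasks, edges)
    tlevel = {t: 0.0 for t in tasks}
    for u in order:  # predecessors are final before successors
        for (a, b), comm in edges.items():
            if a == u:
                tlevel[b] = max(tlevel[b], tlevel[u] + mean_cost[u] + comm)
    blevel = {t: float(mean_cost[t]) for t in tasks}
    for u in reversed(order):  # successors are final before predecessors
        for (a, b), comm in edges.items():
            if a == u:
                blevel[u] = max(blevel[u], mean_cost[u] + comm + blevel[b])
    return tlevel, blevel


# Hypothetical four-task graph: A -> B -> D and A -> C -> D.
tasks = ["A", "B", "C", "D"]
edges = {("A", "B"): 2, ("A", "C"): 1, ("B", "D"): 3, ("C", "D"): 1}
mean_cost = {"A": 2, "B": 3, "C": 4, "D": 1}

tlevel, blevel = static_levels(tasks, edges, mean_cost)
for t in tasks:
    print(t, tlevel[t], blevel[t], tlevel[t] + blevel[t])
```

In this toy graph the tasks A, B and D share the largest tlevel + blevel value and so form the static critical path. Both reasons listed above change these numbers as scheduling proceeds, which is why a static calculation of this kind is not sufficient here.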
Thus, in order to minimize the final SL, the DCP should be calculated by a new method that captures the dynamic changes during the scheduling process. The earliest unscheduled task located on the current DCP should then be selected as the preference task (T_pre) to be considered for scheduling at the current step. In order to calculate the DCP and identify T_pre, the length of the DCP via T_i is first defined as follows.

Definition 1. The length of the DCP via T_i, denoted DCPL(T_i), is the sum of the anticipated start time (AS) of T_i and its anticipated remnant execution time (AR), where AR denotes the anticipated execution time from this task to T_exit.

According to the scheduling attributes, the current DCP passes through either the ready task set (R) or the partial ready task set (PR). Thus, based on Definition 1, the current global DCPL can be defined as:

\[
\mathrm{DCPL} = \max_{T_i \in R \cup PR} \{\mathrm{DCPL}(T_i)\}
\]

Furthermore, T_pre can be identified as the task T_i that belongs to either R or PR and possesses the maximum DCPL(T_i).

In previous algorithms, AS and AR are usually represented by Tlevel(T_i) and Blevel(T_i), respectively. However, the calculation of Tlevel neglects the current resource available time.
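Definition 1 and the selection of T_pre can be illustrated with a short sketch. It assumes that the AS and AR values are already available as dictionaries; the function name, the alphabetical tie-breaking rule and the sample values are hypothetical and are not taken from the paper.

```python
def select_preference_task(anticipated_start, anticipated_remnant,
                           ready, partial_ready):
    """Return (global DCPL, T_pre, per-task DCPL) over R union PR.

    DCPL(T_i) = AS(T_i) + AR(T_i), as in Definition 1.
    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    candidates = set(ready) | set(partial_ready)
    dcpl = {t: anticipated_start[t] + anticipated_remnant[t]
            for t in candidates}
    # max() keeps the first maximal element, so sorting fixes the tie-break.
    t_pre = max(sorted(dcpl), key=dcpl.get)
    return dcpl[t_pre], t_pre, dcpl


# Hypothetical values for three tasks.
AS = {"B": 4, "C": 3, "D": 10}
AR = {"B": 7, "C": 6, "D": 1}
length, t_pre, per_task = select_preference_task(
    AS, AR, ready={"B", "C"}, partial_ready={"D"})
print(length, t_pre)
```

How AS and AR are obtained is the substantive question; the sketch only shows how the maximum over the ready and partial ready sets determines the preference task once those values are known.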

Fig. 5. Deviation between Tlevel(T_i) and EST(T_i).

As shown in Figure 5, because several workloads are already assigned to the resource, the actual start time of T_i may be delayed from a to b. Hence, AST(T_i) will be much later than the anticipated data arrival time of T_i's CP, which is what Tlevel(T_i) denotes. To take resource availability into consideration, for each ready task T_i the AS is denoted by the mean value of EST(T_i) over all resources, MEST(T_i), defined as follows:

\[
\mathrm{MEST}(T_i) = \frac{1}{|P|}\bigl(\mathrm{EST}(T_i,P_1) + \mathrm{EST}(T_i,P_2) + \cdots + \mathrm{EST}(T_i,P_{|P|})\bigr) = \frac{1}{|P|}\sum_{m \in (1,|P|)} \mathrm{EST}(T_i,P_m) \tag{7}
\]

Extending this to the partial ready task set, we define the dynamic anticipated start time of each ready and partial ready task in the heterogeneous environment, called HD-Tlevel(T_i).

Definition 2. HD-Tlevel(T_i) is the extension of Tlevel under consideration of resource available time. HD-Tlevel(T_i) can be calculated recursively by traversing the task graph upward from T_i. The recursion equation is:

\[
\text{HD-Tlevel}(T_i) =
\begin{cases}
\max\Bigl\{ \max\limits_{T_s \in pred(T_i)} \{\mathrm{AFT}(T_s) + C_{s,i}\},\ \max\limits_{T_x \in pred(T_i)} \{\text{HD-Tlevel}(T_x) + \mathrm{ETC}(T_x) + C_{x,i}\} \Bigr\}, & T_x \in US \ \&\ T_s \in S \\[6pt]
\mathrm{MEST}(T_i), & T_i \in R
\end{cases}
\tag{8}
\]

where US denotes the unscheduled task set, S denotes the scheduled task set and R denotes the ready task set.

Fig. 6. The calculation of HD-Blevel.

On the other hand, the resource available time may also affect the anticipated remnant (AR) execution time. In previous algorithms, the AR of T_i is usually denoted by Blevel(T_i), and the blevel value is assumed not to change until T_i is scheduled. However, because the number of resources is bounded, a task may be assigned to a schedule hole for execution, which may increase the AR. As shown in Figure 6(1), when T_i is scheduled on P_m, the AR of T_i may not increase. But in Figure 6(2), when T_i is scheduled on P_m, the resource available time after T_i is occupied by T_1 and T_2, so the relevant AR value may increase. Thus, this paper introduces the dynamic AR execution time of T_i on P_m for heterogeneous systems, called HD-Blevel(T_i, P_m).

Definition 3. HD-Blevel(T_i, P_m) denotes the dynamic AR execution time for heterogeneous systems. It is an extension of blevel that takes the effect of resource available time into consideration.

Because it is hard to estimate the exact remnant execution time of T_i, only an approximate method is proposed in this paper. The calculation rule for the HD-Blevel of ready tasks is defined as follows.

Rule 1 (the calculation of HD-Blevel). Under the same situation as Figure 6(2), assume that the initial value of HD-Blevel(T_i, P_m) is Blevel(T_i). When AST(T_1) − AST(T_i) ≥ Blevel(T_i), the AR of T_i does not change and HD-Blevel(T_i, P_m) is equal to Blevel(T_i). When AST(T_1) − AST(T_i) < Blevel(T_i), T_1 should be hypothetically migrated upwards, as shown in Figure 6(3), and HD-Blevel is updated as HD-Blevel(T_i, P_m) = HD-Blevel(T_i, P_m) + ETC(T_1). The new available time slot, denoted as AST(T_2) − (AFT(T_1) − AST(T_2) + AST(T_i)), is then checked to see whether it is larger than Blevel(T_i). This process is repeated until the available time of the new schedule hole is larger than Blevel(T_i) or the last task on this resource has been hypothetically migrated. The HD-Blevel of ready tasks is then obtained. The HD-Blevel of partial ready tasks can be obtained in the same way.

Because each task can be assigned to a different resource, the current dynamic AR execution time of T_i is denoted as follows:

\[
\text{MHD-Blevel}(T_i) = \frac{1}{|P|}\sum_{P_m \in P} \text{HD-Blevel}(T_i, P_m) \tag{9}
\]

Therefore, the current overall DCPL of the task graph is

\[
\mathrm{DCPL}_{hete} = \max_{T_i \in R \cup PR} \{\text{HD-Tlevel}(T_i) + \text{MHD-Blevel}(T_i)\} \tag{10}
\]

and the current T_pre is identified as the task with the maximum DCPL value.

However, T_pre may have unscheduled predecessors, in which case it cannot be selected as the candidate task for scheduling. Previous algorithms usually select the unscheduled predecessor of T_pre with the maximum blevel value as T_can, but this approach cannot guarantee that T_pre is scheduled as early as possible. In this paper, a task selection strategy based on recursively traversing the unscheduled predecessors is presented. The details are described in Figure 7.

Fig. 7. The procedure of task selection algorithm.

An example of the task selection strategy is shown in Figure 8(1). Assume that the current global DCP is denoted by the bold line, that G is the current T_pre, and that D, E and F are the unscheduled predecessors of T_pre. First, calculate the DCPL of G_sub(1), with A and G as the entry task and exit task respectively, as in Figure 8(2). The new critical path of G_sub(1) is {A → C → D → G}, so D is the current T_pre, denoted as T_pre(1). But D also has an unscheduled predecessor, so the recursive process is executed again. As shown in Figure 8(3), the new critical path of G_sub(2), with A and D as the entry task and exit task respectively, is {A → E → D}, and E is denoted as T_pre(2). Because E has no unscheduled predecessor, E is identified as T_can at the current schedule step. The overall recursive process is denoted by the dashed line in Figure 8(1).

Fig. 8. The illustration of the task selection algorithm.
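The traversal in the example above can be sketched as follows. This is a simplified reading rather than a reproduction of the procedure in Figure 7: at each level it considers only the unscheduled immediate predecessors of the current preference task and follows the one with the largest path length into it. The graph edges, the path lengths and the `path_length_via` callback are hypothetical and only loosely mirror Figure 8; the recursion is written as a loop.

```python
def select_candidate(t_pre, preds, scheduled, path_length_via):
    """Walk from T_pre towards the entry task until a task with no
    unscheduled predecessor is found; that task is T_can.

    preds            -- dict: task -> list of immediate predecessors
    scheduled        -- set of already scheduled tasks
    path_length_via  -- assumed callback (pred, exit_task) -> length of the
                        longest entry-to-exit path that passes through pred
    """
    current = t_pre
    trace = [current]
    while True:
        unscheduled = [p for p in preds.get(current, [])
                       if p not in scheduled]
        if not unscheduled:
            return current, trace
        # The next preference task lies on the critical path of the
        # sub-graph whose exit task is `current`.
        current = max(sorted(unscheduled),
                      key=lambda p: path_length_via(p, current))
        trace.append(current)


# Hypothetical graph and path lengths.
preds = {"G": ["D", "E", "F"], "D": ["C", "E"], "E": ["A"],
         "F": ["A"], "C": ["A"]}
scheduled = {"A", "C"}
lengths = {("D", "G"): 12, ("E", "G"): 9, ("F", "G"): 7, ("E", "D"): 8}

t_can, trace = select_candidate(
    "G", preds, scheduled, lambda p, x: lengths[(p, x)])
print(t_can, trace)
```

With these assumed values the walk visits G, then D, then E, and returns E as T_can, which matches the outcome of the worked example.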

Theorem 1. Assume that, in the task scheduling process, using a DCP-based task scheduling method leads to the optimal schedule result. Then selecting an unscheduled predecessor of T_pre as the current T_can by using our task selection strategy allows T_pre to be scheduled as early as possible.

Proof. According to the task selection strategy, when all recursive calls are done (assume the recursion is called s times), the final T_can is equal to T_pre(s), which is the preference task in the sth sub-graph G_sub(s). According to the assumed conditions, selecting T_pre(s) as T_can allows the exit task of G_sub(s), denoted T_pre(s−1), to be scheduled at the earliest time. For the same reason, it also allows the exit task of G_sub(s−1), denoted T_pre(s−2), to be scheduled at the earliest time. Continuing by analogy, it allows the exit task of G_sub(1), denoted T_pre(0), to be scheduled at the earliest time. Because T_pre(0) is equal to T_pre, the proof is complete.

4.2. Task Duplication-based Resource Allocation

At this stage, the resource allocation process is based on the task duplication technique. Unlike in traditional distributed computing environments, the inter-resource communication overhead cannot be omitted in a pervasive computing environment, and it is the major hurdle to the effective execution of parallel applications. This overhead can cause a serious penalty, especially in pervasive computing systems where network bandwidths are considerably lower than in the traditional situation. Because the communication cost between tasks assigned to the same resource is negligible, task duplication is a relatively effective approach that uses idle time slots to reduce inter-task communication.
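The effect of duplicating a predecessor can be illustrated with a minimal numeric sketch. It assumes that the transfer time equals the data amount divided by the link bandwidth and that the duplicate runs in an idle slot of the target resource; the function names and all numbers are hypothetical.

```python
def communication_time(data, bandwidth):
    """Assumed cost model: transfer time = data amount / bandwidth."""
    return data / bandwidth


def est_without_duplication(resource_ready, pred_finish_remote,
                            data, bandwidth):
    """Earliest start of the candidate when its predecessor ran elsewhere."""
    arrival = pred_finish_remote + communication_time(data, bandwidth)
    return max(resource_ready, arrival)


def est_with_duplication(resource_ready, pred_inputs_ready, pred_cost_local):
    """Earliest start of the candidate when the predecessor is re-executed
    on the same resource, so that no transfer is needed."""
    duplicate_start = max(resource_ready, pred_inputs_ready)
    return duplicate_start + pred_cost_local


# The predecessor finishes remotely at t = 4 and must send 12 data units
# over a link of bandwidth 2, so its data arrives at t = 10.
remote = est_without_duplication(resource_ready=0, pred_finish_remote=4,
                                 data=12, bandwidth=2)

# Re-executing the predecessor locally costs 5 time units from t = 0.
fast_copy = est_with_duplication(resource_ready=0, pred_inputs_ready=0,
                                 pred_cost_local=5)

# On a slow resource the same duplicate costs 12 time units.
slow_copy = est_with_duplication(resource_ready=0, pred_inputs_ready=0,
                                 pred_cost_local=12)

print(remote, fast_copy, slow_copy)
```

In the first case the duplicate lets the candidate start at time 5 instead of 10. In the second case the duplicate finishes later than the remote data would have arrived, so it only consumes resource time.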
Therefore, the main procedure of the duplication-based resource allocation stage is to assign the candidate task (T_can) to the most suitable resource by using a task duplication strategy, so as to obtain the best performance (generally the earliest completion time). In the task duplication process, EST(T_can) is restricted by the arrival times of the data from its immediate predecessors. So, in order to obtain the optimal schedule result by reducing the inter-task communication cost, all predecessors of the candidate task must be considered for duplication, in descending order of data arrival time [17].

Although duplication can reduce the inter-task communication cost, it also consumes the available time of a resource. Duplicating without any restriction may therefore occupy available time that would otherwise be used for executing subsequent tasks, and may lead to a bad schedule. Consequently, in order to determine which tasks should be duplicated and which should not, the effectiveness of duplication should be taken into consideration. Previous work has paid very little attention to this issue. In this paper, we divide the ineffective duplications that may have a negative effect on the final SL into two kinds: forbidden duplication, which cannot reduce the completion time of T_can, and redundant duplication, which may delay the final SL even if the local completion time is reduced. Consequently, in order to obtain better schedule performance, as many tasks as possible should be duplicated effectively, on the premise that these two kinds of ineffective duplication are eliminated.

Thus, we propose a two-stage effective task duplication strategy that eliminates both forbidden duplication and redundant duplication. The first stage is local optimization effective duplication.
Based on eliminating forbidden duplication, the local task duplication performance is improved further by the space compression technique. The second stage is global optimization-based effective duplication. On the basis of the local optimization stage, redundant duplication is eliminated by a DCP-based effective duplication checking standard, so that the final SL can be minimized as far as possible.

Local optimization-based effective duplication

In order to eliminate forbidden duplication, the SD algorithm [17] adopted a width-first duplication strategy and stated the basic effective duplication condition: only a duplication that improves EST(T_can) is performed. This approach can eliminate forbidden duplication to some extent. However, the performance of SD cannot be guaranteed because its strategy is overly simplified. The relevant drawbacks are illustrated in Figure 9. The schedule result without duplications is depicted in Figure 9(2); T_1 is the CP of T_can, and we assume that DAT(T_1, P_m) > DAT(T_2, P_m) > DAT(T_3, P_m). The width-first duplication result is shown in Figure 9(3). After duplicating T_1 onto P_m, EST(T_can, P_m) is restricted by the completion time of T_1 (because ACT(T_1, P_m) > DAT(T_2, P_m)). Under this condition, duplicating T_2 cannot reduce EST(T_can, P_m). Therefore, according to the previous strategy, the duplication process ends. However, Figure 9(4) indicates that if we duplicate T_4, the critical predecessor of T_1, after duplicating T_1 onto P_m, ACT(T_1, P_m) can be reduced to g. Then T_2 has a large enough slot to be duplicated, and EST(T_can, P_m) can be further reduced to c. Consequently, multi-level predecessor duplication should be taken into consideration, on the premise of satisfying the basic effective duplication condition. The corresponding rule can be defined as follows:

Rule 2 (effective multi-level predecessors duplication). Assume that after an immediate predecessor of T_can (denoted T_s) has been duplicated on P_m, EST(T_can, P_m) is affected by EFT(T_s, P_m), namely EST(T_can, P_m) = AFT(T_s, P_m). At this time, the CP of T_s should be considered for duplication. This process is repeated until the k-th CP cannot satisfy the duplication conditions or an entry task has been duplicated.

Fig. 9. (1) The topology of the task graph, (2) schedule without duplication, (3) duplication using the width-first strategy, (4) considering effective multi-level predecessor duplication, (5) duplication using the space compression strategy.

However, Figure 9(4) reveals another problem: as the available time of P_m has been partitioned into several segments, there might not be enough contiguous available time to duplicate the next candidate predecessor (here T_3). Therefore, the space compression-based strategy is proposed to combine several small available time slots into a bigger one.
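The idea of combining small idle slots can be sketched as follows. This is only one plausible reading of space compression, not the authors' implementation: the tuple layout, the function name `compress_and_insert` and the example timeline are hypothetical. Already-duplicated tasks that start after the new duplicate's earliest start are pushed later just far enough to open a contiguous gap, and the attempt is abandoned if the last one would finish after EST(T_can, P_m), called `deadline` here.

```python
def compress_and_insert(slots, est_d, dur_d, deadline):
    """Try to place a new duplicate T_d among duplicated tasks on one resource.

    slots    : list of (name, start, end), sorted by start, non-overlapping
    est_d    : earliest time at which T_d may start (hypothetical input)
    dur_d    : execution time of T_d on this resource
    deadline : current EST of the candidate task on this resource
    Returns the new timeline, or None if no placement fits before the deadline.
    """
    # Tasks starting no later than est_d stay where they are.
    fixed = [s for s in slots if s[1] <= est_d]
    # Tasks starting after est_d may be migrated downwards (later in time).
    movable = [s for s in slots if s[1] > est_d]

    start_d = max([est_d] + [end for _, _, end in fixed])
    cursor = start_d + dur_d  # first instant that is free again after T_d

    shifted = []
    for name, start, end in movable:
        new_start = max(start, cursor)      # move only as far as necessary
        new_end = new_start + (end - start)
        shifted.append((name, new_start, new_end))
        cursor = new_end

    if cursor > deadline:
        return None  # migration would delay the candidate task itself
    return fixed + [("T_d", start_d, start_d + dur_d)] + shifted


# Hypothetical timeline: idle gaps [2, 3) and [5, 7) are each too small for a
# task of length 3, but pushing T2 down by two units merges them.
timeline = [("T1", 0, 2), ("T2", 3, 5), ("T3", 7, 9)]
placed = compress_and_insert(timeline, est_d=1, dur_d=3, deadline=10)
too_long = compress_and_insert(timeline, est_d=1, dur_d=5, deadline=10)
```

In the example only T2 is moved; T1 and T3 keep their positions, which matches the intent of migrating no more tasks than needed.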
The space compression concept means that the duplicated predecessors can be migrated downwards, without being restricted by any data arrival time constraint, so that they become adjacent; several small, scattered available time slots are thereby spliced into a bigger one that satisfies the next duplication requirement. The rule of effective downward migration of duplicated predecessors can be defined as follows:

Rule 3 (effective downward migration). Assume that the current candidate duplication task is T_d. Only the duplicated predecessors whose EST is later than that of T_d should be migrated downwards, one by one from top to bottom, while the completion time of the last duplicated predecessor must remain earlier than EST(T_can, P_m) at all times, that is, AFT(T_k, P_m) < EST(T_can, P_m). The downward migration terminates when a big enough available time slot has been formed or when AFT(T_k, P_m) = EST(T_can, P_m).

Theorem 2. Rule 3 leads to the most effective downward migration of the duplicated predecessors. Furthermore, if only the immediate predecessors of T_can are considered for duplication, this rule leads to the optimal local schedule result.

Proof. Without loss of generality, assume that T_2 is the first task in the duplicated predecessor sequence on a certain resource that starts after the EST of T_d; T_1 and T_3 are the duplicated predecessors located immediately above and below T_2, respectively, as described in Figure 10(1). Following Rule 3, in order to duplicate T_d, only T_2 and later duplicated predecessors need to be migrated downwards, as shown in Figure 10(2). Here the downward migration distance of T_2 is e - d, and that of T_3 is 0.

1. Assume that downward migration starting from T_1 can generate a big enough available time slot to duplicate T_d. Figure 10(3) indicates that the actual downward migration distance of T_1 is c - a, and c - a - (b - a) = ETC(T_d, P_m). According to the assumption, b > a, so c - a > ETC(T_d, P_m).
Correspondingly, the downward migration distance of T_2 is c + ETC(T_1, P_m) - d. Therefore,

c + ETC(T_1, P_m) - d - (e - d) = c + ETC(T_1, P_m) - (a + ETC(T_1, P_m) + ETC(T_d, P_m)) = c - a - ETC(T_d, P_m) > 0.

Then T_2 undergoes a needless downward migration. Although the EST of T_d is smaller than in Figure 10(2), two situations arise for the successive duplication, denoted T_d-next. If EST(T_d-next) is earlier than a or later than h, the downward migration of T_1 has no effect. But between a and h, the next duplication needs to migrate more duplicated tasks downwards, so the migration of T_1 may lead to a worse result. The downward migration of T_1 is therefore ineffective. For the same reason, the downward migrations of all the upper duplicated tasks are also ineffective.

2. Assume that downward migration starting from T_3 can generate a big enough available time slot to duplicate T_d. Figure 10(4) shows that the actual downward migration distance of T_3 is g - f > 0, whereas in Figure 10(2) T_3 need not be migrated downwards. Moreover, here EST(T_d, P_m) is d + ETC(T_2, P_m) > a + ETC(T_1, P_m). Therefore, not only does T_3 need to be migrated downwards over a large distance, but EST(T_d, P_m) is also delayed. Thus, downward migration starting from T_3 cannot obtain good performance. For the same reason as in case 1, the successive duplication may be badly affected by a migration starting from T_3. Thus, the downward migrations of all the lower duplicated predecessors are also ineffective.

In a word, in the downward migration-based duplication process, the order of duplications is based on the data arrival time of the immediate predecessors, so the optimal result can be obtained by following Rule 3. It also obtains a better result than the one considered optimal in the previous width-first strategies.

Fig. 10. The different results of downward migration.

According to this strategy, the duplication result for the scenario example is shown in Figure 9(5), and EST(T_can) is reduced further. In summary, the local optimized task duplication strategy completely eliminates forbidden duplication, while the local schedule performance is optimized by effectively using the available resource time to duplicate more predecessors.

Global optimization-based effective duplication

The previous duplication strategies mainly duplicate certain predecessors under the evaluation criterion that T_can should finish its execution as early as possible. They can therefore only achieve a locally optimal schedule. However, the duplicated tasks occupy available time on the resource, so there is a trade-off between reducing the inter-task communication cost and consuming the available time of the resource. Thus, although EST(T_can) can be reduced further by duplicating a certain predecessor, doing so may delay the completion time of the following unscheduled tasks and even the final SL. This is the so-called redundant duplication. Unfortunately, previous algorithms always neglected how to eliminate this kind of ineffective duplication.

Consequently, in order to detect and eliminate redundant duplication effectively, the influence of task duplication on the final SL should be taken into consideration. In our work, the DCPL calculation method described above is adopted to estimate the final SL. After duplicating each predecessor following the local effective duplication strategy, the current DCPL is calculated to detect and eliminate redundant duplication, yielding the globally optimized duplication scheme that leads to the minimum final SL. In the effective duplication process, however, the DCPL may not be monotonically decreasing. Therefore, in order to capture the whole picture of the duplication process, every DCPL obtained after each local effective duplication must be examined.
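Why every intermediate DCPL has to be examined can be seen in a small sketch. The DCPL values are assumed to have been computed elsewhere; the function names, the data layout and all numbers are hypothetical. For each resource, the sketch keeps the number of duplications that yields the smallest DCPL, including the option of no duplication at all, and then picks the best resource.

```python
def best_duplication_prefix(dcpl_no_dup, dcpl_after_each):
    """Return (k, dcpl): keep the first k duplications that minimise DCPL.

    dcpl_no_dup     : DCPL when the candidate is placed without duplication
    dcpl_after_each : DCPL recorded after the 1st, 2nd, ... local duplication
    Ties are resolved in favour of fewer duplications.
    """
    best_k, best_value = 0, dcpl_no_dup
    for k, value in enumerate(dcpl_after_each, start=1):
        if value < best_value:
            best_k, best_value = k, value
    return best_k, best_value


def select_allocation(candidates):
    """Pick (resource, k, dcpl) with the smallest DCPL over all resources.

    candidates: {resource: (dcpl_no_dup, [dcpl after each duplication])}
    """
    best = None
    for resource, (no_dup, after_each) in candidates.items():
        k, value = best_duplication_prefix(no_dup, after_each)
        if best is None or value < best[2]:
            best = (resource, k, value)
    return best


# The sequence is not monotone: stopping at the first increase (31) would
# miss the later minimum (26).
prefix = best_duplication_prefix(30, [28, 31, 26, 27])

# Hypothetical case in which placing the task without duplication wins globally.
choice = select_allocation({"P0": (24, []), "P1": (27, [26, 25])})
```

A scheduler that stopped at the first DCPL increase would settle for one duplication and a DCPL of 28 in the first example, although three duplications give 26.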
Then, according to the minimum value of DCPL, the globally optimized duplication scheme is obtained and redundant duplication is eliminated effectively. The corresponding equation can be defined as

$$\min\left\{\min\left(\mathrm{DCPL}_{dup}(T_d, P_n),\ \mathrm{DCPL}_{no\_dup}(P_n)\right)\right\} \quad \text{s.t. } P_n \in P,\ T_d \in D \tag{11}$$

where D denotes the duplicated-predecessors set. The details of the duplication-based resource allocation strategy are described in Figure 11.

Fig. 11. The procedure of the task duplication-based resource allocation strategy.

Details of DCPED Algorithm and Scheduling Examples Analysis

The DCPED scheduling algorithm is formalized in Figure 12. In the DCPED algorithm, the number of tasks belonging to the R and PR sets is much less than t. In the DCPL calculation process, most tasks need only a small change based on the previous calculation. Therefore, the complexity of the DCPL calculation is about $O(pt)$. In the task selection strategy, the recursion is executed t times in the worst case, so its complexity is about $O(pt^2)$. Moreover, in the duplication-based resource allocation stage, we can assume that the number of duplications is about $d_{max}$, the maximum in/out degree of a task in the DAG. In the DCPL calculation after each duplication, no new ready or partially ready task joins the R and PR sets. Therefore, the complexity of this phase is about $O((t + t) \cdot d_{max} \cdot p)$. Thus, the overall worst-case complexity of the DCPED algorithm comes out to be $O(t \cdot (pt + pt^2 + 2 d_{max} pt))$, that is, $O(pt^3)$. Generally, the DCPL calculation in the task duplication-based resource allocation phase can usually determine T_pre for the next scheduling step, so the actual complexity of DCPED is less than $O(pt^3)$. Compared with the LDBS algorithms ($O(p^3 e t^3)$), the complexity of the DCPED algorithm is lower by at least an order.

In this paper, a scenario example of a complex workflow application is shown in Figure 13 to illustrate all the improvements. Its structure has several common attributes, so various complex workflow applications can be viewed as derivations or simplifications of this structure.
The running trace and schedule result generated by DCPED are given in Figures 14 and 15(a), in comparison with the HEFT [6], E-DCP [10], HCPFD [14], LDBS [11] and HLD [7] algorithms (Figure 15(b)-(f)) mentioned in the related works, where E-DCP is the extension of the DCP algorithm to heterogeneous environments. In the running process of DCPED, scheduling T_5 onto P_1 while duplicating T_1 and T_2 obtains the minimum completion time of T_5. But according to the global optimization strategy, scheduling T_5 onto P_0 without any duplication gives the minimum DCPL. Therefore, the better schedule can be obtained. While scheduling T_8, following the space compression-based effective duplication strategy, DCPED migrates T_4 downwards to gain a big enough available time slot to duplicate T_5, so EST(T_8) is reduced markedly. In general, DCPED accurately selects the candidate task and at the same time achieves more effective duplication by using the local and global optimization strategies. Thus, a much better performance is obtained; the corresponding final SL is 23. Compared with DCPED, the other strategies cannot calculate the critical path exactly and usually ignore ineffective duplication, so their schedule performance is not as good. Their final SLs are as follows: HEFT (29), E-DCP (29), HCPFD (25), LDBS (25) and HLD (25).

Fig. 12. The main procedure of the DCPED algorithm.

Fig. 13. The task graph of a scenario complex workflow application and its execution time matrix.

Fig. 14. Running trace of the DCPED algorithm.

Fig. 15. Schedule results of the different strategies.

In this scenario example, all the novel improvements mentioned in Section 4 are reflected. Although not every complex workflow application exhibits all these characteristics, any one of them can improve the scheduling result independently; and in the worst case, where none of these improvements applies, DCPED degenerates to the previous algorithms.

5. Simulation and Discussion

In this section, the performance of the DCPED algorithm is presented in comparison with the five recently proposed scheduling algorithms mentioned in Section 2: HLD, LDBS, HCPFD, E-DCP and HEFT. For this purpose, we consider two sets of application graphs as the workload for testing these algorithms: irregular graphs and regular graphs. The irregular graphs are generated randomly according to the relevant parameters, and the regular graphs, which have fixed structures, follow real-world numerical workflow applications such as Gauss elimination [6], fork-join [18] and mean value analysis [9].

Performance Metrics and Simulation Graph Generation

The comparisons of these scheduling algorithms are based on the following four metrics [6,19]:

Normalized schedule length (NSL):

$$\mathrm{NSL} = \frac{SL}{\sum_{T_i \in CP_{static}} \min_{R_j \in R} ETC(T_i, R_j)} \tag{12}$$

The denominator is the sum of the minimum computation costs of the tasks located on the static CP. The NSL of a graph cannot be less than one. The algorithm that gives the lower NSL for a graph is the better algorithm.

Speedup:

$$\mathrm{Speedup} = \frac{\min_{R_j \in R} \sum_{T_i \in T} ETC(T_i, R_j)}{SL} \tag{13}$$

The numerator is the minimum total execution time obtained by assigning all DAG tasks to one single resource.

Number of occurrences of better quality of schedules: the number of times that each algorithm gave better, worse and equal schedule results compared with the other algorithms is counted and evaluated in this paper.

Algorithm running time: the running time of each algorithm with respect to DAG size.

Simulation Graph Generation

For the generation of the experimental DAGs, the relevant parameters are defined as follows:

1. DAG size n: the number of tasks in the DAG.
2. Shape: the shape factor of the DAG. Using this parameter, the width of each level of the task graph can be obtained, and the range of this value is (0, n shape].
3. D+: the maximum outdegree in the DAG.
4. CCR: the ratio of communication cost to computation cost.
5. Heterogeneity factor chf: reflects the extent of the heterogeneity of the DAG.
6. CM: the mean value of the computation cost. The mean execution time of each task is drawn from a uniform distribution with expectation CM.
Furthermore, the range of the ETC(T_i, R_j) value is

$$\left[\ \overline{ETC(T_i)}\left(1 - \frac{chf}{2}\right),\ \ \overline{ETC(T_i)}\left(1 + \frac{chf}{2}\right)\ \right] \tag{14}$$

In each simulation, these parameters are assigned values from the sets given in the following section to generate the required graphs.

Performance Comparison of Irregular Graphs

The irregular workflow graphs are considered first in our simulation. To generate the random graphs, the DAG generation parameters mentioned above take their values from the sets shown in Figure 16. These combinations give 1440 different DAG types. Since 10 random DAGs were generated for each DAG type, the total number of DAGs used in the irregular graph simulation was around 14.4 k.

Fig. 16. The settings of the random graph generation parameters.

The first simulation compares the average NSL and speedup of the six algorithms with respect to DAG size, with a CCR of 2.0 and 8 resources, as shown in Figure 17(a) and (b). As the DAG size increases, the NSL and speedup of each algorithm increase, and DCPED always produces better performance than any other algorithm. When the DAG size is small, the difference between the schedule results of the algorithms is small. As the DAG size increases, the advantage of the critical path-based and duplication-based algorithms becomes obvious. In the duplication stage, because it duplicates only a single immediate predecessor, HCPFD cannot obtain a very good schedule result. By contrast, LDBS and HLD can duplicate more immediate predecessors than HCPFD, so they outperform HCPFD, especially when the DAG size is large. Moreover, the HLD algorithm uses the critical path technique to assign task priorities, so it obtains a better schedule result than LDBS. Of the E-DCP and HEFT algorithms, neither of which considers duplication, E-DCP with its DCP-based strategy is superior to HEFT. The DCPED algorithm, because it combines the DCP-based and effective duplication-based strategies, outperforms all the other algorithms (12 per cent better than HLD, 15 per cent better than LDBS, 23 per cent better than HCPFD, 31 per cent better than E-DCP and 34.5 per cent better than HEFT).

Fig. 17. The performance of randomly generated graphs with respect to DAG size, CCR and resource number.

The next simulation compares the average NSL and speedup of the six algorithms with respect to the CCR value, with a DAG size of 160. The details are shown in Figure 17(c) and (d). As the CCR value increases, the NSL of each algorithm increases while, by contrast, the speedup decreases. The reason is that, as CCR increases, the communication cost becomes the dominant part of DAG execution. With the non-duplication-based algorithms, tasks that have data dependencies are scheduled on the same resource in order to eliminate the communication cost, which may lead to a large NSL and load imbalance among resources. With the duplication-based algorithms, the predecessors of T_can can be duplicated to decrease the communication cost. On the other hand, as CCR increases, the total execution time of the tasks is fixed, so the speedup decreases drastically. As in the first simulation, LDBS and HLD outperform HCPFD. In this situation, as DCPED makes good use of resource idle time to eliminate the communication cost, it produces a better schedule result (NSL) than HLD by 10 per cent, LDBS by 15 per cent, HCPFD by 30 per cent, E-DCP by 39 per cent and HEFT by 49 per cent.

The third simulation compares the average NSL and speedup of the six algorithms with respect to the resource number, with a DAG size of 200 and a CCR of 2.0. The results are shown in Figure 17(e) and (f). As the resource number increases, the NSL of each algorithm decreases while, by contrast, the speedup increases. But when the resource number exceeds a certain value, the NSL and speedup of each algorithm no longer change.
In this situation, duplication-based algorithms usually outperform the non-duplication-based algorithms. In detail, DCPED produces a better schedule result than any other algorithm (11 per cent better than HLD, 15 per cent better than LDBS, 20 per cent better than HCPFD, 26.5 per cent better than E-DCP and 30 per cent better than HEFT).

We also measured the running times of all the algorithms on a DELL 5150 PC. These times are plotted in Figure 18. Most of the time, HEFT and HCPFD are faster than the rest of the algorithms. The running time of the DCPED algorithm is slightly larger than that of the E-DCP and HLD algorithms. However, since the main objective of pervasive computing is to obtain better user-centric performance, the minimum schedule length is the much more important aspect; a slightly longer running time that produces a much better schedule result is acceptable.

Fig. 18. The running time of the algorithms with respect to DAG size.

Finally, in this part of the simulation, the number of times each algorithm produced a better, worse or equal result compared with every other algorithm is counted over the roughly 14.4 k DAGs in Figure 19. A white rectangle in this figure represents the comparison between the algorithm on the left side and the algorithm on the top side. Each rectangle holds three values: the number of times the algorithm on the left side produced a better, worse and equal schedule result compared with the algorithm on the top side.

Performance Comparison of Regular Graphs

In addition to the irregular task graphs, we also considered regular graphs of three real-world problems: Gauss elimination [6], Lobe fork join [19] and the mean value analysis graph [10]. The fixed structures of these kinds of graphs are shown in Figure 20.

Gauss elimination graph

The structure of the application graph is defined in Reference [6] and shown in Figure 20. The number of tasks n and the number of graph levels l depend on the matrix size m.
Therefore, only the execution and communication cost generation parameters were selected randomly. The total number of tasks n in a Gauss elimination graph is equal to $(m^2 + m - 2)/2$. Figure 21(a) and (b) give the average NSL and speedup of each algorithm at matrix sizes from 5 to 20, in increments of 5. The smallest graph in this simulation has 14 tasks and the largest has 209 tasks. As the structure is fixed, the DAG shape and in/out degree parameters need not be set; the rest of the parameters are set as in Figure 16. In the Gauss graph, there is one path that has the largest number of tasks, so critical path-based algorithms can produce better performance. Meanwhile, as each task has only two immediate predecessors, the performance difference between the duplication-based algorithms is small. Additionally, in this situation the HLD algorithm degenerates to HCPFD. Since the DCPED algorithm considers both the critical path and duplication, it produces the most effective schedule result.

Fig. 19. Average comparison of these algorithms in terms of better, worse and equal performance.

Fig. 20. Elementary task graph structures for some regular parallel numerical workflow applications.

Fork-join graph

The structure of the application graph is defined in Reference [19] and shown in Figure 20. The characteristic of this kind of graph is that the number of DAG levels is odd, where each odd level has only one task and each even level has more than one task. In the generation process, the number of tasks in each even level is generated, and the rest of the parameters are set as in Figure 16. Figure 21(c) and (d) give the average NSL and speedup of each algorithm with respect to DAG sizes from 40 to 200. In the fork-join graph, as each odd level has only one task, the difference between critical path-based and non-critical path-based algorithms is small. However, there may be many tasks in each even level, all sharing the same single immediate predecessor, so the multi-predecessor duplication-based algorithms are more effective.
Thus, in this situation, HLD and LDBS obtain close schedule results and both outperform HCPFD. On the other hand, as the DCPED algorithm not only considers multi-predecessor duplication like LDBS and HLD but also uses the space compression technique to gain more chances to duplicate, it leads to the best schedule performance.

Mean value graph

The structure of the application graph is defined in Reference [10] and shown in Figure 20. The number of levels of a mean value graph is odd. The total number of tasks n in a mean value analysis graph is equal to $(\lceil l/2 \rceil)^2$. Figure 21(e) and (f) give the average NSL and speedup of each algorithm at $\lceil l/2 \rceil$ values from 5 to 13, in increments of 2. In the generation process, the parameters are set as for the Gauss graphs. In the mean value graph, there may be several parallel paths, which indicates that critical path-based algorithms should obtain better performance. On the other hand, as in the Gauss graph, each task has only two immediate predecessors, so the difference between the duplication-based algorithms is small. By considering both the dynamic critical path and effective duplication, the DCPED algorithm is still superior to all the other algorithms. Additionally, as in the scheduling of Gauss elimination graphs, HLD degenerates to the HCPFD algorithm.

Fig. 21. The performance of regular graphs with respect to the matrix size: (a), (b) Gauss elimination graphs; (c), (d) fork-join graphs; (e), (f) mean value graphs.
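The two headline metrics used throughout this section, Equations (12) and (13), are straightforward to compute once a schedule length is known. The sketch below assumes a hypothetical data layout (a dictionary from task name to a list of execution times, one per resource) and made-up numbers; it is meant only to make the definitions concrete.

```python
def nsl(schedule_length, etc, static_cp):
    """Normalized schedule length, Equation (12).

    etc       : {task: [execution time on resource 0, resource 1, ...]}
    static_cp : tasks on the static critical path
    """
    cp_lower_bound = sum(min(etc[task]) for task in static_cp)
    return schedule_length / cp_lower_bound


def speedup(schedule_length, etc):
    """Speedup, Equation (13): best single-resource time over schedule length."""
    resource_count = len(next(iter(etc.values())))
    sequential = min(
        sum(times[j] for times in etc.values()) for j in range(resource_count)
    )
    return sequential / schedule_length


# Hypothetical three-task graph on two resources with a schedule length of 6.
etc = {"T1": [2, 3], "T2": [4, 2], "T3": [3, 3]}
example_nsl = nsl(6, etc, static_cp=["T1", "T3"])   # 6 / (2 + 3) = 1.2
example_speedup = speedup(6, etc)                   # min(9, 8) / 6
```

A lower NSL and a higher speedup both indicate a better schedule, which is how the curves in Figures 17 and 21 should be read.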

Conclusion and Future Works

In this paper, for scheduling complex workflow applications in a pervasive computing environment, a novel task graph scheduling algorithm, DCPED, is presented to solve two problems of the previous scheduling algorithms: the inaccurate calculation of the critical path and ineffective duplication. In DCPED, the candidate task is obtained by a DCP strategy that takes resource availability into consideration. In the task duplication-based resource allocation stage, an effective task duplication algorithm is proposed to eliminate ineffective duplications, which are divided into forbidden duplication and redundant duplication; at the same time, more predecessors can be duplicated effectively to obtain a better schedule result by using the space compression and DCPL evaluation techniques. Each of these strategies can improve the scheduling result independently, and in the worst case, where no improvement applies, the result should be no worse than that of the previous algorithms. In an experimental study using a large set (around 14.4 k) of randomly generated graphs with various generation parameters, several real-world workflow application graphs (Gauss elimination, fork-join and mean value analysis) and a simulated heterogeneous network environment, the DCPED algorithm significantly outperformed all of the previous algorithms in terms of the performance metrics, including average normalized SL and speedup, while keeping an acceptable running time. Therefore, DCPED is a much more effective scheduling algorithm for the pervasive computing environment, especially for large-scale and fine-grain workflow applications.

As mentioned in Section 1, in order to solve the inexact estimation problem in compile-time scheduling, combined compile-time and run-time scheduling techniques will be considered in our future works.
At the initial stage, a pre-scheduling scheme will be obtained by using a sophisticated compile-time scheduling approach like DCPED. Then, at the run-time stage, a rescheduling technique will adjust the former schedule scheme by using real-time schedule information to adapt to the dynamic computing environment. In the rescheduling process, there are overheads: additional time is needed to retrieve the real-time information, to run the rescheduling algorithm and to assign the task to the proper resource according to the new scheme. To hide these overheads, a task should not be rescheduled until its immediate predecessors are starting to run. Thus, the retrieval of real-time schedule information and the execution of task rescheduling can be overlapped with the execution of the immediate predecessors. However, as this approach cannot use the real-time execution information of the immediate predecessors, some estimation error still exists. Meanwhile, with the development of estimation techniques, better performance estimates can be obtained, but most current scheduling algorithms consider only a snapshot value of resource performance when they make the pre-scheduling estimate. Therefore, a suitable approach that can exploit dynamic estimation information needs to be explored to obtain good performance with respect to the trade-off mentioned above. On the other hand, in the previous algorithms, the rescheduling process of a certain task is called instantly when the rescheduling conditions mentioned above are satisfied; this neglects the task graph structure, and the whole application may not be scheduled optimally. Thus, an intermediate solution that considers both the task graph structure and the dynamic behaviour of the environment needs to be explored. Consequently, these new ideas may lead to a robust scheduling approach that adapts to the dynamism of the pervasive computing environment, and they may be the main direction of our future works.
Acknowledgements

This work is supported by National Natural Science Foundation of China under Grants No and , Jiangsu Provincial Natural Science Foundation of China under Grants No. BK , Jiangsu Provincial Key Laboratory of Network and Information Security under Grants No. BM and Key Laboratory of Computer Network and Information Integration, Ministry of Education of China, under Grants No. 93K-9.

References

1. Weiser M. The computer of the 21st century. Scientific American 1991; 265(3): aura/

1302 J. LUO ET AL. 5. Luo J, Dong F, Cao. J. Multicontext-aware resource recommendation mechanism for service-oriented ubiquitous learning environment.

Performance-effective and lowcomplexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 2002; 13(3): 260 274. 7. Bansala S, Kumar P, Singh K.

Scheduling directed A-cyclic task graphs on heterogeneous network of workstations to minimize schedule length.

Efficient compile-time task scheduling for heterogeneous distributed computing systems. In Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS 06), 2006. 10.

5. Luo J, Dong F, Cao J. Multicontext-aware resource recommendation mechanism for service-oriented ubiquitous learning environment. In Proceedings of the Third International Conference on Pervasive Computing and Applications (ICPCA 08).
6. Topcuoglu H, Hariri S. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 2002; 13(3):
7. Bansal S, Kumar P, Singh K. Dealing with heterogeneity through limited duplication for scheduling precedence constrained task graphs. Journal of Parallel and Distributed Computing 2005; 65:
8. Baskiyar S, SaiRanga P. Scheduling directed a-cyclic task graphs on heterogeneous network of workstations to minimize schedule length. In Proceedings of the 2003 International Conference on Parallel Processing Workshops (ICPPW 03).
9. Daoud MI, Kharma N. Efficient compile-time task scheduling for heterogeneous distributed computing systems. In Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS 06), 2006.
10. Kwok YK, Ahmad I. Dynamic critical-path scheduling: an effective technique for allocating task graphs to multiprocessors. IEEE Transactions on Parallel and Distributed Systems 1996; 7(5):
11. Dogan A, Ozguner F. LDBS: a duplication based scheduling algorithm for heterogeneous computing systems. In Proceedings of the International Conference on Parallel Processing (ICPP 02).
12. Liu CH, Li CF. A dynamic critical path duplication task scheduling algorithm for distributed heterogeneous computing systems. In Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS 06).
13. Maheswaran M, Siegel HJ. A dynamic matching and scheduling algorithm for heterogeneous computing systems. In Proceedings of the Seventh Heterogeneous Computing Workshop.
14. Hagras T, Janecek J. A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems. In Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 04).
15. Li G, Zhang Y. Scalable duplication strategy with bounded availability of processors. In Proceedings of the 10th International Conference on Parallel and Distributed Systems (ICPADS 04).
16. Wieczorek M, Prodan R, Fahringer T. Scheduling of scientific workflows in the ASKALON grid environment. SIGMOD Record 2005; 34(3):
17. Bansal S, Kumar P. An improved duplication strategy for scheduling precedence constrained graphs in multi-processor systems. IEEE Transactions on Parallel and Distributed Systems 2003; 14(6):
18. Park GL, Shirazi B. DFRN: a new approach for duplication based scheduling for distributed memory multiprocessor systems. 11th International Parallel Processing Symposium (IPDPS 97).
19. Ahmad I, Kwok YK. On exploiting task duplication in parallel program scheduling. IEEE Transactions on Parallel and Distributed Systems 1998; 9(9):
20. Park CI, Choe TY. An optimal scheduling algorithm based on task duplication. IEEE Transactions on Computers 2002; 51(4):
21. Bajaj R, Agrawal DP. Improving scheduling of tasks in a heterogeneous environment. IEEE Transactions on Parallel and Distributed Systems 2004; 15(2):
22. Mandal A, Kennedy K, Koelbel C. Scheduling strategies for mapping application workflows onto the grid. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), July 2005;

Authors' Biographies

Junzhou Luo is currently a professor and the dean of the School of Computer Science and Engineering, Southeast University, China. He received his M.S. and Ph.D. degrees in Computer Science from Southeast University, China in 1992 and 2000, respectively. His current research interests include pervasive computing, network security, service computing and protocol engineering.

Fang Dong is currently a Ph.D. student in the School of Computer Science and Engineering, Southeast University, China. He received his B.S. and M.S. degrees in Computer Science from Nanjing University of Science & Technology, China in 2004 and 2006, respectively. His current research interests include pervasive computing, service computing and task scheduling.

Jiuxin Cao is currently an associate professor in the School of Computer Science and Engineering, Southeast University, China. He received his M.S. degree in Computer Application from Henan University of Science and Technology, China in 1999 and his Ph.D. degree in Computer Science from Xi'an Jiaotong University, China. His current research interests include pervasive computing, service computing and e-learning.

Aibo Song is currently an associate professor in the School of Computer Science and Engineering, Southeast University, China. He received his M.S. and Ph.D. degrees in Computer Science from Southeast University, Nanjing, China in 1996 and 2003, respectively. His current research interests include pervasive computing and grid computing.

A resource-aware scheduling algorithm with reduced task duplication on heterogeneous computing systems

J Supercomput (2014) 68:1347–1377. DOI 10.1007/s11227-014-1090-4. Jing Mei, Kenli Li, Keqin Li. Published