Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach


Shahram Amin, Mohammad Ahmad
Computer Engineering Department
Islamic Azad University branch Mahallat, Iran
Islamic Azad University branch Khomein, Iran

Abstract

Scheduling is a fundamental issue in achieving high performance in multiclusters and computational grids. In this paper, we investigate the scheduling problem. The proposed scheduler uses a distributed approach that can take the advantages of the local clusters into account when executing jobs. In addition, the distributed scheduler exploits fuzzy logic to deal qualitatively with the different parameters involved in the scheduling decision. Simulation results show the effectiveness of the algorithm in terms of job completion time.

Keywords: Fuzzy theory, Grid computing, Job scheduling, Distributed scheduling

1. Introduction

A grid computing infrastructure is a collection of resources connected by a network in which, by means of appropriate software, resource discovery and sharing are made possible [1]. One way to create a platform for grid computing is to interconnect existing separate clusters. These clusters may be located within a single organization or across different administrative organizations [2].

Scheduling is an important issue in grid computing, and parallel jobs constitute a typical workload in the scheduling scenario. Parallel jobs can be classified into two categories: rigid and moldable [2]. Rigid parallel jobs run on a user-specified number of resources, while moldable jobs can run on different numbers of computational resources. In this paper we consider rigid parallel jobs.

One class of applications that typically run on a grid superstructure is the class of single-program-multiple-data (SPMD) applications, also known as data-parallel applications. Data-parallel applications are partitioned into tasks which perform computations on separate pieces of a data set. The tasks work together to process the entire data set, and are collectively called a job. The data-parallel program model is often used to solve scientific computing problems.
These jobs may run for many hours or days and can consume a large amount of system resources. A job may perform a large amount of computation, communication between tasks, or both [1]. Data-parallel applications are the category of rigid jobs that we investigate.

A grid scheduler uses information about the grid system and the jobs to produce an assignment of tasks to machines for a given grid job. The general problem of mapping tasks to machines has been shown to be NP-complete [3]. The scheduling of parallel jobs has been extensively studied in a single-cluster environment [5, 6]. Several heuristic algorithms have been developed to schedule tasks to machines on heterogeneous computing systems; eleven such algorithms are evaluated in [4]. Some heuristic scheduling algorithms for grid environments are developed in [7, 8, 9]. They assign individual tasks to individual machines, and therefore do not scale when applied to a large-scale grid.

To attain scalability, we use a distributed approach in which an arriving job can be submitted to any of the clusters. A two-layer (global and local) scheduling scheme is then deployed. Its first stage, called global scheduling, assigns the job to an appropriate cluster and thus schedules the job at the grid level. The scheduler of the cluster that received the job then submits it to the scheduler of the selected cluster, which starts scheduling the job's tasks on its local nodes upon receiving the job. The local scheduler (cluster-level scheduler) uses the first-in first-out policy.

ISSN: 1790-5109, ISBN: 978-960-6766-85-5

In this study we focus on distributed global scheduling (scheduling at the grid level), which deals with jobs and clusters and assigns each job to a cluster. For this, we consider the computational needs of the job (i.e. the number of computational resources on which the job is requested to be run) and its communication requirements (the amount of communication between the job's tasks). We believe that a job's communication ratio is an important parameter in scheduling. Existing schedulers presently ignore issues such as network load and the communication requirements of the application. In a typical scenario, if an incoming job requires a fixed number of computational resources and the same number of machines is available in a cluster, the job is submitted to that cluster, even if the load on the cluster network precludes the completion of the job by an acceptable time.

The work presented in this paper addresses the need to build a fuzzy system that augments the scheduling capabilities by incorporating the cluster's network load and the job's communication requirements. The specific question addressed is which of the clusters should be allocated to an incoming job. This question requires a prioritization of the available clusters. For this, fuzzy logic is used in the form of a controller whose rules match the resource specification requirements of the jobs to the available resources of the clusters. The objective is to include the cluster's network load as a cluster resource, and the job's communication requirements as a resource specification, in the prioritization process. Using fuzzy logic, it is possible to reason about these parameters in a qualitative manner and, at the same time, to improve the scheduling decisions.

The rest of the paper is organized as follows. In section 2, we describe the scheduling model, in which the assumptions about the grid and incoming jobs are expressed. In section 3, the proposed global scheduling algorithm is presented.
In section 4, we evaluate the global scheduler through simulation, and section 5 concludes the paper.

2. The scheduling model and assumptions

The assumed multicluster consists of m clusters C1, C2, ..., Cm. Each cluster is composed of a number of homogeneous computational resources and a scheduler. Scheduling is done at two levels: global and local. An arriving job can be submitted to the scheduler of any of the clusters. The cluster to which the job is submitted is called the local cluster, and the others are called remote. The scheduler of the local cluster decides where the arriving job is going to run. It may choose the local cluster or one of the remote clusters for executing the job. This decision is made through global scheduling, which is the main subject of this paper. If the selected cluster is one of the remote ones, the local scheduler submits the whole job to the scheduler of that cluster. After this stage, the scheduler of the cluster which finally receives the job is responsible for scheduling the parallel job within its local nodes. This stage of the scheduling process is called local scheduling, and has previously been the subject of other papers [5, 6]. Figure 1 shows an overall schema of the multicluster under investigation.

The parallel jobs considered in this paper are rigid. The job model is built from user-provided application characteristics that do not require extensive job profiling: the number of partitioned tasks, and the ratio of communication to execution. The ratio of communication to execution gives a method of weighing the relative importance of communication rates and computational power for the job, without requiring extensive application profiling. In summary, a parallel job, denoted by J, is identified by a 3-tuple (A, N, R), where A is J's arrival time, N is the number of computational resources on which J is requested to be run, and R is the ratio of communication to execution.
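As an illustration only, the job model above can be written as a small record type. This is a minimal sketch: the class name `Job`, its field names, and the assumption that R lies in [0, 1] are ours and are not taken from the simulator used in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A rigid parallel job J = (A, N, R). Names are hypothetical."""
    arrival_time: float   # A: arrival time of the job
    num_resources: int    # N: number of computational resources requested
    comm_ratio: float     # R: ratio of communication to execution

    def __post_init__(self):
        if self.num_resources < 1:
            raise ValueError("a job needs at least one resource")
        # Assumption: R is treated as a ratio between zero and one.
        if not 0.0 <= self.comm_ratio <= 1.0:
            raise ValueError("comm_ratio is assumed to lie in [0, 1]")


job = Job(arrival_time=0.0, num_resources=8, comm_ratio=0.4)
```

Making the record immutable reflects that the three attributes are fixed by the user at submission time and are only read by the schedulers.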

Figure 1: The multicluster superstructure

3. Scheduling

After a user submits a job to one of the clusters, the global scheduling process is started by the scheduler of that cluster. The goal is to find a suitable cluster to which the job can be assigned. Global scheduling consists of two stages. In the first stage, all of the clusters are considered equally, and a priority is assigned to each of them. The cluster with the highest priority is then selected as the candidate for the job. When the selected cluster is a remote cluster, permission is required, because one or more jobs may have been scheduled to that cluster by its local scheduler or by other schedulers during the interval between choosing the cluster and submitting the job. Hence, before submitting a job to a remote cluster, the local scheduler sends a permission request to that cluster and, upon receiving an OK response, submits the job. If the response is No, the local scheduler repeats the global scheduling process after updating the state information of the other clusters.

3.1 The proposed global scheduling algorithm

To characterize the state of a cluster, we consider two parameters: the available number of CPUs and the cluster's network load. The cluster's network load is the level of communication traffic currently passing through the cluster, expressed as a ratio between zero and one. When a job arrives, the scheduler is triggered to assign the job to a cluster. For this, it considers all the clusters and assigns two weights (numbers between zero and one) to each cluster. The first weight (W1), also called the cluster weight for number of machines, determines a matching degree between the number of available low-load CPUs in the cluster and the number of tasks of the job. The second weight (W2), also called the cluster weight for network load, considers the job's communication requirements and the cluster's network load.
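The permission handshake described at the start of Section 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the simulator's code: the function `global_schedule`, the two callbacks, and the bound `max_rounds` are hypothetical (the text does not say how often the process may be repeated).

```python
from typing import Callable, Optional, Sequence


def global_schedule(
    local: int,
    priorities: Callable[[], Sequence[float]],
    ask_permission: Callable[[int], bool],
    max_rounds: int = 10,
) -> Optional[int]:
    """Return the index of the cluster that takes the job, or None.

    priorities() re-reads the cluster states and returns one priority
    per cluster; ask_permission(c) models the OK/No reply of remote
    cluster c. Both callbacks are assumptions made for this sketch.
    """
    for _ in range(max_rounds):
        scores = priorities()  # state information is refreshed each round
        best = max(range(len(scores)), key=lambda c: scores[c])
        # The local cluster needs no permission; a remote one must say OK.
        if best == local or ask_permission(best):
            return best
    return None  # assumption: give up after max_rounds refusals
```

Refreshing the priorities inside the loop mirrors the requirement that the local scheduler updates the state information of the other clusters before repeating the process.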
We performed many simulation tests to adjust the coefficients of the two weights W1 and W2 in the aggregate priority equation. The final formula for the priority of a cluster with respect to a newly arrived job is

Priority = 0.7 W1 + 0.3 W2

After computing the priority of all the clusters, the scheduler assigns the job to the cluster with the maximum priority.

3.1.1 Cluster's weight for number of machines

The load of a cluster's nodes is a dynamic attribute, computed by averaging the currently reported loads (CPU usage) of the node. The scheduler partitions the nodes of a cluster into low-load, medium-load, and high-load nodes. A node whose load is less than 0.3 is low-load; a node with load between 0.3 and 0.6 is medium-load; and a node with load higher than 0.6 is high-load. For a cluster, L, M, and H denote the number of low-load, medium-load, and high-load nodes, respectively. The following pseudo code shows the algorithm for computing the cluster's weight for number of machines (W1).

    if (L > numoftask) {
        w1 = 1 - (L - numoftask) / L;
        if (w1 < 0.5) w1 = 0.5;
    } else if (L + M >= numoftask) {
        w1 = 0.5 - (numoftask - L) / (3 * numoftask);
        if (w1 < 0.2) w1 = 0.2;
    } else if (L + M + H >= numoftask) {
        w1 = 0.2 - (numoftask - (L + M)) / (3 * numoftask);
        if (w1 < 0) w1 = 0;
    }

Here, numoftask is the number of partitioned tasks of the job, and w1 holds the resulting weight W1.

3.1.2 Cluster's weight for network load

One of the main features of this work is that it considers the jobs' communication requirements and the clusters' network load. A job with a high communication ratio must be scheduled to a cluster with low network load. For this, we use fuzzy logic to assign a weight to the cluster, which determines the suitability of executing the job on that cluster. This assignment considers the communication requirements of the job and the network load of the cluster. The fuzzy rule-based system has two input parameters, the job's communication ratio and the available bandwidth of the cluster (as a ratio between zero and one), and one output, the cluster's weight for network load (W2). A communication ratio close to one means that the job requires high communication, so a small weight (close to zero) should be assigned to a cluster with little (close to zero) available bandwidth, and a high weight (close to one) should be assigned to a cluster with high (close to one) available bandwidth. Figures 2 through 4 show examples of the fuzzy membership functions for these two inputs and for the output, the weight assigned to the network load of the cluster. Table I summarizes the rules that map the inputs to the output. We have used a product inference engine, a singleton fuzzifier, and a center average defuzzifier.

Figure 2: Fuzzy membership functions of communication to execution ratio
Figure 3: Fuzzy membership functions of available BW
Figure 4: Fuzzy membership functions of cluster weight
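To make the inference scheme concrete, the sketch below combines a singleton fuzzifier, product inference, and center average defuzzification, and then aggregates the result with the priority formula. It is a sketch under loudly stated assumptions: the three triangular membership functions and every entry of the `RULES` table are hypothetical stand-ins chosen only to respect the behaviour described in the text (high communication ratio with little bandwidth gives a weight near zero, with high bandwidth a weight near one). They are not the membership functions of Figures 2 through 4 or the rules of Table I.

```python
from typing import Dict, Tuple


def memberships(x: float) -> Dict[str, float]:
    """Hypothetical triangular sets on [0, 1]; they always sum to one."""
    x = min(1.0, max(0.0, x))
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": max(0.0, 1.0 - abs(2.0 * x - 1.0)),
        "high": max(0.0, 2.0 * x - 1.0),
    }


# (communication ratio, available bandwidth) -> center of the output set.
# Every value here is an assumption made for illustration.
RULES: Dict[Tuple[str, str], float] = {
    ("low", "low"): 0.5, ("low", "medium"): 0.75, ("low", "high"): 1.0,
    ("medium", "low"): 0.25, ("medium", "medium"): 0.5, ("medium", "high"): 1.0,
    ("high", "low"): 0.0, ("high", "medium"): 0.25, ("high", "high"): 1.0,
}


def network_weight(comm_ratio: float, available_bw: float) -> float:
    """W2: weighted average of rule centers (center average defuzzifier)."""
    mu_r = memberships(comm_ratio)    # singleton fuzzifier: evaluate at the crisp input
    mu_b = memberships(available_bw)
    num = den = 0.0
    for (r_label, b_label), center in RULES.items():
        strength = mu_r[r_label] * mu_b[b_label]  # product inference
        num += strength * center
        den += strength
    return num / den  # den is 1 here because the sets sum to one


def priority(w1: float, w2: float) -> float:
    """Priority = 0.7 W1 + 0.3 W2, as given in Section 3.1."""
    return 0.7 * w1 + 0.3 * w2
```

For example, `network_weight(1.0, 0.0)` fires only the hypothetical rule ("high", "low"), so a communication-heavy job sees a cluster without spare bandwidth as unsuitable.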

Table I. Fuzzy rules for mapping inputs to the output

4. Simulation results

We have developed a simulator in Matlab to evaluate the performance of the proposed scheduling algorithm. The simulated multicluster consists of several clusters, each of which is composed of 1 to 100 nodes. The interarrival time of jobs follows a Poisson process with the parameter λ seconds. Each job is submitted with three attributes: arrival time, number of tasks, and communication to execution ratio. Each job has a random number of tasks between 1 and 50, and a random communication to execution ratio between 0 and 0.7. The scheduler does not need to predict the job's execution time.

We compare our scheduling algorithm with a distributed best-fit policy, which ignores the communication requirements of the jobs and the available bandwidth of the clusters. The best-fit scheduling algorithm assigns the submitted job to the cluster that has the fewest idle nodes among those whose number of idle nodes is greater than the number of tasks of the job.

There are two scheduling scenarios in the simulation.
Scenario I: 5 clusters, 15 jobs, λ=12.
Scenario II: 30 clusters, 50 jobs, λ=16.
For each scenario, the systems are generated automatically.

Figure 6 shows the completion times of parallel jobs for scenario I. Small rectangles and parallelograms show the completion times of jobs under the best-fit and the proposed algorithm, respectively. Figure 7 shows the completion times of parallel jobs for scenario II. In both figures, the horizontal axis shows the arrival time of jobs, and the vertical axis represents the completion time of the submitted jobs. The completion time of a job is computed as the interval between the job's arrival and the moment it finishes. As can be seen in both Figures 6 and 7, the proposed fuzzy algorithm is better than the best-fit algorithm in reducing parallel completion time. The proposed algorithm uses the job's communication requirements and the cluster's network load to make scheduling decisions more efficiently and, as a result, improves the job's completion time.

Figure 6: Simulation for parallel jobs completion time (Scenario I)
Figure 7: Simulation for parallel jobs completion time (Scenario II)
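The baseline policy and the workload described in this section can be sketched in a few lines. This is not the Matlab simulator: the function names are hypothetical, and we assume that λ is the mean interarrival time in seconds, which is one possible reading of "the parameter λ seconds".

```python
import random
from typing import List, Optional, Tuple


def best_fit(idle_nodes: List[int], num_tasks: int) -> Optional[int]:
    """Index of the cluster with the fewest idle nodes that still has
    more idle nodes than the job has tasks; None if no cluster fits."""
    candidates = [(idle, c) for c, idle in enumerate(idle_nodes) if idle > num_tasks]
    # Ties are broken by the lower cluster index (an assumption).
    return min(candidates)[1] if candidates else None


def generate_jobs(
    n_jobs: int, mean_interarrival: float, seed: int = 0
) -> List[Tuple[float, int, float]]:
    """Jobs as (arrival time, number of tasks, communication ratio)."""
    rng = random.Random(seed)
    t = 0.0
    jobs = []
    for _ in range(n_jobs):
        # Exponential interarrival times give Poisson arrivals;
        # expovariate takes the rate, i.e. 1 / mean.
        t += rng.expovariate(1.0 / mean_interarrival)
        jobs.append((t, rng.randint(1, 50), rng.uniform(0.0, 0.7)))
    return jobs


scenario_one = generate_jobs(n_jobs=15, mean_interarrival=12.0)
```

With idle node counts `[10, 4, 6]` and a job of 3 tasks, all three clusters fit, and `best_fit` picks the second one because it leaves the fewest idle nodes.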

5. Conclusion

Job scheduling is very complicated in computational grids. Parallel jobs are an important set of applications that usually constitute the workload of a grid. In this paper, we present a distributed scheduling algorithm for scheduling parallel jobs in a computational grid. The scheduling is done in two layers: global and local. We focus our work on global scheduling, which is responsible for allocating the submitted job to a cluster. The global scheduler assigns a priority, based on two matching degrees, to each cluster for a submitted job. It then allocates the cluster with the highest priority to the job. The first matching degree (W1) expresses the suitability of executing the job on the cluster in terms of the number of tasks of the job and the number of available nodes in the cluster. The other (W2) expresses the suitability in terms of the cluster's available bandwidth and the job's communication requirements. For computing W2, we use a fuzzy rule-based system to consider different parameters in a qualitative manner. The simulation results show the improvement of the proposed algorithm over a distributed best-fit policy which ignores the communication requirements of the jobs and the network traffic of the clusters.

References

[1] Michael Walker, A Framework for Effective Scheduling of Data-Parallel Applications in Grid Systems, Master's thesis, University of Virginia, 2001.
[2] Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Xinuo Chen and Graham R. Nudd, Dynamic Scheduling of Parallel Jobs with QoS Demands in Multiclusters and Grids, Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing, 2004.
[3] HU Rong, HU Zhigang, A Scheduling Algorithm Aimed at Time and Cost for Meta-tasks in Grid Computing Using Fuzzy Applicability, Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region, IEEE, 2005.
[4] Tracy D. Braun, Howard Jay Siegel et al., A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing, 2001, 6, pp. 810-837.
[5] B. G. Lawson and E. Smirni, Multiple-queue Backfilling Scheduling with Priorities and Reservations for Parallel Systems, Proceedings of the Eighth Job Scheduling Strategies for Parallel Processing, 2002.
[6] E. Shmueli and D. G. Feitelson, Backfilling with lookahead to optimize the performance of parallel job scheduling, in Job Scheduling Strategies for Parallel Processing, D. G. Feitelson, L. Rudolph, and U. Schwiegelshohn (Eds.), pp. 228-251, Springer-Verlag, 2003.
[7] H. Yan et al., An Improved Ant Algorithm for Job Scheduling in Grid Computing, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.
[8] Y. Hu et al., An Algorithm for Job Scheduling in Computational Grid Based on Time-Balancing Strategy, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.
[9] Liang-Teh Lee, Chin-Hsiang Liang, Hung-Yuan Chang, An Adaptive Task Scheduling System for Grid Computing, Proceedings of the Sixth IEEE International Conference on Computer and Information Technology, 2006.