CHAPTER 7 CONCLUSION AND FUTURE SCOPE

This research has addressed the issues of grid scheduling, load balancing and fault tolerance for large-scale computational grids. To investigate the solution space, a study of fault-tolerant scheduling and load balancing was planned, and a system model was developed to examine these issues in computational grids. In particular, five decentralized algorithms were designed that use only partial information. The resulting decentralized, dynamic schemes provide efficient fault tolerance, job assignment and load redistribution. They minimize the average response time of jobs and optimize resource utilization despite the scale of grid systems, their heterogeneity in processing power and network bandwidth, and the considerable communication cost of collecting load information. This chapter concludes the dissertation by summarizing the major contributions and outlining future research directions. Section 7.1 highlights the chief contributions. Section 7.2 discusses the future scope, which extends the past and current research on fault-tolerant scheduling and load balancing for computational grids.

7.1 MAJOR CONTRIBUTIONS

7.1.1 Recent Neighbour Load Balancing Algorithm

The Recent Neighbour (RN) technique is a decentralized, dynamic load balancing strategy with periodic load information exchanges. It logically divides the grid into three levels: the grid level, the cluster level and the leaf nodes. Jobs are assumed to be computationally intensive and mutually independent, and each job can be executed at any cluster. No deterministic or a priori information about a job is available. Each job is assigned a timer when it is generated; if the timer reaches a threshold before the job has been processed, the job is given the highest priority for execution.

The algorithm maintains two queues for incoming jobs: the local job-waiting queue and the global job-waiting queue. The local job-waiting queue holds jobs waiting to be assigned to intra-cluster nodes when load balancing is initiated, and the global job-waiting queue holds jobs waiting to be assigned to inter-cluster nodes. Jobs in both queues are processed in First-Come First-Served (FCFS) order. RN also maintains a node list, NSET, which contains information about the neighbours within the cluster, and a cluster list, CSET, which contains information about the neighbours at the grid level. NSET and CSET are updated whenever a computing node joins or fails in the cluster. RN first tries to assign jobs and perform load balancing locally. If any neighbour in the cluster or grid becomes overloaded, RN allots jobs to the neighbours in NSET or CSET with the minimal load, following the sender-initiated approach to load balancing. Hence, based on the load information, RN chooses the most suitable system for each job, thereby minimizing job execution time and maximizing system throughput. RN also takes into account system heterogeneity with respect to processing power, but at the cost of a high communication delay caused by frequent load information exchanges.

7.1.2 Recent Neighbour Algorithm with Fault Tolerance

This technique is a fault-tolerant version of the RN algorithm. In a computing environment, job migration is the only efficient way to guarantee that submitted jobs complete reliably and efficiently even if a failure occurs. RN with fault tolerance detects the occurrence and type of a resource failure by analysing information about the state of the resource. Resource failures are treated as process failures. The algorithm uses a passive replication scheme and a backup approach to avoid losing jobs during resource failures, and it guarantees that submitted jobs are executed to completion using the available resources.

7.1.3 Symmetric-Initiated Algorithm

The symmetric-initiated load balancing algorithm (SI-LB) extends the RN method (the version without fault tolerance). In symmetric-initiated load balancing, both the sender and the receiver systems are responsible for job migration. In SI-LB, the load of a system at time t is defined as the total length of the jobs in its job-waiting queue divided by the system's current processing capacity. The algorithm overcomes the high communication delay of RN by means of a mutual information feedback (MIF) policy.
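The load definition of Section 7.1.3 and the neighbour selection of Section 7.1.1 can be illustrated with a minimal Python sketch. It is a hypothetical illustration under stated assumptions, not the implementation evaluated in this work: the names `Node`, `sender_initiated` and `receiver_initiated`, the thresholds `high` and `low`, the rule of moving the newest queued job, and the acceptance test for a transfer are all assumptions introduced here. A cluster in CSET is modelled as a single node with its aggregate capacity, and the exchange of load information itself is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A computing node, or a whole cluster treated as one node (an assumption)."""
    name: str
    capacity: float                              # current processing capacity, must be > 0
    queue: deque = field(default_factory=deque)  # lengths of waiting jobs, FCFS order

    def load(self) -> float:
        # SI-LB load: total length of waiting jobs / current processing capacity.
        return sum(self.queue) / self.capacity


def sender_initiated(sender: Node, nset: list, cset: list, high: float) -> list:
    """Overloaded sender pushes jobs to the least-loaded neighbour,
    trying its own cluster (NSET) before other clusters (CSET)."""
    moved = []
    while sender.load() > high and sender.queue:
        job = sender.queue[-1]  # newest job; older jobs keep their FCFS position
        target = None
        for pool in (nset, cset):
            best = min(pool, key=Node.load, default=None)
            # Accept only if the receiver would end up less loaded than the sender is now.
            if best is not None and best.load() + job / best.capacity < sender.load():
                target = best
                break
        if target is None:
            break  # no neighbour can take the job without becoming the new hot spot
        target.queue.append(sender.queue.pop())
        moved.append((job, target.name))
    return moved


def receiver_initiated(receiver: Node, neighbours: list, low: float) -> list:
    """Underloaded receiver pulls jobs from its most-loaded neighbour."""
    moved = []
    while receiver.load() < low:
        donor = max(neighbours, key=Node.load, default=None)
        if donor is None or len(donor.queue) < 2:  # leave the donor its head job
            break
        job = donor.queue[-1]
        if receiver.load() + job / receiver.capacity >= donor.load():
            break  # receiver would become at least as loaded as the donor was
        receiver.queue.append(donor.queue.pop())
        moved.append((job, donor.name))
    return moved
```

Calling both functions on the same set of nodes gives the symmetric-initiated behaviour of SI-LB, in which either an overloaded sender or an underloaded receiver may start a migration. Requiring that a transfer leave the receiver less loaded than the donor was is one simple way to stop jobs from bouncing between nodes; it is a design choice of this sketch.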

MIF is an event-driven policy that minimizes the overhead of collecting load status information. In MIF, each system maintains the state information of other systems in a state object, which lets it estimate the load of other systems at any time without exchanging dedicated messages. This is achieved by piggybacking load information on messages that are already being exchanged. Each system collects and maintains the state information of its neighbouring systems only.

7.1.4 Hybrid Load Balancing Algorithm

This technique is a hybrid of static and dynamic load balancing strategies for non-dedicated grid environments, and it uses FCFS for job scheduling. In the hybrid technique, the resources of the grid environment are considered dynamic: each computing resource can join or leave the grid at any time and specifies its time and level of contribution.

7.1.5 Performance-Driven Load Balancing Algorithm

This technique extends the performance-driven load balancing proposed by Kai Lu et al (2006, 2007). It targets a dedicated grid environment in which all computing resources work together to solve a compute-intensive problem. It proposes a primary-backup approach to fault tolerance with minimum replication cost, together with an efficient job scheduling technique with minimum communication cost. The main idea of the passive replication scheme is that the backup copy of a job is activated only if a fault occurs while its primary copy is executing. The scheme does not require fault diagnosis and is guaranteed to recover all jobs affected by a processor failure. In such a scheme, exactly two copies of each job are scheduled, on different processors (space exclusion) and in time intervals that do not overlap (time exclusion) (Budhiraja et al 1992). This approach is especially helpful for grids, where fault diagnosis is very difficult because a failure may occur on a processor whose hardware platform and model are not even known to the user. Two techniques are applied when scheduling the primary and backup copies of each job: (1) backup overloading, which schedules the backups of multiple primary jobs in the same time slot to make efficient use of the available processor time; and (2) de-allocation of the resources reserved for a backup when its primary completes successfully.

Both the hybrid load balancing algorithm and the performance-driven technique combine the strengths of neighbour-based and cluster-based load balancing strategies. A load balancing algorithm in which a resource exchanges information with, and transfers jobs to, its physical and/or logical neighbours is called a neighbour-based load balancing method. Load balancing algorithms in which the resources are partitioned into clusters based on network transfer delay are called cluster-based load balancing methods (Chatrapati et al 2010). To improve system flexibility and reliability and to conserve system resources, both approaches employ the passive replication scheme. The main objective of these techniques is to find job assignments that achieve minimum response time, maximum resource utilization and a well-balanced load across all computing resources in the grid.

7.1.6 Discussion

Optimizing workload allocation for a dynamic grid system is not a simple task. Jobs are assigned to systems so as to minimize the average response time and communication delay and to optimize resource utilization. Because of the dynamic nature of grids, designing an optimal fault-tolerant scheduling and load balancing technique remains a challenge. It is hoped that the techniques presented here can serve as a starting point for further research on fault tolerance and load balancing.

7.2 FUTURE SCOPE

Because the grid configuration used in this study is small, the mathematical and theoretical performance of the proposed techniques cannot be derived with certainty. In the course of designing and evaluating fault-tolerant scheduling and load balancing schemes for grids, several interesting issues requiring further investigation have emerged. These issues are as follows.

7.2.1 Real Grid Environment

The proposed techniques can be tested in a real grid environment, with assumed distributions for job arrival rates, resource requirements and execution times, to analyse their performance from a mathematical and theoretical perspective.

7.2.2 Security Concerns

Grids are mostly formed from resources owned by many organizations and are therefore not dedicated to particular users. As such, jobs dispatched to remote systems may be exposed to security threats if those systems are attacked by malicious users. Hence, a grid scheduler must be security-driven, and incorporating security into the proposed approaches is a clear research opportunity.
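The primary-backup scheme summarized in Sections 7.1.2 and 7.1.5, including backup overloading and de-allocation of backup reservations, can be sketched in Python as follows. This is a hypothetical illustration rather than the scheduler used in this work: it assumes discrete time, at least two processors, placement of primaries on the earliest-available processor, and at most one processor failure at a time, which is the usual premise that makes backup overloading safe. The names `PrimaryBackupScheduler`, `BackupSlot`, `submit`, `complete` and `fail` are introduced here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality, so list.remove drops exactly this slot
class BackupSlot:
    """A time window [start, end) reserved on one processor for backup copies."""
    processor: int
    start: int
    end: int
    jobs: set = field(default_factory=set)           # jobs whose backups share this window
    primary_procs: set = field(default_factory=set)  # processors running those jobs' primaries


class PrimaryBackupScheduler:
    """Passive replication with backup overloading (hypothetical sketch).

    Assumes discrete time, at least two processors and at most one
    processor failure at a time.
    """

    def __init__(self, n_processors: int):
        self.free_at = [0] * n_processors  # next free time unit per processor
        self.primary = {}                  # unfinished job -> (processor, start, end)
        self.backup_of = {}                # unfinished job -> BackupSlot
        self.slots = []                    # live backup reservations

    def submit(self, job: str, length: int) -> None:
        # Primary copy goes to the earliest-available processor, in arrival order.
        p = min(range(len(self.free_at)), key=lambda i: self.free_at[i])
        start, end = self.free_at[p], self.free_at[p] + length
        self.free_at[p] = end
        self.primary[job] = (p, start, end)

        # Backup overloading: share an existing window if it is on another
        # processor (space exclusion), starts after this primary ends (time
        # exclusion), is long enough, and backs up no primary on processor p,
        # so a single failure never activates two backups in the same window.
        slot = next((s for s in self.slots
                     if s.processor != p and s.start >= end
                     and s.end - s.start >= length
                     and p not in s.primary_procs), None)
        if slot is None:
            # Otherwise reserve a fresh window on the least-busy other processor.
            q = min((i for i in range(len(self.free_at)) if i != p),
                    key=lambda i: self.free_at[i])
            s0 = max(self.free_at[q], end)
            slot = BackupSlot(q, s0, s0 + length)
            self.free_at[q] = slot.end
            self.slots.append(slot)
        slot.jobs.add(job)
        slot.primary_procs.add(p)
        self.backup_of[job] = slot

    def complete(self, job: str) -> None:
        """Primary finished successfully: de-allocate its backup reservation."""
        del self.primary[job]
        slot = self.backup_of.pop(job)
        slot.jobs.discard(job)
        slot.primary_procs = {self.primary[j][0] for j in slot.jobs}
        if not slot.jobs:
            self.slots.remove(slot)
            # Hand the window back only if nothing was scheduled after it.
            if self.free_at[slot.processor] == slot.end:
                self.free_at[slot.processor] = slot.start

    def fail(self, processor: int) -> list:
        """Processor failure: list (job, backup processor, backup start) to activate."""
        return sorted((j, self.backup_of[j].processor, self.backup_of[j].start)
                      for j, (p, _, _) in self.primary.items() if p == processor)
```

A backup window is shared only by jobs whose primaries run on different processors, so under the single-failure assumption at most one backup per window is ever activated. When a primary completes, its share of the window is released, and an emptied window is returned to the processor's timeline if nothing was scheduled after it.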

7.2.3 Data Grid

A data grid is a collection of geographically dispersed storage resources connected over a wide area network. The goal of a data grid is to provide a large virtual storage framework with virtually unlimited capacity through collaboration among individuals and institutions. Heterogeneity is a major challenge for data-intensive applications running on data grids, where interconnections are relatively slow and network latencies are high. Hence, the performance of the proposed approaches needs to be investigated for data grids.