HDFS Fair Scheduler Preemption for Pending Request


B. PURNACHANDRA RAO, Research Scholar, Dept. of Computer Science & Engg., ANU College of Engg. & Technology, Guntur, India.
Dr. N. NAGAMALLESWARA RAO, Prof., Dept. of Information Technology, R.V.R. & J.C. College of Engg. & Technology, Guntur, India.

ABSTRACT

Hadoop is a popular open-source software framework for storing and processing large sets of data on a platform consisting of commodity hardware. The main idea behind Hadoop is to move computation to the data, instead of the traditional way of moving data to computation. Hadoop file sizes are usually very large, ranging from gigabytes to terabytes, and large Hadoop clusters store millions of these files. Allocating resources among competing users and scheduling jobs in a Hadoop cluster are among the key tasks in Hadoop's internal operations. Hadoop has three built-in resource schedulers: the FIFO scheduler, the Fair scheduler and the Capacity scheduler. First In First Out is a simple, early scheduler that uses a single queue for all jobs. There is no concept of priority in choosing a job for execution; the oldest jobs are chosen first from the head of the queue. The Fair scheduler enables multiple queue placement policies, which dictate where the scheduler places new applications. You can submit applications to a nonexistent queue by setting the create flag, which creates a new queue. The Capacity scheduler chooses jobs with the highest gap between currently used and granted capacity; that is, the most underserved queues are offered resources before other queues. The Fair scheduler selects jobs on the basis of the highest time deficit. The Fair scheduler uses preemption to support fairness among the queues and assigns priorities to users through weights. The Capacity scheduler uses preemption to return guaranteed capacity back to the queues. The Fair Scheduler, however, lacks preemption for pending requests.
That means it does not consider the actual resource request; as a result, when an under-utilized queue needs one single container with 60 GB, the scheduler could preempt 60 separate 1 GB containers on different nodes. Such preempted resources cannot be used by the target resource request. This paper shows how to consider the actual metrics of the request before preempting resources.

KEYWORDS

Hadoop Distributed File System (HDFS), Schedulers, FIFO Scheduler, Fair Scheduler, Capacity Scheduler, Preemption, Preemption for Pending Request, NameNode, DataNode.

1 INTRODUCTION

Apache Hadoop [1] is explicitly designed to handle large amounts of data, which can easily run into many petabytes and even exabytes. Hadoop file sizes are usually very large, ranging from gigabytes to terabytes, and large Hadoop clusters store millions of these files. Everything in Hadoop is based on the assumption that hardware fails. Hadoop depends on large numbers of servers so it can parallelize work across them. Server and storage failures are to be expected, and the system is not affected by nonfunctioning storage units or even failed servers. Traditional databases are geared mostly toward fast access to data, not batch processing. Hadoop was originally designed for batch processing, such as the indexing of millions of web pages, and provides streaming access to data sets. Unlike traditional databases, Hadoop data files employ a write-once-read-many access model. Data consistency issues that may arise in an updatable database are not an issue with Hadoop file systems, because only a single writer can write to a file at any time. HDFS [2][3] is designed for a write-once-read-many access model for files. Hadoop, MapReduce [5], Dryad [10] and HPCC (High-Performance Computing Cluster) [11] are data-intensive frameworks that depend on disk-based file systems to meet their exponential storage demands. Hadoop has a master node called the NameNode, which holds the namespace.
Data is stored on DataNodes, which are connected to the NameNode and periodically send status reports to it. The NameNode maintains information about the blocks in which data resides on the DataNodes, as well as about empty blocks. Each block is stored in more than one location; that is, it is replicated across a number of blocks, and the replication factor is configured in the Hadoop configuration file. The Hadoop Distributed File System (HDFS) [6] has the capacity to store large amounts of data. Whenever a client wants to store data in the HDFS file system, the client talks to the NameNode. The NameNode consults the metadata available in the namespace to find vacant data blocks. The list of data blocks is returned to the HDFS client, and based on this list the client talks to the DataNodes to write the data [9]. Resource allocation refers to the allocation of scarce, finite computing resources, such as CPU time, memory, storage space and network bandwidth, among the users that utilize a Hadoop cluster. The two most important resources that you have control over are processing power (CPU) and memory (RAM). Hadoop resource schedulers are components that are responsible for assigning tasks to available YARN containers on various DataNodes. The scheduler is a plugin within the ResourceManager. Hadoop has three types of schedulers: FIFO, the Capacity scheduler and the Fair scheduler. FIFO is a simple, early Hadoop scheduler that uses a single queue for all jobs. The Capacity scheduler submits jobs into queues, each of which is guaranteed a minimum amount of resources such as RAM and CPU. The Fair scheduler assigns jobs to queues with guaranteed minimum resources. Preemption ("when the cluster doesn't have enough idle resources, one queue is over-utilized and another queue is under-utilized, the scheduler can preempt resources from the over-utilized queue") is supported by both the Capacity and Fair schedulers. But preemption for pending requests is not supported by the Fair scheduler. In this paper we show how the concept of preemption for pending requests can be utilized in the Fair scheduler.

2 LITERATURE REVIEW

2.1 HDFS SCHEDULERS

Resource allocation is a crucial part of Hadoop administration [4].
Hadoop resource schedulers are components that are responsible for assigning tasks to available YARN containers on various DataNodes. The scheduler is a plugin within the ResourceManager. The First In First Out scheduler is a simple, early Hadoop scheduler that uses a single queue for all jobs. There is no concept of priority in choosing a job for execution; the oldest jobs are taken first from the head of the queue. The Capacity Scheduler submits jobs to queues, each of which is guaranteed a minimum amount of resources such as RAM and CPU. The queues with a greater gap between their used capacity and their granted resources are offered priority in the allocation of resources that are released by completing jobs. If the cluster has excess capacity, the scheduler shares it among the cluster users, just as the Fair scheduler does. It uses the concepts of reservation and preemption to return the guaranteed capacity to the queues. The Fair scheduler assigns jobs to queues with guaranteed minimum resources. The scheduler picks the jobs with the greatest time deficit when allocating resources that are freed by other applications. This scheduler can also allocate excess capacity from a pool to other pools. The Fair scheduler uses the concept of priority to support the importance of an application within a pool. It uses the concept of preemption to support fairness among different resource pools. Please refer to Fig 1 for configuring the Fair Scheduler. To use the Capacity Scheduler instead, set the value to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.

Fig 1: Fair Scheduler Configuration

The essential concept of the Capacity Scheduler is that it uses dedicated queues to which you assign jobs. Each queue has a predetermined amount of resources allocated to it. However, you pay in terms of the cluster's resource utilization, since you are reserving and guaranteeing queue resource capacities.
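The Fair Scheduler configuration referenced in Fig 1 amounts to setting the scheduler class in yarn-site.xml. A representative fragment (a sketch using the standard YARN property name; verify the class names against your Hadoop version) would be:

```xml
<!-- yarn-site.xml: select the Fair Scheduler as the ResourceManager's scheduler plugin -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

Replacing the value with the CapacityScheduler class named above switches the cluster to the Capacity Scheduler.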
The goal of the Capacity scheduler is to enable multiple tenants (users) of an organization to share the resources of a Hadoop cluster in a predictable fashion. Hadoop achieves this goal by using job queues. The scheduler provides guaranteed capacity for the job queues, while providing elasticity in the utilization of the cluster by the queues. Elasticity in this context means that the assignment of resources is not set in concrete. As the queues wend their way through the cluster, it is common for some queues to be overloaded and for others to be relatively idle. The Capacity scheduler recognizes this and automatically transfers the unused capacity of the lightly used queues to the overloaded queues. A queue is an ordered list of jobs. When you create a queue, you allocate it a certain portion of your cluster's resources. User applications are then submitted to this queue to access the queue's allotted resources. Applications submitted to a queue run in FIFO order. Once the applications submitted to a queue start running, they can't be preempted, but as tasks complete, any free resources are assigned to queues running below the capacity allowed to them. Please refer to Fig 3 for the creation of queues.

Fig 3: Create queues

NAMENODE

HDFS has a master node called the NameNode, whose namespace maintains metadata about the blocks on the DataNodes. It manages user access to the file system. It processes the block reports sent by the DataNodes and maintains the locations where the data blocks live. The NameNode also updates the fsimage file with the updated HDFS state information [12]. Once a client receives a set of DataNodes, the data is written to the DataNodes in pipeline fashion [7][8].

DATANODE

HDFS has slave nodes called DataNodes. A DataNode provides block storage by storing blocks on the local file system, fulfills read/write requests from the clients who work with the data stored on the DataNodes, and creates and deletes data blocks. DataNodes keep in touch with the NameNode by sending periodic block reports and heartbeats. A heartbeat confirms that the DataNode is alive and healthy, and a block report shows the blocks being managed by the DataNode. Please refer to Fig 4 for the HDFS architecture.

Fig 4: HDFS Architecture

FAIR SCHEDULER

The Fair Scheduler is a built-in Hadoop resource scheduler whose goal is to let smaller jobs finish fast (short response times) while providing a guaranteed service level for production jobs. The jobs are not usually of the same type: some are production jobs that involve data imports and hourly reports, and others are run by data analysts issuing ad hoc Hive queries and Pig jobs. Usually some long-running data analyses or machine learning jobs are running at the same time.
The main task here is to allocate the resources of the cluster in an efficient manner among these competing jobs. You don't need to reserve a predefined amount of capacity for groups or queues. The scheduler dynamically distributes the available resources among all the running jobs in a cluster. Please refer to Fig 5 for the Fair Scheduler.

Fig 5: FairScheduler

When a large job starts first and happens to be the only job running, it uses all the cluster's resources by default (unless you specify maximum resource limits). Subsequently, when a second job starts up, it is allocated roughly half of the total cluster resources (by default), and both jobs then share the cluster resources on an equal basis. This is the concept of fairness that led to naming this scheduler the Fair Scheduler. The Fair scheduler ensures that the resource allocation for applications is fair, meaning that all applications get roughly equal amounts of resources over time. When we talk about resources in the context of the Fair Scheduler, we are referring to memory only. However, you can also employ a variation of the Fair Scheduler called the Dominant Resource Fairness (DRF) scheduler, which uses both memory and CPU as resources. Dominant Resource Fairness is a concept wherein the YARN scheduler examines each user's dominant resource and uses it as a measure of that user's resource usage.

PREEMPTING APPLICATIONS

Preempting an application means that containers from other applications may need to be killed, if necessary, to make room for new applications. If you don't want late-arriving applications in a specific leaf queue to wait because the running applications in other leaf queues are taking up all the allotted resources, you can use preemption. In these situations, although you have guaranteed a set capacity for a queue, there are no free resources available to be allotted to that leaf queue. The ApplicationMaster container is killed only as a last resort, with preference given to killing containers whose tasks have not yet executed. Minimum share preemption occurs when a pool is operating below its configured minimum share, and fair share preemption occurs when a pool is operating below its fair share.
Of the two, minimum share preemption is stricter: it kicks in when a pool has been operating below its minimum allocated share for a specified period, after which job preemption begins. Once preemption starts, a pool that is currently below its minimum allocated share can go up to its minimum share, whereas a pool that is below 50 percent of its fair share will go all the way up until it hits its full fair share. You can configure task preemption to ensure that key jobs are processed on time. However, preemption is not arbitrary: it is used to kill containers for queues that are using more than their fair share of resources. If you enable preemption in the cluster, the Fair scheduler will preempt applications in other queues if a queue's minimum share is not met for some period of time. Preemption ensures that your key production jobs are not delayed because other, less important jobs are already running in the cluster. The Fair scheduler kills the most recently launched applications to minimize the waste of resources in the cluster. To enable preemption, set the yarn.scheduler.fair.preemption property to true in the yarn-site.xml file. Both the Fair Scheduler and the Capacity Scheduler have an identical goal: allow long-running jobs to complete in a decent time while simultaneously enabling users running queries to get their results back quickly. Both schedulers support hierarchical queues. All queues descend from a root or default queue. You can submit applications to leaf queues. Both schedulers support minimum and maximum capacities. Both support maximum application limits on a per-queue basis. Both schedulers let you move applications across queues. The Fair Scheduler contains scheduling policies that determine which jobs get resources each time the scheduler allocates resources. You can use three types of scheduling policies, fifo, fair and drf, by specifying the policy with the defaultQueueSchedulingPolicy top-level element.
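The drf policy follows the Dominant Resource Fairness idea described above: each user's share of the cluster is measured by whichever single resource they consume the largest fraction of. A minimal sketch of that computation (illustrative only; the function name and the example numbers are our own, not YARN code):

```python
def dominant_share(usage, capacity):
    """DRF dominant share: the largest fraction of any one cluster
    resource (memory, vcores, ...) that this user currently consumes."""
    return max(usage[r] / capacity[r] for r in capacity)

cluster = {"memory_gb": 100, "vcores": 50}
user_a = {"memory_gb": 30, "vcores": 5}   # memory-heavy: 30% memory vs 10% CPU
user_b = {"memory_gb": 10, "vcores": 20}  # CPU-heavy:    10% memory vs 40% CPU

print(dominant_share(user_a, cluster))  # 0.3 (memory dominates)
print(dominant_share(user_b, cluster))  # 0.4 (CPU dominates)
# A DRF scheduler would offer the next container to user A,
# whose dominant share (0.3) is currently the smaller of the two.
```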
The Capacity Scheduler, on the other hand, always schedules jobs within each queue on the FIFO principle. The Fair scheduler enables multiple queue placement policies, which dictate where the scheduler places new applications among the queues based on users, groups, or the queue requests made by the applications. You can submit applications to a nonexistent queue by setting the create flag so that a new queue will be created. Preemption is a way to balance resource usage between queues: when the cluster doesn't have enough idle resources, one queue is over-utilized and another queue is under-utilized, the scheduler can preempt resources from the over-utilized queue. Without preemption for pending requests, the scheduler does not consider the actual resource request; as a result, when an under-utilized queue needs one single container with 60 GB, the scheduler could preempt 60 separate 1 GB containers on different nodes. Such preempted resources cannot be used by the target resource request. The Capacity Scheduler has the capability of preemption for pending requests, but the Fair scheduler does not. If the Fair scheduler requests resources using the preemption policy for a pending request, there is no guarantee that it will utilize the allotted resources. This is the problem in the existing architecture.

3 PROPOSED METHOD

3.1 PROBLEM STATEMENT

The HDFS Fair scheduler does not support preemption for pending requests. It does not consider the actual resource request; as a result, when an under-utilized queue needs one single container with 60 GB, the scheduler could preempt 60 separate 1 GB containers on different nodes. Such preempted resources cannot be used by the target resource request. If the scheduler requests resources using the preemption policy for a pending request, there is no guarantee that it will utilize all the allotted resources. This is the problem in the existing architecture.

3.2 PROPOSAL

When there is a resource request, the Fair Scheduler needs to consider the actual resource request so that it can request/preempt exactly what is needed. The solution is for the Fair Scheduler to look for the 60 GB of space at a single location instead of collecting it from different locations/nodes. So we need to implement the best-fit memory allocation algorithm in the Fair Scheduling process. The inputs are the memory blocks and the processes with their sizes. Initialize all memory blocks as free.
Start by picking each process and finding the minimum block size that can be assigned to it, i.e., find min(blockSize[1], blockSize[2], ..., blockSize[n]) such that blockSize[i] >= processSize[current]; if such a block is found, assign it to the current process. If not, skip that process and keep checking the remaining processes. Under this allocation, best fit allocates the smallest free partition that meets the requirement of the requesting process. The algorithm first searches the entire list of free partitions and considers the smallest hole that is adequate; it thus finds a hole that is close to the actual process size needed. As we discussed, the current scheduler may allocate 1 GB resources at 60 different nodes, but if we implement this, it will look for the smallest single free partition that meets the requirement. If no such partition exists, it will preempt for the expected resource. Please refer to Fig 6 for best-fit memory allocation: in best fit, P3 is placed in the smallest adequate block, whereas in worst fit it would be placed in the highest-capacity block, and in first fit it would go into the first capable slot.

Fig 6: Best Fit Memory allocation

4 IMPLEMENTATION

Refer to Fig 7 for the implementation architecture using the best-fit allocation algorithm. Whenever preemption is required, the process looks for an exact fit and allocates it. The Fair scheduler uses the best-fit allocation process, as shown in the figure, before opting for the block to write the data. The extra complexity is the search time required to find the exact fit.
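The best-fit search described above can be sketched as follows. This is a simplified illustration, not the actual Fair Scheduler code; the per-node free capacities are hypothetical numbers chosen to mirror the 60 GB example:

```python
def best_fit(free_blocks, request):
    """Best fit: return the index of the smallest free block that can
    hold the whole request, or None when no single block is big enough."""
    best = None
    for i, size in enumerate(free_blocks):
        if size >= request and (best is None or size < free_blocks[best]):
            best = i
    return best

# A pending request for one 60 GB container; per-node free memory in GB.
nodes = [100, 500, 200, 300, 600]
print(best_fit(nodes, 60))     # index 0: the 100 GB node is the tightest fit

# Sixty scattered 1 GB fragments cannot serve a single 60 GB container,
# so best fit returns None instead of preempting 60 * 1 GB across nodes.
print(best_fit([1] * 60, 60))  # None
```

The None result is the point of the proposal: rather than wasting preempted capacity the pending request cannot use, the scheduler would preempt for the exact expected resource.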

The algorithm searches the entire list of free partitions and considers the smallest hole that is adequate; it then tries to find a hole that is close to the actual process size needed. As we discussed, the existing scheduler may allocate 1 GB resources at 60 different nodes, but if we implement this, it will look for the smallest single free partition that meets the requirement. If no such partition exists, it will not allocate many small resources, so these small resources remain available for other applications. This results in the least wasted space. Internal fragmentation is reduced but not eliminated. The only disadvantage of the best-fit algorithm is that the search time for finding the exact fit increases. Future work includes finding a procedure for reducing the time needed to find the exact fit.

Fig 7: Fair Scheduler with Best fit allocation

5 CONCLUSION

Preempting an application means that containers from other applications may need to be killed, if necessary, to make room for new applications. If you don't want late-arriving applications in a specific leaf queue to wait because the running applications in other leaf queues are taking up all the allotted resources, you can use preemption. Preemption is a way to balance resource usage between queues: when the cluster doesn't have enough idle resources, one queue is over-utilized and another queue is under-utilized, the scheduler can preempt resources from the over-utilized queue. Without preemption for pending requests, the scheduler does not consider the actual resource request; as a result, when an under-utilized queue needs one single container with 60 GB, the scheduler could preempt 60 separate 1 GB containers on different nodes. Such preempted resources cannot be used by the target resource request. The Capacity Scheduler has the capability of preemption for pending requests, but the Fair scheduler does not. The solution is for the Fair Scheduler to look for the 60 GB of space at a single location instead of collecting it from different locations/nodes.
So we need to implement the best-fit memory allocation algorithm in the Fair Scheduling process. Under this allocation, best fit allocates the smallest free partition that meets the requirement of the requesting process, first searching the entire list of free partitions for the smallest hole that is adequate.

REFERENCES

[1] Apache Hadoop. Available at the Apache Hadoop website.
[2] Tom White, "Hadoop: The Definitive Guide", Storage and Analysis at Internet Scale, Second ed., Yahoo Press.
[3] Konstantin V. Shvachko, "Scalability of the Hadoop Distributed File System".
[4] George Porter, "Decoupling Storage and Computation in Hadoop with SuperDataNodes", ACM SIGOPS Operating Systems Review, 44.
[5] Archana Kakade and Dr. Suhas Raut, "Hadoop Distributed File System with Cache System - a Paradigm for Performance Improvement", International Journal of Scientific Research and Management (IJSRM), Vol. 2, Issue 1.
[6] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", in Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation (OSDI '04), Berkeley, CA, USA, 2004.
[7] J. Shafer, S. Rixner and A. L. Cox, "The Hadoop Distributed Filesystem: Balancing Portability and Performance", in Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2010), White Plains, NY.
[8] Feng Wang, Jie Qiu, Jie Yang, Bo Dong, Xinhui Li and Ying Li, "Hadoop High Availability through Metadata Replication", IBM China Research Laboratory, ACM, pp. 37-44, 2009.
[9] Derek Tankel, "Scalability of the Hadoop Distributed File System", Yahoo Developer Network.
[10] "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM", Department of Computer Science, Stanford University.
[11] J. Shafer and S. Rixner, "The Hadoop Distributed File System: Balancing Portability and Performance", in 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2010), White Plains, NY, March 2010.

[12] Sam R. Alapati, "Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS", Addison-Wesley Data & Analytics Series.


More information

HDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017

HDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017 HDFS Architecture Gregory Kesden, CSE-291 (Storage Systems) Fall 2017 Based Upon: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoopproject-dist/hadoop-hdfs/hdfsdesign.html Assumptions At scale, hardware

More information

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia,

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu } Introduction } Architecture } File

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1 HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,

More information

Efficient Map Reduce Model with Hadoop Framework for Data Processing

Efficient Map Reduce Model with Hadoop Framework for Data Processing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Key aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling

Key aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling Key aspects of cloud computing Cluster Scheduling 1. Illusion of infinite computing resources available on demand, eliminating need for up-front provisioning. The elimination of an up-front commitment

More information

Survey on MapReduce Scheduling Algorithms

Survey on MapReduce Scheduling Algorithms Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used

More information

Hadoop MapReduce Framework

Hadoop MapReduce Framework Hadoop MapReduce Framework Contents Hadoop MapReduce Framework Architecture Interaction Diagram of MapReduce Framework (Hadoop 1.0) Interaction Diagram of MapReduce Framework (Hadoop 2.0) Hadoop MapReduce

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

A priority based dynamic bandwidth scheduling in SDN networks 1

A priority based dynamic bandwidth scheduling in SDN networks 1 Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

50 Must Read Hadoop Interview Questions & Answers

50 Must Read Hadoop Interview Questions & Answers 50 Must Read Hadoop Interview Questions & Answers Whizlabs Dec 29th, 2017 Big Data Are you planning to land a job with big data and data analytics? Are you worried about cracking the Hadoop job interview?

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Batch Inherence of Map Reduce Framework

Batch Inherence of Map Reduce Framework Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

BigData and Map Reduce VITMAC03

BigData and Map Reduce VITMAC03 BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to

More information

Optimizing Hadoop Block Placement Policy & Cluster Blocks Distribution

Optimizing Hadoop Block Placement Policy & Cluster Blocks Distribution Vol:6, No:1, 212 Optimizing Hadoop Block Placement Policy & Cluster Blocks Distribution Nchimbi Edward Pius, Liu Qin, Fion Yang, Zhu Hong Ming International Science Index, Computer and Information Engineering

More information

Hadoop Job Scheduling with Dynamic Task Splitting

Hadoop Job Scheduling with Dynamic Task Splitting Hadoop Job Scheduling with Dynamic Task Splitting Xu YongLiang School of Computer Engineering 2015 Hadoop Job Scheduling with Dynamic Task Splitting Xu YongLiang (G1002570K) Supervisor Professor Cai Wentong

More information

A Micro Partitioning Technique in MapReduce for Massive Data Analysis

A Micro Partitioning Technique in MapReduce for Massive Data Analysis A Micro Partitioning Technique in MapReduce for Massive Data Analysis Nandhini.C, Premadevi.P PG Scholar, Dept. of CSE, Angel College of Engg and Tech, Tiruppur, Tamil Nadu Assistant Professor, Dept. of

More information

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E.

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E. 18-hdfs-gfs.txt Thu Nov 01 09:53:32 2012 1 Notes on Parallel File Systems: HDFS & GFS 15-440, Fall 2012 Carnegie Mellon University Randal E. Bryant References: Ghemawat, Gobioff, Leung, "The Google File

More information

Performance evaluation of job schedulers on Hadoop YARN

Performance evaluation of job schedulers on Hadoop YARN Performance evaluation of job schedulers on Hadoop YARN Jia-Chun Lin Department of Informatics, University of Oslo Gaustadallèen 23 B, Oslo, N-0373, Norway kellylin@ifi.uio.no Ming-Chang Lee Department

More information

A brief history on Hadoop

A brief history on Hadoop Hadoop Basics A brief history on Hadoop 2003 - Google launches project Nutch to handle billions of searches and indexing millions of web pages. Oct 2003 - Google releases papers with GFS (Google File System)

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

CS60021: Scalable Data Mining. Sourangshu Bhattacharya CS60021: Scalable Data Mining Sourangshu Bhattacharya In this Lecture: Outline: HDFS Motivation HDFS User commands HDFS System architecture HDFS Implementation details Sourangshu Bhattacharya Computer

More information

Enhanced Hadoop with Search and MapReduce Concurrency Optimization

Enhanced Hadoop with Search and MapReduce Concurrency Optimization Volume 114 No. 12 2017, 323-331 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Enhanced Hadoop with Search and MapReduce Concurrency Optimization

More information

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD S.THIRUNAVUKKARASU 1, DR.K.P.KALIYAMURTHIE 2 Assistant Professor, Dept of IT, Bharath University, Chennai-73 1 Professor& Head, Dept of IT, Bharath

More information

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures Hiroshi Yamaguchi & Hiroyuki Adachi About Us 2 Hiroshi Yamaguchi Hiroyuki Adachi Hadoop DevOps Engineer Hadoop Engineer

More information

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c 2016 Joint International Conference on Service Science, Management and Engineering (SSME 2016) and International Conference on Information Science and Technology (IST 2016) ISBN: 978-1-60595-379-3 Dynamic

More information

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Nikos Zacheilas, Vana Kalogeraki Department of Informatics Athens University of Economics and Business 1 Big Data era has arrived!

More information

Hadoop-PR Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer)

Hadoop-PR Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer) Hortonworks Hadoop-PR000007 Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer) http://killexams.com/pass4sure/exam-detail/hadoop-pr000007 QUESTION: 99 Which one of the following

More information

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ. Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop

More information

The Analysis Research of Hierarchical Storage System Based on Hadoop Framework Yan LIU 1, a, Tianjian ZHENG 1, Mingjiang LI 1, Jinpeng YUAN 1

The Analysis Research of Hierarchical Storage System Based on Hadoop Framework Yan LIU 1, a, Tianjian ZHENG 1, Mingjiang LI 1, Jinpeng YUAN 1 International Conference on Intelligent Systems Research and Mechatronics Engineering (ISRME 2015) The Analysis Research of Hierarchical Storage System Based on Hadoop Framework Yan LIU 1, a, Tianjian

More information

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

EXTRACT DATA IN LARGE DATABASE WITH HADOOP International Journal of Advances in Engineering & Scientific Research (IJAESR) ISSN: 2349 3607 (Online), ISSN: 2349 4824 (Print) Download Full paper from : http://www.arseam.com/content/volume-1-issue-7-nov-2014-0

More information

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage

More information

Correlation based File Prefetching Approach for Hadoop

Correlation based File Prefetching Approach for Hadoop IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie

More information

A Survey on Big Data

A Survey on Big Data A Survey on Big Data D.Prudhvi 1, D.Jaswitha 2, B. Mounika 3, Monika Bagal 4 1 2 3 4 B.Tech Final Year, CSE, Dadi Institute of Engineering & Technology,Andhra Pradesh,INDIA ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 26 File Systems Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ Cylinders: all the platters?

More information

Improved MapReduce k-means Clustering Algorithm with Combiner

Improved MapReduce k-means Clustering Algorithm with Combiner 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering

More information

An Enhanced Approach for Resource Management Optimization in Hadoop

An Enhanced Approach for Resource Management Optimization in Hadoop An Enhanced Approach for Resource Management Optimization in Hadoop R. Sandeep Raj 1, G. Prabhakar Raju 2 1 MTech Student, Department of CSE, Anurag Group of Institutions, India 2 Associate Professor,

More information

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2016 1 Google Chubby 2 Chubby Distributed lock service + simple fault-tolerant file system Interfaces File access

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) Hadoop Lecture 24, April 23, 2014 Mohammad Hammoud Today Last Session: NoSQL databases Today s Session: Hadoop = HDFS + MapReduce Announcements: Final Exam is on Sunday April

More information

Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2

Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2 2nd International Conference on Materials Science, Machinery and Energy Engineering (MSMEE 2017) Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2 1 Information Engineering

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

UNIT-IV HDFS. Ms. Selva Mary. G

UNIT-IV HDFS. Ms. Selva Mary. G UNIT-IV HDFS HDFS ARCHITECTURE Dataset partition across a number of separate machines Hadoop Distributed File system The Design of HDFS HDFS is a file system designed for storing very large files with

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

Dept. Of Computer Science, Colorado State University

Dept. Of Computer Science, Colorado State University CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [HADOOP/HDFS] Trying to have your cake and eat it too Each phase pines for tasks with locality and their numbers on a tether Alas within a phase, you get one,

More information

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

Survey Paper on Traditional Hadoop and Pipelined Map Reduce International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,

More information

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

Mounica B, Aditya Srivastava, Md. Faisal Alam

Mounica B, Aditya Srivastava, Md. Faisal Alam International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 3 ISSN : 2456-3307 Clustering of large datasets using Hadoop Ecosystem

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

Page 1. Goals for Today" Background of Cloud Computing" Sources Driving Big Data" CS162 Operating Systems and Systems Programming Lecture 24

Page 1. Goals for Today Background of Cloud Computing Sources Driving Big Data CS162 Operating Systems and Systems Programming Lecture 24 Goals for Today" CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing" Distributed systems Cloud Computing programming paradigms Cloud Computing OS December 2, 2013 Anthony

More information

Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop

Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop K. Senthilkumar PG Scholar Department of Computer Science and Engineering SRM University, Chennai, Tamilnadu, India

More information

Hadoop/MapReduce Computing Paradigm

Hadoop/MapReduce Computing Paradigm Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications

More information

Indexing Strategies of MapReduce for Information Retrieval in Big Data

Indexing Strategies of MapReduce for Information Retrieval in Big Data International Journal of Advances in Computer Science and Technology (IJACST), Vol.5, No.3, Pages : 01-06 (2016) Indexing Strategies of MapReduce for Information Retrieval in Big Data Mazen Farid, Rohaya

More information

A Survey on Job Scheduling in Big Data

A Survey on Job Scheduling in Big Data BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 3 Sofia 2016 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2016-0033 A Survey on Job Scheduling in

More information

An Adaptive Scheduling Technique for Improving the Efficiency of Hadoop

An Adaptive Scheduling Technique for Improving the Efficiency of Hadoop An Adaptive Scheduling Technique for Improving the Efficiency of Hadoop Ms Punitha R Computer Science Engineering M.S Engineering College, Bangalore, Karnataka, India. Mr Malatesh S H Computer Science

More information

Distributed Face Recognition Using Hadoop

Distributed Face Recognition Using Hadoop Distributed Face Recognition Using Hadoop A. Thorat, V. Malhotra, S. Narvekar and A. Joshi Dept. of Computer Engineering and IT College of Engineering, Pune {abhishekthorat02@gmail.com, vinayak.malhotra20@gmail.com,

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information