Performance Optimization for Short MapReduce Job Execution in Hadoop


2012 Second International Conference on Cloud and Green Computing

Performance Optimization for Short MapReduce Job Execution in Hadoop

Jinshuang Yan, Xiaoliang Yang, Rong Gu, Chunfeng Yuan, and Yihua Huang
Department of Computer Science and Technology, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Abstract—Hadoop MapReduce is a widely used parallel computing framework for solving data-intensive problems. To be able to process large-scale datasets, the fundamental design of the standard Hadoop places more emphasis on high data throughput than on job execution performance. This causes a performance limitation when Hadoop MapReduce is used to execute short jobs that require quick responses. In order to speed up the execution of short jobs, this paper proposes optimization methods to improve the execution performance of MapReduce jobs. We make three major optimizations: first, we reduce the time cost of the initialization and termination stages of a job by optimizing its setup and cleanup tasks; second, we replace the pull-model task assignment mechanism with a push model; third, we replace the heartbeat-based communication mechanism with an instant message communication mechanism for event notifications between the JobTracker and the TaskTrackers. Experimental results show that the job execution performance of our improved version of Hadoop is about 23% faster on average than the standard Hadoop for our test application.

Keywords—MapReduce; parallel computing; job execution; performance optimization

I. INTRODUCTION

The MapReduce parallel computing framework [1], proposed by Google in 2004, has become an effective and attractive solution for large-scale data processing problems. Unlike classic parallel programming models such as MPI [2] and PVM [3], MapReduce provides simple programming interfaces with two functions, map and reduce, which can be executed in parallel on a cluster without any intervention from the programmer. Moreover, MapReduce offers other benefits, including load balancing, high scalability, and fault tolerance, which have made it a widely adopted parallel computing framework that has also gained attention in academia. Hadoop [4], an open-source project of the Apache Software Foundation, is an implementation of Google's MapReduce framework and is widely used by research communities and industry.

Recently, many researchers have implemented and deployed data-intensive and/or computation-intensive algorithms (e.g., machine learning algorithms [5]) on the MapReduce parallel computing framework for high processing efficiency. At the same time, quite a few researchers are exploring performance improvements for MapReduce or designing and implementing novel parallel computing architectures inspired by the MapReduce framework. Facebook proposed the fair scheduler [6] to better solve the job scheduling problem in the MapReduce framework. Researchers at UC Berkeley proposed a task scheduling algorithm called LATE (Longest Approximate Time to End) [7], which executes speculative tasks to improve the response time of Hadoop. A delay scheduling method is introduced in [8], which aims to achieve locality and fairness in task scheduling for increased throughput. HPMR [9] adopts pre-fetching and pre-shuffling optimization strategies, which improve the overall performance of the standard Hadoop.
Other researchers have tried to accelerate MapReduce job execution by caching intermediate data in distributed memory [10], estimating the progress of MapReduce pipelines [11], or optimizing MapReduce programs automatically [12].

Users usually expect short execution or quick response times from a short MapReduce job. This is especially true for online query or analysis-type applications. To provide SQL-like queries or analysis, several query systems are available, such as Google's Sawzall [13], Facebook's Hive [14], and Yahoo!'s Pig [15]. These systems execute users' requests by converting SQL-like queries into a series of MapReduce jobs, which are usually short. Obviously, such systems are very sensitive to the execution time of the underlying MapReduce jobs. Therefore, reducing the execution time of short jobs is very important to these types of applications.

In this paper, we focus on improving the execution performance of short jobs on Hadoop. After analyzing the shortcomings of the job execution mechanism in the standard Hadoop, we implement an optimized version of Hadoop MapReduce designed to reduce the time consumed in the execution process of a job. The first optimization reduces the time cost of the initialization and termination stages of a job by removing the constant cost of 4 heartbeats for its setup and cleanup tasks. For the second optimization, instead of the heartbeat-based pull-model task assignment, we design and implement a push-model task assignment mechanism. For the third optimization, we design and implement an instant message communication mechanism for event notification between the JobTracker and the TaskTrackers, separating this message traffic from heartbeats. Experimental results on our test application show improved performance of the optimized Hadoop for executing short jobs. In contrast to a long job, a short job here refers to a job whose whole execution takes only several minutes.

(Correspondence should be addressed to Yihua Huang, Ph.D., yhuang@nju.edu.cn.)

The rest of the paper is organized as follows. Section II gives a brief introduction to the Hadoop MapReduce architecture, analyzes the current job execution mechanism, and identifies the places that cause performance problems in the standard Hadoop. Section III describes our optimization methods for the problems discussed in Section II. Section IV discusses the experiments and performance evaluation of the optimizations.

II. BACKGROUND

A. Hadoop MapReduce Job Execution Process

The Hadoop MapReduce framework, deployed on top of HDFS, consists of a JobTracker running on the master node and many TaskTrackers running on slave nodes. As the core component of the MapReduce framework, the JobTracker is in charge of scheduling and monitoring all the tasks of a MapReduce job. Tasks are distributed to the TaskTrackers, on which the map and reduce functions implemented by users are executed. When a MapReduce job is submitted to Hadoop, the input data of the job is split into several independent, equal-sized data splits, with each map task processing one split. The map tasks run in parallel, and their outputs are sorted by the framework and then fetched by reduce tasks for further processing. During the job execution, the JobTracker monitors the execution of each task, reschedules failed tasks, and alters the state of the job in each phase.

Job and task are two important concepts in the Hadoop MapReduce architecture. To elaborate on the problems we address, we first present the state transition of a job and then the sequential processing of a task. Figure 1 shows the state transition of a job. Generally, the execution of a job can be partitioned into three stages: PREP, RUNNING, and FINISHED. When a client submits a job to the Hadoop cluster, the execution process is as follows:

PREP stage: at the beginning, the job is in the NEW state. The job then enters the PREP.INITIALIZING state for initialization processing, such as reading split information from HDFS and creating the map and reduce tasks on the JobTracker. Next, the setup task of the job is scheduled to a TaskTracker. When the setup task is completed, the job enters the RUNNING state.

RUNNING stage: first, the job is in the RUNNING.RUN_WAIT state, waiting for tasks to be scheduled. When a task has been scheduled to a TaskTracker, the job enters the RUNNING.RUNNING_TASKS state to execute all map/reduce tasks. Once all the map and reduce tasks are completed, the job moves to the RUNNING.SUC_WAIT state. In this state, the cleanup task of the job is scheduled to a TaskTracker to clean up the job's environment.

FINISHED stage: after the job cleanup task is done, the job goes into the SUCCEEDED state; in other words, the job has been completed.

In any state, a job can be killed by the client and go into the KILLED state, or go into the FAILED state due to some failure.

Figure 1. The state transition of a job
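To make the state progression described above concrete, the following is a minimal, self-contained Java sketch of the job life cycle as presented in this section. It is an illustrative model only; the class and state names (JobLifecycle, JobState) are our own simplification and do not correspond to Hadoop's actual JobInProgress/JobStatus code.

```java
// Illustrative model of the job life cycle described above (PREP -> RUNNING -> FINISHED).
// The names below are a simplification for explanation, not Hadoop's real classes.
public class JobLifecycle {

    enum JobState {
        NEW,                    // job object created
        PREP_INITIALIZING,      // reading split info, creating map/reduce tasks
        RUNNING_RUN_WAIT,       // waiting for tasks to be scheduled
        RUNNING_RUNNING_TASKS,  // map/reduce tasks executing on TaskTrackers
        RUNNING_SUC_WAIT,       // all tasks done, cleanup task being scheduled
        SUCCEEDED, KILLED, FAILED
    }

    private JobState state = JobState.NEW;

    // Advance along the normal (failure-free) path described in Section II.A.
    public void advance() {
        switch (state) {
            case NEW:                   state = JobState.PREP_INITIALIZING;     break;
            case PREP_INITIALIZING:     state = JobState.RUNNING_RUN_WAIT;      break; // setup task done
            case RUNNING_RUN_WAIT:      state = JobState.RUNNING_RUNNING_TASKS; break; // first task scheduled
            case RUNNING_RUNNING_TASKS: state = JobState.RUNNING_SUC_WAIT;      break; // all tasks completed
            case RUNNING_SUC_WAIT:      state = JobState.SUCCEEDED;             break; // cleanup task done
            default: /* SUCCEEDED, KILLED, FAILED are terminal */                break;
        }
    }

    // From any non-terminal state the client may kill the job, or it may fail.
    public void kill() { if (!isTerminal()) state = JobState.KILLED; }
    public void fail() { if (!isTerminal()) state = JobState.FAILED; }

    private boolean isTerminal() {
        return state == JobState.SUCCEEDED || state == JobState.KILLED || state == JobState.FAILED;
    }

    public JobState state() { return state; }
}
```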
Figure 2 shows the sequential processing of a task. When a task is being processed in Hadoop, it has two components: a TaskInProgress in the JobTracker and a TaskInProgress in the TaskTracker. When a job is initialized, many map/reduce tasks of the job are created; these tasks wait to be scheduled to the TaskTrackers for execution. Figure 2 illustrates how a task is processed:

1) The JobTracker creates a JobInProgress instance for each job, and the corresponding map/reduce tasks are created. At this time, the tasks are in the UNASSIGNED state.

2) Each TaskTracker sends a heartbeat to the JobTracker to request tasks. In response, the JobTracker allocates one or several tasks to the TaskTracker. This is done by the first heartbeat.

3) After receiving tasks, the TaskTracker creates a TaskTracker.TaskInProgress instance, runs an independent child thread to execute the task, and changes its task state to RUNNING.

4) Each TaskTracker reports the information of its task to the JobTracker, and the JobTracker changes the task state to RUNNING. This is done by the second heartbeat.

5) When the child thread completes, the TaskTracker changes the task state to COMMIT_PENDING.

6) The TaskTracker reports this information to the JobTracker again through a heartbeat. In response, the JobTracker changes the task state to COMMIT_PENDING to allow the TaskTracker to commit the task results.

7) The TaskTracker commits the task results and changes the task state to SUCCEEDED.

8) The TaskTracker reports success through a heartbeat, and the JobTracker changes the task state to SUCCEEDED. By this time, the task is completed.

Figure 2. The sequential process of a task

B. Hadoop MapReduce Job Setup/Cleanup

As shown in the job state transition, before the map/reduce tasks of a job are scheduled, a setup task must be scheduled first. In brief, this task is processed as follows:

1) Launch the job setup task: through a heartbeat, the JobTracker discovers a TaskTracker with a free map/reduce slot that can accept a new task, and the JobTracker schedules the setup task to this TaskTracker.

2) Complete the job setup task: the TaskTracker processes the task and then reports the task information back to the JobTracker.

These two steps take two heartbeats (at least 6 seconds, as the heartbeat interval is at least 3 seconds). Similarly, a cleanup task must be scheduled after all map/reduce tasks are completed and takes another two heartbeats, i.e., at least another 6 seconds. As a result, the setup and cleanup tasks take at least 12 seconds in total. For a short job that runs for only a couple of minutes, these two special tasks may account for around 10% or more of the total execution time. If we can cut down this fixed cost of 4 heartbeats for a short job, it will be a noticeable improvement in job execution performance. By taking a closer look at the implementation of the setup and cleanup tasks, we find that we can modify them to remove the cost of these 4 heartbeats. Section III discusses this optimization in more detail.
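The fixed-overhead argument above can be written out as a short calculation. The following Java sketch only illustrates the arithmetic of Section II.B (two heartbeats for setup plus two for cleanup, at a minimum interval of 3 seconds); the 120-second example job length is a hypothetical value chosen to match the "couple of minutes" case.

```java
// Back-of-the-envelope estimate of the fixed setup/cleanup overhead described in Section II.B.
public class SetupCleanupOverhead {

    static final double MIN_HEARTBEAT_INTERVAL_SEC = 3.0; // minimum heartbeat interval
    static final int SETUP_CLEANUP_HEARTBEATS = 4;        // 2 for setup + 2 for cleanup

    // Minimum time spent on setup + cleanup, independent of the job's real work.
    static double fixedOverheadSec() {
        return SETUP_CLEANUP_HEARTBEATS * MIN_HEARTBEAT_INTERVAL_SEC; // >= 12 s
    }

    // Fraction of the total job time lost to this fixed overhead.
    static double overheadFraction(double totalJobSec) {
        return fixedOverheadSec() / totalJobSec;
    }

    public static void main(String[] args) {
        // Hypothetical 2-minute short job: the fixed cost alone is about 10% of the run time.
        System.out.printf("overhead = %.0f s (%.0f%% of a 120 s job)%n",
                fixedOverheadSec(), 100 * overheadFraction(120));
    }
}
```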
C. Heartbeat Delay

From Figure 2 we can see that each TaskTracker periodically sends information to the JobTracker and performs pull-model task requests, to which the JobTracker responds. We refer to this as the pull-model heartbeat communication mechanism. With heartbeat communication, the TaskTrackers report node information to the JobTracker, and the JobTracker then issues control commands to the TaskTrackers. To control and manage a Hadoop cluster, an appropriate heartbeat period must be set. Currently, for a cluster with fewer than 100 nodes, the default heartbeat interval is 3 seconds; an additional 1 second is added per 100 extra nodes. To some extent, the pull-model heartbeat communication mechanism helps prevent the JobTracker from being overwhelmed. But it comes with a heavy time cost:

1) The JobTracker has to wait passively for the TaskTrackers to request tasks; as a result, there is a delay between submitting a job and scheduling it, caused by the heartbeat interval.

2) Important information (task success, commit, failure, etc.) cannot be reported immediately from the TaskTrackers to the JobTracker, which delays task scheduling, further increases the time cost of job execution, and decreases the utilization of computing resources.
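As an illustration of where these delays come from, here is a simplified Java sketch of the pull-model loop on the TaskTracker side. The interfaces and names (JobTrackerStub, PullModelTaskTracker) are hypothetical stand-ins rather than Hadoop's real RPC classes, and the interval rule follows the description above (3 seconds for fewer than 100 nodes, plus 1 second per additional 100 nodes).

```java
import java.util.List;

// Simplified pull-model heartbeat loop: the TaskTracker drives all communication,
// so new tasks can only be assigned when the next heartbeat happens to arrive.
public class PullModelTaskTracker {

    interface JobTrackerStub {
        // The JobTracker answers a heartbeat with the actions (e.g., "launch task X")
        // it wants this TaskTracker to perform.
        List<String> heartbeat(String trackerName, int freeMapSlots, int freeReduceSlots);
    }

    // Heartbeat interval as described in Section II.C: 3 s base, +1 s per extra 100 nodes.
    static long heartbeatIntervalMillis(int clusterSize) {
        int extraHundreds = Math.max(0, (clusterSize - 1) / 100); // 0 for clusters below 100 nodes
        return (3 + extraHundreds) * 1000L;
    }

    public static void run(JobTrackerStub jobTracker, String trackerName, int clusterSize)
            throws InterruptedException {
        while (true) {
            // 1. Report status and ask for work (the "pull").
            List<String> actions = jobTracker.heartbeat(trackerName, /*freeMapSlots=*/2, /*freeReduceSlots=*/2);

            // 2. Launch whatever the JobTracker handed back.
            for (String action : actions) {
                System.out.println(trackerName + " launching: " + action);
            }

            // 3. Sleep until the next heartbeat. A job submitted right after this call
            //    waits up to a full interval before any of its tasks can be assigned.
            Thread.sleep(heartbeatIntervalMillis(clusterSize));
        }
    }
}
```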

Merely decreasing the heartbeat interval cannot improve the resource utilization of a Hadoop cluster. In parts B and C of Section III, we propose a task assignment and communication mechanism between the JobTracker and the TaskTrackers that reduces the delay caused by the pull-model heartbeat communication mechanism.

III. OPTIMIZATION METHODS

A. Dismissal of the Job Setup and Cleanup Tasks

In the standard Hadoop, we observe that the job setup task simply creates a temporary directory for data output, and the job cleanup task deletes that directory. The actual time cost of these two tasks is very small. Thus, instead of sending messages to the TaskTrackers via heartbeats to launch the job setup/cleanup tasks, we can execute the job setup/cleanup work immediately in the JobTracker. That is, when the JobTracker initializes a job, the setup work of the job is executed immediately in the JobTracker; after all map/reduce tasks of the job are completed, the cleanup work of the job is likewise executed immediately in the JobTracker.

Figure 3 shows the modified state transition of a job after this optimization (for simplicity, we omit the kill and failure conditions). After the optimization, the PREP.SETUP and CLEANUP states are incorporated into the PREP.INITIALIZING and RUNNING.SUC_WAIT states, respectively. We implement a new version of Hadoop MapReduce based on the standard Hadoop framework to realize this proposal:

1) Add the methods setupJob() and cleanupJob() to JobInProgress (the object representing a job); setupJob() implements what runJobSetupTask() in the Task class does, and similarly cleanupJob() implements what runJobCleanup() in the Task class does.

2) Call setupJob() from JobInProgress.initTask() and then alter the state of the job to RUNNING.

3) Call cleanupJob() from JobInProgress.completedTask() once all map/reduce tasks have been completed.

Figure 3. Job state transition after optimization

B. Change the Task Assignment from Pull to Push

In the standard Hadoop, the JobTracker never actively communicates with the TaskTrackers, even after it has received a new job and created many map/reduce tasks. It has to wait for the TaskTrackers to issue requests via heartbeats, which delays task scheduling and further increases the time cost of job execution. As shown in Figure 4, we change the task assignment mechanism from the original pull model to a push model: after initializing a job, the JobTracker actively sends messages to the TaskTrackers to start task assignments.

Figure 4. Push-model task assignment and instant message communication mechanisms after optimization
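To make the changes in Sections III.A and III.B more concrete, the following is a schematic Java sketch of a JobTracker-side job object that runs the setup/cleanup work inline and then pushes assignments to the TaskTrackers. It is a simplified illustration using our own stand-in types (OptimizedJobInProgress, TaskTrackerStub), not the authors' actual patch to JobInProgress and Task; the real Hadoop methods involved (initTask(), completedTask(), runJobSetupTask()) are only referenced in the comments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Schematic sketch of the optimized flow: setup/cleanup run inline in the JobTracker
// (Section III.A) and task assignments are pushed to TaskTrackers (Section III.B).
// Stand-in types only; not Hadoop's real JobInProgress/JobTracker code.
public class OptimizedJobInProgress {

    interface TaskTrackerStub {
        void launchTask(String taskId); // push-model assignment (no waiting for a heartbeat)
    }

    private final Path tempOutputDir;
    private final List<TaskTrackerStub> trackers;

    OptimizedJobInProgress(Path tempOutputDir, List<TaskTrackerStub> trackers) {
        this.tempOutputDir = tempOutputDir;
        this.trackers = trackers;
    }

    // Analogous to calling setupJob() from initTask(): the temporary output directory
    // is created directly in the JobTracker, so no setup task and no extra heartbeats.
    void initTasksAndSetup(List<String> taskIds) throws IOException {
        Files.createDirectories(tempOutputDir); // the work the setup task used to do
        pushAssignments(taskIds);               // job goes straight to RUNNING
    }

    // Push model: the JobTracker actively hands tasks to TaskTrackers right after init.
    private void pushAssignments(List<String> taskIds) {
        for (int i = 0; i < taskIds.size(); i++) {
            trackers.get(i % trackers.size()).launchTask(taskIds.get(i));
        }
    }

    // Analogous to calling cleanupJob() from completedTask() once every task is done:
    // the temporary directory is removed in the JobTracker itself.
    void completedAllTasksAndCleanup() throws IOException {
        Files.deleteIfExists(tempOutputDir);    // the work the cleanup task used to do
    }
}
```

The cleanup here simply removes the temporary directory, mirroring the observation above that the real setup/cleanup tasks do very little work. The instant-message mechanism of Section III.C would, in the same spirit, let a TaskTracker report task completion over a direct call rather than waiting for its next heartbeat.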

C. Separate the Job/Task Control Messages from Heartbeats

The JobTracker and TaskTrackers communicate with each other through heartbeats. The content of a heartbeat includes information about the TaskTracker, task states, and so on. To improve communication performance, we separate the job/task control message communication from the heartbeats and provide an instant message communication mechanism, as shown in Figure 4. In this new mechanism, when important events such as task completion happen, the information is sent to the JobTracker immediately. For all job/task scheduling events we use instant message communication, while for cluster management events that are not performance-sensitive we still use the heartbeat communication mechanism. Figure 4 shows how the job/task scheduling messages are sent after the optimization. It is worth noting that, to achieve good scalability, the heartbeat interval can be increased when this optimization is used.

IV. PERFORMANCE EVALUATION

In this section, we conduct experiments to evaluate the performance of our optimized version of the Hadoop MapReduce framework compared to the standard Hadoop.

A. Environment Setup

We build an experimental Hadoop cluster with 19 nodes connected by Gigabit Ethernet. The hardware of the cluster is described in Table 1.

Table 1. Hardware information

We use the parallelized BLAST [16] that we developed as our test application. BLAST is a sequence alignment tool widely used by biology researchers. In our parallelized BLAST, we designed and implemented two types of BLAST algorithms: the map-side extension BLAST without any reduce processing, and the reduce-side extension BLAST with both map and reduce processing. The details of these two parallelized BLAST algorithms with MapReduce are given in [16]. We chose BLAST for our experiments because it represents a typical class of query applications. We downloaded the latest release of the FASTA-format sequence database nt (the nucleotide sequence database, with entries from all traditional divisions of GenBank, EMBL, and DDBJ, excluding bulk divisions) from NCBI as our test dataset. The original nt database was 32 GB in size, and around 16 GB after conversion to SequenceFile format.

B. Evaluation

During the execution of a job, we record the slot state of each TaskTracker every second. We run the BLAST job that processes the 16 GB dataset on the standard Hadoop 0.20 environment and on our optimized version of Hadoop MapReduce, respectively. The results are shown in Figure 5 and Figure 6.

Figure 5. Performance evaluation for the map-side extension BLAST: (a) standard Hadoop; (b) dismissal of the job setup/cleanup tasks; (c) push-model task assignment and instant message communication mechanisms

As shown in Figures 5(b) and 6(b), after applying the optimization that dismisses the setup/cleanup tasks, the setup and cleanup time costs are noticeably reduced. The total job execution time is shortened from about 47 seconds down to 34 seconds for the map-side extension BLAST, and from about 58 seconds down to 46 seconds for the reduce-side extension BLAST.

As shown in Figures 5(c) and 6(c), after further applying the optimization of the push-model task assignment and instant message communication mechanisms, the total job execution time is further shortened to 27 seconds for the map-side extension BLAST and to 40 seconds for the reduce-side extension BLAST. Comparing Figure 6(c) with Figure 6(b), it is obvious that, while the cluster is executing a job, the slots are used at a higher level of utilization in both the map phase and the reduce phase.

Figure 6. Performance evaluation for the reduce-side extension BLAST: (a) standard Hadoop; (b) dismissal of the job setup/cleanup tasks; (c) push-model task assignment and instant message communication mechanisms

In Figure 7, the horizontal axis represents queries with DNA sequences of different lengths, and the vertical axis the time cost. Compared to the standard Hadoop, our optimized Hadoop reduces the time cost by about 23% on average.

Figure 7. Time cost comparison of the standard and optimized Hadoop: (a) map-side extension BLAST; (b) reduce-side extension BLAST

V. CONCLUSIONS

Hadoop MapReduce has proven to be a successful model and framework for large-scale data processing. In this paper, we conducted research on performance optimization for Hadoop MapReduce job execution. The optimization is especially helpful for query or analysis-type short jobs. By optimizing the job initialization and termination stages, changing the task assignment from a heartbeat-based pull model to a push model, and providing an instant message communication mechanism instead of heartbeats, we achieved a 23% performance improvement on average for our test application. Our optimized version of Hadoop also preserves the full features of the standard Hadoop MapReduce framework without changing any of Hadoop's programming APIs. Further work needs to be done to perform stability tests of our optimized version of Hadoop and to run more tests with a variety of benchmark applications and datasets for further improvement.

REFERENCES

[1] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, 2008.
[2] MPI (Message Passing Interface). Available online.
[3] PVM (Parallel Virtual Machine). Available online.
[4] Apache Software Foundation, Hadoop. Available online.
[5] D. Gillick, A. Faria, and J. DeNero, "MapReduce: Distributed Computing for Machine Learning." Available online.
[6] Hadoop Fair Scheduler Guide. Available online.
[7] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce Performance in Heterogeneous Environments," 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 29-42, San Diego, California, December 2008.
[8] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, "Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling," Proceedings of the 5th European Conference on Computer Systems, 2010.
[9] S. Seo, I. Jang, K. Woo, I. Kim, J.-S. Kim, and S. Maeng, "HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment," IEEE International Conference on Cluster Computing and Workshops, 2009.
[10] S. Zhang, J. Han, Z. Liu, K. Wang, and S. Feng, "Accelerating MapReduce with Distributed Memory Cache," IEEE International Conference on Parallel and Distributed Systems.
[11] K. Morton, A. Friesen, M. Balazinska, and D. Grossman, "Estimating the Progress of MapReduce Pipelines," IEEE 26th International Conference on Data Engineering, 2010.
[12] S. Babu, "Towards Automatic Optimization of MapReduce Programs," Proceedings of the 1st ACM Symposium on Cloud Computing, 2010.
[13] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan, "Interpreting the Data: Parallel Analysis with Sawzall," Scientific Programming Journal, vol. 13, no. 4, Oct. 2005.
[14] Apache Software Foundation, Hive. Available online.
[15] I. S. Jacobs and C. P. Bean, "Fine Particles, Thin Films and Exchange Anisotropy," in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963.
[16] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for Long Sequence Alignment," International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), 2011.
