Research on Load Balancing in Task Allocation Process in Heterogeneous Hadoop Cluster


International Conference on Artificial Intelligence and Engineering Applications (AIEA 2017)

ZHIHAO TENG and ZHENGPING JIN

ABSTRACT

The original task scheduling algorithm of Hadoop cannot meet the performance requirements of heterogeneous clusters. It assumes that every node has the same performance, so it performs well in homogeneous clusters. In a heterogeneous Hadoop cluster, however, nodes differ in CPU, disk, and memory performance, and load imbalance occurs. To address this imbalance, this paper proposes a new algorithm, the Load Balancing Algorithm based on Heterogeneous Environment (LBAHE), which takes the performance differences between nodes into account. When measuring node performance, the number of slots is no longer the only criterion; CPU, disk, memory, and other factors are also considered. Experiments show that the new algorithm is effective and completes tasks faster in heterogeneous clusters than the original algorithm.

KEYWORDS

load balancing, heterogeneous cluster, task scheduling, Hadoop

INTRODUCTION

Hadoop is one of the most important processing frameworks for big data. Load balancing has always been an important factor affecting a Hadoop cluster's performance, and growing data volumes challenge the processing capabilities of Hadoop clusters [1]. An inappropriate load balancing strategy wastes computing resources, increases execution time, and can even lead to system downtime. An appropriate strategy, on the other hand, can satisfy user demand and make rational use of resources to complete tasks as soon as possible. The current load balancing strategy of Hadoop performs well in homogeneous clusters.
However, in heterogeneous environments, nodes differ in data processing capacity, disk usage, and file read frequency, so load imbalance occurs in Hadoop clusters. The current Hadoop load balancing strategy does not take these differences between nodes into account, which leads to unreasonable task allocation and, in turn, unreasonable use of resources.

Zhihao Teng, @163.com, Zhengping Jin, zhpjin@bupt.edu.cn, State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, , China

Moreover, when allocating tasks in Hadoop's MapReduce stage, the current strategy only considers whether a node has enough memory. CPU, disk, and I/O capabilities are not used as reference factors for a node's load capacity, which can lead to unreasonable task allocation. Therefore, Hadoop load balancing strategies for heterogeneous environments need further study [2].

In this paper, a novel Hadoop load balancing strategy is proposed for heterogeneous environments. The main research results are as follows:

1. We analyze the existing Hadoop load balancing strategy in depth and improve it for heterogeneous environments.
2. In an experimental evaluation, the same tasks are run under the new and the original load balancing strategies, and they complete in less time under the new strategy.

The second section introduces related work on load balancing. The third section describes the existing problems of load balancing in heterogeneous environments and the new load balancing strategy proposed in this paper. In the fourth section, we evaluate the efficiency of the strategy through experiments. The fifth section is the summary.

RELATED WORK

Load balancing has always been an important factor affecting the performance of Hadoop clusters. In this section, we introduce the development of Hadoop load balancing strategies in heterogeneous environments.

Matei Zaharia et al. [3] proposed the LATE scheduling algorithm to improve MapReduce performance in an unbalanced environment. Guo et al. [4] put forward a new resource scheduling algorithm in which tasks are assigned to nodes whose resources are not fully utilized.

Current studies of load balancing mainly include the following. Smriti R. Ramakrishnan et al. [5] studied the load balancing of Reduce. Venkata Swamy Martha et al. [6] proposed the h-MapReduce model, which evaluates overloaded Reduce nodes.
Yuanquan Fan et al. [7] proposed the LBVP algorithm, which ensures that each Reduce obtains roughly the same amount of data and thereby balances the load. In [8], load balancing and file response time in heterogeneous clusters are studied, but the heterogeneity of node capacity is neglected. In [9], a strategy of prorating data is proposed that takes the heterogeneity of nodes into account but ignores the influence of heterogeneous node storage space on data storage. In [10], a method for handling data overload in Hadoop is proposed that can balance the data load of each frame in a short time, but it does not consider the heterogeneity of nodes.

LOAD BALANCING STRATEGY

Hadoop consists of two core components: HDFS and MapReduce. MapReduce is a parallel computing framework that is mainly responsible for data processing. JobTracker is the master of MapReduce and is responsible for scheduling tasks, which are distributed to different TaskTracker nodes. TaskTracker is the MapReduce slave and is only responsible for executing the tasks assigned by the master. JobTracker assigns tasks to TaskTrackers according to a scheduling algorithm. The default scheduling algorithm is shown in Figure 1:

Figure 1. Task-obtaining process of the default Hadoop scheduling algorithm [1].

1. The TaskTracker counts the number of currently executing tasks (RunningTasks).
2. The TaskTracker determines whether the number of currently executing tasks is less than the fixed number of task slots. The fixed number of slots limits the number of tasks running simultaneously on a TaskTracker and is denoted FixedTasksCapacity.
3. AskForNewTask is a flag that indicates whether to obtain a new task. If RunningTasks is less than FixedTasksCapacity, the TaskTracker can accept new tasks and sets the flag to true; otherwise it sets the flag to false.

When the JobTracker allocates tasks, it first collects the number of nodes through the heartbeat mechanism. The execution of a MapReduce job consists of three main phases: Map, Shuffle, and Reduce. In the Map phase, the Partition function is called to assign tasks to Reduce:

PartitionNum = Hash(key) % ReduceNum    (3-1)

PartitionNum is the partition number and ReduceNum is the number of Reduce tasks. As formula (3-1) shows, the data is distributed evenly across the Reduce tasks. In a homogeneous environment, it is reasonable to call the Partition function to assign the results of the Map computation to Reduce equally. In heterogeneous clusters, however, the network bandwidth, CPU frequency, memory size, and disk read and write speed differ from node to node. As a result, node performance differs considerably and the load of each node changes dynamically, which may cause the following problems.

For the same task, the load on different nodes is different. Assume that nodes A and B have X and 2X memory, respectively, and that both are 50% used. The usage ratio is the same, but the remaining memory differs (0.5X versus X), so the number of new tasks each node can still accept also differs. Treating the two nodes alike results in an irrational allocation of tasks in heterogeneous clusters.
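The slot check of Figure 1, formula (3-1), and the memory example above can be sketched in a few lines of Python. This is only an illustrative sketch, not Hadoop's actual code: the function names are hypothetical, and `zlib.crc32` is an assumed stand-in for the key hash, chosen because it gives the same value on every run.

```python
import zlib


def ask_for_new_task(running_tasks: int, fixed_tasks_capacity: int) -> bool:
    """Default rule: request a task only while a slot is free."""
    return running_tasks < fixed_tasks_capacity


def partition_num(key: str, reduce_num: int) -> int:
    """Formula (3-1): PartitionNum = Hash(key) % ReduceNum.

    crc32 is an assumed hash function used here for illustration only.
    """
    return zlib.crc32(key.encode("utf-8")) % reduce_num


def remaining_memory(total: float, used_fraction: float) -> float:
    """Memory still free on a node, in the same unit as `total`."""
    return total * (1.0 - used_fraction)


if __name__ == "__main__":
    # A TaskTracker with 2 slots and 1 running task may ask for more work.
    print(ask_for_new_task(running_tasks=1, fixed_tasks_capacity=2))

    # Every key lands in one of the ReduceNum partitions.
    print(partition_num("example-key", reduce_num=4))

    # Nodes A and B with X = 4 and 2X = 8 units of memory, both 50% used:
    # the usage ratio is equal, but B has twice the free memory of A.
    print(remaining_memory(4.0, 0.5), remaining_memory(8.0, 0.5))
```

Note that the slot check looks only at slot counts, so in this sketch nodes A and B would be treated identically whenever they have the same number of free slots.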
If a TaskTracker is a high-performance computing node but cannot get more tasks, it is in a "hunger" situation. If a TaskTracker is a low-performance computing node and keeps getting new tasks that it cannot run, it is in a "saturation" situation, which results in overloading.

In addition, different tasks have different resource requirements while they run. Some tasks are computation-heavy, i.e., CPU-intensive, while others require a large number of I/O operations. As shown above, however, the default Hadoop scheduling algorithm assigns tasks based only on the number of task slots on each TaskTracker node, which can cause unreasonable assignment. The following situation may also occur: a node has enough slots to accept a new task but does not have enough CPU resources. If the new task needs CPU, assigning it aggravates CPU contention, which unbalances the node's load and slows down the overall job.

A new algorithm is proposed to address the problems of the default task allocation. The new algorithm takes the heterogeneity of the cluster into account, so that it can assign different numbers of tasks to different nodes. It also uses CPU, disk, memory, and other resources as evaluation criteria for a node's load capacity, which is more comprehensive than the default method. The algorithm steps are as follows:

1. The JobTracker accepts the job and starts the tasks.
2. Each TaskTracker detects the node's resource information (slot number, CPU, disk, memory, etc.) and the execution status of the node's tasks.
3. The detected information is passed to the JobTracker via the heartbeat mechanism.
4. Taking heterogeneity into account, the JobTracker decides whether to keep assigning new tasks to the TaskTracker according to the transmitted information, the actual load of each node, and its ability to run tasks.

When the new algorithm detects resource information, it not only tests whether there are enough slots but also examines the node more comprehensively: CPU, disk, memory, and so on. All of this information is combined to judge the load capacity of the node. This avoids load imbalance in the case where the number of slots is sufficient but other resources are not. When the node information is delivered to the JobTracker, task assignment is decided according to the information obtained and the actual situation of each node. This process takes into account the differences in performance and load capacity between nodes.
Therefore, nodes with different performance also have different overload thresholds, and the new algorithm judges whether a node is overloaded against the threshold for that node. In homogeneous clusters, the hash function distributes the work evenly across the slave nodes. In a heterogeneous environment, however, the performance of each node must be evaluated first.

For the CPU, because the number of cores differs between nodes, we first determine the number of cores of the node, then measure the CPU load, and combine the core count with the per-core load to obtain the CPU usage of the whole node. Different thresholds are set for nodes with different numbers of cores, and a node is considered overloaded when its threshold is exceeded. Similarly, for the disk, a single uniform standard cannot be used to judge different nodes. A larger threshold is set for a node with a larger disk and a smaller threshold for a node with a smaller disk, so that nodes with different disks are judged by different criteria.

Each slave node periodically reports a heartbeat to the master node through the RPC protocol; the heartbeat includes memory, CPU, disk, and other information. The master node assigns tasks to slave nodes according to the information of each node and a scheduling algorithm. After the master receives the node information from a slave, it examines the memory, CPU, disk, and other information of that node. If none of these items is overloaded, a new task is assigned to the node. If any item is overloaded, i.e., its threshold is exceeded, no new task is assigned to the node for the time being, but the master continues to monitor it. At the next heartbeat, if the load of the node has returned to its normal level and the node has enough resources to run a new task, it gets a new task.
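The heartbeat check described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper gives no concrete threshold values or data structures, so the `Heartbeat` and `Thresholds` classes, the averaging of per-core load into one CPU figure, and every number used below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Heartbeat:
    """Resource report sent by a slave node (hypothetical layout)."""
    node: str
    core_loads: List[float]  # utilisation of each CPU core, 0.0 to 1.0
    memory_used: float       # fraction of memory in use
    disk_used: float         # fraction of disk in use
    running_tasks: int
    slots: int


@dataclass
class Thresholds:
    """Per-node overload thresholds; chosen per node, not cluster-wide."""
    cpu: float
    memory: float
    disk: float


def cpu_usage(hb: Heartbeat) -> float:
    """Whole-node CPU usage, assumed here to be the mean per-core load."""
    return sum(hb.core_loads) / len(hb.core_loads)


def overloaded_items(hb: Heartbeat, limits: Thresholds) -> List[str]:
    """Names of the resources whose threshold is exceeded on this node."""
    items = []
    if cpu_usage(hb) > limits.cpu:
        items.append("cpu")
    if hb.memory_used > limits.memory:
        items.append("memory")
    if hb.disk_used > limits.disk:
        items.append("disk")
    return items


def should_assign(hb: Heartbeat, limits: Thresholds) -> bool:
    """Assign a new task only if a slot is free and nothing is overloaded."""
    return hb.running_tasks < hb.slots and not overloaded_items(hb, limits)


if __name__ == "__main__":
    limits = Thresholds(cpu=0.8, memory=0.8, disk=0.9)  # assumed values
    idle = Heartbeat("Slave1", [0.2, 0.4], 0.5, 0.3, running_tasks=1, slots=2)
    busy = Heartbeat("Slave3", [0.95, 0.9], 0.5, 0.3, running_tasks=1, slots=2)
    print(should_assign(idle, limits), should_assign(busy, limits))
```

In this sketch a node that is skipped is simply evaluated again when its next heartbeat arrives, which mirrors the "continue to monitor" behavior described in the text. Giving each node its own `Thresholds` instance is one way to express that a node with more cores or a larger disk is held to a different limit.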

EXPERIMENTAL EVALUATION

To verify the performance of the new algorithm, running time in a heterogeneous Hadoop system is used as the criterion. The running time of a task includes the response time and the actual running time. Response time is the interval from the submission of a task to the start of its execution; it reflects the system's ability to provide services. The shorter the execution time, the stronger the system's ability to handle the task. The shorter the total time, the more reasonable the dispatch of tasks and the better the load condition.

Experimental environment: the experiments require sufficient tasks and a heterogeneous cluster environment. Because an unbalanced cluster is required, a heterogeneous Hadoop cluster is built with virtual machines. Each VM is a Hadoop node, and the hardware of the nodes differs: three nodes have 2G of memory and two nodes have 1G of memory. One node is set as the JobTracker, and the other four nodes are set as TaskTrackers. All hard disk space is 20 GB. The virtual machines run CentOS with JDK version jdk-7u79-linux-x64 and Hadoop. The configuration of the cluster environment is shown in Table 1. Different shell scripts are run on the slave nodes to consume their limited resources.

In this environment, different tasks are executed under the different algorithms, and each task is executed many times. The following experimental results are obtained. Figure 2 shows the running time of the algorithms for different amounts of raw data. The "new" line represents the running time of the improved algorithm, and "original" is the running time of the original algorithm. As the figure shows, the two algorithms are run on data of 100M, 200M, 500M, 1024M, and 2048M, and the improved algorithm has the shorter running time.
This shows that, in the same heterogeneous environment, the new algorithm allocates tasks more reasonably, uses resources more rationally, and completes tasks more quickly. It demonstrates that the new algorithm is effective and has better load balancing capability.

Table 1. Hadoop cluster communication information.

Host name   Memory   Disk
Master      2G       20G
Slave1      2G       10G
Slave2      2G       20G
Slave3      1G       20G
Slave4      1G       10G

Figure 2. Experimental results (execution time in seconds versus size of task in M).

Figure 3. Experimental results of the original algorithm (task ratio/performance ratio versus size of task in MB).

Figure 4. Experimental results of the new algorithm (LBAHE) (task ratio/performance ratio versus size of task in MB).

The task ratio is the ratio of the tasks a node actually runs to the total number of tasks run throughout the job. The performance ratio is the weight of that node's performance within the performance of the whole cluster, which reflects the share of tasks the node should be allocated. The quotient of the two is used to measure the load status of the nodes in a cluster. In a reasonably loaded system, the ratio of the task ratio to the performance ratio should be close to 1. A value close to 1 means that a node of a given capability runs a corresponding number of tasks.

Figure 3 shows the experimental result of the original algorithm for two selected nodes. As the experimental data show, with the original algorithm the values are all clearly greater than 1 or less than 1, which indicates that the task assignment is unreasonable and the load is unbalanced. In the results of the new algorithm (LBAHE), the value fluctuates around 1, which shows that the task assignment is reasonable and the load is better balanced. The experimental results show that the new algorithm (LBAHE) is effective for load balancing in heterogeneous clusters.

SUMMARY

In this paper, we address the Hadoop load balancing problem in heterogeneous environments. First, we analyze the causes of load imbalance in an unbalanced cluster: the performance of the nodes is inconsistent, and the criterion used to judge load is too simple. To address these two problems, a new algorithm is proposed that adapts to Hadoop clusters in heterogeneous environments. The experiments show that the new algorithm performs well when the nodes are heterogeneous and their load conditions differ.

ACKNOWLEDGEMENTS

This work is supported by NSFC (Grant No ), and the Fundamental Research Funds for the Central Universities (Grant No.2015RC23).

REFERENCES

1. Apache. (2012, Aug.). Hadoop, The Apache Software Foundation, Forest Hill, MD, USA. [Online]. Available:
2. Xiaolong Xu, Lingling Cao, and Xinheng Wang, Adaptive Task Scheduling Strategy Based on Dynamic Workload Adjustment for Heterogeneous Hadoop Clusters. IEEE Systems Journal, 2016, 10(2):
3. Zaharia M., Konwinski A., Joseph A.D., et al. Improving MapReduce Performance in Heterogeneous Environments [C]//OSDI. 2008, 8(4):
4. Guo Z., Fox G. Improving MapReduce performance in heterogeneous network environments and resource utilization [C]//Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012). IEEE Computer Society, 2012:
5. Ramakrishnan S.R., Swart G., Urmanov A. Balancing Reducer skew in MapReduce workloads using progressive sampling [C]//Proceedings of the Third ACM Symposium on Cloud Computing. ACM, 2012:
6. Martha V.S., Zhao W., Xu X. h-MapReduce: A Framework for Workload Balancing in MapReduce [C]//Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on. IEEE, 2013:
7. Fan Y., Wu W., Cao H., et al. LBVP: A load balance algorithm based on Virtual Partition in Hadoop cluster [C]//Cloud Computing Congress (APCloudCC), 2012 IEEE Asia Pacific. IEEE, 2012:
8. Liu Kun, Niu Wenliang. An improved Hadoop data load balancing algorithm [J]. Journal of Henan Polytechnic University: Natural Science Edition, 2013, 32(3):
9. Xie J., Yin S., Ruan X., et al. Improving MapReduce performance through data placement in heterogeneous Hadoop clusters [C]//Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on. IEEE, 2010:
10. Liu Kun, Xiao Lin, Zhao Haiyan. Research and optimization of cloud data load balancing in Hadoop. Microelectronics & Computer, 2012, 29(9):


Available online at  ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 341 348 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Parallel Approach

More information

Big Data 7. Resource Management

Big Data 7. Resource Management Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

A New Model of Search Engine based on Cloud Computing

A New Model of Search Engine based on Cloud Computing A New Model of Search Engine based on Cloud Computing DING Jian-li 1,2, YANG Bo 1 1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China 2. Tianjin Key

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 203 208 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Tolhit A Scheduling

More information

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241

More information

An Intelligent Load Balancing Algorithm Towards Efficient Cloud Computing

An Intelligent Load Balancing Algorithm Towards Efficient Cloud Computing AI for Data Center Management and Cloud Computing: Papers from the 2011 AAAI Workshop (WS-11-08) An Intelligent Load Balancing Algorithm Towards Efficient Cloud Computing Yang Xu, Lei Wu, Liying Guo, Zheng

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 DOI:10.19026/ajfst.5.3106 ISSN: 2042-4868; e-issn: 2042-4876 2013 Maxwell Scientific Publication Corp. Submitted: May 29, 2013 Accepted:

More information

Implementation and performance test of cloud platform based on Hadoop

Implementation and performance test of cloud platform based on Hadoop IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Implementation and performance test of cloud platform based on Hadoop To cite this article: Jingxian Xu et al 2018 IOP Conf. Ser.:

More information

Batch Inherence of Map Reduce Framework

Batch Inherence of Map Reduce Framework Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

Dynamic Load-balance Scheduling Approach in Linux Based on. Real-time Modifying LVS Weight. YihuaLIAO1, a,min LIN2

Dynamic Load-balance Scheduling Approach in Linux Based on. Real-time Modifying LVS Weight. YihuaLIAO1, a,min LIN2 7th International Conference on Advanced Design and Manufacturing Engineering (ICADME 217) Dynamic Load-balance Scheduling Approach in Linux Based on Real-time Modifying LVS Weight YihuaLIAO1, a,min LIN2

More information

Lecture 11 Hadoop & Spark

Lecture 11 Hadoop & Spark Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem

More information

Dynamic Replication Management Scheme for Cloud Storage

Dynamic Replication Management Scheme for Cloud Storage Dynamic Replication Management Scheme for Cloud Storage May Phyo Thu, Khine Moe Nwe, Kyar Nyo Aye University of Computer Studies, Yangon mayphyothu.mpt1@gmail.com, khinemoenwe@ucsy.edu.mm, kyarnyoaye@gmail.com

More information

A Hybrid Architecture for Video Transmission

A Hybrid Architecture for Video Transmission 2017 Asia-Pacific Engineering and Technology Conference (APETC 2017) ISBN: 978-1-60595-443-1 A Hybrid Architecture for Video Transmission Qian Huang, Xiaoqi Wang, Xiaodan Du and Feng Ye ABSTRACT With the

More information

Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique

Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Prateek Dhawalia Sriram Kailasam D. Janakiram Distributed and Object Systems Lab Dept. of Comp.

More information

Improvement on PageRank Algorithm Based on User Influence

Improvement on PageRank Algorithm Based on User Influence Improvement on Algorithm Based on User Influence Yang Wang Basic Medical College ShaanXi University of Chinese Medicine Xianyang, Shanxi, China Abstract With the rapid development of the Internet, web

More information

2/26/2017. For instance, consider running Word Count across 20 splits

2/26/2017. For instance, consider running Word Count across 20 splits Based on the slides of prof. Pietro Michiardi Hadoop Internals https://github.com/michiard/disc-cloud-course/raw/master/hadoop/hadoop.pdf Job: execution of a MapReduce application across a data set Task:

More information

Preliminary Research on Distributed Cluster Monitoring of G/S Model

Preliminary Research on Distributed Cluster Monitoring of G/S Model Available online at www.sciencedirect.com Physics Procedia 25 (2012 ) 860 867 2012 International Conference on Solid State Devices and Materials Science Preliminary Research on Distributed Cluster Monitoring

More information

Study of Load Balancing Schemes over a Video on Demand System

Study of Load Balancing Schemes over a Video on Demand System Study of Load Balancing Schemes over a Video on Demand System Priyank Singhal Ashish Chhabria Nupur Bansal Nataasha Raul Research Scholar, Computer Department Abstract: Load balancing algorithms on Video

More information

A Method of Identifying the P2P File Sharing

A Method of Identifying the P2P File Sharing IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.11, November 2010 111 A Method of Identifying the P2P File Sharing Jian-Bo Chen Department of Information & Telecommunications

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

The Optimization and Improvement of MapReduce in Web Data Mining

The Optimization and Improvement of MapReduce in Web Data Mining Journal of Software Engineering and Applications, 2015, 8, 395-406 Published Online August 2015 in SciRes. http://www.scirp.org/journal/jsea http://dx.doi.org/10.4236/jsea.2015.88039 The Optimization and

More information

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors Proceedings of the World Congress on Engineering 2018 ol I A Routing Algorithm for 3 Network-on-Chip in Chip Multi-Processors Rui Ben, Fen Ge, intian Tong, Ning Wu, ing hang, and Fang hou Abstract communication

More information

Configuring a MapReduce Framework for Dynamic and Efficient Energy Adaptation

Configuring a MapReduce Framework for Dynamic and Efficient Energy Adaptation Configuring a MapReduce Framework for Dynamic and Efficient Energy Adaptation Jessica Hartog, Zacharia Fadika, Elif Dede, Madhusudhan Govindaraju Department of Computer Science, State University of New

More information

Delegated Access for Hadoop Clusters in the Cloud

Delegated Access for Hadoop Clusters in the Cloud Delegated Access for Hadoop Clusters in the Cloud David Nuñez, Isaac Agudo, and Javier Lopez Network, Information and Computer Security Laboratory (NICS Lab) Universidad de Málaga, Spain Email: dnunez@lcc.uma.es

More information

Data Prefetching for Scientific Workflow Based on Hadoop

Data Prefetching for Scientific Workflow Based on Hadoop Data Prefetching for Scientific Workflow Based on Hadoop Gaozhao Chen, Shaochun Wu, Rongrong Gu, Yongquan Xu, Lingyu Xu, Yunwen Ge, and Cuicui Song * Abstract. Data-intensive scientific workflow based

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

Big Data Processing: Improve Scheduling Environment in Hadoop Bhavik.B.Joshi

Big Data Processing: Improve Scheduling Environment in Hadoop Bhavik.B.Joshi IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online): 2321-0613 Big Data Processing: Improve Scheduling Environment in Hadoop Bhavik.B.Joshi Abstract

More information

Analyzing and Improving Load Balancing Algorithm of MooseFS

Analyzing and Improving Load Balancing Algorithm of MooseFS , pp. 169-176 http://dx.doi.org/10.14257/ijgdc.2014.7.4.16 Analyzing and Improving Load Balancing Algorithm of MooseFS Zhang Baojun 1, Pan Ruifang 1 and Ye Fujun 2 1. New Media Institute, Zhejiang University

More information

Design of an Optimal Data Placement Strategy in Hadoop Environment

Design of an Optimal Data Placement Strategy in Hadoop Environment Design of an Optimal Data Placement Strategy in Hadoop Environment Shah Dhairya Vipulkumar 1, Saket Swarndeep 2 1 PG Scholar, Computer Engineering, L.J.I.E.T., Gujarat, India 2 Assistant Professor, Dept.

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

An Optimization Algorithm of Selecting Initial Clustering Center in K means

An Optimization Algorithm of Selecting Initial Clustering Center in K means 2nd International Conference on Machinery, Electronics and Control Simulation (MECS 2017) An Optimization Algorithm of Selecting Initial Clustering Center in K means Tianhan Gao1, a, Xue Kong2, b,* 1 School

More information

Research and Realization of AP Clustering Algorithm Based on Cloud Computing Yue Qiang1, a *, Hu Zhongyu2, b, Lei Xinhua1, c, Li Xiaoming3, d

Research and Realization of AP Clustering Algorithm Based on Cloud Computing Yue Qiang1, a *, Hu Zhongyu2, b, Lei Xinhua1, c, Li Xiaoming3, d 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2016) Research and Realization of AP Clustering Algorithm Based on Cloud Computing Yue Qiang1, a *, Hu Zhongyu2, b,

More information

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Presented by: Dishant Mittal Authors: Juwei Shi, Yunjie Qiu, Umar Firooq Minhas, Lemei Jiao, Chen Wang, Berthold Reinwald and Fatma

More information

QADR with Energy Consumption for DIA in Cloud

QADR with Energy Consumption for DIA in Cloud Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Indexing Strategies of MapReduce for Information Retrieval in Big Data

Indexing Strategies of MapReduce for Information Retrieval in Big Data International Journal of Advances in Computer Science and Technology (IJACST), Vol.5, No.3, Pages : 01-06 (2016) Indexing Strategies of MapReduce for Information Retrieval in Big Data Mazen Farid, Rohaya

More information

Locality Aware Fair Scheduling for Hammr

Locality Aware Fair Scheduling for Hammr Locality Aware Fair Scheduling for Hammr Li Jin January 12, 2012 Abstract Hammr is a distributed execution engine for data parallel applications modeled after Dryad. In this report, we present a locality

More information

An Improved KNN Classification Algorithm based on Sampling

An Improved KNN Classification Algorithm based on Sampling International Conference on Advances in Materials, Machinery, Electrical Engineering (AMMEE 017) An Improved KNN Classification Algorithm based on Sampling Zhiwei Cheng1, a, Caisen Chen1, b, Xuehuan Qiu1,

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

SQL Query Optimization on Cross Nodes for Distributed System

SQL Query Optimization on Cross Nodes for Distributed System 2016 International Conference on Power, Energy Engineering and Management (PEEM 2016) ISBN: 978-1-60595-324-3 SQL Query Optimization on Cross Nodes for Distributed System Feng ZHAO 1, Qiao SUN 1, Yan-bin

More information

Correlation based File Prefetching Approach for Hadoop

Correlation based File Prefetching Approach for Hadoop IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie

More information

DISTRIBUTED VIRTUAL CLUSTER MANAGEMENT SYSTEM

DISTRIBUTED VIRTUAL CLUSTER MANAGEMENT SYSTEM DISTRIBUTED VIRTUAL CLUSTER MANAGEMENT SYSTEM V.V. Korkhov 1,a, S.S. Kobyshev 1, A.B. Degtyarev 1, A. Cubahiro 2, L. Gaspary 3, X. Wang 4, Z. Wu 4 1 Saint Petersburg State University, 7/9 Universitetskaya

More information

Towards Makespan Minimization Task Allocation in Data Centers

Towards Makespan Minimization Task Allocation in Data Centers Towards Makespan Minimization Task Allocation in Data Centers Kangkang Li, Ziqi Wan, Jie Wu, and Adam Blaisse Department of Computer and Information Sciences Temple University Philadelphia, Pennsylvania,

More information

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT. Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem

More information

Energy efficient optimization method for green data center based on cloud computing

Energy efficient optimization method for green data center based on cloud computing 4th ational Conference on Electrical, Electronics and Computer Engineering (CEECE 2015) Energy efficient optimization method for green data center based on cloud computing Runze WU1, a, Wenwei CHE1, b,

More information

An Improved Weighted Least Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems

An Improved Weighted Least Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems An Improved Weighted Least Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems Gurasis Singh 1, Kamalpreet Kaur 2 1Assistant professor,department of Computer Science, Guru Nanak Dev

More information

PaaS and Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University

PaaS and Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University PaaS and Hadoop Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University laiping@tju.edu.cn 1 Outline PaaS Hadoop: HDFS and Mapreduce YARN Single-Processor Scheduling Hadoop Scheduling

More information

K-MEANS METHOD FOR GROUPING IN HYBRID MAPREDUCE CLUSTERS

K-MEANS METHOD FOR GROUPING IN HYBRID MAPREDUCE CLUSTERS K-MEANS METHOD FOR GROUPING IN HYBRID MAPREDUCE CLUSTERS 1 YANG YANG, XIANG LONG, BO JIANG, YU LIU, 1 School of Computer Science and Engineering, Beihang University, Haidian 100191, Being, China :Corresponding

More information

Performance Optimization for Short MapReduce Job Execution in Hadoop

Performance Optimization for Short MapReduce Job Execution in Hadoop 2012 Second International Conference on Cloud and Green Computing Performance Optimization for Short MapReduce Job Execution in Hadoop Jinshuang Yan, Xiaoliang Yang, Rong Gu, Chunfeng Yuan, and Yihua Huang

More information

Construction Scheme for Cloud Platform of NSFC Information System

Construction Scheme for Cloud Platform of NSFC Information System , pp.200-204 http://dx.doi.org/10.14257/astl.2016.138.40 Construction Scheme for Cloud Platform of NSFC Information System Jianjun Li 1, Jin Wang 1, Yuhui Zheng 2 1 Information Center, National Natural

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

Mixing and matching virtual and physical HPC clusters. Paolo Anedda

Mixing and matching virtual and physical HPC clusters. Paolo Anedda Mixing and matching virtual and physical HPC clusters Paolo Anedda paolo.anedda@crs4.it HPC 2010 - Cetraro 22/06/2010 1 Outline Introduction Scalability Issues System architecture Conclusions & Future

More information

LITERATURE SURVEY (BIG DATA ANALYTICS)!

LITERATURE SURVEY (BIG DATA ANALYTICS)! LITERATURE SURVEY (BIG DATA ANALYTICS) Applications frequently require more resources than are available on an inexpensive machine. Many organizations find themselves with business processes that no longer

More information

Automated Control for Elastic Storage Harold Lim, Shivnath Babu, Jeff Chase Duke University

Automated Control for Elastic Storage Harold Lim, Shivnath Babu, Jeff Chase Duke University D u k e S y s t e m s Automated Control for Elastic Storage Harold Lim, Shivnath Babu, Jeff Chase Duke University Motivation We address challenges for controlling elastic applications, specifically storage.

More information

Jumbo: Beyond MapReduce for Workload Balancing

Jumbo: Beyond MapReduce for Workload Balancing Jumbo: Beyond Reduce for Workload Balancing Sven Groot Supervised by Masaru Kitsuregawa Institute of Industrial Science, The University of Tokyo 4-6-1 Komaba Meguro-ku, Tokyo 153-8505, Japan sgroot@tkl.iis.u-tokyo.ac.jp

More information

ADVANCES in NATURAL and APPLIED SCIENCES

ADVANCES in NATURAL and APPLIED SCIENCES ADVANCES in NATURAL and APPLIED SCIENCES ISSN: 1995-0772 Published BY AENSI Publication EISSN: 1998-1090 http://www.aensiweb.com/anas 2016 May 10(5): pages 166-171 Open Access Journal A Cluster Based Self

More information

Cloud Computing. Hwajung Lee. Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University

Cloud Computing. Hwajung Lee. Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University Cloud Computing Hwajung Lee Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University Cloud Computing Cloud Introduction Cloud Service Model Big Data Hadoop MapReduce HDFS (Hadoop Distributed

More information

Parallelization of K-Means Clustering Algorithm for Data Mining

Parallelization of K-Means Clustering Algorithm for Data Mining Parallelization of K-Means Clustering Algorithm for Data Mining Hao JIANG a, Liyan YU b College of Computer Science and Engineering, Southeast University, Nanjing, China a hjiang@seu.edu.cn, b yly.sunshine@qq.com

More information