An Approach to Enhance the Performance of Hadoop MapReduce Framework for Big Data


2016 International Conference on Micro-Electronics and Telecommunication Engineering

Subhash Chandra, Department of Computer Science, ITM University, Gwalior, India
Deepak Motwani, Department of Computer Science, ITM University, Gwalior, India

Abstract

Data analysis is becoming one of the most active research topics. Information is the foundation of every organization, small or large. Every organization wants relevant information to help its business grow, and wants to know what its customers like and dislike. Obtaining this information requires analyzing very large volumes of data stored in various places and in different formats. The Hadoop MapReduce framework is becoming a popular platform for processing such large amounts of data efficiently, and organizations use it to process their customer data sets. Hadoop processes datasets in distributed, parallel processes using HDFS and the MapReduce model. Hadoop optimization requires more attention from researchers and programmers. Many approaches have already been developed to optimize the Hadoop framework, including performance tuning and efficient cluster formation. In this research work we have developed an optimal approach to improve the performance of the Hadoop framework. K-Means and K-Medoids are well-known clustering approaches used inside Hadoop. In the proposed approach, a modified K-Medoids clustering algorithm has been developed that gives better results for processing inside Hadoop. The work is tested in a multi-node Hadoop environment.

Keywords: K-means clustering; MapReduce; Hadoop; HDFS

I. INTRODUCTION

Hadoop is an open source framework for processing and analyzing big data with the help of HDFS and MapReduce. Apache Hadoop is designed to process not only structured datasets but also unstructured datasets.
Storage capacity has advanced to the point where large amounts of data can be stored inexpensively, so storing large data is no longer a big problem. This works to Hadoop's advantage in processing large data: first, anyone can buy storage at a reasonable price, and second, Hadoop is open source and costs nothing to run. This combination has made Hadoop popular for data processing. Hadoop is used not only for research but also, with modifications, commercially.

The Hadoop Distributed File System (HDFS) is designed for storing large amounts of data. When Hadoop processes a large data set by dividing it into smaller parts, data loss becomes a real possibility, but HDFS provides reliable storage so that data loss can be avoided during processing. HDFS is a distributed file system that can take on additional nodes simply by adding them to the network when required, which makes it scalable in terms of data storage.

MapReduce is a programming model for processing large data sets stored in HDFS using distributed, parallel processes. Storing a large amount of data is not the hard part; processing it and getting the desired result requires more effort. MapReduce is the model Hadoop uses to solve this problem. It divides a large data set into smaller parts and processes those parts in parallel.

Hadoop can store, process and analyze not only terabytes but petabytes of data. It is designed to scale efficiently: Hadoop does not mind datasets growing larger and larger, because more resources and processing capability can be added by adding more processing nodes and storage. Hadoop is a powerful tool for efficiently processing large amounts of data by connecting multiple computers to each other.
Hadoop MapReduce is a Java program written as two distinct tasks, map and reduce. Hadoop splits a dataset into blocks, processes these smaller datasets, and merges the results before returning them. The input is a set of key-value pairs of the form <key, value>. Map takes these pairs, processes them according to the given logic and sends its output to the reduce step. Reduce uses the output of map as its input and combines it into the final output. Hadoop storage is divided into two components, the Name Node and the Data Node. Hadoop keeps watch on all execution with the help of the Task Tracker and the Job Tracker.

Hadoop can run on a single node or on multiple nodes. A single-node setup cannot process large data sets efficiently and is suitable only for research purposes, while a multi-node setup can add more nodes as the datasets grow. A single-node Hadoop cluster includes one Name Node and one Data Node, while a multi-node Hadoop cluster includes one Name Node and more than one Data Node. Hadoop clusters are designed to tolerate failure by duplicating each piece of data on other cluster nodes. Hadoop can increase its capacity simply by adding cluster nodes.

II. HADOOP: MAPREDUCE FRAMEWORK

The two main modules of Hadoop are HDFS and MapReduce. The Hadoop Distributed File System (HDFS) is designed for reliable data storage and can span a large cluster of computing nodes. MapReduce is a framework designed to execute applications that process large amounts of both structured and unstructured data by dividing it into smaller datasets in a distributed, parallel processing environment on a cluster of machines. Hadoop MapReduce processes datasets in a fault-tolerant and reliable way.

Three major components play important roles in a Hadoop framework: the Name Node, the Data Node and the client machines.

The Name Node runs on the master node and stores the file system metadata, such as the record of which files and blocks are stored on which Data Node. At startup, the Name Node reads the HDFS state from the fsimage file. The Name Node uses a daemon, the Job Tracker, to keep watch on job status. The Secondary Name Node is connected to the Name Node and keeps a snapshot of the file system metadata in local storage.

The Data Node works as the slave. It is the repository for the actual data during processing. The Data Node sends a signal to the Name Node at regular intervals to indicate its presence in the Hadoop system. Data Nodes communicate with each other to keep the replication level high and to balance the data by moving copies around. The Data Node serves all client read and write requests. A daemon known as the Task Tracker is deployed on the Data Node to execute the individual tasks allocated by the Job Tracker.

HDFS client machines have Hadoop installed with all the Hadoop settings, but they work as neither master nor slave. The role of the client machine is to store data in the Hadoop cluster, submit MapReduce jobs describing how the data should be processed, and retrieve the results once a job has finished. It can read, write and delete files, and perform create and delete operations through the Name Node.

MapReduce is the heart of Hadoop.
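As a rough illustration of the map and reduce steps described above, the following sketch simulates a job in a single Python process. It does not use the Hadoop API: the word-count task and the names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are assumptions chosen for illustration, and each input string stands in for one block of a split dataset.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all values emitted by the mappers under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge the values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map on every record, group by key, then reduce each group."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string plays the role of one block of the input dataset.
    blocks = ["hadoop stores data", "hadoop processes data"]
    print(run_job(blocks))
```

In a real cluster the map calls would run on different Data Nodes and the grouping step would be carried out by the framework; the sketch only shows how key-value pairs flow from map to reduce.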
Hadoop performs two distinct tasks: the first is the map job and the second is the reduce job. The map function takes the input data and divides it into multiple sets of data, each broken down into tuples in the form of key-value pairs. The reduce function takes the output of the map as its input and merges that output data. The map job is always performed before the reduce job.

III. RELATED WORK

Hadoop MapReduce is designed as a flexible solution for big data processing, with parameters that can be adjusted to suit requirements. The performance of the Hadoop framework depends on these parameters, and their configuration is a topic of research. To get results in time, a good combination of MapReduce parameters must be known. The parameters must be managed effectively for every task and scheduled for maximum performance [5].

Hadoop is developed for processing large data sets on a large number of nodes organized as clusters. Hadoop applications can differ in resources, dataset size and other constraints. They encounter several problems, such as ineffective CPU and memory utilization. The Hadoop configuration parameters need to be updated according to many conditions, such as the resource requirements of a particular application. Small changes to Hadoop configuration parameters can make a huge difference to performance for the same application with the same data set [6].

In Hadoop MapReduce, big data is processed in a distributed, parallel programming model. HK-Medoids is the K-Medoids algorithm implemented in the Hadoop MapReduce framework. Every scheduled job follows strict steps in MapReduce: a map phase, a combine phase and a reduce phase are required by any application.
Each input data sample is allocated to one cluster in the map phase. The center of every cluster is calculated in the combine phase, and re-calculated in the reduce phase. These phases are repeated until the new center and the old center no longer differ [9].

Data processing and analysis are very complex jobs in big data, and Hadoop MapReduce offers an efficient solution for big data analysis. Its results depend primarily on parameter selection and tuning. Tuning Hadoop MapReduce is an efficient way to improve job completion with respect to time and disk utilization. Performance tuning takes into account network traffic, memory usage, CPU usage and many other parameters. Several performance tuning methods have been derived to obtain an optimum result [14].

Data analysis in data mining depends significantly on clustering. In cluster formation, a data set is divided into smaller datasets. Clustering has been applied to dissimilar data types, and is used to classify datasets that have no predefined categories. Pattern recognition, image processing, text mining and many other areas have been analyzed using clustering approaches. Several clustering algorithms have been proposed, and data clustering is a well-defined area of research. It includes widely used approaches such as K-means, although K-means does not provide good results in many cases [12]. K-Medoids has advantages over K-means.

Hadoop is increasingly being used in various industries and can be beneficial for organizations that consolidate and analyze data [11]. Hadoop includes HDFS as its storage system and MapReduce for processing. Hadoop MapReduce is used for analyzing large datasets on multiple nodes. The Hadoop framework is divided into one master node and many slave nodes.
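The map, combine and reduce phases of the HK-Medoids iteration described earlier in this section might be sketched as follows. This is a single-process approximation under stated assumptions, not the implementation from the cited work: Euclidean distance, the way the combine phase proposes one local medoid per split, and all function names (`map_assign`, `combine`, `reduce_centers`, `hk_medoids`) are hypothetical choices made for illustration.

```python
import math


def map_assign(split, medoids):
    """Map: emit (cluster index, point) for every point in one split."""
    return [
        (min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i])), p)
        for p in split
    ]


def best_medoid(candidates, members):
    """Pick the candidate with the smallest summed distance to the members."""
    return min(candidates, key=lambda c: sum(math.dist(c, m) for m in members))


def combine(pairs):
    """Combine: per split, group points by cluster and pick a local center."""
    groups = {}
    for idx, point in pairs:
        groups.setdefault(idx, []).append(point)
    return {idx: (best_medoid(pts, pts), pts) for idx, pts in groups.items()}


def reduce_centers(combined, medoids):
    """Reduce: re-calculate each center from the local candidates."""
    new_medoids = list(medoids)
    for idx in range(len(medoids)):
        candidates = [c[idx][0] for c in combined if idx in c]
        members = [p for c in combined if idx in c for p in c[idx][1]]
        if members:
            new_medoids[idx] = best_medoid(candidates, members)
    return new_medoids


def hk_medoids(splits, medoids, max_iter=20):
    """Repeat the three phases until the centers stop changing."""
    for _ in range(max_iter):
        combined = [combine(map_assign(split, medoids)) for split in splits]
        new_medoids = reduce_centers(combined, medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids
```

Each element of `splits` stands in for the portion of the dataset handled by one mapper, so the combine step only ever sees local data, while the reduce step compares the local candidates against all members of a cluster.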
In many engineering and science domains, big data analysis is an emerging area. Finding useful data by analyzing a huge data set is challenging, and a large dataset needs a longer time to process than a smaller one [15].

IV. PROPOSED APPROACH

K-means and K-medoids are widely known clustering approaches in academic and scientific research. Their main purpose is to divide data into partitions: a clustering algorithm produces clusters, which are smaller segments of the dataset containing similar items. In practice, K-means does not perform well in many scenarios. For data partitions in many cases, such as Absolute Pearson, K-means does not perform well. In K-means the initial centroids are selected first, and they are picked at random. Because of this randomness the approach is not suited to producing an optimum result and can lead to ineffective, low-quality output. K-means can also be expensive and waste time as the number of clusters, iterations and data items increases. Many improved versions of K-means have been proposed, but they sometimes fall short. The Hadoop framework has further scope for improvement, and research is ongoing to find an effective strategy for selecting initial centroids. We have developed an improved version of the K-Medoids algorithm. Our approach performs better than the existing K-means algorithm in the Hadoop MapReduce framework.

V. EXISTING ALGORITHM

Input:
D = {d1, d2, ..., dn} // set of n data items
K // number of desired clusters
Output: a set of K clusters.
Steps:
1. Initialize: randomly select K of the n data points as the medoids.
2. Associate each data point with the closest medoid.
3. For each medoid m:
   a) For each non-medoid data point o:
   b) Swap m and o and compute the total cost of the configuration.
4. Select the configuration with the lowest cost.
Repeat steps 2 to 4 until there is no change in the medoids.

VI. PROPOSED ALGORITHM

Input:
D = {d1, d2, ..., dn} // set of n data items
K // number of desired clusters
Output: a set of K clusters.
Steps:
1. Calculate the initial centroids.
2. Set up the clusters with those centroids.
3. Initially assign each data point to a cluster.
4. Calculate the mean value of the distance of all data points of that cluster.
5. Define new centroids with the mean value.
6. Update the centroid values.
7. Repeat steps 3 to 6 until all data points are assigned to one of the clusters.
8. Initialize: randomly select K of the n data points as the medoids.
9. Associate each data point with the closest medoid.
10. For each medoid m:
   a) For each non-medoid data point o:
   b) Swap m and o and compute the total cost of the configuration.
11. Select the configuration with the lowest cost.
12. Repeat steps 9 to 11 until there is no change in the medoids.

VII. EXPERIMENTAL SETUP AND RESULT

We have developed an optimal approach to improve the performance of the Hadoop MapReduce framework for big data analysis, based on a modified clustering algorithm. For the implementation we selected a multi-node Hadoop installation on the Ubuntu operating system. Four machines were configured for the Hadoop MapReduce framework: one as the master node and the others as slave nodes. We measured the execution time for various sample datasets on the Hadoop MapReduce framework with both the existing approach and the proposed approach. Our approach outperforms the existing approaches.

Table 1 gives the results of the K-means approach and the proposed approach for Sample 1, Sample 2, Sample 3, Sample 4 and Sample 5, all run on the Hadoop multi-node framework. The proposed approach gives better results than the K-means approach because its execution time is lower, which shows the performance improvement of the proposed approach.

TABLE 1. HADOOP MULTI NODE MAPREDUCE
Columns: Datasets; K-means approach; Proposed approach. Rows: Sample 1 to Sample 5. [table values not recoverable]

Fig. 1(a) shows the comparison between the K-means approach and the proposed approach. The graph shows the execution time for Sample 1 to Sample 5: the execution time of the proposed approach is lower than that of the K-means approach, so the proposed approach performs better.
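For reference, the existing K-medoids procedure listed in Section V can be sketched on a single machine as below. This is only an illustration of the swap-based search, assuming Euclidean distance and distinct points stored as tuples; the names `total_cost`, `assign` and `k_medoids` and the `seed` and `max_iter` parameters are hypothetical, and the sketch covers neither the MapReduce execution nor the additional steps of the proposed algorithm.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def assign(points, medoids):
    """Step 2: associate each data point with the closest medoid."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return clusters


def k_medoids(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # step 1: random initial medoids
    for _ in range(max_iter):
        best, best_cost = medoids, total_cost(points, medoids)
        for i in range(len(medoids)):  # step 3: for each medoid m
            for o in points:  # a) for each non-medoid point o
                if o in medoids:
                    continue
                # b) swap m and o and compute the cost of the configuration
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate)
                if cost < best_cost:  # step 4: keep the lowest cost
                    best, best_cost = candidate, cost
        if best == medoids:  # stop when the medoids no longer change
            break
        medoids = best
    return assign(points, medoids)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for medoid, members in k_medoids(data, k=2).items():
        print(medoid, members)
```

Every pass evaluates each (medoid, non-medoid) swap against the whole dataset, which is why the cost grows quickly with the number of clusters and data items and why distributing the work is attractive.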

Fig. 1(a) Hadoop MapReduce Framework Comparison Result

Fig. 1(b) shows the results for the different data samples. The proposed approach improves performance because its execution time is lower than that of the K-means approach, as the graph comparing the two approaches shows.

Fig. 1(b) Hadoop MapReduce Framework Comparison Result

VIII. CONCLUSION AND FUTURE WORK

Hadoop is a widely known framework for the analysis of large datasets. It owes its performance to its ability to analyze datasets in a parallel, distributed environment. Hadoop is an open source framework whose modules are the Hadoop Distributed File System (HDFS) and MapReduce: HDFS is responsible for data storage, while MapReduce is responsible for data processing. Huge data sets such as web logs can be processed for analysis by Hadoop. In this research work we have developed an optimal approach to improve the performance of the Hadoop MapReduce framework for big data. We have proposed an improved K-Medoids algorithm, which performs better than the K-means clustering approach. We have tested sample datasets and our approach outperforms the existing approaches. In future, we will test our work against different types of data sets, such as image file datasets and small file datasets, with varying configuration parameters for the Hadoop MapReduce framework.

REFERENCES

[1] Parth Gohil, Bakul Panchal, J. S. Dhobi, "A Novel Approach to Improve the Performance of Hadoop in Handling of Small Files," in IEEE International Conference on Electrical, Computer and Communication Technologies.
[2] Dweepna Garg, Parth Gohil, Khushboo Trivedi, "Modified Fuzzy K-mean Clustering using MapReduce in Hadoop and Cloud," in IEEE International Conference on Electrical, Computer and Communication Technologies.
[3] E. Raju, M. A. Hameed & K. Sravanthi, "Detecting Communities in Social Networks using Unnormalized Spectral Clustering incorporated with Bisecting k Means," in IEEE International Conference on Electrical, Computer and Communication Technologies, 2015.
[4] Manish Kumar Sharma & Mahesh M. Bundele, "Design & Analysis of K-means Algorithm for Cognitive Fatigue Detection in Vehicular Driver using Respiration Signal," in IEEE International Conference on Electrical, Computer and Communication Technologies.
[5] Pramod Bide & Rajashree Shedge, "Improved Document Clustering using K-means Algorithm," in IEEE International Conference on Electrical, Computer and Communication Technologies.
[6] Rong Gu, Xiaoliang Yang, Jinshuang Yan, Yuanhao Sun, Bing Wang, Chunfeng Yuan, Yihua Huang, "Hadoop: Improving Map Reduce performance by optimizing job execution mechanism in Hadoop clusters," in Journal of Parallel and Distributed Computing, Volume 74, Issue 3, March 2014.
[7] Garvit Bansal, Anshul Gupta, Utkarsh Pyne, Manish Singhal and Subhasis Banerjee, "A Framework for Performance Analysis and Tuning in Hadoop Based Clusters," Workshop on Smarter Planet and Big Data Analytics (SPBDA 2014), held in conjunction with ICDCN.
[8] Ji Wentian, Guo Qingju, Zhong Sheng, "Improved K-medoids Clustering Algorithm under Semantic Web," in Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013).
[9] Vasiliki Kalavri, Vladimir Vlassov, "Map Reduce: Limitations, Optimizations and Open Issues," in 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), DOI: /TrustCom , July.
[10] Swathi Prabhu, Anisha P Rodrigues, Guru Prasad M S, Nagesh H R, "Performance Enhancement of Hadoop MapReduce Framework for Analyzing Big Data," in IEEE International Conference on Electrical, Computer and Communication Technologies, 5-7 March.
[11] Joseph A. Issa, "Performance Evaluation and Estimation Model Using Regression Method for Hadoop Word Count," in IEEE Access, Volume 3, 18 December 2015.
[12] Subhashree, "Comparison of k-means and k-medoids Clustering Algorithms for Big Data Using Map Reduce Techniques," IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2, Issue 4, April.
[13] Gopi Gandhi, Rohit Srivastava, "Analysis and Implementation of Modified K-Medoids Algorithm to Increase Scalability and Efficiency for Large dataset," in International Journal of Research in Engineering and Technology, Volume 03, Issue 06, Jun-2014.
[14] Yaobin Jiang, Jiongmin Zhang, "Parallel K-Medoids Clustering Algorithm Based on Hadoop," in 5th IEEE International Conference on Software Engineering and Service Science (ICSESS), June 2014.
[15] Juwei Shi, Jia Zou, Jiaheng Lu, Zhao Cao, Shining Li and Chen Wang, "MRTuner: A Toolkit to Enable Holistic Optimization for Map Reduce Jobs," in Proceedings of the VLDB Endowment, Vol. 7, No
[16] Dili Wu, "A Self Tuning System Based on Application profiling and performance analysis for Optimizing Hadoop Map Reduce Cluster Configuration," IEEE.
[17] J. Dean and S. Ghemawat, "Map Reduce: A Flexible Data Processing Tool," CACM, 53(1):72-77,

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Volume 3, Issue 11, November 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 11, November 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 11, November 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark PL.Marichamy 1, M.Phil Research Scholar, Department of Computer Application, Alagappa University, Karaikudi,

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

Research Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland

Research Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland Research Works to Cope with Big Data Volume and Variety Jiaheng Lu University of Helsinki, Finland Big Data: 4Vs Photo downloaded from: https://blog.infodiagram.com/2014/04/visualizing-big-data-concepts-strong.html

More information

ABSTRACT I. INTRODUCTION

ABSTRACT I. INTRODUCTION International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve

More information

Mounica B, Aditya Srivastava, Md. Faisal Alam

Mounica B, Aditya Srivastava, Md. Faisal Alam International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 3 ISSN : 2456-3307 Clustering of large datasets using Hadoop Ecosystem

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

Survey Paper on Traditional Hadoop and Pipelined Map Reduce International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,

More information

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation 2016 IJSRSET Volume 2 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

More information

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang International Journal of Innovative Computing, Information and Control ICIC International c 2017 ISSN 1349-4198 Volume 13, Number 3, June 2017 pp. 1037 1046 AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA

More information

CLIENT DATA NODE NAME NODE

CLIENT DATA NODE NAME NODE Volume 6, Issue 12, December 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Efficiency

More information

Improved MapReduce k-means Clustering Algorithm with Combiner

Improved MapReduce k-means Clustering Algorithm with Combiner 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering

More information

A REVIEW PAPER ON BIG DATA ANALYTICS

A REVIEW PAPER ON BIG DATA ANALYTICS A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,

More information

Online Bill Processing System for Public Sectors in Big Data

Online Bill Processing System for Public Sectors in Big Data IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer

More information

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters. Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1, Shengmei Luo 1, Tao Wen 2

Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1, Shengmei Luo 1, Tao Wen 2 International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1,

More information

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information
