A SURVEY ON BIG DATA TECHNIQUES

K. Anusha¹, K. UshaRani², C. Lakshmi³

¹ Research Scholar; ²˒³ Professor, Department of Computer Science, Sri Padmavathi Mahila Visvavidyalayam, Tirupati

ABSTRACT: Big Data is the term for data sets so large and complex that they become difficult to process with traditional data management tools. Big Data calls for novel techniques and methods to capture, store, distribute, manage and analyze petabyte-scale datasets that arrive at high velocity and in varied structures. Data from sensors, electronic devices such as mobile phones, social websites, scientific instruments and enterprises is contributing to the sudden increase in data. These large collections of data, generally called Big Data, have become one of the leading research trends today. Big Data is data whose size, variety and complexity require new architectures, techniques, algorithms and analytics to manage it and to extract value and hidden knowledge from it. The Map Reduce framework, now available as open source in Hadoop, is one of the emerging solutions to the Big Data problem. Hadoop enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance. Hadoop's distributed processing, comprising the Hadoop Distributed File System (HDFS), Map Reduce algorithms and the overall design, is a key step towards realizing the benefits of Big Data. This paper reviews the technical challenges of Big Data and the Hadoop and Map Reduce architectures.

Keywords: Big Data, Hadoop, Map Reduce, Hadoop Distributed File System.

I. INTRODUCTION

A. Big Data: Definition

Big Data is the most recent trend in the IT world and in business. Big data is a term that refers to data sets whose size, variability and velocity make them difficult to capture, manage, process or analyze with standard technologies and tools, such as relational databases and desktop statistics packages, within the time necessary to make them useful [12]. These large chunks of data, generally called Big Data, have redefined the current state of data processing. Most analysts currently refer to data sets from terabytes (one terabyte = 10^12 bytes, or 1000 gigabytes) to multiple petabytes (one petabyte = 10^15 bytes, or 1000 terabytes) as big data. A Big Data system can be divided into three layers, the Infrastructure Layer, the Computing Layer and the Application Layer, from bottom to top [13].

Fig. 1 Layered Architecture of Big Data System

B. Big Data Characteristics

Volume is defined as the potential data capacity, from terabytes to petabytes.
Velocity is defined as how rapidly data is entering the system.
Variety includes all types of data, both structured and unstructured.

C. Evolution of Big Data

From customers to companies, everyone has an insatiable desire for data and for everything that can be done with it. All depend on data for new ways to identify fraud, to keep a check on consumer behavior, and for many other purposes [12]. In the past, enterprise systems were the major sources of data, but nowadays many additional sources, such as sensors and social networking sites, contribute to the data pool.

D. Technical Challenges with Big Data Processing

i. Fault Tolerance: With the arrival of new technologies such as cloud computing and Big Data, it is always desirable that failures, whenever they occur, stay within an acceptable threshold. Thus the major task is to limit the probability of failure to an acceptable level. However, reducing the probability of failure is very expensive [13].

ii. Heterogeneous data: Unstructured data represents nearly every kind of data being produced, from social media communications and recorded meetings to PDF documents and more. Working with unstructured data is difficult and costly. Structured data, by contrast, is always organized in a highly automated and controllable way.

iii. Scale: The first thing anyone thinks of with Big Data is its size. Managing large and rapidly increasing volumes of data has been a challenging issue for many decades. In the past, this challenge was mitigated by processors getting faster, which provided the resources needed to deal with growing volumes of data. But there is a fundamental shift underway now: data volume is scaling faster than compute resources, and CPU speeds are static.

iv. Privacy: The privacy of data is another major concern, and one that grows in the context of Big Data. There is particular fear regarding the inappropriate use of personal data, especially through linking data from multiple sources. Managing privacy effectively is both a technical and a sociological problem, which must be addressed jointly from both perspectives to realize the promise of big data.

II. LITERATURE REVIEW

Jimmy Lin observes that Hadoop is currently the large-scale data analysis "hammer" of choice, but that there exist classes of algorithms that are not "nails", in the sense that they are not particularly amenable to the Map Reduce programming model [7]. He focuses on a simple solution: finding alternative non-iterative algorithms that solve the same problem. The standard Map Reduce model is well known and described in many places; in it, each iteration of PageRank corresponds to one Map Reduce job. The author discusses iterative graph algorithms, gradient descent and EM, which are typically implemented as a series of Hadoop jobs, with a driver that sets up each iteration and checks for convergence.
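Lin's point about iterative algorithms can be made concrete with PageRank. The sketch below is a minimal in-memory simulation, assuming a tiny hand-made graph with no dangling nodes and the commonly used damping factor of 0.85. The function names are hypothetical and no Hadoop API is involved. Each call to `run_iteration` stands for one Map Reduce job, and the driver loop decides whether to launch another.

```python
from collections import defaultdict

def pagerank_map(node, rank, out_links):
    """Map: re-emit the node's adjacency list and send an equal share of its rank to each neighbour."""
    yield node, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))

def pagerank_reduce(node, values, n_nodes, damping):
    """Reduce: sum the incoming rank shares and rebuild the node's (rank, links) record."""
    out_links, incoming = [], 0.0
    for kind, value in values:
        if kind == "links":
            out_links = value
        else:
            incoming += value
    return (1 - damping) / n_nodes + damping * incoming, out_links

def run_iteration(state, damping):
    """One simulated Map Reduce job: map every record, shuffle (group by key), reduce every group."""
    groups = defaultdict(list)
    for node, (rank, links) in state.items():
        for key, value in pagerank_map(node, rank, links):
            groups[key].append(value)
    return {node: pagerank_reduce(node, groups[node], len(state), damping) for node in state}

def pagerank_driver(graph, damping=0.85, tolerance=1e-10, max_iterations=100):
    """Driver: launch one job per iteration and stop once the ranks stop changing."""
    state = {node: (1.0 / len(graph), links) for node, links in graph.items()}
    for iteration in range(1, max_iterations + 1):
        new_state = run_iteration(state, damping)
        change = sum(abs(new_state[n][0] - state[n][0]) for n in graph)
        state = new_state
        if change < tolerance:  # the convergence check lives in the driver, not in the job
            break
    return {node: rank for node, (rank, _) in state.items()}, iteration

ranks, iterations = pagerank_driver({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In a real Hadoop deployment each pass would be a separate job that re-reads and re-writes the whole graph on HDFS; that per-iteration overhead is what makes such algorithms an awkward fit for the model.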
In Lin's words, if all you have is a hammer, throw away everything that is not a nail [7].

S. Vikram Phaneendra and E. Madhusudhan Reddy explained that in earlier days data volumes were small and easily handled by an RDBMS, but that huge data, referred to as big data, has recently become difficult to handle with RDBMS tools. They state that big data differs from other data in five dimensions: volume, velocity, variety, value and complexity. They illustrated the Hadoop architecture, consisting of the name node, data nodes, edge node and HDFS, for handling big data systems. The Hadoop architecture handles large data sets with scalable algorithms and supports applications such as log management; applications of big data can be found in the financial, retail, health-care, mobility and insurance industries. The authors also focused on the challenges that enterprises face when handling big data, such as data privacy and search analysis [6].

Albert Bifet stated that streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. A huge amount of data, termed big data, is created every day. The tools used for mining big data include Apache Hadoop, Apache Pig, Cascading, Scribe, Storm, Apache HBase, Apache Mahout, MOA and R [8]. He concluded that our ability to handle many exabytes of data depends mainly on the existence of a rich variety of datasets, techniques and software frameworks.

Aditya B. Patel et al. address the Big Data problem using Hadoop and Map Reduce and report experimental research on Big Data problems in various domains. They describe optimal and efficient solutions using a Hadoop cluster, the Hadoop Distributed File System (HDFS) for data storage, and the Map Reduce framework for parallel processing of massive data sets and records [9].
Suman Arora and Dr. Madhu Goel described several techniques for building an efficient Map Reduce scheduler that speeds up processing and data retrieval. Techniques such as Quincy, asynchronous processing, speculative execution, job awareness, delay scheduling and copy-compute splitting have made the scheduler effective for faster processing [3].

Poonam S. Patil and Rajesh N. Phursule illustrated that the Map Reduce programming model has been used successfully at Google for many different purposes. First, the model is easy to use, even for programmers without experience in parallel and distributed systems, since it hides the details of parallelization, fault tolerance, locality optimization and load balancing. Second, a large variety of problems are easily expressible as Map Reduce computations. Map Reduce makes it easy to parallelize and distribute computations and to make such computations fault tolerant. There is also an extensive list of products and projects that either extend Hadoop's functionality or expose some existing capability in new ways [5].

Vibhavari Chavan and Prof. Rajesh N. Phursule stated that Hadoop Map Reduce is a large-scale, open-source software framework dedicated to scalable, distributed, data-intensive computing. The framework breaks up large data into smaller parallelizable chunks and handles scheduling, maps each piece to an intermediate value, reduces intermediate values to a solution, offers user-specified partition and combiner options, and is fault tolerant and reliable, supporting thousands of nodes and petabytes of data. If you can rewrite algorithms into maps and reduces, and your problem can be broken up into small pieces solvable in parallel, then Hadoop's Map Reduce is the way to go for a distributed, problem-solving approach to large datasets; it is tried and tested in production and offers many implementation options. They also discuss the design and evaluation of a data-aware cache framework that requires minimal changes to the original Map Reduce programming model to provide incremental processing for Big Data applications built on Map Reduce [4].

Amogh Pramod Kulkarni and Mahesh Khandewal stated the importance of several technologies that handle Big Data, such as Hadoop, HDFS and Map Reduce [2]. The authors discussed the various schedulers used in Hadoop and the technical aspects of Hadoop, and emphasized the importance of YARN, which overcomes the limitations of the original Map Reduce.

Dhole Poonam B and Gunjal Baisa L focus on the Hadoop data flow and the pipelined Map Reduce data flow [10]. The authors argue that pipelined Map Reduce performs better than the traditional approach because it reduces the completion time of tasks, which means that a pipelined Map Reduce implementation can process large datasets effectively.

Sabia and Love Arora mainly focus on various Big Data handling techniques that process massive amounts of data from different sources and improve the overall performance of systems [11].

Mrigank Mridul, Akashdeep Khajuria, Snehasish Dutta and Kumar N. stated that Map Reduce is the best tool available for processing data, and that HBase, a distributed, column-oriented database that uses HDFS as its underlying storage, makes the system more efficient [1].

III. HADOOP & MAP REDUCE

A. Hadoop & Map Reduce

These are the most commonly used models for Big Data processing. Hadoop is a programming framework used to support the processing of very large data sets in a distributed computing environment. Hadoop was inspired by Google's Map Reduce, a software framework in which applications are broken down into many parts. The Apache Hadoop project consists of the Hadoop Distributed File System module and Hadoop Map Reduce, in addition to other modules. The software is designed to draw on the processing power of clustered computers while handling failures at the node level.

Fig. 2 Hadoop Architecture

The current Apache Hadoop ecosystem consists of the Hadoop Kernel, Map Reduce, HDFS and a number of other components.

B. Hadoop Distributed File System

HDFS is a clustered file management system that holds huge amounts of data and provides high throughput and high-speed access to data. HDFS stores massive amounts of information, scales up incrementally, and survives the failure of significant parts of the storage infrastructure without losing data. The system stores files redundantly across a number of machines to make sure that they are fault tolerant and available to highly parallel applications [12]. Hadoop creates clusters of machines and coordinates work among them. Clusters can be built with low-cost computers. If one machine fails, Hadoop continues to run the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster.
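A rough calculation shows why storing copies on several machines pays off. As a simplifying assumption (not a claim made in the paper), suppose each machine fails independently with probability p over some period, and each block has r copies on r distinct machines. A block is lost only if all r machines holding it fail, and a file of b blocks survives only if every one of its blocks survives:

```latex
P(\text{block lost}) = p^{r},
\qquad
P(\text{file of } b \text{ blocks lost}) = 1 - \left(1 - p^{r}\right)^{b}
```

With p = 0.01 and b = 1000, a single copy (r = 1) gives a file-loss probability of 1 - 0.99^1000 ≈ 0.99996, while three copies (r = 3) give 10^-6 per block and roughly 0.001 per file. Real failures are often correlated (for instance, a whole rack losing power), so this independent-failure model understates the true risk; that is one motivation for rack-aware replica placement in HDFS.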
HDFS manages storage on the cluster by breaking incoming files into pieces, called blocks, and storing each block redundantly across the pool of servers. In the common case, HDFS stores three complete copies of each file by copying each piece to three different servers [12].
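The block-and-replica scheme just described can be sketched as a toy, in-memory model. Everything here is hypothetical and is not the HDFS API: the class and method names (`MiniHDFS`, `write`, `fail`, `read`), the 4-byte block size (real HDFS blocks are far larger) and the simple round-robin placement (real HDFS uses a rack-aware policy). The metadata tables play the role the paper assigns to the Name Node; the path is borrowed from Fig. 3.

```python
from itertools import count

class MiniHDFS:
    """Toy model: metadata tables (the Name Node's role) plus a set of Data Nodes holding blocks."""

    def __init__(self, datanode_names, block_size=4, replication=3):
        self.block_size = block_size      # bytes per block; tiny for the demo
        self.replication = replication    # number of copies kept of every block
        self.datanodes = {name: {} for name in datanode_names}  # node -> {block_id: bytes}
        self.file_blocks = {}             # metadata: path -> ordered list of block ids
        self.block_locations = {}         # metadata: block id -> nodes holding a copy
        self._block_ids = count(1)
        self._cursor = 0                  # where round-robin placement starts next

    def write(self, path, data):
        nodes = sorted(self.datanodes)
        if self.replication > len(nodes):
            raise ValueError("not enough live Data Nodes for the replication factor")
        ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = next(self._block_ids)
            # Put the copies on distinct nodes, round-robin (hypothetical placement rule).
            targets = [nodes[(self._cursor + k) % len(nodes)] for k in range(self.replication)]
            self._cursor += 1
            for node in targets:
                self.datanodes[node][block_id] = data[offset:offset + self.block_size]
            self.block_locations[block_id] = targets
            ids.append(block_id)
        self.file_blocks[path] = ids

    def fail(self, node):
        """Simulate losing a machine together with every block it stored."""
        del self.datanodes[node]

    def read(self, path):
        """Rebuild a file from any surviving copy of each of its blocks."""
        parts = []
        for block_id in self.file_blocks[path]:
            alive = [n for n in self.block_locations[block_id] if n in self.datanodes]
            if not alive:
                raise OSError(f"block {block_id} of {path} has no surviving replica")
            parts.append(self.datanodes[alive[0]][block_id])
        return b"".join(parts)

fs = MiniHDFS(["dn1", "dn2", "dn3", "dn4", "dn5"])
fs.write("/user/aaron/foo", b"hello big data world")
fs.fail("dn2")
print(fs.read("/user/aaron/foo"))
```

Losing one Data Node leaves every block with at least two live copies, so the read still succeeds; a read fails only once all three holders of some block are gone.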

Fig. 3 HDFS or the Hadoop Distributed File System (Name Node: stores metadata only, e.g. /user/aaron/foo: blocks 1, 2, 4 and /user/aaron/bar: blocks 3, 5. Data Nodes: store the blocks of the files)

The computing systems in each cluster are called Data Nodes. A file consists of multiple blocks, and these are not necessarily stored on the same machine, since the location of each block is chosen at random. As a result, accessing a particular file requires the cooperation of multiple machines. If multiple machines are needed to serve a file, the file could become unavailable if even one machine in the cluster is lost. HDFS handles this problem by replicating each block across several machines; the replication factor is 3 by default. The file system must also store its metadata reliably. The whole process is controlled by a single machine, the Name Node, which holds the metadata of the entire file system. Because the metadata of each file is relatively small, all of it is kept in the main memory of the Name Node machine, which allows fast access.

Fig. 4 HDFS Architecture

C. Map Reduce

Map Reduce was introduced by Google to process and accumulate large datasets on commodity hardware. Map Reduce is a model for processing large-scale data sets on clusters. The Map Reduce framework is the processing pillar of the Hadoop environment. The framework allows a procedure to be applied to a massive data set: it splits the problem and the data, and runs the pieces in parallel [12]. For example, a very large dataset can be reduced to smaller subsets to which analytics can be applied. In a conventional data warehousing scenario, this might involve applying an ETL operation to the data to produce something usable by the analyst. In Hadoop, these kinds of operations are typically written as Map Reduce jobs in Java. The outputs of these jobs can be written back to HDFS or placed in a conventional data warehouse.

There are two key functions in Map Reduce:
map: takes key/value pairs as input and produces an intermediate set of key/value pairs;
reduce: merges all intermediate values associated with the same intermediate key.

Fig. 5 Mapping
Fig. 6 Reducing
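The two functions can be illustrated with the classic word-count job. The sketch below simulates the whole pipeline in plain Python rather than using Hadoop's Java API, and also shows the optional combiner and partitioner hooks mentioned in the literature review. The function names, the default of two reducers and the CRC32-based partitioner are assumptions for illustration (hash-based, in the spirit of Hadoop's default HashPartitioner); a checksum is used because Python's built-in `hash()` of a string changes between processes.

```python
from collections import defaultdict
import zlib

def map_fn(line_no, line):
    """map: one input record (key = line number here, value = line of text) -> (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def combine_fn(pairs):
    """Optional combiner: pre-aggregate one mapper's output before it is shuffled."""
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return list(totals.items())

def partition_fn(word, num_reducers):
    """Partitioner: a stable hash of the key picks the reducer that will receive it."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers

def reduce_fn(word, counts):
    """reduce: merge all intermediate values that share the same intermediate key."""
    return word, sum(counts)

def run_word_count(lines, num_reducers=2):
    # Map + combine, then shuffle: route each pair to its reducer and group by key.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for line_no, line in enumerate(lines):
        for word, n in combine_fn(map_fn(line_no, line)):
            shuffled[partition_fn(word, num_reducers)][word].append(n)
    # Reduce: each reducer walks its keys in sorted order, as Hadoop reducers see sorted input.
    result = {}
    for groups in shuffled:
        for word in sorted(groups):
            key, total = reduce_fn(word, groups[word])
            result[key] = total
    return result

print(run_word_count(["big data needs big tools", "Hadoop processes big data"]))
```

The combiner is safe here because addition is associative and commutative, so summing partial counts on the map side does not change the final totals; it only shrinks the data sent through the shuffle.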

Fig. 7 Map Reduce Architecture

IV. CONCLUSION

The paper describes the concept of Big Data along with its characteristics: volume, velocity and variety. The paper also focuses on the technical challenges of Big Data processing, which must be addressed for efficient and rapid processing of Big Data. The paper explores Hadoop, open-source software used for processing Big Data. With its efficient distributed file system and its programming framework based on the Map Reduce concept, Hadoop is a powerful tool for managing large data sets. With its Map Reduce programming paradigm, overall architecture, ecosystem, fault-tolerance techniques and distributed processing, Hadoop offers a complete infrastructure for handling Big Data. Users should take advantage of Big Data by adopting the Hadoop infrastructure for data processing.

REFERENCES

[1] Mrigank Mridul, Akashdeep Khajuria, Snehasish Dutta, Kumar N, "Analysis of Big Data using Apache Hadoop and Map Reduce," Volume 4, Issue 5, May.
[2] Amogh Pramod Kulkarni, Mahesh Khandewal, "Survey on Hadoop and Introduction to YARN," International Journal of Emerging Technology and Advanced Engineering, Volume 4, Issue 5, May 2014.
[3] Suman Arora, Dr. Madhu Goel, "Survey Paper on Scheduling in Hadoop," International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.
[4] Ms. Vibhavari Chavan, Prof. Rajesh N. Phursule, "Survey Paper on Big Data," International Journal of Computer Science and Information Technologies, Vol. 5 (6).
[5] Poonam S. Patil, Rajesh N. Phursule, "Survey Paper on Big Data Processing and Hadoop Components," International Journal of Science and Research (IJSR), Volume 3, Issue 10, October 2014.
[6] S. Vikram Phaneendra, E. Madhusudhan Reddy, "Big Data - Solutions for RDBMS Problems - A Survey," in 12th IEEE/IFIP Network Operations & Management Symposium (NOMS 2010), Osaka, Japan, Apr.
[7] Jimmy Lin, "Map Reduce Is Good Enough?" The control project, IEEE Computer 32 (2013).
[8] Albert Bifet, "Mining Big Data in Real Time," Informatica 2013, Dec 2012.
[9] Aditya B. Patel, Manashvi Birla, Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce," in Proc. Nirma University International Conference on Engineering.
[10] Dhole Poonam B, Gunjal Baisa L, "Survey Paper on Traditional Hadoop and Pipelined Map Reduce," International Journal of Computational Engineering Research, Vol. 03, Issue 12.
[11] Sabia, Love Arora, "Technologies to Handle Big Data: A Survey."
[12] Praveen Kumar, Dr. Vijay Singh Rathore, "Efficient Capabilities of Processing of Big Data using Hadoop Map Reduce," Vol. 3, Issue 6, June 2014.
[13] Harshawardhan S. Bhosale, Prof. Devendra P. Gadekar, "A Review Paper on Big Data and Hadoop," Volume 4, Issue 10, Oct.

An Emergence Techniques In Big Data Mining

An Emergence Techniques In Big Data Mining International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 2 ISSN : 2456-3307 An Emergence Techniques In Big Data Mining Hemant

More information

A Survey on Big Data and Hadoop

A Survey on Big Data and Hadoop A Survey on Big Data and Hadoop 1 Kawale S. M., 2 Dr. Holambe A. N. 1 Lecturure, 2 Head of Department 1 Department of Computer Engineering, 1 SVERI s College of Engineering (Polytechnic), Pandharpur, Pandharpur,

More information

Abstract. Keywords. 1. Introduction. Festim Halili 1, Festim Kamberi 2

Abstract. Keywords. 1. Introduction. Festim Halili 1, Festim Kamberi 2 Journal of Applied Mathematics and Computation (JAMC), 2018, 2(2), 50-57 http://www.hillpublisher.org/journal/jamc ISSN Online:2576-0645 ISSN Print:2576-0653 Performance analysis of classification Algorithms:

More information

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

BIG DATA & HADOOP: A Survey

BIG DATA & HADOOP: A Survey Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

Comparative Study on Tools and Techniques of Big Data Analysis

Comparative Study on Tools and Techniques of Big Data Analysis 61 Comparative Study on Tools and Techniques of Big Data Analysis B.THILLAIESWARI M.S., M.Phil., B.Ed., Assistant Professor, Department of Computer Science, TBAK College for Women, Kilakarai. Email:thillaikris@gmail.com

More information

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Journal homepage: www.mjret.in ISSN:2348-6953 A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Bhavsar Nikhil, Bhavsar Riddhikesh,Patil Balu,Tad Mukesh Department of Computer Engineering JSPM s

More information

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

EXTRACT DATA IN LARGE DATABASE WITH HADOOP International Journal of Advances in Engineering & Scientific Research (IJAESR) ISSN: 2349 3607 (Online), ISSN: 2349 4824 (Print) Download Full paper from : http://www.arseam.com/content/volume-1-issue-7-nov-2014-0

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

Introduction to Big-Data

Introduction to Big-Data Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,

More information

Department of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India

Department of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 A Survey on Big Data and Hadoop Ecosystem Components

More information

A SURVEY ON BIG DATA AND HADOOP

A SURVEY ON BIG DATA AND HADOOP A SURVEY ON BIG DATA AND HADOOP S.Tamil Selvan 1 Dr. P. Balamurugan 2 1 (Asst Prof, Dept of CSE, M.P.Nachimuthu M.Jaganathan Engg College, Chennimalai, India, stamilselvancse@gmail.com) 2 (Associate Prof,

More information

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation 2016 IJSRSET Volume 2 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

More information

Comparative Analysis of Range Aggregate Queries In Big Data Environment

Comparative Analysis of Range Aggregate Queries In Big Data Environment Comparative Analysis of Range Aggregate Queries In Big Data Environment Ranjanee S PG Scholar, Dept. of Computer Science and Engineering, Institute of Road and Transport Technology, Erode, TamilNadu, India.

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

BIG DATA WITH HADOOP FOR DATA MANAGEMENT, PROCESSING AND STORING

BIG DATA WITH HADOOP FOR DATA MANAGEMENT, PROCESSING AND STORING BIG DATA WITH HADOOP FOR DATA MANAGEMENT, PROCESSING AND STORING Revathi.V 1, Rakshitha.K.R 2, Sruthi.K 3, Guruprasaath.S 4 1Assistant Professor, Department of BCA & M.Sc SS, Sri Krishna Arts and Science

More information

LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS

LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS Vandita Jain 1, Prof. Tripti Saxena 2, Dr. Vineet Richhariya 3 1 M.Tech(CSE)*,LNCT, Bhopal(M.P.)(India) 2 Prof. Dept. of CSE, LNCT, Bhopal(M.P.)(India)

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Exploiting and Gaining New Insights for Big Data Analysis

Exploiting and Gaining New Insights for Big Data Analysis Exploiting and Gaining New Insights for Big Data Analysis K.Vishnu Vandana Assistant Professor, Dept. of CSE Science, Kurnool, Andhra Pradesh. S. Yunus Basha Assistant Professor, Dept.of CSE Sciences,

More information

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

Survey Paper on Traditional Hadoop and Pipelined Map Reduce International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

Massive Online Analysis - Storm,Spark

Massive Online Analysis - Storm,Spark Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R

More information

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Yojna Arora, Dinesh Goyal Abstract: Big Data refers to that huge amount of data which cannot be analyzed by using traditional analytics

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

A Review on Hive and Pig

A Review on Hive and Pig A Review on Hive and Pig Kadhar Basha J Research Scholar, School of Computer Science, Engineering and Applications, Bharathidasan University Trichy, Tamilnadu, India Dr. M. Balamurugan, Associate Professor,

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Online Bill Processing System for Public Sectors in Big Data

Online Bill Processing System for Public Sectors in Big Data IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

Document Clustering with Map Reduce using Hadoop Framework

Document Clustering with Map Reduce using Hadoop Framework Document Clustering with Map Reduce using Hadoop Framework Satish Muppidi* Department of IT, GMRIT, Rajam, AP, India msatishmtech@gmail.com M. Ramakrishna Murty Department of CSE GMRIT, Rajam, AP, India

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

On The Fly Mapreduce Aggregation for Big Data Processing In Hadoop Environment

On The Fly Mapreduce Aggregation for Big Data Processing In Hadoop Environment ISSN (e): 2250 3005 Volume, 07 Issue, 07 July 2017 International Journal of Computational Engineering Research (IJCER) On The Fly Mapreduce Aggregation for Big Data Processing In Hadoop Environment Ms.

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information

SBKMMA: Sorting Based K Means and Median Based Clustering Algorithm Using Multi Machine Technique for Big Data
