PRE HADOOP AND POST HADOOP VALIDATIONS FOR BIG DATA

Size: px
Start display at page:

Download "PRE HADOOP AND POST HADOOP VALIDATIONS FOR BIG DATA"

Transcription

1 International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 10, October 2017, pp , Article ID: IJMET_08_10_066 Available online at ISSN Print: and ISSN Online: IAEME Publication Scopus Indexed PRE HADOOP AND POST HADOOP VALIDATIONS FOR BIG DATA Nachiyappan.S School of Computer Science and Engineering, VIT University, Chennai, TN, India Justus Selwyn School of Computer Science and Engineering, VIT University, Chennai, TN, India ABSTRACT: Big data, a platform in which everybody wants to gain knowledge on processing and analyzing the vast amount of data with ease. Various application are available today for processing vast amount of data and get the intended result from it, And various application testing tools are available to test the Big Data application. But no application or tool explain how data is validated before processing and after processing of data. Various existing functional and non-functional testing can be performed to assure the quality of the data so that the cost and time can be saved for the processing party. Here in this paper we are going to propose a set of testing strategies at various stages on Big Data analysis process so that one can validate data before retrieving the vast amount of data from Hadoop. Keywords: Big Data, Fuctional testing, Non-functional testing, hadoop, quality. Cite this Article: Nachiyappan.S and Justus Selwyn, Pre Hadoop and Post Hadoop Validations for Big Data, International Journal of Mechanical Engineering and Technology 8(10), 2017, pp INTRODUCTION Large size of data which is either structured or unstructured or semi-structured cannot be processed with existing traditional DBMS methods can be processed with the help of a concept known as Big Data. Today, various companies adapted to Big data to process the huge streams of data. Since almost everyone has a device which is connected to internet, but on processing such vast data they only care about how the data is processed, what algorithm is used to process the data? But nobody cares to check whether the dataset they are processing is valid or not whether it can provide the intended result or not. If the data is acquired from valid data provider or data collector then there will be no issue, Say one want to do sentimental analysis and he got data by crawling from various stream. Nowadays more false data circulate in the form of rumors. Processing such data will get results from anywhere[2]. Hence editor@iaeme.com

2 Pre Hadoop and Post Hadoop Validations for Big Data validating data is equally important as processing the data to get the intended result. Big data testing methods must be implemented along with the Big Data processing. 2. AUTOMATED TESTING Testing generally falls into 2 major categories based on how the testing is performed Manual Testing Automation Testing Testing which is performed by writing script will automate the testing process and reduce the effort and time by using various automation tools is known as Automation testing. Most of the automation tools include record and playback process that will generate the script and by adding data pool to the tool will automate the testing process for different input and will generate output record accordingly for each iteration. Most of the testing strategies are performed using Pig script. This paper meets the basic testing which includes functional and non- functional testing. 3. EXISTING TESTING METHODOLOGIES Most of the existing validations on Big Data cover the testing that can be done on data set for finding the quality and consistency. Testing process mainly concentrates on application but they fail to test the data that is to be processed. Non-functional testing can be performed on any big data application that will check the reliability of the application. List of proposed functional and non- functional testing that can be done on data for checking the consistency of the data are mentioned in proposed validation[1]. Figure 1 Various Data source 4. PROPOSED TESTING METHODOLOGIES: Testing in Big Data can be performed in three different stages that can validate the data at various stages at various forms. List of stages at which testing can be performed is: Pre-Hadoop validation. Map-reduce validation. ETL or Post-Hadoop Validation editor@iaeme.com

3 Nachiyappan.S and Justus Selwyn Both functional and non-functional testing can be performed on data to check the consistency of the data and the list of validations that can be performed on the data are listed below. Most of the testing are done in Pig Latin. A. Pre Hadoop Validation: Pre-Hadoop Validation mainly constitute the testing of data before processing so that the data cleansing can be done to avoid negative processing that will result in wasting of both time and resource. Data cleansing plays a major role in getting reliable output. Big Data systems typically process a mix of structured data (such as point-of-sale transactions, call detail records, general ledger transactions, and call center transactions), unstructured data (such as user comments, doctors' notes, insurance claims descriptions and web logs) and semi-structured social media data (from sites like Twitter, Facebook, LinkedIn and Pinterest). Often the data is extracted from its source location and saved in its raw or a processed form in Hadoop or another Big Data database management system. Data is typically extracted from a variety of source systems and in varying file formats, e.g. relational tables, fixed size records, flat files with delimiters (CSV), XML files, JSON and text files. Most Big Data database management systems are designed to store data in its rawest form, creating what has come to be known as a "data lake," a largely undifferentiated collection of data as captured from the source. These DBMSs use an approach called "schema on read," i.e. the data is given a simple structure appropriate to the application as it is read, but very little structure is imposed during the loading phase. The most important activity during data loading is to compare data to ensure extraction has happened correctly and to confirm that the data loaded into the HDFS (Hadoop Distributed File System) is a complete, accurate copy[14]. Typical tests include: 1. Data type validation: Data type validation is customarily carried out on one or more simple data fields. The simplest kind of data type validation verifies that the individual characters provided through user input are consistent with the expected characters of one or more known primitive data types as defined in a programming language or data storage and retrieval mechanism[14]. 2. Range and constraint validation: Simple range and constraint validation may examine user input for consistency with a minimum/maximum range, or consistency with a test for evaluating a sequence of characters, such as one or more tests against regular expressions [14]. 3. Code and cross-reference validation. Code and cross-reference validation includes tests for data type validation, combined with one or more operations to verify that the user-supplied data is consistent with one or more external rules, requirements or validity constraints relevant to a particular organization, context or set of underlying assumptions. These additional validity constraints may involve crossreferencing supplied data with a known look-up table or directory information service such as LDAP. 4. Structured validation. Structured validation allows for the combination of any number of various basic data type validation steps, along with more complex processing. Such complex processing may include editor@iaeme.com

4 Pre Hadoop and Post Hadoop Validations for Big Data the testing of conditional constraints for an entire complex data object or set of process operations within a system. B. Map-Reduce Validation: MapReduce is the heart of Apache Hadoop. It is this programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data processing solutions. Figure 2 Map Reduce Word Count Process. Map-Reduce Validation constitute the checking of key-value pairs generation and validate the map-reduce by applying various business rules. The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job. C. Post Hadoop Validation After Map-reduce process is completed rest is validating the Extract-Transfer-Load Validation. This mainly constitute the output file extraction and loading it into target output folder. Post Hadoop validation is done before data is moved into a production data warehouse system. It is sometimes also called as table balancing or production reconciliation. It is different from database testing in terms of its scope and the steps to be taken to complete this. The main objective of Post Hadoop validation is to identify and mitigate data defects and general errors that occur prior to processing of data for analytical reporting. Post Hadoop validation is different from database testing or any other conventional testing. One may have to face different types of challenges while performing Post Hadoop validation. As we are dealing with huge data and executing on multiple nodes there are high chances of having bad data issues at each stage of the process. As we know, processing big data is difficult since it is a collection of huge amount of data and executing it on multiple nodes there is a high risk of bad data and even quality issues editor@iaeme.com

5 Nachiyappan.S and Justus Selwyn Main challenges are: Incorrect data Incomplete or duplicate data. Inefficient in procedures and business process. DW system contains historical data, so the data volume is too large and extremely complex to perform Post Hadoop testing in the target system. Once map-reduce process is completed and data output files are generated, this processed data is moved to enterprise data warehouse or any transactional systems depending on the requirement. Some issues that we face during this phase include incorrectly applied transformation rules, incorrect load of HDFS files into EDW and incomplete data extract from Hadoop HDFS. Some high level scenarios that need to be validated during this phase include: Validating that transformation rules are applied correctly. Validating that that there is no data corruption by comparing target table data against HDFS files data. Validating the data load in target system. Validating the aggregation of data. Testing on various stages is listed below: D. Functional Testing The Functional testing is done to check the dataset consistency by Comparing the data before and after uploading the dataset. Comparing the file size of file in Hadoop server and the same file in local system. Comparing the file format of file in Hadoop server and the same file in local system. Validating the schema of both files. Checking the generation of key value pair after map reduce process. Checking the generated output file. Apply Business rules for validating the map reduce process After Map reduce process check if business rules are applied correctly and generated output is as desired. Checking if output file is extracted correctly. Figure 3 Phases if Testing in Big Data editor@iaeme.com

6 Pre Hadoop and Post Hadoop Validations for Big Data E. NON FUNCTIONAL TESTING Non-functional validations of application are done to check the reliability of the application. List of non- functional testing that can be done on the application are: Performance testing Security testing Reusability testing Reliability testing 5. TEST RESULTS AND DISCUSSION A. Functional Validations 1. Comparing Data Data has been collected from various sources and after collecting the dataset and uploading the data into the Hadoop system and before processing it, it is loaded into the Pig storage by loading it into the Pig Storage, use DIFF function to compare if both the files (Source and the destination) in Hadoop system and the file in local file system then generate the report of validation, which will give clear picture what field has been changed. If the replication is done correctly the output file will have a pair of empty brasses. Figure 4 Pig Output of Comparing two files 2. File extraction: The next validation is file extraction, here we need to compare the file which is inside the hadoop and source file and validations are done according to that. Once the file is loaded into the pig storage, implement the sample application in local mode so that we will get the output file. Comparing the file with file in Hadoop system will give result that will ensure that the file is extracted correctly or not editor@iaeme.com

7 Nachiyappan.S and Justus Selwyn Figure 5 File Extraction 3. File size comparision: The File size comparison is one of the important validation to conclude that there is any modification in file, if there is any modification in the file which is uploaded the file size will be varies. This particular validation uses java code which will get the file from its location and retrieve the size of the file and uses apache function that will get the size of the file in hadoop and compare the both sizes and return the value accordingly so that we can compare both file size and validate it. In the below fig. 6 it is depicted as file and file1, file is the one we have as a source and file1 is the one which is stored in HDFS. So both the files are in same size. Figure 6 File Size Comparision 4. File format validation: File format validation helps us in many ways to confine the format of file. The format which we store the file in source system and the file which is moved into the destination varies many times. This validation is used to find the file format of source and destination and it compares and gives us the result. It uses the java code that retrieve the file size and will do the same with the file in local system and uses arrayutil.getextension() to get the extension from hadoop file system and compare both extension and return value that will validate the file in Hadoop system. In below fig.7 it depicts the file size and the format. If both the file sizes are same then it will be as same size, if it is in the same format it responds as same format, if there is any change it alerts the user that the file size and formats are different. Figure 7 File Format Validation editor@iaeme.com

8 Pre Hadoop and Post Hadoop Validations for Big Data 5. Key Value Pair Validation: Hash Map function generates the tokenized map value and then compare the same with the output that is generated from map-reduce method. Figure 8 Key value Pair Validation 6. Output File Generation: After completion of the map-reduce process the output file location must be validated and then return the size of the output file. Figure 9 Output File Generation 6. CONCLUSION Big data is still emerging and a there is a lot of responsibility on testers to identify innovative ideas to test the implementation. One of the most challenging things for a tester is to keep pace with changing dynamics of the industry. While on most aspects of testing, the tester need not know the technical details behind the scene however this is where testing Big Data Technology is so different. A tester not only needs to be strong on testing fundamentals but also has to be equally aware of minute details in the architecture of the database designs to analyze several performance bottlenecks and other issues. Hadoop testers have to learn the components of the Hadoop eco system from the scratch. In this paper we have used some sample data and we have pushed the same into Hadoop in single cluster mode. We have come out with the both functional and non-functional testing results. The future work in this is to test the data with multi cluster systems. REFERENCES [1] S.Nachiyappan and Dr.S.Justus, Getting ready for Big Data Testing:A practitioner perception 4th ICCCNT 2013 July 4-6,2013, Tiruchengode, India. [2] Muthuraman Thangaraj and Subramanian Anuradha, State of art in testing big data IEEE International Conference on Computational Intelligence and Computing Research, editor@iaeme.com

9 Nachiyappan.S and Justus Selwyn [3] Harry M. Sneed and Katalin Erdoes, Testing the Big Data 2015 IEEE Eighth International Conference on Software Testing,Verification and Validation Workshops (ICSTW) 13th User Symposium on Software Quality, Test and Innovation (ASQT 2015) /15/$ IEEE. [4] Piyaporn Samsuwan and Yachai Limpiyakorn, Generation of Data Warehouse Design Test Cases IT convergence and security 5 th international confrence on 24-27Aug,2015 [5] White paper by Infosys, infosys data warehouse testing solutions. [6] Data Warehouse Testing Solutions. - White Paper by Infosys [7] White paper by Infosys, big data testing services. [8] White paper by Syntel, Proven testing techniques in large data warehousing projects. [9] White paper by Infosys, Teat data management in software testing life cycle. [10] Proven testing techniques in large data warehousing projects. - White Paper by Syntel. [11] Teat data management in software testing life cycle. - White Paper by Infosys. [12] Building a Robust Big Data QA Ecosystem to Mitigate Data Integrity Challenges. White Paper by Cognizant Technology Solutions. [13] The Emerging Big Data System - Testing Perspective. - White Paper by Hexaware. [14] A Primer on Big Data. White paper by QA Consultants. [15] Suja Cherukullapurath Mana, Big Data Paradigm and a Survey of Big Data Schedulers. International Journal of Computer Engineering & Technology, 8(5), 2017, pp [16] Dr. M Nagalakshmi, Dr. I Surya Prabha, K Anil, Big Data Map Reducing Technique Based Apriori in Distributed Mining. International Journal of Advanced Research in Engineering and Technology, 8(5), 2017, pp [17] Dr. V.V.R. Maheswara Rao, Dr. V. Valli Kumari and N. Silpa. An Extensive Study on Leading Research Paths on Big Data Techniques & Technologies. International Journal of Computer Engineering and Technology, 6(12), 2015, pp [18] Vijayashanthi.R and N.Shunmuga Karpagam, A Literature Survey on Sp Theory of Intelligence Algorithm for Big Data Analysis, International Journal Of Computer Engineering & Technology (IJCET), Volume 5, Issue 12, December (2014), pp editor@iaeme.com

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

International Journal of Computer Engineering and Applications, BIG DATA ANALYTICS USING APACHE PIG Prabhjot Kaur

International Journal of Computer Engineering and Applications, BIG DATA ANALYTICS USING APACHE PIG Prabhjot Kaur Prabhjot Kaur Department of Computer Engineering ME CSE(BIG DATA ANALYTICS)-CHANDIGARH UNIVERSITY,GHARUAN kaurprabhjot770@gmail.com ABSTRACT: In today world, as we know data is expanding along with the

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information

A CASE STUDY ON COACTIVE SUBJECT MODELLING FOR ACCLAIMING TECHNICAL ARTICLES

A CASE STUDY ON COACTIVE SUBJECT MODELLING FOR ACCLAIMING TECHNICAL ARTICLES International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 12, December 2017, pp. 456 464, Article ID: IJMET_08_12_046 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=12

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

A Review Approach for Big Data and Hadoop Technology

A Review Approach for Big Data and Hadoop Technology International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 A Review Approach for Big Data and Hadoop Technology Prof. Ghanshyam Dhomse

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data

Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Yojna Arora, Dinesh Goyal Abstract: Big Data refers to that huge amount of data which cannot be analyzed by using traditional analytics

More information

A REVIEW PAPER ON BIG DATA ANALYTICS

A REVIEW PAPER ON BIG DATA ANALYTICS A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,

More information

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most

More information

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Chapter 3. Foundations of Business Intelligence: Databases and Information Management Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional

More information

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Journal homepage: www.mjret.in ISSN:2348-6953 A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Bhavsar Nikhil, Bhavsar Riddhikesh,Patil Balu,Tad Mukesh Department of Computer Engineering JSPM s

More information

Introduction to Big-Data

Introduction to Big-Data Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,

More information

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Hadoop Online Training

Hadoop Online Training Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the

More information

Chapter 6 VIDEO CASES

Chapter 6 VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training:: Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional

More information

Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop

Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop K. Senthilkumar PG Scholar Department of Computer Science and Engineering SRM University, Chennai, Tamilnadu, India

More information

Department of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India

Department of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 A Survey on Big Data and Hadoop Ecosystem Components

More information

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig?

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? Volume: 72 Questions Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? A. update hdfs set D as./output ; B. store D

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

Chase Wu New Jersey Institute of Technology

Chase Wu New Jersey Institute of Technology CS 644: Introduction to Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Institute of Technology Some of the slides were provided through the courtesy of Dr. Ching-Yung Lin at Columbia

More information

BIG DATA & HADOOP: A Survey

BIG DATA & HADOOP: A Survey Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Wearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life

Wearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life Wearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life Ch.Srilakshmi Asst Professor,Department of Information Technology R.M.D Engineering College, Kavaraipettai,

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Facebook data extraction using R & process in Data Lake

Facebook data extraction using R & process in Data Lake Facebook data extraction using R & process in Data Lake An approach to understand how retail companie B s y G c a a ut n am p Go e sw rf a o m r i m Facebook data mining to analyze customers behavioral

More information

Automated Netezza Migration to Big Data Open Source

Automated Netezza Migration to Big Data Open Source Automated Netezza Migration to Big Data Open Source CASE STUDY Client Overview Our client is one of the largest cable companies in the world*, offering a wide range of services including basic cable, digital

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

A Survey on Comparative Analysis of Big Data Tools

A Survey on Comparative Analysis of Big Data Tools Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Online Bill Processing System for Public Sectors in Big Data

Online Bill Processing System for Public Sectors in Big Data IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY

TOOLS FOR INTEGRATING BIG DATA IN CLOUD COMPUTING: A STATE OF ART SURVEY Journal of Analysis and Computation (JAC) (An International Peer Reviewed Journal), www.ijaconline.com, ISSN 0973-2861 International Conference on Emerging Trends in IOT & Machine Learning, 2018 TOOLS

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Acquiring Big Data to Realize Business Value

Acquiring Big Data to Realize Business Value Acquiring Big Data to Realize Business Value Agenda What is Big Data? Common Big Data technologies Use Case Examples Oracle Products in the Big Data space In Summary: Big Data Takeaways

More information

A Text Information Retrieval Technique for Big Data Using Map Reduce

A Text Information Retrieval Technique for Big Data Using Map Reduce Bonfring International Journal of Software Engineering and Soft Computing, Vol. 6, Special Issue, October 2016 22 A Text Information Retrieval Technique for Big Data Using Map Reduce M.M. Kodabagi, Deepa

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela

More information

Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze

Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze About HBase HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data

More information

BUSINESS INTELLIGENCE FOR EVALUATION E-VOUCHER AIRLINE REPORT

BUSINESS INTELLIGENCE FOR EVALUATION E-VOUCHER AIRLINE REPORT International Journal of Mechanical Engineering and Technology (IJMET) Volume 10, Issue 02, February 2019, pp. 213 220, Article ID: IJMET_10_02_024 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=10&itype=2

More information

CONTENTS 1. ABOUT KUMARAGURU COLLEGE OF TECHNOLOGY 2. MASTER OF COMPUTER APPLICATIONS 3. FACULTY ZONE 4. CO-CURRICULAR / EXTRA CURRICULAR ACTIVITIES

CONTENTS 1. ABOUT KUMARAGURU COLLEGE OF TECHNOLOGY 2. MASTER OF COMPUTER APPLICATIONS 3. FACULTY ZONE 4. CO-CURRICULAR / EXTRA CURRICULAR ACTIVITIES CONTENTS 1. ABOUT KUMARAGURU COLLEGE OF TECHNOLOGY 2. MASTER OF COMPUTER APPLICATIONS 3. FACULTY ZONE 4. CO-CURRICULAR / EXTRA CURRICULAR ACTIVITIES 5. STUDENTS DOMAIN 6. ALUMINI COLUMN 7. GUEST COLUMN

More information

CLIENT DATA NODE NAME NODE

CLIENT DATA NODE NAME NODE Volume 6, Issue 12, December 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Efficiency

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

File Inclusion Vulnerability Analysis using Hadoop and Navie Bayes Classifier

File Inclusion Vulnerability Analysis using Hadoop and Navie Bayes Classifier File Inclusion Vulnerability Analysis using Hadoop and Navie Bayes Classifier [1] Vidya Muraleedharan [2] Dr.KSatheesh Kumar [3] Ashok Babu [1] M.Tech Student, School of Computer Sciences, Mahatma Gandhi

More information

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT MANAGING THE DIGITAL FIRM, 12 TH EDITION Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT VIDEO CASES Case 1: Maruti Suzuki Business Intelligence and Enterprise Databases

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DISTRIBUTED FRAMEWORK FOR DATA MINING AS A SERVICE ON PRIVATE CLOUD RUCHA V. JAMNEKAR

More information

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang International Journal of Innovative Computing, Information and Control ICIC International c 2017 ISSN 1349-4198 Volume 13, Number 3, June 2017 pp. 1037 1046 AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA

More information

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases

More information

INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES

INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES Al-Badarneh et al. Special Issue Volume 2 Issue 1, pp. 200-213 Date of Publication: 19 th December, 2016 DOI-https://dx.doi.org/10.20319/mijst.2016.s21.200213 INDEX-BASED JOIN IN MAPREDUCE USING HADOOP

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can

More information

A Survey on Big Data

A Survey on Big Data A Survey on Big Data D.Prudhvi 1, D.Jaswitha 2, B. Mounika 3, Monika Bagal 4 1 2 3 4 B.Tech Final Year, CSE, Dadi Institute of Engineering & Technology,Andhra Pradesh,INDIA ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Exploiting and Gaining New Insights for Big Data Analysis

Exploiting and Gaining New Insights for Big Data Analysis Exploiting and Gaining New Insights for Big Data Analysis K.Vishnu Vandana Assistant Professor, Dept. of CSE Science, Kurnool, Andhra Pradesh. S. Yunus Basha Assistant Professor, Dept.of CSE Sciences,

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

HBase vs Neo4j. Technical overview. Name: Vladan Jovičić CR09 Advanced Scalable Data (Fall, 2017) Ecolé Normale Superiuere de Lyon

HBase vs Neo4j. Technical overview. Name: Vladan Jovičić CR09 Advanced Scalable Data (Fall, 2017) Ecolé Normale Superiuere de Lyon HBase vs Neo4j Technical overview Name: Vladan Jovičić CR09 Advanced Scalable Data (Fall, 2017) Ecolé Normale Superiuere de Lyon 12th October 2017 1 Contents 1 Introduction 3 2 Overview of HBase and Neo4j

More information

LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS

LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS Vandita Jain 1, Prof. Tripti Saxena 2, Dr. Vineet Richhariya 3 1 M.Tech(CSE)*,LNCT, Bhopal(M.P.)(India) 2 Prof. Dept. of CSE, LNCT, Bhopal(M.P.)(India)

More information

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018 Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/

More information

NOSQL Databases: The Need of Enterprises

NOSQL Databases: The Need of Enterprises International Journal of Allied Practice, Research and Review Website: www.ijaprr.com (ISSN 2350-1294) NOSQL Databases: The Need of Enterprises Basit Maqbool Mattu M-Tech CSE Student. (4 th semester).

More information

Social Network Data Extraction Analysis

Social Network Data Extraction Analysis Journal homepage: www.mjret.in ISSN:2348-6953 Prajakta Kulkarni Social Network Data Extraction Analysis Pratibha Bodkhe Kalyani Hole Ashwini Kondalkar Abstract Now-a-days the use of internet is increased;

More information

Introduction to Data Science

Introduction to Data Science UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics

More information

Transaction Analysis using Big-Data Analytics

Transaction Analysis using Big-Data Analytics Volume 120 No. 6 2018, 12045-12054 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ Transaction Analysis using Big-Data Analytics Rajashree. B. Karagi 1, R.

More information

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG

BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG Prof R.Angelin Preethi #1 and Prof J.Elavarasi *2 # Department of Computer Science, Kamban College of Arts and Science for Women, TamilNadu,

More information

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka Course Curriculum: Your 10 Module Learning Plan Big Data and Hadoop About Edureka Edureka is a leading e-learning platform providing live instructor-led interactive online training. We cater to professionals

More information

Project Requirements

Project Requirements Project Requirements Version 4.0 2 May, 2016 2015-2016 Computer Science Department, Texas Christian University Revision Signatures By signing the following document, the team member is acknowledging that

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

Introduction to Big Data

Introduction to Big Data Introduction to Big Data OVERVIEW We are experiencing transformational changes in the computing arena. Data is doubling every 12 to 18 months, accelerating the pace of innovation and time-to-value. The

More information

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

EXTRACT DATA IN LARGE DATABASE WITH HADOOP International Journal of Advances in Engineering & Scientific Research (IJAESR) ISSN: 2349 3607 (Online), ISSN: 2349 4824 (Print) Download Full paper from : http://www.arseam.com/content/volume-1-issue-7-nov-2014-0

More information

ETL Testing Concepts:

ETL Testing Concepts: Here are top 4 ETL Testing Tools: Most of the software companies today depend on data flow such as large amount of information made available for access and one can get everything which is needed. This

More information

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012 Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information

Certified Big Data and Hadoop Course Curriculum

Certified Big Data and Hadoop Course Curriculum Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation

More information

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

Survey Paper on Traditional Hadoop and Pipelined Map Reduce International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

Cluster Computing Architecture. Intel Labs

Cluster Computing Architecture. Intel Labs Intel Labs Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED

More information

IBM Data Replication for Big Data

IBM Data Replication for Big Data IBM Data Replication for Big Data Highlights Stream changes in realtime in Hadoop or Kafka data lakes or hubs Provide agility to data in data warehouses and data lakes Achieve minimum impact on source

More information

Mining Distributed Frequent Itemset with Hadoop

Mining Distributed Frequent Itemset with Hadoop Mining Distributed Frequent Itemset with Hadoop Ms. Poonam Modgi, PG student, Parul Institute of Technology, GTU. Prof. Dinesh Vaghela, Parul Institute of Technology, GTU. Abstract: In the current scenario

More information

Deploy Hadoop For Processing Text Data To Run Map Reduce Application On A Single Site

Deploy Hadoop For Processing Text Data To Run Map Reduce Application On A Single Site IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Volume 6, PP 27-33 www.iosrjen.org Deploy Hadoop For Processing Text Data To Run Map Reduce Application On A Single Site Shrusti

More information

Contents. Part I Setting the Scene

Contents. Part I Setting the Scene Contents Part I Setting the Scene 1 Introduction... 3 1.1 About Mobility Data... 3 1.1.1 Global Positioning System (GPS)... 5 1.1.2 Format of GPS Data... 6 1.1.3 Examples of Trajectory Datasets... 8 1.2

More information

Performance Analysis of Hadoop Application For Heterogeneous Systems

Performance Analysis of Hadoop Application For Heterogeneous Systems IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. I (May-Jun. 2016), PP 30-34 www.iosrjournals.org Performance Analysis of Hadoop Application

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark Announcements HW2 due this Thursday AWS accounts Any success? Feel

More information

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why

More information

HADOOP FRAMEWORK FOR BIG DATA

HADOOP FRAMEWORK FOR BIG DATA HADOOP FRAMEWORK FOR BIG DATA Mr K. Srinivas Babu 1,Dr K. Rameshwaraiah 2 1 Research Scholar S V University, Tirupathi 2 Professor and Head NNRESGI, Hyderabad Abstract - Data has to be stored for further

More information

MAIN DIFFERENCES BETWEEN MAP/REDUCE AND COLLECT/REPORT PARADIGMS. Krassimira Ivanova

MAIN DIFFERENCES BETWEEN MAP/REDUCE AND COLLECT/REPORT PARADIGMS. Krassimira Ivanova International Journal "Information Technologies & Knowledge" Volume 9, Number 4, 2015 303 MAIN DIFFERENCES BETWEEN MAP/REDUCE AND COLLECT/REPORT PARADIGMS Krassimira Ivanova Abstract: This article presents

More information

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data

More information

Big Data Using Hadoop

Big Data Using Hadoop IEEE 2016-17 PROJECT LIST(JAVA) Big Data Using Hadoop 17ANSP-BD-001 17ANSP-BD-002 Hadoop Performance Modeling for JobEstimation and Resource Provisioning MapReduce has become a major computing model for

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information