Driving New Value from Big Data Investments
1 An Introduction to Using R with Hadoop
Jeffrey Breen, Principal, Think Big Academy
jeffrey.breen@thinkbiganalytics.com
February 2013
Driving New Value from Big Data Investments
2 Leading Provider of Innovative Big Analytics Services
Building Modern Analytics Solutions to Monetize Big Data Investments
- IMAGINE: Strategy and Roadmap
- ILLUMINATE: Training and Education
- IMPLEMENT: Hands-On Data Science and Data Engineering
3 THINK BIG Analytics Methodology
Experiment-Driven Short Projects with Nimble Test Solution Cycles
IMAGINE, ILLUMINATE, IMPLEMENT: Innovation and Value
- Breaking Down Business and IT Barriers
- We Accelerate Your Time to Value
- Discrete Projects with Beginning and End
- Early Releases to Validate ROI and Ensure Long Term Success
4 ILLUMINATE: Training and Education
THINK BIG Analytics: Enable Your IT Staff with New Skills
- Roles: Data Architect, Big Data Monitoring, Database Administrator, Big Data Administrator, Business Analyst, Data Science, Math Modeler, Developers
- Expert Training/Courses, e.g. Hadoop Developer, HBase, Pig and Hive for Modelers
- Joint Application Development
- Side-by-Side Mentoring
- Big Data Engineering
Build Capabilities to Manage Rapid Innovation Needed with Big Data
Invest in and Scale Skills to Create a Data-Driven Organization
5 Agenda
- Why R?
- What is Hadoop?
- Counting words with MapReduce
- Writing MapReduce jobs with RHadoop
- Data Warehousing with Hive
- Big Data Hadoop
- Want to learn more?
- Q&A
7 Revolution Confidential 7
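9 Number of R Packages Available
How many R packages are there now? At the command line enter:
> dim(available.packages())
The first number returned is the number of packages available from your configured repositories (one row per package).
Slide courtesy of John Versotek, organizer of the Boston Predictive Analytics Meetup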
13 Google File System is the Storage
14 MapReduce is the framework
15 Enter Hadoop
About this time, Doug Cutting, the creator of Lucene, was working on Nutch.
16 Nutch Timeline
Year / Topics:
- 2003: Google's GFS paper
- Nutch Distributed File System (NDFS)
- Google's MapReduce paper
- Nutch MapReduce implementation
17 Hadoop Timeline
Year / Topics:
- 2006: NDFS and Nutch MapReduce extracted to a separate Hadoop Apache project
- Hadoop is a top-level Apache project.
- Yahoo! announces 10K core cluster.
18 Hadoop Design Goals
- Optimize disk I/O performance: minimize disk head seeks!
- Redundant data storage and processing to eliminate many kinds of data loss.
- Horizontal scalability.
- Run on commodity, server-class hardware.
19 (Figure: from Jeff Dean, based on Peter Norvig's ...)
20 What is Hadoop?
- An open source project designed to support large-scale data processing
- Inspired by Google's MapReduce-based computational infrastructure
- Composed of several components:
  - Hadoop Distributed File System (HDFS)
  - MapReduce processing framework, job scheduler, etc.
  - Ingest/outgest services (Sqoop, Flume, etc.)
  - Higher-level languages and libraries (Hive, Pig, Cascading, Mahout)
- Written in Java; first opened up to other languages through its Streaming API
  - If your language of choice can handle stdin and stdout, you can use it to write MapReduce jobs
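To make the stdin/stdout contract concrete, here is a minimal sketch of a Hadoop Streaming word count written as a plain R script, with no RHadoop packages involved. It assumes the script is launched once as the mapper (`Rscript wc.R map`) and once as the reducer (`Rscript wc.R reduce`); the file name `wc.R` and the `map`/`reduce` argument convention are hypothetical. Streaming exchanges tab-separated key/value lines and sorts the mapper output by key before the reducer reads it, so the reducer only has to notice when the key changes.

```r
# Hypothetical Hadoop Streaming word count in plain R.

# Mapper: one "word<TAB>1" line per word found in the input lines.
wc_map <- function(lines) {
  words <- as.character(unlist(strsplit(tolower(lines), "[^a-z0-9]+")))
  words <- words[nzchar(words)]   # drop empty tokens left by leading punctuation
  if (length(words) == 0) return(character(0))
  paste0(words, "\t1")
}

# Reducer: input arrives sorted by key, so equal keys are adjacent;
# emit a running total each time the key changes.
wc_reduce <- function(lines) {
  out <- character(0)
  current <- NULL
  total <- 0L
  for (line in lines) {
    kv <- strsplit(line, "\t", fixed = TRUE)[[1]]
    if (!is.null(current) && kv[1] != current) {
      out <- c(out, paste0(current, "\t", total))
      total <- 0L
    }
    current <- kv[1]
    total <- total + as.integer(kv[2])
  }
  if (!is.null(current)) out <- c(out, paste0(current, "\t", total))
  out
}

# Entry point when run by Hadoop Streaming: read stdin, write stdout.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 1 && args[1] %in% c("map", "reduce")) {
  input <- readLines(file("stdin"))
  fn <- if (args[1] == "map") wc_map else wc_reduce
  writeLines(fn(input))
}
```

Without a cluster, the same pipeline can be imitated in a shell with `cat input.txt | Rscript wc.R map | sort | Rscript wc.R reduce`, where `sort` plays the role of Hadoop's sort/shuffle step.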
21 Hadoop cluster components (diagram)
- Primary Master Server: Job Tracker, Name Node
- Secondary Master Server: Secondary Name Node
- Client Servers: Hive, Pig, ...; cron+bash, Azkaban; Sqoop, Scribe; monitoring, management
- Slave Servers: each runs a Task Tracker and a Data Node on several local disks
- Ingest and outgest services move data between the cluster and SQL stores and logs
from Think Big Academy's Hadoop Developer Course
22 Hadoop's distributed file system (diagram)
The same cluster, highlighting the HDFS services: the Name Node on the primary master server (with the Secondary Name Node on the secondary master) and a Data Node on each slave server. Files are stored as 64MB blocks with 3x replication across the slaves' disks.
from Think Big Academy's Hadoop Developer Course
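The 64MB block size and 3x replication on this slide translate directly into storage arithmetic. A small sketch, assuming those two values as defaults (both are configurable per cluster and per file): a file is split into ceiling(size / block size) blocks, and every block is stored three times on different Data Nodes. The function name is hypothetical.

```r
# Estimate how HDFS lays out one file, assuming 64MB blocks and 3x replication.
hdfs_footprint <- function(file_bytes, block_bytes = 64 * 1024^2, replication = 3) {
  blocks <- ceiling(file_bytes / block_bytes)   # the last block may be only partly full
  list(
    blocks = blocks,
    block_replicas = blocks * replication,      # block copies spread across Data Nodes
    raw_bytes = file_bytes * replication        # a partial last block uses only its actual size
  )
}

one_gb <- hdfs_footprint(1024^3)
one_gb$blocks          # 16
one_gb$block_replicas  # 48
```

The large block size is what serves the "minimize disk head seeks" goal from slide 18: a mapper reading one block streams 64MB sequentially instead of seeking between many small pieces.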
24 True confession: I was wrong about MapReduce
- When the Google paper was published in 2004, I was running a typical enterprise IT department
- Big hardware (Sun, EMC) + big applications (Siebel, PeopleSoft) + big databases (Oracle, SQL Server) = big licensing & support costs
- Loved the scalability, COTS components, and price, but missed the fact that keys (and values) could be compound & complex...
- ...and examples like Wordcount didn't help!
Source: Hadoop: The Definitive Guide, Second Edition, p
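The point about compound keys and values is easier to see with something other than word counts. A minimal base-R sketch using a few made-up flight records (hypothetical values, for illustration only): the map step emits a compound key (year, carrier) and a compound value (delay, 1), so the reduce step can sum both parts and produce a mean delay, which a single summed count could not.

```r
# Made-up records, for illustration only.
flights <- data.frame(
  year    = c(2012, 2012, 2013, 2013, 2013),
  carrier = c("AA", "UA", "AA", "AA", "UA"),
  delay   = c(10, 5, 0, 20, 15)
)

# Map: compound key "year|carrier"; compound value (delay, 1) for each record.
keys   <- paste(flights$year, flights$carrier, sep = "|")
values <- cbind(delay = flights$delay, n = 1)

# Shuffle: group the value rows by key. Reduce: sum each value column per key.
groups <- split(seq_len(nrow(values)), keys)
totals <- t(vapply(groups, function(rows) colSums(values[rows, , drop = FALSE]), numeric(2)))

mean_delay <- totals[, "delay"] / totals[, "n"]
mean_delay
```

Carrying the count inside the value, rather than computing means in each reducer call, keeps the reduce step a plain sum, which stays correct however the values for one key are split across calls.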
25 (Diagram: Input -> Mappers -> Sort, Shuffle -> Reducers -> Output)
Input:
  Hadoop uses MapReduce
  There is a Map phase
  There is a Reduce phase
We need to convert the Input into the Output.
Output:
  a 2
  hadoop 1
  is 2
  map 1
  mapreduce 1
  phase 2
  reduce 1
  there 2
  uses 1
from Think Big Academy's Hadoop Developer Course
Copyright Think Big Analytics, All Rights Reserved
26 (Diagram: Input -> Mappers) Each input line is handed to a mapper as an (N, "text") pair:
  (N, "Hadoop uses MapReduce")
  (N, "There is a Map phase")
  (N, "There is a Reduce phase")
from Think Big Academy's Hadoop Developer Course
27 (Diagram: Input -> Mappers) Each mapper emits a (word, 1) pair for every word in its line:
  (N, "Hadoop uses MapReduce") -> (hadoop, 1) (uses, 1) (mapreduce, 1)
  (N, "There is a Map phase") -> (there, 1) (is, 1) (a, 1) (map, 1) (phase, 1)
  (N, "There is a Reduce phase") -> (there, 1) (is, 1) (a, 1) (reduce, 1) (phase, 1)
from Think Big Academy's Hadoop Developer Course
29 (Diagram: Input -> Mappers -> Sort, Shuffle -> Reducers) Each mapper's output is partitioned by key range, one range per reducer (0-9, a-l | m-q | r-z):
  Mapper 1: (hadoop, 1) | (mapreduce, 1) | (uses, 1)
  Mapper 2: (is, 1), (a, 1) | (map, 1), (phase, 1) | (there, 1)
  Mapper 3: (is, 1), (a, 1) | (phase, 1) | (there, 1), (reduce, 1)
from Think Big Academy's Hadoop Developer Course
30 (Diagram: Input -> Mappers -> Sort, Shuffle -> Reducers) Each reducer receives its keys in sorted order, with all the values for a key grouped together:
  0-9, a-l: (a, [1,1]), (hadoop, [1]), (is, [1,1])
  m-q: (map, [1]), (mapreduce, [1]), (phase, [1,1])
  r-z: (reduce, [1]), (there, [1,1]), (uses, [1])
from Think Big Academy's Hadoop Developer Course
31 (Diagram: Input -> Mappers -> Sort, Shuffle -> Reducers -> Output) Each reducer sums the values for each key and writes one output line per key:
  0-9, a-l: a 2, hadoop 1, is 2
  m-q: map 1, mapreduce 1, phase 2
  r-z: reduce 1, there 2, uses 1
from Think Big Academy's Hadoop Developer Course
32 (Complete diagram, as in slide 31.)
Map: Transform one input into 0-N outputs.
Reduce: Collect multiple inputs into one output.
from Think Big Academy's Hadoop Developer Course
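The word-count walkthrough in slides 25 to 32 can be replayed in memory. A minimal base-R sketch, assuming three reducers partitioned by the first character of the key exactly as in the slides (0-9 and a-l, m-q, r-z); Hadoop's default partitioner hashes the key instead, so this range partitioner is for illustration only. All function names are hypothetical.

```r
# Which of the three reducers a key goes to, using the slides' key ranges.
partition_of <- function(key) {
  first <- substr(key, 1, 1)
  if (grepl("[0-9a-l]", first)) 1L else if (grepl("[m-q]", first)) 2L else 3L
}

# Map: each input line becomes 0..N (word, 1) pairs.
map_phase <- function(lines) {
  words <- as.character(unlist(strsplit(tolower(lines), "[^a-z0-9]+")))
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1L, length(words)), stringsAsFactors = FALSE)
}

# Sort/shuffle: route pairs to reducers, then group values by key,
# e.g. reducer 1 receives a -> c(1, 1), hadoop -> 1, is -> c(1, 1).
shuffle_phase <- function(pairs) {
  reducer <- vapply(pairs$key, partition_of, integer(1))
  lapply(split(pairs, reducer), function(p) split(p$value, p$key))
}

# Reduce: collapse each key's list of values into one output.
reduce_phase <- function(per_reducer) {
  unlist(unname(lapply(per_reducer, function(groups) vapply(groups, sum, integer(1)))))
}

lines  <- c("Hadoop uses MapReduce", "There is a Map phase", "There is a Reduce phase")
counts <- reduce_phase(shuffle_phase(map_phase(lines)))
counts
```

Because each key is routed to exactly one reducer, the three reducers can run independently and their outputs simply concatenate, which is why the slide's Output column is just the three reducer outputs stacked.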
34 Enter RHadoop
RHadoop is an open source project sponsored by Revolution Analytics
Package overview:
- rmr2: all MapReduce-related functions
- rhdfs: interaction with Hadoop's HDFS file system
- rhbase: access to the NoSQL HBase database
rmr2 uses Hadoop's Streaming API to allow R users to write MapReduce jobs in R
- handles all of the I/O and job submission for you (no while(<stdin>)-like loops!)
35 RHadoop Advantages
Modular
- Packages group similar functions
- Only load (and learn!) what you need
- Minimizes prerequisites and dependencies
Open Source
- Cost: low (no) barrier to start using
- Transparency: development, issue tracker, wiki, etc. hosted on GitHub
Supported
- Sponsored by Revolution Analytics
- Training & professional services available
- Support available with Revolution R Enterprise subscriptions
36 wordcount: code
library(rmr2)

# map: split each line of text into words and emit a (word, 1) pair per word
map = function(k, lines) {
  words.list = strsplit(lines, '\\s')
  words = unlist(words.list)
  return( keyval(words, 1) )
}

# reduce: sum all of the counts collected for one word
reduce = function(word, counts) {
  keyval(word, sum(counts))
}

# wordcount: submit the job; returns the location of the results in HDFS
wordcount = function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}
from Revolution Analytics' Getting Started with RHadoop course
37 wordcount: submit job and fetch results
Submit job:
> hdfs.root = 'wordcount'
> hdfs.data = file.path(hdfs.root, 'data')
> hdfs.out = file.path(hdfs.root, 'out')
> out = wordcount(hdfs.data, hdfs.out)
Fetch results from HDFS:
> results = from.dfs( out )
> results.df = as.data.frame(results, stringsAsFactors=FALSE )
> colnames(results.df) = c('word', 'count')
> head(results.df)
       word count
1 greatness     2
2    damned     3
3       tis     5
4      jade     1
5  magician     1
from Revolution Analytics' Getting Started with RHadoop course
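head() on the fetched results lists words in whatever order the reducers wrote them. To see the most frequent words instead, sort the data frame by count. A small sketch that uses only the five rows shown on this slide as stand-in data; on a real run, results.df would come from from.dfs() and as.data.frame() as above.

```r
# The five rows shown on the slide, standing in for the full from.dfs() result.
results.df <- data.frame(
  word  = c("greatness", "damned", "tis", "jade", "magician"),
  count = c(2, 3, 5, 1, 1),
  stringsAsFactors = FALSE
)

# Most frequent first; ties broken alphabetically by word.
top <- results.df[order(-results.df$count, results.df$word), ]
head(top, 3)
```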
38 Code notes
Scalable
- Hadoop and MapReduce abstract away system details
- Code runs on 1 node or 1,000 nodes without modification
Portable
- You write normal R code, interacting with normal R objects
- RHadoop's rmr2 library abstracts away Hadoop details
- All the functionality you expect is there, including Enterprise R's
Flexible
- Only the mapper deals with the data directly
- All components communicate via key-value pairs
- Key-value schema chosen for each analysis rather than as a prerequisite to loading data into the system
39 rmr2 Function Overview
Convenience
- keyval(): creates a key-value pair from any two R objects. Used to generate output from input formatters, mappers, reducers, etc.
Input/output
- from.dfs(), to.dfs(): read/write data from/to HDFS
- make.input.format(): provides common file parsing (text, CSV) or wraps a user-supplied function
Job execution
- mapreduce(): submits a job and returns an HDFS path to the results if successful
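To experiment with the map/reduce contract without a cluster, the flow that mapreduce() manages can be modeled in a few lines of base R. This is a hypothetical stand-in, not rmr2's implementation: toy_keyval() and toy_mapreduce() are invented names that only mimic the idea that a map function receives keys and values and returns key-value pairs, and a reduce function receives one key together with all of its values. (rmr2 itself also provides a local backend for testing, selected with rmr.options(backend = "local").)

```r
# Hypothetical stand-ins, NOT the rmr2 API.
toy_keyval <- function(key, val) list(key = key, val = val)

toy_mapreduce <- function(input, map, reduce) {
  mapped <- map(input$key, input$val)
  # Recycle values to the number of keys, so keyval(words, 1) pairs every word with 1.
  vals <- rep_len(mapped$val, length(mapped$key))
  groups <- split(vals, mapped$key)   # shuffle: all values for one key together
  outs <- lapply(names(groups), function(k) reduce(k, groups[[k]]))
  toy_keyval(unlist(lapply(outs, `[[`, "key")), unlist(lapply(outs, `[[`, "val")))
}

# The wordcount map and reduce functions from slide 36, written against toy_keyval().
map <- function(k, lines) {
  words <- unlist(strsplit(lines, '\\s'))
  toy_keyval(words, 1)
}
reduce <- function(word, counts) toy_keyval(word, sum(counts))

res <- toy_mapreduce(toy_keyval(NULL, c("to be or", "not to be")), map, reduce)
data.frame(word = res$key, count = res$val)
```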
40 rhdfs function overview
File & directory manipulation
- hdfs.ls(), hdfs.list.files()
- hdfs.delete(), hdfs.del(), hdfs.rm()
- hdfs.dircreate(), hdfs.mkdir()
- hdfs.chmod(), hdfs.chown(), hdfs.file.info()
- hdfs.exists()
Copying, moving & renaming files to/from/within HDFS
- hdfs.copy(), hdfs.move(), hdfs.rename()
- hdfs.put(), hdfs.get()
Reading files directly from HDFS
- hdfs.file(), hdfs.read(), hdfs.write(), hdfs.flush()
- hdfs.seek(), hdfs.tell(con), hdfs.close()
- hdfs.line.reader(), hdfs.read.text.file()
Misc.
- hdfs.init(), hdfs.defaults()
41 rhbase function overview
Initialization
- hb.init()
Create and manage tables
- hb.list.tables(), hb.describe.table()
- hb.new.table(), hb.delete.table()
Read and write data
- hb.insert(), hb.insert.data.frame()
- hb.get(), hb.get.data.frame(), hb.scan()
- hb.delete()
Administrative, etc.
- hb.defaults(), hb.set.table.mode()
- hb.regions.table(), hb.compact.table()
42 Big Data Warehousing with Hive
- Hive supplies a SQL-like query language, very familiar to those with relational database experience
- But Hive compiles, optimizes, and executes these queries as MapReduce jobs on the Hadoop cluster
- Can be used in conjunction with other Hadoop jobs, such as those written with rmr2
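43 Hive architecture & access
(Diagram: clients such as a terminal, a browser, or R via RODBC, RJDBC, etc. reach Hive through the CLI, the HWI, or JDBC/ODBC via the Thrift Server. The Driver (compiles, optimizes, executes) consults the Metastore and runs jobs on Hadoop: the Job Tracker and Name Node on the master, plus DFS.)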
44 Accessing Hive via ODBC/JDBC
library(RJDBC)

# set the classpath to include the JDBC driver location, plus commons-logging
[...]
class.path = c(hive.class.path, commons.class.path)
drv = JDBC("org.apache.hadoop.hive.jdbc.HiveDriver", classpath=class.path, "`")

# make a connection to the running Hive Server:
conn = dbConnect(drv, "jdbc:hive://localhost:10000/default")

# setting the database name in the URL doesn't help,
# so issue 'use databasename' command:
res = dbSendQuery(conn, 'use mydatabase')

# submit the query and fetch the results as a data.frame:
df = dbGetQuery(conn, 'SELECT name, sub FROM employees LATERAL VIEW explode(subordinates) subview AS sub')
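The query on this slide relies on Hive's LATERAL VIEW explode(), which turns an array column into one row per array element. A base-R sketch of the same reshaping on hypothetical sample data (the employee names and subordinates arrays are invented for illustration): each employee is repeated once per subordinate, and an employee with an empty array produces no rows, as with Hive's default (non-OUTER) lateral view.

```r
# Hypothetical employees table with an array column of subordinates.
name         <- c("John Doe", "Mary Smith", "Todd Jones")
subordinates <- list(c("Mary Smith", "Todd Jones"), "Bill King", character(0))

# explode(): repeat each row once per array element, then flatten the arrays.
n <- lengths(subordinates)
exploded <- data.frame(
  name = rep(name, n),
  sub  = unlist(subordinates),
  stringsAsFactors = FALSE
)
exploded
```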
45 Other ways to use R and Hadoop
HDFS
- Revolution R Enterprise can read and write files directly on the distributed file system
- Files can include ScaleR's XDF-formatted data sets
MapReduce
- Many other R packages have been written to use R and Hadoop together, including RHIPE, segue, Oracle's R Connector for Hadoop, etc.
Hive
- Hadoop Streaming is also available for Hive to leverage functionality external to Hadoop and Java
- RHive leverages Rserve to connect the two
46 Big Data Hadoop
NoSQL databases offer low-latency, random access to key-value data:
- HBase
- Cassandra
- CouchDB
- MongoDB
- Accumulo
Next week, Think Big's Douglas Moore will be presenting at the Boston Storm Meetup:
- Predictive Analytics with Storm, Hadoop, R and AWS
47 Want to learn more?
Upcoming public Getting Started with RHadoop 1-day classes
- Hands-on examples and exercises covering rhdfs, rhbase, and rmr2
- Algorithms and data include wordcount, analysis of airline flight data, and collaborative filtering using structured and unstructured data from text, CSV files and Twitter
- February 25, Palo Alto, CA
- March 13, Boston, MA
- 25% off with user discount
Revolution Analytics Quick Start Program for Hadoop
- Private Getting Started with RHadoop training
- Onsite consulting assistance for initial use case
- Revolution R for Hadoop licenses and support
- More
More informationBig Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture
Big Data Syllabus Hadoop YARN Setup Programming in YARN framework j Understanding big data and Hadoop Big Data Limitations and Solutions of existing Data Analytics Architecture Hadoop Features Hadoop Ecosystem
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationOverview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::
Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional
More informationOracle Big Data. A NA LYT ICS A ND MA NAG E MENT.
Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More informationPROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.
PROFESSIONAL NoSQL Shashank Tiwari WILEY John Wiley & Sons, Inc. Examining CONTENTS INTRODUCTION xvil CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT 3 Definition and Introduction 4 Context and a Bit
More informationHadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop
Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce
More informationBIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG
BIG DATA ANALYTICS USING HADOOP TOOLS APACHE HIVE VS APACHE PIG Prof R.Angelin Preethi #1 and Prof J.Elavarasi *2 # Department of Computer Science, Kamban College of Arts and Science for Women, TamilNadu,
More information50 Must Read Hadoop Interview Questions & Answers
50 Must Read Hadoop Interview Questions & Answers Whizlabs Dec 29th, 2017 Big Data Are you planning to land a job with big data and data analytics? Are you worried about cracking the Hadoop job interview?
More informationSOLUTION TRACK Finding the Needle in a Big Data Innovator & Problem Solver Cloudera
SOLUTION TRACK Finding the Needle in a Big Data Haystack @EvaAndreasson, Innovator & Problem Solver Cloudera Agenda Problem (Solving) Apache Solr + Apache Hadoop et al Real-world examples Q&A Problem Solving
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationCertified Big Data Hadoop and Spark Scala Course Curriculum
Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of indepth theoretical knowledge and strong practical skills
More informationImporting and Exporting Data Between Hadoop and MySQL
Importing and Exporting Data Between Hadoop and MySQL + 1 About me Sarah Sproehnle Former MySQL instructor Joined Cloudera in March 2010 sarah@cloudera.com 2 What is Hadoop? An open-source framework for
More informationIntroduction to Big Data
Introduction to Big Data OVERVIEW We are experiencing transformational changes in the computing arena. Data is doubling every 12 to 18 months, accelerating the pace of innovation and time-to-value. The
More informationBig Data Hadoop Course Content
Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux
More informationCopyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle NoSQL Database: Release 3.0 What s new and why you care Dave Segleau NoSQL Product Manager The following is intended to outline our general product direction. It is intended for information purposes
More informationScaling Up 1 CSE 6242 / CX Duen Horng (Polo) Chau Georgia Tech. Hadoop, Pig
CSE 6242 / CX 4242 Scaling Up 1 Hadoop, Pig Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le
More informationProcessing big data with modern applications: Hadoop as DWH backend at Pro7. Dr. Kathrin Spreyer Big data engineer
Processing big data with modern applications: Hadoop as DWH backend at Pro7 Dr. Kathrin Spreyer Big data engineer GridKa School Karlsruhe, 02.09.2014 Outline 1. Relational DWH 2. Data integration with
More informationOracle Big Data Fundamentals Ed 1
Oracle University Contact Us: +0097143909050 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data
More informationBig Data Development HADOOP Training - Workshop. FEB 12 to (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI
Big Data Development HADOOP Training - Workshop FEB 12 to 16 2017 (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI ISIDUS TECH TEAM FZE PO Box 9798 Dubai UAE, email training-coordinator@isidusnet M: +97150
More informationData Lake Based Systems that Work
Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a
More informationQuestion: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig?
Volume: 72 Questions Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? A. update hdfs set D as./output ; B. store D
More informationA BigData Tour HDFS, Ceph and MapReduce
A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationBig Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure September 26, 2013 2 Who am I? According to ZDNet: "The work they have done to improve the Apache Hive data warehouse system
More informationOracle Data Integrator 12c: Integration and Administration
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive
More informationHortonworks Certified Developer (HDPCD Exam) Training Program
Hortonworks Certified Developer (HDPCD Exam) Training Program Having this badge on your resume can be your chance of standing out from the crowd. The HDP Certified Developer (HDPCD) exam is designed for
More informationDatabases 2 (VU) ( / )
Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:
More informationA Review Paper on Big data & Hadoop
A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College
More informationSpagoBI and Talend jointly support Big Data scenarios
SpagoBI and Talend jointly support Big Data scenarios Monica Franceschini - SpagoBI Architect SpagoBI Competency Center - Engineering Group Big-data Agenda Intro & definitions Layers Talend & SpagoBI SpagoBI
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationData Analysis Using MapReduce in Hadoop Environment
Data Analysis Using MapReduce in Hadoop Environment Muhammad Khairul Rijal Muhammad*, Saiful Adli Ismail, Mohd Nazri Kama, Othman Mohd Yusop, Azri Azmi Advanced Informatics School (UTM AIS), Universiti
More information