Hadoop. Course Duration: 25 days (60 hours).


Bigdata Fundamentals

Day 1: (2 hours)
1. Understanding Big Data
   a. What is Big Data?
   b. Big Data characteristics
   c. Challenges with traditional database systems and distributed systems
2. Hadoop distributions
   a. Hortonworks
   b. Cloudera
   c. Pivotal HD
   d. Greenplum

Day 2: (2 hours)
3. Introduction to Apache Hadoop
   a. Flavors of Hadoop: BigInsights, Google Query, etc.
4. Ecosystem components: introduction
   a. MapReduce
   b. HDFS
   c. Apache Pig
   d. Apache Hive
   e. HBase
   f. Apache Oozie
   g. Flume
   h. Sqoop
   i. Spark
   j. Kafka
   k. Crunch

Day 3: (2 hours)
5. Understanding a Hadoop cluster
6. Core components
   a. NameNode
   b. JobTracker
   c. TaskTracker
   d. DataNode
   e. SecondaryNameNode
7. HDFS architecture
   a. Why 64 MB?
   b. Why blocks?
   c. Why replication factor 3?
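A minimal sketch of the arithmetic behind the HDFS architecture questions above (block size and replication factor). It assumes the 64 MB block size and replication factor 3 named in the outline; the function name `hdfs_footprint` is hypothetical and is not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 64       # block size named in the outline
REPLICATION_FACTOR = 3   # replication factor named in the outline


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block_replicas, raw_storage_mb) for a single file.

    blocks:          number of blocks the file is split into
    block_replicas:  total block copies stored across DataNodes
    raw_storage_mb:  disk space consumed once every block is replicated
                     (the last block only occupies the bytes it holds)
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    block_replicas = blocks * replication
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


if __name__ == "__main__":
    # A 1 GB (1024 MB) file with the default settings above.
    print(hdfs_footprint(1024))
    # The same file with a 4 KB block size, typical of a local file system,
    # to show how many more blocks would have to be tracked.
    print(hdfs_footprint(1024, block_size_mb=4 / 1024))
```

With large blocks, a 1 GB file is tracked as 16 blocks instead of 262,144, which keeps the block metadata held by the NameNode small; replication triples the raw storage in exchange for tolerating the loss of a node.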

Day 4: (2 hours)
8. Discuss NameNode and DataNode
9. Discuss JobTracker and TaskTracker
10. Typical workflow of a Hadoop application
11. Rack awareness
   a. Network topology
   b. Assignment of blocks to racks and nodes
   c. Block reports
   d. Heartbeat
   e. Block management service

Day 5: (4 hours)
12. Anatomy of a file write
13. Anatomy of a file read
14. Heartbeats and block reports
15. MapReduce overview
16. Cluster configuration
   a. core-default.xml
   b. hdfs-default.xml
   c. mapred-default.xml
   d. yarn-site.xml
   e. -env.sh
   f. slaves
   g. masters
17. MapReduce framework
18. Why MapReduce?
19. Use cases where MapReduce is used
20. YARN architecture
21. Classic vs. YARN
22. YARN demo

Day 6: (2 hours)
23. MR practicals
   a. Set up the environment for the programs
   b. Possible ways of writing a MapReduce program, with sample code; find the best code and discuss
   c. Usage of Configured, Tool, GenericOptionsParser, and queues
24. Limitations of the traditional way of solving word count with a large dataset
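As an illustration of the word count problem named above, the following is a minimal in-memory sketch of the map, shuffle/sort, and reduce steps. It is plain Python rather than the Hadoop Java API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical; a real job would distribute each phase across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

The traditional single-process approach must hold the whole input and the whole count table on one machine. In the MapReduce version each mapper sees only its own input split and each reducer only its own keys, so the work can be spread over many nodes.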

Day 7: (2 hours)
25. MapReduce way of solving the problem
26. Complete overview of MapReduce
27. Unit testing of MapReduce programs using the JUnit and MRUnit frameworks
28. Challenges in testing and the options available
29. Manual testing of MapReduce programs

Day 8: (2 hours)
30. Split size
31. Combiners
32. Multiple reducers
33. Parts of MapReduce
34. Shuffle, sort, and merge phases
35. MapReduce design patterns

Day 9: (2 hours)
1. Cloudera Distribution of Hadoop (CDH) VM setup
2. HDFS practicals (HDFS commands)
3. MapReduce anatomy
   a. Job submission
   b. Job initialization
   c. Task assignment
   d. Task execution

Day 10: (4 hours)
4. Schedulers
5. MapReduce failure scenarios
6. Speculative execution
7. Sequence file
8. Input file formats
9. Output file formats
10. Writable data types
11. Custom input formats
12. Example list: show and run examples in MapReduce
13. Debugging MapReduce programs
14. Error tracing of MapReduce programs
15. Discussion of the most common issues in MR
16. Calculating the stats of MR programs

Day 11: (2 hours)
MapReduce advanced concepts with use cases (hands-on):
17. Partitioning and custom partitioner
18. Joins

19. Multiple outputs
20. Counters
21. MR unit test cases
22. MR design patterns
23. Distributed cache
   a. Command-line implementation
24. MapReduce API implementation

Day 12: (2 hours)
Sqoop:
1. Sqoop theory
2. Demo for Sqoop and practicals
3. Sqoop imports and exports
4. Sqoop tuning

Day 13: (2 hours)
Hive:
1. Hive background
2. What is Hive?
3. Where to use Hive?
4. Hive architecture
5. Metastore
6. Hive execution modes
7. Managed and external tables

Day 14: (2 hours)
8. Hive partitioning
9. Hive bucketing
10. Hive data model
11. Hive data types
   a. Primitive
   b. Complex
12. Queries
   a. Create managed table
   b. Load data
   c. Insert overwrite table
   d. Insert into local directory
   e. CTAS
   f. Insert overwrite table select
13. Joins
   a. Inner joins
   b. Outer joins

   c. Skew joins

Day 15: (4 hours)
14. Hive Sort By, Order By
15. Multi-table inserts
16. Multiple files, directories, table inserts
17. SerDe
   a. RegexSerDe
   b. AvroSerDe
18. Storing in SequenceFile and ORC file formats
19. UDF
20. Hive through CLI, batch, and Hue
21. Hive practicals and use cases
22. Hive configuration and hive-site.xml
23. Optimizing Hive queries
24. Best practices in Hive
25. Debugging Hive scripts and error tracing
   a. Common issues in Hive
   b. Hive optimization techniques and best practices

Day 16: (2 hours)
Pig:
1. Need for Pig
2. Why was Pig created?
3. Introduction to skew join
4. Why go for Pig when MapReduce is there?
5. Pig use cases
6. Pig built-in operators
7. Pig store schema

Day 17: (2 hours)
8. Operators
   a. Load
   b. Store
   c. Dump
   d. Filter
   e. Distinct
   f. Group
   g. CoGroup
   h. Join
   i. Stream
   j. Foreach Generate
   k. Parallel
   l. Distinct

   m. Limit
   n. Order
   o. Cross
   p. Union
   q. Split
   r. Sampling

Day 18: (2 hours)
9. Dump vs. Store
10. Data types
   a. Complex
      i. Bag
      ii. Tuple
      iii. Atom
      iv. Map
   b. Primitives
      i. Integers
      ii. Float
      iii. Chararray
      iv. Bytearray
      v. Double

Day 19: (2 hours)
11. Diagnostic operators
   a. Describe
   b. Explain
   c. Illustrate
12. UDFs
13. Physical and logical execution plans
14. Storage handlers
15. Pig practicals and use cases
16. Pig vs. Hive
17. Testing suites for Pig, such as Pig Lipstick and Pig Penny, for debugging and error tracing
18. Data loading using the HCatalog loader
19. Pig debugging using the Explain and Illustrate commands
20. Pig stats

Day 20: (4 hours)
Impala:
1. Impala architecture
2. Impala daemon (impalad)
3. Impala StateStore
4. Impala Catalog

5. MPP architecture
6. Impala practicals
7. Ad hoc querying in Impala
8. Impala integration with Hive

Day 21: (2 hours)
File Formats:
1. Sequence file
2. Avro file
3. ORC file format
4. Parquet file format
5. Storing Hive data in these file formats
6. Comparing file formats
7. Compression techniques such as Snappy, LZO, bgzip, etc.
8. Avro schemas
Comparing Big Data execution engines (Tez, MR, DAG, RDD, MPP)

Day 22: (2 hours)
Introduction to NoSQL Databases:
1. Problems with RDBMS
2. Row-oriented vs. column-oriented
3. Introduction to NoSQL DBs
4. CAP theorem

Day 23: (2 hours)
HBase:
1. Introduction to NoSQL databases
2. NoSQL landscape
3. Introduction to HBase
4. HBase vs. RDBMS
5. Create a table in HBase using the HBase shell
6. Where to use HBase?
7. Where not to use HBase?
8. Write files to HBase
9. Major components of HBase
   a. HBase Master
   b. HRegionServer
   c. HBase client
   d. ZooKeeper
   e. Region
10. Compactions
11. HBase practicals

12. Bulk loads
13. HBase command line
14. Using MapReduce for HBase operations
15. HBase Java client programming

Day 24: (2 hours)
Flume:
1. Flume architecture
2. Real-time streaming in Flume
3. Defining and deploying Flume agents
4. Access data from multiple sources to collectors
5. Different types of channels
6. Configuring Flume agents
7. Running a use case

Day 25: (4 hours)
Kafka:
1. Learn how to develop game-changing real-time applications
2. Master Kafka and its components
3. Understand the architecture of Kafka
4. Install Kafka on a single node as well as on a multi-node cluster
5. Configure consumers, producers, and brokers
6. Perform various Kafka operations such as adding, removing, and modifying topics

Oozie:
1. Oozie architecture
2. Workflow design in Oozie
3. Scheduling workflows in Oozie
4. Oozie practicals
5. Automate the testing process using Oozie