Big Data Hadoop Course Content


Topics covered in the training

Introduction to Linux and Big Data Virtual Machine (VM)
- Introduction/Installation of VirtualBox and the Big Data VM
- Introduction to Linux
  - Why Linux?
  - Windows and the Linux equivalents
  - Different flavors of Linux
  - Unity Shell (Ubuntu UI)
  - Basic Linux Commands (enough to get started with Hadoop)

Understanding Big Data
- 3V (Volume-Variety-Velocity) characteristics
- Structured and Unstructured Data
- Application and use cases of Big Data
- Limitations of traditional large scale systems
- How a distributed way of computing is superior (cost and scale)
- Opportunities and challenges with Big Data

HDFS (The Hadoop Distributed File System)
- HDFS Overview and Architecture
- Deployment Architecture
- Name Node, Data Node and Checkpoint Node (aka Secondary Name Node)
- Safe mode
- HDFS Data Flows (Read v/s Write)
- How HDFS addresses fault tolerance?
  - CRC Checksum
  - Data replication
  - Rack awareness and Block placement policy
- Small files problem
- HDFS Interfaces
  - Command Line Interface
  - File System
  - Administrative
  - Web Interface
- Advanced HDFS features
  - Load Balancer
  - DistCp
  - HDFS Federation
  - HDFS High Availability
  - Hadoop Archives

Map Reduce - 1 (Theoretical Concepts)
- MapReduce overview
- Functional Programming paradigms
- How to think in a MapReduce way?

- MapReduce Architecture
- Legacy MR v/s Next Generation MapReduce (aka YARN/MRv2)
- Slots v/s Containers
- Schedulers
- Shuffling, Sorting
- Hadoop Data Types
- Input and Output Formats
- Input Splits
- Partitioning (Hash Partitioner v/s Custom Partitioner)
- Distributed Cache
- MR Algorithm and Data Flow
  - Word Count
- Alternatives to MR
  - BSP (Bulk Synchronous Parallel)
  - Adhoc querying
  - Graph Computing Engines
- Saving Binary Data using SequenceFiles and Avro Files
- Hadoop Streaming (developing and debugging non-Java MR programs - Ruby and Python)
- Optimization techniques
  - Speculative execution
  - Combiners
  - JVM Reuse
  - Compression
- MR algorithms (Non-graph)
  - Sorting
  - Term Frequency - Inverse Document Frequency
  - Student Data Base
  - Max Temperature
  - Different ways of joining data
  - Word Co-Occurrence

Map Reduce - 2 (Practice)
- Developing, debugging and deploying MR programs
  - Stand alone mode (in Eclipse)
  - Pseudo distributed mode (as in the Big Data VM)
  - Fully distributed mode (as in Production)
- MR API
  - Old and the new MR API
  - Java Client API
  - Hadoop data types and custom Writable / WritableComparables
  - Different input and output formats
- MR algorithms (Graph)
  - PageRank
  - Inverted Index

Higher Level Abstractions for MR (Pig)
- Introduction and Architecture
- Different Modes of executing Pig constructs
- Data Types
- Dynamic invokers
- Pig streaming
- Macros
- Pig Latin language Constructs (LOAD, STORE, DUMP, SPLIT, etc.)
- User Defined Functions
- Use Cases
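As an illustration of the Word Count data flow listed under the MapReduce topics above, here is a minimal sketch in plain Python (not Hadoop code) that walks through the map, partition, shuffle/sort and reduce steps in a single process. The function names and the use of `zlib.crc32` as the hash function are assumptions made for this example; a real Hadoop job would use the framework's own partitioner and run the phases across a cluster.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import zlib


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Hash partitioner: choose the reducer bucket that receives this key."""
    # crc32 is an illustrative stand-in that gives a stable hash across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(word, counts):
    """Reducer: sum all the counts that were shuffled to this word."""
    return word, sum(counts)


def word_count(lines, num_reducers=2):
    # Map every line, then route each pair to a reducer bucket (the shuffle).
    buckets = defaultdict(list)
    for line in lines:
        for word, count in map_phase(line):
            buckets[partition(word, num_reducers)].append((word, count))

    results = {}
    for pairs in buckets.values():
        # Sort by key so equal keys sit next to each other before reducing.
        pairs.sort(key=itemgetter(0))
        for word, group in groupby(pairs, key=itemgetter(0)):
            key, total = reduce_phase(word, (c for _, c in group))
            results[key] = total
    return results


if __name__ == "__main__":
    sample = ["big data hadoop", "hadoop map reduce", "big big data"]
    print(word_count(sample))
```

Because every occurrence of a word hashes to the same bucket, each reducer sees all the counts for its keys, which is the property a custom partitioner must also preserve.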

Higher Level Abstractions for MR (Hive)
- Introduction and Architecture
- Different Modes of executing Hive queries
- Metastore Implementations
- HiveQL (DDL & DML Operations)
- External v/s Managed Tables
- Views
- Partitions & Buckets
- User Defined Functions
- Transformations using Non-Java
- Use Cases
- Comparison of Pig and Hive

NoSQL Databases - 1 (Theoretical Concepts)
- NoSQL Concepts
  - Review of RDBMS
  - Need for NoSQL
  - Brewer's CAP Theorem
  - ACID v/s BASE
  - Schema on Read vs. Schema on Write
  - Different levels of consistency
  - Bloom filters
- Different types of NoSQL databases
  - Key Value
  - Columnar
  - Document
  - Graph
- Columnar Databases concepts

NoSQL Databases - 2 (Practice)
- HBase Architecture
  - Master and the Region Server
  - Catalog tables (ROOT and META)
  - Major and Minor compaction
- HBase v/s Cassandra
- Interfaces to HBase (for DDL and DML operations)
  - Java API
  - Client API
  - Filters
  - Scan Caching and Batching
  - Command Line Interface
  - REST API
- Advanced HBase Features
  - HBase Data Modeling
  - Bulk loading data in HBase
  - HBase Coprocessors - EndPoints (similar to Stored Procedures in RDBMS)
  - HBase Coprocessors - Observers (similar to Triggers in RDBMS)

Spark
- Introduction to RDD
- Installation and Configuration of Spark
- Spark Architecture
- Different interfaces to Spark
- Sample Python programs in Spark

Setting up a Hadoop Cluster using Apache Hadoop
- Cloudera Hadoop cluster on the Amazon Cloud (Practice)
  - Using EMR (Elastic MapReduce)

  - Using EC2 (Elastic Compute Cloud)
  - SSH Configuration
- Stand alone mode (Theory)
- Distributed mode (Theory)
  - Pseudo distributed
  - Fully distributed

Hadoop Ecosystem and Use Cases
- Hadoop industry solutions
- Importing/exporting data across RDBMS and HDFS using Sqoop
- Getting real-time events into HDFS using Flume
- Creating workflows in Oozie
- Introduction to Graph processing
- Graph processing with Neo4J
- Processing data in real time using Storm
- Interactive Adhoc querying with Impala

Proof of concepts and use cases
- Click Stream Analysis using Pig and Hive
- Analyzing the Twitter data with Hive
- Further ideas for data analysis

CASE STUDY #1: Twitter Analysis

It is not a hard rule, but roughly 80% of data is unstructured, while the remaining 20% is structured. An RDBMS helps to store and process the structured data (20%), while Hadoop addresses the problem of storing and processing both types of data. Twitter is becoming a key source of data for figuring out what customers think about a given topic (sentiment analysis), identifying trending topics, and much more. In this case study we will get data from Twitter using different means and do some interesting analysis on it.

CASE STUDY #2: Click Stream Analysis

E-commerce sites have been making quite an impact on the overall economy in many countries for some time. E-commerce portals store user activity on their sites as clickstream data and later analyze it to identify what the user has browsed, so that they can show appropriate recommendations when the user visits the site again or send a personalized email. We will see how to analyze the clickstream and the user data together using Pig and Hive. We will bring the user data from an RDBMS and the user behavior (clickstream) via Flume into HDFS, and then do some interesting analysis using both Hive and Pig. This Click Stream Analysis will also be automated using the workflow engine Oozie.
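The core of the clickstream analysis in Case Study #2, joining user records with click events and summarizing what each user browsed, can be sketched in a few lines of plain Python. This is only an illustration of the join-and-aggregate idea, not the Pig or Hive scripts used in the course; the record layouts, field names and sample values below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical user records, standing in for rows imported from an RDBMS.
users = [
    {"user_id": 1, "name": "asha"},
    {"user_id": 2, "name": "ben"},
    {"user_id": 4, "name": "chen"},  # registered, but has no clicks yet
]

# Hypothetical click events, standing in for log lines collected into HDFS.
clicks = [
    {"user_id": 1, "category": "books"},
    {"user_id": 1, "category": "phones"},
    {"user_id": 1, "category": "books"},
    {"user_id": 2, "category": "phones"},
    {"user_id": 2, "category": "phones"},
    {"user_id": 2, "category": "laptops"},
    {"user_id": 3, "category": "books"},  # no matching user record
]


def top_category_per_user(users, clicks):
    """Join clicks to users on user_id; return each user's most-browsed category."""
    # Aggregate: count clicks per (user_id, category).
    counts = defaultdict(Counter)
    for click in clicks:
        counts[click["user_id"]][click["category"]] += 1

    # Inner join: keep only users that have at least one click.
    result = {}
    for user in users:
        per_user = counts.get(user["user_id"])
        if per_user:
            category, _ = per_user.most_common(1)[0]
            result[user["name"]] = category
    return result


if __name__ == "__main__":
    print(top_category_per_user(users, clicks))
```

The most-browsed category per user is the kind of signal that could feed a recommendation or a personalized email; at scale the same grouping and join would be expressed in Pig Latin or HiveQL rather than in-memory Python.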